// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (c) 2003-2006, Cluster File Systems, Inc, info@clusterfs.com
 * Written by Alex Tomas <alex@clusterfs.com>
 *
 * Architecture independence:
 *   Copyright (c) 2005, Bull S.A.
 *   Written by Pierre Peiffer <pierre.peiffer@bull.net>
 */

/*
 * Extents support for EXT4
 *
 * TODO:
 *   - ext4*_error() should be used in some situations
 *   - analyze all BUG()/BUG_ON(), use -EIO where appropriate
 *   - smart tree reduction
 */

#include <linux/fs.h>
#include <linux/time.h>
#include <linux/jbd2.h>
#include <linux/highuid.h>
#include <linux/pagemap.h>
#include <linux/quotaops.h>
#include <linux/string.h>
#include <linux/slab.h>
#include <linux/uaccess.h>
#include <linux/fiemap.h>
#include <linux/iomap.h>
#include <linux/sched/mm.h>
#include "ext4_jbd2.h"
#include "ext4_extents.h"
#include "xattr.h"

#include <trace/events/ext4.h>

/*
 * used by extent splitting.
 */
#define EXT4_EXT_MAY_ZEROOUT	0x1  /* safe to zeroout if split fails \
					due to ENOSPC */
#define EXT4_EXT_MARK_UNWRIT1	0x2  /* mark first half unwritten */
#define EXT4_EXT_MARK_UNWRIT2	0x4  /* mark second half unwritten */

#define EXT4_EXT_DATA_VALID1	0x8  /* first half contains valid data */
#define EXT4_EXT_DATA_VALID2	0x10 /* second half contains valid data */

static __le32 ext4_extent_block_csum(struct inode *inode,
				     struct ext4_extent_header *eh)
{
	struct ext4_inode_info *ei = EXT4_I(inode);
	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
	__u32 csum;

	csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)eh,
			   EXT4_EXTENT_TAIL_OFFSET(eh));
	return cpu_to_le32(csum);
}

static int ext4_extent_block_csum_verify(struct inode *inode,
					 struct ext4_extent_header *eh)
{
	struct ext4_extent_tail *et;

	if (!ext4_has_metadata_csum(inode->i_sb))
		return 1;

	et = find_ext4_extent_tail(eh);
	if (et->et_checksum != ext4_extent_block_csum(inode, eh))
		return 0;
	return 1;
}

static void ext4_extent_block_csum_set(struct inode *inode,
				       struct ext4_extent_header *eh)
{
	struct ext4_extent_tail *et;

	if (!ext4_has_metadata_csum(inode->i_sb))
		return;

	et = find_ext4_extent_tail(eh);
	et->et_checksum = ext4_extent_block_csum(inode, eh);
}

static int ext4_split_extent_at(handle_t *handle,
			     struct inode *inode,
			     struct ext4_ext_path **ppath,
			     ext4_lblk_t split,
			     int split_flag,
			     int flags);

static int ext4_ext_trunc_restart_fn(struct inode *inode, int *dropped)
{
	/*
	 * Drop i_data_sem to avoid deadlock with ext4_map_blocks.  At this
	 * moment, get_block can be called only for blocks inside i_size since
	 * page cache has been already dropped and writes are blocked by
	 * i_rwsem. So we can safely drop the i_data_sem here.
	 */
	BUG_ON(EXT4_JOURNAL(inode) == NULL);
	ext4_discard_preallocations(inode, 0);
	up_write(&EXT4_I(inode)->i_data_sem);
	*dropped = 1;
	return 0;
}

static void ext4_ext_drop_refs(struct ext4_ext_path *path)
{
	int depth, i;

	if (!path)
		return;
	depth = path->p_depth;
	for (i = 0; i <= depth; i++, path++) {
		brelse(path->p_bh);
		path->p_bh = NULL;
	}
}

void ext4_free_ext_path(struct ext4_ext_path *path)
{
	ext4_ext_drop_refs(path);
	kfree(path);
}

/*
 * Make sure 'handle' has at least 'check_cred' credits. If not, restart
 * transaction with 'restart_cred' credits. The function drops i_data_sem
 * when restarting transaction and gets it after transaction is restarted.
 *
 * The function returns 0 on success, 1 if transaction had to be restarted,
 * and < 0 in case of fatal error.
 */
int ext4_datasem_ensure_credits(handle_t *handle, struct inode *inode,
				int check_cred, int restart_cred,
				int revoke_cred)
{
	int ret;
	int dropped = 0;

	ret = ext4_journal_ensure_credits_fn(handle, check_cred, restart_cred,
		revoke_cred, ext4_ext_trunc_restart_fn(inode, &dropped));
	if (dropped)
		down_write(&EXT4_I(inode)->i_data_sem);
	return ret;
}

/*
 * could return:
 *  - EROFS
 *  - ENOMEM
 */
static int ext4_ext_get_access(handle_t *handle, struct inode *inode,
				struct ext4_ext_path *path)
{
	int err = 0;

	if (path->p_bh) {
		/* path points to block */
		BUFFER_TRACE(path->p_bh, "get_write_access");
		err = ext4_journal_get_write_access(handle, inode->i_sb,
						    path->p_bh, EXT4_JTR_NONE);
		/*
		 * The extent buffer's verified bit will be set again in
		 * __ext4_ext_dirty(). We could leave an inconsistent
		 * buffer if the extent update procedure breaks off due
		 * to some error, so force it to be checked again.
		 */
		if (!err)
			clear_buffer_verified(path->p_bh);
	}
	/* path points to leaf/index in inode body */
	/* we use in-core data, no need to protect them */
	return err;
}

/*
 * could return:
 *  - EROFS
 *  - ENOMEM
 *  - EIO
 */
static int __ext4_ext_dirty(const char *where, unsigned int line,
			    handle_t *handle, struct inode *inode,
			    struct ext4_ext_path *path)
{
	int err;

	WARN_ON(!rwsem_is_locked(&EXT4_I(inode)->i_data_sem));
	if (path->p_bh) {
		ext4_extent_block_csum_set(inode, ext_block_hdr(path->p_bh));
		/* path points to block */
		err = __ext4_handle_dirty_metadata(where, line, handle,
						   inode, path->p_bh);
		/* Extents updating done, re-set verified flag */
		if (!err)
			set_buffer_verified(path->p_bh);
	} else {
		/* path points to leaf/index in inode body */
		err = ext4_mark_inode_dirty(handle, inode);
	}
	return err;
}

#define ext4_ext_dirty(handle, inode, path) \
		__ext4_ext_dirty(__func__, __LINE__, (handle), (inode), (path))

static ext4_fsblk_t ext4_ext_find_goal(struct inode *inode,
			      struct ext4_ext_path *path,
			      ext4_lblk_t block)
{
	if (path) {
		int depth = path->p_depth;
		struct ext4_extent *ex;

		/*
		 * Try to predict block placement assuming that we are
		 * filling in a file which will eventually be
		 * non-sparse --- i.e., in the case of libbfd writing
		 * ELF object sections out of order but in a way
		 * that eventually results in a contiguous object or
		 * executable file, or some database extending a table
		 * space file.  However, this is actually somewhat
		 * non-ideal if we are writing a sparse file such as
		 * qemu or KVM writing a raw image file that is going
		 * to stay fairly sparse, since it will end up
		 * fragmenting the file system's free space.  Maybe we
		 * should have some heuristics or some way to allow
		 * userspace to pass a hint to the file system,
		 * especially if the latter case turns out to be
		 * common.
		 */
		ex = path[depth].p_ext;
		if (ex) {
			ext4_fsblk_t ext_pblk = ext4_ext_pblock(ex);
			ext4_lblk_t ext_block = le32_to_cpu(ex->ee_block);

			if (block > ext_block)
				return ext_pblk + (block - ext_block);
			else
				return ext_pblk - (ext_block - block);
		}

		/* it looks like index is empty;
		 * try to find starting block from index itself */
		if (path[depth].p_bh)
			return path[depth].p_bh->b_blocknr;
	}

	/* OK. use inode's group */
	return ext4_inode_to_goal_block(inode);
}

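/*
 * Illustration only, not part of the kernel source: a minimal user-space
 * sketch of the goal arithmetic used by ext4_ext_find_goal() when a
 * neighbouring extent is known. The typedefs and the name
 * toy_predict_goal() are hypothetical stand-ins; the real function also
 * falls back to the index block and to the inode's group.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for ext4_fsblk_t and ext4_lblk_t. */
typedef uint64_t toy_fsblk_t;
typedef uint32_t toy_lblk_t;

/*
 * Keep the physical goal at the same distance from the extent's physical
 * start as the requested logical block is from the extent's logical start.
 */
toy_fsblk_t toy_predict_goal(toy_fsblk_t ext_pblk, toy_lblk_t ext_block,
			     toy_lblk_t block)
{
	if (block > ext_block)
		return ext_pblk + (block - ext_block);
	return ext_pblk - (ext_block - block);
}

/* Small self-check: logical block 15 sits 5 blocks past the extent start. */
void toy_predict_goal_selfcheck(void)
{
	assert(toy_predict_goal(1000, 10, 15) == 1005);
}
```

/*
 * The two branches avoid computing a negative difference in unsigned
 * arithmetic, which is presumably why the kernel code spells them out
 * separately.
 */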
/*
 * Allocation for a meta data block
 */
static ext4_fsblk_t
ext4_ext_new_meta_block(handle_t *handle, struct inode *inode,
			struct ext4_ext_path *path,
			struct ext4_extent *ex, int *err, unsigned int flags)
{
	ext4_fsblk_t goal, newblock;

	goal = ext4_ext_find_goal(inode, path, le32_to_cpu(ex->ee_block));
	newblock = ext4_new_meta_blocks(handle, inode, goal, flags,
					NULL, err);
	return newblock;
}

static inline int ext4_ext_space_block(struct inode *inode, int check)
{
	int size;

	size = (inode->i_sb->s_blocksize - sizeof(struct ext4_extent_header))
			/ sizeof(struct ext4_extent);
#ifdef AGGRESSIVE_TEST
	if (!check && size > 6)
		size = 6;
#endif
	return size;
}

static inline int ext4_ext_space_block_idx(struct inode *inode, int check)
{
	int size;

	size = (inode->i_sb->s_blocksize - sizeof(struct ext4_extent_header))
			/ sizeof(struct ext4_extent_idx);
#ifdef AGGRESSIVE_TEST
	if (!check && size > 5)
		size = 5;
#endif
	return size;
}

static inline int ext4_ext_space_root(struct inode *inode, int check)
{
	int size;

	size = sizeof(EXT4_I(inode)->i_data);
	size -= sizeof(struct ext4_extent_header);
	size /= sizeof(struct ext4_extent);
#ifdef AGGRESSIVE_TEST
	if (!check && size > 3)
		size = 3;
#endif
	return size;
}

static inline int ext4_ext_space_root_idx(struct inode *inode, int check)
{
	int size;

	size = sizeof(EXT4_I(inode)->i_data);
	size -= sizeof(struct ext4_extent_header);
	size /= sizeof(struct ext4_extent_idx);
#ifdef AGGRESSIVE_TEST
	if (!check && size > 4)
		size = 4;
#endif
	return size;
}

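/*
 * Illustration only, not part of the kernel source: a user-space sketch of
 * the capacity arithmetic in the ext4_ext_space_*() helpers above. The
 * constants are assumptions taken from the ext4 on-disk layout (12-byte
 * extent header, 12-byte extent and index entries, 60-byte i_data area);
 * the toy_* names are hypothetical.
 */

```c
#include <assert.h>

/* Assumed on-disk sizes in bytes; not defined in this file. */
enum {
	TOY_HDR_SIZE    = 12,	/* struct ext4_extent_header */
	TOY_ENTRY_SIZE  = 12,	/* struct ext4_extent or ext4_extent_idx */
	TOY_I_DATA_SIZE = 60	/* i_data area inside the inode */
};

/* Entries that fit in one tree block after the header. */
int toy_entries_per_block(int blocksize)
{
	return (blocksize - TOY_HDR_SIZE) / TOY_ENTRY_SIZE;
}

/* Entries that fit in the root stored in the inode body. */
int toy_entries_in_root(void)
{
	return (TOY_I_DATA_SIZE - TOY_HDR_SIZE) / TOY_ENTRY_SIZE;
}

/* Small self-check for the in-inode root. */
void toy_capacity_selfcheck(void)
{
	assert(toy_entries_in_root() == 4);
}
```

/*
 * Under these assumptions a 4096-byte block holds 340 entries and a
 * 1024-byte block holds 84, while the in-inode root holds 4. Leaf and
 * index capacities coincide here only because both entry types are
 * assumed to have the same size.
 */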
static inline int
ext4_force_split_extent_at(handle_t *handle, struct inode *inode,
			   struct ext4_ext_path **ppath, ext4_lblk_t lblk,
			   int nofail)
{
	struct ext4_ext_path *path = *ppath;
	int unwritten = ext4_ext_is_unwritten(path[path->p_depth].p_ext);
	int flags = EXT4_EX_NOCACHE | EXT4_GET_BLOCKS_PRE_IO;

	if (nofail)
		flags |= EXT4_GET_BLOCKS_METADATA_NOFAIL | EXT4_EX_NOFAIL;

	return ext4_split_extent_at(handle, inode, ppath, lblk, unwritten ?
			EXT4_EXT_MARK_UNWRIT1|EXT4_EXT_MARK_UNWRIT2 : 0,
			flags);
}

static int
ext4_ext_max_entries(struct inode *inode, int depth)
{
	int max;

	if (depth == ext_depth(inode)) {
		if (depth == 0)
			max = ext4_ext_space_root(inode, 1);
		else
			max = ext4_ext_space_root_idx(inode, 1);
	} else {
		if (depth == 0)
			max = ext4_ext_space_block(inode, 1);
		else
			max = ext4_ext_space_block_idx(inode, 1);
	}

	return max;
}

static int ext4_valid_extent(struct inode *inode, struct ext4_extent *ext)
{
	ext4_fsblk_t block = ext4_ext_pblock(ext);
	int len = ext4_ext_get_actual_len(ext);
	ext4_lblk_t lblock = le32_to_cpu(ext->ee_block);

	/*
	 * We allow neither:
	 *  - zero length
	 *  - overflow/wrap-around
	 */
	if (lblock + len <= lblock)
		return 0;
	return ext4_inode_block_valid(inode, block, len);
}

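/*
 * Illustration only, not part of the kernel source: a user-space sketch of
 * the single comparison in ext4_valid_extent() that rejects both
 * zero-length extents and ranges that wrap around the 32-bit logical block
 * space. The name toy_range_ok() is hypothetical, and the length is taken
 * as an unsigned 32-bit value here for simplicity.
 */

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if [lblock, lblock + len) is non-empty and does not wrap.
 * With len == 0 the sum equals lblock; on wrap-around the truncated sum
 * is smaller than lblock. Both cases fail the same test.
 */
int toy_range_ok(uint32_t lblock, uint32_t len)
{
	uint32_t end = (uint32_t)(lblock + len);

	return !(end <= lblock);
}

/* Small self-check: an empty extent is rejected. */
void toy_range_selfcheck(void)
{
	assert(toy_range_ok(100, 0) == 0);
}
```

/*
 * Note that a range ending exactly at 2^32 also wraps to 0 and is
 * rejected by this test.
 */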
static int ext4_valid_extent_idx(struct inode *inode,
				struct ext4_extent_idx *ext_idx)
{
	ext4_fsblk_t block = ext4_idx_pblock(ext_idx);

	return ext4_inode_block_valid(inode, block, 1);
}

static int ext4_valid_extent_entries(struct inode *inode,
|
2020-03-29 07:33:43 +08:00
|
|
|
struct ext4_extent_header *eh,
|
2021-09-08 20:08:49 +08:00
|
|
|
ext4_lblk_t lblk, ext4_fsblk_t *pblk,
|
|
|
|
int depth)
|
2009-03-12 21:51:20 +08:00
|
|
|
{
|
|
|
|
unsigned short entries;
|
2021-09-08 20:08:48 +08:00
|
|
|
ext4_lblk_t lblock = 0;
|
2022-05-18 20:08:16 +08:00
|
|
|
ext4_lblk_t cur = 0;
|
2021-09-08 20:08:48 +08:00
|
|
|
|
2009-03-12 21:51:20 +08:00
|
|
|
if (eh->eh_entries == 0)
|
|
|
|
return 1;
|
|
|
|
|
|
|
|
entries = le16_to_cpu(eh->eh_entries);
|
|
|
|
|
|
|
|
if (depth == 0) {
|
|
|
|
/* leaf entries */
|
2011-10-29 21:23:38 +08:00
|
|
|
struct ext4_extent *ext = EXT_FIRST_EXTENT(eh);
|
2021-09-08 20:08:49 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* The logical block in the first entry should equal to
|
|
|
|
* the number in the index block.
|
|
|
|
*/
|
|
|
|
if (depth != ext_depth(inode) &&
|
|
|
|
lblk != le32_to_cpu(ext->ee_block))
|
|
|
|
return 0;
|
2009-03-12 21:51:20 +08:00
|
|
|
while (entries) {
|
|
|
|
if (!ext4_valid_extent(inode, ext))
|
|
|
|
return 0;
|
2013-12-04 10:22:21 +08:00
|
|
|
|
|
|
|
/* Check for overlapping extents */
|
|
|
|
lblock = le32_to_cpu(ext->ee_block);
|
2022-05-18 20:08:16 +08:00
|
|
|
if (lblock < cur) {
|
2020-03-29 07:33:43 +08:00
|
|
|
*pblk = ext4_ext_pblock(ext);
|
2013-12-04 10:22:21 +08:00
|
|
|
return 0;
|
|
|
|
}
|
2022-05-18 20:08:16 +08:00
|
|
|
cur = lblock + ext4_ext_get_actual_len(ext);
|
2009-03-12 21:51:20 +08:00
|
|
|
ext++;
|
|
|
|
entries--;
|
|
|
|
}
|
|
|
|
} else {
|
2011-10-29 21:23:38 +08:00
|
|
|
struct ext4_extent_idx *ext_idx = EXT_FIRST_INDEX(eh);
|
2021-09-08 20:08:49 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* The logical block in the first entry should equal to
|
|
|
|
* the number in the parent index block.
|
|
|
|
*/
|
|
|
|
if (depth != ext_depth(inode) &&
|
|
|
|
lblk != le32_to_cpu(ext_idx->ei_block))
|
|
|
|
return 0;
|
2009-03-12 21:51:20 +08:00
|
|
|
while (entries) {
|
|
|
|
if (!ext4_valid_extent_idx(inode, ext_idx))
|
|
|
|
return 0;
|
2021-09-08 20:08:48 +08:00
|
|
|
|
|
|
|
/* Check for overlapping index extents */
|
|
|
|
lblock = le32_to_cpu(ext_idx->ei_block);
|
2022-05-18 20:08:16 +08:00
|
|
|
if (lblock < cur) {
|
2021-09-08 20:08:48 +08:00
|
|
|
*pblk = ext4_idx_pblock(ext_idx);
|
|
|
|
return 0;
|
|
|
|
}
|
2009-03-12 21:51:20 +08:00
|
|
|
ext_idx++;
|
|
|
|
entries--;
|
2022-05-18 20:08:16 +08:00
|
|
|
cur = lblock + 1;
|
2009-03-12 21:51:20 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
return 1;
|
|
|
|
}
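The overlap check in the leaf branch above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct demo_extent` and `demo_extents_valid()` are hypothetical names, fields are in host byte order, and the per-entry validity check done by ext4_valid_extent() is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a leaf extent (host byte order). */
struct demo_extent {
	uint32_t start;	/* first logical block covered */
	uint32_t len;	/* number of blocks covered */
};

/*
 * Return 1 if the entries are sorted and do not overlap, 0 otherwise.
 * "cur" tracks the first logical block not yet covered, mirroring the
 * lblock/cur bookkeeping in the leaf branch above.
 */
static int demo_extents_valid(const struct demo_extent *ext, size_t n)
{
	uint32_t cur = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ext[i].start < cur)
			return 0;	/* starts inside the previous extent */
		cur = ext[i].start + ext[i].len;
	}
	return 1;
}
```

The index branch follows the same pattern, except that an index entry only claims its own starting block, so the running position advances by one instead of by the extent length.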
|
|
|
|
|
2010-07-27 23:56:40 +08:00
|
|
|
static int __ext4_ext_check(const char *function, unsigned int line,
|
|
|
|
struct inode *inode, struct ext4_extent_header *eh,
|
2021-09-08 20:08:49 +08:00
|
|
|
int depth, ext4_fsblk_t pblk, ext4_lblk_t lblk)
|
2007-07-18 21:19:09 +08:00
|
|
|
{
|
|
|
|
const char *error_msg;
|
2015-10-18 04:16:04 +08:00
|
|
|
int max = 0, err = -EFSCORRUPTED;
|
2007-07-18 21:19:09 +08:00
|
|
|
|
|
|
|
if (unlikely(eh->eh_magic != EXT4_EXT_MAGIC)) {
|
|
|
|
error_msg = "invalid magic";
|
|
|
|
goto corrupted;
|
|
|
|
}
|
|
|
|
if (unlikely(le16_to_cpu(eh->eh_depth) != depth)) {
|
|
|
|
error_msg = "unexpected eh_depth";
|
|
|
|
goto corrupted;
|
|
|
|
}
|
|
|
|
if (unlikely(eh->eh_max == 0)) {
|
|
|
|
error_msg = "invalid eh_max";
|
|
|
|
goto corrupted;
|
|
|
|
}
|
|
|
|
max = ext4_ext_max_entries(inode, depth);
|
|
|
|
if (unlikely(le16_to_cpu(eh->eh_max) > max)) {
|
|
|
|
error_msg = "too large eh_max";
|
|
|
|
goto corrupted;
|
|
|
|
}
|
|
|
|
if (unlikely(le16_to_cpu(eh->eh_entries) > le16_to_cpu(eh->eh_max))) {
|
|
|
|
error_msg = "invalid eh_entries";
|
|
|
|
goto corrupted;
|
2022-08-22 17:42:35 +08:00
|
|
|
}
|
|
|
|
if (unlikely((eh->eh_entries == 0) && (depth > 0))) {
|
|
|
|
error_msg = "eh_entries is 0 but eh_depth is > 0";
|
|
|
|
goto corrupted;
|
2007-07-18 21:19:09 +08:00
|
|
|
}
|
2021-09-08 20:08:49 +08:00
|
|
|
if (!ext4_valid_extent_entries(inode, eh, lblk, &pblk, depth)) {
|
2009-03-12 21:51:20 +08:00
|
|
|
error_msg = "invalid extent entries";
|
|
|
|
goto corrupted;
|
|
|
|
}
|
2016-07-15 12:22:07 +08:00
|
|
|
if (unlikely(depth > 32)) {
|
|
|
|
error_msg = "too large eh_depth";
|
|
|
|
goto corrupted;
|
|
|
|
}
|
2012-04-30 06:37:10 +08:00
|
|
|
/* Verify checksum on non-root extent tree nodes */
|
|
|
|
if (ext_depth(inode) != depth &&
|
|
|
|
!ext4_extent_block_csum_verify(inode, eh)) {
|
|
|
|
error_msg = "extent tree corrupted";
|
2015-10-18 04:16:04 +08:00
|
|
|
err = -EFSBADCRC;
|
2012-04-30 06:37:10 +08:00
|
|
|
goto corrupted;
|
|
|
|
}
|
2007-07-18 21:19:09 +08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
corrupted:
|
2020-03-29 07:33:43 +08:00
|
|
|
ext4_error_inode_err(inode, function, line, 0, -err,
|
|
|
|
"pblk %llu bad header/extent: %s - magic %x, "
|
|
|
|
"entries %u, max %u(%u), depth %u(%u)",
|
|
|
|
(unsigned long long) pblk, error_msg,
|
|
|
|
le16_to_cpu(eh->eh_magic),
|
|
|
|
le16_to_cpu(eh->eh_entries),
|
|
|
|
le16_to_cpu(eh->eh_max),
|
|
|
|
max, le16_to_cpu(eh->eh_depth), depth);
|
2015-10-18 04:16:04 +08:00
|
|
|
return err;
|
2007-07-18 21:19:09 +08:00
|
|
|
}
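The ordering of the header checks above can be illustrated with a small userspace sketch. Everything named `demo_*` or `DEMO_*` is hypothetical; the struct holds host-order values, the magic value 0xF30A is assumed here for illustration, and the per-entry and checksum checks are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAGIC 0xF30A	/* assumed magic value, host byte order */

/* Hypothetical host-order copy of an extent header. */
struct demo_header {
	uint16_t magic;
	uint16_t entries;
	uint16_t max;
	uint16_t depth;
};

/*
 * Return NULL if the header passes the sanity checks, otherwise a short
 * message naming the first check that failed (same order as above).
 */
static const char *demo_check_header(const struct demo_header *h,
				     unsigned int expected_depth,
				     unsigned int max_limit)
{
	if (h->magic != DEMO_MAGIC)
		return "invalid magic";
	if (h->depth != expected_depth)
		return "unexpected eh_depth";
	if (h->max == 0)
		return "invalid eh_max";
	if (h->max > max_limit)
		return "too large eh_max";
	if (h->entries > h->max)
		return "invalid eh_entries";
	if (h->entries == 0 && expected_depth > 0)
		return "eh_entries is 0 but eh_depth is > 0";
	if (expected_depth > 32)
		return "too large eh_depth";
	return NULL;
}
```

Returning the message rather than logging it keeps the sketch testable; the kernel function instead jumps to a common label that reports the error against the inode.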
|
|
|
|
|
2013-08-17 09:21:41 +08:00
|
|
|
#define ext4_ext_check(inode, eh, depth, pblk) \
|
2021-09-08 20:08:49 +08:00
|
|
|
__ext4_ext_check(__func__, __LINE__, (inode), (eh), (depth), (pblk), 0)
|
2007-07-18 21:19:09 +08:00
|
|
|
|
2009-03-28 04:39:58 +08:00
|
|
|
int ext4_ext_check_inode(struct inode *inode)
|
|
|
|
{
|
2013-08-17 09:21:41 +08:00
|
|
|
return ext4_ext_check(inode, ext_inode_hdr(inode), ext_depth(inode), 0);
|
2009-03-28 04:39:58 +08:00
|
|
|
}
|
|
|
|
|
ext4: fix extent_status fragmentation for plain files
Extents are cached in read_extent_tree_block(); as a result, extents
are not cached for inodes with depth == 0 when we try to find the
extent using ext4_find_extent(). The result of the lookup is cached
in ext4_map_blocks() but is only a subset of the extent on disk. As a
result, the contents of extents status cache can get very badly
fragmented for certain workloads, such as a random 4k read workload.
File size of /mnt/test is 33554432 (8192 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 8191: 40960.. 49151: 8192: last,eof
$ perf record -e 'ext4:ext4_es_*' /root/bin/fio --name=t --direct=0 --rw=randread --bs=4k --filesize=32M --size=32M --filename=/mnt/test
$ perf script | grep ext4_es_insert_extent | head -n 10
fio 131 [000] 13.975421: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [494/1) mapped 41454 status W
fio 131 [000] 13.975939: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6064/1) mapped 47024 status W
fio 131 [000] 13.976467: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6907/1) mapped 47867 status W
fio 131 [000] 13.976937: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3850/1) mapped 44810 status W
fio 131 [000] 13.977440: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3292/1) mapped 44252 status W
fio 131 [000] 13.977931: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6882/1) mapped 47842 status W
fio 131 [000] 13.978376: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3117/1) mapped 44077 status W
fio 131 [000] 13.978957: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [2896/1) mapped 43856 status W
fio 131 [000] 13.979474: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [7479/1) mapped 48439 status W
Fix this by caching the extents for inodes with depth == 0 in
ext4_find_extent().
[ Renamed ext4_es_cache_extents() to ext4_cache_extents() since this
newly added function is not in extents_cache.c, and to avoid
potential visual confusion with ext4_es_cache_extent(). -TYT ]
Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com>
Link: https://lore.kernel.org/r/20191106122502.19986-1-dmonakhov@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-06 20:25:02 +08:00
|
|
|
static void ext4_cache_extents(struct inode *inode,
|
|
|
|
struct ext4_extent_header *eh)
|
|
|
|
{
|
|
|
|
struct ext4_extent *ex = EXT_FIRST_EXTENT(eh);
|
|
|
|
ext4_lblk_t prev = 0;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = le16_to_cpu(eh->eh_entries); i > 0; i--, ex++) {
|
|
|
|
unsigned int status = EXTENT_STATUS_WRITTEN;
|
|
|
|
ext4_lblk_t lblk = le32_to_cpu(ex->ee_block);
|
|
|
|
int len = ext4_ext_get_actual_len(ex);
|
|
|
|
|
|
|
|
if (prev && (prev != lblk))
|
|
|
|
ext4_es_cache_extent(inode, prev, lblk - prev, ~0,
|
|
|
|
EXTENT_STATUS_HOLE);
|
|
|
|
|
|
|
|
if (ext4_ext_is_unwritten(ex))
|
|
|
|
status = EXTENT_STATUS_UNWRITTEN;
|
|
|
|
ext4_es_cache_extent(inode, lblk, len,
|
|
|
|
ext4_ext_pblock(ex), status);
|
|
|
|
prev = lblk + len;
|
|
|
|
}
|
|
|
|
}
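The loop above caches each extent and, when two consecutive extents are not adjacent, a hole entry for the gap between them. A minimal userspace sketch of that bookkeeping follows; `struct demo_extent`, `struct demo_range`, and `demo_cache_ranges()` are hypothetical, and the output array is assumed to have room for 2 * n entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_extent {
	uint32_t start;	/* first logical block */
	uint32_t len;	/* number of blocks */
};

struct demo_range {
	uint32_t start;
	uint32_t len;
	int is_hole;	/* 1 for a gap between extents, 0 for a mapped range */
};

/*
 * Emit one range per extent plus one hole range per gap between
 * consecutive extents. Returns the number of ranges written to out.
 * As in the loop above, a gap before the first extent is not recorded,
 * because prev is still 0 at that point.
 */
static size_t demo_cache_ranges(const struct demo_extent *ex, size_t n,
				struct demo_range *out)
{
	uint32_t prev = 0;
	size_t i, k = 0;

	for (i = 0; i < n; i++) {
		if (prev && prev != ex[i].start) {
			out[k].start = prev;
			out[k].len = ex[i].start - prev;
			out[k].is_hole = 1;
			k++;
		}
		out[k].start = ex[i].start;
		out[k].len = ex[i].len;
		out[k].is_hole = 0;
		k++;
		prev = ex[i].start + ex[i].len;
	}
	return k;
}
```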
|
|
|
|
|
2013-08-17 09:20:41 +08:00
|
|
|
static struct buffer_head *
|
|
|
|
__read_extent_tree_block(const char *function, unsigned int line,
|
2021-09-08 20:08:49 +08:00
|
|
|
struct inode *inode, struct ext4_extent_idx *idx,
|
|
|
|
int depth, int flags)
|
2012-04-30 06:21:10 +08:00
|
|
|
{
|
2013-08-17 09:20:41 +08:00
|
|
|
struct buffer_head *bh;
|
|
|
|
int err;
|
2020-05-08 01:50:28 +08:00
|
|
|
gfp_t gfp_flags = __GFP_MOVABLE | GFP_NOFS;
|
2021-09-08 20:08:49 +08:00
|
|
|
ext4_fsblk_t pblk;
|
2020-05-08 01:50:28 +08:00
|
|
|
|
|
|
|
if (flags & EXT4_EX_NOFAIL)
|
|
|
|
gfp_flags |= __GFP_NOFAIL;
|
2013-08-17 09:20:41 +08:00
|
|
|
|
2021-09-08 20:08:49 +08:00
|
|
|
pblk = ext4_idx_pblock(idx);
|
2020-05-08 01:50:28 +08:00
|
|
|
bh = sb_getblk_gfp(inode->i_sb, pblk, gfp_flags);
|
2013-08-17 09:20:41 +08:00
|
|
|
if (unlikely(!bh))
|
|
|
|
return ERR_PTR(-ENOMEM);
|
2012-04-30 06:21:10 +08:00
|
|
|
|
2013-08-17 09:20:41 +08:00
|
|
|
if (!bh_uptodate_or_lock(bh)) {
|
|
|
|
trace_ext4_ext_load_extent(inode, pblk, _RET_IP_);
|
2020-09-24 15:33:33 +08:00
|
|
|
err = ext4_read_bh(bh, 0, NULL);
|
2013-08-17 09:20:41 +08:00
|
|
|
if (err < 0)
|
|
|
|
goto errout;
|
|
|
|
}
|
2013-08-17 10:05:14 +08:00
|
|
|
if (buffer_verified(bh) && !(flags & EXT4_EX_FORCE_CACHE))
|
2013-08-17 09:20:41 +08:00
|
|
|
return bh;
|
2021-09-08 20:08:49 +08:00
|
|
|
err = __ext4_ext_check(function, line, inode, ext_block_hdr(bh),
|
|
|
|
depth, pblk, le32_to_cpu(idx->ei_block));
|
2020-07-28 21:04:34 +08:00
|
|
|
if (err)
|
|
|
|
goto errout;
|
2012-04-30 06:21:10 +08:00
|
|
|
set_buffer_verified(bh);
|
2013-08-17 09:23:41 +08:00
|
|
|
/*
|
|
|
|
* If this is a leaf block, cache all of its entries
|
|
|
|
*/
|
|
|
|
if (!(flags & EXT4_EX_NOCACHE) && depth == 0) {
|
|
|
|
struct ext4_extent_header *eh = ext_block_hdr(bh);
|
2019-11-06 20:25:02 +08:00
|
|
|
ext4_cache_extents(inode, eh);
|
2013-08-17 09:23:41 +08:00
|
|
|
}
|
2013-08-17 09:20:41 +08:00
|
|
|
return bh;
|
|
|
|
errout:
|
|
|
|
put_bh(bh);
|
|
|
|
return ERR_PTR(err);
|
|
|
|
|
2012-04-30 06:21:10 +08:00
|
|
|
}
|
|
|
|
|
2021-09-08 20:08:49 +08:00
|
|
|
#define read_extent_tree_block(inode, idx, depth, flags) \
|
|
|
|
__read_extent_tree_block(__func__, __LINE__, (inode), (idx), \
|
2013-08-17 09:23:41 +08:00
|
|
|
(depth), (flags))
|
2012-04-30 06:21:10 +08:00
|
|
|
|
2013-08-17 10:05:14 +08:00
|
|
|
/*
|
|
|
|
* This function is called to cache a file's extent information in the
|
|
|
|
* extent status tree
|
|
|
|
*/
|
|
|
|
int ext4_ext_precache(struct inode *inode)
|
|
|
|
{
|
|
|
|
struct ext4_inode_info *ei = EXT4_I(inode);
|
|
|
|
struct ext4_ext_path *path = NULL;
|
|
|
|
struct buffer_head *bh;
|
|
|
|
int i = 0, depth, ret = 0;
|
|
|
|
|
|
|
|
if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
|
|
|
|
return 0; /* not an extent-mapped inode */
|
|
|
|
|
|
|
|
down_read(&ei->i_data_sem);
|
|
|
|
depth = ext_depth(inode);
|
|
|
|
|
2020-02-28 17:26:55 +08:00
|
|
|
/* Don't cache anything if there are no external extent blocks */
|
|
|
|
if (!depth) {
|
|
|
|
up_read(&ei->i_data_sem);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
treewide: kzalloc() -> kcalloc()
The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:
kzalloc(a * b, gfp)
with:
kcalloc(a, b, gfp)
as well as handling cases of:
kzalloc(a * b * c, gfp)
with:
kzalloc(array3_size(a, b, c), gfp)
as it's slightly less ugly than:
kzalloc_array(array_size(a, b), c, gfp)
This does, however, attempt to ignore constant size factors like:
kzalloc(4 * 1024, gfp)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@
(
kzalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kzalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@
(
kzalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)
// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@
- kzalloc
+ kcalloc
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
kzalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
kzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@
(
kzalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@
(
kzalloc(sizeof(THING) * C2, ...)
|
kzalloc(sizeof(TYPE) * C2, ...)
|
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * E2
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- E1 * E2
+ E1, E2
, ...)
)
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-13 05:03:40 +08:00
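One commonly cited benefit of a two-factor allocator is that the count and element size stay separate, so the multiplication can be checked for overflow before any memory is requested. The userspace sketch below illustrates that idea only; `demo_zalloc_array()` is a hypothetical helper, not the kernel's kcalloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace helper: allocate a zeroed array of n elements of
 * the given size, returning NULL if n * size would overflow size_t.
 */
static void *demo_zalloc_array(size_t n, size_t size)
{
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* product does not fit in size_t */

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}
```

With a single pre-multiplied size argument, the callee has no way to tell that the product wrapped around, which is why the two-argument form is the safer default for array allocations.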
|
|
|
path = kcalloc(depth + 1, sizeof(struct ext4_ext_path),
|
2013-08-17 10:05:14 +08:00
|
|
|
GFP_NOFS);
|
|
|
|
if (path == NULL) {
|
|
|
|
up_read(&ei->i_data_sem);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
path[0].p_hdr = ext_inode_hdr(inode);
|
|
|
|
ret = ext4_ext_check(inode, path[0].p_hdr, depth, 0);
|
|
|
|
if (ret)
|
|
|
|
goto out;
|
|
|
|
path[0].p_idx = EXT_FIRST_INDEX(path[0].p_hdr);
|
|
|
|
while (i >= 0) {
|
|
|
|
/*
|
|
|
|
* If this is a leaf block or we've reached the end of
|
|
|
|
* the index block, go up
|
|
|
|
*/
|
|
|
|
if ((i == depth) ||
|
|
|
|
path[i].p_idx > EXT_LAST_INDEX(path[i].p_hdr)) {
|
|
|
|
brelse(path[i].p_bh);
|
|
|
|
path[i].p_bh = NULL;
|
|
|
|
i--;
|
|
|
|
continue;
|
|
|
|
}
|
2021-09-08 20:08:49 +08:00
|
|
|
bh = read_extent_tree_block(inode, path[i].p_idx++,
|
2013-08-17 10:05:14 +08:00
|
|
|
depth - i - 1,
|
|
|
|
EXT4_EX_FORCE_CACHE);
|
|
|
|
if (IS_ERR(bh)) {
|
|
|
|
ret = PTR_ERR(bh);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
i++;
|
|
|
|
path[i].p_bh = bh;
|
|
|
|
path[i].p_hdr = ext_block_hdr(bh);
|
|
|
|
path[i].p_idx = EXT_FIRST_INDEX(path[i].p_hdr);
|
|
|
|
}
|
|
|
|
ext4_set_inode_state(inode, EXT4_STATE_EXT_PRECACHED);
|
|
|
|
out:
|
|
|
|
up_read(&ei->i_data_sem);
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2013-08-17 10:05:14 +08:00
|
|
|
return ret;
|
|
|
|
}
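The loop in ext4_ext_precache() walks the whole tree without recursion by keeping one cursor per level. A simplified userspace sketch of the same traversal is shown here; `struct demo_node`, `DEMO_MAX_DEPTH`, and `demo_count_leaves()` are hypothetical, all leaves are assumed to sit at the same depth, and buffer handling and locking are left out.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_DEPTH 8	/* arbitrary bound for this sketch */

/* Hypothetical tree node: index nodes have children, leaves have none. */
struct demo_node {
	int nchildren;
	struct demo_node **children;
};

/*
 * Visit every leaf of a tree whose leaves all sit at level "depth",
 * using a per-level cursor array instead of recursion. Returns the
 * number of leaves visited, or -1 if depth is out of range.
 */
static int demo_count_leaves(const struct demo_node *root, int depth)
{
	const struct demo_node *node[DEMO_MAX_DEPTH + 1];
	int pos[DEMO_MAX_DEPTH + 1];
	int i = 0, leaves = 0;

	if (depth < 0 || depth > DEMO_MAX_DEPTH)
		return -1;

	node[0] = root;
	pos[0] = 0;
	while (i >= 0) {
		if (i == depth) {
			leaves++;	/* reached a leaf: go back up */
			i--;
			continue;
		}
		if (pos[i] >= node[i]->nchildren) {
			i--;		/* index node exhausted: go back up */
			continue;
		}
		/* descend into the next unvisited child */
		node[i + 1] = node[i]->children[pos[i]++];
		i++;
		pos[i] = 0;
	}
	return leaves;
}
```

The cursor is advanced before descending, so when the walk returns to a level it resumes at the next sibling, just as the post-incremented p_idx does above.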
|
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
#ifdef EXT_DEBUG
|
|
|
|
static void ext4_ext_show_path(struct inode *inode, struct ext4_ext_path *path)
|
|
|
|
{
|
|
|
|
int k, l = path->p_depth;
|
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "path:");
|
2006-10-11 16:21:03 +08:00
|
|
|
for (k = 0; k <= l; k++, path++) {
|
|
|
|
if (path->p_idx) {
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, " %d->%llu",
|
2020-01-01 02:04:43 +08:00
|
|
|
le32_to_cpu(path->p_idx->ei_block),
|
|
|
|
ext4_idx_pblock(path->p_idx));
|
2006-10-11 16:21:03 +08:00
|
|
|
} else if (path->p_ext) {
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, " %d:[%d]%d:%llu ",
|
2006-10-11 16:21:03 +08:00
|
|
|
le32_to_cpu(path->p_ext->ee_block),
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_is_unwritten(path->p_ext),
|
2007-07-18 09:42:41 +08:00
|
|
|
ext4_ext_get_actual_len(path->p_ext),
|
2010-10-28 09:30:14 +08:00
|
|
|
ext4_ext_pblock(path->p_ext));
|
2006-10-11 16:21:03 +08:00
|
|
|
} else
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, " []");
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "\n");
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void ext4_ext_show_leaf(struct inode *inode, struct ext4_ext_path *path)
|
|
|
|
{
|
|
|
|
int depth = ext_depth(inode);
|
|
|
|
struct ext4_extent_header *eh;
|
|
|
|
struct ext4_extent *ex;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!path)
|
|
|
|
return;
|
|
|
|
|
|
|
|
eh = path[depth].p_hdr;
|
|
|
|
ex = EXT_FIRST_EXTENT(eh);
|
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "Displaying leaf extents\n");
|
2009-09-19 01:34:55 +08:00
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
for (i = 0; i < le16_to_cpu(eh->eh_entries); i++, ex++) {
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "%d:[%d]%d:%llu ", le32_to_cpu(ex->ee_block),
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_is_unwritten(ex),
|
2010-10-28 09:30:14 +08:00
|
|
|
ext4_ext_get_actual_len(ex), ext4_ext_pblock(ex));
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "\n");
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
2011-05-26 05:41:48 +08:00
|
|
|
|
|
|
|
static void ext4_ext_show_move(struct inode *inode, struct ext4_ext_path *path,
|
|
|
|
ext4_fsblk_t newblock, int level)
|
|
|
|
{
|
|
|
|
int depth = ext_depth(inode);
|
|
|
|
struct ext4_extent *ex;
|
|
|
|
|
|
|
|
if (depth != level) {
|
|
|
|
struct ext4_extent_idx *idx;
|
|
|
|
idx = path[level].p_idx;
|
|
|
|
while (idx <= EXT_MAX_INDEX(path[level].p_hdr)) {
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "%d: move %d:%llu in new index %llu\n",
|
|
|
|
level, le32_to_cpu(idx->ei_block),
|
|
|
|
ext4_idx_pblock(idx), newblock);
|
2011-05-26 05:41:48 +08:00
|
|
|
idx++;
|
|
|
|
}
|
|
|
|
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
ex = path[depth].p_ext;
|
|
|
|
while (ex <= EXT_MAX_EXTENT(path[depth].p_hdr)) {
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "move %d:%llu:[%d]%d in new leaf %llu\n",
|
2011-05-26 05:41:48 +08:00
|
|
|
le32_to_cpu(ex->ee_block),
|
|
|
|
ext4_ext_pblock(ex),
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_is_unwritten(ex),
|
2011-05-26 05:41:48 +08:00
|
|
|
ext4_ext_get_actual_len(ex),
|
|
|
|
newblock);
|
|
|
|
ex++;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
#else
|
2008-09-09 10:25:24 +08:00
|
|
|
#define ext4_ext_show_path(inode, path)
|
|
|
|
#define ext4_ext_show_leaf(inode, path)
|
2011-05-26 05:41:48 +08:00
|
|
|
#define ext4_ext_show_move(inode, path, newblock, level)
|
2006-10-11 16:21:03 +08:00
|
|
|
#endif
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* ext4_ext_binsearch_idx:
|
|
|
|
* binary search for the closest index of the given block
|
2007-07-18 21:19:09 +08:00
|
|
|
* the header must be checked before calling this
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
|
|
|
static void
|
2008-01-29 12:58:27 +08:00
|
|
|
ext4_ext_binsearch_idx(struct inode *inode,
|
|
|
|
struct ext4_ext_path *path, ext4_lblk_t block)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
|
|
|
struct ext4_extent_header *eh = path->p_hdr;
|
|
|
|
struct ext4_extent_idx *r, *l, *m;
|
|
|
|
|
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "binsearch for %u(idx): ", block);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
l = EXT_FIRST_INDEX(eh) + 1;
|
2007-07-18 21:09:15 +08:00
|
|
|
r = EXT_LAST_INDEX(eh);
|
2006-10-11 16:21:03 +08:00
|
|
|
while (l <= r) {
|
|
|
|
m = l + (r - l) / 2;
|
2021-09-03 14:27:46 +08:00
|
|
|
ext_debug(inode, "%p(%u):%p(%u):%p(%u) ", l,
|
|
|
|
le32_to_cpu(l->ei_block), m, le32_to_cpu(m->ei_block),
|
|
|
|
r, le32_to_cpu(r->ei_block));
|
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
if (block < le32_to_cpu(m->ei_block))
|
|
|
|
r = m - 1;
|
|
|
|
else
|
|
|
|
l = m + 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
path->p_idx = l - 1;
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, " -> %u->%lld ", le32_to_cpu(path->p_idx->ei_block),
|
2010-10-28 09:30:14 +08:00
|
|
|
ext4_idx_pblock(path->p_idx));
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
#ifdef CHECK_BINSEARCH
|
|
|
|
{
|
|
|
|
struct ext4_extent_idx *chix, *ix;
|
|
|
|
int k;
|
|
|
|
|
|
|
|
chix = ix = EXT_FIRST_INDEX(eh);
|
|
|
|
for (k = 0; k < le16_to_cpu(eh->eh_entries); k++, ix++) {
|
2020-01-01 02:04:43 +08:00
|
|
|
if (k != 0 && le32_to_cpu(ix->ei_block) <=
|
|
|
|
le32_to_cpu(ix[-1].ei_block)) {
|
2008-09-09 11:00:52 +08:00
|
|
|
printk(KERN_DEBUG "k=%d, ix=0x%p, "
|
|
|
|
"first=0x%p\n", k,
|
|
|
|
ix, EXT_FIRST_INDEX(eh));
|
|
|
|
printk(KERN_DEBUG "%u <= %u\n",
|
2006-10-11 16:21:03 +08:00
|
|
|
le32_to_cpu(ix->ei_block),
|
|
|
|
le32_to_cpu(ix[-1].ei_block));
|
|
|
|
}
|
|
|
|
BUG_ON(k && le32_to_cpu(ix->ei_block)
|
2007-05-25 01:04:54 +08:00
|
|
|
<= le32_to_cpu(ix[-1].ei_block));
|
2006-10-11 16:21:03 +08:00
|
|
|
if (block < le32_to_cpu(ix->ei_block))
|
|
|
|
break;
|
|
|
|
chix = ix;
|
|
|
|
}
|
|
|
|
BUG_ON(chix != path->p_idx);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* ext4_ext_binsearch:
|
|
|
|
* binary search for closest extent of the given block
|
2007-07-18 21:19:09 +08:00
|
|
|
* the header must be checked before calling this
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
|
|
|
static void
|
2008-01-29 12:58:27 +08:00
|
|
|
ext4_ext_binsearch(struct inode *inode,
|
|
|
|
struct ext4_ext_path *path, ext4_lblk_t block)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
|
|
|
struct ext4_extent_header *eh = path->p_hdr;
|
|
|
|
struct ext4_extent *r, *l, *m;
|
|
|
|
|
|
|
|
if (eh->eh_entries == 0) {
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* this leaf is empty:
|
|
|
|
* we get such a leaf in split/add case
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "binsearch for %u: ", block);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
l = EXT_FIRST_EXTENT(eh) + 1;
|
2007-07-18 21:09:15 +08:00
|
|
|
r = EXT_LAST_EXTENT(eh);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
while (l <= r) {
|
|
|
|
m = l + (r - l) / 2;
|
2021-09-03 14:27:46 +08:00
|
|
|
ext_debug(inode, "%p(%u):%p(%u):%p(%u) ", l,
|
|
|
|
le32_to_cpu(l->ee_block), m, le32_to_cpu(m->ee_block),
|
|
|
|
r, le32_to_cpu(r->ee_block));
|
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
if (block < le32_to_cpu(m->ee_block))
|
|
|
|
r = m - 1;
|
|
|
|
else
|
|
|
|
l = m + 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
path->p_ext = l - 1;
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, " -> %d:%llu:[%d]%d ",
|
2007-05-25 01:04:54 +08:00
|
|
|
le32_to_cpu(path->p_ext->ee_block),
|
2010-10-28 09:30:14 +08:00
|
|
|
ext4_ext_pblock(path->p_ext),
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_is_unwritten(path->p_ext),
|
2007-07-18 09:42:41 +08:00
|
|
|
ext4_ext_get_actual_len(path->p_ext));
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
#ifdef CHECK_BINSEARCH
|
|
|
|
{
|
|
|
|
struct ext4_extent *chex, *ex;
|
|
|
|
int k;
|
|
|
|
|
|
|
|
chex = ex = EXT_FIRST_EXTENT(eh);
|
|
|
|
for (k = 0; k < le16_to_cpu(eh->eh_entries); k++, ex++) {
|
|
|
|
BUG_ON(k && le32_to_cpu(ex->ee_block)
|
2007-05-25 01:04:54 +08:00
|
|
|
<= le32_to_cpu(ex[-1].ee_block));
|
2006-10-11 16:21:03 +08:00
|
|
|
if (block < le32_to_cpu(ex->ee_block))
|
|
|
|
break;
|
|
|
|
chex = ex;
|
|
|
|
}
|
|
|
|
BUG_ON(chex != path->p_ext);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
|
|
|
}
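Both binary searches above look for the last entry whose starting block is less than or equal to the target, and fall back to the first entry when the target lies before all of them. A userspace sketch of that search is given below under stated assumptions: `demo_binsearch()` is a hypothetical name, it works on a plain array of starting blocks, and it uses a half-open [l, r) range rather than the inclusive pointer bounds used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the last entry whose start block is <= block.
 * Assumes n >= 1 and starts[] sorted in ascending order. If block is
 * below starts[0], index 0 is still returned.
 */
static size_t demo_binsearch(const uint32_t *starts, size_t n, uint32_t block)
{
	size_t l = 1;	/* entry 0 is the fallback answer */
	size_t r = n;	/* exclusive upper bound */

	while (l < r) {
		size_t m = l + (r - l) / 2;

		if (block < starts[m])
			r = m;
		else
			l = m + 1;
	}
	return l - 1;
}
```

Starting the search at the second entry is what makes the first entry the default result, which matches the `l - 1` assignment at the end of both kernel functions.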
|
|
|
|
|
2020-04-27 09:34:37 +08:00
|
|
|
void ext4_ext_tree_init(handle_t *handle, struct inode *inode)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
|
|
|
struct ext4_extent_header *eh;
|
|
|
|
|
|
|
|
eh = ext_inode_hdr(inode);
|
|
|
|
eh->eh_depth = 0;
|
|
|
|
eh->eh_entries = 0;
|
|
|
|
eh->eh_magic = EXT4_EXT_MAGIC;
|
2009-08-28 22:40:33 +08:00
|
|
|
eh->eh_max = cpu_to_le16(ext4_ext_space_root(inode, 0));
|
2021-05-07 02:56:54 +08:00
|
|
|
eh->eh_generation = 0;
|
2006-10-11 16:21:03 +08:00
|
|
|
ext4_mark_inode_dirty(handle, inode);
|
|
|
|
}
|
|
|
|
|
|
|
|
struct ext4_ext_path *
|
2014-09-02 02:43:09 +08:00
|
|
|
ext4_find_extent(struct inode *inode, ext4_lblk_t block,
|
|
|
|
struct ext4_ext_path **orig_path, int flags)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
|
|
|
struct ext4_extent_header *eh;
|
|
|
|
struct buffer_head *bh;
|
ext4: teach ext4_ext_find_extent() to free path on error
Right now, there are places where it is all too easy to leak memory
on an error path, via a usage like this:
struct ext4_ext_path *path = NULL;
while (...) {
...
path = ext4_ext_find_extent(inode, block, path, 0);
if (IS_ERR(path)) {
/* oops, if path was non-NULL before the call to
ext4_ext_find_extent, we've leaked it! :-( */
...
return PTR_ERR(path);
}
...
}
Unfortunately, there are some code paths where we are doing the following
instead:
path = ext4_ext_find_extent(inode, block, orig_path, 0);
and where it's important that we _not_ free orig_path in the case
where ext4_ext_find_extent() returns an error.
So change the function signature of ext4_ext_find_extent() so that it
takes a struct ext4_ext_path ** for its third argument, and by
default, on an error, it will free the struct ext4_ext_path, and then
zero out the struct ext4_ext_path * pointer. In order to avoid
causing problems, we add a flag EXT4_EX_NOFREE_ON_ERR which causes
ext4_ext_find_extent() to use the original behavior of forcing the
caller to deal with freeing the original path pointer on the error
case.
The goal is to get rid of EXT4_EX_NOFREE_ON_ERR entirely, but this
allows for a gentle transition and makes the patches easier to verify.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-02 02:34:09 +08:00
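The free-on-error convention described in this commit message can be sketched in userspace C. The names `struct demo_path` and `demo_find()` are hypothetical and the `fail` argument merely simulates an error; the point is only the double-pointer handling.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_path {
	int depth;
};

/*
 * Hypothetical lookup that reuses *pathp when possible. On failure it
 * frees the caller's buffer and clears the caller's pointer, so nothing
 * leaks and nothing dangles. Returns 0 on success, -1 on failure.
 */
static int demo_find(struct demo_path **pathp, int depth, int fail)
{
	struct demo_path *path = *pathp;

	if (!path) {
		path = calloc(1, sizeof(*path));
		if (!path)
			return -1;
	}
	if (fail) {
		free(path);
		*pathp = NULL;	/* the caller must not reuse the old pointer */
		return -1;
	}
	path->depth = depth;
	*pathp = path;
	return 0;
}
```

Because the callee owns the cleanup, a caller that loops over several lookups can return immediately on error without tracking whether the path was allocated by an earlier iteration.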
|
|
|
struct ext4_ext_path *path = orig_path ? *orig_path : NULL;
|
|
|
|
short int depth, i, ppos = 0;
|
2013-01-13 05:19:36 +08:00
|
|
|
int ret;
|
2020-05-08 01:50:28 +08:00
|
|
|
gfp_t gfp_flags = GFP_NOFS;
|
|
|
|
|
|
|
|
if (flags & EXT4_EX_NOFAIL)
|
|
|
|
gfp_flags |= __GFP_NOFAIL;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
eh = ext_inode_hdr(inode);
|
2007-07-18 21:19:09 +08:00
|
|
|
depth = ext_depth(inode);
|
2018-06-15 00:55:10 +08:00
|
|
|
if (depth < 0 || depth > EXT4_MAX_EXTENT_DEPTH) {
|
|
|
|
EXT4_ERROR_INODE(inode, "inode has invalid extent depth: %d",
|
|
|
|
depth);
|
|
|
|
ret = -EFSCORRUPTED;
|
|
|
|
goto err;
|
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2014-09-02 02:40:09 +08:00
|
|
|
if (path) {
|
2014-09-02 02:38:09 +08:00
|
|
|
ext4_ext_drop_refs(path);
|
2014-09-02 02:40:09 +08:00
|
|
|
if (depth > path[0].p_maxdepth) {
|
|
|
|
kfree(path);
|
|
|
|
*orig_path = path = NULL;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (!path) {
|
2014-09-02 02:38:09 +08:00
|
|
|
/* account possible depth increase */
|
treewide: kzalloc() -> kcalloc()
The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:
kzalloc(a * b, gfp)
with:
kcalloc(a, b, gfp)
as well as handling cases of:
kzalloc(a * b * c, gfp)
with:
kzalloc(array3_size(a, b, c), gfp)
as it's slightly less ugly than:
kzalloc_array(array_size(a, b), c, gfp)
This does, however, attempt to ignore constant size factors like:
kzalloc(4 * 1024, gfp)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@
(
kzalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kzalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@
(
kzalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)
// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@
- kzalloc
+ kcalloc
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
kzalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
kzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@
(
kzalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@
(
kzalloc(sizeof(THING) * C2, ...)
|
kzalloc(sizeof(TYPE) * C2, ...)
|
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * E2
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- E1 * E2
+ E1, E2
, ...)
)
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-13 05:03:40 +08:00
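The conversion described in the commit message above is useful because a two-argument allocator can check the multiplication for overflow before allocating, which an open-coded `a * b` cannot. The following is a minimal userspace sketch of that idea, not kernel code: `checked_calloc` and `unchecked_size` are hypothetical names chosen for illustration, and the standard `calloc` stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for a two-factor allocator such as
 * kcalloc(n, size, gfp): returns zeroed memory for n elements of
 * `size` bytes, or NULL when n * size would not fit in a size_t.
 */
static void *checked_calloc(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* the product would wrap around */
	return calloc(n, size);
}

/*
 * The open-coded form that the patch replaces: the product is computed
 * before the allocator sees it, so a wrapped (too small) size goes
 * unnoticed.
 */
static size_t unchecked_size(size_t n, size_t size)
{
	return n * size;	/* unsigned arithmetic wraps silently */
}
```

With `n = SIZE_MAX / 2 + 1` and `size = 4`, `unchecked_size()` wraps to 0 while `checked_calloc()` refuses the request, which is the failure mode the two-argument form is meant to catch.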
|
|
|
path = kcalloc(depth + 2, sizeof(struct ext4_ext_path),
|
2020-05-08 01:50:28 +08:00
|
|
|
gfp_flags);
|
2014-09-01 03:03:14 +08:00
|
|
|
if (unlikely(!path))
|
2006-10-11 16:21:03 +08:00
|
|
|
return ERR_PTR(-ENOMEM);
|
2014-09-02 02:40:09 +08:00
|
|
|
path[0].p_maxdepth = depth + 1;
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
|
|
|
path[0].p_hdr = eh;
|
2008-07-12 07:27:31 +08:00
|
|
|
path[0].p_bh = NULL;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2007-07-18 21:19:09 +08:00
|
|
|
i = depth;
|
ext4: fix extent_status fragmentation for plain files
Extents are cached in read_extent_tree_block(); as a result, extents
are not cached for inodes with depth == 0 when we try to find the
extent using ext4_find_extent(). The result of the lookup is cached
in ext4_map_blocks() but is only a subset of the extent on disk. As a
result, the contents of extents status cache can get very badly
fragmented for certain workloads, such as a random 4k read workload.
File size of /mnt/test is 33554432 (8192 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 8191: 40960.. 49151: 8192: last,eof
$ perf record -e 'ext4:ext4_es_*' /root/bin/fio --name=t --direct=0 --rw=randread --bs=4k --filesize=32M --size=32M --filename=/mnt/test
$ perf script | grep ext4_es_insert_extent | head -n 10
fio 131 [000] 13.975421: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [494/1) mapped 41454 status W
fio 131 [000] 13.975939: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6064/1) mapped 47024 status W
fio 131 [000] 13.976467: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6907/1) mapped 47867 status W
fio 131 [000] 13.976937: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3850/1) mapped 44810 status W
fio 131 [000] 13.977440: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3292/1) mapped 44252 status W
fio 131 [000] 13.977931: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6882/1) mapped 47842 status W
fio 131 [000] 13.978376: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3117/1) mapped 44077 status W
fio 131 [000] 13.978957: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [2896/1) mapped 43856 status W
fio 131 [000] 13.979474: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [7479/1) mapped 48439 status W
Fix this by caching the extents for inodes with depth == 0 in
ext4_find_extent().
[ Renamed ext4_es_cache_extents() to ext4_cache_extents() since this
newly added function is not in extents_cache.c, and to avoid
potential visual confusion with ext4_es_cache_extent(). -TYT ]
Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com>
Link: https://lore.kernel.org/r/20191106122502.19986-1-dmonakhov@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-06 20:25:02 +08:00
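The fragmentation described in the commit message above can be illustrated with a toy model. This sketch assumes a cache that keeps one flag per logical block and treats every maximal run of cached blocks as one cache entry; it is not the kernel's extent status tree, and all names in it (`toy_cache`, `entries_per_lookup`, `entries_whole_extent`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_FILE_BLOCKS 8192	/* file size in blocks, as in the commit message */

/* Toy cache: one flag per logical block of the file. */
struct toy_cache {
	bool cached[TOY_FILE_BLOCKS];
};

/* Mark blocks [start, start + len) as cached. */
static void toy_cache_range(struct toy_cache *c, size_t start, size_t len)
{
	assert(start <= TOY_FILE_BLOCKS && len <= TOY_FILE_BLOCKS - start);
	for (size_t i = start; i < start + len; i++)
		c->cached[i] = true;
}

/* Count maximal runs of cached blocks; each run models one cache entry. */
static size_t toy_cache_entries(const struct toy_cache *c)
{
	size_t runs = 0;

	for (size_t i = 0; i < TOY_FILE_BLOCKS; i++)
		if (c->cached[i] && (i == 0 || !c->cached[i - 1]))
			runs++;
	return runs;
}

/* Cache only the single block that each read looked up. */
static size_t entries_per_lookup(const size_t *blocks, size_t n)
{
	struct toy_cache c;

	memset(&c, 0, sizeof(c));
	for (size_t i = 0; i < n; i++)
		toy_cache_range(&c, blocks[i], 1);
	return toy_cache_entries(&c);
}

/* Cache the whole on-disk extent [0, TOY_FILE_BLOCKS) on every lookup. */
static size_t entries_whole_extent(const size_t *blocks, size_t n)
{
	struct toy_cache c;

	(void)blocks;	/* the looked-up block does not matter in this model */
	memset(&c, 0, sizeof(c));
	for (size_t i = 0; i < n; i++)
		toy_cache_range(&c, 0, TOY_FILE_BLOCKS);
	return toy_cache_entries(&c);
}
```

Fed the nine logical block numbers from the trace above, the per-lookup variant ends up with nine separate entries, while caching the whole extent leaves a single entry.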
|
|
|
if (!(flags & EXT4_EX_NOCACHE) && depth == 0)
|
|
|
|
ext4_cache_extents(inode, eh);
|
2006-10-11 16:21:03 +08:00
|
|
|
/* walk through the tree */
|
|
|
|
while (i) {
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "depth %d: num %d, max %d\n",
|
2006-10-11 16:21:03 +08:00
|
|
|
ppos, le16_to_cpu(eh->eh_entries), le16_to_cpu(eh->eh_max));
|
2007-07-18 21:19:09 +08:00
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
ext4_ext_binsearch_idx(inode, path + ppos, block);
|
2010-10-28 09:30:14 +08:00
|
|
|
path[ppos].p_block = ext4_idx_pblock(path[ppos].p_idx);
|
2006-10-11 16:21:03 +08:00
|
|
|
path[ppos].p_depth = i;
|
|
|
|
path[ppos].p_ext = NULL;
|
|
|
|
|
2021-09-08 20:08:49 +08:00
|
|
|
bh = read_extent_tree_block(inode, path[ppos].p_idx, --i, flags);
|
2015-08-12 18:29:44 +08:00
|
|
|
if (IS_ERR(bh)) {
|
2013-08-17 09:20:41 +08:00
|
|
|
ret = PTR_ERR(bh);
|
2006-10-11 16:21:03 +08:00
|
|
|
goto err;
|
2013-01-13 05:19:36 +08:00
|
|
|
}
|
2013-08-17 09:20:41 +08:00
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
eh = ext_block_hdr(bh);
|
|
|
|
ppos++;
|
|
|
|
path[ppos].p_bh = bh;
|
|
|
|
path[ppos].p_hdr = eh;
|
|
|
|
}
|
|
|
|
|
|
|
|
path[ppos].p_depth = i;
|
|
|
|
path[ppos].p_ext = NULL;
|
|
|
|
path[ppos].p_idx = NULL;
|
|
|
|
|
|
|
|
/* find extent */
|
|
|
|
ext4_ext_binsearch(inode, path + ppos, block);
|
2008-07-12 07:27:31 +08:00
|
|
|
/* if not an empty leaf */
|
|
|
|
if (path[ppos].p_ext)
|
2010-10-28 09:30:14 +08:00
|
|
|
path[ppos].p_block = ext4_ext_pblock(path[ppos].p_ext);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
ext4_ext_show_path(inode, path);
|
|
|
|
|
|
|
|
return path;
|
|
|
|
|
|
|
|
err:
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2014-09-02 02:37:09 +08:00
|
|
|
if (orig_path)
|
|
|
|
*orig_path = NULL;
|
2013-01-13 05:19:36 +08:00
|
|
|
return ERR_PTR(ret);
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* ext4_ext_insert_index:
|
|
|
|
* insert new index [@logical;@ptr] into the block at @curp;
|
|
|
|
* check where to insert: before @curp or after @curp
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
2010-10-28 09:30:14 +08:00
|
|
|
static int ext4_ext_insert_index(handle_t *handle, struct inode *inode,
|
|
|
|
struct ext4_ext_path *curp,
|
|
|
|
int logical, ext4_fsblk_t ptr)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
|
|
|
struct ext4_extent_idx *ix;
|
|
|
|
int len, err;
|
|
|
|
|
2006-12-07 12:41:33 +08:00
|
|
|
err = ext4_ext_get_access(handle, inode, curp);
|
|
|
|
if (err)
|
2006-10-11 16:21:03 +08:00
|
|
|
return err;
|
|
|
|
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(logical == le32_to_cpu(curp->p_idx->ei_block))) {
|
|
|
|
EXT4_ERROR_INODE(inode,
|
|
|
|
"logical %d == ei_block %d!",
|
|
|
|
logical, le32_to_cpu(curp->p_idx->ei_block));
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2011-07-18 11:43:42 +08:00
|
|
|
|
|
|
|
if (unlikely(le16_to_cpu(curp->p_hdr->eh_entries)
|
|
|
|
>= le16_to_cpu(curp->p_hdr->eh_max))) {
|
|
|
|
EXT4_ERROR_INODE(inode,
|
|
|
|
"eh_entries %d >= eh_max %d!",
|
|
|
|
le16_to_cpu(curp->p_hdr->eh_entries),
|
|
|
|
le16_to_cpu(curp->p_hdr->eh_max));
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2011-07-18 11:43:42 +08:00
|
|
|
}
|
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
if (logical > le32_to_cpu(curp->p_idx->ei_block)) {
|
|
|
|
/* insert after */
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "insert new index %d after: %llu\n",
|
|
|
|
logical, ptr);
|
2006-10-11 16:21:03 +08:00
|
|
|
ix = curp->p_idx + 1;
|
|
|
|
} else {
|
|
|
|
/* insert before */
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "insert new index %d before: %llu\n",
|
|
|
|
logical, ptr);
|
2006-10-11 16:21:03 +08:00
|
|
|
ix = curp->p_idx;
|
|
|
|
}
|
|
|
|
|
2011-10-27 23:52:18 +08:00
|
|
|
len = EXT_LAST_INDEX(curp->p_hdr) - ix + 1;
|
|
|
|
BUG_ON(len < 0);
|
|
|
|
if (len > 0) {
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "insert new index %d: "
|
2011-10-27 23:52:18 +08:00
|
|
|
"move %d indices from 0x%p to 0x%p\n",
|
|
|
|
logical, len, ix, ix + 1);
|
|
|
|
memmove(ix + 1, ix, len * sizeof(struct ext4_extent_idx));
|
|
|
|
}
|
|
|
|
|
2011-10-17 22:13:46 +08:00
|
|
|
if (unlikely(ix > EXT_MAX_INDEX(curp->p_hdr))) {
|
|
|
|
EXT4_ERROR_INODE(inode, "ix > EXT_MAX_INDEX!");
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2011-10-17 22:13:46 +08:00
|
|
|
}
|
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
ix->ei_block = cpu_to_le32(logical);
|
2006-10-11 16:21:05 +08:00
|
|
|
ext4_idx_store_pblock(ix, ptr);
|
2008-04-17 22:38:59 +08:00
|
|
|
le16_add_cpu(&curp->p_hdr->eh_entries, 1);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(ix > EXT_LAST_INDEX(curp->p_hdr))) {
|
|
|
|
EXT4_ERROR_INODE(inode, "ix > EXT_LAST_INDEX!");
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
err = ext4_ext_dirty(handle, inode, curp);
|
|
|
|
ext4_std_error(inode->i_sb, err);
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
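The core of ext4_ext_insert_index() above is a sorted-array insertion: the entries from the insertion point onward are shifted up by one slot with memmove(), and the new entry is written into the gap. The following is a minimal userspace sketch of that step only. `struct toy_idx` and `toy_insert` are hypothetical names, and the sketch leaves out journaling, on-disk endianness, and the physical block pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy index entry; the real struct ext4_extent_idx carries more fields. */
struct toy_idx {
	unsigned int block;	/* first logical block covered by this entry */
};

/*
 * Insert `block` at position `pos` in entries[0..*count), shifting the
 * tail up by one slot. Returns 0 on success, or -1 when the array is
 * full or `pos` is out of range.
 */
static int toy_insert(struct toy_idx *entries, int *count, int max,
		      int pos, unsigned int block)
{
	int len;

	if (*count >= max || pos < 0 || pos > *count)
		return -1;

	len = *count - pos;	/* number of entries to move */
	if (len > 0)
		memmove(&entries[pos + 1], &entries[pos],
			(size_t)len * sizeof(entries[0]));

	entries[pos].block = block;
	(*count)++;
	return 0;
}
```

memmove() is used rather than memcpy() because the source and destination ranges overlap whenever more than one entry is shifted.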
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* ext4_ext_split:
|
|
|
|
* inserts new subtree into the path, using free index entry
|
|
|
|
* at depth @at:
|
|
|
|
* - allocates all needed blocks (new leaf and all intermediate index blocks)
|
|
|
|
* - makes decision where to split
|
|
|
|
* - moves remaining extents and index entries (to the right of the split point)
|
|
|
|
* into the newly allocated blocks
|
|
|
|
* - initializes subtree
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
|
|
|
static int ext4_ext_split(handle_t *handle, struct inode *inode,
|
2011-05-25 19:41:26 +08:00
|
|
|
unsigned int flags,
|
|
|
|
struct ext4_ext_path *path,
|
|
|
|
struct ext4_extent *newext, int at)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
|
|
|
struct buffer_head *bh = NULL;
|
|
|
|
int depth = ext_depth(inode);
|
|
|
|
struct ext4_extent_header *neh;
|
|
|
|
struct ext4_extent_idx *fidx;
|
|
|
|
int i = at, k, m, a;
|
2006-10-11 16:21:05 +08:00
|
|
|
ext4_fsblk_t newblock, oldblock;
|
2006-10-11 16:21:03 +08:00
|
|
|
__le32 border;
|
2006-10-11 16:21:05 +08:00
|
|
|
ext4_fsblk_t *ablocks = NULL; /* array of allocated blocks */
|
2020-05-08 01:50:28 +08:00
|
|
|
gfp_t gfp_flags = GFP_NOFS;
|
2006-10-11 16:21:03 +08:00
|
|
|
int err = 0;
|
2019-05-11 07:28:06 +08:00
|
|
|
size_t ext_size = 0;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2020-05-08 01:50:28 +08:00
|
|
|
if (flags & EXT4_EX_NOFAIL)
|
|
|
|
gfp_flags |= __GFP_NOFAIL;
|
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
/* make decision: where to split? */
|
2006-10-11 16:21:07 +08:00
|
|
|
/* FIXME: for now the decision is the simplest one: split at the current extent */
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2006-10-11 16:21:07 +08:00
|
|
|
/* if current leaf will be split, then we should use
|
2006-10-11 16:21:03 +08:00
|
|
|
* border from split point */
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(path[depth].p_ext > EXT_MAX_EXTENT(path[depth].p_hdr))) {
|
|
|
|
EXT4_ERROR_INODE(inode, "p_ext > EXT_MAX_EXTENT!");
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
if (path[depth].p_ext != EXT_MAX_EXTENT(path[depth].p_hdr)) {
|
|
|
|
border = path[depth].p_ext[1].ee_block;
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "leaf will be split."
|
2006-10-11 16:21:03 +08:00
|
|
|
" next leaf starts at %d\n",
|
2007-05-25 01:04:54 +08:00
|
|
|
le32_to_cpu(border));
|
2006-10-11 16:21:03 +08:00
|
|
|
} else {
|
|
|
|
border = newext->ee_block;
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "leaf will be added."
|
2006-10-11 16:21:03 +08:00
|
|
|
" next leaf starts at %d\n",
|
2007-05-25 01:04:54 +08:00
|
|
|
le32_to_cpu(border));
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* If an error occurs, then we stop processing
|
|
|
|
* and mark filesystem read-only. index won't
|
2006-10-11 16:21:03 +08:00
|
|
|
* be inserted and tree will be in consistent
|
2006-10-11 16:21:07 +08:00
|
|
|
* state. Next mount will repair buffers too.
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* Get array to track all allocated blocks.
|
|
|
|
* We need this so that the blocks can be freed
|
|
|
|
* if an error occurs.
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
2020-05-08 01:50:28 +08:00
|
|
|
ablocks = kcalloc(depth, sizeof(ext4_fsblk_t), gfp_flags);
|
2006-10-11 16:21:03 +08:00
|
|
|
if (!ablocks)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
/* allocate all needed blocks */
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "allocate %d blocks for indexes/leaf\n", depth - at);
|
2006-10-11 16:21:03 +08:00
|
|
|
for (a = 0; a < depth - at; a++) {
|
2008-07-12 07:27:31 +08:00
|
|
|
newblock = ext4_ext_new_meta_block(handle, inode, path,
|
2011-05-25 19:41:26 +08:00
|
|
|
newext, &err, flags);
|
2006-10-11 16:21:03 +08:00
|
|
|
if (newblock == 0)
|
|
|
|
goto cleanup;
|
|
|
|
ablocks[a] = newblock;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* initialize new leaf */
|
|
|
|
newblock = ablocks[--a];
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(newblock == 0)) {
|
|
|
|
EXT4_ERROR_INODE(inode, "newblock == 0!");
|
2015-10-18 04:16:04 +08:00
|
|
|
err = -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
goto cleanup;
|
|
|
|
}
|
2015-07-02 13:34:07 +08:00
|
|
|
bh = sb_getblk_gfp(inode->i_sb, newblock, __GFP_MOVABLE | GFP_NOFS);
|
2013-01-13 05:28:47 +08:00
|
|
|
if (unlikely(!bh)) {
|
2013-01-13 05:19:36 +08:00
|
|
|
err = -ENOMEM;
|
2006-10-11 16:21:03 +08:00
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
lock_buffer(bh);
|
|
|
|
|
2021-08-16 17:57:04 +08:00
|
|
|
err = ext4_journal_get_create_access(handle, inode->i_sb, bh,
|
|
|
|
EXT4_JTR_NONE);
|
2006-12-07 12:41:33 +08:00
|
|
|
if (err)
|
2006-10-11 16:21:03 +08:00
|
|
|
goto cleanup;
|
|
|
|
|
|
|
|
neh = ext_block_hdr(bh);
|
|
|
|
neh->eh_entries = 0;
|
2009-08-28 22:40:33 +08:00
|
|
|
neh->eh_max = cpu_to_le16(ext4_ext_space_block(inode, 0));
|
2006-10-11 16:21:03 +08:00
|
|
|
neh->eh_magic = EXT4_EXT_MAGIC;
|
|
|
|
neh->eh_depth = 0;
|
2021-05-07 02:56:54 +08:00
|
|
|
neh->eh_generation = 0;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2006-10-11 16:21:07 +08:00
|
|
|
/* move remainder of path[depth] to the new leaf */
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(path[depth].p_hdr->eh_entries !=
|
|
|
|
path[depth].p_hdr->eh_max)) {
|
|
|
|
EXT4_ERROR_INODE(inode, "eh_entries %d != eh_max %d!",
|
|
|
|
path[depth].p_hdr->eh_entries,
|
|
|
|
path[depth].p_hdr->eh_max);
|
2015-10-18 04:16:04 +08:00
|
|
|
err = -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
goto cleanup;
|
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
/* start copy from next extent */
|
2011-05-26 05:41:48 +08:00
|
|
|
m = EXT_MAX_EXTENT(path[depth].p_hdr) - path[depth].p_ext++;
|
|
|
|
ext4_ext_show_move(inode, path, newblock, depth);
|
2006-10-11 16:21:03 +08:00
|
|
|
if (m) {
|
2011-05-26 05:41:48 +08:00
|
|
|
struct ext4_extent *ex;
|
|
|
|
ex = EXT_FIRST_EXTENT(neh);
|
|
|
|
memmove(ex, path[depth].p_ext, sizeof(struct ext4_extent) * m);
|
2008-04-17 22:38:59 +08:00
|
|
|
le16_add_cpu(&neh->eh_entries, m);
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
|
|
|
|
2019-05-11 07:28:06 +08:00
|
|
|
/* zero out unused area in the extent block */
|
|
|
|
ext_size = sizeof(struct ext4_extent_header) +
|
|
|
|
sizeof(struct ext4_extent) * le16_to_cpu(neh->eh_entries);
|
|
|
|
memset(bh->b_data + ext_size, 0, inode->i_sb->s_blocksize - ext_size);
|
2012-04-30 06:37:10 +08:00
|
|
|
ext4_extent_block_csum_set(inode, neh);
|
2006-10-11 16:21:03 +08:00
|
|
|
set_buffer_uptodate(bh);
|
|
|
|
unlock_buffer(bh);
|
|
|
|
|
2009-01-07 13:06:22 +08:00
|
|
|
err = ext4_handle_dirty_metadata(handle, inode, bh);
|
2006-12-07 12:41:33 +08:00
|
|
|
if (err)
|
2006-10-11 16:21:03 +08:00
|
|
|
goto cleanup;
|
|
|
|
brelse(bh);
|
|
|
|
bh = NULL;
|
|
|
|
|
|
|
|
/* correct old leaf */
|
|
|
|
if (m) {
|
2006-12-07 12:41:33 +08:00
|
|
|
err = ext4_ext_get_access(handle, inode, path + depth);
|
|
|
|
if (err)
|
2006-10-11 16:21:03 +08:00
|
|
|
goto cleanup;
|
2008-04-17 22:38:59 +08:00
|
|
|
le16_add_cpu(&path[depth].p_hdr->eh_entries, -m);
|
2006-12-07 12:41:33 +08:00
|
|
|
err = ext4_ext_dirty(handle, inode, path + depth);
|
|
|
|
if (err)
|
2006-10-11 16:21:03 +08:00
|
|
|
goto cleanup;
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
/* create intermediate indexes */
|
|
|
|
k = depth - at - 1;
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(k < 0)) {
|
|
|
|
EXT4_ERROR_INODE(inode, "k %d < 0!", k);
|
2015-10-18 04:16:04 +08:00
|
|
|
err = -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
goto cleanup;
|
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
if (k)
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "create %d intermediate indices\n", k);
|
2006-10-11 16:21:03 +08:00
|
|
|
/* insert new index into current index block */
|
|
|
|
/* current depth stored in i var */
|
|
|
|
i = depth - 1;
|
|
|
|
while (k--) {
|
|
|
|
oldblock = newblock;
|
|
|
|
newblock = ablocks[--a];
|
2008-01-29 12:58:27 +08:00
|
|
|
bh = sb_getblk(inode->i_sb, newblock);
|
2013-01-13 05:28:47 +08:00
|
|
|
if (unlikely(!bh)) {
|
2013-01-13 05:19:36 +08:00
|
|
|
err = -ENOMEM;
|
2006-10-11 16:21:03 +08:00
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
lock_buffer(bh);
|
|
|
|
|
2021-08-16 17:57:04 +08:00
|
|
|
err = ext4_journal_get_create_access(handle, inode->i_sb, bh,
|
|
|
|
EXT4_JTR_NONE);
|
2006-12-07 12:41:33 +08:00
|
|
|
if (err)
|
2006-10-11 16:21:03 +08:00
|
|
|
goto cleanup;
|
|
|
|
|
|
|
|
neh = ext_block_hdr(bh);
|
|
|
|
neh->eh_entries = cpu_to_le16(1);
|
|
|
|
neh->eh_magic = EXT4_EXT_MAGIC;
|
2009-08-28 22:40:33 +08:00
|
|
|
neh->eh_max = cpu_to_le16(ext4_ext_space_block_idx(inode, 0));
|
2006-10-11 16:21:03 +08:00
|
|
|
neh->eh_depth = cpu_to_le16(depth - i);
|
2021-05-07 02:56:54 +08:00
|
|
|
neh->eh_generation = 0;
|
2006-10-11 16:21:03 +08:00
|
|
|
fidx = EXT_FIRST_INDEX(neh);
|
|
|
|
fidx->ei_block = border;
|
2006-10-11 16:21:05 +08:00
|
|
|
ext4_idx_store_pblock(fidx, oldblock);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "int.index at %d (block %llu): %u -> %llu\n",
|
2008-01-29 12:58:27 +08:00
|
|
|
i, newblock, le32_to_cpu(border), oldblock);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2011-05-26 05:41:48 +08:00
|
|
|
/* move remainder of path[i] to the new index block */
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(EXT_MAX_INDEX(path[i].p_hdr) !=
|
|
|
|
EXT_LAST_INDEX(path[i].p_hdr))) {
|
|
|
|
EXT4_ERROR_INODE(inode,
|
|
|
|
"EXT_MAX_INDEX != EXT_LAST_INDEX ee_block %d!",
|
|
|
|
le32_to_cpu(path[i].p_ext->ee_block));
|
2015-10-18 04:16:04 +08:00
|
|
|
err = -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
goto cleanup;
|
|
|
|
}
|
2011-05-26 05:41:48 +08:00
|
|
|
/* start copy indexes */
|
|
|
|
m = EXT_MAX_INDEX(path[i].p_hdr) - path[i].p_idx++;
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "cur 0x%p, last 0x%p\n", path[i].p_idx,
|
2011-05-26 05:41:48 +08:00
|
|
|
EXT_MAX_INDEX(path[i].p_hdr));
|
|
|
|
ext4_ext_show_move(inode, path, newblock, i);
|
2006-10-11 16:21:03 +08:00
|
|
|
if (m) {
|
2011-05-26 05:41:48 +08:00
|
|
|
memmove(++fidx, path[i].p_idx,
|
2006-10-11 16:21:03 +08:00
|
|
|
sizeof(struct ext4_extent_idx) * m);
|
2008-04-17 22:38:59 +08:00
|
|
|
le16_add_cpu(&neh->eh_entries, m);
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
2019-05-11 07:28:06 +08:00
|
|
|
/* zero out unused area in the extent block */
|
|
|
|
ext_size = sizeof(struct ext4_extent_header) +
|
|
|
|
(sizeof(struct ext4_extent) * le16_to_cpu(neh->eh_entries));
|
|
|
|
memset(bh->b_data + ext_size, 0,
|
|
|
|
inode->i_sb->s_blocksize - ext_size);
|
2012-04-30 06:37:10 +08:00
|
|
|
ext4_extent_block_csum_set(inode, neh);
|
2006-10-11 16:21:03 +08:00
|
|
|
set_buffer_uptodate(bh);
|
|
|
|
unlock_buffer(bh);
|
|
|
|
|
2009-01-07 13:06:22 +08:00
|
|
|
err = ext4_handle_dirty_metadata(handle, inode, bh);
|
2006-12-07 12:41:33 +08:00
|
|
|
if (err)
|
2006-10-11 16:21:03 +08:00
|
|
|
goto cleanup;
|
|
|
|
brelse(bh);
|
|
|
|
bh = NULL;
|
|
|
|
|
|
|
|
/* correct old index */
|
|
|
|
if (m) {
|
|
|
|
err = ext4_ext_get_access(handle, inode, path + i);
|
|
|
|
if (err)
|
|
|
|
goto cleanup;
|
2008-04-17 22:38:59 +08:00
|
|
|
le16_add_cpu(&path[i].p_hdr->eh_entries, -m);
|
2006-10-11 16:21:03 +08:00
|
|
|
err = ext4_ext_dirty(handle, inode, path + i);
|
|
|
|
if (err)
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
i--;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* insert new index */
|
|
|
|
err = ext4_ext_insert_index(handle, inode, path + at,
|
|
|
|
le32_to_cpu(border), newblock);
|
|
|
|
|
|
|
|
cleanup:
|
|
|
|
if (bh) {
|
|
|
|
if (buffer_locked(bh))
|
|
|
|
unlock_buffer(bh);
|
|
|
|
brelse(bh);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (err) {
|
|
|
|
/* free all allocated blocks in error case */
|
|
|
|
for (i = 0; i < depth; i++) {
|
|
|
|
if (!ablocks[i])
|
|
|
|
continue;
|
2011-02-22 10:01:42 +08:00
|
|
|
ext4_free_blocks(handle, inode, NULL, ablocks[i], 1,
|
2009-11-23 20:17:05 +08:00
|
|
|
EXT4_FREE_BLOCKS_METADATA);
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
kfree(ablocks);
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* ext4_ext_grow_indepth:
|
|
|
|
* implements tree growing procedure:
|
|
|
|
* - allocates new block
|
|
|
|
* - moves top-level data (index block or leaf) into the new block
|
|
|
|
* - initializes new top-level, creating index that points to the
|
|
|
|
* just created block
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
|
|
|
static int ext4_ext_grow_indepth(handle_t *handle, struct inode *inode,
|
2014-10-02 10:57:09 +08:00
|
|
|
unsigned int flags)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
|
|
|
struct ext4_extent_header *neh;
|
|
|
|
struct buffer_head *bh;
|
2014-10-02 10:57:09 +08:00
|
|
|
ext4_fsblk_t newblock, goal = 0;
|
|
|
|
struct ext4_super_block *es = EXT4_SB(inode->i_sb)->s_es;
|
2006-10-11 16:21:03 +08:00
|
|
|
int err = 0;
|
2019-05-11 07:28:06 +08:00
|
|
|
size_t ext_size = 0;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2014-10-02 10:57:09 +08:00
|
|
|
/* Try to prepend new index to old one */
|
|
|
|
if (ext_depth(inode))
|
|
|
|
goal = ext4_idx_pblock(EXT_FIRST_INDEX(ext_inode_hdr(inode)));
|
|
|
|
if (goal > le32_to_cpu(es->s_first_data_block)) {
|
|
|
|
flags |= EXT4_MB_HINT_TRY_GOAL;
|
|
|
|
goal--;
|
|
|
|
} else
|
|
|
|
goal = ext4_inode_to_goal_block(inode);
|
|
|
|
newblock = ext4_new_meta_blocks(handle, inode, goal, flags,
|
|
|
|
NULL, &err);
|
2006-10-11 16:21:03 +08:00
|
|
|
if (newblock == 0)
|
|
|
|
return err;
|
|
|
|
|
2015-07-02 13:34:07 +08:00
|
|
|
bh = sb_getblk_gfp(inode->i_sb, newblock, __GFP_MOVABLE | GFP_NOFS);
|
2013-01-13 05:28:47 +08:00
|
|
|
if (unlikely(!bh))
|
2013-01-13 05:19:36 +08:00
|
|
|
return -ENOMEM;
|
2006-10-11 16:21:03 +08:00
|
|
|
lock_buffer(bh);
|
|
|
|
|
2021-08-16 17:57:04 +08:00
|
|
|
err = ext4_journal_get_create_access(handle, inode->i_sb, bh,
|
|
|
|
EXT4_JTR_NONE);
|
2006-12-07 12:41:33 +08:00
|
|
|
if (err) {
|
2006-10-11 16:21:03 +08:00
|
|
|
unlock_buffer(bh);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2019-05-11 07:28:06 +08:00
|
|
|
ext_size = sizeof(EXT4_I(inode)->i_data);
|
2006-10-11 16:21:03 +08:00
|
|
|
/* move top-level index/leaf into new block */
|
2019-05-11 07:28:06 +08:00
|
|
|
memmove(bh->b_data, EXT4_I(inode)->i_data, ext_size);
|
|
|
|
/* zero out unused area in the extent block */
|
|
|
|
memset(bh->b_data + ext_size, 0, inode->i_sb->s_blocksize - ext_size);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
/* set size of new block */
|
|
|
|
neh = ext_block_hdr(bh);
|
|
|
|
/* old root could have indexes or leaves
|
|
|
|
* so calculate eh_max the right way */
|
|
|
|
if (ext_depth(inode))
|
2009-08-28 22:40:33 +08:00
|
|
|
neh->eh_max = cpu_to_le16(ext4_ext_space_block_idx(inode, 0));
|
2006-10-11 16:21:03 +08:00
|
|
|
else
|
2009-08-28 22:40:33 +08:00
|
|
|
neh->eh_max = cpu_to_le16(ext4_ext_space_block(inode, 0));
|
2006-10-11 16:21:03 +08:00
|
|
|
neh->eh_magic = EXT4_EXT_MAGIC;
|
2012-04-30 06:37:10 +08:00
|
|
|
ext4_extent_block_csum_set(inode, neh);
|
2006-10-11 16:21:03 +08:00
|
|
|
set_buffer_uptodate(bh);
|
2021-06-09 15:55:45 +08:00
|
|
|
set_buffer_verified(bh);
|
2006-10-11 16:21:03 +08:00
|
|
|
unlock_buffer(bh);
|
|
|
|
|
2009-01-07 13:06:22 +08:00
|
|
|
err = ext4_handle_dirty_metadata(handle, inode, bh);
|
2006-12-07 12:41:33 +08:00
|
|
|
if (err)
|
2006-10-11 16:21:03 +08:00
|
|
|
goto out;
|
|
|
|
|
2011-10-22 13:26:05 +08:00
|
|
|
/* Update top-level index: num,max,pointer */
|
2006-10-11 16:21:03 +08:00
|
|
|
neh = ext_inode_hdr(inode);
|
2011-10-22 13:26:05 +08:00
|
|
|
neh->eh_entries = cpu_to_le16(1);
|
|
|
|
ext4_idx_store_pblock(EXT_FIRST_INDEX(neh), newblock);
|
|
|
|
if (neh->eh_depth == 0) {
|
|
|
|
/* Root extent block becomes index block */
|
|
|
|
neh->eh_max = cpu_to_le16(ext4_ext_space_root_idx(inode, 0));
|
|
|
|
EXT_FIRST_INDEX(neh)->ei_block =
|
|
|
|
EXT_FIRST_EXTENT(neh)->ee_block;
|
|
|
|
}
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "new root: num %d(%d), lblock %d, ptr %llu\n",
|
2006-10-11 16:21:03 +08:00
|
|
|
le16_to_cpu(neh->eh_entries), le16_to_cpu(neh->eh_max),
|
2010-06-15 01:28:03 +08:00
|
|
|
le32_to_cpu(EXT_FIRST_INDEX(neh)->ei_block),
|
2010-10-28 09:30:14 +08:00
|
|
|
ext4_idx_pblock(EXT_FIRST_INDEX(neh)));
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2012-09-27 21:37:53 +08:00
|
|
|
le16_add_cpu(&neh->eh_depth, 1);
|
2020-04-27 09:34:37 +08:00
|
|
|
err = ext4_mark_inode_dirty(handle, inode);
|
2006-10-11 16:21:03 +08:00
|
|
|
out:
|
|
|
|
brelse(bh);
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* ext4_ext_create_new_leaf:
|
|
|
|
* finds empty index and adds new leaf.
|
|
|
|
* if no free index is found, then it requests in-depth growing.
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
|
|
|
static int ext4_ext_create_new_leaf(handle_t *handle, struct inode *inode,
|
2013-08-17 09:23:41 +08:00
|
|
|
unsigned int mb_flags,
|
|
|
|
unsigned int gb_flags,
|
2014-09-02 02:37:09 +08:00
|
|
|
struct ext4_ext_path **ppath,
|
2011-05-25 19:41:26 +08:00
|
|
|
struct ext4_extent *newext)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
2014-09-02 02:37:09 +08:00
|
|
|
struct ext4_ext_path *path = *ppath;
|
2006-10-11 16:21:03 +08:00
|
|
|
struct ext4_ext_path *curp;
|
|
|
|
int depth, i, err = 0;
|
|
|
|
|
|
|
|
repeat:
|
|
|
|
i = depth = ext_depth(inode);
|
|
|
|
|
|
|
|
/* walk up the tree and look for a free index entry */
|
|
|
|
curp = path + depth;
|
|
|
|
while (i > 0 && !EXT_HAS_FREE_INDEX(curp)) {
|
|
|
|
i--;
|
|
|
|
curp--;
|
|
|
|
}
|
|
|
|
|
2006-10-11 16:21:07 +08:00
|
|
|
/* we use an already allocated block for the index block,
|
|
|
|
* so subsequent data blocks should be contiguous */
|
2006-10-11 16:21:03 +08:00
|
|
|
if (EXT_HAS_FREE_INDEX(curp)) {
|
|
|
|
/* if we found an index with a free entry, then use that
|
|
|
|
* entry: create the whole needed subtree and add a new leaf */
|
2013-08-17 09:23:41 +08:00
|
|
|
err = ext4_ext_split(handle, inode, mb_flags, path, newext, i);
|
2008-07-12 07:27:31 +08:00
|
|
|
if (err)
|
|
|
|
goto out;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
/* refill path */
|
2014-09-02 02:43:09 +08:00
|
|
|
path = ext4_find_extent(inode,
|
2008-01-29 12:58:27 +08:00
|
|
|
(ext4_lblk_t)le32_to_cpu(newext->ee_block),
|
2014-09-02 02:37:09 +08:00
|
|
|
ppath, gb_flags);
|
2006-10-11 16:21:03 +08:00
|
|
|
if (IS_ERR(path))
|
|
|
|
err = PTR_ERR(path);
|
|
|
|
} else {
|
|
|
|
/* tree is full, time to grow in depth */
|
2014-10-02 10:57:09 +08:00
|
|
|
err = ext4_ext_grow_indepth(handle, inode, mb_flags);
|
2006-10-11 16:21:03 +08:00
|
|
|
if (err)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
/* refill path */
|
2014-09-02 02:43:09 +08:00
|
|
|
path = ext4_find_extent(inode,
|
2008-01-29 12:58:27 +08:00
|
|
|
(ext4_lblk_t)le32_to_cpu(newext->ee_block),
|
2014-09-02 02:37:09 +08:00
|
|
|
ppath, gb_flags);
|
2006-10-11 16:21:03 +08:00
|
|
|
if (IS_ERR(path)) {
|
|
|
|
err = PTR_ERR(path);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* only the first growth (depth 0 -> 1) produces free space;
|
|
|
|
* in all other cases we have to split the grown tree
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
|
|
|
depth = ext_depth(inode);
|
|
|
|
if (path[depth].p_hdr->eh_entries == path[depth].p_hdr->eh_max) {
|
2006-10-11 16:21:07 +08:00
|
|
|
/* now we need to split */
|
2006-10-11 16:21:03 +08:00
|
|
|
goto repeat;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
out:
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2008-01-29 12:58:27 +08:00
|
|
|
/*
|
|
|
|
* search the closest allocated block to the left for *logical
|
|
|
|
* and returns it at @logical + its physical address at @phys
|
|
|
|
* if *logical is the smallest allocated block, the function
|
|
|
|
* returns 0 at @phys
|
|
|
|
* return value contains 0 (success) or error code
|
|
|
|
*/
|
2010-10-28 09:30:14 +08:00
|
|
|
static int ext4_ext_search_left(struct inode *inode,
|
|
|
|
struct ext4_ext_path *path,
|
|
|
|
ext4_lblk_t *logical, ext4_fsblk_t *phys)
|
2008-01-29 12:58:27 +08:00
|
|
|
{
|
|
|
|
struct ext4_extent_idx *ix;
|
|
|
|
struct ext4_extent *ex;
|
2008-01-29 12:58:27 +08:00
|
|
|
int depth, ee_len;
|
2008-01-29 12:58:27 +08:00
|
|
|
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(path == NULL)) {
|
|
|
|
EXT4_ERROR_INODE(inode, "path == NULL *logical %d!", *logical);
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2008-01-29 12:58:27 +08:00
|
|
|
depth = path->p_depth;
|
|
|
|
*phys = 0;
|
|
|
|
|
|
|
|
if (depth == 0 && path->p_ext == NULL)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/* usually extent in the path covers blocks smaller
|
|
|
|
* than *logical, but it can be that the extent is the
|
|
|
|
* first one in the file */
|
|
|
|
|
|
|
|
ex = path[depth].p_ext;
|
2008-01-29 12:58:27 +08:00
|
|
|
ee_len = ext4_ext_get_actual_len(ex);
|
2008-01-29 12:58:27 +08:00
|
|
|
if (*logical < le32_to_cpu(ex->ee_block)) {
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex)) {
|
|
|
|
EXT4_ERROR_INODE(inode,
|
|
|
|
"EXT_FIRST_EXTENT != ex *logical %d ee_block %d!",
|
|
|
|
*logical, le32_to_cpu(ex->ee_block));
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2008-01-29 12:58:27 +08:00
|
|
|
while (--depth >= 0) {
|
|
|
|
ix = path[depth].p_idx;
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(ix != EXT_FIRST_INDEX(path[depth].p_hdr))) {
|
|
|
|
EXT4_ERROR_INODE(inode,
|
|
|
|
"ix (%d) != EXT_FIRST_INDEX (%d) (depth %d)!",
|
2011-10-09 04:08:34 +08:00
|
|
|
ix != NULL ? le32_to_cpu(ix->ei_block) : 0,
|
2021-11-16 01:20:20 +08:00
|
|
|
le32_to_cpu(EXT_FIRST_INDEX(path[depth].p_hdr)->ei_block),
|
2010-03-03 00:46:09 +08:00
|
|
|
depth);
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2008-01-29 12:58:27 +08:00
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(*logical < (le32_to_cpu(ex->ee_block) + ee_len))) {
|
|
|
|
EXT4_ERROR_INODE(inode,
|
|
|
|
"logical %d < ee_block %d + ee_len %d!",
|
|
|
|
*logical, le32_to_cpu(ex->ee_block), ee_len);
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2008-01-29 12:58:27 +08:00
|
|
|
|
2008-01-29 12:58:27 +08:00
|
|
|
*logical = le32_to_cpu(ex->ee_block) + ee_len - 1;
|
2010-10-28 09:30:14 +08:00
|
|
|
*phys = ext4_ext_pblock(ex) + ee_len - 1;
|
2008-01-29 12:58:27 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2020-10-28 13:56:17 +08:00
|
|
|
* Search the closest allocated block to the right for *logical
|
|
|
|
* and returns it at @logical + its physical address at @phys.
|
|
|
|
* If none exists, return 0 and set @phys to 0. Otherwise return
|
|
|
|
* 1, which means we found an allocated block and @ret_ex is valid.
|
|
|
|
* On failure, return a negative error code.
|
2008-01-29 12:58:27 +08:00
|
|
|
*/
|
2010-10-28 09:30:14 +08:00
|
|
|
static int ext4_ext_search_right(struct inode *inode,
|
|
|
|
struct ext4_ext_path *path,
|
2011-09-10 06:52:51 +08:00
|
|
|
ext4_lblk_t *logical, ext4_fsblk_t *phys,
|
2020-10-28 13:56:17 +08:00
|
|
|
struct ext4_extent *ret_ex)
|
2008-01-29 12:58:27 +08:00
|
|
|
{
|
|
|
|
struct buffer_head *bh = NULL;
|
|
|
|
struct ext4_extent_header *eh;
|
|
|
|
struct ext4_extent_idx *ix;
|
|
|
|
struct ext4_extent *ex;
|
2009-03-11 06:18:47 +08:00
|
|
|
int depth; /* Note, NOT eh_depth; depth from top of tree */
|
|
|
|
int ee_len;
|
2008-01-29 12:58:27 +08:00
|
|
|
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(path == NULL)) {
|
|
|
|
EXT4_ERROR_INODE(inode, "path == NULL *logical %d!", *logical);
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2008-01-29 12:58:27 +08:00
|
|
|
depth = path->p_depth;
|
|
|
|
*phys = 0;
|
|
|
|
|
|
|
|
if (depth == 0 && path->p_ext == NULL)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/* usually extent in the path covers blocks smaller
|
|
|
|
* than *logical, but it can be that the extent is the
|
|
|
|
* first one in the file */
|
|
|
|
|
|
|
|
ex = path[depth].p_ext;
|
2008-01-29 12:58:27 +08:00
|
|
|
ee_len = ext4_ext_get_actual_len(ex);
|
2008-01-29 12:58:27 +08:00
|
|
|
if (*logical < le32_to_cpu(ex->ee_block)) {
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex)) {
|
|
|
|
EXT4_ERROR_INODE(inode,
|
|
|
|
"first_extent(path[%d].p_hdr) != ex",
|
|
|
|
depth);
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2008-01-29 12:58:27 +08:00
|
|
|
while (--depth >= 0) {
|
|
|
|
ix = path[depth].p_idx;
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(ix != EXT_FIRST_INDEX(path[depth].p_hdr))) {
|
|
|
|
EXT4_ERROR_INODE(inode,
|
|
|
|
"ix != EXT_FIRST_INDEX *logical %d!",
|
|
|
|
*logical);
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2008-01-29 12:58:27 +08:00
|
|
|
}
|
2011-09-10 06:52:51 +08:00
|
|
|
goto found_extent;
|
2008-01-29 12:58:27 +08:00
|
|
|
}
|
|
|
|
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(*logical < (le32_to_cpu(ex->ee_block) + ee_len))) {
|
|
|
|
EXT4_ERROR_INODE(inode,
|
|
|
|
"logical %d < ee_block %d + ee_len %d!",
|
|
|
|
*logical, le32_to_cpu(ex->ee_block), ee_len);
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2008-01-29 12:58:27 +08:00
|
|
|
|
|
|
|
if (ex != EXT_LAST_EXTENT(path[depth].p_hdr)) {
|
|
|
|
/* next allocated block in this leaf */
|
|
|
|
ex++;
|
2011-09-10 06:52:51 +08:00
|
|
|
goto found_extent;
|
2008-01-29 12:58:27 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* go up and search for index to the right */
|
|
|
|
while (--depth >= 0) {
|
|
|
|
ix = path[depth].p_idx;
|
|
|
|
if (ix != EXT_LAST_INDEX(path[depth].p_hdr))
|
2008-11-26 06:24:23 +08:00
|
|
|
goto got_index;
|
2008-01-29 12:58:27 +08:00
|
|
|
}
|
|
|
|
|
2008-11-26 06:24:23 +08:00
|
|
|
/* we've gone up to the root and found no index to the right */
|
|
|
|
return 0;
|
2008-01-29 12:58:27 +08:00
|
|
|
|
2008-11-26 06:24:23 +08:00
|
|
|
got_index:
|
2008-01-29 12:58:27 +08:00
|
|
|
/* we've found index to the right, let's
|
|
|
|
* follow it and find the closest allocated
|
|
|
|
* block to the right */
|
|
|
|
ix++;
|
|
|
|
while (++depth < path->p_depth) {
|
2009-03-11 06:18:47 +08:00
|
|
|
/* subtract from p_depth to get proper eh_depth */
|
2021-09-08 20:08:49 +08:00
|
|
|
bh = read_extent_tree_block(inode, ix, path->p_depth - depth, 0);
|
2013-08-17 09:20:41 +08:00
|
|
|
if (IS_ERR(bh))
|
|
|
|
return PTR_ERR(bh);
|
|
|
|
eh = ext_block_hdr(bh);
|
2008-01-29 12:58:27 +08:00
|
|
|
ix = EXT_FIRST_INDEX(eh);
|
|
|
|
put_bh(bh);
|
|
|
|
}
|
|
|
|
|
2021-09-08 20:08:49 +08:00
|
|
|
bh = read_extent_tree_block(inode, ix, path->p_depth - depth, 0);
|
2013-08-17 09:20:41 +08:00
|
|
|
if (IS_ERR(bh))
|
|
|
|
return PTR_ERR(bh);
|
2008-01-29 12:58:27 +08:00
|
|
|
eh = ext_block_hdr(bh);
|
|
|
|
ex = EXT_FIRST_EXTENT(eh);
|
2011-09-10 06:52:51 +08:00
|
|
|
found_extent:
|
2008-01-29 12:58:27 +08:00
|
|
|
*logical = le32_to_cpu(ex->ee_block);
|
2010-10-28 09:30:14 +08:00
|
|
|
*phys = ext4_ext_pblock(ex);
|
2020-10-28 13:56:17 +08:00
|
|
|
if (ret_ex)
|
|
|
|
*ret_ex = *ex;
|
2011-09-10 06:52:51 +08:00
|
|
|
if (bh)
|
|
|
|
put_bh(bh);
|
2020-10-28 13:56:17 +08:00
|
|
|
return 1;
|
2008-01-29 12:58:27 +08:00
|
|
|
}
|
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* ext4_ext_next_allocated_block:
|
2011-06-06 12:05:17 +08:00
|
|
|
* returns allocated block in subsequent extent or EXT_MAX_BLOCKS.
|
2006-10-11 16:21:07 +08:00
|
|
|
* NOTE: it considers block number from index entry as
|
|
|
|
* allocated block. Thus, index entries have to be consistent
|
|
|
|
* with leaves.
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
2014-08-31 11:52:19 +08:00
|
|
|
ext4_lblk_t
|
2006-10-11 16:21:03 +08:00
|
|
|
ext4_ext_next_allocated_block(struct ext4_ext_path *path)
|
|
|
|
{
|
|
|
|
int depth;
|
|
|
|
|
|
|
|
BUG_ON(path == NULL);
|
|
|
|
depth = path->p_depth;
|
|
|
|
|
|
|
|
if (depth == 0 && path->p_ext == NULL)
|
2011-06-06 12:05:17 +08:00
|
|
|
return EXT_MAX_BLOCKS;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
while (depth >= 0) {
|
2020-01-01 02:04:43 +08:00
|
|
|
struct ext4_ext_path *p = &path[depth];
|
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
if (depth == path->p_depth) {
|
|
|
|
/* leaf */
|
2020-01-01 02:04:43 +08:00
|
|
|
if (p->p_ext && p->p_ext != EXT_LAST_EXTENT(p->p_hdr))
|
|
|
|
return le32_to_cpu(p->p_ext[1].ee_block);
|
2006-10-11 16:21:03 +08:00
|
|
|
} else {
|
|
|
|
/* index */
|
2020-01-01 02:04:43 +08:00
|
|
|
if (p->p_idx != EXT_LAST_INDEX(p->p_hdr))
|
|
|
|
return le32_to_cpu(p->p_idx[1].ei_block);
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
|
|
|
depth--;
|
|
|
|
}
|
|
|
|
|
2011-06-06 12:05:17 +08:00
|
|
|
return EXT_MAX_BLOCKS;
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* ext4_ext_next_leaf_block:
|
2011-06-06 12:05:17 +08:00
|
|
|
* returns first allocated block from next leaf or EXT_MAX_BLOCKS
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
2011-07-24 09:49:07 +08:00
|
|
|
static ext4_lblk_t ext4_ext_next_leaf_block(struct ext4_ext_path *path)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
|
|
|
int depth;
|
|
|
|
|
|
|
|
BUG_ON(path == NULL);
|
|
|
|
depth = path->p_depth;
|
|
|
|
|
|
|
|
/* zero-tree has no leaf blocks at all */
|
|
|
|
if (depth == 0)
|
2011-06-06 12:05:17 +08:00
|
|
|
return EXT_MAX_BLOCKS;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
/* go to index block */
|
|
|
|
depth--;
|
|
|
|
|
|
|
|
while (depth >= 0) {
|
|
|
|
if (path[depth].p_idx !=
|
|
|
|
EXT_LAST_INDEX(path[depth].p_hdr))
|
2008-01-29 12:58:27 +08:00
|
|
|
return (ext4_lblk_t)
|
|
|
|
le32_to_cpu(path[depth].p_idx[1].ei_block);
|
2006-10-11 16:21:03 +08:00
|
|
|
depth--;
|
|
|
|
}
|
|
|
|
|
2011-06-06 12:05:17 +08:00
|
|
|
return EXT_MAX_BLOCKS;
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* ext4_ext_correct_indexes:
|
|
|
|
* if leaf gets modified and modified extent is first in the leaf,
|
|
|
|
* then we have to correct all indexes above.
|
2006-10-11 16:21:03 +08:00
|
|
|
* TODO: do we need to correct tree in all cases?
|
|
|
|
*/
|
2008-01-29 12:58:27 +08:00
|
|
|
static int ext4_ext_correct_indexes(handle_t *handle, struct inode *inode,
|
2006-10-11 16:21:03 +08:00
|
|
|
struct ext4_ext_path *path)
|
|
|
|
{
|
|
|
|
struct ext4_extent_header *eh;
|
|
|
|
int depth = ext_depth(inode);
|
|
|
|
struct ext4_extent *ex;
|
|
|
|
__le32 border;
|
|
|
|
int k, err = 0;
|
|
|
|
|
|
|
|
eh = path[depth].p_hdr;
|
|
|
|
ex = path[depth].p_ext;
|
2010-03-03 00:46:09 +08:00
|
|
|
|
|
|
|
if (unlikely(ex == NULL || eh == NULL)) {
|
|
|
|
EXT4_ERROR_INODE(inode,
|
|
|
|
"ex %p == NULL or eh %p == NULL", ex, eh);
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
if (depth == 0) {
|
|
|
|
/* there is no tree at all */
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (ex != EXT_FIRST_EXTENT(eh)) {
|
|
|
|
/* we correct the tree only if the first extent in the leaf got modified */
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* TODO: we need correction if border is smaller than current one
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
|
|
|
k = depth - 1;
|
|
|
|
border = path[depth].p_ext->ee_block;
|
2006-12-07 12:41:33 +08:00
|
|
|
err = ext4_ext_get_access(handle, inode, path + k);
|
|
|
|
if (err)
|
2006-10-11 16:21:03 +08:00
|
|
|
return err;
|
|
|
|
path[k].p_idx->ei_block = border;
|
2006-12-07 12:41:33 +08:00
|
|
|
err = ext4_ext_dirty(handle, inode, path + k);
|
|
|
|
if (err)
|
2006-10-11 16:21:03 +08:00
|
|
|
return err;
|
|
|
|
|
|
|
|
while (k--) {
|
|
|
|
/* change all left-side indexes */
|
|
|
|
if (path[k+1].p_idx != EXT_FIRST_INDEX(path[k+1].p_hdr))
|
|
|
|
break;
|
2006-12-07 12:41:33 +08:00
|
|
|
err = ext4_ext_get_access(handle, inode, path + k);
|
|
|
|
if (err)
|
2006-10-11 16:21:03 +08:00
|
|
|
break;
|
|
|
|
path[k].p_idx->ei_block = border;
|
2006-12-07 12:41:33 +08:00
|
|
|
err = ext4_ext_dirty(handle, inode, path + k);
|
|
|
|
if (err)
|
2006-10-11 16:21:03 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
/*
* ext4_can_extents_be_merged:
* returns 1 if @ex2 directly follows @ex1 both logically and physically,
* both extents have the same written/unwritten state, and the combined
* length still fits in a single extent; returns 0 otherwise.
*/
2020-01-01 02:04:40 +08:00
|
|
|
static int ext4_can_extents_be_merged(struct inode *inode,
|
|
|
|
struct ext4_extent *ex1,
|
|
|
|
struct ext4_extent *ex2)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
2013-11-04 22:58:26 +08:00
|
|
|
unsigned short ext1_ee_len, ext2_ee_len;
|
2007-07-18 09:42:41 +08:00
|
|
|
|
2014-04-21 11:45:47 +08:00
|
|
|
if (ext4_ext_is_unwritten(ex1) != ext4_ext_is_unwritten(ex2))
|
2007-07-18 09:42:41 +08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
ext1_ee_len = ext4_ext_get_actual_len(ex1);
|
|
|
|
ext2_ee_len = ext4_ext_get_actual_len(ex2);
|
|
|
|
|
|
|
|
if (le32_to_cpu(ex1->ee_block) + ext1_ee_len !=
|
2006-10-11 16:21:24 +08:00
|
|
|
le32_to_cpu(ex2->ee_block))
|
2006-10-11 16:21:03 +08:00
|
|
|
return 0;
|
|
|
|
|
2013-11-04 22:58:26 +08:00
|
|
|
if (ext1_ee_len + ext2_ee_len > EXT_INIT_MAX_LEN)
|
2006-10-11 16:21:06 +08:00
|
|
|
return 0;
|
2019-11-05 20:02:39 +08:00
|
|
|
|
2014-04-21 11:45:47 +08:00
|
|
|
if (ext4_ext_is_unwritten(ex1) &&
|
2019-11-05 20:02:39 +08:00
|
|
|
ext1_ee_len + ext2_ee_len > EXT_UNWRITTEN_MAX_LEN)
|
2014-02-21 10:17:35 +08:00
|
|
|
return 0;
|
2007-02-18 02:20:16 +08:00
|
|
|
#ifdef AGGRESSIVE_TEST
|
2008-01-29 12:58:27 +08:00
|
|
|
if (ext1_ee_len >= 4)
|
2006-10-11 16:21:03 +08:00
|
|
|
return 0;
|
|
|
|
#endif
|
|
|
|
|
2010-10-28 09:30:14 +08:00
|
|
|
if (ext4_ext_pblock(ex1) + ext1_ee_len == ext4_ext_pblock(ex2))
|
2006-10-11 16:21:03 +08:00
|
|
|
return 1;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2007-07-18 09:42:38 +08:00
|
|
|
/*
|
|
|
|
* This function tries to merge the "ex" extent to the next extent in the tree.
|
|
|
|
* It always tries to merge towards right. If you want to merge towards
|
|
|
|
* left, pass "ex - 1" as argument instead of "ex".
|
|
|
|
* Returns 0 if the extents (ex and ex+1) were _not_ merged and returns
|
|
|
|
* 1 if they got merged.
|
|
|
|
*/
|
2011-05-03 23:45:29 +08:00
|
|
|
static int ext4_ext_try_to_merge_right(struct inode *inode,
|
2010-10-28 09:30:14 +08:00
|
|
|
struct ext4_ext_path *path,
|
|
|
|
struct ext4_extent *ex)
|
2007-07-18 09:42:38 +08:00
|
|
|
{
|
|
|
|
struct ext4_extent_header *eh;
|
|
|
|
unsigned int depth, len;
|
2014-04-21 11:45:47 +08:00
|
|
|
int merge_done = 0, unwritten;
|
2007-07-18 09:42:38 +08:00
|
|
|
|
|
|
|
depth = ext_depth(inode);
|
|
|
|
BUG_ON(path[depth].p_hdr == NULL);
|
|
|
|
eh = path[depth].p_hdr;
|
|
|
|
|
|
|
|
while (ex < EXT_LAST_EXTENT(eh)) {
|
|
|
|
if (!ext4_can_extents_be_merged(inode, ex, ex + 1))
|
|
|
|
break;
|
|
|
|
/* merge with next extent! */
|
2014-04-21 11:45:47 +08:00
|
|
|
unwritten = ext4_ext_is_unwritten(ex);
|
2007-07-18 09:42:38 +08:00
|
|
|
ex->ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex)
|
|
|
|
+ ext4_ext_get_actual_len(ex + 1));
|
2014-04-21 11:45:47 +08:00
|
|
|
if (unwritten)
|
|
|
|
ext4_ext_mark_unwritten(ex);
|
2007-07-18 09:42:38 +08:00
|
|
|
|
|
|
|
if (ex + 1 < EXT_LAST_EXTENT(eh)) {
|
|
|
|
len = (EXT_LAST_EXTENT(eh) - ex - 1)
|
|
|
|
* sizeof(struct ext4_extent);
|
|
|
|
memmove(ex + 1, ex + 2, len);
|
|
|
|
}
|
2008-04-17 22:38:59 +08:00
|
|
|
le16_add_cpu(&eh->eh_entries, -1);
|
2007-07-18 09:42:38 +08:00
|
|
|
merge_done = 1;
|
|
|
|
WARN_ON(eh->eh_entries == 0);
|
|
|
|
if (!eh->eh_entries)
|
2010-05-17 09:00:00 +08:00
|
|
|
EXT4_ERROR_INODE(inode, "eh->eh_entries = 0!");
|
2007-07-18 09:42:38 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return merge_done;
|
|
|
|
}
|
|
|
|
|
2012-08-17 21:44:17 +08:00
|
|
|
/*
|
|
|
|
* This function does a very simple check to see if we can collapse
|
|
|
|
* an extent tree with a single extent tree leaf block into the inode.
|
|
|
|
*/
|
|
|
|
static void ext4_ext_try_to_merge_up(handle_t *handle,
|
|
|
|
struct inode *inode,
|
|
|
|
struct ext4_ext_path *path)
|
|
|
|
{
|
|
|
|
size_t s;
|
|
|
|
unsigned max_root = ext4_ext_space_root(inode, 0);
|
|
|
|
ext4_fsblk_t blk;
|
|
|
|
|
|
|
|
if ((path[0].p_depth != 1) ||
|
|
|
|
(le16_to_cpu(path[0].p_hdr->eh_entries) != 1) ||
|
|
|
|
(le16_to_cpu(path[1].p_hdr->eh_entries) > max_root))
|
|
|
|
return;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We need to modify the block allocation bitmap and the block
|
|
|
|
* group descriptor to release the extent tree block. If we
|
|
|
|
* can't get the journal credits, give up.
|
|
|
|
*/
|
2019-11-06 00:44:29 +08:00
|
|
|
if (ext4_journal_extend(handle, 2,
|
|
|
|
ext4_free_metadata_revoke_credits(inode->i_sb, 1)))
|
2012-08-17 21:44:17 +08:00
|
|
|
return;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Copy the extent data up to the inode
|
|
|
|
*/
|
|
|
|
blk = ext4_idx_pblock(path[0].p_idx);
|
|
|
|
s = le16_to_cpu(path[1].p_hdr->eh_entries) *
|
|
|
|
sizeof(struct ext4_extent_idx);
|
|
|
|
s += sizeof(struct ext4_extent_header);
|
|
|
|
|
2014-09-02 02:40:09 +08:00
|
|
|
path[1].p_maxdepth = path[0].p_maxdepth;
|
2012-08-17 21:44:17 +08:00
|
|
|
memcpy(path[0].p_hdr, path[1].p_hdr, s);
|
|
|
|
path[0].p_depth = 0;
|
|
|
|
path[0].p_ext = EXT_FIRST_EXTENT(path[0].p_hdr) +
|
|
|
|
(path[1].p_ext - EXT_FIRST_EXTENT(path[1].p_hdr));
|
|
|
|
path[0].p_hdr->eh_max = cpu_to_le16(max_root);
|
|
|
|
|
|
|
|
brelse(path[1].p_bh);
|
|
|
|
ext4_free_blocks(handle, inode, NULL, blk, 1,
|
2014-07-15 18:02:38 +08:00
|
|
|
EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET);
|
2012-08-17 21:44:17 +08:00
|
|
|
}
|
|
|
|
|
2011-05-03 23:45:29 +08:00
|
|
|
/*
|
2020-01-01 02:04:41 +08:00
|
|
|
* This function tries to merge the @ex extent to neighbours in the tree, then
|
|
|
|
* tries to collapse the extent tree into the inode.
|
2011-05-03 23:45:29 +08:00
|
|
|
*/
|
2012-08-17 21:44:17 +08:00
|
|
|
static void ext4_ext_try_to_merge(handle_t *handle,
|
|
|
|
struct inode *inode,
|
2011-05-03 23:45:29 +08:00
|
|
|
struct ext4_ext_path *path,
|
2020-01-01 02:04:41 +08:00
|
|
|
struct ext4_extent *ex)
|
|
|
|
{
|
2011-05-03 23:45:29 +08:00
|
|
|
struct ext4_extent_header *eh;
|
|
|
|
unsigned int depth;
|
|
|
|
int merge_done = 0;
|
|
|
|
|
|
|
|
depth = ext_depth(inode);
|
|
|
|
BUG_ON(path[depth].p_hdr == NULL);
|
|
|
|
eh = path[depth].p_hdr;
|
|
|
|
|
|
|
|
if (ex > EXT_FIRST_EXTENT(eh))
|
|
|
|
merge_done = ext4_ext_try_to_merge_right(inode, path, ex - 1);
|
|
|
|
|
|
|
|
if (!merge_done)
|
2012-08-17 21:44:17 +08:00
|
|
|
(void) ext4_ext_try_to_merge_right(inode, path, ex);
|
2011-05-03 23:45:29 +08:00
|
|
|
|
2012-08-17 21:44:17 +08:00
|
|
|
ext4_ext_try_to_merge_up(handle, inode, path);
|
2011-05-03 23:45:29 +08:00
|
|
|
}
|
|
|
|
|
2007-05-25 01:04:13 +08:00
|
|
|
/*
|
|
|
|
* check if a portion of the "newext" extent overlaps with an
|
|
|
|
* existing extent.
|
|
|
|
*
|
|
|
|
* If there is an overlap discovered, it updates the length of the newext
|
|
|
|
* such that there will be no overlap, and then returns 1.
|
|
|
|
* If there is no overlap found, it returns 0.
|
|
|
|
*/
|
2011-09-10 06:52:51 +08:00
|
|
|
static unsigned int ext4_ext_check_overlap(struct ext4_sb_info *sbi,
|
|
|
|
struct inode *inode,
|
2010-10-28 09:30:14 +08:00
|
|
|
struct ext4_extent *newext,
|
|
|
|
struct ext4_ext_path *path)
|
2007-05-25 01:04:13 +08:00
|
|
|
{
|
2008-01-29 12:58:27 +08:00
|
|
|
ext4_lblk_t b1, b2;
|
2007-05-25 01:04:13 +08:00
|
|
|
unsigned int depth, len1;
|
|
|
|
unsigned int ret = 0;
|
|
|
|
|
|
|
|
b1 = le32_to_cpu(newext->ee_block);
|
2007-07-18 09:42:41 +08:00
|
|
|
len1 = ext4_ext_get_actual_len(newext);
|
2007-05-25 01:04:13 +08:00
|
|
|
depth = ext_depth(inode);
|
|
|
|
if (!path[depth].p_ext)
|
|
|
|
goto out;
|
2013-12-20 22:29:35 +08:00
|
|
|
b2 = EXT4_LBLK_CMASK(sbi, le32_to_cpu(path[depth].p_ext->ee_block));
|
2007-05-25 01:04:13 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* get the next allocated block if the extent in the path
|
2008-07-27 04:15:44 +08:00
|
|
|
* is before the requested block(s)
|
2007-05-25 01:04:13 +08:00
|
|
|
*/
|
|
|
|
if (b2 < b1) {
|
|
|
|
b2 = ext4_ext_next_allocated_block(path);
|
2011-06-06 12:05:17 +08:00
|
|
|
if (b2 == EXT_MAX_BLOCKS)
|
2007-05-25 01:04:13 +08:00
|
|
|
goto out;
|
2013-12-20 22:29:35 +08:00
|
|
|
b2 = EXT4_LBLK_CMASK(sbi, b2);
|
2007-05-25 01:04:13 +08:00
|
|
|
}
|
|
|
|
|
2008-01-29 12:58:27 +08:00
|
|
|
/* check for wrap through zero on extent logical start block */
|
2007-05-25 01:04:13 +08:00
|
|
|
if (b1 + len1 < b1) {
|
2011-06-06 12:05:17 +08:00
|
|
|
len1 = EXT_MAX_BLOCKS - b1;
|
2007-05-25 01:04:13 +08:00
|
|
|
newext->ee_len = cpu_to_le16(len1);
|
|
|
|
ret = 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* check for overlap */
|
|
|
|
if (b1 + len1 > b2) {
|
|
|
|
newext->ee_len = cpu_to_le16(b2 - b1);
|
|
|
|
ret = 1;
|
|
|
|
}
|
|
|
|
out:
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* ext4_ext_insert_extent:
|
2020-06-11 11:19:46 +08:00
|
|
|
* tries to merge requested extent into the existing extent or
|
2006-10-11 16:21:07 +08:00
|
|
|
* inserts requested extent as new one into the tree,
|
|
|
|
* creating new leaf in the no-space case.
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
|
|
|
int ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
|
2014-09-02 02:37:09 +08:00
|
|
|
struct ext4_ext_path **ppath,
|
2013-08-17 09:23:41 +08:00
|
|
|
struct ext4_extent *newext, int gb_flags)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
2014-09-02 02:37:09 +08:00
|
|
|
struct ext4_ext_path *path = *ppath;
|
2008-09-09 10:25:24 +08:00
|
|
|
struct ext4_extent_header *eh;
|
2006-10-11 16:21:03 +08:00
|
|
|
struct ext4_extent *ex, *fex;
|
|
|
|
struct ext4_extent *nearex; /* nearest extent */
|
|
|
|
struct ext4_ext_path *npath = NULL;
|
2008-01-29 12:58:27 +08:00
|
|
|
int depth, len, err;
|
|
|
|
ext4_lblk_t next;
|
2014-04-21 11:45:47 +08:00
|
|
|
int mb_flags = 0, unwritten;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2014-09-05 06:07:25 +08:00
|
|
|
if (gb_flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
|
|
|
|
mb_flags |= EXT4_MB_DELALLOC_RESERVED;
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(ext4_ext_get_actual_len(newext) == 0)) {
|
|
|
|
EXT4_ERROR_INODE(inode, "ext4_ext_get_actual_len(newext) == 0");
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
depth = ext_depth(inode);
|
|
|
|
ex = path[depth].p_ext;
|
2013-04-04 11:33:28 +08:00
|
|
|
eh = path[depth].p_hdr;
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(path[depth].p_hdr == NULL)) {
|
|
|
|
EXT4_ERROR_INODE(inode, "path[%d].p_hdr == NULL", depth);
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
/* try to insert block into found extent and return */
|
2013-08-17 09:23:41 +08:00
|
|
|
if (ex && !(gb_flags & EXT4_GET_BLOCKS_PRE_IO)) {
|
2007-07-18 09:42:41 +08:00
|
|
|
|
|
|
|
/*
|
2013-04-04 11:33:28 +08:00
|
|
|
* Try to see whether we should rather test the extent on
|
|
|
|
* right from ex, or from the left of ex. This is because
|
2014-09-02 02:43:09 +08:00
|
|
|
* ext4_find_extent() can return either extent on the
|
2013-04-04 11:33:28 +08:00
|
|
|
* left, or on the right from the searched position. This
|
|
|
|
* will make merging more effective.
|
2007-07-18 09:42:41 +08:00
|
|
|
*/
|
2013-04-04 11:33:28 +08:00
|
|
|
if (ex < EXT_LAST_EXTENT(eh) &&
|
|
|
|
(le32_to_cpu(ex->ee_block) +
|
|
|
|
ext4_ext_get_actual_len(ex) <
|
|
|
|
le32_to_cpu(newext->ee_block))) {
|
|
|
|
ex += 1;
|
|
|
|
goto prepend;
|
|
|
|
} else if ((ex > EXT_FIRST_EXTENT(eh)) &&
|
|
|
|
(le32_to_cpu(newext->ee_block) +
|
|
|
|
ext4_ext_get_actual_len(newext) <
|
|
|
|
le32_to_cpu(ex->ee_block)))
|
|
|
|
ex -= 1;
|
|
|
|
|
|
|
|
/* Try to append newext to ex */
|
|
|
|
if (ext4_can_extents_be_merged(inode, ex, newext)) {
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "append [%d]%d block to %u:[%d]%d"
|
2013-04-04 11:33:28 +08:00
|
|
|
"(from %llu)\n",
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_is_unwritten(newext),
|
2013-04-04 11:33:28 +08:00
|
|
|
ext4_ext_get_actual_len(newext),
|
|
|
|
le32_to_cpu(ex->ee_block),
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_is_unwritten(ex),
|
2013-04-04 11:33:28 +08:00
|
|
|
ext4_ext_get_actual_len(ex),
|
|
|
|
ext4_ext_pblock(ex));
|
|
|
|
err = ext4_ext_get_access(handle, inode,
|
|
|
|
path + depth);
|
|
|
|
if (err)
|
|
|
|
return err;
|
2014-04-21 11:45:47 +08:00
|
|
|
unwritten = ext4_ext_is_unwritten(ex);
|
2013-04-04 11:33:28 +08:00
|
|
|
ex->ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex)
|
2007-07-18 09:42:41 +08:00
|
|
|
+ ext4_ext_get_actual_len(newext));
|
2014-04-21 11:45:47 +08:00
|
|
|
if (unwritten)
|
|
|
|
ext4_ext_mark_unwritten(ex);
|
2013-04-04 11:33:28 +08:00
|
|
|
nearex = ex;
|
|
|
|
goto merge;
|
|
|
|
}
|
|
|
|
|
|
|
|
prepend:
|
|
|
|
/* Try to prepend newext to ex */
|
|
|
|
if (ext4_can_extents_be_merged(inode, newext, ex)) {
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "prepend %u[%d]%d block to %u:[%d]%d"
|
2013-04-04 11:33:28 +08:00
|
|
|
"(from %llu)\n",
|
|
|
|
le32_to_cpu(newext->ee_block),
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_is_unwritten(newext),
|
2013-04-04 11:33:28 +08:00
|
|
|
ext4_ext_get_actual_len(newext),
|
|
|
|
le32_to_cpu(ex->ee_block),
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_is_unwritten(ex),
|
2013-04-04 11:33:28 +08:00
|
|
|
ext4_ext_get_actual_len(ex),
|
|
|
|
ext4_ext_pblock(ex));
|
|
|
|
err = ext4_ext_get_access(handle, inode,
|
|
|
|
path + depth);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
2014-04-21 11:45:47 +08:00
|
|
|
unwritten = ext4_ext_is_unwritten(ex);
|
2013-04-04 11:33:28 +08:00
|
|
|
ex->ee_block = newext->ee_block;
|
|
|
|
ext4_ext_store_pblock(ex, ext4_ext_pblock(newext));
|
|
|
|
ex->ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex)
|
|
|
|
+ ext4_ext_get_actual_len(newext));
|
2014-04-21 11:45:47 +08:00
|
|
|
if (unwritten)
|
|
|
|
ext4_ext_mark_unwritten(ex);
|
2013-04-04 11:33:28 +08:00
|
|
|
nearex = ex;
|
|
|
|
goto merge;
|
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
depth = ext_depth(inode);
|
|
|
|
eh = path[depth].p_hdr;
|
|
|
|
if (le16_to_cpu(eh->eh_entries) < le16_to_cpu(eh->eh_max))
|
|
|
|
goto has_space;
|
|
|
|
|
|
|
|
/* perhaps the next leaf has space for us? */
|
|
|
|
fex = EXT_LAST_EXTENT(eh);
|
2011-07-12 06:24:01 +08:00
|
|
|
next = EXT_MAX_BLOCKS;
|
|
|
|
if (le32_to_cpu(newext->ee_block) > le32_to_cpu(fex->ee_block))
|
2011-07-24 09:49:07 +08:00
|
|
|
next = ext4_ext_next_leaf_block(path);
|
2011-07-12 06:24:01 +08:00
|
|
|
if (next != EXT_MAX_BLOCKS) {
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "next leaf block - %u\n", next);
|
2006-10-11 16:21:03 +08:00
|
|
|
BUG_ON(npath != NULL);
|
2020-05-08 01:50:28 +08:00
|
|
|
npath = ext4_find_extent(inode, next, NULL, gb_flags);
|
2006-10-11 16:21:03 +08:00
|
|
|
if (IS_ERR(npath))
|
|
|
|
return PTR_ERR(npath);
|
|
|
|
BUG_ON(npath->p_depth != path->p_depth);
|
|
|
|
eh = npath[depth].p_hdr;
|
|
|
|
if (le16_to_cpu(eh->eh_entries) < le16_to_cpu(eh->eh_max)) {
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "next leaf isn't full(%d)\n",
|
2006-10-11 16:21:03 +08:00
|
|
|
le16_to_cpu(eh->eh_entries));
|
|
|
|
path = npath;
|
2011-07-11 23:43:59 +08:00
|
|
|
goto has_space;
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "next leaf has no free space(%d,%d)\n",
|
2006-10-11 16:21:03 +08:00
|
|
|
le16_to_cpu(eh->eh_entries), le16_to_cpu(eh->eh_max));
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* There is no free space in the found leaf.
|
|
|
|
* We're going to add a new leaf to the tree.
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
2013-08-17 09:23:41 +08:00
|
|
|
if (gb_flags & EXT4_GET_BLOCKS_METADATA_NOFAIL)
|
2014-09-05 06:07:25 +08:00
|
|
|
mb_flags |= EXT4_MB_USE_RESERVED;
|
2013-08-17 09:23:41 +08:00
|
|
|
err = ext4_ext_create_new_leaf(handle, inode, mb_flags, gb_flags,
|
2014-09-02 02:37:09 +08:00
|
|
|
ppath, newext);
|
2006-10-11 16:21:03 +08:00
|
|
|
if (err)
|
|
|
|
goto cleanup;
|
|
|
|
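/* ext4_ext_create_new_leaf() refills the path via ppath and may have reallocated it */
path = *ppath;
depth = ext_depth(inode);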
|
|
|
|
eh = path[depth].p_hdr;
|
|
|
|
|
|
|
|
has_space:
|
|
|
|
nearex = path[depth].p_ext;
|
|
|
|
|
2006-12-07 12:41:33 +08:00
|
|
|
err = ext4_ext_get_access(handle, inode, path + depth);
|
|
|
|
if (err)
|
2006-10-11 16:21:03 +08:00
|
|
|
goto cleanup;
|
|
|
|
|
|
|
|
if (!nearex) {
|
|
|
|
/* there is no extent in this leaf, create first one */
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "first extent in the leaf: %u:%llu:[%d]%d\n",
|
2007-05-25 01:04:54 +08:00
|
|
|
le32_to_cpu(newext->ee_block),
|
2010-10-28 09:30:14 +08:00
|
|
|
ext4_ext_pblock(newext),
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_is_unwritten(newext),
|
2007-07-18 09:42:41 +08:00
|
|
|
ext4_ext_get_actual_len(newext));
|
2011-10-27 23:52:18 +08:00
|
|
|
nearex = EXT_FIRST_EXTENT(eh);
|
|
|
|
} else {
|
|
|
|
if (le32_to_cpu(newext->ee_block)
|
2007-05-25 01:04:54 +08:00
|
|
|
> le32_to_cpu(nearex->ee_block)) {
|
2011-10-27 23:52:18 +08:00
|
|
|
/* Insert after */
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "insert %u:%llu:[%d]%d after: "
|
2011-11-02 06:56:41 +08:00
|
|
|
"nearest %p\n",
|
2011-10-27 23:52:18 +08:00
|
|
|
le32_to_cpu(newext->ee_block),
|
|
|
|
ext4_ext_pblock(newext),
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_is_unwritten(newext),
|
2011-10-27 23:52:18 +08:00
|
|
|
ext4_ext_get_actual_len(newext),
|
|
|
|
nearex);
|
|
|
|
nearex++;
|
|
|
|
} else {
|
|
|
|
/* Insert before */
|
|
|
|
BUG_ON(newext->ee_block == nearex->ee_block);
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "insert %u:%llu:[%d]%d before: "
|
2011-11-02 06:56:41 +08:00
|
|
|
"nearest %p\n",
|
2007-05-25 01:04:54 +08:00
|
|
|
le32_to_cpu(newext->ee_block),
|
2010-10-28 09:30:14 +08:00
|
|
|
ext4_ext_pblock(newext),
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_is_unwritten(newext),
|
2007-07-18 09:42:41 +08:00
|
|
|
ext4_ext_get_actual_len(newext),
|
2011-10-27 23:52:18 +08:00
|
|
|
nearex);
|
|
|
|
}
|
|
|
|
len = EXT_LAST_EXTENT(eh) - nearex + 1;
|
|
|
|
if (len > 0) {
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "insert %u:%llu:[%d]%d: "
|
2011-10-27 23:52:18 +08:00
|
|
|
"move %d extents from 0x%p to 0x%p\n",
|
|
|
|
le32_to_cpu(newext->ee_block),
|
|
|
|
ext4_ext_pblock(newext),
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_is_unwritten(newext),
|
2011-10-27 23:52:18 +08:00
|
|
|
ext4_ext_get_actual_len(newext),
|
|
|
|
len, nearex, nearex + 1);
|
|
|
|
memmove(nearex + 1, nearex,
|
|
|
|
len * sizeof(struct ext4_extent));
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2008-04-17 22:38:59 +08:00
|
|
|
le16_add_cpu(&eh->eh_entries, 1);
|
2011-10-27 23:52:18 +08:00
|
|
|
path[depth].p_ext = nearex;
|
2006-10-11 16:21:03 +08:00
|
|
|
nearex->ee_block = newext->ee_block;
|
2010-10-28 09:30:14 +08:00
|
|
|
ext4_ext_store_pblock(nearex, ext4_ext_pblock(newext));
|
2006-10-11 16:21:03 +08:00
|
|
|
nearex->ee_len = newext->ee_len;
|
|
|
|
|
|
|
|
merge:
|
2012-07-10 04:29:28 +08:00
|
|
|
/* try to merge extents */
|
2013-08-17 09:23:41 +08:00
|
|
|
if (!(gb_flags & EXT4_GET_BLOCKS_PRE_IO))
|
2012-08-17 21:44:17 +08:00
|
|
|
ext4_ext_try_to_merge(handle, inode, path, nearex);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
|
|
|
|
/* time to correct all indexes above */
|
|
|
|
err = ext4_ext_correct_indexes(handle, inode, path);
|
|
|
|
if (err)
|
|
|
|
goto cleanup;
|
|
|
|
|
2012-08-17 21:44:17 +08:00
|
|
|
err = ext4_ext_dirty(handle, inode, path + path->p_depth);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
cleanup:
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(npath);
|
2006-10-11 16:21:03 +08:00
|
|
|
return err;
|
|
|
|
}
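The tail of the insert routine above makes room in the leaf with memmove() and then fills the freed slot. A minimal user-space sketch of that step, assuming a simplified stand-in structure (`toy_extent` and `toy_insert_extent` are hypothetical names, not kernel API):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct ext4_extent. */
struct toy_extent {
	uint32_t block;	/* first logical block covered */
	uint16_t len;	/* number of blocks covered */
};

/*
 * Insert @newext into the sorted array @ex, which currently holds
 * *@entries elements and has room for @max. Returns the slot used.
 */
static int toy_insert_extent(struct toy_extent *ex, int *entries, int max,
			     struct toy_extent newext)
{
	int pos = 0;

	assert(*entries < max);	/* the caller must have made room */

	/* find the first slot whose start block is not below the new one */
	while (pos < *entries && ex[pos].block < newext.block)
		pos++;

	/* shift the tail right by one slot; the regions overlap */
	memmove(&ex[pos + 1], &ex[pos],
		(size_t)(*entries - pos) * sizeof(ex[0]));

	ex[pos] = newext;
	(*entries)++;
	return pos;
}
```

The kernel version differs in that it works on little-endian on-disk fields and takes the position from the tree lookup instead of scanning.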
|
|
|
|
|
2019-08-12 04:32:41 +08:00
|
|
|
static int ext4_fill_es_cache_info(struct inode *inode,
|
|
|
|
ext4_lblk_t block, ext4_lblk_t num,
|
|
|
|
struct fiemap_extent_info *fieinfo)
|
|
|
|
{
|
|
|
|
ext4_lblk_t next, end = block + num - 1;
|
|
|
|
struct extent_status es;
|
|
|
|
unsigned char blksize_bits = inode->i_sb->s_blocksize_bits;
|
|
|
|
unsigned int flags;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
while (block <= end) {
|
|
|
|
next = 0;
|
|
|
|
flags = 0;
|
|
|
|
if (!ext4_es_lookup_extent(inode, block, &next, &es))
|
|
|
|
break;
|
|
|
|
if (ext4_es_is_unwritten(&es))
|
|
|
|
flags |= FIEMAP_EXTENT_UNWRITTEN;
|
|
|
|
if (ext4_es_is_delayed(&es))
|
|
|
|
flags |= (FIEMAP_EXTENT_DELALLOC |
|
|
|
|
FIEMAP_EXTENT_UNKNOWN);
|
|
|
|
if (ext4_es_is_hole(&es))
|
|
|
|
flags |= EXT4_FIEMAP_EXTENT_HOLE;
|
|
|
|
if (next == 0)
|
|
|
|
flags |= FIEMAP_EXTENT_LAST;
|
|
|
|
if (flags & (FIEMAP_EXTENT_DELALLOC|
|
|
|
|
EXT4_FIEMAP_EXTENT_HOLE))
|
|
|
|
es.es_pblk = 0;
|
|
|
|
else
|
|
|
|
es.es_pblk = ext4_es_pblock(&es);
|
|
|
|
err = fiemap_fill_next_extent(fieinfo,
|
|
|
|
(__u64)es.es_lblk << blksize_bits,
|
|
|
|
(__u64)es.es_pblk << blksize_bits,
|
|
|
|
(__u64)es.es_len << blksize_bits,
|
|
|
|
flags);
|
|
|
|
if (next == 0)
|
|
|
|
break;
|
|
|
|
block = next;
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
if (err == 1)
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
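ext4_fill_es_cache_info() reports byte offsets, so every block number is widened to 64 bits before the shift. A small sketch of why the cast comes first (`toy_blocks_to_bytes` is a hypothetical helper used only for illustration):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a 32-bit block count to bytes. The value is widened before
 * shifting so that large block numbers do not wrap at 32 bits.
 */
static uint64_t toy_blocks_to_bytes(uint32_t blocks, unsigned char blksize_bits)
{
	assert(blksize_bits < 32);
	return (uint64_t)blocks << blksize_bits;
}
```

With 4 KiB blocks (`blksize_bits` of 12), block 2^31 maps to byte offset 2^43, which a 32-bit shift could not represent.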
|
|
|
|
|
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
/*
|
2016-03-10 11:46:57 +08:00
|
|
|
* ext4_ext_determine_hole - determine hole around given block
|
|
|
|
* @inode: inode we lookup in
|
|
|
|
* @path: path in extent tree to @lblk
|
|
|
|
* @lblk: pointer to logical block around which we want to determine hole
|
|
|
|
*
|
|
|
|
* Determine hole length (and start if easily possible) around given logical
|
|
|
|
* block. We don't try too hard to find the beginning of the hole, but if @path
|
|
|
|
* actually points to extent before @lblk, we provide it.
|
|
|
|
*
|
|
|
|
* The function returns the length of a hole starting at @lblk. We update @lblk
|
|
|
|
* to the beginning of the hole if we managed to find it.
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
2016-03-10 11:46:57 +08:00
|
|
|
static ext4_lblk_t ext4_ext_determine_hole(struct inode *inode,
|
|
|
|
struct ext4_ext_path *path,
|
|
|
|
ext4_lblk_t *lblk)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
|
|
|
int depth = ext_depth(inode);
|
|
|
|
struct ext4_extent *ex;
|
2016-03-10 11:46:57 +08:00
|
|
|
ext4_lblk_t len;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
ex = path[depth].p_ext;
|
|
|
|
if (ex == NULL) {
|
2014-11-26 00:44:37 +08:00
|
|
|
/* there is no extent yet, so gap is [0;-] */
|
2016-03-10 11:46:57 +08:00
|
|
|
*lblk = 0;
|
2014-11-26 00:44:37 +08:00
|
|
|
len = EXT_MAX_BLOCKS;
|
2016-03-10 11:46:57 +08:00
|
|
|
} else if (*lblk < le32_to_cpu(ex->ee_block)) {
|
|
|
|
len = le32_to_cpu(ex->ee_block) - *lblk;
|
|
|
|
} else if (*lblk >= le32_to_cpu(ex->ee_block)
|
2007-07-18 09:42:41 +08:00
|
|
|
+ ext4_ext_get_actual_len(ex)) {
|
2008-01-29 12:58:27 +08:00
|
|
|
ext4_lblk_t next;
|
|
|
|
|
2016-03-10 11:46:57 +08:00
|
|
|
*lblk = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex);
|
2008-01-29 12:58:27 +08:00
|
|
|
next = ext4_ext_next_allocated_block(path);
|
2016-03-10 11:46:57 +08:00
|
|
|
BUG_ON(next == *lblk);
|
|
|
|
len = next - *lblk;
|
2006-10-11 16:21:03 +08:00
|
|
|
} else {
|
|
|
|
BUG();
|
|
|
|
}
|
2016-03-10 11:46:57 +08:00
|
|
|
return len;
|
|
|
|
}
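The three cases handled by ext4_ext_determine_hole() can be modeled outside the kernel as follows. This is only a sketch under simplifying assumptions: the extent and the next allocated block are passed in directly, and `toy_determine_hole` and `TOY_MAX_BLOCKS` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_BLOCKS 0xffffffffu	/* stands in for EXT_MAX_BLOCKS */

/*
 * @has_ex:            whether the leaf holds any extent at all
 * @ex_start, @ex_len: the extent the lookup landed on
 * @next:              first allocated block after that extent
 * @lblk:              in: block of interest, out: start of the hole if found
 *
 * Returns the length of the hole.
 */
static uint32_t toy_determine_hole(int has_ex, uint32_t ex_start,
				   uint32_t ex_len, uint32_t next,
				   uint32_t *lblk)
{
	if (!has_ex) {
		/* empty tree: the whole file is one hole */
		*lblk = 0;
		return TOY_MAX_BLOCKS;
	}
	if (*lblk < ex_start)
		/* hole ends where the extent begins; start is left alone */
		return ex_start - *lblk;

	/* otherwise the block must lie past the extent */
	assert(*lblk >= ex_start + ex_len);
	*lblk = ex_start + ex_len;
	assert(next != *lblk);
	return next - *lblk;
}
```

The asserts play the role of the BUG()/BUG_ON() checks in the original: a block inside the extent is not a hole and should never reach this function.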
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2016-03-10 11:46:57 +08:00
|
|
|
/*
|
|
|
|
* ext4_ext_put_gap_in_cache:
|
|
|
|
* calculate boundaries of the gap that the requested block fits into
|
|
|
|
* and cache this gap
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
ext4_ext_put_gap_in_cache(struct inode *inode, ext4_lblk_t hole_start,
|
|
|
|
ext4_lblk_t hole_len)
|
|
|
|
{
|
|
|
|
struct extent_status es;
|
|
|
|
|
2018-10-02 02:10:39 +08:00
|
|
|
ext4_es_find_extent_range(inode, &ext4_es_is_delayed, hole_start,
|
|
|
|
hole_start + hole_len - 1, &es);
|
2014-11-26 00:44:37 +08:00
|
|
|
if (es.es_len) {
|
|
|
|
/* Is there a delayed extent containing lblock? */
|
2016-03-10 11:46:57 +08:00
|
|
|
if (es.es_lblk <= hole_start)
|
2014-11-26 00:44:37 +08:00
|
|
|
return;
|
2016-03-10 11:46:57 +08:00
|
|
|
hole_len = min(es.es_lblk - hole_start, hole_len);
|
2014-11-26 00:44:37 +08:00
|
|
|
}
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, " -> %u:%u\n", hole_start, hole_len);
|
2016-03-10 11:46:57 +08:00
|
|
|
ext4_es_insert_extent(inode, hole_start, hole_len, ~0,
|
|
|
|
EXTENT_STATUS_HOLE);
|
2006-10-11 16:21:03 +08:00
|
|
|
}
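ext4_ext_put_gap_in_cache() shortens the hole so that it stops before the first delayed extent in range, and caches nothing if a delayed extent already covers the start. A sketch of just that arithmetic, with the extent status lookup replaced by plain parameters (`toy_trim_hole` is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the hole length that may be cached, or 0 if the hole start is
 * itself covered by a delayed extent and nothing should be cached.
 */
static uint32_t toy_trim_hole(uint32_t hole_start, uint32_t hole_len,
			      int has_delayed, uint32_t delayed_start)
{
	uint32_t gap;

	assert(hole_len > 0);
	if (!has_delayed)
		return hole_len;
	if (delayed_start <= hole_start)
		return 0;

	/* stop the hole right before the delayed extent */
	gap = delayed_start - hole_start;
	return gap < hole_len ? gap : hole_len;
}
```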
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* ext4_ext_rm_idx:
|
|
|
|
* removes index from the index block.
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
2008-01-29 12:58:27 +08:00
|
|
|
static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
|
2012-12-17 22:55:39 +08:00
|
|
|
struct ext4_ext_path *path, int depth)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
|
|
|
int err;
|
2006-10-11 16:21:05 +08:00
|
|
|
ext4_fsblk_t leaf;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
/* free index block */
|
2012-12-17 22:55:39 +08:00
|
|
|
depth--;
|
|
|
|
path = path + depth;
|
2010-10-28 09:30:14 +08:00
|
|
|
leaf = ext4_idx_pblock(path->p_idx);
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(path->p_hdr->eh_entries == 0)) {
|
|
|
|
EXT4_ERROR_INODE(inode, "path->p_hdr->eh_entries == 0");
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2006-12-07 12:41:33 +08:00
|
|
|
err = ext4_ext_get_access(handle, inode, path);
|
|
|
|
if (err)
|
2006-10-11 16:21:03 +08:00
|
|
|
return err;
|
2011-07-28 09:29:33 +08:00
|
|
|
|
|
|
|
if (path->p_idx != EXT_LAST_INDEX(path->p_hdr)) {
|
|
|
|
int len = EXT_LAST_INDEX(path->p_hdr) - path->p_idx;
|
|
|
|
len *= sizeof(struct ext4_extent_idx);
|
|
|
|
memmove(path->p_idx, path->p_idx + 1, len);
|
|
|
|
}
|
|
|
|
|
2008-04-17 22:38:59 +08:00
|
|
|
le16_add_cpu(&path->p_hdr->eh_entries, -1);
|
2006-12-07 12:41:33 +08:00
|
|
|
err = ext4_ext_dirty(handle, inode, path);
|
|
|
|
if (err)
|
2006-10-11 16:21:03 +08:00
|
|
|
return err;
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "index is empty, remove it, free block %llu\n", leaf);
|
2011-09-10 07:18:51 +08:00
|
|
|
trace_ext4_ext_rm_idx(inode, leaf);
|
|
|
|
|
2011-02-22 10:01:42 +08:00
|
|
|
ext4_free_blocks(handle, inode, NULL, leaf, 1,
|
2009-11-23 20:17:05 +08:00
|
|
|
EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET);
|
2012-12-17 22:55:39 +08:00
|
|
|
|
|
|
|
while (--depth >= 0) {
|
|
|
|
if (path->p_idx != EXT_FIRST_INDEX(path->p_hdr))
|
|
|
|
break;
|
|
|
|
path--;
|
|
|
|
err = ext4_ext_get_access(handle, inode, path);
|
|
|
|
if (err)
|
|
|
|
break;
|
|
|
|
path->p_idx->ei_block = (path+1)->p_idx->ei_block;
|
|
|
|
err = ext4_ext_dirty(handle, inode, path);
|
|
|
|
if (err)
|
|
|
|
break;
|
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
return err;
|
|
|
|
}
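Removing an index entry closes the gap in the index block with memmove(), unless the entry is the last one, and then decrements the entry count. A user-space sketch of that step, assuming a simplified index structure (`toy_idx` and `toy_remove_idx` are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct ext4_extent_idx. */
struct toy_idx {
	uint32_t block;	/* first logical block covered by the child */
	uint64_t leaf;	/* physical block of the child node */
};

/* Remove entry @pos from @idx, which currently holds *@entries elements. */
static void toy_remove_idx(struct toy_idx *idx, int *entries, int pos)
{
	assert(*entries > 0 && pos >= 0 && pos < *entries);

	/* nothing to move when the removed entry is the last one */
	if (pos != *entries - 1)
		memmove(&idx[pos], &idx[pos + 1],
			(size_t)(*entries - 1 - pos) * sizeof(idx[0]));
	(*entries)--;
}
```

When slot 0 is removed, the first key of the node changes, which is why the original then walks up the path and copies the new first key into the parent entries.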
|
|
|
|
|
|
|
|
/*
|
2008-08-20 10:16:05 +08:00
|
|
|
* ext4_ext_calc_credits_for_single_extent:
|
|
|
|
* This routine returns the maximum number of credits needed to insert an extent
|
|
|
|
* to the extent tree.
|
|
|
|
* When passing the actual path, the caller should calculate credits
|
|
|
|
* under i_data_sem.
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
2008-08-20 10:15:58 +08:00
|
|
|
int ext4_ext_calc_credits_for_single_extent(struct inode *inode, int nrblocks,
|
2006-10-11 16:21:03 +08:00
|
|
|
struct ext4_ext_path *path)
|
|
|
|
{
|
|
|
|
if (path) {
|
2008-08-20 10:16:05 +08:00
|
|
|
int depth = ext_depth(inode);
|
2008-08-20 10:16:03 +08:00
|
|
|
int ret = 0;
|
2008-08-20 10:16:05 +08:00
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
/* probably there is space in leaf? */
|
|
|
|
if (le16_to_cpu(path[depth].p_hdr->eh_entries)
|
2008-08-20 10:16:05 +08:00
|
|
|
< le16_to_cpu(path[depth].p_hdr->eh_max)) {
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2008-08-20 10:16:05 +08:00
|
|
|
/*
|
|
|
|
* There is some space in the leaf tree, no
|
|
|
|
* need to account for leaf block credit
|
|
|
|
*
|
|
|
|
* bitmaps and block group descriptor blocks
|
2011-10-09 03:53:49 +08:00
|
|
|
* and other metadata blocks still need to be
|
2008-08-20 10:16:05 +08:00
|
|
|
* accounted.
|
|
|
|
*/
|
2008-08-20 10:15:58 +08:00
|
|
|
/* 1 bitmap, 1 block group descriptor */
|
2008-08-20 10:16:05 +08:00
|
|
|
ret = 2 + EXT4_META_TRANS_BLOCKS(inode->i_sb);
|
2009-07-06 11:12:04 +08:00
|
|
|
return ret;
|
2008-08-20 10:16:05 +08:00
|
|
|
}
|
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2008-08-20 10:15:58 +08:00
|
|
|
return ext4_chunk_trans_blocks(inode, nrblocks);
|
2008-08-20 10:16:05 +08:00
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2008-08-20 10:16:05 +08:00
|
|
|
/*
|
2013-06-05 01:01:11 +08:00
|
|
|
* How many index/leaf blocks need to change/allocate to add @extents extents?
|
2008-08-20 10:16:05 +08:00
|
|
|
*
|
2013-06-05 01:01:11 +08:00
|
|
|
* If we add a single extent, then in the worst case, each tree level
|
|
|
|
* index/leaf need to be changed in case of the tree split.
|
2008-08-20 10:16:05 +08:00
|
|
|
*
|
2013-06-05 01:01:11 +08:00
|
|
|
* If more extents are inserted, they could cause the whole tree to split more
|
|
|
|
* than once, but this is really rare.
|
2008-08-20 10:16:05 +08:00
|
|
|
*/
|
2013-06-05 01:01:11 +08:00
|
|
|
int ext4_ext_index_trans_blocks(struct inode *inode, int extents)
|
2008-08-20 10:16:05 +08:00
|
|
|
{
|
|
|
|
int index;
|
2012-12-11 03:05:51 +08:00
|
|
|
int depth;
|
|
|
|
|
|
|
|
/* If we are converting the inline data, only one is needed here. */
|
|
|
|
if (ext4_has_inline_data(inode))
|
|
|
|
return 1;
|
|
|
|
|
|
|
|
depth = ext_depth(inode);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2013-06-05 01:01:11 +08:00
|
|
|
if (extents <= 1)
|
2008-08-20 10:16:05 +08:00
|
|
|
index = depth * 2;
|
|
|
|
else
|
|
|
|
index = depth * 3;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2008-08-20 10:16:05 +08:00
|
|
|
return index;
|
2006-10-11 16:21:03 +08:00
|
|
|
}
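The estimate in ext4_ext_index_trans_blocks() depends only on the tree depth, the number of extents being added, and whether the inode holds inline data. A sketch with those inputs passed in explicitly (`toy_index_trans_blocks` is a hypothetical name; the kernel derives depth and the inline-data flag from the inode):

```c
#include <assert.h>

/*
 * One extent: up to two blocks per level (the modified block plus a
 * newly split one). Several extents: budget three per level.
 */
static int toy_index_trans_blocks(int depth, int extents, int inline_data)
{
	assert(depth >= 0);
	if (inline_data)
		return 1;	/* converting inline data needs one block */
	return extents <= 1 ? depth * 2 : depth * 3;
}
```

For a tree of depth 2 this gives 4 blocks for a single extent and 6 for several.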
|
|
|
|
|
2013-06-12 23:48:29 +08:00
|
|
|
static inline int get_default_free_blocks_flags(struct inode *inode)
|
|
|
|
{
|
2017-06-22 09:36:51 +08:00
|
|
|
if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode) ||
|
|
|
|
ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE))
|
2013-06-12 23:48:29 +08:00
|
|
|
return EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET;
|
|
|
|
else if (ext4_should_journal_data(inode))
|
|
|
|
return EXT4_FREE_BLOCKS_FORGET;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-10-02 02:25:08 +08:00
|
|
|
/*
|
|
|
|
* ext4_rereserve_cluster - increment the reserved cluster count when
|
|
|
|
* freeing a cluster with a pending reservation
|
|
|
|
*
|
|
|
|
* @inode - file containing the cluster
|
|
|
|
* @lblk - logical block in cluster to be reserved
|
|
|
|
*
|
|
|
|
* Increments the reserved cluster count and adjusts quota in a bigalloc
|
|
|
|
* file system when freeing a partial cluster containing at least one
|
|
|
|
* delayed and unwritten block. A partial cluster meeting that
|
|
|
|
* requirement will have a pending reservation. If so, the
|
|
|
|
* RERESERVE_CLUSTER flag is used when calling ext4_free_blocks() to
|
|
|
|
* defer reserved and allocated space accounting to a subsequent call
|
|
|
|
* to this function.
|
|
|
|
*/
|
|
|
|
static void ext4_rereserve_cluster(struct inode *inode, ext4_lblk_t lblk)
|
|
|
|
{
|
|
|
|
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
|
|
|
|
struct ext4_inode_info *ei = EXT4_I(inode);
|
|
|
|
|
|
|
|
dquot_reclaim_block(inode, EXT4_C2B(sbi, 1));
|
|
|
|
|
|
|
|
spin_lock(&ei->i_block_reservation_lock);
|
|
|
|
ei->i_reserved_data_blocks++;
|
|
|
|
percpu_counter_add(&sbi->s_dirtyclusters_counter, 1);
|
|
|
|
spin_unlock(&ei->i_block_reservation_lock);
|
|
|
|
|
|
|
|
percpu_counter_add(&sbi->s_freeclusters_counter, 1);
|
|
|
|
ext4_remove_pending(inode, lblk);
|
|
|
|
}
|
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
|
2011-09-10 06:54:51 +08:00
|
|
|
struct ext4_extent *ex,
|
2018-10-02 02:25:08 +08:00
|
|
|
struct partial_cluster *partial,
|
2011-09-10 06:54:51 +08:00
|
|
|
ext4_lblk_t from, ext4_lblk_t to)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
2011-09-10 06:54:51 +08:00
|
|
|
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
|
2014-11-23 13:59:39 +08:00
|
|
|
unsigned short ee_len = ext4_ext_get_actual_len(ex);
|
2018-10-02 02:25:08 +08:00
|
|
|
ext4_fsblk_t last_pblk, pblk;
|
|
|
|
ext4_lblk_t num;
|
|
|
|
int flags;
|
|
|
|
|
|
|
|
/* only extent tail removal is allowed */
|
|
|
|
if (from < le32_to_cpu(ex->ee_block) ||
|
|
|
|
to != le32_to_cpu(ex->ee_block) + ee_len - 1) {
|
|
|
|
ext4_error(sbi->s_sb,
|
|
|
|
"strange request: removal(2) %u-%u from %u:%u",
|
|
|
|
from, to, le32_to_cpu(ex->ee_block), ee_len);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
#ifdef EXTENTS_STATS
|
|
|
|
spin_lock(&sbi->s_ext_stats_lock);
|
|
|
|
sbi->s_ext_blocks += ee_len;
|
|
|
|
sbi->s_ext_extents++;
|
|
|
|
if (ee_len < sbi->s_ext_min)
|
|
|
|
sbi->s_ext_min = ee_len;
|
|
|
|
if (ee_len > sbi->s_ext_max)
|
|
|
|
sbi->s_ext_max = ee_len;
|
|
|
|
if (ext_depth(inode) > sbi->s_depth_max)
|
|
|
|
sbi->s_depth_max = ext_depth(inode);
|
|
|
|
spin_unlock(&sbi->s_ext_stats_lock);
|
|
|
|
#endif
|
|
|
|
|
|
|
|
trace_ext4_remove_blocks(inode, ex, from, to, partial);
|
2012-09-20 02:14:53 +08:00
|
|
|
|
2011-09-10 06:54:51 +08:00
|
|
|
/*
|
2018-10-02 02:25:08 +08:00
|
|
|
* if we have a partial cluster, and it's different from the
|
|
|
|
* cluster of the last block in the extent, we free it
|
2011-09-10 06:54:51 +08:00
|
|
|
*/
|
2018-10-02 02:25:08 +08:00
|
|
|
last_pblk = ext4_ext_pblock(ex) + ee_len - 1;
|
|
|
|
|
|
|
|
if (partial->state != initial &&
|
|
|
|
partial->pclu != EXT4_B2C(sbi, last_pblk)) {
|
|
|
|
if (partial->state == tofree) {
|
|
|
|
flags = get_default_free_blocks_flags(inode);
|
|
|
|
if (ext4_is_pending(inode, partial->lblk))
|
|
|
|
flags |= EXT4_FREE_BLOCKS_RERESERVE_CLUSTER;
|
|
|
|
ext4_free_blocks(handle, inode, NULL,
|
|
|
|
EXT4_C2B(sbi, partial->pclu),
|
|
|
|
sbi->s_cluster_ratio, flags);
|
|
|
|
if (flags & EXT4_FREE_BLOCKS_RERESERVE_CLUSTER)
|
|
|
|
ext4_rereserve_cluster(inode, partial->lblk);
|
|
|
|
}
|
|
|
|
partial->state = initial;
|
|
|
|
}
|
|
|
|
|
|
|
|
num = le32_to_cpu(ex->ee_block) + ee_len - from;
|
|
|
|
pblk = ext4_ext_pblock(ex) + ee_len - num;
|
2011-09-10 06:54:51 +08:00
|
|
|
|
|
|
|
/*
|
2018-10-02 02:25:08 +08:00
|
|
|
* We free the partial cluster at the end of the extent (if any),
|
|
|
|
* unless the cluster is used by another extent (partial_cluster
|
|
|
|
* state is nofree). If a partial cluster exists here, it must be
|
|
|
|
* shared with the last block in the extent.
|
2011-09-10 06:54:51 +08:00
|
|
|
*/
|
2018-10-02 02:25:08 +08:00
|
|
|
flags = get_default_free_blocks_flags(inode);
|
|
|
|
|
|
|
|
/* partial, left end cluster aligned, right end unaligned */
|
|
|
|
if ((EXT4_LBLK_COFF(sbi, to) != sbi->s_cluster_ratio - 1) &&
|
|
|
|
(EXT4_LBLK_CMASK(sbi, to) >= from) &&
|
|
|
|
(partial->state != nofree)) {
|
|
|
|
if (ext4_is_pending(inode, to))
|
|
|
|
flags |= EXT4_FREE_BLOCKS_RERESERVE_CLUSTER;
|
2011-09-10 06:54:51 +08:00
|
|
|
ext4_free_blocks(handle, inode, NULL,
|
2018-10-02 02:25:08 +08:00
|
|
|
EXT4_PBLK_CMASK(sbi, last_pblk),
|
2011-09-10 06:54:51 +08:00
|
|
|
sbi->s_cluster_ratio, flags);
|
2018-10-02 02:25:08 +08:00
|
|
|
if (flags & EXT4_FREE_BLOCKS_RERESERVE_CLUSTER)
|
|
|
|
ext4_rereserve_cluster(inode, to);
|
|
|
|
partial->state = initial;
|
|
|
|
flags = get_default_free_blocks_flags(inode);
|
2011-09-10 06:54:51 +08:00
|
|
|
}
|
|
|
|
|
2018-10-02 02:25:08 +08:00
|
|
|
flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
|
2013-05-28 11:33:35 +08:00
|
|
|
|
2018-10-02 02:25:08 +08:00
|
|
|
/*
|
|
|
|
* For bigalloc file systems, we never free a partial cluster
|
|
|
|
* at the beginning of the extent. Instead, we check to see if we
|
|
|
|
* need to free it on a subsequent call to ext4_remove_blocks,
|
|
|
|
* or at the end of ext4_ext_rm_leaf or ext4_ext_remove_space.
|
|
|
|
*/
|
|
|
|
flags |= EXT4_FREE_BLOCKS_NOFREE_FIRST_CLUSTER;
|
|
|
|
ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
|
|
|
|
|
|
|
|
/* reset the partial cluster if we've freed past it */
|
|
|
|
if (partial->state != initial && partial->pclu != EXT4_B2C(sbi, pblk))
|
|
|
|
partial->state = initial;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we've freed the entire extent but the beginning is not left
|
|
|
|
* cluster aligned and is not marked as ineligible for freeing we
|
|
|
|
* record the partial cluster at the beginning of the extent. It
|
|
|
|
* wasn't freed by the preceding ext4_free_blocks() call, and we
|
|
|
|
* need to look farther to the left to determine if it's to be freed
|
|
|
|
* (not shared with another extent). Else, reset the partial
|
|
|
|
* cluster - we're either done freeing or the beginning of the
|
|
|
|
* extent is left cluster aligned.
|
|
|
|
*/
|
|
|
|
if (EXT4_LBLK_COFF(sbi, from) && num == ee_len) {
|
|
|
|
if (partial->state == initial) {
|
|
|
|
partial->pclu = EXT4_B2C(sbi, pblk);
|
|
|
|
partial->lblk = from;
|
|
|
|
partial->state = tofree;
|
2014-11-23 13:59:39 +08:00
|
|
|
}
|
2018-10-02 02:25:08 +08:00
|
|
|
} else {
|
|
|
|
partial->state = initial;
|
|
|
|
}
|
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
return 0;
|
|
|
|
}
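The "left end cluster aligned, right end unaligned" test in ext4_remove_blocks() is plain mask arithmetic on the cluster ratio. A sketch of that condition, assuming the ratio is a power of two as it is on bigalloc file systems; the helper names are hypothetical and only modeled on EXT4_LBLK_COFF() and EXT4_LBLK_CMASK():

```c
#include <assert.h>
#include <stdint.h>

/* Offset of @lblk inside its cluster; modeled on EXT4_LBLK_COFF(). */
static uint32_t toy_lblk_coff(uint32_t ratio, uint32_t lblk)
{
	return lblk & (ratio - 1);
}

/* First block of the cluster holding @lblk; modeled on EXT4_LBLK_CMASK(). */
static uint32_t toy_lblk_cmask(uint32_t ratio, uint32_t lblk)
{
	return lblk & ~(ratio - 1);
}

/*
 * True when the cluster holding @to is only partly covered on the right,
 * starts inside the removed range [@from, @to], and is not pinned by
 * another extent (@nofree).
 */
static int toy_tail_cluster_freeable(uint32_t ratio, uint32_t from,
				     uint32_t to, int nofree)
{
	assert(ratio != 0 && (ratio & (ratio - 1)) == 0);
	return toy_lblk_coff(ratio, to) != ratio - 1 &&
	       toy_lblk_cmask(ratio, to) >= from &&
	       !nofree;
}
```

With a ratio of 1 (no bigalloc) the first comparison is never true, so the special case is skipped and the ordinary ext4_free_blocks() call handles everything.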
|
|
|
|
|
2011-05-25 19:41:43 +08:00
|
|
|
/*
|
|
|
|
* ext4_ext_rm_leaf() Removes the extents associated with the
|
2014-11-23 13:58:11 +08:00
|
|
|
* blocks appearing between "start" and "end". Both "start"
|
|
|
|
* and "end" must appear in the same extent or -EFSCORRUPTED is returned.
|
2011-05-25 19:41:43 +08:00
|
|
|
*
|
|
|
|
* @handle: The journal handle
|
|
|
|
* @inode: The file's inode
|
|
|
|
* @path: The path to the leaf
|
2013-05-28 11:33:35 +08:00
|
|
|
* @partial: The partial cluster which we'll have to free if all extents
|
2014-11-23 13:58:11 +08:00
|
|
|
* have been released from it. However, if its state is
|
|
|
|
* nofree, it's a cluster just to the right of the
|
|
|
|
* punched region and it must not be freed.
|
2011-05-25 19:41:43 +08:00
|
|
|
* @start: The first block to remove
|
|
|
|
* @end: The last block to remove
|
|
|
|
*/
|
2006-10-11 16:21:03 +08:00
|
|
|
static int
|
|
|
|
ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
|
2013-05-28 11:33:35 +08:00
|
|
|
struct ext4_ext_path *path,
|
2018-10-02 02:25:08 +08:00
|
|
|
struct partial_cluster *partial,
|
2011-09-10 06:54:51 +08:00
|
|
|
ext4_lblk_t start, ext4_lblk_t end)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
2011-09-10 06:54:51 +08:00
|
|
|
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
|
2006-10-11 16:21:03 +08:00
|
|
|
int err = 0, correct_index = 0;
|
2019-11-06 00:44:29 +08:00
|
|
|
int depth = ext_depth(inode), credits, revoke_credits;
|
2006-10-11 16:21:03 +08:00
|
|
|
struct ext4_extent_header *eh;
|
2011-10-25 17:35:05 +08:00
|
|
|
ext4_lblk_t a, b;
|
2008-01-29 12:58:27 +08:00
|
|
|
unsigned num;
|
|
|
|
ext4_lblk_t ex_ee_block;
|
2006-10-11 16:21:03 +08:00
|
|
|
unsigned short ex_ee_len;
|
2014-04-21 11:45:47 +08:00
|
|
|
unsigned unwritten = 0;
|
2006-10-11 16:21:03 +08:00
|
|
|
struct ext4_extent *ex;
|
2013-05-28 11:33:35 +08:00
|
|
|
ext4_fsblk_t pblk;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2007-07-18 21:19:09 +08:00
|
|
|
/* the header must be checked already in ext4_ext_remove_space() */
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "truncate since %u in leaf to %u\n", start, end);
|
2006-10-11 16:21:03 +08:00
|
|
|
if (!path[depth].p_hdr)
|
|
|
|
path[depth].p_hdr = ext_block_hdr(path[depth].p_bh);
|
|
|
|
eh = path[depth].p_hdr;
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(path[depth].p_hdr == NULL)) {
|
|
|
|
EXT4_ERROR_INODE(inode, "path[%d].p_hdr == NULL", depth);
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2010-03-03 00:46:09 +08:00
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
/* find where to start removing */
|
2013-07-01 20:12:41 +08:00
|
|
|
ex = path[depth].p_ext;
|
|
|
|
if (!ex)
|
|
|
|
ex = EXT_LAST_EXTENT(eh);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
ex_ee_block = le32_to_cpu(ex->ee_block);
|
2007-07-18 09:42:41 +08:00
|
|
|
ex_ee_len = ext4_ext_get_actual_len(ex);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2018-10-02 02:25:08 +08:00
|
|
|
trace_ext4_ext_rm_leaf(inode, start, ex, partial);
|
2011-09-10 07:18:51 +08:00
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
while (ex >= EXT_FIRST_EXTENT(eh) &&
|
|
|
|
ex_ee_block + ex_ee_len > start) {
|
2009-06-11 02:22:55 +08:00
|
|
|
|
2014-04-21 11:45:47 +08:00
|
|
|
if (ext4_ext_is_unwritten(ex))
|
|
|
|
unwritten = 1;
|
2009-06-11 02:22:55 +08:00
|
|
|
else
|
2014-04-21 11:45:47 +08:00
|
|
|
unwritten = 0;
|
2009-06-11 02:22:55 +08:00
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "remove ext %u:[%d]%d\n", ex_ee_block,
|
2014-04-21 11:45:47 +08:00
|
|
|
unwritten, ex_ee_len);
|
2006-10-11 16:21:03 +08:00
|
|
|
path[depth].p_ext = ex;
|
|
|
|
|
2022-08-17 10:59:28 +08:00
|
|
|
a = max(ex_ee_block, start);
|
|
|
|
b = min(ex_ee_block + ex_ee_len - 1, end);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, " border %u:%u\n", a, b);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2011-05-25 19:41:43 +08:00
|
|
|
/* If this extent is beyond the end of the hole, skip it */
|
ext4: rewrite punch hole to use ext4_ext_remove_space()
This commit rewrites ext4 punch hole implementation to use
ext4_ext_remove_space() instead of its home grown way of doing this via
ext4_ext_map_blocks(). There are several reasons for changing this.
Firstly it is quite non obvious that punching hole needs to
ext4_ext_map_blocks() to punch a hole, especially given that this
function should map blocks, not unmap it. It also required a lot of new
code in ext4_ext_map_blocks().
Secondly the design of it is not very effective. The reason is that we
are trying to punch out blocks in ext4_ext_punch_hole() in opposite
direction than in ext4_ext_rm_leaf() which causes the ext4_ext_rm_leaf()
to iterate through the whole tree from the end to the start to find the
requested extent for every extent we are going to punch out.
And finally the current implementation does not use the existing code,
but bring a lot of new code, which is IMO unnecessary since there
already is some infrastructure we can use. Specifically
ext4_ext_remove_space().
This commit changes ext4_ext_remove_space() to accept 'end' parameter so
we can not only truncate to the end of file, but also remove the space
in the middle of the file (punch a hole). Moreover, because the last
block to punch out, might be in the middle of the extent, we have to
split the extent at 'end + 1' so ext4_ext_rm_leaf() can easily either
remove the whole first part of split extent, or change its size.
ext4_ext_remove_space() is then used to actually remove the space
(extents) from within the hole, instead of ext4_ext_map_blocks().
Note that this also fix the issue with punch hole, where we would forget
to remove empty index blocks from the extent tree, resulting in double
free block error and file system corruption. This is simply because we
now use different code path, where this problem does not exist.
This has been tested with fsx running for several days and xfstests,
plus xfstest #251 with '-o discard' run on the loop image (which
converts discard requests into punch hole to the backing file). All of
it on 1K and 4K file system block size.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-20 11:03:19 +08:00
|
|
|
if (end < ex_ee_block) {
|
2013-05-28 11:33:35 +08:00
|
|
|
/*
|
|
|
|
* We're going to skip this extent and move to another,
|
2014-11-23 13:55:42 +08:00
|
|
|
* so note that its first cluster is in use to avoid
|
|
|
|
* freeing it when removing blocks. Eventually, the
|
|
|
|
* right edge of the truncated/punched region will
|
|
|
|
* be just to the left.
|
2013-05-28 11:33:35 +08:00
|
|
|
*/
|
2014-11-23 13:55:42 +08:00
|
|
|
if (sbi->s_cluster_ratio > 1) {
|
|
|
|
pblk = ext4_ext_pblock(ex);
|
2018-10-02 02:25:08 +08:00
|
|
|
partial->pclu = EXT4_B2C(sbi, pblk);
|
|
|
|
partial->state = nofree;
|
2014-11-23 13:55:42 +08:00
|
|
|
}
|
2011-05-25 19:41:43 +08:00
|
|
|
ex--;
|
|
|
|
ex_ee_block = le32_to_cpu(ex->ee_block);
|
|
|
|
ex_ee_len = ext4_ext_get_actual_len(ex);
|
|
|
|
continue;
|
2011-10-25 17:35:05 +08:00
|
|
|
} else if (b != ex_ee_block + ex_ee_len - 1) {
|
2012-03-20 11:07:43 +08:00
|
|
|
EXT4_ERROR_INODE(inode,
|
|
|
|
"can not handle truncate %u:%u "
|
|
|
|
"on extent %u:%u",
|
|
|
|
start, end, ex_ee_block,
|
|
|
|
ex_ee_block + ex_ee_len - 1);
|
2015-10-18 04:16:04 +08:00
|
|
|
err = -EFSCORRUPTED;
|
2011-10-25 17:35:05 +08:00
|
|
|
goto out;
|
2006-10-11 16:21:03 +08:00
|
|
|
} else if (a != ex_ee_block) {
|
|
|
|
/* remove tail of the extent */
|
2011-10-25 17:35:05 +08:00
|
|
|
num = a - ex_ee_block;
|
2006-10-11 16:21:03 +08:00
|
|
|
} else {
|
|
|
|
/* remove whole extent: excellent! */
|
|
|
|
num = 0;
|
|
|
|
}
|
2008-08-02 09:59:19 +08:00
|
|
|
/*
|
|
|
|
* 3 for leaf, sb, and inode plus 2 (bmap and group
|
|
|
|
* descriptor) for each block group; assume two block
|
|
|
|
* groups plus ex_ee_len/blocks_per_block_group for
|
|
|
|
* the worst case
|
|
|
|
*/
|
|
|
|
credits = 7 + 2*(ex_ee_len/EXT4_BLOCKS_PER_GROUP(inode->i_sb));
|
2006-10-11 16:21:03 +08:00
|
|
|
if (ex == EXT_FIRST_EXTENT(eh)) {
|
|
|
|
correct_index = 1;
|
|
|
|
credits += (ext_depth(inode)) + 1;
|
|
|
|
}
|
2009-12-09 11:42:15 +08:00
|
|
|
credits += EXT4_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb);
|
2019-11-06 00:44:29 +08:00
|
|
|
/*
|
|
|
|
* We may end up freeing some index blocks and data from the
|
|
|
|
* punched range. Note that partial clusters are accounted for
|
|
|
|
* by ext4_free_data_revoke_credits().
|
|
|
|
*/
|
|
|
|
revoke_credits =
|
|
|
|
ext4_free_metadata_revoke_credits(inode->i_sb,
|
|
|
|
ext_depth(inode)) +
|
|
|
|
ext4_free_data_revoke_credits(inode, b - a + 1);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2019-11-06 00:44:16 +08:00
|
|
|
err = ext4_datasem_ensure_credits(handle, inode, credits,
|
2019-11-06 00:44:29 +08:00
|
|
|
credits, revoke_credits);
|
2019-11-06 00:44:16 +08:00
|
|
|
if (err) {
|
|
|
|
if (err > 0)
|
|
|
|
err = -EAGAIN;
|
2006-10-11 16:21:03 +08:00
|
|
|
goto out;
|
2019-11-06 00:44:16 +08:00
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
err = ext4_ext_get_access(handle, inode, path + depth);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
|
|
|
|
2018-10-02 02:25:08 +08:00
|
|
|
err = ext4_remove_blocks(handle, inode, ex, partial, a, b);
|
2006-10-11 16:21:03 +08:00
|
|
|
if (err)
|
|
|
|
goto out;
|
|
|
|
|
2011-10-25 17:35:05 +08:00
|
|
|
if (num == 0)
|
2006-10-11 16:21:07 +08:00
|
|
|
/* this extent is removed; mark slot entirely unused */
|
2006-10-11 16:21:05 +08:00
|
|
|
ext4_ext_store_pblock(ex, 0);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
ex->ee_len = cpu_to_le16(num);
|
2007-07-18 21:02:56 +08:00
|
|
|
/*
|
2014-04-21 11:45:47 +08:00
|
|
|
* Do not mark unwritten if all the blocks in the
|
2007-07-18 21:02:56 +08:00
|
|
|
* extent have been removed.
|
|
|
|
*/
|
2014-04-21 11:45:47 +08:00
|
|
|
if (unwritten && num)
|
|
|
|
ext4_ext_mark_unwritten(ex);
|
2011-05-25 19:41:43 +08:00
|
|
|
/*
|
|
|
|
* If the extent was completely released,
|
|
|
|
* we need to remove it from the leaf
|
|
|
|
*/
|
|
|
|
if (num == 0) {
|
2011-06-06 12:05:17 +08:00
|
|
|
if (end != EXT_MAX_BLOCKS - 1) {
|
2011-05-25 19:41:43 +08:00
|
|
|
/*
|
|
|
|
* For hole punching, we need to scoot all the
|
|
|
|
* extents up when an extent is removed so that
|
|
|
|
* we don't have blank extents in the middle
|
|
|
|
*/
|
|
|
|
memmove(ex, ex+1, (EXT_LAST_EXTENT(eh) - ex) *
|
|
|
|
sizeof(struct ext4_extent));
|
|
|
|
|
|
|
|
/* Now get rid of the one at the end */
|
|
|
|
memset(EXT_LAST_EXTENT(eh), 0,
|
|
|
|
sizeof(struct ext4_extent));
|
|
|
|
}
|
|
|
|
le16_add_cpu(&eh->eh_entries, -1);
|
2014-11-23 13:58:11 +08:00
|
|
|
}
|
2011-05-25 19:41:43 +08:00
|
|
|
|
2011-10-25 17:35:05 +08:00
|
|
|
err = ext4_ext_dirty(handle, inode, path + depth);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "new extent: %u:%u:%llu\n", ex_ee_block, num,
|
2010-10-28 09:30:14 +08:00
|
|
|
ext4_ext_pblock(ex));
|
2006-10-11 16:21:03 +08:00
|
|
|
ex--;
|
|
|
|
ex_ee_block = le32_to_cpu(ex->ee_block);
|
2007-07-18 09:42:41 +08:00
|
|
|
ex_ee_len = ext4_ext_get_actual_len(ex);
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (correct_index && eh->eh_entries)
|
|
|
|
err = ext4_ext_correct_indexes(handle, inode, path);
|
|
|
|
|
2011-09-10 06:54:51 +08:00
|
|
|
/*
|
2014-04-02 07:49:30 +08:00
|
|
|
* If there's a partial cluster and at least one extent remains in
|
|
|
|
* the leaf, free the partial cluster if it isn't shared with the
|
2014-11-23 13:58:11 +08:00
|
|
|
* current extent. If it is shared with the current extent
|
2018-10-02 02:25:08 +08:00
|
|
|
* we reset the partial cluster because we've reached the start of the
|
2014-11-23 13:58:11 +08:00
|
|
|
* truncated/punched region and we're done removing blocks.
|
2011-09-10 06:54:51 +08:00
|
|
|
*/
|
2018-10-02 02:25:08 +08:00
|
|
|
if (partial->state == tofree && ex >= EXT_FIRST_EXTENT(eh)) {
|
2014-11-23 13:58:11 +08:00
|
|
|
pblk = ext4_ext_pblock(ex) + ex_ee_len - 1;
|
2018-10-02 02:25:08 +08:00
|
|
|
if (partial->pclu != EXT4_B2C(sbi, pblk)) {
|
|
|
|
int flags = get_default_free_blocks_flags(inode);
|
|
|
|
|
|
|
|
if (ext4_is_pending(inode, partial->lblk))
|
|
|
|
flags |= EXT4_FREE_BLOCKS_RERESERVE_CLUSTER;
|
2014-11-23 13:58:11 +08:00
|
|
|
ext4_free_blocks(handle, inode, NULL,
|
2018-10-02 02:25:08 +08:00
|
|
|
EXT4_C2B(sbi, partial->pclu),
|
|
|
|
sbi->s_cluster_ratio, flags);
|
|
|
|
if (flags & EXT4_FREE_BLOCKS_RERESERVE_CLUSTER)
|
|
|
|
ext4_rereserve_cluster(inode, partial->lblk);
|
2014-11-23 13:58:11 +08:00
|
|
|
}
|
2018-10-02 02:25:08 +08:00
|
|
|
partial->state = initial;
|
2011-09-10 06:54:51 +08:00
|
|
|
}
|
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
/* if this leaf is free, then we should
|
|
|
|
* remove it from index block above */
|
|
|
|
if (err == 0 && eh->eh_entries == 0 && path[depth].p_bh != NULL)
|
2012-12-17 22:55:39 +08:00
|
|
|
err = ext4_ext_rm_idx(handle, inode, path, depth);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
out:
|
|
|
|
return err;
|
|
|
|
}
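Inside the loop of ext4_ext_rm_leaf(), each extent is clipped against the requested range and then handled in one of four ways. A user-space sketch of that decision, assuming the extent is given as a start block and length (`toy_action` and `toy_classify` are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

enum toy_action { TOY_SKIP, TOY_ERROR, TOY_TAIL, TOY_WHOLE };

/*
 * Decide what removing [@start, @end] does to the extent that covers
 * [@ex_start, @ex_start + @ex_len - 1]. On TOY_TAIL and TOY_WHOLE,
 * *@new_len receives the number of blocks the extent keeps.
 */
static enum toy_action toy_classify(uint32_t ex_start, uint16_t ex_len,
				    uint32_t start, uint32_t end,
				    uint32_t *new_len)
{
	uint32_t last, a, b;

	assert(ex_len > 0);
	last = ex_start + ex_len - 1;
	a = ex_start > start ? ex_start : start;	/* max() */
	b = last < end ? last : end;			/* min() */

	if (end < ex_start)
		return TOY_SKIP;	/* extent lies past the range */
	if (b != last)
		return TOY_ERROR;	/* range ends inside the extent */
	if (a != ex_start) {
		*new_len = a - ex_start;	/* keep the head, drop the tail */
		return TOY_TAIL;
	}
	*new_len = 0;			/* whole extent goes away */
	return TOY_WHOLE;
}
```

The error case corresponds to the -EFSCORRUPTED path: the caller is expected to have split the extent at `end + 1` beforehand, so a range that stops in the middle of an extent indicates an inconsistency.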
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* ext4_ext_more_to_rm:
|
|
|
|
* returns 1 if current index has to be freed (even partially)
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
2006-12-07 12:41:36 +08:00
|
|
|
static int
|
2006-10-11 16:21:03 +08:00
|
|
|
ext4_ext_more_to_rm(struct ext4_ext_path *path)
|
|
|
|
{
|
|
|
|
BUG_ON(path->p_idx == NULL);
|
|
|
|
|
|
|
|
if (path->p_idx < EXT_FIRST_INDEX(path->p_hdr))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* if truncate on deeper level happened, it wasn't partial,
|
2006-10-11 16:21:03 +08:00
|
|
|
* so we have to consider current index for truncation
|
|
|
|
*/
|
|
|
|
if (le16_to_cpu(path->p_hdr->eh_entries) == path->p_block)
|
|
|
|
return 0;
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2013-04-04 00:45:17 +08:00
|
|
|
int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
|
|
|
|
ext4_lblk_t end)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
2014-11-23 13:55:42 +08:00
|
|
|
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
|
2006-10-11 16:21:03 +08:00
|
|
|
int depth = ext_depth(inode);
|
2012-07-23 10:49:08 +08:00
|
|
|
struct ext4_ext_path *path = NULL;
|
2018-10-02 02:25:08 +08:00
|
|
|
struct partial_cluster partial;
|
2006-10-11 16:21:03 +08:00
|
|
|
handle_t *handle;
|
2012-10-01 11:03:50 +08:00
|
|
|
int i = 0, err = 0;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2018-10-02 02:25:08 +08:00
|
|
|
partial.pclu = 0;
|
|
|
|
partial.lblk = 0;
|
|
|
|
partial.state = initial;
|
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "truncate since %u to %u\n", start, end);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
/* probably first extent we're gonna free will be last in block */
|
2019-11-06 00:44:29 +08:00
|
|
|
handle = ext4_journal_start_with_revoke(inode, EXT4_HT_TRUNCATE,
|
|
|
|
depth + 1,
|
|
|
|
ext4_free_metadata_revoke_credits(inode->i_sb, depth));
|
2006-10-11 16:21:03 +08:00
|
|
|
if (IS_ERR(handle))
|
|
|
|
return PTR_ERR(handle);
|
|
|
|
|
2010-05-17 13:00:00 +08:00
|
|
|
again:
|
2013-05-28 11:32:35 +08:00
|
|
|
trace_ext4_ext_remove_space(inode, start, end, depth);
|
2011-09-10 07:18:51 +08:00
|
|
|
|
ext4: rewrite punch hole to use ext4_ext_remove_space()
This commit rewrites the ext4 punch hole implementation to use
ext4_ext_remove_space() instead of its home-grown way of doing this via
ext4_ext_map_blocks(). There are several reasons for changing this.
Firstly, it is far from obvious that punching a hole should go through
ext4_ext_map_blocks(), especially given that this function is meant to
map blocks, not unmap them. It also required a lot of new code in
ext4_ext_map_blocks().
Secondly, the design is not very efficient. The reason is that we
are trying to punch out blocks in ext4_ext_punch_hole() in the opposite
direction to ext4_ext_rm_leaf(), which causes ext4_ext_rm_leaf()
to iterate through the whole tree from the end to the start to find the
requested extent for every extent we are going to punch out.
And finally, the current implementation does not use the existing code
but brings in a lot of new code, which is IMO unnecessary since there
already is some infrastructure we can use, specifically
ext4_ext_remove_space().
This commit changes ext4_ext_remove_space() to accept an 'end' parameter so
we can not only truncate to the end of the file, but also remove space
in the middle of the file (punch a hole). Moreover, because the last
block to punch out might be in the middle of an extent, we have to
split the extent at 'end + 1' so ext4_ext_rm_leaf() can easily either
remove the whole first part of the split extent, or change its size.
ext4_ext_remove_space() is then used to actually remove the space
(extents) from within the hole, instead of ext4_ext_map_blocks().
Note that this also fixes the issue with punch hole where we would forget
to remove empty index blocks from the extent tree, resulting in a double
free block error and file system corruption. This is simply because we
now use a different code path, where this problem does not exist.
This has been tested with fsx running for several days and xfstests,
plus xfstest #251 with '-o discard' run on the loop image (which
converts discard requests into punch hole to the backing file), all
on 1K and 4K file system block sizes.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-20 11:03:19 +08:00
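The split step described in the commit message above can be illustrated with a minimal, self-contained sketch. This is not the kernel code: `struct toy_extent` and `toy_split_after()` are hypothetical names that model an extent only as a starting logical block plus a length. The sketch mirrors the `end >= ee_block && end < ex_end` test that decides whether a split at 'end + 1' is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified extent: logical start block and length. */
struct toy_extent {
	uint32_t start;
	uint32_t len;
};

/*
 * If 'end' lies inside 'ex' but is not its last block, split 'ex' so
 * that 'end' becomes the last block of the left part. Returns 1 when a
 * split happened and fills *left and *right, otherwise returns 0.
 */
static int toy_split_after(struct toy_extent ex, uint32_t end,
			   struct toy_extent *left, struct toy_extent *right)
{
	uint32_t ex_end;

	assert(ex.len > 0);	/* toy precondition: no empty extents */
	ex_end = ex.start + ex.len - 1;

	if (end < ex.start || end >= ex_end)
		return 0;	/* 'end' is outside, or already the last block */

	left->start = ex.start;
	left->len = end - ex.start + 1;
	right->start = end + 1;
	right->len = ex_end - end;
	return 1;
}
```

For an extent covering blocks 100..149 and `end == 119`, the sketch yields a left part of 100..119 and a right part of 120..149. With `end == 149`, no split is needed because the hole already ends on the extent boundary.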
|
|
|
/*
|
|
|
|
* Check if we are removing extents inside the extent tree. If that
|
|
|
|
* is the case, we are going to punch a hole inside the extent tree
|
|
|
|
* so we have to check whether we need to split the extent covering
|
|
|
|
* the last block to remove so we can easily remove the part of it
|
|
|
|
* in ext4_ext_rm_leaf().
|
|
|
|
*/
|
|
|
|
if (end < EXT_MAX_BLOCKS - 1) {
|
|
|
|
struct ext4_extent *ex;
|
2014-11-23 13:55:42 +08:00
|
|
|
ext4_lblk_t ee_block, ex_end, lblk;
|
|
|
|
ext4_fsblk_t pblk;
|
2012-03-20 11:03:19 +08:00
|
|
|
|
2014-11-23 13:55:42 +08:00
|
|
|
/* find extent for or closest extent to this block */
|
2020-05-08 01:50:28 +08:00
|
|
|
path = ext4_find_extent(inode, end, NULL,
|
|
|
|
EXT4_EX_NOCACHE | EXT4_EX_NOFAIL);
|
2012-03-20 11:03:19 +08:00
|
|
|
if (IS_ERR(path)) {
|
|
|
|
ext4_journal_stop(handle);
|
|
|
|
return PTR_ERR(path);
|
|
|
|
}
|
|
|
|
depth = ext_depth(inode);
|
2012-10-01 11:03:50 +08:00
|
|
|
/* Leaf node may not exist only if inode has no blocks at all */
|
2012-03-20 11:03:19 +08:00
|
|
|
ex = path[depth].p_ext;
|
2012-07-23 10:49:08 +08:00
|
|
|
if (!ex) {
|
2012-10-01 11:03:50 +08:00
|
|
|
if (depth) {
|
|
|
|
EXT4_ERROR_INODE(inode,
|
|
|
|
"path[%d].p_hdr == NULL",
|
|
|
|
depth);
|
2015-10-18 04:16:04 +08:00
|
|
|
err = -EFSCORRUPTED;
|
2012-10-01 11:03:50 +08:00
|
|
|
}
|
|
|
|
goto out;
|
2012-07-23 10:49:08 +08:00
|
|
|
}
|
2012-03-20 11:03:19 +08:00
|
|
|
|
|
|
|
ee_block = le32_to_cpu(ex->ee_block);
|
2014-11-23 13:55:42 +08:00
|
|
|
ex_end = ee_block + ext4_ext_get_actual_len(ex) - 1;
|
2012-03-20 11:03:19 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* See if the last block is inside the extent, if so split
|
|
|
|
* the extent at 'end' block so we can easily remove the
|
|
|
|
* tail of the first part of the split extent in
|
|
|
|
* ext4_ext_rm_leaf().
|
|
|
|
*/
|
2014-11-23 13:55:42 +08:00
|
|
|
if (end >= ee_block && end < ex_end) {
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we're going to split the extent, note that
|
|
|
|
* the cluster containing the block after 'end' is
|
|
|
|
* in use to avoid freeing it when removing blocks.
|
|
|
|
*/
|
|
|
|
if (sbi->s_cluster_ratio > 1) {
|
ext4: fix partial cluster initialization when splitting extent
Fix the bug when calculating the physical block number of the first
block in the split extent.
This bug occasionally causes xfstests shared/298 to fail on ext4 with
bigalloc enabled. Ext4 error messages indicate that previously freed
blocks are being freed again, and the following fsck will fail due to
the inconsistency of block bitmap and bg descriptor.
The following is an example case:
1. First, initialize an ext4 filesystem with cluster size '16K', block size
'4K', in which case one cluster contains four blocks.
2. Create one file (e.g., xxx.img) on this ext4 filesystem. Now the extent
tree of this file is like:
...
36864:[0]4:220160
36868:[0]14332:145408
51200:[0]2:231424
...
3. Then execute PUNCH_HOLE fallocate on this file. The hole range is
like:
...
ext4_ext_remove_space: dev 254,16 ino 12 since 49506 end 49506 depth 1
ext4_ext_remove_space: dev 254,16 ino 12 since 49544 end 49546 depth 1
ext4_ext_remove_space: dev 254,16 ino 12 since 49605 end 49607 depth 1
...
4. Then the extent tree of this file after punching is like
...
49507:[0]37:158047
49547:[0]58:158087
...
5. Detailed procedure of punching hole [49544, 49546]
5.1. The block address space:
```
lblk        ~49505  49506   49507~49543     49544~49546     49547~
          ---------+------+-------------+----------------+--------
            extent | hole |   extent    |       hole     | extent
          ---------+------+-------------+----------------+--------
pblk       ~158045  158046  158047~158083  158084~158086   158087~
```
5.2. The detailed layout of cluster 39521:
```
                cluster 39521
        <------------------------------->
                hole              extent
        <----------------------><--------
lblk      49544   49545   49546   49547
        +-------+-------+-------+-------+
        |       |       |       |       |
        +-------+-------+-------+-------+
pblk     158084  158085  158086  158087
```
5.3. The ftrace output when punching hole [49544, 49546]:
- ext4_ext_remove_space (start 49544, end 49546)
- ext4_ext_rm_leaf (start 49544, end 49546, last_extent [49507(158047), 40], partial [pclu 39522 lblk 0 state 2])
- ext4_remove_blocks (extent [49507(158047), 40], from 49544 to 49546, partial [pclu 39522 lblk 0 state 2])
- ext4_free_blocks: (block 158084 count 4)
- ext4_mballoc_free (extent 1/6753/1)
5.4. Ext4 error message in dmesg:
EXT4-fs error (device vdb): mb_free_blocks:1457: group 1, block 158084:freeing already freed block (bit 6753); block bitmap corrupt.
EXT4-fs error (device vdb): ext4_mb_generate_buddy:747: group 1, block bitmap and bg descriptor inconsistent: 19550 vs 19551 free clusters
In this case, the whole cluster 39521 is freed mistakenly when freeing
pblock 158084~158086 (i.e., the first three blocks of this cluster),
although pblock 158087 (the last remaining block of this cluster) has
not been freed yet.
The root cause of this issue is that the pclu of the partial cluster is
calculated mistakenly in ext4_ext_remove_space(). The correct
partial_cluster.pclu (i.e., the cluster number of the first block in the
next extent, that is, lblock 49547 (pblock 158087)) should be 39521 rather
than 39522.
Fixes: f4226d9ea400 ("ext4: fix partial cluster initialization")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Eric Whitney <enwlinux@gmail.com>
Cc: stable@kernel.org # v3.19+
Link: https://lore.kernel.org/r/1590121124-37096-1-git-send-email-jefflexu@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-22 12:18:44 +08:00
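The arithmetic behind the example in the commit message above can be checked with a small sketch. The helper names `pblk_after_end()` and `blk_to_cluster()` are hypothetical stand-ins, not kernel functions. The second one only models a block-to-cluster conversion as a right shift, assuming a power-of-two cluster ratio (4 blocks per cluster here, so a shift of 2). The inputs are taken from the trace: the extent starts at lblk 49507 / pblk 158047 and the hole ends at lblk 49546.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a block-to-cluster conversion: assumes the
 * cluster ratio is a power of two, expressed as a shift count.
 */
static uint64_t blk_to_cluster(uint64_t pblk, unsigned int cluster_bits)
{
	return pblk >> cluster_bits;
}

/*
 * Physical block of the first block after 'end', for an extent that
 * starts at logical block ee_block and physical block ex_pblk.
 */
static uint64_t pblk_after_end(uint64_t ex_pblk, uint32_t ee_block,
			       uint32_t end)
{
	assert(end >= ee_block);	/* 'end' must not precede the extent */
	return ex_pblk + (end - ee_block) + 1;
}
```

With these inputs the first block after the hole is pblk 158087, which falls in cluster 39521. Adding one block more gives pblk 158088 and therefore cluster 39522, the value that appears in the trace and that led to cluster 39521 being freed while still in use.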
|
|
|
pblk = ext4_ext_pblock(ex) + end - ee_block + 1;
|
2018-10-02 02:25:08 +08:00
|
|
|
partial.pclu = EXT4_B2C(sbi, pblk);
|
|
|
|
partial.state = nofree;
|
2014-11-23 13:55:42 +08:00
|
|
|
}
|
|
|
|
|
2012-03-20 11:03:19 +08:00
|
|
|
/*
|
|
|
|
* Split the extent in two so that 'end' is the last
|
2013-04-10 10:11:22 +08:00
|
|
|
* block in the first new extent. Also we should not
|
|
|
|
* fail removing space due to ENOSPC so try to use
|
|
|
|
* reserved block if that happens.
|
2012-03-20 11:03:19 +08:00
|
|
|
*/
|
2014-09-02 02:37:09 +08:00
|
|
|
err = ext4_force_split_extent_at(handle, inode, &path,
|
2014-08-31 11:52:19 +08:00
|
|
|
end + 1, 1);
|
2012-03-20 11:03:19 +08:00
|
|
|
if (err < 0)
|
|
|
|
goto out;
|
2014-11-23 13:55:42 +08:00
|
|
|
|
2019-03-01 12:34:11 +08:00
|
|
|
} else if (sbi->s_cluster_ratio > 1 && end >= ex_end &&
|
|
|
|
partial.state == initial) {
|
2014-11-23 13:55:42 +08:00
|
|
|
/*
|
2019-03-01 12:34:11 +08:00
|
|
|
* If we're punching, there's an extent to the right.
|
|
|
|
* If the partial cluster hasn't been set, set it to
|
|
|
|
* that extent's first cluster and its state to nofree
|
|
|
|
* so it won't be freed should it contain blocks to be
|
|
|
|
* removed. If it's already set (tofree/nofree), we're
|
|
|
|
* retrying and keep the original partial cluster info
|
|
|
|
* so a cluster marked tofree as a result of earlier
|
|
|
|
* extent removal is not lost.
|
2014-11-23 13:55:42 +08:00
|
|
|
*/
|
|
|
|
lblk = ex_end + 1;
|
|
|
|
err = ext4_ext_search_right(inode, path, &lblk, &pblk,
|
2020-10-28 13:56:17 +08:00
|
|
|
NULL);
|
|
|
|
if (err < 0)
|
2014-11-23 13:55:42 +08:00
|
|
|
goto out;
|
2018-10-02 02:25:08 +08:00
|
|
|
if (pblk) {
|
|
|
|
partial.pclu = EXT4_B2C(sbi, pblk);
|
|
|
|
partial.state = nofree;
|
|
|
|
}
|
2012-03-20 11:03:19 +08:00
|
|
|
}
|
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* We start scanning from right side, freeing all the blocks
|
|
|
|
* after i_size and walking into the tree depth-wise.
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
2010-05-17 13:00:00 +08:00
|
|
|
depth = ext_depth(inode);
|
2012-07-23 10:49:08 +08:00
|
|
|
if (path) {
|
|
|
|
int k = i = depth;
|
|
|
|
while (--k > 0)
|
|
|
|
path[k].p_block =
|
|
|
|
le16_to_cpu(path[k].p_hdr->eh_entries)+1;
|
|
|
|
} else {
|
treewide: kzalloc() -> kcalloc()
The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:
kzalloc(a * b, gfp)
with:
kcalloc(a, b, gfp)
as well as handling cases of:
kzalloc(a * b * c, gfp)
with:
kzalloc(array3_size(a, b, c), gfp)
as it's slightly less ugly than:
kzalloc_array(array_size(a, b), c, gfp)
This does, however, attempt to ignore constant size factors like:
kzalloc(4 * 1024, gfp)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
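As background for why the two-factor form matters, here is a userspace sketch, not kernel code. `toy_zalloc_array()` is a hypothetical helper built on standard `malloc()`. It assumes that the point of passing the count and the element size separately is to let the allocator reject a product that would overflow, instead of silently allocating a too-small buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of a two-factor zeroing allocator.
 * Returns NULL when count * size would overflow size_t, instead of
 * allocating a truncated number of bytes.
 */
static void *toy_zalloc_array(size_t count, size_t size)
{
	void *p;

	assert(size != 0);		/* toy precondition */
	if (count > SIZE_MAX / size)
		return NULL;		/* product would overflow */

	p = malloc(count * size);
	if (p)
		memset(p, 0, count * size);
	return p;
}
```

A call such as `toy_zalloc_array(SIZE_MAX, 2)` returns NULL, whereas an open-coded `malloc(SIZE_MAX * 2)` would wrap around and request only `SIZE_MAX - 1` bytes.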
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@
(
kzalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kzalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@
(
kzalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)
// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@
- kzalloc
+ kcalloc
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
kzalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
kzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@
(
kzalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@
(
kzalloc(sizeof(THING) * C2, ...)
|
kzalloc(sizeof(TYPE) * C2, ...)
|
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * E2
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- E1 * E2
+ E1, E2
, ...)
)
Signed-off-by: Kees Cook <keescook@chromium.org>
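The reason the 2-factor form matters can be sketched in userspace C. This is only an illustration under stated assumptions: checked_calloc(), struct path_entry and alloc_path() are hypothetical names made up for this sketch, not kernel APIs, and the real kcalloc() does its overflow checking inside the kernel's own allocator helpers.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for kcalloc(): n and size stay separate
 * so the product can be checked before any memory is requested.
 */
static void *checked_calloc(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;		/* n * size would wrap around */
	return calloc(n, size);		/* zeroed memory, like kzalloc() */
}

/* Illustrative element type; not the kernel's struct ext4_ext_path. */
struct path_entry {
	void *hdr;
	unsigned short depth;
};

/* Allocate one zeroed entry per tree level, root included (depth + 1). */
static struct path_entry *alloc_path(unsigned int depth)
{
	return checked_calloc((size_t)depth + 1, sizeof(struct path_entry));
}
```

With an open-coded product such as kzalloc(a * b, gfp), a wrapped multiplication would silently yield a short buffer; keeping the count and the element size as separate arguments lets the allocator reject the request instead.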
2018-06-13 05:03:40 +08:00
|
|
|
path = kcalloc(depth + 1, sizeof(struct ext4_ext_path),
|
2020-05-08 01:50:28 +08:00
|
|
|
GFP_NOFS | __GFP_NOFAIL);
|
2012-07-23 10:49:08 +08:00
|
|
|
if (path == NULL) {
|
|
|
|
ext4_journal_stop(handle);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
2014-09-02 02:40:09 +08:00
|
|
|
path[0].p_maxdepth = path[0].p_depth = depth;
|
2012-07-23 10:49:08 +08:00
|
|
|
path[0].p_hdr = ext_inode_hdr(inode);
|
2012-08-17 20:54:52 +08:00
|
|
|
i = 0;
|
2012-03-20 11:03:19 +08:00
|
|
|
|
2013-08-17 09:21:41 +08:00
|
|
|
if (ext4_ext_check(inode, path[0].p_hdr, depth, 0)) {
|
2015-10-18 04:16:04 +08:00
|
|
|
err = -EFSCORRUPTED;
|
2012-07-23 10:49:08 +08:00
|
|
|
goto out;
|
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
2012-07-23 10:49:08 +08:00
|
|
|
err = 0;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
while (i >= 0 && err == 0) {
|
|
|
|
if (i == depth) {
|
|
|
|
/* this is leaf block */
|
2011-05-25 19:41:43 +08:00
|
|
|
err = ext4_ext_rm_leaf(handle, inode, path,
|
2018-10-02 02:25:08 +08:00
|
|
|
&partial, start, end);
|
2006-10-11 16:21:07 +08:00
|
|
|
/* root level has p_bh == NULL, brelse() eats this */
|
2006-10-11 16:21:03 +08:00
|
|
|
brelse(path[i].p_bh);
|
|
|
|
path[i].p_bh = NULL;
|
|
|
|
i--;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* this is index block */
|
|
|
|
if (!path[i].p_hdr) {
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "initialize header\n");
|
2006-10-11 16:21:03 +08:00
|
|
|
path[i].p_hdr = ext_block_hdr(path[i].p_bh);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!path[i].p_idx) {
|
2006-10-11 16:21:07 +08:00
|
|
|
/* this level hasn't been touched yet */
|
2006-10-11 16:21:03 +08:00
|
|
|
path[i].p_idx = EXT_LAST_INDEX(path[i].p_hdr);
|
|
|
|
path[i].p_block = le16_to_cpu(path[i].p_hdr->eh_entries)+1;
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "init index ptr: hdr 0x%p, num %d\n",
|
2006-10-11 16:21:03 +08:00
|
|
|
path[i].p_hdr,
|
|
|
|
le16_to_cpu(path[i].p_hdr->eh_entries));
|
|
|
|
} else {
|
2006-10-11 16:21:07 +08:00
|
|
|
/* we were already here, look at next index */
|
2006-10-11 16:21:03 +08:00
|
|
|
path[i].p_idx--;
|
|
|
|
}
|
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "level %d - index, first 0x%p, cur 0x%p\n",
|
2006-10-11 16:21:03 +08:00
|
|
|
i, EXT_FIRST_INDEX(path[i].p_hdr),
|
|
|
|
path[i].p_idx);
|
|
|
|
if (ext4_ext_more_to_rm(path + i)) {
|
2007-07-18 21:19:09 +08:00
|
|
|
struct buffer_head *bh;
|
2006-10-11 16:21:03 +08:00
|
|
|
/* go to the next level */
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "move to level %d (block %llu)\n",
|
2010-10-28 09:30:14 +08:00
|
|
|
i + 1, ext4_idx_pblock(path[i].p_idx));
|
2006-10-11 16:21:03 +08:00
|
|
|
memset(path + i + 1, 0, sizeof(*path));
|
2021-09-08 20:08:49 +08:00
|
|
|
bh = read_extent_tree_block(inode, path[i].p_idx,
|
|
|
|
depth - i - 1,
|
|
|
|
EXT4_EX_NOCACHE);
|
2013-08-17 09:20:41 +08:00
|
|
|
if (IS_ERR(bh)) {
|
2006-10-11 16:21:03 +08:00
|
|
|
/* should we reset i_size? */
|
2013-08-17 09:20:41 +08:00
|
|
|
err = PTR_ERR(bh);
|
2006-10-11 16:21:03 +08:00
|
|
|
break;
|
|
|
|
}
|
2013-07-16 00:27:47 +08:00
|
|
|
/* Yield here to deal with large extent trees.
|
|
|
|
* Should be a no-op if we did IO above. */
|
|
|
|
cond_resched();
|
2007-07-18 21:19:09 +08:00
|
|
|
if (WARN_ON(i + 1 > depth)) {
|
2015-10-18 04:16:04 +08:00
|
|
|
err = -EFSCORRUPTED;
|
2007-07-18 21:19:09 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
path[i + 1].p_bh = bh;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2006-10-11 16:21:07 +08:00
|
|
|
/* save actual number of indexes since this
|
|
|
|
* number is changed at the next iteration */
|
2006-10-11 16:21:03 +08:00
|
|
|
path[i].p_block = le16_to_cpu(path[i].p_hdr->eh_entries);
|
|
|
|
i++;
|
|
|
|
} else {
|
2006-10-11 16:21:07 +08:00
|
|
|
/* we finished processing this index, go up */
|
2006-10-11 16:21:03 +08:00
|
|
|
if (path[i].p_hdr->eh_entries == 0 && i > 0) {
|
2006-10-11 16:21:07 +08:00
|
|
|
/* index is empty, remove it;
|
2006-10-11 16:21:03 +08:00
|
|
|
* handle must be already prepared by the
|
|
|
|
* ext4_ext_rm_leaf() */
|
2012-12-17 22:55:39 +08:00
|
|
|
err = ext4_ext_rm_idx(handle, inode, path, i);
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
2006-10-11 16:21:07 +08:00
|
|
|
/* root level has p_bh == NULL, brelse() eats this */
|
2006-10-11 16:21:03 +08:00
|
|
|
brelse(path[i].p_bh);
|
|
|
|
path[i].p_bh = NULL;
|
|
|
|
i--;
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "return to level %d\n", i);
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-10-02 02:25:08 +08:00
|
|
|
trace_ext4_ext_remove_space_done(inode, start, end, depth, &partial,
|
|
|
|
path->p_hdr->eh_entries);
|
2011-09-10 07:18:51 +08:00
|
|
|
|
2014-11-23 13:59:39 +08:00
|
|
|
/*
|
2018-10-02 02:25:08 +08:00
|
|
|
* if there's a partial cluster and we have removed the first extent
|
|
|
|
* in the file, then we also free the partial cluster, if any
|
2014-11-23 13:59:39 +08:00
|
|
|
*/
|
2018-10-02 02:25:08 +08:00
|
|
|
if (partial.state == tofree && err == 0) {
|
|
|
|
int flags = get_default_free_blocks_flags(inode);
|
|
|
|
|
|
|
|
if (ext4_is_pending(inode, partial.lblk))
|
|
|
|
flags |= EXT4_FREE_BLOCKS_RERESERVE_CLUSTER;
|
2011-09-10 07:04:51 +08:00
|
|
|
ext4_free_blocks(handle, inode, NULL,
|
2018-10-02 02:25:08 +08:00
|
|
|
EXT4_C2B(sbi, partial.pclu),
|
|
|
|
sbi->s_cluster_ratio, flags);
|
|
|
|
if (flags & EXT4_FREE_BLOCKS_RERESERVE_CLUSTER)
|
|
|
|
ext4_rereserve_cluster(inode, partial.lblk);
|
|
|
|
partial.state = initial;
|
2011-09-10 07:04:51 +08:00
|
|
|
}
|
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
/* TODO: flexible tree reduction should be here */
|
|
|
|
if (path->p_hdr->eh_entries == 0) {
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* truncate to zero freed all the tree,
|
|
|
|
* so we need to correct eh_depth
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
|
|
|
err = ext4_ext_get_access(handle, inode, path);
|
|
|
|
if (err == 0) {
|
|
|
|
ext_inode_hdr(inode)->eh_depth = 0;
|
|
|
|
ext_inode_hdr(inode)->eh_max =
|
2009-08-28 22:40:33 +08:00
|
|
|
cpu_to_le16(ext4_ext_space_root(inode, 0));
|
2006-10-11 16:21:03 +08:00
|
|
|
err = ext4_ext_dirty(handle, inode, path);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
out:
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2014-09-02 02:39:09 +08:00
|
|
|
path = NULL;
|
2014-09-02 02:37:09 +08:00
|
|
|
if (err == -EAGAIN)
|
|
|
|
goto again;
|
2006-10-11 16:21:03 +08:00
|
|
|
ext4_journal_stop(handle);
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* called at mount time
|
|
|
|
*/
|
|
|
|
void ext4_ext_init(struct super_block *sb)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* possible initialization would be here
|
|
|
|
*/
|
|
|
|
|
2015-10-18 04:18:43 +08:00
|
|
|
if (ext4_has_feature_extents(sb)) {
|
2009-09-30 03:51:30 +08:00
|
|
|
#if defined(AGGRESSIVE_TEST) || defined(CHECK_BINSEARCH) || defined(EXTENTS_STATS)
|
2012-03-20 11:41:49 +08:00
|
|
|
printk(KERN_INFO "EXT4-fs: file extents enabled"
|
2007-02-18 02:20:16 +08:00
|
|
|
#ifdef AGGRESSIVE_TEST
|
2012-03-20 11:41:49 +08:00
|
|
|
", aggressive tests"
|
2006-10-11 16:21:03 +08:00
|
|
|
#endif
|
|
|
|
#ifdef CHECK_BINSEARCH
|
2012-03-20 11:41:49 +08:00
|
|
|
", check binsearch"
|
2006-10-11 16:21:03 +08:00
|
|
|
#endif
|
|
|
|
#ifdef EXTENTS_STATS
|
2012-03-20 11:41:49 +08:00
|
|
|
", stats"
|
2006-10-11 16:21:03 +08:00
|
|
|
#endif
|
2012-03-20 11:41:49 +08:00
|
|
|
"\n");
|
2009-09-30 03:51:30 +08:00
|
|
|
#endif
|
2006-10-11 16:21:03 +08:00
|
|
|
#ifdef EXTENTS_STATS
|
|
|
|
spin_lock_init(&EXT4_SB(sb)->s_ext_stats_lock);
|
|
|
|
EXT4_SB(sb)->s_ext_min = 1 << 30;
|
|
|
|
EXT4_SB(sb)->s_ext_max = 0;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* called at umount time
|
|
|
|
*/
|
|
|
|
void ext4_ext_release(struct super_block *sb)
|
|
|
|
{
|
2015-10-18 04:18:43 +08:00
|
|
|
if (!ext4_has_feature_extents(sb))
|
2006-10-11 16:21:03 +08:00
|
|
|
return;
|
|
|
|
|
|
|
|
#ifdef EXTENTS_STATS
|
|
|
|
if (EXT4_SB(sb)->s_ext_blocks && EXT4_SB(sb)->s_ext_extents) {
|
|
|
|
struct ext4_sb_info *sbi = EXT4_SB(sb);
|
|
|
|
printk(KERN_ERR "EXT4-fs: %lu blocks in %lu extents (%lu ave)\n",
|
|
|
|
sbi->s_ext_blocks, sbi->s_ext_extents,
|
|
|
|
sbi->s_ext_blocks / sbi->s_ext_extents);
|
|
|
|
printk(KERN_ERR "EXT4-fs: extents: %lu min, %lu max, max depth %lu\n",
|
|
|
|
sbi->s_ext_min, sbi->s_ext_max, sbi->s_depth_max);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2013-08-29 02:47:06 +08:00
|
|
|
static int ext4_zeroout_es(struct inode *inode, struct ext4_extent *ex)
|
|
|
|
{
|
|
|
|
ext4_lblk_t ee_block;
|
|
|
|
ext4_fsblk_t ee_pblock;
|
|
|
|
unsigned int ee_len;
|
|
|
|
|
|
|
|
ee_block = le32_to_cpu(ex->ee_block);
|
|
|
|
ee_len = ext4_ext_get_actual_len(ex);
|
|
|
|
ee_pblock = ext4_ext_pblock(ex);
|
|
|
|
|
|
|
|
if (ee_len == 0)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return ext4_es_insert_extent(inode, ee_block, ee_len, ee_pblock,
|
|
|
|
EXTENT_STATUS_WRITTEN);
|
|
|
|
}
|
|
|
|
|
2008-04-29 20:11:12 +08:00
|
|
|
/* FIXME!! we need to try to merge to left or right after zero-out */
|
|
|
|
static int ext4_ext_zeroout(struct inode *inode, struct ext4_extent *ex)
|
|
|
|
{
|
2010-10-28 09:30:06 +08:00
|
|
|
ext4_fsblk_t ee_pblock;
|
|
|
|
unsigned int ee_len;
|
2008-04-29 20:11:12 +08:00
|
|
|
|
|
|
|
ee_len = ext4_ext_get_actual_len(ex);
|
2010-10-28 09:30:14 +08:00
|
|
|
ee_pblock = ext4_ext_pblock(ex);
|
2015-12-08 04:09:35 +08:00
|
|
|
return ext4_issue_zeroout(inode, le32_to_cpu(ex->ee_block), ee_pblock,
|
|
|
|
ee_len);
|
2008-04-29 20:11:12 +08:00
|
|
|
}
|
|
|
|
|
2011-05-04 00:23:07 +08:00
|
|
|
/*
|
|
|
|
* ext4_split_extent_at() splits an extent at given block.
|
|
|
|
*
|
|
|
|
* @handle: the journal handle
|
|
|
|
* @inode: the file inode
|
|
|
|
* @ppath: pointer to the path to the extent
|
|
|
|
* @split: the logical block where the extent is split.
|
|
|
|
* @split_flag: indicates whether the extent may be zeroed out if the split fails, and
|
2014-04-21 11:45:47 +08:00
|
|
|
* the states (init or unwritten) of the new extents.
|
2011-05-04 00:23:07 +08:00
|
|
|
* @flags: flags used to insert the new extent into the extent tree.
|
|
|
|
*
|
|
|
|
*
|
|
|
|
* Splits extent [a, b] into two extents [a, @split) and [@split, b], states
|
2020-06-11 11:19:46 +08:00
|
|
|
* of which are determined by split_flag.
|
2011-05-04 00:23:07 +08:00
|
|
|
*
|
|
|
|
* There are two cases:
|
|
|
|
* a> the extent is split into two extents.
|
|
|
|
* b> no split is needed; the extent is just marked.
|
|
|
|
*
|
|
|
|
* return 0 on success.
|
|
|
|
*/
|
|
|
|
static int ext4_split_extent_at(handle_t *handle,
|
|
|
|
struct inode *inode,
|
2014-09-02 02:37:09 +08:00
|
|
|
struct ext4_ext_path **ppath,
|
2011-05-04 00:23:07 +08:00
|
|
|
ext4_lblk_t split,
|
|
|
|
int split_flag,
|
|
|
|
int flags)
|
|
|
|
{
|
2014-09-02 02:37:09 +08:00
|
|
|
struct ext4_ext_path *path = *ppath;
|
2011-05-04 00:23:07 +08:00
|
|
|
ext4_fsblk_t newblock;
|
|
|
|
ext4_lblk_t ee_block;
|
2013-03-11 09:13:05 +08:00
|
|
|
struct ext4_extent *ex, newex, orig_ex, zero_ex;
|
2011-05-04 00:23:07 +08:00
|
|
|
struct ext4_extent *ex2 = NULL;
|
|
|
|
unsigned int ee_len, depth;
|
|
|
|
int err = 0;
|
|
|
|
|
2012-10-10 13:04:58 +08:00
|
|
|
BUG_ON((split_flag & (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2)) ==
|
|
|
|
(EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2));
|
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "logical block %llu\n", (unsigned long long)split);
|
2011-05-04 00:23:07 +08:00
|
|
|
|
|
|
|
ext4_ext_show_leaf(inode, path);
|
|
|
|
|
|
|
|
depth = ext_depth(inode);
|
|
|
|
ex = path[depth].p_ext;
|
|
|
|
ee_block = le32_to_cpu(ex->ee_block);
|
|
|
|
ee_len = ext4_ext_get_actual_len(ex);
|
|
|
|
newblock = split - ee_block + ext4_ext_pblock(ex);
|
|
|
|
|
|
|
|
BUG_ON(split < ee_block || split >= (ee_block + ee_len));
|
2014-04-21 11:45:47 +08:00
|
|
|
BUG_ON(!ext4_ext_is_unwritten(ex) &&
|
ext4: ext4_split_extent should take care of extent zeroout
When ext4_split_extent_at() ends up doing zeroout & conversion to
initialized instead of split & conversion, ext4_split_extent() gets
confused and can wrongly mark the extent back as uninitialized
resulting in end IO code getting confused from large unwritten extents
and may result in data loss.
The example of problematic behavior is:
lblk len lblk len
ext4_split_extent() (ex=[1000,30,uninit], map=[1010,10])
ext4_split_extent_at() (split [1000,30,uninit] at 1020)
ext4_ext_insert_extent() -> ENOSPC
ext4_ext_zeroout()
-> extent [1000,30] is now initialized
ext4_split_extent_at() (split [1000,30,init] at 1010,
MARK_UNINIT1 | MARK_UNINIT2)
-> extent is split and parts marked as uninitialized
Fix the problem by rechecking extent type after the first
ext4_split_extent_at() returns. None of the split_flags can be applied
to an initialized extent, so this patch also adds a BUG_ON to prevent similar
issues in the future.
TESTCASE: https://github.com/dmonakhov/xfstests/commit/b8a55eb5ce28c6ff29e620ab090902fcd5833597
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2013-03-04 13:34:34 +08:00
|
|
|
split_flag & (EXT4_EXT_MAY_ZEROOUT |
|
2014-04-21 11:45:47 +08:00
|
|
|
EXT4_EXT_MARK_UNWRIT1 |
|
|
|
|
EXT4_EXT_MARK_UNWRIT2));
|
2011-05-04 00:23:07 +08:00
|
|
|
|
|
|
|
err = ext4_ext_get_access(handle, inode, path + depth);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
if (split == ee_block) {
|
|
|
|
/*
|
|
|
|
* case b: block @split is the block that the extent begins with
|
|
|
|
* then we just change the state of the extent, and splitting
|
|
|
|
* is not needed.
|
|
|
|
*/
|
2014-04-21 11:45:47 +08:00
|
|
|
if (split_flag & EXT4_EXT_MARK_UNWRIT2)
|
|
|
|
ext4_ext_mark_unwritten(ex);
|
2011-05-04 00:23:07 +08:00
|
|
|
else
|
|
|
|
ext4_ext_mark_initialized(ex);
|
|
|
|
|
|
|
|
if (!(flags & EXT4_GET_BLOCKS_PRE_IO))
|
2012-08-17 21:44:17 +08:00
|
|
|
ext4_ext_try_to_merge(handle, inode, path, ex);
|
2011-05-04 00:23:07 +08:00
|
|
|
|
2012-08-17 21:44:17 +08:00
|
|
|
err = ext4_ext_dirty(handle, inode, path + path->p_depth);
|
2011-05-04 00:23:07 +08:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* case a */
|
|
|
|
memcpy(&orig_ex, ex, sizeof(orig_ex));
|
|
|
|
ex->ee_len = cpu_to_le16(split - ee_block);
|
2014-04-21 11:45:47 +08:00
|
|
|
if (split_flag & EXT4_EXT_MARK_UNWRIT1)
|
|
|
|
ext4_ext_mark_unwritten(ex);
|
2011-05-04 00:23:07 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* path may lead to new leaf, not to original leaf any more
|
|
|
|
* after ext4_ext_insert_extent() returns.
|
|
|
|
*/
|
|
|
|
err = ext4_ext_dirty(handle, inode, path + depth);
|
|
|
|
if (err)
|
|
|
|
goto fix_extent_len;
|
|
|
|
|
|
|
|
ex2 = &newex;
|
|
|
|
ex2->ee_block = cpu_to_le32(split);
|
|
|
|
ex2->ee_len = cpu_to_le16(ee_len - (split - ee_block));
|
|
|
|
ext4_ext_store_pblock(ex2, newblock);
|
2014-04-21 11:45:47 +08:00
|
|
|
if (split_flag & EXT4_EXT_MARK_UNWRIT2)
|
|
|
|
ext4_ext_mark_unwritten(ex2);
|
2011-05-04 00:23:07 +08:00
|
|
|
|
2014-09-02 02:37:09 +08:00
|
|
|
err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags);
|
2021-05-06 22:10:42 +08:00
|
|
|
if (err != -ENOSPC && err != -EDQUOT)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
if (EXT4_EXT_MAY_ZEROOUT & split_flag) {
|
2012-10-10 13:04:58 +08:00
|
|
|
if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
|
2013-03-11 09:13:05 +08:00
|
|
|
if (split_flag & EXT4_EXT_DATA_VALID1) {
|
2012-10-10 13:04:58 +08:00
|
|
|
err = ext4_ext_zeroout(inode, ex2);
|
2013-03-11 09:13:05 +08:00
|
|
|
zero_ex.ee_block = ex2->ee_block;
|
2013-04-04 00:27:18 +08:00
|
|
|
zero_ex.ee_len = cpu_to_le16(
|
|
|
|
ext4_ext_get_actual_len(ex2));
|
2013-03-11 09:13:05 +08:00
|
|
|
ext4_ext_store_pblock(&zero_ex,
|
|
|
|
ext4_ext_pblock(ex2));
|
|
|
|
} else {
|
2012-10-10 13:04:58 +08:00
|
|
|
err = ext4_ext_zeroout(inode, ex);
|
2013-03-11 09:13:05 +08:00
|
|
|
zero_ex.ee_block = ex->ee_block;
|
2013-04-04 00:27:18 +08:00
|
|
|
zero_ex.ee_len = cpu_to_le16(
|
|
|
|
ext4_ext_get_actual_len(ex));
|
2013-03-11 09:13:05 +08:00
|
|
|
ext4_ext_store_pblock(&zero_ex,
|
|
|
|
ext4_ext_pblock(ex));
|
|
|
|
}
|
|
|
|
} else {
|
2012-10-10 13:04:58 +08:00
|
|
|
err = ext4_ext_zeroout(inode, &orig_ex);
|
2013-03-11 09:13:05 +08:00
|
|
|
zero_ex.ee_block = orig_ex.ee_block;
|
2013-04-04 00:27:18 +08:00
|
|
|
zero_ex.ee_len = cpu_to_le16(
|
|
|
|
ext4_ext_get_actual_len(&orig_ex));
|
2013-03-11 09:13:05 +08:00
|
|
|
ext4_ext_store_pblock(&zero_ex,
|
|
|
|
ext4_ext_pblock(&orig_ex));
|
|
|
|
}
|
2012-10-10 13:04:58 +08:00
|
|
|
|
2021-05-06 22:10:42 +08:00
|
|
|
if (!err) {
|
|
|
|
/* update the extent length and mark as initialized */
|
|
|
|
ex->ee_len = cpu_to_le16(ee_len);
|
|
|
|
ext4_ext_try_to_merge(handle, inode, path, ex);
|
|
|
|
err = ext4_ext_dirty(handle, inode, path + path->p_depth);
|
|
|
|
if (!err)
|
|
|
|
/* update extent status tree */
|
|
|
|
err = ext4_zeroout_es(inode, &zero_ex);
|
|
|
|
/* If we failed at this point, we don't know in which
|
|
|
|
* state the extent tree exactly is so don't try to fix
|
|
|
|
* length of the original extent as it may do even more
|
|
|
|
* damage.
|
|
|
|
*/
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
}
|
2011-05-04 00:23:07 +08:00
|
|
|
|
|
|
|
fix_extent_len:
|
|
|
|
ex->ee_len = orig_ex.ee_len;
|
2020-04-27 09:34:38 +08:00
|
|
|
/*
|
|
|
|
* Ignore ext4_ext_dirty return value since we are already in error path
|
|
|
|
* and err is a non-zero error code.
|
|
|
|
*/
|
2014-07-28 10:30:29 +08:00
|
|
|
ext4_ext_dirty(handle, inode, path + path->p_depth);
|
2011-05-04 00:23:07 +08:00
|
|
|
return err;
|
2021-05-06 22:10:42 +08:00
|
|
|
out:
|
|
|
|
ext4_ext_show_leaf(inode, path);
|
|
|
|
return err;
|
2011-05-04 00:23:07 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* ext4_split_extent() splits an extent and marks the extent which is covered
|
|
|
|
* by @map as split_flag indicates
|
|
|
|
*
|
2013-08-29 02:40:12 +08:00
|
|
|
* It may result in splitting the extent into multiple extents (up to three).
|
2011-05-04 00:23:07 +08:00
|
|
|
* There are three possibilities:
|
|
|
|
* a> There is no split required
|
|
|
|
* b> Splits in two extents: Split is happening at either end of the extent
|
|
|
|
* c> Splits in three extents: Someone is splitting in middle of the extent
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
static int ext4_split_extent(handle_t *handle,
|
|
|
|
struct inode *inode,
|
2014-09-02 02:37:09 +08:00
|
|
|
struct ext4_ext_path **ppath,
|
2011-05-04 00:23:07 +08:00
|
|
|
struct ext4_map_blocks *map,
|
|
|
|
int split_flag,
|
|
|
|
int flags)
|
|
|
|
{
|
2014-09-02 02:37:09 +08:00
|
|
|
struct ext4_ext_path *path = *ppath;
|
2011-05-04 00:23:07 +08:00
|
|
|
ext4_lblk_t ee_block;
|
|
|
|
struct ext4_extent *ex;
|
|
|
|
unsigned int ee_len, depth;
|
|
|
|
int err = 0;
|
2014-04-21 11:45:47 +08:00
|
|
|
int unwritten;
|
2011-05-04 00:23:07 +08:00
|
|
|
int split_flag1, flags1;
|
2013-03-11 09:20:23 +08:00
|
|
|
int allocated = map->m_len;
|
2011-05-04 00:23:07 +08:00
|
|
|
|
|
|
|
depth = ext_depth(inode);
|
|
|
|
ex = path[depth].p_ext;
|
|
|
|
ee_block = le32_to_cpu(ex->ee_block);
|
|
|
|
ee_len = ext4_ext_get_actual_len(ex);
|
2014-04-21 11:45:47 +08:00
|
|
|
unwritten = ext4_ext_is_unwritten(ex);
|
2011-05-04 00:23:07 +08:00
|
|
|
|
|
|
|
if (map->m_lblk + map->m_len < ee_block + ee_len) {
|
2012-10-10 13:04:58 +08:00
|
|
|
split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT;
|
2011-05-04 00:23:07 +08:00
|
|
|
flags1 = flags | EXT4_GET_BLOCKS_PRE_IO;
|
2014-04-21 11:45:47 +08:00
|
|
|
if (unwritten)
|
|
|
|
split_flag1 |= EXT4_EXT_MARK_UNWRIT1 |
|
|
|
|
EXT4_EXT_MARK_UNWRIT2;
|
2012-10-10 13:04:58 +08:00
|
|
|
if (split_flag & EXT4_EXT_DATA_VALID2)
|
|
|
|
split_flag1 |= EXT4_EXT_DATA_VALID1;
|
2014-09-02 02:37:09 +08:00
|
|
|
err = ext4_split_extent_at(handle, inode, ppath,
|
2011-05-04 00:23:07 +08:00
|
|
|
map->m_lblk + map->m_len, split_flag1, flags1);
|
2011-05-23 08:49:12 +08:00
|
|
|
if (err)
|
|
|
|
goto out;
|
2013-03-11 09:20:23 +08:00
|
|
|
} else {
|
|
|
|
allocated = ee_len - (map->m_lblk - ee_block);
|
2011-05-04 00:23:07 +08:00
|
|
|
}
|
2013-03-04 13:34:34 +08:00
|
|
|
/*
|
|
|
|
* Updating the path is required because the previous ext4_split_extent_at() may
|
|
|
|
* result in split of original leaf or extent zeroout.
|
|
|
|
*/
|
2020-05-08 01:50:28 +08:00
|
|
|
path = ext4_find_extent(inode, map->m_lblk, ppath, flags);
|
2011-05-04 00:23:07 +08:00
|
|
|
if (IS_ERR(path))
|
|
|
|
return PTR_ERR(path);
|
2013-03-04 13:34:34 +08:00
|
|
|
depth = ext_depth(inode);
|
|
|
|
ex = path[depth].p_ext;
|
2014-04-14 03:41:13 +08:00
|
|
|
if (!ex) {
|
|
|
|
EXT4_ERROR_INODE(inode, "unexpected hole at %lu",
|
|
|
|
(unsigned long) map->m_lblk);
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2014-04-14 03:41:13 +08:00
|
|
|
}
|
2014-04-21 11:45:47 +08:00
|
|
|
unwritten = ext4_ext_is_unwritten(ex);
|
2011-05-04 00:23:07 +08:00
|
|
|
|
|
|
|
if (map->m_lblk >= ee_block) {
|
2013-03-04 13:34:34 +08:00
|
|
|
split_flag1 = split_flag & EXT4_EXT_DATA_VALID2;
|
2014-04-21 11:45:47 +08:00
|
|
|
if (unwritten) {
|
|
|
|
split_flag1 |= EXT4_EXT_MARK_UNWRIT1;
|
2013-03-04 13:34:34 +08:00
|
|
|
split_flag1 |= split_flag & (EXT4_EXT_MAY_ZEROOUT |
|
2014-04-21 11:45:47 +08:00
|
|
|
EXT4_EXT_MARK_UNWRIT2);
|
2013-03-04 13:34:34 +08:00
|
|
|
}
|
2014-09-02 02:37:09 +08:00
|
|
|
err = ext4_split_extent_at(handle, inode, ppath,
|
2011-05-04 00:23:07 +08:00
|
|
|
map->m_lblk, split_flag1, flags);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
ext4_ext_show_leaf(inode, path);
|
|
|
|
out:
|
2013-03-11 09:20:23 +08:00
|
|
|
return err ? err : allocated;
|
2011-05-04 00:23:07 +08:00
|
|
|
}
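The flag handling for the second split (the one at map->m_lblk) can be looked at in isolation. What follows is a minimal user-space sketch, not the kernel code: the numeric flag values are assumptions made for illustration (the real definitions are not shown in this section), and second_split_flags() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flag values, for illustration only. */
#define EXT4_EXT_MAY_ZEROOUT   0x1
#define EXT4_EXT_MARK_UNWRIT1  0x2
#define EXT4_EXT_MARK_UNWRIT2  0x4
#define EXT4_EXT_DATA_VALID1   0x8
#define EXT4_EXT_DATA_VALID2   0x10

/*
 * Hypothetical helper: derive the flags for the split at map->m_lblk
 * from the caller's split_flag and the current state of the extent.
 */
static int second_split_flags(int split_flag, bool unwritten)
{
	/* DATA_VALID2 is always carried over. */
	int split_flag1 = split_flag & EXT4_EXT_DATA_VALID2;

	/* The "mark unwritten" and zeroout flags only apply to an
	 * extent that is still unwritten at this point. */
	if (unwritten) {
		split_flag1 |= EXT4_EXT_MARK_UNWRIT1;
		split_flag1 |= split_flag & (EXT4_EXT_MAY_ZEROOUT |
					     EXT4_EXT_MARK_UNWRIT2);
	}
	return split_flag1;
}
```

If the first split ended in a zeroout and the extent is now initialized, `unwritten` is false and none of the MARK_UNWRIT flags survive, which is the recheck the commit message above describes.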
|
|
|
|
|
2007-07-18 09:42:38 +08:00
|
|
|
/*
|
2010-05-17 07:00:00 +08:00
|
|
|
* This function is called by ext4_ext_map_blocks() if someone tries to write
|
2014-04-21 11:45:47 +08:00
|
|
|
* to an unwritten extent. It may result in splitting the unwritten
|
2011-03-31 09:57:33 +08:00
|
|
|
* extent into multiple extents (up to three - one initialized and two
|
2014-04-21 11:45:47 +08:00
|
|
|
* unwritten).
|
2007-07-18 09:42:38 +08:00
|
|
|
* There are three possibilities:
|
|
|
|
* a> There is no split required: Entire extent should be initialized
|
|
|
|
* b> Splits in two extents: Write is happening at either end of the extent
|
|
|
|
* c> Splits in three extents: Someone is writing in the middle of the extent
|
2011-10-27 23:43:23 +08:00
|
|
|
*
|
|
|
|
* Pre-conditions:
|
2014-04-21 11:45:47 +08:00
|
|
|
* - The extent pointed to by 'path' is unwritten.
|
2011-10-27 23:43:23 +08:00
|
|
|
* - The extent pointed to by 'path' contains a superset
|
|
|
|
* of the logical span [map->m_lblk, map->m_lblk + map->m_len).
|
|
|
|
*
|
|
|
|
* Post-conditions on success:
|
|
|
|
* - the returned value is the number of blocks beyond map->m_lblk
|
|
|
|
* that are allocated and initialized.
|
|
|
|
* It is guaranteed to be >= map->m_len.
|
2007-07-18 09:42:38 +08:00
|
|
|
*/
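The three possibilities listed in the comment above (no split, two extents, three extents) depend only on where the written range sits inside the unwritten extent. A minimal sketch, assuming the stated pre-condition that the extent fully contains the write; extents_after_write() is a hypothetical name, and the sketch ignores the zeroout and neighbor-merge optimizations:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of extents left after a write to [lblk, lblk + len) inside an
 * unwritten extent covering [ee_block, ee_block + ee_len).
 * Hypothetical helper, for illustration only.
 */
static int extents_after_write(uint32_t ee_block, uint32_t ee_len,
			       uint32_t lblk, uint32_t len)
{
	uint32_t ee_end = ee_block + ee_len;
	uint32_t end = lblk + len;
	int pieces = 1;		/* the initialized part itself */

	/* Pre-condition: the write lies entirely inside the extent. */
	assert(len > 0 && lblk >= ee_block && end <= ee_end);

	if (lblk > ee_block)	/* unwritten remainder on the left */
		pieces++;
	if (end < ee_end)	/* unwritten remainder on the right */
		pieces++;
	return pieces;
}
```

With the extent [1000, 30] and a write of 10 blocks at 1010, both remainders exist and the result is three extents.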
|
2008-01-29 12:58:27 +08:00
|
|
|
static int ext4_ext_convert_to_initialized(handle_t *handle,
|
2010-05-17 07:00:00 +08:00
|
|
|
struct inode *inode,
|
|
|
|
struct ext4_map_blocks *map,
|
2014-09-02 02:37:09 +08:00
|
|
|
struct ext4_ext_path **ppath,
|
2013-04-10 10:11:22 +08:00
|
|
|
int flags)
|
2007-07-18 09:42:38 +08:00
|
|
|
{
|
2014-09-02 02:37:09 +08:00
|
|
|
struct ext4_ext_path *path = *ppath;
|
2012-08-17 21:54:17 +08:00
|
|
|
struct ext4_sb_info *sbi;
|
2011-10-27 23:43:23 +08:00
|
|
|
struct ext4_extent_header *eh;
|
2011-05-04 00:25:07 +08:00
|
|
|
struct ext4_map_blocks split_map;
|
2017-05-27 05:40:52 +08:00
|
|
|
struct ext4_extent zero_ex1, zero_ex2;
|
2013-04-04 11:33:27 +08:00
|
|
|
struct ext4_extent *ex, *abut_ex;
|
2010-05-16 18:00:00 +08:00
|
|
|
ext4_lblk_t ee_block, eof_block;
|
2013-04-04 11:33:27 +08:00
|
|
|
unsigned int ee_len, depth, map_len = map->m_len;
|
|
|
|
int allocated = 0, max_zeroout = 0;
|
2007-07-18 09:42:38 +08:00
|
|
|
int err = 0;
|
2017-05-27 05:40:52 +08:00
|
|
|
int split_flag = EXT4_EXT_DATA_VALID2;
|
2010-05-16 18:00:00 +08:00
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "logical block %llu, max_blocks %u\n",
|
|
|
|
(unsigned long long)map->m_lblk, map_len);
|
2010-05-16 18:00:00 +08:00
|
|
|
|
2012-08-17 21:54:17 +08:00
|
|
|
sbi = EXT4_SB(inode->i_sb);
|
2020-03-31 18:50:16 +08:00
|
|
|
eof_block = (EXT4_I(inode)->i_disksize + inode->i_sb->s_blocksize - 1)
|
|
|
|
>> inode->i_sb->s_blocksize_bits;
|
2013-04-04 11:33:27 +08:00
|
|
|
if (eof_block < map->m_lblk + map_len)
|
|
|
|
eof_block = map->m_lblk + map_len;
|
2007-07-18 09:42:38 +08:00
|
|
|
|
|
|
|
depth = ext_depth(inode);
|
2011-10-27 23:43:23 +08:00
|
|
|
eh = path[depth].p_hdr;
|
2007-07-18 09:42:38 +08:00
|
|
|
ex = path[depth].p_ext;
|
|
|
|
ee_block = le32_to_cpu(ex->ee_block);
|
|
|
|
ee_len = ext4_ext_get_actual_len(ex);
|
2017-05-27 05:40:52 +08:00
|
|
|
zero_ex1.ee_len = 0;
|
|
|
|
zero_ex2.ee_len = 0;
|
2007-07-18 09:42:38 +08:00
|
|
|
|
2011-10-27 23:43:23 +08:00
|
|
|
trace_ext4_ext_convert_to_initialized_enter(inode, map, ex);
|
|
|
|
|
|
|
|
/* Pre-conditions */
|
2014-04-21 11:45:47 +08:00
|
|
|
BUG_ON(!ext4_ext_is_unwritten(ex));
|
2011-10-27 23:43:23 +08:00
|
|
|
BUG_ON(!in_range(map->m_lblk, ee_block, ee_len));
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Attempt to transfer newly initialized blocks from the currently
|
2014-04-21 11:45:47 +08:00
|
|
|
* unwritten extent to its neighbor. This is much cheaper
|
2011-10-27 23:43:23 +08:00
|
|
|
* than an insertion followed by a merge as those involve costly
|
2013-04-04 11:33:27 +08:00
|
|
|
* memmove() calls. Transferring to the left is the common case in
|
|
|
|
* steady state for workloads doing fallocate(FALLOC_FL_KEEP_SIZE)
|
|
|
|
* followed by append writes.
|
2011-10-27 23:43:23 +08:00
|
|
|
*
|
|
|
|
* Limitations of the current logic:
|
2013-04-04 11:33:27 +08:00
|
|
|
* - L1: we do not deal with writes covering the whole extent.
|
2011-10-27 23:43:23 +08:00
|
|
|
* This would require removing the extent if the transfer
|
|
|
|
* is possible.
|
2013-04-04 11:33:27 +08:00
|
|
|
* - L2: we only attempt to merge with an extent stored in the
|
2011-10-27 23:43:23 +08:00
|
|
|
* same extent tree node.
|
|
|
|
*/
|
2013-04-04 11:33:27 +08:00
|
|
|
if ((map->m_lblk == ee_block) &&
|
|
|
|
/* See if we can merge left */
|
|
|
|
(map_len < ee_len) && /*L1*/
|
|
|
|
(ex > EXT_FIRST_EXTENT(eh))) { /*L2*/
|
2011-10-27 23:43:23 +08:00
|
|
|
ext4_lblk_t prev_lblk;
|
|
|
|
ext4_fsblk_t prev_pblk, ee_pblk;
|
2013-04-04 11:33:27 +08:00
|
|
|
unsigned int prev_len;
|
2011-10-27 23:43:23 +08:00
|
|
|
|
2013-04-04 11:33:27 +08:00
|
|
|
abut_ex = ex - 1;
|
|
|
|
prev_lblk = le32_to_cpu(abut_ex->ee_block);
|
|
|
|
prev_len = ext4_ext_get_actual_len(abut_ex);
|
|
|
|
prev_pblk = ext4_ext_pblock(abut_ex);
|
2011-10-27 23:43:23 +08:00
|
|
|
ee_pblk = ext4_ext_pblock(ex);
|
|
|
|
|
|
|
|
/*
|
2013-04-04 11:33:27 +08:00
|
|
|
* A transfer of blocks from 'ex' to 'abut_ex' is allowed
|
2011-10-27 23:43:23 +08:00
|
|
|
* under these conditions:
|
2013-04-04 11:33:27 +08:00
|
|
|
* - C1: abut_ex is initialized,
|
|
|
|
* - C2: abut_ex is logically abutting ex,
|
|
|
|
* - C3: abut_ex is physically abutting ex,
|
|
|
|
* - C4: abut_ex can receive the additional blocks without
|
2011-10-27 23:43:23 +08:00
|
|
|
* overflowing the (initialized) length limit.
|
|
|
|
*/
|
2014-04-21 11:45:47 +08:00
|
|
|
if ((!ext4_ext_is_unwritten(abut_ex)) && /*C1*/
|
2011-10-27 23:43:23 +08:00
|
|
|
((prev_lblk + prev_len) == ee_block) && /*C2*/
|
|
|
|
((prev_pblk + prev_len) == ee_pblk) && /*C3*/
|
2013-04-04 11:33:27 +08:00
|
|
|
(prev_len < (EXT_INIT_MAX_LEN - map_len))) { /*C4*/
|
2011-10-27 23:43:23 +08:00
|
|
|
err = ext4_ext_get_access(handle, inode, path + depth);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
trace_ext4_ext_convert_to_initialized_fastpath(inode,
|
2013-04-04 11:33:27 +08:00
|
|
|
map, ex, abut_ex);
|
2011-10-27 23:43:23 +08:00
|
|
|
|
2013-04-04 11:33:27 +08:00
|
|
|
/* Shift the start of ex by 'map_len' blocks */
|
|
|
|
ex->ee_block = cpu_to_le32(ee_block + map_len);
|
|
|
|
ext4_ext_store_pblock(ex, ee_pblk + map_len);
|
|
|
|
ex->ee_len = cpu_to_le16(ee_len - map_len);
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_mark_unwritten(ex); /* Restore the flag */
|
2011-10-27 23:43:23 +08:00
|
|
|
|
2013-04-04 11:33:27 +08:00
|
|
|
/* Extend abut_ex by 'map_len' blocks */
|
|
|
|
abut_ex->ee_len = cpu_to_le16(prev_len + map_len);
|
2011-10-27 23:43:23 +08:00
|
|
|
|
2013-04-04 11:33:27 +08:00
|
|
|
/* Result: number of initialized blocks past m_lblk */
|
|
|
|
allocated = map_len;
|
|
|
|
}
|
|
|
|
} else if (((map->m_lblk + map_len) == (ee_block + ee_len)) &&
|
|
|
|
(map_len < ee_len) && /*L1*/
|
|
|
|
ex < EXT_LAST_EXTENT(eh)) { /*L2*/
|
|
|
|
/* See if we can merge right */
|
|
|
|
ext4_lblk_t next_lblk;
|
|
|
|
ext4_fsblk_t next_pblk, ee_pblk;
|
|
|
|
unsigned int next_len;
|
|
|
|
|
|
|
|
abut_ex = ex + 1;
|
|
|
|
next_lblk = le32_to_cpu(abut_ex->ee_block);
|
|
|
|
next_len = ext4_ext_get_actual_len(abut_ex);
|
|
|
|
next_pblk = ext4_ext_pblock(abut_ex);
|
|
|
|
ee_pblk = ext4_ext_pblock(ex);
|
2011-10-27 23:43:23 +08:00
|
|
|
|
2013-04-04 11:33:27 +08:00
|
|
|
/*
|
|
|
|
* A transfer of blocks from 'ex' to 'abut_ex' is allowed
|
|
|
|
* under these conditions:
|
|
|
|
* - C1: abut_ex is initialized,
|
|
|
|
* - C2: abut_ex is logically abutting ex,
|
|
|
|
* - C3: abut_ex is physically abutting ex,
|
|
|
|
* - C4: abut_ex can receive the additional blocks without
|
|
|
|
* overflowing the (initialized) length limit.
|
|
|
|
*/
|
2014-04-21 11:45:47 +08:00
|
|
|
if ((!ext4_ext_is_unwritten(abut_ex)) && /*C1*/
|
2013-04-04 11:33:27 +08:00
|
|
|
((map->m_lblk + map_len) == next_lblk) && /*C2*/
|
|
|
|
((ee_pblk + ee_len) == next_pblk) && /*C3*/
|
|
|
|
(next_len < (EXT_INIT_MAX_LEN - map_len))) { /*C4*/
|
|
|
|
err = ext4_ext_get_access(handle, inode, path + depth);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
trace_ext4_ext_convert_to_initialized_fastpath(inode,
|
|
|
|
map, ex, abut_ex);
|
|
|
|
|
|
|
|
/* Shift the start of abut_ex by 'map_len' blocks */
|
|
|
|
abut_ex->ee_block = cpu_to_le32(next_lblk - map_len);
|
|
|
|
ext4_ext_store_pblock(abut_ex, next_pblk - map_len);
|
|
|
|
ex->ee_len = cpu_to_le16(ee_len - map_len);
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_mark_unwritten(ex); /* Restore the flag */
|
2013-04-04 11:33:27 +08:00
|
|
|
|
|
|
|
/* Extend abut_ex by 'map_len' blocks */
|
|
|
|
abut_ex->ee_len = cpu_to_le16(next_len + map_len);
|
2011-10-27 23:43:23 +08:00
|
|
|
|
|
|
|
/* Result: number of initialized blocks past m_lblk */
|
2013-04-04 11:33:27 +08:00
|
|
|
allocated = map_len;
|
2011-10-27 23:43:23 +08:00
|
|
|
}
|
|
|
|
}
|
2013-04-04 11:33:27 +08:00
|
|
|
if (allocated) {
|
|
|
|
/* Mark the block containing both extents as dirty */
|
2020-04-27 09:34:38 +08:00
|
|
|
err = ext4_ext_dirty(handle, inode, path + depth);
|
2013-04-04 11:33:27 +08:00
|
|
|
|
|
|
|
/* Update path to point to the right extent */
|
|
|
|
path[depth].p_ext = abut_ex;
|
|
|
|
goto out;
|
|
|
|
} else
|
|
|
|
allocated = ee_len - (map->m_lblk - ee_block);
|
2011-10-27 23:43:23 +08:00
|
|
|
|
2011-05-04 00:25:07 +08:00
|
|
|
WARN_ON(map->m_lblk < ee_block);
|
2010-05-16 18:00:00 +08:00
|
|
|
/*
|
|
|
|
* It is safe to convert extent to initialized via explicit
|
2014-01-07 03:05:23 +08:00
|
|
|
* zeroout only if extent is fully inside i_size or new_size.
|
2010-05-16 18:00:00 +08:00
|
|
|
*/
|
2011-05-04 00:25:07 +08:00
|
|
|
split_flag |= ee_block + ee_len <= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
|
2010-05-16 18:00:00 +08:00
|
|
|
|
2012-08-17 21:54:17 +08:00
|
|
|
if (EXT4_EXT_MAY_ZEROOUT & split_flag)
|
|
|
|
max_zeroout = sbi->s_extent_max_zeroout_kb >>
|
2013-03-13 00:40:04 +08:00
|
|
|
(inode->i_sb->s_blocksize_bits - 10);
|
2012-08-17 21:54:17 +08:00
|
|
|
|
2007-07-18 09:42:38 +08:00
|
|
|
/*
|
2017-05-27 05:40:52 +08:00
|
|
|
* five cases:
|
2011-05-04 00:25:07 +08:00
|
|
|
* 1. split the extent into three extents.
|
2017-05-27 05:40:52 +08:00
|
|
|
* 2. split the extent into two extents, zeroout the head of the first
|
|
|
|
* extent.
|
|
|
|
* 3. split the extent into two extents, zeroout the tail of the second
|
|
|
|
* extent.
|
2011-05-04 00:25:07 +08:00
|
|
|
* 4. split the extent into two extents without zeroout.
|
2017-05-27 05:40:52 +08:00
|
|
|
* 5. no splitting needed, just possibly zeroout the head and / or the
|
|
|
|
* tail of the extent.
|
2007-07-18 09:42:38 +08:00
|
|
|
*/
|
2011-05-04 00:25:07 +08:00
|
|
|
split_map.m_lblk = map->m_lblk;
|
|
|
|
split_map.m_len = map->m_len;
|
|
|
|
|
2017-05-27 05:40:52 +08:00
|
|
|
if (max_zeroout && (allocated > split_map.m_len)) {
|
2012-08-17 21:54:17 +08:00
|
|
|
if (allocated <= max_zeroout) {
|
2017-05-27 05:40:52 +08:00
|
|
|
/* case 3 or 5 */
|
|
|
|
zero_ex1.ee_block =
|
|
|
|
cpu_to_le32(split_map.m_lblk +
|
|
|
|
split_map.m_len);
|
|
|
|
zero_ex1.ee_len =
|
|
|
|
cpu_to_le16(allocated - split_map.m_len);
|
|
|
|
ext4_ext_store_pblock(&zero_ex1,
|
|
|
|
ext4_ext_pblock(ex) + split_map.m_lblk +
|
|
|
|
split_map.m_len - ee_block);
|
|
|
|
err = ext4_ext_zeroout(inode, &zero_ex1);
|
2007-07-18 09:42:38 +08:00
|
|
|
if (err)
|
2021-08-13 23:20:48 +08:00
|
|
|
goto fallback;
|
2011-05-04 00:25:07 +08:00
|
|
|
split_map.m_len = allocated;
|
2017-05-27 05:40:52 +08:00
|
|
|
}
|
|
|
|
if (split_map.m_lblk - ee_block + split_map.m_len <
|
|
|
|
max_zeroout) {
|
|
|
|
/* case 2 or 5 */
|
|
|
|
if (split_map.m_lblk != ee_block) {
|
|
|
|
zero_ex2.ee_block = ex->ee_block;
|
|
|
|
zero_ex2.ee_len = cpu_to_le16(split_map.m_lblk -
|
2011-05-04 00:25:07 +08:00
|
|
|
ee_block);
|
2017-05-27 05:40:52 +08:00
|
|
|
ext4_ext_store_pblock(&zero_ex2,
|
2011-05-04 00:25:07 +08:00
|
|
|
ext4_ext_pblock(ex));
|
2017-05-27 05:40:52 +08:00
|
|
|
err = ext4_ext_zeroout(inode, &zero_ex2);
|
2011-05-04 00:25:07 +08:00
|
|
|
if (err)
|
2021-08-13 23:20:48 +08:00
|
|
|
goto fallback;
|
2011-05-04 00:25:07 +08:00
|
|
|
}
|
|
|
|
|
2017-05-27 05:40:52 +08:00
|
|
|
split_map.m_len += split_map.m_lblk - ee_block;
|
2011-05-04 00:25:07 +08:00
|
|
|
split_map.m_lblk = ee_block;
|
2011-05-16 22:11:09 +08:00
|
|
|
allocated = map->m_len;
|
2007-07-18 09:42:38 +08:00
|
|
|
}
|
|
|
|
}
|
2011-05-04 00:25:07 +08:00
|
|
|
|
2021-08-13 23:20:48 +08:00
|
|
|
fallback:
|
2014-10-30 22:53:17 +08:00
|
|
|
err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag,
|
|
|
|
flags);
|
|
|
|
if (err > 0)
|
|
|
|
err = 0;
|
2007-07-18 09:42:38 +08:00
|
|
|
out:
|
2013-03-11 09:13:05 +08:00
|
|
|
/* If we have gotten a failure, don't zero out status tree */
|
2017-05-27 05:40:52 +08:00
|
|
|
if (!err) {
|
|
|
|
err = ext4_zeroout_es(inode, &zero_ex1);
|
|
|
|
if (!err)
|
|
|
|
err = ext4_zeroout_es(inode, &zero_ex2);
|
|
|
|
}
|
2007-07-18 09:42:38 +08:00
|
|
|
return err ? err : allocated;
|
|
|
|
}
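The "merge left" fast path in the function above moves blocks from the unwritten extent into an initialized neighbor instead of inserting a new extent. The following is a simplified user-space sketch of that idea, not the kernel implementation: struct toy_extent is a host-endian stand-in for the on-disk extent, the value of INIT_MAX_LEN is assumed, and journaling and tree-position checks (L2) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INIT_MAX_LEN (1U << 15)	/* assumed value of EXT_INIT_MAX_LEN */

/* Simplified stand-in for struct ext4_extent (illustration only). */
struct toy_extent {
	uint32_t lblk;		/* first logical block */
	uint64_t pblk;		/* first physical block */
	uint32_t len;		/* length in blocks */
	bool unwritten;
};

/*
 * Try to move the first map_len blocks of the unwritten extent 'ex'
 * into its left neighbor 'prev'. Returns map_len on success and 0 if
 * the fast path does not apply.
 */
static uint32_t toy_merge_left(struct toy_extent *prev, struct toy_extent *ex,
			       uint32_t map_lblk, uint32_t map_len)
{
	/* The write must start at the extent and not cover all of it (L1). */
	if (map_lblk != ex->lblk || map_len >= ex->len)
		return 0;

	if (prev->unwritten ||				/* C1 */
	    prev->lblk + prev->len != ex->lblk ||	/* C2 */
	    prev->pblk + prev->len != ex->pblk ||	/* C3 */
	    prev->len + map_len >= INIT_MAX_LEN)	/* C4 */
		return 0;

	/* Shift the start of ex, then extend prev by the same amount. */
	ex->lblk += map_len;
	ex->pblk += map_len;
	ex->len -= map_len;
	prev->len += map_len;
	return map_len;
}
```

C4 is written here as a sum rather than a difference so the unsigned arithmetic cannot wrap; for in-range lengths it is the same test as in the function above.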
|
|
|
|
|
2009-09-29 03:49:08 +08:00
|
|
|
/*
|
2010-05-17 07:00:00 +08:00
|
|
|
* This function is called by ext4_ext_map_blocks() from
|
2009-09-29 03:49:08 +08:00
|
|
|
* ext4_get_blocks_dio_write() when DIO writes
|
2014-04-21 11:45:47 +08:00
|
|
|
* to an unwritten extent.
|
2009-09-29 03:49:08 +08:00
|
|
|
*
|
2014-04-21 11:45:47 +08:00
|
|
|
* Writing to an unwritten extent may result in splitting the unwritten
|
|
|
|
* extent into multiple initialized/unwritten extents (up to three)
|
2009-09-29 03:49:08 +08:00
|
|
|
* There are three possibilities:
|
2014-04-21 11:45:47 +08:00
|
|
|
* a> There is no split required: Entire extent should be unwritten
|
2009-09-29 03:49:08 +08:00
|
|
|
* b> Splits in two extents: Write is happening at either end of the extent
|
|
|
|
* c> Splits in three extents: Someone is writing in the middle of the extent
|
|
|
|
*
|
2014-03-19 06:05:35 +08:00
|
|
|
* This works the same way in the case of initialized -> unwritten conversion.
|
|
|
|
*
|
2009-09-29 03:49:08 +08:00
|
|
|
* One or more index blocks may be needed if the extent tree grows after
|
2014-04-21 11:45:47 +08:00
|
|
|
* the unwritten extent split. To prevent ENOSPC from occurring at IO
|
|
|
|
* completion, we need to split the unwritten extent before DIO submits
|
|
|
|
* the IO. The unwritten extent handled at this time will be split
|
|
|
|
* into at most three unwritten extents. After the IO completes, the part
|
2009-09-29 03:49:08 +08:00
|
|
|
* being filled will be converted to initialized by the end_io callback function
|
|
|
|
* via ext4_convert_unwritten_extents().
|
2009-11-06 17:01:23 +08:00
|
|
|
*
|
2014-04-21 11:45:47 +08:00
|
|
|
* Returns the size of unwritten extent to be written on success.
|
2009-09-29 03:49:08 +08:00
|
|
|
*/
|
2014-03-19 06:05:35 +08:00
|
|
|
static int ext4_split_convert_extents(handle_t *handle,
|
2009-09-29 03:49:08 +08:00
|
|
|
struct inode *inode,
|
2010-05-17 07:00:00 +08:00
|
|
|
struct ext4_map_blocks *map,
|
2014-09-02 02:37:09 +08:00
|
|
|
struct ext4_ext_path **ppath,
|
2009-09-29 03:49:08 +08:00
|
|
|
int flags)
|
|
|
|
{
|
2014-09-02 02:37:09 +08:00
|
|
|
struct ext4_ext_path *path = *ppath;
|
2011-05-04 00:25:07 +08:00
|
|
|
ext4_lblk_t eof_block;
|
|
|
|
ext4_lblk_t ee_block;
|
|
|
|
struct ext4_extent *ex;
|
|
|
|
unsigned int ee_len;
|
|
|
|
int split_flag = 0, depth;
|
2010-05-16 18:00:00 +08:00
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "logical block %llu, max_blocks %u\n",
|
2014-03-19 06:05:35 +08:00
|
|
|
(unsigned long long)map->m_lblk, map->m_len);
|
2010-05-16 18:00:00 +08:00
|
|
|
|
2020-03-31 18:50:16 +08:00
|
|
|
eof_block = (EXT4_I(inode)->i_disksize + inode->i_sb->s_blocksize - 1)
|
|
|
|
>> inode->i_sb->s_blocksize_bits;
|
2010-05-17 07:00:00 +08:00
|
|
|
if (eof_block < map->m_lblk + map->m_len)
|
|
|
|
eof_block = map->m_lblk + map->m_len;
|
2010-05-16 18:00:00 +08:00
|
|
|
/*
|
|
|
|
* It is safe to convert extent to initialized via explicit
|
2020-06-11 11:19:46 +08:00
|
|
|
* zeroout only if extent is fully inside i_size or new_size.
|
2010-05-16 18:00:00 +08:00
|
|
|
*/
|
2011-05-04 00:25:07 +08:00
|
|
|
depth = ext_depth(inode);
|
|
|
|
ex = path[depth].p_ext;
|
|
|
|
ee_block = le32_to_cpu(ex->ee_block);
|
|
|
|
ee_len = ext4_ext_get_actual_len(ex);
|
2009-09-29 03:49:08 +08:00
|
|
|
|
2014-03-19 06:05:35 +08:00
|
|
|
/* Convert to unwritten */
|
|
|
|
if (flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN) {
|
|
|
|
split_flag |= EXT4_EXT_DATA_VALID1;
|
|
|
|
/* Convert to initialized */
|
|
|
|
} else if (flags & EXT4_GET_BLOCKS_CONVERT) {
|
|
|
|
split_flag |= ee_block + ee_len <= eof_block ?
|
|
|
|
EXT4_EXT_MAY_ZEROOUT : 0;
|
2014-04-21 11:45:47 +08:00
|
|
|
split_flag |= (EXT4_EXT_MARK_UNWRIT2 | EXT4_EXT_DATA_VALID2);
|
2014-03-19 06:05:35 +08:00
|
|
|
}
|
2011-05-04 00:25:07 +08:00
|
|
|
flags |= EXT4_GET_BLOCKS_PRE_IO;
|
2014-09-02 02:37:09 +08:00
|
|
|
return ext4_split_extent(handle, inode, ppath, map, split_flag, flags);
|
2009-09-29 03:49:08 +08:00
|
|
|
}
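Both conversion functions above compute eof_block the same way and only allow zeroout when the whole extent ends at or before it. A minimal sketch of that arithmetic, with hypothetical helper names and plain integers standing in for the inode and superblock fields:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Block index just past EOF: the on-disk size rounded up to whole
 * blocks, then extended so that it covers the requested range.
 */
static uint32_t toy_eof_block(uint64_t disksize, unsigned int blocksize_bits,
			      uint32_t m_lblk, uint32_t m_len)
{
	uint64_t blocksize = 1ULL << blocksize_bits;
	uint32_t eof_block =
		(uint32_t)((disksize + blocksize - 1) >> blocksize_bits);

	if (eof_block < m_lblk + m_len)
		eof_block = m_lblk + m_len;
	return eof_block;
}

/* Zeroout is only considered safe if the extent ends at or before EOF. */
static bool toy_may_zeroout(uint32_t ee_block, uint32_t ee_len,
			    uint32_t eof_block)
{
	return ee_block + ee_len <= eof_block;
}
```

For example, with 4 KiB blocks (blocksize_bits of 12) a disk size of 4097 bytes rounds up to two blocks, so an extent ending at block 2 may be zeroed out while one ending at block 3 may not, unless the request itself reaches that far.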
|
2011-05-03 23:45:29 +08:00
|
|
|
|
2010-03-03 02:28:44 +08:00
|
|
|
static int ext4_convert_unwritten_extents_endio(handle_t *handle,
|
2012-10-10 13:04:58 +08:00
|
|
|
struct inode *inode,
|
|
|
|
struct ext4_map_blocks *map,
|
2014-09-02 02:37:09 +08:00
|
|
|
struct ext4_ext_path **ppath)
|
2009-09-29 03:49:08 +08:00
|
|
|
{
|
2014-09-02 02:37:09 +08:00
|
|
|
struct ext4_ext_path *path = *ppath;
|
2009-09-29 03:49:08 +08:00
|
|
|
struct ext4_extent *ex;
|
2012-10-10 13:04:58 +08:00
|
|
|
ext4_lblk_t ee_block;
|
|
|
|
unsigned int ee_len;
|
2009-09-29 03:49:08 +08:00
|
|
|
int depth;
|
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
depth = ext_depth(inode);
|
|
|
|
ex = path[depth].p_ext;
|
2012-10-10 13:04:58 +08:00
|
|
|
ee_block = le32_to_cpu(ex->ee_block);
|
|
|
|
ee_len = ext4_ext_get_actual_len(ex);
|
2009-09-29 03:49:08 +08:00
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "logical block %llu, max_blocks %u\n",
|
2012-10-10 13:04:58 +08:00
|
|
|
(unsigned long long)ee_block, ee_len);
|
|
|
|
|
2013-03-04 13:41:05 +08:00
|
|
|
/* If extent is larger than requested it is a clear sign that we still
|
|
|
|
* have some extent state machine issues left. So extent_split is still
|
|
|
|
* required.
|
|
|
|
* TODO: Once all related issues are fixed, this situation should be
|
|
|
|
* illegal.
|
|
|
|
*/
|
2012-10-10 13:04:58 +08:00
|
|
|
if (ee_block != map->m_lblk || ee_len > map->m_len) {
|
2019-08-23 10:53:46 +08:00
|
|
|
#ifdef CONFIG_EXT4_DEBUG
|
|
|
|
ext4_warning(inode->i_sb, "Inode (%ld) finished: extent logical block %llu,"
|
2016-04-27 13:11:21 +08:00
|
|
|
" len %u; IO logical block %llu, len %u",
|
2013-03-04 13:41:05 +08:00
|
|
|
inode->i_ino, (unsigned long long)ee_block, ee_len,
|
|
|
|
(unsigned long long)map->m_lblk, map->m_len);
|
|
|
|
#endif
|
2014-09-02 02:37:09 +08:00
|
|
|
err = ext4_split_convert_extents(handle, inode, map, ppath,
|
2014-03-19 06:05:35 +08:00
|
|
|
EXT4_GET_BLOCKS_CONVERT);
|
2012-10-10 13:04:58 +08:00
|
|
|
if (err < 0)
|
2014-09-02 02:37:09 +08:00
|
|
|
return err;
|
2014-09-02 02:43:09 +08:00
|
|
|
path = ext4_find_extent(inode, map->m_lblk, ppath, 0);
|
2014-09-02 02:37:09 +08:00
|
|
|
if (IS_ERR(path))
|
|
|
|
return PTR_ERR(path);
|
2012-10-10 13:04:58 +08:00
|
|
|
depth = ext_depth(inode);
|
|
|
|
ex = path[depth].p_ext;
|
|
|
|
}
|
2011-05-03 23:45:29 +08:00
|
|
|
|
2009-09-29 03:49:08 +08:00
|
|
|
err = ext4_ext_get_access(handle, inode, path + depth);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
|
|
|
/* first mark the extent as initialized */
|
|
|
|
ext4_ext_mark_initialized(ex);
|
|
|
|
|
2011-05-03 23:45:29 +08:00
|
|
|
/* note: ext4_ext_correct_indexes() isn't needed here because
|
|
|
|
* borders are not changed
|
2009-09-29 03:49:08 +08:00
|
|
|
*/
|
2012-08-17 21:44:17 +08:00
|
|
|
ext4_ext_try_to_merge(handle, inode, path, ex);
|
2011-05-03 23:45:29 +08:00
|
|
|
|
2009-09-29 03:49:08 +08:00
|
|
|
/* Mark modified extent as dirty */
|
2012-08-17 21:44:17 +08:00
|
|
|
err = ext4_ext_dirty(handle, inode, path + path->p_depth);
|
2009-09-29 03:49:08 +08:00
|
|
|
out:
|
|
|
|
ext4_ext_show_leaf(inode, path);
|
|
|
|
return err;
|
|
|
|
}
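The end_io path above only falls back to a split when the extent found in the tree does not match the completed IO range. That check can be sketched as a small predicate; toy_needs_split() is a hypothetical name and the arguments are plain integers rather than kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the extent [ee_block, ee_len] must be split before it can be
 * converted for an IO covering [m_lblk, m_len]: either it starts at a
 * different block or it is longer than the IO.
 */
static bool toy_needs_split(uint32_t ee_block, uint32_t ee_len,
			    uint32_t m_lblk, uint32_t m_len)
{
	return ee_block != m_lblk || ee_len > m_len;
}
```

An extent that starts at the IO's first block and is no longer than the IO is converted in place; anything else goes through the split first.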
|
|
|
|
|
2014-03-19 06:05:35 +08:00
|
|
|
static int
|
2014-09-02 02:35:09 +08:00
|
|
|
convert_initialized_extent(handle_t *handle, struct inode *inode,
|
|
|
|
struct ext4_map_blocks *map,
|
2016-02-23 11:58:55 +08:00
|
|
|
struct ext4_ext_path **ppath,
|
2020-02-19 04:26:56 +08:00
|
|
|
unsigned int *allocated)
|
2014-03-19 06:05:35 +08:00
|
|
|
{
|
2014-09-02 02:36:09 +08:00
|
|
|
struct ext4_ext_path *path = *ppath;
|
2014-09-02 02:35:09 +08:00
|
|
|
struct ext4_extent *ex;
|
|
|
|
ext4_lblk_t ee_block;
|
|
|
|
unsigned int ee_len;
|
|
|
|
int depth;
|
2014-03-19 06:05:35 +08:00
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Make sure that the extent is no bigger than we support with
|
2014-04-21 11:45:47 +08:00
|
|
|
* unwritten extent
|
2014-03-19 06:05:35 +08:00
|
|
|
*/
|
2014-04-21 11:45:47 +08:00
|
|
|
if (map->m_len > EXT_UNWRITTEN_MAX_LEN)
|
|
|
|
map->m_len = EXT_UNWRITTEN_MAX_LEN / 2;
|
2014-03-19 06:05:35 +08:00
|
|
|
|
2014-09-02 02:35:09 +08:00
|
|
|
depth = ext_depth(inode);
|
|
|
|
ex = path[depth].p_ext;
|
|
|
|
ee_block = le32_to_cpu(ex->ee_block);
|
|
|
|
ee_len = ext4_ext_get_actual_len(ex);
|
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "logical block %llu, max_blocks %u\n",
|
2014-09-02 02:35:09 +08:00
|
|
|
(unsigned long long)ee_block, ee_len);
|
|
|
|
|
|
|
|
if (ee_block != map->m_lblk || ee_len > map->m_len) {
|
2014-09-02 02:37:09 +08:00
|
|
|
err = ext4_split_convert_extents(handle, inode, map, ppath,
|
2014-09-02 02:35:09 +08:00
|
|
|
EXT4_GET_BLOCKS_CONVERT_UNWRITTEN);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
2014-09-02 02:43:09 +08:00
|
|
|
path = ext4_find_extent(inode, map->m_lblk, ppath, 0);
|
2014-09-02 02:35:09 +08:00
|
|
|
if (IS_ERR(path))
|
|
|
|
return PTR_ERR(path);
|
|
|
|
depth = ext_depth(inode);
|
|
|
|
ex = path[depth].p_ext;
|
|
|
|
if (!ex) {
|
|
|
|
EXT4_ERROR_INODE(inode, "unexpected hole at %lu",
|
|
|
|
(unsigned long) map->m_lblk);
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2014-09-02 02:35:09 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
err = ext4_ext_get_access(handle, inode, path + depth);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
/* first mark the extent as unwritten */
|
|
|
|
ext4_ext_mark_unwritten(ex);
|
|
|
|
|
|
|
|
/* note: ext4_ext_correct_indexes() isn't needed here because
|
|
|
|
* borders are not changed
|
|
|
|
*/
|
|
|
|
ext4_ext_try_to_merge(handle, inode, path, ex);
|
|
|
|
|
|
|
|
/* Mark modified extent as dirty */
|
|
|
|
err = ext4_ext_dirty(handle, inode, path + path->p_depth);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
ext4_ext_show_leaf(inode, path);
|
|
|
|
|
|
|
|
ext4_update_inode_fsync_trans(handle, inode, 1);
|
ext4: remove EXT4_EOFBLOCKS_FL and associated code
The EXT4_EOFBLOCKS_FL inode flag is used to indicate whether a file
contains unwritten blocks past i_size. It's set when ext4_fallocate
is called with the KEEP_SIZE flag to extend a file with an unwritten
extent. However, this flag hasn't been useful functionally since
March, 2012, when a decision was made to remove it from ext4.
All traces of EXT4_EOFBLOCKS_FL were removed from e2fsprogs version
1.42.2 by commit 010dc7b90d97 ("e2fsck: remove EXT4_EOFBLOCKS_FL flag
handling") at that time. Now that enough time has passed to make
e2fsprogs versions containing this modification common, this patch now
removes the code associated with EXT4_EOFBLOCKS_FL from the kernel as
well.
This change has two implications. First, because pre-1.42.2 e2fsck
versions only look for a problem if EXT4_EOFBLOCKS_FL is set, and
because that bit will never be set by newer kernels containing this
patch, old versions of e2fsck won't have a compatibility problem with
files created by newer kernels.
Second, newer kernels will not clear EXT4_EOFBLOCKS_FL inode flag bits
belonging to a file written by an older kernel. If set, it will remain
in that state until the file is deleted. Because e2fsck versions since
1.42.2 don't check the flag at all, no adverse effect is expected.
However, pre-1.42.2 e2fsck versions that do check the flag may report
that it is set when it ought not to be after a file has been truncated
or had its unwritten blocks written. In this case, the old version of
e2fsck will offer to clear the flag. No adverse effect would then
occur whether the user chooses to clear the flag or not.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20200211210216.24960-1-enwlinux@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-02-12 05:02:16 +08:00
|
|
|
|
2014-03-19 06:05:35 +08:00
|
|
|
map->m_flags |= EXT4_MAP_UNWRITTEN;
|
2020-02-19 04:26:56 +08:00
|
|
|
if (*allocated > map->m_len)
|
|
|
|
*allocated = map->m_len;
|
|
|
|
map->m_len = *allocated;
|
|
|
|
return 0;
|
2014-03-19 06:05:35 +08:00
|
|
|
}
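The length clamp at the top of the function above follows from how the unwritten state is stored in the 16-bit length field. The helpers that do this are defined outside this section, so the sketch below is an assumption-labeled model of that encoding: lengths up to 1 << 15 mean an initialized extent, and larger values mean an unwritten extent whose real length is the value minus 1 << 15. All toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_INIT_MAX_LEN      (1U << 15)		/* assumed */
#define TOY_UNWRITTEN_MAX_LEN (TOY_INIT_MAX_LEN - 1)	/* assumed */

/* Raw field values above the initialized maximum denote "unwritten". */
static bool toy_is_unwritten(uint16_t ee_len)
{
	return ee_len > TOY_INIT_MAX_LEN;
}

/* Real length in blocks, with the unwritten marker stripped. */
static unsigned int toy_actual_len(uint16_t ee_len)
{
	return ee_len <= TOY_INIT_MAX_LEN ? ee_len
					  : ee_len - TOY_INIT_MAX_LEN;
}

/* Set the marker bit; only meaningful for 1..TOY_UNWRITTEN_MAX_LEN. */
static uint16_t toy_mark_unwritten(uint16_t ee_len)
{
	return (uint16_t)(ee_len | TOY_INIT_MAX_LEN);
}
```

Under this model an unwritten extent can hold one block fewer than an initialized one, because the raw value 1 << 15 already means "initialized, 32768 blocks".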
|
|
|
|
|
2009-09-29 03:49:08 +08:00
|
|
|
static int
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode,
|
2010-05-17 07:00:00 +08:00
|
|
|
struct ext4_map_blocks *map,
|
2014-09-02 02:37:09 +08:00
|
|
|
struct ext4_ext_path **ppath, int flags,
|
2010-05-17 07:00:00 +08:00
|
|
|
unsigned int allocated, ext4_fsblk_t newblock)
|
2009-09-29 03:49:08 +08:00
|
|
|
{
|
2020-05-10 14:24:53 +08:00
|
|
|
struct ext4_ext_path __maybe_unused *path = *ppath;
|
2009-09-29 03:49:08 +08:00
|
|
|
int ret = 0;
|
|
|
|
int err = 0;
|
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "logical block %llu, max_blocks %u, flags 0x%x, allocated %u\n",
|
|
|
|
(unsigned long long)map->m_lblk, map->m_len, flags,
|
|
|
|
allocated);
|
2009-09-29 03:49:08 +08:00
|
|
|
ext4_ext_show_leaf(inode, path);
|
|
|
|
|
2013-04-10 10:11:22 +08:00
|
|
|
/*
|
2014-04-21 11:45:47 +08:00
|
|
|
* When writing into unwritten space, we should not fail to
|
2013-04-10 10:11:22 +08:00
|
|
|
* allocate metadata blocks for the new extent block if needed.
|
|
|
|
*/
|
|
|
|
flags |= EXT4_GET_BLOCKS_METADATA_NOFAIL;
|
|
|
|
|
2014-04-21 11:45:47 +08:00
|
|
|
trace_ext4_ext_handle_unwritten_extents(inode, map, flags,
|
2012-11-09 03:33:43 +08:00
|
|
|
allocated, newblock);
|
2011-09-10 07:18:51 +08:00
|
|
|
|
2020-05-01 02:53:19 +08:00
|
|
|
/* get_block() before submitting IO, split the extent */
|
2014-05-13 00:55:07 +08:00
|
|
|
if (flags & EXT4_GET_BLOCKS_PRE_IO) {
|
2014-09-02 02:37:09 +08:00
|
|
|
ret = ext4_split_convert_extents(handle, inode, map, ppath,
|
|
|
|
flags | EXT4_GET_BLOCKS_CONVERT);
|
2020-05-01 02:53:19 +08:00
|
|
|
if (ret < 0) {
|
|
|
|
err = ret;
|
|
|
|
goto out2;
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
* shouldn't get a 0 return when splitting an extent unless
|
|
|
|
* m_len is 0 (bug) or extent has been corrupted
|
|
|
|
*/
|
|
|
|
if (unlikely(ret == 0)) {
|
|
|
|
EXT4_ERROR_INODE(inode,
|
|
|
|
"unexpected ret == 0, m_len = %u",
|
|
|
|
map->m_len);
|
|
|
|
err = -EFSCORRUPTED;
|
|
|
|
goto out2;
|
|
|
|
}
|
2013-02-18 13:28:04 +08:00
|
|
|
map->m_flags |= EXT4_MAP_UNWRITTEN;
|
2009-09-29 03:49:08 +08:00
|
|
|
goto out;
|
|
|
|
}
|
2010-03-03 02:28:44 +08:00
|
|
|
/* IO end_io complete, convert the filled extent to written */
|
2014-05-13 00:55:07 +08:00
|
|
|
if (flags & EXT4_GET_BLOCKS_CONVERT) {
|
2020-05-01 02:53:18 +08:00
|
|
|
err = ext4_convert_unwritten_extents_endio(handle, inode, map,
|
2014-09-02 02:37:09 +08:00
|
|
|
ppath);
|
2020-05-01 02:53:18 +08:00
|
|
|
if (err < 0)
|
|
|
|
goto out2;
|
|
|
|
ext4_update_inode_fsync_trans(handle, inode, 1);
|
|
|
|
goto map_out;
|
2009-09-29 03:49:08 +08:00
|
|
|
}
|
2020-05-01 02:53:18 +08:00
|
|
|
/* buffered IO cases */
|
2009-09-29 03:49:08 +08:00
|
|
|
/*
|
|
|
|
* repeat fallocate creation request
|
|
|
|
* we already have an unwritten extent
|
|
|
|
*/
|
2014-04-21 11:45:47 +08:00
|
|
|
if (flags & EXT4_GET_BLOCKS_UNWRIT_EXT) {
|
2013-02-18 13:28:04 +08:00
|
|
|
map->m_flags |= EXT4_MAP_UNWRITTEN;
|
2009-09-29 03:49:08 +08:00
|
|
|
goto map_out;
|
2013-02-18 13:28:04 +08:00
|
|
|
}
|
2009-09-29 03:49:08 +08:00
|
|
|
|
|
|
|
/* buffered READ or buffered write_begin() lookup */
|
|
|
|
if ((flags & EXT4_GET_BLOCKS_CREATE) == 0) {
|
|
|
|
/*
|
|
|
|
* We have blocks reserved already. We
|
|
|
|
* return allocated blocks so that delalloc
|
|
|
|
* won't do block reservation for us. But
|
|
|
|
* the buffer head will be unmapped so that
|
|
|
|
* a read from the block returns 0s.
|
|
|
|
*/
|
2010-05-17 07:00:00 +08:00
|
|
|
map->m_flags |= EXT4_MAP_UNWRITTEN;
|
2009-09-29 03:49:08 +08:00
|
|
|
goto out1;
|
|
|
|
}
|
|
|
|
|
2020-05-01 02:53:20 +08:00
|
|
|
/*
|
|
|
|
* Default case when (flags & EXT4_GET_BLOCKS_CREATE) == 1.
|
|
|
|
* For buffered writes, at writepage time, etc. Convert a
|
|
|
|
* discovered unwritten extent to written.
|
|
|
|
*/
|
2014-09-02 02:37:09 +08:00
|
|
|
ret = ext4_ext_convert_to_initialized(handle, inode, map, ppath, flags);
|
2020-05-01 02:53:20 +08:00
|
|
|
if (ret < 0) {
|
2009-09-29 03:49:08 +08:00
|
|
|
err = ret;
|
|
|
|
goto out2;
|
2020-05-01 02:53:19 +08:00
|
|
|
}
|
2020-05-01 02:53:20 +08:00
|
|
|
ext4_update_inode_fsync_trans(handle, inode, 1);
|
|
|
|
/*
|
|
|
|
* shouldn't get a 0 return when converting an unwritten extent
|
|
|
|
* unless m_len is 0 (bug) or extent has been corrupted
|
|
|
|
*/
|
|
|
|
if (unlikely(ret == 0)) {
|
|
|
|
EXT4_ERROR_INODE(inode, "unexpected ret == 0, m_len = %u",
|
|
|
|
map->m_len);
|
|
|
|
err = -EFSCORRUPTED;
|
|
|
|
goto out2;
|
|
|
|
}
|
|
|
|
|
2020-05-01 02:53:19 +08:00
|
|
|
out:
|
|
|
|
allocated = ret;
|
2010-05-17 07:00:00 +08:00
|
|
|
map->m_flags |= EXT4_MAP_NEW;
|
2009-09-29 03:49:08 +08:00
|
|
|
map_out:
|
2010-05-17 07:00:00 +08:00
|
|
|
map->m_flags |= EXT4_MAP_MAPPED;
|
2009-09-29 03:49:08 +08:00
|
|
|
out1:
|
2020-05-01 02:53:18 +08:00
|
|
|
map->m_pblk = newblock;
|
2010-05-17 07:00:00 +08:00
|
|
|
if (allocated > map->m_len)
|
|
|
|
allocated = map->m_len;
|
|
|
|
map->m_len = allocated;
|
2020-05-01 02:53:18 +08:00
|
|
|
ext4_ext_show_leaf(inode, path);
|
2009-09-29 03:49:08 +08:00
|
|
|
out2:
|
|
|
|
return err ? err : allocated;
|
|
|
|
}
|
2010-10-28 09:23:12 +08:00
|
|
|
|
2011-09-10 06:52:51 +08:00
|
|
|
/*
|
|
|
|
* get_implied_cluster_alloc - check to see if the requested
|
|
|
|
* allocation (in the map structure) overlaps with a cluster already
|
|
|
|
* allocated in an extent.
|
2011-09-10 07:18:51 +08:00
|
|
|
* @sb The filesystem superblock structure
|
2011-09-10 06:52:51 +08:00
|
|
|
* @map The requested lblk->pblk mapping
|
|
|
|
* @ex The extent structure which might contain an implied
|
|
|
|
* cluster allocation
|
|
|
|
*
|
|
|
|
* This function is called by ext4_ext_map_blocks() after we failed to
|
|
|
|
* find blocks that were already in the inode's extent tree. Hence,
|
|
|
|
* we know that the beginning of the requested region cannot overlap
|
|
|
|
* the extent from the inode's extent tree. There are three cases we
|
|
|
|
* want to catch. The first is this case:
|
|
|
|
*
|
|
|
|
* |--- cluster # N--|
|
|
|
|
* |--- extent ---| |---- requested region ---|
|
|
|
|
* |==========|
|
|
|
|
*
|
|
|
|
* The second case that we need to test for is this one:
|
|
|
|
*
|
|
|
|
* |--------- cluster # N ----------------|
|
|
|
|
* |--- requested region --| |------- extent ----|
|
|
|
|
* |=======================|
|
|
|
|
*
|
|
|
|
* The third case is when the requested region lies between two extents
|
|
|
|
* within the same cluster:
|
|
|
|
* |------------- cluster # N-------------|
|
|
|
|
* |----- ex -----| |---- ex_right ----|
|
|
|
|
* |------ requested region ------|
|
|
|
|
* |================|
|
|
|
|
*
|
|
|
|
* In each of the above cases, we need to set the map->m_pblk and
|
|
|
|
* map->m_len so that they correspond to the extent labelled as
|
|
|
|
* "|====|" from cluster #N, since it is already in use for data in
|
|
|
|
* cluster EXT4_B2C(sbi, map->m_lblk). We will then return 1 to
|
|
|
|
* signal to ext4_ext_map_blocks() that map->m_pblk should be treated
|
|
|
|
* as a new "allocated" block region. Otherwise, we will return 0 and
|
|
|
|
* ext4_ext_map_blocks() will then allocate one or more new clusters
|
|
|
|
* by calling ext4_mb_new_blocks().
|
|
|
|
*/
|
2011-09-10 07:18:51 +08:00
|
|
|
static int get_implied_cluster_alloc(struct super_block *sb,
|
2011-09-10 06:52:51 +08:00
|
|
|
struct ext4_map_blocks *map,
|
|
|
|
struct ext4_extent *ex,
|
|
|
|
struct ext4_ext_path *path)
|
|
|
|
{
|
2011-09-10 07:18:51 +08:00
|
|
|
struct ext4_sb_info *sbi = EXT4_SB(sb);
|
2013-12-20 22:29:35 +08:00
|
|
|
ext4_lblk_t c_offset = EXT4_LBLK_COFF(sbi, map->m_lblk);
|
2011-09-10 06:52:51 +08:00
|
|
|
ext4_lblk_t ex_cluster_start, ex_cluster_end;
|
2011-12-19 06:39:02 +08:00
|
|
|
ext4_lblk_t rr_cluster_start;
|
2011-09-10 06:52:51 +08:00
|
|
|
ext4_lblk_t ee_block = le32_to_cpu(ex->ee_block);
|
|
|
|
ext4_fsblk_t ee_start = ext4_ext_pblock(ex);
|
|
|
|
unsigned short ee_len = ext4_ext_get_actual_len(ex);
|
|
|
|
|
|
|
|
/* The extent passed in that we are trying to match */
|
|
|
|
ex_cluster_start = EXT4_B2C(sbi, ee_block);
|
|
|
|
ex_cluster_end = EXT4_B2C(sbi, ee_block + ee_len - 1);
|
|
|
|
|
|
|
|
/* The requested region passed into ext4_map_blocks() */
|
|
|
|
rr_cluster_start = EXT4_B2C(sbi, map->m_lblk);
|
|
|
|
|
|
|
|
if ((rr_cluster_start == ex_cluster_end) ||
|
|
|
|
(rr_cluster_start == ex_cluster_start)) {
|
|
|
|
if (rr_cluster_start == ex_cluster_end)
|
|
|
|
ee_start += ee_len - 1;
|
2013-12-20 22:29:35 +08:00
|
|
|
map->m_pblk = EXT4_PBLK_CMASK(sbi, ee_start) + c_offset;
|
2011-09-10 06:52:51 +08:00
|
|
|
map->m_len = min(map->m_len,
|
|
|
|
(unsigned) sbi->s_cluster_ratio - c_offset);
|
|
|
|
/*
|
|
|
|
* Check for and handle this case:
|
|
|
|
*
|
|
|
|
* |--------- cluster # N-------------|
|
|
|
|
* |------- extent ----|
|
|
|
|
* |--- requested region ---|
|
|
|
|
* |===========|
|
|
|
|
*/
|
|
|
|
|
|
|
|
if (map->m_lblk < ee_block)
|
|
|
|
map->m_len = min(map->m_len, ee_block - map->m_lblk);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check for the case where there is already another allocated
|
|
|
|
* block to the right of 'ex' but before the end of the cluster.
|
|
|
|
*
|
|
|
|
* |------------- cluster # N-------------|
|
|
|
|
* |----- ex -----| |---- ex_right ----|
|
|
|
|
* |------ requested region ------|
|
|
|
|
* |================|
|
|
|
|
*/
|
|
|
|
if (map->m_lblk > ee_block) {
|
|
|
|
ext4_lblk_t next = ext4_ext_next_allocated_block(path);
|
|
|
|
map->m_len = min(map->m_len, next - map->m_lblk);
|
|
|
|
}
|
2011-09-10 07:18:51 +08:00
|
|
|
|
|
|
|
trace_ext4_get_implied_cluster_alloc_exit(sb, map, 1);
|
2011-09-10 06:52:51 +08:00
|
|
|
return 1;
|
|
|
|
}
|
2011-09-10 07:18:51 +08:00
|
|
|
|
|
|
|
trace_ext4_get_implied_cluster_alloc_exit(sb, map, 0);
|
2011-09-10 06:52:51 +08:00
|
|
|
return 0;
|
|
|
|
}
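/*
 * Illustrative sketch only (user-space C, not kernel code): the cluster
 * arithmetic performed by get_implied_cluster_alloc() for the first two
 * cases described in its comment. struct cluster_geom and the helper
 * names are hypothetical stand-ins for struct ext4_sb_info, EXT4_B2C(),
 * EXT4_LBLK_COFF() and EXT4_PBLK_CMASK(). The third case (ex_right)
 * needs the extent path and is omitted here.
 */

```c
#include <stdint.h>

/* Hypothetical stand-in for the bigalloc fields of struct ext4_sb_info. */
struct cluster_geom {
	unsigned int cluster_bits;	/* log2(blocks per cluster) */
	unsigned int cluster_ratio;	/* blocks per cluster, 1 << cluster_bits */
};

/* Block number to cluster number (assumed equivalent of EXT4_B2C). */
static uint32_t blk_to_cluster(const struct cluster_geom *g, uint32_t lblk)
{
	return lblk >> g->cluster_bits;
}

/* Offset of a logical block within its cluster (cf. EXT4_LBLK_COFF). */
static uint32_t lblk_cluster_off(const struct cluster_geom *g, uint32_t lblk)
{
	return lblk & (g->cluster_ratio - 1);
}

/* First physical block of the cluster holding pblk (cf. EXT4_PBLK_CMASK). */
static uint64_t pblk_cluster_base(const struct cluster_geom *g, uint64_t pblk)
{
	return pblk & ~((uint64_t)g->cluster_ratio - 1);
}

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Returns 1 and fills *pblk and *len when the requested region starts in
 * a cluster already used by the extent; returns 0 otherwise.
 */
static int implied_alloc(const struct cluster_geom *g,
			 uint32_t ee_block, uint64_t ee_start, uint32_t ee_len,
			 uint32_t m_lblk, uint32_t m_len,
			 uint64_t *pblk, uint32_t *len)
{
	uint32_t c_off = lblk_cluster_off(g, m_lblk);
	uint32_t ex_cluster_start = blk_to_cluster(g, ee_block);
	uint32_t ex_cluster_end = blk_to_cluster(g, ee_block + ee_len - 1);
	uint32_t rr_cluster_start = blk_to_cluster(g, m_lblk);

	if (rr_cluster_start != ex_cluster_end &&
	    rr_cluster_start != ex_cluster_start)
		return 0;

	/* Use the extent's last block when the request shares its last cluster. */
	if (rr_cluster_start == ex_cluster_end)
		ee_start += ee_len - 1;

	*pblk = pblk_cluster_base(g, ee_start) + c_off;
	/* Never map past the end of the shared cluster. */
	*len = min_u32(m_len, g->cluster_ratio - c_off);
	/* Stop before the extent if the request begins to its left. */
	if (m_lblk < ee_block)
		*len = min_u32(*len, ee_block - m_lblk);
	return 1;
}
```

/*
 * With 16 blocks per cluster, an extent covering logical blocks 0-3 at
 * physical block 1600 and a request for 20 blocks at logical block 8
 * give pblk 1608 and a length of 8, i.e. the remainder of the cluster.
 */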
|
|
|
|
|
|
|
|
|
2008-01-29 12:58:27 +08:00
|
|
|
/*
|
2008-02-26 04:29:55 +08:00
|
|
|
* Block allocation/map/preallocation routine for extents based files
|
|
|
|
*
|
|
|
|
*
|
2008-01-29 12:58:27 +08:00
|
|
|
* Needs to be called with
|
2008-01-29 12:58:26 +08:00
|
|
|
* down_read(&EXT4_I(inode)->i_data_sem) if not allocating file system block
|
|
|
|
* (i.e., create is zero); otherwise with down_write(&EXT4_I(inode)->i_data_sem)
|
2008-02-26 04:29:55 +08:00
|
|
|
*
|
2020-08-05 10:48:50 +08:00
|
|
|
* return > 0, number of blocks already mapped/allocated
|
2008-02-26 04:29:55 +08:00
|
|
|
* if create == 0 and these are pre-allocated blocks
|
|
|
|
* buffer head is unmapped
|
|
|
|
* otherwise blocks are mapped
|
|
|
|
*
|
|
|
|
* return = 0, if plain look up failed (blocks have not been allocated)
|
|
|
|
* buffer head is unmapped
|
|
|
|
*
|
|
|
|
* return < 0, error case.
|
2008-01-29 12:58:27 +08:00
|
|
|
*/
|
2010-05-17 07:00:00 +08:00
|
|
|
int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
|
|
|
|
struct ext4_map_blocks *map, int flags)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
|
|
|
struct ext4_ext_path *path = NULL;
|
2020-10-28 13:56:17 +08:00
|
|
|
struct ext4_extent newex, *ex, ex2;
|
2011-09-10 06:52:51 +08:00
|
|
|
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
|
2020-05-10 23:58:05 +08:00
|
|
|
ext4_fsblk_t newblock = 0, pblk;
|
2020-03-12 04:50:33 +08:00
|
|
|
int err = 0, depth, ret;
|
2011-09-10 06:52:51 +08:00
|
|
|
unsigned int allocated = 0, offset = 0;
|
2011-10-29 21:23:38 +08:00
|
|
|
unsigned int allocated_clusters = 0;
|
2008-01-29 13:19:52 +08:00
|
|
|
struct ext4_allocation_request ar;
|
2011-09-10 06:52:51 +08:00
|
|
|
ext4_lblk_t cluster_offset;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "blocks %u/%u requested\n", map->m_lblk, map->m_len);
|
2011-03-22 09:38:05 +08:00
|
|
|
trace_ext4_ext_map_blocks_enter(inode, map->m_lblk, map->m_len, flags);
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
/* find extent for this block */
|
2014-09-02 02:43:09 +08:00
|
|
|
path = ext4_find_extent(inode, map->m_lblk, NULL, 0);
|
2006-10-11 16:21:03 +08:00
|
|
|
if (IS_ERR(path)) {
|
|
|
|
err = PTR_ERR(path);
|
|
|
|
path = NULL;
|
2020-05-10 23:58:05 +08:00
|
|
|
goto out;
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
depth = ext_depth(inode);
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* consistent leaf must not be empty;
|
|
|
|
* this situation is possible, though, _during_ tree modification;
|
2014-09-02 02:43:09 +08:00
|
|
|
* this is why assert can't be put in ext4_find_extent()
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
2010-03-03 00:46:09 +08:00
|
|
|
if (unlikely(path[depth].p_ext == NULL && depth != 0)) {
|
|
|
|
EXT4_ERROR_INODE(inode, "bad extent address "
|
2010-05-17 11:00:00 +08:00
|
|
|
"lblock: %lu, depth: %d pblock %lld",
|
|
|
|
(unsigned long) map->m_lblk, depth,
|
|
|
|
path[depth].p_block);
|
2015-10-18 04:16:04 +08:00
|
|
|
err = -EFSCORRUPTED;
|
2020-05-10 23:58:05 +08:00
|
|
|
goto out;
|
2009-12-14 22:53:52 +08:00
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2006-12-07 12:41:33 +08:00
|
|
|
ex = path[depth].p_ext;
|
|
|
|
if (ex) {
|
2008-01-29 12:58:27 +08:00
|
|
|
ext4_lblk_t ee_block = le32_to_cpu(ex->ee_block);
|
2010-10-28 09:30:14 +08:00
|
|
|
ext4_fsblk_t ee_start = ext4_ext_pblock(ex);
|
2007-07-18 09:42:41 +08:00
|
|
|
unsigned short ee_len;
|
2006-10-11 16:21:06 +08:00
|
|
|
|
2014-03-19 06:05:35 +08:00
|
|
|
|
2006-10-11 16:21:06 +08:00
|
|
|
/*
|
2014-04-21 11:45:47 +08:00
|
|
|
* unwritten extents are treated as holes, except that
|
2007-07-18 09:42:38 +08:00
|
|
|
* we split out initialized portions during a write.
|
2006-10-11 16:21:06 +08:00
|
|
|
*/
|
2007-07-18 09:42:41 +08:00
|
|
|
ee_len = ext4_ext_get_actual_len(ex);
|
2011-09-10 07:18:51 +08:00
|
|
|
|
|
|
|
trace_ext4_ext_show_extent(inode, ee_block, ee_start, ee_len);
|
|
|
|
|
2006-10-11 16:21:07 +08:00
|
|
|
/* if found extent covers block, simply return it */
|
2010-05-17 07:00:00 +08:00
|
|
|
if (in_range(map->m_lblk, ee_block, ee_len)) {
|
|
|
|
newblock = map->m_lblk - ee_block + ee_start;
|
2006-10-11 16:21:07 +08:00
|
|
|
/* number of remaining blocks in the extent */
|
2010-05-17 07:00:00 +08:00
|
|
|
allocated = ee_len - (map->m_lblk - ee_block);
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "%u fit into %u:%d -> %llu\n",
|
|
|
|
map->m_lblk, ee_block, ee_len, newblock);
|
2007-07-18 09:42:38 +08:00
|
|
|
|
2014-03-19 06:05:35 +08:00
|
|
|
/*
|
|
|
|
* If the extent is initialized check whether the
|
|
|
|
* caller wants to convert it to unwritten.
|
|
|
|
*/
|
2014-04-21 11:45:47 +08:00
|
|
|
if ((!ext4_ext_is_unwritten(ex)) &&
|
2014-03-19 06:05:35 +08:00
|
|
|
(flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN)) {
|
2020-02-19 04:26:56 +08:00
|
|
|
err = convert_initialized_extent(handle,
|
|
|
|
inode, map, &path, &allocated);
|
2020-05-10 23:58:05 +08:00
|
|
|
goto out;
|
2020-02-19 04:26:56 +08:00
|
|
|
} else if (!ext4_ext_is_unwritten(ex)) {
|
2020-05-10 23:58:05 +08:00
|
|
|
map->m_flags |= EXT4_MAP_MAPPED;
|
|
|
|
map->m_pblk = newblock;
|
|
|
|
if (allocated > map->m_len)
|
|
|
|
allocated = map->m_len;
|
|
|
|
map->m_len = allocated;
|
|
|
|
ext4_ext_show_leaf(inode, path);
|
2012-03-20 11:05:43 +08:00
|
|
|
goto out;
|
2020-02-19 04:26:56 +08:00
|
|
|
}
|
2013-02-18 13:31:07 +08:00
|
|
|
|
2014-04-21 11:45:47 +08:00
|
|
|
ret = ext4_ext_handle_unwritten_extents(
|
2014-09-02 02:37:09 +08:00
|
|
|
handle, inode, map, &path, flags,
|
2012-03-20 11:05:43 +08:00
|
|
|
allocated, newblock);
|
2014-02-20 07:52:39 +08:00
|
|
|
if (ret < 0)
|
|
|
|
err = ret;
|
|
|
|
else
|
|
|
|
allocated = ret;
|
2020-05-10 23:58:05 +08:00
|
|
|
goto out;
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* requested block isn't allocated yet;
|
2006-10-11 16:21:03 +08:00
|
|
|
* we can't try to create the block if the create flag is zero
|
|
|
|
*/
|
2009-05-14 12:58:52 +08:00
|
|
|
if ((flags & EXT4_GET_BLOCKS_CREATE) == 0) {
|
2016-03-10 11:46:57 +08:00
|
|
|
ext4_lblk_t hole_start, hole_len;
|
|
|
|
|
2016-03-10 11:54:00 +08:00
|
|
|
hole_start = map->m_lblk;
|
|
|
|
hole_len = ext4_ext_determine_hole(inode, path, &hole_start);
|
2007-07-18 09:42:38 +08:00
|
|
|
/*
|
|
|
|
* put the gap we just found into the cache to speed up
|
|
|
|
* subsequent requests
|
|
|
|
*/
|
2016-03-10 11:46:57 +08:00
|
|
|
ext4_ext_put_gap_in_cache(inode, hole_start, hole_len);
|
2016-03-10 11:54:00 +08:00
|
|
|
|
|
|
|
/* Update hole_len to reflect hole size after map->m_lblk */
|
|
|
|
if (hole_start != map->m_lblk)
|
|
|
|
hole_len -= map->m_lblk - hole_start;
|
|
|
|
map->m_pblk = 0;
|
|
|
|
map->m_len = min_t(unsigned int, map->m_len, hole_len);
|
|
|
|
|
2020-05-10 23:58:05 +08:00
|
|
|
goto out;
|
2006-10-11 16:21:03 +08:00
|
|
|
}
|
2011-09-10 06:52:51 +08:00
|
|
|
|
2006-10-11 16:21:03 +08:00
|
|
|
/*
|
2008-10-10 21:40:52 +08:00
|
|
|
* Okay, we need to do block allocation.
|
2006-10-11 16:21:24 +08:00
|
|
|
*/
|
2011-09-10 06:52:51 +08:00
|
|
|
newex.ee_block = cpu_to_le32(map->m_lblk);
|
2014-01-07 03:00:23 +08:00
|
|
|
cluster_offset = EXT4_LBLK_COFF(sbi, map->m_lblk);
|
2011-09-10 06:52:51 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If we are doing bigalloc, check to see if the extent returned
|
2014-09-02 02:43:09 +08:00
|
|
|
* by ext4_find_extent() implies a cluster we can use.
|
2011-09-10 06:52:51 +08:00
|
|
|
*/
|
|
|
|
if (cluster_offset && ex &&
|
2011-09-10 07:18:51 +08:00
|
|
|
get_implied_cluster_alloc(inode->i_sb, map, ex, path)) {
|
2011-09-10 06:52:51 +08:00
|
|
|
ar.len = allocated = map->m_len;
|
|
|
|
newblock = map->m_pblk;
|
|
|
|
goto got_allocated_blocks;
|
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2008-01-29 13:19:52 +08:00
|
|
|
/* find neighbouring allocated blocks */
|
2010-05-17 07:00:00 +08:00
|
|
|
ar.lleft = map->m_lblk;
|
2008-01-29 13:19:52 +08:00
|
|
|
err = ext4_ext_search_left(inode, path, &ar.lleft, &ar.pleft);
|
|
|
|
if (err)
|
2020-05-10 23:58:05 +08:00
|
|
|
goto out;
|
2010-05-17 07:00:00 +08:00
|
|
|
ar.lright = map->m_lblk;
|
2011-09-10 06:52:51 +08:00
|
|
|
err = ext4_ext_search_right(inode, path, &ar.lright, &ar.pright, &ex2);
|
2020-10-28 13:56:17 +08:00
|
|
|
if (err < 0)
|
2020-05-10 23:58:05 +08:00
|
|
|
goto out;
|
2007-05-25 01:04:13 +08:00
|
|
|
|
2011-09-10 06:52:51 +08:00
|
|
|
/* Check if the extent after searching to the right implies a
|
|
|
|
* cluster we can use. */
|
2020-10-28 13:56:17 +08:00
|
|
|
if ((sbi->s_cluster_ratio > 1) && err &&
|
|
|
|
get_implied_cluster_alloc(inode->i_sb, map, &ex2, path)) {
|
2011-09-10 06:52:51 +08:00
|
|
|
ar.len = allocated = map->m_len;
|
|
|
|
newblock = map->m_pblk;
|
|
|
|
goto got_allocated_blocks;
|
|
|
|
}
|
|
|
|
|
2007-07-18 21:02:56 +08:00
|
|
|
/*
|
|
|
|
* See if request is beyond maximum number of blocks we can have in
|
|
|
|
* a single extent. For an initialized extent this limit is
|
2014-04-21 11:45:47 +08:00
|
|
|
* EXT_INIT_MAX_LEN and for an unwritten extent this limit is
|
|
|
|
* EXT_UNWRITTEN_MAX_LEN.
|
2007-07-18 21:02:56 +08:00
|
|
|
*/
|
2010-05-17 07:00:00 +08:00
|
|
|
if (map->m_len > EXT_INIT_MAX_LEN &&
|
2014-04-21 11:45:47 +08:00
|
|
|
!(flags & EXT4_GET_BLOCKS_UNWRIT_EXT))
|
2010-05-17 07:00:00 +08:00
|
|
|
map->m_len = EXT_INIT_MAX_LEN;
|
2014-04-21 11:45:47 +08:00
|
|
|
else if (map->m_len > EXT_UNWRITTEN_MAX_LEN &&
|
|
|
|
(flags & EXT4_GET_BLOCKS_UNWRIT_EXT))
|
|
|
|
map->m_len = EXT_UNWRITTEN_MAX_LEN;
|
2007-07-18 21:02:56 +08:00
|
|
|
|
2010-05-17 07:00:00 +08:00
|
|
|
/* Check if we can really insert (m_lblk)::(m_lblk + m_len) extent */
|
|
|
|
newex.ee_len = cpu_to_le16(map->m_len);
|
2011-09-10 06:52:51 +08:00
|
|
|
err = ext4_ext_check_overlap(sbi, inode, &newex, path);
|
2007-05-25 01:04:13 +08:00
|
|
|
if (err)
|
2008-01-29 12:58:27 +08:00
|
|
|
allocated = ext4_ext_get_actual_len(&newex);
|
2007-05-25 01:04:13 +08:00
|
|
|
else
|
2010-05-17 07:00:00 +08:00
|
|
|
allocated = map->m_len;
|
2008-01-29 13:19:52 +08:00
|
|
|
|
|
|
|
/* allocate new block */
|
|
|
|
ar.inode = inode;
|
2010-05-17 07:00:00 +08:00
|
|
|
ar.goal = ext4_ext_find_goal(inode, path, map->m_lblk);
|
|
|
|
ar.logical = map->m_lblk;
|
2011-09-10 06:52:51 +08:00
|
|
|
/*
|
|
|
|
* We calculate the offset from the beginning of the cluster
|
|
|
|
* for the logical block number, since when we allocate a
|
|
|
|
* physical cluster, the physical block should start at the
|
|
|
|
* same offset from the beginning of the cluster. This is
|
|
|
|
* needed so that future calls to get_implied_cluster_alloc()
|
|
|
|
* work correctly.
|
|
|
|
*/
|
2013-12-20 22:29:35 +08:00
|
|
|
offset = EXT4_LBLK_COFF(sbi, map->m_lblk);
|
2011-09-10 06:52:51 +08:00
|
|
|
ar.len = EXT4_NUM_B2C(sbi, offset+allocated);
|
|
|
|
ar.goal -= offset;
|
|
|
|
ar.logical -= offset;
|
2008-01-29 13:19:52 +08:00
|
|
|
if (S_ISREG(inode->i_mode))
|
|
|
|
ar.flags = EXT4_MB_HINT_DATA;
|
|
|
|
else
|
|
|
|
/* disable in-core preallocation for non-regular files */
|
|
|
|
ar.flags = 0;
|
2011-05-25 19:41:54 +08:00
|
|
|
if (flags & EXT4_GET_BLOCKS_NO_NORMALIZE)
|
|
|
|
ar.flags |= EXT4_MB_HINT_NOPREALLOC;
|
2014-09-05 06:07:25 +08:00
|
|
|
if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
|
|
|
|
ar.flags |= EXT4_MB_DELALLOC_RESERVED;
|
2015-06-21 13:25:29 +08:00
|
|
|
if (flags & EXT4_GET_BLOCKS_METADATA_NOFAIL)
|
|
|
|
ar.flags |= EXT4_MB_USE_RESERVED;
|
2008-01-29 13:19:52 +08:00
|
|
|
newblock = ext4_mb_new_blocks(handle, &ar, &err);
|
2006-10-11 16:21:03 +08:00
|
|
|
if (!newblock)
|
2020-05-10 23:58:05 +08:00
|
|
|
goto out;
|
2011-09-10 07:04:51 +08:00
|
|
|
allocated_clusters = ar.len;
|
2011-09-10 06:52:51 +08:00
|
|
|
ar.len = EXT4_C2B(sbi, ar.len) - offset;
|
2020-05-10 14:24:55 +08:00
|
|
|
ext_debug(inode, "allocate new block: goal %llu, found %llu/%u, requested %u\n",
|
2020-05-10 14:24:52 +08:00
|
|
|
ar.goal, newblock, ar.len, allocated);
|
2011-09-10 06:52:51 +08:00
|
|
|
if (ar.len > allocated)
|
|
|
|
ar.len = allocated;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2011-09-10 06:52:51 +08:00
|
|
|
got_allocated_blocks:
|
2006-10-11 16:21:03 +08:00
|
|
|
/* try to insert new extent into found leaf and return */
|
2020-05-10 23:58:05 +08:00
|
|
|
pblk = newblock + offset;
|
|
|
|
ext4_ext_store_pblock(&newex, pblk);
|
2008-01-29 13:19:52 +08:00
|
|
|
newex.ee_len = cpu_to_le16(ar.len);
|
2014-04-21 11:45:47 +08:00
|
|
|
/* Mark unwritten */
|
2020-03-12 04:50:33 +08:00
|
|
|
if (flags & EXT4_GET_BLOCKS_UNWRIT_EXT) {
|
2014-04-21 11:45:47 +08:00
|
|
|
ext4_ext_mark_unwritten(&newex);
|
2013-02-18 13:28:04 +08:00
|
|
|
map->m_flags |= EXT4_MAP_UNWRITTEN;
|
2009-09-29 03:48:29 +08:00
|
|
|
}
|
2010-02-24 22:52:53 +08:00
|
|
|
|
2020-02-12 05:02:16 +08:00
|
|
|
err = ext4_ext_insert_extent(handle, inode, &path, &newex, flags);
|
2020-03-12 04:50:33 +08:00
|
|
|
if (err) {
|
|
|
|
if (allocated_clusters) {
|
|
|
|
int fb_flags = 0;
|
2012-09-29 11:36:25 +08:00
|
|
|
|
2020-03-12 04:50:33 +08:00
|
|
|
/*
|
|
|
|
* free data blocks we just allocated.
|
|
|
|
* not a good idea to call discard here directly,
|
|
|
|
* but otherwise we'd need to call it every free().
|
|
|
|
*/
|
2020-08-17 15:36:15 +08:00
|
|
|
ext4_discard_preallocations(inode, 0);
|
2020-03-12 04:50:33 +08:00
|
|
|
if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
|
|
|
|
fb_flags = EXT4_FREE_BLOCKS_NO_QUOT_UPDATE;
|
|
|
|
ext4_free_blocks(handle, inode, NULL, newblock,
|
|
|
|
EXT4_C2B(sbi, allocated_clusters),
|
|
|
|
fb_flags);
|
|
|
|
}
|
2020-05-10 23:58:05 +08:00
|
|
|
goto out;
|
2007-05-25 01:04:25 +08:00
|
|
|
}
|
2006-10-11 16:21:03 +08:00
|
|
|
|
2010-01-25 17:00:31 +08:00
|
|
|
/*
|
2018-10-02 02:24:08 +08:00
|
|
|
* Reduce the reserved cluster count to reflect successful deferred
|
|
|
|
* allocation of delayed allocated clusters or direct allocation of
|
|
|
|
* clusters discovered to be delayed allocated. Once allocated, a
|
|
|
|
* cluster is not included in the reserved count.
|
2010-01-25 17:00:31 +08:00
|
|
|
*/
|
2020-03-12 04:51:25 +08:00
|
|
|
if (test_opt(inode->i_sb, DELALLOC) && allocated_clusters) {
|
2018-10-02 02:24:08 +08:00
|
|
|
if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) {
|
2013-03-11 10:46:30 +08:00
|
|
|
/*
|
2018-10-02 02:24:08 +08:00
|
|
|
* When allocating delayed allocated clusters, simply
|
|
|
|
* reduce the reserved cluster count and claim quota
|
2013-03-11 10:46:30 +08:00
|
|
|
*/
|
|
|
|
ext4_da_update_reserve_space(inode, allocated_clusters,
|
|
|
|
1);
|
2018-10-02 02:24:08 +08:00
|
|
|
} else {
|
|
|
|
ext4_lblk_t lblk, len;
|
|
|
|
unsigned int n;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* When allocating non-delayed allocated clusters
|
|
|
|
* (from fallocate, filemap, DIO, or clusters
|
|
|
|
* allocated when delalloc has been disabled by
|
|
|
|
* ext4_nonda_switch), reduce the reserved cluster
|
|
|
|
* count by the number of allocated clusters that
|
|
|
|
* have previously been delayed allocated. Quota
|
|
|
|
* has been claimed by ext4_mb_new_blocks() above,
|
|
|
|
* so release the quota reservations made for any
|
|
|
|
* previously delayed allocated clusters.
|
|
|
|
*/
|
|
|
|
lblk = EXT4_LBLK_CMASK(sbi, map->m_lblk);
|
|
|
|
len = allocated_clusters << sbi->s_cluster_bits;
|
|
|
|
n = ext4_es_delayed_clu(inode, lblk, len);
|
|
|
|
if (n > 0)
|
|
|
|
ext4_da_update_reserve_space(inode, (int) n, 0);
|
2011-09-10 07:04:51 +08:00
|
|
|
}
|
|
|
|
}
|
2010-01-25 17:00:31 +08:00
|
|
|
|
2009-12-09 12:51:10 +08:00
|
|
|
/*
|
|
|
|
* Cache the extent and update transaction to commit on fdatasync only
|
2014-04-21 11:45:47 +08:00
|
|
|
* when it is _not_ an unwritten extent.
|
2009-12-09 12:51:10 +08:00
|
|
|
*/
|
2014-04-21 11:45:47 +08:00
|
|
|
if ((flags & EXT4_GET_BLOCKS_UNWRIT_EXT) == 0)
|
2009-12-09 12:51:10 +08:00
|
|
|
ext4_update_inode_fsync_trans(handle, inode, 1);
|
2013-02-18 13:31:07 +08:00
|
|
|
else
|
2009-12-09 12:51:10 +08:00
|
|
|
ext4_update_inode_fsync_trans(handle, inode, 0);
|
2020-05-10 23:58:05 +08:00
|
|
|
|
|
|
|
map->m_flags |= (EXT4_MAP_NEW | EXT4_MAP_MAPPED);
|
|
|
|
map->m_pblk = pblk;
|
|
|
|
map->m_len = ar.len;
|
|
|
|
allocated = map->m_len;
|
2006-10-11 16:21:03 +08:00
|
|
|
ext4_ext_show_leaf(inode, path);
|
2020-05-10 23:58:05 +08:00
|
|
|
out:
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2011-05-25 19:41:46 +08:00
|
|
|
|
2013-07-16 22:28:47 +08:00
|
|
|
trace_ext4_ext_map_blocks_exit(inode, flags, map,
|
|
|
|
err ? err : allocated);
|
2012-03-20 11:05:43 +08:00
|
|
|
return err ? err : allocated;
|
2006-10-11 16:21:03 +08:00
|
|
|
}
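/*
 * Illustrative sketch only (user-space C, not kernel code): how the
 * allocation path above sizes a request in whole clusters and then
 * converts the result back to blocks. The function names are
 * hypothetical; the arithmetic assumes EXT4_NUM_B2C() rounds up to
 * whole clusters and EXT4_C2B() shifts by the cluster bits.
 */

```c
#include <stdint.h>

/*
 * Clusters to request so that 'wanted' blocks starting at m_lblk are
 * covered, counting from the start of m_lblk's cluster (cf. ar.len before
 * the ext4_mb_new_blocks() call).
 */
static unsigned int clusters_needed(unsigned int cluster_bits,
				    uint32_t m_lblk, unsigned int wanted)
{
	unsigned int ratio = 1U << cluster_bits;
	unsigned int offset = m_lblk & (ratio - 1);

	return (offset + wanted + ratio - 1) >> cluster_bits;
}

/*
 * Blocks usable by the caller once 'granted' clusters were allocated:
 * skip the leading 'offset' blocks of the first cluster and never report
 * more than was wanted (cf. ar.len after the allocation).
 */
static unsigned int usable_blocks(unsigned int cluster_bits, uint32_t m_lblk,
				  unsigned int granted, unsigned int wanted)
{
	unsigned int ratio = 1U << cluster_bits;
	unsigned int offset = m_lblk & (ratio - 1);
	unsigned int len = (granted << cluster_bits) - offset;

	return len > wanted ? wanted : len;
}
```

/*
 * For 16-block clusters, a 30-block request at logical block 21 starts
 * 5 blocks into its cluster and therefore needs 3 clusters. If the
 * allocator grants only 1 cluster, 11 blocks are usable.
 */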
|
|
|
|
|
2016-11-14 11:02:28 +08:00
|
|
|
int ext4_ext_truncate(handle_t *handle, struct inode *inode)
|
2006-10-11 16:21:03 +08:00
|
|
|
{
|
|
|
|
struct super_block *sb = inode->i_sb;
|
2008-01-29 12:58:27 +08:00
|
|
|
ext4_lblk_t last_block;
|
2006-10-11 16:21:03 +08:00
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
/*
|
2006-10-11 16:21:07 +08:00
|
|
|
* TODO: optimization is possible here.
|
|
|
|
* Probably we need not scan at all,
|
|
|
|
* because page truncation is enough.
|
2006-10-11 16:21:03 +08:00
|
|
|
*/
|
|
|
|
|
|
|
|
/* we have to know where to truncate from in crash case */
|
|
|
|
EXT4_I(inode)->i_disksize = inode->i_size;
|
2016-11-14 11:02:28 +08:00
|
|
|
err = ext4_mark_inode_dirty(handle, inode);
|
|
|
|
if (err)
|
|
|
|
return err;
|
2006-10-11 16:21:03 +08:00
|
|
|
|
|
|
|
last_block = (inode->i_size + sb->s_blocksize - 1)
|
|
|
|
>> EXT4_BLOCK_SIZE_BITS(sb);
|
2013-07-15 12:09:19 +08:00
|
|
|
retry:
|
2012-11-09 10:57:32 +08:00
|
|
|
err = ext4_es_remove_extent(inode, last_block,
|
|
|
|
EXT_MAX_BLOCKS - last_block);
|
2013-07-30 00:12:56 +08:00
|
|
|
if (err == -ENOMEM) {
|
2022-01-15 06:07:14 +08:00
|
|
|
memalloc_retry_wait(GFP_ATOMIC);
|
2013-07-15 12:09:19 +08:00
|
|
|
goto retry;
|
|
|
|
}
|
2016-11-14 11:02:28 +08:00
|
|
|
if (err)
|
|
|
|
return err;
|
2020-05-08 01:50:28 +08:00
|
|
|
retry_remove_space:
|
|
|
|
err = ext4_ext_remove_space(inode, last_block, EXT_MAX_BLOCKS - 1);
|
|
|
|
if (err == -ENOMEM) {
|
2022-01-15 06:07:14 +08:00
|
|
|
memalloc_retry_wait(GFP_ATOMIC);
|
2020-05-08 01:50:28 +08:00
|
|
|
goto retry_remove_space;
|
|
|
|
}
|
|
|
|
return err;
|
2006-10-11 16:21:03 +08:00
|
|
|
}
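/*
 * Illustrative sketch only (user-space C, not kernel code): the
 * round-up that ext4_ext_truncate() uses to find the first logical
 * block lying entirely past i_size. The function name and the plain
 * integer parameters are hypothetical stand-ins for inode->i_size,
 * sb->s_blocksize and EXT4_BLOCK_SIZE_BITS(sb).
 */

```c
#include <stdint.h>

/* First logical block that holds no bytes of a file of size i_size. */
static uint32_t first_block_past_eof(uint64_t i_size, unsigned int blkbits)
{
	uint64_t blocksize = (uint64_t)1 << blkbits;

	/* Round the size up to a whole number of blocks. */
	return (uint32_t)((i_size + blocksize - 1) >> blkbits);
}
```

/*
 * With 4096-byte blocks, a 4096-byte file keeps block 0 and truncation
 * starts at block 1; one more byte moves the starting point to block 2.
 */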
|
|
|
|
|
2014-03-19 06:03:51 +08:00
|
|
|
static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset,
|
2014-08-28 06:40:00 +08:00
|
|
|
ext4_lblk_t len, loff_t new_size,
|
2017-08-06 10:15:45 +08:00
|
|
|
int flags)
|
2014-03-19 06:03:51 +08:00
|
|
|
{
|
|
|
|
struct inode *inode = file_inode(file);
|
|
|
|
handle_t *handle;
|
2021-03-21 12:45:37 +08:00
|
|
|
int ret = 0, ret2 = 0, ret3 = 0;
|
2014-03-19 06:03:51 +08:00
|
|
|
int retries = 0;
|
2015-06-15 12:20:46 +08:00
|
|
|
int depth = 0;
|
2014-03-19 06:03:51 +08:00
|
|
|
struct ext4_map_blocks map;
|
|
|
|
unsigned int credits;
|
2014-08-28 06:40:00 +08:00
|
|
|
loff_t epos;
|
2014-03-19 06:03:51 +08:00
|
|
|
|
2016-09-15 23:52:07 +08:00
|
|
|
BUG_ON(!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS));
|
2014-03-19 06:03:51 +08:00
|
|
|
map.m_lblk = offset;
|
2014-08-28 06:40:00 +08:00
|
|
|
map.m_len = len;
|
2014-03-19 06:03:51 +08:00
|
|
|
/*
|
|
|
|
* Don't normalize the request if it can fit in one extent so
|
|
|
|
* that it doesn't get unnecessarily split into multiple
|
|
|
|
* extents.
|
|
|
|
*/
|
2014-04-21 11:45:47 +08:00
|
|
|
if (len <= EXT_UNWRITTEN_MAX_LEN)
|
2014-03-19 06:03:51 +08:00
|
|
|
flags |= EXT4_GET_BLOCKS_NO_NORMALIZE;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* credits to insert 1 extent into extent tree
|
|
|
|
*/
|
|
|
|
credits = ext4_chunk_trans_blocks(inode, len);
|
2016-09-15 23:52:07 +08:00
|
|
|
depth = ext_depth(inode);
|
2014-03-19 06:03:51 +08:00
|
|
|
|
|
|
|
retry:
|
2021-01-14 06:14:03 +08:00
|
|
|
while (len) {
|
2015-06-15 12:20:46 +08:00
|
|
|
/*
|
|
|
|
* Recalculate credits when extent tree depth changes.
|
|
|
|
*/
|
2016-12-04 05:46:58 +08:00
|
|
|
if (depth != ext_depth(inode)) {
|
2015-06-15 12:20:46 +08:00
|
|
|
credits = ext4_chunk_trans_blocks(inode, len);
|
|
|
|
depth = ext_depth(inode);
|
|
|
|
}
|
|
|
|
|
2014-03-19 06:03:51 +08:00
|
|
|
handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS,
|
|
|
|
credits);
|
|
|
|
if (IS_ERR(handle)) {
|
|
|
|
ret = PTR_ERR(handle);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
ret = ext4_map_blocks(handle, inode, &map, flags);
|
|
|
|
if (ret <= 0) {
|
|
|
|
ext4_debug("inode #%lu: block %u: len %u: "
|
|
|
|
"ext4_map_blocks returned %d",
|
|
|
|
inode->i_ino, map.m_lblk,
|
|
|
|
map.m_len, ret);
|
|
|
|
ext4_mark_inode_dirty(handle, inode);
|
2021-01-14 06:14:03 +08:00
|
|
|
ext4_journal_stop(handle);
|
2014-03-19 06:03:51 +08:00
|
|
|
break;
|
|
|
|
}
|
2021-01-14 06:14:03 +08:00
|
|
|
/*
|
|
|
|
* allow a full retry cycle for any remaining allocations
|
|
|
|
*/
|
|
|
|
retries = 0;
|
2014-08-28 06:40:00 +08:00
|
|
|
map.m_lblk += ret;
|
|
|
|
map.m_len = len = len - ret;
|
|
|
|
epos = (loff_t)map.m_lblk << inode->i_blkbits;
|
2016-11-15 10:40:10 +08:00
|
|
|
inode->i_ctime = current_time(inode);
|
2014-08-28 06:40:00 +08:00
|
|
|
if (new_size) {
|
|
|
|
if (epos > new_size)
|
|
|
|
epos = new_size;
|
|
|
|
if (ext4_update_inode_size(inode, epos) & 0x1)
|
|
|
|
inode->i_mtime = inode->i_ctime;
|
|
|
|
}
|
2020-04-27 09:34:37 +08:00
|
|
|
ret2 = ext4_mark_inode_dirty(handle, inode);
|
2017-12-04 11:52:51 +08:00
|
|
|
ext4_update_inode_fsync_trans(handle, inode, 1);
|
2020-04-27 09:34:37 +08:00
|
|
|
ret3 = ext4_journal_stop(handle);
|
|
|
|
ret2 = ret3 ? ret3 : ret2;
|
|
|
|
if (unlikely(ret2))
|
2014-03-19 06:03:51 +08:00
|
|
|
break;
|
|
|
|
}
|
2021-01-14 06:14:03 +08:00
|
|
|
if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
|
2014-03-19 06:03:51 +08:00
|
|
|
goto retry;
|
|
|
|
|
|
|
|
return ret > 0 ? ret2 : ret;
|
|
|
|
}
|
|
|
|
|
2022-03-09 02:50:43 +08:00
|
|
|
static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len);
|
2020-01-01 02:04:40 +08:00
|
|
|
|
2022-03-09 02:50:43 +08:00
|
|
|
static int ext4_insert_range(struct file *file, loff_t offset, loff_t len);
|
2020-01-01 02:04:40 +08:00
|
|
|
|
2014-03-19 06:05:35 +08:00
|
|
|
static long ext4_zero_range(struct file *file, loff_t offset,
|
|
|
|
loff_t len, int mode)
|
|
|
|
{
|
|
|
|
struct inode *inode = file_inode(file);
|
2021-02-05 01:05:42 +08:00
|
|
|
struct address_space *mapping = file->f_mapping;
|
2014-03-19 06:05:35 +08:00
|
|
|
handle_t *handle = NULL;
|
|
|
|
unsigned int max_blocks;
|
|
|
|
loff_t new_size = 0;
|
|
|
|
int ret = 0;
|
|
|
|
int flags;
|
2014-08-28 06:33:49 +08:00
|
|
|
int credits;
|
2014-08-28 06:40:00 +08:00
|
|
|
int partial_begin, partial_end;
|
2014-03-19 06:05:35 +08:00
|
|
|
loff_t start, end;
|
|
|
|
ext4_lblk_t lblk;
|
|
|
|
unsigned int blkbits = inode->i_blkbits;
|
|
|
|
|
|
|
|
trace_ext4_zero_range(inode, offset, len, mode);
|
|
|
|
|
2014-05-28 00:48:55 +08:00
|
|
|
/* Call ext4_force_commit to flush all data in case of data=journal. */
|
|
|
|
if (ext4_should_journal_data(inode)) {
|
|
|
|
ret = ext4_force_commit(inode->i_sb);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2014-03-19 06:05:35 +08:00
|
|
|
/*
|
2020-06-11 11:19:46 +08:00
|
|
|
* Round up offset. This is not fallocate, we need to zero out
|
2014-03-19 06:05:35 +08:00
|
|
|
* blocks, so convert interior block aligned part of the range to
|
|
|
|
* unwritten and possibly manually zero out unaligned parts of the
|
|
|
|
* range.
|
|
|
|
*/
|
|
|
|
start = round_up(offset, 1 << blkbits);
|
|
|
|
end = round_down((offset + len), 1 << blkbits);
|
|
|
|
|
|
|
|
if (start < offset || end > offset + len)
|
|
|
|
return -EINVAL;
|
2014-08-28 06:40:00 +08:00
|
|
|
partial_begin = offset & ((1 << blkbits) - 1);
|
|
|
|
partial_end = (offset + len) & ((1 << blkbits) - 1);
|
2014-03-19 06:05:35 +08:00
|
|
|
|
|
|
|
lblk = start >> blkbits;
|
|
|
|
max_blocks = (end >> blkbits);
|
|
|
|
if (max_blocks < lblk)
|
|
|
|
max_blocks = 0;
|
|
|
|
else
|
|
|
|
max_blocks -= lblk;
|
|
|
|
|
2016-01-23 04:40:57 +08:00
|
|
|
inode_lock(inode);
|
2014-03-19 06:05:35 +08:00
|
|
|
|
|
|
|
/*
|
2020-05-04 04:06:47 +08:00
|
|
|
* Indirect files do not support unwritten extents
|
2014-03-19 06:05:35 +08:00
|
|
|
*/
|
|
|
|
if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
|
|
|
|
ret = -EOPNOTSUPP;
|
|
|
|
goto out_mutex;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!(mode & FALLOC_FL_KEEP_SIZE) &&
|
2020-01-01 02:04:38 +08:00
|
|
|
(offset + len > inode->i_size ||
|
ext4: fix interaction between i_size, fallocate, and delalloc after a crash
If there are pending writes subject to delayed allocation, then i_size
will show size after the writes have completed, while i_disksize
contains the value of i_size on the disk (since the writes have not
been persisted to disk).
If fallocate(2) is called with the FALLOC_FL_KEEP_SIZE flag, either
with or without the FALLOC_FL_ZERO_RANGE flag set, and the new size
after the fallocate(2) is between i_size and i_disksize, then after a
crash, if a journal commit has resulted in the changes made by the
fallocate() call to be persisted after a crash, but the delayed
allocation write has not resolved itself, i_size would not be updated,
and this would cause the following e2fsck complaint:
Inode 12, end of extent exceeds allowed value
(logical block 33, physical block 33441, len 7)
This can only take place on a sparse file, where the fallocate(2) call
is allocating blocks in a range which is before a pending delayed
allocation write which is extending i_size. Since this situation is
quite rare, and the window in which the crash must take place is
typically < 30 seconds, in practice this condition will rarely happen.
Nevertheless, it can be triggered in testing, and in particular by
xfstests generic/456.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org
2017-10-07 11:09:55 +08:00
|
|
|
offset + len > EXT4_I(inode)->i_disksize)) {
|
2014-03-19 06:05:35 +08:00
|
|
|
new_size = offset + len;
|
|
|
|
ret = inode_newsize_ok(inode, new_size);
|
|
|
|
if (ret)
|
|
|
|
goto out_mutex;
|
|
|
|
}
|
|
|
|
|
2015-04-03 12:09:13 +08:00
|
|
|
flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT;
|
|
|
|
|
2022-01-21 15:06:11 +08:00
|
|
|
/* Wait for all existing dio workers; newcomers will block on i_rwsem */
|
2015-12-08 03:29:17 +08:00
|
|
|
inode_dio_wait(inode);
|
|
|
|
|
2022-03-09 02:50:43 +08:00
|
|
|
ret = file_modified(file);
|
|
|
|
if (ret)
|
|
|
|
goto out_mutex;
|
|
|
|
|
2015-04-03 12:09:13 +08:00
|
|
|
/* Preallocate the range including the unaligned edges */
|
|
|
|
if (partial_begin || partial_end) {
|
|
|
|
ret = ext4_alloc_file_blocks(file,
|
|
|
|
round_down(offset, 1 << blkbits) >> blkbits,
|
|
|
|
(round_up((offset + len), 1 << blkbits) -
|
|
|
|
round_down(offset, 1 << blkbits)) >> blkbits,
|
2017-08-06 10:15:45 +08:00
|
|
|
new_size, flags);
|
2015-04-03 12:09:13 +08:00
|
|
|
if (ret)
|
2018-03-22 23:52:10 +08:00
|
|
|
goto out_mutex;
|
2015-04-03 12:09:13 +08:00
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Zero range excluding the unaligned edges */
|
2014-03-19 06:05:35 +08:00
|
|
|
if (max_blocks > 0) {
|
2015-04-03 12:09:13 +08:00
|
|
|
flags |= (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN |
|
|
|
|
EXT4_EX_NOCACHE);
|
2014-03-19 06:05:35 +08:00
|
|
|
|
2015-12-08 03:28:03 +08:00
|
|
|
/*
|
|
|
|
* Prevent page faults from reinstantiating pages we have
|
|
|
|
* released from page cache.
|
|
|
|
*/
|
2021-02-05 01:05:42 +08:00
|
|
|
filemap_invalidate_lock(mapping);
|
2018-07-30 05:00:22 +08:00
|
|
|
|
|
|
|
ret = ext4_break_layouts(inode);
|
|
|
|
if (ret) {
|
2021-02-05 01:05:42 +08:00
|
|
|
filemap_invalidate_unlock(mapping);
|
2018-07-30 05:00:22 +08:00
|
|
|
goto out_mutex;
|
|
|
|
}
|
|
|
|
|
2015-12-08 03:34:49 +08:00
|
|
|
ret = ext4_update_disksize_before_punch(inode, offset, len);
|
|
|
|
if (ret) {
|
2021-02-05 01:05:42 +08:00
|
|
|
filemap_invalidate_unlock(mapping);
|
2018-03-22 23:52:10 +08:00
|
|
|
goto out_mutex;
|
2015-12-08 03:34:49 +08:00
|
|
|
}
|
2015-12-08 03:28:03 +08:00
|
|
|
/* Now release the pages and zero block aligned part of pages */
|
|
|
|
truncate_pagecache_range(inode, start, end - 1);
|
2016-11-15 10:40:10 +08:00
|
|
|
inode->i_mtime = inode->i_ctime = current_time(inode);
|
2015-12-08 03:28:03 +08:00
|
|
|
|
2014-09-02 02:32:09 +08:00
|
|
|
ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size,
|
2017-08-06 10:15:45 +08:00
|
|
|
flags);
|
2021-02-05 01:05:42 +08:00
|
|
|
filemap_invalidate_unlock(mapping);
|
2014-09-02 02:32:09 +08:00
|
|
|
if (ret)
|
2018-03-22 23:52:10 +08:00
|
|
|
goto out_mutex;
|
2014-03-19 06:05:35 +08:00
|
|
|
}
|
2014-08-28 06:40:00 +08:00
|
|
|
if (!partial_begin && !partial_end)
|
2018-03-22 23:52:10 +08:00
|
|
|
goto out_mutex;
|
2014-08-28 06:40:00 +08:00
|
|
|
|
2014-08-28 06:33:49 +08:00
|
|
|
/*
|
|
|
|
* In the worst case we have to write out two nonadjacent unwritten
|
|
|
|
* blocks and update the inode
|
|
|
|
*/
|
|
|
|
credits = (2 * ext4_ext_index_trans_blocks(inode, 2)) + 1;
|
|
|
|
if (ext4_should_journal_data(inode))
|
|
|
|
credits += 2;
|
|
|
|
handle = ext4_journal_start(inode, EXT4_HT_MISC, credits);
|
2014-03-19 06:05:35 +08:00
|
|
|
if (IS_ERR(handle)) {
|
|
|
|
ret = PTR_ERR(handle);
|
|
|
|
ext4_std_error(inode->i_sb, ret);
|
2018-03-22 23:52:10 +08:00
|
|
|
goto out_mutex;
|
2014-03-19 06:05:35 +08:00
|
|
|
}
|
|
|
|
|
2016-11-15 10:40:10 +08:00
|
|
|
inode->i_mtime = inode->i_ctime = current_time(inode);
|
ext4: remove EXT4_EOFBLOCKS_FL and associated code
The EXT4_EOFBLOCKS_FL inode flag is used to indicate whether a file
contains unwritten blocks past i_size. It's set when ext4_fallocate
is called with the KEEP_SIZE flag to extend a file with an unwritten
extent. However, this flag hasn't been useful functionally since
March, 2012, when a decision was made to remove it from ext4.
All traces of EXT4_EOFBLOCKS_FL were removed from e2fsprogs version
1.42.2 by commit 010dc7b90d97 ("e2fsck: remove EXT4_EOFBLOCKS_FL flag
handling") at that time. Now that enough time has passed to make
e2fsprogs versions containing this modification common, this patch now
removes the code associated with EXT4_EOFBLOCKS_FL from the kernel as
well.
This change has two implications. First, because pre-1.42.2 e2fsck
versions only look for a problem if EXT4_EOFBLOCKS_FL is set, and
because that bit will never be set by newer kernels containing this
patch, old versions of e2fsck won't have a compatibility problem with
files created by newer kernels.
Second, newer kernels will not clear EXT4_EOFBLOCKS_FL inode flag bits
belonging to a file written by an older kernel. If set, it will remain
in that state until the file is deleted. Because e2fsck versions since
1.42.2 don't check the flag at all, no adverse effect is expected.
However, pre-1.42.2 e2fsck versions that do check the flag may report
that it is set when it ought not to be after a file has been truncated
or had its unwritten blocks written. In this case, the old version of
e2fsck will offer to clear the flag. No adverse effect would then
occur whether the user chooses to clear the flag or not.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20200211210216.24960-1-enwlinux@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-02-12 05:02:16 +08:00
|
|
|
if (new_size)
|
2014-08-24 05:48:28 +08:00
|
|
|
ext4_update_inode_size(inode, new_size);
|
2020-04-27 09:34:37 +08:00
|
|
|
ret = ext4_mark_inode_dirty(handle, inode);
|
|
|
|
if (unlikely(ret))
|
|
|
|
goto out_handle;
|
2014-03-19 06:05:35 +08:00
|
|
|
/* Zero out partial block at the edges of the range */
|
|
|
|
ret = ext4_zero_partial_blocks(handle, inode, offset, len);
|
2017-05-30 01:24:55 +08:00
|
|
|
if (ret >= 0)
|
|
|
|
ext4_update_inode_fsync_trans(handle, inode, 1);
|
2014-03-19 06:05:35 +08:00
|
|
|
|
|
|
|
if (file->f_flags & O_SYNC)
|
|
|
|
ext4_handle_sync(handle);
|
|
|
|
|
2020-04-27 09:34:37 +08:00
|
|
|
out_handle:
|
2014-03-19 06:05:35 +08:00
|
|
|
ext4_journal_stop(handle);
|
|
|
|
out_mutex:
|
2016-01-23 04:40:57 +08:00
|
|
|
inode_unlock(inode);
|
2014-03-19 06:05:35 +08:00
|
|
|
return ret;
|
|
|
|
}
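The rounding arithmetic at the top of ext4_zero_range() can be tried in isolation. The following is a minimal userspace sketch, not kernel code: the struct and the function name split_zero_range() are hypothetical, and it assumes only what the function above shows, namely that the interior of the range is rounded inward to block boundaries while the unaligned edges are tracked as byte remainders.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not an ext4 structure. */
struct zero_range_split {
	uint64_t start;         /* offset rounded up to a block boundary */
	uint64_t end;           /* offset + len rounded down to a block boundary */
	uint32_t first_block;   /* logical block number of start */
	uint32_t nr_blocks;     /* whole blocks between start and end */
	uint32_t partial_begin; /* bytes by which offset overshoots a boundary */
	uint32_t partial_end;   /* bytes by which offset + len overshoots a boundary */
};

/*
 * Split [offset, offset + len) into a block-aligned interior and two
 * unaligned edges, mirroring the start/end/partial_* computation above.
 */
static struct zero_range_split split_zero_range(uint64_t offset, uint64_t len,
						unsigned int blkbits)
{
	struct zero_range_split s;
	uint64_t mask, first, last;

	assert(blkbits < 32);
	mask = (1ULL << blkbits) - 1;

	s.start = (offset + mask) & ~mask;
	s.end = (offset + len) & ~mask;
	s.partial_begin = (uint32_t)(offset & mask);
	s.partial_end = (uint32_t)((offset + len) & mask);

	first = s.start >> blkbits;
	last = s.end >> blkbits;
	s.first_block = (uint32_t)first;
	/* A range that fits inside one block has no aligned interior. */
	s.nr_blocks = last < first ? 0 : (uint32_t)(last - first);
	return s;
}
```

With 4 KiB blocks, a request for offset 1000 and length 10000 yields one whole block (block 1) plus two partial edges, which is the case where the function above both preallocates the full span and zeroes the edges by hand.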
|
|
|
|
|
2007-07-18 09:42:41 +08:00
|
|
|
/*
|
2011-01-14 20:07:43 +08:00
|
|
|
* preallocate space for a file. This implements ext4's fallocate file
|
2007-07-18 09:42:41 +08:00
|
|
|
* operation, which gets called from sys_fallocate system call.
|
|
|
|
* For block-mapped files, posix_fallocate should fall back to the method
|
|
|
|
* of writing zeroes to the required new blocks (the same behavior which is
|
|
|
|
* expected for file systems which do not support fallocate() system call).
|
|
|
|
*/
|
2011-01-14 20:07:43 +08:00
|
|
|
long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
|
2007-07-18 09:42:41 +08:00
|
|
|
{
|
2013-01-24 06:07:38 +08:00
|
|
|
struct inode *inode = file_inode(file);
|
2014-03-19 05:44:35 +08:00
|
|
|
loff_t new_size = 0;
|
2008-11-05 13:14:04 +08:00
|
|
|
unsigned int max_blocks;
|
2007-07-18 09:42:41 +08:00
|
|
|
int ret = 0;
|
2011-10-25 20:15:12 +08:00
|
|
|
int flags;
|
2014-03-19 06:03:51 +08:00
|
|
|
ext4_lblk_t lblk;
|
|
|
|
unsigned int blkbits = inode->i_blkbits;
|
2007-07-18 09:42:41 +08:00
|
|
|
|
2015-04-12 12:55:10 +08:00
|
|
|
/*
|
|
|
|
* Encrypted inodes can't handle collapse range or insert
|
|
|
|
* range since we would need to re-encrypt blocks with a
|
|
|
|
* different IV or XTS tweak (which are based on the logical
|
|
|
|
* block number).
|
|
|
|
*/
|
2018-12-12 17:50:10 +08:00
|
|
|
if (IS_ENCRYPTED(inode) &&
|
ext4: allow ZERO_RANGE on encrypted files
When ext4 encryption support was first added, ZERO_RANGE was disallowed,
supposedly because test failures (e.g. ext4/001) were seen when enabling
it, and at the time there wasn't enough time/interest to debug it.
However, there's actually no reason why ZERO_RANGE can't work on
encrypted files. And in fact it *does* work now. Whole blocks in the
zeroed range are converted to unwritten extents, as usual; encryption
makes no difference for that part. Partial blocks are zeroed in the
pagecache and then ->writepages() encrypts those blocks as usual.
ext4_block_zero_page_range() handles reading and decrypting the block if
needed before actually doing the pagecache write.
Also, f2fs has always supported ZERO_RANGE on encrypted files.
As far as I can tell, the reason that ext4/001 was failing in v4.1 was
actually because of one of the bugs fixed by commit 36086d43f657 ("ext4
crypto: fix bugs in ext4_encrypted_zeroout()"). The bug made
ext4_encrypted_zeroout() always return a positive value, which caused
unwritten extents in encrypted files to sometimes not be marked as
initialized after being written to. This bug was not actually in
ZERO_RANGE; it just happened to trigger during the extents manipulation
done in ext4/001 (and probably other tests too).
So, let's enable ZERO_RANGE on encrypted files on ext4.
Tested with:
gce-xfstests -c ext4/encrypt -g auto
gce-xfstests -c ext4/encrypt_1k -g auto
Got the same set of test failures both with and without this patch.
But with this patch 6 fewer tests are skipped: ext4/001, generic/008,
generic/009, generic/033, generic/096, and generic/511.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20191226154216.4808-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-26 23:42:16 +08:00
|
|
|
(mode & (FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_INSERT_RANGE)))
|
2015-04-12 12:55:10 +08:00
|
|
|
return -EOPNOTSUPP;
|
|
|
|
|
2011-05-25 19:41:50 +08:00
|
|
|
/* Return error if mode is not supported */
|
2014-02-24 04:18:59 +08:00
|
|
|
if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
|
2015-06-09 13:55:03 +08:00
|
|
|
FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
|
|
|
|
FALLOC_FL_INSERT_RANGE))
|
2011-05-25 19:41:50 +08:00
|
|
|
return -EOPNOTSUPP;
|
|
|
|
|
2022-04-28 21:40:31 +08:00
|
|
|
inode_lock(inode);
|
|
|
|
ret = ext4_convert_inline_data(inode);
|
|
|
|
inode_unlock(inode);
|
|
|
|
if (ret)
|
|
|
|
goto exit;
|
|
|
|
|
2020-10-16 04:37:57 +08:00
|
|
|
if (mode & FALLOC_FL_PUNCH_HOLE) {
|
2022-03-09 02:50:43 +08:00
|
|
|
ret = ext4_punch_hole(file, offset, len);
|
2020-10-16 04:37:57 +08:00
|
|
|
goto exit;
|
|
|
|
}
|
2011-05-25 19:41:50 +08:00
|
|
|
|
2020-10-16 04:37:57 +08:00
|
|
|
if (mode & FALLOC_FL_COLLAPSE_RANGE) {
|
2022-03-09 02:50:43 +08:00
|
|
|
ret = ext4_collapse_range(file, offset, len);
|
2020-10-16 04:37:57 +08:00
|
|
|
goto exit;
|
|
|
|
}
|
2015-06-09 13:55:03 +08:00
|
|
|
|
2020-10-16 04:37:57 +08:00
|
|
|
if (mode & FALLOC_FL_INSERT_RANGE) {
|
2022-03-09 02:50:43 +08:00
|
|
|
ret = ext4_insert_range(file, offset, len);
|
2020-10-16 04:37:57 +08:00
|
|
|
goto exit;
|
|
|
|
}
|
2014-03-19 06:05:35 +08:00
|
|
|
|
2020-10-16 04:37:57 +08:00
|
|
|
if (mode & FALLOC_FL_ZERO_RANGE) {
|
|
|
|
ret = ext4_zero_range(file, offset, len, mode);
|
|
|
|
goto exit;
|
|
|
|
}
|
2011-03-22 09:38:05 +08:00
|
|
|
trace_ext4_fallocate_enter(inode, offset, len, mode);
|
2014-03-19 06:03:51 +08:00
|
|
|
lblk = offset >> blkbits;
|
|
|
|
|
2016-09-15 23:55:01 +08:00
|
|
|
max_blocks = EXT4_MAX_BLOCKS(len, offset, blkbits);
|
2014-04-21 11:45:47 +08:00
|
|
|
flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT;
|
2014-03-19 06:03:51 +08:00
|
|
|
|
2016-01-23 04:40:57 +08:00
|
|
|
inode_lock(inode);
|
2014-03-19 05:44:35 +08:00
|
|
|
|
2015-05-03 11:21:15 +08:00
|
|
|
/*
|
|
|
|
* We support preallocation for extent-based files only
|
|
|
|
*/
|
|
|
|
if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
|
|
|
|
ret = -EOPNOTSUPP;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2014-03-19 05:44:35 +08:00
|
|
|
if (!(mode & FALLOC_FL_KEEP_SIZE) &&
|
2020-01-01 02:04:38 +08:00
|
|
|
(offset + len > inode->i_size ||
|
2017-10-07 11:09:55 +08:00
|
|
|
offset + len > EXT4_I(inode)->i_disksize)) {
|
2014-03-19 05:44:35 +08:00
|
|
|
new_size = offset + len;
|
|
|
|
ret = inode_newsize_ok(inode, new_size);
|
|
|
|
if (ret)
|
|
|
|
goto out;
|
2010-05-17 02:00:00 +08:00
|
|
|
}
|
2014-03-19 05:44:35 +08:00
|
|
|
|
2022-01-21 15:06:11 +08:00
|
|
|
/* Wait for all existing dio workers; newcomers will block on i_rwsem */
|
2015-12-08 03:29:17 +08:00
|
|
|
inode_dio_wait(inode);
|
|
|
|
|
2022-03-09 02:50:43 +08:00
|
|
|
ret = file_modified(file);
|
|
|
|
if (ret)
|
|
|
|
goto out;
|
|
|
|
|
2017-08-06 10:15:45 +08:00
|
|
|
ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size, flags);
|
2014-03-19 06:03:51 +08:00
|
|
|
if (ret)
|
|
|
|
goto out;
|
2014-03-19 05:44:35 +08:00
|
|
|
|
2014-08-28 06:40:00 +08:00
|
|
|
if (file->f_flags & O_SYNC && EXT4_SB(inode->i_sb)->s_journal) {
|
2020-10-16 04:37:57 +08:00
|
|
|
ret = ext4_fc_commit(EXT4_SB(inode->i_sb)->s_journal,
|
|
|
|
EXT4_I(inode)->i_sync_tid);
|
2014-03-19 05:44:35 +08:00
|
|
|
}
|
|
|
|
out:
|
2016-01-23 04:40:57 +08:00
|
|
|
inode_unlock(inode);
|
2014-03-19 06:03:51 +08:00
|
|
|
trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
|
2020-10-16 04:37:57 +08:00
|
|
|
exit:
|
2014-03-19 06:03:51 +08:00
|
|
|
return ret;
|
2007-07-18 09:42:41 +08:00
|
|
|
}
|
2008-10-07 12:46:36 +08:00
|
|
|
|
2009-09-29 03:49:08 +08:00
|
|
|
/*
|
|
|
|
* This function converts a range of blocks to written extents.
|
|
|
|
* The caller of this function will pass the start offset and the size.
|
|
|
|
* All unwritten extents within this range will be converted to
|
|
|
|
* written extents.
|
|
|
|
*
|
|
|
|
* This function is called from the direct IO end io call back
|
|
|
|
* function, to convert the fallocated extents after IO is completed.
|
2009-11-10 23:48:08 +08:00
|
|
|
* Returns 0 on success.
|
2009-09-29 03:49:08 +08:00
|
|
|
*/
|
2013-06-05 01:21:11 +08:00
|
|
|
int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
|
|
|
|
loff_t offset, ssize_t len)
|
2009-09-29 03:49:08 +08:00
|
|
|
{
|
|
|
|
unsigned int max_blocks;
|
2020-04-27 09:34:37 +08:00
|
|
|
int ret = 0, ret2 = 0, ret3 = 0;
|
2010-05-17 08:00:00 +08:00
|
|
|
struct ext4_map_blocks map;
|
2019-10-16 15:37:08 +08:00
|
|
|
unsigned int blkbits = inode->i_blkbits;
|
|
|
|
unsigned int credits = 0;
|
2009-09-29 03:49:08 +08:00
|
|
|
|
2010-05-17 08:00:00 +08:00
|
|
|
map.m_lblk = offset >> blkbits;
|
2016-09-15 23:55:01 +08:00
|
|
|
max_blocks = EXT4_MAX_BLOCKS(len, offset, blkbits);
|
|
|
|
|
2019-10-16 15:37:08 +08:00
|
|
|
if (!handle) {
|
2013-06-05 01:21:11 +08:00
|
|
|
/*
|
|
|
|
* credits to insert 1 extent into extent tree
|
|
|
|
*/
|
|
|
|
credits = ext4_chunk_trans_blocks(inode, max_blocks);
|
|
|
|
}
|
2009-09-29 03:49:08 +08:00
|
|
|
while (ret >= 0 && ret < max_blocks) {
|
2010-05-17 08:00:00 +08:00
|
|
|
map.m_lblk += ret;
|
|
|
|
map.m_len = (max_blocks -= ret);
|
2013-06-05 01:21:11 +08:00
|
|
|
if (credits) {
|
|
|
|
handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS,
|
|
|
|
credits);
|
|
|
|
if (IS_ERR(handle)) {
|
|
|
|
ret = PTR_ERR(handle);
|
|
|
|
break;
|
|
|
|
}
|
2009-09-29 03:49:08 +08:00
|
|
|
}
|
2010-05-17 08:00:00 +08:00
|
|
|
ret = ext4_map_blocks(handle, inode, &map,
|
2010-03-03 02:28:44 +08:00
|
|
|
EXT4_GET_BLOCKS_IO_CONVERT_EXT);
|
2013-01-29 10:21:12 +08:00
|
|
|
if (ret <= 0)
|
|
|
|
ext4_warning(inode->i_sb,
|
|
|
|
"inode #%lu: block %u: len %u: "
|
|
|
|
"ext4_map_blocks returned %d",
|
|
|
|
inode->i_ino, map.m_lblk,
|
|
|
|
map.m_len, ret);
|
2020-04-27 09:34:37 +08:00
|
|
|
ret2 = ext4_mark_inode_dirty(handle, inode);
|
|
|
|
if (credits) {
|
|
|
|
ret3 = ext4_journal_stop(handle);
|
|
|
|
if (unlikely(ret3))
|
|
|
|
ret2 = ret3;
|
|
|
|
}
|
|
|
|
|
2013-06-05 01:21:11 +08:00
|
|
|
if (ret <= 0 || ret2)
|
2009-09-29 03:49:08 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
return ret > 0 ? ret2 : ret;
|
|
|
|
}
|
2011-02-28 06:25:47 +08:00
|
|
|
|
2019-10-16 15:37:08 +08:00
|
|
|
int ext4_convert_unwritten_io_end_vec(handle_t *handle, ext4_io_end_t *io_end)
|
|
|
|
{
|
2020-10-08 23:02:48 +08:00
|
|
|
int ret = 0, err = 0;
|
2019-10-16 15:37:10 +08:00
|
|
|
struct ext4_io_end_vec *io_end_vec;
|
2019-10-16 15:37:08 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* This is somewhat ugly but the idea is clear: When a transaction is
|
|
|
|
* reserved, everything goes into it. Otherwise we rather start several
|
|
|
|
* smaller transactions for conversion of each extent separately.
|
|
|
|
*/
|
|
|
|
if (handle) {
|
|
|
|
handle = ext4_journal_start_reserved(handle,
|
|
|
|
EXT4_HT_EXT_CONVERT);
|
|
|
|
if (IS_ERR(handle))
|
|
|
|
return PTR_ERR(handle);
|
|
|
|
}
|
|
|
|
|
2019-10-16 15:37:10 +08:00
|
|
|
list_for_each_entry(io_end_vec, &io_end->list_vec, list) {
|
|
|
|
ret = ext4_convert_unwritten_extents(handle, io_end->inode,
|
|
|
|
io_end_vec->offset,
|
|
|
|
io_end_vec->size);
|
|
|
|
if (ret)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2019-10-16 15:37:08 +08:00
|
|
|
if (handle)
|
|
|
|
err = ext4_journal_stop(handle);
|
|
|
|
|
|
|
|
return ret < 0 ? ret : err;
|
|
|
|
}
|
|
|
|
|
2020-02-28 17:26:58 +08:00
|
|
|
static int ext4_iomap_xattr_fiemap(struct inode *inode, struct iomap *iomap)
|
2008-10-07 12:46:36 +08:00
|
|
|
{
|
|
|
|
__u64 physical = 0;
|
2020-02-28 17:26:58 +08:00
|
|
|
__u64 length = 0;
|
2008-10-07 12:46:36 +08:00
|
|
|
int blockbits = inode->i_sb->s_blocksize_bits;
|
|
|
|
int error = 0;
|
2020-02-28 17:26:58 +08:00
|
|
|
u16 iomap_type;
|
2008-10-07 12:46:36 +08:00
|
|
|
|
|
|
|
/* in-inode? */
|
2010-01-25 03:34:07 +08:00
|
|
|
if (ext4_test_inode_state(inode, EXT4_STATE_XATTR)) {
|
2008-10-07 12:46:36 +08:00
|
|
|
struct ext4_iloc iloc;
|
|
|
|
int offset; /* offset of xattr in inode */
|
|
|
|
|
|
|
|
error = ext4_get_inode_loc(inode, &iloc);
|
|
|
|
if (error)
|
|
|
|
return error;
|
2013-06-01 07:38:56 +08:00
|
|
|
physical = (__u64)iloc.bh->b_blocknr << blockbits;
|
2008-10-07 12:46:36 +08:00
|
|
|
offset = EXT4_GOOD_OLD_INODE_SIZE +
|
|
|
|
EXT4_I(inode)->i_extra_isize;
|
|
|
|
physical += offset;
|
|
|
|
length = EXT4_SB(inode->i_sb)->s_inode_size - offset;
|
2010-04-04 05:44:16 +08:00
|
|
|
brelse(iloc.bh);
|
2020-02-28 17:26:58 +08:00
|
|
|
iomap_type = IOMAP_INLINE;
|
|
|
|
} else if (EXT4_I(inode)->i_file_acl) { /* external block */
|
2013-06-01 07:38:56 +08:00
|
|
|
physical = (__u64)EXT4_I(inode)->i_file_acl << blockbits;
|
2008-10-07 12:46:36 +08:00
|
|
|
length = inode->i_sb->s_blocksize;
|
2020-02-28 17:26:58 +08:00
|
|
|
iomap_type = IOMAP_MAPPED;
|
|
|
|
} else {
|
|
|
|
/* no in-inode or external block for xattr, so return -ENOENT */
|
|
|
|
error = -ENOENT;
|
|
|
|
goto out;
|
2008-10-07 12:46:36 +08:00
|
|
|
}
|
|
|
|
|
2020-02-28 17:26:58 +08:00
|
|
|
iomap->addr = physical;
|
|
|
|
iomap->offset = 0;
|
|
|
|
iomap->length = length;
|
|
|
|
iomap->type = iomap_type;
|
|
|
|
iomap->flags = 0;
|
|
|
|
out:
|
|
|
|
return error;
|
2008-10-07 12:46:36 +08:00
|
|
|
}
|
|
|
|
|
2020-02-28 17:26:58 +08:00
|
|
|
static int ext4_iomap_xattr_begin(struct inode *inode, loff_t offset,
|
|
|
|
loff_t length, unsigned flags,
|
|
|
|
struct iomap *iomap, struct iomap *srcmap)
|
2008-10-07 12:46:36 +08:00
|
|
|
{
|
2020-02-28 17:26:58 +08:00
|
|
|
int error;
|
2019-08-12 04:32:41 +08:00
|
|
|
|
2020-02-28 17:26:58 +08:00
|
|
|
error = ext4_iomap_xattr_fiemap(inode, iomap);
|
|
|
|
if (error == 0 && (offset >= iomap->length))
|
|
|
|
error = -ENOENT;
|
|
|
|
return error;
|
|
|
|
}
|
2012-12-11 03:06:02 +08:00
|
|
|
|
2020-02-28 17:26:58 +08:00
|
|
|
static const struct iomap_ops ext4_iomap_xattr_ops = {
|
|
|
|
.iomap_begin = ext4_iomap_xattr_begin,
|
|
|
|
};
|
2012-12-11 03:06:02 +08:00
|
|
|
|
2020-05-05 23:43:15 +08:00
|
|
|
static int ext4_fiemap_check_ranges(struct inode *inode, u64 start, u64 *len)
|
|
|
|
{
|
|
|
|
u64 maxbytes;
|
|
|
|
|
|
|
|
if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
|
|
|
|
maxbytes = inode->i_sb->s_maxbytes;
|
|
|
|
else
|
|
|
|
maxbytes = EXT4_SB(inode->i_sb)->s_bitmap_maxbytes;
|
|
|
|
|
|
|
|
if (*len == 0)
|
|
|
|
return -EINVAL;
|
|
|
|
if (start > maxbytes)
|
|
|
|
return -EFBIG;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Shrink request scope to what the fs can actually handle.
|
|
|
|
*/
|
|
|
|
if (*len > maxbytes || (maxbytes - *len) < start)
|
|
|
|
*len = maxbytes - start;
|
|
|
|
return 0;
|
|
|
|
}
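The clamping rule in ext4_fiemap_check_ranges() can be checked outside the kernel. This is a minimal sketch under stated assumptions: clamp_fiemap_len() is a hypothetical name, and maxbytes is passed in directly rather than being read from the superblock as the function above does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Shrink *len so that [start, start + *len) does not extend past maxbytes.
 * Returns 0 on success, -EINVAL for an empty request and -EFBIG when the
 * request starts beyond the limit, following the checks shown above.
 */
static int clamp_fiemap_len(uint64_t start, uint64_t *len, uint64_t maxbytes)
{
	assert(len != NULL);

	if (*len == 0)
		return -EINVAL;
	if (start > maxbytes)
		return -EFBIG;

	/*
	 * The first test keeps the subtraction in the second test from
	 * wrapping around when *len is larger than maxbytes.
	 */
	if (*len > maxbytes || (maxbytes - *len) < start)
		*len = maxbytes - start;
	return 0;
}
```

For example, with a limit of 1000 bytes, a request starting at 900 with length 500 is trimmed to length 100, while a request starting at 100 with length 200 is left unchanged.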
|
|
|
|
|
2020-05-23 15:30:08 +08:00
|
|
|
int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
|
|
|
|
u64 start, u64 len)
|
2020-02-28 17:26:58 +08:00
|
|
|
{
|
|
|
|
int error = 0;
|
2012-12-11 03:06:02 +08:00
|
|
|
|
2013-08-17 10:05:14 +08:00
|
|
|
if (fieinfo->fi_flags & FIEMAP_FLAG_CACHE) {
|
|
|
|
error = ext4_ext_precache(inode);
|
|
|
|
if (error)
|
|
|
|
return error;
|
2019-08-12 04:32:41 +08:00
|
|
|
fieinfo->fi_flags &= ~FIEMAP_FLAG_CACHE;
|
2013-08-17 10:05:14 +08:00
|
|
|
}
|
|
|
|
|
2020-05-05 23:43:15 +08:00
|
|
|
/*
|
|
|
|
* For bitmap files the maximum size limit could be smaller than
|
|
|
|
* s_maxbytes, so check len here manually instead of just relying on the
|
|
|
|
* generic check.
|
|
|
|
*/
|
|
|
|
error = ext4_fiemap_check_ranges(inode, start, &len);
|
|
|
|
if (error)
|
|
|
|
return error;
|
|
|
|
|
2008-10-07 12:46:36 +08:00
|
|
|
if (fieinfo->fi_flags & FIEMAP_FLAG_XATTR) {
|
2020-02-28 17:26:58 +08:00
|
|
|
fieinfo->fi_flags &= ~FIEMAP_FLAG_XATTR;
|
2020-05-23 15:30:08 +08:00
|
|
|
return iomap_fiemap(inode, fieinfo, start, len,
|
|
|
|
&ext4_iomap_xattr_ops);
|
2008-10-07 12:46:36 +08:00
|
|
|
}
|
2014-02-24 04:18:59 +08:00
|
|
|
|
2020-05-23 15:30:08 +08:00
|
|
|
return iomap_fiemap(inode, fieinfo, start, len, &ext4_iomap_report_ops);
|
2019-08-12 04:32:41 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
int ext4_get_es_cache(struct inode *inode, struct fiemap_extent_info *fieinfo,
|
|
|
|
__u64 start, __u64 len)
|
|
|
|
{
|
2020-05-23 15:30:08 +08:00
|
|
|
ext4_lblk_t start_blk, len_blks;
|
|
|
|
__u64 last_blk;
|
|
|
|
int error = 0;
|
|
|
|
|
2019-08-12 04:32:41 +08:00
|
|
|
if (ext4_has_inline_data(inode)) {
|
|
|
|
int has_inline;
|
|
|
|
|
|
|
|
down_read(&EXT4_I(inode)->xattr_sem);
|
|
|
|
has_inline = ext4_has_inline_data(inode);
|
|
|
|
up_read(&EXT4_I(inode)->xattr_sem);
|
|
|
|
if (has_inline)
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2020-05-23 15:30:08 +08:00
|
|
|
if (fieinfo->fi_flags & FIEMAP_FLAG_CACHE) {
|
|
|
|
error = ext4_ext_precache(inode);
|
|
|
|
if (error)
|
|
|
|
return error;
|
|
|
|
fieinfo->fi_flags &= ~FIEMAP_FLAG_CACHE;
|
|
|
|
}
|
|
|
|
|
2020-05-23 15:30:14 +08:00
|
|
|
error = fiemap_prep(inode, fieinfo, start, &len, 0);
|
2020-05-23 15:30:13 +08:00
|
|
|
if (error)
|
|
|
|
return error;
|
2019-08-12 04:32:41 +08:00
|
|
|
|
2020-05-23 15:30:08 +08:00
|
|
|
error = ext4_fiemap_check_ranges(inode, start, &len);
|
|
|
|
if (error)
|
|
|
|
return error;
|
|
|
|
|
|
|
|
start_blk = start >> inode->i_sb->s_blocksize_bits;
|
|
|
|
last_blk = (start + len - 1) >> inode->i_sb->s_blocksize_bits;
|
|
|
|
if (last_blk >= EXT_MAX_BLOCKS)
|
|
|
|
last_blk = EXT_MAX_BLOCKS-1;
|
|
|
|
len_blks = ((ext4_lblk_t) last_blk) - start_blk + 1;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Walk the extent tree gathering extent information
|
|
|
|
* and pushing extents back to the user.
|
|
|
|
*/
|
|
|
|
return ext4_fill_es_cache_info(inode, start_blk, len_blks, fieinfo);
|
|
|
|
}
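The byte-to-block conversion at the end of ext4_get_es_cache() can be sketched as a standalone helper. The names bytes_to_block_range() and SKETCH_MAX_BLOCKS are hypothetical; the constant is an assumed stand-in for EXT_MAX_BLOCKS, whose actual value is defined elsewhere in ext4 and is not shown in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for EXT_MAX_BLOCKS; the real value lives in the ext4 headers. */
#define SKETCH_MAX_BLOCKS 0xffffffffULL

/* Hypothetical result type for this sketch. */
struct block_range {
	uint32_t start_blk; /* first logical block touched by the byte range */
	uint32_t len_blks;  /* number of logical blocks touched */
};

/*
 * Convert a non-empty byte range into the inclusive range of logical
 * blocks it touches, capping the last block below the block-number limit.
 */
static struct block_range bytes_to_block_range(uint64_t start, uint64_t len,
					       unsigned int blkbits)
{
	struct block_range r;
	uint64_t last_blk;

	assert(len > 0);
	r.start_blk = (uint32_t)(start >> blkbits);
	last_blk = (start + len - 1) >> blkbits;
	if (last_blk >= SKETCH_MAX_BLOCKS)
		last_blk = SKETCH_MAX_BLOCKS - 1;
	r.len_blks = (uint32_t)last_blk - r.start_blk + 1;
	return r;
}
```

Note that the last block is computed from the final byte (start + len - 1), so a range that ends exactly on a block boundary does not count the following block.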
|
2019-08-12 04:32:41 +08:00
|
|
|
|
2014-02-24 04:18:59 +08:00
|
|
|
/*
|
|
|
|
* ext4_ext_shift_path_extents:
|
|
|
|
* Shift the extents of a path structure lying between path[depth].p_ext
|
2015-06-09 13:55:03 +08:00
|
|
|
* and EXT_LAST_EXTENT(path[depth].p_hdr), by @shift blocks. @SHIFT tells
|
|
|
|
* whether it is a right shift or a left shift operation.
|
2014-02-24 04:18:59 +08:00
|
|
|
*/
|
|
|
|
static int
|
|
|
|
ext4_ext_shift_path_extents(struct ext4_ext_path *path, ext4_lblk_t shift,
|
|
|
|
struct inode *inode, handle_t *handle,
|
2015-06-09 13:55:03 +08:00
|
|
|
enum SHIFT_DIRECTION SHIFT)
|
2014-02-24 04:18:59 +08:00
|
|
|
{
|
|
|
|
int depth, err = 0;
|
|
|
|
struct ext4_extent *ex_start, *ex_last;
|
2019-12-25 10:45:59 +08:00
|
|
|
bool update = false;
|
2021-09-03 14:27:47 +08:00
|
|
|
int credits, restart_credits;
|
2014-02-24 04:18:59 +08:00
|
|
|
depth = path->p_depth;
|
|
|
|
|
|
|
|
while (depth >= 0) {
|
|
|
|
if (depth == path->p_depth) {
|
|
|
|
ex_start = path[depth].p_ext;
|
|
|
|
if (!ex_start)
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2014-02-24 04:18:59 +08:00
|
|
|
|
|
|
|
ex_last = EXT_LAST_EXTENT(path[depth].p_hdr);
|
2021-09-03 14:27:47 +08:00
|
|
|
/* leaf + sb + inode */
|
|
|
|
credits = 3;
|
|
|
|
if (ex_start == EXT_FIRST_EXTENT(path[depth].p_hdr)) {
|
|
|
|
update = true;
|
|
|
|
/* extent tree + sb + inode */
|
|
|
|
credits = depth + 2;
|
|
|
|
}
|
2014-02-24 04:18:59 +08:00
|
|
|
|
2021-09-03 14:27:47 +08:00
|
|
|
restart_credits = ext4_writepage_trans_blocks(inode);
|
|
|
|
err = ext4_datasem_ensure_credits(handle, inode, credits,
|
|
|
|
restart_credits, 0);
|
2021-09-03 14:27:48 +08:00
|
|
|
if (err) {
|
|
|
|
if (err > 0)
|
|
|
|
err = -EAGAIN;
|
2014-02-24 04:18:59 +08:00
|
|
|
goto out;
|
2021-09-03 14:27:48 +08:00
|
|
|
}
|
2014-02-24 04:18:59 +08:00
|
|
|
|
2021-09-03 14:27:47 +08:00
|
|
|
err = ext4_ext_get_access(handle, inode, path + depth);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
2014-02-24 04:18:59 +08:00
|
|
|
|
|
|
|
while (ex_start <= ex_last) {
|
2015-06-09 13:55:03 +08:00
|
|
|
if (SHIFT == SHIFT_LEFT) {
|
|
|
|
le32_add_cpu(&ex_start->ee_block,
|
|
|
|
-shift);
|
|
|
|
/* Try to merge to the left. */
|
|
|
|
if ((ex_start >
|
|
|
|
EXT_FIRST_EXTENT(path[depth].p_hdr))
|
|
|
|
&&
|
|
|
|
ext4_ext_try_to_merge_right(inode,
|
|
|
|
path, ex_start - 1))
|
|
|
|
ex_last--;
|
|
|
|
else
|
|
|
|
ex_start++;
|
|
|
|
} else {
|
|
|
|
le32_add_cpu(&ex_last->ee_block, shift);
|
|
|
|
ext4_ext_try_to_merge_right(inode, path,
|
|
|
|
ex_last);
|
2014-04-18 22:55:24 +08:00
|
|
|
ex_last--;
|
2015-06-09 13:55:03 +08:00
|
|
|
}
|
2014-02-24 04:18:59 +08:00
|
|
|
}
|
|
|
|
err = ext4_ext_dirty(handle, inode, path + depth);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
if (--depth < 0 || !update)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Update index too */
|
2021-09-03 14:27:47 +08:00
|
|
|
err = ext4_ext_get_access(handle, inode, path + depth);
|
2014-02-24 04:18:59 +08:00
|
|
|
if (err)
|
|
|
|
goto out;
|
|
|
|
|
2015-06-09 13:55:03 +08:00
|
|
|
if (SHIFT == SHIFT_LEFT)
|
|
|
|
le32_add_cpu(&path[depth].p_idx->ei_block, -shift);
|
|
|
|
else
|
|
|
|
le32_add_cpu(&path[depth].p_idx->ei_block, shift);
|
2014-02-24 04:18:59 +08:00
|
|
|
err = ext4_ext_dirty(handle, inode, path + depth);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
/* we are done if current index is not a starting index */
|
|
|
|
if (path[depth].p_idx != EXT_FIRST_INDEX(path[depth].p_hdr))
|
|
|
|
break;
|
|
|
|
|
|
|
|
depth--;
|
|
|
|
}
|
|
|
|
|
|
|
|
out:
|
|
|
|
return err;
|
|
|
|
}
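The left-shift branch of the leaf loop in ext4_ext_shift_path_extents() can be modelled on a plain array. This is a simplified sketch under stated assumptions: `struct model_extent`, `model_try_merge_right` and `model_shift_left` are hypothetical names, the leaf is a flat array, and the merge test only checks logical and physical contiguity, whereas the kernel helper is expected to apply further conditions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for an on-disk extent. */
struct model_extent {
	uint32_t block;	/* first logical block (ee_block) */
	uint32_t len;	/* length in blocks */
	uint64_t pblk;	/* first physical block */
};

/* Merge ex[i + 1] into ex[i] if both are contiguous. Returns 1 on merge. */
static int model_try_merge_right(struct model_extent *ex, size_t *n, size_t i)
{
	size_t j;

	if (i + 1 >= *n)
		return 0;
	if (ex[i].block + ex[i].len != ex[i + 1].block)
		return 0;
	if (ex[i].pblk + ex[i].len != ex[i + 1].pblk)
		return 0;
	ex[i].len += ex[i + 1].len;
	/* Close the gap left by the merged extent. */
	for (j = i + 1; j + 1 < *n; j++)
		ex[j] = ex[j + 1];
	(*n)--;
	return 1;
}

/* Shift ex[from..n-1] left by shift blocks, merging leftwards when possible. */
static void model_shift_left(struct model_extent *ex, size_t *n, size_t from,
			     uint32_t shift)
{
	size_t cur = from;

	while (cur < *n) {
		assert(ex[cur].block >= shift);	/* caller validated the hole */
		ex[cur].block -= shift;
		/*
		 * After a merge the array shrinks (the ex_last-- case), so
		 * the same index already holds the next extent to shift.
		 */
		if (cur > 0 && model_try_merge_right(ex, n, cur - 1))
			continue;
		cur++;		/* the ex_start++ case */
	}
}
```

Shifting the extents {0, 2, 100}, {4, 2, 102}, {10, 1, 500} left by 2 from index 1 merges the first two into {0, 4, 100} and moves the last one to block 8.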
|
|
|
|
|
|
|
|
/*
|
|
|
|
* ext4_ext_shift_extents:
|
2015-06-09 13:55:03 +08:00
|
|
|
* All the extents which lie in the range from @start to the last allocated
|
|
|
|
* block for the @inode are shifted either towards left or right (depending
|
|
|
|
* upon @SHIFT) by @shift blocks.
|
2014-02-24 04:18:59 +08:00
|
|
|
* On success, 0 is returned, error otherwise.
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
|
2015-06-09 13:55:03 +08:00
|
|
|
ext4_lblk_t start, ext4_lblk_t shift,
|
|
|
|
enum SHIFT_DIRECTION SHIFT)
|
2014-02-24 04:18:59 +08:00
|
|
|
{
|
|
|
|
struct ext4_ext_path *path;
|
|
|
|
int ret = 0, depth;
|
|
|
|
struct ext4_extent *extent;
|
2015-06-09 13:55:03 +08:00
|
|
|
ext4_lblk_t stop, *iterator, ex_start, ex_end;
|
2021-09-03 14:27:48 +08:00
|
|
|
ext4_lblk_t tmp = EXT_MAX_BLOCKS;
|
2014-02-24 04:18:59 +08:00
|
|
|
|
|
|
|
/* Let path point to the last extent */
|
2017-01-09 10:00:35 +08:00
|
|
|
path = ext4_find_extent(inode, EXT_MAX_BLOCKS - 1, NULL,
|
|
|
|
EXT4_EX_NOCACHE);
|
2014-02-24 04:18:59 +08:00
|
|
|
if (IS_ERR(path))
|
|
|
|
return PTR_ERR(path);
|
|
|
|
|
|
|
|
depth = path->p_depth;
|
|
|
|
extent = path[depth].p_ext;
|
2014-09-02 02:41:09 +08:00
|
|
|
if (!extent)
|
|
|
|
goto out;
|
2014-02-24 04:18:59 +08:00
|
|
|
|
ext4: Include forgotten start block on fallocate insert range
While doing 'insert range' start block should be also shifted right.
The bug can be easily reproduced by the following test:
ptr = malloc(4096);
assert(ptr);
fd = open("./ext4.file", O_CREAT | O_TRUNC | O_RDWR, 0600);
assert(fd >= 0);
rc = fallocate(fd, 0, 0, 8192);
assert(rc == 0);
for (i = 0; i < 2048; i++)
*((unsigned short *)ptr + i) = 0xbeef;
rc = pwrite(fd, ptr, 4096, 0);
assert(rc == 4096);
rc = pwrite(fd, ptr, 4096, 4096);
assert(rc == 4096);
for (block = 2; block < 1000; block++) {
rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096);
assert(rc == 0);
for (i = 0; i < 2048; i++)
*((unsigned short *)ptr + i) = block;
rc = pwrite(fd, ptr, 4096, 4096);
assert(rc == 4096);
}
Because the start block is not included in the range, the hole appears at
the wrong offset (just after the desired offset), and the following
pwrite() overwrites an already existing block, leaving the hole untouched.
A simple way to verify the wrong behaviour is to check for zeroed blocks
after the test:
$ hexdump ./ext4.file | grep '0000 0000'
The root cause of the bug is the wrong range (start, stop], where start
should be inclusive, i.e. [start, stop].
This patch fixes the problem by including start in the range. To avoid
breaking left shift (range collapse), stop points to the beginning
of a block, not to the end.
The other non-obvious change is a validity check on the iterator in the
main loop. Because the iterator is unsigned, the following corner case
needs care: when a block is inserted at offset 0, the stop variable
wraps around and never becomes less than start, which is 0.
To handle this special case, iterator is set to NULL to indicate that
the end of the loop has been reached.
Fixes: 331573febb6a2
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: stable@vger.kernel.org
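The corner case described in this commit message can be illustrated with a small user-space model. This is a sketch only: `model_walk_down` is a hypothetical function that steps one block at a time, whereas the kernel loop moves from leaf to leaf. It shows why an unsigned lower bound of 0 needs a NULL iterator as the end marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walk downwards over the inclusive range [start, stop] and return the
 * number of blocks visited. A plain "stop--" at start == 0 would wrap to
 * UINT32_MAX and keep "start <= stop" true, so the loop signals its end
 * by setting the iterator pointer to NULL instead.
 */
static uint32_t model_walk_down(uint32_t start, uint32_t stop)
{
	uint32_t *iterator = &stop;
	uint32_t visited = 0;

	while (iterator && start <= stop) {
		visited++;
		if (*iterator == start)
			iterator = NULL;	/* inclusive lower bound reached */
		else
			(*iterator)--;
	}
	/* The loop ends either via the sentinel or on an empty range. */
	assert(iterator == NULL || start > stop);
	return visited;
}
```

For example, `model_walk_down(0, 3)` visits blocks 3, 2, 1 and 0 and returns 4, even though 0 has no smaller unsigned value to stop on.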
2017-01-09 09:59:35 +08:00
|
|
|
stop = le32_to_cpu(extent->ee_block);
|
2014-02-24 04:18:59 +08:00
|
|
|
|
2015-06-09 13:55:03 +08:00
|
|
|
/*
|
2018-04-12 23:48:09 +08:00
|
|
|
* For left shifts, make sure the hole on the left is big enough to
|
|
|
|
* accommodate the shift. For right shifts, make sure the last extent
|
|
|
|
* won't be shifted beyond EXT_MAX_BLOCKS.
|
2015-06-09 13:55:03 +08:00
|
|
|
*/
|
|
|
|
if (SHIFT == SHIFT_LEFT) {
|
2017-01-09 10:00:35 +08:00
|
|
|
path = ext4_find_extent(inode, start - 1, &path,
|
|
|
|
EXT4_EX_NOCACHE);
|
2015-06-09 13:55:03 +08:00
|
|
|
if (IS_ERR(path))
|
|
|
|
return PTR_ERR(path);
|
|
|
|
depth = path->p_depth;
|
|
|
|
extent = path[depth].p_ext;
|
|
|
|
if (extent) {
|
|
|
|
ex_start = le32_to_cpu(extent->ee_block);
|
|
|
|
ex_end = le32_to_cpu(extent->ee_block) +
|
|
|
|
ext4_ext_get_actual_len(extent);
|
|
|
|
} else {
|
|
|
|
ex_start = 0;
|
|
|
|
ex_end = 0;
|
|
|
|
}
|
2014-02-24 04:18:59 +08:00
|
|
|
|
2015-06-09 13:55:03 +08:00
|
|
|
if ((start == ex_start && shift > ex_start) ||
|
|
|
|
(shift > start - ex_end)) {
|
2018-04-12 23:48:09 +08:00
|
|
|
ret = -EINVAL;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
if (shift > EXT_MAX_BLOCKS -
|
|
|
|
(stop + ext4_ext_get_actual_len(extent))) {
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto out;
|
2015-06-09 13:55:03 +08:00
|
|
|
}
|
2014-04-14 03:05:42 +08:00
|
|
|
}
|
2014-02-24 04:18:59 +08:00
|
|
|
|
2015-06-09 13:55:03 +08:00
|
|
|
/*
|
|
|
|
* In case of left shift, iterator points to start and it is increased
|
|
|
|
* till we reach stop. In case of right shift, iterator points to stop
|
|
|
|
* and it is decreased till we reach start.
|
|
|
|
*/
|
2021-09-03 14:27:48 +08:00
|
|
|
again:
|
ext4: fix use-after-free in ext4_ext_shift_extents
If the starting position of an insert range falls in the hole between
two ext4_extent_idx entries, every ext4_extent under the previous
ext4_extent_idx has an lblk less than start. The scan over the "extent"
variable then runs past the leaf boundary, and the following UAF is
triggered:
==================================================================
BUG: KASAN: use-after-free in ext4_ext_shift_extents+0x257/0x790
Read of size 4 at addr ffff88819807a008 by task fallocate/8010
CPU: 3 PID: 8010 Comm: fallocate Tainted: G E 5.10.0+ #492
Call Trace:
dump_stack+0x7d/0xa3
print_address_description.constprop.0+0x1e/0x220
kasan_report.cold+0x67/0x7f
ext4_ext_shift_extents+0x257/0x790
ext4_insert_range+0x5b6/0x700
ext4_fallocate+0x39e/0x3d0
vfs_fallocate+0x26f/0x470
ksys_fallocate+0x3a/0x70
__x64_sys_fallocate+0x4f/0x60
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
==================================================================
For right shifts, we can divide them into the following situations:
1. When the first ee_block of ext4_extent_idx is greater than or equal to
start, make right shifts directly from the first ee_block.
1) If it is greater than start, we need to continue searching in the
previous ext4_extent_idx.
2) If it is equal to start, we can exit the loop (iterator=NULL).
2. When the first ee_block of ext4_extent_idx is less than start, then
traverse from the last extent to find the first extent whose ee_block
is less than start.
1) If extent is still the last extent after traversal, it means that
the last ee_block of ext4_extent_idx is less than start, that is,
start is located in the hole between idx and (idx+1), so we can
exit the loop directly (break) without right shifts.
2) Otherwise, make right shifts at the corresponding position of the
found extent, and then exit the loop (iterator=NULL).
Fixes: 331573febb6a ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate")
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20220922120434.1294789-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
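The case analysis for right shifts listed above can be sketched as a pure function over the ee_block values of one leaf. This is a user-space model, not the kernel code: `model_right_shift_decide`, `enum model_action` and `struct model_decision` are hypothetical names, and the leaf is assumed to be a non-empty array sorted by ee_block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum model_action {
	MODEL_SHIFT_AND_CONTINUE,	/* case 1.1: shift leaf, visit previous leaf */
	MODEL_SHIFT_AND_STOP,		/* cases 1.2 and 2.2: shift, then finish */
	MODEL_STOP_NO_SHIFT,		/* case 2.1: start is in the hole after leaf */
};

struct model_decision {
	enum model_action action;
	size_t first_to_shift;	/* unused for MODEL_STOP_NO_SHIFT */
};

static struct model_decision
model_right_shift_decide(const uint32_t *ee_block, size_t n, uint32_t start)
{
	struct model_decision d = { MODEL_STOP_NO_SHIFT, 0 };
	size_t i;

	assert(n > 0);

	if (ee_block[0] > start) {
		d.action = MODEL_SHIFT_AND_CONTINUE;
		return d;
	}
	if (ee_block[0] == start) {
		d.action = MODEL_SHIFT_AND_STOP;
		return d;
	}

	/*
	 * ee_block[0] < start: scan backwards from the last extent. The
	 * scan cannot leave the array because index 0 ends it at the latest.
	 */
	i = n - 1;
	while (ee_block[i] >= start)
		i--;
	if (i == n - 1)
		return d;	/* every extent is below start, nothing to shift */

	d.action = MODEL_SHIFT_AND_STOP;
	d.first_to_shift = i + 1;
	return d;
}
```

Scanning backwards from the last extent is what keeps the access inside the leaf: the first entry is already known to be below start, so the scan always ends within the array.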
2022-09-22 20:04:34 +08:00
|
|
|
ret = 0;
|
2015-06-09 13:55:03 +08:00
|
|
|
if (SHIFT == SHIFT_LEFT)
|
|
|
|
iterator = &start;
|
|
|
|
else
|
|
|
|
iterator = &stop;
|
2014-02-24 04:18:59 +08:00
|
|
|
|
2021-09-03 14:27:48 +08:00
|
|
|
if (tmp != EXT_MAX_BLOCKS)
|
|
|
|
*iterator = tmp;
|
|
|
|
|
2017-01-09 09:59:35 +08:00
|
|
|
/*
|
|
|
|
* It's safe to start updating extents. Start and stop are unsigned, so
|
|
|
|
* in case of right shift if extent with 0 block is reached, iterator
|
|
|
|
* becomes NULL to indicate the end of the loop.
|
|
|
|
*/
|
|
|
|
while (iterator && start <= stop) {
|
2017-01-09 10:00:35 +08:00
|
|
|
path = ext4_find_extent(inode, *iterator, &path,
|
|
|
|
EXT4_EX_NOCACHE);
|
2014-02-24 04:18:59 +08:00
|
|
|
if (IS_ERR(path))
|
|
|
|
return PTR_ERR(path);
|
|
|
|
depth = path->p_depth;
|
|
|
|
extent = path[depth].p_ext;
|
2014-04-14 03:41:13 +08:00
|
|
|
if (!extent) {
|
|
|
|
EXT4_ERROR_INODE(inode, "unexpected hole at %lu",
|
2015-06-09 13:55:03 +08:00
|
|
|
(unsigned long) *iterator);
|
2015-10-18 04:16:04 +08:00
|
|
|
return -EFSCORRUPTED;
|
2014-04-14 03:41:13 +08:00
|
|
|
}
|
2015-06-09 13:55:03 +08:00
|
|
|
if (SHIFT == SHIFT_LEFT && *iterator >
|
|
|
|
le32_to_cpu(extent->ee_block)) {
|
2014-02-24 04:18:59 +08:00
|
|
|
/* Hole, move to the next extent */
|
2014-08-31 11:50:56 +08:00
|
|
|
if (extent < EXT_LAST_EXTENT(path[depth].p_hdr)) {
|
|
|
|
path[depth].p_ext++;
|
|
|
|
} else {
|
2015-06-09 13:55:03 +08:00
|
|
|
*iterator = ext4_ext_next_allocated_block(path);
|
2014-08-31 11:50:56 +08:00
|
|
|
continue;
|
2014-02-24 04:18:59 +08:00
|
|
|
}
|
|
|
|
}
|
2015-06-09 13:55:03 +08:00
|
|
|
|
2021-09-03 14:27:48 +08:00
|
|
|
tmp = *iterator;
|
2015-06-09 13:55:03 +08:00
|
|
|
if (SHIFT == SHIFT_LEFT) {
|
|
|
|
extent = EXT_LAST_EXTENT(path[depth].p_hdr);
|
|
|
|
*iterator = le32_to_cpu(extent->ee_block) +
|
|
|
|
ext4_ext_get_actual_len(extent);
|
|
|
|
} else {
|
|
|
|
extent = EXT_FIRST_EXTENT(path[depth].p_hdr);
|
2022-09-22 20:04:34 +08:00
|
|
|
if (le32_to_cpu(extent->ee_block) > start)
|
2017-01-09 09:59:35 +08:00
|
|
|
*iterator = le32_to_cpu(extent->ee_block) - 1;
|
2022-09-22 20:04:34 +08:00
|
|
|
else if (le32_to_cpu(extent->ee_block) == start)
|
2017-01-09 09:59:35 +08:00
|
|
|
iterator = NULL;
|
2022-09-22 20:04:34 +08:00
|
|
|
else {
|
|
|
|
extent = EXT_LAST_EXTENT(path[depth].p_hdr);
|
|
|
|
while (le32_to_cpu(extent->ee_block) >= start)
|
|
|
|
extent--;
|
|
|
|
|
|
|
|
if (extent == EXT_LAST_EXTENT(path[depth].p_hdr))
|
|
|
|
break;
|
|
|
|
|
2015-06-09 13:55:03 +08:00
|
|
|
extent++;
|
2022-09-22 20:04:34 +08:00
|
|
|
iterator = NULL;
|
|
|
|
}
|
2015-06-09 13:55:03 +08:00
|
|
|
path[depth].p_ext = extent;
|
|
|
|
}
|
2014-02-24 04:18:59 +08:00
|
|
|
ret = ext4_ext_shift_path_extents(path, shift, inode,
|
2015-06-09 13:55:03 +08:00
|
|
|
handle, SHIFT);
|
2021-09-03 14:27:48 +08:00
|
|
|
/* iterator can be NULL which means we should break */
|
|
|
|
if (ret == -EAGAIN)
|
|
|
|
goto again;
|
2014-02-24 04:18:59 +08:00
|
|
|
if (ret)
|
|
|
|
break;
|
|
|
|
}
|
2014-09-02 02:41:09 +08:00
|
|
|
out:
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2014-02-24 04:18:59 +08:00
|
|
|
return ret;
|
|
|
|
}
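The two range checks that ext4_ext_shift_extents() performs before touching any extent can be modelled in isolation. This is a hedged sketch: `model_check_left_shift`, `model_check_right_shift` and `MODEL_EXT_MAX_BLOCKS` are hypothetical names, the value 0xffffffff for `EXT_MAX_BLOCKS` is an assumption, and the extent lookup is replaced by plain parameters.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: EXT_MAX_BLOCKS is 0xffffffff. */
#define MODEL_EXT_MAX_BLOCKS 0xffffffffULL

/*
 * Left shift: the hole in front of start must hold at least shift blocks.
 * ex_start and ex_end describe the extent found at start - 1, with ex_end
 * being one past its last block. Without such an extent both count as 0.
 * The subtraction is deliberately unsigned, mirroring the original check.
 */
static int model_check_left_shift(uint32_t start, uint32_t shift,
				  int have_extent, uint32_t ex_start,
				  uint32_t ex_end)
{
	if (!have_extent) {
		ex_start = 0;
		ex_end = 0;
	}
	if ((start == ex_start && shift > ex_start) ||
	    shift > start - ex_end)
		return -EINVAL;
	return 0;
}

/*
 * Right shift: the last extent, starting at stop with last_len blocks,
 * must not be pushed beyond the maximum logical block.
 */
static int model_check_right_shift(uint32_t stop, uint32_t last_len,
				   uint32_t shift)
{
	uint64_t end = (uint64_t)stop + last_len;

	assert(end <= MODEL_EXT_MAX_BLOCKS);
	if (shift > MODEL_EXT_MAX_BLOCKS - end)
		return -EINVAL;
	return 0;
}
```

With an extent covering blocks 2 to 5 (`ex_end == 6`) and `start == 10`, the hole is 4 blocks wide, so a left shift of 4 is accepted and a shift of 5 is rejected with -EINVAL.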
|
|
|
|
|
|
|
|
/*
|
|
|
|
* ext4_collapse_range:
|
|
|
|
* This implements the fallocate's collapse range functionality for ext4
|
|
|
|
* Returns: 0 on success, non-zero on error.
|
|
|
|
*/
|
2022-03-09 02:50:43 +08:00
|
|
|
static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
|
2014-02-24 04:18:59 +08:00
|
|
|
{
|
2022-03-09 02:50:43 +08:00
|
|
|
struct inode *inode = file_inode(file);
|
2014-02-24 04:18:59 +08:00
|
|
|
struct super_block *sb = inode->i_sb;
|
2021-02-05 01:05:42 +08:00
|
|
|
struct address_space *mapping = inode->i_mapping;
|
2014-02-24 04:18:59 +08:00
|
|
|
ext4_lblk_t punch_start, punch_stop;
|
|
|
|
handle_t *handle;
|
|
|
|
unsigned int credits;
|
2014-04-20 04:37:31 +08:00
|
|
|
loff_t new_size, ioffset;
|
2014-02-24 04:18:59 +08:00
|
|
|
int ret;
|
|
|
|
|
2015-05-15 12:24:10 +08:00
|
|
|
/*
|
|
|
|
* We need to test this early because xfstests assumes that a
|
|
|
|
* collapse range of (0, 1) will return EOPNOTSUPP if the file
|
|
|
|
* system does not support collapse range.
|
|
|
|
*/
|
|
|
|
if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
|
2020-01-01 02:04:38 +08:00
|
|
|
/* Collapse range works only on fs cluster size aligned regions. */
|
|
|
|
if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb)))
|
2014-02-24 04:18:59 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
trace_ext4_collapse_range(inode, offset, len);
|
|
|
|
|
|
|
|
punch_start = offset >> EXT4_BLOCK_SIZE_BITS(sb);
|
|
|
|
punch_stop = (offset + len) >> EXT4_BLOCK_SIZE_BITS(sb);
|
|
|
|
|
2014-04-11 10:58:20 +08:00
|
|
|
/* Call ext4_force_commit to flush all data in case of data=journal. */
|
|
|
|
if (ext4_should_journal_data(inode)) {
|
|
|
|
ret = ext4_force_commit(inode->i_sb);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2016-01-23 04:40:57 +08:00
|
|
|
inode_lock(inode);
|
2014-04-12 21:56:41 +08:00
|
|
|
/*
|
|
|
|
* There is no need to overlap collapse range with EOF, in which case
|
|
|
|
* it is effectively a truncate operation
|
|
|
|
*/
|
2020-01-01 02:04:38 +08:00
|
|
|
if (offset + len >= inode->i_size) {
|
2014-04-12 21:56:41 +08:00
|
|
|
ret = -EINVAL;
|
|
|
|
goto out_mutex;
|
|
|
|
}
|
|
|
|
|
2014-02-24 04:18:59 +08:00
|
|
|
/* Currently just for extent based files */
|
|
|
|
if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
|
|
|
|
ret = -EOPNOTSUPP;
|
|
|
|
goto out_mutex;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Wait for existing dio to complete */
|
|
|
|
inode_dio_wait(inode);
|
|
|
|
|
2022-03-09 02:50:43 +08:00
|
|
|
ret = file_modified(file);
|
|
|
|
if (ret)
|
|
|
|
goto out_mutex;
|
|
|
|
|
2015-12-08 03:28:03 +08:00
|
|
|
/*
|
|
|
|
* Prevent page faults from reinstantiating pages we have released from
|
|
|
|
* page cache.
|
|
|
|
*/
|
2021-02-05 01:05:42 +08:00
|
|
|
filemap_invalidate_lock(mapping);
|
2018-07-30 05:00:22 +08:00
|
|
|
|
|
|
|
ret = ext4_break_layouts(inode);
|
|
|
|
if (ret)
|
|
|
|
goto out_mmap;
|
|
|
|
|
2015-12-08 03:31:11 +08:00
|
|
|
/*
|
|
|
|
* Need to round down offset to be aligned with page size boundary
|
|
|
|
* for page size > block size.
|
|
|
|
*/
|
|
|
|
ioffset = round_down(offset, PAGE_SIZE);
|
|
|
|
/*
|
|
|
|
* Write tail of the last page before removed range since it will get
|
|
|
|
* removed from the page cache below.
|
|
|
|
*/
|
2021-02-05 01:05:42 +08:00
|
|
|
ret = filemap_write_and_wait_range(mapping, ioffset, offset);
|
2015-12-08 03:31:11 +08:00
|
|
|
if (ret)
|
|
|
|
goto out_mmap;
|
|
|
|
/*
|
|
|
|
* Write data that will be shifted to preserve them when discarding
|
|
|
|
* page cache below. We are also protected from pages becoming dirty
|
2021-02-05 01:05:42 +08:00
|
|
|
* by i_rwsem and invalidate_lock.
|
2015-12-08 03:31:11 +08:00
|
|
|
*/
|
2021-02-05 01:05:42 +08:00
|
|
|
ret = filemap_write_and_wait_range(mapping, offset + len,
|
2015-12-08 03:31:11 +08:00
|
|
|
LLONG_MAX);
|
|
|
|
if (ret)
|
|
|
|
goto out_mmap;
|
2015-12-08 03:28:03 +08:00
|
|
|
truncate_pagecache(inode, ioffset);
|
|
|
|
|
2014-02-24 04:18:59 +08:00
|
|
|
credits = ext4_writepage_trans_blocks(inode);
|
|
|
|
handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits);
|
|
|
|
if (IS_ERR(handle)) {
|
|
|
|
ret = PTR_ERR(handle);
|
2015-12-08 03:28:03 +08:00
|
|
|
goto out_mmap;
|
2014-02-24 04:18:59 +08:00
|
|
|
}
|
2022-01-17 17:36:54 +08:00
|
|
|
ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle);
|
2014-02-24 04:18:59 +08:00
|
|
|
|
|
|
|
down_write(&EXT4_I(inode)->i_data_sem);
|
2020-08-17 15:36:15 +08:00
|
|
|
ext4_discard_preallocations(inode, 0);
|
2014-02-24 04:18:59 +08:00
|
|
|
|
|
|
|
ret = ext4_es_remove_extent(inode, punch_start,
|
2014-04-18 22:43:21 +08:00
|
|
|
EXT_MAX_BLOCKS - punch_start);
|
2014-02-24 04:18:59 +08:00
|
|
|
if (ret) {
|
|
|
|
up_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
goto out_stop;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = ext4_ext_remove_space(inode, punch_start, punch_stop - 1);
|
|
|
|
if (ret) {
|
|
|
|
up_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
goto out_stop;
|
|
|
|
}
|
2020-08-17 15:36:15 +08:00
|
|
|
ext4_discard_preallocations(inode, 0);
|
2014-02-24 04:18:59 +08:00
|
|
|
|
|
|
|
ret = ext4_ext_shift_extents(inode, handle, punch_stop,
|
2015-06-09 13:55:03 +08:00
|
|
|
punch_stop - punch_start, SHIFT_LEFT);
|
2014-02-24 04:18:59 +08:00
|
|
|
if (ret) {
|
|
|
|
up_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
goto out_stop;
|
|
|
|
}
|
|
|
|
|
2020-01-01 02:04:38 +08:00
|
|
|
new_size = inode->i_size - len;
|
2014-04-18 22:48:25 +08:00
|
|
|
i_size_write(inode, new_size);
|
2014-02-24 04:18:59 +08:00
|
|
|
EXT4_I(inode)->i_disksize = new_size;
|
|
|
|
|
|
|
|
up_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
if (IS_SYNC(inode))
|
|
|
|
ext4_handle_sync(handle);
|
2016-11-15 10:40:10 +08:00
|
|
|
inode->i_mtime = inode->i_ctime = current_time(inode);
|
2020-04-27 09:34:37 +08:00
|
|
|
ret = ext4_mark_inode_dirty(handle, inode);
|
2017-05-30 01:24:55 +08:00
|
|
|
ext4_update_inode_fsync_trans(handle, inode, 1);
|
2014-02-24 04:18:59 +08:00
|
|
|
|
|
|
|
out_stop:
|
|
|
|
ext4_journal_stop(handle);
|
2015-12-08 03:28:03 +08:00
|
|
|
out_mmap:
|
2021-02-05 01:05:42 +08:00
|
|
|
filemap_invalidate_unlock(mapping);
|
2014-02-24 04:18:59 +08:00
|
|
|
out_mutex:
|
2016-01-23 04:40:57 +08:00
|
|
|
inode_unlock(inode);
|
2014-02-24 04:18:59 +08:00
|
|
|
return ret;
|
|
|
|
}
|
2014-08-31 11:52:19 +08:00
|
|
|
|
2015-06-09 13:55:03 +08:00
|
|
|
/*
|
|
|
|
* ext4_insert_range:
|
|
|
|
* This function implements the FALLOC_FL_INSERT_RANGE flag of fallocate.
|
|
|
|
* The data blocks starting from @offset to the EOF are shifted by @len
|
|
|
|
* to the right to create a hole in the @inode. Inode size is increased
|
|
|
|
* by len bytes.
|
|
|
|
* Returns 0 on success, error otherwise.
|
|
|
|
*/
|
2022-03-09 02:50:43 +08:00
|
|
|
static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
|
2015-06-09 13:55:03 +08:00
|
|
|
{
|
2022-03-09 02:50:43 +08:00
|
|
|
struct inode *inode = file_inode(file);
|
2015-06-09 13:55:03 +08:00
|
|
|
struct super_block *sb = inode->i_sb;
|
2021-02-05 01:05:42 +08:00
|
|
|
struct address_space *mapping = inode->i_mapping;
|
2015-06-09 13:55:03 +08:00
|
|
|
handle_t *handle;
|
|
|
|
struct ext4_ext_path *path;
|
|
|
|
struct ext4_extent *extent;
|
|
|
|
ext4_lblk_t offset_lblk, len_lblk, ee_start_lblk = 0;
|
|
|
|
unsigned int credits, ee_len;
|
|
|
|
int ret = 0, depth, split_flag = 0;
|
|
|
|
loff_t ioffset;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We need to test this early because xfstests assumes that an
|
|
|
|
* insert range of (0, 1) will return EOPNOTSUPP if the file
|
|
|
|
* system does not support insert range.
|
|
|
|
*/
|
|
|
|
if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
|
2020-01-01 02:04:38 +08:00
|
|
|
/* Insert range works only on fs cluster size aligned regions. */
|
|
|
|
if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb)))
|
2015-06-09 13:55:03 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
trace_ext4_insert_range(inode, offset, len);
|
|
|
|
|
|
|
|
offset_lblk = offset >> EXT4_BLOCK_SIZE_BITS(sb);
|
|
|
|
len_lblk = len >> EXT4_BLOCK_SIZE_BITS(sb);
|
|
|
|
|
|
|
|
/* Call ext4_force_commit to flush all data in case of data=journal */
|
|
|
|
if (ext4_should_journal_data(inode)) {
|
|
|
|
ret = ext4_force_commit(inode->i_sb);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2016-01-23 04:40:57 +08:00
|
|
|
inode_lock(inode);
|
2015-06-09 13:55:03 +08:00
|
|
|
/* Currently just for extent based files */
|
|
|
|
if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
|
|
|
|
ret = -EOPNOTSUPP;
|
|
|
|
goto out_mutex;
|
|
|
|
}
|
|
|
|
|
2020-01-01 02:04:38 +08:00
|
|
|
/* Check whether the maximum file size would be exceeded */
|
|
|
|
if (len > inode->i_sb->s_maxbytes - inode->i_size) {
|
2015-06-09 13:55:03 +08:00
|
|
|
ret = -EFBIG;
|
|
|
|
goto out_mutex;
|
|
|
|
}
|
|
|
|
|
2020-01-01 02:04:38 +08:00
|
|
|
/* Offset must be less than i_size */
|
|
|
|
if (offset >= inode->i_size) {
|
2015-06-09 13:55:03 +08:00
|
|
|
ret = -EINVAL;
|
|
|
|
goto out_mutex;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Wait for existing dio to complete */
|
|
|
|
inode_dio_wait(inode);
|
|
|
|
|
2022-03-09 02:50:43 +08:00
|
|
|
ret = file_modified(file);
|
|
|
|
if (ret)
|
|
|
|
goto out_mutex;
|
|
|
|
|
2015-12-08 03:28:03 +08:00
|
|
|
/*
|
|
|
|
* Prevent page faults from reinstantiating pages we have released from
|
|
|
|
* page cache.
|
|
|
|
*/
|
2021-02-05 01:05:42 +08:00
|
|
|
filemap_invalidate_lock(mapping);
|
2018-07-30 05:00:22 +08:00
|
|
|
|
|
|
|
ret = ext4_break_layouts(inode);
|
|
|
|
if (ret)
|
|
|
|
goto out_mmap;
|
|
|
|
|
2015-12-08 03:31:11 +08:00
|
|
|
/*
|
|
|
|
* Need to round down to align start offset to page size boundary
|
|
|
|
* for page size > block size.
|
|
|
|
*/
|
|
|
|
ioffset = round_down(offset, PAGE_SIZE);
|
|
|
|
/* Write out all dirty pages */
|
|
|
|
ret = filemap_write_and_wait_range(inode->i_mapping, ioffset,
|
|
|
|
LLONG_MAX);
|
|
|
|
if (ret)
|
|
|
|
goto out_mmap;
|
2015-12-08 03:28:03 +08:00
|
|
|
truncate_pagecache(inode, ioffset);
|
|
|
|
|
2015-06-09 13:55:03 +08:00
|
|
|
credits = ext4_writepage_trans_blocks(inode);
|
|
|
|
handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits);
|
|
|
|
if (IS_ERR(handle)) {
|
|
|
|
ret = PTR_ERR(handle);
|
2015-12-08 03:28:03 +08:00
|
|
|
goto out_mmap;
|
2015-06-09 13:55:03 +08:00
|
|
|
}
|
2022-01-17 17:36:54 +08:00
|
|
|
ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle);
|
2015-06-09 13:55:03 +08:00
|
|
|
|
|
|
|
/* Expand file to avoid data loss if there is an error while shifting */
|
|
|
|
inode->i_size += len;
|
|
|
|
EXT4_I(inode)->i_disksize += len;
|
2016-11-15 10:40:10 +08:00
|
|
|
inode->i_mtime = inode->i_ctime = current_time(inode);
|
2015-06-09 13:55:03 +08:00
|
|
|
ret = ext4_mark_inode_dirty(handle, inode);
|
|
|
|
if (ret)
|
|
|
|
goto out_stop;
|
|
|
|
|
|
|
|
down_write(&EXT4_I(inode)->i_data_sem);
|
2020-08-17 15:36:15 +08:00
|
|
|
ext4_discard_preallocations(inode, 0);
|
2015-06-09 13:55:03 +08:00
|
|
|
|
|
|
|
path = ext4_find_extent(inode, offset_lblk, NULL, 0);
|
|
|
|
if (IS_ERR(path)) {
|
|
|
|
up_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
ret = PTR_ERR(path);
|
|
|
|
goto out_stop;
|
|
|
|
}
|
|
|
|
|
|
|
|
depth = ext_depth(inode);
|
|
|
|
extent = path[depth].p_ext;
|
|
|
|
if (extent) {
|
|
|
|
ee_start_lblk = le32_to_cpu(extent->ee_block);
|
|
|
|
ee_len = ext4_ext_get_actual_len(extent);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If offset_lblk is not the starting block of extent, split
|
|
|
|
* the extent at @offset_lblk
|
|
|
|
*/
|
|
|
|
if ((offset_lblk > ee_start_lblk) &&
|
|
|
|
(offset_lblk < (ee_start_lblk + ee_len))) {
|
|
|
|
if (ext4_ext_is_unwritten(extent))
|
|
|
|
split_flag = EXT4_EXT_MARK_UNWRIT1 |
|
|
|
|
EXT4_EXT_MARK_UNWRIT2;
|
|
|
|
ret = ext4_split_extent_at(handle, inode, &path,
|
|
|
|
offset_lblk, split_flag,
|
|
|
|
EXT4_EX_NOCACHE |
|
|
|
|
EXT4_GET_BLOCKS_PRE_IO |
|
|
|
|
EXT4_GET_BLOCKS_METADATA_NOFAIL);
|
|
|
|
}
|
|
|
|
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2015-06-09 13:55:03 +08:00
|
|
|
if (ret < 0) {
|
|
|
|
up_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
goto out_stop;
|
|
|
|
}
|
2016-09-15 23:39:52 +08:00
|
|
|
} else {
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2015-06-09 13:55:03 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
ret = ext4_es_remove_extent(inode, offset_lblk,
|
|
|
|
EXT_MAX_BLOCKS - offset_lblk);
|
|
|
|
if (ret) {
|
|
|
|
up_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
goto out_stop;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If offset_lblk lies in a hole at the start of the file, use
|
|
|
|
* ee_start_lblk to shift extents
|
|
|
|
*/
|
|
|
|
ret = ext4_ext_shift_extents(inode, handle,
|
2022-08-17 10:59:28 +08:00
|
|
|
max(ee_start_lblk, offset_lblk), len_lblk, SHIFT_RIGHT);
|
2015-06-09 13:55:03 +08:00
|
|
|
|
|
|
|
up_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
if (IS_SYNC(inode))
|
|
|
|
ext4_handle_sync(handle);
|
2017-05-30 01:24:55 +08:00
|
|
|
if (ret >= 0)
|
|
|
|
ext4_update_inode_fsync_trans(handle, inode, 1);
|
2015-06-09 13:55:03 +08:00
|
|
|
|
|
|
|
out_stop:
|
|
|
|
ext4_journal_stop(handle);
|
2015-12-08 03:28:03 +08:00
|
|
|
out_mmap:
|
2021-02-05 01:05:42 +08:00
|
|
|
filemap_invalidate_unlock(mapping);
|
2015-06-09 13:55:03 +08:00
|
|
|
out_mutex:
|
2016-01-23 04:40:57 +08:00
|
|
|
inode_unlock(inode);
|
2015-06-09 13:55:03 +08:00
|
|
|
return ret;
|
|
|
|
}
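The argument checks at the top of ext4_insert_range() can be modelled outside the kernel. The following is only a minimal user-space sketch: `insert_range_check` and its parameters are hypothetical names, the cluster size is assumed to be a power of two, and the ordering of the checks follows the function above (alignment first, then maximum file size, then offset against i_size).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the argument checks; not kernel code.
 * Returns 0 if the request would pass the checks, or a negative errno.
 */
static int insert_range_check(int64_t offset, int64_t len,
			      int64_t i_size, int64_t max_bytes,
			      int64_t cluster_size)
{
	/* Assumption: cluster_size is a power of two, as IS_ALIGNED() needs. */
	assert(cluster_size > 0 && (cluster_size & (cluster_size - 1)) == 0);

	/* Both offset and len must be cluster aligned. */
	if (((offset | len) & (cluster_size - 1)) != 0)
		return -EINVAL;

	/* The grown file must not exceed the maximum file size. */
	if (len > max_bytes - i_size)
		return -EFBIG;

	/* The insertion point must lie inside the file. */
	if (offset >= i_size)
		return -EINVAL;

	return 0;
}
```

Writing the size check as `len > max_bytes - i_size` rather than `i_size + len > max_bytes` avoids overflowing the sum, which is the same form the kernel code uses.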
|
|
|
|
|
2014-08-31 11:52:19 +08:00
|
|
|
/**
|
2019-06-20 04:30:03 +08:00
|
|
|
* ext4_swap_extents() - Swap extents between two inodes
|
|
|
|
* @handle: handle for this transaction
|
2014-08-31 11:52:19 +08:00
|
|
|
* @inode1: First inode
|
|
|
|
* @inode2: Second inode
|
|
|
|
* @lblk1: Start block for first inode
|
|
|
|
* @lblk2: Start block for second inode
|
|
|
|
* @count: Number of blocks to swap
|
2018-03-26 13:44:03 +08:00
|
|
|
* @unwritten: Mark second inode's extents as unwritten after swap
|
2014-08-31 11:52:19 +08:00
|
|
|
* @erp: Pointer to save error value
|
|
|
|
*
|
|
|
|
* This helper routine does exactly what it promises: "swap extents". All other
|
|
|
|
* stuff such as page-cache locking consistency, bh mapping consistency or
|
|
|
|
* extent's data copying must be performed by caller.
|
|
|
|
* Locking:
|
2022-01-21 15:06:11 +08:00
|
|
|
* i_rwsem is held for both inodes
|
2014-08-31 11:52:19 +08:00
|
|
|
* i_data_sem is locked for write for both inodes
|
|
|
|
* Assumptions:
|
|
|
|
* All pages from requested range are locked for both inodes
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
ext4_swap_extents(handle_t *handle, struct inode *inode1,
|
2018-03-26 13:44:03 +08:00
|
|
|
struct inode *inode2, ext4_lblk_t lblk1, ext4_lblk_t lblk2,
|
2014-08-31 11:52:19 +08:00
|
|
|
ext4_lblk_t count, int unwritten, int *erp)
|
|
|
|
{
|
|
|
|
struct ext4_ext_path *path1 = NULL;
|
|
|
|
struct ext4_ext_path *path2 = NULL;
|
|
|
|
int replaced_count = 0;
|
|
|
|
|
|
|
|
BUG_ON(!rwsem_is_locked(&EXT4_I(inode1)->i_data_sem));
|
|
|
|
BUG_ON(!rwsem_is_locked(&EXT4_I(inode2)->i_data_sem));
|
2016-01-23 04:40:57 +08:00
|
|
|
BUG_ON(!inode_is_locked(inode1));
|
|
|
|
BUG_ON(!inode_is_locked(inode2));
|
2014-08-31 11:52:19 +08:00
|
|
|
|
|
|
|
*erp = ext4_es_remove_extent(inode1, lblk1, count);
|
2014-09-01 03:03:14 +08:00
|
|
|
if (unlikely(*erp))
|
2014-08-31 11:52:19 +08:00
|
|
|
return 0;
|
|
|
|
*erp = ext4_es_remove_extent(inode2, lblk2, count);
|
2014-09-01 03:03:14 +08:00
|
|
|
if (unlikely(*erp))
|
2014-08-31 11:52:19 +08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
while (count) {
|
|
|
|
struct ext4_extent *ex1, *ex2, tmp_ex;
|
|
|
|
ext4_lblk_t e1_blk, e2_blk;
|
|
|
|
int e1_len, e2_len, len;
|
|
|
|
int split = 0;
|
|
|
|
|
2014-09-02 02:43:09 +08:00
|
|
|
path1 = ext4_find_extent(inode1, lblk1, NULL, EXT4_EX_NOCACHE);
|
2015-08-12 18:29:44 +08:00
|
|
|
if (IS_ERR(path1)) {
|
2014-08-31 11:52:19 +08:00
|
|
|
*erp = PTR_ERR(path1);
|
2014-09-01 03:03:14 +08:00
|
|
|
path1 = NULL;
|
|
|
|
finish:
|
|
|
|
count = 0;
|
|
|
|
goto repeat;
|
2014-08-31 11:52:19 +08:00
|
|
|
}
|
2014-09-02 02:43:09 +08:00
|
|
|
path2 = ext4_find_extent(inode2, lblk2, NULL, EXT4_EX_NOCACHE);
|
2015-08-12 18:29:44 +08:00
|
|
|
if (IS_ERR(path2)) {
|
2014-08-31 11:52:19 +08:00
|
|
|
*erp = PTR_ERR(path2);
|
2014-09-01 03:03:14 +08:00
|
|
|
path2 = NULL;
|
|
|
|
goto finish;
|
2014-08-31 11:52:19 +08:00
|
|
|
}
|
|
|
|
ex1 = path1[path1->p_depth].p_ext;
|
|
|
|
ex2 = path2[path2->p_depth].p_ext;
|
2020-06-11 11:19:46 +08:00
|
|
|
/* Do we have something to swap? */
|
2014-08-31 11:52:19 +08:00
|
|
|
if (unlikely(!ex2 || !ex1))
|
2014-09-01 03:03:14 +08:00
|
|
|
goto finish;
|
2014-08-31 11:52:19 +08:00
|
|
|
|
|
|
|
e1_blk = le32_to_cpu(ex1->ee_block);
|
|
|
|
e2_blk = le32_to_cpu(ex2->ee_block);
|
|
|
|
e1_len = ext4_ext_get_actual_len(ex1);
|
|
|
|
e2_len = ext4_ext_get_actual_len(ex2);
|
|
|
|
|
|
|
|
/* Hole handling */
|
|
|
|
if (!in_range(lblk1, e1_blk, e1_len) ||
|
|
|
|
!in_range(lblk2, e2_blk, e2_len)) {
|
|
|
|
ext4_lblk_t next1, next2;
|
|
|
|
|
|
|
|
/* if hole after extent, then go to next extent */
|
|
|
|
next1 = ext4_ext_next_allocated_block(path1);
|
|
|
|
next2 = ext4_ext_next_allocated_block(path2);
|
|
|
|
/* If hole before extent, then shift to that extent */
|
|
|
|
if (e1_blk > lblk1)
|
|
|
|
next1 = e1_blk;
|
|
|
|
if (e2_blk > lblk2)
|
2017-08-06 13:33:07 +08:00
|
|
|
next2 = e2_blk;
|
2014-08-31 11:52:19 +08:00
|
|
|
/* Do we have something to swap */
|
|
|
|
if (next1 == EXT_MAX_BLOCKS || next2 == EXT_MAX_BLOCKS)
|
2014-09-01 03:03:14 +08:00
|
|
|
goto finish;
|
2014-08-31 11:52:19 +08:00
|
|
|
/* Move to the rightmost boundary */
|
|
|
|
len = next1 - lblk1;
|
|
|
|
if (len < next2 - lblk2)
|
|
|
|
len = next2 - lblk2;
|
|
|
|
if (len > count)
|
|
|
|
len = count;
|
|
|
|
lblk1 += len;
|
|
|
|
lblk2 += len;
|
|
|
|
count -= len;
|
|
|
|
goto repeat;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Prepare left boundary */
|
|
|
|
if (e1_blk < lblk1) {
|
|
|
|
split = 1;
|
|
|
|
*erp = ext4_force_split_extent_at(handle, inode1,
|
2014-09-02 02:37:09 +08:00
|
|
|
&path1, lblk1, 0);
|
2014-09-01 03:03:14 +08:00
|
|
|
if (unlikely(*erp))
|
|
|
|
goto finish;
|
2014-08-31 11:52:19 +08:00
|
|
|
}
|
|
|
|
if (e2_blk < lblk2) {
|
|
|
|
split = 1;
|
|
|
|
*erp = ext4_force_split_extent_at(handle, inode2,
|
2014-09-02 02:37:09 +08:00
|
|
|
&path2, lblk2, 0);
|
2014-09-01 03:03:14 +08:00
|
|
|
if (unlikely(*erp))
|
|
|
|
goto finish;
|
2014-08-31 11:52:19 +08:00
|
|
|
}
|
2014-09-02 02:37:09 +08:00
|
|
|
/* ext4_split_extent_at() may result in leaf extent split,
|
2014-08-31 11:52:19 +08:00
|
|
|
* path must be revalidated. */
|
|
|
|
if (split)
|
|
|
|
goto repeat;
|
|
|
|
|
|
|
|
/* Prepare right boundary */
|
|
|
|
len = count;
|
|
|
|
if (len > e1_blk + e1_len - lblk1)
|
|
|
|
len = e1_blk + e1_len - lblk1;
|
|
|
|
if (len > e2_blk + e2_len - lblk2)
|
|
|
|
len = e2_blk + e2_len - lblk2;
|
|
|
|
|
|
|
|
if (len != e1_len) {
|
|
|
|
split = 1;
|
|
|
|
*erp = ext4_force_split_extent_at(handle, inode1,
|
2014-09-02 02:37:09 +08:00
|
|
|
&path1, lblk1 + len, 0);
|
2014-09-01 03:03:14 +08:00
|
|
|
if (unlikely(*erp))
|
|
|
|
goto finish;
|
2014-08-31 11:52:19 +08:00
|
|
|
}
|
|
|
|
if (len != e2_len) {
|
|
|
|
split = 1;
|
|
|
|
*erp = ext4_force_split_extent_at(handle, inode2,
|
2014-09-02 02:37:09 +08:00
|
|
|
&path2, lblk2 + len, 0);
|
2014-08-31 11:52:19 +08:00
|
|
|
if (*erp)
|
2014-09-01 03:03:14 +08:00
|
|
|
goto finish;
|
2014-08-31 11:52:19 +08:00
|
|
|
}
|
2014-09-02 02:37:09 +08:00
|
|
|
/* ext4_split_extent_at() may result in leaf extent split,
|
2014-08-31 11:52:19 +08:00
|
|
|
* path must be revalidated. */
|
|
|
|
if (split)
|
|
|
|
goto repeat;
|
|
|
|
|
|
|
|
BUG_ON(e2_len != e1_len);
|
|
|
|
*erp = ext4_ext_get_access(handle, inode1, path1 + path1->p_depth);
|
2014-09-01 03:03:14 +08:00
|
|
|
if (unlikely(*erp))
|
|
|
|
goto finish;
|
2014-08-31 11:52:19 +08:00
|
|
|
*erp = ext4_ext_get_access(handle, inode2, path2 + path2->p_depth);
|
2014-09-01 03:03:14 +08:00
|
|
|
if (unlikely(*erp))
|
|
|
|
goto finish;
|
2014-08-31 11:52:19 +08:00
|
|
|
|
|
|
|
/* Both extents are fully inside boundaries. Swap them now */
|
|
|
|
tmp_ex = *ex1;
|
|
|
|
ext4_ext_store_pblock(ex1, ext4_ext_pblock(ex2));
|
|
|
|
ext4_ext_store_pblock(ex2, ext4_ext_pblock(&tmp_ex));
|
|
|
|
ex1->ee_len = cpu_to_le16(e2_len);
|
|
|
|
ex2->ee_len = cpu_to_le16(e1_len);
|
|
|
|
if (unwritten)
|
|
|
|
ext4_ext_mark_unwritten(ex2);
|
|
|
|
if (ext4_ext_is_unwritten(&tmp_ex))
|
|
|
|
ext4_ext_mark_unwritten(ex1);
|
|
|
|
|
|
|
|
ext4_ext_try_to_merge(handle, inode2, path2, ex2);
|
|
|
|
ext4_ext_try_to_merge(handle, inode1, path1, ex1);
|
|
|
|
*erp = ext4_ext_dirty(handle, inode2, path2 +
|
|
|
|
path2->p_depth);
|
2014-09-01 03:03:14 +08:00
|
|
|
if (unlikely(*erp))
|
|
|
|
goto finish;
|
2014-08-31 11:52:19 +08:00
|
|
|
*erp = ext4_ext_dirty(handle, inode1, path1 +
|
|
|
|
path1->p_depth);
|
|
|
|
/*
|
|
|
|
* Looks scary, eh? The second inode already points to new blocks,
|
|
|
|
* and it was successfully dirtied. But luckily an error may happen
|
|
|
|
* only due to a journal error, so the full transaction will be
|
|
|
|
* aborted anyway.
|
|
|
|
*/
|
2014-09-01 03:03:14 +08:00
|
|
|
if (unlikely(*erp))
|
|
|
|
goto finish;
|
2014-08-31 11:52:19 +08:00
|
|
|
lblk1 += len;
|
|
|
|
lblk2 += len;
|
|
|
|
replaced_count += len;
|
|
|
|
count -= len;
|
|
|
|
|
|
|
|
repeat:
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path1);
|
|
|
|
ext4_free_ext_path(path2);
|
2014-09-02 02:39:09 +08:00
|
|
|
path1 = path2 = NULL;
|
2014-08-31 11:52:19 +08:00
|
|
|
}
|
|
|
|
return replaced_count;
|
|
|
|
}
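The "Prepare right boundary" step in ext4_swap_extents() clamps the length of one swap step to what remains of the request and of each extent. A minimal user-space sketch of that arithmetic follows; `swap_step_len` and `min_u32` are hypothetical names, and the sketch assumes both positions already lie inside their extents (the hole case has been handled).

```c
#include <assert.h>
#include <stdint.h>

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper: number of blocks that can be swapped in one step.
 * (lblkN, eN_blk, eN_len) describe the current position and the extent
 * covering it in each of the two files.
 */
static uint32_t swap_step_len(uint32_t count,
			      uint32_t lblk1, uint32_t e1_blk, uint32_t e1_len,
			      uint32_t lblk2, uint32_t e2_blk, uint32_t e2_len)
{
	uint32_t len = count;

	/* Assumption: no hole at either position. */
	assert(lblk1 >= e1_blk && lblk1 < e1_blk + e1_len);
	assert(lblk2 >= e2_blk && lblk2 < e2_blk + e2_len);

	/* Do not run past the end of either extent. */
	len = min_u32(len, e1_blk + e1_len - lblk1);
	len = min_u32(len, e2_blk + e2_len - lblk2);
	return len;
}
```

In the kernel function, an extent longer than this length is split at `lblk + len` and the paths are looked up again, so that both extents have equal length when they are actually swapped.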
|
2018-10-02 02:19:37 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* ext4_clu_mapped - determine whether any block in a logical cluster has
|
|
|
|
* been mapped to a physical cluster
|
|
|
|
*
|
|
|
|
* @inode - file containing the logical cluster
|
|
|
|
* @lclu - logical cluster of interest
|
|
|
|
*
|
|
|
|
* Returns 1 if any block in the logical cluster is mapped, signifying
|
|
|
|
* that a physical cluster has been allocated for it. Otherwise,
|
|
|
|
* returns 0. Can also return negative error codes. Derived from
|
|
|
|
* ext4_ext_map_blocks().
|
|
|
|
*/
|
|
|
|
int ext4_clu_mapped(struct inode *inode, ext4_lblk_t lclu)
|
|
|
|
{
|
|
|
|
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
|
|
|
|
struct ext4_ext_path *path;
|
|
|
|
int depth, mapped = 0, err = 0;
|
|
|
|
struct ext4_extent *extent;
|
|
|
|
ext4_lblk_t first_lblk, first_lclu, last_lclu;
|
|
|
|
|
2022-11-17 23:22:07 +08:00
|
|
|
/*
|
|
|
|
* if data can be stored inline, the logical cluster isn't
|
|
|
|
* mapped - no physical clusters have been allocated, and the
|
|
|
|
* file has no extents
|
|
|
|
*/
|
|
|
|
if (ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA))
|
|
|
|
return 0;
|
|
|
|
|
2018-10-02 02:19:37 +08:00
|
|
|
/* search for the extent closest to the first block in the cluster */
|
|
|
|
path = ext4_find_extent(inode, EXT4_C2B(sbi, lclu), NULL, 0);
|
|
|
|
if (IS_ERR(path)) {
|
|
|
|
err = PTR_ERR(path);
|
|
|
|
path = NULL;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
depth = ext_depth(inode);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* A consistent leaf must not be empty. This situation is possible,
|
|
|
|
* though, _during_ tree modification, and it's why an assert can't
|
|
|
|
* be put in ext4_find_extent().
|
|
|
|
*/
|
|
|
|
if (unlikely(path[depth].p_ext == NULL && depth != 0)) {
|
|
|
|
EXT4_ERROR_INODE(inode,
|
|
|
|
"bad extent address - lblock: %lu, depth: %d, pblock: %lld",
|
|
|
|
(unsigned long) EXT4_C2B(sbi, lclu),
|
|
|
|
depth, path[depth].p_block);
|
|
|
|
err = -EFSCORRUPTED;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
extent = path[depth].p_ext;
|
|
|
|
|
|
|
|
/* can't be mapped if the extent tree is empty */
|
|
|
|
if (extent == NULL)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
first_lblk = le32_to_cpu(extent->ee_block);
|
|
|
|
first_lclu = EXT4_B2C(sbi, first_lblk);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Three possible outcomes at this point - found extent spanning
|
|
|
|
* the target cluster, to the left of the target cluster, or to the
|
|
|
|
* right of the target cluster. The first two cases are handled here.
|
|
|
|
* The last case indicates the target cluster is not mapped.
|
|
|
|
*/
|
|
|
|
if (lclu >= first_lclu) {
|
|
|
|
last_lclu = EXT4_B2C(sbi, first_lblk +
|
|
|
|
ext4_ext_get_actual_len(extent) - 1);
|
|
|
|
if (lclu <= last_lclu) {
|
|
|
|
mapped = 1;
|
|
|
|
} else {
|
|
|
|
first_lblk = ext4_ext_next_allocated_block(path);
|
|
|
|
first_lclu = EXT4_B2C(sbi, first_lblk);
|
|
|
|
if (lclu == first_lclu)
|
|
|
|
mapped = 1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
out:
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2018-10-02 02:19:37 +08:00
|
|
|
|
|
|
|
return err ? err : mapped;
|
|
|
|
}
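The three-way decision in ext4_clu_mapped() (extent spans the cluster, lies to its left, or lies to its right) can be sketched as a pure function. This is an illustrative user-space model only: `clu_mapped_model` and `blk_to_clu` are hypothetical names, the cluster ratio is assumed to be a power of two expressed as `cluster_bits`, and the caller is assumed to supply the nearest extent and the next allocated block that the kernel obtains from the extent tree.

```c
#include <stdint.h>

/* Block number to cluster number; stands in for EXT4_B2C(). */
static uint32_t blk_to_clu(uint32_t lblk, unsigned int cluster_bits)
{
	return lblk >> cluster_bits;
}

/*
 * Hypothetical model: returns 1 if logical cluster lclu contains a mapped
 * block, given the extent found closest to the cluster's first block
 * (ext_start, ext_len) and the next allocated block after that extent.
 */
static int clu_mapped_model(uint32_t lclu,
			    uint32_t ext_start, uint32_t ext_len,
			    uint32_t next_alloc, unsigned int cluster_bits)
{
	uint32_t first_lclu = blk_to_clu(ext_start, cluster_bits);
	uint32_t last_lclu;

	/* Extent lies entirely to the right of the cluster: not mapped. */
	if (lclu < first_lclu)
		return 0;

	/* Extent spans the cluster. */
	last_lclu = blk_to_clu(ext_start + ext_len - 1, cluster_bits);
	if (lclu <= last_lclu)
		return 1;

	/* Extent lies to the left: the next extent may start in the cluster. */
	return lclu == blk_to_clu(next_alloc, cluster_bits);
}
```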
|
2020-10-16 04:37:59 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Updates physical block address and unwritten status of extent
|
|
|
|
* starting at lblk start and of len. If such an extent doesn't exist,
|
|
|
|
* this function splits the extent tree appropriately to create an
|
|
|
|
* extent like this. This function is called in the fast commit
|
|
|
|
* replay path. Returns 0 on success and error on failure.
|
|
|
|
*/
|
|
|
|
int ext4_ext_replay_update_ex(struct inode *inode, ext4_lblk_t start,
|
|
|
|
int len, int unwritten, ext4_fsblk_t pblk)
|
|
|
|
{
|
|
|
|
struct ext4_ext_path *path = NULL, *ppath;
|
|
|
|
struct ext4_extent *ex;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
path = ext4_find_extent(inode, start, NULL, 0);
|
2020-10-23 19:22:32 +08:00
|
|
|
if (IS_ERR(path))
|
|
|
|
return PTR_ERR(path);
|
2020-10-16 04:37:59 +08:00
|
|
|
ex = path[path->p_depth].p_ext;
|
|
|
|
if (!ex) {
|
|
|
|
ret = -EFSCORRUPTED;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (le32_to_cpu(ex->ee_block) != start ||
|
|
|
|
ext4_ext_get_actual_len(ex) != len) {
|
|
|
|
/* We need to split this extent to match our extent first */
|
|
|
|
ppath = path;
|
|
|
|
down_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
ret = ext4_force_split_extent_at(NULL, inode, &ppath, start, 1);
|
|
|
|
up_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
if (ret)
|
|
|
|
goto out;
|
|
|
|
kfree(path);
|
|
|
|
path = ext4_find_extent(inode, start, NULL, 0);
|
|
|
|
if (IS_ERR(path))
|
|
|
|
return PTR_ERR(path);
|
|
|
|
ppath = path;
|
|
|
|
ex = path[path->p_depth].p_ext;
|
|
|
|
WARN_ON(le32_to_cpu(ex->ee_block) != start);
|
|
|
|
if (ext4_ext_get_actual_len(ex) != len) {
|
|
|
|
down_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
ret = ext4_force_split_extent_at(NULL, inode, &ppath,
|
|
|
|
start + len, 1);
|
|
|
|
up_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
if (ret)
|
|
|
|
goto out;
|
|
|
|
kfree(path);
|
|
|
|
path = ext4_find_extent(inode, start, NULL, 0);
|
|
|
|
if (IS_ERR(path))
|
|
|
|
return PTR_ERR(path);
|
|
|
|
ex = path[path->p_depth].p_ext;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (unwritten)
|
|
|
|
ext4_ext_mark_unwritten(ex);
|
|
|
|
else
|
|
|
|
ext4_ext_mark_initialized(ex);
|
|
|
|
ext4_ext_store_pblock(ex, pblk);
|
|
|
|
down_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
ret = ext4_ext_dirty(NULL, inode, &path[path->p_depth]);
|
|
|
|
up_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
out:
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2020-10-16 04:37:59 +08:00
|
|
|
ext4_mark_inode_dirty(NULL, inode);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Try to shrink the extent tree */
|
|
|
|
void ext4_ext_replay_shrink_inode(struct inode *inode, ext4_lblk_t end)
|
|
|
|
{
|
|
|
|
struct ext4_ext_path *path = NULL;
|
|
|
|
struct ext4_extent *ex;
|
|
|
|
ext4_lblk_t old_cur, cur = 0;
|
|
|
|
|
|
|
|
while (cur < end) {
|
|
|
|
path = ext4_find_extent(inode, cur, NULL, 0);
|
|
|
|
if (IS_ERR(path))
|
|
|
|
return;
|
|
|
|
ex = path[path->p_depth].p_ext;
|
|
|
|
if (!ex) {
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2020-10-16 04:37:59 +08:00
|
|
|
ext4_mark_inode_dirty(NULL, inode);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
old_cur = cur;
|
|
|
|
cur = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex);
|
|
|
|
if (cur <= old_cur)
|
|
|
|
cur = old_cur + 1;
|
|
|
|
ext4_ext_try_to_merge(NULL, inode, path, ex);
|
|
|
|
down_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
ext4_ext_dirty(NULL, inode, &path[path->p_depth]);
|
|
|
|
up_write(&EXT4_I(inode)->i_data_sem);
|
|
|
|
ext4_mark_inode_dirty(NULL, inode);
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2020-10-16 04:37:59 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Check if *cur is a hole and if it is, skip it */
|
2021-09-02 23:36:01 +08:00
|
|
|
static int skip_hole(struct inode *inode, ext4_lblk_t *cur)
|
2020-10-16 04:37:59 +08:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
struct ext4_map_blocks map;
|
|
|
|
|
|
|
|
map.m_lblk = *cur;
|
|
|
|
map.m_len = ((inode->i_size) >> inode->i_sb->s_blocksize_bits) - *cur;
|
|
|
|
|
|
|
|
ret = ext4_map_blocks(NULL, inode, &map, 0);
|
2021-09-02 23:36:01 +08:00
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
2020-10-16 04:37:59 +08:00
|
|
|
if (ret != 0)
|
2021-09-02 23:36:01 +08:00
|
|
|
return 0;
|
2020-10-16 04:37:59 +08:00
|
|
|
*cur = *cur + map.m_len;
|
2021-09-02 23:36:01 +08:00
|
|
|
return 0;
|
2020-10-16 04:37:59 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Count number of blocks used by this inode and update i_blocks */
|
|
|
|
int ext4_ext_replay_set_iblocks(struct inode *inode)
|
|
|
|
{
|
|
|
|
struct ext4_ext_path *path = NULL, *path2 = NULL;
|
|
|
|
struct ext4_extent *ex;
|
|
|
|
ext4_lblk_t cur = 0, end;
|
|
|
|
int numblks = 0, i, ret = 0;
|
|
|
|
ext4_fsblk_t cmp1, cmp2;
|
|
|
|
struct ext4_map_blocks map;
|
|
|
|
|
|
|
|
/* Determine the size of the file first */
|
|
|
|
path = ext4_find_extent(inode, EXT_MAX_BLOCKS - 1, NULL,
|
|
|
|
EXT4_EX_NOCACHE);
|
|
|
|
if (IS_ERR(path))
|
|
|
|
return PTR_ERR(path);
|
|
|
|
ex = path[path->p_depth].p_ext;
|
|
|
|
if (!ex) {
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2020-10-16 04:37:59 +08:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
end = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex);
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2020-10-16 04:37:59 +08:00
|
|
|
|
|
|
|
/* Count the number of data blocks */
|
|
|
|
cur = 0;
|
|
|
|
while (cur < end) {
|
|
|
|
map.m_lblk = cur;
|
|
|
|
map.m_len = end - cur;
|
|
|
|
ret = ext4_map_blocks(NULL, inode, &map, 0);
|
|
|
|
if (ret < 0)
|
|
|
|
break;
|
|
|
|
if (ret > 0)
|
|
|
|
numblks += ret;
|
|
|
|
cur = cur + map.m_len;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Count the number of extent tree blocks. We do it by looking up
|
|
|
|
* two successive extents and determining the difference between
|
|
|
|
* their paths. When path is different for 2 successive extents
|
|
|
|
* we compare the blocks in the path at each level and increment
|
|
|
|
* iblocks by total number of differences found.
|
|
|
|
*/
|
|
|
|
cur = 0;
|
2021-09-02 23:36:01 +08:00
|
|
|
ret = skip_hole(inode, &cur);
|
|
|
|
if (ret < 0)
|
|
|
|
goto out;
|
2020-10-16 04:37:59 +08:00
|
|
|
path = ext4_find_extent(inode, cur, NULL, 0);
|
|
|
|
if (IS_ERR(path))
|
|
|
|
goto out;
|
|
|
|
numblks += path->p_depth;
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2020-10-16 04:37:59 +08:00
|
|
|
while (cur < end) {
|
|
|
|
path = ext4_find_extent(inode, cur, NULL, 0);
|
|
|
|
if (IS_ERR(path))
|
|
|
|
break;
|
|
|
|
ex = path[path->p_depth].p_ext;
|
|
|
|
if (!ex) {
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2020-10-16 04:37:59 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
cur = max(cur + 1, le32_to_cpu(ex->ee_block) +
|
|
|
|
ext4_ext_get_actual_len(ex));
|
2021-09-02 23:36:01 +08:00
|
|
|
ret = skip_hole(inode, &cur);
|
|
|
|
if (ret < 0) {
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2021-09-02 23:36:01 +08:00
|
|
|
break;
|
|
|
|
}
|
2020-10-16 04:37:59 +08:00
|
|
|
path2 = ext4_find_extent(inode, cur, NULL, 0);
|
|
|
|
if (IS_ERR(path2)) {
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2020-10-16 04:37:59 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
for (i = 0; i <= max(path->p_depth, path2->p_depth); i++) {
|
|
|
|
cmp1 = cmp2 = 0;
|
|
|
|
if (i <= path->p_depth)
|
|
|
|
cmp1 = path[i].p_bh ?
|
|
|
|
path[i].p_bh->b_blocknr : 0;
|
|
|
|
if (i <= path2->p_depth)
|
|
|
|
cmp2 = path2[i].p_bh ?
|
|
|
|
path2[i].p_bh->b_blocknr : 0;
|
|
|
|
if (cmp1 != cmp2 && cmp2 != 0)
|
|
|
|
numblks++;
|
|
|
|
}
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
|
|
|
ext4_free_ext_path(path2);
|
2020-10-16 04:37:59 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
out:
|
|
|
|
inode->i_blocks = numblks << (inode->i_sb->s_blocksize_bits - 9);
|
|
|
|
ext4_mark_inode_dirty(NULL, inode);
|
|
|
|
return 0;
|
|
|
|
}
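The extent tree block counting in ext4_ext_replay_set_iblocks() compares the paths of two successive extents level by level, then converts the total to 512-byte units for i_blocks. A minimal user-space sketch of both steps is given below. The names `count_new_tree_blocks` and `fs_blocks_to_sectors` are hypothetical, and each path is modelled as an array of `depth + 1` block numbers in which 0 stands for a level with no buffer head (such as the root held in the inode).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the per-level comparison: count the levels at which
 * the second path uses a tree block that differs from the first path.
 */
static unsigned int count_new_tree_blocks(const uint64_t *p1, size_t depth1,
					  const uint64_t *p2, size_t depth2)
{
	size_t max_depth = depth1 > depth2 ? depth1 : depth2;
	unsigned int n = 0;

	for (size_t i = 0; i <= max_depth; i++) {
		/* Levels beyond a path's depth count as "no block". */
		uint64_t cmp1 = i <= depth1 ? p1[i] : 0;
		uint64_t cmp2 = i <= depth2 ? p2[i] : 0;

		if (cmp1 != cmp2 && cmp2 != 0)
			n++;
	}
	return n;
}

/* Convert filesystem blocks to 512-byte sectors, as stored in i_blocks. */
static uint64_t fs_blocks_to_sectors(uint64_t numblks,
				     unsigned int blocksize_bits)
{
	return numblks << (blocksize_bits - 9);
}
```

For example, with 4 KiB blocks (`blocksize_bits` of 12) each filesystem block accounts for 8 sectors.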
|
|
|
|
|
|
|
|
int ext4_ext_clear_bb(struct inode *inode)
|
|
|
|
{
|
|
|
|
struct ext4_ext_path *path = NULL;
|
|
|
|
struct ext4_extent *ex;
|
|
|
|
ext4_lblk_t cur = 0, end;
|
|
|
|
int j, ret = 0;
|
|
|
|
struct ext4_map_blocks map;
|
|
|
|
|
2021-10-16 02:25:13 +08:00
|
|
|
if (ext4_test_inode_flag(inode, EXT4_INODE_INLINE_DATA))
|
|
|
|
return 0;
|
|
|
|
|
2020-10-16 04:37:59 +08:00
|
|
|
/* Determine the size of the file first */
|
|
|
|
path = ext4_find_extent(inode, EXT_MAX_BLOCKS - 1, NULL,
|
|
|
|
EXT4_EX_NOCACHE);
|
|
|
|
if (IS_ERR(path))
|
|
|
|
return PTR_ERR(path);
|
|
|
|
ex = path[path->p_depth].p_ext;
|
|
|
|
if (!ex) {
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2020-10-16 04:37:59 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
end = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex);
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2020-10-16 04:37:59 +08:00
|
|
|
|
|
|
|
cur = 0;
|
|
|
|
while (cur < end) {
|
|
|
|
map.m_lblk = cur;
|
|
|
|
map.m_len = end - cur;
|
|
|
|
ret = ext4_map_blocks(NULL, inode, &map, 0);
|
|
|
|
if (ret < 0)
|
|
|
|
break;
|
|
|
|
if (ret > 0) {
|
|
|
|
path = ext4_find_extent(inode, map.m_lblk, NULL, 0);
|
|
|
|
if (!IS_ERR_OR_NULL(path)) {
|
|
|
|
for (j = 0; j < path->p_depth; j++) {
|
|
|
|
|
|
|
|
ext4_mb_mark_bb(inode->i_sb,
|
|
|
|
path[j].p_block, 1, 0);
|
2022-01-10 11:51:40 +08:00
|
|
|
ext4_fc_record_regions(inode->i_sb, inode->i_ino,
|
|
|
|
0, path[j].p_block, 1, 1);
|
2020-10-16 04:37:59 +08:00
|
|
|
}
|
2022-09-24 10:12:11 +08:00
|
|
|
ext4_free_ext_path(path);
|
2020-10-16 04:37:59 +08:00
|
|
|
}
|
|
|
|
ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0);
|
2022-01-10 11:51:40 +08:00
|
|
|
ext4_fc_record_regions(inode->i_sb, inode->i_ino,
|
|
|
|
map.m_lblk, map.m_pblk, map.m_len, 1);
|
2020-10-16 04:37:59 +08:00
|
|
|
}
|
|
|
|
cur = cur + map.m_len;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|