linux/fs/btrfs
Filipe Manana 1f250e929a Btrfs: fix log replay failure after unlink and link combination
If we have a file with 2 (or more) hard links in the same directory,
remove one of the hard links, create a new file (or link an existing file)
in the same directory with the name of the removed hard link, and then
finally fsync the new file, we end up with a log that fails to replay,
causing a mount failure.

Example:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ mkdir /mnt/testdir
  $ touch /mnt/testdir/foo
  $ ln /mnt/testdir/foo /mnt/testdir/bar

  $ sync

  $ unlink /mnt/testdir/bar
  $ touch /mnt/testdir/bar
  $ xfs_io -c "fsync" /mnt/testdir/bar

  <power failure>

  $ mount /dev/sdb /mnt
  mount: mount(2) failed: /mnt: No such file or directory

When replaying the log, for that example, we also see the following in
dmesg/syslog:

  [71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, inode 258 parent 257
  [71813.674204] ------------[ cut here ]------------
  [71813.675694] BTRFS: Transaction aborted (error -2)
  [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 __btrfs_unlink_inode+0x17b/0x355 [btrfs]
  [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs]
  [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: G        W        4.15.0-rc9-btrfs-next-56+ #1
  [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
  [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
  [71813.679669] RSP: 0018:ffffc90001cef738 EFLAGS: 00010286
  [71813.679669] RAX: 0000000000000025 RBX: ffff880217ce4708 RCX: 0000000000000001
  [71813.679669] RDX: 0000000000000000 RSI: ffffffff81c14bae RDI: 00000000ffffffff
  [71813.679669] RBP: ffffc90001cef7c0 R08: 0000000000000001 R09: 0000000000000001
  [71813.679669] R10: ffffc90001cef5e0 R11: ffffffff8343f007 R12: ffff880217d474c8
  [71813.679669] R13: 00000000fffffffe R14: ffff88021ccf1548 R15: 0000000000000101
  [71813.679669] FS:  00007f7cee84c480(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
  [71813.679669] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [71813.679669] CR2: 00007f7cedc1abf9 CR3: 00000002354b4003 CR4: 00000000001606e0
  [71813.679669] Call Trace:
  [71813.679669]  btrfs_unlink_inode+0x17/0x41 [btrfs]
  [71813.679669]  drop_one_dir_item+0xfa/0x131 [btrfs]
  [71813.679669]  add_inode_ref+0x71e/0x851 [btrfs]
  [71813.679669]  ? __lock_is_held+0x39/0x71
  [71813.679669]  ? replay_one_buffer+0x53/0x53a [btrfs]
  [71813.679669]  replay_one_buffer+0x4a4/0x53a [btrfs]
  [71813.679669]  ? rcu_read_unlock+0x3a/0x57
  [71813.679669]  ? __lock_is_held+0x39/0x71
  [71813.679669]  walk_up_log_tree+0x101/0x1d2 [btrfs]
  [71813.679669]  walk_log_tree+0xad/0x188 [btrfs]
  [71813.679669]  btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
  [71813.679669]  ? replay_one_extent+0x544/0x544 [btrfs]
  [71813.679669]  open_ctree+0x1cf6/0x2209 [btrfs]
  [71813.679669]  btrfs_mount_root+0x368/0x482 [btrfs]
  [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
  [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
  [71813.679669]  ? mount_fs+0x64/0x10b
  [71813.679669]  mount_fs+0x64/0x10b
  [71813.679669]  vfs_kern_mount+0x68/0xce
  [71813.679669]  btrfs_mount+0x13e/0x772 [btrfs]
  [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
  [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
  [71813.679669]  ? mount_fs+0x64/0x10b
  [71813.679669]  mount_fs+0x64/0x10b
  [71813.679669]  vfs_kern_mount+0x68/0xce
  [71813.679669]  do_mount+0x6e5/0x973
  [71813.679669]  ? memdup_user+0x3e/0x5c
  [71813.679669]  SyS_mount+0x72/0x98
  [71813.679669]  entry_SYSCALL_64_fastpath+0x1e/0x8b
  [71813.679669] RIP: 0033:0x7f7cedf150ba
  [71813.679669] RSP: 002b:00007ffca71da688 EFLAGS: 00000206
  [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c
  [71813.679669] ---[ end trace 83bd473fc5b4663b ]---
  [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: errno=-2 No such entry
  [71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 No such entry (Failed to recover log tree)
  [71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned -30
  [71814.128078] BTRFS error (device dm-0): open_ctree failed

This happens because the log has inode reference items for both inode 258
(the first file we created) and inode 259 (the second file created), and
when processing the reference item for inode 258, we replace the
corresponding item in the subvolume tree (which has two names, "foo" and
"bar") witht he one in the log (which only has one name, "foo") without
removing the corresponding dir index keys from the parent directory.
Later, when processing the inode reference item for inode 259, which has
a name of "bar" associated to it, we notice that dir index entries exist
for that name and for a different inode, so we attempt to unlink that
name, which fails because the inode reference item for inode 258 no longer
has the name "bar" associated to it, making a call to btrfs_unlink_inode()
fail with a -ENOENT error.

Fix this by unlinking all the names in an inode reference item from a
subvolume tree that are not present in the inode reference item found in
the log tree, before overwriting it with the item from the log tree.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-01 16:18:40 +01:00
..
tests Btrfs: extent map selftest: dio write vs dio read 2018-01-22 16:08:22 +01:00
acl.c btrfs: preserve i_mode if __btrfs_set_acl() fails 2017-08-21 17:47:42 +02:00
async-thread.c Btrfs: fix confusing worker helper info in stacktrace 2017-10-30 12:27:57 +01:00
async-thread.h btrfs: constify tracepoint arguments 2017-08-16 14:19:53 +02:00
backref.c btrfs: remove spurious WARN_ON(ref->count < 0) in find_parent_nodes 2018-02-02 16:25:33 +01:00
backref.h btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents 2017-11-01 20:45:34 +01:00
btrfs_inode.h btrfs: make the delalloc block rsv per inode 2017-11-01 20:45:35 +01:00
check-integrity.c btrfs: Fix bug for misused dev_t when lookup in dev state hash table. 2017-11-01 20:45:36 +01:00
check-integrity.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
compression.c btrfs: Remove redundant bio_get/set calls in compressed read/write paths 2018-01-22 16:08:19 +01:00
compression.h btrfs: compression: add helper for type to string conversion 2018-01-22 16:08:16 +01:00
ctree.c btrfs: Improve btrfs_search_slot description 2018-01-22 16:08:19 +01:00
ctree.h Btrfs: fix log replay failure after unlink and link combination 2018-03-01 16:18:40 +01:00
dedupe.h btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize 2016-07-26 13:52:25 +02:00
delayed-inode.c btrfs: Move checks from btrfs_wq_run_delayed_node to btrfs_balance_delayed_items 2018-01-22 16:08:11 +01:00
delayed-inode.h btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t 2017-04-18 14:07:23 +02:00
delayed-ref.c btrfs: Ignore errors from btrfs_qgroup_trace_extent_post 2018-02-02 16:25:14 +01:00
delayed-ref.h Btrfs: add __init macro to btrfs init functions 2018-01-22 16:08:11 +01:00
dev-replace.c btrfs: cleanup device states define BTRFS_DEV_STATE_REPLACE_TGT 2018-01-22 16:08:15 +01:00
dev-replace.h btrfs: constify device path passed to relevant helpers 2017-02-28 14:26:07 +01:00
dir-item.c btrfs: Cleanup existing name_len checks 2018-01-22 16:08:12 +01:00
disk-io.c btrfs: fail mount when sb flag is not in BTRFS_SUPER_FLAG_SUPP 2018-01-22 16:08:21 +01:00
disk-io.h btrfs: sink get_extent parameter to read_extent_buffer_pages 2018-01-22 16:08:13 +01:00
export.c btrfs: Cleanup existing name_len checks 2018-01-22 16:08:12 +01:00
export.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
extent_io.c btrfs: sink unlock_extent parameter gfp_flags 2018-01-22 16:08:19 +01:00
extent_io.h btrfs: sink unlock_extent parameter gfp_flags 2018-01-22 16:08:19 +01:00
extent_map.c Btrfs: noinline merge_extent_mapping 2018-01-22 16:08:22 +01:00
extent_map.h Btrfs: move extent map specific code to extent_map.c 2018-01-22 16:08:22 +01:00
extent-tree.c Btrfs: fix null pointer dereference when replacing missing device 2018-02-02 16:25:44 +01:00
file-item.c Merge branch 'for-4.13-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux 2017-07-05 16:41:23 -07:00
file.c Btrfs: fix space leak after fallocate and zero range operations 2018-01-22 16:08:20 +01:00
free-space-cache.c btrfs: sink unlock_extent parameter gfp_flags 2018-01-22 16:08:19 +01:00
free-space-cache.h btrfs: free-space-cache, clean up unnecessary root arguments 2017-02-17 12:03:56 +01:00
free-space-tree.c btrfs: Clean up unused variables in free-space-tree.c 2017-10-30 12:27:59 +01:00
free-space-tree.h btrfs: expose internal free space tree routine only if sanity tests are enabled 2017-08-18 16:36:29 +02:00
hash.c crypto: Work around deallocated stack frame reference gcc bug on sparc. 2017-06-08 17:36:03 +08:00
hash.h btrfs: advertise which crc32c implementation is being used at module load 2016-06-06 14:08:28 +02:00
inode-item.c Btrfs: fix log replay failure after unlink and link combination 2018-03-01 16:18:40 +01:00
inode-map.c Btrfs: rework outstanding_extents 2017-11-01 20:45:35 +01:00
inode-map.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
inode.c btrfs: handle failure of add_pending_csums 2018-03-01 16:16:00 +01:00
ioctl.c btrfs: use correct string length in DEV_INFO ioctl 2018-01-22 16:08:21 +01:00
Kconfig Btrfs: add a extent ref verify tool 2017-10-30 12:28:00 +01:00
locking.c btrfs: cleanup, remove stray return statements 2016-01-07 14:30:52 +01:00
locking.h
lzo.c btrfs: allow to set compression level for zlib 2017-11-01 20:45:29 +01:00
Makefile Btrfs: add extent map selftests 2018-01-22 16:08:22 +01:00
math.h
ordered-data.c Btrfs: rework outstanding_extents 2017-11-01 20:45:35 +01:00
ordered-data.h btrfs: fix integer overflow in calc_reclaim_items_nr 2017-06-29 20:17:02 +02:00
orphan.c
print-tree.c Btrfs: add one more sanity check for shared ref type 2017-08-21 17:47:43 +02:00
print-tree.h btrfs: get fs_info from eb in btrfs_print_tree, remove argument 2017-08-16 16:12:03 +02:00
props.c btrfs: prop: use common helper for type to string conversion 2018-01-22 16:08:16 +01:00
props.h
qgroup.c btrfs: Ignore errors from btrfs_qgroup_trace_extent_post 2018-02-02 16:25:14 +01:00
qgroup.h btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges 2017-06-29 20:17:02 +02:00
raid56.c Btrfs: raid56: fix race between merge_bio and rbio_orig_end_io 2018-01-22 16:08:21 +01:00
raid56.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
rcu-string.h
reada.c btrfs: remove unused member err from reada_extent 2017-06-19 18:25:59 +02:00
ref-verify.c btrfs: ref-verify: Remove unused parameter from walk_up_tree() to kill warning 2018-01-22 16:08:13 +01:00
ref-verify.h Btrfs: add a extent ref verify tool 2017-10-30 12:28:00 +01:00
relocation.c btrfs: Handle btrfs_set_extent_delalloc failure in relocate_file_extent_cluster 2018-03-01 16:16:12 +01:00
root-tree.c btrfs: Cleanup existing name_len checks 2018-01-22 16:08:12 +01:00
scrub.c btrfs: rename btrfs_device::scrub_device to scrub_ctx 2018-01-22 16:08:20 +01:00
send.c Btrfs: send, fix issuing write op when processing hole in no data mode 2018-03-01 16:18:07 +01:00
send.h btrfs: fix send ioctl on 32bit with 64bit kernel 2017-10-30 12:27:59 +01:00
struct-funcs.c btrfs: struct-funcs, constify readers 2017-08-16 14:19:53 +02:00
super.c btrfs: use kvzalloc to allocate btrfs_fs_info 2018-03-01 16:15:36 +01:00
sysfs.c btrfs: use proper endianness accessors for super_copy 2018-03-01 16:17:27 +01:00
sysfs.h Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux 2017-11-14 13:35:29 -08:00
transaction.c btrfs: use proper endianness accessors for super_copy 2018-03-01 16:17:27 +01:00
transaction.h btrfs: reorder btrfs_transaction members for better packing 2018-01-22 16:08:14 +01:00
tree-checker.c btrfs: tree-check: reduce stack consumption in check_dir_item 2018-01-22 16:08:21 +01:00
tree-checker.h btrfs: tree-checker: Fix false panic for sanity test 2017-11-28 14:59:09 +01:00
tree-defrag.c Btrfs: fix locking bugs when defragging leaves 2015-12-18 02:51:32 +00:00
tree-log.c Btrfs: fix log replay failure after unlink and link combination 2018-03-01 16:18:40 +01:00
tree-log.h btrfs: Make btrfs_del_inode_ref take btrfs_inode 2017-02-14 15:50:54 +01:00
ulist.c btrfs: ulist: rename ulist_fini to ulist_release 2017-02-17 12:03:50 +01:00
ulist.h btrfs: ulist: rename ulist_fini to ulist_release 2017-02-17 12:03:50 +01:00
uuid-tree.c btrfs: return the actual error value from from btrfs_uuid_tree_iterate 2016-12-19 18:08:15 +01:00
volumes.c btrfs: alloc_chunk: fix DUP stripe size handling 2018-03-01 16:16:47 +01:00
volumes.h btrfs: Remove unused readahead spinlock 2018-01-22 16:08:21 +01:00
xattr.c btrfs: Cleanup existing name_len checks 2018-01-22 16:08:12 +01:00
xattr.h btrfs: Switch to generic xattr handlers 2016-05-17 19:17:09 -04:00
zlib.c btrfs: allow to set compression level for zlib 2017-11-01 20:45:29 +01:00
zstd.c btrfs: move some zstd work data from stack to workspace 2018-01-22 16:08:14 +01:00