At this point we will have dropped extent entries from the file, so if we fail
to insert the new hole entries then we are leaving the fs in a corrupt state
(albeit an easily fixed one). Abort the transaction if this happens so we can
avoid corrupting the fs. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In order to do hole punching we have a block reserve to hold the reservation we
need to drop the extents in our range. Since we could end up dropping a lot of
extents we set rsv->failfast so we can just loop around again and drop the
remainder of the range. Unfortunately we unconditionally fill the hole extents
in and start from the last extent we encountered, which we may or may not have
dropped. So this can result in overlapping file extent entries, which can be
tripped over in a variety of ways, either by hitting BUG_ON(!ret) in
fill_holes() after the search, or in btrfs_set_item_key_safe() in
btrfs_drop_extent() at a later time by an unrelated task. Fix this by only
setting drop_end to the last extent we did actually drop. This way our holes
are filled in properly for the range that we did drop, and the rest of the range
that remains to be dropped is actually dropped. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we process the last item in the leaf and hit an I/O error while
reading the next leaf, we return -EIO without having adjusted the
position. Since we have emitted dirents, getdents() will return
the byte count to the user instead of the error. Subsequent callers
will emit the last successful dirent again, and return -EIO again,
with the same result. Callers loop forever.
Instead, if we always increment ctx->pos after emitting or skipping
the dirent, we'll be sure that we won't hit the same one again. When
we go to process the next leaf, we won't have emitted any dirents
and the -EIO will be returned to the user properly. We also don't
need to track if we've emitted a dirent already or if we've changed
the position yet.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
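The readdir fix above can be illustrated with a small userspace model. This is
only a sketch: struct dir_ctx, read_dir_once() and the fail_at parameter are
hypothetical stand-ins, not the btrfs code. It assumes getdents()-like
semantics, where a call that has already emitted entries reports the count
rather than the error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a directory iteration context. */
struct dir_ctx {
	long pos;	/* index of the next entry to visit */
};

/*
 * Walk entries [ctx->pos, nr_entries). Entries at index >= fail_at model a
 * leaf that cannot be read. Like getdents(), a call that already emitted
 * something returns the count; the error is only returned when nothing was
 * emitted in this call.
 */
static int read_dir_once(struct dir_ctx *ctx, long nr_entries, long fail_at)
{
	int emitted = 0;

	while (ctx->pos < nr_entries) {
		if (ctx->pos >= fail_at)
			return emitted ? emitted : -EIO;
		emitted++;
		ctx->pos++;	/* always advance after emitting an entry */
	}
	return emitted;
}
```

Because the position is advanced for every emitted entry, a second call starts
directly at the unreadable entry, emits nothing, and returns -EIO instead of
repeating the last entry.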
Commit 3de4586c52 (Btrfs: Allow subvolumes and snapshots anywhere
in the directory tree) introduced the current system of placing
snapshots in the directory tree. It also introduced the behavior of
creating the snapshot and then creating the directory entries for it.
We've kept this code around for compatibility reasons, but it turns
out that no file systems with the old tree_root based snapshots can
be mounted on newer (>= 2009) kernels anyway. About a month after the
above commit, commit 2a7108ad89 (Btrfs: rev the disk format for the
inode compat and csum selection changes) landed, changing the superblock
magic number.
As a result, we know that we'll never encounter tree_root-based dirents
or have to deal with skipping our own snapshot dirents. Since that
also means that we're now only iterating over DIR_INDEX items, which only
contain one directory entry per leaf item, we don't need to loop over
the leaf item contents anymore either.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If zlib_inflateInit2 fails, the input page is never unmapped.
Add a call to kunmap when it fails.
Signed-off-by: Nick Terrell <nickrterrell@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The balance status item contains the currently known filter values, but the
stripes filter was unintentionally not among them. This means that an
interrupted and automatically restarted balance does not apply the
stripe filters.
Fixes: dee32d0ac3
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: David Sterba <dsterba@suse.com>
'btrfs_iget()' cannot return NULL, so this test can be removed.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The csum member of struct btrfs_super_block is an array of u8, so it makes
sense for btrfs_csum_final to accept u8 * as well. Change the declaration
from void btrfs_csum_final(u32 crc, char *result); to
void btrfs_csum_final(u32 crc, u8 *result);
Signed-off-by: Domagoj Tršan <domagoj.trsan@gmail.com>
[ changed cast to u8 at several call sites ]
Signed-off-by: David Sterba <dsterba@suse.com>
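As a rough illustration of why a u8 * parameter fits the checksum change
above, here is a userspace sketch. The name csum_final_sketch() is
hypothetical, and the finalization step (storing the bitwise-inverted CRC in
little-endian byte order) is an assumption of this example, not something
stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace analogue of a checksum finalization helper.
 * Assumption: the final value is the inverted CRC stored little-endian.
 * Taking uint8_t * lets callers pass a uint8_t csum[] array without casts.
 */
static void csum_final_sketch(uint32_t crc, uint8_t *result)
{
	uint32_t v = ~crc;

	result[0] = (uint8_t)(v & 0xff);
	result[1] = (uint8_t)((v >> 8) & 0xff);
	result[2] = (uint8_t)((v >> 16) & 0xff);
	result[3] = (uint8_t)((v >> 24) & 0xff);
}
```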
If we have
|0--hole--4095||4096--preallocate--12287|
instead of using preallocated space, an 8K direct write will just
create a new 8K extent and it'll end up with
|0--new extent--8191||8192--preallocate--12287|
It's because we find a hole em and then go to create a new 8K
extent directly without adjusting @len.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
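The length adjustment described in the direct write fix above can be sketched
as follows. struct extent_sketch and clamp_len_to_extent() are hypothetical
names for illustration; the real fix may adjust the length differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent map covering [start, start + len). */
struct extent_sketch {
	uint64_t start;
	uint64_t len;
};

/*
 * Limit a write of 'len' bytes at offset 'start' to the part that falls
 * inside the extent 'em', so the caller handles the rest separately.
 */
static uint64_t clamp_len_to_extent(const struct extent_sketch *em,
				    uint64_t start, uint64_t len)
{
	uint64_t em_end = em->start + em->len;
	uint64_t avail;

	assert(start >= em->start && start < em_end);
	avail = em_end - start;
	return len < avail ? len : avail;
}
```

With the layout from the commit message, an 8K write at offset 0 into the 4K
hole is limited to 4K, which leaves the second 4K to be served by the
preallocated extent.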
There is no need to call kfree() if memdup_user() fails, as no memory
was allocated and the error in the error-valued pointer should be returned.
Signed-off-by: Shailendra Verma <shailendra.v@samsung.com>
[ edit subject ]
Signed-off-by: David Sterba <dsterba@suse.com>
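The error-pointer convention behind the memdup_user() cleanup above can be
modelled in userspace. Everything here is a hypothetical stand-in: err_ptr(),
ptr_err(), is_err() and memdup_sketch() only imitate the shape of the kernel
helpers and are not their implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value in a pointer, and decode it again. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-MAX_ERRNO_SKETCH);
}

/* Duplicate a buffer; on failure return an error pointer, never NULL. */
static void *memdup_sketch(const void *src, size_t len)
{
	void *p;

	if (!src)
		return err_ptr(-EFAULT);
	p = malloc(len);
	if (!p)
		return err_ptr(-ENOMEM);
	memcpy(p, src, len);
	return p;
}

static long use_args(const void *src, size_t len)
{
	void *copy = memdup_sketch(src, len);

	if (is_err(copy))
		return ptr_err(copy);	/* nothing was allocated: no free() */
	/* ... work with the copy here ... */
	free(copy);
	return 0;
}
```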
Using copy_extent_buffer is suitable for copying between buffers from an
arbitrary offset and deals with page boundaries. This is not necessary
when doing a full extent_buffer-to-extent_buffer copy. We can utilize
the copy_page helper as well.
Signed-off-by: David Sterba <dsterba@suse.com>
The only memset we do is to 0, so sink the parameter to the function and
simplify all calls. Rename the function to reflect the behaviour.
Signed-off-by: David Sterba <dsterba@suse.com>
The fsid and chunk tree uuid are always located in the first page,
we don't need to use write_extent_buffer.
Signed-off-by: David Sterba <dsterba@suse.com>
Over time, the function has been shrunk to the point that it just
calls find_extent_buffer, just passing the parameters.
Signed-off-by: David Sterba <dsterba@suse.com>
We dereference fs_info several times; besides, post-mount functions
should never see a NULL fs_info.
Signed-off-by: David Sterba <dsterba@suse.com>
The lock is held, we make the same lookup that previously failed with
EEXIST and we don't insert NULL pointers.
Signed-off-by: David Sterba <dsterba@suse.com>
Originally, the eb and start were passed separately in case eb is NULL.
Since the readahead has been refactored in 4.6, this is not true anymore
and we can get rid of the parameter.
Signed-off-by: David Sterba <dsterba@suse.com>
'start' is not used since "btrfs: reada: Pass reada_extent into
__readahead_hook directly" (6e39dbe8b9).
Signed-off-by: David Sterba <dsterba@suse.com>
We can't touch the eb directly in case the function is called with a
non-zero error, so we can read the eb level when needed.
Signed-off-by: David Sterba <dsterba@suse.com>
The helpers are not meant to be generic, the name is misleading. Convert
them to static inlines for type checking.
Signed-off-by: David Sterba <dsterba@suse.com>
They're not even documented anywhere, leaving users with no recourse but
to RTFS. It's no big burden to output the bitfield as words.
Also, display unknown flags as hex.
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: David Sterba <dsterba@suse.com>
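Printing a bitfield as words, with unknown bits shown in hex, might look like
the sketch below. The flag table, its bit values and describe_flags() are
hypothetical and only illustrate the idea; they are not the flags or the code
from the patch above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct flag_name {
	uint64_t bit;
	const char *name;
};

/* Hypothetical flag table for illustration. */
static const struct flag_name flag_names[] = {
	{ 1u << 0, "data" },
	{ 1u << 1, "system" },
	{ 1u << 2, "metadata" },
};

/* Write known flags as '|'-separated words, then leftover bits as hex. */
static void describe_flags(uint64_t flags, char *buf, size_t size)
{
	size_t used = 0;
	size_t i;
	int n;

	if (size == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
		if (!(flags & flag_names[i].bit))
			continue;
		n = snprintf(buf + used, size - used, "%s%s",
			     used ? "|" : "", flag_names[i].name);
		if (n < 0 || (size_t)n >= size - used)
			return;	/* output truncated, stop here */
		used += (size_t)n;
		flags &= ~flag_names[i].bit;
	}
	if (flags)
		snprintf(buf + used, size - used, "%s0x%llx",
			 used ? "|" : "", (unsigned long long)flags);
}
```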
My QEMU VM was seeing inexplicable I/O errors that I tracked down to
errors coming from the qcow2 virtual drive in the host system. The qcow2
file is a nocow file on my Btrfs drive, which QEMU opens with O_DIRECT.
Every once in a while, pread() or pwrite() would return EEXIST, which
makes no sense. This turned out to be a bug in btrfs_get_extent().
Commit 8dff9c8534 ("Btrfs: deal with duplciates during extent_map
insertion in btrfs_get_extent") fixed a case in btrfs_get_extent() where
two threads race on adding the same extent map to an inode's extent map
tree. However, if the added em is merged with an adjacent em in the
extent tree, then we'll end up with an existing extent that is not
identical to but instead encompasses the extent we tried to add. When we
call merge_extent_mapping() to find the nonoverlapping part of the new
em, the arithmetic overflows because there is no such thing. We then end
up trying to add a bogus em to the em_tree, which results in an EEXIST
that can bubble all the way up to userspace.
Fix it by extending the identical extent map special case.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
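The situation in the extent map fix above, where an existing merged extent
fully covers the one being added, can be sketched with plain integer
arithmetic. struct em_sketch, em_covers() and naive_tail_len() are
hypothetical; naive_tail_len() only demonstrates how an unsigned subtraction
wraps when there is no non-overlapping part and is not the arithmetic used by
merge_extent_mapping().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent map covering [start, start + len). */
struct em_sketch {
	uint64_t start;
	uint64_t len;
};

static uint64_t em_end(const struct em_sketch *em)
{
	return em->start + em->len;
}

/* True if 'existing' covers every byte of 'em'. */
static bool em_covers(const struct em_sketch *existing,
		      const struct em_sketch *em)
{
	return existing->start <= em->start && em_end(existing) >= em_end(em);
}

/*
 * Naive "part of em past the end of existing". When existing covers em,
 * the subtraction wraps around and yields a huge bogus length.
 */
static uint64_t naive_tail_len(const struct em_sketch *existing,
			       const struct em_sketch *em)
{
	return em_end(em) - em_end(existing);
}
```

Checking for coverage first and reusing the existing extent in that case
avoids ever computing the bogus length.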
The name tickets_id may be misleading: it just identifies the next ticket
that will be handled and is not stored per ticket.
Fixes: ce12965 ("btrfs: introduce tickets_id to determine whether
asynchronous metadata reclaim work makes progress")
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_map_block supports different types of mappings, which to a large
extent resemble block layer operations. But they don't always, and
currently btrfs dangerously overlays its own flag over the block layer
flags. This is just asking for a conflict, so introduce a different
map flags enum inside of btrfs instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull IOMMU fixes from David Woodhouse:
"Two minor fixes.
The first fixes the assignment of SR-IOV virtual functions to the
correct IOMMU unit, and the second fixes the excessively large (and
physically contiguous) PASID tables used with SVM"
* git://git.infradead.org/intel-iommu:
iommu/vt-d: Fix PASID table allocation
iommu/vt-d: Fix IOMMU lookup for SR-IOV Virtual Functions
Pull MIPS fixes from Ralf Baechle:
"Another round of MIPS fixes for 4.9:
- Fix unreadable output in __do_page_fault due to the KERN_CONT
patchset
- Correctly handle MIPS R6 fixes to the c0_wired register"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: mm: Fix output of __do_page_fault
MIPS: Mask out limit field when calculating wired entry count
Botched calculation of number of pages. As a result,
we were dropping pieces when doing splice to pipe from
e.g. 9p.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull i2c fixes from Wolfram Sang:
"Here is a revert and two bugfixes for the I2C designware driver.
Please note that we are still hunting down a regression for the
i2c-octeon driver. While there is a fix pending, we have unclear
feedback from the testers currently. An rc8 would be quite helpful
for this case"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Revert "i2c: designware: do not disable adapter after transfer"
i2c: designware: fix rx fifo depth tracking
i2c: designware: report short transfers
Pull ARM fix from Russell King:
"This resolves the ksyms issues by reverting the commit which
introduced the breakage"
There was what I consider to be a better fix, but it's late in the rc
game, so I'll take the revert.
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
Revert "arm: move exports to definitions"
Pull networking fixes from David Miller:
1) Fix leak in fsl/fman driver, from Dan Carpenter.
2) Call flow dissector initcall earlier than any networking driver can
register and start to use it, from Eric Dumazet.
3) Some dup header fixes from Geliang Tang.
4) TIPC link monitoring compat fix from Jon Paul Maloy.
5) Link changes require EEE re-negotiation in bcm_sf2 driver, from
Florian Fainelli.
6) Fix bogus handle ID passed into tfilter_notify_chain(), from Roman
Mashak.
7) Fix dump size calculation in rtnl_calcit(), from Zhang Shengju.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
tipc: resolve connection flow control compatibility problem
mvpp2: use correct size for memset
net/mlx5: drop duplicate header delay.h
net: ieee802154: drop duplicate header delay.h
ibmvnic: drop duplicate header seq_file.h
fsl/fman: fix a leak in tgec_free()
net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS
tipc: improve sanity check for received domain records
tipc: fix compatibility bug in link monitoring
net: ethernet: mvneta: Remove IFF_UNICAST_FLT which is not implemented
dwc_eth_qos: drop duplicate headers
net sched filters: fix filter handle ID in tfilter_notify_chain()
net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change
bnxt: do not busy-poll when link is down
udplite: call proper backlog handlers
ipv6: bump genid when the IFA_F_TENTATIVE flag is clear
net/mlx4_en: Free netdev resources under state lock
net: revert "net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit"
rtnetlink: fix the wrong minimal dump size getting from rtnl_calcit()
bnxt_en: Fix a VXLAN vs GENEVE issue
...
Pull libnvdimm fixes from Dan Williams:
- Fix a crash that occurs at driver initialization if the memory region
is already busy (request_mem_region() fails).
- Fix a vma validation check that mistakenly allows a private device-
dax mapping to be established. Device-dax explicitly forbids private
mappings so it can guarantee a given fault granularity and backing
memory type.
Both of these fixes have soaked in -next and are tagged for -stable.
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: fail all private mapping attempts
device-dax: check devm_nsio_enable() return value
Four fixes for bugs found by syzkaller on x86, all for stable.
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"Four fixes for bugs found by syzkaller on x86, all for stable"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: check for pic and ioapic presence before use
KVM: x86: fix out-of-bounds accesses of rtc_eoi map
KVM: x86: drop error recovery in em_jmp_far and em_ret_far
KVM: x86: fix out-of-bounds access in lapic
Fixes marked for stable:
- Set missing wakeup bit in LPCR on POWER9 (Benjamin Herrenschmidt)
- Fix the early OPAL console wrappers (Oliver O'Halloran)
- Fixup kernel read only mapping (Aneesh Kumar K.V)
Fixes for code merged this cycle:
- Fix missing CRCs, add more asm-prototypes.h declarations (Nicholas Piggin)
Merge tag 'powerpc-4.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes marked for stable:
- Set missing wakeup bit in LPCR on POWER9
- Fix the early OPAL console wrappers
- Fixup kernel read only mapping
Fixes for code merged this cycle:
- Fix missing CRCs, add more asm-prototypes.h declarations"
* tag 'powerpc-4.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Fixup kernel read only mapping
powerpc/boot: Fix the early OPAL console wrappers
powerpc: Fix missing CRCs, add more asm-prototypes.h declarations
powerpc: Set missing wakeup bit in LPCR on POWER9
In commit 10724cc7bb ("tipc: redesign connection-level flow control")
we replaced the previous message based flow control with one based on
1k blocks. In order to ensure backwards compatibility the mechanism
falls back to using message as base unit when it senses that the peer
doesn't support the new algorithm. The default flow control window,
i.e., how many units can be sent before the sender blocks and waits
for an acknowledge (aka advertisement) is 512. This was tested against
the previous version, which uses an acknowledge frequency of one ack per
256 received messages, and found to work fine.
However, we missed the fact that versions older than Linux 3.15 use an
acknowledge frequency of 512, which is exactly the limit where a 4.6+
sender will stop and wait for acknowledge. This would also work fine if
it weren't for the fact that if the first sent message on a 4.6+ server
side is an empty SYNACK, this one is also counted as a sent message,
while it is not counted as a received message on a legacy 3.15-receiver.
This leads to the sender always being one step ahead of the receiver, a
scenario causing the sender to block after 512 sent messages, while the
receiver only has registered 511 read messages. Hence, the legacy
receiver is not triggered to send an acknowledge, with a permanently
blocked sender as result.
We solve this deadlock by simply allowing the sender to send one more
message before it blocks, i.e., by making a minimal change to the
condition used for determining connection congestion.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
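The off-by-one deadlock described in the TIPC flow control commit above can be
reproduced with a small counter model. This is a simplified sketch, not the
TIPC code: sender_deadlocks() and its counters are hypothetical, acks are
assumed to arrive instantly, and the congestion test is modelled as ">=" for
the old behaviour and ">" for the relaxed one.

```c
#include <assert.h>
#include <stdbool.h>

#define SND_WIN 512u		/* sender window, in messages */
#define LEGACY_ACK_EVERY 512u	/* legacy receiver acks every 512 messages */

/*
 * Returns true if the sender blocks with no ack pending within max_msgs
 * send attempts. The first message models the empty SYNACK: the sender
 * counts it, the legacy receiver does not.
 */
static bool sender_deadlocks(bool allow_one_more, unsigned int max_msgs)
{
	unsigned int snt_unacked = 0;
	unsigned int rcv_count = 0;
	unsigned int i;

	for (i = 0; i < max_msgs; i++) {
		bool congested = allow_one_more ? (snt_unacked > SND_WIN)
						: (snt_unacked >= SND_WIN);

		if (congested)
			return true;	/* blocked, and no ack is on its way */
		snt_unacked++;
		if (i == 0)
			continue;	/* SYNACK: not counted by the receiver */
		if (++rcv_count == LEGACY_ACK_EVERY) {
			rcv_count = 0;
			snt_unacked -= LEGACY_ACK_EVERY;	/* ack arrives */
		}
	}
	return false;
}
```

In this model the strict check stops the sender at 512 unacknowledged messages
while the receiver has counted only 511; allowing one more message lets the
receiver reach 512 and send its acknowledge.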
gcc-7 detects a short memset in mvpp2, introduced in the original
merge of the driver:
drivers/net/ethernet/marvell/mvpp2.c: In function 'mvpp2_cls_init':
drivers/net/ethernet/marvell/mvpp2.c:3296:2: error: 'memset' used with length equal to number of elements without multiplication by element size [-Werror=memset-elt-size]
The result seems to be that we write uninitialized data into the
flow table registers, although we did not get any warning about
that uninitialized data usage.
Using sizeof() lets us initialize the entire array instead.
Fixes: 3f518509de ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
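The class of bug fixed in the mvpp2 commit above is easy to show in isolation.
The structure, its size and the function name below are hypothetical and are
not taken from the driver; the point is only the difference between an element
count and a byte count.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FLOW_ENTRIES 8	/* hypothetical table size, in elements */

struct flow_shadow_sketch {
	uint32_t words[FLOW_ENTRIES];
};

static void flow_shadow_clear(struct flow_shadow_sketch *t)
{
	/*
	 * memset() takes a length in bytes. Passing FLOW_ENTRIES here would
	 * clear only 8 bytes, i.e. the first 2 of the 8 uint32_t elements.
	 * sizeof() on the array gives the full size in bytes.
	 */
	memset(t->words, 0, sizeof(t->words));
}
```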
Drop duplicate header delay.h from mlx5/core/main.c.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drop duplicate header delay.h from adf7242.c.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drop duplicate header seq_file.h from ibmvnic.c.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We set "tgec->cfg" to NULL before passing it to kfree(). There is no
need to set it to NULL at all. Let's just delete it.
Fixes: 57ba4c9b56 ("fsl/fman: Add FMan MAC support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ETHTOOL_GLINKSETTINGS command is deprecating the ETHTOOL_GSET
command and likewise it shouldn't require the CAP_NET_ADMIN capability.
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 35c55c9877 ("tipc: add neighbor monitoring framework") we
added a data area to the link monitor STATE messages under the
assumption that previous versions did not use any such data area.
For versions older than Linux 4.3 this assumption is not correct. In
those versions, all STATE messages sent out from a node inadvertently
contain a 16 byte data area containing a string; a leftover from
previous RESET messages which were using this during the setup phase.
This string serves no purpose in STATE messages, and should not be there.
Unfortunately, this data area is delivered to the link monitor
framework, where a sanity check catches that it is not a correct domain
record, and drops it. It also issues a rate limited warning about the
event.
Since such events occur much more frequently than anticipated, we now
choose to remove the warning in order to not fill the kernel log with
useless contents. We also make the sanity check stricter, to further
reduce the risk that such data is inadvertently admitted.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>