Many CPUs do not support complete set of RAPL domains, as a
result this detection failed message is very misleading and
can be annoying.
[ 5.082632] intel_rapl: RAPL domain core detection failed
[ 5.088370] intel_rapl: RAPL domain uncore detection failed
So lower the warning message to info and only print out the RAPL
domains that are supported.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
I've confirmed that monitoring the package power usage as well as setting power
limits appear to be working as expected. Supports the package and dram domains.
Tested aginst cpu:
Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After commit d431cbc53c (PM / sleep: Simplify sleep states sysfs
interface code) the pm_states[] array is not populated initially,
which causes setup_test_suspend() to always fail and the suspend
testing during boot doesn't work any more.
Fix the problem by using pm_labels[] instead of pm_states[] in
setup_test_suspend() and storing a pointer to the label of the
sleep state to test rather than the number representing it,
because the connection between the state numbers and labels is
only established by suspend_set_ops().
Fixes: d431cbc53c (PM / sleep: Simplify sleep states sysfs interface code)
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On the Toshiba Tecra Z40, after a suspend-to-disk, some FN hotkeys
driven by toshiba_acpi are not functional.
Calling the ACPI object ENAB on resume makes them back alive.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
This is a follow-up patch to commit 49458e8308 ("ideapad-laptop:
Constify DMI table and other r/o variables") to do what its commit
message says. The actual commit differs from the patch posted at
https://www.mail-archive.com/platform-driver-x86@vger.kernel.org/msg05340.html
significantly, probably due to a bad merge conflict resolution. Fix up
the mess and constify the DMI table for real and fix the bogus
double-const of ideapad_rfk_data[].
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Matthew Garrett <matthew.garrett@nebula.com>
Cc: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
During allocation and initialization of the network driver structures,
the wrong pointer is used to initialize a spin lock. Fix the spin lock
initialization by using the proper pointer.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The user_skb maybe be leaked if the operation on it failed and codes
skipped into the label "out:" without calling genlmsg_unicast.
Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
make defconfig reports:
warning: (NETFILTER_XT_TARGET_LOG) selects NF_LOG_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NETFILTER_ADVANCED)
Fixes: d79a61d netfilter: NETFILTER_XT_TARGET_LOG selects NF_LOG_*
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
pull request: Netfilter/IPVS fixes for net
The following patchset contains seven Netfilter fixes for your net
tree, they are:
1) Make the NAT infrastructure independent of x_tables, some users are
already starting to test nf_tables with NAT without enabling x_tables.
Without this patch for Kconfig, there's a superfluous dependency
between NAT and x_tables.
2) Allow to use 0 in the cgroup match, the kernel rejects with -EINVAL
with no good reason. From Daniel Borkmann.
3) Select CONFIG_NF_NAT from the nf_tables NAT expression, this also
resolves another NAT dependency with x_tables.
4) Use HAVE_JUMP_LABEL instead of CONFIG_JUMP_LABEL in the Netfilter hook
code as elsewhere in the kernel to resolve toolchain problems, from
Zhouyi Zhou.
5) Use iptunnel_handle_offloads() to set up tunnel encapsulation
depending on the offload capabilities, reported by Alex Gartrell
patch from Julian Anastasov.
6) Fix wrong family when registering the ip_vs_local_reply6() hook,
also from Julian.
7) Select the NF_LOG_* symbols from NETFILTER_XT_TARGET_LOG. Rafał
Miłecki reported that when jumping from 3.16 to 3.17-rc, his log
target is not selected anymore due to changes in the previous
development cycle to accomodate the full logging support for
nf_tables.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some hosts can be both little and big endian.
In certain scenarios a big endian kernel can kexec a little endian kernel.
This patch fixes this case from both ends:
1) Return endianity to original values on shutdown (in case little endian kernel boots after we shutdown).
2) Do not rely on HW reset values when loading driver in little endian kernel
but configure them explicitly (in case previous kernel was big endian and did not reset the HW).
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When qeth device is queried for ethtool data, hardware operation
is performed to extract the necessary information from the card.
If the card is not online at the moment (e.g. it is undergoing
recovery), this operation produces undesired effects like
temporarily freezing the system. This patch prevents execution
of the hardware query operation when the card is not online.
In such case, ioctl() operation returns error with errno ENODEV.
Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calxeda 1G/10G XGMAC Ethernet support should be available only on
Calxeda ECX-1000/2000 (Highbank/Midway) platforms.
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renesas SuperH Ethernet support should be available only on
Renesas ARM SoCs and SuperH architecture.
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Magnus Damm <magnus.damm@gmail.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a link is already up, the following sequence makes the kernel
block completely:
ip link set dev eth0 down
ip link set dev eth0 up
This is because on suspended phy, the following lines
__lpc_eth_reset(pldat);
__lpc_eth_init(pldat);
make the LPC ethernet core block (see LPC32x0 manual). The PHY needs to be
(re-)activated low-level first.
Signed-off-by: Roland Stigge <stigge@antcom.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch fixes race conditions between PCI error recovery callbacks and
potential ifup/ifdown.
First, if ifup (tg3_open) is called between tg3_io_error_detected() and
tg3_io_resume() then tp->timer is armed twice before expiry. Once during
tg3_open() and again during tg3_io_resume(). This results in BUG
at kernel/time/timer.c:945.
Second, if ifdown (tg3_close) is called between tg3_io_error_detected()
and tg3_io_resume() then tg3_napi_disable() is called twice without
a tg3_napi_enable between. Once during tg3_io_error_detected() and again
during tg3_close(). The tg3_io_resume() then hangs on rtnl_lock().
v2: Added logging messages per Prashant's request
Cc: Prashant Sreedharan <prashant@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We ran into a case on ppc64 running mariadb where io_getevents would
return zeroed out I/O events. After adding instrumentation, it became
clear that there was some missing synchronization between reading the
tail pointer and the events themselves. This small patch fixes the
problem in testing.
Thanks to Zach for helping to look into this, and suggesting the fix.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org
Analogous to commit 8858d88a25
that fixed commit 70b41abc15
"ARM: ux500: move MSP pin control to the device tree"
accidentally activated MSP2, giving rise to a boot scroll
scream as the kernel attempts to probe a driver for it and
fails to obtain DMA channel 14.
For some reason I forgot to fix this on the Snowball. Fix
this up by marking the node disabled again.
Cc: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
The driver was not bound checking the received length byte to ensure it was within the
the buffer size that is allocated for SMBus blocks. This resulted in buffer overflows
whenever an invalid length byte was received.
It also failed to ensure the length byte was not zero. If it received zero, it would end up
in an infinite loop as the at91_twi_read_next_byte function returned immediately without
allowing RHR to be read to clear the RXRDY interrupt.
Tested agaisnt a SMBus compliant battery.
Signed-off-by: Marek Roszko <mark.roszko@gmail.com>
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
In rk3x SOC, the I2C controller can receive/transmit up to 32 bytes data
in one chunk, so the size of data to be write/read to/from TXDATAx/RXDATAx
must be less than or equal 32 bytes at a time.
Tested on rk3288-pinky board, elan receive 158 bytes data.
Signed-off-by: Addy Ke <addy.ke@rock-chips.com>
Acked-by: Max Schwarz <max.schwarz@online.de>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
There is a race condition in at91_do_twi_xfer when signals arrive.
If a signal is recieved while waiting for a transfer to complete
wait_for_completion_interruptible_timeout() will return -ERESTARTSYS.
This is not handled correctly resulting in interrupts still being
enabled and a transfer being in flight when we return.
Symptoms include a range of oopses and bus lockups. Oopses can happen
when the transfer completes because the interrupt handler will corrupt
the stack. If a new transfer is started before the interrupt fires
the controller will start a new transfer in the middle of the old one,
resulting in confused slaves and a locked bus.
To avoid this, use wait_for_completion_io_timeout instead so that we
don't have to deal with gracefully shutting down the transfer and
disabling the interrupts.
Signed-off-by: Simon Lindgren <simon@aqwary.com>
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
The "clock-frequency" DT property is listed as optional, However,
the current code stores the return value of of_property_read_u32 in
the return code of mv64xxx_of_config, but then forgets to clear it
after setting the default value of "clock-frequency". It is then
passed out to the main probe function, resulting in a probe failure
when "clock-frequency" is missing.
This patch checks and then throws away the return value of
of_property_read_u32, instead of storing it and having to clear it
afterwards.
This issue was discovered after the property was removed from all
sunxi DTs.
Fixes: 4c730a06c1 ("i2c: mv64xxx: Set bus frequency to 100kHz if clock-frequency is not provided")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Cc: stable@vger.kernel.org
Acked-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Sometimes the MNR and MST interrupts happen simultaneously (stop automatically
follows NACK, according to the manuals) and in such case the ID_NACK flag isn't
set since the MST interrupt handling precedes MNR and all interrupts are cleared
and disabled then, so that MNR interrupt is never noticed -- this causes NACK'ed
transfers to be falsely reported as successful. Exchanging MNR and MST handlers
fixes this issue, however the MNR bit somehow gets set again even after being
explicitly cleared, so I decided to completely suppress handling of all disabled
interrupts (which is a good thing anyway)...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
When intel_tv_detect() fails to do load detection it would forget to
drop the locks and clean up the acquire context. Fix it up.
This is a regression from:
commit 208bf9fdcd
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Mon Aug 11 13:15:35 2014 +0300
drm/i915: Fix locking for intel_enable_pipe_a()
v2: Make the code more readable (Chris)
v3: Drop WARN_ON(type < 0) (Chris)
Cc: stable@vger.kernel.org
Cc: Tibor Billes <tbilles@gmx.com>
Reported-by: Tibor Billes <tbilles@gmx.com>
Tested-by: Tibor Billes <tbilles@gmx.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
commit 0944fe3f4a ("s390/mm: implement software referenced bits")
triggered another paging/storage key corruption. There is an
unhandled invalid->valid pte change where we have to set the real
storage key from the pgste.
When doing paging a guest page might be swapcache or swap and when
faulted in it might be read-only and due to a parallel scan old.
An do_wp_page will make it writeable and young. Due to software
reference tracking this page was invalid and now becomes valid.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: stable@vger.kernel.org # v3.12+
Since 3.12 or more precisely commit 0944fe3f4a ("s390/mm:
implement software referenced bits") guest storage keys get
corrupted during paging. This commit added another valid->invalid
translation for page tables - namely ptep_test_and_clear_young.
We have to transfer the storage key into the pgste in that case.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: stable@vger.kernel.org # v3.12+
commit 39b2bbe3d7
"gpio: add flags argument to gpiod_get*() functions"
added a dynamic flags argument to all the GPIOD getter
functions, however this did not cover the stubs so
when people used gpiod stubs to compile out descriptor
code, compilation failed.
Solve this by:
- Also rename all the stub functions __gpiod_*
- Moving the vararg hack outside of #ifdef CONFIG_GPIOLIB
so these will always be available.
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
As the race condition on the inode cache, following scenario can appear:
[Thread a] [Thread b]
->f2fs_mkdir
->f2fs_add_link
->__f2fs_add_link
->init_inode_metadata failed here
->gc_thread_func
->f2fs_gc
->do_garbage_collect
->gc_data_segment
->f2fs_iget
->iget_locked
->wait_on_inode
->unlock_new_inode
->move_data_page
->make_bad_inode
->iput
When we fail in create/symlink/mkdir/mknod/tmpfile, the new allocated inode
should be set as bad to avoid being accessed by other thread. But in above
scenario, it allows f2fs to access the invalid inode before this inode was set
as bad.
This patch fix the potential problem, and this issue was found by code review.
change log from v1:
o Add condition judgment in gc_data_segment() suggested by Changman Lee.
o use iget_failed to simplify code.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Hariprasad Shenai says:
====================
Trivial fixes for cxgb4
This patch series adds support to fix T5 adapter accessing T4 adapter registers,
issue mbox command on correct mbox for physical function, avoid dumping write
only registers, use correct length for adapter part number and support to detect
and display firmware reported errors.
The patches series is created against 'net' tree.
And includes patches on cxgb4 driver.
We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
Thanks
V2:
Added description for each patch as per David Miller's comment
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A couple of RDMA-related called to t4_query_params() were issuing mbox commands
on mbox0 instead of mbox4.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid dumping MPS_RPLC_MAP_CTL for reg dumps; this is a Write-Only register.
Reading this register may cause MPS TCAM corruption.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The adapter firmware can indicate error conditions to the host.
If the firmware has indicated an error, print out the reason for
the firmware error.
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes few register access for both T4 and T5.
PCIE_CORE_UTL_SYSTEM_BUS_AGENT_STATUS & PCIE_CORE_UTL_PCI_EXPRESS_PORT_STATUS
is T4 only register don't let T5 access them. For T5 MA_PARITY_ERROR_STATUS2
is additionally read. MPS_TRC_RSS_CONTROL is T4 only register, for T5 use
MPS_T5_TRC_RSS_CONTROL.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously it was using the length value of serial number.
Also added macro for VPD unique identifier (0x82).
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We previously assumed that a Port's Capabilities and Advertised Capabilities
would never change from Port Initialization time. This is no longer true
when we can have 10Gb/s and 1Gb/s SFP+ Transceiver Modules randomly swapped.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ALC1150 codec seems to need the COEF- and PLL-setups just like its
compatible ALC882 codec. Some machines (e.g. SunMicro X10SAT) show
the problem like too low output volumes unless the COEF setup is
applied.
Reported-and-tested-by: Dana Goyette <danagoyette@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In case of the HW is not able to do the receive checksum offloading
the only feature to remove is NETIF_F_RXCSUM.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For new GMACs it is possible to turn-on/off the COE.
In the current driver, when disabled the Rx-checksum
via ethtool, the tool reported that csum was disabled
but the HW continued to set the IPC. Indeed this is
because the fix_features allows this. So the patch
fixes this problem by adding the set_features.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Lendacky says:
====================
amd-xgbe: AMD XGBE driver fixes 2014-08-29
The following series of patches includes fixes to the driver.
- Tx hardware queue flushing support dependent on hardware version
- Incorrect reported fifo size
- Proper mmd select in XPCS debugfs support
- Proper queue count for configuring Tx flow control
This patch series is based on net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When configuring Tx flow control the Rx queue count was used instead of
the Tx queue count for looping through the Tx hardware queues. Fix the
code to use the Tx queue count.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The debugfs support for the xpcs registers did not properly use the
specified mmd (xpcs_mmd entry) which resulted in the default mmd
value always being used. Update the debugfs support to generate the
proper mmd register value.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fifo size reported by the hardware is not correct. Add support
to limit the reported size to what is actually present. Also, fix
the argument types used in the fifo size calculation function.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The flushing of the Tx hardware queues is only supported at a certain
level of the hardware. Retrieve the current version of the hardware
and use that to determine if flushing is supported.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If NO_DMA=y:
drivers/built-in.o: In function `xgene_enet_delete_ring':
xgene_enet_main.c:(.text+0x28755a): undefined reference to `dma_free_coherent'
drivers/built-in.o: In function `xgene_enet_setup_tx_desc':
xgene_enet_main.c:(.text+0x287774): undefined reference to `dma_map_single'
xgene_enet_main.c:(.text+0x287780): undefined reference to `dma_mapping_error'
drivers/built-in.o: In function `xgene_enet_tx_completion':
xgene_enet_main.c:(.text+0x2878e6): undefined reference to `dma_unmap_single'
drivers/built-in.o: In function `xgene_enet_refill_bufpool':
xgene_enet_main.c:(.text+0x2879d4): undefined reference to `dma_map_single'
xgene_enet_main.c:(.text+0x2879e0): undefined reference to `dma_mapping_error'
drivers/built-in.o: In function `xgene_enet_rx_frame':
xgene_enet_main.c:(.text+0x287aaa): undefined reference to `dma_unmap_single'
drivers/built-in.o: In function `xgene_enet_free_desc_ring':
xgene_enet_main.c:(.text+0x287f98): undefined reference to `dma_free_coherent'
drivers/built-in.o: In function `xgene_enet_create_desc_ring':
xgene_enet_main.c:(.text+0x28808e): undefined reference to `dma_alloc_coherent'
drivers/built-in.o: In function `xgene_enet_probe':
xgene_enet_main.c:(.text+0x2883d4): undefined reference to `dma_set_mask'
xgene_enet_main.c:(.text+0x2883ec): undefined reference to `dma_supported'
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
xfs_collapse_file_space() currently writes back the entire file
undergoing collapse range to settle things down for the extent shift
algorithm. While this prevents changes to the extent list during the
collapse operation, the writeback itself is not enough to prevent
unnecessary collapse failures.
The current shift algorithm uses the extent index to iterate the in-core
extent list. If a post-eof delalloc extent persists after the writeback
(e.g., a prior zero range op where the end of the range aligns with eof
can separate the post-eof blocks such that they are not written back and
converted), xfs_bmap_shift_extents() becomes confused over the encoded
br_startblock value and fails the collapse.
As with the full writeback, this is a temporary fix until the algorithm
is improved to cope with a volatile extent list and avoid attempts to
shift post-eof extents.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
If we have delalloc extents on a file before we run a collapse range
opertaion, we sync the range that we are going to collapse to
convert delalloc extents in that region to real extents to simplify
the shift operation.
However, the shift operation then assumes that the extent list is
not going to change as it iterates over the extent list moving
things about. Unfortunately, this isn't true because we can't hold
the ILOCK over all the operations. We can prevent new IO from
modifying the extent list by holding the IOLOCK, but that doesn't
prevent writeback from running....
And when writeback runs, it can convert delalloc extents is the
range of the file prior to the region being collapsed, and this
changes the indexes of all the extents in the file. That causes the
collapse range operation to Go Bad.
The right fix is to rewrite the extent shift operation not to be
dependent on the extent list not changing across the entire
operation, but this is a fairly significant piece of work to do.
Hence, as a short-term workaround for the problem, sync the entire
file before starting a collapse operation to remove all delalloc
ranges from the file and so avoid the problem of concurrent
writeback changing the extent list.
Diagnosed-and-Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>