linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-27 06:34:11 +08:00

Coly Li 2aa8c52938 bcache: avoid unnecessary btree nodes flushing in btree_flush_write()

Commit `91be66e131` ("bcache: performance improvement for btree_flush_write()") tried to flush the btree node with the oldest journal entry faster, using the following methods:
- Only iterate dirty btree nodes in c->btree_cache, to avoid scanning a lot of clean btree nodes.
- Treat c->btree_cache as an LRU-like list and aggressively flush all dirty nodes from the tail of c->btree_cache until the btree node with the oldest journal entry is flushed. This is to reduce the time c->bucket_lock is held.

Guoju Fang and Shuang Li reported that they observed unexpected extra write I/Os on the cache device after applying the above patch. Guoju Fang provided more detailed diagnostic information showing that the aggressive btree node flushing may cause 10x more btree nodes to be flushed in his workload. He pointed out that when system memory is large enough to hold all btree nodes in memory, c->btree_cache is no longer an LRU-like list. The btree node with the oldest journal entry is then very probably not close to the tail of the c->btree_cache list. In that situation many more dirty btree nodes are aggressively flushed before the target node is flushed. When a slow SATA SSD is used as the cache device, such over-aggressive flushing causes a performance regression.

After spending a lot of time on debugging and diagnosis, I find the real condition is more complicated, and aggressively flushing dirty btree nodes from the tail of the c->btree_cache list is not a good solution:
- When all btree nodes are cached in memory, c->btree_cache is not an LRU-like list, so the btree nodes with the oldest journal entry won't be close to the tail of the list.
- There can be hundreds of dirty btree nodes referencing the oldest journal entry. Until all of these nodes are flushed, the oldest journal entry cannot be reclaimed.

When the above two conditions are combined, simply flushing from the tail of the c->btree_cache list is really NOT a good idea.

Fortunately there is still a chance to make btree_flush_write() work better. Here is how this patch avoids unnecessary btree node flushing:
- Only acquire c->journal.lock when getting the oldest journal entry of fifo c->journal.pin. In all other locations the journal entries are checked locklessly, so their values can be changed on other cores in parallel.
- In the list_for_each_entry_safe_reverse() loop, check the latest front point of fifo c->journal.pin. If it differs from the original front point, which was read while holding c->journal.lock, the oldest journal entry has been reclaimed on another core. At this moment, all selected dirty nodes recorded in array btree_nodes[] have been flushed and are clean on other CPU cores, so it is unnecessary to iterate c->btree_cache any longer. Just quit the list_for_each_entry_safe_reverse() loop, and the following for-loop will skip all the selected clean nodes.
- Find a proper time to quit the list_for_each_entry_safe_reverse() loop. Check the refcount value of the original fifo front point. If the value is larger than the number of nodes selected in btree_nodes[], more matching btree nodes should be scanned. Otherwise there are no more matching btree nodes in the rest of the c->btree_cache list, and the loop can quit. If the original oldest journal entry is reclaimed and the fifo front point is updated, the refcount of the original fifo front point will be 0, and the loop will quit too.
- Do not hold c->bucket_lock for too long. c->bucket_lock is also required for space allocation for cached data, so holding it for too long will block regular I/O requests. When iterating list c->btree_cache, even if there are a lot of matching btree nodes, only BTREE_FLUSH_NR nodes are selected and flushed in the following for-loop, so that c->bucket_lock is not held for too long.

With this patch, only btree nodes referencing the oldest journal entry are flushed to the cache device; there is no more aggressive flushing of unnecessary btree nodes. And in order to avoid blocking regular I/O requests, each time btree_flush_write() is called, at most BTREE_FLUSH_NR btree nodes are selected to flush, even if there are more matching btree nodes in list c->btree_cache.

Finally, one more thing to explain: why is it safe to read the front point of c->journal.pin without holding c->journal.lock inside the list_for_each_entry_safe_reverse() loop?

Here is my answer: when reading the front point of fifo c->journal.pin, we don't need to know its exact value. We only want to check whether it differs from the original front point (which is an accurate value because we read it while c->journal.lock was held). For that purpose, the check works as expected without holding c->journal.lock. Even if the front point has been changed on another CPU core and the change is not yet visible on the local core, and the btree node currently being iterated has the same journal entry as the originally fetched fifo front point, it is still safe. After taking mutex b->write_lock (with its memory barrier), this btree node will be found clean and skipped, and the loop will quit later when iterating on the next node of list c->btree_cache.

Fixes: `91be66e131` ("bcache: performance improvement for btree_flush_write()")
Reported-by: Guoju Fang <fangguoju@gmail.com>
Reported-by: Shuang Li <psymon@bonuscloud.io>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2020-01-23 11:40:02 -07:00
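The node-selection rules described in the commit message can be illustrated with a minimal, single-threaded sketch in plain C. This is not the kernel code: `struct sketch_node`, `struct sketch_journal`, and `select_nodes_to_flush()` are hypothetical stand-ins for the btree cache list and the c->journal.pin fifo, the value of `BTREE_FLUSH_NR` is assumed, and all locking is left out. It only shows the three exit conditions: the fifo front changed, every referencing node was found, or the batch is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BTREE_FLUSH_NR 8 /* batch limit; the value here is an assumption */

/* Hypothetical, simplified stand-in for a cached btree node. */
struct sketch_node {
    bool dirty;      /* node has unwritten changes */
    int journal_seq; /* journal entry this node pins */
};

/* Hypothetical stand-in for the front of the journal pin fifo. */
struct sketch_journal {
    int front_seq;  /* oldest journal entry still pinned */
    int front_refs; /* number of dirty nodes pinning it */
};

/*
 * Walk cache[] from tail to head and record the indexes of dirty nodes
 * that pin the oldest journal entry. Returns how many were selected.
 */
size_t select_nodes_to_flush(const struct sketch_node *cache, size_t n,
                             const struct sketch_journal *j,
                             size_t out[BTREE_FLUSH_NR])
{
    assert(j != NULL && out != NULL);

    /* Snapshot of the fifo front; the patch takes it under the lock. */
    int oldest = j->front_seq;
    int refs = j->front_refs;
    size_t nr = 0;

    if (refs <= 0)
        return 0; /* nothing pins the oldest entry */

    for (size_t i = n; i-- > 0; ) {
        /*
         * Lockless re-check of the fifo front. It never fires in this
         * single-threaded sketch; it marks where the patch detects that
         * another core reclaimed the oldest entry.
         */
        if (j->front_seq != oldest)
            break;

        /* Skip clean nodes and nodes pinning a newer journal entry. */
        if (!cache[i].dirty || cache[i].journal_seq != oldest)
            continue;

        out[nr++] = i;

        /* Quit once all referencing nodes are found or the batch is full. */
        if (nr >= (size_t)refs || nr == BTREE_FLUSH_NR)
            break;
    }
    return nr;
}
```

With five cached nodes of which two dirty ones pin the oldest entry, the walk stops as soon as the second one is found, so nodes closer to the head are never visited. With more matching nodes than `BTREE_FLUSH_NR`, the walk stops at the batch limit and the rest are left for a later call.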
..
accessibility
acpi	ACPI: PM: Avoid attaching ACPI PM domain to certain devices	2019-12-10 00:22:18 +01:00
amba
android	binder: fix incorrect calculation for num_valid	2019-12-14 09:10:47 +01:00
ata	pci-v5.5-changes	2019-12-03 13:58:22 -08:00
atm
auxdisplay	auxdisplay: charlcd: deduplicate simple_strtoul()	2019-12-04 19:44:12 -08:00
base	Merge branch 'remove-ksys-mount-dup' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux	2019-12-15 11:36:12 -08:00
bcma
block	xen: branch for v5.5-rc2	2019-12-15 12:24:44 -08:00
bluetooth	Bluetooth: btbcm: Use the BDADDR_PROPERTY quirk	2019-11-22 13:35:20 +01:00
bus	bus: ti-sysc: Fix missing reset delay handling	2019-12-12 08:20:10 -08:00
cdrom	cdrom: respect device capabilities during opening action	2019-11-26 13:02:24 -07:00
char	drm msm + fixes for 5.5-rc1	2019-12-06 10:28:09 -08:00
clk	ARM: SoC platform updates	2019-12-05 11:38:40 -08:00
clocksource	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2019-12-03 12:20:25 -08:00
connector
counter
cpufreq	cpufreq: vexpress-spc: Switch cpumask from topology core to OPP sharing	2019-12-09 11:52:50 +00:00
cpuidle	cpuidle: Drop unnecessary type cast in cpuidle_poll_time()	2019-12-12 17:56:08 +01:00
crypto	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6	2019-12-02 17:23:21 -08:00
dax	libnvdimm for 5.5	2019-12-01 18:43:25 -08:00
dca
devfreq	PM / devfreq: Use PM QoS for sysfs min/max_freq	2019-12-09 12:19:16 +09:00
dio
dma	dmaengine: Fix Kconfig indentation	2019-11-22 11:16:26 +05:30
dma-buf	- A fix for a memory leak in the dma-buf support	2019-12-09 17:13:19 +10:00
edac	EDAC/altera: Use the Altera System Manager driver	2019-11-22 10:18:29 +01:00
eisa
extcon	Char/Misc driver patches for 5.5-rc1	2019-11-27 10:53:50 -08:00
firewire	FireWire (IEEE 1394) subsystem updates:	2019-12-02 14:13:00 -08:00
firmware	Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2019-12-17 10:39:55 -08:00
fpga
fsi
gnss
gpio	spi: Fixes for v5.5	2019-12-17 13:06:31 -08:00
gpu	Merge tag 'drm-fixes-5.5-2019-12-12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes	2019-12-13 14:50:01 +10:00
greybus
hid	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid	2019-12-01 18:20:54 -08:00
hsi
hv	Merge branch 'akpm' (patches from Andrew)	2019-12-01 20:36:41 -08:00
hwmon	compat_ioctl: remove most of fs/compat_ioctl.c	2019-12-01 13:46:15 -08:00
hwspinlock
hwtracing	compat_ioctl: remove most of fs/compat_ioctl.c	2019-12-01 13:46:15 -08:00
i2c	i2c: remove i2c_new_dummy() API	2019-12-10 23:15:09 +01:00
i3c
ide	compat_ioctl: remove most of fs/compat_ioctl.c	2019-12-01 13:46:15 -08:00
idle	cpuidle: Drop disabled field from struct cpuidle_state	2019-11-29 11:48:39 +01:00
iio	First set of fixes for IIO in the 5.5 cycle.	2019-12-09 09:27:52 +01:00
infiniband	Pull request for 5.5-rc2	2019-12-15 14:58:13 -08:00
input	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input	2019-12-07 18:33:01 -08:00
interconnect	interconnect: qcom: msm8974: Walk the list safely on node removal	2019-12-12 10:28:54 +01:00
iommu	iommu: fix KASAN use-after-free in iommu_insert_resv_region	2019-12-16 08:58:42 -08:00
ipack
irqchip	pci-v5.5-changes	2019-12-03 13:58:22 -08:00
isdn	compat_ioctl: remove most of fs/compat_ioctl.c	2019-12-01 13:46:15 -08:00
leds	Merge tag 'leds-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds	2019-12-01 16:09:28 -08:00
lightnvm
macintosh	powerpc updates for 5.5	2019-11-30 14:35:43 -08:00
mailbox	mailbox changes for v5.5	2019-12-01 18:42:02 -08:00
mcb
md	bcache: avoid unnecessary btree nodes flushing in btree_flush_write()	2020-01-23 11:40:02 -07:00
media	treewide: Use sizeof_field() macro	2019-12-09 10:36:44 -08:00
memory	memory: tegra: Fixes for v5.5-rc1	2019-12-06 08:28:51 -08:00
memstick	pci-v5.5-changes	2019-12-03 13:58:22 -08:00
message
mfd	chrome platform changes for v5.5	2019-12-03 14:37:12 -08:00
misc	lib/genalloc.c: rename addr_in_gen_pool to gen_pool_has_addr	2019-12-04 19:44:13 -08:00
mmc	Driver core patches for 5.5-rc1	2019-11-27 11:06:20 -08:00
mtd	TTY/Serial patches for 5.5-rc1	2019-12-03 14:09:14 -08:00
mux
net	treewide: Use sizeof_field() macro	2019-12-09 10:36:44 -08:00
nfc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2019-11-22 16:27:24 -08:00
ntb	Add Hygon Device ID to the AMD NTB device driver	2019-12-07 18:38:17 -08:00
nubus
nvdimm	libnvdimm for 5.5	2019-12-01 18:43:25 -08:00
nvme	block: Allow t10-pi to be modular	2020-01-06 20:59:04 -07:00
nvmem	ARM: SoC-related driver updates	2019-12-05 11:43:31 -08:00
of	of/platform: Unconditionally pause/resume sync state during kernel init	2019-12-12 18:39:52 -06:00
opp
oprofile	Printk changes for 5.5	2019-11-25 19:40:40 -08:00
parisc
parport
pci	PCI: rockchip: Fix IO outbound ATU register number	2019-12-12 15:25:37 -06:00
pcmcia	pcmcia: remove unused dprintk definition	2019-11-22 07:03:45 +01:00
perf
phy	ARM: SoC-related driver updates	2019-12-05 11:43:31 -08:00
pinctrl	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2019-12-03 09:29:50 -08:00
platform	chrome platform changes for v5.5	2019-12-03 14:37:12 -08:00
pnp
power	Additional power management updates for 5.5-rc1	2019-12-04 10:48:09 -08:00
powercap
pps
ps3
ptp	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2019-11-16 21:51:42 -08:00
pwm	pwm: Changes for v5.5-rc1	2019-12-05 11:28:14 -08:00
rapidio	drivers/rapidio/rio-access.c: fix missing include of <linux/rio_drv.h>	2019-12-04 19:44:13 -08:00
ras
regulator	regulator: Fixes for v5.5	2019-12-17 13:08:41 -08:00
remoteproc	remoteproc: stm32: fix probe error case	2019-11-18 20:35:16 -08:00
reset	reset: Do not register resource data for missing resets	2019-12-10 11:43:37 +01:00
rpmsg	rpmsg updates for v5.5	2019-12-01 18:39:24 -08:00
rtc	RTC for 5.5	2019-12-03 13:31:08 -08:00
s390	treewide: Use sizeof_field() macro	2019-12-09 10:36:44 -08:00
sbus
scsi	block: Allow t10-pi to be modular	2020-01-06 20:59:04 -07:00
sfi
sh
siox
slimbus
soc	ARM: SoC fixes	2019-12-06 14:19:37 -08:00
soundwire
spi	spi: Fixes for v5.5	2019-12-17 13:06:31 -08:00
spmi
ssb
staging	Staging/IIO fixes for 5.5-rc2	2019-12-14 12:43:57 -08:00
target	treewide: Use sizeof_field() macro	2019-12-09 10:36:44 -08:00
tc
tee	Merge mainline/master into arm/fixes	2019-12-05 13:18:54 -08:00
thermal	thermal: power_allocator: Fix Kconfig warning	2019-12-07 21:49:06 +08:00
thunderbolt	thunderbolt: Power cycle the router if NVM authentication fails	2019-11-19 17:35:57 +01:00
tty	TTY/Serial patches for 5.5-rc1	2019-12-03 14:09:14 -08:00
uio
usb	USB driver fixes for 5.5-rc2	2019-12-14 12:40:39 -08:00
vfio	VFIO updates for v5.5-rc1	2019-12-07 14:51:04 -08:00
vhost	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2019-12-08 13:28:11 -08:00
video	pci-v5.5-changes	2019-12-03 13:58:22 -08:00
virt	compat_ioctl: remove most of fs/compat_ioctl.c	2019-12-01 13:46:15 -08:00
virtio	virtio_balloon: divide/multiply instead of shifts	2019-12-11 08:14:07 -05:00
visorbus
vlynq
vme
w1
watchdog	linux-watchdog 5.5-rc1 tag	2019-12-01 18:01:03 -08:00
xen	xen: branch for v5.5-rc2	2019-12-15 12:24:44 -08:00
zorro
Kconfig
Makefile