linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-14 15:54:15 +08:00

Author	SHA1	Message	Date
Alexander Aring	56171e0db2	fs: dlm: const void resource name parameter The resource name parameter should never be changed by DLM so we declare it as const. At some point it is handled as a char pointer, a resource name can be a non printable ascii string as well. This patch change it to handle it as void pointer as it is offered by DLM API. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-23 15:02:47 -05:00
Alexander Aring	9cb16d4271	fs: dlm: LSFL_CB_DELAY only for kernel lockspaces This patch only set/clear the LSFL_CB_DELAY bit when it's actually a kernel lockspace signaled by if ls->ls_callback_wq is set or not set in this case. User lockspaces will never evaluate this flag. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-23 15:02:46 -05:00
Alexander Aring	12cda13cfd	fs: dlm: remove DLM_LSFL_FS from uapi The DLM_LSFL_FS flag is set in lockspaces created directly for a kernel user, as opposed to those lockspaces created for user space applications. The user space libdlm allowed this flag to be set for lockspaces created from user space, but then used by a kernel user. No kernel user has ever used this method, so remove the ability to do it. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-23 14:54:54 -05:00
Alexander Aring	7a3de7324c	fs: dlm: trace user space callbacks This patch adds trace callbacks for user locks. Unfortenately user locks are handled in a different way than kernel locks in some cases. User locks never call the dlm_lock()/dlm_unlock() kernel API and use the next step internal API of dlm. Adding those traces from user API callers should make it possible for dlm trace system to see lock handling for user locks as well. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-23 14:54:04 -05:00
Alexander Aring	296d9d1e98	fs: dlm: change ls_clear_proc_locks to spinlock This patch changes the ls_clear_proc_locks to a spinlock because there is no need to handle it as a mutex as there is no sleepable context when ls_clear_proc_locks is held. This allows us to call those functionality in non-sleepable contexts. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-23 14:54:02 -05:00
Alexander Aring	e152c38dc0	fs: dlm: remove dlm_del_ast prototype This patch removes dlm_del_ast() prototype which is not being used in the dlm subsystem because there is not implementation for it. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-23 14:54:01 -05:00
Alexander Aring	f45307d395	fs: dlm: handle rcom in else if branch Currently we handle in dlm_receive_buffer() everything else than a DLM_MSG type as DLM_RCOM message. Although a different message than DLM_MSG should be a DLM_RCOM we should explicit check on DLM_RCOM and drop a log_error() if we see something unexpected. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-23 14:53:59 -05:00
Alexander Aring	b5c9d37c7f	fs: dlm: allow lockspaces have zero lvblen A dlm user may not use the DLM_LKF_VALBLK flag in the DLM API, so a zero lvblen should be allowed as a lockspace parameter. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-23 14:53:16 -05:00
Alexander Aring	7175e131eb	fs: dlm: fix invalid derefence of sb_lvbptr I experience issues when putting a lkbsb on the stack and have sb_lvbptr field to a dangled pointer while not using DLM_LKF_VALBLK. It will crash with the following kernel message, the dangled pointer is here 0xdeadbeef as example: [ 102.749317] BUG: unable to handle page fault for address: 00000000deadbeef [ 102.749320] #PF: supervisor read access in kernel mode [ 102.749323] #PF: error_code(0x0000) - not-present page [ 102.749325] PGD 0 P4D 0 [ 102.749332] Oops: 0000 [#1] PREEMPT SMP PTI [ 102.749336] CPU: 0 PID: 1567 Comm: lock_torture_wr Tainted: G W 5.19.0-rc3+ #1565 [ 102.749343] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-2.module+el8.7.0+15506+033991b0 04/01/2014 [ 102.749344] RIP: 0010:memcpy_erms+0x6/0x10 [ 102.749353] Code: cc cc cc cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe [ 102.749355] RSP: 0018:ffff97a58145fd08 EFLAGS: 00010202 [ 102.749358] RAX: ffff901778b77070 RBX: 0000000000000000 RCX: 0000000000000040 [ 102.749360] RDX: 0000000000000040 RSI: 00000000deadbeef RDI: ffff901778b77070 [ 102.749362] RBP: ffff97a58145fd10 R08: ffff901760b67a70 R09: 0000000000000001 [ 102.749364] R10: ffff9017008e2cb8 R11: 0000000000000001 R12: ffff901760b67a70 [ 102.749366] R13: ffff901760b78f00 R14: 0000000000000003 R15: 0000000000000001 [ 102.749368] FS: 0000000000000000(0000) GS:ffff901876e00000(0000) knlGS:0000000000000000 [ 102.749372] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 102.749374] CR2: 00000000deadbeef CR3: 000000017c49a004 CR4: 0000000000770ef0 [ 102.749376] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 102.749378] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 102.749379] PKRU: 55555554 [ 102.749381] Call Trace: [ 102.749382] <TASK> [ 102.749383] ? send_args+0xb2/0xd0 [ 102.749389] send_common+0xb7/0xd0 [ 102.749395] _unlock_lock+0x2c/0x90 [ 102.749400] unlock_lock.isra.56+0x62/0xa0 [ 102.749405] dlm_unlock+0x21e/0x330 [ 102.749411] ? lock_torture_stats+0x80/0x80 [dlm_locktorture] [ 102.749416] torture_unlock+0x5a/0x90 [dlm_locktorture] [ 102.749419] ? preempt_count_sub+0xba/0x100 [ 102.749427] lock_torture_writer+0xbd/0x150 [dlm_locktorture] [ 102.786186] kthread+0x10a/0x130 [ 102.786581] ? kthread_complete_and_exit+0x20/0x20 [ 102.787156] ret_from_fork+0x22/0x30 [ 102.787588] </TASK> [ 102.787855] Modules linked in: dlm_locktorture torture rpcsec_gss_krb5 intel_rapl_msr intel_rapl_common kvm_intel iTCO_wdt iTCO_vendor_support kvm vmw_vsock_virtio_transport qxl irqbypass vmw_vsock_virtio_transport_common drm_ttm_helper crc32_pclmul joydev crc32c_intel ttm vsock virtio_scsi virtio_balloon snd_pcm drm_kms_helper virtio_console snd_timer snd drm soundcore syscopyarea i2c_i801 sysfillrect sysimgblt i2c_smbus pcspkr fb_sys_fops lpc_ich serio_raw [ 102.792536] CR2: 00000000deadbeef [ 102.792930] ---[ end trace 0000000000000000 ]--- This patch fixes the issue by checking also on DLM_LKF_VALBLK on exflags is set when copying the lvbptr array instead of if it's just null which fixes for me the issue. I think this patch can fix other dlm users as well, depending how they handle the init, freeing memory handling of sb_lvbptr and don't set DLM_LKF_VALBLK for some dlm_lock() calls. It might a there could be a hidden issue all the time. However with checking on DLM_LKF_VALBLK the user always need to provide a sb_lvbptr non-null value. There might be more intelligent handling between per ls lvblen, DLM_LKF_VALBLK and non-null to report the user the way how DLM API is used is wrong but can be added for later, this will only fix the current behaviour. Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-23 14:52:26 -05:00
Alexander Aring	9ac8ba46a7	fs: dlm: handle -EINVAL as log_error() If the user generates -EINVAL it's probably because they are using DLM incorrectly. Change the log level to make these errors more visible. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-23 14:49:09 -05:00
Alexander Aring	c2d76a62d8	fs: dlm: use __func__ for function name Avoid hard-coded function names inside message format strings. (Prevents checkpatch warnings.) Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-23 14:46:49 -05:00
Alexander Aring	420ba3cd03	fs: dlm: handle -EBUSY first in unlock validation This patch checks for -EBUSY conditions in dlm_unlock() before checking for -EINVAL conditions (except for CANCEL and FORCEUNLOCK calls where a busy condition is expected.) There are no problems with the current ordering of checks, but this makes dlm_unlock() consistent with dlm_lock(), and may avoid future problems if other checks are added. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-23 14:42:32 -05:00
Alexander Aring	44637ca41d	fs: dlm: handle -EBUSY first in lock arg validation During lock arg validation, first check for -EBUSY cases, then for -EINVAL cases. The -EINVAL checks look at lkb state variables which are not stable when an lkb is busy and would cause an -EBUSY result, e.g. lkb->lkb_grmode. Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-23 14:39:51 -05:00
Alexander Aring	eef6ec9bf3	fs: dlm: fix race between test_bit() and queue_work() This patch fixes a race by using ls_cb_mutex around the bit operations and conditional code blocks for LSFL_CB_DELAY. The function dlm_callback_stop() expects to stop all callbacks and flush all currently queued onces. The set_bit() is not enough because there can still be queue_work() after the workqueue was flushed. To avoid queue_work() after set_bit(), surround both by ls_cb_mutex. Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-23 14:37:14 -05:00
Alexander Aring	30ea3257e8	fs: dlm: fix race in lowcomms This patch fixes a race between queue_work() in _dlm_lowcomms_commit_msg() and srcu_read_unlock(). The queue_work() can take the final reference of a dlm_msg and so msg->idx can contain garbage which is signaled by the following warning: [ 676.237050] ------------[ cut here ]------------ [ 676.237052] WARNING: CPU: 0 PID: 1060 at include/linux/srcu.h:189 dlm_lowcomms_commit_msg+0x41/0x50 [ 676.238945] Modules linked in: dlm_locktorture torture rpcsec_gss_krb5 intel_rapl_msr intel_rapl_common iTCO_wdt iTCO_vendor_support qxl kvm_intel drm_ttm_helper vmw_vsock_virtio_transport kvm vmw_vsock_virtio_transport_common ttm irqbypass crc32_pclmul joydev crc32c_intel serio_raw drm_kms_helper vsock virtio_scsi virtio_console virtio_balloon snd_pcm drm syscopyarea sysfillrect sysimgblt snd_timer fb_sys_fops i2c_i801 lpc_ich snd i2c_smbus soundcore pcspkr [ 676.244227] CPU: 0 PID: 1060 Comm: lock_torture_wr Not tainted 5.19.0-rc3+ #1546 [ 676.245216] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-2.module+el8.7.0+15506+033991b0 04/01/2014 [ 676.246460] RIP: 0010:dlm_lowcomms_commit_msg+0x41/0x50 [ 676.247132] Code: fe ff ff ff 75 24 48 c7 c6 bd 0f 49 bb 48 c7 c7 38 7c 01 bd e8 00 e7 ca ff 89 de 48 c7 c7 60 78 01 bd e8 42 3d cd ff 5b 5d c3 <0f> 0b eb d8 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 [ 676.249253] RSP: 0018:ffffa401c18ffc68 EFLAGS: 00010282 [ 676.249855] RAX: 0000000000000001 RBX: 00000000ffff8b76 RCX: 0000000000000006 [ 676.250713] RDX: 0000000000000000 RSI: ffffffffbccf3a10 RDI: ffffffffbcc7b62e [ 676.251610] RBP: ffffa401c18ffc70 R08: 0000000000000001 R09: 0000000000000001 [ 676.252481] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000005 [ 676.253421] R13: ffff8b76786ec370 R14: ffff8b76786ec370 R15: ffff8b76786ec480 [ 676.254257] FS: 0000000000000000(0000) GS:ffff8b7777800000(0000) knlGS:0000000000000000 [ 676.255239] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 676.255897] CR2: 00005590205d88b8 CR3: 000000017656c003 CR4: 0000000000770ee0 [ 676.256734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 676.257567] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 676.258397] PKRU: 55555554 [ 676.258729] Call Trace: [ 676.259063] <TASK> [ 676.259354] dlm_midcomms_commit_mhandle+0xcc/0x110 [ 676.259964] queue_bast+0x8b/0xb0 [ 676.260423] grant_pending_locks+0x166/0x1b0 [ 676.261007] _unlock_lock+0x75/0x90 [ 676.261469] unlock_lock.isra.57+0x62/0xa0 [ 676.262009] dlm_unlock+0x21e/0x330 [ 676.262457] ? lock_torture_stats+0x80/0x80 [dlm_locktorture] [ 676.263183] torture_unlock+0x5a/0x90 [dlm_locktorture] [ 676.263815] ? preempt_count_sub+0xba/0x100 [ 676.264361] ? complete+0x1d/0x60 [ 676.264777] lock_torture_writer+0xb8/0x150 [dlm_locktorture] [ 676.265555] kthread+0x10a/0x130 [ 676.266007] ? kthread_complete_and_exit+0x20/0x20 [ 676.266616] ret_from_fork+0x22/0x30 [ 676.267097] </TASK> [ 676.267381] irq event stamp: 9579855 [ 676.267824] hardirqs last enabled at (9579863): [<ffffffffbb14e6f8>] __up_console_sem+0x58/0x60 [ 676.268896] hardirqs last disabled at (9579872): [<ffffffffbb14e6dd>] __up_console_sem+0x3d/0x60 [ 676.270008] softirqs last enabled at (9579798): [<ffffffffbc200349>] __do_softirq+0x349/0x4c7 [ 676.271438] softirqs last disabled at (9579897): [<ffffffffbb0d54c0>] irq_exit_rcu+0xb0/0xf0 [ 676.272796] ---[ end trace 0000000000000000 ]--- I reproduced this warning with dlm_locktorture test which is currently not upstream. However this patch fix the issue by make a additional refcount between dlm_lowcomms_new_msg() and dlm_lowcomms_commit_msg(). In case of the race the kref_put() in dlm_lowcomms_commit_msg() will be the final put. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-23 14:26:37 -05:00
Alexander Aring	9585898922	fs: dlm: move kref_put assert for lkb structs The unhold_lkb() function decrements the lock's kref, and asserts that the ref count was not the final one. Use the kref_put release function (which should not be called) to call the assert, rather than doing the assert based on the kref_put return value. Using kill_lkb() as the release function doesn't make sense if we only want to assert. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-01 09:31:46 -05:00
Alexander Aring	6b0afc0cc3	fs: dlm: don't use deprecated timeout features by default This patch will disable use of deprecated timeout features if CONFIG_DLM_DEPRECATED_API is not set. The deprecated features will be removed in upcoming kernel release v6.2. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-01 09:31:38 -05:00
Alexander Aring	81eeb82fc2	fs: dlm: add deprecation Kconfig and warnings for timeouts This patch adds a CONFIG_DLM_DEPRECATED_API Kconfig option that must be enabled to use two timeout-related features that we intend to remove in kernel v6.2. Warnings are printed if either is enabled and used. Neither has ever been used as far as we know. . The DLM_LSFL_TIMEWARN lockspace creation flag will be removed, along with the associated configfs entry for setting the timeout. Setting the flag and configfs file would cause dlm to track how long locks were waiting for reply messages. After a timeout, a kernel message would be logged, and a netlink message would be sent to userspace. Recently, midcomms messages have been added that produce much better logging about actual problems with messages. No use has ever been found for the netlink messages. . The userspace libdlm API has allowed the DLM_LKF_TIMEOUT flag with a timeout value to be set in lock requests. The lock request would be cancelled after the timeout. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-08-01 09:31:32 -05:00
Alexander Aring	8d614a4457	fs: dlm: remove timeout from dlm_user_adopt_orphan Remove the unused timeout parameter from dlm_user_adopt_orphan(). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:53 -05:00
Alexander Aring	2bb2a3d66c	fs: dlm: remove waiter warnings This patch removes warning messages that could be logged when remote requests had been waiting on a reply message for some timeout period (which could be set through configfs, but was rarely enabled.) The improved midcomms layer now carefully tracks all messages and replies, and logs much more useful messages if there is an actual problem. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:52 -05:00
Alexander Aring	dfc020f334	fs: dlm: fix grammar in lowcomms output This patch fixes some grammar output in lowcomms implementation by removing the "successful" word which should be "successfully" but it can never be unsuccessfully so we remove it. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:50 -05:00
Alexander Aring	f10da927a5	fs: dlm: add comment about lkb IFL flags This patch adds comments about the difference between the lower 2 bytes of lkb flags and the 2 upper bytes of the lkb IFL flags. In short the upper 2 bytes will be handled as internal flags whereas the lower 2 bytes are part of the DLM protocol and are used to exchange messages. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:49 -05:00
Alexander Aring	3182599f5f	fs: dlm: handle recovery result outside of ls_recover This patch cleans up the handling of recovery results by moving it from ls_recover() to the caller do_ls_recovery(). This makes the error handling clearer. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:48 -05:00
Alexander Aring	682bb91b6b	fs: dlm: make new_lockspace() wait until recovery completes Make dlm_new_lockspace() wait until a full recovery completes sucessfully or fails. Previously, dlm_new_lockspace() returned to the caller after dlm_recover_members() finished, which is only partially through recovery. The result of the previous behavior is that the new lockspace would not be usable for some time (especially with overlapping recoveries), and some errors in the later part of recovery could not be returned to the caller. Kernel callers gfs2 and cluster-md have their own wait handling to wait for recovery to complete after calling dlm_new_lockspace(). This continues to work, but will be unnecessary. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:47 -05:00
Alexander Aring	7e09b15cfe	fs: dlm: call dlm_lsop_recover_prep once A lockspace can be "stopped" multiple times consecutively before being "started" (when recoveries overlap.) In this case, the lsop_recover_prep callback only needs to be called once when the lockspace is first stopped, and not repeatedly for each stop. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:46 -05:00
Alexander Aring	ca8031d917	fs: dlm: update comments about recovery and membership handling Make clear that a particular recovery iteration must not be aborted before membership changes are applied to the members list (ls_nodes) and midcomms layer. Interrupting recovery before this can result in missing node-specific changes in midcomms or through lsops. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:57:40 -05:00
Alexander Aring	5d92a30e90	fs: dlm: add resource name to tracepoints This patch adds the resource name to dlm tracepoints. The name usually comes through the lkb_resource, but in some cases a resource may not yet be associated with an lkb, in which case the name and namelen parameters are used. It should be okay to access the lkb_resource and the res_name field at the time when the tracepoint is invoked. The resource is assigned to a lkb and it's reference is being held during the tracepoint call. During this time the resource cannot be freed. Also a lkb will never switch its assigned resource. The name of a dlm_rsb is assigned at creation time and should never be changed during runtime as well. The TP_printk() call uses always a hexadecimal string array representation for the resource name (which is not necessarily ascii.) Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:53:09 -05:00
Alexander Aring	0c4c516fa2	fs: dlm: remove additional dereference of lksb This patch removes a dereference of lksb of lkb when calling ast tracepoint. First it reduces additional overhead, even if traces are not active. Second we can deference it in TP_fast_assign from the existing lkb parameter. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:53:08 -05:00
Alexander Aring	cd1e8ca9f3	fs: dlm: change ast and bast trace order This patch moves the trace calls for ast and bast to before the ast and bast callback functions are called rather than after. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:53:06 -05:00
Alexander Aring	b92a4e3f86	fs: dlm: change posix lock sigint handling This patch changes the handling of a plock operation that was interrupted while waiting for a user space reply from dlm_controld. (This is not the lock blocking state, i.e. locks_lock_file_wait().) Currently, when an op is interrupted while waiting on user space, the op is removed. When the user space result later arrives, a kernel message is loggged: "dev_write no op...". This can be seen from a test such as "stress-ng --fcntl 100" and interrupting it with ctrl-c. Now, leave the op in place when interrupted and remove it when the result arrives (the result will be ignored.) With this change, the logged message is not expected to appear, and would indicate a bug. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:53:05 -05:00
Alexander Aring	4d413ae9ce	fs: dlm: use dlm_plock_info for do_unlock_close This patch refactors do_unlock_close() by using only struct dlm_plock_info as a parameter. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:53:04 -05:00
Alexander Aring	ea06d4cabf	fs: dlm: change plock interrupted message to debug again This patch reverses the commit `bcfad4265c` ("dlm: improve plock logging if interrupted") by moving it to debug level and notifying the user an op was removed. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-24 11:52:36 -05:00
Alexander Aring	19d7ca051d	fs: dlm: add pid to debug log This patch adds the pid information which requested the lock operation to the debug log output. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-23 14:41:39 -05:00
Alexander Aring	976a062434	fs: dlm: plock use list_first_entry This patch will use the list helper list_first_entry() instead of using list_entry() to get the first element of a list. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-06-23 14:22:10 -05:00
Alexander Aring	8e51ec6146	dlm: use kref_put_lock in __put_lkb This patch will optimize __put_lkb() by using kref_put_lock(). The function kref_put_lock() will only take the lock if the reference is going to be zero, if not the lock will never be held. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-05-02 11:23:49 -05:00
Alexander Aring	9502a7f688	dlm: use kref_put_lock in put_rsb This patch will optimize put_rsb() by using kref_put_lock(). The function kref_put_lock() will only take the lock if the reference is going to be zero, if not the lock will never be held. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-05-02 11:22:56 -05:00
Alexander Aring	0ccc106052	dlm: remove unnecessary error assign This patch removes unnecessary error assigns to 0 at places we know that error is zero because it was checked on non-zero before. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-05-02 11:22:15 -05:00
Alexander Aring	1689c16913	dlm: fix missing lkb refcount handling We always call hold_lkb(lkb) if we increment lkb->lkb_wait_count. So, we always need to call unhold_lkb(lkb) if we decrement lkb->lkb_wait_count. This patch will add missing unhold_lkb(lkb) if we decrement lkb->lkb_wait_count. In case of setting lkb->lkb_wait_count to zero we need to countdown until reaching zero and call unhold_lkb(lkb). The waiters list unhold_lkb(lkb) can be removed because it's done for the last lkb_wait_count decrement iteration as it's done in _remove_from_waiters(). This issue was discovered by a dlm gfs2 test case which use excessively dlm_unlock(LKF_CANCEL) feature. Probably the lkb->lkb_wait_count value never reached above 1 if this feature isn't used and so it was not discovered before. The testcase ended in a rsb on the rsb keep data structure with a refcount of 1 but no lkb was associated with it, which is itself an invalid behaviour. A side effect of that was a condition in which the dlm was sending remove messages in a looping behaviour. With this patch that has not been reproduced. Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-05-02 11:15:59 -05:00
Alexander Aring	e425ac99b1	fs: dlm: cast resource pointer to uintptr_t This patch fixes the following warning when doing a 32 bit kernel build when pointers are 4 byte long: In file included from ./include/linux/byteorder/little_endian.h:5, from ./arch/x86/include/uapi/asm/byteorder.h:5, from ./include/asm-generic/qrwlock_types.h:6, from ./arch/x86/include/asm/spinlock_types.h:7, from ./include/linux/spinlock_types_raw.h:7, from ./include/linux/ratelimit_types.h:7, from ./include/linux/printk.h:10, from ./include/asm-generic/bug.h:22, from ./arch/x86/include/asm/bug.h:87, from ./include/linux/bug.h:5, from ./include/linux/mmdebug.h:5, from ./include/linux/gfp.h:5, from ./include/linux/slab.h:15, from fs/dlm/dlm_internal.h:19, from fs/dlm/rcom.c:12: fs/dlm/rcom.c: In function ‘dlm_send_rcom_lock’: ./include/uapi/linux/byteorder/little_endian.h:32:43: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] #define __cpu_to_le64(x) ((__force __le64)(__u64)(x)) ^ ./include/linux/byteorder/generic.h:86:21: note: in expansion of macro ‘__cpu_to_le64’ #define cpu_to_le64 __cpu_to_le64 ^~~~~~~~~~~~~ fs/dlm/rcom.c:457:14: note: in expansion of macro ‘cpu_to_le64’ rc->rc_id = cpu_to_le64(r); The rc_id value in dlm rcom is handled as u64. The rcom implementation uses for an unique number generation the pointer value of the used dlm_rsb instance. However if the pointer value is 4 bytes long -Wpointer-to-int-cast will print a warning. We get rid of that warning to cast the pointer to uintptr_t which is either 4 or 8 bytes. There might be a very unlikely case where this number isn't unique anymore if using dlm in a mixed cluster of nodes and sizeof(uintptr_t) returns 4 and 8. However this problem was already been there and this patch should get rid of the warning. Fixes: `2f9dbeda8d` ("dlm: use __le types for rcom messages") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-07 09:54:45 -05:00
Jakob Koschel	dc1acd5c94	dlm: replace usage of found with dedicated list iterator variable To move the list iterator variable into the list_for_each_entry_() macro in the future it should be avoided to use the list iterator variable after the loop body. To never* use the list iterator variable after the loop it was concluded to use a separate iterator variable instead of a found boolean [1]. This removes the need to use a found variable and simply checking if the variable was set, can determine if the break/goto was hit. Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:03:14 -05:00
Jakob Koschel	c490b3afaa	dlm: remove usage of list iterator for list_add() after the loop body In preparation to limit the scope of a list iterator to the list traversal loop, use a dedicated pointer to point to the found element [1]. Before, the code implicitly used the head when no element was found when using &pos->list. Since the new variable is only set if an element was found, the list_add() is performed within the loop and only done after the loop if it is done on the list head directly. Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:03:13 -05:00
Alexander Aring	ba58995909	dlm: fix pending remove if msg allocation fails This patch unsets ls_remove_len and ls_remove_name if a message allocation of a remove messages fails. In this case we never send a remove message out but set the per ls ls_remove_len ls_remove_name variable for a pending remove. Unset those variable should indicate possible waiters in wait_pending_remove() that no pending remove is going on at this moment. Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:03:09 -05:00
Alexander Aring	f6f7418357	dlm: fix wake_up() calls for pending remove This patch move the wake_up() call at the point when a remove message completed. Before it was only when a remove message was going to be sent. The possible waiter in wait_pending_remove() waits until a remove is done if the resource name matches with the per ls variable ls->ls_remove_name. If this is the case we must wait until a pending remove is done which is indicated if DLM_WAIT_PENDING_COND() returns false which will always be the case when ls_remove_len and ls_remove_name are unset to indicate that a remove is not going on anymore. Fixes: `21d9ac1a53` ("fs: dlm: use event based wait for pending remove") Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:03:05 -05:00
Alexander Aring	2c3fa6ae4d	dlm: check required context while close This patch adds a WARN_ON() check to validate the right context while dlm_midcomms_close() is called. Even before commit `489d8e559c` ("fs: dlm: add reliable connection if reconnect") in this context dlm_lowcomms_close() flushes all ongoing transmission triggered by dlm application stack. If we do that, it's required that no new message will be triggered by the dlm application stack. The function dlm_midcomms_close() is not called often so we can check if all lockspaces are in such context. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:03:01 -05:00
Alexander Aring	401597485c	dlm: cleanup lock handling in dlm_master_lookup This patch will remove the following warning by sparse: fs/dlm/lock.c:1049:9: warning: context imbalance in 'dlm_master_lookup' - different lock contexts for basic block I tried to find any issues with the current handling and I did not find any. However it is hard to follow the lock handling in this area of dlm_master_lookup() and I suppose that sparse cannot realize that there are no issues. The variable "toss_list" makes it really hard to follow the lock handling because if it's set the rsb lock/refcount isn't held but the ls->ls_rsbtbl[b].lock is held and this is one reason why the rsb lock/refcount does not need to be held. If it's not set the ls->ls_rsbtbl[b].lock is not held but the rsb lock/refcount is held. The indicator of toss_list will be used to store the actual lock state. Another possibility is that a retry can happen and then it's hard to follow the specific code part. I did not find any issues but sparse cannot realize that there are no issues. To make it more easier to understand for developers and sparse as well, we remove the toss_list variable which indicates a specific lock state and move handling in between of this lock state in a separate function. This function can be called now in case when the initial lock states are taken which was previously signalled if toss_list was set or not. The advantage here is that we can release all locks/refcounts in mostly the same code block as it was taken. Afterwards sparse had no issues to figure out that there are no problems with the current lock behaviour. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:58 -05:00
Alexander Aring	e91ce03b27	dlm: remove found label in dlm_master_lookup This patch cleanups a not necessary label found which can be replaced by a proper else handling to jump over a specific code block. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:54 -05:00
Alexander Aring	c087eabde1	dlm: remove __user conversion warnings This patch avoids the following sparse warning: fs/dlm/user.c:111:38: warning: incorrect type in assignment (different address spaces) fs/dlm/user.c:111:38: expected void [noderef] __user castparam fs/dlm/user.c:111:38: got void fs/dlm/user.c:112:37: warning: incorrect type in assignment (different address spaces) fs/dlm/user.c:112:37: expected void [noderef] __user castaddr fs/dlm/user.c:112:37: got void fs/dlm/user.c:113:38: warning: incorrect type in assignment (different address spaces) fs/dlm/user.c:113:38: expected void [noderef] __user bastparam fs/dlm/user.c:113:38: got void fs/dlm/user.c:114:37: warning: incorrect type in assignment (different address spaces) fs/dlm/user.c:114:37: expected void [noderef] __user bastaddr fs/dlm/user.c:114:37: got void fs/dlm/user.c:115:33: warning: incorrect type in assignment (different address spaces) fs/dlm/user.c:115:33: expected struct dlm_lksb [noderef] __user lksb fs/dlm/user.c:115:33: got void fs/dlm/user.c:130:39: warning: cast removes address space '__user' of expression fs/dlm/user.c:131:40: warning: cast removes address space '__user' of expression fs/dlm/user.c:132:36: warning: cast removes address space '__user' of expression So far I see there is no direct handling of copying a pointer value to another pointer value. The handling only copies the actual pointer address to a scalar type or vice versa. This should be okay because it never handles dereferencing anything of those addresses in the kernel space. To get rid of those warnings we doing some different casting which results in no warnings in sparse or compiler. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:49 -05:00
Alexander Aring	14a92fd703	dlm: move conversion to compile time This patch is a cleanup to move the byte order conversion to compile time. In a simple comparison like this it's possible to move it to static values so the compiler will always convert those values at compile time. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:40 -05:00
Alexander Aring	00e99ccde7	dlm: use __le types for dlm messages This patch changes to use __le types directly in the dlm message structure which is casted at the right dlm message buffer positions. The main goal what is reached here is to remove sparse warnings regarding to host to little byte order conversion or vice versa. Leaving those sparse issues ignored and always do it in out/in functionality tends to leave it unknown in which byte order the variable is being handled. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:37 -05:00
Alexander Aring	2f9dbeda8d	dlm: use __le types for rcom messages This patch changes to use __le types directly in the dlm rcom structure which is casted at the right dlm message buffer positions. The main goal what is reached here is to remove sparse warnings regarding to host to little byte order conversion or vice versa. Leaving those sparse issues ignored and always do it in out/in functionality tends to leave it unknown in which byte order the variable is being handled. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:32 -05:00
Alexander Aring	3428785a65	dlm: use __le types for dlm header This patch changes to use __le types directly in the dlm header structure which is casted at the right dlm message buffer positions. The main goal what is reached here is to remove sparse warnings regarding to host to little byte order conversion or vice versa. Leaving those sparse issues ignored and always do it in out/in functionality tends to leave it unknown in which byte order the variable is being handled. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:28 -05:00
Alexander Aring	d9efd005fd	dlm: use __le types for options header This patch changes to use __le types directly in the dlm option headers structures which are casted at the right dlm message buffer positions. Currently only midcomms.c using those headers which already was calling endian conversions on-the-fly without using in/out functionality like other endianness handling in dlm. Using __le types now will hopefully get useful warnings in future if we do comparison against host byte order values. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:24 -05:00
Alexander Aring	a8449f232e	dlm: add __CHECKER__ for false positives This patch will adds #ifndef __CHECKER__ for false positives warnings about an imbalance lock/unlock srcu handling. Which are shown by running sparse checks: fs/dlm/midcomms.c:1065:20: warning: context imbalance in 'dlm_midcomms_get_mhandle' - wrong count at exit Using __CHECKER__ will tell sparse to ignore these sections. Those imbalances are false positive because from upper layer it is always required to call a function in sequence, e.g. if dlm_midcomms_get_mhandle() is successful there must be a dlm_midcomms_commit_mhandle() call afterwards. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:20 -05:00
Alexander Aring	314a5540ff	dlm: move global to static inits Instead of init global module at module loading time we can move the initialization of those global variables at memory initialization of the module loader. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:16 -05:00
Alexander Aring	16d58904df	dlm: remove unnecessary INIT_LIST_HEAD() There is no need to call INIT_LIST_HEAD() when it's set directly afterwards by list_add_tail(). Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:13 -05:00
Alexander Aring	bcfad4265c	dlm: improve plock logging if interrupted This patch changes the log level if a plock is removed when interrupted from debug to info. Additional it signals now that the plock entity was removed to let the user know what's happening. If on a dev_write() a pending plock cannot be find it will signal that it might have been removed because wait interruption. Before this patch there might be a "dev_write no op ..." info message and the users can only guess that the plock was removed before because the wait interruption. To be sure that is the case we log both messages on the same log level. Let both message be logged on info layer because it should not happened a lot and if it happens it should be clear why the op was not found. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:09 -05:00
Alexander Aring	a800ba77fd	dlm: rearrange async condition return This patch moves the return of FILE_LOCK_DEFERRED a little bit earlier than checking afterwards again if the request was an asynchronous request. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:05 -05:00
Alexander Aring	bcbb4ba6c9	dlm: cleanup plock_op vs plock_xop Lately the different casting between plock_op and plock_xop and list holders which was involved showed some issues which were hard to see. This patch removes the "plock_xop" structure and introduces a "struct plock_async_data". This structure will be set in "struct plock_op" in case of asynchronous lock handling as the original "plock_xop" was made for. There is no need anymore to cast pointers around for additional fields in case of asynchronous lock handling. As disadvantage another allocation was introduces but only needed in the asynchronous case which is currently only used in combination with nfs lockd. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:02:02 -05:00
Alexander Aring	a559790caa	dlm: replace sanity checks with WARN_ON There are several sanity checks and recover handling if they occur in the dlm plock handling. From my understanding those operation can't run in parallel with any list manipulation which involved setting the list holder of plock_op, if so we have a bug which this sanity check will warn about. Previously if such sanity check occurred the dlm plock handling was trying to recover from it by deleting the plock_op from a list which the holder was set to. However there is a bug in the dlm plock handling if this case ever happens. To make such bugs are more visible for further investigations we add a WARN_ON() on those sanity checks and remove the recovering handling because other possible side effects. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:01:58 -05:00
Alexander Aring	42252d0d2a	dlm: fix plock invalid read This patch fixes an invalid read showed by KASAN. A unlock will allocate a "struct plock_op" and a followed send_op() will append it to a global send_list data structure. In some cases a followed dev_read() moves it to recv_list and dev_write() will cast it to "struct plock_xop" and access fields which are only available in those structures. At this point an invalid read happens by accessing those fields. To fix this issue the "callback" field is moved to "struct plock_op" to indicate that a cast to "plock_xop" is allowed and does the additional "plock_xop" handling if set. Example of the KASAN output which showed the invalid read: [ 2064.296453] ================================================================== [ 2064.304852] BUG: KASAN: slab-out-of-bounds in dev_write+0x52b/0x5a0 [dlm] [ 2064.306491] Read of size 8 at addr ffff88800ef227d8 by task dlm_controld/7484 [ 2064.308168] [ 2064.308575] CPU: 0 PID: 7484 Comm: dlm_controld Kdump: loaded Not tainted 5.14.0+ #9 [ 2064.310292] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 2064.311618] Call Trace: [ 2064.312218] dump_stack_lvl+0x56/0x7b [ 2064.313150] print_address_description.constprop.8+0x21/0x150 [ 2064.314578] ? dev_write+0x52b/0x5a0 [dlm] [ 2064.315610] ? dev_write+0x52b/0x5a0 [dlm] [ 2064.316595] kasan_report.cold.14+0x7f/0x11b [ 2064.317674] ? dev_write+0x52b/0x5a0 [dlm] [ 2064.318687] dev_write+0x52b/0x5a0 [dlm] [ 2064.319629] ? dev_read+0x4a0/0x4a0 [dlm] [ 2064.320713] ? bpf_lsm_kernfs_init_security+0x10/0x10 [ 2064.321926] vfs_write+0x17e/0x930 [ 2064.322769] ? __fget_light+0x1aa/0x220 [ 2064.323753] ksys_write+0xf1/0x1c0 [ 2064.324548] ? __ia32_sys_read+0xb0/0xb0 [ 2064.325464] do_syscall_64+0x3a/0x80 [ 2064.326387] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 2064.327606] RIP: 0033:0x7f807e4ba96f [ 2064.328470] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 39 87 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 7c 87 f8 ff 48 [ 2064.332902] RSP: 002b:00007ffd50cfe6e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 [ 2064.334658] RAX: ffffffffffffffda RBX: 000055cc3886eb30 RCX: 00007f807e4ba96f [ 2064.336275] RDX: 0000000000000040 RSI: 00007ffd50cfe7e0 RDI: 0000000000000010 [ 2064.337980] RBP: 00007ffd50cfe7e0 R08: 0000000000000000 R09: 0000000000000001 [ 2064.339560] R10: 000055cc3886eb30 R11: 0000000000000293 R12: 000055cc3886eb80 [ 2064.341237] R13: 000055cc3886eb00 R14: 000055cc3886f590 R15: 0000000000000001 [ 2064.342857] [ 2064.343226] Allocated by task 12438: [ 2064.344057] kasan_save_stack+0x1c/0x40 [ 2064.345079] __kasan_kmalloc+0x84/0xa0 [ 2064.345933] kmem_cache_alloc_trace+0x13b/0x220 [ 2064.346953] dlm_posix_unlock+0xec/0x720 [dlm] [ 2064.348811] do_lock_file_wait.part.32+0xca/0x1d0 [ 2064.351070] fcntl_setlk+0x281/0xbc0 [ 2064.352879] do_fcntl+0x5e4/0xfe0 [ 2064.354657] __x64_sys_fcntl+0x11f/0x170 [ 2064.356550] do_syscall_64+0x3a/0x80 [ 2064.358259] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 2064.360745] [ 2064.361511] Last potentially related work creation: [ 2064.363957] kasan_save_stack+0x1c/0x40 [ 2064.365811] __kasan_record_aux_stack+0xaf/0xc0 [ 2064.368100] call_rcu+0x11b/0xf70 [ 2064.369785] dlm_process_incoming_buffer+0x47d/0xfd0 [dlm] [ 2064.372404] receive_from_sock+0x290/0x770 [dlm] [ 2064.374607] process_recv_sockets+0x32/0x40 [dlm] [ 2064.377290] process_one_work+0x9a8/0x16e0 [ 2064.379357] worker_thread+0x87/0xbf0 [ 2064.381188] kthread+0x3ac/0x490 [ 2064.383460] ret_from_fork+0x22/0x30 [ 2064.385588] [ 2064.386518] Second to last potentially related work creation: [ 2064.389219] kasan_save_stack+0x1c/0x40 [ 2064.391043] __kasan_record_aux_stack+0xaf/0xc0 [ 2064.393303] call_rcu+0x11b/0xf70 [ 2064.394885] dlm_process_incoming_buffer+0x47d/0xfd0 [dlm] [ 2064.397694] receive_from_sock+0x290/0x770 [dlm] [ 2064.399932] process_recv_sockets+0x32/0x40 [dlm] [ 2064.402180] process_one_work+0x9a8/0x16e0 [ 2064.404388] worker_thread+0x87/0xbf0 [ 2064.406124] kthread+0x3ac/0x490 [ 2064.408021] ret_from_fork+0x22/0x30 [ 2064.409834] [ 2064.410599] The buggy address belongs to the object at ffff88800ef22780 [ 2064.410599] which belongs to the cache kmalloc-96 of size 96 [ 2064.416495] The buggy address is located 88 bytes inside of [ 2064.416495] 96-byte region [ffff88800ef22780, ffff88800ef227e0) [ 2064.422045] The buggy address belongs to the page: [ 2064.424635] page:00000000b6bef8bc refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xef22 [ 2064.428970] flags: 0xfffffc0000200(slab\|node=0\|zone=1\|lastcpupid=0x1fffff) [ 2064.432515] raw: 000fffffc0000200 ffffea0000d68b80 0000001400000014 ffff888001041780 [ 2064.436110] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 [ 2064.439813] page dumped because: kasan: bad access detected [ 2064.442548] [ 2064.443310] Memory state around the buggy address: [ 2064.445988] ffff88800ef22680: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc [ 2064.449444] ffff88800ef22700: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc [ 2064.452941] >ffff88800ef22780: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc [ 2064.456383] ^ [ 2064.459386] ffff88800ef22800: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc [ 2064.462788] ffff88800ef22880: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc [ 2064.466239] ================================================================== reproducer in python: import argparse import struct import fcntl import os parser = argparse.ArgumentParser() parser.add_argument('-f', '--file', help='file to use fcntl, must be on dlm lock filesystem e.g. gfs2') args = parser.parse_args() f = open(args.file, 'wb+') lockdata = struct.pack('hhllhh', fcntl.F_WRLCK,0,0,0,0,0) fcntl.fcntl(f, fcntl.F_SETLK, lockdata) lockdata = struct.pack('hhllhh', fcntl.F_UNLCK,0,0,0,0,0) fcntl.fcntl(f, fcntl.F_SETLK, lockdata) Fixes: `586759f03e` ("gfs2: nfs lock support for gfs2") Cc: stable@vger.kernel.org Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:01:53 -05:00
Alexander Aring	67e4d8c51d	dlm: fix missing check in validate_lock_args This patch adds a additional check if lkb->lkb_wait_count is non zero as it is done in validate_unlock_args() to check if any operation is in progress. While on it add a comment taken from validate_unlock_args() to signal what the check is doing. There might be no changes because if lkb->lkb_wait_type is non zero implies that lkb->lkb_wait_count is non zero. However we should add the check as it does validate_unlock_args(). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:01:49 -05:00
Dan Carpenter	1f4f10845e	dlm: uninitialized variable on error in dlm_listen_for_all() The "sock" variable is not initialized on this error path. Cc: stable@vger.kernel.org Fixes: `2dc6b1158c` ("fs: dlm: introduce generic listen") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-04-06 14:01:44 -05:00
Linus Torvalds	6dc69d3d0d	driver core changes for 5.17-rc1 Here is the set of changes for the driver core for 5.17-rc1. Lots of little things here, including: - kobj_type cleanups - auxiliary_bus documentation updates - auxiliary_device conversions for some drivers (relevant subsystems all have provided acks for these) - kernfs lock contention reduction for some workloads - other tiny cleanups and changes. All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYd7deA8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ym8ngCgw0ANwrRPE5b1dthEmfU2f8Knk5kAn0pHQv6R VRZJypgNfU/Pt0ykstZD =CO9J -----END PGP SIGNATURE----- Merge tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of changes for the driver core for 5.17-rc1. Lots of little things here, including: - kobj_type cleanups - auxiliary_bus documentation updates - auxiliary_device conversions for some drivers (relevant subsystems all have provided acks for these) - kernfs lock contention reduction for some workloads - other tiny cleanups and changes. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (43 commits) kobject documentation: remove default_attrs information drivers/firmware: Add missing platform_device_put() in sysfb_create_simplefb debugfs: lockdown: Allow reading debugfs files that are not world readable driver core: Make bus notifiers in right order in really_probe() driver core: Move driver_sysfs_remove() after driver_sysfs_add() firmware: edd: remove empty default_attrs array firmware: dmi-sysfs: use default_groups in kobj_type qemu_fw_cfg: use default_groups in kobj_type firmware: memmap: use default_groups in kobj_type sh: sq: use default_groups in kobj_type headers/uninline: Uninline single-use function: kobject_has_children() devtmpfs: mount with noexec and nosuid driver core: Simplify async probe test code by using ktime_ms_delta() nilfs2: use default_groups in kobj_type kobject: remove kset from struct kset_uevent_ops callbacks driver core: make kobj_type constant. driver core: platform: document registration-failure requirement vdpa/mlx5: Use auxiliary_device driver data helpers net/mlx5e: Use auxiliary_device driver data helpers soundwire: intel: Use auxiliary_device driver data helpers ...	2022-01-12 11:11:34 -08:00
Alexander Aring	feae43f8aa	fs: dlm: print cluster addr if non-cluster node connects This patch prints the cluster node address if a non-cluster node (according to the dlm config setting) tries to connect. The current hexdump call will print in a different loglevel and only available if dynamic debug is enabled. Additional we using the ip address format strings to print an IETF ip4/6 string represenation. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2022-01-04 09:40:24 -06:00
Greg Kroah-Hartman	cf6299b610	kobject: remove kset from struct kset_uevent_ops callbacks There is no need to pass the pointer to the kset in the struct kset_uevent_ops callbacks as no one uses it, so just remove that pointer entirely. Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Wedson Almeida Filho <wedsonaf@google.com> Link: https://lore.kernel.org/r/20211227163924.3970661-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-28 11:26:18 +01:00
Alexander Aring	e4dc81ed5a	fs: dlm: memory cache for lowcomms hotpath This patch introduces a kmem cache for dlm_msg handles which are used always if dlm sends a message out. Even if their are covered by midcomms layer or not. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-12-07 12:42:26 -06:00
Alexander Aring	3af2326ca0	fs: dlm: memory cache for writequeue_entry This patch introduces a kmem cache for writequeue entry. A writequeue entry get quite a lot allocated if dlm transmit messages. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-12-07 12:42:26 -06:00
Alexander Aring	6c547f2640	fs: dlm: memory cache for midcomms hotpath This patch will introduce a kmem cache for allocating message handles which are needed for midcomms layer to take track of lowcomms messages. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-12-07 12:42:26 -06:00
Alexander Aring	be3b0400ed	fs: dlm: remove wq_alloc mutex This patch cleanups the code for allocating a new buffer in the dlm writequeue mechanism. There was a possible tuneup to allow scheduling while a new writequeue entry needs to be allocated because either no sending page is available or are full. To avoid multiple concurrent users checking at the same time if an entry is available or full alloc_wq was introduce that those are waiting if there is currently a new writequeue entry in process to be queued so possible further users will check on the new allocated writequeue entry if it's full. To simplify the code we just remove this mutex and switch that the already introduced spin lock will be held during writequeue check, allocation and queueing. So other users can never check on available writequeues while there is a new one in process but not queued yet. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-12-07 12:42:26 -06:00
Alexander Aring	21d9ac1a53	fs: dlm: use event based wait for pending remove This patch will use an event based waitqueue to wait for a possible clash with the ls_remove_name field of dlm_ls instead of doing busy waiting. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-12-07 12:42:26 -06:00
Alexander Aring	bcbfea41e1	fs: dlm: check for pending users filling buffers Currently we don't care if the DLM application stack is filling buffers (not committed yet) while we transmit some already committed buffers. By checking on active writequeue users before dequeue a writequeue entry we know there is coming more data and do nothing. We wait until the send worker will be triggered again if the writequeue entry users hit zero. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-12-07 12:42:26 -06:00
Alexander Aring	f70813d6a5	fs: dlm: use list_empty() to check last iteration This patch will use list_empty(&ls->ls_cb_delay) to check for last list iteration. In case of a multiply count of MAX_CB_QUEUE and the list is empty we do a extra goto more which we can avoid by checking on list_empty(). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-12-07 12:42:26 -06:00
Alexander Aring	1b9beda83e	fs: dlm: fix build with CONFIG_IPV6 disabled This patch will surround the AF_INET6 case in sk_error_report() of dlm with a #if IS_ENABLED(CONFIG_IPV6). The field sk->sk_v6_daddr is not defined when CONFIG_IPV6 is disabled. If CONFIG_IPV6 is disabled, the socket creation with AF_INET6 should already fail because a runtime check if AF_INET6 is registered. However if there is the possibility that AF_INET6 is set as sk_family the sk_error_report() callback will print then an invalid family type error. Reported-by: kernel test robot <lkp@intel.com> Fixes: `4c3d90570b` ("fs: dlm: don't call kernel_getpeername() in error_report()") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-17 08:46:23 -06:00
Alexander Aring	92c4460538	fs: dlm: replace use of socket sk_callback_lock with sock_lock This patch will replace the use of socket sk_callback_lock lock and uses socket lock instead. Some users like sunrpc, see commit `ea9afca88b` ("SUNRPC: Replace use of socket sk_callback_lock with sock_lock") moving from sk_callback_lock to sock_lock which seems to be held when the socket callbacks are called. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-15 11:49:18 -06:00
Alexander Aring	4c3d90570b	fs: dlm: don't call kernel_getpeername() in error_report() In some cases kernel_getpeername() will held the socket lock which is already held when the socket layer calls error_report() callback. Since commit `9dfc685e02` ("inet: remove races in inet{6}_getname()") this problem becomes more likely because the socket lock will be held always. You will see something like: bob9-u5 login: [ 562.316860] BUG: spinlock recursion on CPU#7, swapper/7/0 [ 562.318562] lock: 0xffff8f2284720088, .magic: dead4ead, .owner: swapper/7/0, .owner_cpu: 7 [ 562.319522] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.15.0+ #135 [ 562.320346] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014 [ 562.321277] Call Trace: [ 562.321529] <IRQ> [ 562.321734] dump_stack_lvl+0x33/0x42 [ 562.322282] do_raw_spin_lock+0x8b/0xc0 [ 562.322674] lock_sock_nested+0x1e/0x50 [ 562.323057] inet_getname+0x39/0x110 [ 562.323425] ? sock_def_readable+0x80/0x80 [ 562.323838] lowcomms_error_report+0x63/0x260 [dlm] [ 562.324338] ? wait_for_completion_interruptible_timeout+0xd2/0x120 [ 562.324949] ? lock_timer_base+0x67/0x80 [ 562.325330] ? do_raw_spin_unlock+0x49/0xc0 [ 562.325735] ? _raw_spin_unlock_irqrestore+0x1e/0x40 [ 562.326218] ? del_timer+0x54/0x80 [ 562.326549] sk_error_report+0x12/0x70 [ 562.326919] tcp_validate_incoming+0x3c8/0x530 [ 562.327347] ? kvm_clock_read+0x14/0x30 [ 562.327718] ? ktime_get+0x3b/0xa0 [ 562.328055] tcp_rcv_established+0x121/0x660 [ 562.328466] tcp_v4_do_rcv+0x132/0x260 [ 562.328835] tcp_v4_rcv+0xcea/0xe20 [ 562.329173] ip_protocol_deliver_rcu+0x35/0x1f0 [ 562.329615] ip_local_deliver_finish+0x54/0x60 [ 562.330050] ip_local_deliver+0xf7/0x110 [ 562.330431] ? inet_rtm_getroute+0x211/0x840 [ 562.330848] ? ip_protocol_deliver_rcu+0x1f0/0x1f0 [ 562.331310] ip_rcv+0xe1/0xf0 [ 562.331603] ? ip_local_deliver+0x110/0x110 [ 562.332011] __netif_receive_skb_core+0x46a/0x1040 [ 562.332476] ? inet_gro_receive+0x263/0x2e0 [ 562.332885] __netif_receive_skb_list_core+0x13b/0x2c0 [ 562.333383] netif_receive_skb_list_internal+0x1c8/0x2f0 [ 562.333896] ? update_load_avg+0x7e/0x5e0 [ 562.334285] gro_normal_list.part.149+0x19/0x40 [ 562.334722] napi_complete_done+0x67/0x160 [ 562.335134] virtnet_poll+0x2ad/0x408 [virtio_net] [ 562.335644] __napi_poll+0x28/0x140 [ 562.336012] net_rx_action+0x23d/0x300 [ 562.336414] __do_softirq+0xf2/0x2ea [ 562.336803] irq_exit_rcu+0xc1/0xf0 [ 562.337173] common_interrupt+0xb9/0xd0 It is and was always forbidden to call kernel_getpeername() in context of error_report(). To get rid of the problem we access the destination address for the peer over the socket structure. While on it we fix to print out the destination port of the inet socket. Fixes: `1a31833d08` ("DLM: Replace nodeid_to_addr with kernel_getpeername") Reported-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-15 11:49:18 -06:00
Alexander Aring	6a628fa438	fs: dlm: fix potential buffer overflow This patch fixes an potential overflow in sscanf and the maximum declared string parsing length which seems to be excluding the null termination symbol. This patch will just add one byte to be prepared on a string with length of DLM_RESNAME_MAXLEN including the null termination symbol. Fixes: `5054e79de9` ("fs: dlm: add lkb debugfs functionality") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-12 09:38:19 -06:00
Zhang Mingyu	c8b9f34e22	fs: dlm:Remove unneeded semicolon Eliminate the following coccinelle check warning: fs/dlm/midcomms.c:972:2-3 Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Zhang Mingyu <zhang.mingyu@zte.com.cn> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-05 13:03:34 -05:00
Alexander Aring	b87b1883ef	fs: dlm: remove double list_first_entry call This patch removes a list_first_entry() call which is already done by the previous con_next_wq() call. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-03 16:07:20 -05:00
Alexander Aring	6c2e3bf68f	fs: dlm: filter user dlm messages for kernel locks This patch fixes the following crash by receiving a invalid message: [ 160.672220] ================================================================== [ 160.676206] BUG: KASAN: user-memory-access in dlm_user_add_ast+0xc3/0x370 [ 160.679659] Read of size 8 at addr 00000000deadbeef by task kworker/u32:13/319 [ 160.681447] [ 160.681824] CPU: 10 PID: 319 Comm: kworker/u32:13 Not tainted 5.14.0-rc2+ #399 [ 160.683472] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.14.0-1.module+el8.6.0+12648+6ede71a5 04/01/2014 [ 160.685574] Workqueue: dlm_recv process_recv_sockets [ 160.686721] Call Trace: [ 160.687310] dump_stack_lvl+0x56/0x6f [ 160.688169] ? dlm_user_add_ast+0xc3/0x370 [ 160.689116] kasan_report.cold.14+0x116/0x11b [ 160.690138] ? dlm_user_add_ast+0xc3/0x370 [ 160.690832] dlm_user_add_ast+0xc3/0x370 [ 160.691502] _receive_unlock_reply+0x103/0x170 [ 160.692241] _receive_message+0x11df/0x1ec0 [ 160.692926] ? rcu_read_lock_sched_held+0xa1/0xd0 [ 160.693700] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 160.694427] ? lock_acquire+0x175/0x400 [ 160.695058] ? do_purge.isra.51+0x200/0x200 [ 160.695744] ? lock_acquired+0x360/0x5d0 [ 160.696400] ? lock_contended+0x6a0/0x6a0 [ 160.697055] ? lock_release+0x21d/0x5e0 [ 160.697686] ? lock_is_held_type+0xe0/0x110 [ 160.698352] ? lock_is_held_type+0xe0/0x110 [ 160.699026] ? ___might_sleep+0x1cc/0x1e0 [ 160.699698] ? dlm_wait_requestqueue+0x94/0x140 [ 160.700451] ? dlm_process_requestqueue+0x240/0x240 [ 160.701249] ? down_write_killable+0x2b0/0x2b0 [ 160.701988] ? do_raw_spin_unlock+0xa2/0x130 [ 160.702690] dlm_receive_buffer+0x1a5/0x210 [ 160.703385] dlm_process_incoming_buffer+0x726/0x9f0 [ 160.704210] receive_from_sock+0x1c0/0x3b0 [ 160.704886] ? dlm_tcp_shutdown+0x30/0x30 [ 160.705561] ? lock_acquire+0x175/0x400 [ 160.706197] ? rcu_read_lock_sched_held+0xa1/0xd0 [ 160.706941] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 160.707681] process_recv_sockets+0x32/0x40 [ 160.708366] process_one_work+0x55e/0xad0 [ 160.709045] ? pwq_dec_nr_in_flight+0x110/0x110 [ 160.709820] worker_thread+0x65/0x5e0 [ 160.710423] ? process_one_work+0xad0/0xad0 [ 160.711087] kthread+0x1ed/0x220 [ 160.711628] ? set_kthread_struct+0x80/0x80 [ 160.712314] ret_from_fork+0x22/0x30 The issue is that we received a DLM message for a user lock but the destination lock is a kernel lock. Note that the address which is trying to derefence is 00000000deadbeef, which is in a kernel lock lkb->lkb_astparam, this field should never be derefenced by the DLM kernel stack. In case of a user lock lkb->lkb_astparam is lkb->lkb_ua (memory is shared by a union field). The struct lkb_ua will be handled by the DLM kernel stack but on a kernel lock it will contain invalid data and ends in most likely crashing the kernel. It can be reproduced with two cluster nodes. node 2: dlm_tool join test echo "862 fooobaar 1 2 1" > /sys/kernel/debug/dlm/test_locks echo "862 3 1" > /sys/kernel/debug/dlm/test_waiters node 1: dlm_tool join test python: foo = DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0x77222027, \ m_type=7, m_flags=0x1, m_remid=0x862, m_result=0xFFFEFFFE) newFile = open("/sys/kernel/debug/dlm/comms/2/rawmsg", "wb") newFile.write(bytes(foo)) Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	63eab2b00b	fs: dlm: add lkb waiters debugfs functionality This patch adds functionality to put a lkb to the waiters state. It can be useful to combine this feature with the "rawmsg" debugfs functionality. It will bring the DLM lkb into a state that a message will be parsed by the kernel. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	5054e79de9	fs: dlm: add lkb debugfs functionality This patch adds functionality to add an lkb during runtime. This is a highly debugging feature only, wrong input can crash the kernel. It is a early state feature as well. The goal is to provide a user interface for manipulate dlm state and combine it with the rawmsg feature. It is debugfs functionality, we don't care about UAPI breakage. Even it's possible to add lkb's/rsb's which could never be exists in such wat by using normal DLM operation. The user of this interface always need to think before using this feature, not every crash which happens can really occur during normal dlm operation. Future there should be more functionality to add a more realistic lkb which reflects normal DLM state inside the kernel. For now this is enough. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	75d25ffe38	fs: dlm: allow create lkb with specific id range This patch adds functionality to add a lkb with a specific id range. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	9af5b8f0ea	fs: dlm: add debugfs rawmsg send functionality This patch adds a dlm functionality to send a raw dlm message to a specific cluster node. This raw message can be build by user space and send out by writing the message to "rawmsg" dlm debugfs file. There is a in progress scapy dlm module which provides a easy build of DLM messages in user space. For example: DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0xe4f48a18, ...) The goal is to provide an easy reproducable state to crash DLM or to fuzz the DLM kernel stack if there are possible ways to crash it. Note: that if the sequence number is zero and dlm version is not set to 3.1 the kernel will automatic will set a right sequence number, otherwise DLM stack testing is not possible. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	5c16febbc1	fs: dlm: let handle callback data as void This patch changes the dlm_lowcomms_new_msg() function pointer private data from "struct mhandle " to "void " to provide different structures than just "struct mhandle". Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	3cb5977c52	fs: dlm: ls_count busy wait to event based wait This patch changes the ls_count busy wait to use atomic counter values and wait_event() to wait until ls_count reach zero. It will slightly reduce the number of holding lslist_lock. At remove lockspace we need to retry the wait because it a lockspace get could interefere between wait_event() and holding the lock which deletes the lockspace list entry. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	164d88abd7	fs: dlm: requestqueue busy wait to event based wait This patch changes the requestqueue busy waiting algorithm to use atomic counter values and wait_event() to wait until the requestqueue is empty. It will slightly reduce the number of holding ls_requestqueue_mutex mutex. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	92732376fd	fs: dlm: trace socket handling This patch adds tracepoints for dlm socket receive and send functionality. We can use it to track how much data was send or received to or from a specific nodeid. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	f1d3b8f91d	fs: dlm: initial support for tracepoints This patch adds initial support for dlm tracepoints. It will introduce tracepoints to dlm main functionality dlm_lock()/dlm_unlock() and their complete ast() callback or blocking bast() callback. The lock/unlock functionality has a start and end tracepoint, this is because there exists a race in case if would have a tracepoint at the end position only the complete/blocking callbacks could occur before. To work with eBPF tracing and using their lookup hash functionality there could be problems that an entry was not inserted yet. However use the start functionality for hash insert and check again in end functionality if there was an dlm internal error so there is no ast callback. In further it might also that locks with local masters will occur those callbacks immediately so we must have such functionality. I did not make everything accessible yet, although it seems eBPF can be used to access a lot of internal datastructures if it's aware of the struct definitions of the running kernel instance. We still can change it, if you do eBPF experiments e.g. time measurements between lock and callback functionality you can simple use the local lkb_id field as hash value in combination with the lockspace id if you have multiple lockspaces. Otherwise you can simple use trace-cmd for some functionality, e.g. `trace-cmd record -e dlm` and `trace-cmd report` afterwards. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	2f05ec4327	fs: dlm: make dlm_callback_resume quite This patch makes dlm_callback_resume info printout less noisy by accumulate all callback queues into one printout not in 25 times steps. It seems this printout became lately quite noisy in relationship with gfs2. Before: [241767.849302] dlm: bin: dlm_callback_resume 25 [241767.854846] dlm: bin: dlm_callback_resume 25 [241767.860373] dlm: bin: dlm_callback_resume 25 ... [241767.865920] dlm: bin: dlm_callback_resume 25 [241767.871352] dlm: bin: dlm_callback_resume 25 [241767.876733] dlm: bin: dlm_callback_resume 25 After the patch: [ 385.485728] dlm: gfs2: dlm_callback_resume 175 if zero it will not be printed out. Reported-by: Barry Marson <bmarson@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	e10249b190	fs: dlm: use dlm_recovery_stopped in condition This patch will change to evaluate the dlm_recovery_stopped() in the condition of the if branch instead fetch it before evaluating the condition. As this is an atomic test-set operation it should be evaluated in the condition itself. Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	3e9736713d	fs: dlm: use dlm_recovery_stopped instead of test_bit This patch will change to use dlm_recovery_stopped() which is the dlm way to check if the LSFL_RECOVER_STOP flag in ls_flags by using the helper. It is an atomic operation but the check is still as before to fetch the value if ls_recover_lock is held. There might be more further investigations if the value can be changed afterwards and if it has any side effects. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	658bd576f9	fs: dlm: move version conversion to compile time This patch moves version conversion to little endian from a runtime variable to compile time constant. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	fe93367541	fs: dlm: remove check SCTP is loaded message Since commit 764ff4011424 ("fs: dlm: auto load sctp module") we try load the sctp module before we try to create a sctp kernel socket. That a socket creation fails now has more likely other reasons. This patch removes the part of error to load the sctp module and instead printout the error code. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	1aafd9c231	fs: dlm: debug improvements print nodeid This patch improves the debug output for midcomms layer by also printing out the nodeid where users counter belongs to. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	bb6866a5bd	fs: dlm: fix small lockspace typo This patch fixes a typo from lockspace to lockspace. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	dea450c90f	fs: dlm: remove obsolete INBUF define This patch removes an obsolete define for some length for an temporary buffer which is not being used anymore. The use of this define is not necessary anymore since commit `4798cbbfbd` ("fs: dlm: rework receive handling"). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-11-02 14:39:20 -05:00
Alexander Aring	ecd9567314	fs: dlm: avoid comms shutdown delay in release_lockspace When dlm_release_lockspace does active shutdown on connections to other nodes, the active shutdown will wait for any exisitng passive shutdowns to be resolved. But, the sequence of operations during dlm_release_lockspace can prevent the normal resolution of passive shutdowns (processed normally by way of lockspace recovery.) This disruption of passive shutdown handling can cause the active shutdown to wait for a full timeout period, delaying the completion of dlm_release_lockspace. To fix this, make dlm_release_lockspace resolve existing passive shutdowns (by calling dlm_clear_members earlier), before it does active shutdowns. The active shutdowns will not find any passive shutdowns to wait for, and will not be delayed. Reported-by: Chris Mackowski <cmackows@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-09-01 11:29:14 -05:00
Alexander Aring	aee742c992	fs: dlm: fix return -EINTR on recovery stopped This patch will return -EINTR instead of 1 if recovery is stopped. In case of ping_members() the return value will be checked if the error is -EINTR for signaling another recovery was triggered and the whole recovery process will come to a clean end to process the next one. Returning 1 will abort the recovery process and can leave the recovery in a broken state. It was reported with the following kernel log message attached and a gfs2 mount stopped working: "dlm: bobvirt1: dlm_recover_members error 1" whereas 1 was returned because of a conversion of "dlm_recovery_stopped()" to an errno was missing which this patch will introduce. While on it all other possible missing errno conversions at other places were added as they are done as in other places. It might be worth to check the error case at this recovery level, because some of the functionality also returns -ENOBUFS and check why recovery ends in a broken state. However this will fix the issue if another recovery was triggered at some points of recovery handling. Reported-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-08-19 11:33:03 -05:00
Alexander Aring	b97f85259f	fs: dlm: implement delayed ack handling This patch changes that we don't ack each message. Lowcomms will take care about to send an ack back after a bulk of messages was processed. Currently it's only when the whole receive buffer was processed, there might better positions to send an ack back but only the lowcomms implementation know when there are more data to receive. This patch has also disadvantages that we might retransmit more on errors, however this is a very rare case. Tested with make_panic on gfs2 with three nodes by running: trace-cmd record -p function -l 'dlm_send_ack' sleep 100 and trace-cmd report \| wc -l Before patch: - 20548 - 21376 - 21398 After patch: - 18338 - 20679 - 19949 Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-08-19 11:33:03 -05:00
Alexander Aring	62699b3f0a	fs: dlm: move receive loop into receive handler This patch moves the kernel_recvmsg() loop call into the receive_from_sock() function instead of doing the loop outside the function and abort the loop over it's return value. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-07-19 11:56:36 -05:00
Alexander Aring	c51b022179	fs: dlm: fix multiple empty writequeue alloc This patch will add a mutex that a connection can allocate a writequeue entry buffer only at a sleepable context at one time. If multiple caller waits at the writequeue spinlock and the spinlock gets release it could be that multiple new writequeue page buffers were allocated instead of allocate one writequeue page buffer and other waiters will use remaining buffer of it. It will only be the case for sleepable context which is the common case. In non-sleepable contexts like retransmission we just don't care about such behaviour. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-07-19 11:56:35 -05:00
Alexander Aring	8728a455d2	fs: dlm: generic connect func This patch adds a generic connect function for TCP and SCTP. If the connect functionality differs from each other additional callbacks in dlm_proto_ops were added. The sockopts callback handling will guarantee that sockets created by connect() will use the same options as sockets created by accept(). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-07-19 11:56:34 -05:00
Alexander Aring	90d21fc047	fs: dlm: auto load sctp module This patch adds a "for now" better handling of missing SCTP support in the kernel and try to load the sctp module if SCTP is set. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-07-19 11:56:26 -05:00
Alexander Aring	2dc6b1158c	fs: dlm: introduce generic listen This patch combines each transport layer listen functionality into one listen function. Per transport layer differences are provided by additional callbacks in dlm_proto_ops. This patch drops silently sock_set_keepalive() for listen tcp sockets only. This socket option is not set at connecting sockets, I also don't see the sense of set keepalive for sockets which are created by accept() only. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-07-19 11:53:43 -05:00
Alexander Aring	a66c008cd1	fs: dlm: move to static proto ops This patch moves the per transport socket callbacks to a static const array. We can support only one transport socket for the init namespace which will be determinted by reading the dlm config at lowcomms_start(). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-07-19 11:53:43 -05:00
Alexander Aring	66d5955a09	fs: dlm: introduce con_next_wq helper This patch introduce a function to determine if something is ready to being send in the writequeue. It's not just that the writequeue is not empty additional the first entry need to have a valid length field. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-07-19 11:53:43 -05:00
Alexander Aring	88aa023a25	fs: dlm: cleanup and remove _send_rcom The _send_rcom() can be removed and we call directly dlm_rcom_out(). As we doing that we removing the struct dlm_ls parameter which isn't used. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-07-19 11:53:43 -05:00
Alexander Aring	052849beea	fs: dlm: clear CF_APP_LIMITED on close If send_to_sock() sets CF_APP_LIMITED limited bit and it has not been cleared by a waiting lowcomms_write_space() yet and a close_connection() apprears we should clear the CF_APP_LIMITED bit again because the connection starts from a new state again at reconnect. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-07-19 11:53:43 -05:00
Alexander Aring	b892e4792c	fs: dlm: fix typo in tlv prefix This patch fixes a small typo in a unused struct field. It should named be t_pad instead of o_pad. Came over this as I updated wireshark dissector. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-07-19 11:53:43 -05:00
Alexander Aring	d921a23f3e	fs: dlm: use READ_ONCE for config var This patch will use READ_ONCE to signal the compiler to read this variable only one time. If we don't do that it could be that the compiler read this value more than one time, because some optimizations, from the configure data which might can be changed during this time. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-07-19 11:53:43 -05:00
Alexander Aring	feb704bd17	fs: dlm: use sk->sk_socket instead of con->sock Instead of dereference "con->sock" we can get the socket structure over "sk->sk_socket" as well. This patch will switch to this behaviour. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-07-19 11:53:43 -05:00
Alexander Aring	957adb68b3	fs: dlm: invalid buffer access in lookup error This patch will evaluate the message length if a dlm opts header can fit in before accessing it if a node lookup fails. The invalid sequence error means that the version detection failed and an unexpected message arrived. For debugging such situation the type of arrived message is important to know. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-11 12:44:47 -05:00
Alexander Aring	f5fe8d5107	fs: dlm: fix race in mhandle deletion This patch fixes a race between mhandle deletion in case of receiving an acknowledge and flush of all pending mhandle in cases of an timeout or resetting node states. Fixes: `489d8e559c` ("fs: dlm: add reliable connection if reconnect") Reported-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-11 12:44:47 -05:00
Alexander Aring	d10a0b8875	fs: dlm: rename socket and app buffer defines This patch renames DEFAULT_BUFFER_SIZE to DLM_MAX_SOCKET_BUFSIZE and LOWCOMMS_MAX_TX_BUFFER_LEN to DLM_MAX_APP_BUFSIZE as they are proper names to define what's behind those values. The DLM_MAX_SOCKET_BUFSIZE defines the maximum size of buffer which can be handled on socket layer, the DLM_MAX_APP_BUFSIZE defines the maximum size of buffer which can be handled by the DLM application layer. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-02 11:53:04 -05:00
Alexander Aring	ac7d5d036d	fs: dlm: introduce proto values Currently the dlm protocol values are that TCP is 0 and everything else is SCTP. This makes it difficult to introduce possible other transport layers. The only one user space tool dlm_controld, which I am aware of, handles the protocol value 1 for SCTP. We change it now to handle SCTP as 1, this will break user space API but it will fix it so we can add possible other transport layers. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-02 11:53:04 -05:00
Alexander Aring	9a4139a794	fs: dlm: move dlm allow conn This patch checks if possible allowing new connections is allowed before queueing the listen socket to accept new connections. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-02 11:53:04 -05:00
Alexander Aring	6c6a1cc666	fs: dlm: use alloc_ordered_workqueue The proper way to allocate ordered workqueues is to use alloc_ordered_workqueue() function. The current way implies an ordered workqueue which is also required by dlm. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-02 11:53:04 -05:00
Alexander Aring	700ab1c363	fs: dlm: fix memory leak when fenced I got some kmemleak report when a node was fenced. The user space tool dlm_controld will therefore run some rmdir() in dlm configfs which was triggering some memleaks. This patch stores the sps and cms attributes which stores some handling for subdirectories of the configfs cluster entry and free them if they get released as the parent directory gets freed. unreferenced object 0xffff88810d9e3e00 (size 192): comm "dlm_controld", pid 342, jiffies 4294698126 (age 55438.801s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 73 70 61 63 65 73 00 00 ........spaces.. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000db8b640b>] make_cluster+0x5d/0x360 [<000000006a571db4>] configfs_mkdir+0x274/0x730 [<00000000b094501c>] vfs_mkdir+0x27e/0x340 [<0000000058b0adaf>] do_mkdirat+0xff/0x1b0 [<00000000d1ffd156>] do_syscall_64+0x40/0x80 [<00000000ab1408c8>] entry_SYSCALL_64_after_hwframe+0x44/0xae unreferenced object 0xffff88810d9e3a00 (size 192): comm "dlm_controld", pid 342, jiffies 4294698126 (age 55438.801s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 63 6f 6d 6d 73 00 00 00 ........comms... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000a7ef6ad2>] make_cluster+0x82/0x360 [<000000006a571db4>] configfs_mkdir+0x274/0x730 [<00000000b094501c>] vfs_mkdir+0x27e/0x340 [<0000000058b0adaf>] do_mkdirat+0xff/0x1b0 [<00000000d1ffd156>] do_syscall_64+0x40/0x80 [<00000000ab1408c8>] entry_SYSCALL_64_after_hwframe+0x44/0xae Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-02 11:53:04 -05:00
Alexander Aring	fcef0e6c27	fs: dlm: fix lowcomms_start error case This patch fixes the error path handling in lowcomms_start(). We need to cleanup some static allocated data structure and cleanup possible workqueue if these have started. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-02 11:53:04 -05:00
Colin Ian King	7d3848c03e	fs: dlm: Fix spelling mistake "stucked" -> "stuck" There are spelling mistake in log messages. Fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-26 09:49:04 -05:00
Colin Ian King	f6089981d0	fs: dlm: Fix memory leak of object mh There is an error return path that is not kfree'ing mh after it has been successfully allocates. Fix this by moving the call to create_rcom to after the check on rc_in->rc_id check to avoid this. Thanks to Alexander Ahring Oder Aring for suggesting the correct way to fix this. Addresses-Coverity: ("Resource leak") Fixes: `a070a91cf1` ("fs: dlm: add more midcomms hooks") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-26 09:49:04 -05:00
Alexander Aring	706474fbc5	fs: dlm: don't allow half transmitted messages This patch will clean a dirty page buffer if a reconnect occurs. If a page buffer was half transmitted we cannot start inside the middle of a dlm message if a node connects again. I observed invalid length receptions errors and was guessing that this behaviour occurs, after this patch I never saw an invalid message length again. This patch might drops more messages for dlm version 3.1 but 3.1 can't deal with half messages as well, for 3.2 it might trigger more re-transmissions but will not leave dlm in a broken state. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	5b2f981fde	fs: dlm: add midcomms debugfs functionality This patch adds functionality to debug midcomms per connection state inside a comms directory which is similar like dlm configfs. Currently there exists the possibility to read out two attributes which is the send queue counter and the version of each midcomms node state. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	489d8e559c	fs: dlm: add reliable connection if reconnect This patch introduce to make a tcp lowcomms connection reliable even if reconnects occurs. This is done by an application layer re-transmission handling and sequence numbers in dlm protocols. There are three new dlm commands: DLM_OPTS: This will encapsulate an existing dlm message (and rcom message if they don't have an own application side re-transmission handling). As optional handling additional tlv's (type length fields) can be appended. This can be for example a sequence number field. However because in DLM_OPTS the lockspace field is unused and a sequence number is a mandatory field it isn't made as a tlv and we put the sequence number inside the lockspace id. The possibility to add optional options are still there for future purposes. DLM_ACK: Just a dlm header to acknowledge the receive of a DLM_OPTS message to it's sender. DLM_FIN: This provides a 4 way handshake for connection termination inclusive support for half-closed connections. It's provided on application layer because SCTP doesn't support half-closed sockets, the shutdown() call can interrupted by e.g. TCP resets itself and a hard logic to implement it because the othercon paradigm in lowcomms. The 4-way termination handshake also solve problems to synchronize peer EOF arrival and that the cluster manager removes the peer in the node membership handling of DLM. In some cases messages can be still transmitted in this time and we need to wait for the node membership event. To provide a reliable connection the node will retransmit all unacknowledges message to it's peer on reconnect. The receiver will then filtering out the next received message and drop all messages which are duplicates. As RCOM_STATUS and RCOM_NAMES messages are the first messages which are exchanged and they have they own re-transmission handling, there exists logic that these messages must be first. If these messages arrives we store the dlm version field. This handling is on DLM 3.1 and after this patch 3.2 the same. A backwards compatibility handling has been added which seems to work on tests without tcpkill, however it's not recommended to use DLM 3.1 and 3.2 at the same time, because DLM 3.2 tries to fix long term bugs in the DLM protocol. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	8e2e40860c	fs: dlm: add union in dlm header for lockspace id This patch adds union inside the lockspace id to handle it also for another use case for a different dlm command. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	37a247da51	fs: dlm: move out some hash functionality This patch moves out some lowcomms hash functionality into lowcomms header to provide them to other layers like midcomms as well. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	2874d1a68c	fs: dlm: add functionality to re-transmit a message This patch introduces a retransmit functionality for a lowcomms message handle. It's just allocates a new buffer and transmit it again, no special handling about prioritize it because keeping bytestream in order. To avoid another connection look some refactor was done to make a new buffer allocation with a preexisting connection pointer. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	8f2dc78dbc	fs: dlm: make buffer handling per msg This patch makes the void pointer handle for lowcomms functionality per message and not per page allocation entry. A refcount handling for the handle was added to keep the message alive until the user doesn't need it anymore. There exists now a per message callback which will be called when allocating a new buffer. This callback will be guaranteed to be called according the order of the sending buffer, which can be used that the caller increments a sequence number for the dlm message handle. For transition process we cast the dlm_mhandle to dlm_msg and vice versa until the midcomms layer will implement a specific dlm_mhandle structure. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	a070a91cf1	fs: dlm: add more midcomms hooks This patch prepares hooks to redirect to the midcomms layer which will be used by the midcomms re-transmit handling. There exists the new concept of stateless buffers allocation and commits. This can be used to bypass the midcomms re-transmit handling. It is used by RCOM_STATUS and RCOM_NAMES messages, because they have their own ping-like re-transmit handling. As well these two messages will be used to determine the DLM version per node, because these two messages are per observation the first messages which are exchanged. Cluster manager events for node membership are added to add support for half-closed connections in cases that the peer connection get to an end of file but DLM still holds membership of the node. In this time DLM can still trigger new message which we should allow. After the cluster manager node removal event occurs it safe to close the connection. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	6fb5cf9d42	fs: dlm: public header in out utility This patch allows to use header_out() and header_in() outside of dlm util functionality. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	8aa31cbf20	fs: dlm: fix connection tcp EOF handling This patch fixes the EOF handling for TCP that if and EOF is received we will close the socket next time the writequeue runs empty. This is a half-closed socket functionality which doesn't exists in SCTP. The midcomms layer will do a half closed socket functionality on DLM side to solve this problem for the SCTP case. However there is still the last ack flying around but other reset functionality will take care of it if it got lost. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	c6aa00e3d2	fs: dlm: cancel work sync othercon These rx tx flags arguments are for signaling close_connection() from which worker they are called. Obviously the receive worker cannot cancel itself and vice versa for swork. For the othercon the receive worker should only be used, however to avoid deadlocks we should pass the same flags as the original close_connection() was called. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	ba868d9dea	fs: dlm: reconnect if socket error report occurs This patch will change the reconnect handling that if an error occurs if a socket error callback is occurred. This will also handle reconnects in a non blocking connecting case which is currently missing. If error ECONNREFUSED is reported we delay the reconnect by one second. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	7443bc9625	fs: dlm: set is othercon flag There is a is othercon flag which is never used, this patch will set it and printout a warning if the othercon ever sends a dlm message which should never be the case. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	b38bc9c2b3	fs: dlm: fix srcu read lock usage This patch holds the srcu connection read lock in cases where we lookup the connections and accessing it. We don't hold the srcu lock in workers function where the scheduled worker is part of the connection itself. The connection should not be freed if any worker is scheduled or pending. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	2df6b7627a	fs: dlm: add dlm macros for ratelimit log This patch add ratelimit macro to dlm subsystem and will set the connecting log message to ratelimit. In non blocking connecting cases it will print out this message a lot. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	c937aabbd7	fs: dlm: always run complete for possible waiters This patch changes the ping_members() result that we always run complete() for possible waiters. We handle the -EINTR error code as successful. This error code is returned if the recovery is stopped which is likely that a new recovery is triggered with a new members configuration and ping_members() runs again. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Yang Yingliang	2fd8db2dd0	fs: dlm: fix missing unlock on error in accept_from_sock() Add the missing unlock before return from accept_from_sock() in the error handling case. Fixes: `6cde210a97` ("fs: dlm: add helper for init connection") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-03-29 13:28:18 -05:00
Alexander Aring	9d232469bc	fs: dlm: add shutdown hook This patch fixes issues which occurs when dlm lowcomms synchronize their workqueues but dlm application layer already released the lockspace. In such cases messages like: dlm: gfs2: release_lockspace final free dlm: invalid lockspace 3841231384 from 1 cmd 1 type 11 are printed on the kernel log. This patch is solving this issue by introducing a new "shutdown" hook before calling "stop" hook when the lockspace is going to be released finally. This should pretend any dlm messages sitting in the workqueues during or after lockspace removal. It's necessary to call dlm_scand_stop() as I instrumented dlm_lowcomms_get_buffer() code to report a warning after it's called after dlm_midcomms_shutdown() functionality, see below: WARNING: CPU: 1 PID: 3794 at fs/dlm/midcomms.c:1003 dlm_midcomms_get_buffer+0x167/0x180 Modules linked in: joydev iTCO_wdt intel_pmc_bxt iTCO_vendor_support drm_ttm_helper ttm pcspkr serio_raw i2c_i801 i2c_smbus drm_kms_helper virtio_scsi lpc_ich virtio_balloon virtio_console xhci_pci xhci_pci_renesas cec qemu_fw_cfg drm [last unloaded: qxl] CPU: 1 PID: 3794 Comm: dlm_scand Tainted: G W 5.11.0+ #26 Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014 RIP: 0010:dlm_midcomms_get_buffer+0x167/0x180 Code: 5d 41 5c 41 5d 41 5e 41 5f c3 0f 0b 45 31 e4 5b 5d 4c 89 e0 41 5c 41 5d 41 5e 41 5f c3 4c 89 e7 45 31 e4 e8 3b f1 ec ff eb 86 <0f> 0b 4c 89 e7 45 31 e4 e8 2c f1 ec ff e9 74 ff ff ff 0f 1f 80 00 RSP: 0018:ffffa81503f8fe60 EFLAGS: 00010202 RAX: 0000000000000008 RBX: ffff8f969827f200 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffffffad1e89a0 RDI: ffff8f96a5294160 RBP: 0000000000000001 R08: 0000000000000000 R09: ffff8f96a250bc60 R10: 00000000000045d3 R11: 0000000000000000 R12: ffff8f96a250bc60 R13: ffffa81503f8fec8 R14: 0000000000000070 R15: 0000000000000c40 FS: 0000000000000000(0000) GS:ffff8f96fbc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055aa3351c000 CR3: 000000010bf22000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dlm_scan_rsbs+0x420/0x670 ? dlm_uevent+0x20/0x20 dlm_scand+0xbf/0xe0 kthread+0x13a/0x150 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x22/0x30 To synchronize all dlm scand messages we stop it right before shutdown hook. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-03-09 08:56:42 -06:00
Alexander Aring	eec054b5a7	fs: dlm: flush swork on shutdown This patch fixes the flushing of send work before shutdown. The function cancel_work_sync() is not the right workqueue functionality to use here as it would cancel the work if the work queues itself. In cases of EAGAIN in send() for dlm message we need to be sure that everything is send out before. The function flush_work() will ensure that every send work is be done inclusive in EAGAIN cases. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-03-09 08:56:42 -06:00
Alexander Aring	df9e06b800	fs: dlm: remove unaligned memory access handling This patch removes unaligned memory access handling for receiving midcomms messages. This handling will not fix the unaligned memory access in general. All messages should be length aligned to 8 bytes, there exists cases where this isn't the case. It's part of the sending handling to not send such messages. As the sending handling itself, with the internal allocator of page buffers, can occur in unaligned memory access of dlm message fields we just ignore that problem for now as it seems this code is used by architecture which can handle it. This patch adds a comment to take care about that problem in a major bump of dlm protocol. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-03-09 08:56:42 -06:00
Alexander Aring	710176e836	fs: dlm: check on minimum msglen size This patch adds an additional check for minimum dlm header size which is an invalid dlm message and signals a broken stream. A msglen field cannot be less than the dlm header size because the field is inclusive header lengths. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-03-09 08:56:42 -06:00
Alexander Aring	f0747ebf48	fs: dlm: simplify writequeue handling This patch cleans up the current dlm sending allocator handling by using some named macros, list functionality and removes some goto statements. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-03-09 08:56:42 -06:00
Alexander Aring	e1a7cbce53	fs: dlm: use GFP_ZERO for page buffer This patch uses GFP_ZERO for allocate a page for the internal dlm sending buffer allocator instead of calling memset zero after every allocation. An already allocated space will never be reused again. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-03-09 08:56:42 -06:00
Alexander Aring	c45674fbdd	fs: dlm: change allocation limits While running tcpkill I experienced invalid header length values while receiving to check that a node doesn't try to send a invalid dlm message we also check on applications minimum allocation limit. Also use DEFAULT_BUFFER_SIZE as maximum allocation limit. The define LOWCOMMS_MAX_TX_BUFFER_LEN is to calculate maximum buffer limits on application layer, future midcomms layer will subtract their needs from this define. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-03-09 08:56:42 -06:00
Alexander Aring	517461630d	fs: dlm: add check if dlm is currently running This patch adds checks for dlm config attributes regarding to protocol parameters as it makes only sense to change them when dlm is not running. It also adds a check for valid protocol specifiers and return invalid argument if they are not supported. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-03-09 08:56:42 -06:00
Alexander Aring	8aa9540b49	fs: dlm: add errno handling to check callback This allows to return individual errno values for the config attribute check callback instead of returning invalid argument only. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-03-09 08:56:42 -06:00
Alexander Aring	e9a470acd9	fs: dlm: set subclass for othercon sock_mutex This patch sets the lockdep subclass for the othercon socket mutex. In various places the connection socket mutex is held while locking the othercon socket mutex. This patch will remove lockdep warnings when such case occurs. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-03-09 08:56:42 -06:00
Alexander Aring	b30a624f50	fs: dlm: set connected bit after accept This patch sets the CF_CONNECTED bit when dlm accepts a connection from another node. If we don't set this bit, next time if the connection socket gets writable it will assume an event that the connection is successfully connected. However that is only the case when the connection did a connect. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-03-09 08:56:42 -06:00
Alexander Aring	e125fbeb53	fs: dlm: fix mark setting deadlock This patch fixes an deadlock issue when dlm_lowcomms_close() is called. When dlm_lowcomms_close() is called the clusters_root.subsys.su_mutex is held to remove configfs items. At this time we flushing (e.g. cancel_work_sync()) the workers of send and recv workqueue. Due the fact that we accessing configfs items (mark values), these workers will lock clusters_root.subsys.su_mutex as well which are already hold by dlm_lowcomms_close() and ends in a deadlock situation. [67170.703046] ====================================================== [67170.703965] WARNING: possible circular locking dependency detected [67170.704758] 5.11.0-rc4+ #22 Tainted: G W [67170.705433] ------------------------------------------------------ [67170.706228] dlm_controld/280 is trying to acquire lock: [67170.706915] ffff9f2f475a6948 ((wq_completion)dlm_recv){+.+.}-{0:0}, at: __flush_work+0x203/0x4c0 [67170.708026] but task is already holding lock: [67170.708758] ffffffffa132f878 (&clusters_root.subsys.su_mutex){+.+.}-{3:3}, at: configfs_rmdir+0x29b/0x310 [67170.710016] which lock already depends on the new lock. The new behaviour adds the mark value to the node address configuration which doesn't require to held the clusters_root.subsys.su_mutex by accessing mark values in a separate datastructure. However the mark values can be set now only after a node address was set which is the case when the user is using dlm_controld. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-03-09 08:56:42 -06:00

1 2 3 4 5 ...

829 Commits