Before this patch many of the functions in quota.c got their superblock
pointer, sdp, from the quota_data's glock pointer. That's silly because
the qd already has its own pointer to the superblock (qd_sbd).
This patch changes references to use that instead, eliminating a level
of indirection.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Commit f07b352021 ("GFS2: Made logd daemon take into account log
demand") changed gfs2_ail_flush_reqd() and gfs2_jrnl_flush_reqd() to
take sd_log_blks_needed into account, but the checks in
gfs2_log_commit() were not updated correspondingly.
Once that is fixed, gfs2_jrnl_flush_reqd() and gfs2_ail_flush_reqd() can
be used in gfs2_log_commit(). Make those two helpers available to
gfs2_log_commit() by defining them above gfs2_log_commit().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When quotad detects an I/O error, it sets sd_log_error and then it wakes
up logd to withdraw the filesystem. However, logd doesn't wake up when
sd_log_error is set. Fix that.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
First, function gfs2_ail_flush_reqd checks the SDF_FORCE_AIL_FLUSH flag
to determine if an AIL flush should be forced in low-memory situations.
However, it also immediately clears the flag, and when called repeatedly
as in function gfs2_logd, the flag will be lost. Fix that by pulling
the SDF_FORCE_AIL_FLUSH flag check out of gfs2_ail_flush_reqd.
Second, function gfs2_writepages sets the SDF_FORCE_AIL_FLUSH flag
whether or not enough pages were written. If enough pages could be
written, flushing the AIL is unnecessary, though.
Third, gfs2_writepages doesn't wake up logd after setting the
SDF_FORCE_AIL_FLUSH flag, so it can take a long time for logd to react.
It would be preferable to wake up logd, but that hurts the performance
of some workloads and we don't quite understand why so far, so don't
wake up logd so far.
Fixes: b066a4eebd ("gfs2: forcibly flush ail to relieve memory pressure")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Consider the following case:
1. A glock is held in shared mode.
2. A process requests the glock in exclusive mode (rename).
3. Before the lock is granted, more processes (read / ls) request the
glock in shared mode again.
4. gfs2 sends a request to dlm for the lock in exclusive mode because
that holder is at the head of the queue.
5. Somehow the dlm request gets canceled, so dlm sends us back a
response with state == LM_ST_SHARED and LM_OUT_CANCELED. So at that
point, the glock is still held in shared mode.
6. finish_xmote gets called to process the response from dlm. It detects
that the glock is not in the requested mode and no demote is in
progress, so it moves the canceled holder to the tail of the queue
and finds the new holder at the head of the queue. That holder is
requesting the glock in shared mode.
7. finish_xmote calls do_xmote to transition the glock into shared mode,
but the glock is already in shared mode and so do_xmote complains
about that with:
GLOCK_BUG_ON(gl, gl->gl_state == gl->gl_target);
Instead, in finish_xmote, after moving the canceled holder to the tail
of the queue, check if any new holders can be granted. Only call
do_xmote to repeat the dlm request if the holder at the head of the
queue is requesting the glock in a mode that is incompatible with the
mode the glock is currently held in.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The last user of this flag was removed in commit b77b4a4815 ("gfs2:
Rework freeze / thaw logic").
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Revert the rest of commit 220cca2a4f ("GFS2: Change truncate page
allocation to be GFP_NOFS"):
In gfs2_unstuff_dinode(), there is no need to carry out the page cache
allocation under GFP_NOFS because inodes on the "regular" filesystem are
never un-inlined under memory pressure, so switch back from
find_or_create_page() to grab_cache_page() here as well.
Inodes on the "metadata" filesystem can theoretically be un-inlined
under memory pressure, but any page cache allocations in that context
would happen in GFP_NOFS context because those inodes have
inode->i_mapping->gfp_mask set to GFP_NOFS (see the previous patch).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Set mapping->gfp mask to GFP_NOFS for all metadata inodes so that
allocating pages in the address space of those inodes won't call back
into the filesystem. This allows to switch back from
find_or_create_page() to grab_cache_page() in two places.
Partially reverts commit 220cca2a4f ("GFS2: Change truncate page
allocation to be GFP_NOFS").
Thanks to Dan Carpenter <dan.carpenter@linaro.org> for pointing out a
Smatch static checker warning.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using
the existing helper folio_next_index().
Signed-off-by: Minjie Du <duminjie@vivo.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
ksmbd has made significant improvements over the past two
years and is regularly tested and used. Remove the experimental
warning.
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
In this cycle, we don't have a highlighted feature enhancement, but mostly
have fixed issues mainly in two parts: 1) zoned block device, 2) compression
support. For zoned block device, we've tried to improve the power-off recovery
flow as much as possible. For compression, we found some corner cases caused by
wrong compression policy and logics. Other than them, there were some reverts
and stat corrections.
Bug fix:
- use finish zone command when closing a zone
- check zone type before sending async reset zone command
- fix to assign compress_level for lz4 correctly
- fix error path of f2fs_submit_page_read()
- don't {,de}compress non-full cluster
- send small discard commands during checkpoint back
- flush inode if atomic file is aborted
- correct to account gc/cp stats
And, there are minor bug fixes, avoiding false lockdep warning, and clean-ups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmTyemoACgkQQBSofoJI
UNLLKQ//bupYGPOqAgKbd/s7FhtULMiiRmFVy7W2eoMIc/oeeXOGrzDAF/1NifLC
WLV4uBNVTS4PS8D1vRzxZNEZt9aqPS0vQ8hxW/3nTI9Z425NX3nz7gLSxxmwIkIe
xj++V6tvKPcCH0BfKvfFCtcxj09PsflgdEuT8w/sIkH6p4o+VEMFs1Lc9PQsjUmh
epznK7JGBwpAxmHqI74n1eAw2CI6W+oKx23YDTNMBD6hmXTU0fkTeBURrOlSsUHZ
nhafPecsrCEI+OpAj03G/7e/zt+iTUKdmHx9O5ir/P00vF/c+SU2vSwB97FiHqBi
B4UmocTM0MAsU80PQcmE6aU3zgQFI0Yun5yZ24VeWjKTu76ssZSmT2HA/4RL+LLf
AeAW4FSyfh76pls8X5IWfilxGLWq6kTzSZA0MF7dH2q7qlj5apL5wKpm/XH6POqn
qELY/Y9+P1QuCcNL8BiYrgA5xBqVJ7Uw/6/6U3Y77PElc+Pwl3vI8UZ7uCOBrsXL
e0TLXy23AJA6AS2DyLLziy669nXAZRb95B8TWMfEeVZIMFvCeeqYc74N8jOFa0T8
q6uQFZs+0cETLZA8MSZdlNhzvhJmbW6wgSIz++CEdikWSLBZMKWxBVjCPkkCY9uc
DMh8zruSVbYPZWBTcxkMFEBJKKrU43++e7pb8ZoqTj4Pq1317b0=
=Qa8+
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we don't have a highlighted feature enhancement, but
mostly have fixed issues mainly in two parts: 1) zoned block device,
and 2) compression support.
For zoned block device, we've tried to improve the power-off recovery
flow as much as possible. For compression, we found some corner cases
caused by wrong compression policy and logics. Other than them, there
were some reverts and stat corrections.
Bug fixes:
- use finish zone command when closing a zone
- check zone type before sending async reset zone command
- fix to assign compress_level for lz4 correctly
- fix error path of f2fs_submit_page_read()
- don't {,de}compress non-full cluster
- send small discard commands during checkpoint back
- flush inode if atomic file is aborted
- correct to account gc/cp stats
And, there are minor bug fixes, avoiding false lockdep warning, and
clean-ups"
* tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (25 commits)
f2fs: use finish zone command when closing a zone
f2fs: compress: fix to assign compress_level for lz4 correctly
f2fs: fix error path of f2fs_submit_page_read()
f2fs: clean up error handling in sanity_check_{compress_,}inode()
f2fs: avoid false alarm of circular locking
Revert "f2fs: do not issue small discard commands during checkpoint"
f2fs: doc: fix description of max_small_discards
f2fs: should update REQ_TIME for direct write
f2fs: fix to account cp stats correctly
f2fs: fix to account gc stats correctly
f2fs: remove unneeded check condition in __f2fs_setxattr()
f2fs: fix to update i_ctime in __f2fs_setxattr()
Revert "f2fs: fix to do sanity check on extent cache correctly"
f2fs: increase usage of folio_next_index() helper
f2fs: Only lfs mode is allowed with zoned block device feature
f2fs: check zone type before sending async reset zone command
f2fs: compress: don't {,de}compress non-full cluster
f2fs: allow f2fs_ioc_{,de}compress_file to be interrupted
f2fs: don't reopen the main block device in f2fs_scan_devices
f2fs: fix to avoid mmap vs set_compress_option case
...
With madvise and prctl KSM can be enabled for different VMA's. Once it is
enabled we can query how effective KSM is overall. However we cannot
easily query if an individual VMA benefits from KSM.
This commit adds a KSM section to the /prod/<pid>/smaps file. It reports
how many of the pages are KSM pages. Note that KSM-placed zeropages are
not included, only actual KSM pages.
Here is a typical output:
7f420a000000-7f421a000000 rw-p 00000000 00:00 0
Size: 262144 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 51212 kB
Pss: 8276 kB
Shared_Clean: 172 kB
Shared_Dirty: 42996 kB
Private_Clean: 196 kB
Private_Dirty: 7848 kB
Referenced: 15388 kB
Anonymous: 51212 kB
KSM: 41376 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 202016 kB
SwapPss: 3882 kB
Locked: 0 kB
THPeligible: 0
ProtectionKey: 0
ksm_state: 0
ksm_skip_base: 0
ksm_skip_count: 0
VmFlags: rd wr mr mw me nr mg anon
This information also helps with the following workflow:
- First enable KSM for all the VMA's of a process with prctl.
- Then analyze with the above smaps report which VMA's benefit the most
- Change the application (if possible) to add the corresponding madvise
calls for the VMA's that benefit the most
[shr@devkernel.io: v5]
Link: https://lkml.kernel.org/r/20230823170107.1457915-1-shr@devkernel.io
Link: https://lkml.kernel.org/r/20230822180539.1424843-1-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
User visible changes:
- Added a way to easier filter with cpumasks:
# echo 'cpumask & CPUS{17-42}' > /sys/kernel/tracing/events/ipi_send_cpumask/filter
- Show actual size of ring buffer after modifying the ring buffer size via
buffer_size_kb. Currently it just returns what was written, but the actual
size rounds up to the sub buffer size. Show that real size instead.
Major changes:
- Added "eventfs". This is the code that handles the inodes and dentries of
tracefs/events directory. As there are thousands of events, and each event
has several inodes and dentries that currently exist even when tracing is
never used, they take up precious memory. Instead, eventfs will allocate
the inodes and dentries in a JIT way (similar to what procfs does). There
is now metadata that handles the events and subdirectories, and will create
the inodes and dentries when they are used.
Note, I also have patches that remove the subdirectory meta data, but will
wait till the next merge window before applying them. It's a little more
complex, and I want to make sure the dynamic code works properly before
adding more complexity, making it easier to revert if need be.
Minor changes:
- Optimization to user event list traversal.
- Remove intermediate permission of tracefs files (note the intermediate
permission removes all access to the files so it is not a security concern,
but just a clean up.)
- Add the complex fix to FORTIFY_SOURCE to the kernel stack event logic.
- Other minor clean ups.
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEXtmkj8VMCiLR0IBM68Js21pW3nMFAmTwtAsUHHJvc3RlZHRA
Z29vZG1pcy5vcmcACgkQ68Js21pW3nNOXRAAsslQT6alY4OeplC4x47+V6+6NiIA
oDtOmWAqf7TsH9bukzRFD36rUly42O20RJDx9z0Q3iRc3vGxEawId8z6P0HmBwRb
VSl5BryWvL5Wc5w94xS8EeCuC1MRfhVDyfbtVFmWigzfvd/f+hp71ViMPHUvrRJX
KhzzNSBc4ir5E1lzfwa7meYTXzDwrQlZbYfdf5aH94IWAkqDj85PUZDJ7UmLZhXG
CIglSpNFXZ0j19Wo/U6KZlHR1XfunBKungCzJ5Dbznc9YLWZTQXOIZF4YPKfPIJL
ulRG9chwXY0nQWhG3xM1UHZLsAMSWw5i13a4ZN4d8FCNOgv8ttcJnfDk7ZYUS0Oz
RmY1dGcSRKAZTUTjm8ZBtmyiUCc9kZAIk0fyEfIHtoDYXmhnvni3wuTnbRSdXaSi
q4YkxPaLfX8Fn3QloCqqddt8iONu7BnbpZOhUCl2AtBib52gnTTF7+rQ6/0D3rjo
SSuvEHhnjJhzk+3jM2odxjmTAztNT+yu6FbKXZUKPt1Kj9YHv1J9cEQw9/Etw+GV
8jQBe979D8hFJmDOJOT/O/TdPqE9mQoMNBt6Y8QnE4nbJWM+i/MBrThFpUSQhRCr
0Ya/HgR2QyRH7RmZW5o2H9mNtN+V9c7RxZW8erYzRbUs0YofK2OpGi9SrPzxWCke
w6j0VVZHaxdPguM=
=/s+e
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
"User visible changes:
- Added a way to easier filter with cpumasks:
# echo 'cpumask & CPUS{17-42}' > /sys/kernel/tracing/events/ipi_send_cpumask/filter
- Show actual size of ring buffer after modifying the ring buffer
size via buffer_size_kb.
Currently it just returns what was written, but the actual size
rounds up to the sub buffer size. Show that real size instead.
Major changes:
- Added "eventfs". This is the code that handles the inodes and
dentries of tracefs/events directory. As there are thousands of
events, and each event has several inodes and dentries that
currently exist even when tracing is never used, they take up
precious memory. Instead, eventfs will allocate the inodes and
dentries in a JIT way (similar to what procfs does). There is now
metadata that handles the events and subdirectories, and will
create the inodes and dentries when they are used.
Note, I also have patches that remove the subdirectory meta data,
but will wait till the next merge window before applying them. It's
a little more complex, and I want to make sure the dynamic code
works properly before adding more complexity, making it easier to
revert if need be.
Minor changes:
- Optimization to user event list traversal
- Remove intermediate permission of tracefs files (note the
intermediate permission removes all access to the files so it is
not a security concern, but just a clean up)
- Add the complex fix to FORTIFY_SOURCE to the kernel stack event
logic
- Other minor cleanups"
* tag 'trace-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (29 commits)
tracefs: Remove kerneldoc from struct eventfs_file
tracefs: Avoid changing i_mode to a temp value
tracing/user_events: Optimize safe list traversals
ftrace: Remove empty declaration ftrace_enable_daemon() and ftrace_disable_daemon()
tracing: Remove unused function declarations
tracing/filters: Document cpumask filtering
tracing/filters: Further optimise scalar vs cpumask comparison
tracing/filters: Optimise CPU vs cpumask filtering when the user mask is a single CPU
tracing/filters: Optimise scalar vs cpumask filtering when the user mask is a single CPU
tracing/filters: Optimise cpumask vs cpumask filtering when user mask is a single CPU
tracing/filters: Enable filtering the CPU common field by a cpumask
tracing/filters: Enable filtering a scalar field by a cpumask
tracing/filters: Enable filtering a cpumask field by another cpumask
tracing/filters: Dynamically allocate filter_pred.regex
test: ftrace: Fix kprobe test for eventfs
eventfs: Move tracing/events to eventfs
eventfs: Implement removal of meta data from eventfs
eventfs: Implement functions to create files and dirs when accessed
eventfs: Implement eventfs lookup, read, open functions
eventfs: Implement eventfs file add functions
...
Here is the big set of char/misc and other small driver subsystem
changes for 6.6-rc1.
Stuff all over the place here, lots of driver updates and changes and
new additions. Short summary is:
- new IIO drivers and updates
- Interconnect driver updates
- fpga driver updates and additions
- fsi driver updates
- mei driver updates
- coresight driver updates
- nvmem driver updates
- counter driver updates
- lots of smaller misc and char driver updates and additions
All of these have been in linux-next for a long time with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZPH64g8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynr2QCfd3RKeR+WnGzyEOFhksl30UJJhiIAoNZtYT5+
t9KG0iMDXRuTsOqeEQbd
=tVnk
-----END PGP SIGNATURE-----
Merge tag 'char-misc-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big set of char/misc and other small driver subsystem
changes for 6.6-rc1.
Stuff all over the place here, lots of driver updates and changes and
new additions. Short summary is:
- new IIO drivers and updates
- Interconnect driver updates
- fpga driver updates and additions
- fsi driver updates
- mei driver updates
- coresight driver updates
- nvmem driver updates
- counter driver updates
- lots of smaller misc and char driver updates and additions
All of these have been in linux-next for a long time with no reported
problems"
* tag 'char-misc-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (267 commits)
nvmem: core: Notify when a new layout is registered
nvmem: core: Do not open-code existing functions
nvmem: core: Return NULL when no nvmem layout is found
nvmem: core: Create all cells before adding the nvmem device
nvmem: u-boot-env:: Replace zero-length array with DECLARE_FLEX_ARRAY() helper
nvmem: sec-qfprom: Add Qualcomm secure QFPROM support
dt-bindings: nvmem: sec-qfprom: Add bindings for secure qfprom
dt-bindings: nvmem: Add compatible for QCM2290
nvmem: Kconfig: Fix typo "drive" -> "driver"
nvmem: Explicitly include correct DT includes
nvmem: add new NXP QorIQ eFuse driver
dt-bindings: nvmem: Add t1023-sfp efuse support
dt-bindings: nvmem: qfprom: Add compatible for MSM8226
nvmem: uniphier: Use devm_platform_get_and_ioremap_resource()
nvmem: qfprom: do some cleanup
nvmem: stm32-romem: Use devm_platform_get_and_ioremap_resource()
nvmem: rockchip-efuse: Use devm_platform_get_and_ioremap_resource()
nvmem: meson-mx-efuse: Convert to devm_platform_ioremap_resource()
nvmem: lpc18xx_otp: Convert to devm_platform_ioremap_resource()
nvmem: brcm_nvram: Use devm_platform_get_and_ioremap_resource()
...
Here is a small set of driver core updates and additions for 6.6-rc1.
Included in here are:
- stable kernel documentation updates
- class structure const work from Ivan on various subsystems
- kernfs tweaks
- driver core tests!
- kobject sanity cleanups
- kobject structure reordering to save space
- driver core error code handling fixups
- other minor driver core cleanups
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZPH77Q8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylZMACePk8SitfaJc6FfFf5I7YK7Nq0V8MAn0nUjgsR
i8NcNpu/Yv4HGrDgTdh/
=PJbk
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is a small set of driver core updates and additions for 6.6-rc1.
Included in here are:
- stable kernel documentation updates
- class structure const work from Ivan on various subsystems
- kernfs tweaks
- driver core tests!
- kobject sanity cleanups
- kobject structure reordering to save space
- driver core error code handling fixups
- other minor driver core cleanups
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits)
driver core: Call in reversed order in device_platform_notify_remove()
driver core: Return proper error code when dev_set_name() fails
kobject: Remove redundant checks for whether ktype is NULL
kobject: Add sanity check for kset->kobj.ktype in kset_register()
drivers: base: test: Add missing MODULE_* macros to root device tests
drivers: base: test: Add missing MODULE_* macros for platform devices tests
drivers: base: Free devm resources when unregistering a device
drivers: base: Add basic devm tests for platform devices
drivers: base: Add basic devm tests for root devices
kernfs: fix missing kernfs_iattr_rwsem locking
docs: stable-kernel-rules: mention that regressions must be prevented
docs: stable-kernel-rules: fine-tune various details
docs: stable-kernel-rules: make the examples for option 1 a proper list
docs: stable-kernel-rules: move text around to improve flow
docs: stable-kernel-rules: improve structure by changing headlines
base/node: Remove duplicated include
kernfs: attach uuid for every kernfs and report it in fsid
kernfs: add stub helper for kernfs_generic_poll()
x86/resctrl: make pseudo_lock_class a static const structure
x86/MSR: make msr_class a static const structure
...
* Support for the new "riscv,isa-extensions" and "riscv,isa-base" device
tree interfaces for probing extensions.
* Support for userspace access to the performance counters.
* Support for more instructions in kprobes.
* Crash kernels can be allocated above 4GiB.
* Support for KCFI.
* Support for ELFs in !MMU configurations.
* ARCH_KMALLOC_MINALIGN has been reduced to 8.
* mmap() defaults to sv48-sized addresses, with longer addresses hidden
behind a hint (similar to Arm and Intel).
* Also various fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmTx96kTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiVjRD/9DYVLlkQ/OEDJjPaEcYCP49xgIVUUU
lhs3XbSs2VNHBaiG114f6Q0AaT/uNi+uqSej3CeTmEot2kZkBk/f2yu+UNIriPZ9
GQiZsdyXhu921C+5VFtiI47KDWOVZ+Jpy3M1ll61IWt3yPSQHr1xOP0AOiyHHqe3
cmqpNnzjajlfVDoXPc2mGGzUJt/7ar4thcwnMNi98raXR5Qh7SP6rrHjoQhE1oFk
LMP3CHqEAcHE2tE4CxZVpc6HOQ5m0LpQIOK7ypufGMyoIYESm5dt/JOT4MlhTtDw
6JzyVKtiM7lartUnUaW3ZoX4trQYT5gbXxWrJ2gCnUGy3VulikoXr1Rpz0qfdeOR
XN8OLkVAqHfTGFI7oKk24f9Adw96R5NPZcdCay90h4J/kMfCiC7ZyUUI1XIa5iy1
np5pZCkf8HNcdywML7qcFd5n2O0wchyFnRLFZo6kJP9Ls5cEi6kBx/1jSdTcNgx/
fUKXyoEcriGoQiiwn29+4RZnU69gJV3zqQNLPpuwDQ5F/Q1zHTlrr+dqzezKkzcO
dRTV2d2Q4A5vIDXPptzNNLlRQdrc8qxPJ1lxQVkPIU4/mtqczmZBwlyY2u9zwPyS
sehJgJZnoAf+jm71NgQAKLck4MUBsMnMogOWunhXkVRCoZlbbkUWX4ECZYwPKsVk
W7zVPmLvSM0l5g==
=/tXb
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for the new "riscv,isa-extensions" and "riscv,isa-base"
device tree interfaces for probing extensions
- Support for userspace access to the performance counters
- Support for more instructions in kprobes
- Crash kernels can be allocated above 4GiB
- Support for KCFI
- Support for ELFs in !MMU configurations
- ARCH_KMALLOC_MINALIGN has been reduced to 8
- mmap() defaults to sv48-sized addresses, with longer addresses hidden
behind a hint (similar to Arm and Intel)
- Also various fixes and cleanups
* tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
lib/Kconfig.debug: Restrict DEBUG_INFO_SPLIT for RISC-V
riscv: support PREEMPT_DYNAMIC with static keys
riscv: Move create_tmp_mapping() to init sections
riscv: Mark KASAN tmp* page tables variables as static
riscv: mm: use bitmap_zero() API
riscv: enable DEBUG_FORCE_FUNCTION_ALIGN_64B
riscv: remove redundant mv instructions
RISC-V: mm: Document mmap changes
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Restrict address space for sv39,sv48,sv57
riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent
riscv: allow kmalloc() caches aligned to the smallest value
riscv: support the elf-fdpic binfmt loader
binfmt_elf_fdpic: support 64-bit systems
riscv: Allow CONFIG_CFI_CLANG to be selected
riscv/purgatory: Disable CFI
riscv: Add CFI error handling
riscv: Add ftrace_stub_graph
riscv: Add types to indirectly called assembly functions
...
New Features:
* Enable the NFS v4.2 READ_PLUS operation by default
Stable Fixes:
* NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
* NFS: Fix a potential data corruption
Bugfixes:
* Fix various READ_PLUS issues including:
* smatch warnings
* xdr size calculations
* scratch buffer handling
* 32bit / highmem xdr page handling
* Fix checkpatch errors in file.c
* Fix redundant readdir request after an EOF
* Fix handling of COPY ERR_OFFLOAD_NO_REQ
* Fix assignment of xprtdata.cred
Cleanups:
* Remove unused xprtrdma function declarations
* Clean up an integer overflow check to avoid a warning
* Clean up #includes in dns_resolve.c
* Clean up nfs4_get_device_info so we don't pass a NULL pointer to __free_page()
* Clean up sunrpc TCP socket timeout configuration
* Guard against READDIR loops when entry names are too long
* Use EXCHID4_FLAG_USE_PNFS_DS for DS servers
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmTwzwYACgkQ18tUv7Cl
QOtIhBAA+BOh7MB6yjlyctFxABJiXz2x2Dehxy7Ox15LfnyStqQAUEpk35CXWvjC
iNxpZJ486+WrzM76WGEaRbECK9nTQLK1yacR3V1zpnDwHWIJA6VHN6qU4JAfSMu7
XbhWkHWry6d7PXhvqHlaiYvPX2pF39wUzfH+vLlzS2QLIkpT6LnG0zVRJTQvLCmq
zE5xD+NCQ1Dpo9VnouuzW7VVfm532hI7GQNrpo0E0vWKgeQD+/fOpDu23MW8A1Ua
ZgVMAc7vScgDZH8/20Ze5PH4jAEB4gwEIzjreQlIXr7Tf+mE7qn435lgOuvdMQCW
eHhdNriZ2X6HMLhNFFpup8bkRKGCCTooTHC1W66n9CuxIAuVT5DNwBbakpagHSZf
J4ho81hEgBfc5zppISVjV6eFK4brM0rF9AliaIw8r/qGcMmO1CILi9tLGiheiJcT
LuId7U2sE/vfIa6SiBt7rx37/MkrgLlAgjpk4dCRJQW+gKVBi09sMGnDlgaRvCZz
T0WCsK4DgI9q2rScpwJYJbNWbC2Q8qUtYWW9LSRvwhbNdm/VbRnEHWA7eOwqqm8r
KkkF4chyoTJqpnF3SjxT/lyFk6GwsD+wXafOmEeuFA6Si3dHDU9i3aUf+cCXhwRI
uUzCUHYcCKnv4QVGPuAbIdxMgueNCuLoeWgTClVlqidv7GRyz7Y=
=rjmq
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New Features:
- Enable the NFS v4.2 READ_PLUS operation by default
Stable Fixes:
- NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
- NFS: Fix a potential data corruption
Bugfixes:
- Fix various READ_PLUS issues including:
- smatch warnings
- xdr size calculations
- scratch buffer handling
- 32bit / highmem xdr page handling
- Fix checkpatch errors in file.c
- Fix redundant readdir request after an EOF
- Fix handling of COPY ERR_OFFLOAD_NO_REQ
- Fix assignment of xprtdata.cred
Cleanups:
- Remove unused xprtrdma function declarations
- Clean up an integer overflow check to avoid a warning
- Clean up #includes in dns_resolve.c
- Clean up nfs4_get_device_info so we don't pass a NULL pointer
to __free_page()
- Clean up sunrpc TCP socket timeout configuration
- Guard against READDIR loops when entry names are too long
- Use EXCHID4_FLAG_USE_PNFS_DS for DS servers"
* tag 'nfs-for-6.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (22 commits)
pNFS: Fix assignment of xprtdata.cred
NFSv4.2: fix handling of COPY ERR_OFFLOAD_NO_REQ
NFS: Guard against READDIR loop when entry names exceed MAXNAMELEN
NFSv4.1: use EXCHGID4_FLAG_USE_PNFS_DS for DS server
NFS/pNFS: Set the connect timeout for the pNFS flexfiles driver
SUNRPC: Don't override connect timeouts in rpc_clnt_add_xprt()
SUNRPC: Allow specification of TCP client connect timeout at setup
SUNRPC: Refactor and simplify connect timeout
SUNRPC: Set the TCP_SYNCNT to match the socket timeout
NFS: Fix a potential data corruption
nfs: fix redundant readdir request after get eof
nfs/blocklayout: Use the passed in gfp flags
filemap: Fix errors in file.c
NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
NFS: Move common includes outside ifdef
SUNRPC: clean up integer overflow check
xprtrdma: Remove unused function declaration rpcrdma_bc_post_recv()
NFS: Enable the READ_PLUS operation by default
SUNRPC: kmap() the xdr pages during decode
NFSv4.2: Rework scratch handling for READ_PLUS (again)
...
I'm thrilled to announce that the Linux in-kernel NFS server now
offers NFSv4 write delegations. A write delegation enables a client
to cache data and metadata for a single file more aggressively,
reducing network round trips and server workload. Many thanks to Dai
Ngo for contributing this facility, and to Jeff Layton and Neil
Brown for reviewing and testing it.
This release also sees the removal of all support for DES- and
triple-DES-based Kerberos encryption types in the kernel's SunRPC
implementation. These encryption types have been deprecated by the
Internet community for years and are considered insecure. This
change affects both the in-kernel NFS client and server.
The server's UDP and TCP socket transports have now fully adopted
David Howells' new bio_vec iterator so that no more than one
sendmsg() call is needed to transmit each RPC message. In
particular, this helps kTLS optimize record boundaries when sending
RPC-with-TLS replies, and it takes the server a baby step closer to
handling file I/O via folios.
We've begun work on overhauling the SunRPC thread scheduler to
remove a costly linked-list walk when looking for an idle RPC
service thread to wake. The pre-requisites are included in this
release. Thanks to Neil Brown for his ongoing work on this
improvement.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmTwoa0ACgkQM2qzM29m
f5cZvw/8CmFVNC27aMrJEhRRhwwrXbLzUkWh9GCYkG98PHYiLxLTvZ6qELXAax/a
UjSgIDSRcWl4z8M/tyBQtgsw7NADr+7XWqEoXR7HZ5pEEC/KNGM0oQWQ92ojjKYy
JmHdB02uaDJfcd9ioFTU13cw7q2BQfoe2xLI8yqis2vcVSu92AM7aIw+cvJIpwQB
inA3TIIsYTV/gPByXSfEtvmYACadoFiMvfvYwaWhjFS9MdSzFmcVG0Dp3EFIig29
odmWEofcz6uIvUWvUswWEGdoSu7uOKIztSuAI4PlTwaofUaSKG6e5kmtpr3cLERD
Uhg2lm5JgqkXBd7QHObNimJ4DtQzFwHmhA08qo8rd/zba75mn/Hr5IF0q3Rxs99J
SRYHcAeP8afKn5Ge0yzoTgCNcqhfz8KLRfoCQX49mljr+muNxld4nMklD2KdUwJi
XEB512/q3E4nUgopXZiSJYQYAq1CfdR5WpGipZ9X0XK9HZBDF/qhXGtk1YQuNWyj
ZxJS3bfBza4oVIvP5/ehjCIQwOvqkcrC5zZGDIgDvw9Q6L3L1wqmVntsdCLCLRcJ
jB4IOsj+DECfJ6w2vP2SZ3GeMtFnyuTQjhUTkjPuAKTBBiKo4Tj0o/agpfDYbWZy
1l3a2yH5jqJgkm4MaVh3YHRJGc0ub0ccpIrs3QQ4jvjMLQ/3Gcs=
=XGHs
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"I'm thrilled to announce that the Linux in-kernel NFS server now
offers NFSv4 write delegations. A write delegation enables a client to
cache data and metadata for a single file more aggressively, reducing
network round trips and server workload. Many thanks to Dai Ngo for
contributing this facility, and to Jeff Layton and Neil Brown for
reviewing and testing it.
This release also sees the removal of all support for DES- and
triple-DES-based Kerberos encryption types in the kernel's SunRPC
implementation. These encryption types have been deprecated by the
Internet community for years and are considered insecure. This change
affects both the in-kernel NFS client and server.
The server's UDP and TCP socket transports have now fully adopted
David Howells' new bio_vec iterator so that no more than one sendmsg()
call is needed to transmit each RPC message. In particular, this helps
kTLS optimize record boundaries when sending RPC-with-TLS replies, and
it takes the server a baby step closer to handling file I/O via
folios.
We've begun work on overhauling the SunRPC thread scheduler to remove
a costly linked-list walk when looking for an idle RPC service thread
to wake. The pre-requisites are included in this release. Thanks to
Neil Brown for his ongoing work on this improvement"
* tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (56 commits)
Documentation: Add missing documentation for EXPORT_OP flags
SUNRPC: Remove unused declaration rpc_modcount()
SUNRPC: Remove unused declarations
NFSD: da_addr_body field missing in some GETDEVICEINFO replies
SUNRPC: Remove return value of svc_pool_wake_idle_thread()
SUNRPC: make rqst_should_sleep() idempotent()
SUNRPC: Clean up svc_set_num_threads
SUNRPC: Count ingress RPC messages per svc_pool
SUNRPC: Deduplicate thread wake-up code
SUNRPC: Move trace_svc_xprt_enqueue
SUNRPC: Add enum svc_auth_status
SUNRPC: change svc_xprt::xpt_flags bits to enum
SUNRPC: change svc_rqst::rq_flags bits to enum
SUNRPC: change svc_pool::sp_flags bits to enum
SUNRPC: change cache_head.flags bits to enum
SUNRPC: remove timeout arg from svc_recv()
SUNRPC: change svc_recv() to return void.
SUNRPC: call svc_process() from svc_recv().
nfsd: separate nfsd_last_thread() from nfsd_put()
nfsd: Simplify code around svc_exit_thread() call in nfsd()
...
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmTuK5AACgkQiiy9cAdy
T1EggQv/RNMaOguGItp1qUgOg9XuHSboLVkCJgwoCTl0cb7VYWFlj8G4IOi37WoQ
M3SXVQ/DjxB3eSms2LaIFKFjyVZXzZZ9GFjb+dBGssLWB5Zdk6Ez+IJLOpwNUar1
nwSC0kU/Lqj/gUnmUsmDlmV2Y/14uTeZEh6RiA1IzDMOAEr8KEuakgFzRg5DudYM
CgZWfO466A48N/YXGmNNqq+8RVEtKaM3A31NZKgAsm4Lw03+V8JwYK/sSx8HWRBx
heb8Goa7AUIbpggtoVnWf6PPzJsWOgELrVzvUYdyj7JD5HzaY0LDVOJ6YYyuRTBP
M4n7yZT0mlAFDflHMvydaOKNJS+6HlE94xVPySo/S8uJJ9hHWcMqe8oJov11h6CT
a76Q7bMkBBXK6GfEjetIY6qWwhN78M1d/Rf9EJRll+d4vIU1i5gPpCTptvfTqMCc
A53bROyc3TPcH7A5PBWK0ecENIJ0S4wQd+7UzspQjXj+dk429CYF0+bksfWhijjf
ubEIo9fE
=4EB/
-----END PGP SIGNATURE-----
Merge tag '6.6-rc-ksmbd-fixes-part1' of git://git.samba.org/ksmbd
Pull smb server updates from Steve French:
- fix potential overflows in decoding create and in session setup
requests
- cleanup fixes
- compounding fixes, including one for MacOS compounded read requests
- session setup error handling fix
- fix mode bit bug when applying force_directory_mode and
force_create_mode
- RDMA (smbdirect) write fix
* tag '6.6-rc-ksmbd-fixes-part1' of git://git.samba.org/ksmbd:
ksmbd: add missing calling smb2_set_err_rsp() on error
ksmbd: replace one-element array with flex-array member in struct smb2_ea_info
ksmbd: fix slub overflow in ksmbd_decode_ntlmssp_auth_blob()
ksmbd: fix wrong DataOffset validation of create context
ksmbd: Fix one kernel-doc comment
ksmbd: reduce descriptor size if remaining bytes is less than request size
ksmbd: fix `force create mode' and `force directory mode'
ksmbd: fix wrong interim response on compound
ksmbd: add support for read compound
ksmbd: switch to use kmemdup_nul() helper
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAmTwrykACgkQNqiEXrVA
jGSIxg/9EkmkMiFyAFymr1EYVavngY7RsTwG4CpCv3jKCthjrsBi5PN9whJPTfBg
FCV3tvzGSk8pIpEADqywuYp9+e+0/gqNwExlyr1+JCatPtXeFpBN8yJN/a2u7zon
+CXmcSn6veKuVHWptBdxoQVwjjhznw12psa+kPiGPe/q4uZyFIvVAnDUkEeURV3f
dT7yOG6KEMMq7NZis1t2Tf9fuflzYpKOmF7qzTWAGOCXhbJbHWB51wMpFKJSyqP8
kxZQ9GvdjDMnI3V+IbV7WktN07ztGGiJ3SGRNuQFbkL8xCf6KTySgGnieTj8vBod
lg/UFEZrd2ZL9f+hUTyWeta+dhEVAAqnUJpMuyfMWBGg1ae4U6IO2t+Q7xM1zGLg
qGHfxka9C5tvKToldLsaoFBfW+9+KxCxyrI25FkxSXzJBJWnSaq/IC1/QEbubqiY
2zAD7hh/B8c3rzLIwIfGptRDoeMu8yiWx3I5jISZHZG5Azkui1VqC7slXCpcqhLF
7PoJHZ4hemK2zkPwCjZ914lHuCtePDtvvHkEL5G1tK8kW3e9k1Sk314zck69Oyjw
IuXICm14Qu5Pp8QLBrXTzXenoUXKiIwm+GIW7UkIzGRrKaLCMc8YyDvvdp4UoG5H
Pg+8Y93P/fvRbRcfm9jk1BWqaUFuIWRyzxQnMv8pN1xxabrgnGQ=
=W5Xa
-----END PGP SIGNATURE-----
Merge tag 'jfs-6.6' of github.com:kleikamp/linux-shaggy
Pull jfs updates from Dave Kleikamp:
"A few small fixes"
* tag 'jfs-6.6' of github.com:kleikamp/linux-shaggy:
jfs: validate max amount of blocks before allocation.
jfs: remove redundant initialization to pointer ip
jfs: fix invalid free of JFS_IP(ipimap)->i_imap in diUnmount
FS: JFS: (trivial) Fix grammatical error in extAlloc
fs/jfs: prevent double-free in dbUnmount() after failed jfs_remount()
* Cleanups in the ext4 remount code when going to and from read-only
* Cleanups in ext4's multiblock allocator
* Cleanups in the jbd2 setup/mounting code paths
* Performance improvements when appending to a delayed allocation file
* Miscenallenous syzbot and other bug fixes
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmTwqUMACgkQ8vlZVpUN
gaMqgwf6Aui6MlrtNJx6CrJt4dxLANQ8G6bsJ2Zr+6QNS1X/GAUrCCyLWWom1dfb
OJ/n4/JUCNc9v5yLCTqHOE5ZFTdQItOBJUKXbJYff8EdnR+zCUULpj6bPbEs5BKp
U7CiiZ9TIi9S2TWezvIJKIa2VxgPej7CH/HOt8ISh/Msq8nHvcEEJIyOEvVk9odQ
LEkiQCsikWaljB7qEOIYo+xgFffMZfttc4zuTkdr/h1I6OWhvQYmlwSnTuAiE7BS
BVf3ebD2Dg8TChUMXOsk2d743iZNWf/+yTfbXVu93/uEM9vgF0+HO6EerTK8RMeM
yxhshg9z7ccuFjdY/2NYDXe6pEuDKw==
=cMIX
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Many ext4 and jbd2 cleanups and bug fixes:
- Cleanups in the ext4 remount code when going to and from read-only
- Cleanups in ext4's multiblock allocator
- Cleanups in the jbd2 setup/mounting code paths
- Performance improvements when appending to a delayed allocation file
- Miscellaneous syzbot and other bug fixes"
* tag 'ext4_for_linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits)
ext4: fix slab-use-after-free in ext4_es_insert_extent()
libfs: remove redundant checks of s_encoding
ext4: remove redundant checks of s_encoding
ext4: reject casefold inode flag without casefold feature
ext4: use LIST_HEAD() to initialize the list_head in mballoc.c
ext4: do not mark inode dirty every time when appending using delalloc
ext4: rename s_error_work to s_sb_upd_work
ext4: add periodic superblock update check
ext4: drop dio overwrite only flag and associated warning
ext4: add correct group descriptors and reserved GDT blocks to system zone
ext4: remove unused function declaration
ext4: mballoc: avoid garbage value from err
ext4: use sbi instead of EXT4_SB(sb) in ext4_mb_new_blocks_simple()
ext4: change the type of blocksize in ext4_mb_init_cache()
ext4: fix unttached inode after power cut with orphan file feature enabled
jbd2: correct the end of the journal recovery scan range
ext4: ext4_get_{dev}_journal return proper error value
ext4: cleanup ext4_get_dev_journal() and ext4_get_journal()
jbd2: jbd2_journal_init_{dev,inode} return proper error return value
jbd2: drop useless error tag in jbd2_journal_wipe()
...
Changes include:
- Allow blocking posix lock requests to be interrupted while waiting.
This requires a cancel request to be sent to the userspace daemon
where posix lock requests are processed across the cluster.
- Fix a posix lock patch from the previous cycle in which lock requests
from different file systems could be mixed up.
- Fix some long standing problems with nfs posix lock cancelation.
- Add a new debugfs file for printing queued callbacks.
- Stop modifying buffers that have been used to receive a message.
- Misc cleanups and some refactoring.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJk8KCgAAoJEDgbc8f8gGmqfk4P/jB4L2qwaamq2mNRxFPXSzpp
y5UiNoMG8Mw4OT9vytu2xzmmrYT7d1TvZ4lNcLYjkNYmcyuTZzu8o/kvGwt9gnXC
94DPmGQb0RQY/pZOdTMcIBplXiCSFpooweFOQjiWo7wlwVlYGVcfEIv9xQTNIT2/
m0niBFEWDDbVudbWXXaa4lnvo07RTmSxiHjtxqbkea2jLUgxw9mYOR8C6De3rlJf
Uh450Kitktak9tywBZa3yj8Cgy8SbiWNHlNvcV1DI3QE7LKOM5+6qVuwERYYx9lw
JbdtEoRr97QFf4w40YrJpAxFBiHCLXAquz3D3cJI8mW0RDqDuGUFU6SfsCfQEza6
Dau6XrtfuumArMn/zViBIase9xkSb36RNFopr2Si6mUoLpPalUPuLr+42qmxZY3c
KOvWis4UFq4OiOqZY5gBBS6IKoJ+X4pVnNJswScvKFA2VBLCf9fucKRoEVOAUTbg
BoJEwOjBQCoaATbGBHjwdjZ4yX/x/tLN0LsPW202QOMGdfSdeD6Wr+COyS916eVK
8Nk3lcBcU21Nhulf2Ci3Zr6B9nG09UqDRHYfH0LJJX0dq++SBRvQvjI2lcdJ0Dvj
We7nVqhcW/r486oS/r8kTXOtctYYMxecoQFYPcVufQAIU8+6YZUD53wui8EyVL/2
3GmejZgMomvGn8D4kNPC
=BBCe
-----END PGP SIGNATURE-----
Merge tag 'dlm-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
- Allow blocking posix lock requests to be interrupted while waiting.
This requires a cancel request to be sent to the userspace daemon
where posix lock requests are processed across the cluster.
- Fix a posix lock patch from the previous cycle in which lock requests
from different file systems could be mixed up.
- Fix some long standing problems with nfs posix lock cancelation.
- Add a new debugfs file for printing queued callbacks.
- Stop modifying buffers that have been used to receive a message.
- Misc cleanups and some refactoring.
* tag 'dlm-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: fix plock lookup when using multiple lockspaces
fs: dlm: don't use RCOM_NAMES for version detection
fs: dlm: create midcomms nodes when configure
fs: dlm: constify receive buffer
fs: dlm: drop rxbuf manipulation in dlm_recover_master_copy
fs: dlm: drop rxbuf manipulation in dlm_copy_master_names
fs: dlm: get recovery sequence number as parameter
fs: dlm: cleanup lock order
fs: dlm: remove clear_members_cb
fs: dlm: add plock dev tracepoints
fs: dlm: check on plock ops when exit dlm
fs: dlm: debugfs for queued callbacks
fs: dlm: remove unused processed_nodes
fs: dlm: add missing spin_unlock
fs: dlm: fix F_CANCELLK to cancel pending request
fs: dlm: allow to F_SETLKW getting interrupted
fs: dlm: remove twice newline
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZPBy4AAKCRCRxhvAZXjc
ok3jAP9+iZREbmcPgrAUGZOjq7+Gx1kJ297Uw/LKiWmxZeX2NwD/cKyv239YXHBM
CB4dCwk3pvBZ8uD4dUonDX3PJYFauAU=
=knzN
-----END PGP SIGNATURE-----
Merge tag 'v6.6-vfs.super.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull more superblock follow-on fixes from Christian Brauner:
"This contains two more small follow-up fixes for the super work this
cycle. I went through all filesystems once more and detected two minor
issues that still needed fixing:
- Some filesystems support mtd devices (e.g., mount -t jffs2 mtd2
/mnt). The mtd infrastructure uses the sb->s_mtd pointer to find an
existing superblock. When the mtd device is put and sb->s_mtd
cleared the superblock can still be found fs_supers and so this
risks a use-after-free.
Add a small patch that aligns mtd with what we did for regular
block devices and switch keying to rely on sb->s_dev.
(This was tested with mtd devices and jffs2 as xfstests doesn't
support mtd devices.)
- Switch nfs back to rely on kill_anon_super() so the superblock is
removed from the list of active supers before sb->s_fs_info is
freed"
* tag 'v6.6-vfs.super.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
NFS: switch back to using kill_anon_super
mtd: key superblock by device number
fs: export sget_dev()
Commit 1756ddea69 ("pstore: Remove worst-case compression size logic")
removed some clunky per-algorithm worst case size estimation routines on
the basis that we can always store pstore records uncompressed, and
these worst case estimations are about how much the size might
inadvertently *increase* due to encapsulation overhead when the input
cannot be compressed at all. So if compression results in a size
increase, we just store the original data instead.
However, it seems that the original code was misinterpreting these
calculations as an estimation of how much uncompressed data might fit
into a compressed buffer of a given size, and it was using the results
to consume the input data in larger chunks than the pstore record size,
relying on the compression to ensure that what ultimately gets stored
fits into the available space.
One result of this, as observed and reported by Linus, is that upgrading
to a newer kernel that includes the given commit may result in pstore
decompression errors reported in the kernel log. This is due to the fact
that the existing records may unexpectedly decompress to a size that is
larger than the pstore record size.
Another potential problem caused by this change is that we may
underutilize the fixed sized records on pstore backends such as ramoops.
And on pstore backends with variable sized records such as EFI, we will
end up creating many more entries than before to store the same amount
of compressed data.
So let's fix both issues, by bringing back the typical case estimation of
how much ASCII text captured from the dmesg log might fit into a pstore
record of a given size after compression. The original implementation
used the computation given below for zlib:
switch (size) {
/* buffer range for efivars */
case 1000 ... 2000:
cmpr = 56;
break;
case 2001 ... 3000:
cmpr = 54;
break;
case 3001 ... 3999:
cmpr = 52;
break;
/* buffer range for nvram, erst */
case 4000 ... 10000:
cmpr = 45;
break;
default:
cmpr = 60;
break;
}
return (size * 100) / cmpr;
We will use the previous worst-case of 60% for compression. For
decompression go extra large (3x) so we make sure there's enough space
for anything.
While at it, rate limit the error message so we don't flood the log
unnecessarily on systems that have accumulated a lot of pstore history.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230830212238.135900-1-ardb@kernel.org
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Convert IBT selftest to asm to fix objtool warning
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTv1QQACgkQaDWVMHDJ
krAUwhAAn6TOwHJK8BSkHeiQhON1nrlP3c5cv0AyZ2NP8RYDrZrSZvhpYBJ6wgKC
Cx5CGq5nn9twYsYS3KsktLKDfR3lRdsQ7K9qtyFtYiaeaVKo+7gEKl/K+klwai8/
gninQWHk0zmSCja8Vi77q52WOMkQKapT8+vaON9EVDO8dVEi+CvhAIfPwMafuiwO
Rk4X86SzoZu9FP79LcCg9XyGC/XbM2OG9eNUTSCKT40qTTKm5y4gix687NvAlaHR
ko5MTsdl0Wfp6Qk0ohT74LnoA2c1g/FluvZIM33ci/2rFpkf9Hw7ip3lUXqn6CPx
rKiZ+pVRc0xikVWkraMfIGMJfUd2rhelp8OyoozD7DB7UZw40Q4RW4N5tgq9Fhe9
MQs3p1v9N8xHdRKl365UcOczUxNAmv4u0nV5gY/4FMC6VjldCl2V9fmqYXyzFS4/
Ogg4FSd7c2JyGFKPs+5uXyi+RY2qOX4+nzHOoKD7SY616IYqtgKoz5usxETLwZ6s
VtJOmJL0h//z0A7tBliB0zd+SQ5UQQBDC2XouQH2fNX2isJMn0UDmWJGjaHgK6Hh
8jVp6LNqf+CEQS387UxckOyj7fu438hDky1Ggaw4YqowEOhQeqLVO4++x+HITrbp
AupXfbJw9h9cMN63Yc0gVxXQ9IMZ+M7UxLtZ3Cd8/PVztNy/clA=
=3UUm
-----END PGP SIGNATURE-----
Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 shadow stack support from Dave Hansen:
"This is the long awaited x86 shadow stack support, part of Intel's
Control-flow Enforcement Technology (CET).
CET consists of two related security features: shadow stacks and
indirect branch tracking. This series implements just the shadow stack
part of this feature, and just for userspace.
The main use case for shadow stack is providing protection against
return oriented programming attacks. It works by maintaining a
secondary (shadow) stack using a special memory type that has
protections against modification. When executing a CALL instruction,
the processor pushes the return address to both the normal stack and
to the special permission shadow stack. Upon RET, the processor pops
the shadow stack copy and compares it to the normal stack copy.
For more information, refer to the links below for the earlier
versions of this patch set"
Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/
Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/
* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
x86/shstk: Change order of __user in type
x86/ibt: Convert IBT selftest to asm
x86/shstk: Don't retry vm_munmap() on -EINTR
x86/kbuild: Fix Documentation/ reference
x86/shstk: Move arch detail comment out of core mm
x86/shstk: Add ARCH_SHSTK_STATUS
x86/shstk: Add ARCH_SHSTK_UNLOCK
x86: Add PTRACE interface for shadow stack
selftests/x86: Add shadow stack test
x86/cpufeatures: Enable CET CR4 bit for shadow stack
x86/shstk: Wire in shadow stack interface
x86: Expose thread features in /proc/$PID/status
x86/shstk: Support WRSS for userspace
x86/shstk: Introduce map_shadow_stack syscall
x86/shstk: Check that signal frame is shadow stack mem
x86/shstk: Check that SSP is aligned on sigreturn
x86/shstk: Handle signals for shadow stack
x86/shstk: Introduce routines modifying shstk
x86/shstk: Handle thread shadow stack
x86/shstk: Add user-mode shadow stack support
...
NLS_UCS2_UTILS is an option selected by filesystems that need it,
don't expose it to users.
Fixes: 089f7f5913 ("fs/smb: Swing unicode common code from smb->NLS")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently with directory leases we cache directory contents for a fixed period
of time (default 30 seconds) but for many workloads this is too short. Allow
configuring the maximum amount of time directory entries are cached when a
directory lease is held on that directory. Add module load parm "max_dir_cache"
For example to set the timeout to 10 minutes you would do:
echo 600 > /sys/module/cifs/parameters/dir_cache_timeout
or to disable caching directory contents:
echo 0 > /sys/module/cifs/parameters/dir_cache_timeout
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The num_fwd in MClientRequestForward is int32_t, while the num_fwd
in ceph_mds_request_head is __u8. This is buggy when the num_fwd
is larger than 256 it will always be truncate to 0 again. But the
client couldn't recoginize this.
This will make them to __u32 instead. Because the old cephs will
directly copy the raw memories when decoding the reqeust's head,
so we need to make sure this kclient will be compatible with old
cephs. For newer cephs they will decode the requests depending
the version, which will be much simpler and easier to extend new
members.
Link: https://tracker.ceph.com/issues/62145
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
NFS switch to open coding kill_anon_super in 7b14a21389
("nfs: don't call bdi_unregister") to avoid the extra bdi_unregister
call. At that point bdi_destroy was called in nfs_free_server and
thus it required a later freeing of the anon dev_t. But since
0db10944a7 ("nfs: Convert to separately allocated bdi") the bdi has
been free implicitly by the sb destruction, so this isn't needed
anymore.
By not open coding kill_anon_super, nfs now inherits the fix in
dc3216b141 ("super: ensure valid info"), and we remove the only
open coded version of kill_anon_super.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230831052940.256193-1-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
They will be used for mtd devices as well.
Acked-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230829-vfs-super-mtd-v1-1-fecb572e5df3@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
In addition to the EINVAL, there may be an ENOMEM.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 70431bfd82 ("cifs: Support fscache indexing rewrite")
Signed-off-by: Katya Orlova <e.orlova@ispras.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmTvk44ACgkQiiy9cAdy
T1GT+wwAkiM+BFVoW0QhLK9ztPptYofGiTKr1AySV09AWos9Fwdhv0aS4LDKhauW
PVsORfnFcLdyAHtgX2DlhJHMpLWDz3Z51KWiUSo7AAZjIp/4K0yEarg4WKPtUPN0
PMET2OuqAfIfYLCxSZYFjiGK6xgSJEz+xIhX0qJPRZsyJp50WlFlyZRUfFa+6hXt
pguatCVw4qhP9hkdcklCY8rwlFDdWEHj9wD/PB2Qschw4gzxDUMwOJjDgT6PNxjA
SAC6J+NQVtMcnASd5pn0+Mbc+vNfKZ0PM+KZcDrJphcBz+arY6Hu57v3/yu2y++L
DqRI6QtEwVmHzytM51x5JaWFE0Asj/NsH69LVm4bXVIkkcXBut6lLhrd/KVSP+xN
LY4EcYEoufAAaecrQrMO4x2Tm10f+GMi1Fh9NvpLZVRrUXy4rdxLP2aC+q+i3uY3
34FaAbpjQ7NJq2yZTL8xDOdCvi8E3t58DsBv4jA9Y/SYGWYao8Kw0vxhCt0SVZPc
HaoMfkxl
=CtQO
-----END PGP SIGNATURE-----
Merge tag '6.6-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client updates from Steve French:
- fixes for excessive stack usage
- multichannel reconnect improvements
- DFS fix and cleanup patches
- move UCS-2 conversion code to fs/nls and update cifs and jfs to use
them
- cleanup patch for compounding, one to fix confusing function name
- inode number collision fix
- reparse point fixes (including avoiding an extra unneeded query on
symlinks) and a minor cleanup
- directory lease (caching) improvement
* tag '6.6-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (24 commits)
fs/jfs: Use common ucs2 upper case table
fs/smb/client: Use common code in client
fs/smb: Swing unicode common code from smb->NLS
fs/smb: Remove unicode 'lower' tables
SMB3: rename macro CIFS_SERVER_IS_CHAN to avoid confusion
[SMB3] send channel sequence number in SMB3 requests after reconnects
cifs: update desired access while requesting for directory lease
smb: client: reduce stack usage in smb2_query_reparse_point()
smb: client: reduce stack usage in smb2_query_info_compound()
smb: client: reduce stack usage in smb2_set_ea()
smb: client: reduce stack usage in smb_send_rqst()
smb: client: reduce stack usage in cifs_demultiplex_thread()
smb: client: reduce stack usage in cifs_try_adding_channels()
smb: cilent: set reparse mount points as automounts
smb: client: query reparse points in older dialects
smb: client: do not query reparse points twice on symlinks
smb: client: parse reparse point flag in create response
smb: client: get rid of dfs code dep in namespace.c
smb: client: get rid of dfs naming in automount code
smb: client: rename cifs_dfs_ref.c to namespace.c
...
* Chandan Babu will be taking over as the XFS release manager. He has
reviewed all the patches that are in this branch, though I'm signing
the branch one last time since I'm still technically maintainer. :P
* Create a maintainer entry profile for XFS in which we lay out the
various roles that I have played for many years. Aside from release
manager, the remaining roles are as yet unfilled.
* Start merging online repair -- we now have in-memory pageable memory
for staging btrees, a bunch of pending fixes, and we've started the
process of refactoring the scrub support code to support more of
repair. In particular, reaping of old blocks from damaged structures.
* Scrub the realtime summary file.
* Fix a bug where scrub's quota iteration only ever returned the root
dquot. Oooops.
* Fix some typos.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZOQE2AAKCRBKO3ySh0YR
pvmZAQDe+KceaVx6Dv2f9ihckeS2dILSpDTo1bh9BeXnt005VwD/ceHTaJxEl8lp
u/dixFDkRgp9RYtoTAK2WNiUxYetsAc=
=oZN6
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.6-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Chandan Babu:
- Chandan Babu will be taking over as the XFS release manager. He has
reviewed all the patches that are in this branch, though I'm signing
the branch one last time since I'm still technically maintainer. :P
- Create a maintainer entry profile for XFS in which we lay out the
various roles that I have played for many years. Aside from release
manager, the remaining roles are as yet unfilled.
- Start merging online repair -- we now have in-memory pageable memory
for staging btrees, a bunch of pending fixes, and we've started the
process of refactoring the scrub support code to support more of
repair. In particular, reaping of old blocks from damaged structures.
- Scrub the realtime summary file.
- Fix a bug where scrub's quota iteration only ever returned the root
dquot. Oooops.
- Fix some typos.
[ Pull request from Chandan Babu, but signed tag and description from
Darrick Wong, thus the first person singular above is Darrick, not
Chandan ]
* tag 'xfs-6.6-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (37 commits)
fs/xfs: Fix typos in comments
xfs: fix dqiterate thinko
xfs: don't check reflink iflag state when checking cow fork
xfs: simplify returns in xchk_bmap
xfs: rewrite xchk_inode_is_allocated to work properly
xfs: hide xfs_inode_is_allocated in scrub common code
xfs: fix agf_fllast when repairing an empty AGFL
xfs: allow userspace to rebuild metadata structures
xfs: clear pagf_agflreset when repairing the AGFL
xfs: allow the user to cancel repairs before we start writing
xfs: don't complain about unfixed metadata when repairs were injected
xfs: implement online scrubbing of rtsummary info
xfs: always rescan allegedly healthy per-ag metadata after repair
xfs: move the realtime summary file scrubber to a separate source file
xfs: wrap ilock/iunlock operations on sc->ip
xfs: get our own reference to inodes that we want to scrub
xfs: track usage statistics of online fsck
xfs: improve xfarray quicksort pivot
xfs: create scaffolding for creating debugfs entries
xfs: cache pages used for xfarray quicksort convergence
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmTvFssACgkQnJ2qBz9k
QNl7HggAuY154urYJdh7M+mbKDSywhcK0YT5pNNkcXVpv/t2c073Ce57+ObDCBaS
xetyFgH2XlvuAJ4dWmRDwBEzJ0jquKzvYJEMiXAexgy47ctnNPx5kLPsXpt3g+2q
pro7sK1b5BmX/zrgOontbJ8/YAwX85XToD4Cv5XyNSx/ex6/zsd5FProfdiY/HAt
qAcv7NkNTBbJBEBHhBNQSL2wOj3LzQV1U8v0XEcsBvTUxlX2jH8J4CsuFIotXqCF
37SNvZPk2c04HbaLgyU4Ura69qD0fn4vTMocuCoaf0CN2PL5jblRAwsAO2bfSqJE
AxZFq3afI0YV3Y9OrVlzHtSALuiZMQ==
=QPEQ
-----END PGP SIGNATURE-----
Merge tag 'for_v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, quota, and udf updates from Jan Kara:
- fixes for possible use-after-free issues with quota when racing with
chown
- fixes for ext2 crashing when xattr allocation races with another
block allocation to the same file from page writeback code
- fix for block number overflow in ext2
- marking of reiserfs as obsolete in MAINTAINERS
- assorted minor cleanups
* tag 'for_v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: Fix kernel-doc warnings
ext2: improve consistency of ext2_fsblk_t datatype usage
ext2: dump current reservation window info
ext2: fix race between setxattr and write back
ext2: introduce new flags argument for ext2_new_blocks()
ext2: remove ext2_new_block()
ext2: fix datatype of block number in ext2_xattr_set2()
udf: Drop pointless aops assignment
quota: use lockdep_assert_held_write in dquot_load_quota_sb
MAINTAINERS: change reiserfs status to obsolete
udf: Fix -Wstringop-overflow warnings
quota: simplify drop_dquot_ref()
quota: fix dqput() to follow the guarantees dquot_srcu should provide
quota: add new helper dquot_active()
quota: rename dquot_active() to inode_quota_active()
quota: factor out dquot_write_dquot()
ext2: remove redundant assignment to variable desc and variable best_desc
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmTu0QoACgkQEVvVyTe/
1WpbzBAAjIZXzhn8KldDpG0muw9JKaSOxM45uhZE1s/2uKsVCyp4k3lubTbxxYO1
S9rUjhF2gSJFOfuSOK/XXEKXyu4MGT7iy7pKswu0k8+AHDDRBksPXJKA/AkhLPUr
vX1pU6aWw2OSn1xdhIgY+F4DveyzYQL/CEoUzFyRPxSB0G/yjktRAjdZ2HL4cAvN
eVXPyTj0bd4LVj1ITla4uj8DbgivrqmRJbZ9bKnSRE8GXWBriJhV//M2Q3QRno+W
04TtAvyh+klQeqZFVOQ0reZUFZzYBBZZTmqoFiUzTny7oljWl5F0+JfJOHhRGknG
LYZCia34+T6TZPhOnZzT/szTDoXVvNJhEf+vBQCqhaCugqJc/2uJdw9CW8ZcDvA9
ZNOMxEbXE4VgGjJ0HM6MoDMUoIEUiNWEnXWEaKyCAfOPqgYwPy+QeDO4JtBPQpRn
fwZx7Xpc1FLpTc9feHxzox9o81S8rPRMycUBg2c3KZB6TFnYNDxWIIo365naMCzz
A8IDVGf+gd+S4NaZvh9FUijciIslYfyFgqwQERZmJnpDk1d1NyeUC7Nn7EkmUpyp
guRaC+rUcqYP4CpuSHTCPle94qHqiAkbsKSJWebZ2M1j9fjZ+okPw0k83Nih79vu
vRhs70Ah51v1lpBb0mlDjsV3vKm3Apv8nMJKZvVuC+Cw6Qiob5s=
=F4Hi
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs updates from Amir Goldstein:
- add verification feature needed by composefs (Alexander Larsson)
- improve integration of overlayfs and fanotify (Amir Goldstein)
- fortify some overlayfs code (Andrea Righi)
* tag 'ovl-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: validate superblock in OVL_FS()
ovl: make consistent use of OVL_FS()
ovl: Kconfig: introduce CONFIG_OVERLAY_FS_DEBUG
ovl: auto generate uuid for new overlay filesystems
ovl: store persistent uuid/fsid with uuid=on
ovl: add support for unique fsid per instance
ovl: support encoding non-decodable file handles
ovl: Handle verity during copy-up
ovl: Validate verity xattr when resolving lowerdata
ovl: Add versioned header for overlay.metacopy xattr
ovl: Add framework for verity support
The comma at the end of the line was leftover from an earlier refactor
of the _nfs4_pnfs_v3_ds_connect() function. This is technically valid C,
so the compilers didn't catch it, but if I'm understanding how it works
correctly it assigns the return value of rpc_clnt_add_xprtr() to
xprtdata.cred.
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: a12f996d34 ("NFSv4/pNFS: Use connections to a DS that are all of the same protocol family")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If the client sent a synchronous copy and the server replied with
ERR_OFFLOAD_NO_REQ indicating that it wants an asynchronous
copy instead, the client should retry with asynchronous copy.
Fixes: 539f57b3e0 ("NFS handle COPY ERR_OFFLOAD_NO_REQS")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Commit 64cfca85ba asserts the only valid return values for
nfs2/3_decode_dirent should not include -ENAMETOOLONG, but for a server
that sends a filename3 which exceeds MAXNAMELEN in a READDIR response the
client's behavior will be to endlessly retry the operation.
We could map -ENAMETOOLONG into -EBADCOOKIE, but that would produce
truncated listings without any error. The client should return an error
for this case to clearly assert that the server implementation must be
corrected.
Fixes: 64cfca85ba ("NFS: Return valid errors from nfs2/3_decode_dirent()")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Use the UCS-2 upper case tables from nls, that are shared
with smb.
This code in JFS is hard to test, so we're only reusing the
same tables (which are identical), not trying to reuse the
rest of the helper functions.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Now we've got the common code, use it for the client as well.
Note there's a change here where we're using the server version of
UniStrcat now which had different types (__le16 vs wchar_t) but
it's not interpreting the value other than checking for 0, however
we do need casts to keep sparse happy.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Swing most of the inline functions and unicode tables into nls
from the copy in smb/server. This is UCS-2 rather than most
of the rest of the code in NLS, but it currently seems like the
best place for it.
The actual unicode.c implementations vary much more between server
and client so they're unmoved.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The unicode glue in smb/*/..uniupr.h has a section guarded
by 'ifndef UNIUPR_NOLOWER' - but that's always
defined in smb/*/..unicode.h. Nuke those tables.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Since older dialects such as CIFS do not support multichannel
the macro CIFS_SERVER_IS_CHAN can be confusing (it requires SMB 3
or later) so shorten its name to "SERVER_IS_CHAN"
Suggested-by: Tom Talpey <tom@talpey.com>
Acked-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmTs08EQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpqa4EACu/zKE+omGXBV0Q7kEpVsChjp0ElGtSDIJ
tJfTuvnWqQjrqRv4ksmZvGdx8SkqFuXri4/7oBXlsaqeUVbIQdWJUpLErBye6nxa
lUb6nXOFWwyG94cMRYs71lN0loosjb7aiVw7oVLAIhntq3p3doFl/cyy3ndMZrUE
pZbsrWSt4QiOKhcO0TtIjfAwsr31AN51qFiNNITEiZl3UjXfkGRCK81X0yM2N8zZ
7Y0h1ldPBsZ/olNWeRyaW1uB64nKM0buR7/nDxCV/NI05nndJ34bIgo/JIj4xy0v
SiBj2+y86+oMJZt17yYENwOQdtX3hbyESGuVm9dCrO0t9/byVQxkUk0OMm65BM/l
l2d+gmMQZTbHziqfLlgq9i3i9+B4C2hsb7iBpuo7SW/FPbM45POgi3lpiZycaZyu
krQo1qwL4KSGXzGN9CabEuKDcJcXqLxqMDOyEDA3R5Kz06V9tNuM+Di/mr4vuZHK
sVHUfHuWBO9ionLlGPdc3fH/CuMqic8SHjumiAm2menBZV6cSzRDxpm6H4CyLt7y
tWmw7BNU7dfHFGd+Jw0Ld49sAuEybszEXq6qYv5uYBVfJNqDvOvEeVoQp0RN2jJA
AG30hymcZgxn9n7gkIgkPQDgIGUjnzUR8B2mE2UFU1CYVHXYXAXU55CCI5oeTkbs
d0Y/zCZf1A==
=p1bd
-----END PGP SIGNATURE-----
Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
"Pretty quiet round for this release. This contains:
- Add support for zoned storage to ublk (Andreas, Ming)
- Series improving performance for drivers that mark themselves as
needing a blocking context for issue (Bart)
- Cleanup the flush logic (Chengming)
- sed opal keyring support (Greg)
- Fixes and improvements to the integrity support (Jinyoung)
- Add some exports for bcachefs that we can hopefully delete again in
the future (Kent)
- deadline throttling fix (Zhiguo)
- Series allowing building the kernel without buffer_head support
(Christoph)
- Sanitize the bio page adding flow (Christoph)
- Write back cache fixes (Christoph)
- MD updates via Song:
- Fix perf regression for raid0 large sequential writes (Jan)
- Fix split bio iostat for raid0 (David)
- Various raid1 fixes (Heinz, Xueshi)
- raid6test build fixes (WANG)
- Deprecate bitmap file support (Christoph)
- Fix deadlock with md sync thread (Yu)
- Refactor md io accounting (Yu)
- Various non-urgent fixes (Li, Yu, Jack)
- Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li,
Ming, Nitesh, Ruan, Tejun, Thomas, Xu)"
* tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits)
block: use strscpy() to instead of strncpy()
block: sed-opal: keyring support for SED keys
block: sed-opal: Implement IOC_OPAL_REVERT_LSP
block: sed-opal: Implement IOC_OPAL_DISCOVERY
blk-mq: prealloc tags when increase tagset nr_hw_queues
blk-mq: delete redundant tagset map update when fallback
blk-mq: fix tags leak when shrink nr_hw_queues
ublk: zoned: support REQ_OP_ZONE_RESET_ALL
md: raid0: account for split bio in iostat accounting
md/raid0: Fix performance regression for large sequential writes
md/raid0: Factor out helper for mapping and submitting a bio
md raid1: allow writebehind to work on any leg device set WriteMostly
md/raid1: hold the barrier until handle_read_error() finishes
md/raid1: free the r1bio before waiting for blocked rdev
md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io()
blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init
drivers/rnbd: restore sysfs interface to rnbd-client
md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid()
raid6: test: only check for Altivec if building on powerpc hosts
raid6: test: make sure all intermediate and artifact files are .gitignored
...
Long ago we set out to remove the kitchen sink on kernel/sysctl.c arrays and
placings sysctls to their own sybsystem or file to help avoid merge conflicts.
Matthew Wilcox pointed out though that if we're going to do that we might as
well also *save* space while at it and try to remove the extra last sysctl
entry added at the end of each array, a sentintel, instead of bloating the
kernel by adding a new sentinel with each array moved.
Doing that was not so trivial, and has required slowing down the moves of
kernel/sysctl.c arrays and measuring the impact on size by each new move.
The complex part of the effort to help reduce the size of each sysctl is being
done by the patient work of el señor Don Joel Granados. A lot of this is truly
painful code refactoring and testing and then trying to measure the savings of
each move and removing the sentinels. Although Joel already has code which does
most of this work, experience with sysctl moves in the past shows is we need to
be careful due to the slew of odd build failures that are possible due to the
amount of random Kconfig options sysctls use.
To that end Joel's work is split by first addressing the major housekeeping
needed to remove the sentinels, which is part of this merge request. The rest
of the work to actually remove the sentinels will be done later in future
kernel releases.
At first I was only going to send his first 7 patches of his patch series,
posted 1 month ago, but in retrospect due to the testing the changes have
received in linux-next and the minor changes they make this goes with the
entire set of patches Joel had planned: just sysctl house keeping. There are
networking changes but these are part of the house keeping too.
The preliminary math is showing this will all help reduce the overall build
time size of the kernel and run time memory consumed by the kernel by about
~64 bytes per array where we are able to remove each sentinel in the future.
That also means there is no more bloating the kernel with the extra ~64 bytes
per array moved as no new sentinels are created.
Most of this has been in linux-next for about a month, the last 7 patches took
a minor refresh 2 week ago based on feedback.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmTuVnMSHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinIckP/imvRlfkO6L0IP7MmJBRPtwY01rsTAKO
Q14dZ//bG4DVQeGl1FdzrF6hhuLgekU0qW1YDFIWiCXO7CbaxaNBPSUkeW6ReVoC
R/VHNUPxSR1PWQy1OTJV2t4XKri2sB7ijmUsfsATtISwhei9bggTHEysShtP4tv+
U87DzhoqMnbYIsfMo49KCqOa1Qm7TmjC1a7WAp6Fph3GJuXAzZR5pXpsd0NtOZ9x
Ud5RT22icnQpMl7K+yPsqY6XcS5JkgBe/WbSzMAUkYZvBZFBq9t2D+OW5h9TZMhw
piJWQ9X0Rm7qI2D15mJfXwaOhhyDhWci391hzdJmS6DI0prf6Ma2NFdAWOt/zomI
uiRujS4bGeBUaK5F4TX2WQ1+jdMtAZ+0FncFnzt4U8q7dzUc91uVCm6iHW3gcfAb
N7OEg2ZL0gkkgCZHqKxN8wpNQiC2KwnNk+HLAbnL2a/oJYfBtdopQmlxWfrN2hpF
xxROiENqk483BRdMXDq6DR/gyDZmZWCobXIglSzlqCOjCOcLbDziIJ7pJk83ok09
h/QnXTYHf9protBq9OIQesgh2pwNzBBLifK84KZLKcb7IbdIKjpQrW5STp04oNGf
wcGJzEz8tXUe0UKyMM47AcHQGzIy6cdXNLjyF8a+m7rnZzr1ndnMqZyRStZzuQin
AUg2VWHKPmW9
=sq2p
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
"Long ago we set out to remove the kitchen sink on kernel/sysctl.c
arrays and placings sysctls to their own sybsystem or file to help
avoid merge conflicts. Matthew Wilcox pointed out though that if we're
going to do that we might as well also *save* space while at it and
try to remove the extra last sysctl entry added at the end of each
array, a sentintel, instead of bloating the kernel by adding a new
sentinel with each array moved.
Doing that was not so trivial, and has required slowing down the moves
of kernel/sysctl.c arrays and measuring the impact on size by each new
move.
The complex part of the effort to help reduce the size of each sysctl
is being done by the patient work of el señor Don Joel Granados. A lot
of this is truly painful code refactoring and testing and then trying
to measure the savings of each move and removing the sentinels.
Although Joel already has code which does most of this work,
experience with sysctl moves in the past shows is we need to be
careful due to the slew of odd build failures that are possible due to
the amount of random Kconfig options sysctls use.
To that end Joel's work is split by first addressing the major
housekeeping needed to remove the sentinels, which is part of this
merge request. The rest of the work to actually remove the sentinels
will be done later in future kernel releases.
The preliminary math is showing this will all help reduce the overall
build time size of the kernel and run time memory consumed by the
kernel by about ~64 bytes per array where we are able to remove each
sentinel in the future. That also means there is no more bloating the
kernel with the extra ~64 bytes per array moved as no new sentinels
are created"
* tag 'sysctl-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
sysctl: Use ctl_table_size as stopping criteria for list macro
sysctl: SIZE_MAX->ARRAY_SIZE in register_net_sysctl
vrf: Update to register_net_sysctl_sz
networking: Update to register_net_sysctl_sz
netfilter: Update to register_net_sysctl_sz
ax.25: Update to register_net_sysctl_sz
sysctl: Add size to register_net_sysctl function
sysctl: Add size arg to __register_sysctl_init
sysctl: Add size to register_sysctl
sysctl: Add a size arg to __register_sysctl_table
sysctl: Add size argument to init_header
sysctl: Add ctl_table_size to ctl_table_header
sysctl: Use ctl_table_header in list_for_each_table_entry
sysctl: Prefer ctl_table_header in proc_sysctl
("refactor Kconfig to consolidate KEXEC and CRASH options").
- kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
couple of macros to args.h").
- gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
commands").
- vsprintf inclusion rationalization from Andy Shevchenko
("lib/vsprintf: Rework header inclusions").
- Switch the handling of kdump from a udev scheme to in-kernel handling,
by Eric DeVolder ("crash: Kernel handling of CPU and memory hot
un/plug").
- Many singleton patches to various parts of the tree
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO2GpAAKCRDdBJ7gKXxA
juW3AQD1moHzlSN6x9I3tjm5TWWNYFoFL8af7wXDJspp/DWH/AD/TO0XlWWhhbYy
QHy7lL0Syha38kKLMXTM+bN6YQHi9AU=
=WJQa
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- An extensive rework of kexec and crash Kconfig from Eric DeVolder
("refactor Kconfig to consolidate KEXEC and CRASH options")
- kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
couple of macros to args.h")
- gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
commands")
- vsprintf inclusion rationalization from Andy Shevchenko
("lib/vsprintf: Rework header inclusions")
- Switch the handling of kdump from a udev scheme to in-kernel
handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory
hot un/plug")
- Many singleton patches to various parts of the tree
* tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits)
document while_each_thread(), change first_tid() to use for_each_thread()
drivers/char/mem.c: shrink character device's devlist[] array
x86/crash: optimize CPU changes
crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
crash: hotplug support for kexec_load()
x86/crash: add x86 crash hotplug support
crash: memory and CPU hotplug sysfs attributes
kexec: exclude elfcorehdr from the segment digest
crash: add generic infrastructure for crash hotplug support
crash: move a few code bits to setup support of crash hotplug
kstrtox: consistently use _tolower()
kill do_each_thread()
nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse
scripts/bloat-o-meter: count weak symbol sizes
treewide: drop CONFIG_EMBEDDED
lockdep: fix static memory detection even more
lib/vsprintf: declare no_hash_pointers in sprintf.h
lib/vsprintf: split out sprintf() and friends
kernel/fork: stop playing lockless games for exe_file replacement
adfs: delete unused "union adfs_dirtail" definition
...
The XDR specification in RFC 8881 looks like this:
struct device_addr4 {
layouttype4 da_layout_type;
opaque da_addr_body<>;
};
struct GETDEVICEINFO4resok {
device_addr4 gdir_device_addr;
bitmap4 gdir_notification;
};
union GETDEVICEINFO4res switch (nfsstat4 gdir_status) {
case NFS4_OK:
GETDEVICEINFO4resok gdir_resok4;
case NFS4ERR_TOOSMALL:
count4 gdir_mincount;
default:
void;
};
Looking at nfsd4_encode_getdeviceinfo() ....
When the client provides a zero gd_maxcount, then the Linux NFS
server implementation encodes the da_layout_type field and then
skips the da_addr_body field completely, proceeding directly to
encode gdir_notification field.
There does not appear to be an option in the specification to skip
encoding da_addr_body. Moreover, Section 18.40.3 says:
> If the client wants to just update or turn off notifications, it
> MAY send a GETDEVICEINFO operation with gdia_maxcount set to zero.
> In that event, if the device ID is valid, the reply's da_addr_body
> field of the gdir_device_addr field will be of zero length.
Since the layout drivers are responsible for encoding the
da_addr_body field, put this fix inside the ->encode_getdeviceinfo
methods.
Fixes: 9cf514ccfa ("nfsd: implement pNFS operations")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Tom Haynes <loghyr@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
In addition to the benefits of using an enum rather than a set of
macros, we now have a named type that can improve static type
checking of function return values.
As part of this change, I removed a stale comment from svcauth.h;
the return values from current implementations of the
auth_ops::release method are all zero/negative errno, not the SVC_OK
enum values as the old comment suggested.
Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Most svc threads have no interest in a timeout.
nfsd sets it to 1 hour, but this is a wart of no significance.
lockd uses the timeout so that it can call nlmsvc_retry_blocked().
It also sometimes calls svc_wake_up() to ensure this is called.
So change lockd to be consistent and always use svc_wake_up() to trigger
nlmsvc_retry_blocked() - using a timer instead of a timeout to
svc_recv().
And change svc_recv() to not take a timeout arg.
This makes the sp_threads_timedout counter always zero.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
svc_recv() currently returns a 0 on success or one of two errors:
- -EAGAIN means no message was successfully received
- -EINTR means the thread has been told to stop
Previously nfsd would stop as the result of a signal as well as
following kthread_stop(). In that case the difference was useful: EINTR
means stop unconditionally. EAGAIN means stop if kthread_should_stop(),
continue otherwise.
Now threads only exit when kthread_should_stop() so we don't need the
distinction.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
All callers of svc_recv() go on to call svc_process() on success.
Simplify callers by having svc_recv() do that for them.
This loses one call to validate_process_creds() in nfsd. That was
debugging code added 14 years ago. I don't think we need to keep it.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that the last nfsd thread is stopped by an explicit act of calling
svc_set_num_threads() with a count of zero, we only have a limited
number of places that can happen, and don't need to call
nfsd_last_thread() in nfsd_put()
So separate that out and call it at the two places where the number of
threads is set to zero.
Move the clearing of ->nfsd_serv and the call to svc_xprt_destroy_all()
into nfsd_last_thread(), as they are really part of the same action.
nfsd_put() is now a thin wrapper around svc_put(), so make it a static
inline.
nfsd_put() cannot be called after nfsd_last_thread(), so in a couple of
places we have to use svc_put() instead.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Previously a thread could exit asynchronously (due to a signal) so some
care was needed to hold nfsd_mutex over the last svc_put() call. Now a
thread can only exit when svc_set_num_threads() is called, and this is
always called under nfsd_mutex. So no care is needed.
Not only is the mutex held when a thread exits now, but the svc refcount
is elevated, so the svc_put() in svc_exit_thread() will never be a final
put, so the mutex isn't even needed at this point in the code.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The original implementation of nfsd used signals to stop threads during
shutdown.
In Linux 2.3.46pre5 nfsd gained the ability to shutdown threads
internally it if was asked to run "0" threads. After this user-space
transitioned to using "rpc.nfsd 0" to stop nfsd and sending signals to
threads was no longer an important part of the API.
In commit 3ebdbe5203 ("SUNRPC: discard svo_setup and rename
svc_set_num_threads_sync()") (v5.17-rc1~75^2~41) we finally removed the
use of signals for stopping threads, using kthread_stop() instead.
This patch makes the "obvious" next step and removes the ability to
signal nfsd threads - or any svc threads. nfsd stops allowing signals
and we don't check for their delivery any more.
This will allow for some simplification in later patches.
A change worth noting is in nfsd4_ssc_setup_dul(). There was previously
a signal_pending() check which would only succeed when the thread was
being shut down. It should really have tested kthread_should_stop() as
well. Now it just does the latter, not the former.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
lockd allows SIGKILL and responds by dropping all locks and restarting
the grace period. This functionality has been present since 2.1.32 when
lockd was added to Linux.
This functionality is undocumented and most likely added as a useful
debug aid. When there is a need to drop locks, the better approach is
to use /proc/fs/nfsd/unlock_*.
This patch removes SIGKILL handling as part of preparation for removing
all signal handling from sunrpc service threads.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
clang's static analysis warning: fs/lockd/mon.c: line 293, column 2:
Null pointer passed as 2nd argument to memory copy function.
Assuming 'hostname' is NULL and calling 'nsm_create_handle()', this will
pass NULL as 2nd argument to memory copy function 'memcpy()'. So return
NULL if 'hostname' is invalid.
Fixes: 77a3ef33e2 ("NSM: More clean up of nsm_get_handle()")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Remove kernel-doc warning in exportfs:
fs/exportfs/expfs.c:395: warning: Function parameter or member 'parent'
not described in 'exportfs_encode_inode_fh'
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
A well-formed NFSv4 ACL will always contain OWNER@/GROUP@/EVERYONE@
ACEs, but there is no requirement for inheritable entries for those
entities. POSIX ACLs must always have owner/group/other entries, even for a
default ACL.
nfsd builds the default ACL from inheritable ACEs, but the current code
just leaves any unspecified ACEs zeroed out. The result is that adding a
default user or group ACE to an inode can leave it with unwanted deny
entries.
For instance, a newly created directory with no acl will look something
like this:
# NFSv4 translation by server
A::OWNER@:rwaDxtTcCy
A::GROUP@:rxtcy
A::EVERYONE@:rxtcy
# POSIX ACL of underlying file
user::rwx
group::r-x
other::r-x
...if I then add new v4 ACE:
nfs4_setfacl -a A:fd:1000:rwx /mnt/local/test
...I end up with a result like this today:
user::rwx
user:1000:rwx
group::r-x
mask::rwx
other::r-x
default:user::---
default:user:1000:rwx
default:group::---
default😷:rwx
default:other::---
A::OWNER@:rwaDxtTcCy
A::1000:rwaDxtcy
A::GROUP@:rxtcy
A::EVERYONE@:rxtcy
D:fdi:OWNER@:rwaDx
A:fdi:OWNER@:tTcCy
A:fdi:1000:rwaDxtcy
A:fdi:GROUP@:tcy
A:fdi:EVERYONE@:tcy
...which is not at all expected. Adding a single inheritable allow ACE
should not result in everyone else losing access.
The setfacl command solves a silimar issue by copying owner/group/other
entries from the effective ACL when none of them are set:
"If a Default ACL entry is created, and the Default ACL contains no
owner, owning group, or others entry, a copy of the ACL owner,
owning group, or others entry is added to the Default ACL.
Having nfsd do the same provides a more sane result (with no deny ACEs
in the resulting set):
user::rwx
user:1000:rwx
group::r-x
mask::rwx
other::r-x
default:user::rwx
default:user:1000:rwx
default:group::r-x
default😷:rwx
default:other::r-x
A::OWNER@:rwaDxtTcCy
A::1000:rwaDxtcy
A::GROUP@:rxtcy
A::EVERYONE@:rxtcy
A:fdi:OWNER@:rwaDxtTcCy
A:fdi:1000:rwaDxtcy
A:fdi:GROUP@:rxtcy
A:fdi:EVERYONE@:rxtcy
Reported-by: Ondrej Valousek <ondrej.valousek@diasemi.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2136452
Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This patch fixes races when lockd accesses the global nlm_blocked list.
It was mostly safe to access the list because everything was accessed
from the lockd kernel thread context but there exist cases like
nlmsvc_grant_deferred() that could manipulate the nlm_blocked list and
it can be called from any context.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
In the event that we can't fetch post_op_attr attributes, we still need
to set a value for the after_change. The operation has already happened,
so we're not able to return an error at that point, but we do want to
ensure that the client knows that its cache should be invalidated.
If we weren't able to fetch post-op attrs, then just set the
after_change to before_change + 1. The atomic flag should already be
clear in this case.
Suggested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
At one time, nfsd would scrape inode information directly out of struct
inode in order to populate the change_info4. At that time, the BUG_ON in
set_change_info made some sense, since having it unset meant a coding
error.
More recently, it calls vfs_getattr to get this information, which can
fail. If that fails, fh_pre_saved can end up not being set. While this
situation is unfortunate, we don't need to crash the box.
Move set_change_info to nfs4proc.c since all of the callers are there.
Revise the condition for setting "atomic" to also check for
fh_pre_saved. Drop the BUG_ON and just have it zero out both
change_attr4s when this occurs.
Reported-by: Boyang Xue <bxue@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2223560
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Collecting pre_op_attrs can fail, in which case it's probably best to
fail the whole operation.
Change fh_fill_pre_attrs and fh_fill_both_attrs to return __be32, and
have the callers check the return code and abort the operation if it's
not nfs_ok.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
I got this today from modpost:
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nfsd/nfsd.o
Add a module description.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The svc_ prefix is identified with the SunRPC layer. Although the
duplicate reply cache caches RPC replies, it is only for the NFS
protocol. Rename the struct to better reflect its purpose.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Over time I'd like to see NFS-specific fields moved out of struct
svc_rqst, which is an RPC layer object. These fields are layering
violations.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Avoid holding the bucket lock while freeing cache entries. This
change also caps the number of entries that are freed when the
shrinker calls to reduce the shrinker's impact on the cache's
effectiveness.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Enable nfsd_prune_bucket() to drop the bucket lock while calling
kfree(). Use the same pattern that Jeff recently introduced in the
NFSD filecache.
A few percpu operations are moved outside the lock since they
temporarily disable local IRQs which is expensive and does not
need to be done while the lock is held.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
To reduce contention on the bucket locks, we must avoid calling
kfree() while each bucket lock is held.
Start by refactoring nfsd_reply_cache_free_locked() into a helper
that removes an entry from the bucket (and must therefore run under
the lock) and a second helper that frees the entry (which does not
need to hold the lock).
For readability, rename the helpers nfsd_cacherep_<verb>.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This patch grants write delegations for OPEN with NFS4_SHARE_ACCESS_WRITE
if there is no conflict with other OPENs.
Write delegation conflicts with another OPEN, REMOVE, RENAME and SETATTR
are handled the same as read delegation using notify_change,
try_break_deleg.
The NFSv4.0 protocol does not enable a server to determine that a
conflicting GETATTR originated from the client holding the
delegation versus coming from some other client. With NFSv4.1 and
later, the SEQUENCE operation that begins each COMPOUND contains a
client ID, so delegation recall can be safely squelched in this case.
With NFSv4.0, however, the server must recall or send a CB_GETATTR
(per RFC 7530 Section 16.7.5) even when the GETATTR originates from
the client holding that delegation.
An NFSv4.0 client can trigger a pathological situation if it always
sends a DELEGRETURN preceded by a conflicting GETATTR in the same
COMPOUND. COMPOUND execution will always stop at the GETATTR and the
DELEGRETURN will never get executed. The server eventually revokes
the delegation, which can result in loss of open or lock state.
Tracepoint added to track whether read or write delegation is granted.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Replace the -1 (no limit) with a zero (no reserved space).
This prevents certain non-determinant client behavior, such as
silly-renaming a file when the only open reference is a write
delegation. Such a rename can leave unexpected .nfs files in a
directory that is otherwise supposed to be empty.
Note that other server implementations that support write delegation
also set this field to zero.
Suggested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
If the GETATTR request on a file that has write delegation in effect and
the request attributes include the change info and size attribute then
the write delegation is recalled. If the delegation is returned within
30ms then the GETATTR is serviced as normal otherwise the NFS4ERR_DELAY
error is returned for the GETATTR.
Add counter for write delegation recall due to conflict GETATTR. This is
used to evaluate the need to implement CB_GETATTR to adoid recalling the
delegation with conflit GETATTR.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Remove the check for F_WRLCK in generic_add_lease to allow file_lock
to be used for write delegation.
First consumer is NFSD.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
reduces the special-case code for handling hugetlb pages in GUP. It
also speeds up GUP handling of transparent hugepages.
- Peng Zhang provides some maple tree speedups ("Optimize the fast path
of mas_store()").
- Sergey Senozhatsky has improved te performance of zsmalloc during
compaction (zsmalloc: small compaction improvements").
- Domenico Cerasuolo has developed additional selftest code for zswap
("selftests: cgroup: add zswap test program").
- xu xin has doe some work on KSM's handling of zero pages. These
changes are mainly to enable the user to better understand the
effectiveness of KSM's treatment of zero pages ("ksm: support tracking
KSM-placed zero-pages").
- Jeff Xu has fixes the behaviour of memfd's
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
- David Howells has fixed an fscache optimization ("mm, netfs, fscache:
Stop read optimisation when folio removed from pagecache").
- Axel Rasmussen has given userfaultfd the ability to simulate memory
poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD").
- Miaohe Lin has contributed some routine maintenance work on the
memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
check").
- Peng Zhang has contributed some maintenance work on the maple tree
code ("Improve the validation for maple tree and some cleanup").
- Hugh Dickins has optimized the collapsing of shmem or file pages into
THPs ("mm: free retracted page table by RCU").
- Jiaqi Yan has a patch series which permits us to use the healthy
subpages within a hardware poisoned huge page for general purposes
("Improve hugetlbfs read on HWPOISON hugepages").
- Kemeng Shi has done some maintenance work on the pagetable-check code
("Remove unused parameters in page_table_check").
- More folioification work from Matthew Wilcox ("More filesystem folio
conversions for 6.6"), ("Followup folio conversions for zswap"). And
from ZhangPeng ("Convert several functions in page_io.c to use a
folio").
- page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
- Baoquan He has converted some architectures to use the GENERIC_IOREMAP
ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take
GENERIC_IOREMAP way").
- Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
batched/deferred tlb shootdown during page reclamation/migration").
- Better maple tree lockdep checking from Liam Howlett ("More strict
maple tree lockdep"). Liam also developed some efficiency improvements
("Reduce preallocations for maple tree").
- Cleanup and optimization to the secondary IOMMU TLB invalidation, from
Alistair Popple ("Invalidate secondary IOMMU TLB on permission
upgrade").
- Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
for arm64").
- Kemeng Shi provides some maintenance work on the compaction code ("Two
minor cleanups for compaction").
- Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most
file-backed faults under the VMA lock").
- Aneesh Kumar contributes code to use the vmemmap optimization for DAX
on ppc64, under some circumstances ("Add support for DAX vmemmap
optimization for ppc64").
- page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
data in page_ext"), ("minor cleanups to page_ext header").
- Some zswap cleanups from Johannes Weiner ("mm: zswap: three
cleanups").
- kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
- VMA handling cleanups from Kefeng Wang ("mm: convert to
vma_is_initial_heap/stack()").
- DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
address ranges and DAMON monitoring targets").
- Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
- Liam Howlett has improved the maple tree node replacement code
("maple_tree: Change replacement strategy").
- ZhangPeng has a general code cleanup - use the K() macro more widely
("cleanup with helper macro K()").
- Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap
on memory feature on ppc64").
- pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
in page_alloc"), ("Two minor cleanups for get pageblock migratetype").
- Vishal Moola introduces a memory descriptor for page table tracking,
"struct ptdesc" ("Split ptdesc from struct page").
- memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
for vm.memfd_noexec").
- MM include file rationalization from Hugh Dickins ("arch: include
asm/cacheflush.h in asm/hugetlb.h").
- THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
output").
- kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
object_cache instead of kmemleak_initialized").
- More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
and _folio_order").
- A VMA locking scalability improvement from Suren Baghdasaryan
("Per-VMA lock support for swap and userfaults").
- pagetable handling cleanups from Matthew Wilcox ("New page table range
API").
- A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
using page->private on tail pages for THP_SWAP + cleanups").
- Cleanups and speedups to the hugetlb fault handling from Matthew
Wilcox ("Change calling convention for ->huge_fault").
- Matthew Wilcox has also done some maintenance work on the MM subsystem
documentation ("Improve mm documentation").
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO1JUQAKCRDdBJ7gKXxA
jrMwAP47r/fS8vAVT3zp/7fXmxaJYTK27CTAM881Gw1SDhFM/wEAv8o84mDenCg6
Nfio7afS1ncD+hPYT8947UnLxTgn+ww=
=Afws
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Some swap cleanups from Ma Wupeng ("fix WARN_ON in
add_to_avail_list")
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
reduces the special-case code for handling hugetlb pages in GUP. It
also speeds up GUP handling of transparent hugepages.
- Peng Zhang provides some maple tree speedups ("Optimize the fast path
of mas_store()").
- Sergey Senozhatsky has improved te performance of zsmalloc during
compaction (zsmalloc: small compaction improvements").
- Domenico Cerasuolo has developed additional selftest code for zswap
("selftests: cgroup: add zswap test program").
- xu xin has doe some work on KSM's handling of zero pages. These
changes are mainly to enable the user to better understand the
effectiveness of KSM's treatment of zero pages ("ksm: support
tracking KSM-placed zero-pages").
- Jeff Xu has fixes the behaviour of memfd's
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
- David Howells has fixed an fscache optimization ("mm, netfs, fscache:
Stop read optimisation when folio removed from pagecache").
- Axel Rasmussen has given userfaultfd the ability to simulate memory
poisoning ("add UFFDIO_POISON to simulate memory poisoning with
UFFD").
- Miaohe Lin has contributed some routine maintenance work on the
memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
check").
- Peng Zhang has contributed some maintenance work on the maple tree
code ("Improve the validation for maple tree and some cleanup").
- Hugh Dickins has optimized the collapsing of shmem or file pages into
THPs ("mm: free retracted page table by RCU").
- Jiaqi Yan has a patch series which permits us to use the healthy
subpages within a hardware poisoned huge page for general purposes
("Improve hugetlbfs read on HWPOISON hugepages").
- Kemeng Shi has done some maintenance work on the pagetable-check code
("Remove unused parameters in page_table_check").
- More folioification work from Matthew Wilcox ("More filesystem folio
conversions for 6.6"), ("Followup folio conversions for zswap"). And
from ZhangPeng ("Convert several functions in page_io.c to use a
folio").
- page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
- Baoquan He has converted some architectures to use the
GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
architectures to take GENERIC_IOREMAP way").
- Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
batched/deferred tlb shootdown during page reclamation/migration").
- Better maple tree lockdep checking from Liam Howlett ("More strict
maple tree lockdep"). Liam also developed some efficiency
improvements ("Reduce preallocations for maple tree").
- Cleanup and optimization to the secondary IOMMU TLB invalidation,
from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
upgrade").
- Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
for arm64").
- Kemeng Shi provides some maintenance work on the compaction code
("Two minor cleanups for compaction").
- Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
most file-backed faults under the VMA lock").
- Aneesh Kumar contributes code to use the vmemmap optimization for DAX
on ppc64, under some circumstances ("Add support for DAX vmemmap
optimization for ppc64").
- page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
data in page_ext"), ("minor cleanups to page_ext header").
- Some zswap cleanups from Johannes Weiner ("mm: zswap: three
cleanups").
- kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
- VMA handling cleanups from Kefeng Wang ("mm: convert to
vma_is_initial_heap/stack()").
- DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
address ranges and DAMON monitoring targets").
- Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
- Liam Howlett has improved the maple tree node replacement code
("maple_tree: Change replacement strategy").
- ZhangPeng has a general code cleanup - use the K() macro more widely
("cleanup with helper macro K()").
- Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
memmap on memory feature on ppc64").
- pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
in page_alloc"), ("Two minor cleanups for get pageblock
migratetype").
- Vishal Moola introduces a memory descriptor for page table tracking,
"struct ptdesc" ("Split ptdesc from struct page").
- memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
for vm.memfd_noexec").
- MM include file rationalization from Hugh Dickins ("arch: include
asm/cacheflush.h in asm/hugetlb.h").
- THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
output").
- kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
object_cache instead of kmemleak_initialized").
- More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
and _folio_order").
- A VMA locking scalability improvement from Suren Baghdasaryan
("Per-VMA lock support for swap and userfaults").
- pagetable handling cleanups from Matthew Wilcox ("New page table
range API").
- A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
using page->private on tail pages for THP_SWAP + cleanups").
- Cleanups and speedups to the hugetlb fault handling from Matthew
Wilcox ("Change calling convention for ->huge_fault").
- Matthew Wilcox has also done some maintenance work on the MM
subsystem documentation ("Improve mm documentation").
* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
maple_tree: shrink struct maple_tree
maple_tree: clean up mas_wr_append()
secretmem: convert page_is_secretmem() to folio_is_secretmem()
nios2: fix flush_dcache_page() for usage from irq context
hugetlb: add documentation for vma_kernel_pagesize()
mm: add orphaned kernel-doc to the rst files.
mm: fix clean_record_shared_mapping_range kernel-doc
mm: fix get_mctgt_type() kernel-doc
mm: fix kernel-doc warning from tlb_flush_rmaps()
mm: remove enum page_entry_size
mm: allow ->huge_fault() to be called without the mmap_lock held
mm: move PMD_ORDER to pgtable.h
mm: remove checks for pte_index
memcg: remove duplication detection for mem_cgroup_uncharge_swap
mm/huge_memory: work on folio->swap instead of page->private when splitting folio
mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
mm/swap: use dedicated entry for swap in folio
mm/swap: stop using page->private on tail pages for THP_SWAP
selftests/mm: fix WARNING comparing pointer to 0
selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
...
Hi Linus,
Please, pull the following flexible-array transformations. These patches
have been baking in linux-next for a while.
Thanks
--
Gustavo
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmTtBEQACgkQRwW0y0cG
2zHgeBAAm1D5H+4THFOuYVsl93igowMbcPwraoo2xxnNtLqQR4y6p9SiWcjh2vk3
bmabPtID3jPZytjpV6QUYe6JlcOoWc2LsNTWqwnt2X2fhcovJfGMc/K3+7bRQzE8
uBqooxkiaZvPiUATeo0zdBSN9aYWXvEpYrBR5J9EtJWyS93LiYb/nn2DY+0d1yrY
89koxTV9/eOKJ7xi6UB0zWbyayor+JawX5RI2ysMFHO6UB2aVrNjqUCJP51dyywa
2QKtwkY/8vQIo8Y8c7ReApHR0t+s0jmxRi9ip+4rh/0VNl8ON5SoL+EcRB5FLCTn
E/VqQcebjIwo0AfeEzUQIc68E1hZJLYhGGJupF1Wsb9IvmbdTTMhhfLJmWStz5+E
GnNpXx9gTjCoM5+FqJlMkHgxT9JpHYSV5LtnZKmM7wiexHND+5T/ukAxRUhFjLY3
saMbXnJ85kHsNYLAxIDUfdRcXEDNBJl7XiVI2YRCI9fttD+GVjdFtATdNZFsFAge
kHZCAFloFHwe0lTcksN5I/NVdDc6Jv7ABjrTKNezfrjfRn24zvR61V54/ebPp5Fs
2JA5kVJpgGnUdIM2uot2lYLCmLDjhGNfXE95AjK0SdJXrZtzYqzTcKJg6w1gz3ZJ
VHWK4snsu48KxTx66f+v38ovKXLtlA8Rk8kYOHWXYgB0ZXOOAy4=
=L9kF
-----END PGP SIGNATURE-----
Merge tag 'flex-array-transformations-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull flexible-array updates from Gustavo A. R. Silva.
* tag 'flex-array-transformations-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
fs: omfs: Use flexible-array member in struct omfs_extent
sparc: openpromio: Address -Warray-bounds warning
reiserfs: Replace one-element array with flexible-array member
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZO2teQAKCRCRxhvAZXjc
omN9AP9F3rtueiMv0kfdwRhXl4GITY+o5OpiEpLjHdPC4nEalwEAvt8h4nvNmTg6
B54+2wNX2vc3t5UVPuFlqSHtUxoKTgA=
=DMxT
-----END PGP SIGNATURE-----
Merge tag 'v6.6-vfs.super.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull superblock fixes from Christian Brauner:
"Two follow-up fixes for the super work this cycle:
- Move a misplaced lockep assertion before we potentially free the
object containing the lock.
- Ensure that filesystems which match superblocks in sget{_fc}()
based on sb->s_fs_info are guaranteed to see a valid sb->s_fs_info
as long as a superblock still appears on the filesystem type's
superblock list.
What we want as a proper solution for next cycle is to split
sb->free_sb() out of sb->kill_sb() so that we can simply call
kill_super_notify() after sb->kill_sb() but before sb->free_sb().
Currently, this is lumped together in sb->kill_sb()"
* tag 'v6.6-vfs.super.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
super: ensure valid info
super: move lockdep assert
If some error happen on smb2_sess_setup(), Need to call
smb2_set_err_rsp() to set error response.
This patch add missing calling smb2_set_err_rsp() on error.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
If authblob->SessionKey.Length is bigger than session key
size(CIFS_KEY_SIZE), slub overflow can happen in key exchange codes.
cifs_arc4_crypt copy to session key array from SessionKey from client.
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21940
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
If ->DataOffset of create context is 0, DataBuffer size is not correctly
validated. This patch change wrong validation code and consider tag
length in request.
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21824
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix one kernel-doc comment to silence the warning:
fs/smb/server/smb2pdu.c:4160: warning: Excess function parameter 'infoclass_size' description in 'buffer_check_err'
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Create 3 kinds of files to reproduce this problem.
dd if=/dev/urandom of=127k.bin bs=1024 count=127
dd if=/dev/urandom of=128k.bin bs=1024 count=128
dd if=/dev/urandom of=129k.bin bs=1024 count=129
When copying files from ksmbd share to windows or cifs.ko, The following
error message happen from windows client.
"The file '129k.bin' is too large for the destination filesystem."
We can see the error logs from ksmbd debug prints
[48394.611537] ksmbd: RDMA r/w request 0x0: token 0x669d, length 0x20000
[48394.612054] ksmbd: smb_direct: RDMA write, len 0x20000, needed credits 0x1
[48394.612572] ksmbd: filename 129k.bin, offset 131072, len 131072
[48394.614189] ksmbd: nbytes 1024, offset 132096 mincount 0
[48394.614585] ksmbd: Failed to process 8 [-22]
And we can reproduce it with cifs.ko,
e.g. dd if=129k.bin of=/dev/null bs=128KB count=2
This problem is that ksmbd rdma return error if remaining bytes is less
than Length of Buffer Descriptor V1 Structure.
smb_direct_rdma_xmit()
...
if (desc_buf_len == 0 || total_length > buf_len ||
total_length > t->max_rdma_rw_size)
return -EINVAL;
This patch reduce descriptor size with remaining bytes and remove the
check for total_length and buf_len.
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
`force create mode' and `force directory mode' should be bitwise ORed
with the perms after `create mask' and `directory mask' have been
applied, respectively.
Signed-off-by: Atte Heikkilä <atteh.mailbox@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
If smb2_lock or smb2_open request is compound, ksmbd could send wrong
interim response to client. ksmbd allocate new interim buffer instead of
using resonse buffer to support compound request.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
MacOS sends a compound request including read to the server
(e.g. open-read-close). So far, ksmbd has not handled read as
a compound request. For compatibility between ksmbd and an OS that
supports SMB, This patch provides compound support for read requests.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Use kmemdup_nul() helper instead of open-coding to
simplify the code.
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The pointer ip is being initialized with a value that is never read, it
is being re-assigned later on. The assignment is redundant and can be
removed. Cleans up clang scan warning:
fs/jfs/namei.c:886:16: warning: Value stored to 'ip' during its
initialization is never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
The code path
fuse_update_attributes
fuse_update_get_attr
fuse_do_statx
has the risk to use a NULL pointer for struct kstat *stat, although current
callers of fuse_update_attributes() only set request_mask to values that
will trigger the call of fuse_do_getattr(), which already handles the NULL
pointer. Future updates might miss that fuse_do_statx() does not handle it
it is safer to add a condition already right now.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Fixes: d3045530bd ("fuse: implement statx")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
For keyed filesystems that recycle superblocks based on s_fs_info or
information contained therein s_fs_info must be kept as long as the
superblock is on the filesystem type super list. This isn't guaranteed
as s_fs_info will be freed latest in sb->kill_sb().
The fix is simply to perform notification and list removal in
kill_anon_super(). Any filesystem needs to free s_fs_info after they
call the kill_*() helpers. If they don't they risk use-after-free right
now so fixing it here is guaranteed that s_fs_info remain valid.
For block backed filesystems notifying in pass sb->kill_sb() in
deactivate_locked_super() remains unproblematic and is required because
multiple other block devices can be shut down after kill_block_super()
has been called from a filesystem's sb->kill_sb() handler. For example,
ext4 and xfs close additional devices. Block based filesystems don't
depend on s_fs_info (btrfs does use s_fs_info but also uses
kill_anon_super() and not kill_block_super().).
Sorry for that braino. Goal should be to unify this behavior during this
cycle obviously. But let's please do a simple bugfix now.
Fixes: 2c18a63b76 ("super: wait until we passed kill super")
Fixes: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com
Message-Id: <20230828-vfs-super-fixes-v1-2-b37a4a04a88f@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Fix braino and move the lockdep assertion after put_super() otherwise we
risk a use-after-free.
Fixes: 2c18a63b76 ("super: wait until we passed kill super")
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20230828-vfs-super-fixes-v1-1-b37a4a04a88f@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmTskOwACgkQxWXV+ddt
WDsNJw/8CCi41Z7e3LdJsQd2iy3/+oJZUvIGuT5YvshYxTLCbV7AL+diBPnSQs4Q
/KFMGL7RZBgJzwVoSQtXnESXXgX8VOVfN1zY//k5g6z7BscCEQd73H/M0B8ciZy/
aBygm9tJ7EtWbGZWNR8yad8YtOgl6xoClrPnJK/DCLwMGPy2o+fnKP3Y9FOKY5KM
1Sl0Y4FlJ9dTJpxIwYbx4xmuyHrh2OivjU/KnS9SzQlHu0nl6zsIAE45eKem2/EG
1figY5aFBYPpPYfopbLDalEBR3bQGiViZVJuNEop3AimdcMOXw9jBF3EZYUb5Tgn
MleMDgmmjLGOE/txGhvTxKj9kci2aGX+fJn3jXbcIMksAA0OQFLPqzGvEQcrs6Ok
HA0RsmAkS5fWNDCuuo4ZPXEyUPvluTQizkwyoulOfnK+UPJCWaRqbEBMTsvm6M6X
wFT2czwLpaEU/W6loIZkISUhfbRqVoA3DfHy398QXNzRhSrg8fQJjma1f7mrHvTi
CzU+OD5YSC2nXktVOnklyTr0XT+7HF69cumlDbr8TS8u1qu8n1keU/7M3MBB4xZk
BZFJDz8pnsAqpwVA4T434E/w45MDnYlwBw5r+U8Xjyso8xlau+sYXKcim85vT2Q0
yx/L91P6tdekR1y97p4aDdxw/PgTzdkNGMnsTBMVzgtCj+5pMmE=
=N7Yn
-----END PGP SIGNATURE-----
Merge tag 'for-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"No new features, the bulk of the changes are fixes, refactoring and
cleanups. The notable fix is the scrub performance restoration after
rewrite in 6.4, though still only partial.
Fixes:
- scrub performance drop due to rewrite in 6.4 partially restored:
- do IO grouping by blg_plug/blk_unplug again
- avoid unnecessary tree searches when processing stripes, in
extent and checksum trees
- the drop is noticeable on fast PCIe devices, -66% and restored
to -33% of the original
- backports to 6.4 planned
- handle more corner cases of transaction commit during orphan
cleanup or delayed ref processing
- use correct fsid/metadata_uuid when validating super block
- copy directory permissions and time when creating a stub subvolume
Core:
- debugging feature integrity checker deprecated, to be removed in
6.7
- in zoned mode, zones are activated just before the write, making
error handling easier, now the overcommit mechanism can be enabled
again which improves performance by avoiding more frequent flushing
- v0 extent handling completely removed, deprecated long time ago
- error handling improvements
- tests:
- extent buffer bitmap tests
- pinned extent splitting tests
- cleanups and refactoring:
- compression writeback
- extent buffer bitmap
- space flushing, ENOSPC handling"
* tag 'for-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (110 commits)
btrfs: zoned: skip splitting and logical rewriting on pre-alloc write
btrfs: tests: test invalid splitting when skipping pinned drop extent_map
btrfs: tests: add a test for btrfs_add_extent_mapping
btrfs: tests: add extent_map tests for dropping with odd layouts
btrfs: scrub: move write back of repaired sectors to scrub_stripe_read_repair_worker()
btrfs: scrub: don't go ordered workqueue for dev-replace
btrfs: scrub: fix grouping of read IO
btrfs: scrub: avoid unnecessary csum tree search preparing stripes
btrfs: scrub: avoid unnecessary extent tree search preparing stripes
btrfs: copy dir permission and time when creating a stub subvolume
btrfs: remove pointless empty list check when reading delayed dir indexes
btrfs: drop redundant check to use fs_devices::metadata_uuid
btrfs: compare the correct fsid/metadata_uuid in btrfs_validate_super
btrfs: use the correct superblock to compare fsid in btrfs_validate_super
btrfs: simplify memcpy either of metadata_uuid or fsid
btrfs: add a helper to read the superblock metadata_uuid
btrfs: remove v0 extent handling
btrfs: output extra debug info if we failed to find an inline backref
btrfs: move the !zoned assert into run_delalloc_cow
btrfs: consolidate the error handling in run_delalloc_nocow
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmTsk7MACgkQxWXV+ddt
WDsKjRAAoKY1qtApAMWl7vX+fw4pyoexrZISvzsJpCmPDV0SlKZ8CqOfYGhyT845
/pYUB8oxCm8UgpbswCTDVzppFANVqJBRIQEQc9d6yej4bA62Z/4Z561xROE1BEsO
UR2UIKw4gtpAmSlCoLYljt4Eg0izRt0MNmYvIk9ccQTIzVO8u2E8keNW76OSbAW1
J9AH0KgrxPgK9GPN/MlWg8THyuWgP3p/Qf4QS6JcDRCNFr4gmNwxP8Cj1kifYVT/
iMxghN+Z2iFPN8gHIeebjYpyoJJ4ZHRVL9mUZ0SLwcbt6ZfV5DnYn2yepkxxHfyL
CaUrRNJHE3Cb4vyJgtvVoAHE78Bqyk4WHKirNnrHly5j8K9ba3PIEpL+XjDVUckA
597xlrBsaQhlvMHXn+sMOMAMs7EVkUfR/gzSoLirWWOlLJ9skl3H8vK/BunK1bo+
+YYwMJwPtv3HmCdMbb00Rm9D2vVSVYM4iydEjmW6ILWBgecQII5O14CmkSH9zoG7
sip4O5oDkxmnoq9TuKPjS6AEPi+6UsbCsv8Z8boXdj97w4+bJIZVGbDHXvdAIEkj
2NmrYgqCqcbB46cSIc2VF1Fx0WO8Jh+GTFYGAPGFrabLFDVWeuwtdJahbYnSrmuL
whMVNSH4sooZflCWjwANhV8H5wlIatnn+xh+lX3TWpKiBeqIGpw=
=wPHe
-----END PGP SIGNATURE-----
Merge tag 'affs-for-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull affs updates from David Sterba:
"Two minor updates for AFFS:
- reimplement writepage() address space callback on top of
migrate_folio()
- fix a build warning, local parameters 'toupper' collide with the
standard ctype.h name"
* tag 'affs-for-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
affs: rename local toupper() to fn() to avoid confusion
affs: remove writepage implementation
Several cleanups for fs/verity/, including two commits that make the
builtin signature support more cleanly separated from the base feature.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZOwXSRQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK+kzAP9UsTtZjjQLvEDF6OYlysFLQppDrmk0
zh9iH/Z7qT8BQwEA0azRHW5/FDLvkHZWsU0QCLjDpUPrNHc712aeM1pLpgc=
=zOdL
-----END PGP SIGNATURE-----
Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux
Pull fsverity updates from Eric Biggers:
"Several cleanups for fs/verity/, including two commits that make the
builtin signature support more cleanly separated from the base
feature"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
fsverity: skip PKCS#7 parser when keyring is empty
fsverity: move sysctl registration out of signature.c
fsverity: simplify handling of errors during initcall
fsverity: explicitly check that there is no algorithm 0
* Make large writes to the page cache fill sparse parts of the cache
with large folios, then use large memcpy calls for the large folio.
* Track the per-block dirty state of each large folio so that a
buffered write to a single byte on a large folio does not result in a
(potentially) multi-megabyte writeback IO.
* Allow some directio completions to be performed in the initiating
task's context instead of punting through a workqueue. This will
reduce latency for some io_uring requests.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZM0Z1AAKCRBKO3ySh0YR
pp7BAQCzkKejCM0185tNIH/faHjzidSisNQkJ5HoB4Opq9U66AEA6IPuAdlPlM/J
FPW1oPq33Yn7AV4wXjUNFfDLzVb/Fgg=
=dFBU
-----END PGP SIGNATURE-----
Merge tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap updates from Darrick Wong:
"We've got some big changes for this release -- I'm very happy to be
landing willy's work to enable large folios for the page cache for
general read and write IOs when the fs can make contiguous space
allocations, and Ritesh's work to track sub-folio dirty state to
eliminate the write amplification problems inherent in using large
folios.
As a bonus, io_uring can now process write completions in the caller's
context instead of bouncing through a workqueue, which should reduce
io latency dramatically. IOWs, XFS should see a nice performance bump
for both IO paths.
Summary:
- Make large writes to the page cache fill sparse parts of the cache
with large folios, then use large memcpy calls for the large folio.
- Track the per-block dirty state of each large folio so that a
buffered write to a single byte on a large folio does not result in
a (potentially) multi-megabyte writeback IO.
- Allow some directio completions to be performed in the initiating
task's context instead of punting through a workqueue. This will
reduce latency for some io_uring requests"
* tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (26 commits)
iomap: support IOCB_DIO_CALLER_COMP
io_uring/rw: add write support for IOCB_DIO_CALLER_COMP
fs: add IOCB flags related to passing back dio completions
iomap: add IOMAP_DIO_INLINE_COMP
iomap: only set iocb->private for polled bio
iomap: treat a write through cache the same as FUA
iomap: use an unsigned type for IOMAP_DIO_* defines
iomap: cleanup up iomap_dio_bio_end_io()
iomap: Add per-block dirty state tracking to improve performance
iomap: Allocate ifs in ->write_begin() early
iomap: Refactor iomap_write_delalloc_punch() function out
iomap: Use iomap_punch_t typedef
iomap: Fix possible overflow condition in iomap_write_delalloc_scan
iomap: Add some uptodate state handling helpers for ifs state bitmap
iomap: Drop ifs argument from iomap_set_range_uptodate()
iomap: Rename iomap_page to iomap_folio_state and others
iomap: Copy larger chunks from userspace
iomap: Create large folios in the buffered write path
filemap: Allow __filemap_get_folio to allocate large folios
filemap: Add fgf_t typedef
...
- Support xattr bloom filter to optimize negative xattr lookups;
- Support DEFLATE compression algorithm as an alternative;
- Fix a regression that ztailpacking pclusters don't release properly;
- Avoid warning dedupe and fragments features anymore;
- Some folio conversions and cleanups.
-----BEGIN PGP SIGNATURE-----
iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCZOvhIBEceGlhbmdAa2Vy
bmVsLm9yZwAKCRA5NzHcH7XmBFgqAP4/gcxH5vhgxMunxmgBkSxMFBQf/W7CfOiN
QkGHjSKl8gEA78EBwAJ3vDJ1JgQRTb9/9UBrtW7n2hzj/eVS/LIyYQI=
=o3Bx
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"In this cycle, a xattr bloom filter feature is introduced to speed up
negative xattr lookups, which was originally suggested by Alexander
for Composefs use cases.
Additionally, the DEFLATE algorithm is now supported, which can be
used together with hardware accelerators for our cloud workloads. Each
supported compression algorithm can be selected on a per-file basis
for specific access patterns too.
There are also some random fixes and cleanups as usual:
- Support xattr bloom filter to optimize negative xattr lookups
- Support DEFLATE compression algorithm as an alternative
- Fix a regression that ztailpacking pclusters don't release properly
- Avoid warning dedupe and fragments features anymore
- Some folio conversions and cleanups"
* tag 'erofs-for-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: release ztailpacking pclusters properly
erofs: don't warn dedupe and fragments features anymore
erofs: adapt folios for z_erofs_read_folio()
erofs: adapt folios for z_erofs_readahead()
erofs: get rid of fe->backmost for cache decompression
erofs: drop z_erofs_page_mark_eio()
erofs: tidy up z_erofs_do_read_page()
erofs: move preparation logic into z_erofs_pcluster_begin()
erofs: avoid obsolete {collector,collection} terms
erofs: simplify z_erofs_read_fragment()
erofs: remove redundant erofs_fs_type declaration in super.c
erofs: add necessary kmem_cache_create flags for erofs inode cache
erofs: clean up redundant comment and adjust code alignment
erofs: refine warning messages for zdata I/Os
erofs: boost negative xattr lookup with bloom filter
erofs: update on-disk format for xattr name filter
erofs: DEFLATE compression support