Avoid post_send counting (atomic) in the IO path just to keep track of
how many completions we need to consume. Use a beacon post to indicate
that all prior posts completed.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This will solve a possible condition where we might miss TX completion
(flush error) during session teardown. Since we are using a single
CQ, we don't need to actively drain the TX CQ, instead just wait for
flush_completion (when counters reach zero) and remove iser_poll_for_flush_errors().
This patch might introduce a minor performance regression on its own,
but the next patches will enhance performance using a single CQ for RX
and TX.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We need a way to guarentee that we don't stay in soft-IRQ context for
too long. We might starve other pending CQ tasklets or worse lock
against application trying to issue IO on the running CPU.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Introduce iser_comp which centralizes all iser completion related
items and is referenced by iser_device and each ib_conn.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In case iscsid was violently killed (SIGKILL) during its error
recovery stage, we may never get a connection teardown sequence for
some of the old connections. No harm done, but when we try to unload
the module we will need to cleanup all these connections. So we
actually may end-up here - so it's not a BUG_ON(), just give a relaxed
warning that this happened and continue with normal unload. BUG_ON()
will cause segfault on module_exit and we don't want that.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Previously we notified iscsi layer about the connection layer when
we consumed all of our flush errors. This was racy as there
was no guarentee that iscsi_conn wasn't terminated by then (which ends
up in an invalid memory access). In case we got a non FLUSH error
completion, we are guarenteed that iscsi_conn is still alive. We should
notify iSCSI layer with iscsi_conn_failure to initiate error handling.
While we are at it, add a nice kernel-doc style documentation.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Bailout in case a task cleanup (iscsi_iser_cleanup_task) is called
after the IB device was removed (DEVICE_REMOVAL CM event). We also
call iscsi_conn_stop with a lock taken to prevent DEVICE_REMOVAL and
tasks cleanup from racing.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Previously we didn't need to unbind the iser_conn and iscsi_conn since
we always relied on iscsi daemon to teardown the connection and never
let it finish before we cleanup all that is needed in iser. This is
not the case anymore (for DEVICE_REMOVAL event). So avoid any possible
chance we cause iscsi_conn dereference after iscsi_conn was freed.
We also call iser_conn_terminate (safe to call multiple times) just
for the corner case of iscsi daemon stopping an old connection before
invoking endpoint removal (might happen if it was violently killed).
Notice we are unbinding under a lock - which is required.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We no longer rely on iscsi connection teardown sequence, so no need to
give a grace period and continue cleanup if it expired. Have
iser_conn_release wait for full completion before freeing iser_conn.
ib_completion:
Guaranteed to come when:
- Got DISCONNECTED/ADDR_CHANGE event or
- iSCSI called ep_disconnect/conn_stop
Guaranteed to finish when:
- Got TIMEWAIT_EXIT/DEVICE_REMOVAL event
- All Flush errors are consumed
- IB related resources are destroyed
stop_completion:
Guaranteed to come when:
- iSCSI calls conn_stop
Guaranteed to finish when:
- All inflight tasks were cleaned up
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
iscsi daemon is in user-space, thus we can't rely on it to be invoked
at connection teardown (if not running or does not receive CPU time).
This patch addresses the issue by re-structuring iSER connection
teardown logic and CM events handling.
The CM events will dictate the RDMA resources destruction (ib_conn)
and iser_conn is kept around as long as iscsi_conn is left around
allowing iscsi/iser callbacks to continue after RDMA transport was
destroyed.
This patch introduces a separation in logic when handling CM events:
- DISCONNECTED_HANDLER, ADDR_CHANGED
This events indicate the start of teardown process.
Actions:
1. Terminate the connection: rdma_disconnect (send DREQ/DREP)
2. Notify iSCSI of connection failure
3. Change state to TERMINATING
4. Poll for all flush errors to be consumed
- TIMEWAIT_EXIT, DEVICE_REMOVAL
These events indicate the final stage of termination process and
we can free RDMA related resources.
Actions:
1. Call disconnected handler (we are not guaranteed that DISCONNECTED
event was invoked in the past)
2. Cleanup RDMA related resources
3. For DEVICE_REMOVAL return non-zero rc from cma_handler to
implicitly destroy the cm_id (Can't rely on user-space, make sure
we have forward progress)
We replace flush_completion (indicate all flushes were consumed) with
ib_completion (rdma resources were cleaned up).
The iser_conn_release_work will wait for teardown completions:
- conn_stop was completed (tasks were cleaned-up) - stop_completion
- RDMA resources were destroyed - ib_completion
And then will continue to free iser connection representation (iser_conn).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Put all connection IB related resources release in this routine. One
exception is the cm_id which cannot be destroyed as the routine is
protected by the state mutex. Also move its position to avoid forward
declaration. While at it fix qp NULL assignment.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Structure that describes the RDMA relates connection objects. Static
member of iser_conn.
This patch does not change any functionality
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Two reasons why we choose to do this:
1. No point today calling struct iser_conn by another name ib_conn
2. In the next patches we will restructure iser control plane representation
- struct iser_conn: connection logical representation
- struct ib_conn: connection RDMA layout representation
This patch does not change any functionality.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Pull slave-dmaengine fixes from Vinod Koul:
"Two small fixes for omap dmaengine driver which fixes cyclic suspend
and resume"
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: omap-dma: Restore the CLINK_CTRL in resume path
dmaengine: omap-dma: Add memory barrier to dma_resume path
Pull vfs fixes from Al Viro:
"Assorted fixes + unifying __d_move() and __d_materialise_dentry() +
minimal regression fix for d_path() of victims of overwriting rename()
ported on top of that"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: Don't exchange "short" filenames unconditionally.
fold swapping ->d_name.hash into switch_names()
fold unlocking the children into dentry_unlock_parents_for_move()
kill __d_materialise_dentry()
__d_materialise_dentry(): flip the order of arguments
__d_move(): fold manipulations with ->d_child/->d_subdirs
don't open-code d_rehash() in d_materialise_unique()
pull rehashing and unlocking the target dentry into __d_materialise_dentry()
ufs: deal with nfsd/iget races
fuse: honour max_read and max_write in direct_io mode
shmem: fix nlink for rename overwrite directory
Pull cgroup fixes from Tejun Heo:
"This is quite late but these need to be backported anyway.
This is the fix for a long-standing cpuset bug which existed from
2009. cpuset makes use of PF_SPREAD_{PAGE|SLAB} flags to modify the
task's memory allocation behavior according to the settings of the
cpuset it belongs to; unfortunately, when those flags have to be
changed, cpuset did so directly even whlie the target task is running,
which is obviously racy as task->flags may be modified by the task
itself at any time. This obscure bug manifested as corrupt
PF_USED_MATH flag leading to a weird crash.
The bug is fixed by moving the flag to task->atomic_flags. The first
two are prepatory ones to help defining atomic_flags accessors and the
third one is the actual fix"
* 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags
sched: add macros to define bitops for task atomic flags
sched: fix confusing PFA_NO_NEW_PRIVS constant
Here's our last set of fixes for 3.17. Most of these are for TI platforms,
fixing some noisy Kconfig issues, runtime clock and power issues on
several platforms and NAND timings on DRA7.
There are also a couple of bug fixes for i.MX, one for QCOM and a small
fix to avoid section mismatch noise on PXA.
Diffstat looks large, partially due to some tables being updated and
thus touching many lines. The qcom gsbi change also restructures clock
management a bit and thus touches a bunch of lines.
All in all, a bit more changes than we'd like at this point, but nothing
stands out as risky either so it seems like the right thing to send it
up now instead of holding it to the merge window.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJUJxVMAAoJEIwa5zzehBx3VVoP/3WeftI/+vncYhMmPCaUxOso
B/rNY1CW2ZYr9yWEvREQtMQCkLWYPifeyHa+fXHeFfLGWlMP1wU4LP78RrvaMnSs
V0d2wYmfTkSIlVwqRMuArY9KwnOTRSiDfhQpl2BQ84u1IaZM5/IRw9oNICTao8jI
A7NsLAnss3exKCT06R3CcG7+fq3zVc19aI1QJG61BFqTIVItf71NTm/lcjsL3Tss
Tr/ITTgZM6UGkEnTUuRCl3gpMn/TVvO/qE94xU6vY0jqDQKUl1cxUCx6gRcSDRu4
PvLvPS7d4p99dHmLxVUuLBT7AGtRCxfdAoVE3D3rmGfcthDt1nFBgJfp6ekQZAM9
ZfJnrvfHRLjl/lxQvWWkpuugu0z7GCFeXRFHN6aLsD6aRD4JmYoRuSeA0aXmTKyp
oDcduXqYOImTcbUQ8G8n1YeK8BAVlL6PEZKvaIhjmxUWHVeGdpesz9s7TFBqGBBd
F1EeCPtAczBpNJP4E/dRDzWYjp+lGyQs4dQEU+YpRe9drzJpw6GsDuaF78QP8A5a
TEcc3y3o2FSNbGCw9qQ7pkgm76aS1YhLKMQb+2JXJptgwKMw3G6abMr+iomlm3Id
DY8+WIBggx/gB5k/onFseZvjNxVKqQUeh31UT5e1v/9M4bCJvEcY+KeKcgjbPpy7
GnGoXEvCnwZ7kPokqH0D
=K6xV
-----END PGP SIGNATURE-----
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Here's our last set of fixes for 3.17. Most of these are for TI
platforms, fixing some noisy Kconfig issues, runtime clock and power
issues on several platforms and NAND timings on DRA7.
There are also a couple of bug fixes for i.MX, one for QCOM and a
small fix to avoid section mismatch noise on PXA.
Diffstat looks large, partially due to some tables being updated and
thus touching many lines. The qcom gsbi change also restructures
clock management a bit and thus touches a bunch of lines.
All in all, a bit more changes than we'd like at this point, but
nothing stands out as risky either so it seems like the right thing to
send it up now instead of holding it to the merge window"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
drivers/soc: qcom: do not disable the iface clock in probe
ARM: imx: fix .is_enabled() of shared gate clock
ARM: OMAP3: Fix I/O chain clock line assertion timed out error
ARM: keystone: dts: fix bindings for pcie and usb clock nodes
bus: omap_l3_noc: Fix connID for OMAP4
ARM: DT: imx53: fix lvds channel 1 port
ARM: dts: cm-t54: fix serial console power supply.
ARM: dts: dra7-evm: Fix NAND GPMC timings
ARM: pxa: fix section mismatch warning for pxa_timer_nodt_init
ARM: OMAP: Fix Kconfig warning for omap1
Pull MIPS fixes from Ralf Baechle:
"The final round of fixes. One corner case in the math emulator and
another one in the mcount function for ftrace"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: mcount: Adjust stack pointer for static trace in MIPS32
MIPS: Fix MFC1 & MFHC1 emulation for 64-bit MIPS systems
Pull x86 fixes from Ingo Molnar:
"This has:
- EFI revert to fix a boot regression
- early_ioremap() fix for boot failure
- KASLR fix for possible boot failures
- EFI fix for corrupted string printing
- remove a misleading EFI bootup 'failed!' error message
Unfortunately it's all rather close to the merge window"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Truncate 64-bit values when calling 32-bit OutputString()
x86/efi: Delete misleading efi_printk() error message
Revert "efi/x86: efistub: Move shared dependencies to <asm/efi.h>"
x86/kaslr: Avoid the setup_data area when picking location
x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8
Only exchange source and destination filenames
if flags contain RENAME_EXCHANGE.
In case if executable file was running and replaced by
other file /proc/PID/exe should still show correct file name,
not the old name of the file by which it was replaced.
The scenario when this bug manifests itself was like this:
* ALT Linux uses rpm and start-stop-daemon;
* during a package upgrade rpm creates a temporary file
for an executable to rename it upon successful unpacking;
* start-stop-daemon is run subsequently and it obtains
the (nonexistant) temporary filename via /proc/PID/exe
thus failing to identify the running process.
Note that "long" filenames (> DNAiME_INLINE_LEN) are still
exchanged without RENAME_EXCHANGE and this behaviour exists
long enough (should be fixed too apparently).
So this patch is just an interim workaround that restores
behavior for "short" names as it was before changes
introduced by commit da1ce0670c ("vfs: add cross-rename").
See https://lkml.org/lkml/2014/9/7/6 for details.
AV: the comments about being more careful with ->d_name.hash
than with ->d_name.name are from back in 2.3.40s; they
became obsolete by 2.3.60s, when we started to unhash the
target instead of swapping hash chain positions followed
by d_delete() as we used to do when dcache was first
introduced.
Acked-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: da1ce0670c "vfs: add cross-rename"
Signed-off-by: Mikhail Efremov <sem@altlinux.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... renaming it into dentry_unlock_for_move() and making it more
symmetric with dentry_lock_for_move().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... thus making it much closer to (now unreachable, BTW) IS_ROOT(dentry)
case in __d_move(). A bit more and it'll fold in.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
list_del() + list_add() is a slightly pessimised list_move()
list_del() + INIT_LIST_HEAD() is a slightly pessimised list_del_init()
Interleaving those makes the resulting code even worse. And harder to follow...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The third argument of fuse_get_user_pages() "nbytesp" refers to the number of
bytes a caller asked to pack into fuse request. This value may be lesser
than capacity of fuse request or iov_iter. So fuse_get_user_pages() must
ensure that *nbytesp won't grow.
Now, when helper iov_iter_get_pages() performs all hard work of extracting
pages from iov_iter, it can be done by passing properly calculated
"maxsize" to the helper.
The other caller of iov_iter_get_pages() (dio_refill_pages()) doesn't need
this capability, so pass LONG_MAX as the maxsize argument here.
Fixes: c9c37e2e63 ("fuse: switch to iov_iter_get_pages()")
Reported-by: Werner Baumann <werner.baumann@onlinehome.de>
Tested-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If overwriting an empty directory with rename, then need to drop the extra
nlink.
Test prog:
#include <stdio.h>
#include <fcntl.h>
#include <err.h>
#include <sys/stat.h>
int main(void)
{
const char *test_dir1 = "test-dir1";
const char *test_dir2 = "test-dir2";
int res;
int fd;
struct stat statbuf;
res = mkdir(test_dir1, 0777);
if (res == -1)
err(1, "mkdir(\"%s\")", test_dir1);
res = mkdir(test_dir2, 0777);
if (res == -1)
err(1, "mkdir(\"%s\")", test_dir2);
fd = open(test_dir2, O_RDONLY);
if (fd == -1)
err(1, "open(\"%s\")", test_dir2);
res = rename(test_dir1, test_dir2);
if (res == -1)
err(1, "rename(\"%s\", \"%s\")", test_dir1, test_dir2);
res = fstat(fd, &statbuf);
if (res == -1)
err(1, "fstat(%i)", fd);
if (statbuf.st_nlink != 0) {
fprintf(stderr, "nlink is %lu, should be 0\n", statbuf.st_nlink);
return 1;
}
return 0;
}
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull input fix from Dmitry Torokhov:
"A small fixup to i8042 adding Asus X450LCP to the nomux list"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - fix Asus X450LCP touchpad detection
Pull scheduler fixes from Ingo Molnar:
"A CONFIG_STACK_GROWSUP=y fix, and a hotplug llc CPU mask fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix unreleased llc_shared_mask bit during CPU hotplug
sched: Fix end_of_stack() and location of stack canary for architectures using CONFIG_STACK_GROWSUP
Merge fixes from Andrew Morton:
"9 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: softdirty: keep bit when zapping file pte
fs/cachefiles: add missing \n to kerror conversions
genalloc: fix device node resource counter
drivers/rtc/rtc-efi.c: add missing module alias
mm, slab: initialize object alignment on cache creation
mm: softdirty: addresses before VMAs in PTE holes aren't softdirty
ocfs2/dlm: do not get resource spinlock if lockres is new
nilfs2: fix data loss with mmap()
ocfs2: free vol_label in ocfs2_delete_osb()
This fixes the same bug as b43790eedd ("mm: softdirty: don't forget to
save file map softdiry bit on unmap") and 9aed8614af ("mm/memory.c:
don't forget to set softdirty on file mapped fault") where the return
value of pte_*mksoft_dirty was being ignored.
To be sure that no other pte/pmd "mk" function return values were being
ignored, I annotated the functions in arch/x86/include/asm/pgtable.h
with __must_check and rebuilt.
The userspace effect of this bug is that the softdirty mark might be
lost if a file mapped pte get zapped.
Signed-off-by: Peter Feiner <pfeiner@google.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Jamie Liu <jamieliu@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 0227d6abb3 ("fs/cachefiles: replace kerror by pr_err") didn't
include newline featuring in original kerror definition
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reported-by: David Howells <dhowells@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org> [3.16.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Decrement the np_pool device_node refcount, which was incremented on
the preceding of_parse_phandle() call.
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Without proper alias kernel module is not loaded for rtc-efi driver.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Cc: dann frazier <dannf@dannf.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit 4590685546 ("mm/sl[aou]b: Common alignment code"), the
"ralign" automatic variable in __kmem_cache_create() may be used as
uninitialized.
The proper alignment defaults to BYTES_PER_WORD and can be overridden by
SLAB_RED_ZONE or the alignment specified by the caller.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=85031
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Andrei Elovikov <a.elovikov@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In PTE holes that contain VM_SOFTDIRTY VMAs, unmapped addresses before
VM_SOFTDIRTY VMAs are reported as softdirty by /proc/pid/pagemap. This
bug was introduced in commit 68b5a65248 ("mm: softdirty: respect
VM_SOFTDIRTY in PTE holes"). That commit made /proc/pid/pagemap look at
VM_SOFTDIRTY in PTE holes but neglected to observe the start of VMAs
returned by find_vma.
Tested:
Wrote a selftest that creates a PMD-sized VMA then unmaps the first
page and asserts that the page is not softdirty. I'm going to send the
pagemap selftest in a later commit.
Signed-off-by: Peter Feiner <pfeiner@google.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Jamie Liu <jamieliu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a deadlock case which reported by Guozhonghua:
https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html
This case is caused by &res->spinlock and &dlm->master_lock
misordering in different threads.
It was introduced by commit 8d400b81cc ("ocfs2/dlm: Clean up refmap
helpers"). Since lockres is new, it doesn't not require the
&res->spinlock. So remove it.
Fixes: 8d400b81cc ("ocfs2/dlm: Clean up refmap helpers")
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: joyce.xue <xuejiufei@huawei.com>
Reported-by: Guozhonghua <guozhonghua@h3c.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
reproducible with the following script:
----------------[BEGIN SCRIPT]--------------------
mkfs.nilfs2 -f /dev/sdb
mount /dev/sdb /mnt
dd if=/dev/zero bs=1M count=30 of=/mnt/testfile
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"
/root/mmaptest/mmaptest /mnt/testfile 30 10 5
sync
CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
umount /mnt
echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
echo "AFTER MMAP:\t$CHECKSUM_AFTER"
echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
----------------[END SCRIPT]--------------------
The mmaptest tool looks something like this (very simplified, with
error checking removed):
----------------[BEGIN mmaptest]--------------------
data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, file_offset);
for (i = 0; i < write_count; ++i) {
memcpy(data + i * 4096, buf, sizeof(buf));
msync(data, file_size - file_offset, MS_SYNC))
}
----------------[END mmaptest]--------------------
The output of the script looks something like this:
BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile
AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
So it is clear, that the changes done using mmap() do not survive a
remount. This can be reproduced a 100% of the time. The problem was
introduced in commit 136e8770cd ("nilfs2: fix issue of
nilfs_set_page_dirty() for page at EOF boundary").
If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it. In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.
This patch fixes the problem by also calling nilfs_set_file_dirty() if the
page has no buffers attached to it.
[akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
osb->vol_label is malloced in ocfs2_initialize_super but not freed if
error occurs or during umount, thus causing a memory leak.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Every mcount() call in the MIPS 32-bit kernel is done as follows:
[...]
move at, ra
jal _mcount
addiu sp, sp, -8
[...]
but upon returning from the mcount() function, the stack pointer
is not adjusted properly. This is explained in details in 58b69401c7
(MIPS: Function tracer: Fix broken function tracing).
Commit ad8c396936 ("MIPS: Unbreak function tracer for 64-bit kernel.)
fixed the stack manipulation for 64-bit but it didn't fix it completely
for MIPS32.
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: <stable@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7792/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Commit bbd426f542 "MIPS: Simplify FP context access" modified the
SIFROMREG & SIFROMHREG macros such that they return unsigned rather
than signed 32b integers. I had believed that to be fine, but
inadvertently missed the MFC1 & MFHC1 cases which write to a struct
pt_regs regs element. On MIPS32 this is fine, but on 64 bit those
saved regs' fields are 64 bit wide. Using unsigned values caused the
32 bit value from the FP register to be zero rather than sign extended
as the architecture specifies, causing incorrect emulation of the
MFC1 & MFHc1 instructions. Fix by reintroducing the casts to signed
integers, and therefore the sign extension.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: stable@vger.kernel.org # v3.15+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7848/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
- Revert of a recent hibernation core commit that introduced
a NULL pointer dereference during resume for at least one user
(Rafael J Wysocki).
- Fix for the ACPI LPSS (Low-Power Subsystem) driver to disable
asynchronous PM callback execution for LPSS devices during system
suspend/resume (introduced in 3.16) which turns out to break
ordering expectations on some systems. From Fu Zhonghui.
- cpufreq core fix related to the handling of sysfs nodes during
system suspend/resume that has been broken for intel_pstate
since 3.15 from Lan Tianyu.
- Restore the generation of "online" uevents for ACPI container
devices that was removed in 3.14, but some user space utilities
turn out to need them (Rafael J Wysocki).
- The cpufreq core fails to release a lock in an error code path
after changes made in 3.14. Fix from Prarit Bhargava.
- ACPICA and ACPI/GPIO fixes to make the handling of ACPI GPIO
operation regions (which means AML using GPIOs) work correctly
in all cases from Bob Moore and Srinivas Pandruvada.
- Fix for a wrong sign of the ACPI core's create_modalias() return
value in case of an error from Mika Westerberg.
- ACPI backlight blacklist entry for ThinkPad X201s from Aaron Lu.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJUJJGgAAoJEILEb/54YlRxt3kP/19OjVjGK/lFKJk4LCmQ77k5
6DDF7/clNJmYBkKBXGdyqqRVdDUXjRuHS1Yd78zWMmwdLtdOcyI+wBjG1w0mMU7o
vAYvXkIks9fCeKBRHSlqdtQROFf3+bxothKD8JGTONA5z4Fih40fqsnuSW8G7uJs
iTEQQK7L2uPJ+w1OnltwN6eNgzN5KqfxgxI+L6DhEMRjWXRHuhfRZorVIjvz+ALV
Fjm8shhjnhQKzS2zuv5PZ5gGM7zZBH7hy7kd4aDYsbppOLAB2pMOwVs0sgC1Xcbv
teyWkyzmhix2Z1bX9wwia5FfMgbnY2leejJN7mukKzHz8CQ1vxS98Sji2uviIAej
Ctp6GKjuemGvjryjbkstD6r3KYS8CuWAL++YwlamqSa0eWBuM+aD9YqGj4i6ntbU
8BFT5KXauOIsA5U51zC8wNUDHoTgBcvoN99zNIM1jIF81M7wuQrXUzJLXBStuSlR
/bDpExwxHt7I6MeUfRTjg37ApVNRAiStw32+DfsKAj4HLsqTkGs1879Paxf30T0f
Z2SlYr5Jeusu5u9DNhk7MG21A+m46R0jjLd1OKBbf2mrtfQfdKCo6szGR7vjEMZC
aGIlwtIA4iS4MN3UAyqOW3SxIPT2SxqPXzG/z27hRN5MUsGNWiClzcUsaaHoHmpp
GlbY/BvDYfur4NBeCSli
=SzQq
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These are regression fixes (ACPI hotplug, cpufreq, hibernation, ACPI
LPSS driver), fixes for stuff that never worked correctly (ACPI GPIO
support in some cases and a wrong sign of an error code in the ACPI
core in one place), and one blacklist item for ACPI backlight
handling.
Specifics:
- Revert of a recent hibernation core commit that introduced a NULL
pointer dereference during resume for at least one user (Rafael J
Wysocki).
- Fix for the ACPI LPSS (Low-Power Subsystem) driver to disable
asynchronous PM callback execution for LPSS devices during system
suspend/resume (introduced in 3.16) which turns out to break
ordering expectations on some systems. From Fu Zhonghui.
- cpufreq core fix related to the handling of sysfs nodes during
system suspend/resume that has been broken for intel_pstate since
3.15 from Lan Tianyu.
- Restore the generation of "online" uevents for ACPI container
devices that was removed in 3.14, but some user space utilities
turn out to need them (Rafael J Wysocki).
- The cpufreq core fails to release a lock in an error code path
after changes made in 3.14. Fix from Prarit Bhargava.
- ACPICA and ACPI/GPIO fixes to make the handling of ACPI GPIO
operation regions (which means AML using GPIOs) work correctly in
all cases from Bob Moore and Srinivas Pandruvada.
- Fix for a wrong sign of the ACPI core's create_modalias() return
value in case of an error from Mika Westerberg.
- ACPI backlight blacklist entry for ThinkPad X201s from Aaron Lu"
* tag 'pm+acpi-3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()"
gpio / ACPI: Use pin index and bit length
ACPICA: Update to GPIO region handler interface.
ACPI / platform / LPSS: disable async suspend/resume of LPSS devices
cpufreq: release policy->rwsem on error
cpufreq: fix cpufreq suspend/resume for intel_pstate
ACPI / scan: Correct error return value of create_modalias()
ACPI / video: disable native backlight for ThinkPad X201s
ACPI / hotplug: Generate online uevents for ACPI containers
Pull i2c fixes from Wolfram Sang:
"This is probably not the kind of pull request you want to see that
late in the cycle. Yet, the ACPI refactorization was problematic
again and caused another two issues which need fixing. My holidays
with limited internet (plus travelling) and the developer's illness
didn't help either :(
The details:
- ACPI code was refactored out into a seperate file and as a
side-effect, the i2c-core module got renamed. Jean Delvare
rightfully complained about the rename being problematic for
distributions. So, Mika and I thought the least problematic way to
deal with it is to move all the code back into the main i2c core
source file. This is mainly a huge code move with some #ifdeffery
applied. No functional code changes. Our personal tests and the
testbots did not find problems. (I was thinking about reverting,
too, yet that would also have ~800 lines changed)
- The new ACPI code also had a NULL pointer exception, thanks to
Peter for finding and fixing it.
- Mikko fixed a locking problem by decoupling clock_prepare and
clock_enable.
- Addy learnt that the datasheet was wrong and reimplemented the
frequency setup according to the new algorithm.
- Fan fixed an off-by-one error when copying data
- Janusz fixed a copy'n'paste bug which gave a wrong error message
- Sergei made sure that "don't touch" bits are not accessed"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: acpi: Fix NULL Pointer dereference
i2c: move acpi code back into the core
i2c: rk3x: fix divisor calculation for SCL frequency
i2c: mxs: fix error message in pio transfer
i2c: ismt: use correct length when copy buffer
i2c: rcar: fix RCAR_IRQ_ACK_{RECV|SEND}
i2c: tegra: Move clk_prepare/clk_set_rate to probe
Transfer Documentation maintainership to Jiri Kosina.
Thanks, Jiri.
I'll still be reviewing and working on documentation.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pm-cpufreq:
cpufreq: release policy->rwsem on error
cpufreq: fix cpufreq suspend/resume for intel_pstate
* pm-sleep:
Revert "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()"