xfrm_state_clone calls kfree instead of xfrm_state_put to free
a failed state. Depending on the state of the failed state, it
can cause leaks to things like module references.
All states should be freed by xfrm_state_put past the point of
xfrm_init_state.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ipcomp_tunnel_attach fails we will call ipcomp_destroy twice.
This may lead to double-frees on certain structures.
As there is no reason to explicitly call ipcomp_destroy, this patch
removes it from ipcomp*.c and lets the standard xfrm_state destruction
take place.
This is based on the discovery and patch by Alexey Dobriyan.
Tested-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
dm: sysfs revert add empty release function to avoid debug warning
dm mpath: fix stall when requeueing io
dm raid1: fix null pointer dereference in suspend
dm raid1: fail writes if errors are not handled and log fails
dm log: userspace fix overhead_size calcuations
dm snapshot: persistent annotate work_queue as on stack
dm stripe: avoid divide by zero with invalid stripe count
Revert commit d2bb7df8ca at Greg's request.
Author: Milan Broz <mbroz@redhat.com>
Date: Thu Dec 10 23:51:53 2009 +0000
dm: sysfs add empty release function to avoid debug warning
This patch just removes an unnecessary warning:
kobject: 'dm': does not have a release() function,
it is broken and must be fixed.
The kobject is embedded in mapped device struct, so
code does not need to release memory explicitly here.
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch fixes the problem that system may stall if target's ->map_rq
returns DM_MAPIO_REQUEUE in map_request().
E.g. stall happens on 1 CPU box when a dm-mpath device with queue_if_no_path
bounces between all-paths-down and paths-up on I/O load.
When target's ->map_rq returns DM_MAPIO_REQUEUE, map_request() requeues
the request and returns to dm_request_fn(). Then, dm_request_fn()
doesn't exit the I/O dispatching loop and continues processing
the requeued request again.
This map and requeue loop can be done with interrupt disabled,
so 1 CPU system can be stalled if this situation happens.
For example, commands below can stall my 1 CPU box within 1 minute or so:
# dmsetup table mp
mp: 0 2097152 multipath 1 queue_if_no_path 0 1 1 service-time 0 1 2 8:144 1 1
# while true; do dd if=/dev/mapper/mp of=/dev/null bs=1M count=100; done &
# while true; do \
> dmsetup message mp 0 "fail_path 8:144" \
> dmsetup suspend --noflush mp \
> dmsetup resume mp \
> dmsetup message mp 0 "reinstate_path 8:144" \
> done
To fix the problem above, this patch changes dm_request_fn() to exit
the I/O dispatching loop once if a request is requeued in map_request().
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
When suspending a failed mirror, bios are completed by mirror_end_io() and
__rh_lookup() in dm_rh_dec() returns NULL where a non-NULL return value is
required by design. Fix this by not changing the state of the recovery failed
region from DM_RH_RECOVERING to DM_RH_NOSYNC in dm_rh_recovery_end().
Issue
On 2.6.33-rc1 kernel, I hit the bug when I suspended the failed
mirror by dmsetup command.
BUG: unable to handle kernel NULL pointer dereference at 00000020
IP: [<f94f38e2>] dm_rh_dec+0x35/0xa1 [dm_region_hash]
...
EIP: 0060:[<f94f38e2>] EFLAGS: 00010046 CPU: 0
EIP is at dm_rh_dec+0x35/0xa1 [dm_region_hash]
EAX: 00000286 EBX: 00000000 ECX: 00000286 EDX: 00000000
ESI: eff79eac EDI: eff79e80 EBP: f6915cd4 ESP: f6915cc4
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process dmsetup (pid: 2849, ti=f6914000 task=eff03e80 task.ti=f6914000)
...
Call Trace:
[<f9530af6>] ? mirror_end_io+0x53/0x1b1 [dm_mirror]
[<f9413104>] ? clone_endio+0x4d/0xa2 [dm_mod]
[<f9530aa3>] ? mirror_end_io+0x0/0x1b1 [dm_mirror]
[<f94130b7>] ? clone_endio+0x0/0xa2 [dm_mod]
[<c02d6bcb>] ? bio_endio+0x28/0x2b
[<f952f303>] ? hold_bio+0x2d/0x62 [dm_mirror]
[<f952f942>] ? mirror_presuspend+0xeb/0xf7 [dm_mirror]
[<c02aa3e2>] ? vmap_page_range+0xb/0xd
[<f9414c8d>] ? suspend_targets+0x2d/0x3b [dm_mod]
[<f9414ca9>] ? dm_table_presuspend_targets+0xe/0x10 [dm_mod]
[<f941456f>] ? dm_suspend+0x4d/0x150 [dm_mod]
[<f941767d>] ? dev_suspend+0x55/0x18a [dm_mod]
[<c0343762>] ? _copy_from_user+0x42/0x56
[<f9417fb0>] ? dm_ctl_ioctl+0x22c/0x281 [dm_mod]
[<f9417628>] ? dev_suspend+0x0/0x18a [dm_mod]
[<f9417d84>] ? dm_ctl_ioctl+0x0/0x281 [dm_mod]
[<c02c3c4b>] ? vfs_ioctl+0x22/0x85
[<c02c422c>] ? do_vfs_ioctl+0x4cb/0x516
[<c02c42b7>] ? sys_ioctl+0x40/0x5a
[<c0202858>] ? sysenter_do_call+0x12/0x28
Analysis
When recovery process of a region failed, dm_rh_recovery_end() function
changes the state of the region from RM_RH_RECOVERING to DM_RH_NOSYNC.
When recovery_complete() is executed between dm_rh_update_states() and
dm_writes() in do_mirror(), bios are processed with the region state,
DM_RH_NOSYNC. However, the region data is freed without checking its
pending count when dm_rh_update_states() is called next time.
When bios are finished by mirror_end_io(), __rh_lookup() in dm_rh_dec()
returns NULL even though a valid return value are expected.
Solution
Remove the state change of the recovery failed region from DM_RH_RECOVERING
to DM_RH_NOSYNC in dm_rh_recovery_end(). We can remove the state change
because:
- If the region data has been released by dm_rh_update_states(),
a new region data is created with the state of DM_RH_NOSYNC, and
bios are processed according to the DM_RH_NOSYNC state.
- If the region data has not been released by dm_rh_update_states(),
a state of the region is DM_RH_RECOVERING and bios are put in the
delayed_bio list.
The flag change from DM_RH_RECOVERING to DM_RH_NOSYNC in dm_rh_recovery_end()
was added in the following commit:
dm raid1: handle resync failures
author Jonathan Brassow <jbrassow@redhat.com>
Thu, 12 Jul 2007 16:29:04 +0000 (17:29 +0100)
http://git.kernel.org/linus/f44db678edcc6f4c2779ac43f63f0b9dfa28b724
Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
If the mirror log fails when the handle_errors option was not selected
and there is no remaining valid mirror leg, writes return success even
though they weren't actually written to any device. This patch
completes them with EIO instead.
This code path is taken:
do_writes:
bio_list_merge(&ms->failures, &sync);
do_failures:
if (!get_valid_mirror(ms)) (false)
else if (errors_handled(ms)) (false)
else bio_endio(bio, 0);
The logic in do_failures is based on presuming that the write was already
tried: if it succeeded at least on one leg (without handle_errors) it
is reported as success.
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=555197
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch fixes two bugs that revolve around the miscalculation and
misuse of the variable 'overhead_size'. 'overhead_size' is the size of
the various header structures used during communication.
The first bug is the use of 'sizeof' with the pointer of a structure
instead of the structure itself - resulting in the wrong size being
computed. This is then used in a check to see if the payload
(data_size) would be to large for the preallocated structure. Since the
bug produces a smaller value for the overhead, it was possible for the
structure to be breached. (Although the current users of the code do
not currently send enough data to trigger this bug.)
The second bug is that the 'overhead_size' value is used to compute how
much of the preallocated space should be cleared before populating it
with fresh data. This should have simply been 'sizeof(struct cn_msg)'
not overhead_size. The fact that 'overhead_size' was computed
incorrectly made this problem "less bad" - leaving only a pointer's
worth of space at the end uncleared. Thus, this bug was never producing
a bad result, but still needs to be fixed - especially now that the
value is computed correctly.
Cc: stable@kernel.org
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
chunk_io() declares its 'struct mdata_req' on the stack and then
initializes its 'struct work_struct' member. Annotate the
initialization of this workqueue with INIT_WORK_ON_STACK to suppress a
debugobjects warning seen when CONFIG_DEBUG_OBJECTS_WORK is enabled.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
If a table containing zero as stripe count is passed into stripe_ctr
the code attempts to divide by zero.
This patch changes DM_TABLE_LOAD to return -EINVAL if the stripe count
is zero.
We now get the following error messages:
device-mapper: table: 253:0: striped: Invalid stripe count
device-mapper: ioctl: error adding target to table
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
The 64-bit version of ELF_PLAT_INIT() clears TIF_IA32, but at this point
it has already been cleared by SET_PERSONALITY == set_personality_64bit.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to have the WUS register set to all 1's in order for the hardware
to be capable of ever waking up. Set it here in the ixgbe_probe().
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 82598 has an erratum that receipt of pause frames at 1G
could lead to a Tx Hang. To avoid this this patch disables
Rx FC while at 1G speed for all 82598 parts.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_ethtool() is currently using 604 bytes of stack, even with gcc-4.4.2
objdump -d vmlinux | scripts/checkstack.pl
...
0xc04bbc33 dev_ethtool [vmlinux]: 604
...
Adding noinline attributes to selected functions can reduce stack usage.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These changes add MDIO and phy lib support to the driver as the
IP core now supports the MDIO bus.
The MDIO bus and phy are added as a child to the emaclite in the device
tree as illustrated below.
mdio {
#address-cells = <1>;
#size-cells = <0>;
phy0: phy@7 {
compatible = "marvell,88e1111";
reg = <7>;
} ;
}
Signed-off-by: Sadanand Mutyala <Sadanand.Mutyala@xilinx.com>
Signed-off-by: John Linn <john.linn@xilinx.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Addresses should be all digits.
Stops x25_bind using addresses containing characters.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
alloc_socket failures should return -ENOBUFS
a bad protocol should return -EINVAL
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code has been tested on IBM pSeries server.
Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Observed similar behavior on SPD as previouly seen on SAD flushing..
This fixes it.
cheers,
jamal
commit 428b20432dc31bc2e01a94cd451cf5a2c00d2bf4
Author: Jamal Hadi Salim <hadi@cyberus.ca>
Date: Thu Feb 11 05:49:38 2010 -0500
xfrm: Flushing empty SPD generates false events
To see the effect make sure you have an empty SPD.
On window1 "ip xfrm mon" and on window2 issue "ip xfrm policy flush"
You get prompt back in window1 and you see the flush event on window2.
With this fix, you still get prompt on window1 but no event on window2.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
To see the effect make sure you have an empty SAD.
-On window1 "ip xfrm mon"
-on window2 issue "ip xfrm state flush"
You get prompt back in window1
and you see the flush event on window2.
With this fix, you still get prompt on window1 but no
event on window2.
I was tempted to return -ESRCH on window1 (which would
show "RTNETLINK answers: No such process") but didnt want
to change current behavior.
cheers,
jamal
commit 5f3dd4a772326166e1bcf54acc2391df00dc7ab5
Author: Jamal Hadi Salim <hadi@cyberus.ca>
Date: Thu Feb 11 04:41:36 2010 -0500
xfrm: Flushing empty SAD generates false events
To see the effect make sure you have an empty SAD.
On window1 "ip xfrm mon" and on window2 issue "ip xfrm state flush"
You get prompt back in window1 and you see the flush event on window2.
With this fix, you still get prompt on window1 but no event on window2.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
When no more memory can be allocated, fq_find() will return NULL and
increase the value of IPSTATS_MIB_REASMFAILS. In this case,
ipv6_frag_rcv() also increase the value of IPSTATS_MIB_REASMFAILS.
So, the patch deletes redundant counter of IPSTATS_MIB_REASMFAILS in fq_find().
and deletes the unused parameter of idev.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RCU usage in the original code was broken because
there are cases where we possibly sleep with rcu_read_lock
held. As a fix, change the macvtap_file_get_queue to
get a reference on the socket and the netdev instead of
taking the full rcu_read_lock.
Also, change macvtap_file_get_queue failure case to
not require a subsequent macvtap_file_put_queue, as
pointed out by Ed Swierk.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Ed Swierk <eswierk@aristanetworks.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver is expected to report that the link is up
when the phy Rx signal is established and the mac
has not detected a link fault.
The code is however broken, the driver does not check the link fault
status when the phy link status changes.
The link fault status being checked within a short period of time,
it leads to link up/link down events.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mac is expected to auto-inflate the Maximum Frame size for VLAN
tagged frames. It however does not work with jumbo frames.
Work around the bug adding 4 to the Maximum Frame for MTUs
greater than 1536.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent n-tuple patches added some comments to the headers
of the Flow Director functions that aren't accurate. This
cleans them up, and is a purely cosmetic patch.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
set_flags should check if the underlying device supports
n-tuple filter programming before setting the device flags
on the netdevice.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can allow a filter to be added successfully to the underlying
hardware, but still return an error if the cached list memory
allocation fails. This patch fixes that condition.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
firewire: ohci: retransmit isochronous transmit packets on cycle loss
firewire: net: fix panic in fwnet_write_complete
The cached read and write paths initialize fattr->time_start in their
setup procedures. The value of fattr->time_start is propagated to
read_cache_jiffies by nfs_update_inode(). Subsequent calls to
nfs_attribute_timeout() will then use a good time stamp when
computing the attribute cache timeout, and squelch unneeded GETATTR
calls.
Since the direct I/O paths erroneously leave the inode's
fattr->time_start field set to zero, read_cache_jiffies for that inode
is set to zero after any direct read or write operation. This
triggers an otw GETATTR or ACCESS call to update the file's attribute
and access caches properly, even when the NFS READ or WRITE replies
have usable post-op attributes.
Make sure the direct read and write setup code performs the same fattr
initialization as the cached I/O paths to prevent unnecessary GETATTR
calls.
This was likely introduced by commit 0e574af1 in 2.6.15, which appears
to add new nfs_fattr_init() call sites in the cached read and write
paths, but not in the equivalent places in fs/nfs/direct.c. A
subsequent commit in the same series, 33801147, introduces the
fattr->time_start field.
Interestingly, the direct write reschedule path already has a call to
nfs_fattr_init() in the right place.
Reported-by: Quentin Barnes <qbarnes@yahoo-inc.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: make sure retry count increases.
drm/radeon/kms/atom: use get_unaligned_le32() for ctx->ps
drm/ttm: Fix a bug occuring when validating a buffer object in a range.
drm: Fix a bug in the range manager.
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf top: Fix help text alignment
perf: Fix hypervisor sample reporting
perf: Make bp_len type to u64 generic across the arch
with 32 bit userland and 64 bit kernels, it is unlikely but possible
that insertion of new rules fails even tough there are only about 2000
iptables rules.
This happens because the compat delta is using a short int.
Easily reproducible via "iptables -m limit" ; after about 2050
rules inserting new ones fails with -ELOOP.
Note that compat_delta included 2 bytes of padding on x86_64, so
structure size remains the same.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This will cause trouble once CONFIG_COMPAT support is added to ebtables.
xt_compat_*_offset() calculate the kernel/userland structure size delta
using:
XT_ALIGN(size) - COMPAT_XT_ALIGN(size)
If the match/target sizes are aligned at registration time,
delta is always zero.
Should have zero effect for existing systems: xtables uses
XT_ALIGN() whenever it deals with match/target sizes.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
next_offset must be > 0, otherwise this loops forever.
The offset also contains the size of the ebt_entry structure
itself, so anything smaller is invalid.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Normally, each connection needs a unique identity. Conntrack zones allow
to specify a numerical zone using the CT target, connections in different
zones can use the same identity.
Example:
iptables -t raw -A PREROUTING -i veth0 -j CT --zone 1
iptables -t raw -A OUTPUT -o veth1 -j CT --zone 1
Signed-off-by: Patrick McHardy <kaber@trash.net>
The error handlers might need the template to get the conntrack zone
introduced in the next patches to perform a conntrack lookup.
Signed-off-by: Patrick McHardy <kaber@trash.net>
The MSI blacklist entry for ASUS mobo added in the commit
8ce28d6abf was based on the alsa-info
output wrongly posted. Fix the id to the right one now.
Reported-by: Sid Boyce <sboyce@blueyonder.co.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>