From: Martin Willi <martin@strongswan.org>
An IPsec node speaking IKEv2 MUST accept incoming UDP encapsulated
ESP packets, even if no NAT situation is detected. This is important
if MOBIKE is in use. Some implementation keep the encapsulation
mode if they move out of a NAT situation.
Signed-off-by: David S. Miller <davem@davemloft.net>
Installing SAs using the XFRM_STATE_AF_UNSPEC fails on hosts with
support for one address family only. This patch accepts such SAs, even
if the processing of not supported packets will fail.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Used __xfrm_policy_unlink() to instead of the dup codes when unlink
SPD entry.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After flush the SPD entries, dump the SPD entries will cause kernel painc.
Used the following commands to reproduct:
- echo 'spdflush;' | setkey -c
- echo 'spdadd 3ffe:501:ffff:ff01::/64 3ffe:501:ffff:ff04::/64 any -P out ipsec \
ah/tunnel/3ffe:501:ffff:ff00:200:ff:fe00:b0b0-3ffe:501:ffff:ff02:200:ff:fe00:a1a1/require;\
spddump;' | setkey -c
- echo 'spdflush; spddump;' | setkey -c
- echo 'spdadd 3ffe:501:ffff:ff01::/64 3ffe:501:ffff:ff04::/64 any -P out ipsec \
ah/tunnel/3ffe:501:ffff:ff00:200:ff:fe00:b0b0-3ffe:501:ffff:ff02:200:ff:fe00:a1a1/require;\
spddump;' | setkey -c
This is because when flush the SPD entries, the SPD entry is not remove
from the list.
This patch fix the problem by remove the SPD entry from the list.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make
net.core.xfrm_aevent_etime
net.core.xfrm_acq_expires
net.core.xfrm_aevent_rseqth
net.core.xfrm_larval_drop
sysctls per-netns.
For that make net_core_path[] global, register it to prevent two
/proc/net/core antries and change initcall position -- xfrm_init() is called
from fs_initcall, so this one should be fs_initcall at least.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SA/SPD doesn't pin netns (and it shouldn't), so get rid of them by hand.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SA and SPD flush are executed with NULL SA and SPD respectively, for
these cases pass netns explicitly from userspace socket.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grab netns either from netlink socket, state or policy.
SA and SPD flush are in init_net for now, this requires little
attention, see below.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass netns pointer to struct xfrm_policy_afinfo::garbage_collect()
[This needs more thoughts on what to do with dst_ops]
[Currently stub to init_net]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns
to flow_cache_lookup() and resolver callback.
Take it from socket or netdevice. Stub DECnet to init_net.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add netns parameter to xfrm_policy_bysel_ctx(), xfrm_policy_byidx().
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Per-netns hashes are independently resizeable.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Again, to avoid complications with passing netns when not necessary.
Again, ->xp_net is set-once field, once set it never changes.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Disallow spurious wakeups in __xfrm_lookup().
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
State GC is per-netns, and this is part of it.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
km_waitq is going to be made per-netns to disallow spurious wakeups
in __xfrm_lookup().
To not wakeup after every garbage-collected xfrm_state (which potentially
can be from different netns) make state GC list per-netns.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All of this is implicit passing which netns's hashes should be resized.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since hashtables are per-netns, they can be independently resized.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is done to get
a) simple "something leaked" check
b) cover possible DoSes when other netns puts many, many xfrm_states
onto a list.
c) not miss "alien xfrm_state" check in some of list iterators in future.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid unnecessary complications with passing netns around.
* set once, very early after allocating
* once set, never changes
For a while create every xfrm_state in init_net.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 2518c7c2b3 ("[XFRM]: Hash
policies when non-prefixed."), the last use of xfrm_gen_policy() first
argument was removed, but the argument was left behind in the
prototype.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Alexey Dobriyan <adobriyan@gmail.com>
Based upon a lockdep trace by Simon Arlott.
xfrm_policy_kill() can be called from both BH and
non-BH contexts, so we have to grab xfrm_policy_gc_lock
with BH disabling.
Signed-off-by: David S. Miller <davem@davemloft.net>
While adding support for MIGRATE/KMADDRESS in strongSwan (as specified
in draft-ebalard-mext-pfkey-enhanced-migrate-00), Andreas Steffen
noticed that XFRMA_KMADDRESS attribute passed to userland contains the
local address twice (remote provides local address instead of remote
one).
This bug in copy_to_user_kmaddress() affects only key managers that use
native XFRM interface (key managers that use PF_KEY are not affected).
For the record, the bug was in the initial changeset I posted which
added support for KMADDRESS (13c1d18931
'xfrm: MIGRATE enhancements (draft-ebalard-mext-pfkey-enhanced-migrate)').
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Reported-by: Andreas Steffen <andreas.steffen@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I noticed that, under certain conditions, ESRCH can be leaked from the
xfrm layer to user space through sys_connect. In particular, this seems
to happen reliably when the kernel fails to resolve a template either
because the AF_KEY receive buffer being used by racoon is full or
because the SA entry we are trying to use is in XFRM_STATE_EXPIRED
state.
However, since this could be a transient issue it could be argued that
EAGAIN would be more appropriate. Besides this error code is not even
documented in the man page for sys_connect (as of man-pages 3.07).
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new_mapping() implementation to the netlink xfrm_mgr to notify
address/port changes detected in UDP encapsulated ESP packets.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provides implementation of the enhancements of XFRM/PF_KEY MIGRATE mechanism
specified in draft-ebalard-mext-pfkey-enhanced-migrate-00. Defines associated
PF_KEY SADB_X_EXT_KMADDRESS extension and XFRM/netlink XFRMA_KMADDRESS
attribute.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu came up with the idea and the original patch to make
xfrm_state dump list contain also dumpers:
As it is we go to extraordinary lengths to ensure that states
don't go away while dumpers go to sleep. It's much easier if
we just put the dumpers themselves on the list since they can't
go away while they're going.
I've also changed the order of addition on new states to prevent
a never-ending dump.
Timo Teräs improved the patch to apply cleanly to latest tree,
modified iteration code to be more readable by using a common
struct for entries in the list, implemented the same idea for
xfrm_policy dumping and moved the af_key specific "last" entry
caching to af_key.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
We're never supposed to shrink the headroom or tailroom. In fact,
shrinking the headroom is a fatal action.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As discovered by Timo Teräs, the currently xfrm_state_walk scheme
is racy because if a second dump finishes before the first, we
may free xfrm states that the first dump would walk over later.
This patch fixes this by storing the dumps in a list in order
to calculate the correct completion counter which cures this
problem.
I've expanded netlink_cb in order to accomodate the extra state
related to this. It shouldn't be a big deal since netlink_cb
is kmalloced for each dump and we're just increasing it by 4 or
8 bytes.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we save states within a walk we need synchronisation
so that the list the saved state is on doesn't disappear from
under us.
As it stands this is done by keeping the state on the list which
is bad because it gets in the way of the management of the state
life-cycle.
An alternative is to make our own pseudo-RCU system where we use
counters to indicate which state can't be freed immediately as
it may be referenced by an ongoing walk when that resumes.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous default behavior is definitely the least user
friendly. Hanging there forever just because the keying
daemon is wedged or the refreshing of the policy can't move
forward is anti-social to say the least.
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit commit 4c563f7669 ("[XFRM]:
Speed up xfrm_policy and xfrm_state walking") inadvertently removed
larval states and socket policies from netlink dumps. This patch
restores them.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ever since commit 4c563f7669
("[XFRM]: Speed up xfrm_policy and xfrm_state walking") it is
illegal to call __xfrm_state_destroy (and thus xfrm_state_put())
with xfrm_state_lock held. If we do, we'll deadlock since we
have the lock already and __xfrm_state_destroy() tries to take
it again.
Fix this by pushing the xfrm_state_put() calls after the lock
is dropped.
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of error, the function xfrm_bundle_create returns an ERR
pointer, but never returns a NULL pointer. So a NULL test that comes
after an IS_ERR test should be deleted.
The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@match_bad_null_test@
expression x, E;
statement S1,S2;
@@
x = xfrm_bundle_create(...)
... when != x = E
* if (x != NULL)
S1 else S2
// </smpl>
Signed-off-by: Julien Brunel <brunel@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Small fix removing an unnecessary intermediate variable.
Signed-off-by: Jean-Christophe DUBOIS <jcd@tribudubois.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to future.
I could make at least one BUILD_BUG_ON conversion.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
When decompressing extremely large packets allocating them through
kmalloc is prone to failure. Therefore it's better to use page
frags instead.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch merges the IPv4/IPv6 IPComp implementations since most
of the code is identical. As a result future enhancements will no
longer need to be duplicated.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without CONFIG_NET_NS, namespace is always &init_net.
Compiler will be able to omit namespace comparisons with this patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a XFRM_STATE_AF_UNSPEC flag to handle the AF_UNSPEC behavior for
the selector family. Userspace applications can set this flag to leave
the selector family of the xfrm_state unspecified. This can be used
to to handle inter family tunnels if the selector is not set from
userspace.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the usage of RIPEMD-160 in xfrm_algo which in turn
allows hmac(rmd160) to be used as authentication mechanism in IPsec
ESP and AH (see RFC 2857).
Signed-off-by: Adrian-Ken Rueegsegger <rueegsegger@swiss-it.ch>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The crypto layer will determine whether that is valid
or not.
Suggested by Herbert Xu, based upon a report and patch
by Martin Willi.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds needed_headroom/needed_tailroom members to struct
net_device and updates many places that allocate sbks to use them. Not
all of them can be converted though, and I'm sure I missed some (I
mostly grepped for LL_RESERVED_SPACE)
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'audit.b50' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
[PATCH] new predicate - AUDIT_FILETYPE
[patch 2/2] Use find_task_by_vpid in audit code
[patch 1/2] audit: let userspace fully control TTY input auditing
[PATCH 2/2] audit: fix sparse shadowed variable warnings
[PATCH 1/2] audit: move extern declarations to audit.h
Audit: MAINTAINERS update
Audit: increase the maximum length of the key field
Audit: standardize string audit interfaces
Audit: stop deadlock from signals under load
Audit: save audit_backlog_limit audit messages in case auditd comes back
Audit: collect sessionid in netlink messages
Audit: end printk with newline
Some drivers have duplicated unlikely() macros. IS_ERR() already has
unlikely() in itself.
This patch cleans up such pointless code.
Signed-off-by: Hirofumi Nakagawa <hnakagawa@miraclelinux.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jeff Garzik <jeff@garzik.org>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Previously I added sessionid output to all audit messages where it was
available but we still didn't know the sessionid of the sender of
netlink messages. This patch adds that information to netlink messages
so we can audit who sent netlink messages.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Previously digest_null had no setkey function which meant that
we used hmac(digest_null) for IPsec since IPsec always calls
setkey. Now that digest_null has a setkey we no longer need to
do that.
In fact when only confidentiality is specified for ESP we already
use digest_null directly. However, when the null algorithm is
explicitly specified by the user we still opt for hmac(digest_null).
This patch removes this discrepancy. I have not added a new compat
name for it because by chance it wasn't actualy possible for the user
to specify the name hmac(digest_null) due to a key length check in
xfrm_user (which I found out when testing that compat name :)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flowlabel text format was not correct and thus ambiguous.
For example, 0x00123 or 0x01203 are formatted as 0x123.
This is not what audit tools want.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As it stands it's impossible to use any authentication algorithms
with an ID above 31 portably. It just happens to work on x86 but
fails miserably on ppc64.
The reason is that we're using a bit mask to check the algorithm
ID but the mask is only 32 bits wide.
After looking at how this is used in the field, I have concluded
that in the long term we should phase out state matching by IDs
because this is made superfluous by the reqid feature. For current
applications, the best solution IMHO is to allow all algorithms when
the bit masks are all ~0.
The following patch does exactly that.
This bug was identified by IBM when testing on the ppc64 platform
using the NULL authentication algorithm which has an ID of 251.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_SECURITY_NETWORK_XFRM is undefined the following warnings appears:
net/xfrm/xfrm_user.c: In function 'xfrm_add_pol_expire':
net/xfrm/xfrm_user.c:1576: warning: 'ctx' may be used uninitialized in this function
net/xfrm/xfrm_user.c: In function 'xfrm_get_policy':
net/xfrm/xfrm_user.c:1340: warning: 'ctx' may be used uninitialized in this function
(security_xfrm_policy_alloc is noop for the case).
It seems that they are result of the commit
03e1ad7b5d ("LSM: Make the Labeled IPsec
hooks more stack friendly")
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The xfrm_get_policy() and xfrm_add_pol_expire() put some rather large structs
on the stack to work around the LSM API. This patch attempts to fix that
problem by changing the LSM API to require only the relevant "security"
pointers instead of the entire SPD entry; we do this for all of the
security_xfrm_policy*() functions to keep things consistent.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit df9dcb45 ([IPSEC]: Fix inter address family IPsec tunnel handling)
broke openswan by removing the selector initialization for tunnel mode
in case it is uninitialized.
This patch restores the initialization, fixing openswan, but probably
breaking inter-family tunnels again (unknown since the patch author
disappeared). The correct thing for inter-family tunnels is probably
to simply initialize the selector family explicitly.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>