2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-17 17:53:56 +08:00
Commit Graph

26306 Commits

Author SHA1 Message Date
Johannes Berg
18b559d5db mac80211: split TX aggregation stop action
When TX aggregation is stopped, there are a few
different cases:
 - connection with the peer was dropped
 - session stop was requested locally
 - session stop was requested by the peer
 - connection was dropped while a session is stopping

The behaviour in these cases should be different, if
the connection is dropped then the driver should drop
all frames, otherwise the frames may continue to be
transmitted, aggregated in the case of a locally
requested session stop or unaggregated in the case of
the peer requesting session stop.

Split these different cases so that the driver can
act accordingly; however, treat local and remote stop
the same way and ask the driver to not send frames as
aggregated packets any more.

In the case of connection drop, the stop callback the
driver is otherwise supposed to call is no longer
required.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:42 +01:00
Johannes Berg
30bf5f1f43 mac80211: move ieee80211_remove_tid_tx function
To call it from ___ieee80211_stop_tx_ba_session,
move the function and dependencies up.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:42 +01:00
Johannes Berg
faec12ee2d mac80211: split out aggregation TX removal
Create the function ieee80211_remove_tid_tx to call
it from ___ieee80211_stop_tx_ba_session later.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:41 +01:00
Johannes Berg
c82c4a80bb mac80211: split aggregation stop by reason
The initiator/tx doesn't really identify why an
aggregation session is stopped, give a reason
for stopping that more clearly identifies what's
going on. This will help tell the driver clearly
what is expected of it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:41 +01:00
Johannes Berg
d582cffbcd nl80211/mac80211: support full station state in AP mode
Today, stations are added already associated. That is
inefficient if, for example, the driver has no room
for stations any more because then the station will
go through the entire auth/assoc handshake, only to
be kicked out afterwards.

To address this a bit better, at least with drivers
using the new station state callback, allow hostapd
to add stations in unauthenticated mode, just after
receiving the AUTH frame, before even replying. Thus
if there's no more space at that point, it can send
a negative auth frame back. It still needs to handle
later state transition errors though, of course.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:40 +01:00
Johannes Berg
dfa674da18 cfg80211: move some AP code to right file
Some AP code ended up in mlme.c as ap.c didn't
exist when it was written, move it now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:40 +01:00
Johannes Berg
b08fbbd8ad mac80211: restrict assoc request VHT capabilities
In interoperability testing some APs showed bad behaviour
if some of the VHT capabilities of the station are better
than their own. Restrict the assoc request parameters
 - beamformee capabable,
 - RX STBC and
 - RX MCS set
to the subset that the AP can support.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:39 +01:00
Luis R. Rodriguez
0f500a5f6c cfg80211: move world roaming check for beacon hints
We should not add new beacon hints even if the wiphy
is not world roaming. Without this we were always adding
a beacon hint if not world roaming for every non world
roaming wiphy interface.

Tested-by: Ben Greear <greearb@candelatech.com>
Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[fix locking]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:38 +01:00
Luis R. Rodriguez
3195e489a8 cfg80211: move reg_is_world_roaming()
This will be used later by other code. This has no
functional change.

Tested-by: Ben Greear <greearb@candelatech.com>
Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:38 +01:00
Luis R. Rodriguez
3ebfa6e76b cfg80211: do not process beacon hints if one is already queued
Regulatory beacon hints are used to help with world roaming
and as it is right now we learn from a beacon hint processed
on one wiphy to all other wiphys. The processing of beacon
hints however is scheduled and if we have a lot of interfaces
we may hit the case that we'll queue a the same beacon hint
many times until its processed.

To avoid this do a lookup on the queued up beacon hints prior
to adding a new beacon hint. If the beacon hint is removed
from the pending reg beacon hint list then it would be processed
and we'd ensure all wiphys would have learned from it, if its
on the pending reg beacon list we'd now find it prior to it
being processed.

Tested-by: Ben Greear <greearb@candelatech.com>
Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:37 +01:00
Johannes Berg
ad2d223aa9 mac80211: assign bss_conf.bssid only once
Instead of checking every time bss_info_changed is called,
assign the pointer once depending on the interface type
and then leave it untouched until the interface type is
changed. This makes the ieee80211_bss_info_change_notify()
now a simple wrapper to call the driver only.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:37 +01:00
Johannes Berg
b8dc1a35c8 mac80211: further simplify ieee80211_bss_info_change_notify
The special case in the function isn't really needed,
instead make the suspend code a bit better and also
easier to understand and move the warning into the
driver op wrapper inline.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:36 +01:00
Johannes Berg
8da349329a mac80211: reconfig bss_info_changed only if beaconing
For AP/IBSS/mesh interfaces, call the driver to reconfigure
bss_info_changed only if the interface was beaconing before
suspend, otherwise we call the driver and it might interpret
the change as going from enabled to disabled.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:36 +01:00
Johannes Berg
d6a8322882 mac80211: track enable_beacon explicitly
Instead of calculating in ieee80211_bss_info_change_notify()
whether beaconing should be enabled or not, set it in the
correct places in the callers. This simplifies the logic in
this function at the expense of offchannel, but is also more
robust.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:35 +01:00
Johannes Berg
8a61af65c6 mac80211: fix channel context iteration
During suspend/resume channel contexts might be
iterated even if they haven't been re-added to
the driver, keep track of this and skip them in
iteration. Also use the new status for sanity
checks.

Also clarify the fact that during HW restart all
contexts are iterated over (thanks Eliad.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:35 +01:00
Johannes Berg
529ba6e931 mac80211: clean up association better in suspend
When suspending, bss_info_changed() is called to
disable beacons, but managed mode interfaces are
simply removed (bss_info_changed() is called with
"no change" only). This can lead to problems.

To fix this and copy the BSS configuration, clear
it during suspend and restore it on resume.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:34 +01:00
Johannes Berg
61e8a48cc1 mac80211: clean up ieee80211_quiesce
It's a bit odd that there's a return value that only
depends on the iftype, move that logic out of the
function into the only caller that needs it.

Also, since the quiescing could stop timers that
trigger the sdata work, move the sdata work cancel
into the function and after the actual quiesce.

Finally, there's no need to call it on interfaces
that are down, so don't.

Change-Id: I1632d46d21ba3558ea713d035184f1939905f2f1
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:33 +01:00
Emmanuel Grumbach
d45c41722a mac82011: use frame control to differentiate probe resp/beacon
The probe response/beacon management frame RX code passes a
bool parameter to differentiate beacons and probe responses.
This is useless since we have the frame and can thus use its
frame control field. Moreover it is buggy since there is one
call to ieee80211_rx_bss_info with a beacon frame that is
indicated as a probe response, which is also fixed by using
the frame control field, so do that.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:33 +01:00
Johannes Berg
cc3983d8ab mac80211: fix ieee80211_ie_build_vht_cap indentation
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:32 +01:00
Johannes Berg
9cab315190 cfg80211: adjacent 80+80 MHz channel segments are invalid
In that case, it's really a 160 MHz channel, so disallow
this configuration.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:32 +01:00
Johannes Berg
75de9113bb mac80211: optimise AP stop RCU handling
If there are VLANs, stopping an AP is inefficient as it
calls rcu_barrier() once for each interface (the VLANs
and the AP itself). Optimise this by moving rcu_barrier()
out of the station cleanups and calling it only once for
all interfaces combined.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:31 +01:00
Johannes Berg
361c9c8b0e regulatory: use IS_ERR macro family for freq_reg_info
Instead of returning an error and filling a pointer
return the pointer and an ERR_PTR value in error cases.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:31 +01:00
Johannes Berg
c492db370c regulatory: use RCU to protect last_request
This will allow making freq_reg_info() lock-free.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:30 +01:00
Johannes Berg
458f4f9e96 regulatory: use RCU to protect global and wiphy regdomains
To simplify the locking and not require cfg80211_mutex
(which nl80211 uses to access the global regdomain) and
also to make it possible for drivers to access their
wiphy->regd safely, use RCU to protect these pointers.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:29 +01:00
Johannes Berg
379b82f4c9 regulatory: pass new regdomain to reset function
Instead of assigning after calling the function do
it inside the function. This will later avoid a
period of time where the pointer is NULL.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:29 +01:00
Johannes Berg
fe7ef5e9ba regulatory: remove handling of channel bandwidth
The channel bandwidth handling isn't really quite right,
it assumes that a 40 MHz channel is really two 20 MHz
channels, which isn't strictly true. This is the way the
regulatory database handling is defined right now though
so remove the logic to handle other channel widths.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:28 +01:00
Johannes Berg
6913b49a50 regulatory: fix reg_is_valid_request handling
There's a bug with the world regulatory domain, it
can be updated any time which is different from all
other regdomains that can only be updated once after
a request for them. Fix this by adding a check for
"processed" to the reg_is_valid_request() function
and clear that when doing a request.

While looking at this I also found another locking
bug, last_request is protected by the reg_mutex not
the cfg80211_mutex so the code in nl80211 is racy.
Remove that code as it only tries to prevent an
allocation in an error case, which isn't necessary.
Then the function can also become static and locking
in nl80211 can have a smaller scope.

Also change __set_regdom() to do the checks earlier
and not different for world/other regdomains.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:28 +01:00
Johannes Berg
540f6f2cc5 regulatory: remove locking from wiphy_apply_custom_regulatory
wiphy_apply_custom_regulatory() doesn't have to hold
the regulatory mutex as it only modifies the given
wiphy with the given regulatory domain, it doesn't
access any global regulatory data.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:27 +01:00
Johannes Berg
e8da2bb4fe regulatory: clarify locking rules and assertions
Many places that currently check that cfg80211_mutex
is held don't actually use any data protected by it.
The functions that need to hold the cfg80211_mutex
are the ones using the cfg80211_regdomain variable,
so add the lock assertion to those and clarify this
in the comments.

The reason for this is that nl80211 uses the regdom
without being able to hold reg_mutex.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:27 +01:00
Johannes Berg
5d885b999c regulatory: simplify freq_reg_info_regd
The function itself has dual-purpose: it can
retrieve from a given regdomain or from the
globally installed one. Change it to have a
single purpose only: to look up from a given
regdomain. Pass the correct regdomain in the
freq_reg_info() function instead.

This also changes the locking rules for it,
no locking is required any more.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:26 +01:00
Johannes Berg
0ba857ad67 regulatory: remove useless warning
Even if it never happens and is hidden behind the
debug config option, it's completely useless: the
calltrace will only show module loading.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:26 +01:00
Johannes Berg
d4f2c8819a regulatory: remove redundant isalpha() check
toupper() only modifies lower-case letters, so
the isalpha() check is redundant; remove it.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:25 +01:00
Johannes Berg
11cff96c06 regulatory: simplify restore_regulatory_settings
Use list_splice_tail_init() and also simplify the locking.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:24 +01:00
Johannes Berg
fdc9d7b286 regulatory: remove BUG_ON
This code is a bit too BUG_ON happy, remove all
instances and while doing so make some code a bit
smarter by passing the right pointer instead of
indices into arrays.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:24 +01:00
Johannes Berg
f41737669d cfg80211: remove wiphy_idx_valid
This is pretty much useless since get_wiphy_idx()
always returns true since it's always called with
a valid wiphy pointer.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:23 +01:00
Johannes Berg
2f92212b71 regulatory: use proper enum for return values
Instead of treating special error codes specially,
like -EALREADY, introduce a real enum for all the
needed possibilities and use it.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:23 +01:00
Johannes Berg
9027b1493b regulatory: remove useless locking on exit
It would be a major problem if anything were to run
concurrently while the module is being unloaded so
remove the locking that doesn't help anything.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:22 +01:00
Johannes Berg
1a9193185f regulatory: code cleanup
Clean up various things like indentation, extra
parentheses, too many/few line breaks, etc.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:22 +01:00
Johannes Berg
75e2dba866 regulatory: simplify regulatory_hint_11d
There's no need to unlock before calling
queue_regulatory_request(), so simplify
the function.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:21 +01:00
Johannes Berg
fea9bcedce regulatory: don't test list before iterating
There's no need to test whether a list is
empty or not before iterating.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:21 +01:00
Johannes Berg
e9763c3c29 regulatory: clean up reg_copy_regd()
Use ERR_PTR/IS_ERR to return the result or errors,
also do some code cleanups.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:20 +01:00
Johannes Berg
74f53cd8d4 regulatory: clean up regdom_intersect
As the dummy_rule (also renamed from irule) is only
used for output by the reg_rules_intersect() function
there's no need to clear it at all, remove that.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:19 +01:00
Johannes Berg
82f2085630 regulatory: don't allocate too much memory
There's no need to allocate one reg rule more
than will be used, reduce the allocations. The
allocation in nl80211 already doesn't allocate
too much space.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:19 +01:00
Johannes Berg
8a57fff0c1 regulatory: don't write past array when intersecting rules
When intersecting rules, we count first to know how many
rules need to be allocated, and then do the intersection
into the allocated array. However, the code doing this
writes past the end of the array because it attempts to
do all intersections. Make it stop when the right number
of rules has been reached.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:18 +01:00
Johannes Berg
10ff57f98d mac80211: remove a bit of dead mesh code
In a file that's only built when CONFIG_MAC80211_MESH
is defined, having an #ifdef on the same is entirely
pointless, so remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:18 +01:00
Johannes Berg
051007d9e2 mac80211: optimise roaming time again
The last fixes re-added the RCU synchronize penalty
on roaming to fix the races. Split up sta_info_flush()
now to get rid of that again, and let managed mode
(and only it) delay the actual destruction.

Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:17 +01:00
Johannes Berg
09f4114e02 mac80211: warn if unexpectedly removing stations
When an interface is brought down it must have been
disconnected (or similar) in all modes other than WDS,
so warn if any stations were removed in other modes.

Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:17 +01:00
Johannes Berg
b998e8bb3e mac80211: remove final sta_info_flush()
When all interfaces have been removed, there can't
be any stations left over, so there's no need to
flush again. Remove this, and all code associated
with it, which also simplifies the function.

Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:16 +01:00
Chun-Yeow Yeoh
f4eabc918c mac80211: use short slot time in mesh for 5GHz
Use short slot time in 5GHz for mesh. The performance is
increased from 16.4Mbps to 23.4Mbps for two directly
connected mesh STAs operating in legacy rate using iperf
measurement. Almost similar to the results claimed in IBSS
mode.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
[call ieee80211_get_sdata_band() only once]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:14 +01:00
Ben Greear
bc0784d951 mac80211: Allow disabling SGI-20
This allows user-space (wpa_supplicant) to disable
short guard interval (SGI) for 20Mhz.  The SGI-40
disable option is already handled.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:01:14 +01:00
Chaitanya
09b1426e7f mac80211: fix maximum MTU
The maximum MTU shouldn't take the headers into account,
the maximum MSDU size is exactly the maximum MTU.

Signed-off-by: T Krishna Chaitanya <chaitanyatk@posedge.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:00:01 +01:00
Johannes Berg
826262c3d2 mac80211: fix dtim_period in hidden SSID AP association
When AP's SSID is hidden the BSS can appear several times in
cfg80211's BSS list: once with a zero-length SSID that comes
from the beacon, and once for each SSID from probe reponses.

Since the mac80211 stores its data in ieee80211_bss which
is embedded into cfg80211_bss, mac80211's data will be
duplicated too.

This becomes a problem when a driver needs the dtim_period
since this data exists only in the beacon's instance in
cfg80211 bss table which isn't the instance that is used
when associating.

Remove the DTIM period from the BSS table and track it
explicitly to avoid this problem.

Cc: stable@vger.kernel.org
Tested-by: Efi Tubul <efi.tubul@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:00:00 +01:00
Johannes Berg
a56f992cda mac80211: use del_timer_sync for final sta cleanup timer deletion
This is a very old bug, but there's nothing that prevents the
timer from running while the module is being removed when we
only do del_timer() instead of del_timer_sync().

The timer should normally not be running at this point, but
it's not clearly impossible (or we could just remove this.)

Cc: stable@vger.kernel.org
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 13:00:00 +01:00
Johannes Berg
97f97b1f5f mac80211: fix station destruction in AP/mesh modes
Unfortunately, commit b22cfcfcae, intended to speed up roaming
by avoiding the synchronize_rcu() broke AP/mesh modes as it moved
some code into that work item that will still call into the driver
at a time where it's no longer expected to handle this: after the
AP or mesh has been stopped.

To fix this problem remove the per-station work struct, maintain a
station cleanup list instead and flush this list when stations are
flushed. To keep this patch smaller for stable, do this when the
stations are flushed (sta_info_flush()). This unfortunately brings
back the original roaming delay; I'll fix that again in a separate
patch.

Also, Ben reported that the original commit could sometimes (with
many interfaces) cause long delays when an interface is set down,
due to blocking on flush_workqueue(). Since we now maintain the
cleanup list, this particular change of the original patch can be
reverted.

Cc: stable@vger.kernel.org [3.7]
Reported-by: Ben Greear <greearb@candelatech.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 12:59:59 +01:00
Thomas Pedersen
b7cfcd113a mac80211: RMC buckets are just list heads
The array of rmc_entrys is redundant since only the
list_head is used. Make this an array of list_heads
instead and save ~6k per vif at runtime :D

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 12:59:59 +01:00
Johannes Berg
4d76d21bd7 mac80211: assign VLAN channel contexts
Make AP_VLAN type interfaces track the AP master channel
context so they have one assigned for the various lookups.
Don't give them their own refcount etc. since they're just
slaves to the AP master.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 12:59:58 +01:00
Felix Fietkau
2d4072a547 mac80211: flush AP_VLAN stations when tearing down the BSS AP
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[change to flush stations with AP flush in second loop]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 12:59:58 +01:00
Stanislaw Gruszka
34bcf71502 mac80211: fix ibss scanning
Do not scan on no-IBSS and disabled channels in IBSS mode. Doing this
can trigger Microcode errors on iwlwifi and iwlegacy drivers.

Also rename ieee80211_request_internal_scan() function since it is only
used in IBSS mode and simplify calling it from ieee80211_sta_find_ibss().

This patch should address:
https://bugzilla.redhat.com/show_bug.cgi?id=883414
https://bugzilla.kernel.org/show_bug.cgi?id=49411

Reported-by: Jesse Kahtava <jesse_kahtava@f-m.fm>
Reported-by: Mikko Rapeli  <mikko.rapeli@iki.fi>
Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03 12:59:57 +01:00
Linus Torvalds
982197277c Merge branch 'for-3.8' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from Bruce Fields:
 "Included this time:

   - more nfsd containerization work from Stanislav Kinsbursky: we're
     not quite there yet, but should be by 3.9.

   - NFSv4.1 progress: implementation of basic backchannel security
     negotiation and the mandatory BACKCHANNEL_CTL operation.  See

       http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues

     for remaining TODO's

   - Fixes for some bugs that could be triggered by unusual compounds.
     Our xdr code wasn't designed with v4 compounds in mind, and it
     shows.  A more thorough rewrite is still a todo.

   - If you've ever seen "RPC: multiple fragments per record not
     supported" logged while using some sort of odd userland NFS client,
     that should now be fixed.

   - Further work from Jeff Layton on our mechanism for storing
     information about NFSv4 clients across reboots.

   - Further work from Bryan Schumaker on his fault-injection mechanism
     (which allows us to discard selective NFSv4 state, to excercise
     rarely-taken recovery code paths in the client.)

   - The usual mix of miscellaneous bugs and cleanup.

  Thanks to everyone who tested or contributed this cycle."

* 'for-3.8' of git://linux-nfs.org/~bfields/linux: (111 commits)
  nfsd4: don't leave freed stateid hashed
  nfsd4: free_stateid can use the current stateid
  nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
  nfsd: warn on odd reply state in nfsd_vfs_read
  nfsd4: fix oops on unusual readlike compound
  nfsd4: disable zero-copy on non-final read ops
  svcrpc: fix some printks
  NFSD: Correct the size calculation in fault_inject_write
  NFSD: Pass correct buffer size to rpc_ntop
  nfsd: pass proper net to nfsd_destroy() from NFSd kthreads
  nfsd: simplify service shutdown
  nfsd: replace boolean nfsd_up flag by users counter
  nfsd: simplify NFSv4 state init and shutdown
  nfsd: introduce helpers for generic resources init and shutdown
  nfsd: make NFSd service structure allocated per net
  nfsd: make NFSd service boot time per-net
  nfsd: per-net NFSd up flag introduced
  nfsd: move per-net startup code to separated function
  nfsd: pass net to __write_ports() and down
  nfsd: pass net to nfsd_set_nrthreads()
  ...
2012-12-20 14:04:11 -08:00
Linus Torvalds
40889e8d9f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph update from Sage Weil:
 "There are a few different groups of commits here.  The largest is
  Alex's ongoing work to enable the coming RBD features (cloning,
  striping).  There is some cleanup in libceph that goes along with it.

  Cyril and David have fixed some problems with NFS reexport (leaking
  dentries and page locks), and there is a batch of patches from Yan
  fixing problems with the fs client when running against a clustered
  MDS.  There are a few bug fixes mixed in for good measure, many of
  which will be going to the stable trees once they're upstream.

  My apologies for the late pull.  There is still a gremlin in the rbd
  map/unmap code and I was hoping to include the fix for that as well,
  but we haven't been able to confirm the fix is correct yet; I'll send
  that in a separate pull once it's nailed down."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (68 commits)
  rbd: get rid of rbd_{get,put}_dev()
  libceph: register request before unregister linger
  libceph: don't use rb_init_node() in ceph_osdc_alloc_request()
  libceph: init event->node in ceph_osdc_create_event()
  libceph: init osd->o_node in create_osd()
  libceph: report connection fault with warning
  libceph: socket can close in any connection state
  rbd: don't use ENOTSUPP
  rbd: remove linger unconditionally
  rbd: get rid of RBD_MAX_SEG_NAME_LEN
  libceph: avoid using freed osd in __kick_osd_requests()
  ceph: don't reference req after put
  rbd: do not allow remove of mounted-on image
  libceph: Unlock unprocessed pages in start_read() error path
  ceph: call handle_cap_grant() for cap import message
  ceph: Fix __ceph_do_pending_vmtruncate
  ceph: Don't add dirty inode to dirty list if caps is in migration
  ceph: Fix infinite loop in __wake_requests
  ceph: Don't update i_max_size when handling non-auth cap
  bdi_register: add __printf verification, fix arg mismatch
  ...
2012-12-20 14:00:13 -08:00
Alex Elder
c89ce05e0c libceph: register request before unregister linger
In kick_requests(), we need to register the request before we
unregister the linger request.  Otherwise the unregister will
reset the request's osd pointer to NULL.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-20 10:56:39 -06:00
Alex Elder
a978fa20fb libceph: don't use rb_init_node() in ceph_osdc_alloc_request()
The red-black node in the ceph osd request structure is initialized
in ceph_osdc_alloc_request() using rbd_init_node().  We do need to
initialize this, because in __unregister_request() we call
RB_EMPTY_NODE(), which expects the node it's checking to have
been initialized.  But rb_init_node() is apparently overkill, and
may in fact be on its way out.  So use RB_CLEAR_NODE() instead.

For a little more background, see this commit:
    4c199a93 rbtree: empty nodes have no color"

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-20 10:56:33 -06:00
Alex Elder
3ee5234df6 libceph: init event->node in ceph_osdc_create_event()
The red-black node node in the ceph osd event structure is not
initialized in create_osdc_create_event().  Because this node can
be the subject of a RB_EMPTY_NODE() call later on, we should ensure
the node is initialized properly for that.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-20 10:56:28 -06:00
Alex Elder
f407731d12 libceph: init osd->o_node in create_osd()
The red-black node node in the ceph osd structure is not initialized
in create_osd().  Because this node can be the subject of a
RB_EMPTY_NODE() call later on, we should ensure the node is
initialized properly for that.  Add a call to RB_CLEAR_NODE()
initialize it.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-20 10:56:21 -06:00
Alex Elder
28362986f8 libceph: report connection fault with warning
When a connection's socket disconnects, or if there's a protocol
error of some kind on the connection, a fault is signaled and
the connection is reset (closed and reopened, basically).  We
currently get an error message on the log whenever this occurs.

A ceph connection will attempt to reestablish a socket connection
repeatedly if a fault occurs.  This means that these error messages
will get repeatedly added to the log, which is undesirable.

Change the error message to be a warning, so they don't get
logged by default.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-20 10:56:13 -06:00
Linus Torvalds
b7dfde956d Some nice cleanups, and even a patch my wife did as a "live" demo for
Latinoware 2012.
 
 There's a slightly non-trivial merge in virtio-net, as we cleaned up the
 virtio add_buf interface while DaveM accepted the mq virtio-net patches.
 
 You can see my solution in my pending-rebases branch, if that helps, but I
 know you love merging:
 
 https://git.kernel.org/?p=linux/kernel/git/rusty/linux.git;a=commit;h=12e4e64fa66a4c812e4855de32abdb4d819526fe
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQz/vKAAoJENkgDmzRrbjx+eYQAK/egj9T8Nnth6mkzdbCFSO7
 Bciga2hDiudGCiGojTRGPRSc0VP9LgfvPbY2pxX+R9CfEqR+a8q/rRQhCS79ZwPB
 /mJy3HNiCx418HZxgwNtk6vPe0PjJm6SsjbXeB9hB+PQLCbdwA0BjpG6xjF/jitP
 noPqhhXreeQgYVxAKoFPvff/Byu2GlNnDdVMQxWRmo8hTKlTCzl0T/7BHRxthhJj
 iOrXTFzrT/osPT0zyqlngT03T4wlBvL2Bfw8d/kuRPEZ71dpIctWeH2KzdwXVCrz
 hFQGxAz4OWvW3xrNwj7c6O3SWj4VemUMjQqeA/PtRiOEI5gM0Y/Bit47dWL4wM/O
 OWUKFHzq4DFs8MmwXBgDDXl5xOjOBH9Ik4FZayn3Y7COT/B8CjFdOC2MdDGmZ9yd
 NInumg7FqP+u12g+9Vq8S/b0cfoQm4qFe8VHiPJu+jRmCZglyvLjk7oq/QwW8Gaq
 Pkzit1Ey0DWo2KvZ4D/nuXJCuhmzN/AJ10M48lLYZhtOIVg9gsa0xjhfgq4FnvSK
 xFCf3rcWnlGIXcOYh/hKU25WaCLzBuqMuSK35A72IujrQOL7OJTk4Oqote3Z3H9B
 08XJmyW6SOZdfw17X4Im1jbyuLek///xQJ9Jw/tya7j9lBt8zjJ+FmLPs4mLGEOm
 WJv9uZPs+QbIMNky2Lcb
 =myDR
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio update from Rusty Russell:
 "Some nice cleanups, and even a patch my wife did as a "live" demo for
  Latinoware 2012.

  There's a slightly non-trivial merge in virtio-net, as we cleaned up
  the virtio add_buf interface while DaveM accepted the mq virtio-net
  patches."

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (27 commits)
  virtio_console: Add support for remoteproc serial
  virtio_console: Merge struct buffer_token into struct port_buffer
  virtio: add drv_to_virtio to make code clearly
  virtio: use dev_to_virtio wrapper in virtio
  virtio-mmio: Fix irq parsing in command line parameter
  virtio_console: Free buffers from out-queue upon close
  virtio: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
  virtio_console: Use kmalloc instead of kzalloc
  virtio_console: Free buffer if splice fails
  virtio: tools: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: scsi: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: rpmsg: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: net: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: console: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: make virtqueue_add_buf() returning 0 on success, not capacity.
  virtio: console: don't rely on virtqueue_add_buf() returning capacity.
  virtio_net: don't rely on virtqueue_add_buf() returning capacity.
  virtio-net: remove unused skb_vnet_hdr->num_sg field
  virtio-net: correct capacity math on ring full
  virtio: move queue_index and num_free fields into core struct virtqueue.
  ...
2012-12-20 08:37:05 -08:00
Linus Torvalds
9eb127cc04 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Really fix tuntap SKB use after free bug, from Eric Dumazet.

 2) Adjust SKB data pointer to point past the transport header before
    calling icmpv6_notify() so that the headers are in the state which
    that function expects.  From Duan Jiong.

 3) Fix ambiguities in the new tuntap multi-queue APIs.  From Jason
    Wang.

 4) mISDN needs to use del_timer_sync(), from Konstantin Khlebnikov.

 5) Don't destroy mutex after freeing up device private in mac802154,
    fix also from Konstantin Khlebnikov.

 6) Fix INET request socket leak in TCP and DCCP, from Christoph Paasch.

 7) SCTP HMAC kconfig rework, from Neil Horman.

 8) Fix SCTP jprobes function signature, otherwise things explode, from
    Daniel Borkmann.

 9) Fix typo in ipv6-offload Makefile variable reference, from Simon
    Arlott.

10) Don't fail USBNET open just because remote wakeup isn't supported,
    from Oliver Neukum.

11) be2net driver bug fixes from Sathya Perla.

12) SOLOS PCI ATM driver bug fixes from Nathan Williams and David
    Woodhouse.

13) Fix MTU changing regression in 8139cp driver, from John Greene.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
  solos-pci: ensure all TX packets are aligned to 4 bytes
  solos-pci: add firmware upgrade support for new models
  solos-pci: remove superfluous debug output
  solos-pci: add GPIO support for newer versions on Geos board
  8139cp: Prevent dev_close/cp_interrupt race on MTU change
  net: qmi_wwan: add ZTE MF880
  drivers/net: Use of_match_ptr() macro in smsc911x.c
  drivers/net: Use of_match_ptr() macro in smc91x.c
  ipv6: addrconf.c: remove unnecessary "if"
  bridge: Correctly encode addresses when dumping mdb entries
  bridge: Do not unregister all PF_BRIDGE rtnl operations
  use generic usbnet_manage_power()
  usbnet: generic manage_power()
  usbnet: handle PM failure gracefully
  ksz884x: fix receive polling race condition
  qlcnic: update driver version
  qlcnic: fix unused variable warnings
  net: fec: forbid FEC_PTP on SoCs that do not support
  be2net: fix wrong frag_idx reported by RX CQ
  be2net: fix be_close() to ensure all events are ack'ed
  ...
2012-12-19 20:29:15 -08:00
Cong Ding
bd7790286b ipv6: addrconf.c: remove unnecessary "if"
the value of err is always negative if it goes to errout, so we don't need to
check the value of err.

Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-19 12:50:06 -08:00
Vlad Yasevich
09d7cf7d93 bridge: Correctly encode addresses when dumping mdb entries
When dumping mdb table, set the addresses the kernel returns
based on the address protocol type.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-19 12:50:06 -08:00
Vlad Yasevich
63233159fd bridge: Do not unregister all PF_BRIDGE rtnl operations
Bridge fdb and link rtnl operations are registered in
core/rtnetlink.  Bridge mdb operations are registred
in bridge/mdb.  When removing bridge module, do not
unregister ALL PF_BRIDGE ops since that would remove
the ops from rtnetlink as well.  Do remove mdb ops when
bridge is destroyed.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-19 12:50:06 -08:00
Linus Torvalds
a2faf2fc53 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull (again) user namespace infrastructure changes from Eric Biederman:
 "Those bugs, those darn embarrasing bugs just want don't want to get
  fixed.

  Linus I just updated my mirror of your kernel.org tree and it appears
  you successfully pulled everything except the last 4 commits that fix
  those embarrasing bugs.

  When you get a chance can you please repull my branch"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  userns: Fix typo in description of the limitation of userns_install
  userns: Add a more complete capability subset test to commit_creds
  userns: Require CAP_SYS_ADMIN for most uses of setns.
  Fix cap_capable to only allow owners in the parent user namespace to have caps.
2012-12-18 10:55:28 -08:00
Linus Torvalds
2d4dce0070 NFS client updates for Linux 3.8
Features include:
 
 - Full audit of BUG_ON asserts in the NFS, SUNRPC and lockd client code
   Remove altogether where possible, and replace with WARN_ON_ONCE and
   appropriate error returns where not.
 - NFSv4.1 client adds session dynamic slot table management. There is
   matching server side code that has been submitted to Bruce for
   consideration. Together, this code allows the server to dynamically
   manage the amount of memory it allocates to the duplicate request
   cache for each client. It will constantly resize those caches to
   reserve more memory for clients that are hot while shrinking caches
   for those that are quiescent.
 
 In addition, there are assorted bugfixes for the generic NFS write code,
 fixes to deal with the drop_nlink() warnings, and yet another fix for
 NFSv4 getacl.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQz8VNAAoJEGcL54qWCgDy7iYQAKbr7AAZOcZPoJigzakZ7nMi
 UKYulGbFais2Llwzw1e+U5RzmorTSbvl7/m8eS7pDf3auYw/t4xtXjKSGZUNxaE1
 q2hNKgVwodMbScYdkZXvKKNckS93oPDttrmEyzjKanqey+1E3HSklvOvikN0ihte
 B/G1OtA7Qpcr92bPrLK+PjDqarCBUI4g42dYbZOBrZnXKTRtzUqsuKPu7WjpPiof
 SHE5b1Emt7oUxgcijWGcvYCQ8voZdeSCnSksH3DgvORlutwdhUD3Yg8KyEfFZdyc
 6C59ozXRLiHkV3c+jMhJzDkQXR9bYHrnK3tlq4G8v1NdJxRktQliZeqecRvip/Wz
 rAxfE6fnPDEvKsCpZb3+5yTAt+aZwzEhRg1fFC9qfGOp+oRa+CWw5kJCyIFHwJu6
 4LOlubQAf6rnIsja1L8D0FdeqHUa1+wy61On5kgVYS5JGtoBsQHpa1zTwdOxPmsR
 2XTMYGNCEabvpKpO9+5xQbUzkFExPTesw47ygXiUuDT/snaarpV3/f05SSCaWZkX
 R8QsGEOXTIh8/S+UxARGpc7H6xi1PdBM5nBziHVzjEdHgZRF4wGFaJe2CirMjSJO
 Df5GEd5Z/8VCGWs+1w7HD5EaQ2n0wbt5daCE80Y2jRBr7NMYnY+ciF8/GktLpHsn
 Zq1bXGOdr3UZ92LXuzL9
 =G3N9
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Features include:

   - Full audit of BUG_ON asserts in the NFS, SUNRPC and lockd client
     code.  Remove altogether where possible, and replace with
     WARN_ON_ONCE and appropriate error returns where not.
   - NFSv4.1 client adds session dynamic slot table management.  There
     is matching server side code that has been submitted to Bruce for
     consideration.

     Together, this code allows the server to dynamically manage the
     amount of memory it allocates to the duplicate request cache for
     each client.  It will constantly resize those caches to reserve
     more memory for clients that are hot while shrinking caches for
     those that are quiescent.

  In addition, there are assorted bugfixes for the generic NFS write
  code, fixes to deal with the drop_nlink() warnings, and yet another
  fix for NFSv4 getacl."

* tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (106 commits)
  SUNRPC: continue run over clients list on PipeFS event instead of break
  NFS: Don't use SetPageError in the NFS writeback code
  SUNRPC: variable 'svsk' is unused in function bc_send_request
  SUNRPC: Handle ECONNREFUSED in xs_local_setup_socket
  NFSv4.1: Deal effectively with interrupted RPC calls.
  NFSv4.1: Move the RPC timestamp out of the slot.
  NFSv4.1: Try to deal with NFS4ERR_SEQ_MISORDERED.
  NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0
  NFS: Fix calls to drop_nlink()
  NFS: Ensure that we always drop inodes that have been marked as stale
  nfs: Remove unused list nfs4_clientid_list
  nfs: Remove duplicate function declaration in internal.h
  NFS: avoid NULL dereference in nfs_destroy_server
  SUNRPC handle EKEYEXPIRED in call_refreshresult
  SUNRPC set gss gc_expiry to full lifetime
  nfs: fix page dirtying in NFS DIO read codepath
  nfs: don't zero out the rest of the page if we hit the EOF on a DIO READ
  NFSv4.1: Be conservative about the client highest slotid
  NFSv4.1: Handle NFS4ERR_BADSLOT errors correctly
  nfs: don't extend writes to cover entire page if pagecache is invalid
  ...
2012-12-18 09:36:34 -08:00
chas williams - CONTRACTOR
3c4177716c atm: use scnprintf() instead of sprintf()
As reported by Chen Gang <gang.chen@asianux.com>, we should ensure there
is enough space when formatting the sysfs buffers.

Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-17 20:50:51 -08:00
Hannes Frederic Sowa
4e4b53768f netlink: validate addr_len on bind
Otherwise an out of bounds read could happen.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-17 20:50:51 -08:00
Hannes Frederic Sowa
9f1e0ad0ad netlink: change presentation of portid in procfs to unsigned
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-17 20:50:51 -08:00
J. Bruce Fields
afc59400d6 nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
It may be a matter of personal taste, but I find this makes the code
clearer.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 22:00:16 -05:00
Linus Torvalds
6a2b60b17b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace changes from Eric Biederman:
 "While small this set of changes is very significant with respect to
  containers in general and user namespaces in particular.  The user
  space interface is now complete.

  This set of changes adds support for unprivileged users to create user
  namespaces and as a user namespace root to create other namespaces.
  The tyranny of supporting suid root preventing unprivileged users from
  using cool new kernel features is broken.

  This set of changes completes the work on setns, adding support for
  the pid, user, mount namespaces.

  This set of changes includes a bunch of basic pid namespace
  cleanups/simplifications.  Of particular significance is the rework of
  the pid namespace cleanup so it no longer requires sending out
  tendrils into all kinds of unexpected cleanup paths for operation.  At
  least one case of broken error handling is fixed by this cleanup.

  The files under /proc/<pid>/ns/ have been converted from regular files
  to magic symlinks which prevents incorrect caching by the VFS,
  ensuring the files always refer to the namespace the process is
  currently using and ensuring that the ptrace_mayaccess permission
  checks are always applied.

  The files under /proc/<pid>/ns/ have been given stable inode numbers
  so it is now possible to see if different processes share the same
  namespaces.

  Through the David Miller's net tree are changes to relax many of the
  permission checks in the networking stack to allowing the user
  namespace root to usefully use the networking stack.  Similar changes
  for the mount namespace and the pid namespace are coming through my
  tree.

  Two small changes to add user namespace support were commited here adn
  in David Miller's -net tree so that I could complete the work on the
  /proc/<pid>/ns/ files in this tree.

  Work remains to make it safe to build user namespaces and 9p, afs,
  ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the
  Kconfig guard remains in place preventing that user namespaces from
  being built when any of those filesystems are enabled.

  Future design work remains to allow root users outside of the initial
  user namespace to mount more than just /proc and /sys."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits)
  proc: Usable inode numbers for the namespace file descriptors.
  proc: Fix the namespace inode permission checks.
  proc: Generalize proc inode allocation
  userns: Allow unprivilged mounts of proc and sysfs
  userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file
  procfs: Print task uids and gids in the userns that opened the proc file
  userns: Implement unshare of the user namespace
  userns: Implent proc namespace operations
  userns: Kill task_user_ns
  userns: Make create_new_namespaces take a user_ns parameter
  userns: Allow unprivileged use of setns.
  userns: Allow unprivileged users to create new namespaces
  userns: Allow setting a userns mapping to your current uid.
  userns: Allow chown and setgid preservation
  userns: Allow unprivileged users to create user namespaces.
  userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped
  userns: fix return value on mntns_install() failure
  vfs: Allow unprivileged manipulation of the mount namespace.
  vfs: Only support slave subtrees across different user namespaces
  vfs: Add a user namespace reference from struct mnt_namespace
  ...
2012-12-17 15:44:47 -08:00
J. Bruce Fields
3a28e33111 svcrpc: fix some printks
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 16:02:40 -05:00
Alex Elder
7bb21d68c5 libceph: socket can close in any connection state
A connection's socket can close for any reason, independent of the
state of the connection (and without irrespective of the connection
mutex).  As a result, the connectino can be in pretty much any state
at the time its socket is closed.

Handle those other cases at the top of con_work().  Pull this whole
block of code into a separate function to reduce the clutter.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-17 12:07:32 -06:00
Alex Elder
61c7403562 rbd: remove linger unconditionally
In __unregister_linger_request(), the request is being removed
from the osd client's req_linger list only when the request
has a non-null osd pointer.  It should be done whether or not
the request currently has an osd.

This is most likely a non-issue because I believe the request
will always have an osd when this function is called.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-17 12:07:31 -06:00
Stanislav Kinsbursky
cd6c596858 SUNRPC: continue run over clients list on PipeFS event instead of break
There are SUNRPC clients, which program doesn't have pipe_dir_name. These
clients can be skipped on PipeFS events, because nothing have to be created or
destroyed. But instead of breaking in case of such a client was found, search
for suitable client over clients list have to be continued. Otherwise some
clients could not be covered by PipeFS event handler.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: stable@vger.kernel.org [>= v3.4]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-17 12:19:16 -05:00
Alex Elder
685a7555ca libceph: avoid using freed osd in __kick_osd_requests()
If an osd has no requests and no linger requests, __reset_osd()
will just remove it with a call to __remove_osd().  That drops
a reference to the osd, and therefore the osd may have been free
by the time __reset_osd() returns.  That function offers no
indication this may have occurred, and as a result the osd will
continue to be used even when it's no longer valid.

Change__reset_osd() so it returns an error (ENODEV) when it
deletes the osd being reset.  And change __kick_osd_requests() so it
returns immediately (before referencing osd again) if __reset_osd()
returns *any* error.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-17 08:37:23 -06:00
Alex Elder
7d5f24812b ceph: don't reference req after put
In __unregister_request(), there is a call to list_del_init()
referencing a request that was the subject of a call to
ceph_osdc_put_request() on the previous line.  This is not
safe, because the request structure could have been freed
by the time we reach the list_del_init().

Fix this by reversing the order of these lines.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-off-by: Sage Weil <sage@inktank.com>
2012-12-17 08:37:19 -06:00
Linus Torvalds
2a74dbb9a8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "A quiet cycle for the security subsystem with just a few maintenance
  updates."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  Smack: create a sysfs mount point for smackfs
  Smack: use select not depends in Kconfig
  Yama: remove locking from delete path
  Yama: add RCU to drop read locking
  drivers/char/tpm: remove tasklet and cleanup
  KEYS: Use keyring_alloc() to create special keyrings
  KEYS: Reduce initial permissions on keys
  KEYS: Make the session and process keyrings per-thread
  seccomp: Make syscall skipping and nr changes more consistent
  key: Fix resource leak
  keys: Fix unreachable code
  KEYS: Add payload preparsing opportunity prior to key instantiate or update
2012-12-16 15:40:50 -08:00
Simon Arlott
df48419107 ipv6: Fix Makefile offload objects
The following commit breaks IPv6 TCP transmission for me:
	Commit 75fe83c322
	Author: Vlad Yasevich <vyasevic@redhat.com>
	Date:   Fri Nov 16 09:41:21 2012 +0000
	ipv6: Preserve ipv6 functionality needed by NET

This patch fixes the typo "ipv6_offload" which should be
"ipv6-offload".

I don't know why not including the offload modules should
break TCP. Disabling all offload options on the NIC didn't
help. Outgoing pulseaudio traffic kept stalling.

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-16 09:15:53 -08:00
Daniel Borkmann
4cb9d6eaf8 sctp: jsctp_sf_eat_sack: fix jprobes function signature mismatch
Commit 24cb81a6a (sctp: Push struct net down into all of the
state machine functions) introduced the net structure into all
state machine functions, but jsctp_sf_eat_sack was not updated,
hence when SCTP association probing is enabled in the kernel,
any simple SCTP client/server program from userspace will panic
the kernel.

Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-15 17:14:39 -08:00
Amerigo Wang
ccb1c31a7a bridge: add flags to distinguish permanent mdb entires
This patch adds a flag to each mdb entry, so that we can distinguish
permanent entries with temporary entries.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-15 17:14:39 -08:00
Neil Horman
0d0863b020 sctp: Change defaults on cookie hmac selection
Recently I posted commit 3c68198e75 which made selection of the cookie hmac
algorithm selectable.  This is all well and good, but Linus noted that it
changes the default config:
http://marc.info/?l=linux-netdev&m=135536629004808&w=2

I've modified the sctp Kconfig file to reflect the recommended way of making
this choice, using the thermal driver example specified, and brought the
defaults back into line with the way they were prior to my origional patch

Also, on Linus' suggestion, re-adding ability to select default 'none' hmac
algorithm, so we don't needlessly bloat the kernel by forcing a non-none
default.  This also led me to note that we won't honor the default none
condition properly because of how sctp_net_init is encoded.  Fix that up as
well.

Tested by myself (allbeit fairly quickly).  All configuration combinations seems
to work soundly.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: David Miller <davem@davemloft.net>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-15 17:14:38 -08:00
Trond Myklebust
1efc28780b SUNRPC: variable 'svsk' is unused in function bc_send_request
Silence a compile time warning.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15 17:05:57 -05:00
Trond Myklebust
4a20a988f7 SUNRPC: Handle ECONNREFUSED in xs_local_setup_socket
Silence the unnecessary warning "unhandled error (111) connecting to..."
and convert it to a dprintk for debugging purposes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15 17:02:29 -05:00
Eric W. Biederman
5e4a08476b userns: Require CAP_SYS_ADMIN for most uses of setns.
Andy Lutomirski <luto@amacapital.net> found a nasty little bug in
the permissions of setns.  With unprivileged user namespaces it
became possible to create new namespaces without privilege.

However the setns calls were relaxed to only require CAP_SYS_ADMIN in
the user nameapce of the targed namespace.

Which made the following nasty sequence possible.

pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
if (pid == 0) { /* child */
	system("mount --bind /home/me/passwd /etc/passwd");
}
else if (pid != 0) { /* parent */
	char path[PATH_MAX];
	snprintf(path, sizeof(path), "/proc/%u/ns/mnt");
	fd = open(path, O_RDONLY);
	setns(fd, 0);
	system("su -");
}

Prevent this possibility by requiring CAP_SYS_ADMIN
in the current user namespace when joing all but the user namespace.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-12-14 16:12:03 -08:00
Christoph Paasch
e337e24d66 inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock
If in either of the above functions inet_csk_route_child_sock() or
__inet_inherit_port() fails, the newsk will not be freed:

unreferenced object 0xffff88022e8a92c0 (size 1592):
  comm "softirq", pid 0, jiffies 4294946244 (age 726.160s)
  hex dump (first 32 bytes):
    0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00  ................
    02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8153d190>] kmemleak_alloc+0x21/0x3e
    [<ffffffff810ab3e7>] kmem_cache_alloc+0xb5/0xc5
    [<ffffffff8149b65b>] sk_prot_alloc.isra.53+0x2b/0xcd
    [<ffffffff8149b784>] sk_clone_lock+0x16/0x21e
    [<ffffffff814d711a>] inet_csk_clone_lock+0x10/0x7b
    [<ffffffff814ebbc3>] tcp_create_openreq_child+0x21/0x481
    [<ffffffff814e8fa5>] tcp_v4_syn_recv_sock+0x3a/0x23b
    [<ffffffff814ec5ba>] tcp_check_req+0x29f/0x416
    [<ffffffff814e8e10>] tcp_v4_do_rcv+0x161/0x2bc
    [<ffffffff814eb917>] tcp_v4_rcv+0x6c9/0x701
    [<ffffffff814cea9f>] ip_local_deliver_finish+0x70/0xc4
    [<ffffffff814cec20>] ip_local_deliver+0x4e/0x7f
    [<ffffffff814ce9f8>] ip_rcv_finish+0x1fc/0x233
    [<ffffffff814cee68>] ip_rcv+0x217/0x267
    [<ffffffff814a7bbe>] __netif_receive_skb+0x49e/0x553
    [<ffffffff814a7cc3>] netif_receive_skb+0x50/0x82

This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus
a single sock_put() is not enough to free the memory. Additionally, things
like xfrm, memcg, cookie_values,... may have been initialized.
We have to free them properly.

This is fixed by forcing a call to tcp_done(), ending up in
inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary,
because it ends up doing all the cleanup on xfrm, memcg, cookie_values,
xfrm,...

Before calling tcp_done, we have to set the socket to SOCK_DEAD, to
force it entering inet_csk_destroy_sock. To avoid the warning in
inet_csk_destroy_sock, inet_num has to be set to 0.
As inet_csk_destroy_sock does a dec on orphan_count, we first have to
increase it.

Calling tcp_done() allows us to remove the calls to
tcp_clear_xmit_timer() and tcp_cleanup_congestion_control().

A similar approach is taken for dccp by calling dccp_done().

This is in the kernel since 093d282321 (tproxy: fix hash locking issue
when using port redirection in __inet_inherit_port()), thus since
version >= 2.6.37.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14 13:14:07 -05:00
Duan Jiong
093d04d42f ipv6: Change skb->data before using icmpv6_notify() to propagate redirect
In function ndisc_redirect_rcv(), the skb->data points to the transport
header, but function icmpv6_notify() need the skb->data points to the
inner IP packet. So before using icmpv6_notify() to propagate redirect,
change skb->data to point the inner IP packet that triggered the sending
of the Redirect, and introduce struct rd_msg to make it easy.

Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14 13:14:07 -05:00
Konstantin Khlebnikov
1e9f954516 mac802154: fix destructon ordering for ieee802154 devices
mutex_destroy() must be called before wpan_phy_free(), because it puts the last
reference and frees memory. Catched as overwritten poison in kmalloc-2048.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-zigbee-devel@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14 13:14:07 -05:00
Ang Way Chuang
8fa45a70ba bridge: remove temporary variable for MLDv2 maximum response code computation
As suggested by Stephen Hemminger, this remove the temporary variable
introduced in commit eca2a43bb0
("bridge: fix icmpv6 endian bug and other sparse warnings")

Signed-off-by: Ang Way Chuang <wcang@sfc.wide.ad.jp>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14 13:14:06 -05:00
Linus Torvalds
8d9ea7172e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A pile of fixes in response to yesterday's big merge.  The SCTP HMAC
  thing hasn't been addressed yet, I'll take care of that myself if Neil
  and Vlad don't show signs of life by tomorrow.

   1) Use after free of SKB in tuntap code.  Fix by Eric Dumazet,
      reported by Dave Jones.

   2) NFC LLCP code emits annoying kernel log message, triggerable by
      the user.  From Dave Jones.

   3) Fix several endianness bugs noticed by sparse in the bridging
      code, from Stephen Hemminger.

   4) Ipv6 NDISC code doesn't take padding into account properly, fix
      from YOSHIFUJI Hideaki.

   5) Add missing docs to ethtool_flow_ext struct, from Yan Burman."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  bridge: fix icmpv6 endian bug and other sparse warnings
  net: ethool: Document struct ethtool_flow_ext
  ndisc: Fix padding error in link-layer address option.
  tuntap: dont use skb after netif_rx_ni(skb)
  nfc: remove noisy message from llcp_sock_sendmsg
2012-12-13 13:20:02 -08:00
Linus Torvalds
fd62c54503 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID subsystem updates from Jiri Kosina:

 1) Support for HID over I2C bus has been added by Benjamin Tissoires.
    ACPI device discovery is still in the works.

 2) Support for Win8 Multitiouch protocol is being added, most work done
    by Benjamin Tissoires as well

 3) EIO/ERESTARTSYS is fixed in hiddev/hidraw, fixes by Andrew Duggan
    and Jiri Kosina

 4) ION iCade driver added by Bastien Nocera

 5) Support for a couple new Roccat devices has been added by Stefan
    Achatz

 6) HID sensor hubs are now auto-detected instead of having to list all
    the VID/PID combinations in the blacklist array

 7) other random fixes and support for new device IDs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (65 commits)
  HID: i2c-hid: add mutex protecting open/close race
  Revert "HID: sensors: add to special driver list"
  HID: sensors: autodetect USB HID sensor hubs
  HID: hidp: fallback to input session properly if hid is blacklisted
  HID: i2c-hid: fix ret_count check
  HID: i2c-hid: fix i2c_hid_get_raw_report count mismatches
  HID: i2c-hid: remove extra .irq field in struct i2c_hid
  HID: i2c-hid: reorder allocation/free of buffers
  HID: i2c-hid: fix memory corruption due to missing hid declaration
  HID: i2c-hid: remove superfluous include
  HID: i2c-hid: remove unneeded test in i2c_hid_remove
  HID: i2c-hid: i2c_hid_get_report may fail
  HID: i2c-hid: also call i2c_hid_free_buffers in i2c_hid_remove
  HID: i2c-hid: fix error messages
  HID: i2c-hid: fix return paths
  HID: i2c-hid: remove unused static declarations
  HID: i2c-hid: fix i2c_hid_dbg macro
  HID: i2c-hid: fix checkpatch.pl warning
  HID: i2c-hid: enhance Kconfig
  HID: i2c-hid: change I2C name
  ...
2012-12-13 12:00:48 -08:00
Linus Torvalds
a2013a13e6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial branch from Jiri Kosina:
 "Usual stuff -- comment/printk typo fixes, documentation updates, dead
  code elimination."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  HOWTO: fix double words typo
  x86 mtrr: fix comment typo in mtrr_bp_init
  propagate name change to comments in kernel source
  doc: Update the name of profiling based on sysfs
  treewide: Fix typos in various drivers
  treewide: Fix typos in various Kconfig
  wireless: mwifiex: Fix typo in wireless/mwifiex driver
  messages: i2o: Fix typo in messages/i2o
  scripts/kernel-doc: check that non-void fcts describe their return value
  Kernel-doc: Convention: Use a "Return" section to describe return values
  radeon: Fix typo and copy/paste error in comments
  doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
  various: Fix spelling of "asynchronous" in comments.
  Fix misspellings of "whether" in comments.
  eisa: Fix spelling of "asynchronous".
  various: Fix spelling of "registered" in comments.
  doc: fix quite a few typos within Documentation
  target: iscsi: fix comment typos in target/iscsi drivers
  treewide: fix typo of "suport" in various comments and Kconfig
  treewide: fix typo of "suppport" in various comments
  ...
2012-12-13 12:00:02 -08:00
stephen hemminger
eca2a43bb0 bridge: fix icmpv6 endian bug and other sparse warnings
Fix the warnings reported by sparse on recent bridge multicast
changes. Mostly just rcu annotation issues but in this case
sparse found a real bug! The ICMPv6 mld2 query mrc
values is in network byte order.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-13 12:58:11 -05:00
YOSHIFUJI Hideaki / 吉藤英明
7bdc1b4aba ndisc: Fix padding error in link-layer address option.
If a natural number n exists where 2 + data_len <= 8n < 2 + data_len + pad,
post padding is not initialized correctly.

(Un)fortunately, the only type that requires pad is Infiniband,
whose pad is 2 and data_len is 20, and this logical error has not
become obvious, but it is better to fix.

Note that ndisc_opt_addr_space() handles the situation described
above correctly.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-13 12:58:11 -05:00
Dave Jones
026e43def7 nfc: remove noisy message from llcp_sock_sendmsg
This is easily triggerable when fuzz-testing as an unprivileged user.
We could rate-limit it, but given we don't print similar messages
for other protocols, I just removed it.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-13 12:58:10 -05:00
Sage Weil
83aff95eb9 libceph: remove 'osdtimeout' option
This would reset a connection with any OSD that had an outstanding
request that was taking more than N seconds.  The idea was that if the
OSD was buggy, the client could compensate by resending the request.

In reality, this only served to hide server bugs, and we haven't
actually seen such a bug in quite a while.  Moreover, the userspace
client code never did this.

More importantly, often the request is taking a long time because the
OSD is trying to recover, or overloaded, and killing the connection
and retrying would only make the situation worse by giving the OSD
more work to do.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-12-13 08:13:06 -06:00
Linus Torvalds
6be35c700f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:

1) Allow to dump, monitor, and change the bridge multicast database
   using netlink.  From Cong Wang.

2) RFC 5961 TCP blind data injection attack mitigation, from Eric
   Dumazet.

3) Networking user namespace support from Eric W. Biederman.

4) tuntap/virtio-net multiqueue support by Jason Wang.

5) Support for checksum offload of encapsulated packets (basically,
   tunneled traffic can still be checksummed by HW).  From Joseph
   Gasparakis.

6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
   Daniel Borkmann.

7) Bridge port parameters over netlink and BPDU blocking support
   from Stephen Hemminger.

8) Improve data access patterns during inet socket demux by rearranging
   socket layout, from Eric Dumazet.

9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
   Jon Maloy.

10) Update TCP socket hash sizing to be more in line with current day
    realities.  The existing heurstics were choosen a decade ago.
    From Eric Dumazet.

11) Fix races, queue bloat, and excessive wakeups in ATM and
    associated drivers, from Krzysztof Mazur and David Woodhouse.

12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
    in VXLAN driver, from David Stevens.

13) Add "oops_only" mode to netconsole, from Amerigo Wang.

14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
    allow DCB netlink to work on namespaces other than the initial
    namespace.  From John Fastabend.

15) Support PTP in the Tigon3 driver, from Matt Carlson.

16) tun/vhost zero copy fixes and improvements, plus turn it on
    by default, from Michael S. Tsirkin.

17) Support per-association statistics in SCTP, from Michele
    Baldessari.

And many, many, driver updates, cleanups, and improvements.  Too
numerous to mention individually.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
  net/mlx4_en: Add support for destination MAC in steering rules
  net/mlx4_en: Use generic etherdevice.h functions.
  net: ethtool: Add destination MAC address to flow steering API
  bridge: add support of adding and deleting mdb entries
  bridge: notify mdb changes via netlink
  ndisc: Unexport ndisc_{build,send}_skb().
  uapi: add missing netconf.h to export list
  pkt_sched: avoid requeues if possible
  solos-pci: fix double-free of TX skb in DMA mode
  bnx2: Fix accidental reversions.
  bna: Driver Version Updated to 3.1.2.1
  bna: Firmware update
  bna: Add RX State
  bna: Rx Page Based Allocation
  bna: TX Intr Coalescing Fix
  bna: Tx and Rx Optimizations
  bna: Code Cleanup and Enhancements
  ath9k: check pdata variable before dereferencing it
  ath5k: RX timestamp is reported at end of frame
  ath9k_htc: RX timestamp is reported at end of frame
  ...
2012-12-12 18:07:07 -08:00
Jiri Kosina
818b930bc1 Merge branches 'for-3.7/upstream-fixes', 'for-3.8/hidraw', 'for-3.8/i2c-hid', 'for-3.8/multitouch', 'for-3.8/roccat', 'for-3.8/sensors' and 'for-3.8/upstream' into for-linus
Conflicts:
	drivers/hid/hid-core.c
2012-12-12 21:41:55 +01:00
Andy Adamson
eb96d5c97b SUNRPC handle EKEYEXPIRED in call_refreshresult
Currently, when an RPCSEC_GSS context has expired or is non-existent
and the users (Kerberos) credentials have also expired or are non-existent,
the client receives the -EKEYEXPIRED error and tries to refresh the context
forever.  If an application is performing I/O, or other work against the share,
the application hangs, and the user is not prompted to refresh/establish their
credentials. This can result in a denial of service for other users.

Users are expected to manage their Kerberos credential lifetimes to mitigate
this issue.

Move the -EKEYEXPIRED handling into the RPC layer. Try tk_cred_retry number
of times to refresh the gss_context, and then return -EACCES to the application.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-12 15:36:02 -05:00
Andy Adamson
620038f6d2 SUNRPC set gss gc_expiry to full lifetime
Only use the default GSSD_MIN_TIMEOUT if the gss downcall timeout is zero.
Store the full lifetime in gc_expiry (not 3/4 of the lifetime) as subsequent
patches will use the gc_expiry to determine buffered WRITE behavior in the
face of expired or soon to be expired gss credentials.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-12 15:35:59 -05:00
Cong Wang
cfd5675435 bridge: add support of adding and deleting mdb entries
This patch implents adding/deleting mdb entries via netlink.
Currently all entries are temp, we probably need a flag to distinguish
permanent entries too.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12 13:02:30 -05:00
Cong Wang
37a393bc49 bridge: notify mdb changes via netlink
As Stephen mentioned, we need to monitor the mdb
changes in user-space, so add notifications via netlink too.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12 13:02:30 -05:00
YOSHIFUJI Hideaki
fd0ea7dbfa ndisc: Unexport ndisc_{build,send}_skb().
These symbols were exported for bonding device by commit 305d552a
("bonding: send IPv6 neighbor advertisement on failover").

It bacame obsolete by commit 7c899432 ("bonding, ipv4, ipv6, vlan: Handle
NETDEV_BONDING_FAILOVER like NETDEV_NOTIFY_PEERS") and removed by
commit 4f5762ec ("bonding: Remove obsolete source file 'bond_ipv6.c'").

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12 12:42:29 -05:00
Linus Torvalds
d206e09036 Merge branch 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
 "A lot of activities on cgroup side.  The big changes are focused on
  making cgroup hierarchy handling saner.

   - cgroup_rmdir() had peculiar semantics - it allowed cgroup
     destruction to be vetoed by individual controllers and tried to
     drain refcnt synchronously.  The vetoing never worked properly and
     caused good deal of contortions in cgroup.  memcg was the last
     reamining user.  Michal Hocko removed the usage and cgroup_rmdir()
     path has been simplified significantly.  This was done in a
     separate branch so that the memcg people can base further memcg
     changes on top.

   - The above allowed cleaning up cgroup lifecycle management and
     implementation of generic cgroup iterators which are used to
     improve hierarchy support.

   - cgroup_freezer updated to allow migration in and out of a frozen
     cgroup and handle hierarchy.  If a cgroup is frozen, all descendant
     cgroups are frozen.

   - netcls_cgroup and netprio_cgroup updated to handle hierarchy
     properly.

   - Various fixes and cleanups.

   - Two merge commits.  One to pull in memcg and rmdir cleanups (needed
     to build iterators).  The other pulled in cgroup/for-3.7-fixes for
     device_cgroup fixes so that further device_cgroup patches can be
     stacked on top."

Fixed up a trivial conflict in mm/memcontrol.c as per Tejun (due to
commit bea8c150a7 ("memcg: fix hotplugged memory zone oops") in master
touching code close to commit 2ef37d3fe4 ("memcg: Simplify
mem_cgroup_force_empty_list error handling") in for-3.8)

* 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (65 commits)
  cgroup: update Documentation/cgroups/00-INDEX
  cgroup_rm_file: don't delete the uncreated files
  cgroup: remove subsystem files when remounting cgroup
  cgroup: use cgroup_addrm_files() in cgroup_clear_directory()
  cgroup: warn about broken hierarchies only after css_online
  cgroup: list_del_init() on removed events
  cgroup: fix lockdep warning for event_control
  cgroup: move list add after list head initilization
  netprio_cgroup: allow nesting and inherit config on cgroup creation
  netprio_cgroup: implement netprio[_set]_prio() helpers
  netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidx
  netprio_cgroup: reimplement priomap expansion
  netprio_cgroup: shorten variable names in extend_netdev_table()
  netprio_cgroup: simplify write_priomap()
  netcls_cgroup: move config inheritance to ->css_online() and remove .broken_hierarchy marking
  cgroup: remove obsolete guarantee from cgroup_task_migrate.
  cgroup: add cgroup->id
  cgroup, cpuset: remove cgroup_subsys->post_clone()
  cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/
  cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free()
  ...
2012-12-12 08:18:24 -08:00
Eric Dumazet
1abbe1394a pkt_sched: avoid requeues if possible
With BQL being deployed, we can more likely have following behavior :

We dequeue a packet from qdisc in dequeue_skb(), then we realize target
tx queue is in XOFF state in sch_direct_xmit(), and we have to hold the
skb into gso_skb for later.

This shows in stats (tc -s qdisc dev eth0) as requeues.

Problem of these requeues is that high priority packets can not be
dequeued as long as this (possibly low prio and big TSO packet) is not
removed from gso_skb.

At 1Gbps speed, a full size TSO packet is 500 us of extra latency.

In some cases, we know that all packets dequeued from a qdisc are
for a particular and known txq :

- If device is non multi queue
- For all MQ/MQPRIO slave qdiscs

This patch introduces a new qdisc flag, TCQ_F_ONETXQUEUE to mark
this capability, so that dequeue_skb() is allowed to dequeue a packet
only if the associated txq is not stopped.

This indeed reduce latencies for high prio packets (or improve fairness
with sfq/fq_codel), and almost remove qdisc 'requeues'.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12 00:16:47 -05:00
Linus Torvalds
c6bd5bcc49 TTY/Serial merge for 3.8-rc1
Here's the big tty/serial tree set of changes for 3.8-rc1.
 
 Contained in here is a bunch more reworks of the tty port layer from Jiri and
 bugfixes from Alan, along with a number of other tty and serial driver updates
 by the various driver authors.
 
 Also, Jiri has been coerced^Wconvinced to be the co-maintainer of the TTY
 layer, which is much appreciated by me.
 
 All of these have been in the linux-next tree for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlDHhgwACgkQMUfUDdst+ynI6wCcC+YeBwncnoWHvwLAJOwAZpUL
 bysAn28o780/lOsTzp3P1Qcjvo69nldo
 =hN/g
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull TTY/Serial merge from Greg Kroah-Hartman:
 "Here's the big tty/serial tree set of changes for 3.8-rc1.

  Contained in here is a bunch more reworks of the tty port layer from
  Jiri and bugfixes from Alan, along with a number of other tty and
  serial driver updates by the various driver authors.

  Also, Jiri has been coerced^Wconvinced to be the co-maintainer of the
  TTY layer, which is much appreciated by me.

  All of these have been in the linux-next tree for a while.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

Fixed up some trivial conflicts in the staging tree, due to the fwserial
driver having come in both ways (but fixed up a bit in the serial tree),
and the ioctl handling in the dgrp driver having been done slightly
differently (staging tree got that one right, and removed both
TIOCGSOFTCAR and TIOCSSOFTCAR).

* tag 'tty-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (146 commits)
  staging: sb105x: fix potential NULL pointer dereference in mp_chars_in_buffer()
  staging/fwserial: Remove superfluous free
  staging/fwserial: Use WARN_ONCE when port table is corrupted
  staging/fwserial: Destruct embedded tty_port on teardown
  staging/fwserial: Fix build breakage when !CONFIG_BUG
  staging: fwserial: Add TTY-over-Firewire serial driver
  drivers/tty/serial/serial_core.c: clean up HIGH_BITS_OFFSET usage
  staging: dgrp: dgrp_tty.c: Audit the return values of get/put_user()
  staging: dgrp: dgrp_tty.c: Remove the TIOCSSOFTCAR ioctl handler from dgrp driver
  serial: ifx6x60: Add modem power off function in the platform reboot process
  serial: mxs-auart: unmap the scatter list before we copy the data
  serial: mxs-auart: disable the Receive Timeout Interrupt when DMA is enabled
  serial: max310x: Setup missing "can_sleep" field for GPIO
  tty/serial: fix ifx6x60.c declaration warning
  serial: samsung: add devicetree properties for non-Exynos SoCs
  serial: samsung: fix potential soft lockup during uart write
  tty: vt: Remove redundant null check before kfree.
  tty/8250 Add check for pci_ioremap_bar failure
  tty/8250 Add support for Commtech's Fastcom Async-335 and Fastcom Async-PCIe cards
  tty/8250 Add XR17D15x devices to the exar_handle_irq override
  ...
2012-12-11 14:08:47 -08:00
John W. Linville
f9c4d420c1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-12-11 16:24:55 -05:00
John W. Linville
c66cfd5325 Merge branch 'for-john' of git://git.sipsolutions.net/mac80211-next 2012-12-11 16:04:03 -05:00
Eric Dumazet
75be437230 net: gro: avoid double copy in skb_gro_receive()
__copy_skb_header(nskb, p) already copied p->cb[], no need to copy
it again.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-11 13:44:09 -05:00
Cong Wang
2ce297fc24 bridge: fix seq check in br_mdb_dump()
In case of rehashing, introduce a global variable 'br_mdb_rehash_seq'
which gets increased every time when rehashing, and assign
net->dev_base_seq + br_mdb_rehash_seq to cb->seq.

In theory cb->seq could be wrapped to zero, but this is not
easy to fix, as net->dev_base_seq is not visible inside
br_mdb_rehash(). In practice, this is rare.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-11 13:44:09 -05:00
Abhijit Pawar
a71258d79e net: remove obsolete simple_strto<foo>
This patch removes the redundant occurences of simple_strto<foo>

Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-11 12:49:53 -05:00
Eric Dumazet
89c5fa3369 net: gro: dev_gro_receive() cleanup
__napi_gro_receive() is inlined from two call sites for no good reason.

Lets move the prep stuff in a function of its own, called only if/when
needed. This saves 300 bytes on x86 :

# size net/core/dev.o.after net/core/dev.o.before
   text	   data	    bss	    dec	    hex	filename
  51968	   1238	   1040	  54246	   d3e6	net/core/dev.o.before
  51664	   1238	   1040	  53942	   d2b6	net/core/dev.o.after

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-11 12:49:53 -05:00
Trond Myklebust
7ce0171d4f Merge branch 'bugfixes' into nfs-for-next 2012-12-11 09:16:26 -05:00
Johannes Berg
8acbcddb5f minstrel: update stats after processing status
Instead of updating stats before sending a packet,
update them after processing the packet's status.
This makes minstrel in line with minstrel_ht.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-12-10 22:51:50 +01:00
Stanislav Kinsbursky
756933ee8a SUNRPC: remove redundant "linux/nsproxy.h" includes
This is a cleanup patch.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:30 -05:00
Johannes Berg
8e3c1b7743 mac80211: a few whitespace fixes
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-12-10 21:24:02 +01:00
John Fastabend
7c77ab24e3 net: Allow DCBnl to use other namespaces besides init_net
Allow DCB and net namespace to work together. This is useful if you
have containers that are bound to 'phys' interfaces that want to
also manage their DCB attributes.

The net namespace is taken from sock_net(skb->sk) of the netlink skb.

CC: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-10 14:09:01 -05:00
Abhijit Pawar
4b5511ebc7 net: remove obsolete simple_strto<foo>
This patch replace the obsolete simple_strto<foo> with kstrto<foo>

Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-10 14:09:00 -05:00
Johannes Berg
1bf3751ec9 ipv4: ip_check_defrag must not modify skb before unsharing
ip_check_defrag() might be called from af_packet within the
RX path where shared SKBs are used, so it must not modify
the input SKB before it has unshared it for defragmentation.
Use skb_copy_bits() to get the IP header and only pull in
everything later.

The same is true for the other caller in macvlan as it is
called from dev->rx_handler which can also get a shared SKB.

Reported-by: Eric Leblond <eric@regit.org>
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-10 13:51:44 -05:00
Dan Carpenter
2062cc20d0 bridge: make buffer larger in br_setlink()
We pass IFLA_BRPORT_MAX to nla_parse_nested() so we need
IFLA_BRPORT_MAX + 1 elements.  Also Smatch complains that we read past
the end of the array when in br_set_port_flag() when it's called with
IFLA_BRPORT_FAST_LEAVE.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-10 13:31:30 -05:00
Neal Cardwell
5e1f54201c inet_diag: validate port comparison byte code to prevent unsafe reads
Add logic to verify that a port comparison byte code operation
actually has the second inet_diag_bc_op from which we read the port
for such operations.

Previously the code blindly referenced op[1] without first checking
whether a second inet_diag_bc_op struct could fit there. So a
malicious user could make the kernel read 4 bytes beyond the end of
the bytecode array by claiming to have a whole port comparison byte
code (2 inet_diag_bc_op structs) when in fact the bytecode was not
long enough to hold both.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 19:00:48 -05:00
Neal Cardwell
f67caec906 inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run()
Add logic to check the address family of the user-supplied conditional
and the address family of the connection entry. We now do not do
prefix matching of addresses from different address families (AF_INET
vs AF_INET6), except for the previously existing support for having an
IPv4 prefix match an IPv4-mapped IPv6 address (which this commit
maintains as-is).

This change is needed for two reasons:

(1) The addresses are different lengths, so comparing a 128-bit IPv6
prefix match condition to a 32-bit IPv4 connection address can cause
us to unwittingly walk off the end of the IPv4 address and read
garbage or oops.

(2) The IPv4 and IPv6 address spaces are semantically distinct, so a
simple bit-wise comparison of the prefixes is not meaningful, and
would lead to bogus results (except for the IPv4-mapped IPv6 case,
which this commit maintains).

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 18:59:37 -05:00
Neal Cardwell
405c005949 inet_diag: validate byte code to prevent oops in inet_diag_bc_run()
Add logic to validate INET_DIAG_BC_S_COND and INET_DIAG_BC_D_COND
operations.

Previously we did not validate the inet_diag_hostcond, address family,
address length, and prefix length. So a malicious user could make the
kernel read beyond the end of the bytecode array by claiming to have a
whole inet_diag_hostcond when the bytecode was not long enough to
contain a whole inet_diag_hostcond of the given address family. Or
they could make the kernel read up to about 27 bytes beyond the end of
a connection address by passing a prefix length that exceeded the
length of addresses of the given family.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 18:59:37 -05:00
Neal Cardwell
1c95df85ca inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
Fix inet_diag to be aware of the fact that AF_INET6 TCP connections
instantiated for IPv4 traffic and in the SYN-RECV state were actually
created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This
means that for such connections inet6_rsk(req) returns a pointer to a
random spot in memory up to roughly 64KB beyond the end of the
request_sock.

With this bug, for a server using AF_INET6 TCP sockets and serving
IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to
inet_diag_fill_req() causing an oops or the export to user space of 16
bytes of kernel memory as a garbage IPv6 address, depending on where
the garbage inet6_rsk(req) pointed.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 18:59:37 -05:00
Ben Hutchings
65d2897c0f caif_usb: Make the driver name check more efficient
Use the device model to get just the name, rather than using the
ethtool API to get all driver information.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 00:34:02 -05:00
Ben Hutchings
406636340c caif_usb: Check driver name before reading driver state in netdev notifier
In cfusbl_device_notify(), the usbnet and usbdev variables are
initialised before the driver name has been checked.  In case the
device's driver is not cdc_ncm, this may result in reading beyond the
end of the netdev private area.  Move the initialisation below the
driver name check.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 00:34:02 -05:00
Alexander Duyck
fc70fb640b net: Handle encapsulated offloads before fragmentation or handing to lower dev
This change allows the VXLAN to enable Tx checksum offloading even on
devices that do not support encapsulated checksum offloads. The
advantage to this is that it allows for the lower device to change due
to routing table changes without impacting features on the VXLAN itself.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 00:20:28 -05:00
Joseph Gasparakis
6a674e9c75 net: Add support for hardware-offloaded encapsulation
This patch adds support in the kernel for offloading in the NIC Tx and Rx
checksumming for encapsulated packets (such as VXLAN and IP GRE).

For Tx encapsulation offload, the driver will need to set the right bits
in netdev->hw_enc_features. The protocol driver will have to set the
skb->encapsulation bit and populate the inner headers, so the NIC driver will
use those inner headers to calculate the csum in hardware.

For Rx encapsulation offload, the driver will need to set again the
skb->encapsulation flag and the skb->ip_csum to CHECKSUM_UNNECESSARY.
In that case the protocol driver should push the decapsulated packet up
to the stack, again with CHECKSUM_UNNECESSARY. In ether case, the protocol
driver should set the skb->encapsulation flag back to zero. Finally the
protocol driver should have NETIF_F_RXCSUM flag set in its features.

Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 00:20:28 -05:00
David S. Miller
ba501666fa Merge branch 'tipc_net-next_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Paul Gortmaker says:

====================
Changes since v1:
	-get rid of essentially unused variable spotted by
	 Neil Horman (patch #2)

	-drop patch #3; defer it for 3.9 content, so Neil,
	 Jon and Ying can discuss its specifics at their
	 leisure while net-next is closed.  (It had no
	 direct dependencies to the rest of the series, and
	 was just an optimization)

	-fix indentation of accept() code directly in place
	 vs. forking it out to a separate function (was patch
	 #10, now patch #9).

Rebuilt and re-ran tests just to ensure nothing odd happened.

Original v1 text follows, updated pull information follows that.

           ---------

Here is another batch of TIPC changes.  The most interesting
thing is probably the non-blocking socket connect - I'm told
there were several users looking forward to seeing this.

Also there were some resource limitation changes that had
the right intent back in 2005, but were now apparently causing
needless limitations to people's real use cases; those have
been relaxed/removed.

There is a lockdep splat fix, but no need for a stable backport,
since it is virtually impossible to trigger in mainline; you
have to essentially modify code to force the probabilities
in your favour to see it.

The rest can largely be categorized as general cleanup of things
seen in the process of getting the above changes done.

Tested between 64 and 32 bit nodes with the test suite.  I've
also compile tested all the individual commits on the chain.

I'd originally figured on this queue not being ready for 3.8, but
the extended stabilization window of 3.7 has changed that.  On
the other hand, this can still be 3.9 material, if that simply
works better for folks - no problem for me to defer it to 2013.
If anyone spots any problems then I'll definitely defer it,
rather than rush a last minute respin.
===================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-08 20:25:45 -05:00
Paul Gortmaker
0fef8f205f tipc: refactor accept() code for improved readability
In TIPC's accept() routine, there is a large block of code relating
to initialization of a new socket, all within an if condition checking
if the allocation succeeded.

Here, we simply flip the check of the if, so that the main execution
path stays at the same indentation level, which improves readability.
If the allocation fails, we jump to an already existing exit label.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 17:23:24 -05:00
Ying Xue
258f8667a2 tipc: add lock nesting notation to quiet lockdep warning
TIPC accept() call grabs the socket lock on a newly allocated
socket while holding the socket lock on an old socket. But lockdep
worries that this might be a recursive lock attempt:

  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  kworker/u:0/6 is trying to acquire lock:
  (sk_lock-AF_TIPC){+.+.+.}, at: [<c8c1226c>] accept+0x15c/0x310 [tipc]

  but task is already holding lock:
  (sk_lock-AF_TIPC){+.+.+.}, at: [<c8c12138>] accept+0x28/0x310 [tipc]

  other info that might help us debug this:
  Possible unsafe locking scenario:

          CPU0
          ----
          lock(sk_lock-AF_TIPC);
          lock(sk_lock-AF_TIPC);

          *** DEADLOCK ***

  May be due to missing lock nesting notation
  [...]

Tell lockdep that this locking is safe by using lock_sock_nested().
This is similar to what was done in commit 5131a184a3 for
SCTP code ("SCTP: lock_sock_nested in sctp_sock_migrate").

Also note that this is isn't something that is seen normally,
as it was uncovered with some experimental work-in-progress
code not yet ready for mainline.  So no need for stable
backports or similar of this commit.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 17:23:23 -05:00
Ying Xue
cbab368790 tipc: eliminate connection setup for implied connect in recv_msg()
As connection setup is now completed asynchronously in BH context,
in the function filter_connect(), the corresponding code in recv_msg()
becomes redundant.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 17:23:22 -05:00
Ying Xue
584d24b396 tipc: introduce non-blocking socket connect
TIPC has so far only supported blocking connect(), meaning that a call
to connect() doesn't return until either the connection is fully
established, or an error occurs. This has proved insufficient for many
users, so we now introduce non-blocking connect(), analogous to how
this is done in TCP and other protocols.

With this feature, if a connection cannot be established instantly,
connect() will return the error code "-EINPROGRESS".
If the user later calls connect() again, he will either have the
return code "-EALREADY" or "-EISCONN", depending on whether the
connection has been established or not.

The user must have explicitly set the socket to be non-blocking
(SOCK_NONBLOCK or O_NONBLOCK, depending on method used), so unless
for some reason they had set this already (the socket would anyway
remain blocking in current TIPC) this change should be completely
backwards compatible.

It is also now possible to call select() or poll() to wait for the
completion of a connection.

An effect of the above is that the actual completion of a connection
may now be performed asynchronously, independent of the calls from
user space. Therefore, we now execute this code in BH context, in
the function filter_rcv(), which is executed upon reception of
messages in the socket.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: minor refactoring for improved connect/disconnect function names]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 17:23:21 -05:00
Ying Xue
7e6c131e15 tipc: consolidate connection-oriented message reception in one function
Handling of connection-related message reception is currently scattered
around at different places in the code. This makes it harder to verify
that things are handled correctly in all possible scenarios.
So we consolidate the existing processing of connection-oriented
message reception in a single routine.  In the process, we convert the
chain of if/else into a switch/case for improved readability.

A cast on the socket_state in the switch is needed to avoid compile
warnings on 32 bit, like "net/tipc/socket.c:1252:2: warning: case value
‘4294967295’ not in enumerated type".  This happens because existing
tipc code pseudo extends the default linux socket state values with:

	#define SS_LISTENING    -1      /* socket is listening */
	#define SS_READY        -2      /* socket is connectionless */

It may make sense to add these as _positive_ values to the existing
socket state enum list someday, vs. these already existing defines.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: add cast to fix warning; remove returns from middle of switch]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 17:23:20 -05:00
Paul Gortmaker
bc879117d4 tipc: standardize across connect/disconnect function naming
Currently we have tipc_disconnect and tipc_disconnect_port.  It is
not clear from the names alone, what they do or how they differ.
It turns out that tipc_disconnect just deals with the port locking
and then calls tipc_disconnect_port which does all the work.

If we rename as follows: tipc_disconnect_port --> __tipc_disconnect
then we will be following typical linux convention, where:

   __tipc_disconnect: "raw" function that does all the work.

   tipc_disconnect: wrapper that deals with locking and then calls
		    the real core __tipc_disconnect function

With this, the difference is immediately evident, and locking
violations are more apt to be spotted by chance while working on,
or even just while reading the code.

On the connect side of things, we currently only have the single
"tipc_connect2port" function.  It does both the locking at enter/exit,
and the core of the work.  Pending changes will make it desireable to
have the connect be a two part locking wrapper + worker function,
just like the disconnect is already.

Here, we make the connect look just like the updated disconnect case,
for the above reason, and for consistency.  In the process, we also
get rid of the "2port" suffix that was on the original name, since
it adds no descriptive value.

On close examination, one might notice that the above connect
changes implicitly move the call to tipc_link_get_max_pkt() to be
within the scope of tipc_port_lock() protected region; when it was
not previously.  We don't see any issues with this, and it is in
keeping with __tipc_connect doing the work and tipc_connect just
handling the locking.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 17:23:19 -05:00
Jon Maloy
e643df156a tipc: change sk_receive_queue upper limit
The sk_recv_queue upper limit for connectionless sockets has empirically
turned out to be too low. When we double the current limit we get much
fewer rejected messages and no noticable negative side-effects.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 17:23:18 -05:00
Eric Dumazet
c3c7c254b2 net: gro: fix possible panic in skb_gro_receive()
commit 2e71a6f808 (net: gro: selective flush of packets) added
a bug for skbs using frag_list. This part of the GRO stack is rarely
used, as it needs skb not using a page fragment for their skb->head.

Most drivers do use a page fragment, but some of them use GFP_KERNEL
allocations for the initial fill of their RX ring buffer.

napi_gro_flush() overwrite skb->prev that was used for these skb to
point to the last skb in frag_list.

Fix this using a separate field in struct napi_gro_cb to point to the
last fragment.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-07 14:39:29 -05:00
Yuchung Cheng
93b174ad71 tcp: bug fix Fast Open client retransmission
If SYN-ACK partially acks SYN-data, the client retransmits the
remaining data by tcp_retransmit_skb(). This increments lost recovery
state variables like tp->retrans_out in Open state. If loss recovery
happens before the retransmission is acked, it triggers the WARN_ON
check in tcp_fastretrans_alert(). For example: the client sends
SYN-data, gets SYN-ACK acking only ISN, retransmits data, sends
another 4 data packets and get 3 dupacks.

Since the retransmission is not caused by network drop it should not
update the recovery state variables. Further the server may return a
smaller MSS than the cached MSS used for SYN-data, so the retranmission
needs a loop. Otherwise some data will not be retransmitted until timeout
or other loss recovery events.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-07 14:39:28 -05:00
Cong Wang
ee07c6e7a6 bridge: export multicast database via netlink
V5: fix two bugs pointed out by Thomas
    remove seq check for now, mark it as TODO

V4: remove some useless #include
    some coding style fix

V3: drop debugging printk's
    update selinux perm table as well

V2: drop patch 1/2, export ifindex directly
    Redesign netlink attributes
    Improve netlink seq check
    Handle IPv6 addr as well

This patch exports bridge multicast database via netlink
message type RTM_GETMDB. Similar to fdb, but currently bridge-specific.
We may need to support modify multicast database too (RTM_{ADD,DEL}MDB).

(Thanks to Thomas for patient reviews)

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-07 14:32:52 -05:00
Ying Xue
9da3d47587 tipc: eliminate aggregate sk_receive_queue limit
As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
global atomic counter for the sum of sk_recv_queue sizes across all
tipc sockets. When incremented, the counter is compared to an upper
threshold value, and if this is reached, the message is rejected
with error code TIPC_OVERLOAD.

This check was originally meant to protect the node against
buffer exhaustion and general CPU overload. However, all experience
indicates that the feature not only is redundant on Linux, but even
harmful. Users run into the limit very often, causing disturbances
for their applications, while removing it seems to have no negative
effects at all. We have also seen that overall performance is
boosted significantly when this bottleneck is removed.

Furthermore, we don't see any other network protocols maintaining
such a mechanism, something strengthening our conviction that this
control can be eliminated.

As a result, the atomic variable tipc_queue_size is now unused
and so it can be deleted.  There is a getsockopt call that used
to allow reading it; we retain that but just return zero for
maximum compatibility.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
[PG: phase out tipc_queue_size as pointed out by Neil Horman]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 14:19:52 -05:00
Thomas Graf
45122ca26c sctp: Add RCU protection to assoc->transport_addr_list
peer.transport_addr_list is currently only protected by sk_sock
which is inpractical to acquire for procfs dumping purposes.

This patch adds RCU protection allowing for the procfs readers to
enter RCU read-side critical sections.

Modification of the list continues to be serialized via sk_lock.

V2: Use list_del_rcu() in sctp_association_free() to be safe
    Skip transports marked dead when dumping for procfs

Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-07 14:15:04 -05:00
Thomas Graf
0b0fe913bf sctp: proc: protect bind_addr->address_list accesses with rcu_read_lock()
address_list is protected via the socket lock or RCU. Since we don't want
to take the socket lock for each assoc we dump in procfs a RCU read-side
critical section must be entered.

V2: Skip local addresses marked as dead

Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Vlad Yasevich <vyasevic@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-07 14:15:04 -05:00
David S. Miller
36f0ffa591 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
This pull request is intended for 3.8...

This includes a Bluetooth pull.  Gustavo says:

"A few more patches to 3.8, I hope they can still make it to mainline!
The most important ones are the socket option for the SCO protocol to allow
accept/refuse new connections from userspace. Other than that I added some
fixes and Andrei did more AMP work."

Also, a mac80211 pull.  Johannes says:

"If you think there's any chance this might make it still, please pull my
mac80211-next tree (per below). This contains a relatively large number
of fixes to the previous code, as well as a few small features:
 * VHT association in mac80211
 * some new debugfs files
 * P2P GO powersave configuration
 * masked MAC address verification

The biggest patch is probably the BSS struct changes to use RCU for
their IE buffers to fix potential races. I've not tagged this for stable
because it's pretty invasive and nobody has ever seen any bugs in this
area as far as I know."

Several other drivers get some attention, including ath9k, brcmfmac,
brcmsmac, and a number of others.  Also, Hauke gives us a series that
improves watchdog support for the bcma and ssb busses.  Finally, Bill
Pemberton delivers a group of "remove __dev* attributes" for wireless
drivers -- these generate some "section mismatch" warnings, but Greg
K-H assures me that they will disappear by the time -rc1 is released.

This also includes a pull of the wireless tree to avoid merge
conflicts.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-07 14:09:44 -05:00
John W. Linville
8024dc1910 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-12-07 13:03:50 -05:00