linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-11 04:54:13 +08:00

Author	SHA1	Message	Date
Marcel Holtmann	1df7b17a87	Bluetooth: Simplify check if L2CAP connection is AMP capable The check if a L2CAP connection is AMP capable was a little bit complicated. This changes the code to make it simpler and more readable. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-06 10:26:31 +02:00
Marcel Holtmann	80d58d0b5b	Bluetooth: Move hci_amp_capable() function into L2CAP core The hci_amp_capable() function has only a single user inside the L2CAP core. Instead of exporting the function, place it next to its user. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-06 10:25:58 +02:00
Marcel Holtmann	23f0cb41a2	Bluetooth: Remove check for number of AMP controller The number of controllers for the AMP discover response has already been calculated. And since the hci_dev_list lock is held, it can not change. So there is no need for any extra checks. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-06 10:25:30 +02:00
Marcel Holtmann	346e7099c2	Bluetooth: Remove pointless inline function The inline function for BR/EDR controller AMP discover response info is rather useless. Just include the code into the function that builds the whole response. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-06 10:25:07 +02:00
Marcel Holtmann	536619e86d	Bluetooth: Rename AMP status constants and use them The AMP controller status constants need to be actually used to avoid crypted hardcoded numbers. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-06 10:24:30 +02:00
Marcel Holtmann	6ed971ca4f	Bluetooth: Use explicit AMP controller id value for BR/EDR The special AMP controller id 0 is reserved for the BR/EDR controller that has the main link. It is a fixed value and so use a constant for this throughout the code to make it more visible when the handling is for the BR/EDR channel or when it is for the AMP channel. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-06 10:23:39 +02:00
Marcel Holtmann	ece6912648	Bluetooth: Separate AMP controller type from HCI device type There are two defined HCI device types. One is for BR/EDR controllers and the other is for AMP controllers. The HCI device type is not the same as the AMP controller type. It just happens that currently the defined types match, but that is not guaranteed. Split the usage of AMP controller type into its own domain so that it is possible to separate between BR/EDR controllers, 802.11 AMP controllers and any other AMP technology that might be defined in the future. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-06 10:22:38 +02:00
Marcel Holtmann	f822c411b2	Bluetooth: Remove useless external function to count controllers The list of controllers can be counted ahead of time and inline inside the AMP discover handling. There is no need to export such a function at all. In addition just count the AMP controller and only allocated space for a single mandatory BR/EDR controller. No need to allocate more space than needed. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-06 10:21:09 +02:00
Marcel Holtmann	23b9003b9a	Bluetooth: Fix controller list for AMP discover response The AMP discover response should list exactly one BR/EDR controller and ignore all other BR/EDR controller. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-06 10:19:53 +02:00
Johan Hedberg	2210246cf5	Bluetooth: Fix re-enabling advertising after a connection LE controllers will automatically disable advertising whenever they accept a new connection. In order not to fall out of sync with the advertising setting we need to re-enable advertising whenever the last LE connection drops. A failure to re-enable advertising should cause the setting to be disabled, so this patch also calls mgmt_new_settings() when this happens. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2013-10-05 03:03:38 -07:00
Johan Hedberg	d2f5a196d7	Bluetooth: Add public mgmt function to send New Settings event A function is needed so that the HCI event processing can ask the mgmt code to emit a new settings event. This is necessary e.g. when the event processing does updates to mgmt related states without any dependency of actual mgmt commands. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2013-10-05 03:03:38 -07:00
Johan Hedberg	f3d3444a4d	Bluetooth: Rename HCI_LE_PERIPHERAL to HCI_ADVERTISING This flag is used to indicate whether we want to have advertising enabled or not, so give it a more suitable name. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2013-10-05 03:03:38 -07:00
Andre Guedes	46a190cbd3	Bluetooth: Initialize hci_conn fields in hci_connect_le This patch moves some hci_conn fields initialization from hci_le_ create_connection() to hci_connect_le(). It makes more sense to initialize these fields within the function that creates the hci_ conn object. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2013-10-03 23:07:17 -07:00
Andre Guedes	f1e5d54743	Bluetooth: Rename hci_conn variable in hci_connect_le() This patch simply rename the hci_conn variable "le" to "conn" since it is a better name. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2013-10-03 23:07:17 -07:00
Marcel Holtmann	4f3e219d95	Bluetooth: Only one command per L2CAP LE signalling is supported The Bluetooth specification makes it clear that only one command should be present in the L2CAP LE signalling packet. So tighten the checks here and restrict it to exactly one command. This is different from L2CAP BR/EDR signalling where multiple commands can be part of the same packet. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-03 16:09:59 +03:00
Marcel Holtmann	92381f5cd7	Bluetooth: Check minimum length of SMP packets When SMP packets are received, make sure they contain at least 1 byte header for the opcode. If not, drop the packet and disconnect the link. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-03 13:06:41 +03:00
Marcel Holtmann	b99707d7ee	Bluetooth: Drop packets on ATT fixed channel on BR/EDR The ATT fixed channel is only valid when using LE connections. On BR/EDR it is required to go through L2CAP connection oriented channel for ATT. Drop ATT packets when they are received on a BR/EDR connection. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-03 13:05:36 +03:00
Marcel Holtmann	ae4fd2d374	Bluetooth: L2CAP connectionless channels are only valid for BR/EDR When receiving connectionless packets on a LE connection, just drop the packet. There is no concept of connectionless channels for LE. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-03 10:13:30 +03:00
Marcel Holtmann	7b9899dbcf	Bluetooth: SMP packets are only valid on LE connections When receiving SMP packets on a BR/EDR connection, then just drop the packet and do not try to process it. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-03 10:09:12 +03:00
Marcel Holtmann	94b6a09b67	Bluetooth: Don't copy L2CAP LE signalling to raw sockets The L2CAP raw sockets are only used for BR/EDR signalling. Packets on LE links should not be forwarded there. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-03 10:07:58 +03:00
Marcel Holtmann	a28776296c	Bluetooth: Fix switch statement order for L2CAP fixed channels The switch statement for the various L2CAP fixed channel handlers is not really ordered. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-03 10:07:29 +03:00
Marcel Holtmann	6203fc9834	Bluetooth: Allow changing device class when BR/EDR is disabled Changing the device class when BR/EDR is disabled has no visible effect for remote devices. However to simplify the logic allow it as long as the controller supports BR/EDR operations. If it is not allowed, then the overall logic becomes rather complicated since the class of device values would need clearing or restoring when BR/EDR setting changes. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-03 10:05:27 +03:00
Marcel Holtmann	cf99ba1359	Bluetooth: Restrict loading of long term keys to LE capable controllers Loading long term keys into a BR/EDR only controller make no sense. The kernel would never use any of these keys. So instead of allowing userspace to waste memory, reject such operation with a not supported error message. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-03 09:33:02 +03:00
Marcel Holtmann	9060d5cf52	Bluetooth: Restrict loading of link keys to BR/EDR capable controllers Loading link keys into a LE only controller make no sense. The kernel would never use any of these keys. So instead of allowing userspace to waste memory, reject such operation with a not supported error message. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-03 09:32:57 +03:00
Marcel Holtmann	62af444319	Bluetooth: Allow setting static address even if LE is disabled Setting the static address does not depend on LE beeing enabled. It only depends on a controller with LE support. When depending on LE enabled this command becomes really complicated since in case LE gets disabled, it would be required to clear the static address and also its random address representation inside the controller. With future support for private addresses such complex setup should be avoided. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-03 09:29:38 +03:00
Marcel Holtmann	cdba5281b2	Bluetooth: Restrict SSP setting changes to BR/EDR enabled controllers Only when BR/EDR is supported and enabled, allow changing of the SSP setting. Just checking if the hardware supports SSP is not enough since it might be the case that BR/EDR is disabled. In the case that BR/EDR is disabled, but SSP supported by the controller the not supported error message is now returned. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-03 09:20:37 +03:00
Marcel Holtmann	3b1662952e	Bluetooth: Fix memory leak with L2CAP signal channels The wrong type of L2CAP signalling packets on the wrong type of either BR/EDR or LE links need to be dropped. When that happens the packet is dropped, but the memory not freed. So actually free the memory as well. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-10-02 17:17:05 -03:00
Marcel Holtmann	9ab8cf3729	Bluetooth: Increment management interface revision This patch increments the management interface revision due to the various fixes, improvements and other changes that have gone in lately. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-02 16:24:03 +03:00
Johan Hedberg	11802b299f	Bluetooth: Fix advertising data flags with disabled BR/EDR We shouldn't include the simultaneous LE & BR/EDR flags in the LE advertising data if BR/EDR is disabled on a dual-mode controller. This patch fixes this issue and ensures that the create_ad function generates the correct flags when BR/EDR is disabled. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2013-10-02 06:18:18 -07:00
Johan Hedberg	e6fe798652	Bluetooth: Fix REJECTED vs NOT_SUPPORTED mgmt responses The REJECTED management response should mainly be used when the adapter is in a state where we cannot accept some command or a specific parameter value. The NOT_SUPPORTED response in turn means that the adapter really cannot support the command or parameter value. This patch fixes this distinction and adds two helper functions to easily get the appropriate LE or BR/EDR related status response. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2013-10-02 05:52:51 -07:00
Marcel Holtmann	d13eafce2c	Bluetooth: Add management command for setting static address On dual-mode BR/EDR/LE and LE only controllers it is possible to configure a random address. There are two types or random addresses, one is static and the other private. Since the random private addresses require special privacy feature to be supported, the configuration of these two are kept separate. This command allows for setting the static random address. It is only supported on controllers with LE support. The static random address is suppose to be valid for the lifetime of the controller or at least until the next power cycle. To ensure such behavior, setting of the address is limited to when the controller is powered off. The special BDADDR_ANY address (00:00:00:00:00:00) can be used to disable the static address. This is also the default value. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-02 14:50:58 +03:00
Marcel Holtmann	a0cdf960be	Bluetooth: Restrict disabling of HS when controller is powered off Disabling the high speed setting when the controller is powered on has too many side effects that are not taken care of. And in general it is not an useful operation anyway. So just make such a command fail with a rejection error message. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-02 13:51:50 +03:00
Johan Hedberg	0663ca2a03	Bluetooth: Add a new mgmt_set_bredr command This patch introduces a new mgmt command for enabling/disabling BR/EDR functionality. This can be convenient when one wants to make a dual-mode controller behave like a single-mode one. The command is only available for dual-mode controllers and requires that LE is enabled before using it. The BR/EDR setting can be enabled at any point, however disabling it requires the controller to be powered off (otherwise a "rejected" response will be sent). Disabling the BR/EDR setting will automatically disable all other BR/EDR related settings. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2013-10-02 03:48:28 -07:00
Johan Hedberg	56f8790102	Bluetooth: Introduce a new HCI_BREDR_ENABLED flag To allow treating dual-mode (BR/EDR/LE) controllers as single-mode ones (LE-only) we want to introduce a new HCI_BREDR_ENABLED flag to track whether BR/EDR is enabled or not (previously we simply looked at the feature bit with lmp_bredr_enabled). This patch add the new flag and updates the relevant places to test against it instead of using lmp_bredr_enabled. The flag is by default enabled when registering an adapter and only cleared if necessary once the local features have been read during the HCI init procedure. We cannot completely block BR/EDR usage in case user space uses raw HCI sockets but the patch tries to block this in places where possible, such as the various BR/EDR specific ioctls. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2013-10-02 03:48:28 -07:00
Johan Hedberg	e1d08f4067	Bluetooth: Fix workqueue synchronization in hci_dev_open When hci_sock.c calls hci_dev_open it needs to ensure that there isn't pending work in progress, such as that which is scheduled for the initial setup procedure or the one for automatically powering off after the setup procedure. This adds the necessary calls to ensure that any previously scheduled work is completed before attempting to call hci_dev_do_open. This patch fixes a race with old user space versions where we might receive a HCIDEVUP ioctl before the setup procedure has been completed. When that happens the setup procedures callback may fail early and leave the device in an inconsistent state, causing e.g. the setup callback to be (incorrectly) called more than once. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2013-10-01 23:27:08 -07:00
Johan Hedberg	cbed0ca137	Bluetooth: Refactor hci_dev_open to a separate hci_dev_do_open function The requirements of an external call to hci_dev_open from hci_sock.c are different to that from within hci_core.c. In the former case we want to flush any pending work in hdev->req_workqueue whereas in the latter we don't (since there we are already calling from within the workqueue itself). This patch does the necessary refactoring to a separate hci_dev_do_open function (analogous to hci_dev_do_close) but does not yet introduce the synchronizations relating to the workqueue usage. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2013-10-01 23:27:08 -07:00
Marcel Holtmann	922ca1dfc2	Bluetooth: Enable -D__CHECK_ENDIAN__ for sparse by default The Bluetooth protocol and hardware is pretty much all little endian and so when running sparse via "make C=2" for example, enable the endian checks by default. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-02 09:10:05 +03:00
Marcel Holtmann	10a8b86f57	Bluetooth: Require CAP_NET_ADMIN for HCI User Channel operation The HCI User Channel operation is an admin operation that puts the device into promiscuous mode for single use. It is more suitable to require CAP_NET_ADMIN than CAP_NET_RAW. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-02 09:10:04 +03:00
Marcel Holtmann	ee39269369	Bluetooth: Send new settings event when changing high speed option When enabling or disabling high speed setting it is required to send a new settings event to inform other management interface users about the changed settings. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-02 09:10:01 +03:00
Marcel Holtmann	848566b381	Bluetooth: Provide high speed configuration option Hiding the Bluetooth high speed support behind a module parameter is not really useful. This can be enabled and disabled at runtime via the management interface. This also has the advantage that this can now be changed per controller and not just global. This patch removes the module parameter and exposes the high speed setting of the management interface to all controllers. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-02 09:09:59 +03:00
Marcel Holtmann	60f2a3ed7b	Bluetooth: Use only 2 bits for controller type information The controller type is limited to BR/EDR/LE and AMP controllers. This can be easily encoded with just 2 bits and still leave enough room for future controller types. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2013-10-02 09:09:54 +03:00
Gustavo Padovan	1025c04cec	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Conflicts: net/bluetooth/hci_core.c	2013-09-27 11:56:14 -03:00
Johan Hedberg	4375f1037d	Bluetooth: Add new mgmt_set_advertising command This patch adds a new mgmt command for enabling and disabling LE advertising. The command depends on the LE setting being enabled first and will return a "rejected" response otherwise. The patch also adds safeguards so that there will ever only be one set_le or set_advertising command pending per adapter. The response handling and new_settings event sending is done in an asynchronous request callback, meaning raw HCI access from user space to enable advertising (e.g. hciconfig leadv) will not trigger the new_settings event. This is intentional since trying to support mixed raw HCI and mgmt access would mean adding extra state tracking or new helper functions, essentially negating the benefit of using the asynchronous request framework. The HCI_LE_ENABLED and HCI_LE_PERIPHERAL flags however are updated correctly even with raw HCI access so this will not completely break subsequent access over mgmt. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:11 -03:00
Johan Hedberg	eeca6f8913	Bluetooth: Add new mgmt setting for LE advertising This patch adds a new mgmt setting for LE advertising and hooks up the necessary places in the mgmt code to operate on the HCI_LE_PERIPHERAL flag (which corresponds to this setting). This patch does not yet add any new command for enabling the setting - that is left for a subsequent patch. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:11 -03:00
Johan Hedberg	416a4ae56b	Bluetooth: Use async request for LE enable/disable This patch updates the code to use an asynchronous request for handling the enabling and disabling of LE support. This refactoring is necessary as a preparation for adding advertising support, since when LE is disabled we should also disable advertising, and the cleanest way to do this is to perform the two respective HCI commands in the same asynchronous request. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:11 -03:00
Johan Hedberg	bd99abdd5b	Bluetooth: Move mgmt response convenience functions to a better location The settings_rsp and cmd_status_rsp functions can be useful for all mgmt command handlers when asynchronous request callbacks are used. They will e.g. be used by subsequent patches to change set_le to use an async request as well as a new set_advertising command. Therefore, move them higher up in the mgmt.c file to avoid unnecessary forward declarations or mixing this trivial change with other patches. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:10 -03:00
Johan Hedberg	87b95ba64e	Bluetooth: Fix busy return for mgmt_set_powered in some cases We should return a "busy" error always when there is another mgmt_set_powered operation in progress. Previously when powering on while the auto off timer was still set the code could have let two or more pending power on commands to be queued. This patch fixes the issue by moving the check for duplicate commands to an earlier point in the set_powered handler. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:10 -03:00
Johan Hedberg	970871bc9c	Bluetooth: Clean up socket locking in l2cap_sock_recvmsg This patch cleans up the locking login in l2cap_sock_recvmsg by pairing up each lock_sock call with a release_sock call. The function already has a "done" label that handles releasing the socket and returning from the function so the fix is rather simple. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:10 -03:00
Johan Hedberg	0fba96f97b	Bluetooth: Add clarifying comment to bt_sock_wait_state() The bt_sock_wait_state requires the sk lock to be held (through lock_sock) so document it clearly in the code. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:10 -03:00
Peter Senna Tschudin	941247f910	Bluetooth: Fix assignment of 0/1 to bool variables Convert 0 to false and 1 to true when assigning values to bool variables. Inspired by commit `3db1cd5c05`. The simplified semantic patch that find this problem is as follows (http://coccinelle.lip6.fr/): @@ bool b; @@ ( -b = 0 +b = false \| -b = 1 +b = true ) Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-22 16:59:14 -05:00
Gianluca Anzolin	29cd718beb	Bluetooth: don't release the port in rfcomm_dev_state_change() When the dlc is closed, rfcomm_dev_state_change() tries to release the port in the case it cannot get a reference to the tty. However this is racy and not even needed. Infact as Peter Hurley points out: 1. Only consider dlcs that are 'stolen' from a connected socket, ie. reused. Allocated dlcs cannot have been closed prior to port activate and so for these dlcs a tty reference will always be avail in rfcomm_dev_state_change() -- except for the conditions covered by #2b below. 2. If a tty was at some point previously created for this rfcomm, then either (a) the tty reference is still avail, so rfcomm_dev_state_change() will perform a hangup. So nothing to do, or, (b) the tty reference is no longer avail, and the tty_port will be destroyed by the last tty_port_put() in rfcomm_tty_cleanup. Again, no action required. 3. Prior to obtaining the dlc lock in rfcomm_dev_add(), rfcomm_dev_state_change() will not 'see' a rfcomm_dev so nothing to do here. 4. After releasing the dlc lock in rfcomm_dev_add(), rfcomm_dev_state_change() will 'see' an incomplete rfcomm_dev if a tty reference could not be obtained. Again, the best thing to do here is nothing. Any future attempted open() will block on rfcomm_dev_carrier_raised(). The unconnected device will exist until released by ioctl(RFCOMMRELEASEDEV). The patch removes the aforementioned code and uses the tty_port_tty_hangup() helper to hangup the tty. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-20 14:17:54 -05:00
Johan Hedberg	d62e6d67a7	Bluetooth: Add event mask page 2 setting support For those controller that support the HCI_Set_Event_Mask_Page_2 command we should include it in the init sequence. This patch implements sending of the command and enables the events in it based on supported features (currently only CSB is checked). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-19 10:21:44 -05:00
Johan Hedberg	5d4e7e8db0	Bluetooth: Add synchronization train parameters reading support This patch adds support for reading the synchronization train parameters for controllers that support the feature. Since the feature is detectable through the local features page 2, which is retreived only in stage 3 of the HCI init sequence, there is no other option than to add a fourth stage to the init sequence. For now the patch doesn't yet add storing of the parameters, but it is nevertheless convenient to have around to see what kind of parameters various controllers use by default (analyzable e.g. with the btmon user space tool). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-19 10:20:07 -05:00
Johan Hedberg	e793dcf082	Bluetooth: Fix waiting for clearing of BT_SK_SUSPEND flag In the case of blocking sockets we should not proceed with sendmsg() if the socket has the BT_SK_SUSPEND flag set. So far the code was only ensuring that POLLOUT doesn't get set for non-blocking sockets using poll() but there was no code in place to ensure that blocking sockets do the right thing when writing to them. This patch adds a new bt_sock_wait_ready helper function to sleep in the sendmsg call if the BT_SK_SUSPEND flag is set, and wake up as soon as it is unset. It also updates the L2CAP and RFCOMM sendmsg callbacks to take advantage of this new helper function. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 17:02:59 -05:00
Johan Hedberg	69c4e4e8b4	Bluetooth: Fix responding to invalid L2CAP signaling commands When we have an LE link we should not respond to any data on the BR/EDR L2CAP signaling channel (0x0001) and vice-versa when we have a BR/EDR link we should not respond to LE L2CAP (CID 0x0005) signaling commands. This patch fixes this issue by checking for a valid link type and ignores data if it is wrong. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 16:50:53 -05:00
Johan Hedberg	9245e73758	Bluetooth: Fix sending responses to identified L2CAP response packets When L2CAP packets return a non-zero error and the value is passed onwards by l2cap_bredr_sig_cmd this will trigger a command reject packet to be sent. However, the core specification (page 1416 in core 4.0) says the following: "Command Reject packets should not be sent in response to an identified Response packet.". This patch ensures that a command reject packet is not sent for any identified response packet by ignoring the error return value from the response handler functions. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 16:48:32 -05:00
Johan Hedberg	7c2005d6f9	Bluetooth: Fix L2CAP command reject reason There are several possible reason codes that can be sent in the command reject L2CAP packet. Before this patch the code has used a hard-coded single response code ("command not understood"). This patch adds a helper function to map the return value of an L2CAP handler function to the correct command reject reason. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 16:45:28 -05:00
Johan Hedberg	c4ea249f5f	Bluetooth: Fix L2CAP Disconnect response for unknown CID If we receive an L2CAP Disconnect Request for an unknown CID we should not just silently drop it but reply with a proper Command Reject response. This patch fixes this by ensuring that the disconnect handler returns a proper error instead of 0 and will cause the function caller to send the right response. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 16:44:32 -05:00
Johan Hedberg	21870b523e	Bluetooth: Fix L2CAP error return used for failed channel lookups The EFAULT error should only be used for memory address related errors and ENOENT might be needed for other purposes than invalid CID errors. This patch fixes the l2cap_config_req, l2cap_connect_create_rsp and l2cap_create_channel_req handlers to use the unique EBADSLT error to indicate failed lookups on a given CID. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 16:43:40 -05:00
Johan Hedberg	dc280801da	Bluetooth: Fix double error response for l2cap_create_chan_req When an L2CAP request handler returns non-zero the calling code will send a command reject response. The l2cap_create_chan_req function will in some cases send its own response but then still return a -EFAULT error which would cause two responses to be sent. This patch fixes this by making the function return 0 after sending its own response. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 16:41:07 -05:00
Johan Hedberg	bf5430360e	Bluetooth: Fix rfkill functionality during the HCI setup stage We need to let the setup stage complete cleanly even when the HCI device is rfkilled. Otherwise the HCI device will stay in an undefined state and never get notified to user space through mgmt (even when it gets unblocked through rfkill). This patch makes sure that hci_dev_open() can be called in the HCI_SETUP stage, that blocking the device doesn't abort the setup stage, and that the device gets proper powered down as soon as the setup stage completes in case it was blocked meanwhile. The bug that this patch fixed can be very easily reproduced using e.g. the rfkill command line too. By running "rfkill block all" before inserting a Bluetooth dongle the resulting HCI device goes into a state where it is never announced over mgmt, not even when "rfkill unblock all" is run. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Cc: stable@vger.kernel.org Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 12:39:23 -05:00
Johan Hedberg	5e130367d4	Bluetooth: Introduce a new HCI_RFKILLED flag This makes it more convenient to check for rfkill (no need to check for dev->rfkill before calling rfkill_blocked()) and also avoids potential races if the RFKILL state needs to be checked from within the rfkill callback. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Cc: stable@vger.kernel.org Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-18 12:37:27 -05:00
Syam Sidhardhan	330b6c1521	Bluetooth: Fix ACL alive for long in case of non pariable devices For certain devices (ex: HID mouse), support for authentication, pairing and bonding is optional. For such devices, the ACL alive for too long after the L2CAP disconnection. To avoid the ACL alive for too long after L2CAP disconnection, reset the ACL disconnect timeout back to HCI_DISCONN_TIMEOUT during L2CAP connect. While merging the commit id:a9ea3ed9b71cc3271dd59e76f65748adcaa76422 this issue might have introduced. Hcidump info: sh-4.1# /opt/hcidump -Xt 2013-08-05 16:49:00.894129 < ACL data: handle 12 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x004a scid 0x0041 2013-08-05 16:49:00.894195 < HCI Command: Exit Sniff Mode (0x02\|0x0004) plen 2 handle 12 2013-08-05 16:49:00.894269 < ACL data: handle 12 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x0049 scid 0x0040 2013-08-05 16:49:00.895645 > HCI Event: Command Status (0x0f) plen 4 Exit Sniff Mode (0x02\|0x0004) status 0x00 ncmd 1 2013-08-05 16:49:00.934391 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x00 interval 0 Mode: Active 2013-08-05 16:49:00.936592 > HCI Event: Number of Completed Packets (0x13) plen 5 handle 12 packets 2 2013-08-05 16:49:00.951577 > ACL data: handle 12 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x004a scid 0x0041 2013-08-05 16:49:00.952820 > ACL data: handle 12 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x0049 scid 0x0040 2013-08-05 16:49:00.969165 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x02 interval 50 Mode: Sniff 2013-08-05 16:49:48.175533 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x00 interval 0 Mode: Active 2013-08-05 16:49:48.219045 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x02 interval 108 Mode: Sniff 2013-08-05 16:51:00.968209 < HCI Command: Disconnect (0x01\|0x0006) plen 3 handle 12 reason 0x13 Reason: Remote User Terminated Connection 2013-08-05 16:51:00.969056 > HCI Event: Command Status (0x0f) plen 4 Disconnect (0x01\|0x0006) status 0x00 ncmd 1 2013-08-05 16:51:01.013495 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x00 interval 0 Mode: Active 2013-08-05 16:51:01.073777 > HCI Event: Disconn Complete (0x05) plen 4 status 0x00 handle 12 reason 0x16 Reason: Connection Terminated by Local Host ============================ After fix ================================ 2013-08-05 16:57:35.986648 < ACL data: handle 11 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x004c scid 0x0041 2013-08-05 16:57:35.986713 < HCI Command: Exit Sniff Mode (0x02\|0x0004) plen 2 handle 11 2013-08-05 16:57:35.986785 < ACL data: handle 11 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x004b scid 0x0040 2013-08-05 16:57:35.988110 > HCI Event: Command Status (0x0f) plen 4 Exit Sniff Mode (0x02\|0x0004) status 0x00 ncmd 1 2013-08-05 16:57:36.030714 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 11 mode 0x00 interval 0 Mode: Active 2013-08-05 16:57:36.032950 > HCI Event: Number of Completed Packets (0x13) plen 5 handle 11 packets 2 2013-08-05 16:57:36.047926 > ACL data: handle 11 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x004c scid 0x0041 2013-08-05 16:57:36.049200 > ACL data: handle 11 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x004b scid 0x0040 2013-08-05 16:57:36.065509 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 11 mode 0x02 interval 50 Mode: Sniff 2013-08-05 16:57:40.052006 < HCI Command: Disconnect (0x01\|0x0006) plen 3 handle 11 reason 0x13 Reason: Remote User Terminated Connection 2013-08-05 16:57:40.052869 > HCI Event: Command Status (0x0f) plen 4 Disconnect (0x01\|0x0006) status 0x00 ncmd 1 2013-08-05 16:57:40.104731 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 11 mode 0x00 interval 0 Mode: Active 2013-08-05 16:57:40.146935 > HCI Event: Disconn Complete (0x05) plen 4 status 0x00 handle 11 reason 0x16 Reason: Connection Terminated by Local Host Signed-off-by: Sang-Ki Park <sangki79.park@samsung.com> Signed-off-by: Chan-yeol Park <chanyeol.park@samsung.com> Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com> Signed-off-by: Szymon Janc <szymon.janc@tieto.com> Signed-off-by: Syam Sidhardhan <s.syam@samsung.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:41:02 -03:00
Andre Guedes	89cbb4da0a	Bluetooth: Fix encryption key size for peripheral role This patch fixes the connection encryption key size information when the host is playing the peripheral role. We should set conn->enc_key_ size in hci_le_ltk_request_evt, otherwise it is left uninitialized. Cc: Stable <stable@vger.kernel.org> Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:36:56 -03:00
Andre Guedes	f8776218e8	Bluetooth: Fix security level for peripheral role While playing the peripheral role, the host gets a LE Long Term Key Request Event from the controller when a connection is established with a bonded device. The host then informs the LTK which should be used for the connection. Once the link is encrypted, the host gets an Encryption Change Event. Therefore we should set conn->pending_sec_level instead of conn-> sec_level in hci_le_ltk_request_evt. This way, conn->sec_level is properly updated in hci_encrypt_change_evt. Moreover, since we have a LTK associated to the device, we have at least BT_SECURITY_MEDIUM security level. Cc: Stable <stable@vger.kernel.org> Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:36:55 -03:00
Marcel Holtmann	52de599e04	Bluetooth: Only schedule raw queue when user channel is active When the user channel is set and an user application has full control over the device, do not bother trying to schedule any queues except the raw queue. This is an optimization since with user channel, only the raw queue is in use. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:56 -03:00
Marcel Holtmann	a675d7f1a0	Bluetooth: Use GFP_KERNEL when cloning SKB in a workqueue There is no need to use GFP_ATOMIC with skb_clone() when the code is executed in a workqueue. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:56 -03:00
Marcel Holtmann	af750e942e	Bluetooth: Disable upper layer connections when user channel is active When the device has the user channel flag set, it means it is driven by an user application. In that case do not allow any connections from L2CAP or SCO sockets. This is the same situation as when the device has the raw flag set and it will then return EHOSTUNREACH. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:56 -03:00
Marcel Holtmann	23500189d7	Bluetooth: Introduce new HCI socket channel for user operation This patch introcuces a new HCI socket channel that allows user applications to take control over a specific HCI device. The application gains exclusive access to this device and forces the kernel to stay away and not manage it. In case of the management interface it will actually hide the device. Such operation is useful for security testing tools that need to operate underneath the Bluetooth stack and need full control over a device. The advantage here is that the kernel still provides the service of hardware abstraction and HCI level access. The use of Bluetooth drivers for hardware access also means that sniffing tools like btmon or hcidump are still working and the whole set of transaction can be traced with existing tools. With the new channel it is possible to send HCI commands, ACL and SCO data packets and receive HCI events, ACL and SCO packets from the device. The format follows the well established H:4 protocol. The new HCI user channel can only be established when a device has been through its setup routine and is currently powered down. This is enforced to not cause any problems with current operations. In addition only one user channel per HCI device is allowed. It is exclusive access for one user application. Access to this channel is limited to process with CAP_NET_RAW capability. Using this new facility does not require any external library or special ioctl or socket filters. Just create the socket and bind it. After that the file descriptor is ready to speak H:4 protocol. struct sockaddr_hci addr; int fd; fd = socket(AF_BLUETOOTH, SOCK_RAW, BTPROTO_HCI); memset(&addr, 0, sizeof(addr)); addr.hci_family = AF_BLUETOOTH; addr.hci_dev = 0; addr.hci_channel = HCI_CHANNEL_USER; bind(fd, (struct sockaddr *) &addr, sizeof(addr)); The example shows on how to create a user channel for hci0 device. Error handling has been left out of the example. However with the limitations mentioned above it is advised to handle errors. Binding of the user cahnnel socket can fail for various reasons. Specifically if the device is currently activated by BlueZ or if the access permissions are not present. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:55 -03:00
Marcel Holtmann	0736cfa8e5	Bluetooth: Introduce user channel flag for HCI devices This patch introduces a new user channel flag that allows to give full control of a HCI device to a user application. The kernel will stay away from the device and does not allow any further modifications of the device states. The existing raw flag is not used since it has a bit of unclear meaning due to its legacy. Using a new flag makes the code clearer. A device with the user channel flag set can still be enumerate using the legacy API, but it does not longer enumerate using the new management interface used by BlueZ 5 and beyond. This is intentional to not confuse users of modern systems. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:55 -03:00
Marcel Holtmann	c1c4f95670	Bluetooth: Restrict ioctls to HCI raw channel sockets The various legacy ioctls used with HCI sockets are limited to raw channel only. They are not used on the other channels and also have no meaning there. So return an error if tried to use them. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:55 -03:00
Marcel Holtmann	c2371e80b3	Bluetooth: Fix error handling for HCI socket options The HCI sockets for monitor and control do not support any HCI specific socket options and if tried, an error will be returned. However the error used is EINVAL and that is not really descriptive. To make it clear that these sockets are not handling HCI socket options, return EBADFD instead. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:55 -03:00
Marcel Holtmann	808a049e26	Bluetooth: Report error for HCI reset ioctl when device is down Even if this is legacy API, there is no reason to not report a proper error when trying to reset a HCI device that is down. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:55 -03:00
Marcel Holtmann	9d4b68b239	Bluetooth: Fix handling of getsockname() for HCI sockets The hci_dev check is not protected and so move it into the socket lock. In addition return the HCI channel identifier instead of always 0 channel. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:54 -03:00
Marcel Holtmann	06f43cbc4d	Bluetooth: Fix handling of getpeername() for HCI sockets The HCI sockets do not have a peer associated with it and so make sure that getpeername() returns EOPNOTSUPP since this operation is actually not supported on HCI sockets. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:54 -03:00
Marcel Holtmann	f81fe64f3d	Bluetooth: Refactor raw socket filter into more readable code The handling of the raw socket filter is rather obscure code and it gets in the way of future extensions. Instead of inline filtering in the raw socket packet routine, refactor it into its own function. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-16 14:35:54 -03:00
Linus Torvalds	c7c4591db6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace changes from Eric Biederman: "This is an assorted mishmash of small cleanups, enhancements and bug fixes. The major theme is user namespace mount restrictions. nsown_capable is killed as it encourages not thinking about details that need to be considered. A very hard to hit pid namespace exiting bug was finally tracked and fixed. A couple of cleanups to the basic namespace infrastructure. Finally there is an enhancement that makes per user namespace capabilities usable as capabilities, and an enhancement that allows the per userns root to nice other processes in the user namespace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Kill nsown_capable it makes the wrong thing easy capabilities: allow nice if we are privileged pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD userns: Allow PR_CAPBSET_DROP in a user namespace. namespaces: Simplify copy_namespaces so it is clear what is going on. pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup sysfs: Restrict mounting sysfs userns: Better restrictions on when proc and sysfs can be mounted vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces kernel/nsproxy.c: Improving a snippet of code. proc: Restrict mounting the proc filesystem vfs: Lock in place mounts from more privileged users	2013-09-07 14:35:32 -07:00
Linus Torvalds	0ffb01d9de	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "A quick set of fixes, some to deal with fallout from yesterday's net-next merge. 1) Fix compilation of bnx2x driver with CONFIG_BNX2X_SRIOV disabled, from Dmitry Kravkov. 2) Fix a bnx2x regression caused by one of Dave Jones's mistaken braces changes, from Eilon Greenstein. 3) Add some protective filtering in the netlink tap code, from Daniel Borkmann. 4) Fix TCP congestion window growth regression after timeouts, from Yuchung Cheng. 5) Correctly adjust TCP's rcv_ssthresh for out of order packets, from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: tcp: properly increase rcv_ssthresh for ofo packets net: add documentation for BQL helpers mlx5: remove unused MLX5_DEBUG param in Kconfig bnx2x: Restore a call to config_init bnx2x: fix broken compilation with CONFIG_BNX2X_SRIOV is not set tcp: fix no cwnd growth after timeout net: netlink: filter particular protocols from analyzers	2013-09-07 14:27:46 -07:00
Eric Dumazet	4e4f1fc226	tcp: properly increase rcv_ssthresh for ofo packets TCP receive window handling is multi staged. A socket has a memory budget, static or dynamic, in sk_rcvbuf. Because we do not really know how this memory budget translates to a TCP window (payload), TCP announces a small initial window (about 20 MSS). When a packet is received, we increase TCP rcv_win depending on the payload/truesize ratio of this packet. Good citizen packets give a hint that it's reasonable to have rcv_win = sk_rcvbuf/2 This heuristic takes place in tcp_grow_window() Problem is : We currently call tcp_grow_window() only for in-order packets. This means that reorders or packet losses stop proper grow of rcv_win, and senders are unable to benefit from fast recovery, or proper reordering level detection. Really, a packet being stored in OFO queue is not a bad citizen. It should be part of the game as in-order packets. In our traces, we very often see sender is limited by linux small receive windows, even if linux hosts use autotuning (DRS) and should allow rcv_win to grow to ~3MB. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-06 14:43:49 -04:00
Yuchung Cheng	16edfe7ee0	tcp: fix no cwnd growth after timeout In commit `0f7cc9a3` "tcp: increase throughput when reordering is high", it only allows cwnd to increase in Open state. This mistakenly disables slow start after timeout (CA_Loss). Moreover cwnd won't grow if the state moves from Disorder to Open later in tcp_fastretrans_alert(). Therefore the correct logic should be to allow cwnd to grow as long as the data is received in order in Open, Loss, or even Disorder state. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-06 14:43:49 -04:00
Daniel Borkmann	5ffd5cddd4	net: netlink: filter particular protocols from analyzers Fix finer-grained control and let only a whitelist of allowed netlink protocols pass, in our case related to networking. If later on, other subsystems decide they want to add their protocol as well to the list of allowed protocols they shall simply add it. While at it, we also need to tell what protocol is in use otherwise BPF_S_ANC_PROTOCOL can not pick it up (as it's not filled out). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-06 14:43:48 -04:00
Linus Torvalds	2e515bf096	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "The usual trivial updates all over the tree -- mostly typo fixes and documentation updates" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits) doc: Documentation/cputopology.txt fix typo treewide: Convert retrun typos to return Fix comment typo for init_cma_reserved_pageblock Documentation/trace: Correcting and extending tracepoint documentation mm/hotplug: fix a typo in Documentation/memory-hotplug.txt power: Documentation: Update s2ram link doc: fix a typo in Documentation/00-INDEX Documentation/printk-formats.txt: No casts needed for u64/s64 doc: Fix typo "is is" in Documentations treewide: Fix printks with 0x%# zram: doc fixes Documentation/kmemcheck: update kmemcheck documentation doc: documentation/hwspinlock.txt fix typo PM / Hibernate: add section for resume options doc: filesystems : Fix typo in Documentations/filesystems scsi/megaraid fixed several typos in comments ppc: init_32: Fix error typo "CONFIG_START_KERNEL" treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks page_isolation: Fix a comment typo in test_pages_isolated() doc: fix a typo about irq affinity ...	2013-09-06 09:36:28 -07:00
Linus Torvalds	22e04f6b4b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID updates from Jiri Kosina: "Highlights: - conversion of HID subsystem to use devm-based resource management, from Benjamin Tissoires - i2c-hid support for DT bindings, from Benjamin Tissoires - much improved support for Win8-multitouch devices, from Benjamin Tissoires - cleanup of core code using common hidinput_input_event(), from David Herrmann - fix for bug in implement() access to the bit stream (causing oops) that has been present in the code for ages, but devices that are able to trigger it have started to appear only now, from Jiri Kosina - fixes for CVE-2013-2899, CVE-2013-2898, CVE-2013-2896, CVE-2013-2892, CVE-2013-2888 (all triggerable only by specially crafted malicious HW devices plugged into the system), from Kees Cook - hidraw oops fix, from Manoj Chourasia - various smaller fixes here and there, support for a bunch of new devices by various contributors" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (53 commits) HID: MAINTAINERS: add roccat drivers HID: hid-sensor-hub: change kmalloc + memcpy by kmemdup HID: hid-sensor-hub: move to devm_kzalloc HID: hid-sensor-hub: fix indentation accross the code HID: move HID_REPORT_TYPES closer to the report-definitions HID: check for NULL field when setting values HID: picolcd_core: validate output report details HID: sensor-hub: validate feature report details HID: ntrig: validate feature report details HID: pantherlord: validate output report details HID: hid-wiimote: print small buffers via %*phC HID: uhid: improve uhid example client HID: Correct the USB IDs for the new Macbook Air 6 HID: wiimote: add support for Guitar-Hero guitars HID: wiimote: add support for Guitar-Hero drums Input: introduce BTN/ABS bits for drums and guitars HID: battery: don't do DMA from stack HID: roccat: add support for KonePureOptical v2 HID: picolcd: Prevent NULL pointer dereference on _remove() HID: usbhid: quirk for N-Trig DuoSense Touch Screen ...	2013-09-06 09:30:36 -07:00
Linus Torvalds	cc998ff881	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking changes from David Miller: "Noteworthy changes this time around: 1) Multicast rejoin support for team driver, from Jiri Pirko. 2) Centralize and simplify TCP RTT measurement handling in order to reduce the impact of bad RTO seeding from SYN/ACKs. Also, when both timestamps and local RTT measurements are available prefer the later because there are broken middleware devices which scramble the timestamp. From Yuchung Cheng. 3) Add TCP_NOTSENT_LOWAT socket option to limit the amount of kernel memory consumed to queue up unsend user data. From Eric Dumazet. 4) Add a "physical port ID" abstraction for network devices, from Jiri Pirko. 5) Add a "suppress" operation to influence fib_rules lookups, from Stefan Tomanek. 6) Add a networking development FAQ, from Paul Gortmaker. 7) Extend the information provided by tcp_probe and add ipv6 support, from Daniel Borkmann. 8) Use RCU locking more extensively in openvswitch data paths, from Pravin B Shelar. 9) Add SCTP support to openvswitch, from Joe Stringer. 10) Add EF10 chip support to SFC driver, from Ben Hutchings. 11) Add new SYNPROXY netfilter target, from Patrick McHardy. 12) Compute a rate approximation for sending in TCP sockets, and use this to more intelligently coalesce TSO frames. Furthermore, add a new packet scheduler which takes advantage of this estimate when available. From Eric Dumazet. 13) Allow AF_PACKET fanouts with random selection, from Daniel Borkmann. 14) Add ipv6 support to vxlan driver, from Cong Wang" Resolved conflicts as per discussion. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1218 commits) openvswitch: Fix alignment of struct sw_flow_key. netfilter: Fix build errors with xt_socket.c tcp: Add missing braces to do_tcp_setsockopt caif: Add missing braces to multiline if in cfctrl_linkup_request bnx2x: Add missing braces in bnx2x:bnx2x_link_initialize vxlan: Fix kernel panic on device delete. net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls net: mvneta: properly disable HW PHY polling and ensure adjust_link() works icplus: Use netif_running to determine device state ethernet/arc/arc_emac: Fix huge delays in large file copies tuntap: orphan frags before trying to set tx timestamp tuntap: purge socket error queue on detach qlcnic: use standard NAPI weights ipv6:introduce function to find route for redirect bnx2x: VF RSS support - VF side bnx2x: VF RSS support - PF side vxlan: Notify drivers for listening UDP port changes net: usbnet: update addr_assign_type if appropriate driver/net: enic: update enic maintainers and driver driver/net: enic: Exposing symbols for Cisco's low latency driver ...	2013-09-05 14:54:29 -07:00
Jesse Gross	0d40f75bda	openvswitch: Fix alignment of struct sw_flow_key. sw_flow_key alignment was declared as " __aligned(__alignof__(long))". However, this breaks on the m68k architecture where long is 32 bit in size but 16 bit aligned by default. This aligns to the size of a long to ensure that we can always do comparsions in full long-sized chunks. It also adds an additional build check to catch any reduction in alignment. CC: Andy Zhou <azhou@nicira.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 15:54:37 -04:00
David S. Miller	06c54055be	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c net/bridge/br_multicast.c net/ipv6/sit.c The conflicts were minor: 1) sit.c changes overlap with change to ip_tunnel_xmit() signature. 2) br_multicast.c had an overlap between computing max_delay using msecs_to_jiffies and turning MLDV2_MRC() into an inline function with a name using lowercase instead of uppercase letters. 3) stmmac had two overlapping changes, one which conditionally allocated and hooked up a dma_cfg based upon the presence of the pbl OF property, and another one handling store-and-forward DMA made. The latter of which should not go into the new of_find_property() basic block. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 14:58:52 -04:00
David S. Miller	1a5bbfc3d6	netfilter: Fix build errors with xt_socket.c As reported by Randy Dunlap: ==================== when CONFIG_IPV6=m and CONFIG_NETFILTER_XT_MATCH_SOCKET=y: net/built-in.o: In function `socket_mt6_v1_v2': xt_socket.c:(.text+0x51b55): undefined reference to `udp6_lib_lookup' net/built-in.o: In function `socket_mt_init': xt_socket.c:(.init.text+0x1ef8): undefined reference to `nf_defrag_ipv6_enable' ==================== Like several other modules under net/netfilter/ we have to have a dependency "IPV6 disabled or set compatibly with this module" clause. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 14:38:03 -04:00
Dave Jones	e2e5c4c07c	tcp: Add missing braces to do_tcp_setsockopt Signed-off-by: Dave Jones <davej@fedoraproject.org> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 14:31:02 -04:00
Dave Jones	0c1db731bf	caif: Add missing braces to multiline if in cfctrl_linkup_request The indentation here implies this was meant to be a multi-line if. Introduced several years back in commit `c85c2951d4` ("caif: Handle dev_queue_xmit errors.") Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 14:31:02 -04:00
Duan Jiong	b55b76b221	ipv6:introduce function to find route for redirect RFC 4861 says that the IP source address of the Redirect is the same as the current first-hop router for the specified ICMP Destination Address, so the gateway should be taken into consideration when we find the route for redirect. There was once a check in commit `a6279458c5` ("NDISC: Search over all possible rules on receipt of redirect.") and the check went away in commit `b94f1c0904` ("ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect()"). The bug is only "exploitable" on layer-2 because the source address of the redirect is checked to be a valid link-local address but it makes spoofing a lot easier in the same L2 domain nonetheless. Thanks very much for Hannes's help. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 12:44:31 -04:00
Linus Lüssing	3c3769e633	bridge: apply multicast snooping to IPv6 link-local, too The multicast snooping code should have matured enough to be safely applicable to IPv6 link-local multicast addresses (excluding the link-local all nodes address, ff02::1), too. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 12:35:53 -04:00
Linus Lüssing	8fad9c39f3	bridge: prevent flooding IPv6 packets that do not have a listener Currently if there is no listener for a certain group then IPv6 packets for that group are flooded on all ports, even though there might be no host and router interested in it on a port. With this commit they are only forwarded to ports with a multicast router. Just like commit `bd4265fe36` ("bridge: Only flood unregistered groups to routers") did for IPv4, let's do the same for IPv6 with the same reasoning. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 12:35:41 -04:00
Linus Torvalds	27703bb4a6	PTR_RET() is a weird name, and led to some confusing usage. We ended up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages. This has been sitting in linux-next for a whole cycle. Thanks, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJSJo+1AAoJENkgDmzRrbjxIC4QALJK95o8AUXuwUkl+2fmFkUt hh2/PJ1vDYgk4Xt0J6hyoK7XMa0H1RkbBrROuDdsBnorMFpEsGcgdkUZte9ufoAS 97Bg+7N0KPbTB/S8vOwtW1vbERTJIVPN2uf6h1Wqm9Xc2puCh3HbMMr1AWMGu0WQ NqY5+Zz8zecy1UOrMhEP6H1CjeQcL1w1DO6YM5ydeqlKNzAz+JMfDXriLPDwiE7+ XFPDF/O3Vtd2ckA7L70Lio7hfHwxV5U4WwFVfiwls98XB4jcZqDKIoh1r8z4SRgR +0Rae2DN3BaOabGMr//5XdrzQVpwJTh5m2w8BAOHJvCJ9HR7Sq29UIN4u+TowZBy L2xYo4dvFxkympwu5zEd3c7vHYWKIaqmSq5PIjr4gF/uIo2OeOTrpPIK782ZEYb7 e+qUgOEM05V9AmQZCrSZeP9u474Sj8ow3sCtWxfdRtwNfoEIcUXsNNJd/zDHlVtW cEtXqc2xXIpcuUJQWlSaGp8fmRQjVZPzrLKYLM2m39ZcOOJbf5rzQAYS7hHPosIa SK+YVux/+Zzi+Xo/vXq1OlM/SruCr5S7JOgCxLowoQ88vupgXME6uPyC8EO+QQ50 GsrHes5ZNLbk0uVsfcexIyojkUnyvDmmnDpv+1zdC6RgZLJQn8OXp5yNhHhnhrFT BiHX6YFWtDDqRlVv8Q0F =LeaW -----END PGP SIGNATURE----- Merge tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull PTR_RET() removal patches from Rusty Russell: "PTR_RET() is a weird name, and led to some confusing usage. We ended up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages. This has been sitting in linux-next for a whole cycle" [ There are still some PTR_RET users scattered about, with some of them possibly being new, but most of them existing in Rusty's tree too. We have that #define PTR_RET(p) PTR_ERR_OR_ZERO(p) thing in <linux/err.h>, so they continue to work for now - Linus ] * tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: GFS2: Replace PTR_RET with PTR_ERR_OR_ZERO Btrfs: volume: Replace PTR_RET with PTR_ERR_OR_ZERO drm/cma: Replace PTR_RET with PTR_ERR_OR_ZERO sh_veu: Replace PTR_RET with PTR_ERR_OR_ZERO dma-buf: Replace PTR_RET with PTR_ERR_OR_ZERO drivers/rtc: Replace PTR_RET with PTR_ERR_OR_ZERO mm/oom_kill: remove weird use of ERR_PTR()/PTR_ERR(). staging/zcache: don't use PTR_RET(). remoteproc: don't use PTR_RET(). pinctrl: don't use PTR_RET(). acpi: Replace weird use of PTR_RET. s390: Replace weird use of PTR_RET. PTR_RET is now PTR_ERR_OR_ZERO(): Replace most. PTR_RET is now PTR_ERR_OR_ZERO	2013-09-04 17:31:11 -07:00
Daniel Borkmann	b4af8def5c	net: ipv6: mld: introduce mld_{gq, ifc, dad}_stop_timer functions We already have mld_{gq,ifc,dad}_start_timer() functions, so introduce mld_{gq,ifc,dad}_stop_timer() functions to reduce code size and make it more readable. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:21 -04:00
Daniel Borkmann	2b7c121f82	net: ipv6: mld: refactor query processing into v1/v2 functions Make igmp6_event_query() a bit easier to read by refactoring code parts into mld_process_v1() and mld_process_v2(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:21 -04:00
Daniel Borkmann	cc7f7ab758	net: ipv6: mld: similarly to MLDv2 have min max_delay of 1 Similarly as we do in MLDv2 queries, set a forged MLDv1 query with 0 ms mld_maxdelay to minimum timer shot time of 1 jiffies. This is eventually done in igmp6_group_queried() anyway, so we can simplify a check there. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:21 -04:00
Daniel Borkmann	58c0ecfd8d	net: ipv6: mld: implement RFC3810 MLDv2 mode only RFC3810, 10. Security Considerations says under subsection 10.1. Query Message: A forged Version 1 Query message will put MLDv2 listeners on that link in MLDv1 Host Compatibility Mode. This scenario can be avoided by providing MLDv2 hosts with a configuration option to ignore Version 1 messages completely. Hence, implement a MLDv2-only mode that will ignore MLDv1 traffic: echo 2 > /proc/sys/net/ipv6/conf/ethX/force_mld_version or echo 2 > /proc/sys/net/ipv6/conf/all/force_mld_version Note that <all> device has a higher precedence as it was previously also the case in the macro MLD_V1_SEEN() that would "short-circuit" if condition on <all> case. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:20 -04:00
Daniel Borkmann	e3f5b17047	net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation Get rid of MLDV2_MRC and use our new macros for mantisse and exponent to calculate Maximum Response Delay out of the Maximum Response Code. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:20 -04:00
Daniel Borkmann	6c567b78c8	net: ipv6: mld: clean up MLD_V1_SEEN macro Replace the macro with a function to make it more readable. GCC will eventually decide whether to inline this or not (also, that's not fast-path anyway). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:20 -04:00
Daniel Borkmann	89225d1ce6	net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12. i) RFC3810, 9.2. Query Interval [QI] says: The Query Interval variable denotes the interval between General Queries sent by the Querier. Default value: 125 seconds. [...] ii) RFC3810, 9.3. Query Response Interval [QRI] says: The Maximum Response Delay used to calculate the Maximum Response Code inserted into the periodic General Queries. Default value: 10000 (10 seconds) [...] The number of seconds represented by the [Query Response Interval] must be less than the [Query Interval]. iii) RFC3810, 9.12. Older Version Querier Present Timeout [OVQPT] says: The Older Version Querier Present Timeout is the time-out for transitioning a host back to MLDv2 Host Compatibility Mode. When an MLDv1 query is received, MLDv2 hosts set their Older Version Querier Present Timer to [Older Version Querier Present Timeout]. This value MUST be ([Robustness Variable] times (the [Query Interval] in the last Query received)) plus ([Query Response Interval]). Hence, on default the timeout results in: [RV] = 2, [QI] = 125sec, [QRI] = 10sec [OVQPT] = [RV] * [QI] + [QRI] = 260sec Having that said, we currently calculate [OVQPT] (here given as 'switchback' variable) as ... switchback = (idev->mc_qrv + 1) * max_delay RFC3810, 9.12. says "the [Query Interval] in the last Query received". In section "9.14. Configuring timers", it is said: This section is meant to provide advice to network administrators on how to tune these settings to their network. Ambitious router implementations might tune these settings dynamically based upon changing characteristics of the network. [...] iv) RFC38010, 9.14.2. Query Interval: The overall level of periodic MLD traffic is inversely proportional to the Query Interval. A longer Query Interval results in a lower overall level of MLD traffic. The value of the Query Interval MUST be equal to or greater than the Maximum Response Delay used to calculate the Maximum Response Code inserted in General Query messages. I assume that was why switchback is calculated as is (3 * max_delay), although this setting seems to be meant for routers only to configure their [QI] interval for non-default intervals. So usage here like this is clearly wrong. Concluding, the current behaviour in IPv6's multicast code is not conform to the RFC as switch back is calculated wrongly. That is, it has a too small value, so MLDv2 hosts switch back again to MLDv2 way too early, i.e. ~30secs instead of ~260secs on default. Hence, introduce necessary helper functions and fix this up properly as it should be. Introduced in 06da92283 ("[IPV6]: Add MLDv2 support."). Credits to Hannes Frederic Sowa who also had a hand in this as well. Also thanks to Hangbin Liu who did initial testing. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: David Stevens <dlstevens@us.ibm.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:20 -04:00
Daniel Borkmann	3a1c756590	net: ipv6: tcp: fix potential use after free in tcp_v6_do_rcv In tcp_v6_do_rcv() code, when processing pkt options, we soley work on our skb clone opt_skb that we've created earlier before entering tcp_rcv_established() on our way. However, only in condition ... if (np->rxopt.bits.rxtclass) np->rcv_tclass = ipv6_get_dsfield(ipv6_hdr(skb)); ... we work on skb itself. As we extract every other information out of opt_skb in ipv6_pktoptions path, this seems wrong, since skb can already be released by tcp_rcv_established() earlier on. When we try to access it in ipv6_hdr(), we will dereference freed skb. [ Bug added by commit `4c507d2897` ("net: implement IP_RECVTOS for IP_PKTOPTIONS") ] Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:44:41 -04:00
Yuchung Cheng	52f20e655d	tcp: better comments for RTO initiallization Commit 1b7fdd2ab585("tcp: do not use cached RTT for RTT estimation") removes important comments on how RTO is initialized and updated. Hopefully this patch puts those information back. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:41:55 -04:00
Thomas Graf	25a6e6b84f	ipv6: Don't depend on per socket memory for neighbour discovery messages Allocating skbs when sending out neighbour discovery messages currently uses sock_alloc_send_skb() based on a per net namespace socket and thus share a socket wmem buffer space. If a netdevice is temporarily unable to transmit due to carrier loss or for other reasons, the queued up ndisc messages will cosnume all of the wmem space and will thus prevent from any more skbs to be allocated even for netdevices that are able to transmit packets. The number of neighbour discovery messages sent is very limited, use of alloc_skb() bypasses the socket wmem buffer size enforcement while the manual call to skb_set_owner_w() maintains the socket reference needed for the IPv6 output path. This patch has orginally been posted by Eric Dumazet in a modified form. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Fabio Estevam <festevam@gmail.com> Tested-by: Fabio Estevam <fabio.estevam@freescale.com> Tested-by: Stephen Warren <swarren@nvidia.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:37:41 -04:00
Hannes Frederic Sowa	639739b5e6	ipv6: fix null pointer dereference in __ip6addrlbl_add Commit `b67bfe0d42` ("hlist: drop the node parameter from iterators") changed the behavior of hlist_for_each_entry_safe to leave the p argument NULL. Fix this up by tracking the last argument. Reported-by: Michele Baldessari <michele@acksyn.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: Michele Baldessari <michele@acksyn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:14:53 -04:00
Alexander Sverdlin	c08751c851	net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4 net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4 Initially the problem was observed with ipsec, but later it became clear that SCTP data chunk fragmentation algorithm has problems with MTU values which are not multiple of 4. Test program was used which just transmits 2000 bytes long packets to other host. tcpdump was used to observe re-fragmentation in IP layer after SCTP already fragmented data chunks. With MTU 1500: 12:54:34.082904 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1500) 10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 2366088589] [SID: 0] [SSEQ 1] [PPID 0x0] 12:54:34.082933 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 596) 10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 2366088590] [SID: 0] [SSEQ 1] [PPID 0x0] 12:54:34.090576 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48) 10.151.24.91.54321 > 10.151.38.153.39303: sctp (1) [SACK] [cum ack 2366088590] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0] With MTU 1499: 13:02:49.955220 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 0, flags [+], proto SCTP (132), length 1492) 10.151.38.153.39084 > 10.151.24.91.54321: sctp[\|sctp] 13:02:49.955249 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 1472, flags [none], proto SCTP (132), length 28) 10.151.38.153 > 10.151.24.91: ip-proto-132 13:02:49.955262 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600) 10.151.38.153.39084 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 404355346] [SID: 0] [SSEQ 1] [PPID 0x0] 13:02:49.956770 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48) 10.151.24.91.54321 > 10.151.38.153.39084: sctp (1) [SACK] [cum ack 404355346] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0] Here problem in data portion limit calculation leads to re-fragmentation in IP, which is sub-optimal. The problem is max_data initial value, which doesn't take into account the fact, that data chunk must be padded to 4-bytes boundary. It's enough to correct max_data, because all later adjustments are correctly aligned to 4-bytes boundary. After the fix is applied, everything is fragmented correctly for uneven MTUs: 15:16:27.083881 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1496) 10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 3077098183] [SID: 0] [SSEQ 1] [PPID 0x0] 15:16:27.083907 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600) 10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 3077098184] [SID: 0] [SSEQ 1] [PPID 0x0] 15:16:27.085640 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48) 10.151.24.91.54321 > 10.151.38.153.53417: sctp (1) [SACK] [cum ack 3077098184] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0] The bug was there for years already, but - is a performance issue, the packets are still transmitted - doesn't show up with default MTU 1500, but possibly with ipsec (MTU 1438) Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nsn.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 13:20:27 -04:00
David S. Miller	48f8e0af86	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== The following batch contains: * Three fixes for the new synproxy target available in your net-next tree, from Jesper D. Brouer and Patrick McHardy. * One fix for TCPMSS to correctly handling the fragmentation case, from Phil Oester. I'll pass this one to -stable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 12:28:02 -04:00
Phil Oester	1205e1fa61	netfilter: xt_TCPMSS: correct return value in tcpmss_mangle_packet In commit `b396966c4` (netfilter: xt_TCPMSS: Fix missing fragmentation handling), I attempted to add safe fragment handling to xt_TCPMSS. However, Andy Padavan of Project N56U correctly points out that returning XT_CONTINUE in this function does not work. The callers (tcpmss_tg[46]) expect to receive a value of 0 in order to return XT_CONTINUE. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-04 14:20:03 +02:00
Jesper Dangaard Brouer	7cc9eb6ef7	netfilter: SYNPROXY: let unrelated packets continue Packets reaching SYNPROXY were default dropped, as they were most likely invalid (given the recommended state matching). This patch, changes SYNPROXY target to let packets, not consumed, continue being processed by the stack. This will be more in line other target modules. As it will allow more flexible configurations of handling, logging or matching on packets in INVALID states. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-04 11:44:23 +02:00
Patrick McHardy	f4de4c89d8	netfilter: synproxy_core: fix warning in __nf_ct_ext_add_length() With CONFIG_NETFILTER_DEBUG we get the following warning during SYNPROXY init: [ 80.558906] WARNING: CPU: 1 PID: 4833 at net/netfilter/nf_conntrack_extend.c:80 __nf_ct_ext_add_length+0x217/0x220 [nf_conntrack]() The reason is that the conntrack template is set to confirmed before adding the extension and it is invalid to add extensions to already confirmed conntracks. Fix by adding the extensions before setting the conntrack to confirmed. Reported-by: Jesper Dangaard Brouer <jesper.brouer@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-04 11:43:36 +02:00
Jesper Dangaard Brouer	775ada6d9f	netfilter: more strict TCP flag matching in SYNPROXY Its seems Patrick missed to incoorporate some of my requested changes during review v2 of SYNPROXY netfilter module. Which were, to avoid SYN+ACK packets to enter the path, meant for the ACK packet from the client (from the 3WHS). Further there were a bug in ip6t_SYNPROXY.c, for matching SYN packets that didn't exclude the ACK flag. Go a step further with SYN packet/flag matching by excluding flags ACK+FIN+RST, in both IPv4 and IPv6 modules. The intented usage of SYNPROXY is as follows: (gracefully describing usage in commit) iptables -t raw -A PREROUTING -i eth0 -p tcp --dport 80 --syn -j NOTRACK iptables -A INPUT -i eth0 -p tcp --dport 80 -m state UNTRACKED,INVALID \ -j SYNPROXY --sack-perm --timestamp --mss 1480 --wscale 7 --ecn echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose This does filter SYN flags early, for packets in the UNTRACKED state, but packets in the INVALID state with other TCP flags could still reach the module, thus this stricter flag matching is still needed. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-04 11:43:11 +02:00
Jiri Kosina	efd15f5f4f	Merge branch 'master' into for-3.12/upstream Sync with Linus' tree to be able to apply fixup patch on top of `9d9a04ee75` ("HID: apple: Add support for the 2013 Macbook Air") Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-09-04 10:49:57 +02:00
Vijay Subramanian	c995ae2259	tcp: Change return value of tcp_rcv_established() tcp_rcv_established() returns only one value namely 0. We change the return value to void (as suggested by David Miller). After commit `0c24604b` (tcp: implement RFC 5961 4.2), we no longer send RSTs in response to SYNs. We can remove the check and processing on the return value of tcp_rcv_established(). We also fix jtcp_rcv_established() in tcp_probe.c to match that of tcp_rcv_established(). Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:28 -04:00
Daniel Borkmann	cc8c6c1b21	net: tcp_probe: adapt tbuf size for recent changes With recent changes in tcp_probe module (e.g. `f925d0a62d` ("net: tcp_probe: add IPv6 support")) we also need to take into account that tbuf needs to be updated as format string will be further expanded. tbuf sits on the stack in tcpprobe_read() function that is invoked when user space reads procfs file /proc/net/tcpprobe, hence not fast path as in jtcp_rcv_established(). Having a size similarly as in sctp_probe module of 256 bytes is fully sufficient for that, we need theoretical maximum of 252 bytes otherwise we could get truncated. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:28 -04:00
Dan Carpenter	80aa4e1096	x25: add a sanity check parsing X.25 facilities This was found with a manual audit and I don't have a reproducer. We limit ->calling_len and ->called_len when we get them from copy_from_user() in x25_ioctl() so when they come from skb->data then we should cap them there as well. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:27 -04:00
Veaceslav Falico	82476b3160	net: correctly interlink lower/upper devices Currently we're linking upper devices to lower ones, which results in upside-down relationship: upper devices seeing lower devices via its upper lists. Fix this by correctly linking lower devices to the upper ones. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:26 -04:00
Nicolas Dichtel	ea23192e8e	tunnels: harmonize cleanup done on skb on rx path The goal of this patch is to harmonize cleanup done on a skbuff on rx path. Before this patch, behaviors were different depending of the tunnel type. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:26 -04:00
Nicolas Dichtel	963a88b31d	tunnels: harmonize cleanup done on skb on xmit path The goal of this patch is to harmonize cleanup done on a skbuff on xmit path. Before this patch, behaviors were different depending of the tunnel type. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:25 -04:00
Nicolas Dichtel	8b27f27797	skb: allow skb_scrub_packet() to be used by tunnels This function was only used when a packet was sent to another netns. Now, it can also be used after tunnel encapsulation or decapsulation. Only skb_orphan() should not be done when a packet is not crossing netns. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:25 -04:00
Nicolas Dichtel	117961878c	vxlan: remove net arg from vxlan[6]_xmit_skb() This argument is not used, let's remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:25 -04:00
Nicolas Dichtel	8b7ed2d91d	iptunnels: remove net arg from iptunnel_xmit() This argument is not used, let's remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 00:27:25 -04:00
Joe Perches	1372a298ea	wireless: scan: Remove comment to compare_ether_addr This function is being removed, so remove the reference to it. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 22:34:48 -04:00
Joe Perches	c3923b7a3d	batman: Remove reference to compare_ether_addr This function is being removed, rename the reference. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 22:34:48 -04:00
Joe Perches	951fd874c3	llc: Use normal etherdevice.h tests Convert the llc_<foo> static inlines to the equivalents from etherdevice.h and remove the llc_<foo> static inline functions. llc_mac_null -> is_zero_ether_addr llc_mac_multicast -> is_multicast_ether_addr llc_mac_match -> ether_addr_equal Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 22:34:47 -04:00
Petr Holasek	13c7bf0871	ipv6: ipv6_create_tempaddr cleanup This two-liner removes max_addresses variable which is now unecessary related to patch [ipv6: remove max_addresses check from ipv6_create_tempaddr]. Signed-off-by: Petr Holasek <pholasek@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 22:16:50 -04:00
Jiri Bohac	61e76b178d	ICMPv6: treat dest unreachable codes 5 and 6 as EACCES, not EPROTO RFC 4443 has defined two additional codes for ICMPv6 type 1 (destination unreachable) messages: 5 - Source address failed ingress/egress policy 6 - Reject route to destination Now they are treated as protocol error and icmpv6_err_convert() converts them to EPROTO. RFC 4443 says: "Codes 5 and 6 are more informative subsets of code 1." Treat codes 5 and 6 as code 1 (EACCES) Btw, connect() returning -EPROTO confuses firefox, so that fallback to other/IPv4 addresses does not work: https://bugzilla.mozilla.org/show_bug.cgi?id=910773 Signed-off-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 22:11:44 -04:00
David S. Miller	c12a22428a	Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next Marc Kleine-Budde says: ==================== this is a pull request for net-next. There are two patches from Gerhard Sittig, which improves the clock handling on mpc5121. Oliver Hartkopp provides a patch that adds a per rule limitation of frame hops. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 21:54:02 -04:00
David S. Miller	e7abfe4092	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== Please accept this batch of updates intended for the 3.12 stream. For the mac80211 bits, Johannes says this: "This time I have various improvements all over the place: IBSS, mesh, testmode, AP client powersave handling, one of the rare rfkill patches and some code cleanup." Also for mac80211: "And I also have some more changes for -next, just a few small fixes and improvements, nothing really stands out." And for iwlwifi: "This time I have some powersave work (notably uAPSD support), CQM offloads, support for a new firmware API and various code cleanups." Regarding the Bluetooth bits, Gustavo says: "Patches to 3.12, here we have: * implementation of a proper tty_port for RFCOMM devices, this fixes some issues people were seeing lately in the kernel. * Add voice_setting option for SCO, it is used for SCO Codec selection * bugfixes, small improvements and clean ups" For the NFC bits, Samuel says: "With this one we have: - A few pn533 improvements and minor fixes. Testing our pn533 driver against Google's NCI stack triggered a few issues that we fixed now. We also added Tx fragmentation support to this driver. - More NFC secure element handling. We added a GET_SE netlink command for getting all the discovered secure elements, and we defined 2 additional secure element netlink event (transaction and connectivity). We also fixed a couple of typos and copy-paste bugs from the secure element handling code. - Firmware download support for the pn544 driver. This chipset can enter a special mode where it's waiting for firmware blobs to replace the already flashed one. We now support that mode." With repect to the ath tree, Kalle says: "New features in ath10k are rx/tx checsumming in hw and survey scan implemented by Michal. Also he made fixes to different areas of the driver, most notable being fixing the case when using two streams and reducing the number of interface combinations to avoid firmware crashes. Bartosz did a clean related to how we handle SoC power save in PCI layer. For ath6kl Mohammed and Vasanth sent each a patch to fix two infrequent crashes." I also pulled the wireless tree into wireless-next to support a request from Johannes. On top of all that, there are the usual sort of driver updates. The mwifiex, brcmfmac, brcmsmac, ath9k, and rt2x00 drivers all get some attention, as does the bcma bus and a few other random bits here and there. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 21:45:31 -04:00
Daniel Borkmann	b1b72076b9	net: sctp: probe: allow more advanced ingress filtering by mark This is a follow-up commit for commit `b1dcdc68b1` ("net: tcp_probe: allow more advanced ingress filtering by mark") that allows for advanced SCTP probe module filtering based on skb mark (for a more detailed description and advantages using mark, refer to `b1dcdc68b1`). The current option to filter by a given port is still being preserved. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 21:44:11 -04:00
Tim Gardner	3e25c65ed0	net: neighbour: Remove CONFIG_ARPD This config option is superfluous in that it only guards a call to neigh_app_ns(). Enabling CONFIG_ARPD by default has no change in behavior. There will now be call to __neigh_notify() for each ARP resolution, which has no impact unless there is a user space daemon waiting to receive the notification, i.e., the case for which CONFIG_ARPD was designed anyways. Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Joe Perches <joe@perches.com> Cc: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 21:41:43 -04:00
Linus Torvalds	32dad03d16	Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "A lot of activities on the cgroup front. Most changes aren't visible to userland at all at this point and are laying foundation for the planned unified hierarchy. - The biggest change is decoupling the lifetime management of css (cgroup_subsys_state) from that of cgroup's. Because controllers (cpu, memory, block and so on) will need to be dynamically enabled and disabled, css which is the association point between a cgroup and a controller may come and go dynamically across the lifetime of a cgroup. Till now, css's were created when the associated cgroup was created and stayed till the cgroup got destroyed. Assumptions around this tight coupling permeated through cgroup core and controllers. These assumptions are gradually removed, which consists bulk of patches, and css destruction path is completely decoupled from cgroup destruction path. Note that decoupling of creation path is relatively easy on top of these changes and the patchset is pending for the next window. - cgroup has its own event mechanism cgroup.event_control, which is only used by memcg. It is overly complex trying to achieve high flexibility whose benefits seem dubious at best. Going forward, new events will simply generate file modified event and the existing mechanism is being made specific to memcg. This pull request contains prepatory patches for such change. - Various fixes and cleanups" Fixed up conflict in kernel/cgroup.c as per Tejun. * 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (69 commits) cgroup: fix cgroup_css() invocation in css_from_id() cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp() cgroup: make cgroup_event hold onto cgroup_subsys_state instead of cgroup cgroup: implement CFTYPE_NO_PREFIX cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax cgroup: fix cgroup_write_event_control() cgroup: fix subsystem file accesses on the root cgroup cgroup: change cgroup_from_id() to css_from_id() cgroup: use css_get() in cgroup_create() to check CSS_ROOT cpuset: remove an unncessary forward declaration cgroup: RCU protect each cgroup_subsys_state release cgroup: move subsys file removal to kill_css() cgroup: factor out kill_css() cgroup: decouple cgroup_subsys_state destruction from cgroup destruction cgroup: replace cgroup->css_kill_cnt with ->nr_css cgroup: bounce cgroup_subsys_state ref kill confirmation to a work item cgroup: move cgroup->subsys[] assignment to online_css() cgroup: reorganize css init / exit paths cgroup: add __rcu modifier to cgroup->subsys[] ...	2013-09-03 18:25:03 -07:00
Bjørn Mork	2fcc800583	net: dsa: inherit addr_assign_type along with dev_addr A device inheriting a random or set address should reflect this in its addr_assign_type. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 20:57:49 -04:00
Bjørn Mork	6b93f4a1f2	net: vlan: inherit addr_assign_type along with dev_addr A device inheriting a random or set address should reflect this in its addr_assign_type. Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-03 20:57:49 -04:00
Linus Torvalds	542a086ac7	Driver core patches for 3.12-rc1 Here's the big driver core pull request for 3.12-rc1. Lots of tiny changes here fixing up the way sysfs attributes are created, to try to make drivers simpler, and fix a whole class race conditions with creations of device attributes after the device was announced to userspace. All the various pieces are acked by the different subsystem maintainers. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.21 (GNU/Linux) iEYEABECAAYFAlIlIPcACgkQMUfUDdst+ynUMwCaAnITsxyDXYQ4DqEsz8EcOtMk 718AoLrgnUZs3B+70AT34DVktg4HSThk =USl9 -----END PGP SIGNATURE----- Merge tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg KH: "Here's the big driver core pull request for 3.12-rc1. Lots of tiny changes here fixing up the way sysfs attributes are created, to try to make drivers simpler, and fix a whole class race conditions with creations of device attributes after the device was announced to userspace. All the various pieces are acked by the different subsystem maintainers" * tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (119 commits) firmware loader: fix pending_fw_head list corruption drivers/base/memory.c: introduce help macro to_memory_block dynamic debug: line queries failing due to uninitialized local variable sysfs: sysfs_create_groups returns a value. debugfs: provide debugfs_create_x64() when disabled rbd: convert bus code to use bus_groups firmware: dcdbas: use binary attribute groups sysfs: add sysfs_create/remove_groups for when SYSFS is not enabled driver core: add #include <linux/sysfs.h> to core files. HID: convert bus code to use dev_groups Input: serio: convert bus code to use drv_groups Input: gameport: convert bus code to use drv_groups driver core: firmware: use __ATTR_RW() driver core: core: use DEVICE_ATTR_RO driver core: bus: use DRIVER_ATTR_WO() driver core: create write-only attribute macros for devices and drivers sysfs: create __ATTR_WO() driver-core: platform: convert bus code to use dev_groups workqueue: convert bus code to use dev_groups MEI: convert bus code to use dev_groups ...	2013-09-03 11:37:15 -07:00
Cong Wang	5a17a390de	net: make snmp_mib_free static inline Fengguang reported: net/built-in.o: In function `in6_dev_finish_destroy': (.text+0x4ca7d): undefined reference to `snmp_mib_free' this is due to snmp_mib_free() is defined when CONFIG_INET is enabled, but in6_dev_finish_destroy() is now moved to core kernel. I think snmp_mib_free() is small enough to be inlined, so just make it static inline. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-02 21:00:50 -07:00
Cong Wang	eb3c0d83cc	net: unify skb_udp_tunnel_segment() and skb_udp6_tunnel_segment() As suggested by Pravin, we can unify the code in case of duplicated code. Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:01 -04:00
Cong Wang	d949d826c0	ipv6: Add generic UDP Tunnel segmentation Similar to commit `7313626745` (tunneling: Add generic Tunnel segmentation) This patch adds generic tunneling offloading support for IPv6-UDP based tunnels. This can be used by tunneling protocols like VXLAN. Cc: Jesse Gross <jesse@nicira.com> Cc: Pravin B Shelar <pshelar@nicira.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:01 -04:00
Cong Wang	f564f45c45	vxlan: add ipv6 proxy support This patch adds the IPv6 version of "arp_reduce", ndisc_send_na() will be needed. Cc: David S. Miller <davem@davemloft.net> Cc: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:01 -04:00
Cong Wang	f39dc1023d	ipv6: move in6_dev_finish_destroy() into core kernel in6_dev_put() will be needed by vxlan module, so is in6_dev_finish_destroy(). Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:00 -04:00
Cong Wang	e15a00aafa	vxlan: add ipv6 route short circuit support route short circuit only has IPv4 part, this patch adds the IPv6 part. nd_tbl will be needed. Cc: David S. Miller <davem@davemloft.net> Cc: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:00 -04:00
Cong Wang	e4c7ed4153	vxlan: add ipv6 support This patch adds IPv6 support to vxlan device, as the new version RFC already mentions it: http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03 Cc: David Stevens <dlstevens@us.ibm.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:00 -04:00
Cong Wang	caf92bc400	ipv6: do not call ndisc_send_rs() with write lock Because vxlan module will call ip6_dst_lookup() in TX path, which will hold write lock. So we have to release this write lock before calling ndisc_send_rs(), otherwise could deadlock. Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:00 -04:00
Cong Wang	034dfc5df9	ipv6: export in6addr_loopback to modules It is needed by vxlan module. Noticed by Mike. Cc: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:00 -04:00
Cong Wang	5f81bd2e5d	ipv6: export a stub for IPv6 symbols used by vxlan In case IPv6 is compiled as a module, introduce a stub for ipv6_sock_mc_join and ipv6_sock_mc_drop etc.. It will be used by vxlan module. Suggested by Ben. This is an ugly but easy solution for now. Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:00 -04:00
Cong Wang	788787b559	ipv6: move ip6_local_out into core kernel It will be used the vxlan kernel module. Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:30:00 -04:00
Cong Wang	3ce9b35ff6	ipv6: move ip6_dst_hoplimit() into core kernel It will be used by vxlan, and may not be inlined. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 22:29:59 -04:00
stephen hemminger	34aedd3f3b	qdisc: fix build with !CONFIG_NET_SCHED Multiqueue scheduler refers to default_qdisc_ops; therefore the variable definition needs to be moved to handle case where net scheduler API is not available. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 18:09:45 -04:00
stephen hemminger	d2a7f269f9	qdisc: make args to qdisc_create_default const Fixes warnings introduced by the qdisc default patch. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 18:09:45 -04:00
Eric W. Biederman	c7b96acf14	userns: Kill nsown_capable it makes the wrong thing easy nsown_capable is a special case of ns_capable essentially for just CAP_SETUID and CAP_SETGID. For the existing users it doesn't noticably simplify things and from the suggested patches I have seen it encourages people to do the wrong thing. So remove nsown_capable. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-08-30 23:44:11 -07:00
stephen hemminger	6da7c8fcbc	qdisc: allow setting default queuing discipline By default, the pfifo_fast queue discipline has been used by default for all devices. But we have better choices now. This patch allow setting the default queueing discipline with sysctl. This allows easy use of better queueing disciplines on all devices without having to use tc qdisc scripts. It is intended to allow an easy path for distributions to make fq_codel or sfq the default qdisc. This patch also makes pfifo_fast more of a first class qdisc, since it is now possible to manually override the default and explicitly use pfifo_fast. The behavior for systems who do not use the sysctl is unchanged, they still get pfifo_fast Also removes leftover random # in sysctl net core. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-31 00:32:32 -04:00
Linus Torvalds	a8787645e1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) There was a simplification in the ipv6 ndisc packet sending attempted here, which avoided using memory accounting on the per-netns ndisc socket for sending NDISC packets. It did fix some important issues, but it causes regressions so it gets reverted here too. Specifically, the problem with this change is that the IPV6 output path really depends upon there being a valid skb->sk attached. The reason we want to do this change in some form when we figure out how to do it right, is that if a device goes down the ndisc_sk socket send queue will fill up and block NDISC packets that we want to send to other devices too. That's really bad behavior. Hopefully Thomas can come up with a better version of this change. 2) Fix a severe TCP performance regression by reverting a change made to dev_pick_tx() quite some time ago. From Eric Dumazet. 3) TIPC returns wrongly signed error codes, fix from Erik Hugne. 4) Fix OOPS when doing IPSEC over ipv4 tunnels due to orphaning the skb->sk too early. Fix from Li Hongjun. 5) RAW ipv4 sockets can use the wrong routing key during lookup, from Chris Clark. 6) Similar to #1 revert an older change that tried to use plain alloc_skb() for SYN/ACK TCP packets, this broke the netfilter owner mark which needs to see the skb->sk for such frames. From Phil Oester. 7) BNX2x driver bug fixes from Ariel Elior and Yuval Mintz, specifically in the handling of virtual functions. 8) IPSEC path error propagations to sockets is not done properly when we have v4 in v6, and v6 in v4 type rules. Fix from Hannes Frederic Sowa. 9) Fix missing channel context release in mac80211, from Johannes Berg. 10) Fix network namespace handing wrt. SCM_RIGHTS, from Andy Lutomirski. 11) Fix usage of bogus NAPI weight in jme, netxen, and ps3_gelic drivers. From Michal Schmidt. 12) Hopefully a complete and correct fix for the genetlink dump locking and module reference counting. From Pravin B Shelar. 13) sk_busy_loop() must do a cpu_relax(), from Eliezer Tamir. 14) Fix handling of timestamp offset when restoring a snapshotted TCP socket. From Andrew Vagin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) net: fec: fix time stamping logic after napi conversion net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay mISDN: return -EINVAL on error in dsp_control_req() net: revert `8728c544a9` ("net: dev_pick_tx() fix") Revert "ipv6: Don't depend on per socket memory for neighbour discovery messages" ipv4 tunnels: fix an oops when using ipip/sit with IPsec tipc: set sk_err correctly when connection fails tcp: tcp_make_synack() should use sock_wmalloc bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones ipv6: Don't depend on per socket memory for neighbour discovery messages ipv4: sendto/hdrincl: don't use destination address found in header tcp: don't apply tsoffset if rcv_tsecr is zero tcp: initialize rcv_tstamp for restored sockets net: xilinx: fix memleak net: usb: Add HP hs2434 device to ZLP exception table net: add cpu_relax to busy poll loop net: stmmac: fixed the pbl setting with DT genl: Hold reference on correct module while netlink-dump. genl: Fix genl dumpit() locking. xfrm: Fix potential null pointer dereference in xdst_queue_output ...	2013-08-30 17:43:17 -07:00
Daniel Borkmann	2d98c29b6f	net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay While looking into MLDv1/v2 code, I noticed that bridging code does not convert it's max delay into jiffies for MLDv2 messages as we do in core IPv6' multicast code. RFC3810, 5.1.3. Maximum Response Code says: The Maximum Response Code field specifies the maximum time allowed before sending a responding Report. The actual time allowed, called the Maximum Response Delay, is represented in units of milliseconds, and is derived from the Maximum Response Code as follows: [...] As we update timers that work with jiffies, we need to convert it. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Linus Lüssing <linus.luessing@web.de> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 17:56:47 -04:00
Eric Dumazet	702821f4ea	net: revert `8728c544a9` ("net: dev_pick_tx() fix") commit `8728c544a9` ("net: dev_pick_tx() fix") and commit `b6fe83e952` ("bonding: refine IFF_XMIT_DST_RELEASE capability") are quite incompatible : Queue selection is disabled because skb dst was dropped before entering bonding device. This causes major performance regression, mainly because TCP packets for a given flow can be sent to multiple queues. This is particularly visible when using the new FQ packet scheduler with MQ + FQ setup on the slaves. We can safely revert the first commit now that `416186fbf8` ("net: Split core bits of netdev_pick_tx into __netdev_pick_tx") properly caps the queue_index. Reported-by: Xi Wang <xii@google.com> Diagnosed-by: Xi Wang <xii@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Denys Fedorysychenko <nuclearcat@nuclearcat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 17:48:04 -04:00
David S. Miller	25ad6117e7	Revert "ipv6: Don't depend on per socket memory for neighbour discovery messages" This reverts commit `1f324e3887`. It seems to cause regressions, and in particular the output path really depends upon there being a socket attached to skb->sk for checks such as sk_mc_loop(skb->sk) for example. See ip6_output_finish2(). Reported-by: Stephen Warren <swarren@wwwdotorg.org> Reported-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 17:39:33 -04:00
Thomas Graf	816c5b5b01	ipv6: Remove redundant sk variable A sk variable initialized to ndisc_sk is already available outside of the branch. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 17:18:59 -04:00
Li Hongjun	737e828bdb	ipv4 tunnels: fix an oops when using ipip/sit with IPsec Since commit `3d7b46cd20` (ip_tunnel: push generic protocol handling to ip_tunnel module.), an Oops is triggered when an xfrm policy is configured on an IPv4 over IPv4 tunnel. xfrm4_policy_check() calls __xfrm_policy_check2(), which uses skb_dst(skb). But this field is NULL because iptunnel_pull_header() calls skb_dst_drop(skb). Signed-off-by: Li Hongjun <hongjun.li@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 17:13:28 -04:00
Erik Hugne	2c8d851823	tipc: set sk_err correctly when connection fails Should a connect fail, if the publication/server is unavailable or due to some other error, a positive value will be returned and errno is never set. If the application code checks for an explicit zero return from connect (success) or a negative return (failure), it will not catch the error and subsequent send() calls will fail as shown from the strace snippet below. socket(0x1e /* PF_??? /, SOCK_SEQPACKET, 0) = 3 connect(3, {sa_family=0x1e / AF_??? */, sa_data="\2\1\322\4\0\0\322\4\0\0\0\0\0\0"}, 16) = 111 sendto(3, "test", 4, 0, NULL, 0) = -1 EPIPE (Broken pipe) The reason for this behaviour is that TIPC wrongly inverts error codes set in sk_err. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 16:06:57 -04:00
Phil Oester	eb8895debe	tcp: tcp_make_synack() should use sock_wmalloc In commit `90ba9b19` (tcp: tcp_make_synack() can use alloc_skb()), Eric changed the call to sock_wmalloc in tcp_make_synack to alloc_skb. In doing so, the netfilter owner match lost its ability to block the SYNACK packet on outbound listening sockets. Revert the change, restoring the owner match functionality. This closes netfilter bugzilla #847. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 16:02:04 -04:00
Linus Lüssing	cc0fdd8028	bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones Currently we would still potentially suffer multicast packet loss if there is just either an IGMP or an MLD querier: For the former case, we would possibly drop IPv6 multicast packets, for the latter IPv4 ones. This is because we are currently assuming that if either an IGMP or MLD querier is present that the other one is present, too. This patch makes the behaviour and fix added in "bridge: disable snooping if there is no querier" (`b00589af3b`) to also work if there is either just an IGMP or an MLD querier on the link: It refines the deactivation of the snooping to be protocol specific by using separate timers for the snooped IGMP and MLD queries as well as separate timers for our internal IGMP and MLD queriers. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 15:24:37 -04:00
Yuchung Cheng	1b7fdd2ab5	tcp: do not use cached RTT for RTT estimation RTT cached in the TCP metrics are valuable for the initial timeout because SYN RTT usually does not account for serialization delays on low BW path. However using it to seed the RTT estimator maybe disruptive because other components (e.g., pacing) require the smooth RTT to be obtained from actual connection. The solution is to use the higher cached RTT to set the first RTO conservatively like tcp_rtt_estimator(), but avoid seeding the other RTT estimator variables such as srtt. It is also a good idea to keep RTO conservative to obtain the first RTT sample, and the performance is insured by TCP loss probe if SYN RTT is available. To keep the seeding formula consistent across SYN RTT and cached RTT, the rttvar is twice the cached RTT instead of cached RTTVAR value. The reason is because cached variation may be too small (near min RTO) which defeats the purpose of being conservative on first RTO. However the metrics still keep the RTT variations as they might be useful for user applications (through ip). Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 15:14:38 -04:00
Eric Dumazet	08f89b981b	pkt_sched: fq: prefetch() fix kbuild bot reported following m68k build error : net/sched/sch_fq.c: In function 'fq_dequeue': >> net/sched/sch_fq.c:491:2: error: implicit declaration of function 'prefetch' [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors While we are fixing this, move this prefetch() call a bit earlier. Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 14:51:59 -04:00
Eric Dumazet	afe4fd0624	pkt_sched: fq: Fair Queue packet scheduler - Uses perfect flow match (not stochastic hash like SFQ/FQ_codel) - Uses the new_flow/old_flow separation from FQ_codel - New flows get an initial credit allowing IW10 without added delay. - Special FIFO queue for high prio packets (no need for PRIO + FQ) - Uses a hash table of RB trees to locate the flows at enqueue() time - Smart on demand gc (at enqueue() time, RB tree lookup evicts old unused flows) - Dynamic memory allocations. - Designed to allow millions of concurrent flows per Qdisc. - Small memory footprint : ~8K per Qdisc, and 104 bytes per flow. - Single high resolution timer for throttled flows (if any). - One RB tree to link throttled flows. - Ability to have a max rate per flow. We might add a socket option to add per socket limitation. Attempts have been made to add TCP pacing in TCP stack, but this seems to add complex code to an already complex stack. TCP pacing is welcomed for flows having idle times, as the cwnd permits TCP stack to queue a possibly large number of packets. This removes the 'slow start after idle' choice, hitting badly large BDP flows, and applications delivering chunks of data as video streams. Nicely spaced packets : Here interface is 10Gbit, but flow bottleneck is ~20Mbit cwin is big, yet FQ avoids the typical bursts generated by TCP (as in netperf TCP_RR -- -r 100000,100000) 15:01:23.545279 IP A > B: . 78193:81089(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.545394 IP B > A: . ack 81089 win 3668 <nop,nop,timestamp 11597985 1115> 15:01:23.546488 IP A > B: . 81089:83985(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.546565 IP B > A: . ack 83985 win 3668 <nop,nop,timestamp 11597986 1115> 15:01:23.547713 IP A > B: . 83985:86881(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.547778 IP B > A: . ack 86881 win 3668 <nop,nop,timestamp 11597987 1115> 15:01:23.548911 IP A > B: . 86881:89777(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.548949 IP B > A: . ack 89777 win 3668 <nop,nop,timestamp 11597988 1115> 15:01:23.550116 IP A > B: . 89777:92673(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.550182 IP B > A: . ack 92673 win 3668 <nop,nop,timestamp 11597989 1115> 15:01:23.551333 IP A > B: . 92673:95569(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.551406 IP B > A: . ack 95569 win 3668 <nop,nop,timestamp 11597991 1115> 15:01:23.552539 IP A > B: . 95569:98465(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.552576 IP B > A: . ack 98465 win 3668 <nop,nop,timestamp 11597992 1115> 15:01:23.553756 IP A > B: . 98465:99913(1448) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.554138 IP A > B: P 99913:100001(88) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.554204 IP B > A: . ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.554234 IP B > A: . 65248:68144(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.555620 IP B > A: . 68144:71040(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.557005 IP B > A: . 71040:73936(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.558390 IP B > A: . 73936:76832(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.559773 IP B > A: . 76832:79728(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.561158 IP B > A: . 79728:82624(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.562543 IP B > A: . 82624:85520(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.563928 IP B > A: . 85520:88416(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.565313 IP B > A: . 88416:91312(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.566698 IP B > A: . 91312:94208(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.568083 IP B > A: . 94208:97104(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.569467 IP B > A: . 97104:100000(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.570852 IP B > A: . 100000:102896(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.572237 IP B > A: . 102896:105792(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.573639 IP B > A: . 105792:108688(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.575024 IP B > A: . 108688:111584(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.576408 IP B > A: . 111584:114480(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.577793 IP B > A: . 114480:117376(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> TCP timestamps show that most packets from B were queued in the same ms timeframe (TSval 1159799{3,4}), but FQ managed to send them right in time to avoid a big burst. In slow start or steady state, very few packets are throttled [1] FQ gets a bunch of tunables as : limit : max number of packets on whole Qdisc (default 10000) flow_limit : max number of packets per flow (default 100) quantum : the credit per RR round (default is 2 MTU) initial_quantum : initial credit for new flows (default is 10 MTU) maxrate : max per flow rate (default : unlimited) buckets : number of RB trees (default : 1024) in hash table. (consumes 8 bytes per bucket) [no]pacing : disable/enable pacing (default is enable) All of them can be changed on a live qdisc. $ tc qd add dev eth0 root fq help Usage: ... fq [ limit PACKETS ] [ flow_limit PACKETS ] [ quantum BYTES ] [ initial_quantum BYTES ] [ maxrate RATE ] [ buckets NUMBER ] [ [no]pacing ] $ tc -s -d qd qdisc fq 8002: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 256 quantum 3028 initial_quantum 15140 Sent 216532416 bytes 148395 pkt (dropped 0, overlimits 0 requeues 14) backlog 0b 0p requeues 14 511 flows, 511 inactive, 0 throttled 110 gc, 0 highprio, 0 retrans, 1143 throttled, 0 flows_plimit [1] Except if initial srtt is overestimated, as if using cached srtt in tcp metrics. We'll provide a fix for this issue. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 21:38:31 -04:00
Oliver Hartkopp	391ac1282d	can: gw: add a per rule limitation of frame hops Usually the received CAN frames can be processed/routed as much as 'max_hops' times (which is given at module load time of the can-gw module). Introduce a new configuration option to reduce the number of possible hops for a specific gateway rule to a value smaller then max_hops. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2013-08-29 22:58:24 +02:00
Daniel Borkmann	f55d112e52	net: packet: use reciprocal_divide in fanout_demux_hash Instead of hard-coding reciprocal_divide function, use the inline function from reciprocal_div.h. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:43:29 -04:00
Daniel Borkmann	5df0ddfbc9	net: packet: add randomized fanout scheduler We currently allow for different fanout scheduling policies in pf_packet such as scheduling by skb's rxhash, round-robin, by cpu, and rollover. Also allow for a random, equidistributed selection of the socket from the fanout process group. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:43:29 -04:00
Veaceslav Falico	48311f4685	net: add netdev_upper_get_next_dev_rcu(dev, iter) This function returns the next dev in the dev->upper_dev_list after the struct list_head *iter position, and updates iter accordingly. Returns NULL if there are no devices left. Caller must hold RCU read lock. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:19:42 -04:00
Veaceslav Falico	620f3186ca	net: remove search_list from netdev_adjacent We already don't need it cause we see every upper/lower device in the list already. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:19:42 -04:00
Veaceslav Falico	5d261913ca	net: add lower_dev_list to net_device and make a full mesh This patch adds lower_dev_list list_head to net_device, which is the same as upper_dev_list, only for lower devices, and begins to use it in the same way as the upper list. It also changes the way the whole adjacent device lists work - now they contain all of upper/lower devices, not only the first level. The first level devices are distinguished by the bool neighbour field in netdev_adjacent, also added by this patch. There are cases when a device can be added several times to the adjacent list, the simplest would be: /---- eth0.10 ---\ eth0- --- bond0 \---- eth0.20 ---/ where both bond0 and eth0 'see' each other in the adjacent lists two times. To avoid duplication of netdev_adjacent structures ref_nr is being kept as the number of times the device was added to the list. The 'full view' is achieved by adding, on link creation, all of the upper_dev's upper_dev_list devices as upper devices to all of the lower_dev's lower_dev_list devices (and to the lower_dev itself), and vice versa. On unlink they are removed using the same logic. I've tested it with thousands vlans/bonds/bridges, everything works ok and no observable lags even on a huge number of interfaces. Memory footprint for 128 devices interconnected with each other via both upper and lower (which is impossible, but for the comparison) lists would be: 1281282*sizeof(netdev_adjacent) = 1.5MB but in the real world we usualy have at most several devices with slaves and a lot of vlans, so the footprint will be much lower. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:19:42 -04:00
Veaceslav Falico	aa9d85605f	net: rename netdev_upper to netdev_adjacent Rename the structure to reflect the upcoming addition of lower_dev_list. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:19:41 -04:00
David S. Miller	79f9ab7e0a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== This pull request fixes some issues that arise when 6in4 or 4in6 tunnels are used in combination with IPsec, all from Hannes Frederic Sowa and a null pointer dereference when queueing packets to the policy hold queue. 1) We might access the local error handler of the wrong address family if 6in4 or 4in6 tunnel is protected by ipsec. Fix this by addind a pointer to the correct local_error to xfrm_state_afinet. 2) Add a helper function to always refer to the correct interpretation of skb->sk. 3) Call skb_reset_inner_headers to record the position of the inner headers when adding a new one in various ipv6 tunnels. This is needed to identify the addresses where to send back errors in the xfrm layer. 4) Dereference inner ipv6 header if encapsulated to always call the right error handler. 5) Choose protocol family by skb protocol to not call the wrong xfrm{4,6}_local_error handler in case an ipv6 sockets is used in ipv4 mode. 6) Partly revert "xfrm: introduce helper for safe determination of mtu" because this introduced pmtu discovery problems. 7) Set skb->protocol on tcp, raw and ip6_append_data genereated skbs. We need this to get the correct mtu informations in xfrm. 8) Fix null pointer dereference in xdst_queue_output. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:05:30 -04:00
Thomas Graf	1f324e3887	ipv6: Don't depend on per socket memory for neighbour discovery messages Allocating skbs when sending out neighbour discovery messages currently uses sock_alloc_send_skb() based on a per net namespace socket and thus share a socket wmem buffer space. If a netdevice is temporarily unable to transmit due to carrier loss or for other reasons, the queued up ndisc messages will cosnume all of the wmem space and will thus prevent from any more skbs to be allocated even for netdevices that are able to transmit packets. The number of neighbour discovery messages sent is very limited, simply use alloc_skb() and don't depend on any socket wmem space any longer. This patch has orginally been posted by Eric Dumazet in a modified form. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:01:05 -04:00
Chris Clark	c27c9322d0	ipv4: sendto/hdrincl: don't use destination address found in header ipv4: raw_sendmsg: don't use header's destination address A sendto() regression was bisected and found to start with commit `f8126f1d51` (ipv4: Adjust semantics of rt->rt_gateway.) The problem is that it tries to ARP-lookup the constructed packet's destination address rather than the explicitly provided address. Fix this using FLOWI_FLAG_KNOWN_NH so that given nexthop is used. cf. commit `2ad5b9e4bd` Reported-by: Chris Clark <chris.clark@alcatel-lucent.com> Bisected-by: Chris Clark <chris.clark@alcatel-lucent.com> Tested-by: Chris Clark <chris.clark@alcatel-lucent.com> Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Chris Clark <chris.clark@alcatel-lucent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:57:52 -04:00
Daniel Borkmann	7613f5fe11	net: sctp: sctp_verify_init: clean up mandatory checks and add comment Add a comment related to RFC4960 explaning why we do not check for initial TSN, and while at it, remove yoda notation checks and clean up code from checks of mandatory conditions. That's probably just really minor, but makes reviewing easier. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:54:48 -04:00
Eric Dumazet	95bd09eb27	tcp: TSO packets automatic sizing After hearing many people over past years complaining against TSO being bursty or even buggy, we are proud to present automatic sizing of TSO packets. One part of the problem is that tcp_tso_should_defer() uses an heuristic relying on upcoming ACKS instead of a timer, but more generally, having big TSO packets makes little sense for low rates, as it tends to create micro bursts on the network, and general consensus is to reduce the buffering amount. This patch introduces a per socket sk_pacing_rate, that approximates the current sending rate, and allows us to size the TSO packets so that we try to send one packet every ms. This field could be set by other transports. Patch has no impact for high speed flows, where having large TSO packets makes sense to reach line rate. For other flows, this helps better packet scheduling and ACK clocking. This patch increases performance of TCP flows in lossy environments. A new sysctl (tcp_min_tso_segs) is added, to specify the minimal size of a TSO packet (default being 2). A follow-up patch will provide a new packet scheduler (FQ), using sk_pacing_rate as an input to perform optional per flow pacing. This explains why we chose to set sk_pacing_rate to twice the current rate, allowing 'slow start' ramp up. sk_pacing_rate = 2 * cwnd * mss / srtt v2: Neal Cardwell reported a suspect deferring of last two segments on initial write of 10 MSS, I had to change tcp_tso_should_defer() to take into account tp->xmit_size_goal_segs Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Cc: Tom Herbert <therbert@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:50:06 -04:00
Hannes Frederic Sowa	b800c3b966	ipv6: drop fragmented ndisc packets by default (RFC 6980) This patch implements RFC6980: Drop fragmented ndisc packets by default. If a fragmented ndisc packet is received the user is informed that it is possible to disable the check. Cc: Fernando Gont <fernando@gont.com.ar> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:32:08 -04:00
Florian Fainelli	fd094808a0	bridge: inherit slave devices needed_headroom Some slave devices may have set a dev->needed_headroom value which is different than the default one, most likely in order to prepend a hardware descriptor in front of the Ethernet frame to send. Whenever a new slave is added to a bridge, ensure that we update the needed_headroom value accordingly to account for the slave needed_headroom value. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:17:09 -04:00
Andrew Vagin	e3e1202831	tcp: don't apply tsoffset if rcv_tsecr is zero The zero value means that tsecr is not valid, so it's a special case. tsoffset is used to customize tcp_time_stamp for one socket. tsoffset is usually zero, it's used when a socket was moved from one host to another host. Currently this issue affects logic of tcp_rcv_rtt_measure_ts. Due to incorrect value of rcv_tsecr, tcp_rcv_rtt_measure_ts sets rto to TCP_RTO_MAX. Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Reported-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:11:12 -04:00
Andrew Vagin	c7781a6e3c	tcp: initialize rcv_tstamp for restored sockets u32 rcv_tstamp; /* timestamp of last received ACK */ Its value used in tcp_retransmit_timer, which closes socket if the last ack was received more then TCP_RTO_MAX ago. Currently rcv_tstamp is initialized to zero and if tcp_retransmit_timer is called before receiving a first ack, the connection is closed. This patch initializes rcv_tstamp to a timestamp, when a socket was restored. Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Reported-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:11:11 -04:00
John W. Linville	0d8165e9fc	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c	2013-08-29 14:08:24 -04:00
Eric W. Biederman	7dc5dbc879	sysfs: Restrict mounting sysfs Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights over the net namespace. The principle here is if you create or have capabilities over it you can mount it, otherwise you get to live with what other people have mounted. Instead of testing this with a straight forward ns_capable call, perform this check the long and torturous way with kobject helpers, this keeps direct knowledge of namespaces out of sysfs, and preserves the existing sysfs abstractions. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-08-28 21:35:14 -07:00
Pravin B Shelar	33c6b1f6b1	genl: Hold reference on correct module while netlink-dump. netlink dump operations take module as parameter to hold reference for entire netlink dump duration. Currently it holds ref only on genl module which is not correct when we use ops registered to genl from another module. Following patch adds module pointer to genl_ops so that netlink can hold ref count on it. CC: Jesse Gross <jesse@nicira.com> CC: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-28 17:19:17 -04:00
Pravin B Shelar	9b96309c5b	genl: Fix genl dumpit() locking. In case of genl-family with parallel ops off, dumpif() callback is expected to run under genl_lock, But commit `def3117493` (genl: Allow concurrent genl callbacks.) changed this behaviour where only first dumpit() op was called under genl-lock. For subsequent dump, only nlk->cb_lock was taken. Following patch fixes it by defining locked dumpit() and done() callback which takes care of genl-locking. CC: Jesse Gross <jesse@nicira.com> CC: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-28 17:19:17 -04:00
Trond Myklebust	347e2233b7	SUNRPC: Fix memory corruption issue on 32-bit highmem systems Some architectures, such as ARM-32 do not return the same base address when you call kmap_atomic() twice on the same page. This causes problems for the memmove() call in the XDR helper routine "_shift_data_right_pages()", since it defeats the detection of overlapping memory ranges, and has been seen to corrupt memory. The fix is to distinguish between the case where we're doing an inter-page copy or not. In the former case of we know that the memory ranges cannot possibly overlap, so we can additionally micro-optimise by replacing memmove() with memcpy(). Reported-by: Mark Young <MYoung@nvidia.com> Reported-by: Matt Craighead <mcraighead@nvidia.com> Cc: Bruce Fields <bfields@fieldses.org> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Matt Craighead <mcraighead@nvidia.com>	2013-08-28 15:43:43 -04:00
John W. Linville	f3e979a52c	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next	2013-08-28 13:51:40 -04:00
John W. Linville	cd80e107b7	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2013-08-28 13:49:20 -04:00
John W. Linville	b35c809708	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c net/mac80211/ibss.c	2013-08-28 10:36:09 -04:00
Antonio Quartulli	c6eaa3f067	batman-adv: send GW_DEL event when the gw client mode is deselected Whenever the GW client mode is deselected, a DEL event has to be sent in order to tell userspace that the current gateway has been lost. Send the uevent on state change only if a gateway was currently selected. Reported-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>	2013-08-28 11:33:00 +02:00
Simon Wunderlich	c00a072d3f	batman-adv: Start new development cycle Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-08-28 11:31:52 +02:00
Antonio Quartulli	791c2a2d3f	batman-adv: move enum definition at the top of the file Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-08-28 11:31:51 +02:00
Simon Wunderlich	c54f38c9aa	batman-adv: set skb priority according to content The skb priority field may help the wireless driver to choose the right queue (e.g. WMM queues). This should be set in batman-adv, as this information is only available here. This patch adds support for IPv4/IPv6 DS fields and VLAN PCP. Note that only VLAN PCP is used if a VLAN header is present. Also initially set TC_PRIO_CONTROL only for self-generated packets, and keep the priority set by higher layers. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-08-28 11:31:50 +02:00
Steffen Klassert	302a50bc94	xfrm: Fix potential null pointer dereference in xdst_queue_output The net_device might be not set on the skb when we try refcounting. This leads to a null pointer dereference in xdst_queue_output(). It turned out that the refcount to the net_device is not needed after all. The dst_entry has a refcount to the net_device before we queue the skb, so it can't go away. Therefore we can remove the refcount on queueing to fix the null pointer dereference. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-28 08:47:14 +02:00
David S. Miller	5b2941b18d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== A number of significant new features and optimizations for net-next/3.12. Highlights are: * "Megaflows", an optimization that allows userspace to specify which flow fields were used to compute the results of the flow lookup. This allows for a major reduction in flow setups (the major performance bottleneck in Open vSwitch) without reducing flexibility. * Converting netlink dump operations to use RCU, allowing for additional parallelism in userspace. * Matching and modifying SCTP protocol fields. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-27 22:11:18 -04:00
Florian Westphal	b7e092c05b	netfilter: ctnetlink: fix uninitialized variable net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_nfqueue_attach_expect': 'helper' may be used uninitialized in this function It was only initialized in if CTA_EXPECT_HELP_NAME attribute was present, it must be NULL otherwise. Problem added recently in `bd077937` (netfilter: nfnetlink_queue: allow to attach expectations to conntracks). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:28:19 +02:00
Patrick McHardy	4ad362282c	netfilter: add IPv6 SYNPROXY target Add an IPv6 version of the SYNPROXY target. The main differences to the IPv4 version is routing and IP header construction. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:28:13 +02:00
Patrick McHardy	81eb6a1487	net: syncookies: export cookie_v6_init_sequence/cookie_v6_check Extract the local TCP stack independant parts of tcp_v6_init_sequence() and cookie_v6_check() and export them for use by the upcoming IPv6 SYNPROXY target. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:28:04 +02:00
Patrick McHardy	48b1de4c11	netfilter: add SYNPROXY core/target Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy core with common functions and an address family specific target. The SYNPROXY receives the connection request from the client, responds with a SYN/ACK containing a SYN cookie and announcing a zero window and checks whether the final ACK from the client contains a valid cookie. It then establishes a connection to the original destination and, if successful, sends a window update to the client with the window size announced by the server. Support for timestamps, SACK, window scaling and MSS options can be statically configured as target parameters if the features of the server are known. If timestamps are used, the timestamp value sent back to the client in the SYN/ACK will be different from the real timestamp of the server. In order to now break PAWS, the timestamps are translated in the direction server->client. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:27:54 +02:00
Patrick McHardy	0198230b77	net: syncookies: export cookie_v4_init_sequence/cookie_v4_check Extract the local TCP stack independant parts of tcp_v4_init_sequence() and cookie_v4_check() and export them for use by the upcoming SYNPROXY target. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:27:44 +02:00
Patrick McHardy	41d73ec053	netfilter: nf_conntrack: make sequence number adjustments usuable without NAT Split out sequence number adjustments from NAT and move them to the conntrack core to make them usable for SYN proxying. The sequence number adjustment information is moved to a seperate extend. The extend is added to new conntracks when a NAT mapping is set up for a connection using a helper. As a side effect, this saves 24 bytes per connection with NAT in the common case that a connection does not have a helper assigned. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:26:48 +02:00
Nathan Hintz	706f5151e3	netfilter: nf_defrag_ipv6.o included twice 'nf_defrag_ipv6' is built as a separate module; it shouldn't be included in the 'nf_conntrack_ipv6' module as well. Signed-off-by: Nathan Hintz <nlhintz@hotmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:13:41 +02:00
Phil Oester	affe759dba	netfilter: ip[6]t_REJECT: tcp-reset using wrong MAC source if bridged As reported by Casper Gripenberg, in a bridged setup, using ip[6]t_REJECT with the tcp-reset option sends out reset packets with the src MAC address of the local bridge interface, instead of the MAC address of the intended destination. This causes some routers/firewalls to drop the reset packet as it appears to be spoofed. Fix this by bypassing ip[6]_local_out and setting the MAC of the sender in the tcp reset packet. This closes netfilter bugzilla #531. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-08-28 00:13:12 +02:00
Andy Zhou	5828cd9a68	openvswitch: optimize flow compare and mask functions Make sure the sw_flow_key structure and valid mask boundaries are always machine word aligned. Optimize the flow compare and mask operations using machine word size operations. This patch improves throughput on average by 15% when CPU is the bottleneck of forwarding packets. This patch is inspired by ideas and code from a patch submitted by Peter Klausler titled "replace memcmp() with specialized comparator". However, The original patch only optimizes for architectures support unaligned machine word access. This patch optimizes for all architectures. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-27 13:13:09 -07:00
David S. Miller	decf4f3f2d	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== This is one more set of fixes intended for the 3.11 stream... For the mac80211 bits, Johannes says: "I have three more patches for the 3.11 stream: Felix's fix for the fairly visible brcmsmac crash, a fix from Simon for an IBSS join bug I found and a fix for a channel context bug in IBSS I'd introduced." Along with those... Sujith Manoharan makes a minor change to not use a PLL hang workaroun for AR9550. This one-liner fixes a couple of bugs reported in the Red Hat bugzilla. Helmut Schaa addresses an ath9k_htc bug that mangles frame headers during Tx. This fix is small, tested by the bug reported and isolated to ath9k_htc. Stanislaw Gruszka reverts a recent iwl4965 change that broke rfkill notification to user space. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-27 15:54:47 -04:00
Daniel Borkmann	b1dcdc68b1	net: tcp_probe: allow more advanced ingress filtering by mark Currently, the tcp_probe snooper can either filter packets by a given port (handed to the module via module parameter e.g. port=80) or lets all TCP traffic pass (port=0, default). When a port is specified, the port number is tested against the sk's source/destination port. Thus, if one of them matches, the information will be further processed for the log. As this is quite limited, allow for more advanced filtering possibilities which can facilitate debugging/analysis with the help of the tcp_probe snooper. Therefore, similarly as added to BPF machine in commit `7e75f93e` ("pkt_sched: ingress socket filter by mark"), add the possibility to use skb->mark as a filter. If the mark is not being used otherwise, this allows ingress filtering by flow (e.g. in order to track updates from only a single flow, or a subset of all flows for a given port) and other things such as dynamic logging and reconfiguration without removing/re-inserting the tcp_probe module, etc. Simple example: insmod net/ipv4/tcp_probe.ko fwmark=8888 full=1 ... iptables -A INPUT -i eth4 -t mangle -p tcp --dport 22 \ --sport 60952 -j MARK --set-mark 8888 [... sampling interval ...] iptables -D INPUT -i eth4 -t mangle -p tcp --dport 22 \ --sport 60952 -j MARK --set-mark 8888 The current option to filter by a given port is still being preserved. A similar approach could be done for the sctp_probe module as a follow-up. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-27 15:53:34 -04:00
Andy Lutomirski	d661684cf6	net: Check the correct namespace when spoofing pid over SCM_RIGHTS This is a security bug. The follow-up will fix nsproxy to discourage this type of issue from happening again. Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-27 13:52:52 -04:00
Andy Zhou	02237373b1	openvswitch: Rename key_len to key_end Key_end is a better name describing the ending boundary than key_len. Rename those variables to make it less confusing. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-26 14:03:14 -07:00
Joe Stringer	a175a72330	openvswitch: Add SCTP support This patch adds support for rewriting SCTP src,dst ports similar to the functionality already available for TCP/UDP. Rewriting SCTP ports is expensive due to double-recalculation of the SCTP checksums; this is performed to ensure that packets traversing OVS with invalid checksums will continue to the destination with any checksum corruption intact. Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-26 14:03:13 -07:00
David S. Miller	b05930f5d1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c include/linux/inetdevice.h The inetdevice.h conflict involves moving the IPV4_DEVCONF values into a UAPI header, overlapping additions of some new entries. The iwlwifi conflict is a context overlap. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-26 16:37:08 -04:00
Hannes Frederic Sowa	9c9c9ad5fa	ipv6: set skb->protocol on tcp, raw and ip6_append_data genereated skbs Currently we don't initialize skb->protocol when transmitting data via tcp, raw(with and without inclhdr) or udp+ufo or appending data directly to the socket transmit queue (via ip6_append_data). This needs to be done so that we can get the correct mtu in the xfrm layer. Setting of skb->protocol happens only in functions where we also have a transmitting socket and a new skb, so we don't overwrite old values. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-26 12:46:24 +02:00
Hannes Frederic Sowa	5a25cf1e31	xfrm: revert ipv4 mtu determination to dst_mtu In commit `0ea9d5e3e0` ("xfrm: introduce helper for safe determination of mtu") I switched the determination of ipv4 mtus from dst_mtu to ip_skb_dst_mtu. This was an error because in case of IP_PMTUDISC_PROBE we fall back to the interface mtu, which is never correct for ipv4 ipsec. This patch partly reverts `0ea9d5e3e0` ("xfrm: introduce helper for safe determination of mtu"). Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-08-26 12:40:53 +02:00
Johannes Berg	a986553877	mac80211: fix change_interface queue assignments Jouni reported that with mac80211_hwsim, multicast TX was causing crashes due to invalid vif->cab_queue assignment. It turns out that this is caused by change_interface() getting invoked and not having the vif->type/vif->p2p assigned correctly before calling the queue check (ieee80211_check_queues). Fix this by passing the 'external' interface type to the function and adjusting it accordingly. While at it, also fix the error path in change_interface, it wasn't correctly resetting to the external type but using the internal one instead. Fortunately this affects on hwsim because all other drivers set the vif->type/vif->p2p variables when changing iftype. This shouldn't be needed, but almost all implementations actually do it for their own internal handling. Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-26 09:52:58 +02:00
Dan Carpenter	b4de77ade3	ipip: potential race in ip_tunnel_init_net() Eric Dumazet says that my previous fix for an ERR_PTR dereference (`ea857f28ab` 'ipip: dereferencing an ERR_PTR in ip_tunnel_init_net()') could be racy and suggests the following fix instead. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-25 18:39:59 -04:00
Andy Zhou	03f0d916aa	openvswitch: Mega flow implementation Add wildcarded flow support in kernel datapath. Wildcarded flow can improve OVS flow set up performance by avoid sending matching new flows to the user space program. The exact performance boost will largely dependent on wildcarded flow hit rate. In case all new flows hits wildcard flows, the flow set up rate is within 5% of that of linux bridge module. Pravin has made significant contributions to this patch. Including API clean ups and bug fixes. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-23 16:43:07 -07:00
Cong Wang	3fa34de678	openvswitch: check CONFIG_OPENVSWITCH_GRE in makefile Cc: Jesse Gross <jesse@nicira.com> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-23 16:43:07 -07:00
Justin Pettit	2694838d60	openvswitch: Fix argument descriptions in vport.c. Signed-off-by: Justin Pettit <jpettit@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-23 16:38:00 -07:00
Jiri Pirko	2537b4dd0a	openvswitch:: link upper device for port devices Link upper device properly. That will make IFLA_MASTER filled up. Set the master to port 0 of the datapath under which the port belongs. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-23 16:38:00 -07:00
Pravin B Shelar	76a66c7e7f	openvswitch: Use non rcu hlist_del() flow table entry. Flow table destroy is done in rcu call-back context. Therefore there is no need to use rcu variant of hlist_del(). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-23 16:38:00 -07:00
Pravin B Shelar	59a35d60af	openvswitch: Use RCU lock for dp dump operation. RCUfy dp-dump operation which is already read-only. This makes all ovs dump operations lockless. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-23 16:37:59 -07:00
Pravin B Shelar	d57170b1b1	openvswitch: Use RCU lock for flow dump operation. Flow dump operation is read-only operation. There is no need to take ovs-lock. Following patch use rcu-lock for dumping flows. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-23 16:37:59 -07:00
John W. Linville	81ca2ff945	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2013-08-23 11:47:48 -04:00
Johannes Berg	d70b7616d9	mac80211: ignore (E)CSA in probe response frames Seth reports that some APs, notably the Netgear WNDAP360, send invalid ECSA IEs in probe response frames with the operating class and channel number both set to zero, even when no channel switch is being done. As a result, any scan while connected to such an AP results in the connection being dropped. Fix this by ignoring any channel switch announcment in probe response frames entirely, since we're connected to the AP we will be receiving a beacon (and maybe even an action frame) if a channel switch is done, which is sufficient. Cc: stable@vger.kernel.org # 3.10 Reported-by: Seth Forshee <seth.forshee@canonical.com> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-23 17:05:12 +02:00
Vladimir Kondratiev	19504cf5f3	cfg80211: add flags to cfg80211_rx_mgmt() Add flags intended to report various auxiliary information and introduce the NL80211_RXMGMT_FLAG_ANSWERED flag to report that the frame was already answered by the device. Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com> [REPLIED->ANSWERED, reword commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-23 16:06:03 +02:00
Bob Copeland	c4c205f3cd	mac80211: assign seqnums for group QoS frames According to 802.11-2012 9.3.2.10, paragraph 4, QoS data frames with a group address in the Address 1 field have sequence numbers allocated from the same counter as non-QoS data and management frames. Without this flag, some drivers may not assign sequence numbers, and in rare cases frames might get dropped. Set the control flag accordingly. Signed-off-by: Bob Copeland <bob@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-23 15:43:38 +02:00
Chun-Yeow Yeoh	a4ef66a915	mac80211: only respond to probe request with mesh ID Previously, the mesh STA responds to probe request from legacy STA but now it will only respond to legacy STA if the legacy STA does include the specific mesh ID or wildcard mesh ID in the probe request. The iw patch "iw: scan using meshid" can be used either by legacy STA or by mesh STA to do active scanning by inserting the mesh ID in the probe request frame. Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com> Acked-by: Thomas Pedersen <thomas@cozybit.com> Acked-by: Javier Cardona <javier@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-23 15:25:06 +02:00
Johannes Berg	1fb9026000	mac80211: move setting WIPHY_FLAG_SUPPORTS_SCHED_SCAN into drivers mac80211 currently sets WIPHY_FLAG_SUPPORTS_SCHED_SCAN based on whether the start_sched_scan operation is supported or not, but that will not be correct for all drivers, we're adding scheduled scan to the iwlmvm driver but it depends on firmware support. Therefore, move setting WIPHY_FLAG_SUPPORTS_SCHED_SCAN into the drivers so that they can control it regardless of implementing the operation. This currently only affects the TI drivers since they're the only ones implementing scheduled scan (in a mac80211 driver.) Acked-by: Luciano Coelho <luca@coelho.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-23 12:02:26 +02:00
Daniel Borkmann	05f147ef7c	net: sctp_probe: simplify code by using %pISc format specifier We can simply use the %pISc format specifier that was recently added and thus remove some code that distinguishes between IPv4 and IPv6. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 22:07:06 -07:00
Duan Jiong	c92a59eca8	ipv6: handle Redirect ICMP Message with no Redirected Header option rfc 4861 says the Redirected Header option is optional, so the kernel should not drop the Redirect Message that has no Redirected Header option. In this patch, the function ip6_redirect_no_header() is introduced to deal with that condition. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>	2013-08-22 20:08:21 -07:00
Daniel Borkmann	f925d0a62d	net: tcp_probe: add IPv6 support The tcp_probe currently only supports analysis of IPv4 connections. Therefore, it would be nice to have IPv6 supported as well. Since we have the recently added %pISpc specifier that is IPv4/IPv6 generic, build related sockaddress structures from the flow information and pass this to our format string. Tested with SSH and HTTP sessions on IPv4 and IPv6. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 16:19:50 -07:00
Daniel Borkmann	d8cdeda6dd	net: tcp_probe: kprobes: adapt jtcp_rcv_established signature This patches fixes a rather unproblematic function signature mismatch as the const specifier was missing for the th variable; and next to that it adds a build-time assertion so that future function signature mismatches for kprobes will not end badly, similarly as commit `22222997` ("net: sctp: add build check for sctp_sf_eat_sack_6_2/jsctp_sf_eat_sack") did it for SCTP. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 16:19:50 -07:00
Daniel Borkmann	b4c1c1d038	net: tcp_probe: also include rcv_wnd next to snd_wnd It is helpful to sometimes know the TCP window sizes of an established socket e.g. to confirm that window scaling is working or to tweak the window size to improve high-latency connections, etc etc. Currently the TCP snooper only exports the send window size, but not the receive window size. Therefore, also add the receive window size to the end of the output line. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 16:19:50 -07:00
David S. Miller	baf3b3f227	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Some constifications, from Mathias Krause. 2) Catch bugs if a hold timer is still active when xfrm_policy_destroy() is called, from Fan Du. 3) Remove a redundant address family checking, from Fan Du. 4) Make xfrm_state timer monotonic to be independent of system clock changes, from Fan Du. 5) Remove an outdated comment on returning -EREMOTE in the xfrm_lookup(), from Rami Rosen. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 16:04:41 -07:00
Yuchung Cheng	0f7cc9a3c2	tcp: increase throughput when reordering is high The stack currently detects reordering and avoid spurious retransmission very well. However the throughput is sub-optimal under high reordering because cwnd is increased only if the data is deliverd in order. I.e., FLAG_DATA_ACKED check in tcp_ack(). The more packet are reordered the worse the throughput is. Therefore when reordering is proven high, cwnd should advance whenever the data is delivered regardless of its ordering. If reordering is low, conservatively advance cwnd only on ordered deliveries in Open state, and retain cwnd in Disordered state (RFC5681). Using netperf on a qdisc setup of 20Mbps BW and random RTT from 45ms to 55ms (for reordering effect). This change increases TCP throughput by 20 - 25% to near bottleneck BW. A special case is the stretched ACK with new SACK and/or ECE mark. For example, a receiver may receive an out of order or ECN packet with unacked data buffered because of LRO or delayed ACK. The principle on such an ACK is to advance cwnd on the cummulative acked part first, then reduce cwnd in tcp_fastretrans_alert(). Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 14:39:46 -07:00
Johannes Berg	9d47b38056	Revert "genetlink: fix family dump race" This reverts commit `58ad436fcf`. It turns out that the change introduced a potential deadlock by causing a locking dependency with netlink's cb_mutex. I can't seem to find a way to resolve this without doing major changes to the locking, so revert this. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-22 13:24:02 -07:00
John W. Linville	69b307a48a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	2013-08-22 14:27:31 -04:00
John W. Linville	89b5f74a26	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2013-08-22 11:35:22 -04:00
Johannes Berg	e133fae263	mac80211: minstrel_ht: don't use control.flags in TX status path Sujith reports that my commit `af61a16518` ("mac80211: add control port protocol TX control flag") broke ath9k (aggregation). The reason is that I made minstrel_ht use the flag in the TX status path, where it can have been overwritten by the driver. Since we have no more space in info->flags, revert that part of the change for now, until we can reshuffle the flags or so. Reported-by: Sujith Manoharan <c_manoha@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-08-22 08:37:08 +02:00
Frédéric Dalleau	2dea632f9a	Bluetooth: Add SCO connection fallback When initiating a transparent eSCO connection, make use of T2 settings at first try. T2 is the recommended settings from HFP 1.6 WideBand Speech. Upon connection failure, try T1 settings. When CVSD is requested and eSCO is supported, try to establish eSCO connection using S3 settings. If it fails, fallback in sequence to S2, S1, D1, D0 settings. To know which setting should be used, conn->attempt is used. It indicates the currently ongoing SCO connection attempt and can be used as the index for the fallback settings table. These setting and the fallback order are described in Bluetooth HFP 1.6 specification p. 101. Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:13 +02:00
Frédéric Dalleau	1a4c958cf9	Bluetooth: Handle specific error for SCO connection fallback Synchronous Connection Complete event can return error "Connection Rejected due to Limited resources (0x10)". Handling this error is required for SCO connection fallback. This error happens when the server tried to accept the connection but failed to negotiate settings. This error code has been verified experimentally by sending a T2 request to a T1 only SCO listener. Client dump follows : < HCI Command (0x01\|0x0028) plen 17 [hci0] 3.696064 Handle: 12 Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Retransmission effort: Optimize for link quality (0x02) Packet type: 0x0380 > HCI Event (0x0f) plen 4 [hci0] 3.697034 Setup Synchronous Connection (0x01\|0x0028) ncmd 1 Status: Success (0x00) > HCI Event (0x2c) plen 17 [hci0] 3.736059 Status: Connection Rejected due to Limited Resources (0x0d) Handle: 0 Address: xx:xx:xx:xx:xx:AB (OUI 70-F3-95) Link type: eSCO (0x02) Transmission interval: 0x0c Retransmission window: 0x06 RX packet length: 60 TX packet length: 60 Air mode: Transparent (0x03) Server dump follows : > HCI Event (0x04) plen 10 [hci0] 4.741513 Address: xx:xx:xx:xx:xx:D9 (OUI 20-68-9D) Class: 0x620100 Major class: Computer (desktop, notebook, PDA, organizers) Minor class: Uncategorized, code for device not assigned Networking (LAN, Ad hoc) Audio (Speaker, Microphone, Headset) Telephony (Cordless telephony, Modem, Headset) Link type: eSCO (0x02) < HCI Command (0x01\|0x0029) plen 21 [hci0] 4.743269 Address: xx:xx:xx:xx:xx:D9 (OUI 20-68-9D) Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Retransmission effort: Optimize for link quality (0x02) Packet type: 0x03c1 > HCI Event (0x0f) plen 4 [hci0] 4.745517 Accept Synchronous Connection (0x01\|0x0029) ncmd 1 Status: Success (0x00) > HCI Event (0x2c) plen 17 [hci0] 4.749508 Status: Connection Rejected due to Limited Resources (0x0d) Handle: 0 Address: xx:xx:xx:xx:xx:D9 (OUI 20-68-9D) Link type: eSCO (0x02) Transmission interval: 0x0c Retransmission window: 0x06 RX packet length: 60 TX packet length: 60 Air mode: Transparent (0x03) Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:13 +02:00
Frédéric Dalleau	79dc0087c3	Bluetooth: Prevent transparent SCO on older devices Older Bluetooth devices may not support Setup Synchronous Connection or SCO transparent data. This is indicated by the corresponding LMP feature bits. It is not possible to know if the adapter support these features before setting BT_VOICE option since the socket is not bound to an adapter. An adapter can also be added after the socket is created. The socket can be bound to an address before adapter is plugged in. Thus, on a such adapters, if user request BT_VOICE_TRANSPARENT, outgoing connections fail on connect() and returns -EOPNOTSUPP. Incoming connections do not fail. However, they should only be allowed depending on what was specified in Write_Voice_Settings command. EOPNOTSUPP is choosen because connect() system call is failing after selecting route but before any connection attempt. Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:12 +02:00
Frédéric Dalleau	10c62ddc6f	Bluetooth: Parameters for outgoing SCO connections In order to establish a transparent SCO connection, the correct settings must be specified in the Setup Synchronous Connection request. For that, a setting field is added to ACL connection data to set up the desired parameters. The patch also removes usage of hdev->voice_setting in CVSD connection and makes use of T2 parameters for transparent data. Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:11 +02:00
Frédéric Dalleau	2f69a82acf	Bluetooth: Use voice setting in deferred SCO connection request When an incoming eSCO connection is requested, check the selected voice setting and reply appropriately. Voice setting should have been negotiated previously. For example, in case of HFP, the codec is negotiated using AT commands on the RFCOMM channel. This patch only changes replies for socket with deferred setup enabled. Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:11 +02:00
Frédéric Dalleau	ad10b1a487	Bluetooth: Add Bluetooth socket voice option This patch extends the current Bluetooth socket options with BT_VOICE. This is intended to choose voice data type at runtime. It only applies to SCO sockets. Incoming connections shall be setup during deferred setup. Outgoing connections shall be setup before connect(). The desired setting is stored in the SCO socket info. This patch declares needed members, modifies getsockopt() and setsockopt(). Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:09 +02:00
Frédéric Dalleau	33f2404823	Bluetooth: Remove unused mask parameter in sco_conn_defer_accept From Bluetooth Core v4.0 specification, 7.1.8 Accept Connection Request Command "When accepting synchronous connection request, the Role parameter is not used and will be ignored by the BR/EDR Controller." Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:09 +02:00
Frédéric Dalleau	e660ed6c70	Bluetooth: Use hci_connect_sco directly hci_connect is a super function for connecting hci protocols. But the voice_setting parameter (introduced in subsequent patches) is only needed by SCO and security requirements are not needed for SCO channels. Thus, it makes sense to have a separate function for SCO. Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:08 +02:00
Gianluca Anzolin	ffe6b68cc5	Bluetooth: Purge the dlc->tx_queue to avoid circular dependency In rfcomm_tty_cleanup we purge the dlc->tx_queue which may contain socket buffers referencing the tty_port and thus preventing the tty_port destruction. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:08 +02:00
Gianluca Anzolin	ece3150dea	Bluetooth: Fix the reference counting of tty_port The tty_port can be released in two cases: when we get a HUP in the functions rfcomm_tty_hangup() and rfcomm_dev_state_change(). Or when the user releases the device in rfcomm_release_dev(). In these cases we set the flag RFCOMM_TTY_RELEASED so that no other function can get a reference to the tty_port. The use of !test_and_set_bit(RFCOMM_TTY_RELEASED) ensures that the 'initial' tty_port reference is only dropped once. The rfcomm_dev_del function is removed becase it isn't used anymore. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:07 +02:00
Gianluca Anzolin	cad348a17e	Bluetooth: Implement .activate, .shutdown and .carrier_raised methods Implement .activate, .shutdown and .carrier_raised methods of tty_port to manage the dlc, moving the code from rfcomm_tty_install() and rfcomm_tty_cleanup() functions. At the same time the tty .open()/.close() and .hangup() methods are changed to use the tty_port helpers that properly call the aforementioned tty_port methods. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:07 +02:00
Gianluca Anzolin	54b926a143	Bluetooth: Move the tty initialization and cleanup out of open/close Move the tty_struct initialization from rfcomm_tty_open() to rfcomm_tty_install() and do the same for the cleanup moving the code from rfcomm_tty_close() to rfcomm_tty_cleanup(). Add also extra error handling in rfcomm_tty_install() because, unlike .open()/.close(), .cleanup() is not called if .install() fails. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:06 +02:00
Gianluca Anzolin	ebe937f74b	Bluetooth: Remove the device from the list in the destructor The current code removes the device from the device list in several places. Do it only in the destructor instead and in the error path of rfcomm_add_dev() if the device couldn't be initialized. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:06 +02:00
Gianluca Anzolin	396dc223dd	Bluetooth: Take proper tty_struct references In net/bluetooth/rfcomm/tty.c the struct tty_struct is used without taking references. This may lead to a use-after-free of the rfcomm tty. Fix this by taking references properly, using the tty_port_* helpers when possible. The raw assignments of dev->port.tty in rfcomm_tty_open/close are addressed in the later commit 'rfcomm: Implement .activate, .shutdown and .carrier_raised methods'. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:05 +02:00
Marcel Holtmann	c7882cbd11	Bluetooth: Set different event mask for LE-only controllers In case of a Low Energy only controller it makes no sense to configure the full BR/EDR event mask. It will just enable events that can not be send anyway and there is no guarantee that such a controller will accept this value. Use event mask 0x90 0xe8 0x04 0x02 0x00 0x80 0x00 0x20 for LE-only controllers which enables the following events: Disconnection Complete Encryption Change Read Remote Version Information Complete Command Complete Command Status Hardware Error Number of Completed Packets Data Buffer Overflow Encryption Key Refresh Complete LE Meta This is according to Core Specification, Part E, Section 3. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:05 +02:00
Johan Hedberg	9d225d2208	Bluetooth: Fix getting SCO socket options in deferred state When a socket is in deferred state there does actually exist an underlying connection even though the connection state is not yet BT_CONNECTED. In the deferred state it should therefore be allowed to get socket options that usually depend on a connection, such as SCO_OPTIONS and SCO_CONNINFO. This patch fixes the behavior of some user space code that behaves as follows without it: $ sudo tools/btiotest -i 00:1B:DC:xx:xx:xx -d -s accept=2 reject=-1 discon=-1 defer=1 sec=0 update_sec=0 prio=0 voice=0x0000 Listening for SCO connections bt_io_get(OPT_DEST): getsockopt(SCO_OPTIONS): Transport endpoint is not connected (107) Accepting connection Successfully connected to 60:D8:19:xx:xx:xx. handle=43, class=000000 The conditions that the patch updates the if-statements to is taken from similar code in l2cap_sock.c which correctly handles the deferred state. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-08-21 16:47:04 +02:00

... 3 4 5 6 7 ...

29598 Commits