2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-24 05:04:00 +08:00
Commit Graph

35471 Commits

Author SHA1 Message Date
Benjamin Herrenschmidt
24a72acac1 Linux 3.10
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJR0K2gAAoJEHm+PkMAQRiGWsEH+gMZSN1qRm34hZ82q1Tx7HvL
 Eb/Gsl3Qw/7G2TlTqgjBUs36IdqV9O2cui/aa3/TfXvdvrx+0GlhRkEwQPc+ygcO
 Mvoyoke4tT4+4jVFdCg1J8avREsa28/6oaHs0ZZxuVmJBBLTJH7aXaNsGn6eU1q9
 9+p798MQis6naIiPC63somlZcCIiBhsuWCPWpEfLMn8G1HWAFTM3xXIbNBqe/brS
 bmIOfhomlIZ5dcdaXGvjtP3+KJhkNDwhkPC4tVYu8JqqgSlrE+a+EGyEuuGqKk10
 U+swiqyuD31uBI9ga54u/2FzSqDiAu6YOcMXevjo/m3g9XLdYbYLvN+nvN8alCQ=
 =Ob6Z
 -----END PGP SIGNATURE-----

Merge tag 'v3.10' into next

Merge 3.10 in order to get some of the last minute powerpc
changes, resolve conflicts and add additional fixes on top
of them.
2013-07-01 17:57:25 +10:00
Linus Torvalds
98b6ed0f2b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Found via trinity:

    If you connect up an ipv6 socket to an ipv4 mapped address then an
    ipv6 one, sendmsg() can croak because ip6_sk_dst_check() assumes the
    route cached in the socket is an ipv6 one.  In this case there is an
    ipv4 route attached, so it gets stomped on.

    Reported by Dave Jones and Hannes Frederic Sowa, fixed by Eric
    Dumazet.

 2) AF_KEY notifications leak some kernel memory to userspace, fix from
    Mathias Krause.

 3) DLCI calls __dev_get_by_name() without proper locking, and dlci_del
    doesn't validate that the device being deleted is actually a DLCI
    one.  Fixes from Li Zefan.

 4) Length check on bluetooth l2cap information responses is wrong, each
    response type has a different lenth, so we should make sure it's in
    a given range rather than enforce one single valid length.  From
    Jaganath Kanakkassery.

 5) Receive FIFO overflow is really easy to trigger in stress scenerios
    in the sh_eth driver, but the event isn't being handled properly at
    all.  Specifically, the mask of error interrupts doesn't include the
    event so we never clear it, resulting in the driver becomming wedged
    processing an interrupt that never gets cleared.

    Fix from Sergei Shtylyov.

 6) qlcnic sleeps while holding a spinlock, use mdelay() instead of
    msleep().  From Shahed Shaikh.

 7) Missing curly braces causes SIP netfilter NAT module to always drop
    packets.  Fix from Balazs Peter Odor.

 8) ipt_ULOG in netfilter passes the wrong value to timer setup, causing
    the timer to dereference crap when it fires.  Fix from Gao Feng.

 9) Missing RCU protection around txq->axq_acq traversal in
    ath_txq_schedule().  Fix from Felix Fietkau.

10) Idle state transition test in ath9k_htc_config() is reversed, fix
    from Sujith Manoharan.

11) IPV6 forwarding handles unicast Router Alert packets incorrectly.
    It tests the wrong option state.  Previously opt->ra being non-zero
    indicated a router alert marking in the SKB, but now it's indicated
    by a bit in opt->flags.  Fix from YOSHIFUJI Hideaki.

12) SKB leak in GRE tunnel GSO handling, from Eric Dumazet.

13) get_user_pages_fast() error handling in TUN and MACVTAP use the same
    local variable for the base index and the loop iterator for page
    traversal, oops! Fix from Michael S Tsirkin.

14) ipv6_get_lladdr() can fail, and we must therefore check it's return
    value in inet6_set_iftoken().  For from Hannes Frederic Sowa.

15) If you change an interface name and meanwhile can sneak in something
    that looks up the name (like SO_BINDTODEVICE or SIOCGIFNAME) we can
    deadlock with CONFIG_PREEMPT=n.  Fix this by providing a helper
    function that properly uses raw_seqcount_begin().  From Nicolas
    Schichan.

16) Chain noise calibration test is inverted in iwlwifi, fix from
    Nikolay Martynov.

17) Properly set TX iwlwifi descriptor flags for back requests.  Fix
    from Emmanuel Grumbach.

18) We can't assume skb_transport_header() is set in xt_TCPOPTSTRAP
    module, fix from Pablo Neira Ayuso.

19) Some crummy APs don't provide the proper High Throughput info in
    association response frames.  Add a workaround by assume we'll use
    whatever is in the beacon/probe.  Fix from Johannes Berg.

20) mac80211 call to rate_idx_match_mask() swaps two arguments (mask and
    channel width).  Fix from Simon Wunderlich.

21) xt_TCPMSS (like xt_TCPOPTSTRAP) must not try to handle fragmented
    frames.  Fix from Phil Oester.

22) Fix rate control regression causing iwlwifi/iwlegacy chips to use
    1Mbit/s on pre-11n networks.  From Moshe Benji and Stanslaw Gruszka.

23) Disable brcmsmac power-save functions, they cause regressions.  From
    Arend van Spriel.

24) Enforce a sane minimum MTU in l2cap_build_cmd() otherwise we can
    easily crash.  Fix from Anderson Lizardo.

25) If a learning packet arrives during vxlan_stop() we crash, easily
    fixed by checking netif_running().  From Stephen Hemminger.

26) Static vxlan FDB entries should not be migrated, also from Stephen.

27) skb_clone() failures not handled in vxlan_xmit(), oops.  Also from
    Stephen.

28) Add minimal driver for AR816x/AR817x ethernet chips, from Johannes
    Berg.

29) Fix regression in userspace VLAN acceleration control, added by the
    802.1ad support changes.  Fix from Fernando Luis Vazquez Cao.

30) Interval selection for MLD queries in the bridging code was
    reversed.  Fix from Linus Lüssing.

31) ipv6's ndisc_send_redirect() erroneously writes to the packet we
    received not the packet we are building to send out.  Fix from
    Matthias Schiffer.

32) Don't free netdev before unregistering it, in usb_8dev can driver.
    From Marc Kleine-Budde.

33) Fix nl80211 attribute buffer races, from Johannes Berg.

34) Although netlink_diag.h is under uapi/ it isn't present in Kbuild.
    From Stephen Hemminger.

35) Wrong address and family passed to MD5 key lookups in TCP, from
    Aydin Arik.

36) phy_type attribute created by SFC driver should not be writable.
    From Ben Hutchings.

37) Receive/Transmit queue allocations in pxa168_eth and mv643xx_eth
    should use kzalloc().  Otherwise if setup fails half-way, we'll
    dereference garbage when trying to teardown the rings.  From Lubomir
    Rintel.

38) Fix double-allocation of dst (resulting in unfreeable net device) in
    ipv6's init_loopback().  From Gao Feng.

39) Fix fragmentation handling SKB leak in netfilter conntrack, we were
    freeing the wrong skb pointer.  From Phil Oester.

40) Don't report "-1" (SPEED_UNKNOWN) in bond_miimon_commit(), from
    Nikolay Aleksandrov.

41) davinci_cpdma doesn't check for DMA mapping errors, letting the
    device scribble to random addresses.  From Sebastian Siewior.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
  dlci: validate the net device in dlci_del()
  dlci: acquire rtnl_lock before calling __dev_get_by_name()
  af_key: fix info leaks in notify messages
  ipv6: ip6_sk_dst_check() must not assume ipv6 dst
  net: fix kernel deadlock with interface rename and netdev name retrieval.
  net/tg3: Avoid delay during MMIO access
  ipv6: check return value of ipv6_get_lladdr
  macvtap: fix recovery from gup errors
  tun: fix recovery from gup errors
  gre: fix a possible skb leak
  ipv6: Process unicast packet with Router Alert by checking flag in skb.
  ath9k_htc: Handle IDLE state transition properly
  ath9k: fix an RCU issue in calling ieee80211_get_tx_rates
  netfilter: ipt_ULOG: fix incorrect setting of ulog timer
  netfilter: ctnetlink: send event when conntrack label was modified
  netfilter: nf_nat_sip: fix mangling
  qlcnic: Do not sleep while holding spinlock
  drivers: net: cpsw: fix compilation error with cpsw driver
  tcp: doc : fix the syncookies default value
  sh_eth: fix misreporting of transmit abort
  ...
2013-06-26 19:24:37 -10:00
Nicolas Schichan
5dbe7c178d net: fix kernel deadlock with interface rename and netdev name retrieval.
When the kernel (compiled with CONFIG_PREEMPT=n) is performing the
rename of a network interface, it can end up waiting for a workqueue
to complete. If userland is able to invoke a SIOCGIFNAME ioctl or a
SO_BINDTODEVICE getsockopt in between, the kernel will deadlock due to
the fact that read_secklock_begin() will spin forever waiting for the
writer process (the one doing the interface rename) to update the
devnet_rename_seq sequence.

This patch fixes the problem by adding a helper (netdev_get_name())
and using it in the code handling the SIOCGIFNAME ioctl and
SO_BINDTODEVICE setsockopt.

The netdev_get_name() helper uses raw_seqcount_begin() to avoid
spinning forever, waiting for devnet_rename_seq->sequence to become
even. cond_resched() is used in the contended case, before retrying
the access to give the writer process a chance to finish.

The use of raw_seqcount_begin() will incur some unneeded work in the
reader process in the contended case, but this is better than
deadlocking the system.

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-26 13:42:54 -07:00
Kirill A. Shutemov
3766a1abc5 mm/thp: define HPAGE_PMD_* constants as BUILD_BUG() if !THP
Currently, HPAGE_PMD_* constans rely on PMD_SHIFT regardless of
CONFIG_TRANSPARENT_HUGEPAGE.  PMD_SHIFT is not defined everywhere (e.g.
arm nommu case).

It means we can't use anything like this in generic code:

        if (PageTransHuge(page))
                zero_huge_user(page, 0, HPAGE_PMD_SIZE);
        else
                clear_highpage(page);

For !THP case, PageTransHuge() is 0 and compiler can eliminate
zero_huge_user() call.  But it still need to be valid C expression, means
HPAGE_PMD_SIZE has to expand to something compiler can understand.

Previously, HPAGE_PMD_* were defined to BUILD_BUG() for !THP. Let's come
back to it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-26 09:11:05 +10:00
Eric Dumazet
bd8a7036c0 gre: fix a possible skb leak
commit 68c3316311 ("v4 GRE: Add TCP segmentation offload for GRE")
added a possible skb leak, because it frees only the head of segment
list, in case a skb_linearize() call fails.

This patch adds a kfree_skb_list() helper to fix the bug.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25 16:07:44 -07:00
Linus Torvalds
b8ff768b5a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Several fixes for bugs caught while looking through f_pos (ab)users"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  aout32 coredump compat fix
  splice: don't pass the address of ->f_pos to methods
  mconsole: we'd better initialize pos before passing it to vfs_read()...
2013-06-22 08:42:20 -10:00
Linus Torvalds
a3d5c3460a Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Two smaller fixes - plus a context tracking tracing fix that is a bit
  bigger"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing/context-tracking: Add preempt_schedule_context() for tracing
  sched: Fix clear NOHZ_BALANCE_KICK
  sched/x86: Construct all sibling maps if smt
2013-06-20 08:18:35 -10:00
Linus Torvalds
86c76676cf Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Four fixes.  The mmap ones are unfortunately larger than desired -
  fuzzing uncovered bugs that needed perf context life time management
  changes to fix properly"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
  perf: Fix mmap() accounting hole
  perf: Fix perf mmap bugs
  kprobes: Fix to free gone and unused optprobes
2013-06-20 08:17:36 -10:00
Linus Torvalds
4db88eb4c3 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 - Fix inconstinant clock usage in virtual time accounting
 - Fix a build error in KVM caused by the NOHZ work
 - Remove a pointless timekeeping duty assignment which breaks NOHZ
 - Use a proper notifier return value to avoid random behaviour

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick: Remove useless timekeeping duty attribution to broadcast source
  nohz: Fix notifier return val that enforce timekeeping
  kvm: Move guest entry/exit APIs to context_tracking
  vtime: Use consistent clocks among nohz accounting
2013-06-20 08:15:13 -10:00
Al Viro
7995bd2871 splice: don't pass the address of ->f_pos to methods
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-20 19:02:45 +04:00
Aruna Balakrishnaiah
a5e4797b0f powerpc/pseries: Read common partition via pstore
This patch exploits pstore subsystem to read details of common partition
in NVRAM to a separate file in /dev/pstore. For instance, common partition
details will be stored in a file named [common-nvram-6].

Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:02 +10:00
Aruna Balakrishnaiah
f33f748c96 powerpc/pseries: Read of-config partition via pstore
This patch set exploits the pstore subsystem to read details of
of-config partition in NVRAM to a separate file in /dev/pstore.
For instance, of-config partition details will be stored in a
file named [of-nvram-5].

Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:59 +10:00
Aruna Balakrishnaiah
69020eea97 powerpc/pseries: Read rtas partition via pstore
This patch set exploits the pstore subsystem to read details of rtas partition
in NVRAM to a separate file in /dev/pstore. For instance, rtas details will be
stored in a file named [rtas-nvram-4].

Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:52 +10:00
Aneesh Kumar K.V
fde52796d4 mm/THP: don't use HPAGE_SHIFT in transparent hugepage code
For architectures like powerpc that support multiple explicit hugepage
sizes, HPAGE_SHIFT indicate the default explicit hugepage shift.  For THP
to work the hugepage size should be same as PMD_SIZE.  So use PMD_SHIFT
directly.  So move the define outside CONFIG_TRANSPARENT_HUGEPAGE #ifdef
because we want to use these defines in generic code with if
(pmd_trans_huge()) conditional.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:08 +10:00
Olaf Hering
a7bf58040f net: vlan: fix comment for vlan_ethhdr->h_vlan_proto
After addition of 8021AD h_vlan_proto can be either ETH_P_8021Q or
ETH_P_8021AD.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 23:16:50 -07:00
Steven Rostedt
29bb9e5a75 tracing/context-tracking: Add preempt_schedule_context() for tracing
Dave Jones hit the following bug report:

 ===============================
 [ INFO: suspicious RCU usage. ]
 3.10.0-rc2+ #1 Not tainted
 -------------------------------
 include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
 other info that might help us debug this:
 RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
 RCU used illegally from extended quiescent state!
 2 locks held by cc1/63645:
  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0

 CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
 Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
  0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
  ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
  0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
 Call Trace:
  [<ffffffff816ae383>] dump_stack+0x19/0x1b
  [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
  [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
  [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
  [<ffffffff8108dffc>] update_curr+0xec/0x240
  [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
  [<ffffffff816b3a71>] __schedule+0x161/0x9b0
  [<ffffffff816b4721>] preempt_schedule+0x51/0x80
  [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
  [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
  [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
  [<ffffffff816be280>] ftrace_call+0x5/0x2f
  [<ffffffff816b681d>] ? retint_careful+0xb/0x2e
  [<ffffffff816b4805>] ? schedule_user+0x5/0x70
  [<ffffffff816b4805>] ? schedule_user+0x5/0x70
  [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
 ------------[ cut here ]------------

What happened was that the function tracer traced the schedule_user() code
that tells RCU that the system is coming back from userspace, and to
add the CPU back to the RCU monitoring.

Because the function tracer does a preempt_disable/enable_notrace() calls
the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set,
then preempt_schedule() is called. But this is called before the user_exit()
function can inform the kernel that the CPU is no longer in user mode and
needs to be accounted for by RCU.

The fix is to create a new preempt_schedule_context() that checks if
the kernel is still in user mode and if so to switch it to kernel mode
before calling schedule. It also switches back to user mode coming back
from schedule in need be.

The only user of this currently is the preempt_enable_notrace(), which is
only used by the tracing subsystem.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1369423420.6828.226.camel@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:55:10 +02:00
David Daney
f21afc25f9 smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu().
Thanks to commit f91eb62f71 ("init: scream bloody murder if interrupts
are enabled too early"), "bloody murder" is now being screamed.

With a MIPS OCTEON config, we use on_each_cpu() in our
irq_chip.irq_bus_sync_unlock() function.  This gets called in early as a
result of the time_init() call.  Because the !SMP version of
on_each_cpu() unconditionally enables irqs, we get:

    WARNING: at init/main.c:560 start_kernel+0x250/0x410()
    Interrupts were enabled early
    CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc5-Cavium-Octeon+ #801
    Call Trace:
      show_stack+0x68/0x80
      warn_slowpath_common+0x78/0xb0
      warn_slowpath_fmt+0x38/0x48
      start_kernel+0x250/0x410

Suggested fix: Do what we already do in the SMP version of
on_each_cpu(), and use local_irq_save/local_irq_restore.  Because we
need a flags variable, make it a static inline to avoid name space
issues.

[ Change from v1: Convert on_each_cpu to a static inline function, add
  #include <linux/irqflags.h> to avoid build breakage on some files.

  on_each_cpu_mask() and on_each_cpu_cond() suffer the same problem as
  on_each_cpu(), but they are not causing !SMP bugs for me, so I will
  defer changing them to a less urgent patch. ]

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-14 19:24:42 -10:00
Linus Torvalds
cb7e9704d5 Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fixes from Paul McKenney:
 "I must confess that this past merge window was not RCU's best showing.
  This series contains three more fixes for RCU regressions:

   1.   A fix to __DECLARE_TRACE_RCU() that causes it to act as an
        interrupt from idle rather than as a task switch from idle.
        This change is needed due to the recent use of _rcuidle()
        tracepoints that can be invoked from interrupt handlers as well
        as from idle.  Without this fix, invoking _rcuidle() tracepoints
        from interrupt handlers results in splats and (more seriously)
        confusion on RCU's part as to whether a given CPU is idle or not.
        This confusion can in turn result in too-short grace periods and
        therefore random memory corruption.

   2.   A fix to a subtle deadlock that could result due to RCU doing
        a wakeup while holding one of its rcu_node structure's locks.
        Although the probability of occurrence is low, it really
        does happen.  The fix, courtesy of Steven Rostedt, uses
        irq_work_queue() to avoid the deadlock.

   3.   A fix to a silent deadlock (invisible to lockdep) due to the
        interaction of timeouts posted by RCU debug code enabled by
        CONFIG_PROVE_RCU_DELAY=y, grace-period initialization, and CPU
        hotplug operations.  This will not occur in production kernels,
        but really does occur in randconfig testing.  Diagnosis courtesy
        of Steven Rostedt"

* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration
  rcu: Don't call wakeup() with rcu_node structure ->lock held
  trace: Allow idle-safe tracepoints to be called from irq
2013-06-13 12:36:42 -07:00
Linus Torvalds
26e04462c8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking update from David Miller:

 1) Fix dump iterator in nfnl_acct_dump() and ctnl_timeout_dump() to
    dump all objects properly, from Pablo Neira Ayuso.

 2) xt_TCPMSS must use the default MSS of 536 when no MSS TCP option is
    present.  Fix from Phil Oester.

 3) qdisc_get_rtab() looks for an existing matching rate table and uses
    that instead of creating a new one.  However, it's key matching is
    incomplete, it fails to check to make sure the ->data[] array is
    identical too.  Fix from Eric Dumazet.

 4) ip_vs_dest_entry isn't fully initialized before copying back to
    userspace, fix from Dan Carpenter.

 5) Fix ubuf reference counting regression in vhost_net, from Jason
    Wang.

 6) When sock_diag dumps a socket filter back to userspace, we have to
    translate it out of the kernel's internal representation first.
    From Nicolas Dichtel.

 7) davinci_mdio holds a spinlock while calling pm_runtime, which
    sleeps.  Fix from Sebastian Siewior.

 8) Timeout check in sh_eth_check_reset is off by one, from Sergei
    Shtylyov.

 9) If sctp socket init fails, we can NULL deref during cleanup.  Fix
    from Daniel Borkmann.

10) netlink_mmap() does not propagate errors properly, from Patrick
    McHardy.

11) Disable powersave and use minstrel by default in ath9k.  From Sujith
    Manoharan.

12) Fix a regression in that SOCK_ZEROCOPY is not set on tuntap sockets
    which prevents vhost from being able to use zerocopy.  From Jason
    Wang.

13) Fix race between port lookup and TX path in team driver, from Jiri
    Pirko.

14) Missing length checks in bluetooth L2CAP packet parsing, from Johan
    Hedberg.

15) rtlwifi fails to connect to networking using any encryption method
    other than WPA2.  Fix from Larry Finger.

16) Fix iwlegacy build due to incorrect CONFIG_* ifdeffing for power
    management stuff.  From Yijing Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
  b43: stop format string leaking into error msgs
  ath9k: Use minstrel rate control by default
  Revert "ath9k_hw: Update rx gain initval to improve rx sensitivity"
  ath9k: Disable PowerSave by default
  net: wireless: iwlegacy: fix build error for il_pm_ops
  rtlwifi: Fix a false leak indication for PCI devices
  wl12xx/wl18xx: scan all 5ghz channels
  wl12xx: increase minimum singlerole firmware version required
  wl12xx: fix minimum required firmware version for wl127x multirole
  rtlwifi: rtl8192cu: Fix problem in connecting to WEP or WPA(1) networks
  mwifiex: debugfs: Fix out of bounds array access
  Bluetooth: Fix mgmt handling of power on failures
  Bluetooth: Fix missing length checks for L2CAP signalling PDUs
  Bluetooth: btmrvl: support Marvell Bluetooth device SD8897
  Bluetooth: Fix checks for LE support on LE-only controllers
  team: fix checks in team_get_first_port_txable_rcu()
  team: move add to port list before port enablement
  team: check return value of team_get_port_by_index_rcu() for NULL
  tuntap: set SOCK_ZEROCOPY flag during open
  netlink: fix error propagation in netlink_mmap()
  ...
2013-06-12 17:18:29 -07:00
Linus Torvalds
b2cc9c19e4 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "Outside of bcache (which really isn't super big), these are all
  few-liners.  There are a few important fixes in here:

   - Fix blk pm sleeping when holding the queue lock

   - A small collection of bcache fixes that have been done and tested
     since bcache was included in this merge window.

   - A fix for a raid5 regression introduced with the bio changes.

   - Two important fixes for mtip32xx, fixing an oops and potential data
     corruption (or hang) due to wrong bio iteration on stacked devices."

* 'for-linus' of git://git.kernel.dk/linux-block:
  scatterlist: sg_set_buf() argument must be in linear mapping
  raid5: Initialize bi_vcnt
  pktcdvd: silence static checker warning
  block: remove refs to XD disks from documentation
  blkpm: avoid sleep when holding queue lock
  mtip32xx: Correctly handle bio->bi_idx != 0 conditions
  mtip32xx: Fix NULL pointer dereference during module unload
  bcache: Fix error handling in init code
  bcache: clarify free/available/unused space
  bcache: drop "select CLOSURES"
  bcache: Fix incompatible pointer type warning
2013-06-12 16:42:39 -07:00
Alex Shi
c2853c8df5 include/linux/math64.h: add div64_ul()
There is div64_long() to handle the s64/long division, but no mocro do
u64/ul division.  It is necessary in some scenarios, so add this
function.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alex Shi <alex.shi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:47 -07:00
Naoya Horiguchi
30dad30922 mm: migration: add migrate_entry_wait_huge()
When we have a page fault for the address which is backed by a hugepage
under migration, the kernel can't wait correctly and do busy looping on
hugepage fault until the migration finishes.  As a result, users who try
to kick hugepage migration (via soft offlining, for example) occasionally
experience long delay or soft lockup.

This is because pte_offset_map_lock() can't get a correct migration entry
or a correct page table lock for hugepage.  This patch introduces
migration_entry_wait_huge() to solve this.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>	[2.6.35+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:46 -07:00
Kees Cook
637241a900 kmsg: honor dmesg_restrict sysctl on /dev/kmsg
The dmesg_restrict sysctl currently covers the syslog method for access
dmesg, however /dev/kmsg isn't covered by the same protections.  Most
people haven't noticed because util-linux dmesg(1) defaults to using the
syslog method for access in older versions.  With util-linux dmesg(1)
defaults to reading directly from /dev/kmsg.

To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:

 - /proc/kmsg allows:
  - open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
    single-reader interface (SYSLOG_ACTION_READ).
  - everything, after an open.

 - syslog syscall allows:
  - anything, if CAP_SYSLOG.
  - SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
    dmesg_restrict==0.
  - nothing else (EPERM).

The use-cases were:
 - dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
 - sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
   destructive SYSLOG_ACTION_READs.

AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.

Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.

To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.

 - /dev/kmsg allows:
  - open if CAP_SYSLOG or dmesg_restrict==0
  - reading/polling, after open

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192

[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:44 -07:00
Srivatsa S. Bhat
16e53dbf10 CPU hotplug: provide a generic helper to disable/enable CPU hotplug
There are instances in the kernel where we would like to disable CPU
hotplug (from sysfs) during some important operation.  Today the freezer
code depends on this and the code to do it was kinda tailor-made for
that.

Restructure the code and make it generic enough to be useful for other
usecases too.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:44 -07:00
Jiri Pirko
b79462a8b9 team: fix checks in team_get_first_port_txable_rcu()
should be checked if "cur" is txable, not "port".

Introduced by commit 6e88e1357c "team: use function team_port_txable()
for determing enabled and up port"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-12 00:56:27 -07:00
Nicolas Dichtel
ed13998c31 sock_diag: fix filter code sent to userspace
Filters need to be translated to real BPF code for userland, like SO_GETFILTER.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 22:23:32 -07:00
Paul E. McKenney
d62840995a trace: Allow idle-safe tracepoints to be called from irq
__DECLARE_TRACE_RCU() currently creates an _rcuidle() tracepoint which
may safely be invoked from what RCU considers to be an idle CPU.
However, these _rcuidle() tracepoints may -not- be invoked from the
handler of an irq taken from idle, because rcu_idle_enter() zeroes
RCU's nesting-level counter, so that the rcu_irq_exit() returning to
idle will trigger a WARN_ON_ONCE().

This commit therefore substitutes rcu_irq_enter() for rcu_idle_exit()
and rcu_irq_exit() for rcu_idle_enter() in order to make the _rcuidle()
tracepoints usable from irq handlers as well as from process context.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-10 13:37:10 -07:00
Linus Torvalds
14d0ee0517 This contains 4 fixes.
The first two fix the case where full RCU debugging is enabled, enabling
 function tracing causes a live lock of the system. This is due to the added
 debug checks in rcu_dereference_raw() that is used by the function tracer.
 These checks are also traced by the function tracer as well as cause enough
 overhead to the function tracer to slow down the system enough that
 the time to finish an interrupt can take longer than when the next
 interrupt is triggered, causing a live lock from the timer interrupt.
 
 Talking this over with Paul McKenney, we came up with a fix that adds
 a new rcu_dereference_raw_notrace() that does not perform these added checks,
 and let the function tracer use that.
 
 The third commit fixes a failed compile when branch tracing is enabled,
 due to the conversion of the trace_test_buffer() selftest that the
 branch trace wasn't converted for.
 
 The forth patch fixes a bug caught by the RCU lockdep code where a
 rcu_read_lock() is performed when rcu is disabled (either going to
 or from idle, or user space). This happened on the irqsoff tracer
 as it calls task_uid(). The fix here was to use current_uid() when
 possible that doesn't use rcu locking. Which luckily, is always used
 when irqsoff calls this code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJRsQZhAAoJEOdOSU1xswtMquIH/0zyrqrLTnkc5MsNnnJ8kH5R
 z1cULts4FqBTUNZ1hdb3BTOu4zywjREIkWfM9qqpBmq9Mq6PBxX7gxWTqYvD4jiX
 EatiiCKa7Fyddx4iHJNfvtWgKVYt9WKSNeloRugS9h7NxIZ1wpz21DUpENFQzW2f
 jWRnq/AKXFmZ0vn1953mPePtRsg61RYpb7DCkTE1gtUnvL43wMd/Mo6p6BLMEG26
 1dDK6EWO/uewl8A4oP5JZYP+AP5Ckd4x1PuQK682AtQw+8S6etaGfeJr0WZmKQoD
 0aDZ/NXXSNKChlUFGJusBNJCWryONToa+sdiKuk1h/lW/k9Mail/FChiHBzMiwk=
 =uvlD
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.10-rc3-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "This contains 4 fixes.

  The first two fix the case where full RCU debugging is enabled,
  enabling function tracing causes a live lock of the system.  This is
  due to the added debug checks in rcu_dereference_raw() that is used by
  the function tracer.  These checks are also traced by the function
  tracer as well as cause enough overhead to the function tracer to slow
  down the system enough that the time to finish an interrupt can take
  longer than when the next interrupt is triggered, causing a live lock
  from the timer interrupt.

  Talking this over with Paul McKenney, we came up with a fix that adds
  a new rcu_dereference_raw_notrace() that does not perform these added
  checks, and let the function tracer use that.

  The third commit fixes a failed compile when branch tracing is
  enabled, due to the conversion of the trace_test_buffer() selftest
  that the branch trace wasn't converted for.

  The forth patch fixes a bug caught by the RCU lockdep code where a
  rcu_read_lock() is performed when rcu is disabled (either going to or
  from idle, or user space).  This happened on the irqsoff tracer as it
  calls task_uid().  The fix here was to use current_uid() when possible
  that doesn't use rcu locking.  Which luckily, is always used when
  irqsoff calls this code."

* tag 'trace-fixes-v3.10-rc3-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Use current_uid() for critical time tracing
  tracing: Fix bad parameter passed in branch selftest
  ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends
  rcu: Add _notrace variation of rcu_dereference_raw() and hlist_for_each_entry_rcu()
2013-06-07 18:46:51 -07:00
Andy Lutomirski
a7526eb5d0 net: Unbreak compat_sys_{send,recv}msg
I broke them in this commit:

    commit 1be374a051
    Author: Andy Lutomirski <luto@amacapital.net>
    Date:   Wed May 22 14:07:44 2013 -0700

        net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg

This patch adds __sys_sendmsg and __sys_sendmsg as common helpers that accept
MSG_CMSG_COMPAT and blocks MSG_CMSG_COMPAT at the syscall entrypoints.  It
also reverts some unnecessary checks in sys_socketcall.

Apparently I was suffering from underscore blindness the first time around.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-06 11:52:14 -07:00
Linus Torvalds
4d3797d7e1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix timeouts with direct mode authentication in mac80211, from
    Stanislaw Gruszka.

 2) Aggregation sessions can deadlock in ath9k, from Felix Fietkau.

 3) Netfilter's xt_addrtype doesn't work with ipv6 due to route lookups
    creating undesirable cache entries, from Florian Westphal.

 4) Fix netfilter's ipt_ULOG from generating non-NULL terminated
    strings.

 5) Fix netdev transmit queue crashes in mac80211, from Johannes Berg.

 6) Fix copy and paste error in 802.11 stack that broke reporting of
    64-bit station tx statistics, from Felix Fietkau.

 7) When qlge_probe fails, it leaks the netdev.  Fix from Wei Yongjun.

 8) SKB control block (where we store the IP options information,
    amongst other things) must be cleared properly otherwise ICMP
    sending can crash for IP tunnels.  Fix from Eric Dumazet.

 9) Verification of Energy Efficient Ether support was coded wrongly,
    the test was inversed.  Fix from Giuseppe CAVALLARO.

10) TCP handles redirects improperly because the wrong flow key is used
    for the route lookup.  From Michal Kubecek.

11) Don't interpret MSG_CMSG_COMPAT from userspace, fix from Andy
    Lutomirski.

12) The new AF_VSOCK was missing from the lockdep string table, fix from
    Federico Vaga.

13) be2net doesn't handle checksumming of IP fragments properly, from
    Somnath Kotur.

14) Fix several bugs in the device address list code that lead to
    crashes and other misbehaviors.  From Jay Vosburgh.

15) Fix ipv6 segmentation handling of fragmented GRE tunnel traffic,
    from Pravin B Shalr.

16) Fix usage of stale policies in IPSEC layer, from Paul Moore.

17) Fix team driver dump of ports when there are a large number of them,
    from Jiri Pirko.

18) Fix softlockups in UDP ipv4 socket lookup causes by and error in the
    hlist_nulls_for_each_entry_rcu() macro.  From Eric Dumazet.

19) Fix several regressions added by the high rate accuracy changes to
    the htb packet scheduler.  From Eric Dumazet.

20) Fix DMA'ing onto the stack in esd_usb2 and peak_usb CAN drivers,
    from Olivier Sobrie and Marc Kleine-Budde.

21) Fix unremovable network devices due to missing route pointer
    installation in the per-device ipv6 address list entries.  From Gao
    feng.

22) Apply the tg3 5719 DMA workaround on 5720 chips as well, otherwise
    we get stalls.  From Nithin Sujir.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (68 commits)
  net_sched: htb: do not mix 1ns and 64ns time units
  net: fix sk_buff head without data area
  tg3: Add read dma workaround for 5720
  net: ethernet: xilinx_emaclite: set protocol selector bits when writing ANAR
  bnx2x: Fix bridged GSO for 57710/57711 chips
  net: fec: add fallback to random MAC address
  bnx2x: fix TCP offload for tunneling ipv4 over ipv6
  ipv6: assign rt6_info to inet6_ifaddr in init_loopback
  net/mlx4_core: Keep VF assigned MAC in the PF admin table
  net/mlx4_en: Handle unassigned VF MAC address correctly
  net/mlx4_core: Return -EPROBE_DEFER when a VF is probed before PF is sufficiently initialized
  net/mlx4_en: Fix adaptive moderation cq update
  net: can: peak_usb: Do not do dma on the stack
  net: can: esd_usb2: Do not do dma on the stack
  net: can: kvaser_usb: fix reception on "USBcan Pro" and "USBcan R" type hardware.
  net_sched: restore "overhead xxx" handling
  net: force a reload of first item in hlist_nulls_for_each_entry_rcu
  hyperv: Fix vlan_proto setting in netvsc_recv_callback()
  team: fix port list dump for big number of ports
  list: introduce list_first_entry_or_null
  ...
2013-06-05 19:19:04 +09:00
Linus Torvalds
7d80fea426 Merge branch 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:

 - Fix for yet another xattr bug which may lead to NULL deref.

 - A subtle bug in for_each_descendant_pre().  This bug requires quite
   specific conditions to trigger and isn't too likely to actually
   happen in the wild, but maybe that just makes it that much more
   nastier.

 - A warning message added for silly cgroup re-mount (not -o remount,
   but unmount followed by mount) behavior.

* 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: warn about mismatching options of a new mount of an existing hierarchy
  cgroup: fix a subtle bug in descendant pre-order walk
  cgroup: initialize xattr before calling d_instantiate()
2013-06-03 17:57:16 +09:00
Eric Dumazet
c87a124a5d net: force a reload of first item in hlist_nulls_for_each_entry_rcu
Roman Gushchin discovered that udp4_lib_lookup2() was not reloading
first item in the rcu protected list, in case the loop was restarted.

This produced soft lockups as in https://lkml.org/lkml/2013/4/16/37

rcu_dereference(X)/ACCESS_ONCE(X) seem to not work as intended if X is
ptr->field :

In some cases, gcc caches the value or ptr->field in a register.

Use a barrier() to disallow such caching, as documented in
Documentation/atomic_ops.txt line 114

Thanks a lot to Roman for providing analysis and numerous patches.

Diagnosed-by: Roman Gushchin <klamm@yandex-team.ru>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Boris Zhmurov <zhmurov@yandex-team.ru>
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-02 20:53:59 -07:00
Jiri Pirko
6d7581e62f list: introduce list_first_entry_or_null
non-rcu variant of list_first_or_null_rcu

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 17:31:52 -07:00
Pravin B Shelar
1e2bd517c1 udp6: Fix udp fragmentation for tunnel traffic.
udp6 over GRE tunnel does not work after to GRE tso changes. GRE
tso handler passes inner packet but keeps track of outer header
start in SKB_GSO_CB(skb)->mac_offset.  udp6 fragment need to
take care of outer header, which start at the mac_offset, while
adding fragment header.
This bug is introduced by commit 68c3316311 (GRE: Add TCP
segmentation offload for GRE).

Reported-by: Dmitry Kravkov <dkravkov@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Tested-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 17:06:07 -07:00
Frederic Weisbecker
521921bad1 kvm: Move guest entry/exit APIs to context_tracking
The kvm_host.h header file doesn't handle well
inclusion when archs don't support KVM.

This results in build crashes for such archs when they
want to implement context tracking because this subsystem
includes kvm_host.h in order to implement the
guest_enter/exit APIs but it doesn't handle KVM off case.

To fix this, move the guest_enter()/guest_exit()
declarations and generic implementation to the context
tracking headers. These generic APIs actually belong to
this subsystem, besides other domains boundary tracking
like user_enter() et al.

KVM now properly becomes a user of this library, not the
other buggy way around.

Reported-by: Kevin Hilman <khilman@linaro.org>
Reviewed-by: Kevin Hilman <khilman@linaro.org>
Tested-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31 11:32:30 +02:00
Frederic Weisbecker
45eacc6927 vtime: Use consistent clocks among nohz accounting
While computing the cputime delta of dynticks CPUs,
we are mixing up clocks of differents natures:

* local_clock() which takes care of unstable clock
sources and fix these if needed.

* sched_clock() which is the weaker version of
local_clock(). It doesn't compute any fixup in case
of unstable source.

If the clock source is stable, those two clocks are the
same and we can safely compute the difference against
two random points.

Otherwise it results in random deltas as sched_clock()
can randomly drift away, back or forward, from local_clock().

As a consequence, some strange behaviour with unstable tsc
has been observed such as non progressing constant zero cputime.
(The 'top' command showing no load).

Fix this by only using local_clock(), or its irq safe/remote
equivalent, in vtime code.

Reported-by: Mike Galbraith <efault@gmx.de>
Suggested-by: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31 11:31:50 +02:00
David S. Miller
73ce00d4d6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter/IPVS fixes for 3.10-rc3,
they are:

* fix xt_addrtype with IPv6, from Florian Westphal. This required
  a new hook for IPv6 functions in the netfilter core to avoid
  hard dependencies with the ipv6 subsystem when this match is
  only used for IPv4.

* fix connection reuse case in IPVS. Currently, if an reused
  connection are directed to the same server. If that server is
  down, those connection would fail. Therefore, clear the
  connection and choose a new server among the available ones.

* fix possible non-nul terminated string sent to user-space if
  ipt_ULOG is used as the default netfilter logging stub, from
  Chen Gang.

* fix mark logging of IPv6 packets in xt_LOG, from Michal Kubecek.
  This bug has been there since 2.6.26.

* Fix breakage ip_vs_sh due to incorrect structure layout for
  RCU, from Jan Beulich.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-30 16:38:38 -07:00
Lance Ortiz
37448adfc7 aerdrv: Move cper_print_aer() call out of interrupt context
The following warning was seen on 3.9 when a corrected PCIe error was being
handled by the AER subsystem.

WARNING: at .../drivers/pci/search.c:214 pci_get_dev_by_id+0x8a/0x90()

This occurred because a call to pci_get_domain_bus_and_slot() was added to
cper_print_pcie() to setup for the call to cper_print_aer().  The warning
showed up because cper_print_pcie() is called in an interrupt context and
pci_get* functions are not supposed to be called in that context.

The solution is to move the cper_print_aer() call out of the interrupt
context and into aer_recover_work_func() to avoid any warnings when calling
pci_get* functions.

Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-05-30 10:51:20 -07:00
Rusty Russell
ac4e97abce scatterlist: sg_set_buf() argument must be in linear mapping
Add a check behind CONFIG_DEBUG_SG to verify this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-05-30 09:20:20 +02:00
Steven Rostedt
12bcbe66d7 rcu: Add _notrace variation of rcu_dereference_raw() and hlist_for_each_entry_rcu()
As rcu_dereference_raw() under RCU debug config options can add quite a
bit of checks, and that tracing uses rcu_dereference_raw(), these checks
happen with the function tracer. The function tracer also happens to trace
these debug checks too. This added overhead can livelock the system.

Add a new interface to RCU for both rcu_dereference_raw_notrace() as well
as hlist_for_each_entry_rcu_notrace() as the hlist iterator uses the
rcu_dereference_raw() as well, and is used a bit with the function tracer.

Link: http://lkml.kernel.org/r/20130528184209.304356745@goodmis.org

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-28 22:47:13 -04:00
Peter Zijlstra
26cb63ad11 perf: Fix perf mmap bugs
Vince reported a problem found by his perf specific trinity
fuzzer.

Al noticed 2 problems with perf's mmap():

 - it has issues against fork() since we use vma->vm_mm for accounting.
 - it has an rb refcount leak on double mmap().

We fix the issues against fork() by using VM_DONTCOPY; I don't
think there's code out there that uses this; we didn't hear
about weird accounting problems/crashes. If we do need this to
work, the previously proposed VM_PINNED could make this work.

Aside from the rb reference leak spotted by Al, Vince's example
prog was indeed doing a double mmap() through the use of
perf_event_set_output().

This exposes another problem, since we now have 2 events with
one buffer, the accounting gets screwy because we account per
event. Fix this by making the buffer responsible for its own
accounting.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/20130528085548.GA12193@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 11:05:08 +02:00
Linus Torvalds
27a24cfa04 Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave-dma fixes from Vinod Koul:
 "We have two patches from Andy & Rafael fixing the Lynxpoint dma"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  ACPI / LPSS: register clock device for Lynxpoint DMA properly
  dma: acpi-dma: parse CSRT to extract additional resources
2013-05-25 20:30:31 -07:00
Linus Torvalds
9cf1848278 Merge branch 'akpm' (incoming from Andrew Morton)
Merge fixes from Andrew Morton:
 "A bunch of fixes and one simple fbdev driver which missed the merge
  window because people will still talking about it (to no great
  effect)."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (30 commits)
  aio: fix kioctx not being freed after cancellation at exit time
  mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas
  drivers/rtc/rtc-max8998.c: check for pdata presence before dereferencing
  ocfs2: goto out_unlock if ocfs2_get_clusters_nocache() failed in ocfs2_fiemap()
  random: fix accounting race condition with lockless irq entropy_count update
  drivers/char/random.c: fix priming of last_data
  mm/memory_hotplug.c: fix printk format warnings
  nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary
  drivers/block/brd.c: fix brd_lookup_page() race
  fbdev: FB_GOLDFISH should depend on HAS_DMA
  drivers/rtc/rtc-pl031.c: pass correct pointer to free_irq()
  auditfilter.c: fix kernel-doc warnings
  aio: fix io_getevents documentation
  revert "selftest: add simple test for soft-dirty bit"
  drivers/leds/leds-ot200.c: fix error caused by shifted mask
  mm/THP: use pmd_populate() to update the pmd with pgtable_t pointer
  linux/kernel.h: fix kernel-doc warning
  mm compaction: fix of improper cache flush in migration code
  rapidio/tsi721: fix bug in MSI interrupt handling
  hfs: avoid crash in hfs_bnode_create
  ...
2013-05-24 18:12:15 -07:00
Linus Torvalds
00cec111ac ARM: SoC fixes for 3.10-rc
We didn't have any fixes sent up for -rc2, so this is a slightly larger
 batch. A bit all over the place platform-wise; OMAP, at91, marvell,
 renesas, sunxi, ux500, etc.
 
 I tried to summarize highlights but there isn't a whole lot to point
 out. Lots of little things fixed all over. A couple of defconfig updates
 due to new/changing options.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRn+66AAoJEIwa5zzehBx3f/4P/3sqK2z7u5SSa+tpkKYkxezO
 MykUOpUc4tuwrKiuUEeXPh89pjIrclQVzKYDqdaXIcezKB7IXFfQSyLNxDzGM7Y3
 NrqrURvNpDmUi6F/xP89gXWkbvg2zr563mxSMpkF4G8HTYxvCv7sY0W/PDzb48Qg
 q3Efc/AfQsOGM0Zl8WXX9jZBdCNTquZYWd7YEFxCe9oVRGfmNlIYKirPOLnk9MAZ
 iIrbfFQG+cOos0NKrjM+tbtQNnPUBQeZdy3MvR7DtXCpKNdxs5EJL9EyHHqGGzPk
 Jd1CG3hNrPiuKVhmVSDkELNDYNT7J9+/rya1Mc33pwMrReAIKWUU/sitvsOkKypi
 PnTvlJkUv3UM2fOtoF8mOhQLTGdKSWfaF0BfHNmnCqJFDPs8vuQjx7O1WGKKUdC4
 SbYZetUnL3AoMrEbza5EoMyqQwpTXiALWp4/6MunLhNyXo0OQvsqexvT6QIrhKyQ
 0iExuSSbXDZAw2GXsqzOlCJecxHYG4qkGbf1DmeTW31GRQQ3nhpiu0GVAvHIEAhT
 SJGD57P0gbEXMpnSIKLNKIAYlLdFkGx6AhX1/Xb+AL7Jsod+UEwgz2ya4GMI4JI/
 0lUe8fPglU3ws8Up1y5FcIq4gjXTsvsEy317zeW/oq2vac0ACKBika2EVukdMD/5
 SYr+m40EzQtVdTKuaZ+e
 =8tMi
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "We didn't have any fixes sent up for -rc2, so this is a slightly
  larger batch.  A bit all over the place platform-wise; OMAP, at91,
  marvell, renesas, sunxi, ux500, etc.

  I tried to summarize highlights but there isn't a whole lot to point
  out.  Lots of little things fixed all over.  A couple of defconfig
  updates due to new/changing options."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (44 commits)
  ARM: at91/sama5: fix incorrect PMC pcr div definition
  ARM: at91/dt: fix macb pinctrl_macb_rmii_mii_alt definition
  ARM: at91: at91sam9n12: move external irq declatation to DT
  ARM: shmobile: marzen: Use error values in usb_power_*
  ARM: tegra: defconfig fixes
  ARM: nomadik: fix IRQ assignment for SMC ethernet
  ARM: vt8500: Add missing NULL terminator in dt_compat
  clk: tegra: add ac97 controller clock
  clk: tegra: remove USB from clk init table
  ARM: dts: mvebu: Fix wrong the address reg value for the L2-cache node
  ARM: plat-orion: Fix num_resources and id for ge10 and ge11
  ARM: OMAP2+: hwmod: Remove sysc slave idle and auto idle apis
  SERIAL: OMAP: Remove the slave idle handling from the driver
  ARM: OMAP2+: serial: Remove the un-used slave idle hooks
  ARM: OMAP2+: hwmod-data: UART IP needs software control to manage sidle modes
  ARM: OMAP2+: hwmod: Add a new flag to handle SIDLE in SWSUP only in active
  ARM: OMAP2+: hwmod: Fix sidle programming in _enable_sysc()/_idle_sysc()
  arm: mvebu: fix the 'ranges' property to handle PCIe
  ARM: mvebu: select ARCH_REQUIRE_GPIOLIB for mvebu platform
  ARM: AM33XX: Add missing .clkdm_name to clkdiv32k_ick clock
  ...
2013-05-24 16:27:37 -07:00
Randy Dunlap
7450231fb3 linux/kernel.h: fix kernel-doc warning
Fix kernel-doc warning in <linux/kernel.h>:

  Warning(include/linux/kernel.h:590): No description found for parameter 'ip'

scripts/kernel-doc cannot handle macros, functions, or function
prototypes between the function or macro that is being documented and
its definition, so move these prototypes above the function that is
being documented.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:51 -07:00
Imre Deak
4c663cfc52 wait: fix false timeouts when using wait_event_timeout()
Many callers of the wait_event_timeout() and
wait_event_interruptible_timeout() expect that the return value will be
positive if the specified condition becomes true before the timeout
elapses.  However, at the moment this isn't guaranteed.  If the wake-up
handler is delayed enough, the time remaining until timeout will be
calculated as 0 - and passed back as a return value - even if the
condition became true before the timeout has passed.

Fix this by returning at least 1 if the condition becomes true.  This
semantic is in line with what wait_for_condition_timeout() does; see
commit bb10ed09 ("sched: fix wait_for_completion_timeout() spurious
failure under heavy load").

Daniel said "We have 3 instances of this bug in drm/i915.  One case even
where we switch between the interruptible and not interruptible
wait_event_timeout variants, foolishly presuming they have the same
semantics.  I very much like this."

One such bug is reported at
  https://bugs.freedesktop.org/show_bug.cgi?id=64133

Signed-off-by: Imre Deak <imre.deak@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Cc: "Paul E.  McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:50 -07:00
Alexandre Bounine
bc8fcfea18 rapidio: add enumeration/discovery start from user space
Add RapidIO enumeration/discovery start from user space.  User space
start allows to defer RapidIO fabric scan until the moment when all
participating endpoints are initialized avoiding mandatory synchronized
start of all endpoints (which may be challenging in systems with large
number of RapidIO endpoints).

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:50 -07:00
Alexandre Bounine
a11650e110 rapidio: make enumeration/discovery configurable
Systems that use RapidIO fabric may need to implement their own
enumeration and discovery methods which are better suitable for needs of
a target application.

The following set of patches is intended to simplify process of
introduction of new RapidIO fabric enumeration/discovery methods.

The first patch offers ability to add new RapidIO enumeration/discovery
methods using kernel configuration options.  This new configuration
option mechanism allows to select statically linked or modular
enumeration/discovery method(s) from the list of existing methods or use
external module(s).

This patch also updates the currently existing enumeration/discovery
code to be used as a statically linked or modular method.

The corresponding configuration option is named "Basic
enumeration/discovery" method.  This is the only one configuration
option available today but new methods are expected to be introduced
after adoption of provided patches.

The second patch address a long time complaint of RapidIO subsystem
users regarding fabric enumeration/discovery start sequence.  Existing
implementation offers only a boot-time enumeration/discovery start which
requires synchronized boot of all endpoints in RapidIO network.  While
it works for small closed configurations with limited number of
endpoints, using this approach in systems with large number of endpoints
is quite challenging.

To eliminate requirement for synchronized start the second patch
introduces RapidIO enumeration/discovery start from user space.

For compatibility with the existing RapidIO subsystem implementation,
automatic boot time enumeration/discovery start can be configured in by
specifying "rio-scan.scan=1" command line parameter if statically linked
basic enumeration method is selected.

This patch:

Rework to implement RapidIO enumeration/discovery method selection
combined with ability to use enumeration/discovery as a kernel module.

This patch adds ability to introduce new RapidIO enumeration/discovery
methods using kernel configuration options.  Configuration option
mechanism allows to select statically linked or modular
enumeration/discovery method from the list of existing methods or use
external modules.  If a modular enumeration/discovery is selected each
RapidIO mport device can have its own method attached to it.

The existing enumeration/discovery code was updated to be used as
statically linked or modular method.  This configuration option is named
"Basic enumeration/discovery" method.

Several common routines have been moved from rio-scan.c to make them
available to other enumeration methods and reduce number of exported
symbols.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:50 -07:00
Linus Torvalds
eb3d33900a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "It's been a while since my last pull request so quite a few fixes have
  piled up."

Indeed.

 1) Fix nf_{log,queue} compilation with PROC_FS disabled, from Pablo
    Neira Ayuso.

 2) Fix data corruption on some tg3 chips with TSO enabled, from Michael
    Chan.

 3) Fix double insertion of VLAN tags in be2net driver, from Sarveshwar
    Bandi.

 4) Don't have TCP's MD5 support pass > PAGE_SIZE page offsets in
    scatter-gather entries into the crypto layer, the crypto layer can't
    handle that.  From Eric Dumazet.

 5) Fix lockdep splat in 802.1Q MRP code, also from Eric Dumazet.

 6) Fix OOPS in netfilter log module when called from conntrack, from
    Hans Schillstrom.

 7) FEC driver needs to use netif_tx_{lock,unlock}_bh() rather than the
    non-BH disabling variants.  From Fabio Estevam.

 8) TCP GSO can generate out-of-order packets, fix from Eric Dumazet.

 9) vxlan driver doesn't update 'used' field of fdb entries when it
    should, from Sridhar Samudrala.

10) ipv6 should use kzalloc() to allocate inet6 socket cork options,
    otherwise we can OOPS in ip6_cork_release().  From Eric Dumazet.

11) Fix races in bonding set mode, from Nikolay Aleksandrov.

12) Fix checksum generation regression added by "r8169: fix 8168evl
    frame padding.", from Francois Romieu.

13) ip_gre can look at stale SKB data pointer, fix from Eric Dumazet.

14) Fix checksum handling when GSO is enabled in bnx2x driver with
    certain chips, from Yuval Mintz.

15) Fix double free in batman-adv, from Martin Hundebøll.

16) Fix device startup synchronization with firmware in tg3 driver, from
    Nithin Sujit.

17) perf networking dropmonitor doesn't work at all due to mixed up
    trace parameter ordering, from Ben Hutchings.

18) Fix proportional rate reduction handling in tcp_ack(), from Nandita
    Dukkipati.

19) IPSEC layer doesn't return an error when a valid state is detected,
    causing an OOPS.  Fix from Timo Teräs.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits)
  be2net: bug fix on returning an invalid nic descriptor
  tcp: xps: fix reordering issues
  net: Revert unused variable changes.
  xfrm: properly handle invalid states as an error
  virtio_net: enable napi for all possible queues during open
  tcp: bug fix in proportional rate reduction.
  net: ethernet: sun: drop unused variable
  net: ethernet: korina: drop unused variable
  net: ethernet: apple: drop unused variable
  qmi_wwan: Added support for Cinterion's PLxx WWAN Interface
  perf: net_dropmonitor: Remove progress indicator
  perf: net_dropmonitor: Use bisection in symbol lookup
  perf: net_dropmonitor: Do not assume ordering of dictionaries
  perf: net_dropmonitor: Fix symbol-relative addresses
  perf: net_dropmonitor: Fix trace parameter order
  net: fec: use a more proper compatible string for MVF type device
  qlcnic: Fix updating netdev->features
  qlcnic: remove netdev->trans_start updates within the driver
  qlcnic: Return proper error codes from probe failure paths
  tg3: Update version to 3.132
  ...
2013-05-24 08:27:32 -07:00
Tejun Heo
7805d000db cgroup: fix a subtle bug in descendant pre-order walk
When cgroup_next_descendant_pre() initiates a walk, it checks whether
the subtree root doesn't have any children and if not returns NULL.
Later code assumes that the subtree isn't empty.  This is broken
because the subtree may become empty inbetween, which can lead to the
traversal escaping the subtree by walking to the sibling of the
subtree root.

There's no reason to have the early exit path.  Remove it along with
the later assumption that the subtree isn't empty.  This simplifies
the code a bit and fixes the subtle bug.

While at it, fix the comment of cgroup_for_each_descendant_pre() which
was incorrectly referring to ->css_offline() instead of
->css_online().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org
2013-05-24 10:50:24 +09:00