Commit Graph

48062 Commits

Author SHA1 Message Date
Linus Torvalds
f1cbd03f5e Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Been sitting on this for a while, but lets get this out the door.
  This fixes various important bugs for 3.3 final, along with a few more
  trivial ones.  Please pull!"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: fix ioc leak in put_io_context
  block, sx8: fix pointer math issue getting fw version
  Block: use a freezable workqueue for disk-event polling
  drivers/block/DAC960: fix -Wuninitialized warning
  drivers/block/DAC960: fix DAC960_V2_IOCTL_Opcode_T -Wenum-compare warning
  block: fix __blkdev_get and add_disk race condition
  block: Fix setting bio flags in drivers (sd_dif/floppy)
  block: Fix NULL pointer dereference in sd_revalidate_disk
  block: exit_io_context() should call elevator_exit_icq_fn()
  block: simplify ioc_release_fn()
  block: replace icq->changed with icq->flags
2012-03-14 17:16:45 -07:00
Linus Torvalds
e304dfdb03 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking from David Miller:

1) IPV4 routing metrics can become stale when routes are changed by the
   administrator, fix from Steffen Klassert.

2) atl1c does "val |= XXX;" where XXX is a bit number not a bit mask,
   fix by using set_bit.  From Dan Carpenter.

3) Memory accounting bug in carl9170 driver results in wedged TX queue.
   Fix from Nicolas Cavallari.

4) iwlwifi accidently uses "sizeof(ptr)" instead of "sizeof(*ptr)", fix
   from Johannes Berg.

5) Openvswitch doesn't honor dp_ifindex when doing vport lookups, fix
   from Ben Pfaff.

6) ehea conversion to 64-bit stats lost multicast and rx_errors
   accounting, fix from Eric Dumazet.

7) Bridge state transition logging in br_stp_disable_port() is busted,
   it's emitted at the wrong time and the message is in the wrong tense,
   fix from Paulius Zaleckas.

8) mlx4 device erroneously invokes the queue resize firmware operation
   twice, fix from Jack Morgenstein.

9) Fix deadlock in usbnet, need to drop lock when invoking usb_unlink_urb()
   otherwise we recurse into taking it again.  Fix from Sebastian Siewior.

10) hyperv network driver uses the wrong driver name string, fix from
    Haiyang Zhang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net/hyperv: Use the built-in macro KBUILD_MODNAME for this driver
  net/usbnet: avoid recursive locking in usbnet_stop()
  route: Remove redirect_genid
  inetpeer: Invalidate the inetpeer tree along with the routing cache
  mlx4_core: fix bug in modify_cq wrapper for resize flow.
  atl1c: set ATL1C_WORK_EVENT_RESET bit correctly
  bridge: fix state reporting when port is disabled
  bridge: br_log_state() s/entering/entered/
  ehea: restore multicast and rx_errors fields
  openvswitch: Fix checksum update for actions on UDP packets.
  openvswitch: Honor dp_ifindex, when specified, for vport lookup by name.
  iwlwifi: fix wowlan suspend
  mwifiex: reset encryption mode flag before association
  carl9170: fix frame delivery if sta is in powersave mode
  carl9170: Fix memory accounting when sta is in power-save mode.
2012-03-09 07:14:44 -08:00
Linus Torvalds
ee0849c911 Minor bug fixes and documentation updates for v3.3-rc5
Fixes up a duplicate #include, adds an empty implementation of
 of_find_compatible_node() and make git ignore .dtb files.  And fix
 up bus name on OF described PHYs.  Nothing exciting here.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPWPvsAAoJEEFnBt12D9kB8QsP/2e6jv7CQ/wUhihzw126Y8yi
 4HLG3ByJ382/c3LXg3juC+WfMvlpqEa0yqhikFKgIsm7cNbv7PkPS583pujdSXo0
 IRykiT3Q9+XbO7L+9C88miYIQT8+IT/AjIjfuRwlP4gLPNUJhknpYhYOc09YzwOp
 zhheeGVyeyTay+beQXip8gOEq2dYEl4IKsItagCBZfkDvj3Y8yDwTlP7f2j0FYZi
 uODa3NFKE4uc0U2chtro0Vt87TRfbnIQ93SbvEWyQBWMEbPqT3l1Q94AeFGubCLj
 RX910WvurCE4evzo2ZJzXPt9gPEyGIgtMGbjh+cENfT5/wNB2HWk1ftKnss5yEjp
 WmcbwWKwXPwRPc5EpRdgabBCRgouCZghtcnJpkoXKSwbEt/nj3no8GOZXQnv+rSL
 Ga0BSk1jJYC9DRpaIrLvdxXZF7vy1nug8fgU9ALi2R0gTmN+k6tHlLh60EWMSCBF
 YCXv3fUd9IVJfOYsu4qMk0Pu40pW9rDc698Nxiep7C0iNhlddq8fiHTwB3DdoinC
 iYmz+wEdDDp/68qY/mguXXtr5GAFSz1PUOwSJkh2zW6LnBDEv1djfhoA+5gwZX2v
 I1IwfNHMn/XAEuhOFWMo9CejB0rDoBhigwucH8EilbIdBYVCCEtYaubO/Gq6HmId
 h7hx9sWjrwF9HY0N9EHX
 =/fxH
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux-2.6

Pull minor devicetree bug fixes and documentation updates from Grant Likely:
 "Fixes up a duplicate #include, adds an empty implementation of
  of_find_compatible_node() and make git ignore .dtb files.  And fix up
  bus name on OF described PHYs.  Nothing exciting here."

* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux-2.6:
  doc: dt: Fix broken reference in gpio-leds documentation
  of/mdio: fix fixed link bus name
  of/fdt.c: asm/setup.h included twice
  of: add picochip vendor prefix
  dt: add empty of_find_compatible_node function
  ARM: devicetree: Add .dtb files to arch/arm/boot/.gitignore
2012-03-08 17:24:27 -08:00
Steffen Klassert
ac3f48de09 route: Remove redirect_genid
As we invalidate the inetpeer tree along with the routing cache now,
we don't need a genid to reset the redirect handling when the routing
cache is flushed.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-08 00:30:32 -08:00
Steffen Klassert
5faa5df1fa inetpeer: Invalidate the inetpeer tree along with the routing cache
We initialize the routing metrics with the values cached on the
inetpeer in rt_init_metrics(). So if we have the metrics cached on the
inetpeer, we ignore the user configured fib_metrics.

To fix this issue, we replace the old tree with a fresh initialized
inet_peer_base. The old tree is removed later with a delayed work queue.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-08 00:30:24 -08:00
Linus Torvalds
4f262acfde Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM updates from Russell King.

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7358/1: perf: add PMU hotplug notifier
  ARM: 7357/1: perf: fix overflow handling for xscale2 PMUs
  ARM: 7356/1: perf: check that we have an event in the PMU IRQ handlers
  ARM: 7355/1: perf: clear overflow flag when disabling counter on ARMv7 PMU
  ARM: 7354/1: perf: limit sample_period to half max_period in non-sampling mode
  ARM: ecard: ensure fake vma vm_flags is setup
  ARM: 7346/1: errata: fix PL310 erratum #753970 workaround selection
  ARM: 7345/1: errata: update workaround for A9 erratum #743622
  ARM: 7348/1: arm/spear600: fix one-shot timer
  ARM: 7339/1: amba/serial.h: Include types.h for resolving dependency of type bool
2012-03-07 08:33:03 -08:00
Linus Torvalds
3e85fb9cd4 Merge branch 'akpm' (Andrew's patch bomb)
Merge the emailed seties of 19 patches from Andrew Morton

* akpm:
  rapidio/tsi721: fix queue wrapping bug in inbound doorbell handler
  memcg: fix mapcount check in move charge code for anonymous page
  mm: thp: fix BUG on mm->nr_ptes
  alpha: fix 32/64-bit bug in futex support
  memcg: fix GPF when cgroup removal races with last exit
  debugobjects: Fix selftest for static warnings
  floppy/scsi: fix setting of BIO flags
  memcg: fix deadlock by inverting lrucare nesting
  drivers/rtc/rtc-r9701.c: fix crash in r9701_remove()
  c2port: class_create() returns an ERR_PTR
  pps: class_create() returns an ERR_PTR, not NULL
  hung_task: fix the broken rcu_lock_break() logic
  vfork: kill PF_STARTING
  coredump_wait: don't call complete_vfork_done()
  vfork: make it killable
  vfork: introduce complete_vfork_done()
  aio: wake up waiters when freeing unused kiocbs
  kprobes: return proper error code from register_kprobe()
  kmsg_dump: don't run on non-error paths by default
2012-03-05 15:50:25 -08:00
Hugh Dickins
7512102cf6 memcg: fix GPF when cgroup removal races with last exit
When moving tasks from old memcg (with move_charge_at_immigrate on new
memcg), followed by removal of old memcg, hit General Protection Fault in
mem_cgroup_lru_del_list() (called from release_pages called from
free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from
exit_mmap from mmput from exit_mm from do_exit).

Somewhat reproducible, takes a few hours: the old struct mem_cgroup has
been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is
still trying to update its stats, and take page off lru before freeing.

A task, or a charge, or a page on lru: each secures a memcg against
removal.  In this case, the last task has been moved out of the old memcg,
and it is exiting: anonymous pages are uncharged one by one from the
memcg, as they are zapped from its pagetables, so the charge gets down to
0; but the pages themselves are queued in an mmu_gather for freeing.

Most of those pages will be on lru (and force_empty is careful to
lru_add_drain_all, to add pages from pagevec to lru first), but not
necessarily all: perhaps some have been isolated for page reclaim, perhaps
some isolated for other reasons.  So, force_empty may find no task, no
charge and no page on lru, and let the removal proceed.

There would still be no problem if these pages were immediately freed; but
typically (and the put_page_testzero protocol demands it) they have to be
added back to lru before they are found freeable, then removed from lru
and freed.  We don't see the issue when adding, because the
mem_cgroup_iter() loops keep their own reference to the memcg being
scanned; but when it comes to mem_cgroup_lru_del_list().

I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and
PageCgroupUsed flags were used (like a trick with mirrors) to deflect view
of pc->mem_cgroup to the stable root_mem_cgroup when neither set.
38c5d72f3e ("memcg: simplify LRU handling by new rule") mercifully
removed those convolutions, but left this General Protection Fault.

But it's surprisingly easy to restore the old behaviour: just check
PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec
to add), and reset pc to root_mem_cgroup if page is uncharged.  A risky
change?  just going back to how it worked before; testing, and an audit of
uses of pc->mem_cgroup, show no problem.

And there's a nice bonus: with mem_cgroup_lru_add_list() itself making
sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no
longer has any purpose, and we can safely revert 4e5f01c2b9 ("memcg:
clear pc->mem_cgroup if necessary").

Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c
is not strictly necessary: the lru_lock there, with RCU before memcg
structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe
without that; but it seems cleaner to rely on one dependency less.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:43 -08:00
Oleg Nesterov
6e27f63edb vfork: kill PF_STARTING
Previously it was (ab)used by utrace.  Then it was wrongly used by the
scheduler code.

Currently it is not used, kill it before it finds the new erroneous user.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:42 -08:00
Oleg Nesterov
57b59c4a14 coredump_wait: don't call complete_vfork_done()
Now that CLONE_VFORK is killable, coredump_wait() no longer needs
complete_vfork_done().  zap_threads() should find and kill all tasks with
the same ->mm, this includes our parent if ->vfork_done is set.

mm_release() becomes the only caller, unexport complete_vfork_done().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:42 -08:00
Oleg Nesterov
d68b46fe16 vfork: make it killable
Make vfork() killable.

Change do_fork(CLONE_VFORK) to do wait_for_completion_killable().  If it
fails we do not return to the user-mode and never touch the memory shared
with our child.

However, in this case we should clear child->vfork_done before return, we
use task_lock() in do_fork()->wait_for_vfork_done() and
complete_vfork_done() to serialize with each other.

Note: now that we use task_lock() we don't really need completion, we
could turn task->vfork_done into "task_struct *wake_up_me" but this needs
some complications.

NOTE: this and the next patches do not affect in-kernel users of
CLONE_VFORK, kernel threads run with all signals ignored including
SIGKILL/SIGSTOP.

However this is obviously the user-visible change.  Not only a fatal
signal can kill the vforking parent, a sub-thread can do execve or
exit_group() and kill the thread sleeping in vfork().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:42 -08:00
Oleg Nesterov
c415c3b47e vfork: introduce complete_vfork_done()
No functional changes.

Move the clear-and-complete-vfork_done code into the new trivial helper,
complete_vfork_done().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:42 -08:00
Matthew Garrett
c22ab33290 kmsg_dump: don't run on non-error paths by default
Since commit 04c6862c05 ("kmsg_dump: add kmsg_dump() calls to the
reboot, halt, poweroff and emergency_restart paths"), kmsg_dump() gets
run on normal paths including poweroff and reboot.

This is less than ideal given pstore implementations that can only
represent single backtraces, since a reboot may overwrite a stored oops
before it's been picked up by userspace.  In addition, some pstore
backends may have low performance and provide a significant delay in
reboot as a result.

This patch adds a printk.always_kmsg_dump kernel parameter (which can also
be changed from userspace).  Without it, the code will only be run on
failure paths rather than on normal paths.  The option can be enabled in
environments where there's a desire to attempt to audit whether or not a
reboot was cleanly requested or not.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:42 -08:00
Linus Torvalds
aa139092de Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

1) TCP SACK processing can calculate an incorrect reordering value in
   some cases, fix from Neal Cardwell.

2) tcp_mark_head_lost() can split SKBs in situations where it should
   not, violating send queue invariants expected by other pieces of
   code and thus resulting (eventually) in corrupted retransmit state
   counters.  Also from Neal Cardwell.

3) qla3xxx erroneously calls spin_lock_irqrestore() with constant
   hw_flags of zero.  Fix from Santosh Nayak.

4) Fix NULL deref in rt2x00, from Gabor Juhos.

5) pch_gbe passes address of wrong typed object to pch_gbe_validate_option
   thus corrupting part of the value.  From Dan Carpenter.

6) We must check the return value of nlmsg_parse() before trying to use
   the results.  From Eric Dumazet.

7) Bridging code fails to check return value of ipv6_dev_get_saddr()
   thus potentially leaving uninitialized garbage in the outgoing ipv6
   header.  From Ulrich Weber.

8) Due to rounding and a reversed operation on jiffies, bridge message
   ages can go backwards instead of forwards, thus breaking STP.  Fixes
   from Joakim Tjernlund.

9) r8169 modifies Config* registers without properly holding the
   Config9346 lock, resulting in corrupted IP fragments on some chips.
   Fix from Francois Romieu.

10) NET_PACKET_ENGINE default wan't set properly during the network
   driver mega-move.  Fix from Stephen Hemminger.

11) vmxnet3 uses TCP header size where it actually should use the UDP
   header size, fix from Shreyas Bhatewara.

12) Netfilter bridge module autoload is busted in the compat case, fix
   from Florian Westphal.

13) Wireless Key removal was not setting multicast bits correctly thus
   accidently killing the unicast key 0 and thus all traffic stops.
   Fix from Johannes Berg.

14) Fix endless retries of A-MPDU transmissions in brcm80211 driver.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
  qla3xxx: ethernet: Fix bogus interrupt state flag.
  bridge: check return value of ipv6_dev_get_saddr()
  rtnetlink: fix rtnl_calcit() and rtnl_dump_ifinfo()
  bridge: message age needs to increase, not decrease.
  bridge: Adjust min age inc for HZ > 256
  tcp: don't fragment SACKed skbs in tcp_mark_head_lost()
  r8169: corrupted IP fragments fix for large mtu.
  packetengines: fix config default
  vmxnet3: Fix transport header size
  enic: fix an endian bug in enic_probe()
  pch_gbe: memory corruption calling pch_gbe_validate_option()
  tg3: Fix tg3_get_stats64 for 5700 / 5701 devs
  tcp: fix false reordering signal in tcp_shifted_skb
  tcp: fix comment for tp->highest_sack
  netfilter: bridge: fix module autoload in compat case
  brcm80211: smac: only print block-ack timeout message at trace level
  brcm80211: smac: fix endless retry of A-MPDU transmissions
  mac80211: Fix a warning on changing to monitor mode from STA
  mac80211: zero initialize count field in ieee80211_tx_rate
  iwlwifi: fix key removal
  ...
2012-03-05 14:30:54 -08:00
Linus Torvalds
789ce9b9c2 Merge branch 'for-3.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull per-cpu patches from Tejun Heo:
 "This pull request contains four patches.  One replaces manual clearing
  with bitmap_clear(), two fix generic definition of __this_cpu ops so
  that they don't choose unnecessarily strict arch version.  One makes
  _this_cpu definition use raw_local_irq_*() so that it doesn't end up
  wrecking irq on/off state tracking when used from inside lockdep.

  Of the four patches, the raw_local_irq_*() update is the most
  important, so please feel free to cherry pick only that one patch and
  ignore the rest if you want to - commit e920d5971d 'percpu: use
  raw_local_irq_* in _this_cpu op'."

* 'for-3.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: fix __this_cpu_{sub,inc,dec}_return() definition
  percpu: use raw_local_irq_* in _this_cpu op
  percpu: fix generic definition of __this_cpu_add_and_return()
  percpu: use bitmap_clear
2012-03-05 14:28:36 -08:00
Linus Torvalds
5483f18e98 vfs: move dentry_cmp from <linux/dcache.h> to fs/dcache.c
It's only used inside fs/dcache.c, and we're going to play games with it
for the word-at-a-time patches.  This time we really don't even want to
export it, because it really is an internal function to fs/dcache.c, and
has been since it was introduced.

Having it in that extremely hot header file (it's included in pretty
much everything, thanks to <linux/fs.h>) is a disaster for testing
different versions, and is utterly pointless.

We really should have some kind of header file diet thing, where we
figure out which parts of header files are really better off private and
only result in more expensive compiles.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-04 15:51:42 -08:00
Konstantin Khlebnikov
adb795062f percpu: fix __this_cpu_{sub,inc,dec}_return() definition
This patch adds missed "__" prefixes, otherwise these functions
works as irq/preemption safe.

Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-03-04 09:34:15 -08:00
Linus Torvalds
233ba2c5ff PARISC fixes on 20120303
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPUqnkAAoJEDeqqVYsXL0MyTwH/14R9sIt+IdQgMGje6Dys6U1
 3wxmNHcxNBB461212i+XVToH72GiPtUyWY8Q3LT+Qcq4Rab9oq8wDffxdgflDLja
 sZq9COLR8dmXJM0hduyuysCiNIlmuURV20uIMsDbYutBVtAKKqMd7ZrJBmkVh7xT
 slBJf+0lcD5IBYPYwf8lbxAI7E/C5TarMCIZk4z21I8ovF321RlGcyyo6NM62q/P
 wvLw4bJuW7nJa3q3B7BznzBcoHLmzo9tiY+CbtQlGhSdasDKJ8HuAegTVyofl6s8
 0/RTvaUcqTWUlFxXzLVDT+MCWAUP4GFR66tR2T5tygRi9st70TcpMzISiyZzKi8=
 =WzAJ
 -----END PGP SIGNATURE-----

Merge tag 'parisc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6

PARISC fixes from James Bottomley:
 "This is a set of build fixes to get the cross compiled architecture
  testbeds building again"

* tag 'parisc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
  [PARISC] don't unconditionally override CROSS_COMPILE for 64 bit.
  [PARISC] include <linux/prefetch.h> in drivers/parisc/iommu-helpers.h
  [PARISC] fix compile break caused by iomap: make IOPORT/PCI mapping functions conditional
2012-03-03 16:33:51 -08:00
Linus Torvalds
5707c87f20 vfs: clarify and clean up dentry_cmp()
It did some odd things for unclear reasons.  As this is one of the
functions that gets changed when doing word-at-a-time compares, this is
yet another of the "don't change any semantics, but clean things up so
that subsequent patches don't get obscured by the cleanups".

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-02 14:47:15 -08:00
Linus Torvalds
0145acc202 vfs: uninline full_name_hash()
.. and also use it in lookup_one_len() rather than open-coding it.

There aren't any performance-critical users, so inlining it is silly.
But it wouldn't matter if it wasn't for the fact that the word-at-a-time
dentry name patches want to conditionally replace the function, and
uninlining it sets the stage for that.

So again, this is a preparatory patch that doesn't change any semantics,
and only prepares for a much cleaner and testable word-at-a-time dentry
name accessor patch.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-02 14:32:59 -08:00
Linus Torvalds
8966be9030 vfs: trivial __d_lookup_rcu() cleanups
These don't change any semantics, but they clean up the code a bit and
mark some arguments appropriately 'const'.

They came up as I was doing the word-at-a-time dcache name accessor
code, and cleaning this up now allows me to send out a smaller relevant
interesting patch for the experimental stuff.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-02 14:23:30 -08:00
H. Peter Anvin
5189fa19a4 regset: Return -EFAULT, not -EIO, on host-side memory fault
There is only one error code to return for a bad user-space buffer
pointer passed to a system call in the same address space as the
system call is executed, and that is EFAULT.  Furthermore, the
low-level access routines, which catch most of the faults, return
EFAULT already.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-02 11:38:15 -08:00
H. Peter Anvin
c8e252586f regset: Prevent null pointer reference on readonly regsets
The regset common infrastructure assumed that regsets would always
have .get and .set methods, but not necessarily .active methods.
Unfortunately people have since written regsets without .set methods.

Rather than putting in stub functions everywhere, handle regsets with
null .get or .set methods explicitly.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-02 11:38:15 -08:00
Alan Stern
62d3c5439c Block: use a freezable workqueue for disk-event polling
This patch (as1519) fixes a bug in the block layer's disk-events
polling.  The polling is done by a work routine queued on the
system_nrt_wq workqueue.  Since that workqueue isn't freezable, the
polling continues even in the middle of a system sleep transition.

Obviously, polling a suspended drive for media changes and such isn't
a good thing to do; in the case of USB mass-storage devices it can
lead to real problems requiring device resets and even re-enumeration.

The patch fixes things by creating a new system-wide, non-reentrant,
freezable workqueue and using it for disk-events polling.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: <stable@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-03-02 10:51:00 +01:00
Jun'ichi Nomura
fe316bf2d5 block: Fix NULL pointer dereference in sd_revalidate_disk
Since 2.6.39 (1196f8b), when a driver returns -ENOMEDIUM for open(),
__blkdev_get() calls rescan_partitions() to remove
in-kernel partition structures and raise KOBJ_CHANGE uevent.

However it ends up calling driver's revalidate_disk without open
and could cause oops.

In the case of SCSI:

  process A                  process B
  ----------------------------------------------
  sys_open
    __blkdev_get
      sd_open
        returns -ENOMEDIUM
                             scsi_remove_device
                               <scsi_device torn down>
      rescan_partitions
        sd_revalidate_disk
          <oops>
Oopses are reported here:
http://marc.info/?l=linux-scsi&m=132388619710052

This patch separates the partition invalidation from rescan_partitions()
and use it for -ENOMEDIUM case.

Reported-by: Huajun Li <huajun.li.lee@gmail.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-03-02 10:38:33 +01:00
Linus Torvalds
cfa5555cab Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
DRM fixes from Dave Airlie:
  intel: fixes for output regression on 965GM, an oops and a machine
  hang

  radeon: uninitialised var (that gcc didn't warn about for some reason)
  + a couple of correctness fixes.

  exynos: fixes for various things, drop some chunks of unused code.

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon/kms/vm: fix possible bug in radeon_vm_bo_rmv()
  drm/radeon: fix uninitialized variable
  drm/radeon/kms: fix radeon_dp_get_modes for LVDS bridges (v2)
  drm/i915: Remove use of the autoreported ringbuffer HEAD position
  drm/i915: Prevent a machine hang by checking crtc->active before loading lut
  drm/i915: fix operator precedence when enabling RC6p
  drm/i915: fix a sprite watermark computation to avoid divide by zero if xpos<0
  drm/i915: fix mode set on load pipe. (v2)
  drm/exynos: exynos_drm.h header file fixes
  drm/exynos: added panel physical size.
  drm/exynos: added postclose to release resource.
  drm/exynos: removed exynos_drm_fbdev_recreate function.
  drm/exynos: fixed page flip issue.
  drm/exynos: added possible_clones setup function.
  drm/exynos: removed pageflip_event_list init code when closed.
  drm/exynos: changed priority of mixer layers.
  drm/exynos: Fix typo in exynos_mixer.c
2012-03-01 18:23:43 -08:00
Dave Airlie
b9b3515698 Merge branch 'exynos-drm-fixes' of git://git.infradead.org/users/kmpark/linux-2.6-samsung into HEAD
* 'exynos-drm-fixes' of git://git.infradead.org/users/kmpark/linux-2.6-samsung:
  drm/exynos: exynos_drm.h header file fixes
  drm/exynos: added panel physical size.
  drm/exynos: added postclose to release resource.
  drm/exynos: removed exynos_drm_fbdev_recreate function.
  drm/exynos: fixed page flip issue.
  drm/exynos: added possible_clones setup function.
  drm/exynos: removed pageflip_event_list init code when closed.
  drm/exynos: changed priority of mixer layers.
  drm/exynos: Fix typo in exynos_mixer.c
2012-02-29 09:54:24 +00:00
Neal Cardwell
ecb9719236 tcp: fix comment for tp->highest_sack
There was an off-by-one error in the comments describing the
highest_sack field in struct tcp_sock. The comments previously claimed
that it was the "start sequence of the highest skb with SACKed
bit". This commit fixes the comments to note that it is the "start
sequence of the skb just *after* the highest skb with SACKed bit".

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-28 15:06:33 -05:00
Grant Likely
29f141fed0 Merge branch 'fixes-for-grant' of git://sources.calxeda.com/kernel/linux into devicetree/merge 2012-02-27 14:04:40 -07:00
Linus Torvalds
70ca00db10 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/events: Revert trace_sched_stat_sleeptime()
2012-02-27 07:55:39 -08:00
James Bottomley
97a29d59fc [PARISC] fix compile break caused by iomap: make IOPORT/PCI mapping functions conditional
The problem in

commit fea80311a9
Author: Randy Dunlap <rdunlap@xenotime.net>
Date:   Sun Jul 24 11:39:14 2011 -0700

    iomap: make IOPORT/PCI mapping functions conditional

is that if your architecture supplies pci_iomap/pci_iounmap, it expects
always to supply them.  Adding empty body defitions in the !CONFIG_PCI
case, which is what this patch does, breaks the parisc compile because
the functions become doubly defined.  It took us a while to spot this,
because we don't actually build !CONFIG_PCI very often (only if someone
is brave enough to test the snake/asp machines).

Since the note in the commit log says this is to fix a
CONFIG_GENERIC_IOMAP issue (which it does because CONFIG_GENERIC_IOMAP
supplies pci_iounmap only if CONFIG_PCI is set), there should actually
have been a condition upon this.  This should make sure no other
architecture's !CONFIG_PCI compile breaks in the same way as parisc.

The fix had to be updated to take account of the GENERIC_PCI_IOMAP
separation.

Reported-by: Rolf Eike Beer <eike@sf-mail.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-27 09:43:30 -06:00
Linus Torvalds
203738e548 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
1) ICMP sockets leave err uninitialized but we try to return it for the
   unsupported MSG_OOB case, reported by Dave Jones.

2) Add new Zaurus device ID entries, from Dave Jones.

3) Pointer calculation in hso driver memset is wrong, from Dan
   Carpenter.

4) ks8851_probe() checks unsigned value as negative, fix also from Dan
   Carpenter.

5) Fix crashes in atl1c driver due to TX queue handling, from Eric
   Dumazet.  I anticipate some TX side locking fixes coming in the near
   future for this driver as well.

6) The inline directive fix in Bluetooth which was breaking the build
   only with very new versions of GCC, from Johan Hedberg.

7) Fix crashes in the ATP CLIP code due to ARP cleanups this merge
   window, reported by Meelis Roos and fixed by Eric Dumazet.

8) JME driver doesn't flush RX FIFO correctly, from Guo-Fu Tseng.

9) Some ip6_route_output() callers test the return value for NULL, but
   this never happens as the convention is to return a dst entry with
   dst->error set.  Fixes from RonQing Li.

10) Logitech Harmony 900 should be handled by zaurus driver not
   cdc_ether, update white lists and black lists accordingly.  From
   Scott Talbert.

11) Receiving from certain kinds of devices there won't be a MAC header,
   so there is no MAC header to fixup in the IPSEC code, and if we try
   to do it we'll crash.  Fix from Eric Dumazet.

12) Port type array indexing off-by-one in mlx4 driver, fix from Yevgeny
   Petrilin.

13) Fix regression in link-down handling in davinci_emac which causes
   all RX descriptors to be freed up and therefore RX to wedge
   completely, from Christian Riesch.

14) It took two attempts, but ctnetlink soft lockups seem to be
   cured now, from Pablo Neira Ayuso.

15) Endianness bug fix in ENIC driver, from Santosh Nayak.

16) The long ago conversion of the PPP fragmentation code over to
   abstracted SKB list handling wasn't perfect, once we get an
   out of sequence SKB we don't flush the rest of them like we
   should.  From Ben McKeegan.

17) Fix regression of ->ip_summed initialization in sfc driver.
   From Ben Hutchings.

18) Bluetooth timeout mistakenly using msecs instead of jiffies,
   from Andrzej Kaczmarek.

19) Using _sync variant of work cancellation results in deadlocks,
   use the non _sync variants instead.  From Andre Guedes.

20) Bluetooth rfcomm code had reference counting problems leading
   to crashes, fix from Octavian Purdila.

21) The conversion of netem over to classful qdisc handling added
   two bugs to netem_dequeue(), fixes from Eric Dumazet.

22) Missing pci_iounmap() in ATM Solos driver.  Fix from Julia Lawall.

23) b44_pci_exit() should not have __exit tag since it's invoked from
   non-__exit code.  From Nikola Pajkovsky.

24) The conversion of the neighbour hash tables over to RCU added a
   race, fixed here by adding the necessary reread of tbl->nht, fix
   from Michel Machado.

25) When we added VF (virtual function) attributes for network device
   dumps, this potentially bloats up the size of the dump of one
   network device such that the dump size is too large for the buffer
   allocated by properly written netlink applications.

   In particular, if you add 255 VFs to a network device, parts of
   GLIBC stop working.

   To fix this, we add an attribute that is used to turn on these
   extended portions of the network device dump.  Sophisticaed
   applications like 'ip' that want to see this stuff  will be changed
   to set the attribute, whereas things like GLIBC that don't care
   about VFs simply will not, and therefore won't be busted by the
   mere presence of VFs on a network device.

   Thanks to the tireless work of Greg Rose on this fix.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
  sfc: Fix assignment of ip_summed for pre-allocated skbs
  ppp: fix 'ppp_mp_reconstruct bad seq' errors
  enic: Fix endianness bug.
  gre: fix spelling in comments
  netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
  Revert "netfilter: ctnetlink: fix soft lockup when netlink adds new entries"
  davinci_emac: Do not free all rx dma descriptors during init
  mlx4_core: Fixing array indexes when setting port types
  phy: IC+101G and PHY_HAS_INTERRUPT flag
  netdev/phy/icplus: Correct broken phy_init code
  ipsec: be careful of non existing mac headers
  Move Logitech Harmony 900 from cdc_ether to zaurus
  hso: memsetting wrong data in hso_get_count()
  netfilter: ip6_route_output() never returns NULL.
  ethernet/broadcom: ip6_route_output() never returns NULL.
  ipv6: ip6_route_output() never returns NULL.
  jme: Fix FIFO flush issue
  atm: clip: remove clip_tbl
  ipv4: ping: Fix recvmsg MSG_OOB error handling.
  rtnetlink: Fix problem with buffer allocation
  ...
2012-02-26 12:47:17 -08:00
Linus Torvalds
3c761ea05a Fix autofs compile without CONFIG_COMPAT
The autofs compat handling fix caused a compile failure when
CONFIG_COMPAT isn't defined.

Instead of adding random #ifdef'fery in autofs, let's just make the
compat helpers earlier to use: without CONFIG_COMPAT, is_compat_task()
just hardcodes to zero.

We could probably do something similar for a number of other cases where
we have #ifdef's in code, but this is the low-hanging fruit.

Reported-and-tested-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-26 09:44:55 -08:00
David S. Miller
e807e566e9 Merge branch 'master' of git://1984.lsi.us.es/net 2012-02-24 17:41:57 -05:00
Oleg Nesterov
d80e731eca epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()
This patch is intentionally incomplete to simplify the review.
It ignores ep_unregister_pollwait() which plays with the same wqh.
See the next change.

epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
f_op->poll() needs. In particular it assumes that the wait queue
can't go away until eventpoll_release(). This is not true in case
of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
which is not connected to the file.

This patch adds the special event, POLLFREE, currently only for
epoll. It expects that init_poll_funcptr()'ed hook should do the
necessary cleanup. Perhaps it should be defined as EPOLLFREE in
eventpoll.

__cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
->signalfd_wqh is not empty, we add the new signalfd_cleanup()
helper.

ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
This make this poll entry inconsistent, but we don't care. If you
share epoll fd which contains our sigfd with another process you
should blame yourself. signalfd is "really special". I simply do
not know how we can define the "right" semantics if it used with
epoll.

The main problem is, epoll calls signalfd_poll() once to establish
the connection with the wait queue, after that signalfd_poll(NULL)
returns the different/inconsistent results depending on who does
EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
has nothing to do with the file, it works with the current thread.

In short: this patch is the hack which tries to fix the symptoms.
It also assumes that nobody can take tasklist_lock under epoll
locks, this seems to be true.

Note:

	- we do not have wake_up_all_poll() but wake_up_poll()
	  is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.

	- signalfd_cleanup() uses POLLHUP along with POLLFREE,
	  we need a couple of simple changes in eventpoll.c to
	  make sure it can't be "lost".

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Cc: <stable@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-24 11:42:50 -08:00
Jozsef Kadlecsik
7d367e0668 netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
Marcell Zambo and Janos Farago noticed and reported that when
new conntrack entries are added via netlink and the conntrack table
gets full, soft lockup happens. This is because the nf_conntrack_lock
is held while nf_conntrack_alloc is called, which is in turn wants
to lock nf_conntrack_lock while evicting entries from the full table.

The patch fixes the soft lockup with limiting the holding of the
nf_conntrack_lock to the minimum, where it's absolutely required.
It required to extend (and thus change) nf_conntrack_hash_insert
so that it makes sure conntrack and ctnetlink do not add the same entry
twice to the conntrack table.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-24 12:24:15 +01:00
viresh kumar
7e55d0527e ARM: 7339/1: amba/serial.h: Include types.h for resolving dependency of type bool
serial.h uses bool, but its definition is missing, as it doesn't include
types.h. Fix this by including types.h

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-02-23 22:07:55 +00:00
Eric Dumazet
03606895cd ipsec: be careful of non existing mac headers
Niccolo Belli reported ipsec crashes in case we handle a frame without
mac header (atm in his case)

Before copying mac header, better make sure it is present.

Bugzilla reference:  https://bugzilla.kernel.org/show_bug.cgi?id=42809

Reported-by: Niccolò Belli <darkbasic@linuxsystems.it>
Tested-by: Niccolò Belli <darkbasic@linuxsystems.it>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-23 16:50:45 -05:00
David S. Miller
4a2258dddd Merge branch 'nf' of git://1984.lsi.us.es/net 2012-02-23 00:20:14 -05:00
Linus Torvalds
45196cee28 USB bugfixes for 3.3-rc4
A number of new device ids, and a cleanup/fix for some of the option
 device ids that shouldn't have been added in the first place.
 
 There's also a few USB 3 fixes for problems that people have reported,
 and a usb-storage bugfix to round it out.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iEYEABECAAYFAk9FVQMACgkQMUfUDdst+ym7RgCeNfqK8Oi7U+9rdd2hGGIElxTE
 6KgAnipmo5T5Wfls6sh0zPCv8Uh7K6Zb
 =SXPd
 -----END PGP SIGNATURE-----

Merge tag 'usb-3.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

USB bugfixes for 3.3-rc4

A number of new device ids, and a cleanup/fix for some of the option
device ids that shouldn't have been added in the first place.

There's also a few USB 3 fixes for problems that people have reported,
and a usb-storage bugfix to round it out.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

* tag 'usb-3.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: Added Kamstrup VID/PIDs to cp210x serial driver.
  USB: Serial: ti_usb_3410_5052: Add Abbot Diabetes Care cable id
  usb-storage: fix freezing of the scanning thread
  xhci: Fix encoding for HS bulk/control NAK rate.
  USB: Set hub depth after USB3 hub reset
  USB: Fix handoff when BIOS disables host PCI device.
  USB: option: cleanup zte 3g-dongle's pid in option.c
  USB: Don't fail USB3 probe on missing legacy PCI IRQ.
  xhci: Fix oops caused by more USB2 ports than USB3 ports.
  USB: Remove duplicate USB 3.0 hub feature #defines.
2012-02-22 13:00:53 -08:00
Linus Torvalds
437cf4c7b7 Bugfixes for the NFS client.
Fix a nasty Oops in the NFSv4 getacl code, another source of infinite loops
 in the NFSv4 state recovery code, and a regression in NFSv4.1 session
 initialisation.
 Also deal with an NFSv4.1 memory leak.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPRLALAAoJEGcL54qWCgDy5yoP/0eKjYERpqf00ETRnmQw6ngt
 PCaC33mUwfsJlLdSfW6hhG+2IhKkiWR4mraCo1Es9CYS7eSsPU9/djK62neHZG3s
 lJF2xcPlfvbiYtbyG2MsmtRffUl/XRbLLMfZMADgG4iQy1y4uTDxsaxcKrBIu5ig
 mQWlEpsYeE9TLea3Qvw6oHiXMKkrwdeR9Et5aGo3Y5hxbpDhD86yR1cyw2ds2aUP
 FmXk/ERqJDJpEt+uEQKFsMDzjcZ27r/nca/AhwE+cVQku6Fi9QdnFcdNtQmwFYq6
 iwYS/grUoW0tLV7Fv/iYNvZmVtXtS2Ng1VTR77gcLRXps0naEciuM5PO7MmuUe/j
 0aS/fV1s4/ZloRCu6l/gj2JmOAkh0joy6msQTcdEt4LQcvQSxtUmLQtWyLoGm6aa
 5LwTOFg25qRH6c7mJglSsiPdjgAcOIdwzH1gikSzSJeV2NjroZZ4nPW/noz0Idfq
 l36B9T3iHc0jZa6wr/Q/S1qF6RdfLYfuXcrDKcxJFDpuYNRNDN1jjOq+OKs5yGw9
 Fgui6r9/g0uDsEKWtqR30NzUnUx4PxY0hxduT04T2sxMMapU3oXcXWW4vRgly9be
 ASx/hAEnBovNwzHeKo9wVt3WdHqP2M4wMx8dD8hFLL1qPx9uj90IeclOVneNEHKa
 UMtC3BH1DBF44FYhZgT4
 =yq1J
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.3-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Bugfixes for the NFS client.

Fix a nasty Oops in the NFSv4 getacl code, another source of infinite
loops in the NFSv4 state recovery code, and a regression in NFSv4.1
session initialisation.

Also deal with an NFSv4.1 memory leak.

* tag 'nfs-for-3.3-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: fix server_scope memory leak
  NFSv4.1: Fix a NFSv4.1 session initialisation regression
  NFSv4: Ensure we throw out bad delegation stateids on NFS4ERR_BAD_STATEID
  NFSv4: Fix an Oops in the NFSv4 getacl code
2012-02-22 08:43:35 -08:00
Peter Zijlstra
8c79a045fd sched/events: Revert trace_sched_stat_sleeptime()
Commit 1ac9bc69 ("sched/tracing: Add a new tracepoint for sleeptime")
added a new sched:sched_stat_sleeptime tracepoint.

It's broken: the first sample we get on a task might be bad because
of a stale sleep_start value that wasn't reset at the last task switch
because the tracepoint was not active.

It also breaks the existing schedstat samples due to the side
effects of:

-               se->statistics.sleep_start = 0;
...
-               se->statistics.block_start = 0;

Nor do I see means to fix it without adding overhead to the scheduler
fast path, which I'm not willing to for the sake of redundant
instrumentation.

Most importantly, sleep time information can already be constructed
by tracing context switches and wakeups, and taking the timestamp
difference between the schedule-out, the wakeup and the schedule-in.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-pc4c9qhl8q6vg3bs4j6k0rbd@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-22 12:06:55 +01:00
Linus Torvalds
faf309009e sys_poll: fix incorrect type for 'timeout' parameter
The 'poll()' system call timeout parameter is supposed to be 'int', not
'long'.

Now, the reason this matters is that right now 32-bit compat mode is
broken on at least x86-64, because the 32-bit code just calls
'sys_poll()' directly on x86-64, and the 32-bit argument will have been
zero-extended, turning a signed 'int' into a large unsigned 'long'
value.

We could just introduce a 'compat_sys_poll()' function for this, and
that may eventually be what we have to do, but since the actual standard
poll() semantics is *supposed* to be 'int', and since at least on x86-64
glibc sign-extends the argument before invocing the system call (so
nobody can actually use a 64-bit timeout value in user space _anyway_,
even in 64-bit binaries), the simpler solution would seem to be to just
fix the definition of the system call to match what it should have been
from the very start.

If it turns out that somebody somehow circumvents the user-level libc
64-bit sign extension and actually uses a large unsigned 64-bit timeout
despite that not being how poll() is supposed to work, we will need to
do the compat_sys_poll() approach.

Reported-by: Thomas Meyer <thomas@m3y3r.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-21 17:24:20 -08:00
Hitoshi Mitake
797a796a13 asm-generic: architecture independent readq/writeq for 32bit environment
This provides unified readq()/writeq() helper functions for 32-bit
drivers.

For some cases, readq/writeq without atomicity is harmful, and order of
io access has to be specified explicitly.  So in this patch, new two
header files which contain non-atomic readq/writeq are added.

 - <asm-generic/io-64-nonatomic-lo-hi.h> provides non-atomic readq/
   writeq with the order of lower address -> higher address

 - <asm-generic/io-64-nonatomic-hi-lo.h> provides non-atomic readq/
   writeq with reversed order

This allows us to remove some readq()s that were added drivers when the
default non-atomic ones were removed in commit dbee8a0aff ("x86:
remove 32-bit versions of readq()/writeq()")

The drivers which need readq/writeq but can do with the non-atomic ones
must add the line:

  #include <asm-generic/io-64-nonatomic-lo-hi.h> /* or hi-lo.h */

But this will be nop in 64-bit environments, and no other #ifdefs are
required.  So I believe that this patch can solve the problem of
 1. driver-specific readq/writeq
 2. atomicity and order of io access

This patch is tested with building allyesconfig and allmodconfig as
ARCH=x86 and ARCH=i386 on top of tip/master.

Cc: Kashyap Desai <Kashyap.Desai@lsi.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Ravi Anand <ravi.anand@qlogic.com>
Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: James Bottomley <James.Bottomley@parallels.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-21 16:47:28 -08:00
Greg Rose
115c9b8192 rtnetlink: Fix problem with buffer allocation
Implement a new netlink attribute type IFLA_EXT_MASK.  The mask
is a 32 bit value that can be used to indicate to the kernel that
certain extended ifinfo values are requested by the user application.
At this time the only mask value defined is RTEXT_FILTER_VF to
indicate that the user wants the ifinfo dump to send information
about the VFs belonging to the interface.

This patch fixes a bug in which certain applications do not have
large enough buffers to accommodate the extra information returned
by the kernel with large numbers of SR-IOV virtual functions.
Those applications will not send the new netlink attribute with
the interface info dump request netlink messages so they will
not get unexpectedly large request buffers returned by the kernel.

Modifies the rtnl_calcit function to traverse the list of net
devices and compute the minimum buffer size that can hold the
info dumps of all matching devices based upon the filter passed
in via the new netlink attribute filter mask.  If no filter
mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.

With this change it is possible to add yet to be defined netlink
attributes to the dump request which should make it fairly extensible
in the future.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 16:56:45 -05:00
Ming Lei
e920d5971d percpu: use raw_local_irq_* in _this_cpu op
It doesn't make sense to trace irq off or do irq flags
lock proving inside 'this_cpu' operations, so replace local_irq_*
with raw_local_irq_* in 'this_cpu' op.

Also the patch fixes onelockdep warning[1] by the replacement, see
below:

In commit: 933393f58fef9963eac61db8093689544e29a600(percpu:
Remove irqsafe_cpu_xxx variants), local_irq_save/restore(flags) are
added inside this_cpu_inc operation, so that trace_hardirqs_off_caller
will be called by trace_hardirqs_on_caller directly because
__debug_atomic_inc is implemented as this_cpu_inc, which may trigger
the lockdep warning[1], for example in the below ARM scenary:

	kernel_thread_helper	/*irq disabled*/
		->trace_hardirqs_on_caller	/*hardirqs_enabled was set*/
			->trace_hardirqs_off_caller	/*hardirqs_enabled cleared*/
				__this_cpu_add(redundant_hardirqs_on)
			->trace_hardirqs_off_caller	/*irq disabled, so call here*/

The 'unannotated irqs-on' warning will be triggered somewhere because
irq is just enabled after the irq trace in kernel_thread_helper.

[1],
[    0.162841] ------------[ cut here ]------------
[    0.167694] WARNING: at kernel/lockdep.c:3493 check_flags+0xc0/0x1d0()
[    0.174468] Modules linked in:
[    0.177703] Backtrace:
[    0.180328] [<c00171f0>] (dump_backtrace+0x0/0x110) from [<c0412320>] (dump_stack+0x18/0x1c)
[    0.189086]  r6:c051f778 r5:00000da5 r4:00000000 r3:60000093
[    0.195007] [<c0412308>] (dump_stack+0x0/0x1c) from [<c00410e8>] (warn_slowpath_common+0x54/0x6c)
[    0.204223] [<c0041094>] (warn_slowpath_common+0x0/0x6c) from [<c0041124>] (warn_slowpath_null+0x24/0x2c)
[    0.214111]  r8:00000000 r7:00000000 r6:ee069598 r5:60000013 r4:ee082000
[    0.220825] r3:00000009
[    0.223693] [<c0041100>] (warn_slowpath_null+0x0/0x2c) from [<c0088f38>] (check_flags+0xc0/0x1d0)
[    0.232910] [<c0088e78>] (check_flags+0x0/0x1d0) from [<c008d348>] (lock_acquire+0x4c/0x11c)
[    0.241668] [<c008d2fc>] (lock_acquire+0x0/0x11c) from [<c0415aa4>] (_raw_spin_lock+0x3c/0x74)
[    0.250610] [<c0415a68>] (_raw_spin_lock+0x0/0x74) from [<c010a844>] (set_task_comm+0x20/0xc0)
[    0.259521]  r6:ee069588 r5:ee0691c0 r4:ee082000
[    0.264404] [<c010a824>] (set_task_comm+0x0/0xc0) from [<c0060780>] (kthreadd+0x28/0x108)
[    0.272857]  r8:00000000 r7:00000013 r6:c0044a08 r5:ee0691c0 r4:ee082000
[    0.279571] r3:ee083fe0
[    0.282470] [<c0060758>] (kthreadd+0x0/0x108) from [<c0044a08>] (do_exit+0x0/0x6dc)
[    0.290405]  r5:c0060758 r4:00000000
[    0.294189] ---[ end trace 1b75b31a2719ed1c ]---
[    0.299041] possible reason: unannotated irqs-on.
[    0.303955] irq event stamp: 5
[    0.307159] hardirqs last  enabled at (4): [<c001331c>] no_work_pending+0x8/0x2c
[    0.314880] hardirqs last disabled at (5): [<c0089b08>] trace_hardirqs_on_caller+0x60/0x26c
[    0.323547] softirqs last  enabled at (0): [<c003f754>] copy_process+0x33c/0xef4
[    0.331207] softirqs last disabled at (0): [<  (null)>]   (null)
[    0.337585] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-02-21 09:23:46 -08:00
Konstantin Khlebnikov
7d96b3e55a percpu: fix generic definition of __this_cpu_add_and_return()
This patch adds missed "__" into function prefix.
Otherwise on all archectures (except x86) it expands to irq/preemtion-safe
variant: _this_cpu_generic_add_return(), which do extra irq-save/irq-restore.
Optimal generic implementation is __this_cpu_generic_add_return().

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-02-21 08:57:10 -08:00
Joerg Willmann
88ba136d66 netfilter: ebtables: fix alignment problem in ppc
ebt_among extension of ebtables uses __alignof__(_xt_align) while the
corresponding kernel module uses __alignof__(ebt_replace) to determine
the alignment in EBT_ALIGN().

These are the results of these values on different platforms:

x86 x86_64 ppc
__alignof__(_xt_align) 4 8 8
__alignof__(ebt_replace) 4 8 4

ebtables fails to add rules which use the among extension.

I'm using kernel 2.6.33 and ebtables 2.0.10-4

According to Bart De Schuymer, userspace alignment was changed to
_xt_align to fix an alignment issue on a userspace32-kernel64 system
(he thinks it was for an ARM device). So userspace must be right.
The kernel alignment macro needs to change so it also uses _xt_align
instead of ebt_replace. The userspace changes date back from
June 29, 2009.

Signed-off-by: Joerg Willmann <joe@clnt.de>
Signed-off by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-21 13:29:06 +01:00
Linus Torvalds
8ebbfb4957 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Assorted fixes, sat in -next for a week or so...

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ocfs2: deal with wraparounds of i_nlink in ocfs2_rename()
  vfs: fix compat_sys_stat() handling of overflows in st_nlink
  quota: Fix deadlock with suspend and quotas
  vfs: Provide function to get superblock and wait for it to thaw
  vfs: fix panic in __d_lookup() with high dentry hashtable counts
  autofs4 - fix lockdep splat in autofs
  vfs: fix d_inode_lookup() dentry ref leak
2012-02-20 16:13:58 -08:00
John W. Linville
9d4990a260 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-02-20 14:47:17 -05:00