This allows user space to start/stop NAN interface.
A NAN interface is like P2P device in a few aspects: it
doesn't have a netdev associated to it.
Add the new interface type and prevent operations that
can't be executed on NAN interface like scan.
Define several attributes that may be configured by user space
when starting NAN functionality (master preference and dual
band operation)
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add support for drivers that implement static WEP internally, i.e.
expose connection keys to the driver in connect flow and don't
upload the keys after the connection.
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The TXQ path restructure requires ieee80211_tx_dequeue() to call TX
handlers and parts of the xmit_fast path. Move the function to later in
tx.c in preparation for this.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We use __read_cr4() vs __read_cr4_safe() inconsistently. On
CR4-less CPUs, all CR4 bits are effectively clear, so we can make
the code simpler and more robust by making __read_cr4() always fix
up faults on 32-bit kernels.
This may fix some bugs on old 486-like CPUs, but I don't have any
easy way to test that.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: david@saggiorato.net
Link: http://lkml.kernel.org/r/ea647033d357d9ce2ad2bbde5a631045f5052fb6.1475178370.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We need to call GET_LE to read hdr->e_type.
Fixes: 57f90c3dfc ("x86/vdso: Error out if the vDSO isn't a valid DSO")
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: linux-next@vger.kernel.org
Link: http://lkml.kernel.org/r/20160929193442.GA16617@gate.crashing.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The condition for reading CR4 was wrong: there are some CPUs with
CPUID but not CR4. Rather than trying to make the condition exact,
use __read_cr4_safe().
Fixes: 18bc7bd523 ("x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly")
Reported-by: david@saggiorato.net
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Link: http://lkml.kernel.org/r/8c453a61c4f44ab6ff43c29780ba04835234d2e5.1475178369.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The code performing irqtime nsecs stats flushing to kcpustat is roughly
the same for hardirq and softirq. So lets consolidate that common code.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1474849761-12678-6-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The irqtime accounting currently implement its own ad hoc implementation
of u64_stats API. Lets rather consolidate it with the appropriate
library.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1474849761-12678-5-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce light versions of u64_stats helpers for context where
either preempt or IRQs are disabled. This way we can make this library
usable by scheduler irqtime accounting which currenty implement its
ad-hoc version.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1474849761-12678-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The callers of the functions performing irqtime kcpustat updates have
IRQS disabled, no need to disable them again.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1474849761-12678-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We can safely use the preempt-unsafe accessors for irqtime when we
flush its counters to kcpustat as IRQs are disabled at this time.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1474849761-12678-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While going through enqueue/dequeue to review the movement of
set_curr_task() I noticed that the (2nd) update_min_vruntime() call in
dequeue_entity() is suspect.
It turns out, its actually wrong because it will consider
cfs_rq->curr, which could be the entry we just normalized. This mixes
different vruntime forms and leads to fail.
The purpose of the second update_min_vruntime() is to move
min_vruntime forward if the entity we just removed is the one that was
holding it back; _except_ for the DEQUEUE_SAVE case, because then we
know its a temporary removal and it will come back.
However, since we do put_prev_task() _after_ dequeue(), cfs_rq->curr
will still be set (and per the above, can be tranformed into a
different unit), so update_min_vruntime() should also consider
curr->on_rq. This also fixes another corner case where the enqueue
(which also does update_curr()->update_min_vruntime()) happens on the
rq->lock break in schedule(), between dequeue and put_prev_task.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: 1e87623178 ("sched: Fix ->min_vruntime calculation in dequeue_entity()")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Provide SCHED_WARN_ON as wrapper for WARN_ON_ONCE() to avoid
CONFIG_SCHED_DEBUG wrappery.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Almost all scheduler functions update state with the following
pattern:
if (queued)
dequeue_task(rq, p, DEQUEUE_SAVE);
if (running)
put_prev_task(rq, p);
/* update state */
if (queued)
enqueue_task(rq, p, ENQUEUE_RESTORE);
if (running)
set_curr_task(rq, p);
set_user_nice() however misses the running part, cure this.
This was found by asserting we never enqueue 'current'.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that the ia64 only set_curr_task() symbol is gone, provide a
helper just like put_prev_task().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rename the ia64 only set_curr_task() function to free up the name.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When a task switches to fair scheduling class, the period between now
and the last update of its utilization is accounted as running time
whatever happened during this period. This incorrect accounting applies
to the task and also to the task group branch.
When changing the property of a running task like its list of allowed
CPUs or its scheduling class, we follow the sequence:
- dequeue task
- put task
- change the property
- set task as current task
- enqueue task
The end of the sequence doesn't follow the normal sequence (as per
__schedule()) which is:
- enqueue a task
- then set the task as current task.
This incorrectordering is the root cause of incorrect utilization accounting.
Update the sequence to follow the right one:
- dequeue task
- put task
- change the property
- enqueue task
- set task as current task
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: linaro-kernel@lists.linaro.org
Cc: pjt@google.com
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1473666472-13749-8-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
select_idle_siblings() is a known pain point for a number of
workloads; it either does too much or not enough and sometimes just
does plain wrong.
This rewrite attempts to address a number of issues (but sadly not
all).
The current code does an unconditional sched_domain iteration; with
the intent of finding an idle core (on SMT hardware). The problems
which this patch tries to address are:
- its pointless to look for idle cores if the machine is real busy;
at which point you're just wasting cycles.
- it's behaviour is inconsistent between SMT and !SMT hardware in
that !SMT hardware ends up doing a scan for any idle CPU in the LLC
domain, while SMT hardware does a scan for idle cores and if that
fails, falls back to a scan for idle threads on the 'target' core.
The new code replaces the sched_domain scan with 3 explicit scans:
1) search for an idle core in the LLC
2) search for an idle CPU in the LLC
3) search for an idle thread in the 'target' core
where 1 and 3 are conditional on SMT support and 1 and 2 have runtime
heuristics to skip the step.
Step 1) is conditional on sd_llc_shared->has_idle_cores; when a cpu
goes idle and sd_llc_shared->has_idle_cores is false, we scan all SMT
siblings of the CPU going idle. Similarly, we clear
sd_llc_shared->has_idle_cores when we fail to find an idle core.
Step 2) tracks the average cost of the scan and compares this to the
average idle time guestimate for the CPU doing the wakeup. There is a
significant fudge factor involved to deal with the variability of the
averages. Esp. hackbench was sensitive to this.
Step 3) is unconditional; we assume (also per step 1) that scanning
all SMT siblings in a core is 'cheap'.
With this; SMT systems gain step 2, which cures a few benchmarks --
notably one from Facebook.
One 'feature' of the sched_domain iteration, which we preserve in the
new code, is that it would start scanning from the 'target' CPU,
instead of scanning the cpumask in cpu id order. This avoids multiple
CPUs in the LLC scanning for idle to gang up and find the same CPU
quite as much. The down side is that tasks can end up hopping across
the LLC for no apparent reason.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
cmpxchg contained definitions for unused (x)add_* operations, dating back
to the original ticket spinlock implementation. Nowadays these are
unused so remove them.
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1474913478-17757-1-git-send-email-n.borisov.lkml@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the nr_busy_cpus thing from its hacky sd->parent->groups->sgc
location into the much more natural sched_domain_shared location.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since struct sched_domain is strictly per cpu; introduce a structure
that is shared between all 'identical' sched_domains.
Limit to SD_SHARE_PKG_RESOURCES domains for now, as we'll only use it
for shared cache state; if another use comes up later we can easily
relax this.
While the sched_group's are normally shared between CPUs, these are
not natural to use when we need some shared state on a domain level --
since that would require the domain to have a parent, which is not a
given.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is no point in doing a call_rcu() for each domain, only do a
callback for the root sched domain and clean up the entire set in one
go.
Also make the entire call chain be called destroy_sched_domain*() to
remove confusion with the free_sched_domains() call, which does an
entirely different thing.
Both cpu_attach_domain() callers of destroy_sched_domain() can live
without the call_rcu() because at those points the sched_domain hasn't
been published yet.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Small cleanup; nothing uses the @cpu argument so make it go away.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The partial initialization of wait_queue_t in prepare_to_wait_event() looks
ugly. This was done to shrink .text, but we can simply add the new helper
which does the full initialization and shrink the compiled code a bit more.
And. This way prepare_to_wait_event() can have more users. In particular we
are ready to remove the signal_pending_state() checks from wait_bit_action_f
helpers and change __wait_on_bit_lock() to use prepare_to_wait_event().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160906140055.GA6167@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
__wait_on_bit_lock() doesn't need abort_exclusive_wait() too. Right
now it can't use prepare_to_wait_event() (see the next change), but
it can do the additional finish_wait() if action() fails.
abort_exclusive_wait() no longer has callers, remove it.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160906140053.GA6164@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
___wait_event() doesn't really need abort_exclusive_wait(), we can simply
change prepare_to_wait_event() to remove the waiter from q->task_list if
it was interrupted.
This simplifies the code/logic, and this way prepare_to_wait_event() can
have more users, see the next change.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160908164815.GA18801@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
--
include/linux/wait.h | 7 +------
kernel/sched/wait.c | 35 +++++++++++++++++++++++++----------
2 files changed, 26 insertions(+), 16 deletions(-)
Otherwise this logic only works if mode is "compatible" with another
exclusive waiter.
If some wq has both TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE waiters,
abort_exclusive_wait() won't wait an uninterruptible waiter.
The main user is __wait_on_bit_lock() and currently it is fine but only
because TASK_KILLABLE includes TASK_UNINTERRUPTIBLE and we do not have
lock_page_interruptible() yet.
Just use TASK_NORMAL and remove the "mode" arg from abort_exclusive_wait().
Yes, this means that (say) wake_up_interruptible() can wake up the non-
interruptible waiter(s), but I think this is fine. And in fact I think
that abort_exclusive_wait() must die, see the next change.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160906140047.GA6157@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since commit:
2159197d66 ("sched/core: Enable increased load resolution on 64-bit kernels")
we now have two different fixed point units for load:
- 'shares' in calc_cfs_shares() has 20 bit fixed point unit on 64-bit
kernels. Therefore use scale_load() on MIN_SHARES.
- 'wl' in effective_load() has 10 bit fixed point unit. Therefore use
scale_load_down() on tg->shares which has 20 bit fixed point unit on
64-bit kernels.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471874441-24701-1-git-send-email-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Current code can call set_cpu_sibling_map() and invoke sched_set_topology()
more than once (e.g. on CPU hot plug). When this happens after
sched_init_smp() has been called, we lose the NUMA topology extension to
sched_domain_topology in sched_init_numa(). This results in incorrect
topology when the sched domain is rebuilt.
This patch fixes the bug and issues warning if we call sched_set_topology()
after sched_init_smp().
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/1474485552-141429-2-git-send-email-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A while back, Paolo and Hannes sent an RFC patch adding threaded-able
napi poll loop support : (https://patchwork.ozlabs.org/patch/620657/)
The problem seems to be that softirqs are very aggressive and are often
handled by the current process, even if we are under stress and that
ksoftirqd was scheduled, so that innocent threads would have more chance
to make progress.
This patch makes sure that if ksoftirq is running, we let it
perform the softirq work.
Jonathan Corbet summarized the issue in https://lwn.net/Articles/687617/
Tested:
- NIC receiving traffic handled by CPU 0
- UDP receiver running on CPU 0, using a single UDP socket.
- Incoming flood of UDP packets targeting the UDP socket.
Before the patch, the UDP receiver could almost never get CPU cycles and
could only receive ~2,000 packets per second.
After the patch, CPU cycles are split 50/50 between user application and
ksoftirqd/0, and we can effectively read ~900,000 packets per second,
a huge improvement in DOS situation. (Note that more packets are now
dropped by the NIC itself, since the BH handlers get less CPU cycles to
drain RX ring buffer)
Since the load runs in well identified threads context, an admin can
more easily tune process scheduling parameters if needed.
Reported-by: Paolo Abeni <pabeni@redhat.com>
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1472665349.14381.356.camel@edumazet-glaptop3.roam.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This file was only including module.h for exception table related
functions. We've now separated that content out into its own file
"extable.h" so now move over to that and avoid all the extra header
content in module.h that we don't really need to compile this.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
con3270 contains an optimisation that reduces the amount of data to be
transmitted to the 3270 terminal by putting a Repeat to Address (RA)
order into the data stream. The RA order itself takes up space, so
con3270 only uses it if there's enough space left in the line
buffer. Otherwise it just pads out the line manually.
For lines that were _just_ short enough that the RA order still fit in
the line buffer, the line was instead padded with an insufficient
amount of spaces. This was caused by examining the size of the
allocated line buffer rather than the length of the string to be
displayed.
For con3270_cline_end(), we just compare against the line length. For
con3270_update_string() however that isn't available anymore, so we
check whether the Repeat to Address order is present.
Fixes: f51320a5 ("[PATCH] s390: new 3270 driver.") (tglx/history.git)
Tested-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Tested-by: Yang Chen <bjcyang@linux.vnet.ibm.com>
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
con3270 contains an optimisation that reduces the amount of data to be
transmitted to the 3270 terminal by putting a Repeat to Address (RA)
order into the data stream. The RA order itself takes up space, so
con3270 only uses it if there's enough space left in the line
buffer. Otherwise it just pads out the line manually.
For lines too long to include the RA order, one byte was left
uninitialised. This was caused by an off-by-one bug in the loop that
pads out the line. Since the buffer is allocated from a common pool,
the single byte left uninitialised contained some previous buffer
content. Usually this was just a space or some character (which can
result in clutter but is otherwise harmless). Sometimes, however, it
was a Repeat to Address order, messing up the entire screen layout and
causing the display to send the entire buffer content on every
keystroke.
Fixes: f51320a5 ("[PATCH] s390: new 3270 driver.") (tglx/history.git)
Reported-by: Liu Jing <liujbjl@linux.vnet.ibm.com>
Tested-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Tested-by: Yang Chen <bjcyang@linux.vnet.ibm.com>
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since commit 9f3d6d7 chsc_get_channel_measurement_chars is called with
interrupts disabled during resume from hibernate. Since this function
used spin_unlock_irq, interrupts have been enabled accidentally. Fix
this by using the irqsave variant.
Since we can't guarantee the IRQ-enablement state for all (future/
external) callers, change the locking in related functions to prevent
similar bugs in the future.
Fixes: 9f3d6d7 ("s390/cio: update measurement characteristics")
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This implements ndo_poll_controller in net_device_ops callbacks for mlx5,
which is necessary to use netconsole with this driver.
Acked-By: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set upper 32 bits of destination register to zeros after
load from the context structure.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sctp dumps all the ep->assocs, it needs to lock_sock first,
but now it locks sock in rcu_read_lock, and lock_sock may sleep,
which would break rcu_read_lock.
This patch is to get and hold one sock when traversing the list.
After that and get out of rcu_read_lock, lock and dump it. Then
it will traverse the list again to get the next one until all
sctp socks are dumped.
For sctp_diag_dump_one, it fixes this issue by holding asoc and
moving cb() out of rcu_read_lock in sctp_transport_lookup_process.
Fixes: 8f840e47f1 ("sctp: add the sctp_diag.c file")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
sctp: a bunch of fixes for prsctp polices
This patchset is to fix 2 issues for prsctp polices:
1. patch 1 and 2 fix "netperf-Throughput_Mbps -37.2% regression" issue
when overloading the CPU.
2. patch 3 fix "prsctp polices should check both sides' prsctp_capable,
instead of only local side".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now before using prsctp polices, sctp uses asoc->prsctp_enable to
check if prsctp is enabled. However asoc->prsctp_enable is set only
means local host support prsctp, sctp should not abandon packet if
peer host doesn't enable prsctp.
So this patch is to use asoc->peer.prsctp_capable to check if prsctp
is enabled on both side, instead of asoc->prsctp_enable, as asoc's
peer.prsctp_capable is set only when local and peer both enable prsctp.
Fixes: a6c2f79287 ("sctp: implement prsctp TTL policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now sctp uses chunk->prsctp_param to save the prsctp param for all the
prsctp polices, we didn't need to introduce prsctp_param to sctp_chunk.
We can just use chunk->sinfo.sinfo_timetolive for RTX and BUF polices,
and reuse msg->expires_at for TTL policy, as the prsctp polices and old
expires policy are mutual exclusive.
This patch is to remove prsctp_param from sctp_chunk, and reuse msg's
expires_at for TTL and chunk's sinfo.sinfo_timetolive for RTX and BUF
polices.
Note that sctp can't use chunk's sinfo.sinfo_timetolive for TTL policy,
as it needs a u64 variables to save the expires_at time.
This one also fixes the "netperf-Throughput_Mbps -37.2% regression"
issue.
Fixes: a6c2f79287 ("sctp: implement prsctp TTL policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now pahole sctp_chunk, it has 2 memory holes:
struct sctp_chunk {
struct list_head list;
atomic_t refcnt;
/* XXX 4 bytes hole, try to pack */
...
long unsigned int prsctp_param;
int sent_count;
/* XXX 4 bytes hole, try to pack */
This patch is to move up sent_count to fill the 1st one and eliminate
the 2nd one.
It's not just another struct compaction, it also fixes the "netperf-
Throughput_Mbps -37.2% regression" issue when overloading the CPU.
Fixes: a6c2f79287 ("sctp: implement prsctp TTL policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This also can address following UBSAN warnings:
[ 36.640343] ================================================================================
[ 36.648772] UBSAN: Undefined behaviour in drivers/net/ethernet/mellanox/mlx4/fw.c:857:26
[ 36.656853] shift exponent 64 is too large for 32-bit type 'int'
[ 36.663348] ================================================================================
[ 36.671783] ================================================================================
[ 36.680213] UBSAN: Undefined behaviour in drivers/net/ethernet/mellanox/mlx4/fw.c:861:27
[ 36.688297] shift exponent 35 is too large for 32-bit type 'int'
[ 36.694702] ================================================================================
Tested:
reboot with UBSAN, no warning.
Signed-off-by: David Decotigny <decot@googlers.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements:
https://tools.ietf.org/html/rfc7559
Backoff is performed according to RFC3315 section 14:
https://tools.ietf.org/html/rfc3315#section-14
We allow setting /proc/sys/net/ipv6/conf/*/router_solicitations
to a negative value meaning an unlimited number of retransmits,
and we make this the new default (inline with the RFC).
We also add a new setting:
/proc/sys/net/ipv6/conf/*/router_solicitation_max_interval
defaulting to 1 hour (per RFC recommendation).
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Acked-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>