mm: fix kthread_use_mm() vs TLB invalidate

For SMP systems using IPI based TLB invalidation, it is entirely
reasonable for the IPI handler to look at current->active_mm.  This
then presents the following race condition:

  CPU0			CPU1

  flush_tlb_mm(mm)	use_mm(mm)
    <send-IPI>
			  tsk->active_mm = mm;
			  <IPI>
			    if (tsk->active_mm == mm)
			      // flush TLBs
			  </IPI>
			  switch_mm(old_mm,mm,tsk);

Here the IPI can end up flushing the TLBs for @old_mm rather than @mm,
because it lands after ->active_mm has been updated but before we have
actually switched.

Avoid this by disabling IRQs across changing ->active_mm and
switch_mm().
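
The interleaving above can be replayed in a small userspace model.  This
is only a sketch under stated assumptions: struct cpu_model,
deliver_ipi(), flush_ipi() and model_use_mm() are hypothetical names made
up for illustration, not kernel APIs, and the model assumes the IPI
arrives at the worst possible moment, between the ->active_mm update and
the switch.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code: two address spaces. */
enum { MM_NONE = -1, MM_OLD = 0, MM_NEW = 1 };

struct cpu_model {
	int active_mm;		/* models tsk->active_mm */
	int hw_mm;		/* address space actually loaded in the MMU */
	bool irqs_off;		/* models local_irq_disable()/enable() */
	bool ipi_pending;	/* a flush IPI for MM_NEW is waiting */
	int flushed_mm;		/* which mm the handler really flushed */
};

/* Models an "if (tsk->active_mm == mm) flush" style IPI handler. */
static void flush_ipi(struct cpu_model *c, int mm)
{
	if (c->active_mm == mm)
		c->flushed_mm = c->hw_mm; /* the flush hits what is loaded */
}

/* A pending IPI is only taken while interrupts are enabled. */
static void deliver_ipi(struct cpu_model *c)
{
	if (c->ipi_pending && !c->irqs_off) {
		c->ipi_pending = false;
		flush_ipi(c, MM_NEW);
	}
}

/* Returns the mm whose TLB entries the flush IPI ended up flushing. */
int model_use_mm(bool disable_irqs)
{
	struct cpu_model c = {
		.active_mm = MM_OLD,
		.hw_mm = MM_OLD,
		.irqs_off = false,
		.ipi_pending = false,
		.flushed_mm = MM_NONE,
	};

	c.irqs_off = disable_irqs;	/* local_irq_disable(), if enabled */
	c.active_mm = MM_NEW;		/* tsk->active_mm = mm */
	c.ipi_pending = true;		/* the IPI from CPU0 arrives here */
	deliver_ipi(&c);		/* held off while irqs_off is set */
	c.hw_mm = MM_NEW;		/* switch_mm() */
	c.irqs_off = false;		/* local_irq_enable() */
	deliver_ipi(&c);		/* a held-off IPI is handled now */
	return c.flushed_mm;
}
```

In this model, model_use_mm(false) reports that the flush hit MM_OLD,
while model_use_mm(true) reports MM_NEW, because the pending IPI is only
handled once ->active_mm and the loaded address space agree again.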

Of the (SMP) architectures that have IPI based TLB invalidate:

  Alpha    - checks active_mm
  ARC      - ASID specific
  IA64     - checks active_mm
  MIPS     - ASID specific flush
  OpenRISC - shoots down world
  PARISC   - shoots down world
  SH       - ASID specific
  SPARC    - ASID specific
  x86      - N/A
  Xtensa   - checks active_mm

So at the very least Alpha, IA64 and Xtensa are suspect.

On top of this, for scheduler consistency we need at least preemption
disabled across changing tsk->mm and doing switch_mm().  This is
currently provided by task_lock(), but that is not sufficient for
PREEMPT_RT, where spinlocks are sleeping locks and do not disable
preemption.

[akpm@linux-foundation.org: add comment]

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200721154106.GE10769@hirez.programming.kicks-ass.net
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra 2020-08-06 23:17:16 -07:00 committed by Linus Torvalds
parent 4a93025cbe
commit 38cf307c1f

@@ -1241,13 +1241,16 @@ void kthread_use_mm(struct mm_struct *mm)
 	WARN_ON_ONCE(tsk->mm);
 	task_lock(tsk);
+	/* Hold off tlb flush IPIs while switching mm's */
+	local_irq_disable();
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
 		mmgrab(mm);
 		tsk->active_mm = mm;
 	}
 	tsk->mm = mm;
-	switch_mm(active_mm, mm, tsk);
+	switch_mm_irqs_off(active_mm, mm, tsk);
+	local_irq_enable();
 	task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
 	finish_arch_post_lock_switch();
@@ -1276,9 +1279,11 @@ void kthread_unuse_mm(struct mm_struct *mm)
 	task_lock(tsk);
 	sync_mm_rss(mm);
+	local_irq_disable();
 	tsk->mm = NULL;
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
+	local_irq_enable();
 	task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(kthread_unuse_mm);