Commit Graph

298 Commits

Author SHA1 Message Date
Vineet Gupta
aa0efcde45 ARCv2: smp: [plat-*]: No need to explicitly call mcip_init_smp()
MCIP now registers it's own per cpu setup routine (for IPI IRQ request)
using smp_ops.init_irq_cpu().

So no need for platforms to do that. This now completely decouples
platforms from MCIP.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-28 16:13:41 +05:30
Vineet Gupta
286130ebf1 ARC: smp: Introduce smp hook @init_irq_cpu called for all cores
Note this is not part of platform owned static machine_desc,
but more of device owned plat_smp_ops (rather misnamed) which a IPI
provider or some such typically defines.

This will help us seperate out the IPI registration from platform
specific init_cpu_smp() into device specific init_irq_cpu()

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-28 16:13:41 +05:30
Vineet Gupta
8721a7f5a6 ARC: smp: Rename platform hook @init_smp -> @init_cpu_smp
This conveys better that it is called for each cpu

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-28 16:13:40 +05:30
Vineet Gupta
26b8f99623 ARCv2: smp: [plat-*]: No need to explicitly call mcip_init_early_smp()
MCIP now registers it's own probe callback with smp_ops.init_early_smp()
which is called by ARC common code, so no need for platforms to do that.

This decouples the platforms and MCIP and helps confine MCIP details
to it's own file.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-28 16:13:40 +05:30
Vineet Gupta
e55af4da02 ARC: smp: Introduce smp hook @init_early_smp for Master core
This adds a platform agnostic early SMP init hook which is called on
Master core before calling setup_processor()

  setup_arch()
     smp_init_cpus()
         smp_ops.init_early_smp()
     ...
     setup_processor()

How this helps:
 - Used for one time init of certain SMP centric IP blocks, before
   calling setup_processor() which probes various bits of core,
   possibly including this block

 - Currently platforms need to call this IP block init from their
   init routines, which doesn't make sense as this is specific to ARC
   core and not platform and otherwise requires copy/paste in all
   (and hence a possible point of failure)

e.g. MCIP init is called from 2 platforms currently (axs10x and sim)
which will go away once we have this.

This change only adds the hooks but they are empty for now. Next commit
will populate them and remove the explicit init calls from platforms.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-28 16:13:40 +05:30
Vineet Gupta
4c82f28617 ARC: remove @init_time, @init_irq platform callbacks
These are not in use for ARC platforms. Moreover DT mechanims exist to
probe them w/o explicit platform calls.

 - clocksource drivers can use CLOCKSOURCE_OF_DECLARE()
 - intc IRQCHIP_DECLARE() calls + cascading inside DT allows external
   intc to be probed automatically

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-28 16:13:39 +05:30
Vineet Gupta
e0868e6f67 ARC: smp: irqchip: handle IPI as percpu irq like timer
The reason this was not done so far was lack of genuine IPI_IRQ for
ARC700, as we don't have a SMP version of core yet (which might change
soon thx to EZChip). Nevertheles to increase the build coverage, we
need to allow CONFIG_SMP for ARC700 and still be able to run it on a
UP platform (nsim or AXS101) with a UP Device Tree (SMP-on-UP)

The build itself requires some define for IPI_IRQ and even a dummy
value is fine since that code won't run anyways.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-28 16:13:39 +05:30
Vineet Gupta
d0890ea5b6 ARC: boot log: decode more mmu config items
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-17 17:48:26 +05:30
Vineet Gupta
964cf28f9d ARC: boot log: move helper macros to header for reuse
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-17 17:48:25 +05:30
Vineet Gupta
b598e17f6a ARC: mm: compute TLB size as needed from ways * sets
This frees up some bits to hold more high level info such as PAE being
present, w/o increasing the size of already bloated cpuinfo struct

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-17 17:48:25 +05:30
Vineet Gupta
5c35ee642a ARC: make write_aux_reg safer against macro substitution
It was generating warnings when called as write_aux_reg(x, paddr >> 32)

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-17 17:48:24 +05:30
Vineet Gupta
55a2ae775a ARC: [arcompact] entry.S: Improve early return from exception
The requirement is to
 - Reenable Exceptions (AE cleared)
 - Reenable Interrupts (E1/E2 set)

We need to do wiggle these bits into ERSTATUS and call RTIE.

Prev version used the pre-exception STATUS32 as starting point for what
goes into ERSTATUS. This required explicit fixups of U/DE/L bits.

Instead, use the current (in-exception) STATUS32 as starting point.
Being in exception handler U/DE/L can be safely assumed to be correct.
Only AE/E1/E2 need to be fixed.

So the new implementation is slightly better
 -Avoids read form memory
 -Is 4 bytes smaller for the typical 1 level of intr configuration
 -Depicts the semantics more clearly

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-17 17:48:22 +05:30
Vineet Gupta
9dbd3d9bfd ARC: [arcompact] don't check for hard isr calling local_irq_enable()
Historically this was done by ARC IDE driver, which is long gone.
IRQ core is pretty robust now and already checks if IRQs are enabled
in hard ISRs. Thus no point in checking this in arch code, for every
call of irq enabled.

Further if some driver does do that - let it bring down the system so we
notice/fix this sooner than covering up for sucker

This makes local_irq_enable() - for L1 only case atleast simple enough
so we can inline it.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-17 17:48:22 +05:30
Vineet Gupta
c7119d56d2 ARCv2: mm: THP: flush_pmd_tlb_range make SMP safe
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-17 17:48:21 +05:30
Vineet Gupta
722fe8fd36 ARCv2: mm: THP: Implement flush_pmd_tlb_range() optimization
Implement the TLB flush routine to evict a sepcific Super TLB entry,
vs. moving to a new ASID on every such flush.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-17 17:48:21 +05:30
Vineet Gupta
fe6c1b8611 ARCv2: mm: THP support
MMUv4 in HS38x cores supports Super Pages which are basis for Linux THP
support.

Normal and Super pages can co-exist (ofcourse not overlap) in TLB with a
new bit "SZ" in TLB page desciptor to distinguish between them.
Super Page size is configurable in hardware (4K to 16M), but fixed once
RTL builds.

The exact THP size a Linx configuration will support is a function of:
 - MMU page size (typical 8K, RTL fixed)
 - software page walker address split between PGD:PTE:PFN (typical
   11:8:13, but can be changed with 1 line)

So for above default, THP size supported is 8K * 256 = 2M

Default Page Walker is 2 levels, PGD:PTE:PFN, which in THP regime
reduces to 1 level (as PTE is folded into PGD and canonically referred
to as PMD).

Thus thp PMD accessors are implemented in terms of PTE (just like sparc)

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-17 17:48:18 +05:30
Vineet Gupta
24830fc782 ARC: mm: Introduce PTE_SPECIAL
Needed for THP, but will also come in handy for fast GUP later

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-09 17:04:23 +05:30
Vineet Gupta
129cbed54a ARC: mm: pte flags comsetic cleanups, comments
No semantical changes

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-09 17:04:22 +05:30
Vineet Gupta
e8a75963a4 ARC: mm: switch pgtable_to to pte_t *
ARC is the only arch with unsigned long type (vs. struct page *).
Historically this was done to avoid the page_address() calls in various
arch hooks which need to get the virtual/logical address of the table.

Some arches alternately define it as pte_t *, and is as efficient as
unsigned long (generated code doesn't change)

Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-09 17:04:22 +05:30
Linus Torvalds
30c44659f4 Merge branch 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull strscpy string copy function implementation from Chris Metcalf.

Chris sent this during the merge window, but I waffled back and forth on
the pull request, which is why it's going in only now.

The new "strscpy()" function is definitely easier to use and more secure
than either strncpy() or strlcpy(), both of which are horrible nasty
interfaces that have serious and irredeemable problems.

strncpy() has a useless return value, and doesn't NUL-terminate an
overlong result.  To make matters worse, it pads a short result with
zeroes, which is a performance disaster if you have big buffers.

strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking
the insane NUL padding, but having a differently broken return value
which returns the original length of the source string.  Which means
that it will read characters past the count from the source buffer, and
you have to trust the source to be properly terminated.  It also makes
error handling fragile, since the test for overflow is unnecessarily
subtle.

strscpy() avoids both these problems, guaranteeing the NUL termination
(but not excessive padding) if the destination size wasn't zero, and
making the overflow condition very obvious by returning -E2BIG.  It also
doesn't read past the size of the source, and can thus be used for
untrusted source data too.

So why did I waffle about this for so long?

Every time we introduce a new-and-improved interface, people start doing
these interminable series of trivial conversion patches.

And every time that happens, somebody does some silly mistake, and the
conversion patch to the improved interface actually makes things worse.
Because the patch is mindnumbing and trivial, nobody has the attention
span to look at it carefully, and it's usually done over large swatches
of source code which means that not every conversion gets tested.

So I'm pulling the strscpy() support because it *is* a better interface.
But I will refuse to pull mindless conversion patches.  Use this in
places where it makes sense, but don't do trivial patches to fix things
that aren't actually known to be broken.

* 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  tile: use global strscpy() rather than private copy
  string: provide strscpy()
  Make asm/word-at-a-time.h available on all architectures
2015-10-04 16:31:13 +01:00
Linus Torvalds
ca520cab25 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking and atomic updates from Ingo Molnar:
 "Main changes in this cycle are:

   - Extend atomic primitives with coherent logic op primitives
     (atomic_{or,and,xor}()) and deprecate the old partial APIs
     (atomic_{set,clear}_mask())

     The old ops were incoherent with incompatible signatures across
     architectures and with incomplete support.  Now every architecture
     supports the primitives consistently (by Peter Zijlstra)

   - Generic support for 'relaxed atomics':

       - _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return()
       - atomic_read_acquire()
       - atomic_set_release()

     This came out of porting qwrlock code to arm64 (by Will Deacon)

   - Clean up the fragile static_key APIs that were causing repeat bugs,
     by introducing a new one:

       DEFINE_STATIC_KEY_TRUE(name);
       DEFINE_STATIC_KEY_FALSE(name);

     which define a key of different types with an initial true/false
     value.

     Then allow:

       static_branch_likely()
       static_branch_unlikely()

     to take a key of either type and emit the right instruction for the
     case.  To be able to know the 'type' of the static key we encode it
     in the jump entry (by Peter Zijlstra)

   - Static key self-tests (by Jason Baron)

   - qrwlock optimizations (by Waiman Long)

   - small futex enhancements (by Davidlohr Bueso)

   - ... and misc other changes"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
  jump_label/x86: Work around asm build bug on older/backported GCCs
  locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations
  locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h
  locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics
  locking/qrwlock: Implement queue_write_unlock() using smp_store_release()
  locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition
  locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'
  locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication
  locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations
  locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic
  locking/static_keys: Make verify_keys() static
  jump label, locking/static_keys: Update docs
  locking/static_keys: Provide a selftest
  jump_label: Provide a self-test
  s390/uaccess, locking/static_keys: employ static_branch_likely()
  x86, tsc, locking/static_keys: Employ static_branch_likely()
  locking/static_keys: Add selftest
  locking/static_keys: Add a new static_key interface
  locking/static_keys: Rework update logic
  locking/static_keys: Add static_key_{en,dis}able() helpers
  ...
2015-09-03 15:46:07 -07:00
Vineet Gupta
9b28829d6d ARCv2: perf: Finally introduce HS perf unit
With all features in place, the ARC HS pct block can now be effectively
allowed to be probed/used

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-27 14:59:07 +05:30
Alexey Brodkin
e6b1d126bb ARCv2: perf: implement exclusion of event counting in user or kernel mode
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-27 14:58:14 +05:30
Alexey Brodkin
36481cf7fb ARCv2: perf: Support sampling events using overflow interrupts
In times of ARC 700 performance counters didn't have support of
interrupt an so for ARC we only had support of non-sampling events.

Put simply only "perf stat" was functional.

Now with ARC HS we have support of interrupts in performance counters
which this change introduces support of.

ARC performance counters act in the following way in regard of
interrupts generation.
 [1] A counter counts starting from value set in PCT_COUNT register pair
 [2] Once counter reaches value set in PCT_INT_CNT interrupt is raised

Basic setup look like this:
 [1] PCT_COUNT = 0;
 [2] PCT_INT_CNT = __limit_value__;
 [3] Enable interrupts for that counter and let it run
 [4] Let counter reach its limit
 [5] Handle interrupt when it happens

Note that PCT HW block is build in CPU core and so ints interrupt
line (which is basically OR of all counters IRQs) is wired directly to
top-level IRQC. That means do de-assert PCT interrupt it's required to
reset IRQs from all counters that have reached their limit values.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-27 14:57:43 +05:30
Vineet Gupta
fb7c572551 ARC: perf: cap the number of counters to hardware max of 32
The number of counters in PCT can never be more than 32 (while
countable conditions could be 100+) for both ARCompact and ARCv2

And while at it update copyright dates.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-27 14:57:03 +05:30
Vineet Gupta
090749502f ARC: add/fix some comments in code - no functional change
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 19:05:49 +05:30
Yuriy Kolerov
6de6066c0d ARC: change some branchs to jumps to resolve linkage errors
When kernel's binary becomes large enough (32M and more) errors
may occur during the final linkage stage. It happens because
the build system uses short relocations for ARC  by default.
This problem may be easily resolved by passing -mlong-calls
option to GCC to use long absolute jumps (j) instead of short
relative branchs (b).

But there are fragments of pure assembler code exist which use
branchs in inappropriate places and cause a linkage error because
of relocations overflow.

First of these fragments is .fixup insertion in futex.h and
unaligned.c. It inserts a code in the separate section (.fixup)
with branch instruction. It leads to the linkage error when
kernel becomes large.

Second of these fragments is calling scheduler's functions
(common kernel code) from entry.S of ARC's code. When kernel's
binary becomes large it may lead to the linkage error because
scheduler may occur far enough from ARC's code in the final
binary.

Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 18:53:15 +05:30
Vineet Gupta
eb2cd8b72b ARC: ensure futex ops are atomic in !LLSC config
W/o hardware assisted atomic r-m-w the best we can do is to disable
preemption.

Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 18:16:01 +05:30
Vineet Gupta
882a95ae0a ARC: make futex_atomic_cmpxchg_inatomic() return bimodal
Callers of cmpxchg_futex_value_locked() in futex code expect bimodal
return value:
  !0 (essentially -EFAULT as failure)
   0 (success)

Before this patch, the success return value was old value of futex,
which could very well be non zero, causing caller to possibly take the
failure path erroneously.

Fix that by returning 0 for success

(This fix was done back in 2011 for all upstream arches, which ARC
obviously missed)

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 18:16:00 +05:30
Vineet Gupta
ed574e2bbd ARC: futex cosmetics
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 18:16:00 +05:30
Vineet Gupta
31d30c8208 ARC: add barriers to futex code
The atomic ops on futex need to provide the full barrier just like
regular atomics in kernel.

Also remove pagefault_enable/disable in futex_atomic_cmpxchg_inatomic()
as core code already does that

Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 18:15:59 +05:30
Alexey Brodkin
f2b0b25a37 ARCv2: Support IO Coherency and permutations involving L1 and L2 caches
In case of ARCv2 CPU there're could be following configurations
that affect cache handling for data exchanged with peripherals
via DMA:
 [1] Only L1 cache exists
 [2] Both L1 and L2 exist, but no IO coherency unit
 [3] L1, L2 caches and IO coherency unit exist

Current implementation takes care of [1] and [2].
Moreover support of [2] is implemented with run-time check
for SLC existence which is not super optimal.

This patch introduces support of [3] and rework of DMA ops
usage. Instead of doing run-time check every time a particular
DMA op is executed we'll have 3 different implementations of
DMA ops and select appropriate one during init.

As for IOC support for it we need:
 [a] Implement empty DMA ops because IOC takes care of cache
     coherency with DMAed data
 [b] Route dma_alloc_coherent() via dma_alloc_noncoherent()
     This is required to make IOC work in first place and also
     serves as optimization as LD/ST to coherent buffers can be
     srviced from caches w/o going all the way to memory

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
[vgupta:
  -Added some comments about IOC gains
  -Marked dma ops as static,
  -Massaged changelog a bit]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 18:11:17 +05:30
Vineet Gupta
1097163870 ARCv2: spinlock/rwlock/atomics: reduce 1 instruction in exponential backoff
The increment of delay counter was 2 instructions:
Arithmatic Shfit Left (ASL) + set to 1 on overflow

This can be done in 1 using ROtate Left (ROL)

Suggested-by: Nigel Topham <ntopham@synopsys.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-07 13:56:16 +05:30
Vineet Gupta
87ce62802f ARC: Make pt_regs regs unsigned
KGDB fails to build after f51e2f1911 ("ARC: make sure instruction_pointer()
returns unsigned value")

The hack to force one specific reg to unsigned backfired. There's no
reason to keep the regs signed after all.

|  CC      arch/arc/kernel/kgdb.o
|../arch/arc/kernel/kgdb.c: In function 'kgdb_trap':
| ../arch/arc/kernel/kgdb.c:180:29: error: lvalue required as left operand of assignment
|   instruction_pointer(regs) -= BREAK_INSTR_SIZE;

Reported-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
Fixes: f51e2f1911 ("ARC: make sure instruction_pointer() returns unsigned value")
Cc: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-05 11:48:21 +05:30
Vineet Gupta
b89aa12c17 ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle
The previous commit for delayed retry of SCOND needs some fine tuning
for spin locks.

The backoff from delayed retry in conjunction with spin looping of lock
itself can potentially cause the delay counter to reach high values.
So to provide fairness to any lock operation, after a lock "seems"
available (i.e. just before first SCOND try0, reset the delay counter
back to starting value of 1

Essentially reset delay to 1 for a new spin-wait-loop-acquire cycle.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:35 +05:30
Vineet Gupta
e78fdfef84 ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff
This is to workaround the llock/scond livelock

HS38x4 could get into a LLOCK/SCOND livelock in case of multiple overlapping
coherency transactions in the SCU. The exclusive line state keeps rotating
among contenting cores leading to a never ending cycle. So break the cycle
by deferring the retry of failed exclusive access (SCOND). The actual delay
needed is function of number of contending cores as well as the unrelated
coherency traffic from other cores. To keep the code simple, start off with
small delay of 1 which would suffice most cases and in case of contention
double the delay. Eventually the delay is sufficient such that the coherency
pipeline is drained, thus a subsequent exclusive access would succeed.

Link: http://lkml.kernel.org/r/1438612568-28265-1-git-send-email-vgupta@synopsys.com
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:34 +05:30
Vineet Gupta
69cbe630f5 ARC: LLOCK/SCOND based rwlock
With LLOCK/SCOND, the rwlock counter can be atomically updated w/o need
for a guarding spin lock.

This in turn elides the EXchange instruction based spinning which causes
the cacheline transition to exclusive state and concurrent spinning
across cores would cause the line to keep bouncing around.
LLOCK/SCOND based implementation is superior as spinning on LLOCK keeps
the cacheline in shared state.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:33 +05:30
Vineet Gupta
ae7eae9e03 ARC: LLOCK/SCOND based spin_lock
Current spin_lock uses EXchange instruction to implement the atomic test
and set of lock location (reads orig value and ST 1). This however forces
the cacheline into exclusive state (because of the ST) and concurrent
loops in multiple cores will bounce the line around between cores.

Instead, use LLOCK/SCOND to implement the atomic test and set which is
better as line is in shared state while lock is spinning on LLOCK

The real motivation of this change however is to make way for future
changes in atomics to implement delayed retry (with backoff).
Initial experiment with delayed retry in atomics combined with orig
EX based spinlock was a total disaster (broke even LMBench) as
struct sock has a cache line sharing an atomic_t and spinlock. The
tight spinning on lock, caused the atomic retry to keep backing off
such that it would never finish.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:33 +05:30
Vineet Gupta
8ac0665fb6 ARC: refactor atomic inline asm operands with symbolic names
This reduces the diff in forth-coming patches and also helps understand
better the incremental changes to inline asm.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:32 +05:30
Vineet Gupta
f5959cb0c3 Revert "ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock"
Extended testing of quad core configuration revealed that this fix was
insufficient. Specifically LTP open posix shm_op/23-1 would cause the
hardware livelock in llock/scond loop in update_cpu_load_active()

So remove this and make way for a proper workaround

This reverts commit a5c8b52abe.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:31 +05:30
Vineet Gupta
e13c42ecbe ARCv2: Fix the peripheral address space detection
With HS 2.1 release, the peripheral space register no longer contains
the uncached space specifics, causing the kernel to panic early on.
So read the newer NON VOLATILE AUX register to get that info.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-03 19:34:07 +05:30
Peter Zijlstra
de9e432cb5 atomic: Collapse all atomic_{set,clear}_mask definitions
Move the now generic definitions of atomic_{set,clear}_mask() into
linux/atomic.h to avoid endless and pointless repetition.

Also, provide an atomic_andnot() wrapper for those few archs that can
implement that.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-27 14:06:24 +02:00
Peter Zijlstra
e6942b7de2 atomic: Provide atomic_{or,xor,and}
Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-27 14:06:24 +02:00
Peter Zijlstra
cda7e4137a arc: Provide atomic_{or,xor,and}
Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Acked-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-27 14:06:21 +02:00
Laurent Dufour
f2abeef9fd mm: clean up per architecture MM hook header files
Commit 2ae416b142 ("mm: new mm hook framework") introduced an empty
header file (mm-arch-hooks.h) for every architecture, even those which
doesn't need to define mm hooks.

As suggested by Geert Uytterhoeven, this could be cleaned through the use
of a generic header file included via each per architecture
asm/include/Kbuild file.

The PowerPC architecture is not impacted here since this architecture has
to defined the arch_remap MM hook.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-17 16:39:53 -07:00
Alexey Brodkin
f51e2f1911 ARC: make sure instruction_pointer() returns unsigned value
Currently instruction_pointer() returns pt_regs->ret and so return value
is of type "long", which implicitly stands for "signed long".

While that's perfectly fine when dealing with 32-bit values if return
value of instruction_pointer() gets assigned to 64-bit variable sign
extension may happen.

And at least in one real use-case it happens already.
In perf_prepare_sample() return value of perf_instruction_pointer()
(which is an alias to instruction_pointer() in case of ARC) is assigned
to (struct perf_sample_data)->ip (which type is "u64").

And what we see if instuction pointer points to user-space application
that in case of ARC lays below 0x8000_0000 "ip" gets set properly with
leading 32 zeros. But if instruction pointer points to kernel address
space that starts from 0x8000_0000 then "ip" is set with 32 leadig
"f"-s. I.e. id instruction_pointer() returns 0x8100_0000, "ip" will be
assigned with 0xffff_ffff__8100_0000. Which is obviously wrong.

In particular that issuse broke output of perf, because perf was unable
to associate addresses like 0xffff_ffff__8100_0000 with anything from
/proc/kallsyms.

That's what we used to see:
 ----------->8----------
  6.27%  ls       [unknown]                [k] 0xffffffff8046c5cc
  2.96%  ls       libuClibc-0.9.34-git.so  [.] memcpy
  2.25%  ls       libuClibc-0.9.34-git.so  [.] memset
  1.66%  ls       [unknown]                [k] 0xffffffff80666536
  1.54%  ls       libuClibc-0.9.34-git.so  [.] 0x000224d6
  1.18%  ls       libuClibc-0.9.34-git.so  [.] 0x00022472
 ----------->8----------

With that change perf output looks much better now:
 ----------->8----------
  8.21%  ls       [kernel.kallsyms]        [k] memset
  3.52%  ls       libuClibc-0.9.34-git.so  [.] memcpy
  2.11%  ls       libuClibc-0.9.34-git.so  [.] malloc
  1.88%  ls       libuClibc-0.9.34-git.so  [.] memset
  1.64%  ls       [kernel.kallsyms]        [k] _raw_spin_unlock_irqrestore
  1.41%  ls       [kernel.kallsyms]        [k] __d_lookup_rcu
 ----------->8----------

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: arc-linux-dev@synopsys.com
Cc: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-07-13 13:33:18 +05:30
Vineet Gupta
9138d4138d ARC: Add llock/scond to futex backend
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-07-09 17:36:33 +05:30
Vineet Gupta
80f420842f ARC: Make ARC bitops "safer" (add anti-optimization)
ARCompact/ARCv2 ISA provide that any instructions which deals with
bitpos/count operand ASL, LSL, BSET, BCLR, BMSK .... will only consider
lower 5 bits. i.e. auto-clamp the pos to 0-31.

ARC Linux bitops exploited this fact by NOT explicitly masking out upper
bits for @nr operand in general, saving a bunch of AND/BMSK instructions
in generated code around bitops.

While this micro-optimization has worked well over years it is NOT safe
as shifting a number with a value, greater than native size is
"undefined" per "C" spec.

So as it turns outm EZChip ran into this eventually, in their massive
muti-core SMP build with 64 cpus. There was a test_bit() inside a loop
from 63 to 0 and gcc was weirdly optimizing away the first iteration
(so it was really adhering to standard by implementing undefined behaviour
vs. removing all the iterations which were phony i.e. (1 << [63..32])

| for i = 63 to 0
|    X = ( 1 << i )
|    if X == 0
|       continue

So fix the code to do the explicit masking at the expense of generating
additional instructions. Fortunately, this can be mitigated to a large
extent as gcc has SHIFT_COUNT_TRUNCATED which allows combiner to fold
masking into shift operation itself. It is currently not enabled in ARC
gcc backend, but could be done after a bit of testing.

Fixes STAR 9000866918 ("unsafe "undefined behavior" code in kernel")

Reported-by: Noam Camus <noamc@ezchip.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-07-09 17:36:32 +05:30
Chris Metcalf
a6e2f029ae Make asm/word-at-a-time.h available on all architectures
Added the x86 implementation of word-at-a-time to the
generic version, which previously only supported big-endian.

Omitted the x86-specific load_unaligned_zeropad(), which in
any case is also not present for the existing BE-only
implementation of a word-at-a-time, and is only used under
CONFIG_DCACHE_WORD_ACCESS.

Added as a "generic-y" to the Kbuilds of all architectures
that didn't previously have it.

Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
2015-07-08 16:41:55 -04:00
Linus Torvalds
2d01eedf1d Merge branch 'akpm' (patches from Andrew)
Merge third patchbomb from Andrew Morton:

 - the rest of MM

 - scripts/gdb updates

 - ipc/ updates

 - lib/ updates

 - MAINTAINERS updates

 - various other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (67 commits)
  genalloc: rename of_get_named_gen_pool() to of_gen_pool_get()
  genalloc: rename dev_get_gen_pool() to gen_pool_get()
  x86: opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit
  MAINTAINERS: add zpool
  MAINTAINERS: BCACHE: Kent Overstreet has changed email address
  MAINTAINERS: move Jens Osterkamp to CREDITS
  MAINTAINERS: remove unused nbd.h pattern
  MAINTAINERS: update brcm gpio filename pattern
  MAINTAINERS: update brcm dts pattern
  MAINTAINERS: update sound soc intel patterns
  MAINTAINERS: remove website for paride
  MAINTAINERS: update Emulex ocrdma email addresses
  bcache: use kvfree() in various places
  libcxgbi: use kvfree() in cxgbi_free_big_mem()
  target: use kvfree() in session alloc and free
  IB/ehca: use kvfree() in ipz_queue_{cd}tor()
  drm/nouveau/gem: use kvfree() in u_free()
  drm: use kvfree() in drm_free_large()
  cxgb4: use kvfree() in t4_free_mem()
  cxgb3: use kvfree() in cxgb_free_mem()
  ...
2015-07-01 17:47:51 -07:00