Commit Graph

635 Commits

Author SHA1 Message Date
Tvrtko Ursulin
acd2784562 drm/i915: Simplify intel_init_ring_buffer prototype
Engine contains dev_priv so need to pass it in.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris-wilson.co.uk>
2016-07-14 11:17:17 +01:00
Tvrtko Ursulin
c78d606134 drm/i915: Make more use of the shared engine irq setup
Use more of the shared engine setup data for legacy engine
initialization. This time to simplify the irq initialization
code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris-wilson.co.uk>
2016-07-14 11:17:12 +01:00
Tvrtko Ursulin
8b3e2d3639 drm/i915: Unify engine init loop
With the unified common engine setup done, and the execlist engine
initialization loop clearly split into two phases, we can eliminate
the separate legacy engine initialization code.

v2: Fix cleanup path for legacy.
v3: Rename constructors. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris-wilson.co.uk>
2016-07-14 11:17:06 +01:00
Dave Gordon
c2c7f24008 drm/i915: unify first-stage engine struct setup
intel_lrc.c has a table of "logical rings" (meaning engines), while
intel_ringbuffer.c has separately open-coded initialisation for each
engine. We can deduplicate this somewhat by using the same first-stage
engine-setup function for both modes.

So here we expose the function that transfers information from the
static table of (all) known engines to the dev_priv->engine array of
engines available on this device (adjusting the names along the way)
and then embed calls to it in both the LRC and the legacy-mode setup.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-07-14 11:16:18 +01:00
Ville Syrjälä
035ea405c9 drm/i915: Unbreak interrupts on pre-gen6
Prior to gen6 we didn't have per-ring IMR registers, which means that
since commit 61ff75ac20 ("drm/i915: Simplify enabling
user-interrupts with L3-remapping") we're now masking off all interrupts
when init_render_ring() gets called. That's rather rude. Let's limit
the ring IMR frobbing to machines that actually have the per-ring IMR
registers.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 61ff75ac20 ("drm/i915: Simplify enabling user-interrupts with L3-remapping")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468340687-3596-1-git-send-email-ville.syrjala@linux.intel.com
Reviewd-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-07-13 17:04:00 +03:00
Chris Wilson
91c8a326a1 drm/i915: Convert dev_priv->dev backpointers to dev_priv->drm
Since drm_i915_private is now a subclass of drm_device we do not need to
chase the drm_i915_private->dev backpointer and can instead simply
access drm_i915_private->drm directly.

   text	   data	    bss	    dec	    hex	filename
1068757	   4565	    416	1073738	 10624a	drivers/gpu/drm/i915/i915.ko
1066949	   4565	    416	1071930	 105b3a	drivers/gpu/drm/i915/i915.ko

Created by the coccinelle script:
@@
struct drm_i915_private *d;
identifier i;
@@
(
- d->dev->i
+ d->drm.i
|
- d->dev
+ &d->drm
)

and for good measure the dev_priv->dev backpointer was removed entirely.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk
2016-07-05 11:58:45 +01:00
Chris Wilson
fac5e23e3c drm/i915: Mass convert dev->dev_private to to_i915(dev)
Since we now subclass struct drm_device, we can save pointer dances by
noting the equivalence of struct drm_device and struct drm_i915_private,
i.e. by using to_i915().

   text    data     bss     dec     hex filename
1073824    4562     416 1078802  107612 drivers/gpu/drm/i915/i915.ko
1068976    4562     416 1073954  106322 drivers/gpu/drm/i915/i915.ko

Created by the coccinelle script:

@@
expression E;
identifier p;
@@
- struct drm_i915_private *p = E->dev_private;
+ struct drm_i915_private *p = to_i915(E);

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-1-git-send-email-chris@chris-wilson.co.uk
2016-07-04 12:54:07 +01:00
Chris Wilson
7b4d3a16dd drm/i915: Remove stop-rings debugfs interface
Now that we have (near) universal GPU recovery code, we can inject a
real hang from userspace and not need any fakery. Not only does this
mean that the testing is far more realistic, but we can simplify the
kernel in the process.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-7-git-send-email-chris@chris-wilson.co.uk
2016-07-04 08:18:23 +01:00
Chris Wilson
61ff75ac20 drm/i915: Simplify enabling user-interrupts with L3-remapping
Borrow the idea from intel_lrc.c to precompute the mask of interrupts we
wish to always enable to avoid having lots of conditionals inside the
interrupt enabling.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-19-git-send-email-chris@chris-wilson.co.uk
2016-07-01 21:04:17 +01:00
Chris Wilson
31bb59cc01 drm/i915: Move the get/put irq locking into the caller
With only a single callsite for intel_engine_cs->irq_get and ->irq_put,
we can reduce the code size by moving the common preamble into the
caller, and we can also eliminate the reference counting.

For completeness, as we are no longer doing reference counting on irq,
rename the get/put vfunctions to enable/disable respectively and are
able to review the use of posting reads. We only require the
serialisation with hardware when enabling the interrupt (i.e. so we
cannot miss an interrupt by going to sleep before the hardware truly
enables it).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-18-git-send-email-chris@chris-wilson.co.uk
2016-07-01 21:04:16 +01:00
Chris Wilson
f8973c217f drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk)
On Ironlake, there is no command nor register to ensure that the write
from a MI_STORE command is completed (and coherent on the CPU) before the
command parser continues. This means that the ordering between the seqno
write and the subsequent user interrupt is undefined (like gen6+). So to
ensure that the seqno write is completed after the final user interrupt
we need to delay the read sufficiently to allow the write to complete.
This delay is undefined by the bspec, and empirically requires 75us even
though a register read combined with a clflush is less than 500ns. Hence,
the delay is due to an on-chip buffer rather than the latency of the write
to memory.

Note that the render ring controls this by filling the PIPE_CONTROL fifo
with stalling commands that force the earliest pipe-control with the
seqno to be completed before the command parser continues. Given that we
need a barrier operation for BSD, we may as well forgo the extra
per-batch latency by using a common per-interrupt barrier.

Studying the impact of adding the usleep shows that in both sequences of
and individual synchronous no-op batches is negligible for the media
engine (where the write now is unordered with the interrupt). Converting
the render engine over from the current glutton of pie-controls over to
the per-interrupt delays speeds up both the sequential and individual
synchronous no-ops by 20% and 60%, respectively. This speed up holds
even when looking at the throughput of small copies (4KiB->4MiB), both
serial and synchronous, by about 20%. This is because despite adding a
significant delay to the interrupt, in all likelihood we will see the
seqno write without having to apply the barrier (only in the rare corner
cases where the write is delayed on the last required is the delay
necessary).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94307
Testcase: igt/gem_sync #ilk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-12-git-send-email-chris@chris-wilson.co.uk
2016-07-01 21:00:53 +01:00
Chris Wilson
7d5ea80720 drm/i915: Refactor scratch object allocation for gen2 w/a buffer
The gen2 w/a buffer is stuffed into the same slot as the gen5+ scratch
buffer. If we pass in the size we want to allocate for the scratch
buffer, both callers can use the same routine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-11-git-send-email-chris@chris-wilson.co.uk
2016-07-01 21:00:50 +01:00
Chris Wilson
de8fe1663a drm/i915: Allocate scratch page from stolen
With the last direct CPU access to the scratch page removed, we can now
allocate it from our small amount of reserved system pages (stolen
memory).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-10-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:59:45 +01:00
Chris Wilson
f8291952bd drm/i915: Stop mapping the scratch page into CPU space
After the elimination of using the scratch page for Ironlake's
breadcrumb, we no longer need to kmap the object. We therefore can move
it into the high unmappable space and do not need to force the object to
be coherent (i.e. snooped on !llc platforms).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-9-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:58:48 +01:00
Chris Wilson
1b7744e7ba drm/i915: Use HWS for seqno tracking everywhere
By using the same address for storing the HWS on every platform, we can
remove the platform specific vfuncs and reduce the get-seqno routine to
a single read of a cached memory location.

v2: Fix semaphore_passed() to look at the signaling engine (not the
waiter's)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-8-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:58:48 +01:00
Chris Wilson
688e6c7258 drm/i915: Slaughter the thundering i915_wait_request herd
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.

Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.

Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.

To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoid long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.

v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:58:43 +01:00
Chris Wilson
ed00307893 drm/i915/ringbuffer: Move all default irq vfuncs init to a separate func
Just plonk all the default irq vfuncs together in one function to keep
the initialisers of reasonable size.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467361093-20209-2-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-07-01 09:48:24 +01:00
Chris Wilson
6f7bef75d1 drm/i915/ringbuffer: Move all generic engine->dispatch_batchbuffer together
Consolidate the block of default vfuncs for dispatching the batchbuffer.
Just a minor tweak on top of Tvrtko's great job of tidying up the vfunc
initialisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467361093-20209-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-07-01 09:48:00 +01:00
Tvrtko Ursulin
8d228911ff drm/i915: Trim some if-else braces
Just a bit of cleanup after the previous refactoring.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1467212972-861-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-06-30 17:20:45 +01:00
Tvrtko Ursulin
4b8e38a9ef drm/i915: Consolidate legacy semaphore initialization
Replace per-engine initialization with a common half-programatic,
half-data driven code for ease of maintenance and compactness.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30 17:20:45 +01:00
Tvrtko Ursulin
c38c651b39 drm/i915: Compact gen8_ring_sync
Store the semaphore offset in a temporary variable to avoid
having to get the VMA offset twice.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30 17:20:45 +01:00
Tvrtko Ursulin
1b9e665064 drm/i915: Compact Gen8 semaphore initialization
Replace the macro initializer with a programatic loop which
results in smaller code and hopefully just as clear.

v2: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30 17:20:45 +01:00
Tvrtko Ursulin
db3d4019ab drm/i915: Move semaphore object creation into intel_ring_init_semaphores
The object needs to be created before semaphores can be initialized
on any ring and it makes sense to pull it out to this semaphore
dedicated helper.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30 17:20:45 +01:00
Tvrtko Ursulin
d9a64610b2 drm/i915: Consolidate semaphore vfuncs init
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-06-30 17:20:44 +01:00
Tvrtko Ursulin
960ecaad75 drm/i915: Consolidate dispatch_execbuffer vfunc
v2: Put dispatch_execbuffer before add_request. (Chris Wilson)
v3: Fix add_request and irq_seqno_barrier for gen8+.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30 17:20:44 +01:00
Tvrtko Ursulin
1d8a1337f3 drm/i915: Consolidate init_hw vfunc
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30 17:20:44 +01:00
Tvrtko Ursulin
604096d778 drm/i915: Consolidate get/set_seqno
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30 17:20:44 +01:00
Tvrtko Ursulin
b9700325cb drm/i915: Consolidate get and put irq vfuncs
v2: Consistent INTEL_GEN vs IS_GEN usage. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30 17:20:44 +01:00
Tvrtko Ursulin
cc54a82830 drm/i915: Consolidate seqno_barrier vfunc
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30 17:20:43 +01:00
Tvrtko Ursulin
7445a2a413 drm/i915: Consolidate add_request vfunc
All engines apart from render select this based on Gen.

Move it to the common helper as well.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30 17:20:43 +01:00
Tvrtko Ursulin
06a2fe2279 drm/i915: Consolidate write_tail vfunc initializer
Introduce a function which initializes vfuncs mostly common
across engines and move write_tail initialization in it since
only one engine overrides the default.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30 17:20:43 +01:00
Chris Wilson
76f8421f2a drm/i915: Perform Sandybridge BSD tail write under the forcewake
Since we have a sequence of register reads and writes, we can reduce the
latency of starting the BSD ring by performing all the mmio operations
under the same forcewake wakeref.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-62-git-send-email-chris@chris-wilson.co.uk
2016-06-30 15:42:34 +01:00
Chris Wilson
d54fe4aad7 drm/i915: Convert wait_for(I915_READ(reg)) to intel_wait_for_register()
By using the out-of-line intel_wait_for_register() not only do we can
efficiency from using the hybrid wait_for() contained within, but we
avoid code bloat from the numerous inlined loops, in total (all patches):

   text    data     bss     dec     hex filename
1078551    4557     416 1083524  108884 drivers/gpu/drm/i915/i915.ko
1070775    4557     416 1075748  106a24 drivers/gpu/drm/i915/i915.ko

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-48-git-send-email-chris@chris-wilson.co.uk
2016-06-30 15:42:27 +01:00
Chris Wilson
3d808eb171 drm/i915: Convert wait_for(I915_READ(reg)) to intel_wait_for_register()
By using the out-of-line intel_wait_for_register() not only do we can
efficiency from using the hybrid wait_for() contained within, but we
avoid code bloat from the numerous inlined loops, in total (all patches):

   text    data     bss     dec     hex filename
1078551    4557     416 1083524  108884 drivers/gpu/drm/i915/i915.ko
1070775    4557     416 1075748  106a24 drivers/gpu/drm/i915/i915.ko

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-47-git-send-email-chris@chris-wilson.co.uk
2016-06-30 15:42:26 +01:00
Chris Wilson
25ab57f42f drm/i915: Convert wait_for(I915_READ(reg)) to intel_wait_for_register()
By using the out-of-line intel_wait_for_register() not only do we can
efficiency from using the hybrid wait_for() contained within, but we
avoid code bloat from the numerous inlined loops, in total (all patches):

   text    data     bss     dec     hex filename
1078551    4557     416 1083524  108884 drivers/gpu/drm/i915/i915.ko
1070775    4557     416 1075748  106a24 drivers/gpu/drm/i915/i915.ko

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-46-git-send-email-chris@chris-wilson.co.uk
2016-06-30 15:42:26 +01:00
Chris Wilson
c7c3c07d16 drm/i915: Treat kernel context as initialised
The kernel context exists simply as a placeholder and should never be
executed with a render context. It does not need the golden render
state, as that will always be applied to a user context. By skipping the
initialisation we can avoid issues in attempting to program the golden
render context when trying to make the hardware idle.

v2: Rebase

Testcase: igt/drm_module_reload_basic #byt
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95634
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-3-git-send-email-chris@chris-wilson.co.uk
2016-06-24 15:02:44 +01:00
Chris Wilson
0cb26a8ed1 drm/i915: Move legacy kernel context pinning to intel_ringbuffer.c
This is so that we have symmetry with intel_lrc.c and avoid a source of
if (i915.enable_execlists) layering violation within i915_gem_context.c -
that is we move the specific handling of the dev_priv->kernel_context
for legacy submission into the legacy submission code.

This depends upon the init/fini ordering between contexts and engines
already defined by intel_lrc.c, and also exporting the context alignment
required for pinning the legacy context.

v2: Separate out pin/unpin context funcs for greater symmetry with
intel_lrc. One more step towards unifying behaviour between the two
classes of engines and towards fixing another bug in i915_switch_context
vs requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-2-git-send-email-chris@chris-wilson.co.uk
2016-06-24 15:02:34 +01:00
arun.siluvery@linux.intel.com
780f0aebda drm/i915/bxt: Add WaDisablePooledEuLoadBalancingFix
This is a WA affecting pooled eu which is a bxt specific feature.

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Winiarski, Michal <michal.winiarski@intel.com>
Cc: Zou, Nanhai <nanhai.zou@intel.com>
Cc: Yang, Rong R <rong.r.yang@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-06-14 10:44:07 +01:00
Tim Gore
a8ab5ed5e1 drm/i915/gen9: implement WaConextSwitchWithConcurrentTLBInvalidate
This patch enables a workaround for a mid thread preemption
issue where a hardware timing problem can prevent the
context restore from happening, leading to a hang.

v2: move to gen9_init_workarounds (Arun)
v3: move to start of gen9_init_workarounds (Arun)

Signed-off-by: Tim Gore <tim.gore@intel.com>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465816501-25557-1-git-send-email-tim.gore@intel.com
2016-06-13 14:48:02 +01:00
Mika Kuoppala
71dce58c8e drm/i915/skl: Extend WaDisableChickenBitTSGBarrierAckForFFSliceCS
There is ambiguity in the documentation between D0 and E0.
Extend this workaround to E0.

References: BSID#779
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-23-git-send-email-mika.kuoppala@intel.com
2016-06-08 16:26:05 +03:00
Mika Kuoppala
954337aa96 drm/i915/kbl: Add WaDisableSbeCacheDispatchPortSharing
This is needed for all kbl revision.

v2: Don't add revid checks to generic gen9 init (Arun)

References: HSD#2135593
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-21-git-send-email-mika.kuoppala@intel.com
2016-06-08 16:25:49 +03:00
Mika Kuoppala
4de5d7ccbc drm/i915/kbl: Add WaDisableGafsUnitClkGating
We need to disable clock gating in this unit to work around
hardware issue causing possible corruption/hang.

v2: name the bit (Ville)
v3: leave the fix enabled for 2227050 and set correct bit (Matthew)
v4: Split out the skl part in separate commit for easier backport

References: HSD#2227156, HSD#2227050
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-20-git-send-email-mika.kuoppala@intel.com
2016-06-08 16:25:41 +03:00
Mika Kuoppala
ad2bdb44b1 drm/i915: Add WaInsertDummyPushConstP for bxt and kbl
Add this workaround for both bxt and kbl up to until
rev B0.

References: HSD#2136703
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-16-git-send-email-mika.kuoppala@intel.com
2016-06-08 16:24:54 +03:00
Mika Kuoppala
c0b730d572 drm/i915/kbl: Add WaDisableDynamicCreditSharing
Bspec states that we need to turn off dynamic credit
sharing on kbl revid a0 and b0. This happens by writing bit 28
on 0x4ab8.

References: HSD#2225601, HSD#2226938, HSD#2225763
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-15-git-send-email-mika.kuoppala@intel.com
2016-06-08 16:24:34 +03:00
Mika Kuoppala
fe90581987 drm/i915/kbl: Add WaDisableLSQCROPERFforOCL
Extend the scope of this workaround, already used in skl,
to also take effect in kbl.

v2: Fix KBL_REVID_E0 (Matthew)

References: HSD#2132677
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-12-git-send-email-mika.kuoppala@intel.com
2016-06-08 16:24:03 +03:00
Mika Kuoppala
8401d42fd5 drm/i915/kbl: Add WaDisableFenceDestinationToSLM for A0
Add this workaround for kbl revid A0 only.

v2: rebase
v3: carve out a non related workaround (Chris)

References: HSD#1911714
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-9-git-send-email-mika.kuoppala@intel.com
2016-06-08 16:22:44 +03:00
Mika Kuoppala
e587f6cb0a drm/i915/kbl: Add WaEnableGapsTsvCreditFix
We need this crucial workaround from skl also to all kbl revisions.
Lack of it was causing system hangs on skl enabling so this is
a must have.

v2: Don't add revid checks to gen9 init workarounds (Arun)

References: HSD#2126660
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-8-git-send-email-mika.kuoppala@intel.com
2016-06-08 16:21:39 +03:00
Mika Kuoppala
bbaefe72a0 drm/i915: Mimic skl with WaForceEnableNonCoherent
Past evidence with system hangs and hsds tie
WaForceEnableNonCoherent and WaDisableHDCInvalidation to
WaForceContextSaveRestoreNonCoherent. Documentation
states that WaForceContextSaveRestoreNonCoherent would
not be needed on skl past E0 but evidence proved otherwise. See
commit <510650e8b2ab> ("drm/i915/skl: Fix spurious gpu hang with gt3/gt4
revs"). In this scope consider kbl to be skl with a bigger revision than
E0 so play it safe and bind these two workarounds to the
WaForceContextSaveRestoreNonCoherent, and apply to all gen9.

v2: fix comment (Matthew)

References: HSD#2134449, HSD#2131413
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-7-git-send-email-mika.kuoppala@intel.com
2016-06-08 16:21:22 +03:00
Mika Kuoppala
5b0e365929 drm/i915/gen9: Always apply WaForceContextSaveRestoreNonCoherent
The revision id range for this workaround has changed. So apply
it to all revids on all gen9.

References: HSD#2134449
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-6-git-send-email-mika.kuoppala@intel.com
2016-06-08 16:20:33 +03:00
Mika Kuoppala
e5f81d65ac drm/i915/kbl: Init gen9 workarounds
Kabylake is part of gen9 family so init the generic gen9
workarounds for it.

v2: rebase

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-3-git-send-email-mika.kuoppala@intel.com
2016-06-08 16:19:53 +03:00