Commit Graph

144 Commits

Author SHA1 Message Date
Michel Thierry
acdd884a2e drm/i915/bdw: Two-stage execlist submit process
Context switch (and execlist submission) should happen only when
other contexts are not active, otherwise pre-emption occurs.

To assure this, we place context switch requests in a queue and those
request are later consumed when the right context switch interrupt is
received (still TODO).

v2: Use a spinlock, do not remove the requests on unqueue (wait for
context switch completion).

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

v3: Several rebases and code changes. Use unique ID.

v4:
- Move the queue/lock init to the late ring initialization.
- Damien's kmalloc review comments: check return, use sizeof(*req),
do not cast.

v5:
- Do not reuse drm_i915_gem_request. Instead, create our own.
- New namespace.

Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[davnet: Checkpatch + wash-up s/BUG_ON/WARN_ON/.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 22:10:59 +02:00
Oscar Mateo
582d67f0b1 drm/i915: Add temporary ring->ctx backpointer
The execlist patches have a bit a convoluted and long history and due
to that have the actual submission still misplaced deeply burried in
the low-level ringbuffer handling code. This design goes back to the
legacy ringbuffer code with its tricky lazy request and simple work
submissiion using ring tail writes. For that reason they need a
ring->ctx backpointer.

The goal is to unburry that code and move it up into a level where the
full execlist context is available so that we can ditch this
backpointer. Until that's done make it really obvious that there's
work still to be done.

Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Acked-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 18:42:59 +02:00
Oscar Mateo
1564858526 drm/i915/bdw: GEN-specific logical ring emit batchbuffer start
Dispatch_execbuffer's evil twin.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Ditch the check for aliasing ppgtt. It'll break soon and
execlists requires full ppgtt anyway.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 23:12:34 +02:00
Oscar Mateo
73d477f6bb drm/i915/bdw: Interrupts with logical rings
We need to attend context switch interrupts from all rings. Also, fixed writing
IMR/IER and added HWSTAM at ring init time.

Notice that, if added to irq_enable_mask, the context switch interrupts would
be incorrectly masked out when the user interrupts are due to no users waiting
on a sequence number. Therefore, this commit adds a bitmask of interrupts to
be kept unmasked at all times.

v2: Disable HWSTAM, as suggested by Damien (nobody listens to these interrupts,
anyway).

v3: Add new get/put_irq functions.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v3)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Drop the GEN8_ prefix from the context switch interrupt
define and move it to its brethren.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 23:06:58 +02:00
Oscar Mateo
4712274c36 drm/i915/bdw: GEN-specific logical ring emit flush
Same as the legacy-style ring->flush.

v2: The BSD invalidate bit still exists in GEN8! Add it for the VCS
rings (but still consolidate the blt and bsd ring flushes into one).
This was noticed by Brad Volkin.

v3: The command for BSD and for other rings is slightly different:
get it exactly the same as in gen6_ring_flush + gen6_bsd_ring_flush

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 22:44:37 +02:00
Oscar Mateo
4da46e1e5b drm/i915/bdw: GEN-specific logical ring emit request
Very similar to the legacy add_request, only modified to account for
logical ringbuffer.

v2: Use MI_GLOBAL_GTT, as suggested by Brad Volkin.

v3: Unify render and non-render in the same function, as noticed by
Brad Volkin.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 22:42:49 +02:00
Oscar Mateo
82e104cc26 drm/i915/bdw: New logical ring submission mechanism
Well, new-ish: if all this code looks familiar, that's because it's
a clone of the existing submission mechanism (with some modifications
here and there to adapt it to LRCs and Execlists).

And why did we do this instead of reusing code, one might wonder?
Well, there are some fears that the differences are big enough that
they will end up breaking all platforms.

Also, Execlists offer several advantages, like control over when the
GPU is done with a given workload, that can help simplify the
submission mechanism, no doubt. I am interested in getting Execlists
to work first and foremost, but in the future this parallel submission
mechanism will help us to fine tune the mechanism without affecting
old gens.

v2: Pass the ringbuffer only (whenever possible).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Appease checkpatch. Again. And drop the legacy sarea gunk
that somehow crept in.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 22:42:36 +02:00
Oscar Mateo
9b1136d505 drm/i915/bdw: GEN-specific logical ring init
Logical rings do not need most of the initialization their
legacy ringbuffer counterparts do: we just need the pipe
control object for the render ring, enable Execlists on the
hardware and a few workarounds.

v2: Squash with: "drm/i915: Extract pipe control fini & make
init outside accesible".

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Make checkpatch happy.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 17:03:28 +02:00
Oscar Mateo
48d823878d drm/i915/bdw: Generic logical ring init and cleanup
Allocate and populate the default LRC for every ring, call
gen-specific init/cleanup, init/fini the command parser and
set the status page (now inside the LRC object). These are
things all engines/rings have in common.

Stopping the ring before cleanup and initializing the seqnos
is left as a TODO task (we need more infrastructure in place
before we can achieve this).

v2: Check the ringbuffer backing obj for ring_is_initialized,
instead of the context backing obj (similar, but not exactly
the same).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:55:17 +02:00
Daniel Vetter
0c7dd53b84 drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
Any given ringbuffer is unequivocally tied to one context and one engine.
By setting the appropriate pointers to them, the ringbuffer struct holds
all the infromation you might need to submit a workload for processing,
Execlists style.

v2: Drop ring->ctx since that looks terribly ill-defined for legacy
ringbuffer submission.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v1)
Acked-by: Damien Lespiau <damien.lespiau@intel.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:18:17 +02:00
Oscar Mateo
84c2377fce drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts
As we have said a couple of times by now, logical ring contexts have
their own ringbuffers: not only the backing pages, but the whole
management struct.

In a previous version of the series, this was achieved with two separate
patches:
drm/i915/bdw: Allocate ringbuffer backing objects for default global LRC
drm/i915/bdw: Allocate ringbuffer for user-created LRCs

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:10:58 +02:00
Mika Kuoppala
f260fe7b2f drm/i915: Don't accumulate hangcheck score on forward progress
If the actual head has progressed forward inside a batch (request),
don't accumulate hangcheck score.

As the hangcheck score in increased only by acthd jumping backwards,
the result is that we only declare an active batch as stuck if it is
trapped inside a loop. Or that the looping will dominate the batch
progression so that it overcomes the bonus that forward progress gives.

v2: Improved commit message (Chris Wilson)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: s/active_loop/active (loop)/ as requested by Chris.]
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-07 14:04:07 +02:00
Oscar Mateo
1b5d063faf drm/i915: Generalize intel_ring_get_tail to take a ringbuf
Again, it's low-level enough to simply take a ringbuf and nothing
else.

Trivial change.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-08 12:31:02 +02:00
Ben Widawsky
3e78998a58 drm/i915/bdw: implement semaphore signal
Semaphore signalling works similarly to previous GENs with the exception
that the per ring mailboxes no longer exist. Instead you must define
your own space, somewhere in the GTT.

The comments in the code define the layout I've opted for, which should
be fairly future proof. Ie. I tried to define offsets in abstract terms
(NUM_RINGS, seqno size, etc).

NOTE: If one wanted to move this to the HWSP they could. I've decided
one 4k object would be easier to deal with, and provide potential wins
with cache locality, but that's all speculative.

v2: Update the macro to not need the other ring's ring->id (Chris)
Update the comment to use the correct formula (Chris)

v3: Move the macros the ringbuffer.h to prevent churn in next patch
(Ville)

v4: Fixed compilation rebase conflict
commit 1ec9e26dda
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Feb 14 14:01:11 2014 +0100

    drm/i915: Consolidate binding parameters into flags

v5: VCS2 rebase
Replace hweight_long with hweight32

v6 (Rodrigo): * Add missed VC2 gen8 ring signal init
   	      * fixing conflicst on rebase
    	      * minor fixes on address table
	      * remove WARN_ON

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[danvet: s/BUG_ON/WARN_ON/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-07 22:16:23 +02:00
Rodrigo Vivi
ddd4dbc6c1 drm/i915: Updating comments.
ring index calculation table was out of date after other rings were added,
although the formula is flexible and scale when adding new rings.

So this patch just update the comments and add a brief explanation
why to use sync_seqno[ring index].

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-07 22:02:49 +02:00
Chris Wilson
4be173813e drm/i915: Reorder semaphore deadlock check
If a semaphore is waiting on another ring, which in turn happens to be
waiting on the first ring, but that second semaphore has been signalled,
we will be able to kick the second ring and so can treat the first ring
as a valid WAIT and not as HUNG.

v2: Be paranoid and cap the potential recursion depth whilst visiting
the semaphore signallers. (Mika)

References: https://bugs.freedesktop.org/show_bug.cgi?id=54226
References: https://bugs.freedesktop.org/show_bug.cgi?id=75502
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-06-11 11:06:43 +03:00
Oscar Mateo
273497e5cd drm/i915: s/i915_hw_context/intel_context
Up until now, contexts had one (and only one) backing object that was
used by the hardware to save/restore render ring contexts (via the
MI_SET_CONTEXT command). Other rings did not have or need this, so
our i915_hw_context struct had a 1:1 relationship with a a real HW
context.

With Logical Ring Contexts and Execlists, this is not possible anymore:
all rings need a backing object, and it cannot be reused. To prepare
for that, rename our contexts to the more generic term intel_context.

No functional changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:41:17 +02:00
Oscar Mateo
93b0a4e0b2 drm/i915: Split the ringbuffers from the rings (3/3)
Manual cleanup after the previous Coccinelle script.

Yes, I could write another Coccinelle script to do this but I
don't want labor-replacing robots making an honest programmer's
work obsolete (also, I'm lazy).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:30:18 +02:00
Oscar Mateo
ee1b1e5ef3 drm/i915: Split the ringbuffers from the rings (2/3)
This refactoring has been performed using the following Coccinelle
semantic script:

    @@
    struct intel_engine_cs r;
    @@
    (
    - (r).obj
    + r.buffer->obj
    |
    - (r).virtual_start
    + r.buffer->virtual_start
    |
    - (r).head
    + r.buffer->head
    |
    - (r).tail
    + r.buffer->tail
    |
    - (r).space
    + r.buffer->space
    |
    - (r).size
    + r.buffer->size
    |
    - (r).effective_size
    + r.buffer->effective_size
    |
    - (r).last_retired_head
    + r.buffer->last_retired_head
    )

    @@
    struct intel_engine_cs *r;
    @@
    (
    - (r)->obj
    + r->buffer->obj
    |
    - (r)->virtual_start
    + r->buffer->virtual_start
    |
    - (r)->head
    + r->buffer->head
    |
    - (r)->tail
    + r->buffer->tail
    |
    - (r)->space
    + r->buffer->space
    |
    - (r)->size
    + r->buffer->size
    |
    - (r)->effective_size
    + r->buffer->effective_size
    |
    - (r)->last_retired_head
    + r->buffer->last_retired_head
    )

    @@
    expression E;
    @@
    (
    - LP_RING(E)->obj
    + LP_RING(E)->buffer->obj
    |
    - LP_RING(E)->virtual_start
    + LP_RING(E)->buffer->virtual_start
    |
    - LP_RING(E)->head
    + LP_RING(E)->buffer->head
    |
    - LP_RING(E)->tail
    + LP_RING(E)->buffer->tail
    |
    - LP_RING(E)->space
    + LP_RING(E)->buffer->space
    |
    - LP_RING(E)->size
    + LP_RING(E)->buffer->size
    |
    - LP_RING(E)->effective_size
    + LP_RING(E)->buffer->effective_size
    |
    - LP_RING(E)->last_retired_head
    + LP_RING(E)->buffer->last_retired_head
    )

Note: On top of this this patch also removes the now unused ringbuffer
fields in intel_engine_cs.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
[danvet: Add note about fixup patch included here.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:27:25 +02:00
Oscar Mateo
8ee149756e drm/i915: Split the ringbuffers from the rings (1/3)
As advanced by the previous patch, the ringbuffers and the engine
command streamers belong in different structs. This is so because,
while they used to be tightly coupled together, the new Logical
Ring Contexts (LRC for short) have a ringbuffer each.

In legacy code, we will use the buffer* pointer inside each ring
to get to the pertaining ringbuffer (the actual switch will be
done in the next patch). In the new Execlists code, this pointer
will be NULL and we will use instead the one inside the context
instead.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:02:16 +02:00
Oscar Mateo
a4872ba6d0 drm/i915: s/intel_ring_buffer/intel_engine_cs
In the upcoming patches we plan to break the correlation between
engine command streamers (a.k.a. rings) and ringbuffers, so it
makes sense to refactor the code and make the change obvious.

No functional changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:01:05 +02:00
Brad Volkin
44e895a8a2 drm/i915: Use hash tables for the command parser
For clients that submit large batch buffers the command parser has
a substantial impact on performance. On my HSW ULT system performance
drops as much as ~20% on some tests. Most of the time is spent in the
command lookup code. Converting that from the current naive search to
a hash table lookup reduces the performance drop to ~10%.

The choice of value for I915_CMD_HASH_ORDER allows all commands
currently used in the parser tables to hash to their own bucket (except
for one collision on the render ring). The tradeoff is that it wastes
memory. Because the opcodes for the commands in the tables are not
particularly well distributed, reducing the order still leaves many
buckets empty. The increased collisions don't seem to have a huge
impact on the performance gain, but for now anyhow, the parser trades
memory for performance.

NB: Ville noticed that the error paths through the ring init code
will leak memory. I've not addressed that here. We can do a follow
up pass to handle all of the leaks.

v2: improved comment describing selection of hash key mask (Damien)
replace a BUG_ON() with an error return (Tvrtko, Ville)
commit message improvements

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-12 19:15:51 +02:00
Ben Widawsky
9bcb144c83 drm/i915: Support 64b execbuf
Previously, our code only had a 32b offset value for where the
batchbuffer starts. With full PPGTT, and 64b canonical GPU address
space, that is an insufficient value. The code to expand is pretty
straight forward, and only one platform needs to do anything with the
extra bits.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 16:01:58 +02:00
Ben Widawsky
024a43e12c drm/i915: Move ring_begin to signal()
Add_request has always contained both the semaphore mailbox updates as
well as the breadcrumb writes. Since the semaphore signal is the one
which actually knows about the number of dwords it needs to emit to the
ring, we move the ring_begin to that function. This allows us to remove
the hideously shared #define

On a related not, gen8 will use a different number of dwords for
semaphores, but not for add request.

v2: Make number of dwords an explicit part of signalling (via function
argument). (Chris)

v3: very slight comment change

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 10:56:53 +02:00
Ben Widawsky
78325f2d27 drm/i915: Virtualize the ringbuffer signal func
This abstraction again is in preparation for gen8. Gen8 will bring new
semantics for doing this operation.

While here, make the writes of MI_NOOPs explicit for non-existent rings.
This should have been implicit before.

NOTE: This is going to be removed in a few patches.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 10:56:53 +02:00
Ben Widawsky
ebc348b2ad drm/i915: Move semaphore specific ring members to struct
This will be helpful in abstracting some of the code in preparation for
gen8 semaphores.

v2: Move mbox stuff to a separate struct

v3: Rebased over VCS2 work

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 10:56:52 +02:00
Zhao Yakui
845f74a701 drm/i915:Initialize the second BSD ring on BDW GT3 machine
Based on the hardware spec, the BDW GT3 machine has two independent
BSD ring that can be used to dispatch the video commands.
So just initialize it.

V3->V4: Follow Imre's comment to do some minor updates. For example:
more comments are added to describe the semaphore between ring.

Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
[danvet: Fix up checkpatch error.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:46 +02:00
Zhao Yakui
b1a93306ed drm/i915: Update the restrict check to filter out wrong Ring ID passed by user-space
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:44 +02:00
Chris Wilson
e3efda49e7 drm/i915: Preserve ring buffers objects across resume
Tearing down the ring buffers across resume is overkill, risks
unnecessary failure and increases fragmentation.

After failure, since the device is still active we may end up trying to
write into the dangling iomapping and trigger an oops.

v2: stop_ringbuffers() was meant to call stop(ring) not
cleanup(ring) during resume!

Reported-by: Jae-hyeon Park <jhyeon@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=72351
References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
[danvet: s/ring->obj == NULL/!intel_ring_initialized(ring)/ as
suggested by Oscar.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:37 +02:00
Chris Wilson
9991ae787a drm/i915: Move all ring resets before setting the HWS page
In commit a51435a313
Author: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
Date:   Wed Mar 12 16:39:40 2014 +0530

    drm/i915: disable rings before HW status page setup

we reordered stopping the rings to do so before we set the HWS register.
However, there is an extra workaround for g45 to reset the rings twice,
and for consistency we should apply that workaround before setting the
HWS to be sure that the rings are truly stopped.

Cc: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-03 17:16:45 +02:00
Ben Widawsky
057f6a8ad7 drm/i915: Invariably invalidate before ctx switch
We have been setting the bit which was originally BIOS dependent since:
commit f05bb0c7b6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Jan 20 16:33:32 2013 +0000

    drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits

Therefore, we do not need to try to figure it out dynamically and we can
just always invalidate the TLBs.

It's a partial revert of:
commit 12b0286f49
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Mon Jun 4 14:42:50 2012 -0700

    drm/i915: possibly invalidate TLB before context switch

The original commit attempted to only invalidate when necessary
(very much a relic from the old days). Now, we can just always invalidate.

I guess the old TODO still exists. Since we seem to have abandoned ILK
contexts however, there isn't much point in even remembering.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-03 11:41:39 +02:00
Chris Wilson
508774452d drm/i915: Broadwell expands ACTHD to 64bit
As Broadwell has an increased virtual address size, it requires more
than 32 bits to store offsets into its address space. This includes the
debug registers to track the current HEAD of the individual rings, which
may be anywhere within the per-process address spaces. In order to find
the full location, we need to read the high bits from a second register.
We then also need to expand our storage to keep track of the larger
address.

v2: Carefully read the two registers to catch wraparound between
    the reads.
v3: Use a WARN_ON rather than loop indefinitely on an unstable
    register read.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Timo Aaltonen <tjaalton@ubuntu.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Drop spurious hunk which conflicted.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-28 18:33:14 +01:00
Naresh Kumar Kachhi
e9fea5747d drm/i915: wait for rings to become idle once disabled
make sure we wait for rings to become idle once they are
disabled. In case of timeout print an error message

Signed-off-by: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
[danvet: Frob patch as suggested by Chris.]
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-12 16:04:19 +01:00
Daniel Vetter
e8e6e6012d Linux 3.14-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTHSaRAAoJEHm+PkMAQRiG7G8IAJHElwFDNSQE7Y9MmbicrAMG
 kfjhBtBpTaVrJKQXegCNUwDaLLyC4oLIxDheW84oPXbrEGDLqPtBov/hrcFkHVr4
 lh/ZYk02nYtcfpN0JnL/Yj2oKHVmBWs0vFlM7StSFsJCj10DoCVQQdmAJ8XODTPo
 CXMapk+UikTX1TlIO8+B5toyl3R1OqPmW211UV1vQVLKy66hu+MKVN/V+/EyopL0
 1jO81EDpaRaeIJh1/okcyUoIq9pqLkAWNpeQ7uyXZ+Sfivt9RXwLYKmAB3lP20Hc
 ZMIIoHSCyYRFjxLlQvt02bA9nY4wTY7YN5kZ2kk65y7TFfhcGsCw1Sc69iyCoKs=
 =CJcA
 -----END PGP SIGNATURE-----

Merge tag 'v3.14-rc6' into drm-intel-next-queued

Linux 3.14-rc6

I need the hdmi/dvi-dual link fixes in 3.14 to avoid ugly conflicts
when merging Ville's new hdmi cloning support into my -next tree

Conflicts:
	drivers/gpu/drm/i915/Makefile
	drivers/gpu/drm/i915/intel_dp.c

Makefile cleanup conflicts with an acpi build fix, intel_dp.c is
trivial.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-10 21:43:46 +01:00
Brad Volkin
351e3db2b3 drm/i915: Implement command buffer parsing logic
The command parser scans batch buffers submitted via execbuffer ioctls before
the driver submits them to hardware. At a high level, it looks for several
things:

1) Commands which are explicitly defined as privileged or which should only be
   used by the kernel driver. The parser generally rejects such commands, with
   the provision that it may allow some from the drm master process.
2) Commands which access registers. To support correct/enhanced userspace
   functionality, particularly certain OpenGL extensions, the parser provides a
   whitelist of registers which userspace may safely access (for both normal and
   drm master processes).
3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
   parser always rejects such commands.

See the overview comment in the source for more details.

This patch only implements the logic. Subsequent patches will build the tables
that drive the parser.

v2: Don't set the secure bit if the parser succeeds
Fail harder during init
Makefile cleanup
Kerneldoc cleanup
Clarify module param description
Convert ints to bools in a few places
Move client/subclient defs to i915_reg.h
Remove the bits_count field

OTC-Tracker: AXIA-4631
Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Appease checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-07 22:37:00 +01:00
Ville Syrjälä
753b1ad4a2 drm/i915: Add intel_ring_cachline_align()
intel_ring_cachline_align() emits MI_NOOPs until the ring tail is
aligned to a cacheline boundary.

Cc: Bjoern C <lkml@call-home.ch>
Cc: Alexandru DAMIAN <alexandru.damian@intel.com>
Cc: Enrico Tagliavini <enrico.tagliavini@gmail.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org (prereq for the next patch)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-11 23:00:19 +01:00
Mika Kuoppala
b6b0fac04d drm/i915: Use hangcheck score to find guilty context
With full ppgtt using acthd is not enough to find guilty
batch buffer. We get multiple false positives as acthd is
per vm.

Instead of scanning which vm was running on a ring,
to find corressponding context, use a different, simpler,
strategy of finding batches that caused gpu hang:

If hangcheck has declared ring to be hung, find first non complete
request on that ring and claim it was guilty.

v2: Rebase

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73652
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-04 11:57:24 +01:00
Chris Wilson
092467327c drm/i915: Write RING_TAIL once per-request
Ignoring the legacy DRI1 code, and a couple of special cases (to be
discussed later), all access to the ring is mediated through requests.
The first write to a ring will grab a seqno and mark the ring as having
an outstanding_lazy_request. Either through explicitly adding a request
after an execbuffer or through an implicit wait (either by the CPU or by
a semaphore), that sequence of writes will be terminated with a request.
So we can ellide all the intervening writes to the tail register and
send the entire command stream to the GPU at once. This will reduce the
number of *serialising* writes to the tail register by a factor or 3-5
times (depending upon architecture and number of workarounds, context
switches, etc involved). This becomes even more noticeable when the
register write is overloaded with a number of debugging tools. The
astute reader will wonder if it is then possible to overflow the ring
with a single command. It is not. When we start a command sequence to
the ring, we check for available space and issue a wait in case we have
not. The ring wait will in this case be forced to flush the outstanding
register write and then poll the ACTHD for sufficient space to continue.

The exception to the rule where everything is inside a request are a few
initialisation cases where we may want to write GPU commands via the CS
before userspace wakes up and page flips.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-10 15:35:58 +02:00
Mika Kuoppala
da66146425 drm/i915: include hangcheck action and score in error_state
Score and action reveals what all the rings were doing
and why hang was declared. Add idle state so that
we can distinguish between waiting and idle ring.

v2: - add idle as a hangcheck action
    - consensed hangcheck status to single line (Chris)
    - mark active explicitly when we are making progress (Chris)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-06 17:56:17 +02:00
Chris Wilson
3c0e234c84 drm/i915; Preallocate the lazy request
It is possible for us to be forced to perform an allocation for the lazy
request whilst running the shrinker. This allocation may fail, leaving
us unable to reclaim any memory leading to premature OOM. A neat
solution to the problem is to preallocate the request at the same time
as acquiring the seqno for the ring transaction. This means that we can
report ENOMEM prior to touching the rings.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-05 12:03:53 +02:00
Chris Wilson
1823521d2b drm/i915: Rename ring->outstanding_lazy_request
Prior to preallocating an request for lazy emission, rename the existing
field to make way (and differentiate the seqno from the request struct).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-05 12:03:12 +02:00
Chris Wilson
0d1aacac36 drm/i915: Embed the ring->private within the struct intel_ring_buffer
We now have more devices using ring->private than not, and they all want
the same structure. Worse, I would like to use a scratch page from
outside of intel_ringbuffer.c and so for convenience would like to reuse
ring->private. Embed the object into the struct intel_ringbuffer so that
we can keep the code clean.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-03 19:17:55 +02:00
Damien Lespiau
e3ce7633ba drm/i915: Remove I915_READ_{NOPID, SYNC_0, SYNC_1})()
The code directly uses the registers and ring->mmio_base.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-23 14:52:24 +02:00
Jani Nikula
f2f4d82faf drm/i915: give more distinctive names to ring hangcheck action enums
The short lowercase names are bound to collide. The default warnings
don't even warn about shadowing.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:37 +02:00
Daniel Vetter
c7113cc35f drm/i915: unify ring irq refcounts (again)
With the simplified locking there's no reason any more to keep the
refcounts seperate.

v2: Readd the lost comment that ring->irq_refcount is protected by
dev_priv->irq_lock.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-11 14:36:49 +02:00
Daniel Vetter
59cdb63d52 drm/i915: kill dev_priv->rps.lock
Now that the rps interrupt locking isn't clearly separated (at elast
conceptually) from all the other interrupt locking having a different
lock stopped making sense: It protects much more than just the rps
workqueue it started out with. But with the addition of VECS the
separation started to blurr and resulted in some more complex locking
for the ring interrupt refcount.

With this we can (again) unifiy the ringbuffer irq refcounts without
causing a massive confusion, but that's for the next patch.

v2: Explain better why the rps.lock once made sense and why no longer,
requested by Ben.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-11 14:36:43 +02:00
Mika Kuoppala
ad8beaeada drm/i915: store ring hangcheck action
For guilty batchbuffer analysis later on when rings are reset,
store what state the ring was on when hang was declared.
This helps to weed out the waiting rings from the active ones.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-13 17:42:17 +02:00
Chris Wilson
6274f2126a drm/i915: Don't count semaphore waits towards a stuck ring
If we detect a ring is in a valid wait for another, just let it be.
Eventually it will either begin to progress again, or the entire system
will come grinding to a halt and then hangcheck will fire as soon as the
deadlock is detected.

This error was foretold by Ben in
commit 05407ff889
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Thu May 30 09:04:29 2013 +0300

    drm/i915: detect hang using per ring hangcheck_score

"If ring B is waiting on ring A via semaphore, and ring A is making
progress, albeit slowly - the hangcheck will fire. The check will
determine that A is moving, however ring B will appear hung because
the ACTHD doesn't move. I honestly can't say if that's actually a
realistic problem to hit it probably implies the timeout value is too
low."

v2: Make sure we don't even incur the KICK cost whilst waiting.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65394
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-11 11:49:28 +02:00
Chris Wilson
c65355bbef drm/i915: Track when we dirty the scanout with render commands
This is required for tracking render damage for use with FBC and will be
used in subsequent patches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-07 17:56:45 +02:00
Mika Kuoppala
05407ff889 drm/i915: detect hang using per ring hangcheck_score
Keep track of ring seqno progress and if there are no
progress detected, declare hang. Use actual head (acthd)
to distinguish between ring stuck and batchbuffer looping
situation. Stuck ring will be kicked to trigger progress.

This commit adds a hard limit for batchbuffer completion time.
If batchbuffer completion time is more than 4.5 seconds,
the gpu will be declared hung.

Review comment from Ben which nicely clarifies the semantic change:

"Maybe I'm just stating the functional changes of the patch, but in case
they were unintended here is what I see as potential issues:

1. "If ring B is waiting on ring A via semaphore, and ring A is making
   progress, albeit slowly - the hangcheck will fire. The check will
   determine that A is moving, however ring B will appear hung because
   the ACTHD doesn't move. I honestly can't say if that's actually a
   realistic problem to hit it probably implies the timeout value is too
   low.

2. "There's also another corner case on the kick. If the seqno = 2
   (though not stuck), and on the 3rd hangcheck, the ring is stuck, and
   we try to kick it... we don't actually try to find out if the kick
   helped"

v2: use atchd to detect stuck ring from loop (Ben Widawsky)

v3: Use acthd to check when ring needs kicking.
Declare hang on third time in order to give time for
kick_ring to take effect.

v4: Update commit msg

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Paste in Ben's review comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-03 10:58:21 +02:00