linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-25 20:14:25 +08:00

Author	SHA1	Message	Date
Chris Wilson	57f275a22b	drm/i915: Move common seqno reset to intel_engine_cs.c Since the intel_engine_init_seqno() is shared by all engine submission backends, move it out of the legacy intel_ringbuffer.c and into the new home for common routines, intel_engine_cs.c Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-21-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:08 +01:00
Chris Wilson	adc320c4b7	drm/i915: Move common scratch allocation/destroy to intel_engine_cs.c Since the scratch allocation and cleanup is shared by all engine submission backends, move it out of the legacy intel_ringbuffer.c and into the new home for common routines, intel_engine_cs.c Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-20-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:08 +01:00
Chris Wilson	56c0f1a7c1	drm/i915: Use VMA for scratch page tracking Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-19-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:07 +01:00
Chris Wilson	57e8853181	drm/i915: Use VMA for ringbuffer tracking Use the GGTT VMA as the primary cookie for handing ring objects as the most common action upon the ring is mapping and unmapping which act upon the VMA itself. By restructuring the code to work with the ring VMA, we can shrink the code and remove a few cycles from context pinning. v2: Move the flush of the object back to before the first pin. We use the am-I-bound? query to only have to check the flush on the first bind and so avoid stalling on active rings. Lots of little renames and small hoops. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-18-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:05 +01:00
Chris Wilson	e5cdb22b27	drm/i915: Move assertion for iomap access to i915_vma_pin_iomap Access through the GTT requires the device to be awake. Ideally i915_vma_pin_iomap() is short-lived and the pinning demarcates the access through the iomap. This is not entirely true, we have a mixture of long lived pins that exceed the wakelock (such as legacy ringbuffers) and short lived pin that do live within the wakelock (such as execlist ringbuffers). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-17-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:04 +01:00
Chris Wilson	7abc98fadf	drm/i915: Only change the context object's domain when binding We know that the only access to the context object is via the GPU, and the only time when it can be out of the GPU domain is when it is swapped out and unbound. Therefore we only need to clflush the object when binding, thus avoiding any potential stall on touching the domain on an active context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-16-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:03 +01:00
Chris Wilson	bf3783e52a	drm/i915: Use VMA as the primary object for context state When working with contexts, we most frequently want the GGTT VMA for the context state, first and foremost. Since the object is available via the VMA, we need only then store the VMA. v2: Formatting tweaks to debugfs output, restored some comments removed in the next patch Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-15-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:02 +01:00
Chris Wilson	d31d7cb146	drm/i915: Support for creating write combined type vmaps vmaps has a provision for controlling the page protection bits, with which we can use to control the mapping type, e.g. WB, WC, UC or even WT. To allow the caller to choose their mapping type, we add a parameter to i915_gem_object_pin_map - but we still only allow one vmap to be cached per object. If the object is currently not pinned, then we recreate the previous vmap with the new access type, but if it was pinned we report an error. This effectively limits the access via i915_gem_object_pin_map to a single mapping type for the lifetime of the object. Not usually a problem, but something to be aware of when setting up the object's vmap. We will want to vary the access type to enable WC mappings of ringbuffer and context objects on !llc platforms, as well as other objects where we need coherent access to the GPU's pages without going through the GTT v2: Remove the redundant braces around pin count check and fix the marker in documentation (Chris) v3: - Add a new enum for the vmalloc mapping type & pass that as an argument to i915_object_pin_map. (Tvrtko) - Use PAGE_MASK to extract or filter the mapping type info and remove a superfluous BUG_ON.(Tvrtko) v4: - Rename the enums and clean up the pin_map function. (Chris) v5: Drop the VM_NO_GUARD, minor cosmetics. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Akash Goel <akash.goel@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk	2016-08-12 13:06:36 +01:00
Tvrtko Ursulin	c1bb11451e	drm/i915: Store number of active engines in device info Until now code was calling hweight32 to figure out the number from device_info->ring_mask at runtime. Instead we can cache it at engine init time and use directly. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1470842530-35854-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-08-11 11:33:10 +01:00
Chris Wilson	737aac2465	drm/i915: Mark unmappable GGTT entries as PIN_HIGH We allocate a few objects into the GGTT that we never need to access via the mappable aperture (such as contexts, status pages). We can request that these are bound high in the VM to increase the amount of mappable aperture available. However, anything that may be frequently pinned (such as logical contexts) we want to use the fast search & insert. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470832906-13972-1-git-send-email-chris@chris-wilson.co.uk	2016-08-10 16:07:51 +01:00
Chris Wilson	dbd6ef29a7	drm/i915: Use RCU to annotate and enforce protection for breadcrumb's bh The bottom-half we use for processing the breadcrumb interrupt is a task, which is an RCU protected struct. When accessing this struct, we need to be holding the RCU read lock to prevent it disappearing beneath us. We can use the RCU annotation to mark our irq_seqno_bh pointer as being under RCU guard and then use the RCU accessors to both provide correct ordering of access through the pointer. Most notably, this fixes the access from hard irq context to use the RCU read lock, which both Daniel and Tvrtko complained about. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470761272-1245-3-git-send-email-chris@chris-wilson.co.uk	2016-08-10 10:37:49 +01:00
Matthew Auld	575e3ccbce	drm/i915: fix WaInsertDummyPushConstPs As pointed out by Chris Harris, we are using the wrong WA name, it should in fact be WaToEnableHwFixForPushConstHWBug, also it should be applied from C0 onwards for both BXT and KBL. Fixes: `7b9005cd45` ("drm/i915: Add WaInsertDummyPushConstP for bxt and kbl") Cc: Chris Harris <chris.harris@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reported-by: Chris Harris <chris.harris@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470127013-29653-1-git-send-email-matthew.auld@intel.com	2016-08-05 13:16:55 +03:00
Chris Wilson	dcff85c844	drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex The principal motivation for this was to try and eliminate the struct_mutex from i915_gem_suspend - but we still need to hold the mutex current for the i915_gem_context_lost(). (The issue there is that there may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and struct_mutex via the stop_machine().) For the moment, enabling last request tracking for the engine, allows us to do busyness checking and waiting without requiring the struct_mutex - which is useful in its own right. As a side-effect of having a robust means for tracking engine busyness, we can replace our other busyness heuristic, that of comparing against the last submitted seqno. For paranoid reasons, we have a semi-ordered check of that seqno inside the hangchecker, which we can now improve to an ordered check of the engine's busyness (removing a locked xchg in the process). v2: Pass along "bool interruptible" as being unlocked we cannot rely on i915->mm.interruptible being stable or even under our control. v3: Replace check Ironlake i915_gpu_busy() with the common precalculated value Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-6-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:37 +01:00
Chris Wilson	90f4fcd56b	drm/i915: Remove forced stop ring on suspend/unload Before suspending (or unloading), we would first wait upon all rendering to be completed and then disable the rings. This later step is a remanent from DRI1 days when we did not use request tracking for all operations upon the ring. Now that we are sure we are waiting upon the very last operation by the engine, we can forgo clobbering the ring registers, though we do keep the assert that the engine is indeed idle before sleeping. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-5-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:36 +01:00
Chris Wilson	de895082f7	drm/i915: Remove highly confusing i915_gem_obj_ggtt_pin() Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for i915_gem_object_ggtt_pin(), spare us the confusion and remove it. Removing it now simplifies later patches to change the i915_vma_pin() (and friends) interface. v2: Add a redundant GEM_BUG_ON(!view) to i915_gem_obj_lookup_or_create_ggtt_vma() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-18-git-send-email-chris@chris-wilson.co.uk	2016-08-04 20:20:00 +01:00
Chris Wilson	776f32364d	drm/i915: s/__i915_wait_request/i915_wait_request/ There is only one wait on request function now, so drop the "expert" indication of leading __. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-21-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:29 +01:00
Chris Wilson	37db14700e	drm/i915: Disable waitboosting for a saturated engine If the user floods the GPU with so many requests that the engine stalls waiting for free space, don't automatically promote the GPU to maximum frequencies. If the GPU really is saturated with work, it will migrate to high clocks by itself, otherwise it is merely a user flooding us with busy-work. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-20-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:28 +01:00
Chris Wilson	7da844c5c6	drm/i915: Move the special case wait-request handling to its one caller Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-19-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:27 +01:00
Chris Wilson	675d9ad71b	drm/i915: Track requests inside each intel_ring By tracking each request occupying space inside an individual intel_ring, we can greatly simplify the logic of tracking available space and not worry about other timelines. (Each ring is an ordered timeline of committed requests.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-17-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:26 +01:00
Chris Wilson	efdf7c0605	drm/i915: Rename request->list to link for consistency We use "list" to denote the list and "link" to denote an element on that list. Rename request->list to match this idiom. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-14-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:23 +01:00
Chris Wilson	96a945aa42	drm/i915: Move the common engine cleanup to intel_engine_cs.c Now that we initialize the state to both legacy and execlists inside intel_engine_cs, we should also clean up that state from the common functions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470226756-24401-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-08-03 14:01:12 +01:00
Chris Wilson	ad7bdb2b99	drm/i915: Rename engine->semaphore.sync_to, engine->sempahore.signal locals In order to be more consistent with the rest of the request construction and ring emission, use the common names for the ring and request. Rather than using signaler_req, waiter_req, and intel_ring *wait, we use plain req and ring. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-32-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-23-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:33 +01:00
Chris Wilson	ddf07be7a2	drm/i915: Simplify calling engine->sync_to Since requests can no longer be generated as a side-effect of intel_ring_begin(), we know that the seqno will be unchanged during ring-emission. This predicatablity then means we do not have to check for the seqno wrapping around whilst emitting the semaphore for engine->sync_to(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-31-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-22-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:32 +01:00
Chris Wilson	618e4ca7b1	drm/i915/ringbuffer: Specialise SNB+ request emission for semaphores As gen6_emit_request() only differs from i9xx_emit_request() when semaphores are enabled, only use the specialised vfunc in that scenario. v2: Reorder semaphore init so as to keep engine->emit_request default vfunc selection compact. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-27-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-18-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:29 +01:00
Chris Wilson	b0411e7d45	drm/i915: Reuse legacy breadcrumbs + tail emission As GEN6+ is now a simple variant on the basic breadcrumbs + tail write, reuse the common code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-26-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-17-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:29 +01:00
Chris Wilson	9242f974dc	drm/i915: Stop passing caller's num_dwords to engine->semaphore.signal() Rather than pass in the num_dwords that the caller wishes to use after the signal command packet, split the breadcrumb emission into two phases and have both the signal and breadcrumb individiually acquire space on the ring. This makes the interface simpler for the reader, and will simplify for patches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-25-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-16-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:28 +01:00
Chris Wilson	ddd66c5154	drm/i915: Unify request submission Move request submission from emit_request into its own common vfunc from i915_add_request(). v2: Convert I915_DISPATCH_flags to BIT(x) whilst passing v3: Rename a few functions to match. v4: Reenable execlists submission after disabling guc. v5: Be aware that everyone calls i915_guc_submission_disable()! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-23-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-14-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:26 +01:00
Chris Wilson	8f9420184a	drm/i915: Move the modulus for ring emission to the register write Space reservation is already safe with respect to the ring->size modulus, but hardware only expects to see values in the range 0...ring->size-1 (inclusive) and so requires the modulus to prevent us writing the value ring->size instead of 0. As this is only required for the register itself, we can defer the modulus to the register update and not perform it after every command packet. We keep the intel_ring_advance() around in the code to provide demarcation for the end-of-packet (which then can be compared against intel_ring_begin() as the number of dwords emitted must match the reserved space). v2: Assert that the ring size is a power-of-two to match assumptions in the code. Simplify the comment before writing the tail value to explain why the modulus is necessary. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-13-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:25 +01:00
Chris Wilson	c5efa1ad09	drm/i915: Convert engine->write_tail to operate on a request If we rewrite the I915_WRITE_TAIL specialisation for the legacy ringbuffer as submitting the request onto the ringbuffer, we can unify the vfunc with both execlists and GuC in the next patch. v2: Drop the modulus from the I915_WRITE_TAIL as it is currently being applied in intel_ring_advance() after every command packet, and add a comment explaining why we need the modulus at all. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-22-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-12-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:24 +01:00
Chris Wilson	803688babd	drm/i915: Unify legacy/execlists emission of MI_BATCHBUFFER_START Both the ->dispatch_execbuffer and ->emit_bb_start callbacks do exactly the same thing, add MI_BATCHBUFFER_START to the request's ringbuffer - we need only one vfunc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-20-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-10-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:21 +01:00
Chris Wilson	7c9cf4e33a	drm/i915: Reduce engine->emit_flush() to a single mode parameter Rather than passing a complete set of GPU cache domains for either invalidation or for flushing, or even both, just pass a single parameter to the engine->emit_flush to determine the required operations. engine->emit_flush(GPU, 0) -> engine->emit_flush(EMIT_INVALIDATE) engine->emit_flush(0, GPU) -> engine->emit_flush(EMIT_FLUSH) engine->emit_flush(GPU, GPU) -> engine->emit_flush(EMIT_FLUSH \| EMIT_INVALIDATE) This allows us to extend the behaviour easily in future, for example if we want just a command barrier without the overhead of flushing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-8-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:19 +01:00
Chris Wilson	c7fe7d25ed	drm/i915: Remove obsolete engine->gpu_caches_dirty Space for flushing the GPU cache prior to completing the request is preallocated and so cannot fail - the GPU caches will always be flushed along with the completed request. This means we no longer have to track whether the GPU cache is dirty between batches like we had to with the outstanding_lazy_seqno. With the removal of the duplication in the per-backend entry points for emitting the obsolete lazy flush, we can then further unify the engine->emit_flush. v2: Expand a bit on the legacy of gpu_caches_dirty Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-18-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-7-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:19 +01:00
Chris Wilson	aad29fbbb8	drm/i915: Rename intel_pin_and_map_ring() For more consistent oop-naming, we would use intel_ring_verb, so pick intel_ring_pin() and intel_ring_unpin(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-17-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-6-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:18 +01:00
Chris Wilson	32c04f16f0	drm/i915: Rename residual ringbuf parameters Now that we have a clear ring/engine split and a struct intel_ring, we no longer need the stopgap ringbuf names. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-16-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-5-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:17 +01:00
Chris Wilson	7e37f889b5	drm/i915: Rename struct intel_ringbuffer to struct intel_ring The state stored in this struct is not only the information about the buffer object, but the ring used to communicate with the hardware. Using buffer here is overly specific and, for me at least, conflates with the notion of buffer objects themselves. s/struct intel_ringbuffer/struct intel_ring/ s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/ s/describe_ctx_ringbuf()/describe_ctx_ring()/ s/intel_ring_get_active_head()/intel_engine_get_active_head()/ s/intel_ring_sync_index()/intel_engine_sync_index()/ s/intel_ring_init_seqno()/intel_engine_init_seqno()/ s/ring_stuck()/engine_stuck()/ s/intel_cleanup_engine()/intel_engine_cleanup()/ s/intel_stop_engine()/intel_engine_stop()/ s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/ s/intel_unpin_ringbuffer()/intel_unpin_ring()/ s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/ s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/ s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/ s/intel_ringbuffer_free()/intel_ring_free()/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:16 +01:00
Chris Wilson	1dae2dfb0b	drm/i915: Rename request->ringbuf to request->ring Now that we have disambuigated ring and engine, we can use the clearer and more consistent name for the intel_ringbuffer pointer in the request. @@ struct drm_i915_gem_request *r; @@ - r->ringbuf + r->ring Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-12-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-2-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:15 +01:00
Chris Wilson	b5321f309b	drm/i915: Unify intel_logical_ring_emit and intel_ring_emit Both perform the same actions with more or less indirection, so just unify the code. v2: Add back a few intel_engine_cs locals Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-11-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-1-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:13 +01:00
Chris Wilson	33a051a5fc	drm/i915/cmdparser: Remove stray intel_engine_cs *ring When we refer to intel_engine_cs, we want to use engine so as not to confuse ourselves about ringbuffers. v2: Rename all the functions as well, as well as a few more stray comments. v3: Split the really long error message strings Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-6-git-send-email-chris@chris-wilson.co.uk Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469606850-28659-1-git-send-email-chris@chris-wilson.co.uk	2016-07-27 16:23:05 +01:00
Dave Gordon	38a0f2db5b	drm/i915: rename 'ring' where it refers to an engine or engine_id 'ring' is an old deprecated term for a GPU engine. Chris Wilson wants to use the name for what is currently known as an intel_ringbuffer, but it will be dreadfully confusing if some rings are ringbuffers but other rings are still engines. So this patch changes the names of a bunch of parameters called 'ring' to either 'engine' or 'engine_id' according to what they actually are. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469034967-15840-3-git-send-email-david.s.gordon@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-07-21 09:59:41 +01:00
Mika Kuoppala	4ba9c1f7c7	drm/i915/gen9: Add WaInPlaceDecompressionHang Add this workaround to prevent hang when in place compression is used. References: HSD#2135774 Cc: stable@vger.kernel.org Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-07-20 16:10:20 +03:00
Chris Wilson	39df91905d	drm/i915: Convert i915_semaphores_is_enabled over to early sanitize Rather than recomputing whether semaphores are enabled, we can do that computation once during early initialisation as the i915.semaphores module parameter is now read-only. s/i915_semaphores_is_enabled/i915.semaphores/ v2: Add the state to the debug dmesg as well Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-10-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-9-git-send-email-chris@chris-wilson.co.uk	2016-07-20 13:40:43 +01:00
Chris Wilson	f2f0ed718b	drm/i915: Rename ring->virtual_start as ring->vaddr Just a different colour to better match virtual addresses elsewhere. s/ring->virtual_start/ring->vaddr/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-9-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-8-git-send-email-chris@chris-wilson.co.uk	2016-07-20 13:40:39 +01:00
Chris Wilson	406ea8d22f	drm/i915: Treat ringbuffer writes as write to normal memory Ringbuffers are now being written to either through LLC or WC paths, so treating them as simply iomem is no longer adequate. However, for the older !llc hardware, the hardware is documentated as treating the TAIL register update as serialising, so we can relax the barriers when filling the rings (but even if it were not, it is still an uncached register write and so serialising anyway.). For simplicity, let's ignore the iomem annotation. v2: Remove iomem from ringbuffer->virtual_address v3: And for good measure add iomem elsewhere to keep sparse happy Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v2 Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-8-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-7-git-send-email-chris@chris-wilson.co.uk	2016-07-20 13:40:14 +01:00
Chris Wilson	f8c417cdb1	drm/i915: Rename drm_gem_object_unreference in preparation for lockless free Ultimately wraps kref_put(), so adopt its nomenclature for consistency with other subsystems. s/drm_gem_object_unreference/i915_gem_object_put/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-6-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-5-git-send-email-chris@chris-wilson.co.uk	2016-07-20 13:40:12 +01:00
Chris Wilson	9a6feaf0d7	drm/i915: Rename i915_gem_context_reference/unreference() As these are wrappers around kref_get/kref_put() it is preferable to follow the naming convention and use the same verb get/put in our wrapper names for manipulating a reference to the context. s/i915_gem_context_reference/i915_gem_context_get/ s/i915_gem_context_unreference/i915_gem_context_put/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-3-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-2-git-send-email-chris@chris-wilson.co.uk	2016-07-20 13:40:10 +01:00
Chris Wilson	04769652c8	drm/i915: Derive GEM requests from dma-fence dma-buf provides a generic fence class for interoperation between drivers. Internally we use the request structure as a fence, and so with only a little bit of interfacing we can rebase those requests on top of dma-buf fences. This will allow us, in the future, to pass those fences back to userspace or between drivers. v2: The fence_context needs to be globally unique, not just unique to this device. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-4-git-send-email-chris@chris-wilson.co.uk	2016-07-20 09:29:53 +01:00
Tvrtko Ursulin	019bf27763	drm/i915: Pull out some more common engine init code Created two common helpers for engine setup and engine init phases respectively to help with code sharing. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1468422221-12132-1-git-send-email-tvrtko.ursulin@linux.intel.com Reviewed-by: Chris Wilson <chris-wilson.co.uk>	2016-07-14 11:17:24 +01:00
Tvrtko Ursulin	acd2784562	drm/i915: Simplify intel_init_ring_buffer prototype Engine contains dev_priv so need to pass it in. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Chris Wilson <chris-wilson.co.uk>	2016-07-14 11:17:17 +01:00
Tvrtko Ursulin	c78d606134	drm/i915: Make more use of the shared engine irq setup Use more of the shared engine setup data for legacy engine initialization. This time to simplify the irq initialization code. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris-wilson.co.uk>	2016-07-14 11:17:12 +01:00
Tvrtko Ursulin	8b3e2d3639	drm/i915: Unify engine init loop With the unified common engine setup done, and the execlist engine initialization loop clearly split into two phases, we can eliminate the separate legacy engine initialization code. v2: Fix cleanup path for legacy. v3: Rename constructors. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Chris Wilson <chris-wilson.co.uk>	2016-07-14 11:17:06 +01:00
Dave Gordon	c2c7f24008	drm/i915: unify first-stage engine struct setup intel_lrc.c has a table of "logical rings" (meaning engines), while intel_ringbuffer.c has separately open-coded initialisation for each engine. We can deduplicate this somewhat by using the same first-stage engine-setup function for both modes. So here we expose the function that transfers information from the static table of (all) known engines to the dev_priv->engine array of engines available on this device (adjusting the names along the way) and then embed calls to it in both the LRC and the legacy-mode setup. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Chris Wilson <chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-07-14 11:16:18 +01:00
Ville Syrjälä	035ea405c9	drm/i915: Unbreak interrupts on pre-gen6 Prior to gen6 we didn't have per-ring IMR registers, which means that since commit `61ff75ac20` ("drm/i915: Simplify enabling user-interrupts with L3-remapping") we're now masking off all interrupts when init_render_ring() gets called. That's rather rude. Let's limit the ring IMR frobbing to machines that actually have the per-ring IMR registers. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: `61ff75ac20` ("drm/i915: Simplify enabling user-interrupts with L3-remapping") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1468340687-3596-1-git-send-email-ville.syrjala@linux.intel.com Reviewd-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-07-13 17:04:00 +03:00
Chris Wilson	91c8a326a1	drm/i915: Convert dev_priv->dev backpointers to dev_priv->drm Since drm_i915_private is now a subclass of drm_device we do not need to chase the drm_i915_private->dev backpointer and can instead simply access drm_i915_private->drm directly. text data bss dec hex filename `1068757` 4565 416 1073738 10624a drivers/gpu/drm/i915/i915.ko 1066949 4565 416 1071930 105b3a drivers/gpu/drm/i915/i915.ko Created by the coccinelle script: @@ struct drm_i915_private *d; identifier i; @@ ( - d->dev->i + d->drm.i \| - d->dev + &d->drm ) and for good measure the dev_priv->dev backpointer was removed entirely. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk	2016-07-05 11:58:45 +01:00
Chris Wilson	fac5e23e3c	drm/i915: Mass convert dev->dev_private to to_i915(dev) Since we now subclass struct drm_device, we can save pointer dances by noting the equivalence of struct drm_device and struct drm_i915_private, i.e. by using to_i915(). text data bss dec hex filename 1073824 4562 416 1078802 107612 drivers/gpu/drm/i915/i915.ko 1068976 4562 416 1073954 106322 drivers/gpu/drm/i915/i915.ko Created by the coccinelle script: @@ expression E; identifier p; @@ - struct drm_i915_private p = E->dev_private; + struct drm_i915_private p = to_i915(E); Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-1-git-send-email-chris@chris-wilson.co.uk	2016-07-04 12:54:07 +01:00
Chris Wilson	7b4d3a16dd	drm/i915: Remove stop-rings debugfs interface Now that we have (near) universal GPU recovery code, we can inject a real hang from userspace and not need any fakery. Not only does this mean that the testing is far more realistic, but we can simplify the kernel in the process. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-7-git-send-email-chris@chris-wilson.co.uk	2016-07-04 08:18:23 +01:00
Chris Wilson	61ff75ac20	drm/i915: Simplify enabling user-interrupts with L3-remapping Borrow the idea from intel_lrc.c to precompute the mask of interrupts we wish to always enable to avoid having lots of conditionals inside the interrupt enabling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-19-git-send-email-chris@chris-wilson.co.uk	2016-07-01 21:04:17 +01:00
Chris Wilson	31bb59cc01	drm/i915: Move the get/put irq locking into the caller With only a single callsite for intel_engine_cs->irq_get and ->irq_put, we can reduce the code size by moving the common preamble into the caller, and we can also eliminate the reference counting. For completeness, as we are no longer doing reference counting on irq, rename the get/put vfunctions to enable/disable respectively and are able to review the use of posting reads. We only require the serialisation with hardware when enabling the interrupt (i.e. so we cannot miss an interrupt by going to sleep before the hardware truly enables it). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-18-git-send-email-chris@chris-wilson.co.uk	2016-07-01 21:04:16 +01:00
Chris Wilson	f8973c217f	drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk) On Ironlake, there is no command nor register to ensure that the write from a MI_STORE command is completed (and coherent on the CPU) before the command parser continues. This means that the ordering between the seqno write and the subsequent user interrupt is undefined (like gen6+). So to ensure that the seqno write is completed after the final user interrupt we need to delay the read sufficiently to allow the write to complete. This delay is undefined by the bspec, and empirically requires 75us even though a register read combined with a clflush is less than 500ns. Hence, the delay is due to an on-chip buffer rather than the latency of the write to memory. Note that the render ring controls this by filling the PIPE_CONTROL fifo with stalling commands that force the earliest pipe-control with the seqno to be completed before the command parser continues. Given that we need a barrier operation for BSD, we may as well forgo the extra per-batch latency by using a common per-interrupt barrier. Studying the impact of adding the usleep shows that in both sequences of and individual synchronous no-op batches is negligible for the media engine (where the write now is unordered with the interrupt). Converting the render engine over from the current glutton of pie-controls over to the per-interrupt delays speeds up both the sequential and individual synchronous no-ops by 20% and 60%, respectively. This speed up holds even when looking at the throughput of small copies (4KiB->4MiB), both serial and synchronous, by about 20%. This is because despite adding a significant delay to the interrupt, in all likelihood we will see the seqno write without having to apply the barrier (only in the rare corner cases where the write is delayed on the last required is the delay necessary). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94307 Testcase: igt/gem_sync #ilk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-12-git-send-email-chris@chris-wilson.co.uk	2016-07-01 21:00:53 +01:00
Chris Wilson	7d5ea80720	drm/i915: Refactor scratch object allocation for gen2 w/a buffer The gen2 w/a buffer is stuffed into the same slot as the gen5+ scratch buffer. If we pass in the size we want to allocate for the scratch buffer, both callers can use the same routine. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-11-git-send-email-chris@chris-wilson.co.uk	2016-07-01 21:00:50 +01:00
Chris Wilson	de8fe1663a	drm/i915: Allocate scratch page from stolen With the last direct CPU access to the scratch page removed, we can now allocate it from our small amount of reserved system pages (stolen memory). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-10-git-send-email-chris@chris-wilson.co.uk	2016-07-01 20:59:45 +01:00
Chris Wilson	f8291952bd	drm/i915: Stop mapping the scratch page into CPU space After the elimination of using the scratch page for Ironlake's breadcrumb, we no longer need to kmap the object. We therefore can move it into the high unmappable space and do not need to force the object to be coherent (i.e. snooped on !llc platforms). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-9-git-send-email-chris@chris-wilson.co.uk	2016-07-01 20:58:48 +01:00
Chris Wilson	1b7744e7ba	drm/i915: Use HWS for seqno tracking everywhere By using the same address for storing the HWS on every platform, we can remove the platform specific vfuncs and reduce the get-seqno routine to a single read of a cached memory location. v2: Fix semaphore_passed() to look at the signaling engine (not the waiter's) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-8-git-send-email-chris@chris-wilson.co.uk	2016-07-01 20:58:48 +01:00
Chris Wilson	688e6c7258	drm/i915: Slaughter the thundering i915_wait_request herd One particularly stressful scenario consists of many independent tasks all competing for GPU time and waiting upon the results (e.g. realtime transcoding of many, many streams). One bottleneck in particular is that each client waits on its own results, but every client is woken up after every batchbuffer - hence the thunder of hooves as then every client must do its heavyweight dance to read a coherent seqno to see if it is the lucky one. Ideally, we only want one client to wake up after the interrupt and check its request for completion. Since the requests must retire in order, we can select the first client on the oldest request to be woken. Once that client has completed his wait, we can then wake up the next client and so on. However, all clients then incur latency as every process in the chain may be delayed for scheduling - this may also then cause some priority inversion. To reduce the latency, when a client is added or removed from the list, we scan the tree for completed seqno and wake up all the completed waiters in parallel. Using igt/benchmarks/gem_latency, we can demonstrate this effect. The benchmark measures the number of GPU cycles between completion of a batch and the client waking up from a call to wait-ioctl. With many concurrent waiters, with each on a different request, we observe that the wakeup latency before the patch scales nearly linearly with the number of waiters (before external factors kick in making the scaling much worse). After applying the patch, we can see that only the single waiter for the request is being woken up, providing a constant wakeup latency for every operation. However, the situation is not quite as rosy for many waiters on the same request, though to the best of my knowledge this is much less likely in practice. Here, we can observe that the concurrent waiters incur extra latency from being woken up by the solitary bottom-half, rather than directly by the interrupt. This appears to be scheduler induced (having discounted adverse effects from having a rbtree walk/erase in the wakeup path), each additional wake_up_process() costs approximately 1us on big core. Another effect of performing the secondary wakeups from the first bottom-half is the incurred delay this imposes on high priority threads - rather than immediately returning to userspace and leaving the interrupt handler to wake the others. To offset the delay incurred with additional waiters on a request, we could use a hybrid scheme that did a quick read in the interrupt handler and dequeued all the completed waiters (incurring the overhead in the interrupt handler, not the best plan either as we then incur GPU submission latency) but we would still have to wake up the bottom-half every time to do the heavyweight slow read. Or we could only kick the waiters on the seqno with the same priority as the current task (i.e. in the realtime waiter scenario, only it is woken up immediately by the interrupt and simply queues the next waiter before returning to userspace, minimising its delay at the expense of the chain, and also reducing contention on its scheduler runqueue). This is effective at avoid long pauses in the interrupt handler and at avoiding the extra latency in realtime/high-priority waiters. v2: Convert from a kworker per engine into a dedicated kthread for the bottom-half. v3: Rename request members and tweak comments. v4: Use a per-engine spinlock in the breadcrumbs bottom-half. v5: Fix race in locklessly checking waiter status and kicking the task on adding a new waiter. v6: Fix deciding when to force the timer to hide missing interrupts. v7: Move the bottom-half from the kthread to the first client process. v8: Reword a few comments v9: Break the busy loop when the interrupt is unmasked or has fired. v10: Comments, unnecessary churn, better debugging from Tvrtko v11: Wake all completed waiters on removing the current bottom-half to reduce the latency of waking up a herd of clients all waiting on the same request. v12: Rearrange missed-interrupt fault injection so that it works with igt/drv_missed_irq_hang v13: Rename intel_breadcrumb and friends to intel_wait in preparation for signal handling. v14: RCU commentary, assert_spin_locked v15: Hide BUG_ON behind the compiler; report on gem_latency findings. v16: Sort seqno-groups by priority so that first-waiter has the highest task priority (and so avoid priority inversion). v17: Add waiters to post-mortem GPU hang state. v18: Return early for a completed wait after acquiring the spinlock. Avoids adding ourselves to the tree if the is already complete, and skips the awkward question of why we don't do completion wakeups for waits earlier than or equal to ourselves. v19: Prepare for init_breadcrumbs to fail. Later patches may want to allocate during init, so be prepared to propagate back the error code. Testcase: igt/gem_concurrent_blit Testcase: igt/benchmarks/gem_latency Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: "Goel, Akash" <akash.goel@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18 Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk	2016-07-01 20:58:43 +01:00
Chris Wilson	ed00307893	drm/i915/ringbuffer: Move all default irq vfuncs init to a separate func Just plonk all the default irq vfuncs together in one function to keep the initialisers of reasonable size. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467361093-20209-2-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-07-01 09:48:24 +01:00
Chris Wilson	6f7bef75d1	drm/i915/ringbuffer: Move all generic engine->dispatch_batchbuffer together Consolidate the block of default vfuncs for dispatching the batchbuffer. Just a minor tweak on top of Tvrtko's great job of tidying up the vfunc initialisation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467361093-20209-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-07-01 09:48:00 +01:00
Tvrtko Ursulin	8d228911ff	drm/i915: Trim some if-else braces Just a bit of cleanup after the previous refactoring. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467212972-861-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-06-30 17:20:45 +01:00
Tvrtko Ursulin	4b8e38a9ef	drm/i915: Consolidate legacy semaphore initialization Replace per-engine initialization with a common half-programatic, half-data driven code for ease of maintenance and compactness. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:45 +01:00
Tvrtko Ursulin	c38c651b39	drm/i915: Compact gen8_ring_sync Store the semaphore offset in a temporary variable to avoid having to get the VMA offset twice. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:45 +01:00
Tvrtko Ursulin	1b9e665064	drm/i915: Compact Gen8 semaphore initialization Replace the macro initializer with a programatic loop which results in smaller code and hopefully just as clear. v2: Rebase. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:45 +01:00
Tvrtko Ursulin	db3d4019ab	drm/i915: Move semaphore object creation into intel_ring_init_semaphores The object needs to be created before semaphores can be initialized on any ring and it makes sense to pull it out to this semaphore dedicated helper. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:45 +01:00
Tvrtko Ursulin	d9a64610b2	drm/i915: Consolidate semaphore vfuncs init Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-06-30 17:20:44 +01:00
Tvrtko Ursulin	960ecaad75	drm/i915: Consolidate dispatch_execbuffer vfunc v2: Put dispatch_execbuffer before add_request. (Chris Wilson) v3: Fix add_request and irq_seqno_barrier for gen8+. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:44 +01:00
Tvrtko Ursulin	1d8a1337f3	drm/i915: Consolidate init_hw vfunc Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:44 +01:00
Tvrtko Ursulin	604096d778	drm/i915: Consolidate get/set_seqno Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:44 +01:00
Tvrtko Ursulin	b9700325cb	drm/i915: Consolidate get and put irq vfuncs v2: Consistent INTEL_GEN vs IS_GEN usage. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:44 +01:00
Tvrtko Ursulin	cc54a82830	drm/i915: Consolidate seqno_barrier vfunc Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:43 +01:00
Tvrtko Ursulin	7445a2a413	drm/i915: Consolidate add_request vfunc All engines apart from render select this based on Gen. Move it to the common helper as well. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:43 +01:00
Tvrtko Ursulin	06a2fe2279	drm/i915: Consolidate write_tail vfunc initializer Introduce a function which initializes vfuncs mostly common across engines and move write_tail initialization in it since only one engine overrides the default. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:43 +01:00
Chris Wilson	76f8421f2a	drm/i915: Perform Sandybridge BSD tail write under the forcewake Since we have a sequence of register reads and writes, we can reduce the latency of starting the BSD ring by performing all the mmio operations under the same forcewake wakeref. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-62-git-send-email-chris@chris-wilson.co.uk	2016-06-30 15:42:34 +01:00
Chris Wilson	d54fe4aad7	drm/i915: Convert wait_for(I915_READ(reg)) to intel_wait_for_register() By using the out-of-line intel_wait_for_register() not only do we can efficiency from using the hybrid wait_for() contained within, but we avoid code bloat from the numerous inlined loops, in total (all patches): text data bss dec hex filename 1078551 4557 416 1083524 108884 drivers/gpu/drm/i915/i915.ko 1070775 4557 416 1075748 106a24 drivers/gpu/drm/i915/i915.ko Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-48-git-send-email-chris@chris-wilson.co.uk	2016-06-30 15:42:27 +01:00
Chris Wilson	3d808eb171	drm/i915: Convert wait_for(I915_READ(reg)) to intel_wait_for_register() By using the out-of-line intel_wait_for_register() not only do we can efficiency from using the hybrid wait_for() contained within, but we avoid code bloat from the numerous inlined loops, in total (all patches): text data bss dec hex filename 1078551 4557 416 1083524 108884 drivers/gpu/drm/i915/i915.ko 1070775 4557 416 1075748 106a24 drivers/gpu/drm/i915/i915.ko Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-47-git-send-email-chris@chris-wilson.co.uk	2016-06-30 15:42:26 +01:00
Chris Wilson	25ab57f42f	drm/i915: Convert wait_for(I915_READ(reg)) to intel_wait_for_register() By using the out-of-line intel_wait_for_register() not only do we can efficiency from using the hybrid wait_for() contained within, but we avoid code bloat from the numerous inlined loops, in total (all patches): text data bss dec hex filename 1078551 4557 416 1083524 108884 drivers/gpu/drm/i915/i915.ko 1070775 4557 416 1075748 106a24 drivers/gpu/drm/i915/i915.ko Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-46-git-send-email-chris@chris-wilson.co.uk	2016-06-30 15:42:26 +01:00
Chris Wilson	c7c3c07d16	drm/i915: Treat kernel context as initialised The kernel context exists simply as a placeholder and should never be executed with a render context. It does not need the golden render state, as that will always be applied to a user context. By skipping the initialisation we can avoid issues in attempting to program the golden render context when trying to make the hardware idle. v2: Rebase Testcase: igt/drm_module_reload_basic #byt Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95634 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-3-git-send-email-chris@chris-wilson.co.uk	2016-06-24 15:02:44 +01:00
Chris Wilson	0cb26a8ed1	drm/i915: Move legacy kernel context pinning to intel_ringbuffer.c This is so that we have symmetry with intel_lrc.c and avoid a source of if (i915.enable_execlists) layering violation within i915_gem_context.c - that is we move the specific handling of the dev_priv->kernel_context for legacy submission into the legacy submission code. This depends upon the init/fini ordering between contexts and engines already defined by intel_lrc.c, and also exporting the context alignment required for pinning the legacy context. v2: Separate out pin/unpin context funcs for greater symmetry with intel_lrc. One more step towards unifying behaviour between the two classes of engines and towards fixing another bug in i915_switch_context vs requests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-2-git-send-email-chris@chris-wilson.co.uk	2016-06-24 15:02:34 +01:00
arun.siluvery@linux.intel.com	780f0aebda	drm/i915/bxt: Add WaDisablePooledEuLoadBalancingFix This is a WA affecting pooled eu which is a bxt specific feature. Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Winiarski, Michal <michal.winiarski@intel.com> Cc: Zou, Nanhai <nanhai.zou@intel.com> Cc: Yang, Rong R <rong.r.yang@intel.com> Cc: Jeff McGee <jeff.mcgee@intel.com> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-06-14 10:44:07 +01:00
Tim Gore	a8ab5ed5e1	drm/i915/gen9: implement WaConextSwitchWithConcurrentTLBInvalidate This patch enables a workaround for a mid thread preemption issue where a hardware timing problem can prevent the context restore from happening, leading to a hang. v2: move to gen9_init_workarounds (Arun) v3: move to start of gen9_init_workarounds (Arun) Signed-off-by: Tim Gore <tim.gore@intel.com> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465816501-25557-1-git-send-email-tim.gore@intel.com	2016-06-13 14:48:02 +01:00
Mika Kuoppala	71dce58c8e	drm/i915/skl: Extend WaDisableChickenBitTSGBarrierAckForFFSliceCS There is ambiguity in the documentation between D0 and E0. Extend this workaround to E0. References: BSID#779 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-23-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:26:05 +03:00
Mika Kuoppala	954337aa96	drm/i915/kbl: Add WaDisableSbeCacheDispatchPortSharing This is needed for all kbl revision. v2: Don't add revid checks to generic gen9 init (Arun) References: HSD#2135593 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-21-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:25:49 +03:00
Mika Kuoppala	4de5d7ccbc	drm/i915/kbl: Add WaDisableGafsUnitClkGating We need to disable clock gating in this unit to work around hardware issue causing possible corruption/hang. v2: name the bit (Ville) v3: leave the fix enabled for 2227050 and set correct bit (Matthew) v4: Split out the skl part in separate commit for easier backport References: HSD#2227156, HSD#2227050 Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-20-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:25:41 +03:00
Mika Kuoppala	ad2bdb44b1	drm/i915: Add WaInsertDummyPushConstP for bxt and kbl Add this workaround for both bxt and kbl up to until rev B0. References: HSD#2136703 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-16-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:24:54 +03:00
Mika Kuoppala	c0b730d572	drm/i915/kbl: Add WaDisableDynamicCreditSharing Bspec states that we need to turn off dynamic credit sharing on kbl revid a0 and b0. This happens by writing bit 28 on 0x4ab8. References: HSD#2225601, HSD#2226938, HSD#2225763 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-15-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:24:34 +03:00
Mika Kuoppala	fe90581987	drm/i915/kbl: Add WaDisableLSQCROPERFforOCL Extend the scope of this workaround, already used in skl, to also take effect in kbl. v2: Fix KBL_REVID_E0 (Matthew) References: HSD#2132677 Cc: Matthew Auld <matthew.william.auld@gmail.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-12-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:24:03 +03:00
Mika Kuoppala	8401d42fd5	drm/i915/kbl: Add WaDisableFenceDestinationToSLM for A0 Add this workaround for kbl revid A0 only. v2: rebase v3: carve out a non related workaround (Chris) References: HSD#1911714 Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-9-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:22:44 +03:00
Mika Kuoppala	e587f6cb0a	drm/i915/kbl: Add WaEnableGapsTsvCreditFix We need this crucial workaround from skl also to all kbl revisions. Lack of it was causing system hangs on skl enabling so this is a must have. v2: Don't add revid checks to gen9 init workarounds (Arun) References: HSD#2126660 Cc: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-8-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:21:39 +03:00
Mika Kuoppala	bbaefe72a0	drm/i915: Mimic skl with WaForceEnableNonCoherent Past evidence with system hangs and hsds tie WaForceEnableNonCoherent and WaDisableHDCInvalidation to WaForceContextSaveRestoreNonCoherent. Documentation states that WaForceContextSaveRestoreNonCoherent would not be needed on skl past E0 but evidence proved otherwise. See commit <510650e8b2ab> ("drm/i915/skl: Fix spurious gpu hang with gt3/gt4 revs"). In this scope consider kbl to be skl with a bigger revision than E0 so play it safe and bind these two workarounds to the WaForceContextSaveRestoreNonCoherent, and apply to all gen9. v2: fix comment (Matthew) References: HSD#2134449, HSD#2131413 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-7-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:21:22 +03:00
Mika Kuoppala	5b0e365929	drm/i915/gen9: Always apply WaForceContextSaveRestoreNonCoherent The revision id range for this workaround has changed. So apply it to all revids on all gen9. References: HSD#2134449 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-6-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:20:33 +03:00
Mika Kuoppala	e5f81d65ac	drm/i915/kbl: Init gen9 workarounds Kabylake is part of gen9 family so init the generic gen9 workarounds for it. v2: rebase Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-3-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:19:53 +03:00
Mika Kuoppala	eee8efb02a	drm/i915/skl: Add WaDisableGafsUnitClkGating We need to disable clock gating in this unit to work around hardware issue causing possible corruption/hang. v2: name the bit (Ville) v3: leave the fix enabled for 2227050 and set correct bit (Matthew) References: HSD#2227156, HSD#2227050 Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-2-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:19:32 +03:00
arun.siluvery@linux.intel.com	6bb6285582	drm/i915/gen9: Add WaVFEStateAfterPipeControlwithMediaStateClear Kernel only need to add a register to HW whitelist, required for a preemption related issue. Reference: HSD#2131039 Reviewed-by: Jeff McGee <jeff.mcgee@intel.com> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465203169-16591-1-git-send-email-arun.siluvery@linux.intel.com	2016-06-06 13:04:37 +01:00
Chris Wilson	e075a32f51	drm/i915: Stop automatically retiring requests after a GPU hang Following a GPU hang, we break out of the request loop in order to unlock the struct_mutex for use by the GPU reset. However, if we retire all the requests at that moment, we cannot identify the guilty request after performing the reset. v2: Not automatically retiring requests forces us to recheck for available ringspace. Fixes: `f4457ae71f` ("drm/i915: Prevent leaking of -EIO from i915_wait_request()") Testcase: igt/gem_reset_stats/ban-* Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Tested-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1463137042-9669-4-git-send-email-chris@chris-wilson.co.uk	2016-05-13 12:39:20 +01:00
Tvrtko Ursulin	ac657f6461	drm/i915: Introduce IS_GEN macro To be used for more efficient Gen range checking. v2: Remove spurious chunk. (Chris Wilson) v3: Rebase. v4: Renamed from INTEL_GEN_RANGE and added GEN_FOREVER. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v3) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1462874228-6601-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-05-11 12:27:28 +01:00
Tvrtko Ursulin	7e22dbbbae	drm/i915: Replace "INTEL_INFO->gen == x" checks with IS_GENx This way optimization from a previous patch works even better. v2: Rebase. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jani Nikula <jani.nikula@intel.com>	2016-05-11 12:27:27 +01:00
Chris Wilson	c033666a94	drm/i915: Store a i915 backpointer from engine, and use it text data bss dec hex filename 6309351 3578714 696320 10584385 a18141 vmlinux 6308391 3578714 696320 10583425 a17d81 vmlinux Almost 1KiB of code reduction. v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions text data bss dec hex filename 6304579 3578778 696320 10579677 a16edd vmlinux 6303427 3578778 696320 10578525 a16a5d vmlinux Now over 1KiB! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk	2016-05-09 13:41:24 +01:00
Imre Deak	36579cb63b	drm/i915: Clean up L3 SQC register field definitions No need for hard-coding the register value, the corresponding fields are defined properly in BSpec. No functional change. v2: - Rebased on BXT L3 SQC tuning patch merged meanwhile. CC: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1) Link: http://patchwork.freedesktop.org/patch/msgid/1462280061-1457-3-git-send-email-imre.deak@intel.com	2016-05-03 16:49:15 +03:00
Chris Wilson	6ef48d7f01	drm/i915: Reload PD tables after semaphore wait on gen8 When the engine idles waiting upon a semaphore, it loses its pagetables and we must reload them before executing the batch. v2: Restrict w/a to non-RCS rings (RCS works correctly apparently). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461932305-14637-5-git-send-email-chris@chris-wilson.co.uk	2016-04-29 13:59:56 +01:00
Chris Wilson	f9a4ea35b8	drm/i915: Fix serialisation of pipecontrol write vs semaphore signal In order for the MI_SEMAPHORE_SIGNAL command to wait until after the pipecontrol writing the signal value is complete, we have to pause the CS inside the PIPE_CONTROL with the CS_STALL bit. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461932305-14637-4-git-send-email-chris@chris-wilson.co.uk	2016-04-29 13:59:41 +01:00
Chris Wilson	215a7e3210	drm/i915: Fix gen8 semaphores id for legacy mode With the introduction of a distinct engine->id vs the hardware id, we need to fix up the value we use for selecting the target engine when signaling a semaphore. Note that these values can be merged with engine->guc_id. Fixes: `de1add3605` Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461932305-14637-3-git-send-email-chris@chris-wilson.co.uk	2016-04-29 13:59:26 +01:00
Chris Wilson	a58c01aa6e	drm/i915: Apply strongly ordered RCS breadcrumb to gen8/legacy For legacy ringbuffer mode, we need the new ordered breadcrumb emission tried and tested on execlists in order to avoid the dreaded "missed interrupt" syndrome. A secondary advantage of the execlists method is that it writes to an arbitrary address, useful if one wants to write a breadcrumb elsewhere. This fix is taken from commit `7c17d37737` (drm/i915: Use ordered seqno write interrupt generation on gen8+ execlists). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461932305-14637-1-git-send-email-chris@chris-wilson.co.uk	2016-04-29 13:58:26 +01:00
Chris Wilson	a0442461f2	drm/i915: Trim the flush for the legacy request emission At the start of request emission, we flush some space for the request, estimating the typical size for the request body. The tail is now much larger than the typical body, so we can shrink the flush slightly. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461917226-9132-2-git-send-email-chris@chris-wilson.co.uk	2016-04-29 10:20:51 +01:00
Chris Wilson	6310346e75	drm/i915: Preallocate enough space for the average request Rather than being interrupted when we run out of space halfway through the request, and having to restart from the beginning (and returning to userspace), flush a little more free space when we prepare the request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-15-git-send-email-chris@chris-wilson.co.uk	2016-04-28 12:17:32 +01:00
Chris Wilson	bfa0120073	drm/i915: Manually unwind after a failed request allocation In the next patches, we want to move the work out of freeing the request and into its retirement (so that we can free the request without requiring the struct_mutex). This means that we cannot rely on unreferencing the request to completely teardown the request any more and so we need to manually unwind the failed allocation. In doing so, we reorder the allocation in order to make the unwind simple (and ensure that we don't try to unwind a partial request that may have modified global state) and so we end up pushing the initial preallocation down into the engine request initialisation functions where we have the requisite control over the state of the request. Moving the initial preallocation into the engine is less than ideal: it moves logic to handle a specific problem with request handling out of the common code. On the other hand, it does allow those backends significantly more flexibility in performing its allocations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-14-git-send-email-chris@chris-wilson.co.uk	2016-04-28 12:17:32 +01:00
Chris Wilson	0251a96322	drm/i915: Remove the identical implementations of request space reservation Now that we share intel_ring_begin(), reserving space for the tail of the request is identical between legacy/execlists and so the tautology can be removed. In the process, we move the reserved space tracking from the ringbuffer on to the request. This is to enable us to reorder the reserved space allocation in the next patch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-13-git-send-email-chris@chris-wilson.co.uk	2016-04-28 12:17:32 +01:00
Chris Wilson	987046ad65	drm/i915: Unify intel_ring_begin() Combine the near identical implementations of intel_logical_ring_begin() and intel_ring_begin() - the only difference is that the logical wait has to check for a matching ring (which is assumed by legacy). In the process some debug messages are culled as there were following a WARN if we hit an actual error. v2: Updated commentary Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-12-git-send-email-chris@chris-wilson.co.uk	2016-04-28 12:17:32 +01:00
Chris Wilson	3d77e9be62	drm/i915: Use i915_vma_pin_iomap on the ringbuffer object Similarly to i915_gem_object_pin_map on LLC platforms, we can use the new VMA based io mapping on !LLC to amoritize the cost of ringbuffer pinning and unpinning. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-6-git-send-email-chris@chris-wilson.co.uk	2016-04-28 12:17:32 +01:00
Chris Wilson	fe3db79b0b	drm/i915: Propagate error from drm_gem_object_init() Propagate the real error from drm_gem_object_init(). Note this also fixes some confusion in the error return from i915_gem_alloc_object... v2: (Matthew Auld) - updated new users of gem_alloc_object from latest drm-nightly - replaced occurrences of IS_ERR_OR_NULL() with IS_ERR() v3: (Joonas Lahtinen) - fix double "From:" in commit message - add goto teardown path v4: (Matthew Auld) - rebase with i915_gem_alloc_object name change Signed-off-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461587533-8841-1-git-send-email-matthew.auld@intel.com [Joonas: Removed spurious " = NULL" from _init() function] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-04-28 12:28:58 +03:00
Chris Wilson	bcbdb6d011	drm/i915: Protect gen7 irq_seqno_barrier with uncore lock Faced with sporadic machine hangs on gen7, that mimic the issue of concurrent writes to the same cacheline and seem to start with commit `9b9ed30936` (drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+), let us restore the spinlock around the mmio read. Fixes: `9b9ed30936` (drm/i915: Remove forcewake dance from seqno/irq...) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461744121-27051-1-git-send-email-chris@chris-wilson.co.uk Tested-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2016-04-28 09:11:44 +01:00
Dave Gordon	d37cd8a887	drm/i915: rename i915_gem_alloc_object() to i915_gem_object_create() Because having both i915_gem_object_alloc() and i915_gem_alloc_object() (with different return conventions) is just too confusing! (i915_gem_object_alloc() is the low-level memory allocator, and remains unchanged, whereas i915_gem_alloc_object() is a constructor that ALSO initialises the newly-allocated object.) Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461348872-4702-1-git-send-email-david.s.gordon@intel.com	2016-04-25 12:31:34 +01:00
Tim Gore	050fc4653c	drm/i915:bxt: implement WaProgramL3SqcReg1DefaultForPerf This patch applies a performance enhancement workaround based on analysis of DX and OCL S-Curve workloads. We increase the General Priority Credits for L3SQ from the hardware default of 56 to the max value 62, and decrease the High Priority credits from 8 to 2. v2: Only apply to B0 onwards v3: Move w/a to per engine init, ie bxt_init_workarounds Signed-off-by: Tim Gore <tim.gore@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461314761-36854-1-git-send-email-tim.gore@intel.com Reviewed-by: Michel Thierry <michel.thierry@intel.com>	2016-04-25 10:06:56 +01:00
Dave Gordon	8305216ff8	drm/i915: check for ERR_PTR from i915_gem_object_pin_map() The newly-introduced function i915_gem_object_pin_map() returns an ERR_PTR (not NULL) if the pin-and-map opertaion fails, so that's what we must check for. And it's nicer not to assign such a pointer-or-error to a structure being filled in until after it's been validated, so we should keep it local and avoid exporting a bogus pointer. Also, for clarity and symmetry, we should clear 'virtual_start' along with 'vma' when unmapping a ringbuffer. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-04-20 16:59:03 +01:00
Tvrtko Ursulin	791bee125b	drm/i915: Remove a couple pointless WARN_ONs Just two WARN_ONs followed by pointer dereference I spotted by accident. v2: Remove some more of the same. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1461080770-14693-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-04-20 09:59:17 +01:00
Tim Gore	bfd8ad4e4a	drm/i915/gen9: implement WaEnableSamplerGPGPUPreemptionSupport WaEnableSamplerGPGPUPreemptionSupport fixes a problem related to mid thread pre-emption. Signed-off-by: Tim Gore <tim.gore@intel.com> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461077152-31899-1-git-send-email-tim.gore@intel.com	2016-04-20 09:58:19 +01:00
Chris Wilson	a687a43a48	drm/i915: Force ringbuffers to not be at offset 0 For reasons unknown Sandybridge GT1 (at least) will eventually hang when it encounters a ring wraparound at offset 0. The test case that reproduces the bug reliably forces a large number of interrupted context switches, thereby causing very frequent ring wraparounds, but there are similar bug reports in the wild with the same symptoms, seqno writes stop just before the wrap and the ringbuffer at address 0. It is also timing crucial, but adding various delays hasn't helped pinpoint where the window lies. Whether the fault is restricted to the ringbuffer itself or the GTT addressing is unclear, but moving the ringbuffer fixes all the hangs I have been able to reproduce. References: (e.g.) https://bugs.freedesktop.org/show_bug.cgi?id=93262 Testcase: igt/gem_exec_whisper/render-contexts-interruptible #snb-gt1 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-12-git-send-email-chris@chris-wilson.co.uk	2016-04-14 10:45:40 +01:00
Chris Wilson	f4457ae71f	drm/i915: Prevent leaking of -EIO from i915_wait_request() Reporting -EIO from i915_wait_request() has proven very troublematic over the years, with numerous hard-to-reproduce bugs cropping up in the corner case of where a reset occurs and the code wasn't expecting such an error. If the we reset the GPU or have detected a hang and wish to reset the GPU, the request is forcibly complete and the wait broken. Currently, we report either -EAGAIN or -EIO in order for the caller to retreat and restart the wait (if appropriate) after dropping and then reacquiring the struct_mutex (essential to allow the GPU reset to proceed). However, if we take the view that the request is complete (no further work will be done on it by the GPU because it is dead and soon to be reset), then we can proceed with the task at hand and then drop the struct_mutex allowing the reset to occur. This transfers the burden of checking whether it is safe to proceed to the caller, which in all but one instance it is safe - completely eliminating the source of all spurious -EIO. Of note, we only have two API entry points where we expect that userspace can observe an EIO. First is when submitting an execbuf, if the GPU is terminally wedged, then the operation cannot succeed and an -EIO is reported. Secondly, existing userspace uses the throttle ioctl to detect an already wedged GPU before starting using HW acceleration (or to confirm that the GPU is wedged after an error condition). So if the GPU is wedged when the user calls throttle, also report -EIO. v2: Split more carefully the change to i915_wait_request() and assorted ABI from the reset handling. v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code so that we don't start to leak EIO there in future (and break our hang resistant modesetting). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-9-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-1-git-send-email-chris@chris-wilson.co.uk	2016-04-14 10:45:40 +01:00
Chris Wilson	299259a3a9	drm/i915: Store the reset counter when constructing a request As the request is only valid during the same global reset epoch, we can record the current reset_counter when constructing the request and reuse it when waiting upon that request in future. This removes a very hairy atomic check serialised by the struct_mutex at the time of waiting and allows us to transfer those waits to a central dispatcher for all waiters and all requests. PS: With per-engine resets, we obviously cannot assume a global reset epoch for the requests - a per-engine epoch makes the most sense. The challenge then is how to handle checking in the waiter for when to break the wait, as the fine-grained reset may also want to requeue the request (i.e. the assumption that just because the epoch changes the request is completed may be broken - or we just avoid breaking that assumption with the fine-grained resets). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-7-git-send-email-chris@chris-wilson.co.uk	2016-04-14 10:45:40 +01:00
Chris Wilson	c19ae989b0	drm/i915: Hide the atomic_read(reset_counter) behind a helper This is principally a little bit of syntatic sugar to hide the atomic_read()s throughout the code to retrieve the current reset_counter. It also provides the other utility functions to check the reset state on the already read reset_counter, so that (in later patches) we can read it once and do multiple tests rather than risk the value changing between tests. v2: Be more strict on converting existing i915_reset_in_progress() over to the more verbose i915_reset_in_progress_or_wedged(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-4-git-send-email-chris@chris-wilson.co.uk	2016-04-14 10:45:40 +01:00
Mika Kuoppala	97ea6be161	drm/i915/skl: Fix spurious gpu hang with gt3/gt4 revs Experiments with heaven 4.0 benchmark and skylake gt3e (rev 0xa) suggest that WaForceContextSaveRestoreNonCoherent is needed for all revs. Extending this to all revs cures a gpu hang with rev 0xa when running heaven4.0 gpu benchmark. We have been here before, with problems enabling gt4e and extending up to revision F0 instead of false claims of bspec of E0 only. See commit <e238659ddd88> ("drm/i915/skl: Default to noncoherent access up to F0"). In retrospect we should have covered this with this big blanket back then already, as E0 vs F0 discrepancy was suspicious enough. Previously the WaForceEnableNonCoherent has been tied to context non-coherence, atleast in relevant hsds. So keep this tie and extended this alongside. Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Timo Aaltonen <tjaalton@ubuntu.com> Cc: stable@vger.kernel.org Reported-by: Mike Lothian <mike@fireburn.co.uk> References: https://bugs.freedesktop.org/show_bug.cgi?id=93491 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> Tested-by: Timo Aaltonen <tjaalton@ubuntu.com> Link: http://patchwork.freedesktop.org/patch/msgid/1459860977-27751-2-git-send-email-mika.kuoppala@intel.com	2016-04-13 15:30:25 +03:00
Chris Wilson	0a798eb92e	drm/i915: Refactor duplicate object vmap functions We now have two implementations for vmapping a whole object, one for dma-buf and one for the ringbuffer. If we couple the mapping into the obj->pages lifetime, then we can reuse an obj->mapping for both and at the same time couple it into the shrinker. There is a third vmapping routine in the cmdparser that maps only a range within the object, for the time being that is left alone, but will eventually use these routines in order to cache the mapping between invocations. v2: Mark the failable kmalloc() as __GFP_NOWARN (vsyrjala) v3: Call unpin_vmap from the right dmabuf unmapper v4: Rename vmap to map as we don't wish to imply the type of mapping involved, just that it contiguously maps the object into kernel space. Add kerneldoc and lockdep annotations Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460113874-17366-4-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>	2016-04-11 17:11:44 +01:00
Chris Wilson	d2cad5358b	drm/i915: Consolidate common error handling in intel_pin_and_map_ringbuffer_obj After we pin the ringbuffer into the GGTT, all error paths need to unpin it again. Move this common step into one block, and make the unable to iomap error code consistent (i.e. treat it as out of memory to avoid confusing it with a invalid argument). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460113874-17366-3-git-send-email-chris@chris-wilson.co.uk	2016-04-11 17:11:08 +01:00
Chris Wilson	c04e0f3b4e	drm/i915: Separate out the seqno-barrier from engine->get_seqno In order to simplify future patches, extract the lazy_coherency optimisation our of the engine->get_seqno() vfunc into its own callback. v2: Rename the barrier to engine->irq_seqno_barrier to try and better reflect that the barrier is only required after the user interrupt before reading the seqno (to ensure that the seqno update lands in time as we do not have strict seqno-irq ordering on all platforms). Reviewed-by: Dave Gordon <david.s.gordon@intel.com> [#v2] v3: Comments for hangcheck paranoia. Mika wanted to keep the extra barrier inside the hangcheck, just in case. I can argue that it doesn't provide a barrier against anything, but the side-effects of applying the barrier may prevent a false declaration of a hung GPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460195877-20520-2-git-send-email-chris@chris-wilson.co.uk	2016-04-09 12:09:05 +01:00
Chris Wilson	9b9ed30936	drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+ In order to ensure seqno/irq coherency, we currently read a ring register. The mmio transaction following the interrupt delays the inspection of the seqno long enough for the MI_STORE_DWORD_IMM to update the CPU cache. However, it is only the memory timing that is important for the purposes of the delay, we do not need nor desire the extra forcewake. v3: Update commentary Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2] Link: http://patchwork.freedesktop.org/patch/msgid/1460195877-20520-1-git-send-email-chris@chris-wilson.co.uk	2016-04-09 12:08:53 +01:00
Akash Goel	782f6bc0ab	drm/i915: Fixup the free space logic in ring_prepare Currently for the case where there is enough space at the end of Ring buffer for accommodating only the base request, the wrapround is done immediately and as a result the base request gets added at the start of Ring buffer. But there may not be enough free space at the beginning to accommodate the base request, as before the wraparound, the wait was effectively done for the reserved_size free space from the start of Ring buffer. In such a case there is a potential of Ring buffer overflow, the instructions at the head of Ring (ACTHD) can get overwritten. Since the base request can fit in the remaining space, there is no need to wraparound immediately. The wraparound will anyway happen later when the reserved part starts getting used. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Akash Goel <akash.goel@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1457688402-10411-1-git-send-email-akash.goel@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org	2016-04-09 11:37:48 +01:00
Chris Wilson	01347126f4	drm/i915: Reset engine->last_submitted_seqno When we change the current seqno, we also need to remember to reset the last_submitted_seqno for the engine. Testcase: igt/gem_exec_whisper Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460010558-10705-7-git-send-email-chris@chris-wilson.co.uk	2016-04-08 11:44:22 +01:00
Chris Wilson	a058d93482	drm/i915: Reset semaphore page for gen8 An oversight is that when we wrap the seqno, we need to reset the hw semaphore counters to 0. We did this for gen6 and gen7 and forgot to do so for the new implementation required for gen8 (legacy). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460010558-10705-6-git-send-email-chris@chris-wilson.co.uk	2016-04-08 11:43:59 +01:00
Chris Wilson	29dcb57026	drm/i915: Move the hw semaphore initialisation from GEM to the engine Since we are setting engine local values that are tied to the hardware, move it out of i915_gem_init_seqno() into the intel_ring_init_seqno() backend, next to where the other hw semaphore registers are written. v2: Make the explanatory comment about always resetting the semaphores to 0 irrespective of the value of the reset seqno. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460010558-10705-4-git-send-email-chris@chris-wilson.co.uk	2016-04-08 11:43:38 +01:00
Chris Wilson	d04bce48a5	drm/i915: Remove unneeded drm_device pointer from intel_ring_init_seqno() We only use drm_i915_private within the function, so delete the unneeded drm_device local. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460010558-10705-3-git-send-email-chris@chris-wilson.co.uk	2016-04-08 11:43:25 +01:00
Joonas Lahtinen	72e96d6450	drm/i915: Refer to GGTT {,VM} consistently Refer to the GGTT VM consistently as "ggtt->base" instead of just "ggtt", "vm" or indirectly through other variables like "dev_priv->ggtt.base" to avoid confusion with the i915_ggtt object itself and PPGTT VMs. Refer to the GGTT as "ggtt" instead of indirectly through chaining. As a bonus gets rid of the long-standing i915_obj_to_ggtt vs. i915_gem_obj_to_ggtt conflict, due to removal of i915_obj_to_ggtt! v2: - Added some more after grepping sources with Chris v3: - Refer to GGTT VM through ggtt->base consistently instead of ggtt_vm (Chris) v4: - Convert all dev_priv->ggtt->foo accesses to ggtt->foo. v5: - Make patch checker happy Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-03-31 17:55:43 +03:00
Dave Gordon	c3232b1883	drm/i915: introduce for_each_engine_id() Equivalent to the existing for_each_engine() macro, this will replace the latter wherever the third argument is actually wanted (in most places, it is not used). The third argument is renamed to emphasise that it is an engine id (type enum intel_engine_id). All the callers of the macro that actually need the third argument are updated to use this version, and the argument (generally 'i') is also updated to be 'id'. Other callers (where the third argument is unused) are untouched for now; they will be updated in the next patch. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-03-24 14:34:06 +00:00
Tomas Elf	fc0768ceac	drm/i915/tdr: Initialize hangcheck struct for each engine Initialize hangcheck struct during driver load. Since we do the same after recovering from a reset, this is extracted into a helper function. v2: remove redundant hangcheck init during load as this is done when engines are initialized (Chris) Cc: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1458577619-12006-1-git-send-email-arun.siluvery@linux.intel.com	2016-03-22 13:52:42 +02:00
Joonas Lahtinen	62106b4f6b	drm/i915: Rename dev_priv->gtt to dev_priv->ggtt Refer to Global GTT consistently as GGTT, thus rename dev_priv->gtt to dev_priv->ggtt and struct i915_gtt to struct i915_ggtt. Fix a couple of whitespace problems while at it. v2: - Fix a typo in commit message. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-03-18 15:18:15 +02:00
Tim Gore	950b2aaeea	drm/i915/gen9: add WaClearFlowControlGpgpuContextSave This allows writes to EU flow control registers. Together with SIP code from the user-mode driver this resolves a hang seen in some pre-emption scenarios. Note that this patch is just the kernel mode part of this workaround. v2. Oops, add FLOW_CONTROL_ENABLE macro to i915_reg.h. Signed-off-by: Tim Gore <tim.gore@intel.com> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1458144826-17269-1-git-send-email-tim.gore@intel.com	2016-03-18 11:12:29 +00:00
Tvrtko Ursulin	39dabecd99	drm/i915: Use shorter route to dev_private where possible Where we have a request we can use req->i915 directly instead of going through the engine and device. Coccinelle script: @@ function f; identifier r; @@ f(..., struct drm_i915_gem_request r, ...) { ... - engine->dev->dev_private + r->i915 ... } @@ struct drm_i915_gem_request req; @@ ( req-> - engine->dev->dev_private + i915 ) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1458219850-21007-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-03-18 09:50:37 +00:00
Tvrtko Ursulin	117897f42c	drm/i915: More renaming of rings to engines This time using only sed and a few by hand. v2: Rename also intel_ring_id and intel_ring_initialized. v3: Fixed typo in intel_ring_initialized. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1458126040-33105-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-03-16 15:33:30 +00:00
Tvrtko Ursulin	666796da7a	drm/i915: More intel_engine_cs renaming Some trivial ones, first pass done with Coccinelle: @@ @@ ( - I915_NUM_RINGS + I915_NUM_ENGINES \| - intel_ring_flag + intel_engine_flag \| - for_each_ring + for_each_engine \| - i915_gem_request_get_ring + i915_gem_request_get_engine \| - intel_ring_idle + intel_engine_idle \| - i915_gem_reset_ring_status + i915_gem_reset_engine_status \| - i915_gem_reset_ring_cleanup + i915_gem_reset_engine_cleanup \| - init_ring_lists + init_engine_lists ) But that didn't fully work so I cleaned it up with: for f in .[hc]; do sed -i -e s/I915_NUM_RINGS/I915_NUM_ENGINES/ $f; done for f in .[hc]; do sed -i -e s/i915_gem_request_get_ring/i915_gem_request_get_engine/ $f; done for f in .[hc]; do sed -i -e s/intel_ring_flag/intel_engine_flag/ $f; done for f in .[hc]; do sed -i -e s/intel_ring_idle/intel_engine_idle/ $f; done for f in .[hc]; do sed -i -e s/init_ring_lists/init_engine_lists/ $f; done for f in .[hc]; do sed -i -e s/i915_gem_reset_ring_cleanup/i915_gem_reset_engine_cleanup/ $f; done for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_status/i915_gem_reset_engine_status/ $f; done v2: Rebase. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-03-16 15:33:24 +00:00
Tvrtko Ursulin	4a570db57c	drm/i915: Rename intel_engine_cs struct members below and a couple manual fixups. @@ identifier I, J; @@ struct I { ... - struct intel_engine_cs J; + struct intel_engine_cs engine; ... } @@ identifier I, J; @@ struct I { ... - struct intel_engine_cs J; + struct intel_engine_cs engine; ... } @@ struct drm_i915_private d; @@ ( - d->ring + d->engine ) @@ struct i915_execbuffer_params p; @@ ( - p->ring + p->engine ) @@ struct intel_ringbuffer r; @@ ( - r->ring + r->engine ) @@ struct drm_i915_gem_request req; @@ ( - req->ring + req->engine ) v2: Script missed the tracepoint code - fixed up by hand. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-03-16 15:33:17 +00:00
Tvrtko Ursulin	0bc40be85f	drm/i915: Rename intel_engine_cs function parameters @@ identifier func; @@ func(..., struct intel_engine_cs * - ring + engine , ...) { <... - ring + engine ...> } @@ identifier func; type T; @@ T func(..., struct intel_engine_cs * - ring + engine , ...); Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-03-16 15:33:10 +00:00
Tvrtko Ursulin	e2f8039147	drm/i915: Rename local struct intel_engine_cs variables Done by the Coccinelle script below plus a manual intervention to GEN8_RING_SEMAPHORE_INIT. @@ expression E; @@ - struct intel_engine_cs ring = E; + struct intel_engine_cs engine = E; <+... - ring + engine ...+> @@ @@ - struct intel_engine_cs ring; + struct intel_engine_cs engine; <+... - ring + engine ...+> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-03-16 15:33:00 +00:00
Chris Wilson	e26e1b976d	drm/i915: Don't ERROR for an expected intel_rcs_ctx_init() interruption intel_rcs_ctx_init() can be interrupted by a signal (if it has to wait upon a full ring to advance). Don't emit an error for this. Testcase: igt/gem_concurrent_blit Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1454086145-16160-3-git-send-email-chris@chris-wilson.co.uk	2016-02-15 10:48:08 +01:00
Daniele Ceraolo Spurio	ff3dc0875c	drm/i915: check that rpm ref is held when accessing ringbuf in stolen mem While running some tests on the scheduler patches with rpm enabled I came across a corruption in the ringbuffer, which was root-caused to the GPU being suspended while commands were being emitted to the ringbuffer. The access to memory was failing because the GPU needs to be awake when accessing stolen memory (where my ringbuffer was located). Since we have this constraint it looks like a sensible idea to check that we hold a refcount when we access the rungbuffer. v2: move the check from ring_begin to ringbuffer iomap time (Chris) v3: update comment (Chris) Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1453909429-11024-1-git-send-email-daniele.ceraolospurio@intel.com	2016-02-11 06:57:32 +01:00
Arun Siluvery	6ecf56ae1d	drm/i915/gen9: Add WaOCLCoherentLineFlush This is mainly required for future enabling of pre-emptive command execution. v2: explain purpose of change (Chris) Reviewed-by: Nick Hoath <nicholas.hoath@intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1453412634-29238-9-git-send-email-arun.siluvery@linux.intel.com Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2016-01-25 16:49:15 +01:00
Arun Siluvery	a78536e73f	drm/i915/skl: Enable Per context Preemption granularity control Per context preemption granularity control is only available from SKL:E0+ Actual WA is to disable percontext preemption granularity control until D0 which is the default case so this is equivalent to the inverse of WaDisablePerCtxtPreemptionGranularityControl:skl v2: add some detail to commit msg (Chris) Reviewed-by: Nick Hoath <nicholas.hoath@intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1453412634-29238-8-git-send-email-arun.siluvery@linux.intel.com Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2016-01-25 16:48:52 +01:00

1 2 3 4 5 ...

782 Commits