linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-20 09:34:44 +08:00

Author	SHA1	Message	Date
Mika Kuoppala	24a65e624b	drm/i915/hangcheck: Prevent long walks across full-ppgtt With full-ppgtt, it takes the GPU an eon to traverse the entire 256PiB address space, causing a loop to be detected. Under the current scheme, if ACTHD walks off the end of a batch buffer and into an empty address space, we "never" detect the hang. If we always increment the score as the ACTHD is progressing then we will eventually timeout (after ~46.5s (31 * 1.5s) without advancing onto a new batch). To counter act this, increase the amount we reduce the score for good batches, so that only a series of almost-bad batches trigger a full reset. DoS detection suffers slightly but series of long running shader tests will benefit. Based on a patch from Chris Wilson. Testcase: igt/drv_hangman/hangcheck-unterminated Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1456930109-21532-1-git-send-email-mika.kuoppala@intel.com	2016-03-04 15:17:14 +02:00
Tvrtko Ursulin	c6a2ac712d	drm/i915: Execlists small cleanups and micro-optimisations Assorted changes in the areas of code cleanup, reduction of invariant conditional in the interrupt handler and lock contention and MMIO access optimisation. * Remove needless initialization. * Improve cache locality by reorganizing code and/or using branch hints to keep unexpected or error conditions out of line. * Favor busy submit path vs. empty queue. * Less branching in hot-paths. v2: * Avoid mmio reads when possible. (Chris Wilson) * Use natural integer size for csb indices. * Remove useless return value from execlists_update_context. * Extract 32-bit ppgtt PDPs update so it is out of line and shared with two callers. * Grab forcewake across all mmio operations to ease the load on uncore lock and use chepear mmio ops. v3: * Removed some more pointless u8 data types. * Removed unused return from execlists_context_queue. * Commit message updates. v4: * Unclumsify the unqueue if statement. (Chris Wilson) * Hide forcewake from the queuing function. (Chris Wilson) Version 3 now makes the irq handling code path ~20% smaller on 48-bit PPGTT hardware, and a little bit less elsewhere. Hot paths are mostly in-line now and hammering on the uncore spinlock is greatly reduced together with mmio traffic to an extent. Benchmarking with "gem_latency -n 100" (keep submitting batches with 100 nop instruction) shows approximately 4% higher throughput, 2% less CPU time and 22% smaller latencies. This was on a big-core while small-cores could benefit even more. Most likely reason for the improvements are the MMIO optimization and uncore lock traffic reduction. One odd result is with "gem_latency -n 0" (dispatching empty batches) which shows 5% more throughput, 8% less CPU time, 25% better producer and consumer latencies, but 15% higher dispatch latency which is yet unexplained. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1456505912-22286-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-03-01 10:36:02 +00:00
Alex Dai	397097b026	drm/i915/guc: Decouple GuC engine id from ring id Previously GuC uses ring id as engine id because of same definition. But this is not true since this commit: commit `de1add3605` Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Date: Fri Jan 15 15:12:50 2016 +0000 drm/i915: Decouple execbuf uAPI from internal implementation Added GuC engine id into GuC interface to decouple it from ring id used by driver. v2: Keep ring name print out in debugfs; using for_each_ring() where possible to keep driver consistent. (Chris W.) Signed-off-by: Alex Dai <yu.dai@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1453579094-29860-1-git-send-email-yu.dai@intel.com	2016-01-25 10:56:30 +00:00
Chris Wilson	426960bed3	drm/i915: Seal busy-ioctl uABI and prevent leaking of internal ids Tvrtko was looking through the execbuffer-ioctl and noticed that the uABI was tightly coupled to our internal engine identifiers. Close inspection also revealed that we leak those internal engine identifiers through the busy-ioctl, and those internal identifiers already do not match the user identifiers. Fortuitiously, there is only one user of the set of busy rings from the busy-ioctl, and they only wish to choose between the RENDER and the BLT engines. Let's fix the userspace ABI while we still can. v2: Update the uAPI documentation to explain the identifiers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Testcase: igt/gem_busy Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1452876706-21620-1-git-send-email-chris@chris-wilson.co.uk	2016-01-21 11:00:35 +00:00
Tvrtko Ursulin	de1add3605	drm/i915: Decouple execbuf uAPI from internal implementation At the moment execbuf ring selection is fully coupled to internal ring ids which is not a good thing on its own. This dependency is also spread between two source files and not spelled out at either side which makes it hidden and fragile. This patch decouples this dependency by introducing an explicit translation table of execbuf uAPI to ring id close to the only call site (i915_gem_do_execbuffer). This way we are free to change driver internal implementation details without breaking userspace. All state relating to the uAPI is now contained in, or next to, i915_gem_do_execbuffer. As a side benefit, this patch decreases the compiled size of i915_gem_do_execbuffer. v2: Extract ring selection into eb_select_ring. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1452870770-13981-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-01-21 10:55:44 +00:00
Chris Wilson	7c17d37737	drm/i915: Use ordered seqno write interrupt generation on gen8+ execlists Broadwell and later currently use the same unordered command sequence to update the seqno in the HWS status page and then assert the user interrupt. We should apply the w/a from legacy (where we do an mmio read to delay the seqno read after the interrupt), but this is not enough to enforce coherent seqno visibilty on Skylake. Rather than search for the proper post-interrupt seqno barrier, use a strongly ordered command sequence to write the seqno, then assert the user interrupt from the ring. v2: Move around the wa tail dwords to avoid adding duplicate code. v3: Add references, comments on workarounds and bit5 check. References: https://bugs.freedesktop.org/show_bug.cgi?id=93693 Testcase: igt/gem_ring_sync_loop #skl Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1453297415-17793-1-git-send-email-mika.kuoppala@intel.com	2016-01-21 11:53:09 +02:00
Dave Gordon	ed54c1a1d1	drm/i915: abolish separate per-ring default_context pointers Now that we've eliminated a lot of uses of ring->default_context, we can eliminate the pointer itself. All the engines share the same default intel_context, so we can just keep a single reference to it in the dev_priv structure rather than one in each of the engine[] elements. This make refcounting more sensible too, as we now have a refcount of one for the one pointer, rather than a refcount of one but multiple pointers. From an idea by Chris Wilson. v2: transform an extra instance of ring->default_context introduced by `42f1cae8c` drm/i915: Restore inhibiting the load of the default context That patch's commentary includes: v2: Mark the global default context as uninitialized on GPU reset so that the context-local workarounds are reloaded upon re-enabling The code implementing that now also benefits from the replacement of the multiple (per-ring) pointers to the default context with a single pointer to the unique kernel context. v4: Rebased, remove underused local (Nick Hoath) Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Nick Hoath <nicholas.hoath@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-3-git-send-email-david.s.gordon@intel.com Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2016-01-21 09:21:29 +01:00
Jani Nikula	e282891472	drm/i915: turn some bogus kernel-doc comments to normal comments Apparently accidental or misplaced /** kernel-doc comments, confusing the tool. Turn them to normal comments. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1453101588-18008-2-git-send-email-jani.nikula@intel.com	2016-01-20 10:22:08 +02:00
Tvrtko Ursulin	0eb973d31d	drm/i915: Cache ringbuffer GTT VMA Purpose is to avoid calling i915_gem_obj_ggtt_offset from the interrupt context without the big lock held. v2: Renamed gtt_start to gtt_offset. (Daniel Vetter) v3: Cache the VMA instead of address. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1452870629-13830-2-git-send-email-tvrtko.ursulin@linux.intel.com	2016-01-18 09:58:40 +00:00
Tvrtko Ursulin	ca82580c9c	drm/i915: Do not call API requiring struct_mutex where it is not available LRC code was calling GEM API like i915_gem_obj_ggtt_offset from places where the struct_mutex cannot be grabbed (irq handlers). To avoid that this patch caches some interesting bits and values in the engine and context structures. Some usages are also removed where they are not needed like a few asserts which are either impossible or have been checked already during engine initialization. Side benefit is also that interrupt handlers and command submission stop evaluating invariant conditionals, like what Gen we are running on, on every interrupt and every command submitted. This patch deals with logical ring context id and descriptors while subsequent patches will deal with the remaining issues. v2: * Cache the VMA instead of the address. (Chris Wilson) * Incorporate Dave Gordon's good comments and function name. v3: * Extract ctx descriptor template to a function and group functions dealing with ctx descriptor & co together near top of the file. (Dave Gordon) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1452870629-13830-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-01-18 09:58:36 +00:00
Mika Kuoppala	61642ff035	drm/i915: Inspect subunit states on hangcheck If head seems stuck and engine in question is rcs, inspect subunit state transitions from undone to done, before deciding that this really is a hang instead of limited progress. Only account the transitions of subunits from undone to done once, to prevent unstable subunit states to keep us falsely active. As this adds one extra steps to hangcheck heuristics, before hang is declared, it adds 1500ms to to detect hang for render ring to a total of 7500ms. We could sample the subunit states on first head stuck condition but decide not to do so only in order to mimic old behaviour. This way the check order of promotion from seqno > atchd > instdone is consistently done. v2: Deal with unstable done states (Arun) Clear instdone progress on head and seqno movement (Chris) Report raw and accumulated instdone's in in debugfs (Chris) Return HANGCHECK_ACTIVE on undone->done References: https://bugs.freedesktop.org/show_bug.cgi?id=93029 Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Arun Siluvery <arun.siluvery@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1448985372-19535-1-git-send-email-mika.kuoppala@intel.com	2016-01-08 13:06:04 +02:00
Dave Gordon	b0366a54b4	drm/i915: intel_ring_initialized() must be simple and inline Based on Chris Wilson's patch from 6 months ago, rebased and adapted. The current implementation of intel_ring_initialized() is too heavyweight; it's a non-inlined function that chases several levels of pointers. This wouldn't matter too much if it were rarely called, but it's used inside the iterator test of for_each_ring() and is therefore called quite frequently. So let's make it simple and inline ... The idea here is to use ring->dev as an indicator showing which engines have been initialised and are therefore to be included in iterations that use for_each_ring(). This allows us to avoid multiple memory references and a (non-inlined) function call on each iteration of each such loop. Fixes regression from commit `48d823878d` Author: Oscar Mateo <oscar.mateo@intel.com> Date: Thu Jul 24 17:04:23 2014 +0100 drm/i915/bdw: Generic logical ring init and cleanup Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1449586956-32360-2-git-send-email-david.s.gordon@intel.com	2015-12-10 14:14:36 +01:00
Ville Syrjälä	f0f59a00a1	drm/i915: Type safe register read/write Make I915_READ and I915_WRITE more type safe by wrapping the register offset in a struct. This should eliminate most of the fumbles we've had with misplaced parens. This only takes care of normal mmio registers. We could extend the idea to other register types and define each with its own struct. That way you wouldn't be able to accidentally pass the wrong thing to a specific register access function. The gpio_reg setup is probably the ugliest thing left. But I figure I'd just leave it for now, and wait for some divine inspiration to strike before making it nice. As for the generated code, it's actually a bit better sometimes. Eg. looking at i915_irq_handler(), we can see the following change: lea 0x70024(%rdx,%rax,1),%r9d mov $0x1,%edx - movslq %r9d,%r9 - mov %r9,%rsi - mov %r9,-0x58(%rbp) - callq 0xd8(%rbx) + mov %r9d,%esi + mov %r9d,-0x48(%rbp) callq 0xd8(%rbx) So previously gcc thought the register offset might be signed and decided to sign extend it, just in case. The rest appears to be mostly just minor shuffling of instructions. v2: i915_mmio_reg_{offset,equal,valid}() helpers added s/_REG/_MMIO/ in the register defines mo more switch statements left to worry about ring_emit stuff got sorted in a prep patch cmd parser, lrc context and w/a batch buildup also in prep patch vgpu stuff cleaned up and moved to a prep patch all other unrelated changes split out v3: Rebased due to BXT DSI/BLC, MOCS, etc. v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/ Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com	2015-11-18 15:39:11 +02:00
Ville Syrjälä	f92a916220	drm/i915: Add functions to emit register offsets to the ring When register type safety happens, we can't just try to emit the register itself to the ring. Instead we'll need to extract the offset from it first. Add some convenience functions that will do that. v2: Convert MOCS setup too Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1446672017-24497-20-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2015-11-18 14:35:24 +02:00
Chris Wilson	608c1a526c	drm/i915: Recover all available ringbuffer space following reset Having flushed all requests from all queues, we know that all ringbuffers must now be empty. However, since we do not reclaim all space when retiring the request (to prevent HEADs colliding with rapid ringbuffer wraparound) the amount of available space on each ringbuffer upon reset is less than when we start. Do one more pass over all the ringbuffers to reset the available space Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Arun Siluvery <arun.siluvery@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Dave Gordon <david.s.gordon@intel.com>	2015-10-28 17:10:31 +00:00
Chris Wilson	01101fa7cc	drm/i915: Refactor common ringbuffer allocation code A small, very small, step to sharing the duplicate code between execlists and legacy submission engines, starting with the ringbuffer allocation code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Arun Siluvery <arun.siluvery@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-09-04 10:17:00 +02:00
Imre Deak	319404df2f	drm/i915/bxt: work around HW coherency issue when accessing GPU seqno By running igt/store_dword_loop_render on BXT we can hit a coherency problem where the seqno written at GPU command completion time is not seen by the CPU. This results in __i915_wait_request seeing the stale seqno and not completing the request (not considering the lost interrupt/GPU reset mechanism). I also verified that this isn't a case of a lost interrupt, or that the command didn't complete somehow: when the coherency issue occured I read the seqno via an uncached GTT mapping too. While the cached version of the seqno still showed the stale value the one read via the uncached mapping was the correct one. Work around this issue by clflushing the corresponding CPU cacheline following any store of the seqno and preceding any reading of it. When reading it do this only when the caller expects a coherent view. v2: - fix using the proper logical && instead of a bitwise & (Jani, Mika) - limit the workaround to A stepping, on later steppings this HW issue is fixed v3: - use a separate get_seqno/set_seqno vfunc (Chris) Testcase: igt/store_dword_loop_render Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-08-26 09:39:13 +02:00
Daniel Vetter	ca6e440577	Merge tag 'drm-intel-fixes-2015-07-15' into drm-intel-next-queued Backmerge fixes since it's getting out of hand again with the massive split due to atomic between -next and 4.2-rc. All the bugfixes in 4.2-rc are addressed already (by converting more towards atomic instead of minimal duct-tape) so just always pick the version in next for the conflicts in modeset code. All the other conflicts are just adjacent lines changed. Conflicts: drivers/gpu/drm/i915/i915_drv.h drivers/gpu/drm/i915/i915_gem_gtt.c drivers/gpu/drm/i915/intel_display.c drivers/gpu/drm/i915/intel_drv.h drivers/gpu/drm/i915/intel_ringbuffer.h Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2015-07-15 16:36:50 +02:00
Tomas Elf	94f7bbe150	drm/i915: Snapshot seqno of most recently submitted request. The hang checker needs to inspect whether or not the ring request list is empty as well as if the given engine has reached or passed the most recently submitted request. The problem with this is that the hang checker cannot grab the struct_mutex, which is required in order to safely inspect requests since requests might be deallocated during inspection. In the past we've had kernel panics due to this very unsynchronized access in the hang checker. One solution to this problem is to not inspect the requests directly since we're only interested in the seqno of the most recently submitted request - not the request itself. Instead the seqno of the most recently submitted request is stored separately, which the hang checker then inspects, circumventing the issue of synchronization from the hang checker entirely. This fixes a regression introduced in commit `44cdd6d219` Author: John Harrison <John.C.Harrison@Intel.com> Date: Mon Nov 24 18:49:40 2014 +0000 drm/i915: Convert 'ring_idle()' to use requests not seqnos v2 (Chris Wilson): - Pass current engine seqno to ring_idle() from i915_hangcheck_elapsed() rather than compute it over again. - Remove extra whitespace. Issue: VIZ-5998 Signed-off-by: Tomas Elf <tomas.elf@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Add regressing commit citation provided by Chris.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-07-13 22:42:39 +02:00
Abdiel Janulgue	919032ec7c	drm/i915: Enable resource streamer bits on MI_BATCH_BUFFER_START Adds support for enabling the resource streamer on the legacy ringbuffer for HSW and GEN8. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-07-06 10:25:57 +02:00
John Harrison	79bbcc299f	drm/i915: Reserve space improvements An earlier patch was added to reserve space in the ring buffer for the commands issued during 'add_request()'. The initial version was pessimistic in the way it handled buffer wrapping and would cause premature wraps and thus waste ring space. This patch updates the code to better handle the wrap case. It no longer enforces that the space being asked for and the reserved space are a single contiguous block. Instead, it allows the reserve to be on the far end of a wrap operation. It still guarantees that the space is available so when the wrap occurs, no wait will happen. Thus the wrap cannot fail which is the whole point of the exercise. Also fixed a merge failure with some comments from the original patch. v2: Incorporated suggestion by David Gordon to move the wrap code inside the prepare function and thus allow a single combined wait_for_space() call rather than doing one before the wrap and another after. This also makes the prepare code much simpler and easier to follow. v3: Fix for 'effective_size' vs 'size' during ring buffer remainder calculations (spotted by Tomas Elf). For: VIZ-5115 CC: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-07-03 07:38:59 +02:00
John Harrison	bccca494f7	drm/i915: Remove the now obsolete 'outstanding_lazy_request' The outstanding_lazy_request is no longer used anywhere in the driver. Everything that was looking at it now has a request explicitly passed in from on high. Everything that was relying upon it behind the scenes is now explicitly creating/passing/submitting its own private request. Thus the OLR can be removed. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:32 +02:00
John Harrison	59c35a4d12	drm/i915: Remove the now obsolete intel_ring_get_request() Much of the driver has now been converted to passing requests around instead of rings/ringbufs/contexts. Thus the function for retreiving the request from a ring (i.e. the OLR) is no longer used and can be removed. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:31 +02:00
John Harrison	ccd98fe499	drm/i915: Add _ring_begin() to request allocation Now that the _ring_begin() functions no longer call the request allocation code, it is finally safe for the request allocation code to call *_ring_begin(). This is important to guarantee that the space reserved for the subsequent i915_add_request() call does actually get reserved. v2: Renamed functions according to review feedback (Tomas Elf). For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:30 +02:00
John Harrison	5fb9de1a2e	drm/i915: Update intel_ring_begin() to take a request structure Now that everything above has been converted to use requests, intel_ring_begin() can be updated to take a request instead of a ring. This also means that it no longer needs to lazily allocate a request if no-one happens to have done it earlier. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:29 +02:00
John Harrison	bba09b12b4	drm/i915: Update cacheline_align() to take a request structure Updated intel_ring_cacheline_align() to take a request instead of a ring. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:28 +02:00
John Harrison	f71696876a	drm/i915: Update ring->signal() to take a request structure Updated the various ring->signal() implementations to take a request instead of a ring. This removes their reliance on the OLR to obtain the seqno value that should be used for the signal. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:27 +02:00
John Harrison	599d924c6b	drm/i915: Update ring->sync_to() to take a request structure Updated the ring->sync_to() implementations to take a request instead of a ring. Also updated the tracer to include the request id. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> [danvet: Rebase since I didn't merge the patch which added ->uniq.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:27 +02:00
John Harrison	be795fc17b	drm/i915: Update ring->emit_bb_start() to take a request structure Updated the ring->emit_bb_start() implementation to take a request instead of a ringbuf/context pair. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:26 +02:00
John Harrison	53fddaf70d	drm/i915: Update ring->dispatch_execbuffer() to take a request structure Updated the various ring->dispatch_execbuffer() implementations to take a request instead of a ring. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:25 +02:00
John Harrison	c4e766389e	drm/i915: Update ring->emit_request() to take a request structure Updated the ring->emit_request() implementation to take a request instead of a ringbuf/request pair. Also removed its use of the OLR for obtaining the request's seqno. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:24 +02:00
John Harrison	ee044a8863	drm/i915: Update ring->add_request() to take a request structure Updated the various ring->add_request() implementations to take a request instead of a ring. This removes their reliance on the OLR to obtain the seqno value that the request should be tagged with. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:24 +02:00
John Harrison	7deb4d3980	drm/i915: Update ring->emit_flush() to take a request structure Updated the various ring->emit_flush() implementations to take a request instead of a ringbuf/context pair. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:23 +02:00
John Harrison	a84c3ae168	drm/i915: Update ring->flush() to take a requests structure Updated the various ring->flush() functions to take a request instead of a ring. Also updated the tracer to include the request id. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> [danvet: Rebase since I didn't merge the addition of req->uniq.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:21 +02:00
John Harrison	4866d729ab	drm/i915: Update flush_all_caches() to take request structures Updated the *_ring_flush_all_caches() functions to take requests instead of rings or ringbuf/context pairs. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:20 +02:00
John Harrison	2f20055d36	drm/i915: Update a bunch of execbuffer helpers to take request structures Updated *_ring_invalidate_all_caches(), i915_reset_gen7_sol_offsets() and i915_emit_box() to take request structures instead of ring or ringbuf/context pairs. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:18 +02:00
John Harrison	6258fbe23f	drm/i915: Update queue_flip() to take a request structure Updated the display page flip code to do explicit request creation and submission rather than relying on the OLR and just hoping that the request actually gets submitted at some random point. The sequence is now to create a request, queue the work to the ring, assign the known request to the flip queue work item then actually submit the work and post the request. Note that every single flip function used to finish with '__intel_ring_advance(ring);'. However, immediately after they return there is now an add request call which will do the advance anyway. Thus the many duplicate advance calls have been removed. v2: Updated commit message with comment about advance removal. v3: The request can now be allocated by the _sync() code earlier on. Thus the page flip path does not necessarily need to allocate a new request, it may be able to re-use one. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:15 +02:00
John Harrison	8753181e10	drm/i915: Update init_context() to take a request structure Now that everything above has been converted to use requests, it is possible to update init_context() to take a request pointer instead of a ring/context pair. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:12 +02:00
John Harrison	29b1b415fc	drm/i915: Reserve ring buffer space for i915_add_request() commands It is a bad idea for i915_add_request() to fail. The work will already have been send to the ring and will be processed, but there will not be any tracking or management of that work. The only way the add request call can fail is if it can't write its epilogue commands to the ring (cache flushing, seqno updates, interrupt signalling). The reasons for that are mostly down to running out of ring buffer space and the problems associated with trying to get some more. This patch prevents that situation from happening in the first place. When a request is created, it marks sufficient space as reserved for the epilogue commands. Thus guaranteeing that by the time the epilogue is written, there will be plenty of space for it. Note that a ring_begin() call is required to actually reserve the space (and do any potential waiting). However, that is not currently done at request creation time. This is because the ring_begin() code can allocate a request. Hence calling begin() from the request allocation code would lead to infinite recursion! Later patches in this series remove the need for begin() to do the allocate. At that point, it becomes safe for the allocate to call begin() and really reserve the space. Until then, there is a potential for insufficient space to be available at the point of calling i915_add_request(). However, that would only be in the case where the request was created and immediately submitted without ever calling ring_begin() and adding any work to that request. Which should never happen. And even if it does, and if that request happens to fall down the tiny window of opportunity for failing due to being out of ring space then does it really matter because the request wasn't doing anything in the first place? v2: Updated the 'reserved space too small' warning to include the offending sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added re-initialisation of tracking state after a buffer wrap to keep the sanity checks accurate. v3: Incremented the reserved size to accommodate Ironlake (after finally managing to run on an ILK system). Also fixed missing wrap code in LRC mode. v4: Added extra comment and removed duplicate WARN (feedback from Tomas). For: VIZ-5115 CC: Tomas Elf <tomas.elf@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:01:56 +02:00
Arun Siluvery	17ee950df3	drm/i915/gen8: Add infrastructure to initialize WA batch buffers Some of the WA are to be applied during context save but before restore and some at the end of context save/restore but before executing the instructions in the ring, WA batch buffers are created for this purpose and these WA cannot be applied using normal means. Each context has two registers to load the offsets of these batch buffers. If they are non-zero, HW understands that it need to execute these batches. v1: In this version two separate ring_buffer objects were used to load WA instructions for indirect and per context batch buffers and they were part of every context. v2: Chris suggested to include additional page in context and use it to load these WA instead of creating separate objects. This will simplify lot of things as we need not explicity pin/unpin them. Thomas Daniel further pointed that GuC is planning to use a similar setup to share data between GuC and driver and WA batch buffers can probably share that page. However after discussions with Dave who is implementing GuC changes, he suggested to use an independent page for the reasons - GuC area might grow and these WA are initialized only once and are not changed afterwards so we can share them share across all contexts. The page is updated with WA during render ring init. This has an advantage of not adding more special cases to default_context. We don't know upfront the number of WA we will applying using these batch buffers. For this reason the size was fixed earlier but it is not a good idea. To fix this, the functions that load instructions are modified to report the no of commands inserted and the size is now calculated after the batch is updated. A macro is introduced to add commands to these batch buffers which also checks for overflow and returns error. We have a full page dedicated for these WA so that should be sufficient for good number of WA, anything more means we have major issues. The list for Gen8 is small, same for Gen9 also, maybe few more gets added going forward but not close to filling entire page. Chris suggested a two-pass approach but we agreed to go with single page setup as it is a one-off routine and simpler code wins. One additional option is offset field which is helpful if we would like to have multiple batches at different offsets within the page and select them based on some criteria. This is not a requirement at this point but could help in future (Dave). Chris provided some helpful macros and suggestions which further simplified the code, they will also help in reducing code duplication when WA for other Gen are added. Add detailed comments explaining restrictions. Use do {} while(0) for wa_ctx_emit() macro. (Many thanks to Chris, Dave and Thomas for their reviews and inputs) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Gordon <david.s.gordon@intel.com> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:01:39 +02:00
Francisco Jerez	c1091b25f5	drm/i915: Extend the parser to check register writes against a mask/value pair. In some cases it might be unnecessary or dangerous to give userspace the right to write arbitrary values to some register, even though it might be desirable to give it control of some of its bits. This patch extends the register whitelist entries to contain a mask/value pair in addition to the register offset. For registers with non-zero mask, any LRM writes and LRI writes where the bits of the immediate given by the mask don't match the specified value will be rejected. This will be used in my next patch to grant userspace partial write access to some sensitive registers. Signed-off-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2015-06-15 16:00:43 +03:00
Francisco Jerez	4e86f725ce	drm/i915: Extend the parser to check register writes against a mask/value pair. In some cases it might be unnecessary or dangerous to give userspace the right to write arbitrary values to some register, even though it might be desirable to give it control of some of its bits. This patch extends the register whitelist entries to contain a mask/value pair in addition to the register offset. For registers with non-zero mask, any LRM writes and LRI writes where the bits of the immediate given by the mask don't match the specified value will be rejected. This will be used in my next patch to grant userspace partial write access to some sensitive registers. Signed-off-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-15 12:34:50 +02:00
Chris Wilson	06fbca713e	drm/i915: Split the batch pool by engine I woke up one morning and found 50k objects sitting in the batch pool and every search seemed to iterate the entire list... Painting the screen in oils would provide a more fluid display. One issue with the current design is that we only check for retirements on the current ring when preparing to submit a new batch. This means that we can have thousands of "active" batches on another ring that we have to walk over. The simplest way to avoid that is to split the pools per ring and then our LRU execution ordering will also ensure that the inactive buffers remain at the front. v2: execlists still requires duplicate code. v3: execlists requires more duplicate code Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-10 08:56:04 +02:00
John Harrison	6689cb2b62	drm/i915: Move common request allocation code into a common function The request allocation code is largely duplicated between legacy mode and execlist mode. The actual difference between the two versions of the code is pretty minimal. This patch moves the common code out into a separate function. This is then called by the execution specific version prior to setting up the one different value. For: VIZ-5190 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-01 07:54:30 +02:00
Paulo Zanoni	dbef0f15b5	drm/i915: add frontbuffer tracking to FBC Kill the blt/render tracking we currently have and use the frontbuffer tracking infrastructure. Don't enable things by default yet. v2: (Rodrigo) Fix small conflict on rebase and typo at subject. v3: (Paulo) Rebase on RENDER_CS change. v4: (Paulo) Rebase. v5: (Paulo) Simplify: flushes don't have origin (Daniel). Also rebase due to patch order changes. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-03-17 22:29:56 +01:00
John Harrison	8e004efc16	drm/i915: Rename 'flags' to 'dispatch_flags' for better code reading There is a flags word that is passed through the execbuffer code path all the way from initial decoding of the user parameters down to the very final dispatch buffer call. It is simply called 'flags'. Unfortuantely, there are many other flags words floating around in the same blocks of code. Even more once the GPU scheduler arrives. This patch makes it more obvious exactly which flags word is which by renaming 'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word already have an 'I915_DISPATCH_' prefix on them and so are not quite so ambiguous. OTC-Jira: VIZ-1587 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> [danvet: Resolve conflict with Chris' rework of the bb parsing.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-02-25 22:43:29 +01:00
Thomas Daniel	b07da53c79	drm/i915: Shift driver's HWSP usage out of reserved range As of Gen6, the general purpose area of the hardware status page has shrunk and now begins at dword 0x30. i915 driver uses dword 0x20 to store the seqno which is now reserved. So shift our HWSP dwords up into the general purpose range before this bites us. Note that all available documentation just says this is reserved without going into details about what it's used for. Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> [danvet: Add clarification from Thomas that unfortunately Bspec is silent on what "reserverd" precisely means.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-02-24 11:50:32 +01:00
Damien Lespiau	af75f26918	drm/i915: Make intel_ring_setup_status_page() static This function is only used in intel_ringbuffer.c, so restrict it to that file. The function was moved around to avoid a forward declaration and group it with its user. Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Squash in fixup from Wu Fengguang.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-02-13 23:28:27 +01:00
Nick Hoath	21076372af	drm/i915: Remove FIXME_lrc_ctx backpointer The first pass implementation of execlists required a backpointer to the context to be held in the intel_ringbuffer. However the context pointer is available higher in the call stack. Remove the backpointer from the ring buffer structure and instead pass it down through the call stack. v2: Integrate this changeset with the removal of duplicate request/execlist queue item members. v3: Rebase v4: Rebase. Remove passing of context when the request is passed. Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-27 09:50:53 +01:00
Nick Hoath	72f95afa5f	drm/i915: Removed duplicate members from submit_request Where there were duplicate variables for the tail, context and ring (engine) in the gem request and the execlist queue item, use the one from the request and remove the duplicate from the execlist queue item. Issue: VIZ-4274 v1: Rebase v2: Fixed build issues. Keep separate postfix & tail pointers as these are used in different ways. Reinserted missing full tail pointer update. Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-27 09:50:52 +01:00

1 2 3 4 5

206 Commits