Commit Graph

328 Commits

Author SHA1 Message Date
Chris Wilson
9af90d19f8 drm/i915: cache the last object lookup during pin_and_relocate()
The most frequent relocation within a batchbuffer is a contiguous sequence
of vertex buffer relocations, for which we can virtually eliminate the
drm_gem_object_lookup() overhead by caching the last handle to object
translation.

In doing so we refactor the pin and relocate retry loop out of
do_execbuffer into its own helper function and so improve the error
paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-20 10:51:50 +01:00
Chris Wilson
1d7cfea152 drm/i915: Do interrupible mutex lock first to avoid locking for unreference
One of the primarily consumers of the i915 driver is X, a large signal
driven application. Frequently when writing into the buffers, there is a
pending signal which causes us not to take the interruptible lock but
then we need to take that same lock around the object unreference. By
rearranging the code to do the interruptible lock as the first check, we
can avoid the frequent additional locking around the unreference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-19 09:20:23 +01:00
Chris Wilson
4f27b75d56 drm/i915: rearrange mutex acquisition for pread
... to avoid the double acquisition along fast[er] paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-19 09:19:55 +01:00
Chris Wilson
fbd5a26d50 drm/i915: Rearrange acquisition of mutex during pwrite
... to avoid reacquiring it to drop the object reference count on
exit. Note we have to make sure we now drop (and reacquire) the lock
around acquiring the mm semaphore on the slow paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-19 09:19:47 +01:00
Chris Wilson
b5e4feb661 drm/i915: Attempt to prefault user pages for pread/pwrite
... in the hope that it makes the atomic fast paths more likely.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-19 09:19:37 +01:00
Chris Wilson
202f2fef7a drm/i915: Avoid taking the mutex for dropping the refcnt upon creation
After allocation a handle for the fresh object, we know that we can
safely drop the refcnt without triggering a free so we do not need the
mutex. Strangely, this mutex acquisition is the one that appears on
driver profiles.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-19 09:19:28 +01:00
Chris Wilson
f0c43d9b7e drm/i915: Perform relocations in CPU domain [if in CPU domain]
Avoid an early eviction of the batch buffer into the uncached GTT
domain, and so do the relocation fixup in cacheable memory.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-19 09:19:18 +01:00
Chris Wilson
2549d6c26c drm/i915: Avoid vmallocing a buffer for the relocations
... perform an access validation check up front instead and copy them in
on-demand, during i915_gem_object_pin_and_relocate(). As around 20% of
the CPU overhead may be spent inside vmalloc for the relocation entries
when submitting an execbuffer [for x11perf -aa10text], the savings are
considerable and result in around a 10% throughput increase [for glyphs].

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-19 09:18:36 +01:00
Chris Wilson
e59f2bac15 drm/i915: Wait for pending flips on the GPU
Currently, if a batch buffer refers to an object with a pending flip,
then we sleep until that pending flip is completed (unpinned and
signalled). This is so that a flip can be queued and the user can
continue rendering to the backbuffer oblivious to whether the buffer is
still pinned as the scan out. (The kernel arbitrating at the last moment
to stall the batch and wait until the buffer is unpinned and replaced as
the front buffer.)

As we only have a queue depth of 1, we can simply wait for the current
pending flip to complete and continue rendering. We can achieve this
with a single WAIT_FOR_EVENT command inserted into the ring buffer prior
to executing the batch, *without* stalling the client.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-07 19:10:09 +01:00
Dave Airlie
fb7ba2114b Merge remote branch 'korg/drm-fixes' into drm-vmware-next
necessary for some of the vmware fixes to be pushed in.

Conflicts:
	drivers/gpu/drm/drm_gem.c
	drivers/gpu/drm/i915/intel_fb.c
	include/drm/drmP.h
2010-10-06 11:10:48 +10:00
Chris Wilson
35b62a89b0 drm/i915: Skip pread/pwrite if size to copy is 0.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-04 10:07:46 +01:00
Chris Wilson
df6d075a4d Merge branch 'drm-intel-fixes' into drm-intel-next 2010-10-04 10:07:38 +01:00
Chris Wilson
7dcd2499de drm/i915: Rephrase pwrite bounds checking to avoid any potential overflow
... and do the same for pread.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
2010-10-03 14:16:18 +01:00
Chris Wilson
ce9d419dbe drm/i915: Sanity check pread/pwrite
Move the access control up from the fast paths, which are no longer
universally taken first, up into the caller. This then duplicates some
sanity checking along the slow paths, but is much simpler.
Tracked as CVE-2010-2962.

Reported-by: Kees Cook <kees@ubuntu.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
2010-10-03 14:16:17 +01:00
Chris Wilson
58e10eb92d Merge branch 'drm-intel-fixes' into drm-intel-next
Conflicts:
	drivers/gpu/drm/i915/i915_gem_evict.c
	drivers/gpu/drm/i915/intel_display.c
	drivers/gpu/drm/i915/intel_dp.c
2010-10-03 10:56:11 +01:00
Julia Lawall
929f49bf22 drivers/gpu/drm/i915/i915_gem.c: Add missing error handling code
Extend the error handling code with operations found in other nearby error
handling code

A simplified version of the sematic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
@r@
statement S1,S2,S3;
constant C1,C2,C3;
@@

*if (...)
 {... S1 return -C1;}
...
*if (...)
 {... when != S1
    return -C2;}
...
*if (...)
 {... S1 return -C3;}
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
2010-10-02 15:21:26 +01:00
Chris Wilson
1cdf7fef79 drm/i915: Don't mask the return code whilst relocating.
The return from move_to_gtt_domain() may indicate a pending signal which
needs to handled as opposed to an actual error, for instance, so report
the original return value rather than forcing an EINVAL.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-02 15:12:41 +01:00
Chris Wilson
069efc1dac drm/i915: Clear fence registers on GPU reset
When the GPU is reset, the fence registers are invalidated, so release
the objects and clear them out.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-01 14:45:22 +01:00
Chris Wilson
812ed49243 drm/i915: Force the domain to CPU on unbinding whilst wedged.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30083
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-01 14:45:21 +01:00
Chris Wilson
73aa808f10 drm: Move the GTT accounting to i915
Only drm/i915 does the bookkeeping that makes the information useful,
and the information maintained is driver specific, so move it out of the
core and into its single user.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>
2010-10-01 14:45:20 +01:00
Dave Airlie
29d08b3efd drm/gem: handlecount isn't really a kref so don't make it one.
There were lots of places being inconsistent since handle count
looked like a kref but it really wasn't.

Fix this my just making handle count an atomic on the object,
and have it increase the normal object kref.

Now i915/radeon/nouveau drivers can drop the normal reference on
userspace object creation, and have the handle hold it.

This patch fixes a memory leak or corruption on unload, because
the driver had no way of knowing if a handle had been actually
added for this object, and the fbcon object needed to know this
to clean itself up properly.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-10-01 09:17:44 +10:00
Chris Wilson
f394940b8d drm/i915: Remove redundant deletion of obj->gpu_write_list
At that point as the object is no longer in any GPU write domain it must
not be on the list, so the list_del() is redundant.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-30 09:30:51 +01:00
Chris Wilson
5cdf588174 drm/i915: Make get/put pages static
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-30 09:30:13 +01:00
Chris Wilson
23bc598253 drm/i915/debug: Convert i915_verify_active() to scan all lists
... and check more regularly.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-30 09:30:11 +01:00
Chris Wilson
891b48cfc8 drm/i915: Avoid blocking the kworker thread on a stuck mutex
Just reschedule the retire requests again if the device is currently
busy. The request list will be pruned along other paths so will never
grow unbounded and so we can afford to miss the occasional pruning.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-29 12:26:37 +01:00
Chris Wilson
3d2a812ae4 drm/i915/debug: Remove default WATCH_BUF
Replaced by tracepoints.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-29 11:41:19 +01:00
Chris Wilson
97d1ebaf81 drm/i915/debug: Remove defunct WATCH_LRU
This has bitrotted through inuse and superseded by tracing and debugfs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-29 11:41:18 +01:00
Chris Wilson
e0e41598b4 Merge branch 'drm-intel-fixes' into drm-intel-next 2010-09-28 15:48:38 +01:00
Chris Wilson
a56ba56c27 Revert "drm/i915: Drop ring->lazy_request"
With multiple rings generating requests independently, the outstanding
requests must also be track independently.

Reported-by: Wang Jinjin <jinjin.wang@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30380
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-28 11:30:52 +01:00
Chris Wilson
ced270fa89 drm/i915: Ensure that the mode change flushing is currently uninterruptible
Introduced by 48b956c5, I had thought I had already fixed this. Oh well.

Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-26 22:50:36 +01:00
Chris Wilson
1c25595f8d drm/i915: Convert the file mutex into a spinlock
Daniel Vetter pointed out that in this case is would be clearer and
cleaner to use a spinlock instead of a mutex to protect the per-file
request list manipulation. Make it so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-26 11:03:27 +01:00
Chris Wilson
76c1dec197 drm/i915: Make the mutex_lock interruptible on ioctl paths
... and combine it with the wedged completion handler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-25 12:23:12 +01:00
Chris Wilson
30dbf0c07f drm/i915: Adjust hangcheck EIO semantics
Owain Ainsworth reported an issue between the interaction of the
hangcheck and userspace immediately (and permanently) falling back to
s/w rasterisation. In order to break the mutex and begin resetting the
GPU, we must abort the current operation (usually within the wait) and
climb sufficiently far back up the call chain to drop the mutex. In his
implementation, Owain has a loop within the ioctl handler to detect the
hang and then sleep until the error handler has run. I've chosen to
return to userspace and report an EAGAIN which should trigger the
userspace ioctl handler to repeat the call (simply because it felt less
invasive...). Before hitting a wedged GPU, we then wait upon completion
of the error handler.

Reported-by: Owain G. Ainsworth <zerooa@googlemail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-25 12:23:12 +01:00
Chris Wilson
f787a5f59e drm/i915: Only hold a process-local lock whilst throttling.
Avoid cause latencies in other clients by not taking the global struct
mutex and moving the per-client request manipulation a local per-client
mutex. For example, this allows a compositor to schedule a page-flip
(through X) whilst an OpenGL application is monopolising the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-24 21:03:00 +01:00
Chris Wilson
e6c3a2a6d3 drm/i915: Use an uninterruptible wait for page-flips during modeset
We need to drain the pending flips prior to disabling the pipe during
modeset, and these need to be done in an uninterruptible fashion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-24 14:19:57 +01:00
Chris Wilson
20f0cd55f6 drm/i915: Remove the broken flush_ring from page-flip
This is already performed with the pipelined flush, so by the time we
schedule the flush in the page-flip, the ring is NULL and we OOPs
instead.

Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-23 11:02:55 +01:00
Chris Wilson
9b74f7348f drm/i915: Fix 945GM regression in e259befd
A minor typo caused a single fence register to be incorrectly
programmed, resulting in occassional tiling corruption.

Reported-and-tested-by: Hans de Bruin <bruinjm@xs4all.nl>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=18962
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
2010-09-23 10:30:57 +01:00
Chris Wilson
5c12a07e80 drm/i915: Drop ring->lazy_request
We are not currently using it as intended, so remove the complication.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-22 11:58:55 +01:00
Chris Wilson
dfaae392f4 drm/i915: Clear the gpu_write_list on resetting write_domain upon hang
Otherwise we will hit a list handling assertion when moving the object
to the inactive list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-22 10:31:52 +01:00
Chris Wilson
9e0ae53404 drm/i915: Don't overwrite the returned error-code
During i915_gem_create_mmap_offset() if the subsystem reports an error
code, use it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 15:05:24 +01:00
Chris Wilson
f13d3f7311 drm/i915: Track pinned objects
Keep a list of pinned objects and display it via debugfs. Now all
objects that exist in the GTT are always tracked on one of the
active, flushing, inactive or pinned lists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:24:17 +01:00
Chris Wilson
265db9585e drm/i915: Drain any pending flips on the fb prior to unpinning
If we have queued a page flip on the current fb and then request a mode
change, wait until the page flip completes before performing the new
request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:24:17 +01:00
Chris Wilson
c78ec30bba drm/i915: Merge ring flushing and lazy requests
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:24:16 +01:00
Chris Wilson
53640e1d07 drm/i915: Track gpu fence usage
Track if the gpu requires the fence for the execution of a batch buffer
and so only wait upon the retirement of the object's last rendering
seqno if the fence is in use by the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:20:54 +01:00
Chris Wilson
c7f9f9a8b8 drm/i915: Use ring->flush() instead of MI_FLUSH
Use the ring abstraction to hide the details of having choose the
appropriate flushing method.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:19:59 +01:00
Xiang, Haihao
5c1143bbec drm/i915: do not export the instances of struct intel_ring_buffer
Introduce intel_init_render_ring_buffer(), intel_init_bsd_ring_buffer
for ring initialization.

Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:19:55 +01:00
Chris Wilson
77f0123022 drm/i915: Clear GPU read domains on reset
Clear the GPU read domain for the inactive objects on a reset so that
they are correctly invalidated on reuse.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:19:53 +01:00
Chris Wilson
9375e446e7 drm/i915: Clear flushing lists on GPU reset
Owain Ainsworth noticed that the reset code failed to clear the flushing
list leaving the driver in an inconsistent state following a hung GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:19:52 +01:00
Chris Wilson
9220434a87 drm/i915: Only emit a flush request on the active ring.
When flushing the GPU domains,we emit a flush on *both* rings, even
though they share a unified cache. Only emit the flush on the currently
active ring.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:19:51 +01:00
Chris Wilson
b84d5f0c22 drm/i915: Inline i915_gem_ring_retire_request()
Change the semantics to retire any buffer older than the current seqno
rather than repeatedly calling calling the function to retire the
buffer at the head of the list matching the request seqno.

Whilst this should have no semantic impact on the implementation, Daniel
was wondering if there was a bug where we might miss a retirement and so
end up with a continually growing active list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:19:50 +01:00