We need the fix adding a Block decoration to the BuiltIn struct in
SpvModuleScopeVarParserTest_BuiltinPosition_BuiltIn_Position_Initializer.spvasm.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16221>
Also document additional piglit failures and passes.
Multiple changes, mostly notable:
- few new tests
- fixed test for upcoming mesa MR
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16785>
Since we're now actually running Vulkan tests on Dozen, the comment
was wrong and we actually want the runtime. The referenced issue
has been fixed anyway.
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16756>
In addition to pushing it to the current latest stable, the v5.17 kernel
for mesa CI pulls a patch to address a regression in drm that affects at
least the lima jobs.
The dtb for sc7180-trogdor-lazor-limozeen-nots is also updated since the
old one no longer exists in v5.17.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16641>
LAVA is mangling the escape codes from ANSI during log fetching from the
target device, making the gitlab section markers from deqp, for example,
to not work, inputting noise into the log.
This commit makes the simplest fix which is to replace the mangled
characters to the fixed ones.
This approach is error-prone, since it may unwittingly replace a genuine
log that resembles the mangled escape code. But this solution should
suffice until we get a proper fix from LAVA team itself.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16520>
LAVA is mangling the escape codes from ANSI during log fetching from the
target device, making the colored lines from deqp, for example, to not
work, inputting noise into the log.
This commit makes the most straightforward fix which is to replace the
mangled characters to the fixed ones.
This approach is error-prone since it may unwittingly replace a genuine
log that resembles the mangled escape code. But this solution should
suffice until we get a proper fix from LAVA developers itself.
Fixes: #5503
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16520>
When deqp-runner times out, it kills the deqp process, which in our case
is the previous invocation of our shell script, so the crosvm and socat
cleanup never happened. crosvm has a way to cleanly shut down a previous
crosvm invocation, so let's just use that and do our cleanup when we need
to.
Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16485>
The experimental support has been merged for months already, but has
been left unexercised in CI even though dEQP has a mesh_shader.nv
category.
This commit enables the experimental support, which should bring
additional testing \o/.
Suggested-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16550>
Also document additional piglit failures and crashes with new tests.
Multiple changes, mostly notable:
- few new tests
- traces downloader improvements
Reviewed-by: Emma Anholt <emma@anholt.net>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16428>
We currently default to 4h of execution time, but this is unnecessary
on all our platforms. Tighten it to 45 minutes per run, and overall of
1.5h to allow for restarts.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16472>
I am not super happy about doing this, as this means we will
potentially be introducing regressions because CI would have papered
over a rare race condition... But the situation is that this is already
happening, and improving the reliability will likely come from tracking
the occurence rate of hangs in an external system.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16472>
Currently, the LAVA job submitter fetches the job results from the LAVA
XMLRPC call, but that is not necessary, as the job result is easily
found in the logs. E.g. the bare-metal and poe jobs uses that log to set
the final job status of their runs.
Another reason for the change is that the LAVA signals are not reliable
in some devices with one serial port, causing some troubles in a618
recently. So, if one signal fails to be sent/received, the job will
ultimately fail even when the hwci script has been successful.
Fixes: #6435
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16425>
This was set as an attempt to detect when the GPU got hung, and let
our hangcheck detect this issue and kill the run. This however has
not been working so well, so let's try without it.
Suggested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16423>
This parameter is used by radv-ci to perform extra validation while
running VKCTS.
This change catches more issues, which are will be addressed in !16248:
- dEQP-VK.subgroups.ballot_broadcast.compute.subgroupbroadcast_i8vec2,Crash
- dEQP-VK.subgroups.ballot_broadcast.compute.subgroupbroadcast_u8vec2,Crash
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Suggested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16277>
This reverts commit 0464117ad9. Now that
the shim back-channel communicates with nouveau that the "GPU" is always
idle, we can get the nouveau compiler back into the CI path.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16150>
After a LAVA job submitter rework, the init-stage2.sh was changed to be
compatible with new LAVA job definitions, but the result from the script
represented by `HWCI_TEST_SCRIPT` variable is obfuscated by the `set -e`
command. So when the test script fails, `set` will override the exit
code and the jobs will pass when they should fail.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16325>
Rarely the jobs.logs RPC call can return corrupted data, such as
mal-formed YAML data. As this is expected and very rare to occur, let's
retry this RPC call several times to give it a chance to fix itself.
Retrying would not swallow the log lines since we keep track of how many
log lines each job has.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
Move exceptions to its own file.
Create MesaCITimeoutError and MesaCIRetryError with specific exception
data for better exception classification.
Avoid the use of `fatal_err` in favor of raising a proper exception.
Make _call_proxy exception handling exhaustive, add missing
ResponseError treatment.
Also, detect JobError during job result parsing. So when a LAVA timeout error
happens, it is probably cause by some boot/network issues with a
specific device, we can retry the same job in other device with the same
device_type.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
During development, we may want to test lava_job_submitter.py locally,
sometimes one can find what one is been looking for before the LAV job
is done. Let's respond to SIGINT signal and cancel the LAVA job, as we
can't follow what is happening anymore via script.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
A normal boot on LAVA device would take less than 1 minute. Sometimes
the depthcharge freezes and LAVA waits until the current timeout (25
minutes) to kick in and fail the job.
With this lower timeout value, the job could fail faster and eventually
LAVA will be able to retry it.
Furthermore, set a default timeout for all actions to fix an undesired
behavior with some LAVA job subactions, such as depthcharge-action.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
Make jwt-file argument optional, this means that this submitter could
deal with more generic use cases, such as running it locally, since the
MINIO JWT is a sensitive information, which is not really required to
test if the submitter is working well.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
LAVA can filter which test suite to show the results from, let's list
all testcases possible in the mesa test suite, to be able to divide more
complex jobs into test_cases.
Another advantage is that the test case can vary its name.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
Now, every rootfs based device has bash installed, so we can use bash
shebang instead of dash to enforce the use of bash's echo, not dash's
one.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
Any daemon executed in init-stage2.sh may interfere with LAVA signals,
since any threaded output to console may clutter the signals, which are
based on the log output.
E.g: This job
https://gitlab.freedesktop.org/gallo/mesa/-/jobs/20779120#L2102
has failed because capture-devcoredump.sh was alive and emitting kernel
messages to the console during the LAVA signal handling, mangling the
output and making the LAVA to fail to check the results of the job.
Another problem is that CONFIG_DEBUG_STACK_USAGE Kconfig is enabled.
This causes process exit to dump a `RESULT=[ 246.756067] lava-test-case
(156) used greatest stack depth: ... bytes left` kernel message to the
logs corrupting LAVA signal message. Empirically, it happens one in
every 280 jobs. To solve that, compose the lava-test-case custom script
with a short sleep to give time for kernel to dump the message clearly
and a exit command to keep the return code from init-stage2.sh script.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
The exit code is automatically parsed to fail/pass the job, so this
commit removes the `hwci.*pass|fail` regex and printings.
Add mesa-job-name parameter to get the CI_JOB_NAME easily to serve as
test name.
Besides, there is the treatment for the mesa job naeme, as LAVA does not
like whitespace character inside the test case/suite name, since it
interprets it as a LAVA signal parameter and it can make the job fail
when the script looks for the results from the LAVA RPC.
And the slash character seems to break gitlab log sectioning, so
removing every character after the first whitespace.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
By default, LAVA emit signals as specific lines of message to the
console serial for subsequent parsing. However, this is prone to be
interleaved with the dmesg messages, particularly with debug messages
that can happen just after the process exit, such as the ones set by
CONFIG_STACK_DEBUG_USAGE Kconfig.
Setting the `lava-signal` to `kmsg` will make those special messages to
be dumped to /dev/kmsg, where the each line is printed atomically
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
When jobs.validate returns something, it means that there is a dict
filled with the error message, so we were running the job with some
validation errors for a quite while
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
Start to differentiate between the different types of LAVA log message;
the only visible change right now is that we make warnings and errors
bold and red to match bare-metal, but it comes in useful later as we
will use the results markers to watch us step through the different
stages of job execution.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
Less free-form passing stuff around, and also makes it easier to
implement log-based following in future.
The new class has:
- job log polling: This allows us to get rid of some more function-local
state; the job now contains where we are, and the timeout etc is
localised within the thing polling it.
- has-started detection into job class
- heartbeat logic to update the job instance state with the start time
when the submitter begins to track the logs from the LAVA device
Besides:
- Split LAVA jobs and Mesa CI policy
- Update unit tests with LAVAJob class
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
We rate-limit LAVA API calls as they are standard polling calls rather
than blocking for changes. However when we sleep after making the calls
rather than before, we can block when we want to exit - e.g. after
getting the final logs, we will still sleep even though we can drop out.
Fix this by moving the calls to before the API calls, rather than after.
This means that the first calls (when we're waiting to be scheduled, or
haven't got our first log lines yet), will be delayed compared to
previously, but that's not going to slow us down as even in the best
case we won't be executing in a device within the first 15 seconds.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
As an additional measure to mitigate thermal throttling, set the upper
limit for the CPU scaling frequency to 65% of maximum allowed by the
hardware.
The impact on the overall tests duration should be minimal since the
performance tests do not really put high load on the CPU.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16164>
Update intel-gpu-freq.sh script to offer the possibility to adjust CPU
operating frequencies in addition to GPU.
Note this is currently limited to just setting the maximum scaling
frequency as percentage of the maximum frequency allowed by the
hardware.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16164>