python-fire auto-converts `item1,item2` into a tuple, but if there is a
dash `-` inside the argument, it treats it as a string.
Let's validate the data, when it comes as a `str` or `tuple`.
For more details, here are the tested scenarios:
| --lava-tags= | Type | Value |
|--------------|-------|---------------------|
| None | bool | True |
| '' | str | '' |
| tag1 | str | "tag1" |
| tag1, | tuple | ("tag1",) |
| tag-1,tag-2 | str | 'tag-1,tag-2' |
| tag1,tag2 | tuple | ("tag1", "tag2") |
| ',' | str | ',' |
| ',,' | str | ',,' |
| 'tag1,,' | str | 'tag1,,' |
See also:
https://google.github.io/python-fire/guide/#argument-parsing
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31882>
We don't need the /done anymore, because we have better job
dependencies. But the mainline-or-fork query is still helpful, so
refactor that out into a common helper we can reuse for other things.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Co-authored-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31882>
Instead of providing a hardcoded set of arguments, allow overlays to be
added to the submitter script. Passing Python dicts as a string
representation and relying on coercion from strings is far from great,
but fire doesn't give anything else, so.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Co-authored-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31882>
We compose the rootfs from a mixture of the base rootfs (exported from
the container build stage, currently lava_build.sh, which can be reused
as long as the container isn't rebuilt), the Mesa build overlay
(exported from the debian-* build job, which can be reused for every job
in that pipeline), and the per-job rootfs (containing job-specific
variables which cannot be reused).
Instead of having LAVA pull the base rootfs and separately downloading
the build/per-job parts on the DUT, get LAVA to compose the whole thing
by using overlays.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Co-authored-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31882>
FORCE_KERNEL_TAG allows testing kernel uprevs without rebuilding
containers by supplying an external kernel directly for booting on
hardware devices. Renaming it to EXTERNAL_KERNEL_TAG clarifies its
purpose, distinguishing it from KERNEL_TAG which rebuilds containers.
Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31795>
If you're emitting a section header under set -x, you will see:
+ section_start foo "foo"
+ x_off
[the section header]
This is kind of annoying. Instead of trying to squash it everywhere by
dancing around local set +x management, play some extremely stupid
tricks with shells to make sure we never emit it.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31602>
All the foreground colours pass 1 to ANSI SGR, which sets bold. The
other arguments are either a colour from 30-37 (passed directly), or
38;5;nnn, where nnn is an extended RGB colour. It looks like most of the
definitions were cargo-culted from FG_RED, which correctly sets an
extended colour, because the arguments there were being parsed as
setting blinking, followed by 197 which was ignored as unknown.
Fix them to just set the original definition.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31602>
Set exit_code to 1 in case of an exception; otherwise,
the job exits with 0, and GitLab shows the job as successful.
Fixes: b9cee06f9e ("ci/lava: handle non-zero exit codes")
Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31556>
The LAVA job submitter always exits with code 1, regardless
of the HWCI_TEST_SCRIPT's exit code. This commit fixes the
LAVA job submitter to exit with the actual code returned by
the test.
Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31189>
Instead of unpacking the x86_64_build container and its billion build
dependencies every time, switch to using only what's in the minimal
pyutils container, and the Python scripts we get as an artifact from the
python-test job. Pulling the artifacts from S3 rather than using GitLab
is also much more efficient.
This should substantially reduce the runtime required to get to testing.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31151>
Sporadically a6xx gpu will fail to recover causing the lava job
a660_vk_full to loop on error messages for three hours before timing
out.
A few sporadic error messages may still be recoverable, but when multiple
errors occur over a short period, successful recovery is unlikely. Parse
the logs to look for repeated error messages within a short time period.
If found, cancel the lava job and rerun it.
Also add unit tests for this behaviour.
cc: mesa-stable
Reported-by: Valentine Burley <valentine.burley@gmail.com>
Acked-by: Daniel Stone <daniel.stone@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30032>
Fastboot devices need an indirection for creating a boot image via
`mkbootimg`, so we need to propagate the cmdline from LAVA and our extra
arguments to it properly.
This commit fixes it by retrieving the default cmdline from LAVA and
sending it, together with the `extra_nfsroot_args` to the `mkbootimg`
command.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29611>
Farm variable is added for devices in collabora farm.
So add support in LAVA job submitter to use this in
structured log files.
Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29583>
Due to the expiration time in `mesa-lava` (1m), the kernel used in mesa is now
using `mesa-rootfs` (1y). Due to this change, a fresh kernel image has been
prepared and mesa has also a few changes to adapt to this redirection.
cc: mesa-stable
Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Co-developed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28979>
Improves the error logging in the LAVA job submitter by capturing and
logging the exception message rather than just the exception type when a
job fails to run.
Additionally, introduces a clearer script interruption
message to aid in debugging and immediate understanding of job
submission failures.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28778>
This commit refactors the exception hierarchy to differentiate between
retriable and fatal errors in the CI pipeline, specifically within the
LAVA job submission process.
A new base class, `MesaCIRetriableException`, is introduced for
exceptions that should trigger a retry of the CI job, while
`MesaCIFatalException` is added for non-recoverable errors that halt the
process immediately.
Additionally, the logic for deciding whether a job should be retried or
not is updated to check for instances of `MesaCIRetriableException`,
improving the robustness and reliability of the CI job execution
strategy.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28778>
This workaround is termporary and will be dropped once the
farm is moved to its permament location.
Signed-off-by: Martin Krastev <martin.krastev@broadcom.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28261>
The r8152 error detection is now considering any order of the known
patterns to detect variations of the r8152 issues during the test phase.
This includes a small refactoring for eventual new issues.
Additionally, adjusted the timing for setting the `start_time` in
`test_lava_job_submitter.py` to ensure consistency and reliability in
test execution, aligning the start time closer to the job submission
process.
With this fix, the bad state shown in the following job will be
detected:
https://gitlab.freedesktop.org/drm/msm/-/jobs/55033953
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27688>
In our process of monitoring LAVA logs, we typically skip numerous
messages to enhance log clarity. We already exclude `feedback` messages
from display. These messages were just used as a heartbeat signal,
indicating that if we are receiving them, the Device Under Test (DUT) is
active.
Practically, if we continuously receive feedback messages without any
other message level (either `debug` or `target`) for several minutes,
this could be a cause for concern, as it likely indicates the device is
in a kind of livelock state.
Therefore, it is more prudent to ignore feedback messages, as they tend
to occur frequently in unstable scenarios. However, it's important to
note that any other message level will still be considered as a
heartbeat signal.
Real case where several minutes of feedback messages indicate a hang:
https://gitlab.freedesktop.org/mesa/mesa/-/jobs/53546660
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26996>
This week we found that the r8152 issue can happen during the boot
phase, make the necessary adjustments to detect it.
https://gitlab.freedesktop.org/vigneshraman/linux/-/jobs/53651940
Notes:
- The kernel messages during the boot phase is being redirected to the
feedback messages due to the namespaces from the SSH job.
- Update the unit tests:
- Add boot phase detection
- Correctly set the boot phase when mocking LogFollower
Reported-by: Vignesh Raman <vignesh.raman@collabora.com>
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27081>
We were just detecting if a log like
[ 143.080663] r8152 2-1.3:1.0 eth0: Tx status -71
happened once before
[ 316.389695] nfs: server 192.168.201.1 not responding, still trying
But we can use a counter to be more assured that the device is
struggling to recover and we can add let this detection happen during
the boot phase.
This mimics how other freedreno devices deal with this problem, see
`cros_servo_run.py:64` for example.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27081>
We need update kernel often. We need test kernel changes often.
Introduced `KERNEL_EXTERNAL_TAG` to differ between `KERNEL_TAG` which is
also used to rebuild the containers. We don't need rebuild containers
for the external kernel, so this way we don't have to.
Updating kernel goes wruuuuuum.
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23563>