mesa/.gitlab-ci/tests/utils
Deborah Brouwer 72c182f873 ci/lava: Detect a6xx gpu recovery failures
Sporadically, an a6xx GPU fails to recover, causing the LAVA job
a660_vk_full to loop on error messages for three hours before timing
out.

A few sporadic error messages may still be recoverable, but when multiple
errors occur over a short period, successful recovery is unlikely. Parse
the logs to look for repeated error messages within a short time period.
If found, cancel the lava job and rerun it.
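
The detection described above could be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the class name, the marker string, and the threshold values are all hypothetical, since the commit message does not state the actual error text, window length, or error count.

```python
from collections import deque
from datetime import datetime, timedelta

# All three values are hypothetical placeholders; the commit message does
# not give the real error text, watch period, or failure count.
ERROR_MARKER = "gpu recovery error"
WATCH_PERIOD = timedelta(minutes=3)
MAX_ERRORS = 3


class RecoveryFailureDetector:
    """Flag a job when too many matching errors occur within a short window."""

    def __init__(self, marker=ERROR_MARKER, watch_period=WATCH_PERIOD,
                 max_errors=MAX_ERRORS):
        self.marker = marker
        self.watch_period = watch_period
        self.max_errors = max_errors
        self._hits = deque()  # timestamps of recent matching log lines

    def feed(self, timestamp: datetime, line: str) -> bool:
        """Process one log line; return True if the job should be cancelled."""
        if self.marker not in line:
            return False
        self._hits.append(timestamp)
        # Forget matches that fall outside the watch period. The newest
        # entry always stays, so the deque is never empty here.
        while timestamp - self._hits[0] > self.watch_period:
            self._hits.popleft()
        return len(self._hits) >= self.max_errors
```

A caller following the log would pass each timestamped line to `feed()` and, once it returns True, cancel the job and resubmit it. Isolated errors spaced further apart than the watch period never accumulate, which matches the idea that a few sporadic errors may still be recoverable.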

Also add unit tests for this behaviour.

cc: mesa-stable

Reported-by: Valentine Burley <valentine.burley@gmail.com>
Acked-by: Daniel Stone <daniel.stone@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30032>
2024-07-19 23:41:13 +00:00
__init__.py
test_lava_farm.py ci/lava: Add LavaFarm class to find LAVA farm from runner tag 2023-02-16 13:08:41 +00:00
test_lava_job_definition.py ci/lava: Add unit tests covering job definition 2023-11-02 03:31:50 +00:00
test_lava_log.py ci/lava: Detect a6xx gpu recovery failures 2024-07-19 23:41:13 +00:00