torture: Make kvm-remote.sh account for network failure in pathname checks

In a long-duration kvm-remote.sh run, almost all of the remote accesses will
be simple file-existence checks.  These are thus the most likely to be caught
out by network failures, which do happen from time to time.

This commit therefore takes a first step towards tolerating temporary
network outages by making the file-existence checks retry in the face of
such an outage.  These checks also print a message every minute during an
outage, allowing the user to take appropriate action.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney 2021-04-27 09:56:42 -07:00
parent 063f5a4df9
commit c43d3b0083
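The retry logic depends on ssh's exit-status convention. ssh(1) exits with the remote command's exit status, or with 255 if ssh itself fails, for example because the network is down. In practice `test -f` returns 0 or 1 (2 on a usage error), so a 255 can reasonably be attributed to ssh rather than to the remote test. A minimal sketch of that classification follows. `classify_ssh_status` is a hypothetical helper for illustration only and is not part of kvm-remote.sh.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL helper, not part of kvm-remote.sh: maps the exit status of
# `ssh host "test -f path"` onto what checkremotefile() cares about.
classify_ssh_status () {
	case "$1" in
	0)   echo present ;;      # remote "test -f" ran and succeeded
	255) echo unreachable ;;  # ssh itself failed: worth retrying later
	*)   echo absent ;;       # remote "test -f" ran and returned nonzero
	esac
}

for status in 0 1 255
do
	echo "$status -> $(classify_ssh_status "$status")"
done
```

Only the `unreachable` case leads to a retry. The other two are passed back to the caller unchanged.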


@@ -159,6 +159,28 @@ do
 	fi
 done
 
+# Function to check for presence of a file on the specified system.
+# Complain if the system cannot be reached, and retry after a wait.
+# Currently just waits forever if a machine disappears.
+#
+# Usage: checkremotefile system pathname
+checkremotefile () {
+	local ret
+	local sleeptime=60
+
+	while :
+	do
+		ssh $1 "test -f \"$2\""
+		ret=$?
+		if test "$ret" -ne 255
+		then
+			return $ret
+		fi
+		echo " ---" ssh failure to $1 checking for file $2, retry after $sleeptime seconds. `date`
+		sleep $sleeptime
+	done
+}
+
 # Function to start batches on idle remote $systems
 #
 # Usage: startbatches curbatch nbatches
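The retry path can be exercised without a flaky network by stubbing out ssh and sleep. The sketch below keeps `checkremotefile()` exactly as added above. It replaces `ssh` and `sleep` with hypothetical shell functions: the stub `ssh` returns 255 for the first `fake_failures` calls (a hypothetical counter) and then runs the "remote" `test -f` locally, and the stub `sleep` returns immediately.

```shell
#!/usr/bin/env bash
# ASSUMPTION: ssh and sleep are replaced by HYPOTHETICAL local stubs so this
# runs offline and instantly; the real script uses the real commands.

# Stub ssh: simulate a network failure (exit 255) for the first
# $fake_failures calls, then run the "remote" command string locally.
fake_failures=2
ssh () {
	if test "$fake_failures" -gt 0
	then
		fake_failures=$((fake_failures - 1))
		return 255
	fi
	sh -c "$2"
}

# Stub sleep: skip the 60-second waits.
sleep () {
	:
}

# checkremotefile() as added by this commit.
checkremotefile () {
	local ret
	local sleeptime=60

	while :
	do
		ssh $1 "test -f \"$2\""
		ret=$?
		if test "$ret" -ne 255
		then
			return $ret
		fi
		echo " ---" ssh failure to $1 checking for file $2, retry after $sleeptime seconds. `date`
		sleep $sleeptime
	done
}

present=$(mktemp)

# Two simulated outages, then the file is found.
checkremotefile fakehost "$present"
rc_present=$?
echo "present file: exit status $rc_present"

# One simulated outage, then test -f reports the file missing.
fake_failures=1
checkremotefile fakehost "$present.missing"
rc_missing=$?
echo "missing file: exit status $rc_missing"

rm -f "$present"
```

Each simulated failure produces one " ---" line. The function's final status is whatever the remote `test -f` returned, so callers can keep using it directly in `if` and `while`.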
@@ -178,7 +200,7 @@ startbatches () {
 			echo $((nbatches + 1))
 			return 0
 		fi
-		if ssh "$i" "test -f \"$resdir/$ds/remote.run\"" 1>&2
+		if checkremotefile "$i" "$resdir/$ds/remote.run" 1>&2
 		then
 			continue # System still running last test, skip.
 		fi
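The `1>&2` on this call appears to matter because `startbatches` reports its result by echoing a number on stdout (as in `echo $((nbatches + 1))`), which the caller presumably captures. `checkremotefile` prints its retry message with plain `echo`, so without the redirection that message would be mixed into the captured result. A minimal sketch of the effect, where `noisy_check`, `pick_batch_redirected`, and `pick_batch_unredirected` are hypothetical stand-ins:

```shell
#!/usr/bin/env bash
# HYPOTHETICAL stand-in for checkremotefile(): prints a retry-style
# diagnostic on stdout, then reports "file absent" (exit status 1).
noisy_check () {
	echo " --- ssh failure, retry after 60 seconds."
	return 1
}

# HYPOTHETICAL stand-ins for startbatches(): the result is whatever they
# echo on stdout, which the caller captures with $(...).
pick_batch_redirected () {
	if noisy_check 1>&2	# diagnostic goes to stderr, result stays clean
	then
		echo 0
	else
		echo 7
	fi
}
pick_batch_unredirected () {
	if noisy_check		# diagnostic leaks into the captured result
	then
		echo 0
	else
		echo 7
	fi
}

good=$(pick_batch_redirected)
bad=$(pick_batch_unredirected)
echo "redirected:   [$good]"
echo "unredirected: [$bad]"
```

The redirected version still shows the diagnostic on the terminal via stderr, while the captured value is just the number.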
@@ -216,7 +238,7 @@ echo All batches started. `date`
 # Wait for all remaining scenarios to complete and collect results.
 for i in $systems
 do
-	while ssh "$i" "test -f \"$resdir/$ds/remote.run\""
+	while checkremotefile "$i" "$resdir/$ds/remote.run"
 	do
 		sleep 30
 	done
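Before this change, both the `if ssh ...` test in `startbatches` and this `while ssh ...` loop treated ssh's 255 exactly like "file absent". A momentary outage could therefore end the wait loop early and collect results from a run still in progress, or make a busy system look idle. The offline sketch below compares the old and new loops. It assumes hypothetical `ssh` and `sleep` stubs: a local marker file stands in for `$resdir/$ds/remote.run`, the second poll simulates an outage, and the "remote" run finishes on the fourth poll.

```shell
#!/usr/bin/env bash
# ASSUMPTION: ssh and sleep are HYPOTHETICAL local stubs so this runs offline.
marker=$(mktemp)	# stands in for $resdir/$ds/remote.run
polls=0
ssh () {
	polls=$((polls + 1))
	if test "$polls" -eq 2
	then
		return 255	# simulated network outage on the 2nd poll
	fi
	if test "$polls" -ge 4
	then
		rm -f "$marker"	# the "remote" run finishes on the 4th poll
	fi
	sh -c "$2"
}
sleep () {
	:
}

# checkremotefile() as added by this commit.
checkremotefile () {
	local ret
	local sleeptime=60

	while :
	do
		ssh $1 "test -f \"$2\""
		ret=$?
		if test "$ret" -ne 255
		then
			return $ret
		fi
		echo " ---" ssh failure to $1 checking for file $2, retry after $sleeptime seconds. `date`
		sleep $sleeptime
	done
}

# Old loop: the outage looks like "file gone", so it stops too early.
while ssh host "test -f \"$marker\""
do
	sleep 30
done
old_exit_poll=$polls
old_marker_present=no
test -f "$marker" && old_marker_present=yes
echo "old loop stopped at poll $old_exit_poll, run still active: $old_marker_present"

# New loop: the outage is retried inside checkremotefile().
polls=0
while checkremotefile host "$marker"
do
	sleep 30
done
new_exit_poll=$polls
echo "new loop stopped at poll $new_exit_poll"
```

The old loop stops on the simulated outage while the marker file still exists. The new loop rides through the outage and stops only once the file is really gone.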