gh-82054: allow test runner to split test_asyncio to execute in parallel by sharding. (#103927)

This runs test_asyncio sub-tests in parallel using sharding from Cinder. This suite is typically the longest-pole in runs because it is a test package with a lot of further sub-tests otherwise run serially. By breaking out the sub-tests as independent modules we can run a lot more in parallel.

After porting we can see the direct impact on a multicore system.

Without this change:
  Running make test is 5 min 26 seconds
With this change:
  Running make test takes 3 min 39 seconds

That'll vary based on system and parallelism. On a `-j 4` run similar to what CI and buildbot systems often do, it reduced the overall test suite completion latency by 10%.

The drawbacks are that this implementation is hacky and due to the sorting of the tests it obscures when the asyncio tests occur and involves changing CPython test infrastructure but, the wall time saved it is worth it, especially in low-core count CI runs as it pulls a long tail. The win for productivity and reserved CI resource usage is significant.

Future tests that deserve to be refactored into split up suites to benefit from are test_concurrent_futures and the way the _test_multiprocessing suite gets run for all start methods. As exposed by passing the -o flag to python -m test to get a list of the 10 longest running tests.

---------

Co-authored-by: Carl Meyer <carl@oddbird.net>
Co-authored-by: Gregory P. Smith <greg@krypto.org> [Google, LLC]
This commit is contained in:
Joshua Herman 2023-04-29 20:26:24 -05:00 committed by GitHub
parent 00e2c5960e
commit 9e011e7c77
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -143,6 +143,14 @@ STDTESTS = [
# set of tests that we don't want to be executed when using regrtest
NOTTESTS = set()
#If these test directories are encountered recurse into them and treat each
# test_ .py or dir as a separate test module. This can increase parallelism.
# Beware this can't generally be done for any directory with sub-tests as the
# __init__.py may do things which alter what tests are to be run.
SPLITTESTDIRS = {
"test_asyncio",
}
# Storage of uncollectable objects
FOUND_GARBAGE = []
@ -158,7 +166,7 @@ def findtestdir(path=None):
return path or os.path.dirname(os.path.dirname(__file__)) or os.curdir
def findtests(testdir=None, stdtests=STDTESTS, nottests=NOTTESTS):
def findtests(testdir=None, stdtests=STDTESTS, nottests=NOTTESTS, *, split_test_dirs=SPLITTESTDIRS, base_mod=""):
"""Return a list of all applicable test modules."""
testdir = findtestdir(testdir)
names = os.listdir(testdir)
@ -166,8 +174,13 @@ def findtests(testdir=None, stdtests=STDTESTS, nottests=NOTTESTS):
others = set(stdtests) | nottests
for name in names:
mod, ext = os.path.splitext(name)
if mod[:5] == "test_" and ext in (".py", "") and mod not in others:
tests.append(mod)
if mod[:5] == "test_" and mod not in others:
if mod in split_test_dirs:
subdir = os.path.join(testdir, mod)
mod = f"{base_mod or 'test'}.{mod}"
tests.extend(findtests(subdir, [], nottests, split_test_dirs=split_test_dirs, base_mod=mod))
elif ext in (".py", ""):
tests.append(f"{base_mod}.{mod}" if base_mod else mod)
return stdtests + sorted(tests)