Python Test Runners

For the past year and half I’ve been working on a parallel test runner for python tests, stestr. I started this project to fit a need in the OpenStack community and it since has been adopted as the test runner used in the OpenStack project. But, it is generally a useful tool and something I think would provide general value for people writing python tests. I thought it would be valuable to explain two things in this post, first the testing stack underlying stestr and the history behind the project. The second aspect is how it compares to other popular python test runners are out there, and the purpose it fills.

Python unittest

Included in the python standard library is the unittest library. This provides the basic framework for writing, discovering, and running tests in python. It uses an object oriented model where tests are organized in classes (called test cases) with individual methods that represent a single test. Each test class has some standard fixtures like setUp and tearDown which define functions to return at certain phases of the text execution. These classes and their modules are combined to build test suites. These suites can either be manually constructed or use test discovery to build the suite automatically by scanning a directory.

The thing which is often misunderstood, especially given it’s name,  is that Python unittest is not strictly limited to unit testing or testing python code. The framework provides a mechanism for running structured python code and treat the execution of this code as test results. This enables you to write tests that do anything in python. I’ve personally seen examples that test a wide range of things outside of python code. Including hardware testing, CPU firmware testing, and Rest API service testing.


As unittest is a python library included in the cpython standard library it gets improvements and new features with each release of cPython. This makes writing tests that support multiple versions of python a bit more difficult, especially if you want to leverage features in newer version of python. This is where the unittest2 library comes in, it provides backports of features from newer versions of python. This enables older versions of python to leverage features from newer unittest. It was originally written to leverage features included in python 2.7 unittest with python 2.6 and older. But, it also backports features from newer versions of python 3 to older versions of python.


Building on the unittest framework is the testtools library. Testtools provides an extension on top of unittest to provide additional features like additional assert methods and a generic matcher framework to do more involved object comparisons.  While it is an extension on top of unittest testtools maintains compatibility with the upstream python standard lib unittest. So you can write tests that leverage this extra functionality and use them with any other unittest suite or runner.

One of the key things that testtools provides for the layers above it in this stack is it’s improved results stream. This includes the concept of details, which are like attachments that enable storing things like logging or stdout with test result.


Subunit is streaming protocol for test results. It provides a mechanism for sharing test results from multiple sources in real time. It’s a language agnostic protocol with bindings for multiple languages including: python, perl, c, c++, go, js and others. I also recently learned that someone created a wikipedia page for the protocol:

The python implementation of the subunit library (the subunit library repository  is multilanguage) is built by extending testtools. It builds off of testtools’s test runner and the result stream additions that testtools adds on to base unittest. This means that any unittest suite (or testtools) can simply replace their test runner with subunit’s and get a real time subunit output stream. It’s this library that enables parallel execution and strict unittest compatibility in the tools above it on the stack.

The other thing which is worth pointing out is that because of OpenStack’s usage of testr and stestr we have developed a lot of tooling in the community around consuming subunit. Including things like stackviz, subunit2sql, and subunit2html. All of which can be reused easily by anything that uses subunit. There are also tools to convert between different test result formats, like junitxml,  and subunit.


Also known as testr, which is the command name, is a bit of a misunderstood project depending on who you talk too. Testrepository is technically a repository for storing test results, just as it’s name implies. As part of that it includes a test running mechanism to run any command which will generate a subunit results stream. It supports running those commands in parallel both locally and on remote machines. Having a repository results then enables using that data for future runs. For example, testr lets you rerun a test suite only with tests that failed in the previous run, or  use the previous run to bisect failures to try and find issues with test isolation. This has proven to be a very useful feature in practice, especially when working with large and/or slow test suites.

But for several years it was the default test runner used by the OpenStack project, and a lot of people just see it as the parallel test runner used by OpenStack. Even though the scope of the project is much larger than just python testing.


Since the OpenStack project started using testr in late 2012/early 2013 there were a lot of UX issues and complaints people had with it.  People started working around these in a number of ways, there was the introduction of multiple setuptools entrypoints to expose commands off of to inovke testr. There were also multiple bash scripts floating around to run testr with an alternative UI called started in the tempest project and was quickly copied into most projects using testr. However each copy tended to diverge and embed their own logic or preferences. ostestr was developed to try and unify those bash scripts, and it shows. It was essentially a bash script written in python that would literally subprocess out and call testr.


This is where stestr entered the field. After having maintained ostestr for a while I was getting frustrated with a number of bugs and quirks in testrepository itself. Instead of trying to work around them it would just be better to fix things at the source. However, given the lack of activity in the testrepository project this would have been difficult. I personally had pull requests sitting idle for years on the project. So I decided after a lot of personal deliberation to fork it.

I took the test runner UX lessons I learned from maintaining ostestr and started rewriting large chunks of testr to make stestr. I also tried to restructure the code to be easier to maintain and also leverage newer python features. testrepository was started 8 years ago and it supported python versions < 2.7 (having been started before 2.7’s release) this included a lot of code to implement things that were included standard in newer, more modern versions of the language.

The other aspect to stestr is that it’s scoped to just being a parallel python test runner. While testrepository is designed to be a generic test runner runner that will work with any test runner that emits a subunit result stream, stestr will only deal with python tests. Personally I always felt there was a tension in the project when using it strictly as a python test runner, some of the abstractions testr had to make caused a lot of extra work for people using it only for python testing. Which is why I rescoped the project to only be concerned with python testing.

Other Popular Test Runners

While I am partial to the tools described above and the way the stack is constructed (for the most part). These tools are far from the only way to run python tests. In fact outside of OpenStack this stack isn’t that popular. I haven’t seen it used in too many other places. So it’s worth looking at other popular test runners out there, and how they compare to stestr.


nosetests at one time was the test runner used by the majority of python projects. (including OpenStack) It provided a lot  of missing functionality from python unittest; especially before python 2.7 which is when python unittest really started getting more mature. However it did this by basically writing it’s own library for testing and coupling that with the runner. While you can use nosetests for running unitttest suites in most cases, the real power with nose comes from using it’s library in conjunction with the runner. This made using other runners or test tooling with a nose test suite very difficult. Having personally worked with a large test suite that was written using nose migrating that to work with any unittest runner is not a small task. (it took several months to get it so tests could run with unittest)

Currently nosetests is a mostly inactive project in maintenance mode. There is a successor project, nose2, which was trying to fix some of the issues with nose and make it up to date. But it too is currently in maintenance mode, and not really super active anymore. (but it’s more active then nose proper) Instead the docs refer people to use pytest.


pytest is in my experience by far the most popular python test runner out there. It provides a great user experience (arguably the most pleasant), is pluggable, and seems to have the most momentum as a python test runner. This is with good reason there are a lot of nice features with pytest, including very impressive failure introspection which basically will just tell you why a test failed, making debugging failures much simpler.

But there are a few things to consider when using pytest, the biggest of which is it’s not strictly unittest compatible. While pytest is capable of running tests written using the unittest library it’s not actually a unittest based runner. So things like test discovery differ in how they work on pytest. (it’s worth noting pytest supports nose test suites too)

The other things that’s missing from pytest by default is parallel test execution. There is a plugin pytest-xdist which enables this, and it has come a long way in the last several years. It doesn’t provide all of the same features for parallel execution as stestr, especially around isolation and debugging failures. But for most people it’s probably enough.

Quite frankly, if I weren’t the maintainer of stestr and if I didn’t need or want things that stestr provides like first class parallel execution support, a local results repository, strict unittest compatibility, or subunit support I’d  probably just use pytest for my projects.

Leave a Reply

Your email address will not be published. Required fields are marked *