Dev Ups

Published 2023-08-07 in python

unittest, Python's batteries-included unit test framework

I'll refer to my repository, quotations with unittest, which I created to demonstrate the ideas in this article.

I have a folder just for tests. If I have enough source files to test, they get their own folder too; here they occupy the app folder:

└── project_name
    ├── app
    │   ├── any.py
    │   └── big.py
    └── tests
        ├── test_any.py
        └── test_big.py

Test runners

The test runner is the module invoked by python to run the tests.

There are numerous possible test runners. Here we're considering unittest, but pytest is at least as pervasive, so it's worth mentioning. Pytest offers fixtures, meaningful messages when ordinary asserts fail, and its parametrize decorator, but it is an additional dependency.

I've almost never used pytest without also needing to import modules from unittest. Thankfully, the two runners are largely interchangeable on the command line, and pytest can even run unittest-style tests. The reverse is not true: unittest requires its tests to follow a specific, class-based structure.
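
On the command line the two look like this (pytest being a separate install, e.g. pip install pytest):

# ships with Python; discovers files matching test*.py by default
python -m unittest
# third-party; discovers files matching test_*.py by default
pytest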

Test structure

The suggested test folder structure can be divided further:

└── tests
    ├── unit
    │   ├── test_any.py
    │   └── test_big.py
    └── integration
        ├── test_any.py
        └── test_big.py

We may be tempted to subdivide further. At that point I would question whether a single module is doing too much; a 1:1 ratio between test and source files is ideal. If I want to test different scenarios, or use cases, that may be a job for higher level tests, such as integration or behavioural tests. unittest also provides scope to test different scenarios within the same file.

unittest test structure

Check out quotations with unittest on GitHub, along with the CI pipelines of my other public repositories.

In general a unittest test is structured:

import unittest
from foo import bar

class ModuleFooTestCase(unittest.TestCase):
    def test_bar_100(self):
        assert bar("100") == 100
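
If you'd also like the file to be runnable directly (python tests/test_foo.py, to pick an illustrative path) rather than only via the runner, the conventional footer is:

if __name__ == "__main__":
    unittest.main()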

We can have functions that run once for multiple tests in a class, like this:

import unittest
import os
import shutil
from pathlib import Path

from foo import bar

class ModuleFooTestCase(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # runs once, before the first test in this class
        Path(os.path.join("tmp")).mkdir(parents=True, exist_ok=True)

    @classmethod
    def tearDownClass(cls):
        # runs once, after the last test in this class
        shutil.rmtree(os.path.join("tmp"))

    def test_bar_100(self):
        assert bar("100") == 100

    def test_bar_101(self):
        assert bar("101") == 101
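
setUpClass and tearDownClass run once per class. For work that should happen around every individual test, unittest also provides the instance methods setUp and tearDown. A minimal sketch, reusing the same bar function:

class PerTestFooTestCase(unittest.TestCase):
    def setUp(self):
        # runs before every individual test method
        self.raw = "100"

    def tearDown(self):
        # runs after every individual test method, whether it passed or failed
        del self.raw

    def test_bar_100(self):
        assert bar(self.raw) == 100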

I mentioned scenario-based testing in the last section. While the goal of unit testing is to achieve broad but shallow coverage, sometimes we want to exercise specific scenarios.

class Base10FooTestCase(unittest.TestCase):
    def test_bar_100(self):
        assert bar("100", 10) == 100


class Base2FooTestCase(unittest.TestCase):
    def test_bar_100(self):
        assert bar("100", 2) == 4

We'd need fairly complicated functionality to justify this approach. The example is contrived, but it's nice to have classes as a scope for this kind of abstraction. A more realistic application could involve one TestCase that runs sample data through a higher level function, and another that runs mock data. This straddles a grey area between unit and component testing, and there's nothing wrong with using unittest for component, or even integration, testing.
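
As a rough sketch of the mock-data TestCase just described, using unittest.mock: here foo is assumed to expose a higher level process() built on a load_samples() data source, both hypothetical names for illustration.

import unittest
from unittest import mock

import foo  # assumed to provide process() and load_samples()

class MockedDataFooTestCase(unittest.TestCase):
    @mock.patch("foo.load_samples")
    def test_process_with_mocked_data(self, mock_load):
        # replace the real data source with a canned value
        mock_load.return_value = ["100", "101"]
        assert foo.process() == [100, 101]
        mock_load.assert_called_once()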

Purpose of tests

Unit tests should prioritise granularity and speed. It may be expedient to dump swathes of real-life sample data into a test rather than mocking every dependency. This produces fast tests quickly, but it isn't granular. That is excusable for beginner test engineers, or for a developer's own personal consumption. Different classes for each scenario can aid a transition to better, more granular tests.

Perhaps the simplest tangible objective of unit tests is coverage, which we'll get to shortly.

If a test is hard to write, you have a chicken-and-egg situation: it is hard to justify and explain code functionality that you cannot assign a simple test, and it's likely the unit under test is doing too much. That works against loose coupling and high cohesion, and violates the Single Responsibility Principle. Simpler units are conducive to simpler unit tests.

Single Responsibility is the first, and arguably the most important, of the SOLID principles, which Robert Martin set out in his 2000 paper Design Principles and Design Patterns. Like all principles, it takes experience to appreciate both sides of the argument. Knowing the principle helps in debating the merits of your code.

Command lines

The most basic invocation is:

python -m unittest

This implicitly appends "discover", so that all test files matching a certain pattern get run. The default pattern is test*.py, configurable via the -p, or --pattern, parameter.

To logically separate tests, between unit and integration for example, create subdirectories under tests. An "extra" step, but a simple one, with huge returns.
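
The discover sub-command takes a start directory, so each group can then be run in isolation (depending on your layout, the subdirectories may need an __init__.py for discovery to import them):

# run only the unit tests
python -m unittest discover -s tests/unit
# run only the integration tests
python -m unittest discover -s tests/integration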

Coverage

Coverage helps to find untested lines:

# Run tests and collect coverage data:
python -m coverage run -m unittest
# then view the coverage with either:
python -m coverage html
# or
python -m coverage report

html and report produce HTML and plain-text output respectively. The HTML, written to htmlcov/ by default, is more immediately understandable than the default report: missed lines are presented in context, highlighted red, in web pages. If you don't have the code open in front of you, html can be preferable.

The plain-text report, among other benefits, lends itself more to further processing as part of a CI/CD pipeline, although codecov.io won't accept this output directly. Simply run coverage xml or coverage json to generate output it can consume.
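
For example, a minimal sequence for a pipeline might look like:

python -m coverage run -m unittest
python -m coverage xml    # writes coverage.xml
# or
python -m coverage json   # writes coverage.json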

Branch coverage

So far, we've sought only to extend coverage to as many lines of code as possible. There's another measurement that is sometimes employed, branch coverage.

Typically, we'd mock the return value of a function to exercise the lines of code in a particular branch. But what if a function runs code conditionally and otherwise does nothing? Line coverage alone never asks whether that do-nothing path was taken. A thorough test would mock everything inside the function and assert that exactly the expected calls were made, and nothing else: repetitive, tedious stuff, and the kind of thing that gets unit testing seen as an unproductive use of developer time.

A better way of ensuring empty branches are hit, without asserting that the lines in the alternative branch weren't exercised, is branch coverage.

For example:

def foobar():
    # nothing happens when the cache is already large enough;
    # only branch coverage notices whether that path was ever taken
    if get_bytes_used("cached_calcs") < 1_000_000:
        cache_calculations()
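
Branch coverage is switched on with coverage's --branch flag (or branch = True under [run] in a .coveragerc). The report will then flag it if the condition above was never false, or never true, across the test run:

python -m coverage run --branch -m unittest
python -m coverage report -m   # -m/--show-missing also lists partial branches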

I don't know anyone who uses branch coverage, but it is mentioned in the literature, and it can be hard to think of a use for it, which is why I added this sub-section.

Conclusion

Learn unittest first, then pytest. You may also come across nose and tox, but once you know two, the transition should be easy.

Use coverage, get a badge. If you don't have any users, make a robot (automated tests) and the users will come, or at least not be disgusted by your dog food.

Unit testing is fundamental to a test strategy. It may not be easy at first but develops skills transferable to the other tests in the hierarchy. Testing can seem unglamorous and unproductive. Get over this and learn to write tests well. You will spend less time re-writing them and developers will spend less time correcting bugs that slip through the code integration process.