On Setup/Teardown for Integration Testing in Rust
I've been working primarily on Rust services for a bit over a year now. It's been a pretty good experience all around. The ecosystem is getting much more mature, and there are a lot of good crates out there covering a large problem space.
All of this is why I was pretty surprised to run into a lack of solutions for a use case I'd consider pretty common: Setup and Teardown in Integration Testing.
Some Background
Rust's libtest is great. It's really easy to write unit tests next to your code to ensure correctness.
fn add(x: u32, y: u32) -> u32 {
    x + y
}

#[test]
fn test_add() {
    assert_eq!(4, add(1, 3));
    assert_eq!(4, add(2, 2));
}
libtest will find all these tests and run them when you run cargo test. (And you have the usual niceties, like running test subsets by file or test name.)
At the next layer of testing, you can write tests in a separate directory. By default, Cargo will look in the tests directory, but you can point it at other directories with some edits to Cargo.toml.
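For example, an extra test target can be declared like this (the name and path here are just placeholders):

# Cargo.toml
[[test]]
name = "e2e"          # run with: cargo test --test e2e
path = "e2e/main.rs"  # doesn't have to live under tests/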
At my current gig, we've been leveraging this to write integration tests for our services. It works pretty well most of the time - you can write common test code in submodules and write tests with the same #[test] attribute you use for unit tests. You still get the integration with cargo test, and libtest handles discovering all the tests.
Problem the First: Teardown
Where this has been a bit of a pain for us, however, is in writing end-to-end integration tests.
In my experience, it's pretty common to want to write a suite of tests that share some common resource in your service. As a simple example, let's assume we're working on a service that exposes a number of APIs that all interact with a resource called "Blob". We're writing end-to-end tests, so we're going to have to ask our service to create a Blob before we can test any of the APIs that interact with it. We'd also like to clean that up after we've run our tests.
If I were doing this in Java using JUnit, I might write something like:
class BlobTest {
    @Test
    void testApi1() {
        var client = new BlobClient(...);
        var blob = client.createBlob(...);
        var result = client.api1(blob, ...);
        // Assert some things about the result
        client.deleteBlob(...);
    }

    @Test
    void testApi2() {
        var client = new BlobClient(...);
        var blob = client.createBlob(...);
        var result = client.api2(blob, ...);
        // Assert some things about the result
        client.deleteBlob(...);
    }

    // And so on
}
We can write something similar in Rust:
#[test]
fn test_api_1() {
    let mut client = BlobClient::new(...);
    let blob = client.create_blob(...);
    let result = client.api_1(blob, ...);
    // Assert some things
    client.delete_blob(...);
}

#[test]
fn test_api_2() {
    let mut client = BlobClient::new(...);
    let blob = client.create_blob(...);
    let result = client.api_2(blob, ...);
    // Assert some things
    client.delete_blob(...);
}

// And so on
These are a start, but you've surely spotted the repetitiveness of all this. Let's try to address that.
In Java:
class BlobTest {
    private Blob blob;
    private BlobClient client;

    @BeforeEach
    void setup() {
        client = new BlobClient(...);
        blob = client.createBlob(...);
    }

    @AfterEach
    void teardown() {
        client.deleteBlob(...);
        blob = null;
    }

    @Test
    void testApi1() {
        var result = client.api1(blob, ...);
        // Assert some things about the result
    }

    @Test
    void testApi2() {
        var result = client.api2(blob, ...);
        // Assert some things about the result
    }

    // and so on
}
JUnit annotations let us easily DRY up our test code without changing the meaning. This has the nice property that any tests added to this class will also have setup and teardown handled without requiring any such work in the test functions.
We're going to have to work a little harder in Rust.
We could do:
fn setup() -> (BlobClient, Blob) {
    let mut client = BlobClient::new(...);
    let blob = client.create_blob(...);
    (client, blob)
}

fn teardown(client: &mut BlobClient) {
    client.delete_blob(...);
}

#[test]
fn test_api_1() {
    let (mut client, blob) = setup();
    let result = client.api_1(blob, ...);
    // Assert some things
    teardown(&mut client);
}

#[test]
fn test_api_2() {
    let (mut client, blob) = setup();
    let result = client.api_2(blob, ...);
    // Assert some things
    teardown(&mut client);
}

// and so on
That's... not quite as satisfying. If you write a new test, you have to deal with calling setup and teardown.
And if any of the assertions fail, teardown won't even get called!
We can fix that latter part by moving all the assertions post-teardown, but that's not much prettier.
If you want to write a "test runner" function to try to DRY this up, you're going to have to start using catch_unwind to catch the assertion panics, which... is still not very elegant.
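To make that concrete, here's a rough sketch of what such a runner could look like, reusing the hypothetical BlobClient, Blob, setup, and teardown from above:

use std::panic::{catch_unwind, resume_unwind, AssertUnwindSafe};

// A possible runner: set up, run the test body, tear down even if an
// assertion panicked, then re-raise the panic so libtest still reports
// the failure.
fn run_blob_test(test: impl FnOnce(&mut BlobClient, &Blob)) {
    let (mut client, blob) = setup();
    let outcome = catch_unwind(AssertUnwindSafe(|| test(&mut client, &blob)));
    teardown(&mut client);
    if let Err(panic) = outcome {
        resume_unwind(panic);
    }
}

#[test]
fn test_api_1() {
    run_blob_test(|client, blob| {
        // Call api_1 and assert some things; any panic here is caught by
        // the runner, so teardown still happens.
    });
}

It works, but every test body now lives inside a closure, and the unwind-safety gymnastics aren't something I want to have to explain in a test suite.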
You could try to do something with RAII by implementing Drop, but do you really want your Drop impl making network calls?
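For reference, that approach would look roughly like this (again with the hypothetical BlobClient):

// A guard that owns the client and the blob it created; the blob is
// deleted when the guard goes out of scope, even if the test panics.
struct BlobGuard {
    client: BlobClient,
    blob: Blob,
}

impl BlobGuard {
    fn new() -> Self {
        let mut client = BlobClient::new(/* ... */);
        let blob = client.create_blob(/* ... */);
        BlobGuard { client, blob }
    }
}

impl Drop for BlobGuard {
    fn drop(&mut self) {
        // Best-effort cleanup over the network: errors can't be returned
        // from Drop, and a panic here while already unwinding aborts the
        // whole test process.
        self.client.delete_blob(/* ... */);
    }
}

#[test]
fn test_api_1() {
    let guard = BlobGuard::new();
    // Exercise api_1 via guard.client and guard.blob, assert some things,
    // and rely on Drop to delete the blob afterwards.
}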
Thus, we have our first problem: There's not really any good way to deal with test teardown in Rust.
Problem the Second: One-time Setup and Teardown
With the already-limited support for setup and teardown, doing "global" or "one-time" test setup and teardown cleanly in Rust is even harder. Writing end-to-end integration tests is where I've found myself wanting that capability the most.
Example 1
Let's expand on our Blob Service testing. You may be aware that JUnit has @BeforeAll and @AfterAll annotations that would let us write something like the following:
class BlobTest {
    private static Blob blob;
    private static BlobClient client;

    // @BeforeAll/@AfterAll methods (and the fields they touch) must be
    // static under JUnit 5's default per-method test lifecycle.
    @BeforeAll
    static void setup() {
        client = new BlobClient(...);
        blob = client.createBlob(...);
    }

    @AfterAll
    static void teardown() {
        client.deleteBlob(...);
    }

    @Test
    void testApi1() {
        var result = client.api1(blob, ...);
        // Assert some things about the result
    }

    @Test
    void testApi2() {
        var result = client.api2(blob, ...);
        // Assert some things about the result
    }

    // and so on
}
This introduces a concept known as "Global Mutable State". It is important to note that this code is not semantically equivalent to the prior tests! Instead of creating a new Blob for each test, we are creating a Blob once, using it for every test, and then tearing it down once all tests have run.
This is fraught with peril.
- State is carried over between tests.
- This means that tests can influence each others' behavior.
- This means that the behavior of this suite of tests can depend on the order in which the tests are executed.
This all sounds bad. I agree with the sentiment that one should avoid global mutable state!
But sometimes it becomes a necessary evil.
Imagine that we have 10 APIs to test, and would like to write on average 2 tests per API. (In my experience, these are low estimates.)
Creating and deleting resources like Blobs is often nontrivial. It is not unrealistic to envision a scenario where creating or deleting a blob is an operation which takes on the order of 10 seconds, while the test functions themselves are faster, perhaps taking one second on average.
Running the test suite with @BeforeAll and @AfterAll will take 10 seconds to set up, 20 seconds to run the tests, and 10 seconds to tear down, for a total of 40 seconds.
Running the same test suite with @BeforeEach and @AfterEach will take, for each test, 10 seconds to set up, 1 second to run, and 10 seconds to tear down, for a total of 20 * 21 = 420 seconds.
If you were instead running 200 tests, you're looking at the difference between 4 minutes and 70 minutes.
You can mitigate this to some degree by running tests in parallel - but parallelism is limited, and if you're creating and deleting resources in a complex system, that might introduce its own headaches. For example, what if your service limits the number of Blobs per user?
Example 2
Another real-world use-case I've encountered is writing end-to-end tests to run against a local set of services, which run as docker containers. Before testing, we need to launch those services, and after testing, we'd like to tear them down.
Of course, this introduces all the same concerns about global mutable state. But launching docker resources is not particularly fast, and it'd be quite the pain to have to set up and tear down the entire stack between each test, both in terms of time and potential for failure.
In this case, I'm again more willing to spend the effort to write tests that work around the issues with global mutable state.
Solutions?
libtest Improvements
It looks like there's a project called libtest-next that's trying to solve some of these problems, but it's not clear how far along it is, or how soon we can expect to see solutions land in the core tooling.
I'd love to see this happen, though! Having support in core tooling is the optimal outcome, as far as I'm concerned.
Custom Test Harnesses
Cargo has support for running a custom test harness. Don't get too excited, though, because all that means is "it'll compile a binary, run it, and tell you whether it returned a success exit code." All the details of implementing a test harness are left up to you, and you lose nice things like the test result formatting and automatic test discovery from libtest.
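Opting out looks something like this in Cargo.toml (the target name and path are placeholders):

[[test]]
name = "e2e"
path = "tests/e2e.rs"
harness = false   # provide your own main() instead of using libtest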
Naturally, upon reading about this, I asked "well, are there any crates that do this?".
Unfortunately, there aren't really any great options there, either.
libtest-mimic will help you format test results nicely, but you still have to manually declare all your tests as part of a main function in a way that's pretty boilerplate-heavy.
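As a rough sketch of what I mean (based on my reading of the libtest-mimic docs; the exact API may differ between versions), the harness's main ends up looking something like this:

use libtest_mimic::{Arguments, Failed, Trial};

fn test_api_1() -> Result<(), Failed> {
    // setup, exercise api_1, assert some things, teardown
    Ok(())
}

fn main() {
    let args = Arguments::from_args();
    // Every test has to be listed here by hand - this is the boilerplate.
    let tests = vec![
        Trial::test("test_api_1", test_api_1),
        // Trial::test("test_api_2", test_api_2),
        // ...
    ];
    libtest_mimic::run(&args, tests).exit();
}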
cargo-nextest gets talked about a lot. It's got support for running setup shell scripts before tests, but no teardown support, and I'd really like to be able to write my setup and teardown in Rust.
The most promising thing I've found is rust-rspec. However, it's somewhat stale, and has restrictions on data (particularly around Sync/Send). My team is also fairly allergic to the rspec test specification format, and I must admit that it doesn't feel particularly "rusty". I'm also not sure it'll play nice if you need to test async things.
Which leaves us with "write your own test harness". I found a Medium post about using the inventory crate to help with test discovery, though it's still more boilerplate-heavy than I'd like. I've played with that idea a bit, and it's probably workable, but it gets much more complicated if you want to be able to write async test functions.
In the event I find myself continuing down this road, I'll probably be looking at combining libtest-mimic with a discovery mechanism like inventory, maybe via some custom macro rules.
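Roughly, the idea would be for each test to register itself, and for main to collect the registrations instead of listing every test by hand. A sketch (the IntegrationTest type and registration pattern here are my own invention; inventory's submit!/collect!/iter API is what I believe the crate provides, but check its docs):

use libtest_mimic::{Arguments, Failed, Trial};

// A hypothetical registration type; each test submits one of these.
pub struct IntegrationTest {
    pub name: &'static str,
    pub test_fn: fn() -> Result<(), Failed>,
}
inventory::collect!(IntegrationTest);

fn test_api_1() -> Result<(), Failed> {
    // setup, exercise api_1, assert some things, teardown
    Ok(())
}
inventory::submit! {
    IntegrationTest { name: "test_api_1", test_fn: test_api_1 }
}

fn main() {
    // One-time setup could happen here, before any tests run...
    let args = Arguments::from_args();
    let tests: Vec<Trial> = inventory::iter::<IntegrationTest>
        .into_iter()
        .map(|t| Trial::test(t.name, t.test_fn))
        .collect();
    let conclusion = libtest_mimic::run(&args, tests);
    // ...and one-time teardown here, since Conclusion::exit() ends the process.
    conclusion.exit();
}

A macro could hide the submit! boilerplate, and wrapping each test_fn would give you one place to hook in per-test setup and teardown - but async test functions still don't fit neatly into this picture.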
If you know of a good existing option here, I'd love to hear about it.
Don't Write Your Integration Tests in Rust?
If the Rust ecosystem doesn't have good support for this sort of thing, why not use a mature test library in a different language? You're writing end-to-end tests, after all. Or, you could write a wrapper script that does setup and teardown outside of your Rust tests.
These may well be the best options available to you, if you're looking for something you can work with now.