Proving E2E tests are a Scam

The Why and an Ode to Testing Past

As what might be considered a veteran of the software industry, I remember reading J. B. Rainsberger’s piece “Integrated Tests Are a Scam”, which was somewhat controversial at the time. In an early section he walks through the code branches in a system and the number of integrated tests you’d need to satisfy basic correctness. As history seems doomed to repeat itself, I find myself facing similar arguments today as an advocate of Contract-based over End-to-End (E2E) testing for distributed systems. It’s an argument I often blithely answer with some comment about exponential vs linear behaviour, knowing full well that if I were ever asked to prove it, the curtain would be pulled back. So, having spent a year indoors, I found myself scratching this ‘itch’ to see whether that claim does in fact hold true.
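Rainsberger’s point is essentially combinatorial. As a simplified illustration, assuming each component’s branches are independent and using made-up branch counts, exercising every path through the components together needs the product of their branch counts, while testing each component in isolation needs roughly their sum (this ignores the extra collaboration and contract tests his approach also calls for):

```python
from math import prod


def integrated_tests(branches_per_component):
    # Exercising every combination of paths through all components at once
    # needs one test per combination: the product of the branch counts.
    return prod(branches_per_component)


def focused_tests(branches_per_component):
    # Testing each component in isolation needs roughly one test per branch.
    return sum(branches_per_component)


# Illustrative numbers only: three components with 10 branches each.
components = [10, 10, 10]
print(integrated_tests(components))  # 1000
print(focused_tests(components))     # 30
```

Adding a fourth ten-branch component multiplies the integrated count by ten but only adds ten focused tests, which is the exponential vs linear shape the argument rests on.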

Defining the problem

I’ll start with a couple of assumptions to narrow the problem. Firstly, I’m going to assume that you want complete test coverage of each interaction in your system. That is, each contract has some automated test that will fail if a breaking change is introduced to either a consumer or a provider. This proof is based on the consumption of compute resources, which you might think of as servers. So the question I am essentially trying to answer is: when I make a change, how many servers do I need to spin up to know whether I have broken a contract in my system? Secondly, as we live in an era of powerful cloud compute services, I am going to mostly remove time from the equation. I do this by asserting that you can spin up as many servers as needed on a charge-by-use model, so in theory you can run arbitrarily many tests in parallel for the same cost as running them sequentially. This is an oversimplification that plays in favour of E2E test performance, as otherwise E2E tests would also pay a time penalty on top of their compute resources. This, I argue, more than compensates for any potential spot-pricing savings.
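Under the charge-by-use assumption, the bill depends only on total server time, so parallelism changes how long you wait but not what you pay. A minimal sketch, assuming every test run occupies one server and using a hypothetical price of 1 per server-minute:

```python
def pay_per_use(runs_minutes, price_per_server_minute=1.0):
    """Hypothetical charge-by-use model: each test run occupies one server.

    Returns (cost, sequential_wall_clock, parallel_wall_clock).
    """
    server_minutes = sum(runs_minutes)
    cost = server_minutes * price_per_server_minute
    # One server reused run after run: wait for every run in turn.
    sequential = sum(runs_minutes)
    # One server per run, all started at once: wait for the slowest run.
    parallel = max(runs_minutes, default=0)
    return cost, sequential, parallel


cost, seq, par = pay_per_use([5, 5, 5, 5])
print(cost, seq, par)  # 20.0 20 5
```

Four five-minute runs cost the same 20 server-minutes either way; only the wall-clock time drops from 20 to 5 minutes when they run in parallel. That is why the analysis can count servers and set time aside.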

Words matter

Here I use the term Consumer to mean ‘client’ and Provider to mean ‘server’. The reason for the different terminology is that in today’s systems clients can be servers and vice versa, so the traditional terms break down. I am going to analyze performance in systems with a number of Nodes (you might think of these as servers) and Connections (you can think of these as contracts). The higher the number of Nodes (N) and Connections (E), the more complex the system. Ultimately we want the testing approach whose cost function Y = f(N, E), the number of Test Nodes (Y) needed to test such a system, is as small as possible.
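The vocabulary maps naturally onto a directed graph. The sketch below is one hypothetical way to model it: Nodes are services, each Connection is a (consumer, provider) pair, and a node that appears on both sides of different connections is both a Consumer and a Provider. The class and the service names are illustrative, not taken from any tool:

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """Hypothetical model: Nodes joined by directed Connections (consumer -> provider)."""

    nodes: set = field(default_factory=set)
    connections: set = field(default_factory=set)  # {(consumer, provider), ...}

    def connect(self, consumer, provider):
        # Adding a connection registers both of its endpoints as Nodes.
        self.nodes.update((consumer, provider))
        self.connections.add((consumer, provider))

    @property
    def N(self):
        # Number of Nodes, named to match the notation in the text.
        return len(self.nodes)

    @property
    def E(self):
        # Number of Connections (edges).
        return len(self.connections)

    def role(self, node):
        consumes = any(c == node for c, _ in self.connections)
        provides = any(p == node for _, p in self.connections)
        roles = {
            (True, False): "consumer",
            (False, True): "provider",
            (True, True): "consumer and provider",
        }
        return roles.get((consumes, provides), "isolated")


system = System()
system.connect("web", "orders")
system.connect("orders", "payments")
print(system.N, system.E)     # 3 2
print(system.role("orders"))  # consumer and provider
```

Here `orders` is a Provider to `web` and a Consumer of `payments`, which is exactly the case where the client/server labels stop being useful.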

A simple example

We have two nodes: one is solely a Consumer and the other solely a Provider. There is one connection, from the Consumer to the Provider.
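To make the counting concrete, here is a sketch that enumerates the fixtures for a list of connections. It assumes a hypothetical model that may not match the author’s exact formulas: contract testing runs a consumer test (the consumer against a mock provider) and a provider verification per connection, each spinning up one real node, while E2E testing gives each connection its own isolated fixture containing every node in the system:

```python
def fixtures(connections):
    """Enumerate test fixtures under a hypothetical cost model.

    Contract testing: each (consumer, provider) connection yields a consumer
    test and a provider verification, each needing one real node.
    E2E testing: each connection gets an isolated fixture with every node.
    """
    nodes = sorted({n for edge in connections for n in edge})
    contract, e2e = [], []
    for consumer, provider in connections:
        contract.append(("consumer test", [consumer]))
        contract.append(("provider verification", [provider]))
        e2e.append((f"{consumer} -> {provider}", nodes))
    return contract, e2e


contract, e2e = fixtures([("Consumer", "Provider")])
print(sum(len(ns) for _, ns in contract))  # 2
print(sum(len(ns) for _, ns in e2e))       # 2
```

For this smallest possible system both approaches need two test nodes, so neither has an advantage yet.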

One Consumer, Two Providers

Let’s add an extra provider for a slightly more complex example.
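Counting per connection under a hypothetical model, assumed here rather than taken from the original formulas (two single-node runs per connection for contract testing, one whole-system fixture of N nodes per connection for E2E), gives simple closed forms. Applied to one Consumer and two Providers:

```python
def contract_test_nodes(E):
    # Hypothetical: a consumer test and a provider verification per
    # connection, each spinning up one real node.
    return 2 * E


def e2e_test_nodes(N, E):
    # Hypothetical: one isolated whole-system fixture (N nodes) per connection.
    return N * E


# One Consumer, two Providers: N = 3 nodes, E = 2 connections.
print(contract_test_nodes(E=2))  # 4
print(e2e_test_nodes(N=3, E=2))  # 6
```

Compared with the single-connection example, adding the second Provider costs two extra test nodes for contract testing but four for E2E, because every E2E fixture now carries the extra node.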

Finding our best and worst case scenario

From the previous section we ended up with two formulas for how many nodes we would need to run a full E2E or Contract testing suite, along with the test fixtures that each approach produces:

[Figure: Cost and fixtures for Contract and E2E tests]
[Figure: Scaling costs with an average of 4 connections]
[Figure: Number of Nodes (N) fixed to 4, scaling Edges (E)]
[Figure: Number of Edges (E) fixed to 4, scaling Nodes (N)]
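To see the shape of the two fixed-parameter sweeps, the sketch below tabulates them under an assumed cost model (contract Y = 2E, E2E Y = N · E). The model is a hypothetical stand-in and may differ from the formulas behind the figures:

```python
def contract_y(N, E):
    # Assumed: two single-node runs per connection, independent of N.
    return 2 * E


def e2e_y(N, E):
    # Assumed: one whole-system fixture (N nodes) per connection.
    return N * E


# Nodes fixed to 4, scaling Edges (E).
scaling_edges = [(E, contract_y(4, E), e2e_y(4, E)) for E in range(1, 6)]
# Edges fixed to 4, scaling Nodes (N).
scaling_nodes = [(N, contract_y(N, 4), e2e_y(N, 4)) for N in range(2, 7)]

for label, rows in (("E", scaling_edges), ("N", scaling_nodes)):
    print(f"{label:>2} {'contract':>9} {'E2E':>5}")
    for x, contract, e2e in rows:
        print(f"{x:>2} {contract:>9} {e2e:>5}")
```

With N fixed at 4 both costs grow linearly in E, E2E twice as steeply; with E fixed at 4 the contract cost stays flat at 8 while the E2E cost keeps climbing with every node added.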

Optimizing our testing

So far our hypothesis has been based on testing the whole system; however, this is generally not how changes are deployed to production systems. We can instead adopt a test strategy that focuses on testing just the deltas that changed and their dependencies. How does this affect the relative performance of the two approaches?
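One way to sketch delta testing is to treat the system as a graph and, for a changed node, look only at the connections that touch it. The model below is an assumption for illustration: contract testing conservatively re-runs both single-node runs for each touched connection, while each E2E fixture must stand up the connection’s consumer plus everything it transitively depends on, found with a depth-first search. The service names are hypothetical:

```python
from collections import defaultdict


def affected_test_nodes(connections, changed):
    """Test nodes needed when only `changed` is modified (hypothetical model).

    Returns (contract_nodes, e2e_nodes).
    """
    deps = defaultdict(set)
    for consumer, provider in connections:
        deps[consumer].add(provider)

    def closure(start):
        # Depth-first search over consumer -> provider edges.
        seen, stack = {start}, [start]
        while stack:
            for nxt in deps[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    touched = [(c, p) for c, p in connections if changed in (c, p)]
    # Contract: a consumer test plus a provider verification per touched edge.
    contract = 2 * len(touched)
    # E2E: each touched edge's consumer with its full dependency tree.
    e2e = sum(len(closure(c)) for c, _ in touched)
    return contract, e2e


connections = [("web", "orders"), ("orders", "payments"),
               ("orders", "stock"), ("mobile", "orders")]
print(affected_test_nodes(connections, "payments"))  # (2, 3)
print(affected_test_nodes(connections, "orders"))    # (8, 14)
```

A change to a leaf such as `payments` is cheap either way, but a change to a well-connected node such as `orders` drags the whole dependency tree of every touched consumer into the E2E bill.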

Batch testing in E2E

A common optimization for E2E tests is to group tests together: we group our fixtures so that one fixture actually tests more than one relationship. Let’s say we split them into α batches. I am going to make a couple of simplifying assumptions for now: these batches are perfectly balanced (so tests are evenly distributed between batches), and every edge has all its tests in one batch. On a change we only run one of those batches. We then end up with these formulas:
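The two batching assumptions (perfect balance, and each edge’s tests living in exactly one batch) can be made concrete with a round-robin split. This is a minimal sketch with hypothetical edge names; it only partitions the connections and makes no claim about the cost formulas:

```python
def batch_edges(connections, alpha):
    """Split connections into alpha balanced E2E batches.

    Batch sizes differ by at most one, and every connection's tests
    live in exactly one batch.
    """
    if alpha < 1:
        raise ValueError("alpha must be at least 1")
    batches = [[] for _ in range(alpha)]
    for i, edge in enumerate(connections):
        # Round-robin assignment keeps the batches balanced.
        batches[i % alpha].append(edge)
    return batches


edges = [(f"consumer{i}", f"provider{i}") for i in range(8)]
batches = batch_edges(edges, alpha=3)
print([len(b) for b in batches])  # [3, 3, 2]
```

Round-robin keeps batch sizes within one of each other, which is as close to perfectly balanced as a real suite can get when E is not a multiple of α.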

A more complex interconnected example

Using measurements other than Compute

Now it may have occurred to you that there are some advanced techniques for reducing the cost of E2E tests. We looked at batching, but I’m sure that with a bit of thought applied to your system you could come up with other creative ways to game the numbers. While I’m convinced you won’t achieve deterministic, linear performance, let’s briefly apply this thinking to some of the other costs of running E2E tests and see whether the same logic still holds.

