Testing across boundaries with internal DSLs

Posted on Jun 6, 2023

One of the most crucial components of software development, and of an engineering approach to software, is writing automated tests - to ensure the quality1 and functionality of our systems.

These tests document our expected system behaviour, help us identify and fix bugs, prevent regressions, and let us iterate towards better and better design. However, writing maintainable and robust tests can sometimes be as challenging as writing the system itself. Let’s explore how a “DSL” (aka a domain specific language, or more specifically, an internal DSL, more on this later) based approach can give us tests that are easier to write and easier to maintain.

# An introduction

If you ask ChatGPT how to test with a DSL, you’ll get something along the lines of:

Creating a Domain Specific Language (DSL) involves designing a language syntax and implementing a parser or interpreter that can understand and execute it. Let’s see how we could build a simple DSL for writing automated tests in an e-commerce system using Java.

First, we should design our DSL. For simplicity, let’s say we have three basic operations: createUser, addProductToCart, and checkout.

And then it will spit out a parser2 for a plain-text DSL that looks something like:

createUser "John Doe"
addProductToCart "Product ID" 5
checkout

This is essentially re-implementing something akin to Gherkin syntax - which has its uses. However, what we’re talking about in our example is an internal DSL - a DSL implemented entirely within the constructs of your normal programming language (in this case, Java). Writing a DSL as an internal one gives you the full power of your language - tooling, IDE integration, libraries, static types if your language supports them - and this can give you an unrivaled level of ease when building and using your DSL3.

To give you a concrete example, take a look at these two tests for registering a user over an API, using some Avro builders and serialization wrappers under the hood to talk to the system-under-test. Don’t worry about reading this in detail, it’s just to illustrate a point.

As implemented in Java, as I’ve seen it done many times:

@Test
void shouldBeAbleToRegisterANewUserWithoutDsl()
{
    // Given
    final String alias = UUID.randomUUID().toString();
    final HttpClient httpClient = HttpClient.newHttpClient();
    final String aliasedUsername = "%s-%s".formatted("username", alias);
    final HttpRequest<?> request = new HttpRequest<AcceptanceSupplementaryData>()
        .withMethodName("POST")
        .withPath("/api/user/register")
        .withBody(
            RegisterRequest.newBuilder()
                .setUsername(aliasedUsername)
                .build()
        );
    final ExpectedHttpResponse<?> httpResponseExpectation = new ExpectedHttpResponse<AcceptanceSupplementaryData>()
        .withIgnoredPath("UserDetails.userId")
        .withStatusCode(201)
        .withContentType("application/json")
        .withBody(
            RegisterResponse.newBuilder()
                .setUser(
                    UserDetails.newBuilder()
                        .setUserId("IGNORED")
                        .setUsername(aliasedUsername)
                        .build()
                )
                .build()
        );

    // When
    final java.net.http.HttpResponse<String> actualHttpResponse = sendHttpRequest(httpClient, request);

    // Then
    assertResponseEqualsExpected(actualHttpResponse, httpResponseExpectation);
}

private static void assertResponseEqualsExpected(final java.net.http.HttpResponse<String> actualHttpResponse, final ExpectedHttpResponse<?> httpResponseExpectation)
{
    // ...
}

private static java.net.http.HttpResponse<String> sendHttpRequest(final HttpClient httpClient, final HttpRequest<?> request)
{
    // ...
}

Compared with this same (absolutely identical in fact) test when written using an internal DSL:

@Test
void shouldBeAbleToRegisterANewUserWithDsl()
{
    dsl.when().httpUser().sendsRequest(fixture.registerRequest().withUsername("luke"));
    dsl.then().httpUser().receivesResponse(fixture.expectedRegisterResponse().withUsername("luke"));
}

It’s not hard to see why the expected behaviour is easier to understand in the second test than in the first. Granted, it’s not a fair comparison - a lot of the code in the first version could be abstracted into methods and builders and other things on an ad hoc basis, and would be shared across tests - but a DSL simply creates a strong boundary, structured in the nomenclature of the domain (in this case, sending API requests and receiving responses), with a layered abstraction that is more robust to change.

There are some benefits to the first approach: one obvious one is that it’s much clearer from a single test file what is going on, and - up to a certain scale - it’s much easier to navigate. The key word here is scale: how do you keep maintaining the first approach as you scale to thousands or tens of thousands of tests per bounded context / product area / other unit of scale?

Another interesting property of tests like these: what test boundary4 is this test at? It’s not immediately clear, at least not just from the test case, but this is sort of the point - it allows you to share sequences and test data in a very defined way across test boundaries. The DSL (or more specifically, the driver, more later) handles all the plumbing necessary for this test sequence to be run in-memory in a few milliseconds, or on a live system using real dependencies such as a database, giving you both fast feedback loops (from your in-memory tests, more later) and end-to-end confidence that a feature works.

# So, how do you build a DSL?

This could be a whole blog post of its own, but to summarise, the core principles are:

  • Layering - the test layer, the DSL layer, and the driver layer
  • Encapsulation - encapsulate all interactions with your system-under-test in the driver layer - this allows you to have separate driver implementations (more on this later)
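
To make the encapsulation principle concrete, here is a minimal sketch of what such a driver boundary could look like, reusing the Avro-generated RegisterRequest and RegisterResponse types from the earlier example. The interface name and method shapes are assumptions for illustration, not a prescribed API:

// Hypothetical driver interface: the single place that knows how to reach the
// system-under-test. The DSL depends only on these domain-shaped methods.
public interface UserServiceDriver
{
    // Deliver a registration request to the system-under-test,
    // whether over the network or as a direct method call.
    void register(RegisterRequest request);

    // Wait for, and assert on, the corresponding observable outcome.
    void awaitRegistered(RegisterResponse expected);
}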

Layering means not exposing things to the test layer that are really an implementation detail. Back to our previous example:

@Test
void shouldBeAbleToRegisterANewUserWithDsl()
{
    dsl.when().httpUser().sendsRequest(fixture.registerRequest().withUsername("luke"));
    dsl.then().httpUser().receivesResponse(fixture.expectedRegisterResponse().withUsername("luke"));
}

Things that are not exposed to the test layer:

  • Setup
  • Serialization
  • Matching and assertions
  • Synchronous and in-memory vs asynchronous over the network
  • Aliasing in test boundaries that share data across tests

Things that are exposed to the test layer:

  • Actors (httpUser)
  • Domain-driven behaviour (sendsRequest, receivesResponse)
  • “Fixtures” for values (which are builders in the nomenclature of the domain, and can encapsulate information of their own)
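
As a rough sketch of how those two lists can be realised - assuming the hypothetical UserServiceDriver interface from above, plus equally hypothetical fixture types (RegisterRequestFixture, RegisterResponseFixture) that build the Avro requests and responses in the domain’s nomenclature:

import java.util.UUID;

// Hypothetical sketch of the DSL layer: actors and domain verbs are the only
// surface the test sees; aliasing, serialization, and assertion mechanics stay below.
public final class UserServiceDsl
{
    private final UserServiceDriver driver;
    // Per-test alias, applied transparently so many tests can share a live system
    private final String alias = UUID.randomUUID().toString();

    public UserServiceDsl(final UserServiceDriver driver)
    {
        this.driver = driver;
    }

    // when()/then() are readability no-ops in this sketch; a fuller DSL might
    // return distinct given/when/then views
    public UserServiceDsl when() { return this; }
    public UserServiceDsl then() { return this; }

    public HttpUser httpUser()
    {
        return new HttpUser();
    }

    public final class HttpUser
    {
        public void sendsRequest(final RegisterRequestFixture fixture)
        {
            // The test said withUsername("luke"); the DSL makes it unique per test
            driver.register(fixture.buildWithAlias(alias));
        }

        public void receivesResponse(final RegisterResponseFixture fixture)
        {
            driver.awaitRegistered(fixture.buildWithAlias(alias));
        }
    }
}

The key design choice: the alias, like everything else on the “not exposed” list, never appears in a test.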

# The driver layer

The driver layer is where things are really interesting - this is where you interact with your system under test.

Your driver layer can have a few different implementations:

# 1. Sending requests to a locally running system (for classic request-response-style services, for example)

This will include the full path of I/O + serialization, persistence, and other such concerns. This running system will be shared across all tests, and as such your DSL will handle everything necessary to ensure unique aliasing of all data hitting the running system, transparently to your test layer. The DSL will also handle the asynchronous nature of this, and will have to poll / wait on all assertions to ensure tests are not flaky. In our above example, this is sending an HTTP request to register a user to the service - you might call these “acceptance tests”. You can even do very large-boundary tests that include things like killing downstream services and seeing how other services react, or killing nodes of a multi-node message bus or database, or other such resiliency testing.
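
For illustration, a rough sketch of what an acceptance-boundary implementation of the hypothetical UserServiceDriver from earlier could look like - the endpoint, timeout, and the elided serialization and matching helpers are all assumptions, standing in for the kind of wrappers the first example used:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical acceptance-boundary driver: real HTTP, real serialization,
// and polling on assertions to absorb the asynchrony of a live system.
public final class HttpUserServiceDriver implements UserServiceDriver
{
    private final HttpClient httpClient = HttpClient.newHttpClient();
    private final URI baseUri;
    private HttpResponse<String> lastResponse;

    public HttpUserServiceDriver(final URI baseUri)
    {
        this.baseUri = baseUri;
    }

    @Override
    public void register(final RegisterRequest request)
    {
        final HttpRequest httpRequest = HttpRequest.newBuilder(baseUri.resolve("/api/user/register"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(serialize(request)))
            .build();
        try
        {
            lastResponse = httpClient.send(httpRequest, HttpResponse.BodyHandlers.ofString());
        }
        catch (final Exception e)
        {
            throw new AssertionError("failed to send request", e);
        }
    }

    @Override
    public void awaitRegistered(final RegisterResponse expected)
    {
        // Poll rather than assert once: on a live system the observable effects
        // may land asynchronously, and a single immediate check would be flaky
        final long deadline = System.currentTimeMillis() + 5_000;
        while (!matches(lastResponse, expected))
        {
            if (System.currentTimeMillis() > deadline)
            {
                throw new AssertionError("expected response not observed: " + expected);
            }
        }
    }

    private static String serialize(final RegisterRequest request)
    {
        // Avro-to-JSON wrapping elided, as in the earlier example
        throw new UnsupportedOperationException("elided for the sketch");
    }

    private static boolean matches(final HttpResponse<String> actual, final RegisterResponse expected)
    {
        // Status code, content type, and body matching (with ignored paths) elided
        throw new UnsupportedOperationException("elided for the sketch");
    }
}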

# 2. Instantiating your core domain model / service in-memory using stubbed dependencies and communicating through in-process memory and method calls

These would be “unit tests”, with the unit being the entire service, and where you can control things like threads and wall time. These tests will (or should) be very fast, and should allow you to test all edge cases or paths through your business logic, write parameterized / fuzzing tests, do property-based testing, and all sorts of other good stuff.

This is where Test-Driven D[esign|evelopment] becomes so powerful - if you start by writing a few tests and a rudimentary test DSL, then you are much more likely to end up with a simple and clean implementation of an in-memory driver layer. Your I/O, randomness, time, and other such effectful or non-deterministic dependencies will be stubbed through reasonable interfaces at reasonable boundaries, giving your code good separation of concerns, simply because: why would you purposefully make writing the DSL hard if you’re writing it up front?
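
By contrast, a minimal sketch of an in-memory implementation of the same hypothetical interface - UserService, InMemoryUserStore, and the constructor shape are invented here purely to illustrate the wiring:

import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical in-memory driver: the whole service is constructed in-process with
// stubbed time and persistence, so the same DSL verbs run in milliseconds.
public final class InMemoryUserServiceDriver implements UserServiceDriver
{
    // Controlled wall time - tests can fix or advance it deterministically
    private final Clock clock = Clock.fixed(Instant.parse("2023-06-06T00:00:00Z"), ZoneOffset.UTC);

    // The real domain model, wired to in-memory stand-ins at its effectful edges
    private final UserService userService = new UserService(new InMemoryUserStore(), clock);

    private RegisterResponse lastResponse;

    @Override
    public void register(final RegisterRequest request)
    {
        // A plain method call: no sockets, no serialization, no waiting
        lastResponse = userService.register(request);
    }

    @Override
    public void awaitRegistered(final RegisterResponse expected)
    {
        // Everything is synchronous at this boundary, so a direct check is enough
        if (!expected.equals(lastResponse))
        {
            throw new AssertionError("expected " + expected + " but was " + lastResponse);
        }
    }
}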

# 3. Instantiating your core domain model / service in-memory using a mix of stubbed and real-production-code dependencies

You might call these “integration tests” - they will be faster than “acceptance tests”, but will still test a large part of the core path of the system, while still being able to stub some dependencies in-memory. These are often useful for more infrastructural concerns, such as testing interactions between your core services and your gateway services5.
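
Sketched against the same hypothetical interface, an integration-boundary driver might wire real gateway code in front of the core while keeping persistence stubbed - UserServiceGateway and its handleRegister method are, again, invented for illustration:

import java.time.Clock;

// Hypothetical integration-boundary driver: the real gateway / serialization path
// is exercised in front of the core service, while persistence stays in-memory.
public final class IntegrationUserServiceDriver implements UserServiceDriver
{
    private final UserService userService =
        new UserService(new InMemoryUserStore(), Clock.systemUTC());

    // Real production code sits between the driver and the core here
    private final UserServiceGateway gateway = new UserServiceGateway(userService);

    private RegisterResponse lastResponse;

    @Override
    public void register(final RegisterRequest request)
    {
        // Goes through the real gateway (routing, serialization), but stays in-process
        lastResponse = gateway.handleRegister(request);
    }

    @Override
    public void awaitRegistered(final RegisterResponse expected)
    {
        if (!expected.equals(lastResponse))
        {
            throw new AssertionError("expected " + expected + " but was " + lastResponse);
        }
    }
}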

# So, why do all of this?

Ultimately: your tests and test fixtures are coupled only to the observable behaviour of the system, not to its implementation. That doesn’t mean a change to the implementation never requires changing any test code - you may still need to change the driver or DSL implementation - but it is the best possible position to be in.

Being fully decoupled from the implementation at the test layer allows for some pretty powerful things, which brings us back to the title of this post. Here are two test files:

public class UserServiceCoreTest
{
    @RegisterExtension
    final UserServiceCoreDsl dsl = UserServiceCoreDsl.newDsl();
    final UserServiceFixtures fixture = new UserServiceFixtures();

    @Test
    void shouldBeAbleToRegisterANewUser()
    {
        dsl.when().httpUser().sendsRequest(fixture.registerRequest().withUsername("luke"));
        dsl.then().httpUser().receivesResponse(fixture.expectedRegisterResponse().withUsername("luke"));
    }
}

and

public class UserServiceAcceptanceTest
{
    @RegisterExtension
    final AcceptanceDsl dsl = AcceptanceDsl.newDsl();
    final UserServiceFixtures fixture = new UserServiceFixtures();

    @Test
    void shouldBeAbleToRegisterANewUser()
    {
        dsl.when().httpUser().sendsRequest(fixture.registerRequest().withUsername("luke"));
        dsl.then().httpUser().receivesResponse(fixture.expectedRegisterResponse().withUsername("luke"));
    }
}

Notice anything? The tests are the same across boundaries! Testing in this way allows you to share the exact same test sequence across multiple test boundaries, accelerating your feedback loops and ensuring extremely strong confidence in your system.

However, this can take careful management:

  • IDs and timestamps and other forms of non-determinism cannot be predicted at the acceptance boundary, and so you must be able to skip matching on certain values at certain boundaries (a sketch of one way to do this follows this list)
  • Not all tests can be shared - your unit tests will often cover a lot more edge cases and race conditions than your acceptance boundary can. It’s important not to over-test at these slower boundaries.
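
As a rough sketch of the first point, assuming expectations are flattened to path-to-value maps (echoing the withIgnoredPath("UserDetails.userId") call from the first example):

import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical boundary-aware matcher: fields the test cannot predict at a given
// boundary (generated IDs, timestamps) are excluded before comparison.
final class IgnoringFieldMatcher
{
    private final Set<String> ignoredPaths;

    IgnoringFieldMatcher(final Set<String> ignoredPaths)
    {
        this.ignoredPaths = ignoredPaths;
    }

    // Both sides flattened to "path -> value" maps, e.g. "UserDetails.userId" -> "abc-123"
    boolean matches(final Map<String, Object> expected, final Map<String, Object> actual)
    {
        for (final Map.Entry<String, Object> entry : expected.entrySet())
        {
            if (ignoredPaths.contains(entry.getKey()))
            {
                continue; // unpredictable at this boundary - skip it
            }
            if (!Objects.equals(entry.getValue(), actual.get(entry.getKey())))
            {
                return false;
            }
        }
        return true;
    }
}

An acceptance-boundary DSL might construct this with Set.of("UserDetails.userId"), while an in-memory boundary, where ID generation is stubbed and deterministic, can pass an empty set and match everything.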

Other benefits of testing this way include:

  • Your tests look the same everywhere in your system, even though different units and boundaries may behave or be interacted with very differently, which makes it easier to move around different parts of your system
  • You can build all of your DSLs from some set of common code, which allows you to add tricks or invariants that apply across large swathes of your system very easily.

When building software you always have to make your own decisions, and decide what patterns and techniques will work for you, your team, and your project. However - I will say that from experience, the above technique is one of the biggest single improvements to the long-term evolution and maintenance of a system I’ve ever seen. You can also adopt it incrementally - writing a DSL test harness around a unit does not make all your other tests irrelevant. Try it out, and if you have any interesting thoughts, feel free to email me about them (blogthoughts at this domain).


  1. Measured in the clarity, robustness, and (most importantly) ease of change of the system ↩︎

  2. https://gist.github.com/lukeramsden/ae5f35dbbad29b560ff11b82708151fd ↩︎

  3. This should have a blog post of its own to flesh out this point, but for now, work with me on the assumption that internal DSLs are much better. ↩︎

  4. Test boundary: how much of your system is exercised by your test. For example, you might have a test boundary where everything is in-memory and all I/O-related dependencies are stubbed out, and another test boundary where you spin up your live system on a single host (or even multiple hosts), and exercise the full end-to-end of the system including I/O, persistence, serialization, etc. You often want to test at both boundaries because the in-memory ones give you quicker feedback on your business logic (and can also cover more edge cases / race conditions / fuzzing tests / whatever else that adds a lot of test cases), and the live system ones show the feature actually works for end users. ↩︎

  5. “Gateway Design”, if you’re interested. This is very financial-industry oriented, but the same idea applies to something like an API gateway or request orchestrator, which exists in many service-oriented architectures. ↩︎