Hi! I’m Jason Williscroft, Chief Engineer at the Enterprise Data Foundation. Let’s talk about how Test Driven Development fails in Data Management scenarios, and how to fix it.
So let’s say you have some process you’d like to test.
In order to test your process, you’ll need a set of inputs, and after you invoke your process on this input data, it’s going to produce some kind of output.
You’re going to make some assertions about what you expect to see on the output, based on what you know about the process and the data you passed in, and if the output data satisfies your assertions, your test passes. That’s a unit test.
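That input-process-output-assertion shape can be sketched in a few lines. This is just an illustration of the pattern, not deltaTest code; the process under test here (a hypothetical `title_case` function) is invented for the example.

```python
# A minimal sketch of the unit-test shape described above.
# The process under test (title_case) is a hypothetical stand-in.

def title_case(name: str) -> str:
    """The process we want to test: normalize a raw name string."""
    return " ".join(part.capitalize() for part in name.split())

def test_title_case():
    # Input data we inject into the process.
    raw = "  ada   LOVELACE "
    # Invoke the process on the input.
    result = title_case(raw)
    # Assert on the output we expect, given the process and the input.
    assert result == "Ada Lovelace"

test_title_case()
print("test passed")
```

Every unit testing framework, whatever the platform, is ultimately automating this little loop for you.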
If you start with simple versions of all this and iterate into something more complex, then bang! Welcome to the 21st Century: you’re doing Test Driven Development. And you’re in luck… TDD as a concept has been in place for almost two decades, and whatever platform you’re developing for, there’s probably a unit testing framework to support TDD on it.
At least, there is until you run into Data Management, where your development platform is really just one part of the platform you have to test. See, all of these sophisticated unit testing frameworks rely on one critical assumption: that every part of the test lives in a single execution context, usually local memory on an application server or a dev workstation.
But what if your execution context is more distributed than that?
Say the process you want to test lives on some application server, and the data you want to inject at your test input needs to arrive in the form of a text file on some network file share, or as a set of correlated rows in a collection of database tables? What if, instead of writing its result to local memory, your process writes to a different data model in a different database? Or generates another text file someplace else?
What if it gets really weird?
On top of that, your “dev environment” is now way more than just a stack of applications on a single workstation. Now it involves other machines, database servers, web applications, message queues… So what about when it’s time to push to another environment?
If your environments are really complex, can they ever really be the same? Yet shouldn’t the tests that ran in DEV also run successfully in TEST? Not only to validate your code, but also to confirm that the environment is operating as expected?
And what about higher environments, up to and including PROD? And if I’m going to go to the trouble of writing tests for TDD, then shouldn’t higher environments be able to leverage the same tests in a non-interactive mode to support automated regression testing and continuous delivery?
So that’s a pretty straightforward Data Management scenario. But as testing challenges go, this one is pathologically complicated. Not one of the unit testing frameworks we saw a minute ago can handle it, not right out of the box nor even anywhere NEAR the box.
So Test Driven Development fails spectacularly in routine Data Management scenarios… unless you have deltaTest.
So… what exactly IS deltaTest?
Let’s go back to that testable process, and see what it looks like to exercise it with deltaTest.
So remember that any testing process is going to begin with the injection of input data. This is also the first part of the deltaTest process. deltaTest will drop a file, populate database tables, or take other steps in order to set the precursor conditions for the test.
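As a rough sketch of that injection step, here is one way to drop a flat file and seed correlated database rows before a test runs. Everything here is illustrative: the paths, the table name, and the use of SQLite are stand-ins, not deltaTest’s own mechanics.

```python
import sqlite3
import tempfile
from pathlib import Path

# Illustrative input injection: drop a text file where the process expects
# to find it, and seed correlated rows in a database. Paths, table names,
# and SQLite itself are hypothetical stand-ins for this sketch.

def inject_inputs(workdir: Path) -> None:
    # Drop the flat file the process will pick up.
    (workdir / "incoming.csv").write_text("id,amount\n1,100.00\n2,250.50\n")
    # Seed the correlated database rows.
    con = sqlite3.connect(workdir / "staging.db")
    con.execute("CREATE TABLE IF NOT EXISTS accounts (id INTEGER, name TEXT)")
    con.executemany("INSERT INTO accounts VALUES (?, ?)",
                    [(1, "Alice"), (2, "Bob")])
    con.commit()
    con.close()

workdir = Path(tempfile.mkdtemp())
inject_inputs(workdir)
print((workdir / "incoming.csv").read_text().splitlines()[0])
```

The point is that “arrange” is no longer a variable assignment in local memory; it is a set of side effects scattered across your environment.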
Next deltaTest will invoke the process to be tested. This is very flexible… basically, if a process can be invoked from the Windows command line, then deltaTest can invoke it.
As the process runs, it will generate output data in whatever form it takes: text files, database rows, emails, smoke signals. It is what it is… rather, THEY are what THEY are, because just as was true of the input, the output might consist of correlated records across multiple objects and formats.
Now for reasons we’ll get to in a minute, deltaTest doesn’t even TRY to parse all those different output formats and artifacts. Instead, it converts them all to text while, and this is key, leaving OUT any information that might change run-over-run or across environments.
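What does “leaving out information that changes run-over-run” look like in practice? Here is one hedged sketch: masking timestamps, GUIDs, and server names with placeholder tokens. The patterns are illustrative examples of the idea, not deltaTest’s actual normalization rules.

```python
import re

# Illustrative text normalization: mask anything that varies run-over-run
# (timestamps, GUIDs) or across environments (server names) with stable
# placeholder tokens, so only meaningful differences survive a comparison.
# These patterns are examples, not deltaTest's own rules.

def normalize(text: str) -> str:
    # Timestamps change every run.
    text = re.sub(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}", "<TIMESTAMP>", text)
    # GUIDs change every run.
    text = re.sub(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}",
                  "<GUID>", text)
    # UNC server names change across environments.
    text = re.sub(r"\\\\[A-Za-z0-9-]+\\", r"\\\\<SERVER>\\", text)
    return text

raw = ("2024-06-01 09:15:22 job 3f2b8c1a-9d4e-4f6b-8a2c-1e5d7b9c0a3f "
       "wrote \\\\DEVSQL01\\out\\result.txt")
print(normalize(raw))
# → <TIMESTAMP> job <GUID> wrote \\<SERVER>\out\result.txt
```

After normalization, two runs of the same process in two different environments should produce identical text, which is exactly what the next step depends on.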
Then all that text gets aggregated into a single output file. ONE text file: simple, practical, and perfectly human-readable. Which it HAS to be, because the first thing you will do after deltaTest produces it is to READ it, in order to validate that the output your process produced was the expected one.
If it isn’t, then you’ve got a defect: go fix your code! But what if this text file, the Result file, looks EXACTLY as it should?
Let’s make a copy of the Result file and call it the Certified File. If we run our test again and get a result just like this one, our test should pass. Which, when you think about it, sounds a lot like the Assertions we made during unit testing, doesn’t it?
So the next time we run our test, deltaTest will compare the new Result file against the version we just Certified. If they are identical, then our test passes. If they are different in any way—any way at all—then our test has failed!
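The certify-and-compare loop is simple enough to sketch directly. The file names (`result.txt`, `certified.txt`) and layout here are hypothetical; deltaTest’s own conventions may differ, but the logic is the same: identical means pass, any difference at all means fail, and a diff is printed to help troubleshoot.

```python
import difflib
import tempfile
from pathlib import Path

# Illustrative certify-and-compare loop. File names and layout are
# hypothetical stand-ins for deltaTest's own conventions.

def run_test(result: Path, certified: Path) -> bool:
    """Pass only if the new Result file is identical to the Certified file;
    on any difference at all, print a unified diff for troubleshooting."""
    new_text = result.read_text()
    known_good = certified.read_text()
    if new_text == known_good:
        return True
    print("\n".join(difflib.unified_diff(
        known_good.splitlines(), new_text.splitlines(),
        fromfile=str(certified), tofile=str(result), lineterm="")))
    return False

workdir = Path(tempfile.mkdtemp())
result = workdir / "result.txt"
certified = workdir / "certified.txt"
result.write_text("row 1: ok\nrow 2: ok\n")
# Certifying is just promoting the current Result file to known-good status.
certified.write_text(result.read_text())
print(run_test(result, certified))   # identical files: the test passes
```

Note that the “assertion” here is nothing more than a whole-file equality check, which is what makes the approach work for any output you can reduce to text.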
If it is running as part of some automated regression test, deltaTest will simply note the failure and move on. But if we are running the test interactively, as we would in many TDD scenarios, deltaTest will respond to the failure by launching a text comparison tool that will help us troubleshoot the failure.
One more scenario: what if there’s a diff, but it’s an EXPECTED diff, either because of some change to the test input or a deliberate change to the code? In that case we will FORCE the test into a passing state by certifying the new result file, creating an incremental improvement in the test that reflects an incremental improvement in the code.
So that’s deltaTest: a universal testing platform that is simple, powerful, open-source, cross-platform, built in PowerShell, and perfect for complex projects where nothing else will work.
Visit us on GitHub, pull the repo, and get testing!