By Tobias Schlitt, first published at Tue, 15 Nov 2016 09:38:48 +0100
Download our free e-book "Crafting Quality Software" with a selection of the finest blog posts as PDF or EPub.
You can also buy a printed version of the book on Amazon or on epubli.
We already discussed when and how to reset a database between tests. But on top of that you often need some basic or more complex data to run the tests against. We discuss different variants and their respective trade-offs in this post…
The simplest approach to schema and test data management – we still see this in the wild – is to dump live data and replay it for testing purposes. The benefit here is that you only have to adapt data in one place when the schema changes, and it is really easy to set up. But there are a couple of drawbacks to this approach:
Large live data sets
The live data usually is "large" – at least a couple of hundred MB (while databases are usually not considered large before reaching the TB range). It takes a lot of time to reset a database even with such a dump. You do not want to spend that amount of time before each test, each test suite, or even before each test run.
Boundary value test cases
There is often a need for special test cases, like strange characters or other types of boundary values in your tests. Those data sets might not exist in the live data, so you end up creating this data in your tests on top of the data dump anyway.
Data privacy
Depending on your use case and business, your developers should not have access to all live data. There might be sensitive information which should be locked down on your servers. In this case a live SQL dump is not an option. Especially in Germany, you might be required by law to keep certain data away from certain people.
Changing data on the live system
It is obvious that data on the live system changes over time. This can make it hard to author tests that are stable and reproducible. In addition, tests might become less meaningful if you need to craft them in a way that they can cope with changing live data.
To avoid the problems mentioned above, the next step usually is a modified SQL file, which is smaller and contains sensible test data, like boundary values, and stable data without any sensitive information for reproducible tests.
The problem which arises now is that you have to adapt two files whenever you change the data structure. And the schema and properties of the data will diverge over time – no matter how careful you are. In the end this approach is always hard to maintain, so what can be done?
First we suggest you implement some kind of sensible schema management like DBDeploy, or even Doctrine Migrations if this works for your use-case. In this post we want to focus on the data / fixture management, though.
Most applications require a set of base data, like default or admin users, some groups and permissions, or something similar. This will be the same in all installations (production, staging, development, testing, …). Depending on your schema management solution you will be able to insert this data during schema initialization. With DBDeploy you can just add some INSERT statements; with Doctrine you could use Doctrine data fixtures (or even the Symfony bundle).
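As a sketch, such base data can live directly in a change script next to the schema changes. The table and column names below are invented purely for illustration:

```sql
-- 042_insert_base_data.sql (hypothetical DBDeploy change script)
INSERT INTO users (id, login, is_admin) VALUES (1, 'admin', 1);
INSERT INTO groups (id, name) VALUES (1, 'administrators');
INSERT INTO user_groups (user_id, group_id) VALUES (1, 1);

--//@UNDO
DELETE FROM user_groups WHERE user_id = 1;
DELETE FROM groups WHERE id = 1;
DELETE FROM users WHERE id = 1;
```

Because the change script is versioned together with the schema, the base data can never diverge from the table structure it belongs to.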
The more important thing is the test data. Inserting sensible test data is tightly coupled to your test database resetting strategy which we discussed earlier.
When you reset your database before each test, you also want to insert the test data right inside the test. This is very explicit and makes tests easy to read and understand. In theory this is clearly the best solution. But as mentioned in the referenced blog post, this costs a lot of execution time and will likely be too slow for any non-trivial application.
When you reset the database before each test case, you can create the data throughout the test case. A simple, common CRUD example could consist of the following tests:
Create a new data set
Load data set
Fail loading a non-existent data set
Load listing (with one item)
Delete data set
Fail deleting non-existent data set
Those tests depend on each other, which should be indicated by using @depends annotations, like in this example from our demo project.
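To make the flow concrete, here is a self-contained sketch of those six steps against a hypothetical `RecordRepository` backed by an in-memory SQLite database. All class, table, and method names are invented for illustration; in a real PHPUnit suite each numbered step would be its own test method, chained via @depends:

```php
<?php
// Hypothetical repository: a tiny CRUD layer over an in-memory
// SQLite database, used to walk through the six dependent steps.
class RecordRepository
{
    private PDO $pdo;

    public function __construct()
    {
        $this->pdo = new PDO('sqlite::memory:');
        $this->pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        $this->pdo->exec('CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)');
    }

    public function create(string $title): int
    {
        $this->pdo->prepare('INSERT INTO records (title) VALUES (?)')->execute([$title]);
        return (int) $this->pdo->lastInsertId();
    }

    public function load(int $id): array
    {
        $statement = $this->pdo->prepare('SELECT id, title FROM records WHERE id = ?');
        $statement->execute([$id]);
        $row = $statement->fetch(PDO::FETCH_ASSOC);
        if ($row === false) {
            throw new OutOfBoundsException("No record #$id");
        }
        return $row;
    }

    public function listAll(): array
    {
        return $this->pdo->query('SELECT id, title FROM records')->fetchAll(PDO::FETCH_ASSOC);
    }

    public function delete(int $id): void
    {
        $statement = $this->pdo->prepare('DELETE FROM records WHERE id = ?');
        $statement->execute([$id]);
        if ($statement->rowCount() === 0) {
            throw new OutOfBoundsException("No record #$id");
        }
    }
}

// The six steps, run in order against one fresh database:
$repository = new RecordRepository();

$id = $repository->create('First record');                   // 1. create
assert($repository->load($id)['title'] === 'First record');  // 2. load

try {                                                        // 3. fail loading missing id
    $repository->load(4242);
    assert(false);
} catch (OutOfBoundsException $e) {
}

assert(count($repository->listAll()) === 1);                 // 4. listing with one item

$repository->delete($id);                                    // 5. delete

try {                                                        // 6. fail deleting missing id
    $repository->delete($id);
    assert(false);
} catch (OutOfBoundsException $e) {
}

echo "all steps passed\n";
```

Note how each later step relies on the state the earlier steps left behind – exactly the dependency that @depends makes explicit between test methods.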
This approach is still very clean, and from reading the test case you can understand the full context. Another developer will still be able to understand what is going on. This is a lot harder when the data is inserted somewhere else, since you always have to look in multiple places to get the full picture. And new developers might not yet know all the places to look at. Tests which provide all the context you need in one file are very helpful for everybody.
This strategy will get more complex if you have dependent data – like tests for user permissions, which also require users and groups to exist. You could either use custom helpers or tools like Alice for this.
If you decide to only reset the database once before all tests are run, you usually need a more complete data fixture. You will want to fill all tables with some basic data you can work with. Especially in such a case tools like Alice are very useful. By relying on Faker you even get sensible-looking data without too much custom work.
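As an illustration, a small Alice fixture file could look like this. The entity class and property names are made up for this example; expressions like `<email()>` delegate to Faker formatters:

```yaml
# fixtures/users.yaml -- hypothetical entity and property names
App\Entity\User:
    admin:
        username: 'admin'
        email: 'admin@example.com'
    user_{1..10}:
        username: '<username()>'
        email: '<email()>'
        createdAt: '<dateTimeBetween("-1 year", "now")>'
```

The `user_{1..10}` range generates ten users in one go, so a handful of lines can populate a table with enough varied data to run a whole test suite against.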
If you want help finding the correct testing strategy for you and get a kickstart in testing – book us for an on-site workshop.
The way you initialise your database fixtures depends on your test schema management. We suggest resetting the schema at the beginning of each test case and creating the data right in the test case. This has proven to be a good compromise between speed, test (case) atomicity and test readability. Tools like Alice and Faker can ease the process of creating data a lot.
Stay up to date with regular new technological insights by subscribing to our newsletter. We will send you articles to improve your development skills.
Artem Ostretsov on Tue, 15 Nov 2016 14:38:43 +0100
What do you think about using SQLite for tests?
You could have two different configurations for dev and test environments. By using an ORM you could have the same code for both databases: SQLite and something else (Oracle, PostgreSQL, etc.). It's a good way until you use complex native queries. And even in this case, you could support two queries for different drivers.
Kore on Tue, 15 Nov 2016 15:55:02 +0100
@Artem: This can be a solution as long as your usage of the database is fairly simple. With complex queries you will, at some point, notice differences between the databases. Until then especially the in-memory database from SQLite is really nice for testing.
We briefly covered it here: https://qafoo.com/blog/090_database_tests_with_phpunit.html#mocking-the-database-away
Theo Fidry on Tue, 22 Nov 2016 20:53:16 +0100
I would recommend HautelookAliceBundle rather than DoctrineDataFixtures, but otherwise a big +1 on this article: don't create unmanageable big fixtures, but have small sets for each scenario.