This month’s #tsql2sday is hosted by Hamish Watson (b|t) and this time touches on the extremely important topic of database unit testing. Its high importance and value should be a no-brainer. But it can be a very difficult topic, and the hardest thing about it, in my opinion, is knowing where to start. Hopefully, with this blog post, I can help you out with that.
First off, is you’re totally new to unit testing as a whole, you should check out Hamish’s invitation post. He gives a very good summary of what it is, when it should be used, and why it’s important.
Now we just need to know how to tackle it in terms of databases.
Hamish asks us about the value of database unit testing. My response is: Like, duh? How is this even a question? OF COURSE it’s important and has a very high value. The real question is not that of “whether” we should be doing database unit testing – but “how”! And the biggest question of them all – “how to start?”
What are we testing
First, we must understand how does database unit testing differ from regular, good old fashioned application unit testing. Well, by dealing with data, of course.
Functions, stored procedures, triggers, and even scheduled jobs are all data-driven “modules” (or “routines”, as they’re academically known). Their purpose is to either query or manipulate data, or both.
Therefore, their “validity” depends on the state of data in the tables that they’re accessing or manipulating.
(by the way, Rob Farley explains well the difference between “state” and “statelessness” in his own post)
If you have stored procedures / functions that don’t access any tables (for example, they’re doing some kind of arithmetic/string/date calculation, i.e. they’re stateless and deterministic), then designing unit tests for them should be quite straightforward. Just like any other application unit test: i.e. what kind of input goes in -> what kind of output should be coming out.
What we’ll be dealing with here today, though, are routines that do access data in tables.
So how should we build unit tests for these modules? Most likely, by testing the data in the underlying tables.
Real-world example
Consider the following: an example script that I wrote a while ago for a customer. The script receives as parameters:
- Names of two stored procedures
- List of table names that are supposedly affected by the procedures
The script then executes the two procedures and compares the contents of the specified tables before and after the execution of each procedure.
The idea behind this script was to compare two versions of a procedure, after a supposed performance improvement. The point was to make sure that the new version was still valid in terms of the data manipulation it performed, i.e. the state of the data, as a result of execution, remained the same.
There are, of course, other elements to this script, such as custom filters per table, list of columns to ignore during the comparison (such as time-dependent columns, identity columns, etc.), transaction begin and rollback, and some other stuff.
You can find the script in my GitHub Gists.
Specifically, take note of this extra query at the start of the script:
select DISTINCT TableName = QUOTENAME(OBJECT_SCHEMA_NAME(referenced_major_id)) + '.' + QUOTENAME(OBJECT_NAME(referenced_major_id))
, ExcludeColumns = (
SELECT [Column] = c.[name]
FROM sys.columns AS c
WHERE c.[object_id] = d.referenced_major_id
AND c.is_identity = 1
FOR XML PATH('ExcludeColumns'), ELEMENTS
)
from sys.sql_dependencies AS d
where [object_id] IN (OBJECT_ID(@ProcNameA), OBJECT_ID(@ProcNameB))
AND OBJECTPROPERTY(referenced_major_id, 'IsUserTable') = 1
AND d.is_updated = 1
This query uses the sys.sql_dependencies system view (now deprecated in favor of sys.sql_expression_dependencies, but that’s beside the point), in order to find which tables are being referenced by the two specified stored procedures.
We can use the output of this query as parameter values for the main script, in order to know which tables we should check, and which columns we can ignore.
This kind of operation, of finding which routines are dependent on which tables, is part of a larger topic called “Data Lineage Analysis“. Data Lineage basically means: discovering where your data is coming from and where it is going.
Designing the unit test
Once you know which tables are affected by which routines, you’d have a much easier time designing your unit tests. It’s one of those things where, if you just focus on who does what with your data, the “ideas” for your unit tests should become obvious to you. In other words: You’ll know it when you see it.
If you’re still having trouble coming up with automated unit tests for a routine, a good starting point would be to simply ask yourself: “If I was manually testing this routine to make sure it works well, what would I do?” Take your answer, and try to see how to design such a test so that it could be automated.
My script mentioned above is just a small example of a unit test for stored procedures. Unfortunately, it’s not very useful in automation. But it should at least give you a general idea of what kind of tests we can perform on table data (in my use case – comparison between two data sets).
Tools of the Trade
Common (and free) database unit testing tools such as SSDT and tSQLt would be capable of setting up most of the tests you need.
Less common tools (with a cost), such as SentryOne Test and Red-Gate’s SQL Test can provide even more powerful capabilities.
I’m not going to expand on those tools myself here because that’s beyond the scope of this blog post. Luckily, given this month’s T-SQL Tuesday topic, other people wrote about these tools better than I could:
- Jess Pomfret gets started with tSQLt
- Kevin Chant recommended this post by Nikolai Thomassen about using tSQLt with SSDT and Azure DevOps
- Sander Stad provides a couple very useful Powershell tools: SSDT-With-tSQLt-Template to initialize a basic SSDT project with tSQLt objects, and PStSQLtTestGenerator to quickly generate basic tSQLt unit tests.
- I will edit this post and add more links here if/when they become available.
Also, check out the links under “Additional Resources” at the end of this page.
But wait, there’s more!
By the way, speaking of data lineage…
Functions, procedures, triggers and scheduled jobs are not the only things that may be querying or manipulating our data… There are also data-driven “modules” that exist outside the database engine itself (and yet still fall within the “realm” of the DBA’s responsibilities):
- SSIS packages
- Azure Data Factory packages
- SSRS reports
- Power BI reports
- SSAS cubes and models
- Tableau
- Salesforce
All of these components and more should also be accounted for and tested wherever possible.
SSDT, tSQLt and Red-Gate’s SQL Test do not support these kinds of modules. But more advanced tools, such as SentryOne Test, are capable of setting up unit tests for just about any of these with surprising ease.
Also, discovering such “extra-database” modules may not be easy to do… This is where data lineage analysis tools become extremely handy, such as SentryOne Document (aka Database Mapper), MANTA, and Azure Purview.
Understanding what types of information are moving to and from your system can be a crucial step in preparing to unit test your data. Once you discover what each of those components is doing with your database, it becomes much clearer what kind of unit tests can and should be set up.
Understanding the lineage of your data also enables you to forecast and preemptively identify the impact of proposed database changes. Using a good tool that has the ability to explore the dependencies in your environment, you’ll save countless hours in assessing how a change in one area will impact other areas.
Important to Remember
Unit testing is all about testing small units. Within this context, Data Lineage Analysis would be relevant to understanding the dependencies between a function or stored procedure and the underlying tables, and the how and why of their usage. But not necessarily more than that.
The true power of data lineage analysis in tools such as SentryOne Document and Azure Purview is by revealing all dependencies across your entire data estate. This can be especially useful, for example, when designing integration testing.
But that… is a story for another time 😉.
Additional Resources
- Getting Started with SQL Server Database Unit Testing in SSDT | Microsoft Docs
- tSQLt – Free Open Source Database Unit Testing for SQL Server
- SentryOne Document Data Lineage Capabilities | Tyler Lynch
- SentryOne Document Online Demo (no sign-in required) | Tyler Lynch
- SentryOne Document Product Page
- Validating data with SentryOne Test | John Welch
- SentryOne Test Product Page
- What is Microsoft Purview? | Microsoft Docs