Data compression in SQL Server can go a long way in saving storage space, as well as improving performance. The benefit is especially significant in very large databases. But in certain scenarios it can actually cause more harm than good, and it’s important to identify those scenarios.
I wrote this script, which performs compression savings estimation checks for an ENTIRE database. Yeah, there are plenty of such scripts out there already, but there’s one thing in particular that they all seem to be missing: the ability to actually generate the relevant remediation commands for you. After all, ain’t nobody got time for writing all those “ALTER TABLE/INDEX” commands by hand!
This is why I created the T-SQL script below. Check out the “messages” tab once the script completes execution to find the resulting remediation commands. You will also get a query result set with full details.
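For illustration, the generated remediation commands follow the standard rebuild syntax, roughly like this (the table and index names here are hypothetical examples, not output from the script):

```sql
-- Hypothetical examples of generated remediation commands:
ALTER TABLE [dbo].[SalesOrderDetail]
    REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = PAGE);

ALTER INDEX [IX_SalesOrderDetail_ProductID] ON [dbo].[SalesOrderDetail]
    REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = ROW);
```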
The script also performs various checks to determine whether PAGE or ROW compression would be more beneficial (based on factors such as the percentage of range scans vs. updates, the difference in compression savings, etc.). It has several parameters at the top to control various thresholds and behaviors, so that you can fine-tune it for your specific database.
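The range-scans-vs-updates heuristic is commonly computed from the index usage DMV. As a rough sketch (this is the general idea, not the script’s exact logic): indexes with a high range-scan percentage and a low update percentage are good PAGE candidates, while update-heavy indexes are usually better off with ROW.

```sql
-- Sketch: per-index range-scan percentage since the last restart,
-- based on sys.dm_db_index_operational_stats.
SELECT  o.name AS table_name,
        i.name AS index_name,
        ios.range_scan_count,
        ios.leaf_update_count,
        ios.range_scan_count * 100.0
          / NULLIF(ios.range_scan_count + ios.singleton_lookup_count
                 + ios.leaf_insert_count + ios.leaf_update_count
                 + ios.leaf_delete_count, 0) AS range_scan_pct
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
JOIN sys.objects AS o ON o.object_id = ios.object_id
JOIN sys.indexes AS i ON i.object_id = ios.object_id
                     AND i.index_id  = ios.index_id
WHERE o.is_ms_shipped = 0;
```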
The default values for all parameters should fit most use cases, but you can still customize them if you have to. I’ve added comments next to each parameter with more details.
Speaking of which, I added a bunch of comments all over the script, and tried my best to make it as concise and reader-friendly as possible. But if something still isn’t clear in the script, please let me know (either in the comments section here or at the GitHub Gist), and I’ll do my best to clarify.
A few important notes first
The script performs the check for an entire database based on what’s specified in the @DatabaseName parameter. So don’t forget to change the @DatabaseName parameter to the one you actually want to check (or leave it as NULL to check the database you’re currently connected to).
BE MINDFUL IN PRODUCTION ENVIRONMENTS!
Running this script may take a very long time on databases with large tables, and significant IO and CPU stress may be noticeable. Ironically, though, that is usually where you’ll see the biggest benefit from compression.
Schema-level locks may be held for a while on each table, and may block other sessions performing DDL operations (ALTER, DROP, TRUNCATE, etc.).
Another important thing to remember is that this script runs the built-in data compression savings estimation procedure (sp_estimate_data_compression_savings) for each table and compression type (PAGE and ROW). This procedure copies a 5% sample of your table into TempDB and compresses it there. This means that if you have an extremely large table, it could cause TempDB to inflate significantly, which might fill up the disk it’s on, or hit its max size (either of which will make your SQL Server unresponsive!).
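Under the hood, the estimation procedure is invoked once per table/index and compression type, along these lines (the object name here is a hypothetical example):

```sql
-- Estimate PAGE compression savings for one (hypothetical) table,
-- across all of its indexes and partitions:
EXEC sys.sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'SalesOrderDetail',
     @index_id         = NULL,   -- NULL = all indexes
     @partition_number = NULL,   -- NULL = all partitions
     @data_compression = N'PAGE';
```

The result set includes the current size and the estimated compressed size per index/partition, which is what the script compares against its threshold parameters.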
This is why I’ve added a few cautionary checks and additional threshold parameters to avoid such scenarios: specifically, @FeasibilityCheckOnly, @MaxSizeMBForActualCheck and @TempDBSpaceUsageThresholdPercent. The @FeasibilityCheckOnly parameter in particular determines whether you want to actually run sp_estimate_data_compression_savings, or just do a “basic” check (which shouldn’t cause any overhead or major issues) to find possible candidates for compression.
In general, if you want to run this script in a production environment, be sure to do it during low-traffic hours, and make sure you have enough free disk space on the TempDB drive.
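If you want a quick sanity check of TempDB headroom before kicking the script off, something like this (a standalone snippet, not part of the script itself) works:

```sql
-- Current free space inside the tempdb data files, in MB
-- (pages are 8 KB, so pages * 8 / 1024 = MB):
SELECT SUM(unallocated_extent_page_count) * 8 / 1024 AS free_space_mb
FROM tempdb.sys.dm_db_file_space_usage;
```

Note this shows free space *within* the files; if the files can still auto-grow, also check the free disk space on the TempDB drive.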
Click here to get it from my GitHub Gists.
The script is also mirror-published at Madeira’s Toolbox GitHub repository here.
License and Disclaimer
The script is completely free and is provided as-is, without any warranty. Use at your own risk, and as stated above, be extra careful when running it in production environments.
Possible side-effects may include (especially if executed within very large databases):
- Extra IO workload
- Extra CPU workload
- Extra schema-level locks
- Extra disk-space used by TempDB
- Rapid loss of disk free space
- Possible schema-level deadlocks (in some SQL Server versions; this may be solved by setting MAXDOP 1 at the instance level)
- Extra time spent in front of your computer while sweating profusely
- … or extra requirement of coffee intake
- … or booze
- … or a sudden inexplicable need to seek new job openings
- … or all of the above!
Please handle with care! 😉
If this topic interests you, you should check out the following resources, which may give you additional perspective on compression savings estimation in SQL Server. I used these resources myself while building this script:
- What_to_Compress_v2 by SQLBalls at SQLServerCentral.com
- Estimate compression for all tables and indexes with both Row and Page by DawnLove at SQLServerCentral.com
- Compression Estimates by osragde at SQLServerCentral.com
- Evaluation Compression Gains by Microsoft Tiger Team
- Data compression in SQL Server (Microsoft Docs)
Cover Image by Gerd Altmann from Pixabay
Have you considered adding Columnstore compression?
Yes, I did indeed consider it. However, considerations for columnstore indexing are far more complex than what can be inferred from the SQL Server metadata.
There are other solutions out there specifically dedicated to columnstore index considerations.