
The Ultimate Compression Savings Estimation Script for an Entire Database

Data compression in SQL Server can go a long way in saving storage space, and it can improve performance as well. The savings can be especially significant in very large databases. But in certain scenarios it can actually cause more harm than good, and it’s important to identify those scenarios.

I wrote this script, which performs compression savings estimation checks for an ENTIRE database. Yeah, there are plenty of such scripts out there already, but there’s something in particular that they all seem to be missing, and that is the ability to actually generate the relevant remediation commands for you. After all, ain’t nobody got time for writing all those “ALTER TABLE/INDEX” commands themselves!

This is why I created the T-SQL script below. Check out the “Messages” tab once the script completes execution to find the resulting remediation commands. You’ll also get a query result with full details.
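
To give you an idea of what to expect, the remediation commands in the output are standard ALTER TABLE / ALTER INDEX rebuild statements with a DATA_COMPRESSION option, along these lines (the object names and compression types here are placeholders, not actual script output):

    ALTER TABLE [dbo].[MyHeapTable] REBUILD WITH (DATA_COMPRESSION = PAGE);

    ALTER INDEX [IX_MyIndex] ON [dbo].[MyTable] REBUILD WITH (DATA_COMPRESSION = ROW);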

The script also performs various checks to determine whether PAGE or ROW compression would be more beneficial (based on things such as the range scan vs. update percentages, the difference in compression savings, etc.). It has several parameters at the top to control various thresholds and behaviors, so that you can fine-tune it for your specific database.

The default values for all parameters should fit most use cases, but you can still customize them if you have to. I’ve added comments next to each parameter with more details.

Speaking of which, I added a bunch of comments all over the script and tried my best to make it as concise and reader-friendly as possible. But if something still isn’t clear in the script, please let me know (either in the comments section here or at the GitHub Gist), and I’ll do my best to clarify.

A few important notes first

The script performs the check for an entire database, based on what’s specified in the @DatabaseName parameter. So don’t forget to change @DatabaseName to the database you actually want to check (or leave it as NULL to check the database you’re currently connected to).

BE MINDFUL IN PRODUCTION ENVIRONMENTS!

Running this script may take a very long time on databases with large tables, and significant IO and CPU stress may be noticeable. Ironically, though, that is usually where you’ll get the biggest benefit from compression.

Schema-level locks may be held for a while on each table, and may block other sessions performing DDL operations (ALTER, DROP, TRUNCATE, etc.).

One extra important thing to remember is that this script uses the built-in data compression savings estimation procedure (sp_estimate_data_compression_savings) for each table and compression type (PAGE and ROW). This procedure copies 5% of your table to TempDB and compresses it there. This means that if you have an extremely huge table, it could cause TempDB to significantly inflate in size, which might fill up the disk it’s on or reach its max size (either of which will make your SQL Server unresponsive!).
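
For reference, this is roughly how the built-in procedure is invoked for a single table and compression type (the object name below is just a placeholder):

    EXEC sys.sp_estimate_data_compression_savings
        @schema_name = N'dbo',
        @object_name = N'MyTable',       -- placeholder table name
        @index_id = NULL,                -- NULL = all indexes
        @partition_number = NULL,        -- NULL = all partitions
        @data_compression = N'PAGE';     -- run again with N'ROW' for the second estimate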

This is why I’ve added a few cautionary checks and additional threshold parameters in order to avoid such scenarios: specifically, @FeasibilityCheckOnly, @MaxSizeMBForActualCheck, and @TempDBSpaceUsageThresholdPercent. The @FeasibilityCheckOnly parameter in particular determines whether you want to run sp_estimate_data_compression_savings, or just do a “basic” check (which shouldn’t cause any overhead or major issues) to find possible candidates for compression.
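
As a rough sketch only (the data types and values below are illustrative assumptions, not the script’s actual defaults), a cautious first run on a large production database might set these parameters along these lines:

    DECLARE @DatabaseName sysname = NULL;               -- NULL = the database you're currently connected to
    DECLARE @FeasibilityCheckOnly bit = 1;              -- 1 = "basic" check only, skip sp_estimate_data_compression_savings
    DECLARE @MaxSizeMBForActualCheck int = 10000;       -- assumption: skip the full estimation for tables bigger than this
    DECLARE @TempDBSpaceUsageThresholdPercent int = 80; -- assumption: stop estimating if TempDB usage crosses this threshold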

In general, if you want to run this script in a production environment, be sure to do it during low-traffic hours, and make sure you have enough free disk space on the TempDB drive.

The Script:

Click here to get it from Madeira’s GitHub repository.

License and Disclaimer

The script is completely free and is provided as-is, without any warranty. Use it at your own risk, and as stated above, be extra careful when running it in production environments.

Possible side-effects may include (especially if executed within very large databases):

  • Extra IO workload
  • Extra CPU workload
  • Extra schema-level locks
  • Extra disk-space used by TempDB
  • Rapid loss of disk free space
  • Possible schema-level deadlocks (in some SQL Server versions; this may be resolved by setting MAXDOP 1 at the instance level)
  • Extra time spent in front of your computer while sweating profusely
  • … or extra requirement of coffee intake
  • … or booze
  • … or a sudden inexplicable need to seek new job openings
  • … or all of the above!

Please handle with care! 😉

Tracking Low Compression Success Rates

The page_compression_success_count column from the system function sys.dm_db_index_operational_stats can be used to track whether a table properly benefits from PAGE compression, or whether it would be better off “downgraded” to ROW compression to save unnecessary CPU load.

This is based on how many PAGE compression attempts completed successfully on top of ROW compression, i.e. resulted in 20% or more additional space saved (you can find more details in Paul’s blog post).
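
If you just want a quick look at the raw numbers behind this check, a minimal stand-alone query (not the full script; the thresholds here are illustrative) could look something like this:

    DECLARE @MinAttempts int = 100,             -- illustrative: ignore indexes with too few attempts
            @MaxSuccessRatePercent int = 20;    -- illustrative: flag indexes below this success rate

    SELECT  OBJECT_SCHEMA_NAME(ios.object_id)  AS schema_name,
            OBJECT_NAME(ios.object_id)         AS table_name,
            ix.name                            AS index_name,
            ios.partition_number,
            ios.page_compression_attempt_count,
            ios.page_compression_success_count,
            CAST(ios.page_compression_success_count * 100.0
                 / NULLIF(ios.page_compression_attempt_count, 0) AS decimal(5, 2)) AS success_rate_percent
    FROM    sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
    INNER JOIN sys.indexes AS ix
            ON  ix.object_id = ios.object_id
            AND ix.index_id  = ios.index_id
    WHERE   ios.page_compression_attempt_count >= @MinAttempts
    AND     ios.page_compression_success_count * 100.0
            / NULLIF(ios.page_compression_attempt_count, 0) <= @MaxSuccessRatePercent;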

My version of the script adds the following:

  • Outputs a remediation command to REBUILD each relevant index with ROW compression (see the example right after this list).
  • The script has parameters for:
    • Minimum number of compression attempts
    • Maximum success rate percentage
    • Index rebuild options
  • The script outputs additional details such as:
    • Whether the index/table in question is partitioned or not.
    • Range-scan percentage
    • Leaf-level updates percentage
  • It runs the check for ALL accessible and writeable databases.
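
The remediation commands it generates are essentially ROW-compression index rebuilds, along these lines (the names here are placeholders, and the exact options depend on the script’s index rebuild parameters):

    ALTER INDEX [IX_MyIndex] ON [dbo].[MyTable]
        REBUILD WITH (DATA_COMPRESSION = ROW, ONLINE = ON, SORT_IN_TEMPDB = ON);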

The script is available in our Madeira GitHub repository.

So, now we should have the whole package to properly manage data compression in our SQL Server estate:

  1. Run the Ultimate Compression Savings Estimation script.
  2. Review and apply its recommended remediations.
  3. Run the new script to track its success, and rebuild as necessary with ROW compression instead of PAGE compression.

Something interesting I noticed about the page_compression_success_count column is that it’s populated not only when data is inserted/updated in the table, but also when a PAGE-compressed index is built or rebuilt on the table.

This means that you should have the information you need immediately after creating or rebuilding an index with PAGE compression, and using the script I provided (or Paul’s query) you’ll be able to determine whether PAGE compression was indeed beneficial, or whether it would be better to rebuild the index with ROW compression instead.
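
A quick way to see this for yourself (placeholder object names; try it on something small first):

    ALTER INDEX [IX_MyIndex] ON [dbo].[MyTable]
        REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- The compression counters are already populated right after the rebuild:
    SELECT  page_compression_attempt_count,
            page_compression_success_count
    FROM    sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID(N'dbo.MyTable'), NULL, NULL);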

Additional Resources

If this topic interests you, you should check out the following resources, which may give you additional perspective on compression savings estimation in SQL Server. I used these resources myself while building this script:

Enjoy!

Cover Image by Gerd Altmann from Pixabay

21 thoughts on “The Ultimate Compression Savings Estimation Script for an Entire Database”

  1. Pingback: Troubleshooting Long-Running SHRINK Operations – Eitan Blumin's Blog

  2. Pingback: The new “metadata-only column changes” feature in SQL Server 2016 is useless! – Eitan Blumin's Blog

  3. Pingback: I learned to love WHILE (true) and you should too – Eitan Blumin's Blog

    1. Yes, actually, I did indeed. However, considerations for columnstore indexes are far more complex than what can be inferred from the SQL Server metadata.

      There are other similar solutions out there specifically dedicated for columnstore index considerations.

      1. size_MB   compressible_data   scan_percent   update_percent   compression_type
         62822     99%                 0%             0%               PAGE
         62822     99%                 0%             0%               ROW
         10585     100%                0%             0%               PAGE

         Hi Eitan,
         What can I understand from this report? And what does “0%” mean (for “scan_percent” / “update_percent”)?
        1. Hi Yossi!

          From your report you can understand that 99% of the data in the first two tables is compressible, and 100% of the data in the third table is compressible. This depends on column data types.

          Based on the size_MB, I’m guessing that the first two rows are actually one and the same table, only with a different compression_type.

          The “0%” refers to scan_percent and update_percent. These columns express the ratio of each access/usage type for the table. In other words: out of the total number of accesses to this table, 0% of them are scans and 0% are updates.

          In terms of compression, the following is generally true:
          Tables with a high percentage of scans will benefit greatly from compression.
          Tables with a high percentage of updates will SUFFER from compression in terms of performance.
          This is because compression is most beneficial for scan operations, but it’s the most costly in terms of performance for update operations.

          When both access types are “0%”, this makes things a bit ambiguous. It could be because no such access type was captured in the metadata currently in cache. That may or may not represent the general behavior on the server, since it depends on how long ago these statistics were collected and how long the SQL Server instance has been up.

          Or it could be because most of the access into the table is “seek”. That kind of operation can still benefit from compression, just not as much as scan operations.

          Or it could also be because the table is only inserted into, but not queried from. This kind of use case can benefit from compression in terms of storage space, but the insertions will most likely have somewhat worse performance.

          I hope that cleared things up!

  4. Pingback: Detect Low Compression Rates in all Databases – Eitan Blumin's Blog

  5. Eitan,

    You did a good job extending the system proc.
    Can I run these scripts on an AlwaysOn readable secondary database?
    Why does this script need the database to be writeable/updateable?

    1. That’s an excellent question, Venkat.

      It was long ago, so I don’t quite remember, but I think for some reason I got the impression that sp_estimate_data_compression_savings was not supported on read-only databases.
      But now I’ve double-checked, and it looks like it is supported.

      So, I fixed the script and it should now support read-only databases as well.

      However, you should take into consideration that the analysis is based on index usage stats, among other things.
      And on a readable secondary replica in AlwaysOn the data in those system views may not fully represent how the tables are used.
      Especially when it comes to the update rate.
      So, please take that into consideration when performing the analysis.
      For best results, it’s best to run this on the primary replica rather than a secondary.

      Thanks!

      1. Thanks, Eitan. When I got the error message, I was wondering if this script makes any changes to the database, so I started searching for DML statements in it.

        1. The estimation script itself does not perform any DML or DDL operations on database objects.

          But obviously, it generates DDL statements as part of its output (i.e. index rebuilds).

          1. When I run the script, it says “is_compression_candidate” = YES & “is_compression_recommended” = YES (best guess)… but “compression_size_saving_MB” is NULL.

            Also, it generates the resultset. It would be better if you put some comments explaining what those output columns are and how you arrive at those numbers. Just a few sentences.
            Also, print the row count in the resultset… that way, we know how big the table we are dealing with is.

            Sorry, my wish list is big.

            1. Sure thing, Venkat. No worries.

              All you need to do is set the @FeasibilityCheckOnly parameter to 0.

              This will perform the “full” compression savings estimation, including execution of the sp_estimate_data_compression_savings procedure.

    2. Eitan,

      Another thing I noticed is that “scan_percent” and “update_percent” display as 0% for all the databases I ran the script on. What does it mean?

      1. Hi Venkat, your curiosity is delightful 🙂

        The columns “scan_percent” and “update_percent” are based on a calculation over the data returned by the system function sys.dm_db_index_operational_stats.
        Specifically, “scan_percent” is the percentage of the range_scan_count value relative to the total of all the counter columns, and “update_percent” is the same thing but for the leaf_update_count column.
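
        Just to illustrate the idea (this is a simplified stand-alone sketch, not the exact formula used in the script; the denominator here uses only a subset of the counter columns):

          SELECT  OBJECT_NAME(ios.object_id) AS table_name,
                  ios.index_id,
                  ios.range_scan_count  * 100.0 / NULLIF(ios.range_scan_count + ios.singleton_lookup_count
                      + ios.leaf_insert_count + ios.leaf_update_count + ios.leaf_delete_count, 0) AS scan_percent,
                  ios.leaf_update_count * 100.0 / NULLIF(ios.range_scan_count + ios.singleton_lookup_count
                      + ios.leaf_insert_count + ios.leaf_update_count + ios.leaf_delete_count, 0) AS update_percent
          FROM    sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios;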

        When these columns return 0%, it means that, relative to the total number of operations of all types on this index, the scan/update operations are a negligible fraction, or there are none at all.

        This can happen when all or nearly 100% of the operations on the index are only deletes, inserts, and/or page merge operations.

        However, this may also happen on a readable secondary replica when all operations are limited to seeks and/or lookups.
        Updates, deletes, and inserts are not possible on secondary replicas. And in your case, it appears that range scans don’t happen either.

    3. In the AlwaysOn scenario, does the DMV “sys.dm_db_index_operational_stats” hold the same data on both the primary and the secondary replica?
      I understand DMVs are stored in process memory and get cleared after a restart. When an update is executed, it gets replicated to the secondary… I’m just wondering whether the secondary replica has the same entry in its own memory.

      1. Hi Venkat.
        As far as I know, the answer is no, those DMVs do NOT hold the same data across all replicas.
        That’s probably why you’re seeing all those zeros when running the script on the secondary.

    4. Eitan,

      Sorry for asking too many questions.
      The script produces 2 columns, “scan_percent” and “update_percent”.
      All DML ops (Ins, Upd, Del) get affected by compression… correct? Do you put all 3 ops under one bucket (update_percent)?
      Or do I have to write another query to find the delete_percent & insert_percent inside this script?

      Thanks,
      Venkat

      1. Hi Venkat, thank you for all the questions 🙂

        No, I don’t put all 3 ops under “update_percent”. I only check the leaf_update_count.

        While it’s true that all DML ops get affected by compression, it’s only the updates that cause a noticeable performance hit.
