Troubleshooting Long-Running SHRINK Operations

Ever had your SHRINK operation “stuck” and didn’t know why, or how to solve it? It’s more common than you think. Check out this post for more info, tricks and tips.

TL;DR: This blog post, these two scripts and also this script.

Disclaimer

Let me start off with a mandatory warning: as a rule of thumb, shrink operations are to be avoided, if possible. Lots of professionals have spoken about this topic and the reasons why it’s so bad, so I wouldn’t waste the space here. You can view the link above for more info.

With that being said, we must also acknowledge that there are some (rare) use cases where you would want to perform a shrink anyway…

Problem

Using the TRUNCATEONLY option would be preferable whenever possible. It will clear out unused space from the “tail” of your file without moving any pages. But, more often than not, it’s not good enough. Maybe you deleted or migrated a huge chunk of data which you know will never return. Maybe you activated data compression on a significant portion of your tables and now you want to reclaim that freed space for other uses. Or maybe you simply have a very serious shortage of disk space and you gotta scrape every gigabyte you can (is that scenario even possible in our day and age? Oh you’d be surprised).

Whatever the cause may be, you find yourself running DBCC SHRINKFILE without the TRUNCATEONLY option, and now you’re waiting… And waiting… And waiting… Until you start getting worried. So you run a small query from sys.dm_exec_requests or use something like spWhoIsActive, and you discover that your shrink session has been stuck on the same percentage of completion for the past half-hour, even though it seems to do… Well, something! Otherwise, why do the reads and writes counters keep going up??? It takes minutes, and then hours, and in some cases – even days! Good heavens, what to do?!!?

The Unusual Suspects

Hey, that’s a really good title for a movie!
Anyways, um…

If you happen to be an experienced SQL Server DBA, you’re probably familiar with this blog post from Paul S. Randal, or maybe this ancient blog post, also from Paul S. Randal, who explains that slow SHRINK operations can happen due to how the SQL Server storage engine has to perform a table scan for each LOB_DATA and ROW_OVERFLOW_DATA value that it wants to move around during SHRINK. If you didn’t know that, well then now you do!

When a text page is moved by data file shrink, a table scan must be performed for each LOB value to find the β€˜owning’ data record so the pointer can be updated. […]

Paul S. Randal (source)
Anger from inside-out movie
Paul really hates SHRINK with a burning passion. Like, seriously, he really does.

In any case, while this is an horrendous behavior in the SQL Server storage engine, it doesn’t seem like Microsoft are planning to fix this issue anytime soon. So, we gotta deal with it for the time being.

But wait! There’s more!

Remember Heap tables? Remember what happens if you create non clustered indexes on them? What is saved at the leaf level of every such index? That’s right, it’s our old pal RID, which represents the physical location of every row in the heap. This is used for the RID Lookup operator in execution plans.

But wait… What happens if a row in such a table needs to move around during a SHRINK operation? You guessed it – SQL Server has to go to each and every non-clustered index and update the RID so that it would point to the new row location. That is something that can definitely make your SHRINK operations go very slow.

Even if you have a heap table without any non clustered indexes to maintain, there’s still one more use case that can slow down your SHRINK operations: Table Partitioning. If you happen to have a partitioned heap table, it will cause the same issues that a heap table with non clustered indexes would have caused, even if the table has no indexes at all.

Solution

There are a few things that you can do to assist in your shrinking operation:

1. Solve the damn issue

Look, if you’ve read the paragraphs leading up to this point, then you should realize by now that we actually know already what causes shrink operations to run slow. That means we can detect these scenarios in advance, and that means we can solve them (most of them, at least). So why not do that?

I wrote the following script which will find all the problematic tables in your database which may cause your shrink operations to run slow, so that you’d know what to look out for:

These two scripts are available in my Gists

You might notice that there’s also an extra script file there that does something different. While the first script has a “proactive” purpose (i.e. detect potential trouble makers BEFORE they start making trouble), the second script has a “reactive” purpose (i.e. you use it once the trouble has already started):

The second script will query for any currently running shrink operations, and detect which tables they are currently holding locks on. I found that looking into the locks held by the session was more reliable than inspecting the transaction log file (even though shrink operations are fully logged). You may have to run the query multiple times until you “catch” the session at the right moment when it holds the locks. It might be annoying sometimes, but eventually, it works.

Once you detect which table is causing all the fuss, now you know where to focus your efforts.
Let’s break it down by use-cases:

A. Un-Heap Your Heap Tables

Whether it’s a heap table with or without non clustered indexes, with or without partitioning, the solution is the same: un-heap it. Simply adding a clustered index will definitely solve the issue and send your shrink operation flying like a champ.

You can use the script in the following blog post to help you out with un-heaping your heaps:

How to Un-Heap your Heaps (Resolving Tables Without Clustered Indexes)

B. Reorganize or Rebuild the LOB and ROW OVERFLOW data

One of the things you could try is to rebuild or reorganize your entire table. It’s not guaranteed to help, but there is a chance that while moving the data elsewhere in the file, you’d eliminate the need to move it during the SHRINK operation, and it’ll work better (hopefully, with the TRUNCATEONLY option). This will only help if you have enough unused space somewhere in the beginning of your file. Otherwise, you may just make things much worse by moving your whole table to the end of the file – right where the SHRINK operation will cause serious havoc.

To cause less impact on production activity, you can reorganize your indexes:

ALTER INDEX ALL ON <object> WITH REORGANIZE

If you got Enterprise edition, or Azure SQL DB / Managed Instance, you can do rebuild online:

ALTER INDEX ALL ON <object> REBUILD WITH (ONLINE=ON)

If the table is a Heap:

ALTER TABLE <object> REBUILD

Just to be sure, after you do this, try running the SHRINK operation with TRUNCATEONLY, to avoid moving the pages around again.

Is there a way for us to predict whether rebuilding/reorganizing our table would actually help? Possibly, by visualizing our data page allocation. Check out the following open-source project that I created for this purpose:

SQL Server Data Allocation Reports – Queries and reports to visualize your SQL data page allocations, which pages are reserved for which objects, and which pages are not used.

C. Move Out Your LOB and ROW OVERFLOW DATA

Another possible solution is to move your whole table somewhere outside the file entirely. Whether it be to a different filegroup, or even a different database.

To move your table to a different filegroup, you could do the following:

  1. Use SSMS to generate a CREATE script for all of the table’s indexes, including the clustered index.
  2. In the generated script, change the filegroup of each of the indexes (don’t forget to add a new one if you don’t have one already).
  3. Set the DROP_EXISTING=ON option for all of the indexes.

PRO TIP: If you have a combination of clustered and non-clustered indexes, it’s recommended to first drop all of your non-clustered indexes, rebuild the clustered index, and only then re-create the non-clustered indexes. Otherwise, you’d have to wait for SQL to rebuild all the clustered keys in your non-clustered indexes to RIDs and back to clustered keys again.

To move the table to a different database, you could use the SELECT INTO syntax and then TRUNCATE the original table (wait, woah… did I really just say that?? 😬 I should probably say something here about how dangerous this is and prone to catastrophic user error! CAREFUL NOT TO LOSE YOUR DATA! Remember: Backups are your soul-mates).

After that, you can try running your shrink operation again, and when you’re done, move the table back again (or don’t. I won’t judge. It’s your funeral πŸ€·β€β™‚οΈ).

2. Shrink using smaller intervals

Moving your tables to different filegroups, dropping and creating indexes… Those are all serious schema changes, and may not always be possible. But there might be a different solution which wouldn’t require any schema change:

Instead of shrinking directly to your desired file size right away, you should perform the shrink in “chunks”. The smallest possible interval in SHRINK is 1 MB. You’d be surprised, but there are some scenarios where even such a small chunk can take upwards of 15 minutes to complete. It may not solve your issue entirely (or at all), but at least it’ll make your shrink operations more manageable.

To help you with this, I wrote the following script below. Change the parameters to fine-tune it for your use case:

The script is available in my Gists

In general, it’s a good idea to always shrink huge files gradually, even if you don’t expect any trouble. After all, that’s when trouble tends to bite the hardest – when you least expect it.

Conclusion

You should be 100 % certain that you want to do that SHRINK operation.
Are you sure you wanna do it?
Are you sure you’re willing to suffer the consequences of fragmentation, and possible performance issues resulting from file auto-growth? πŸ€”

If you do, I hope I gave you a few tools to, at the very least, make your SHRINK operation faster and easier to manage.

Good luck! πŸ‘

Here’s a Turbo Snail for you

See Also

Cover Image by Michal Jarmoluk from Pixabay

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.