How to Un-Heap Your Heaps?

Eitan Blumin

6 years ago

Today’s script has something very interesting to do with Heap tables. Specifically, how to turn them into something that’s not a heap. Continue reading for more info…

What is a Heap?

In case you didn’t know, “Heap” tables in SQL Server are tables that don’t have a clustered index on them.

There’s plenty of information already available on the internet about these tables. Here is what Microsoft Docs has to say about it:

If a table is a heap and does not have any nonclustered indexes, then the entire table must be examined (a table scan) to find any row. This can be acceptable when the table is tiny, such as a list of the 12 regional offices of a company.

When a table is stored as a heap, individual rows are identified by reference to a row identifier (RID) consisting of the file number, data page number, and slot on the page. The row id is a small and efficient structure. Sometimes data architects use heaps when data is always accessed through nonclustered indexes and the RID is smaller than a clustered index key.

Heap tables can have detrimental implications on performance in the following scenarios:

When the data is frequently returned in a sorted order. A clustered index on the sorting column could avoid the sorting operation.
When the data is frequently grouped together. Data must be sorted before it is grouped, and a clustered index on the sorting column could avoid the sorting operation.
When ranges of data are frequently queried from the table. A clustered index on the range column will avoid sorting the entire heap.
When there are no nonclustered indexes and the table is large. In a heap, all rows of the heap must be read to find any row.

So yeah, Heap tables are pretty bad, especially when they contain a lot of data. Many talented bloggers have already wrote about this issue, explaining heap tables, comparing them to clustered index tables, and telling you how important it is to design your database tables properly.

Which column(s) should be clustered?

But I’m not here to re-hash repeated and tired mantras. What I’m here to do today, is to share with you a trick that I’ve got up my sleeve to Quickly Generate Cluster Index Recommendations for Heap Tables! This would be especially useful to you if you have A LOT of databases in your SQL Server, and many of them containing A LOT of heap tables. Going through each and every one could be very tiresome.

So what I did, is basically write a “guestimation” script which tries to make use of whatever metadata and statistics SQL Server has, which may give a hint as to what would be the most probable clustered index to create. Its algorithm goes something like this:

Look in index usage stats for the most “popular” non-clustered indexes which would be a good candidate as clustered index. If no such was found, then:
If there’s any non-clustered index at all, get the first one created with the highest number of INCLUDE columns, give priority to UNIQUE indexes. If no such was found, then:
Look in missing index stats for the most impactful index that has the highest number of INCLUDE columns. If no such was found, then:
Use the IDENTITY column in the table. If no such was found, then:
Check for any column statistics in the table and look for the column which is the most selective (most unique values). If no such was found, then:
Use the first date/time column in the table, give priority to columns with a default constraint. If no such was found, then:
Use the first int/bigint/smallint/tinyint column in the table, give priority to columns without a default constraint. If no such was found, then:
Use the first non-nullable column in the table, give priority to columns without a default constraint. If no such was found, then:
Bummer. I’m out of ideas. No automated recommendations are possible.

Note: the above may not be up to date when you read this post. I occasionally update the script with improvements and things could change such as new recommendations added, or their order of consideration changed. The most up-to-date algorithm would be documented within the script itself.

You can find the script in my GitHub Gists here:

See the gist on github.

I’ve had the chance to use this script in several environments, and it works very well for most common scenarios.

WARNING! CONSIDER THE FOLLOWING BEFORE MAKING ANY ACTUAL CHANGES:

DO NOT APPLY the recommendations from the above script blindly!
The script only generates estimated recommendations based on the little information it can gather from system tables.
You would have to use precaution and careful discretion to consider what would be the right clustered index per each table.
You would need to consult with the developers / product managers to understand the implications of making index changes in the database schema and whether they agree to it (for example, if the same changes would also need to be made in some kind of a database source control, or when the database belongs to a vendor and making changes to it would void the warranty).
Be extremely careful when dealing with very large tables, especially when the SQL Server edition is not Enterprise and therefore doesn’t support ONLINE index operations. In which case: This probably means DOWNTIME. Prepare accordingly.
- Also, even if you have Enterprise Edition, you will still not be able to perform ONLINE index operations if the table contains a deprecated data type (image / text / ntext).
The following index options are NOT taken into consideration by this script and would not be retained by the output scripts (this will be fixed in a future version):
- Non-default file groups or partition schemas
- Fill Factor
- Allow Row/Page Locks options
- IGNORE_DUP_KEY option
- Index padding option

The script also generates the actual CREATE script for these clustered indexes, but as I said, be careful not to run those scripts blindly without thinking first. So, while the script will save you significant time and give you helpful ideas, you should still evaluate each recommendation and make sure it’s fine.

Got any comments? Ideas? Let me know below.

Additional Resources:

Cover Image by PublicDomainPictures from Pixabay

What is a Heap?

Which column(s) should be clustered?

WARNING! CONSIDER THE FOLLOWING BEFORE MAKING ANY ACTUAL CHANGES:

Additional Resources:

Share this: