It’s been a long while since I’ve published a blog post with a useful T-SQL script. So, today’s the day! Today’s script has something very interesting to do with Heap tables. Continue reading for more info…
In case you didn’t know, “Heap” tables in SQL Server are tables that don’t have a clustered index on them.
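If you want to see which tables in your current database are heaps, here is a minimal sketch. It relies on the fact that a heap shows up in sys.indexes as a row with index_id = 0. The column aliases are my own choice, and the row counts come from sys.partitions, so treat them as approximate.

```sql
-- List every user table in the current database that is stored as a heap.
-- A heap appears in sys.indexes with index_id = 0 (type_desc = 'HEAP').
SELECT
    SchemaName = OBJECT_SCHEMA_NAME(t.object_id),
    TableName  = t.name,
    TotalRows  = SUM(p.rows)          -- summed across all partitions of the heap
FROM sys.tables AS t
INNER JOIN sys.indexes AS i
    ON i.object_id = t.object_id
   AND i.index_id = 0                 -- 0 = heap, 1 = clustered index
INNER JOIN sys.partitions AS p
    ON p.object_id = i.object_id
   AND p.index_id = i.index_id
WHERE t.is_ms_shipped = 0             -- skip system-shipped tables
GROUP BY t.object_id, t.name
ORDER BY TotalRows DESC;              -- biggest heaps first
```

Sorting by row count puts the heaps that are most likely to hurt at the top of the list.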
There’s plenty of information already available on the internet about these tables. Here is what Microsoft Docs has to say about it:
If a table is a heap and does not have any nonclustered indexes, then the entire table must be examined (a table scan) to find any row. This can be acceptable when the table is tiny, such as a list of the 12 regional offices of a company.
When a table is stored as a heap, individual rows are identified by reference to a row identifier (RID) consisting of the file number, data page number, and slot on the page. The row id is a small and efficient structure. Sometimes data architects use heaps when data is always accessed through nonclustered indexes and the RID is smaller than a clustered index key.
Heap tables can have detrimental implications on performance in the following scenarios:
- When the data is frequently returned in a sorted order. A clustered index on the sorting column could avoid the sorting operation.
- When the data is frequently grouped together. Data must be sorted before it is grouped, and a clustered index on the sorting column could avoid the sorting operation.
- When ranges of data are frequently queried from the table. A clustered index on the range column will avoid sorting the entire heap.
- When there are no nonclustered indexes and the table is large. In a heap, all rows of the heap must be read to find any row.
So yeah, Heap tables are pretty bad, especially when they contain a lot of data. Many talented bloggers have already written about this issue, explaining heap tables, comparing them to clustered index tables, and telling you how important it is to design your database tables properly.
But I’m not here to rehash tired mantras. What I’m here to do today is share a trick I’ve got up my sleeve to Quickly Generate Clustered Index Recommendations for Heap Tables! This is especially useful if you have A LOT of databases in your SQL Server, and many of them contain A LOT of heap tables. Going through each and every one could be very tiresome.
So what I did was write a “guesstimation” script that uses whatever metadata and statistics SQL Server has to hint at the most probable clustered index to create. Its algorithm goes something like this:
- Look in index usage stats for the most “popular” non-clustered indexes that would be good candidates for the clustered index. If no such was found, then:
- Look in missing index stats for the most impactful index that has the highest number of INCLUDE columns. If no such was found, then:
- If there’s any non-clustered index at all, get the first non-clustered index created with the highest number of INCLUDE columns. If no such was found, then:
- Check for any column statistics in the table and look for the column which is the most selective (most unique values). If no such was found, then:
- Use the IDENTITY column in the table. If no such was found, then:
- Use the first date/time column in the table. If no such was found, then:
- Bummer. I’m out of ideas. No automated recommendations are possible.
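To give a feel for where those hints come from, here is a rough sketch of three of the steps: index usage stats, missing index stats, and the IDENTITY / date-time fallback. This is not the actual script. The impact formula, the column aliases, and the list of date/time types are my own assumptions for illustration, and each query only lists candidates instead of picking a single winner per table.

```sql
-- Step 1 (sketch): the most-read nonclustered indexes on heap tables.
SELECT
    SchemaName = OBJECT_SCHEMA_NAME(ix.object_id),
    TableName  = OBJECT_NAME(ix.object_id),
    IndexName  = ix.name,
    TotalReads = us.user_seeks + us.user_scans + us.user_lookups
FROM sys.indexes AS hp
INNER JOIN sys.indexes AS ix
    ON ix.object_id = hp.object_id
   AND ix.type = 2                      -- 2 = nonclustered rowstore index
   AND ix.is_hypothetical = 0
INNER JOIN sys.dm_db_index_usage_stats AS us
    ON us.database_id = DB_ID()
   AND us.object_id = ix.object_id
   AND us.index_id = ix.index_id
WHERE hp.index_id = 0                   -- the table is a heap
  AND OBJECTPROPERTY(hp.object_id, 'IsUserTable') = 1
ORDER BY TotalReads DESC;

-- Step 2 (sketch): missing index suggestions that target heap tables.
-- EstimatedImpact is a commonly used heuristic, assumed here for illustration.
SELECT
    TableName       = mid.statement,
    EstimatedImpact = migs.avg_total_user_cost * migs.avg_user_impact
                      * (migs.user_seeks + migs.user_scans),
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns
FROM sys.dm_db_missing_index_details AS mid
INNER JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.indexes AS hp
    ON hp.object_id = mid.object_id
   AND hp.index_id = 0                  -- the table is a heap
WHERE mid.database_id = DB_ID()
ORDER BY EstimatedImpact DESC;

-- Last-resort steps (sketch): IDENTITY columns first, then date/time columns.
SELECT
    SchemaName = OBJECT_SCHEMA_NAME(c.object_id),
    TableName  = OBJECT_NAME(c.object_id),
    ColumnName = c.name,
    Reason     = CASE WHEN c.is_identity = 1 THEN 'IDENTITY' ELSE 'date/time' END
FROM sys.indexes AS hp
INNER JOIN sys.columns AS c
    ON c.object_id = hp.object_id
WHERE hp.index_id = 0                   -- the table is a heap
  AND OBJECTPROPERTY(hp.object_id, 'IsUserTable') = 1
  AND (c.is_identity = 1
       OR TYPE_NAME(c.system_type_id) IN
          ('date', 'datetime', 'datetime2', 'smalldatetime', 'datetimeoffset'))
ORDER BY SchemaName, TableName, c.is_identity DESC, c.column_id;
```

Keep in mind that the usage and missing index DMVs are reset when the SQL Server instance restarts, so these hints are only as good as the uptime behind them.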
You can find the script in my GitHub Gists here:
WARNING! CONSIDER THE FOLLOWING BEFORE MAKING ANY ACTUAL CHANGES:
- DO NOT APPLY the recommendations from the above script blindly!
The script only generates estimated recommendations based on the little information it can gather from system tables.
You would have to use caution and careful discretion to consider what would be the right clustered index for each table.
- You would need to consult with the developers / product managers to understand the implications of making index changes in the database schema and whether they agree to it (for example, if the same changes would also need to be made in some kind of a database source control, or when the database belongs to a vendor and making changes to it would void the warranty).
- Be extremely careful when dealing with very large tables, especially when the SQL Server edition is not Enterprise and therefore doesn’t support ONLINE index operations. In that case, this probably means DOWNTIME. Prepare accordingly.
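For reference, here is a minimal sketch of what checking the edition and creating a clustered index online might look like. The table, column, and index names (dbo.MyHeapTable, Id, CIX_MyHeapTable_Id) are hypothetical placeholders, not something the script produces.

```sql
-- Check the edition first.
-- EngineEdition = 3 covers Enterprise, Developer and Evaluation,
-- which are the on-premises editions that support ONLINE index operations.
SELECT
    Edition       = SERVERPROPERTY('Edition'),
    EngineEdition = SERVERPROPERTY('EngineEdition');

-- Hypothetical table, column and index names, for illustration only.
CREATE CLUSTERED INDEX CIX_MyHeapTable_Id
    ON dbo.MyHeapTable (Id)
    WITH (ONLINE = ON);   -- raises an error on editions without ONLINE support
```

On editions without ONLINE support you would have to drop the ONLINE option, and the table will be locked while the index is being built. Also remember that creating a clustered index on a heap rebuilds all of its existing non-clustered indexes, so the operation can take much longer than you’d expect.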
In the future, I may also write something that generates the actual CREATE script for these clustered indexes, but I’m thinking maybe I shouldn’t, because that may prompt people to run those scripts blindly without thinking first. So, while the script will save you significant time and give you helpful ideas, you would still need to evaluate each recommendation and create the actual clustered indexes on your own.
Got any comments? Ideas? Let me know below.
- My Script to generate “guestimated” clustered index recommendations
- (Microsoft Docs) Heaps (Tables Without Clustered Indexes)
- (MSSQL Tips) SQL Server Clustered Tables vs Heap Tables
- (Brent Ozar) Tables Without Clustered Indexes