How to Un-Heap Your Heaps?

Today’s script has something very interesting to do with Heap tables. Specifically, how to turn then into something that’s not a heap. Continue reading for more info…

In case you didn’t know, “Heap” tables in SQL Server are tables that don’t have a clustered index on them.

There’s plenty of information already available on the internet about these tables. Here is what Microsoft Docs has to say about it:


If a table is a heap and does not have any nonclustered indexes, then the entire table must be examined (a table scan) to find any row. This can be acceptable when the table is tiny, such as a list of the 12 regional offices of a company.

When a table is stored as a heap, individual rows are identified by reference to a row identifier (RID) consisting of the file number, data page number, and slot on the page. The row id is a small and efficient structure. Sometimes data architects use heaps when data is always accessed through nonclustered indexes and the RID is smaller than a clustered index key.

Heap tables can have detrimental implications on performance in the following scenarios:

  • When the data is frequently returned in a sorted order. A clustered index on the sorting column could avoid the sorting operation.
  • When the data is frequently grouped together. Data must be sorted before it is grouped, and a clustered index on the sorting column could avoid the sorting operation.
  • When ranges of data are frequently queried from the table. A clustered index on the range column will avoid sorting the entire heap.
  • When there are no nonclustered indexes and the table is large. In a heap, all rows of the heap must be read to find any row.

So yeah, Heap tables are pretty bad, especially when they contain a lot of data. Many talented bloggers have already wrote about this issue, explaining heap tables, comparing them to clustered index tables, and telling you how important it is to design your database tables properly.

But I’m not here to re-hash repeated and tired mantras. What I’m here to do today, is to share with you a trick that I’ve got up my sleeve to Quickly Generate Cluster Index Recommendations for Heap Tables! This would be especially useful to you if you have A LOT of databases in your SQL Server, and many of them containing A LOT of heap tables. Going through each and every one could be very tiresome.

So what I did, is basically write a “guestimation” script which tries to make use of whatever metadata and statistics SQL Server has, which may give a hint as to what would be the most probable clustered index to create. Its algorithm goes something like this:

  1. Look in index usage stats for the most “popular” non-clustered indexes which would be a good candidate as clustered index. If no such was found, then:
  2. Look in missing index stats for the most impactful index that has the highest number of INCLUDE columns. If no such was found, then:
  3. If there’s any non-clustered index at all, get the first non-clustered index created with the highest number of INCLUDE columns. If no such was found, then:
  4. Check for any column statistics in the table and look for the column which is the most selective (most unique values). If no such was found, then:
  5. Use the IDENTITY column in the table. If no such was found, then:
  6. Use the first date/time column in the table. If no such was found, then:
  7. Bummer. I’m out of ideas. No automated recommendations are possible.

Note: the above may not be up to date when you read this post. I occasionally update the script with improvements and things could change such as new recommendations added, or their order of appearance changed.

You can find the script in my GitHub Gists here:

—————————————————–
—- Generate Clustered Index Recommendations ——-
—————————————————–
Author: Eitan Blumin | https://www.eitanblumin.com
More info: https://eitanblumin.com/2019/12/30/resolving-tables-without-clustered-indexes-heaps/
—————————————————–
Description:
————
This script finds all heap tables, and "guestimates" a clustered index recommendation for each.
The script implements the following algorithm:
1. Look in index usage stats for the most "popular" non-clustered indexes which would be a good candidate as clustered index. If no such was found, then:
2. If there's any non-clustered index at all, get the first one created with the highest number of INCLUDE columns, give priority to UNIQUE indexes. If no such was found, then:
3. Look in missing index stats for the most impactful index that has the highest number of INCLUDE columns. If no such was found, then:
4. Use the IDENTITY column in the table. If no such was found, then:
5. Use the first date/time column in the table, give priority to columns with a default constraint. If no such was found, then:
6. Check for any column statistics in the table and look for the column which is the most selective (most unique values). If no such was found, then:
7. Bummer. I'm out of ideas. No recommendations are possible.
—————————————————–
Change log:
————
2020-09-30 Added optional parameters @OnlineRebuild, @SortInTempDB, @MaxDOP
2020-09-21 Added columns list in initial recommendations retrieval, removed newlines from remediation scripts
2020-07-14 Added proper support for replacing unique indexes
2020-07-14 Added generated script for replacing existing nc index with a clustered index
2020-02-19 Added support for Azure SQL DB, and added version-dependent check to ignore memory optimized tables
2020-02-12 Changed prioritization a bit for the recommendations, added automatic generation of basic CREATE script
2020-01-07 Added check of database Updateability, and moved around a few columns
2019-12-29 Added checks for IDENTITY columns, and first DATE/TIME columns
2019-12-23 First version
—————————————————–
Parameters:
————
DECLARE
@MinimumRowsInTable INT = 200000 Minimum number of rows in a table in order to check it
Parameters controlling the structure of output scripts:
,@OnlineRebuild BIT = 1 If 1, will generate CREATE INDEX commands with the ONLINE option turned on.
,@SortInTempDB BIT = 1 If 1, will generate CREATE INDEX commands with the SORT_IN_TEMPDB option turned on.
,@MaxDOP INT = NULL If not NULL, will generate CREATE INDEX commands with the MAXDOP option. Set to 1 to prevent parallelism and reduce workload.
—————————————————–
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SET NOCOUNT, ARITHABORT, XACT_ABORT ON;
IF OBJECT_ID(N'tempdb..#temp_heap') IS NOT NULL DROP TABLE #temp_heap;
DECLARE @CMD NVARCHAR(MAX), @CurrDB SYSNAME, @CurrObjId INT, @CurrTable NVARCHAR(1000);
DECLARE @RebuildOptions NVARCHAR(MAX);
Init local variables and defaults
SET @RebuildOptions = N''
IF @OnlineRebuild = 1 SET @RebuildOptions = @RebuildOptions + N', ONLINE = ON'
IF @SortInTempDB = 1 SET @RebuildOptions = @RebuildOptions + N', SORT_IN_TEMPDB = ON'
IF @MaxDOP IS NOT NULL SET @RebuildOptions = @RebuildOptions + N', MAXDOP = ' + CONVERT(nvarchar(4000), @MaxDOP)
IF @RebuildOptions LIKE N',%' SET @RebuildOptions = N' WITH (' + STUFF(@RebuildOptions, 1, 2, N'') + N')';
IF @OnlineRebuild = 1 AND ISNULL(CONVERT(int, SERVERPROPERTY('EngineEdition')),0) NOT IN (3,5,8)
BEGIN
RAISERROR(N'— WARNING: @OnlineRebuild is set to 1, but current SQL edition does not support ONLINE rebuilds.', 0, 1);
END
CREATE TABLE #temp_heap
(
[database_name] NVARCHAR(50),
table_name NVARCHAR(MAX),
full_table_name NVARCHAR(MAX),
num_of_rows INT NULL,
[object_id] INT,
candidate_index SYSNAME NULL,
candidate_columns_from_existing_index NVARCHAR(MAX) NULL,
candidate_columns_from_missing_index NVARCHAR(MAX) NULL,
identity_column SYSNAME NULL,
most_selective_column_from_stats SYSNAME NULL,
first_date_column SYSNAME NULL,
first_date_column_default NVARCHAR(MAX) NULL,
is_unique BIT NULL
);
SET @CMD = N'
INSERT INTO #temp_heap([database_name], [object_id], table_name, full_table_name, num_of_rows, candidate_index, candidate_columns_from_existing_index, is_unique)
SELECT DB_NAME() as DatabaseName, t.object_id, OBJECT_NAME(t.object_id) AS table_name, QUOTENAME(OBJECT_SCHEMA_NAME(t.object_id)) + ''.'' + QUOTENAME(OBJECT_NAME(t.object_id)) AS FullTableName
, SUM(p.rows)
, QUOTENAME(ix.name) AS CandidateIndexName
, ix_columns
, ix.is_unique
FROM sys.tables t
INNER JOIN sys.partitions p
ON t.object_id = p.OBJECT_ID
OUTER APPLY
(
SELECT TOP 1 us.index_id
FROM sys.dm_db_index_usage_stats AS us
WHERE us.database_id = DB_ID()
AND us.object_id = t.object_id
AND us.index_id > 1
ORDER BY us.user_updates DESC, us.user_scans DESC, us.user_seeks DESC
) AS ixus
LEFT JOIN sys.indexes AS ix
ON ixus.index_id = ix.index_id
AND ix.object_id = t.object_id
OUTER APPLY
(SELECT ix_columns = STUFF((
SELECT '', '' + QUOTENAME(c.name) + CASE ic.is_descending_key WHEN 1 THEN '' DESC'' ELSE '' ASC'' END
FROM sys.index_columns AS ic
INNER JOIN sys.columns AS c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
WHERE ic.object_id = t.object_id AND ic.index_id = ix.index_id AND ic.is_included_column = 0
FOR XML PATH('''')
), 1, 2, '''')
) AS ixcolumns
WHERE p.index_id = 0
AND t.is_ms_shipped = 0
AND t.OBJECT_ID > 255'
Ignore memory-optimized tables in SQL Server versions 2014 and newer
+ CASE WHEN CONVERT(int, (@@microsoftversion / 0x1000000) & 0xff) >= 12 THEN N'
AND t.is_memory_optimized = 0'
ELSE N'' END + N'
GROUP BY t.object_id, ix.name, ix.is_unique, ix_columns
' + ISNULL(N'HAVING SUM(p.rows) >= ' + CONVERT(nvarchar,@MinimumRowsInTable), N'')
IF CONVERT(varchar(300),SERVERPROPERTY('Edition')) = 'SQL Azure'
BEGIN
exec (@CMD)
END
ELSE
BEGIN
SET @CMD = N'
IF EXISTS (SELECT * FROM sys.databases WHERE database_id > 4 AND name = ''?'' AND state_desc = ''ONLINE'' AND DATABASEPROPERTYEX(name, ''Updateability'') = ''READ_WRITE'')
BEGIN
USE [?];'
+ @CMD + N'
END'
exec sp_MSforeachdb @CMD
END
Add recommendations based on missing index stats
UPDATE t
SET candidate_columns_from_missing_index = mi.indexColumns
FROM #temp_heap AS t
CROSS APPLY
(
SELECT TOP 1 ISNULL(mid.equality_columns, mid.inequality_columns) AS indexColumns
FROM sys.dm_db_missing_index_group_stats AS migs
INNER JOIN sys.dm_db_missing_index_groups AS mig
ON (migs.group_handle = mig.index_group_handle)
INNER JOIN sys.dm_db_missing_index_details AS mid
ON (mig.index_handle = mid.index_handle)
WHERE mid.object_id = OBJECT_ID(QUOTENAME(t.[database_name]) + N'.' + t.full_table_name) AND mid.database_id = DB_ID(t.database_name)
GROUP BY ISNULL(mid.equality_columns, mid.inequality_columns)
ORDER BY MAX(LEN(mid.included_columns) LEN(REPLACE(mid.included_columns, ', [', ''))) DESC
, SUM(migs.avg_user_impact * migs.avg_total_user_cost) DESC
, SUM(migs.user_scans) DESC, SUM(migs.user_seeks) DESC
) AS mi
WHERE t.candidate_index IS NULL — filters for only those without existing recommendation
DECLARE Tabs CURSOR
FAST_FORWARD READ_ONLY
FOR
SELECT database_name, object_id, full_table_name
FROM #temp_heap AS t
WHERE t.candidate_columns_from_missing_index IS NULL AND t.candidate_index IS NULL — filters for only those without existing recommendation
OPEN Tabs
FETCH NEXT FROM Tabs INTO @CurrDB, @CurrObjId, @CurrTable
WHILE @@FETCH_STATUS = 0
BEGIN
Get additional metadata for current table
DECLARE @FirstIndex SYSNAME, @IsUnique BIT, @FirstIndexColumns NVARCHAR(MAX), @IdentityColumn SYSNAME, @FirstDateColumn SYSNAME, @FirstDateColumnDefault NVARCHAR(MAX);
SET @CMD = N'SELECT TOP 1
@FirstIndex = name,
@IsUnique = ix.is_unique,
@FirstIndexColumns =
STUFF((
SELECT '', '' + QUOTENAME(c.name) + CASE ic.is_descending_key WHEN 1 THEN '' DESC'' ELSE '' ASC'' END
FROM ' + QUOTENAME(@CurrDB) + N'.sys.index_columns AS ic
INNER JOIN ' + QUOTENAME(@CurrDB) + N'.sys.columns AS c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
WHERE ic.object_id = ix.object_id AND ic.index_id = ix.index_id AND ic.is_included_column = 0
FOR XML PATH('''')
), 1, 2, '''')
FROM ' + QUOTENAME(@CurrDB) + N'.sys.indexes AS ix
OUTER APPLY (SELECT SUM(CASE WHEN is_included_column = 1 THEN 1 ELSE 0 END) AS included_columns, COUNT(*) AS indexed_columns
FROM ' + QUOTENAME(@CurrDB) + N'.sys.index_columns AS ic WHERE ic.object_id = ix.object_id AND ic.index_id = ix.index_id) AS st
WHERE object_id = @ObjId AND index_id > 0
ORDER BY ix.is_unique DESC, included_columns DESC, indexed_columns ASC, index_id ASC;
SELECT @IdentityColumn = [name]
FROM ' + QUOTENAME(@CurrDB) + N'.sys.identity_columns
WHERE object_id = @ObjId;
SELECT TOP 1 @FirstDateColumn = c.[name], @FirstDateColumnDefault = dc.[definition]
FROM ' + QUOTENAME(@CurrDB) + N'.sys.columns AS c
LEFT JOIN ' + QUOTENAME(@CurrDB) + N'.sys.default_constraints AS dc
ON c.default_object_id = dc.object_id
AND c.object_id = dc.parent_object_id
WHERE c.object_id = @ObjId
AND c.system_type_id IN
(SELECT system_type_id FROM ' + QUOTENAME(@CurrDB) + N'.sys.types WHERE precision > 0 AND (name LIKE ''%date%'' OR name LIKE ''%time%''))
ORDER BY CASE WHEN dc.[definition] IS NOT NULL THEN 0 ELSE 1 END ASC, c.column_id ASC;'
PRINT @CMD;
SET @FirstIndex = NULL;
SET @IdentityColumn = NULL;
SET @FirstDateColumn = NULL;
SET @FirstDateColumnDefault = NULL
EXEC sp_executesql @CMD
, N'@ObjId INT, @FirstIndex SYSNAME OUTPUT, @IsUnique BIT OUTPUT, @FirstIndexColumns NVARCHAR(MAX) OUTPUT, @IdentityColumn SYSNAME OUTPUT, @FirstDateColumn SYSNAME OUTPUT, @FirstDateColumnDefault NVARCHAR(MAX) OUTPUT'
, @CurrObjId, @FirstIndex OUTPUT, @IsUnique OUTPUT, @FirstIndexColumns OUTPUT, @IdentityColumn OUTPUT, @FirstDateColumn OUTPUT, @FirstDateColumnDefault OUTPUT
IF @FirstIndex IS NOT NULL
BEGIN
——————-
Add recommendations based on existing non-clustered indexes (even if no existing usage stats or missing index stats found)
——————-
UPDATE #temp_heap SET candidate_index = QUOTENAME(@FirstIndex) + N' (no usage)'
, candidate_columns_from_existing_index = @FirstIndexColumns
, is_unique = @IsUnique
WHERE database_name = @CurrDB AND object_id = @CurrObjId AND candidate_index IS NULL
END
IF @IdentityColumn IS NOT NULL
BEGIN
——————-
Add recommendations based on identity column
——————-
UPDATE #temp_heap SET identity_column = QUOTENAME(@IdentityColumn)
, is_unique = ISNULL(is_unique, 1)
WHERE database_name = @CurrDB AND object_id = @CurrObjId;
END
IF @FirstDateColumn IS NOT NULL
BEGIN
Add recommendation based on the first date/time column
UPDATE #temp_heap SET first_date_column = QUOTENAME(@FirstDateColumn)
, is_unique = ISNULL(is_unique, 0)
WHERE database_name = @CurrDB AND object_id = @CurrObjId;
END
IF @FirstIndex IS NULL — Performs check only if no previous recommendations found
BEGIN
——————-
Get recommendations based on most selective column based on statistics
——————-
Get list of table columns
DECLARE @Columns AS TABLE (colName SYSNAME);
SET @CMD = N'SELECT name FROM ' + QUOTENAME(@CurrDB) + N'.sys.columns WHERE object_id = @ObjId AND is_computed = 0'
INSERT INTO @Columns
EXEC sp_executesql @CMD, N'@ObjId INT', @CurrObjId
Generate and run SHOW_STATISTICS command
SET @CMD = N'USE ' + QUOTENAME(@CurrDB) + N';
SET NOCOUNT ON;'
SELECT @CMD = @CMD + N'
BEGIN TRY
DBCC SHOW_STATISTICS(' + QUOTENAME(@CurrTable, '"') COLLATE database_default + N', ' + QUOTENAME(colName) COLLATE database_default + N') WITH DENSITY_VECTOR, NO_INFOMSGS;
END TRY
BEGIN CATCH
PRINT ERROR_MESSAGE()
END CATCH'
FROM @Columns
DECLARE @DensityStats AS TABLE (AllDensity FLOAT, AvgLength FLOAT, Cols NVARCHAR(MAX));
INSERT INTO @DensityStats
EXEC(@CMD);
IF @@ROWCOUNT > 0
BEGIN
Set most selective column
UPDATE #temp_heap
SET most_selective_column_from_stats =
(
SELECT TOP 1 QUOTENAME(Cols)
FROM @DensityStats
ORDER BY AllDensity ASC, AvgLength ASC
)
, is_unique = ISNULL(is_unique, 0)
WHERE
database_name = @CurrDB
AND object_id = @CurrObjId;
END
END
Re-init for next iteration
DELETE @Columns;
DELETE @DensityStats;
FETCH NEXT FROM Tabs INTO @CurrDB, @CurrObjId, @CurrTable
END
CLOSE Tabs
DEALLOCATE Tabs
Output results
SELECT
Details = 'Database:' + QUOTENAME([database_name]) + ', Heap Table:' + full_table_name
+ COALESCE(
N', candidate INDEX: ' + t.candidate_index + ISNULL(N' (' + t.candidate_columns_from_existing_index + N')', N'')
, N', candidate column(s) from MISSING INDEX stats: ' + t.candidate_columns_from_missing_index
, N', IDENTITY column: ' + t.identity_column
, N', first DATE/TIME column: ' + t.first_date_column
, N', most SELECTIVE column: ' + t.most_selective_column_from_stats
, N', NO RECOMMENDATION POSSIBLE')
, Script = N'USE ' + QUOTENAME(t.database_name)
+
CASE
WHEN t.candidate_index IS NOT NULL AND t.candidate_columns_from_existing_index IS NULL THEN N'; — Recreate as clustered index: ' + t.candidate_index
WHEN t.candidate_index IS NOT NULL AND t.candidate_columns_from_existing_index IS NOT NULL THEN N'; DROP INDEX ' + t.candidate_index + ' ON ' + t.full_table_name
+ N'; CREATE ' + CASE WHEN t.is_unique = 1 THEN 'UNIQUE ' ELSE N'' END + N'CLUSTERED INDEX ' + t.candidate_index + ' ON ' + t.full_table_name
+ N' (' + t.candidate_columns_from_existing_index + N')' + @RebuildOptions
ELSE
N'; CREATE ' + CASE WHEN t.is_unique = 1 THEN 'UNIQUE ' ELSE N'' END + N'CLUSTERED INDEX IX_clust ON ' + t.full_table_name
+ N' ('
+ COALESCE(
t.candidate_columns_from_missing_index,
t.identity_column,
t.first_date_column,
t.most_selective_column_from_stats
)
+ N')' + @RebuildOptions
END
, *
FROM #temp_heap AS t
DROP TABLE #temp_heap
I’ve had the chance to use this script in several environments, and it works very well for most common scenarios.

WARNING! CONSIDER THE FOLLOWING BEFORE MAKING ANY ACTUAL CHANGES:

  • DO NOT APPLY the recommendations from the above script blindly!
    The script only generates estimated recommendations based on the little information it can gather from system tables.
    You would have to use precaution and careful discretion to consider what would be the right clustered index per each table.
  • You would need to consult with the developers / product managers to understand the implications of making index changes in the database schema and whether they agree to it (for example, if the same changes would also need to be made in some kind of a database source control, or when the database belongs to a vendor and making changes to it would void the warranty).
  • Be extremely careful when dealing with very large tables, especially when the SQL Server edition is not Enterprise and therefore doesn’t support ONLINE index operations. In which case: This probably means DOWNTIME. Prepare accordingly.

The script also generates the actual CREATE script for these clustered indexes, but as I said, be careful not to run those scripts blindly without thinking first. So, while the script will save you significant time and give you helpful ideas, you should still evaluate each recommendation and make sure it’s fine.

Got any comments? Ideas? Let me know below.

Additional Resources:

Cover Image by PublicDomainPictures from Pixabay

3 comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.