You might have seen this blog post by Paul White: New Metadata-Only Column Changes in SQL Server 2016. It explains how, when a table is compressed, you can change (i.e. widen) a column’s data type without having to create a new table and migrate the data into it, and without having to rewrite any of the underlying data. But, not everything is rainbows and unicorns…
This is an excellent mechanism on the one hand…
However, it’s completely useless when the column you want to change has a CLUSTERED INDEX defined on it (regardless of whether or not it’s also a PRIMARY KEY).
Such a scenario is especially common with IDENTITY columns (which, ironically, is exactly the kind of example that Paul presented in his post).
Actually no, you can’t
First, changing a column’s data type is not allowed as long as there is any index defined on it. This is the most painful limitation SQL Server could have in this regard, because this limitation alone undoes most of the feature’s benefit.
To prove this point, I’m going to copy one of Paul’s last examples word-for-word:
-- This is copied from Paul White's post:
-- https://sqlperformance.com/2020/04/database-design/new-metadata-column-changes-sql-server-2016
CREATE TABLE dbo.Test
(
    id integer IDENTITY NOT NULL
        CONSTRAINT [PK dbo.Test id]
        PRIMARY KEY CLUSTERED
        WITH (DATA_COMPRESSION = PAGE),

    some_value integer NOT NULL
        INDEX [IX dbo.Test some_value]
        NONCLUSTERED
        WITH (DATA_COMPRESSION = ROW)
);
GO
-- Verify current compression status
SELECT * FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.Test')
GO
-- Now try to change the column data type
-- This will fail because of the Primary Key
ALTER TABLE dbo.Test ALTER COLUMN id BIGINT NOT NULL;
This is what the last command gives us:
Msg 5074, Level 16, State 1, Line 43
The object 'PK dbo.Test id' is dependent on column 'id'.
Msg 4922, Level 16, State 9, Line 43
ALTER TABLE ALTER COLUMN id failed because one or more objects access this column.
This was tested on a machine with SQL Server 2019 Developer Edition installed, and with the highest database compatibility level.
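For reference, a quick way to check the current database compatibility level:
-- Check the compatibility level of the current database
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();
GO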
Don’t even
Second, dropping and re-creating a CLUSTERED index is equivalent to (or even worse than) rebuilding the ENTIRE table. When you drop the clustered index, all non-clustered indexes are rebuilt as well, in order to replace their clustered key pointers with RIDs. Then, when you eventually re-create the clustered index, all non-clustered indexes are rebuilt AGAIN, in order to replace the RIDs with the clustered key once more.
Worst of all: NONE of these operations can truly be ONLINE. All of them are BLOCKING operations, even if you specify the ONLINE option.
So, what are we left with? What would be our best plan of action? Probably something like this (sketched in code right after the list):
- Drop all non-clustered indexes (so that dropping the clustered index would be faster).
- Drop the clustered index.
- Compress the table (if it’s not yet compressed).
- Change the column data type.
- Re-create the clustered index.
- Re-create the non-clustered indexes.
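In practice, that sequence would look roughly like this (a sketch based on the dbo.Test example above; object names and compression settings are taken from that example and should be adjusted to your own table):
-- 1. Drop the non-clustered index(es)
DROP INDEX [IX dbo.Test some_value] ON dbo.Test;

-- 2. Drop the clustered index (here, the clustered primary key)
ALTER TABLE dbo.Test DROP CONSTRAINT [PK dbo.Test id];

-- 3. Compress the resulting heap (only needed if it isn't compressed already)
ALTER TABLE dbo.Test REBUILD WITH (DATA_COMPRESSION = PAGE);

-- 4. Change the column data type
-- (should now be metadata-only, since the data is compressed and no index references the column)
ALTER TABLE dbo.Test ALTER COLUMN id bigint NOT NULL;

-- 5. Re-create the clustered primary key
ALTER TABLE dbo.Test ADD CONSTRAINT [PK dbo.Test id]
    PRIMARY KEY CLUSTERED (id) WITH (DATA_COMPRESSION = PAGE);

-- 6. Re-create the non-clustered index(es)
CREATE NONCLUSTERED INDEX [IX dbo.Test some_value]
    ON dbo.Test (some_value) WITH (DATA_COMPRESSION = ROW);
GO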
But once it has come to this, we have basically lost any benefit that this “metadata-only change” feature could give us. We had to perform huge data modifications (rebuilding the entire table and all of its indexes), and most of these operations are blocking (even if we specify the ONLINE option). Also, every index we drop hurts the performance of any queries that may be running during this time, and if we have to drop a primary key, that’s an additional hit to functionality and data integrity (even worse if our table is referenced by Foreign Keys from other tables).
Back to square one again
At this point, the “old-fashioned” alternative would be better by orders of magnitude: i.e. create a new table with the new data type, gradually migrate the old data into it, and then rename the tables so that they switch places.
Something like this:
-- Create staging table
CREATE TABLE [dbo].[Test_bigint]
(
    id bigint IDENTITY NOT NULL
        CONSTRAINT [PK dbo.Test_bigint id]
        PRIMARY KEY CLUSTERED
        WITH (DATA_COMPRESSION = PAGE),

    some_value integer NOT NULL
        INDEX [IX dbo.Test some_value]
        NONCLUSTERED
        WITH (DATA_COMPRESSION = ROW)
) ON [PRIMARY]
GO
-- Migrate data in chunks
DECLARE @ChunkSize INT = 50000;
SET XACT_ABORT, ARITHABORT, NOCOUNT ON;
RAISERROR(N'Moving data...', 0, 1) WITH NOWAIT;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Temp table to capture the ids copied in each chunk
-- (IGNORE_DUP_KEY=ON silently skips duplicate ids instead of raising an error)
DROP TABLE IF EXISTS #inserted;
CREATE TABLE #inserted
(
    [id] [bigint] NOT NULL PRIMARY KEY WITH (IGNORE_DUP_KEY = ON)
);

WHILE 1 = 1
BEGIN
    SET IDENTITY_INSERT [dbo].[Test_bigint] ON;

    BEGIN TRANSACTION;

    -- Copy the next chunk, locking the source rows for the duration of the transaction
    INSERT INTO [dbo].[Test_bigint]
        ([id], [some_value])
    OUTPUT inserted.[id] INTO #inserted
    SELECT TOP (@ChunkSize)
        [id], [some_value]
    FROM
        [dbo].[Test] WITH (PAGLOCK, XLOCK, HOLDLOCK)
    ORDER BY [id] ASC;

    IF @@ROWCOUNT = 0
    BEGIN
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        BREAK;
    END

    -- Delete the rows that were just copied
    DELETE FROM [dbo].[Test]
    WHERE [id] IN (SELECT [id] FROM #inserted);

    COMMIT TRANSACTION;

    SET IDENTITY_INSERT [dbo].[Test_bigint] OFF;
    TRUNCATE TABLE #inserted;
END
GO
-- Rename tables
PRINT N'Renaming tables...'
EXEC sp_rename '[dbo].[Test]', 'Test_int'
EXEC sp_rename '[dbo].[Test_bigint]', 'Test'
PRINT N'Renaming constraints...'
EXEC sp_rename '[dbo].[PK dbo.Test id]', 'PK dbo.Test_int id'
EXEC sp_rename '[dbo].[PK dbo.Test_bigint id]', 'PK dbo.Test id'
GO
This works, and it would be a much more “ONLINE” operation than the “new” method.
In conclusion, this new “metadata-only change” feature can only give us a real benefit if the columns we want to change are either not indexed at all, or only have non-clustered indexes defined on them that wouldn’t be too troublesome to drop and re-create.
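For example, using the original dbo.Test table from Paul’s script (before all the renaming above), where some_value only has a non-clustered index on it, a sketch of that scenario could look like this (the ALTER should be metadata-only because the table data itself is compressed):
-- Drop the non-clustered index that references the column
DROP INDEX [IX dbo.Test some_value] ON dbo.Test;

-- The table is page-compressed, so this should be a metadata-only change
ALTER TABLE dbo.Test ALTER COLUMN some_value bigint NOT NULL;

-- Re-creating the (relatively small) non-clustered index is the only real data movement
CREATE NONCLUSTERED INDEX [IX dbo.Test some_value]
    ON dbo.Test (some_value) WITH (DATA_COMPRESSION = ROW);
GO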
P.S.
Regardless of this disappointment, data compression in SQL Server is still awesome with a bunch of other benefits, and you should definitely use it.
From the comments, a reader suggested an alternative approach: for rebuilding the table (dbo.Test), you could:
- create the new dbo.Test_bigint
- rename dbo.Test to dbo.Test_int
- create a view dbo.Test with a UNION ALL between dbo.Test_int (old) and dbo.Test_bigint (new)
- in the WHILE loop, use DELETE TOP (@ChunkSize) FROM dbo.Test_int OUTPUT Deleted.* INTO dbo.Test_bigint
Particularly for really large tables (billions of rows), this would be much faster than the example above, because it doesn’t need the WHERE [id] IN (SELECT [id] FROM #inserted) lookup in the DELETE, and of course because it doesn’t need the temp table at all.
It is also safer, since you don’t risk copying data twice if your connection gets closed for whatever reason and the #temp table is dropped.
PS: if you still need to insert into the table while copying it, you need an INSTEAD OF INSERT trigger on the view dbo.Test which simply inserts the data into dbo.Test_bigint.
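For illustration, a minimal sketch of that suggestion could look something like this (untested here; it starts from the original dbo.Test plus the dbo.Test_bigint staging table created above, wraps the chunked DELETE with OUTPUT in an INSERT…SELECT (composable DML) so that IDENTITY_INSERT applies to the identity column, and leaves out the INSTEAD OF INSERT trigger mentioned in the PS):
-- Rename the original table, then expose both tables under the original name via a view
EXEC sp_rename '[dbo].[Test]', 'Test_int';
GO
CREATE VIEW dbo.Test
AS
SELECT CAST(id AS bigint) AS id, some_value FROM dbo.Test_int
UNION ALL
SELECT id, some_value FROM dbo.Test_bigint;
GO
-- Move rows in chunks; the DELETE's OUTPUT feeds the INSERT directly,
-- so there is no temp table and no WHERE ... IN lookup
DECLARE @ChunkSize INT = 50000;
SET XACT_ABORT, NOCOUNT ON;
SET IDENTITY_INSERT dbo.Test_bigint ON;

WHILE 1 = 1
BEGIN
    INSERT INTO dbo.Test_bigint ([id], [some_value])
    SELECT d.[id], d.[some_value]
    FROM
    (
        DELETE TOP (@ChunkSize) FROM dbo.Test_int
        OUTPUT deleted.[id], deleted.[some_value]
    ) AS d;

    IF @@ROWCOUNT = 0 BREAK;
END

SET IDENTITY_INSERT dbo.Test_bigint OFF;
GO
-- When the copy is done: drop the view and rename the new table (and the PK constraints)
-- into place, just like in the script above
DROP VIEW dbo.Test;
EXEC sp_rename '[dbo].[Test_bigint]', 'Test';
EXEC sp_rename '[dbo].[PK dbo.Test id]', 'PK dbo.Test_int id';
EXEC sp_rename '[dbo].[PK dbo.Test_bigint id]', 'PK dbo.Test id';
GO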