Since SQL Server 2005, Microsoft had a nice addition to the common DML statements,
DELETE, by allowing the use of the TOP keyword. Well, SQL Server 2019 has just been released, and yet unfortunately for many of us, SQL Server still doesn’t support the use of the
TOP keyword in direct combination with the ORDER BY clause. So when we do need to update or delete the top x rows in a table based on a specific order, we’re bound to use all kinds of workarounds.
However, many of the commonly used workarounds are not as efficient as they can be.
The Common Workarounds
Here’s an example of one such common method to update the top x rows in a table based on a specific order:
UPDATE MyTable SET Col1 = Whatever WHERE ID IN (SELECT TOP (100) ID FROM MyTable ORDER BY Col2 DESC)
As you can see, this method uses a subquery to get the list of unique IDs of the top 100 rows that we want to update. The problem with this method is that we’re accessing the table twice and performing a JOIN between the two sets of data (even though we didn’t explicitly write “
JOIN” in the statement, that’s still what SQL Server does in the background).
There are several variations to this method. Here’s another example:
UPDATE MyTable SET Col1 = Whatever FROM MyTable JOIN (SELECT TOP (100) ID FROM MyTable ORDER BY Col2 DESC) AS ToDelete ON MyTable.ID = ToDelete.ID
This is an explicit JOIN statement with a subquery. This practically results in the same execution plan as the previous method, except that it allows the use of composite primary keys in the JOIN condition.
Another method is to do the same by first putting the results of the subquery in a temporary table and then executing the
UPDATE by joining with that temporary table… But that would result in even worse performance due to the extra overhead. All it does is the same as the previous two methods, with an extra step in the middle.
Another method is to use the SET ROWCOUNT statement which would limit any statement to affect a specific maximum number of rows. However, this method is strongly not recommended because unlike the TOP keyword, SQL Server doesn’t take it into consideration when building the execution plan – which could result in an execution plan optimized for many rows of data even though we put a small number in
SET ROWCOUNT. Also, small detail: Microsoft announced that this command will no longer affect DML statements in the future. So, the whole thing is deprecated anyway.
All of the above is relevant to
DELETE statements as well.
So, what’s the best way to do this?
I found that the best way to execute a DML statement on a limited number of rows based on specific ordering is actually by updating the subquery itself. This ability in SQL Server is not trivial to all DBAs and a lot of people don’t even know that it’s possible, which is a real shame, because Microsoft actually mentions it in the documentation of the TOP clause. Here’s an example of what I mean:
UPDATE TheSubQuery SET Col1 = Col1 + Col3 FROM ( SELECT TOP(100) Col1, Col3 FROM MyTable ORDER BY Col2 DESC ) AS TheSubQuery
Note that I need to select in the subquery all the columns that are used outside the subquery (such as
Col3). And any data that you’re using outside the subquery must be available (so, suppose you perform a JOIN with another table, the columns used in the join must also be returned by the sub query).
Notice how I’m accessing the table only once (you can see it in the execution plan if you test it yourself), and obviously not doing any self-joins as a result of that.
This method works similarly using CTE:
WITH TheCTEQuery AS ( SELECT TOP(100) Col1, Col3 FROM MyTable ORDER BY Col2 DESC ) UPDATE TheCTEQuery SET Col1 = Col1 + Col3 FROM TheCTEQuery
And even by creating a VIEW instead of the subquery, and updating that view:
CREATE VIEW MyView AS SELECT TOP(100) Col1, Col3 FROM MyTable ORDER BY Col2 DESC GO UPDATE MyView SET Col1 = Col1 + Col3 FROM MyView
The result and performance of all three methods are practically the same.
Again, all of the above is relevant to
DELETE statements as well.
It is indeed a shame that Microsoft added the TOP keyword to DML statements but castrated it this much, since the combination of
ORDER BY seems quite trivial. So until this ability will be built-in, I hope you can make good use of the above workarounds that I’ve showed you.
This article was originally published by Eitan Blumin on April, 2011, in www.madeiradata.com