WHERE ID IN (SELECT TOP (100) ID FROM MyTable ORDER BY Col2 DESC)
As you can see, this method uses a subquery to get the list of unique IDs of the top 100 rows that we want to update. The problem with this method is that we’re accessing the table twice and performing a JOIN between the two sets of data (even though we didn’t explicitly write “JOIN” in the statement, that’s still what SQL Server does in the background).
There are several variations to this method. Here’s another example:
UPDATE MyTable SET Col1 = Whatever
FROM MyTable JOIN (SELECT TOP (100) ID FROM MyTable ORDER BY Col2 DESC) AS ToDelete ON MyTable.ID = ToDelete.ID
This is an explicit JOIN statement with a subquery. This practically results in the same execution plan as the previous method, except that it allows the use of composite primary keys in the JOIN condition.
Another method is to do the same by first putting the results of the subquery in a temporary table and then doing the UPDATE by joining with that temporary table, but that would result in even worse performance due to the extra overhead. All it does is the same as the previous two methods, with an extra step in the middle.
Another method is to use the SET ROWCOUNT statement which would limit any statement to affect a specific maximum number of rows. However, this method is strongly not recommended because unlike the TOP keyword, SQL Server doesn’t take it into consideration when building the execution plan – which could result in an execution plan optimized for many rows of data even though we put a small number in SET ROWCOUNT. Also, small detail: Microsoft announced that this command will no longer affect DML statements in the future. So, the whole thing is deprecated anyway.
All of the above is relevant to DELETE statements as well.
Note that I need to select in the subquery all the columns that are used outside the subquery (such as Col1 and Col3). And any data that you’re using outside the subquery must be available (so, suppose you perform a JOIN with another table, the columns used in the join must also be queried in the sub query).
Notice how I’m accessing the table only once (you can see it in the execution plan if you test it yourself), and obviously not doing any self-joins as a result of that.
This method works similarly using CTE:
The result and performance of all three methods are practically the same.
Again, all of the above is relevant to DELETE statements as well.