T-SQL Tuesday #136 – Most and Least Favorite Data Type

This month’s T-SQL Tuesday is hosted by Brent Ozar (b|t), who asks us to Blog about our Favorite Data Type (or Least Favorite).

Link back to Brent’s invitation post

I find it actually quite easy to decide both on a most-favorite, as well as a least-favorite data type in SQL Server. So, why not blog about both?

Here we go.

Favorite Data Type: xml

The xml data type in SQL Server exists since about SQL Server 2005, and it introduced a lot of very powerful and useful capabilities that were never before seen in SQL Server. It was the first-ever data type with built-in CLR methods (i.e. where you can write a dot after a column name and execute some kind of method. For example: mycolumn.nodes(…)).

The xml data type and the functionality around it made it relatively easy to “refactor” a resultset from a relational structure into a scalar structure (i.e. a single XML document) using the FOR XML directive, and vice versa (single XML document into a relational structure) using the nodes(), value(), and query() methods.

It also introduced the strengths of XQuery and XPath into SQL Server, which can allow us to perform all kinds of cool stuff with our XML documents. For example, my script provided in the blog post How to Troubleshoot THREADPOOL Waits and Deadlocked Schedulers heavily relies on XQuery and XPath to do meaningful manipulations with the XML data, without first having to restructure it in a relational structure.

The xml data type also supports XML Schema Collections, which are basically XSD documents that let you enforce a specific structure for your XML documents, enforce specific data types for attributes and elements within your XML, and this even optimizes data storage and queries.

Speaking of which, SQL Server Radio Episode 122 focuses on the XML data type, its history, purpose, and real strengths. Check it out, if you haven’t yet!

Also, XML indexes can be created on xml data type columns. They index all tags, values and paths over the XML instances in the column and benefit query performance. That’s almost like an implementation of a NoSQL document database inside the SQL Server engine! (think CosmosDB or MongoDB, but with XML documents instead of JSON)

The functionality around the XML data type also lets us do all kinds of very tricky implementations that would otherwise be very difficult in SQL Server. For example:

A specialized solution for aggregate string concatenation using the PATH mode of the FOR XML query option:

SELECT groupid,
  STUFF((SELECT ',' + string AS [text()]
         FROM dbo.Groups AS G2
         WHERE G2.groupid = G1.groupid
         ORDER BY memberid
         FOR XML PATH('')), 1, 1, '') AS string
FROM dbo.Groups AS G1
GROUP BY groupid;

(source)

Converting binary values to base64 and back:

declare @source varbinary(max), @encoded varchar(max), @decoded varbinary(max)
set @source = convert(varbinary(max), 'Hello Base64')
set @encoded = cast('' as xml).value('xs:base64Binary(sql:variable("@source"))', 'varchar(max)')
set @decoded = cast('' as xml).value('xs:base64Binary(sql:variable("@encoded"))', 'varbinary(max)')

select
convert(varchar(max), @source) as source_varchar,
@source as source_binary,
@encoded as encoded,
@decoded as decoded_binary,
convert(varchar(max), @decoded) as decoded_varchar

(source)

Also note how you can reference SQL variables from inside your XQueries, as demonstrated above, and do the same for SQL columns too.

And of course, the ability to add or update nodes inside your XML documents, without having to rewrite or decode the whole thing, with the help of the modify() method.

None of the capabilities mentioned above are available with the newer JSON capabilities introduced in SQL Server 2016. Not even close. Although, who knows? Perhaps one day it’ll catch up. Perhaps.

In terms of relational database design, it’s not exactly the best choice. But it’s perfect when you have no choice but to combine your relational database with some non-relational data.

In conclusion: No other data type in SQL Server is as robust and flexible as XML, so it’s a very easy choice to make it as my favorite!

Least Favorite Data Type: sql_variant

And then there’s this a**hole.

Look, I’m not gonna waste too many words on this one, but I’ll say this:

  • You cannot properly use values stored in sql_variant columns unless you explicitly convert them back to their original data type first.
  • Its storage is pretty yucky.
  • It’s not fully supported by ODBC.
  • It’s not valid for aggregate functions.
  • It’s janky and unexpected when used in comparison predicates.
  • Creating indexes on it becomes either impossible, or not feasible.

(check out this article from Red-Gate for more details)

And overall, if you have a sql_variant column anywhere in your database, that probably means you did something terribly wrong while designing your database schema. Go back to that normalization class, re-learn your normal forms, and redo your ERD!

(by the way, Ami Levin delivered a wonderful session on the topic of normalization and database design, in Hebrew, at the Israeli Data Platform Meetup called the SE7EN Deadly Sins of Database DesignPart 1 and Part 2. Unfortunately, I couldn’t find an English version of this session, but you’re welcome to bug Ami about it!).

2 comments

Leave a Reply to #TSQL2sday 136 Wrap-Up: Your Favorite (and Least Favorite) Data Types - Brent Ozar Unlimited® Cancel reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.