Dynamic Search Queries
SQL Injection is a widely known issue with databases that work as back-ends for websites and applications. In the most part, protecting your website from SQL Injection is fairly easy and you can find a lot of resources on that around the internet. Usually, this issue is solved by using stored procedures with strongly-typed parameters, thus ensuring that non-numerical values won’t be entered instead of numerical values, and textual values won’t be broken by single quotes and such.
But, what if “SQL Injection” is actually a part of your website’s normal intended logic?
What I’m talking about is “Dynamic Search Queries”, or “Dynamic Search Conditions”, which is a popular method of building a dynamic SQL command based on a set of optional parameters, and then executing it using sp_executesql.
There are a lot of resources on the internet about how to implement this method, so I won’t go into detail about it here. However, I strongly recommend this article by Erland Sommarskog (SQL Server MVP) about dynamic search conditions. Specifically, read the part about “Dynamic SQL” and “sp_executesql”.
After you read that article, you’d know that in such scenarios you know which parameters to expect and which operands to use per each parameter. This method will be enough for most applications and websites.
But the scenario which I really want to talk about is where you actually don’t know in advance which parameters to expect, and you don’t know in advance the operand to use per each parameter.
TL;DR: Scroll to the bottom of this post for a downloadable script.
Dynamic Search Parameterization
For example, suppose our dynamic search procedure uses the column “CustomerName”. However, there’s only one parameter provided for this column (called @CustomerName) and it’s used with a “contains” kind of predicate:
CustomerName LIKE '%' + @CustomerName + '%'
But, what if I want to give the user the power to choose the kind of operand they want for this parameter? For example, “starts with”?:
CustomerName LIKE @CustomerName + '%'
Or, what if I want to let the user choose several different values to filter? For example:
CustomerName LIKE @CustomerNameStartsWith + '%' AND CustomerName LIKE '%' + @CustomerName_Contains_1 + '%' AND CustomerName LIKE '%' + @CustomerName_Contains_2 + '%'
Now that’s a little more complicated, isn’t it?
Sure, I could add several new parameters to the procedure for the different variations, but eventually it’ll become difficult to maintain, and there will always be a limit to what the user will be able to accomplish (without resorting to SQL Injection) – because there’s a limit to the number of parameters in a stored procedure.
Let’s take a look at another dynamic search example, suppose we have a column “Age” which is of a numerical type (e.g. int). What if instead of “minimum” and “maximum” values, I want to give the user the ability to provide a list of values for this column. For example:
Age IN (5, 8, 10, 20)
I could use a single parameter for this column which I concatenate its value to the query. For example, the value of the parameter @AgeValues will literally be “5, 8, 10, 20” and I’ll concatenate it to the query like this:
SET @SQL = @SQL + N' AND Age IN (' + @AgeValues + N')'
But, this is obviously a major SQL Injection security gap.
Usually, from what I’ve seen, organizations implement such dynamic search solutions within the software code itself – dynamically constructing an Ad-Hoc query and executing it, instead of using a stored procedure with parameters. Sometimes they’d even implement a clever function to parse all the parameters and verify each of them based on its data type.
But all of these dynamic search solutions are either extremely difficult to maintain, or expose the system to dangerous SQL Injection attacks on some level.
The Solution: XML
After I was faced with exactly such a problem, I finally came up with the solution: XML.
First, you’d need two important “utility” tables in your database:
- FilterColumns – You’ll need a table which will define, for your front-end application as well as for your back-end database, which columns can be filtered using the user interface. Each row in this table will represent a table column which can be filtered, including its data type, and a unique identifier (textual or numeric).
- FilterOperands – You’ll also need a table with the available operand types that you can apply on any of the columns (starts with, greater than, equals to, contains, etc.). Each such an operand needs to provide you with information on how to build its syntax and integrate into it the relevant column name and the value used for the filter. Each operand will have a unique identifier (textual or numeric).
Next, you’ll need to design an acceptable “schema format” for your parameter values which you’ll pass on as XML. For example:
<Parameters> <ParameterValue colID="1" operandID="1">John</ParameterValue> <ParameterValue colID="1" operandID="1">Smith</ParameterValue> <ParameterValue colID="2" operandID="2"> <Value>Texas</Value> <Value>New York</Value> <Value>California</Value> </ParameterValue> <ParameterValue coldID="3" operandID="3">1980-01-15</ParameterValue> </Parameters>
The sample XML above assumes that:
- We’re querying from a table of people (e.g. Customers) that have columns with their full name, country and date of birth.
- In the FilterColumns table that we have in the database, the FullName column is identified by “1”, the Country column is identified by “2” and DateOfBirth is identified by “3”.
- In the FilterOperands table, the Contains operand is identified by “1”, the IN operand is identified by “2” and the Greater Than operand is identified by “3”.
The XML will be built in the front-end application and delivered to the database as a parameter for a stored procedure.
If we were to assume that this criteria was parsed in old-fashioned methods, the final query would have looked something like this:
SELECT * FROM Customers WHERE FullName LIKE '%John%' AND FullName LIKE '%Smith%' AND Country IN ('Texas', 'New York', 'California') AND DateOfBirth > '1980-01-15'
As you can probably guess, this is a nightmare in terms of SQL Injection. But, using the method I’m suggesting, the final query will actually look like this:
DECLARE @p1 NVARCHAR(100); SET @p1 = @XMLParameters.value('(/Parameters/ParameterValue)', 'nvarchar(100)') DECLARE @p2 NVARCHAR(100); SET @p2 = @XMLParameters.value('(/Parameters/ParameterValue)', 'nvarchar(100)') DECLARE @p3 TABLE (Value NVARCHAR(100)); INSERT INTO @p3 SELECT X.query('.').value('(/Value)', 'nvarchar(100)') FROM @XMLParameters.nodes('(/Parameters/ParameterValue)' AS T3(X) DECLARE @p4 DATETIME; SET @p4 = @XMLParameters.value('(/Parameters/ParameterValue)', 'datetime') SELECT * FROM Customers WHERE 1=1 AND FullName LIKE '%' + @p1 + '%' AND FullName LIKE '%' + @p2 + '%' AND Country IN (SELECT Value FROM @p3) AND DateOfBirth > @p4
Yeah, it looks a ugly and complicated, but as you can see, there’s no SQL Injection of a parameter value anywhere in the code.
This query is built by parsing the provided parameters in the XML one-by-one.
Using the “colID” attribute, I can make sure I’m filtering only on columns that exist in the FilterColumns table.
Using the “operandID” attribute, I can make sure I’m using only operands from the FilterOperands table with correct syntax.
Using the declaration of the local variables that receive the values from the XML, I make sure the values from the user are not injected into any code, but instead they’re accessed using XML methods.
This method is completely protected from SQL Injection as far as I can see, and it has the potential to provide unlimited dynamic control for the users on how they wish to filter their queries.
I’ve provided sample scripts at the bottom of this post. It contains two script files:
- FilterParseXMLParameters CreateScript.sql
- FilterParseXMLParameters Testing.sql
The implementation is performed using 2 procedures:
1. FilterParseXMLParameters (in FilterParseXMLParameters CreateScript.sql)
This is the “central” parsing procedure which receives the parameter set in XML form, creates the dynamic search query, and returns it as an output parameter.
This procedure is as generic as possible, so one of its parameters is the “base query” as well. Executing this procedure directly from the GUI is obviously a huge security risk for SQL Injection. In general, the same user running the GUI / application should not even have permission at all to execute this procedure, because you can potentially do anything with it using SQL Injection.
2. SampleReportProc (in FilterParseXMLParameters Testing.sql)
This is a sample “secondary” procedure that’s meant to be executed directly from the application. It only receives the XML of parameters as input, uses the central procedure (FilterParseXMLParameters) with a hard-coded “base query”, and then executes the parsed SQL command with sp_executesql.
This procedure is not generic. The idea is to create multiple such procedures, each with a different “base query” which would be hard-coded inside the procedure. And all of these different procedures will use the central FilterParseXMLParameters procedure to perform the parsing itself.
Parameter Sniffing and Recompilations
There is one potential problem with this method – re-compilations.
Since I’m declaring local variables, SQL Server won’t be able to perform “Parameter Sniffing” on the values. This would potentially create incorrect execution plans. The solution for this, is to add OPTION(RECOMPILE) at the end of the query. Of course, this in turn will cause a different problem – recompilations. If there are a lot of concurrent users executing these dynamic queries, it could potentially cause heavy CPU load (depends on the hardware and workload).
On the other hand, we probably want these re-compilations because the values are probably very different between every execution, and we won’t necessarily want the first set of values to determine the execution plan in the plan cache.
In any case, whether the query uses OPTION (RECOMPILE) or not is determined by a boolean parameter in FilterParseXMLParameters called @ForceRecompile. You may change its default value, specify different values in different “secondary” procedures, all depending on your specific needs.
I know this is probably a lot to swallow, but I suggest you download the scripts, review them and try to understand how they work.
It’ll be especially interesting to see if anyone can find a security hole in this method. I like a good challenge 🙂
The scripts are available for download from my GitHub gist here:
As always, if you have any comments, questions, suggestions etc. please post them in the comments below!
This article was originally published by Eitan Blumin on August 2012, at www.madeiradata.com