Tell me if this situation sends a chill down your spine: You’ve written a stored procedure, tested it against a variety of inputs, and finally rolled it out in production. All is well… Or so you think. You start getting complaints from some users that it’s taking forever to return. But other users are having no problem. What the..?
Veteran DBAs will know right away what’s going on (even without reading the title of this post!) — but for those of you who haven’t had the pleasure of debugging these kinds of things, the answer is that cached execution plans are not always as wonderful for performance as we might like.
For any given query, there are numerous possible execution plans that the query optimizer can come up with. Some of them are optimal, some are less than optimal. But in the end, it’s the job of the query optimizer to decide which one to use (hopefully, the optimal one). If a stored procedure is executed and its does not have a query plan in cache, whatever execution plan the optimizer decides to use will be cached for next time. This is usually a good thing — it can be quite a bit of work for the optimizer to make that decision.
But in some cases, this is where the trouble begins. One of the main factors the optimizer uses is index statistics vs. what parameters are being used for the query. This can greatly affect what the ‘correct’ execution plan is — the optimizer must decide such things as which index should be used, whether a seek or a scan should be performed, what types of joins are most efficient, etc. But as parameters change, so can the most appropriate choices.
To illustrate this better, some sample data will be useful. Break out your numbers table and run the following script, which will create a table with three columns, around 20 million rows, and a couple of indexes…
SELECT Number, DATEADD(ss, Number, 0) AS TheDate INTO DateTbl FROM Numbers DECLARE @Num INT, @Incr INT SELECT @Num = MAX(Number) + 1, @Incr = MAX(Number) + 1 FROM DateTbl WHILE @Num < 20000000 BEGIN INSERT DateTbl (Number, TheDate) SELECT Number + @Num, DATEADD(ss, Number + @Num, 0) AS TheDate FROM Numbers SET @Num = @Num + @Incr END CREATE UNIQUE CLUSTERED INDEX IX_Date ON DateTbl(TheDate) CREATE UNIQUE NONCLUSTERED INDEX IX_Number ON DateTbl(Number) ALTER TABLE DateTbl ADD AnotherCol VARCHAR(40) NULL UPDATE DateTbl SET AnotherCol = CONVERT(VARCHAR, Number) + CONVERT(VARCHAR, TheDate)
Okay! Now that your hard drive’s workout is done, let’s take a look at what we have… DateTbl has three columns: A sequential number, a datetime column, and a character column. You should have one row for every second between January 1, 1900 and sometime around August 21, 1900, depending on how big your numbers table is. The date column and the number column are indexed (we’ll be using those as predicates in the WHERE clause of the example queries), but the character column is not. That’s on purpose, to force a bookmark lookup. What can I say — this is a totally contrived example!
Put Query Analyzer into Show Execution Plan mode and check out the following:
SELECT Number, TheDate, AnotherCol FROM DateTbl WHERE TheDate BETWEEN '19000201 09:35:00' AND '19000201 09:36:00' OR Number = 10
Before I proceed, I would just like to say that anyone who comments or e-mails me saying that this query can be re-written with a UNION to get consistently better execution plans will be slapped upside the head with a trout. YES, this is a bad query, but as I said, this is a very simple example. In real life, these situations are usually much more difficult to re-write. So if you don’t like my example, go write your own article!
… Now that that’s taken care of …
The execution plan you should see will have a seek on each index. Makes sense — we’re looking at a very small chunk of data in each place. But what if we change the query to use a much larger date range?
SELECT Number, TheDate, AnotherCol FROM DateTbl WHERE TheDate BETWEEN '19000101' AND '19000201' OR Number = 10
A seek on the date index no longer makes sense. A range scan is a better option. And why bother seeking on the Number column? The row with the number 10 is already found within the selected range. SQL Server agrees with me on this, and performs only a scan of the clustered date index.
But now let’s see what happens when we throw this into a stored procedure. Create the following:
CREATE PROCEDURE GetStuff @StartDate DateTime, @EndDate DateTime, @Number INT AS SELECT Number, TheDate, AnotherCol FROM DateTbl WHERE TheDate BETWEEN @StartDate AND @EndDate OR Number = @Number
… And run the following in Query Analyzer with Show Execution Plan turned on …
EXEC GetStuff '19000201 09:35:00', '19000201 09:36:00', 10
Same execution plan as before! That’s great, right? Well…
EXEC GetStuff '19000101', '19000201', 10
… … …
This is taking a while …
Enjoy the break? Good, now get back to work!
Check out the execution plan. I guess the cached one wasn’t optimal for the second query. So how do we satisfy BOTH sets of arguments?
One way is to force the stored procedure to recompile each time:
ALTER PROCEDURE GetStuff @StartDate DateTime, @EndDate DateTime, @Number INT WITH RECOMPILE AS SELECT Number, TheDate, AnotherCol FROM DateTbl WHERE TheDate BETWEEN @StartDate AND @EndDate OR Number = @Number
You’ll notice that I added WITH RECOMPILE. And while that’s probably not a big deal for this example stored procedure, it isn’t a good idea for the types of really complex stored procedures where these problems crop up in the real world. Recompilation can be quite intensive, and I really don’t want it happening every time an active stored procedure is called.
But you already knew that wasn’t the solution, because in elementary school you were taught how to read, and the title of this article isn’t “Controlling Stored Procedure Caching with … WITH RECOMPILE”.
No, instead the title is, “Controlling Stored Procedure Caching with … Dyanmic SQL?!?”
Yes, dynamic SQL. If you don’t know about dynamic SQL, go read this article right now and come back when you’re finished.
You may have heard about a system stored procedure called sp_executesql. It lets you evaluate dynamic SQL, but it happens to also cache its execution plan. In addition, due to the fact that it accepts parameters, it makes SQL injection nearly impossible if correctly used. So it’s good stuff. We could evaluate our test query using sp_executesql like this:
DECLARE @SQL NVARCHAR(300) SET @SQL = '' + 'SELECT ' + 'Number, ' + 'TheDate, ' + 'AnotherCol ' + 'FROM DateTbl ' + 'WHERE TheDate BETWEEN @StartDate AND @EndDate ' + 'OR Number = @Number' EXEC sp_executesql @SQL, N'@StartDate DATETIME, @EndDate DATETIME, @Number INT', @StartDate = '19000201 09:35:00', @EndDate = '19000201 09:36:00', @Number = 10
… And that’s just wonderful, but it gives us absolutely nothing, because if you re-run it with the other parameters you’ll find that you have the same problem as the stored procedure version:
DECLARE @SQL NVARCHAR(300) SET @SQL = '' + 'SELECT ' + 'Number, ' + 'TheDate, ' + 'AnotherCol ' + 'FROM DateTbl ' + 'WHERE TheDate BETWEEN @StartDate AND @EndDate ' + 'OR Number = @Number' EXEC sp_executesql @SQL, N'@StartDate DATETIME, @EndDate DATETIME, @Number INT', @StartDate = '19000101', @EndDate = '19000201', @Number = 10
Time for another coffee break…
But let us not lose hope yet, because we’re still in the article that’s talking about how to control caching and recompilation and you know that I wouldn’t have written this article if I didn’t know the answer.
So what’s really being cached here? Let’s take a look:
SELECT sql FROM master..syscacheobjects WHERE sql LIKE '%datetbl%' AND cacheobjtype = 'Executable Plan' ----------------------------------------------------- (@StartDate DATETIME, @EndDate DATETIME, @Number INT) SELECT Number, TheDate, AnotherCol FROM DateTbl WHERE TheDate BETWEEN @StartDate AND @EndDate OR Number = @Number
The cached plan is cached for not just the query, but also a parameter list — and not just any parameter list, but the very parameter list that was passed in to sp_executesql. So how could we force SQL Server to cache a different plan for the same query?
… Change the parameter list!
The parameter list, of course, is correlated to the actual parameters passed in. But what you may not realize is that if you satisfy a parameter within the list, sp_executesql will not expect a correlated parameter to be passed in. For instance, the following is perfectly valid:
DECLARE @SQL NVARCHAR(300) SET @SQL = '' + 'SELECT @TheParam AS TheParam' EXEC sp_executesql @SQL, N'@TheParam VARCHAR(100) = ''This is the param'''
Not only that, but it’s been cached:
SELECT sql FROM master..syscacheobjects WHERE sql LIKE '%param%' AND cacheobjtype = 'Executable Plan' ----------------------------------------------------- (@TheParam VARCHAR(100) = 'This is the param') SELECT @TheParam AS TheParam
So what happens if we change our parameter’s value?
DECLARE @SQL NVARCHAR(300) SET @SQL = '' + 'SELECT @TheParam AS TheParam' EXEC sp_executesql @SQL, N'@TheParam VARCHAR(100) = ''This is the other_param'''
Same query, but…
SELECT sql FROM master..syscacheobjects WHERE sql LIKE '%other_param%' AND cacheobjtype = 'Executable Plan' ----------------------------------------------------- (@TheParam VARCHAR(100) = 'This is the other_param') SELECT @TheParam AS TheParam
Yes, a second cached execution plan! Exciting, isn’t it? Kind of like winning the lottery, only even better, because you don’t have to worry about how to spend all of that extra cash!
So how do we put this all together? A quick recap: We know that the query requires at least two execution plans; one for big date ranges, and one for smaller date ranges. There might be more, but we haven’t tested that, so I’ll leave it as an exercise for the reader. We also know that sp_executesql will cache a second, third, or Nth execution plan whenever the parameter list is changed. So all we need to do is change the parameter list depending on the inputs…
ALTER PROC GetStuff @StartDate DATETIME, @EndDate DATETIME, @Number INT AS DECLARE @SQL NVARCHAR(300) DECLARE @Params NVARCHAR(100) SET @SQL = '' + 'SELECT ' + 'Number, ' + 'TheDate, ' + 'AnotherCol ' + 'FROM DateTbl ' + 'WHERE TheDate BETWEEN @StartDate AND @EndDate ' + 'OR Number = @Number' IF DATEDIFF(hh, @StartDate, @EndDate) <= 2 SET @Params = '@StartDate DATETIME, @EndDate DATETIME, @Number INT, @dX1 INT = 1' ELSE SET @Params = '@StartDate DATETIME, @EndDate DATETIME, @Number INT, @dX1 INT = 2' EXEC sp_executesql @SQL, @Params, @StartDate, @EndDate, @Number
Pretending that we’ve actually tested for the correct thresholds (which you should do if you use this technique), you’ll notice that we’re forcing a different execution plan if the time between start date and end date is less than or equal to two hours (that will be an index seek) or more than two hours (that will be an index scan).
Since forcing evaluation of a new execution plan in this case is simply a matter of changing the value of @dX1, you can add as many conditions as necessary to control which cached plan is used for any given set of arguments. Two hours is almost certainly not the best choice here, but really, does it matter?
So in conclusion, blah, blah, blah… No one reads this far, you stopped after you saw the final stored procedure, didn’t you? Have a nice day, and enjoy your new, more dynamic stored procedures.
"So in conclusion, blah, blah, blah… No one reads this far" Not true, but nice article anyway. I think you might be on the track of what I’m trying to figure out. Currently I’m comparing two stored procedures (old and new). They do the same thing but in different ways. The first time I run them new runs in about 1100 ms and old in about 1600 ms. However, on the second run and all runs after that new runs about 400 ms and old runs at about 100 ms. What I’m trying to do is find something I can do to prevent this optimization on the second, third, etc runs. I’ve tried running them with recompile but that doesn’t change this behavior, it only seems to increase all my times in a static fashion (by about 200 ms). Ideally, I’d like all my runs to report like the first run does, but I can’t seem to find a way to prevent caching, or to clear the already created cache. Any ideas? Note: this is for testing purposes not production purposes.
With a little further googling it seems the answer to my problem is using sp_recompile.
Zach: I don’t use sp_recompile anymore, and this article is probably somewhat out-dated as well; SQL Server 2005’s OPTION(RECOMPILE) enhancement allows us to be much more precise with the whole thing. Well, most of the time at least. There are still occasions where I need to play a game or two, but it’s much less common than it used to be.
Adam,
Great post but does this mean that I could be bloating up my plan cache using Dynamic SQL in my current scenario below?
I have a stored proc that takes one string as input, the string value could be one out of the three: 1)First Name 2)Last Name 3)Phone. The input value is evaluated inside the stored proc with different OR clauses trying to match the input against the first/last columns or phone column.
What I’m saying is will end up lets say with 10 different execution plans if I searched 10 different phone numbers?
Thanks
@SQLnbe
As long as you’re parameterizing the dynamic SQL and using sp_executesql to run things, there’s no problem — you’ll wind up with at most 3 plans. If, OTOH, you’re concatenating, then YES, you are certainly causing bloat to occur.
–Adam
@Adam,
Thanks for clarifying this and one last question why would I see the same bucketID twice (cached twice) with a slightly different size but same bucketID for the Proc.
This didn’t seem to work for me… so I tried adding extra variables to @params, changing the variable names etc… and always got one plan in cache… still haven’t figured out a way to do it!
@Gareth
Which version of SQL Server? I wrote this a long, long time ago, and haven’t tried it in ages. I’m happy to give it a shot on this end if you can share some more information.
Note that when I wrote this there was no such thing as OPTION (RECOMPILE). I feel that feature has really eliminated much of the need for this kind of thing.
–Adam
Comments are closed.