A few days ago Aaron posted yet another fantastic entry in his Bad Habits series, this one discussing mishandling of date ranges in queries. This is a topic near and dear to me, having had to clean up a lot of poorly thought out code in the past few years. Aaron’s post includes many examples, all of which boil down to the proper way to do the job, a query like the following (which I’ve stolenborrowed from the post):
SELECT COUNT(*) FROM dbo.SomeLogTable WHERE DateColumn >= '20091011' AND DateColumn < '20091012';
This query asks a simple question: How many rows do we have for October 11, 2009? And assuming the presence of an index on DateColumn, the work will be done much more efficiently than the most common anti-pattern I see, converting both the input date and the date column to CHAR(8) using CONVERT format 112 (YYYYMMDD) and comparing the strings. Doing that will cause a table scan, which is definitely a bad habit worth kicking.
But what I want to show in this post is that contrary to what you might understand from working with previous versions of SQL Server, in SQL Server 2008 conversion is not always a bad thing. In conjunction with the addition of the DATE type, the query optimizer received a minor upgrade. Converting both sides of the predicate to DATE will cause the optimizer to do exactly what you probably wanted to begin with: Return all of the events that occurred today (or on whatever input date you specified), by seeking into the index rather than scanning the table.
To see this optimization, start with the following test data:
CREATE TABLE #dates ( the_date DATETIME NOT NULL PRIMARY KEY ); GO WITH numbers AS ( SELECT number FROM master..spt_values WHERE type = 'P' AND number > 0 ) INSERT #dates SELECT DATEADD(hh, -number, GETDATE()) FROM numbers UNION ALL SELECT DATEADD(hh, number, GETDATE()) FROM numbers; GO
… And now ask a question. What happened today? First we’ll ask the wrong way:
SET SHOWPLAN_TEXT ON; GO SELECT COUNT(*) FROM #dates WHERE CONVERT(CHAR(8), the_date, 112) = CONVERT(CHAR(8), GETDATE(), 112); GO SET SHOWPLAN_TEXT OFF; GO --- |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0))) |--Stream Aggregate(DEFINE:([Expr1006]=Count(*))) |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#dates]), WHERE:(CONVERT(char(8),[tempdb].[dbo].[#dates].[the_date],112)=CONVERT(char(8),getdate(),112)))
Oops! An index scan. That won’t do. But if you like writing code like this, all is not lost–you just need to do a different conversion:
SET SHOWPLAN_TEXT ON; GO SELECT COUNT(*) FROM #dates WHERE CONVERT(DATE, the_date) = CONVERT(DATE, GETDATE()); GO SET SHOWPLAN_TEXT OFF; GO --- |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1009],0))) |--Stream Aggregate(DEFINE:([Expr1009]=Count(*))) |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1007], [Expr1008], [Expr1006])) |--Compute Scalar(DEFINE:(([Expr1007],[Expr1008],[Expr1006])=GetRangeThroughConvert(CONVERT(date,getdate(),0),CONVERT(date,getdate(),0),(62)))) | |--Constant Scan |--Clustered Index Seek(OBJECT:([tempdb].[dbo].[#dates]), SEEK:([tempdb].[dbo].[#dates].[the_date] > [Expr1007] AND [tempdb].[dbo].[#dates].[the_date] < [Expr1008]), WHERE:(CONVERT(date,[tempdb].[dbo].[#dates].[the_date],0)=CONVERT(date,getdate(),0)) ORDERED FORWARD)
This plan is quite a bit more complex, but the important things to note are:
- An index seek is used, rather than an index scan, meaning that this query will return the results in a much more efficient manner
- The new query is logically equivalent to the first query
Need more than one day? Use either IN or BETWEEN, both of which produce query plans similar to the above.
SELECT COUNT(*) FROM #dates WHERE CONVERT(DATE, the_date) IN (CONVERT(DATE, GETDATE()-1), CONVERT(DATE, GETDATE())); GO SELECT COUNT(*) FROM #dates WHERE CONVERT(DATE, the_date) BETWEEN CONVERT(DATE, GETDATE()-1) AND CONVERT(DATE, GETDATE()); GO
Please note that I’m not suggesting that this methodology is any better or worse than what Aaron suggested in his post. I simply want you to know your options and–more importantly–understand that the knee-jerk “we must never use functions in the WHERE clause” approach is not always advantageous. The query optimizer has come a long way in recent versions of SQL Server and I suspect we’ll see a lot more in these areas in the coming releases. It will be interesting to see how many of today’s truths cease to be an issue as the optimizer becomes smarter and more adept at fixing user error.
Enjoy!
Yeah there was some follow-up in the comments on the optimizations for DATE. I’m not crazy about using the CONVERT() on the LHS just because it is easy to change the code and forget that the CONVERT() is only used because it is an exception to the rule (I’m having deja vu that we have discussed habits here before).
I also suggest you stop using calculations like GETDATE()-1 as this will break if you do a massive search/replace for sysdatetime(). 🙂
The rows from today.. a little extra writing but specific. Even the MS-Brain can figure it out logically
<code>
declare @date datetime
set @date = ’10/20/2009 23:59:59.999′
SELECT COUNT(*) FROM [TABLE]
WHERE (MONTH(Date_Field) = MONTH(@date) AND YEAR(Date_Field) = YEAR(@date) AND DAY(Date_Field) = DAY(@date))
</code>
Sorry, Dan, but that query will force a table scan to occur. Not a good idea.
Aaron: Yes, that’s why I put the -1 inside the CONVERT 🙂 — in this case it was just much easier to read than DATEADD. I know firsthand the pain of the overloads not being supported; I recently did a conversion of a 2005 DB to 2008, and along the way we changed all of the date/time columns in the database from [SMALLDATETIME with a constraint forcing midnight] to DATE. Suddenly lots and lots of code broke all over the place. Oops.
Placing the -1 inside the convert doesn’t help. This still yields a type clash error when you change GETDATE() to SYSDATETIME(), since it tries to subtract:
CONVERT(DATE, SYSDATETIME()-1)
I understand why you want to skip the DATEADD() syntax to make code less busy, but especially in the context of new DATE/TIME features I think it’s dangerous to do so.
I think Dan missed the link to my date/range post. 🙂
Just want to chime in and add this. In previous versions of SQL Server 2005, a lot of noise was made about datetime data type conversions and I know a lot of code was modified to use someting like below dateadd(dd,datediff(dd,0,getdate()),0). I stumbled across this post by CSS about a performance problem that the above code snippet has inherent problems with low estimates in SQL Server 2005/2008 leading to performance problems. Have any of you ran into this isse?
http://blogs.msdn.com/psssql/archive/2009/07/17/using-datediff-can-query-performance-problems-in-sql-2005-and-2008.aspx
I can see where this could cause an issue if your data distribution is heavily skewed, but in a normal distribution the estimate shouldn’t affect performance by much at all. I haven’t seen the issue yet (though I now know to watch for it). In any case, almost unilaterally when I am performing date range queries, the DATEADD/DATEDIFF calculations aren’t actually part of the query; I apply them to the incoming parameter and/or create new parameters, depending on the other functionality in the procedure. The query itself then uses the variable as opposed to a calculation that happens to use the variable. Yes this can lead to parameter sniffing issues but certainly nothing due to the use of DATEADD/DATEDIFF vs. more primitive methods like converting to a string. The big issue is when you essentially perform one or two calculations for every row, and this is certainly going to lead to more frequent, severe and predictable/preventable performance problems than an estimation problem.
Sankar: Agreed with Aaron; the potential issue seems like an edge case, not a normal situation. I have used the DATEADD/DATEDIFF techniques in dozens of projects–hundreds of queries–and have never seen any issues caused by estimates. It’s definitely good to keep in mind but there are certainly bigger issues to dwell upon!
Neither did I (and/or probably didn’t even noticed) and was curious coming from PSS. Thanks for the feedback.
Wouldn’t it be faster to use the constant phi and the closed form?
Fibonacci(@n INTEGER)
AS
RETURN
EOUND (((POWER (1.6190339887, @n)- POWER (1.0 – 1.6190339887, @n))/ SQRT (5.0)), 0);
untested. Probably ought to cast phi as DOUBLE PRECISION
Joe: Wrong post. I think you meant to be here:
http://sqlblog.com/blogs/peter_larsson/archive/2009/10/18/simple-fibonacci-calculation.aspx
Not sure what EOUND is, either 🙂
wow! that’s great! thanks!
Comments are closed.