
    Exploring the secrets of intermediate materialization


    When working with SQL Server 2000, I used to have this little trick I’d pull out after exhausting all other ideas for tuning a query.  And I thought that my little trick was dead in SQL Server 2005, but thanks to fellow SQL Server MVP Rob Farley, I am officially reviving my trick from the dead here and now, in this blog post.

    … But first, let’s start with an example query.  Here’s the scenario: You work for AdventureWorks, and management has asked you to create a report to find out how many peers each employee in the company has. You see, AdventureWorks management seems to believe that if two employees are managed by the same person, they must have exactly the same job function, and they can do each others’ jobs equally well.  So what they want to do is find out which employees have too many peers (might as well downsize some of that extraneous fluff), and at the same time find out which employees, should they be hit by a bus tomorrow, could be immediately replaced by a colleague. Now, whether management’s belief is utterly moronic is beyond the scope of this post, so dash any such thoughts from your head until you’ve read to the end, and then resume pondering along those lines, which I’m sure will end up putting a smile on your face.

    But smiles are for later.  At this point I’ve managed to go off on a horribly involved tangent, so let’s get back to the query at hand:

    SELECT 
        x.EmployeeId,
        COUNT(*) AS TheCount
    FROM HumanResources.Employee x
    JOIN HumanResources.Employee y ON y.ManagerId = x.ManagerId
    GROUP BY x.EmployeeId

    What we’re doing here is finding all employees managed by the same manager, and then taking a count of those employees.  Yes, it would have made more sense to simply find out how many employees are managed by each manager, but that’s not what management asked for, and management clearly thinks better than you do. So go run the report!

    But what does any of this have to do with query tuning tricks, you ask (while tidying up your resume a bit)?  To answer that question, let’s take a quick peek at the I/Os our query is using, in addition to the query plan.  First, a baseline for the I/Os:

    SELECT *
    FROM HumanResources.Employee

    Result, obtained via Profiler: 20 logical reads.  OK, and now the test query in question: 827 logical reads. Quite a jump for a query that only uses the one table — we’re clearly wasting a lot of resources.  And looking at the query plan, it’s obvious we can do better.  An outer table scan, looped to find the count for each employee — that’s a lot of index operations.
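
    (If you’d rather not fire up Profiler, SET STATISTICS IO reports the same per-table logical read counts in the Messages tab. A minimal sketch follows; exact numbers will vary a bit between AdventureWorks builds.)

    -- Report per-statement, per-table I/O for this session
    SET STATISTICS IO ON;

    SELECT *
    FROM HumanResources.Employee;

    SET STATISTICS IO OFF;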

    A common way to start tuning this kind of query is to move the aggregation into a derived table. After peering at this query for a while, one might come to the conclusion that there’s no reason to aggregate on ManagerId more than once per manager. Why do it once per employee?

    SELECT
        x.EmployeeId,
        y.theCount
    FROM HumanResources.Employee x
    JOIN
    (
        SELECT
            p.ManagerId, 
            COUNT(*)
        FROM HumanResources.Employee p
        GROUP BY p.ManagerId
    ) y (ManagerId, theCount) ON y.ManagerId = x.ManagerId

    Seems better, but upon running it, we see that it produces 819 logical reads and almost exactly the same query plan. Not much of an improvement.  And alas, there’s not much more we can do here.  There just aren’t too many ways to skin this query, and each of them requires some kind of loop, implied or otherwise, to get the count… Right?

    And now we’re almost to “dirty trick” territory.  But let’s first try a not-so-dirty trick. A temp table might eliminate some of the overhead, right?  Then we’ll only have to query the base tables once…

    SELECT
        p.ManagerId, 
        COUNT(*) AS theCount
    INTO #y
    FROM HumanResources.Employee p
    GROUP BY p.ManagerId
    
    SELECT
        x.EmployeeId,
        y.theCount
    FROM HumanResources.Employee x
    JOIN #y y ON y.ManagerId = x.ManagerId

    207 logical reads.  Quite an improvement!  But the join to the temp table is still using a nested loop, and a merge would be so much nicer, wouldn’t it?  A MERGE JOIN hint drops the number of reads to 115, but I still feel that we can do even better.
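
    For reference, here’s one way to express that hint inline (a sketch; note that a join hint of this form also forces the join order, so it’s not free of side effects):

    SELECT
        x.EmployeeId,
        y.theCount
    FROM HumanResources.Employee x
    INNER MERGE JOIN #y y ON y.ManagerId = x.ManagerId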

    Now in SQL Server 2000, at about this point in my query tuning exercise, I might try forcing intermediate materialization of the derived table, sans the temp table, by using TOP 100 PERCENT in conjunction with ORDER BY. Unfortunately, the SQL Server query optimizer team decided that this wasn’t a good idea, and the optimizer now ignores such attempts.
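
    For those who never saw it, the SQL Server 2000 version of the trick looked something like this (a sketch; on SQL Server 2005 the TOP 100 PERCENT / ORDER BY pair is recognized as a no-op and stripped out):

    SELECT
        x.EmployeeId,
        y.theCount
    FROM HumanResources.Employee x
    JOIN
    (
        SELECT TOP 100 PERCENT
            p.ManagerId,
            COUNT(*)
        FROM HumanResources.Employee p
        GROUP BY p.ManagerId
        ORDER BY p.ManagerId
    ) y (ManagerId, theCount) ON y.ManagerId = x.ManagerId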

    And until earlier today, I thought the game was over.  Then I was reminded by Rob that TOP takes a number of rows in addition to a percent.  The trick, then?  Use a bigger number of rows than you’ll ever actually get back… Say, the maximum value for SQL Server’s INTEGER type (2147483647)?

    By applying TOP and ORDER BY within the derived table, we can force SQL Server to perform intermediate materialization of the results.  And by playing with the ORDER BY properly, we can even prompt the optimizer to choose a merge…

    SELECT
        x.EmployeeId,
        y.theCount
    FROM HumanResources.Employee x
    JOIN
    (
        SELECT TOP (2147483647)
            p.ManagerId, 
            COUNT(*)
        FROM HumanResources.Employee p
        GROUP BY p.ManagerId
        ORDER BY p.ManagerId
    ) y (ManagerId, theCount) ON y.ManagerId = x.ManagerId

    And the result of all of this hard labor?  10 logical reads (less than a tenth of the 115 used by the next best method), a merge operation, and if I do say so myself, a very good topic for a blog post.

    The usual caveats apply.  Do not try this at home.  Do not rely on this undocumented behavior.  Do not pass Go.  Do not fail to hire me to tune your databases if this trick doesn’t fix all of your problems.  And, lest I forget, do not waste time reading this blog when management needs that report yesterday!

    Anyway, until next time, enjoy!


    27 COMMENTS

    1. Thank you for a really efficient trick, as dirty as it may be. 🙂

      BTW: On my machine only 4 logical reads are needed for the query, 289 rows returned out of 290.

      ML

    2. Very interesting, thanks a lot!

      BTW when all OLAP functions are implemented, we could also use COUNT() OVER() as follows:

      SELECT
         x.EmployeeId,
      -- no ORDER BY in the PARTITION clause, so the window is the whole partition.
         COUNT(*) OVER(PARTITION BY ManagerId)
      FROM HumanResources.Employee x

    3. That’s a genuine horse-feather [clever]
      Can you explain a little more on the "And by playing with the ORDER BY properly"
      I got lost in that somewhere.
      I’m using sql2k by the way

    4. John,

      A merge join is only possible when both tables are sorted on the join column (in this case, ManagerId).  By specifying the sort order in the subquery and forcing SQL Server to materialize the subquery before performing the join, the optimizer can select a merge join because it has two sets that are sorted on the join column.

      Basically, he added an ORDER BY clause in the subquery (ORDER BY p.ManagerId), as you can see in the final example. Have a look at the query plan with and without the ‘order by’.

    5. Alex: Absolutely — and that works today, in SQL Server 2005.  However, check out the STATISTICS IO output.  775 reads 🙁

      John/Robert: Exactly as Robert says; a merge join works when both input sets are sorted on the same column — but the sort also needs to be in the same order.  Try changing the final example query to the following, then compare the plans and STATISTICS IO output:


      SELECT
         x.EmployeeId,
         y.theCount
      FROM HumanResources.Employee x
      JOIN
      (
         SELECT TOP (2147483647)
             p.ManagerId,
             COUNT(*)
         FROM HumanResources.Employee p
         GROUP BY p.ManagerId
         ORDER BY p.ManagerId DESC
      ) y (ManagerId, theCount) ON y.ManagerId = x.ManagerId

      … and let me know if you have any other questions!

    6. I’m testing running a variety of queries based on your methodology.
      I notice in task manager that I am using primarily one of the four processors on the server.
      Do I assume from this that [a] the query only needs one processor, or [b] the query method coerces SQL Server into using only one processor?
      So…
      Should I ‘encourage’ SQL Server to utilize all four processors when available?
      And if so,
      How do I ‘encourage’ SQL Server to utilize all four processors when available?

      By the way, the queries typically have 20 to 50 million rows, so it’s worth my while to check this out.

      Once again thanks.

    7. Hi John,
      The default maximum degree of parallelism in SQL Server is 0.  That means a query can use all available CPUs.  You can reconfigure this value if you wish.  What you want to consider is whether or not you want one query to have the capability of consuming 100% of your CPU capacity.  You can selectively override this setting using the MAXDOP query hint.
      Check out:
      http://msdn2.microsoft.com/en-us/library/ms181007.aspx
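
      A quick sketch of both knobs (the instance-wide setting via sp_configure, and the per-query hint; the MAXDOP value of 4 below is just an example for a four-processor box):

      -- Instance-wide setting (0 = allow all available CPUs):
      EXEC sp_configure 'show advanced options', 1;
      RECONFIGURE;
      EXEC sp_configure 'max degree of parallelism', 0;
      RECONFIGURE;

      -- Per-query override:
      SELECT x.EmployeeId, COUNT(*) AS TheCount
      FROM HumanResources.Employee x
      JOIN HumanResources.Employee y ON y.ManagerId = x.ManagerId
      GROUP BY x.EmployeeId
      OPTION (MAXDOP 4);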

    8. Hi, sorry for the novice question, but what exactly does "intermediate materialization" mean? I can’t tell from the execution plan. Thank you.

    9. Hi Vlad,
      Not a novice question at all!  It means that the result set logically represented by the derived table (the "intermediate" result — in other words, a result obtained during the process, that will be used to calculate the final result) will be physically stored ("materialized") in TempDB and used from there for the remainder of the query, instead of being queried back from the base tables.  So instead of the two pieces being optimized together, there will actually be two distinct phases to the query.  This can be a benefit in some cases (such as the example shown here), and a major drawback in others.

    10. The same effect can be achieved without upsetting the Query Optimizer Team, by using a common-table expression:
      WITH y (ManagerId, theCount) AS
      (
      SELECT p.ManagerId, COUNT_BIG(*)    
      FROM HumanResources.Employee p
      GROUP BY p.ManagerId    
      )
      SELECT x.EmployeeID, COUNT_BIG(*) AS TheCount
      FROM HumanResources.Employee x
      JOIN y ON y.ManagerId = x.ManagerId
      GROUP BY x.EmployeeId
      …4 logical reads on my machine.
      An interesting technique, though.  My only other concerns with this method are that it relies on optimizer behaviour which may change at any time, and it isn’t terribly apparent to the uninitiated what exactly is going on and why.

    11. Nice catch, Paul.  However, there are other situations in which, unfortunately, a CTE does not save the day.  And in those cases, like it or not, we have to do what we have to do for the sake of achieving adequate performance…

    12. Hi Adam,
      I have come across this trick with materialisation because I have implemented some nasty nested CTEs, aiming for maintainability and readability.  But… I hit a very badly concocted query resolution, in that the CTE hits the underlying tables for every leg of the successive CTEs.  Of course this is not what one wants…  I have not come across any articles addressing this per se and would like to have your thoughts on it – sorry if you have written about it already – I’m still in the Googling exercise at the moment.
      Cheers
      Drikus

    13. Hi Drikus,
      It sounds like a temp table might be the best option in your case?  It’s not always possible to totally outsmart the QO, and in some cases a temp table really is better… From what you describe, you’re already well into the land of the complex, so I would personally keep it as simple as possible and avoid any more tricks.  But that’s just me 🙂

      Actually, what Paul White said (the example with CTEs) does not work, because he unnecessarily groups the result by employee and shows the COUNT(*), which is always 1.
      The correct query is:
      WITH y (ManagerId, theCount) AS
      (
      SELECT p.ManagerId, COUNT_BIG(*)    
      FROM HumanResources.Employee p
      GROUP BY p.ManagerId    
      )
      SELECT x.EmployeeID, y.TheCount
      FROM HumanResources.Employee x
      JOIN y ON y.ManagerId = x.ManagerId
      … but this reports 760 logical reads on my machine.

    15. Thanks jsmano!  I must have had a brain explosion that day – the posted CTE-based code does indeed do what you say.  If I’m allowed one more try, however, I think this slightly-tweaked version does the trick in 6 logical reads.  The ‘intermediate materialisation’ Adam showed is still way cooler though 😉
      WITH y (ManagerId, theCount) AS
      (
      SELECT p.ManagerId, COUNT_BIG(*)    
      FROM HumanResources.Employee AS p
      GROUP BY p.ManagerId    
      )
      SELECT x.EmployeeID,
      (
      SELECT theCount
      FROM y
      WHERE y.ManagerId = x.ManagerID
      ) AS theCount
      FROM HumanResources.Employee x
      JOIN y ON y.ManagerId = x.ManagerId;

    16. The following crime against nature does it in 4 logical reads, without a CTE…
      SELECT x.EmployeeID,
      (
      SELECT COUNT_BIG(*)
      FROM HumanResources.Employee AS e
      WHERE e.ManagerId = x.ManagerID
      ) AS theCount
      FROM HumanResources.Employee AS x
      WHERE x.ManagerID IS NOT NULL

      Time and time again I have wanted a subquery to be materialized, and I generally have to work around this limitation by turning everything inside out or actually creating a temp table.  It’s so silly when the subquery returns, say, 2 records against millions and the wrong thing happens.
      It would be nice to have a ‘materialize’ hint on the subquery, so in the cases where we _really_ do know better, we can apply it.
      I know CTEs are close, but at least in Oracle they still weren’t guaranteed to materialize, and this need often arises after the initial query is built, when a person just needs to optimize by adding a keyword.

    18. crokusek: Unfortunately CTEs are not close at all — they do not provide any kind of materialization in SQL Server either. They’re similar to views, and the query optimizer can (and usually will) rip them apart and optimize the entire query as one unit. That’s where these tricks come into play. Agreed that a "materialize" hint would be handy in lots of cases.
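
      A quick way to see this for yourself (a sketch): reference the same CTE twice and check the plan; you’ll typically find the base table accessed once per reference, because the CTE is expanded inline rather than materialized.

      WITH y (ManagerId, theCount) AS
      (
          SELECT ManagerId, COUNT_BIG(*)
          FROM HumanResources.Employee
          GROUP BY ManagerId
      )
      SELECT a.ManagerId, a.theCount
      FROM y AS a
      JOIN y AS b ON b.ManagerId = a.ManagerId;
      -- The plan typically shows HumanResources.Employee accessed twice,
      -- once for each reference to y.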

      I’m not sure if anyone has noticed this, but the intermediate materialization trick does have its undesired side effects, and I’m not sure whether this is a bug in the query optimizer or if this is an intended behavior.
      For example, if you look at the query plan for the SELECT statement at the bottom, after applying the intermediate materialization trick to the view, you will notice that the filter on TransactionID happens AFTER the TOP operation, instead of a clustered index seek on TransactionID followed by the TOP.
      sample code below
      USE AdventureWorks
      GO
      -- create dummy view
      CREATE VIEW [Production].[vwTransactionHistory]
      AS
      SELECT  TOP 1000000000
      TransactionID ,
             ProductID ,
             ReferenceOrderID ,
             ReferenceOrderLineID ,
             TransactionDate ,
             TransactionType ,
             Quantity ,
             ActualCost ,
             ModifiedDate
      FROM Production.TransactionHistory
      GO
      -- now examine the query plan for the below query
      SELECT  TransactionID ,
             ProductID ,
             ReferenceOrderID ,
             ReferenceOrderLineID ,
             TransactionDate ,
             TransactionType ,
             Quantity ,
             ActualCost ,
             ModifiedDate
      FROM [Production].[vwTransactionHistory]
      WHERE TransactionID = 100004

    20. Hi Brian,
      Yes — that’s the whole point. Applying the "trick" forces the query optimizer down a very specific path. Whether or not that path is -ideal- is a whole other question, and this is not a trick I use very often at all. But it’s definitely nice to have in your bag for when it’s needed.
      –Adam

    21. Just wanted to say many many thanks for this post. You have just taken our customer-facing query from 2 minutes to 2 seconds.
      One multi-million-row data set joined with another, where we were only interested in 13 records that could be filtered for, and there was no other way to generate those 13 records first.

    22. Hello Adam,
      I recently used this technique to resolve an extremely poorly performing query: from more than 4 hours (well, actually, the query did not return after 4 hours) to completing in less than a minute.
      The query used derived table syntax to perform an aggregation.  The first change I made was to replace the derived table syntax with a join.  This improved the query, but still it tanked.
      A second change I made was to "fish" for a better plan.  For this, I forced the join order with the directives OPTION (FORCE ORDER, MAXDOP 1).  Using this fishing technique, the original query returned in about 2 1/2 minutes–this tells me that a better plan is available.
      The final change I made was the use of TOP to strongly suggest intermediate materialization of the derived table.  Now, the query was down to 53 seconds.
      When one sees that the optimizer times out during plan generation/exploration, as in my case, it sometimes needs some help.  This trick, I would say, is more of a "tip".
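
      For reference, those directives simply attach to the end of the statement; a sketch using the article’s running example:

      SELECT
          x.EmployeeId,
          y.theCount
      FROM HumanResources.Employee x
      JOIN
      (
          SELECT p.ManagerId, COUNT(*)
          FROM HumanResources.Employee p
          GROUP BY p.ManagerId
      ) y (ManagerId, theCount) ON y.ManagerId = x.ManagerId
      OPTION (FORCE ORDER, MAXDOP 1);  -- join in written order, serial plan
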
      Thanks,
      LJC

    23. Hi Adam!
      A solution to this can be a multi-statement TVF (the QO cannot break its encapsulation).  Here you get most of the materialization benefits, including readability and so forth; you should even be able to index it with a primary key (though without statistics).
      Works like a blessing; one of my use cases is recursive CTEs with joins to other views on huge tables.
      I was so proud of this solution, even after I learned that someone had already posted it as a workaround at MSConnect
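
      A minimal sketch of that idea against the article’s example (the function name is hypothetical; a multi-statement TVF’s table variable is populated in full before the outer query consumes it):

      -- Hypothetical multi-statement TVF; the @y table variable is
      -- materialized before the outer join can touch it.
      CREATE FUNCTION dbo.fnManagerCounts()
      RETURNS @y TABLE (ManagerId INT PRIMARY KEY, theCount BIGINT)
      AS
      BEGIN
          INSERT @y (ManagerId, theCount)
          SELECT ManagerId, COUNT_BIG(*)
          FROM HumanResources.Employee
          WHERE ManagerId IS NOT NULL  -- PRIMARY KEY disallows NULLs
          GROUP BY ManagerId;
          RETURN;
      END;
      GO

      SELECT x.EmployeeId, y.theCount
      FROM HumanResources.Employee x
      JOIN dbo.fnManagerCounts() y ON y.ManagerId = x.ManagerId;
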
      Constantine
