Home Uncategorized “Cursors Run Just Fine”

    “Cursors Run Just Fine”

    569
    14

    I found Linchi’s recent post on use of cursors in the TPC-E test to be quite interesting. The question is, why are cursors used in the test when the commonly accepted notion within the SQL Server community is that cursors are a bad thing?

    I’ve posted in the past about situations where cursors were actually faster than set-based queries. But in this case I just don’t see it; cursoring over an input set to do an update? There’s no way that’s going to be faster.

    Greg Linwood commented in Linchi’s post that “indexed cursors run just fine for most purposes”. And although I have loads of respect for Greg and his opinions, I just can’t agree in this case.  I did a few tests on my end just to make sure, and indexed or not, even for the simplest of of queries, cursors perform at least a few times more slowly than their set-based equivalents.  Greg mentioned in this comment that the SQL Server engine executes even set-based queries “internally using cursor style processing”, but a loop in the query processor’s code is clearly not the same as a T-SQL cursor.

    The query processor is optimized internally to process data without having to pass it around to different spots in memory or switch context, whereas with a cursor the data is transferred into local variables and your code has to constantly ask the query processor to go back and get some more. This is extremely expensive, which is why even in my experiments with situations where you can see superior performance with cursors, I found that a SQLCLR cursor–which doesn’t have to do nearly as much work as a T-SQL cursor–is vastly superior.

    I’ll close with a simple example.  The following two batches each run in AdventureWorks, and indexes are irrelevant in both cases.  See for yourself which is faster.

    --Set-based
    SELECT SUM(Quantity)
    FROM Production.TransactionHistory
    GO
    
    
    --Cursor-based
    DECLARE @q INT
    INT @t INT
    SET @t = 0
    
    DECLARE c CURSOR 
    LOCAL FAST_FORWARD
    FOR
    SELECT Quantity 
    FROM Production.Transactionhistory
    
    OPEN c
    
    FETCH NEXT FROM c INTO @q
    
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @t = @t + @q
    
        FETCH NEXT FROM c INTO @q
    END
    
    CLOSE c
    DEALLOCATE c
    
    SELECT @t
    GO 
    
    Previous articleProgrammatic Concurrency Control: Do Simultaneous Updates to Different Columns Constitute a Collision?
    Next articleAnti-Patterns and Malpractices, Volume 1: Tumbling Data
    Adam Machanic helps companies get the most out of their SQL Server databases. He creates solid architectural foundations for high performance databases and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has contributed to numerous books on SQL Server development. A long-time Microsoft MVP for SQL Server, he speaks and trains at IT conferences across North America and Europe.

    14 COMMENTS

    1. This is a controversial topic, but I’ll go on the record to agree with you 100% Adam. Can a cursor be used and still have decent performance? Yes. Is there a faster way to perform the same function using set-based logic or a SQLCLR cursor? Absolutely.
      The things that troubles me with debates over cursors vs set-based logic are:
      1) Often the debate crosses platforms. Other database platforms have focused on tuning the cursor engine in ways that SQL Server just never did.
      2) A lack of understanding of how set-based logic works (or how powerful it is)
      3) Cursor vs Set is only a discussion based on performance, often in an artificial context (such as the TPC tests – yes, I believe that they artificial and not representative of real-world complexities).
      4) "The default result set" (I still find myself referring to it as the ‘firehouse cursor’) is a fast forward read-only cursor anyway. Not quite true. Actually declaring a cursor itself creates a minor performance cost in and of itself.
      I would much rather discuss the use/misuse of cursors or the benefits/complexities of set-based operations within a very specific business context. In my opinion, every benchmark is dependant on the very system it is run on.

    2. There might be very fine-grain differences between placement of data into cursor variable memory addresses & memory addresses used during set based processing, but I doubt these will amount to the differences you’re seeing. You mentioned context switching also, but how exactly does using a cursor create more context switching than using a set based expression? I’ve never measured that specifically but I can’t think of any reason why one would create more context switching than the other.
      Whilst I agree that cursors are measurably slower than set based processing for simple statements such as the example you’ve provided, the game changes as the complexity increases. This is because the more complex an individual set based expression is, the more subject it is to optimisation errors. Cursors provide a useful technique to control internal processing steps without relying on optimisation. Some developers take the approach of breaking down complex statements into multiple statements, employing temp tables etc, but cursors have the advantage of not having to spool to temp tables etc.
      I suspect this might explain why cursors are being used in tpc-e – the overhead associated with the cursor is insignificant compared to the improved physical processing determinism, especially as underlying data / statistics are changing during transaction processing.
      So to sum up my thoughts, cursors do have a slight overhead which can be exaggerated by comparing a simple scenario such as your demo, but there are more significant issues that only come to light in more complex real world scenarios. The real penalty most people bump into with cursors isn’t the use of the cursor per se but the inefficient business rules or unindexed commands within the cursor (or even the cursor expression itself)

    3. Hi Greg,
      Yes, we can use cursors to control query plans and overcome optimizer limitations–as in the case of the running sum examples I’ve posted about previously–but to paraphrase a conference session title I once saw, cursors are "the SQL of last resort."  I would first try to tune my queries with hints or rewriting the SQL, long before ever considering use of a cursor… And I don’t believe that the performance differences would be overshadowed in a more complex example, but I can’t say that I’ve tested much.  Do you have one to share?

    4. I’m sure you’ll appreciate setting up examples that demonstrate optimiser quirks on blogs is a bit too hit & miss to warrant investing much time in. Plus, it’s also fairly common knowledge that complex queries are subject to optimisation errors & that using cursors or breaking queries into discrete expressions are time-tested solutions so not much would be gained by prooving this again here. Plus, I’m currently enjoying some really interesting SQL DownUnder Code Camp presentations today (c:
      The SQL Server community is unfortunately full of self-perpetuating myths & cursors being bad is just another one imo, so the fact that something has been said in a conference doesn’t really carry much weight. The fact that cursors are used extensively in the Oracle community & now that they’re also being used in high end benchmarking by Microsoft are pretty good signs that the gods must be crazy on this topic..

    5. Hi Greg,
      I have worked for several years with applications working on both oracle and SQL Server. I think that while in Oracle there is no performance penalty for using cursors, in SQL server there is. I completely agree that complex queries may all of a sudden become the subject of an optimization error. In some cases that might happen after an upgrade, in other cases after some statistics reached some threshold. Yet I think in SQL Server we need a stronger incentive to use cursors than in Oracle, just because they are less performant.

    6. Greg,
      I don’t think optimizer quirks play any part in this discussion.  I thought the idea was to show whether cursors–when doing exactly the same thing as the QP might do internally–will perform the same.  I think a common ground would be comparing a complex query that uses a LOOP JOIN query hint to a cursor that does the exact same data access in the exact same order.  Do you think that in such a case as that, the cursor and set-based query would have the same or similar performance?

    7. Adam;
      I don’t think your example is a fair one in comparing the set-based approach and the cursor-based approach. There is no question you should use a single SELECT in this particular case. You would be using a single SELECT even with Oracle.  Using a T-SQL cursor in this case just adds more work for no reason, and is of course slower.
      In my past discussions with the Oracle folks that I know on the use of cursors, my sense is that they are implciitly assuming a more complex business logic. A typical example is the processing of an order and its order details. Say if you have a web page that can accept several orders and each order has one or more line items, and once submitted you want to perform rather complex validation business logic inside the database. Because the validation business logic can be rather complex, the solution to the problem is naturally expressed as a loop through the orders and for each order you conduct various validation checks. In this case, there may be no set-based solution for many reasons (for instance, a check may require calling other stored proced procedures).
      So it seems to me there is a continum of problems and solutions with the set-based approach optimal on one extreme and the cursor-based approach optimal on the other extreme. For problems that fall into the two extreme areas, there is little argument as to which approach should be used. But I suspect a large number of problems fall somewhere in the middle. And this is where caution needs to be exercised and where broad stroke statements favoring either approach would be wrong more often than not.

    8. Hi Linchi,
      My point was only to show that cursors are several times slower than equivalent set-based expressions.  Of course there are times to use them–otherwise they wouldn’t be included in the product.  But to say that they have no overhead is misleading at best.

    9. I certainly didn’t say there was "no" overhead, just that they are fine for most purposes. I should explain that my use of "most purposes" really meant "most purposes where they are used" rather than "for most queries".
      btw, your example performs a reporting-style summation over a large number of rows which isn’t exactly the most common type of transaction that typically occurs in OLTPs. The time difference when using cursors for smaller, more typical OLTP queries would be substantially less, & most systems aren’t heavily affected by the loss of the few milliseconds a cursor adds. Hence, I’m not surprised they’re being used in TPC-E – the overhead just isn’t as substantial as you’ve described.
      The biggest problem with cursor code often isn’t just the fact that a cursor has been used, it’s usually an inefficiency within the business rules that are applied within the iterative section of the cursor program. Very commonly, simply adding an index to an internal query which runs within a cursor will solve a performance problem which many would blindly have blamed the cursor for. We see this kind of thing on a daily basis at work & we always try to tune the queries within cursors first before engaging the customer in far more intrusive code re-writes (& the consequential testing).
      Just one more point re Alex’s comments – it sounds like the implementation of cursors in SQL Server isn’t as good as it could be & maybe MS could improve the implementation further. I knew that Oracle cursors were more efficient than SQL Server cursors & that they are far more widely used, but is the overhead really zero with Oracle cursors?
      Cheers,
      Greg

    10. Hi Greg,
      Thanks for the disambiguation.  I agree that for smaller queries it might be difficult to measure the cursor overhead and in those cases it could make sense.
      Isn’t every row-based output in Oracle a cursor ("ref cursor" or something along those lines)?  If I’m correct on that, then it makes sense that they would have tuned the thing a bit more than Microsoft did for SQL Server cursors 🙂

    11. An Oracle REF CURSOR isn’t really similar to a T-SQL cursor. A ref cursor is more like a SQL Server API server cursor. I think the context of discussions here is more on using cursors for implementing business logic inside a database than on using cursors where a client app processes the resultset.
      For complex business logic, a common alternative to avoid T-SQL cursors is to use temp tables. The boundary between ‘set-based’ and ‘procedure-based’ can become rather blurred when there is a lot of use of temp tables.

    12. I discussed this with one of my business partners tonight who is highly experienced with Oracle & he assures me there is a measurable difference in cursors vs set based & that Oracle cursors certainly don’t come without performance degradation. It might be interesting to run a plsql version of your tsql script Adam to verify this. If I can get near an Oracle instance in the near future I’ll give it a go too.
      Apparently Oracle devs are encouraged to write set based expressions where possible but they’re much more relaxed about writing cursors, especially when relatively few rows are being accessed. SQL Server devs on the other hand tend to be outright paranoid about using cursors, for no better reason than the gods pronounce that cursors are always evil.
      Like so many things in IT, I think cursors have a useful purpose & they are fine when used appropriately (when implementing relatively complex business rules within relatively small to medium data sets). The virtually universal bias against using cursors in the SQL Server community is a unwarranted imo.

    13. Hi,
      I appriciate every on for saying that "set based" operations are better than cursors.  I have a question.  If I have to get the data from 10 different tables having say 50 million rows each, as of now I am using a cursor within a cursor to get data from 10 different tables.  However, if I need to use set based operation can you give me an example as to how this should be achieved. It will be great if it can be mailed to ramanadk@gmail.com
      Thanks

    14. Ramanadk: I suggest you pick up a good book on basic SQL.  I like Joe Celko’s "SQL For Smarties", but there are many choices out there.

    Comments are closed.