SQL Server 2008 was perhaps not the meatiest of SQL Server releases, but it did have one especially promising feature: Data change detection. Or maybe two features, depending on how you look at things. I’m talking, of course, about the almost confusingly similarly named Change Data Capture and Change Tracking.
Change Data Capture was the big gun that promised to solve everyone’s change tracking woes (well at least everyone running Enterprise Edition). In practice it was a heavy, overwrought hack of a feature. Built on the foundations of replication—a feature designed to move data, not record and store changes to it—the entire system was quirky and problematic. Users had to become familiar with odd APIs, low-level transaction identifiers, and the occasional call to sp_repldone for that wonderful moment when the whole thing crashed and burned. But enough about this feature. If you want to know more about it and its future in SQL Server, look it up in the Azure SQL Database documentation.
Change Tracking, on the other hand, was designed to solve only one problem. It did so using a totally integrated approach, with changes recorded as part of the write transaction, first-class T-SQL syntax, and a very simple and well thought out API. The Change Tracking problem? Net changes. Simply put: First give me all of the rows. Then let me periodically ask what’s changed since last time I asked. When I ask, tell me how to synchronize my copy, cache, app, or whatever is on the other end, to get it into the same state as the host data set.
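If you haven’t used it, the Change Tracking workflow looks roughly like the following sketch; dbo.TrackedTable, its pk column, and the @last_sync_version value are hypothetical placeholders, not anything from a real system.

-- Enable Change Tracking at the database level, then on the table.
ALTER DATABASE CURRENT
    SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.TrackedTable
    ENABLE CHANGE_TRACKING;

-- Capture the current version up front; this becomes next time's baseline.
DECLARE @last_sync_version BIGINT = 0;  -- value saved from the previous synchronization
DECLARE @current_version BIGINT = CHANGE_TRACKING_CURRENT_VERSION();

-- Net changes since the last synchronization: one row per key, with the
-- operation (I, U, or D) needed to bring the other side up to date.
SELECT
    ct.pk,
    ct.SYS_CHANGE_OPERATION,
    t.*
FROM CHANGETABLE(CHANGES dbo.TrackedTable, @last_sync_version) AS ct
LEFT JOIN dbo.TrackedTable AS t ON t.pk = ct.pk;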
Fast-forward to SQL Server 2016, and now we have Temporal Tables. This feature is, in many ways, the best of both worlds when it comes to change detection. Temporal Tables is as well integrated into the query engine—and the transaction—as Change Tracking, but captures all changes, much like Change Data Capture. It has a clean and simple interface, and while it’s definitely a v1 technology it seems like a much more solid foundation than either of the SQL Server 2008 features ever did.
This month’s T-SQL Tuesday, hosted by Matt Gordon, asks for solutions to old problems using “shiny new toys.” I wondered—could I solve the Change Tracking net changes problem using Temporal Tables? Unlike its predecessors, net changes is a major use case that Temporal Tables simply doesn’t do right out of the box. Turning on both Temporal Tables and Change Tracking side-by-side is an option, but that seems like overkill. One change detection technology should be enough…or is it?
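For the rest of this post, assume a system-versioned table shaped something like the sketch below. The table name is a placeholder (the queries simply say [table]), but the pk, valid_start_time, and valid_end_time column names match what the queries use, and [other columns] stands in for whatever data columns you have.

CREATE TABLE dbo.SourceTable
(
    pk INT NOT NULL PRIMARY KEY,
    some_data NVARCHAR(100) NULL,  -- stands in for [other columns]
    valid_start_time DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    valid_end_time DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (valid_start_time, valid_end_time)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SourceTableHistory));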
Thinking through the problem, it quickly became apparent to me that once you have all changes—as we have with Temporal Tables—getting to a net scenario is not especially difficult. Given a primary key and a time interval over which we’re concerned about changes, there are only four possible outcomes:
- A new row for the key was inserted
- The row was updated
- The row was deleted
- Nothing happened
For now we can ignore the final outcome and focus on the first three, but of course any proper solution must ensure that the negative case is taken into account so as to not send false changes.
Now consider the following set of propositions, given that we’re asking at time Y for all changes since a prior time X.
- INSERT: The key did not exist at time X but does exist at time Y.
- DELETE: The key existed at time X but does not exist at time Y.
- UPDATE: The key existed at both time X and at time Y, and at least one change occurred between time X and time Y.
Given these assumptions, we can begin work on temporal queries that return the necessary rows. Solving for these conditions will require all rows that were active as of time X, rows that were (or are) active at time Y, and for the final case, all rows that were active in between times X and Y. Since this is a range-based scenario, our best Temporal predicate option will be either FROM or BETWEEN. The difference between these two is subtle: FROM uses an open interval (non-inclusive at both endpoints), whereas BETWEEN uses a half-open interval, inclusive on the end date. Given the choice in this scenario, BETWEEN makes more sense, as we can take advantage of the inclusive endpoint to avoid dropping a badly timed row. But more on that in a moment.
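To make the endpoint behavior concrete, here is a quick side-by-side sketch against the hypothetical dbo.SourceTable defined earlier; the only difference between the two queries is whether a row version that became active exactly at @end_time is returned. The date values are just illustrative.

DECLARE @start_time DATETIME2 = '2017-02-01', @end_time DATETIME2 = '2017-02-14';

-- FROM ... TO: excludes row versions that became active exactly at @end_time.
SELECT t.pk, t.valid_start_time, t.valid_end_time
FROM dbo.SourceTable
    FOR SYSTEM_TIME FROM @start_time TO @end_time AS t;

-- BETWEEN ... AND: includes row versions that became active exactly at @end_time.
SELECT t.pk, t.valid_start_time, t.valid_end_time
FROM dbo.SourceTable
    FOR SYSTEM_TIME BETWEEN @start_time AND @end_time AS t;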
To begin solving for net changes, we must first get all changes over the time interval in question:
SELECT
    t.pk,
    t.[other columns],
    t.valid_start_time,
    t.valid_end_time
FROM [table]
    FOR SYSTEM_TIME BETWEEN @start_time AND @end_time AS t
This query will return all rows with a valid_end_time greater than the passed-in @start_time and a valid_start_time less than or equal to the passed-in @end_time. With that in mind, we can start putting together some expressions for each of the operations in question.
The first step in tackling the insert: Did the key only come into existence after @start_time? The following expression finds out by testing the minimum valid_start_time per key; if it’s after the passed-in @start_time, we know it’s a new key.
CASE WHEN @start_time < MIN(t.valid_start_time) OVER (PARTITION BY t.pk) THEN 1 ELSE 0 END AS is_new_key
A key could have been both inserted and deleted in the interval between @start_time and @end_time—or simply deleted altogether—so a second expression is necessary to determine whether the row still exists at @end_time. The expression below accomplishes this by checking the @end_time against the maximum valid_end_time per key. If the @end_time is greater than or equal to the maximum valid_end_time then the key must have been deleted. (This is where we’re taking advantage of the inclusive endpoint used by the BETWEEN predicate.)
CASE WHEN @end_time >= MAX(t.valid_end_time) OVER (PARTITION BY t.pk) THEN 1 ELSE 0 END AS is_deleted_key
The final case is that of update operations. A ranged Temporal predicate will return at least one row per key that was active during the range. If the query returns more than one row for a given key, some change must have occurred. A simple COUNT will suffice in this case.
CASE WHEN 1 < COUNT(*) OVER (PARTITION BY t.pk) THEN 1 ELSE 0 END AS is_changed_key
Putting this all together, we can come up with a general template query to answer net changes style questions. To make this work I put the above expressions into a derived table, to which I added a ROW_NUMBER. The numbering is used so that we get only a single version for each row – the most recent one. (Consider cases where a row was updated multiple times during the interval.) The query below uses the derived table [x], and applies a WHERE clause that filters the set based on the expressions, in order to output only rows we’re interested in: new, deleted, or changed rows, except in cases where a row was both inserted and deleted in the input range. The CASE expression in the outer SELECT list is used to determine what type of operation will have to be done on the synchronizing side in order to bring the two sets into parity.
SELECT
    x.*,
    CASE
        WHEN x.is_new_key = 1 THEN 'INSERT'
        WHEN x.is_deleted_key = 1 THEN 'DELETE'
        ELSE 'UPDATE'
    END AS change_operation
FROM
(
    SELECT
        t.pk,
        t.[other columns],
        t.valid_start_time,
        t.valid_end_time,
        CASE WHEN @start_time < MIN(t.valid_start_time) OVER (PARTITION BY t.pk) THEN 1 ELSE 0 END AS is_new_key,
        CASE WHEN @end_time >= MAX(t.valid_end_time) OVER (PARTITION BY t.pk) THEN 1 ELSE 0 END AS is_deleted_key,
        CASE WHEN 1 < COUNT(*) OVER (PARTITION BY t.pk) THEN 1 ELSE 0 END AS is_changed_key,
        ROW_NUMBER() OVER (PARTITION BY t.pk ORDER BY t.valid_end_time DESC) AS rn
    FROM [table]
        FOR SYSTEM_TIME BETWEEN @start_time AND @end_time AS t
) AS x
WHERE
    x.rn = 1
    AND
    (
        x.is_new_key = 1
        OR x.is_changed_key = 1
        OR x.is_deleted_key = 1
    )
    AND NOT
    (
        x.is_new_key = 1
        AND x.is_deleted_key = 1
    )
And in theory, that’s that. This query does in fact solve for net changes. But the devil’s in the details, and there is a subtle bug in the above query: What happens if the passed-in @end_time happens to fall at exactly the same time that an update occurred on one of my rows? The answer is that the MAX(t.valid_end_time) for that key will be equal to the passed-in @end_time, and so the logic above will consider the update to be a delete. Clearly not desirable behavior!
The solution? In an update operation, Temporal Tables logs the new row with the same valid start time as the old row’s valid end time. Therefore, to figure out whether an update occurred, we must look forward exactly one row version beyond our actual intended end time. To accomplish this, the following modified version of the query creates a new search end time by bumping the input end time up by 100ns (the finest resolution supported by DATETIME2).
DECLARE @search_end_time DATETIME2 = DATEADD(NANOSECOND, 100, @end_time)

SELECT
    x.*,
    CASE
        WHEN x.is_new_key = 1 THEN 'INSERT'
        WHEN x.is_deleted_key = 1 THEN 'DELETE'
        ELSE 'UPDATE'
    END AS change_operation
FROM
(
    SELECT
        y.*,
        CASE WHEN 1 < COUNT(*) OVER (PARTITION BY y.pk) THEN 1 ELSE 0 END AS is_changed_key,
        ROW_NUMBER() OVER (PARTITION BY y.pk ORDER BY y.valid_end_time DESC) AS rn
    FROM
    (
        SELECT
            t.pk,
            t.[other columns],
            t.valid_start_time,
            t.valid_end_time,
            CASE WHEN @start_time < MIN(t.valid_start_time) OVER (PARTITION BY t.pk) THEN 1 ELSE 0 END AS is_new_key,
            CASE WHEN @end_time >= MAX(t.valid_end_time) OVER (PARTITION BY t.pk) THEN 1 ELSE 0 END AS is_deleted_key
        FROM [table]
            FOR SYSTEM_TIME BETWEEN @start_time AND @search_end_time AS t
    ) AS y
    WHERE y.valid_start_time <= @end_time
) AS x
WHERE
    x.rn = 1
    AND
    (
        x.is_new_key = 1
        OR x.is_changed_key = 1
        OR x.is_deleted_key = 1
    )
    AND NOT
    (
        x.is_new_key = 1
        AND x.is_deleted_key = 1
    )
This query uses @search_end_time in the Temporal predicate, but then does all of the comparison work using the original @end_time. For the is_new_key check, this doesn’t matter, as we’re only comparing start times. For the is_deleted_key check it makes all the difference in the world, as an update done at exactly @end_time will populate a new row which will increase the value of MAX(t.valid_end_time). Because the extended search predicate can return an extra row, the is_changed_key and rn expressions had to be moved to an outer table expression filtered by the original @end_time. If an update occurs exactly at @end_time, we don’t want it to trigger a delete operation now, but we also don’t want to actually see it until next time we synchronize.
That bug solved, we now must consider a second, slightly more important unsolved problem: What do you use for the @start_time and @end_time?
Naïvely speaking, we should be able to ask for a base set of rows by using some long-past date as the @start_time—say, 1900-01-01—and the current SYSUTCDATETIME() as the @end_time. Then we should be able to pass back that same @end_time next time as the start time, and so on and so forth. But that approach will open you up to a major issue: Temporal Tables simply wasn’t designed for this.
Backing up a bit: Change Tracking, which was in fact designed for answering net changes questions, works by placing a surrogate internal transaction identifier on each row touched during a transaction, and then exposing an external transaction identifier that corresponds to the commit time of the transaction. It does this by using an external “transactional commit table” in conjunction with the main changes table. But Temporal Tables uses an entirely different system: there is no external table, and each row is populated with its one and only identifier, serving as both the internal and external identifier, as it is being written. And every row touched by a given transaction gets the exact same time: the start time of the transaction.
So what does this mean when we’re talking about net changes? Pretend for a moment that you start a transaction, jump on a plane, go hang out on the beach in Jamaica for a week (highly recommended), and then return to your office (lightly sunburned), update a table of your choosing, and commit the transaction. The time stamped on the row will correspond to the start of the transaction—a week ago, before you ever boarded the plane. Meanwhile, your synchronization process has been running regularly, let’s say once an hour, and it thinks it’s lagged by only an hour. It’s never going to ask for rows that changed a week ago. This is referred to in change detection parlance as a lost update.
Does this mean we’re completely out of luck? Of course not—but it does mean that a complete solution will have to consider the transactional state of the system. And you’ll have to monitor to make sure that no one starts a transaction and leaves it hanging around for a week. (You should probably be doing that anyway.)
The key to my full solution is the sys.dm_tran_active_transactions DMV. This view contains one row per current transaction—either explicit or implicit—and a handy column called transaction_type which categorizes the transactions as read, read/write, system, or distributed. It also includes a transaction start time, which can be used to figure out which times we might not want to touch just yet, if some transaction is still outstanding from then.
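As a quick illustration, the handful of columns we care about can be pulled like so:

-- Active read/write transactions (transaction_type = 1) and when they began.
SELECT
    transaction_id,
    transaction_begin_time,
    transaction_type
FROM sys.dm_tran_active_transactions
WHERE transaction_type = 1;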
The following expression uses the DMV and is designed to output a safe @end_time that will avoid lost updates. It starts by asking for the minimum start time for any active read/write transactions (transaction_type 1). If there are none, it uses the current SYSUTCDATETIME instead. Just to be on the extra safe side—in case there was a delay in processing due to sitting on a long runnable queue, for example—the expression pulls five seconds off of the end time. Note that this code also needs to convert the local time as reported by the DMV into UTC, as required by our Temporal predicate. There is currently no reasonable way to get the local time zone in T-SQL (there is at least one hack involving a registry read, which I do not recommend), so you’ll have to replace the hardcoded Eastern Standard Time with whatever time zone is appropriate for your server.
CONVERT
(
    DATETIME2(0),
    DATEADD
    (
        SECOND,
        -5,
        ISNULL
        (
            (
                SELECT
                    CONVERT(DATETIMEOFFSET, MIN(transaction_begin_time))
                        AT TIME ZONE 'Eastern Standard Time'
                        AT TIME ZONE 'UTC'
                FROM sys.dm_tran_active_transactions
                WHERE transaction_type = 1
            ),
            SYSUTCDATETIME()
        )
    )
)
The final operation I’ve added to the end date expression is to truncate the fractional (sub-second) precision off of the DATETIME2. The reason for this is that it’s important that subsequent change requests use the exact same start time as the prior end time. If you’re storing the value from your last synchronization in a system that has slightly different date rules than SQL Server, you may lose some precision and wind up with a rounding error. So better, in my opinion, to delay changes by up to one additional second, by ensuring that any possible date/time data type will suffice.
Once all of this is in place on your server, pointing at your table, usage is pretty simple:
- Set a start date of 1900-01-01.
- Set an end date using the expression above.
- Ask for “changes.” That’s your input set. (Similar to Change Tracking’s VERSION.)
- Record the end date, and use it as the start date next time. The same date functions as both end date one time and start date the next because of the BETWEEN predicate’s half-open interval; if a delete operation occurred at exactly the end date, it will be included in that synchronization due to the end date being inclusive, but it will not be included in the next synchronization because the start time comparison is non-inclusive.
- Re-populate the end date each time using the expression. It may be worthwhile to log the start and end dates so that you can detect conditions where the new end date is not much greater than the start date. This would indicate that you might want to either increase your polling interval or figure out why user transactions are running for so long.
- Repeat as needed. Remember that every cache is effectively stale the moment you’re done synchronizing it, so you must strike a balance between staying synchronized enough and not putting excessive stress on the host system trying to keep things up to date. And given the safety measures I’ve taken in the above expression, if you use this system and ask for changes more often than once every six seconds, you’ll tend to not get very far. (An end-to-end sketch of this loop follows.)
That’s it—net changes from Temporal Tables. Not the simplest exercise in the world, and maybe not quite as perfect as we’d like, but completely workable nonetheless. And of course, it’s always fun to put shiny new toys completely through their paces.
Enjoy, and happy synchronizations!
Please update "@end_time <= MAX(t.valid_end_time) OVER (PARTITION BY t.pk) THEN 1 ELSE 0 END AS is_deleted_key" to "@end_time >= …"
@Fanis
Good catch, thanks.
I was not able to get the query to work as published, at least on an Azure SQL database V12, did not try on SQL Server 2016.
This line of code
FROM [table] AS t
FOR SYSTEM_TIME BETWEEN @start_time AND @search_end_time
was causing the following error
Incorrect syntax near ‘FOR’.
I discovered the following will work:
FROM [table]
FOR SYSTEM_TIME BETWEEN @start_time AND @search_end_time as t
@Travis
Thanks for letting me know! Fixed.
How well does this scale? Curious to hear your war stories.
Also curious if you turned this into a dynamic sproc.
I’ve used this in production exactly zero times…this was a purely intellectual exercise!
That said, looking back at the query, I think in general it should scale just fine. The real problem is the partitioned COUNT(*), but if you throw batch mode at it that becomes a nonissue.
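If anyone wants to try that on a pre-2019 box, the usual trick, as far as I know, is an empty table carrying a columnstore index joined into the query so the window aggregates can run in batch mode; a rough sketch with made-up names:

-- Empty table whose only job is to carry a columnstore index.
CREATE TABLE dbo.batch_mode_helper (dummy INT NULL);
CREATE CLUSTERED COLUMNSTORE INDEX ccsi_batch_mode_helper
    ON dbo.batch_mode_helper;

-- Then add a no-op join to the net changes query, for example:
--     LEFT JOIN dbo.batch_mode_helper AS bmh ON 1 = 0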
Let me know if you give it a shot!
Hi Adam,
Well, let me start by saying the first tech talk I saw in Boston was your Data Camp talk YEARS ago. I remember you mentioning off-hand how you used to be a developer but moved to just doing SQL. I’ve made a similar jump recently and your comment was in the back of my mind when I did it!
That said, I’m piloting temporal tables on a project right now. My client has lots of triggers to do audit logging, and internal to the trigger they were checking to see if the update was a real change or a spurious update.
I think the API is really not super great, but definitely was glad to hear you share my grief over both the Change Data Capture API and the Change Tracking API. Remus Rusanu told me straight up the Change Tracking API doesn’t scale, and I independently verified that long polling with multiple C# readers trying to scoop up changes to push into an Elasticsearch document took 300 milliseconds on my first try. That’s not okay in my books. And Change Data Capture was Enterprise only, so a complete non-starter for the SaaS company I worked at. So, I didn’t even know about sp_repldone, but from the sounds of things, I’m glad I have not had the pleasure.
That all said, this is yet another SQL Server API built by computer scientists with zero real world knowledge. Do you know why SQL Server improved the pagination API to the new syntax? It was because the Entity Framework product manager was talking to Frans Bouma, and he told them they have no clue how to code a correct paging implementation using SQL Server’s ROW_NUMBER windowing function. It was an epic flame job where Frans kept saying over and over that computer scientists don’t know how to design practical APIs. Needless to say, once Frans pointed out that even Microsoft’s own SQL gurus could not code a proper pager query abstraction, someone at Microsoft got the point that ROW_NUMBER was a ham-fisted approach.
I’m hoping someone like Frans champions cleaning up this disaster of three attempts Microsoft has made now. The lack of an easy way to do the following is just annoying:
1) No way to compact duplicate rows or specify what a duplicate is (e.g., it makes some sense to change a Version timestamp column on a spurious update, but it really doesn’t have utility for temporal table logging that I can see)
2) The extra columns just complicate creating covering indexes for ORM-generated queries and similar such things. This reminds me of the famous Mitch Hedberg quote: “Pepperidge Farm, that stuff is fancy. It’s wrapped twice. That’s why I don’t buy it. I don’t need another step between me and toast.”
3) The verbose guidance on when to use a columnstore index vs. rowstore is just begging for simplification. Why do we have cost-based optimization in writing SQL DML queries but no cost-based optimization in writing SQL DDL queries? This is why Amazon is eating Microsoft’s lunch and why Microsoft gives more and more Enterprise features away every release. They lack people who understand we are too busy to read 30 pages of documentation when we can just push a button to change Aurora, and then go drink beer. It’s been 10 years since I started using Ola Hallengren’s index maintenance tool compared to the crap “maintenance plans” some intern built. Hopefully someone in the community builds the next Index Maintenance scripts or sp_WhoIsActive.
4) Figuring out how to truncate tables with history tables attached took me 3 days of tinkering, because nowhere is it explained except in an obscure stored procedure in the documentation that doesn’t even ship with the product and that you have to compile yourself. REALLY! I can only imagine the sprint planning at Microsoft when that decision was made. And it’s not even the most efficient approach, as it forces you to write the data to the transaction log. There is NO easy command to truncate a table once you tag it as history versioned.
I’m not sure I see any “disasters” here — not even close. There are implementation difficulties to be certain, but that goes with the territory. Nothing is perfect. I’ve used Change Tracking a great deal and found it to scale just fine…if you know what you’re doing. Remus Rusanu certainly knows what he’s doing, but he and I see the world in very different ways. Even more so Frans Bouma and I. (Search the archives of this blog and you’ll find his name!)
Sometimes you need to dig below the surface, understand internals, and leverage them to get something to work well. Is that a product flaw? Perhaps, but I’ve yet to see any generalized platform — e.g. a programming language or database management product — that doesn’t have its fair share of edge cases. It goes with the territory.
Looking at temporal tables, its implementation in SQL Server has been done per the ANSI standard — which is as it should be. If you dislike the API, your argument is with ANSI, not Microsoft. If it doesn’t work well for ORMs, maybe one could argue that that’s an ORM flaw. At the end of the day it doesn’t matter much. As a developer* who understands SQL Server, I can circumvent the ORM and make it work.
In any case, as with Change Tracking or any other deep engine feature, there are some additional caveats due to the way SQL Server’s various components come together. For example, the fact that “duplicate” updates are considered as updates — that’s just how the product works, and always has. Fixing that is far from trivial. One could argue back and forth on whether or not that should be the case and whether or not the product team should invest in “fixing” it, but there is a lot of history there and for now it’s just something that you need to know when you’re working with SQL Server.
Truncation of temporally-enabled tables is another one. Why can’t that “just work?” Because the product doesn’t work that way. You also can’t truncate a table referenced by foreign keys. Same basic reason. One could call that a flaw or a feature depending on your point of view. In this case I’m very much inclined toward the latter. It makes sense to protect the user from massive cascading operations and I think it’s a perfectly valid and reasonable implementation.
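That said, when you genuinely do need to clear out a temporally-enabled table, the workaround I would reach for (sketched here with hypothetical names) is to switch versioning off, truncate both tables, and switch it back on:

ALTER TABLE dbo.SourceTable SET (SYSTEM_VERSIONING = OFF);

TRUNCATE TABLE dbo.SourceTable;
TRUNCATE TABLE dbo.SourceTableHistory;

ALTER TABLE dbo.SourceTable
    SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SourceTableHistory));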
That’s just me and of course we’re all entitled to our own opinions. If you want to ask Microsoft to change things that you disagree with, you can create some feedback items here: https://feedback.azure.com/forums/908035-sql-server
* By the way, I never stopped being a developer. SQL Server development is still very much development 🙂