Working with Temporal Data: Representing More Than Just Time
It’s probably fair to assert that time is a central component of every possible database of any interest to anyone. Imagining a database that lacks a time component is tantamount to imagining life without time passing; it simply doesn’t make sense. Without a time axis, it is impossible to describe the number of purchases made last month, the average temperature of the warehouse during the night in July, or the maximum duration that callers were required to hold the line when calling in for technical support.
Although utterly important to our data, few developers commit to really thinking in depth about the intricacies required to process temporal data successfully. Unfortunately, working with time-based data in SQL databases can be troublesome even in the best of cases. And as SQL Server developers we must be especially clever, as the DBMS leaves out a few of the better temporal features specified in the ANSI Standard.
In this and upcoming posts, Data Education founder and trainer Adam Machanic will explore the ins and outs of dealing with time in SQL Server. He will explain some of the different types of temporal requirements you might encounter, and describe how best to tackle some common—and surprisingly complex—temporal queries. In this first post on the topic, Adam discusses the many categories into which temporal data can fall.
When thinking of “temporal” data in SQL Server, the usual idea that springs to mind is a column typed as
DATETIME, representing the time that some action took (or will take) place. However, this is merely one of several possible ways that temporal data can be implemented. Temporal data can be segmented into the following categories:
Instance-based data is typified by the typical
DATETIME column described above. Scenarios in which you might model an instance include the moment a customer walks into a store, the moment a customer makes a purchase, or the moment any other kind of event takes place that you might need to record, or log, into the database. The key factor to recognize is that you’re describing a specific instant in time, based on the precision of the data type you use.
Interval-based data extends on the idea of an instance, describing a start point and an end point. A subset of interval-based data is the idea of a duration. Depending on your requirements, intervals may be modeled using a) two
DATETIME columns, b) a
DATETIME column and another column (usually numeric) that represents the amount of time that passed, or c) only a column representing an amount of time.
Period-based data is similar to interval-based data, but the questions it answers are slightly different. When working with an interval or duration, the question is “how long,” whereas for a period the question is “when?” Examples of periods include, “next month,” “yesterday,” “Labor Day,” and “the holiday season.” Although these are similar to—and can be represented by—intervals, the mindset of working with periods is slightly different and it is therefore important to realize that other options exist for modeling them. (I’ll discuss periods in more detail in a future post about working with calendar tables.)
Bitemporal data is temporal data that falls into any of the above categories but also includes a separate time component (known as a “valid time” or, more loosely, an “as of date”) indicating when the data was considered to be valid. This data pattern is commonly used in data warehouses, both for slowly changing dimensions and for updating semi-additive fact data. When querying the database bitemporally, the question transforms from “on a certain day, what happened” to “as of a certain day, what did we think happened on a certain (other) day?” The question might also be phrased as, “What is the most recent idea we have of what happened on a certain day?” This mindset can take a bit of thought to really understand; I will also discuss bitemporal data in more detail in later blog posts.