My last post in Aaron’s series drew a mixed review from some readers, and I’m sure this one will do the same. But that’s part of the fun!
One of the biggest threats to maintainability is code that’s not properly formatted. When I’m called in by a customer to debug some legacy code, often the first thing I’ll have to do is re-format the code so that I can actually figure out what’s going on. Many of Aaron’s posts have covered readability, but one of the key, key, key things that he hasn’t hit yet is indentation. The secret is simple: Whether you do it the way I like, the way Aaron likes, or some other way completely, just make certain that you do it consistently. Choose a style and stick to it, not just throughout a query but throughout every query you right. This way not only will you be able to read your code, but once they get used to your style so will other people.
The goal of an indentation scheme is to help the reader by grouping similar constructs along the same line. Code is hierarchical; if you consider a procedure, that CREATE PROCEDURE statement is the root node. Every statement in the body of the procedure is a descendant of the CREATE PROCEDURE statement, and each statement has its own descendants–for example, the predicates in the JOINs or the WHERE clause, or the elements in the column list. Improperly indented CASE expressions are a big annoyance to me, and these are similarly hierarchical, with the CASE keyword at the highest level, followed by the WHEN and ELSE conditions, their predicates, and the results of each case. Keeping all of this in mind when writing code makes it easy to see and understand what’s going on even if you’ve never seen a given piece of code before.
Unfortunately, sometimes people get lazy. Or some simply don’t care. I hope that no one reading this post falls into the latter category. As for the former, I hope that this is one area that you won’t be lazy about. There are a number of automated formatters on the market these days–some even free–so there is really no excuse. Still, I see code all the time that is confusingly indented.
Here’s an example of some code that needs work:
SELECT c.AddressLine1, c.FirstName, c.LastName, o.Subtotal, o.ShippingCost, CASE WHEN o.TotalQty > 10 THEN 'big' WHEN o.TotalQty BETWEEN 5 AND 9 THEN 'medium' ELSE 'small' END AS BoxSize FROM dbo.Orders AS o INNER JOIN dbo.Customers AS c ON o.CustomerId = c.CustomerId WHERE o.OrderDate > CONVERT(DATE, GETDATE()-1) AND c.State = 'MA'
If you think I’ve overdone this example, think again. I see code easily this carelessly formatted on a regular basis; I often wonder if the person writing the code was actually trying to confuse the next person who comes along. Where does the CASE expression start and end? How quickly do you notice that there are some predicates used as part of the JOIN condition, and others in a WHERE clause? Did it take you a moment or two to find the end of the SELECT list?
In this case, in my opinion, even no indentation at all would have been preferable to the mess above. In the following example I’ve removed all indentation, while adding a few line breaks to put each element on its own line–something I do when writing my own code.
SELECT c.AddressLine1, c.FirstName, c.LastName, o.Subtotal, o.ShippingCost, CASE WHEN o.TotalQty > 10 THEN 'big' WHEN o.TotalQty BETWEEN 5 AND 9 THEN 'medium' ELSE 'small' END AS BoxSize FROM dbo.Orders AS o INNER JOIN dbo.Customers AS c ON o.CustomerId = c.CustomerId WHERE o.OrderDate > CONVERT(DATE, GETDATE()-1) AND c.State = 'MA'
The important thing is that you pick an style and stick with it. Don’t make a mess, and don’t be like a developer I recently worked with who asked me to help him debug an extremely complex query. I walked over to his desk and found that it was indented much like the code above, so the first thing I did was to reformat it so that we could tell what was actually going on. I asked the developer how he usually formats his code and his response was less than ideal: “I just write it on one line until I feel like pressing Enter. It depends on my mood that day.” True story, I’m sorry to say.
My personal rules are simple: Each item in the SELECT list gets its own line. Or more than one line, in the case of CASE expressions and other complex expressions. Parenthesis that surround expressions sit on their own lines, and the contents are indented, similar to code between braces in C-style languages. And each predicate in a JOIN condition or a WHERE clause gets its own line. A few things I don’t do: I do not indent JOINs beyond the level of the FROM clause, as I feel that they are siblings in the hierarchy. I do not put the ON on the line as the first predicate, as many people do, because I feel that it applies to the entire set of predicates in the JOIN. And I don’t use the commas at the start of the line method because I think it just looks bad.
This style was developed in order to minimize horizontal scrolling, at the expense of some additional vertical scrolling. Some people don’t like it, but it works extremely well for me and people I’ve worked with have become converts. But I’m not telling you (in this post at least) to use my style. I’m just telling you to pick a style, some style, any style, and stick with it religiously.
Below is the same code as above reformatted once more. I think you’ll agree that even if you don’t like exactly how I do things, it’s a lot easier to read and would be easier to maintain.
SELECT c.AddressLine1, c.FirstName, c.LastName, o.Subtotal, o.ShippingCost, CASE WHEN o.TotalQty > 10 THEN 'big' WHEN o.TotalQty BETWEEN 5 AND 9 THEN 'medium' ELSE 'small' END AS BoxSize FROM dbo.Orders AS o INNER JOIN dbo.Customers AS c ON o.CustomerId = c.CustomerId WHERE o.OrderDate > CONVERT(DATE, GETDATE()-1) AND c.State = 'MA'
Enjoy!
My indentation technique is quote similar to yours. The key differences I spot:
(a) I prefer BoxSize = CASE over CASE … AS BoxSize. I still prefer to find the column name easy and not have to figure out where CASE ends to figure out the alias.
(b) I prefer placing the object name on its own line, so instead of:
FROM dbo.Orders AS o
I would have:
FROM
dbo.Orders AS o
Again, I like to have the main parts of the query very visually obvious.
(c) I prefer the ON on the same line as the first predicate. I understand your reasoning for the way you do it, but to me it just reads better my way.
(d) I always terminate my statements with semi-colons! 🙂
Sorry, the display of my comment is subject to weird Community Server rules. There is an extra carriage return in (b); I don’t leave blank lines in queries for the most part, unless I have a huge comment to include and I want to make sure it stands out (for me, or for the next person that comes along).
I totally agree with this. Badly thought out indentation is a pain and that’s why I posted this on Connect the other day:
"Format Document" in SSMS
(https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=496617)
To anyone reading, please click through and vote it up!!
-Jamie
"every query you right"
That should be "every query you write".
Here is a style I have grown to like for readability/maintainability:
SELECT
c.AddressLine1
,c.FirstName
,c.LastName
,o.Subtotal
,o.ShippingCost
,CASE
WHEN o.TotalQty > 10 THEN ‘big’
WHEN o.TotalQty BETWEEN 5 AND 9 THEN ‘medium’
ELSE ‘small’
END BoxSize
FROM dbo.Orders o
INNER JOIN
dbo.Customers c
ON o.CustomerId = c.CustomerId
WHERE o.OrderDate > CONVERT(DATE, GETDATE()-1)
AND c.State = ‘MA’
Well, some of the formatting got lost in my above example but it conveys the idea a bit.
I would like to counter your decision to place commas at the end of the line. I place them in front like so:
SELECT
c.AddressLine1
,c.FirstName
,c.LastName
,o.Subtotal
,o.ShippingCost
,CASE
WHEN o.TotalQty > 10 THEN ‘big’
WHEN o.TotalQty BETWEEN 5 AND 9 THEN ‘medium’
ELSE ‘small’
END AS BoxSize
(I like the added space to replace the comma for the first row, and may or may not add it for things like the END keyword…looks less sloppy).
This provides a few benefits:
1) Easier to read; you can see the start of every line very clearly just by scanning (similar to your logic from another post of why one should use the AS keyword). This becomes especially nice when columns turn into multi-line statements (case statements etc).
2) Easier to re-order columns when refactoring. I find when re-ordering columns that I move the last column out of it’s place (or append another column after it) 10 times more than I move the first column out of it’s place. With commas in front this becomes a simple copy/paste, eliminating two common syntax errors: the one you get for forgetting to add a comma after the previous "last row", and the one you get for leaving a comma after the new "last row".
Guys, I’m not used to make advertising, but in this case I have to write about my experience.
The first tool I install after SSMS on a new laptop is always SQL Refactor (from RedGate). It formats the SQL code the way you want, and also the last SQLPrompt tool has a SQL-reformatting feature (it’s another tool from RedGate).
Every time I have a query written by someone else (or by me before the SQL Refactor era), I reformat it and then I start reading.
Marco, my only issue with workstation-based tools like RedGate’s, as well as it works, is that I have to install (and therefore register) it on every workstation. I use easily a dozen VMs across two desktops and two laptops, and may be working on code on any one of them at any time. It is not always convenient to move to the physical machine where the tool is installed and registered (or RDP to it) in order to re-format the code. Ideally, at some point, I will have enough hooks into my co-workers that all their code comes out looking like mine. 🙂
I completely agree, Adam… poorly formatted code drives me crazy. I used to spend a lot of time updating it manually, but then someone on Twitter pointed me to a free tool, http://www.dpriver.com/pp/sqlformat.htm. Maybe it’ll save you some time and headache, too. 🙂
Any idea when the built in tools from MS will format code consistently? When they can’t do this simple task it makes for more work for us. I get especially frustrated when I manually format some SQL, then choose it and use Design Query in Editor and it blow away my formatting and returns a mess
Adam, I don’t agree that the JOIN is the sibling of FROM. I indent all the main clauses (FROM, GROUP BY, HAVING, WHERE) at one level of indent, and then further indent the portions of that clause. But I strongly agree that consistent formatting is key.
And I agree with Marco – I like SQL Refactor and SQL Prompt.
I love it! I am so on board with this topic.
One thing that hit a personal chord with this post is the need to re-format the T-SQL code. It always amazes me that there are hidden subqueries, derived tables, cases, etc buried in the unformatted code.
One of the worst offenders I’ve seen of unformatted code is in SSIS EXECUTE SQL transforms. It seems like MS didn’t really expect anyone to format the code. It does a miserable time of maintaining formatting that looks good in Query Analyzer.
Some of my personal hot buttons… use of CasE and lack of white space.
Everyone is not going to have the same style… but when I have the opportunity to promote a style I do. The message is to always develop a style.
When I do update a procedure if I have to update a query in a sproc I always put it in my style (since I am the owner of the legacy db). That way I can tell down the road, in addition to header and in-line comments, that I touched it.
Great topic.
I know this article is almost a year old, but I was curious as to what you do for GROUP BY: Do you put every item on a separate line here too?
Hi Charles,
Yes, that’s exactly what I do. My goal is to be as close as possible to 100% consistent — doing the same exact thing in every place. This means that every piece of code I write should always follow the same rules, making it very easy to read once you’ve read one piece of code, and making it so that I don’t have to think about formatting anymore.
So far, so good.
Comments are closed.