Many people have read, reviewed and even implemented the SQL style guide that I wrote. This is great, but there have also been a number of commonly held misconceptions or incorrect readings of the points made in the guide. I have decided to address some of the more common ones via a blog post that will, hopefully, clarify the situation.
Basics
A lot of people seem to have a very weird understanding of the basic terms used in the guide so here is what I mean when I say ‘Avoid’ and ‘Try’:
In this context I mean that where it does not make sense (for performance, readability, etc reasons) then don’t follow the guide. Follow it where possible and be mindful that when you deviate you’re adding to tech debt.
I wrote the guide for people to employ with their brains engaged and not just to blindly follow it. It is just a guide at the end of the day and not the law! You may have an exception that the guide says to avoid - try to, but if you can’t then you can’t. Document it and highlight why the problem has been introduced to the application and move on.
Comma positioning
So many people took issue with this - they desperately wanted commas before terms. I like to stick to reading conventions as much as possible as it makes code far more legible to me. In the English language, a comma separated list always places the comma immediately after a term.
SELECT manufacturer, model, engine_size
FROM motorbikes;
Note that commenting out the engine_size
column here would cause a SQL parser error because there
would be a dangling comma left after the model
column.
Many people, when coding SQL, like to place the comma at the beginning of a term as they seem to think it makes it easier to comment out parts of a query.
SELECT manufacturer
, model
, engine_size
FROM motorbikes;
There’s no way of being polite here - this looks hideous, weird and totally unconventional in a bad way.
A proponent of this style would contend that they can now easily comment out the engine_size
or
model
column if they needed to.
SELECT manufacturer
, model
-- , engine_size
FROM motorbikes;
The query still works, yay! This MUST be better. Well until you try to comment out manufacturer
and
realise all you’ve succeeded in doing is moving the problem to the other end of the list!
SELECT -- manufacturer
, model
, engine_size
FROM motorbikes;
Boom! A big SQL parser error because of the stray comma left in front of the model
column. You’d
be asking the parser to run the following broken SQL code:
SELECT , model
, engine_size
FROM motorbikes;
See what I mean now?
So just stick to the English language convention for legible queries and your query will be just as easy to comment out anyway.
Object oriented (OOP)
This is one of my favourite rules but has apparently left quite a few people scratching their heads. Given its importance, it makes sense to cover it briefly here.
Object oriented design principles do not effectively translate to relational database designs—avoid this pitfall
Fortunately, this is pretty simple and shouldn’t require too much explaining, but it is a very an important aspect of the guide and should not be ignored.
In its most simplified form, I am trying to say that you should not design your database with the schema (table structure) dictated by the application level code objects that access it. Do you use an ORM? So what, who cares? Your database certainly shouldn’t!
The database’s primary concern should be with the most efficient storage of data in a relational structure that emphasises normalisation. You can choose to ignore this, but you need to know that you’ve just introduced a pungent smell into your application.
This desire to allow OOP thinking infect the database layer is known by the snappy title “Object-relational impedance mismatch” and there is a nice Wikipedia article of the same name that goes into far more detail on this topic.
id
columns and surrogate keys
Another fun topic that has been bashed to death when discussing the SQL style guide is the banning of surrogate keys. You do not need them in most cases as better keys often exist in the data already.
Of course, like I mentioned before, you can use them at your own peril - we’re discussing a guide here - not the law.
So if you choose to use an ORM that requires them then use them! I have made recommendations for best practice, but you’re free to disregard them. Just because I think that’s foolish doesn’t mean that it is wrong for you or your project. Make your own choices - you’ve been warned, however.
Avoiding vendor/proprietary functions
Many readers have struggled with the recommendation to only use standard SQL in the erroneously believing that the sole reason for this is to promote SQL portability between engines. What about ease of reading for other developers, reducing complexity and portability of you - the developer?
Why you would willingly choose a proprietary solution when a standard SQL method already exists is beyond me. I really do fail to see the problem that readers have here with this recommendation. You’re introducing complexity where it is unnecessary.
One reader was so incensed by this simple rule that they felt unable to continue reading the guide. This, of course, made me giggle at the absurdity of such a stance.
UPDATE 19/11/2024: 8 years on and this is still something that I see developers taking issue with on Hacker News. So, I figured I would expound upon the ideas here in a new article: A note on code portability.
Joins over the river
Another point of contention is the style of putting table joins on the other side of the river (east). There have been a few readers that find this interrupts their reading flow or they don’t like the way it makes the whole query look.
SELECT r.last_name
FROM riders AS r
INNER JOIN bikes AS b
ON r.bike_vin_num = b.vin_num
AND b.engines > 2
INNER JOIN crew AS c
ON r.crew_chief_last_name = c.last_name
AND c.chief = 'Y';
Here are a few reasons for this style of join layout:
- The join is an addition to the
FROM
clause so it makes logical sense for it to exist grouped under it - It provides a nice clean way of specifying join conditions (eg.
AND b.engines > 2
) - They’re an alteration or operation of the
FROM
clause - Join syntax doesn’t play nicely with the river either
Simple.
I really dislike X and Y
That’s perfectly understandable and precisely why I made it so easy to fork the guide. The core of the guide is even a separate Markdown file you can drop into a project repository and start editing it.
The whole idea of the guide is that it gives you a nice set of defaults to work from and a nicely formatted output. You can enjoy these even with a completely different set of guidelines.
Where is the justification?
It beggars belief how many readers claim to have read the guide and yet I still get messages asking for justifications. At the very top, the introductory paragraph clearly states that Celko’s book is the place to find in-depth discussion of each point. Discussions of justification have no place in a style guide anyway.
So that is where you’ll find all the justification you could need. Buy it!
Who do you think you are telling me what to do?
Interestingly, some readers have laboured under the misapprehension that this guide is somehow the law. Far from it, as the name guide suggests these are suggestions or guidelines. Right at the top of the guide, there is a clear link to where it can be forked to suit your tastes. Use it as a handy base and hack away at it!
Conclusion
So the guide has been controversial, but that is just fine. You don’t have to like it or agree with it. All I’d hoped for was that it might get you thinking about standards in your projects and talking about well-formed SQL. If it was adopted by some then all the better of course.
An impetus if you will.