Premise
I recently ran into a bug in a select
statement in my code. It was fairly trivial to fix after I realized what was going on, but I'm interested in finding a way to make sure a similar bug doesn't happen again.
Here's an example of an offending query:
select
the,
quick,
brown
fox,
jumped,
over,
the,
lazy,
dog
from table_name;
What I had intended was:
select
the,
quick,
brown,
fox,
jumped,
over,
the,
lazy,
dog
from table_name;
For those who don't see it, a comma is missing after brown
in the former. This causes the column to be aliased, because the as
keyword is not required. So, what you get in the result is:
the,
quick,
fox,
jumped,
over,
the,
lazy,
dog
...with all the values of brown
in a column named fox
. This can be noticed pretty easily for a short query like the above (especially when each column has very different values), but where it came up was in a fairly complicated query with mostly integer columns like this:
select
foo,
bar,
baz,
another_table.quux,
a1,
a2,
a3,
a4,
a5,
a6,
a7,
a8,
a9,
a10,
a11,
a12,
a13,
a14,
a15,
a16,
b1,
b2,
b3,
b7,
b8,
b9,
b10,
b11,
b12,
b13,
b14,
b18,
b19,
b20,
b21,
c1,
c2,
c3,
c4,
c5,
c6,
c7,
c8
from table_name
join another_table on table_name.foo_id = another_table.id
where
blah = 'blargh'
-- many other things here
;
Even with better column names, the values are all very similar. If I were to miss a comma after b11
(for example) and then all of the b11
values get called b12
, it's pretty unfortunate when we run the data through our processing pipeline (which depends on these column names in the result). Normally, I'd do select * from table_name
, but what we needed required us to be a little more selective than that.
Question
What I'm looking for is a strategy to stop this from happening again.
Is there a way to require as
when aliasing columns? Or a trick of writing things to make it give an error? (For example, in C-like languages, I started writing 1 == foo
instead of foo == 1
to cause a compile error when I accidentally left out an equal sign, making it the invalid 1 = foo
instead of foo = 1
.)
I use vim
normally, so I can use hlsearch
to highlight commas just so I can eyeball it. However, I have to write queries in other environments quite often, including a proprietary interface in which I can't do something like this easily.
Thanks for your help!
You could wrap your SQL calls in a function that would either:
or
write a comma before the name
vs
also makes it really easy to see the list of sql select arguments
works in any IDE :)
One thing that I've done before is to move the commas to the beginning of the line. This allows some benefits. First, you can instantly see if there are any commas missing. Second, you can add a new column at the end without having to modify the previously last line.
Missing:
Not missing:
I have the same problem that you do. I have used make and the perl script to do a "lint" like check on my code for a long time. It has helped prevent a number of mistakes like this. In the makefile I have:
The perl file is:
I surround my select statements with comments //start-sql and //end-sql. I hope this helps. I have changed the regular expression to reflect how you formatted your SQL as I have been using a different format (with the commas in the front).
As a part of my build/test process I run a set of checks over the code. This is a less than perfect solution but it has helped me.
(I am having a little difficulty with the stackoverflow rich text editor changing my code. Hopefully I will learn how to properly use it.)
If you have columns with similar names, distinguished only by suffix numbers, you've already lost. You have a bad database design.
And most modern developers use SQL generators or ORMs these days, instead of writing this "assembly language" SQL.