Formatting Clear and readable SQL queries

2019-02-05 15:14发布

问题:

I'm writing some SQL queries with several subqueries and lots of joins everywhere, both inside the subquery and the resulting table from the subquery.

We're not using views so that's out of the question.

After writing it I'm looking at it and scratching my head wondering what it's even doing cause I can't follow it.

What kind of formatting do you use to make an attempt to clean up such a mess? Indents perhaps?

回答1:

With large queries I tend to rely a lot on named result sets using WITH. This allows to define the result set beforehand and it makes the main query simpler. Named results sets may help to make the query plan more efficient as well e.g. postgres stores the result set in a temporary table.

Example:

WITH 
  cubed_data AS (
     SELECT 
        dimension1_id,
        dimension2_id,
        dimension3_id,
        measure_id,
        SUM(value) value
     FROM
        source_data
     GROUP BY
        CUBE(dimension1, dimension2, dimension3),
        measure
  ), 
  dimension1_label AS(
     SELECT 
        dimension1_id,
        dimension1_label
     FROM 
        labels 
     WHERE 
        object = 'dimension1'
  ), ...
SELECT 
  *
FROM  
  cubed_data
  JOIN dimension1_label USING (dimension1_id)
  JOIN dimension2_label USING (dimension2_id)
  JOIN dimension3_label USING (dimension3_id)
  JOIN measure_label USING (measure_id)

The example is a bit contrived but I hope it shows the increase in clarity compared to inline subqueries. Named result sets have been a great help for me when I've been preparing data for OLAP use. Named results sets are also must if you have/want to create recursive queries.

WITH works at least on current versions of Postgres, Oracle and SQL Server



回答2:

Boy is this a loaded question. :) There are as many ways to do it right as there are smart people on this site. That said, here is how I keep myself sane when building complex sql statements:

select
    c.customer_id
   ,c.customer_name
   ,o.order_id
   ,o.order_date
   ,o.amount_taxable
   ,od.order_detail_id
   ,p.product_name
   ,pt.product_type_name
from
    customer c
inner join
    order o
    on c.customer_id = o.customer_id
inner join
    order_detail od
    on o.order_id = od.order_id
inner join
    product p
    on od.product_id = p.product_id
inner join
    product_type pt
    on p.product_type_id = pt.product_type_id
where
    o.order_date between '1/1/2011' and '1/5/2011'
and
    (
        pt.product_type_name = 'toys'
     or
        pt.product_type_name like '%kids%'
    )
order by
    o.order_date
   ,pt.product_type_name
   ,p.product_name

If you're interested, I can post/send layouts for inserts, updates and deletes as well as correlated subqueries and complex join predicates.

Does this answer your question?



回答3:

Table aliases and simple consistency will get you a long, long way

What looks decent is breaking lines on main keywords SELECT, FROM, WHERE (etc..).

Joins can be trickier, indenting the ON part of joins brings out the important part of it to the front.

Breaking complicated logical expressions (joins and where conditions both) on the same level also helps.

Indenting logically the same level of statement (subqueries, opening brackets, etc)

Capitalize all keywords and standard functions.

Really complex SQL will not shy away from comments - although typically you find these in SQL scripts not dynamic SQL.

EDIT example:

SELECT a.name, SUM(b.tax)
FROM   db_prefix_registered_users a 
       INNER JOIN db_prefix_transactions b 
           ON a.id = b.user_id
       LEFT JOIN db_countries
           ON b.paid_from_country_id = c.id
WHERE  a.type IN (1, 2, 7) AND
       b.date < (SELECT MAX(date) 
                 FROM audit) AND
       c.country = 'CH'

So, at the end to sum it up - consistency matters the most.



回答4:

Generally, people break lines on reserved words, and indent any sub-queries:

SELECT *
FROM tablename
WHERE value in
   (SELECT *
   FROM tablename2 
   WHERE condition)
ORDER BY column


回答5:

In general, I follow a simple hierarchical set of formatting rules. Basically, keywords such as SELECT, FROM, ORDER BY all go on their own line. Each field goes on its own line (in a recursive fashion)

SELECT 
    F.FIELD1,
    F.FIELD2,
    F.FIELD3
FROM
    FOO F 
WHERE 
    F.FIELD4 IN 
    (
        SELECT 
            B.BAR
        FROM 
            BAR B
        WHERE
            B.TYPE = 4
            AND B.OTHER = 7
    )


回答6:

I like to use something like:

SELECT    col1,
          col2,
          ...
FROM
    MyTable as T1
INNER JOIN
    MyOtherTable as T2
        ON t1.col1 = t2.col1
        AND t1.col2 = t2.col2
LEFT JOIN
    (   
        SELECT 1,2,3
        FROM Someothertable
        WHERE somestuff = someotherstuff
    ) as T3
    ON t1.field = t3.field


回答7:

The only true and right way to format SQL is:

SELECT t.mycolumn        AS column1
      ,t.othercolumn     AS column2
      ,SUM(t.tweedledum) AS column3
FROM   table1 t
      ,(SELECT u.anothercol
              ,u.memaw                  /*this is a comment*/
        FROM   table2       u
              ,anothertable x
        WHERE  u.bla       = :b1        /*the bla value*/
        AND    x.uniquecol = :b2        /*the widget id*/
       ) v
WHERE  t.tweedledee = v.anothercol
AND    t.hohum      = v.memaw
GROUP BY t.mycolumn
        ,t.othercolumn
HAVING COUNT(*) > 1
;

;)

Seriously though, I like to use WITH clauses (as already suggested) to tame very complicated SQL queries.



回答8:

Put it in a view so it's easier to visualize, maybe keep a screenshot as part of the documentation. You don't have to save the view or use it for any other purpose.



回答9:

Indenting certainly but you can also split the subqueries up with comments, make your alias names something really meaningful and specify which subquery they refer to e.g. innerCustomer, outerCustomer.

Common Table Expressions can really help in some cases to break up a query into meaningful sections.



回答10:

An age-old question with a thousand opinions and no one right answer, and one of my favorites. Here's my two cents.

With regards to subqueries, lately I've found it easier to follow what's going on with "extreme" indenting and adding comments like so:

SELECT mt.Col1, mt.Col2, subQ.Dollars
 from MyTable1 mt
  inner join (--  Get the dollar total for each SubCol
              select SubCol, sum(Dollars) Dollars
               from MyTable2
               group by SubCol) subQ
   on subQ.SubCol = mt.Col1
 order by mt.Col2

As for the other cent, I only use upper case on the first word. With pages of run-on queries, it makes it a bit easier to pick out when a new one starts.

Your mileage will, of course, vary.



回答11:

Wow, alot of responses here, but one thing I haven't seen in many is COMMENTS! I tend to add a lot of comments throughout, especially with large SQL statements. Formatting is important, but well placed and meaningful comments are extremely important, not just for you but the poor soul who needs to maintain the code ;)