Formatting Clear and readable SQL queries

2019-02-05 15:14发布

I'm writing some SQL queries with several subqueries and lots of joins everywhere, both inside the subquery and the resulting table from the subquery.

We're not using views so that's out of the question.

After writing it I'm looking at it and scratching my head wondering what it's even doing cause I can't follow it.

What kind of formatting do you use to make an attempt to clean up such a mess? Indents perhaps?

11条回答
霸刀☆藐视天下
2楼-- · 2019-02-05 15:25

Indenting certainly but you can also split the subqueries up with comments, make your alias names something really meaningful and specify which subquery they refer to e.g. innerCustomer, outerCustomer.

Common Table Expressions can really help in some cases to break up a query into meaningful sections.

查看更多
Luminary・发光体
3楼-- · 2019-02-05 15:26

I like to use something like:

SELECT    col1,
          col2,
          ...
FROM
    MyTable as T1
INNER JOIN
    MyOtherTable as T2
        ON t1.col1 = t2.col1
        AND t1.col2 = t2.col2
LEFT JOIN
    (   
        SELECT 1,2,3
        FROM Someothertable
        WHERE somestuff = someotherstuff
    ) as T3
    ON t1.field = t3.field
查看更多
The star\"
4楼-- · 2019-02-05 15:28

With large queries I tend to rely a lot on named result sets using WITH. This allows to define the result set beforehand and it makes the main query simpler. Named results sets may help to make the query plan more efficient as well e.g. postgres stores the result set in a temporary table.

Example:

WITH 
  cubed_data AS (
     SELECT 
        dimension1_id,
        dimension2_id,
        dimension3_id,
        measure_id,
        SUM(value) value
     FROM
        source_data
     GROUP BY
        CUBE(dimension1, dimension2, dimension3),
        measure
  ), 
  dimension1_label AS(
     SELECT 
        dimension1_id,
        dimension1_label
     FROM 
        labels 
     WHERE 
        object = 'dimension1'
  ), ...
SELECT 
  *
FROM  
  cubed_data
  JOIN dimension1_label USING (dimension1_id)
  JOIN dimension2_label USING (dimension2_id)
  JOIN dimension3_label USING (dimension3_id)
  JOIN measure_label USING (measure_id)

The example is a bit contrived but I hope it shows the increase in clarity compared to inline subqueries. Named result sets have been a great help for me when I've been preparing data for OLAP use. Named results sets are also must if you have/want to create recursive queries.

WITH works at least on current versions of Postgres, Oracle and SQL Server

查看更多
聊天终结者
5楼-- · 2019-02-05 15:28

Put it in a view so it's easier to visualize, maybe keep a screenshot as part of the documentation. You don't have to save the view or use it for any other purpose.

查看更多
走好不送
6楼-- · 2019-02-05 15:29

Table aliases and simple consistency will get you a long, long way

What looks decent is breaking lines on main keywords SELECT, FROM, WHERE (etc..).

Joins can be trickier, indenting the ON part of joins brings out the important part of it to the front.

Breaking complicated logical expressions (joins and where conditions both) on the same level also helps.

Indenting logically the same level of statement (subqueries, opening brackets, etc)

Capitalize all keywords and standard functions.

Really complex SQL will not shy away from comments - although typically you find these in SQL scripts not dynamic SQL.

EDIT example:

SELECT a.name, SUM(b.tax)
FROM   db_prefix_registered_users a 
       INNER JOIN db_prefix_transactions b 
           ON a.id = b.user_id
       LEFT JOIN db_countries
           ON b.paid_from_country_id = c.id
WHERE  a.type IN (1, 2, 7) AND
       b.date < (SELECT MAX(date) 
                 FROM audit) AND
       c.country = 'CH'

So, at the end to sum it up - consistency matters the most.

查看更多
ら.Afraid
7楼-- · 2019-02-05 15:29

The only true and right way to format SQL is:

SELECT t.mycolumn        AS column1
      ,t.othercolumn     AS column2
      ,SUM(t.tweedledum) AS column3
FROM   table1 t
      ,(SELECT u.anothercol
              ,u.memaw                  /*this is a comment*/
        FROM   table2       u
              ,anothertable x
        WHERE  u.bla       = :b1        /*the bla value*/
        AND    x.uniquecol = :b2        /*the widget id*/
       ) v
WHERE  t.tweedledee = v.anothercol
AND    t.hohum      = v.memaw
GROUP BY t.mycolumn
        ,t.othercolumn
HAVING COUNT(*) > 1
;

;)

Seriously though, I like to use WITH clauses (as already suggested) to tame very complicated SQL queries.

查看更多
登录 后发表回答