Why (and how) to split column using master..spt_va

2019-01-04 02:31发布

Subquestioning the answer to question "Split one column into multiple rows" which I re-wrote here as [ 1 ].

What is the (meaning of) Type = 'P' and why to use undocumented master..spt_values for splitting a column? What is the benefit of it?


[ 1 ]

CREATE TABLE dbo.Table1 
(
    Col1        CHAR(1),
    Col2        CHAR(1),
    Col3        CHAR(1),
    Col4        VARCHAR(50)
)
GO

INSERT INTO dbo.Table1 VALUES ('A','B','C','1,2,3')
GO
INSERT INTO dbo.Table1 VALUES ('D','E','F','6,7,8,9')
GO


SELECT
    T.col1, RIGHT(LEFT(T.col4,Number-1),
    CHARINDEX(',',REVERSE(LEFT(','+T.col4,Number-1))))
FROM
    master..spt_values,
    table1 T
WHERE
    Type = 'P' AND Number BETWEEN 1 AND LEN(T.col4)+1 AND
    (SUBSTRING(T.col4,Number,1) = ','
    -- OR SUBSTRING(T.col4,Number,1)  = '') --this does not work correctly anyway

Related question:

5条回答
我命由我不由天
2楼-- · 2019-01-04 02:48

Purpose

Why use undocumented master..spt-values

Sybase, and therefore its bastard son MS SQL, provide various features and functions for the product, that is implemented in system procedures (as opposed to the binaries like sqlserver, which are started as a service). These system procedures procedures are written in SQL code and named sp_%. Except for some secret internals, they have the same limitations and needs as any other SQL code. They are part of the Sybase ASE or SQL Server product. As such, they are not required to document it; and the internal bits cannot be reasonably labelled as "undocumented".

master..spt_values contains all the various bits and pieces that the said system procedures need, in an SQL table, to produce the various reports. The sp means system procedure; spt means tables for system procedures; and of course values is the content.

Lookup Tables

What is the (meaning of) Type = 'P'

People often describe spt_values as "de-normalised", but that is the incorrect term. The correct term is folded, or packed. It is 26 or so logical Lookup tables, each beautifully Normalised, folded into one physical table, with a Type column to differentiate the logical tables.

Now in a normal database, that would be a gross error (just look at the answers for "one lookup table or many"). But in a server catalogue, it is desirable, it replaces 26 physical tables.

  • "L" stands for for LockType Lookup; "V" stands for DeviceType Lookup (V is short for Device throughout the server); etc. Type "P2" contains bitwise ordinals, for the expansion of bits that are packed into an INT.

  • A set of consecutive numbers within known bounds, that is available in the form of an SQL table is required, in order to perform a Projection, which many of the system procedures have to do. Type "P" is a list of consecutive numbers between 0 and 2047.

  • The term Projection is used here as the technically precise meaning, the natural logical sense, not the relational algebra meaning, which is unnatural.

There is therefore just one purpose for spt_values, to contain 26 folded, otherwise separate, Reference tables, and one Projection table.

Expansion

The ordinary use of spt_values then, is as an ordinary Lookup or Reference or ENUM table. First, the Lookup values:

    SELECT *                    -- list Genders
        FROM Gender 

It is used in the same way that Person has a GenderCode that needs to be expanded (very expanded, these freaky days):

    SELECT  P.*,                -- list Person
            G.Name              -- expand GenderCode to Name
        FROM Person P
        JOIN Gender G
            ON P.GenderCode = G.GenderCode

Eg. sp_lock produces a report of active locks, displaying lock types as string names. But master..syslocks contains lock types as numbers, it does not contain those names; and if it did, it would be a badly denormalised table ! If you execute the query (Sybase ASE code, you will have to convert):

    SELECT *                    -- list LockTypes
        FROM master..spt_values 
        WHERE type = "L"

you will notice 66 LockType numbers and names in the Lookup table. That allows sp_lock to execute simple code like Person::Gender above:

    SELECT  spid,               -- list Active Locks
            DB_NAME(dbid),
            OBJECT_NAME(id, dbid),
            v.name,             -- expand lock name
            page,
            row
    FROM master..syslocks   L,
         master..spt_values LT
    WHERE L.type = LT.number    -- 
    AND   type = "L"            -- LockType Lookup table
    ORDER by 1, 2, 3, 4, 5, 6   -- such that perusal is easy

Projection

What is the (meaning of) Type = 'P' ?

What is Projection and how is it used ?

Say, for example, instead of the active locks produced by the query above, you wanted a list of all 66 LockTypes, showing the number of active locks (or Null). You don't need a cursor, or a WHILE loop. We could Project the LockType Lookup table, through the count of active locks:

    SELECT  LT.name,            -- list LockTypes
            [Count] = (         -- with count
        SELECT COUNT(*)
            FROM master..syslocks
            WHERE type = LT.number
                )
        FROM master..spt_values LT
        WHERE type = "L"

There are several methods, that is just one. Another method is to use a Derived Table instead of the Subquery. But you still need the Projection.

That is typically what spt_values is used for, either Expansion or Projection. Now that you know it is there, you can use it too. It is safe (in the master database) and used by virtually all the system procedures, which means the system procedures cannot run without it.

for splitting a column?

Ah, you do not understand the "Split one CSV column into multiple rows" code.

  • Forget about spt_values for a moment, and examine that code again. It just needs a list of consecutive numbers, so that in can step through the list of values in the CSV column, byte by byte. The code is activated only for each byte that is a comma, or end-of-string.

  • Where to get a set of consecutive numbers in the form of an SQL table, rather than CREATing one from scratch and INSERTing into it? Why, master..spt_values of course. If you know it is there.

  • (You can learn a bit about the internals of ASE or SQL Server, just by reading the code of the system stored procedures.)

  • Note that any CSV field in one column is a gross Normalisation error, it breaks 2NF (contains repeating values) and 1NF (not atomic). Note, that is not packed or folded, it is a repeating group, it is un-normalised. One of the many negative consequences of such a gross error is, instead of using simple SQL to navigate the repeating group as rows, one has to use complex code to determine and extract the content of the un-normalised CSV field. Here spt_values P provides a vector for that complex code, making it easier.

What is the benefit of it?

I think I have answered that. If you did not have it, every system procedure that requires a list of Numbers would have to CREATE a temp table; and INSERT the rows into it; before running its code. Of course, not having to perform those steps, makes the system procedures much faster.

Now, when you need to perform a Projection, eg. calendar dates in the future, or whatever, you can use spt_values, instead of having to create your own temp table each time (or create your own private permanent table and maintain it).

查看更多
姐就是有狂的资本
3楼-- · 2019-01-04 02:54

I know this is an older post but thought I'd add an update. Tally Table and cteTally table based splitters all have a major problem. They use concatenated delimiters and that kills their speed when the elements get wider and the strings get longer.

I've fixed that problem and wrote an article about it which may be found at he following URL. http://www.sqlservercentral.com/articles/Tally+Table/72993/

The new method blows the doors off of all While Loop, Recursive CTE, and XML methods for VARCHAR(8000).

I'll also tell you that a fellow by the name of "Peter" made an improvement even to that code (in the discussion for the article). The article is still interesting and I'll be updating the attachments with Peter's enhancements in the next day or two. Between my major enhancement and the tweek Peter made, I don't believe you'll find a faster T-SQL-Only solution for splitting VARCHAR(8000). I've also solved the problem for this breed of splitters for VARCHAR(MAX) and am in the process of writing an article for that, as well.

查看更多
萌系小妹纸
4楼-- · 2019-01-04 02:58

Many common solutions for splitting strings in TSQL require a list of numbers; in this case, someone is using the spt_values table to provide them. By inspection, this query returns a list of 2048 consecutive integers:

select number from master..spt_values where type = 'P'

I assume that the original query writer used spt_values for the integers because a) it is 'guaranteed' to be available and therefore the query will always work, and b) it avoids a lengthy explanation about alternative ways to get the integers.

The major disadvantage is that the table is undocumented and therefore using it is potentially confusing and also somewhat risky (at least in principle, an upgrade or service pack might change the table data or structure, or even remove it completely).

There are many alternative ways to get a list of numbers without using undocumented tables (I use a table-valued function):

SQL, Auxiliary table of numbers

查看更多
再贱就再见
5楼-- · 2019-01-04 03:02

Now it is working fine

SELECT T.col1, RIGHT(LEFT(T.col4,Number-1),CHARINDEX(',',REVERSE(LEFT(','+T.col4,Number-1))))
FROM
    master..spt_values,
    table123 T
WHERE
    Type = 'P' AND Number BETWEEN 1 AND LEN(T.col4)+1 AND
    (SUBSTRING(T.col4,Number,1) = ','
  OR SUBSTRING(T.col4,Number,1)  = '')
查看更多
成全新的幸福
6楼-- · 2019-01-04 03:12

In SQL Server 2016 now we have new functionality String_Split which we can use to split columns.

For example, here is the script:

DECLARE @String NVARCHAR(1000) = 'abc,def,ghi,jkl,mno,pqr,stu,vw,xyz';
SELECT * FROM STRING_SPLIT(@String,',');

Upon execution of above script, it will return us following result.

resultof splitstring

Well, that's it. Here is resource by Kathi, where she has compared performance.

查看更多
登录 后发表回答