How to get column-level dependencies in a view

Posted 2020-06-07 05:35

Question:

I've done some research on this but haven't found a solution yet. What I want to get is the column-level dependencies of a view. Let's say we have a table like this:

create table TEST(
    first_name varchar(10),
    last_name varchar(10),
    street varchar(10),
    number int
)

and a view like this:

create view vTEST
as
    select
        first_name + ' ' + last_name as [name],
        street + ' ' + cast(number as varchar(max)) as [address]
    from dbo.TEST

I'd like to get a result like this:

column_name depends_on_column_name depends_on_table_name
----------- --------------------- --------------------
name        first_name            dbo.TEST
name        last_name             dbo.TEST
address     street                dbo.TEST
address     number                dbo.TEST

I've tried the sys.dm_sql_referenced_entities function, but referencing_minor_id is always 0 there for views.

select
    referencing_minor_id,
    referenced_schema_name + '.' + referenced_entity_name as depends_on_table_name,
    referenced_minor_name as depends_on_column_name
from sys.dm_sql_referenced_entities('dbo.vTEST', 'OBJECT')

referencing_minor_id depends_on_table_name depends_on_column_name
-------------------- --------------------- ----------------------
0                    dbo.TEST              NULL
0                    dbo.TEST              first_name
0                    dbo.TEST              last_name
0                    dbo.TEST              street
0                    dbo.TEST              number

The same is true for sys.sql_expression_dependencies and for the obsolete sys.sql_dependencies.

So am I missing something, or is this impossible to do?

There are some related questions (Find the real column name of an alias used in a view?), but as I said, I haven't found a working solution yet.

EDIT 1: I've tried using the DAC to check whether this information is stored somewhere in the System Base Tables, but I haven't found it.

Answer 1:

This solution answers your question only partially: it won't work for columns that are expressions.

You could use sys.dm_exec_describe_first_result_set to get column information. The relevant parameter is described in the documentation as follows:

@include_browse_information

If set to 1, each query is analyzed as if it has a FOR BROWSE option on the query. Additional key columns and source table information are returned.

CREATE TABLE txu(id INT, first_name VARCHAR(10), last_name VARCHAR(10));
CREATE TABLE txd(id INT, id_fk INT, address VARCHAR(100));

CREATE VIEW v_txu
AS
SELECT t.id AS PK_id,
       t.first_name  AS name,
       d.address,
       t.first_name + t.last_name AS name_full
FROM txu t
JOIN txd d
  ON t.id = d.id_fk

Main query:

SELECT name, source_database, source_schema,
      source_table, source_column 
FROM sys.dm_exec_describe_first_result_set(N'SELECT * FROM v_txu', null, 1) ;  

Output:

+-----------+--------------------+---------------+--------------+---------------+
|   name    |   source_database  | source_schema | source_table | source_column |
+-----------+--------------------+---------------+--------------+---------------+
| PK_id     | fiddle_0f9d47226c4 | dbo           | txu          | id            |
| name      | fiddle_0f9d47226c4 | dbo           | txu          | first_name    |
| address   | fiddle_0f9d47226c4 | dbo           | txd          | address       |
| name_full | null               | null          | null         | null          |
+-----------+--------------------+---------------+--------------+---------------+
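Building on the main query above, the following is a minimal sketch of how the result could be reshaped into the format asked for in the question. It assumes the v_txu view from this answer exists in the current database; the column_kind label is only an illustrative name, not part of any system view. Expression columns still come back without a source, which is the limitation noted at the top of this answer.

```sql
-- Sketch: classify each visible view column as a direct column or an expression.
-- Assumes the v_txu view defined above exists in the current database.
SELECT r.name AS column_name,
       CASE WHEN r.source_column IS NULL
            THEN 'expression or unresolved'   -- illustrative label only
            ELSE 'direct column'
       END AS column_kind,
       r.source_column AS depends_on_column_name,
       r.source_schema + '.' + r.source_table AS depends_on_table_name
FROM sys.dm_exec_describe_first_result_set(N'SELECT * FROM v_txu', NULL, 1) AS r
WHERE r.is_hidden = 0            -- skip extra key columns that browse mode may add
ORDER BY r.column_ordinal;
```

The is_hidden filter matters because browse mode can return additional key columns that are not part of the view's own output.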


Answer 2:

This solution is based on the query plan. It has some advantages:

  • almost any select query can be processed
  • no SchemaBinding is required

and disadvantages:

  • it has not been tested properly
  • it can break without warning if Microsoft changes the XML query plan format.

The core idea is that every column expression inside the XML query plan is defined in a "DefinedValue" node. The first subnode of "DefinedValue" is a reference to the output column and the second one is an expression. The expression is computed from input columns and constant values. As mentioned above, this is based only on empirical observation and needs to be tested properly.
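To make the "DefinedValue" idea concrete, here is a minimal sketch that reads one such node. The XML below is a hand-written, heavily simplified fragment and not a real plan (real plans nest these nodes much deeper and carry many more attributes); the alias names dv and src are arbitrary.

```sql
-- Hand-written, simplified showplan fragment (NOT a real plan): a single
-- DefinedValue that computes Expr1006 from two base columns and a constant.
declare @plan xml = N'
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <DefinedValue>
    <ColumnReference Column="Expr1006" />
    <ScalarOperator>
      <ColumnReference Schema="[dbo]" Table="[TEST]" Column="first_name" />
      <Const ConstValue="'' ''" />
      <ColumnReference Schema="[dbo]" Table="[TEST]" Column="last_name" />
    </ScalarOperator>
  </DefinedValue>
</ShowPlanXML>';

-- The first child of DefinedValue names the defined column; every
-- ColumnReference or Const under its ScalarOperator is an input.
with xmlnamespaces(default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select dv.n.value('(./ColumnReference/@Column)[1]', 'nvarchar(128)') as defined_column
     , src.n.value('local-name(.)', 'nvarchar(128)')                 as input_kind
     , coalesce(src.n.value('@Column', 'nvarchar(128)')
               ,src.n.value('@ConstValue', 'nvarchar(128)'))         as input_value
from @plan.nodes('//DefinedValue') as dv(n)
  cross apply dv.n.nodes('./ScalarOperator//*[local-name(.) = "ColumnReference" or local-name(.) = "Const"]') as src(n);
```

The full solution applies the same lookup recursively, because an input column such as Expr1006 may itself be defined by another "DefinedValue" node.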

Here is an example invocation:

exec dbo.GetColumnDependencies 'select * from dbo.vTEST'

target_column_name | source_column_name        | const_value
---------------------------------------------------
address            | Expr1007                  | NULL
name               | Expr1006                  | NULL
Expr1006           | NULL                      | ' '
Expr1006           | [testdb].[dbo].first_name | NULL
Expr1006           | [testdb].[dbo].last_name  | NULL
Expr1007           | NULL                      | ' '
Expr1007           | [testdb].[dbo].number     | NULL
Expr1007           | [testdb].[dbo].street     | NULL

Here is the code. First, get the XML query plan.

declare @select_query as varchar(4000) = 'select * from dbo.vTEST' -- IT'S YOUR QUERY HERE.
declare @select_into_query    as varchar(4000) = 'select top (1) * into #foo from (' + @select_query + ') as src'
      , @xml_plan             as xml           = null
      , @xml_generation_tries as tinyint       = 10
;
while (@xml_plan is null and @xml_generation_tries > 0) -- There is no guarantee that the plan will be cached.
begin 
  execute (@select_into_query);
  select @xml_plan = pln.query_plan
    from sys.dm_exec_query_stats as qry
      cross apply sys.dm_exec_sql_text(qry.sql_handle) as txt
      cross apply sys.dm_exec_query_plan(qry.plan_handle) as pln
    where txt.text = @select_into_query
  ;
  set @xml_generation_tries -= 1; -- Count down so the loop ends even if the plan never gets cached.
end
if (@xml_plan is null
) begin
    raiserror(N'Can''t extract XML query plan from cache.' ,15 ,0);
    return;
  end
;

Next is the main query. Its biggest part is a recursive common table expression for column extraction.

with xmlnamespaces(default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'
                  ,'http://schemas.microsoft.com/sqlserver/2004/07/showplan' as shp -- Named prefix so that the namespace used by .query() results is predictable.
)
    , cte_column_dependencies as
    (

The seed of the recursion is a query that extracts the columns of the #foo table, which stores one row of the select query of interest.

select
    (select foo_col.info.query('./ColumnReference') for xml raw('shp:root') ,type) -- Because .value() can't extract an attribute from the root node.
      as target_column_info
  , (select foo_col.info.query('./ScalarOperator/Identifier/ColumnReference') for xml raw('shp:root') ,type)
      as source_column_info
  , cast(null as xml) as const_info
  , 1 as iteration_no
from @xml_plan.nodes('//Update/SetPredicate/ScalarOperator/ScalarExpressionList/ScalarOperator/MultipleAssign/Assign')
        as foo_col(info)
where foo_col.info.exist('./ColumnReference[@Table="[#foo]"]') = 1

The recursive part searches for the "DefinedValue" node of each dependent column and extracts all "ColumnReference" and "Const" subnodes that are used in the column expression. It's overcomplicated by the XML to SQL conversions.

union all    
select
    (select internal_col.info.query('.') for xml raw('shp:root') ,type)
  , source_info.column_info
  , source_info.const_info
  , prev_dependencies.iteration_no + 1
from @xml_plan.nodes('//DefinedValue/ColumnReference') as internal_col(info)
  inner join cte_column_dependencies as prev_dependencies -- Filters by dependent columns.
        on prev_dependencies.source_column_info.value('(//ColumnReference/@Column)[1]' ,'nvarchar(4000)') = internal_col.info.value('(./@Column)[1]' ,'nvarchar(4000)')
        and exists (select prev_dependencies.source_column_info.value('(.//@Schema)[1]'   ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Schema)[1]'   ,'nvarchar(4000)'))
        and exists (select prev_dependencies.source_column_info.value('(.//@Database)[1]' ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Database)[1]' ,'nvarchar(4000)'))
        and exists (select prev_dependencies.source_column_info.value('(.//@Server)[1]'   ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Server)[1]'   ,'nvarchar(4000)'))
  cross apply ( -- Because a result row can hold either a column or a constant, not both.
            select (select source_col.info.query('.') for xml raw('shp:root') ,type) as column_info
                 , null                                                              as const_info
              from internal_col.info.nodes('..//ColumnReference') as source_col(info)
            union all
            select null                                                         as column_info
                 , (select const.info.query('.') for xml raw('shp:root') ,type) as const_info
              from internal_col.info.nodes('..//Const') as const(info)
        ) as source_info
where source_info.column_info is null
    or (
        -- Exclude the node itself, which '..//ColumnReference' also selects among its sources. Sorry, I don't know XQuery well enough to do this check there.
            source_info.column_info.value('(//@Column)[1]' ,'nvarchar(4000)') <> internal_col.info.value('(./@Column)[1]' ,'nvarchar(4000)')
        and (select source_info.column_info.value('(//@Schema)[1]'   ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Schema)[1]'   ,'nvarchar(4000)')) is null
        and (select source_info.column_info.value('(//@Database)[1]' ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Database)[1]' ,'nvarchar(4000)')) is null
        and (select source_info.column_info.value('(//@Server)[1]'   ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Server)[1]'   ,'nvarchar(4000)')) is null
      )
)

Finally, the select statement converts the XML to human-readable text.

select
  --  col_dep.target_column_info
  --, col_dep.source_column_info
  --, col_dep.const_info
    coalesce(col_dep.target_column_info.value('(.//shp:ColumnReference/@Server)[1]'   ,'nvarchar(4000)') + '.' ,'')
  + coalesce(col_dep.target_column_info.value('(.//shp:ColumnReference/@Database)[1]' ,'nvarchar(4000)') + '.' ,'')
  + coalesce(col_dep.target_column_info.value('(.//shp:ColumnReference/@Schema)[1]'   ,'nvarchar(4000)') + '.' ,'')
  + col_dep.target_column_info.value('(.//shp:ColumnReference/@Column)[1]' ,'nvarchar(4000)')
    as target_column_name
  , coalesce(col_dep.source_column_info.value('(.//shp:ColumnReference/@Server)[1]'   ,'nvarchar(4000)') + '.' ,'')
  + coalesce(col_dep.source_column_info.value('(.//shp:ColumnReference/@Database)[1]' ,'nvarchar(4000)') + '.' ,'')
  + coalesce(col_dep.source_column_info.value('(.//shp:ColumnReference/@Schema)[1]'   ,'nvarchar(4000)') + '.' ,'')
  + col_dep.source_column_info.value('(.//shp:ColumnReference/@Column)[1]' ,'nvarchar(4000)')
    as source_column_name
  , col_dep.const_info.value('(/shp:root/shp:Const/@ConstValue)[1]' ,'nvarchar(4000)')
    as const_value
from cte_column_dependencies as col_dep
order by col_dep.iteration_no ,target_column_name ,source_column_name
option (maxrecursion 512) -- Guards against an infinite loop.


Answer 3:

Everything you need is in the definition of the view.

So we can extract this information with the following steps:

  1. Assign the view definition to a string variable.

  2. Split it on the comma (,).

  3. Split each aliased expression on the plus operator (+), using CROSS APPLY with XML.

  4. Use the system tables to get accurate information such as the original table.

Demo:-

Create PROC psp_GetLevelDependsView (@sViewName varchar(200))
AS
BEGIN

    Declare @stringToSplit nvarchar(max), -- view definitions can exceed 1000 characters
            @name NVARCHAR(255),
            @dependsTableName NVARCHAR(50),
            @pos INT

    Declare @returnList TABLE ([Name] [nvarchar] (500))

    SELECT TOP 1 @dependsTableName= table_schema + '.'+  TABLE_NAME
    FROM    INFORMATION_SCHEMA.VIEW_COLUMN_USAGE
    WHERE   VIEW_NAME = @sViewName -- restrict to the requested view

    select @stringToSplit = definition
    from sys.objects     o
    join sys.sql_modules m on m.object_id = o.object_id
    where o.object_id = object_id( @sViewName)
     and o.type = 'V'

     WHILE CHARINDEX(',', @stringToSplit) > 0
     BEGIN
        SELECT @pos  = CHARINDEX(',', @stringToSplit)  
        SELECT @name = SUBSTRING(@stringToSplit, 1, @pos-1)

        INSERT INTO @returnList 
        SELECT @name

        SELECT @stringToSplit = SUBSTRING(@stringToSplit, @pos+1, LEN(@stringToSplit)-@pos)
     END

     INSERT INTO @returnList
     SELECT @stringToSplit

    select COLUMN_NAME  ,  b.Name as Expression
    Into #Temp
    FROM INFORMATION_SCHEMA.COLUMNS a , @returnList b
    WHERE TABLE_NAME= @sViewName
    And (b.Name) like '%' + ( COLUMN_NAME) + '%'

    SELECT A.COLUMN_NAME as column_name,  
         Split.a.value('.', 'VARCHAR(100)') AS depends_on_column_name ,   @dependsTableName as depends_on_table_name
         Into #temp2
     FROM  
     (
         SELECT COLUMN_NAME,  
             CAST ('<M>' + REPLACE(Expression, '+', '</M><M>') + '</M>' AS XML) AS Data  
         FROM  #Temp
     ) AS A CROSS APPLY Data.nodes ('/M') AS Split(a); 

    SELECT b.column_name , a.COLUMN_NAME as depends_on_column_name , b.depends_on_table_name
    FROM INFORMATION_SCHEMA.VIEW_COLUMN_USAGE a , #temp2 b
    WHERE VIEW_NAME= @sViewName
    and b.depends_on_column_name  like '%' + a.COLUMN_NAME + '%'

     drop table #Temp
     drop table #Temp2

 END

Test:-

exec psp_GetLevelDependsView 'vTest'

Result:-

column_name depends_on_column_name depends_on_table_name
----------- --------------------- --------------------
name        first_name            dbo.TEST
name        last_name             dbo.TEST
address     street                dbo.TEST
address     number                dbo.TEST


Answer 4:

I was playing around with this but didn't have time to go any further. Maybe this will help:

-- Returns every column of each table the view references (not only the columns the view actually uses)

SELECT
     v.[name] AS ViewName
    ,d.[referencing_id] AS ViewObjectID 
    ,c.[name] AS ColumnNames
    ,OBJECT_NAME(d.referenced_id) AS ReferencedTableName
    ,d.referenced_id AS TableObjectIDsReferenced
FROM 
sys.views v 
INNER JOIN sys.sql_expression_dependencies d ON d.referencing_id = v.[object_id]
INNER JOIN sys.objects o ON d.referencing_id = o.[object_id]
INNER JOIN sys.columns c ON d.referenced_id = c.[object_id]
WHERE v.[name] = 'vTEST'

-- Returns all output columns in the view

SELECT 
     OBJECT_NAME([object_id]) AS ViewName
    ,[object_id] AS ViewObjectID
    ,[name] AS OutputColumnName
FROM sys.columns
WHERE OBJECT_ID('vTEST') = [object_id]

-- Get the view definition

SELECT 
    VIEW_DEFINITION
FROM INFORMATION_SCHEMA.VIEWS
WHERE TABLE_NAME = 'vTEST'


Answer 5:

Unfortunately, SQL Server does not explicitly store the mapping between source table columns and view columns. I suspect the main reason is simply the potential complexity of views (expression columns, functions called on those columns, nested queries etc.).

The only way that I can think of to determine the mapping between view columns and source columns would be to either parse the query associated with the view or parse the execution plan of the view.

The approach I have outlined here focuses on the second option and relies on the fact that SQL Server will avoid generating output lists for columns not required by a query.

The first step is to get the list of dependent tables and their associated columns required for the view. This can be achieved via the standard system tables in SQL Server.

Next, we enumerate all of the view’s columns via a cursor.

For each view column, we create a temporary wrapper stored procedure that only selects the single column in question from view. Because only a single column is requested SQL Server will only retrieve the information needed to output that single view column.
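As a quick manual check of this idea, the estimated plan for a single view column can be inspected directly, without the wrapper procedure. This is only a sketch: it assumes the vTEST view from the question exists and that the script is run from a client that understands the GO batch separator (such as SSMS or sqlcmd).

```sql
-- SET SHOWPLAN_XML must be the only statement in its batch, hence the GO lines.
-- GO is a client-side batch separator, not a T-SQL statement.
SET SHOWPLAN_XML ON;
GO
-- Not executed: SQL Server returns the estimated plan as XML instead of rows.
SELECT [name] FROM dbo.vTEST;
GO
SET SHOWPLAN_XML OFF;
GO
```

If the reasoning in this answer holds, the OutputList elements of the returned plan should mention only the base columns needed to compute [name]; repeating the query with [address] would then show a different set.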

The newly created procedure will run the query in format only mode and will therefore not cause any actual I/O operations on the database, but it will generate an estimated execution plan when executed. After the query plan is generated, we query the output lists from the execution plan. Since we know which view column was selected, we can now associate the output list with the view column in question. We can further refine the association by only associating columns that form part of our original dependency list; this will eliminate expression outputs from the result set.

Note that with this method, if the view needs to join different tables together to generate the output, then all columns required to generate the output will be returned, even those not used directly in the column expression, since they are still indirectly required.

The following stored procedure demonstrates the above implementation method:

CREATE PROCEDURE ViewGetColumnDependencies
(
    @viewName   NVARCHAR(50)
)
AS
BEGIN

    CREATE TABLE #_suppress_output
    (
        result NVARCHAR(500) NULL
    );


    DECLARE @viewTableColumnMapping TABLE
    (
        [ViewName]                  NVARCHAR(50),
        [SourceObject]              NVARCHAR(50),
        [SourceObjectColumnName]    NVARCHAR(50),
        [ViewAliasColumn]           NVARCHAR(50)
    )


    -- Get list of dependent tables and their associated columns required for the view.
    INSERT INTO @viewTableColumnMapping
    (
        [ViewName]                  
        ,[SourceObject]             
        ,[SourceObjectColumnName]               
    )
    SELECT          v.[name] AS [ViewName]
                    ,'[' + OBJECT_NAME(d.referenced_major_id) + ']' AS [SourceObject]
                    ,c.[name] AS [SourceObjectColumnName]
    FROM            sys.views v
    LEFT OUTER JOIN sys.sql_dependencies d ON d.object_id = v.object_id
    LEFT OUTER JOIN sys.columns c ON c.object_id = d.referenced_major_id AND c.column_id = d.referenced_minor_id
    WHERE           v.[name] = @viewName;


    DECLARE @aliasColumn NVARCHAR(50);

    -- Next, we enumerate all of the views columns via a cursor. 
    DECLARE ViewColumnNameCursor CURSOR FOR
    SELECT              aliases.name AS [AliasName]
    FROM                sys.views v
    LEFT OUTER JOIN     sys.columns AS aliases  on v.object_id = aliases.object_id -- c.column_id=aliases.column_id AND aliases.object_id = object_id('vTEST')
    WHERE   v.name = @viewName;

    OPEN ViewColumnNameCursor  

    FETCH NEXT FROM ViewColumnNameCursor   
    INTO @aliasColumn  

    DECLARE @tql_create_proc NVARCHAR(MAX);
    DECLARE @queryPlan XML;

    WHILE @@FETCH_STATUS = 0  
    BEGIN 

        /*
        For each view column, we create a temporary wrapper stored procedure that 
        only selects the single column in question from view. The stored procedure 
        will run the query in format only mode and will therefore not cause any 
        actual I/O operations on the database, but it will generate an estimated 
        execution plan when executed.
        */
         SET @tql_create_proc = 'CREATE PROCEDURE ___WrapView
                                AS
                                    SET FMTONLY ON;
                                    SELECT CONVERT(NVARCHAR(MAX), [' + @aliasColumn + ']) FROM [' + @viewName + '];
                                    SET FMTONLY OFF;';

        EXEC (@tql_create_proc);

        -- Execute the procedure to generate a query plan. The insert into the temp table is only done to
        -- suppress the empty result set from being displayed as part of the output.
        INSERT INTO #_suppress_output
        EXEC ___WrapView;

        -- Get the query plan for the wrapper procedure that was just executed.
        SELECT  @queryPlan =   [qp].[query_plan]  
        FROM    [sys].[dm_exec_procedure_stats] AS [ps]
                JOIN [sys].[dm_exec_query_stats] AS [qs] ON [ps].[plan_handle] = [qs].[plan_handle]
                CROSS APPLY [sys].[dm_exec_query_plan]([qs].[plan_handle]) AS [qp]
        WHERE   [ps].[database_id] = DB_ID() AND  OBJECT_NAME([ps].[object_id], [ps].[database_id])  = '___WrapView'

        -- Drop the wrapper procedure
        DROP PROCEDURE ___WrapView

        /*
        After the query plan is generated, we query the output lists from the execution plan. 
        Since we know which view column was selected, we can now associate the output list with 
        the view column in question. We can further refine the association by only associating 
        columns that form part of our original dependency list; this will eliminate expression 
        outputs from the result set. 
        */
        ;WITH QueryPlanOutputList AS
        (
          SELECT    T.X.value('local-name(.)', 'NVARCHAR(max)') as Structure,
                    T.X.value('./@Table[1]', 'NVARCHAR(50)') as [SourceTable],
                    T.X.value('./@Column[1]', 'NVARCHAR(50)') as [SourceColumnName],
                    T.X.query('*') as SubNodes

          FROM @queryPlan.nodes('*') as T(X)
          UNION ALL 
          SELECT QueryPlanOutputList.structure + N'/' + T.X.value('local-name(.)', 'nvarchar(max)'),
                 T.X.value('./@Table[1]', 'NVARCHAR(50)') as [SourceTable],
                 T.X.value('./@Column[1]', 'NVARCHAR(50)') as [SourceColumnName],
                 T.X.query('*')
          FROM QueryPlanOutputList
          CROSS APPLY QueryPlanOutputList.SubNodes.nodes('*') as T(X)
        )
        UPDATE @viewTableColumnMapping
        SET     ViewAliasColumn = @aliasColumn
        FROM    @viewTableColumnMapping CM
        INNER JOIN  
                (
                    SELECT DISTINCT  QueryPlanOutputList.Structure
                                    ,QueryPlanOutputList.[SourceTable]
                                    ,QueryPlanOutputList.[SourceColumnName]
                    FROM    QueryPlanOutputList
                    WHERE   QueryPlanOutputList.Structure like '%/OutputList/ColumnReference'
                ) SourceColumns ON CM.[SourceObject] = SourceColumns.[SourceTable] AND CM.SourceObjectColumnName = SourceColumns.SourceColumnName

        FETCH NEXT FROM ViewColumnNameCursor   
        INTO @aliasColumn 
    END

    CLOSE ViewColumnNameCursor;
    DEALLOCATE ViewColumnNameCursor; 

    DROP TABLE #_suppress_output

    SELECT *
    FROM    @viewTableColumnMapping
    ORDER BY [ViewAliasColumn]

END

The stored procedure can now be executed as follows:

EXEC dbo.ViewGetColumnDependencies @viewName = 'vTEST'