Script to determine if a stored procedure is reado

2019-08-29 01:10发布

问题:

I have a requirement to audit all of our stored procedures, thousands of them, and determine which are readonly, or read-write. I was wondering if anyone knows of a good way to do it accurately.

I have written my own script so far, but I only get ~85% accuracy. I trip up on the stored procedures that are really readonly but they create a few temp tables. For my purposes these are readonly. I cannot just ignore these either, because there are a lot of read-write procedures working with temp tables too.

[EDIT] I got to roughly ~85% accuracy by looking at 20 procedures I know that are pretty complex and comparing them to the results I got from the query.

Here is the query I am currently using:

CREATE TABLE tempdb.dbo.[_tempProcs]
(objectname varchar(150), dbname varchar(150), ROUTINE_DEFINITION varchar(4000))
GO
EXEC sp_MSforeachdb 
'USE [?]
DECLARE @dbname VARCHAR(200)
SET @dbname = DB_NAME()
IF 1 = 1 AND ( @dbname NOT IN (''master'',''model'',''msdb'',''tempdb'',''distribution'') 
BEGIN
EXEC(''
INSERT INTO tempdb.dbo.[_tempProcs](objectname, dbname, ROUTINE_DEFINITION)
SELECT ROUTINE_NAME AS ObjectName, ''''?'''' AS dbname, ROUTINE_DEFINITION
FROM [?].INFORMATION_SCHEMA.ROUTINES WITH(NOLOCK) 
WHERE ROUTINE_DEFINITION LIKE ''''%INSERT [^]%'''' 
    OR ROUTINE_DEFINITION LIKE ''''%UPDATE [^]%''''
    OR ROUTINE_DEFINITION LIKE ''''%INTO [^]%''''
    OR ROUTINE_DEFINITION LIKE ''''%DELETE [^]%''''
    OR ROUTINE_DEFINITION LIKE ''''%CREATE TABLE[^]%''''
    OR ROUTINE_DEFINITION LIKE ''''%DROP [^]%''''
    OR ROUTINE_DEFINITION LIKE ''''%ALTER [^]%''''
    OR ROUTINE_DEFINITION LIKE ''''%TRUNCATE [^]%''''
    AND ROUTINE_TYPE=''''PROCEDURE''''
'')
END
'
GO
SELECT * FROM tempdb.dbo.[_tempProcs] WITH(NOLOCK) 

I have not refined it yet, at the moment I just want to focus on the writeable queries and see if I can get it accurate. Also one other issue is that the ROUTINE_DEFINITION only gives the first 4000 characters, so I might miss any that are writing after the 4000 char length. I might actually end up with a combination of suggestions. Get a list of procs that this query returns, and then further try Arrons suggestion and see if I can weed out even more. I would be happy with 95% accuracy.

I will give this another day or so to see if I can get any further suggestions, but thank you very much so far.

[FINAL EDIT] Ok, here is what I ended up doing, and it looks like I am getting at least 95% accuracy, could be higher. I have attempted to cater for any scenario that I could come up with.

I scripted out the stored procedures to files, and wrote a c# winform application to parse over the files and find the ones that have legitimate 'writes' to the real database.

I am happy to post this code for the state engine I used up here, but comes with no gurantees. I am under pressure to deliver, and really did not have time to beautify the code, and refactor with nice variable names etc and put nice comments in it either, I had 3 hours to do it, and I just squeezed it in, so for those who cares, and might help in the future, here it is:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace SQLParser
{
    public class StateEngine
    {
        public static class CurrentState
        {
            public static bool IsInComment;
            public static bool IsInCommentBlock;
            public static bool IsInInsert;
            public static bool IsInUpdate;
            public static bool IsInDelete;
            public static bool IsInCreate;
            public static bool IsInDrop;
            public static bool IsInAlter;
            public static bool IsInTruncate;
            public static bool IsInInto;
        }

        public class ReturnState
        {
            public int LineNumber { get; set; }
            public bool Value { get; set; }
            public string Line { get; set; }
        }

        private static int _tripLine = 0;
        private static string[] _lines;

        public ReturnState ParseFile(string fileName)
        {
            var retVal = false;
            _tripLine = 0;
            ResetCurrentState();

            _lines = File.ReadAllLines(fileName);

            for (int i = 0; i < _lines.Length; i++)
            {
                retVal = ParseLine(_lines[i], i);

                //return true the moment we have a valid case
                if (retVal)
                {
                    ResetCurrentState();
                    return new ReturnState() { LineNumber = _tripLine, Value = retVal, Line = _lines[_tripLine] };
                }
            }

            if (CurrentState.IsInInsert ||
                CurrentState.IsInDelete ||
                CurrentState.IsInUpdate ||
                CurrentState.IsInDrop ||
                CurrentState.IsInAlter ||
                CurrentState.IsInTruncate)
            {
                retVal = true;
                ResetCurrentState();
                return new ReturnState() { LineNumber = _tripLine, Value = retVal, Line = _lines[_tripLine] };
            }

            return new ReturnState() { LineNumber = -1, Value = retVal };
        }

        private static void ResetCurrentState()
        {
            CurrentState.IsInAlter = false;
            CurrentState.IsInCreate = false;
            CurrentState.IsInDelete = false;
            CurrentState.IsInDrop = false;
            CurrentState.IsInInsert = false;
            CurrentState.IsInTruncate = false;
            CurrentState.IsInUpdate = false;
            CurrentState.IsInInto = false;
            CurrentState.IsInComment = false;
            CurrentState.IsInCommentBlock = false;
        }

        private static bool ParseLine(string sqlLine, int lineNo)
        {
            var retVal = false;
            var _currentWord = 0;
            var _tripWord = 0;
            var _offsetTollerance = 4;

            sqlLine = sqlLine.Replace("\t", " ");

            //This would have been set in previous line, so reset it
            if (CurrentState.IsInComment)
                CurrentState.IsInComment = false;
            var words = sqlLine.Split(char.Parse(" ")).Where(x => x.Length > 0).ToArray();
            for (int i = 0; i < words.Length; i++)
            {
                if (string.IsNullOrWhiteSpace(words[i]))
                    continue;

                _currentWord += 1;

                if (CurrentState.IsInCommentBlock && words[i].EndsWith("*/") || words[i] == "*/") { CurrentState.IsInCommentBlock = false; }
                if (words[i].StartsWith("/*")) { CurrentState.IsInCommentBlock = true; }
                if (words[i].StartsWith("--") && !CurrentState.IsInCommentBlock) { CurrentState.IsInComment = true; }

                if (words[i].Length == 1 && CurrentState.IsInUpdate)
                {
                    //find the alias table name, find 'FROM' and then next word
                    var tempAlias = words[i];
                    var tempLine = lineNo;

                    for (int l = lineNo; l < _lines.Length; l++)
                    {
                        var nextWord = "";
                        var found = false;

                        var tempWords = _lines[l].Replace("\t", " ").Split(char.Parse(" ")).Where(x => x.Length > 0).ToArray();

                        for (int m = 0; m < tempWords.Length; m++)
                        {
                            if (found) { break; }

                            if (tempWords[m].ToLower() == tempAlias && tempWords[m - m == 0 ? m : 1].ToLower() != "update")
                            {
                                nextWord = m == tempWords.Length - 1 ? "" : tempWords[m + 1].ToString();
                                var prevWord = m == 0 ? "" : tempWords[m - 1].ToString();
                                var testWord = "";

                                if (nextWord.ToLower() == "on" || nextWord == "")
                                {
                                    testWord = prevWord;
                                }
                                if (prevWord.ToLower() == "from")
                                {
                                    testWord = nextWord;
                                }

                                found = true;

                                if (testWord.StartsWith("#") || testWord.StartsWith("@"))
                                {
                                    ResetCurrentState();
                                }

                                break;
                            }
                        }
                        if (found) { break; }
                    }
                }

                if (!CurrentState.IsInComment && !CurrentState.IsInCommentBlock)
                {
                    #region SWITCH

                    if (words[i].EndsWith(";"))
                    {
                        retVal = SetStateReturnValue(retVal);
                        ResetCurrentState();
                        return retVal;
                    }


                    if ((CurrentState.IsInCreate || CurrentState.IsInDrop && (words[i].ToLower() == "procedure" || words[i].ToLower() == "proc")) && (lineNo > _tripLine ? 1000 : _currentWord - _tripWord) < _offsetTollerance)
                        ResetCurrentState();

                    switch (words[i].ToLower())
                    {
                        case "insert":
                            //assume that we have parsed all lines/words and got to next keyword, so return previous state
                            retVal = SetStateReturnValue(retVal);
                            if (retVal)
                                return retVal;

                            CurrentState.IsInInsert = true;
                            _tripLine = lineNo;
                            _tripWord = _currentWord;
                            continue;

                        case "update":
                            //assume that we have parsed all lines/words and got to next keyword, so return previous state
                            retVal = SetStateReturnValue(retVal);
                            if (retVal)
                                return retVal;

                            CurrentState.IsInUpdate = true;
                            _tripLine = lineNo;
                            _tripWord = _currentWord;
                            continue;

                        case "delete":
                            //assume that we have parsed all lines/words and got to next keyword, so return previous state
                            retVal = SetStateReturnValue(retVal);
                            if (retVal)
                                return retVal;

                            CurrentState.IsInDelete = true;
                            _tripLine = lineNo;
                            _tripWord = _currentWord;
                            continue;

                        case "into":
                            //assume that we have parsed all lines/words and got to next keyword, so return previous state
                            //retVal = SetStateReturnValue(retVal, lineNo);
                            //if (retVal)
                            //    return retVal;

                            CurrentState.IsInInto = true;
                            _tripLine = lineNo;
                            _tripWord = _currentWord;
                            continue;

                        case "create":
                            //assume that we have parsed all lines/words and got to next keyword, so return previous state
                            retVal = SetStateReturnValue(retVal);
                            if (retVal)
                                return retVal;

                            CurrentState.IsInCreate = true;
                            _tripLine = lineNo;
                            _tripWord = _currentWord;
                            continue;

                        case "drop":
                            //assume that we have parsed all lines/words and got to next keyword, so return previous state
                            retVal = SetStateReturnValue(retVal);
                            if (retVal)
                                return retVal;

                            CurrentState.IsInDrop = true;
                            _tripLine = lineNo;
                            continue;

                        case "alter":
                            //assume that we have parsed all lines/words and got to next keyword, so return previous state
                            retVal = SetStateReturnValue(retVal);
                            if (retVal)
                                return retVal;

                            CurrentState.IsInAlter = true;
                            _tripLine = lineNo;
                            _tripWord = _currentWord;
                            continue;

                        case "truncate":
                            //assume that we have parsed all lines/words and got to next keyword, so return previous state
                            retVal = SetStateReturnValue(retVal);
                            if (retVal)
                                return retVal;

                            CurrentState.IsInTruncate = true;
                            _tripLine = lineNo;
                            _tripWord = _currentWord;
                            break;

                        default:
                            break;

                    }

                    #endregion

                    if (CurrentState.IsInInsert || CurrentState.IsInDelete || CurrentState.IsInUpdate || CurrentState.IsInDrop || CurrentState.IsInAlter || CurrentState.IsInTruncate || CurrentState.IsInInto)
                    {
                        if ((words[i].StartsWith("#") || words[i].StartsWith("@") || words[i].StartsWith("dbo.#") || words[i].StartsWith("dbo.@")) && (lineNo > _tripLine ? 1000 : _currentWord - _tripWord) < _offsetTollerance)
                        {
                            ResetCurrentState();
                            continue;
                        }

                    }

                    if ((CurrentState.IsInInsert || CurrentState.IsInInto || CurrentState.IsInUpdate) && (((_currentWord != _tripWord) && (lineNo > _tripLine ? 1000 : _currentWord - _tripWord) < _offsetTollerance) || (lineNo > _tripLine)))
                    {
                        retVal = SetStateReturnValue(retVal);
                        if (retVal)
                            return retVal;
                    }

                }
            }

            return retVal;
        }

        private static bool SetStateReturnValue(bool retVal)
        {
            if (CurrentState.IsInInsert ||
                CurrentState.IsInDelete ||
                CurrentState.IsInUpdate ||
                CurrentState.IsInDrop ||
                CurrentState.IsInAlter ||
                CurrentState.IsInTruncate)
            {
                retVal = (CurrentState.IsInInsert ||
                CurrentState.IsInDelete ||
                CurrentState.IsInUpdate ||
                CurrentState.IsInDrop ||
                CurrentState.IsInAlter ||
                CurrentState.IsInTruncate);
            }
            return retVal;
        }

    }
}

USAGE

var fileResult = new StateEngine().ParseFile(*path and filename*);

回答1:

You could try combining sys.sql_modules with a word-parsing table-valued function. EDIT: renamed UDF to fnParseSQLWords, which identifies comments EDIT: added a condition to the RIGHT line and changed all varchar to nvarchar EDIT: Added and w.id > 1; to the main select statement to avoid hits on the leading CREATE PROC when filtering on CREATE.

create function [dbo].[fnParseSQLWords](@str nvarchar(max), @delimiter nvarchar(30)='%[^a-zA-Z0-9\_]%')
returns @result table(id int identity(1,1), bIsComment bit, word nvarchar(max))
begin
    if left(@delimiter,1)<>'%' set @delimiter='%'+@delimiter;
    if right(@delimiter,1)<>'%' set @delimiter+='%';
    set @str=rtrim(@str);
    declare @pi int=PATINDEX(@delimiter,@str);
    declare @s2 nvarchar(2)=substring(@str,@pi,2);
    declare @bLineComment bit=case when @s2='--' then 1 else 0 end;
    declare @bBlockComment bit=case when @s2='/*' then 1 else 0 end;

    while @pi>0 begin       
        insert into @result select case when (@bLineComment=1 or @bBlockComment=1) then 1 else 0 end
            , LEFT(@str,@pi-1) where @pi>1;
        set @s2=substring(@str,@pi,2);
        set @str=RIGHT(@str,len(@str)-@pi);
        set @pi=PATINDEX(@delimiter,@str);
        set @bLineComment=case when @s2='--' then 1 else @bLineComment end;
        set @bBlockComment=case when @s2='/*' then 1 else @bBlockComment end;
        set @bLineComment=case when left(@s2,1) in (char(10),char(13)) then 0 else @bLineComment end;
        set @bBlockComment=case when @s2='*/' then 0 else @bBlockComment end;
    end

    insert into @result select case when (@bLineComment=1 or @bBlockComment=1) then 1 else 0 end
        , @str where LEN(@str)>0;
    return;
end
GO

-- List all update procedures
select distinct ProcName=p.name --, w.id, w.bIsComment, w.word
from sys.sql_modules m
inner join sys.procedures p on p.object_id=m.object_id
cross apply dbo.fnParseSQLWords(m.[definition], default) w
where w.word in ('INSERT','UPDATE','DELETE','INTO','CREATE','DROP','ALTER','TRUNCATE')
and w.bIsComment=0
and w.id > 1;
GO


回答2:

SQL Server doesn't store any property, attribute or other metadata that dictates whether a stored procedure performs any write operations. I'd say you can weed out any stored procedures that don't contain strings like:

INTO
CREATE%TABLE
DELETE
INSERT
UPDATE
TRUNCATE
OUTPUT

This is not an exhaustive list, just a few off the cuff. But of course this will have several false positives because some of the procedures left might have those words naturally (e.g. a stored procedure called "GetIntolerables"). You'll have to perform some manual analysis on the ones that remain to determine if, in fact, those keywords are used as intended or if they're just a side effect. You'll also not be able to tell, really, whether a procedure that creates a #temp table does so only for reading purposes (and even though you've explained this a bit in your question, I'm not clear whether that is a "hit" or not).

In SQL Server 2012 you can get a little closer, or at least identify stored procedures that don't return a result set (the implication being that they must do something else). You can write a dynamic query like this:

SELECT QUOTENAME(OBJECT_SCHEMA_NAME(p.[object_id])) + '.' + QUOTENAME(p.name)
FROM sys.procedures AS p OUTER APPLY
sys.dm_exec_describe_first_result_set_for_object(p.[object_id], 1) AS d
WHERE d.name IS NULL;

One issue with this is that if your procedure has any branching in it that depend on input parameters, time of day, system state, data in a table, etc. then it may not accurately reflect what it does. But it may help whittle the list down a little. It may also return false positives for stored procedures that insert into a table and return the identity value using SELECT, that kind of thing.

In earlier versions you can do something similar with SET FMTONLY ON, but in this case you will have to execute all of the procedures, and this will be cumbersome to do because you'll also need to know about any required parameters (both in and out) and set those up accordingly. The evaluation process is much more manual, and it is still prone to the parameter issue described above.

What method are you using now to get to 85%? What are you going to do with the information once you have your two (or three?) lists?

I can't really see any shortcuts to this. In an ideal world, your naming convention would dictate that stored procedures should be named accurately for what they do, and you should be able to tell them apart immediately (with some being borderline). As it stands, it seems you are looking at a traffic camera and trying to determine which cars might have guns under the driver's seat.



回答3:

There are some keywords you can check for in sys.sql_modules:

  • UPDATE
  • INSERT
  • INTO
  • DELETE
  • CREATE
  • DROP
  • ALTER
  • TRUNCATE

If it doesn't contain ANY of these, I can't think of a way it's writing to a database unless its through another sub-proc or function (which would contain one of those words).

You'll need to check individually after that to make sure it's not a #temp table though. You will also need to do a second pass to keep finding objects that contain those in other objects.



回答4:

a radical solution would be to parse all procedures and insert a call to a function that creates a snapshot of the database on the first line. the last line would create another one and compare it with the first. if they are different, you made a call to a write procedure. off course, you cannot do that in production and you will either have to call all your test-cases on them or you replay an sql-server log.

i wouldn't think too long about this, tho...