Searching for a specific ID in a large database?

Posted 2019-02-20 11:25

Question:

I need to look up an ID in a very large database. The ID is:

0167a901-e343-4745-963c-404809b74dd9

The database has hundreds of tables, and millions of rows in the big tables.

I can narrow the date to within the last 2 or 3 months, but that's about it. I'm looking for any clues as to how to narrow down searches like this.

One thing I'm curious about is whether using LIKE searches helps.

i.e., does it help to do something like

select top 10 * 
from BIG_TABLE
where DESIRED_ID like '016%'

Any tips or suggestions are greatly appreciated. The database is being accessed remotely, so that's part of the challenge.

Answer 1:

I have this script that I built several years ago for a similar purpose, albeit with text fields. It finds eligible columns, and then searches through those columns for the value. As you have a non-deterministic scope, you may not be able to do better than something like this.

You may want to tweak it a bit to include uniqueidentifier columns - if that is actually the datatype - or use an equal instead of a like search.

If this is something you are going to reuse periodically, you could feed it a list of common tables or columns to search, so it doesn't take as long to find things.

/*This script will find any text value in the database*/
/*Output will be directed to the Messages window. Don't forget to look there!!!*/

SET NOCOUNT ON
DECLARE @valuetosearchfor varchar(128), @objectOwner varchar(64)
SET @valuetosearchfor = '%putYourGuidHere%' --should be formatted as a like search 
SET @objectOwner = 'dbo'

DECLARE @potentialcolumns TABLE (id int IDENTITY, sql varchar(4000))

INSERT INTO @potentialcolumns (sql)
SELECT 
    ('if exists (select 1 from [' +
    [tabs].[table_schema] + '].[' +
    [tabs].[table_name] + 
    '] (NOLOCK) where [' + 
    [cols].[column_name] + 
    '] like ''' + @valuetosearchfor + ''' ) print ''SELECT * FROM [' +
    [tabs].[table_schema] + '].[' +
    [tabs].[table_name] + 
    '] (NOLOCK) WHERE [' + 
    [cols].[column_name] + 
    '] LIKE ''''' + @valuetosearchfor + '''''' +
    '''') as 'sql'
FROM information_schema.columns cols
    INNER JOIN information_schema.tables tabs
        ON cols.TABLE_CATALOG = tabs.TABLE_CATALOG
            AND cols.TABLE_SCHEMA = tabs.TABLE_SCHEMA
            AND cols.TABLE_NAME = tabs.TABLE_NAME
WHERE cols.data_type IN ('char', 'varchar', 'nchar', 'nvarchar','text','ntext')
    AND tabs.table_schema = @objectOwner
    AND tabs.TABLE_TYPE = 'BASE TABLE'
    AND (cols.CHARACTER_MAXIMUM_LENGTH >= (LEN(@valueToSearchFor) - 2) OR cols.CHARACTER_MAXIMUM_LENGTH = -1)
ORDER BY tabs.table_catalog, tabs.table_name, cols.ordinal_position

DECLARE @count int
SET @count = (SELECT MAX(id) FROM @potentialcolumns)
PRINT 'Found ' + CAST(@count as varchar) + ' potential columns.'
PRINT 'Beginning scan...'
PRINT ''
PRINT 'These columns contain the values being searched for...'
PRINT ''
DECLARE @iterator int, @sql varchar(4000)
SET @iterator = 1
WHILE @iterator <= (SELECT Max(id) FROM @potentialcolumns)
BEGIN
    SET @sql = (SELECT [sql] FROM @potentialcolumns where [id] = @iterator)
    IF (@sql IS NOT NULL) and (RTRIM(LTRIM(@sql)) <> '')
    BEGIN
        --SELECT @sql --use when checking sql output
        EXEC (@sql)
    END
    SET @iterator = @iterator + 1
END

PRINT ''
PRINT 'Scan completed'

If that looks wonky, the script is executing a statement like this

if exists (select 1 from [schema].[table_name] (NOLOCK) 
                    where [column_name] LIKE '%yourValue%')
begin
   print 'select * from [schema].[table_name] (NOLOCK) where [column_name] LIKE ''%yourValue%'''
end

...and just replacing the [schema], [table_name], [column_name] and %yourValue% in a loop.

It's filtering on...

  • tables in a specific schema (filter can be removed)
  • only tables, not views (can be adjusted)
  • only columns long enough to hold the search value
  • the (n)char/(n)varchar/(n)text data types (add or change, be cognizant of data type conversion)

Lastly, output does not go to the results grid. Check the Messages window (where you see "N rows affected")
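
The tweak mentioned earlier in this answer (targeting uniqueidentifier columns and testing with = instead of LIKE) might look roughly like the sketch below. It only generates the statements; it does not run them. It assumes the ID is stored in uniqueidentifier columns and that object names contain no single quotes, neither of which is confirmed by the question.

```sql
-- Sketch only: builds one IF EXISTS ... PRINT statement per uniqueidentifier column.
-- Assumption: the value lives in a uniqueidentifier column, not a text column.
-- Assumption: schema/table/column names contain no single quotes.
DECLARE @guid char(36);
SET @guid = '0167a901-e343-4745-963c-404809b74dd9';

SELECT 'IF EXISTS (SELECT 1 FROM ' + QUOTENAME(cols.TABLE_SCHEMA) + '.' + QUOTENAME(cols.TABLE_NAME)
     + ' WHERE ' + QUOTENAME(cols.COLUMN_NAME) + ' = ''' + @guid + ''')'
     + ' PRINT ''' + cols.TABLE_SCHEMA + '.' + cols.TABLE_NAME + '.' + cols.COLUMN_NAME + '''' AS [sql]
FROM INFORMATION_SCHEMA.COLUMNS AS cols
    INNER JOIN INFORMATION_SCHEMA.TABLES AS tabs
        ON cols.TABLE_CATALOG = tabs.TABLE_CATALOG
            AND cols.TABLE_SCHEMA = tabs.TABLE_SCHEMA
            AND cols.TABLE_NAME = tabs.TABLE_NAME
WHERE cols.DATA_TYPE = 'uniqueidentifier'
    AND tabs.TABLE_TYPE = 'BASE TABLE'
ORDER BY tabs.TABLE_NAME, cols.ORDINAL_POSITION;
```

The resulting rows could be inserted into @potentialcolumns in place of the original SELECT, leaving the rest of the loop unchanged.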



Answer 2:

First of all, what is the requirement? Why do you need to find a specific value across the whole database? It looks like a one-time job: find the value and then take some action based on it. Be aware that it can be time and resource consuming.

Anyway, it looks like a GUID column. There is no way to speed the search up unless all the GUID columns have indexes.

Here is a small query that generates a SELECT statement for every table that has a GUID column. (If the value is stored in a varchar column instead, it is much harder: you would have to query every such column of every table. You can write that, but I do not see it being efficient.)

Most importantly, the output is ordered. Tables with an index whose leading key is the GUID column are listed first. After that, tables are listed by number of data pages, so that the cheapest queries run first. If your GUID value is in one of the first few tables, the search will be very fast. If it is in the last table, the time depends on the size of the tables and could be long.

Also, declare a cursor over this query and execute the statements one at a time, leaving the cursor loop as soon as you find the value, since a GUID is a unique value. This will be much more efficient.

select * from (
    select 'select ' + quotename(ac.name) + ' from ' + quotename(OBJECT_SCHEMA_NAME(ac.object_id)) + '.' + quotename(OBJECT_NAME(ac.object_id))
         + ' where ' + quotename(ac.name) + ' = ''0167a901-e343-4745-963c-404809b74dd9''' as querytext
    --,*
    , isnull(cnt, 0) as numberofrows
    -- key_ordinal = 1 means the GUID column is the leading key of an index; those rows sort first,
    -- then everything else by data pages (smallest first)
    , ROW_NUMBER() over (order by case when ic.key_ordinal = 1 then 0 else 1 end asc, isnull(si.dpages, si_1.dpages) asc) as rn
    , isnull(si.dpages, si_1.dpages) as datapages
    from sys.all_columns ac
    inner join sys.all_objects ao on ac.object_id = ao.object_id
    left join sys.index_columns ic on ac.object_id = ic.object_id
        and ac.column_id = ic.column_id
    left join sys.sysindexes si on ac.object_id = si.id and ic.index_id = si.indid
    outer apply (select SUM(rows) from sys.partitions p where ac.object_id = p.object_id and index_id in (0,1)) a(cnt)
    left join sys.sysindexes si_1 on si_1.id = ac.object_id and si_1.indid in (0,1)
    where ac.system_type_id = 36 -- uniqueidentifier
    and ao.type = 'U'            -- user tables only
) dta order by rn asc
go
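
The cursor idea described above could be sketched as follows. This is a simplified illustration, not a tested implementation: it walks uniqueidentifier columns in no particular order (the cursor's SELECT could be replaced by the ordered query above), it assumes SQL Server 2008 or later for the inline DECLARE initializers, and it stops at the first hit on the assumption that one match is enough. The cursor name guid_cols and the variable names are made up for the example.

```sql
-- Sketch only: probe each uniqueidentifier column and stop at the first match.
DECLARE @guid uniqueidentifier = '0167a901-e343-4745-963c-404809b74dd9';
DECLARE @schema sysname, @table sysname, @column sysname;
DECLARE @sql nvarchar(max), @found bit = 0;

DECLARE guid_cols CURSOR LOCAL FAST_FORWARD FOR
    SELECT s.name, t.name, c.name
    FROM sys.columns AS c
        INNER JOIN sys.tables AS t ON t.object_id = c.object_id
        INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    WHERE c.system_type_id = 36;  -- uniqueidentifier

OPEN guid_cols;
FETCH NEXT FROM guid_cols INTO @schema, @table, @column;

WHILE @@FETCH_STATUS = 0 AND @found = 0
BEGIN
    -- Parameterized probe: the GUID is passed as @g, the result comes back in @hit
    SET @sql = N'IF EXISTS (SELECT 1 FROM ' + QUOTENAME(@schema) + N'.' + QUOTENAME(@table)
             + N' WHERE ' + QUOTENAME(@column) + N' = @g) SET @hit = 1 ELSE SET @hit = 0;';

    EXEC sp_executesql @sql,
         N'@g uniqueidentifier, @hit bit OUTPUT',
         @g = @guid, @hit = @found OUTPUT;

    IF @found = 1
        PRINT 'Found in ' + QUOTENAME(@schema) + '.' + QUOTENAME(@table) + '.' + QUOTENAME(@column);
    ELSE
        FETCH NEXT FROM guid_cols INTO @schema, @table, @column;
END

CLOSE guid_cols;
DEALLOCATE guid_cols;
```

If the ID may appear in several tables (for example as a foreign key), drop the @found = 0 condition and fetch unconditionally so that every match is printed.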


Answer 3:

Make DESIRED_ID part of an index.

If there is no index on this table, the database engine performs a table scan and reads every row to check whether DESIRED_ID is like '016%'. Proper indexing generally results in a considerable increase in performance.

CREATE INDEX NameIndex ON TableName(ColumnName ASC) 
INCLUDE (ColumnName2) 

With an index in place, the engine can seek directly to the first key that starts with 016 and stop as soon as it reaches a key that no longer matches (for example one starting with 017, 02, or 1), instead of reading every row.
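
As a rough illustration of that range behaviour, using the BIG_TABLE and DESIRED_ID names from the question: this assumes DESIRED_ID is a character column, the index name is invented, and the exact equivalence of the two predicates can depend on collation. A pattern with a leading wildcard such as '%016%' cannot use an index seek this way.

```sql
-- Sketch only: assumes DESIRED_ID is a character column; the index name is hypothetical.
CREATE INDEX IX_BIG_TABLE_DESIRED_ID ON BIG_TABLE (DESIRED_ID);

-- A prefix search can seek into the index...
SELECT TOP 10 *
FROM BIG_TABLE
WHERE DESIRED_ID LIKE '016%';

-- ...because it is essentially a range over the index keys, roughly:
SELECT TOP 10 *
FROM BIG_TABLE
WHERE DESIRED_ID >= '016'
  AND DESIRED_ID <  '017';
```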

When preparing a dynamic query to search all tables for a sample GUID value, you can use the query below to look up a given column name in a particular table.

select * from sys.columns where name = 'ColumnName' and OBJECT_ID = 
(Select OBJECT_ID From sys.tables Where name = 'Object Name')