I need to look up an ID in a very large database. The ID is:
0167a901-e343-4745-963c-404809b74dd9
The database has hundreds of tables, and millions of rows in the big tables.
I can narrow the date range down to the last 2 or 3 months, but that's about it. I'm looking for any clues as to how to narrow down searches like this.
One thing I'm curious about is whether using LIKE searches helps, i.e. does it help to do something like:
select top 10 *
from BIG_TABLE
where DESIRED_ID like '016%'
Any tips/suggestions are greatly appreciated. The database is being accessed remotely, so that's part of the challenge.
I have a script that I built several years ago for a similar purpose, albeit with text fields. It finds eligible columns and then searches through those columns for the value. Since you don't know which table or column holds the value, you may not be able to do much better than something like this.
You may want to tweak it a bit to include uniqueidentifier columns (if that is actually the data type) or to use an equality test instead of a LIKE search.
If this is something you are going to reuse periodically, you could feed it a list of common tables or columns to look in, so it doesn't take as long to find things.
/*This script will find any text value in the database*/
/*Output will be directed to the Messages window. Don't forget to look there!!!*/
SET NOCOUNT ON
DECLARE @valuetosearchfor varchar(128), @objectOwner varchar(64)
SET @valuetosearchfor = '%putYourGuidHere%' -- a LIKE pattern: keep the surrounding % wildcards
SET @objectOwner = 'dbo'
DECLARE @potentialcolumns TABLE (id int IDENTITY, sql varchar(4000))
INSERT INTO @potentialcolumns (sql)
SELECT
('if exists (select 1 from [' +
[tabs].[table_schema] + '].[' +
[tabs].[table_name] +
'] (NOLOCK) where [' +
[cols].[column_name] +
'] like ''' + @valuetosearchfor + ''' ) print ''SELECT * FROM [' +
[tabs].[table_schema] + '].[' +
[tabs].[table_name] +
'] (NOLOCK) WHERE [' +
[cols].[column_name] +
'] LIKE ''''' + @valuetosearchfor + '''''' +
'''') as 'sql'
FROM information_schema.columns cols
INNER JOIN information_schema.tables tabs
ON cols.TABLE_CATALOG = tabs.TABLE_CATALOG
AND cols.TABLE_SCHEMA = tabs.TABLE_SCHEMA
AND cols.TABLE_NAME = tabs.TABLE_NAME
WHERE cols.data_type IN ('char', 'varchar', 'nchar', 'nvarchar', 'text', 'ntext')
AND tabs.table_schema = @objectOwner
AND tabs.TABLE_TYPE = 'BASE TABLE'
AND (cols.CHARACTER_MAXIMUM_LENGTH >= (LEN(@valuetosearchfor) - 2) OR cols.CHARACTER_MAXIMUM_LENGTH = -1) -- minus 2 for the two % wildcards; -1 means (max)
ORDER BY tabs.table_catalog, tabs.table_name, cols.ordinal_position
DECLARE @count int
SET @count = (SELECT COUNT(*) FROM @potentialcolumns)
PRINT 'Found ' + CAST(@count as varchar) + ' potential columns.'
PRINT 'Beginning scan...'
PRINT ''
PRINT 'These columns contain the values being searched for...'
PRINT ''
DECLARE @iterator int, @sql varchar(4000)
SET @iterator = 1
WHILE @iterator <= (SELECT Max(id) FROM @potentialcolumns)
BEGIN
SET @sql = (SELECT [sql] FROM @potentialcolumns where [id] = @iterator)
IF (@sql IS NOT NULL) and (RTRIM(LTRIM(@sql)) <> '')
BEGIN
--SELECT @sql --use when checking sql output
EXEC (@sql)
END
SET @iterator = @iterator + 1
END
PRINT ''
PRINT 'Scan completed'
If that looks wonky, the script is executing a statement like this
if exists (select 1 from [schema].[table_name] (NOLOCK)
where [column_name] LIKE '%yourValue%')
begin
print 'select * from [schema].[table_name] (NOLOCK) where [column_name] LIKE ''%yourValue%'''
end
...and just replacing [schema], [table_name], [column_name] and %yourValue% in a loop.
It's filtering on:
- tables in a specific schema (filter can be removed)
- only tables, not views (can be adjusted)
- only columns long enough to hold the search value
- the (n)char/(n)varchar/(n)text data types (add or change these, but be cognizant of data type conversion)
Lastly, output does not go to the results grid. Check the Messages window (where you see "N rows affected").
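The uniqueidentifier/equality tweak and the shortlist idea mentioned at the start of this answer could look roughly like this. It is a minimal sketch, assuming the ID is stored in uniqueidentifier columns; the shortlist in @candidates (Orders, Invoices) is hypothetical, so replace it with the tables you actually suspect. It generates equality searches instead of LIKE searches and collects them for review rather than executing them.

```sql
SET NOCOUNT ON;

-- Hypothetical shortlist of tables the ID usually turns up in; replace with your own.
DECLARE @candidates TABLE (table_name sysname PRIMARY KEY);
INSERT INTO @candidates (table_name) VALUES (N'Orders'), (N'Invoices');

DECLARE @searches TABLE (search_sql nvarchar(max));

-- One equality search per uniqueidentifier column in the shortlisted base tables.
INSERT INTO @searches (search_sql)
SELECT N'SELECT * FROM ' + QUOTENAME(c.TABLE_SCHEMA) + N'.' + QUOTENAME(c.TABLE_NAME)
     + N' WHERE ' + QUOTENAME(c.COLUMN_NAME)
     + N' = ''0167a901-e343-4745-963c-404809b74dd9'';'
FROM INFORMATION_SCHEMA.COLUMNS AS c
INNER JOIN INFORMATION_SCHEMA.TABLES AS t
    ON t.TABLE_CATALOG = c.TABLE_CATALOG
   AND t.TABLE_SCHEMA = c.TABLE_SCHEMA
   AND t.TABLE_NAME = c.TABLE_NAME
WHERE c.DATA_TYPE = 'uniqueidentifier'
  AND t.TABLE_TYPE = 'BASE TABLE'
  AND c.TABLE_NAME IN (SELECT table_name FROM @candidates);

-- Review the generated statements, then run the ones you want.
SELECT search_sql FROM @searches;
```

QUOTENAME is used instead of hand-built brackets so that table or column names containing ] do not break the generated SQL.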
First of all, what is the requirement? Why do you need a specific value from the whole database? It looks like a one-time job to find the value and then take some action based on it, but it can be time- and resource-consuming.
Anyway, it looks like a GUID column. There is no way to speed it up unless all the GUID columns have indexes.
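To see why the LIKE '016%' from the question does not help on such a column, here is a minimal sketch, assuming DESIRED_ID really is a uniqueidentifier; #BIG_TABLE, the CREATED_AT date column and the index name are hypothetical stand-ins. An equality test on the full GUID can use an index seek, whereas LIKE needs a string, so every value is converted first and the engine scans instead. SQL Server also does not sort uniqueidentifier values in string order, so a '016' prefix would not map to one contiguous index range anyway.

```sql
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..#BIG_TABLE') IS NOT NULL DROP TABLE #BIG_TABLE;

-- Hypothetical stand-in for BIG_TABLE; assumes DESIRED_ID is a uniqueidentifier.
CREATE TABLE #BIG_TABLE (
    DESIRED_ID uniqueidentifier NOT NULL,
    CREATED_AT datetime2 NOT NULL  -- hypothetical date column
);
CREATE INDEX IX_BIG_TABLE_DESIRED_ID ON #BIG_TABLE (DESIRED_ID);

INSERT INTO #BIG_TABLE (DESIRED_ID, CREATED_AT)
VALUES ('0167a901-e343-4745-963c-404809b74dd9', SYSDATETIME()),
       (NEWID(), DATEADD(month, -6, SYSDATETIME()));

-- Equality on the full GUID: the string is converted to uniqueidentifier once,
-- so the predicate is sargable and can seek IX_BIG_TABLE_DESIRED_ID.
SELECT DESIRED_ID, CREATED_AT
FROM #BIG_TABLE
WHERE DESIRED_ID = '0167a901-e343-4745-963c-404809b74dd9';

-- LIKE works on strings, so each GUID is converted first; that is not sargable
-- and the engine has to scan.
SELECT TOP 10 DESIRED_ID, CREATED_AT
FROM #BIG_TABLE
WHERE CONVERT(char(36), DESIRED_ID) LIKE '016%';

-- If DESIRED_ID had no index but CREATED_AT did, a date range like the
-- question's "last 2 or 3 months" would limit how much has to be read.
SELECT DESIRED_ID, CREATED_AT
FROM #BIG_TABLE
WHERE CREATED_AT >= DATEADD(month, -3, SYSDATETIME())
  AND DESIRED_ID = '0167a901-e343-4745-963c-404809b74dd9';
```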
Here is a small query that generates a SELECT statement for every table that has a GUID (uniqueidentifier) column. If the value is stored in a varchar column instead, it is much harder: you would have to write the query for each column of each table, and while you can generate that, I don't think it would be efficient.
Most importantly, the output is ordered as follows. Tables where the GUID column is the leading key of an index are listed first.
Then tables are listed in ascending order of data pages, so the smallest tables are searched first and the queries use minimal resources. If your GUID value is in one of the first few tables, the search will be very fast; if it is in the last table, it will take time in proportion to the size of the tables, which could be a lot.
Also, declare a cursor on this query, execute the statements one by one, and as soon as you find the value, exit the cursor loop, since a GUID is a unique value. This is much more efficient.
select * from (
select 'select ' + ac.name +' from ' + OBJECT_SCHEMA_NAME(ac.object_id) + '.' + OBJECT_NAME(ac.object_id) + ' where ' + ac.name + '=''0167a901-e343-4745-963c-404809b74dd9''' as querytext
--,*
,isnull(cnt,0) as numberofrows,
-- key_ordinal = 1 means the GUID column is the leading key of an index
ROW_NUMBER() over(order by case when ic.key_ordinal = 1 then 0 else 1 end asc, isnull(si.dpages,si_1.dpages) asc) as rn,isnull(si.dpages,si_1.dpages) datapages
from sys.all_columns ac
inner join sys.all_objects ao on ac.object_id = ao.object_id
left join sys.index_columns ic on ac.object_id=ic.object_id
and ac.column_id =ic.column_id
left join sys.sysindexes si on ac.object_id = si.id and ic.index_id=si.indid
outer apply (select SUM(rows) from sys.partitions p where ac.object_id = p.object_id and index_id in (0,1) ) a(cnt)
left join sys.sysindexes si_1 on si_1.id =ac.object_id and si_1.indid in (0,1)
where system_type_id =36
and ao.type ='U'
) dta order by rn asc
go
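The cursor loop suggested above could look roughly like this. It is a sketch, not the answerer's own code: it assumes the ID lives in uniqueidentifier columns (system_type_id 36, as in the query above), orders tables by row count from sys.partitions instead of data pages, passes the GUID to sp_executesql as a parameter, and stops at the first column that contains it. The cursor name guid_columns is arbitrary.

```sql
SET NOCOUNT ON;
-- The GUID to find; assumes the target columns are uniqueidentifier.
DECLARE @id uniqueidentifier = '0167a901-e343-4745-963c-404809b74dd9';
DECLARE @sql nvarchar(max), @location nvarchar(800), @found bit = 0;

DECLARE guid_columns CURSOR LOCAL FAST_FORWARD FOR
    SELECT QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.' + QUOTENAME(t.name)
               + N'.' + QUOTENAME(c.name) AS location,
           N'IF EXISTS (SELECT 1 FROM '
               + QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.' + QUOTENAME(t.name)
               + N' WHERE ' + QUOTENAME(c.name) + N' = @id) SET @found = 1;' AS search_sql
    FROM sys.tables AS t
    INNER JOIN sys.columns AS c ON c.object_id = t.object_id
    -- Row count of the heap (index_id 0) or clustered index (index_id 1).
    OUTER APPLY (SELECT SUM(p.rows) AS row_count
                 FROM sys.partitions AS p
                 WHERE p.object_id = t.object_id AND p.index_id IN (0, 1)) AS r
    WHERE c.system_type_id = 36  -- uniqueidentifier
    ORDER BY r.row_count;        -- smallest tables first

OPEN guid_columns;
FETCH NEXT FROM guid_columns INTO @location, @sql;
-- Keep going until the cursor is exhausted or the GUID has been found.
WHILE @@FETCH_STATUS = 0 AND @found = 0
BEGIN
    EXEC sp_executesql @sql,
         N'@id uniqueidentifier, @found bit OUTPUT',
         @id = @id, @found = @found OUTPUT;
    IF @found = 1
        PRINT N'Found in ' + @location;
    ELSE
        FETCH NEXT FROM guid_columns INTO @location, @sql;
END;
CLOSE guid_columns;
DEALLOCATE guid_columns;

IF @found = 0
    PRINT N'Not found in any uniqueidentifier column.';
```

Like the script in the earlier answer, the result goes to the Messages window rather than the results grid.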
Make DESIRED_ID part of an index.
If there is no index on this table, the database engine performs a table scan and reads every row to check whether DESIRED_ID is like '016%'. Proper indexing usually results in a considerable increase in performance:
CREATE INDEX NameIndex ON TableName(ColumnName ASC)
INCLUDE (ColumnName2)
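With the index in place, a LIKE '016%' search can seek straight to the first entry that starts with 016 and read forward only until the entries stop matching (before 017, 02, 1 and so on). With TOP 10, it also stops as soon as it has found ten rows. Note that this range logic only applies to character columns: SQL Server does not sort uniqueidentifier values in string order.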
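A small self-contained sketch of that behaviour, assuming DESIRED_ID is stored as varchar; the temp table #IdsAsText, its Payload column and the sample values are made up for illustration. The constant-prefix search can seek on the index, while a leading wildcard forces every entry to be examined.

```sql
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..#IdsAsText') IS NOT NULL DROP TABLE #IdsAsText;

-- Hypothetical demo table: assumes the ID is stored as text, not uniqueidentifier.
CREATE TABLE #IdsAsText (
    DESIRED_ID varchar(36) NOT NULL,
    Payload int NOT NULL
);
-- Same shape as the CREATE INDEX above: key column plus an INCLUDEd column.
CREATE INDEX IX_IdsAsText_DESIRED_ID ON #IdsAsText (DESIRED_ID ASC) INCLUDE (Payload);

INSERT INTO #IdsAsText (DESIRED_ID, Payload)
VALUES ('0167a901-e343-4745-963c-404809b74dd9', 1),
       ('0168ffff-0000-0000-0000-000000000000', 2),
       ('a0167000-0000-0000-0000-000000000000', 3);

-- Constant prefix: sargable, so the engine can seek to the first key starting
-- with '016' and stop once keys no longer match (matches rows 1 and 2).
SELECT TOP 10 DESIRED_ID, Payload
FROM #IdsAsText
WHERE DESIRED_ID LIKE '016%';

-- Leading wildcard: not sargable, so every index entry is examined
-- (matches rows 1, 2 and 3).
SELECT DESIRED_ID, Payload
FROM #IdsAsText
WHERE DESIRED_ID LIKE '%016%';
```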
When preparing a dynamic query to find a sample GUID value across all tables, you can use the query below to look up a column by name in a particular table.
select * from sys.columns where name = 'ColumnName' AND OBJECT_ID =
(Select OBJECT_ID From sys.tables Where name = 'Object Name')