I'm having some troubles to deal with a very big table in my database. Before to talk about the problem, let's talk about what i want to achieve.
I have two source tables :
- Source 1:
SALES_MAN (ID_SMAN, SM_LATITUDE, SM_LONGITUDE)
Source 2:
CLIENT (ID_CLIENT, CLATITUDE, CLONGITUDE)
Target:
DISTANCE (ID_SMAN, ID_CLIENT, SM_LATITUDE, SM_LONGITUDE, CLATITUDE, CLONGITUDE, DISTANCE)
The idea is to find the top N nearest SALES_MAN
for every client using a ROW_NUMBER
in the target table.
What I'm doing currently is calculating the distance between every client and every sales man :
INSERT INTO DISTANCE ([ID_SMAN], [ID_CLIENT], [DISTANCE],
[SM_LATITUDE], [SM_LONGITUDE], [CLATITUDE], [CLONGITUDE])
SELECT
[ID_SMAN], [ID_CLIENT],
geography::STGeomFromText('POINT('+IND_LATITUDE+' '+IND_LONGITUDE+')',4326).STDistance(geography::STGeomFromText('POINT('+DLR.[DLR_N_GPS_LATTITUDE]+' '+DLR.[DLR_N_GPS_LONGITUDE]+')',4326))/1000 as distance,
[SM_LATITUDE], [SM_LONGITUDE], [CLATITUDE], [CLONGITUDE]
FROM
[dbo].[SALES_MAN], [dbo].[CLIENT]
The DISTANCE
table contains approximately 1 milliards rows.
The second step to get my 5 nearest sales man per client is to run this query :
SELECT *
FROM
(SELECT
*,
ROW_NUMBER() OVER(PARTITION BY ID_CLIENT ORDER BY DISTANCE) rang
FROM DISTANCE) TAB
WHERE rang < 6
The last query is really a consuming one. So to avoid the SORT operator I tried to create an sorted non clustered index in DISTANCE and ID_CLIENT but it did not work. I also tried to include all the needed columns in the both indexes.
But when I created a clustered index on DISTANCE
and keep the nonclustered sorted index in the ID_CLIENT
the things went better.
So what a nonclustered sorting index is not working in this case?
But when I use the clustered index, I have other problem in loading data and I'm kind of forced to delete it before starting the loading process.
So what do you think? And how we can deal with this kind of tables to be able to select, insert or update data without having performance issues ?
Many thanks