I have a SQL table with the two following columns:
FORMAT Sample
GT:AD:DP:GQ:PL 0/0:233,0:233:99:0,120,1800
GT:AD:DP:GQ:PL 0/1:101,61:220:99:835,0,1859
GT:AD:DP:GQ:PL 0/0:172,0:172:99:0,120,1800
GT:AD:DP:GQ:PL 0/0:216,0:216:99:0,120,1800
GT:AD:DP:GQ:PL 0/0:216,0:216:99:0,120,1800
GT:AD:DP:GQ:PGT:PID:PL 0/1:185,232:417:99:0|1:8029494_T_G:8670,0,6429
GT:AD:DP:GQ:PL 0/0:367,0:367:99:0,120,1800
GT:AD:DP:GQ:PGT:PID:PL 0/1:150,198:348:99:0|1:8029494_T_G:7930,0,5677
GT:AD:DP:GQ:PGT:PID:PL 0/1:148,196:344:99:0|1:8029494_T_G:7876,0,5652
GT:AD:DP:GQ:PGT:PID:PL 0/0:148,0:344:99:0|1:8029494_T_G:7876,8334,14591
GT:AD:DP:GQ:PGT:PID:PL 0/0:148,0:344:99:0|1:8029494_T_G:7876,8334,14591
The FORMAT column specifies the IDs for the fields that are given in the following column splitted by ":".
I would like to extract specific fields from the second column based on the ID/position from the FORMAT column, i.e. AD (2nd), DP (3rd) or GQ (4th).
I was able to extract the AD field with the following code:
SELECT SUBSTRING(Sample, CHARINDEX(':',Sample)+1, CHARINDEX(':',Sample,5)-5) FROM Table 1;
The problem is that I am not able to extract the fields DP or GQ, since the length of the different fields is not always the same one and I cannot specify which should be the starting position to search for the following ":" location.
I also tried to use the Split function from this website:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648
The problem is that I do not know how to declare a column as a variable so that I can extract the required field for every single row of the table.
The desired output for the [Sample] column should look like this:
GT AD DP GQ
0/0 233,0 233 99
0/1 101,61 220 99
0/0 172,0 172 99
0/0 216,0 216 99
0/0 216,0 216 99
0/1 185,232 417 99
0/0 367,0 367 99
0/1 150,198 348 99
0/1 148,196 344 99
0/0 148,0 344 99
0/0 148,0 344 99
Any help would be appreciated,
Thanks,
Perhaps a little XML as the parser
Example
Returns
Or a Simple version
Or if Open to a UDF
Take a peek at TSQL/SQL Server - table function to parse/split delimited string to multiple/separate columns