This is my sample data set...
CREATE TABLE blockhashtable (
id SERIAL PRIMARY KEY
,pos int
,filehash varchar(35)
,blockhash varchar(130)
);
insert into blockhashtable
(pos,filehash,blockhash) values
(1, "randommd51", "randstr1"),
(2, "randommd51", "randstr2"),
(3, "randommd51", "randstr3"),
(1, "randommd52", "randstr2"),
(2, "randommd52", "randstr2"),
(3, "randommd52", "randstr1"),
(4, "randommd52", "randstr7"),
(1, "randommd53", "randstr2"),
(2, "randommd53", "randstr1"),
(3, "randommd53", "randstr2"),
(4, "randommd53", "randstr3"),
(1, "randommd54", "randstr4"),
(2, "randommd54", "randstr55");
...and fiddle of same http://sqlfiddle.com/#!9/e5b201/14
This is my current SQL query and output:
select pos,filehash,avg( (blockhash in ('randstr1', 'randstr2', 'randstr3') )) as matching_ratio from blockhashtable group by filehash;
pos filehash matching_ratio
1 randommd51 1
1 randommd52 0.75
1 randommd53 1
1 randommd54 0
My expected output is something like this this:
pos filehash matching_ratio
1,2 randommd51 1
1,3 randommd52 0.5
1,2,4 randommd53 0.75
0 randommd54 0
The pos
in last row
can be 1
also, I can remove it using a custom condition in python later.
Basically, in my python list, randstr2
only repeat one time, so I want only maximum one match found in the SQL query. That's why matching_ratio
is different in my expected output.
I don't see how your result set corresponds to your data set, but you seem to be after something like this...