How to filter a set of (int, str) tuples, to retur

Suppose I have a set of tuples representing URLS with "scores":

{(0.75, 'http://www.foo.com'), (0.33, 'http://www.bar.com'), (0.5, 'http://www.foo.com'), (0.66, 'http://www.bar.com')}.

What is a concise way for me to filter out duplicate URLS, returning only the URL with the lowest score? That is, from the example set above, I want to get the following set, where each URL appears only once, with the lowest corresponding score from the original set:

{(0.5, 'http://www.foo.com'),(0.33, 'http://www.bar.com')}

I came up with the following solution:

from collections import defaultdict

seen = defaultdict(lambda:1)
for score, url in s:
    if score < seen[url]:
        seen[url] = score

filtered = {(v,k) for k,v in seen.items()}

... but I feel like there is probably some simpler and more efficient way to do this without using the intermediary dict to keep track of the max element, and then regenerate the set from that. What is the best way to filter a set of tuples by the min/max of the first element?

标签： python filter set tuples min

4条回答

姐就是有狂的资本

2楼-- · 2020-03-26 05:15

A very simple approach:

L=sorted(s,key=lambda t: (t[1],t[0]))
[L[0]] + [L[i] for i in range(1,len(L)) if L[i][1]!=L[i-1][1]]

0人赞添加讨论(0) 举报

Deceive 欺骗

3楼-- · 2020-03-26 05:20

Another solution:

seen = {}
for score, url in s:
    if seen.setdefault(url, score) > score:
        seen[url] = score
filtered = {(v,k) for k,v in seen.items()}
print(filtered)

0人赞添加讨论(0) 举报

Deceive 欺骗

4楼-- · 2020-03-26 05:21

You've already implemented the simplest approach I can think of. The only change I'd make would be to the loop—a slightly more concise version is using min.

seen = defaultdict(lambda: 1)  # `lambda: float('inf')` if scores can be > 1
for score, url in s:
    seen[url] = min(seen[url], score)

{(v,k) for k,v in seen.items()}
# {(0.33, 'http://www.bar.com'), (0.5, 'http://www.foo.com')}

If you really want a shorter solution, like I said, it isn't the simplest approach, but it is a one liner. Most of the challenge is interchanging the URL and the score so you can use the URL as a key when dropping duplicates. It goes without saying that sorting is a pre-condition here (that's why I don't like this solution as much as the one above).

{(v, k) for k, v in dict(sorted(((v, k) for k, v in s), reverse=True)).items()}
# {(0.33, 'http://www.bar.com'), (0.5, 'http://www.foo.com')}

This solution becomes so much shorter if s looks like this:

s2 = {(v,k) for k, v in s}
s2 
# {('http://www.bar.com', 0.33), ('http://www.bar.com', 0.66), ...}

You'd only then need to do

list(dict(sorted(s2, reverse=True)).items())
# [('http://www.foo.com', 0.5), ('http://www.bar.com', 0.33)]

0人赞添加讨论(0) 举报

仙女界的扛把子

5楼-- · 2020-03-26 05:24

Without any tricks or additional code for reuse you are pretty close. I came up with something similar that's a little cleaner in my opinion:

seen = set()
filtered = []
for score, url in sorted(urls):
    if url in seen:
        continue
    filtered.append((score, url))
    seen.add(url)

You can also utilize other libraries, such as boltons. You can use the unique method like so:

import operator
from boltons.iterutils import unique
filtered = unique(sorted(urls), key=operator.itemgetter(1))

Update: if the tuples have all relevant scoring as the first elements, this solution would work for arbitrary length of tuples (assuming you change the key function)

0人赞添加讨论(0) 举报

How to filter a set of (int, str) tuples, to retur

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间