random generation of unique combination from two c

I have two columns in a large file, say

pro1 lig1
pro2 lig2
pro3 lig3
pro4 lig1
.....

The second column is redundant (it contains repeated values). I want to generate new random combinations, twice as many as the input, none of which matches a given combination. For example:

pro1 lig2
pro1 lig4
pro2 lig1
pro2 lig3
pro3 lig4
pro3 lig2
pro4 lig2
pro4 lig3
.....

Thanks.

Tags: python linux excel shell random-sample

4 answers

ら.Afraid

#2 · 2019-08-09 01:36

If you want exactly two results for each value in column one, I'd brute force the non-matching part, with something like this:

import random

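def gen_random_data(inputfile):
    with open(inputfile, "r") as f:
        # skip blank lines so the unpacking into two columns does not fail
        column_a, column_b = zip(*(line.split() for line in f if line.strip()))

    # sample from the unique values: column_b contains repeats, and sampling
    # it directly could return the same value twice
    choices = sorted(set(column_b))  # needs at least 3 distinct values

    for a, b in zip(column_a, column_b):
        r = random.sample(choices, 2)
        while b in r: # resample if we hit a duplicate of the original pair
            r = random.sample(choices, 2)

        yield a, r[0]
        yield a, r[1]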
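
The resampling loop can also be avoided by filtering the candidates first. The following is only a minimal sketch under a few assumptions: the pairs are already in memory as a list of tuples, and the name sample_new_pairs and its parameters k and rng are made up for illustration. It records every value already paired with each left-hand item, so it also copes with a left-hand item that appears on several input lines.

```python
import random
from collections import defaultdict

def sample_new_pairs(pairs, k=2, rng=random):
    """Yield k new (left, right) pairs per left value, avoiding every input pair."""
    taken = defaultdict(set)  # left value -> right values it is already paired with
    for left, right in pairs:
        taken[left].add(right)
    all_rights = sorted({right for _, right in pairs})  # unique right-hand values

    for left in sorted(taken):
        # only values not yet paired with this left item are eligible
        candidates = [r for r in all_rights if r not in taken[left]]
        if len(candidates) < k:
            raise ValueError("not enough unused values for %r" % (left,))
        for right in rng.sample(candidates, k):
            yield left, right

# The four sample lines from the question (hypothetical in-memory form)
pairs = [("pro1", "lig1"), ("pro2", "lig2"), ("pro3", "lig3"), ("pro4", "lig1")]
new_pairs = list(sample_new_pairs(pairs))
```

Because each draw comes from a pre-filtered list, the function either returns k distinct new partners per item or raises an error, rather than looping forever when too few values are available.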


小情绪 Triste *

#3 · 2019-08-09 01:39

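import random

c = """pro1 lig1
pro2 lig2
pro3 lig3
pro4 lig4"""
pairs = set()   # the original (left, right) combinations
set_b = set()   # unique values of the second column
for line in c.split("\n"):
    left, right = line.split(" ")
    pairs.add((left, right))
    set_b.add(right)

for left in sorted(set(l for l, _ in pairs)):
    # exclude values already paired with this left item; random.sample
    # needs a sequence, so build a sorted list rather than passing a set
    candidates = sorted(r for r in set_b if (left, r) not in pairs)
    for right in random.sample(candidates, 2):
        print(left, right)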

Example output (varies between runs):

pro1 lig2
pro1 lig4
pro2 lig4
pro2 lig3
pro3 lig1
pro3 lig4
pro4 lig2
pro4 lig1


贼婆χ

#4 · 2019-08-09 01:47

Using some sorting, filtering, chaining and list comprehensions, you can try:

from itertools import chain
import random
random.seed(12345) # Only to make the output reproducible; remove in production code

words = [x.split() for x in """pro1 lig1
pro2 lig2
pro3 lig3
pro4 lig4""".split("\n")]

col1 = [w1 for w1,w2 in words]
col2 = [w2 for w1,w2 in words]

col1tocol2 = dict(words)        

combinations = chain(*[
                    [(w1, w2) for w2 in 
                        sorted(
                            filter(
                                lambda x: x != col1tocol2[w1], 
                                col2),
                            key=lambda x: random.random())
                            [:2]]
                    for w1 in col1])

for w1,w2 in combinations:
    print(w1, w2)

This gives:

pro1 lig3
pro1 lig2
pro2 lig4
pro2 lig1
pro3 lig4
pro3 lig2
pro4 lig3
pro4 lig1

The main trick is to use a random function as key for sorted.
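
To isolate that trick, here is a small sketch (the list ligands and the seeded generator rng are illustrative assumptions, not part of the answer above). Sorting with a random key amounts to a shuffle, so taking the first two elements picks two distinct items; random.sample from the standard library draws them directly.

```python
import random

ligands = ["lig1", "lig2", "lig3", "lig4"]
rng = random.Random(0)  # seeded only so the sketch is repeatable

# Random-key sort: every element gets a fresh random key, then keep the first two.
by_random_key = sorted(ligands, key=lambda _: rng.random())[:2]

# Standard-library alternative: draw two distinct elements directly.
by_sample = rng.sample(ligands, 2)
```

Both approaches return two different elements of the list; random.sample skips the full sort, which may matter for long lists.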


姐就是有狂的资本

#5 · 2019-08-09 01:47

Say you have two columns:

col1 = ['pro1', 'pro2', ...]
col2 = ['lig1', 'lig2', ...]

Then the most straightforward way to do this would be to use itertools.product and random.sample as below:

from itertools import product
from random import sample

N = 100 #How many pairs to generate

randomPairs = sample(list(product(col1, col2)), N)

If col1 and col2 contain duplicate items, you can extract the unique items by doing set(col1) and set(col2).

Note that list(product(...)) builds a list of A * B elements, where A and B are the numbers of unique items in the two columns (not to be confused with N above, the number of pairs to draw). This may cause memory problems if A * B ends up being a very large number.
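
If the full product is too large to build, one possible workaround is rejection sampling: draw one item from each column at random and keep the pair only if it is new. The sketch below is an illustration under stated assumptions, not part of the approach above. The function name sample_pairs and its exclude parameter are hypothetical; exclude is used here to skip the original combinations, which the question asks for.

```python
import random

def sample_pairs(col1, col2, n, exclude=(), rng=random):
    """Draw n distinct (left, right) pairs without building the full product."""
    left = sorted(set(col1))    # unique items of each column
    right = sorted(set(col2))
    left_set, right_set = set(left), set(right)
    excluded = set(exclude)

    # number of pairs that can actually be drawn
    blocked = sum(1 for l, r in excluded if l in left_set and r in right_set)
    if n > len(left) * len(right) - blocked:
        raise ValueError("n exceeds the number of available pairs")

    chosen = set()
    while len(chosen) < n:
        pair = (rng.choice(left), rng.choice(right))
        if pair not in excluded:
            chosen.add(pair)    # a set silently ignores repeated draws
    return sorted(chosen)

col1 = ["pro1", "pro2", "pro3", "pro4"]
col2 = ["lig1", "lig2", "lig3", "lig4"]
random_pairs = sample_pairs(col1, col2, 8, exclude=zip(col1, col2))
```

This uses memory proportional to n rather than to the size of the product. It works best when n is small compared with the number of available pairs; as n approaches that number, most draws are rejected and the loop slows down.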

