Python, comparing sublists and making a list

Posted 2019-02-25 22:45

Question:

I have a list that contains a lot of sublists, e.g.:

mylst = [[1, 343, 407, 433, 27], 
         [1, 344, 413, 744, 302], 
         [1, 344, 500, 600, 100], 
         [1, 344, 752, 1114, 363], 
         [1, 345, 755, 922, 168], 
         [2, 345, 188, 1093, 906], 
         [2, 346, 4, 950, 947], 
         [2, 346, 953, 995, 43], 
         [3, 346, 967, 1084, 118], 
         [3, 347, 4, 951, 948], 
         [3, 347, 1053, 1086, 34], 
         [3, 349, 1049, 1125, 77], 
         [3, 349, 1004, 1124, 120], 
         [3, 350, 185, 986, 802], 
         [3, 352, 1018, 1055, 38]]

I want to categorize this list and build another list in three steps. First, I want to compare sublists only when the first item of each sublist is the same, e.g. mylist[a][0] == 1. Second, I compare the second items: if the difference between the second item of a sublist and the second item of a following sublist is under 2, I then calculate the difference between their third items and between their fourth items. Third, if either of those differences is under 10, I want to append the index of the sublist.

The result I want should look like this: [0, 1, 3, 4, 6, 7, 10, 11, 12]

The following are my naive attempts to do this.

def seg(mylist) :
    Segments = []
    for a in range(len(mylist)-1) :
        for index, value in enumerate (mylist) :
            if mylist[a][0] == 1 :
                if abs(mylist[a][1] - mylist[a+1][1]) <= 2 :
                    if (abs(mylist[a][2] - mylist[a+1][2]) <= 10 or 
                        abs(mylist[a][3] - mylist[a+1][3]) <= 10) :
                        Segments.append(index)
    return Segments

or

def seg(mylist) :
    Segments= []
    for index, value in enumerate(mylist) :
        for a in range(len(mylist)-1) :
            if mylist[a][0] == 1 :
                try :
                    if abs(mylist[a][1]-mylist[a+1][1]) <= 2 :
                        if (abs(mylist[a][2]-mylist[a+1][2]) <= 10 or
                            abs(mylist[a][3] - mylist[a+1][3]) <= 10) :
                            Segments.append(index)
                except IndexError :
                    if abs(mylist[a][1]-mylist[a+1][1]) <= 2 :
                        if (abs(mylist[a][2]-mylist[a+1][2]) <= 10 or
                            abs(mylist[a][3] - mylist[a+1][3]) <= 10):
                            Segments.append(index)
    return Segments

This code doesn't look nice at all, and the results are not what I intended. In the second version I wrote try and except to handle the index error (list index out of range); initially I used a 'while' loop instead of a 'for' loop.

What should I do to get the result I want? How can I make this code more 'Pythonic'? Any ideas would be great, and many thanks in advance.

Answer 1:

You will have to remove the duplicate indexes, but this should be a lot more efficient:

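gr = []
it = iter(mylst)
prev = next(it)  # first sublist; the loop starts from the second one

# ind is the index of prev, so ele sits at ind + 1
for ind, ele in enumerate(it):
    # same first item, and second items at most 2 apart
    if ele[0] == prev[0] and abs(ele[1] - prev[1]) <= 2:
        # third or fourth items less than 10 apart
        if any(abs(ele[i] - prev[i]) < 10 for i in (2, 3)):
            gr.extend((ind, ind+1))
    prev = ele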

Based on your logic, 6 and 7 should not appear, as they don't meet the criteria:

     [2, 346, 953, 995, 43], 
     [3, 346, 967, 1084, 118], 

Also, for 10 to appear, the check should be <= 2, not < 2 as in your description.
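
To make the threshold point concrete, here is a small check on the two sublists at indexes 10 and 11 of the question's data. The variable names are only for illustration.

```python
# Sublists at indexes 10 and 11 of mylst, copied from the question.
row_10 = [3, 347, 1053, 1086, 34]
row_11 = [3, 349, 1049, 1125, 77]

# Difference between the second items: 349 - 347
gap = abs(row_11[1] - row_10[1])

strict = gap < 2      # "under 2", as the question words it
inclusive = gap <= 2  # what is needed for index 10 to be reported

print(gap, strict, inclusive)  # 2 False True
```

With a strict `< 2` this pair is rejected, so index 10 would not be reported.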

You could use an OrderedDict to remove the dupes and keep the order:

from collections import OrderedDict

print(list(OrderedDict.fromkeys(gr)))
[0, 1, 3, 4, 10, 11, 12]
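
For convenience, the adjacent-pair check and the de-duplication can be wrapped in one function. This is only a sketch of the same idea: the name `adjacent_segments` and the parameters `max_gap` and `max_diff` are made up for illustration, and it assumes Python 3.7+, where a plain dict keeps insertion order and can stand in for OrderedDict.

```python
mylst = [[1, 343, 407, 433, 27],
         [1, 344, 413, 744, 302],
         [1, 344, 500, 600, 100],
         [1, 344, 752, 1114, 363],
         [1, 345, 755, 922, 168],
         [2, 345, 188, 1093, 906],
         [2, 346, 4, 950, 947],
         [2, 346, 953, 995, 43],
         [3, 346, 967, 1084, 118],
         [3, 347, 4, 951, 948],
         [3, 347, 1053, 1086, 34],
         [3, 349, 1049, 1125, 77],
         [3, 349, 1004, 1124, 120],
         [3, 350, 185, 986, 802],
         [3, 352, 1018, 1055, 38]]


def adjacent_segments(rows, max_gap=2, max_diff=10):
    """Return indexes of rows that match their immediate neighbour."""
    hits = []
    # zip(rows, rows[1:]) pairs each row with the one right after it
    for i, (prev, cur) in enumerate(zip(rows, rows[1:])):
        same_group = prev[0] == cur[0]
        close_second = abs(cur[1] - prev[1]) <= max_gap
        close_third_or_fourth = any(
            abs(cur[k] - prev[k]) < max_diff for k in (2, 3)
        )
        if same_group and close_second and close_third_or_fourth:
            hits.extend((i, i + 1))
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(hits))


print(adjacent_segments(mylst))  # [0, 1, 3, 4, 10, 11, 12]
```

Like the loop above, this only compares each sublist with the one directly after it.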


Answer 2:

This seems to work for me. I'm not sure it's any more Pythonic, though, and it loops through the list multiple times, so there are definitely things you can do to optimize it.

def seg(mylist):
    # use a set so that duplicate indexes are ignored
    segments = set()

    for entry_index in range(len(mylist)):
        for c in range(len(mylist)):
            first = mylist[entry_index]
            comparison = mylist[c]

            # ignore comparing the same items
            if entry_index == c:
                continue

            # ignore cases where the first item does not match
            if first[0] != comparison[0]:
                continue

            # ignore cases where the second item differs by more than 2
            if abs(first[1] - comparison[1]) > 2:
                continue

            # add cases where the third or fourth items differ by less than 10
            if abs(first[2] - comparison[2]) < 10 or abs(first[3] - comparison[3]) < 10:
                segments.add(entry_index)

            elif abs(first[2] - comparison[3]) < 10 or abs(first[3] - comparison[2]) < 10:
                segments.add(entry_index)

    return segments
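
Since a set has no guaranteed order, the result may need sorting before it is compared with the expected list. Below is a more compact sketch of the same all-pairs idea using itertools.combinations. The name `seg_all_pairs` is made up for illustration, and it relies on the checks being symmetric, so each unordered pair only needs to be visited once.

```python
from itertools import combinations

mylst = [[1, 343, 407, 433, 27],
         [1, 344, 413, 744, 302],
         [1, 344, 500, 600, 100],
         [1, 344, 752, 1114, 363],
         [1, 345, 755, 922, 168],
         [2, 345, 188, 1093, 906],
         [2, 346, 4, 950, 947],
         [2, 346, 953, 995, 43],
         [3, 346, 967, 1084, 118],
         [3, 347, 4, 951, 948],
         [3, 347, 1053, 1086, 34],
         [3, 349, 1049, 1125, 77],
         [3, 349, 1004, 1124, 120],
         [3, 350, 185, 986, 802],
         [3, 352, 1018, 1055, 38]]


def seg_all_pairs(rows):
    """Compare every pair of rows and return the matching indexes, sorted."""
    found = set()
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        # skip pairs whose first items differ or whose second items
        # are more than 2 apart
        if a[0] != b[0] or abs(a[1] - b[1]) > 2:
            continue
        # third vs third, fourth vs fourth
        direct = abs(a[2] - b[2]) < 10 or abs(a[3] - b[3]) < 10
        # third vs fourth and fourth vs third, as in the elif branch above
        crossed = abs(a[2] - b[3]) < 10 or abs(a[3] - b[2]) < 10
        if direct or crossed:
            found.update((i, j))
    return sorted(found)


print(seg_all_pairs(mylst))  # [0, 1, 3, 4, 6, 7, 8, 10, 11, 12]
```

On the question's data, 6 and 7 are picked up by the crossed check (950 against 953). Index 8 also appears, because its fourth item (1084) is within 10 of the fourth item of index 10 (1086) even though the two sublists are not adjacent.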