Understanding this line: list_of_tuples = [(x,y) f

Posted 2019-04-24 10:15

Question:

As you've already understood I'm a beginner and am trying to understand what the "Pythonic way" of writing this function is built on. I know that other threads might include a partial answer to this, but I don't know what to look for since I don't understand what is happening here.

A friend sent me this line to improve my code, which is:

import numpy as np

#load_data:
def load_data():
    data_one = np.load ('/Users/usr/... file_name.npy') 
    list_of_tuples = []
    for x, y, label in data_one:
        list_of_tuples.append( (x,y) )
    return list_of_tuples

print(load_data())

The "improved" version:

import numpy as np

#load_data:
def load_data():
    data_one = np.load ('/Users/usr.... file_name.npy') 
    list_of_tuples = [(x,y) for x, y, label in data_one]
    return list_of_tuples

print(load_data())

I wonder:

  1. What is happening here?
  2. Is it a better or worse way? Since it is "Pythonic", I assume it wouldn't work in other languages, so perhaps it's better to get used to the more general way?

Answer 1:

list_of_tuples = [(x,y) for x, y, label in data_one]

(x, y) is a tuple.

This is a list comprehension:

    [(x,y) for x, y, label in data_one]
#   ^                                 ^
#   |       ^comprehension syntax^    |
# begin list                       end list   

data_one is an iterable, which a list comprehension requires: under the covers a comprehension is a loop, so it must iterate over something.

x, y, label in data_one tells me that I can "unpack" these three items from every element that is delivered by the data_one iterable. They behave just like the loop variables of a for loop: they change on each iteration.

In total, this says:

Make a list of tuples that look like (x, y) where I get x, y, and label from each item delivered by the iterable data_one. Put each x and y into a tuple inside a list called list_of_tuples. Yes I know I "unpacked" label and never used it, I don't care.
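As a small illustration of the unpacking described above, here is a sketch with made-up sample rows standing in for data_one (the real contents of the .npy file are not shown in the question). Naming an unused value _ is a common convention, not something the language requires:

```python
# Hypothetical sample rows standing in for data_one: (x, y, label)
data_one = [(1.0, 2.0, "a"), (3.0, 4.0, "b"), (5.0, 6.0, "a")]

# Each 3-item row is unpacked into x, y and label on every iteration.
list_of_tuples = [(x, y) for x, y, label in data_one]

# Same result; "_" signals that the third item is deliberately ignored.
list_of_tuples_alt = [(x, y) for x, y, _ in data_one]

print(list_of_tuples)  # [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
```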



Answer 2:

Both ways are correct and work. You could probably relate the first way to the way things are done in C and other languages. That is, you run a for loop over all of the values and append each one to your list of tuples.

The second way is more Pythonic but does the same thing. If you take a look at [(x,y) for x, y, label in data_one] (this is a list comprehension) you will see that you are also running a for loop on the same data, but each result is (x, y) and all of those results together form a list. So it achieves the same thing.

The third way (added in response to the comments) uses slicing.

I've prepared a small example similar to yours:

data = [(1, 2, 3), (2, 3, 4), (4, 5, 6)]

def load_data():
    list_of_tuples = []
    for x, y, label in data:
        list_of_tuples.append((x,y))
    return list_of_tuples

def load_data_2():
    return [(x,y) for x, y, label in data]

def load_data_3():
    return [t[:2] for t in data]

They all do the same thing and return [(1, 2), (2, 3), (4, 5)] but their runtime is different. This is why a list comprehension is a better way to do this.

When I run the first method load_data() I get:

%%timeit
load_data()
1000000 loops, best of 3: 1.36 µs per loop

When I run the second method load_data_2() I get:

%%timeit
load_data_2()
1000000 loops, best of 3: 969 ns per loop

When I run the third method load_data_3() I get:

%%timeit 
load_data_3()
1000000 loops, best of 3: 981 ns per loop

The second way, the list comprehension, is faster than the explicit loop, and the slicing version takes about the same time as the second!
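To repeat the measurement outside IPython, a sketch using the standard-library timeit module could look like the following. The absolute numbers depend on the machine and the Python version, so none are shown here; the function names mirror the ones above.

```python
import timeit

data = [(1, 2, 3), (2, 3, 4), (4, 5, 6)]

def load_data():
    list_of_tuples = []
    for x, y, label in data:
        list_of_tuples.append((x, y))
    return list_of_tuples

def load_data_2():
    return [(x, y) for x, y, label in data]

def load_data_3():
    return [t[:2] for t in data]

for fn in (load_data, load_data_2, load_data_3):
    # Total seconds for 100000 calls; keep the best of 3 repeats.
    best = min(timeit.repeat(fn, number=100000, repeat=3))
    print("%s: %.3f s per 100000 calls" % (fn.__name__, best))
```

With only three rows the differences are tiny, so it is worth trying a larger list before drawing conclusions.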



Answer 3:

The "improved" version uses a list comprehension. This makes the code declarative (describing what you want) rather than imperative (describing how to get what you want).

The advantages of declarative programming are that the implementation details are mostly left out, and the underlying classes and data structures can perform the operations in an optimal way. For example, one optimisation that the Python interpreter could make in your example above would be to pre-allocate the correct size for the list list_of_tuples rather than having to continually resize it during the append() operations.

To get you started with list comprehensions, I'll explain the way I normally start to write them. For a list L write something like this:

output = [x for x in L]

For each element in L, a variable is extracted (the centre x) and can be used to form the output list (the x on the left). The above expression effectively does nothing: output ends up with the same contents as L. Imperatively, it is akin to:

output = []
for x in L:
    output.append(x)
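A quick check of that claim, using a made-up list L: the comprehension builds a new list with equal contents, not a second name for the same object.

```python
L = [(1, 2, "a"), (3, 4, "b")]  # hypothetical sample data

output = [x for x in L]

print(output == L)  # True: same contents
print(output is L)  # False: a separate list object
```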

From here though, you could realise that each x is actually a tuple that could be unpacked using tuple assignment:

output = [x for x, y, label in L]

This will create a new list, containing only the x element from each tuple in the list.

If you wanted to pack a different tuple in the output list, you just pack it on the left-hand side:

output = [(x,y) for x, y, label in L]

This is basically what you end up with in your optimised version.

You can do other useful things with list comprehensions, such as only inserting values that conform to a specific condition:

output = [(x,y) for x, y, label in L if x > 10]
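Putting those steps together on a small made-up list (the values are only for illustration):

```python
L = [(5, 1, "a"), (12, 2, "b"), (30, 3, "a")]  # hypothetical (x, y, label) rows

only_x = [x for x, y, label in L]
pairs = [(x, y) for x, y, label in L]
big_pairs = [(x, y) for x, y, label in L if x > 10]

print(only_x)     # [5, 12, 30]
print(pairs)      # [(5, 1), (12, 2), (30, 3)]
print(big_pairs)  # [(12, 2), (30, 3)]
```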

Here is a useful tutorial about list comprehensions that you might find interesting: http://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/



Answer 4:

The action is essentially the same. In Python 3 the scope of the variables in the list comprehension is narrower (x can't be seen outside the comprehension), whereas in Python 2 they leak into the surrounding scope.

list_of_tuples = []
for x, y, label in data_one:
    list_of_tuples.append( (x,y) )

list_of_tuples = [(x,y) for x, y, label in data_one]
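The scoping difference between the two forms can be checked with a short sketch. It assumes Python 3 and made-up sample data; the two helper functions are hypothetical names used only here.

```python
data_one = [(1, 2, "a"), (3, 4, "b")]  # hypothetical sample data

def x_after_loop():
    for x, y, label in data_one:
        pass
    return x  # the loop variable is still bound after the loop

def x_after_comprehension():
    pairs = [(x, y) for x, y, label in data_one]
    try:
        return x  # in Python 3, x exists only inside the comprehension
    except NameError:
        return None

print(x_after_loop())           # 3
print(x_after_comprehension())  # None on Python 3
```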

This kind of action occurs often enough that Python developers thought it worthwhile to give it special syntax. There's a map(fn, iterable) function that does something similar, but I think the list comprehension is clearer.
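For comparison, a sketch of the same operation written with map. The lambda and the sample data are assumptions for illustration; in Python 3, map returns a lazy iterator, so it is wrapped in list() here.

```python
data_one = [(1, 2, "a"), (3, 4, "b")]  # hypothetical sample data

with_map = list(map(lambda row: (row[0], row[1]), data_one))
with_comprehension = [(x, y) for x, y, label in data_one]

print(with_map == with_comprehension)  # True
```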

Python developers like this syntax enough to extend it to generators and dictionaries and sets. And they allow nesting and conditional clauses.
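A sketch of those related forms, again on made-up (x, y, label) rows; the variable names are only illustrative:

```python
data_one = [(1, 2, "a"), (3, 4, "b"), (5, 6, "a")]  # hypothetical sample data

labels = {label for x, y, label in data_one}                  # set comprehension
label_by_point = {(x, y): label for x, y, label in data_one}  # dict comprehension
x_total = sum(x for x, y, label in data_one)                  # generator expression

# Nested "for" clauses plus a condition: pairs of distinct rows sharing a label.
same_label = [(p[:2], q[:2]) for p in data_one for q in data_one
              if p[2] == q[2] and p < q]

print(sorted(labels))  # ['a', 'b']
print(x_total)         # 9
print(same_label)      # [((1, 2), (5, 6))]
```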

Both forms use tuple unpacking: x, y, label in data_one.

What are both of these snippets doing? data_one apparently is a list of tuples (or sublists) with 3 elements each. This code creates a new list of 2-element tuples, keeping 2 out of the 3 elements. I think it's easier to see that in the list comprehension.

It's wise to be familiar with both. Sometimes the action is too complicated to cast in the comprehension form.

Another feature of the comprehension: it discourages side effects (or at least makes them trickier to incorporate). That may be a defect in some cases, but generally it makes the code clearer.



Answer 5:

This is called a list comprehension. It's similar to a loop and can often accomplish the same task, but will generate a list with the results. The general format is [operation for variable in iterable]. For example,

[x**2 for x in range(4)] would result in [0, 1, 4, 9].

They can also be made more complicated (like yours above is) by using multiple functions, variables, and iterables in one list comprehension. For example,

[(x,y) for x in range(5) for y in range(10)].
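A comprehension with two for clauses reads left to right like nested for loops. A sketch of the equivalence:

```python
pairs = [(x, y) for x in range(5) for y in range(10)]

# Equivalent nested loops: the first "for" clause is the outer loop.
pairs_loop = []
for x in range(5):
    for y in range(10):
        pairs_loop.append((x, y))

print(pairs == pairs_loop)  # True
print(len(pairs))           # 50
```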
