As you've already understood I'm a beginner and am trying to understand what the "Pythonic way" of writing this function is built on. I know that other threads might include a partial answer to this, but I don't know what to look for since I don't understand what is happening here.
A friend sent me a line of code to improve my own code, which is:
import numpy as np

#load_data:
def load_data():
    data_one = np.load('/Users/usr/... file_name.npy')
    list_of_tuples = []
    for x, y, label in data_one:
        list_of_tuples.append((x, y))
    return list_of_tuples

print(load_data())
The "improved" version:
import numpy as np

#load_data:
def load_data():
    data_one = np.load('/Users/usr.... file_name.npy')
    list_of_tuples = [(x, y) for x, y, label in data_one]
    return list_of_tuples

print(load_data())
I wonder:
- What is happening here?
- Is it a better or worse way? Since it is "Pythonic", I assume it wouldn't work in other languages, so perhaps it's better to get used to the more general way?
Both ways are correct and work. You could probably relate the first way to the way things are done in C and other languages. That is, you run a for loop over all of the values and append each (x, y) pair to your list of tuples.
The second way is more pythonic but does the same. If you take a look at
[(x,y) for x, y, label in data_one]
(this is a list comprehension) you will see that you are also running a for loop on the same data, but your result will be (x, y), and all of those results will form a list. So it achieves the same thing.
The third way (added in response to the comments) uses a slice method.
I've prepared a small example similar to yours:
They all do the same thing and return
[(1, 2), (2, 3), (4, 5)]
but their runtime is different. This is why a list comprehension is a better way to do this.
When I run the first method
load_data()
I get:
When I run the second method
load_data_2()
I get:
When I run the third method
load_data_3()
I get:
The second way, list comprehension, is faster!
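A comparison along these lines could be sketched as follows. This is only an illustration: the sample data is hypothetical (a plain list standing in for the contents of the .npy file), and the slicing form in load_data_3 is an assumption about what the third method looks like. Timings vary by machine and Python version, so none are quoted here.

```python
import timeit

# Hypothetical sample data standing in for the contents of the .npy file:
# each row is (x, y, label).
DATA = [(1, 2, "a"), (2, 3, "b"), (4, 5, "c")]


def load_data():
    """Explicit loop that appends one (x, y) tuple per row."""
    list_of_tuples = []
    for x, y, label in DATA:
        list_of_tuples.append((x, y))
    return list_of_tuples


def load_data_2():
    """List comprehension producing the same list."""
    return [(x, y) for x, y, label in DATA]


def load_data_3():
    """Slicing: keep the first two items of each row (assumed form)."""
    return [tuple(row[:2]) for row in DATA]


if __name__ == "__main__":
    for fn in (load_data, load_data_2, load_data_3):
        # timeit calls fn repeatedly and returns the total time in seconds.
        seconds = timeit.timeit(fn, number=100_000)
        print(fn.__name__, fn(), round(seconds, 4))
```

On data this small the differences are tiny; the gap tends to matter more as the input grows.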
(x, y) is a tuple <-- linked tutorial.
This is a list comprehension.
data_one is an iterable and is necessary for a list comprehension. Under the covers they are loops and must iterate over something.
x, y, label in data_one tells me that I can "unpack" these three items from every element that is delivered by the data_one iterable. This is just like a local variable of a for loop, it changes upon each iteration.
In total, this says:
Make a list of tuples that look like (x, y) where I get x, y, and label from each item delivered by the iterable data_one. Put each x and y into a tuple inside a list called list_of_tuples. Yes I know I "unpacked" label and never used it, I don't care.
This is called a list comprehension. It's similar to a loop and can often accomplish the same task, but will generate a list with the results. The general format is
[operation for variable in iterable]. For example, [x**2 for x in range(4)] would result in [0, 1, 4, 9].
They can also be made more complicated (like yours above is) by using multiple functions, variables, and iterables in one list comprehension. For example, [(x,y) for x in range(5) for y in range(10)].
You can find more reading on this here.
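As a small runnable sketch of the two examples above (the variable names squares, squares_loop and pairs are just illustrative):

```python
# General shape: [operation for variable in iterable]
squares = [x**2 for x in range(4)]
print(squares)  # [0, 1, 4, 9]

# The same result written as an explicit loop.
squares_loop = []
for x in range(4):
    squares_loop.append(x**2)

# Two `for` clauses behave like nested loops: the first clause is the
# outer loop, the second is the inner loop.
pairs = [(x, y) for x in range(5) for y in range(10)]
print(len(pairs))  # 50
print(pairs[:3])   # [(0, 0), (0, 1), (0, 2)]
```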
The action is essentially the same. In newer Python interpreters the scope of the variables in the list comprehension is narrower (x can't be seen outside the comprehension).
This kind of action occurs often enough that Python developers thought it worthwhile to use special syntax. There's a map(fn, iterable) function that does something similar, but I think the list comprehension is clearer.
Python developers like this syntax enough to extend it to generators and dictionaries and sets. And they allow nesting and conditional clauses.
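A short sketch of these points in Python 3, using hypothetical sample rows in place of data_one (all names here are illustrative, not from the original code):

```python
# Hypothetical rows standing in for data_one: (x, y, label).
rows = [(1, 2, "a"), (2, 3, "b"), (4, 5, "c")]

# List comprehension and a map() call that build the same list.
pairs = [(x, y) for x, y, label in rows]
pairs_map = list(map(lambda row: (row[0], row[1]), rows))


def loop_variable_leaks():
    """Return True if a comprehension variable is visible afterwards."""
    [item_ for item_ in range(3)]
    try:
        item_  # in Python 3 this name does not exist out here
    except NameError:
        return False
    return True


# The same syntax extended to generators, dicts and sets.
total = sum(x * y for x, y, label in rows)          # generator expression
by_label = {label: (x, y) for x, y, label in rows}  # dict comprehension
xs = {x for x, y, label in rows}                    # set comprehension

# A conditional clause keeps only the rows that pass the test.
even_x = [(x, y) for x, y, label in rows if x % 2 == 0]

print(pairs == pairs_map)     # True
print(loop_variable_leaks())  # False
print(even_x)                 # [(2, 3), (4, 5)]
```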
Both forms use tuple unpacking x,y,label in data_one.
What are both of these clips doing? data_one apparently is a list of tuples (or sublists) with 3 elements. This code is creating a new list with 2 element tuples - 2 out of the 3 elements. I think it's easier to see that in the list comprehension.
It's wise to be familiar with both. Sometimes the action is too complicated to cast in the comprehension form.
Another feature of the comprehension - it doesn't allow side effects (or at least it is trickier to incorporate them). That may be a defect in some cases, but generally it makes the code clearer.
The "improved" version uses a list comprehension. This makes the code declarative (describing what you want) rather than imperative (describing how to get what you want).
The advantages of declarative programming are that the implementation details are mostly left out, and the underlying classes and data-structures can perform the operations in an optimal way. For example, one optimisation that the python interpreter could make in your example above would be to pre-allocate the correct size of the array list_of_tuples rather than having to continually resize the array during the append() operation.
To get you started with list comprehensions, I'll explain the way I normally start to write them. For a list L write something like this:
For each element in L, a variable is extracted (the centre x) and can be used to form the output list (the x on the left). The above expression effectively does nothing, and output is the same as L. Imperatively, it is akin to:
From here though, you could realise that each x is actually a tuple that could be unpacked using tuple assignment:
This will create a new list, containing only the x element from each tuple in the list.
If you wanted to pack a different tuple in the output list, you just pack it on the left-hand side:
This is basically what you end up with in your optimised version.
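The steps described above can be sketched as follows. This is a reconstruction under assumptions: L is taken to be a list of (x, y, label) tuples with made-up values, and the names output, output_loop, xs and pairs are illustrative.

```python
# Hypothetical list of (x, y, label) tuples.
L = [(1, 2, "a"), (2, 3, "b"), (4, 5, "c")]

# Step 1: the "do nothing" comprehension; output equals L.
output = [x for x in L]

# The imperative equivalent of step 1.
output_loop = []
for x in L:
    output_loop.append(x)

# Step 2: unpack each tuple and keep only its x element.
xs = [x for (x, y, label) in L]

# Step 3: pack a different tuple on the left-hand side.
pairs = [(x, y) for (x, y, label) in L]

print(output == L)  # True
print(xs)           # [1, 2, 4]
print(pairs)        # [(1, 2), (2, 3), (4, 5)]
```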
You can do other useful things with list comprehensions, such as only inserting values that conform to a specific condition:
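A minimal sketch of such a filter, assuming the same kind of hypothetical (x, y, label) tuples and an arbitrary condition chosen only for illustration:

```python
# Hypothetical list of (x, y, label) tuples.
L = [(1, 2, "a"), (2, 3, "b"), (4, 5, "c")]

# Keep only the pairs whose x is greater than 1; the `if` clause is
# evaluated once per element, and failing elements are skipped.
filtered = [(x, y) for (x, y, label) in L if x > 1]

print(filtered)  # [(2, 3), (4, 5)]
```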
Here is a useful tutorial about list comprehensions that you might find interesting: http://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/