I am currently working on a personal learning project in which I read in an XML database. I find myself writing functions that gather data, and I'm not sure what a fast way to return it would be.
Which is generally faster: yield, or several append() calls within the function followed by returning the resulting list?
I would be happy to know in what situations yield would be faster than append(), or vice versa.
yield has the huge advantage of being lazy, and speed is usually not the best reason to use it. But if it works in your context, then there is no reason not to use it:
# yield_vs_append.py
data = range(1000)

def yielding():
    def yielder():
        for d in data:
            yield d
    return list(yielder())

def appending():
    lst = []
    for d in data:
        lst.append(d)
    return lst
This is the result:
python2.7 -m timeit -s "from yield_vs_append import yielding,appending" "yielding()"
10000 loops, best of 3: 80.1 usec per loop
python2.7 -m timeit -s "from yield_vs_append import yielding,appending" "appending()"
10000 loops, best of 3: 130 usec per loop
At least in this very simple test, yield is faster than append.
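Note that the timing above forces the whole generator into a list, so it does not even show the laziness advantage. That advantage appears when you stop early: a generator only produces what you actually consume. A small sketch using the standard library's itertools.islice (the yielder function here is my own illustration, not from the answer above):

```python
import itertools

def yielder(data):
    # Generator: produces one value at a time, on demand.
    for d in data:
        yield d

# Consuming only the first 5 items never touches the remaining 995.
first_five = list(itertools.islice(yielder(range(1000)), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```

An appending version would have built the full 1000-element list before you could slice it.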
I recently asked myself a similar question exploring ways of generating all permutations of a list (or tuple) either via appending to a list or via a generator, and found (for permutations of length 9, which take about a second or so to generate):
- The naive approach (permutations are lists, append to a list, return the list of lists) takes about three times the time of itertools.permutations.
- Using a generator (i.e. yield) reduces this by approx. 20%.
- Using a generator and generating tuples is the fastest, at about twice the time of itertools.permutations.
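For reference, the variants compared above can be sketched roughly as follows (these function names and recursion are my own reconstruction, not the original benchmark code; timings will vary by machine):

```python
import itertools

def perms_append(seq):
    # Naive approach: build lists and append each permutation to a result list.
    if len(seq) <= 1:
        return [list(seq)]
    result = []
    for i in range(len(seq)):
        rest = seq[:i] + seq[i + 1:]
        for p in perms_append(rest):
            result.append([seq[i]] + p)
    return result

def perms_yield(seq):
    # Generator variant: same recursion, but yields tuples lazily.
    if len(seq) <= 1:
        yield tuple(seq)
        return
    for i in range(len(seq)):
        rest = seq[:i] + seq[i + 1:]
        for p in perms_yield(rest):
            yield (seq[i],) + p

# Sanity check against the reference implementation:
assert sorted(map(tuple, perms_append([1, 2, 3]))) == sorted(itertools.permutations([1, 2, 3]))
assert sorted(perms_yield([1, 2, 3])) == sorted(itertools.permutations([1, 2, 3]))
```

Timing these with timeit for length-9 inputs should reproduce the rough ratios listed above.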
Take this with a grain of salt! Timing and profiling were very useful:
if __name__ == '__main__':
    import cProfile
    cProfile.run("main()")
There is an even faster alternative to TH4Ck's yielding(): list comprehension.
In [245]: def list_comp():
   .....:     return [d for d in data]
   .....:
In [246]: timeit yielding()
10000 loops, best of 3: 89 us per loop
In [247]: timeit list_comp()
10000 loops, best of 3: 63.4 us per loop
Of course it is rather silly to micro-benchmark these operations without knowing the structure of your code. Each of them is useful in a different situation. For example, a list comprehension is useful if you want to apply a simple operation that can be expressed as a single expression. yield has the significant advantage of letting you isolate the traversal code into a generator method. Which one is appropriate depends a lot on the usage.
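For completeness, there is also a middle ground between the two: a generator expression, which has the brevity of a list comprehension but the laziness of yield. A minimal sketch (the doubling operation is just a placeholder):

```python
data = range(1000)

# List comprehension: builds the whole list immediately.
as_list = [d * 2 for d in data]

# Generator expression: same syntax with parentheses, but lazy —
# values are produced one at a time as sum() consumes them.
as_gen = (d * 2 for d in data)

print(sum(as_list) == sum(as_gen))  # True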
First you must decide whether you need a generator. For that there is also an improved syntax: a list comprehension, [elem for elem in something]. Generators are recommended if you only consume the values once for some operation. But if you need the list for many changes, and to work with many elements at the same time, it has to be a list.
(In perhaps 70% of the cases where the average programmer uses a list, a generator would be better: it uses less memory; many people simply don't see any alternative to a list. Unfortunately, these days many people don't care about good optimization and just make things work.)
If you use a generator expression to improve a return, let's do the same with yield. Either way, we have several more-optimized methods for all of these actions in Python.
yield is faster than return, and I'll prove it.
Just check this out:
data = range(1000)

def yielder():
    yield from data

def appending():
    L = []
    app = list.append
    for i in data:
        app(L, i)
    return L

def list_gen():
    return [i for i in data]
Of course appending will be slower than the other approaches, because we create and extend the list on every loop iteration. The plain for loop itself is also not very optimized; if you can avoid it, do so. At every step it loads the next element and writes it to our variable to get that object's value in memory. So we jump to every element, create a reference, and extend the list inside the loop (pre-binding the append method is a big speed optimization), and by the time we return, we hold 2000 elements in total across two lists.
list_gen is lighter: we just produce the elements, but as above, we generate a second list. Now we have two lists, the original data and its copy, 2000 elements in total. Here we only avoid the step of creating a reference to a variable, because the comprehension skips that step and just writes the elements.
yielder uses the least memory of all, because we only ever have the single value just yielded from data; we avoid one whole copy. For example:
data = range(1000)

def yielder():
    yield from data

def list_gen():
    return [i for i in data]

# With list_gen we generate a whole second list after [i for i in data]:
for i in list_gen():
    ...  # some instruction

# With yielder this is our first and only copy, thanks to yield from data:
for i in yielder():
    ...  # some instruction
Each instruction uses only one element at a time, not the whole list; the yielder returns the next value on the next loop iteration, instead of storing all 1000 elements just to write them into a reference.
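The memory claim above can be checked concretely with sys.getsizeof from the standard library (a small sketch; exact byte counts vary by Python version, but the ordering should hold):

```python
import sys

data = range(1000)

def yielder():
    yield from data

full_list = [i for i in data]
gen = yielder()

# The list stores all 1000 references; the generator is a small,
# fixed-size object regardless of how many values it will produce.
print(sys.getsizeof(full_list) > sys.getsizeof(gen))  # True
```

Note that getsizeof only measures the container itself, not the objects it refers to, but that is exactly the overhead being discussed here.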
Sorry for digging up this old topic a little; I just accidentally came across it from a Google search, and other beginner Python programmers might otherwise see this nonsense.