Python efficiency: lists vs. tuples

2020-07-09 02:33发布

问题:

I have a medium-amount of base objects.

These base objects will be put in collections, and these collections will be munged around: sorted, truncated, etc.

Unfortunately, the n is large enough that memory consumption is slightly worrisome, and speed is getting concerning.

My understanding is that tuples are slightly more memory-efficient, since they are deduplicated.

Anyway, I would like to know what the cpu/memory tradeoffs of lists vs. tuples are in Python 2.6/2.7.

回答1:

If you have a tuple and a list with the same elements, the tuple takes less space. Since tuples are immutable, you can't sort them, add to them, etc. I recommend watching this talk by Alex Gaynor for a quick intro on when to choose what datastructure in Python.

UPDATE: Thinking about it some more, you may want to look into optimizing the space usage of your objects, e.g., via __slots__ or using namedtuple instances as proxies instead of the actual objects. This would likely lead to much bigger savings, since you have N of them and (presumbaly) only a few collections in which they appear. namedtuple in particular is super awesome; check out Raymond Hettinger's talk.



回答2:

As others mentioned tuples are immutable. Sorting a tuple (e.g. sorted(mytuple)) returns a list, which you would then have to cast back to a tuple.

To sort a tuple (and keep it a tuple) you'd have to do this:

mytuple = (3,2,1)
mysortedtuple = tuple(sorted(mytuple))

To sort a list you'd have to do this:

mylist = [3,2,1]
mylist.sort()

Because you're not casting and re-casting, the latter, in this instance, is more efficient.

Don't get hung up on using tuples over lists unless you have a good justification. If you need sorted data, tuples are not the way to go unless they are created that way in the first place. Tuples excel when the data they contain DOES NOT CHANGE, such as with configuration settings that are loaded at run-time, or data that has already been processed.

Considering that you mentioned you are processing a large dataset, you might want to look at using a functional programming style by way of generators and iterators over lists and tuples. This way you're not shuttling around and creating new containers, but just chaining iteration operations to get to the end result.

Further reading:

  • Python's itertools
  • Python functional programming HOWTO


回答3:

What is the (average, min, max) number of base objects in a collection?

Tuples are "deduplicated" and lists are not? What do you think that "deduplicated" means in this context?

Lists do take up more memory than tuples, because extra memory is allocated on the presumption that a list is going to grow and you definitely don't want to realloc() memory each time you do large_list.append(). However on a 32-bit machine, the amortised cost of an extra list element is 4 bytes for a pointer, N bytes for the element itself, and no more than another 4 bytes for the extra memory. N is 16 bytes for a float. That means a list of floats takes up to 24 bytes per extra float, compared with 20 bytes for a tuple. A "base object" with N==100 gives a comparison of 108 versus 104. If a based object is referred to in two collections, then 58 versus 54. How big is your N?

Advice: Leave your collections as lists. Concentrate on:

  • ensuring your base objects are memory-efficient

  • use generators and itertools goodies instead of temporary lists where possible

  • if you can't avoid having temporary lists, ensure that they are thrown away immmediately they are not needed any more i.e. don't wait till the creating method returns; use explicit del as soon as possible.



回答4:

In addition to all these suggestions, you may find that numpy will fill your needs. If your objects are something that numpy handles by default (ints, native C types, etc) then that would be ideal. You can use a numpy array with custom objects as well, but that might be more work than it's worth.



回答5:

You can't use them the same way. Tuples are immutable and don't support appending, sorting, etc (calling sorted on a tuple yields a list, and so on). Tuples are totally different from lists, so any performance comparison is meaningless.



回答6:

You cannot sort an immutable object - i.e. when sorting a tuple you'll always create a new one.



回答7:

There are at least two existing questions that are similar enough to yours that the answers (or links within them) may be useful to you. To summarize: let the features of the type (mutable vs. immutable, heterogeneous vs. homogeneous) rather than performance guide your decision, because the performance/efficiency differences are minimal.

What's the difference between list and tuples in Python?
What are differences between List, Dictionary and Tuple in Python?