QList vs QVector revisited

2020-02-17 07:33发布

站内文章 / C++

52 0

叛逆

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

My question is basically when to choose QVector and when to choose QList as your Qt container. What I already know:

Qt docs: QList class

For most purposes, QList is the right class to use. Its index-based API is more convenient than QLinkedList's iterator-based API, and it is usually faster than QVector because of the way it stores its items in memory. It also expands to less code in your executable.

The same is written is this very popular Q&A: QVector vs QList. It also favors QList.
But: on recent Qt World Summit 2015 KDAB presented "Why QList is harmful", this is basically here:

QList considered harmful

Don't use QList, use Q_DECLARE_TYPEINFO

As far as I understand the idea is that QList for almost all types is inefficient when allocating new elements in heap. Each time you are adding new element, it calls new (once per element) and this is inefficient compared to QVector.

This is why now I am trying to understand: is it QVector which we should choose as default container?

回答1:

Qt advertises QList as the "jack of all trades", but the other half of that saying is "master of none". I'd say QList is a good candidate if you plan on appending to both ends of the list, and those are no larger than than a pointer, as QList reserves space before and after. That's about it, I mean as far as good reasons to use QList are concerned.

QList will automatically store "large" objects as pointer and allocate the objects on the heap, which may be considered a good thing if you are a baby, which doesn't know how to declare a QVector<T*> and use dynamic allocation. This is not necessarily a good thing, and in some cases it will only bloat the memory usage and add extra indirection. IMO it is always a good idea to be explicit about what you want, whether it is pointers or instances. Even if you do want heap allocation, it is always better to allocate it yourself and simply add the pointer to the list than construct the object once, then have have copy construct that on the heap.

Qt will return you a QList in a lot of places where it comes with overhead, for example when getting a QObject's children or you search for children. In this case it doesn't make sense to use a container that allocates space before the first element, as it is a list of objects which are already there, not something you are likely to prepend to. I also don't much like the absence of a resize() method.

Imagine a situation where you have an object with size of 9 bytes and byte alignment on a 64 bit system. It is "far too much" for QList so instead it will use 8 byte pointer + CPU overhead for the slow heap allocation + memory overhead for the heap allocation. It will use twice the memory and with an extra indirection for access it will hardly offer performance advantages as advertised.

As of why QVector cannot suddenly become the "default" container - you don't change horses mid-race - it is a legacy thing, Qt being such an old framework, and even though a lot of stuff has been deprecated, making changes to widely used defaults is not always possible, not without breaking a lot of code, or producing undesired behavior. Good or bad, QList will likely continue being the default all the way throughout Qt 5, and likely in the next major release as well. The same reason Qt will continue using "dumb" pointers, for years after smart pointers have become a must and everybody is crying about how bad plain pointers are and how they should not be used ever.

That being said, nobody is forcing you to use QList in your design. There is no reason why QVector should not be your default container. I myself don't use QList anywhere, and in the Qt functions which return a QList I merely use as a temporary to move stuff into a QVector.

Furthermore, and this is only my personal opinion, but I do find a lot of design decisions in Qt that don't necessary make sense, be that performance or memory use efficiency or ease of use wise, and overall there are a a lot of frameworks and languages which like promoting their ways of doing things, not because it is the best way to do it, but because it is their way to do it.

Last but not least:

For most purposes, QList is the right class to use.

It really boils down to how you understand this. IMO in this context, "the right" does not stand for "the best" or "the optimal", but for "good enough" as in "it will do, even if not the best". Especially if you know nothing about different container classes and how they work.

For most purposes, QList will do.

To sum things up:

QList PROs

you intend to prepend objects no larger than the size of a pointer, since it reserves some space in the front
you intend to insert in the middle of the list objects (substantially) larger than a pointer (and I am being generous here, since you can easily use QVector with explicit pointers to achieve the same and cheaper - no extra copy), since when resizing the list, no objects will be moved, only pointers

QList CONs

doesn't have a resize() method, reserve() is a subtle trap, since it will not increase the valid list size, even if index access works it falls in the UB category, also you will not be able to iterate that list
does an extra copy and heap allocating when object is larger than a pointer, which might also be an issue if object identity matters
uses extra indirection to access objects larger than a pointer
has CPU time and memory usage overheads due to the last two, also less cache friendly
comes with additional overhead when used as a "search" return value, since you are not likely to prepend or even append to that
only makes sense if index access is a must, for optimal prepend and insert performance a linked list might be a better option.

The CON's marginally outweigh the PROs, meaning that while in "casual" use QList might be acceptable, you definitely don't want to use it in situations where CPU time and/or memory usage are a critical factor. All in all, QList is best suited for lazy and careless use, when you don't want to make the consideration of optimal storage container for the use case, which would typically be a QVector<T>, a QVector<T*> or a QLinkedList (and I exclude "STL" containers, since we are talking Qt here, Qt containers are just as portable, sometimes faster, and most certainly easier and cleaner to use, whereas std containers are needlessly verbose).

回答2:

In Qt 5.7, the documentation was changed concerning the topic discussed here. In QVector it is now stated:

QVector should be your default first choice. QVector<T> will usually give better performance than QList<T>, because QVector<T> always stores its items sequentially in memory, where QList<T> will allocate its items on the heap unless sizeof(T) <= sizeof(void*) and T has been declared to be either a Q_MOVABLE_TYPE or a Q_PRIMITIVE_TYPE using Q_DECLARE_TYPEINFO.

They refer to this article by Marc Mutz.

So the official point of view has changed.

回答3:

QList is an array of void*.

In its normal operation, it news the elements on the heap and stores a pointer to them in the void* array. Like a linked list, that means that references (but, unlike linked lists, not iterators!) to elements contained in the list remain valid under all container modifications until the element is removed from the container again. Thus the name "list". This datastructure is called an array-list and is used in a lot of programming languages where every object is of reference type (say, Java). It is a very cache-unfriendly data structure, like all node-based containers.

But the resizing of the array-list can be factored into a type-independent helper class (QListData), which is supposed to save some executable code size. In my experiments, it's next to impossible to predict which of QList, QVector or std::vector produces the least executable code.

This would have been a good data type for the many Qt reference-like types such as QString, QByteArray, etc., which consist of nothing more than a pimpl pointer. For these types, QList gained an important optimisation: when the type is not larger than a pointer (and please note that this definition depends on the platform's pointer size - 32 or 64bits), instead of heap-allocating objects, the objects are stored in the void* slots directly.

This is only possible, though, if the type is trivially relocatable. That means it can be relocated in memory using memcpy. Relocation here means I take an object, memcpy it to another address and - crucially - not run the destructor of the old object.

And this is where things started to go wrong. Because unlike in Java, in C++ a reference to an object is its address. And while in the original QList, references were stable until the object was removed from the collection again, by putting them into the void* array this property no longer holds. This is no longer a "list" for all intents and purposes.

Things continued to go wrong, though, because they allowed types that are strictly smaller than a void* to be placed in a QList, too. But the memory management code expects elements of pointer size, so QList adds padding(!). That means that a QList<bool> on 64bit platforms looks like this:

[ | | | | | | | [ | | | | | | | [ ...
[b|   padding   [b|   padding   [b...

Instead of fitting 64 bools into a cache line, like QVector does, QList only manages 8.

Things went wrong out of any proportion when the docs started calling QList a good default container. It's not. The original STL states:

Vector is the simplest of the STL container classes, and in many cases the most efficient.

Scott Meyer's Effective STL has several items that start with "Prefer std::vector over...".

What is true in general C++ is not suddenly wrong just because you're using Qt.

Qt 6 will fix that particular design mistake. In the meantime, use QVector or std::vector.

回答4:

If the size of the QList's element type is greater than the pointer's size QList performs better than QVector because it doesn't store the objects sequentially but stores sequentially pointers to heap copies.

I'd tend to say the opposite. It'll be much worse off, when going through the items. If it stores it as pointers on the heap won't QList be much worse off than QVector? The reason that sequential storage(QVector all the time) is so good is, that is is cache friendly, once you store pointers,you lose the data locality, start getting cache misses and it's horrible for performance.

The "default" container IMHO should be a QVector (or std::vector), if you're worried about lots of reallocation, then preallocate a reasonable amount, pay the once off cost and you'll benefit in the long run.

Use the *Vector by default, if you get performance problems, profile and change as necessary.

回答5:

QList is the best possible container to use generally as the documentation states. If the size of the elements' type is <= of the pointer's size = machine & OS bitness = 4 or 8 bytes then the objects are stored the same way as QVector does - sequentially in memory. If the size of the QList's element type is greater than the pointer's size QList performs better than QVector because it doesn't store the objects sequentially but stores sequentially pointers to heap copies. In the 32-bit case the picture is as follows:

sizeof( T ) <= sizeof( void* )
=====
QList< T > = [1][1][1][1][1]
                   or
             [2][2][2][2][2]
                   or
             [3][3][3][3][3]
                   or
             [4][4][4][4][4] = new T[];

sizeof( T ) > sizeof( void* )
=====
QList< T > = [4][4][4][4][4] = new T*[]; // 4 = pointer's size
              |   |  ...  |
           new T new T   new T

In case you want your objects to be laid out sequentially in memory no matter the size of their elements, as it is usually the case with OpenGL programming, then you should use QVector.

Here is a detailed description of the QList's internals.

回答6:

Imagine, that we have DataType class.

QVector - array of objects, such as:

// QVector<DataType> internal structure
DataType* pArray = new DataType[100];

QList - array of pointers to objects, such as:

// QList<DataType> internal structure
DataType** pPointersArray = new DataType*[100];

Therefore, direct access by index will be faster for QVector:

{
// ...
cout << pArray[index]; //fast
cout << *pPointersArray[index]; //slow, need additional operation for dereferencing
// ...
}

But swaping will be faster for QList, if sizeof(DataType) > sizeof(DataType*):

{
// QVector swaping
DataType copy = pArray[index];
pArray[index] = pArray[index + 1];
pArray[index + 1] = copy; // copy object

// QList swaping
DataType* pCopy = pPointersArray [index];
pPointersArray[index] = pPointersArray [index + 1];
pPointersArray[index + 1] = pCopy; // copy pointer
// ...
}

So, if you need direct access without swaping operations between elements (such as sorting, for example), or sizeof(DataType) <= sizeof(DataType*), your better way is use QVector. In other case use QList.

回答7:

QList behaves differently depending on what's inside (see source code struct MemoryLayout):

if sizeof T == sizeof void* and T is defined Q_MOVABLE_TYPE, then QList<T> behaves exactly like QVector, that is, the data is stored contiguously in memory.
if sizeof T < sizeof void* and T is defined Q_MOVABLE_TYPE, then QList<T> pads each entry to sizeof void*, and loses layout-compatibility with QVector.
in all other cases, QList<T> is a linked list and therefore slow to some degree.

This behavior is what makes QList<T> pretty much always a bad choice, because depending on nifty details, QList<T> is either really a list, or a vector. That's bad API design and prone to errors. (For instance, you will run into bugs if you have a library with a public interface that uses a QList<MyType> internally and in its public interface. sizeof MyType is < sizeof void*, but say you forgot to declare MyType as Q_MOVABLE_TYPE. Later, you want to add Q_MOVABLE_TYPE. This is binary incompatible, meaning that you now have to recompile all code that uses your library, as the memory layout of QList<MyType> changed in the public API. If you are not careful, you will miss this and introduce a bug. This illustrates quite nicely why QList is a bad choice here.)

That said, QList is still not entirely bad: It will probably do what you want most of the cases, but maybe it will do the job behind the scenes differently to what you might expect.

Rule of thumb is:

Instead of QList, use QVector<T> or QVector<T*>, since it explicitly says what you want. You can combine that with std::unique_ptr.
In C++11 and onwards, it is even considered best to just use std::vector, since it will behave correctly in a range-based for loop. (QVector and QList may detach and therefore perform a deep-copy).

You can find all these details and more in a presentation from Marc Mutz and in the video by Olivier Goffart.