I am developing a graphics application using Qt 4.5 and am putting images in the QPixmapCache, I wanted to optimise this so that if a user inserts an image which is already in the cache it will use that.
Right now each image has a unique id which helps optimises itself on paint events. However I realise that if I could calculate a hash of the image I could lookup the cache to see if it already exists and use that (it would help more for duplicate objects of course).
My problem is that if its a large QPixmap will a hash calculation of it slow things down or is there a quicker way?
A couple of comments on this:
If you're going to be generating a hash/cache key of a pixmap, then you may want to skip the QPixmapCache and use QCache directly. This would eliminate some overhead of using QStrings as keys (unless you also want to use the file path to locate the items)
As of Qt4.4, QPixmap has a "hash" value associated with it (see QPixmap::cacheKey() ). The documentation claims "Distinct QPixmap objects can only have the same cache key if they refer to the same contents." However, since Qt uses shared-data copying, this may only apply to copied pixmaps and not to two distinct pixmaps loaded from the same image. A bit of testing would tell you if it works, and if it does, it would let you easily get a hash value.
If you really want to do a good, fairly quick cache with removing duplications, you might want to look at your own data structure that sorts according to sizes, color depths, image types, and things such as that. Then you would only need to hash the actual image data after you find the same type of image with the same dimensions, bit-depths, etc. Of course, if your users generally open a lot of images with those things the same, it wouldn't help at all.
Performance: Don't forget about the benchmarking stuff Qt added in 4.5, which would let you compare your various hashing ideas and see which one runs the fastest. I haven't checked it out yet, but it looks pretty neat.
Just in case anyone comes across this problem (and isn't too terribly experienced with hashing things, particularly something like an image), here's a VERY simple solution I used for hashing QPixmaps and entering them into a lookup table for later comparison:
Here is the hashing function itself (you can have it hash into a qint64 if you desire less collisions). As you can see I convert the pixmap into a QImage, and simply walk through its dimensions and perform a very simple one-at-a-time hash on each pixel and return the final result. There are many ways to improve this implementation (see the other answers to this question), but this is the basic gist of what needs to be done.
The OP mentioned how he would use this hashing function to then construct a lookup table for later comparing images. This would require a very simple lookup initialization function -- something like this:
I'm using a QMap here called imageTable which would need to be declared in the class as such:
Then, finally, when you want to compare an image to the images in your lookup table (ie: "what image, out of the images I know it can be, is this particular image?"), you just call the hashing function on the image (which I'm assuming will also be a QPixmap) and the return QString value will allow you to figure that out. Something like this would work:
That's it. I hope this helps someone -- it would have saved me some time, although I was happy to have the experience of figuring it out.
I'm assuming you're talking about actually calculating a hash over the data of the image rather than getting the unique id generated by QT.
Depending on your images, you probably don't need to go over the whole image to calculate a hash. Maybe only read the first 10 pixels? first scan line?
Maybe a pseudo random selection of pixels from the entire image? (with a known seed so that you could repeat the sequence) Don't forget to add the size of the image to the hash as well.
Hash calculations should be pretty quick (somewhere above 100 MB/s if no disk I/O involved) depending on which algorithm you use. Before hashing, you could also do some quick tests to sort out potential candidates - f.e. images must have same width and height, else it's useless to compare their hash values.
Of course, you should also keep the hash values for inserted images so you only have to calculate a hash for new images and won't have to calculate it again for the cached images.
If the images are different enough, it would perhaps be enough to not hash the whole image but a smaller thumbnail or a part of the image (f.e. first and last 10 lines), this will be faster, but will lead to more collisions.