I'm programming a computer graphics application in Scala which uses an RGB class to return the color at a point in the image. As you can imagine, the function which returns the RGB color object is called many times.
class RGB(val red: Int, val green: Int, val blue: Int) { }
There is a function getPixelRGB, which is often used as follows:
val color:RGB = getPixelRGB(image, x, y)
The problem is that I may call this function a million times, which will then, I believe, generate a million unique RGB object instances. That's a very unattractive situation. I have some thoughts about this:
1. getPixelRGB could potentially create an infinite number of objects if it were called an infinite number of times, but it need not, as there are at most 256 * 256 * 256 (about 16.7 million) possible RGB combinations. So the number of objects created "should" be finite. The function could be adjusted to use an object pool, so that when it is about to return a color it has returned before, it returns the same pooled instance for that color.
2. I could encode the RGB value as an Int. An Int has less memory overhead than a normal Scala/Java object, since Java objects carry extra per-object overhead. A Scala Int is 4 bytes wide, so 3 of those bytes could store the RGB value. Returning an Int rather than an RGB from getPixelRGB would, I assume, use less memory. But how can I do this while keeping the convenience of the RGB class?
3. These are supposedly short-lived objects (and they are), and I have read that the garbage collector should reclaim them quickly. However, I'm still worried about it. How does the GC know that I'm throwing them away quickly? So confusing.
So in general, my question is: how can I make getPixelRGB more memory friendly? And should I even be worried about it?
In terms of memory friendliness, the most efficient solution is to store the complete color information in a single Int. As you correctly mentioned, the color information requires just three bytes, so the four bytes of an Int are enough. You can encode and decode the RGB information to and from one Int using bit operations:
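As a minimal sketch of that encoding, assuming a 0x00RRGGBB layout and hypothetical helper names (RgbCodec, encode, red, green and blue are made up for illustration and are not from the question):

```scala
object RgbCodec {
  // Pack three 8-bit channels into the low 24 bits of an Int (0x00RRGGBB).
  // Each channel is masked with 0xFF, so out-of-range values wrap instead of
  // spilling into a neighbouring channel.
  def encode(red: Int, green: Int, blue: Int): Int =
    ((red & 0xFF) << 16) | ((green & 0xFF) << 8) | (blue & 0xFF)

  // Unpack a single channel by shifting it down and masking off the rest.
  def red(rgb: Int): Int = (rgb >> 16) & 0xFF
  def green(rgb: Int): Int = (rgb >> 8) & 0xFF
  def blue(rgb: Int): Int = rgb & 0xFF
}
```

With something like this, getPixelRGB could return a plain Int, and callers would read a channel with RgbCodec.red(color) only when they need it.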
It doesn't know it; it assumes it. This is called the generational hypothesis, on which all generational garbage collectors are built: most objects die young, and older objects rarely reference younger ones.
Objects which satisfy this hypothesis are very cheap (even cheaper, in fact, than malloc and free in languages like C); only objects which violate one or both assumptions are expensive.
You could have an interface that returns a simple Int. Then you could use implicit conversions to treat an Int as an RGB object where needed.
Then you could treat any Int as an RGBInt where you think it makes sense:
You can encode RGB with a single long or int. Moreover, in Scala 2.10 you can define a value class for primitive values, say
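A minimal sketch of such a value class, assuming Scala 2.10 or later; the enclosing object Colors, the 0x00RRGGBB layout and the accessor names are assumptions made for illustration:

```scala
object Colors {
  // Value class: wraps exactly one Int. In most uses the compiler represents
  // it as a bare Int at runtime instead of allocating a wrapper object.
  final class RGB(val value: Int) extends AnyVal {
    def red: Int = (value >> 16) & 0xFF
    def green: Int = (value >> 8) & 0xFF
    def blue: Int = value & 0xFF
  }

  object RGB {
    // Hypothetical factory that packs the three channels into one Int.
    def apply(red: Int, green: Int, blue: Int): RGB =
      new RGB(((red & 0xFF) << 16) | ((green & 0xFF) << 8) | (blue & 0xFF))
  }
}
```

Note that a value class can still be boxed in some situations, for example when it is stored in an Array[RGB] or used as a generic type argument, so this is not a guarantee of zero allocation.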
With a value class you still have type information and enjoy a class-less memory layout on the JVM.
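The implicit-conversion suggestion and the value-class suggestion can also be combined. As a rough sketch, again assuming Scala 2.10 or later and a 0x00RRGGBB layout (the enclosing object RgbSyntax is a hypothetical name; RGBInt is borrowed from the answer above), an implicit value class lets a plain Int be used as if it had red, green and blue members:

```scala
object RgbSyntax {
  // Implicit value class: the compiler wraps an Int in RGBInt on demand when
  // one of these members is called, normally without a heap allocation.
  implicit final class RGBInt(val rgb: Int) extends AnyVal {
    def red: Int = (rgb >> 16) & 0xFF
    def green: Int = (rgb >> 8) & 0xFF
    def blue: Int = rgb & 0xFF
  }
}
```

After import RgbSyntax._, an expression such as pixel.red compiles for any pixel of type Int, so getPixelRGB can keep returning a bare Int while call sites stay readable.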
Your question #3 wasn't addressed yet so I will give it a shot.
The working of modern GCs is based on the observation that objects of different lifetimes behave very differently, so the GC manages them in so-called generations. Newly created objects are allocated in the eden space. When it fills up, all objects in it that are still referenced (i.e. still alive) are copied over to a so-called survivor space; eden and the survivor spaces together form the young generation. All dead objects are simply left behind, and the space they occupied is reclaimed with practically zero effort. This is what makes short-lived objects so cheap on the JVM. And most of the objects created by an average program are temporaries, or are referenced only from local variables that fall out of scope very quickly.
After this first round of GC, the survivor spaces are managed in a similar fashion: live objects are copied between them on each young collection. The GC can be configured to have objects spend one or more rounds in the survivor spaces. Eventually, the final survivors are promoted into the old (aka tenured) generation, where they stay for the rest of their lifetime. This space is managed by periodically applying some variant of the classical mark-and-sweep technique: walk the graph of all live references and mark live objects, then sweep out all unmarked (dead) objects, often compacting the survivors into one contiguous memory block to defragment free memory. This is an expensive operation which can block the execution of the program, and it is very difficult to implement correctly, especially in a modern multithreaded VM. This is why generational GC was invented: to ensure that only a small fraction of all objects created ever reach this stage.