How to reduce the number of objects created in Sca

Posted 2019-03-18 13:56

Question:

I'm programming a computer graphics application in Scala which uses an RGB class to return the color at a point in the image. As you can imagine, the function which returns the RGB color object is called many times.

class RGB(val red: Int, val green: Int, val blue: Int) { }

There is a function getPixelRGB which is often used as follows

val color:RGB = getPixelRGB(image, x, y)

The problem is that I may call this function a million times, which will then, I believe, generate a million unique RGB object instances. That's a very unattractive situation. I have some thoughts about this:

  1. If getPixelRGB were called an unbounded number of times it would create an unbounded number of objects, but it need not: there are at most 256 * 256 * 256 (about 16.7 million) possible RGB combinations, so the number of distinct objects "should" be finite. The function could be adjusted to use an object pool, so that when it is about to return a color it has returned before, it returns the same pooled instance for that color.

  2. I could encode the RGB value as an Int. An Int has less memory overhead than a normal Scala/Java object, since every object carries extra overhead. A Scala Int is 4 bytes wide, so 3 of those bytes could store the RGB value. Returning an Int rather than an RGB from getPixelRGB would, I assume, use less memory. But how can I do this while still keeping the convenience of the RGB class?

  3. These are supposedly short-lived objects (and they are), and I have read that the garbage collector should reclaim them quickly. However, I'm still worried about it. How does the GC know that I'm throwing them away quickly? So confusing.

So in general, my question is: how can I make getPixelRGB more memory friendly? And should I even be worried about it?

Answer 1:

You can encode RGB with a single Long or Int. Moreover, since Scala 2.10 you can define a value class over a primitive value, say

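class RGB private (val underlying: Long) extends AnyVal {
  // Decode the packed 0xRRGGBB value back into (red, green, blue).
  def toTriple: (Int, Int, Int) =
    (((underlying >> 16) & 0xFF).toInt, ((underlying >> 8) & 0xFF).toInt, (underlying & 0xFF).toInt)
}
object RGB {
  // Mask each channel to 0-255, pack the three into one value, and wrap it.
  def apply(red: Int, green: Int, blue: Int): RGB =
    new RGB(((red & 0xFF).toLong << 16) | ((green & 0xFF) << 8) | (blue & 0xFF))
}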

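With a value class you keep the type information at compile time while, in most cases, the JVM only sees the underlying primitive, so no wrapper object is allocated. A value class instance is still boxed in some situations, for example when it is stored in an array or passed where a generic type is expected.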
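
As a minimal sketch of how this might look in use, the variant below packs the channels into an Int and exposes them as accessors, so reading a channel does not build a tuple. The names PackedRGB and PackedRGBDemo are hypothetical and not part of the answer above; the 0xRRGGBB layout is an assumption.

```scala
// Hypothetical variant of the value class above: the channels are packed into
// an Int as 0xRRGGBB and read back through accessors.
final class PackedRGB private (val underlying: Int) extends AnyVal {
  def red: Int   = (underlying >> 16) & 0xFF
  def green: Int = (underlying >> 8) & 0xFF
  def blue: Int  = underlying & 0xFF
}

object PackedRGB {
  // Each channel is masked to 0-255 before packing.
  def apply(red: Int, green: Int, blue: Int): PackedRGB =
    new PackedRGB(((red & 0xFF) << 16) | ((green & 0xFF) << 8) | (blue & 0xFF))
}

object PackedRGBDemo {
  def main(args: Array[String]): Unit = {
    val c = PackedRGB(255, 128, 0)
    println(s"${c.red} ${c.green} ${c.blue}") // prints "255 128 0"
  }
}
```

Callers still write PackedRGB(r, g, b) and c.red as they would with an ordinary class, while the compiled code mostly passes a plain Int around.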



Answer 2:

Your question #3 wasn't addressed yet so I will give it a shot.

How does the GC know that I'm throwing [short lived objects] away quickly?

Modern GCs are built on the observation that objects with different lifetimes behave very differently, so they manage them in so-called generations. Newly created objects are stored in the eden space. When it fills up, all objects in it that are still referenced (i.e. alive) are copied over to a so-called survivor space; eden and the survivor spaces together make up the young generation. All dead objects are simply left behind, and the space they occupied is reclaimed with practically zero effort. This is what makes short-lived objects so cheap for the JVM. And most of the objects created by an average program are temporaries or are referenced only by local variables, which fall out of scope very quickly.

After this first round of GC, the survivor space(s) are managed in a similar fashion, except that there may be more than one of them. The GC can be configured to have objects spend one or more rounds in the survivor space(s). Eventually, the final survivors are promoted to the old (aka tenured) generation, where they stay for the rest of their lifetime. This space is managed by periodically applying some variant of the classical mark-and-sweep technique: walk the graph of all live references and mark the live objects, then sweep out all unmarked (dead) objects, typically compacting the survivors into one contiguous memory block to defragment free memory. This is an expensive operation that can block the execution of the program, and it is very difficult to implement correctly, especially in a modern multithreaded VM. This is why generational GC was invented: to ensure that only a tiny fraction of all objects created ever reach this stage.
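
A rough way to observe this, assuming a HotSpot-style JVM: the sketch below creates one small object per iteration and drops it immediately, which is the allocation pattern described above. Pixel and ShortLivedDemo are hypothetical names standing in for the RGB class and the calling code from the question.

```scala
// Hypothetical stand-in for the RGB class from the question.
final class Pixel(val red: Int, val green: Int, val blue: Int)

object ShortLivedDemo {
  // Creates one Pixel per iteration. Each instance becomes unreachable as
  // soon as its red channel has been added to the running total.
  def sumRed(n: Int): Long = {
    var total = 0L
    var i = 0
    while (i < n) {
      val p = new Pixel(i & 0xFF, 0, 0)
      total += p.red
      i += 1
    }
    total
  }

  def main(args: Array[String]): Unit =
    println(sumRed(1000000))
}
```

Running this with the standard -verbose:gc JVM option should typically show only brief young-generation collections, if any. The JIT compiler may also remove such allocations entirely, so treat the log as an indication rather than a measurement.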



Answer 3:

In terms of memory friendliness, the most efficient solution is to store the complete color information in a single Int. As you correctly mentioned, the color information requires just three bytes, so the four bytes of an Int are enough. You can encode and decode the RGB information to and from one Int using bit operations:

def toColorCode(r: Int, g: Int, b: Int) = r << 16 | g << 8 | b

def toRGB(code: Int): (Int, Int, Int) = (
  (code & 0xFF0000) >> 16, 
  (code & 0x00FF00) >> 8, 
  (code & 0x0000FF)
)
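
A short round trip may make the bit layout easier to see. This sketch wraps the two functions in a hypothetical ColorCodec object; the masking of each input channel to 0-255 is an addition for safety and is not part of the answer above.

```scala
object ColorCodec {
  // Same encoding as above, with each channel masked to 0-255 first
  // (the masking is an addition, not part of the original answer).
  def toColorCode(r: Int, g: Int, b: Int): Int =
    (r & 0xFF) << 16 | (g & 0xFF) << 8 | (b & 0xFF)

  // Shift first, then mask, so any bits above the low 24 are ignored.
  def toRGB(code: Int): (Int, Int, Int) =
    ((code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF)

  def main(args: Array[String]): Unit = {
    val code = toColorCode(18, 52, 86) // 0x12, 0x34, 0x56
    println(f"0x$code%06X")            // prints "0x123456"
    println(toRGB(code))               // prints "(18,52,86)"
  }
}
```

Each channel occupies its own byte, so red lands in bits 16-23, green in bits 8-15, and blue in bits 0-7. The top byte stays unused.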


Answer 4:

These are supposedly short-lived objects (and they are), and I have read that the garbage collector should reclaim them quickly. However, I'm still worried about it. How does the GC know that I'm throwing them away quickly? So confusing.

It doesn't know it. It assumes it. This is called the generational hypothesis on which all generational garbage collectors are built:

  • almost all objects die young
  • almost no old objects contain references to new objects

Objects which satisfy this hypothesis are very cheap (even cheaper, in fact, than malloc and free in languages like C); only objects which violate one or both assumptions are expensive.
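
For comparison, the pooling idea from the question could be sketched as below. Pooled instances stay reachable for as long as the pool does, so they do not die young; by the reasoning above they are not the cheap kind of object. PooledRGB and RGBPool are hypothetical names, and the sketch is not thread-safe.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the RGB class from the question.
final class PooledRGB(val red: Int, val green: Int, val blue: Int)

object RGBPool {
  // Maps a packed 0xRRGGBB key to the single shared instance for that color.
  private val pool = mutable.HashMap.empty[Int, PooledRGB]

  // Returns the shared instance for a color, creating it on first request.
  def get(red: Int, green: Int, blue: Int): PooledRGB = {
    val key = (red & 0xFF) << 16 | (green & 0xFF) << 8 | (blue & 0xFF)
    pool.getOrElseUpdate(key, new PooledRGB(red & 0xFF, green & 0xFF, blue & 0xFF))
  }

  def size: Int = pool.size
}
```

Note that the generic HashMap boxes its Int keys, so a lookup may itself allocate a short-lived object. A pool like this trades many cheap temporaries for up to about 16.7 million long-lived instances.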



Answer 5:

You could have an interface that returns a simple Int. Then you could use implicit conversions to treat an Int as an RGB object where needed.

case class RGBInt(red: Int, green: Int, blue: Int) {
   // ...
}

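object Conversions {
  import scala.language.implicitConversions // silences the feature warning

  // Unpack a 0xRRGGBB-encoded Int into its three 0-255 channels.
  implicit def toRGBInt(p: Int): RGBInt =
    RGBInt((p >> 16) & 0xFF, (p >> 8) & 0xFF, p & 0xFF)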

}

Then you could treat any Int as an RGBInt where you think it makes sense:

type RGB = Int // useful for documenting interfaces that consume
               // or return Ints which represent RGBs

def getPixelRGB(img: Image, x: Int, y: Int): RGB = {
  // returns an Int
}

def someMethod(..) = {
  import Conversions._
  val px: RGB = getPixelRGB(...) // px is actually an Int
  px.red // px, an Int, is lifted to an RGBInt
}
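
One caveat: an implicit conversion to a case class creates a new wrapper object each time an Int is lifted, unless the JIT optimises it away. A different technique, swapped in here as a sketch and assuming Scala 2.10 or later, is an implicit value class, which adds the accessors to Int directly. RGBSyntax, RGBOps, and the Array-based image are hypothetical, chosen only to make the example self-contained.

```scala
object RGBSyntax {
  type RGB = Int // same alias as above

  // Implicit value class: adds channel accessors to Int. Because it extends
  // AnyVal, the wrapper is normally not allocated at runtime.
  implicit class RGBOps(val code: Int) extends AnyVal {
    def red: Int   = (code >> 16) & 0xFF
    def green: Int = (code >> 8) & 0xFF
    def blue: Int  = code & 0xFF
  }

  // Hypothetical image representation: rows of packed 0xRRGGBB pixels.
  def getPixelRGB(img: Array[Array[Int]], x: Int, y: Int): RGB = img(y)(x)
}

object RGBSyntaxDemo {
  import RGBSyntax._

  def main(args: Array[String]): Unit = {
    val img = Array(Array(0x102030, 0xFF8000))
    val px: RGB = getPixelRGB(img, 1, 0) // px is a plain Int
    println((px.red, px.green, px.blue)) // prints "(255,128,0)"
  }
}
```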