I have ~10 million domain objects which have to stay in memory for the entire application lifetime but can be added or removed at any time, one by one. The main storage is HashMap<Long, MyDO>.
My processing can be done with basic foreach loops, but I can speed up some operations by building indexes that map certain object fields, like HashMap<String, ArrayList<MyDO>>,
which reduces the iteration count by 30-100x, but total processing time by more like 2-5x.
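To illustrate, this is roughly what the setup looks like (MyDO and its indexed category field are simplified placeholders):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MyDO {
    final long id;
    final String category; // an example field worth indexing

    MyDO(long id, String category) {
        this.id = id;
        this.category = category;
    }
}

class Store {
    // Main storage: one entry per live object.
    final Map<Long, MyDO> byId = new HashMap<>();
    // Secondary index: extra references to the same objects, not copies.
    final Map<String, List<MyDO>> byCategory = new HashMap<>();

    void add(MyDO o) {
        byId.put(o.id, o);
        byCategory.computeIfAbsent(o.category, k -> new ArrayList<>()).add(o);
    }

    void remove(long id) {
        MyDO o = byId.remove(id);
        if (o != null) {
            List<MyDO> bucket = byCategory.get(o.category);
            if (bucket != null) {
                bucket.remove(o); // linear in bucket size; fine for small buckets
            }
        }
    }
}
```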
So the question is: how much slower will GC be for ~10 million long-lived objects if I store them not in one map but in 5 maps, thereby creating ~5x as many references to the same objects?
UPD In short: is it feasible to use generic Java collections with boxed keys for indexes when there are ~10M objects and ~1K objects added/removed per second?
There'll probably be hardly any difference. Long-lived objects get promoted to the tenured generation, which gets collected only rarely. It takes a couple of collections until promotion, and until then they have to be copied from Eden into the survivor space. Here the number of links doesn't matter.
> How much slower will GC be for ~10 million long-lived objects if I store them not in one map but in 5 maps, thereby creating ~5x as many references to the same objects?
I'd say that the number of references as such doesn't count at all. But note that each map entry is itself an object. Still, 10 million doesn't sound like a big number.
> Is it feasible to use generic Java collections with boxed keys for indexes when there are ~10M objects and ~1K objects added/removed per second?
No idea, but you could avoid the boxing by using a primitive collection (see the sketch after this list). Can't you simply try it out? There are three useful optimization principles:
- Don't do it!
- If you do, don't do it now!
- If you do, don't do it without measurement!
It may well turn out that the GC overhead is negligible and you'd just be wasting your time.
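If the boxed Long keys do turn out to matter, primitive-keyed maps from libraries such as fastutil or Eclipse Collections avoid both the boxing and the per-entry node objects. A minimal sketch, assuming fastutil is on the classpath (the String value is just a stand-in for MyDO):

```java
import it.unimi.dsi.fastutil.longs.Long2ObjectOpenHashMap;

public class PrimitiveMapSketch {
    public static void main(String[] args) {
        // Keys live in a plain long[] (open addressing), so there are
        // no Long boxes and no Map.Entry nodes for the GC to trace.
        Long2ObjectOpenHashMap<String> byId = new Long2ObjectOpenHashMap<>();

        byId.put(42L, "some domain object"); // key is never autoboxed
        System.out.println(byId.get(42L));   // prints: some domain object
        byId.remove(42L);
    }
}
```

Measuring is also cheap to start: run the real workload with GC logging enabled (-Xlog:gc* on JDK 9 and later, -verbose:gc before that) and see whether old-generation collections actually become a problem.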
References are used to mark an object as "in use", but once an object is marked, additional references do nothing. Of course they all have to be inspected, but that overhead is charged to the referrer rather than the referee. So if you create one million references to a single object, it's the million referring objects that cost you time, not the single object.
I'm not sure whether this is the case here, but if you are really worried about GC and would like better control over the behavior of your derived maps (and thus their influence on GC performance), in my opinion you should take a look at the different kinds of references in Java: strong, weak, soft, and phantom.
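For example, if a secondary index holds its values only through WeakReference, the index by itself never keeps a removed object reachable; the main map stays the single strong owner. A minimal sketch (WeakIndex and its methods are made-up names for illustration):

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// The index stores weak references, so removing an object from the main
// map is enough for it to become collectible; stale index slots are
// pruned lazily during lookups.
class WeakIndex<V> {
    private final Map<String, List<WeakReference<V>>> index = new HashMap<>();

    void add(String key, V value) {
        index.computeIfAbsent(key, k -> new ArrayList<>())
             .add(new WeakReference<>(value));
    }

    List<V> get(String key) {
        List<V> result = new ArrayList<>();
        List<WeakReference<V>> refs = index.get(key);
        if (refs != null) {
            for (Iterator<WeakReference<V>> it = refs.iterator(); it.hasNext(); ) {
                V v = it.next().get();
                if (v == null) {
                    it.remove(); // referent was collected; drop the slot
                } else {
                    result.add(v);
                }
            }
        }
        return result;
    }
}
```

Keep in mind that weak and soft references have their own GC cost (reference processing), so this too is worth measuring before adopting.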
Also remember that premature optimization is the root of all evil, especially in programming.