I am monitoring the performance and CPU of a large java application , using VisualVM. When I look at its memory profile I see maximum heap (about 50%) is being used up by char arrays.
Following is a screenshot of the memory profile:
In the memory profile at any given time i see roughly about 9000 char[] objects.
The application accepts a large file as input. The file roughly has about 80 lines each line consisting of 15-20 delimited config options. The application parses the file and stores these lines in a ArrayList of Strings. It then parses these string to get the individual config options for each server.
The application also frequently logs each event to the console.
Java implementation of Strings uses char[] internally along with a reference to array and 3 integer.
From different posts on the internet it seems like StringBuffer , StringBuilder , String.intern() are more memory efficient data types.
How do they compare to java.lang.String ? Has anybody benchmarked them ? If the application uses multithreading (which it does)are they a safe alternative ?
What I do is is have one or more String pools. I do this to a) not create new Strings if I have one in the pool and b) reduce the retained memory size, sometimes by a factor of 3-5. You can write a simple string interner yourself but I suggest you consider how the data is read in first to determine the optimal solution. This matters as you can easily make matters worse if you don't have an efficient solution.
As EJP points out processing a line at a time is more efficient, as is parsing each line as you read it. i.e. an int
or double
takes up far less space than the same String (unless you have a very high rate of duplication)
Here is an example of a StringInterner which takes a StringBuilder to avoid creating objects needlessly. You first populate a recycled StringBuilder with the text and if a String matching that text is in the interner, that String is returned (or a toString() of the StringBuilder is.) The benefit is that you only create objects (and no more than needed) when you see a new String (or at least one not in the array) This can get a 80% to 99% hit rate and reduce memory consumption (and garbage) dramatically when loading many strings of data.
public class StringInterner {
@NotNull
private final String[] interner;
private final int mask;
public StringInterner(int capacity) {
int n = nextPower2(capacity, 128);
interner = new String[n];
mask = n - 1;
}
@Override
@NotNull
public String intern(@NotNull CharSequence cs) {
long hash = 0;
for (int i = 0; i < cs.length(); i++)
hash = 57 * hash + cs.charAt(i);
int h = hash(hash) & mask;
String s = interner[h];
if (isEqual(s, cs))
return s;
String s2 = cs.toString();
return interner[h] = s2;
}
static boolean isEqual(@Nullable CharSequence s, @NotNull CharSequence cs) {
if (s == null) return false;
if (s.length() != cs.length()) return false;
for (int i = 0; i < cs.length(); i++)
if (s.charAt(i) != cs.charAt(i))
return false;
return true;
}
static int nextPower2(int n, int min) {
if (n < min) return min;
if ((n & (n - 1)) == 0) return n;
int i = min;
while (i < n) {
i *= 2;
if (i <= 0) return 1 << 30;
}
return i;
}
static int hash(long n) {
n ^= (n >> 43) ^ (n >> 21);
n ^= (n >> 15) ^ (n >> 7);
return (int) n;
}
}
This class is interesting in that it is not thread safe in the tradition sense, but will work correctly when used concurrently, in fact might work more efficiently when multiple threads have different views of the contents of the array.