Guava Charmatcher static initialization slow

2020-06-03 06:00发布

UPDATE: It looks like my 13.0.1 is calling code from this diffed CharMatcher.

http://code.google.com/p/guava-libraries/source/diff?spec=svn69ad96b719d7cd3d872a948d7454f17b816a21c2&r=464b0cfab7c3b6713c35e6f3ae7426542668c77b&format=side&path=/guava/src/com/google/common/base/CharMatcher.java

I seem to be having an issue with Guava's Charmatcher. I am currently writing some JUnit tests and I am noticing the first test takes a bit of time (15+ seconds), but the future tests are all approximately the same (.3 seconds), with a majority of the tests running through the same code.

My analysis of the code has pinpointed that the Charmatcher in Guava's library seems to be the culprit. It looks like there is a bit of static initialization code in CharMatcher that might be the true reason it is taking so long. Is there any way I can disable or optimize this behavior? Is there something I am missing from the wiki pages or help files?

Guava version 13.0.1, System: Linux 64, Eclipse 3.6.1

EDIT: I built a test app with the following code:

import com.google.common.base.CharMatcher;

public class Main {
    public static void main(String[] args) {
        // Using system instead of stopwatch to isolate library.
        long startTime = System.currentTimeMillis();
        CharMatcher.is(' ');
        long endTime = System.currentTimeMillis();
        System.out.println("took " + String.valueOf(endTime-startTime) + " ms");

        startTime = System.currentTimeMillis();
        CharMatcher.is('d');
        endTime = System.currentTimeMillis();
        System.out.println("2nd took " + String.valueOf(endTime-startTime) + " ms");
    }
}

This resulted in the following output:

took 15945 ms
2nd took 0 ms

I ran this in eclipse minus the JUnit framework and just google's guava library. Also I packaged an executable char and received similar results. I'll come back with a second edit after I run a profiler through it.

Thanks for any help.

Edit 2: Results from profiling:

Main.main(String[])                                                                  22,556 ms
com.google.common.base.CharMatcher.<clinit>()                                     22.556 ms                                    
    com.google.common.base.CharMatcher.precomputed()                                 22,550 ms                      
        com.google.common.base.Platform.precomputeCharMatcher(CharMatcher)           22,550 ms
            com.google.common.base.CharMatcher.precomputedInternal()                  22,550 ms
                com.google.common.base.CharMatcher.slowGetChars()                    13,638 ms
                com.google.common.base.CharMatcher.setBits(CharMatcher$LookupTable)  8,911 ms

标签: java guava
4条回答
▲ chillily
2楼-- · 2020-06-03 06:34

Update: This is fixed in Guava 14.

You are running into some kind of JIT bug present in u21, and likely earlier versions, but not in at least some later JDK versions. Here's the -XX:+PrintCompilation output for your test program in u21:

  1       java.util.Arrays::binarySearch0 (72 bytes)
  2       com.google.common.base.CharMatcher$Or::matches (28 bytes)
  3       com.google.common.base.CharMatcher$12::matches (22 bytes)
  2   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
  4       com.google.common.base.CharMatcher$11::matches (17 bytes)
  5       java.util.Arrays::binarySearch (9 bytes)
  6       com.google.common.base.CharMatcher$Or::matches (28 bytes)
  1%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
  6   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
  7       com.google.common.base.CharMatcher$Or::matches (28 bytes)
  7   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
  1%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
  2%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
  8       com.google.common.base.CharMatcher$Or::matches (28 bytes)
  8   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
  2%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
  3%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
  9       com.google.common.base.CharMatcher$Or::matches (28 bytes)
  9   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
  3%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
  4%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 10       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 10   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
  4%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
 11       com.google.common.base.CharMatcher$Or::matches (28 bytes)
  5%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 11   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 12       com.google.common.base.CharMatcher$Or::matches (28 bytes)
  5%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
 12   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
  6%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 13       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 13   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
  6%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
  7%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 14       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 15       com.google.common.base.CharMatcher::slowGetChars (52 bytes)
 14   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
  7%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
 16       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 16   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 17       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 17   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
  8%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 18       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 18   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 19       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 19   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 20       com.google.common.base.CharMatcher$Or::matches (28 bytes)
  8%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
 20   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 21       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 21   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 22       com.google.common.base.CharMatcher$Or::matches (28 bytes)
  9%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 22   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 23       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 23   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 24       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 24   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
  9%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
 25       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 25   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 26       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 26   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 10%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 27       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 27   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 28       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 28   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 10%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
 29       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 29   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 30       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 30   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 11%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 31       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 31   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 32       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 32   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 11%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
 33       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 33   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 34       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 34   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 35       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 12%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 35   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 36       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 36   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 37       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 37   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 12%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
 38       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 38   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 39       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 39   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 13%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 40       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 40   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 41       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 41   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 13%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
 42       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 42   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 43       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 43   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 44       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 14%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 44   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 45       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 45   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 46       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 46   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 14%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
 47       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 47   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 48       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 48   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 15%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 49       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 49   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 50       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 50   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 15%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
 51       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 51   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 52       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 52   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 16%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 53       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 53   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 54       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 54   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 16%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
 55       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 55   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 56       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 56   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 57       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 17%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 57   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 58       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 58   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 59       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 59   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 17%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
 60       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 60   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 61       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 61   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 18%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 62       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 62   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 63       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 63   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 18%  made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
 64       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 64   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 65       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 65   made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 19%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 66       com.google.common.base.CharMatcher$Or::matches (28 bytes)

<I omitted ~500 very similar lines here>

143%  made not entrant  com.google.common.base.CharMatcher::setBits @ -2 (30 bytes)
144%      com.google.common.base.CharMatcher::setBits @ 2 (30 bytes)
144%  made not entrant  com.google.common.base.CharMatcher::setBits @ -2 (30 bytes)
145%      com.google.common.base.CharMatcher::setBits @ 2 (30 bytes)
145%  made not entrant  com.google.common.base.CharMatcher::setBits @ -2 (30 bytes)
took 7599 ms
2nd took 0 ms

Here's the same output for u34:

 64   1       java.util.Arrays::binarySearch0 (72 bytes)
 68   2       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 68   3       com.google.common.base.CharMatcher$12::matches (22 bytes)
 70   4       com.google.common.base.CharMatcher$11::matches (17 bytes)
 71   5       java.util.Arrays::binarySearch (9 bytes)
 71   1%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 76   2      made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 76   6       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 88   1%     made not entrant  com.google.common.base.CharMatcher::slowGetChars @ -2 (52 bytes)
 88   6      made not entrant  com.google.common.base.CharMatcher$Or::matches (28 bytes)
 88   7       com.google.common.base.CharMatcher::slowGetChars (52 bytes)
 89   8       com.google.common.base.CharMatcher$Or::matches (28 bytes)
 91   9       com.google.common.base.CharMatcher$8::matches (14 bytes)
 91   2%      com.google.common.base.CharMatcher::slowGetChars @ 9 (52 bytes)
 98  10       java.lang.String::indexOf (166 bytes)
 98  11       java.lang.String::indexOf (151 bytes)
102   3%      com.google.common.base.CharMatcher::setBits @ 2 (30 bytes)
113  12       com.google.common.base.CharMatcher::setBits (30 bytes)
113  13       com.google.common.base.CharMatcher$LookupTable::set (15 bytes)

As you can see, it is a hell of a lot more sane in u34. It looks like what is happening in u21 is that some combination of OSR (affecting slowGetChars method as it has a large loop), and recursive megamorphic calls (affecting CharMatcher$Or and other subclasses of CharMatcher - which are called at the end of the recursive CharMatcher$Or chain) produce a perfect storm of JIT de-optimization and subsequent recompilation, which is resolved in u34.

Still, I think the Guava guys are being a bit abusive by creating a 65k element LUT during static init, even if you never use the matchers in question, which takes still about ~50ms when JIT is working properly. I filed a bug against guava, let's see if the maintainers agree :). Update: the bug was fixed within a couple of days of me filing this, the fix will be in Guava 14. That's what I call quick turnaround!

Original post follows:

I have the exact same behavior and ran into this post looking for a solution.

Looking at the code, it seems that they are making a 65k element (each element takes one bit) lookup table and checking if every character matches, which in this requires many nested function calls due to the fluent style being used to define the matchers:

  public static final CharMatcher INVISIBLE = inRange('\u0000', '\u0020')
      .or(inRange('\u007f', '\u00a0'))
      .or(is('\u00ad'))
      .or(inRange('\u0600', '\u0604'))
      .or(anyOf("\u06dd\u070f\u1680\u180e"))
      .or(inRange('\u2000', '\u200f'))
      .or(inRange('\u2028', '\u202f'))
      .or(inRange('\u205f', '\u2064'))
      .or(inRange('\u206a', '\u206f'))
      .or(is('\u3000'))
      .or(inRange('\ud800', '\uf8ff'))
      .or(anyOf("\ufeff\ufff9\ufffa\ufffb"))
      .withToString("CharMatcher.INVISIBLE")
      .precomputed();

That's the code that is triggering the long static init, which on my fast box takes ~7 seconds. Here's the output of your app on my box:

took 6814 ms
2nd took 0 ms

I'm filing a guava bug.

查看更多
孤傲高冷的网名
3楼-- · 2020-06-03 06:34

I just ran the exact benchmark you provided, and got

took 74 ms

2nd took 0 ms

...Perhaps the problem is on Eclipse's end? Are you running in debug mode?

查看更多
家丑人穷心不美
4楼-- · 2020-06-03 06:39

Just tested with :

@Test
public void testCharMatcherSlow() throws Exception {
    System.out.println(CharMatcher.JAVA_DIGIT.matches('1'));
}

and it took 0.06sec to execute. Also looked at the CharMatcher code, and nothing could explain the slow behavior you encounter. You should replace the CharMatcher by a mock to see that it is not the issue and maybe use a profiler to find out the cause.

查看更多
对你真心纯属浪费
5楼-- · 2020-06-03 06:49

As a quick help, try to switch to the newest version from the source repo. There are changes there, e.g. the method slowGetChars is gone. It used to be a bit slow, but nowhere near what's happening to you.

For everybody else it runs some hundred times faster and without being able to reproduce it it's hard to help.

查看更多
登录 后发表回答