In my software I need to split a string into words. I currently have more than 19,000,000 documents with more than 30 words each.
Which of the following two approaches is better in terms of performance?
StringTokenizer sTokenize = new StringTokenizer(s, " ");
while (sTokenize.hasMoreTokens()) {
    String token = sTokenize.nextToken();
}
or
String[] splitS = s.split(" ");
for (int i = 0; i < splitS.length; i++) {
    String token = splitS[i];
}
The Java API specification recommends using split. See the documentation of StringTokenizer.
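For reference, the StringTokenizer javadoc itself illustrates the recommended split-based replacement with an example along these lines (quoted from memory, so treat the exact snippet as approximate):

String[] result = "this is a test".split("\\s");
for (int x = 0; x < result.length; x++) {
    System.out.println(result[x]);
}

This prints each of the four words on its own line.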
Split in Java 7 just calls indexOf for this input (a single non-regex character); see the source. Split should be very fast, close to repeated calls of indexOf.
What do the 19,000,000 documents have to do with it? Do you have to split words in all the documents on a regular basis, or is it a one-shot problem?
If you display/request one document at a time, with only 30 words, this is such a tiny problem that any method would work.
If you have to process all the documents at once, with only 30 words each, this is still such a tiny problem that you are more likely to be I/O bound anyway.
Performance-wise, StringTokenizer is way better than split. Check the code below.
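(The original benchmark is not preserved here; the following is a minimal sketch of that kind of comparison, with the sample string, iteration count, and timing approach assumed.)

import java.util.StringTokenizer;

public class TokenizeBenchmark {
    public static void main(String[] args) {
        String s = "Hello World Hello World Hello World Hello World";
        int runs = 1000000;

        // Time StringTokenizer over many iterations
        long start = System.currentTimeMillis();
        for (int i = 0; i < runs; i++) {
            StringTokenizer st = new StringTokenizer(s, " ");
            while (st.hasMoreTokens()) {
                st.nextToken();
            }
        }
        System.out.println("StringTokenizer: " + (System.currentTimeMillis() - start) + " ms");

        // Time String.split over the same input
        start = System.currentTimeMillis();
        for (int i = 0; i < runs; i++) {
            String[] parts = s.split(" ");
        }
        System.out.println("String.split:    " + (System.currentTimeMillis() - start) + " ms");
    }
}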
But according to the Java docs its use is discouraged; see the note in the StringTokenizer documentation.
Another important thing, undocumented as far as I noticed, is that asking StringTokenizer to return the delimiters along with the tokens (by using the constructor
StringTokenizer(String str, String delim, boolean returnDelims)
) also reduces processing time. So, if you're looking for performance, I would recommend using something like the sketch below. Despite the overhead introduced by the getNext() method, which discards the delimiters for you, it's still 50% faster according to my benchmarks.
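The answer's original snippet is missing here; the following is a sketch reconstructed from the description above, so everything beyond the getNext() name and the returnDelims constructor is an assumption:

import java.util.StringTokenizer;

public class DelimTokenizer {
    private static final String DELIM = " ";

    // Returns the next real token, or null if the next token is a delimiter.
    // With returnDelims = true the tokenizer hands back delimiters too,
    // so this helper skips them for the caller.
    private static String getNext(StringTokenizer st) {
        String value = st.nextToken();
        if (DELIM.equals(value)) {
            return null;
        }
        // Consume the delimiter that follows this token, if any
        if (st.hasMoreTokens()) {
            st.nextToken();
        }
        return value;
    }

    public static void main(String[] args) {
        StringTokenizer st = new StringTokenizer("one two three", DELIM, true);
        while (st.hasMoreTokens()) {
            String token = getNext(st);
            if (token != null) {
                System.out.println(token);
            }
        }
    }
}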
This could be a reasonable benchmark using Java 1.6.0:
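The benchmark itself did not survive here; the sketch below is one plausible shape for it, with a precompiled java.util.regex.Pattern added as a third contender because on 1.6 split(" ") compiles a fresh Pattern on every call (the string, run count, and output format are assumptions):

import java.util.StringTokenizer;
import java.util.regex.Pattern;

public class SplitBenchmark16 {
    public static void main(String[] args) {
        String s = "stackoverflow is a good place to ask java questions";
        int runs = 1000000;
        Pattern p = Pattern.compile(" "); // compiled once, reused below

        long start = System.currentTimeMillis();
        for (int i = 0; i < runs; i++) {
            StringTokenizer st = new StringTokenizer(s, " ");
            while (st.hasMoreTokens()) {
                st.nextToken();
            }
        }
        System.out.println("StringTokenizer:     " + (System.currentTimeMillis() - start) + " ms");

        start = System.currentTimeMillis();
        for (int i = 0; i < runs; i++) {
            String[] parts = s.split(" "); // compiles a fresh Pattern each call on 1.6
        }
        System.out.println("String.split:        " + (System.currentTimeMillis() - start) + " ms");

        start = System.currentTimeMillis();
        for (int i = 0; i < runs; i++) {
            String[] parts = p.split(s);   // reuses the precompiled Pattern
        }
        System.out.println("Precompiled Pattern: " + (System.currentTimeMillis() - start) + " ms");
    }
}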