I need to get the space-separated tokens in a string, but I also need to know the character position within the original string at which each token starts. Is there any way to do this with StringTokenizer? Also, as I understand it, this is a legacy class; is there a better alternative to using StringTokenizer?
Answer 1:
In new code, prefer String#split() over StringTokenizer for splitting a string; StringTokenizer is a legacy class kept for compatibility.
However, since you also want the position of each token in the string, it is better to use the Pattern and Matcher classes. Matcher#start() returns the index at which the current match begins.
Here's an example:
String str = "abc asf basdfasf asf";
Matcher matcher = Pattern.compile("\\S+").matcher(str);
while (matcher.find()) {
    System.out.println(matcher.start() + ":" + matcher.group());
}
The pattern \\S+ matches each run of one or more non-whitespace characters in the string. Each call to Matcher#find() advances to the next match, so the loop visits every token in turn.
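If the offsets are needed later rather than just printed, the same loop can be wrapped in a small method. The following is only a sketch: the class name TokenOffsets and the method tokenize are made up for illustration, and returning a map keyed by start offset is just one possible design choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenOffsets {
    // One or more non-whitespace characters form a token.
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    /** Hypothetical helper: maps each token's start offset to its text, in order of appearance. */
    static Map<Integer, String> tokenize(String text) {
        Map<Integer, String> tokens = new LinkedHashMap<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.put(matcher.start(), matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two spaces, then a tab: offsets still refer to the original string.
        tokenize("abc  asf\tbasdfasf").forEach(
                (start, token) -> System.out.println(start + ":" + token));
        // prints 0:abc, 5:asf and 9:basdfasf on separate lines
    }
}
```

Start offsets are unique within one string, so using them as map keys loses nothing, and LinkedHashMap keeps the tokens in the order they were found.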
Answer 2:
You can easily do this yourself using String.split(), as long as the tokens are separated by exactly one space:
String text = "hello world example";
int tokenStartIndex = 0;
for (String token : text.split(" ")) {
    System.out.println("token: " + token + ", tokenStartIndex: " + tokenStartIndex);
    tokenStartIndex += token.length() + 1; // +1 for the single separating space
}
This prints:
token: hello, tokenStartIndex: 0
token: world, tokenStartIndex: 6
token: example, tokenStartIndex: 12
Answer 3:
I improved micha's answer, so that it can handle neighboring spaces:
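String text = "hello  world     example"; // two spaces, then five spaces
int searchFrom = 0;
for (String token : text.split("[\u00A0 \n]")) {
    if (token.length() > 0) {
        int start = text.indexOf(token, searchFrom);
        System.out.println("token: " + token + ", start at: " + start);
        searchFrom = start + token.length(); // resume after this token so a repeated token is not found twice
    }
}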
Output is:
token: hello, start at: 0
token: world, start at: 7
token: example, start at: 17
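The same result can also be had without split() or a regex, by scanning the characters once and recording where each run of non-separator characters begins. This is a rough sketch only: the names ScanTokens, isSeparator and scan are invented for illustration, and treating every Java whitespace or Unicode space character (including U+00A0) as a separator is an assumption that goes a little beyond the three separators used above.

```java
import java.util.ArrayList;
import java.util.List;

class ScanTokens {
    /** Hypothetical helper: true for Java whitespace and Unicode space characters such as U+00A0. */
    static boolean isSeparator(char c) {
        return Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    /** Returns one "start:token" string per token, in order of appearance. */
    static List<String> scan(String text) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip any separators in front of the next token.
            while (i < text.length() && isSeparator(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token itself.
            while (i < text.length() && !isSeparator(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                result.add(start + ":" + text.substring(start, i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(scan("hello  world     example"));
        // prints [0:hello, 7:world, 17:example]
    }
}
```

Because the start index is taken directly from the scan position, repeated tokens need no special handling.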