How to find a whole word in a String in java

2019-01-03 06:08发布

I have a String that I have to parse for different keywords. For example, I have the String:

"I will come and meet you at the 123woods"

And my keywords are

'123woods' 'woods'

I should report whenever I have a match and where. Multiple occurrences should also be accounted for. However, for this one, I should get a match only on 123woods, not on woods. This eliminates using String.contains() method. Also, I should be able to have a list/set of keywords and check at the same time for their occurrence. In this example, if I have '123woods' and 'come', I should get two occurrences. Method execution should be somewhat fast on large texts.

My idea is to use StringTokenizer but I am unsure if it will perform well. Any suggestions?

13条回答
干净又极端
2楼-- · 2019-01-03 06:38

Looking back at the original question, we need to find some given keywords in a given sentence, count the number of occurrences and know something about where. I don't quite understand what "where" means (is it an index in the sentence?), so I'll pass that one... I'm still learning java, one step at a time, so I'll see to that one in due time :-)

It must be noticed that common sentences (as the one in the original question) can have repeated keywords, therefore the search cannot just ask if a given keyword "exists or not" and count it as 1 if it does exist. There can be more then one of the same. For example:

// Base sentence (added punctuation, to make it more interesting):
String sentence = "Say that 123 of us will come by and meet you, "
                + "say, at the woods of 123woods.";

// Split it (punctuation taken in consideration, as well):
java.util.List<String> strings = 
                       java.util.Arrays.asList(sentence.split(" |,|\\."));

// My keywords:
java.util.ArrayList<String> keywords = new java.util.ArrayList<>();
keywords.add("123woods");
keywords.add("come");
keywords.add("you");
keywords.add("say");

By looking at it, the expected result would be 5 for "Say" + "come" + "you" + "say" + "123woods", counting "say" twice if we go lowercase. If we don't, then the count should be 4, "Say" being excluded and "say" included. Fine. My suggestion is:

// Set... ready...?
int counter = 0;

// Go!
for(String s : strings)
{
    // Asking if the sentence exists in the keywords, not the other
    // around, to find repeated keywords in the sentence.
    Boolean found = keywords.contains(s.toLowerCase());
    if(found)
    {
        counter ++;
        System.out.println("Found: " + s);
    }
}

// Statistics:
if (counter > 0)
{
    System.out.println("In sentence: " + sentence + "\n"
                     + "Count: " + counter);
}

And the results are:

Found: Say
Found: come
Found: you
Found: say
Found: 123woods
In sentence: Say that 123 of us will come by and meet you, say, at the woods of 123woods.
Count: 5

查看更多
萌系小妹纸
3楼-- · 2019-01-03 06:40

To Match "123woods" instead of "woods" , use atomic grouping in regular expresssion. One thing to be noted is that, in a string to match "123woods" alone , it will match the first "123woods" and exits instead of searching the same string further.

\b(?>123woods|woods)\b

it searches 123woods as primary search, once it got matched it exits the search.

查看更多
We Are One
4楼-- · 2019-01-03 06:45

You can also use regex matching with the \b flag (whole word boundary).

查看更多
爷的心禁止访问
5楼-- · 2019-01-03 06:46

How about something like Arrays.asList(String.split(" ")).contains("xx")?

See String.split() and How can I test if an array contains a certain value.

查看更多
男人必须洒脱
6楼-- · 2019-01-03 06:47
public class FindTextInLine {
    String match = "123woods";
    String text = "I will come and meet you at the 123woods";

    public void findText () {
        if (text.contains(match)) {
            System.out.println("Keyword matched the string" );
        }
    }
}
查看更多
淡お忘
7楼-- · 2019-01-03 06:49

A much simpler way to do this is to use split():

String match = "123woods";
String text = "I will come and meet you at the 123woods";

String[] sentence = text.split();
for(String word: sentence)
{
    if(word.equals(match))
        return true;
}
return false;

This is a simpler, less elegant way to do the same thing without using tokens, etc.

查看更多
登录 后发表回答