How to get all words of a string in C#?

I have a paragraph in a single string and I'd like to get all the words in that paragraph.

My problem is that I don't want the words to include trailing punctuation marks such as (',', '.', '\'', '"', ';', ':', '!', '?') or whitespace characters such as \n and \t.

I also don't want suffixes such as 's and 'm; for example, world's should return only world.

In the example: he said. "My dog's bone, toy, are missing!"

The list should be: he said my dog bone toy are missing

Tags: c# string

5 Answers

够拽才男人

Reply #2 · 2020-08-09 10:57

Hope this is helpful for you:

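        // Split uses the first array element that matches at a given position,
        // so "'s" must come before "'"; otherwise "dog's" yields a stray "s".
        string[] separators = new string[] {"'s", ",", ".", "!", "'", " "};
        string text = "My dog's bone, toy, are missing!";

        foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
            Console.WriteLine(word);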

我命由我不由天

Reply #3 · 2020-08-09 10:57

Split on whitespace, then trim anything that isn't a letter from the resulting strings.
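As a rough illustration of the split-then-trim idea, here is a minimal sketch. The class name `WordSplitter` and the method name `SplitAndTrim` are made up for this example and are not part of any library.

```csharp
using System;
using System.Collections.Generic;

static class WordSplitter
{
    // Hypothetical helper: splits on whitespace, then trims non-letter
    // characters from both ends of each token.
    public static List<string> SplitAndTrim(string text)
    {
        var words = new List<string>();

        // A null separator array makes Split treat white-space characters as delimiters.
        foreach (string token in text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            int start = 0;
            int end = token.Length - 1;

            // Move inward from both ends until a letter is found.
            while (start <= end && !char.IsLetter(token[start])) start++;
            while (end >= start && !char.IsLetter(token[end])) end--;

            // Skip tokens that contain no letters at all.
            if (start <= end)
                words.Add(token.Substring(start, end - start + 1));
        }

        return words;
    }
}
```

Note that trimming only touches the ends of each token, so "dog's" comes back unchanged; removing the 's suffix would need a separate step.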

beautiful°

Reply #4 · 2020-08-09 10:58

See "Regex word boundary expressions" and "What is the most efficient way to count all of the words in a richtextbox?". The moral of the story is that there are many ways to approach the problem, but regular expressions are probably the simplest way to go.
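One possible regex-based sketch is shown here, assuming that words consist of letters only and that suffixes use a plain ASCII apostrophe. The class name `RegexWords`, the method name `Extract`, and the group name `stem` are chosen for this example. Lowercasing is included only because the expected list in the question is lowercase.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RegexWords
{
    // "stem" captures the letters before an optional apostrophe suffix
    // such as 's or 'm; the suffix is matched but not kept.
    private static readonly Regex WordPattern =
        new Regex(@"(?<stem>\p{L}+)(?:'\p{L}+)?", RegexOptions.Compiled);

    public static List<string> Extract(string text)
    {
        var words = new List<string>();

        foreach (Match m in WordPattern.Matches(text))
            words.Add(m.Groups["stem"].Value.ToLowerInvariant());

        return words;
    }
}
```

Because the suffix is consumed as part of the same match, "dog's" contributes only "dog" and no stray "s".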

劫难

Reply #5 · 2020-08-09 11:02

Here's a looping replace method. It's not fast, but it is one way to solve the problem:

string result = "string to cut ' stuff. ! out of";

".',!@".ToCharArray().ToList().ForEach(a => result = result.Replace(a.ToString(),""));

This assumes you want to place it back in the original string, not a new string or a list.

在下西门庆

Reply #6 · 2020-08-09 11:14

Expanding on Shan's answer, I would consider something like this as a starting point:

MatchCollection matches = Regex.Matches(input, @"\b[\w']*\b");

Why include the ' character? Because this will prevent words like "we're" from being split into two words. After capturing it, you can manually strip out the suffix yourself (whereas otherwise, you couldn't recognize that re is not a word and ignore it).

So:

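// Requires: using System.Linq; using System.Text.RegularExpressions;
static string[] GetWords(string input)
{
    // Match runs of word characters and apostrophes between word boundaries.
    MatchCollection matches = Regex.Matches(input, @"\b[\w']*\b");

    // The pattern can also produce empty matches, so filter those out.
    var words = from m in matches.Cast<Match>()
                where !string.IsNullOrEmpty(m.Value)
                select TrimSuffix(m.Value);

    return words.ToArray();
}

// Removes everything from the first apostrophe onward ("dog's" -> "dog").
static string TrimSuffix(string word)
{
    int apostropheLocation = word.IndexOf('\'');
    if (apostropheLocation != -1)
    {
        word = word.Substring(0, apostropheLocation);
    }

    return word;
}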

Example input:

he said. "My dog's bone, toy, are missing!" What're you doing tonight, by the way?

Example output:

[he, said, My, dog, bone, toy, are, missing, What, you, doing, tonight, by, the, way]

One limitation of this approach is that it will not handle acronyms well; e.g., "Y.M.C.A." would be treated as four words. I think that could also be handled by including . as a character to match in a word and then stripping it out if it's a full stop afterwards (i.e., by checking that it's the only period in the word as well as the last character).
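The acronym idea could be sketched roughly as follows. This is an untested-in-production illustration under stated assumptions: the class name `AcronymAwareWords` is made up, and the pattern `\w[\w'.]*` is one possible way to let periods into a token, not the pattern from the answer above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AcronymAwareWords
{
    public static List<string> GetWords(string input)
    {
        var words = new List<string>();

        // Start at a word character, then also allow apostrophes and periods inside the token.
        foreach (Match m in Regex.Matches(input, @"\w[\w'.]*"))
        {
            string word = m.Value;

            // If the only period is the last character, treat it as a full stop and drop it.
            // Tokens with several periods (e.g. "Y.M.C.A.") are left intact.
            if (word.IndexOf('.') == word.Length - 1)
                word = word.Substring(0, word.Length - 1);

            // Drop an apostrophe suffix such as 's or 're.
            int apostrophe = word.IndexOf('\'');
            if (apostrophe != -1)
                word = word.Substring(0, apostrophe);

            if (word.Length > 0)
                words.Add(word);
        }

        return words;
    }
}
```

This keeps "Y.M.C.A." as a single word while "said." becomes "said". It still has gaps: an ellipsis such as "missing..." contains several periods and would be left as is, and two sentences joined without a space ("end.Next") would be read as one token.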
