Ruby: Extracting Words From String

Posted 2019-01-23 01:30

I'm trying to parse words out of a string and put them into an array. I've tried the following thing:

@string1 = "oriented design, decomposition, encapsulation, and testing. Uses "
puts @string1.scan(/\s([^\,\.\s]*)/)

It seems to do the trick, but it's a bit shaky (for example, I should include more special characters). Is there a better way to do this in Ruby?

Optional: I have a CS course description. I intend to extract all the words from it into an array of strings, remove the most common English words from that array, and then use the remaining words as tags that users can search on to find CS courses.

4 answers
beautiful°
#2 · 2019-01-23 02:14

For me, the best way to split a sentence into words is:

line.split(/[^[[:word:]]]+/)

It works even with multilingual words and punctuation marks:

line = 'English words, Polski Żurek!!! crème fraîche...'
line.split(/[^[[:word:]]]+/)
=> ["English", "words", "Polski", "Żurek", "crème", "fraîche"] 
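
One caveat with split: if the line begins with punctuation, the result starts with an empty string. A minimal sketch of one way around that is to scan for runs of word characters instead of splitting on the gaps between them. The helper name extract_words is made up for illustration.

```ruby
# Hypothetical helper: collect runs of Unicode word characters.
def extract_words(line)
  line.scan(/[[:word:]]+/)
end

line = '...English words, Polski Żurek!!!'

# Splitting keeps an empty first field when the line starts with a delimiter.
p line.split(/[^[[:word:]]]+/)  # => ["", "English", "words", "Polski", "Żurek"]

# Scanning only returns the matches, so no empty strings appear.
p extract_words(line)           # => ["English", "words", "Polski", "Żurek"]
```

Which form to prefer is a matter of taste; scan simply avoids a later cleanup step.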
\"骚年 ilove
#3 · 2019-01-23 02:25

Well, you could split the string on spaces, if that's your delimiter of interest:

@string1.split(' ')

Or split with a regular expression built from one of these:

\W  # Any non-word character

\b  # A word boundary (a zero-width position, not a character)

\s  # Any whitespace character

Hint: try testing each of these on http://rubular.com

Also note that Ruby 1.9 differs from 1.8 in some respects, including regular expression and string encoding handling.
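
To see how these patterns differ in practice, here is a small sketch run against a shortened piece of the question's string. The constant name SAMPLE is only for illustration.

```ruby
SAMPLE = "and testing. Uses"

# A single space: splits on whitespace, punctuation stays attached.
p SAMPLE.split(' ')    # => ["and", "testing.", "Uses"]

# Runs of whitespace: same result for this input.
p SAMPLE.split(/\s+/)  # => ["and", "testing.", "Uses"]

# Runs of non-word characters: punctuation is dropped.
p SAMPLE.split(/\W+/)  # => ["and", "testing", "Uses"]

# Word boundaries: nothing is consumed, so the delimiters come back as fields.
p SAMPLE.split(/\b/)   # => ["and", " ", "testing", ". ", "Uses"]
```

For pulling out bare words, the \W+ form is usually the closest fit of the four.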

爱情/是我丢掉的垃圾
#4 · 2019-01-23 02:31

Use the split method:

   words = @string1.split(/\W+/)

This splits the string into an array using a regular expression. \W matches any "non-word" character, and the + means "one or more", so a run of consecutive delimiters is treated as a single separator.
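
Building on that split, the tag extraction described in the question could be sketched roughly as follows. The stop-word list here is a tiny made-up sample (a real one would be much longer), and extract_tags is a hypothetical helper name, not something from the original post.

```ruby
require 'set'

# Assumption: a tiny illustrative stop-word list, not a real one.
STOP_WORDS = Set.new(%w[a an and the of to in uses]).freeze

# Hypothetical helper: split a description into lowercase tags,
# dropping stop words, empty strings, and duplicates.
def extract_tags(description)
  description.downcase
             .split(/\W+/)
             .reject { |word| word.empty? || STOP_WORDS.include?(word) }
             .uniq
end

description = "oriented design, decomposition, encapsulation, and testing. Uses "
p extract_tags(description)
# => ["oriented", "design", "decomposition", "encapsulation", "testing"]
```

Note that \W only treats ASCII letters, digits, and underscore as word characters, so this sketch assumes English-only descriptions.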

唯我独甜
#5 · 2019-01-23 02:35

For Rails you can use something like this:

@string1.split(/\s/).delete_if(&:blank?)
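
Outside Rails, blank? is not defined on String, since it comes from ActiveSupport. A plain-Ruby sketch of the same idea might look like this; both helper names are hypothetical.

```ruby
# Hypothetical helper: mirror the Rails version, using empty? instead of blank?.
def words_rejecting_empty(text)
  text.split(/\s/).reject(&:empty?)
end

# Hypothetical helper: with no arguments, split treats runs of whitespace
# as one separator and ignores leading whitespace.
def words_plain_split(text)
  text.split
end

text = "  oriented   design,\tand testing.  "

p words_rejecting_empty(text)  # => ["oriented", "design,", "and", "testing."]
p words_plain_split(text)      # => ["oriented", "design,", "and", "testing."]
```

Like the Rails version, both leave punctuation attached to the words, because only whitespace is treated as a delimiter.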