Ruby string split into words ignoring all special

Posted 2020-03-30 06:01

Question:

I need a query to be split into words wherever a non-word character occurs. For example:

query = "I am a great, boy's and I like! to have: a lot-of-fun and @do$$nice&acti*vities+enjoy good ?times."

Should output:

["I", "am", "a", "great", "", "boy", "s", "and", "I", "like", "", "to", "have", "", "a", "lot", "of", "fun", "and", "", "do", "", "nice", "acti", "vities", "enjoy", "good", "", "times"] 

This does the trick, but is there a simpler way?

query.split(/[ ,'!:\\@\\$\\&\\*+?.-]/)

Answer 1:

query.split(/\W+/)
# => ["I", "am", "a", "great", "boy", "s", "and", "I", "like", "to", "have", "a", "lot", "of", "fun", "and", "do", "nice", "acti", "vities", "enjoy", "good", "times"]

query.scan(/\w+/)
# => ["I", "am", "a", "great", "boy", "s", "and", "I", "like", "to", "have", "a", "lot", "of", "fun", "and", "do", "nice", "acti", "vities", "enjoy", "good", "times"]

Both results differ from the expected output in that they do not include the empty strings.
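One difference between the two calls is worth a quick illustration: when the string starts with a non-word character, `split` returns a leading empty string, while `scan` returns only the words. The sketch below uses a made-up sample string and hypothetical helper names (`words_by_split`, `words_by_scan`) that are not part of the answer.

```ruby
# Hypothetical helpers for illustration; the names are not from the answer.
def words_by_split(text)
  # Splits on runs of non-word characters; a leading separator
  # leaves an empty first field.
  text.split(/\W+/)
end

def words_by_scan(text)
  # Collects runs of word characters, so no empty strings can appear.
  text.scan(/\w+/)
end

p words_by_split("!hello, world")  # => ["", "hello", "world"]
p words_by_scan("!hello, world")   # => ["hello", "world"]
```

For the query in the question, which starts with a word character, both calls return the same array.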



Answer 2:

I am adding this answer as @sawa's did not exactly reproduce the desired output:

# Split using any single non-word character:
query.split(/\W/)
# => ["I", "am", "a", "great", "", "boy", "s", "and", "I", "like", "", "to", "have", "", "a", "lot", "of", "fun", "and", "", "do", "", "nice", "acti", "vities", "enjoy", "good", "", "times"]

If you do not want the empty strings in the result, just use sawa's answer.

The code above produces many empty strings if the string contains consecutive spaces, because each extra space is matched separately and creates a new split point. To avoid that, we can add an alternation to the pattern:

# Split using any number of spaces or a single non-word character:
query.split(/\s+|\W/)
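
To see the effect of the alternation, here is a small sketch comparing the two patterns on a made-up string with three consecutive spaces. The sample string and the helper names (`split_each_non_word`, `split_collapsing_spaces`) are assumptions for illustration only.

```ruby
# Hypothetical helpers; names are illustrative only.
def split_each_non_word(text)
  # Every single non-word character is its own split point.
  text.split(/\W/)
end

def split_collapsing_spaces(text)
  # A run of whitespace counts as one split point;
  # any other non-word character still splits on its own.
  text.split(/\s+|\W/)
end

sample = "good   times, friends"
p split_each_non_word(sample)      # => ["good", "", "", "times", "", "friends"]
p split_collapsing_spaces(sample)  # => ["good", "times", "", "friends"]
```

The empty string between "times" and "friends" remains because the comma and the following space are still matched as two separate split points, which is what the expected output in the question relies on.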