Ruby string split into words ignoring all special

Posted 2020-03-30 05:09

I need a query to be split into words wherever a non-word character occurs. For example:

query = "I am a great, boy's and I like! to have: a lot-of-fun and @do$$nice&acti*vities+enjoy good ?times."

Should output:

["I", "am", "a", "great", "", "boy", "s", "and", "I", "like", "", "to", "have", "", "a", "lot", "of", "fun", "and", "", "do", "", "nice", "acti", "vities", "enjoy", "good", "", "times"] 

This does the trick but is there a simpler way?

query.split(/[ ,'!:\\@\\$\\&\\*+?.-]/)

2 Answers
迷人小祖宗
#2 · 2020-03-30 05:46
query.split(/\W+/)
# => ["I", "am", "a", "great", "boy", "s", "and", "I", "like", "to", "have", "a", "lot", "of", "fun", "and", "do", "nice", "acti", "vities", "enjoy", "good", "times"]

query.scan(/\w+/)
# => ["I", "am", "a", "great", "boy", "s", "and", "I", "like", "to", "have", "a", "lot", "of", "fun", "and", "do", "nice", "acti", "vities", "enjoy", "good", "times"]

This is different from the expected output in that it does not include empty strings.
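
One edge case where the two calls above can differ: if the string starts with a non-word character, split returns a leading empty string while scan does not. A minimal sketch, using a made-up sample string rather than the query from the question:

```ruby
# Hypothetical sample string (not from the question); note the leading "!".
sample = "!hello, world"

by_split = sample.split(/\W+/) # the leading separator produces a leading ""
by_scan  = sample.scan(/\w+/)  # collects only the runs of word characters

p by_split
p by_scan
```

For the query in the question both give the same array, since it starts with a word character and split drops trailing empty strings by default.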

我欲成王,谁敢阻挡
#3 · 2020-03-30 06:05

I am adding this answer as @sawa's did not exactly reproduce the desired output:

# Split using any single non-word character:
query.split(/\W/)
# => ["I", "am", "a", "great", "", "boy", "s", "and", "I", "like", "", "to", "have", "", "a", "lot", "of", "fun", "and", "", "do", "", "nice", "acti", "vities", "enjoy", "good", "", "times"]

Now, if you do not want the empty strings in the result, just use sawa's answer.

The split above will produce many empty strings if the string contains runs of multiple spaces, because each extra space is matched on its own and creates a new split point. To avoid that, we can add an alternation:

# Split using any number of spaces or a single non-word character:
query.split(/\s+|\W/)
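
To see the difference between the two patterns, here is a small sketch on a made-up string (not the query from the question) that contains three consecutive spaces and a comma followed by a space:

```ruby
# Hypothetical sample string: three spaces between "a" and "b", then ", ".
sample = "a   b, c"

single   = sample.split(/\W/)     # every non-word character is its own split point
combined = sample.split(/\s+|\W/) # a run of whitespace counts as one split point

p single
p combined
```

With the combined pattern the run of spaces no longer produces empty strings, but the comma followed by a space still yields one, which is the behaviour the expected output in the question asks for.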