How to replace multiple substrings with the same string

Posted 2019-08-11 10:02

Question:

I want to replace several different characters/substrings with a single character or with nothing (an empty string). For example, I want to change "How to chop an onion?" to "how-chop-onion".

string
.gsub(/'s/,'')
.gsub(/[?&]/,'')
.gsub('to|an|a|the','')
.split(' ')
.map { |s| s.downcase}
.join '-'

Using the pipe character | does not work. How can I do this with gsub?

Answer 1:

to|an|a|the is a pattern, but you are passing it as a String, so gsub looks for that literal text. Compare:

str.gsub('to|an|a|the', '')   # passing string argument
#=> "How to chop an onion?"

str.gsub(/to|an|a|the/, '')   # passing pattern argument
#=> "How  chop  onion?"
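
To make the difference concrete, here is a minimal sketch (the variable names are only illustrative). It shows that a String argument matches the literal text, pipes included, and that Regexp.union can build the alternation from a word list if you would rather not write the regex by hand.

```ruby
str = "How to chop an onion?"

# A String argument is matched literally, so "|" is just an ordinary character.
literal = "to|an|a|the".gsub('to|an|a|the', 'X')   # the whole literal text matches once

# Regexp.union builds an alternation from a list of strings.
words   = %w[to an a the]          # illustrative word list
pattern = Regexp.union(words)      # same as /to|an|a|the/
removed = str.gsub(pattern, '')

puts literal           # X
puts removed.inspect   # "How  chop  onion?"
```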


Answer 2:

▶ "How to chop an onion?".gsub(/'s|[?&]+|to|an|a|the/,'')
                         .downcase.split(/\s+/).join '-'
#⇒ "how-chop-onion"
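
One caveat: the pattern above has no word boundaries, so it would also strip "a" or "an" from inside longer words (for example "banana"). A possible variation, assuming you want whole words only, is sketched below. The slugify name and the STOP_WORDS list are hypothetical, not part of the original answer.

```ruby
STOP_WORDS = %w[to an a the].freeze   # hypothetical word list

# Hypothetical helper: lower-case, strip punctuation and stop words, join with "-".
def slugify(text)
  stop = Regexp.union(STOP_WORDS).source       # "to|an|a|the"
  text.downcase
      .gsub(/'s|[?&]+/, '')                    # drop possessives and ?/& characters
      .gsub(/\b(?:#{stop})\b/, '')             # drop whole stop words only
      .split                                   # split on runs of whitespace
      .join('-')
end

puts slugify("How to chop an onion?")        # how-chop-onion
puts slugify("How to bake a banana cake?")   # how-bake-banana-cake
```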


Answer 3:

Start by making a list of what you want to do:

  • Remove certain words
  • Remove certain punctuation
  • Remove extra spaces after words are removed
  • Convert to lower case¹

Now think about the order in which these operations should be performed. The conversion to lower case can be done at any time, but it's convenient to do it first, in which case the regex need not be case-insensitive. Punctuation should be removed before the words, so that words are easier to tell apart from substrings. Removing the extra spaces obviously must be done after words are removed. We therefore want the order to be:

  • Convert to lower case
  • Remove certain punctuation
  • Remove certain words
  • Remove extra spaces after words are removed

After down-casing, this could be done with three chained gsubs:

str = "Please, don't any of you know how to chop an avacado?"

r1 = /[,?]/      # match a comma or question mark

r2 = /
     \b          # match a word break
     (?:         # start a non-capture group
     to|an|a|the # match one of these words (checking left to right)
     )           # end non-capture group
     \b          # match a word break
     /x          # extended/free-spacing regex definition mode

r3 = /\s\s/      # match two whitespace characters

str.downcase.gsub(r1,'').gsub(r2,'').gsub(r3,' ')
  #=> "please don't any of you know how chop avacado"

Note that without the word breaks (\b) in r2 we would get:

"plese don't y of you know how chop vcdo"

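To see exactly which pieces each version of the pattern picks up, String#scan can list the matches. This is only an illustrative sketch using the down-cased, punctuation-free string from the example:

```ruby
str = "please don't any of you know how to chop an avacado"

# Without \b the alternation also matches inside longer words.
without_boundaries = str.scan(/to|an|a|the/)

# With \b only whole words match.
with_boundaries = str.scan(/\b(?:to|an|a|the)\b/)

p without_boundaries   # ["a", "an", "to", "an", "a", "a", "a"]
p with_boundaries      # ["to", "an"]
```
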
Also, the first gsub could be replaced by:

tr(',?','')

or:

delete(',?')
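
A quick check that the three forms agree for this input (a sketch only; note that in tr and delete the characters ^, - and \ have special meaning, which does not matter for a comma and a question mark):

```ruby
s = "please, don't any of you know how to chop an avacado?"

by_gsub   = s.gsub(/[,?]/, '')   # regex character class
by_tr     = s.tr(',?', '')       # an empty replacement makes tr delete the characters
by_delete = s.delete(',?')       # deletes every listed character

puts by_gsub                              # please don't any of you know how to chop an avacado
puts by_gsub == by_tr && by_tr == by_delete   # true
```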

These gsubs can be combined into one (how I'd write it), as follows:

r = /
    [,?]                   # as in r1
    |                      # or
    \b(?:to|an|a|the)\b\s* # as in r2, plus any whitespace after the word
    |                      # or
    \s                     # match a whitespace char...
    (?=\s)                 # ...if the next char is whitespace (positive lookahead)
    /x

str.downcase.gsub(r,'')
  #=> "please don't any of you know how chop avacado"

"Lookarounds" (here a positive lookahead) are often referred to as "zero-width", meaning that, while the match is required, they do not form part of the match that is returned.
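
A small sketch of what "zero-width" means in practice, using made-up sample strings. Each match of \s(?=\s) is a single whitespace character, and the character checked by the lookahead is left in place. Also note that gsub matches against the original string, so a lookahead cannot see a double space that only comes into existence when a word is removed in the same pass.

```ruby
spaced = "how  chop   avacado"

# Each match is one whitespace character; the lookahead only inspects what follows.
matches   = spaced.scan(/\s(?=\s)/)       # three single-space matches
collapsed = spaced.gsub(/\s(?=\s)/, '')   # leaves one space per run

# The spaces around "to" are single in the original, so the lookahead never fires.
one_pass = "how to chop".gsub(/\bto\b|\s(?=\s)/, '')

puts collapsed          # how chop avacado
puts one_pass.inspect   # "how  chop"
```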

¹ Have you ever wondered where the terms "lower case" and "upper case" came from? In the early days of printing, typesetters kept the metal movable type in two cases, one located above the other. Those for the taller letters, used to begin sentences and proper nouns, were in the upper case; the remaining ones were in the lower case.