How can I cut(1) camelcase words?

2019-04-06 17:13发布

问题:

Is there an easy way in Bash to split a camelcased word into its constituent words?

For example, I want to split aCertainCamelCasedWord into 'a Certain Camel Cased Word' and be able to select those fields that interest me. This is trivially done with cut(1) when the word separator is the underscore, but how can I do this when the word is camelcased?

回答1:

sed 's/\([A-Z]\)/ \1/g'

Captures each capital letter and substitutes a leading space with the capture for the whole stream.

$ echo "aCertainCamelCasedWord" | sed 's/\([A-Z]\)/ \1/g'
a Certain Camel Cased Word


回答2:

This solution works if you need to not split up words that are all caps. For example, using the top answer you'll get:

$ echo 'FAQPage' | sed 's/\([A-Z]\)/ \1/g' 
F A Q Page

But instead with my solution, you'll get:

$ echo 'FAQPage' | sed 's/\([A-Z][^A-Z]\)/ \1/g'
FAQ Page

Note: This does not work correctly when there is a second instance of multiple uppercase words, for example:

$ echo 'FAQPageOneReplacedByFAQPageTwo' | sed 's|\([A-Z][^A-Z]\)| \1|g'
FAQ Page One Replaced ByFAQ Page Two


回答3:

This answer does not work correctly when there is a second instance of multiple uppercase

echo 'FAQPageOneReplacedByFAQPageTwo' | sed 's|\([A-Z][^A-Z]\)| \1|g'
FAQ Page One Replaced ByFAQ Page Two

So and additional expression is required for that

 echo 'FAQPageOneReplacedByFAQPageTwo' | sed -e 's|\([A-Z][^A-Z]\)| \1|g' -e 's|\([a-z]\)\([A-Z]\)|\1 \2|g'
 FAQ Page One Replaced By FAQ Page Two


回答4:

Pure Bash:

name="aCertainCamelCasedWord"

declare -a word                                 # the word array

counter1=0                                      # count characters
counter2=0                                      # count words

while [ $counter1 -lt ${#name} ] ; do
  nextchar=${name:${counter1}:1}
  if [[ $nextchar =~ [[:upper:]] ]] ; then
    ((counter2++))
    word[${counter2}]=$nextchar
  else
    word[${counter2}]=${word[${counter2}]}$nextchar
  fi
  ((counter1++))
done

echo -e "'${word[@]}'"