I have two strings that I want to compare character by character: every character in the test string must also occur in mychars, but mychars may contain extra characters.
mychars="abcdefg"
testone="abcdefgh" # false h is not in mychars
testtwo="abcddabc" # true all char in testtwo are in mychars
function test() {
if each char in $1 is in $2 # PSEUDO CODE
then
return 1
else
return 0
fi
}
if test "$testone" "$mychars"; then
  echo "All in the string"
else
  echo "Not all in the string"
fi
# should echo "Not all in the string" because the h is not in the string mychars
if test "$testtwo" "$mychars"; then
  echo "All in the string"
else
  echo "Not all in the string"
fi
# should echo 'All in the string'
What is the best way to do this? My guess is to loop over all the chars in the first parameter.
You can use tr -s to replace every char from mychars with a symbol and squeeze repeated occurrences into one, then test whether the resulting string is anything other than that single symbol. For example:
tr -s "$mychars" "." <<< "ggaaabbbcdefg"
Outputs:
.
But:
tr -s "$mychars" "." <<< "xxxggaaabbbcdefgxxx"
Prints:
xxx.xxx
So, your function could be like the following:
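function test() {
  local dictionary="$1"   # allowed characters; "$2" is the string to check
  local res
  # Map every allowed character to "." and squeeze runs into one.
  res=$(tr -s "$dictionary" "." <<< "$2")
  if [ "$res" = "." ]; then
    return 1   # mirrors the question's pseudocode; shell convention is 0 for success
  else
    return 0
  fi
}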
Update: As suggested by @mklement0, the whole function could be shortened (and the logic fixed) by the following:
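function test() {
  local dictionary="$1"   # allowed characters; "$2" is the string to check
  # Quote the substitution so its output is compared literally, not as a glob pattern.
  [[ '.' == "$(tr -s "$dictionary" "." <<< "$2")" ]]
}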
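As a quick check of the shortened function against the strings from the question, here is a minimal sketch. The name allCharsIn is hypothetical (chosen so the function does not shadow the test builtin); the dictionary is passed first and the string to check second. The sketch assumes Bash and a tr that pads the shorter second set with its last character, as GNU and BSD tr do. It also passes the dictionary to tr without surrounding brackets, since tr treats brackets as literal characters, and quotes the command substitution so the result is compared literally.

```shell
#!/usr/bin/env bash
# Hypothetical name; same idea as the shortened tr-based function.
allCharsIn() {
  local dictionary="$1"   # allowed characters
  # Map every allowed character to "." and squeeze runs; a lone "."
  # means the input consisted only of allowed characters.
  [[ '.' == "$(tr -s "$dictionary" '.' <<< "$2")" ]]
}

mychars="abcdefg"

if allCharsIn "$mychars" "abcdefgh"; then
  echo "All in the string"
else
  echo "Not all in the string"   # printed: h is not in mychars
fi

if allCharsIn "$mychars" "abcddabc"; then
  echo "All in the string"       # printed: every char is in mychars
else
  echo "Not all in the string"
fi
```

Note that an empty input string or a string consisting only of "." is not handled specially by this approach.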
The accepted answer's solution is short, clever, and efficient.
Here's a less efficient alternative, which may be of interest if you want to know which characters are unique to the 1st string, returned as a sorted, distinct list:
charTest() {
  local charsUniqueToStr1
  # Determine which chars. in $1 aren't in $2.
  # This returns a sorted, distinct list of chars., each on its own line.
  charsUniqueToStr1=$(comm -23 \
    <(sed 's/\(.\)/\1\'$'\n''/g' <<<"$1" | sort -u) \
    <(sed 's/\(.\)/\1\'$'\n''/g' <<<"$2" | sort -u))
  # The test succeeds if there are no chars. in $1 that aren't also in $2.
  [[ -z $charsUniqueToStr1 ]]
}
mychars="abcdefg" # define reference string
charTest "abcdefgh" "$mychars"
echo $? # print exit code: 1 - 'h' is not in reference string
charTest "abcddabc" "$mychars"
echo $? # print exit code: 0 - all chars. are in reference string
Note that I've renamed test() to charTest() to avoid a name collision with the test builtin/utility.
sed 's/\(.\)/\1\'$'\n''/g' splits the input into individual characters by placing each on a separate line.
- Note that the command creates an extra empty line at the end, but that doesn't matter in this case; to eliminate it, append ; ${s/\n$//;} to the sed script.
- The command is written in a POSIX-compliant manner, which complicates it, because a \-escaped actual newline has to be spliced in (via an ANSI C-quoted string, $'\n'); if you have GNU sed, you can simplify to sed -r 's/(.)/\1\n/g'.
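To see the splitting step in isolation, here is a small sketch assuming Bash (for the here-string and the ANSI C-quoted newline). The helper names splitChars and splitCharsTrimmed are hypothetical and only wrap the two sed variants described above.

```shell
#!/usr/bin/env bash
# Split a string into one character per line (the sed command from the answer).
splitChars() {
  sed 's/\(.\)/\1\'$'\n''/g' <<<"$1"
}

# Same, with the trailing empty line removed on the last input line.
splitCharsTrimmed() {
  sed 's/\(.\)/\1\'$'\n''/g; ${s/\n$//;}' <<<"$1"
}

splitChars "cab" | wc -l          # 4 lines: c, a, b, and an empty line
splitCharsTrimmed "cab" | wc -l   # 3 lines: c, a, b
```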
sort -u then sorts the resulting list of characters and weeds out duplicates (-u).
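As a tiny illustration of this step, the following sketch feeds a hand-written list of characters (one per line, with duplicates) to sort -u. The variable name distinctChars is made up for the example.

```shell
#!/usr/bin/env bash
# Sort a list of characters (one per line) and drop duplicates.
distinctChars=$(printf '%s\n' c a b a c | sort -u)
echo "$distinctChars"   # prints a, b and c, each on its own line
```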
comm -23 compares the distinct set of sorted characters in both strings and prints those unique to the 1st string (comm uses a 3-column layout, with the 1st column containing lines unique to the 1st file, the 2nd column containing lines unique to the 2nd file, and the 3rd column containing lines the two input files have in common; -23 suppresses the 2nd and 3rd columns, effectively only printing the lines that are unique to the 1st input).
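The column selection can be tried on two small, already sorted inputs. This is only a sketch, assuming Bash for the process substitutions; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Two sorted inputs: the first contains a, b, h; the second contains a, b, c.
onlyInFirst=$(comm -23 <(printf '%s\n' a b h) <(printf '%s\n' a b c))
onlyInSecond=$(comm -13 <(printf '%s\n' a b h) <(printf '%s\n' a b c))
inBoth=$(comm -12 <(printf '%s\n' a b h) <(printf '%s\n' a b c))

echo "only in 1st: $onlyInFirst"    # only in 1st: h
echo "only in 2nd: $onlyInSecond"   # only in 2nd: c
echo "in both:" $inBoth             # in both: a b
```

An empty result from comm -23 therefore means that the 1st input has no lines that are missing from the 2nd.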
[[ -z $charsUniqueToStr1 ]] then tests if $charsUniqueToStr1 is empty (-z); in other words: success (exit code 0) is indicated if the 1st string contains no chars. that aren't also contained in the 2nd string; otherwise, failure (exit code 1). By virtue of the conditional ([[ .. ]]) being the last statement in the function, its exit code also becomes the function's exit code.
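The exit-code behavior can be checked on its own with a minimal sketch. The helper name isEmpty is hypothetical and not part of the answer's code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if its argument is the empty string.
isEmpty() {
  [[ -z $1 ]]   # last command; its exit status becomes the function's
}

isEmpty ""  ; echo "empty: $?"       # empty: 0
isEmpty "h" ; echo "non-empty: $?"   # non-empty: 1
```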