What is the correct/best way of handling spaces and quotes in bash completion?
Here’s a simple example. I have a command called words (e.g., a dictionary lookup program) that takes various words as arguments. The supported ‘words’ may actually contain spaces, and are defined in a file called words.dat:
foo
bar one
bar two
Here’s my first suggested solution:
_find_words()
{
search="$cur"
grep -- "^$search" words.dat
}
_words_complete()
{
local IFS=$'\n'
COMPREPLY=()
cur="${COMP_WORDS[COMP_CWORD]}"
COMPREPLY=( $( compgen -W "$(_find_words)" -- "$cur" ) )
}
complete -F _words_complete words
Typing ‘words f<tab>’ correctly completes the command to ‘words foo ’ (with a trailing space), which is nice, but for ‘words b<tab>’ it suggests ‘words bar ’. The correct completion would be ‘words bar\ ’. And for ‘words "b<tab>’ and ‘words 'b<tab>’ it offers no suggestions.
This last part I have been able to solve. It’s possible to use eval to properly parse the (escaped) characters. However, eval is not fond of missing quotes, so to get everything to work, I had to change the search="$cur" to:
search=$(eval echo "$cur" 2>/dev/null ||
eval echo "$cur'" 2>/dev/null ||
eval echo "$cur\"" 2>/dev/null || "")
This actually works. Both ‘words "b<tab>’ and ‘words 'b<tab>’ autocomplete correctly, and if I add an ‘o’ and press <tab> again, it actually completes the word and adds the correct closing quote. However, if I try to complete ‘words b<tab>’ or even ‘words bar\ <tab>’, it is autocompleted to ‘words bar ’ instead of ‘words bar\ ’, and adding for instance ‘one’ would fail when the words program is run.
Now, obviously it is possible to handle this correctly. For instance, the ls command can do it for files named ‘foo’, ‘bar one’ and ‘bar two’ (though it does have problems with some ways of expressing the filenames when one uses a (valid) combination of ", ' and various escapes). However, I couldn’t figure out how ls does it by reading the bash completion code.
So, does anybody know how to properly handle this? The actual input quotes need not be preserved; I would be happy with a solution that changes ‘words "b<tab>’, ‘words 'b<tab>’ and ‘words b<tab>’ to ‘words bar\ ’, for instance (though I would prefer stripping the quotes, as in this example, instead of adding them).
This not too elegant postprocessing solution seems to work for me (GNU bash, version 3.1.17(6)-release (i686-pc-cygwin)). (Unless I didn't test some border case as usual :))
There's no need to eval things in the postprocessing step, since there are only 2 kinds of quotes.
Since compgen doesn't want to escape spaces for us, we will escape them ourselves (only if the word didn't start with a quote). This has the side effect of the full list (on double tab) having escaped values as well. Not sure if that's good or not, since ls doesn't do it...
EDIT: Fixed to handle single and double quotes inside the words. Essentially the words have to survive 3 rounds of unescaping :). First for grep, second for compgen, and last for the words command itself when autocompletion is done.
_find_words()
{
search=$(eval echo "$cur" 2>/dev/null || eval echo "$cur'" 2>/dev/null || eval echo "$cur\"" 2>/dev/null || "")
grep -- "^$search" words.dat | sed -e "{" -e 's#\\#\\\\#g' -e "s#'#\\\'#g" -e 's#"#\\\"#g' -e "}"
}
_words_complete()
{
local IFS=$'\n'
COMPREPLY=()
local cur="${COMP_WORDS[COMP_CWORD]}"
COMPREPLY=( $( compgen -W "$(_find_words)" -- "$cur" ) )
local escaped_single_qoute="'\''"
local i=0
for entry in ${COMPREPLY[*]}
do
if [[ "${cur:0:1}" == "'" ]]
then
# started with single quote, escaping only other single quotes
# [']bla'bla"bla\bla bla --> [']bla'\''bla"bla\bla bla
COMPREPLY[$i]="${entry//\'/${escaped_single_qoute}}"
elif [[ "${cur:0:1}" == "\"" ]]
then
# started with double quote, escaping all double quotes and all backslashes
# ["]bla'bla"bla\bla bla --> ["]bla'bla\"bla\\bla bla
entry="${entry//\\/\\\\}"
COMPREPLY[$i]="${entry//\"/\\\"}"
else
# no quotes in front, escaping _everything_
# [ ]bla'bla"bla\bla bla --> [ ]bla\'bla\"bla\\bla\ bla
entry="${entry//\\/\\\\}"
entry="${entry//\'/\'}"
entry="${entry//\"/\\\"}"
COMPREPLY[$i]="${entry// /\\ }"
fi
(( i++ ))
done
}
The question is rather loaded, and this answer attempts to explain each aspect:
- How to handle spaces with COMPREPLY.
- How ls does it.
There are also people reaching this question wanting to know how to implement the completion function in general. So:
- How do I implement the completion function and correctly set COMPREPLY?
How does ls do it?
Moreover, why does it behave differently from when I set COMPREPLY?
Back in '12 (before I updated this answer), I was in a similar situation and searched high and low for the answer to this discrepancy myself. Here's the answer I came up with.
ls, or rather the default completion routine, does it using the -o filenames functionality, which performs filename-specific processing (like adding a slash to directory names or suppressing trailing spaces).
To demonstrate:
$ foo () { COMPREPLY=("bar one" "bar two"); }
$ complete -o filenames -F foo words
$ words ░
Tab
$ words bar\ ░ # Ex.1: notice the space is completed escaped
TabTab
bar one bar two # Ex.2: notice the spaces are displayed unescaped
$ words bar\ ░
Now, there are two points I should make clear right away to avoid any confusion:
First, your completion function cannot be implemented simply by setting COMPREPLY to an array of your word list! The example above is hard-coded to return candidates starting with b-a-r, just to show what happens when TabTab is pressed. (Don't worry, we'll get to a more general implementation shortly!)
Second, that format for COMPREPLY only works because -o filenames is specified. For an explanation of how to set COMPREPLY when not using -o filenames, look no further than the next sub-heading.
Also note: The downside of using -o filenames is that if there's a directory lying about with the same name as the matching word, the completed word will automatically get a slash attached to the end (e.g. bar\ one/).
How to handle spaces with COMPREPLY (without using -o filenames)
Long story short, the spaces need to be escaped, and this is what @Eugene's accepted answer is doing.
To contrast with the -o filenames demo above:
$ foo () { COMPREPLY=("bar\ one" "bar\ two"); } # Note the backslashes I've added
$ complete -F foo words # Note the lack of -o filenames
$ words ░
Tab
$ words bar\ ░ # Same with -o filenames, space is completed escaped
TabTab
bar\ one bar\ two # Unlike -o filenames, notice the spaces are displayed escaped
$ words bar\ ░
How do I actually implement a completion function?
Implementing a completion function involves:
- Representing your word list.
- Filtering your word list to just candidates for the current word.
- Setting COMPREPLY correctly.
I won't presume to know all the complex requirements there can be for the first two steps, so the following is only a very basic implementation. I'm providing an explanation for each part so one can mix and match to fit their own requirements.
foo() {
# Get the currently completing word
local CWORD=${COMP_WORDS[COMP_CWORD]}
# This is our word list (in a bash array for convenience)
local WORD_LIST=(foo 'bar one' 'bar two')
# Commands below depend on this IFS
local IFS=$'\n'
# Filter our candidates
CANDIDATES=($(compgen -W "${WORD_LIST[*]}" -- "$CWORD"))
# Correctly set our candidates to COMPREPLY
if [ ${#CANDIDATES[*]} -eq 0 ]; then
COMPREPLY=()
else
COMPREPLY=($(printf '%q\n' "${CANDIDATES[@]}"))
fi
}
complete -F foo words
In this example, we use compgen to filter our words. (It's provided by bash for this exact purpose.) One could use any solution they like, but I'd advise against using grep-like programs simply because of the complexities of escaping regex.
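As one illustration of "any solution they like", here is a minimal sketch of a prefix filter written in plain bash, so no regex escaping is involved. The function name filter_by_prefix is hypothetical and not part of bash or of the answer's code; it only shows the idea of comparing the typed prefix literally.

```shell
# Hypothetical helper: print the words that start with the given prefix.
# The prefix is quoted inside [[ ]], so it is compared literally,
# not as a glob pattern or a regex.
filter_by_prefix() {
    local prefix=$1; shift
    local word
    for word in "$@"; do
        [[ $word == "$prefix"* ]] && printf '%s\n' "$word"
    done
    return 0
}

# Prints the two candidates that start with "bar", one per line.
filter_by_prefix 'bar' foo 'bar one' 'bar two'
```

A prefix such as b.r matches nothing here, whereas an unescaped grep "^b.r" would match both bar entries.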
compgen takes the word list with the -W argument and returns the filtered result with one word per line. Since our words can contain spaces, we set IFS=$'\n' beforehand so only newlines are counted as element delimiters when putting the result into our array with the CANDIDATES=(...) syntax.
Another point of note is what we're passing for the -W argument. This argument takes an IFS-delimited word list. Since our words contain spaces, this too requires IFS=$'\n' to be set so our words aren't broken up.
Incidentally, "${WORD_LIST[*]}" expands to a string with elements delimited by the first character of IFS, which is exactly what we need.
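The joining behaviour can be checked in isolation. This is a small sketch; the helper name join_with_newlines is made up for the illustration and is not part of the answer's code.

```shell
# Hypothetical helper: join all arguments with newlines.
join_with_newlines() {
    local IFS=$'\n'
    local items=("$@")
    # "${items[*]}" joins the elements with the first character of IFS
    printf '%s' "${items[*]}"
}

joined=$(join_with_newlines foo 'bar one' 'bar two')
# $joined now holds three lines, and "bar one" is still a single line
printf '%s\n' "$joined"
```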
In the example above I chose to define WORD_LIST literally in code.
One could also initialize the array from an external source such as a file. Just make sure to set IFS=$'\n' beforehand if words are going to be line-delimited such as in the original question:
local IFS=$'\n'
local WORD_LIST=($(cat /path/to/words.dat))
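An alternative sketch for loading a line-delimited file is the mapfile builtin. This assumes bash 4.0 or newer (mapfile does not exist in bash 3.x), and the helper name load_words and the temporary file are only for illustration. Unlike an unquoted $(cat ...), mapfile does not depend on IFS and does not glob-expand words containing characters such as *.

```shell
# Hypothetical loader; assumes bash >= 4.0 for the mapfile builtin.
load_words() {
    # -t strips the trailing newline from each line read
    mapfile -t WORD_LIST < "$1"
}

# Demo with a temporary file standing in for words.dat
tmp_words=$(mktemp)
printf '%s\n' 'foo' 'bar one' 'bar two' > "$tmp_words"
load_words "$tmp_words"
rm -f "$tmp_words"

printf 'loaded %d words\n' "${#WORD_LIST[@]}"
```

Inside a completion function one would likely declare WORD_LIST with local first, so the array does not leak into the interactive shell.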
Finally, we set COMPREPLY, making sure to escape the likes of spaces. Escaping is quite complicated, but thankfully printf's %q format performs all the escaping we need, and that's what we use to expand CANDIDATES. (Note we're telling printf to put \n after each element because that's what we've set IFS to.)
Those observant may spot that this form of setting COMPREPLY only applies if -o filenames is not used. No escaping is necessary if it is, and COMPREPLY may be set to the same contents as CANDIDATES with COMPREPLY=("${CANDIDATES[@]}").
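For comparison, a minimal sketch of what the -o filenames variant might look like. The function name foo_filenames is hypothetical, and the word list is the same hard-coded one used above; the candidates are stored unescaped because bash does the escaping when it inserts the word.

```shell
# Hypothetical -o filenames variant of the completion function.
foo_filenames() {
    local cword=${COMP_WORDS[COMP_CWORD]}
    local word_list=(foo 'bar one' 'bar two')
    local IFS=$'\n'
    # Store the filtered candidates as-is, without %q escaping.
    COMPREPLY=($(compgen -W "${word_list[*]}" -- "$cword"))
}
complete -o filenames -F foo_filenames words
```

The caveat about directories with the same name as a candidate still applies to this variant.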
Extra care should be taken when expansions may be performed on empty arrays as this can lead to unexpected results. The example above handles this by branching when the length of CANDIDATES is zero.
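To see why the branch matters: when printf is given a %q format and no arguments at all, bash still prints one quoted empty string, which would end up as a bogus candidate. The sketch below wraps the guard in a helper; the name escape_candidates is made up for this illustration.

```shell
# Hypothetical helper: escape each argument with %q, one per line,
# and print nothing at all when there are no arguments.
escape_candidates() {
    if [ "$#" -eq 0 ]; then
        return 0
    fi
    printf '%q\n' "$@"
}

unguarded=$(printf '%q\n')     # printf with no arguments still emits a quoted empty string
guarded=$(escape_candidates)   # the guarded helper emits nothing
printf 'unguarded=[%s] guarded=[%s]\n' "$unguarded" "$guarded"
```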
_foo ()
{
words="bar one"$'\n'"bar two"
COMPREPLY=()
cur=${COMP_WORDS[COMP_CWORD]}
prev=${COMP_WORDS[COMP_CWORD-1]}
cur=${cur//\./\\\.}
local IFS=$'\n'
COMPREPLY=( $( grep -i "^$cur" <( echo "$words" ) | sed -e 's/ /\\ /g' ) )
return 0
}
complete -o bashdefault -o default -o nospace -F _foo words
Pipe _find_words through sed and have it enclose each line in quotation marks. And when typing a command line, make sure to put either " or ' before a word to be tab-completed, otherwise this method will not work.
_find_words() { cat words.dat; }
_words_complete()
{
COMPREPLY=()
cur="${COMP_WORDS[COMP_CWORD]}"
local IFS=$'\n'
COMPREPLY=( $( compgen -W "$( _find_words | sed 's/^/\x27/; s/$/\x27/' )" \
-- "$cur" ) )
}
complete -F _words_complete words
Command line:
$ words "ba░
tab
$ words "bar ░
tabtab
bar one bar two
$ words "bar o░
tab
$ words "bar one" ░
I solved this by creating my own function, compgen2, which handles the extra processing when the current word doesn't begin with a quote character. Otherwise it works similarly to compgen -W.
compgen2() {
local IFS=$'\n'
local a=($(compgen -W "$1" -- "$2"))
local i=""
if [ "${2:0:1}" = "\"" -o "${2:0:1}" = "'" ]; then
for i in "${a[@]}"; do
echo "$i"
done
else
for i in "${a[@]}"; do
printf "%q\n" "$i"
done
fi
}
_foo() {
local cur=${COMP_WORDS[COMP_CWORD]}
local prev=${COMP_WORDS[COMP_CWORD-1]}
local words=$(cat words.dat)
local IFS=$'\n'
COMPREPLY=($(compgen2 "$words" "$cur"))
}
echo -en "foo\nbar one\nbar two\n" > words.dat
complete -F _foo foo
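Completion functions like the ones in this thread can also be exercised from a script, without pressing Tab, by filling in the variables bash would normally provide. The following is only a rough sketch under stated assumptions: simulate_completion and _demo_complete are hypothetical names, only COMP_WORDS and COMP_CWORD are set (not COMP_LINE or COMP_POINT), and real readline may split the command line into words differently, for example around quotes.

```shell
# Hypothetical stand-in for a completion function under test:
# filters a fixed word list and escapes each match with %q.
_demo_complete() {
    local cur=${COMP_WORDS[COMP_CWORD]}
    local IFS=$'\n'
    local candidates=($(compgen -W $'foo\nbar one\nbar two' -- "$cur"))
    local c
    COMPREPLY=()
    for c in "${candidates[@]}"; do
        COMPREPLY+=("$(printf '%q' "$c")")
    done
}

# Hypothetical harness: call a completion function as if Tab had been
# pressed at the end of the last word given, then print COMPREPLY.
simulate_completion() {
    local func=$1; shift
    COMP_WORDS=("$@")
    COMP_CWORD=$(( ${#COMP_WORDS[@]} - 1 ))
    COMPREPLY=()
    "$func"
    if [ "${#COMPREPLY[@]}" -gt 0 ]; then
        printf '%s\n' "${COMPREPLY[@]}"
    fi
}

# Prints the escaped candidates for "words b", one per line.
simulate_completion _demo_complete words b
```

This kind of harness is convenient for checking border cases, but it is no substitute for trying the completion interactively.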