TL;DR:
Why isn't invoking `./myscript foo*`, when `myscript` contains `var=$1`, the same as invoking `./myscript` with `var=foo*` hardcoded?
Longer form
I've come across a weird issue in a bash script I'm writing. I am sure there is a simple explanation, but I can't figure it out.
I am trying to pass a command line argument to be assigned as a variable in the script.
I want the script to allow 2 command line arguments as follows:
$ bash my_bash_script.bash args1 args2
In my script, I assigned variables like this:
ARGS1=$1
ARGS2=$2
Args 1 is a string descriptor to add to the output file.
Args 2 is a group of directories, "dir1, dir2, dir3", which I am passing as `dir*`.
When I assign `dir*` to `ARGS2` in the script it works fine, but when I pass `dir*` as the second command line argument, only `dir1` is included in the wildcard expansion of `dir*`.
I assume this has something to do with how the shell handles wildcards (even when passed as args), but I don't really understand it.
Any help would be appreciated.
Environment / Usage
I have a group of directories:
dir_1_y_map, dir_1_x_map, dir_2_y_map, dir_2_x_map, ... dir_10_y_map, dir_10_x_map...
Inside these directories I am trying to access a file with extension `.status` via `*.status`, and one ending in `.report.txt` via `*report.txt`.
I want to pass `dir_*_map` as the second argument to the script, store it in the variable `ARGS2`, and then use it to search within each of the directories for the `.status` and `.report` files.
The issue is that passing `dir_*_map` from the command line doesn't give the list of directories, but rather just the first item in the list. If I assign `ARGS2=dir_*_map` within the script, it works as I intend.
Workaround: Quoting
It turns out that passing the second argument in quotes, as `"dir_*_map"`, allowed the wildcard expansion to work as intended:
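```bash
#!/usr/bin/env bash
ARGS1=$1
ARGS2=$2    # holds the literal pattern, e.g. dir_*_map
touch "$ARGS1.extension"
# $ARGS2 is deliberately left unquoted so the pattern stored in it
# is glob-expanded here, inside the script
for i in /$ARGS2/*.status
do
    grep -e "string" "$i" >> "$ARGS1.extension"
done
```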
Here is an example invocation of the script:
sh ~/path/to/script descriptor "dir_*_map"
I don't fully understand when/why some arguments must be passed in quotes, but I assume it has to do with the wildcard expansion in the for loop.
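A quick way to see what the script actually receives is to swap it for a hypothetical `show_args` function that prints its argument count and then each argument in angle brackets. This is only a diagnostic sketch; the directory names are the ones from the question.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the real script: report exactly what it was given.
show_args() {
  printf '%d args\n' "$#"   # how many positional parameters arrived
  printf '<%s>\n' "$@"      # one line per argument; brackets expose whitespace
}
```

In a directory where `dir_*_map` matches only `dir_1_x_map` and `dir_2_y_map`, `show_args descriptor dir_*_map` prints `3 args` followed by `<descriptor>`, `<dir_1_x_map>` and `<dir_2_y_map>`, because the calling shell expanded the glob before the function ran. `show_args descriptor "dir_*_map"` prints `2 args`, `<descriptor>` and `<dir_*_map>`: the quotes deliver the pattern itself.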
Addressing the "why"
Assignments, as in `var=foo*`, don't expand globs: when you run `var=foo*`, the literal string `foo*` is put into the variable `var`, not the list of files matching `foo*`.
By contrast, unquoted use of `foo*` on a command line expands the glob, replacing it with a list of individual names, each of which is passed as a separate argument.
Thus, running `./yourscript foo*` doesn't pass `foo*` as `$1` unless no files match that glob expression; instead, it becomes something like `./yourscript foo01 foo02 foo03`, with each name in a different position on the command line, so `$1` holds only `foo01`.
The reason running `./yourscript "foo*"` works as a workaround is that the unquoted expansion inside the script lets the glob be expanded at that later time. However, this is bad practice: an unquoted expansion undergoes word splitting as well as glob expansion (so relying on this behavior removes your ability to pass filenames containing characters found in `IFS`, typically whitespace), and it also means that you can't pass literal filenames when they could also be interpreted as globs (if you have a file named `[1]` and a file named `1`, passing `[1]` would always be replaced with `1`).
Idiomatic Usage
The idiomatic way to build this would be to `shift` away the first argument and then iterate over the subsequent ones. If you have many `.status` files in a single directory, all of this can be made more efficient by using `find` to invoke `grep` with as many arguments as possible, rather than launching a separate `grep` for each directory or file. Both approaches expect the globs not to be quoted in the invoking shell, so usage is of the form `./yourscript descriptor dir_*_map`.
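The two approaches described above can be sketched as two hypothetical functions, `grep_status_files` (shift, then loop) and `grep_status_files_find` (one `find` call). Both assume, as in the question's script, that the first argument names the output file (`DESCRIPTOR.extension`) and that `"string"` is the pattern being searched for; they are illustrations under those assumptions, not the original scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: shift away the descriptor, then loop over directories.
# Usage: grep_status_files DESCRIPTOR DIR...
grep_status_files() {
  local out=$1.extension
  shift                          # "$@" now holds only the directories
  local dir
  for dir; do                    # same as: for dir in "$@"; do
    # Quote the variable, leave the glob unquoted so only *.status expands.
    # /dev/null makes grep prefix each match with its filename.
    grep -e "string" /dev/null "$dir"/*.status
  done > "$out"
}

# Hypothetical helper: let find batch as many files per grep call as fit.
# Usage: grep_status_files_find DESCRIPTOR DIR...
grep_status_files_find() {
  local out=$1.extension
  shift
  # -maxdepth 1 mirrors "$dir"/*.status (top level of each directory only);
  # -exec ... {} + passes many filenames to each grep invocation.
  find "$@" -maxdepth 1 -type f -name '*.status' \
    -exec grep -e "string" /dev/null {} + > "$out"
}
```

Either is called as `grep_status_files descriptor dir_*_map`, letting the caller's shell expand the glob. In the loop version, a directory with no `.status` files leaves `"$dir"/*.status` unexpanded, so `grep` reports a "No such file or directory" error for it; `shopt -s nullglob` avoids that. `-maxdepth` is supported by GNU and BSD `find` but is not part of POSIX.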
This is considerably better practice than passing globs to your script (which then has to expand them to find the actual files to use): it works correctly with filenames containing whitespace (which the other practice doesn't) and with files whose names are themselves glob expressions.
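The two practices can be contrasted with a pair of hypothetical functions that print one received name per line: `list_from_pattern` mimics the quoted-glob workaround by expanding its first argument unquoted, while `list_from_args` just iterates over whatever the caller's shell passed.

```bash
#!/usr/bin/env bash
# Anti-pattern (hypothetical): receive a pattern and expand it unquoted here.
# The expansion is word-split on IFS first, then each word is globbed.
list_from_pattern() {
  local name
  for name in $1; do            # deliberately unquoted
    printf '<%s>\n' "$name"
  done
}

# Idiomatic (hypothetical): the caller's shell already expanded any glob,
# so each argument is used exactly as received.
list_from_args() {
  local name
  for name; do                  # same as: for name in "$@"; do
    printf '<%s>\n' "$name"
  done
}
```

With files named `a b.status`, `[1]` and `1` in the current directory, `list_from_pattern 'a b.status'` prints `<a>` and `<b.status>` (word splitting), and `list_from_pattern '[1]'` prints `<1>` (the name is treated as a glob). `list_from_args 'a b.status'` and `list_from_args '[1]'` print the names unchanged.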
Some other points of note:
- When combining a variable with a glob, as in `"$dir"/*.status`, quote the expansion, then end the quotes before the glob expression starts.
- `for dir; do` is precisely equivalent to `for dir in "$@"; do`, which iterates over the arguments. Don't make the mistake of using `for dir in $*; do` or `for dir in $@; do` instead! These latter forms effectively join the elements of the list with the first character of `IFS` (which, by default, contains the space, the tab and the newline, in that order), split the resulting string on any `IFS` characters found within, then expand each resulting word as a glob.
- Passing `/dev/null` as an argument to `grep` is a safety measure: it ensures that you don't get different behavior between the single-argument and multi-argument cases (for example, `grep` by default prints filenames in its output only when passed multiple files), and it ensures that `grep` can't hang trying to read from stdin if it's passed no additional filenames at all (which `find` won't do here, but `xargs` can).
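The last two points can be checked with small hypothetical helpers: three loops that print the words they visit, and a `grep_with_names` wrapper that always includes `/dev/null` among its file arguments.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: each prints the words its loop visits, one per line.
loop_default()     { local x; for x; do printf '[%s]\n' "$x"; done; }        # same as "$@"
loop_quoted_at()   { local x; for x in "$@"; do printf '[%s]\n' "$x"; done; }
loop_unquoted_at() { local x; for x in $@; do printf '[%s]\n' "$x"; done; }  # anti-pattern

# Hypothetical helper: search files for a pattern, always prefixing matches
# with the filename. Because /dev/null is always a file argument, grep never
# falls back to reading stdin, even when no other files are given.
grep_with_names() {
  local pattern=$1
  shift
  grep -e "$pattern" /dev/null "$@"
}
```

In a directory containing a file named `xay`, `loop_default 'my dir' 'x*y'` and `loop_quoted_at 'my dir' 'x*y'` both print `[my dir]` and `[x*y]`, while `loop_unquoted_at 'my dir' 'x*y'` prints `[my]`, `[dir]` and `[xay]`. `grep_with_names hit notes.txt` prints matches as `notes.txt:...`, and `grep_with_names hit` with no files returns at once with no output.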