I have written a simple script as an advanced tool for my awk/sed requirements. In the script I compare two files on the basis of values from one column of the query file and then extract whole entries from the master file. The script allows you to enter the values for columns and delimiters for each file.
The problem is that the 'delimiter' options are not recognized by the script when given from the command line.
Here is my code (partial):
##- - - - - - - -- - - - - - Arguments - - - - - - - - - - - - - -##
parser = argparse.ArgumentParser()
## Command line options
parser.add_argument("-m", "--master", dest="master", help="master file")
parser.add_argument("-q", "--query", dest="query", help="queries to be extracted")
parser.add_argument("-d", "--delimiter", dest="delimiter", default='\t', help="delimiter in master")
parser.add_argument("-p", "--position", dest="position", default='1', help="position/column of value in master")
parser.add_argument("-d2", "--delimiter2", dest="delimiter2", default='\t', help="delimiter in query")
parser.add_argument("-p2", "--position2", dest="position2", default='1', help="position/column of value in query")
args = parser.parse_args()

def Extractor(master, query):
    out_file = ('%s_matched_%s' % (query,master))
    fh_out = open(out_file, 'w')
    query_set = () ## To unique query set
    for i in query:
        key = i.split('args.delimiter2')[int(args.position2)] ## Key is the value on which matching will be done
        query_set.add(key)
So, as you can see, I take the 'query file' delimiter option from the command line and use it in the script via argparse, but that does not work. It only works if I explicitly mention the delimiter in the script like:
key = i.split('\t')[args.position2] ## Key is the value on which matching will be done
The command line option I give is:
$ py3 ExtractHeaders_v01.py -m ABC.csv -q XYZ.list -d2 \t -d , -p 1 -p2 0
where
- ABC.csv is the master file from which to extract entries.
  - The second column will be used for matching (-p 1)
  - Its delimiter is comma (-d ,)
- XYZ.list is the query file.
  - The first column will be used for matching (-p2 0)
  - Its delimiter is tab (-d2 \t)
Please help me understand why the delimiters are not used by the script when given from the command line.
You can also pass the Tab character in a *nix shell (bash for example) by pressing Ctrl+V followed by Tab, enclosed in quotes (single or double), i.e. type "Ctrl+V Tab".

Your shell is interpreting the \t in your command line and what's getting passed to Python is, most likely, a single t. Try \\t or '\t' to get the literal two-character escape sequence into the argv. Then you'll need to unescape this string in Python:
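A minimal sketch of that unescaping step, assuming the delimiter reaches Python as the two characters backslash and t (for example from -d2 '\t'). The helper name unescape is hypothetical and not part of the original script. It relies on Python's built-in unicode_escape codec, and wiring it in through argparse's type= parameter is just one possible way to do it. The argument list and sample line are made up for illustration.

```python
import argparse


def unescape(text):
    """Hypothetical helper: turn backslash escapes such as the two
    characters '\\' + 't' into the real character (here, a tab).
    Assumes an ASCII delimiter; non-ASCII input may not round-trip."""
    return bytes(text, "utf-8").decode("unicode_escape")


parser = argparse.ArgumentParser()
# type=unescape converts the raw argv string before it is stored
parser.add_argument("-d2", "--delimiter2", dest="delimiter2",
                    type=unescape, default="\t", help="delimiter in query")
# type=int removes the need for int(...) at the point of use
parser.add_argument("-p2", "--position2", dest="position2",
                    type=int, default=1, help="position/column of value in query")

# Simulates the argv the shell produces for: -d2 '\t' -p2 0
args = parser.parse_args(["-d2", "\\t", "-p2", "0"])

# Made-up query line with a real tab between the two fields
line = "id1\tvalue1"
key = line.split(args.delimiter2)[args.position2]
print(repr(args.delimiter2), key)
```

Note that the delimiter variable is passed to split() directly, without quotes around its name. A plain delimiter such as a comma passes through unescape unchanged, so -d , keeps working as before.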