I am trying to use subprocess.Popen to run commands on my machine.
This is what I have so far:
import subprocess
cmdvec = ['/usr/bin/hdfs', 'dfs', '-text', '/data/ds_abc/clickstream/{d_20151221-2300}/*', '|', 'wc', '-l']
subproc = subprocess.Popen(cmdvec, stdout=subprocess.PIPE, stdin=None, stderr=subprocess.STDOUT)
If I run the command in my terminal, I get this output:
15/12/21 16:09:31 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
15/12/21 16:09:31 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 9cd4009fb896ac12418449e4678e16eaaa3d5e0a]
15/12/21 16:09:31 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
15305
The number 15305 is the value I want.
When I run the command from Python by splitting it into a list, I iterate over the output like this to try to get the lines:
for i in subproc.stdout:
    print(i)
However, this gives me the output I would expect if only the following command had been run, because all the data from the files is displayed:
/usr/bin/hdfs dfs -text /data/ds_abc/clickstream/{d_20151221-2300}/*
It doesn't seem like the pipe | has been used to count the number of lines in all the files.
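For reference, here is a minimal sketch of how I understand a pipeline could be built without a shell. With a list and the default shell=False, the '|' element is passed to hdfs as a literal argument instead of being treated as a pipe, so the sketch starts two processes and connects the first one's stdout to the stdin of wc -l. The helper name count_lines is hypothetical, and stderr is deliberately left unredirected here so that the INFO log lines are not included in the count. This is an assumption about one possible approach, not a confirmed fix.

```python
import subprocess


def count_lines(producer_cmd):
    """Run producer_cmd and pipe its stdout into `wc -l`, without a shell.

    count_lines is a hypothetical helper name used only for illustration.
    """
    # First process: writes its output into a pipe.
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    # Second process: reads from that pipe and counts the lines.
    counter = subprocess.Popen(
        ['wc', '-l'], stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close our copy of the pipe so the producer sees SIGPIPE if wc exits early.
    producer.stdout.close()
    out, _ = counter.communicate()
    producer.wait()
    # wc may pad its output with spaces on some platforms, so strip first.
    return int(out.decode().strip())
```

With the command from my question this would be called as count_lines(['/usr/bin/hdfs', 'dfs', '-text', '/data/ds_abc/clickstream/{d_20151221-2300}/*']).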