I'm writing a Unix shell script to execute an Impala query, and I need to capture the query's output log. For example, I tried the following:
output_log = echo $(impala-shell -i $node -q "select name from impaladb.impalatbl" -o output_file)
Output:
+--------+
| name |
+--------+
| tom |
| mike |
+--------+
Fetched 2 row(s) in 0.83s
Here I get the two names in both output_file and output_log. But what I need in the output_log variable is the "Fetched 2 row(s) in 0.83s" line. How can I get it?
I'm not familiar with Impala, so I can't say whether what you are doing is the most efficient way to query it. However, you're trying to isolate a specific line of output, and that I can answer.
There are many ways to do this. Maybe the most straightforward is grep:
output_log=$(impala-shell -i "$node" -q "select name from impaladb.impalatbl" -o output_file 2>&1 | grep Fetched)
Try this:
Solution 1:
output_log=$(nohup impala-shell -k --ssl -i $node --verbose --delimited --query="select count(*) as cnt from impaladb.impalatbl" 2>/dev/null)
echo $output_log
Solution 2:
output_log=$(impala-shell -k --ssl -i "$node" --verbose --delimited --query="select count(*) as cnt from impaladb.impalatbl" -o output_file && head output_file)
echo "$output_log"
I solved the problem.
The way it works is that impala-shell writes the query results to one stream (stdout) and the other information about the query, such as the "Fetched ..." line, to a different stream (stderr).
Hence all you have to do is:
impala-shell -i $node -q "select name from impaladb.impalatbl" 2>output_file
The 2> sends the stderr output, which contains the "Fetched 1 row in 10 seconds" line, to the output file. You can then grep it or do whatever you want with it.
If you want to store that output in the output_log variable instead, use:
output_log=$(impala-shell -i $node -q "select name from impaladb.impalatbl" 2>&1)
Here 2>&1 redirects stderr to stdout, and stdout is what gets assigned to the variable.
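To see the two streams in action without a cluster, here is a minimal sketch. fake_impala_shell is a hypothetical stand-in I made up for illustration (it is not part of Impala); it only mimics the behaviour described above, with result rows on stdout and the status line on stderr. The variable name rows_only is also just for the example.

```shell
#!/bin/sh
# Hypothetical stand-in for impala-shell so the example runs anywhere:
# result rows go to stdout, the status line goes to stderr.
fake_impala_shell() {
    printf '%s\n' '+------+' '| name |' '+------+' '| tom  |' '| mike |' '+------+'
    echo 'Fetched 2 row(s) in 0.83s' >&2
}

# Discarding stderr leaves only the result table in the variable.
rows_only=$(fake_impala_shell 2>/dev/null)

# Merging stderr into stdout captures both streams;
# grep then keeps just the status line.
output_log=$(fake_impala_shell 2>&1 | grep 'Fetched')

echo "$output_log"
```

With the real command, the same pattern would be the impala-shell call from above followed by 2>&1 | grep 'Fetched', assuming your version prints that line on stderr as described here.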
For more information, search for "2>&1" or read the redirection section of your shell's manual.
Hope it helps you!!
Some additional observations
2>&1 redirects stderr to stdout, but stdout also carries the query output, so when you store it in a variable, the variable gets the query output as well as the extra information such as "Fetched 1 row in 3 seconds".
By contrast, with 2>a.txt only the stderr output is redirected, so a.txt contains only information like "Starting Impala ... Fetched 1 row in 2 seconds". You can then grep that file and put the result in a variable.
I just wanted to highlight this slight difference I observed between storing in a file and storing in a variable.
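If you want only the stderr part in the variable and the query rows in a file, without going through a temporary log file, the order of the redirections can do it. This is a sketch under the same assumption as before: fake_impala_shell is a hypothetical stand-in that mimics the stdout/stderr split, and out_file is a name chosen just for this example.

```shell
#!/bin/sh
# Hypothetical stand-in for impala-shell: rows on stdout, status on stderr.
fake_impala_shell() {
    printf '%s\n' '+------+' '| name |' '+------+' '| tom  |' '| mike |' '+------+'
    echo 'Fetched 2 row(s) in 0.83s' >&2
}

out_file=$(mktemp)
trap 'rm -f "$out_file"' EXIT   # clean up the temporary file on exit

# Redirections apply left to right:
#   2>&1         points stderr at the command substitution's capture
#   >"$out_file" then moves stdout (the rows) into the file
output_log=$(fake_impala_shell 2>&1 >"$out_file")

echo "log:  $output_log"
echo "rows:"
cat "$out_file"
```

Writing the redirections the other way round (>"$out_file" 2>&1) would send both streams to the file and leave the variable empty, so the order matters here.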