Expect: extract specific string from output

2019-02-15 17:27发布

问题:

I am navigating a Java-based CLI menu on a remote machine with expect inside a bash script and I am trying to extract something from the output without leaving the expect session.

Expect command in my script is:

expect -c "
spawn ssh user@host
expect \"#\"
send \"java cli menu command here\r\"
expect \"java cli prompt\"
send \"java menu command\"
"
###I want to extract a specific string from the above output###

Expect output is:

Id       Name
-------------------
abcd 12  John Smith

I want to extract abcd 12 from the above output into another expect variable for further use within the expect script. So that's the 3rd line, first field by using a double-space delimiter. The awk equivalent would be: awk -F ' ' 'NR==3 {$1}'

The big issue is that the environment through which I am navigating with Expect is, as I stated above, a Java CLI based menu so I can't just use awk or anything else that would be available from a bash shell.

Getting out from the Java menu, processing the output and then getting in again is not an option as the login process lasts for 15 seconds so I need to remain inside and extract what I need from the output using expect internal commands only.

回答1:

You can use regexp in expect itself directly with the use of -re flag. Thanks to Donal on pointing out the single quote and double quote issues. I have given solution using both ways.

I have created a file with the content as follows,

Id       Name
-------------------
abcd 12  John Smith

This is nothing but your java program's console output. I have tested this in my system with this. i.e. I just simulated your program's output with cat. You just replace the cat code with your program commands. Simple. :)

Double Quotes :

#!/bin/bash
expect -c "
spawn ssh user@domain
expect \"password\"
send \"mypassword\r\"
expect {\\\$} { puts matched_literal_dollar_sign}
send \"cat input_file\r\"; # Replace this code with your java program commands
expect -re {-\r\n(.*?)\s\s}
set output \$expect_out(1,string)
#puts \$expect_out(1,string)
puts \"Result : \$output\"
"

Single Quotes :

#!/bin/bash
expect -c '
spawn ssh user@domain
expect "password"
send "mypasswordhere\r"
expect "\\\$" { puts matched_literal_dollar_sign}
send "cat input_file\r"; # Replace this code with your java program commands
expect -re {-\r\n(.*?)\s\s}
set output $expect_out(1,string)
#puts $expect_out(1,string)
puts "Result : $output"
'

As you can see, I have used {-\r\n(.*?)\s\s}. Here the braces prevent any variable substitutions. In your output, we have a 2nd line with full of hyphens. Then a newline. Then your 3rd line content. Let's decode the regex used.

-\r\n is to match one literal hyphen and a new line together. This will match the last hyphen in the 2nd line and the newline which in turn make it to 3rd line now. So, .*? will match the required output (i.e. abcd 12) till it encounters double space which is matched by \s\s.

You might be wondering why I need parenthesis which is used to get the sub-match patterns.

In general, expect will save the expect's whole match string in expect_out(0,string) and buffer all the matched/unmatched input to expect_out(buffer). Each sub match will be saved in subsequent numbering of string such as expect_out(1,string), expect_out(2,string) and so on.

As Donal pointed out, it is better to use single quote's approach since it looks less messy. :)

It is not required to escape the \r with the backslash in case of double quotes.

Update :

I have changed the regexp from -\r\n(\w+\s+\w+)\s\s to -\r\n(.*?)\s\s.

With this way - your requirement - such as match any number of letters and single spaces until you encounter first occurrence of double spaces in the output

Now, let's come to your question. You have mentioned that you have tried -\r\n(\w+)\s\s. But, there is a problem here with \w+. Remember \w+ will not match space character. Your output has some spaces in it till double spaces.

The use of regexp will matter based on your requirements on the input string which is going to get matched. You can customize the regular expressions based on your needs.

Update version 2 :

What is the significance of .*?. If you ask separately, I am going to repeat what you commented. In regular expressions, * is a greedy operator and ? is our life saver. Let us consider the string as

Stackoverflow is already overflowing with number of users.

Now, see the effect of the regular expression .*flow as below.

* matches any number of characters. More precisely, it matches the longest string possible while still allowing the pattern itself to match. So, due to this, .* in the pattern matched the characters Stackoverflow is already over and flow in pattern matched the text flow in the string.

Now, in order to prevent the .* to match only up to the first occurrence of the string flow, we are adding the ? to it. It will help the pattern to behave as non-greedy manner.

Now, again coming back to your question. If we have used .*\s\s, then it will match the whole line since it is trying to match as much as possible. This is common behavior of regular expressions.

Update version 3:

Have your code in the following way.

x=$(expect -c "
spawn ssh user@host
expect \"password\"
send \"password\r\"
expect {\\\$} { puts matched_literal_dollar_sign}
send \"cat input\r\"
expect -re {-\r\n(.*?)\s\s}
if {![info exists expect_out(1,string)]} {
        puts \"Match did not happen :(\"
        exit 1
}
set output \$expect_out(1,string)
#puts \$expect_out(1,string)
puts \"Result : \$output\"
")
y=$?

# $x now contains the output from the 'expect' command, and $y contains the
# exit status
echo $x
echo $y;

If the flow happened properly, then exit code will have value as 0. Else, it will have 1. With this way, you can check the return value in bash script.

Have a look at here to know about the info exists command.



标签: tcl expect