I am trying to output a string that contains everything between two words of a string:
input:
"Here is a String"
output:
"is a"
Using:
sed -n '/Here/,/String/p'
includes the endpoints, but I don't want to include them.
sed -e 's/Here\(.*\)String/\1/'
GNU grep with the -P option can also use positive and negative lookahead and lookbehind. For your case, the command would be:
echo "Here is a string" | grep -o -P '(?<=Here).*(?=string)'
You can strip strings in Bash alone:
$ foo="Here is a String"
$ foo=${foo##*Here }
$ echo "$foo"
is a String
$ foo=${foo%% String*}
$ echo "$foo"
is a
$
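The expansions above come in a shortest-match and a longest-match form, which matters when a marker occurs more than once. The following is a minimal sketch, assuming a hypothetical input with repeated markers (the variable names are illustrative only): # and % remove the shortest match, ## and %% the longest.

```shell
s="Here one String Here two String"

# '##' removes the longest prefix matching '*Here ' (up to the last "Here ")
after_last=${s##*Here }
# '#' removes the shortest such prefix (up to the first "Here ")
after_first=${s#*Here }

# '%%' removes the longest suffix matching ' String*' (from the first " String")
inner_last=${after_last%% String*}
inner_first=${after_first%% String*}
# '%' removes the shortest suffix (from the last " String")
outer=${after_first% String*}

printf '[%s]\n[%s]\n[%s]\n' "$inner_last" "$inner_first" "$outer"
```

With this input the three results are "two", "one", and "one String Here two", so the choice of operator decides which pair of markers is used.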
And if you have a GNU grep that includes PCRE, you can use a zero-width assertion:
$ echo "Here is a String" | grep -Po '(?<=(Here )).*(?= String)'
is a
The accepted answer does not remove text that could be before Here or after String. This will:
sed -e 's/.*Here\(.*\)String.*/\1/'
The main difference is the addition of .* immediately before Here and after String.
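To see the difference, here is a minimal sketch, assuming a hypothetical input line that has extra text on both sides of the markers:

```shell
line="prefix Here is a String suffix"

# Without the surrounding .*, only the matched span is replaced,
# so "prefix " and " suffix" survive.
partial=$(printf '%s\n' "$line" | sed -e 's/Here\(.*\)String/\1/')

# With .* on both sides, the whole line is replaced by the captured group.
full=$(printf '%s\n' "$line" | sed -e 's/.*Here\(.*\)String.*/\1/')

printf '[%s]\n[%s]\n' "$partial" "$full"
```

Both results still contain the spaces that sat next to the markers, since the pattern only names the words themselves.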
Using GNU awk:
$ echo "Here is a string" | awk -v FS="(Here|string)" '{print $2}'
is a
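With the bare words as the field separator, $2 keeps the spaces around the extracted text. A small sketch, assuming it is acceptable to fold those spaces into the separators (a multi-character FS is treated as a regular expression by POSIX awk as well, so this should not depend on GNU awk):

```shell
# Bare words as separators: the field keeps its surrounding spaces.
with_spaces=$(echo "Here is a string" | awk -F'Here|string' '{print $2}')

# Spaces folded into the separators: the field is just the inner text.
trimmed=$(echo "Here is a string" | awk -F'Here | string' '{print $2}')

printf '[%s]\n[%s]\n' "$with_spaces" "$trimmed"
```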
grep with the -P (perl-regexp) option supports \K, which discards the characters matched so far. In our case, the previously matched string was Here, so it is dropped from the final output.
$ echo "Here is a string" | grep -oP 'Here\K.*(?=string)'
is a
$ echo "Here is a string" | grep -oP 'Here\K(?:(?!string).)*'
is a
If you want the output to be is a without the surrounding spaces, then you could try the below:
$ echo "Here is a string" | grep -oP 'Here\s*\K.*(?=\s+string)'
is a
$ echo "Here is a string" | grep -oP 'Here\s*\K(?:(?!\s+string).)*'
is a
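The variants that use a greedy .* run to the last occurrence of the end word on the line, while the (?:(?!...).)* variants stop at the first. A hedged sketch, assuming GNU grep built with PCRE support and a hypothetical input that repeats the end word, showing that a lazy .*? is another way to stop at the first occurrence:

```shell
line="Here is a string and another string"

# Greedy: .* backtracks from the end of the line, so the last " string" wins.
greedy=$(printf '%s\n' "$line" | grep -oP 'Here\s*\K.*(?=\s+string)')

# Lazy: .*? grows one character at a time, so the first " string" wins.
lazy=$(printf '%s\n' "$line" | grep -oP 'Here\s*\K.*?(?=\s+string)')

printf '[%s]\n[%s]\n' "$greedy" "$lazy"
```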
If you have a long file with many multi-line occurrences, it is useful to number the lines first:
cat -n file | sed -n '/Here/,/String/p'
This might work for you (GNU sed):
sed '/Here/!d;s//&\n/;s/.*\n//;:a;/String/bb;$!{n;ba};:b;s//\n&/;P;D' file
This presents each occurrence of text between two markers (in this instance Here and String) on a new line and preserves newlines within the text.
All the above solutions have deficiencies when the last search string is repeated elsewhere in the string. I found it best to write a bash function.
function str_str {
    local str
    str="${1#*${2}}"
    str="${str%%$3*}"
    echo -n "$str"
}
# test it ...
mystr="this is a string"
str_str "$mystr" "this " " string"
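One caveat with the function above: inside ${var#pattern}, an unquoted expansion is treated as a glob pattern, so markers containing characters such as * or ? may not be matched literally. A minimal sketch, assuming a hypothetical helper name between_literal, that quotes the markers so they are compared as plain text:

```shell
between_literal() {
    # $1 = text, $2 = start marker, $3 = end marker
    local str
    str=${1#*"$2"}       # drop everything up to the first start marker
    str=${str%%"$3"*}    # drop everything from the first end marker onward
    printf '%s' "$str"
}

between_literal 'a*b*c' '*' '*'
```

Here the quoted "*" is a literal asterisk, so the call prints b; with the markers unquoted, the same call would treat * as a wildcard.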
You can use \1 (refer to http://www.grymoire.com/Unix/Sed.html#uh-4):
echo "Hello is a String" | sed 's/Hello\(.*\)String/\1/g'
The content inside the escaped parentheses is stored as \1.
Problem. My stored Claws Mail messages are wrapped as follows, and I am trying to extract the Subject lines:
Subject: [SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular
link in major cell growth pathway: Findings point to new potential
therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is
Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as
a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway
identified [Lysosomal amino acid transporter SLC38A9 signals arginine
sufficiency to mTORC1]]
Message-ID: <20171019190902.18741771@VictoriasJourney.com>
Per A2 in this thread, How to use sed/grep to extract text between two words?, the first expression below "works" as long as the matched text does not contain a newline:
grep -o -P '(?<=Subject: ).*(?=molecular)' corpus/01
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key
However, despite trying numerous variants (.+?; /s; ...), I could not get these to work:
grep -o -P '(?<=Subject: ).*(?=link)' corpus/01
grep -o -P '(?<=Subject: ).*(?=therapeutic)' corpus/01
etc.
Solution 1.
Per Extract text between two strings on different lines:
sed -n '/Subject: /{:a;N;/Message-ID:/!ba; s/\n/ /g; s/\s\s*/ /g; s/.*Subject: \|Message-ID:.*//g;p}' corpus/01
which gives
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
Solution 2.
Per How can I replace a newline (\n) using sed?,
sed ':a;N;$!ba;s/\n/ /g' corpus/01
will replace newlines with a space.
Chaining that with A2 in How to use sed/grep to extract text between two words?, we get:
sed ':a;N;$!ba;s/\n/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'
which gives
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
This variant also collapses runs of whitespace into a single space:
sed ':a;N;$!ba;s/\n/ /g; s/\s\s*/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'
giving
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
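As an alternative to joining the whole file first, the same idea can be sketched in awk. This is only a sketch: it assumes, as in the sample above, that the Subject header is followed directly by the Message-ID line, and the function name extract_subject and the awk variable names are hypothetical.

```shell
extract_subject() {
    # $1 = message file
    awk '
        # End marker: print the collected subject, if any, and reset.
        /^Message-ID:/ { if (in_subj) print subj; in_subj = 0; next }
        # Start marker: keep the text after "Subject: " (9 characters).
        /^Subject: /   { subj = substr($0, 10); in_subj = 1; next }
        # Wrapped line: strip leading blanks and append with one space.
        in_subj        { sub(/^[ \t]+/, ""); subj = subj " " $0 }
    ' "$1"
}
```

It would be called as extract_subject corpus/01 and prints one joined Subject line per message found in the file.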
To understand the sed command, we have to build it step by step.
Here is your original text:
user@linux:~$ echo "Here is a String"
Here is a String
user@linux:~$
Let's try to remove Here with the s (substitution) command in sed:
user@linux:~$ echo "Here is a String" | sed 's/Here //'
is a String
user@linux:~$
At this point, I believe you would be able to remove String as well:
user@linux:~$ echo "Here is a String" | sed 's/String//'
Here is a
user@linux:~$
But this is not your desired output.
To combine two sed commands, use the -e option:
user@linux:~$ echo "Here is a String" | sed -e 's/Here //' -e 's/String//'
is a
user@linux:~$
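Note that the combined command above leaves a trailing space, because only String is removed and not the space in front of it. A small sketch, assuming the markers sit at the very start and end of the line:

```shell
# Remove "Here " at the start and " String" at the end, including the
# separating spaces, so only the inner text remains.
result=$(echo "Here is a String" | sed -e 's/^Here //' -e 's/ String$//')
printf '[%s]\n' "$result"
```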
Hope this helps