For example, given:
USCAGoleta9311734.5021-120.1287855805
I want to extract just:
US
Probably the most efficient method, if you're using the bash shell (and you appear to be, based on your comments), is to use the sub-string variant of parameter expansion:
pax> long="USCAGol.blah.blah.blah"
pax> short="${long:0:2}" ; echo "${short}"
US
This will set short to be the first two characters of long. If long is shorter than two characters, short will be identical to it.
This in-shell method is usually better if you're going to be doing it a lot (like 50,000 times per report as you mention) since there's no process creation overhead. All solutions which use external programs will suffer from that overhead.
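As a rough illustration of the "no process creation" point, the sketch below applies the same expansion to every line read from standard input using only bash built-ins. The function name `first_two_of_each_line` is made up for this example, and the sample records other than the one in the question are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first two characters of every input line.
# Only built-ins (read, printf, parameter expansion) are used, so no
# external process is started per record.
first_two_of_each_line() {
    local line
    # "|| [[ -n $line ]]" also handles a final line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        printf '%s\n' "${line:0:2}"
    done
}

# Example call with made-up records:
printf 'USCAGoleta\nDEBerlin\nA\n' | first_two_of_each_line
```

The benefit is the one described above: the loop body does not spawn cut, sed, or awk once per record.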
If you also wanted to ensure a minimum length, you could pad it out beforehand with something like:
pax> long="A"
pax> tmpstr="${long}.."
pax> short="${tmpstr:0:2}" ; echo "${short}"
A.
This would ensure that anything less than two characters in length was padded on the right with periods (or something else, just by changing the character used when creating tmpstr). It's not clear that you need this but I thought I'd put it in for completeness.
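If the padding is needed in more than one place, the same idea can be wrapped in a small function. This is only a sketch: the name `first_two_padded` and the optional second argument are assumptions, and the pad value is assumed to be a single character.

```shell
#!/usr/bin/env bash
# Hypothetical helper: first two characters of $1, padded on the right
# with the single character $2 (default ".") when $1 is too short.
first_two_padded() {
    local value=$1 pad=${2:-.}
    local tmpstr="${value}${pad}${pad}"   # guarantees at least two characters
    printf '%s\n' "${tmpstr:0:2}"
}

first_two_padded "A"        # prints A.
first_two_padded "USCA"     # prints US
first_two_padded "A" "_"    # prints A_
```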
Having said that, there are any number of ways to do this with external programs (such as if you don't have bash available to you), some of which are:
short=$(echo "${long}" | cut -c1-2)
short=$(echo "${long}" | head -c2)
short=$(echo "${long}" | awk '{print substr($0, 1, 2)}')
short=$(echo "${long}" | sed 's/^\(..\).*/\1/')
The first two (cut and head) are identical for a single-line string - they basically both just give you back the first two characters. They differ in that cut will give you the first two characters of each line and head will give you the first two characters of the entire input.
The third one uses the awk sub-string function to extract the first two characters (awk strings are indexed from 1, so the start position is 1) and the fourth uses sed capture groups (using \( \) and \1) to capture the first two characters and replace the entire line with them. They're both similar to cut - they deliver the first two characters of each line in the input.
None of that matters if you are sure your input is a single line; in that case they all have an identical effect.
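To make the per-line versus whole-input difference concrete, here is a small sketch with a two-line input. The second record (`DEBerlin`) is made up for the example, and `head -c` is assumed to be available (it is in the GNU and BSD implementations).

```shell
#!/usr/bin/env bash
input=$'USCAGoleta\nDEBerlin'   # two-line sample; second record is invented

# cut works line by line: first two characters of EACH line.
per_line=$(printf '%s\n' "$input" | cut -c1-2)

# head -c works on the byte stream: first two bytes of the WHOLE input.
whole=$(printf '%s\n' "$input" | head -c2)

printf 'cut gives:\n%s\n' "$per_line"
printf 'head gives:\n%s\n' "$whole"
```

With this input, cut yields two lines (US and DE) while head yields only US.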
The easiest way is:
${string:position:length}
This extracts a substring of $length characters from $string, starting at the zero-based offset $position.
This is a bash builtin so awk or sed is not required.
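A short sketch of how the offset and length interact, using the string from the question. The variable names, and the reading of the second pair of characters as a state code, are assumptions made for illustration.

```shell
#!/usr/bin/env bash
record="USCAGoleta9311734.5021-120.1287855805"

country="${record:0:2}"   # offset 0, length 2
state="${record:2:2}"     # offset 2, length 2 (assumed to be a state code)
rest="${record:4}"        # offset 4, no length: everything to the end

printf '%s\n' "$country" "$state" "$rest"
```

Omitting the length, as in the last assignment, takes everything from the offset to the end of the string.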
You've gotten several good answers and I'd go with the Bash builtin myself, but since you asked about sed and awk and (almost) no one else offered solutions based on them, I offer you these:
echo "USCAGoleta9311734.5021-120.1287855805" | awk '{print substr($0,1,2)}'
and
echo "USCAGoleta9311734.5021-120.1287855805" | sed 's/\(^..\).*/\1/'
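The awk one ought to be fairly obvious, but here's an explanation of the sed one: the group \(^..\) captures the first two characters at the start of the line, .* matches the rest of the line, and the whole match is replaced with the captured group (\1).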
Just grep:
echo 'abcdef' | grep -Po "^.." # ab
If you're in bash, you can say:
bash-3.2$ var=abcd
bash-3.2$ echo ${var:0:2}
ab
This may be just what you need…
Quite late indeed, but here it is:
sed 's/.//3g'
Or
awk NF=1 FPAT=..
Or
perl -pe '$_=unpack a2'
colrm — remove columns from a file
To keep the first two characters, remove every column from the third onward:
cat file | colrm 3
If your system is using a different shell (not bash), but your system has bash, then you can still use the inherent string manipulation of bash by invoking bash with a variable:
strEcho='echo ${str:0:2}' # '${str:2}' if you want to skip the first two characters and keep the rest
bash -c "str=\"$strFull\";$strEcho;"
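One possible variation, offered as a sketch rather than a correction: passing the string to bash as a positional argument instead of splicing it into the command text means quotes or `$` characters inside the value are not re-parsed as shell code. The variable name `strFull` follows the snippet above; the `_` is just a placeholder for `$0`.

```shell
#!/bin/sh
strFull='USCAGoleta9311734.5021-120.1287855805'

# "$strFull" arrives in the child bash as $1; the single-quoted script
# is never modified, so the value cannot change what gets executed.
short=$(bash -c 'printf "%s\n" "${1:0:2}"' _ "$strFull")

printf '%s\n' "$short"
```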
If you want to use shell scripting without relying on non-POSIX extensions (so-called bashisms), you can use techniques that do not fork external tools such as grep, sed, cut, or awk; forking those tools makes a script less efficient. Efficiency and POSIX portability may not matter in your use case, but if they do (or just as a good habit), you can use the following parameter expansion to extract the first two characters of a shell variable:
$ sh -c 'var=abcde; echo "${var%${var#??}}"'
ab
This uses "smallest prefix" parameter expansion to remove the first two characters (this is the ${var#??} part), then "smallest suffix" parameter expansion (the ${var% part) to remove that all-but-the-first-two-characters string from the original value.
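The two-step expansion can be wrapped in a function, sketched below under a couple of assumptions. The name `first_two` is made up. The inner expansion is quoted so that characters such as `*` in the tail are removed literally rather than treated as a pattern, and values shorter than two characters are returned unchanged (without the `case`, the nested expansion would yield an empty string for them).

```shell
#!/bin/sh
# Hypothetical helper using only POSIX parameter expansion.
first_two() {
    case $1 in
        ??*)
            # ${1#??} is the tail after the first two characters;
            # quoting it makes the suffix removal literal, not a glob.
            printf '%s\n' "${1%"${1#??}"}"
            ;;
        *)
            # Fewer than two characters: return the value as-is.
            printf '%s\n' "$1"
            ;;
    esac
}

first_two "USCAGoleta9311734.5021-120.1287855805"   # prints US
first_two "US*CA"                                    # prints US
first_two "A"                                        # prints A
```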
This method was previously described in this answer to the "Shell = Check if variable begins with #" question. That answer also describes a couple of similar parameter expansion methods that can be used in a slightly different context than the one that applies to the original question here.
perl -ple 's/^(..).*/$1/'
if mystring = USCAGoleta9311734.5021-120.1287855805
print substr(mystring,0,2)
would print US
where 0 is the start position and 2 is how many characters to read.
Is this what you're after?
my $string = 'USCAGoleta9311734.5021-120.1287855805';
my $first_two_chars = substr $string, 0, 2;
ref: substr