Is there a command like cat in Linux which can return a specified quantity of characters from a file?
e.g., I have a text file like:
Hello world
this is the second line
this is the third line
And I want something that would return the first 5 characters, which would be "Hello".
Thanks
I know the answer is in reply to a question asked 6 years ago ...
But I was looking for something similar for a few hours and then found out that: cut -c does exactly that, with an added bonus that you could also specify an offset.
cut -c 1-5 will return Hello and cut -c 7-11 will return world (cut works line by line, so the range is applied to every line of the input). No need for any other command.
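As a minimal sketch of the cut -c usage above, assuming a temporary file that holds the three lines from the question (the variable names are only illustrative). Because cut processes each line separately, head -n 1 is used here to limit the result to the first line:

```sh
# Hypothetical sample file matching the question's example.
sample=$(mktemp)
printf 'Hello world\nthis is the second line\nthis is the third line\n' > "$sample"

# Prints characters 1-5 of every line, not only of the first one.
cut -c 1-5 "$sample"

# Restrict the input to the first line to get a single result.
first=$(head -n 1 "$sample" | cut -c 1-5)
second=$(head -n 1 "$sample" | cut -c 7-11)
echo "$first $second"

rm -f "$sample"
```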
Here's a simple script that wraps up the dd approach mentioned here: extract_chars.sh
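For orientation, a dd-based extraction could look roughly like the following. This is a sketch under assumptions, not the contents of extract_chars.sh: the helper name extract_bytes and its argument order are hypothetical, and it counts bytes rather than characters.

```sh
# extract_bytes FILE OFFSET COUNT (hypothetical helper, not the original script):
# skips OFFSET bytes of FILE, then prints the next COUNT bytes.
extract_bytes() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null
}

sample=$(mktemp)
printf 'Hello world\n' > "$sample"

first=$(extract_bytes "$sample" 0 5)
second=$(extract_bytes "$sample" 6 5)
echo "$first $second"

rm -f "$sample"
```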
extract_chars.sh returns the requested characters from a specific line and position, e.g. all of line 5, or characters 5 to 8 of line 5.
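One way to get that result, sketched here on the assumption that the selection is done with a head, tail and cut pipeline (the sample data is made up, and tail -n 1 is the POSIX spelling of tail -1):

```sh
# Hypothetical five-line sample file.
sample=$(mktemp)
printf 'line one\nline two\nline three\nline four\nabcdefghij\n' > "$sample"

# head -n 5 prints lines 1-5, tail -n 1 keeps only line 5.
line5=$(head -n 5 "$sample" | tail -n 1)

# cut -c 5-8 keeps characters 5 to 8 of that line.
chars=$(head -n 5 "$sample" | tail -n 1 | cut -c 5-8)
echo "$line5 -> $chars"

rm -f "$sample"
```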
Note: tail -1 is used to select the last line displayed by head.

head or tail can do it as well: head -c X prints the first X bytes of the file (not necessarily X characters if the file uses a multibyte encoding such as UTF-8 or UTF-16). tail -c X does the same, except for the last X bytes.
This (and cut) is portable.
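A short sketch of the byte-based behaviour, assuming the -c option of head and tail; the UTF-8 input is written with octal escapes so the example does not depend on the terminal's encoding:

```sh
sample=$(mktemp)
printf 'Hello world\n' > "$sample"

first=$(head -c 5 "$sample")   # first 5 bytes
last=$(tail -c 6 "$sample")    # last 6 bytes: "world" plus the trailing newline
echo "$first / $last"

# "é" takes two bytes in UTF-8, so the first 2 bytes of this input
# hold a single character, not two.
one_char=$(printf '\303\251t\303\251\n' | head -c 2)

rm -f "$sample"
```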
Even though this was answered/accepted years ago, the presently accepted answer is only correct for one-byte-per-character encodings like ISO-8859-1, or for the single-byte subsets of variable-byte character sets (like Latin characters within UTF-8). Even using multiple-byte splices instead would still only work for fixed-width multibyte encodings like UTF-32 (or UTF-16 restricted to the Basic Multilingual Plane). Given that UTF-8 is now well on its way to being a universal standard, and when looking at this list of languages by number of native speakers and this list of top 30 languages by native/secondary usage, it is important to point out a simple variable-byte character-friendly (not byte-based) technique, using cut -c and tr/sed with character classes.

Compare the following, which doubly fails due to two common Latin-centric mistakes/presumptions regarding the bytes vs. characters issue (one is head vs. cut, the other is [a-z][A-Z] vs. [:upper:][:lower:]):

to this (note: this worked fine on FreeBSD, but both cut & tr on GNU/Linux still mangled Greek in UTF-8 for me though):

If your cut doesn't handle -c with variable-byte encodings correctly, for "the first X characters" (replace X with your number) you could try:
- sed -E -e '1 s/^(.{X}).*$/\1/' -e q - which is limited to the first line though
- head -n 1 | grep -E -o '^.{X}' - which is limited to the first line and chains two commands though
- dd - which has already been suggested in other answers, but is really cumbersome
- a sed script with a sliding window buffer to handle characters spread over multiple lines, but that is probably more cumbersome/fragile than just using something like dd
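The first two fallbacks can be wrapped as small shell functions. This is only a sketch: the function names are hypothetical, and "." matches a whole multibyte character only when the active locale uses UTF-8, which is assumed rather than set here (the sample input is plain ASCII). The last step applies the character-class form of tr for the case conversion.

```sh
# Hypothetical helpers, not from the answer; $1 is the number of characters.
first_chars_sed()  { sed -E -e "1 s/^(.{$1}).*\$/\\1/" -e q; }
first_chars_grep() { head -n 1 | grep -E -o "^.{$1}"; }

text='Hello world
this is the second line'

a=$(printf '%s\n' "$text" | first_chars_sed 5)
b=$(printf '%s\n' "$text" | first_chars_grep 5)

# Character classes instead of [A-Z]/[a-z] ranges for the case conversion.
lower=$(printf '%s\n' "$a" | tr '[:upper:]' '[:lower:]')
echo "$a $b $lower"
```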
If your tr doesn't handle character-classes with variable-byte encodings correctly you could try: sed -E -e 's/[[:upper:]]/\L&/g' (GNU-specific)

head works too: head -c 100 will extract the first 100 bytes and return them.
What's nice about using head for this is that the syntax for tail matches:
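To illustrate the matching syntax, a minimal sketch assuming the -c byte-count option of both tools, run on a hypothetical 300-byte file made of 100 "a", 100 "b" and 100 "c":

```sh
sample=$(mktemp)
# Build the sample: each printf emits 100 spaces, which tr turns into letters.
{
    printf '%100s' '' | tr ' ' 'a'
    printf '%100s' '' | tr ' ' 'b'
    printf '%100s' '' | tr ' ' 'c'
} > "$sample"

first=$(head -c 100 "$sample")   # first 100 bytes: all "a"
last=$(tail -c 100 "$sample")    # last 100 bytes: all "c"
echo "${#first} ${#last}"

rm -f "$sample"
```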