I have a folder that contains multiple text files. I'm trying to split every text file into pieces of 10,000 lines each while keeping the base file name, i.e. if filename1.txt contains 20,000 lines, the output should be filename1-1.txt (10,000 lines) and filename1-2.txt (10,000 lines).
I tried split -10000 filename1.txt, but this does not keep the base file name, and I have to repeat the command for each text file in the folder. I also tried for f in *.txt; do split -10000 $f.txt; done, but that didn't work either.
Any idea how I can do this? Thanks.
for f in filename*.txt; do split -d -a1 -l10000 --additional-suffix=.txt "$f" "${f%.txt}-"; done
Or, written over multiple lines:
for f in filename*.txt
do
    split -d -a1 -l10000 --additional-suffix=.txt "$f" "${f%.txt}-"
done
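How it works:
-d tells split to use numeric suffixes (numbered from 0).
-a1 tells split to use a suffix that is one digit long, which allows at most ten pieces (0 through 9) per file; use -a2 if a file may need more.
-l10000 tells split to start a new output file every 10,000 lines.
--additional-suffix=.txt tells split to add .txt to the end of the names of the new files.
"$f" tells split the name of the file to split.
"${f%.txt}-" tells split the prefix to use for the names of the split files: the original name with its trailing .txt removed, followed by a dash.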
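To see what the prefix looks like before split gets it, here is a small sketch of the ${f%.txt} expansion on its own. The file name is a hypothetical value chosen for illustration; nothing is read from disk.

```shell
# Hypothetical file name, used only to illustrate the expansion.
f="filename1.txt"

# ${f%.txt} removes the shortest trailing match of ".txt"; we then append "-".
prefix="${f%.txt}-"

echo "$prefix"           # prints: filename1-
echo "${prefix}0.txt"    # prints: filename1-0.txt (first piece with -d -a1 and --additional-suffix=.txt)
```

Quoting "$f" and "${f%.txt}-" in the real command keeps file names that contain spaces intact.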
Example
Suppose that we start with these files:
$ ls
filename1.txt filename2.txt
Then we run our command:
$ for f in filename*.txt; do split -d -a1 -l10000 --additional-suffix=.txt "$f" "${f%.txt}-"; done
When this is done, we now have the original files and the new split files:
$ ls
filename1-0.txt filename1-1.txt filename1.txt filename2-0.txt filename2-1.txt filename2.txt
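The question asked for pieces numbered from 1 (filename1-1.txt, filename1-2.txt) rather than from 0. A minimal sketch, assuming a GNU split recent enough to accept a starting value for --numeric-suffixes (the short -d form takes no value). The scratch directory and the 20,000-line sample file are made up for the demonstration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample input: 20,000 numbered lines.
seq 20000 > filename1.txt

for f in filename*.txt
do
    # --numeric-suffixes=1 starts the numbering at 1 instead of 0 (GNU split).
    split --numeric-suffixes=1 -a1 -l10000 --additional-suffix=.txt "$f" "${f%.txt}-"
done

# Line counts of the pieces.
wc -l filename1-*.txt
```

With -a1 and numbering from 1, only nine pieces (1 through 9) fit per file, so larger inputs would need -a2.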
Using older, less featureful forms of split
If your split does not offer --additional-suffix, then consider:
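for f in filename*.txt
do
    split -d -a1 -l10000 "$f" "${f%.txt}-"
    # Match only the single-digit pieces just created, so files that already end in .txt are not renamed again
    for g in "${f%.txt}-"[0-9]
    do
        mv "$g" "$g.txt"
    done
done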
No need for shell loops; a single awk command handles all the files:
awk 'FNR%10000==1{if(FNR==1)c=0; close(out); out=FILENAME; sub(/\.txt$/,"-" (++c) ".txt",out)} {print > out}' *.txt
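A commented, multi-line sketch of the same awk idea may be easier to follow. It assumes 10,000 lines per piece and numbering from 1; the scratch directory and the sample files are hypothetical and exist only so the result can be checked.

```shell
# Work in a throwaway directory with made-up sample data.
workdir=$(mktemp -d)
cd "$workdir"
seq 25000 > filename1.txt    # should yield pieces of 10000, 10000 and 5000 lines
seq 10000 > filename2.txt    # should yield a single 10000-line piece

awk '
FNR % 10000 == 1 {                          # first line of each 10,000-line chunk
    if (FNR == 1) c = 0                     # a new input file restarts the counter
    if (out != "") close(out)               # release the previous output file
    out = FILENAME
    sub(/\.txt$/, "-" (++c) ".txt", out)    # filename1.txt becomes filename1-1.txt, and so on
}
{ print > out }                             # every line goes to the current piece
' filename*.txt

# Line counts of the pieces.
wc -l filename*-*.txt
```

Closing each finished piece keeps the number of open files low, which matters when many inputs are processed in one run. The third argument to sub is what directs the substitution at out rather than at the current input line.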