Question:
I have an input file as follows:
MB1 00134141
MB1 12415085
MB1 13253590
MB1 10598105
MB1 01141484
...
...
MB1 10598105
I want to merge every 5 lines into a single line.
I want my bash script to process the input file and produce output as follows:
MB1 00134141 MB1 12415085 MB1 13253590 MB1 10598105 MB1 01141484
...
...
...
I have written the following script. It works, but it is slow for a file of 23051 lines.
Can I write better code to make it faster?
#!/bin/bash
file=timing.csv
x=0
while [ $x -lt $(cat $file | wc -l) ]
do
    line=`head -n $x $file | tail -n 1`
    echo -n $line " "
    let "remainder = $x % 5"
    if [ "$remainder" -eq 0 ]
    then
        echo ""
    fi
    let x=x+1
done
exit 0
I tried to execute the following command but it messes up some numbers.
cat timing_deleted.csv | pr -at5
Answer 1:
In pure bash, with no external processes (for speed):
while true; do
    out=()
    for (( i=0; i<5; i++ )); do
        read && out+=( "$REPLY" )
    done
    if (( ${#out[@]} > 0 )); then
        printf '%s ' "${out[@]}"
        echo
    fi
    if (( ${#out[@]} < 5 )); then break; fi
done <input-file >output-file
This correctly handles files where the number of lines is not a multiple of 5.
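The same grouping logic can also be sketched with shell builtins only and no arrays. This is a variant for illustration, not the answer's code: `group_lines` is a hypothetical helper name, `IFS= read -r` is used so that backslashes and leading whitespace survive, and the separator is printed before each line rather than after it, so rows carry no trailing space.

```shell
# Join every n lines of stdin into one space-separated row (n defaults to 5).
# group_lines is a hypothetical helper name used only for this sketch.
group_lines() {
    n=${1:-5}
    count=0
    # "|| [ -n "$line" ]" also picks up a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$count" -gt 0 ]; then
            printf ' '
        fi
        printf '%s' "$line"
        count=$((count + 1))
        if [ "$count" -eq "$n" ]; then
            printf '\n'
            count=0
        fi
    done
    # Terminate a final, shorter group.
    if [ "$count" -gt 0 ]; then
        printf '\n'
    fi
}

# Seven input lines give one full row and one short row.
printf '%s\n' 1 2 3 4 5 6 7 | group_lines 5
```

It would be called on a real file as `group_lines 5 < input-file > output-file`.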
Answer 2:
Using tr (note that this joins the whole file into a single line, not into groups of 5):
cat input_file | tr "\n" " "
Answer 3:
Use the paste command:
paste -d ' ' - - - - - < tmp.txt
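As a quick check of what this command does, here is a minimal sketch on a made-up seven-line sample (the sixth and seventh values are invented for illustration). Each `-` tells paste to read one line from standard input, so five dashes consume five lines per output row. When the line count is not a multiple of 5, the last row is padded with trailing delimiters, which the second command strips with sed.

```shell
# Seven sample lines; the sixth and seventh values are hypothetical.
sample='MB1 00134141
MB1 12415085
MB1 13253590
MB1 10598105
MB1 01141484
MB1 00000006
MB1 00000007'

# Each "-" reads one line from stdin, so five of them join five lines per row.
merged=$(printf '%s\n' "$sample" | paste -d ' ' - - - - -)
printf '%s\n' "$merged"

# The short final row ends in padding spaces; strip them if they matter.
trimmed=$(printf '%s\n' "$sample" | paste -d ' ' - - - - - | sed 's/ *$//')
printf '%s\n' "$trimmed"
```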
paste is far better, but I couldn't bring myself to delete my previous mapfile-based solution.
[UPDATE: mapfile reads too many lines prior to version 4.2.35 when used with -n]
#!/bin/bash
file=timing.csv
while true; do
    mapfile -t -n 5 arr
    # ${#arr[@]} is the number of lines read (${#arr} would be the length of the first line)
    (( ${#arr[@]} > 0 )) || break
    echo "${arr[*]}"
done < "$file"
exit 0
We can't do while mapfile ...; do because mapfile exits with status 0 even when it doesn't read any input.
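To see why the loop has to test the array size rather than mapfile's exit status, here is a minimal sketch. It assumes Bash 4 or later, where the mapfile builtin is available. It also shows the difference between `${#arr[@]}`, the number of elements, and `${#arr}`, the length of the first element.

```shell
# mapfile reports success even when there is nothing to read.
mapfile -t -n 5 arr < /dev/null
status=$?
count=${#arr[@]}
echo "exit status: $status, lines read: $count"

# ${#arr[@]} counts elements; ${#arr} is the length of element 0,
# which is 0 here because the first element is an empty line.
arr=("" "MB1 00134141")
echo "elements: ${#arr[@]}, length of first element: ${#arr}"
```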
Answer 4:
You can use xargs, if your input always contains a consistent number of whitespace-separated fields per line:
cat timing_deleted.csv | xargs -n 10
This takes the input from cat timing_deleted.csv and has xargs group it into rows of 10 (-n 10) whitespace-separated fields. Each line, such as MB1 00134141, contains two fields, and the newline at the end of each line also acts as a separator. So, to combine 5 lines, you need to use 10.
EDIT
As commented by Charles, you can skip the use of cat and redirect the file directly into xargs with:
xargs -n 10 < timing_deleted.csv
I didn't notice any performance gains using a really large file, but it doesn't require multiple commands.
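The requirement for a consistent field count can be illustrated with a short sketch. With no command given, xargs runs echo, and -n caps the number of arguments per echo call, so rows are built from fields rather than from lines. The second sample below is hypothetical and only shows how one irregular line makes the grouping drift.

```shell
# Five regular two-field lines: ten fields fill exactly one output row.
regular='MB1 00134141
MB1 12415085
MB1 13253590
MB1 10598105
MB1 01141484'
regular_out=$(printf '%s\n' "$regular" | xargs -n 10)
printf '%s\n' "$regular_out"

# Hypothetical irregular input: the second line has three fields, so with
# -n 4 (two lines per row) the rows stop lining up with line boundaries.
irregular='MB1 1
MB1 2 extra
MB1 3
MB1 4'
irregular_out=$(printf '%s\n' "$irregular" | xargs -n 4)
printf '%s\n' "$irregular_out"
```

Note also that xargs interprets quotes and backslashes in its input, so this approach suits plain tokens like the ones in the question.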
Answer 5:
Using sed, but this one will not correctly handle the last few lines when the total number of lines is not a multiple of 5:
sed 'N;N;N;N;s/\n/ /g;' input_file
The N command reads the next line and appends it to the current line, preserving the newline. This script reads four additional lines for each line it reads, accumulating chunks of 5 lines in the buffer. For each such chunk, it replaces all of the newlines with a space.
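One common way around the leftover-lines limitation, offered here as a sketch rather than as part of the original answer, is to guard each N with the address `$!` ("not the last line"). A short final group then still reaches the substitution instead of running into the end of input.

```shell
# $!N appends the next line only if the current line is not the last one,
# so the s command still runs on a final group of fewer than 5 lines.
merged=$(printf '%s\n' a b c d e f g | sed '$!N;$!N;$!N;$!N;s/\n/ /g')
printf '%s\n' "$merged"
```

On a real file this would be `sed '$!N;$!N;$!N;$!N;s/\n/ /g' input_file`.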
Answer 6:
An awk script would do that. A sed substitution would too, I guess, but I don't know sed well, so here you go.
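NF {
    # Append the current non-empty line to the group, space-separated.
    line = (i ? line " " : "") $0
    if (++i == 5) {
        # Five lines collected: print the row and start a new group.
        print line
        line = ""
        i = 0
    }
}
END {
    # Print any leftover lines (fewer than 5).
    if (i) print line
}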
Call that, say, merge.awk. Here is how you invoke it:
awk -f merge.awk filetomerge.txt
or
cat filetomerge.txt | awk -f merge.awk
Should be rather fast too.
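For comparison, the same grouping idea can be sketched as an inline awk program that prints as it goes instead of accumulating a string. The variable name `n` is an assumption introduced for this sketch, and unlike the script above this version does not skip empty lines.

```shell
merged=$(printf '%s\n' a b c d e f g |
    awk -v n=5 '
        # Print a space before every line except the first of each group.
        { printf "%s%s", ((NR - 1) % n ? " " : ""), $0 }
        # End the row once n lines have been printed.
        NR % n == 0 { print "" }
        # End a final, shorter row.
        END { if (NR % n) print "" }
    ')
printf '%s\n' "$merged"
```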