Use grep and awk to transfer data from .srt to .csv

Posted 2019-07-09 05:03

I have an interesting project: I want to convert an srt file into a csv/xls file.

An srt file looks like this:

1
00:00:00,104 --> 00:00:02,669
Hi, I'm shell-scripting.

2
00:00:02,982 --> 00:00:04,965
I'm not sure if it would work,
but I'll try it!

3
00:00:05,085 --> 00:00:07,321
There must be a way to do it!

I want to output it to a csv file like this:

"1","00:00:00,104","00:00:02,669","Hi, I'm shell-scripting."
"2","00:00:02,982","00:00:04,965","I'm not sure if it would work,"
,,,"but I'll try it!"
"3","00:00:05,085","00:00:07,321","There must be a way to do it!"

As you can see, a subtitle with two lines of text takes up two rows. My idea is to use grep to put the srt data into the xls file, and then use awk to format it.

What do you think? How am I supposed to write it? I tried

$ grep filename.srt > filename.xls

All the data, including the time codes and the subtitle text, ended up in column A of the xls file, but I want the text to be in column B. How could awk help with the formatting?

Thank you in advance! :)

4 answers
Bombasti
#2 · 2019-07-09 05:20

My other answer was half awk and half Perl, but, given that awk can't write Excel spreadsheets whereas Perl can, it seems daft to require you to master both awk and Perl when Perl is perfectly capable of doing it all on its own... so here goes in Perl:

#!/usr/bin/perl
use strict;
use warnings;

use Excel::Writer::XLSX;
my $workbook  = Excel::Writer::XLSX->new('result.xlsx');
my $worksheet = $workbook->add_worksheet();
my $ExcelRow=0; 
local $/ = "";   # set paragraph mode, so we read till next blank line as one record

while(my $para=<>){
   $ExcelRow++;                               # move down a line in Excel worksheet
   chomp $para;                               # strip trailing newlines
   my @lines=split /\n/, $para;               # split paragraph into lines on linefeed character
   my $scene = $lines[0];                     # pick up scene number from first line of para
   my ($start,$end)=split / --> /,$lines[1];  # pick up start and end time from second line
   my $cell=sprintf("A%d",$ExcelRow);         # work out cell
   $worksheet->write($cell,$scene);           # write scene to spreadsheet column A
   $cell=sprintf("B%d",$ExcelRow);            # work out cell
   $worksheet->write($cell,$start);           # write start time to spreadsheet column B
   $cell=sprintf("C%d",$ExcelRow);            # work out cell
   $worksheet->write($cell,$end);             # write end time to spreadsheet column C
   $cell=sprintf("D%d",$ExcelRow);            # work out cell
   $worksheet->write($cell,$lines[2]);        # write description to spreadsheet column D
   for(my $i=3;$i<scalar @lines;$i++){        # output additional lines of description
      $ExcelRow++;
      $cell=sprintf("D%d",$ExcelRow);         # work out cell
      $worksheet->write($cell,$lines[$i]);
   }
}

$workbook->close;

Save the above in a file called srt2xls and then make it executable with the command:

chmod +x srt2xls

Then you can run it with

./srt2xls < SomeFile.srt

and it will give you this spreadsheet called result.xlsx

[Screenshot of the resulting result.xlsx spreadsheet]

别忘想泡老子
#3 · 2019-07-09 05:25

Since you want to convert the srt file into csv, here is an awk command:

 awk '{gsub(" --> ","\x22,\x22");if(NF!=0){if(j<3)k=k"\x22"$0"\x22,";else{k="\x22"$0"\x22 ";l=1}j=j+1}else j=0;if(j==3){print k;k=""}if(l==1){print ",,,"k ;l=0;k=""}}' inputfile > output.csv

Detailed view of the awk script:

awk '{
       gsub(" --> ","\x22,\x22"); 
       if(NF!=0)
         {
           if(j<3)
              k=k"\x22"$0"\x22,";
           else
            {
              k="\x22"$0"\x22 ";
              l=1
            }
          j=j+1
         }
        else
          j=0;
        if(j==3)
          { 
            print k;
            k=""
          }
        if(l==1)
          {
            print ",,,"k;
            l=0;
            k=""
          }
    }' inputfile > output.csv

Copy output.csv to a Windows machine, open it in Microsoft Excel, and save it with the .xls extension.
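
One caveat that the post does not raise: if the .srt file itself came from Windows, it may use CRLF line endings. In that case the "blank" lines contain a carriage return, and the NF test above will probably not treat them as empty. A minimal sketch, assuming such input; the strip_cr helper name and the sample text are made up for illustration:

```shell
# strip_cr (hypothetical helper): delete carriage returns from stdin
# so that blank lines are truly empty.
strip_cr() {
    tr -d '\r'
}

# Count the lines awk considers blank (NF == 0) after stripping.
# The printf input stands in for a CRLF-terminated .srt file.
printf '1\r\nHi\r\n\r\n2\r\nBye\r\n' |
    strip_cr |
    awk 'NF == 0 { n++ } END { print n + 0 }'
```

With a real file, the same idea would be strip_cr < inputfile piped into the awk command above.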

beautiful°
#4 · 2019-07-09 05:35

I think something like this should do it quite nicely:

awk -v RS= -F'\n' '
   { 
      sub(" --> ","\x7c",$2)                 # change "-->" to "|"
      printf "%s|%s|%s\n",$1,$2,$3           # print scene, time start, time stop, description
      for(i=4;i<=NF;i++)printf "|||%s\n",$i  # print remaining lines of description
   }' file.srt

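The -v RS= sets the Record Separator to the empty string, which puts awk in paragraph mode: records are separated by blank lines. The -F'\n' sets the Field Separator to a newline, so each line of a record becomes one field.

The sub() replaces the " --> " in the second field with a pipe symbol (|).

The first three fields are then printed separated by pipes, and then a short loop prints any remaining lines of the description, each prefixed with three pipe symbols so that they line up under the description column.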

Output

1|00:00:00,104|00:00:02,669|Hi, I'm shell-scripting.
2|00:00:02,982|00:00:04,965|I'm not sure if it would work,
|||but I'll try it!
3|00:00:05,085|00:00:07,321|There must be a way to do it!
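
To see what paragraph mode (RS= together with -F'\n') does on its own, here is a minimal sketch that only reports how many lines each record holds. The srt_fields function name and the shortened sample text are assumptions for illustration:

```shell
# srt_fields (hypothetical name): for each blank-line-separated record
# on stdin, print the record number and its number of lines (fields).
srt_fields() {
    awk -v RS= -F'\n' '{ printf "record %d: %d fields\n", NR, NF }'
}

# Two records: the first has one line of text, the second has two.
printf '1\n00:00:00,104 --> 00:00:02,669\nHi\n\n2\n00:00:02,982 --> 00:00:04,965\na\nb\n' |
    srt_fields
```

A record with more than three fields is one whose description spans several lines, which is what the loop starting at i=4 handles.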

As I felt like having some more fun with Perl and Excel, I took the above output, parsed it in Perl, and wrote a real Excel XLSX file. Of course, there is no real need to use both awk and Perl; ideally one would recast the awk and integrate it into the Perl, since the latter can write Excel files while the former cannot. Anyway, here is the Perl:

#!/usr/bin/perl
use strict;
use warnings;

use Excel::Writer::XLSX;
my $DEBUG=0; 
my $workbook  = Excel::Writer::XLSX->new('result.xlsx');
my $worksheet = $workbook->add_worksheet();
my $row=0; 

while(my $line=<>){
   $row++;                                   # move down a line in Excel worksheet
   chomp $line;                              # strip trailing newline
   my @f=split /\|/, $line;                  # split fields of line into array @f, on pipe symbols (|)
   for(my $j=0;$j<scalar @f;$j++){           # loop through all fields
     my $cell= chr(65+$j) . $row;            # calculate Excel cell, starting at A1 (65="A")
     $worksheet->write($cell,$f[$j]);        # write to spreadsheet
     printf "%s:%s ",$cell,$f[$j] if $DEBUG;
   }
   printf "\n" if $DEBUG;
}

$workbook->close;

Output

[Screenshot of the resulting result.xlsx spreadsheet]

Lonely孤独者°
#5 · 2019-07-09 05:43
$ cat tst.awk
BEGIN { RS=""; FS="\n"; OFS=","; q="\""; s=q OFS q }
{
    split($2,a,/ .* /)
    print q $1 s a[1] s a[2] s $3 q
    for (i=4;i<=NF;i++) {
        print "", "", "", q $i q
    }
}

$ awk -f tst.awk file
"1","00:00:00,104","00:00:02,669","Hi, I'm shell-scripting."
"2","00:00:02,982","00:00:04,965","I'm not sure if it would work,"
,,,"but I'll try it!"
"3","00:00:05,085","00:00:07,321","There must be a way to do it!"
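
If a subtitle line itself contains a double quote, most CSV readers (Excel included) expect it to be doubled inside the quoted field, as in the RFC 4180 convention. A minimal sketch of the same script with that escaping added; the csvq function and the srt_to_csv wrapper are hypothetical names, not part of the answer above:

```shell
# srt_to_csv (hypothetical wrapper): read SRT text on stdin, write CSV rows.
srt_to_csv() {
    awk '
    # csvq (hypothetical helper): double any embedded quotes, then wrap in quotes.
    function csvq(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
    BEGIN { RS = ""; FS = "\n"; OFS = "," }
    {
        split($2, t, / --> /)                       # t[1] = start time, t[2] = end time
        print csvq($1), csvq(t[1]), csvq(t[2]), csvq($3)
        for (i = 4; i <= NF; i++)                   # extra description lines
            print "", "", "", csvq($i)
    }'
}

# Sample record whose text contains double quotes.
printf '1\n00:00:00,104 --> 00:00:02,669\nHe said "hi"\n' | srt_to_csv
```

For input without embedded quotes this should produce the same rows as tst.awk.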