Excel saves tab delimited files without newline (U

2019-04-04 03:16发布

问题:

This is a common issue I have and my solution is a bit brash. So I'm looking for a quick fix and explanation of the problem.

The problem is that when I decide to save a spreadsheet in excel (mac 2011) as a tab delimited file it seems to do it perfectly fine. Until I try to parse the file line by line using Perl. For some reason it slurps the whole document in one line.

My brutish solution is to open the file in a web browser and copy and paste the information into the tab delimited file in TextEdit (I never use rich text format). I tried introducing a newline in the end of the file before doing this fix and it does not resolve the issue.

What's going on here? An explanation would be appreciated.

~Thanks!~

回答1:

The problem is the actual character codes that define new lines on different systems. Windows systems commonly use a CarriageReturn+LineFeed (CRLF) and *NIX systems use only a LineFeed (LF).

These characters can be represented in RegEx as \r\n or \n (respectively).

Sometimes, to hash through a text file, you need to parse New Line characters. Try this for DOS-to-UNIX in perl:

perl -pi -e 's/\r\n/\n/g' input.file

or, for UNIX-to-DOS using sed:

$ sed 's/$'"/`echo \\\r`/" input.txt > output.txt

or, for DOS-to-UNIX using sed:

$ sed 's/^M$//' input.txt > output.txt


回答2:

Found a pretty simple solution to this. Copy data from Excel to clipboard, paste it into a google spreadsheet. Download google spreadsheet file as a 'tab-separated values .tsv'. This gets around the problem and you have tab delimiters with an end of line for each line.



回答3:

Yet another solution ...

  • for a tab-delimited file, save the document as a Windows Formatted Text (.txt) file type
  • for a comma-separated file, save the document as a `Windows Comma Separated (.csv)' file type


回答4:

Perl has a useful regex pattern \R which will match any common line ending. It actually matches any vertical whitespace -- the same as \v -- or the CR LF combination, so it's the same as \r\n|\v

This is useful here because you can slurp your entire file into a single scalar and then split /\R/, which will give you a list of file records, already chomped (if you want to keep the line terminators you can split /\R\K/ instead

Another option is the PerlIO::eol module. It provides a new Perl IO layer that will normalize line endings no matter what the contents of the file are

Once you have loaded the module with use PerlIO::eol you can use it in an open statement

open my $fh, '<:eol(LF)', 'myfile.tsv' or die $!;

or you can use the open pragma to set it as the default layer for all input file handles

use open IN  => ':raw:eol(LF)';

which will work fine with an input file from any platform