Why should text files end with a newline?

2018-12-31 03:49发布

I assume everyone here is familiar with the adage that all text files should end with a newline. I've known of this "rule" for years but I've always wondered — why?

18条回答
浮光初槿花落
2楼-- · 2018-12-31 04:26

Some tools expect this. For example, wc expects this:

$ echo -n "Line not ending in a new line" | wc -l
0
$ echo "Line ending with a new line" | wc -l
1
查看更多
不再属于我。
3楼-- · 2018-12-31 04:30

Because that’s how the POSIX standard defines a line:

3.206 Line
A sequence of zero or more non- <newline> characters plus a terminating <newline> character.

Therefore, lines not ending in a newline character aren't considered actual lines. That's why some programs have problems processing the last line of a file if it isn't newline terminated.

There's at least one hard advantage to this guideline when working on a terminal emulator: All Unix tools expect this convention and work with it. For instance, when concatenating files with cat, a file terminated by newline will have a different effect than one without:

$ more a.txt
foo$ more b.txt
bar
$ more c.txt
baz
$ cat *.txt
foobar
baz

And, as the previous example also demonstrates, when displaying the file on the command line (e.g. via more), a newline-terminated file results in a correct display. An improperly terminated file might be garbled (second line).

For consistency, it’s very helpful to follow this rule – doing otherwise will incur extra work when dealing with the default Unix tools.

Now, on non POSIX compliant systems (nowadays that’s mostly Windows), the point is moot: files don’t generally end with a newline, and the (informal) definition of a line might for instance be “text that is separated by newlines” (note the emphasis). This is entirely valid. However, for structured data (e.g. programming code) it makes parsing minimally more complicated: it generally means that parsers have to be rewritten. If a parser was originally written with the POSIX definition in mind, then it might be easier to modify the token stream rather than the parser — in other words, add an “artificial newline” token to the end of the input.

查看更多
千与千寻千般痛.
4楼-- · 2018-12-31 04:31

Presumably simply that some parsing code expected it to be there.

I'm not sure I would consider it a "rule", and it certainly isn't something I adhere to religiously. Most sensible code will know how to parse text (including encodings) line-by-line (any choice of line endings), with-or-without a newline on the last line.

Indeed - if you end with a new line: is there (in theory) an empty final line between the EOL and the EOF? One to ponder...

查看更多
闭嘴吧你
5楼-- · 2018-12-31 04:31

IMHO, it's a matter of personal style and opinion.

In olden days, I didn't put that newline. A character saved means more speed through that 14.4K modem.

Later, I put that newline so that it's easier to select the final line using shift+downarrow.

查看更多
唯独是你
6楼-- · 2018-12-31 04:32

This originates from the very early days when simple terminals were used. The newline char was used to trigger a 'flush' of the transferred data.

Today, the newline char isn't required anymore. Sure, many apps still have problems if the newline isn't there, but I'd consider that a bug in those apps.

If however you have a text file format where you require the newline, you get simple data verification very cheap: if the file ends with a line that has no newline at the end, you know the file is broken. With only one extra byte for each line, you can detect broken files with high accuracy and almost no CPU time.

查看更多
倾城一夜雪
7楼-- · 2018-12-31 04:33

Why should (text) files end with a newline?

As well expressed by many, because:

  1. Many programs do not behave well, or fail without it.

  2. Even programs that well handle a file lack an ending '\n', the tool's functionality may not meet the user's expectations - which can be unclear in this corner case.

  3. Programs rarely disallow final '\n' (I do not know of any).


Yet this begs the next question:

What should code do about text files without a newline?

  1. Most important - Do not write code that assumes a text file ends with a newline. Assuming a file conforms to a format leads to data corruption, hacker attacks and crashes. Example:

    // Bad code
    while (fgets(buf, sizeof buf, instream)) {
      // What happens if there is no \n, buf[] is truncated leading to who knows what
      buf[strlen(buf) - 1] = '\0';  // attempt to rid trailing \n
      ...
    }
    
  2. If the final trailing '\n' is needed, alert the user to its absence and the action taken. IOWs, validate the file's format. Note: This may include a limit to the maximum line length, character encoding, etc.

  3. Define clearly, document, the code's handling of a missing final '\n'.

  4. Do not, as possible, generate a file the lacks the ending '\n'.

查看更多
登录 后发表回答