I assume everyone here is familiar with the adage that all text files should end with a newline. I've known of this "rule" for years but I've always wondered — why?
There's also a practical programming issue with files lacking newlines at the end: the `read` Bash built-in (I don't know about other `read` implementations) doesn't work as expected. Given input such as `foo` followed by a final line with no trailing newline, a `while read line` loop that does `echo $line` prints only `foo`! The reason is that when `read` encounters the last line, it writes the contents to `$line` but returns exit code 1 because it reached EOF. This breaks the `while` loop, so we never reach the `echo $line` part. If you want to handle this situation, you also have to do the `echo` if the `read` failed because of a non-empty line at end of file. Naturally, in this case there will be one extra newline in the output which was not in the input.

A separate use case: when your text file is version controlled (in this case specifically under git, although it applies to others too). If content is added to the end of the file, then the line that was previously the last line will have been edited to include a newline character. This means that `blame`ing the file to find out when that line was last edited will show the text addition, not the commit before that you actually wanted to see.

I was always under the impression the rule came from the days when parsing a file without an ending newline was difficult. That is, you would end up writing code where an end of line was defined by the EOL character or EOF. It was just simpler to assume a line ended with EOL.
However, I believe the rule is derived from C compilers requiring the newline. And as pointed out in the question “No newline at end of file” compiler warning, #include will not add a newline.
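The `read` pitfall described earlier in this thread can be reproduced with a minimal sketch like the one below. The function names `naive_loop` and `robust_loop` and the sample input are illustrative assumptions, not taken from the original answer; `read -r` and the quoting are added only as defensive habits. The second loop uses the common `|| [ -n "$line" ]` idiom, so the body also runs when `read` fails at EOF with a non-empty `$line`.

```shell
# Input: "foo" is newline-terminated, "bar" is not.
naive_loop() {
    printf 'foo\nbar' | while read -r line; do
        echo "$line"
    done
}

# Same loop, but the body also runs when read hits EOF
# after having stored a non-empty final line in $line.
robust_loop() {
    printf 'foo\nbar' | while read -r line || [ -n "$line" ]; do
        echo "$line"
    done
}

echo "naive:";  naive_loop     # only the terminated line is echoed
echo "robust:"; robust_loop    # the unterminated last line is echoed too
```

Note that `robust_loop` emits a newline after `bar` that was not present in the input, which is the extra newline the answer mentions.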
It's very late here, but I just faced a bug in file processing that came about because the files did not end with a newline. We were processing text files with `sed`, and `sed` was omitting the last line from the output, which produced an invalid JSON structure and sent the rest of the process into a failed state.

All we were doing was this. There is one sample file, say `foo.txt`, with some JSON content inside it. The file was created on a Windows machine, and Windows scripts were processing that file using PowerShell commands. All good.

When we processed the same file using the `sed` command `sed 's|value|newValue|g' foo.txt > foo.txt.tmp`, the newly generated file was missing its last line, and boom, it failed the rest of the processes because of the invalid JSON.

So it's always good practice to end your file with a trailing newline.
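A defensive sketch of that advice: check whether a file's last byte is a newline and append one if it is missing, before handing the file to line-oriented tools. The helper names `lacks_final_newline` and `ensure_final_newline` and the sample JSON are hypothetical, and not every `sed` implementation drops an unterminated last line, so treat this as a precaution rather than a required step.

```shell
# Succeeds if the file is non-empty and its last byte is not a newline.
# Command substitution strips a trailing newline, so a properly
# terminated file yields an empty string here.
lacks_final_newline() {
    [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]
}

# Appends a single newline only when one is missing.
ensure_final_newline() {
    if lacks_final_newline "$1"; then
        printf '\n' >> "$1"
    fi
}

# Hypothetical sample: JSON whose closing brace is not followed by a newline.
work=$(mktemp)
printf '{\n  "key": "value"\n}' > "$work"

ensure_final_newline "$work"
sed 's|value|newValue|g' "$work" > "$work.tmp"

line_count=$(( $(wc -l < "$work.tmp") ))   # number of <newline> characters
last_line=$(tail -n 1 "$work.tmp")         # the closing brace survives
echo "lines: $line_count, last line: $last_line"
rm -f "$work" "$work.tmp"
```

Because the check runs first, calling `ensure_final_newline` on a file that is already terminated leaves it unchanged.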
This answer is an attempt at a technical answer rather than opinion.
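If we want to be POSIX purists, we define a line as:
"A sequence of zero or more non-<newline> characters plus a terminating <newline> character."
Source: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206

An incomplete line as:
"A sequence of one or more non-<newline> characters at the end of the file."
Source: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_195

A text file as:
"A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character."
Source: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_397

A string as:
"A contiguous sequence of bytes terminated by and including the first null byte."
Source: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_396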
From this, then, we can derive that the only time we will potentially encounter any type of issue is if we deal with the concept of a line of a file, or of a file as a text file (being that a text file is an organization of zero or more lines, and a line, as we know, must terminate with a <newline>).

Case in point: `wc -l filename`. From the `wc` manual we read: "A line is defined as a string of characters delimited by a <newline> character."

What are the implications for JavaScript, HTML, and CSS files then, given that they are text files?
In browsers, modern IDEs, and other front-end applications there are no issues with skipping EOL at EOF. The applications will parse the files properly. They have to, since not all operating systems conform to the POSIX standard, so it would be impractical for non-OS tools (e.g. browsers) to handle files according to the POSIX standard (or any OS-level standard).
As a result, we can be relatively confident that a missing EOL at EOF will have virtually no negative impact at the application level, regardless of whether it is running on a UNIX OS.
At this point we can confidently say that skipping EOL at EOF is safe when dealing with JS, HTML, and CSS on the client side. In fact, we can state that minifying any one of these files so that it contains no <newline> at all is safe.
We can take this one step further and say that Node.js cannot adhere to the POSIX standard either, given that it can run in non-POSIX-compliant environments.
What are we left with then? System level tooling.
This means the only issues that may arise are with tools that make an effort to adhere their functionality to the semantics of POSIX (e.g. the definition of a line as shown in `wc`).

Even so, not all shells will automatically adhere to POSIX. Bash, for example, does not default to POSIX behavior. There is a switch to enable it: `POSIXLY_CORRECT`.

Food for thought on the value of EOL being <newline>: http://www.rfc-editor.org/EOLstory.txt
Staying on the tooling track, for all practical intents and purposes, let's consider this:
Let's work with a file that has no EOL. As of this writing the file in this example is a minified JavaScript with no EOL.
Notice that the size of the `cat`-ed file is exactly the sum of its individual parts. If the concatenation of JavaScript files is a concern for JS files, the more appropriate concern would be to start each JavaScript file with a semicolon.

As someone else mentioned in this thread: what if you want to `cat` two files whose output becomes just one line instead of two? In other words, `cat` does what it's supposed to do.

The `man` of `cat` only mentions reading input up to EOF, not <newline>. Note that the `-n` switch of `cat` will also print out a non-<newline> terminated line (or incomplete line) as a line, being that the count starts at 1 (according to the `man`).

Now that we understand how POSIX defines a line, this behavior becomes ambiguous, or really, non-compliant.
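The `cat` observations above can be checked with a small sketch. The file names and contents here are made up for illustration (they are not the minified library from the answer), and `cat -n` is a common extension rather than a POSIX option.

```shell
dir=$(mktemp -d)
printf 'a()'  > "$dir/a.js"    # 3 bytes, no trailing newline
printf ';b()' > "$dir/b.js"    # 4 bytes, leading semicolon, no trailing newline
cat "$dir/a.js" "$dir/b.js" > "$dir/ab.js"

size_a=$(( $(wc -c < "$dir/a.js") ))
size_b=$(( $(wc -c < "$dir/b.js") ))
size_ab=$(( $(wc -c < "$dir/ab.js") ))         # cat adds no bytes of its own
joined=$(cat "$dir/ab.js")                     # both files end up on one line
newline_count=$(( $(wc -l < "$dir/ab.js") ))   # wc -l counts <newline> characters
numbered=$(cat -n "$dir/ab.js" | awk '{print $1}')   # label cat -n gave that line

echo "sizes: $size_a + $size_b = $size_ab"
echo "joined: $joined"
echo "wc -l: $newline_count, cat -n label: $numbered"
rm -rf "$dir"
```

The leading semicolon in the second file is what keeps the joined statement valid JavaScript, which is the point of the advice to start each file with one.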
Understanding a given tool's purpose and compliance will help in determining how critical it is to end files with an EOL. In C, C++, Java (JARs), etc... some standards will dictate a newline for validity - no such standard exists for JS, HTML, CSS.
For example, instead of using `wc -l filename` one could do `awk '{x++}END{ print x}' filename`, and rest assured that the task's success is not jeopardized by a file we may want to process that we did not write (e.g. a third party library such as the minified JS we `curl`ed), unless our intent was truly to count lines in the POSIX compliant sense.

Conclusion
There will be very few real life use cases where skipping EOL at EOF for certain text files such as JS, HTML, and CSS will have a negative impact - if at all. If we rely on <newline> being present, we are restricting the reliability of our tooling only to the files that we author and open ourselves up to potential errors introduced by third party files.
Moral of the story: Engineer tooling that does not have the weakness of relying on EOL at EOF.
Feel free to post use cases as they apply to JS, HTML and CSS where we can examine how skipping EOL has an adverse effect.
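To see the `wc -l` versus `awk` difference mentioned above, here is a minimal sketch. The `sample` function and its three-line input are assumptions for illustration; `x+0` is a small deviation from the one-liner in the answer so that empty input prints `0` instead of an empty line.

```shell
# Hypothetical input: three lines of text, the last one unterminated.
sample() { printf 'one\ntwo\nthree'; }

wc_count=$(( $(sample | wc -l) ))                   # counts <newline> characters
awk_count=$(sample | awk '{x++} END {print x+0}')   # counts records, complete or not

echo "wc -l: $wc_count"
echo "awk:   $awk_count"
```

On this input `wc -l` reports one fewer than `awk`, because the final record is an incomplete line in the POSIX sense.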
Imagine that the file is being processed while the file is still being generated by another process.
It might have to do with that? A flag that indicates that the file is ready to be processed.
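One way to read that suggestion is that a trailing newline marks a record as completely written. A rough sketch, assuming a hypothetical writer that was interrupted mid-record: a plain `while read` loop only hands over newline-terminated lines, so the half-written record is left alone. The name `complete_records` and the sample data are illustrative only.

```shell
partial=$(mktemp)
# Two finished records, then a record the writer has not completed yet.
printf 'record 1\nrecord 2\nrecor' > "$partial"

# Emit only lines that end with a newline; read fails on the
# unterminated tail, so the loop stops before processing it.
complete_records() {
    while IFS= read -r line; do
        printf '%s\n' "$line"
    done < "$1"
}

ready=$(complete_records "$partial")
echo "$ready"
rm -f "$partial"
```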