Reference: This is a self-answered question. It was meant to share the knowledge, Q&A style.
How do I detect the type of end of line character in PHP?
PS: I've been writing this code from scratch for too long now, so I decided to share it on SO; plus, I'm sure someone will find ways to improve it.
Wouldn't it be easier to just replace everything except new lines using regex?
With that in mind, we do some magic:
Not sure if we can trust regex to do all this, but I don't have anything to test with.
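The idea above of stripping everything except line-break characters can be sketched as follows. This is only a minimal illustration, not the code from the answer: the function name `count_eol_bytes()` and the returned array shape are hypothetical, and it assumes only CR and LF are of interest. It counts bytes, not line endings, so a CRLF shows up as one CR plus one LF.

```php
<?php
/**
 * Hypothetical helper: count the CR and LF bytes left once all other
 * characters have been stripped with a regular expression.
 */
function count_eol_bytes(string $text): array
{
    // Remove every run of characters that are neither CR nor LF.
    $breaks = preg_replace('/[^\r\n]+/', '', $text);
    if ($breaks === null) {
        $breaks = ''; // preg_replace() returns null on a regex engine error
    }

    return [
        'CR' => substr_count($breaks, "\r"),
        'LF' => substr_count($breaks, "\n"),
    ];
}
```

For a string with two CRLF endings and one lone LF, the result would be two CRs and three LFs, which still has to be interpreted to decide what the dominant line ending is.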
Notes:
Needs to somehow know that we may be on an exotic system like ZX8x (since ASCII x76 is a regular letter). @radu raised a good point; in my case, it's not worth the effort to handle ZX8x systems nicely.
mb_detect_eol() (multibyte) and detect_eol()
The answers already given here provide enough information. The following code (based on those answers) might help even more:
Then use the following code in a static Utility class to detect the line ending:
and then for a file:
Change Your-Class-Name to the name of your implementation class (all members are static).
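A minimal sketch of what such a static utility class could look like is given below. It is an assumption about the general shape, not the code from the answer: the names `EolUtility`, `detectEol()`, `detectFileEol()` and the `$maxBytes` parameter are hypothetical. It requires PHP 7.3 or later for `array_key_first()`.

```php
<?php
final class EolUtility
{
    /** Most frequent line ending in $text, or null when there is none. */
    public static function detectEol(string $text): ?string
    {
        // "\r\n" is listed first so that a CRLF pair is counted once.
        if (!preg_match_all('/\r\n|\r|\n/', $text, $matches)) {
            return null; // no line break found (or regex engine error)
        }

        $counts = array_count_values($matches[0]);
        arsort($counts); // highest count first

        return (string) array_key_first($counts);
    }

    /** Same detection, applied to the first $maxBytes bytes of a file. */
    public static function detectFileEol(string $path, int $maxBytes = 65536): ?string
    {
        // Reading only a prefix keeps memory bounded; a CRLF cut at the
        // boundary may add one stray CR to the counts.
        $sample = file_get_contents($path, false, null, 0, $maxBytes);

        return $sample === false ? null : self::detectEol($sample);
    }
}
```

Usage would be along the lines of `EolUtility::detectFileEol('/path/to/file.txt')`, where the path is a placeholder.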
Based on ohaal's answer.
This can return one or two characters for the EOL, such as LF or CR+LF.
My answer, because I could make neither ohaal's nor transilvlad's work, is:
Explanation:
The general idea in both proposed solutions is good, but implementation details hinder the usefulness of those answers.
Indeed, the point of this function is to return the kind of newline used in a file, and that newline can be either one or two characters long.
This alone renders the use of str_split() incorrect. The only way to cut the tokens correctly is to use a function that cuts a string into variable-length pieces, based on character detection instead. That is when explode() comes into play.
But to give useful markers to explode(), it is necessary to replace the right characters, in the right amount, by the right match. And most of the magic happens in the regular expression.
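The problem with str_split() described above can be seen in a short, hedged demonstration. The sample text and the variable names are made up for illustration.

```php
<?php
// Sample input: two CRLF line endings followed by one lone LF.
$text = "one\r\ntwo\r\nthree\n";

// Keep only the line-break bytes, which leaves "\r\n\r\n\n".
$breaks = preg_replace('/[^\r\n]+/', '', $text);

// str_split() cuts into fixed one-byte pieces, so every CRLF is torn
// into a separate "\r" and "\n".
$pieces = array_count_values(str_split($breaks));

// $pieces holds two "\r" and three "\n"; there is no "\r\n" token at all,
// so the two-character newline can never be reported.
```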
3 points have to be considered:
Using .* as suggested by ohaal will not work. While it is true that . will not match newline characters, on a system where \r is not a newline character, or part of a newline character, . will match it incorrectly (reminder: we are detecting newlines because they could be different from the ones on our system; otherwise there is no point).
Replacing /[^\r\n]*/ with anything will "work" to make the text vanish, but will be an issue as soon as we want to have a separator (since we remove all characters but the newlines, any character that isn't a newline will be a valid separator). Hence the idea to create a match with the newline, and use a backreference to that match in the replacement.
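The backreference idea can be sketched as below. This is a minimal illustration under stated assumptions, not necessarily the exact code of this answer: the function name `detect_eol_by_tokens()` is hypothetical, a space is chosen arbitrarily as the separator, and only CRLF, CR and LF are considered. It requires PHP 7.3 or later for `array_key_first()`.

```php
<?php
function detect_eol_by_tokens(string $text): ?string
{
    // Replace "run of non-newline characters + one line break" with
    // "that same line break + a space". "\r\n" comes first in the
    // alternation so that a CRLF pair stays a single token.
    $marked = preg_replace('/[^\r\n]*(\r\n|\r|\n)/', '$1 ', $text);
    if ($marked === null) {
        return null; // regex engine error
    }

    // explode() cuts on the separator, so tokens can be one or two bytes long.
    $tokens = explode(' ', $marked);

    // Text after the last line break survives the replacement, so keep
    // only the tokens that really are line endings.
    $tokens = array_filter($tokens, static function (string $t): bool {
        return $t === "\r\n" || $t === "\r" || $t === "\n";
    });
    if (count($tokens) === 0) {
        return null; // no line break in the input
    }

    $counts = array_count_values($tokens);
    arsort($counts); // highest count first

    return (string) array_key_first($counts);
}
```

Because every line break is followed by the separator, a lone CR followed later by a lone LF stays two tokens instead of being mistaken for a CRLF once the text between them has vanished.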