I am trying to parse the Linux /etc/passwd
file in Java. I'm currently reading each line through the java.util.Scanner
class and then using java.lang.String.split(String)
to delimit each line.
The problem is that the line:
list:x:38:38:Mailing List Manager:/var/list:/bin/sh
is treated by the scanner as 3 different lines:
list:x:38:38:Mailing
List
Manager...
When I type this out into a new file that I didn't get from Linux, Scanner
parses it properly.
Is there something I'm not understanding about new lines in Linux?
Obviously a workaround is to parse it without using Scanner, but that wouldn't be elegant. Does anyone know of an elegant way to do it?
Is there a way to convert the file into one that would work with Scanner?
Not even two days ago: Historical reason behind different line ending at different platforms
EDIT
Note from the original author:
"I figured out I have a different error that is causing the problem. Disregard question"
Have you tried removing all hidden characters except '\n'? What regex are you using to split the lines?
Why not use LineNumberReader? If you can't do that, what does the code look like?
The only difference I can think of is that you are splitting on a bad regex, and that when you edit the file yourself you get DOS newlines that somehow pass your regex.
Still, for reading things one line at a time, it seems like overkill to use Scanner. Of course, why you are parsing /etc/passwd is a whole other discussion :)
Now I remember why I use BufferedReader on these occasions... :-)
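For anyone landing here, a minimal sketch of the BufferedReader approach. It assumes the input arrives as a Reader; PasswdReader and parse are names made up for this example. readLine() accepts '\n', '\r', and '\r\n' as line terminators, so a Unix file needs no special handling.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not taken from the thread.
class PasswdReader {
    // Reads the source line by line and splits each non-empty line on ':'.
    static List<String[]> parse(Reader source) {
        List<String[]> entries = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // A limit of -1 keeps empty trailing fields, e.g. a missing shell.
                entries.add(line.split(":", -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        // For the real file, pass new java.io.FileReader("/etc/passwd") instead.
        String sample = "list:x:38:38:Mailing List Manager:/var/list:/bin/sh\n";
        for (String[] fields : parse(new StringReader(sample))) {
            System.out.println(fields[4]); // the comment (GECOS) field
        }
    }
}
```

With the sample line this yields seven fields, and the fifth one keeps its spaces.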
This works for me on Ubuntu
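As an illustration (not necessarily the code this answer was posted with), here is a sketch contrasting Scanner.next() with Scanner.nextLine(). With the default delimiter, next() returns whitespace-separated tokens, which would reproduce the three-way split described in the question; whether that was the asker's actual bug is only a guess. ScannerModes, byToken, and byLine are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not taken from the thread.
class ScannerModes {
    // Collects tokens with next(): the default delimiter is any whitespace.
    static List<String> byToken(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    // Collects whole lines with nextLine(): spaces inside a line are preserved.
    static List<String> byLine(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String sample = "list:x:38:38:Mailing List Manager:/var/list:/bin/sh\n";
        System.out.println(byToken(sample)); // three tokens, split at the spaces
        System.out.println(byLine(sample));  // one line, ready for split(":")
    }
}
```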
From Wikipedia:
I translate this into these line endings in general:
Windows: '\r\n'
Classic Mac OS: '\r'
Mac OS X: '\n'
Unix/Linux: '\n'
You need to make your scanner/parser handle the Unix version, too.
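If you split the text yourself, one way to accept all three endings is a single alternation. This is only a sketch; AnyLineEnding and splitLines are made-up names. Scanner.nextLine() and BufferedReader.readLine() already recognize these terminators on their own.

```java
import java.util.Arrays;

// Hypothetical helper class, not taken from the thread.
class AnyLineEnding {
    // "\r\n" is listed first so a Windows ending counts as one terminator, not two.
    static String[] splitLines(String text) {
        return text.split("\r\n|\r|\n");
    }

    public static void main(String[] args) {
        String mixed = "windows\r\nold mac\runix\n";
        System.out.println(Arrays.toString(splitLines(mixed)));
        // prints [windows, old mac, unix]
    }
}
```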
You can get the standard line ending for your current OS from:
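In Java, the system property line.separator holds this value, and System.lineSeparator() (Java 7 and later) returns the same string. A minimal sketch; LineSeparatorDemo and visible are names invented for this example:

```java
// Hypothetical demo class, not taken from the thread.
class LineSeparatorDemo {
    // Makes CR and LF visible as the two-character escapes \r and \n.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String fromProperty = System.getProperty("line.separator");
        String fromMethod = System.lineSeparator(); // Java 7+
        System.out.println(visible(fromProperty));
        System.out.println(visible(fromMethod));
    }
}
```

Note that this tells you what the current OS writes, not what a given file contains, so a parser should still accept the other endings.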