Do line endings differ between Windows and Linux?

Posted 2019-01-07 21:33

I am trying to parse the Linux /etc/passwd file in Java. I'm currently reading each line through the java.util.Scanner class and then using java.lang.String.split(String) to delimit each line.

The problem is that the line:

list:x:38:38:Mailing List Manager:/var/list:/bin/sh

is treated by the scanner as 3 different lines:

  1. list:x:38:38:Mailing
  2. List
  3. Manager...

When I type this out into a new file that I didn't get from Linux, Scanner parses it properly.

Is there something I'm not understanding about new lines in Linux?

Obviously a workaround is to parse it without using Scanner, but that wouldn't be elegant. Does anyone know of an elegant way to do it?

Is there a way to convert the file into one that would work with Scanner?


Not even two days ago: Historical reason behind different line ending at different platforms

EDIT

Note from the original author:

"I figured out I have a different error that is causing the problem. Disregard question"

7 answers
家丑人穷心不美
#2 · 2019-01-07 21:39

Have you tried removing all hidden characters except '\n'? What regex are you using to split the lines?

太酷不给撩
#3 · 2019-01-07 21:40

Why not use LineNumberReader?

If you can't do that, what does the code look like?

The only explanation I can think of is that you are splitting on a bad regex, and that when you edit the file yourself you get DOS newlines that somehow pass your regex.

Still, for reading things one line at a time, it seems like overkill to use Scanner.

Of course, why you are parsing /etc/passwd is a whole other discussion :)
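
For illustration, here is a minimal sketch of the LineNumberReader approach. The class name PasswdLineReader and the helper describe are made up for this example, and it reads from an in-memory string rather than the real /etc/passwd so it runs anywhere. readLine() treats "\n", "\r", and "\r\n" as line terminators, so the spaces inside "Mailing List Manager" stay within one field.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not part of the original question's code.
public class PasswdLineReader {
    // Reads passwd-style text line by line and summarizes each entry.
    public static List<String> describe(Reader source) throws IOException {
        List<String> out = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(source)) {
            String line;
            // readLine() returns null at end of input and strips the terminator.
            while ((line = reader.readLine()) != null) {
                // A limit of -1 keeps trailing empty fields.
                String[] fields = line.split(":", -1);
                out.add(reader.getLineNumber() + ": " + fields[0]
                        + " (" + fields.length + " fields)");
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Sample data with one Windows-style and one Unix-style line ending.
        String sample = "root:x:0:0:root:/root:/bin/bash\r\n"
                + "list:x:38:38:Mailing List Manager:/var/list:/bin/sh\n";
        for (String s : describe(new StringReader(sample))) {
            System.out.println(s);
        }
    }
}
```

To read the real file, passing new FileReader("/etc/passwd") to describe should work the same way.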

等我变得足够好
#4 · 2019-01-07 21:47

Now I remember why I use BufferedReader on these occasions... :-)

时光不老,我们不散
#5 · 2019-01-07 21:49

This works for me on Ubuntu

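import java.io.File;
import java.util.Scanner;

public class test {
  public static void main(String[] args) {
    // try-with-resources closes the Scanner (and the underlying file) when done
    try (Scanner sc = new Scanner(new File("/etc/passwd"))) {
      // nextLine() throws NoSuchElementException at end of input rather than
      // returning null, so test hasNextLine() before each read
      while (sc.hasNextLine()) {
        String[] p = sc.nextLine().split(":");
        for (String pi : p) System.out.print(pi + "\t:\t");
        System.out.println();
      }
    } catch (Exception e) { e.printStackTrace(); }
  }
}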
Summer. ? 凉城
#6 · 2019-01-07 21:55

From Wikipedia:

  • LF: Multics, Unix and Unix-like systems (GNU/Linux, AIX, Xenix, Mac OS X, FreeBSD, etc.), BeOS, Amiga, RISC OS, and others
  • CR+LF: DEC RT-11 and most other early non-Unix, non-IBM OSes, CP/M, MP/M, DOS, OS/2, Microsoft Windows, Symbian OS
  • CR: Commodore machines, Apple II family, Mac OS up to version 9 and OS-9

I translate this into these line endings in general:

  • Windows: '\r\n'
  • Mac (OS 9-): '\r'
  • Mac (OS 10+): '\n'
  • Unix/Linux: '\n'

You need to make your scanner/parser handle the Unix version, too.
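
One way to handle all three endings at once is sketched below. The class LineEndings and its two helpers are hypothetical names for this example. It assumes the whole text is already in memory as a String, which is fine for a small file like /etc/passwd but may not suit large inputs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only.
public class LineEndings {
    // Converts Windows ("\r\n") and classic Mac ("\r") endings to Unix ("\n").
    public static String normalize(String text) {
        return text.replaceAll("\\r\\n?", "\n");
    }

    // Splits text into lines regardless of which of the three endings it uses.
    // "\r\n" is listed first so a Windows ending is not counted as two breaks.
    public static List<String> lines(String text) {
        return Arrays.asList(text.split("\\r\\n|\\r|\\n"));
    }

    public static void main(String[] args) {
        String mixed = "windows\r\nold mac\runix\n";
        System.out.println(lines(mixed));  // prints [windows, old mac, unix]
        System.out.println(normalize(mixed).equals("windows\nold mac\nunix\n"));  // prints true
    }
}
```

Note that String.split drops trailing empty strings, so a final newline does not produce an extra empty line.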

神经病院院长
#7 · 2019-01-07 22:03

You can get the standard line ending for your current OS from:

System.getProperty("line.separator")
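
As a small sketch of how this might be used (SeparatorDemo and visible are hypothetical names for this example): Java 7 and later also provide System.lineSeparator(), which normally returns the same value. Keep in mind that this describes the OS the JVM is running on, not the file being read, so a file copied from Linux still uses '\n' when read on Windows.

```java
// Hypothetical demo class for illustration only.
public class SeparatorDemo {
    // Makes control characters visible, e.g. CR LF becomes the text \r\n.
    public static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String fromProperty = System.getProperty("line.separator");
        String fromMethod = System.lineSeparator(); // available since Java 7
        System.out.println("line.separator = " + visible(fromProperty));
        System.out.println("same value: " + fromProperty.equals(fromMethod));
    }
}
```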