Tips for writing a file parser in Java? [closed]

Posted 2019-03-29 08:36

EDIT: I'm mostly parsing "comma-separated values"; fuzzy brought that term to my attention.

Interpreting the blocks of CSV is the main question here.

I know how to read the file into something like a String[], and I know some of the basic String methods, but I don't think using methods like contains() and analyzing everything character by character will work.

What are some ways I can do this in a smarter way?

Example of a line:

-barfoob: boobs, foob, "foo bar"

Tags: java parsing
12 answers
何必那么认真
#2 · 2019-03-29 09:05

You may be able to use the Neko HTML parser to some degree. It depends on how it handles the non-standard HTML.

不美不萌又怎样
#3 · 2019-03-29 09:07

I think the java.util.Scanner will help you. Have a look at http://java.sun.com/javase/6/docs/api/java/util/Scanner.html
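
As a rough illustration of the Scanner idea, here is a minimal sketch applied to the example line from the question. The class name `ScannerSketch` and the method `values` are made up for this example, and it assumes that quoted values never contain a comma; if they can, a plain delimiter pattern like this one is not enough.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper, not part of any library.
public class ScannerSketch {

    // Returns the comma-separated values that follow the first ':' on the line.
    // Assumption: a quoted value never contains a comma.
    static List<String> values(String line) {
        // If there is no ':', indexOf returns -1 and the whole line is used.
        String rest = line.substring(line.indexOf(':') + 1).trim();
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(rest)) {
            // A comma with optional whitespace on either side separates tokens.
            sc.useDelimiter("\\s*,\\s*");
            while (sc.hasNext()) {
                String token = sc.next();
                // Drop one pair of surrounding double quotes, if present.
                if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
                    token = token.substring(1, token.length() - 1);
                }
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [boobs, foob, foo bar]
        System.out.println(values("-barfoob: boobs, foob, \"foo bar\""));
    }
}
```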

我想做一个坏孩纸
#4 · 2019-03-29 09:08

If the document is valid XML, then any of the other answers will work. If it's not, you'll have to lex.
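
For what "lexing" could look like on the example line from the question, here is a small sketch built on java.util.regex. The token kinds (STRING, WORD, PUNCT) and the class name `LexerSketch` are invented for this illustration; a real format would need its own token definitions, and this version assumes quoted strings have no escaped quotes inside them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer; the token kinds are assumptions, not a standard.
public class LexerSketch {

    // Optional leading whitespace, then exactly one token:
    // a double-quoted string, a bare word, or a ',' / ':' punctuation mark.
    private static final Pattern TOKEN = Pattern.compile(
            "\\s*(?:(?<STRING>\"[^\"]*\")|(?<WORD>[^\\s,:\"]+)|(?<PUNCT>[,:]))");

    static List<String> lex(String input) {
        String line = input.trim();
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        int pos = 0;
        while (pos < line.length()) {
            // Only accept a token that starts exactly at the current position.
            m.region(pos, line.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected input at index " + pos);
            }
            if (m.group("STRING") != null) {
                String quoted = m.group("STRING");
                // Store the string without its surrounding quotes.
                tokens.add("STRING(" + quoted.substring(1, quoted.length() - 1) + ")");
            } else if (m.group("WORD") != null) {
                tokens.add("WORD(" + m.group("WORD") + ")");
            } else {
                tokens.add("PUNCT(" + m.group("PUNCT") + ")");
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("-barfoob: boobs, foob, \"foo bar\""));
    }
}
```

Once the line is a flat list of tokens, the parsing step only has to look at token kinds instead of individual characters.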

Ridiculous、
#5 · 2019-03-29 09:10

You should look at ANTLR. Even if you want to write the parser yourself, ANTLR is a great alternative. Or at least look at YAML.

在下西门庆
#6 · 2019-03-29 09:10

After looking at your sample input, I fail to see any resemblance to HTML or XML:

-barfoob: boobs, foob, "foo bar"

If this is what you want to parse, I have an alternative suggestion: use the Java properties parser (it comes with standard Java), and then parse the remainder of each line using your own custom code. You will need to refactor your format somewhat for this to work, so it's up to you.

barfoob=boobs, foob, "foo bar"

Java properties will be able to return barfoob as the property name, and boobs, foob, "foo bar" as the property value. That's where you can use your custom code to split the property value into boobs, foob and foo bar.
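
A minimal sketch of that two-step approach, assuming the refactored `key=value` format shown above. `PropertiesSketch`, `lookup` and `splitValues` are hypothetical names for the custom code; only `java.util.Properties` is the standard class. Keep in mind that Properties treats backslashes as escape characters and lines starting with `#` or `!` as comments, which may or may not suit your data.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper class for illustration only.
public class PropertiesSketch {

    // Custom code: splits on commas that are outside double quotes.
    // Quotes are dropped and each value is trimmed, so leading or trailing
    // spaces inside quotes are lost in this simple version.
    static List<String> splitValues(String value) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : value.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;          // toggle quoted state
            } else if (c == ',' && !inQuotes) {
                out.add(current.toString().trim());
                current.setLength(0);          // start the next value
            } else {
                current.append(c);
            }
        }
        out.add(current.toString().trim());
        return out;
    }

    // Loads the text as properties and splits the value stored under key.
    static List<String> lookup(String text, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(key);
        return value == null ? new ArrayList<>() : splitValues(value);
    }

    public static void main(String[] args) {
        // prints [boobs, foob, foo bar]
        System.out.println(lookup("barfoob=boobs, foob, \"foo bar\"\n", "barfoob"));
    }
}
```

In a real program the text would come from a file via `props.load(new FileReader(...))` or similar; the string input here just keeps the sketch self-contained.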

beautiful°
#7 · 2019-03-29 09:10

I'd strongly advise you not to reinvent the wheel and to use an existing solution like Flatworm, Fixedformat4j or jFFP, all of which can parse positional or comma-separated values files (personally, I recommend Flatworm).
