Tips for writing a file parser in Java? [closed]

Posted 2019-03-29 08:36

EDIT: I'm mostly parsing "comma-separated values"; fuzzy brought that term to my attention.

Interpreting the blocks of CSV is the main question here.

I know how to read the file into something like a String[] and some of the basic features of String, but I don't think using methods like contains() and analyzing everything character by character will work.

What are some ways I can do this in a smarter way?

Example of a line:

-barfoob: boobs, foob, "foo bar"

Tags: java parsing
12 answers
太酷不给撩
Answer 2 · 2019-03-29 08:45

Depending on how complicated your "schema" is, a regular expression might be what you want. If there is a lot of nesting then it might be easiest to convert to XML or JSON and use a prebuilt parser.
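For a flat format like the example line, a regular expression can be a minimal sketch of this approach. It assumes each line is `token: value, value, ...`, that a value is either a double-quoted string with no embedded quotes or a bare run of non-comma characters, and the class and method names (`RegexLineParser`, `key`, `values`) are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLineParser {
    // Assumed line shape: everything before the first ':' is the token, the rest is the value list.
    private static final Pattern LINE = Pattern.compile("^([^:]+):\\s*(.*)$");
    // A value is either "quoted text" (group 1) or a bare run of non-comma characters (group 2).
    private static final Pattern VALUE = Pattern.compile("\"([^\"]*)\"|([^,\\s][^,]*)");

    private static Matcher matchLine(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized line: " + line);
        }
        return m;
    }

    /** Returns the leading token, e.g. "-barfoob". */
    public static String key(String line) {
        return matchLine(line).group(1).trim();
    }

    /** Returns the values with surrounding quotes removed. */
    public static List<String> values(String line) {
        List<String> out = new ArrayList<>();
        Matcher v = VALUE.matcher(matchLine(line).group(2));
        while (v.find()) {
            // group(1) is non-null only when the quoted alternative matched
            out.add(v.group(1) != null ? v.group(1) : v.group(2).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "-barfoob: boobs, foob, \"foo bar\"";
        System.out.println(key(line) + " -> " + values(line));
    }
}
```

Because the quoted alternative is tried first, a comma inside quotes stays part of that value. Once the format needs nesting, the regex approach stops scaling, which is the point about switching to XML or JSON.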

[account suspended]
Answer 3 · 2019-03-29 08:49

People are right about standard formats being best practice, but let's set that aside.

Assuming that the example you give is representative, the task is pretty trivial.

You show a line with an initial token, demarcated by a colon-space, then a list of comma-separated values. Separate at that first colon-space, and then use split() on the part to the right. Handling the quotes is trivial, too.
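A minimal sketch of that split-based approach, assuming the line format from the question and that quotes only ever wrap a whole value (the `SplitLineParser` class and its method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class SplitLineParser {
    private static int separatorIndex(String line) {
        int sep = line.indexOf(": ");
        if (sep < 0) {
            throw new IllegalArgumentException("No ': ' separator in: " + line);
        }
        return sep;
    }

    /** Everything before the first colon-space, e.g. "-barfoob". */
    public static String key(String line) {
        return line.substring(0, separatorIndex(line));
    }

    /** Splits the right-hand side on commas, trims, and strips one pair of surrounding quotes. */
    public static List<String> values(String line) {
        List<String> out = new ArrayList<>();
        for (String raw : line.substring(separatorIndex(line) + 2).split(",")) {
            String v = raw.trim();
            if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                v = v.substring(1, v.length() - 1);
            }
            out.add(v);
        }
        return out;
    }
}
```

For `-barfoob: boobs, foob, "foo bar"` this gives the token `-barfoob` and the values `boobs`, `foob`, `foo bar`. The catch is that a quoted value containing a comma, such as `"foo, bar"`, would be split in two by `split(",")`.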

Aperson
Answer 4 · 2019-03-29 08:51

Since the input is "formatted similarly to HTML", it is likely that your data is best represented using a tree-like structure, and also likely that it is XML or similar to XML.

If this is the case, I propose the smartest way to parse your file is to use an XML parser.

Here are some resources you may find helpful:

HTH

地球回转人心会变
Answer 5 · 2019-03-29 08:52

If the XML is valid, I personally prefer using XOM (http://www.xom.nu), simply because it features a nice DOM model. As pointed out, though, there are also XML parsers built into J2SE.
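As a sketch of the built-in route, the JDK's DOM parser (`javax.xml.parsers.DocumentBuilderFactory`) can read a document without any extra library. The XML layout used here (`<entry name="..."><value>...</value></entry>`) is a hypothetical encoding of the question's example line, and `JdkDomExample` is an illustrative name:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class JdkDomExample {
    // Parses an XML string into a DOM tree using the parser that ships with the JDK.
    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** The "name" attribute of the root element (hypothetical schema). */
    public static String entryName(String xml) {
        return parse(xml).getDocumentElement().getAttribute("name");
    }

    /** Text content of every <value> element, in document order. */
    public static List<String> values(String xml) {
        NodeList nodes = parse(xml).getElementsByTagName("value");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }
}
```

With `<entry name="barfoob"><value>boobs</value><value>foob</value><value>foo bar</value></entry>`, `entryName` returns `barfoob` and `values` returns `boobs`, `foob`, `foo bar`; the parser, not your code, deals with quoting and escaping.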

Rolldiameter
Answer 6 · 2019-03-29 08:55

This, plus digging through Wikipedia for related articles, will probably suffice.

Explosion°爆炸
Answer 7 · 2019-03-29 09:04

There's a reason that everyone assumes you're talking about XML: inventing a proprietary text-based file format requires very strong justification in the face of the maturity and easy availability of XML parsers.

And your question indicates that you have very little prior knowledge about parsers (otherwise you'd be writing an ANTLR or JavaCC grammar instead of asking this question), which is another strong argument against rolling your own, except as a learning experience.
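As a learning exercise, rolling your own for the value list can be a small character-level state machine, which is the usual way to get quoting right. This sketch assumes RFC 4180-style conventions (double quotes may wrap a field containing commas, and `""` inside quotes is a literal quote), does not handle newlines inside quotes, and trims whitespace around every field, including inside quotes, as a simplification. `CsvFieldTokenizer` is a hypothetical name; for real CSV files a maintained library is the safer choice.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvFieldTokenizer {
    /** Splits one line of comma-separated fields, honoring double quotes. */
    public static List<String> splitFields(String s) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < s.length() && s.charAt(i + 1) == '"') {
                        cur.append('"'); // "" inside quotes is an escaped literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cur.append(c); // commas are ordinary characters while quoted
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(cur.toString().trim()); // field boundary
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("Unterminated quote in: " + s);
        }
        fields.add(cur.toString().trim()); // last field
        return fields;
    }
}
```

Applied to the part of the example line after the first `: `, this yields `boobs`, `foob`, `foo bar`; unlike `split(",")`, a value such as `"b, c"` stays in one piece.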
