Alternatives for enhanced reading and parsing text

I need to read a variety of text files (some delimited, some fixed-width). I've considered parsing the files line by line (slow, using the StreamReader.ReadLine-style methods) and reading them through the ODBC text driver (faster), but does anyone have other, better suggestions? I'm using .NET/C#.

Tags: .net file-io text-files

9 answers

甜甜的少女心

#2 · 2019-04-12 00:27

Your question is a little vague. I assume that the text files contain structured data, not just random lines of text.

If you are parsing the files yourself, .NET has a library function that reads all the lines of a text file into an array of strings (File.ReadAllLines). If you know your files are small enough to hold in memory, you can use this method and iterate over the array, using a regular expression to validate each line and extract its fields.
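A minimal sketch of that approach, assuming a hypothetical comma-delimited layout of three fields (integer id, name, decimal amount). The LineParser class, the pattern, and the field names are illustrative and not part of the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class LineParser
{
    // Hypothetical record layout: id,name,amount (no quoted or embedded commas).
    private static readonly Regex RecordPattern = new Regex(
        @"^(?<id>\d+),(?<name>[^,]+),(?<amount>\d+(\.\d+)?)$",
        RegexOptions.Compiled);

    // Validates each line against the pattern and extracts its fields.
    // Lines that do not match are skipped.
    public static List<string[]> Parse(IEnumerable<string> lines)
    {
        var records = new List<string[]>();
        foreach (string line in lines)
        {
            Match m = RecordPattern.Match(line);
            if (!m.Success)
            {
                continue; // validation failed for this line
            }
            records.Add(new[]
            {
                m.Groups["id"].Value,
                m.Groups["name"].Value,
                m.Groups["amount"].Value
            });
        }
        return records;
    }

    // Loads the whole file into memory first, so this suits small files only.
    public static List<string[]> ParseFile(string path)
    {
        return Parse(File.ReadAllLines(path));
    }
}
```

A regular expression like this works for simple layouts; it does not handle quoted fields that contain the delimiter.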

Excel files are a different ball game. .XLS files are binary, not text, so you would need a third-party library to access them. .XLSX files from Excel 2007 contain compressed XML data, so you would need to decompress the file and then use an XML parser to get at the data. I would not recommend writing your own XML parser, unless you feel the need for the intellectual exercise.
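The decompress-then-parse step for .XLSX can be sketched roughly as follows. This assumes .NET 4.5 or later (for System.IO.Compression.ZipArchive); the XlsxPeek class and its method names are hypothetical. Part names such as xl/workbook.xml are the usual ones in an Excel package, but that is an assumption here, so it is safer to list the parts first:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class XlsxPeek
{
    // Lists the names of the parts (ZIP entries) inside the package.
    public static List<string> ListParts(Stream xlsx)
    {
        using (var zip = new ZipArchive(xlsx, ZipArchiveMode.Read, true))
        {
            return zip.Entries.Select(e => e.FullName).ToList();
        }
    }

    // Decompresses one part and loads it with the built-in XML parser.
    public static XDocument LoadPart(Stream xlsx, string partName)
    {
        using (var zip = new ZipArchive(xlsx, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = zip.GetEntry(partName);
            if (entry == null)
            {
                throw new FileNotFoundException("Part not found: " + partName);
            }
            using (Stream part = entry.Open())
            {
                return XDocument.Load(part);
            }
        }
    }
}
```

This only gets you to the raw XML; mapping cells, shared strings, and styles back to values is further work that a dedicated library would normally handle.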

chillily

#3 · 2019-04-12 00:29

I agree with John,

For example:

using System.IO;

...

public class Program {
  public static void Main() {
    // File.ReadAllLines loads the entire file into a string array.
    foreach (string s in File.ReadAllLines(@"c:\foo\bar\something.txt")) {
      // Do something with each line...
    }
  }
}

我想做一个坏孩纸

#4 · 2019-04-12 00:30

Regarding reading XLS files:

If you have Microsoft Office XP or later, you have access to the included .NET SDK Office libraries, with which you can "natively" read XLS, Word, PPT files, etc. Please note that under Office XP you have to select that option manually during installation (unless you had .NET installed beforehand).

I don't know if these libraries are available as a separate package if you don't have Microsoft Office.

For some obscure reason, all these libraries (including the latest versions from Office 2007, a.k.a. Office 12) are COM components that are a pain to use, cause ugly dependencies, and are not backwards compatible. For example, if you have some methods that work with Office XP (Office 11) and you install them at a customer site that has Office 12, they don't work, because some interfaces were changed. So you need to maintain two sets of "libraries" and methods to deal with that. The same holds true if you program against the Office 12 libraries and your customer has Office 11: your libraries don't work. :S

I don't know why Microsoft never created a Microsoft.Office.XXXX managed library (wrapper) around those ugly things.

Anyway, your question is quite strange, so try to follow some of the advice here. Good luck!

做自己的国王

#5 · 2019-04-12 00:31

The ODBC text driver is now rather out of date: it has no Unicode support.

Amazingly, MS Excel still uses it, so if you open a Unicode CSV in Excel 2007 (rather than import it) you lose all non-ASCII characters.

Your best bet is to use .NET's file reading methods, as others have suggested.
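As a rough illustration of reading with explicit Unicode handling, the sketch below streams a file line by line through StreamReader with a caller-supplied encoding. The UnicodeReader class name is hypothetical, and the right encoding for your files is an assumption you need to confirm:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class UnicodeReader
{
    // Lazily yields lines, so the whole file is never held in memory at once.
    public static IEnumerable<string> ReadLines(string path, Encoding encoding)
    {
        // The third argument lets a byte order mark, if present,
        // override the supplied encoding.
        using (var reader = new StreamReader(path, encoding, true))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```

A call such as UnicodeReader.ReadLines(path, Encoding.UTF8) can be used directly in a foreach loop; the file is closed once enumeration finishes or the enumerator is disposed.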

我欲成王，谁敢阻挡

#6 · 2019-04-12 00:33

Answering my own question:

I ended up using the Microsoft.VisualBasic.FileIO.TextFieldParser class; see:

http://msdn.microsoft.com/en-us/library/f68t4563.aspx

~~(example of implementation here)~~

This lets me handle CSV files without worrying about whether fields are enclosed in quotes, contain commas, contain escaped quotes, and so on.
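Since the implementation example linked above is no longer available, here is a minimal sketch of how TextFieldParser might be used for both delimited and fixed-width input. The FieldParserDemo class and its method names are hypothetical, the comma delimiter and field widths are assumptions, and the project needs a reference to the Microsoft.VisualBasic assembly:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class FieldParserDemo
{
    // Reads comma-delimited rows; quoted fields may contain commas
    // and doubled quotes.
    public static List<string[]> ReadDelimited(TextReader input)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }

    // Reads fixed-width rows; widths are the character counts of each column.
    public static List<string[]> ReadFixedWidth(TextReader input, params int[] widths)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.FixedWidth;
            parser.SetFieldWidths(widths);
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

To read from disk, pass a StreamReader (for example new StreamReader(path)) as the input; disposing the parser also closes the reader. Malformed lines cause ReadFields to throw MalformedLineException, which real code would probably want to catch and log.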

倾城　Initia

#7 · 2019-04-12 00:36

If the files are relatively small you can use the File class. It has these methods which may help you:

ReadAllBytes
ReadAllLines
ReadAllText
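A short sketch showing what each of the three methods returns for the same file. The FileReadDemo class and Measure method are hypothetical names used only for illustration:

```csharp
using System;
using System.IO;

public static class FileReadDemo
{
    // Returns the byte count, line count, and character count of one file,
    // obtained from the three File.ReadAll* methods respectively.
    public static Tuple<int, int, int> Measure(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);   // raw bytes, no decoding
        string[] lines = File.ReadAllLines(path); // one string per line
        string text = File.ReadAllText(path);     // entire file as one string
        return Tuple.Create(bytes.Length, lines.Length, text.Length);
    }
}
```

All three load the whole file into memory in one call, which is why they are best kept for relatively small files.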
