Quickly replace first line of large file

Posted 2019-04-04 11:43

Question:

I have many large CSV files (1-10 GB each) which I'm importing into databases. For each file, I need to replace the first line so I can format the headers to be the column names. My current solution is:

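// "fixed" is a reserved keyword in C#, so the output path is named fixedPath here.
using (var reader = new StreamReader(file))
{
    using (var writer = new StreamWriter(fixedPath))
    {
        // Rewrite only the header line.
        var line = reader.ReadLine();
        var fixedLine = parseHeaders(line);
        writer.WriteLine(fixedLine);

        // Copy every remaining line unchanged.
        while ((line = reader.ReadLine()) != null)
            writer.WriteLine(line);
    }
}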

What is a quicker way to only replace line 1 without iterating through every other line of these huge files?

Answer 1:

If you can guarantee that fixedLine takes no more bytes than line, you can update the files in place instead of copying them.

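If not, you may get a small performance improvement by working with the underlying streams (the .BaseStream of your StreamReader and StreamWriter) and copying in big blocks, using, say, a 32 KB buffer. That at least eliminates the time reader.ReadLine() currently spends checking every character to see whether it is an end-of-line character. Keep in mind that StreamReader reads ahead, so after ReadLine() the BaseStream position is usually already past the end of the first line; the block copy has to start from the byte offset just after the header.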
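
As a rough illustration of the block-copy idea, here is a minimal sketch that reads the header directly from a FileStream (sidestepping the StreamReader read-ahead issue) and then copies the rest with Stream.CopyTo. The class and method names are made up for this example, parseHeaders is passed in as a delegate standing in for the question's function, and the code assumes UTF-8 or ASCII text without a BOM.

```csharp
using System;
using System.IO;
using System.Text;

static class HeaderRewriter
{
    // Hypothetical helper: copies sourcePath to targetPath, replacing only the first line.
    // Assumption: UTF-8/ASCII without a BOM, so the byte 0x0A always means '\n'.
    public static void ReplaceHeaderByCopy(string sourcePath, string targetPath,
                                           Func<string, string> parseHeaders)
    {
        const int BufferSize = 32 * 1024;
        using (var input = new FileStream(sourcePath, FileMode.Open, FileAccess.Read,
                                          FileShare.Read, BufferSize))
        using (var output = new FileStream(targetPath, FileMode.Create, FileAccess.Write,
                                           FileShare.None, BufferSize))
        {
            // Read the first line byte by byte; FileStream buffers internally.
            var headerBytes = new MemoryStream();
            int b;
            while ((b = input.ReadByte()) != -1 && b != '\n')
                headerBytes.WriteByte((byte)b);

            // Keep the file's own line-ending style for the new header.
            string raw = Encoding.UTF8.GetString(headerBytes.ToArray());
            string newline = raw.EndsWith("\r") ? "\r\n" : "\n";
            string oldHeader = raw.TrimEnd('\r');

            byte[] newHeader = Encoding.UTF8.GetBytes(parseHeaders(oldHeader) + newline);
            output.Write(newHeader, 0, newHeader.Length);

            // The input position is now just past the first line:
            // copy everything else in large blocks, with no per-line parsing.
            input.CopyTo(output, BufferSize);
        }
    }
}
```

It could be called as HeaderRewriter.ReplaceHeaderByCopy(file, fixedPath, parseHeaders). The whole file is still copied, so any gain comes only from skipping the per-line work.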



Answer 2:

The only thing that can significantly speed this up is truly replacing the first line in place. If the new first line is no longer than the old one, carefully overwrite the first line, padding with spaces if needed.
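
A minimal sketch of that in-place overwrite might look like the following. The names are invented for this example, parseHeaders is again passed in as a delegate, and the code assumes UTF-8 or ASCII without a BOM and a header that contains no embedded line breaks.

```csharp
using System;
using System.IO;
using System.Text;

static class InPlaceHeader
{
    // Hypothetical helper: overwrites the first line without touching the rest of the file.
    // Returns false, leaving the file unchanged, if the new header needs more bytes.
    public static bool TryReplaceHeaderInPlace(string path, Func<string, string> parseHeaders)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite,
                                           FileShare.None))
        {
            // Collect the old header bytes, stopping before the line terminator.
            var oldBytes = new MemoryStream();
            int b;
            while ((b = stream.ReadByte()) != -1 && b != '\n' && b != '\r')
                oldBytes.WriteByte((byte)b);

            int oldLength = (int)oldBytes.Length;
            string oldHeader = Encoding.UTF8.GetString(oldBytes.ToArray());
            byte[] newBytes = Encoding.UTF8.GetBytes(parseHeaders(oldHeader));
            if (newBytes.Length > oldLength)
                return false;

            // Pad with spaces so the terminator and all data rows keep their offsets.
            byte[] padded = new byte[oldLength];
            for (int i = 0; i < padded.Length; i++)
                padded[i] = (byte)' ';
            Array.Copy(newBytes, padded, newBytes.Length);

            stream.Seek(0, SeekOrigin.Begin);
            stream.Write(padded, 0, padded.Length);
            return true;
        }
    }
}
```

The padding ends up as trailing spaces on the last column name, so the import step would need to trim header names. Lengths are compared in bytes, not characters, which matters once the headers contain non-ASCII text.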

Otherwise, you have to create a new file and copy everything after the first line into it. You may be able to optimize the copying a bit by adjusting buffer sizes, copying explicitly as binary, or pre-allocating the output size, but that will not change the fact that you need to copy the whole file.

One more cheat if you are planning to load the CSV data into a DB anyway: if row order does not matter, you can read a few lines from the beginning, overwrite them with the new header, and append the removed lines to the end of the file.
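
To make the trick concrete, here is a hedged sketch under several assumptions: UTF-8 or ASCII without a BOM, '\n' line endings, no quoted fields with embedded newlines, and a hypothetical method name chosen for this example. It frees enough room at the start for the longer header by moving whole data rows to the end.

```csharp
using System;
using System.IO;
using System.Text;

static class HeaderRotate
{
    // Hypothetical helper: installs a longer header by moving the first data rows
    // to the end of the file. Row order is NOT preserved.
    // Returns false, leaving the file unchanged, if the file is too short.
    public static bool ReplaceHeaderByMovingRows(string path, string newHeader)
    {
        byte[] header = Encoding.UTF8.GetBytes(newHeader);
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite,
                                           FileShare.None))
        {
            // Consume whole lines until the region can hold the new header plus '\n'.
            var region = new MemoryStream();
            int oldHeaderLength = -1;   // old header length including its '\n'
            do
            {
                int b;
                do
                {
                    b = stream.ReadByte();
                    if (b == -1) return false;   // ran out of complete lines
                    region.WriteByte((byte)b);
                } while (b != '\n');
                if (oldHeaderLength < 0) oldHeaderLength = (int)region.Length;
            } while (region.Length < header.Length + 1);

            byte[] all = region.ToArray();

            // Make sure the file ends with a newline, then append the displaced rows.
            stream.Seek(-1, SeekOrigin.End);
            if (stream.ReadByte() != '\n') stream.WriteByte((byte)'\n');
            stream.Write(all, oldHeaderLength, all.Length - oldHeaderLength);

            // Overwrite the freed region: header, space padding, final '\n'.
            byte[] replacement = new byte[all.Length];
            for (int i = 0; i < replacement.Length; i++)
                replacement[i] = (byte)' ';
            Array.Copy(header, replacement, header.Length);
            replacement[replacement.Length - 1] = (byte)'\n';
            stream.Seek(0, SeekOrigin.Begin);
            stream.Write(replacement, 0, replacement.Length);
            return true;
        }
    }
}
```

The rows are appended before the header region is overwritten, so an interruption between the two writes would leave duplicated rows rather than lost ones. As with any in-place overwrite, the padding shows up as trailing spaces on the last column name.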

Side note: if this is a one-time operation, I'd simply copy the files and be done with it. Debugging code that inserts data into the middle of a text file with a potentially different encoding may not be worth the effort.



Tags: c# replace