I feel embarrassed to ask this question as I feel like I should already know. However, given I don't....I want to know how to read large files from disk to a database without getting an OutOfMemory exception. Specifically, I need to load CSV (or really tab delimited files).
I am experimenting with CSVReader
and specifically this code sample but I'm sure I'm doing it wrong. Some of their other coding samples show how you can read streaming files of any size, which is pretty much what I want (only I need to read from disk), but I don't know what type of IDataReader
I could create to allow this.
I am reading directly from disk and my attempt to ensure I don't ever run out of memory by reading too much data at once is below. I can't help thinking that I should be able to use a BufferedFileReader
or something similar where I can point to the location of the file and specify a buffer size and then CsvDataReader
expects an IDataReader
as it's first parameter, it could just use that. Please show me the error of my ways, let me be rid of my GetData
method with it's arbitrary file chunking mechanism and help me out with this basic problem.
private void button3_Click(object sender, EventArgs e)
{
totalNumberOfLinesInFile = GetNumberOfRecordsInFile();
totalNumberOfLinesProcessed = 0;
while (totalNumberOfLinesProcessed < totalNumberOfLinesInFile)
{
TextReader tr = GetData();
using (CsvDataReader csvData = new CsvDataReader(tr, '\t'))
{
csvData.Settings.HasHeaders = false;
csvData.Settings.SkipEmptyRecords = true;
csvData.Settings.TrimWhitespace = true;
for (int i = 0; i < 30; i++) // known number of columns for testing purposes
{
csvData.Columns.Add("varchar");
}
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(@"Data Source=XPDEVVM\XPDEV;Initial Catalog=MyTest;Integrated Security=SSPI;"))
{
bulkCopy.DestinationTableName = "work.test";
for (int i = 0; i < 30; i++)
{
bulkCopy.ColumnMappings.Add(i, i); // map First to first_name
}
bulkCopy.WriteToServer(csvData);
}
}
}
}
private TextReader GetData()
{
StringBuilder result = new StringBuilder();
int totalDataLines = 0;
using (FileStream fs = new FileStream(pathToFile, FileMode.Open, System.IO.FileAccess.Read, FileShare.ReadWrite))
{
using (StreamReader sr = new StreamReader(fs))
{
string line = string.Empty;
while ((line = sr.ReadLine()) != null)
{
if (line.StartsWith("D\t"))
{
totalDataLines++;
if (totalDataLines < 100000) // Arbitrary method of restricting how much data is read at once.
{
result.AppendLine(line);
}
}
}
}
}
totalNumberOfLinesProcessed += totalDataLines;
return new StringReader(result.ToString());
}