Read a txt file: fscanf vs. fread vs. textscan [duplicate]

Posted 2019-07-21 06:24

Question:

This question already has an answer here:

  • Fastest Matlab file reading? 4 answers

I have a .txt file that has been generated from SQL-2005 (in ANSI format). I have tried textscan and fscanf. The entire txt file has only numeric data.

Online resources suggest that fscanf is faster than textscan, but I found the opposite:

  • Textscan was much faster than fscanf

I want to try this with fread as well but I do not know how to import data using fread. Can you please suggest/comment? Thanks.

fName     = 'Test.txt';   % From SQL in ANSI format, 5 million rows, 5 columns
Numofrows = 1000000;      % read the first 1 million rows
Numcols   = 5;

% Method 1: textscan returns one cell per column
fid = fopen(fName, 'r');
C   = textscan(fid, '%f %f %f %f %f', Numofrows);
C   = cell2mat(C);
fclose(fid);

% Method 2: fscanf returns one long column vector in file order
fid = fopen(fName, 'r');
[C, Count] = fscanf(fid, '%f %f %f %f %f', Numofrows * Numcols);
fclose(fid);
C = reshape(C, Numcols, Count / Numcols).';   % one row per line of the file

Answer 1:

Ideally you would be able to get your data into a binary format and then use fread to read the double-precision numbers in directly. I would expect fread to be a lot faster in that case. (String-to-number conversions are expensive, and a raw binary format will result in a much smaller file.)
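As a minimal sketch of the binary route, assuming the table can be exported as raw doubles: the file name Test.bin and the random stand-in data below are hypothetical, not part of the question.

```matlab
% Hypothetical file name and stand-in data, for illustration only
binName = 'Test.bin';
A = rand(1000, 5);                 % stands in for the 5-column numeric table

% Write: fwrite stores values column by column, so transpose first
% to keep the 5 values of each row next to each other in the file
fid = fopen(binName, 'w');
fwrite(fid, A.', 'double');
fclose(fid);

% Read: [5 Inf] asks for 5 values per column and as many columns as exist
fid = fopen(binName, 'r');
B = fread(fid, [5 Inf], 'double');
fclose(fid);
B = B.';                           % back to one row per record
```

No string-to-number conversion happens on the read side, which is where the expected speed-up would come from.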

Otherwise you can read characters using fread and then run a string-to-number conversion on the incoming data (sscanf seems to be the best). The only trick is that each batch you read must end on a line break; otherwise the string-to-number conversion is likely to give unpredictable results. You can do that by first reading a large batch of characters, then either backing up until you reach a line break or reading additional characters until you reach the end of the line. I have found this to be slightly faster than either textscan or fscanf ... but our numbers do not match for other reasons; I'm not sure what to believe.

Example code for the second method, along with some timing results, is included in a previous answer that overlaps heavily with this question: https://stackoverflow.com/a/9441839/931379
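For orientation, here is a rough sketch of the batch-reading idea described above. It is not the code from the linked answer. It uses the "read additional characters until the end of the line" variant, with fgetl picking up the rest of a line that a batch cut in two. The sample file name, its contents, and the tiny batch size are assumptions chosen so the sketch runs on its own; a real file would want a far larger batch.

```matlab
% Hypothetical sample file so the sketch runs on its own
fName = 'sample.txt';
A = reshape(1:50, 5, 10).';            % 10 rows, 5 columns
fid = fopen(fName, 'w');
fprintf(fid, '%d %d %d %d %d\n', A.');
fclose(fid);

Numcols   = 5;
batchSize = 64;                        % tiny on purpose; use e.g. 2^20 or more for real files

fid   = fopen(fName, 'r');
parts = {};                            % parsed numbers from each batch
while true
    txt = fread(fid, batchSize, '*char').';   % raw characters as a row vector
    if isempty(txt)
        break                          % nothing left to read
    end
    tail = fgetl(fid);                 % remainder of the line the batch stopped in
    if ischar(tail)                    % fgetl returns -1 (numeric) at end of file
        txt = [txt, tail, char(10)];   % batch now ends on a line break
    end
    parts{end+1} = sscanf(txt, '%f');  % one conversion call per batch
end
fclose(fid);

C = reshape(vertcat(parts{:}), Numcols, []).';   % one row per line of the file
```

Growing the cell array in the loop is fine for a handful of large batches; whether this beats textscan on a given machine is something to measure rather than assume.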



Answer 2:

There is another option that you did not list: load

   L = load(fName);

It is very simple, and will figure out the format automatically for you. It does have a limitation: every line must contain the same number of values.
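A small sketch of load on a numeric text file, using a hypothetical sample file so it runs on its own:

```matlab
% Hypothetical sample file, for illustration only
fName = 'sample_load.txt';
M = magic(5);                          % 5 rows, 5 columns of integers
fid = fopen(fName, 'w');
fprintf(fid, '%g %g %g %g %g\n', M.');
fclose(fid);

% A file with an extension other than .mat is read as plain numeric text
L = load(fName);
disp(size(L))                          % 5-by-5 for this sample
```

If one line had fewer values than the others, load would raise an error instead of padding the row.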