This question already has an answer here: Fastest Matlab file reading? (4 answers)
I have a .txt file that was generated from SQL-2005 (in ANSI format). I have tried textscan and fscanf. The entire txt file contains only numeric data.
Online resources suggest that fscanf is faster than textscan, but I found the opposite: textscan was much faster than fscanf.
I want to try this with fread as well, but I do not know how to import data using fread. Can you please suggest/comment? Thanks.
fName = 'Test.txt';      % From SQL in ANSI format, 5 million rows, 5 cols
Numofrows = 1000000;     % read the first 1 million rows
Numcols = 5;

% Method 1: textscan
fid = fopen(fName, 'r');
C = textscan(fid, '%f %f %f %f %f', Numofrows);
C = cell2mat(C);         % Numofrows-by-Numcols matrix
fclose(fid);

% Method 2: fscanf
fid = fopen(fName, 'r');
[C, Count] = fscanf(fid, '%f %f %f %f %f', Numofrows * Numcols);
fclose(fid);
% fscanf returns one long column; reshape by the count actually read, then transpose
C = reshape(C, Numcols, Count / Numcols).';
Ideally you would get your data into a binary format and then use fread to read the double-precision numbers in directly. I would expect fread to be a lot faster in that case. (String-to-number conversions are expensive, and a raw binary format will result in a much smaller file.)
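As a rough sketch of the binary route, the snippet below writes a numeric matrix to a raw file of doubles with fwrite and reads it back with fread. The matrix A is random stand-in data, and the temporary file name and 5-column layout are assumptions for illustration; in practice A would come from a one-time parse of the text export.

```matlab
numCols = 5;
A = rand(1000, numCols);          % stand-in for the parsed table (assumption)
binName = [tempname '.bin'];      % hypothetical output file

% One-time conversion: store the values as raw 8-byte doubles.
fid = fopen(binName, 'w');
fwrite(fid, A.', 'double');       % transpose so each row is written contiguously
fclose(fid);

% Fast path: read the doubles straight back, with no string parsing.
fid = fopen(binName, 'r');
B = fread(fid, [numCols, Inf], 'double').';   % numCols-by-N on disk, N-by-numCols in memory
fclose(fid);

delete(binName);                  % clean up the demo file
```

The size argument [numCols, Inf] lets fread work out the number of rows from the file length, so the row count does not need to be stored separately.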
Otherwise you can read characters using fread and then run a string-to-number conversion on the incoming data (sscanf seems to be the best). The only trick is that each batch you read must end on a line break; otherwise a number can be cut in half and the string-to-number conversion is likely to give unpredictable results. You can do that by first reading a large batch of characters, then either backing up until you reach a line break, or reading additional characters until you find the end of the line. I have found this is slightly faster than either textscan or fscanf ... but our numbers do not match for other reasons; I'm not sure what to believe.
Example code for the second method, along with some timing results, is included in a previous answer that overlaps heavily with this question: https://stackoverflow.com/a/9441839/931379.
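For reference, here is a minimal sketch of the batch-reading idea (not the code from the linked answer). Instead of seeking backwards, it cuts each batch at its last line break and carries the incomplete tail over into the next batch, which has the same effect. The function name readNumericChunks and its arguments are hypothetical, and the file is assumed to contain only whitespace-separated numbers with the same count on every line.

```matlab
function data = readNumericChunks(fName, numCols, chunkSize)
% Hypothetical helper: read an all-numeric text file in character batches.
    nl = char(10);                            % line feed
    fid = fopen(fName, 'r');
    if fid < 0, error('Could not open %s', fName); end
    cleanup = onCleanup(@() fclose(fid));     % close the file even on error

    parts = {};
    leftover = '';                            % partial last line of the previous batch
    while true
        raw = fread(fid, chunkSize, 'uint8=>char').';
        if isempty(raw), break; end           % end of file
        buf = [leftover, raw];
        lastNL = find(buf == nl, 1, 'last');
        if isempty(lastNL)
            leftover = buf;                   % no complete line yet, keep accumulating
        else
            parts{end+1} = sscanf(buf(1:lastNL), '%f'); %#ok<AGROW>
            leftover = buf(lastNL+1:end);
        end
    end
    parts{end+1} = sscanf(leftover, '%f');    % last line if it has no trailing newline

    % sscanf returns values in file order, so reshape column-wise and transpose.
    data = reshape(vertcat(parts{:}), numCols, []).';
end
```

A call such as data = readNumericChunks('Test.txt', 5, 2^20) would read about one megabyte of characters per batch; the best batch size is something to measure on your own data.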
There is another option that you did not list: load
L = load(fName);
It is very simple and will figure out the format automatically for you. It does have a limitation: every line must contain the same number of values.
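A small self-contained illustration, assuming a whitespace-delimited numeric file; the temporary file and its contents are made up for the demo:

```matlab
% Write a small 5-column demo file (hypothetical data).
demoName = [tempname '.txt'];
fid = fopen(demoName, 'w');
fprintf(fid, '%g %g %g %g %g\n', magic(5).');   % one row of magic(5) per line
fclose(fid);

% load returns the contents as a double matrix, one row per line.
L = load(demoName, '-ascii');

% A file with uneven rows makes load throw an error instead.
fid = fopen(demoName, 'w');
fprintf(fid, '1 2 3\n4 5\n');
fclose(fid);
try
    load(demoName, '-ascii');
    raggedOk = true;
catch
    raggedOk = false;    % uneven column counts are rejected
end
delete(demoName);
```

Passing '-ascii' just forces text mode regardless of the file extension; for a .txt file, plain load(fName) behaves the same way.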