Question:
Obviously one could loop through a file using fgetl or a similar function and increment a counter, but is there a way to determine the number of lines in a file without such a loop?
Answer 1:
I like to use the following code for exactly this task:
fid = fopen('someTextFile.txt', 'rb');
%# Get file size.
fseek(fid, 0, 'eof');
fileSize = ftell(fid);
frewind(fid);
%# Read the whole file.
data = fread(fid, fileSize, 'uint8');
%# Count the line feeds; add one only if the last line has no trailing line feed.
numLines = sum(data == 10) + (~isempty(data) && data(end) ~= 10);
fclose(fid);
It is pretty fast if you have enough memory to read the whole file at once. It should work for both Windows- and Linux-style line endings.
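The claim about line endings can be checked with a minimal sketch. It assumes two made-up in-memory strings (lfText and crlfText are illustrative names, not part of the answer) instead of real files. Since a Windows-style ending is a carriage return followed by a line feed, counting bytes with value 10 should give the same result for both.

```matlab
% Illustrative data only: three lines each, with different line endings.
lfText   = sprintf('a\nb\nc\n');        % Linux-style (LF)
crlfText = sprintf('a\r\nb\r\nc\r\n');  % Windows-style (CR+LF)

% Same test as in the answer: count bytes equal to 10 (line feed).
countLF = @(bytes) sum(bytes == 10);

nLF   = countLF(uint8(lfText));
nCRLF = countLF(uint8(crlfText));
fprintf('LF: %d line feeds, CRLF: %d line feeds\n', nLF, nCRLF);
```

Files that use a bare carriage return as the line terminator contain no byte 10 and would not be counted this way.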
Edit: I measured the performance of the answers provided so far. Here is the result for determining the number of lines of a text file containing 1 million double values (one value per line). Average of 10 tries.
Author            Mean time +- standard deviation (s)
------------------------------------------------------
Rody Oldenhuis     0.3189 +- 0.0314
Edric (2)          0.3282 +- 0.0248
Mehrwolf           0.4075 +- 0.0178
Jonas              1.0813 +- 0.0665
Edric (1)         26.8825 +- 0.6790
So the fastest approaches are the ones that use Perl or read the whole file as binary data. I would not be surprised if Perl internally also reads large blocks of the file at once instead of looping through it line by line (just a guess; I do not know anything about Perl).
A simple fgetl() loop is slower than the other approaches by a factor of 25-75.
Edit 2: Included Edric's 2nd approach, which is much faster and on par with the Perl solution, I'd say.
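For anyone who wants to repeat this kind of measurement, here is a rough sketch of a timing loop. It is not the script used for the numbers above. The temporary file, its size, and countFcn are assumptions chosen for illustration; any of the counting methods from this thread could be substituted for countFcn.

```matlab
% Sketch only: the test file, countFcn, and numTries are illustrative assumptions.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
assert(fid ~= -1, 'Could not create: %s', fname);
fprintf(fid, '%g\n', rand(1000, 1));    % one double value per line
fclose(fid);

% Placeholder counting method: read everything and count line feeds.
countFcn = @(f) sum(fileread(f) == char(10));

numTries = 10;
times = zeros(numTries, 1);
for k = 1:numTries
    t0 = tic;
    countFcn(fname);                    % result discarded, only the time matters
    times(k) = toc(t0);
end

meanTime = mean(times);
stdTime  = std(times);
fprintf('%.4f +- %.4f s\n', meanTime, stdTime);
delete(fname);                          % remove the temporary file
```

The first run often includes file-system caching effects, so it may be worth discarding it or adding a warm-up call before the loop.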
Answer 2:
I think a loop is in fact the best - all other options suggested so far either rely on external programs (need to error-check; need str2num; harder to debug / run cross-platform etc.) or read the whole file in one go. Loops aren't so bad. Here's my variant:
function count = countLines(fname)
    fh = fopen(fname, 'rt');
    assert(fh ~= -1, 'Could not read: %s', fname);
    x = onCleanup(@() fclose(fh));
    count = 0;
    while ischar(fgetl(fh))
        count = count + 1;
    end
end
EDIT: Jonas rightly points out that the above loop is really slow. Here's a faster version.
function count = countLines(fname)
    fh = fopen(fname, 'rt');
    assert(fh ~= -1, 'Could not read: %s', fname);
    x = onCleanup(@() fclose(fh));
    count = 0;
    while ~feof(fh)
        count = count + sum( fread( fh, 16384, 'char' ) == char(10) );
    end
end
It's still not as fast as wc -l, but it's not a disaster either.
Answer 3:
I found a nice trick here:
if (isunix) %# Linux, mac
    %# Redirect the file into wc so that only the count is printed, not the file name.
    [status, result] = system( ['wc -l < ', 'your_file'] );
    numlines = sscanf(result, '%d', 1);
elseif (ispc) %# Windows
    numlines = str2num( perl('countlines.pl', 'your_file') );
else
    error('...');
end
where 'countlines.pl' is a Perl script containing
while (<>) {};
print $.,"\n";
Answer 4:
You can read the entire file at once, and then count how many lines you've read.
fid = fopen('yourFile.ext');
allText = textscan(fid,'%s','delimiter','\n');
numberOfLines = length(allText{1});
fclose(fid);
Answer 5:
I would recommend using an external tool for this, for example an app called cloc, which you can download here for free.
On Linux you then simply type cloc <repository path> and get
YourPC$ cloc <directory_path>
      87 text files.
      81 unique files.
      23 files ignored.

http://cloc.sourceforge.net v 1.60  T=0.19 s (311.7 files/s, 51946.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
MATLAB                          59           1009           1074           4993
HTML                             1              0              0             23
-------------------------------------------------------------------------------
SUM:                            60           1009           1074           5016
-------------------------------------------------------------------------------
They also claim it should work on Windows.