In a project, I have to read a file and work with the number of characters in it. Is there a way to get the number of characters without reading it character by character? Otherwise I will have to read the file twice, once just to find the number of characters in it.
Is it even possible?
You can try this:
FILE *fp = ... /*open as usual*/;
fseek(fp, 0L, SEEK_END);
long fileSize = ftell(fp); /* ftell returns a long, or -1L on error */
However, this returns the number of bytes in the file, not the number of characters. It is not the same unless the encoding is known to be one byte per character (e.g. ASCII).
You'd need to "rewind" the file back to the beginning after you've learned the size:
fseek(fp, 0L, SEEK_SET);
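Putting the seek, ftell and rewind steps together, a minimal sketch might look like the following. It is written as C++ using <cstdio> (the same calls work in plain C without the std:: prefixes), and stream_size_bytes is a hypothetical helper name. It assumes the stream was opened in binary mode, because for text streams the C standard only promises that ftell's value can be passed back to fseek, not that it is a byte count.

```cpp
#include <cstdio>

// Hypothetical helper: returns the size of an open stream in bytes, or -1L
// on error. Seeks to the end, reads the position, then seeks back to the
// start so the caller can read the contents afterwards.
// Assumes the stream was opened in binary mode ("rb").
long stream_size_bytes(std::FILE* fp) {
    if (std::fseek(fp, 0L, SEEK_END) != 0) {
        return -1L;                             // not seekable (e.g. a pipe)
    }
    long size = std::ftell(fp);                 // long, -1L on error
    if (std::fseek(fp, 0L, SEEK_SET) != 0) {    // "rewind" for later reads
        return -1L;
    }
    return size;
}
```

For example, after opening a file with std::fopen(path, "rb"), a call to stream_size_bytes gives the byte count and leaves the stream positioned at the start, ready for the single read pass. Pipes and terminals usually cannot seek, which is what the -1L path covers.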
Yes.
Seek to the end and get the position of the end; that is the size.
FILE* file = fopen("Plop", "rb"); // fopen needs a mode; "rb" opens for binary reading.
fseek(file, 0, SEEK_END);
long size = ftell(file); // This is the size of the file (ftell returns a long).
// But note it is in bytes.
// Also note if you are reading it into memory this
// is the value you want unless you plan to dynamically
// convert the character encoding as you read.
fseek(file, 0, SEEK_SET); // Move the position back to the start.
In C++, streams have the same functionality:
std::ifstream file("Plop", std::ios_base::binary);
file.seekg(0, std::ios_base::end);
std::streamoff size = file.tellg(); // -1 on failure
file.seekg(0, std::ios_base::beg);
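If the goal is to read the whole file into memory, the stream version can size the buffer up front. This is a minimal sketch, assuming the file fits in memory; read_whole_file is a hypothetical name, and binary mode is used so that the tellg value matches what read() delivers.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads an entire file into a std::string, using
// seekg/tellg to learn the size before reading.
std::string read_whole_file(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }
    file.seekg(0, std::ios_base::end);
    std::streamoff size = file.tellg();           // -1 on failure
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    file.seekg(0, std::ios_base::beg);

    std::string contents(static_cast<std::size_t>(size), '\0');
    file.read(&contents[0], size);
    // Keep only what was actually read, in case the file shrank meanwhile.
    contents.resize(static_cast<std::size_t>(file.gcount()));
    return contents;
}
```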
The simple answer is no. More precisely, it's system dependent: under Unix, it's possible (e.g. using stat()); under Windows, it's not possible for a text file, but if you're reading the file in binary, there's a function GetFileSize which can be used.
Although not guaranteed, under all of the implementations I know (for these two platforms), seeking to the end of the file and then calling ftell will return something which, when converted to a sufficiently large integral type, gives the same results as the above (with the same restrictions).
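On a Unix-like system the two approaches can be compared directly. The sketch below assumes a POSIX environment (for stat() and <sys/stat.h>); stat_size and ftell_size are hypothetical helper names. For a regular file opened in binary mode, both should report the same byte count; neither says anything about the character encoding.

```cpp
#include <cstdio>
#include <sys/stat.h>

// Hypothetical helper, POSIX only: asks the file system for the size
// without opening or reading the file. Returns -1 on error.
long long stat_size(const char* path) {
    struct stat info;
    if (stat(path, &info) != 0) {
        return -1;
    }
    return static_cast<long long>(info.st_size);   // st_size is an off_t
}

// Hypothetical helper for comparison: seek to the end of a binary stream
// and ask ftell for the position. Returns -1 on error.
long long ftell_size(const char* path) {
    std::FILE* fp = std::fopen(path, "rb");
    if (fp == nullptr) {
        return -1;
    }
    long long size = -1;
    if (std::fseek(fp, 0L, SEEK_END) == 0) {
        size = std::ftell(fp);    // long, widened to long long
    }
    std::fclose(fp);
    return size;
}
```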
Finally: why do you need this information? If it's just to allocate an appropriately sized buffer, even with a text file, GetFileSize (and ftell after seeking to the end) will return a value slightly larger than the number of bytes you can read, because reading in text mode on Windows turns each CR-LF pair into a single '\n'. Your buffer will be slightly oversized, but this is generally not a problem.
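The oversized-buffer idea can be sketched as follows: use the size from seeking to the end as an upper bound, then shrink the buffer to the count fread actually returns. This assumes, as described above, that ftell after seeking to the end yields a byte count, which holds on the implementations mentioned but is not guaranteed for text streams; read_text_file is a hypothetical name.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper: reads a file in text mode into a buffer sized by
// seek-to-end + ftell, then trims it to the number of bytes fread returned.
// With Windows CR-LF translation, fread may return fewer bytes than the
// file size; the surplus is simply dropped. Returns an empty vector on
// error (indistinguishable from an empty file in this sketch).
std::vector<char> read_text_file(const char* path) {
    std::vector<char> buffer;
    std::FILE* fp = std::fopen(path, "r");        // text mode on purpose
    if (fp == nullptr) {
        return buffer;
    }
    if (std::fseek(fp, 0L, SEEK_END) == 0) {
        long size = std::ftell(fp);
        if (size > 0 && std::fseek(fp, 0L, SEEK_SET) == 0) {
            buffer.resize(static_cast<std::size_t>(size));     // upper bound
            std::size_t got = std::fread(buffer.data(), 1, buffer.size(), fp);
            buffer.resize(got);                                  // actual bytes
        }
    }
    std::fclose(fp);
    return buffer;
}
```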
I think you are likely looking for a dynamic memory solution. What you actually asked is "is there a way to get the number of characters in a file without reading it?". The answer (assuming one byte per character) is yes: you can use the stat() call to get the file size, and the file size in bytes is the number of characters. With UTF-8 the answer is no, but let's put that aside for the moment, since people just learning computer science usually don't worry about internationalization.
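To see why UTF-8 rules out the shortcut, here is a minimal sketch of counting characters (Unicode code points) in a UTF-8 file. It assumes the file contains valid UTF-8 and relies on the fact that each code point has exactly one byte that is not a continuation byte (10xxxxxx), so counting the other bytes counts code points. count_utf8_code_points is a hypothetical name, and it still has to read every byte.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: counts code points in a file assumed to hold valid
// UTF-8. Returns 0 if the file cannot be opened.
std::size_t count_utf8_code_points(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::size_t count = 0;
    char c;
    while (in.get(c)) {
        unsigned char byte = static_cast<unsigned char>(c);
        if ((byte & 0xC0) != 0x80) {   // ASCII or lead byte, not 10xxxxxx
            ++count;
        }
    }
    return count;
}
```

For instance, "héllo" encoded in UTF-8 is 6 bytes on disk but 5 code points, so the file size alone overcounts.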
I think the reason you want to know how many characters there are is so that you can have storage big enough to hold them all. You don't need to know how big the file is to store the whole thing.
If you have a std::vector<char>, it can start out able to hold ten characters, then grow to hold twenty, then ten thousand... And when you're done reading the file, it will hold them all, even though you never knew how many there would be.
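A minimal sketch of the growing-vector approach, assuming the file should be read as raw bytes. read_into_vector is a hypothetical name; push_back grows the vector's capacity as needed, so the file size is never queried.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file without knowing its size first.
// Each push_back may trigger a reallocation to a larger capacity; the
// vector handles that internally. Returns an empty vector if the file
// cannot be opened.
std::vector<char> read_into_vector(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> data;
    char c;
    while (in.get(c)) {
        data.push_back(c);
    }
    return data;
}
```

Reading one char at a time keeps the idea obvious; constructing the vector from std::istreambuf_iterator<char> or reading fixed-size chunks are common alternatives that do the same job with fewer calls.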
Off the top of my head: couldn't you look at the file size and divide it by the number of bytes a single character takes?
Problems arise when dealing with whitespace, line endings, etc.
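The division idea and the line-ending problem can both be made concrete. In this sketch (all helper names hypothetical), fixed_width_char_count does the division and returns -1 when the size is not a whole multiple of the width; count_chars_collapsing_crlf reads the file and counts each "\r\n" pair as one character, which is what a program reading the file in text mode on Windows would see. It assumes a one-byte-per-character encoding such as ASCII.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: characters = bytes / width, valid only for a
// fixed-width encoding (e.g. 1 byte for ASCII, 4 bytes for UTF-32).
// Returns -1 if the width is invalid or the size doesn't divide evenly.
long long fixed_width_char_count(long long size_in_bytes, int bytes_per_char) {
    if (bytes_per_char <= 0 || size_in_bytes % bytes_per_char != 0) {
        return -1;
    }
    return size_in_bytes / bytes_per_char;
}

// Hypothetical helper: counts one-byte characters, treating each "\r\n"
// pair as a single character. Returns -1 if the file cannot be opened.
long long count_chars_collapsing_crlf(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return -1;
    }
    long long count = 0;
    bool prev_cr = false;
    char c;
    while (in.get(c)) {
        if (!(c == '\n' && prev_cr)) {   // skip the '\n' of a "\r\n" pair
            ++count;
        }
        prev_cr = (c == '\r');
    }
    return count;
}
```

A file containing "a\r\nb\r\n" is 6 bytes, so dividing by 1 gives 6, while a Windows text-mode reader sees "a\nb\n", which is 4 characters.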