I have a program in C++ that needs to return a line that a specific word appears in. For instance, if my file looks like this:
the cow jumped over
the moon with the
green cheese in his mouth
and I need to print the line that has "with". All the program gets is the offset from the beginning of the file (in this case 29, since "with" starts 29 characters from the beginning of the file, counting the newline after the first line as one character).
How do I print the whole line "the moon with the", with just the offset?
Thanks a lot!
A good solution is reading the file from the beginning until the desired position (answer by @Chet Simpson). If you need something faster (e.g. a very large file, a position somewhere in the middle, typically short lines), you can read the file backwards from the offset. However, this only works with files opened in binary mode (on Unix-like platforms every file qualifies; on Windows, open the file with the ios_base::binary flag).
The algorithm goes as follows:
- Go back a few bytes in the file
- Read those few bytes
- If there is an end-of-line among them, the line starts right after it: seek there and read the line
- Otherwise, go back further and repeat
Code (tested on Windows):
#include <istream>
#include <stdexcept>
#include <string>

std::string GetSurroundingLine(std::istream& f, std::istream::pos_type start_pos)
{
    std::streamoff prev_pos = start_pos; // end (exclusive) of the chunk to examine
    std::streamoff pos;
    char buffer[40]; // typical line length, so typical iteration count is 1
    const std::streamoff size = sizeof(buffer);
    // Look for the beginning of the line that includes the given position
    while (true)
    {
        // Move back up to 40 bytes from prev_pos
        pos = (prev_pos < size) ? 0 : prev_pos - size;
        const std::streamoff count = prev_pos - pos;
        f.seekg(pos);
        // Read the bytes between pos and prev_pos
        f.read(buffer, count);
        if (!f)
            throw std::runtime_error("GetSurroundingLine: read failed");
        // Look backwards for a newline byte, which terminates the previous line;
        // only the bytes actually read are examined
        std::streamoff eol_pos;
        for (eol_pos = count - 1; eol_pos >= 0; --eol_pos)
            if (buffer[eol_pos] == '\n')
                break;
        // If found newline or got to beginning of file - done looking
        if (eol_pos >= 0 || pos == 0)
        {
            pos += eol_pos + 1;
            break;
        }
        // No newline in this chunk: continue with the chunk before it
        prev_pos = pos;
    }
    // Position the read pointer
    f.seekg(pos);
    // Read the line
    std::string s;
    std::getline(f, s, '\n');
    return s;
}
Edit: On Windows-like platforms, where end-of-line is marked by \r\n, the output string will contain the extra character \r because you have to use binary mode (unless it is the last line and there is no end-of-line at end-of-file). You can throw that character away.
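Throwing away the extra \r could look roughly like the sketch below. StripTrailingCR is a hypothetical helper name, not part of the code above, and it assumes C++11 or later for back() and pop_back().

```cpp
#include <string>

// Hypothetical helper (name is an assumption): removes a single trailing '\r'
// left over from a "\r\n" line ending when the file was read in binary mode.
std::string StripTrailingCR(std::string line)
{
    if (!line.empty() && line.back() == '\r')
        line.pop_back();
    return line;
}
```

It would be applied to the result, e.g. StripTrailingCR(GetSurroundingLine(f, pos)). Lines without a trailing \r are returned unchanged.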
You can do this by reading each line individually and recording the file position before and after the read. Then it's just a simple check to see if the offset of the word falls within the bounds of that line.
#include <iostream>
#include <fstream>
#include <string>

std::string LineFromOffset(
    const std::string &filename,
    std::istream::pos_type targetIndex)
{
    // Binary mode keeps stream positions equal to byte offsets on every platform
    std::ifstream input(filename, std::ios::binary);
    // Save the start position of the first line. Should be zero of course.
    std::istream::pos_type lineStartIndex = input.tellg();
    std::string line;
    // Loop on getline itself rather than eof(), so a failed read ends the loop
    while (std::getline(input, line))
    {
        // Get the end position of the line
        std::istream::pos_type lineEndIndex = input.tellg();
        // tellg() reports -1 once end-of-file was hit, i.e. when the last line
        // has no trailing newline; fall back to computing the end from the length
        if (lineEndIndex == std::istream::pos_type(-1))
            lineEndIndex = lineStartIndex + static_cast<std::streamoff>(line.size());
        // If the index of the word we're looking for is in the bounds of the
        // line, return it
        if (targetIndex >= lineStartIndex && targetIndex < lineEndIndex)
        {
            return line;
        }
        // The end of this line is the start of the next one. Set it
        lineStartIndex = lineEndIndex;
    }
    // Need a better way to indicate failure
    return "";
}

void PrintLineTest()
{
    std::string str = LineFromOffset("test.txt", 24);
    std::cout << str;
}
There is a C library function for each of the operations:
fopen - open the file
fseek - seek to the desired offset in the file
fread - read the number of bytes you want
fclose - close the file
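Put together, those four calls could be used roughly as in the sketch below. LineAtOffset is a hypothetical name, and stepping back one byte at a time is just the simplest way to find the line start, not necessarily the fastest. It assumes the file is opened in binary mode and that lines end with '\n'.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (name is an assumption): returns the line containing
// byte `offset`, using only fopen / fseek / fread / fclose.
std::string LineAtOffset(const char* path, long offset)
{
    std::FILE* fp = std::fopen(path, "rb"); // binary mode: offsets are byte counts
    if (!fp)
        return "";

    // Step backwards until the byte just before `start` is '\n',
    // or until the beginning of the file is reached.
    long start = offset;
    while (start > 0)
    {
        char c;
        if (std::fseek(fp, start - 1, SEEK_SET) != 0 ||
            std::fread(&c, 1, 1, fp) != 1 || c == '\n')
            break;
        --start;
    }

    // Read forward from the line start until '\n' or end-of-file.
    std::string line;
    if (std::fseek(fp, start, SEEK_SET) == 0)
    {
        char buf[64];
        std::size_t n;
        bool done = false;
        while (!done && (n = std::fread(buf, 1, sizeof buf, fp)) > 0)
        {
            for (std::size_t i = 0; i < n; ++i)
            {
                if (buf[i] == '\n') { done = true; break; }
                line += buf[i];
            }
        }
    }

    std::fclose(fp);
    return line;
}
```

For the example file, LineAtOffset("test.txt", 24) would give the second line. An offset past the end of the file yields an empty string, which, as with the other answers, is not distinguishable from an empty line.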