Storing values from a CSV file to a dynamic char a

Posted 2020-04-24 16:58

Question:

I am trying to store certain values from a CSV file, whose layout is shown below, in a dynamic char array. I can already read the CSV file, and the code for that is included below. What approach should I use to store selected data from a CSV row in a dynamic char array? I have also written a substring function that returns a particular string given a start index, an end index, and the source string.

CSV file: place,name,Lat,Long,1/22/20,1/23/20,1/24/20. I want to store the dates after Long (without commas) in a dynamic char array. (I cannot use vectors.) Thanks!

char* substring(char* source, int startIndex, int endIndex)
{
    int size = endIndex - startIndex + 1;
    char* s = new char[size+1];
    strncpy(s, source + startIndex, size); 
    s[size]  = '\0'; //make it null-terminated
    return s;
} 

char** readCSV(const char* csvFileName, int& csvLineCount)
{
    ifstream fin(csvFileName);
    if (!fin)
    {
        return nullptr;
    }
    csvLineCount = 0;
    char line[1024];
    while(fin.getline(line, 1024))
    {
        csvLineCount++;
    };
    char **lines = new char*[csvLineCount];
    fin.clear();
    fin.seekg(0, ios::beg);
    for (int i=0; i<csvLineCount; i++)
    {
        fin.getline(line, 1024);   
        lines[i] = new char[strlen(line)+1];
        strcpy(lines[i], line);
    };
    fin.close();
    return lines;
}

Answer 1:

Per your comment that you are still having problems dynamically allocating storage for the lines read from the CSV file (and with the caveat that programs today should avoid the old pointer-to-pointer-to-char in favor of a vector of strings): one reason is that you are allocating the pointers inefficiently. Instead of a single pass through your input file, you make two passes, one to count the lines (so you can allocate the pointers) and another to read and allocate storage for each line. That works, but it is inefficient, since file I/O is one of the slowest tasks you can perform (and you do it twice).

Instead, simply allocate some initial number of pointers (1, 2, or 8 is a good starting point). Keep track of the number of pointers allocated (say with size_t avail = 2;) and the number of pointers used (say size_t used = 0;). Then, as you read lines, check if (used == avail) to know when it is time to reallocate. You can simply allocate 2X the current number of pointers using a temporary char** pointer, copy the existing pointers to tmp, delete[] lines;, and then assign the new block of memory back to lines.

Another change to make is to open your std::ifstream file stream in main(), validate that it is open for reading, and then pass a reference to the open stream as a parameter to your function instead of passing the filename. (If you can't successfully open the stream in the caller, there is no need to call the read function at all.)

To read the lines from your stream handling the allocations as you go, you could do something like the following:

#include <iostream>
#include <fstream>
#include <cstring>

#define MAXC 1024

char **readcsv (std::ifstream& fin, size_t& csvLineCount)
{
    size_t avail = 2, used = 0;                     /* allocated/used counters */
    char line[MAXC], **lines = new char*[avail];    /* line and lines */

    while (fin.getline (line, MAXC)) {              /* loop reading each line */
        size_t len;                                 /* for line length */
        if (used == avail) {                        /* all pointers used? */
            char **tmp = new char *[2 * avail];     /* allocate twice as many */
            memcpy (tmp, lines, used * sizeof *lines);  /* copy lines to new tmp */
            delete[] lines;                         /* free existing pointers */
            lines = tmp;                            /* set lines to new block */
            avail *= 2;                             /* update ptrs available */
        }
        lines[used] = new char[(len = strlen(line)) + 1];   /* alloc for lines[used] */
        memcpy (lines[used++], line, len + 1);      /* copy line to lines[used] */
    }
    csvLineCount = used;                            /* update csvLineCount to used */

    return lines;       /* return lines */
}

Adding a short main() that takes the filename to read as the first argument to the program and opens the stream in main() before passing a reference to the open stream to your read function would be:

int main (int argc, char **argv) {

    if (argc < 2) { /* validate 1 argument given for filename */
        std::cerr << "error: insufficient input.\n"
                     "usage: " << argv[0] << " filename.\n";
        return 1;
    }

    char **lines = nullptr;         /* pointer-to-pointer to char */
    size_t nlines = 0;              /* line counter */
    std::ifstream f (argv[1]);      /* file stream */

    if (!f.is_open()) {     /* validate file open for reading */
        std::cerr << "error: file open failed '" << argv[1] << "'.\n";
        return 1;
    }

    if (!(lines = readcsv (f, nlines))) {   /* call line read function/validate */
        std::cerr << "error: readcsv() failed.\n";
        return 1;
    }

    for (size_t i = 0; i < nlines; i++) {   /* loop outputting lines, freeing memory */
        std::cout << lines[i] << '\n';
        delete[] lines[i];                  /* free lines */
    }
    delete[] lines;                         /* free pointers */
}

Example Input File

$ cat dat/latlon.csv
place1,name1,Lat1,Long1,1/22/20,1/23/20,1/24/20
place2,name2,Lat2,Long2,1/22/20,1/23/20,1/24/20
place3,name3,Lat3,Long3,1/22/20,1/23/20,1/24/20
place4,name4,Lat4,Long4,1/22/20,1/23/20,1/24/20

Example Use/Output

All lines successfully stored in allocated memory:

$ ./bin/read_alloc_csv_lines dat/latlon.csv
place1,name1,Lat1,Long1,1/22/20,1/23/20,1/24/20
place2,name2,Lat2,Long2,1/22/20,1/23/20,1/24/20
place3,name3,Lat3,Long3,1/22/20,1/23/20,1/24/20
place4,name4,Lat4,Long4,1/22/20,1/23/20,1/24/20

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
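As a small illustration of responsibility (2), the cleanup done at the end of main() above could be collected into one function. This is only a sketch: freeLines is a hypothetical name that does not appear in the code above, and it assumes the block was built the way readcsv() builds it (each string allocated with new char[], the pointer block with new char*[]).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the answer's code): releases every line,
// then the block of pointers itself, and resets the caller's variables so a
// stale pointer cannot be used or freed a second time by accident.
void freeLines(char **&lines, std::size_t &count)
{
    assert(lines != nullptr || count == 0);   // a null block must hold no lines

    if (lines == nullptr)
        return;                               // nothing to free

    for (std::size_t i = 0; i < count; i++)
        delete[] lines[i];                    // free each line
    delete[] lines;                           // free the pointers

    lines = nullptr;
    count = 0;
}
```

With this in place, the caller would replace the delete[] statements in main() with a single freeLines (lines, nlines); after it has finished using the lines.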

It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

On Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through the checker.

$ valgrind ./bin/read_alloc_csv_lines dat/latlon.csv
==8108== Memcheck, a memory error detector
==8108== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==8108== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==8108== Command: ./bin/read_alloc_csv_lines dat/latlon.csv
==8108==
place1,name1,Lat1,Long1,1/22/20,1/23/20,1/24/20
place2,name2,Lat2,Long2,1/22/20,1/23/20,1/24/20
place3,name3,Lat3,Long3,1/22/20,1/23/20,1/24/20
place4,name4,Lat4,Long4,1/22/20,1/23/20,1/24/20
==8108==
==8108== HEAP SUMMARY:
==8108==     in use at exit: 0 bytes in 0 blocks
==8108==   total heap usage: 10 allocs, 10 frees, 82,712 bytes allocated
==8108==
==8108== All heap blocks were freed -- no leaks are possible
==8108==
==8108== For counts of detected and suppressed errors, rerun with: -v
==8108== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

I have left handling the substring to you since your comment pertained to problems reading the lines into allocated memory. If you have problems with that later, just let me know. Also let me know if you have any further questions about how things were done above.
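For what it is worth, here is one possible way to pull the date fields out of a stored line, using the same grow-by-doubling strategy as readcsv(). Treat it as a sketch under assumptions rather than the definitive approach: extractFields and its skip parameter are hypothetical names, and it assumes plain comma-separated fields with no quoting or embedded commas.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of the code above): copies every field that
// follows the first `skip` comma-separated fields of `line` into its own
// allocated C-string. The number of strings is stored in `count`.
// Returns nullptr (with count == 0) if the line has fewer than `skip` commas.
char **extractFields(const char *line, std::size_t skip, std::size_t &count)
{
    assert(line != nullptr);
    count = 0;

    const char *p = line;
    for (std::size_t i = 0; i < skip; i++) {     // step over the leading fields
        p = std::strchr(p, ',');
        if (!p)
            return nullptr;                      // fewer fields than expected
        p++;                                     // first char after the comma
    }

    std::size_t avail = 2;                       // pointers allocated so far
    char **fields = new char*[avail];

    for (;;) {
        const char *end = std::strchr(p, ',');   // comma ending this field, if any
        std::size_t len = end ? static_cast<std::size_t>(end - p)
                              : std::strlen(p);

        if (count == avail) {                    // all pointers used? grow 2X
            char **tmp = new char*[2 * avail];
            std::memcpy(tmp, fields, count * sizeof *fields);
            delete[] fields;
            fields = tmp;
            avail *= 2;
        }
        fields[count] = new char[len + 1];       // storage for field + '\0'
        std::memcpy(fields[count], p, len);      // copy field without the comma
        fields[count][len] = '\0';
        count++;

        if (!end)
            break;                               // that was the last field
        p = end + 1;                             // move to the next field
    }
    return fields;
}
```

For the sample file, calling extractFields (lines[0], 4, ndates) would skip place, name, Lat and Long and store the three dates as separate strings. Each string and the pointer block must then be released with delete[], just like the lines themselves.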



Answer 2:

I would slightly change your code to iterate the fields of the header line instead of iterating the lines of a file:

char* substring(const char *start, const char *end)
{
    // copy the range [start, end); end points one past the last character
    size_t size = end - start;
    char* s = new char[size + 1];
    strncpy(s, start, size);
    s[size] = '\0'; //make it null-terminated
    return s;
}

char** readHeaderDates(const char* csvFileName, int& csvDateCount)
{
    ifstream fin(csvFileName);
    if (!fin)
    {
        return nullptr;
    }
    csvDateCount = 0;
    char line[1024];
    if (! fin.getline(line, 1024))   // read header line
    {
        return nullptr;
    }
    fin.close();
    // count fields in line (number of commas + 1):
    int fieldCount = 1;
    for (const char *p = strchr(line, ','); p != nullptr; p = strchr(p + 1, ',')) {
        fieldCount += 1;
    }
    csvDateCount = fieldCount - 4;   // skip place, name, Lat, Long
    if (csvDateCount <= 0) {
        csvDateCount = 0;
        return nullptr;
    }
    char **dates = new char*[csvDateCount];
    const char *ix = line;
    for (int i = 0; i < 4; i++) {    // advance past the first four fields
        ix = strchr(ix, ',') + 1;
    }
    for (int i = 0; i<csvDateCount; i++)
    {
        const char *start = ix;
        const char *end = strchr(ix, ',');
        if (nullptr == end) end = start + strlen(start);
        dates[i] = substring(start, end);
        ix = (*end == ',') ? end + 1 : end;   // move to the next field
    }
    return dates;
}

BEWARE: untested code...



Tags: c++ arrays