I need to read a matrix from a file which we don't

Posted 2019-07-16 12:53

Question:

I have a struct like this:

struct Data {
    int ID;
    double test_sample[2065][1];
    int XX_row;
    int XX_col;
    double **XX;                        //size=[2065][changing]
    double **alpha_new;                 //size=[changing][1]
    int alpha_new_row;
    int alpha_new_col;
    double t3;
    double kernel_par;
} person[20];

I've written this struct for every person (20 people) into 20 files using fwrite:

fwrite(&person, sizeof( struct Data ), 1,Ptr );

Now I have 20 binary files. Each file contains these variables for one person. All is OK so far.

Problem: I can't read a file and assign it to a struct, because the dimensions of the XX and alpha_new matrices differ from file to file (in one file XX is [2065][8], in some others [2065][12]).

I need to read these variables using fread (or something else) and feed them into the face recognition program. Is there a way to read the variables individually from the file, or should I change the writing method as well?

I don't know how to write all the variables and matrices into one file without using a struct!

I hope I've explained my problem clearly; sorry for my poor English. I'm waiting for your help to finish my final project in C. I am using Visual Studio 2012.

Answer 1:

For such a complex structure, it's a modestly major undertaking. Here's a not-so-short SSCCE (Short, Self-Contained, Complete Example). There are really 3 files slammed into one:

  • stderr.h — declarations of error reporting functions (top 10 lines)
  • serialize.c — the serialization code (just under 300 lines in between)
  • stderr.c — the error reporting functions (bottom 40 lines)

I'm not planning to explain the error reporting functions. They work more or less like printf() as far as formatting arguments go, but they write to standard error rather than standard output, they prefix the message with the program name, and they append the error number and message derived from errno (when errno is nonzero). The emalloc() function checks memory allocations, reporting an error and exiting if the allocation fails. This error handling is adequate for simple programs; it is not adequate for complex programs that need to recover from a memory problem, saving the work or whatever.

Within the real serialization code, there are 4 groups of functions, plus main() to orchestrate.

  1. Allocation and initialization functions to create and initialize the structures.
  2. Print functions to dump the structures.
  3. Export functions to serialize the data for export.
  4. Import functions to deserialize the data for import.

The print functions let a human see the data, and you could save the output to a file and compare the export data with the import data to ensure that they're the same.

The code would be simpler if you used a structure to describe all your 2D arrays, such as:

typedef struct Array_2D
{
    double **data;
    size_t   nrows;
    size_t   ncols;
} Array_2D;

You'd then simply embed 3 of these into your struct Data:

struct Data
{
    int       ID;
    double    t3;
    double    kernel_par;
    Array_2D  test_sample;
    Array_2D  XX;
    Array_2D  alpha_new;
};

I'm really not clear what the benefit of double test_sample[2065][1]; is compared with double test_sample[2065];. I will observe that it makes the code more complex than it would otherwise be. I end up treating it as a normal 1D array of double by using &data->test_sample[0][0] as the starting point.

There's more than one way to do the serialization. I've opted to represent a 2D array of doubles by its two dimensions followed by N 1D arrays, each 1D array prefixed by a size_t describing its size. That gives some redundancy in the files, which means slightly better error detection. It would be feasible to simply output the two dimensions of a 2D array, followed by rows x cols values. Indeed, at one point I had the import code assuming that simpler layout while the export code was using the other technique, and this did not make for a happy runtime: the numbers were misinterpreted and I got debug output and errors like:

test_sample: 2.470328e-323, 1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00
2D array size 4617315517961601024 x 5 = 4639833516098453504
serialize(46983) malloc: *** mmap(size=45035996273704960) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
./serialize: Out of memory (12: Cannot allocate memory)

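That's a lot of memory... The 2.470328e-323 was a symptom of trouble, too: it is the size_t value 5 (a size prefix) read back as a double, just as 4617315517961601024 is the bit pattern of the double 5.0 read back as a size_t. (So no, I didn't get it all right the first time I ran the code.)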

I did most of the testing with SAMPLE_SIZE at 5 and NUM_PERSON at 3.

serialize.c

/* stderr.h */
#ifndef STDERR_H_INCLUDED
#define STDERR_H_INCLUDED

static void err_setarg0(char const *argv0);
static void err_sysexit(char const *fmt, ...);
static void err_syswarn(char const *fmt, ...);

#endif /* STDERR_H_INCLUDED */

#include <stdio.h>
#include <stdlib.h>

enum { SAMPLE_SIZE = 20 }; /* 2065 in original */
enum { NUM_PERSON  = 10 }; /*   20 in original */

struct Data
{
    int ID;
    double test_sample[SAMPLE_SIZE][1]; //Why?
    size_t XX_row;
    size_t XX_col;
    double **XX;                        //size=[SAMPLE_SIZE][changing]
    double **alpha_new;                 //size=[changing][1]
    size_t alpha_new_row;
    size_t alpha_new_col;
    double t3;
    double kernel_par;
} person[NUM_PERSON];

typedef struct Data Data;

static void *emalloc(size_t nbytes)
{
    void *space = malloc(nbytes);
    if (space == 0)
        err_sysexit("Out of memory");
    return space;
}

static void free_data(Data *data)
{
    for (size_t i = 0; i < data->XX_row; i++)
        free(data->XX[i]);
    free(data->XX);

    for (size_t i = 0; i < data->alpha_new_row; i++)
        free(data->alpha_new[i]);
    free(data->alpha_new);

    data->ID = 0;
    data->t3 = 0.0;
    data->kernel_par = 0.0;
    data->XX = 0;
    data->XX_row = 0;
    data->XX_col = 0;
    data->alpha_new = 0;
    data->alpha_new_row = 0;
    data->alpha_new_col = 0;
}

static void free_array(Data *data, size_t nentries)
{
    for (size_t i = 0; i < nentries; i++)
        free_data(&data[i]);
}

static double **alloc_2D_double(size_t rows, size_t cols)
{
    double **data = emalloc(rows * sizeof(*data));
    for (size_t i = 0; i < rows; i++)
    {
        data[i] = emalloc(cols * sizeof(*data[i]));
    }
    return data;
}

static void populate_data(Data *data, size_t entry_num)
{
    /* entry_num serves as 'changing' size */
    data->ID = entry_num;
    data->t3 = entry_num * SAMPLE_SIZE;
    data->kernel_par = (1.0 * SAMPLE_SIZE) / entry_num;

    for (size_t i = 0; i < SAMPLE_SIZE; i++)
        data->test_sample[i][0] = i + entry_num;

    data->XX_row = SAMPLE_SIZE;
    data->XX_col = entry_num;
    data->XX = alloc_2D_double(data->XX_row, data->XX_col);

    for (size_t i = 0; i < data->XX_row; i++)
    {
        for (size_t j = 0; j < data->XX_col; j++)
            data->XX[i][j] = i * data->XX_col + j;
    }

    data->alpha_new_row = entry_num;
    data->alpha_new_col = 1;
    data->alpha_new = alloc_2D_double(data->alpha_new_row, data->alpha_new_col);

    for (size_t i = 0; i < data->alpha_new_row; i++)
    {
        for (size_t j = 0; j < data->alpha_new_col; j++)
            data->alpha_new[i][j] = i * data->alpha_new_col + j;
    }
}

static void populate_array(Data *data, size_t nentries)
{
    for (size_t i = 0; i < nentries; i++)
        populate_data(&data[i], i+1);
}

static void print_1D_double(FILE *fp, char const *tag, double const *values, size_t nvalues)
{
    char const *pad = "";
    fprintf(fp, "%s: ", tag);
    for (size_t i = 0; i < nvalues; i++)
    {
        fprintf(fp, "%s%e", pad, values[i]);
        pad = ", ";
    }
    putc('\n', fp);
}

static void print_2D_double(FILE *fp, char const *tag, double **values, size_t nrows, size_t ncols)
{
    fprintf(fp, "2D array %s[%zu][%zu]\n", tag, nrows, ncols);
    for (size_t i = 0; i < nrows; i++)
    {
        char buffer[32];
        snprintf(buffer, sizeof(buffer), "%s[%zu]", tag, i);
        print_1D_double(fp, buffer, values[i], ncols);
    }
}

static void print_data(FILE *fp, char const *tag, const Data *data)
{
    fprintf(fp, "Data: %s\n", tag);
    fprintf(fp, "ID = %d; t3 = %e; kernel_par = %e\n", data->ID, data->t3, data->kernel_par);
    print_1D_double(fp, "test_sample", &data->test_sample[0][0], sizeof(data->test_sample)/sizeof(data->test_sample[0][0]));
    print_2D_double(fp, "XX", data->XX, data->XX_row, data->XX_col);
    print_2D_double(fp, "Alpha New", data->alpha_new, data->alpha_new_row, data->alpha_new_col);
}

static void print_array(FILE *fp, char const *tag, const Data *data, size_t nentries)
{
    fprintf(fp, "Array: %s\n", tag);
    fprintf(fp, "Size: %zu\n", nentries);
    for (size_t i = 0; i < nentries; i++)
    {
        char buffer[32];
        snprintf(buffer, sizeof(buffer), "Row %zu", i);
        print_data(fp, buffer, &data[i]);
    }
    fprintf(fp, "End Array: %s\n\n", tag);
}

static void set_file_name(char *buffer, size_t buflen, size_t i)
{
    snprintf(buffer, buflen, "exp_data.%.3zu.exp", i);
}

static void export_1D_double(FILE *fp, double *data, size_t ncols)
{
    if (fwrite(&ncols, sizeof(ncols), 1, fp) != 1)
        err_sysexit("Failed to write number of columns");
    if (fwrite(data, sizeof(double), ncols, fp) != ncols)
        err_sysexit("Failed to write array of %zu doubles", ncols);
}

static void export_2D_double(FILE *fp, double **data, size_t nrows, size_t ncols)
{
    if (fwrite(&nrows, sizeof(nrows), 1, fp) != 1)
        err_sysexit("Failed to write number of rows");
    if (fwrite(&ncols, sizeof(ncols), 1, fp) != 1)
        err_sysexit("Failed to write number of columns");
    for (size_t i = 0; i < nrows; i++)
        export_1D_double(fp, data[i], ncols);
}

static void export_int(FILE *fp, int value)
{
    if (fwrite(&value, sizeof(value), 1, fp) != 1)
        err_sysexit("Failed to write int to file");
}

static void export_double(FILE *fp, double value)
{
    if (fwrite(&value, sizeof(value), 1, fp) != 1)
        err_sysexit("Failed to write double to file");
}

static void export_data(FILE *fp, Data *data)
{
    export_int(fp, data->ID);
    export_double(fp, data->t3);
    export_double(fp, data->kernel_par);
    export_1D_double(fp, &data->test_sample[0][0], sizeof(data->test_sample)/sizeof(data->test_sample[0][0]));
    export_2D_double(fp, data->XX, data->XX_row, data->XX_col);
    export_2D_double(fp, data->alpha_new, data->alpha_new_row, data->alpha_new_col);
}

static void export_array(Data *data, size_t nentries)
{
    for (size_t i = 0; i < nentries; i++)
    {
        char filename[30];
        set_file_name(filename, sizeof(filename), i);
        FILE *fp = fopen(filename, "wb");   /* binary mode: no newline translation on Windows */
        if (fp == 0)
            err_sysexit("Failed to open file %s for writing", filename);
        printf("Export %zu to %s\n", i, filename);
        export_data(fp, &data[i]);
        fclose(fp);
    }
}

static int import_int(FILE *fp)
{
    int value;
    if (fread(&value, sizeof(value), 1, fp) != 1)
        err_sysexit("Failed to read int");
    return value;
}

static double import_double(FILE *fp)
{
    double value;
    if (fread(&value, sizeof(value), 1, fp) != 1)
        err_sysexit("Failed to read double");
    return value;
}

static size_t import_size_t(FILE *fp)
{
    size_t value;
    if (fread(&value, sizeof(value), 1, fp) != 1)
        err_sysexit("Failed to read size_t");
    return value;
}

static void import_1D_double(FILE *fp, double *data, size_t nvalues)
{
    size_t size = import_size_t(fp);
    if (size != nvalues)
        err_sysexit("Size mismatch (wanted %zu, actual %zu)", nvalues, size);
    if (fread(data, sizeof(data[0]), nvalues, fp) != nvalues)
        err_sysexit("Failed to read %zu doubles", nvalues);
}

static void import_2D_double(FILE *fp, double ***data, size_t *nrows, size_t *ncols)
{
    *nrows = import_size_t(fp);
    *ncols = import_size_t(fp);
    *data  = alloc_2D_double(*nrows, *ncols);
    for (size_t i = 0; i < *nrows; i++)
        import_1D_double(fp, (*data)[i], *ncols);
}

static void import_data(FILE *fp, Data *data)
{
    data->ID = import_int(fp);
    data->t3 = import_double(fp);
    data->kernel_par = import_double(fp);

    import_1D_double(fp, &data->test_sample[0][0], sizeof(data->test_sample)/sizeof(data->test_sample[0][0]));
    import_2D_double(fp, &data->XX, &data->XX_row, &data->XX_col);
    import_2D_double(fp, &data->alpha_new, &data->alpha_new_row, &data->alpha_new_col);
}

static void import_array(Data *data, size_t nentries)
{
    for (size_t i = 0; i < nentries; i++)
    {
        char filename[30];
        set_file_name(filename, sizeof(filename), i);
        FILE *fp = fopen(filename, "rb");   /* binary mode: no newline translation on Windows */
        if (fp == 0)
            err_sysexit("Failed to open file %s for reading", filename);
        printf("Import %zu from %s\n", i, filename);
        import_data(fp, &data[i]);
        fclose(fp);
    }
}

int main(int argc, char **argv)
{
    err_setarg0(argv[0]);
    if (argc != 1)
        err_syswarn("Ignoring %d irrelevant arguments", argc-1);
    populate_array(person, NUM_PERSON);
    print_array(stdout, "Freshly populated", person, NUM_PERSON);
    export_array(person, NUM_PERSON);
    printf("\n\nEXPORT COMPLETE\n\n");
    free_array(person, NUM_PERSON);
    import_array(person, NUM_PERSON);
    printf("\n\nIMPORT COMPLETE\n\n");
    print_array(stdout, "Freshly imported", person, NUM_PERSON);
    free_array(person, NUM_PERSON);
    return(0);
}

/* stderr.c */
/*#include "stderr.h"*/
#include <stdio.h>
#include <stdarg.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>

static char const *arg0 = "<undefined>";

static void err_setarg0(char const *argv0)
{
    arg0 = argv0;
}

static void err_vsyswarn(char const *fmt, va_list args)
{
    int errnum = errno;
    fprintf(stderr, "%s: ", arg0);
    vfprintf(stderr, fmt, args);
    if (errnum != 0)
        fprintf(stderr, " (%d: %s)", errnum, strerror(errnum));
    putc('\n', stderr);
}

static void err_syswarn(char const *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    err_vsyswarn(fmt, args);
    va_end(args);
}

static void err_sysexit(char const *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    err_vsyswarn(fmt, args);
    va_end(args);
    exit(1);
}

When run under valgrind, it was given a clean bill of health with no memory leaked. And it took more than one pass before I could safely say that, too (valgrind showed up a bug that eyeballing the results hadn't spotted, though it was obvious once detected).


Answers to questions in comments

Anyway, here are a couple of problems that occur while compiling the code.

The first one is: 'snprintf': identifier not found

The second one is on the line "double **data = emalloc(rows * sizeof(*data));": it says cannot convert from 'void *' to 'double **'. That makes sense, because data is a double ** and emalloc() returns void *. How can I solve these problems before I start embedding this into my original program?

  1. Don't use a C++ compiler to compile C code
  2. Update to a system with a C99 compiler.

Or, since you are probably on Windows and using MSVC:

  1. Use a cast: double **data = (double **)emalloc(rows * sizeof(*data));
  2. Look up _snprintf() and snprintf_s() and so on in MSDN. I find it via Google with 'site:microsoft.com snprintf' (for various spellings of 'snprintf') when I need to know what MSVC does.

In case of emergency, use sprintf(); the buffers are big enough that there shouldn't be any risk of overflow, which is what snprintf() et al. protect against.


By the way, in my program there is a function called cernel_matrix(double **M1, double **M2) that takes two 2-dimensional matrices. I pass test_sample and XX to this function; sometimes it's XX and XX, sometimes test_sample and test_sample, depending on the case, so I can't make test_sample 1-dimensional; that's just the way the function works. Otherwise I get this error: cannot convert from 'double*' to 'double **'. I hope I've explained why test_sample can't be 1-dimensional.

  1. The cernel_matrix() function isn't told how big the matrices are, so I don't know how it can possibly work reliably.
  2. I'm not convinced that passing test_sample to cernel_matrix is safe; a double matrix[][1] value does not convert to double **. So I'm not convinced I understand why test_sample is a matrix like that.

I put together a micro test-case for this:

extern void cernel_matrix(double **M1, double **M2);

extern void m(void);

void m(void)
{
    double **m0;
    double *m1[13];
    double m2[234][1];

    cernel_matrix(m0, m1);
    cernel_matrix(m1, m2);
}

The compiler told me:

x.c: In function ‘m’:
x.c:12:5: warning: passing argument 2 of ‘cernel_matrix’ from incompatible pointer type [enabled by default]
x.c:1:13: note: expected ‘double **’ but argument is of type ‘double (*)[1]’
x.c:11:18: warning: ‘m0’ is used uninitialized in this function [-Wuninitialized]

The 'uninitialized' warning is perfectly valid, but the real problem is the other warning and its note. You should be getting something similar from your compiler.


I think I understand the idea and the functions, but there are still lots of things in the code that I don't understand. I need to be able to explain every line, because I have to give a presentation to my teachers.

When someone else provides you with code because you've not shown anything, you run the risk of not understanding what they do.

Since you need to understand the code to present it to the teachers, you're probably going to need to do some programming exercises. Note that one of the first things I did was cut the problem down to toy size (instead of 2065, I used 5 or 10 or 20). You should do the same. Start with a structure that contains only the fixed-size elements: ID, t3, kernel_par and test_sample. Make sure you can initialize, export and import that. You can import into a different variable from the one you exported, and then compare the two variables. You could even omit test_sample in the first version.

When you've got that working, add one of your arrays and its dimension members, and get that working (with size 4x5 or similar). Then add the other array (it should be trivial). As you do this, you should see what the various functions in the example I gave do, and why they're there. They're all 'necessary' at some level. As I alluded to in my comments, it took me several (too many) attempts to get it right. I was compiling with rigorous warning options, but valgrind was still wittering about uninitialized data (just as I was about to post). I eventually spotted an incompletely edited copy'n'paste piece of code.

Note that if you'd posted code that did a sane job of attempting to export the data, and preferably a sane job of attempting to import the data, then that code could have been fixed. Since you posted no code of any worth whatsoever, it made it hard to produce code that addressed your real problem without producing something testable. The code I provided is testable. The testing could be more comprehensive — yes, undoubtedly. But making code testable, and testing it, is an important part of learning to program.

Incidentally, the key point in the export process for variable length data of any type (such as arrays) is to make sure the size of the data (array) is written before the data (array) itself is written. Then the import process knows how much space to allocate before reading the data (array) back in.