“#include” a text file in a C program as a char[]

Posted 2019-01-01 12:46

Is there a way to include an entire text file as a string in a C program at compile-time?

Something like:

  • file.txt:

    This is
    a little
    text file
    
  • main.c:

    #include <stdio.h>
    int main(void) {
       #blackmagicinclude("file.txt", content)
       /*
       equivalent to: char content[] = "This is\na little\ntext file";
       */
       printf("%s", content);
    }
    

The result would be a little program that prints "This is a little text file" (with its line breaks) on stdout.

At the moment I use a hackish Python script, but it's butt-ugly and limited to a single variable name. Can you tell me another way to do it?

15 answers
看淡一切
#2 · 2019-01-01 12:56

Hasturkun's answer using xxd's -i option is excellent. If you want to incorporate the conversion step (text -> hex include file) directly into your build, the hexdump.c tool/library recently gained a capability similar to xxd's -i option. It doesn't give you the full header, so you need to provide the char array definition yourself, but that has the advantage of letting you pick the name of the char array:

http://25thandclement.com/~william/projects/hexdump.c.html

Its license is a lot more "standard" than xxd's and is very liberal. An example of using it to embed an init file in a program can be seen in the CMakeLists.txt and scheme.c files here:

https://github.com/starseeker/tinyscheme-cmake

There are pros and cons both to including generated files in source trees and bundling utilities - how to handle it will depend on the specific goals and needs of your project. hexdump.c opens up the bundling option for this application.

与风俱净
#3 · 2019-01-01 12:56

I think it is not possible with the compiler and preprocessor alone. gcc allows this:

#define _STRGF(x) # x
#define STRGF(x) _STRGF(x)

    printk ( MODULE_NAME " built " __DATE__ " at " __TIME__ " on host "
            STRGF(
#               define hostname my_dear_hostname
                hostname
            )
            "\n" );

But unfortunately not this:

#define _STRGF(x) # x
#define STRGF(x) _STRGF(x)

    printk ( MODULE_NAME " built " __DATE__ " at " __TIME__ " on host "
            STRGF(
#               include "/etc/hostname"
            )
            "\n" );

The error is:

/etc/hostname: In function ‘init_module’:
/etc/hostname:1:0: error: unterminated argument list invoking macro "STRGF"
刘海飞了
#4 · 2019-01-01 12:57

The question was about C, but in case someone wants to do this in C++11: it can be done with only small changes to the included text file, thanks to the new raw string literals.

In C++ do this:

const char *s =
#include "test.txt"
;

In the text file do this:

R"(Line 1
Line 2
Line 3
Line 4
Line 5
Line 6)"

So the file only needs a prefix at the top and a suffix at the end. In between you can write whatever you want; no special escaping is necessary as long as you don't need the character sequence )". Even that can work if you specify your own custom delimiter:

R"=====(Line 1
Line 2
Line 3
Now you can use "( and )" in the text file, too.
Line 5
Line 6)====="
皆成旧梦
#5 · 2019-01-01 13:00

I had similar issues, and for small files the aforementioned solution of Johannes Schaub worked like a charm for me.

However, for files that are a bit larger, it ran into the compiler's limit on character array size. I therefore wrote a small encoder application that converts the file content into a 2D character array of equally sized chunks (padded with zeros where needed). It produces output text files with 2D array data like this:

const char main_js_file_data[8][4]= {
    {'\x69','\x73','\x20','\0'},
    {'\x69','\x73','\x20','\0'},
    {'\x61','\x20','\x74','\0'},
    {'\x65','\x73','\x74','\0'},
    {'\x20','\x66','\x6f','\0'},
    {'\x72','\x20','\x79','\0'},
    {'\x6f','\x75','\xd','\0'},
    {'\xa','\0','\0','\0'}};

where 4 is actually the value of the macro MAX_CHARS_PER_ARRAY in the encoder. The file with the resulting C code, called, for example, "main_js_file_data.h", can then easily be included in the C++ application, for example like this:

#include "main_js_file_data.h"

Here is the source code of the encoder:

#include <fstream>
#include <iterator>
#include <vector>
#include <algorithm>
#include <cmath>    // std::ceil
#include <cstddef>  // std::size_t


#define MAX_CHARS_PER_ARRAY 2048


int main(int argc, char * argv[])
{
    // three parameters: input filename, output filename, variable name
    if (argc < 4)
    {
        return 1;
    }

    // buffer data, packaged into chunks
    std::vector<char> bufferedData;

    // open input file, in binary mode
    {    
        std::ifstream fStr(argv[1], std::ios::binary);
        if (!fStr.is_open())
        {
            return 1;
        }

        bufferedData.assign(std::istreambuf_iterator<char>(fStr), 
                            std::istreambuf_iterator<char>()     );
    }

    // write output text file, containing a variable declaration,
    // which will be a fixed-size two-dimensional plain array
    {
        std::ofstream fStr(argv[2]);
        if (!fStr.is_open())
        {
            return 1;
        }
        const std::size_t numChunks = std::size_t(std::ceil(double(bufferedData.size()) / (MAX_CHARS_PER_ARRAY - 1)));
        fStr << "const char " << argv[3] << "[" << numChunks           << "]"    <<
                                            "[" << MAX_CHARS_PER_ARRAY << "]= {" << std::endl;
        std::size_t count = 0;
        fStr << std::hex;
        while (count < bufferedData.size())
        {
            std::size_t n = 0;
            fStr << "{";
            for (; n < MAX_CHARS_PER_ARRAY - 1 && count < bufferedData.size(); ++n)
            {
                // go through unsigned char so bytes >= 0x80 are not sign-extended
                fStr << "'\\x" << int(static_cast<unsigned char>(bufferedData[count++])) << "',";
            }
            // fill missing part to reach fixed chunk size with zero entries
            for (std::size_t j = 0; j < (MAX_CHARS_PER_ARRAY - 1) - n; ++j)
            {
                fStr << "'\\0',";
            }
            fStr << "'\\0'}";
            if (count < bufferedData.size())
            {
                fStr << ",\n";
            }
        }
        fStr << "};\n";
    }

    return 0;
}
千与千寻千般痛.
#6 · 2019-01-01 13:02

I'd suggest using xxd (a Unix utility) for this. You can use it like so:

$ echo hello world > a
$ xxd -i a

outputs:

unsigned char a[] = {
  0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x0a
};
unsigned int a_len = 12;
与君花间醉酒
#7 · 2019-01-01 13:02

You have two possibilities:

  1. Make use of compiler/linker extensions to convert a file into a binary file, with proper symbols pointing to the beginning and end of the binary data. See this answer: Include binary file with GNU ld linker script.
  2. Convert your file into a sequence of character constants that can initialize an array. Note that you can't just wrap the contents in "" and have them span multiple lines: you would need a line continuation character (\), escaped " characters, and more to make that work. It is easier to write a little program that converts the bytes into a sequence like '\xFF', '\xAB', ...., '\0' (or to use the Unix tool xxd described in another answer, if you have it available!):

Code:

#include <stdio.h>

int main() {
    int c;
    while((c = fgetc(stdin)) != EOF) {
        printf("'\\x%X',", (unsigned)c);
    }
    printf("'\\0'"); // put terminating zero
}

(Not tested.) Then do:

char my_file[] = {
#include "data.h"
};

Where data.h is generated by

cat file.bin | ./bin2c > data.h