How to only build auto generated code when the gen

Posted 2019-02-02 14:02

Question:

I am working on a source code repository that generates some C++ code by running a Python script that outputs headers and implementation files. This code is subsequently compiled and linked into my libraries and executables. I know that the generated code will only change if one of two conditions is true:

  1. The generator code itself changes
  2. The input (an XML file) to the generator changes

I want to use cmake to manage the build process. At the moment, I am using execute_process to fire off the generator. However, this runs every time I run cmake and it touches the files, causing my generated code to be recompiled and adding to my total compile time.

I also want to make sure that the generator always runs before my libraries are built. In other words, I want the libraries to depend on the generator having run.

What is the proper way to handle such a situation in cmake? I have seen this previous answer: "Get CMake to execute a target in project before building a library". But this relies on the output of the code generator being known in advance. My code generator will generate a variable number of files.

Answer 1:

Use ADD_CUSTOM_COMMAND to trigger your generator. It allows you to define input and output dependencies and will only run if the outputs are older than the inputs.

ADD_CUSTOM_COMMAND( OUTPUT generatedfile1 generatedfile2
                    COMMAND python generateSources.py xmlfile1 xmlfile2
                    DEPENDS xmlfile1 xmlfile2 generateSources.py 
                    COMMENT "Generating source code from XML" )

Make sure that the generated files are not used in more than one independent target that may compile in parallel, or you will very likely get a conflict during your build. To ensure this, the following should do the trick:

ADD_CUSTOM_TARGET( RunGenerator DEPENDS generatedfile1 generatedfile2 
                   COMMENT "Checking if re-generation is required" )

Then make your other targets depend on this one:

ADD_DEPENDENCIES( MyTarget RunGenerator )

NB: The RunGenerator target will always be considered out-of-date and, thus, always run. However, since it does nothing (besides printing the comment and checking the dependencies) in this case, that doesn't matter. The custom command will take care of regeneration IF required.

Update after comments:

If you do not know the name of the files, you can use

ADD_CUSTOM_COMMAND( OUTPUT generated.timestamp
                    COMMAND python generateSources.py xmlfile1 xmlfile2
                    COMMAND ${CMAKE_COMMAND} -E touch generated.timestamp
                    DEPENDS xmlfile1 xmlfile2 generateSources.py 
                    COMMENT "Generating source code from XML" )

However, if you collect the generated files with FILE(GLOB), you must explicitly re-run CMake to update your file lists. Integrating this into the custom command would probably mess up your build process (if several projects are building in parallel and one project restarts CMake configuration). IIRC, it is OK for you to run CMake manually when you know that either the Python script or the XML files changed; your problem is that the generated files are touched whenever anything else requires a re-run of CMake.

If the Python script does not take too long to run, you could let it run on every CMake run (like you do now) but make sure that unchanged files do not get touched. You can try the following (untested):

# generate the source files into a temporary directory (adjust your current execute_process)
EXECUTE_PROCESS( COMMAND python ../generateSources.py ../xmlfile1 ../xmlfile2 
                 WORKING_DIRECTORY tmp )

# get the filenames
FILE( GLOB GENERATED_TEMP_FILES tmp/* )

# copy to the "expected" directory, but only if content CHANGED
FOREACH( F ${GENERATED_TEMP_FILES} )
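    # the output variable comes first, then the path
    GET_FILENAME_COMPONENT( FN "${F}" NAME )
    # COPYONLY copies without variable substitution and leaves an identical destination untouched
    CONFIGURE_FILE( "${F}" "./generated/${FN}" COPYONLY )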
ENDFOREACH()

# use your current globbing command
FILE( GLOB GENERATED_SOURCES ./generated/* )


Answer 2:

A low-level solution: generate your files into another directory, compare them with the current files, and copy only those that differ. No copy, no recompile.

This of course only works if the generator doesn't fill in some random noise such as version numbers and dates. If it does, you can try to filter those out before comparing.
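
The compare-then-copy idea above could be sketched roughly as follows. This is only an illustration under assumptions: the function name copy_changed, the directory layout, and the noise pattern are hypothetical, and the generated files are assumed to be UTF-8 text.

```python
import re
import shutil
from pathlib import Path


def _significant(path, noise):
    """Return the file's text with lines matching the noise pattern removed."""
    text = path.read_text(encoding="utf-8")
    if noise is None:
        return text
    return "\n".join(line for line in text.splitlines()
                     if not re.search(noise, line))


def copy_changed(staging_dir, output_dir, noise=None):
    """Copy files from staging_dir to output_dir only when content differs.

    noise is an optional regular expression (hypothetical, e.g. a
    "generated on <date>" banner); matching lines are ignored when comparing.
    Returns the sorted names of the files that were actually written.
    """
    staging, output = Path(staging_dir), Path(output_dir)
    output.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(p for p in staging.iterdir() if p.is_file()):
        dst = output / src.name
        if dst.exists() and _significant(src, noise) == _significant(dst, noise):
            continue  # same content: leave dst and its timestamp alone
        shutil.copyfile(src, dst)
        written.append(src.name)
    return written
```

Because unchanged files are never rewritten, their modification times stay put and the build tool has no reason to recompile them.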



Answer 3:

I had more or less this problem today. I'm embedding binary resources into a C++ executable (it's an embedded web server).

Solved it like this:

    #get the file timestamp of the 'source' resource file
    FILE(TIMESTAMP "${CMAKE_CURRENT_SOURCE_DIR}/${_resource}" RESOURCE_TIME)

    #get the file time of the 'template' file
    FILE(TIMESTAMP ${TEMPLATE_FILE} TEMPLATE_TIME)

    #get the timestamp (if any) of the generated file
    FILE(TIMESTAMP "${CMAKE_CURRENT_BINARY_DIR}/${_filename}" TARGET_TIME)
....
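    #only configure the file if the target is older than either the
    #source or the template; the default FILE(TIMESTAMP) format sorts
    #chronologically as a string, so STRGREATER is used ('>' is not a CMake operator)
    IF((RESOURCE_TIME STRGREATER TARGET_TIME) OR (TEMPLATE_TIME STRGREATER TARGET_TIME) OR (NOT TARGET_TIME))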
        MESSAGE(STATUS "configuring ${...}")
        FILE(READ ${_resource} RESOURCE_CONTENT HEX)
        string(REGEX REPLACE "([0-9a-f][0-9a-f])" "0x\\1," RESOURCE_CONTENT ${RESOURCE_CONTENT})
        configure_file("${TEMPLATE_FILE}"
            "${CMAKE_CURRENT_BINARY_DIR}/${_filename}"
            @ONLY)
    ELSE()
        MESSAGE(STATUS "already configured ${...}")
    ENDIF()


Answer 4:

We had similar problems where I work, although we use a different build system, not cmake.

I don't know cmake, but this might give you several ideas:

Our solution involved treating the XML files and the generator as code that needs to be compiled: the generator is a dependency and the XML files are the sources. If the generator or any XML file changed, the build system ran the generator, which regenerated everything. Even if only one input changed, it touched all the generated files.
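
The "run the generator only if the generator or the XML files changed" rule could be sketched like this. It is a minimal illustration, not what our build system actually does: the stamp file and the function names needs_regeneration and regenerate_if_stale are hypothetical, and the check relies on file modification times only.

```python
import os
from pathlib import Path


def needs_regeneration(inputs, stamp):
    """Return True if the stamp file is missing or older than any input."""
    if not os.path.exists(stamp):
        return True
    stamp_mtime = os.path.getmtime(stamp)
    return any(os.path.getmtime(p) > stamp_mtime for p in inputs)


def regenerate_if_stale(inputs, stamp, run_generator):
    """Call run_generator() only when an input is newer than the stamp.

    inputs should list the generator script itself plus its XML files.
    Returns True if the generator was run.
    """
    if not needs_regeneration(inputs, stamp):
        return False
    run_generator()
    Path(stamp).touch()  # record the time of the last successful run
    return True
```

The stamp file stands in for the whole set of outputs, which helps when the number of generated files is not known in advance.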

The build system of course did not run the generator directly. Instead, we wrote a small Python script that decided how to run the generator properly. One improvement we could add, for example, is generating all the files into /tmp and moving only the changed files (compared using diff), so that only the changed files are touched. (We don't need it at the moment, as our files don't change that often.)

We also ended up running the build system twice with two different graphs: one graph for generators and another graph for the other files. We designed it to allow several levels of generator-generated dependencies, so that one generator can rely on the generated products of another.

Two more tricks to consider: if your build system allows you to use regular expressions to select the files to build, you may want to use that. You can also generate the configuration files for your build system as part of the generation process.
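
For the last trick, here is a rough sketch of a generator step that also emits a build-system file listing what it produced. Everything specific here is an assumption: the function name write_source_list, the file name generated_sources.cmake, the variable GENERATED_SOURCES, and the .cpp/.h suffixes are hypothetical, and the output format targets CMake only because that is what the question uses.

```python
from pathlib import Path


def write_source_list(generated_dir, list_name="generated_sources.cmake",
                      variable="GENERATED_SOURCES"):
    """Write a CMake fragment that lists the generated .cpp/.h files.

    The fragment is placed inside generated_dir and refers to the files
    relative to its own location. It is rewritten only when the list
    changes. Returns True if the fragment was written.
    """
    directory = Path(generated_dir)
    names = sorted(p.name for p in directory.iterdir()
                   if p.suffix in {".cpp", ".h"})
    lines = [f"set({variable}"]
    lines += [f'    "${{CMAKE_CURRENT_LIST_DIR}}/{name}"' for name in names]
    lines.append(")")
    content = "\n".join(lines) + "\n"

    target = directory / list_name
    if target.exists() and target.read_text(encoding="utf-8") == content:
        return False  # list unchanged: do not touch the file
    target.write_text(content, encoding="utf-8")
    return True
```

A CMakeLists.txt could then include() the fragment instead of globbing, provided the fragment already exists when CMake is configured.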



Tags: c++ build cmake