Single-header generator tool

The src/acme/ directory of the Corrade repository contains a tool written in Python 3.6 for generating single-header libraries from multi-file C and C++ projects. It's currently used to generate single-header Corrade libraries, which are available through the magnum-singles repository.

The intended usage is to create a driver file, described below, which contains the complete setup and references all input files, and then run the script on it. For example:

./acme.py path/to/CorradeArray.h

The above creates the file path/to/output/CorradeArray.h, replacing file-relative #include directives with the actual file contents, among other things. Example contents of the driver file are below. It creates a hypothetical CorradeArray.h header that combines Containers::Array, Containers::StaticArray, Containers::ArrayView and Containers::StaticArrayView. It contains placeholders for copyright notices, a global place to put all external #include directives, setup for additional processing behavior and finally includes for all headers that are meant to be combined.

/* This file is generated from Corrade {{revision}}. Do not edit directly. */

/*
    This file is part of Corrade.

    {{copyright}}

    Permission is hereby granted, free of charge, to any person obtaining a
    copy of this software and associated documentation files (the "Software"),
    to deal in the Software without restriction, including without limitation
    … <snip> …
*/

// {{includes}}

/* Remove all comments from the files to make them smaller */
#pragma ACME comments off

/* Look for #includes starting with Corrade in the corrade/src directory */
#pragma ACME local Corrade
#pragma ACME path corrade/src

/* Look for Corrade/configure.h in the CMake-generated build dir, but don't
   use its contents and provide a simplified version instead. */
#pragma ACME path corrade/build/src
#pragma ACME enable Corrade_configure_h
#ifdef _WIN32
    #define CORRADE_TARGET_WINDOWS
#elif defined(__APPLE__)
    #define CORRADE_TARGET_APPLE
#elif defined(__unix__)
    #define CORRADE_TARGET_UNIX
#endif

/* Remove things included for backwards compatibility or doxygen docs, use
   standard <cassert> */
#pragma ACME enable CORRADE_STANDARD_ASSERT
#pragma ACME disable DOXYGEN_GENERATING_OUTPUT
#pragma ACME disable CORRADE_BUILD_DEPRECATED

#include "Corrade/Containers/Array.h"
#include "Corrade/Containers/StaticArray.h"

The tool has special handling for various {{placeholders}}, preprocessor branches such as #ifdef or #elif, and limited handling of #define statements. To avoid the need for an external configuration file or excessive options passed on the command line, additional options are passed via #pragma ACME directives placed in input files. These options are recognized anywhere, so if you are generating multiple single-header libraries, for example, you can put the #pragma directives in a common header and #include it from the actual driver files.

System include placement

All system includes (i.e., includes that are not quoted and not matching any local include paths) are sorted and placed on a line containing a // {{includes}} placeholder. If the placeholder is not present, the includes are prepended before everything else.

It's possible to have multiple occurrences of the // {{includes}} placeholder. Each of them expands only to newly encountered includes that weren't already emitted earlier. This can be used to separate implementation details or heavy parts of the library, for example:

// {{includes}}

#include "Corrade/Containers/Pointer.h"

#ifdef CORRADE_POINTER_STL_COMPATIBILITY
// {{includes}}

// This header includes <memory>, so make it opt-in
#include "Corrade/Containers/PointerStl.h"
#endif

If a system include is wrapped in a preprocessor define other than the file-level include guard, it's assumed to be platform-specific and is kept on its own line, without being moved to the global location. If you want it to be present at the global location but wrapped in the preprocessor defines, do it manually.
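
The multiple-placeholder behavior described above can be illustrated with a short Python sketch. It is not taken from acme.py; the function name `expand_include_placeholders` and the input shape (one list of system header names per placeholder) are assumptions made for illustration only.

```python
def expand_include_placeholders(sections):
    """Expand each {{includes}} placeholder to its new system includes.

    `sections` is a hypothetical input: one list of system header names
    per placeholder, in document order.
    """
    already_emitted = set()
    expanded = []
    for includes in sections:
        # Only headers not emitted by an earlier placeholder, sorted
        new = sorted(set(includes) - already_emitted)
        already_emitted.update(new)
        expanded.append(["#include <{}>".format(name) for name in new])
    return expanded
```

Under these assumptions, a header such as `<utility>` that was already emitted by the first placeholder is not repeated in the second one.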

Local include matching

By default the script treats only quote-included headers (#include "foo.h") as local, while #include <foo.h> directives are treated as system includes (and thus kept in the final file). In addition, it's possible to specify which additional path prefixes should be treated as local using #pragma ACME local, independently of whether the include is quoted or bracketed. For example, the following snippet will treat both #include <Magnum/Types.h> and #include "Corrade/configure.h" as local:

#pragma ACME local Corrade
#pragma ACME local Magnum
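
As a rough illustration of the matching rule, the following Python sketch classifies an include as local or system. The function name `is_local_include` is hypothetical, and plain prefix matching is an assumption based on the description above rather than a statement about how acme.py does it.

```python
def is_local_include(token, local_prefixes):
    """Classify an include token such as '"foo.h"' or '<Magnum/Types.h>'."""
    name = token[1:-1]  # strip the surrounding quotes or angle brackets
    if token.startswith('"'):
        return True  # quoted includes are local by default
    # Bracketed includes are local only if they start with a listed prefix
    return any(name.startswith(prefix) for prefix in local_prefixes)
```

With the prefixes `Corrade` and `Magnum`, `<Magnum/Types.h>` would be classified as local while `<cstddef>` stays a system include.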

By default the script searches for local includes relative to the location of the file which includes them. To extend the include paths, use #pragma ACME path. Again, the path is taken relative to the file the #pragma was in. For example, this makes the script look both in the Corrade source directory and in the build directory for generated headers:

// assuming the driver file is placed next to the Corrade project directory
#pragma ACME path corrade/src
#pragma ACME path corrade/build/src

The script aborts if it can't find a local include in any of the paths.
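
The lookup order can be sketched in Python as follows. This is a minimal illustration, not the tool's code: `resolve_include` and its `exists` parameter are hypothetical, and the sketch assumes the directory of the including file is tried before the paths added via #pragma ACME path.

```python
import os


def resolve_include(name, including_file, search_paths, exists=os.path.isfile):
    """Return the first existing candidate path for a local include."""
    # Assumed order: directory of the including file, then the extra paths
    candidates = [os.path.dirname(including_file)] + list(search_paths)
    for directory in candidates:
        path = os.path.join(directory, name)
        if exists(path):
            return path
    # Mirrors the documented behavior of aborting on a missing local include
    raise FileNotFoundError("cannot find local include " + name)
```

The `exists` parameter is there only so the lookup can be tried out without touching the filesystem.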

Local include processing

For the top-level (driver) file, file contents from local #include statements are pasted directly on the same line, in the specified order, like the C preprocessor does. For the remaining files, however, to avoid messy nested include guards, the files are concatenated in the depth-first order in which they were included.

If a local file is included multiple times, only the first occurrence is taken and the others are ignored. Note that, to keep the implementation simple, this works similarly to #pragma once but without taking the actual #pragma once or header guards into account.
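
A small Python sketch of this ordering is shown below. It is an interpretation, not the tool's implementation: the name `expansion_order` and the `graph` input are hypothetical, and it assumes that a header's own local includes are emitted ahead of the header itself.

```python
def expansion_order(driver_includes, graph):
    """Return the order in which local headers would be emitted.

    `graph` is a hypothetical mapping from a header name to the local
    headers it includes. Each header is emitted only once.
    """
    seen = set()
    order = []

    def visit(name):
        if name in seen:
            return  # later occurrences are ignored, like #pragma once
        seen.add(name)
        for dependency in graph.get(name, []):
            visit(dependency)
        order.append(name)

    for name in driver_includes:
        visit(name)
    return order
```

For a driver that includes Array.h and StaticArray.h, both of which include ArrayView.h, ArrayView.h appears once, ahead of the headers that need it.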

If expanding a local include is not desired (for example to make one header-only library depend on another), the #pragma ACME noexpand directive can be used. For example, in the following case the {{includes}} placeholder will also contain #include "BaseLib.h" after replacement, separated by a blank line from the system includes.

// {{includes}}

#pragma ACME noexpand BaseLib.h



#include <cstddef>

#include "BaseLib.h"

Preprocessor branch processing

In order to further trim down generated file size, the script has a rudimentary preprocessor branch evaluator. You can force a macro to be either defined or undefined, which will then cause things like #if defined(A_MACRO) to become either #if 0 or #if 1, which ultimately causes code inside to be either fully removed or included without the redundant wrapping #if 1. The syntax is as follows:

#pragma ACME enable NDEBUG
#pragma ACME disable DOXYGEN

The script is able to handle more complex preprocessor logic as well. With the above setup, the following line:

#if defined(NDEBUG) && (defined(BUILD_DEPRECATED_APIS) || defined(DOXYGEN))

will be simplified to just

#ifdef BUILD_DEPRECATED_APIS
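
The simplification above amounts to three-valued logic: each defined() test is known true, known false, or unknown. The Python sketch below shows the idea on a pre-parsed expression. The tuple representation and the name `simplify` are assumptions for illustration and say nothing about how acme.py parses or stores conditions.

```python
def simplify(expr, enabled, disabled):
    """Simplify a condition given forced-on and forced-off macros.

    `expr` is ('defined', name), ('not', e), ('and', a, b) or ('or', a, b).
    Returns True, False, or a (possibly smaller) expression.
    """
    kind = expr[0]
    if kind == "defined":
        if expr[1] in enabled:
            return True
        if expr[1] in disabled:
            return False
        return expr  # unknown macro, keep the test as-is
    if kind == "not":
        inner = simplify(expr[1], enabled, disabled)
        return (not inner) if isinstance(inner, bool) else ("not", inner)
    left = simplify(expr[1], enabled, disabled)
    right = simplify(expr[2], enabled, disabled)
    if kind == "and":
        if left is False or right is False:
            return False
        if left is True:
            return right
        return left if right is True else ("and", left, right)
    if kind == "or":
        if left is True or right is True:
            return True
        if left is False:
            return right
        return left if right is False else ("or", left, right)
    raise ValueError("unknown expression kind: " + kind)
```

Applied to the condition above with NDEBUG enabled and DOXYGEN disabled, only the test for BUILD_DEPRECATED_APIS remains.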

Besides the above, also all #define / #undef statements that touch the enabled/disabled macros will get removed. That makes it possible to remove header guards, for example the following input will become just the function definition after processing:

#pragma ACME disable MyLib_Header_h

#ifndef MyLib_Header_h
#define MyLib_Header_h
float calculate(float a, float b);
#endif
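
The header guard example can be reproduced with a simplified Python sketch. It handles only #ifdef, #ifndef, #endif, #define and #undef (no #if, #elif or #else), and the name `strip_forced` is hypothetical, so treat it as an illustration of the effect rather than of the actual implementation.

```python
def strip_forced(lines, enabled, disabled):
    """Resolve #ifdef/#ifndef blocks on forced macros and drop their defines."""
    forced = {name: True for name in enabled}
    forced.update({name: False for name in disabled})
    output = []
    stack = []  # one (keep_contents, keep_directive) pair per open block
    for line in lines:
        words = line.split()
        directive = words[0] if words else ""
        if directive in ("#ifdef", "#ifndef"):
            name = words[1]
            if name in forced:
                # Branch is taken if the forced state matches the test
                taken = forced[name] == (directive == "#ifdef")
                stack.append((taken, False))
                continue  # the resolved directive itself is removed
            stack.append((True, True))
        elif directive == "#endif":
            _, keep_directive = stack.pop()
            if not keep_directive:
                continue
        elif directive in ("#define", "#undef") and words[1] in forced:
            continue  # defines touching forced macros are removed
        if all(keep for keep, _ in stack):
            output.append(line)
    return output
```

With MyLib_Header_h disabled, the four input lines above reduce to the single function declaration.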

As a special case, #cmakedefine and #cmakedefine01 statements are unconditionally removed. This is useful, for example, for supplying a configure.h.cmake template to the script instead of a generated configure.h that may have unwanted build-specific defines baked in.

Forgetting previous includes or enabled/disabled macros

Sometimes it's desirable to include a certain header multiple times, for example if each place where it's included is guarded by a different define. The script doesn't interpret the high-level define tree to know that an include may need to be added again, but you can explicitly tell it to forget that a certain header was included in order to have it appear again the next time:

#ifdef MYLIB_STL_COMPATIBILITY
// {{includes}}
#include <string>

...
#endif
#ifdef MYLIB_IMPLEMENTATION
// {{includes}}
#pragma ACME forget <string>
#include <string>

...
#endif

The above adds #include <string> to both {{includes}} placeholders, so it's also included if #define MYLIB_IMPLEMENTATION is present without #define MYLIB_STL_COMPATIBILITY. Note that this only works if the next occurrence goes into a different {{includes}} placeholder. Putting it into the same one doesn't make much practical sense anyway, and causes a warning.
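
Conceptually, forgetting an include just removes it from the set of already seen headers. The Python sketch below illustrates that bookkeeping. The `IncludeTracker` class is hypothetical and is not part of acme.py.

```python
class IncludeTracker:
    """Remembers which includes were already emitted (illustrative only)."""

    def __init__(self):
        self.seen = set()

    def should_emit(self, name):
        # An include is emitted only the first time it is encountered
        if name in self.seen:
            return False
        self.seen.add(name)
        return True

    def forget(self, name):
        # Counterpart of '#pragma ACME forget': the next occurrence of
        # `name` is treated as if it were never seen before
        self.seen.discard(name)
```

After `forget("<string>")`, the next `should_emit("<string>")` call returns True again, which corresponds to the header showing up in the second placeholder.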

This feature works for local includes and enabled/disabled macros as well. For example an assert header may be included several times, each time with a different predefined macro. In the below example, the contents of MyAssert.h are processed and pasted twice, once as if MYLIB_NO_ASSERT was defined, and once as if MYLIB_NO_ASSERT wasn't defined, effectively enabling assertions only for the implementation:

#pragma ACME enable MYLIB_NO_ASSERT
#include "MyAssert.h"

...

#ifdef MYLIB_IMPLEMENTATION
#pragma ACME forget "MyAssert.h"
#pragma ACME forget MYLIB_NO_ASSERT
#pragma ACME disable MYLIB_NO_ASSERT
#include "MyAssert.h"

...
#endif

Code comments

By default, the script keeps all code comments except license blocks (blocks that contain the matched copyright lines). The #pragma ACME comments off directive makes it discard all comments in the code following the pragma, while #pragma ACME comments on enables them again.
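
The on/off toggling can be pictured with the following Python sketch. It is deliberately simplified and is not the tool's implementation: the name `filter_comments` is hypothetical, only comments that start and end on the same line are handled, comment markers inside string literals are not recognized, and license blocks get no special treatment.

```python
import re

# Matches a /* ... */ comment contained in one line, or a // comment
COMMENT = re.compile(r"/\*.*?\*/|//.*")


def filter_comments(lines):
    """Drop comments from lines that follow '#pragma ACME comments off'."""
    keep_comments = True
    output = []
    for line in lines:
        directive = line.strip()
        if directive == "#pragma ACME comments off":
            keep_comments = False
            continue
        if directive == "#pragma ACME comments on":
            keep_comments = True
            continue
        if not keep_comments:
            stripped = COMMENT.sub("", line).rstrip()
            if line.strip() and not stripped.strip():
                continue  # the line held nothing but a comment
            line = stripped
        output.append(line)
    return output
```

Lines before the first pragma and after a `comments on` pragma pass through unchanged.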

{{revision}}

The {{revision}} placeholder can be used to annotate the generated header with a specific revision of the original code for easier version history tracking. The revision is extracted in the directory of the file this placeholder is in. For single-header libraries generated from multiple projects, or when a revision needs to be extracted from somewhere other than the directory containing the driver file, {{revision:path}} can be used. The path is a substring of a file path, and the revision is extracted in the directory containing the first file that matches the given path.

The revision is by default extracted using git describe; you can override this using #pragma ACME revision. The first argument is the matching path, either corresponding to the path part of some placeholder or being * to indicate a global setting. The rest of the line is a shell command to get the revision, which is executed with the working directory set to the directory of the first file that matched the path. If no such pragma is found, the default is as if the following line were present:

#pragma ACME revision * git describe --dirty --always
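
Running the revision command in the directory of the matched file could look roughly like the Python sketch below. The function name `extract_revision` is hypothetical, and stripping surrounding whitespace from the output is an assumption. Only the default command comes from the text above.

```python
import os
import subprocess


def extract_revision(matched_file, command="git describe --dirty --always"):
    """Run `command` through the shell in the directory of `matched_file`."""
    cwd = os.path.dirname(os.path.abspath(matched_file))
    result = subprocess.run(command, shell=True, cwd=cwd, check=True,
                            stdout=subprocess.PIPE, universal_newlines=True)
    # Assumption: surrounding whitespace is not part of the revision
    return result.stdout.strip()
```

Because the command goes through the shell, anything that prints a version string works in place of git describe, for example a command that reads a VERSION file.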

{{stats}}

The {{stats:id}} placeholder can be used to print various stats of the final file before it gets written to the disk. The actual behavior is controlled by #pragma ACME stats. The first argument is the identifier matching the id part of the placeholder; the rest of the line is the shell command used to get the stats, with the generated file contents fed to its standard input. If no such pragma is found, no placeholders are replaced. For example, with the following pragma, {{stats:wc-l}} expands to the number of lines in the generated file:

#pragma ACME stats wc-l wc -l

Compared to {{revision}}, which is run in the directory of the corresponding source file, this command is always run in the directory of the output.
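
Feeding the generated contents to a stats command can be sketched in Python as follows. The function name `compute_stat` and the `output_dir` parameter are hypothetical, and trimming the command output is an assumption. The sketch only mirrors the description above.

```python
import subprocess


def compute_stat(contents, command, output_dir="."):
    """Pipe `contents` into the shell `command`, run in `output_dir`."""
    result = subprocess.run(command, shell=True, cwd=output_dir, check=True,
                            input=contents, stdout=subprocess.PIPE,
                            universal_newlines=True)
    # Assumption: padding and trailing newlines are trimmed
    return result.stdout.strip()
```

For the wc -l example, three lines of generated content would yield the string 3 as the replacement for {{stats:wc-l}}.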

Real-world examples

Tests for the above described functionality can be found in a test/ directory next to the acme.py file. Real-world driver file examples, used for generating the Corrade single-header libs, are in the src/singles/ directory.