The C preprocessor was originally a stand-alone program that the C compiler called to “preprocess” source files before compiling them, hence the name. C has a preprocessor, unlike most other languages, because preprocessors were in general use at Bell Labs at the time, such as M4 and the troff suite of preprocessors.

Modern C and C++ compilers have the preprocessor integrated, though there are often options to control it specifically. For gcc and clang at least, the -E option causes a file to be preprocessed only, which can often be illuminating as to what the preprocessor is doing.

Preprocessing includes:

  • Conditional elimination via #if, #ifdef, etc.
  • File inclusion via #include.
  • Comment replacement.
  • Tokenization.
  • Macro expansion.

    The first four are fairly straightforward; macro expansion, however, is the most complicated (and weird). It’s its own mini-language inside C and C++.

    Unlike either C or C++, the preprocessor’s language is line-based: a preprocessor directive begins with # (which must be the first token on a line) and ends with end-of-line (on Unix systems, the newline character, ASCII decimal 10, written as \n), unless the newline is escaped via \, in which case the directive ends with the first unescaped newline.

    Following the # may be zero or more whitespace characters followed by the directive name (define, if, include, etc.); hence, all the following are equivalent:

    #ifndef NDEBUG
    #   ifndef NDEBUG
        #ifndef NDEBUG
    

    Why would you want an object-like macro defined with zero tokens? Just to indicate that it is defined at all for use with #ifdef, #ifndef, or defined(). The actual definition doesn’t matter.

    Another use is when a macro expands to different tokens depending on the platform via a sequence of #ifdefs. It’s sometimes the case that a particular platform either doesn’t support (or doesn’t need) whatever you’re trying to do; hence, the macro just expands to nothing.
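Both uses can be sketched briefly. The names WITH_TRACE and MAYBE_UNUSED below are hypothetical (neither appears in the original text): the first is defined with zero tokens purely so its existence can be tested, and the second expands to a compiler attribute where one is assumed to be available and to nothing elsewhere.

```c
#include <assert.h>

/* Defined with zero tokens: only the fact that it is defined matters. */
#define WITH_TRACE

/* Expands to an attribute on GNU-compatible compilers, to nothing elsewhere. */
#ifdef __GNUC__
#  define MAYBE_UNUSED  __attribute__((unused))
#else
#  define MAYBE_UNUSED  /* nothing */
#endif

/* Returns 1 if WITH_TRACE was defined when this file was compiled. */
static int trace_enabled( void ) {
#ifdef WITH_TRACE
  return 1;
#else
  return 0;
#endif
}

/* The parameter is deliberately unused; MAYBE_UNUSED silences the warning. */
static int always_zero( MAYBE_UNUSED int ignored ) {
  return 0;
}

static void zero_token_demo( void ) {
  assert( trace_enabled() == 1 );
  assert( always_zero( 7 ) == 0 );
}
```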

    Object-like macros are used for:

  • Program-wide definitions to control conditional compilation via #if, #ifdef, etc.
  • Program-wide constants (such as the above example).
  • Include guards.

    These days, constexpr in C23 can largely replace object-like macros for program-wide constants; the same has been true of constexpr in C++ since C++11.

    Include guards are still necessary to prevent the declarations within a file from being seen by the compiler proper more than once:

    // c_ast.h
    #ifndef CDECL_C_AST_H
    #define CDECL_C_AST_H
    // ...
    #endif /* CDECL_C_AST_H */
    
    By convention, the name of the include guard macro is derived from the name of the header file:

  • It is prefixed by the program name (to help ensure a unique name); and:
  • All letters are converted to upper case; and:
  • All non-identifier characters are converted to an underscore, taking care that the name never starts with _ nor contains __ (double underscore), since such names are reserved identifiers in C and C++.

    You might sometimes see #pragma once (a non-standard, but widely supported directive) in header files as a replacement for include guards, but some compilers are optimized to handle include guards implicitly, so there’s really no reason to use it.

    Yes, the syntax of 0[ARRAY] is legal. It’s a consequence of the quirky interplay between arrays and pointers in C. Briefly, the a[i] syntax to access the ith element of an array a is just syntactic sugar for *(a+i). Since addition is commutative, *(a+i) can be alternatively written as *(i+a); that in turn can be written as i[a]. In C, this has no practical use.

    So why use it here? In C++, using 0[ARRAY] means that trying to use ARRAY_SIZE on an object of a class for which operator[] has been overloaded causes a compilation error, which is what you’d want.
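The macro under discussion is not reproduced in this text. As a minimal sketch, assuming it is defined along the following lines, this shows both the ordinary use of ARRAY_SIZE and the equivalence of a[i] and i[a]:

```c
#include <assert.h>

/* Assumed definition: number of elements in a true array (not a pointer). */
#define ARRAY_SIZE(ARRAY)  ( sizeof(ARRAY) / sizeof( 0[ARRAY] ) )

static void array_size_demo( void ) {
  int primes[] = { 2, 3, 5, 7, 11 };

  /* a[i], *(a+i), *(i+a), and i[a] all name the same element. */
  assert( primes[2] == *(primes + 2) );
  assert( *(2 + primes) == 2[primes] );

  assert( ARRAY_SIZE( primes ) == 5 );
}
```

Note that the macro only gives a meaningful answer for an actual array; passing a pointer compiles in C but yields the wrong value.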

    Unlike either a C or C++ function, when defining a function-like macro, the ( that follows the macro’s name must be adjacent, i.e., not have any whitespace between them. (If there is whitespace between them, then it’s an object-like macro where the first character of the replacement list is (.)
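The adjacency rule can be seen in a small sketch. DOUBLE and THREE are hypothetical names chosen for illustration; the only difference in how they are parsed is the whitespace before the (.

```c
#include <assert.h>

/* Function-like: the ( is adjacent to the name, so X is a parameter. */
#define DOUBLE(X)   ( (X) + (X) )

/* Object-like: the space makes "(1 + 2)" the entire replacement list. */
#define THREE (1 + 2)

static void adjacency_demo( void ) {
  assert( DOUBLE( 4 ) == 8 );
  assert( THREE == 3 );
  assert( THREE * 2 == 6 );     /* expands to (1 + 2) * 2 */
}
```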

    So if the comparison selects the argument that increments i, then i will be incremented twice! Historically, there’s been no way to fix this other than simply never passing expressions having side effects as macro arguments.

    This is a great example of why the convention is that macro names are generally written in all UPPER CASE: to draw attention to the fact that a name is a macro and the normal C/C++ rules do not apply.

    The solution really is to use an inline function instead of a macro.
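To make the double evaluation concrete, here is a minimal sketch. MAX_MACRO and max_int are hypothetical names (the macro this passage originally discussed is not reproduced in the text); the point is only that a macro parameter appearing twice in the replacement list is evaluated twice, whereas a function argument is evaluated once.

```c
#include <assert.h>

/* Hypothetical macro: each parameter appears twice in the replacement list. */
#define MAX_MACRO(A,B)  ( (A) > (B) ? (A) : (B) )

/* Function alternative: each argument is evaluated exactly once. */
static inline int max_int( int a, int b ) {
  return a > b ? a : b;
}

static void side_effect_demo( void ) {
  int i = 5;
  int m = MAX_MACRO( 3, ++i );  /* ( (3) > (++i) ? (3) : (++i) ) */
  assert( i == 7 );             /* incremented twice */
  assert( m == 7 );

  i = 5;
  m = max_int( 3, ++i );
  assert( i == 6 );             /* incremented once */
  assert( m == 6 );
}
```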

    Above, the > token is a perfectly valid preprocessor argument even though it would be a syntax error as an ordinary C function argument. Remember: the normal C/C++ rules do not apply to the preprocessor.

    Macros employing such “anything goes” arguments should generally be avoided, but they are used for “stringification” (see below).
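As an illustration of an “anything goes” argument, the hypothetical COMPARE macro below (not from the original text) accepts an operator token as its middle argument, something no C function could do:

```c
#include <assert.h>

/* Hypothetical macro: OP may be any token sequence, e.g., an operator. */
#define COMPARE(A,OP,B)  ( (A) OP (B) )

static void operator_argument_demo( void ) {
  assert( COMPARE( 3, >, 2 ) );
  assert( !COMPARE( 3, ==, 2 ) );
  assert( COMPARE( 1 + 1, <=, 2 ) );
}
```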

    Starting with C99, function-like macros can take a variable number of arguments (including zero). Such a macro is known as variadic. To declare such a macro, the last (or only) parameter is ... (an ellipsis).

    For example, given a fatal_error() variadic function that prints an error message formatted in a printf()-like way and exit()s with a status code, you might want to wrap that with an INTERNAL_ERROR() macro that includes the file and line whence the error came:

    _Noreturn void fatal_error( int status,
                                char const *format, ... );

    #define INTERNAL_ERROR(FORMAT,...)    \
      fatal_error( EX_SOFTWARE,           \
        "%s:%d: internal error: " FORMAT, \
        __FILE__, __LINE__, __VA_ARGS__   \
      )
    

    It’s common practice both to break up long replacement lists into multiple lines and to align the \s for readability.

    Also, when a macro expands into a C or C++ statement, the replacement list must not end with a ; — it will be provided by the caller of the macro.

    The __VA_ARGS__ token expands into everything that was passed for the second (in this case) and subsequent arguments (if any) and the commas that separate them.

    In C and C++, adjacent string literals are concatenated into a single string which is how the “internal error” part gets prepended to FORMAT. (This is an example of a rare instance where a macro parameter is intentionally not enclosed within parentheses.)
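The same two mechanisms, __VA_ARGS__ and string literal concatenation, can be tried in isolation with a smaller sketch. FORMAT_AT_LINE is a hypothetical macro that writes into a caller-supplied buffer instead of exiting, so its result can be inspected; at least one argument must follow FORMAT.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical macro: formats a message prefixed by the current line number
 * into BUF, which must be a char array (not a pointer) for sizeof to work.
 */
#define FORMAT_AT_LINE(BUF,FORMAT,...)    \
  snprintf( (BUF), sizeof (BUF),          \
    "line %d: " FORMAT,                   \
    __LINE__, __VA_ARGS__                 \
  )

static void variadic_demo( void ) {
  char buf[64];
  FORMAT_AT_LINE( buf, "x=%d y=%d", 1, 2 );

  /* The literals "line %d: " and "x=%d y=%d" were joined into one format. */
  assert( strncmp( buf, "line ", 5 ) == 0 );
  assert( strstr( buf, ": x=1 y=2" ) != NULL );
}
```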

    A keen observer might notice a possible problem with the INTERNAL_ERROR macro, specifically this line:

        __FILE__, __LINE__, __VA_ARGS__   \
    

    What happens if the macro is called passing zero additional arguments? Then __VA_ARGS__ would expand into nothing and the , after __LINE__ would cause a syntax error since C functions don’t accept blank arguments. What’s needed is a way to include the , in the expansion only if __VA_ARGS__ is not empty.

    Starting in C23 and C++20, that’s precisely what __VA_OPT__ does. You can rewrite the macro using it:

    #define INTERNAL_ERROR(FORMAT,...)    \
      fatal_error( EX_SOFTWARE,           \
        "%s:%d: internal error: " FORMAT, \
        __FILE__, __LINE__                \
        __VA_OPT__(,) __VA_ARGS__         \
      )
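A smaller sketch shows the effect of __VA_OPT__ with and without additional arguments. FORMAT_MSG is a hypothetical macro, and the example assumes a compiler that supports __VA_OPT__ (C23 or C++20 mode, or an earlier mode where it is accepted as an extension).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical macro; the comma is emitted only if arguments follow FORMAT. */
#define FORMAT_MSG(BUF,FORMAT,...)        \
  snprintf( (BUF), sizeof (BUF),          \
    "msg: " FORMAT                        \
    __VA_OPT__(,) __VA_ARGS__             \
  )

static void va_opt_demo( void ) {
  char buf[32];

  FORMAT_MSG( buf, "hello" );           /* no comma is emitted */
  assert( strcmp( buf, "msg: hello" ) == 0 );

  FORMAT_MSG( buf, "n=%d", 42 );        /* comma is emitted */
  assert( strcmp( buf, "msg: n=42" ) == 0 );
}
```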
    
    Now suppose the macro were instead written to expand into two statements, a call to fprintf() followed by a call to exit():

    #define INTERNAL_ERROR(FORMAT, ...)    \
      fprintf( stderr,                     \
        "%s:%d: internal error: " FORMAT,  \
        __FILE__, __LINE__                 \
        __VA_OPT__(,) __VA_ARGS__          \
      );                                   \
      exit( EX_SOFTWARE )
    

    If a user invokes the macro as the body of an if without using {}, only the fprintf() is executed conditionally and the exit() is always executed because it’s a separate statement.

    The common way to fix this is by always enclosing multiple statements in a replacement list within a do ... while loop:

    #define INTERNAL_ERROR(FORMAT, ...)      \
      do {                                   \
        fprintf( stderr,                     \
          "%s:%d: internal error: " FORMAT,  \
          __FILE__, __LINE__                 \
          __VA_OPT__(,) __VA_ARGS__          \
        );                                   \
        exit( EX_SOFTWARE );                 \
      } while (0)
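The difference can be observed directly with a minimal sketch in which two counters stand in for fprintf() and exit(). REPORT_UNSAFE and REPORT_SAFE are hypothetical macros written only for this illustration.

```c
#include <assert.h>

static int logged, aborted;     /* stand-ins for fprintf() and exit() */

/* Two statements: only the first is governed by an unbraced if. */
#define REPORT_UNSAFE()       \
  ++logged;                   \
  ++aborted

/* Wrapped in do ... while (0): behaves as a single statement. */
#define REPORT_SAFE()         \
  do {                        \
    ++logged;                 \
    ++aborted;                \
  } while (0)

static void multi_statement_demo( void ) {
  int const error = 0;          /* no error occurred */

  logged = aborted = 0;
  if ( error )
    REPORT_UNSAFE();            /* expands to: ++logged; ++aborted; */
  assert( logged == 0 );
  assert( aborted == 1 );       /* ran even though error is 0 */

  logged = aborted = 0;
  if ( error )
    REPORT_SAFE();
  assert( logged == 0 );
  assert( aborted == 0 );
}
```

The do ... while (0) form also still requires the caller to supply the terminating ;, so the macro reads like an ordinary statement.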
    

    Specifically, # followed by a parameter name stringifies the set of tokens comprising the corresponding argument for that parameter.

    Some of the weird things about the C preprocessor include:

  • More than one token can comprise an argument.
  • An argument’s leading and trailing whitespace is eliminated.
  • Intervening whitespace between an argument’s tokens (if it has more than one) is collapsed to a single space.

    For example:

    STRINGIFY(a b)        // "a b"
    STRINGIFY( a b )      // "a b"
    STRINGIFY(a   b)      // "a b"
    
    #define assert(EXPR) ((EXPR) ? (void)0 : \
      __assert( __func__, __FILE__, __LINE__, #EXPR ))
    assert( p != NULL );
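The whitespace rules can be checked with a short sketch. It assumes STRINGIFY is defined as a single # applied to its parameter (the definition is not shown in this text), and FAILED_TEXT is a hypothetical, much simplified stand-in for what assert does with #EXPR.

```c
#include <assert.h>
#include <string.h>

/* Assumed definition matching the STRINGIFY examples. */
#define STRINGIFY(X)  #X

/* Hypothetical: yields the text of EXPR if it is false, NULL otherwise. */
#define FAILED_TEXT(EXPR)  ( (EXPR) ? NULL : #EXPR )

static void stringify_demo( void ) {
  char const *s1 = STRINGIFY(a b);
  char const *s2 = STRINGIFY( a b );
  char const *s3 = STRINGIFY(a   b);
  assert( strcmp( s1, "a b" ) == 0 );
  assert( strcmp( s2, "a b" ) == 0 );   /* leading and trailing removed */
  assert( strcmp( s3, "a b" ) == 0 );   /* intervening collapsed */

  int const n = 3;
  char const *text = FAILED_TEXT( n   >  5 );
  assert( text != NULL && strcmp( text, "n > 5" ) == 0 );

  char const *none = FAILED_TEXT( n == 3 );
  assert( none == NULL );
}
```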
    
  • When there are two or more parameters, arguments can be omitted resulting in empty arguments.

    For example:

    PASTE(,)              // (nothing)
    PASTE(foo,)           // foo
    PASTE(,bar)           // bar
    

    What you want is a result like var_42, that is, the prefix var_ followed by the current line number. (Such names are typically used to help ensure unique names.) The problem is that, while PASTE replaces its parameter B with the argument __LINE__, an argument that is an operand of ## is not itself expanded, even if it’s a macro that ordinarily would expand (in this case, to the current line number).

    Fixing it (as with many other problems in software) requires an extra level of indirection:

    #define PASTE_HELPER(A,B)  A ## B
    #define PASTE(A,B)         PASTE_HELPER(A,B)
    PASTE(var_, __LINE__)      // var_42
    

    This fixes the problem because __LINE__ will be expanded by PASTE (because it’s not an argument of ##) and then the result of that expansion (the current line number, here, 42) will be passed to PASTE_HELPER that will just concatenate it as-is.

    The same indirection fix can be used with # when necessary as well.
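The indirection can be sketched for both operators. To keep the results predictable, the example uses a hypothetical ANSWER macro in place of __LINE__; the _DIRECT variants are also hypothetical and exist only to show what happens without the helper.

```c
#include <assert.h>
#include <string.h>

#define ANSWER  42                      /* hypothetical macro to be expanded */

#define PASTE_DIRECT(A,B)       A ## B  /* operands of ## are not expanded */
#define PASTE_HELPER(A,B)       A ## B
#define PASTE(A,B)              PASTE_HELPER(A,B)

#define STRINGIFY_DIRECT(X)     #X      /* the operand of # is not expanded */
#define STRINGIFY_HELPER(X)     #X
#define STRINGIFY(X)            STRINGIFY_HELPER(X)

static void indirection_demo( void ) {
  int PASTE_DIRECT(var_, ANSWER) = 1;   /* declares var_ANSWER */
  int PASTE(var_, ANSWER) = 2;          /* declares var_42 */
  assert( var_ANSWER == 1 );
  assert( var_42 == 2 );

  char const *direct   = STRINGIFY_DIRECT(ANSWER);
  char const *indirect = STRINGIFY(ANSWER);
  assert( strcmp( direct, "ANSWER" ) == 0 );
  assert( strcmp( indirect, "42" ) == 0 );
}
```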

    There are a couple more weird things about the C preprocessor, specifically that a macro will not expand if either:

  • It references itself (either directly or indirectly); or:
  • A use of a function-like macro is not followed by (.

    An example of a self-referential macro is:

    #define nullptr nullptr
    

    What use is that? It makes a name available to the preprocessor that can be tested via #ifdef.
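The same idiom is sometimes applied to enumeration constants, which are otherwise invisible to the preprocessor. A minimal sketch, using a hypothetical enum color that is not from the original text:

```c
#include <assert.h>

/* Hypothetical enumeration; plain enumerators are invisible to #ifdef. */
enum color { COLOR_RED, COLOR_GREEN };

/* Self-referential: expands to itself once, then stops. */
#define COLOR_GREEN COLOR_GREEN

static int green_is_visible_to_ifdef( void ) {
#ifdef COLOR_GREEN
  return 1;
#else
  return 0;
#endif
}

static int red_is_visible_to_ifdef( void ) {
#ifdef COLOR_RED
  return 1;
#else
  return 0;
#endif
}

static void self_reference_demo( void ) {
  assert( COLOR_GREEN == 1 );           /* still the enumerator's value */
  assert( green_is_visible_to_ifdef() == 1 );
  assert( red_is_visible_to_ifdef() == 0 );
}
```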

    What about a use of a function-like macro that is not followed by (? There are no simple examples I’m aware of so a complicated example is a story for another time.

    Because EMPTY is defined to have zero tokens (comments don’t count), when EMPTY is expanded into AVOID1, you’d think that nothing should be there — except the left - and right - would then come together and form -- (a different token of the minus-minus operator) so the preprocessor inserts a space between them to preserve the original, separate - tokens.
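The macros being described are not reproduced in this text. A minimal sketch, assuming definitions along the following lines, shows that the two - tokens remain separate once the code is compiled. (The inserted space is only visible in textual output such as that of -E; in the compiler's token stream the tokens were never joined.)

```c
#include <assert.h>

#define EMPTY                   /* zero tokens */
#define AVOID1  -EMPTY-         /* assumed definition, per the description */

static void paste_avoidance_demo( void ) {
  int x = 5;
  int y = AVOID1 x;             /* two unary minuses: - - x, never --x */
  assert( y == 5 );
  assert( x == 5 );             /* x was not decremented */
}
```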

    The preprocessor largely doesn’t care whether it’s preprocessing C or C++ code — except when it comes to paste avoidance. For example:

    #define AVOID2(X)  X*
    AVOID2(->)         // in C  : ->*
    AVOID2(->)         // in C++: -> *
    
  1. All macros are in the global scope, meaning you have to choose names that won’t clash with standard names nor names used in third-party packages.
  2. As shown, function-like macro parameters should always be enclosed in parentheses (except when arguments to either # or ##) to ensure the desired precedence.
  3. As shown, function-like macro arguments should not have side effects.
  4. As shown, multi-line macros should be enclosed in do ... while loops.
  5. Since multiline macros using escaped newlines are joined into a single, long line, errors in such lines are hard to read.
  6. Complicated macros are hard to debug because you only ever see the end result of the expansion. There’s no way to see how expansion progresses step-by-step.

    Problems 2-5 go away by using either constexpr expressions or inline or constexpr (in C++) functions instead of function-like macros. However, it’s harder to use inline functions in C if the parameter types can vary since C doesn’t have templates.

    C11 offers _Generic that provides a veneer of function overloading. Ironically, it requires a function-like macro to use it.
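As a minimal sketch of that combination, the hypothetical GENERIC_MAX macro below dispatches to one of two inline functions based on the type of (A) + (B). The function names and the choice of controlling expression are assumptions made for this illustration.

```c
#include <assert.h>

static inline int    max_int   ( int a, int b )       { return a > b ? a : b; }
static inline double max_double( double a, double b ) { return a > b ? a : b; }

/* Hypothetical macro: selects a function based on the type of (A) + (B). */
#define GENERIC_MAX(A,B)          \
  _Generic( (A) + (B),            \
    int   : max_int,              \
    double: max_double            \
  )( (A), (B) )

static void generic_demo( void ) {
  assert( GENERIC_MAX( 2, 3 ) == 3 );
  assert( GENERIC_MAX( 2.5, 1 ) == 2.5 );   /* int + double is double */

  /* The controlling expression of _Generic is not evaluated, so ++i */
  /* happens only once, in the call to the selected function.        */
  int i = 5;
  int const m = GENERIC_MAX( 3, ++i );
  assert( m == 6 );
  assert( i == 6 );
}
```

Because the arguments are passed to a real function, the side-effect problem of a plain MAX-style macro does not arise here.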

    I decided to solve problem 6 myself by adding a feature to cdecl that allows you to #define macros as usual and then expand them where cdecl will print the expansion step-by-step as well as warn about things you might not expect. However, that’s a story for another time.

