What's the Point of the C Preprocessor, Actually?

Hirrolot's Blog

Aug 3, 2021

“To do stuff that you cannot do with functions”[1] – virtually every C programmer would say.

Fair. But in fact, it is a bit more complicated.

As my experience with C and programming in general grew, I recognised more and more code patterns that I write by hand… that I am forced to write by hand. Most of the time, this was due to inadequacies of the programming language being used.

This is especially the case with C.

In C, I quite often find myself writing virtual tables and tagged unions by hand: I distribute function pointers to their proper locations, I construct variants explicitly, and I check whether the data and the tag are consistent. In higher-level languages, these concepts are built in; you do not need to spell out a memory layout or the exact instructions that construct it. All of this is hidden behind a nice interface.

We can do it in C too. Take a look at the preprocessor.

Macros to the rescue!

In C, the only way to do metaprogramming is to use macros. Perhaps the reason we could not reify such abstractions as tagged unions until now is that C macros can only work with individual items and cannot operate on sequences thereof; put simply, macros cannot loop or recurse. Without loops or recursion at our disposal, we could not generate something from a series of functions comprising a software interface, or from a series of variants comprising a tagged union. But we can enrich the preprocessor in such a way that it becomes possible. Read on.

Metalang99 is the solution I came up with.

Metalang99 is a language to write macros. Sorry, it is a language to write recursive macros!

What was previously impossible soon became possible:

datatype(
    BinaryTree,
    (Leaf, int),
    (Node, BinaryTree *, int, BinaryTree *)
);

int sum(const BinaryTree *tree) {
    match(*tree) {
        of(Leaf, x) return *x;
        of(Node, lhs, x, rhs) return sum(*lhs) + *x + sum(*rhs);
    }

    return -1;
}

Adapted from Datatype99, a library for tagged unions.

Oh, sorry again, I am a bit sleepy today. I forgot one crucial detail: to make the code above work, you must #include <datatype99.h>. Let me do better this time:

#include <interface99.h>

#define Shape_IFACE                      \
    vfunc( int, perim, const VSelf)      \
    vfunc(void, scale, VSelf, int factor)

interface(Shape);

typedef struct {
    int a, b;
} Rectangle;

int  Rectangle_perim(const VSelf) { /* ... */ }
void Rectangle_scale(VSelf, int factor) { /* ... */ }

impl(Shape, Rectangle);

typedef struct {
    int a, b, c;
} Triangle;

int  Triangle_perim(const VSelf) { /* ... */ }
void Triangle_scale(VSelf, int factor) { /* ... */ }

impl(Shape, Triangle);

Adapted from Interface99, a library for software interfaces.

Everything is correct now.

All you need to make it work is the one-liner #include <interface99.h>.

Believe it or not, this little detail is the exact purpose of the preprocessor. Let me explain: preprocessor macros are embedded into the language for a reason. After all, macros are just a means of code generation, so why can we not generate code with external tools, given that they are often more advanced and so on? Because aside from being “advanced and so on”, they are also less natural.

What is wrong with external codegen?

The thing is that with native macros, you can interleave their invocations with the actual code, the business logic, in the very files where you usually write your code. With third-party code generators, you cannot. You can only fscanf some code from file.blah and fprintf the generated code to generated.h. Okay, suppose you had a ready-made C parser that reads macro invocations of the form X(...) directly from source.c, where X is defined as #define X(...) /* Consume all arguments! */ so as not to break the real compilation. Where would you put the generated code? Please, do not tell me that you are going to fprintf right into source.c! The placement of functions and types matters a great deal in C, so you cannot simply fprintf the generated code for X(...) to generated.h and include it in source.c: the generated code may depend on declarations that precede the invocation, and later code may depend on what it generates, so things would likely break. And no, you cannot just swallow the whole of source.c and output source-generated.c somewhere, because your IDE would then unironically say “goodbye good luck” to you; at the very least, constructions generated by such macros would no longer be visible while you write code.

That is, with third-party code generators, you are forced to separate the files in which you write ordinary code from the files to be fed to the code generator.

With native macros, you write code as usual.

With native macros, you do not violate the normal order in which linguistic constructions cooperate with each other. When you write struct Vect { ... }, you write it in the same file as Vect_add, Vect_remove, and so on. Why, then, should you write datatype(T, ...) in a separate file when it is also a linguistic construction? Elaborating further, why should we treat software interfaces like an alien spacecraft fallen to Earth?

With Datatype99 and Interface99, you generate the stuff in place. Tagged unions and software interfaces are the kind of abstractions that ought to be considered part of the host language, i.e., C. Therefore, they should be treated in the same way as we treat struct, union, functions, and variables.

No, I am not claiming that external codegen is useless. It has applications in a build process and other areas; for example, sometimes it is perfectly fine to separate files (OpenSSL, n.d.). What I am trying to convey is that you should use the right tool for the job. But wait, the suggested libraries rely on some heavy-duty macros, and it is crystal clear that the vanilla C preprocessor is not meant for this kind of abuse, right?

This is the turning point of our spontaneous discussion.

The side effects of aggressive macros

Instead of thinking philosophically, I encourage you to think pragmatically.

Instead of thinking about what is good and what is bad, I encourage you to think about benefits and possible side effects.

The benefits include more concise, safe, clean code.

The side effects might include scary compilation errors and preposterous compilation times.

Not really.

When I started designing Metalang99, I was aware of how metaprogramming can go insane. Metalang99 is an attempt to make it less insane. With some unhealthy curiosity, you might accidentally call Satan, and he will kindly produce gigabytes of error messages for you, dear. Not kidding, I have experienced it myself:

In the above error, I asked the compiler to show a full backtrace of macro expansions. Most of the time, it is just a senseless bedsheet of macro definitions, so I always turn it off with -ftrack-macro-expansion=0 (GCC) or -fmacro-backtrace-limit=1 (Clang).

But how to produce errors that people understand?

This question is out of scope, bla-bla-bla. I will just show you some real errors you can get from Datatype99:

Looks nice?

I know how to break this wonderful world. Look:

playground.c

datatype(A, (Foo, int) ~, (Bar, int));

/bin/sh

$ gcc playground.c -Imetalang99/include -Idatatype99 -ftrack-macro-expansion=0
playground.c:3:1: error: static assertion failed: "invalid term `ML99_PRIV_IF_0 ~(ML99_PRIV_listFromTuplesError, ML99_PRIV_listFromTuplesProgressAux) (DATATYPE99_PRIV_parseVariant, 2, (Foo, int) ~, (Bar, int), ~)`"
    3 | datatype(A, (Foo, int) ~, (Bar, int));
      | ^~~~~~~~

Looks less nice?

Bad news: it is impossible to handle all kinds of errors in macros gracefully. But we do not need to handle all of them. It would be sufficient to handle most of them. Now I shall convince you that even Rust, a language that sells itself on comprehensible errors, sometimes produces complete nonsense:

(Kindly given by Waffle Lapkin.)

(I believe some of them were on stable Rust.)

Even so, most of the time, Rust performs well enough.

Even so, most of the time, Datatype99 & Interface99 perform well enough.

Rust exemplifies perfectly that a system need not be ideal to be practically useful. The same holds for macros: I rarely see complete nonsense from my macros but, whether you like it or not, it might happen. Surely, that is no reason to abandon the whole approach; as you can see, your computer is still there, your terminal did not die under tons of error messages, and all you need to do is look carefully at the macro invocation and perhaps run your compiler with -E [2]. The funny fact is that even in Rust, I have been forced to cargo-expand some macros several times to get a sense of what was wrong, so why is no one saying that Rusty macros are totally unusable?

Regarding compilation times, they are just fine.

Final words

Let me sum up.

The purpose of the preprocessor is to enable seamless integration.

The purpose of the preprocessor is to allow your macros to be conveniently interleaved with the rest of your code.

The purpose of the preprocessor is not to break the normal order in which linguistic abstractions cooperate with each other.

The purpose of the preprocessor is to be natural, and this is what external codegen cannot offer, no matter how hard you try.

References

Drew DeVault. n.d. “Rust Is Not a Good C Replacement.” https://drewdevault.com/2019/03/25/Rust-is-not-a-good-C-replacement.html.
Hirrolot. n.d.a. “A Research Programming Language on Top of C Macros.” https://github.com/Hirrolot/poica.
———. n.d.b. “Datatype99 Code Generation Semantics.” https://github.com/Hirrolot/datatype99#semantics.
———. n.d.c. “FAQ: Why Use C Instead of Rust/Zig/Whatever Else?” https://github.com/Hirrolot/datatype99#q-why-use-c-instead-of-rustzigwhatever-else.
———. n.d.d. “Interface99 Code Generation Semantics.” https://github.com/Hirrolot/interface99#semantics.
OpenSSL. n.d. “A Use Case for Generics in OpenSSL.” https://github.com/openssl/openssl/blob/aff636a4893e24bdc686a00a13ae6199dd38d6aa/include/openssl/safestack.h.in.
Simucal. n.d. “Why Would Anybody Use C over C++?” https://stackoverflow.com/questions/497786/why-would-anybody-use-c-over-c.

  1. I believe that the C preprocessor was initially put into the language as a temporary workaround. With the preprocessor, you can do conditional compilation, foreach-macros, generics, etc. Nowadays, most of this stuff is done by “the right tools” but back in the 70’s, it was unclear how to solve such problems.

  2. -E stands for “preprocess only”. It is supported at least by GCC and Clang but other compilers should have the same option as well (probably under a different name).