The naive solution
… would be just writing this damn function:
const char *Colour_print(Colour c) {
switch (c) {
case Red: return "Red";
case Green: return "Green";
case Blue: return "Blue";
case Orange: return "Orange";
case White: return "White";
case Black: return "Black";
case Pink: return "Pink";
case Yellow: return "Yellow";
default: return "Unknown";
}
}
Damn nice.
But what if it changes?…
Look: every time you change your enumeration, this function must be
updated accordingly. For example, if you change Yellow
to
Foo
, you must update return "Yellow";
to
return "Foo";
. To make matters worse, you may have several
such functions scattered among many different places across your
codebase, which is totally not fine. Besides the hassle associated with
updating N different points instead of one (O(N)
vs. O(1)
), you are now in the risk of forgetting to update
some line of code, thereby introducing a software bug that is ready to
crash your working application.
X-Macro
Some old folks may respond to this problem with a technique known as X-Macro:
#define COLOURS \
X(Red) \
X(Green) \
X(Blue) \
X(Orange) \
X(White) \
X(Black) \
X(Pink) \
X(Yellow)
#define X(c) c,
typedef enum { COLOURS } Colour;
#undef X
const char *Colour_print(Colour c) {
switch (c) {
#define X(c) case c: return #c;
COLOURS;
#undef X
default: return "Unknown";
}
}
This is a neat technique because it solves the aforementioned problem
with updating N places at once: now you only need to update the
COLOURS
macro and you can expect all the other places to be
updated automatically by the preprocessor. Going further, if we want to
define several such enumerations without unnecessary boilerplate, we can
generalise our X-Macro: instead of using the X
convention
inside COLOURS
, we can parameterise COLOURS
with the f
parameter, thereby expressing a higher-order
macro:
enum-printable.h
#define ENUM_PRINTABLE(name, list) \
typedef enum { list(DEF_ENUM_VARIANT) } name; \
\
const char *name##_print(name val) { \
switch (val) { \
list(CASE_ENUM_VARIANT) \
default: return "Unknown"; \
} \
}
#define DEF_ENUM_VARIANT(c) c,
#define CASE_ENUM_VARIANT(c) case c: return #c;
colour.c
#include "enum-printable.h"
#define COLOURS(f) \
f(Red) \
f(Green) \
f(Blue) \
f(Orange) \
f(White) \
f(Black) \
f(Pink) \
f(Yellow)
ENUM_PRINTABLE(Colour, COLOURS)
apple.c
#include "enum-printable.h"
#define APPLES(f) \
f(GodenDel) \
f(Winesap) \
f(Jonathan) \
f(Cortland)
ENUM_PRINTABLE(Apple, APPLES)
Much better now! However, we can accomplish the same even without one macro per a user enumeration. Let me show you how.
Macro iteration
Excellent, but how this kind of ENUM_PRINTABLE
is
implemented? If you try to implement it by yourself, you will definitely
be in trouble.
However, in this post, I will not borther you with all the details. Here is the definition:
#include <metalang99.h>
#define ENUM_PRINTABLE(name, ...) \
typedef enum { __VA_ARGS__ } name; \
\
const char *name##_print(name val) { \
switch (val) { \
ML99_EVAL(ML99_variadicsForEach(ML99_reify(v(CASE_ENUM_VARIANT)), \
v(__VA_ARGS__))) \
default: return "Unknown"; \
} \
}
#define CASE_ENUM_VARIANT(c) case c: return #c;
Metalang99 is a
macro metaprogramming library for pure C99 (and C++11) that allows you
to perform macro recursion and iteration. In this code snippet, we
iterate on variadic arguments using ML99_variadicsForEach
in order to generate a pretty-printing function for a particular
enumeration. I developed Metalang99 in response to a number
of use-cases that require macro recursion and iteration; deriving
pretty-printers is only one example of such a majestic superpower.
Why this approach is better than the previous one? It is more natural
to C, easier to write, and causes less confusion at a caller site (“What
is this mysterious f
???”). E.g., you can just copy the
comma-separated enumeration variants and paste them somewhere else, but
with X-Macro you would need to remove the f
invocations and
intersperse commas between the variants in a proper manner.
What are the disadvantages? Now you cannot derive some stuff
for an enum
in a different source file. This is what I call
the locality of definition: all necessary stuff must be defined
in the same place as our enumeration. This might or might not be a
dissapointment for you, but generally speaking, locality of definition
reduces flexibility of code. For example, in Rust, I cannot put my own
derive macro onto some type defined in a third-party library, which
sometimes causes frustration and ass pain.
A program is a sort of publication. It’s meant to be read by the programmer, another programmer (perhaps yourself a few days, weeks or years later), and lastly a machine.
A word about third-party codegen
One may want to suggest using code generators like M4 instead of Metalang99, owing to the huge amount of macro machinery involved in this library and/or incomprehensible compile-time errors (a false statement, we will come back to this later). Sure you can, but think about the consequences; you would then need to either:
- Separate codegen files from C files, or
- Embed special syntax to C files and fuck up IDE support, or
- Write code in comments and fuck up IDE support, or
- Do something else and fuck up IDE support.
Several projects adopted some of the aforementioned solutions (especially 3 and 4). The unfortunate consequences may not seem so severe if you invoke codegen only in a small amount of places, but declaring types, obviously, does not fall into this category of things. Therefore, we need a solution that does not require external machinery, and the only thing in pure C that can do the trick is its macro system.
Now about compilation errors: they are just fine, really. For
example, if we accidentally make a mistake somewhere in the middle of
ENUM_PRINTABLE
:
test.c
#define ENUM_PRINTABLE(name, ...) \
typedef enum { __VA_ARGS__ } name; \
\
const char *name##_print(name val) { \
switch (val) { \
ML99_EVAL(ML99_variadicsForEach_BLAH( \
ML99_reify(v(CASE_ENUM_VARIANT)), v(__VA_ARGS__))) \
default: return "Unknown"; \
} \
}
// More code here...
We would see the following compilation error:
/bin/sh
$ gcc test.c -Imetalang99/include -ftrack-macro-expansion=0
test.c: In function ‘Colour_print’:
test.c:20:1: error: static assertion failed: "invalid term `ML99_variadicsForEach_BLAH( (0args, ML99_reify, (0v, CASE_ENUM_VARIANT)), (0v, Red, Green, Blue, Orange, White, Black, Pink, Yellow))`"
20 | ENUM_PRINTABLE(Colour, Red, Green, Blue, Orange, White, Black, Pink, Yellow)
| ^~~~~~~~~~~~~~
From which we easily conclude that
ML99_variadicsForEach_BLAH
is not really a macro and we
need to change it to the proper ML99_variadicsForEach
,
after which the compilation succeeds. In fact, Metalang99 is built with
developer experience in mind: it is equipped with a built-in syntax
checker and panic invocation facilities, which makes macro development a
much less painful process. For a more elaborated discussion on
side-effects of macros and the nature of the C preprocessor, please see
my blog post “What’s
the Point of the C Preprocessor, Actually?”
Final words
So here are all three solutions that I saw throughout my experience as a C programmer. You can choose an approach to your own liking. Maybe you have learnt something new. Note that the above discussion can be applied not only to pretty-printing, but generally to any piece of code that needs to generate some other code depending on a symbolic representation of enumeration’s values. Cheers!
Links:
- Installation instructions for Metalang99.
- Q: Why use C instead of Rust/Zig/whatever else?
- Q: Why not third-party code generators?
For more information on Metalang99 and derived projects, see “Macros on Steroids, Or: How Can Pure C Benefit From Metaprogramming”.
Appendix: Deriving pretty-printers via Datatype99
Datatype99 is a project derived from Metalang99, which allows you to define enumerations with payloads (a.k.a. algebraic data types (ADTs)). It does also provide a functionality similar to Rust’s derive macros. Let us leverage Datatype99 and see how to achieve pretty-printing through compile-time type introspection:
#include <datatype99.h>
#include <stdio.h>
#define DATATYPE99_DERIVE_Print_IMPL(name, variants) \
ML99_call( \
derivePrint, \
v(name), \
ML99_listMapInPlace(ML99_compose(v(caseEnumVariant), v(ML99_untuple)), v(variants)))
#define derivePrint_IMPL(name, ...) \
v(const char *name##_print(name val) { \
match(val) { \
__VA_ARGS__ \
} \
return "Unknown"; \
})
#define caseEnumVariant_IMPL(tag, _sig) ML99_prefixedBlock(DATATYPE99_of(v(tag)), v(return #tag;))
#define caseEnumVariant_ARITY 1
datatype(
derive(Print),
Colour,
(Red),
(Green),
(Blue),
(Orange),
(White),
(Black),
(Pink),
(Yellow)
);
int main(void) {
printf("Got '%s'!\n", Colour_print(Yellow()));
}
The neat thing is that you can list other derivers together with
Print
like this: derive(Print, Foo, Bar)
,
which adds some extensibility to the code. Note that this implementation
does not account variant parameters (payload); examples/derive/print.c
shows how to handle them. For more information on deriver macros, see my
blog post “Compile-Time
Introspection of Sum Types in Pure C99”.