While implementing tag rendering for this static blog generation
engine, I ran into a number of issues around quoting, and specifically
around using a comma-separated list as a data structure. This surprises
no one, since m4 is famously prickly. I felt up to the challenge,
though, and I thought I'd summarize here a few recurring problems and
their solutions.
Enable debugging output
Setting debugmode(V) turns on all the tracing features. This input:
debugmode(V)
define(`foo',7)
define(`bar',`foo')
bar
...becomes:
m4trace:stdin:2: -1- id 2: define ...
m4trace:stdin:2: -1- id 2: define(`foo', `7') -> ???
m4trace:stdin:2: -1- id 2: define(...)
m4trace:stdin:3: -1- id 3: define ...
m4trace:stdin:3: -1- id 3: define(`bar', `foo') -> ???
m4trace:stdin:3: -1- id 3: define(...)
m4trace:stdin:4: -1- id 4: bar ...
m4trace:stdin:4: -1- id 4: bar -> ???
m4trace:stdin:4: -1- id 4: bar -> foo
m4trace:stdin:4: -1- id 5: foo ...
m4trace:stdin:4: -1- id 5: foo -> ???
m4trace:stdin:4: -1- id 5: foo -> 7
7
m4debug:stdin:5: input exhausted
Every expansion gets its own id (they can interleave), and distinct
lines show when a macro is encountered, expanded, and concluded.
This can also be enabled globally by passing --debug=V
on the command line.
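When the full V output is too noisy, tracing can be narrowed to a single macro. The following is a minimal sketch, assuming GNU m4; the flag letters a, e and q (arguments, expansion text, quoting) and the traceon/traceoff builtins come from the GNU manual, not from the examples above:

```m4
dnl Show arguments (a), expansion text (e), and quotes (q) in trace lines.
debugmode(`aeq')
define(`foo',7)
define(`bar',`foo')
dnl Trace only bar; foo still expands, but silently.
traceon(`bar')
bar
traceoff(`bar')
```

With this input, only the expansion of bar should show up in the trace on stderr, while the 7 still goes to stdout.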
Also, know that your editor can probably help with matching quotes and
braces. Emacs' m4 mode knows that ` and ' match, and enabling
show-paren-mode gives a warning (special highlighting) on a mismatch.
[Screenshots: a good match, and a mismatch]
Grokking m4's recursion idiom
The m4 info manual documents this fairly well, although I would
have struggled much more with it if recursion weren't also
bread-and-butter in Erlang.
define(`double',`ifelse($#,0,,$#,1,`eval($1 * 2)',`eval($1 * 2),double(shift($@))')')
double(7,12,19,31)
...results in:
14,24,38,62
The key element is the ifelse, whose leading arguments handle the
special cases of zero arguments or one argument, while the final
argument handles the first element and then recurses on the remaining
ones (shift($@)). With this idiom you can accomplish anything you
might, in another language, use a for or foreach loop for.
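The same three-branch shape carries over to other folds over an argument list. As an illustration only (sum is a hypothetical macro name, not part of the blog engine), here is a sketch that adds its arguments instead of doubling them:

```m4
dnl sum(a, b, ...): add all arguments using the same ifelse/shift recursion.
dnl   no arguments  -> 0
dnl   one argument  -> that argument, evaluated
dnl   otherwise     -> first argument plus the sum of the rest
define(`sum',`ifelse($#,0,0,$#,1,`eval($1)',`eval($1 + sum(shift($@)))')')dnl
sum(7,12,19,31)
```

This prints 69. The recursive call sits inside the quoted final argument, so it is only expanded once ifelse has actually selected that branch.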
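Note that the last arguments to ifelse above must be quoted. m4
expands unquoted arguments while it is still collecting them, before
ifelse gets to compare anything, so the recursive call is expanded
unconditionally, resulting in infinite recursion (marked by a stack
overflow):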
define(`double',`ifelse($#,0,,$#,1,eval($1 * 2),eval($1 * 2),double(shift($@)))')
double(7,12,19,31)
...
m4:stdin:2: bad expression in eval: * 2
m4:stdin:2: bad expression in eval: * 2
m4:stdin:2: bad expression in eval: * 2
m4:stdin:2: bad expression in eval: * 2
m4:stdin:2: bad expression in eval: * 2
m4:stdin:2: bad expression in eval: * 2
m4: stack overflow
Problem: accumulate a list
I needed something like an append() macro to add a new item to a
comma-separated list, which could later be iterated repeatedly. It
seemed natural at first to use diversions, since they are pitched as a
gradual accumulator of text. This doesn't work, though, because
diverted text is never expanded again. Once text has been diverted to,
say, buffer 4, all you can do with it is undivert it into other
buffers, and eventually into the output (divert(0) followed by
undivert(4)). It's more accurate to say that a diversion is good for
accumulating fully-expanded output text only.
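A small sketch of that limitation, assuming GNU m4 (greeting is a hypothetical macro used only for this illustration): text is expanded on its way into the diversion, so redefining the macro afterwards has no effect on what was already stored.

```m4
define(`greeting',`hello')dnl
divert(1)dnl
greeting, expanded while being diverted
divert(0)dnl
define(`greeting',`goodbye')dnl
undivert(1)dnl
greeting, expanded in the main stream
```

The first output line reads "hello, expanded while being diverted" and the second "goodbye, expanded in the main stream": buffer 1 holds finished text, not something that can be re-read as a list.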
The solution is to append by redefining a macro to have a new
element with each call. Here's a reasonable first attempt:
debugmode(V)
define(`append',`ifdef(`mylist',`define(`mylist',mylist`,$1')',`define(`mylist',$1)')')
append(4)
append(7)
append(2)
But, uh-oh:
m4trace:stdin:2: -1- id 2: define ...
m4trace:stdin:2: -1- id 2: define(`append', `ifdef(`mylist',`define(`mylist',mylist`,$1')',`define(`mylist',$1)')') -> ???
m4trace:stdin:2: -1- id 2: define(...)
m4trace:stdin:3: -1- id 3: append ...
m4trace:stdin:3: -1- id 3: append(4) -> ???
m4trace:stdin:3: -1- id 3: append(...) -> ifdef(`mylist',`define(`mylist',mylist`,4')',`define(`mylist',4)')
m4trace:stdin:3: -1- id 4: ifdef ...
m4trace:stdin:3: -1- id 4: ifdef(`mylist', `define(`mylist',mylist`,4')', `define(`mylist',4)') -> ???
m4trace:stdin:3: -1- id 4: ifdef(...) -> `define(`mylist',4)'
m4trace:stdin:3: -1- id 5: define ...
m4trace:stdin:3: -1- id 5: define(`mylist', `4') -> ???
m4trace:stdin:3: -1- id 5: define(...)
m4trace:stdin:4: -1- id 6: append ...
m4trace:stdin:4: -1- id 6: append(7) -> ???
m4trace:stdin:4: -1- id 6: append(...) -> ifdef(`mylist',`define(`mylist',mylist`,7')',`define(`mylist',7)')
m4trace:stdin:4: -1- id 7: ifdef ...
m4trace:stdin:4: -1- id 7: ifdef(`mylist', `define(`mylist',mylist`,7')', `define(`mylist',7)') -> ???
m4trace:stdin:4: -1- id 7: ifdef(...) -> `define(`mylist',mylist`,7')'
m4trace:stdin:4: -1- id 8: define ...
m4trace:stdin:4: -2- id 9: mylist ...
m4trace:stdin:4: -2- id 9: mylist -> ???
m4trace:stdin:4: -2- id 9: mylist -> 4
m4trace:stdin:4: -1- id 8: define(`mylist', `4,7') -> ???
m4trace:stdin:4: -1- id 8: define(...)
m4trace:stdin:5: -1- id 10: append ...
m4trace:stdin:5: -1- id 10: append(2) -> ???
m4trace:stdin:5: -1- id 10: append(...) -> ifdef(`mylist',`define(`mylist',mylist`,2')',`define(`mylist',2)')
m4trace:stdin:5: -1- id 11: ifdef ...
m4trace:stdin:5: -1- id 11: ifdef(`mylist', `define(`mylist',mylist`,2')', `define(`mylist',2)') -> ???
m4trace:stdin:5: -1- id 11: ifdef(...) -> `define(`mylist',mylist`,2')'
m4trace:stdin:5: -1- id 12: define ...
m4trace:stdin:5: -2- id 13: mylist ...
m4trace:stdin:5: -2- id 13: mylist -> ???
m4trace:stdin:5: -2- id 13: mylist -> 4,7
m4trace:stdin:5: -1- id 12: define(`mylist', `4', `7,2') -> ???
m4:stdin:5: Warning: excess arguments to builtin `define' ignored
m4trace:stdin:5: -1- id 12: define(...)
m4debug:stdin:7: input exhausted
The first and second elements are added nicely, but when the third
element is added, the preceding definition (4,7) has a comma and is interpreted
as separate arguments to define. We're having fun now!
Here's a more concise version, but it has the same problem:
define(`append',`define(`mylist',ifdef(`mylist',mylist`,$1',`$1'))')
After many attempts with different quote placement, I found a solution with changequote.
changequote lets you define different quote characters, and in this case we can
use it to make that comma-separated list look like a single argument to define:
define(`append',`define(`mylist',ifdef(`mylist',`[changequote([,])mylist[,$1]changequote(`,')]',[$1]))')
With output:
...
m4trace:stdin:7: -1- id 13: define(`mylist', `[4,7,2]') -> ???
...
Success, although our list is now wrapped in [ and ], which will require unwrapping later.
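For comparison, there is a different approach that avoids the bracket wrapper: keep the commas protected with an extra level of ordinary quotes, and read the old value back with defn, which returns a definition quoted instead of expanded. This is a sketch of an alternative, assuming GNU m4's defn; it is not the approach this post goes on to use:

```m4
dnl Alternative append: defn yields the old list as one quoted string, and
dnl the new comma is quoted too, so define still sees a single argument.
define(`append',`define(`mylist',ifdef(`mylist',`defn(`mylist')`,$1'',``$1''))')dnl
append(4)dnl
append(7)dnl
append(2)dnl
mylist
```

This prints 4,7,2. Because the stored value has no wrapper, a call such as double(mylist) would see 4, 7 and 2 as three separate arguments.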
Problem: passing a comma-separated list as one argument or n arguments
Let's try passing our new, bracket-wrapped list to our 'double'
macro from earlier. Another temporary changequote is required
to strip the brackets off:
define(`append',`define(`mylist',ifdef(`mylist',`[changequote([,])mylist[,$1]changequote(`,')]',[$1]))')
append(4)
append(7)
append(2)
debugmode(V)
define(`double',`ifelse($#,0,,$#,1,`eval($1 * 2)',`eval($1 * 2),double(shift($@))')')
double(changequote([,])mylist[]changequote(`,'))
Sadly, double accepts the list as a single argument, which expands badly later:
m4trace:stdin:8: -1- id 19: double(4,7,2) -> ???
m4trace:stdin:8: -1- id 19: double(...) -> ifelse(1,0,,1,1,`eval(4,7,2 * 2)',`eval(4,7,2 * 2),double(shift(`4,7,2'))')
m4trace:stdin:8: -1- id 23: ifelse ...
m4trace:stdin:8: -1- id 23: ifelse(`1', `0', `', `1', `1', `eval(4,7,2 * 2)', `eval(4,7,2 * 2),double(shift(`4,7,2'))') -> ???
m4trace:stdin:8: -1- id 23: ifelse(...) -> `eval(4,7,2 * 2)'
m4trace:stdin:8: -1- id 24: eval ...
m4trace:stdin:8: -1- id 24: eval(`4', `7', `2 * 2') -> ???
m4:stdin:8: non-numeric argument to builtin `eval'
The solution is to use a gateway macro, which accepts the single
argument but passes $1, causing the single string to be interpreted as
a list again by the back-end macro. I've renamed the original double
to r_double, which calls out that it's recursive and looks more
private (we only want to call double directly):
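define(`append',`define(`mylist',ifdef(`mylist',`[changequote([,])mylist[,$1]changequote(`,')]',[$1]))')
append(4)
append(7)
append(2)
define(`r_double',`ifelse($#,0,,$#,1,`eval($1 * 2)',`eval($1 * 2),r_double(shift($@))')')
define(`double',`r_double($1)')
double(changequote([,])mylist[]changequote(`,'))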
...which produces the expected output:
8,14,4
Double changequotes
What if we move our final changequote call inside the recursive
macro, "just to be sure" that the quotes are always ` and '?
define(`r_double',`changequote(`,')ifelse($#,0,,$#,1,`eval($1 * 2)',`eval($1 * 2),r_double(shift($@))')')
define(`double',`r_double($1)')
double(changequote([,])mylist[])
m4trace:stdin:9: -1- id 24: changequote ...
m4trace:stdin:9: -1- id 24: changequote([`], [']) -> ???
m4trace:stdin:9: -1- id 24: changequote(...)
m4trace:stdin:9: -1- id 25: ifelse ...
m4trace:stdin:9: -1- id 25: ifelse(`3', `0', `', `3', `1', `eval(4 * 2)', `eval(4 * 2),r_double(shift([4],[7],[2]))') -> ???
m4trace:stdin:9: -1- id 25: ifelse(...) -> `eval(4 * 2),r_double(shift([4],[7],[2]))'
m4trace:stdin:9: -1- id 26: eval ...
m4trace:stdin:9: -1- id 26: eval(`4 * 2') -> ???
m4trace:stdin:9: -1- id 26: eval(...) -> `8'
8,m4trace:stdin:9: -1- id 27: r_double ...
m4trace:stdin:9: -2- id 28: shift ...
m4trace:stdin:9: -2- id 28: shift(`[4]', `[7]', `[2]') -> ???
m4trace:stdin:9: -2- id 28: shift(...) -> ``[7]',`[2]''
m4trace:stdin:9: -1- id 27: r_double([7], [2]) -> ???
m4trace:stdin:9: -1- id 27: r_double(...) -> changequote(`,')ifelse(2,0,,2,1,`eval([7] * 2)',`eval([7] * 2),r_double(shift(`[7]',`[2]'))')
m4trace:stdin:9: -1- id 29: changequote ...
m4trace:stdin:9: -1- id 29: changequote(`,') -> ???
m4trace:stdin:9: -1- id 29: changequote(...)
m4trace:stdin:9: -1- id 30: ifelse ...
m4debug:stdin:10: input exhausted
m4:stdin:9: ERROR: end of file in string
It's not immediately clear what went wrong, but consider this simplest case:
debugmode(V)
changequote([,])
changequote(`,')
changequote(`,')
apple,ball,cat
m4trace:stdin:2: -1- id 2: changequote ...
m4trace:stdin:2: -1- id 2: changequote(`[', `]') -> ???
m4trace:stdin:2: -1- id 2: changequote(...)
m4trace:stdin:3: -1- id 3: changequote ...
m4trace:stdin:3: -1- id 3: changequote([`], [']) -> ???
m4trace:stdin:3: -1- id 3: changequote(...)
m4trace:stdin:4: -1- id 4: changequote ...
m4trace:stdin:4: -1- id 4: changequote(`,') -> ???
m4trace:stdin:4: -1- id 4: changequote(...)
applem4debug:stdin:7: input exhausted
m4:stdin:5: ERROR: end of file in string
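See it? When the quotes are already ` and ', the text `,' is itself a
quoted string, so the subsequent changequote receives the single
argument "," and makes the comma the opening quote character (the
closing quote stays '). Everything after "apple" is then swallowed as
an unterminated string. A double changequote in m4 is equivalent to a
double-free in C, and as dangerous.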
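One way to sidestep the hazard is to restore with the bare word. This relies on GNU m4's documented behavior that changequote called with no arguments resets the quotes to the default ` and '. A sketch (fruit is a hypothetical macro for illustration):

```m4
changequote([,])dnl
define([fruit],[apple])dnl
dnl A bare changequote (no parentheses) resets the quotes to the defaults.
changequote
dnl Repeating it is harmless, unlike a second changequote with arguments.
changequote
fruit and `fruit'
```

After two blank lines (one for each bare changequote line), this prints "apple and fruit". The second reset does nothing because no argument is ever parsed under the wrong quote characters.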
Diversion about diversions (and bad naming)
undivert() is the opposite of divert(), right? Not really: divert
configures where future output goes, while undivert is an immediate
flush of the named buffer, to wherever output is currently directed.
You could think of output as a one-way pipe. Use divert to
slide the right-hand side buffer list below. Input to the pipe is stdin by
default, unless you use undivert to quickly flush a buffer into
the pipe (and then immediately return to reading input from stdin).
Here's the result of 'divert(0) undivert(3)':
stdin
buffer 1
buffer 2               buffer -1 (discard)
buffer 3     ====>     buffer 0 (stdout)
buffer ...             buffer 1
                       buffer 2
                       buffer 3
                       ...
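The pipe picture can be checked with a few lines of input. This is a minimal sketch, assuming GNU m4, in which text is parked in buffers 3 and 1 and only buffer 3 is flushed explicitly:

```m4
divert(3)dnl
third
divert(1)dnl
first
divert(0)dnl
main
undivert(3)dnl
dnl At end of input, m4 flushes whatever is still diverted (here buffer 1).
```

The output is main, third, first, one per line: "main" goes straight to stdout, undivert(3) flushes buffer 3 into the current output immediately, and buffer 1 only appears when m4 drains the remaining diversions at end of input.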
Summary
m4 is powerful but requires precise thinking, a high threshold for
pain, and good debugging output. Hopefully this has been helpful to
someone encountering one of the problems above.