Advanced list debugging in m4

While implementing tag rendering for this static blog generation engine, I ran into a number of issues around quoting, and specifically around using a comma-separated list as a data structure. This surprises no one, since m4 is famously prickly. I felt up to the challenge, though, and I thought I'd summarize here a few recurring problems and their solutions.

Enable debugging output

Setting debugmode(V) turns on all the tracing features. This input:

debugmode(V)
define(`foo',7)
define(`bar',`foo')
bar

...becomes:

m4trace:stdin:2: -1- id 2: define ...
m4trace:stdin:2: -1- id 2: define(`foo', `7') -> ???
m4trace:stdin:2: -1- id 2: define(...)

m4trace:stdin:3: -1- id 3: define ...
m4trace:stdin:3: -1- id 3: define(`bar', `foo') -> ???
m4trace:stdin:3: -1- id 3: define(...)

m4trace:stdin:4: -1- id 4: bar ...
m4trace:stdin:4: -1- id 4: bar -> ???
m4trace:stdin:4: -1- id 4: bar -> foo
m4trace:stdin:4: -1- id 5: foo ...
m4trace:stdin:4: -1- id 5: foo -> ???
m4trace:stdin:4: -1- id 5: foo -> 7
7
m4debug:stdin:5: input exhausted

Every expansion gets its own id (they can interleave), and distinct lines show when a macro is encountered, expanded, and concluded.

This can also be enabled globally by passing --debug=V on the command line.
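When the full V output is too noisy, tracing can be narrowed to a single macro. This is a minimal sketch, assuming GNU m4, using the traceon and traceoff builtins (which the rest of this post does not rely on). The trace lines still go to stderr, and stdout still shows 7:

```m4
define(`foo',7)dnl
define(`bar',`foo')dnl
dnl Trace only expansions of bar; foo is expanded but not traced.
traceon(`bar')dnl
bar
traceoff(`bar')dnl
```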

Also, know that your editor can probably help with matching quotes and braces. Emacs' m4 mode knows that ` and ' match, and enabling show-paren-mode gives a warning (special highlighting) on a mismatch.

Grokking m4's recursion idiom

The m4 info manual documents this fairly well, although I would have struggled much more with it if it weren't also bread-and-butter in Erlang.

define(`double',`ifelse($#,0,,$#,1,`eval($1 * 2)',`eval($1 * 2),double(shift($@))')')
double(7,12,19,31)

...results in:


14,24,38,62

The key element is the ifelse, whose first arguments handle the special cases of zero or one argument, while the final argument performs the recursion on the remaining arguments (shift($@)). With this idiom you can accomplish anything that, in another language, you might use a for or foreach loop for.
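As a further illustration of the same shape, here is a sketch of a foreach-style macro. The name wrap_each is made up for this example; it wraps every argument in angle brackets instead of doing arithmetic:

```m4
dnl wrap_each is a hypothetical name; same ifelse/shift skeleton as double.
define(`wrap_each',`ifelse($#,0,,$#,1,`<$1>',`<$1>wrap_each(shift($@))')')dnl
wrap_each(a,b,c)
```

This should print <a><b><c>. Only the per-element action (`<$1>` here) changes from one use of the idiom to the next.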

Note that the last arguments to ifelse above must be quoted, or they'll be evaluated before the switch logic, resulting in infinite recursion (marked by a stack overflow):

define(`double',`ifelse($#,0,,$#,1,eval($1 * 2),eval($1 * 2),double(shift($@)))')
double(7,12,19,31)
...
m4:stdin:2: bad expression in eval: * 2
m4:stdin:2: bad expression in eval: * 2
m4:stdin:2: bad expression in eval: * 2
m4:stdin:2: bad expression in eval: * 2
m4:stdin:2: bad expression in eval: * 2
m4:stdin:2: bad expression in eval: * 2
m4: stack overflow

Problem: accumulate a list

I needed something like an append() macro to append a new item to a comma-separated list, which could later be iterated repeatedly. It seemed natural at first to use diversions, since they are pitched as a gradual accumulator of text. This doesn't work, though, because diverted text is never rescanned: it is expanded once on its way into the buffer and is inert after that. Once text has been diverted to, say, buffer 4, all you can do with it is undivert it into other buffers, and eventually to output (divert(0) followed by undivert(4)). It's more accurate to say that a diversion is good for accumulating fully-expanded output text only.
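A small sketch of that limitation, assuming GNU m4 with default quotes (the macro name item is invented for the example). The quoted word goes into buffer 4 as plain text, and defining a macro of the same name afterwards does not change what comes back out:

```m4
divert(4)dnl
`item' is stored as literal text
divert(0)dnl
dnl Defining item now has no effect on what already sits in buffer 4.
define(`item', `EXPANDED')dnl
undivert(4)dnl
item is expanded here
```

The output is:

```
item is stored as literal text
EXPANDED is expanded here
```

The undiverted line is copied straight to the output without being read again, which is why a diversion cannot serve as a list to iterate over later.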

The solution is to append by redefining a macro to have a new element with each call. Here's a reasonable first attempt:

debugmode(V)
define(`append',`ifdef(`mylist',`define(`mylist',mylist`,$1')',`define(`mylist',$1)')')
append(4)
append(7)
append(2)

But, uh-oh:


m4trace:stdin:2: -1- id 2: define ...
m4trace:stdin:2: -1- id 2: define(`append', `ifdef(`mylist',`define(`mylist',mylist`,$1')',`define(`mylist',$1)')') -> ???
m4trace:stdin:2: -1- id 2: define(...)

m4trace:stdin:3: -1- id 3: append ...
m4trace:stdin:3: -1- id 3: append(4) -> ???
m4trace:stdin:3: -1- id 3: append(...) -> ifdef(`mylist',`define(`mylist',mylist`,4')',`define(`mylist',4)')
m4trace:stdin:3: -1- id 4: ifdef ...
m4trace:stdin:3: -1- id 4: ifdef(`mylist', `define(`mylist',mylist`,4')', `define(`mylist',4)') -> ???
m4trace:stdin:3: -1- id 4: ifdef(...) -> `define(`mylist',4)'
m4trace:stdin:3: -1- id 5: define ...
m4trace:stdin:3: -1- id 5: define(`mylist', `4') -> ???
m4trace:stdin:3: -1- id 5: define(...)

m4trace:stdin:4: -1- id 6: append ...
m4trace:stdin:4: -1- id 6: append(7) -> ???
m4trace:stdin:4: -1- id 6: append(...) -> ifdef(`mylist',`define(`mylist',mylist`,7')',`define(`mylist',7)')
m4trace:stdin:4: -1- id 7: ifdef ...
m4trace:stdin:4: -1- id 7: ifdef(`mylist', `define(`mylist',mylist`,7')', `define(`mylist',7)') -> ???
m4trace:stdin:4: -1- id 7: ifdef(...) -> `define(`mylist',mylist`,7')'
m4trace:stdin:4: -1- id 8: define ...
m4trace:stdin:4: -2- id 9: mylist ...
m4trace:stdin:4: -2- id 9: mylist -> ???
m4trace:stdin:4: -2- id 9: mylist -> 4
m4trace:stdin:4: -1- id 8: define(`mylist', `4,7') -> ???
m4trace:stdin:4: -1- id 8: define(...)

m4trace:stdin:5: -1- id 10: append ...
m4trace:stdin:5: -1- id 10: append(2) -> ???
m4trace:stdin:5: -1- id 10: append(...) -> ifdef(`mylist',`define(`mylist',mylist`,2')',`define(`mylist',2)')
m4trace:stdin:5: -1- id 11: ifdef ...
m4trace:stdin:5: -1- id 11: ifdef(`mylist', `define(`mylist',mylist`,2')', `define(`mylist',2)') -> ???
m4trace:stdin:5: -1- id 11: ifdef(...) -> `define(`mylist',mylist`,2')'
m4trace:stdin:5: -1- id 12: define ...
m4trace:stdin:5: -2- id 13: mylist ...
m4trace:stdin:5: -2- id 13: mylist -> ???
m4trace:stdin:5: -2- id 13: mylist -> 4,7
m4trace:stdin:5: -1- id 12: define(`mylist', `4', `7,2') -> ???
m4:stdin:5: Warning: excess arguments to builtin `define' ignored
m4trace:stdin:5: -1- id 12: define(...)


m4debug:stdin:7: input exhausted

The first and second elements are added nicely, but when the third element is added, the preceding definition (4,7) has a comma and is interpreted as separate arguments to define. We're having fun now!

Here's a more concise version, but it has the same problem:

define(`append',`define(`mylist',ifdef(`mylist',mylist`,$1',`$1'))')

After many attempts with different quote placement, I found a solution with changequote. changequote lets you define different quote characters, and in this case we can use it to make that comma-separated list look like a single argument to define:

define(`append',`define(`mylist',ifdef(`mylist',`[changequote([,])mylist[,$1]changequote(`,')]',[$1]))')

With output:

...
m4trace:stdin:7: -1- id 13: define(`mylist', `[4,7,2]') -> ???
...

Success, although our list is now wrapped in [ and ], which will require unwrapping later.
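For comparison, here is a sketch of a different approach that avoids changequote altogether. It is not the solution used in the rest of this post, only an alternative, assuming GNU m4 with default quotes. The defn builtin returns a macro's definition already quoted, so the commas in the existing list never reach define as argument separators:

```m4
dnl Alternative sketch: defn yields the old list quoted, and the new
dnl element is double-quoted so it survives one extra round of scanning.
define(`append', `define(`mylist', ifdef(`mylist', `defn(`mylist')`,$1'', ``$1''))')dnl
append(4)dnl
append(7)dnl
append(2)dnl
mylist
```

This should print 4,7,2, with no brackets stored in the definition. The sections below continue with the bracket-wrapped version.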

Problem: passing a comma-separated list as one argument or n arguments

Let's try passing our new, bracket-wrapped list to our 'double' macro from earlier. Another temporary changequote is required to strip the brackets off:

define(`append',`define(`mylist',ifdef(`mylist',`[changequote([,])mylist[,$1]changequote(`,')]',[$1]))')
append(4)
append(7)
append(2)

debugmode(V)
define(`double',`ifelse($#,0,,$#,1,`eval($1 * 2)',`eval($1 * 2),double(shift($@))')')
double(changequote([,])mylist[]changequote(`,'))

Sadly, double accepts the list as a single argument, which expands badly later:

m4trace:stdin:8: -1- id 19: double(4,7,2) -> ???
m4trace:stdin:8: -1- id 19: double(...) -> ifelse(1,0,,1,1,`eval(4,7,2 * 2)',`eval(4,7,2 * 2),double(shift(`4,7,2'))')
m4trace:stdin:8: -1- id 23: ifelse ...
m4trace:stdin:8: -1- id 23: ifelse(`1', `0', `', `1', `1', `eval(4,7,2 * 2)', `eval(4,7,2 * 2),double(shift(`4,7,2'))') -> ???
m4trace:stdin:8: -1- id 23: ifelse(...) -> `eval(4,7,2 * 2)'
m4trace:stdin:8: -1- id 24: eval ...
m4trace:stdin:8: -1- id 24: eval(`4', `7', `2 * 2') -> ???
m4:stdin:8: non-numeric argument to builtin `eval'

The solution is to use a gateway macro, which accepts the single argument but passes $1 on unquoted, so the string is rescanned and split at its commas into a list again for the back-end macro. I've renamed the original double to r_double, which calls out that it's recursive and looks more private (we only want to call double directly):

define(`append',`define(`mylist',ifdef(`mylist',`[changequote([,])mylist[,$1]changequote(`,')]',[$1]))')
append(4)
append(7)
append(2)

define(`r_double',`ifelse($#,0,,$#,1,`eval($1 * 2)',`eval($1 * 2),r_double(shift($@))')')
define(`double',`r_double($1)')
double(changequote([,])mylist[]changequote(`,'))

...which produces the expected output:

8,14,4

Double changequotes

What if we move our final changequote call inside the recursive macro, "just to be sure" that the quotes are always ` and '?

define(`r_double',`changequote(`,')ifelse($#,0,,$#,1,`eval($1 * 2)',`eval($1 * 2),r_double(shift($@))')')
define(`double',`r_double($1)')
double(changequote([,])mylist[])

...results in:

m4trace:stdin:9: -1- id 24: changequote ...
m4trace:stdin:9: -1- id 24: changequote([`], [']) -> ???
m4trace:stdin:9: -1- id 24: changequote(...)
m4trace:stdin:9: -1- id 25: ifelse ...
m4trace:stdin:9: -1- id 25: ifelse(`3', `0', `', `3', `1', `eval(4 * 2)', `eval(4 * 2),r_double(shift([4],[7],[2]))') -> ???
m4trace:stdin:9: -1- id 25: ifelse(...) -> `eval(4 * 2),r_double(shift([4],[7],[2]))'
m4trace:stdin:9: -1- id 26: eval ...
m4trace:stdin:9: -1- id 26: eval(`4 * 2') -> ???
m4trace:stdin:9: -1- id 26: eval(...) -> `8'
8,m4trace:stdin:9: -1- id 27: r_double ...
m4trace:stdin:9: -2- id 28: shift ...
m4trace:stdin:9: -2- id 28: shift(`[4]', `[7]', `[2]') -> ???
m4trace:stdin:9: -2- id 28: shift(...) -> ``[7]',`[2]''
m4trace:stdin:9: -1- id 27: r_double([7], [2]) -> ???
m4trace:stdin:9: -1- id 27: r_double(...) -> changequote(`,')ifelse(2,0,,2,1,`eval([7] * 2)',`eval([7] * 2),r_double(shift(`[7]',`[2]'))')
m4trace:stdin:9: -1- id 29: changequote ...
m4trace:stdin:9: -1- id 29: changequote(`,') -> ???
m4trace:stdin:9: -1- id 29: changequote(...)
m4trace:stdin:9: -1- id 30: ifelse ...
m4debug:stdin:10: input exhausted
m4:stdin:9: ERROR: end of file in string

It's not immediately clear what went wrong, but consider this simpler case:

debugmode(V)
changequote([,])
changequote(`,')
changequote(`,')
apple,ball,cat

...results in:

m4trace:stdin:2: -1- id 2: changequote ...
m4trace:stdin:2: -1- id 2: changequote(`[', `]') -> ???
m4trace:stdin:2: -1- id 2: changequote(...)

m4trace:stdin:3: -1- id 3: changequote ...
m4trace:stdin:3: -1- id 3: changequote([`], [']) -> ???
m4trace:stdin:3: -1- id 3: changequote(...)

m4trace:stdin:4: -1- id 4: changequote ...
m4trace:stdin:4: -1- id 4: changequote(`,') -> ???
m4trace:stdin:4: -1- id 4: changequote(...)

applem4debug:stdin:7: input exhausted
m4:stdin:5: ERROR: end of file in string

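See it? When the quotes are already ` and ', the argument `,' in the repeated changequote is itself a quoted string, so changequote receives a single argument (a comma) and makes the comma the opening quote character. The comma in apple,ball,cat then opens a quoted string that never closes. A double-changequote in m4 is equivalent to a double-free in C, and as dangerous.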
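One way to make the restore idempotent, sketched here assuming GNU m4: calling changequote with no arguments and no parentheses resets the quotes to the defaults regardless of what is currently active, so repeating it is harmless.

```m4
changequote([,])dnl
changequote
changequote
apple,ball,cat
```

The final line comes through intact as apple,ball,cat (each bare changequote line leaves a blank line in the output). Note that the bare word needs a separator after it when used inside a macro body, so that it is not read as part of a longer name.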

Diversion about diversions (and bad naming)

undivert() is the opposite of divert(), right? Not really: divert configures where future output goes, while undivert immediately flushes the named buffer to wherever output is currently directed.
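A short sketch of that distinction (the words first, third and start are just sample text). While output is directed at buffer 1, undivert(3) empties buffer 3 into buffer 1 rather than onto stdout:

```m4
divert(3)dnl
third
divert(1)dnl
first
undivert(3)dnl
divert(0)dnl
start
```

The output is:

```
start
first
third
```

Only start is written directly. Buffer 1, which by then holds both first and third, is flushed automatically when m4 reaches the end of its input.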

You could think of output as a one-way pipe. Use divert to slide the right-hand buffer list in the diagram below up or down, selecting which buffer sits at the end of the pipe. Input to the pipe is stdin by default, unless you use undivert to quickly flush a buffer into the pipe (after which m4 immediately returns to reading input from stdin). Here's the result of 'divert(0) undivert(3)':

      stdin
      buffer 1
      buffer 2                  buffer -1 (discard)
      buffer 3       ====>      buffer  0 (stdout)
      buffer ...                buffer  1
                                buffer  2
                                buffer  3
                                ...

Summary

m4 is powerful but requires precise thinking, a high threshold for pain, and good debugging output. Hopefully this has been helpful to someone encountering one of the problems above.