As programmers, in our daily office or school life, we are expected to write code that follows best practices and to comment it wisely, so that whoever needs to re-read it later can do so. To take a break from all those constraints, we can head to the IOCCC, the International Obfuscated C Code Contest.
In this post, we are going to focus on the IOCCC 1986 winner in the Worst abuse of the C preprocessor category. The code was written by James Hague.
Starting from the given source and observing its output, we will explain how it works.
The Code
Here it is in all its obfuscated glory:
#define DIT (
#define DAH )
#define __DAH ++
#define DITDAH *
#define DAHDIT for
#define DIT_DAH malloc
#define DAH_DIT gets
#define _DAHDIT char
_DAHDIT _DAH_[]="ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3d2f16g7c8a90l?e'b.s;i,d:"
;main DIT DAH{_DAHDIT
DITDAH _DIT,DITDAH DAH_,DITDAH DIT_,
DITDAH _DIT_,DITDAH DIT_DAH DIT
DAH,DITDAH DAH_DIT DIT DAH;DAHDIT
DIT _DIT=DIT_DAH DIT 81 DAH,DIT_=_DIT
__DAH;_DIT==DAH_DIT DIT _DIT DAH;__DIT
DIT'\n'DAH DAH DAHDIT DIT DAH_=_DIT;DITDAH
DAH_;__DIT DIT DITDAH
_DIT_?_DAH DIT DITDAH DIT_ DAH:'?'DAH,__DIT
DIT' 'DAH,DAH_ __DAH DAH DAHDIT DIT
DITDAH DIT_=2,_DIT_=_DAH_; DITDAH _DIT_&&DIT
DITDAH _DIT_!=DIT DITDAH DAH_>='a'? DITDAH
DAH_&223:DITDAH DAH_ DAH DAH; DIT
DITDAH DIT_ DAH __DAH,_DIT_ __DAH DAH
DITDAH DIT_+= DIT DITDAH _DIT_>='a'? DITDAH _DIT_-'a':0
DAH;}_DAH DIT DIT_ DAH{ __DIT DIT
DIT_>3?_DAH DIT DIT_>>1 DAH:' 'DAH;return
DIT_&1?'-':'.';}__DIT DIT DIT_ DAH _DAHDIT
DIT_;{DIT void DAH write DIT 1,&DIT_,1 DAH;}
Apart from the peculiar formatting, what jumps out is the number of "unnecessary" macros and the repetitive use of DIT and DAH variations.
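To get a feel for what these macros do, here is a minimal sketch that reuses four of the #define lines from the listing on a much smaller function. The function count_chars is hypothetical and not part of the original entry; it only shows that the preprocessor swaps each macro name for a single token before the compiler sees the code.

```c
/* Four macros copied from the obfuscated listing. */
#define DIT (
#define DAH )
#define DITDAH *
#define DAHDIT for

/* Hypothetical helper, not in the original program.
   After preprocessing, the definition below reads:

   static int count_chars ( const char * s )
   {
       int n = 0;
       for ( ; * s ; s++ )
           n++;
       return n;
   }
*/
static int count_chars DIT const char DITDAH s DAH
{
    int n = 0;
    DAHDIT DIT ; DITDAH s ; s++ DAH
        n++;
    return n;
}
```

Calling count_chars on a string returns its length, exactly as the plain version in the comment would. The macros change how the source looks, not what it does.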
The output
If we compile the code at this point, we see many warnings, among them two for the implicit declaration of __DIT and _DAH. After that step, we can run the code, and as we feed it lines of ASCII characters, it spits out sequences of . and -.
$ ./a.out
hello, world
.... . .-.. .-.. --- --..-- .-- --- .-. .-.. -..
It looks like Morse code, and an online Morse decoder confirms that it is: the output decodes back to HELLO, WORLD.
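The same check can be done offline. The following is a minimal sketch of a table-based decoder, assuming the standard International Morse codes for A to Z plus the comma. The names MORSE_LETTERS, decode_symbol and decode_morse are made up for this illustration and have nothing to do with how the obfuscated program works internally.

```c
#include <stddef.h>
#include <string.h>

/* International Morse codes for the letters A..Z, in alphabetical order. */
static const char *const MORSE_LETTERS[26] = {
    ".-",   "-...", "-.-.", "-..",  ".",    "..-.", "--.",  "....", "..",
    ".---", "-.-",  ".-..", "--",   "-.",   "---",  ".--.", "--.-", ".-.",
    "...",  "-",    "..-",  "...-", ".--",  "-..-", "-.--", "--.."
};

/* Decode one code of the given length; unknown codes become '?'. */
static char decode_symbol(const char *code, size_t len)
{
    if (len == 6 && strncmp(code, "--..--", 6) == 0)
        return ',';
    for (int i = 0; i < 26; i++) {
        if (strlen(MORSE_LETTERS[i]) == len &&
            strncmp(MORSE_LETTERS[i], code, len) == 0)
            return (char)('A' + i);
    }
    return '?';
}

/* Decode a space-separated sequence of codes into out (NUL-terminated). */
static void decode_morse(const char *in, char *out, size_t out_size)
{
    size_t n = 0;
    if (out_size == 0)
        return;
    while (*in != '\0' && n + 1 < out_size) {
        size_t len = strcspn(in, " ");   /* length of the next code */
        if (len > 0)
            out[n++] = decode_symbol(in, len);
        in += len;
        while (*in == ' ')               /* skip the separators */
            in++;
    }
    out[n] = '\0';
}
```

Fed the line of dots and dashes shown above, decode_morse fills the buffer with HELLO,WORLD. There is no blank between the two words because this sketch only treats spaces as separators between codes.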
De-Obfuscating
Let's first do the preprocessor's job and replace the macros with their values. After a bit of reformatting, this is what we have:
char _DAH_[]="ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3d2f16g7c8a90l?e'b.s;i,d:";
main()
{
    char *_DIT, *DAH_, *DIT_, *_DIT_, *malloc(), *gets();
    for (_DIT = malloc(81), DIT_ = _DIT++; _DIT == gets(_DIT); __DIT('\n'))
        for (DAH_ = _DIT; *DAH_; __DIT(*_DIT_ ? _DAH(*DIT_) : '?'), __DIT(' '), DAH_++)
            for (*DIT_ = 2, _DIT_ = _DAH_; *_DIT_ && (*_DIT_ != (*DAH_ >= 'a' ? *DAH_ & 223 : *DAH_)); (*DIT_)++, _DIT_++)
                *DIT_ += (*_DIT_ >= 'a' ? *_DIT_ - 'a' : 0);
}
_DAH(DIT_)
{
    __DIT(DIT_ > 3 ? _DAH(DIT_ >> 1) : ' ');
    return DIT_ & 1 ? '-' : '.';
}
__DIT(DIT_) char DIT_;
{
    (void) write(1, &DIT_, 1);
}
Slightly better.
We see the three functions we expected: main, _DAH, and __DIT. We also see an external variable, _DAH_, which holds a long string. __DIT looks like the putchar function from the standard library: it prints one char at a time. And what about _DAH?
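Before moving on, the comparison between __DIT and putchar can be made concrete. Below is a minimal sketch of the same idea in modern C, with a prototype instead of the old-style parameter declaration. The name put_byte and its fd parameter are assumptions added for illustration: the original hard-codes file descriptor 1 (standard output) and discards the result of write.

```c
#include <unistd.h>

/* Hypothetical modern spelling of __DIT: write a single byte to the
   file descriptor fd. Returns the byte written, or -1 on failure,
   which loosely mirrors what putchar reports. */
static int put_byte(int fd, char c)
{
    return write(fd, &c, 1) == 1 ? (unsigned char)c : -1;
}
```

With this sketch, put_byte(1, '.') behaves like the original __DIT('.'). Taking the descriptor as a parameter is only a convenience that makes the function easy to try against a pipe or a file.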
Dive into _DAH
It is recursive. As long as the argument needs more than 2 bits to write (that is, it is greater than 3), the function calls itself again with the last bit of the number stripped. What gets printed is part of the argument in binary format, with - standing for 1 and . standing for 0, and the function returns the symbol for the last bit of its argument instead of printing it. As an example, if we call _DAH(5), 5 being 101 in binary, it will call _DAH(2). That is the base case. It prints