I’ve been messing around with IRC bots in different languages on and off for the last year, including a couple in Ruby (public broadcast messages for USF CTF 2007 and a stevenote countdown timer), and more recently, “tostada,” a multifunction one in Erlang, which now contains a full-blown programming language implementation.

The Erlang one’s functionality started pretty simply, as a bunch of hand-coded triggers and replies:

command("channel",Cmd) ->
  format("This is ~p",[Cmd#toscmd.channel]);
command("version",) ->
  "I don't have a version.";
command("slavepid",) ->
  format("I'm ~p", [self()]);
command("whoami",Cmd) ->
  format("You're ~p", [Cmd#toscmd.sender]);
command("sleep",Cmd) ->
  Sleeptime = list_to_integer(lists:nth(1, Cmd#toscmd.args)),
  timer:sleep(Sleeptime),
  format("I am awake after sleeping ~p", [Sleeptime]).

However, this seemed pretty anemic compared to the other bot in the channel, Shaniqua, which is basically a frontend for a programmable tcl interpreter, allowing it to do stuff as far-reaching as being a slot machine one second, and fetching stuff from Flickr the next. I wanted tostada to not be completely worthless, but I also wanted it to be all-Erlang because I don’t really know Erlang yet.

So I decided to write a stack-based programming language in Erlang, because this kind of language is quick and easy. I called it “obscura” for reasons I’ve now forgotten.

Erlang

Erlang is a functional programming language invented by Ericsson for reliable parallel programming. To do this, it imposes certain constraints and supports certain conventions.

  • The proper name for a function is functionname/numargs. multiply(X, Y) -> X*Y. is named multiply/2.
  • Functions can have multiple bodies. Erlang selects the best one based on arguments as labeled, and guards (basically caveats, such as double(X) when is_integer(X) ) See command/2 above for an example of a function selected on the value of its first argument.
  • Statements in the same function body are separated with commas “,” ; multiple bodies for the same function are separated with semicolons “;” ; the last body of a function is terminated with a period “.” . See command/2 above for examples.
  • Variables are scoped to the function they’re in. Global variables are discouraged, because they promote coupling.
  • Matching is used instead of assignment. X = 42 succeeds if X is undefined or 42; fails otherwise.
  • Variables are assigned once per function. See previous bullet.
  • In a function’s arguments, _ (underscore) is used to indicate an argument that you don’t care about. If you use a variable name, the compiler will whine if it’s never used. This is a good thing.
  • If the last thing you do in a function is call another function, you don’t create a new stack frame. Thus, tail recursion is the preferred way to loop.

Example tail recursion:

sum(List) -> sum(List, 0).
sum([], Tot) -> % base case - empty list
  Tot;
sum([Car|Cdr], Tot) -> % add current entry to running total
  sum(Cdr, Tot + Cdr).

Note the use of a running total argument to sum/2. This is necessary because simply returning sum(Cdr) + Car will leave the current call on the stack, taking up more memory.

Erlang has quite usable documentation available online at http://www.erlang.org/doc/, and is free to download for most modern operating systems, and Windows too.

Parsing

You only need to parse two things to do simple arithmetic operations: integers and operators. Likewise, the only state you need to hold boils down to the remaining program and the current state of the stack. Let’s parse first, so we have something to test operators with.

We start off with basically a blank slate:

-module (obscura).
-author ("bkerley@brycekerley.net").
-include ("command.hrl").
-export ([eval/1]).
-import (io_lib, [format/2]).

eval(Line) ->
  format("~p",[Line]).

At this point, we can call obscura:eval("3 4 +") and get outÉ what we entered. For simplicity’s sake, we can write a parser in two steps:

  1. Split the line with string:tokens/2 into an array of words.
  2. Convert each word to an integer. Failing that, leave it as a string.

And coded:

eval(Line) ->
  Nodes = integerize(string:tokens(Line, " "), []),
  format("~p",[Nodes]).

integerize([],Accum) ->
  lists:reverse(Accum);
integerize([First|Rest], Accum) ->
  integerize(Rest, [try_integerize(First)|Accum]).

try_integerize(Candidate) ->
  try
    list_to_integer(Candidate)
  catch
    error:_ ->
      Candidate
  end.

integerize/2 is the recursive conversion function. Since Erlang optimizes tail recursion, and it uses a singly-linked list for everything, we don’t want to traverse to the end of the array every step (an O(N) operation each time, or O(N2) for the whole line), simply tacking new entries on at the beginning (an O(1) operation each time), and reverse it at the end (so the whole line is O(N), much better).

try_integerize/1 is

Now, when evaling “3 4 +”, we get an array containing 3, 4, and “+”. That’s the easy part, so on to the fun part.

Arithmetic

eval/1 needs to call a function to actually turn the parsed array into results, which I’ll call eval/2. eval/2 will be recursive (tail-recursive, so we don’t blow up the stack), so it needs our two state elements from before as arguments - remaining line, and current stack. New version of eval/1 that calls eval/2:

eval(Line) ->
  Nodes = integerize(string:tokens(Line, " "), []),
  eval(Nodes, []).

Pushing to the stack

Let’s implement the “push” operation since it’s simple. When the first word in the remaining line is an integer, put it at the beginning of the stack array.

eval([First|Rest], Stack) when is_integer(First) -> % push
  eval(Rest, [First|Stack]);

Cars and Cdrs

This might be your first time with Erlang’s car/cdr equivalent, and guard expression. Basically, [First|Rest] is the expression for a list with first element in the First variable, and the sublist of every other element in the Last variable.

Array = [1, 2, 3, 4],
[First|Rest] = Array,
First = 1,
Rest = [2, 3, 4].
% if you flip line 2 and run this code backwards it'll  be successful :D

Printing results

With the code so far, as soon as we run out of words in the line, we’ll error out with a badmatch error (Erlang couldn’t find an appropriate function body for our arguments). Let’s just return the Stack when our array of remaining words is empty.

eval([], Stack) -> % no more ops, dump stack
  Stack;

Base case and fallback

Right now, if we do anything other than pushing an integer to the stack or running out of remaining words, Erlang will fail and dump a badmatch error report to your screen.

Oops. Let’s add a base case that’ll match every word that hasn’t already been matched with an appropriate operation.

eval([Invalid|Rest], Stack) ->
  format("Invalid opcode \"~s\".  Rest: ~w Stack: ~w",[Invalid, Rest, Stack]).

Now, whenever you execute 3 4 + you should get a friendly error.

Simple Addition and Multiplication

Let’s write the addition operator so we don’t get an error at all when we do 3 4 +:

eval(["+"|Rest], Stack) ->
  [Alpha|[Bravo|Remain]] = Stack,
  Sum = Alpha + Bravo,
  eval(Rest, [Sum|Remain]);

This body of eval/2 (which should go above the “Invalid opcode” body of the same function in your source file) only gets selected when it’s presented with an array beginning with the string “+” as its first argument. When it is selected, it:

  1. Pulls the first two entries from Stack into the variables Alpha and Bravo, using the nested array notation from above.
  2. Adds Alpha and Bravo together and calls the result Sum.
  3. Recurs eval/2 with the Rest of the input line, and a new stack consisting of the Sum in front of the Remaining stack.

Let’s implement multiplication too:

eval(["*"|Rest], Stack) ->
  [Alpha|[Bravo|Remain]] = Stack,
  Sum = Alpha * Bravo,
  eval(Rest, [Sum|Remain]);

Not too surprisingly, there’s a lot of repeated code here, since they’re both operations that pop off the stack twice, perform calculation, and push a result on. There’ll be even more repetition as we implement division, subtraction, and various bit-twiddling operations. Let’s put a stop to that.

Popping twice

The first thing we do in any of these binary operations is always going to be popping two entries off the stack. Let’s write pop_two/1, which takes the Stack, and returns its first two entries and the remaining stack in a tuple (that we’ll immediately unpack).

pop_two(Stack) ->
  [Alpha|[Bravo|Remain]] = Stack,
  {Alpha, Bravo, Remain}.

And rewrite addition and multiplication to use it:

eval(["+"|Rest], Stack) ->
  {Alpha, Bravo, Remain} = pop_two(Stack),
  Sum = Alpha + Bravo,
  eval(Rest, [Sum|Remain]);

eval(["*"|Rest], Stack) ->  
  [Alpha|[Bravo|Remain]] = Stack,
  Sum = Alpha * Bravo,
  eval(Rest, [Sum|Remain]);

Actually, let’s not. Doing it this way does clear up some possibly confusing syntax; but what we really want is a way to get rid of even more repeated code, such as the first and last lines of the function.

eval(["+"|Rest], Stack) ->
  Addition = fun (A,B) ->
    A+B
  end,
  binary_op(Addition, Stack, Rest);

eval(["*"|Rest], Stack) ->
  binary_op(fun (A,B) -> A*B end, Stack, Rest);

Now this is functional programming! We can just say that Addition is a function with an arity of 2 and that returns the sum of its arguments, and call a hypothetical binary_op/3 that handles all the popping, pushing, and recurring. We don’t even have to assign the fun for the operation to a variable, just use it in place (see the multiplication operation).

binary_op(Fun, Stack, Rest) ->
  [Alpha|[Bravo|Remain]] = Stack,
  case Fun(Alpha, Bravo) of
    {op_fail, Desc} ->
      describe_error(Desc, Rest, Stack);
    Result ->
      eval(Rest, [Result|Remain])
  end.

This is a bit more complicated than strictly necessary, but this lets the fun we’re called with throw an error message if necessary (for example, if we have operations that can only operate on integers) by returning a 2-tuple with the op_fail symbol in the first position, and a Description of the error in the second.

Looking Ahead

I haven’t explored any operations beyond addition and multiplication in this post because their implementation should be self-evident.

I also haven’t gone into any serious programming language features like storing and retrieving values, subprocedures, conditionals, and so on because this is just a calculator project (but rest assured those are coming).