Writing a Postfix Calculator in Erlang
Message from 2022
This post is pretty old! Opinions and technical information in it are almost certainly oudated. Commands and configurations will probably not work. Consider the age of the content before putting any of it into practice.
I’ve been messing around with IRC bots in different languages on and off for the last year, including a couple in Ruby (public broadcast messages for USF CTF 2007 and a stevenote countdown timer), and more recently, “tostada,” a multifunction one in Erlang, which now contains a full-blown programming language implementation.
The Erlang one’s functionality started pretty simply, as a bunch of hand-coded triggers and replies:
command("channel",Cmd) ->
format("This is ~p",[Cmd#toscmd.channel]);
command("version",) ->
"I don't have a version.";
command("slavepid",) ->
format("I'm ~p", [self()]);
command("whoami",Cmd) ->
format("You're ~p", [Cmd#toscmd.sender]);
command("sleep",Cmd) ->
Sleeptime = list_to_integer(lists:nth(1, Cmd#toscmd.args)),
timer:sleep(Sleeptime),
format("I am awake after sleeping ~p", [Sleeptime]).
However, this seemed pretty anemic compared to the other bot in the channel, Shaniqua, which is basically a frontend for a programmable tcl interpreter, allowing it to do stuff as far-reaching as being a slot machine one second, and fetching stuff from Flickr the next. I wanted tostada to not be completely worthless, but I also wanted it to be all-Erlang because I don’t really know Erlang yet.
So I decided to write a stack-based programming language in Erlang, because this kind of language is quick and easy. I called it “obscura” for reasons I’ve now forgotten.
Erlang
Erlang is a functional programming language invented by Ericsson for reliable parallel programming. To do this, it imposes certain constraints and supports certain conventions.
- The proper name for a function is
functionname/numargs
.multiply(X, Y) -> X*Y.
is namedmultiply/2
. - Functions can have multiple bodies. Erlang selects the best one based on arguments as labeled, and guards (basically caveats, such as
double(X) when is_integer(X)
) Seecommand/2
above for an example of a function selected on the value of its first argument. - Statements in the same function body are separated with commas “
,
” ; multiple bodies for the same function are separated with semicolons “;
” ; the last body of a function is terminated with a period “.
” . Seecommand/2
above for examples. - Variables are scoped to the function they’re in. Global variables are discouraged, because they promote coupling.
- Matching is used instead of assignment.
X = 42
succeeds ifX
is undefined or 42; fails otherwise. - Variables are assigned once per function. See previous bullet.
- In a function’s arguments,
_
(underscore) is used to indicate an argument that you don’t care about. If you use a variable name, the compiler will whine if it’s never used. This is a good thing. - If the last thing you do in a function is call another function, you don’t create a new stack frame. Thus, tail recursion is the preferred way to loop.
Example tail recursion:
sum(List) -> sum(List, 0).
sum([], Tot) -> % base case - empty list
Tot;
sum([Car|Cdr], Tot) -> % add current entry to running total
sum(Cdr, Tot + Cdr).
Note the use of a running total argument to sum/2
. This is necessary because simply returning sum(Cdr) + Car
will leave the current call on the stack, taking up more memory.
Erlang has quite usable documentation available online at http://www.erlang.org/doc/, and is free to download for most modern operating systems, and Windows too.
Parsing
You only need to parse two things to do simple arithmetic operations: integers and operators. Likewise, the only state you need to hold boils down to the remaining program and the current state of the stack. Let’s parse first, so we have something to test operators with.
We start off with basically a blank slate:
-module (obscura).
-author ("bkerley@brycekerley.net").
-include ("command.hrl").
-export ([eval/1]).
-import (io_lib, [format/2]).
eval(Line) ->
format("~p",[Line]).
At this point, we can call obscura:eval("3 4 +")
and get outÉ what we entered. For simplicity’s sake, we can write a parser in two steps:
- Split the line with
string:tokens/2
into an array of words. - Convert each word to an integer. Failing that, leave it as a string.
And coded:
eval(Line) ->
Nodes = integerize(string:tokens(Line, " "), []),
format("~p",[Nodes]).
integerize([],Accum) ->
lists:reverse(Accum);
integerize([First|Rest], Accum) ->
integerize(Rest, [try_integerize(First)|Accum]).
try_integerize(Candidate) ->
try
list_to_integer(Candidate)
catch
error:_ ->
Candidate
end.
integerize/2
is the recursive conversion function. Since Erlang optimizes tail recursion, and it uses a singly-linked list for everything, we don’t want to traverse to the end of the array every step (an O(N) operation each time, or O(N2) for the whole line), simply tacking new entries on at the beginning (an O(1) operation each time), and reverse it at the end (so the whole line is O(N), much better).
try_integerize/1
is
Now, when eval
ing “3 4 +”, we get an array containing 3, 4, and “+”. That’s the easy part, so on to the fun part.
Arithmetic
eval/1
needs to call a function to actually turn the parsed array into results, which I’ll call eval/2
. eval/2
will be recursive (tail-recursive, so we don’t blow up the stack), so it needs our two state elements from before as arguments - remaining line, and current stack. New version of eval/1
that calls eval/2
:
eval(Line) ->
Nodes = integerize(string:tokens(Line, " "), []),
eval(Nodes, []).
Pushing to the stack
Let’s implement the “push” operation since it’s simple. When the first word in the remaining line is an integer, put it at the beginning of the stack array.
eval([First|Rest], Stack) when is_integer(First) -> % push
eval(Rest, [First|Stack]);
Cars and Cdrs
This might be your first time with Erlang’s car
/cdr
equivalent, and guard expression. Basically, [First|Rest]
is the expression for a list with first element in the First
variable, and the sublist of every other element in the Last
variable.
Array = [1, 2, 3, 4],
[First|Rest] = Array,
First = 1,
Rest = [2, 3, 4].
% if you flip line 2 and run this code backwards it'll be successful :D
Printing results
With the code so far, as soon as we run out of words in the line, we’ll error out with a badmatch
error (Erlang couldn’t find an appropriate function body for our arguments). Let’s just return the Stack
when our array of remaining words is empty.
eval([], Stack) -> % no more ops, dump stack
Stack;
Base case and fallback
Right now, if we do anything other than pushing an integer to the stack or running out of remaining words, Erlang will fail and dump a badmatch
error report to your screen.
Oops. Let’s add a base case that’ll match every word that hasn’t already been matched with an appropriate operation.
eval([Invalid|Rest], Stack) ->
format("Invalid opcode \"~s\". Rest: ~w Stack: ~w",[Invalid, Rest, Stack]).
Now, whenever you execute 3 4 +
you should get a friendly error.
Simple Addition and Multiplication
Let’s write the addition operator so we don’t get an error at all when we do 3 4 +
:
eval(["+"|Rest], Stack) ->
[Alpha|[Bravo|Remain]] = Stack,
Sum = Alpha + Bravo,
eval(Rest, [Sum|Remain]);
This body of eval/2
(which should go above the “Invalid opcode” body of the same function in your source file) only gets selected when it’s presented with an array beginning with the string “+” as its first argument. When it is selected, it:
- Pulls the first two entries from
Stack
into the variablesAlpha
andBravo
, using the nested array notation from above. - Adds
Alpha
andBravo
together and calls the resultSum
. - Recurs
eval/2
with theRest
of the input line, and a new stack consisting of theSum
in front of theRemain
ing stack.
Let’s implement multiplication too:
eval(["*"|Rest], Stack) ->
[Alpha|[Bravo|Remain]] = Stack,
Sum = Alpha * Bravo,
eval(Rest, [Sum|Remain]);
Not too surprisingly, there’s a lot of repeated code here, since they’re both operations that pop off the stack twice, perform calculation, and push a result on. There’ll be even more repetition as we implement division, subtraction, and various bit-twiddling operations. Let’s put a stop to that.
Popping twice
The first thing we do in any of these binary operations is always going to be popping two entries off the stack. Let’s write pop_two/1
, which takes the Stack
, and returns its first two entries and the remaining stack in a tuple (that we’ll immediately unpack).
pop_two(Stack) ->
[Alpha|[Bravo|Remain]] = Stack,
{Alpha, Bravo, Remain}.
And rewrite addition and multiplication to use it:
eval(["+"|Rest], Stack) ->
{Alpha, Bravo, Remain} = pop_two(Stack),
Sum = Alpha + Bravo,
eval(Rest, [Sum|Remain]);
eval(["*"|Rest], Stack) ->
[Alpha|[Bravo|Remain]] = Stack,
Sum = Alpha * Bravo,
eval(Rest, [Sum|Remain]);
Actually, let’s not. Doing it this way does clear up some possibly confusing syntax; but what we really want is a way to get rid of even more repeated code, such as the first and last lines of the function.
eval(["+"|Rest], Stack) ->
Addition = fun (A,B) ->
A+B
end,
binary_op(Addition, Stack, Rest);
eval(["*"|Rest], Stack) ->
binary_op(fun (A,B) -> A*B end, Stack, Rest);
Now this is functional programming! We can just say that Addition
is a function with an arity of 2 and that returns the sum of its arguments, and call a hypothetical binary_op/3
that handles all the popping, pushing, and recurring. We don’t even have to assign the fun
for the operation to a variable, just use it in place (see the multiplication operation).
binary_op(Fun, Stack, Rest) ->
[Alpha|[Bravo|Remain]] = Stack,
case Fun(Alpha, Bravo) of
{op_fail, Desc} ->
describe_error(Desc, Rest, Stack);
Result ->
eval(Rest, [Result|Remain])
end.
This is a bit more complicated than strictly necessary, but this lets the fun
we’re called with throw an error message if necessary (for example, if we have operations that can only operate on integers) by returning a 2-tuple with the op_fail
symbol in the first position, and a Desc
ription of the error in the second.
Looking Ahead
I haven’t explored any operations beyond addition and multiplication in this post because their implementation should be self-evident.
I also haven’t gone into any serious programming language features like storing and retrieving values, subprocedures, conditionals, and so on because this is just a calculator project (but rest assured those are coming).