Perl Automata

Krishnamurthi presents a "short, non-trivial example that implements a construct not already found in mainstream languages," the use of Scheme macros to implement finite state automata. This construct is evaluated according to four use cases:

  • cosmetics
  • binding constructs
  • control operators
  • data language

Perl provides two facilities for providing similar additions to the language. One is its source code filters. This filter system provides the opportunity to transform Perl code immediatly prior to its parsing, using functions written in Perl, C, or another language. Unfortunately, using this facility to implement automata would be:

  1. Hard. Perl filters provide raw access to the unparsed text of the Perl code to be filtered.
  2. Uninstructive. It would be a relatively simple source-to-source transformation.

The other facility is Perl's own linguistic flexibility, which simultaneously makes the implementation of automata easier and more fun.

Attempt 1

There are a number of questions to be examined before attempting to implement a meta-program, in any language. The first is cosmetic: what should the automata look like? The second is more conceptual: how do they work, including what limitations are placed on the new construct and what openings it provides.

Consider an approach based on Perl code implementing the state transitions. The code implementing each transition would need to accept the next input item, and would need to inform the state machine of the next transition. The following code implements the c(a|d)*r machine Krishnamurthi describes:

my $a = automaton {
    'init' => rule {
        my ($h) = @_;
        if ($h eq 'c') {
            return 'more';
        }
    },
    'more' => rule {
        my ($h) = @_;
        if ($h eq 'a') {
            return 'more';
        } elsif ($h eq 'd') {
            return 'more';
        } elsif ($h eq 'r') {
            return 'end';
        }
    },
    'end' => rule {
        return acpt;
    },
};

$a->('init', @_);

The automaton is built of three transitions: init, more, and end. init accepts a 'c' (Perl does not provide symbols; character strings seem to be the Perlish translation) and indicates a next transition of more. Likewise, more accepts 'a', 'd', or 'r'. end accepts no input, but indicates that the machine accepts if it is at the end of the input. ('accept' in Perl is already taken; 'acpt' was chosen for the new pseudo-keyword.) The blocks following the pseudo-keyword rule are Perl code---there is no restriction on them other than that they return the name of the next transition if there is one. The return value of the automaton pseudo-keyword is a Perl code reference taking two arguments: the initial transition, and the list of inputs.

This automaton does not exactly match Krishnamurthi's: it does not bind a name to the machine, instead returning the machine to be assigned to a variable. For one thing, it could bind the name, but the symbol table manipulations are even uglier the rest of this.

Note the overloading of braces to structure the code. The 'argument' to automaton is a hash reference, mapping transition names to rules, while the argument to rule is a code reference, without the normal use of 'sub'. These two pseudo-keywords, and acpt are implemented by the following:

sub automaton {
    my ($rules) = @_;
        return sub {
            my $start = shift;
            if (exists $rules->{$start}) {
                my ($next, @tail) = $rules->{$start}->($rules, @_);
                if ($next) {
                    return 1;
                } else {
                    return 0;
                }
            } else {
                die 'start symbol does not exist';
            }
        };
}

sub rule (&) {
    my ($code) = @_;
    return sub {
        my ($automaton, $h, @t) = @_;
        my $next = $code->($h);
        if ($next && exists $automaton->{$next}) {
            return $automaton->{$next}->($automaton, @t);
        } elsif ($next && (!defined $h)) {
            return ($next);
        } else {
            return undef;
        }
    };
}

sub acpt { return 'SUCCEED'; }

rule is prototyped as accepting a code block, which provides the interface for the code transitions. automaton, on the other hand, simply takes the hash reference as its argument.

Note that the subroutines implementing rules are mutually recursive, while Perl does not normally provide tail-call optimization. This would, in Krisnamurthi's examination, be a fatal flaw. In Perl, this is fixable, however: trade the return on the line calling the next transition for the goto keyword, which manually optimizes the tail call. Unfortunatly, according to Mark Jason Dominus, this is not truely an optimization, since the goto slows down the computation. It would make the space taken by the machine constant.

Comments

The use of subroutine prototypes provides a good deal of flexibility to Perl's syntax. In fact, that is about all it is good for.

I have used an approach similar to this to implement a command-line handling/usage message interface in Perl. Coupling the simple modifications to Perl's syntax with higher-order functions (currying, in that case) provided a concise, clear, and straightforward alternative to what is usually a tedious and frequently error-ridden process. It also provides the possibility of handling GUI or web interfaces without changing the application code, although that has not been done. Flexibility is like portibility: there is no flexible code, only code that has been, well, flexed.

Attempt 2

Suppose I revisit my original decision to implement the rules using Perl code:

my $a = automaton {
    'init' => rule {
        'c' => 'more',
    },
    'more' => rule {
        'a' => 'more',
        'd' => 'more',
        'r' => 'end',
    },
    'end' => rule { acpt },
};

$a->('init', @_);

This automaton is significantly less useful than the previous version: it only allows the machine to succeed or fail. However, the transitions are clearer, only relating the input with the resulting next transition.

The '=>' operator provides a multi-level pun here. Itself, it is a synonym for a comma, to help describe hashes. In the rule, "'c' => 'more'", it is also acting as the arrow of a state transition.

The code that implements the automaton, rule, and acpt is:

sub automaton {
    my ($rules) = @_;
    return sub {
        my $next = shift;
        while (1) {
            my $h = shift @_;
            if (exists $rules->{$next}) {
                $next = $rules->{$next}->($h);
            } else {
                last;
            }
        }
        if (@_ || !$next) {
            return 0;
        } elsif ($next eq 'succeed') {
            return 1;
        }
    }
}

sub rule ($) {
    my ($transitions) = @_;
    return sub {
        my ($h) = @_;
        if ((!defined $h) && (! %{$transitions})) {
            return 'succeed';
        } elsif (exists $transitions->{$h}) {
            return $transitions->{$h};
        } else {
            return undef;
        }
    };
}

sub acpt { return; }

This code is much more of an interpreter than the previous version. Instead of using Perl code references, it uses the automaton as a data structure to walk---the use of code references in rule and automaton could likely be replaced, and they occupy a lesser role, both conceptually and syntactically.

Conclusion

Both of these techniques provide better cosmetic access to a solution technique, as well as a new control structure embodying the technique. While I did not address any symbol binding, Perl would allow me to do so, if i felt so inclined. A key part of both approaches here is the use of Perl's syntactical support for hash tables, as well as the use of puns to overload the syntactic appearance of the automata.

Perl is a horrific, Frankensteinian conglomerate of a programming language, with more features than any sane human being should have to keep in their head at any one time. This can be exploited to good effect. It can also be the source of more than a few bugs.

gloria i ad inferni
faciamus opus

Return to Top | About this site...
Last edited Sat Aug 8 03:29:10 2009.
Copyright © 2005-2016 Tommy M. McGuire