CHAPTER EIGHT

Macro Processing

The ultimate goal of this chapter is a macro processor along the lines of m4. The macro processor is essentially a simple programming language specialized for textual replacement. Before introducing that complexity and following K&P, I will begin with a simpler case: basic text replacement.

Simple text replacement

The define program reads from its standard input, recording (and removing) macro definitions and replacing uses of the macros with their expansions.

PROGRAM

  define - expand string definitions

USAGE

  define

FUNCTION

  define reads its input, looking for macro definitions of the form

    define(ident,string)

  and writes its output with each subsequent instance of the
  identifier ident replaced by the sequence of characters string.
  string must be balanced in parentheses.  The text of each definition
  proper results in no output text.  Each replacement string is
  rescanned for further possible replacements, permitting multi-level
  definitions.

EXAMPLE

    define
    define(ENDFILE,(-1))
    define(DONE,ENDFILE)
      if (getit(line) = DONE) then
        putit(sumline);
    <ENDFILE>

  produces

    if (getit(line) = (-1)) then
      putit(sumline);

BUGS

  A recursive definition such as define(x,x) will cause an infinite
  loop when x is invoked.

Like most of the other programs in the latter part of Software Tools in Haskell, this program uses a state record. In this case it contains a symbol table, stable, and a pushback buffer, pbuf.

Note that the two components, the symbol table and the pushback-buffer-supported input stream were developed independently. The program code here consists entirely of

  1. gluing the two components together correctly,
  2. building the application's logic.
data DefineState = DS { stable :: STable, pbuf :: PBBuffer }

emptyDS = DS { stable = initSTable, pbuf = emptypbbuffer }

The symbol table used by define contains simple text-replacement macros, with one exception, "define", itself. Each definition has the following format:

define(macro,definition)

This text establishes that "definition" will replace "macro" on subsequent input lines, by introducing a value (Macro "definition") into the table under the key "macro". The define(...) itself will be removed from the text.

Furthermore, in order to support redefining "define", it has an entry in the symbol table with type Define. As a result,

define(define,)

will

  1. disappear, and
  2. make any subsequent uses of "define" disappear.
data Defn = Define | Macro String

The symbol table itself, STable, is a Data.Map.Map with String keys and Defn values.

type STable = M.Map String Defn

Initially, the table only contains the entry for "define".

initSTable :: STable
initSTable = M.singleton "define" Define

A lookup is a simple Data.Map.lookup. The Monad m constraint and result of (m Defn) is a bit of overkill; it can be any monad, but m will normally (and always, this case) be Maybe.

stlookup :: (Monad m) => String -> STable -> m Defn
stlookup = M.lookup

Finally, installing a new value in the table means simply inserting a new, suitable Defn into the Map.

stinstall :: String -> String -> STable -> STable
stinstall key value table = M.insert key (Macro value) table

Following the original, the input stream is broken into tokens made of

  1. a string of alphanumeric characters, or
  2. a non-alphanumeric character.
data DToken = C Char | A String deriving Show

Since collecting all of an alphanumeric string will involve reading the first following non-alphanumeric character, the input stream uses a "pushback buffer", which allows the currently-unneeded character to be effectively pushed back onto the input.

This facility is extended to allow any number of characters to be pushed back, further allowing the macro expansion to re-scan expanded text in a simple way: the expansion of a macro is pushed back onto the input and then rescanned for further expansions.

A PBBuffer, or a list of Char, represents the pushback buffer. An empty PBBuffer is the empty list, and the putback operation is concatenation.

type PBBuffer = [Char]

emptypbbuffer = []

putback :: String -> PBBuffer -> PBBuffer
putback = (++)

The fundamental operation is getpbc; it checks the pushback buffer and returns any available character, or otherwise it reads an input character. (The end of file is still represented as Nothing.)

gettok gets an existing pushback buffer, and uses an accumulator to find the end of an alphanumeric string. It returns (in the IO monad) a possible DToken and a new pushback buffer.

gettok :: PBBuffer -> IO (Maybe DToken, PBBuffer)
gettok pb = gettok' pb []
    where
    gettok' pb a =
        do
        (mc,pb') <- getpbc pb
        case (mc,a)
             of (Just c,_) | isAlphaNum c -> gettok' pb' (c:a)
                (Just c,[])  -> return (Just $ C c, pb')
                (Just c,_)   -> return (Just $ A (reverse a), putback [c] pb')
                (Nothing,[]) -> return (Nothing, pb')
                (Nothing,_)  -> return (Just $ A (reverse a), pb')

    getpbc :: PBBuffer -> IO (Maybe Char, PBBuffer)
    getpbc (c:pb) = return (Just c,pb)
    getpbc [] = do { c <- getc; return (c,[]) }

Back to define, I am again using a StateT monad transformer to manage the DefineState/IO relationship. As before, I need a number of steps in the combined monad to provide access to the components of the state.

type DefineStep a = StateT DefineState IO a

getstable :: DefineStep STable
getstable = (liftM stable) get

putstable :: STable -> DefineStep ()
putstable st = do { ds <- get; put $ ds { stable = st } }

getpbuf :: DefineStep PBBuffer
getpbuf = (liftM pbuf) get

putpbuf :: PBBuffer -> DefineStep ()
putpbuf pb = do { ds <- get; put $ ds { pbuf = pb } }

The following functions are basic lower-level operations "lifted" into the DefineStep monad. The first is messageDS, which writes a string message to the output.

gettokDS is gettok lifted into the DefineStep monad and modified to simplify its operation there. First, pbuf is recovered from the current state, then gettok (lifted, to become a DefineStep computation) is called to get the possible next token and the updated pushback buffer. The buffer is then used to set the next state and the possible token is returned.

putbackDS is a similar lifting of the pushback buffer's putback operation. In this case, a String is pushed onto the current buffer and a new state is generated.

installDS is lifting of the symbol table install function, manipulating the symbol table in the monad state.

lookupDS is the lifting of stlookup, attempting to find a definition for a string in the symbol table.

messageDS :: String -> DefineStep ()
messageDS s = lift (message s)

gettokDS :: DefineStep (Maybe DToken)
gettokDS = do
           pb <- getpbuf
           (mt,pb') <- lift (gettok pb)
           putpbuf pb'
           return mt

putbackDS :: String -> DefineStep ()
putbackDS s = do { pb <- getpbuf; putpbuf (putback s pb) }

installDS :: String -> String -> DefineStep ()
installDS k v = liftM (stinstall k v) getstable >>= putstable

lookupDS :: String -> DefineStep (Maybe Defn)
lookupDS s = (liftM $ stlookup s) getstable

doDefn handles the result of looking up a string in the symbol table. (The original string is also passed to doDefn, for reasons that will become clear.)

doDefn has three cases:

  1. Nothing: the string was not found in the symbol table. In this case, the original string is returned.
  2. Just a macro expansion: the expansion text is pushed back onto the input buffer and an empty string is returned. Thus, the expansion text will be re-examined, and the final result will be printed when no further expansions are found.
  3. Just a definition: the next collection of input tokens is read, looking for the "(a,b)" format. Then, a is inserted into the symbol table with an expansion of b and an empty string is returned.

This function is made complex by the parsing of the "(a,b)" format.

doDefn :: String -> Maybe Defn -> DefineStep String
doDefn s Nothing          = return s
doDefn s (Just (Macro m)) = do { putbackDS m; return "" }
doDefn s (Just Define)    = do
                            getchartok '('
                            m <- gettokDS
                            getchartok ','
                            d <- getparen 1 ""
                            case m of Just (A s) -> installDS s d
                                      _          -> messageDS "bad definition"
                            return ""
    where
    getchartok :: Char -> DefineStep ()
    getchartok c = do
                   tok <- gettokDS
                   case tok of Just (C c) -> return ()
                               _ -> messageDS ("missing " ++ [c])
    getparen :: Int -> String -> DefineStep String
    getparen n a = if n == 0
                   then return (init a) -- skip last ')'
                   else do
                        tok <- gettokDS
                        case tok
                             of Just (C ')') -> getparen (n-1) (a++")")
                                Just (C '(') -> getparen (n+1) (a++"(")
                                Just (C c)   -> getparen n (a++[c])
                                Just (A s)   -> getparen n (a++s)
                                Nothing -> do
                                           messageDS "missing )"
                                           return ""

processToken is the fundamental operation block of define, combining gettokDS, lookupDS, and doDefn, along with handling both the end-of-file and non-alphanumeric tokens.

processToken :: DefineStep (Maybe String)
processToken = do
               t <- gettokDS
               case t of Nothing    -> return Nothing
                         Just (C c) -> return (Just [c])
                         Just (A a) -> do
                                       defn <- lookupDS a
                                       str <- doDefn a defn
                                       return (Just str)

Finally, define's main loop repeatedly gets and processes a token.

main = loop emptyDS
    where
    loop ds = do
              (ms,ds') <- runStateT processToken ds
              case ms of Nothing -> return ()
                         Just s -> do { putStr s; loop ds' }

Macros with arguments

The macro program is significantly more complex than define, because not only does it add facilities for arguments to the macros, it also includes

  • conditional operations,
  • arithmetic evaluation,
  • string manipulation, and
  • the ability to prevent expansion of definitions in the input text, including the ability to change the quotation characters.

Because of its facilities, it has the power of a simple programming language.

Further, for macro I have a significantly different implementation style. In this program, I have written multiple individual components as monad transformers. This style allows the individual components to be written and tested separately, allows them to use underlying monadic operations, and allows the final program to be constructed as an assemblage of the components. The style differs from that of define in that in define, the components were built on top of lower levels; in this case the levels are implemented in isolation.

PROGRAM

  macro  expand string definitions, with arguments

USAGE

  macro

FUNCTION

  macro reads its input, looking for macro definitions of the form

    define(ident,string)

  and writes its output with each subsequent instance of the
  identifier ident replaced by the arbitrary sequence of characters
  string.

  Within a replacement string, any dollar sign $ followed by a digit
  is replaced by an argument corresponding to that digit.  Arguments
  are written as a parenthesized list of strings following an instance
  of the identifier, e.g.,

    ident(arg1,arg2,...)

  So $1 is replaced in the replacement string by arg1, $2 by arg2, and
  so on; $0 is replaced by ident.  Missing arguments are taken as null
  strings; extra arguments are ignored.

  The replacement string in a definition is expanded before the
  definition occurs, except that any sequence of characters between a
  grave ` and a balancing apostrophe ' is taken literally, with the
  grave and apostrophe removed.  Thus, it is possible to make an alias
  for define by writing

    define(def,`define($1,$2)')

  Additional predefined built-ins are:

  ifelse(a,b,c,d) is replaced by the string c if the string a exactly
  matches the string b; otherwise it is replaced by the string d.

  expr(expression) is replaced by the decimal string representation of
  the numeric value of expression.  For correct operation, the
  expression must consist of parenthesis, integer operands written as
  decimal strings, and the operators +, -, *, / (integer division),
  and % (remainder).  Multiplication and division bind tighter than
  addition and subtraction, but parentheses may be used to alter this
  order.

  substr(s,m,n) is replaced by the substring of s starting at location
  m (counting from one) and continuing at most n characters.  If n is
  omitted, it is taken as a very large number; if mi is outside the
  string, the replacement string is null.  m and n may be expressions
  suitable for expr.

  len(s) is replaced by the string representing the length of its
  argument in characters.

  changeq(xy) changes the quote characters to x and y.  changeq()
  changes them back to ` and '.

  Each replacement string is rescanned for further possible
  replacements, permitting multi-level definitions to be expanded to
  final form.

EXAMPLE

  The macro len could be written in terms of the other built-ins as:

    define(`len',`ifelse($1,,0,`expr(1+len(substr($1,2)))')')

BUGS

  A recursive definition of the form define(x,x) will cause an
  infinite loop.  Expression evaluation is fragile.  There is no unary
  minus.  It is unwise to use parentheses as quote characters.

Like define, macro uses a pushback buffer to allow input characters (and processed strings) to be re-inserted into the input stream and re-examined at a later step. Also like define, macro uses a list to represent the pushback buffer and an empty list to be the initial pushback buffer state.

type PushbackBuf = String

emptyPb :: PushbackBuf
emptyPb = []

However, unlike define, the pushback operations for macro are written monadically. A PushbackStepT (a monad transformer; the m type parameter is a base monad) is a StateT monad using a PushbackBuf as a state:

type PushbackStepT m a = StateT PushbackBuf m a

The pushback operation is the equivalent of define's putback function except that it uses the string argument to modify the state rather than using it to transform a buffer into another buffer.

pushback :: Monad m => String -> PushbackStepT m ()
pushback s = modify (s ++)

The pull operation is a PushbackStepT which is roughly equivalent to the getpbc function from define (internal to gettok). It differs in that getpbc is built directly on the IO monad base while pull is parameterized by the lower-layer operation. This parameter isolates it from the lower layer. Further, pull is an operation of the PushbackStepT monad; where getpbc threads the pushback buffer directly, pull uses the StateT monad to implicitly thread the pushback buffer through its (and overlying layers') processing.

The operation parameter (suggestively called getc) must itself be an operation of the PushbackStepT monad. However, this can be accomplished by lifting the lower layer's operation (lift is the operation that makes a monad transformer). The final section of the macro implementation shows who the layers are assembled into the final application, including the use of pull and the lifting of the ubiquitous getc :: IO (Maybe Char) operation into a PushbackStepT IO (Maybe Char) operation.

pull :: Monad m => (PushbackStepT m (Maybe Char)) -> PushbackStepT m (Maybe Char)
pull getc = do buf <- get
               case buf of [] -> getc
                           (c:cs) -> do { put cs; return $ Just c }

The final necessary part of a pushback buffer component is the function that transforms a PushbackStepT m monadic computation into a computation of the base monad, m. runPushbackT accepts a PushbackStepT computation which returns a value of type t, plus an initial pushback buffer, and returns a computation in the base monad m that returns the value of type t and an updated pushback buffer. Since PushbackStepT is implemented by a StateT monad transformer, runPushbackT is implemented by runStateT.

runPushbackT :: Monad m => PushbackStepT m t -> PushbackBuf -> m (t, PushbackBuf)
runPushbackT = runStateT

The next component of the macro processor is a seemingly minor part which ultimately plays a major role. The macro processor provides the capability to quote text in the input, preventing it from being immediately expanded. Further, it provides the ability to change the quotation characters used to mark quoted text.

A QuoteState is the pair of characters being used to enclose quoted text. (Two, one for opening and one for closing, are needed because quotations can be nested---an outer quotation can be expanded during macro processing, for example, while leaving an inner quotation untouched.

The initial quotation characters are left and right single quotes.

newtype QuoteState = Qs (Char,Char) deriving Show

initialQuotes :: QuoteState
initialQuotes = Qs ('`', '\'')

In order to thread the quotation state through define's processing, I am creating another monad transformer layer: QuoteStepT. QuoteStepT is based on another StateT transformer with the QuoteState as the state.

type QuoteStepT m a = StateT QuoteState m a

Like the PushbackStepT monad transformer, a function for unwrapping the QuoteStepT monadic action is needed, and that function can be provided by runStateT.

runQuoteStepT :: Monad m => QuoteStepT m t -> QuoteState -> m (t, QuoteState)
runQuoteStepT = runStateT

The two basic actions in the QuoteStepT monad are retrieving the current quotation characters and updating those characters in the state.

getQuotes :: Monad m => QuoteStepT m (Char, Char)
getQuotes = do { Qs opcl <- get; return opcl }

setQuotes :: Monad m => Char -> Char -> QuoteStepT m ()
setQuotes open close = modify $ const $ Qs (open,close)

One of the benefits of writing the program this way is that the components can be tested in isolation. Wile I have not done serious testing, it is very convenient to call the functions to ensure they are behaving the way I expected.

By loading the file in an interactive session, I can use runQuoteStepT to set up simple scenarios to ensure the output is what I think it should be:

*Main> runQuoteStepT (setQuotes 'a' 'b') (Qs ('<','>'))
((),Qs ('a','b'))
*Main> runQuoteStepT (setQuotes 'a' 'b' >> getQuotes) (Qs ('<','>'))
(('a','b'),Qs ('a','b'))

In the first example, setQuotes updates the state with new quotation characters and returns unit. setQuotes bound to getQuotes updates the characters, then returns the new pair.

With a pushback buffer and quotation state handled, the next stage of the implementation is parsing the input stream. That is the next computation in the QuoteStepT monad.

But first, I need to define what a Token is, and provide a quick way to get from a token back to the external representation of a token.

The Token type has seven possible shapes:

  • C: a miscellaneous character that does not fit into any other form,
  • T: an alphanumeric string suitable (although not necessarily used as) the identifier of a macro,
  • Oq and Cq: an open quotation or close quotation character (since the quotation characters can change, these two include the actual characters seen,
  • Comma: a comma as used to separate arguments in a macro use, and
  • Open and Close: the opening and closing parentheses of a macro use.
data Token = C Char
           | T String
           | Oq Char
           | Cq Char
           | Comma
           | Open
           | Close
             deriving Show

tokenToString (C c)  = [c]
tokenToString (T s)  = s
tokenToString (Oq c) = [c]
tokenToString (Cq c) = [c]
tokenToString Comma  = ","
tokenToString Open   = "("
tokenToString Close  = ")"

gettoken is a QuoteStepT operation which requires a "pull" argument (again, suggestively named) used to read the next character from the input and a "pushback" argument used to return unnecessary input back to the input stream, and which produces either Nothing, in the case of the end of the input, or Just a Token.

  • An alphanumeric string causes gettoken to iterate in order to collect the entire alphanumeric value (and returning the terminating, non-alphanumeric character to the input stream), and produces a T value.
  • A character matching the current open quotation value produces an Oq, while
  • a character matching the current close quotation value produces a Cq.
  • A comma, open or close parenthesis produce a Comma, Open, or Close value, respectively.
  • And finally, any other character results in a C value.
gettoken :: Monad m =>
              QuoteStepT m (Maybe Char) ->
                (String -> QuoteStepT m ()) ->
                    QuoteStepT m (Maybe Token)
gettoken pull pushback = token' []
    where token' acc = do (open,close) <- getQuotes
                          ch <- pull
                          case (ch, acc) of
                            (Just c, _)  | isAlphaNum c -> token' (c:acc)
                            (Just c, []) | c == open    -> return $ Just $ Oq c
                            (Just c, []) | c == close   -> return $ Just $ Cq c
                            (Just ',', []) -> return $ Just Comma
                            (Just '(', []) -> return $ Just Open
                            (Just ')', []) -> return $ Just Close
                            (Just c, [])   -> return $ Just $ C c
                            (Just c, _)    -> do pushback [c]
                                                 return $ Just $ T (reverse acc)
                            (Nothing, [])  -> return Nothing
                            (Nothing, _)   -> return $ Just $ T (reverse acc)

In the final result, gettoken is the reason that the QuoteStepT monadic layer is responsible for tokenizing the input stream.

The next layer, however, begins to get hairy.


A ProcessorMapStepT is a monadic layer that contains the logic of the macro operations. Most of the Processors (ifelse, expr, substr, and len), as I am calling the operations, are relatively simple. changeq is a bit different, in that it changes the quotation characters. But the processors that cause the majority of the problem is define, the operation that defines new processors.

The design steps I went through to get to this state were:

  • ifelse: ifelse and most of the other fundamental processors are simple; they require a list of strings representing the arguments to the built-in. The basic code for ifelse is:

    -- ifelse args = if a == b then c else d
    --   where
    --     (a:b:c:d:_) = args ++ (repeat "")
    

    The other fundamental processors are similar.

  • changeq: The changeq processor uses the setQuotes operation of the underlying monad to change the quotation characters used by the tokenizer. There are a few corollaries: the processors need to be parameterized with the setQuotes (or "setQ) operation and the processor operation needs to be monadic. changeq becomes:

    -- changeq setQ args = do case a of (o:c:_) -> setQ o c
    --                                  _ -> setQ '`' '\''
    --                        return ""
    --   where
    --     (a:_) = args ++ (repeat "")
    

    The "args ++ (repeat "")" idiom is used to simplify the implementation as well as fill out the documented behavior: missing arguments are replaced by the empty string.

    ifelse then becomes:

    -- ifelse args = return $ if a == b then c else d
    --   where
    --     (a:b:c:d:_) = args ++ (repeat "")
    
  • define: Now, it would be easy enough to put the previously described processors into a Map under String keys, but the define processor modifies that mapping. Further, that mapping should be threaded through this layer's computation. As a result, I came up with the following types:

    type Processor base = Char -> Char -> base ()
                          -> String
                          -> [String]
                          -> ProcessorMapStepT base (Either String String)
    

    A processor is a monad transformer with a base monad that has the setQuotes operation; it further accepts a String (which is the name of the macro; that becomes useful in a bit) and a list of Strings (the arguments of the macro). The resulting computation is a ProcessorMapStepT, a step in a computation where a ProcessorMap is threaded.

    A ProcessorMap is:

    newtype ProcessorMap base = Pm (M.Map String (Processor base))
    
    instance Show (ProcessorMap a) where
        show (Pm m) = "Pm " ++ show (M.keys m)
    

    The difficulty is in the definition of a ProcessorMapStepT. I would like to use something like:

    -- type ProcessorMapStepT base a = StateT (ProcessorMap base) base a
    

    but I cannot because the type is recursive (and recursively bemused). Instead I need to use newtype (or data) so that the type constructor interrupts the recursion:

    newtype ProcessorMapStepT base a =
      Pmst {
             runProcessorMapStepT ::
               (ProcessorMap base) -> base (a, (ProcessorMap base))
           }
    

    Unfortunately, newtype and the monad transformers do not really mix (although an extension to ghc allows the following steps to be automatically derived). As a result, I used cut-n-paste code-reuse to steal the basic ideas of the StateT implementation.

    First, the above definition of ProcessorMapStepT uses a record idiom to simultaneously define:

    • The ProcessorMapStepT type,
    • The Pmst constructor, which turns a function with the type ((ProcessorMap base) -> base (a, (ProcessorMap base))) into a value of type ProcessorMapStepT base a, and finally
    • a function runProcessorMapStepT which can be applied to a ProcessorMapStepT value to uncover the underlying function; this function can be given an initial ProcessorMap to produce a computation in the base monad returning some value of type a and a terminal ProcessorMap.

    The downside of using newtype (without the ghc extensions) is that I need to declare my new type to be a monad and a monad transformer manually. Fortunately, I can adapt these too from StateT.

    As a first, although useless, exercise, I provide an instance of Functor:

    pmstFmap :: Monad m
              => (a -> b)
                 -> ProcessorMapStepT m a
                 -> ProcessorMapStepT m b
    pmstFmap f step = Pmst $ \pm -> do (a, pm') <- runProcessorMapStepT step pm
                                       return (f a, pm')
    
    instance (Monad m) => Functor (ProcessorMapStepT m) where
        fmap = pmstFmap
    

    fmap in this case applies a function f to the value of a ProcessorMapStepT, returning the result. The function wrapped in Pmst first uses runProcessorMapStepT to run the step, then returns the result of applying f and the new ProcessorMap state.

    Next, I provide an instance of Monad:

    pmstReturn :: Monad m => a -> ProcessorMapStepT m a
    pmstReturn a = Pmst $ \pm -> return (a, pm)
    
    pmstBind :: Monad m
                => (ProcessorMapStepT m a)
                   -> (a -> (ProcessorMapStepT m b))
                   -> (ProcessorMapStepT m b)
    pmstBind m k = Pmst $ \pm -> do (a, pm') <- runProcessorMapStepT m pm
                                    runProcessorMapStepT (k a) pm'
    
    instance Monad m => Monad (ProcessorMapStepT m) where
        return = pmstReturn
        (>>=)  = pmstBind
    

    return simply wraps the appropriate value in the Pmst constructor and function. bind uses runProcessorMapStepT to get to the left side's computation in the underlying monad, collects the results, then uses runProcessorMapStepT again to get to the underlying computation of the right side.

    Finally, I provide an instance of MonadTrans (since lift comes in handy):

    pmstLift :: (Monad m) => m a -> ProcessorMapStepT m a
    pmstLift op = Pmst $ \pm -> do { a <- op; return (a, pm) }
    
    instance MonadTrans ProcessorMapStepT where
        lift = pmstLift
    

    lift here calls the lifted operation, threading the ProcessorMap through the computation ensuring that the lifted op never needs to see nor can change it.

    I would like to provide an instance of MonadState, but the ProcessorMapStepT contains only a ProcessorMap, and the general type signatures for get and put cannot be satisfied. Instead, I simply provided two functions:

    pmstGet :: (Monad m) => ProcessorMapStepT m (ProcessorMap m)
    pmstGet = Pmst $ \pm -> return (pm,pm)
    
    pmstPut :: (Monad m) => ProcessorMap m -> ProcessorMapStepT m ()
    pmstPut pm = Pmst $ const $ return ((),pm)
    

With the preliminaries out of the way (it is rather amazing what nearly being able to use StateT can involve), I can now begin to write the actual processors.

Two processors are rather special; they are used when no other processor can be identified. defaultProcessor is used when a string is seen such as:

notAMacro(some args)

In this case, if notAMacro is not found in the current ProcessorMap, defaultProcessor is used to reconstruct the original text including the parentheses and commas.

On the other hand, a bare word is also potentially a macro invocation:

notAMacro

bareDefaultProcessor is used in that case; the difference is between the cases:

  • text(arg1,...) - one or more arguments,
  • text() - a parenthesized list of arguments that happens to be empty, and
  • text - a macro invocation with no arguments.

Previously, when I presented the definition of a Processor, I showed a return type of Either String String. The reason is that the result of a macro expansion is pushed back onto the input to be read again. However, some results are known not to need reevaluation; the default processors do not change the input, so no further evaluation is useful. To indicate that the results need to be pushed back onto the input, the result string is tagged Right; to indicate that they do not, the string is tagged Left.

defaultProcessor :: Monad m => Processor m
defaultProcessor _ name args =
  return $ Left $ name ++ "(" ++ join (intersperse "," args) ++ ")"

bareDefaultProcessor :: Monad m => Processor m
bareDefaultProcessor _ name _ = return $ Left $ name

The next processor is the final version of ifelse; it includes the monadic structure as well as the Right tag.

ifelseProcessor :: Monad m => Processor m
ifelseProcessor _ _ args = lift ifelse
    where
      ifelse = return $ Right $ if a == b then c else d
      (a:b:c:d:_) = args ++ (repeat "")

The expr processor is modestly more complex than ifelse, because it an expression parser and numerical evaluation. expr uses Parsec's buildExpressionParser to parse the expression, then evaluates it to return the Integer. runExpr is used to call Parsec's parse and to handle any parser errors. Finally, exprProcessor is the Processor. It converts the Integer back into a String and returns the result.

expr :: Parser Integer
expr = buildExpressionParser operators simple
    where
      operators = [ [op '*', op '/', op '%'], [op '+', op '-'] ]
      op ch     = Infix (do { char ch; return $ opFunc ch }) AssocLeft
      opFunc ch = case ch of '+' -> (+);  '-' -> (-)
                             '*' -> (*);  '/' -> div
                             '%' -> mod
      simple    = try paren <|> number
      paren     = do { ws; char '('; n <- expr; char ')'; ws; return n }
      number    = do { ws; n <- many digit; ws; return $ stringToInteger n }
      ws        = many space

stringToInteger :: String -> Integer
stringToInteger n = case readDec n of [(n,_)] -> n
                                      _       -> 0

runExpr :: String -> Integer
runExpr str = either left right $ parse expr "" str
    where
      left _ = 0
      right n = n

exprProcessor :: Monad m => Processor m
exprProcessor _ _ args = lift exprP
    where
      exprP = return $ Right $ show $ runExpr a
      (a:_) = args ++ (repeat "")

The next two processors, substr and len, are almost trivial, although substr uses runExpr to parse and evaluate expressions for its arguments.

substrProcessor :: Monad m => Processor m
substrProcessor _ _ args = lift substr
    where
      substr = return $ Right $ if c' == 0 then a' else take c' a'
      (a:b:c:_) = args ++ (repeat "")
      a'        = drop (b' - 1) a
      b'        = fromIntegral $ runExpr b
      c'        = fromIntegral $ runExpr c

lenProcessor :: Monad m => Processor m
lenProcessor _ _ args = lift len
    where
      len = return $ Right $ show $ length a
      (a:_) = args ++ (repeat "")

And, changeqProcessor completes the partial processor described above.

changeqProcessor :: Monad m => Processor m
changeqProcessor setQ _ args = lift changeq
    where
      changeq = do case a of (o:c:_) -> setQ o c
                             _ -> setQ '`' '\''
                   return $ Right ""
      (a:_) = args ++ (repeat "")

replacement :: [String] -> String -> String
replacement args str = replace1 str
    where
      replace1 str = case break (== '$') str of
                       (s, "")    -> s
                       (s, '$':t) -> s ++ replace2 t
      replace2 str = case readDec str of
                      []          -> '$':str
                      [(n,str')]  -> (args !! n) ++ replace1 str'

definedProcessor :: Monad m => String -> Processor m
definedProcessor definition _ name args = defined
    where
      defined = return $ Right $ replacement args' definition
      args' = (name:args) ++ (repeat "")

defineProcessor :: Monad m => Processor m
defineProcessor _ _ args = do (Pm pm) <- pmstGet
                              pmstPut (Pm $ M.insert a (definedProcessor b) pm)
                              return $ Right ""
    where
      (a:b:_) = args ++ (repeat "")

initialProcessorMap :: Monad m => ProcessorMap m
initialProcessorMap = Pm $ M.fromList [
                                       ("ifelse", ifelseProcessor),
                                       ("expr", exprProcessor),
                                       ("substr", substrProcessor),
                                       ("len", lenProcessor),
                                       ("changeq", changeqProcessor),
                                       ("define", defineProcessor)
                                      ]

runProcessor :: Monad m
                => Char -> Char -> m ()
                    -> String
                    -> Maybe [String]
                    -> ProcessorMapStepT m (Either String String)
runProcessor setQ name args =
    do (Pm pm) <- pmstGet
       case args of
         Just args' -> (M.findWithDefault defaultProcessor name pm) setQ name args'
         Nothing    -> (M.findWithDefault bareDefaultProcessor name pm) setQ name []

--------------------------------------------------

parseToken :: (Monad m) =>
              m (Maybe Token)
                  -> (a -> String -> Maybe [String] -> m (Either String String))
                  -> (String -> m ())
                  -> a
                  -> m (Maybe String)
parseToken token runprocessor pushback setq =
    do t <- token
       case t of Just (T s)  -> parseFrame token runprocessor pushback setq s
                 Just (Oq o) -> collectQuote token
                 Just t'     -> return $ Just $ tokenToString t'
                 Nothing     -> return Nothing

parseFrame :: (Monad m) =>
              m (Maybe Token)
                  -> (a -> String -> Maybe [String] -> m (Either String String))
                  -> (String -> m ())
                  -> a
                  -> String
                  -> m (Maybe String)
parseFrame token runprocessor pushback setq name =
    do t <- token
       case t of
         Just Open -> do args <- collectArgs token runprocessor pushback setq [""]
                         res <- runprocessor setq name $ Just args
                         case res of Left s -> return $ Just s
                                     Right s -> do { pushback s; return $ Just "" }
         Just t'   -> do pushback $ tokenToString t'
                         res <- runprocessor setq name Nothing
                         case res of Left s -> return $ Just s
                                     Right s -> do { pushback s; return $ Just "" }
         Nothing   -> return $ Just name

collectArgs :: (Monad m)
               => m (Maybe Token)
                   -> (a -> String -> Maybe [String] -> m (Either String String))
                   -> (String -> m ())
                   -> a
                   -> [String]
                   -> m [String]
collectArgs token runprocessor pushback setq (cur:prev) =
    do t <- token
       case t of Just Comma -> collectArgs token runprocessor pushback setq ("":cur:prev)
                 Just Close -> return $ reverse $ cur:prev
                 Just (T s) -> do Just s' <- parseFrame token runprocessor pushback setq s
                                  collectArgs token runprocessor pushback setq $ (cur ++ s') : prev
                 Just (Oq o) -> do Just s' <- collectQuote token
                                   collectArgs token runprocessor pushback setq $ (cur ++ s') : prev
                 Just t     -> collectArgs token runprocessor pushback setq $ (cur ++ tokenToString t) : prev
                 Nothing    -> return $ reverse $ cur:prev

collectQuote :: Monad m => m (Maybe Token) -> m (Maybe String)
collectQuote token = collect' 0 []
    where
      collect' depth acc =
          do t <- token
             case t of Just (Cq c) -> if depth == 0
                                      then return $ Just $ reverse acc
                                      else collect' (depth-1) (c:acc)
                       Just (Oq o) -> collect' (depth+1) (o:acc)
                       Just (T s)  -> collect'  depth    (reverse s ++ acc)
                       Just (C c)  -> collect'  depth    (c:acc)
                       Just Comma  -> collect'  depth    (',':acc)
                       Just Open   -> collect'  depth    ('(':acc)
                       Just Close  -> collect'  depth    (')':acc)

--------------------------------------------------

type PushbackM     = StateT PushbackBuf IO
type QuoteM        = StateT QuoteState PushbackM
type ProcessorMapM = ProcessorMap QuoteM

type PushbackStep     a = PushbackStepT IO a
type QuoteStep        a = QuoteStepT PushbackM a
type ProcessorMapStep a = ProcessorMapStepT QuoteM a

getcM :: PushbackStep (Maybe Char)
getcM  = lift getc

pullM :: QuoteStep (Maybe Char)
pullM  = lift $ pull getcM

pushbackM :: String -> QuoteStep ()
pushbackM s = lift $ pushback s

pushbackMM :: String -> ProcessorMapStep ()
pushbackMM s = lift $ pushbackM s

gettokenM :: ProcessorMapStep (Maybe Token)
gettokenM  = lift $ gettoken pullM pushbackM

setQuotesM :: Char -> Char -> ProcessorMapStep ()
setQuotesM o c = lift $ setQuotes o c

doparse :: ProcessorMapStep (Maybe String)
doparse  = parseToken gettokenM runProcessor pushbackMM setQuotes

runProcess :: ProcessorMapM -> QuoteStep (Maybe String, ProcessorMapM)
runProcess pm = runProcessorMapStepT doparse pm

runQuote :: ProcessorMapM -> QuoteState
         -> PushbackStep (Maybe String, ProcessorMapM, QuoteState)
runQuote pm qs = do ((ms, pm), qs) <- runQuoteStepT (runProcess pm) qs
                    return (ms,pm,qs)

runPushback :: ProcessorMapM -> QuoteState -> PushbackBuf
            -> IO (Maybe String, ProcessorMapM, QuoteState, PushbackBuf)
runPushback pm qs pb = do ((ms, pm, qs), pb) <- runPushbackT (runQuote pm qs) pb
                          return (ms, pm, qs, pb)

macro :: ProcessorMapM -> QuoteState -> PushbackBuf -> IO ()
macro pm qs pb = do (ms, pm', qs', pb') <- runPushback pm qs pb
                    case ms of Nothing -> return ()
                               Just s' -> do { putStr s'; macro pm' qs' pb' }

main :: IO ()
main = macro initialProcessorMap initialQuotes emptyPb

gloria i ad inferni
faciamus opus

Return to Top | About this site...
Last edited Sat Aug 8 03:29:10 2009.
Copyright © 2005-2016 Tommy M. McGuire