试玩了下parsec

fleuria » 26 Apr 2009

上个星期翻的那篇文章里貌似提了下parsec,照着文档试玩了下。之前貌似玩过ruby的那个代码生成工具racc,它的名气不是很大而且文档不怎么全,感觉不是很爽。写了个helloworld就再也没碰。再写个parsec的helloworld。

语法分析的helloword就是表达式计算了。parsec跟*acc貌似不怎么像。不过无非都是上下文无关文法嘛,这个有名的ebnf:

expr     ::=   expr  ’+’   term  |  term
term     ::=   term  ’*’   factor   |  factor
factor   ::=   ’(’  expr   ’)’  |  digit+

digit    ::=   ’0’  |  ’1’  |  ...  |  ’9’

葫芦画瓢地写出parsec版:

import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (haskellDef)

do_parse p input
    = case (parse p "" input) of
        Left err -> do {
            putStr "parse error at:";
            print err;
        }
        Right x -> do {
            print x;
        }

expr :: Parser Int
expr = do {
    a < - expr;
    char '+';
    b <- term;
    return (a + b);
} <|> term

term :: Parser Int
term = do {
    a < - term;
    char '*';
    b <- factor;
    return (a*b);
}

factor :: Parser Int
factor = do {
    char '(';
    a <- expr;
    char ')';
    return a;
} <|> number

number = do {
    a < - many1 digit;
    return (read a);
}

parsec, Parser Combinators嘛。当然跟*acc那套不一样了。函数在这里就都是个组合子,可以像零件那样方便地组合。

do_parse这个函数取两个参数,一个是parser,一个是表示表达式的字符串。haskell的do-notation除了允许像python那样的缩进风格之外,还有这里用的c-style。parsec就用monad来表示sequence,< |>来表示choice,某种意义上讲,貌似只要它俩就足够构造出复杂的parser了。咋一看可能要比*acc那伪bnf的观感麻烦得多,但别忘了组合子的优势:可以方便地组合。用几个简单的组合子可以方便地构造出复杂的组合子,复杂的组合子继续组合成更复杂的组合子,而使用起来可是极为简洁。

嗯,废话真多。 进ghci测试一下

Main> do_parse expr "1+1"
** Exception: stack overflow
查手册,发现这么一句“Unfortunately, left-recursive grammars can not be specified directly in a combinator library. If you accidently write a left recursive program, the parser will go into an infinite loop! ”

哑然。 组合子还是函数嘛,想想在c中要是定义这样的函数会怎样。

Paser expr(){
    a = expr();
    ...
}

不过“However, every left-recursive grammar can be rewritten to a non-left- recursive grammar. The library provides combinators which do this automatically for you (chainl and chainl1). ”

嗯。手册里貌似写了个表达式计算的例子,就直接用了Parsec内置的ParsecExpr:

import   ParsecExpr

expr      ::  Parser   Integer
expr      =  buildExpressionParser      table   factor
          < ?>  "expression"

table     =  [[op  "*"  (*)  AssocLeft ,   op  "/"  div  AssocLeft]
             ,[op  "+"  (+)  AssocLeft,    op  "-"  (-)  AssocLeft]
             ]
          where
             op  s f  assoc
                =  Infix   (do{  string   s;  return  f})   assoc

factor    =  do{  char  ’(’
                ; x  < - expr
                ; char  ’)’
                ; return   x
               }
          <|>  number
          < ?>  "simple   expression"

number    ::  Parser   Integer
number    =  do{  ds < -  many1   digit
                ; return   (read   ds)
               }
            "number"

呃,运算符的优先级,左结合都考虑了。不过这样也太没意思了,该我写的都让它给内置了,不过那句话不是提到有个chainl1么,就用chainl1重写:

module Main where

import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (haskellDef)

main = do {
    input < - getLine;
    do_parse expr input;
}

do_parse p input
    = case (parse p "" input) of
        Left err -> do {
            putStr "parse error at:";
            print err;
        }
        Right x -> do {
            print x;
        }

lexer = P.makeTokenParser haskellDef
parens = P.parens lexer
symbol = P.symbol lexer
naturalOrFloat = P.naturalOrFloat lexer

expr = term `chainl1` addop
term = factor `chainl1` mulop
factor = parens expr < |> number
number = do {
    norf < - naturalOrFloat;
    case norf of
        Left n -> return (read $ show n);  --so sucks
        Right f -> return f;
}

mulop = do { symbol "*" ; return (*); } < |> do { symbol "/" ; return (/); }
addop = do { symbol "+" ; return (+); } < |> do { symbol "-" ; return (-); }

用到了ParsecToken。它就直接使用了haskell的token定义,也就是说,你可以在这个程序里使用haskell的注释

*Main> do_parse expr "1+{-- it's a comment --}(1/2)"
1.5
hosted on github, and powered by jekyll. (rss)