SpecialistOff.NET / Вопросы / Статьи / Фрагменты кода / Резюме / Метки / Помощь / Файлы
НазадМетки: [packrat parsing]; [ll]; [lr];
Packrat parsing is a novel and practical method for implementing linear-time parsers for grammars defined in Top-Down Parsing Language (TDPL). While TDPL was originally created as a formal model for top-down parsers with backtracking capability, this thesis extends TDPL into a powerful general-purpose notation for describing language syntax, providing a compelling alternative to traditional context-free grammars (CFGs). Common syntactic idioms that cannot be represented concisely in a CFG are easily expressed in TDPL, such as longest-match disambiguation and "syntactic predicates," making it possible to describe the complete lexical and grammatical syntax of a practical programming language in a single TDPL grammar.
Packrat parsing is an adaptation of a 30-year-old tabular parsing algorithm that was never put into practice until now. A packrat parser can recognize any string defined by a TDPL grammar in linear time, providing the power and flexibility of a backtracking recursive descent parser without the attendant risk of exponential parse time. A packrat parser can recognize any LL(k) or LR(k) language, as well as many languages requiring unlimited lookahead that cannot be parsed by shift/reduce parsers. Packrat parsing also provides better composition properties than LL/LR parsing, making it more suitable for dynamic or extensible languages. The primary disadvantage of packrat parsing is its storage cost, which is a constant multiple of the total input size rather than being proportional to the nesting depth of the syntactic constructs appearing in the input.
Monadic combinators and lazy evaluation enable elegant and direct implementations of packrat parsers in recent functional programming languages such as Haskell. Three different packrat parsers for the Java language are presented here, demonstrating the construction of packrat parsers in Haskell using primitive pattern matching, using monadic combinators, and by automatic generation from a declarative parser specification. The prototype packrat parser generator developed for the third case itself uses a packrat parser to read its parser specifications, and supports full TDPL notation extended with "semantic predicates," allowing parsing decisions to depend on the semantic values of other syntactic entities. Experimental results show that all of these packrat parsers run reliably in linear time, efficiently support "scannerless" parsing with integrated lexical analysis, and provide the user-friendly error-handling facilities necessary in practical applications.
In PDF or PostScript
The full source code for Pappy, the prototype packrat parser generator described in the thesis, is available for browsing in this directory, or for downloading as a gzipped tar file. A brief breakdown of the source files follows:
Following are complete versions of the example parsers for the trivial arithmetic expression language used in the thesis:
The three complete and working parsers for the Java language, which are described in the paper and used for analysis and comparison purposes, are available here:
The test suite of Java source files used to obtain the experimental results in the thesis are available in this gzipped tar file. All of these Java source files were taken from Cryptix version 3.2.0.
Enjoy!
--- JavaPat.hs --- JavaMonad.hs --- Java.pappy --- Parse.hs --- Pos.hs --- Arith.pappy --- ArithMonad.hs --- ArithLex.hs --- ArithLeft.hs --- ArithPackrat.hs --- ArithRecurse.hs --- Main.hs --- WriteParser.hs --- MemoAnalysis.hs --- SimplifyGrammar.hs --- ReduceGrammar.hs --- ReadGrammar.hs --- thesis.ps
module JavaPat where
import Char
import List
import Pos
import Parse
-------------------- Lexical Preprocessing --------------------
-- Preprocess unicode escapes and newlines
javaPrep :: String -> String
javaPrep [] = []
javaPrep ('\r':'\n':s) = '\n':javaPrep s
javaPrep ('\r':s) = '\n':javaPrep s
javaPrep ('\\':'\\':s) = '\\':'\\':javaPrep s
javaPrep ('\\':'u':s) = case s of
h1:h2:h3:h4:s ->
if isHexDigit h1 && isHexDigit h2 &&
isHexDigit h3 && isHexDigit h4
then chr v4:javaPrep s
else error "invalid Unicode escape sequence"
where v1 = digitToInt h1
v2 = v1*16 + digitToInt h2
v3 = v2*16 + digitToInt h3
v4 = v3*16 + digitToInt h4
_ -> error "incomplete Unicode escape sequence"
javaPrep (c:s) = c:javaPrep s
-------------------- Abstract Syntax Trees --------------------
type Identifier = String
type Name = [Identifier]
data Token = TokKeyword String
| TokIdent Identifier
| TokSymbol String
| TokInt Integer Bool
| TokFloat Double Bool
| TokChar Char
| TokString String
| TokBool Bool
| TokNull
instance Eq Token where
TokKeyword s == TokKeyword s' = s == s'
TokIdent s == TokIdent s' = s == s'
TokSymbol s == TokSymbol s' = s == s'
TokInt i l == TokInt i' l' = i == i' && l == l'
TokFloat f l == TokFloat f' l' = f == f' && l == l'
TokChar c == TokChar c' = c == c'
TokString s == TokString s' = s == s'
_ == _ = False
instance Show Token where
show (TokKeyword s) = "reserved word " ++ show s
show (TokIdent s) = "identifier " ++ show s
show (TokSymbol s) = "symbol " ++ show s
show (TokInt i l) = show i ++ (if l then "l" else "")
show (TokFloat f l) = show f ++ (if l then "l" else "")
show (TokChar c) = show c
show (TokString s) = show s
data Expression = ExpLiteral Token
| ExpIdent Identifier
| ExpPrefix String Expression
| ExpPostfix String Expression
| ExpBinary String Expression Expression
| ExpSelect Expression Expression
| ExpInstanceof Expression DeclType
| ExpNewClass DeclType [Expression] (Maybe [Declaration])
| ExpNewArray DeclType [Maybe Expression] (Maybe [Initializer])
| ExpCall Expression [Expression]
| ExpArray Expression (Maybe Expression)
| ExpCast DeclType Expression
| ExpCond Expression Expression Expression
| ExpThis
| ExpSuper
| ExpClass
type Modifier = String
data DeclType = DtBasic String Int
| DtIdent [Identifier] Int
type Declarator = (Identifier, Int, Maybe Initializer)
data Declaration =
DeclSimple [Modifier] DeclType [Declarator]
| DeclMethod [Modifier] (Maybe DeclType) Identifier
[FormalParam] Int [Name]
(Maybe [Statement])
| DeclConstructor [Modifier] Identifier
[FormalParam] [Name] [Statement]
| DeclClass [Modifier] Identifier (Maybe DeclType) [DeclType]
[Declaration]
| DeclInterface [Modifier] Identifier [DeclType]
[Declaration]
| DeclBlock Bool [Statement]
-- formal parameter: "final" flag, type, parameter name, array dimension
type FormalParam = (Bool, DeclType, Identifier, Int)
data Initializer =
IniExpr Expression
| IniList [Initializer]
data SwitchGroup =
SwCase Expression [Statement]
| SwDefault [Statement]
data ForInit = FiExpr [Expression]
| FiDecl Bool DeclType [Declarator]
| FiNone
data Statement = StLabel Identifier Statement
| StCase Expression Statement
| StDefault Statement
| StDecl Declaration
| StExpr Expression
| StBlock [Statement]
| StIf Expression Statement (Maybe Statement)
| StSwitch Expression [SwitchGroup]
| StWhile Expression Statement
| StDo Statement Expression
| StFor ForInit (Maybe Expression)
[Expression] Statement
| StTry [Statement] [CatchClause] (Maybe [Statement])
| StSynch Expression [Statement]
| StContinue (Maybe Identifier)
| StBreak (Maybe Identifier)
| StReturn (Maybe Expression)
| StThrow Expression
| StNull
-- catch clause:
type CatchClause = (FormalParam, [Statement])
-- import declaration: qualified name, ".*" flag
type ImportDecl = (Name, Bool)
-- compilation unit: package name, import declarations, type declarations
type CompUnit = (Maybe Name, [ImportDecl], [Declaration])
instance Show Expression where
show expr = cprintExpr 0 0 expr
instance Show Statement where
show stmt = cprintStmt 0 stmt
cprintExpr indent prec expr = undefined
cprintStmt indent stmt = undefined
-------------------- Packrat Parsing --------------------
data JavaDerivs = JavaDerivs {
cdPos :: Pos,
cdText :: String,
cdChar :: Result JavaDerivs Char,
cdTok :: TokDerivs,
cdExpr :: ExprDerivs,
cdStmt :: StmtDerivs,
cdDecl :: DeclDerivs
}
data TokDerivs = TokDerivs {
tdWhitespace :: Result JavaDerivs (),
tdWord :: Result JavaDerivs Token,
tdSym :: Result JavaDerivs Token,
tdHexLit :: Result JavaDerivs Token,
tdOctLit :: Result JavaDerivs Token,
tdDecLit :: Result JavaDerivs Token,
tdFloatSize :: Result JavaDerivs Bool,
tdFloatExp :: Result JavaDerivs Integer,
tdFloatLit :: Result JavaDerivs Token,
tdCharLit :: Result JavaDerivs Token,
tdStringLit :: Result JavaDerivs Token,
tdToken :: Result JavaDerivs Token
}
data ExprDerivs = ExprDerivs {
edParExpr :: Result JavaDerivs Expression,
edPrimExpr :: Result JavaDerivs Expression,
edPostfixExpr :: Result JavaDerivs Expression,
edPrefixExpr :: Result JavaDerivs Expression,
edMultExpr :: Result JavaDerivs Expression,
edAddExpr :: Result JavaDerivs Expression,
edShiftExpr :: Result JavaDerivs Expression,
edRelExpr :: Result JavaDerivs Expression,
edEqExpr :: Result JavaDerivs Expression,
edAndExpr :: Result JavaDerivs Expression,
edXorExpr :: Result JavaDerivs Expression,
edOrExpr :: Result JavaDerivs Expression,
edCondAndExpr :: Result JavaDerivs Expression,
edCondOrExpr :: Result JavaDerivs Expression,
edCondExpr :: Result JavaDerivs Expression,
edAssignExpr :: Result JavaDerivs Expression,
edExpression :: Result JavaDerivs Expression
}
data StmtDerivs = StmtDerivs {
sdCatchClause :: Result JavaDerivs CatchClause,
sdSwitchGroup :: Result JavaDerivs SwitchGroup,
sdForInit :: Result JavaDerivs ForInit,
sdStatement :: Result JavaDerivs Statement,
sdBlockStmt :: Result JavaDerivs Statement,
sdBlock :: Result JavaDerivs [Statement]
}
data DeclDerivs = DeclDerivs {
ddModifier :: Result JavaDerivs Modifier,
ddDeclType :: Result JavaDerivs DeclType,
ddFormalParam :: Result JavaDerivs FormalParam,
ddFormalParams :: Result JavaDerivs [FormalParam],
ddDeclarator :: Result JavaDerivs Declarator,
ddDeclaration :: Result JavaDerivs Declaration,
ddInitializer :: Result JavaDerivs Initializer,
ddArrayInit :: Result JavaDerivs [Initializer],
ddImportDecl :: Result JavaDerivs ImportDecl,
ddCompUnit :: Result JavaDerivs CompUnit
}
instance Derivs JavaDerivs where
dvPos d = cdPos d
dvChar d = cdChar d
whitespace = tdWhitespace . cdTok
word = tdWord . cdTok
sym = tdSym . cdTok
hexLit = tdHexLit . cdTok
octLit = tdOctLit . cdTok
decLit = tdDecLit . cdTok
floatSize = tdFloatSize . cdTok
floatExp = tdFloatExp . cdTok
floatLit = tdFloatLit . cdTok
charLit = tdCharLit . cdTok
stringLit = tdStringLit . cdTok
token = tdToken . cdTok
--parExpr = edParExpr . cdExpr
--primExpr = edPrimExpr . cdExpr
--postfixExpr = edPostfixExpr . cdExpr
--prefixExpr = edPrefixExpr . cdExpr
--multExpr = edMultExpr . cdExpr
--addExpr = edAddExpr . cdExpr
--shiftExpr = edShiftExpr . cdExpr
--relExpr = edRelExpr . cdExpr
--eqExpr = edEqExpr . cdExpr
--andExpr = edAndExpr . cdExpr
--xorExpr = edXorExpr . cdExpr
--orExpr = edOrExpr . cdExpr
--condAndExpr = edCondAndExpr . cdExpr
--condOrExpr = edCondOrExpr . cdExpr
--condExpr = edCondExpr . cdExpr
--assignExpr = edAssignExpr . cdExpr
--expression = edExpression . cdExpr
--catchClause = sdCatchClause . cdStmt
--switchGroup = sdSwitchGroup . cdStmt
--forInit = sdForInit . cdStmt
--statement = sdStatement . cdStmt
--blockStmt = sdBlockStmt . cdStmt
--block = sdBlock . cdStmt
--modifier = ddModifier . cdDecl
--declType = ddDeclType . cdDecl
--formalParam = ddFormalParam . cdDecl
--formalParams = ddFormalParams. cdDecl
--declarator = ddDeclarator . cdDecl
--declaration = ddDeclaration . cdDecl
--initializer = ddInitializer . cdDecl
--arrayInit = ddArrayInit . cdDecl
--importDecl = ddImportDecl . cdDecl
--compUnit = ddCompUnit . cdDecl
parExpr = Parser (edParExpr . cdExpr)
primExpr = Parser (edPrimExpr . cdExpr)
postfixExpr = Parser (edPostfixExpr . cdExpr)
prefixExpr = Parser (edPrefixExpr . cdExpr)
multExpr = Parser (edMultExpr . cdExpr)
addExpr = Parser (edAddExpr . cdExpr)
shiftExpr = Parser (edShiftExpr . cdExpr)
relExpr = Parser (edRelExpr . cdExpr)
eqExpr = Parser (edEqExpr . cdExpr)
andExpr = Parser (edAndExpr . cdExpr)
xorExpr = Parser (edXorExpr . cdExpr)
orExpr = Parser (edOrExpr . cdExpr)
condAndExpr = Parser (edCondAndExpr . cdExpr)
condOrExpr = Parser (edCondOrExpr . cdExpr)
condExpr = Parser (edCondExpr . cdExpr)
assignExpr = Parser (edAssignExpr . cdExpr)
expression = Parser (edExpression . cdExpr)
catchClause = Parser (sdCatchClause . cdStmt)
switchGroup = Parser (sdSwitchGroup . cdStmt)
forInit = Parser (sdForInit . cdStmt)
statement = Parser (sdStatement . cdStmt)
blockStmt = Parser (sdBlockStmt . cdStmt)
block = Parser (sdBlock . cdStmt)
modifier = Parser (ddModifier . cdDecl)
declType = Parser (ddDeclType . cdDecl)
formalParam = Parser (ddFormalParam . cdDecl)
formalParams = Parser (ddFormalParams. cdDecl)
declarator = Parser (ddDeclarator . cdDecl)
declaration = Parser (ddDeclaration . cdDecl)
initializer = Parser (ddInitializer . cdDecl)
arrayInit = Parser (ddArrayInit . cdDecl)
importDecl = Parser (ddImportDecl . cdDecl)
compUnit = Parser (ddCompUnit . cdDecl)
-------------------- Lexical Structure --------------------
lineTerminator :: JavaDerivs -> Result JavaDerivs ()
lineTerminator d =
case dvChar d of
Parsed '\r' d' e' ->
case dvChar d' of
Parsed '\n' d'' e'' -> Parsed () d'' e''
Parsed _ d'' e'' -> Parsed () d' e'
NoParse e'' -> NoParse e''
Parsed '\n' d' e' -> Parsed () d' e'
Parsed _ d' e' -> NoParse (nullError d)
NoParse e' -> NoParse e'
-- Whitespace
spaceChar :: JavaDerivs -> Result JavaDerivs ()
spaceChar d =
case dvChar d of
Parsed c d' e' ->
if isSpace c then Parsed () d' e'
else NoParse (nullError d)
NoParse e' -> NoParse e'
traditionalCommentRest :: JavaDerivs -> Result JavaDerivs ()
traditionalCommentRest d =
case dvChar d of
Parsed '*' d' e' ->
case dvChar d' of
Parsed '/' d'' e'' -> Parsed () d'' e''
_ -> traditionalCommentRest d'
Parsed _ d' e' -> traditionalCommentRest d'
NoParse e' -> NoParse e'
traditionalComment :: JavaDerivs -> Result JavaDerivs ()
traditionalComment d =
case dvChar d of
Parsed '/' d' e' ->
case dvChar d' of
Parsed '*' d'' e'' -> traditionalCommentRest d''
_ -> NoParse (nullError d)
_ -> NoParse (nullError d)
endOfLineCommentRest :: JavaDerivs -> Result JavaDerivs ()
endOfLineCommentRest d =
case lineTerminator d of
Parsed _ d' e' -> Parsed () d' e'
_ ->
case dvChar d of
Parsed _ d' e' -> endOfLineCommentRest d'
NoParse e' -> NoParse e'
endOfLineComment :: JavaDerivs -> Result JavaDerivs ()
endOfLineComment d =
case dvChar d of
Parsed '/' d' e' ->
case dvChar d' of
Parsed '/' d'' e'' -> endOfLineCommentRest d''
_ -> NoParse (nullError d)
_ -> NoParse (nullError d)
pWhitespace :: JavaDerivs -> Result JavaDerivs ()
pWhitespace d =
case spaceChar d of
Parsed _ d' e' -> whitespace d'
_ ->
case traditionalComment d of
Parsed _ d' e' -> whitespace d'
_ ->
case endOfLineComment d of
Parsed _ d' e' -> whitespace d'
_ -> Parsed () d (nullError d)
-- Keywords and identifiers
keywords = [
"abstract",
"boolean", "break", "byte",
"case", "catch", "char", "class", "const", "continue",
"default", "do", "double",
"else", "extends",
"final", "finally", "float", "for",
"goto",
"if", "implements", "import", "instanceof", "int", "interface",
"long",
"native", "new",
"package", "private", "protected", "public",
"return",
"short", "static", "strictfp", "super", "switch", "synchronized",
"this", "throw", "throws", "transient", "try",
"void", "volatile",
"while"
]
isIdentStart c = isAlpha c || c == '_'
isIdentCont c = isIdentStart c || isDigit c
pWordRest :: JavaDerivs -> Result JavaDerivs String
pWordRest d =
case dvChar d of
Parsed c d' e' ->
if isIdentCont c
then case pWordRest d' of
Parsed w d'' e'' -> Parsed (c:w) d'' e''
else case whitespace d of
Parsed _ d'' e'' -> Parsed [] d'' e''
_ -> Parsed [] d (nullError d)
pWord :: JavaDerivs -> Result JavaDerivs Token
pWord d =
case dvChar d of
Parsed c d' e' ->
if isIdentStart c
then case pWordRest d' of
Parsed w d'' e'' -> Parsed (interp (c:w)) d'' e''
else NoParse (nullError d)
_ -> NoParse (nullError d)
where interp w =
case w of
"null" -> TokNull
"true" -> TokBool True
"false" -> TokBool False
_ -> if w `elem` keywords
then TokKeyword w
else TokIdent w
-- Recognize an operator or punctuation symbol.
-- There are certainly more readable alternative ways to code this,
-- but the approach here makes very simple and direct use of pattern matching
-- to produce an efficient decision tree.
-- This code simulates what an automated packrat parser generator might create
-- for rules with many alternatives matching simple constant strings.
pSym :: JavaDerivs -> Result JavaDerivs Token
pSym d =
case dvChar d of
Parsed '<' d' e' ->
case dvChar d' of
Parsed '<' d'' e'' ->
case dvChar d'' of
Parsed '=' d''' e''' ->
case whitespace d''' of
Parsed _ d'''' e'''' ->
Parsed (TokSymbol "<<=") d'''' e''''
_ -> case whitespace d'' of
Parsed _ d''' e''' ->
Parsed (TokSymbol "<<") d''' e'''
Parsed '=' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol "<=") d''' e'''
_ -> case whitespace d' of
Parsed _ d'' e'' -> Parsed (TokSymbol "<") d'' e''
Parsed '>' d' e' ->
case dvChar d' of
Parsed '>' d'' e'' ->
case dvChar d'' of
Parsed '>' d''' e''' ->
case dvChar d''' of
Parsed '=' d'''' e'''' ->
case whitespace d'''' of
Parsed _ e''''' d''''' ->
Parsed (TokSymbol ">>>=") e''''' d'''''
_ -> case whitespace d''' of
Parsed _ d'''' e'''' ->
Parsed (TokSymbol ">>>") d'''' e''''
Parsed '=' d''' e''' ->
case whitespace d''' of
Parsed _ d'''' e'''' ->
Parsed (TokSymbol ">>=") d'''' e''''
_ -> case whitespace d'' of
Parsed _ d''' e''' ->
Parsed (TokSymbol ">>") d''' e'''
Parsed '=' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol ">=") d''' e'''
_ -> case whitespace d' of
Parsed _ d'' e'' -> Parsed (TokSymbol ">") d'' e''
Parsed '+' d' e' ->
case dvChar d' of
Parsed '+' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol "++") d''' e'''
Parsed '=' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol "+=") d''' e'''
_ -> case whitespace d' of
Parsed _ d'' e'' -> Parsed (TokSymbol "+") d'' e''
Parsed '-' d' e' ->
case dvChar d' of
Parsed '-' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol "--") d''' e'''
Parsed '=' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol "-=") d''' e'''
_ -> case whitespace d' of
Parsed _ d'' e'' -> Parsed (TokSymbol "-") d'' e''
Parsed '*' d' e' ->
case dvChar d' of
Parsed '=' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol "*=") d''' e'''
_ -> case whitespace d' of
Parsed _ d'' e'' -> Parsed (TokSymbol "*") d'' e''
Parsed '/' d' e' ->
case dvChar d' of
Parsed '=' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol "/=") d''' e'''
_ -> case whitespace d' of
Parsed _ d'' e'' -> Parsed (TokSymbol "/") d'' e''
Parsed '%' d' e' ->
case dvChar d' of
Parsed '=' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol "%=") d''' e'''
_ -> case whitespace d' of
Parsed _ d'' e'' -> Parsed (TokSymbol "%") d'' e''
Parsed '&' d' e' ->
case dvChar d' of
Parsed '&' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol "&&") d''' e'''
Parsed '=' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol "&=") d''' e'''
_ -> case whitespace d' of
Parsed _ d'' e'' -> Parsed (TokSymbol "&") d'' e''
Parsed '^' d' e' ->
case dvChar d' of
Parsed '=' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol "^=") d''' e'''
_ -> case whitespace d' of
Parsed _ d'' e'' -> Parsed (TokSymbol "^") d'' e''
Parsed '|' d' e' ->
case dvChar d' of
Parsed '|' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol "||") d''' e'''
Parsed '=' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol "|=") d''' e'''
_ -> case whitespace d' of
Parsed _ d'' e'' -> Parsed (TokSymbol "|") d'' e''
Parsed '=' d' e' ->
case dvChar d' of
Parsed '=' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol "==") d''' e'''
_ -> case whitespace d' of
Parsed _ d'' e'' -> Parsed (TokSymbol "=") d'' e''
Parsed '!' d' e' ->
case dvChar d' of
Parsed '=' d'' e'' ->
case whitespace d'' of
Parsed _ d''' e''' -> Parsed (TokSymbol "!=") d''' e'''
_ -> case whitespace d' of
Parsed _ d'' e'' -> Parsed (TokSymbol "!") d'' e''
Parsed c d' e' ->
if c `elem` ";:,.{}[]()~?"
then case whitespace d' of
Parsed _ d'' e'' -> Parsed (TokSymbol [c]) d'' e''
else NoParse (nullError d)
NoParse e' -> NoParse e'
-- Integer literals
pIntTypeSuffix :: JavaDerivs -> Result JavaDerivs Bool
pIntTypeSuffix d =
case dvChar d of
Parsed 'l' d' e' -> ws True d'
Parsed 'L' d' e' -> ws True d'
_ -> ws False d
where ws islong d' =
case whitespace d' of
Parsed _ d'' e'' -> Parsed islong d'' e''
pHexDigits :: JavaDerivs -> Result JavaDerivs (Integer, Integer)
pHexDigits d =
case dvChar d of
Parsed c d' e' ->
if isHexDigit c
then case pHexDigits d' of
Parsed (v,n) d'' e'' ->
Parsed (v + toInteger (digitToInt c) * 16^n, n+1) d'' e''
_ ->
Parsed (toInteger (digitToInt c), 1) d' e'
else NoParse (nullError d)
_ -> NoParse (nullError d)
pHexLit :: JavaDerivs -> Result JavaDerivs Token
pHexLit d =
case dvChar d of
Parsed '0' d' e' ->
case dvChar d' of
Parsed 'x' d'' e'' -> digs d''
Parsed 'X' d'' e'' -> digs d''
_ -> NoParse (nullError d)
_ -> NoParse (nullError d)
where digs d'' =
case pHexDigits d'' of
Parsed (v,n) d''' e''' ->
case pIntTypeSuffix d''' of
Parsed l d'''' e'''' -> Parsed (TokInt v l) d'''' e''''
_ -> NoParse (nullError d)
pOctDigits :: JavaDerivs -> Result JavaDerivs (Integer, Integer)
pOctDigits d =
case dvChar d of
Parsed c d' e' ->
if isOctDigit c
then case pOctDigits d' of
Parsed (v,n) d'' e'' ->
Parsed (v + toInteger (digitToInt c) * 8^n, n+1) d'' e''
_ ->
Parsed (toInteger (digitToInt c), 1) d' e'
else NoParse (nullError d)
_ -> NoParse (nullError d)
pOctLit :: JavaDerivs -> Result JavaDerivs Token
pOctLit d =
case dvChar d of
Parsed '0' d' e' ->
case pOctDigits d' of
Parsed (v,n) d'' e'' ->
case pIntTypeSuffix d'' of
Parsed l d''' e''' -> Parsed (TokInt v l) d''' e'''
_ -> NoParse (nullError d)
_ -> NoParse (nullError d)
pDecDigits :: JavaDerivs -> Result JavaDerivs (Integer, Integer)
pDecDigits d =
case dvChar d of
Parsed c d' e' ->
if isDigit c
then case pDecDigits d' of
Parsed (v,n) d'' e'' ->
Parsed (v + toInteger (digitToInt c) * 10^n, n+1) d'' e''
_ ->
Parsed (toInteger (digitToInt c), 1) d' e'
else NoParse (nullError d)
_ -> NoParse (nullError d)
pDecLit :: JavaDerivs -> Result JavaDerivs Token
pDecLit d =
case pDecDigits d of
Parsed (v,n) d' e' ->
case pIntTypeSuffix d' of
Parsed l d'' e'' -> Parsed (TokInt v l) d'' e''
_ -> NoParse (nullError d)
-- Floating-point literals
scanDec :: String -> Integer
scanDec digits = foldl f 0 digits
where f v c = v * 10 + toInteger (digitToInt c)
pFloatSize :: JavaDerivs -> Result JavaDerivs Bool
Parser pFloatSize = (do { s <- oneOf "dD"; return True })
</> (do { s <- oneOf "fF"; return False })
pFloatExp :: JavaDerivs -> Result JavaDerivs Integer
Parser pFloatExp =
do oneOf "eE"
neg <- plusminus
digits <- many1 digit
let f v c = v * 10 + toInteger (digitToInt c)
val = foldl f 0 digits
return (if neg then -val else val)
where plusminus = (do { char '+'; return False })
</> (do { char '-'; return True })
</> return False
pFloatLit :: JavaDerivs -> Result JavaDerivs Token
Parser pFloatLit = (do i <- many1 digit
char '.'
f <- many digit
e <- Parser floatExp </> return 0
s <- Parser floatSize </> return False
Parser whitespace
return (mkfloat (scanDec i) f e s))
</> (do char '.'
f <- many1 digit
e <- Parser floatExp </> return 0
s <- Parser floatSize </> return False
Parser whitespace
return (mkfloat 0 f e s))
</> (do i <- many1 digit
e <- Parser floatExp
s <- Parser floatSize </> return False
Parser whitespace
return (mkfloat (scanDec i) [] e s))
</> (do i <- many1 digit
e <- Parser floatExp </> return 0
s <- Parser floatSize
Parser whitespace
return (mkfloat (scanDec i) [] e s))
where mkfloat :: Integer -> String -> Integer -> Bool -> Token
mkfloat i f e s = TokFloat (scanfrac i f * 10.0**(fromInteger e))
s
scanfrac :: Integer -> String -> Double
scanfrac i [] = fromInteger i
scanfrac i (c:cs) =
scanfrac (i * 10 + toInteger (digitToInt c)) cs / 10.0
-- Character and string literals
quotedChar quote d =
case dvChar d of
Parsed '\\' d' e' ->
case dvChar d' of
Parsed c d'' e'' ->
case c of
'n' -> Parsed '\n' d'' e''
'r' -> Parsed '\r' d'' e''
't' -> Parsed '\t' d'' e''
'v' -> Parsed '\v' d'' e''
'f' -> Parsed '\f' d'' e''
'\\' -> Parsed '\\' d'' e''
'\'' -> Parsed '\'' d'' e''
'\"' -> Parsed '\"' d'' e''
-- XXX octal characters, other escapes
_ -> NoParse (msgError (dvPos d) "invalid escape sequence")
_ -> NoParse (nullError d)
Parsed c d' e' ->
if c /= quote
then Parsed c d' e'
else NoParse (nullError d)
_ -> NoParse (nullError d)
pCharLit :: JavaDerivs -> Result JavaDerivs Token
pCharLit d =
case dvChar d of
Parsed '\'' d' e' ->
case quotedChar '\'' d' of
Parsed c d'' e'' ->
case dvChar d'' of
Parsed '\'' d''' e''' ->
case whitespace d''' of
Parsed _ d'''' e'''' -> Parsed (TokChar c) d'''' e''''
_ -> NoParse (nullError d)
_ -> NoParse (nullError d)
_ -> NoParse (nullError d)
stringLitChars :: JavaDerivs -> Result JavaDerivs String
stringLitChars d =
case quotedChar '"' d of
Parsed c d' e' ->
case stringLitChars d' of
Parsed s d'' e'' -> Parsed (c:s) d'' e''
_ -> Parsed [] d (nullError d)
pStringLit :: JavaDerivs -> Result JavaDerivs Token
pStringLit d =
case dvChar d of
Parsed '"' d' e' ->
case stringLitChars d' of
Parsed s d'' e'' ->
case dvChar d'' of
Parsed '"' d''' e''' ->
case whitespace d''' of
Parsed _ d'''' e'''' -> Parsed (TokString s) d'''' e''''
_ -> NoParse (nullError d)
_ -> NoParse (nullError d)
_ -> NoParse (nullError d)
-- Token tie-up
pToken :: JavaDerivs -> Result JavaDerivs Token
pToken d =
case word d of
r @ (Parsed t d' e') -> r
_ ->
case sym d of
r @ (Parsed t d' e') -> r
_ ->
case charLit d of
r @ (Parsed t d' e') -> r
_ ->
case stringLit d of
r @ (Parsed t d' e') -> r
_ ->
case floatLit d of
r @ (Parsed t d' e') -> r
_ ->
case hexLit d of
r @ (Parsed t d' e') -> r
_ ->
case octLit d of
r @ (Parsed t d' e') -> r
_ ->
case decLit d of
r @ (Parsed t d' e') -> r
_ -> NoParse (nullError d)
keyword :: String -> Parser JavaDerivs String
keyword w = Parser parse
where parse d =
case token d of
Parsed (TokKeyword w') d' e' ->
if w' == w then Parsed w d' e' else none d
_ -> none d
none d = NoParse (expError (dvPos d) (show w))
symbol :: String -> Parser JavaDerivs String
symbol s = Parser parse
where parse d =
case token d of
Parsed (TokSymbol s') d' e' ->
if s' == s then Parsed s d' e' else none d
_ -> none d
none d = NoParse (expError (dvPos d) (show s))
identifier :: Parser JavaDerivs Identifier
identifier = Parser parse
where parse d =
case token d of
Parsed (TokIdent s) d' e' -> Parsed s d' e'
_ -> NoParse (expError (dvPos d) "identifier")
-------------------- Expressions --------------------
arguments = do symbol "("; eargs <- sepBy expression (symbol ","); symbol ")"
return eargs
pParExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pParExpr =
do symbol "("; e <- expression; symbol ")"; return e
pPrimExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pPrimExpr =
(do t <- Parser token
case t of
TokIdent s -> return (ExpIdent s)
TokInt i l -> return (ExpLiteral t)
TokFloat f l -> return (ExpLiteral t)
TokChar c -> return (ExpLiteral t)
TokString c -> return (ExpLiteral t)
TokBool b -> return (ExpLiteral t)
TokNull -> return (ExpLiteral t)
_ -> fail ("found " ++ show t))
</> parExpr
</> (do keyword "this"; return ExpThis)
</> (do keyword "super"; return ExpSuper)
</> (do keyword "class"; return ExpClass)
</> (do keyword "new" -- class creator
ty <- declType
args <- arguments
body <- optional classBody
return (ExpNewClass ty args body))
</> (do keyword "new" -- array creator
ty <- declType
dims <- many (do symbol "[" -- XXX many1
e <- optional expression
symbol "]"
return e)
init <- optional arrayInit
return (ExpNewArray ty dims init))
<?> "primary expression"
suffix :: Parser JavaDerivs (Expression -> Expression)
suffix =
(do symbol "["; eidx <- optional expression; symbol "]"
return (\ebase -> ExpArray ebase eidx))
</> (do eargs <- arguments
return (\efunc -> ExpCall efunc eargs))
</> (do symbol "."; eitem <- primExpr
return (\econtext -> ExpSelect econtext eitem))
</> (do op <- (symbol "++" </> symbol "--")
return (\e -> ExpPostfix op e))
pPostfixExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pPostfixExpr =
(do exp <- primExpr
suffixes <- many suffix
return (foldl (\e s -> s e) exp suffixes))
<?> "postfix expression"
pPrefixExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pPrefixExpr =
(do op <- (symbol "++" </> symbol "--" </>
symbol "+" </> symbol "-" </>
symbol "~" </> symbol "!")
exp <- prefixExpr
return (ExpPostfix op exp))
</> (do symbol "("; t <- declType; symbol ")"; e <- prefixExpr
return (ExpCast t e))
</> postfixExpr
<?> "prefix expression"
pMultExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pMultExpr =
chainl1 prefixExpr
(do op <- (symbol "*" </> symbol "/" </> symbol "%")
return (\l r -> ExpBinary op l r))
pAddExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pAddExpr =
chainl1 multExpr
(do op <- (symbol "+" </> symbol "-")
return (\l r -> ExpBinary op l r))
pShiftExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pShiftExpr =
chainl1 addExpr
(do op <- (symbol "<<" </> symbol ">>" </> symbol ">>>")
return (\l r -> ExpBinary op l r))
pRelExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pRelExpr =
do l <- shiftExpr; suffix l
where suffix l =
(do op <- (symbol "<=" </> symbol ">=" </>
symbol "<" </> symbol ">")
r <- shiftExpr
suffix (ExpBinary op l r))
</> (do keyword "instanceof"
t <- declType
suffix (ExpInstanceof l t))
</> return l
pEqExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pEqExpr =
chainl1 relExpr
(do op <- (symbol "==" </> symbol "!=")
return (\l r -> ExpBinary op l r))
pAndExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pAndExpr =
chainl1 eqExpr
(do op <- symbol "&"
return (\l r -> ExpBinary op l r))
pXorExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pXorExpr =
chainl1 andExpr
(do op <- symbol "^"
return (\l r -> ExpBinary op l r))
pOrExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pOrExpr =
chainl1 xorExpr
(do op <- symbol "|"
return (\l r -> ExpBinary op l r))
pCondAndExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pCondAndExpr =
chainl1 orExpr
(do op <- symbol "&&"
return (\l r -> ExpBinary op l r))
pCondOrExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pCondOrExpr =
chainl1 condAndExpr
(do op <- symbol "||"
return (\l r -> ExpBinary op l r))
pCondExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pCondExpr =
(do c <- condOrExpr; symbol "?"
t <- expression; symbol ":"
f <- condExpr
return (ExpCond c t f))
</> condOrExpr
<?> "conditional expression"
pAssignExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pAssignExpr =
(do l <- prefixExpr
op <- (symbol "=" </>
symbol "*=" </> symbol "/=" </> symbol "%=" </>
symbol "+=" </> symbol "-=" </>
symbol "<<=" </> symbol ">>=" </> symbol ">>>=" </>
symbol "&=" </> symbol "^=" </> symbol "|=")
r <- assignExpr
return (ExpBinary op l r))
</> condExpr
<?> "assignment expression"
pExpression :: JavaDerivs -> Result JavaDerivs Expression
pExpression = pAssignExpr
-------------------- Statements --------------------
pCatchClause :: JavaDerivs -> Result JavaDerivs CatchClause
Parser pCatchClause =
(do keyword "catch"
symbol "("
p <- formalParam
symbol ")"
b <- block
return (p,b))
pSwitchGroup :: JavaDerivs -> Result JavaDerivs SwitchGroup
Parser pSwitchGroup =
(do keyword "case"; e <- expression; symbol ":"
s <- many blockStmt
return (SwCase e s))
</> (do keyword "default"; symbol ":"
s <- many blockStmt
return (SwDefault s))
<?> "switch statement group"
pForInit :: JavaDerivs -> Result JavaDerivs ForInit
Parser pForInit =
(do f <- (do keyword "final"; return True) </> (return False)
t <- declType
d <- sepBy1 declarator (symbol ",")
return (FiDecl f t d))
</> (do es <- sepBy expression (symbol ",")
return (FiExpr es))
</> return FiNone
pStatement :: JavaDerivs -> Result JavaDerivs Statement
Parser pStatement =
(do b <- block
return (StBlock b))
</> (do keyword "if"; e <- parExpr
t <- statement; keyword "else"; f <- statement
return (StIf e t (Just f)))
</> (do keyword "if"; e <- parExpr
t <- statement
return (StIf e t Nothing))
</> (do keyword "for"; symbol "("
i <- forInit; symbol ";"
c <- optional expression; symbol ";"
n <- sepBy expression (symbol ","); symbol ")"
b <- statement
return (StFor i c n b))
</> (do keyword "while"; e <- parExpr
b <- statement
return (StWhile e b))
</> (do keyword "do"; b <- statement; keyword "while"
e <- parExpr; symbol ";"
return (StDo b e))
</> (do keyword "try"
b <- block
c <- many catchClause
f <- optional (do keyword "finally"; block)
return (StTry b c f))
</> (do keyword "switch"; e <- parExpr
symbol "{"; b <- many switchGroup; symbol "}"
return (StSwitch e b))
</> (do keyword "synchronized"; e <- parExpr; b <- block
return (StSynch e b))
</> (do keyword "return"; e <- optional expression; symbol ";"
return (StReturn e))
</> (do keyword "throw"; e <- expression; symbol ";"
return (StThrow e))
</> (do keyword "break"; i <- optional identifier; symbol ";"
return (StBreak i))
</> (do keyword "continue"; i <- optional identifier; symbol ";"
return (StContinue i))
</> (do i <- identifier; symbol ":"; s <- statement
return (StLabel i s))
</> (do e <- expression; symbol ";"
return (StExpr e))
</> (do symbol ";"
return (StNull))
<?> "statement"
pBlockStmt :: JavaDerivs -> Result JavaDerivs Statement
Parser pBlockStmt =
(do d <- declaration
return (StDecl d))
</> statement
<?> "block statement"
pBlock :: JavaDerivs -> Result JavaDerivs [Statement]
Parser pBlock =
(do symbol "{"; s <- many blockStmt; symbol "}"; return s)
-------------------- Declarations --------------------
qualName = sepBy1 identifier (symbol ".")
pModifier :: JavaDerivs -> Result JavaDerivs Modifier
Parser pModifier =
keyword "public" </> keyword "protected" </> keyword "private" </>
keyword "static" </> keyword "abstract" </> keyword "final" </>
keyword "native" </> keyword "synchronized" </> keyword "transient" </>
keyword "volatile" </> keyword "strictfp"
pDeclType :: JavaDerivs -> Result JavaDerivs DeclType
Parser pDeclType =
(do s <- keyword "byte" </> keyword "short" </> keyword "char" </>
keyword "int" </> keyword "long" </>
keyword "float" </> keyword "double" </> keyword "boolean"
b <- many (do symbol "["; symbol "]")
return (DtBasic s (length b)))
</> (do i <- qualName
b <- many (do symbol "["; symbol "]")
return (DtIdent i (length b)))
<?> "type"
pFormalParam :: JavaDerivs -> Result JavaDerivs FormalParam
Parser pFormalParam =
do f <- (do keyword "final"; return True) </> (return False)
t <- declType
i <- identifier
a <- many (do symbol "["; symbol "]")
return (f, t, i, length a)
pFormalParams :: JavaDerivs -> Result JavaDerivs [FormalParam]
Parser pFormalParams =
do symbol "("
fs <- sepBy formalParam (symbol ",")
symbol ")"
return fs
pDeclarator :: JavaDerivs -> Result JavaDerivs Declarator
Parser pDeclarator =
(do ident <- identifier
bkts <- many (do symbol "["; symbol "]")
init <- optional (do symbol "="; initializer)
return (ident, length bkts, init))
<?> "declarator"
classBody = do symbol "{"; ds <- many declaration; symbol "}"; return ds
pDeclaration :: JavaDerivs -> Result JavaDerivs Declaration
Parser pDeclaration =
(do m <- many modifier -- variable declaration
t <- declType
d <- sepBy declarator (symbol ",")
symbol ";"
return (DeclSimple m t d))
</> (do m <- many modifier -- method declaration
t <- (do t <- declType; return (Just t)) </>
(do keyword "void"; return Nothing)
i <- identifier
p <- formalParams
a <- many (do symbol "["; symbol "]")
th <- throws
b <- (do b <- block; return (Just b))
</> (do symbol ";"; return Nothing)
return (DeclMethod m t i p (length a) th b))
</> (do m <- many modifier -- constructor declaration
i <- identifier
p <- formalParams
th <- throws
b <- block
return (DeclConstructor m i p th b))
</> (do m <- many modifier -- class declaration
keyword "class"
i <- identifier
ext <- optional (do keyword "extends"; declType)
imp <- (do keyword "implements"; sepBy1 declType (symbol ","))
</> return []
ds <- classBody
return (DeclClass m i ext imp ds))
</> (do m <- many modifier -- interface declaration
keyword "interface"
i <- identifier
ext <- (do keyword "extends"; sepBy1 declType (symbol ","))
</> return []
ds <- classBody
return (DeclInterface m i ext ds))
</> (do st <- (do keyword "static"; return True) </> (return False)
b <- block
return (DeclBlock st b))
<?> "declaration"
where throws =
(do keyword "throws"; sepBy1 qualName (symbol ","))
</> return []
pInitializer :: JavaDerivs -> Result JavaDerivs Initializer
Parser pInitializer =
(do inits <- arrayInit
return (IniList inits))
</> (do e <- expression
return (IniExpr e))
<?> "variable initializer"
pArrayInit :: JavaDerivs -> Result JavaDerivs [Initializer]
Parser pArrayInit =
(do symbol "{"
inits <- sepEndBy initializer (symbol ",")
symbol "}"
return inits)
<?> "array initializer"
pImportDecl :: JavaDerivs -> Result JavaDerivs ImportDecl
Parser pImportDecl =
(do keyword "import"
name <- qualName
all <- (do symbol "."; symbol "*"; return True)
</> return False
symbol ";"
return (name, all))
<?> "import declaration"
pCompUnit :: JavaDerivs -> Result JavaDerivs CompUnit
Parser pCompUnit =
do Parser whitespace
p <- (do keyword "package"; n <- qualName; symbol ";"
return (Just n)) </> return Nothing
i <- many importDecl
t <- many declaration
notFollowedBy anyChar
return (p, i, t)
-------------------- Recursive Tie-Up --------------------
pTokDerivs dvs = TokDerivs whitespace word sym
hexlit octlit declit
floatsize floatexp floatlit
charlit stringlit
token
where whitespace = pWhitespace dvs
word = pWord dvs
sym = pSym dvs
hexlit = pHexLit dvs
octlit = pOctLit dvs
declit = pDecLit dvs
floatsize = pFloatSize dvs
floatexp = pFloatExp dvs
floatlit = pFloatLit dvs
charlit = pCharLit dvs
stringlit = pStringLit dvs
token = pToken dvs
pExprDerivs dvs = ExprDerivs parexpr primexpr postfixexpr prefixexpr
multexpr addexpr shiftexpr relexpr eqexpr
andexpr xorexpr orexpr logandexpr logorexpr
condexpr assignexpr expr
where parexpr = pParExpr dvs
primexpr = pPrimExpr dvs
postfixexpr = pPostfixExpr dvs
prefixexpr = pPrefixExpr dvs
multexpr = pMultExpr dvs
addexpr = pAddExpr dvs
shiftexpr = pShiftExpr dvs
relexpr = pRelExpr dvs
eqexpr = pEqExpr dvs
andexpr = pAndExpr dvs
xorexpr = pXorExpr dvs
orexpr = pOrExpr dvs
logandexpr = pCondAndExpr dvs
logorexpr = pCondOrExpr dvs
condexpr = pCondExpr dvs
assignexpr = pAssignExpr dvs
expr = pExpression dvs
pStmtDerivs dvs = StmtDerivs catch switch forinit statement blockstmt block
where catch = pCatchClause dvs
switch = pSwitchGroup dvs
forinit = pForInit dvs
statement = pStatement dvs
blockstmt = pBlockStmt dvs
block = pBlock dvs
pDeclDerivs dvs = DeclDerivs modifier decltype formal formals
declarator declaration initializer arrayinit
importdecl compunit
where modifier = pModifier dvs
decltype = pDeclType dvs
formal = pFormalParam dvs
formals = pFormalParams dvs
declarator = pDeclarator dvs
declaration = pDeclaration dvs
initializer = pInitializer dvs
arrayinit = pArrayInit dvs
importdecl = pImportDecl dvs
compunit = pCompUnit dvs
javaDerivs pos text = dvs
where dvs = JavaDerivs pos text chr tok expr stmt decl
chr = case text of
[] -> NoParse (eofError dvs)
(c:cs) -> Parsed c (javaDerivs (nextPos pos c) cs)
(nullError dvs)
tok = pTokDerivs dvs
expr = pExprDerivs dvs
stmt = pStmtDerivs dvs
decl = pDeclDerivs dvs
javaParse :: String -> String -> JavaDerivs
javaParse name text = javaDerivs (Pos name 1 1) text
javaParseFile :: FilePath -> IO CompUnit
javaParseFile name =
do text <- readFile name
let text' = javaPrep text
derivs = javaParse name text'
case ddCompUnit (cdDecl derivs) of
Parsed cu _ _ -> return cu
NoParse e -> fail (show e)
JavaMonad.hs
Скачать
module JavaMonad where
import Char
import List
import Pos
import Parse
-------------------- Lexical Preprocessing --------------------
-- Preprocess unicode escapes and newlines
javaPrep :: String -> String
javaPrep [] = []
javaPrep ('\r':'\n':s) = '\n':javaPrep s
javaPrep ('\r':s) = '\n':javaPrep s
javaPrep ('\\':'\\':s) = '\\':'\\':javaPrep s
javaPrep ('\\':'u':s) = case s of
h1:h2:h3:h4:s ->
if isHexDigit h1 && isHexDigit h2 &&
isHexDigit h3 && isHexDigit h4
then chr v4:javaPrep s
else error "invalid Unicode escape sequence"
where v1 = digitToInt h1
v2 = v1*16 + digitToInt h2
v3 = v2*16 + digitToInt h3
v4 = v3*16 + digitToInt h4
_ -> error "incomplete Unicode escape sequence"
javaPrep (c:s) = c:javaPrep s
-------------------- Abstract Syntax Trees --------------------
type Identifier = String
type Name = [Identifier]
data Token = TokKeyword String
| TokIdent Identifier
| TokSymbol String
| TokInt Integer Bool
| TokFloat Double Bool
| TokChar Char
| TokString String
| TokBool Bool
| TokNull
instance Eq Token where
TokKeyword s == TokKeyword s' = s == s'
TokIdent s == TokIdent s' = s == s'
TokSymbol s == TokSymbol s' = s == s'
TokInt i l == TokInt i' l' = i == i' && l == l'
TokFloat f l == TokFloat f' l' = f == f' && l == l'
TokChar c == TokChar c' = c == c'
TokString s == TokString s' = s == s'
_ == _ = False
instance Show Token where
show (TokKeyword s) = "reserved word " ++ show s
show (TokIdent s) = "identifier " ++ show s
show (TokSymbol s) = "symbol " ++ show s
show (TokInt i l) = show i ++ (if l then "l" else "")
show (TokFloat f l) = show f ++ (if l then "l" else "")
show (TokChar c) = show c
show (TokString s) = show s
data Expression = ExpLiteral Token
| ExpIdent Identifier
| ExpPrefix String Expression
| ExpPostfix String Expression
| ExpBinary String Expression Expression
| ExpSelect Expression Expression
| ExpInstanceof Expression DeclType
| ExpNewClass DeclType [Expression] (Maybe [Declaration])
| ExpNewArray DeclType [Maybe Expression] (Maybe [Initializer])
| ExpCall Expression [Expression]
| ExpArray Expression (Maybe Expression)
| ExpCast DeclType Expression
| ExpCond Expression Expression Expression
| ExpThis
| ExpSuper
| ExpClass
type Modifier = String
data DeclType = DtBasic String Int
| DtIdent [Identifier] Int
type Declarator = (Identifier, Int, Maybe Initializer)
data Declaration =
DeclSimple [Modifier] DeclType [Declarator]
| DeclMethod [Modifier] (Maybe DeclType) Identifier
[FormalParam] Int [Name]
(Maybe [Statement])
| DeclConstructor [Modifier] Identifier
[FormalParam] [Name] [Statement]
| DeclClass [Modifier] Identifier (Maybe DeclType) [DeclType]
[Declaration]
| DeclInterface [Modifier] Identifier [DeclType]
[Declaration]
| DeclBlock Bool [Statement]
-- formal parameter: "final" flag, type, parameter name, array dimension
type FormalParam = (Bool, DeclType, Identifier, Int)
data Initializer =
IniExpr Expression
| IniList [Initializer]
data SwitchGroup =
SwCase Expression [Statement]
| SwDefault [Statement]
data ForInit = FiExpr [Expression]
| FiDecl Bool DeclType [Declarator]
| FiNone
data Statement = StLabel Identifier Statement
| StCase Expression Statement
| StDefault Statement
| StDecl Declaration
| StExpr Expression
| StBlock [Statement]
| StIf Expression Statement (Maybe Statement)
| StSwitch Expression [SwitchGroup]
| StWhile Expression Statement
| StDo Statement Expression
| StFor ForInit (Maybe Expression)
[Expression] Statement
| StTry [Statement] [CatchClause] (Maybe [Statement])
| StSynch Expression [Statement]
| StContinue (Maybe Identifier)
| StBreak (Maybe Identifier)
| StReturn (Maybe Expression)
| StThrow Expression
| StNull
-- catch clause:
type CatchClause = (FormalParam, [Statement])
-- import declaration: qualified name, ".*" flag
type ImportDecl = (Name, Bool)
-- compilation unit: package name, import declarations, type declarations
type CompUnit = (Maybe Name, [ImportDecl], [Declaration])
instance Show Expression where
show expr = cprintExpr 0 0 expr
instance Show Statement where
show stmt = cprintStmt 0 stmt
cprintExpr indent prec expr = undefined
cprintStmt indent stmt = undefined
-------------------- Packrat Parsing --------------------
data JavaDerivs = JavaDerivs {
cdPos :: Pos,
cdText :: String,
cdChar :: Result JavaDerivs Char,
cdTok :: TokDerivs,
cdExpr :: ExprDerivs,
cdStmt :: StmtDerivs,
cdDecl :: DeclDerivs
}
data TokDerivs = TokDerivs {
tdWhitespace :: Result JavaDerivs (),
tdWord :: Result JavaDerivs Token,
tdSym :: Result JavaDerivs Token,
tdHexLit :: Result JavaDerivs Token,
tdOctLit :: Result JavaDerivs Token,
tdDecLit :: Result JavaDerivs Token,
tdFloatSize :: Result JavaDerivs Bool,
tdFloatExp :: Result JavaDerivs Integer,
tdFloatLit :: Result JavaDerivs Token,
tdCharLit :: Result JavaDerivs Token,
tdStringLit :: Result JavaDerivs Token,
tdToken :: Result JavaDerivs Token
}
data ExprDerivs = ExprDerivs {
edParExpr :: Result JavaDerivs Expression,
edPrimExpr :: Result JavaDerivs Expression,
edPostfixExpr :: Result JavaDerivs Expression,
edPrefixExpr :: Result JavaDerivs Expression,
edMultExpr :: Result JavaDerivs Expression,
edAddExpr :: Result JavaDerivs Expression,
edShiftExpr :: Result JavaDerivs Expression,
edRelExpr :: Result JavaDerivs Expression,
edEqExpr :: Result JavaDerivs Expression,
edAndExpr :: Result JavaDerivs Expression,
edXorExpr :: Result JavaDerivs Expression,
edOrExpr :: Result JavaDerivs Expression,
edCondAndExpr :: Result JavaDerivs Expression,
edCondOrExpr :: Result JavaDerivs Expression,
edCondExpr :: Result JavaDerivs Expression,
edAssignExpr :: Result JavaDerivs Expression,
edExpression :: Result JavaDerivs Expression
}
data StmtDerivs = StmtDerivs {
sdCatchClause :: Result JavaDerivs CatchClause,
sdSwitchGroup :: Result JavaDerivs SwitchGroup,
sdForInit :: Result JavaDerivs ForInit,
sdStatement :: Result JavaDerivs Statement,
sdBlockStmt :: Result JavaDerivs Statement,
sdBlock :: Result JavaDerivs [Statement]
}
data DeclDerivs = DeclDerivs {
ddModifier :: Result JavaDerivs Modifier,
ddDeclType :: Result JavaDerivs DeclType,
ddFormalParam :: Result JavaDerivs FormalParam,
ddFormalParams :: Result JavaDerivs [FormalParam],
ddDeclarator :: Result JavaDerivs Declarator,
ddDeclaration :: Result JavaDerivs Declaration,
ddInitializer :: Result JavaDerivs Initializer,
ddArrayInit :: Result JavaDerivs [Initializer],
ddImportDecl :: Result JavaDerivs ImportDecl,
ddCompUnit :: Result JavaDerivs CompUnit
}
instance Derivs JavaDerivs where
dvPos d = cdPos d
dvChar d = cdChar d
whitespace = Parser (tdWhitespace . cdTok)
word = Parser (tdWord . cdTok)
sym = Parser (tdSym . cdTok)
hexLit = Parser (tdHexLit . cdTok)
octLit = Parser (tdOctLit . cdTok)
decLit = Parser (tdDecLit . cdTok)
floatSize = Parser (tdFloatSize . cdTok)
floatExp = Parser (tdFloatExp . cdTok)
floatLit = Parser (tdFloatLit . cdTok)
charLit = Parser (tdCharLit . cdTok)
stringLit = Parser (tdStringLit . cdTok)
token = Parser (tdToken . cdTok)
parExpr = Parser (edParExpr . cdExpr)
primExpr = Parser (edPrimExpr . cdExpr)
postfixExpr = Parser (edPostfixExpr . cdExpr)
prefixExpr = Parser (edPrefixExpr . cdExpr)
multExpr = Parser (edMultExpr . cdExpr)
addExpr = Parser (edAddExpr . cdExpr)
shiftExpr = Parser (edShiftExpr . cdExpr)
relExpr = Parser (edRelExpr . cdExpr)
eqExpr = Parser (edEqExpr . cdExpr)
andExpr = Parser (edAndExpr . cdExpr)
xorExpr = Parser (edXorExpr . cdExpr)
orExpr = Parser (edOrExpr . cdExpr)
condAndExpr = Parser (edCondAndExpr . cdExpr)
condOrExpr = Parser (edCondOrExpr . cdExpr)
condExpr = Parser (edCondExpr . cdExpr)
assignExpr = Parser (edAssignExpr . cdExpr)
expression = Parser (edExpression . cdExpr)
catchClause = Parser (sdCatchClause . cdStmt)
switchGroup = Parser (sdSwitchGroup . cdStmt)
forInit = Parser (sdForInit . cdStmt)
statement = Parser (sdStatement . cdStmt)
blockStmt = Parser (sdBlockStmt . cdStmt)
block = Parser (sdBlock . cdStmt)
modifier = Parser (ddModifier . cdDecl)
declType = Parser (ddDeclType . cdDecl)
formalParam = Parser (ddFormalParam . cdDecl)
formalParams = Parser (ddFormalParams. cdDecl)
declarator = Parser (ddDeclarator . cdDecl)
declaration = Parser (ddDeclaration . cdDecl)
initializer = Parser (ddInitializer . cdDecl)
arrayInit = Parser (ddArrayInit . cdDecl)
importDecl = Parser (ddImportDecl . cdDecl)
compUnit = Parser (ddCompUnit . cdDecl)
-------------------- Lexical Structure --------------------
lineTerminator :: Parser JavaDerivs String
lineTerminator =
(do string "\r\n")
</> string "\r"
</> string "\n"
-- Whitespace
spaceChar :: Parser JavaDerivs ()
spaceChar = do satisfy anyChar isSpace
return ()
traditionalComment :: Parser JavaDerivs ()
traditionalComment =
do string "/*"
many (do notFollowedBy (string "*/"); anyChar)
string "*/"
return ()
endOfLineComment :: Parser JavaDerivs ()
endOfLineComment =
do string "//"
many (do notFollowedBy lineTerminator; anyChar)
lineTerminator
return ()
pWhitespace :: JavaDerivs -> Result JavaDerivs ()
Parser pWhitespace = do many (spaceChar
</> traditionalComment
</> endOfLineComment)
return ()
-- Keywords and identifiers
keywords = [
"abstract",
"boolean", "break", "byte",
"case", "catch", "char", "class", "const", "continue",
"default", "do", "double",
"else", "extends",
"final", "finally", "float", "for",
"goto",
"if", "implements", "import", "instanceof", "int", "interface",
"long",
"native", "new",
"package", "private", "protected", "public",
"return",
"short", "static", "strictfp", "super", "switch", "synchronized",
"this", "throw", "throws", "transient", "try",
"void", "volatile",
"while"
]
symbols = [
">>>=",
">>=",
"<<=",
">>>",
"+=",
"-=",
"*=",
"/=",
"%=",
"&=",
"^=",
"|=",
">>",
"<<",
"++",
"--",
"&&",
"||",
"<=",
">=",
"==",
"!=",
";",
"{",
"}",
",",
":",
"=",
"(",
")",
"[",
"]",
".",
"&",
"!",
"~",
"-",
"+",
"*",
"/",
"%",
"<",
">",
"^",
"|",
"?"
]
isIdentStart c = isAlpha c || c == '_'
isIdentCont c = isIdentStart c || isDigit c
pWord :: JavaDerivs -> Result JavaDerivs Token
Parser pWord = do c <- satisfy anyChar isIdentStart
cs <- many (satisfy anyChar isIdentCont)
whitespace
let w = c : cs
case w of -- keyword-literals
"null" -> return (TokNull)
"true" -> return (TokBool True)
"false" -> return (TokBool False)
_ -> if w `elem` keywords
then return (TokKeyword w)
else return (TokIdent w)
pSym :: JavaDerivs -> Result JavaDerivs Token
Parser pSym = do s <- stringFrom symbols
whitespace
return (TokSymbol s)
-- Integer literals
pHexLit :: JavaDerivs -> Result JavaDerivs Token
Parser pHexLit = do char '0'
oneOf "xX"
digits <- many1 hexDigit
l <- (do oneOf "lL"; return True) </> return False
whitespace
let f v c = v * 16 + toInteger (digitToInt c)
val = foldl f 0 digits
return (TokInt val l)
scanOct :: String -> Integer
scanOct digits = foldl f 0 digits
where f v c = v * 8 + toInteger (digitToInt c)
pOctLit :: JavaDerivs -> Result JavaDerivs Token
Parser pOctLit = do char '0'
digits <- many1 digit
l <- (do oneOf "lL"; return True) </> return False
whitespace
let bad c = digitToInt c >= 8
if any bad digits -- so is this cute or what? :)
then fail "non-octal digits in octal constant"
else return (TokInt (scanOct digits) l)
scanDec :: String -> Integer
scanDec digits = foldl f 0 digits
where f v c = v * 10 + toInteger (digitToInt c)
pDecLit :: JavaDerivs -> Result JavaDerivs Token
Parser pDecLit = do digits <- many1 digit
l <- (do oneOf "lL"; return True) </> return False
whitespace
return (TokInt (scanDec digits) l)
-- Floating-point literals
pFloatSize :: JavaDerivs -> Result JavaDerivs Bool
Parser pFloatSize = (do { s <- oneOf "dD"; return True })
</> (do { s <- oneOf "fF"; return False })
pFloatExp :: JavaDerivs -> Result JavaDerivs Integer
Parser pFloatExp =
do oneOf "eE"
neg <- plusminus
digits <- many1 digit
let f v c = v * 10 + toInteger (digitToInt c)
val = foldl f 0 digits
return (if neg then -val else val)
where plusminus = (do { char '+'; return False })
</> (do { char '-'; return True })
</> return False
pFloatLit :: JavaDerivs -> Result JavaDerivs Token
Parser pFloatLit = (do i <- many1 digit
char '.'
f <- many digit
e <- floatExp </> return 0
s <- floatSize </> return False
whitespace
return (mkfloat (scanDec i) f e s))
</> (do char '.'
f <- many1 digit
e <- floatExp </> return 0
s <- floatSize </> return False
whitespace
return (mkfloat 0 f e s))
</> (do i <- many1 digit
e <- floatExp
s <- floatSize </> return False
whitespace
return (mkfloat (scanDec i) [] e s))
</> (do i <- many1 digit
e <- floatExp </> return 0
s <- floatSize
whitespace
return (mkfloat (scanDec i) [] e s))
where mkfloat :: Integer -> String -> Integer -> Bool -> Token
mkfloat i f e s = TokFloat (scanfrac i f * 10.0**(fromInteger e))
s
scanfrac :: Integer -> String -> Double
scanfrac i [] = fromInteger i
scanfrac i (c:cs) =
scanfrac (i * 10 + toInteger (digitToInt c)) cs / 10.0
-- Character and string literals
quotedChar quote =
(do char '\\'
c <- anyChar
case c of
'n' -> return '\n'
'r' -> return '\r'
't' -> return '\t'
'v' -> return '\v'
'f' -> return '\f'
'\\' -> return '\\'
'\'' -> return '\''
'\"' -> return '\"'
-- XXX octal characters, other escapes
_ -> fail "invalid escape sequence")
</> satisfy anyChar (\c -> c /= quote)
pCharLit :: JavaDerivs -> Result JavaDerivs Token
Parser pCharLit = (do char '\''
c <- quotedChar '\''
char '\''
whitespace
return (TokChar c))
pStringLit :: JavaDerivs -> Result JavaDerivs Token
Parser pStringLit = (do char '"'
s <- many (quotedChar '"')
char '"'
whitespace
return (TokString s))
-- Token tie-up
pToken :: JavaDerivs -> Result JavaDerivs Token
Parser pToken = word </> sym </> charLit </> stringLit </> floatLit </>
hexLit </> octLit </> decLit
keyword :: String -> Parser JavaDerivs String
keyword w = (do t <- token
case t of (TokKeyword w') ->
if w' == w then return w
else fail ""
_ -> fail "")
<?!> show w
symbol :: String -> Parser JavaDerivs String
symbol s = (do t <- token
case t of (TokSymbol s') ->
if s' == s then return s
else fail ""
_ -> fail "")
<?!> show s
identifier :: Parser JavaDerivs Identifier
identifier = (do t <- token
case t of
TokIdent s -> return s
_ -> fail "")
<?!> "identifier"
-------------------- Expressions --------------------
arguments = do symbol "("; eargs <- sepBy expression (symbol ","); symbol ")"
return eargs
pParExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pParExpr =
do symbol "("; e <- expression; symbol ")"; return e
pPrimExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pPrimExpr =
(do t <- token
case t of
TokIdent s -> return (ExpIdent s)
TokInt i l -> return (ExpLiteral t)
TokFloat f l -> return (ExpLiteral t)
TokChar c -> return (ExpLiteral t)
TokString c -> return (ExpLiteral t)
TokBool b -> return (ExpLiteral t)
TokNull -> return (ExpLiteral t)
_ -> fail ("found " ++ show t))
</> parExpr
</> (do keyword "this"; return ExpThis)
</> (do keyword "super"; return ExpSuper)
</> (do keyword "class"; return ExpClass)
</> (do keyword "new" -- class creator
ty <- declType
args <- arguments
body <- optional classBody
return (ExpNewClass ty args body))
</> (do keyword "new" -- array creator
ty <- declType
dims <- many (do symbol "[" -- XXX many1
e <- optional expression
symbol "]"
return e)
init <- optional arrayInit
return (ExpNewArray ty dims init))
<?> "primary expression"
suffix :: Parser JavaDerivs (Expression -> Expression)
suffix =
(do symbol "["; eidx <- optional expression; symbol "]"
return (\ebase -> ExpArray ebase eidx))
</> (do eargs <- arguments
return (\efunc -> ExpCall efunc eargs))
</> (do symbol "."; eitem <- primExpr
return (\econtext -> ExpSelect econtext eitem))
</> (do op <- (symbol "++" </> symbol "--")
return (\e -> ExpPostfix op e))
pPostfixExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pPostfixExpr =
(do exp <- primExpr
suffixes <- many suffix
return (foldl (\e s -> s e) exp suffixes))
<?> "postfix expression"
pPrefixExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pPrefixExpr =
(do op <- (symbol "++" </> symbol "--" </>
symbol "+" </> symbol "-" </>
symbol "~" </> symbol "!")
exp <- prefixExpr
return (ExpPostfix op exp))
</> (do symbol "("; t <- declType; symbol ")"; e <- prefixExpr
return (ExpCast t e))
</> postfixExpr
<?> "prefix expression"
pMultExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pMultExpr =
chainl1 prefixExpr
(do op <- (symbol "*" </> symbol "/" </> symbol "%")
return (\l r -> ExpBinary op l r))
pAddExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pAddExpr =
chainl1 multExpr
(do op <- (symbol "+" </> symbol "-")
return (\l r -> ExpBinary op l r))
pShiftExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pShiftExpr =
chainl1 addExpr
(do op <- (symbol "<<" </> symbol ">>" </> symbol ">>>")
return (\l r -> ExpBinary op l r))
pRelExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pRelExpr =
do l <- shiftExpr; suffix l
where suffix l =
(do op <- (symbol "<=" </> symbol ">=" </>
symbol "<" </> symbol ">")
r <- shiftExpr
suffix (ExpBinary op l r))
</> (do keyword "instanceof"
t <- declType
suffix (ExpInstanceof l t))
</> return l
pEqExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pEqExpr =
chainl1 relExpr
(do op <- (symbol "==" </> symbol "!=")
return (\l r -> ExpBinary op l r))
pAndExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pAndExpr =
chainl1 eqExpr
(do op <- symbol "&"
return (\l r -> ExpBinary op l r))
pXorExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pXorExpr =
chainl1 andExpr
(do op <- symbol "^"
return (\l r -> ExpBinary op l r))
pOrExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pOrExpr =
chainl1 xorExpr
(do op <- symbol "|"
return (\l r -> ExpBinary op l r))
pCondAndExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pCondAndExpr =
chainl1 orExpr
(do op <- symbol "&&"
return (\l r -> ExpBinary op l r))
pCondOrExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pCondOrExpr =
chainl1 condAndExpr
(do op <- symbol "||"
return (\l r -> ExpBinary op l r))
pCondExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pCondExpr =
(do c <- condOrExpr; symbol "?"
t <- expression; symbol ":"
f <- condExpr
return (ExpCond c t f))
</> condOrExpr
<?> "conditional expression"
pAssignExpr :: JavaDerivs -> Result JavaDerivs Expression
Parser pAssignExpr =
(do l <- prefixExpr
op <- (symbol "=" </>
symbol "*=" </> symbol "/=" </> symbol "%=" </>
symbol "+=" </> symbol "-=" </>
symbol "<<=" </> symbol ">>=" </> symbol ">>>=" </>
symbol "&=" </> symbol "^=" </> symbol "|=")
r <- assignExpr
return (ExpBinary op l r))
</> condExpr
<?> "assignment expression"
pExpression :: JavaDerivs -> Result JavaDerivs Expression
pExpression = pAssignExpr
-------------------- Statements --------------------
pCatchClause :: JavaDerivs -> Result JavaDerivs CatchClause
Parser pCatchClause =
(do keyword "catch"
symbol "("
p <- formalParam
symbol ")"
b <- block
return (p,b))
pSwitchGroup :: JavaDerivs -> Result JavaDerivs SwitchGroup
Parser pSwitchGroup =
(do keyword "case"; e <- expression; symbol ":"
s <- many blockStmt
return (SwCase e s))
</> (do keyword "default"; symbol ":"
s <- many blockStmt
return (SwDefault s))
<?> "switch statement group"
pForInit :: JavaDerivs -> Result JavaDerivs ForInit
Parser pForInit =
(do f <- (do keyword "final"; return True) </> (return False)
t <- declType
d <- sepBy1 declarator (symbol ",")
return (FiDecl f t d))
</> (do es <- sepBy expression (symbol ",")
return (FiExpr es))
</> return FiNone
pStatement :: JavaDerivs -> Result JavaDerivs Statement
Parser pStatement =
(do b <- block
return (StBlock b))
</> (do keyword "if"; e <- parExpr
t <- statement; keyword "else"; f <- statement
return (StIf e t (Just f)))
</> (do keyword "if"; e <- parExpr
t <- statement
return (StIf e t Nothing))
</> (do keyword "for"; symbol "("
i <- forInit; symbol ";"
c <- optional expression; symbol ";"
n <- sepBy expression (symbol ","); symbol ")"
b <- statement
return (StFor i c n b))
</> (do keyword "while"; e <- parExpr
b <- statement
return (StWhile e b))
</> (do keyword "do"; b <- statement; keyword "while"
e <- parExpr; symbol ";"
return (StDo b e))
</> (do keyword "try"
b <- block
c <- many catchClause
f <- optional (do keyword "finally"; block)
return (StTry b c f))
</> (do keyword "switch"; e <- parExpr
symbol "{"; b <- many switchGroup; symbol "}"
return (StSwitch e b))
</> (do keyword "synchronized"; e <- parExpr; b <- block
return (StSynch e b))
</> (do keyword "return"; e <- optional expression; symbol ";"
return (StReturn e))
</> (do keyword "throw"; e <- expression; symbol ";"
return (StThrow e))
</> (do keyword "break"; i <- optional identifier; symbol ";"
return (StBreak i))
</> (do keyword "continue"; i <- optional identifier; symbol ";"
return (StContinue i))
</> (do i <- identifier; symbol ":"; s <- statement
return (StLabel i s))
</> (do e <- expression; symbol ";"
return (StExpr e))
</> (do symbol ";"
return (StNull))
<?> "statement"
pBlockStmt :: JavaDerivs -> Result JavaDerivs Statement
Parser pBlockStmt =
(do d <- declaration
return (StDecl d))
</> statement
<?> "block statement"
pBlock :: JavaDerivs -> Result JavaDerivs [Statement]
Parser pBlock =
(do symbol "{"; s <- many blockStmt; symbol "}"; return s)
-------------------- Declarations --------------------
qualName = sepBy1 identifier (symbol ".")
pModifier :: JavaDerivs -> Result JavaDerivs Modifier
Parser pModifier =
keyword "public" </> keyword "protected" </> keyword "private" </>
keyword "static" </> keyword "abstract" </> keyword "final" </>
keyword "native" </> keyword "synchronized" </> keyword "transient" </>
keyword "volatile" </> keyword "strictfp"
pDeclType :: JavaDerivs -> Result JavaDerivs DeclType
Parser pDeclType =
(do s <- keyword "byte" </> keyword "short" </> keyword "char" </>
keyword "int" </> keyword "long" </>
keyword "float" </> keyword "double" </> keyword "boolean"
b <- many (do symbol "["; symbol "]")
return (DtBasic s (length b)))
</> (do i <- qualName
b <- many (do symbol "["; symbol "]")
return (DtIdent i (length b)))
<?> "type"
pFormalParam :: JavaDerivs -> Result JavaDerivs FormalParam
Parser pFormalParam =
do f <- (do keyword "final"; return True) </> (return False)
t <- declType
i <- identifier
a <- many (do symbol "["; symbol "]")
return (f, t, i, length a)
pFormalParams :: JavaDerivs -> Result JavaDerivs [FormalParam]
Parser pFormalParams =
do symbol "("
fs <- sepBy formalParam (symbol ",")
symbol ")"
return fs
pDeclarator :: JavaDerivs -> Result JavaDerivs Declarator
Parser pDeclarator =
(do ident <- identifier
bkts <- many (do symbol "["; symbol "]")
init <- optional (do symbol "="; initializer)
return (ident, length bkts, init))
<?> "declarator"
classBody = do symbol "{"; ds <- many declaration; symbol "}"; return ds
pDeclaration :: JavaDerivs -> Result JavaDerivs Declaration
Parser pDeclaration =
(do m <- many modifier -- variable declaration
t <- declType
d <- sepBy declarator (symbol ",")
symbol ";"
return (DeclSimple m t d))
</> (do m <- many modifier -- method declaration
t <- (do t <- declType; return (Just t)) </>
(do keyword "void"; return Nothing)
i <- identifier
p <- formalParams
a <- many (do symbol "["; symbol "]")
th <- throws
b <- (do b <- block; return (Just b))
</> (do symbol ";"; return Nothing)
return (DeclMethod m t i p (length a) th b))
</> (do m <- many modifier -- constructor declaration
i <- identifier
p <- formalParams
th <- throws
b <- block
return (DeclConstructor m i p th b))
</> (do m <- many modifier -- class declaration
keyword "class"
i <- identifier
ext <- optional (do keyword "extends"; declType)
imp <- (do keyword "implements"; sepBy1 declType (symbol ","))
</> return []
ds <- classBody
return (DeclClass m i ext imp ds))
</> (do m <- many modifier -- interface declaration
keyword "interface"
i <- identifier
ext <- (do keyword "extends"; sepBy1 declType (symbol ","))
</> return []
ds <- classBody
return (DeclInterface m i ext ds))
</> (do st <- (do keyword "static"; return True) </> (return False)
b <- block
return (DeclBlock st b))
<?> "declaration"
where throws =
(do keyword "throws"; sepBy1 qualName (symbol ","))
</> return []
pInitializer :: JavaDerivs -> Result JavaDerivs Initializer
Parser pInitializer =
(do inits <- arrayInit
return (IniList inits))
</> (do e <- expression
return (IniExpr e))
<?> "variable initializer"
pArrayInit :: JavaDerivs -> Result JavaDerivs [Initializer]
Parser pArrayInit =
(do symbol "{"
inits <- sepEndBy initializer (symbol ",")
symbol "}"
return inits)
<?> "array initializer"
pImportDecl :: JavaDerivs -> Result JavaDerivs ImportDecl
Parser pImportDecl =
(do keyword "import"
name <- qualName
all <- (do symbol "."; symbol "*"; return True)
</> return False
symbol ";"
return (name, all))
<?> "import declaration"
pCompUnit :: JavaDerivs -> Result JavaDerivs CompUnit
Parser pCompUnit =
do whitespace
p <- (do keyword "package"; n <- qualName; symbol ";"
return (Just n)) </> return Nothing
i <- many importDecl
t <- many declaration
notFollowedBy anyChar
return (p, i, t)
-------------------- Recursive Tie-Up --------------------
pTokDerivs dvs = TokDerivs whitespace word sym
hexlit octlit declit
floatsize floatexp floatlit
charlit stringlit
token
where whitespace = pWhitespace dvs
word = pWord dvs
sym = pSym dvs
hexlit = pHexLit dvs
octlit = pOctLit dvs
declit = pDecLit dvs
floatsize = pFloatSize dvs
floatexp = pFloatExp dvs
floatlit = pFloatLit dvs
charlit = pCharLit dvs
stringlit = pStringLit dvs
token = pToken dvs
pExprDerivs dvs = ExprDerivs parexpr primexpr postfixexpr prefixexpr
multexpr addexpr shiftexpr relexpr eqexpr
andexpr xorexpr orexpr logandexpr logorexpr
condexpr assignexpr expr
where parexpr = pParExpr dvs
primexpr = pPrimExpr dvs
postfixexpr = pPostfixExpr dvs
prefixexpr = pPrefixExpr dvs
multexpr = pMultExpr dvs
addexpr = pAddExpr dvs
shiftexpr = pShiftExpr dvs
relexpr = pRelExpr dvs
eqexpr = pEqExpr dvs
andexpr = pAndExpr dvs
xorexpr = pXorExpr dvs
orexpr = pOrExpr dvs
logandexpr = pCondAndExpr dvs
logorexpr = pCondOrExpr dvs
condexpr = pCondExpr dvs
assignexpr = pAssignExpr dvs
expr = pExpression dvs
pStmtDerivs dvs = StmtDerivs catch switch forinit statement blockstmt block
where catch = pCatchClause dvs
switch = pSwitchGroup dvs
forinit = pForInit dvs
statement = pStatement dvs
blockstmt = pBlockStmt dvs
block = pBlock dvs
pDeclDerivs dvs = DeclDerivs modifier decltype formal formals
declarator declaration initializer arrayinit
importdecl compunit
where modifier = pModifier dvs
decltype = pDeclType dvs
formal = pFormalParam dvs
formals = pFormalParams dvs
declarator = pDeclarator dvs
declaration = pDeclaration dvs
initializer = pInitializer dvs
arrayinit = pArrayInit dvs
importdecl = pImportDecl dvs
compunit = pCompUnit dvs
javaDerivs pos text = dvs
where dvs = JavaDerivs pos text chr tok expr stmt decl
chr = case text of
[] -> NoParse (eofError dvs)
(c:cs) -> Parsed c (javaDerivs (nextPos pos c) cs)
(nullError dvs)
tok = pTokDerivs dvs
expr = pExprDerivs dvs
stmt = pStmtDerivs dvs
decl = pDeclDerivs dvs
javaParse :: String -> String -> JavaDerivs
javaParse name text = javaDerivs (Pos name 1 1) text
javaParseFile :: FilePath -> IO CompUnit
javaParseFile name =
do text <- readFile name
let text' = javaPrep text
derivs = javaParse name text'
case ddCompUnit (cdDecl derivs) of
Parsed cu _ _ -> return cu
NoParse e -> fail (show e)
Java.pappy
Скачать
-- Pappy packrat parser specification for the Java language version 1.1
parser Java:
{
import Char
import System
import Numeric
-- Abstract syntax tree data types
type Identifier = String
type Name = [Identifier]
data Literal = LitInt Integer
| LitLong Integer
| LitFloat Float
| LitDouble Double
| LitChar Char
| LitString String
| LitBool Bool
| LitNull
data Expression = ExpLiteral Literal
| ExpIdent Identifier
| ExpPrefix String Expression
| ExpPostfix String Expression
| ExpBinary String Expression Expression
| ExpSelect Expression Expression
| ExpInstanceof Expression DeclType
| ExpNewClass [Identifier] [Expression] (Maybe [Declaration])
| ExpNewArray DeclType [Expression] Int
| ExpNewArrayInit DeclType Int [Initializer]
| ExpCall Expression [Expression]
| ExpArray Expression (Maybe Expression)
| ExpCast DeclType Expression
| ExpCond Expression Expression Expression
| ExpThis
| ExpSuper
| ExpDotClass Expression
| ExpVoidClass
data Modifier = ModPublic
| ModProtected
| ModPrivate
| ModStatic
| ModAbstract
| ModFinal
| ModNative
| ModSynchronized
| ModTransient
| ModVolatile
| ModStrictfp
data DeclType = DtByte
| DtShort
| DtChar
| DtInt
| DtLong
| DtFloat
| DtDouble
| DtBoolean
| DtName [Identifier]
| DtArray DeclType Int
type Declarator = (Identifier, Int, Maybe Initializer)
data Declaration =
DeclSimple [Modifier] DeclType [Declarator]
| DeclMethod [Modifier] (Maybe DeclType) Identifier
[FormalParam] Int [Name]
(Maybe [Statement])
| DeclConstructor [Modifier] Identifier
[FormalParam] [Name] [Statement]
| DeclClass [Modifier] Identifier (Maybe DeclType) [DeclType]
[Declaration]
| DeclInterface [Modifier] Identifier [DeclType]
[Declaration]
| DeclBlock Bool [Statement]
-- formal parameter: "final" flag, type, parameter name, array dimension
type FormalParam = (Bool, DeclType, Identifier, Int)
data Initializer =
IniExpr Expression
| IniList [Initializer]
data SwitchGroup =
SwCase Expression [Statement]
| SwDefault [Statement]
data ForInit = FiExpr [Expression]
| FiDecl Bool DeclType [Declarator]
data Statement = StLabel Identifier Statement
| StCase Expression Statement
| StDefault Statement
| StDecl Declaration
| StExpr Expression
| StBlock [Statement]
| StIf Expression Statement (Maybe Statement)
| StSwitch Expression [SwitchGroup]
| StWhile Expression Statement
| StDo Statement Expression
| StFor ForInit (Maybe Expression)
[Expression] Statement
| StTry [Statement] [CatchClause] (Maybe [Statement])
| StSynch Expression [Statement]
| StContinue (Maybe Identifier)
| StBreak (Maybe Identifier)
| StReturn (Maybe Expression)
| StThrow Expression
| StNull
-- catch clause:
type CatchClause = (FormalParam, [Statement])
-- import declaration: qualified name, ".*" flag
type ImportDecl = (Name, Bool)
-- compilation unit: package name, import declarations, type declarations
type CompUnit = (Maybe Name, [ImportDecl], [Declaration])
-- List of Java's reserved words, used in Keyword below
keywords = [
"abstract",
"boolean", "break", "byte",
"case", "catch", "char", "class", "const", "continue",
"default", "do", "double",
"else", "extends",
"final", "finally", "float", "for",
"goto",
"if", "implements", "import", "instanceof", "int", "interface",
"long",
"native", "new",
"package", "private", "protected", "public",
"return",
"short", "static", "strictfp", "super", "switch", "synchronized",
"this", "throw", "throws", "transient", "try",
"void", "volatile",
"while"
]
}
top CompilationUnit
-------------------- Lexical Structure --------------------
-- Whitespace and comments
LineTerminator :: Char =
'\n' -- \r's have been canonicalized by javaPrep
InputCharacter :: Char =
!LineTerminator c:Char -> c
Spacing :: {()} =
Space* -> {()}
Space :: {()} =
WhiteSpace -> {()}
/ Comment -> {()}
WhiteSpace :: Char =
' ' / '\t' / '\f' / LineTerminator
Comment :: {()} =
TraditionalComment
/ EndOfLineComment
TraditionalComment :: {()} =
"/*" (!"*/" c:Char -> c)* "*/" -> {()}
EndOfLineComment :: {()} =
"//" (!LineTerminator c:Char -> c)* '\n' -> {()}
-- Keywords and identifiers
Identifier :: Identifier =
!Keyword !BooleanLiteral !NullLiteral s:Word -> s
Keyword :: String =
s:Word &{s `elem` keywords} -> s
Word :: String =
c:JavaLetter cs:JavaLetterOrDigit* Spacing -> {c : cs}
JavaLetter :: Char =
c:Char &{isAlpha c} -> c
/ '_'
/ '$'
JavaLetterOrDigit :: Char =
c:Char &{isAlphaNum c} -> c
/ '_'
/ '$'
-- Symbols (operators and punctuation)
Sym :: String =
s:SymChars Spacing -> s
SymChars :: String =
">>>="
/ ">>=" / "<<=" / ">>>"
/ ">>" / "<<"
/ "+=" / "-=" / "*=" / "/=" / "%=" / "&=" / "^=" / "|="
/ "++" / "--" / "&&" / "||" / "<=" / ">=" / "==" / "!="
/ ";" / ":" / "," / "." / "{" / "}" / "(" / ")"
/ "[" / "]" / "!" / "~" / "+" / "-" / "*" / "/"
/ "%" / "<" / ">" / "=" / "&" / "^" / "|" / "?"
-- Literals
Literal :: Literal =
FloatingPointLiteral
/ IntegerLiteral
/ BooleanLiteral
/ CharacterLiteral
/ StringLiteral
/ NullLiteral
-- Integer literals
IntegerLiteral :: Literal =
v:HexNumeral t:IntegerTypeSuffixOpt Spacing -> {t v}
/ v:OctalNumeral t:IntegerTypeSuffixOpt Spacing -> {t v}
/ v:DecimalNumeral t:IntegerTypeSuffixOpt Spacing -> {t v}
IntegerTypeSuffixOpt :: {Integer -> Literal} =
('l' / 'L') -> {LitLong}
/ -> {LitInt}
HexNumeral :: Integer =
("0x" / "0X") v:HexDigits -> {v}
HexDigits :: Integer =
v:HexDigits d:HexDigit -> {v * 16 + toInteger d}
/ d:HexDigit -> {toInteger d}
HexDigit :: Int =
c:Char &{isHexDigit c} -> {digitToInt c}
OctalNumeral :: Integer =
'0' v:OctalDigits -> {v}
OctalDigits :: Integer =
v:OctalDigits d:OctalDigit -> {v * 8 + toInteger d}
/ d:OctalDigit -> {toInteger d}
OctalDigit :: Int =
c:Char &{isOctDigit c} -> {digitToInt c}
DecimalNumeral :: Integer =
v:Digits -> {v}
Digits :: Integer =
v:Digits d:Digit -> {v * 10 + toInteger d}
/ d:Digit -> {toInteger d}
Digit :: Int =
c:Char &{isDigit c} -> {digitToInt c}
-- Floating point
FloatingPointLiteral :: Literal =
m:Digits '.' f:FractionPartOpt e:ExponentPartOpt t:FloatTypeSuffixOpt
Spacing
-> {t (fromRational ((fromInteger m + f) * 10.0 ^^ e))}
/ '.' f:FractionPart e:ExponentPartOpt t:FloatTypeSuffixOpt Spacing
-> {t (fromRational (f * 10.0 ^^ e))}
/ m:Digits e:ExponentPart t:FloatTypeSuffixOpt Spacing
-> {t (fromInteger m * 10.0 ^^ e)}
/ m:Digits e:ExponentPartOpt t:FloatTypeSuffix Spacing
-> {t (fromInteger m * 10.0 ^^ e)}
FractionPartOpt :: Rational =
f:FractionPart -> {f}
/ -> {0.0}
FractionPart :: Rational =
d:Digit -> { toEnum d / 10.0 }
/ d:Digit f:FractionPart -> { (f + toEnum d) / 10.0 }
ExponentPartOpt :: Integer =
e:ExponentPart -> { e }
/ -> { 0 }
ExponentPart :: Integer =
('e' / 'E') '-' e:Digits -> { -e }
/ ('e' / 'E') '+'? e:Digits -> { e }
FloatTypeSuffixOpt :: {Rational -> Literal} =
t:FloatTypeSuffix -> t
/ -> {\r -> LitDouble (fromRational r)}
FloatTypeSuffix :: {Rational -> Literal} =
('f' / 'F') -> {\r -> LitFloat (fromRational r)}
/ ('d' / 'D') -> {\r -> LitDouble (fromRational r)}
-- Boolean literals
BooleanLiteral :: Literal =
"true":Word -> { LitBool True }
/ "false":Word -> { LitBool False }
-- Character and string literals
CharacterLiteral :: Literal =
"'" c:SingleCharacter "'" Spacing -> { LitChar c }
/ "'" c:EscapeSequence "'" Spacing -> { LitChar c }
SingleCharacter :: Char =
!"'" !"\\" c:InputCharacter -> c
StringLiteral :: Literal =
'"' s:StringCharacter* '"' Spacing -> { LitString s }
StringCharacter :: Char =
!'"' !'\\' c:InputCharacter -> c
/ c:EscapeSequence -> c
EscapeSequence :: Char =
'\\' 'b' -> {'\b'}
/ '\\' 't' -> {'\t'}
/ '\\' 'n' -> {'\n'}
/ '\\' 'f' -> {'\f'}
/ '\\' 'r' -> {'\r'}
/ '\\' '"' -> {'\"'}
/ '\\' "'" -> {'\''}
/ '\\' '\\' -> {'\\'}
/ c:OctalEscape -> c
OctalEscape :: Char =
'\\' x:ZeroToThree y:OctalDigit z:OctalDigit
-> {chr (x*8*8 + y*8 + z)}
/ '\\' y:OctalDigit z:OctalDigit -> {chr (y*8 + z)}
/ '\\' z:OctalDigit -> {chr (z)}
ZeroToThree :: Int =
'0' -> {0}
/ '1' -> {1}
/ '2' -> {2}
/ '3' -> {3}
-- Null literals
NullLiteral :: Literal =
"null":Word -> { LitNull }
-------------------- Types --------------------
TypeSpec :: DeclType =
t:TypeName d:Dims -> {DtArray t d}
/ TypeName
TypeName :: DeclType =
PrimitiveType
/ n:QualifiedName -> {DtName n}
PrimitiveType :: DeclType =
"byte":Word -> {DtByte}
/ "short":Word -> {DtShort}
/ "char":Word -> {DtChar}
/ "int":Word -> {DtInt}
/ "long":Word -> {DtLong}
/ "float":Word -> {DtFloat}
/ "double":Word -> {DtDouble}
/ "boolean":Word -> {DtBoolean}
QualifiedName :: {[Identifier]} =
i:Identifier is:(".":Sym i:Identifier -> i)* -> {i : is}
DimsOpt :: Int =
d:Dims -> d
/ -> {0}
Dims :: Int =
"[":Sym "]":Sym d:Dims -> {d+1}
/ "[":Sym "]":Sym -> {1}
-------------------- Expressions --------------------
Expression :: Expression =
l:CondExpr op:AssignmentOperator r:Expression
-> {ExpBinary op l r}
/ CondExpr
AssignmentOperator :: String =
"=":Sym -> {"="}
/ "+=":Sym -> {"+="}
/ "-=":Sym -> {"-="}
/ "*=":Sym -> {"*="}
/ "/=":Sym -> {"/="}
/ "%=":Sym -> {"%="}
/ "&=":Sym -> {"&="}
/ "|=":Sym -> {"|="}
/ "^=":Sym -> {"^="}
/ "<<=":Sym -> {"<<="}
/ ">>=":Sym -> {">>="}
/ ">>>=":Sym -> {">>>="}
CondExpr :: Expression =
c:CondOrExpr "?":Sym t:Expression ":":Sym f:CondExpr
-> {ExpCond c t f}
/ CondOrExpr
CondOrExpr :: Expression =
l:CondOrExpr "||":Sym r:CondAndExpr -> {ExpBinary "||" l r}
/ CondAndExpr
CondAndExpr :: Expression =
l:CondAndExpr "&&":Sym r:OrExpr -> {ExpBinary "&&" l r}
/ OrExpr
OrExpr :: Expression =
l:OrExpr "|":Sym r:XorExpr -> {ExpBinary "|" l r}
/ XorExpr
XorExpr :: Expression =
l:XorExpr "^":Sym r:AndExpr -> {ExpBinary "^" l r}
/ AndExpr
AndExpr :: Expression =
l:AndExpr "&":Sym r:EqExpr -> {ExpBinary "&" l r}
/ EqExpr
EqExpr :: Expression =
l:EqExpr "==":Sym r:RelExpr -> {ExpBinary "==" l r}
/ l:EqExpr "!=":Sym r:RelExpr -> {ExpBinary "!=" l r}
/ RelExpr
RelExpr :: Expression =
l:RelExpr "<=":Sym r:ShiftExpr -> {ExpBinary "<=" l r}
/ l:RelExpr ">=":Sym r:ShiftExpr -> {ExpBinary ">=" l r}
/ l:RelExpr "<":Sym r:ShiftExpr -> {ExpBinary "<" l r}
/ l:RelExpr ">":Sym r:ShiftExpr -> {ExpBinary ">" l r}
/ l:RelExpr "instanceof":Word t:TypeSpec -> {ExpInstanceof l t}
/ ShiftExpr
ShiftExpr :: Expression =
l:ShiftExpr "<<":Sym r:AddExpr -> {ExpBinary "<<" l r}
/ l:ShiftExpr ">>":Sym r:AddExpr -> {ExpBinary ">>" l r}
/ l:ShiftExpr ">>>":Sym r:AddExpr -> {ExpBinary ">>>" l r}
/ AddExpr
AddExpr :: Expression =
l:AddExpr "+":Sym r:MultExpr -> {ExpBinary "+" l r}
/ l:AddExpr "-":Sym r:MultExpr -> {ExpBinary "-" l r}
/ MultExpr
MultExpr :: Expression =
l:MultExpr "*":Sym r:UnaryExpr -> {ExpBinary "*" l r}
/ l:MultExpr "/":Sym r:UnaryExpr -> {ExpBinary "/" l r}
/ l:MultExpr "%":Sym r:UnaryExpr -> {ExpBinary "%" l r}
/ UnaryExpr
UnaryExpr :: Expression =
"++":Sym e:UnaryExpr -> {ExpPrefix "++" e}
/ "--":Sym e:UnaryExpr -> {ExpPrefix "--" e}
/ "+":Sym e:UnaryExpr -> {ExpPrefix "+" e}
/ "-":Sym e:UnaryExpr -> {ExpPrefix "-" e}
/ UnaryExprNotPlusMinus
UnaryExprNotPlusMinus :: Expression =
"~":Sym e:UnaryExpr -> {ExpPrefix "~" e}
/ "!":Sym e:UnaryExpr -> {ExpPrefix "!" e}
/ "(":Sym t:TypeName d:Dims ")":Sym e:UnaryExpr
-> {ExpCast (DtArray t d) e}
/ "(":Sym t:PrimitiveType ")":Sym e:UnaryExpr
-> {ExpCast t e}
/ "(":Sym t:TypeName ")":Sym e:UnaryExprNotPlusMinus
-> {ExpCast t e}
/ PostfixExpr
PostfixExpr :: Expression =
l:PostfixExpr "[":Sym r:Expression? "]":Sym
-> {ExpArray l r}
/ l:PostfixExpr a:Arguments -> {ExpCall l a}
/ l:PostfixExpr ".":Sym r:PrimExpr -> {ExpSelect l r}
/ l:PostfixExpr ".":Sym "class":Word -> {ExpDotClass l}
/ l:PostfixExpr "++":Sym -> {ExpPostfix "++" l}
/ l:PostfixExpr "--":Sym -> {ExpPostfix "--" l}
/ PrimExpr
PrimExpr :: Expression =
l:Literal -> {ExpLiteral l}
/ i:Identifier -> {ExpIdent i}
/ "(":Sym e:Expression ")":Sym -> e
/ "this":Word -> {ExpThis}
/ "super":Word -> {ExpSuper}
/ "new":Word n:QualifiedName a:Arguments b:ClassBody?
-> {ExpNewClass n a b}
/ "new":Word t:TypeName de:DimExpr+ d:DimsOpt
-> {ExpNewArray t de d}
/ "new":Word t:TypeName d:Dims i:ArrayInitializer
-> {ExpNewArrayInit t d i}
/ "void":Word ".":Sym "class":Word -> {ExpVoidClass}
Arguments :: {[Expression]} =
"(":Sym e:Expression es:(",":Sym e:Expression -> e)* ")":Sym
-> {e : es}
/ "(":Sym ")":Sym -> {[]}
DimExpr :: Expression =
"[":Sym e:Expression "]":Sym -> e
ArrayInitializer :: {[Initializer]} =
"{":Sym is:(i:Initializer ",":Sym -> i)* "}":Sym
-> is
/ "{":Sym i:Initializer is:(",":Sym i:Initializer -> i)* "}":Sym
-> {i : is}
Initializer :: Initializer =
ai:ArrayInitializer -> {IniList ai}
/ e:Expression -> {IniExpr e}
-------------------- Statements --------------------
Block :: {[Statement]} =
"{":Sym ss:BlockStatement* "}":Sym -> ss
BlockStatement :: Statement =
d:Declaration -> {StDecl d}
/ Statement
Statement :: Statement =
b:Block -> {StBlock b}
/ "if":Word e:ParExpr t:Statement f:("else":Word s:Statement -> s)?
-> {StIf e t f}
/ "for":Word "(":Sym
i:ForInitOpt ";":Sym
c:Expression? ";":Sym
n:ExpressionsOpt ")":Sym
b:Statement -> {StFor i c n b}
/ "while":Word e:ParExpr b:Statement -> {StWhile e b}
/ "do":Word b:Statement "while":Word e:ParExpr ";":Sym -> {StDo b e}
/ "try":Word b:Block c:CatchClause* "finally":Word f:Block
-> {StTry b c (Just f)}
/ "try":Word b:Block c:CatchClause+ -> {StTry b c Nothing}
/ "switch":Word e:ParExpr "{":Sym b:SwitchGroup* "}":Sym
-> {StSwitch e b}
/ "synchronized":Word e:ParExpr b:Block -> {StSynch e b}
/ "return":Word e:Expression? ";":Sym -> {StReturn e}
/ "throw":Word e:Expression ";":Sym -> {StThrow e}
/ "break":Word i:Identifier? ";":Sym -> {StBreak i}
/ "continue":Word i:Identifier? ";":Sym -> {StContinue i}
/ l:Identifier ":":Sym s:Statement -> {StLabel l s}
/ e:Expression ";":Sym -> {StExpr e}
/ ";":Sym -> {StNull}
ParExpr :: Expression =
"(":Sym e:Expression ")":Sym -> e
ForInitOpt :: ForInit =
f:FinalOpt t:TypeSpec d:Declarators -> {FiDecl f t d}
/ e:ExpressionsOpt -> {FiExpr e}
FinalOpt :: Bool =
"final":Word -> {True}
/ -> {False}
ExpressionsOpt :: {[Expression]} =
e:Expression es:(",":Sym e:Expression -> e)* -> {e : es}
/ -> {[]}
CatchClause :: CatchClause =
"catch":Word "(":Sym p:FormalParam ")":Sym b:Block
-> {(p, b)}
SwitchGroup :: SwitchGroup =
"case":Word e:Expression ":":Sym ss:BlockStatement*
-> {SwCase e ss}
/ "default":Word ":":Sym ss:BlockStatement*
-> {SwDefault ss}
-------------------- Declarations --------------------
Modifier :: Modifier =
"public":Word -> {ModPublic}
/ "protected":Word -> {ModProtected}
/ "private":Word -> {ModPrivate}
/ "static":Word -> {ModStatic}
/ "abstract":Word -> {ModAbstract}
/ "final":Word -> {ModFinal}
/ "native":Word -> {ModNative}
/ "synchronized":Word -> {ModSynchronized}
/ "transient":Word -> {ModTransient}
/ "volatile":Word -> {ModVolatile}
/ "strictfp":Word -> {ModStrictfp}
FormalParam :: FormalParam =
f:FinalOpt t:TypeSpec i:Identifier d:DimsOpt
-> {(f, t, i, d)}
FormalParams :: {[FormalParam]} =
"(":Sym p:FormalParam ps:(",":Sym p:FormalParam -> p)* ")":Sym
-> {p : ps}
/ "(":Sym ")":Sym -> {[]}
Declarators :: {[Declarator]} =
d:Declarator ds:(",":Sym d:Declarator -> d)*
-> {d : ds}
Declarator :: Declarator =
id:Identifier dim:DimsOpt init:("=":Sym i:Initializer -> i)?
-> {(id, dim, init)}
ClassBody :: {[Declaration]} =
"{":Sym ds:Declaration* "}":Sym -> ds
Declaration :: Declaration =
-- Variable declaration
ms:Modifier* t:TypeSpec ds:Declarators ";":Sym
-> {DeclSimple ms t ds}
-- Method declaration
/ mods:Modifier*
typ:(t:TypeSpec -> {Just t} / "void":Word -> {Nothing})
id:Identifier ps:FormalParams dim:DimsOpt th:ThrowsOpt
body:(b:Block -> {Just b} / ";":Sym -> {Nothing})
-> {DeclMethod mods typ id ps dim th body}
-- Constructor declaration
/ mods:Modifier* id:Identifier ps:FormalParams th:ThrowsOpt body:Block
-> {DeclConstructor mods id ps th body}
-- Class declaration
/ mods:Modifier* "class":Word id:Identifier
ext:("extends":Word t:TypeSpec -> t)?
imp:("implements":Word t:TypeSpec ts:(",":Sym t:TypeSpec -> t)*
-> {t:ts} / -> {[]})
body:ClassBody -> {DeclClass mods id ext imp body}
-- Interface declaration
/ mods:Modifier* "interface":Word id:Identifier
ext:("extends":Word t:TypeSpec ts:(",":Sym t:TypeSpec -> t)* -> {t:ts}
/ -> {[]})
body:ClassBody -> {DeclInterface mods id ext body}
-- Initialization block
/ st:("static":Word -> {True} / -> {False})
b:Block
-> {DeclBlock st b}
ThrowsOpt :: {[Name]} =
"throws":Word n:QualifiedName ns:(",":Sym n:QualifiedName -> n)*
-> {n : ns}
/ -> {[]}
-------------------- Top Level --------------------
CompilationUnit :: CompUnit =
Spacing -- eat whitespace at beginning
pkg:("package":Word n:QualifiedName ";":Sym -> {Just n}
/ -> {Nothing})
imps:ImportDecl*
decls:Declaration*
!Char -- make sure we consume all input text
-> {(pkg, imps, decls)}
ImportDecl :: ImportDecl =
"import":Word name:QualifiedName
allflag:(".":Sym "*":Sym -> {True} / -> {False})
";":Sym
-> {(name, allflag)}
{
-- Preprocess unicode escapes and newlines before parsing
javaPrep :: String -> String
javaPrep [] = []
javaPrep ('\r':'\n':s) = '\n':javaPrep s
javaPrep ('\r':s) = '\n':javaPrep s
javaPrep ('\\':'\\':s) = '\\':'\\':javaPrep s
javaPrep ('\\':'u':s) = case s of
h1:h2:h3:h4:s ->
if isHexDigit h1 && isHexDigit h2 &&
isHexDigit h3 && isHexDigit h4
then chr v4:javaPrep s
else error "invalid Unicode escape sequence"
where v1 = digitToInt h1
v2 = v1*16 + digitToInt h2
v3 = v2*16 + digitToInt h3
v4 = v3*16 + digitToInt h4
_ -> error "incomplete Unicode escape sequence"
javaPrep (c:s) = c:javaPrep s
javaParseFile :: FilePath -> IO CompUnit
javaParseFile name =
do text <- readFile name
let text' = javaPrep text
derivs = javaParse name text'
case javaCompilationUnit derivs of
Parsed cu _ _ -> return cu
NoParse e -> fail (show e)
}
Parse.hs
Скачать
module Parse where
import Char
import List
import Pos
---------- Data types used for parsing
data ErrorDescriptor =
Expected String
| Message String
data ParseError = ParseError {
errorPos :: Pos,
errorDescrs :: [ErrorDescriptor]
}
data Result d v =
Parsed v d ParseError
| NoParse ParseError
newtype Parser d v = Parser (d -> Result d v)
class Derivs d where
dvPos :: d -> Pos
dvChar :: d -> Result d Char
---------- Basic parsing combinators
infixl 2 </> -- ordered choice
infixl 1 <?> -- error labeling
infixl 1 <?!> -- unconditional error labeling
-- Standard monadic combinators
instance Derivs d => Monad (Parser d) where
-- Sequencing combinator
(Parser p1) >>= f = Parser parse
where parse dvs = first (p1 dvs)
first (Parsed val rem err) =
let Parser p2 = f val
in second err (p2 rem)
first (NoParse err) = NoParse err
second err1 (Parsed val rem err) =
Parsed val rem (joinErrors err1 err)
second err1 (NoParse err) =
NoParse (joinErrors err1 err)
-- Result-producing combinator
return x = Parser (\dvs -> Parsed x dvs (nullError dvs))
-- Failure combinator
fail [] = Parser (\dvs -> NoParse (nullError dvs))
fail msg = Parser (\dvs -> NoParse (msgError (dvPos dvs) msg))
-- Ordered choice
(</>) :: Derivs d => Parser d v -> Parser d v -> Parser d v
(Parser p1) </> (Parser p2) = Parser parse
where parse dvs = first dvs (p1 dvs)
first dvs (result @ (Parsed val rem err)) = result
first dvs (NoParse err) = second err (p2 dvs)
second err1 (Parsed val rem err) =
Parsed val rem (joinErrors err1 err)
second err1 (NoParse err) =
NoParse (joinErrors err1 err)
-- Semantic predicate: 'satisfy <parser> <pred>' acts like <parser>
-- but only succeeds if the result it generates satisfies <pred>.
satisfy :: Derivs d => Parser d v -> (v -> Bool) -> Parser d v
satisfy (Parser p) test = Parser parse
where parse dvs = check dvs (p dvs)
check dvs (result @ (Parsed val rem err)) =
if test val then result
else NoParse (nullError dvs)
check dvs none = none
-- Syntactic predicate: 'followedBy <parser>' acts like <parser>
-- but does not consume any input.
followedBy :: Derivs d => Parser d v -> Parser d v
followedBy (Parser p) = Parser parse
where parse dvs = case (p dvs) of
Parsed val rem err -> Parsed val dvs (nullError dvs)
err -> err
-- Negative syntactic predicate: 'followedBy <parser>' invokes <parser>,
-- then succeeds without consuming any input if <parser> fails,
-- and fails if <parser> succeeds.
notFollowedBy :: Derivs d => Parser d v -> Parser d ()
notFollowedBy (Parser p) = Parser parse
where parse dvs = case (p dvs) of
Parsed val rem err -> NoParse (nullError dvs)
NoParse err -> Parsed () dvs (nullError dvs)
-- Optional combinator: 'optional <parser>' invokes <parser>,
-- then produces the result 'Just <v>' if <parser> produced <v>,
-- or else produces the success result 'Nothing'
-- without consuming any input if <parser> failed.
optional :: Derivs d => Parser d v -> Parser d (Maybe v)
optional p = (do v <- p; return (Just v)) </> return Nothing
---------- Iterative combinators
-- Note: use of these combinators can break
-- a packrat parser's linear-time guarantee.
-- Zero or more repetition combinator:
-- 'many <parser>' invokes <parser> repeatedly until it fails,
-- collecting all success result values into a list.
-- Always succeeds, producing an empty list in the degenerate case.
many :: Derivs d => Parser d v -> Parser d [v]
many p = (do { v <- p; vs <- many p; return (v : vs) } )
</> return []
-- One or more repetition combinator:
-- 'many1 <parser>' invokes <parser> repeatedly until it fails,
-- collecting all success result values into a list.
-- Fails if <parser> does not succeed even once.
many1 :: Derivs d => Parser d v -> Parser d [v]
many1 p = do { v <- p; vs <- many p; return (v : vs) }
-- One or more repetitions with a separator:
-- 'sepBy1 <parser> <separator>' scans one or more iterations of <parser>,
-- with a match of <separator> between each instance.
-- Only the results of <parser> are collected into the final result list.
sepBy1 :: Derivs d => Parser d v -> Parser d vsep -> Parser d [v]
sepBy1 p psep = do v <- p
vs <- many (do { psep; p })
return (v : vs)
-- Zero or more repetitions with a separator:
-- like sepBy1, but succeeds with an empty list if nothing can be parsed.
sepBy :: Derivs d => Parser d v -> Parser d vsep -> Parser d [v]
sepBy p psep = sepBy1 p psep </> return []
-- Zero or more repetitions with a terminator
endBy :: Derivs d => Parser d v -> Parser d vend -> Parser d [v]
endBy p pend = many (do { v <- p; pend; return v })
-- One or more repetitions with a terminator
endBy1 :: Derivs d => Parser d v -> Parser d vend -> Parser d [v]
endBy1 p pend = many1 (do { v <- p; pend; return v })
-- One or more repetitions with a separator or terminator:
-- 'sepEndBy1 <parser> <septerm>' scans for a sequence of <parser> matches
-- in which instances are separated by <septerm>,
-- and if a <septerm> is found following the last <parser> match
-- then it is consumed as well.
sepEndBy1 :: Derivs d => Parser d v -> Parser d vsep -> Parser d [v]
sepEndBy1 p psep = do v <- sepBy1 p psep; optional psep; return v
-- Zero or more repetitions with a separator or terminator.
sepEndBy :: Derivs d => Parser d v -> Parser d vsep -> Parser d [v]
sepEndBy p psep = do v <- sepBy p psep; optional psep; return v
-- One or more repetitions separated by left-associative operators.
-- 'chainl1 <term> <oper>' matches instances of <term> separated by <oper>,
-- but uses the result of <oper> as a left-associative binary combinator:
-- e.g., 't1 op t2 op t3' is interpreted as '(t1 op t2) op t3'
chainl1 :: Derivs d => Parser d v -> Parser d (v->v->v) -> Parser d v
chainl1 p psep =
let psuffix z = (do f <- psep
v <- p
psuffix (f z v))
</> return z
in do v <- p
psuffix v
-- Zero or more repetitions separated by left-associative operators.
chainl :: Derivs d => Parser d v -> Parser d (v->v->v) -> v -> Parser d v
chainl p psep z = chainl1 p psep </> return z
-- One or more repetitions separated by left-associative operators:
-- e.g., 't1 op t2 op t3' is interpreted as 't1 op (t2 op t3)'
chainr1 :: Derivs d => Parser d v -> Parser d (v->v->v) -> Parser d v
chainr1 p psep = (do v <- p
f <- psep
w <- chainr1 p psep
return (f v w))
</> p
-- Zero or more repetitions separated by left-associative operators.
chainr :: Derivs d => Parser d v -> Parser d (v->v->v) -> v -> Parser d v
chainr p psep z = chainr1 p psep </> return z
-- N-ary ordered choice:
-- given a list of parsers producing results of the same type,
-- try them all in order and use the first successful result.
choice :: Derivs d => [Parser d v] -> Parser d v
choice [p] = p
choice (p:ps) = p </> choice ps
---------- Error handling
instance Eq ErrorDescriptor where
Expected e1 == Expected e2 = e1 == e2
Message m1 == Message m2 = m1 == m2
_ == _ = False
failAt :: Derivs d => Pos -> String -> Parser d v
failAt pos msg = Parser (\dvs -> NoParse (msgError pos msg))
-- Annotate a parser with a description of the construct to be parsed.
-- The resulting parser yields an "expected" error message
-- if the construct cannot be parsed
-- and if no error information is already available
-- indicating a position farther right in the source code
-- (which would normally be more localized/detailed information).
(<?>) :: Derivs d => Parser d v -> String -> Parser d v
(Parser p) <?> desc = Parser (\dvs -> munge dvs (p dvs))
where munge dvs (Parsed v rem err) =
Parsed v rem (fix dvs err)
munge dvs (NoParse err) =
NoParse (fix dvs err)
fix dvs (err @ (ParseError p ms)) =
if p > dvPos dvs then err
else expError (dvPos dvs) desc
-- Stronger version of the <?> error annotation operator above,
-- which unconditionally overrides any existing error information.
(<?!>) :: Derivs d => Parser d v -> String -> Parser d v
(Parser p) <?!> desc = Parser (\dvs -> munge dvs (p dvs))
where munge dvs (Parsed v rem err) =
Parsed v rem (fix dvs err)
munge dvs (NoParse err) =
NoParse (fix dvs err)
fix dvs (err @ (ParseError p ms)) =
expError (dvPos dvs) desc
-- Join two ParseErrors, giving preference to the one farthest right,
-- or merging their descriptor sets if they are at the same position.
joinErrors :: ParseError -> ParseError -> ParseError
joinErrors (e @ (ParseError p m)) (e' @ (ParseError p' m')) =
if p' > p || null m then e'
else if p > p' || null m' then e
else ParseError p (m `union` m')
nullError dvs = ParseError (dvPos dvs) []
expError pos desc = ParseError pos [Expected desc]
msgError pos msg = ParseError pos [Message msg]
eofError dvs = msgError (dvPos dvs) "end of input"
expected :: Derivs d => String -> Parser d v
expected desc = Parser (\dvs -> NoParse (expError (dvPos dvs) desc))
unexpected :: Derivs d => String -> Parser d v
unexpected str = fail ("unexpected " ++ str)
-- Comparison operators for ParseError just compare relative positions.
instance Eq ParseError where
ParseError p1 m1 == ParseError p2 m2 = p1 == p2
ParseError p1 m1 /= ParseError p2 m2 = p1 /= p2
instance Ord ParseError where
ParseError p1 m1 < ParseError p2 m2 = p1 < p2
ParseError p1 m1 > ParseError p2 m2 = p1 > p2
ParseError p1 m1 <= ParseError p2 m2 = p1 <= p2
ParseError p1 m1 >= ParseError p2 m2 = p1 >= p2
-- Special behavior: "max" joins two errors
max p1 p2 = joinErrors p1 p2
min p1 p2 = undefined
-- Show function for error messages
instance Show ParseError where
show (ParseError pos []) =
show pos ++ ": parse error"
show (ParseError pos msgs) = expectmsg expects ++ messages msgs
where
expects = getExpects msgs
getExpects [] = []
getExpects (Expected exp : rest) = exp : getExpects rest
getExpects (Message msg : rest) = getExpects rest
expectmsg [] = ""
expectmsg [exp] = show pos ++ ": expecting " ++ exp ++ "\n"
expectmsg [e1, e2] = show pos ++ ": expecting either "
++ e1 ++ " or " ++ e2 ++ "\n"
expectmsg (first : rest) = show pos ++ ": expecting one of: "
++ first ++ expectlist rest
++ "\n"
expectlist [last] = ", or " ++ last
expectlist (mid : rest) = ", " ++ mid ++ expectlist rest
messages [] = []
messages (Expected exp : rest) = messages rest
messages (Message msg : rest) =
show pos ++ ": " ++ msg ++ "\n" ++ messages rest
---------- Character-oriented parsers
-- 'anyChar' matches any single character.
anyChar :: Derivs d => Parser d Char
anyChar = Parser dvChar
-- 'char <c>' matches the specific character <c>.
char :: Derivs d => Char -> Parser d Char
char ch = satisfy anyChar (\c -> c == ch) <?> show ch
-- 'oneOf <s>' matches any character in string <s>.
oneOf :: Derivs d => [Char] -> Parser d Char
oneOf chs = satisfy anyChar (\c -> c `elem` chs)
<?> ("one of the characters " ++ show chs)
-- 'noneOf <s>' matches any character not in string <s>.
noneOf :: Derivs d => [Char] -> Parser d Char
noneOf chs = satisfy anyChar (\c -> not (c `elem` chs))
<?> ("any character not in " ++ show chs)
-- 'string <s>' matches all the characters in <s> in sequence.
string :: Derivs d => String -> Parser d String
string str = p str <?> show str
where p [] = return str
p (ch:chs) = do { char ch; p chs }
-- 'stringFrom <ss>' matches any string in the list of strings <ss>.
-- If any strings in <ss> are prefixes of other strings in <ss>,
-- then the prefixes must appear later in the list
-- in order for the longer strings to be recognized at all.
stringFrom :: Derivs d => [String] -> Parser d String
stringFrom [str] = string str
stringFrom (str : strs) = string str </> stringFrom strs
-- Match an uppercase letter.
upper :: Derivs d => Parser d Char
upper = satisfy anyChar isUpper <?> "uppercase letter"
-- Match a lowercase letter.
lower :: Derivs d => Parser d Char
lower = satisfy anyChar isLower <?> "lowercase letter"
-- Match any letter.
letter :: Derivs d => Parser d Char
letter = satisfy anyChar isAlpha <?> "letter"
-- Match any letter or digit.
alphaNum :: Derivs d => Parser d Char
alphaNum = satisfy anyChar isAlphaNum <?> "letter or digit"
-- Match any digit.
digit :: Derivs d => Parser d Char
digit = satisfy anyChar isDigit <?> "digit"
-- Match any hexadecimal digit.
hexDigit :: Derivs d => Parser d Char
hexDigit = satisfy anyChar isHexDigit <?> "hexadecimal digit (0-9, a-f)"
-- Match any octal digit.
octDigit :: Derivs d => Parser d Char
octDigit = satisfy anyChar isOctDigit <?> "octal digit (0-7)"
-- Match a newline.
newline :: Derivs d => Parser d Char
newline = char '\n'
-- Match a tab character.
tab :: Derivs d => Parser d Char
tab = char '\t'
-- Match any whitespace character (space, tab, newline, etc.).
space :: Derivs d => Parser d Char
space = satisfy anyChar isSpace <?> "whitespace character"
-- Match a sequence of zero or more whitespace characters.
spaces :: Derivs d => Parser d [Char]
spaces = many space
-- Match the end of file (i.e., "the absence of a character").
eof :: Derivs d => Parser d ()
eof = notFollowedBy anyChar <?> "end of input"
---------- Parser state manipulation combinators
-- Combinator to get the Derivs object for the current position:
-- e.g., 'dvs <- getDerivs' as part of a 'do' sequence.
getDerivs :: Derivs d => Parser d d
getDerivs = Parser (\dvs -> Parsed dvs dvs (nullError dvs))
-- Combinator to set the Derivs object used for subsequent parsing;
-- typically used to change parsing state elements in the Derivs tuple.
setDerivs :: Derivs d => d -> Parser d ()
setDerivs newdvs = Parser (\dvs -> Parsed () newdvs (nullError dvs))
-- Get the current position in the input text.
getPos :: Derivs d => Parser d Pos
getPos = Parser (\dvs -> Parsed (dvPos dvs) dvs (nullError dvs))
Pos.hs
Скачать
-- Simple data type to keep track of character positions
-- within a text file or other text stream.
module Pos where
-- Basic position indicator data type: filename, line number, column number.
data Pos = Pos {
posFile :: !String,
posLine :: !Int,
posCol :: !Int
}
-- Incrementally compute the next position in a text file
-- if 'c' is the character at the current position.
-- Follows the standard convention of 8-character tab stops.
nextPos (Pos file line col) c =
if c == '\n' then Pos file (line + 1) 1
else if c == '\t' then Pos file line ((div (col + 8 - 1) 8) * 8 + 1)
else Pos file line (col + 1)
-- Two positions are equal if each of their elements are equal.
instance Eq Pos where
Pos f1 l1 c1 == Pos f2 l2 c2 =
f1 == f2 && l1 == l2 && c1 == c2
-- Two positions are ordered by line number, then column number.
instance Ord Pos where
Pos f1 l1 c1 <= Pos f2 l2 c2 =
(l1 < l2) || (l1 == l2 && c1 <= c2)
-- Standard way to display positions - "file:line:col"
instance Show Pos where
show (Pos file line col) = file ++ ":" ++ show line ++ ":" ++ show col
-- Show a position relative to a base position.
-- If the new position is in the same line, just show its column number;
-- otherwise if the new position is in the same file,
-- just show its line and column numbers;
-- otherwise show the complete new position.
showPosRel (Pos file line col) (Pos file' line' col') =
if (file == file')
then if (line == line')
then "column " ++ show col'
else "line " ++ show line' ++ ", column " ++ show col'
else show (Pos file' line' col')
Arith.pappy
Скачать
parser Triv:
top Expression
-- Expressions
Expression :: Integer =
Spacing v:Additive !Char -> { v }
Additive :: Integer =
l:Additive "+":Symbol r:Multitive -> { l + r }
/ l:Additive "-":Symbol r:Multitive -> { l - r }
/ v:Multitive -> { v }
Multitive :: Integer =
l:Multitive "*":Symbol r:Primary -> { l * r }
/ l:Multitive "/":Symbol r:Primary -> { l `div` r }
/ l:Multitive "%":Symbol r:Primary -> { l `mod` r }
/ v:Primary -> { v }
Primary :: Integer =
v:Decimal -> { v }
/ "(":Symbol v:Additive ")":Symbol -> { v }
-- Decimal literals
Decimal :: Integer =
v:Digits Spacing -> { v }
Digits :: Integer =
v:Digits d:Digit -> { v * 10 + toInteger d }
/ d:Digit -> { toInteger d }
Digit :: Int =
c:Char &{isDigit c} -> { digitToInt c }
-- Symbols
Symbol :: String =
s:SymChars Spacing -> { s }
SymChars :: String =
"+"
/ "-"
/ "*"
/ "/"
/ "%"
/ "("
/ ")"
-- Spacing between tokens
Spacing :: {()} =
SpaceChar* -> { () }
SpaceChar :: Char =
c:Char &{isSpace c} -> { c }
{
eval :: String -> Integer
eval str = case trivExpression (trivParse "expression" str) of
Parsed v _ _ -> v
NoParse e -> error (show e)
}
ArithMonad.hs
Скачать
-- Monadic packrat parser for trivial arithmetic language
-- with left-associative operators and integrated lexical analysis.
-- Uses NO "unsafe" combinators containing hidden recursion.
module ArithMonad where
import Pos
import Parse
data ArithDerivs = ArithDerivs {
advAdditive :: Result ArithDerivs Int,
advAdditiveSuffix :: Result ArithDerivs (Int -> Int),
advMultitive :: Result ArithDerivs Int,
advMultitiveSuffix :: Result ArithDerivs (Int -> Int),
advPrimary :: Result ArithDerivs Int,
advDecimal :: Result ArithDerivs Int,
advDigits :: Result ArithDerivs (Int, Int),
advDigit :: Result ArithDerivs Int,
advSymbol :: Result ArithDerivs Char,
advSpacing :: Result ArithDerivs (),
advChar :: Result ArithDerivs Char,
advPos :: Pos
}
instance Derivs ArithDerivs where
dvChar d = advChar d
dvPos d = advPos d
-- Evaluate an expression and return the unpackaged result,
-- ignoring any unparsed remainder.
eval s = case pExpression (parse (Pos "<input>" 1 1) s) of
Parsed v _ _ -> v
NoParse e -> error (show e)
where Parser pExpression =
do Parser advSpacing
v <- Parser advAdditive
notFollowedBy anyChar
return v
-- Construct a (lazy) parse result structure for an input string,
-- in which any result can be computed in linear time
-- with respect to the length of the input.
parse :: Pos -> String -> ArithDerivs
parse pos s = d where
d = ArithDerivs add addsuf mult multsuf prim
dec digs dig sym spc
chr pos
add = pAdditive d
addsuf = pAdditiveSuffix d
mult = pMultitive d
multsuf = pMultitiveSuffix d
prim = pPrimary d
dec = pDecimal d
digs = pDigits d
dig = pDigit d
sym = pSymbol d
spc = pSpacing d
chr = case s of
(c:s') -> Parsed c (parse (nextPos pos c) s')
(nullError d)
[] -> NoParse (eofError d)
-- Parse an additive-precedence expression
pAdditive :: ArithDerivs -> Result ArithDerivs Int
Parser pAdditive =
(do l <- Parser advMultitive
f <- Parser advAdditiveSuffix
return (f l))
pAdditiveSuffix :: ArithDerivs -> Result ArithDerivs (Int -> Int)
Parser pAdditiveSuffix =
(do symbol '+'
r <- Parser advMultitive
f <- Parser advAdditiveSuffix
return (\l -> f (l + r)))
</> (do symbol '-'
r <- Parser advMultitive
f <- Parser advAdditiveSuffix
return (\l -> f (l - r)))
</> (do return (\v -> v))
-- Parse a multiplicative-precedence expression
pMultitive :: ArithDerivs -> Result ArithDerivs Int
Parser pMultitive =
(do l <- Parser advPrimary
f <- Parser advMultitiveSuffix
return (f l))
pMultitiveSuffix :: ArithDerivs -> Result ArithDerivs (Int -> Int)
Parser pMultitiveSuffix =
(do symbol '*'
r <- Parser advMultitive
f <- Parser advMultitiveSuffix
return (\l -> f (l * r)))
</> (do symbol '/'
r <- Parser advMultitive
f <- Parser advMultitiveSuffix
return (\l -> f (l `div` r)))
</> (do symbol '%'
r <- Parser advMultitive
f <- Parser advMultitiveSuffix
return (\l -> f (l `mod` r)))
</> (do return (\v -> v))
-- Parse a primary expression
pPrimary :: ArithDerivs -> Result ArithDerivs Int
Parser pPrimary =
(do symbol '('
v <- Parser advAdditive
symbol ')'
return v)
</> (do Parser advDecimal)
-- Parse a decimal number followed by optional whitespace
pDecimal :: ArithDerivs -> Result ArithDerivs Int
Parser pDecimal =
(do (v,n) <- Parser advDigits
Parser advSpacing
return v)
<?> "decimal number"
-- Parse a string of consecutive decimal digits.
-- The result is (value, number of digits).
pDigits :: ArithDerivs -> Result ArithDerivs (Int, Int)
Parser pDigits =
(do vl <- Parser advDigit
(vr,n) <- Parser advDigits
return (vl*10^n + vr, n+1))
</> (do vl <- Parser advDigit
return (vl, 1))
-- Parse a decimal digit
pDigit :: ArithDerivs -> Result ArithDerivs Int
Parser pDigit =
(do c <- digit
return (digitToInt c))
-- "Parser combinator" to look for a specific symbol.
-- Cannot be memoized directly because it takes a parameter,
-- but that does not matter because it is non-recursive.
symbol c = (do c' <- Parser advSymbol
if c' == c then return c
else fail [])
<?> show c
-- Parse a symbol character followed by optional whitespace
pSymbol :: ArithDerivs -> Result ArithDerivs Char
Parser pSymbol =
do c <- oneOf "+-*/%()"
Parser advSpacing
return c
-- Parse zero or more whitespace characters
pSpacing :: ArithDerivs -> Result ArithDerivs ()
Parser pSpacing =
(do space
Parser advSpacing
return ())
</> (do return ())
ArithLex.hs
Скачать
-- Packrat parser for arithmetic expression language
-- with left-associative operators and integrated lexical analysis.
module ArithLex where
data Result v =
Parsed v Derivs
| NoParse
data Derivs = Derivs {
dvAdditive :: Result Int,
dvAdditiveSuffix :: Result (Int -> Int),
dvMultitive :: Result Int,
dvMultitiveSuffix :: Result (Int -> Int),
dvPrimary :: Result Int,
dvDecimal :: Result Int,
dvDigits :: Result (Int, Int),
dvDigit :: Result Int,
dvSymbol :: Result Char,
dvSpacing :: Result (),
dvChar :: Result Char
}
-- Evaluate an expression and return the unpackaged result,
-- ignoring any unparsed remainder.
eval s = case dvSpacing (parse s) of
Parsed _ d' ->
case dvAdditive d' of
Parsed v d'' ->
case dvChar d'' of
Parsed _ _ -> error "Incomplete parse"
_ -> v
_ -> error "Parse error"
-- Construct a (lazy) parse result structure for an input string,
-- in which any result can be computed in linear time
-- with respect to the length of the input.
parse :: String -> Derivs
parse s = d where
d = Derivs add addsuff mult multsuff prim
dec digs dig sym white chr
add = pAdditive d
addsuff = pAdditiveSuffix d
mult = pMultitive d
multsuff = pMultitiveSuffix d
prim = pPrimary d
dec = pDecimal d
digs = pDigits d
dig = pDigit d
sym = pSymbol d
white = pSpacing d
chr = case s of
(c:s') -> Parsed c (parse s')
[] -> NoParse
-- Parse an additive-precedence expression
-- Additive <- Multitive AdditiveSuffix
pAdditive :: Derivs -> Result Int
pAdditive d = case dvMultitive d of
Parsed vleft d' ->
case dvAdditiveSuffix d' of
Parsed vsuff d'' ->
Parsed (vsuff vleft) d''
_ -> NoParse
_ -> NoParse
-- Parse an additive-precedence expression suffix
pAdditiveSuffix :: Derivs -> Result (Int -> Int)
pAdditiveSuffix d = alt1 where
-- AdditiveSuffix <- '+' Multitive AdditiveSuffix
alt1 = case dvSymbol d of
Parsed '+' d' ->
case dvMultitive d' of
Parsed vright d'' ->
case dvAdditiveSuffix d'' of
Parsed vsuff d''' ->
Parsed (\vleft -> vsuff (vleft + vright)) d'''
_ -> alt2
_ -> alt2
_ -> alt2
-- AdditiveSuffix <- '-' Multitive AdditiveSuffix
alt2 = case dvSymbol d of
Parsed '-' d' ->
case dvMultitive d' of
Parsed vright d'' ->
case dvAdditiveSuffix d'' of
Parsed vsuff d''' ->
Parsed (\vleft -> vsuff (vleft - vright)) d'''
_ -> alt3
_ -> alt3
_ -> alt3
-- AdditiveSuffix <- <empty>
alt3 = Parsed (\v -> v) d
-- Parse a multiplicative-precedence expression
-- Multitive <- Primary MultitiveSuffix
pMultitive :: Derivs -> Result Int
pMultitive d = case dvPrimary d of
Parsed vleft d' ->
case dvMultitiveSuffix d' of
Parsed vsuff d'' ->
Parsed (vsuff vleft) d''
_ -> NoParse
_ -> NoParse
-- Parse a multiplicative-precedence expression suffix
pMultitiveSuffix :: Derivs -> Result (Int -> Int)
pMultitiveSuffix d = alt1 where
-- MultitiveSuffix <- '*' Primary MultitiveSuffix
alt1 = case dvSymbol d of
Parsed '*' d' ->
case dvPrimary d' of
Parsed vright d'' ->
case dvMultitiveSuffix d'' of
Parsed vsuff d''' ->
Parsed (\vleft -> vsuff (vleft * vright)) d'''
_ -> alt2
_ -> alt2
_ -> alt2
-- MultitiveSuffix <- '/' Primary MultitiveSuffix
alt2 = case dvSymbol d of
Parsed '/' d' ->
case dvPrimary d' of
Parsed vright d'' ->
case dvMultitiveSuffix d'' of
Parsed vsuff d''' ->
Parsed (\vleft -> vsuff (vleft `div` vright)) d'''
_ -> alt3
_ -> alt3
_ -> alt3
-- MultitiveSuffix <- '%' Primary MultitiveSuffix
alt3 = case dvSymbol d of
Parsed '%' d' ->
case dvPrimary d' of
Parsed vright d'' ->
case dvMultitiveSuffix d'' of
Parsed vsuff d''' ->
Parsed (\vleft -> vsuff (vleft `mod` vright)) d'''
_ -> alt4
_ -> alt4
_ -> alt4
-- MultitiveSuffix <- <empty>
alt4 = Parsed (\v -> v) d
-- Parse a primary expression
pPrimary :: Derivs -> Result Int
pPrimary d = alt1 where
-- Primary <- '(' Additive ')'
alt1 = case dvSymbol d of
Parsed '(' d' ->
case dvAdditive d' of
Parsed v d'' ->
case dvSymbol d'' of
Parsed ')' d''' -> Parsed v d'''
_ -> alt2
_ -> alt2
_ -> alt2
-- Primary <- Decimal
alt2 = case dvDecimal d of
Parsed v d' -> Parsed v d'
NoParse -> NoParse
-- Parse a decimal number followed by optional whitespace
pDecimal :: Derivs -> Result Int
pDecimal d = case dvDigits d of
Parsed (v,n) d' ->
case dvSpacing d' of
Parsed _ d'' -> Parsed v d''
_ -> NoParse
_ -> NoParse
-- Parse a string of consecutive decimal digits
-- The result is (value, number of digits).
pDigits :: Derivs -> Result (Int, Int)
pDigits d = case dvDigit d of
Parsed vl d' ->
case dvDigits d' of
Parsed (vr, n) d'' ->
Parsed (vl*10^n+vr, n+1) d''
_ -> Parsed (vl, 1) d'
_ -> NoParse
-- Parse a decimal digit
pDigit :: Derivs -> Result Int
pDigit d = case dvChar d of
Parsed '0' d' -> Parsed 0 d'
Parsed '1' d' -> Parsed 1 d'
Parsed '2' d' -> Parsed 2 d'
Parsed '3' d' -> Parsed 3 d'
Parsed '4' d' -> Parsed 4 d'
Parsed '5' d' -> Parsed 5 d'
Parsed '6' d' -> Parsed 6 d'
Parsed '7' d' -> Parsed 7 d'
Parsed '8' d' -> Parsed 8 d'
Parsed '9' d' -> Parsed 9 d'
_ -> NoParse
-- Parse a symbol character followed by optional whitespace
pSymbol :: Derivs -> Result Char
pSymbol d = case dvChar d of
Parsed c d' ->
if c `elem` "+-*/%()"
then case dvSpacing d' of
Parsed _ d'' -> Parsed c d''
_ -> NoParse
else NoParse
_ -> NoParse
-- Parse zero or more whitespace characters
pSpacing :: Derivs -> Result ()
pSpacing d = case dvChar d of
Parsed c d' ->
if isSpace c
then pSpacing d'
else Parsed () d
_ -> Parsed () d
ArithLeft.hs
Скачать
-- Packrat parser for arithmetic expression language
-- supporting left-associative infix operators
module ArithLeft where
data Result v =
Parsed v Derivs
| NoParse
data Derivs = Derivs {
dvAdditive :: Result Int,
dvAdditiveSuffix :: Result (Int -> Int),
dvMultitive :: Result Int,
dvMultitiveSuffix :: Result (Int -> Int),
dvPrimary :: Result Int,
dvDecimal :: Result Int,
dvChar :: Result Char
}
-- Evaluate an expression and return the unpackaged result,
-- ignoring any unparsed remainder.
eval s = case dvAdditive (parse s) of
Parsed v rem -> v
_ -> error "Parse error"
-- Construct a (lazy) parse result structure for an input string,
-- in which any result can be computed in linear time
-- with respect to the length of the input.
parse :: String -> Derivs
parse s = d where
d = Derivs add addsuff mult multsuff prim dec chr
add = pAdditive d
addsuff = pAdditiveSuffix d
mult = pMultitive d
multsuff = pMultitiveSuffix d
prim = pPrimary d
dec = pDecimal d
chr = case s of
(c:s') -> Parsed c (parse s')
[] -> NoParse
-- Parse an additive-precedence expression
-- Additive <- Multitive AdditiveSuffix
pAdditive :: Derivs -> Result Int
pAdditive d = case dvMultitive d of
Parsed vleft d' ->
case dvAdditiveSuffix d' of
Parsed vsuff d'' ->
Parsed (vsuff vleft) d''
_ -> NoParse
_ -> NoParse
-- Parse an additive-precedence expression suffix
pAdditiveSuffix :: Derivs -> Result (Int -> Int)
pAdditiveSuffix d = alt1 where
-- AdditiveSuffix <- '+' Multitive AdditiveSuffix
alt1 = case dvChar d of
Parsed '+' d' ->
case dvMultitive d' of
Parsed vright d'' ->
case dvAdditiveSuffix d'' of
Parsed vsuff d''' ->
Parsed (\vleft -> vsuff (vleft + vright)) d'''
_ -> alt2
_ -> alt2
_ -> alt2
-- AdditiveSuffix <- '-' Multitive AdditiveSuffix
alt2 = case dvChar d of
Parsed '-' d' ->
case dvMultitive d' of
Parsed vright d'' ->
case dvAdditiveSuffix d'' of
Parsed vsuff d''' ->
Parsed (\vleft -> vsuff (vleft - vright)) d'''
_ -> alt3
_ -> alt3
_ -> alt3
-- AdditiveSuffix <- <empty>
alt3 = Parsed (\v -> v) d
-- Parse a multiplicative-precedence expression
-- Multitive <- Primary MultitiveSuffix
pMultitive :: Derivs -> Result Int
pMultitive d = case dvPrimary d of
Parsed vleft d' ->
case dvMultitiveSuffix d' of
Parsed vsuff d'' ->
Parsed (vsuff vleft) d''
_ -> NoParse
_ -> NoParse
-- Parse a multiplicative-precedence expression suffix
pMultitiveSuffix :: Derivs -> Result (Int -> Int)
pMultitiveSuffix d = alt1 where
-- MultitiveSuffix <- '*' Primary MultitiveSuffix
alt1 = case dvChar d of
Parsed '*' d' ->
case dvPrimary d' of
Parsed vright d'' ->
case dvMultitiveSuffix d'' of
Parsed vsuff d''' ->
Parsed (\vleft -> vsuff (vleft * vright)) d'''
_ -> alt2
_ -> alt2
_ -> alt2
-- MultitiveSuffix <- '/' Primary MultitiveSuffix
alt2 = case dvChar d of
Parsed '/' d' ->
case dvPrimary d' of
Parsed vright d'' ->
case dvMultitiveSuffix d'' of
Parsed vsuff d''' ->
Parsed (\vleft -> vsuff (vleft `div` vright)) d'''
_ -> alt3
_ -> alt3
_ -> alt3
-- MultitiveSuffix <- '%' Primary MultitiveSuffix
alt3 = case dvChar d of
Parsed '%' d' ->
case dvPrimary d' of
Parsed vright d'' ->
case dvMultitiveSuffix d'' of
Parsed vsuff d''' ->
Parsed (\vleft -> vsuff (vleft `mod` vright)) d'''
_ -> alt4
_ -> alt4
_ -> alt4
-- MultitiveSuffix <- <empty>
alt4 = Parsed (\v -> v) d
-- Parse a primary expression
pPrimary :: Derivs -> Result Int
pPrimary d = alt1 where
-- Primary <- '(' Additive ')'
alt1 = case dvChar d of
Parsed '(' d' ->
case dvAdditive d' of
Parsed v d'' ->
case dvChar d'' of
Parsed ')' d''' -> Parsed v d'''
_ -> alt2
_ -> alt2
_ -> alt2
-- Primary <- Decimal
alt2 = case dvDecimal d of
Parsed v d' -> Parsed v d'
NoParse -> NoParse
-- Parse a decimal digit
pDecimal :: Derivs -> Result Int
pDecimal d = case dvChar d of
Parsed '0' d' -> Parsed 0 d'
Parsed '1' d' -> Parsed 1 d'
Parsed '2' d' -> Parsed 2 d'
Parsed '3' d' -> Parsed 3 d'
Parsed '4' d' -> Parsed 4 d'
Parsed '5' d' -> Parsed 5 d'
Parsed '6' d' -> Parsed 6 d'
Parsed '7' d' -> Parsed 7 d'
Parsed '8' d' -> Parsed 8 d'
Parsed '9' d' -> Parsed 9 d'
_ -> NoParse
ArithPackrat.hs
Скачать
-- Packrat parser for trivial arithmetic language.
module ArithPackrat where
data Result v =
Parsed v Derivs
| NoParse
data Derivs = Derivs {
dvAdditive :: Result Int,
dvMultitive :: Result Int,
dvPrimary :: Result Int,
dvDecimal :: Result Int,
dvChar :: Result Char
}
-- Evaluate an expression and return the unpackaged result,
-- ignoring any unparsed remainder.
eval s = case dvAdditive (parse s) of
Parsed v rem -> v
_ -> error "Parse error"
-- Construct a (lazy) parse result structure for an input string,
-- in which any result can be computed in linear time
-- with respect to the length of the input.
parse :: String -> Derivs
parse s = d where
d = Derivs add mult prim dec chr
add = pAdditive d
mult = pMultitive d
prim = pPrimary d
dec = pDecimal d
chr = case s of
(c:s') -> Parsed c (parse s')
[] -> NoParse
-- Parse an additive-precedence expression
pAdditive :: Derivs -> Result Int
pAdditive d = alt1 where
-- Additive <- Multitive '+' Additive
alt1 = case dvMultitive d of
Parsed vleft d' ->
case dvChar d' of
Parsed '+' d'' ->
case dvAdditive d'' of
Parsed vright d''' ->
Parsed (vleft + vright) d'''
_ -> alt2
_ -> alt2
_ -> alt2
-- Additive <- Multitive
alt2 = case dvMultitive d of
Parsed v d' -> Parsed v d'
NoParse -> NoParse
-- Parse a multiplicative-precedence expression
pMultitive :: Derivs -> Result Int
pMultitive d = alt1 where
-- Multitive <- Primary '*' Multitive
alt1 = case dvPrimary d of
Parsed vleft d' ->
case dvChar d' of
Parsed '*' d'' ->
case dvMultitive d'' of
Parsed vright d''' ->
Parsed (vleft * vright) d'''
_ -> alt2
_ -> alt2
_ -> alt2
-- Multitive <- Primary
alt2 = case dvPrimary d of
Parsed v d' -> Parsed v d'
NoParse -> NoParse
-- Parse a primary expression
pPrimary :: Derivs -> Result Int
pPrimary d = alt1 where
-- Primary <- '(' Additive ')'
alt1 = case dvChar d of
Parsed '(' d' ->
case dvAdditive d' of
Parsed v d'' ->
case dvChar d'' of
Parsed ')' d''' -> Parsed v d'''
_ -> alt2
_ -> alt2
_ -> alt2
-- Primary <- Decimal
alt2 = case dvDecimal d of
Parsed v d' -> Parsed v d'
NoParse -> NoParse
-- Parse a decimal digit
pDecimal :: Derivs -> Result Int
pDecimal d = case dvChar d of
Parsed '0' d' -> Parsed 0 d'
Parsed '1' d' -> Parsed 1 d'
Parsed '2' d' -> Parsed 2 d'
Parsed '3' d' -> Parsed 3 d'
Parsed '4' d' -> Parsed 4 d'
Parsed '5' d' -> Parsed 5 d'
Parsed '6' d' -> Parsed 6 d'
Parsed '7' d' -> Parsed 7 d'
Parsed '8' d' -> Parsed 8 d'
Parsed '9' d' -> Parsed 9 d'
_ -> NoParse
ArithRecurse.hs
Скачать
-- Recursive descent parser for trivial arithmetic language
-- supporting only addition and multiplication operators.
module ArithRecurse where
data Result v =
Parsed v String
| NoParse
-- Evaluate an expression and return the unpackaged result,
-- ignoring any unparsed remainder.
eval s = case pAdditive s of
Parsed v rem -> v
_ -> error "Parse error"
-- Parse an additive-precedence expression
pAdditive :: String -> Result Int
pAdditive s = alt1 where
-- Additive <- Multitive '+' Additive
alt1 = case pMultitive s of
Parsed vleft s' ->
case s' of
('+':s'') ->
case pAdditive s'' of
Parsed vright s''' ->
Parsed (vleft + vright) s'''
_ -> alt2
_ -> alt2
_ -> alt2
-- Additive <- Multitive
alt2 = case pMultitive s of
Parsed v s' -> Parsed v s'
NoParse -> NoParse
-- Parse a multiplicative-precedence expression
pMultitive :: String -> Result Int
pMultitive s = alt1 where
-- Multitive <- Primary '*' Multitive
alt1 = case pPrimary s of
Parsed vleft s' ->
case s' of
('*':s'') ->
case pMultitive s'' of
Parsed vright s''' ->
Parsed (vleft * vright) s'''
_ -> alt2
_ -> alt2
_ -> alt2
-- Multitive <- Primary
alt2 = case pPrimary s of
Parsed v s' -> Parsed v s'
NoParse -> NoParse
-- Parse a primary expression
pPrimary :: String -> Result Int
pPrimary s = alt1 where
-- Primary <- '(' Additive ')'
alt1 = case s of
('(':s') ->
case pAdditive s' of
Parsed v s'' ->
case s'' of
(')':s''') -> Parsed v s'''
_ -> alt2
_ -> alt2
_ -> alt2
-- Primary <- Decimal
alt2 = case pDecimal s of
Parsed v s' -> Parsed v s'
NoParse -> NoParse
-- Parse a decimal digit
pDecimal :: String -> Result Int
pDecimal ('0':cs) = Parsed 0 cs
pDecimal ('1':cs) = Parsed 1 cs
pDecimal ('2':cs) = Parsed 2 cs
pDecimal ('3':cs) = Parsed 3 cs
pDecimal ('4':cs) = Parsed 4 cs
pDecimal ('5':cs) = Parsed 5 cs
pDecimal ('6':cs) = Parsed 6 cs
pDecimal ('7':cs) = Parsed 7 cs
pDecimal ('8':cs) = Parsed 8 cs
pDecimal ('9':cs) = Parsed 9 cs
pDecimal _ = NoParse
Main.hs
Скачать
module Main where
import IO
import System
import Pappy
import ReadGrammar
import ReduceGrammar
import SimplifyGrammar
import MemoAnalysis
import WriteParser
processArgs =
do args <- getArgs
case args of
[inname] -> return (inname, mkoutname inname)
[inname, outname] -> return (inname, outname)
_ -> usage
where
mkoutname inname =
case reverse inname of
'y':'p':'p':'a':'p':'.':inr ->
reverse inr ++ ".hs"
_ -> inname ++ ".hs"
usage = fail "Usage: pappy <infile> [<outfile>]"
processFile inname outname =
do (n, c, t, g) <- pappyParseFile inname
putStrLn ("Original grammar: " ++ show (length g) ++
" size " ++ show (sizeofNonterminals g))
--putStrLn (showNonterminals g)
let (n', c', t', g') = reduceGrammar (n, c, t, g)
putStrLn ("Reduced grammar: " ++ show (length g') ++
" size " ++ show (sizeofNonterminals g'))
--putStrLn (showNonterminals g')
let (n'', c'', t'', g'') = simplifyGrammar (n', c', t', g')
putStrLn ("Simplified grammar: " ++ show (length g'') ++
" size " ++ show (sizeofNonterminals g''))
--putStrLn (showNonterminals g'')
let m'' = memoAnalysis g''
putStrLn ("Memoized: " ++ show (length m''))
let parser = writeParser m'' (n'', c'', t'', g'')
writeFile outname parser
main = do (inname, outname) <- processArgs
processFile inname outname
WriteParser.hs
Скачать
module WriteParser where
import Char
import Pappy
-- Generate the source code for a packrat parser
writeParser :: [Identifier] -> Grammar -> String
writeParser memos (parsername, topcode, topnts, nonterms) =
"-- Parser generated by Pappy - do not edit\n" ++
"module " ++ upcase parsername ++ " where\n\n" ++
"import Pos\n" ++
"import Parse\n" ++
topcode ++ "\n" ++
tie2level ++ -- tie1level or tie2level
patpfuncs 0 nonterms -- patpfuncs or monadpfuncs
where
-- Names of identifiers used throughout the parser
derivstype = upcase parsername ++ "Derivs"
pfuncname = locase parsername ++ "Parse"
dfuncname = locase parsername ++ "Derivs"
dvname n = locase parsername ++ n
pname n = if n `elem` memos
then locase parsername ++ "Parse" ++ n
else dvname n
-- Build an expression to reference nonterminal accessor n in d
dvexp n d = dvname n ++ " " ++ d
-- Names of temporaries used in the parsing functions
--valname i = "pappyVal" ++ show i
--dvsname i = "pappyDvs" ++ show i
--errname i = "pappyErr" ++ show i
valname i = "v" ++ show i -- easier for debugging
dvsname i = "d" ++ show i
errname i = "e" ++ show i
---------- Single-level parser generation
tie1level =
derivsdecl ++
derivsinstance ++
parsefunc ++
derivsfunc 0
where
-- Generate the Derivs type declaration
derivsdecl =
"data " ++ derivstype ++ " = " ++ derivstype ++ " {\n" ++
components nonterms ++
"\t\t" ++ dvname "Char" ++ " :: Result " ++ derivstype ++
" Char,\n" ++
"\t\t" ++ dvname "Pos" ++ " :: Pos\n" ++
"\t}\n" ++
"\n"
where
components ((name, typ, rule) : rest) | name `elem` memos =
"\t\t" ++ dvname name ++
" :: Result " ++ derivstype ++
" (" ++ typ ++ "),\n" ++ components rest
components(_ : rest) = components rest
components [] = ""
-- Generate the "instance Derivs" declaration
derivsinstance =
"instance Derivs " ++ derivstype ++ " where\n" ++
"\tdvChar d = " ++ dvname "Char" ++ " d\n" ++
"\tdvPos d = " ++ dvname "Pos" ++ " d\n" ++
"\n"
-- Generate the top-level parse function
parsefunc =
pfuncname ++ " :: String -> String -> " ++ derivstype ++ "\n" ++
pfuncname ++ " name text = " ++ dfuncname ++
" (Pos name 1 1) text\n\n"
-- Generate the derivs-creation function
derivsfunc i =
ind i ++ dfuncname ++ " :: Pos -> String -> " ++
derivstype ++ "\n" ++
ind i ++ dfuncname ++ " pos text = dvs where\n" ++
ind (i+1) ++ "dvs = " ++ derivstype ++ "\n" ++
parsercalls nonterms ++
ind (i+2) ++ "chr pos\n" ++
ind (i+1) ++ "chr = case text of\n" ++
ind (i+2) ++ "[] -> NoParse (eofError dvs)\n" ++
ind (i+2) ++ "(c:cs) -> Parsed c (" ++ dfuncname ++
" (nextPos pos c) cs) (nullError dvs)\n\n"
where
parsercalls ((name, typ, rule) : rest) | name `elem` memos =
ind (i+2) ++ "(" ++ pname name ++ " dvs)\n" ++
parsercalls rest
parsercalls (_ : rest) = parsercalls rest
parsercalls [] = ""
---------- Two-level parser generation
tie2level =
derivsdecl ++
derivsinstance ++
derivs2decls 0 ++
aliases 0 nonterms ++ "\n" ++
parsefunc ++
derivsfunc 0 ++
dsubfuncs 0
where
-- Total number of nonterminals in the grammar
nmemos :: Int
nmemos = length memos
-- The number of sub-derivation structures to use
nsubderivs :: Int
nsubderivs = round (sqrt (fromInteger (toEnum nmemos)) / 2.0)
-- Find the appropriate sub-derivation structure
-- for a particular nonterminal index
derivsofk :: Int -> Int
derivsofk k = (k * nsubderivs) `div` nmemos
-- Find the appropriate sub-derivation structure
-- for a particular nonterminal name
derivsof :: Identifier -> Int
derivsof n = scan 0 nonterms where
scan k ((n', t', r') : nts) | n' `elem` memos =
if n' == n then derivsofk k
else scan (k+1) nts
scan k (_ : nts) = scan k nts
scan k [] = error "derivsof: oops!"
-- Names for types and identifiers relating to the derivs structures
derivstype = upcase parsername ++ "Derivs"
derivs2type j = upcase parsername ++ "Derivs" ++ show j
dsubfuncname j = locase parsername ++ "Derivs" ++ show j
dvsub j = locase parsername ++ "Sub" ++ show j
dvsubname n = locase parsername ++ "Sub" ++ n
-- Generate the top-level Derivs type declaration
derivsdecl =
"data " ++ derivstype ++ " = " ++ derivstype ++ " {\n" ++
components 0 ++
"\t" ++ dvname "Char" ++ " :: Result " ++ derivstype ++
" Char,\n" ++
"\t" ++ dvname "Pos" ++ " :: Pos\n" ++
"}\n\n"
where
components j | j < nsubderivs =
"\t" ++ dvsub j ++
" :: " ++ derivs2type j ++ ",\n" ++
components (j+1)
components _ = []
-- Generate the "instance Derivs" declaration
derivsinstance =
"instance Derivs " ++ derivstype ++ " where\n" ++
"\tdvChar d = " ++ dvname "Char" ++ " d\n" ++
"\tdvPos d = " ++ dvname "Pos" ++ " d\n" ++
"\n"
-- Generate the second-level Derivs type declarations
derivs2decls j | j < nsubderivs =
"data " ++ derivs2type j ++ " = " ++ derivs2type j ++
" {" ++
components True j 0 nonterms ++
"}\n\n" ++
derivs2decls (j+1)
where
components first j k ((name, typ, rule) : rest)
| name `elem` memos && derivsofk k == j =
(if first then "\n\t" else ",\n\t") ++
dvsubname name ++ " :: Result " ++ derivstype ++
" (" ++ typ ++ ")" ++ components False j (k+1) rest
components first j k ((name, typ, rule) : rest)
| name `elem` memos =
components first j (k+1) rest
components first j k (_ : rest) =
components first j k rest
components first j k [] = ""
derivs2decls _ = ""
-- Generate the second-level accessor aliases
aliases k ((n, t, r) : nts) | n `elem` memos =
dvname n ++ " = " ++ dvsubname n ++ " . " ++
dvsub (derivsofk k) ++ "\n" ++
aliases (k+1) nts
aliases k (_ : nts) =
aliases k nts
aliases k [] = ""
-- Generate the top-level parse function
parsefunc =
pfuncname ++ " :: String -> String -> " ++ derivstype ++ "\n" ++
pfuncname ++ " name text = " ++ dfuncname ++
" (Pos name 1 1) text\n\n"
-- Generate the top-level derivs-creation function
derivsfunc i =
ind i ++ dfuncname ++ " :: Pos -> String -> " ++
derivstype ++ "\n" ++
ind i ++ dfuncname ++ " pos text = dvs where\n" ++
ind (i+1) ++ "dvs = " ++ derivstype ++ "\n" ++
subcalls 0 ++
ind (i+2) ++ "chr pos\n" ++
ind (i+1) ++ "chr = case text of\n" ++
ind (i+2) ++ "[] -> NoParse (eofError dvs)\n" ++
ind (i+2) ++ "(c:cs) -> Parsed c (" ++ dfuncname ++
" (nextPos pos c) cs) (nullError dvs)\n\n"
where
subcalls j | j < nsubderivs =
ind (i+2) ++ "(" ++ dsubfuncname j ++ " dvs)\n" ++
subcalls (j+1)
subcalls _ = ""
-- Generate the second-level derivs creation functions.
-- Note: GHC's evaluation model makes it very important
-- that these actually be separate functions!
dsubfuncs j | j < nsubderivs =
dsubfuncname j ++ " dvs = " ++ derivs2type j ++ "\n" ++
parsercalls j 0 nonterms ++ "\n" ++
dsubfuncs (j+1)
where
parsercalls j k ((name, typ, rule) : rest) | name `elem` memos =
if derivsofk k == j
then "\t(" ++ pname name ++ " dvs)\n" ++
parsercalls j (k+1) rest
else parsercalls j (k+1) rest
parsercalls j k (_ : rest) =
parsercalls j k rest
parsercalls j k [] = ""
dsubfuncs _ = ""
---------- Monadic parsing function code generation
monadpfuncs i [] = ""
monadpfuncs i ((name, typ, rule) : rest) =
ind i ++ pname name ++ " :: " ++ derivstype ++
" -> Result " ++ derivstype ++ " (" ++ typ ++ ")\n" ++
ind i ++ "Parser " ++ pname name ++ " =\n" ++
prule (i+1) rule ++ "\n" ++
monadpfuncs i rest
where
-- Generate the code for a parse rule.
prule i (RulePrim n) = ind i ++ "(Parser " ++ dvname n ++ ")\n"
prule i (RuleChar c) = ind i ++ "(char " ++ show c ++ ")\n"
prule i (RuleString s) = ind i ++ "(string " ++ show s ++ ")\n"
prule i (RuleSeq matches prod) =
ind i ++ "(do\n" ++ pseq matches where
-- Match a rule without binding its semantic value
pseq (MatchAnon r : ms) =
prule (i+1) r ++ pseq ms
-- Match a rule and give its semantic value the name 'id'
pseq (MatchName r id : ms) =
ind (i+1) ++ id ++ " <-\n" ++
prule (i+2) r ++ pseq ms
-- Match the semantic value of a rule
-- against a Haskell pattern
pseq (MatchPat r p : ms) =
ind (i+1) ++ p ++ " <-\n" ++
prule (i+2) r ++ pseq ms
-- Match the semantic value of a rule against a string:
-- typically used for keywords, operators, etc.
pseq (MatchString r s : ms) =
ind (i+1) ++ "satisfy\n" ++
prule (i+2) r ++
ind (i+2) ++ "((==) " ++ show s ++ ")\n" ++
ind (i+2) ++ "<?!> " ++ show (show s) ++ "\n" ++
pseq ms
-- and-followed-by syntactic predicate
pseq (MatchAnd r : ms) =
ind (i+1) ++ "followedBy\n" ++
prule (i+2) r ++ pseq ms
-- not-followed-by syntactic predicate
pseq (MatchNot r : ms) =
ind (i+1) ++ "notFollowedBy\n" ++
prule (i+2) r ++ pseq ms
-- semantic predicate
pseq (MatchPred pred : ms) =
ind (i+1) ++ "if not (" ++ pred ++ ") then fail " ++
show ("failed predicate " ++ show pred) ++
" else return ()\n" ++ pseq ms
-- end of sequence
pseq [] =
ind (i+1) ++ "return " ++ code prod ++ ")\n"
where code (ProdName id) = id
code (ProdCode c) = "(" ++ c ++ ")"
prule i (RuleAlt []) = error "prule: no alternatives!"
prule i (RuleAlt [r]) = prule i r
prule i (RuleAlt (r : rs)) =
ind i ++ "(\n" ++ prule (i+1) r ++
concat (map (\r -> ind i ++ " <|>\n" ++ prule (i+1) r) rs) ++
ind i ++ " )\n"
prule i (RuleOpt r) =
ind i ++ "optional\n" ++ prule (i+1) r
prule i (RuleStar r) =
ind i ++ "many\n" ++ prule (i+1) r
prule i (RulePlus r) =
ind i ++ "many1\n" ++ prule (i+1) r
---------- Pattern-matching parsing function code generation
patpfuncs i [] = ""
patpfuncs i ((name, typ, rule) : rest) =
ind i ++ pname name ++ " :: " ++ derivstype ++
" -> Result " ++ derivstype ++ " (" ++ typ ++ ")\n" ++
ind i ++ pname name ++ " d =\n" ++
prule (i+1) "d" [] (\e -> "NoParse " ++ e) rule ++ "\n" ++
patpfuncs i rest
where
-- Generate the code for a parse rule,
-- directly generating a result on success
-- but using a continutation to produce failure results.
-- The failure continuation may be called multiple times.
-- prule indent derivs-name error-names fail-cont rule
prule i d es f (RulePrim n) =
ind i ++ "case " ++ dvexp n d ++ " of\n" ++
ind (i+1) ++ "Parsed " ++ v ++ " " ++ d' ++ " " ++ e' ++
" -> Parsed " ++ v ++ " " ++ d' ++ " " ++
join d es' ++ "\n" ++
ind (i+1) ++ "NoParse " ++ e' ++ " -> " ++
f (join d es') ++ "\n"
where v = valname i
d' = dvsname i
e' = errname i
es' = e' : es
prule i d es f (RuleChar c) =
ind i ++ "case " ++ dvexp "Char" d ++ " of\n" ++
ind (i+1) ++ "r @ (Parsed " ++ show c ++ " _ _) -> r\n" ++
ind (i+1) ++ "_ -> " ++ f (join d es) ++ "\n"
prule i d es f (RuleSeq [MatchName r n] (ProdName n')) | n' == n =
prule i d es f r
prule i d es f (RuleSeq matches prod) = pseq i d es matches
where
-- Special handling for characters in sequences,
-- preventing unnecessary generation of intermediates.
pseq i d es (MatchAnon (RuleChar c) : ms) =
ind i ++ "case " ++ dvexp "Char" d ++ " of\n" ++
ind (i+1) ++ "Parsed " ++ show c ++ " " ++ d' ++
" _ ->\n" ++
pseq (i+2) d' es ms ++
ind (i+1) ++ "_ -> " ++ f (join d es) ++ "\n"
where d' = dvsname i
-- Match a rule without binding its semantic value
pseq i d es (MatchAnon r : ms) =
pseq i d es (MatchName r "_" : ms)
-- Match a rule and give its semantic value the name 'id'
pseq i d es (MatchName r id : ms) =
ind i ++ "case " ++ prexp i d r ++ " of\n" ++
ind (i+1) ++ "Parsed " ++ id ++ " " ++ d' ++ " " ++
e' ++ " ->\n" ++
pseq (i+2) d' (e':es) ms ++
ind (i+1) ++ "NoParse " ++ e' ++ " -> " ++
f (join d (e':es)) ++ "\n" ++
prexpdef i d r
where d' = dvsname i
e' = errname i
-- Match the semantic value of a rule
-- against a Haskell pattern
pseq i d es (MatchPat r p : ms) =
ind i ++ "case " ++ prexp i d r ++ " of\n" ++
ind (i+1) ++ "Parsed (" ++ p ++ ") " ++ d' ++ " " ++
e' ++ " ->\n" ++
pseq (i+2) d' (e':es) ms ++
ind (i+1) ++ "Parsed _ _ _ -> " ++
f (join d es) ++ "\n" ++
ind (i+1) ++ "NoParse " ++ e' ++ " -> " ++
f (join d (e':es)) ++ "\n" ++
prexpdef i d r
where d' = dvsname i
e' = errname i
-- match the semantic value of a rule against a string:
-- typically used for keywords, operators, etc.
pseq i d es (MatchString r s : ms) =
ind i ++ "case " ++ prexp i d r ++ " of\n" ++
ind (i+1) ++ "Parsed " ++ show s ++ " " ++ d' ++ " " ++
e' ++ " ->\n" ++
pseq (i+2) d' (e':es) ms ++
ind (i+1) ++ "_ -> " ++
f (join d (exp:es)) ++ "\n" ++
prexpdef i d r
where d' = dvsname i
e' = errname i
exp = "(ParseError (" ++ dvexp "Pos" d ++
") [Expected " ++ show s ++ "])"
-- and-followed-by syntactic predicate
pseq i d es (MatchAnd r : ms) =
ind i ++ "case " ++ prexp i d r ++ " of\n" ++
ind (i+1) ++ "Parsed _ _ " ++ e' ++ " ->\n" ++
pseq (i+2) d (e':es) ms ++
ind (i+1) ++ "NoParse " ++ e' ++ " -> " ++
f (join d (e':es)) ++ "\n" ++
prexpdef i d r
where e' = errname i
-- not-followed-by syntactic predicate
pseq i d es (MatchNot r : ms) =
ind i ++ "case " ++ prexp i d r ++ " of\n" ++
ind (i+1) ++ "NoParse " ++ e' ++ " ->\n" ++
pseq (i+2) d (e':es) ms ++
ind (i+1) ++ "Parsed _ _ " ++ e' ++ " -> " ++
f (join d (e':es)) ++ "\n" ++
prexpdef i d r
where e' = errname i
-- semantic predicate
pseq i d es (MatchPred pred : ms) =
ind i ++ "case (" ++ pred ++ ") of\n" ++
ind (i+1) ++ "True ->\n" ++
pseq (i+2) d es ms ++
ind (i+1) ++ "False -> " ++ f (join d es) ++ "\n"
-- end of sequence
pseq i d es [] =
ind i ++ "Parsed (" ++ code prod ++ ") " ++ d ++
" " ++ join d es ++ "\n"
where code (ProdName id) = id
code (ProdCode c) = c
prule i d es f (RuleAlt []) = error "prule: no alternatives!"
prule i d es f (RuleAlt [r]) = prule i d es f r
prule i d es f (RuleAlt rs) =
ind i ++ alt 1 (join d es) ++ " where\n" ++ ralt 1 rs
where
ralt j (r:rs) =
ind (i+1) ++ alt j e' ++ " =\n" ++
prule (i+2) d [e']
(\e'' -> alt (j+1) e'') r ++
ralt (j+1) rs
ralt j [] =
ind (i+1) ++ alt j e' ++ " = " ++ f e' ++ "\n"
alt j e = "pappyAlt" ++ show i ++ "_" ++ show j ++ " " ++ e
e' = errname i
prule i d es f (RuleOpt r) =
ind i ++ "case " ++ prexp i d r ++ " of\n" ++
ind (i+1) ++ "Parsed " ++ v ++ " " ++ d' ++ " " ++ e' ++
" -> " ++ "Parsed (Just " ++ v ++ ") " ++ d' ++
" " ++ join d es' ++ "\n" ++
ind (i+1) ++ "NoParse " ++ e' ++ " -> " ++
"Parsed (Nothing) " ++ d ++ " " ++ join d es' ++
"\n" ++
prexpdef i d r
where v = valname i
d' = dvsname i
e' = errname i
es' = e' : es
-- Special low-level rule for single-choice alternation on characters.
-- The left factoring code in the simplifier produces these.
prule i d es f (RuleSwitchChar alts dfl) =
ind i ++ "case " ++ dvexp "Char" d ++ " of\n" ++
cases alts ++
deflt dfl
where
cases ((c, r) : crs) =
ind (i+1) ++ "Parsed " ++ show c ++ " " ++
d' ++ " _ ->\n" ++
prule (i+2) d' es f r ++
cases crs
where d' = dvsname i
cases [] = ""
deflt (Just r) =
ind (i+1) ++ "_ ->\n" ++
prule (i+2) d es f r
deflt (Nothing) =
ind (i+1) ++ "_ -> " ++ f (join d es) ++ "\n"
-- Override any errors produced by the associated rule
-- with a given set of Expected strings.
prule i d es f (RuleExpect r (s:ss)) = prule i d es f' r where
f' _ = "ParseError (" ++ dvexp "Pos" d ++
") [Expected " ++ show (show s) ++
concat (map (\s -> ", Expected " ++ show (show s)) ss)
++ "]\n"
-- Produce a short expression for a rule.
-- Any associated definition is produced by prexpdef below.
prexp i d (RulePrim n) = dvexp n d
prexp i d r = prexpname i
prexpdef i d (RulePrim n) = ""
prexpdef i d r = ind i ++ "where\n" ++
ind (i+1) ++ prexpname i ++ " =\n" ++
prule (i+2) d [] (\e -> "NoParse " ++ e) r
prexpname i = "pappyResult" ++ show i
-- Helper: generate expression to join a list of errors
join d [] = "(ParseError (" ++ dvexp "Pos" d ++ ") [])"
join d [e] = e
join d [e,e'] = "(max " ++ e ++ " " ++ e' ++ ")"
join d (e:es) = "(maximum [" ++ e ++
concat (map (\e -> "," ++ e) es) ++ "])"
-- Helper: generate variable indentation for rule code
ind i = replicate (i*2) ' '
-- Convert the first letter of a string to uppercase or lowercase,
-- to produce Haskell type/constructor names or identifiers.
upcase (c:cs) = toUpper c : cs
locase (c:cs) = toLower c : cs
MemoAnalysis.hs
Скачать
module MemoAnalysis (memoAnalysis) where import List import Pappy -- Cost metric for a rule as used below, -- Infinite if a cycle is detected. data Cost = Cost Int | Infinite instance Eq Cost where Cost c1 == Cost c2 = c1 == c2 Infinite == Infinite = True _ == _ = False instance Ord Cost where Cost c1 <= Cost c2 = c1 <= c2 Infinite <= Infinite = True Cost c1 <= Infinite = True Infinite <= Cost c2 = False instance Show Cost where show (Cost c) = show c show Infinite = "<infinite>" instance Num Cost where Cost c1 + Cost c2 = Cost (c1 + c2) Infinite + _ = Infinite _ + Infinite = Infinite c1 * c2 = error "no multiplication for Cost" abs c = error "no 'abs' for Cost" signum c = error "no 'signum' for Cost" fromInteger i = Cost (fromInteger i) -- Produce a list of the nonterminals in a grammar to be memoized, -- by analyzing the size and recursion structure of the grammar. -- What we are doing is essentially "virtual inlining": -- finding small- to medium-size definitions that we _could_ inline directly, -- but do not want to because of the code expansion they would cause, -- and instead turn those definitions into simple Haskell functions. -- Since every circularity in the grammar is still broken by memoization, -- this transformation does not break the linear parse time guarantee. memoAnalysis :: [Nonterminal] -> [Identifier] memoAnalysis nts = iter [] where -- Iterate over the grammar, -- at each step finding the smallest nonterminal that can be inlined, -- and adding that to our list of "virtual inline" nonterminals. -- We have to take the list into account in subsequent iterations -- in order to ensure that cylces stay broken, -- and to take virtual inlining into account in our size metrics: -- the size of the definition for a virtual-inlined nonterminal -- counts towards the sizes of rules that reference it. iter :: [Nonterminal] -> [Identifier] iter vnts = case sel vnts Nothing nts of Just vnt -> iter (vnt : vnts) Nothing -> memos nts where memos ((n, t, r) : nts) = case findnt n vnts of Just _ -> memos nts Nothing -> n : memos nts memos [] = [] -- Find the smallest nonterminal that can be inlined. sel :: [Nonterminal] -> Maybe (Nonterminal, Cost) -> [Nonterminal] -> Maybe Nonterminal sel vnts (Just (nt, c)) ((nt' @ (n', t', r')) : nts) = if use vnts c c' n' r' then sel vnts (Just (nt', c')) nts else sel vnts (Just (nt, c)) nts where c' = costrule vnts [n'] r' sel vnts (Nothing) ((nt' @ (n', t', r')) : nts) = if use vnts Infinite c' n' r' then sel vnts (Just (nt', c')) nts else sel vnts (Nothing) nts where c' = costrule vnts [n'] r' sel vnts (Just (nt, c)) [] = Just nt sel vnts (Nothing) [] = Nothing -- Taking a particular nonterminal as a candidate, -- determine if we should use it (possibly over an existing candidate). -- Only select nonterminals that aren't already in vnts. use vnts curcost newcost n r = newcost < curcost && newcost < 25 && notfound where notfound = case findnt n vnts of Just _ -> False Nothing -> True -- Scan a rule and all virtually-inlined rules it refers to, -- calculating the total cost of the rule, and making sure that -- we do not hit a nonterminal we've already visited. -- ns: nonterminals we have already visited in our search. costrule :: [Nonterminal] -> [Identifier] -> Rule -> Cost costrule vnts ns (RulePrim n) = if n `elem` ns then Infinite else case findnt n vnts of Just (_, _, r) -> 1 + costrule vnts (n:ns) r Nothing -> Cost 1 -- not virtually-inlined (yet) costrule vnts ns (RuleChar c) = 1 costrule vnts ns (RuleSeq ms p) = 1 + seq ms where seq (MatchAnon r : ms) = costrule vnts ns r + seq ms seq (MatchName r id : ms) = costrule vnts ns r + seq ms seq (MatchPat r p : ms) = costrule vnts ns r + seq ms seq (MatchString r s : ms) = costrule vnts ns r + seq ms seq (MatchAnd r : ms) = costrule vnts ns r + seq ms seq (MatchNot r : ms) = costrule vnts ns r + seq ms seq (MatchPred c : ms) = 1 seq [] = 0 costrule vnts ns (RuleAlt rs) = 1 + alts rs where alts (r : rs) = costrule vnts ns r + alts rs alts [] = 0 costrule vnts ns (RuleOpt r) = costrule vnts ns r costrule vnts ns (RuleString s) = 1 costrule vnts ns (RuleStar r) = costrule vnts ns r costrule vnts ns (RulePlus r) = costrule vnts ns r costrule vnts ns (RuleExpect r ss) = costrule vnts ns r costrule vnts ns (RuleSwitchChar crs dfl) = 1 + cases crs + defl dfl where cases ((c, r) : crs) = costrule vnts ns r + cases crs cases [] = 0 defl (Just r) = costrule vnts ns r defl (Nothing) = 0 -- Find a named nonterminal in a list. findnt :: Identifier -> [Nonterminal] -> Maybe Nonterminal findnt n ((nt @ (n', t', r')) : nts) | n' == n = Just nt findnt n (nt : nts) = findnt n nts findnt n [] = NothingSimplifyGrammar.hs Скачать
module SimplifyGrammar (simplifyGrammar) where
import Pappy
-- Rewrite composite and simple left recursive rules
simplifyGrammar :: Grammar -> Grammar
simplifyGrammar (parsername, topcode, topnts, nonterms) =
(parsername, topcode, topnts, iteropt topnts nonterms)
-- Iterate the optimizations until no more simplifications are possible,
-- since eliminating one nonterminal may make others reducible.
-- Be careful not to eliminate any nonterminals listed in 'tops'.
iteropt :: [Identifier] -> [Nonterminal] -> [Nonterminal]
iteropt tops nts = nts'''' where
(nts', peepholed) = peephole nts
(nts'', collapsed) = collapse tops nts'
(nts''', inlined) = inline tops nts''
nts'''' = if peepholed || collapsed || inlined
then iteropt tops nts''' else nts'''
-- Simple peephole grammar optimizer,
-- implementing various local simplifications.
peephole :: [Nonterminal] -> ([Nonterminal], Bool)
peephole g = nonterms g where
nonterms ((n, t, r) : g) = ((n, t, r') : g', sr || sg) where
(r', sr) = rule r
(g', sg) = nonterms g
nonterms [] = ([], False)
---------- Simplify a rule
rule :: Rule -> (Rule, Bool)
rule (r @ (RulePrim n)) = (r, False)
rule (r @ (RuleChar c)) = (r, False)
rule (r @ (RuleString s)) = (r, False)
-- Eliminate useless sequencing
rule (RuleSeq [MatchName r id] (ProdName id')) | id' == id =
(r', True) where
(r', _) = rule r
rule (RuleSeq ms p) = (RuleSeq ms' p, sms) where
(ms', sms) = seq ms
-- Eliminate degenerate alternation
rule (RuleAlt [r]) = (r', True) where (r', _) = rule r
-- Flatten nested alternation operators
rule (RuleAlt (RuleAlt rs1 : rs2)) = (r'', True) where
r' = RuleAlt (rs1 ++ rs2)
(r'', _) = rule r'
rule (RuleAlt rs) = (RuleAlt rs', srs) where (rs', srs) = alts rs
rule (RuleOpt r) = (RuleOpt r', sr) where (r', sr) = rule r
rule (RuleStar r) = (RuleStar r', sr) where (r', sr) = rule r
rule (RulePlus r) = (RulePlus r', sr) where (r', sr) = rule r
rule (RuleSwitchChar crs dfl) = cases False [] crs where
cases b ncrs ((c, r) : crs) =
cases (b' || b) ((c, r') : ncrs) crs
where (r', b') = rule r
cases b ncrs [] = dodfl b (reverse ncrs) dfl
dodfl b ncrs (Just r) =
(RuleSwitchChar ncrs (Just r'), b || b')
where (r', b') = rule r
dodfl b ncrs (Nothing) =
(RuleSwitchChar ncrs Nothing, b)
---------- Simplify a list of alternatives
alts :: [Rule] -> ([Rule], Bool)
-- Left-factor consecutive alternatives starting with characters,
-- producing an efficiently-implementable SwitchChar rule.
alts (rs @ (RuleSeq (MatchAnon (RuleChar _) : _) _ :
RuleSeq (MatchAnon (RuleChar _) : _) _ : _)) =
mksw [] rs where
-- Collect alternatives starting with a character as cases.
mksw cases (RuleSeq (MatchAnon (RuleChar c) : ms) p : rs) =
mksw (addcase c (RuleSeq ms p) cases) rs
-- If we hit an empty alternative, it always matches,
-- so we can use it as the default case and ignore any others.
mksw cases (RuleSeq [] p : _) = (rs'', True) where
rs' = [RuleSwitchChar cases (Just (RuleSeq [] p))]
(rs'', _) = alts rs'
-- If we hit some other kind of alternative,
-- we have to stop there and keep the rest separate.
mksw cases rs = (rs'', True) where
rs' = RuleSwitchChar cases Nothing : rs
(rs'', _) = alts rs'
-- Add an alternative to a list of cases, merging appropriately
addcase c r [] = [(c, r)]
addcase c r ((c', RuleAlt rs) : crs) | c == c' =
(c, RuleAlt (rs ++ [r])) : crs
addcase c r ((c', r') : crs) | c == c' =
(c, RuleAlt [r', r]) : crs
addcase c r ((c', r') : crs) =
(c', r') : addcase c r crs
-- Special degenerate case of character left-factoring.
alts [RuleSeq (MatchAnon (RuleChar c) : ms) p, RuleSeq [] p2] =
([RuleSwitchChar [(c, r1')] (Just r2')], True)
where (r1', _) = rule (RuleSeq ms p)
(r2', _) = rule (RuleSeq [] p2)
alts (r : rs) = (r' : rs', sr || srs) where
(r', sr) = rule r
(rs', srs) = alts rs
alts [] = ([], False)
---------- Simplify a list of matches in a sequence
seq :: [Match] -> ([Match], Bool)
-- Flatten nested sequencing operators if possible
seq (MatchAnon (RuleSeq ms1 _) : ms2) = (ms', True) where
(ms', _) = seq (ms1 ++ ms2)
seq ((m @ (MatchName (RuleSeq ms1 (ProdName idi)) ido)) : ms2) =
flattenseq m ms1 idi (\r -> MatchName r ido) ms2
seq ((m @ (MatchPat (RuleSeq ms1 (ProdName idi)) p)) : ms2) =
flattenseq m ms1 idi (\r -> MatchPat r p) ms2
seq ((m @ (MatchString (RuleSeq ms1 (ProdName idi)) s)) : ms2) =
flattenseq m ms1 idi (\r -> MatchString r s) ms2
-- -- merge anonymous character and string sequences
-- seq (MatchAnon (RuleChar c1) : MatchAnon (RuleChar c2) : ms) =
-- (ms'', True) where
-- ms' = MatchAnon (RuleString [c1, c2]) : ms
-- (ms'', _) = seq ms'
--
-- seq (MatchAnon (RuleChar c1) : MatchAnon (RuleString s2) : ms) =
-- (ms'', True) where
-- ms' = MatchAnon (RuleString (c1 : s2)) : ms
-- (ms'', _) = seq ms'
--
-- seq (MatchAnon (RuleString s1) : MatchAnon (RuleChar c2) : ms) =
-- (ms'', True) where
-- ms' = MatchAnon (RuleString (s1 ++ [c2])) : ms
-- (ms'', _) = seq ms'
--
-- seq (MatchAnon (RuleString s1) : MatchAnon (RuleString s2) : ms) =
-- (ms'', True) where
-- ms' = MatchAnon (RuleString (s1 ++ s2)) : ms
-- (ms'', _) = seq ms'
seq (m : ms) = (m' : ms', sm || sms) where
(m', sm) = match m
(ms', sms) = seq ms
seq [] = ([], False)
-- Helper for sequence flattening code above
flattenseq m ms1 idi f ms2 =
case simpidi ms1 of
True -> (ms', True) where
(ms', _) = seq (rebind ms1 ++ ms2)
False -> (m' : ms2', sm || sms2) where
(m', sm) = match m
(ms2', sms2) = seq ms2
where
-- Make sure idi is bound by a simple MatchName,
-- rather than within a Haskell pattern.
simpidi [] = False
simpidi (MatchName d id' : ms) | id' == idi = True
simpidi (m : ms) = simpidi ms
-- Convert the matcher for idi into the appropriate substitute
rebind [] = error "flattenseq/rebind - oops!"
rebind (MatchName r id' : ms) | id' == idi = f r : ms
rebind (m : ms) = m : rebind ms
---------- Simplify a single match in a sequence
match :: Match -> (Match, Bool)
match (MatchAnon r) = (MatchAnon r', sr) where (r', sr) = rule r
match (MatchName r id) = (MatchName r' id, sr) where (r', sr) = rule r
match (MatchPat r p) = (MatchPat r' p, sr) where (r', sr) = rule r
match (MatchString r s) = (MatchString r' s, sr) where (r', sr) = rule r
match (MatchAnd r) = (MatchAnd r', sr) where (r', sr) = rule r
match (MatchNot r) = (MatchNot r', sr) where (r', sr) = rule r
match (r @ (MatchPred p)) = (r, False)
-- Search for and eliminate nonterminals in a grammar
-- that are structurally equivalent to some other nonterminal.
-- This optimization is fairly straightforward and stupid:
-- e.g., it doesn't catch structurally equivalent
-- mutually recursive _groups_ of nonterminals -
-- but I have no reason to believe such groups are likely to be common.
-- Most of the duplicates this pass finds
-- are star- and plus-rules generated above,
-- but we run the optimization over the whole grammar anyway
-- just in case the original grammar has some duplicates we can eliminate.
collapse :: [Identifier] -> [Nonterminal] -> ([Nonterminal], Bool)
collapse tops g = scandups [] (\n -> n) g where
-- Scan a grammar for duplicates to eliminate.
-- ns is the list of nonterminals we can eliminate,
-- and f is a function to rename those nonterminals correctly.
-- We will actually eliminate those nonterminals all at once later.
-- Always rename to the _last_ duplicate in the grammar,
-- to make sure it won't also go away if there are several.
scandups ns f ((n, t, r) : g) =
if n `elem` tops then scandups ns f g
else case scandupof n t r f (reverse g) of
Just n' -> scandups (n:ns) f' g where
f' nx = if nx == n then n' else f nx
Nothing -> scandups ns f g
scandups ns f [] =
case ns of
[] -> (g, False)
_ -> (rebuild ns f g, True)
-- Scan for a duplicate of a particular nonterminal.
scandupof n t r f ((n', t', r') : g) =
if n' == n then error "shouldn't have found myself!"
else if t' == t && equivRule f' r r' then Just n'
else scandupof n t r f g
where f' nx = if nx == n then n' else f nx
scandupof n t r f [] = Nothing
-- Check two rules for structural equivalence
-- under the nonterminal renaming function f.
equivRule f (RulePrim n1) (RulePrim n2) = f n1 == f n2
equivRule f (RuleChar c1) (RuleChar c2) = c1 == c2
equivRule f (RuleString s1) (RuleString s2) = s1 == s2
equivRule f (RuleSeq ms1 p1) (RuleSeq ms2 p2) = p1 == p2 && seq ms1 ms2
where seq (m1 : ms1) (m2 : ms2) =
equivMatch f m1 m2 && seq ms1 ms2
seq [] [] = True
seq _ _ = False
equivRule f (RuleAlt rs1) (RuleAlt rs2) = alts rs1 rs2
where alts (r1 : rs1) (r2 : rs2) =
equivRule f r1 r2 && alts rs1 rs2
alts [] [] = True
alts _ _ = False
equivRule f (RuleOpt r1) (RuleOpt r2) = equivRule f r1 r2
equivRule f (RuleStar r1) (RuleStar r2) = equivRule f r1 r2
equivRule f (RulePlus r1) (RulePlus r2) = equivRule f r1 r2
equivRule f (RuleSwitchChar crs1 dfl1) (RuleSwitchChar crs2 dfl2) =
cases crs1 crs2 && eqdefl dfl1 dfl2
where cases ((c1, r1) : crs1) ((c2, r2) : crs2) =
c1 == c2 && equivRule f r1 r2 && cases crs1 crs2
cases [] [] = True
cases _ _ = False
eqdefl (Just r1) (Just r2) = equivRule f r1 r2
eqdefl (Nothing) (Nothing) = True
eqdefl _ _ = False
equivRule f _ _ = False
equivMatch f (MatchAnon r1) (MatchAnon r2) = equivRule f r1 r2
equivMatch f (MatchName r1 id1) (MatchName r2 id2) =
id1 == id2 && equivRule f r1 r2
equivMatch f (MatchPat r1 p1) (MatchPat r2 p2) =
p1 == p2 && equivRule f r1 r2
equivMatch f (MatchString r1 s1) (MatchString r2 s2) =
s1 == s2 && equivRule f r1 r2
equivMatch f (MatchAnd r1) (MatchAnd r2) = equivRule f r1 r2
equivMatch f (MatchNot r1) (MatchNot r2) = equivRule f r1 r2
equivMatch f (MatchPred p1) (MatchPred p2) = p1 == p2
equivMatch f _ _ = False
-- Rebuild a grammar, eliminating definitions for nonterminals in 'ns'
-- and applying renaming function 'f' to all nonterminal references.
rebuild ns f [] = []
rebuild ns f ((n, t, r) : g) =
if n `elem` ns then rebuild ns f g
else (n, t, r') : rebuild ns f g
where r' = rebuildRule f r
rebuildRule f (RulePrim n) = RulePrim (f n)
rebuildRule f (RuleChar c) = RuleChar c
rebuildRule f (RuleString s) = RuleString s
rebuildRule f (RuleSeq ms p) = RuleSeq (reseq ms) p
where reseq (m : ms) = rebuildMatch f m : reseq ms
reseq [] = []
rebuildRule f (RuleAlt rs) = RuleAlt (alts rs)
where alts (r : rs) = rebuildRule f r : alts rs
alts [] = []
rebuildRule f (RuleOpt r) = RuleOpt (rebuildRule f r)
rebuildRule f (RuleStar r) = RuleStar (rebuildRule f r)
rebuildRule f (RulePlus r) = RulePlus (rebuildRule f r)
rebuildRule f (RuleSwitchChar crs dfl) =
RuleSwitchChar (cases crs) (defl dfl)
where cases ((c, r) : crs) = (c, rebuildRule f r) : cases crs
cases [] = []
defl (Just r) = Just (rebuildRule f r)
defl (Nothing) = Nothing
rebuildMatch f (MatchAnon r) = MatchAnon (rebuildRule f r)
rebuildMatch f (MatchName r id) = MatchName (rebuildRule f r) id
rebuildMatch f (MatchPat r p) = MatchPat (rebuildRule f r) p
rebuildMatch f (MatchString r s) = MatchString (rebuildRule f r) s
rebuildMatch f (MatchAnd r) = MatchAnd (rebuildRule f r)
rebuildMatch f (MatchNot r) = MatchNot (rebuildRule f r)
rebuildMatch f (MatchPred p) = MatchPred p
-- Inline all nonterminals that are only referenced once in a grammar.
-- Also eliminates any nonterminals not referenced anywhere
-- except perhaps in their own rule.
inline :: [Identifier] -> [Nonterminal] -> ([Nonterminal], Bool)
inline tops g = scan g where
-- Scan for a nonterminal we can eliminate.
scan ((n, t, r) : nts) | n `elem` tops = scan nts
scan ((n, t, r) : nts) =
if elim then (rebuild n g, True) else scan nts where
elim = case refs n g of
0 -> True
1 -> refsRule n r == 0
_ -> refsRule n r == 0 && sizeofRule r <= 2
scan [] = (g, False) -- nothing eliminated
-- Count the number of references to a particular nonterminal
-- *outside of* that nonterminal's own definition.
refs n [] = 0
refs n ((n', t', r') : nts) =
if (n == n') then refs n nts
else refsRule n r' + refs n nts
-- Count the number of references to nonterminal 'n' in rule 'r'
refsRule :: Identifier -> Rule -> Int
refsRule n (RulePrim n') = if n' == n then 1 else 0
refsRule n (RuleChar c) = 0
refsRule n (RuleString s) = 0
refsRule n (RuleSeq ms p) = seq ms
where seq (m : ms) = refsMatch n m + seq ms
seq [] = 0
refsRule n (RuleAlt rs) = alts rs
where alts (r : rs) = refsRule n r + alts rs
alts [] = 0
refsRule n (RuleOpt r) = refsRule n r
refsRule n (RuleStar r) = refsRule n r
refsRule n (RulePlus r) = refsRule n r
refsRule n (RuleSwitchChar crs dfl) = cases crs + deflt dfl
where cases ((c, r) : crs) = refsRule n r + cases crs
cases [] = 0
deflt (Just r) = refsRule n r
deflt (Nothing) = 0
refsMatch n (MatchAnon r) = refsRule n r
refsMatch n (MatchName r id) = refsRule n r
refsMatch n (MatchPat r p) = refsRule n r
refsMatch n (MatchString r s) = refsRule n r
refsMatch n (MatchAnd r) = refsRule n r
refsMatch n (MatchNot r) = refsRule n r
refsMatch n (MatchPred p) = 0
-- Rebuild the grammar, eliminating nonterminals in 'ns'
-- and applying the inlining function 'f'.
rebuild ns [] = []
rebuild ns ((n, t, r) : nts) =
if n == ns then rebuild ns nts
else (n, t, rebuildRule ns r) : rebuild ns nts
rebuildRule ns (RulePrim n) =
if n == ns
then rebuildRule ns (find n g)
else RulePrim n
rebuildRule ns (RuleChar c) = RuleChar c
rebuildRule ns (RuleString s) = RuleString s
rebuildRule ns (RuleSeq ms p) = RuleSeq (reseq ms) p
where reseq (m : ms) = rebuildMatch ns m : reseq ms
reseq [] = []
rebuildRule ns (RuleAlt rs) = RuleAlt (alts rs)
where alts (r : rs) = rebuildRule ns r : alts rs
alts [] = []
rebuildRule ns (RuleOpt r) = RuleOpt (rebuildRule ns r)
rebuildRule ns (RuleStar r) = RuleStar (rebuildRule ns r)
rebuildRule ns (RulePlus r) = RulePlus (rebuildRule ns r)
rebuildRule ns (RuleSwitchChar crs dfl) =
RuleSwitchChar (cases crs) (deflt dfl)
where cases ((c, r) : crs) =
(c, rebuildRule ns r) : cases crs
cases [] = []
deflt (Just r) = Just (rebuildRule ns r)
deflt (Nothing) = Nothing
rebuildMatch ns (MatchAnon r) = MatchAnon (rebuildRule ns r)
rebuildMatch ns (MatchName r id) = MatchName (rebuildRule ns r) id
rebuildMatch ns (MatchPat r p) = MatchPat (rebuildRule ns r) p
rebuildMatch ns (MatchString r s) = MatchString (rebuildRule ns r) s
rebuildMatch ns (MatchAnd r) = MatchAnd (rebuildRule ns r)
rebuildMatch ns (MatchNot r) = MatchNot (rebuildRule ns r)
rebuildMatch ns (MatchPred p) = MatchPred p
-- Find the definition rule for a particular nonterminal
find n ((n', t', r') : nts) =
if n' == n then r' else find n nts
find n [] = error ("inline: can't find nonterminal " ++ n)
ReduceGrammar.hs
Скачать
-- Pappy grammar analysis/reduction module:
--
-- - checks that all referenced nonterminals have definitions
-- - checks that nonterminals are not defined multiple times
-- - checks for illegal (e.g., indirect) left recursion
-- - rewrites simple left-recursive rules using right recursion
-- - rewrites '*' (zero-or-more) and '+' (one-or-more) iteration rules
--
module ReduceGrammar (reduceGrammar) where
import Pappy
-- Rewrite composite and simple left recursive rules
reduceGrammar :: Grammar -> Grammar
reduceGrammar (parsername, topcode, topnts, nonterms) = g'' where
-- First reduce the grammar
g' = (parsername, topcode, topnts, reverse (reducents [] nonterms))
-- Then check it for remaining illegal left recursion
g'' = case checkLeftRecursion g' of
Just e -> error e
Nothing -> g'
-- Reduce the rules in a grammar and add them to 'ng'
reducents ng [] = ng
reducents ng ((n, t, r) : g) =
if existstnt n ng
then error ("Duplicate nonterminal " ++ show n)
else reducents ng''' g where
-- First rewrite composite constructs in the rule
(ng', r') = rerule ng r
-- Then eliminate simple left recursion, if any
(ng'', r'') = elimleft ng' n t r'
-- Add the final reduced nonterminal to the grammar
ng''' = (n, t, r'') : ng''
-- Reduce iterative operators in a grammar rule.
rerule ng (r @ (RulePrim n)) =
if existstnt n nonterms
then (ng, r)
else error ("Reference to undefined nonterminal " ++ show n)
rerule ng (r @ (RuleChar c)) = (ng, r)
rerule ng (RuleSeq ms p) = (ng', RuleSeq ms' p) where
(ng', ms') = reseq ng ms
reseq ng [] = (ng, [])
reseq ng (MatchAnon r : ms) = (ng'', MatchAnon r' : ms')
where (ng', r') = rerule ng r
(ng'', ms') = reseq ng' ms
reseq ng (MatchName r id : ms) = (ng'', MatchName r' id : ms')
where (ng', r') = rerule ng r
(ng'', ms') = reseq ng' ms
reseq ng (MatchPat r p : ms) = (ng'', MatchPat r' p : ms')
where (ng', r') = rerule ng r
(ng'', ms') = reseq ng' ms
reseq ng (MatchString r s : ms) = (ng'', MatchString r' s : ms')
where (ng', r') = rerule ng r
(ng'', ms') = reseq ng' ms
reseq ng (MatchAnd r : ms) = (ng'', MatchAnd r' : ms')
where (ng', r') = rerule ng r
(ng'', ms') = reseq ng' ms
reseq ng (MatchNot r : ms) = (ng'', MatchNot r' : ms')
where (ng', r') = rerule ng r
(ng'', ms') = reseq ng' ms
reseq ng (MatchPred c : ms) = (ng', MatchPred c : ms')
where (ng', ms') = reseq ng ms
rerule ng (RuleAlt [alt]) = rerule ng alt
rerule ng (RuleAlt alts) = (ng', RuleAlt alts') where
(ng', alts') = rerules ng alts
rerules ng [] = (ng, [])
rerules ng (r : rs) = (ng'', r' : rs') where
(ng', r') = rerule ng r
(ng'', rs') = rerules ng' rs
rerule ng (RuleOpt r) = (ng', RuleOpt r') where
(ng', r') = rerule ng r
-- Reduce string literals to character sequences
rerule ng (RuleString s) =
(ng, RuleSeq (matches s) (ProdCode (show s)))
where
matches (c : cs) = MatchAnon (RuleChar c) : matches cs
matches [] = []
-- Reduce zero-or-more (star operator) rules
rerule ng (RuleStar r) = (ng'', RulePrim n'') where
(ng', r') = rerule ng r
n'' = newnt "StarRule" (nonterms ++ ng)
t'' = "[" ++ infertype r' ++ "]"
r'' = RuleAlt [
RuleSeq [MatchName r' "v",
MatchName (RulePrim n'') "vs"]
(ProdCode "v : vs"),
RuleSeq [] (ProdCode "[]")]
ng'' = (n'', t'', r'') : ng'
-- reuse existing equivalent iteration nonterminals
-- Reduce one-or-more (plus operator) rules
rerule ng (RulePlus r) = (ng'', RulePrim n'') where
(ng', r') = rerule ng r
n'' = newnt "PlusRule" (nonterms ++ ng)
t'' = "[" ++ infertype r' ++ "]"
r'' = RuleAlt [
RuleSeq [MatchName r' "v",
MatchName (RulePrim n'') "vs"]
(ProdCode "v : vs"),
RuleSeq [MatchName r' "v"]
(ProdCode "[v]")]
ng'' = (n'', t'', r'') : ng'
-- reuse existing equivalent iteration nonterminals
-- Eliminate simple left recursion in a grammar rule
elimleft ng n t (r @ (RuleAlt alts)) = fix (scan alts) where
-- Separate left-recursive alternatives (las)
-- from terminating alternatives (tas).
scan [] = ([], [])
scan ((ra @ (RuleSeq (MatchName (RulePrim n') id'
: ms') p)) : ras) =
if n' == n
then (ra : las, tas)
else (las, ra : tas)
where (las, tas) = scan ras
scan (ra : ras) = (las, ra : tas)
where (las, tas) = scan ras
-- Trivial case: no left-recursive alternatives
fix ([], _) = (ng, r)
-- Illegal case: no terminating alternatives
fix (_, []) =
error ("No termination for left recursive rule " ++
show n)
-- Left recursive case
fix (las, tas) = (ng', r') where
ntail = n ++ "Tail"
ttail = "(" ++ t ++ " -> " ++ t ++ ")"
rtail = RuleAlt (map tailalt las ++ [rnull])
rnull = RuleSeq [] (ProdCode "\\v -> v")
ng' = (ntail, ttail, rtail) : ng
r' = RuleAlt (map headalt tas)
headalt r =
RuleSeq [MatchName r "l",
MatchName (RulePrim ntail) "t"]
(ProdCode "t l")
tailalt (RuleSeq (MatchName _ id : ms) p) =
RuleSeq (ms ++ [m]) (ProdCode code) where
m = MatchName (RulePrim ntail) "pappyTail"
code = "\\" ++ id ++ " -> pappyTail (" ++
oldcode ++ ")"
oldcode = case p of
ProdName id' -> id'
ProdCode c -> c
-- Default case when nonterminal isn't an alternation
elimleft ng n t r = (ng, r)
-- Infer the type of a rule expression, for use by */+ reducers.
-- Doesn't work on sequences with results produced by raw Haskell code.
infertype (RulePrim "Char") = "Char"
infertype (RulePrim n) = t where
(t, r) = findnt n nonterms
infertype (RuleChar c) = "Char"
infertype (RuleString s) = "String"
infertype (RuleSeq ms (ProdName id)) = findmatch ms where
findmatch [] = error ("Match variable " ++ id ++ " not found")
findmatch (MatchName r id' : ms) =
if id' == id then infertype r else findmatch ms
findmatch (m : ms) = findmatch ms
infertype (RuleAlt (r : rs)) = infertype r -- ignore others
infertype (RuleOpt r) = "Maybe (" ++ infertype r ++ ")"
infertype (RuleStar r) = "[" ++ infertype r ++ "]"
infertype (RulePlus r) = "[" ++ infertype r ++ "]"
infertype r = error ("Unable to infer type of: " ++ show r)
-- After simple left recursion has been eliminated,
-- check for any remaining (illegal) left recursion.
-- Takes a grammar and returns an error message if any is found.
checkLeftRecursion :: Grammar -> Maybe String
checkLeftRecursion (parsername, topcode, topnts, nonterms) =
checkgram nonterms nonterms where
-- Iterate through the grammar checking each nonterminal.
checkgram :: [Nonterminal] -> [Nonterminal] -> Maybe String
checkgram g ((n, t, r) : nts) =
case (checknt g [n] r, checkgram g nts) of
((_, Just e), _) -> Just e
(_, Just e) -> Just e
_ -> Nothing
checkgram g [] = Nothing
-- checknt takes a list of nonterminals that have been visited
-- and the rule to check,
-- and descends into the rule detecting circular left-references
-- to those nonterminals.
-- It returns True if the rule can match the empty string,
-- False otherwise.
checknt :: [Nonterminal] -> [Identifier] -> Rule -> (Bool, Maybe String)
checknt g ns (RulePrim n) =
if n == "Char" then (False, Nothing)
else if n `elem` ns
then (True,
Just ("Illegal left recursion: " ++
concat (map (\n -> n ++ " -> ") (reverse ns))
++ n))
else let (t, r) = findnt n g in checknt g (n:ns) r
checknt g ns (RuleChar c) = (False, Nothing)
checknt g ns (RuleString s) = (s == [], Nothing)
checknt g ns (RuleSeq ms p) = cseq ms where
cseq (MatchAnon r : ms) = seq (checknt g ns r) (cseq ms)
cseq (MatchName r id : ms) = seq (checknt g ns r) (cseq ms)
cseq (MatchPat r p : ms) = seq (checknt g ns r) (cseq ms)
cseq (MatchString r s : ms) = seq (checknt g ns r) (cseq ms)
cseq (MatchAnd r : ms) = pred (checknt g ns r) (cseq ms)
cseq (MatchNot r : ms) = pred (checknt g ns r) (cseq ms)
cseq (MatchPred c : ms) = pred (True, Nothing) (cseq ms)
cseq [] = (True, Nothing)
-- Used for normal sequence matchers
seq (_, Just e) _ = (True, Just e)
seq (True, Nothing) r2 = r2
seq r1 _ = r1
-- Used for predicate matchers, which always match empty
pred (_, Just e) _ = (True, Just e)
pred (_, Nothing) r2 = r2
checknt g ns (RuleAlt rs) = calts rs where
calts [r] = checknt g ns r
calts (r : rs) =
case (checknt g ns r, calts rs) of
((_, Just e), _) -> (True, Just e)
(_, (_, Just e)) -> (True, Just e)
((b1, _), (b2, _)) -> (b1 || b2, Nothing)
checknt g ns (RuleOpt r) =
case checknt g ns r of
(_, Just e) -> (True, Just e)
_ -> (True, Nothing)
checknt g ns (RuleStar r) =
case checknt g ns r of
(_, Just e) -> (True, Just e)
_ -> (True, Nothing)
checknt g ns (RulePlus r) = checknt g ns r
-- Check if a terminal or nonterminal of a given name exists
existstnt n nts = n == "Char" || existsnt n nts
-- Check if a nonterminal of a given name exists in a grammar
existsnt n [] = False
existsnt n ((n', _, _) : nts) = (n' == n) || existsnt n nts
-- Find the type and rule for a given nonterminal in the grammar
findnt n [] = error ("Nonterminal " ++ show n ++ " not found")
findnt n ((n', t, r) : nts) = if n' == n then (t, r) else findnt n nts
-- Construct a name for a new nonterminal out of a given basename,
-- using ascending numeric indices to prevent conflicts
newnt base nts = scan 0 where
scan i = let n = base ++ show i
in if existstnt n nts then scan (i+1) else n
ReadGrammar.hs
Скачать
-- Packrat parser for Pappy grammar definition files
module ReadGrammar (pappyParse, pappyParseFile) where
import Char
import Pos
import Parse
import Pappy
-------------------- Top-level tie-up --------------------
data PappyDerivs = PappyDerivs {
-- Grammar definition structure
pdGrammar :: Result PappyDerivs Grammar,
pdNonterminal :: Result PappyDerivs Nonterminal,
-- Grammar rule expressions
pdAltRule :: Result PappyDerivs Rule,
pdSeqRule :: Result PappyDerivs Rule,
pdSeqMatch :: Result PappyDerivs Match,
pdUnaryRule :: Result PappyDerivs Rule,
pdPrimRule :: Result PappyDerivs Rule,
-- Lexical structure
pdIdentifier :: Result PappyDerivs String,
pdWord :: Result PappyDerivs String,
pdSymbol :: Result PappyDerivs String,
pdRawCode :: Result PappyDerivs String,
pdCodeBlock :: Result PappyDerivs String,
pdCodeChars :: Result PappyDerivs String,
pdCodeSQChars :: Result PappyDerivs String,
pdCodeDQChars :: Result PappyDerivs String,
pdCharLit :: Result PappyDerivs Char,
pdStringLit :: Result PappyDerivs String,
pdWhitespace :: Result PappyDerivs String,
-- Input text
pdChar :: Result PappyDerivs Char,
pdPos :: Pos
}
instance Derivs PappyDerivs where
dvPos d = pdPos d
dvChar d = pdChar d
grammar = Parser pdGrammar
nonterminal = Parser pdNonterminal
altRule = Parser pdAltRule
seqRule = Parser pdSeqRule
seqMatch = Parser pdSeqMatch
unaryRule = Parser pdUnaryRule
primRule = Parser pdPrimRule
identifier = Parser pdIdentifier
word = Parser pdWord
symbol = Parser pdSymbol
rawCode = Parser pdRawCode
codeBlock = Parser pdCodeBlock
codeChars = Parser pdCodeChars
codeSQChars = Parser pdCodeSQChars
codeDQChars = Parser pdCodeDQChars
charLit = Parser pdCharLit
stringLit = Parser pdStringLit
whitespace = Parser pdWhitespace
pappyParse :: String -> String -> Either Grammar String
pappyParse name text =
let initpos = Pos name 1 1
d = pappyDerivs initpos text
in case pdGrammar d of
Parsed g _ _ -> Left g
NoParse e -> Right (show e)
pappyParseFile :: String -> IO Grammar
pappyParseFile filename =
do text <- readFile filename
let initpos = Pos filename 1 1
d = pappyDerivs initpos text
case pdGrammar d of
Parsed g _ _ -> return g
NoParse e -> fail (show e)
pappyDerivs :: Pos -> String -> PappyDerivs
pappyDerivs pos text = d
where d = PappyDerivs
(pGrammar d)
(pNonterminal d)
(pAltRule d)
(pSeqRule d)
(pSeqMatch d)
(pUnaryRule d)
(pPrimRule d)
(pIdentifier d)
(pWord d)
(pSymbol d)
(pRawCode d)
(pCodeBlock d)
(pCodeChars d)
(pCodeSQChars d)
(pCodeDQChars d)
(pCharLit d)
(pStringLit d)
(pWhitespace d)
chr pos
chr = case text of
[] -> NoParse (eofError d)
(c:cs) -> Parsed c (pappyDerivs (nextPos pos c) cs)
(nullError d)
-------------------- Grammar Definition Structure --------------------
pGrammar :: PappyDerivs -> Result PappyDerivs Grammar
Parser pGrammar =
do whitespace
keyword "parser"; name <- identifier; sym ":"
c1 <- rawCode </> return []
keyword "top"; tops <- sepBy1 identifier (sym ",")
ns <- many1 nonterminal
c2 <- rawCode </> return []
let code = c1 ++ "\n" ++ c2 ++ "\n"
notFollowedBy anyChar
return (name, code, tops, ns)
pNonterminal :: PappyDerivs -> Result PappyDerivs Nonterminal
Parser pNonterminal =
do n <- identifier
sym "::"
t <- identifier </> rawCode
sym "="
r <- altRule
return (n, t, r)
-------------------- Grammar Rule Expressions --------------------
pAltRule :: PappyDerivs -> Result PappyDerivs Rule
Parser pAltRule =
do alts <- sepBy1 seqRule (sym "/")
case alts of
[a] -> return a
_ -> return (RuleAlt alts)
pSeqRule :: PappyDerivs -> Result PappyDerivs Rule
Parser pSeqRule =
(do ms <- many seqMatch
sym "->"
id <- identifier
return (RuleSeq ms (ProdName id)))
</> (do ms <- many seqMatch
sym "->"
c <- rawCode
return (RuleSeq ms (ProdCode c)))
</> unaryRule
pSeqMatch :: PappyDerivs -> Result PappyDerivs Match
Parser pSeqMatch =
(do i <- identifier
sym ":"
r <- unaryRule
return (MatchName r i))
</> (do p <- rawCode
sym ":"
r <- unaryRule
return (MatchPat r p))
</> (do c <- charLit
sym ":"
r <- unaryRule
return (MatchString r [c]))
</> (do s <- stringLit
sym ":"
r <- unaryRule
return (MatchString r s))
</> (do sym "&"
r <- unaryRule
return (MatchAnd r))
</> (do sym "!"
r <- unaryRule
return (MatchNot r))
</> (do sym "&"
p <- rawCode
return (MatchPred p))
</> (do r <- unaryRule
return (MatchAnon r))
pUnaryRule :: PappyDerivs -> Result PappyDerivs Rule
Parser pUnaryRule =
(do r <- primRule
sym "?"
return (RuleOpt r))
</> (do r <- primRule
sym "+"
return (RulePlus r))
</> (do r <- primRule
sym "*"
return (RuleStar r))
</> primRule
pPrimRule :: PappyDerivs -> Result PappyDerivs Rule
Parser pPrimRule =
(do n <- identifier
return (RulePrim n))
</> (do c <- charLit
return (RuleChar c))
</> (do s <- stringLit
return (RuleString s))
</> (do sym "("
r <- altRule
sym ")"
return r)
-------------------- Lexical Structure --------------------
-- Keywords and identifiers
keyword :: String -> Parser PappyDerivs String
keyword w =
(do s <- word
if s == w then return w else fail "")
<?!> show w
pIdentifier :: PappyDerivs -> Result PappyDerivs String
Parser pIdentifier =
(do s <- word
if s `elem` keywords then fail "" else return s)
<?!> "identifier"
pWord :: PappyDerivs -> Result PappyDerivs String
Parser pWord =
do c <- satisfy anyChar isIdentStart
cs <- many (satisfy anyChar isIdentCont)
whitespace
return (c : cs)
isIdentStart c = isAlpha c || c == '_'
isIdentCont c = isIdentStart c || isDigit c || c == '\''
keywords = [
"parser", "top"]
-- Symbols: operators, parentheses, etc.
sym :: String -> Parser PappyDerivs String
sym s = (do s' <- symbol
if s' == s then return s else fail "")
<?!> show s
pSymbol :: PappyDerivs -> Result PappyDerivs String
Parser pSymbol =
do s <- stringFrom symbols
whitespace
return s
symbols = [
"->",
"::",
":",
"=",
"(",
")",
"/",
"+",
"*",
"?",
"&",
"!"]
-- Raw code block, ignoring following whitespace
pRawCode :: PappyDerivs -> Result PappyDerivs String
Parser pRawCode =
do c <- codeBlock
whitespace
return c
-- Raw code block, consuming only the braces and contents
pCodeBlock :: PappyDerivs -> Result PappyDerivs String
Parser pCodeBlock =
do char '{'
c <- codeChars
char '}'
return c
-- Characters in a code block
pCodeChars :: PappyDerivs -> Result PappyDerivs String
Parser pCodeChars =
(do b <- codeBlock -- nested code blocks
cs <- codeChars
return ("{" ++ b ++ "}" ++ cs))
</> (do char '\'' -- character literals
lit <- codeSQChars
char '\''
cs <- codeChars
return ("\'" ++ lit ++ "\'" ++ cs))
</> (do char '\"' -- string literals
lit <- codeDQChars
char '\"'
cs <- codeChars
return ("\"" ++ lit ++ "\"" ++ cs))
</> (do ic <- satisfy anyChar isIdentStart -- identifiers
ics <- many (satisfy anyChar isIdentCont)
cs <- codeChars
return (ic : ics ++ cs))
</> (do s <- lineComment -- comments
cs <- codeChars
return (s ++ cs))
</> (do c <- noneOf "{}\"\'" -- other characters
cs <- codeChars
return (c : cs))
</> return []
-- Characters in a single-quoted character literal in a code block
pCodeSQChars :: PappyDerivs -> Result PappyDerivs String
Parser pCodeSQChars =
(do char '\\'
c <- anyChar
cs <- codeSQChars
return ('\\' : c : cs))
</> (do c <- noneOf "\'\\\r\n"
cs <- codeSQChars
return (c : cs))
</> return []
-- Characters in a double-quoted string literal in a code block
pCodeDQChars :: PappyDerivs -> Result PappyDerivs String
Parser pCodeDQChars =
(do char '\\'
c <- anyChar
cs <- codeDQChars
return ('\\' : c : cs))
</> (do c <- noneOf "\"\\\r\n"
cs <- codeDQChars
return (c : cs))
</> return []
-- Character and string literals
quotedChar quote =
(do char '\\'
c <- anyChar
case c of
'n' -> return '\n'
'r' -> return '\r'
't' -> return '\t'
'v' -> return '\v'
'f' -> return '\f'
'\\' -> return '\\'
'\'' -> return '\''
'\"' -> return '\"'
-- XXX octal characters, other escapes
_ -> fail "invalid escape sequence")
</> satisfy anyChar (\c -> c /= quote)
pCharLit :: PappyDerivs -> Result PappyDerivs Char
Parser pCharLit = (do char '\''
c <- quotedChar '\''
char '\''
whitespace
return c)
pStringLit :: PappyDerivs -> Result PappyDerivs String
Parser pStringLit = (do char '"'
s <- many (quotedChar '"')
char '"'
whitespace
return s)
-- Whitespace
pWhitespace :: PappyDerivs -> Result PappyDerivs String
Parser pWhitespace =
do ss <- many (many1 spaceChar </> lineComment)
return (concat ss)
spaceChar :: Parser PappyDerivs Char
spaceChar = satisfy anyChar isSpace
--flowComment :: Parser PappyDerivs ()
--flowComment =
-- do string "-*"
-- many (do notFollowedBy (string "*/"); anyChar)
-- string "*-"
-- return ()
lineComment :: Parser PappyDerivs String
lineComment =
do s1 <- string "--"
s2 <- many (do notFollowedBy lineTerminator; anyChar)
s3 <- lineTerminator
return (s1 ++ s2 ++ s3)
lineTerminator :: Parser PappyDerivs String
lineTerminator =
(do string "\r\n")
</> string "\r"
</> string "\n"
thesis.ps
Скачать