SpecialistOff.NET / Вопросы / Статьи / Фрагменты кода / Резюме / Метки / Помощь / Файлы
НазадМетки: [packrat parsing]; [ll]; [lr];
Packrat parsing is a novel and practical method for implementing linear-time parsers for grammars defined in Top-Down Parsing Language (TDPL). While TDPL was originally created as a formal model for top-down parsers with backtracking capability, this thesis extends TDPL into a powerful general-purpose notation for describing language syntax, providing a compelling alternative to traditional context-free grammars (CFGs). Common syntactic idioms that cannot be represented concisely in a CFG are easily expressed in TDPL, such as longest-match disambiguation and "syntactic predicates," making it possible to describe the complete lexical and grammatical syntax of a practical programming language in a single TDPL grammar.
Packrat parsing is an adaptation of a 30-year-old tabular parsing algorithm that was never put into practice until now. A packrat parser can recognize any string defined by a TDPL grammar in linear time, providing the power and flexibility of a backtracking recursive descent parser without the attendant risk of exponential parse time. A packrat parser can recognize any LL(k) or LR(k) language, as well as many languages requiring unlimited lookahead that cannot be parsed by shift/reduce parsers. Packrat parsing also provides better composition properties than LL/LR parsing, making it more suitable for dynamic or extensible languages. The primary disadvantage of packrat parsing is its storage cost, which is a constant multiple of the total input size rather than being proportional to the nesting depth of the syntactic constructs appearing in the input.
Monadic combinators and lazy evaluation enable elegant and direct implementations of packrat parsers in recent functional programming languages such as Haskell. Three different packrat parsers for the Java language are presented here, demonstrating the construction of packrat parsers in Haskell using primitive pattern matching, using monadic combinators, and by automatic generation from a declarative parser specification. The prototype packrat parser generator developed for the third case itself uses a packrat parser to read its parser specifications, and supports full TDPL notation extended with "semantic predicates," allowing parsing decisions to depend on the semantic values of other syntactic entities. Experimental results show that all of these packrat parsers run reliably in linear time, efficiently support "scannerless" parsing with integrated lexical analysis, and provide the user-friendly error-handling facilities necessary in practical applications.
In PDF or PostScript
The full source code for Pappy, the prototype packrat parser generator described in the thesis, is available for browsing in this directory, or for downloading as a gzipped tar file. A brief breakdown of the source files follows:
Following are complete versions of the example parsers for the trivial arithmetic expression language used in the thesis:
The three complete and working parsers for the Java language, which are described in the paper and used for analysis and comparison purposes, are available here:
The test suite of Java source files used to obtain the experimental results in the thesis are available in this gzipped tar file. All of these Java source files were taken from Cryptix version 3.2.0.
Enjoy!
--- JavaPat.hs --- JavaMonad.hs --- Java.pappy --- Parse.hs --- Pos.hs --- Arith.pappy --- ArithMonad.hs --- ArithLex.hs --- ArithLeft.hs --- ArithPackrat.hs --- ArithRecurse.hs --- Main.hs --- WriteParser.hs --- MemoAnalysis.hs --- SimplifyGrammar.hs --- ReduceGrammar.hs --- ReadGrammar.hs --- thesis.ps
module JavaPat where import Char import List import Pos import Parse -------------------- Lexical Preprocessing -------------------- -- Preprocess unicode escapes and newlines javaPrep :: String -> String javaPrep [] = [] javaPrep ('\r':'\n':s) = '\n':javaPrep s javaPrep ('\r':s) = '\n':javaPrep s javaPrep ('\\':'\\':s) = '\\':'\\':javaPrep s javaPrep ('\\':'u':s) = case s of h1:h2:h3:h4:s -> if isHexDigit h1 && isHexDigit h2 && isHexDigit h3 && isHexDigit h4 then chr v4:javaPrep s else error "invalid Unicode escape sequence" where v1 = digitToInt h1 v2 = v1*16 + digitToInt h2 v3 = v2*16 + digitToInt h3 v4 = v3*16 + digitToInt h4 _ -> error "incomplete Unicode escape sequence" javaPrep (c:s) = c:javaPrep s -------------------- Abstract Syntax Trees -------------------- type Identifier = String type Name = [Identifier] data Token = TokKeyword String | TokIdent Identifier | TokSymbol String | TokInt Integer Bool | TokFloat Double Bool | TokChar Char | TokString String | TokBool Bool | TokNull instance Eq Token where TokKeyword s == TokKeyword s' = s == s' TokIdent s == TokIdent s' = s == s' TokSymbol s == TokSymbol s' = s == s' TokInt i l == TokInt i' l' = i == i' && l == l' TokFloat f l == TokFloat f' l' = f == f' && l == l' TokChar c == TokChar c' = c == c' TokString s == TokString s' = s == s' _ == _ = False instance Show Token where show (TokKeyword s) = "reserved word " ++ show s show (TokIdent s) = "identifier " ++ show s show (TokSymbol s) = "symbol " ++ show s show (TokInt i l) = show i ++ (if l then "l" else "") show (TokFloat f l) = show f ++ (if l then "l" else "") show (TokChar c) = show c show (TokString s) = show s data Expression = ExpLiteral Token | ExpIdent Identifier | ExpPrefix String Expression | ExpPostfix String Expression | ExpBinary String Expression Expression | ExpSelect Expression Expression | ExpInstanceof Expression DeclType | ExpNewClass DeclType [Expression] (Maybe [Declaration]) | ExpNewArray DeclType [Maybe Expression] (Maybe [Initializer]) | ExpCall Expression [Expression] | ExpArray Expression (Maybe Expression) | ExpCast DeclType Expression | ExpCond Expression Expression Expression | ExpThis | ExpSuper | ExpClass type Modifier = String data DeclType = DtBasic String Int | DtIdent [Identifier] Int type Declarator = (Identifier, Int, Maybe Initializer) data Declaration = DeclSimple [Modifier] DeclType [Declarator] | DeclMethod [Modifier] (Maybe DeclType) Identifier [FormalParam] Int [Name] (Maybe [Statement]) | DeclConstructor [Modifier] Identifier [FormalParam] [Name] [Statement] | DeclClass [Modifier] Identifier (Maybe DeclType) [DeclType] [Declaration] | DeclInterface [Modifier] Identifier [DeclType] [Declaration] | DeclBlock Bool [Statement] -- formal parameter: "final" flag, type, parameter name, array dimension type FormalParam = (Bool, DeclType, Identifier, Int) data Initializer = IniExpr Expression | IniList [Initializer] data SwitchGroup = SwCase Expression [Statement] | SwDefault [Statement] data ForInit = FiExpr [Expression] | FiDecl Bool DeclType [Declarator] | FiNone data Statement = StLabel Identifier Statement | StCase Expression Statement | StDefault Statement | StDecl Declaration | StExpr Expression | StBlock [Statement] | StIf Expression Statement (Maybe Statement) | StSwitch Expression [SwitchGroup] | StWhile Expression Statement | StDo Statement Expression | StFor ForInit (Maybe Expression) [Expression] Statement | StTry [Statement] [CatchClause] (Maybe [Statement]) | StSynch Expression [Statement] | StContinue (Maybe Identifier) | StBreak (Maybe Identifier) | StReturn (Maybe Expression) | StThrow Expression | StNull -- catch clause: type CatchClause = (FormalParam, [Statement]) -- import declaration: qualified name, ".*" flag type ImportDecl = (Name, Bool) -- compilation unit: package name, import declarations, type declarations type CompUnit = (Maybe Name, [ImportDecl], [Declaration]) instance Show Expression where show expr = cprintExpr 0 0 expr instance Show Statement where show stmt = cprintStmt 0 stmt cprintExpr indent prec expr = undefined cprintStmt indent stmt = undefined -------------------- Packrat Parsing -------------------- data JavaDerivs = JavaDerivs { cdPos :: Pos, cdText :: String, cdChar :: Result JavaDerivs Char, cdTok :: TokDerivs, cdExpr :: ExprDerivs, cdStmt :: StmtDerivs, cdDecl :: DeclDerivs } data TokDerivs = TokDerivs { tdWhitespace :: Result JavaDerivs (), tdWord :: Result JavaDerivs Token, tdSym :: Result JavaDerivs Token, tdHexLit :: Result JavaDerivs Token, tdOctLit :: Result JavaDerivs Token, tdDecLit :: Result JavaDerivs Token, tdFloatSize :: Result JavaDerivs Bool, tdFloatExp :: Result JavaDerivs Integer, tdFloatLit :: Result JavaDerivs Token, tdCharLit :: Result JavaDerivs Token, tdStringLit :: Result JavaDerivs Token, tdToken :: Result JavaDerivs Token } data ExprDerivs = ExprDerivs { edParExpr :: Result JavaDerivs Expression, edPrimExpr :: Result JavaDerivs Expression, edPostfixExpr :: Result JavaDerivs Expression, edPrefixExpr :: Result JavaDerivs Expression, edMultExpr :: Result JavaDerivs Expression, edAddExpr :: Result JavaDerivs Expression, edShiftExpr :: Result JavaDerivs Expression, edRelExpr :: Result JavaDerivs Expression, edEqExpr :: Result JavaDerivs Expression, edAndExpr :: Result JavaDerivs Expression, edXorExpr :: Result JavaDerivs Expression, edOrExpr :: Result JavaDerivs Expression, edCondAndExpr :: Result JavaDerivs Expression, edCondOrExpr :: Result JavaDerivs Expression, edCondExpr :: Result JavaDerivs Expression, edAssignExpr :: Result JavaDerivs Expression, edExpression :: Result JavaDerivs Expression } data StmtDerivs = StmtDerivs { sdCatchClause :: Result JavaDerivs CatchClause, sdSwitchGroup :: Result JavaDerivs SwitchGroup, sdForInit :: Result JavaDerivs ForInit, sdStatement :: Result JavaDerivs Statement, sdBlockStmt :: Result JavaDerivs Statement, sdBlock :: Result JavaDerivs [Statement] } data DeclDerivs = DeclDerivs { ddModifier :: Result JavaDerivs Modifier, ddDeclType :: Result JavaDerivs DeclType, ddFormalParam :: Result JavaDerivs FormalParam, ddFormalParams :: Result JavaDerivs [FormalParam], ddDeclarator :: Result JavaDerivs Declarator, ddDeclaration :: Result JavaDerivs Declaration, ddInitializer :: Result JavaDerivs Initializer, ddArrayInit :: Result JavaDerivs [Initializer], ddImportDecl :: Result JavaDerivs ImportDecl, ddCompUnit :: Result JavaDerivs CompUnit } instance Derivs JavaDerivs where dvPos d = cdPos d dvChar d = cdChar d whitespace = tdWhitespace . cdTok word = tdWord . cdTok sym = tdSym . cdTok hexLit = tdHexLit . cdTok octLit = tdOctLit . cdTok decLit = tdDecLit . cdTok floatSize = tdFloatSize . cdTok floatExp = tdFloatExp . cdTok floatLit = tdFloatLit . cdTok charLit = tdCharLit . cdTok stringLit = tdStringLit . cdTok token = tdToken . cdTok --parExpr = edParExpr . cdExpr --primExpr = edPrimExpr . cdExpr --postfixExpr = edPostfixExpr . cdExpr --prefixExpr = edPrefixExpr . cdExpr --multExpr = edMultExpr . cdExpr --addExpr = edAddExpr . cdExpr --shiftExpr = edShiftExpr . cdExpr --relExpr = edRelExpr . cdExpr --eqExpr = edEqExpr . cdExpr --andExpr = edAndExpr . cdExpr --xorExpr = edXorExpr . cdExpr --orExpr = edOrExpr . cdExpr --condAndExpr = edCondAndExpr . cdExpr --condOrExpr = edCondOrExpr . cdExpr --condExpr = edCondExpr . cdExpr --assignExpr = edAssignExpr . cdExpr --expression = edExpression . cdExpr --catchClause = sdCatchClause . cdStmt --switchGroup = sdSwitchGroup . cdStmt --forInit = sdForInit . cdStmt --statement = sdStatement . cdStmt --blockStmt = sdBlockStmt . cdStmt --block = sdBlock . cdStmt --modifier = ddModifier . cdDecl --declType = ddDeclType . cdDecl --formalParam = ddFormalParam . cdDecl --formalParams = ddFormalParams. cdDecl --declarator = ddDeclarator . cdDecl --declaration = ddDeclaration . cdDecl --initializer = ddInitializer . cdDecl --arrayInit = ddArrayInit . cdDecl --importDecl = ddImportDecl . cdDecl --compUnit = ddCompUnit . cdDecl parExpr = Parser (edParExpr . cdExpr) primExpr = Parser (edPrimExpr . cdExpr) postfixExpr = Parser (edPostfixExpr . cdExpr) prefixExpr = Parser (edPrefixExpr . cdExpr) multExpr = Parser (edMultExpr . cdExpr) addExpr = Parser (edAddExpr . cdExpr) shiftExpr = Parser (edShiftExpr . cdExpr) relExpr = Parser (edRelExpr . cdExpr) eqExpr = Parser (edEqExpr . cdExpr) andExpr = Parser (edAndExpr . cdExpr) xorExpr = Parser (edXorExpr . cdExpr) orExpr = Parser (edOrExpr . cdExpr) condAndExpr = Parser (edCondAndExpr . cdExpr) condOrExpr = Parser (edCondOrExpr . cdExpr) condExpr = Parser (edCondExpr . cdExpr) assignExpr = Parser (edAssignExpr . cdExpr) expression = Parser (edExpression . cdExpr) catchClause = Parser (sdCatchClause . cdStmt) switchGroup = Parser (sdSwitchGroup . cdStmt) forInit = Parser (sdForInit . cdStmt) statement = Parser (sdStatement . cdStmt) blockStmt = Parser (sdBlockStmt . cdStmt) block = Parser (sdBlock . cdStmt) modifier = Parser (ddModifier . cdDecl) declType = Parser (ddDeclType . cdDecl) formalParam = Parser (ddFormalParam . cdDecl) formalParams = Parser (ddFormalParams. cdDecl) declarator = Parser (ddDeclarator . cdDecl) declaration = Parser (ddDeclaration . cdDecl) initializer = Parser (ddInitializer . cdDecl) arrayInit = Parser (ddArrayInit . cdDecl) importDecl = Parser (ddImportDecl . cdDecl) compUnit = Parser (ddCompUnit . cdDecl) -------------------- Lexical Structure -------------------- lineTerminator :: JavaDerivs -> Result JavaDerivs () lineTerminator d = case dvChar d of Parsed '\r' d' e' -> case dvChar d' of Parsed '\n' d'' e'' -> Parsed () d'' e'' Parsed _ d'' e'' -> Parsed () d' e' NoParse e'' -> NoParse e'' Parsed '\n' d' e' -> Parsed () d' e' Parsed _ d' e' -> NoParse (nullError d) NoParse e' -> NoParse e' -- Whitespace spaceChar :: JavaDerivs -> Result JavaDerivs () spaceChar d = case dvChar d of Parsed c d' e' -> if isSpace c then Parsed () d' e' else NoParse (nullError d) NoParse e' -> NoParse e' traditionalCommentRest :: JavaDerivs -> Result JavaDerivs () traditionalCommentRest d = case dvChar d of Parsed '*' d' e' -> case dvChar d' of Parsed '/' d'' e'' -> Parsed () d'' e'' _ -> traditionalCommentRest d' Parsed _ d' e' -> traditionalCommentRest d' NoParse e' -> NoParse e' traditionalComment :: JavaDerivs -> Result JavaDerivs () traditionalComment d = case dvChar d of Parsed '/' d' e' -> case dvChar d' of Parsed '*' d'' e'' -> traditionalCommentRest d'' _ -> NoParse (nullError d) _ -> NoParse (nullError d) endOfLineCommentRest :: JavaDerivs -> Result JavaDerivs () endOfLineCommentRest d = case lineTerminator d of Parsed _ d' e' -> Parsed () d' e' _ -> case dvChar d of Parsed _ d' e' -> endOfLineCommentRest d' NoParse e' -> NoParse e' endOfLineComment :: JavaDerivs -> Result JavaDerivs () endOfLineComment d = case dvChar d of Parsed '/' d' e' -> case dvChar d' of Parsed '/' d'' e'' -> endOfLineCommentRest d'' _ -> NoParse (nullError d) _ -> NoParse (nullError d) pWhitespace :: JavaDerivs -> Result JavaDerivs () pWhitespace d = case spaceChar d of Parsed _ d' e' -> whitespace d' _ -> case traditionalComment d of Parsed _ d' e' -> whitespace d' _ -> case endOfLineComment d of Parsed _ d' e' -> whitespace d' _ -> Parsed () d (nullError d) -- Keywords and identifiers keywords = [ "abstract", "boolean", "break", "byte", "case", "catch", "char", "class", "const", "continue", "default", "do", "double", "else", "extends", "final", "finally", "float", "for", "goto", "if", "implements", "import", "instanceof", "int", "interface", "long", "native", "new", "package", "private", "protected", "public", "return", "short", "static", "strictfp", "super", "switch", "synchronized", "this", "throw", "throws", "transient", "try", "void", "volatile", "while" ] isIdentStart c = isAlpha c || c == '_' isIdentCont c = isIdentStart c || isDigit c pWordRest :: JavaDerivs -> Result JavaDerivs String pWordRest d = case dvChar d of Parsed c d' e' -> if isIdentCont c then case pWordRest d' of Parsed w d'' e'' -> Parsed (c:w) d'' e'' else case whitespace d of Parsed _ d'' e'' -> Parsed [] d'' e'' _ -> Parsed [] d (nullError d) pWord :: JavaDerivs -> Result JavaDerivs Token pWord d = case dvChar d of Parsed c d' e' -> if isIdentStart c then case pWordRest d' of Parsed w d'' e'' -> Parsed (interp (c:w)) d'' e'' else NoParse (nullError d) _ -> NoParse (nullError d) where interp w = case w of "null" -> TokNull "true" -> TokBool True "false" -> TokBool False _ -> if w `elem` keywords then TokKeyword w else TokIdent w -- Recognize an operator or punctuation symbol. -- There are certainly more readable alternative ways to code this, -- but the approach here makes very simple and direct use of pattern matching -- to produce an efficient decision tree. -- This code simulates what an automated packrat parser generator might create -- for rules with many alternatives matching simple constant strings. pSym :: JavaDerivs -> Result JavaDerivs Token pSym d = case dvChar d of Parsed '<' d' e' -> case dvChar d' of Parsed '<' d'' e'' -> case dvChar d'' of Parsed '=' d''' e''' -> case whitespace d''' of Parsed _ d'''' e'''' -> Parsed (TokSymbol "<<=") d'''' e'''' _ -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "<<") d''' e''' Parsed '=' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "<=") d''' e''' _ -> case whitespace d' of Parsed _ d'' e'' -> Parsed (TokSymbol "<") d'' e'' Parsed '>' d' e' -> case dvChar d' of Parsed '>' d'' e'' -> case dvChar d'' of Parsed '>' d''' e''' -> case dvChar d''' of Parsed '=' d'''' e'''' -> case whitespace d'''' of Parsed _ e''''' d''''' -> Parsed (TokSymbol ">>>=") e''''' d''''' _ -> case whitespace d''' of Parsed _ d'''' e'''' -> Parsed (TokSymbol ">>>") d'''' e'''' Parsed '=' d''' e''' -> case whitespace d''' of Parsed _ d'''' e'''' -> Parsed (TokSymbol ">>=") d'''' e'''' _ -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol ">>") d''' e''' Parsed '=' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol ">=") d''' e''' _ -> case whitespace d' of Parsed _ d'' e'' -> Parsed (TokSymbol ">") d'' e'' Parsed '+' d' e' -> case dvChar d' of Parsed '+' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "++") d''' e''' Parsed '=' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "+=") d''' e''' _ -> case whitespace d' of Parsed _ d'' e'' -> Parsed (TokSymbol "+") d'' e'' Parsed '-' d' e' -> case dvChar d' of Parsed '-' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "--") d''' e''' Parsed '=' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "-=") d''' e''' _ -> case whitespace d' of Parsed _ d'' e'' -> Parsed (TokSymbol "-") d'' e'' Parsed '*' d' e' -> case dvChar d' of Parsed '=' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "*=") d''' e''' _ -> case whitespace d' of Parsed _ d'' e'' -> Parsed (TokSymbol "*") d'' e'' Parsed '/' d' e' -> case dvChar d' of Parsed '=' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "/=") d''' e''' _ -> case whitespace d' of Parsed _ d'' e'' -> Parsed (TokSymbol "/") d'' e'' Parsed '%' d' e' -> case dvChar d' of Parsed '=' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "%=") d''' e''' _ -> case whitespace d' of Parsed _ d'' e'' -> Parsed (TokSymbol "%") d'' e'' Parsed '&' d' e' -> case dvChar d' of Parsed '&' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "&&") d''' e''' Parsed '=' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "&=") d''' e''' _ -> case whitespace d' of Parsed _ d'' e'' -> Parsed (TokSymbol "&") d'' e'' Parsed '^' d' e' -> case dvChar d' of Parsed '=' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "^=") d''' e''' _ -> case whitespace d' of Parsed _ d'' e'' -> Parsed (TokSymbol "^") d'' e'' Parsed '|' d' e' -> case dvChar d' of Parsed '|' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "||") d''' e''' Parsed '=' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "|=") d''' e''' _ -> case whitespace d' of Parsed _ d'' e'' -> Parsed (TokSymbol "|") d'' e'' Parsed '=' d' e' -> case dvChar d' of Parsed '=' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "==") d''' e''' _ -> case whitespace d' of Parsed _ d'' e'' -> Parsed (TokSymbol "=") d'' e'' Parsed '!' d' e' -> case dvChar d' of Parsed '=' d'' e'' -> case whitespace d'' of Parsed _ d''' e''' -> Parsed (TokSymbol "!=") d''' e''' _ -> case whitespace d' of Parsed _ d'' e'' -> Parsed (TokSymbol "!") d'' e'' Parsed c d' e' -> if c `elem` ";:,.{}[]()~?" then case whitespace d' of Parsed _ d'' e'' -> Parsed (TokSymbol [c]) d'' e'' else NoParse (nullError d) NoParse e' -> NoParse e' -- Integer literals pIntTypeSuffix :: JavaDerivs -> Result JavaDerivs Bool pIntTypeSuffix d = case dvChar d of Parsed 'l' d' e' -> ws True d' Parsed 'L' d' e' -> ws True d' _ -> ws False d where ws islong d' = case whitespace d' of Parsed _ d'' e'' -> Parsed islong d'' e'' pHexDigits :: JavaDerivs -> Result JavaDerivs (Integer, Integer) pHexDigits d = case dvChar d of Parsed c d' e' -> if isHexDigit c then case pHexDigits d' of Parsed (v,n) d'' e'' -> Parsed (v + toInteger (digitToInt c) * 16^n, n+1) d'' e'' _ -> Parsed (toInteger (digitToInt c), 1) d' e' else NoParse (nullError d) _ -> NoParse (nullError d) pHexLit :: JavaDerivs -> Result JavaDerivs Token pHexLit d = case dvChar d of Parsed '0' d' e' -> case dvChar d' of Parsed 'x' d'' e'' -> digs d'' Parsed 'X' d'' e'' -> digs d'' _ -> NoParse (nullError d) _ -> NoParse (nullError d) where digs d'' = case pHexDigits d'' of Parsed (v,n) d''' e''' -> case pIntTypeSuffix d''' of Parsed l d'''' e'''' -> Parsed (TokInt v l) d'''' e'''' _ -> NoParse (nullError d) pOctDigits :: JavaDerivs -> Result JavaDerivs (Integer, Integer) pOctDigits d = case dvChar d of Parsed c d' e' -> if isOctDigit c then case pOctDigits d' of Parsed (v,n) d'' e'' -> Parsed (v + toInteger (digitToInt c) * 8^n, n+1) d'' e'' _ -> Parsed (toInteger (digitToInt c), 1) d' e' else NoParse (nullError d) _ -> NoParse (nullError d) pOctLit :: JavaDerivs -> Result JavaDerivs Token pOctLit d = case dvChar d of Parsed '0' d' e' -> case pOctDigits d' of Parsed (v,n) d'' e'' -> case pIntTypeSuffix d'' of Parsed l d''' e''' -> Parsed (TokInt v l) d''' e''' _ -> NoParse (nullError d) _ -> NoParse (nullError d) pDecDigits :: JavaDerivs -> Result JavaDerivs (Integer, Integer) pDecDigits d = case dvChar d of Parsed c d' e' -> if isDigit c then case pDecDigits d' of Parsed (v,n) d'' e'' -> Parsed (v + toInteger (digitToInt c) * 10^n, n+1) d'' e'' _ -> Parsed (toInteger (digitToInt c), 1) d' e' else NoParse (nullError d) _ -> NoParse (nullError d) pDecLit :: JavaDerivs -> Result JavaDerivs Token pDecLit d = case pDecDigits d of Parsed (v,n) d' e' -> case pIntTypeSuffix d' of Parsed l d'' e'' -> Parsed (TokInt v l) d'' e'' _ -> NoParse (nullError d) -- Floating-point literals scanDec :: String -> Integer scanDec digits = foldl f 0 digits where f v c = v * 10 + toInteger (digitToInt c) pFloatSize :: JavaDerivs -> Result JavaDerivs Bool Parser pFloatSize = (do { s <- oneOf "dD"; return True }) </> (do { s <- oneOf "fF"; return False }) pFloatExp :: JavaDerivs -> Result JavaDerivs Integer Parser pFloatExp = do oneOf "eE" neg <- plusminus digits <- many1 digit let f v c = v * 10 + toInteger (digitToInt c) val = foldl f 0 digits return (if neg then -val else val) where plusminus = (do { char '+'; return False }) </> (do { char '-'; return True }) </> return False pFloatLit :: JavaDerivs -> Result JavaDerivs Token Parser pFloatLit = (do i <- many1 digit char '.' f <- many digit e <- Parser floatExp </> return 0 s <- Parser floatSize </> return False Parser whitespace return (mkfloat (scanDec i) f e s)) </> (do char '.' f <- many1 digit e <- Parser floatExp </> return 0 s <- Parser floatSize </> return False Parser whitespace return (mkfloat 0 f e s)) </> (do i <- many1 digit e <- Parser floatExp s <- Parser floatSize </> return False Parser whitespace return (mkfloat (scanDec i) [] e s)) </> (do i <- many1 digit e <- Parser floatExp </> return 0 s <- Parser floatSize Parser whitespace return (mkfloat (scanDec i) [] e s)) where mkfloat :: Integer -> String -> Integer -> Bool -> Token mkfloat i f e s = TokFloat (scanfrac i f * 10.0**(fromInteger e)) s scanfrac :: Integer -> String -> Double scanfrac i [] = fromInteger i scanfrac i (c:cs) = scanfrac (i * 10 + toInteger (digitToInt c)) cs / 10.0 -- Character and string literals quotedChar quote d = case dvChar d of Parsed '\\' d' e' -> case dvChar d' of Parsed c d'' e'' -> case c of 'n' -> Parsed '\n' d'' e'' 'r' -> Parsed '\r' d'' e'' 't' -> Parsed '\t' d'' e'' 'v' -> Parsed '\v' d'' e'' 'f' -> Parsed '\f' d'' e'' '\\' -> Parsed '\\' d'' e'' '\'' -> Parsed '\'' d'' e'' '\"' -> Parsed '\"' d'' e'' -- XXX octal characters, other escapes _ -> NoParse (msgError (dvPos d) "invalid escape sequence") _ -> NoParse (nullError d) Parsed c d' e' -> if c /= quote then Parsed c d' e' else NoParse (nullError d) _ -> NoParse (nullError d) pCharLit :: JavaDerivs -> Result JavaDerivs Token pCharLit d = case dvChar d of Parsed '\'' d' e' -> case quotedChar '\'' d' of Parsed c d'' e'' -> case dvChar d'' of Parsed '\'' d''' e''' -> case whitespace d''' of Parsed _ d'''' e'''' -> Parsed (TokChar c) d'''' e'''' _ -> NoParse (nullError d) _ -> NoParse (nullError d) _ -> NoParse (nullError d) stringLitChars :: JavaDerivs -> Result JavaDerivs String stringLitChars d = case quotedChar '"' d of Parsed c d' e' -> case stringLitChars d' of Parsed s d'' e'' -> Parsed (c:s) d'' e'' _ -> Parsed [] d (nullError d) pStringLit :: JavaDerivs -> Result JavaDerivs Token pStringLit d = case dvChar d of Parsed '"' d' e' -> case stringLitChars d' of Parsed s d'' e'' -> case dvChar d'' of Parsed '"' d''' e''' -> case whitespace d''' of Parsed _ d'''' e'''' -> Parsed (TokString s) d'''' e'''' _ -> NoParse (nullError d) _ -> NoParse (nullError d) _ -> NoParse (nullError d) -- Token tie-up pToken :: JavaDerivs -> Result JavaDerivs Token pToken d = case word d of r @ (Parsed t d' e') -> r _ -> case sym d of r @ (Parsed t d' e') -> r _ -> case charLit d of r @ (Parsed t d' e') -> r _ -> case stringLit d of r @ (Parsed t d' e') -> r _ -> case floatLit d of r @ (Parsed t d' e') -> r _ -> case hexLit d of r @ (Parsed t d' e') -> r _ -> case octLit d of r @ (Parsed t d' e') -> r _ -> case decLit d of r @ (Parsed t d' e') -> r _ -> NoParse (nullError d) keyword :: String -> Parser JavaDerivs String keyword w = Parser parse where parse d = case token d of Parsed (TokKeyword w') d' e' -> if w' == w then Parsed w d' e' else none d _ -> none d none d = NoParse (expError (dvPos d) (show w)) symbol :: String -> Parser JavaDerivs String symbol s = Parser parse where parse d = case token d of Parsed (TokSymbol s') d' e' -> if s' == s then Parsed s d' e' else none d _ -> none d none d = NoParse (expError (dvPos d) (show s)) identifier :: Parser JavaDerivs Identifier identifier = Parser parse where parse d = case token d of Parsed (TokIdent s) d' e' -> Parsed s d' e' _ -> NoParse (expError (dvPos d) "identifier") -------------------- Expressions -------------------- arguments = do symbol "("; eargs <- sepBy expression (symbol ","); symbol ")" return eargs pParExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pParExpr = do symbol "("; e <- expression; symbol ")"; return e pPrimExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pPrimExpr = (do t <- Parser token case t of TokIdent s -> return (ExpIdent s) TokInt i l -> return (ExpLiteral t) TokFloat f l -> return (ExpLiteral t) TokChar c -> return (ExpLiteral t) TokString c -> return (ExpLiteral t) TokBool b -> return (ExpLiteral t) TokNull -> return (ExpLiteral t) _ -> fail ("found " ++ show t)) </> parExpr </> (do keyword "this"; return ExpThis) </> (do keyword "super"; return ExpSuper) </> (do keyword "class"; return ExpClass) </> (do keyword "new" -- class creator ty <- declType args <- arguments body <- optional classBody return (ExpNewClass ty args body)) </> (do keyword "new" -- array creator ty <- declType dims <- many (do symbol "[" -- XXX many1 e <- optional expression symbol "]" return e) init <- optional arrayInit return (ExpNewArray ty dims init)) <?> "primary expression" suffix :: Parser JavaDerivs (Expression -> Expression) suffix = (do symbol "["; eidx <- optional expression; symbol "]" return (\ebase -> ExpArray ebase eidx)) </> (do eargs <- arguments return (\efunc -> ExpCall efunc eargs)) </> (do symbol "."; eitem <- primExpr return (\econtext -> ExpSelect econtext eitem)) </> (do op <- (symbol "++" </> symbol "--") return (\e -> ExpPostfix op e)) pPostfixExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pPostfixExpr = (do exp <- primExpr suffixes <- many suffix return (foldl (\e s -> s e) exp suffixes)) <?> "postfix expression" pPrefixExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pPrefixExpr = (do op <- (symbol "++" </> symbol "--" </> symbol "+" </> symbol "-" </> symbol "~" </> symbol "!") exp <- prefixExpr return (ExpPostfix op exp)) </> (do symbol "("; t <- declType; symbol ")"; e <- prefixExpr return (ExpCast t e)) </> postfixExpr <?> "prefix expression" pMultExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pMultExpr = chainl1 prefixExpr (do op <- (symbol "*" </> symbol "/" </> symbol "%") return (\l r -> ExpBinary op l r)) pAddExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pAddExpr = chainl1 multExpr (do op <- (symbol "+" </> symbol "-") return (\l r -> ExpBinary op l r)) pShiftExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pShiftExpr = chainl1 addExpr (do op <- (symbol "<<" </> symbol ">>" </> symbol ">>>") return (\l r -> ExpBinary op l r)) pRelExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pRelExpr = do l <- shiftExpr; suffix l where suffix l = (do op <- (symbol "<=" </> symbol ">=" </> symbol "<" </> symbol ">") r <- shiftExpr suffix (ExpBinary op l r)) </> (do keyword "instanceof" t <- declType suffix (ExpInstanceof l t)) </> return l pEqExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pEqExpr = chainl1 relExpr (do op <- (symbol "==" </> symbol "!=") return (\l r -> ExpBinary op l r)) pAndExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pAndExpr = chainl1 eqExpr (do op <- symbol "&" return (\l r -> ExpBinary op l r)) pXorExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pXorExpr = chainl1 andExpr (do op <- symbol "^" return (\l r -> ExpBinary op l r)) pOrExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pOrExpr = chainl1 xorExpr (do op <- symbol "|" return (\l r -> ExpBinary op l r)) pCondAndExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pCondAndExpr = chainl1 orExpr (do op <- symbol "&&" return (\l r -> ExpBinary op l r)) pCondOrExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pCondOrExpr = chainl1 condAndExpr (do op <- symbol "||" return (\l r -> ExpBinary op l r)) pCondExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pCondExpr = (do c <- condOrExpr; symbol "?" t <- expression; symbol ":" f <- condExpr return (ExpCond c t f)) </> condOrExpr <?> "conditional expression" pAssignExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pAssignExpr = (do l <- prefixExpr op <- (symbol "=" </> symbol "*=" </> symbol "/=" </> symbol "%=" </> symbol "+=" </> symbol "-=" </> symbol "<<=" </> symbol ">>=" </> symbol ">>>=" </> symbol "&=" </> symbol "^=" </> symbol "|=") r <- assignExpr return (ExpBinary op l r)) </> condExpr <?> "assignment expression" pExpression :: JavaDerivs -> Result JavaDerivs Expression pExpression = pAssignExpr -------------------- Statements -------------------- pCatchClause :: JavaDerivs -> Result JavaDerivs CatchClause Parser pCatchClause = (do keyword "catch" symbol "(" p <- formalParam symbol ")" b <- block return (p,b)) pSwitchGroup :: JavaDerivs -> Result JavaDerivs SwitchGroup Parser pSwitchGroup = (do keyword "case"; e <- expression; symbol ":" s <- many blockStmt return (SwCase e s)) </> (do keyword "default"; symbol ":" s <- many blockStmt return (SwDefault s)) <?> "switch statement group" pForInit :: JavaDerivs -> Result JavaDerivs ForInit Parser pForInit = (do f <- (do keyword "final"; return True) </> (return False) t <- declType d <- sepBy1 declarator (symbol ",") return (FiDecl f t d)) </> (do es <- sepBy expression (symbol ",") return (FiExpr es)) </> return FiNone pStatement :: JavaDerivs -> Result JavaDerivs Statement Parser pStatement = (do b <- block return (StBlock b)) </> (do keyword "if"; e <- parExpr t <- statement; keyword "else"; f <- statement return (StIf e t (Just f))) </> (do keyword "if"; e <- parExpr t <- statement return (StIf e t Nothing)) </> (do keyword "for"; symbol "(" i <- forInit; symbol ";" c <- optional expression; symbol ";" n <- sepBy expression (symbol ","); symbol ")" b <- statement return (StFor i c n b)) </> (do keyword "while"; e <- parExpr b <- statement return (StWhile e b)) </> (do keyword "do"; b <- statement; keyword "while" e <- parExpr; symbol ";" return (StDo b e)) </> (do keyword "try" b <- block c <- many catchClause f <- optional (do keyword "finally"; block) return (StTry b c f)) </> (do keyword "switch"; e <- parExpr symbol "{"; b <- many switchGroup; symbol "}" return (StSwitch e b)) </> (do keyword "synchronized"; e <- parExpr; b <- block return (StSynch e b)) </> (do keyword "return"; e <- optional expression; symbol ";" return (StReturn e)) </> (do keyword "throw"; e <- expression; symbol ";" return (StThrow e)) </> (do keyword "break"; i <- optional identifier; symbol ";" return (StBreak i)) </> (do keyword "continue"; i <- optional identifier; symbol ";" return (StContinue i)) </> (do i <- identifier; symbol ":"; s <- statement return (StLabel i s)) </> (do e <- expression; symbol ";" return (StExpr e)) </> (do symbol ";" return (StNull)) <?> "statement" pBlockStmt :: JavaDerivs -> Result JavaDerivs Statement Parser pBlockStmt = (do d <- declaration return (StDecl d)) </> statement <?> "block statement" pBlock :: JavaDerivs -> Result JavaDerivs [Statement] Parser pBlock = (do symbol "{"; s <- many blockStmt; symbol "}"; return s) -------------------- Declarations -------------------- qualName = sepBy1 identifier (symbol ".") pModifier :: JavaDerivs -> Result JavaDerivs Modifier Parser pModifier = keyword "public" </> keyword "protected" </> keyword "private" </> keyword "static" </> keyword "abstract" </> keyword "final" </> keyword "native" </> keyword "synchronized" </> keyword "transient" </> keyword "volatile" </> keyword "strictfp" pDeclType :: JavaDerivs -> Result JavaDerivs DeclType Parser pDeclType = (do s <- keyword "byte" </> keyword "short" </> keyword "char" </> keyword "int" </> keyword "long" </> keyword "float" </> keyword "double" </> keyword "boolean" b <- many (do symbol "["; symbol "]") return (DtBasic s (length b))) </> (do i <- qualName b <- many (do symbol "["; symbol "]") return (DtIdent i (length b))) <?> "type" pFormalParam :: JavaDerivs -> Result JavaDerivs FormalParam Parser pFormalParam = do f <- (do keyword "final"; return True) </> (return False) t <- declType i <- identifier a <- many (do symbol "["; symbol "]") return (f, t, i, length a) pFormalParams :: JavaDerivs -> Result JavaDerivs [FormalParam] Parser pFormalParams = do symbol "(" fs <- sepBy formalParam (symbol ",") symbol ")" return fs pDeclarator :: JavaDerivs -> Result JavaDerivs Declarator Parser pDeclarator = (do ident <- identifier bkts <- many (do symbol "["; symbol "]") init <- optional (do symbol "="; initializer) return (ident, length bkts, init)) <?> "declarator" classBody = do symbol "{"; ds <- many declaration; symbol "}"; return ds pDeclaration :: JavaDerivs -> Result JavaDerivs Declaration Parser pDeclaration = (do m <- many modifier -- variable declaration t <- declType d <- sepBy declarator (symbol ",") symbol ";" return (DeclSimple m t d)) </> (do m <- many modifier -- method declaration t <- (do t <- declType; return (Just t)) </> (do keyword "void"; return Nothing) i <- identifier p <- formalParams a <- many (do symbol "["; symbol "]") th <- throws b <- (do b <- block; return (Just b)) </> (do symbol ";"; return Nothing) return (DeclMethod m t i p (length a) th b)) </> (do m <- many modifier -- constructor declaration i <- identifier p <- formalParams th <- throws b <- block return (DeclConstructor m i p th b)) </> (do m <- many modifier -- class declaration keyword "class" i <- identifier ext <- optional (do keyword "extends"; declType) imp <- (do keyword "implements"; sepBy1 declType (symbol ",")) </> return [] ds <- classBody return (DeclClass m i ext imp ds)) </> (do m <- many modifier -- interface declaration keyword "interface" i <- identifier ext <- (do keyword "extends"; sepBy1 declType (symbol ",")) </> return [] ds <- classBody return (DeclInterface m i ext ds)) </> (do st <- (do keyword "static"; return True) </> (return False) b <- block return (DeclBlock st b)) <?> "declaration" where throws = (do keyword "throws"; sepBy1 qualName (symbol ",")) </> return [] pInitializer :: JavaDerivs -> Result JavaDerivs Initializer Parser pInitializer = (do inits <- arrayInit return (IniList inits)) </> (do e <- expression return (IniExpr e)) <?> "variable initializer" pArrayInit :: JavaDerivs -> Result JavaDerivs [Initializer] Parser pArrayInit = (do symbol "{" inits <- sepEndBy initializer (symbol ",") symbol "}" return inits) <?> "array initializer" pImportDecl :: JavaDerivs -> Result JavaDerivs ImportDecl Parser pImportDecl = (do keyword "import" name <- qualName all <- (do symbol "."; symbol "*"; return True) </> return False symbol ";" return (name, all)) <?> "import declaration" pCompUnit :: JavaDerivs -> Result JavaDerivs CompUnit Parser pCompUnit = do Parser whitespace p <- (do keyword "package"; n <- qualName; symbol ";" return (Just n)) </> return Nothing i <- many importDecl t <- many declaration notFollowedBy anyChar return (p, i, t) -------------------- Recursive Tie-Up -------------------- pTokDerivs dvs = TokDerivs whitespace word sym hexlit octlit declit floatsize floatexp floatlit charlit stringlit token where whitespace = pWhitespace dvs word = pWord dvs sym = pSym dvs hexlit = pHexLit dvs octlit = pOctLit dvs declit = pDecLit dvs floatsize = pFloatSize dvs floatexp = pFloatExp dvs floatlit = pFloatLit dvs charlit = pCharLit dvs stringlit = pStringLit dvs token = pToken dvs pExprDerivs dvs = ExprDerivs parexpr primexpr postfixexpr prefixexpr multexpr addexpr shiftexpr relexpr eqexpr andexpr xorexpr orexpr logandexpr logorexpr condexpr assignexpr expr where parexpr = pParExpr dvs primexpr = pPrimExpr dvs postfixexpr = pPostfixExpr dvs prefixexpr = pPrefixExpr dvs multexpr = pMultExpr dvs addexpr = pAddExpr dvs shiftexpr = pShiftExpr dvs relexpr = pRelExpr dvs eqexpr = pEqExpr dvs andexpr = pAndExpr dvs xorexpr = pXorExpr dvs orexpr = pOrExpr dvs logandexpr = pCondAndExpr dvs logorexpr = pCondOrExpr dvs condexpr = pCondExpr dvs assignexpr = pAssignExpr dvs expr = pExpression dvs pStmtDerivs dvs = StmtDerivs catch switch forinit statement blockstmt block where catch = pCatchClause dvs switch = pSwitchGroup dvs forinit = pForInit dvs statement = pStatement dvs blockstmt = pBlockStmt dvs block = pBlock dvs pDeclDerivs dvs = DeclDerivs modifier decltype formal formals declarator declaration initializer arrayinit importdecl compunit where modifier = pModifier dvs decltype = pDeclType dvs formal = pFormalParam dvs formals = pFormalParams dvs declarator = pDeclarator dvs declaration = pDeclaration dvs initializer = pInitializer dvs arrayinit = pArrayInit dvs importdecl = pImportDecl dvs compunit = pCompUnit dvs javaDerivs pos text = dvs where dvs = JavaDerivs pos text chr tok expr stmt decl chr = case text of [] -> NoParse (eofError dvs) (c:cs) -> Parsed c (javaDerivs (nextPos pos c) cs) (nullError dvs) tok = pTokDerivs dvs expr = pExprDerivs dvs stmt = pStmtDerivs dvs decl = pDeclDerivs dvs javaParse :: String -> String -> JavaDerivs javaParse name text = javaDerivs (Pos name 1 1) text javaParseFile :: FilePath -> IO CompUnit javaParseFile name = do text <- readFile name let text' = javaPrep text derivs = javaParse name text' case ddCompUnit (cdDecl derivs) of Parsed cu _ _ -> return cu NoParse e -> fail (show e)JavaMonad.hs Скачать
module JavaMonad where import Char import List import Pos import Parse -------------------- Lexical Preprocessing -------------------- -- Preprocess unicode escapes and newlines javaPrep :: String -> String javaPrep [] = [] javaPrep ('\r':'\n':s) = '\n':javaPrep s javaPrep ('\r':s) = '\n':javaPrep s javaPrep ('\\':'\\':s) = '\\':'\\':javaPrep s javaPrep ('\\':'u':s) = case s of h1:h2:h3:h4:s -> if isHexDigit h1 && isHexDigit h2 && isHexDigit h3 && isHexDigit h4 then chr v4:javaPrep s else error "invalid Unicode escape sequence" where v1 = digitToInt h1 v2 = v1*16 + digitToInt h2 v3 = v2*16 + digitToInt h3 v4 = v3*16 + digitToInt h4 _ -> error "incomplete Unicode escape sequence" javaPrep (c:s) = c:javaPrep s -------------------- Abstract Syntax Trees -------------------- type Identifier = String type Name = [Identifier] data Token = TokKeyword String | TokIdent Identifier | TokSymbol String | TokInt Integer Bool | TokFloat Double Bool | TokChar Char | TokString String | TokBool Bool | TokNull instance Eq Token where TokKeyword s == TokKeyword s' = s == s' TokIdent s == TokIdent s' = s == s' TokSymbol s == TokSymbol s' = s == s' TokInt i l == TokInt i' l' = i == i' && l == l' TokFloat f l == TokFloat f' l' = f == f' && l == l' TokChar c == TokChar c' = c == c' TokString s == TokString s' = s == s' _ == _ = False instance Show Token where show (TokKeyword s) = "reserved word " ++ show s show (TokIdent s) = "identifier " ++ show s show (TokSymbol s) = "symbol " ++ show s show (TokInt i l) = show i ++ (if l then "l" else "") show (TokFloat f l) = show f ++ (if l then "l" else "") show (TokChar c) = show c show (TokString s) = show s data Expression = ExpLiteral Token | ExpIdent Identifier | ExpPrefix String Expression | ExpPostfix String Expression | ExpBinary String Expression Expression | ExpSelect Expression Expression | ExpInstanceof Expression DeclType | ExpNewClass DeclType [Expression] (Maybe [Declaration]) | ExpNewArray DeclType [Maybe Expression] (Maybe [Initializer]) | ExpCall Expression [Expression] | ExpArray Expression (Maybe Expression) | ExpCast DeclType Expression | ExpCond Expression Expression Expression | ExpThis | ExpSuper | ExpClass type Modifier = String data DeclType = DtBasic String Int | DtIdent [Identifier] Int type Declarator = (Identifier, Int, Maybe Initializer) data Declaration = DeclSimple [Modifier] DeclType [Declarator] | DeclMethod [Modifier] (Maybe DeclType) Identifier [FormalParam] Int [Name] (Maybe [Statement]) | DeclConstructor [Modifier] Identifier [FormalParam] [Name] [Statement] | DeclClass [Modifier] Identifier (Maybe DeclType) [DeclType] [Declaration] | DeclInterface [Modifier] Identifier [DeclType] [Declaration] | DeclBlock Bool [Statement] -- formal parameter: "final" flag, type, parameter name, array dimension type FormalParam = (Bool, DeclType, Identifier, Int) data Initializer = IniExpr Expression | IniList [Initializer] data SwitchGroup = SwCase Expression [Statement] | SwDefault [Statement] data ForInit = FiExpr [Expression] | FiDecl Bool DeclType [Declarator] | FiNone data Statement = StLabel Identifier Statement | StCase Expression Statement | StDefault Statement | StDecl Declaration | StExpr Expression | StBlock [Statement] | StIf Expression Statement (Maybe Statement) | StSwitch Expression [SwitchGroup] | StWhile Expression Statement | StDo Statement Expression | StFor ForInit (Maybe Expression) [Expression] Statement | StTry [Statement] [CatchClause] (Maybe [Statement]) | StSynch Expression [Statement] | StContinue (Maybe Identifier) | StBreak (Maybe Identifier) | StReturn (Maybe Expression) | StThrow Expression | StNull -- catch clause: type CatchClause = (FormalParam, [Statement]) -- import declaration: qualified name, ".*" flag type ImportDecl = (Name, Bool) -- compilation unit: package name, import declarations, type declarations type CompUnit = (Maybe Name, [ImportDecl], [Declaration]) instance Show Expression where show expr = cprintExpr 0 0 expr instance Show Statement where show stmt = cprintStmt 0 stmt cprintExpr indent prec expr = undefined cprintStmt indent stmt = undefined -------------------- Packrat Parsing -------------------- data JavaDerivs = JavaDerivs { cdPos :: Pos, cdText :: String, cdChar :: Result JavaDerivs Char, cdTok :: TokDerivs, cdExpr :: ExprDerivs, cdStmt :: StmtDerivs, cdDecl :: DeclDerivs } data TokDerivs = TokDerivs { tdWhitespace :: Result JavaDerivs (), tdWord :: Result JavaDerivs Token, tdSym :: Result JavaDerivs Token, tdHexLit :: Result JavaDerivs Token, tdOctLit :: Result JavaDerivs Token, tdDecLit :: Result JavaDerivs Token, tdFloatSize :: Result JavaDerivs Bool, tdFloatExp :: Result JavaDerivs Integer, tdFloatLit :: Result JavaDerivs Token, tdCharLit :: Result JavaDerivs Token, tdStringLit :: Result JavaDerivs Token, tdToken :: Result JavaDerivs Token } data ExprDerivs = ExprDerivs { edParExpr :: Result JavaDerivs Expression, edPrimExpr :: Result JavaDerivs Expression, edPostfixExpr :: Result JavaDerivs Expression, edPrefixExpr :: Result JavaDerivs Expression, edMultExpr :: Result JavaDerivs Expression, edAddExpr :: Result JavaDerivs Expression, edShiftExpr :: Result JavaDerivs Expression, edRelExpr :: Result JavaDerivs Expression, edEqExpr :: Result JavaDerivs Expression, edAndExpr :: Result JavaDerivs Expression, edXorExpr :: Result JavaDerivs Expression, edOrExpr :: Result JavaDerivs Expression, edCondAndExpr :: Result JavaDerivs Expression, edCondOrExpr :: Result JavaDerivs Expression, edCondExpr :: Result JavaDerivs Expression, edAssignExpr :: Result JavaDerivs Expression, edExpression :: Result JavaDerivs Expression } data StmtDerivs = StmtDerivs { sdCatchClause :: Result JavaDerivs CatchClause, sdSwitchGroup :: Result JavaDerivs SwitchGroup, sdForInit :: Result JavaDerivs ForInit, sdStatement :: Result JavaDerivs Statement, sdBlockStmt :: Result JavaDerivs Statement, sdBlock :: Result JavaDerivs [Statement] } data DeclDerivs = DeclDerivs { ddModifier :: Result JavaDerivs Modifier, ddDeclType :: Result JavaDerivs DeclType, ddFormalParam :: Result JavaDerivs FormalParam, ddFormalParams :: Result JavaDerivs [FormalParam], ddDeclarator :: Result JavaDerivs Declarator, ddDeclaration :: Result JavaDerivs Declaration, ddInitializer :: Result JavaDerivs Initializer, ddArrayInit :: Result JavaDerivs [Initializer], ddImportDecl :: Result JavaDerivs ImportDecl, ddCompUnit :: Result JavaDerivs CompUnit } instance Derivs JavaDerivs where dvPos d = cdPos d dvChar d = cdChar d whitespace = Parser (tdWhitespace . cdTok) word = Parser (tdWord . cdTok) sym = Parser (tdSym . cdTok) hexLit = Parser (tdHexLit . cdTok) octLit = Parser (tdOctLit . cdTok) decLit = Parser (tdDecLit . cdTok) floatSize = Parser (tdFloatSize . cdTok) floatExp = Parser (tdFloatExp . cdTok) floatLit = Parser (tdFloatLit . cdTok) charLit = Parser (tdCharLit . cdTok) stringLit = Parser (tdStringLit . cdTok) token = Parser (tdToken . cdTok) parExpr = Parser (edParExpr . cdExpr) primExpr = Parser (edPrimExpr . cdExpr) postfixExpr = Parser (edPostfixExpr . cdExpr) prefixExpr = Parser (edPrefixExpr . cdExpr) multExpr = Parser (edMultExpr . cdExpr) addExpr = Parser (edAddExpr . cdExpr) shiftExpr = Parser (edShiftExpr . cdExpr) relExpr = Parser (edRelExpr . cdExpr) eqExpr = Parser (edEqExpr . cdExpr) andExpr = Parser (edAndExpr . cdExpr) xorExpr = Parser (edXorExpr . cdExpr) orExpr = Parser (edOrExpr . cdExpr) condAndExpr = Parser (edCondAndExpr . cdExpr) condOrExpr = Parser (edCondOrExpr . cdExpr) condExpr = Parser (edCondExpr . cdExpr) assignExpr = Parser (edAssignExpr . cdExpr) expression = Parser (edExpression . cdExpr) catchClause = Parser (sdCatchClause . cdStmt) switchGroup = Parser (sdSwitchGroup . cdStmt) forInit = Parser (sdForInit . cdStmt) statement = Parser (sdStatement . cdStmt) blockStmt = Parser (sdBlockStmt . cdStmt) block = Parser (sdBlock . cdStmt) modifier = Parser (ddModifier . cdDecl) declType = Parser (ddDeclType . cdDecl) formalParam = Parser (ddFormalParam . cdDecl) formalParams = Parser (ddFormalParams. cdDecl) declarator = Parser (ddDeclarator . cdDecl) declaration = Parser (ddDeclaration . cdDecl) initializer = Parser (ddInitializer . cdDecl) arrayInit = Parser (ddArrayInit . cdDecl) importDecl = Parser (ddImportDecl . cdDecl) compUnit = Parser (ddCompUnit . cdDecl) -------------------- Lexical Structure -------------------- lineTerminator :: Parser JavaDerivs String lineTerminator = (do string "\r\n") </> string "\r" </> string "\n" -- Whitespace spaceChar :: Parser JavaDerivs () spaceChar = do satisfy anyChar isSpace return () traditionalComment :: Parser JavaDerivs () traditionalComment = do string "/*" many (do notFollowedBy (string "*/"); anyChar) string "*/" return () endOfLineComment :: Parser JavaDerivs () endOfLineComment = do string "//" many (do notFollowedBy lineTerminator; anyChar) lineTerminator return () pWhitespace :: JavaDerivs -> Result JavaDerivs () Parser pWhitespace = do many (spaceChar </> traditionalComment </> endOfLineComment) return () -- Keywords and identifiers keywords = [ "abstract", "boolean", "break", "byte", "case", "catch", "char", "class", "const", "continue", "default", "do", "double", "else", "extends", "final", "finally", "float", "for", "goto", "if", "implements", "import", "instanceof", "int", "interface", "long", "native", "new", "package", "private", "protected", "public", "return", "short", "static", "strictfp", "super", "switch", "synchronized", "this", "throw", "throws", "transient", "try", "void", "volatile", "while" ] symbols = [ ">>>=", ">>=", "<<=", ">>>", "+=", "-=", "*=", "/=", "%=", "&=", "^=", "|=", ">>", "<<", "++", "--", "&&", "||", "<=", ">=", "==", "!=", ";", "{", "}", ",", ":", "=", "(", ")", "[", "]", ".", "&", "!", "~", "-", "+", "*", "/", "%", "<", ">", "^", "|", "?" ] isIdentStart c = isAlpha c || c == '_' isIdentCont c = isIdentStart c || isDigit c pWord :: JavaDerivs -> Result JavaDerivs Token Parser pWord = do c <- satisfy anyChar isIdentStart cs <- many (satisfy anyChar isIdentCont) whitespace let w = c : cs case w of -- keyword-literals "null" -> return (TokNull) "true" -> return (TokBool True) "false" -> return (TokBool False) _ -> if w `elem` keywords then return (TokKeyword w) else return (TokIdent w) pSym :: JavaDerivs -> Result JavaDerivs Token Parser pSym = do s <- stringFrom symbols whitespace return (TokSymbol s) -- Integer literals pHexLit :: JavaDerivs -> Result JavaDerivs Token Parser pHexLit = do char '0' oneOf "xX" digits <- many1 hexDigit l <- (do oneOf "lL"; return True) </> return False whitespace let f v c = v * 16 + toInteger (digitToInt c) val = foldl f 0 digits return (TokInt val l) scanOct :: String -> Integer scanOct digits = foldl f 0 digits where f v c = v * 8 + toInteger (digitToInt c) pOctLit :: JavaDerivs -> Result JavaDerivs Token Parser pOctLit = do char '0' digits <- many1 digit l <- (do oneOf "lL"; return True) </> return False whitespace let bad c = digitToInt c >= 8 if any bad digits -- so is this cute or what? :) then fail "non-octal digits in octal constant" else return (TokInt (scanOct digits) l) scanDec :: String -> Integer scanDec digits = foldl f 0 digits where f v c = v * 10 + toInteger (digitToInt c) pDecLit :: JavaDerivs -> Result JavaDerivs Token Parser pDecLit = do digits <- many1 digit l <- (do oneOf "lL"; return True) </> return False whitespace return (TokInt (scanDec digits) l) -- Floating-point literals pFloatSize :: JavaDerivs -> Result JavaDerivs Bool Parser pFloatSize = (do { s <- oneOf "dD"; return True }) </> (do { s <- oneOf "fF"; return False }) pFloatExp :: JavaDerivs -> Result JavaDerivs Integer Parser pFloatExp = do oneOf "eE" neg <- plusminus digits <- many1 digit let f v c = v * 10 + toInteger (digitToInt c) val = foldl f 0 digits return (if neg then -val else val) where plusminus = (do { char '+'; return False }) </> (do { char '-'; return True }) </> return False pFloatLit :: JavaDerivs -> Result JavaDerivs Token Parser pFloatLit = (do i <- many1 digit char '.' f <- many digit e <- floatExp </> return 0 s <- floatSize </> return False whitespace return (mkfloat (scanDec i) f e s)) </> (do char '.' f <- many1 digit e <- floatExp </> return 0 s <- floatSize </> return False whitespace return (mkfloat 0 f e s)) </> (do i <- many1 digit e <- floatExp s <- floatSize </> return False whitespace return (mkfloat (scanDec i) [] e s)) </> (do i <- many1 digit e <- floatExp </> return 0 s <- floatSize whitespace return (mkfloat (scanDec i) [] e s)) where mkfloat :: Integer -> String -> Integer -> Bool -> Token mkfloat i f e s = TokFloat (scanfrac i f * 10.0**(fromInteger e)) s scanfrac :: Integer -> String -> Double scanfrac i [] = fromInteger i scanfrac i (c:cs) = scanfrac (i * 10 + toInteger (digitToInt c)) cs / 10.0 -- Character and string literals quotedChar quote = (do char '\\' c <- anyChar case c of 'n' -> return '\n' 'r' -> return '\r' 't' -> return '\t' 'v' -> return '\v' 'f' -> return '\f' '\\' -> return '\\' '\'' -> return '\'' '\"' -> return '\"' -- XXX octal characters, other escapes _ -> fail "invalid escape sequence") </> satisfy anyChar (\c -> c /= quote) pCharLit :: JavaDerivs -> Result JavaDerivs Token Parser pCharLit = (do char '\'' c <- quotedChar '\'' char '\'' whitespace return (TokChar c)) pStringLit :: JavaDerivs -> Result JavaDerivs Token Parser pStringLit = (do char '"' s <- many (quotedChar '"') char '"' whitespace return (TokString s)) -- Token tie-up pToken :: JavaDerivs -> Result JavaDerivs Token Parser pToken = word </> sym </> charLit </> stringLit </> floatLit </> hexLit </> octLit </> decLit keyword :: String -> Parser JavaDerivs String keyword w = (do t <- token case t of (TokKeyword w') -> if w' == w then return w else fail "" _ -> fail "") <?!> show w symbol :: String -> Parser JavaDerivs String symbol s = (do t <- token case t of (TokSymbol s') -> if s' == s then return s else fail "" _ -> fail "") <?!> show s identifier :: Parser JavaDerivs Identifier identifier = (do t <- token case t of TokIdent s -> return s _ -> fail "") <?!> "identifier" -------------------- Expressions -------------------- arguments = do symbol "("; eargs <- sepBy expression (symbol ","); symbol ")" return eargs pParExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pParExpr = do symbol "("; e <- expression; symbol ")"; return e pPrimExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pPrimExpr = (do t <- token case t of TokIdent s -> return (ExpIdent s) TokInt i l -> return (ExpLiteral t) TokFloat f l -> return (ExpLiteral t) TokChar c -> return (ExpLiteral t) TokString c -> return (ExpLiteral t) TokBool b -> return (ExpLiteral t) TokNull -> return (ExpLiteral t) _ -> fail ("found " ++ show t)) </> parExpr </> (do keyword "this"; return ExpThis) </> (do keyword "super"; return ExpSuper) </> (do keyword "class"; return ExpClass) </> (do keyword "new" -- class creator ty <- declType args <- arguments body <- optional classBody return (ExpNewClass ty args body)) </> (do keyword "new" -- array creator ty <- declType dims <- many (do symbol "[" -- XXX many1 e <- optional expression symbol "]" return e) init <- optional arrayInit return (ExpNewArray ty dims init)) <?> "primary expression" suffix :: Parser JavaDerivs (Expression -> Expression) suffix = (do symbol "["; eidx <- optional expression; symbol "]" return (\ebase -> ExpArray ebase eidx)) </> (do eargs <- arguments return (\efunc -> ExpCall efunc eargs)) </> (do symbol "."; eitem <- primExpr return (\econtext -> ExpSelect econtext eitem)) </> (do op <- (symbol "++" </> symbol "--") return (\e -> ExpPostfix op e)) pPostfixExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pPostfixExpr = (do exp <- primExpr suffixes <- many suffix return (foldl (\e s -> s e) exp suffixes)) <?> "postfix expression" pPrefixExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pPrefixExpr = (do op <- (symbol "++" </> symbol "--" </> symbol "+" </> symbol "-" </> symbol "~" </> symbol "!") exp <- prefixExpr return (ExpPostfix op exp)) </> (do symbol "("; t <- declType; symbol ")"; e <- prefixExpr return (ExpCast t e)) </> postfixExpr <?> "prefix expression" pMultExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pMultExpr = chainl1 prefixExpr (do op <- (symbol "*" </> symbol "/" </> symbol "%") return (\l r -> ExpBinary op l r)) pAddExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pAddExpr = chainl1 multExpr (do op <- (symbol "+" </> symbol "-") return (\l r -> ExpBinary op l r)) pShiftExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pShiftExpr = chainl1 addExpr (do op <- (symbol "<<" </> symbol ">>" </> symbol ">>>") return (\l r -> ExpBinary op l r)) pRelExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pRelExpr = do l <- shiftExpr; suffix l where suffix l = (do op <- (symbol "<=" </> symbol ">=" </> symbol "<" </> symbol ">") r <- shiftExpr suffix (ExpBinary op l r)) </> (do keyword "instanceof" t <- declType suffix (ExpInstanceof l t)) </> return l pEqExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pEqExpr = chainl1 relExpr (do op <- (symbol "==" </> symbol "!=") return (\l r -> ExpBinary op l r)) pAndExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pAndExpr = chainl1 eqExpr (do op <- symbol "&" return (\l r -> ExpBinary op l r)) pXorExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pXorExpr = chainl1 andExpr (do op <- symbol "^" return (\l r -> ExpBinary op l r)) pOrExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pOrExpr = chainl1 xorExpr (do op <- symbol "|" return (\l r -> ExpBinary op l r)) pCondAndExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pCondAndExpr = chainl1 orExpr (do op <- symbol "&&" return (\l r -> ExpBinary op l r)) pCondOrExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pCondOrExpr = chainl1 condAndExpr (do op <- symbol "||" return (\l r -> ExpBinary op l r)) pCondExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pCondExpr = (do c <- condOrExpr; symbol "?" t <- expression; symbol ":" f <- condExpr return (ExpCond c t f)) </> condOrExpr <?> "conditional expression" pAssignExpr :: JavaDerivs -> Result JavaDerivs Expression Parser pAssignExpr = (do l <- prefixExpr op <- (symbol "=" </> symbol "*=" </> symbol "/=" </> symbol "%=" </> symbol "+=" </> symbol "-=" </> symbol "<<=" </> symbol ">>=" </> symbol ">>>=" </> symbol "&=" </> symbol "^=" </> symbol "|=") r <- assignExpr return (ExpBinary op l r)) </> condExpr <?> "assignment expression" pExpression :: JavaDerivs -> Result JavaDerivs Expression pExpression = pAssignExpr -------------------- Statements -------------------- pCatchClause :: JavaDerivs -> Result JavaDerivs CatchClause Parser pCatchClause = (do keyword "catch" symbol "(" p <- formalParam symbol ")" b <- block return (p,b)) pSwitchGroup :: JavaDerivs -> Result JavaDerivs SwitchGroup Parser pSwitchGroup = (do keyword "case"; e <- expression; symbol ":" s <- many blockStmt return (SwCase e s)) </> (do keyword "default"; symbol ":" s <- many blockStmt return (SwDefault s)) <?> "switch statement group" pForInit :: JavaDerivs -> Result JavaDerivs ForInit Parser pForInit = (do f <- (do keyword "final"; return True) </> (return False) t <- declType d <- sepBy1 declarator (symbol ",") return (FiDecl f t d)) </> (do es <- sepBy expression (symbol ",") return (FiExpr es)) </> return FiNone pStatement :: JavaDerivs -> Result JavaDerivs Statement Parser pStatement = (do b <- block return (StBlock b)) </> (do keyword "if"; e <- parExpr t <- statement; keyword "else"; f <- statement return (StIf e t (Just f))) </> (do keyword "if"; e <- parExpr t <- statement return (StIf e t Nothing)) </> (do keyword "for"; symbol "(" i <- forInit; symbol ";" c <- optional expression; symbol ";" n <- sepBy expression (symbol ","); symbol ")" b <- statement return (StFor i c n b)) </> (do keyword "while"; e <- parExpr b <- statement return (StWhile e b)) </> (do keyword "do"; b <- statement; keyword "while" e <- parExpr; symbol ";" return (StDo b e)) </> (do keyword "try" b <- block c <- many catchClause f <- optional (do keyword "finally"; block) return (StTry b c f)) </> (do keyword "switch"; e <- parExpr symbol "{"; b <- many switchGroup; symbol "}" return (StSwitch e b)) </> (do keyword "synchronized"; e <- parExpr; b <- block return (StSynch e b)) </> (do keyword "return"; e <- optional expression; symbol ";" return (StReturn e)) </> (do keyword "throw"; e <- expression; symbol ";" return (StThrow e)) </> (do keyword "break"; i <- optional identifier; symbol ";" return (StBreak i)) </> (do keyword "continue"; i <- optional identifier; symbol ";" return (StContinue i)) </> (do i <- identifier; symbol ":"; s <- statement return (StLabel i s)) </> (do e <- expression; symbol ";" return (StExpr e)) </> (do symbol ";" return (StNull)) <?> "statement" pBlockStmt :: JavaDerivs -> Result JavaDerivs Statement Parser pBlockStmt = (do d <- declaration return (StDecl d)) </> statement <?> "block statement" pBlock :: JavaDerivs -> Result JavaDerivs [Statement] Parser pBlock = (do symbol "{"; s <- many blockStmt; symbol "}"; return s) -------------------- Declarations -------------------- qualName = sepBy1 identifier (symbol ".") pModifier :: JavaDerivs -> Result JavaDerivs Modifier Parser pModifier = keyword "public" </> keyword "protected" </> keyword "private" </> keyword "static" </> keyword "abstract" </> keyword "final" </> keyword "native" </> keyword "synchronized" </> keyword "transient" </> keyword "volatile" </> keyword "strictfp" pDeclType :: JavaDerivs -> Result JavaDerivs DeclType Parser pDeclType = (do s <- keyword "byte" </> keyword "short" </> keyword "char" </> keyword "int" </> keyword "long" </> keyword "float" </> keyword "double" </> keyword "boolean" b <- many (do symbol "["; symbol "]") return (DtBasic s (length b))) </> (do i <- qualName b <- many (do symbol "["; symbol "]") return (DtIdent i (length b))) <?> "type" pFormalParam :: JavaDerivs -> Result JavaDerivs FormalParam Parser pFormalParam = do f <- (do keyword "final"; return True) </> (return False) t <- declType i <- identifier a <- many (do symbol "["; symbol "]") return (f, t, i, length a) pFormalParams :: JavaDerivs -> Result JavaDerivs [FormalParam] Parser pFormalParams = do symbol "(" fs <- sepBy formalParam (symbol ",") symbol ")" return fs pDeclarator :: JavaDerivs -> Result JavaDerivs Declarator Parser pDeclarator = (do ident <- identifier bkts <- many (do symbol "["; symbol "]") init <- optional (do symbol "="; initializer) return (ident, length bkts, init)) <?> "declarator" classBody = do symbol "{"; ds <- many declaration; symbol "}"; return ds pDeclaration :: JavaDerivs -> Result JavaDerivs Declaration Parser pDeclaration = (do m <- many modifier -- variable declaration t <- declType d <- sepBy declarator (symbol ",") symbol ";" return (DeclSimple m t d)) </> (do m <- many modifier -- method declaration t <- (do t <- declType; return (Just t)) </> (do keyword "void"; return Nothing) i <- identifier p <- formalParams a <- many (do symbol "["; symbol "]") th <- throws b <- (do b <- block; return (Just b)) </> (do symbol ";"; return Nothing) return (DeclMethod m t i p (length a) th b)) </> (do m <- many modifier -- constructor declaration i <- identifier p <- formalParams th <- throws b <- block return (DeclConstructor m i p th b)) </> (do m <- many modifier -- class declaration keyword "class" i <- identifier ext <- optional (do keyword "extends"; declType) imp <- (do keyword "implements"; sepBy1 declType (symbol ",")) </> return [] ds <- classBody return (DeclClass m i ext imp ds)) </> (do m <- many modifier -- interface declaration keyword "interface" i <- identifier ext <- (do keyword "extends"; sepBy1 declType (symbol ",")) </> return [] ds <- classBody return (DeclInterface m i ext ds)) </> (do st <- (do keyword "static"; return True) </> (return False) b <- block return (DeclBlock st b)) <?> "declaration" where throws = (do keyword "throws"; sepBy1 qualName (symbol ",")) </> return [] pInitializer :: JavaDerivs -> Result JavaDerivs Initializer Parser pInitializer = (do inits <- arrayInit return (IniList inits)) </> (do e <- expression return (IniExpr e)) <?> "variable initializer" pArrayInit :: JavaDerivs -> Result JavaDerivs [Initializer] Parser pArrayInit = (do symbol "{" inits <- sepEndBy initializer (symbol ",") symbol "}" return inits) <?> "array initializer" pImportDecl :: JavaDerivs -> Result JavaDerivs ImportDecl Parser pImportDecl = (do keyword "import" name <- qualName all <- (do symbol "."; symbol "*"; return True) </> return False symbol ";" return (name, all)) <?> "import declaration" pCompUnit :: JavaDerivs -> Result JavaDerivs CompUnit Parser pCompUnit = do whitespace p <- (do keyword "package"; n <- qualName; symbol ";" return (Just n)) </> return Nothing i <- many importDecl t <- many declaration notFollowedBy anyChar return (p, i, t) -------------------- Recursive Tie-Up -------------------- pTokDerivs dvs = TokDerivs whitespace word sym hexlit octlit declit floatsize floatexp floatlit charlit stringlit token where whitespace = pWhitespace dvs word = pWord dvs sym = pSym dvs hexlit = pHexLit dvs octlit = pOctLit dvs declit = pDecLit dvs floatsize = pFloatSize dvs floatexp = pFloatExp dvs floatlit = pFloatLit dvs charlit = pCharLit dvs stringlit = pStringLit dvs token = pToken dvs pExprDerivs dvs = ExprDerivs parexpr primexpr postfixexpr prefixexpr multexpr addexpr shiftexpr relexpr eqexpr andexpr xorexpr orexpr logandexpr logorexpr condexpr assignexpr expr where parexpr = pParExpr dvs primexpr = pPrimExpr dvs postfixexpr = pPostfixExpr dvs prefixexpr = pPrefixExpr dvs multexpr = pMultExpr dvs addexpr = pAddExpr dvs shiftexpr = pShiftExpr dvs relexpr = pRelExpr dvs eqexpr = pEqExpr dvs andexpr = pAndExpr dvs xorexpr = pXorExpr dvs orexpr = pOrExpr dvs logandexpr = pCondAndExpr dvs logorexpr = pCondOrExpr dvs condexpr = pCondExpr dvs assignexpr = pAssignExpr dvs expr = pExpression dvs pStmtDerivs dvs = StmtDerivs catch switch forinit statement blockstmt block where catch = pCatchClause dvs switch = pSwitchGroup dvs forinit = pForInit dvs statement = pStatement dvs blockstmt = pBlockStmt dvs block = pBlock dvs pDeclDerivs dvs = DeclDerivs modifier decltype formal formals declarator declaration initializer arrayinit importdecl compunit where modifier = pModifier dvs decltype = pDeclType dvs formal = pFormalParam dvs formals = pFormalParams dvs declarator = pDeclarator dvs declaration = pDeclaration dvs initializer = pInitializer dvs arrayinit = pArrayInit dvs importdecl = pImportDecl dvs compunit = pCompUnit dvs javaDerivs pos text = dvs where dvs = JavaDerivs pos text chr tok expr stmt decl chr = case text of [] -> NoParse (eofError dvs) (c:cs) -> Parsed c (javaDerivs (nextPos pos c) cs) (nullError dvs) tok = pTokDerivs dvs expr = pExprDerivs dvs stmt = pStmtDerivs dvs decl = pDeclDerivs dvs javaParse :: String -> String -> JavaDerivs javaParse name text = javaDerivs (Pos name 1 1) text javaParseFile :: FilePath -> IO CompUnit javaParseFile name = do text <- readFile name let text' = javaPrep text derivs = javaParse name text' case ddCompUnit (cdDecl derivs) of Parsed cu _ _ -> return cu NoParse e -> fail (show e)Java.pappy Скачать
-- Pappy packrat parser specification for the Java language version 1.1 parser Java: { import Char import System import Numeric -- Abstract syntax tree data types type Identifier = String type Name = [Identifier] data Literal = LitInt Integer | LitLong Integer | LitFloat Float | LitDouble Double | LitChar Char | LitString String | LitBool Bool | LitNull data Expression = ExpLiteral Literal | ExpIdent Identifier | ExpPrefix String Expression | ExpPostfix String Expression | ExpBinary String Expression Expression | ExpSelect Expression Expression | ExpInstanceof Expression DeclType | ExpNewClass [Identifier] [Expression] (Maybe [Declaration]) | ExpNewArray DeclType [Expression] Int | ExpNewArrayInit DeclType Int [Initializer] | ExpCall Expression [Expression] | ExpArray Expression (Maybe Expression) | ExpCast DeclType Expression | ExpCond Expression Expression Expression | ExpThis | ExpSuper | ExpDotClass Expression | ExpVoidClass data Modifier = ModPublic | ModProtected | ModPrivate | ModStatic | ModAbstract | ModFinal | ModNative | ModSynchronized | ModTransient | ModVolatile | ModStrictfp data DeclType = DtByte | DtShort | DtChar | DtInt | DtLong | DtFloat | DtDouble | DtBoolean | DtName [Identifier] | DtArray DeclType Int type Declarator = (Identifier, Int, Maybe Initializer) data Declaration = DeclSimple [Modifier] DeclType [Declarator] | DeclMethod [Modifier] (Maybe DeclType) Identifier [FormalParam] Int [Name] (Maybe [Statement]) | DeclConstructor [Modifier] Identifier [FormalParam] [Name] [Statement] | DeclClass [Modifier] Identifier (Maybe DeclType) [DeclType] [Declaration] | DeclInterface [Modifier] Identifier [DeclType] [Declaration] | DeclBlock Bool [Statement] -- formal parameter: "final" flag, type, parameter name, array dimension type FormalParam = (Bool, DeclType, Identifier, Int) data Initializer = IniExpr Expression | IniList [Initializer] data SwitchGroup = SwCase Expression [Statement] | SwDefault [Statement] data ForInit = FiExpr [Expression] | FiDecl Bool DeclType [Declarator] data Statement = StLabel Identifier Statement | StCase Expression Statement | StDefault Statement | StDecl Declaration | StExpr Expression | StBlock [Statement] | StIf Expression Statement (Maybe Statement) | StSwitch Expression [SwitchGroup] | StWhile Expression Statement | StDo Statement Expression | StFor ForInit (Maybe Expression) [Expression] Statement | StTry [Statement] [CatchClause] (Maybe [Statement]) | StSynch Expression [Statement] | StContinue (Maybe Identifier) | StBreak (Maybe Identifier) | StReturn (Maybe Expression) | StThrow Expression | StNull -- catch clause: type CatchClause = (FormalParam, [Statement]) -- import declaration: qualified name, ".*" flag type ImportDecl = (Name, Bool) -- compilation unit: package name, import declarations, type declarations type CompUnit = (Maybe Name, [ImportDecl], [Declaration]) -- List of Java's reserved words, used in Keyword below keywords = [ "abstract", "boolean", "break", "byte", "case", "catch", "char", "class", "const", "continue", "default", "do", "double", "else", "extends", "final", "finally", "float", "for", "goto", "if", "implements", "import", "instanceof", "int", "interface", "long", "native", "new", "package", "private", "protected", "public", "return", "short", "static", "strictfp", "super", "switch", "synchronized", "this", "throw", "throws", "transient", "try", "void", "volatile", "while" ] } top CompilationUnit -------------------- Lexical Structure -------------------- -- Whitespace and comments LineTerminator :: Char = '\n' -- \r's have been canonicalized by javaPrep InputCharacter :: Char = !LineTerminator c:Char -> c Spacing :: {()} = Space* -> {()} Space :: {()} = WhiteSpace -> {()} / Comment -> {()} WhiteSpace :: Char = ' ' / '\t' / '\f' / LineTerminator Comment :: {()} = TraditionalComment / EndOfLineComment TraditionalComment :: {()} = "/*" (!"*/" c:Char -> c)* "*/" -> {()} EndOfLineComment :: {()} = "//" (!LineTerminator c:Char -> c)* '\n' -> {()} -- Keywords and identifiers Identifier :: Identifier = !Keyword !BooleanLiteral !NullLiteral s:Word -> s Keyword :: String = s:Word &{s `elem` keywords} -> s Word :: String = c:JavaLetter cs:JavaLetterOrDigit* Spacing -> {c : cs} JavaLetter :: Char = c:Char &{isAlpha c} -> c / '_' / '$' JavaLetterOrDigit :: Char = c:Char &{isAlphaNum c} -> c / '_' / '$' -- Symbols (operators and punctuation) Sym :: String = s:SymChars Spacing -> s SymChars :: String = ">>>=" / ">>=" / "<<=" / ">>>" / ">>" / "<<" / "+=" / "-=" / "*=" / "/=" / "%=" / "&=" / "^=" / "|=" / "++" / "--" / "&&" / "||" / "<=" / ">=" / "==" / "!=" / ";" / ":" / "," / "." / "{" / "}" / "(" / ")" / "[" / "]" / "!" / "~" / "+" / "-" / "*" / "/" / "%" / "<" / ">" / "=" / "&" / "^" / "|" / "?" -- Literals Literal :: Literal = FloatingPointLiteral / IntegerLiteral / BooleanLiteral / CharacterLiteral / StringLiteral / NullLiteral -- Integer literals IntegerLiteral :: Literal = v:HexNumeral t:IntegerTypeSuffixOpt Spacing -> {t v} / v:OctalNumeral t:IntegerTypeSuffixOpt Spacing -> {t v} / v:DecimalNumeral t:IntegerTypeSuffixOpt Spacing -> {t v} IntegerTypeSuffixOpt :: {Integer -> Literal} = ('l' / 'L') -> {LitLong} / -> {LitInt} HexNumeral :: Integer = ("0x" / "0X") v:HexDigits -> {v} HexDigits :: Integer = v:HexDigits d:HexDigit -> {v * 16 + toInteger d} / d:HexDigit -> {toInteger d} HexDigit :: Int = c:Char &{isHexDigit c} -> {digitToInt c} OctalNumeral :: Integer = '0' v:OctalDigits -> {v} OctalDigits :: Integer = v:OctalDigits d:OctalDigit -> {v * 8 + toInteger d} / d:OctalDigit -> {toInteger d} OctalDigit :: Int = c:Char &{isOctDigit c} -> {digitToInt c} DecimalNumeral :: Integer = v:Digits -> {v} Digits :: Integer = v:Digits d:Digit -> {v * 10 + toInteger d} / d:Digit -> {toInteger d} Digit :: Int = c:Char &{isDigit c} -> {digitToInt c} -- Floating point FloatingPointLiteral :: Literal = m:Digits '.' f:FractionPartOpt e:ExponentPartOpt t:FloatTypeSuffixOpt Spacing -> {t (fromRational ((fromInteger m + f) * 10.0 ^^ e))} / '.' f:FractionPart e:ExponentPartOpt t:FloatTypeSuffixOpt Spacing -> {t (fromRational (f * 10.0 ^^ e))} / m:Digits e:ExponentPart t:FloatTypeSuffixOpt Spacing -> {t (fromInteger m * 10.0 ^^ e)} / m:Digits e:ExponentPartOpt t:FloatTypeSuffix Spacing -> {t (fromInteger m * 10.0 ^^ e)} FractionPartOpt :: Rational = f:FractionPart -> {f} / -> {0.0} FractionPart :: Rational = d:Digit -> { toEnum d / 10.0 } / d:Digit f:FractionPart -> { (f + toEnum d) / 10.0 } ExponentPartOpt :: Integer = e:ExponentPart -> { e } / -> { 0 } ExponentPart :: Integer = ('e' / 'E') '-' e:Digits -> { -e } / ('e' / 'E') '+'? e:Digits -> { e } FloatTypeSuffixOpt :: {Rational -> Literal} = t:FloatTypeSuffix -> t / -> {\r -> LitDouble (fromRational r)} FloatTypeSuffix :: {Rational -> Literal} = ('f' / 'F') -> {\r -> LitFloat (fromRational r)} / ('d' / 'D') -> {\r -> LitDouble (fromRational r)} -- Boolean literals BooleanLiteral :: Literal = "true":Word -> { LitBool True } / "false":Word -> { LitBool False } -- Character and string literals CharacterLiteral :: Literal = "'" c:SingleCharacter "'" Spacing -> { LitChar c } / "'" c:EscapeSequence "'" Spacing -> { LitChar c } SingleCharacter :: Char = !"'" !"\\" c:InputCharacter -> c StringLiteral :: Literal = '"' s:StringCharacter* '"' Spacing -> { LitString s } StringCharacter :: Char = !'"' !'\\' c:InputCharacter -> c / c:EscapeSequence -> c EscapeSequence :: Char = '\\' 'b' -> {'\b'} / '\\' 't' -> {'\t'} / '\\' 'n' -> {'\n'} / '\\' 'f' -> {'\f'} / '\\' 'r' -> {'\r'} / '\\' '"' -> {'\"'} / '\\' "'" -> {'\''} / '\\' '\\' -> {'\\'} / c:OctalEscape -> c OctalEscape :: Char = '\\' x:ZeroToThree y:OctalDigit z:OctalDigit -> {chr (x*8*8 + y*8 + z)} / '\\' y:OctalDigit z:OctalDigit -> {chr (y*8 + z)} / '\\' z:OctalDigit -> {chr (z)} ZeroToThree :: Int = '0' -> {0} / '1' -> {1} / '2' -> {2} / '3' -> {3} -- Null literals NullLiteral :: Literal = "null":Word -> { LitNull } -------------------- Types -------------------- TypeSpec :: DeclType = t:TypeName d:Dims -> {DtArray t d} / TypeName TypeName :: DeclType = PrimitiveType / n:QualifiedName -> {DtName n} PrimitiveType :: DeclType = "byte":Word -> {DtByte} / "short":Word -> {DtShort} / "char":Word -> {DtChar} / "int":Word -> {DtInt} / "long":Word -> {DtLong} / "float":Word -> {DtFloat} / "double":Word -> {DtDouble} / "boolean":Word -> {DtBoolean} QualifiedName :: {[Identifier]} = i:Identifier is:(".":Sym i:Identifier -> i)* -> {i : is} DimsOpt :: Int = d:Dims -> d / -> {0} Dims :: Int = "[":Sym "]":Sym d:Dims -> {d+1} / "[":Sym "]":Sym -> {1} -------------------- Expressions -------------------- Expression :: Expression = l:CondExpr op:AssignmentOperator r:Expression -> {ExpBinary op l r} / CondExpr AssignmentOperator :: String = "=":Sym -> {"="} / "+=":Sym -> {"+="} / "-=":Sym -> {"-="} / "*=":Sym -> {"*="} / "/=":Sym -> {"/="} / "%=":Sym -> {"%="} / "&=":Sym -> {"&="} / "|=":Sym -> {"|="} / "^=":Sym -> {"^="} / "<<=":Sym -> {"<<="} / ">>=":Sym -> {">>="} / ">>>=":Sym -> {">>>="} CondExpr :: Expression = c:CondOrExpr "?":Sym t:Expression ":":Sym f:CondExpr -> {ExpCond c t f} / CondOrExpr CondOrExpr :: Expression = l:CondOrExpr "||":Sym r:CondAndExpr -> {ExpBinary "||" l r} / CondAndExpr CondAndExpr :: Expression = l:CondAndExpr "&&":Sym r:OrExpr -> {ExpBinary "&&" l r} / OrExpr OrExpr :: Expression = l:OrExpr "|":Sym r:XorExpr -> {ExpBinary "|" l r} / XorExpr XorExpr :: Expression = l:XorExpr "^":Sym r:AndExpr -> {ExpBinary "^" l r} / AndExpr AndExpr :: Expression = l:AndExpr "&":Sym r:EqExpr -> {ExpBinary "&" l r} / EqExpr EqExpr :: Expression = l:EqExpr "==":Sym r:RelExpr -> {ExpBinary "==" l r} / l:EqExpr "!=":Sym r:RelExpr -> {ExpBinary "!=" l r} / RelExpr RelExpr :: Expression = l:RelExpr "<=":Sym r:ShiftExpr -> {ExpBinary "<=" l r} / l:RelExpr ">=":Sym r:ShiftExpr -> {ExpBinary ">=" l r} / l:RelExpr "<":Sym r:ShiftExpr -> {ExpBinary "<" l r} / l:RelExpr ">":Sym r:ShiftExpr -> {ExpBinary ">" l r} / l:RelExpr "instanceof":Word t:TypeSpec -> {ExpInstanceof l t} / ShiftExpr ShiftExpr :: Expression = l:ShiftExpr "<<":Sym r:AddExpr -> {ExpBinary "<<" l r} / l:ShiftExpr ">>":Sym r:AddExpr -> {ExpBinary ">>" l r} / l:ShiftExpr ">>>":Sym r:AddExpr -> {ExpBinary ">>>" l r} / AddExpr AddExpr :: Expression = l:AddExpr "+":Sym r:MultExpr -> {ExpBinary "+" l r} / l:AddExpr "-":Sym r:MultExpr -> {ExpBinary "-" l r} / MultExpr MultExpr :: Expression = l:MultExpr "*":Sym r:UnaryExpr -> {ExpBinary "*" l r} / l:MultExpr "/":Sym r:UnaryExpr -> {ExpBinary "/" l r} / l:MultExpr "%":Sym r:UnaryExpr -> {ExpBinary "%" l r} / UnaryExpr UnaryExpr :: Expression = "++":Sym e:UnaryExpr -> {ExpPrefix "++" e} / "--":Sym e:UnaryExpr -> {ExpPrefix "--" e} / "+":Sym e:UnaryExpr -> {ExpPrefix "+" e} / "-":Sym e:UnaryExpr -> {ExpPrefix "-" e} / UnaryExprNotPlusMinus UnaryExprNotPlusMinus :: Expression = "~":Sym e:UnaryExpr -> {ExpPrefix "~" e} / "!":Sym e:UnaryExpr -> {ExpPrefix "!" e} / "(":Sym t:TypeName d:Dims ")":Sym e:UnaryExpr -> {ExpCast (DtArray t d) e} / "(":Sym t:PrimitiveType ")":Sym e:UnaryExpr -> {ExpCast t e} / "(":Sym t:TypeName ")":Sym e:UnaryExprNotPlusMinus -> {ExpCast t e} / PostfixExpr PostfixExpr :: Expression = l:PostfixExpr "[":Sym r:Expression? "]":Sym -> {ExpArray l r} / l:PostfixExpr a:Arguments -> {ExpCall l a} / l:PostfixExpr ".":Sym r:PrimExpr -> {ExpSelect l r} / l:PostfixExpr ".":Sym "class":Word -> {ExpDotClass l} / l:PostfixExpr "++":Sym -> {ExpPostfix "++" l} / l:PostfixExpr "--":Sym -> {ExpPostfix "--" l} / PrimExpr PrimExpr :: Expression = l:Literal -> {ExpLiteral l} / i:Identifier -> {ExpIdent i} / "(":Sym e:Expression ")":Sym -> e / "this":Word -> {ExpThis} / "super":Word -> {ExpSuper} / "new":Word n:QualifiedName a:Arguments b:ClassBody? -> {ExpNewClass n a b} / "new":Word t:TypeName de:DimExpr+ d:DimsOpt -> {ExpNewArray t de d} / "new":Word t:TypeName d:Dims i:ArrayInitializer -> {ExpNewArrayInit t d i} / "void":Word ".":Sym "class":Word -> {ExpVoidClass} Arguments :: {[Expression]} = "(":Sym e:Expression es:(",":Sym e:Expression -> e)* ")":Sym -> {e : es} / "(":Sym ")":Sym -> {[]} DimExpr :: Expression = "[":Sym e:Expression "]":Sym -> e ArrayInitializer :: {[Initializer]} = "{":Sym is:(i:Initializer ",":Sym -> i)* "}":Sym -> is / "{":Sym i:Initializer is:(",":Sym i:Initializer -> i)* "}":Sym -> {i : is} Initializer :: Initializer = ai:ArrayInitializer -> {IniList ai} / e:Expression -> {IniExpr e} -------------------- Statements -------------------- Block :: {[Statement]} = "{":Sym ss:BlockStatement* "}":Sym -> ss BlockStatement :: Statement = d:Declaration -> {StDecl d} / Statement Statement :: Statement = b:Block -> {StBlock b} / "if":Word e:ParExpr t:Statement f:("else":Word s:Statement -> s)? -> {StIf e t f} / "for":Word "(":Sym i:ForInitOpt ";":Sym c:Expression? ";":Sym n:ExpressionsOpt ")":Sym b:Statement -> {StFor i c n b} / "while":Word e:ParExpr b:Statement -> {StWhile e b} / "do":Word b:Statement "while":Word e:ParExpr ";":Sym -> {StDo b e} / "try":Word b:Block c:CatchClause* "finally":Word f:Block -> {StTry b c (Just f)} / "try":Word b:Block c:CatchClause+ -> {StTry b c Nothing} / "switch":Word e:ParExpr "{":Sym b:SwitchGroup* "}":Sym -> {StSwitch e b} / "synchronized":Word e:ParExpr b:Block -> {StSynch e b} / "return":Word e:Expression? ";":Sym -> {StReturn e} / "throw":Word e:Expression ";":Sym -> {StThrow e} / "break":Word i:Identifier? ";":Sym -> {StBreak i} / "continue":Word i:Identifier? ";":Sym -> {StContinue i} / l:Identifier ":":Sym s:Statement -> {StLabel l s} / e:Expression ";":Sym -> {StExpr e} / ";":Sym -> {StNull} ParExpr :: Expression = "(":Sym e:Expression ")":Sym -> e ForInitOpt :: ForInit = f:FinalOpt t:TypeSpec d:Declarators -> {FiDecl f t d} / e:ExpressionsOpt -> {FiExpr e} FinalOpt :: Bool = "final":Word -> {True} / -> {False} ExpressionsOpt :: {[Expression]} = e:Expression es:(",":Sym e:Expression -> e)* -> {e : es} / -> {[]} CatchClause :: CatchClause = "catch":Word "(":Sym p:FormalParam ")":Sym b:Block -> {(p, b)} SwitchGroup :: SwitchGroup = "case":Word e:Expression ":":Sym ss:BlockStatement* -> {SwCase e ss} / "default":Word ":":Sym ss:BlockStatement* -> {SwDefault ss} -------------------- Declarations -------------------- Modifier :: Modifier = "public":Word -> {ModPublic} / "protected":Word -> {ModProtected} / "private":Word -> {ModPrivate} / "static":Word -> {ModStatic} / "abstract":Word -> {ModAbstract} / "final":Word -> {ModFinal} / "native":Word -> {ModNative} / "synchronized":Word -> {ModSynchronized} / "transient":Word -> {ModTransient} / "volatile":Word -> {ModVolatile} / "strictfp":Word -> {ModStrictfp} FormalParam :: FormalParam = f:FinalOpt t:TypeSpec i:Identifier d:DimsOpt -> {(f, t, i, d)} FormalParams :: {[FormalParam]} = "(":Sym p:FormalParam ps:(",":Sym p:FormalParam -> p)* ")":Sym -> {p : ps} / "(":Sym ")":Sym -> {[]} Declarators :: {[Declarator]} = d:Declarator ds:(",":Sym d:Declarator -> d)* -> {d : ds} Declarator :: Declarator = id:Identifier dim:DimsOpt init:("=":Sym i:Initializer -> i)? -> {(id, dim, init)} ClassBody :: {[Declaration]} = "{":Sym ds:Declaration* "}":Sym -> ds Declaration :: Declaration = -- Variable declaration ms:Modifier* t:TypeSpec ds:Declarators ";":Sym -> {DeclSimple ms t ds} -- Method declaration / mods:Modifier* typ:(t:TypeSpec -> {Just t} / "void":Word -> {Nothing}) id:Identifier ps:FormalParams dim:DimsOpt th:ThrowsOpt body:(b:Block -> {Just b} / ";":Sym -> {Nothing}) -> {DeclMethod mods typ id ps dim th body} -- Constructor declaration / mods:Modifier* id:Identifier ps:FormalParams th:ThrowsOpt body:Block -> {DeclConstructor mods id ps th body} -- Class declaration / mods:Modifier* "class":Word id:Identifier ext:("extends":Word t:TypeSpec -> t)? imp:("implements":Word t:TypeSpec ts:(",":Sym t:TypeSpec -> t)* -> {t:ts} / -> {[]}) body:ClassBody -> {DeclClass mods id ext imp body} -- Interface declaration / mods:Modifier* "interface":Word id:Identifier ext:("extends":Word t:TypeSpec ts:(",":Sym t:TypeSpec -> t)* -> {t:ts} / -> {[]}) body:ClassBody -> {DeclInterface mods id ext body} -- Initialization block / st:("static":Word -> {True} / -> {False}) b:Block -> {DeclBlock st b} ThrowsOpt :: {[Name]} = "throws":Word n:QualifiedName ns:(",":Sym n:QualifiedName -> n)* -> {n : ns} / -> {[]} -------------------- Top Level -------------------- CompilationUnit :: CompUnit = Spacing -- eat whitespace at beginning pkg:("package":Word n:QualifiedName ";":Sym -> {Just n} / -> {Nothing}) imps:ImportDecl* decls:Declaration* !Char -- make sure we consume all input text -> {(pkg, imps, decls)} ImportDecl :: ImportDecl = "import":Word name:QualifiedName allflag:(".":Sym "*":Sym -> {True} / -> {False}) ";":Sym -> {(name, allflag)} { -- Preprocess unicode escapes and newlines before parsing javaPrep :: String -> String javaPrep [] = [] javaPrep ('\r':'\n':s) = '\n':javaPrep s javaPrep ('\r':s) = '\n':javaPrep s javaPrep ('\\':'\\':s) = '\\':'\\':javaPrep s javaPrep ('\\':'u':s) = case s of h1:h2:h3:h4:s -> if isHexDigit h1 && isHexDigit h2 && isHexDigit h3 && isHexDigit h4 then chr v4:javaPrep s else error "invalid Unicode escape sequence" where v1 = digitToInt h1 v2 = v1*16 + digitToInt h2 v3 = v2*16 + digitToInt h3 v4 = v3*16 + digitToInt h4 _ -> error "incomplete Unicode escape sequence" javaPrep (c:s) = c:javaPrep s javaParseFile :: FilePath -> IO CompUnit javaParseFile name = do text <- readFile name let text' = javaPrep text derivs = javaParse name text' case javaCompilationUnit derivs of Parsed cu _ _ -> return cu NoParse e -> fail (show e) }Parse.hs Скачать
module Parse where import Char import List import Pos ---------- Data types used for parsing data ErrorDescriptor = Expected String | Message String data ParseError = ParseError { errorPos :: Pos, errorDescrs :: [ErrorDescriptor] } data Result d v = Parsed v d ParseError | NoParse ParseError newtype Parser d v = Parser (d -> Result d v) class Derivs d where dvPos :: d -> Pos dvChar :: d -> Result d Char ---------- Basic parsing combinators infixl 2 </> -- ordered choice infixl 1 <?> -- error labeling infixl 1 <?!> -- unconditional error labeling -- Standard monadic combinators instance Derivs d => Monad (Parser d) where -- Sequencing combinator (Parser p1) >>= f = Parser parse where parse dvs = first (p1 dvs) first (Parsed val rem err) = let Parser p2 = f val in second err (p2 rem) first (NoParse err) = NoParse err second err1 (Parsed val rem err) = Parsed val rem (joinErrors err1 err) second err1 (NoParse err) = NoParse (joinErrors err1 err) -- Result-producing combinator return x = Parser (\dvs -> Parsed x dvs (nullError dvs)) -- Failure combinator fail [] = Parser (\dvs -> NoParse (nullError dvs)) fail msg = Parser (\dvs -> NoParse (msgError (dvPos dvs) msg)) -- Ordered choice (</>) :: Derivs d => Parser d v -> Parser d v -> Parser d v (Parser p1) </> (Parser p2) = Parser parse where parse dvs = first dvs (p1 dvs) first dvs (result @ (Parsed val rem err)) = result first dvs (NoParse err) = second err (p2 dvs) second err1 (Parsed val rem err) = Parsed val rem (joinErrors err1 err) second err1 (NoParse err) = NoParse (joinErrors err1 err) -- Semantic predicate: 'satisfy <parser> <pred>' acts like <parser> -- but only succeeds if the result it generates satisfies <pred>. satisfy :: Derivs d => Parser d v -> (v -> Bool) -> Parser d v satisfy (Parser p) test = Parser parse where parse dvs = check dvs (p dvs) check dvs (result @ (Parsed val rem err)) = if test val then result else NoParse (nullError dvs) check dvs none = none -- Syntactic predicate: 'followedBy <parser>' acts like <parser> -- but does not consume any input. followedBy :: Derivs d => Parser d v -> Parser d v followedBy (Parser p) = Parser parse where parse dvs = case (p dvs) of Parsed val rem err -> Parsed val dvs (nullError dvs) err -> err -- Negative syntactic predicate: 'followedBy <parser>' invokes <parser>, -- then succeeds without consuming any input if <parser> fails, -- and fails if <parser> succeeds. notFollowedBy :: Derivs d => Parser d v -> Parser d () notFollowedBy (Parser p) = Parser parse where parse dvs = case (p dvs) of Parsed val rem err -> NoParse (nullError dvs) NoParse err -> Parsed () dvs (nullError dvs) -- Optional combinator: 'optional <parser>' invokes <parser>, -- then produces the result 'Just <v>' if <parser> produced <v>, -- or else produces the success result 'Nothing' -- without consuming any input if <parser> failed. optional :: Derivs d => Parser d v -> Parser d (Maybe v) optional p = (do v <- p; return (Just v)) </> return Nothing ---------- Iterative combinators -- Note: use of these combinators can break -- a packrat parser's linear-time guarantee. -- Zero or more repetition combinator: -- 'many <parser>' invokes <parser> repeatedly until it fails, -- collecting all success result values into a list. -- Always succeeds, producing an empty list in the degenerate case. many :: Derivs d => Parser d v -> Parser d [v] many p = (do { v <- p; vs <- many p; return (v : vs) } ) </> return [] -- One or more repetition combinator: -- 'many1 <parser>' invokes <parser> repeatedly until it fails, -- collecting all success result values into a list. -- Fails if <parser> does not succeed even once. many1 :: Derivs d => Parser d v -> Parser d [v] many1 p = do { v <- p; vs <- many p; return (v : vs) } -- One or more repetitions with a separator: -- 'sepBy1 <parser> <separator>' scans one or more iterations of <parser>, -- with a match of <separator> between each instance. -- Only the results of <parser> are collected into the final result list. sepBy1 :: Derivs d => Parser d v -> Parser d vsep -> Parser d [v] sepBy1 p psep = do v <- p vs <- many (do { psep; p }) return (v : vs) -- Zero or more repetitions with a separator: -- like sepBy1, but succeeds with an empty list if nothing can be parsed. sepBy :: Derivs d => Parser d v -> Parser d vsep -> Parser d [v] sepBy p psep = sepBy1 p psep </> return [] -- Zero or more repetitions with a terminator endBy :: Derivs d => Parser d v -> Parser d vend -> Parser d [v] endBy p pend = many (do { v <- p; pend; return v }) -- One or more repetitions with a terminator endBy1 :: Derivs d => Parser d v -> Parser d vend -> Parser d [v] endBy1 p pend = many1 (do { v <- p; pend; return v }) -- One or more repetitions with a separator or terminator: -- 'sepEndBy1 <parser> <septerm>' scans for a sequence of <parser> matches -- in which instances are separated by <septerm>, -- and if a <septerm> is found following the last <parser> match -- then it is consumed as well. sepEndBy1 :: Derivs d => Parser d v -> Parser d vsep -> Parser d [v] sepEndBy1 p psep = do v <- sepBy1 p psep; optional psep; return v -- Zero or more repetitions with a separator or terminator. sepEndBy :: Derivs d => Parser d v -> Parser d vsep -> Parser d [v] sepEndBy p psep = do v <- sepBy p psep; optional psep; return v -- One or more repetitions separated by left-associative operators. -- 'chainl1 <term> <oper>' matches instances of <term> separated by <oper>, -- but uses the result of <oper> as a left-associative binary combinator: -- e.g., 't1 op t2 op t3' is interpreted as '(t1 op t2) op t3' chainl1 :: Derivs d => Parser d v -> Parser d (v->v->v) -> Parser d v chainl1 p psep = let psuffix z = (do f <- psep v <- p psuffix (f z v)) </> return z in do v <- p psuffix v -- Zero or more repetitions separated by left-associative operators. chainl :: Derivs d => Parser d v -> Parser d (v->v->v) -> v -> Parser d v chainl p psep z = chainl1 p psep </> return z -- One or more repetitions separated by left-associative operators: -- e.g., 't1 op t2 op t3' is interpreted as 't1 op (t2 op t3)' chainr1 :: Derivs d => Parser d v -> Parser d (v->v->v) -> Parser d v chainr1 p psep = (do v <- p f <- psep w <- chainr1 p psep return (f v w)) </> p -- Zero or more repetitions separated by left-associative operators. chainr :: Derivs d => Parser d v -> Parser d (v->v->v) -> v -> Parser d v chainr p psep z = chainr1 p psep </> return z -- N-ary ordered choice: -- given a list of parsers producing results of the same type, -- try them all in order and use the first successful result. choice :: Derivs d => [Parser d v] -> Parser d v choice [p] = p choice (p:ps) = p </> choice ps ---------- Error handling instance Eq ErrorDescriptor where Expected e1 == Expected e2 = e1 == e2 Message m1 == Message m2 = m1 == m2 _ == _ = False failAt :: Derivs d => Pos -> String -> Parser d v failAt pos msg = Parser (\dvs -> NoParse (msgError pos msg)) -- Annotate a parser with a description of the construct to be parsed. -- The resulting parser yields an "expected" error message -- if the construct cannot be parsed -- and if no error information is already available -- indicating a position farther right in the source code -- (which would normally be more localized/detailed information). (<?>) :: Derivs d => Parser d v -> String -> Parser d v (Parser p) <?> desc = Parser (\dvs -> munge dvs (p dvs)) where munge dvs (Parsed v rem err) = Parsed v rem (fix dvs err) munge dvs (NoParse err) = NoParse (fix dvs err) fix dvs (err @ (ParseError p ms)) = if p > dvPos dvs then err else expError (dvPos dvs) desc -- Stronger version of the <?> error annotation operator above, -- which unconditionally overrides any existing error information. (<?!>) :: Derivs d => Parser d v -> String -> Parser d v (Parser p) <?!> desc = Parser (\dvs -> munge dvs (p dvs)) where munge dvs (Parsed v rem err) = Parsed v rem (fix dvs err) munge dvs (NoParse err) = NoParse (fix dvs err) fix dvs (err @ (ParseError p ms)) = expError (dvPos dvs) desc -- Join two ParseErrors, giving preference to the one farthest right, -- or merging their descriptor sets if they are at the same position. joinErrors :: ParseError -> ParseError -> ParseError joinErrors (e @ (ParseError p m)) (e' @ (ParseError p' m')) = if p' > p || null m then e' else if p > p' || null m' then e else ParseError p (m `union` m') nullError dvs = ParseError (dvPos dvs) [] expError pos desc = ParseError pos [Expected desc] msgError pos msg = ParseError pos [Message msg] eofError dvs = msgError (dvPos dvs) "end of input" expected :: Derivs d => String -> Parser d v expected desc = Parser (\dvs -> NoParse (expError (dvPos dvs) desc)) unexpected :: Derivs d => String -> Parser d v unexpected str = fail ("unexpected " ++ str) -- Comparison operators for ParseError just compare relative positions. instance Eq ParseError where ParseError p1 m1 == ParseError p2 m2 = p1 == p2 ParseError p1 m1 /= ParseError p2 m2 = p1 /= p2 instance Ord ParseError where ParseError p1 m1 < ParseError p2 m2 = p1 < p2 ParseError p1 m1 > ParseError p2 m2 = p1 > p2 ParseError p1 m1 <= ParseError p2 m2 = p1 <= p2 ParseError p1 m1 >= ParseError p2 m2 = p1 >= p2 -- Special behavior: "max" joins two errors max p1 p2 = joinErrors p1 p2 min p1 p2 = undefined -- Show function for error messages instance Show ParseError where show (ParseError pos []) = show pos ++ ": parse error" show (ParseError pos msgs) = expectmsg expects ++ messages msgs where expects = getExpects msgs getExpects [] = [] getExpects (Expected exp : rest) = exp : getExpects rest getExpects (Message msg : rest) = getExpects rest expectmsg [] = "" expectmsg [exp] = show pos ++ ": expecting " ++ exp ++ "\n" expectmsg [e1, e2] = show pos ++ ": expecting either " ++ e1 ++ " or " ++ e2 ++ "\n" expectmsg (first : rest) = show pos ++ ": expecting one of: " ++ first ++ expectlist rest ++ "\n" expectlist [last] = ", or " ++ last expectlist (mid : rest) = ", " ++ mid ++ expectlist rest messages [] = [] messages (Expected exp : rest) = messages rest messages (Message msg : rest) = show pos ++ ": " ++ msg ++ "\n" ++ messages rest ---------- Character-oriented parsers -- 'anyChar' matches any single character. anyChar :: Derivs d => Parser d Char anyChar = Parser dvChar -- 'char <c>' matches the specific character <c>. char :: Derivs d => Char -> Parser d Char char ch = satisfy anyChar (\c -> c == ch) <?> show ch -- 'oneOf <s>' matches any character in string <s>. oneOf :: Derivs d => [Char] -> Parser d Char oneOf chs = satisfy anyChar (\c -> c `elem` chs) <?> ("one of the characters " ++ show chs) -- 'noneOf <s>' matches any character not in string <s>. noneOf :: Derivs d => [Char] -> Parser d Char noneOf chs = satisfy anyChar (\c -> not (c `elem` chs)) <?> ("any character not in " ++ show chs) -- 'string <s>' matches all the characters in <s> in sequence. string :: Derivs d => String -> Parser d String string str = p str <?> show str where p [] = return str p (ch:chs) = do { char ch; p chs } -- 'stringFrom <ss>' matches any string in the list of strings <ss>. -- If any strings in <ss> are prefixes of other strings in <ss>, -- then the prefixes must appear later in the list -- in order for the longer strings to be recognized at all. stringFrom :: Derivs d => [String] -> Parser d String stringFrom [str] = string str stringFrom (str : strs) = string str </> stringFrom strs -- Match an uppercase letter. upper :: Derivs d => Parser d Char upper = satisfy anyChar isUpper <?> "uppercase letter" -- Match a lowercase letter. lower :: Derivs d => Parser d Char lower = satisfy anyChar isLower <?> "lowercase letter" -- Match any letter. letter :: Derivs d => Parser d Char letter = satisfy anyChar isAlpha <?> "letter" -- Match any letter or digit. alphaNum :: Derivs d => Parser d Char alphaNum = satisfy anyChar isAlphaNum <?> "letter or digit" -- Match any digit. digit :: Derivs d => Parser d Char digit = satisfy anyChar isDigit <?> "digit" -- Match any hexadecimal digit. hexDigit :: Derivs d => Parser d Char hexDigit = satisfy anyChar isHexDigit <?> "hexadecimal digit (0-9, a-f)" -- Match any octal digit. octDigit :: Derivs d => Parser d Char octDigit = satisfy anyChar isOctDigit <?> "octal digit (0-7)" -- Match a newline. newline :: Derivs d => Parser d Char newline = char '\n' -- Match a tab character. tab :: Derivs d => Parser d Char tab = char '\t' -- Match any whitespace character (space, tab, newline, etc.). space :: Derivs d => Parser d Char space = satisfy anyChar isSpace <?> "whitespace character" -- Match a sequence of zero or more whitespace characters. spaces :: Derivs d => Parser d [Char] spaces = many space -- Match the end of file (i.e., "the absence of a character"). eof :: Derivs d => Parser d () eof = notFollowedBy anyChar <?> "end of input" ---------- Parser state manipulation combinators -- Combinator to get the Derivs object for the current position: -- e.g., 'dvs <- getDerivs' as part of a 'do' sequence. getDerivs :: Derivs d => Parser d d getDerivs = Parser (\dvs -> Parsed dvs dvs (nullError dvs)) -- Combinator to set the Derivs object used for subsequent parsing; -- typically used to change parsing state elements in the Derivs tuple. setDerivs :: Derivs d => d -> Parser d () setDerivs newdvs = Parser (\dvs -> Parsed () newdvs (nullError dvs)) -- Get the current position in the input text. getPos :: Derivs d => Parser d Pos getPos = Parser (\dvs -> Parsed (dvPos dvs) dvs (nullError dvs))Pos.hs Скачать
-- Simple data type to keep track of character positions -- within a text file or other text stream. module Pos where -- Basic position indicator data type: filename, line number, column number. data Pos = Pos { posFile :: !String, posLine :: !Int, posCol :: !Int } -- Incrementally compute the next position in a text file -- if 'c' is the character at the current position. -- Follows the standard convention of 8-character tab stops. nextPos (Pos file line col) c = if c == '\n' then Pos file (line + 1) 1 else if c == '\t' then Pos file line ((div (col + 8 - 1) 8) * 8 + 1) else Pos file line (col + 1) -- Two positions are equal if each of their elements are equal. instance Eq Pos where Pos f1 l1 c1 == Pos f2 l2 c2 = f1 == f2 && l1 == l2 && c1 == c2 -- Two positions are ordered by line number, then column number. instance Ord Pos where Pos f1 l1 c1 <= Pos f2 l2 c2 = (l1 < l2) || (l1 == l2 && c1 <= c2) -- Standard way to display positions - "file:line:col" instance Show Pos where show (Pos file line col) = file ++ ":" ++ show line ++ ":" ++ show col -- Show a position relative to a base position. -- If the new position is in the same line, just show its column number; -- otherwise if the new position is in the same file, -- just show its line and column numbers; -- otherwise show the complete new position. showPosRel (Pos file line col) (Pos file' line' col') = if (file == file') then if (line == line') then "column " ++ show col' else "line " ++ show line' ++ ", column " ++ show col' else show (Pos file' line' col')Arith.pappy Скачать
parser Triv: top Expression -- Expressions Expression :: Integer = Spacing v:Additive !Char -> { v } Additive :: Integer = l:Additive "+":Symbol r:Multitive -> { l + r } / l:Additive "-":Symbol r:Multitive -> { l - r } / v:Multitive -> { v } Multitive :: Integer = l:Multitive "*":Symbol r:Primary -> { l * r } / l:Multitive "/":Symbol r:Primary -> { l `div` r } / l:Multitive "%":Symbol r:Primary -> { l `mod` r } / v:Primary -> { v } Primary :: Integer = v:Decimal -> { v } / "(":Symbol v:Additive ")":Symbol -> { v } -- Decimal literals Decimal :: Integer = v:Digits Spacing -> { v } Digits :: Integer = v:Digits d:Digit -> { v * 10 + toInteger d } / d:Digit -> { toInteger d } Digit :: Int = c:Char &{isDigit c} -> { digitToInt c } -- Symbols Symbol :: String = s:SymChars Spacing -> { s } SymChars :: String = "+" / "-" / "*" / "/" / "%" / "(" / ")" -- Spacing between tokens Spacing :: {()} = SpaceChar* -> { () } SpaceChar :: Char = c:Char &{isSpace c} -> { c } { eval :: String -> Integer eval str = case trivExpression (trivParse "expression" str) of Parsed v _ _ -> v NoParse e -> error (show e) }ArithMonad.hs Скачать
-- Monadic packrat parser for trivial arithmetic language -- with left-associative operators and integrated lexical analysis. -- Uses NO "unsafe" combinators containing hidden recursion. module ArithMonad where import Pos import Parse data ArithDerivs = ArithDerivs { advAdditive :: Result ArithDerivs Int, advAdditiveSuffix :: Result ArithDerivs (Int -> Int), advMultitive :: Result ArithDerivs Int, advMultitiveSuffix :: Result ArithDerivs (Int -> Int), advPrimary :: Result ArithDerivs Int, advDecimal :: Result ArithDerivs Int, advDigits :: Result ArithDerivs (Int, Int), advDigit :: Result ArithDerivs Int, advSymbol :: Result ArithDerivs Char, advSpacing :: Result ArithDerivs (), advChar :: Result ArithDerivs Char, advPos :: Pos } instance Derivs ArithDerivs where dvChar d = advChar d dvPos d = advPos d -- Evaluate an expression and return the unpackaged result, -- ignoring any unparsed remainder. eval s = case pExpression (parse (Pos "<input>" 1 1) s) of Parsed v _ _ -> v NoParse e -> error (show e) where Parser pExpression = do Parser advSpacing v <- Parser advAdditive notFollowedBy anyChar return v -- Construct a (lazy) parse result structure for an input string, -- in which any result can be computed in linear time -- with respect to the length of the input. parse :: Pos -> String -> ArithDerivs parse pos s = d where d = ArithDerivs add addsuf mult multsuf prim dec digs dig sym spc chr pos add = pAdditive d addsuf = pAdditiveSuffix d mult = pMultitive d multsuf = pMultitiveSuffix d prim = pPrimary d dec = pDecimal d digs = pDigits d dig = pDigit d sym = pSymbol d spc = pSpacing d chr = case s of (c:s') -> Parsed c (parse (nextPos pos c) s') (nullError d) [] -> NoParse (eofError d) -- Parse an additive-precedence expression pAdditive :: ArithDerivs -> Result ArithDerivs Int Parser pAdditive = (do l <- Parser advMultitive f <- Parser advAdditiveSuffix return (f l)) pAdditiveSuffix :: ArithDerivs -> Result ArithDerivs (Int -> Int) Parser pAdditiveSuffix = (do symbol '+' r <- Parser advMultitive f <- Parser advAdditiveSuffix return (\l -> f (l + r))) </> (do symbol '-' r <- Parser advMultitive f <- Parser advAdditiveSuffix return (\l -> f (l - r))) </> (do return (\v -> v)) -- Parse a multiplicative-precedence expression pMultitive :: ArithDerivs -> Result ArithDerivs Int Parser pMultitive = (do l <- Parser advPrimary f <- Parser advMultitiveSuffix return (f l)) pMultitiveSuffix :: ArithDerivs -> Result ArithDerivs (Int -> Int) Parser pMultitiveSuffix = (do symbol '*' r <- Parser advMultitive f <- Parser advMultitiveSuffix return (\l -> f (l * r))) </> (do symbol '/' r <- Parser advMultitive f <- Parser advMultitiveSuffix return (\l -> f (l `div` r))) </> (do symbol '%' r <- Parser advMultitive f <- Parser advMultitiveSuffix return (\l -> f (l `mod` r))) </> (do return (\v -> v)) -- Parse a primary expression pPrimary :: ArithDerivs -> Result ArithDerivs Int Parser pPrimary = (do symbol '(' v <- Parser advAdditive symbol ')' return v) </> (do Parser advDecimal) -- Parse a decimal number followed by optional whitespace pDecimal :: ArithDerivs -> Result ArithDerivs Int Parser pDecimal = (do (v,n) <- Parser advDigits Parser advSpacing return v) <?> "decimal number" -- Parse a string of consecutive decimal digits. -- The result is (value, number of digits). pDigits :: ArithDerivs -> Result ArithDerivs (Int, Int) Parser pDigits = (do vl <- Parser advDigit (vr,n) <- Parser advDigits return (vl*10^n + vr, n+1)) </> (do vl <- Parser advDigit return (vl, 1)) -- Parse a decimal digit pDigit :: ArithDerivs -> Result ArithDerivs Int Parser pDigit = (do c <- digit return (digitToInt c)) -- "Parser combinator" to look for a specific symbol. -- Cannot be memoized directly because it takes a parameter, -- but that does not matter because it is non-recursive. symbol c = (do c' <- Parser advSymbol if c' == c then return c else fail []) <?> show c -- Parse a symbol character followed by optional whitespace pSymbol :: ArithDerivs -> Result ArithDerivs Char Parser pSymbol = do c <- oneOf "+-*/%()" Parser advSpacing return c -- Parse zero or more whitespace characters pSpacing :: ArithDerivs -> Result ArithDerivs () Parser pSpacing = (do space Parser advSpacing return ()) </> (do return ())ArithLex.hs Скачать
-- Packrat parser for arithmetic expression language -- with left-associative operators and integrated lexical analysis. module ArithLex where data Result v = Parsed v Derivs | NoParse data Derivs = Derivs { dvAdditive :: Result Int, dvAdditiveSuffix :: Result (Int -> Int), dvMultitive :: Result Int, dvMultitiveSuffix :: Result (Int -> Int), dvPrimary :: Result Int, dvDecimal :: Result Int, dvDigits :: Result (Int, Int), dvDigit :: Result Int, dvSymbol :: Result Char, dvSpacing :: Result (), dvChar :: Result Char } -- Evaluate an expression and return the unpackaged result, -- ignoring any unparsed remainder. eval s = case dvSpacing (parse s) of Parsed _ d' -> case dvAdditive d' of Parsed v d'' -> case dvChar d'' of Parsed _ _ -> error "Incomplete parse" _ -> v _ -> error "Parse error" -- Construct a (lazy) parse result structure for an input string, -- in which any result can be computed in linear time -- with respect to the length of the input. parse :: String -> Derivs parse s = d where d = Derivs add addsuff mult multsuff prim dec digs dig sym white chr add = pAdditive d addsuff = pAdditiveSuffix d mult = pMultitive d multsuff = pMultitiveSuffix d prim = pPrimary d dec = pDecimal d digs = pDigits d dig = pDigit d sym = pSymbol d white = pSpacing d chr = case s of (c:s') -> Parsed c (parse s') [] -> NoParse -- Parse an additive-precedence expression -- Additive <- Multitive AdditiveSuffix pAdditive :: Derivs -> Result Int pAdditive d = case dvMultitive d of Parsed vleft d' -> case dvAdditiveSuffix d' of Parsed vsuff d'' -> Parsed (vsuff vleft) d'' _ -> NoParse _ -> NoParse -- Parse an additive-precedence expression suffix pAdditiveSuffix :: Derivs -> Result (Int -> Int) pAdditiveSuffix d = alt1 where -- AdditiveSuffix <- '+' Multitive AdditiveSuffix alt1 = case dvSymbol d of Parsed '+' d' -> case dvMultitive d' of Parsed vright d'' -> case dvAdditiveSuffix d'' of Parsed vsuff d''' -> Parsed (\vleft -> vsuff (vleft + vright)) d''' _ -> alt2 _ -> alt2 _ -> alt2 -- AdditiveSuffix <- '-' Multitive AdditiveSuffix alt2 = case dvSymbol d of Parsed '-' d' -> case dvMultitive d' of Parsed vright d'' -> case dvAdditiveSuffix d'' of Parsed vsuff d''' -> Parsed (\vleft -> vsuff (vleft - vright)) d''' _ -> alt3 _ -> alt3 _ -> alt3 -- AdditiveSuffix <- <empty> alt3 = Parsed (\v -> v) d -- Parse a multiplicative-precedence expression -- Multitive <- Primary MultitiveSuffix pMultitive :: Derivs -> Result Int pMultitive d = case dvPrimary d of Parsed vleft d' -> case dvMultitiveSuffix d' of Parsed vsuff d'' -> Parsed (vsuff vleft) d'' _ -> NoParse _ -> NoParse -- Parse a multiplicative-precedence expression suffix pMultitiveSuffix :: Derivs -> Result (Int -> Int) pMultitiveSuffix d = alt1 where -- MultitiveSuffix <- '*' Primary MultitiveSuffix alt1 = case dvSymbol d of Parsed '*' d' -> case dvPrimary d' of Parsed vright d'' -> case dvMultitiveSuffix d'' of Parsed vsuff d''' -> Parsed (\vleft -> vsuff (vleft * vright)) d''' _ -> alt2 _ -> alt2 _ -> alt2 -- MultitiveSuffix <- '/' Primary MultitiveSuffix alt2 = case dvSymbol d of Parsed '/' d' -> case dvPrimary d' of Parsed vright d'' -> case dvMultitiveSuffix d'' of Parsed vsuff d''' -> Parsed (\vleft -> vsuff (vleft `div` vright)) d''' _ -> alt3 _ -> alt3 _ -> alt3 -- MultitiveSuffix <- '%' Primary MultitiveSuffix alt3 = case dvSymbol d of Parsed '%' d' -> case dvPrimary d' of Parsed vright d'' -> case dvMultitiveSuffix d'' of Parsed vsuff d''' -> Parsed (\vleft -> vsuff (vleft `mod` vright)) d''' _ -> alt4 _ -> alt4 _ -> alt4 -- MultitiveSuffix <- <empty> alt4 = Parsed (\v -> v) d -- Parse a primary expression pPrimary :: Derivs -> Result Int pPrimary d = alt1 where -- Primary <- '(' Additive ')' alt1 = case dvSymbol d of Parsed '(' d' -> case dvAdditive d' of Parsed v d'' -> case dvSymbol d'' of Parsed ')' d''' -> Parsed v d''' _ -> alt2 _ -> alt2 _ -> alt2 -- Primary <- Decimal alt2 = case dvDecimal d of Parsed v d' -> Parsed v d' NoParse -> NoParse -- Parse a decimal number followed by optional whitespace pDecimal :: Derivs -> Result Int pDecimal d = case dvDigits d of Parsed (v,n) d' -> case dvSpacing d' of Parsed _ d'' -> Parsed v d'' _ -> NoParse _ -> NoParse -- Parse a string of consecutive decimal digits -- The result is (value, number of digits). pDigits :: Derivs -> Result (Int, Int) pDigits d = case dvDigit d of Parsed vl d' -> case dvDigits d' of Parsed (vr, n) d'' -> Parsed (vl*10^n+vr, n+1) d'' _ -> Parsed (vl, 1) d' _ -> NoParse -- Parse a decimal digit pDigit :: Derivs -> Result Int pDigit d = case dvChar d of Parsed '0' d' -> Parsed 0 d' Parsed '1' d' -> Parsed 1 d' Parsed '2' d' -> Parsed 2 d' Parsed '3' d' -> Parsed 3 d' Parsed '4' d' -> Parsed 4 d' Parsed '5' d' -> Parsed 5 d' Parsed '6' d' -> Parsed 6 d' Parsed '7' d' -> Parsed 7 d' Parsed '8' d' -> Parsed 8 d' Parsed '9' d' -> Parsed 9 d' _ -> NoParse -- Parse a symbol character followed by optional whitespace pSymbol :: Derivs -> Result Char pSymbol d = case dvChar d of Parsed c d' -> if c `elem` "+-*/%()" then case dvSpacing d' of Parsed _ d'' -> Parsed c d'' _ -> NoParse else NoParse _ -> NoParse -- Parse zero or more whitespace characters pSpacing :: Derivs -> Result () pSpacing d = case dvChar d of Parsed c d' -> if isSpace c then pSpacing d' else Parsed () d _ -> Parsed () dArithLeft.hs Скачать
-- Packrat parser for arithmetic expression language -- supporting left-associative infix operators module ArithLeft where data Result v = Parsed v Derivs | NoParse data Derivs = Derivs { dvAdditive :: Result Int, dvAdditiveSuffix :: Result (Int -> Int), dvMultitive :: Result Int, dvMultitiveSuffix :: Result (Int -> Int), dvPrimary :: Result Int, dvDecimal :: Result Int, dvChar :: Result Char } -- Evaluate an expression and return the unpackaged result, -- ignoring any unparsed remainder. eval s = case dvAdditive (parse s) of Parsed v rem -> v _ -> error "Parse error" -- Construct a (lazy) parse result structure for an input string, -- in which any result can be computed in linear time -- with respect to the length of the input. parse :: String -> Derivs parse s = d where d = Derivs add addsuff mult multsuff prim dec chr add = pAdditive d addsuff = pAdditiveSuffix d mult = pMultitive d multsuff = pMultitiveSuffix d prim = pPrimary d dec = pDecimal d chr = case s of (c:s') -> Parsed c (parse s') [] -> NoParse -- Parse an additive-precedence expression -- Additive <- Multitive AdditiveSuffix pAdditive :: Derivs -> Result Int pAdditive d = case dvMultitive d of Parsed vleft d' -> case dvAdditiveSuffix d' of Parsed vsuff d'' -> Parsed (vsuff vleft) d'' _ -> NoParse _ -> NoParse -- Parse an additive-precedence expression suffix pAdditiveSuffix :: Derivs -> Result (Int -> Int) pAdditiveSuffix d = alt1 where -- AdditiveSuffix <- '+' Multitive AdditiveSuffix alt1 = case dvChar d of Parsed '+' d' -> case dvMultitive d' of Parsed vright d'' -> case dvAdditiveSuffix d'' of Parsed vsuff d''' -> Parsed (\vleft -> vsuff (vleft + vright)) d''' _ -> alt2 _ -> alt2 _ -> alt2 -- AdditiveSuffix <- '-' Multitive AdditiveSuffix alt2 = case dvChar d of Parsed '-' d' -> case dvMultitive d' of Parsed vright d'' -> case dvAdditiveSuffix d'' of Parsed vsuff d''' -> Parsed (\vleft -> vsuff (vleft - vright)) d''' _ -> alt3 _ -> alt3 _ -> alt3 -- AdditiveSuffix <- <empty> alt3 = Parsed (\v -> v) d -- Parse a multiplicative-precedence expression -- Multitive <- Primary MultitiveSuffix pMultitive :: Derivs -> Result Int pMultitive d = case dvPrimary d of Parsed vleft d' -> case dvMultitiveSuffix d' of Parsed vsuff d'' -> Parsed (vsuff vleft) d'' _ -> NoParse _ -> NoParse -- Parse a multiplicative-precedence expression suffix pMultitiveSuffix :: Derivs -> Result (Int -> Int) pMultitiveSuffix d = alt1 where -- MultitiveSuffix <- '*' Primary MultitiveSuffix alt1 = case dvChar d of Parsed '*' d' -> case dvPrimary d' of Parsed vright d'' -> case dvMultitiveSuffix d'' of Parsed vsuff d''' -> Parsed (\vleft -> vsuff (vleft * vright)) d''' _ -> alt2 _ -> alt2 _ -> alt2 -- MultitiveSuffix <- '/' Primary MultitiveSuffix alt2 = case dvChar d of Parsed '/' d' -> case dvPrimary d' of Parsed vright d'' -> case dvMultitiveSuffix d'' of Parsed vsuff d''' -> Parsed (\vleft -> vsuff (vleft `div` vright)) d''' _ -> alt3 _ -> alt3 _ -> alt3 -- MultitiveSuffix <- '%' Primary MultitiveSuffix alt3 = case dvChar d of Parsed '%' d' -> case dvPrimary d' of Parsed vright d'' -> case dvMultitiveSuffix d'' of Parsed vsuff d''' -> Parsed (\vleft -> vsuff (vleft `mod` vright)) d''' _ -> alt4 _ -> alt4 _ -> alt4 -- MultitiveSuffix <- <empty> alt4 = Parsed (\v -> v) d -- Parse a primary expression pPrimary :: Derivs -> Result Int pPrimary d = alt1 where -- Primary <- '(' Additive ')' alt1 = case dvChar d of Parsed '(' d' -> case dvAdditive d' of Parsed v d'' -> case dvChar d'' of Parsed ')' d''' -> Parsed v d''' _ -> alt2 _ -> alt2 _ -> alt2 -- Primary <- Decimal alt2 = case dvDecimal d of Parsed v d' -> Parsed v d' NoParse -> NoParse -- Parse a decimal digit pDecimal :: Derivs -> Result Int pDecimal d = case dvChar d of Parsed '0' d' -> Parsed 0 d' Parsed '1' d' -> Parsed 1 d' Parsed '2' d' -> Parsed 2 d' Parsed '3' d' -> Parsed 3 d' Parsed '4' d' -> Parsed 4 d' Parsed '5' d' -> Parsed 5 d' Parsed '6' d' -> Parsed 6 d' Parsed '7' d' -> Parsed 7 d' Parsed '8' d' -> Parsed 8 d' Parsed '9' d' -> Parsed 9 d' _ -> NoParseArithPackrat.hs Скачать
-- Packrat parser for trivial arithmetic language. module ArithPackrat where data Result v = Parsed v Derivs | NoParse data Derivs = Derivs { dvAdditive :: Result Int, dvMultitive :: Result Int, dvPrimary :: Result Int, dvDecimal :: Result Int, dvChar :: Result Char } -- Evaluate an expression and return the unpackaged result, -- ignoring any unparsed remainder. eval s = case dvAdditive (parse s) of Parsed v rem -> v _ -> error "Parse error" -- Construct a (lazy) parse result structure for an input string, -- in which any result can be computed in linear time -- with respect to the length of the input. parse :: String -> Derivs parse s = d where d = Derivs add mult prim dec chr add = pAdditive d mult = pMultitive d prim = pPrimary d dec = pDecimal d chr = case s of (c:s') -> Parsed c (parse s') [] -> NoParse -- Parse an additive-precedence expression pAdditive :: Derivs -> Result Int pAdditive d = alt1 where -- Additive <- Multitive '+' Additive alt1 = case dvMultitive d of Parsed vleft d' -> case dvChar d' of Parsed '+' d'' -> case dvAdditive d'' of Parsed vright d''' -> Parsed (vleft + vright) d''' _ -> alt2 _ -> alt2 _ -> alt2 -- Additive <- Multitive alt2 = case dvMultitive d of Parsed v d' -> Parsed v d' NoParse -> NoParse -- Parse a multiplicative-precedence expression pMultitive :: Derivs -> Result Int pMultitive d = alt1 where -- Multitive <- Primary '*' Multitive alt1 = case dvPrimary d of Parsed vleft d' -> case dvChar d' of Parsed '*' d'' -> case dvMultitive d'' of Parsed vright d''' -> Parsed (vleft * vright) d''' _ -> alt2 _ -> alt2 _ -> alt2 -- Multitive <- Primary alt2 = case dvPrimary d of Parsed v d' -> Parsed v d' NoParse -> NoParse -- Parse a primary expression pPrimary :: Derivs -> Result Int pPrimary d = alt1 where -- Primary <- '(' Additive ')' alt1 = case dvChar d of Parsed '(' d' -> case dvAdditive d' of Parsed v d'' -> case dvChar d'' of Parsed ')' d''' -> Parsed v d''' _ -> alt2 _ -> alt2 _ -> alt2 -- Primary <- Decimal alt2 = case dvDecimal d of Parsed v d' -> Parsed v d' NoParse -> NoParse -- Parse a decimal digit pDecimal :: Derivs -> Result Int pDecimal d = case dvChar d of Parsed '0' d' -> Parsed 0 d' Parsed '1' d' -> Parsed 1 d' Parsed '2' d' -> Parsed 2 d' Parsed '3' d' -> Parsed 3 d' Parsed '4' d' -> Parsed 4 d' Parsed '5' d' -> Parsed 5 d' Parsed '6' d' -> Parsed 6 d' Parsed '7' d' -> Parsed 7 d' Parsed '8' d' -> Parsed 8 d' Parsed '9' d' -> Parsed 9 d' _ -> NoParseArithRecurse.hs Скачать
-- Recursive descent parser for trivial arithmetic language -- supporting only addition and multiplication operators. module ArithRecurse where data Result v = Parsed v String | NoParse -- Evaluate an expression and return the unpackaged result, -- ignoring any unparsed remainder. eval s = case pAdditive s of Parsed v rem -> v _ -> error "Parse error" -- Parse an additive-precedence expression pAdditive :: String -> Result Int pAdditive s = alt1 where -- Additive <- Multitive '+' Additive alt1 = case pMultitive s of Parsed vleft s' -> case s' of ('+':s'') -> case pAdditive s'' of Parsed vright s''' -> Parsed (vleft + vright) s''' _ -> alt2 _ -> alt2 _ -> alt2 -- Additive <- Multitive alt2 = case pMultitive s of Parsed v s' -> Parsed v s' NoParse -> NoParse -- Parse a multiplicative-precedence expression pMultitive :: String -> Result Int pMultitive s = alt1 where -- Multitive <- Primary '*' Multitive alt1 = case pPrimary s of Parsed vleft s' -> case s' of ('*':s'') -> case pMultitive s'' of Parsed vright s''' -> Parsed (vleft * vright) s''' _ -> alt2 _ -> alt2 _ -> alt2 -- Multitive <- Primary alt2 = case pPrimary s of Parsed v s' -> Parsed v s' NoParse -> NoParse -- Parse a primary expression pPrimary :: String -> Result Int pPrimary s = alt1 where -- Primary <- '(' Additive ')' alt1 = case s of ('(':s') -> case pAdditive s' of Parsed v s'' -> case s'' of (')':s''') -> Parsed v s''' _ -> alt2 _ -> alt2 _ -> alt2 -- Primary <- Decimal alt2 = case pDecimal s of Parsed v s' -> Parsed v s' NoParse -> NoParse -- Parse a decimal digit pDecimal :: String -> Result Int pDecimal ('0':cs) = Parsed 0 cs pDecimal ('1':cs) = Parsed 1 cs pDecimal ('2':cs) = Parsed 2 cs pDecimal ('3':cs) = Parsed 3 cs pDecimal ('4':cs) = Parsed 4 cs pDecimal ('5':cs) = Parsed 5 cs pDecimal ('6':cs) = Parsed 6 cs pDecimal ('7':cs) = Parsed 7 cs pDecimal ('8':cs) = Parsed 8 cs pDecimal ('9':cs) = Parsed 9 cs pDecimal _ = NoParseMain.hs Скачать
module Main where import IO import System import Pappy import ReadGrammar import ReduceGrammar import SimplifyGrammar import MemoAnalysis import WriteParser processArgs = do args <- getArgs case args of [inname] -> return (inname, mkoutname inname) [inname, outname] -> return (inname, outname) _ -> usage where mkoutname inname = case reverse inname of 'y':'p':'p':'a':'p':'.':inr -> reverse inr ++ ".hs" _ -> inname ++ ".hs" usage = fail "Usage: pappy <infile> [<outfile>]" processFile inname outname = do (n, c, t, g) <- pappyParseFile inname putStrLn ("Original grammar: " ++ show (length g) ++ " size " ++ show (sizeofNonterminals g)) --putStrLn (showNonterminals g) let (n', c', t', g') = reduceGrammar (n, c, t, g) putStrLn ("Reduced grammar: " ++ show (length g') ++ " size " ++ show (sizeofNonterminals g')) --putStrLn (showNonterminals g') let (n'', c'', t'', g'') = simplifyGrammar (n', c', t', g') putStrLn ("Simplified grammar: " ++ show (length g'') ++ " size " ++ show (sizeofNonterminals g'')) --putStrLn (showNonterminals g'') let m'' = memoAnalysis g'' putStrLn ("Memoized: " ++ show (length m'')) let parser = writeParser m'' (n'', c'', t'', g'') writeFile outname parser main = do (inname, outname) <- processArgs processFile inname outnameWriteParser.hs Скачать
module WriteParser where import Char import Pappy -- Generate the source code for a packrat parser writeParser :: [Identifier] -> Grammar -> String writeParser memos (parsername, topcode, topnts, nonterms) = "-- Parser generated by Pappy - do not edit\n" ++ "module " ++ upcase parsername ++ " where\n\n" ++ "import Pos\n" ++ "import Parse\n" ++ topcode ++ "\n" ++ tie2level ++ -- tie1level or tie2level patpfuncs 0 nonterms -- patpfuncs or monadpfuncs where -- Names of identifiers used throughout the parser derivstype = upcase parsername ++ "Derivs" pfuncname = locase parsername ++ "Parse" dfuncname = locase parsername ++ "Derivs" dvname n = locase parsername ++ n pname n = if n `elem` memos then locase parsername ++ "Parse" ++ n else dvname n -- Build an expression to reference nonterminal accessor n in d dvexp n d = dvname n ++ " " ++ d -- Names of temporaries used in the parsing functions --valname i = "pappyVal" ++ show i --dvsname i = "pappyDvs" ++ show i --errname i = "pappyErr" ++ show i valname i = "v" ++ show i -- easier for debugging dvsname i = "d" ++ show i errname i = "e" ++ show i ---------- Single-level parser generation tie1level = derivsdecl ++ derivsinstance ++ parsefunc ++ derivsfunc 0 where -- Generate the Derivs type declaration derivsdecl = "data " ++ derivstype ++ " = " ++ derivstype ++ " {\n" ++ components nonterms ++ "\t\t" ++ dvname "Char" ++ " :: Result " ++ derivstype ++ " Char,\n" ++ "\t\t" ++ dvname "Pos" ++ " :: Pos\n" ++ "\t}\n" ++ "\n" where components ((name, typ, rule) : rest) | name `elem` memos = "\t\t" ++ dvname name ++ " :: Result " ++ derivstype ++ " (" ++ typ ++ "),\n" ++ components rest components(_ : rest) = components rest components [] = "" -- Generate the "instance Derivs" declaration derivsinstance = "instance Derivs " ++ derivstype ++ " where\n" ++ "\tdvChar d = " ++ dvname "Char" ++ " d\n" ++ "\tdvPos d = " ++ dvname "Pos" ++ " d\n" ++ "\n" -- Generate the top-level parse function parsefunc = pfuncname ++ " :: String -> String -> " ++ derivstype ++ "\n" ++ pfuncname ++ " name text = " ++ dfuncname ++ " (Pos name 1 1) text\n\n" -- Generate the derivs-creation function derivsfunc i = ind i ++ dfuncname ++ " :: Pos -> String -> " ++ derivstype ++ "\n" ++ ind i ++ dfuncname ++ " pos text = dvs where\n" ++ ind (i+1) ++ "dvs = " ++ derivstype ++ "\n" ++ parsercalls nonterms ++ ind (i+2) ++ "chr pos\n" ++ ind (i+1) ++ "chr = case text of\n" ++ ind (i+2) ++ "[] -> NoParse (eofError dvs)\n" ++ ind (i+2) ++ "(c:cs) -> Parsed c (" ++ dfuncname ++ " (nextPos pos c) cs) (nullError dvs)\n\n" where parsercalls ((name, typ, rule) : rest) | name `elem` memos = ind (i+2) ++ "(" ++ pname name ++ " dvs)\n" ++ parsercalls rest parsercalls (_ : rest) = parsercalls rest parsercalls [] = "" ---------- Two-level parser generation tie2level = derivsdecl ++ derivsinstance ++ derivs2decls 0 ++ aliases 0 nonterms ++ "\n" ++ parsefunc ++ derivsfunc 0 ++ dsubfuncs 0 where -- Total number of nonterminals in the grammar nmemos :: Int nmemos = length memos -- The number of sub-derivation structures to use nsubderivs :: Int nsubderivs = round (sqrt (fromInteger (toEnum nmemos)) / 2.0) -- Find the appropriate sub-derivation structure -- for a particular nonterminal index derivsofk :: Int -> Int derivsofk k = (k * nsubderivs) `div` nmemos -- Find the appropriate sub-derivation structure -- for a particular nonterminal name derivsof :: Identifier -> Int derivsof n = scan 0 nonterms where scan k ((n', t', r') : nts) | n' `elem` memos = if n' == n then derivsofk k else scan (k+1) nts scan k (_ : nts) = scan k nts scan k [] = error "derivsof: oops!" -- Names for types and identifiers relating to the derivs structures derivstype = upcase parsername ++ "Derivs" derivs2type j = upcase parsername ++ "Derivs" ++ show j dsubfuncname j = locase parsername ++ "Derivs" ++ show j dvsub j = locase parsername ++ "Sub" ++ show j dvsubname n = locase parsername ++ "Sub" ++ n -- Generate the top-level Derivs type declaration derivsdecl = "data " ++ derivstype ++ " = " ++ derivstype ++ " {\n" ++ components 0 ++ "\t" ++ dvname "Char" ++ " :: Result " ++ derivstype ++ " Char,\n" ++ "\t" ++ dvname "Pos" ++ " :: Pos\n" ++ "}\n\n" where components j | j < nsubderivs = "\t" ++ dvsub j ++ " :: " ++ derivs2type j ++ ",\n" ++ components (j+1) components _ = [] -- Generate the "instance Derivs" declaration derivsinstance = "instance Derivs " ++ derivstype ++ " where\n" ++ "\tdvChar d = " ++ dvname "Char" ++ " d\n" ++ "\tdvPos d = " ++ dvname "Pos" ++ " d\n" ++ "\n" -- Generate the second-level Derivs type declarations derivs2decls j | j < nsubderivs = "data " ++ derivs2type j ++ " = " ++ derivs2type j ++ " {" ++ components True j 0 nonterms ++ "}\n\n" ++ derivs2decls (j+1) where components first j k ((name, typ, rule) : rest) | name `elem` memos && derivsofk k == j = (if first then "\n\t" else ",\n\t") ++ dvsubname name ++ " :: Result " ++ derivstype ++ " (" ++ typ ++ ")" ++ components False j (k+1) rest components first j k ((name, typ, rule) : rest) | name `elem` memos = components first j (k+1) rest components first j k (_ : rest) = components first j k rest components first j k [] = "" derivs2decls _ = "" -- Generate the second-level accessor aliases aliases k ((n, t, r) : nts) | n `elem` memos = dvname n ++ " = " ++ dvsubname n ++ " . " ++ dvsub (derivsofk k) ++ "\n" ++ aliases (k+1) nts aliases k (_ : nts) = aliases k nts aliases k [] = "" -- Generate the top-level parse function parsefunc = pfuncname ++ " :: String -> String -> " ++ derivstype ++ "\n" ++ pfuncname ++ " name text = " ++ dfuncname ++ " (Pos name 1 1) text\n\n" -- Generate the top-level derivs-creation function derivsfunc i = ind i ++ dfuncname ++ " :: Pos -> String -> " ++ derivstype ++ "\n" ++ ind i ++ dfuncname ++ " pos text = dvs where\n" ++ ind (i+1) ++ "dvs = " ++ derivstype ++ "\n" ++ subcalls 0 ++ ind (i+2) ++ "chr pos\n" ++ ind (i+1) ++ "chr = case text of\n" ++ ind (i+2) ++ "[] -> NoParse (eofError dvs)\n" ++ ind (i+2) ++ "(c:cs) -> Parsed c (" ++ dfuncname ++ " (nextPos pos c) cs) (nullError dvs)\n\n" where subcalls j | j < nsubderivs = ind (i+2) ++ "(" ++ dsubfuncname j ++ " dvs)\n" ++ subcalls (j+1) subcalls _ = "" -- Generate the second-level derivs creation functions. -- Note: GHC's evaluation model makes it very important -- that these actually be separate functions! dsubfuncs j | j < nsubderivs = dsubfuncname j ++ " dvs = " ++ derivs2type j ++ "\n" ++ parsercalls j 0 nonterms ++ "\n" ++ dsubfuncs (j+1) where parsercalls j k ((name, typ, rule) : rest) | name `elem` memos = if derivsofk k == j then "\t(" ++ pname name ++ " dvs)\n" ++ parsercalls j (k+1) rest else parsercalls j (k+1) rest parsercalls j k (_ : rest) = parsercalls j k rest parsercalls j k [] = "" dsubfuncs _ = "" ---------- Monadic parsing function code generation monadpfuncs i [] = "" monadpfuncs i ((name, typ, rule) : rest) = ind i ++ pname name ++ " :: " ++ derivstype ++ " -> Result " ++ derivstype ++ " (" ++ typ ++ ")\n" ++ ind i ++ "Parser " ++ pname name ++ " =\n" ++ prule (i+1) rule ++ "\n" ++ monadpfuncs i rest where -- Generate the code for a parse rule. prule i (RulePrim n) = ind i ++ "(Parser " ++ dvname n ++ ")\n" prule i (RuleChar c) = ind i ++ "(char " ++ show c ++ ")\n" prule i (RuleString s) = ind i ++ "(string " ++ show s ++ ")\n" prule i (RuleSeq matches prod) = ind i ++ "(do\n" ++ pseq matches where -- Match a rule without binding its semantic value pseq (MatchAnon r : ms) = prule (i+1) r ++ pseq ms -- Match a rule and give its semantic value the name 'id' pseq (MatchName r id : ms) = ind (i+1) ++ id ++ " <-\n" ++ prule (i+2) r ++ pseq ms -- Match the semantic value of a rule -- against a Haskell pattern pseq (MatchPat r p : ms) = ind (i+1) ++ p ++ " <-\n" ++ prule (i+2) r ++ pseq ms -- Match the semantic value of a rule against a string: -- typically used for keywords, operators, etc. pseq (MatchString r s : ms) = ind (i+1) ++ "satisfy\n" ++ prule (i+2) r ++ ind (i+2) ++ "((==) " ++ show s ++ ")\n" ++ ind (i+2) ++ "<?!> " ++ show (show s) ++ "\n" ++ pseq ms -- and-followed-by syntactic predicate pseq (MatchAnd r : ms) = ind (i+1) ++ "followedBy\n" ++ prule (i+2) r ++ pseq ms -- not-followed-by syntactic predicate pseq (MatchNot r : ms) = ind (i+1) ++ "notFollowedBy\n" ++ prule (i+2) r ++ pseq ms -- semantic predicate pseq (MatchPred pred : ms) = ind (i+1) ++ "if not (" ++ pred ++ ") then fail " ++ show ("failed predicate " ++ show pred) ++ " else return ()\n" ++ pseq ms -- end of sequence pseq [] = ind (i+1) ++ "return " ++ code prod ++ ")\n" where code (ProdName id) = id code (ProdCode c) = "(" ++ c ++ ")" prule i (RuleAlt []) = error "prule: no alternatives!" prule i (RuleAlt [r]) = prule i r prule i (RuleAlt (r : rs)) = ind i ++ "(\n" ++ prule (i+1) r ++ concat (map (\r -> ind i ++ " <|>\n" ++ prule (i+1) r) rs) ++ ind i ++ " )\n" prule i (RuleOpt r) = ind i ++ "optional\n" ++ prule (i+1) r prule i (RuleStar r) = ind i ++ "many\n" ++ prule (i+1) r prule i (RulePlus r) = ind i ++ "many1\n" ++ prule (i+1) r ---------- Pattern-matching parsing function code generation patpfuncs i [] = "" patpfuncs i ((name, typ, rule) : rest) = ind i ++ pname name ++ " :: " ++ derivstype ++ " -> Result " ++ derivstype ++ " (" ++ typ ++ ")\n" ++ ind i ++ pname name ++ " d =\n" ++ prule (i+1) "d" [] (\e -> "NoParse " ++ e) rule ++ "\n" ++ patpfuncs i rest where -- Generate the code for a parse rule, -- directly generating a result on success -- but using a continutation to produce failure results. -- The failure continuation may be called multiple times. -- prule indent derivs-name error-names fail-cont rule prule i d es f (RulePrim n) = ind i ++ "case " ++ dvexp n d ++ " of\n" ++ ind (i+1) ++ "Parsed " ++ v ++ " " ++ d' ++ " " ++ e' ++ " -> Parsed " ++ v ++ " " ++ d' ++ " " ++ join d es' ++ "\n" ++ ind (i+1) ++ "NoParse " ++ e' ++ " -> " ++ f (join d es') ++ "\n" where v = valname i d' = dvsname i e' = errname i es' = e' : es prule i d es f (RuleChar c) = ind i ++ "case " ++ dvexp "Char" d ++ " of\n" ++ ind (i+1) ++ "r @ (Parsed " ++ show c ++ " _ _) -> r\n" ++ ind (i+1) ++ "_ -> " ++ f (join d es) ++ "\n" prule i d es f (RuleSeq [MatchName r n] (ProdName n')) | n' == n = prule i d es f r prule i d es f (RuleSeq matches prod) = pseq i d es matches where -- Special handling for characters in sequences, -- preventing unnecessary generation of intermediates. pseq i d es (MatchAnon (RuleChar c) : ms) = ind i ++ "case " ++ dvexp "Char" d ++ " of\n" ++ ind (i+1) ++ "Parsed " ++ show c ++ " " ++ d' ++ " _ ->\n" ++ pseq (i+2) d' es ms ++ ind (i+1) ++ "_ -> " ++ f (join d es) ++ "\n" where d' = dvsname i -- Match a rule without binding its semantic value pseq i d es (MatchAnon r : ms) = pseq i d es (MatchName r "_" : ms) -- Match a rule and give its semantic value the name 'id' pseq i d es (MatchName r id : ms) = ind i ++ "case " ++ prexp i d r ++ " of\n" ++ ind (i+1) ++ "Parsed " ++ id ++ " " ++ d' ++ " " ++ e' ++ " ->\n" ++ pseq (i+2) d' (e':es) ms ++ ind (i+1) ++ "NoParse " ++ e' ++ " -> " ++ f (join d (e':es)) ++ "\n" ++ prexpdef i d r where d' = dvsname i e' = errname i -- Match the semantic value of a rule -- against a Haskell pattern pseq i d es (MatchPat r p : ms) = ind i ++ "case " ++ prexp i d r ++ " of\n" ++ ind (i+1) ++ "Parsed (" ++ p ++ ") " ++ d' ++ " " ++ e' ++ " ->\n" ++ pseq (i+2) d' (e':es) ms ++ ind (i+1) ++ "Parsed _ _ _ -> " ++ f (join d es) ++ "\n" ++ ind (i+1) ++ "NoParse " ++ e' ++ " -> " ++ f (join d (e':es)) ++ "\n" ++ prexpdef i d r where d' = dvsname i e' = errname i -- match the semantic value of a rule against a string: -- typically used for keywords, operators, etc. pseq i d es (MatchString r s : ms) = ind i ++ "case " ++ prexp i d r ++ " of\n" ++ ind (i+1) ++ "Parsed " ++ show s ++ " " ++ d' ++ " " ++ e' ++ " ->\n" ++ pseq (i+2) d' (e':es) ms ++ ind (i+1) ++ "_ -> " ++ f (join d (exp:es)) ++ "\n" ++ prexpdef i d r where d' = dvsname i e' = errname i exp = "(ParseError (" ++ dvexp "Pos" d ++ ") [Expected " ++ show s ++ "])" -- and-followed-by syntactic predicate pseq i d es (MatchAnd r : ms) = ind i ++ "case " ++ prexp i d r ++ " of\n" ++ ind (i+1) ++ "Parsed _ _ " ++ e' ++ " ->\n" ++ pseq (i+2) d (e':es) ms ++ ind (i+1) ++ "NoParse " ++ e' ++ " -> " ++ f (join d (e':es)) ++ "\n" ++ prexpdef i d r where e' = errname i -- not-followed-by syntactic predicate pseq i d es (MatchNot r : ms) = ind i ++ "case " ++ prexp i d r ++ " of\n" ++ ind (i+1) ++ "NoParse " ++ e' ++ " ->\n" ++ pseq (i+2) d (e':es) ms ++ ind (i+1) ++ "Parsed _ _ " ++ e' ++ " -> " ++ f (join d (e':es)) ++ "\n" ++ prexpdef i d r where e' = errname i -- semantic predicate pseq i d es (MatchPred pred : ms) = ind i ++ "case (" ++ pred ++ ") of\n" ++ ind (i+1) ++ "True ->\n" ++ pseq (i+2) d es ms ++ ind (i+1) ++ "False -> " ++ f (join d es) ++ "\n" -- end of sequence pseq i d es [] = ind i ++ "Parsed (" ++ code prod ++ ") " ++ d ++ " " ++ join d es ++ "\n" where code (ProdName id) = id code (ProdCode c) = c prule i d es f (RuleAlt []) = error "prule: no alternatives!" prule i d es f (RuleAlt [r]) = prule i d es f r prule i d es f (RuleAlt rs) = ind i ++ alt 1 (join d es) ++ " where\n" ++ ralt 1 rs where ralt j (r:rs) = ind (i+1) ++ alt j e' ++ " =\n" ++ prule (i+2) d [e'] (\e'' -> alt (j+1) e'') r ++ ralt (j+1) rs ralt j [] = ind (i+1) ++ alt j e' ++ " = " ++ f e' ++ "\n" alt j e = "pappyAlt" ++ show i ++ "_" ++ show j ++ " " ++ e e' = errname i prule i d es f (RuleOpt r) = ind i ++ "case " ++ prexp i d r ++ " of\n" ++ ind (i+1) ++ "Parsed " ++ v ++ " " ++ d' ++ " " ++ e' ++ " -> " ++ "Parsed (Just " ++ v ++ ") " ++ d' ++ " " ++ join d es' ++ "\n" ++ ind (i+1) ++ "NoParse " ++ e' ++ " -> " ++ "Parsed (Nothing) " ++ d ++ " " ++ join d es' ++ "\n" ++ prexpdef i d r where v = valname i d' = dvsname i e' = errname i es' = e' : es -- Special low-level rule for single-choice alternation on characters. -- The left factoring code in the simplifier produces these. prule i d es f (RuleSwitchChar alts dfl) = ind i ++ "case " ++ dvexp "Char" d ++ " of\n" ++ cases alts ++ deflt dfl where cases ((c, r) : crs) = ind (i+1) ++ "Parsed " ++ show c ++ " " ++ d' ++ " _ ->\n" ++ prule (i+2) d' es f r ++ cases crs where d' = dvsname i cases [] = "" deflt (Just r) = ind (i+1) ++ "_ ->\n" ++ prule (i+2) d es f r deflt (Nothing) = ind (i+1) ++ "_ -> " ++ f (join d es) ++ "\n" -- Override any errors produced by the associated rule -- with a given set of Expected strings. prule i d es f (RuleExpect r (s:ss)) = prule i d es f' r where f' _ = "ParseError (" ++ dvexp "Pos" d ++ ") [Expected " ++ show (show s) ++ concat (map (\s -> ", Expected " ++ show (show s)) ss) ++ "]\n" -- Produce a short expression for a rule. -- Any associated definition is produced by prexpdef below. prexp i d (RulePrim n) = dvexp n d prexp i d r = prexpname i prexpdef i d (RulePrim n) = "" prexpdef i d r = ind i ++ "where\n" ++ ind (i+1) ++ prexpname i ++ " =\n" ++ prule (i+2) d [] (\e -> "NoParse " ++ e) r prexpname i = "pappyResult" ++ show i -- Helper: generate expression to join a list of errors join d [] = "(ParseError (" ++ dvexp "Pos" d ++ ") [])" join d [e] = e join d [e,e'] = "(max " ++ e ++ " " ++ e' ++ ")" join d (e:es) = "(maximum [" ++ e ++ concat (map (\e -> "," ++ e) es) ++ "])" -- Helper: generate variable indentation for rule code ind i = replicate (i*2) ' ' -- Convert the first letter of a string to uppercase or lowercase, -- to produce Haskell type/constructor names or identifiers. upcase (c:cs) = toUpper c : cs locase (c:cs) = toLower c : csMemoAnalysis.hs Скачать
module MemoAnalysis (memoAnalysis) where import List import Pappy -- Cost metric for a rule as used below, -- Infinite if a cycle is detected. data Cost = Cost Int | Infinite instance Eq Cost where Cost c1 == Cost c2 = c1 == c2 Infinite == Infinite = True _ == _ = False instance Ord Cost where Cost c1 <= Cost c2 = c1 <= c2 Infinite <= Infinite = True Cost c1 <= Infinite = True Infinite <= Cost c2 = False instance Show Cost where show (Cost c) = show c show Infinite = "<infinite>" instance Num Cost where Cost c1 + Cost c2 = Cost (c1 + c2) Infinite + _ = Infinite _ + Infinite = Infinite c1 * c2 = error "no multiplication for Cost" abs c = error "no 'abs' for Cost" signum c = error "no 'signum' for Cost" fromInteger i = Cost (fromInteger i) -- Produce a list of the nonterminals in a grammar to be memoized, -- by analyzing the size and recursion structure of the grammar. -- What we are doing is essentially "virtual inlining": -- finding small- to medium-size definitions that we _could_ inline directly, -- but do not want to because of the code expansion they would cause, -- and instead turn those definitions into simple Haskell functions. -- Since every circularity in the grammar is still broken by memoization, -- this transformation does not break the linear parse time guarantee. memoAnalysis :: [Nonterminal] -> [Identifier] memoAnalysis nts = iter [] where -- Iterate over the grammar, -- at each step finding the smallest nonterminal that can be inlined, -- and adding that to our list of "virtual inline" nonterminals. -- We have to take the list into account in subsequent iterations -- in order to ensure that cylces stay broken, -- and to take virtual inlining into account in our size metrics: -- the size of the definition for a virtual-inlined nonterminal -- counts towards the sizes of rules that reference it. iter :: [Nonterminal] -> [Identifier] iter vnts = case sel vnts Nothing nts of Just vnt -> iter (vnt : vnts) Nothing -> memos nts where memos ((n, t, r) : nts) = case findnt n vnts of Just _ -> memos nts Nothing -> n : memos nts memos [] = [] -- Find the smallest nonterminal that can be inlined. sel :: [Nonterminal] -> Maybe (Nonterminal, Cost) -> [Nonterminal] -> Maybe Nonterminal sel vnts (Just (nt, c)) ((nt' @ (n', t', r')) : nts) = if use vnts c c' n' r' then sel vnts (Just (nt', c')) nts else sel vnts (Just (nt, c)) nts where c' = costrule vnts [n'] r' sel vnts (Nothing) ((nt' @ (n', t', r')) : nts) = if use vnts Infinite c' n' r' then sel vnts (Just (nt', c')) nts else sel vnts (Nothing) nts where c' = costrule vnts [n'] r' sel vnts (Just (nt, c)) [] = Just nt sel vnts (Nothing) [] = Nothing -- Taking a particular nonterminal as a candidate, -- determine if we should use it (possibly over an existing candidate). -- Only select nonterminals that aren't already in vnts. use vnts curcost newcost n r = newcost < curcost && newcost < 25 && notfound where notfound = case findnt n vnts of Just _ -> False Nothing -> True -- Scan a rule and all virtually-inlined rules it refers to, -- calculating the total cost of the rule, and making sure that -- we do not hit a nonterminal we've already visited. -- ns: nonterminals we have already visited in our search. costrule :: [Nonterminal] -> [Identifier] -> Rule -> Cost costrule vnts ns (RulePrim n) = if n `elem` ns then Infinite else case findnt n vnts of Just (_, _, r) -> 1 + costrule vnts (n:ns) r Nothing -> Cost 1 -- not virtually-inlined (yet) costrule vnts ns (RuleChar c) = 1 costrule vnts ns (RuleSeq ms p) = 1 + seq ms where seq (MatchAnon r : ms) = costrule vnts ns r + seq ms seq (MatchName r id : ms) = costrule vnts ns r + seq ms seq (MatchPat r p : ms) = costrule vnts ns r + seq ms seq (MatchString r s : ms) = costrule vnts ns r + seq ms seq (MatchAnd r : ms) = costrule vnts ns r + seq ms seq (MatchNot r : ms) = costrule vnts ns r + seq ms seq (MatchPred c : ms) = 1 seq [] = 0 costrule vnts ns (RuleAlt rs) = 1 + alts rs where alts (r : rs) = costrule vnts ns r + alts rs alts [] = 0 costrule vnts ns (RuleOpt r) = costrule vnts ns r costrule vnts ns (RuleString s) = 1 costrule vnts ns (RuleStar r) = costrule vnts ns r costrule vnts ns (RulePlus r) = costrule vnts ns r costrule vnts ns (RuleExpect r ss) = costrule vnts ns r costrule vnts ns (RuleSwitchChar crs dfl) = 1 + cases crs + defl dfl where cases ((c, r) : crs) = costrule vnts ns r + cases crs cases [] = 0 defl (Just r) = costrule vnts ns r defl (Nothing) = 0 -- Find a named nonterminal in a list. findnt :: Identifier -> [Nonterminal] -> Maybe Nonterminal findnt n ((nt @ (n', t', r')) : nts) | n' == n = Just nt findnt n (nt : nts) = findnt n nts findnt n [] = NothingSimplifyGrammar.hs Скачать
module SimplifyGrammar (simplifyGrammar) where import Pappy -- Rewrite composite and simple left recursive rules simplifyGrammar :: Grammar -> Grammar simplifyGrammar (parsername, topcode, topnts, nonterms) = (parsername, topcode, topnts, iteropt topnts nonterms) -- Iterate the optimizations until no more simplifications are possible, -- since eliminating one nonterminal may make others reducible. -- Be careful not to eliminate any nonterminals listed in 'tops'. iteropt :: [Identifier] -> [Nonterminal] -> [Nonterminal] iteropt tops nts = nts'''' where (nts', peepholed) = peephole nts (nts'', collapsed) = collapse tops nts' (nts''', inlined) = inline tops nts'' nts'''' = if peepholed || collapsed || inlined then iteropt tops nts''' else nts''' -- Simple peephole grammar optimizer, -- implementing various local simplifications. peephole :: [Nonterminal] -> ([Nonterminal], Bool) peephole g = nonterms g where nonterms ((n, t, r) : g) = ((n, t, r') : g', sr || sg) where (r', sr) = rule r (g', sg) = nonterms g nonterms [] = ([], False) ---------- Simplify a rule rule :: Rule -> (Rule, Bool) rule (r @ (RulePrim n)) = (r, False) rule (r @ (RuleChar c)) = (r, False) rule (r @ (RuleString s)) = (r, False) -- Eliminate useless sequencing rule (RuleSeq [MatchName r id] (ProdName id')) | id' == id = (r', True) where (r', _) = rule r rule (RuleSeq ms p) = (RuleSeq ms' p, sms) where (ms', sms) = seq ms -- Eliminate degenerate alternation rule (RuleAlt [r]) = (r', True) where (r', _) = rule r -- Flatten nested alternation operators rule (RuleAlt (RuleAlt rs1 : rs2)) = (r'', True) where r' = RuleAlt (rs1 ++ rs2) (r'', _) = rule r' rule (RuleAlt rs) = (RuleAlt rs', srs) where (rs', srs) = alts rs rule (RuleOpt r) = (RuleOpt r', sr) where (r', sr) = rule r rule (RuleStar r) = (RuleStar r', sr) where (r', sr) = rule r rule (RulePlus r) = (RulePlus r', sr) where (r', sr) = rule r rule (RuleSwitchChar crs dfl) = cases False [] crs where cases b ncrs ((c, r) : crs) = cases (b' || b) ((c, r') : ncrs) crs where (r', b') = rule r cases b ncrs [] = dodfl b (reverse ncrs) dfl dodfl b ncrs (Just r) = (RuleSwitchChar ncrs (Just r'), b || b') where (r', b') = rule r dodfl b ncrs (Nothing) = (RuleSwitchChar ncrs Nothing, b) ---------- Simplify a list of alternatives alts :: [Rule] -> ([Rule], Bool) -- Left-factor consecutive alternatives starting with characters, -- producing an efficiently-implementable SwitchChar rule. alts (rs @ (RuleSeq (MatchAnon (RuleChar _) : _) _ : RuleSeq (MatchAnon (RuleChar _) : _) _ : _)) = mksw [] rs where -- Collect alternatives starting with a character as cases. mksw cases (RuleSeq (MatchAnon (RuleChar c) : ms) p : rs) = mksw (addcase c (RuleSeq ms p) cases) rs -- If we hit an empty alternative, it always matches, -- so we can use it as the default case and ignore any others. mksw cases (RuleSeq [] p : _) = (rs'', True) where rs' = [RuleSwitchChar cases (Just (RuleSeq [] p))] (rs'', _) = alts rs' -- If we hit some other kind of alternative, -- we have to stop there and keep the rest separate. mksw cases rs = (rs'', True) where rs' = RuleSwitchChar cases Nothing : rs (rs'', _) = alts rs' -- Add an alternative to a list of cases, merging appropriately addcase c r [] = [(c, r)] addcase c r ((c', RuleAlt rs) : crs) | c == c' = (c, RuleAlt (rs ++ [r])) : crs addcase c r ((c', r') : crs) | c == c' = (c, RuleAlt [r', r]) : crs addcase c r ((c', r') : crs) = (c', r') : addcase c r crs -- Special degenerate case of character left-factoring. alts [RuleSeq (MatchAnon (RuleChar c) : ms) p, RuleSeq [] p2] = ([RuleSwitchChar [(c, r1')] (Just r2')], True) where (r1', _) = rule (RuleSeq ms p) (r2', _) = rule (RuleSeq [] p2) alts (r : rs) = (r' : rs', sr || srs) where (r', sr) = rule r (rs', srs) = alts rs alts [] = ([], False) ---------- Simplify a list of matches in a sequence seq :: [Match] -> ([Match], Bool) -- Flatten nested sequencing operators if possible seq (MatchAnon (RuleSeq ms1 _) : ms2) = (ms', True) where (ms', _) = seq (ms1 ++ ms2) seq ((m @ (MatchName (RuleSeq ms1 (ProdName idi)) ido)) : ms2) = flattenseq m ms1 idi (\r -> MatchName r ido) ms2 seq ((m @ (MatchPat (RuleSeq ms1 (ProdName idi)) p)) : ms2) = flattenseq m ms1 idi (\r -> MatchPat r p) ms2 seq ((m @ (MatchString (RuleSeq ms1 (ProdName idi)) s)) : ms2) = flattenseq m ms1 idi (\r -> MatchString r s) ms2 -- -- merge anonymous character and string sequences -- seq (MatchAnon (RuleChar c1) : MatchAnon (RuleChar c2) : ms) = -- (ms'', True) where -- ms' = MatchAnon (RuleString [c1, c2]) : ms -- (ms'', _) = seq ms' -- -- seq (MatchAnon (RuleChar c1) : MatchAnon (RuleString s2) : ms) = -- (ms'', True) where -- ms' = MatchAnon (RuleString (c1 : s2)) : ms -- (ms'', _) = seq ms' -- -- seq (MatchAnon (RuleString s1) : MatchAnon (RuleChar c2) : ms) = -- (ms'', True) where -- ms' = MatchAnon (RuleString (s1 ++ [c2])) : ms -- (ms'', _) = seq ms' -- -- seq (MatchAnon (RuleString s1) : MatchAnon (RuleString s2) : ms) = -- (ms'', True) where -- ms' = MatchAnon (RuleString (s1 ++ s2)) : ms -- (ms'', _) = seq ms' seq (m : ms) = (m' : ms', sm || sms) where (m', sm) = match m (ms', sms) = seq ms seq [] = ([], False) -- Helper for sequence flattening code above flattenseq m ms1 idi f ms2 = case simpidi ms1 of True -> (ms', True) where (ms', _) = seq (rebind ms1 ++ ms2) False -> (m' : ms2', sm || sms2) where (m', sm) = match m (ms2', sms2) = seq ms2 where -- Make sure idi is bound by a simple MatchName, -- rather than within a Haskell pattern. simpidi [] = False simpidi (MatchName d id' : ms) | id' == idi = True simpidi (m : ms) = simpidi ms -- Convert the matcher for idi into the appropriate substitute rebind [] = error "flattenseq/rebind - oops!" rebind (MatchName r id' : ms) | id' == idi = f r : ms rebind (m : ms) = m : rebind ms ---------- Simplify a single match in a sequence match :: Match -> (Match, Bool) match (MatchAnon r) = (MatchAnon r', sr) where (r', sr) = rule r match (MatchName r id) = (MatchName r' id, sr) where (r', sr) = rule r match (MatchPat r p) = (MatchPat r' p, sr) where (r', sr) = rule r match (MatchString r s) = (MatchString r' s, sr) where (r', sr) = rule r match (MatchAnd r) = (MatchAnd r', sr) where (r', sr) = rule r match (MatchNot r) = (MatchNot r', sr) where (r', sr) = rule r match (r @ (MatchPred p)) = (r, False) -- Search for and eliminate nonterminals in a grammar -- that are structurally equivalent to some other nonterminal. -- This optimization is fairly straightforward and stupid: -- e.g., it doesn't catch structurally equivalent -- mutually recursive _groups_ of nonterminals - -- but I have no reason to believe such groups are likely to be common. -- Most of the duplicates this pass finds -- are star- and plus-rules generated above, -- but we run the optimization over the whole grammar anyway -- just in case the original grammar has some duplicates we can eliminate. collapse :: [Identifier] -> [Nonterminal] -> ([Nonterminal], Bool) collapse tops g = scandups [] (\n -> n) g where -- Scan a grammar for duplicates to eliminate. -- ns is the list of nonterminals we can eliminate, -- and f is a function to rename those nonterminals correctly. -- We will actually eliminate those nonterminals all at once later. -- Always rename to the _last_ duplicate in the grammar, -- to make sure it won't also go away if there are several. scandups ns f ((n, t, r) : g) = if n `elem` tops then scandups ns f g else case scandupof n t r f (reverse g) of Just n' -> scandups (n:ns) f' g where f' nx = if nx == n then n' else f nx Nothing -> scandups ns f g scandups ns f [] = case ns of [] -> (g, False) _ -> (rebuild ns f g, True) -- Scan for a duplicate of a particular nonterminal. scandupof n t r f ((n', t', r') : g) = if n' == n then error "shouldn't have found myself!" else if t' == t && equivRule f' r r' then Just n' else scandupof n t r f g where f' nx = if nx == n then n' else f nx scandupof n t r f [] = Nothing -- Check two rules for structural equivalence -- under the nonterminal renaming function f. equivRule f (RulePrim n1) (RulePrim n2) = f n1 == f n2 equivRule f (RuleChar c1) (RuleChar c2) = c1 == c2 equivRule f (RuleString s1) (RuleString s2) = s1 == s2 equivRule f (RuleSeq ms1 p1) (RuleSeq ms2 p2) = p1 == p2 && seq ms1 ms2 where seq (m1 : ms1) (m2 : ms2) = equivMatch f m1 m2 && seq ms1 ms2 seq [] [] = True seq _ _ = False equivRule f (RuleAlt rs1) (RuleAlt rs2) = alts rs1 rs2 where alts (r1 : rs1) (r2 : rs2) = equivRule f r1 r2 && alts rs1 rs2 alts [] [] = True alts _ _ = False equivRule f (RuleOpt r1) (RuleOpt r2) = equivRule f r1 r2 equivRule f (RuleStar r1) (RuleStar r2) = equivRule f r1 r2 equivRule f (RulePlus r1) (RulePlus r2) = equivRule f r1 r2 equivRule f (RuleSwitchChar crs1 dfl1) (RuleSwitchChar crs2 dfl2) = cases crs1 crs2 && eqdefl dfl1 dfl2 where cases ((c1, r1) : crs1) ((c2, r2) : crs2) = c1 == c2 && equivRule f r1 r2 && cases crs1 crs2 cases [] [] = True cases _ _ = False eqdefl (Just r1) (Just r2) = equivRule f r1 r2 eqdefl (Nothing) (Nothing) = True eqdefl _ _ = False equivRule f _ _ = False equivMatch f (MatchAnon r1) (MatchAnon r2) = equivRule f r1 r2 equivMatch f (MatchName r1 id1) (MatchName r2 id2) = id1 == id2 && equivRule f r1 r2 equivMatch f (MatchPat r1 p1) (MatchPat r2 p2) = p1 == p2 && equivRule f r1 r2 equivMatch f (MatchString r1 s1) (MatchString r2 s2) = s1 == s2 && equivRule f r1 r2 equivMatch f (MatchAnd r1) (MatchAnd r2) = equivRule f r1 r2 equivMatch f (MatchNot r1) (MatchNot r2) = equivRule f r1 r2 equivMatch f (MatchPred p1) (MatchPred p2) = p1 == p2 equivMatch f _ _ = False -- Rebuild a grammar, eliminating definitions for nonterminals in 'ns' -- and applying renaming function 'f' to all nonterminal references. rebuild ns f [] = [] rebuild ns f ((n, t, r) : g) = if n `elem` ns then rebuild ns f g else (n, t, r') : rebuild ns f g where r' = rebuildRule f r rebuildRule f (RulePrim n) = RulePrim (f n) rebuildRule f (RuleChar c) = RuleChar c rebuildRule f (RuleString s) = RuleString s rebuildRule f (RuleSeq ms p) = RuleSeq (reseq ms) p where reseq (m : ms) = rebuildMatch f m : reseq ms reseq [] = [] rebuildRule f (RuleAlt rs) = RuleAlt (alts rs) where alts (r : rs) = rebuildRule f r : alts rs alts [] = [] rebuildRule f (RuleOpt r) = RuleOpt (rebuildRule f r) rebuildRule f (RuleStar r) = RuleStar (rebuildRule f r) rebuildRule f (RulePlus r) = RulePlus (rebuildRule f r) rebuildRule f (RuleSwitchChar crs dfl) = RuleSwitchChar (cases crs) (defl dfl) where cases ((c, r) : crs) = (c, rebuildRule f r) : cases crs cases [] = [] defl (Just r) = Just (rebuildRule f r) defl (Nothing) = Nothing rebuildMatch f (MatchAnon r) = MatchAnon (rebuildRule f r) rebuildMatch f (MatchName r id) = MatchName (rebuildRule f r) id rebuildMatch f (MatchPat r p) = MatchPat (rebuildRule f r) p rebuildMatch f (MatchString r s) = MatchString (rebuildRule f r) s rebuildMatch f (MatchAnd r) = MatchAnd (rebuildRule f r) rebuildMatch f (MatchNot r) = MatchNot (rebuildRule f r) rebuildMatch f (MatchPred p) = MatchPred p -- Inline all nonterminals that are only referenced once in a grammar. -- Also eliminates any nonterminals not referenced anywhere -- except perhaps in their own rule. inline :: [Identifier] -> [Nonterminal] -> ([Nonterminal], Bool) inline tops g = scan g where -- Scan for a nonterminal we can eliminate. scan ((n, t, r) : nts) | n `elem` tops = scan nts scan ((n, t, r) : nts) = if elim then (rebuild n g, True) else scan nts where elim = case refs n g of 0 -> True 1 -> refsRule n r == 0 _ -> refsRule n r == 0 && sizeofRule r <= 2 scan [] = (g, False) -- nothing eliminated -- Count the number of references to a particular nonterminal -- *outside of* that nonterminal's own definition. refs n [] = 0 refs n ((n', t', r') : nts) = if (n == n') then refs n nts else refsRule n r' + refs n nts -- Count the number of references to nonterminal 'n' in rule 'r' refsRule :: Identifier -> Rule -> Int refsRule n (RulePrim n') = if n' == n then 1 else 0 refsRule n (RuleChar c) = 0 refsRule n (RuleString s) = 0 refsRule n (RuleSeq ms p) = seq ms where seq (m : ms) = refsMatch n m + seq ms seq [] = 0 refsRule n (RuleAlt rs) = alts rs where alts (r : rs) = refsRule n r + alts rs alts [] = 0 refsRule n (RuleOpt r) = refsRule n r refsRule n (RuleStar r) = refsRule n r refsRule n (RulePlus r) = refsRule n r refsRule n (RuleSwitchChar crs dfl) = cases crs + deflt dfl where cases ((c, r) : crs) = refsRule n r + cases crs cases [] = 0 deflt (Just r) = refsRule n r deflt (Nothing) = 0 refsMatch n (MatchAnon r) = refsRule n r refsMatch n (MatchName r id) = refsRule n r refsMatch n (MatchPat r p) = refsRule n r refsMatch n (MatchString r s) = refsRule n r refsMatch n (MatchAnd r) = refsRule n r refsMatch n (MatchNot r) = refsRule n r refsMatch n (MatchPred p) = 0 -- Rebuild the grammar, eliminating nonterminals in 'ns' -- and applying the inlining function 'f'. rebuild ns [] = [] rebuild ns ((n, t, r) : nts) = if n == ns then rebuild ns nts else (n, t, rebuildRule ns r) : rebuild ns nts rebuildRule ns (RulePrim n) = if n == ns then rebuildRule ns (find n g) else RulePrim n rebuildRule ns (RuleChar c) = RuleChar c rebuildRule ns (RuleString s) = RuleString s rebuildRule ns (RuleSeq ms p) = RuleSeq (reseq ms) p where reseq (m : ms) = rebuildMatch ns m : reseq ms reseq [] = [] rebuildRule ns (RuleAlt rs) = RuleAlt (alts rs) where alts (r : rs) = rebuildRule ns r : alts rs alts [] = [] rebuildRule ns (RuleOpt r) = RuleOpt (rebuildRule ns r) rebuildRule ns (RuleStar r) = RuleStar (rebuildRule ns r) rebuildRule ns (RulePlus r) = RulePlus (rebuildRule ns r) rebuildRule ns (RuleSwitchChar crs dfl) = RuleSwitchChar (cases crs) (deflt dfl) where cases ((c, r) : crs) = (c, rebuildRule ns r) : cases crs cases [] = [] deflt (Just r) = Just (rebuildRule ns r) deflt (Nothing) = Nothing rebuildMatch ns (MatchAnon r) = MatchAnon (rebuildRule ns r) rebuildMatch ns (MatchName r id) = MatchName (rebuildRule ns r) id rebuildMatch ns (MatchPat r p) = MatchPat (rebuildRule ns r) p rebuildMatch ns (MatchString r s) = MatchString (rebuildRule ns r) s rebuildMatch ns (MatchAnd r) = MatchAnd (rebuildRule ns r) rebuildMatch ns (MatchNot r) = MatchNot (rebuildRule ns r) rebuildMatch ns (MatchPred p) = MatchPred p -- Find the definition rule for a particular nonterminal find n ((n', t', r') : nts) = if n' == n then r' else find n nts find n [] = error ("inline: can't find nonterminal " ++ n)ReduceGrammar.hs Скачать
-- Pappy grammar analysis/reduction module: -- -- - checks that all referenced nonterminals have definitions -- - checks that nonterminals are not defined multiple times -- - checks for illegal (e.g., indirect) left recursion -- - rewrites simple left-recursive rules using right recursion -- - rewrites '*' (zero-or-more) and '+' (one-or-more) iteration rules -- module ReduceGrammar (reduceGrammar) where import Pappy -- Rewrite composite and simple left recursive rules reduceGrammar :: Grammar -> Grammar reduceGrammar (parsername, topcode, topnts, nonterms) = g'' where -- First reduce the grammar g' = (parsername, topcode, topnts, reverse (reducents [] nonterms)) -- Then check it for remaining illegal left recursion g'' = case checkLeftRecursion g' of Just e -> error e Nothing -> g' -- Reduce the rules in a grammar and add them to 'ng' reducents ng [] = ng reducents ng ((n, t, r) : g) = if existstnt n ng then error ("Duplicate nonterminal " ++ show n) else reducents ng''' g where -- First rewrite composite constructs in the rule (ng', r') = rerule ng r -- Then eliminate simple left recursion, if any (ng'', r'') = elimleft ng' n t r' -- Add the final reduced nonterminal to the grammar ng''' = (n, t, r'') : ng'' -- Reduce iterative operators in a grammar rule. rerule ng (r @ (RulePrim n)) = if existstnt n nonterms then (ng, r) else error ("Reference to undefined nonterminal " ++ show n) rerule ng (r @ (RuleChar c)) = (ng, r) rerule ng (RuleSeq ms p) = (ng', RuleSeq ms' p) where (ng', ms') = reseq ng ms reseq ng [] = (ng, []) reseq ng (MatchAnon r : ms) = (ng'', MatchAnon r' : ms') where (ng', r') = rerule ng r (ng'', ms') = reseq ng' ms reseq ng (MatchName r id : ms) = (ng'', MatchName r' id : ms') where (ng', r') = rerule ng r (ng'', ms') = reseq ng' ms reseq ng (MatchPat r p : ms) = (ng'', MatchPat r' p : ms') where (ng', r') = rerule ng r (ng'', ms') = reseq ng' ms reseq ng (MatchString r s : ms) = (ng'', MatchString r' s : ms') where (ng', r') = rerule ng r (ng'', ms') = reseq ng' ms reseq ng (MatchAnd r : ms) = (ng'', MatchAnd r' : ms') where (ng', r') = rerule ng r (ng'', ms') = reseq ng' ms reseq ng (MatchNot r : ms) = (ng'', MatchNot r' : ms') where (ng', r') = rerule ng r (ng'', ms') = reseq ng' ms reseq ng (MatchPred c : ms) = (ng', MatchPred c : ms') where (ng', ms') = reseq ng ms rerule ng (RuleAlt [alt]) = rerule ng alt rerule ng (RuleAlt alts) = (ng', RuleAlt alts') where (ng', alts') = rerules ng alts rerules ng [] = (ng, []) rerules ng (r : rs) = (ng'', r' : rs') where (ng', r') = rerule ng r (ng'', rs') = rerules ng' rs rerule ng (RuleOpt r) = (ng', RuleOpt r') where (ng', r') = rerule ng r -- Reduce string literals to character sequences rerule ng (RuleString s) = (ng, RuleSeq (matches s) (ProdCode (show s))) where matches (c : cs) = MatchAnon (RuleChar c) : matches cs matches [] = [] -- Reduce zero-or-more (star operator) rules rerule ng (RuleStar r) = (ng'', RulePrim n'') where (ng', r') = rerule ng r n'' = newnt "StarRule" (nonterms ++ ng) t'' = "[" ++ infertype r' ++ "]" r'' = RuleAlt [ RuleSeq [MatchName r' "v", MatchName (RulePrim n'') "vs"] (ProdCode "v : vs"), RuleSeq [] (ProdCode "[]")] ng'' = (n'', t'', r'') : ng' -- reuse existing equivalent iteration nonterminals -- Reduce one-or-more (plus operator) rules rerule ng (RulePlus r) = (ng'', RulePrim n'') where (ng', r') = rerule ng r n'' = newnt "PlusRule" (nonterms ++ ng) t'' = "[" ++ infertype r' ++ "]" r'' = RuleAlt [ RuleSeq [MatchName r' "v", MatchName (RulePrim n'') "vs"] (ProdCode "v : vs"), RuleSeq [MatchName r' "v"] (ProdCode "[v]")] ng'' = (n'', t'', r'') : ng' -- reuse existing equivalent iteration nonterminals -- Eliminate simple left recursion in a grammar rule elimleft ng n t (r @ (RuleAlt alts)) = fix (scan alts) where -- Separate left-recursive alternatives (las) -- from terminating alternatives (tas). scan [] = ([], []) scan ((ra @ (RuleSeq (MatchName (RulePrim n') id' : ms') p)) : ras) = if n' == n then (ra : las, tas) else (las, ra : tas) where (las, tas) = scan ras scan (ra : ras) = (las, ra : tas) where (las, tas) = scan ras -- Trivial case: no left-recursive alternatives fix ([], _) = (ng, r) -- Illegal case: no terminating alternatives fix (_, []) = error ("No termination for left recursive rule " ++ show n) -- Left recursive case fix (las, tas) = (ng', r') where ntail = n ++ "Tail" ttail = "(" ++ t ++ " -> " ++ t ++ ")" rtail = RuleAlt (map tailalt las ++ [rnull]) rnull = RuleSeq [] (ProdCode "\\v -> v") ng' = (ntail, ttail, rtail) : ng r' = RuleAlt (map headalt tas) headalt r = RuleSeq [MatchName r "l", MatchName (RulePrim ntail) "t"] (ProdCode "t l") tailalt (RuleSeq (MatchName _ id : ms) p) = RuleSeq (ms ++ [m]) (ProdCode code) where m = MatchName (RulePrim ntail) "pappyTail" code = "\\" ++ id ++ " -> pappyTail (" ++ oldcode ++ ")" oldcode = case p of ProdName id' -> id' ProdCode c -> c -- Default case when nonterminal isn't an alternation elimleft ng n t r = (ng, r) -- Infer the type of a rule expression, for use by */+ reducers. -- Doesn't work on sequences with results produced by raw Haskell code. infertype (RulePrim "Char") = "Char" infertype (RulePrim n) = t where (t, r) = findnt n nonterms infertype (RuleChar c) = "Char" infertype (RuleString s) = "String" infertype (RuleSeq ms (ProdName id)) = findmatch ms where findmatch [] = error ("Match variable " ++ id ++ " not found") findmatch (MatchName r id' : ms) = if id' == id then infertype r else findmatch ms findmatch (m : ms) = findmatch ms infertype (RuleAlt (r : rs)) = infertype r -- ignore others infertype (RuleOpt r) = "Maybe (" ++ infertype r ++ ")" infertype (RuleStar r) = "[" ++ infertype r ++ "]" infertype (RulePlus r) = "[" ++ infertype r ++ "]" infertype r = error ("Unable to infer type of: " ++ show r) -- After simple left recursion has been eliminated, -- check for any remaining (illegal) left recursion. -- Takes a grammar and returns an error message if any is found. checkLeftRecursion :: Grammar -> Maybe String checkLeftRecursion (parsername, topcode, topnts, nonterms) = checkgram nonterms nonterms where -- Iterate through the grammar checking each nonterminal. checkgram :: [Nonterminal] -> [Nonterminal] -> Maybe String checkgram g ((n, t, r) : nts) = case (checknt g [n] r, checkgram g nts) of ((_, Just e), _) -> Just e (_, Just e) -> Just e _ -> Nothing checkgram g [] = Nothing -- checknt takes a list of nonterminals that have been visited -- and the rule to check, -- and descends into the rule detecting circular left-references -- to those nonterminals. -- It returns True if the rule can match the empty string, -- False otherwise. checknt :: [Nonterminal] -> [Identifier] -> Rule -> (Bool, Maybe String) checknt g ns (RulePrim n) = if n == "Char" then (False, Nothing) else if n `elem` ns then (True, Just ("Illegal left recursion: " ++ concat (map (\n -> n ++ " -> ") (reverse ns)) ++ n)) else let (t, r) = findnt n g in checknt g (n:ns) r checknt g ns (RuleChar c) = (False, Nothing) checknt g ns (RuleString s) = (s == [], Nothing) checknt g ns (RuleSeq ms p) = cseq ms where cseq (MatchAnon r : ms) = seq (checknt g ns r) (cseq ms) cseq (MatchName r id : ms) = seq (checknt g ns r) (cseq ms) cseq (MatchPat r p : ms) = seq (checknt g ns r) (cseq ms) cseq (MatchString r s : ms) = seq (checknt g ns r) (cseq ms) cseq (MatchAnd r : ms) = pred (checknt g ns r) (cseq ms) cseq (MatchNot r : ms) = pred (checknt g ns r) (cseq ms) cseq (MatchPred c : ms) = pred (True, Nothing) (cseq ms) cseq [] = (True, Nothing) -- Used for normal sequence matchers seq (_, Just e) _ = (True, Just e) seq (True, Nothing) r2 = r2 seq r1 _ = r1 -- Used for predicate matchers, which always match empty pred (_, Just e) _ = (True, Just e) pred (_, Nothing) r2 = r2 checknt g ns (RuleAlt rs) = calts rs where calts [r] = checknt g ns r calts (r : rs) = case (checknt g ns r, calts rs) of ((_, Just e), _) -> (True, Just e) (_, (_, Just e)) -> (True, Just e) ((b1, _), (b2, _)) -> (b1 || b2, Nothing) checknt g ns (RuleOpt r) = case checknt g ns r of (_, Just e) -> (True, Just e) _ -> (True, Nothing) checknt g ns (RuleStar r) = case checknt g ns r of (_, Just e) -> (True, Just e) _ -> (True, Nothing) checknt g ns (RulePlus r) = checknt g ns r -- Check if a terminal or nonterminal of a given name exists existstnt n nts = n == "Char" || existsnt n nts -- Check if a nonterminal of a given name exists in a grammar existsnt n [] = False existsnt n ((n', _, _) : nts) = (n' == n) || existsnt n nts -- Find the type and rule for a given nonterminal in the grammar findnt n [] = error ("Nonterminal " ++ show n ++ " not found") findnt n ((n', t, r) : nts) = if n' == n then (t, r) else findnt n nts -- Construct a name for a new nonterminal out of a given basename, -- using ascending numeric indices to prevent conflicts newnt base nts = scan 0 where scan i = let n = base ++ show i in if existstnt n nts then scan (i+1) else nReadGrammar.hs Скачать
-- Packrat parser for Pappy grammar definition files module ReadGrammar (pappyParse, pappyParseFile) where import Char import Pos import Parse import Pappy -------------------- Top-level tie-up -------------------- data PappyDerivs = PappyDerivs { -- Grammar definition structure pdGrammar :: Result PappyDerivs Grammar, pdNonterminal :: Result PappyDerivs Nonterminal, -- Grammar rule expressions pdAltRule :: Result PappyDerivs Rule, pdSeqRule :: Result PappyDerivs Rule, pdSeqMatch :: Result PappyDerivs Match, pdUnaryRule :: Result PappyDerivs Rule, pdPrimRule :: Result PappyDerivs Rule, -- Lexical structure pdIdentifier :: Result PappyDerivs String, pdWord :: Result PappyDerivs String, pdSymbol :: Result PappyDerivs String, pdRawCode :: Result PappyDerivs String, pdCodeBlock :: Result PappyDerivs String, pdCodeChars :: Result PappyDerivs String, pdCodeSQChars :: Result PappyDerivs String, pdCodeDQChars :: Result PappyDerivs String, pdCharLit :: Result PappyDerivs Char, pdStringLit :: Result PappyDerivs String, pdWhitespace :: Result PappyDerivs String, -- Input text pdChar :: Result PappyDerivs Char, pdPos :: Pos } instance Derivs PappyDerivs where dvPos d = pdPos d dvChar d = pdChar d grammar = Parser pdGrammar nonterminal = Parser pdNonterminal altRule = Parser pdAltRule seqRule = Parser pdSeqRule seqMatch = Parser pdSeqMatch unaryRule = Parser pdUnaryRule primRule = Parser pdPrimRule identifier = Parser pdIdentifier word = Parser pdWord symbol = Parser pdSymbol rawCode = Parser pdRawCode codeBlock = Parser pdCodeBlock codeChars = Parser pdCodeChars codeSQChars = Parser pdCodeSQChars codeDQChars = Parser pdCodeDQChars charLit = Parser pdCharLit stringLit = Parser pdStringLit whitespace = Parser pdWhitespace pappyParse :: String -> String -> Either Grammar String pappyParse name text = let initpos = Pos name 1 1 d = pappyDerivs initpos text in case pdGrammar d of Parsed g _ _ -> Left g NoParse e -> Right (show e) pappyParseFile :: String -> IO Grammar pappyParseFile filename = do text <- readFile filename let initpos = Pos filename 1 1 d = pappyDerivs initpos text case pdGrammar d of Parsed g _ _ -> return g NoParse e -> fail (show e) pappyDerivs :: Pos -> String -> PappyDerivs pappyDerivs pos text = d where d = PappyDerivs (pGrammar d) (pNonterminal d) (pAltRule d) (pSeqRule d) (pSeqMatch d) (pUnaryRule d) (pPrimRule d) (pIdentifier d) (pWord d) (pSymbol d) (pRawCode d) (pCodeBlock d) (pCodeChars d) (pCodeSQChars d) (pCodeDQChars d) (pCharLit d) (pStringLit d) (pWhitespace d) chr pos chr = case text of [] -> NoParse (eofError d) (c:cs) -> Parsed c (pappyDerivs (nextPos pos c) cs) (nullError d) -------------------- Grammar Definition Structure -------------------- pGrammar :: PappyDerivs -> Result PappyDerivs Grammar Parser pGrammar = do whitespace keyword "parser"; name <- identifier; sym ":" c1 <- rawCode </> return [] keyword "top"; tops <- sepBy1 identifier (sym ",") ns <- many1 nonterminal c2 <- rawCode </> return [] let code = c1 ++ "\n" ++ c2 ++ "\n" notFollowedBy anyChar return (name, code, tops, ns) pNonterminal :: PappyDerivs -> Result PappyDerivs Nonterminal Parser pNonterminal = do n <- identifier sym "::" t <- identifier </> rawCode sym "=" r <- altRule return (n, t, r) -------------------- Grammar Rule Expressions -------------------- pAltRule :: PappyDerivs -> Result PappyDerivs Rule Parser pAltRule = do alts <- sepBy1 seqRule (sym "/") case alts of [a] -> return a _ -> return (RuleAlt alts) pSeqRule :: PappyDerivs -> Result PappyDerivs Rule Parser pSeqRule = (do ms <- many seqMatch sym "->" id <- identifier return (RuleSeq ms (ProdName id))) </> (do ms <- many seqMatch sym "->" c <- rawCode return (RuleSeq ms (ProdCode c))) </> unaryRule pSeqMatch :: PappyDerivs -> Result PappyDerivs Match Parser pSeqMatch = (do i <- identifier sym ":" r <- unaryRule return (MatchName r i)) </> (do p <- rawCode sym ":" r <- unaryRule return (MatchPat r p)) </> (do c <- charLit sym ":" r <- unaryRule return (MatchString r [c])) </> (do s <- stringLit sym ":" r <- unaryRule return (MatchString r s)) </> (do sym "&" r <- unaryRule return (MatchAnd r)) </> (do sym "!" r <- unaryRule return (MatchNot r)) </> (do sym "&" p <- rawCode return (MatchPred p)) </> (do r <- unaryRule return (MatchAnon r)) pUnaryRule :: PappyDerivs -> Result PappyDerivs Rule Parser pUnaryRule = (do r <- primRule sym "?" return (RuleOpt r)) </> (do r <- primRule sym "+" return (RulePlus r)) </> (do r <- primRule sym "*" return (RuleStar r)) </> primRule pPrimRule :: PappyDerivs -> Result PappyDerivs Rule Parser pPrimRule = (do n <- identifier return (RulePrim n)) </> (do c <- charLit return (RuleChar c)) </> (do s <- stringLit return (RuleString s)) </> (do sym "(" r <- altRule sym ")" return r) -------------------- Lexical Structure -------------------- -- Keywords and identifiers keyword :: String -> Parser PappyDerivs String keyword w = (do s <- word if s == w then return w else fail "") <?!> show w pIdentifier :: PappyDerivs -> Result PappyDerivs String Parser pIdentifier = (do s <- word if s `elem` keywords then fail "" else return s) <?!> "identifier" pWord :: PappyDerivs -> Result PappyDerivs String Parser pWord = do c <- satisfy anyChar isIdentStart cs <- many (satisfy anyChar isIdentCont) whitespace return (c : cs) isIdentStart c = isAlpha c || c == '_' isIdentCont c = isIdentStart c || isDigit c || c == '\'' keywords = [ "parser", "top"] -- Symbols: operators, parentheses, etc. sym :: String -> Parser PappyDerivs String sym s = (do s' <- symbol if s' == s then return s else fail "") <?!> show s pSymbol :: PappyDerivs -> Result PappyDerivs String Parser pSymbol = do s <- stringFrom symbols whitespace return s symbols = [ "->", "::", ":", "=", "(", ")", "/", "+", "*", "?", "&", "!"] -- Raw code block, ignoring following whitespace pRawCode :: PappyDerivs -> Result PappyDerivs String Parser pRawCode = do c <- codeBlock whitespace return c -- Raw code block, consuming only the braces and contents pCodeBlock :: PappyDerivs -> Result PappyDerivs String Parser pCodeBlock = do char '{' c <- codeChars char '}' return c -- Characters in a code block pCodeChars :: PappyDerivs -> Result PappyDerivs String Parser pCodeChars = (do b <- codeBlock -- nested code blocks cs <- codeChars return ("{" ++ b ++ "}" ++ cs)) </> (do char '\'' -- character literals lit <- codeSQChars char '\'' cs <- codeChars return ("\'" ++ lit ++ "\'" ++ cs)) </> (do char '\"' -- string literals lit <- codeDQChars char '\"' cs <- codeChars return ("\"" ++ lit ++ "\"" ++ cs)) </> (do ic <- satisfy anyChar isIdentStart -- identifiers ics <- many (satisfy anyChar isIdentCont) cs <- codeChars return (ic : ics ++ cs)) </> (do s <- lineComment -- comments cs <- codeChars return (s ++ cs)) </> (do c <- noneOf "{}\"\'" -- other characters cs <- codeChars return (c : cs)) </> return [] -- Characters in a single-quoted character literal in a code block pCodeSQChars :: PappyDerivs -> Result PappyDerivs String Parser pCodeSQChars = (do char '\\' c <- anyChar cs <- codeSQChars return ('\\' : c : cs)) </> (do c <- noneOf "\'\\\r\n" cs <- codeSQChars return (c : cs)) </> return [] -- Characters in a double-quoted string literal in a code block pCodeDQChars :: PappyDerivs -> Result PappyDerivs String Parser pCodeDQChars = (do char '\\' c <- anyChar cs <- codeDQChars return ('\\' : c : cs)) </> (do c <- noneOf "\"\\\r\n" cs <- codeDQChars return (c : cs)) </> return [] -- Character and string literals quotedChar quote = (do char '\\' c <- anyChar case c of 'n' -> return '\n' 'r' -> return '\r' 't' -> return '\t' 'v' -> return '\v' 'f' -> return '\f' '\\' -> return '\\' '\'' -> return '\'' '\"' -> return '\"' -- XXX octal characters, other escapes _ -> fail "invalid escape sequence") </> satisfy anyChar (\c -> c /= quote) pCharLit :: PappyDerivs -> Result PappyDerivs Char Parser pCharLit = (do char '\'' c <- quotedChar '\'' char '\'' whitespace return c) pStringLit :: PappyDerivs -> Result PappyDerivs String Parser pStringLit = (do char '"' s <- many (quotedChar '"') char '"' whitespace return s) -- Whitespace pWhitespace :: PappyDerivs -> Result PappyDerivs String Parser pWhitespace = do ss <- many (many1 spaceChar </> lineComment) return (concat ss) spaceChar :: Parser PappyDerivs Char spaceChar = satisfy anyChar isSpace --flowComment :: Parser PappyDerivs () --flowComment = -- do string "-*" -- many (do notFollowedBy (string "*/"); anyChar) -- string "*-" -- return () lineComment :: Parser PappyDerivs String lineComment = do s1 <- string "--" s2 <- many (do notFollowedBy lineTerminator; anyChar) s3 <- lineTerminator return (s1 ++ s2 ++ s3) lineTerminator :: Parser PappyDerivs String lineTerminator = (do string "\r\n") </> string "\r" </> string "\n"thesis.ps Скачать