better-parse

0.4.4indexed

Parser combinator library simplifies constructing complex parsers from simpler ones. Features include tokenization, combinators for chaining and transforming parsers, and grammar definitions for language parsing. Ideal for building interpreters and compilers.

JSJVMNative·h0tk3y/better-parse

444

Stars

Used by

dependents

—

Health

/ 100

better-parse

A nice parser combinator library for Kotlin JVM, JS, and Multiplatform projects

val booleanGrammar = object : Grammar<BooleanExpression>() {
    val id by regexToken("\\w+")
    val not by literalToken("!")
    val and by literalToken("&")
    val or by literalToken("|")
    val ws by regexToken("\\s+", ignore = true)
    val lpar by literalToken("(")
    val rpar by literalToken(")")

    val term by 
        (id use { Variable(text) }) or
        (-not * parser(this::term) map { Not(it) }) or
        (-lpar * parser(this::rootParser) * -rpar)

    val andChain by leftAssociative(term, and) { l, _, r -> And(l, r) }
    override val rootParser by leftAssociative(andChain, or) { l, _, r -> Or(l, r) }
}

val ast = booleanGrammar.parseToEnd("a & !b | b & (!a | c)")

Using with Gradle

dependencies {
   implementation("com.github.h0tk3y.betterParse:better-parse:0.4.4")
}

With multiplatform projects, it's OK to add the dependency just to the commonMain source set, or some other source set if you want it for specific parts of the code.

Tokens

As many other language recognition tools, better-parse abstracts away from raw character input by pre-processing it with a Tokenizer, that can match Tokens (with regular expressions, literal values or arbitrary against an input character sequence.

There are several kinds of supported Tokens:

a regexToken("(?:my)?(?:regex)) is matched as a regular expression;
a literalToken("foo") is matched literally, character to character;
a token { (charSequence, from) -> ... } is matched using the passed function.

A Tokenizer tokenizes an input sequence such as InputStream or a String into a Sequence<TokenMatch>, providing each with a position in the input.

One way to create a Tokenizer is to first define the Tokens to be matched:

val id = regexToken("\\w+")
val cm = literalToken(",")
val ws = regexToken("\\s+", ignore = true)

A Token can be ignored by setting its ignore = true. An ignored token can still be matched explicitly, but if another token is expected, the ignored one is just dropped from the sequence.

val tokenizer = DefaultTokenizer(listOf(id, cm, ws))

Note: the tokens order matters in some cases, because the tokenizer tries to match them in exactly this order. For instance, if literalToken("a") is listed before literalToken("aa"), the latter will never be matched. Be careful with keyword tokens! If you match them with regexes, a word boundary \b in the end may help against ambiguity.

val tokenMatches: Sequence<TokenMatch> = tokenizer.tokenize("hello, world")

A more convenient way of defining tokens is described in the Grammar section.

It is possible to provide a custom implementation of a Tokenizer.

Parser

A Parser<T> is an object that accepts an input sequence (TokenMatchesSequence) and tries to convert some (from none to all) of its items into a T. In better-parse, parsers are also the building blocks used to create new parsers by combining them.

When a parser tries to process the input, there are two possible outcomes:

If it succeeds, it returns Parsed<T> containing the T result and the nextPosition: Int that points to what it left unprocessed. The latter can then be, and often is, passed to another parser.
If it fails, it reports the failure returning an ErrorResult, which provides detailed information about the failure.

A very basic parser to start with is a Token itself: given an input TokenMatchesSequence and a position in it, it succeeds if the sequence starts with the match of this token itself (possibly, skipping some ignored tokens) and returns that TokenMatch, pointing at the next token with the nextPosition.

val a = regexToken("a+")
val b = regexToken("b+")
val tokenMatches = DefaultTokenizer(listOf(a, b)).tokenize("aabbaaa")
val result = a.tryParse(tokenMatches, 0) // contains the match for "aa" and the next index is 1 for the match of b

Combinators

Simpler parsers can be combined to build a more complex parser, from tokens to terms and to the whole language. There are several kinds of combinators included in better-parse:

Grammar

As a convenient way of defining a grammar of a language, there is an abstract class Grammar, that collects the by-delegated properties into a Tokenizer automatically, and also behaves as a composition of the Tokenizer and the rootParser.

Note: a Grammar also collects by-delegated Parser<T> properties so that they can be accessed as declaredParsers along with the tokens. As a good style, declare the parsers inside a Grammar by delegation as well.

To use a parser that has not been constructed yet, reference it with parser { someParser } or parser(this::someParser):

val term by
    constParser or 
    variableParser or 
    (-lpar and parser(this::term) and -rpar)

A Grammar implementation can override the tokenizer property to provide a custom implementation of Tokenizer.

Syntax trees

A Parser<T> can be converted to another Parser<SyntaxTree<T>>, where a SyntaxTree<T>, along with the parsed T contains the children syntax trees, the reference to the parser and the positions in the input sequence. This can be done with parser.liftToSyntaxTreeParser().

This can be used for syntax highlighting and inspecting the resulting tree in case the parsed result does not contain the full syntactic structure.

For convenience, a Grammar can also be lifted to that parsing a SyntaxTree with grammar.liftToSyntaxTreeGrammar().

val treeGrammar = booleanGrammar.liftToSyntaxTreeGrammar()
val tree = treeGrammar.parseToEnd("a & !b | c -> d")
assertTrue(tree.parser == booleanGrammar.implChain)
val firstChild = tree.children.first()
assertTrue(firstChild.parser == booleanGrammar.orChain)
assertTrue(firstChild.range == 0..9)

There are optional arguments for customizing the transformation:

See SyntaxTreeDemo.kt for an example of working with syntax trees.

Examples

A boolean expressions parser that constructs a simple AST: BooleanExpression.kt
An integer arithmetic expressions evaluator:

Benchmarks

See the benchmarks repository h0tk3y/better-parse-benchmark and feel free to contribute.

Related libraries

Surfaced from shared tags and platforms — no rankings paid for.

kstatemachine★ 504

KStateMachineDSL library for creating state machines and statecharts. Supports event-based transitions, reactive listening, nested and parallel states, typesafe transitions, undo transitions, and exports to diagrams. Zero dependencies.Shared: dsl

data2viz★ 424

data2vizData visualization toolbox enabling consistent rendering across platforms. Offers modules for shapes, colors, interpolation, random data, easing, and more. Inspired by d3js and other libraries.Shared: dsl

akkurate★ 417

neskValidation library offering a declarative API for writing clear, maintainable validation code. Handles complex business logic, supports custom constraints, is highly extendable, and integrates with popular frameworks.Shared: dsl

spatial-k★ 137

maplibreSet of libraries for working with geospatial data, including GeoJson implementation and a port of Turfjs, featuring a Kotlin DSL for building GeoJson objects.Shared: dsl

Terpal★ 57

exoqueryEnables capturing string variables before interpolation, enhancing security and flexibility. Implements custom DSLs for safe SQL queries, preventing SQL injection vulnerabilities through typed string interpolation.Shared: dsl

Parameterize★ 47

BenWoodworthEnables concise, idiomatic parameterization of code and testing, modeling complex control flows cleanly. Supports customizable configurations for iterations, failure handling, and detailed multi-failure reporting, enhancing parameterized testing.Shared: dsl

better-parse

better-parse

Using with Gradle

Tokens

Parser

Combinators

Grammar

Syntax trees

Examples

Benchmarks

Related libraries

better-parse

Using with Gradle

Tokens

Parser

Combinators

Grammar

Syntax trees

Examples

Benchmarks

Operators

Related libraries