
Polyglot Language Understanding

andychu edited this page Sep 26, 2023 · 36 revisions

Note: light is not necessarily better than heavy!

"Light" and "heavy" refer to how much code is shared between language "back ends". If no code is shared, the implementation is "heavy".

Lightweight Implementation

  • uchex (Stanford paper, 2016)
    • micro-grammars, parser combinators
    • "belief-style checkers" (not the only supported technique)
    • Haskell Parsec; the original implementation was in Python
  • Island Grammars
    • An island grammar only precisely defines small portions of the syntax of a language. The rest of the syntax is defined imprecisely, for instance as a list of characters, or a list of tokens.
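The island grammar idea can be illustrated with a minimal sketch. It assumes a single "island" rule, a C-like call of the form `IDENT ( ... )` with balanced parentheses, and treats every other token as "water". The names `tokenize`, `parse`, and `try_call` are hypothetical and are not taken from uchex or any tool listed on this page.

```python
import re

# Deliberately imprecise tokenizer: identifiers, digit runs, or any single
# non-space character. This is the "list of tokens" view of the water.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT = re.compile(r"[A-Za-z_]\w*")


def tokenize(source):
    return TOKEN.findall(source)


def try_call(tokens, i):
    """Island rule: IDENT '(' balanced tokens ')'.

    Returns (node, next_index) on success, or None if no island starts at i.
    Limitation of this toy rule: a keyword such as `if (...)` also matches.
    """
    if i + 1 >= len(tokens):
        return None
    if IDENT.fullmatch(tokens[i]) is None or tokens[i + 1] != "(":
        return None
    depth = 0
    j = i + 1
    while j < len(tokens):
        if tokens[j] == "(":
            depth += 1
        elif tokens[j] == ")":
            depth -= 1
            if depth == 0:
                # Argument tokens are kept unparsed, nested calls included.
                return ("call", tokens[i], tokens[i + 2:j]), j + 1
        j += 1
    return None  # unbalanced parentheses: fall back to water


def parse(tokens):
    """Return a flat list of ('call', name, args) islands and ('water', tok) items."""
    out = []
    i = 0
    while i < len(tokens):
        island = try_call(tokens, i)
        if island is not None:
            node, i = island
            out.append(node)
        else:
            out.append(("water", tokens[i]))
            i += 1
    return out


if __name__ == "__main__":
    for node in parse(tokenize("x = foo(a, bar(b)); @@ weird ## y")):
        print(node)
```

Because anything the island rule does not recognize is kept as water instead of raising a syntax error, the same sketch runs over input in any language, which is the trade-off that makes this approach lightweight and approximate.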

These two appear fairly approximate?

  • Comby (OCaml), Strange Loop Talk, CMU paper
  • sylver - not open source?
    • Sylver is a language-agnostic tool for source code exploration and analysis.
    • *Using the SYLQ query language REPL, you can perform syntax-aware search on your codebase to find style issues and potential bugs.*

Heavyweight Implementation

  • semgrep / coccinelle (OCaml)
    • Semgrep: a static analysis journey (2021) - How an academic project for the Linux kernel evolved into a multilingual security tool
    • INRIA -> Facebook -> r2c
    • facebook/pfff repo (OCaml)
  • github/semantic -- appears inactive
    • Haskell
  • Google Kythe

Other Surveys
