Polyglot Language Understanding

andychu edited this page Sep 27, 2023 · 36 revisions

Surveys of projects/research that try to understand multiple programming languages in a "unified" way.

To varying degrees, they're valuable "corpuses" of language info.


Note: light is not necessarily better than heavy!

"Light" vs. "heavy" refers to how much code is shared between language "back ends". The more code is shared, the "lighter" the implementation; if no code is shared, it's "heavy".

Lightweight Implementations

  • ctags (Universal, Exuberant) -- Integrated with vim. Very approximate, text-only analysis of languages.

    • See FAQ on "what happens when it's wrong?"
    • It's not clear how much code is actually shared between languages, though
  • uchex (Stanford paper, 2016)

    • micro-grammars, parser combinators
    • "belief-style checkers" (not the only supported technique)
    • Implemented with Haskell Parsec; the original implementation was in Python
  • Concept: Island Grammars

    • An island grammar only precisely defines small portions of the syntax of a language. The rest of the syntax is defined imprecisely, for instance as a list of characters, or a list of tokens.
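
To make the island grammar idea concrete, here is a minimal sketch in Python. It is not taken from any of the projects listed here. The names (`tokenize`, `parse_island`, `find_function_definitions`) and the C-like sample are hypothetical, chosen only for illustration. The single "island" is a function definition of the shape `NAME ( ... ) {`; everything else is "water", skipped one token at a time without being interpreted.

```python
import re

# Coarse lexer: an identifier, or any single non-whitespace character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\S")
IDENT = re.compile(r"[A-Za-z_]\w*")


def tokenize(text):
    return TOKEN.findall(text)


def parse_island(tokens, i):
    """Try to match the island `NAME ( ... ) {` starting at index i.

    Returns (name, next_index) on success, or None if this is water.
    """
    if i + 1 >= len(tokens):
        return None
    name = tokens[i]
    if not IDENT.fullmatch(name) or tokens[i + 1] != "(":
        return None
    # The parameter list is treated imprecisely: only paren depth is tracked.
    depth = 0
    j = i + 1
    while j < len(tokens):
        if tokens[j] == "(":
            depth += 1
        elif tokens[j] == ")":
            depth -= 1
            if depth == 0:
                break
        j += 1
    if j + 1 < len(tokens) and tokens[j + 1] == "{":
        return name, j + 2
    return None


def find_function_definitions(text):
    tokens = tokenize(text)
    names = []
    i = 0
    while i < len(tokens):
        island = parse_island(tokens, i)
        if island is not None:
            name, i = island
            names.append(name)
        else:
            i += 1  # water: skip one token without interpreting it
    return names


SAMPLE = """
int add(int a, int b) { return a + b; }
static void log_msg(const char *s) {
    puts(s);
}
int x = add(1, 2);
"""

print(find_function_definitions(SAMPLE))
```

The sketch is deliberately approximate: a control-flow statement such as `if (x) {` would also match the island pattern, which is the kind of imprecision these lightweight approaches accept in exchange for sharing almost all code across languages.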

These two appear fairly approximate?

  • Comby (OCaml), Strange Loop Talk, CMU paper
  • sylver - not open source?
    • Sylver is a language-agnostic tool for source code exploration and analysis.
    • *Using the SYLQ query language REPL, you can perform syntax-aware search on your codebase to find style issues and potential bugs.*

Heavyweight Implementations

  • semgrep / coccinelle (OCaml)
    • Semgrep: a static analysis journey (2021) - How an academic project for the Linux kernel evolved into a multilingual security tool
    • INRIA -> Facebook -> r2c
    • facebook/pfff repo (OCaml)
  • github/semantic -- appears inactive
    • Haskell
  • Google Kythe

Polyglot Interfaces

Other Surveys