Polyglot Language Understanding

andychu edited this page Oct 3, 2023 · 36 revisions

Surveys of projects/research that try to understand multiple programming languages in a "unified" way.

To varying degrees, they're valuable "corpuses" of language info.


Note: light is not necessarily better than heavy!

"Light" and "heavy" refer to how much code is shared between language "back ends". The more code is shared, the "lighter" the implementation; if no code is shared, it's "heavy".

Lightweight Implementations

  • ctags (Universal, Exuberant) -- Integrated with vim. Very approximate, text-only analysis of languages.

    • See FAQ on "what happens when it's wrong?"
    • It's not clear how much code is actually shared across languages, though
  • uchex (Stanford paper, 2016)

    • micro-grammars, parser combinators
    • "belief-style checkers" (not the only supported technique)
    • Uses Haskell Parsec; the original implementation was in Python
  • Concept: Island Grammars

    • An island grammar only precisely defines small portions of the syntax of a language. The rest of the syntax is defined imprecisely, for instance as a list of characters, or a list of tokens.
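
The island-grammar idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not how ctags, uchex, or any other tool listed here works: the "island" rule (`def NAME (`), the token pattern, and the names `tokenize`, `parse_islands`, and `function_names` are all hypothetical. Everything that does not match the island rule is kept as "water", a plain list of tokens.

```python
import re
from typing import List, Tuple, Union

# Hypothetical token pattern: an identifier, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|\S")

Node = Tuple[str, Union[str, List[str]]]


def tokenize(source: str) -> List[str]:
    """Split source text into coarse tokens (no strings or comments handled)."""
    return TOKEN.findall(source)


def parse_islands(tokens: List[str]) -> List[Node]:
    """Return ('island', name) nodes separated by ('water', [tokens]) nodes."""
    nodes: List[Node] = []
    water: List[str] = []
    i = 0
    while i < len(tokens):
        # Island rule (precise): 'def' IDENT '('
        if (tokens[i] == "def" and i + 2 < len(tokens)
                and tokens[i + 1].isidentifier() and tokens[i + 2] == "("):
            if water:
                nodes.append(("water", water))
                water = []
            nodes.append(("island", tokens[i + 1]))
            i += 3
        else:
            # Water rule (imprecise): accept any token without interpreting it.
            water.append(tokens[i])
            i += 1
    if water:
        nodes.append(("water", water))
    return nodes


def function_names(source: str) -> List[str]:
    """Collect only the islands; the water is ignored."""
    return [value for kind, value in parse_islands(tokenize(source))
            if kind == "island"]


example = "x = 1\ndef foo(a, b):\n    return a + b\nclass C:\n    def bar(self): pass\n"
print(function_names(example))  # ['foo', 'bar']
```

Because the water is never parsed, the sketch tolerates syntax it knows nothing about, but it is also approximate: `def foo(` inside a string or comment would be reported as a function.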

The following two appear fairly approximate?

  • Comby (OCaml), Strange Loop Talk, CMU paper
  • sylver - not open source?
    • Sylver is a language-agnostic tool for source code exploration and analysis.
    • *Using the SYLQ query language REPL, you can perform syntax-aware search on your codebase to find style issues and potential bugs.*

Heavyweight Implementations

  • semgrep / coccinelle (OCaml)

    • Semgrep: a static analysis journey (2021) - How an academic project for the Linux kernel evolved into a multilingual security tool
    • INRIA -> Facebook -> r2c
    • facebook/pfff repo (OCaml)
  • github/semantic -- appears inactive

    • Haskell
  • Google Kythe

  • ROSE

    • Developed at Lawrence Livermore National Laboratory (LLNL), ROSE is an open source compiler infrastructure to build source-to-source program transformation and analysis tools for large-scale C (C89 and C99), C++ (C++98 and C++11), UPC, Fortran (77, 95, 2003), OpenMP, Java, Python, PHP, and Binary applications.
    • ROSE is particularly well suited for building custom tools for static analysis, program optimization, arbitrary program transformation, domain-specific optimizations, complex loop optimizations, performance analysis, and cyber-security.
    • Written in C++ - https://github.com/rose-compiler/rose/tree/weekly/src/AstNodes/Expression

Polyglot Interfaces

Other Surveys
