Polyglot Language Understanding
andychu edited this page Sep 26, 2023 · 36 revisions
Surveys of projects/research that try to understand multiple programming languages in a "unified" way.
To varying degrees, they're valuable "corpuses" of language info.
Note: "light" is not necessarily better than "heavy"!
The terms refer to how much code is shared between language "back ends". If no code is shared, it's "heavy".
- ctags (Universal, Exuberant) -- Integrated with vim. Very approximate, text-only analysis of languages.
  - See FAQ on "what happens when it's wrong?"
  - Although it's not clear how much sharing there is
- uchex (Stanford paper, 2016)
  - micro-grammars, parser combinators
  - "belief-style checkers" (not the only supported technique)
  - Haskell Parsec, original implementation was Python
- Concept: Island Grammars
  - An island grammar only precisely defines small portions of the syntax of a language. The rest of the syntax is defined imprecisely, for instance as a list of characters, or a list of tokens.
These two appear fairly approximate?
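
As a rough illustration of the island grammar idea, here is a minimal sketch in Python. It is not taken from uchex or any other project listed here. The "island" is an assumed, made-up construct (a JavaScript-like `function NAME(params)` header), and the names `Island`, `Water`, `parse_island` and `parse` are hypothetical. Only the island is parsed precisely; everything else is kept as a flat list of tokens.

```python
import re
from dataclasses import dataclass, field

# Tokens: identifiers, integers, or any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


@dataclass
class Island:
    """A precisely parsed construct: a function header (assumed syntax)."""
    name: str
    params: list


@dataclass
class Water:
    """Everything else, kept as an uninterpreted list of tokens."""
    tokens: list = field(default_factory=list)


def is_ident(tok):
    return IDENT_RE.fullmatch(tok) is not None


def parse_island(tokens, i):
    """Try to parse `function NAME ( [ID {, ID}] )` starting at tokens[i].

    Returns (Island, next_index) on success, or None if no island starts here.
    """
    if i + 2 >= len(tokens) or tokens[i] != "function":
        return None
    if not is_ident(tokens[i + 1]) or tokens[i + 2] != "(":
        return None
    name = tokens[i + 1]
    j = i + 3
    params = []
    if j < len(tokens) and tokens[j] == ")":  # empty parameter list
        return Island(name, params), j + 1
    while j < len(tokens) and is_ident(tokens[j]):
        params.append(tokens[j])
        j += 1
        if j < len(tokens) and tokens[j] == ")":
            return Island(name, params), j + 1
        if j < len(tokens) and tokens[j] == ",":
            j += 1
            continue
        return None
    return None


def parse(text):
    """Split text into a sequence of Island and Water nodes."""
    tokens = TOKEN_RE.findall(text)
    result = []
    i = 0
    while i < len(tokens):
        hit = parse_island(tokens, i)
        if hit is not None:
            island, i = hit
            result.append(island)
        else:
            # Not an island: append the token to the current run of water.
            if not result or not isinstance(result[-1], Water):
                result.append(Water())
            result[-1].tokens.append(tokens[i])
            i += 1
    return result


if __name__ == "__main__":
    src = "x = 1; function add(a, b) { return a + b } function ( broken"
    for node in parse(src):
        print(node)
```

Note that a malformed header such as `function ( broken` is not an error: it simply falls back into the water. That tolerance is what makes the approach approximate, and also what lets it run over code it does not fully understand.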
- Comby (OCaml), Strange Loop Talk, CMU paper
  - parser parser combinators
  - https://comby.dev/en/projects
- sylver - not open source?
  - Sylver is a language-agnostic tool for source code exploration and analysis.
  - *Using the SYLQ query language REPL, you can perform syntax-aware search on your codebase to find style issues and potential bugs.*
- semgrep / coccinelle (OCaml)
  - Semgrep: a static analysis journey (2021) - How an academic project for the Linux kernel evolved into a multilingual security tool
  - INRIA -> Facebook -> r2c
  - facebook/pfff repo (OCaml)
- github/semantic -- appears inactive
  - Haskell
- Google Kythe
- Language Server Protocol