Polyglot Language Understanding
andychu edited this page Oct 3, 2023 · 36 revisions
Surveys of projects/research that try to understand multiple programming languages in a "unified" way.
To varying degrees, they're valuable "corpuses" of language info.
Note: light is not necessarily better than heavy!
"Light" and "heavy" refer to how much code is shared between language "back ends". If no code is shared, the project is "heavy".
-
ctags (Universal, Exuberant) -- Integrated with vim. Very approximate, text-only analysis of languages.
- See FAQ on "what happens when it's wrong?"
- Although it's not clear how much sharing there is
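To make "very approximate, text-only analysis" concrete, here is a minimal sketch in the spirit of line-oriented tagging. It is not how ctags is implemented: the `PATTERNS` table and the `find_tags` helper are hypothetical names chosen for illustration. The scanning loop is shared across languages and only the regular expression differs, which is roughly what "light" means above. The sample also shows one way such a tool can be wrong: it reports a definition that only appears inside a string.

```python
import re

# Hypothetical per-language patterns, for illustration only.
# Real tag generators use more elaborate, language-specific parsers.
PATTERNS = {
    "python": re.compile(r"^\s*(?:def|class)\s+(\w+)"),
    "shell": re.compile(r"^\s*(?:function\s+)?(\w+)\s*\(\)"),
}


def find_tags(source, lang):
    """Return (name, line_number) pairs for lines that look like definitions."""
    pattern = PATTERNS[lang]
    tags = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = pattern.match(line)
        if match:
            tags.append((match.group(1), lineno))
    return tags


# Line 5 sits inside a string literal, but a text-only scan cannot know that.
SAMPLE = '''def real():
    pass

DOC = """
def fake():
"""
'''

print(find_tags(SAMPLE, "python"))  # [('real', 1), ('fake', 5)]
```

The false positive on `fake` is the price of never building a syntax tree. In exchange, supporting another language in this sketch costs one more regular expression.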
-
uchex (Stanford paper, 2016)
- micro-grammars, parser combinators
- "belief-style checkers" (not the only supported technique)
- Implemented with Haskell Parsec; the original implementation was in Python
-
Concept: Island Grammars
- An island grammar only precisely defines small portions of the syntax of a language. The rest of the syntax is defined imprecisely, for instance as a list of characters, or a list of tokens.
These two appear fairly approximate?
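The island grammar idea can be sketched in a few lines. This is an illustration under simplifying assumptions, not code from any project listed here: the "island" is a C-like `if (...)` header with no nested parentheses, a regular expression stands in for the precise part of the grammar, and everything else is "water" kept as raw text. The name `parse_islands` is hypothetical.

```python
import re

# Island: a C-like `if (...)` header. Nested parentheses are not handled,
# which is a simplifying assumption of this sketch.
ISLAND = re.compile(r"\bif\s*\(([^()]*)\)")


def parse_islands(source):
    """Split source into ('island', condition) and ('water', text) chunks."""
    chunks = []
    pos = 0
    for match in ISLAND.finditer(source):
        if match.start() > pos:
            # Text between islands is water: kept, but not analyzed.
            chunks.append(("water", source[pos:match.start()]))
        chunks.append(("island", match.group(1).strip()))
        pos = match.end()
    if pos < len(source):
        chunks.append(("water", source[pos:]))
    return chunks


code = "x = 1; if (x > 0) { y(); } while (1) {}"
for kind, text in parse_islands(code):
    print(kind, repr(text))
```

Only the `if` condition is recognized precisely. The assignment, the block body, and the `while` loop all pass through as uninterpreted water, so the same scanner runs unchanged on any language with this `if` syntax.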
-
Comby (OCaml), Strange Loop Talk, CMU paper
- parser parser combinators
- https://comby.dev/en/projects
-
sylver - not open source?
- Sylver is a language-agnostic tool for source code exploration and analysis.
*Using the SYLQ query language REPL, you can perform syntax-aware search on your codebase to find style issues and potential bugs.*
-
semgrep / coccinelle (OCaml)
- Semgrep: a static analysis journey (2021) - How an academic project for the Linux kernel evolved into a multilingual security tool
- INRIA -> Facebook -> r2c
- facebook/pfff repo (OCaml)
-
github/semantic -- appears inactive
- Haskell
-
Google Kythe
-
ROSE
- Developed at Lawrence Livermore National Laboratory (LLNL), ROSE is an open source compiler infrastructure to build source-to-source program transformation and analysis tools for large-scale C (C89 and C99), C++ (C++98 and C++11), UPC, Fortran (77, 95, 2003), OpenMP, Java, Python, PHP, and Binary applications.
- ROSE is particularly well suited for building custom tools for static analysis, program optimization, arbitrary program transformation, domain-specific optimizations, complex loop optimizations, performance analysis, and cyber-security.
- Written in C++ - https://github.com/rose-compiler/rose/tree/weekly/src/AstNodes/Expression
-
SCIP - a better code indexing format than LSIF (Sourcegraph, 2022)
- Sourcegraph code navigation such as “Go to definition” comes in two flavors: search-based and precise. Search-based code navigation is available out-of-the-box. It is fast and always available, but it can occasionally return false-positive and false-negative results. Precise code navigation, on the other hand, requires custom configuration to set up, but the results are compiler-accurate and work across repositories. Both search-based and precise code navigation are useful in their own ways. While search-based is less powerful, it is a quick and convenient solution. Precise is more powerful, but it also requires more upfront investment to configure.
- scip-typescript: a new TypeScript and JavaScript indexer
- https://github.com/sourcegraph/scip-java
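The trade-off between search-based and precise navigation quoted above can be illustrated with a toy model. This is only a sketch: it does not use the SCIP format or any Sourcegraph API, and `DEFINITIONS`, `PRECISE_INDEX`, `search_based`, and `precise` are hypothetical names with made-up data.

```python
# Toy codebase: two unrelated functions share the name "parse".
# Each record is (file, line, symbol_name). All data is invented.
DEFINITIONS = [
    ("a.py", 1, "parse"),
    ("b.py", 1, "parse"),
]

# Hypothetical precise index: maps one reference occurrence (file, line)
# to exactly one definition, as a compiler-backed indexer might emit.
# Building it is the upfront cost; here it is simply written by hand.
PRECISE_INDEX = {
    ("main.py", 3): ("b.py", 1),
}


def search_based(name):
    """Go to definition by name alone: always available, may over-report."""
    return [(path, line) for path, line, symbol in DEFINITIONS if symbol == name]


def precise(path, line):
    """Go to definition through the index: exact, but only where indexed."""
    target = PRECISE_INDEX.get((path, line))
    return [target] if target else []


print(search_based("parse"))  # [('a.py', 1), ('b.py', 1)]
print(precise("main.py", 3))  # [('b.py', 1)]
print(precise("main.py", 9))  # []
```

Name matching returns both candidates, one of which is a false positive for the reference at `main.py:3`. The index returns the single correct target, but has nothing to say about locations that were never indexed.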
-
Language Server Protocol