
SHCOMP Protocol Proposal

andychu edited this page Oct 17, 2018 · 35 revisions

This is DRAFT. Don't circulate yet!

SHCOMP Protocol

Overview

There are SHCOMP clients and SHCOMP servers. Clients request completions, and servers provide them.

  • A client is typically a shell like Elvish, ZSH, or Oil.
    • It could also be an editor that's editing a shell script! (Vim, VSCode?)
  • A server is typically the binary itself (git, npm, clang) OR a shell!
    • That is, the completion logic could be written in C, JavaScript, or Python -- or it could be written in Elvish, ZSH, or Oil (or a compleat-like DSL).
    • Note that shells are both clients and servers.

NOTE: A SHCOMP server can also be called a SHCOMP "provider", because sometimes it runs in batch/CLI mode.

Request format

Request variables share the SHCOMP_ environment prefix (SHCOMP_*).

SHCOMP_ARGV@, SHCOMP_ARG_INDEX, SHCOMP_CHAR_INDEX ?

Problem: environment variables can't contain NUL bytes, so an array can't be passed NUL-delimited that way. Maybe the request comes on stdin then? Can bash deal with that?

  • read -d $'' ?

  • $SHCOMP_VERSION environment variable for detection.

  • $SHCOMP_TRANSPORT=cli. (or coprocess, or JSON-RPC).
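The request format is still open, so the following is only a sketch of one possible shape, not the protocol. It assumes the scalar fields travel as SHCOMP_* environment variables and the argument array travels on stdin as NUL-terminated byte strings. The helper names `build_request` and `parse_request` and the version value `"1"` are hypothetical.

```python
def build_request(argv, arg_index, char_index):
    """Client side: return (env, stdin_bytes) for a CLI provider."""
    env = {
        "SHCOMP_VERSION": "1",  # hypothetical value, used only for detection
        "SHCOMP_TRANSPORT": "cli",
        "SHCOMP_ARG_INDEX": str(arg_index),
        "SHCOMP_CHAR_INDEX": str(char_index),
    }
    # Assumption: argv goes on stdin, one NUL-terminated byte string per argument.
    stdin_bytes = b"".join(arg + b"\0" for arg in argv)
    return env, stdin_bytes


def parse_request(env, stdin_bytes):
    """Provider side: return (argv, arg_index, char_index), or None if this
    is not a SHCOMP invocation."""
    if "SHCOMP_VERSION" not in env:
        return None
    # The final split element is the empty tail after the last NUL.
    argv = stdin_bytes.split(b"\0")[:-1] if stdin_bytes else []
    return argv, int(env["SHCOMP_ARG_INDEX"]), int(env["SHCOMP_CHAR_INDEX"])
```

A provider could call `parse_request(os.environ, sys.stdin.buffer.read())` at startup and fall back to its normal behavior when the result is `None`.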

Response format

  • netstrings are out because bash can't generate the length of a bytestring!
  • Don't want newlines, because newlines can appear in filenames! touch $'\n'.
  • So we use NUL delimited strings. Maybe we have a length prefix for the array count. ${#COMPREPLY[@]}.
  • How to add complete help?
  • types? SHCOMP_REPLY@ is an array? A string that starts with the ascii length and then a colon?
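A minimal sketch of the NUL-delimited idea, assuming each candidate is a byte string followed by a single NUL and ignoring the open questions about help text and types. The function names are hypothetical.

```python
def encode_reply(candidates):
    """Serialize candidates as NUL-terminated byte strings."""
    out = bytearray()
    for cand in candidates:
        if b"\0" in cand:
            raise ValueError("candidate contains a NUL byte")
        out += cand + b"\0"
    return bytes(out)


def decode_reply(data):
    """Parse a reply produced by encode_reply back into a list of bytes."""
    if not data:
        return []
    if not data.endswith(b"\0"):
        raise ValueError("reply is not NUL-terminated")
    # Drop the final terminator, then split on the remaining ones.
    return data[:-1].split(b"\0")
```

Because the delimiter is NUL, a candidate containing a newline (like the file created by touch $'\n') survives the round trip unchanged.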

Transports

  • CLI providers - environment variables
  • Coprocess providers (JSON?)
  • Maybe later: JSON-RPC like the language server protocol. I don't necessarily see the need for multi-threaded servers, but we'll see.

Character Encodings

SHCOMP clients and servers should prefer UTF-8 where possible. But file system paths are often the things being completed, and they are just byte strings. So technically most of the strings in the request and response format are NUL-terminated byte strings, and UTF-8 is a special case of that.
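As an illustration of treating UTF-8 as a special case of byte strings, here is a sketch of how a Python client or server might handle candidates. It uses the standard `surrogateescape` error handler so that non-UTF-8 path bytes survive a round trip through `str`. The helper names are hypothetical.

```python
def to_text(raw: bytes) -> str:
    """Decode for internal use; invalid bytes are kept as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """Inverse of to_text: recovers the exact original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


def for_display(raw: bytes) -> str:
    """Decode for showing to a user; invalid bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")
```

Only `for_display` is lossy, so it should be used for rendering a menu and never for the string that is inserted into the command line.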

Typical Client Algorithm

  • Partially parse the shell language
  • Send over ARGV
  • Receive SHCOMP_REPLY
  • Quote them into shell syntax, e.g. ${x@Q} in bash.
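The last step can be sketched for a client written in Python. This assumes the candidates arrive as raw byte strings and uses `shlex.quote`, which produces POSIX single-quoted words rather than the $'...' form that bash's ${x@Q} may emit. The function name is hypothetical.

```python
import shlex


def quote_candidates(raw_candidates):
    """Turn raw candidate bytes into words that are safe to insert
    into a POSIX shell command line."""
    quoted = []
    for raw in raw_candidates:
        # surrogateescape keeps non-UTF-8 bytes representable in a str.
        text = raw.decode("utf-8", errors="surrogateescape")
        quoted.append(shlex.quote(text))
    return quoted
```

Candidates made only of safe characters come back unchanged, so `--ref=main` stays readable while `two words` becomes `'two words'`.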

Typical Server Algorithm

  • Receive ARGV
  • Options
    • Run an existing command line parser or use its data structures to figure out what we need to complete
    • dynamically grep --help (or a cached copy of it). bash-completion does this grepping.
  • Send back SHCOMP_REPLY
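A rough sketch of the "grep --help" option above, assuming the help text has already been captured (or cached) as a string and that only long flags are of interest. The regular expression and the function names are illustrative assumptions, not what bash-completion actually does.

```python
import re

# Matches long options such as --verbose or --dry-run (assumed flag syntax).
FLAG_RE = re.compile(r"--[A-Za-z0-9][A-Za-z0-9-]*")


def flags_from_help(help_text):
    """Return the distinct long flags mentioned in help_text, in order."""
    seen = []
    for flag in FLAG_RE.findall(help_text):
        if flag not in seen:
            seen.append(flag)
    return seen


def complete(argv, arg_index, help_text):
    """Return flag candidates for the word at argv[arg_index]."""
    word = argv[arg_index] if arg_index < len(argv) else ""
    if not word.startswith("-"):
        return []  # this sketch only completes flags
    return [f for f in flags_from_help(help_text) if f.startswith(word)]
```

For example, with a help text listing --verbose, --version, and --color=WHEN, completing the word `--ver` yields `--verbose` and `--version`.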

Design and Implementation Issues

  • Shells should NOT consult a completion server for $<TAB> and ${<TAB>. They should complete their own variables!
  • If you have something like ls $(echo long-time; sleep 100) --ref=<TAB>, then the $(...) command substitution can be replaced with DUMMY before sending it to the completion server, so the slow command never runs.
  • What about tilde expansion? That can be done beforehand? Or the completion provider has to know about it?
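The DUMMY substitution can be sketched as a simple scanner over the command line. This is a simplification under stated assumptions: it balances parentheses but does not understand quoting, backticks, or case patterns, and it leaves $((...)) arithmetic alone. A real shell would do this on its parse tree instead. The function name is hypothetical.

```python
def replace_command_subs(line, dummy="DUMMY"):
    """Replace each complete $(...) in line with a placeholder word."""
    out = []
    i = 0
    while i < len(line):
        if line.startswith("$(", i) and not line.startswith("$((", i):
            depth = 1
            j = i + 2
            # Scan forward to the parenthesis that closes this substitution.
            while j < len(line) and depth:
                if line[j] == "(":
                    depth += 1
                elif line[j] == ")":
                    depth -= 1
                j += 1
            if depth == 0:
                out.append(dummy)
                i = j
                continue
            # Unterminated substitution: fall through and copy it verbatim.
        out.append(line[i])
        i += 1
    return "".join(out)
```

Nested substitutions collapse into a single placeholder, because the outer $(...) is replaced as a whole.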

Streaming Responses

  • Low latency for shells is important. A user might want to accept a completion before all candidates are generated (e.g. from a distributed file system or cloud storage service). So we need to support streaming.

  • Instead of length-prefixed arrays, we can have arrays terminated by sentinels. The sentinel could just be an additional \0 byte? That is like the empty string.
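A sketch of how a client might consume a sentinel-terminated stream, assuming each candidate is NUL-terminated and an empty string (an extra \0 byte) ends the array. It assumes the stream object has a `read1` method, as `io.BufferedReader` and `io.BytesIO` do, so that partial data is returned as soon as it is available. The function name is hypothetical.

```python
import io


def iter_candidates(stream, chunk_size=4096):
    """Yield candidates as they arrive; stop at the empty-string sentinel."""
    buf = b""
    while True:
        chunk = stream.read1(chunk_size)
        if not chunk:
            raise EOFError("stream ended before the sentinel")
        buf += chunk
        while True:
            end = buf.find(b"\0")
            if end < 0:
                break  # need more data to finish the current candidate
            item, buf = buf[:end], buf[end + 1:]
            if item == b"":
                return  # sentinel: the array is complete
            yield item


# Example with an in-memory stream and a tiny chunk size.
demo = list(iter_candidates(io.BytesIO(b"a\0bc\0\0"), chunk_size=2))
# demo == [b"a", b"bc"]
```

Because this is a generator, the shell can display or accept the first candidate before the provider has produced the rest. One consequence of this design is that an empty string can no longer be a candidate.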

Security

  • To prevent resource exhaustion attacks, shells may truncate long strings.
  • Completion servers can be sandboxed since they only communicate over stdin and stdout.
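The truncation idea could look like the sketch below. The 4096-byte limit and the function name are arbitrary assumptions. Since candidates are often UTF-8, the cut point backs up by at most three bytes to avoid splitting a multi-byte character. Whether a shell should drop an over-long candidate instead of truncating it is left open here.

```python
def truncate_candidate(raw, max_bytes=4096):
    """Return raw cut to at most max_bytes, preferring a UTF-8 boundary."""
    if len(raw) <= max_bytes:
        return raw
    end = max_bytes
    # UTF-8 continuation bytes look like 10xxxxxx; a character has at most
    # three of them, so back up at most three positions.
    for _ in range(3):
        if end > 0 and (raw[end] & 0xC0) == 0x80:
            end -= 1
        else:
            break
    return raw[:end]
```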

Why Coprocesses?

Why not put one completion per line? Why not environment variables?
