SHCOMP Protocol Proposal

This is a DRAFT. Don't circulate it yet!

SHCOMP Protocol

SHCOMP is a protocol for shell-agnostic autocompletion. It lets shells and tools written in any language exchange completions with each other.

Motivation

The status quo is that you can only expect upstream authors to maintain command completion logic for bash, the most popular shell in the world. SHCOMP is a simple protocol that upstream authors can implement once to get autocompletion in every shell that is a SHCOMP client.

Overview

Roughly speaking, SHCOMP plays the same role for shells as the Language Server Protocol does for editors, but it looks more like CGI or FastCGI.

SHCOMP clients request completions, and SHCOMP servers provide them. (A server that runs in single-shot "batch" mode can also be called a provider.)

A client is typically a shell such as Elvish, ZSH, or Oil.
- It could also be an editor that's editing a shell script! (Vim, Emacs, VS Code, etc.)
A server could be the binary itself (git, npm, clang) OR a shell!
- That is, the completion logic could be written in C, JavaScript, or Python -- or it could be written in Elvish, ZSH, or Oil (or a compleat-like DSL).
- So note that shells are both clients and servers. They may request completions or they may provide them.

Rough Example

Let's use the example of busybox ash, which is derived from the dash code. I've heard some people complain that you have to use bash on Alpine Linux to get completions, because ash/dash have no support for it. The SHCOMP protocol potentially provides a migration path out of that situation.

Type this in ash:

$ git --git-dir . a<TAB>

ash will act as a SHCOMP client. It forms a request that looks something like this (encoding to be discussed):

{ "SHCOMP_ARGV": ["git", "--git-dir", ".", "a"],
  "SHCOMP_ARGV_INDEX": 3,
  "SHCOMP_CHAR_INDEX": 1
}

ash just needs a way of associating a command with a binary that supports the SHCOMP protocol. It doesn't need its own completion API.

It invokes the SHCOMP server/provider. In this case, it could just be a bash script running git-completion.bash! You can write your own providers from scratch in any language, or you can wrap up existing bash completion (and zsh / Elvish completion) scripts with this protocol.

The response is:

{ "SHCOMP_REPLY": ["add", "am", "annotate", "apply", "archive"] }

Then ash displays these alternatives to the user.
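To make the example concrete, here is a toy provider sketched in Python, with the request modeled as a dict because the wire encoding is still open. The hardcoded subcommand list and the name complete_git are hypothetical; a real provider (for example one wrapping git-completion.bash) would also understand options like --git-dir instead of just matching the current word.

```python
# Toy SHCOMP provider for the "git --git-dir . a<TAB>" example.
# Request and response are plain dicts; the real encoding is undecided.

# Hypothetical, hardcoded subset of git subcommands (illustration only).
GIT_SUBCOMMANDS = ["add", "am", "annotate", "apply", "archive",
                   "bisect", "branch", "checkout", "commit"]


def complete_git(request: dict) -> dict:
    argv = request["SHCOMP_ARGV"]
    word = argv[request["SHCOMP_ARGV_INDEX"]]
    # Only the characters before the cursor form the prefix to match.
    prefix = word[:request["SHCOMP_CHAR_INDEX"]]
    return {"SHCOMP_REPLY": [c for c in GIT_SUBCOMMANDS if c.startswith(prefix)]}


request = {
    "SHCOMP_ARGV": ["git", "--git-dir", ".", "a"],
    "SHCOMP_ARGV_INDEX": 3,
    "SHCOMP_CHAR_INDEX": 1,
}
print(complete_git(request))
# {'SHCOMP_REPLY': ['add', 'am', 'annotate', 'apply', 'archive']}
```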

NOTE: I've written the messages as JSON, but the encoding will most likely not be JSON.

Request format

All request fields use the SHCOMP_ environment variable prefix.

SHCOMP_ARGV@ (an array), SHCOMP_ARGV_INDEX, SHCOMP_CHAR_INDEX ?

Problem: environment variables can't contain NUL bytes, so how do we pass an array? Maybe the request comes on stdin then? Can bash deal with that?

read -d $'' ? (With an empty delimiter, bash's read stops at a NUL byte.)
$SHCOMP_VERSION environment variable for detection.
$SHCOMP_TRANSPORT=cli (or coprocess, or JSON-RPC).
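A minimal Python sketch of the CLI transport under one possible reading of these notes: the scalar fields travel as environment variables, and ARGV travels on stdin as NUL-terminated strings, since environment variables can't hold NUL. The function names and this exact split between environment and stdin are assumptions, not part of the proposal. On the bash side, a provider could read such input with while IFS= read -r -d '' arg; do ...; done.

```python
import os


def encode_cli_request(argv, argv_index, char_index):
    """Client side: return (env, stdin_bytes) for a CLI provider.

    Assumption: scalar fields go in the environment, ARGV goes on stdin.
    """
    env = dict(os.environ)
    env.update({
        "SHCOMP_VERSION": "0.1",
        "SHCOMP_TRANSPORT": "cli",
        "SHCOMP_ARGV_INDEX": str(argv_index),
        "SHCOMP_CHAR_INDEX": str(char_index),
    })
    # Each argument is NUL-terminated, so arguments may contain newlines.
    stdin_bytes = b"".join(os.fsencode(arg) + b"\0" for arg in argv)
    return env, stdin_bytes


def decode_cli_argv(stdin_bytes):
    """Provider side: split NUL-terminated arguments back into a list."""
    # The data ends with a NUL, so the last element of split() is empty.
    return [os.fsdecode(part) for part in stdin_bytes.split(b"\0")[:-1]]
```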

Response format

Netstrings are out because bash can't generate the length of a bytestring!
We don't want newline delimiters, because newlines can appear in filenames! touch $'\n'.
So we use NUL-delimited strings. Maybe with a length prefix for the array count, which bash can compute with ${#COMPREPLY[@]}.
How do we attach help text (descriptions) to completions?
Types? Is SHCOMP_REPLY@ an array? Or a string that starts with the ASCII length and then a colon?
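One way to realize the NUL-delimited reply with a count prefix, sketched in Python. The exact layout (an ASCII decimal count, a NUL, then each candidate NUL-terminated) is an assumption. A bash provider could emit it with printf '%s\0' "${#COMPREPLY[@]}" "${COMPREPLY[@]}", since printf reuses its format for every argument.

```python
def encode_reply(items):
    """Hypothetical layout: b"<count>\\0" followed by NUL-terminated items."""
    out = [str(len(items)).encode("ascii") + b"\0"]
    for item in items:
        if b"\0" in item:
            raise ValueError("candidates cannot contain NUL bytes")
        out.append(item + b"\0")
    return b"".join(out)


def decode_reply(data):
    """Parse the layout above, checking the count against the data."""
    fields = data.split(b"\0")
    count = int(fields[0].decode("ascii"))
    items = fields[1:1 + count]
    # Exactly `count` items must be present, followed only by the final NUL.
    if len(items) != count or fields[1 + count:] != [b""]:
        raise ValueError("malformed reply")
    return items
```

Candidates are bytes rather than str because filenames are byte strings; a newline inside a candidate survives the round trip.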

Transports

CLI providers - environment variables
Coprocess providers (JSON?)
Maybe later: JSON-RPC like the Language Server Protocol. I don't necessarily see the need for multi-threaded servers, but we'll see.
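A coprocess provider stays alive and answers many requests over one pair of pipes. Here is a minimal Python sketch, assuming newline-delimited JSON (one object per line); the framing and all names are hypothetical. json.dumps escapes newlines inside strings, so each message fits on one line. Note that JSON strings are Unicode, so non-UTF-8 filenames would need an extra escaping scheme.

```python
import json
import sys


def serve_coprocess(lines, write, complete):
    """Answer requests until the input ends, one JSON object per line."""
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between requests
        request = json.loads(line)
        write(json.dumps({"SHCOMP_REPLY": complete(request)}) + "\n")


def toy_complete(request):
    # Hypothetical completion logic: match the current word against a fixed list.
    word = request["SHCOMP_ARGV"][request["SHCOMP_ARGV_INDEX"]]
    prefix = word[:request["SHCOMP_CHAR_INDEX"]]
    return [w for w in ["add", "am", "branch"] if w.startswith(prefix)]


def write_and_flush(text):
    sys.stdout.write(text)
    sys.stdout.flush()  # the shell is blocked waiting for this reply
```

A shell would start the provider once and keep it running; inside the provider, the loop would be started with serve_coprocess(sys.stdin, write_and_flush, toy_complete).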

Character Encodings

SHCOMP clients and servers should prefer UTF-8 where possible. But file system paths are often the things being completed, and they are just byte strings. So technically most of the strings in the request and response format are NUL-terminated byte strings, and UTF-8 is a special case of that.
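In Python, for example, byte strings that aren't valid UTF-8 can still be carried as text with the surrogateescape error handler, which round-trips the original bytes exactly; os.fsdecode uses the same handler on POSIX. This is a sketch of one option for implementations written in Python, not something the protocol requires, and the function names are hypothetical.

```python
def bytes_to_text(raw: bytes) -> str:
    # Undecodable bytes become lone surrogates (U+DC80..U+DCFF) instead of
    # raising UnicodeDecodeError, so no information is lost.
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_bytes(text: str) -> bytes:
    # The inverse: lone surrogates turn back into the original raw bytes.
    return text.encode("utf-8", errors="surrogateescape")


def display_form(raw: bytes) -> str:
    # Lossy form for showing to the user only: bad bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace")


name = b"caf\xe9.txt"  # Latin-1 "café.txt": a valid filename, invalid UTF-8
assert text_to_bytes(bytes_to_text(name)) == name
```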

Dispatch

Should dispatch be done with the file system? Or can it be done in the shell itself with registration functions?
- complete -C git_completion_command git already registers a command that generates completions. It could be complete -S for SHCOMP, although bash already uses -S to append a suffix to each completion.
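Both dispatch options can coexist. A minimal Python sketch, assuming explicit registrations win and the file system is the fallback; the registry, find_provider, and the "shcomp-<command>" naming convention are all hypothetical.

```python
import os

# Hypothetical registry, filled by a registration builtin such as "complete -S".
REGISTRY = {}


def register(command, provider):
    REGISTRY[command] = provider


def find_provider(command, search_dirs):
    """Return the provider path for a command, or None if there is none.

    Assumptions: explicit registrations win; otherwise look in search_dirs
    for an executable named "shcomp-<command>" (a made-up convention).
    """
    name = os.path.basename(command)  # "/usr/bin/git" -> "git"
    if name in REGISTRY:
        return REGISTRY[name]
    for directory in search_dirs:
        candidate = os.path.join(directory, "shcomp-" + name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```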

Typical Client Algorithm

Partially parse the shell language to argv. The last one may be incomplete or empty. (TODO: does it make sense to complete in the middle?)
Dispatch to the right binary that implements SHCOMP.
Start it up with SHCOMP_VERSION=0.1 to make sure it supports the protocol.
Send over ARGV, as NUL-terminated strings. Maybe an array length prefix.
Receive SHCOMP_REPLY, which is an array, or maybe it can be streaming.
Quote them into shell syntax -- e.g. ${x@Q} in bash -- and then display them to the user.
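Steps 3 to 6 of this algorithm could look like the following Python sketch; parsing and dispatch are assumed to have already produced argv and the provider command. The wire format (NUL-terminated argv on stdin, a plain NUL-terminated reply with no count prefix) and the SHCOMP_MODE=batch variable are assumptions, and FAKE_PROVIDER is a stand-in so the sketch runs on its own. shlex.quote plays the role of ${x@Q}, though its output format differs from bash's.

```python
import os
import shlex
import subprocess
import sys


def request_completions(provider, argv, timeout=2.0):
    """Run a batch-mode provider once; return candidates quoted for the shell."""
    env = dict(os.environ, SHCOMP_VERSION="0.1", SHCOMP_MODE="batch")
    request = b"".join(os.fsencode(arg) + b"\0" for arg in argv)
    result = subprocess.run(provider, input=request, env=env,
                            capture_output=True, timeout=timeout, check=True)
    # Everything after the last NUL is an unterminated fragment; ignore it.
    candidates = result.stdout.split(b"\0")[:-1]
    # Quote each candidate so it can be inserted into the command line safely.
    return [shlex.quote(os.fsdecode(c)) for c in candidates]


# A hypothetical provider written inline, completing the last word of argv.
FAKE_PROVIDER = r"""
import sys
args = sys.stdin.buffer.read().split(b"\0")[:-1]
prefix = args[-1] if args else b""
words = [b"add", b"am", b"it's here"]
sys.stdout.buffer.write(b"".join(w + b"\0" for w in words if w.startswith(prefix)))
"""

if __name__ == "__main__":
    print(request_completions([sys.executable, "-c", FAKE_PROVIDER], ["git", "a"]))
    # prints ['add', 'am']
```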

Typical Server Algorithm

Check if you were started with SHCOMP_VERSION=<non-empty>.
Check if you were started with SHCOMP_MODE=batch or SHCOMP_MODE=coprocess and behave as appropriate.
Receive ARGV.
Determine completions. Example strategies:
- Run an existing command line parser or use its data structures to figure out what we need to complete
- Dynamically grep the output of --help (or a cached copy of it). bash-completion does this grepping.
Send back a response header?
Send back REPLY
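The server side of the same sketch in Python, handling batch mode only. The hardcoded subcommand list stands in for "run an existing command line parser", handle_batch is a hypothetical name, treating a missing SHCOMP_MODE as batch is an assumption, and no response header is sent.

```python
import os
import sys

# Hypothetical static data standing in for a real command line parser.
SUBCOMMANDS = [b"add", b"am", b"annotate", b"apply", b"archive", b"branch"]


def determine_completions(argv):
    """Toy strategy: complete the last word against a fixed subcommand list."""
    prefix = argv[-1] if argv else b""
    return [c for c in SUBCOMMANDS if c.startswith(prefix)]


def handle_batch(env, request):
    """Turn one NUL-terminated ARGV request into a NUL-terminated REPLY."""
    mode = env.get("SHCOMP_MODE", "batch")  # assumption: batch is the default
    if mode != "batch":
        raise NotImplementedError("only batch mode is sketched: " + mode)
    argv = request.split(b"\0")[:-1]
    return b"".join(c + b"\0" for c in determine_completions(argv))


if __name__ == "__main__":
    if os.environ.get("SHCOMP_VERSION"):  # started as a SHCOMP provider
        sys.stdout.buffer.write(handle_batch(os.environ, sys.stdin.buffer.read()))
    else:
        print("normal, non-completion behavior would go here")
```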

Design and Implementation Issues

Shells should NOT consult a completion server for $<TAB> and ${<TAB>. They should complete their own variables!
If you have something like ls $(echo long-time; sleep 100) --ref=<TAB>, then the $(...) substitution can be replaced with a placeholder like DUMMY before the request is sent to the completion server, so it never has to run.
What about tilde expansion? That can be done beforehand? Or the completion provider has to know about it?
Are the key-value pairs in arbitrary order?
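A simplified Python sketch of the DUMMY substitution. It tracks nested parentheses but ignores quoting, backticks, and other cases a real shell parser must handle; it also masks $((...)) arithmetic. mask_command_subs is a hypothetical name.

```python
def mask_command_subs(line, placeholder="DUMMY"):
    """Replace each top-level $(...) with a placeholder, without running it."""
    out = []
    i = 0
    while i < len(line):
        if line.startswith("$(", i):
            depth = 0
            j = i + 1  # index of the "(" after "$"
            while j < len(line):
                if line[j] == "(":
                    depth += 1
                elif line[j] == ")":
                    depth -= 1
                    if depth == 0:
                        break  # found the matching ")"
                j += 1
            out.append(placeholder)
            i = j + 1  # skip past ")" (or past the end if unterminated)
        else:
            out.append(line[i])
            i += 1
    return "".join(out)
```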

Streaming Responses

Low latency for shells is important. A user might want to accept a completion before all candidates are generated (e.g. from a distributed file system or cloud storage service). So we need to support streaming.
Instead of length-prefixed arrays, we can have arrays terminated by sentinels. The sentinel could just be an additional \0 byte? That is like sending an empty string, so an empty string could no longer be a candidate.
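A client-side reader for the sentinel idea, sketched in Python: it yields each candidate as soon as its terminating NUL arrives, and stops at the extra \0. The framing and the name read_streaming_reply are assumptions; chunks stands for any iterable of bytes, such as successive reads from a pipe.

```python
def read_streaming_reply(chunks):
    """Yield candidates as they complete; an empty candidate ends the list."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        while b"\0" in buf:
            item, buf = buf.split(b"\0", 1)
            if item == b"":
                return  # sentinel: the server says the list is complete
            yield item
    raise EOFError("stream ended before the sentinel")
```

Because it is a generator, a shell can show the first candidates while the server is still producing the rest.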

Security

To prevent resource exhaustion attacks, shells may truncate long strings.
Completion servers can be sandboxed since they only communicate over stdin and stdout.
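A Python sketch of a client that bounds what it accepts from a server, following the truncation idea above. The three limits are illustrative defaults, not values from the proposal; read is any function returning up to n bytes, such as a file object's read method.

```python
def read_bounded_reply(read, max_total=1 << 20, max_items=1000, max_len=4096):
    """Parse NUL-terminated candidates while capping memory use."""
    data = read(max_total)  # never buffer more than max_total bytes
    # A trailing item cut off by max_total has no NUL, so [:-1] drops it.
    parts = data.split(b"\0")[:-1]
    # Keep at most max_items candidates, each truncated to max_len bytes.
    return [p[:max_len] for p in parts[:max_items]]
```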
