Coprocess Protocol Proposal
This document sketches a protocol to allow coprocesses to substitute for normal "batch" processes in shell scripts. A coprocess can be thought of as a single-threaded server that reads and writes from pipes.
The goal is to make shell scripts faster. It can also make interactive completion faster, since completion scripts often invoke (multiple) external tools.
Many language runtimes start up slowly, especially when they include a JIT compiler or load many libraries: Python, Ruby, R, Julia, the JVM (including Clojure), etc.
Process startup times seem to be getting worse in general. Python 3 is faster than Python 2 in nearly all dimensions except startup time.
Let's call the protocol FCLI for now. There's a rough analogy to FastCGI and CGI. CGI starts one process per request, while FastCGI handles multiple requests in a process. (I think FastCGI is threaded unlike FCLI, but let's ignore that for now.)
Suppose we have a Python command line tool that copies files to a cloud file system. It works like this:
    cloudcopy foo.jpg //remote/myhome/mydir/
(This could also be an R tool that does a linear regression, but let's use the `cloudcopy` example to be concrete. The idea is that a lot of the work is "startup time" like initializing libraries, not "actual work".)
It could be converted to an FCLI coprocess by wrapping `main()` in a `while True` loop; a sketch of such a loop appears below, after the request/response example.
A shell would invoke such a process with these environment variables:
- `FCLI_VERSION` -- the process should try to become a coprocess. Some scripts may ignore this! That is OK; the shell/client should handle it.
- `FCLI_REQUEST_FIFO` -- read requests from this file system path (a named pipe)
- `FCLI_RESPONSE_FIFO` -- write responses to this file system path (a named pipe)
For worker #9, the shell might set variables like this:
    FCLI_REQUEST_FIFO=/tmp/cloudcopy-pool/request-fifo-9 \
    FCLI_RESPONSE_FIFO=/tmp/cloudcopy-pool/response-fifo-9 \
    cloudcopy   # no args; they'll be sent as "argv" requests
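As a concrete illustration, here is a minimal sketch of how a client could start worker #9. The use of `os.mkfifo` and `subprocess`, the placeholder version string, and the presence of a `cloudcopy` binary on `PATH` are my assumptions, not part of the proposal:

```python
# One way a client could start worker #9.  The pool directory and FIFO
# names follow the example above.
import os
import subprocess

POOL_DIR = '/tmp/cloudcopy-pool'

def start_worker(i):
    req_path = os.path.join(POOL_DIR, 'request-fifo-%d' % i)
    resp_path = os.path.join(POOL_DIR, 'response-fifo-%d' % i)
    os.makedirs(POOL_DIR, exist_ok=True)
    for path in (req_path, resp_path):
        if not os.path.exists(path):
            os.mkfifo(path)                  # named pipe on the file system

    env = dict(os.environ,
               FCLI_VERSION='0.1',           # placeholder version string
               FCLI_REQUEST_FIFO=req_path,
               FCLI_RESPONSE_FIFO=resp_path)

    # No args; they'll be sent as "argv" requests over the FIFO.
    proc = subprocess.Popen(['cloudcopy'], env=env)

    # Open the request FIFO first, then the response FIFO; the worker
    # opens them in the same order, so neither side deadlocks.
    req_pipe = open(req_path, 'w')
    resp_pipe = open(resp_path, 'r')
    return proc, req_pipe, resp_pipe

if __name__ == '__main__':
    proc, req_pipe, resp_pipe = start_worker(9)
```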
The requests and responses will look like this. Note the actual encoding will likely not be JSON, but I'm writing in JSON syntax for convenience.
    # written by the shell to request-fifo-9
    { "argv": ["cloudcopy", "bar.jpg", "//remote/myhome/mydir"],
      "env": {"PYTHONPATH": "."}   # optional env to override the actual env; may be ignored by some processes
    }

    ->

    # written by the cloudcopy process to response-fifo-9
    { "status": 0 }   # 0 on success, 1 on failure
`stderr` is for logging. `stdin` / `stdout` are used as usual. We probably need to instruct the server to flush its streams in order to properly delimit the output of adjacent requests (?). We won't get an EOF because the pipes stay open across multiple requests.
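Putting the pieces together, here is a sketch of the coprocess side: the tool's existing `main()` wrapped in a request loop. The JSON-lines framing, the `serve()` helper name, and the simplified env handling are my assumptions:

```python
# The coprocess side: the tool's existing main(), wrapped in a request
# loop.  Error handling is mostly omitted.
import json
import os
import sys

def main(argv):
    # The tool's existing entry point (stub for illustration).
    print('copying %s to %s' % (argv[1], argv[2]))
    return 0

def serve(req_pipe, resp_pipe):
    while True:
        line = req_pipe.readline()       # blocks; no EOF until the client closes the FIFO
        if not line:
            return                       # client is done with this worker
        req = json.loads(line)
        os.environ.update(req.get('env', {}))   # optional overrides; a tool may ignore these
        try:
            status = main(req['argv'])
        except SystemExit as e:
            status = e.code or 0         # tolerate tools that call sys.exit()
        # Flush so the client can delimit this request's output from the next one's.
        sys.stdout.flush()
        sys.stderr.flush()
        resp_pipe.write(json.dumps({'status': status}) + '\n')
        resp_pipe.flush()

if __name__ == '__main__':
    if os.environ.get('FCLI_VERSION'):
        # Open the request FIFO first, then the response FIFO, matching
        # the client's order so neither side deadlocks.
        with open(os.environ['FCLI_REQUEST_FIFO']) as req_pipe, \
             open(os.environ['FCLI_RESPONSE_FIFO'], 'w') as resp_pipe:
            serve(req_pipe, resp_pipe)
    else:
        sys.exit(main(sys.argv))         # plain batch mode, unchanged
```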
If you wanted to copy 1,000 files, you could start a pool of 20 or so coprocesses and drive them from an event loop. You would only pay the startup time 20 times instead of 1000 times.
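For example, a client could drive such a pool with one thread per worker pulling file names off a shared queue (a simple alternative to a single-threaded event loop). This sketch reuses the hypothetical `start_worker()`, `write_request()`, and `read_response()` helpers from the earlier sketches:

```python
# Driving a pool of 20 coprocesses: each thread owns one worker and
# pulls file names off a shared queue.
import queue
import threading

NUM_WORKERS = 20
DEST = '//remote/myhome/mydir/'

def drive_worker(worker_id, files):
    proc, req_pipe, resp_pipe = start_worker(worker_id)   # pay startup cost once per worker
    while True:
        try:
            name = files.get_nowait()
        except queue.Empty:
            break
        write_request(req_pipe, ['cloudcopy', name, DEST])
        resp = read_response(resp_pipe)
        if resp is None or resp.get('status') != 0:
            print('failed: %s' % name)
    req_pipe.close()                     # EOF lets the worker exit its request loop
    proc.wait()

def copy_all(names):
    files = queue.Queue()
    for name in names:
        files.put(name)
    threads = [threading.Thread(target=drive_worker, args=(i, files))
               for i in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == '__main__':
    copy_all(['file%04d.jpg' % i for i in range(1000)])
```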
In some cases, it would be possible to add a `--num-threads` option to your `cloudcopy` tool. But there are many cases where something like FCLI would be easier to implement. Wrapping `main()` is a fairly basic change.
The process may also just `exit 1` or `exit 123`; the exit code will be treated as the status, e.g. `{"status": 123}`. A new coprocess will be started for the next request.
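On the client side, that rule might look like this: if the response pipe hits EOF instead of yielding a response, take the worker's exit code as the status and respawn it (again using the hypothetical helpers sketched above):

```python
# If the response pipe hits EOF instead of yielding a response, the
# worker exited (e.g. `exit 123`): use its exit code as the status and
# start a fresh coprocess for the next request.
def handle_response(worker_id, worker):
    proc, req_pipe, resp_pipe = worker   # as returned by start_worker()
    resp = read_response(resp_pipe)
    if resp is not None:
        return resp, worker              # normal case: the coprocess answered
    code = proc.wait()
    req_pipe.close()
    resp_pipe.close()
    return {'status': code}, start_worker(worker_id)
```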
Possible request types (a dispatch sketch follows the list):
- `argv` -- run a new command and print a response to the fifo. Use stdin/stdout/stderr as normal.
- `flush` -- flush stdout and stderr. I think this will make it easier to delimit responses from adjacent commands.
- `echo` -- for testing protocol conformance?
- `version` -- maybe?
- `cd` -- instruct the process to change directories? This should be straightforward in most (all?) languages.
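A coprocess might dispatch these request types roughly as follows; the `command` field and the shape of the extra responses are my assumptions, and `main()` is the tool's entry point from the earlier sketch:

```python
# Dispatching the request types above inside the serve() loop.
import os
import sys

FCLI_VERSION = '0.1'                     # placeholder version string

def handle_request(req):
    cmd = req.get('command', 'argv')
    if cmd == 'argv':
        return {'status': main(req['argv'])}
    elif cmd == 'flush':
        sys.stdout.flush()               # delimit output from adjacent commands
        sys.stderr.flush()
        return {'status': 0}
    elif cmd == 'echo':
        return {'status': 0, 'payload': req.get('payload')}   # conformance testing
    elif cmd == 'version':
        return {'status': 0, 'version': FCLI_VERSION}
    elif cmd == 'cd':
        os.chdir(req['dir'])             # straightforward in most languages
        return {'status': 0}
    else:
        return {'status': 2, 'error': 'unknown command: %s' % cmd}
```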
Shells are usually thought of as clients that drive coprocess "tools" in parallel. But they can also be servers, i.e. processing multiple invocations of `sh -c` in a single process.
Shells are often invoked recursively (including by redo).
Because it will be easier for existing command line tools to implement this protocol. Many tools are written with global variables, or in languages that don't support free threading anyway (Python, R, etc.).
- I could have used this for RAPPOR and several other "data science" projects in R.
- The redo build system starts many short-lived processes.
  - It starts many shell processes to interpret rules, and many "tool" processes.
- Shellac Protocol Proposal -- this protocol for shell-independent command completion can build on top of the coprocess protocol. It has more of a structured request/response flavor than some command line tools, but that's fine. FCLI works for both use cases.
Bash coprocesses communicate structured data over two file descriptors / pipes:
http://wiki.bash-hackers.org/syntax/keywords/coproc
They are not drop-in replacements for command line tools.
FCLI uses at least 4 one-way pipes, in order to separate control (argv, status) from data (stdin/stdout).
For adoption, it would be nice to distribute a script like `fcli-lib.sh` or `fcli-lib.bash` that could call coprocesses in a transparent fashion.
TODO: Should the shell capture stderr? Or just use it as the normal logging/error stream? Usage errors could be printed there.
Do processes have to change directories? It shouldn't be super hard for them to implement a `cd` command. (The shell can optimize that away in some cases.)
Process startup time is slow on Windows. Windows does have named pipes, but they live in their own `\\.\pipe\` namespace rather than on the file system.
- If you start a coprocess pool, some requests might have affinity for certain replicas, i.e. to try to reuse a certain network connection. The shell could allow the user to specify this logic in a small shell function.