Names for communication ABCs #1208

Open
@njsmith

One of the major blockers for stabilizing Trio is that we need to stabilize our ABCs for channels/streams/pipes/whatever-they're-called. We've been iterating for a while, and I feel like we're pretty close on the core semantics, and I really want this to be done, but the fact is that I'm just not happy with the names, and those are kind of crucial. So here's an issue to try to sort it out once and for all.

Along the way, I've realized that there isn't really anything Trio-specific about these ABCs, and there are a bunch of advantages to making these ABCs something more widely used across the async Python ecosystem. So also CC'ing for feedback: @asvetlov @1st1 @agronholm @glyph (and feel free to add others). And I'll briefly review what we're trying to do for context before digging into the naming issue.

What problem are we trying to solve?

There are two main concepts that I think we need ABCs for:

  • communication channels used to transmit/receive bytes (right now Trio calls this a trio.abc.Stream)
  • communication channels used to transmit/receive objects of a given type (right now Trio calls this a trio.abc.Channel[T])
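To make the two concepts concrete, here's a pared-down sketch of what the two interfaces look like. The method names mirror the ones Trio uses today (`send_all`/`receive_some` and `send`/`receive`), but these classes are illustrative stand-ins defined from scratch, not the real `trio.abc` classes, which also cover things like closing that are left out here.

```python
import abc
from typing import Generic, TypeVar

T = TypeVar("T")


class Channel(abc.ABC, Generic[T]):
    """Object-wise: every send() is matched by exactly one receive()."""

    @abc.abstractmethod
    async def send(self, value: T) -> None: ...

    @abc.abstractmethod
    async def receive(self) -> T: ...


class Stream(abc.ABC):
    """Byte-wise: the boundaries between chunks carry no meaning."""

    @abc.abstractmethod
    async def send_all(self, data: bytes) -> None: ...

    @abc.abstractmethod
    async def receive_some(self, max_bytes: int) -> bytes: ...
```

The point of keeping both this small is that a new transport only has to implement a couple of methods to participate.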

There are many many concrete implementations of each. Trio currently ships with ~7 different implementations of its byte-oriented communication ABC (sockets, TLS, Unix pipes, Windows named pipes, ...) and we expect to add more; it's also an interface that's commonly exposed and consumed by third-party libraries (think of SSH channels, HTTP streaming response bodies, QUIC connections, ...).

Object-wise communication is maybe a bit less familiar because most frameworks don't call it out as a single category, but it's also something you see all over the place. Some examples:

  • asyncio Queues or Golang channels
  • fundamental framing protocols like length-prefixing or newline-termination are strategies for converting an individual-byte-oriented channel into a bytes-object-oriented channel
  • a lot of sans-io libraries like h11, wsproto, h2 are essentially designed to convert a stream-of-individual-bytes into a stream-of-event-objects
  • Websockets are a channel for sending/receiving Union[bytes, str] objects
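As a rough illustration of the framing example in the list above, here's a sketch of length-prefixing layered on top of a byte-oriented object. `MemoryByteStream` and `LengthPrefixedChannel` are hypothetical names invented for this example (neither exists in Trio or asyncio), and the 4-byte big-endian header is an arbitrary choice. The in-memory stream deliberately hands back at most 3 bytes per call to mimic the rechunking a real transport can do.

```python
import asyncio
import struct


class MemoryByteStream:
    """Hypothetical in-memory byte stream that rechunks on purpose."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    async def send_all(self, data: bytes) -> None:
        self._buffer += data

    async def receive_some(self, max_bytes: int) -> bytes:
        size = min(max_bytes, 3)  # never return more than 3 bytes at once
        chunk = bytes(self._buffer[:size])
        del self._buffer[:size]
        return chunk  # b"" means end-of-stream in this sketch


class LengthPrefixedChannel:
    """Sends and receives whole bytes objects over a byte stream."""

    def __init__(self, stream) -> None:
        self._stream = stream

    async def send(self, message: bytes) -> None:
        header = struct.pack(">I", len(message))  # 4-byte length prefix
        await self._stream.send_all(header + message)

    async def _receive_exactly(self, count: int) -> bytes:
        data = bytearray()
        while len(data) < count:
            chunk = await self._stream.receive_some(count - len(data))
            if not chunk:
                raise EOFError("stream ended mid-frame")
            data += chunk
        return bytes(data)

    async def receive(self) -> bytes:
        (length,) = struct.unpack(">I", await self._receive_exactly(4))
        return await self._receive_exactly(length)


async def demo() -> list:
    channel = LengthPrefixedChannel(MemoryByteStream())
    for message in (b"hello", b"", b"world!"):
        await channel.send(message)
    return [await channel.receive() for _ in range(3)]


print(asyncio.run(demo()))
```

Even though the underlying stream never returns more than 3 bytes at a time, the messages come back with their original boundaries, including the empty one.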

Having standard ABCs for these has a lot of benefits:

  • Most obviously, it provides "one obvious way to do it", so developers can focus on the interesting parts instead of looking up whether they're supposed to call read or recv or receive or get
  • It lets us write generic algorithms that work on arbitrary implementations. Example: Trio's TLS implementation can be composed with any compliant byte-stream implementation. A generic JSONChannel could be wrapped around any Channel[bytes] implementation to convert it into a Channel[JSONObject]. There's more discussion in #796 ("Provide some standard mechanism for splitting a stream into lines, and other basic protocol tasks").
  • Something that surprised me, but that's becoming a major issue: having a standard convention across async libraries makes life much easier for packages that want to support multiple async libraries.
  • Something that surprised me even more: aside from the asyncio/trio split, having standard abstractions here is the best way to write generic, composable sans-io protocols, because sans-io is basically another kind of I/O system, and Python's async/await is generic enough to be repurposed for this. For more details see this comment on #796 ("Provide some standard mechanism for splitting a stream into lines, and other basic protocol tasks").
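To give a feel for the "generic algorithms" point, here's a minimal sketch of the JSONChannel idea: a wrapper that works with anything exposing async `send`/`receive` of bytes objects. Both classes are hypothetical and written just for this example. `MemoryBytesChannel` is an in-memory stand-in for some Channel[bytes], and a real JSONChannel would presumably also need error handling and closing.

```python
import asyncio
import json
from collections import deque


class MemoryBytesChannel:
    """Hypothetical in-memory stand-in for a Channel[bytes]."""

    def __init__(self) -> None:
        self._items = deque()

    async def send(self, value: bytes) -> None:
        self._items.append(value)

    async def receive(self) -> bytes:
        return self._items.popleft()


class JSONChannel:
    """Wraps any bytes-object channel and exposes JSON-compatible objects."""

    def __init__(self, transport) -> None:
        self._transport = transport

    async def send(self, value) -> None:
        # One JSON document per underlying bytes object.
        await self._transport.send(json.dumps(value).encode("utf-8"))

    async def receive(self):
        return json.loads(await self._transport.receive())


async def roundtrip(value):
    channel = JSONChannel(MemoryBytesChannel())
    await channel.send(value)
    return await channel.receive()


print(asyncio.run(roundtrip({"op": "ping", "id": 1})))
```

Nothing in JSONChannel knows or cares what the wrapped object is, so the same wrapper could sit on top of a websocket, a length-prefixed socket, or an in-memory queue.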

If this is such a generic problem, then why should Trio be the one to solve it?

Well, no-one else seems to be working on it, and we need it :-). And of course we'd be happy if our work is useful to more people.

Asyncio will be adding an asyncio.Stream class in 3.8, but it doesn't seem to be intended as a generic abstraction with multiple implementations. It's a specific concrete class that's tightly tied to asyncio's transports/protocols layer, and it exposes a rich public interface that includes buffering, line-splitting, fixed-length reads, and TLS.

This means that whenever someone needs a new kind of Stream object, they have two options. They can define a new type that quacks like a Stream, which means re-implementing all this functionality from scratch. Or they can implement their new functionality using the transports/protocols layer and then wrap an asyncio.Stream around it; but using the asyncio transports/protocols layer is awkward, adds overhead, and makes it very difficult to support other async libraries. Neither option is very appealing. IMO what we need is a minimal core interface, so that it's easy to implement, and then higher-level tools like buffering, line-splitting, TLS, etc. can be written once and re-used on any object that implements the byte-wise ABC.

And asyncio doesn't have an object-based communication abstraction at all, which IMO is a major missed opportunity.

Twisted OTOH does have standard abstractions for all these things, but they were designed 15+ years ago, long before async/await existed, and I think we can do better now. Heck, even the replacement @glyph's been working on for half a decade now predates async/await.

So the field seems wide open here.

What's already working?

We've been iterating on our APIs for these for a year+ now, and I think we're converging on a pretty solid design. You can see the current version in our docs: https://trio.readthedocs.io/en/stable/reference-io.html#abstract-base-classes

There are some details we're still sorting out – if you're curious see #1125, #823 / #1181, #371, #636 – but I don't want to get into the details too much because they're not really on-topic for this issue and I don't think they'll affect the overall adoption of the ABCs.

OK so what's this issue about then?

Like I said above, Trio currently uses trio.abc.Stream for the byte-wise interface, and trio.abc.Channel[T] for the object-wise interface. These names have two major problems:

  • They're both completely generic. In regular English, "stream" and "channel" mean basically the same thing. This means that the names don't tell you anything about which is which, or how they're similar, or how they're different. And this is unfortunate, because while these concepts are fairly simple and fundamental, experience says that it really takes some work for new users to wrap their heads around them. Anything we can do to make that easier will help a lot.

    Also it personally took me like 6 months to stop mixing up the names and saying "stream" where I meant "channel" or vice-versa, which I just can't ignore. If I can't keep them straight, then how can I expect anyone else to keep them straight?

  • There actually is an important conceptual relationship between them that IMO we should emphasize. Conceptually, byte-streams are basically object-streams where the object type is "a single byte". But if you try to use an interface designed for sending/receiving objects to send/receive individual bytes, then it'll be ridiculously inefficient, so instead you need a vectorized interface that works on whole bytestrings at once. A nice thing about framing things this way is that it emphasizes one of the things that always trips people up, which is that byte-streams don't preserve framing – because they're really a stream of individual bytes.

So I think the right way to do it is to present the object-wise interface as the more fundamental one, and the byte-wise interface as a specialized variant. And we can communicate that right in the names, by calling the object-wise interface X[T], and the byte-wise interface a ByteX.

And then when someone asks what the difference is between a ByteX and an X[bytes], we can point at two animations of objects moving through a tube and say: this one is an X[bytes], and this one is a ByteX.
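In code, the same contrast might look something like the sketch below. `ObjectX` and `ByteX` are throwaway in-memory classes named after the placeholders above, not proposals for real names or real Trio classes. The same two bytestrings go into each; only the object-wise one hands them back with their boundaries intact.

```python
import asyncio
from collections import deque


class ObjectX:
    """Plays the role of X[bytes]: each sent object comes out whole."""

    def __init__(self) -> None:
        self._items = deque()

    async def send(self, value: bytes) -> None:
        self._items.append(value)

    async def receive(self) -> bytes:
        return self._items.popleft()


class ByteX:
    """Plays the role of ByteX: just a run of individual bytes."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    async def send_all(self, data: bytes) -> None:
        self._buffer += data

    async def receive_some(self, max_bytes: int) -> bytes:
        chunk = bytes(self._buffer[:max_bytes])
        del self._buffer[:max_bytes]
        return chunk


async def compare():
    objects, raw = ObjectX(), ByteX()
    for message in (b"ab", b"cd"):
        await objects.send(message)
        await raw.send_all(message)
    framed = [await objects.receive(), await objects.receive()]
    unframed = [await raw.receive_some(3), await raw.receive_some(3)]
    return framed, unframed


framed, unframed = asyncio.run(compare())
print(framed)    # [b'ab', b'cd']
print(unframed)  # [b'abc', b'd']
```

The byte-wise side gives back the same four bytes, just cut at whatever point the reader asked for, which is exactly the "doesn't preserve framing" behavior that trips people up.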

But what should X be?

Can we steal from another system?

As mentioned above, asyncio has a concrete class Stream for byte-wise communication and a concrete class Queue for object-wise communication, but no relevant ABCs.

Go has a concrete type chan for communicating objects within a single process, and abstract Reader and Writer interfaces for byte-wise communication.

Node.js has an abstract Stream interface that it uses for both byte-wise and object-wise communication. There's a mode argument you can set when creating the stream that determines whether bytestrings can be rechunked or not. (It's pretty similar to some of the approaches we considered and rejected in #959.) The byte-wise mode is the default, and the object-wise mode seems like an afterthought (e.g. the docs say that Node.js itself never uses the object-wise mode).

Rust's Tokio library has an abstract Stream interface, which is basically equivalent to an async iterator in Python, or what Trio calls a trio.abc.ReceiveChannel[T]. And they also have an abstract Sink interface, which is equivalent to what Trio currently calls a trio.abc.SendChannel[T]. For bytes, they have abstract interfaces called AsyncRead and AsyncWrite. And they have generic framing tools to convert an AsyncRead into a Stream, or convert an AsyncWrite into a Sink.

Java's java.nio library uses Channel to mean, basically, anything with a close method (like Trio's AsyncResource). And then it has sub-interfaces like ByteChannel for byte-oriented communication. I don't think there's any abstract interface for framed/object-wise communication. Java's java.io library uses Stream to refer to byte-wise communication, and Reader and Writer for character-wise communication.

Swift NIO has an abstract Channel interface for byte-wise communication, and I don't see any interfaces for framed/object-wise communication.

My takeaways:

  • There's no general consensus on terminology. The words "stream" and "channel" show up a lot, but the meanings aren't consistent. "Read" and "write" are also popular, and are used consistently for byte-oriented interfaces.

  • There isn't any consensus on the basic concepts either! A major part of our ABC design is the insight that object-wise and byte-wise communication are both fundamental concepts, and that it's valuable to have standard interfaces to express both of them, and how they relate. But most of these frameworks only think seriously about byte-wise communication, and treat object-wise communication as an unrelated problem if they consider it at all. Tokio is the main exception, but their terminology is either ad hoc or motivated by other Rust-specific stuff. So we can't just steal an existing solution.

OK so what are our options?

I tried to brainstorm all the potential names I could think of, assuming we go with the X + ByteX pattern described above:

  • Channel[T] + ByteChannel, e.g. you could use a MemoryChannel to pass objects between tasks, and a SocketByteChannel to represent a TCP connection
  • Stream[T] + ByteStream, e.g. MemoryStream, SocketByteStream
  • Tube[T] + ByteTube, e.g. MemoryTube, SocketByteTube
  • Flow[T] + ByteFlow, e.g. MemoryFlow, SocketByteFlow
  • Transport[T] + ByteTransport, e.g. MemoryTransport, SocketByteTransport
  • Hose[T] + ByteHose, e.g. MemoryHose, SocketByteHose
  • Ferry[T] + ByteFerry, e.g. MemoryFerry, SocketByteFerry
  • Duct[T] + ByteDuct, e.g. MemoryDuct, SocketByteDuct
  • Vent[T] + ByteVent, e.g. MemoryVent, SocketByteVent
  • Pipe[T] + BytePipe, e.g. MemoryPipe, SocketBytePipe
  • Port[T] + BytePort, e.g. MemoryPort, SocketBytePort

Criteria: I think ideally our core name should be a common, concrete English word that's short to say and to type, because those make the best names for fundamental concepts. It shouldn't be too "weird" or controversial, because people dislike weird names and will refuse to adopt them even if everything else is good. And of course we want it to be unambiguous, so we need to avoid name clashes.

Unfortunately it's really hard to get all of these at once, which is why I got stuck :-)

Stream is really solid on the first two criteria: it's one syllable, common, uncontroversial. But! It clashes with asyncio.Stream. That's not a problem for adoption in the Trio ecosystem. But it might be a major problem if we want this to get uptake across Python more broadly.

The other obvious option is Channel, but I'm really hesitant because it's more cumbersome: 2 syllables on their own aren't too bad, but by the time you start talking about a SocketByteChannel it feels extremely Java, not Python.

For some reason Transport doesn't bother me as much, even though it's two syllables as well, but then you have the clash with the whole protocols/transports terminology, and that also seems like it could be pretty confusing. We're generally trying to move away from protocols/transports, and I don't want to have to tell beginners "we recommend you use Transports instead of protocols/transports". That's just going to make them more confused.

Among the rest, Tube is tempting as a simple, short, concrete word that doesn't conflict with anything in Trio or asyncio, and fits very nicely with illustrations like the ones I linked above, where there's a literal tube with objects moving through it. But... @glyph has been trying to make his tubes a thing for half a decade now. I think the proposal here is totally compatible with the goals and vision behind his version of tubes, and much more likely to get wider traction going forward. But OTOH the details are very different. So I can't tell whether using the name here would be the sincerest form of flattery, or super-rude, or both at once.

What do y'all think?
