Add a `Utf8CharDecoder` type for incrementally decoding UTF-8 byte-by-byte #55

```rust
#[derive(Clone, Debug, Eq, PartialEq)]
struct Utf8CharDecoder(...);

impl Utf8CharDecoder {
    fn new() -> Self { todo!() }
    fn feed(&mut self, byte: u8) -> Option<Result<char, SomeError>> { todo!() }
    fn finish(self) -> Result<(), SomeError> { todo!() }
    fn reset(&mut self) { todo!() }
    // Something for decomposing into the inner partial bytes
    // Something for testing whether there are currently partial bytes stored
    // Something for querying how many continuation bytes are needed to complete the current character
}
```
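The three "Something for ..." placeholders above could be backed by a small buffer of the bytes seen so far. The following is only a sketch under assumptions: the type name `PartialUtf8`, its fields, and every method name are hypothetical and not part of the proposal, and no validation beyond lead-byte classification is attempted.

```rust
/// Hypothetical storage for an in-progress UTF-8 sequence (names are illustrative).
#[derive(Clone, Debug, Default, Eq, PartialEq)]
struct PartialUtf8 {
    buf: [u8; 4], // bytes of the current sequence, lead byte first
    len: u8,      // number of bytes stored so far
    expected: u8, // total length announced by the lead byte; 0 when idle
}

impl PartialUtf8 {
    /// Begin a sequence. Returns false if `lead` cannot start a multi-byte sequence.
    fn start(&mut self, lead: u8) -> bool {
        let expected = match lead {
            0xC2..=0xDF => 2,
            0xE0..=0xEF => 3,
            0xF0..=0xF4 => 4,
            _ => return false,
        };
        *self = PartialUtf8 { buf: [lead, 0, 0, 0], len: 1, expected };
        true
    }

    /// Store one continuation byte. Returns false if it is not a continuation
    /// byte or if no further byte is expected.
    fn push_continuation(&mut self, byte: u8) -> bool {
        if byte & 0xC0 != 0x80 || self.len >= self.expected {
            return false;
        }
        self.buf[self.len as usize] = byte;
        self.len += 1;
        true
    }

    /// Borrow the partial bytes stored so far.
    fn partial_bytes(&self) -> &[u8] {
        &self.buf[..self.len as usize]
    }

    /// Decompose into the raw buffer and the count of valid bytes in it.
    fn into_parts(self) -> ([u8; 4], usize) {
        (self.buf, self.len as usize)
    }

    /// Are any partial bytes currently stored?
    fn has_partial(&self) -> bool {
        self.len > 0
    }

    /// How many continuation bytes are still needed to complete the character?
    fn bytes_needed(&self) -> usize {
        (self.expected - self.len) as usize
    }
}
```

Keeping the raw bytes (rather than only an accumulated code point) is what makes the "decompose into the inner partial bytes" accessor cheap; a decoder that never needs to hand the bytes back could store less.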

- Also add a lossy variant that yields U+FFFD REPLACEMENT CHARACTER in place of an error

- Error conditions that the error type must cover:
    - Code point is greater than 0x10FFFF
    - Code point is a surrogate (U+D800 through U+DFFF)
    - Non-canonical (overlong) encoding of a code point (e.g., `0b1100_0000 0b1000_0000` for U+0000, the NUL character)
    - Partial UTF-8 sequence followed by non-continuation byte (i.e., ASCII char or new start byte)
    - Continuation byte encountered without preceding matching start byte

- On an error, reset the decoder to the initial state?
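To make the error conditions and the reset question concrete, here is one possible shape for `feed`, written as a sketch rather than a proposed implementation. The names `DecodeError` and `Decoder`, all variant names, and the free function `feed_lossy` are hypothetical. Two variants (`InvalidStart`, `Incomplete`) go beyond the five listed conditions and are assumptions. The sketch answers the open question with "yes, reset on error", and it drops the offending byte instead of reprocessing it, which is only one of the possible choices.

```rust
/// Hypothetical error type; variant names are illustrative.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
enum DecodeError {
    TooLarge,          // code point above 0x10FFFF
    Surrogate,         // code point in U+D800..=U+DFFF
    Overlong,          // non-canonical (non-shortest) encoding
    UnexpectedByte,    // partial sequence followed by a non-continuation byte
    StrayContinuation, // continuation byte without a preceding start byte
    InvalidStart,      // assumption: bytes 0xF8..=0xFF can never start a sequence
    Incomplete,        // assumption: finish() called with partial bytes pending
}

#[derive(Clone, Debug, Default, Eq, PartialEq)]
struct Decoder {
    acc: u32,   // code point bits accumulated so far
    needed: u8, // continuation bytes still required
    len: u8,    // total length of the current sequence
}

impl Decoder {
    fn new() -> Self {
        Self::default()
    }

    fn reset(&mut self) {
        *self = Self::default();
    }

    fn feed(&mut self, byte: u8) -> Option<Result<char, DecodeError>> {
        if self.needed == 0 {
            // Idle: classify the byte as ASCII, a start byte, or an error.
            let (len, bits) = match byte {
                0x00..=0x7F => return Some(Ok(byte as char)),
                0x80..=0xBF => return Some(Err(DecodeError::StrayContinuation)),
                0xC0..=0xDF => (2, byte & 0x1F),
                0xE0..=0xEF => (3, byte & 0x0F),
                0xF0..=0xF7 => (4, byte & 0x07),
                0xF8..=0xFF => return Some(Err(DecodeError::InvalidStart)),
            };
            self.acc = bits as u32;
            self.len = len;
            self.needed = len - 1;
            return None;
        }
        if byte & 0xC0 != 0x80 {
            // Design choice of this sketch: reset and drop the offending byte.
            self.reset();
            return Some(Err(DecodeError::UnexpectedByte));
        }
        self.acc = (self.acc << 6) | (byte & 0x3F) as u32;
        self.needed -= 1;
        if self.needed > 0 {
            return None;
        }
        let (cp, len) = (self.acc, self.len);
        self.reset();
        // Smallest code point that legitimately needs `len` bytes.
        let min = match len {
            2 => 0x80,
            3 => 0x800,
            _ => 0x1_0000,
        };
        Some(if cp < min {
            Err(DecodeError::Overlong)
        } else if cp > 0x10FFFF {
            Err(DecodeError::TooLarge)
        } else {
            // Within range, `from_u32` only fails for surrogates.
            char::from_u32(cp).ok_or(DecodeError::Surrogate)
        })
    }

    fn finish(self) -> Result<(), DecodeError> {
        if self.needed == 0 {
            Ok(())
        } else {
            Err(DecodeError::Incomplete)
        }
    }
}

/// A lossy wrapper could map every error to U+FFFD.
fn feed_lossy(dec: &mut Decoder, byte: u8) -> Option<char> {
    dec.feed(byte)
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
}
```

Note that this sketch validates only once a sequence is complete, so `0xC0 0x80` is reported as `Overlong` after the second byte. A stricter decoder could reject `0xC0` immediately, at the cost of a more involved state machine.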
