-
Notifications
You must be signed in to change notification settings - Fork 29
[RFC] Add string_stream_in_put_image #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swbaird I think you're in charge of the original AI at the ARG. Maxim's comments sound sensible to me, but I think you'd be better suited to review this :) Thanks in advance ! |
Thanks for the comment. The ARG has started a discussion on this topic. As soon as I have an AI number I will post it here. |
Here is a somewhat more elaborate package the ARG is discussing to provide something like the Text_Stream you suggested. I think there is a goal to make this something you could read and write using String, Wide_String, or Wide_Wide_String. It is missing New_LIne and End_Of_Line routines, which it presumably needs. Let me propose a specification for a "Text_Buffer" that does something like a text stream. It would allow concatenating chunks of text in whatever format as well as extracting them in similar arbitrary formats. It could have an optional streaming interface, but it could be used without referring to any streams (and usually would be used that way). Presumably, there would also be a bounded version (not shown). Something like the following would be ideal: package Text_Buffers
with Preelaborated, Nonblocking, Global => null is
-- This Global => null requires that a Text_Buffer is a compound object,
-- and any access types be internal.
type Text_Buffer is private;
function Character_Count (Buffer : in Text_Buffer) return Natural;
procedure Clear (Buffer : in out Text_Buffer);
procedure Write (Buffer : in out Text_Buffer;
Item : in String)
with Post => Character_Count (Buffer)'Old + Item'Length =
Character_Count (Buffer);
procedure Wide_Write (Buffer : in out Text_Buffer;
Item : in Wide_String)
with Post => Character_Count (Buffer)'Old + Item'Length =
Character_Count (Buffer);
-- I hate having to use "Wide_xxx" here, but it's necessary if we're
-- going to allow using literals conviniently with this package.
procedure Wide_Wide_Write (Buffer : in out Text_Buffer;
Item : in Wide_Wide_String)
with Post => Character_Count (Buffer)'Old + Item'Length =
Character_Count (Buffer);
procedure Read (Buffer : in out Text_Buffer;
Item : in out String;
Length : in out Natural)
with Post => Character_Count (Buffer)'Old - Length =
Character_Count (Buffer);
function Read (Buffer : in out Text_Buffer) return String
with Post => Character_Count (Buffer)'Old = Read'Length and then
Character_Count (Buffer) = 0;
-- Same for Wide_String and Wide_Wide_String.
-- We also could have UTF8 and UTF16 routines (possibly in a child,
-- to avoid dragging in Ada.Strings.Encoding unless needed).
-- Similarly, we could support a streaming interface:
procedure Read_into_Buffer(
Stream : not null access Ada.Streams.Root_Stream_Type'Class;
Buffer : in out Text_Buffer;
Length : in Natural)
with Post => Character_Count (Buffer)'Old + Length =
Character_Count (Buffer)
Nonblocking => False, Global => in out all;
-- Read length Wide_Wide_Characters from Stream.
procedure Write_from_Buffer(
Stream : not null access Ada.Streams.Root_Stream_Type'Class;
Buffer : in out Text_Buffer;
Length : in Natural := 0)
with Post => (if Length /= 0 then
Character_Count (Buffer)'Old - Length =
Character_Count (Buffer)
else
Character_Count (Buffer) = 0)
Nonblocking => False, Global => in out all;
-- Write length Wide_Wide_Characters to Stream. If Length is 0,
-- write all of the characters in the buffer to the stream.
end Text_Buffers; Put_Image would then be modified to take a Text_Buffer rather than a Stream: procedure S'Put_Image
(Buffer : in out Text_Buffer'Class;
Arg : in T); |
Dear ARG, I'm absolutely skeptical, but thanks to Maxim, who pushed me again to try to share my experience ;) I would like warn you again, please change your mind and forgot about Character/String (Wide/Wide_Wide forms) if you would like to create something useful in real world localized/internationalized/globalized applications. For such applications Character is meaningless. String is meaningless too. Text has two important properties: (1) low level representation and (2) high level access/processing API. Low level is all kinds of encoding/character sets. High level allows to process some "atomic" items. They may be Unicode code points, grapheme clusters, words, paragraphs, etc. Also, when application needs to work with streams - language should provide necessary abstraction. Stream may be anything: text buffer, network socket, serial port. If it is text stream - necessary encoder/decoder should be connected between "raw" stream and application. |
According to many object-oriented design practices, like SOLID, it's better to have clean text writing interface here instead of specify implementation for all occasions. Why user authoring Image attribute should see Character_Count, Clear, Read subprograms? Why does she need reading from Root_Stream_Type and writing to it? I agree with @godunko. Ability to writing String raises questions "what encoding is used in this string (Latin-1/UTF8" and "how to read from the stream if the encoding is unknown in advance". We already have String in Ada.Directories and Ada.Environment_Variables that prevents usage of non-ascii characters in file names and environments. Don't do this again. |
It would help to see an alternative. Remember we are focused on 'Image for now. Thanks. |
@reznikmm @godunko To expand on Tucker's answer:
So, the question is, do you have a concrete counter proposal, bearing those two points in mind ? |
@raph-amiard Yes, I have. I've written it in RFC. What is wrong with the rfc? |
Sorry, you're right ... So if I understand correctly:
I completely agree with the objections to However I have one question - as, I must confess, a lot about encoding is still unclear to me. If you only allow Wide_Wide_String, are you sure about the encoding that you're using ? It needs to be utf-32 ? If that is the case, @sttaft I indeed prefer @reznikmm 's original proposal. It's cleaner and simpler, and allows pretty much everything a user needs. |
However I have one question - as, I must confess, a lot about encoding is still unclear to me. If you only allow Wide_Wide_String, are you sure about the encoding that you're using ? It needs to be utf-32 ?
That’s the point: if you use Wide_Wide_String, there is no encoding (though there is potentially a charset)
Basically, each element in your string is a character (a unicode code point in general), so you do not have to
decode anything. Whenever you want to output or input such a wide_wide_string, then you need to somehow
convert those code points to bytes, and that’s when encoding plays a role. But only at the fringes of your program,
in the input/output routines. Once you have the string, the rest of your program doesn’t need to care what
encoding was used initially, or will be used in the future.
GNATCOLL.XStrings_Impl has a similar approach and is meant to store characters, not bytes.
The default instantiation in GNATCOLL.Strings used characters in the range 1-256, so if you want to
handle Asian languages for instance you should have a different instance of GNATCOLL.Strings_Impl.
(that’s likely what Ada strings should be too, though we have all made the confusion at some point between
characters and bytes, and then we no longer know what we are manipulating :-)
|
I would tend to agree with Maxim's position: the current AI proposal is too low-level, and expose too many things for no immediate benefit. Strings should be kept "abstract" within the application, and converted only when they are output/stored. Thus, I like this RFC, which goes in the right direction. For the record, I found the following blog post quite enlightening: https://two-wrongs.com/unicode-strings-in-ada-2012.html EDIT: I just noticed that the Matreshka library referenced in the post is written by Vadim 😅 |
@raph-amiard Right, your understanding is correct. Thank you for summarizing. Whole Unicode character repertoire fits |
I've wrote a little demo showing how this was implemented in GNAT Community Edition 2020. |
From Slack: I believe the model given in http://www.ada-auth.org/cgi-bin/cvsweb.cgi/ai12s/ai12-0384-1.txt?rev=1.2 is pretty clear at this point that a Text_Buffer is a sequence of "logical" characters, and we have addressed the issues relating to what sort of substitutions take place when getting a String after putting a Wide_Wide_String. Overall, this API is designed to make it convenient for user-defined 'Image functions. Most Ada code uses String rather than Wide_String or Wide_Wide_String, and there might likely already be "Image" functions for a private type that could be re-purposed or slightly revised to implement the 'Put_Image aspect. It would seem an unnecessary burden to force all such code to be revised to use Wide_Wide_String. Nevertheless, to accommodate situations where you do need to use characters outside Standard.Character, we have provided ways to store characters using Wide_String, Wide_Wide_String, UTF-8, and UTF-16. The justification for providing multiple Put_... operations is that we are trying to make it relatively easy to implement 'Put_Image. So the goal is to simplify the caller's job, at the expense of having more Put routines. Here is the document comparing the current ARG AI with the prototyping work done by Bob Duff: |
Thank you, @sttaft for sharing the comparing document and AI12-0384-1. I haven’t seen them. As I understand, ARG decided to make
With this API a user creates a document (represented as a tree of documents, actually) concatenating and grouping them together and then (providing max line width) casts it to a list of lines or its string representation ( I have implemented such an engine and found it rather handy and powerful. I was even able to write an Ada pretty printer library capable of generating cute Ada source code. Here is an example of pretty printing with Ada.Wide_Wide_Text_IO;
with League.Pretty_Printers; use League.Pretty_Printers;
procedure Test is
Printer : aliased League.Pretty_Printers.Printer;
Doc : Document;
begin
Doc := Printer.New_Document
.Put ("begin")
.Append
(Printer.New_Document
.New_Line
.Put ("null;")
.Nest (3))
.New_Line
.Put ("end;");
Ada.Wide_Wide_Text_IO.Put_Line
(Printer.Pretty (80, Doc).To_Wide_Wide_String);
end Test; Of course it requires dynamic memory allocations, but for restricted environments the compiler could provide an implementation that ignores all Group commands or New_Line commands, so reducing pretty printing to just a concatenation into a string (I hope). Having such API would help users much more than To be more concrete I provide a draft specification: package Ada.Pretty_Printers is
type Pretty_Printer is tagged limited private;
-- A pretty printer object. It owns all documents created by
-- New_Document and their modifications. Destroying a Pretty_Printer
-- will invalidate all its documents.
type Document (<>) is private; -- tagged to enable dot notation???
-- A document contains some text to pretty print
function New_Document (Self : in out Pretty_Printer'Class) return Document;
-- Create empty document
procedure Append
(Self : in out Document;
Text : Wide_Wide_String);
-- Append a Text to the document, replace Self with the result.
-- The Text should not contain line feeds and other control characters.
procedure Append
(Self : in out Document;
Text : Document);
-- Append another document text to the document, replace Self with the
-- result. Both documents should be created from the same printer object.
procedure New_Line
(Self : in out Document;
Gap : Wide_Wide_String := " ");
-- Append a line separator to the document, replace Self with the result.
-- The Gap will replace line separator if enclosing group operator
-- collapses lines into a single line.
procedure Nest
(Self : in out Document;
Indent : Natural);
-- Create a document where each new line appended with Indent spaces.
-- Replace Self with the result.
procedure Group (Self : in out Document);
-- Group all new lines in Self documend to fold them altogether
procedure Print
(Self : in out Pretty_Printer'Class;
Doc : Document;
Width : Positive;
Put_Line : access procedure (Line : Wide_Wide_String));
-- Convert an input document to a line list with given prefered
-- line Width. Feed the list to Put_Line procedure. The document
-- should belong to the printer. Corresponding function Put_Image
(Printer : in out Pretty_Printer’Class;
Value : T) return Document; Considering AI12-0384-1 I ask you NOT to introduce any UTF-8 strings. Standard defines If you like, I’ll volontier to write a draft implementation for the GNAT Runtime. |
@reznikmm Since the ARG has updated corresponding AIs to take this feedback into account, but in a different fashion still, I'll close this AI. Feel free to update/reopen if you think we should make further amendments on top of what the ARG has proposed. |
No description provided.