[RFC] Add string_stream_in_put_image #17

reznikmm · 2019-06-13T10:44:02Z

No description provided.

raph-amiard · 2019-06-17T10:18:52Z

@swbaird I think you're in charge of the original AI at the ARG. Maxim's comments sound sensible to me, but I think you'd be better suited to review this :) Thanks in advance !

sttaft · 2019-07-09T17:32:38Z

Thanks for the comment. The ARG has started a discussion on this topic. As soon as I have an AI number I will post it here.

sttaft · 2019-07-09T19:54:48Z

Here is a somewhat more elaborate package the ARG is discussing to provide something like the Text_Stream you suggested. I think there is a goal to make this something you could read and write using String, Wide_String, or Wide_Wide_String. It is missing New_LIne and End_Of_Line routines, which it presumably needs.

Let me propose a specification for a "Text_Buffer" that does something like a text stream. It would allow concatenating chunks of text in whatever format as well as extracting them in similar arbitrary formats. It could have an optional streaming interface, but it could be used without referring to any streams (and usually would be used that way).

Presumably, there would also be a bounded version (not shown).

Something like the following would be ideal:

package Text_Buffers
   with Preelaborated, Nonblocking, Global => null is
     -- This Global => null requires that a Text_Buffer is a compound object,
     -- and any access types be internal.

   type Text_Buffer is private;

   function Character_Count (Buffer : in Text_Buffer) return Natural;

   procedure Clear (Buffer : in out Text_Buffer);

   procedure Write (Buffer : in out Text_Buffer;
                    Item   : in String)
      with Post => Character_Count (Buffer)'Old + Item'Length = 
                   Character_Count (Buffer);

   procedure Wide_Write (Buffer : in out Text_Buffer;
                         Item   : in Wide_String)
      with Post => Character_Count (Buffer)'Old + Item'Length = 
                   Character_Count (Buffer);
      -- I hate having to use "Wide_xxx" here, but it's necessary if we're
      -- going to allow using literals conviniently with this package.

   procedure Wide_Wide_Write (Buffer : in out Text_Buffer;
                              Item   : in Wide_Wide_String)
      with Post => Character_Count (Buffer)'Old + Item'Length = 
                   Character_Count (Buffer);

   procedure Read (Buffer : in out Text_Buffer;
                   Item   : in out String;
                   Length : in out Natural)
      with Post => Character_Count (Buffer)'Old - Length = 
                   Character_Count (Buffer);

   function Read (Buffer : in out Text_Buffer) return String
      with Post => Character_Count (Buffer)'Old = Read'Length and then
                   Character_Count (Buffer) = 0;

   -- Same for Wide_String and Wide_Wide_String.
   -- We also could have UTF8 and UTF16 routines (possibly in a child,
   -- to avoid dragging in Ada.Strings.Encoding unless needed).
   -- Similarly, we could support a streaming interface:

   procedure Read_into_Buffer(
      Stream : not null access Ada.Streams.Root_Stream_Type'Class;
      Buffer : in out Text_Buffer;
      Length : in Natural)
      with Post => Character_Count (Buffer)'Old + Length = 
                   Character_Count (Buffer)
           Nonblocking => False, Global => in out all;       
      -- Read length Wide_Wide_Characters from Stream.

   procedure Write_from_Buffer(
      Stream : not null access Ada.Streams.Root_Stream_Type'Class;
      Buffer : in out Text_Buffer;
      Length : in Natural := 0)
      with Post => (if Length /= 0 then
                       Character_Count (Buffer)'Old - Length = 
                       Character_Count (Buffer)
                    else  
                       Character_Count (Buffer) = 0)     
           Nonblocking => False, Global => in out all;       
      -- Write length Wide_Wide_Characters to Stream. If Length is 0, 
      -- write all of the characters in the buffer to the stream.
end Text_Buffers;

Put_Image would then be modified to take a Text_Buffer rather than a Stream:

procedure S'Put_Image
  (Buffer : in out Text_Buffer'Class;
   Arg    : in T);

godunko · 2019-07-10T11:31:02Z

Dear ARG,

I'm absolutely skeptical, but thanks to Maxim, who pushed me again to try to share my experience ;)

I would like warn you again, please change your mind and forgot about Character/String (Wide/Wide_Wide forms) if you would like to create something useful in real world localized/internationalized/globalized applications. For such applications Character is meaningless. String is meaningless too. Text has two important properties: (1) low level representation and (2) high level access/processing API. Low level is all kinds of encoding/character sets. High level allows to process some "atomic" items. They may be Unicode code points, grapheme clusters, words, paragraphs, etc.

Also, when application needs to work with streams - language should provide necessary abstraction. Stream may be anything: text buffer, network socket, serial port. If it is text stream - necessary encoder/decoder should be connected between "raw" stream and application.

reznikmm · 2019-07-21T08:38:07Z

@sttaft

According to many object-oriented design practices, like SOLID, it's better to have clean text writing interface here instead of specify implementation for all occasions.

Why user authoring Image attribute should see Character_Count, Clear, Read subprograms? Why does she need reading from Root_Stream_Type and writing to it?

I agree with @godunko. Ability to writing String raises questions "what encoding is used in this string (Latin-1/UTF8" and "how to read from the stream if the encoding is unknown in advance".

We already have String in Ada.Directories and Ada.Environment_Variables that prevents usage of non-ascii characters in file names and environments. Don't do this again.

sttaft · 2019-07-23T15:22:47Z

It would help to see an alternative. Remember we are focused on 'Image for now. Thanks.

raph-amiard · 2019-07-23T23:08:45Z

@reznikmm @godunko To expand on Tucker's answer:

It is clear that Ada so far didn't "get" character encoding right, but this RFC is not the place to fix it.
We want this feature to be part of the next standard, with a "good enough" solution for people who will most likely just put ASCII in their data structures anyway.

So, the question is, do you have a concrete counter proposal, bearing those two points in mind ?

reznikmm · 2019-07-24T10:16:04Z

@raph-amiard Yes, I have. I've written it in RFC.

What is wrong with the rfc?
Do you have ideas how can I make the RFC better?
What points do I miss in the RFC?
Shall I provide some implementation examples?

raph-amiard · 2019-07-24T10:22:19Z

Sorry, you're right ... So if I understand correctly:

You think it's better to provide used-defined 'Image through a Wide_Wide_String stream
Your objection to the Text_Buffer that Tucker proposed is that
1. It contains implementation details (such as read/clear/etc) that breaks encapsulation.
2. It's not clear which encoding it is using.

I completely agree with the objections to Text_Buffer I think ... I see no reason, or even I think it would be potentially dangerous to allow anybody to clear the buffer, and your proposed interface seems cleaner.

However I have one question - as, I must confess, a lot about encoding is still unclear to me. If you only allow Wide_Wide_String, are you sure about the encoding that you're using ? It needs to be utf-32 ?

If that is the case, @sttaft I indeed prefer @reznikmm 's original proposal. It's cleaner and simpler, and allows pretty much everything a user needs.

briot · 2019-07-24T10:36:25Z

However I have one question - as, I must confess, a lot about encoding is still unclear to me. If you only allow Wide_Wide_String, are you sure about the encoding that you're using ? It needs to be utf-32 ?

That’s the point: if you use Wide_Wide_String, there is no encoding (though there is potentially a charset) Basically, each element in your string is a character (a unicode code point in general), so you do not have to decode anything. Whenever you want to output or input such a wide_wide_string, then you need to somehow convert those code points to bytes, and that’s when encoding plays a role. But only at the fringes of your program, in the input/output routines. Once you have the string, the rest of your program doesn’t need to care what encoding was used initially, or will be used in the future. GNATCOLL.XStrings_Impl has a similar approach and is meant to store characters, not bytes. The default instantiation in GNATCOLL.Strings used characters in the range 1-256, so if you want to handle Asian languages for instance you should have a different instance of GNATCOLL.Strings_Impl. (that’s likely what Ada strings should be too, though we have all made the confusion at some point between characters and bytes, and then we no longer know what we are manipulating :-)

yakobowski · 2019-07-24T12:47:02Z

I would tend to agree with Maxim's position: the current AI proposal is too low-level, and expose too many things for no immediate benefit. Strings should be kept "abstract" within the application, and converted only when they are output/stored. Thus, I like this RFC, which goes in the right direction.

For the record, I found the following blog post quite enlightening: https://two-wrongs.com/unicode-strings-in-ada-2012.html

EDIT: I just noticed that the Matreshka library referenced in the post is written by Vadim 😅

reznikmm · 2019-07-24T13:49:12Z

@raph-amiard Right, your understanding is correct. Thank you for summarizing.

Whole Unicode character repertoire fits Wide_Wide_Character, while Wide_Character can be only BMP character and Character is Latin-1. So, yes, Wide_Wide_String doesn't require any encoding by itself. Its interpretation is unambiguous.

reznikmm · 2020-09-02T13:30:46Z

I've wrote a little demo showing how this was implemented in GNAT Community Edition 2020.

sttaft · 2020-09-02T15:44:51Z

From Slack: I believe the model given in http://www.ada-auth.org/cgi-bin/cvsweb.cgi/ai12s/ai12-0384-1.txt?rev=1.2 is pretty clear at this point that a Text_Buffer is a sequence of "logical" characters, and we have addressed the issues relating to what sort of substitutions take place when getting a String after putting a Wide_Wide_String.

Overall, this API is designed to make it convenient for user-defined 'Image functions. Most Ada code uses String rather than Wide_String or Wide_Wide_String, and there might likely already be "Image" functions for a private type that could be re-purposed or slightly revised to implement the 'Put_Image aspect. It would seem an unnecessary burden to force all such code to be revised to use Wide_Wide_String. Nevertheless, to accommodate situations where you do need to use characters outside Standard.Character, we have provided ways to store characters using Wide_String, Wide_Wide_String, UTF-8, and UTF-16.

The justification for providing multiple Put_... operations is that we are trying to make it relatively easy to implement 'Put_Image. So the goal is to simplify the caller's job, at the expense of having more Put routines.

Here is the document comparing the current ARG AI with the prototyping work done by Bob Duff:
https://docs.google.com/document/d/1Plr-Ov6q7_TfpgBmnmwXoaJKFYc_6WOhlj2Yyfb1OFU/edit?usp=sharing

reznikmm · 2020-09-03T11:12:55Z

Thank you, @sttaft for sharing the comparing document and AI12-0384-1. I haven’t seen them. As I understand, ARG decided to make X’Image pretty print the value. This is a much harder task than just printing everything into a string. To do this job well a user should have an appropriate tool. I would say that a text buffer with an ability to append a string and new lines is not an appropriate tool there. I would consider a tool described in “A prettier printer." paper by Philip Wadler as more appropriate there. In short, he proposes a next API:

A document (and/or a pretty printer) type with methods:
- create an empty document
- append a string or another document
- append new line
- indent with N spaces
- group (most tricky one) – drop all new lines, but only if the result fits a single line
- convert resulting document to a line list

With this API a user creates a document (represented as a tree of documents, actually) concatenating and grouping them together and then (providing max line width) casts it to a list of lines or its string representation (list.join(“\n”)). Implementation cost is O(N).

I have implemented such an engine and found it rather handy and powerful. I was even able to write an Ada pretty printer library capable of generating cute Ada source code. Here is an example of pretty printing begin null; end;. Depending on line width it will take one or 3 lines.

with Ada.Wide_Wide_Text_IO;
with League.Pretty_Printers; use League.Pretty_Printers;

procedure Test is
   Printer : aliased League.Pretty_Printers.Printer;
   Doc : Document;
begin
   Doc := Printer.New_Document
     .Put ("begin")
     .Append
       (Printer.New_Document
        .New_Line
        .Put ("null;")
        .Nest (3))
     .New_Line
     .Put ("end;");
   
   Ada.Wide_Wide_Text_IO.Put_Line
     (Printer.Pretty (80, Doc).To_Wide_Wide_String);
end Test;

Of course it requires dynamic memory allocations, but for restricted environments the compiler could provide an implementation that ignores all Group commands or New_Line commands, so reducing pretty printing to just a concatenation into a string (I hope). Having such API would help users much more than String/Wide_String conversion routines.

To be more concrete I provide a draft specification:

package Ada.Pretty_Printers is

   type Pretty_Printer is tagged limited private;
   --  A pretty printer object. It owns all documents created by
   --  New_Document and their modifications. Destroying a Pretty_Printer
   --  will invalidate all its documents.
   
   type Document (<>) is private;  --  tagged to enable dot notation???
   --  A document contains some text to pretty print
   
   function New_Document (Self : in out Pretty_Printer'Class) return Document;
   --  Create empty document
   
   procedure Append
     (Self : in out Document;
      Text : Wide_Wide_String);
   --  Append a Text to the document, replace Self with the result.
   --  The Text should not contain line feeds and other control characters.

   procedure Append
     (Self : in out Document;
      Text : Document);
   --  Append another document text to the document, replace Self with the
   --  result. Both documents should be created from the same printer object.
   
   procedure New_Line
     (Self : in out Document;
      Gap  : Wide_Wide_String := " ");
   --  Append a line separator to the document, replace Self with the result.
   --  The Gap will replace line separator if enclosing group operator
   --  collapses lines into a single line.
   
   procedure Nest
     (Self   : in out Document;
      Indent : Natural);
   --  Create a document where each new line appended with Indent spaces.
   --  Replace Self with the result.

   procedure Group (Self : in out Document);
   --  Group all new lines in Self documend to fold them altogether
   
   procedure Print
     (Self     : in out Pretty_Printer'Class;
      Doc      : Document;
      Width    : Positive;
      Put_Line : access procedure (Line : Wide_Wide_String));
   --  Convert an input document to a line list with given prefered
   --  line Width. Feed the list to Put_Line procedure. The document
   --  should belong to the printer.

Corresponding T’Put_Image profile will be:

function Put_Image
  (Printer : in out Pretty_Printer’Class;
   Value   : T) return Document;

Considering AI12-0384-1 I ask you NOT to introduce any UTF-8 strings. Standard defines String as an array of ASCII characters. Putting UTF-8 code units there is a very bad idea. We have endless errors with this in GNAT Studio. We shouldn't encourage users to do this. Until we introduce a new modern String replacement into the Ada Standard, the only reliable representation of international characters is Wide_Wide_String and Wide_Wide_Character. Absent procedures for String/Wide_String in Put_Image interface will let the user know that he is writing in a wrong unreliable way.

If you like, I’ll volontier to write a draft implementation for the GNAT Runtime.

raph-amiard · 2021-01-05T11:21:06Z

@reznikmm Since the ARG has updated corresponding AIs to take this feedback into account, but in a different fashion still, I'll close this AI. Feel free to update/reopen if you think we should make further amendments on top of what the ARG has proposed.

raph-amiard changed the title ~~Add string_stream_in_put_image RFC~~ [RFC] Add string_stream_in_put_image Jun 14, 2019

yannickmoy added the RFC label Jun 14, 2019

Add string_stream_in_put_image RFC

ccde784

reznikmm force-pushed the master branch from cb8e981 to ccde784 Compare June 14, 2019 17:25

raph-amiard assigned swbaird Jun 25, 2019

raph-amiard added the bug Something isn't working label Jun 25, 2019

raph-amiard unassigned swbaird Jan 5, 2021

raph-amiard closed this Jan 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] Add string_stream_in_put_image #17

[RFC] Add string_stream_in_put_image #17

reznikmm commented Jun 13, 2019

raph-amiard commented Jun 17, 2019

sttaft commented Jul 9, 2019

sttaft commented Jul 9, 2019 •

edited by pmderodat

Loading

godunko commented Jul 10, 2019

reznikmm commented Jul 21, 2019

sttaft commented Jul 23, 2019

raph-amiard commented Jul 23, 2019

reznikmm commented Jul 24, 2019

raph-amiard commented Jul 24, 2019

briot commented Jul 24, 2019 via email

yakobowski commented Jul 24, 2019 •

edited

Loading

reznikmm commented Jul 24, 2019

reznikmm commented Sep 2, 2020

sttaft commented Sep 2, 2020 •

edited

Loading

reznikmm commented Sep 3, 2020

raph-amiard commented Jan 5, 2021

[RFC] Add string_stream_in_put_image #17

[RFC] Add string_stream_in_put_image #17

Conversation

reznikmm commented Jun 13, 2019

raph-amiard commented Jun 17, 2019

sttaft commented Jul 9, 2019

sttaft commented Jul 9, 2019 • edited by pmderodat Loading

godunko commented Jul 10, 2019

reznikmm commented Jul 21, 2019

sttaft commented Jul 23, 2019

raph-amiard commented Jul 23, 2019

reznikmm commented Jul 24, 2019

raph-amiard commented Jul 24, 2019

briot commented Jul 24, 2019 via email

yakobowski commented Jul 24, 2019 • edited Loading

reznikmm commented Jul 24, 2019

reznikmm commented Sep 2, 2020

sttaft commented Sep 2, 2020 • edited Loading

reznikmm commented Sep 3, 2020

raph-amiard commented Jan 5, 2021

sttaft commented Jul 9, 2019 •

edited by pmderodat

Loading

yakobowski commented Jul 24, 2019 •

edited

Loading

sttaft commented Sep 2, 2020 •

edited

Loading