Skip to content
This repository was archived by the owner on Nov 28, 2017. It is now read-only.
This repository was archived by the owner on Nov 28, 2017. It is now read-only.

reboot - Map and key insertion order #8

Closed
@mdedetrich

Description

@mdedetrich

Stackoverflow question is here http://stackoverflow.com/questions/31635076/scala-immutable-map-like-datastructure-that-has-constant-effective-constant-look

@eed3si9n raised a potential issue, and that is the fact that JObject uses a Map, which means that there is no specified ordering on keys. The JSON specification says that you shouldn't rely on the object having any specific order, however this does have practical implications.

Quoting what @eed3si9n said in gitter

suppose I want to construct a JSON object { "a": 1, "b": 2 }
and this represents some serialized value
for caching I might just take the SHA1 of it
if I use your Map version of JObject to implement it, the SHA1 could change depending on the weather
so the behavior of JObject becomes non-idempotent (I am not sure if this is the right term)
this is not a what-if scenario as sbt currently uses json4s to serialize case classes via Scala Pickling - https://github.com/sbt/serialization

The problem is, that Scala stdlib does not have any immutable map like data structure which preserves key ordering that also has c or eC lookup time. There is LinkedMap, which is a Map backed by a LinkedList, however that has Linear lookup time. There is LinkedHashMap which has a constant lookup time, however its mutable. There are some solutions

  1. Roll our own datastructure (which inherits from Scala's collection library, so it can be treated to be exactly the same as a normal Map) which has all of the properties we need
  2. Internally maintain a Vector as well as a Map so we can have some concept of ordering
  3. Just use a SortedMap. We still can't guarantee that that the key order will be the same according to insertion, however we can guarantee that the keys will be Sorted, which means that if you SHA-1 two JObject's that have the same contents, they will be equal. It also means that when you serialize, we can guarantee that it will always be ordered. EDIT: Default implementation of SortedMap is a TreeMap, which has log lookup time.
  4. Converted the Map to a SortedMap by some Ordering, which means that if you do something like what @eed3si9n above, you would convert your JObject's Map to a SortedMap before doing a SHA-1. The nicest way to implement this is by having some sorted method on a JValue, which will replace all internal JObjects to something like JObjectSorted and return a JValueSorted. You can then serialize a JValueSorted if you need this sought of functionality.

Taking into account all theoretical/practical/time considerations, I think 4 is most realistic. It fixes @eed3si9n issue indirectly, while still maintaining immutability and c/eC lookup time.

EDIT: The implementation for 3 (which uses a SortedMap), has only one implementation in Scala's stdlib, which is a TreeMap. TreeMap has a log lookup time, which is definitely is not as bad as n, but its not a c/eC. Looks like we may need to roll our own data structure, if possible.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions