reboot - Map and key insertion order #8
Description
Stackoverflow question is here http://stackoverflow.com/questions/31635076/scala-immutable-map-like-datastructure-that-has-constant-effective-constant-look
@eed3si9n raised a potential issue, and that is the fact that JObject
uses a Map
, which means that there is no specified ordering on keys. The JSON
specification says that you shouldn't rely on the object having any specific order, however this does have practical implications.
Quoting what @eed3si9n said in gitter
suppose I want to construct a JSON object { "a": 1, "b": 2 }
and this represents some serialized value
for caching I might just take the SHA1 of it
if I use your Map version of JObject to implement it, the SHA1 could change depending on the weather
so the behavior of JObject becomes non-idempotent (I am not sure if this is the right term)
this is not a what-if scenario as sbt currently uses json4s to serialize case classes via Scala Pickling - https://github.com/sbt/serialization
The problem is, that Scala
stdlib
does not have any immutable map like data structure which preserves key ordering that also has c
or eC
lookup time. There is LinkedMap
, which is a Map
backed by a LinkedList
, however that has Linear lookup time. There is LinkedHashMap
which has a constant lookup time, however its mutable
. There are some solutions
- Roll our own datastructure (which inherits from
Scala
's collection library, so it can be treated to be exactly the same as a normalMap
) which has all of the properties we need - Internally maintain a
Vector
as well as aMap
so we can have some concept of ordering - Just use a
SortedMap
. We still can't guarantee that that the key order will be the same according to insertion, however we can guarantee that the keys will beSorted
, which means that if youSHA-1
twoJObject
's that have the same contents, they will be equal. It also means that when you serialize, we can guarantee that it will always be ordered. EDIT: Default implementation ofSortedMap
is aTreeMap
, which haslog
lookup time. - Converted the
Map
to aSortedMap
by someOrdering
, which means that if you do something like what @eed3si9n above, you would convert yourJObject
'sMap
to aSortedMap
before doing aSHA-1
. The nicest way to implement this is by having somesorted
method on aJValue
, which will replace all internalJObject
s to something likeJObjectSorted
and return aJValueSorted
. You can then serialize aJValueSorted
if you need this sought of functionality.
Taking into account all theoretical/practical/time considerations, I think 4 is most realistic. It fixes @eed3si9n issue indirectly, while still maintaining immutability and c
/eC
lookup time.
EDIT: The implementation for 3 (which uses a SortedMap
), has only one implementation in Scala
's stdlib
, which is a TreeMap
. TreeMap
has a log
lookup time, which is definitely is not as bad as n
, but its not a c
/eC
. Looks like we may need to roll our own data structure, if possible.