This repository was archived by the owner on Nov 28, 2017. It is now read-only.

reboot - decide between immutable.Seq vs immutable.Vector for JArray #5

Closed
mdedetrich opened this issue Jul 24, 2015 · 3 comments

Comments

@mdedetrich
Contributor

One of the changes in reboot is the underlying collection behind JArray. Originally it was List, which is far too restrictive: it made near-constant-time lookup in a JArray impossible without a costly conversion to something like Array or Vector.

We have two contenders: Vector and immutable.Seq. immutable.Seq allows the constructor to accept any immutable sequence (including List), whereas Vector forces it to be a Vector.
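To make the two options concrete, here is a minimal sketch of what each constructor signature would accept. The names `JArraySeq` and `JArrayVec` and the cut-down `JValue` hierarchy are hypothetical and exist only for this comparison; they are not the actual json4s definitions.

```scala
import scala.collection.immutable

// Hypothetical, cut-down AST used only to compare the two signatures.
object JArrayShapes {
  sealed trait JValue
  final case class JString(value: String) extends JValue

  // Option A: any immutable sequence is accepted, so the producer picks the collection.
  final case class JArraySeq(value: immutable.Seq[JValue]) extends JValue

  // Option B: only Vector is accepted, so every consumer can rely on indexed access.
  final case class JArrayVec(value: Vector[JValue]) extends JValue

  val fromList: JArraySeq   = JArraySeq(List(JString("rawr")))
  val fromVector: JArraySeq = JArraySeq(Vector(JString("rawr")))
  val vecOnly: JArrayVec    = JArrayVec(Vector(JString("rawr")))
  // JArrayVec(List(JString("rawr"))) is rejected by the compiler unless a conversion is in scope.
}
```

With option A the same JArray type can be backed by a List or a Vector; with option B the backing collection is fixed at compile time.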

Vector is a fantastic general-purpose immutable data structure that provides (almost) constant-time access to any element. However, it's not ideal for some corner cases. Small Lists take up less memory and perform better (assuming you are going to iterate through the whole collection), whereas Vector uses less memory at larger sizes and keeps the almost-constant lookup. Read http://docs.scala-lang.org/overviews/collections/performance-characteristics.html for more info.

As for examples, something like this will work much better with List (the JArrays are really small, random access is actually slightly faster for List at really small sizes, and they will also take up less memory):

{
   "1":[1],
   "2":[1],
   "3":[1],
   "4":[1],
   "5":[1],
    // up to some really large size
   "100000":[100000]
}

Whereas the following will obviously benefit much more from Vector, particularly if you need random access:

[
    ["1"],   
    ["2"],   
    ["3"],   
    ["4"],   
    ["5"],   
    ["6"],
    // up to some really large size
    ["100000"]
]

@rossabaker is leaning towards Vector, particularly if it helps adoption by people like spray; however, all technical issues/differences (as far as I am aware) have been addressed in the reboot pull request. By default, constructing a JArray with the latest version always uses Vector, and there is an implicit conversion from predef.Seq to Vector so you can do stuff like JArray(Seq(JString("rawr")))
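A rough sketch of how such an implicit conversion could look, assuming a Vector-backed JArray. The conversion name `seqToVector` and the cut-down AST are made up for illustration; the actual code in the reboot pull request may differ.

```scala
import scala.language.implicitConversions

// Hypothetical, cut-down AST; not the actual json4s definitions.
object JArrayConversion {
  sealed trait JValue
  final case class JString(value: String) extends JValue
  final case class JArray(value: Vector[JValue]) extends JValue

  // Assumed conversion: copies any Seq of JValues into a Vector (an O(n) copy).
  implicit def seqToVector(seq: scala.collection.Seq[JValue]): Vector[JValue] =
    seq.toVector

  // Compiles because the conversion above is in scope; the stored value is a Vector.
  val arr: JArray = JArray(Seq(JString("rawr")))
}
```

The caller never has to write `.toVector`, but the copy still happens on every construction from a non-Vector sequence.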

@rossabaker
Contributor

immutable.Seq lets the producer choose an appropriate collection when it knows the size and the consumer's use case.

Vector lets the consumer confidently do indexed lookups, and the core library author confidently provide operations to manipulate a JArray with good general performance characteristics. Think appends and updates. These can be worked around with pattern matches and conversions, but it feels much simpler when Vector is the general currency.
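As a small illustration of the append and update point, the sketch below runs the same operations on a Vector and a List using plain standard-library calls. The object name and the Int elements are placeholders; the cost notes in the comments follow the Scala collections performance table linked earlier in this thread.

```scala
// Illustration only: Int elements stand in for JValues.
object AppendUpdate {
  val vec: Vector[Int] = Vector(1, 2, 3)
  val list: List[Int]  = List(1, 2, 3)

  // Vector: append and update are effectively constant time.
  val vecAppended: Vector[Int] = vec :+ 4
  val vecUpdated: Vector[Int]  = vec.updated(1, 20)

  // List: append copies the whole list, update walks to the index (both linear).
  val listAppended: List[Int] = list :+ 4
  val listUpdated: List[Int]  = list.updated(1, 20)

  // The conversion workaround: pay one linear copy, then work with a Vector.
  val converted: Vector[Int] = list.toVector.updated(1, 20)
}
```

The results are equal element for element; only the cost and the resulting collection type differ.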

I see both sides, but I still like the Vector side.

@mdedetrich
Contributor Author

It also comes down to the economics of usage. If the majority of people use the default constructors for JArray, it will almost always be a Vector; the only way to get a List is to specify it explicitly (i.e., you have to write JArray(List(...))). In this sense it kind of works like an override.

You can always match on the collection inside the JArray to check whether it satisfies the type of lookup you are doing, and that's not that hard.
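A minimal sketch of that kind of match, assuming the JArray exposes an immutable.Seq. The helper name `forRandomAccess` is hypothetical: it reuses the collection if it is already indexed and otherwise converts it once.

```scala
import scala.collection.immutable

object LookupMatch {
  // Hypothetical helper: returns a collection that is cheap to index into.
  def forRandomAccess[A](xs: immutable.Seq[A]): immutable.IndexedSeq[A] = xs match {
    case indexed: immutable.IndexedSeq[A] => indexed        // e.g. Vector: reuse as-is
    case other                            => other.toVector // e.g. List: one linear copy
  }

  val fromList   = forRandomAccess(List("a", "b", "c"))
  val fromVector = forRandomAccess(Vector("a", "b", "c"))
}
```

A consumer that only iterates can skip the match entirely; it is only worth doing before repeated indexed lookups.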

A last resort would be to add an annotation that triggers a compiler warning whenever you try to construct a JArray with a List.

@mdedetrich
Contributor Author

JArray in org.json4s.ast is now a Vector. org.json4s.basic.ast should be used if you care about performance/memory usage.


2 participants