Skip to content
This repository was archived by the owner on Nov 28, 2017. It is now read-only.

reboot - provide an unsafe faster implementation of json4s-ast? #9

Closed
mdedetrich opened this issue Jul 26, 2015 · 5 comments
Closed

Comments

@mdedetrich
Copy link
Contributor

After speaking with @eed3si9n, there may be impetus for an unsafe json4s-ast which doesn't provide the same correctness guarantees as the current json4s-ast, however will have much better performance.

I personally don't have much issue with this, I suppose the only real contention is going from json4s-ast unsafe to json4s-ast current will have 2 methods, one that is safe (and will return an Option[JValue]), and another which is unsafe, that will return a JValue

One example that @eed3si9n came up with, is storing the JNumber as a string internally (which is what jawn apprently does). This improves runtime performance, since it doesn't need to convert to a BigDecimal first (which can be costly).

I also have a suspicion that it may be the same reason why JLong/JInt etc etc existed in json4s, however we then get into the issues of side effects (due to possible failure)

We could also just add JLong,JInt etc etc types to our current implementation, but than that brings back problems that we may had earlier which we were trying to remove.

@eed3si9n currently uses https://github.com/non/jawn for this reason (performance), we may look at their AST for inspiration, or maybe even just copy it 😛.

@mdedetrich mdedetrich changed the title reboot - provide an unsafe implementation of json4s-ast? reboot - provide an unsafe faster implementation of json4s-ast? Jul 26, 2015
@mdedetrich
Copy link
Contributor Author

I have pushed a test for the BigDecimal vs String representation, and here are the results.
https://gist.github.com/mdedetrich/6b175693d429525fa277

The runtime overhead appears to be around 30-40% in my trivial test, so there is definitely an overhead, which means there may be a stronger case for an unsafe ast

@rossabaker
Copy link
Contributor

I think the jawn-ast approach is mostly sound, with a couple caveats;

  1. I'm not thrilled about public case classes for the implementations. A JNumber is a first-class JSON concept. DeferLong et al are optimized representations of a JNumber, but I'm a little squeamish about them being a long-term supported API. I would want to hear the case for pattern matching these rather than toDouble type of operations on the abstract JNumber.
  2. There would need to be a BigDecimal-based version, even if it is avoided where possible, for full correctness.

Argonaut's JsonNumber is also worth a look.

@sirthias
Copy link

At first sight I don't really like having a safe and an unsafe AST version side by side.
Rather I'd layer convenience levels on top of each other.
I.e. have an ultra-fast and potentially less convenient AST structure at the base and provide lazy "views" for various kinds of convenience. Main application points: JSON numbers and JSON objects.

@mdedetrich
Copy link
Contributor Author

There are 3 ways to achieve this that I have in mind

  1. Add in a DeferNum/DeferObject to account for lazy access. While this does solve the performance problem, we can no longer guarantee that any instance of JValue is correct. Its entirely possible to have some JValue, and then when you try to serialize/render it, it will blow up in your face.
  2. Similar to what @sirthias had in mind, we would have a "unsafe" json4s-ast, which uses internal datastructures critical for performance. The unsafe version of JValue would then have a "view" or "convert" method, that would produce the safe version of the AST.
  3. Not do anything. This would probably mean libraries like jawn/rapture-json would use their own internal AST designed for performance, and then produce json4s-ast as the final step, i.e. when they have finished all of their streaming/parsing

My personal preference is 2, I have already create an very initial rudimentary implementation here (https://gist.github.com/mdedetrich/874dbac57bff9a93fa8c). Obviously this approach is going from one extreme to another, we have a very unsafe, but fast AST, and then we have not so fast (more accurately, fast unless you need to extract/compose JNumber values all of the time), but you know will always be correct

@mdedetrich
Copy link
Contributor Author

This has been implemented in the latest pull request. Talked with @sirthias at length today on gitter, and long story short, I didn't see a way of implementing a JSON AST that adequately satisfied people while trying to juggle so many requirements.

People have 2 very distinct concepts in mind when they think about a JSON type, hence why we now have 2 distinct libraries

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

No branches or pull requests

3 participants