Skip to content

Turns the standard AST JNumber into a safe constructor #30

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jul 5, 2017

Conversation

mdedetrich
Copy link
Owner

@sirthias @Ichoran @ktoso @fommil This PR disallows JNumber to be called directly via the constructor as a String (there is a fromString method which returns an Option[JNumber] with the regex check).

The regex check has also been removed from all constructors which are guaranteed to be a number, and since JNumber is a final class, all of the boilerplate methods to make it treated as a case class has been defined (unapply/canEqual/productArity/productElement ).

Note that this was only done to the standard AST and not the unsafe one (not sure if it should also be done for unsafe). Note that when combined with #17 (comment) its possible to do some include a bit flag field within JNumber which can track if it gets created as a long/int/double/float so we can make efficient .to converters for different number types.

Let me know if this is looking good, and I can merge/release to M4. I think the only contentious point would be whether these changes make sense in the unsafe AST

@mdedetrich mdedetrich changed the title Turns the standard AST JNumber into a safe constructor, #28 and #22 Turns the standard AST JNumber into a safe constructor Jul 2, 2017
@codecov-io
Copy link

codecov-io commented Jul 2, 2017

Codecov Report

Merging #30 into master will increase coverage by 0.12%.
The diff coverage is 43.9%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #30      +/-   ##
==========================================
+ Coverage   44.56%   44.68%   +0.12%     
==========================================
  Files           5        5              
  Lines         662      687      +25     
  Branches      128      133       +5     
==========================================
+ Hits          295      307      +12     
- Misses        367      380      +13
Impacted Files Coverage Δ
js/src/main/scala/scalajson/ast/JValue.scala 0% <0%> (ø) ⬆️
...s/src/main/scala/scalajson/ast/unsafe/JValue.scala 0% <0%> (ø) ⬆️
shared/src/main/scala/scalajson/ast/package.scala 57.77% <100%> (ø) ⬆️
...m/src/main/scala/scalajson/ast/unsafe/JValue.scala 59.5% <75%> (-0.24%) ⬇️
jvm/src/main/scala/scalajson/ast/JValue.scala 73.39% <77.77%> (+0.39%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c444bf...d75e458. Read the comment docs.

throw new NumberFormatException(value)
}
final class JNumber(private[ast] val underlying: String) extends JValue {
@inline val value: String = underlying
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of having underlying as well as this value?

Copy link
Owner Author

@mdedetrich mdedetrich Jul 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To access the raw string value (you can't access value because it's private and for large numbers you may want to access the String value)

Copy link

@ktoso ktoso Jul 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if only for that reason then a def would make more sense perhaps (not to grow the memory footprint of the object)

Copy link
Owner Author

@mdedetrich mdedetrich Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's why I used@inline, or does this not work?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a val the value has to be stored somewhere. @inline means the content of the method (in this case accessing the field) should be inlined.

It's also generally not a good idea to use @inline on a public API method, since it makes it harder to change the implementation while maintaining backwards compatibility.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, as Greg said. @inline is bad for bin compat, does not change field semantics, may only do so for the method, and here we just want the method.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To access the raw string value (you can't access value because it's private and for large numbers you may want to access the String value).

I don't see the benefit of adding a second value member.
Why not simply call the constructor parameter value and make it a public field?

@sirthias
Copy link

sirthias commented Jul 2, 2017

Why do you prefer the "manual" case class over the actual case class, which is now possible (since 2.11.11 and 2.12.2)?

else
None

def unapply(arg: JNumber): Option[String] = Some(arg.underlying)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be improved via something like this, i.e. it could be done in a way that avoids the allocation here.
Not saying that this should be done, just pointing out the opportunity.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will have a look at this tomorrow

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an end-user standpoint I'm not a fan of having a JNumber.unapply that provides a string. Maybe we should just remove the unapply altogether? What is a user supposed to do when getting this string? Try to parse it as a double? as a BigDecimal? It would be nice if the type system could help out more here.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unapply for JNumber is there for these reasons

  1. Consistency in the AST (everything is a case class, containing a single value with an unapply)
  2. The string can be directly converted to the number you want (i.e. toDouble, toInt) since its always going to be a valid number. This may be improved with Remove the .to[X] converters #17 (comment)

@mdedetrich
Copy link
Owner Author

mdedetrich commented Jul 2, 2017

@sirthias The reason why I used a manual class is because of Scale 2.10.x support. Also there isn't any real difference for the user and we don't have to rely on compiler flags (safer route)

if (!value.matches(jNumberRegex)) {
throw new NumberFormatException(value)
}
final class JNumber(private[ast] val underlying: String) extends JValue {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this constructor be private?

That said, it would be preferable to provide different implementations of JNumber for each of the cases.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

private should be the default.
Making it public should require respective argumentative backup.

Copy link
Owner Author

@mdedetrich mdedetrich Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is private, just for the package ast. This is done to save performance as it allows both the standard ast and unsafe to access the private constructor to save on performance in some cases (bypassing the regex check when not needed). Its still private to anything that is outside of the scalaJSON package.

If we end up removing the unsafe AST or it gets changed, I will definitely change it back to private.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently the constructor is public

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think (barring the manual case class discussion) this would be the preferred solution:

final class JNumber private[ast](val value: String) extends JValue {
  ...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. JNumber private[ast] (...) is the way to go.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I misunderstood you guys, fixing it now

else
None

def unapply(arg: JNumber): Option[String] = Some(arg.underlying)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an end-user standpoint I'm not a fan of having a JNumber.unapply that provides a string. Maybe we should just remove the unapply altogether? What is a user supposed to do when getting this string? Try to parse it as a double? as a BigDecimal? It would be nice if the type system could help out more here.

throw new NumberFormatException(value)
}
final class JNumber(private[ast] val underlying: String) extends JValue {
@inline val value: String = underlying
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a val the value has to be stored somewhere. @inline means the content of the method (in this case accessing the field) should be inlined.

It's also generally not a good idea to use @inline on a public API method, since it makes it harder to change the implementation while maintaining backwards compatibility.

@sirthias
Copy link

sirthias commented Jul 3, 2017

@sirthias The reason why I used a manual class is because of Scale 2.10.x support. Also there isn't any real difference for the user and we don't have to rely on compiler flags (safer route)

Ok, so it comes down to either supporting Scala 2.10 or going for the standard case class.
Let's understand a bit the pros/cons of either choice:

  1. Dropping Scala 2.10 support would be bad because all projects that are still on Scala 2.10 could not depend on ScalaJSON. In order to understand the impact of this it'd be important to understand which projects this are. Do you know of any?

  2. Going for the "manual" case class has a number of (IMHO) important drawbacks:

    a) Asymmetry with all the other model classes. It seems weird that everything is a case class except for JNumber.
    b) Forgoing internal optimizations. AFAIK the compiler treats (final) case classes specially in certain cases (e.g. in pattern matches) and applies a number of internal optimizations, e.g. by removing superfluous allocations (see also the discussion around unapply). The "manual" case class will not be able to benefit from these.
    c) Risks with regard to meta support. Certain tools like shapeless generics or other macros might require an ADT to be modelled from actual cases classes to work as expected. I am quite sure that going down the manual route will hurt us here in some way or another at some point in the future!
    d) Less future proof. Coming developments (e.g. around TASTY) might add even more significance to case classes in unforeseen ways. The asymmetry in the ScalaJSON ADT is not a good thing here either.
    e) We have quite a few more lines of code in ScalaJSON, which makes the code harder to read and maintain. For example, we have the risk that the compiler changes the default desugaring of case classes in subtle ways (between versions) that we won't (immediately) see and follow.
    f) People reading the code will always and automatically look at the code and try to spot the difference to what they think the default desugaring of case classes is, which is hard, because many people will not know the details to well. Overall it's much more confusing that a standard case class.

So, overall I think we need to have very good reasons to not go for the standard case class.
In my eyes, it comes down to fully understanding how important Scala 2.10 support really is.
My guess is that it is of no significance whatsoever, but I might be wrong.

@mdedetrich
Copy link
Owner Author

mdedetrich commented Jul 3, 2017

@sirthias

Dropping Scala 2.10 support would be bad because all projects that are still on Scala 2.10 could not depend on ScalaJSON. In order to understand the impact of this it'd be important to understand which projects this are. Do you know of any?

http4s, apparently they are going to be stuck on 2.10 for a while. Also circe (due to http4s and cats)

Asymmetry with all the other model classes. It seems weird that everything is a case class except for JNumber.

Afaik, the asymmetry is only in the source and not how end users end up using the JNumber class (the only difference at this point is the new keyword, which end-users don't even use because they can't construct the JNumber directly with a string anyways)

Forgoing internal optimizations. AFAIK the compiler treats (final) case classes specially in certain cases (e.g. in pattern matches) and applies a number of internal optimizations, e.g. by removing superfluous allocations (see also the discussion around unapply). The "manual" case class will not be able to benefit from these.

This is probably the most convincing argument, but I am not completely aware of these optimizations. Also I think you can simulate the changes for unapply in a manual class

Risks with regard to meta support. Certain tools like shapeless generics or other macros might require an ADT to be modelled from actual cases classes to work as expected. I am quite sure that going down the manual route will hurt us here in some way or another at some point in the future!

Less future proof. Coming developments (e.g. around TASTY) might add even more significance to case classes in unforeseen ways. The asymmetry in the ScalaJSON ADT is not a good thing here either.

This is true

People reading the code will always and automatically look at the code and try to spot the difference to what they think the default desugaring of case classes is, which is hard, because many people will not know the details to well. Overall it's much more confusing that a standard case class.

This is a documentation issue

So, overall I think we need to have very good reasons to not go for the standard case class.
In my eyes, it comes down to fully understanding how important Scala 2.10 support really is.
My guess is that it is of no significance whatsoever, but I might be wrong.

Unfortunately its very significant (http4s/circe), but there are alternatives

  • Provide the current implementation just for scala 2.10.x. Means more code to maintain (although not a real issue with a library that is meant to rarely change) and use the case class for 2.11.x/2.12.x
  • Just use a case class for the entire codebase and present the Scala 2.10 users with a big warning label that they can access the constructor themselves

Also with these compiler flags that are needed for Scala 2.11.x, I assume its ScalaJSON that only needs to use these flags for compilation (i.e. end users don't need to add these special flags?)

@sirthias
Copy link

sirthias commented Jul 3, 2017

Unfortunately its very significant (http4s/circe), but there are alternatives

  • Provide the current implementation just for scala 2.10.x. Means more code to maintain (although >not a real issue with a library that is meant to rarely change) and use the case class for 2.11.x/2.12.x
  • Just use a case class for the entire codebase and present the Scala 2.10 users with a big > warning label that they can access the constructor themselves

To be honest, even though not great, I'd prefer the first of your listed alternatives over the manual case class way. Paying forever into the future with the drawbacks just for Scala 2.10 support now doesn't appear like a good tradeoff to me.

So how about splitting out a scala210 branch with a manual case class solution for JNumber and a proper case class for JNumber for everyone else?

Also with these compiler flags that are needed for Scala 2.11.x, I assume its ScalaJSON that only needs to use these flags for compilation (i.e. end users don't need to add these special flags?)

Yes, that is also my understanding. This flag only affects the compiler frontend.

}

def fromString(value: String): Option[JNumber] =
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Scala 2.10 out of the way we should be able to simply name this method apply as well, like all the others...

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking of leaving in the fromString method for all versions and aliasing it to apply for Scala 2.10.x and Scala 2.11.x, that way for a codebase that needs to targets all Scala versions you can use fromString

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking of leaving in the fromString method for all versions and aliasing it to apply for Scala 2.10.x and Scala 2.11.x, that way for a codebase that needs to targets all Scala versions you can use fromString

@mdedetrich
Copy link
Owner Author

@sirthias Sounds like a separate branch for 2.10.x support is the way, will work on it tonight

@@ -95,7 +95,7 @@ object JNumber {
*
* @author Matthew de Detrich
*/
final class JNumber(private[ast] val underlying: String) extends JValue {
final class JNumber private[ast] (val underlying: String) extends JValue {
@inline def value: String = underlying
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, with the constructor parameter being a public field, this construct is now completely superfluous.
Or is there some benefit that I am not seeing?

Scala 2.11.x + uses a proper case class
Now using sbt-cross-project
Moved private[ast] to the whole constructor
Removed underlying (not needed with move of private[ast])
@mdedetrich
Copy link
Owner Author

@sirthias @ktoso @Ichoran New PR is up. Lot of stuff has changed, have removed underlying (its now pointless) and there is a build that is specific only for Scala 2.10.

@sirthias I have provided an apply for JNumber("1"): Option[JNumber] but it only works for JVM Scala 2.11.x and Scala 2.12.x (doesn't work for Scala.js, nor does it work for Scala 2.10.x). For this reason, there is a JNumber.fromString method that works on all platforms and all Scala versions

*/
// Due to a restriction in Scala 2.10, we can't use a proper case class along with
// a private constructor, so we have to simulate a case class with a normal class
final class JNumber private[ast] (val value: String) extends JValue {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a copy method?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good pickup, I will add a copy method

* @author Matthew de Detrich
*/
// Due to a restriction in Scala 2.10, we can't use a proper case class along with
// a private constructor, so we have to simulate a case class with a normal class
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side note and for the record: This is not correct. Even in Scala 2.10 case classes can have a private constructor. The issue is that we cannot override/replace the default apply method generated by the compiler even when the constructor itself is marked private, which kind of defeats the purpose of marking the constructor private in the first place.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worded it wrongly, will update the documentation now.

* @author Rex Kerr
* @see https://github.com/Ichoran/kse/blob/master/src/main/scala/jsonal/Jast.scala#L683-L917
*/
private[ast] def numericStringEquals(a: String, b: String): Boolean = {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is crazy!
What's the purpose of this special numericStringEquals?

I think what is missing from the whole effort is a clear statement as to what semantic level the scalajson AST is supposed to target.
The decision of having JNumber wrap a String leads one to believe that the AST is targeting the lowest, closest-to-the-source, level, which can be a completely valid approach.
But then I'd expect that JNumber("0") != JNumber("-0")!

This here appears to try to turn JNumber into a hybrid: Represent the lowest semantic level but try to also uphold some kind of higher-level guarantees. IMHO this is not only confusing but also a setup for weird and hard to understand effects down the road.

We should make a clear decision as to what we want the AST to represent and then model exactly that, not more and not less, with a consistent and no-magic approach.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numericStringEquals is designed to test for actual number equality however there is clearly a bug in it (@Ichoran are you able to help here)?

This is something that people asked for, i.e. they expect that JNumber does proper number equality. So yes its true that this is a weird hybrid, but the reasoning for using String in the first place is because no unbounded real number exists in Scala (else we would be using that which would implement its own proper number equality)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People ask for (and will ask for) many things.
The hard part of this effort is to know which requests to fulfill and which to deny.
Fulfilling all requests is a sure path to disaster.

IMHO there are two choices:

  • Make JNumber represent a parsed number, i.e. lift the level of abstraction from the character level to some higher number-representing data type.
  • Make JNumber a character representation of a number, i.e. without actually raising the level of abstraction. Rather: Merely group a bunch of characters and mark them as representing a number.

From reading the source one gets the impression that scalajson is doing the latter and not the former. The former is possible but a bit more involved. Circe goes down this route and is a good example of how it can be done. Unfortunately this requires the introduction of another number type (BiggerDecimal in circe's case) since there is no readily available one that can properly represent all JSON numbers.

My impression was that scalajson chose to go down the "stay low-level" path exactly to circumvent these kinds of issues. The consequence would have to be that JNumber("0") != JNumber("-0"), which is not only expected from the type setup, it is also consistent with the semantic approach.

I really see no value in going for a strange semantic mix. How would that be useful?
When would I want to rely on character-level equality when I really want some kind of higher-level equality?

The question of whether it's even feasible to properly do equality on arbitrary JSON numbers is a second issue. Is there a proper testsuite for the currently implemented "magic" equality implementation?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will wait for @Ichoran to chime in, I think there was a reason behind it. To be frank, I have zero issues in removing numericStringEquals is it means less work for me to maintain.

@mdedetrich mdedetrich mentioned this pull request Jul 4, 2017
@mdedetrich mdedetrich force-pushed the safeJNumberConstructor branch from 07bc5e9 to 1baab54 Compare July 5, 2017 08:45
@mdedetrich
Copy link
Owner Author

mdedetrich commented Jul 5, 2017

@dwijnand @eed3si9n @ktoso @sirthias @Ichoran @gmethvin I am going to release 1.0.0-M4 now and merge this into master.

@sirthias I have created an issue at #36 for your concern regarding numericStringEquals if you want to talk about it there.

Also the name based extractor optimizations no longer apply, named based extractors only came into Scala 2.11.x, so its irrelevant in the scala210 branch. In the normal branch we use standard case classes which should automatically generate named based extractors by default

@mdedetrich mdedetrich merged commit ba3f4ef into master Jul 5, 2017
@mdedetrich mdedetrich deleted the safeJNumberConstructor branch July 5, 2017 11:55
@mdedetrich mdedetrich mentioned this pull request Jul 5, 2017
mdedetrich added a commit that referenced this pull request Jul 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants