Skip to content

Add more allowed return values for relaxed trunc unsigned #144

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 30, 2023

Conversation

ngzhian
Copy link
Member

@ngzhian ngzhian commented Jun 5, 2023

These values show up due to faster algorithms used for relaxed trunc unsigned when hardware does not support single-instruction lowering.

See #140 for more discussions.

@ngzhian
Copy link
Member Author

ngzhian commented Jun 5, 2023

@yurydelendik @Maratyszcza

@yurydelendik
Copy link
Contributor

yurydelendik commented Jun 15, 2023

Nice. The 0x80000000 and 0xfffffffe are now included.

Thanks.

Copy link
Member

@dtig dtig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return values lgtm, could you drop a link for where the either/or syntax was introduced? I remember looking at it briefly, but don't remember where it wasm.

@ngzhian
Copy link
Member Author

ngzhian commented Jun 26, 2023

either/or is a construct introduced by the threads proposal
https://github.com/WebAssembly/threads/blob/main/interpreter/text/lexer.mll#L455
would you prefer to have a comment in the tests referencing the threads proposal?

@dtig
Copy link
Member

dtig commented Jun 27, 2023

Thanks yeah, please add it so its easier to find.

@ngzhian ngzhian merged commit 70f8687 into WebAssembly:main Jun 30, 2023
@ngzhian ngzhian deleted the relaxed-trunc branch June 30, 2023 05:44
@rossberg
Copy link
Member

rossberg commented Nov 1, 2024

This PR was landed without landing #140, which updates the test suite. Moreover, this PR does not actually match the changes to the test suite in #140, see my comments there. The overview has never been updated either.

I'm a bit concerned about all this. The relaxation spec'ed here looks very ad-hoc and seems to cover some rather arbitrary implementation artefacts. It's not clear why these should be simultaneously okay and also exhaustive, or what the driving principle is. At the same time, the test suite seems to be used without the updated tests, so neither does this relaxation seem essential, nor has it been tested in the wild.

As a result, spec and tests are simply inconsistent at the moment, while it's unclear which version is more relevant. That suggests that more discussion is needed, and perhaps this spec change should be rolled back unless that is properly resolved?

@dtig
Copy link
Member

dtig commented Nov 1, 2024

This PR was landed without landing #140, which updates the test suite. Moreover, this PR does not actually match the changes to the test suite in #140, see my comments there. The overview has never been updated either.

I can take a look and update the overview.

I'm a bit concerned about all this. The relaxation spec'ed here looks very ad-hoc and seems to cover some rather arbitrary implementation artefacts. It's not clear why these should be simultaneously okay and also exhaustive, or what the driving principle is. At the same time, the test suite seems to be used without the updated tests, so neither does this relaxation seem essential, nor has it been tested in the wild.

Because neither the threads proposal (where the syntax was used) and nor this proposal were merged previously, so engines have to do something custom to support running these tests anyway. The thing that happened though is that the threads proposal no longer uses the 'EITHER' syntax, and this was definitely wonky to work around to begin with. The spec tests as they are likely need to be refactored to be more intuitive to work with.

As a result, spec and tests are simply inconsistent at the moment, while it's unclear which version is more relevant. That suggests that more discussion is needed, and perhaps this spec change should be rolled back unless that is properly resolved?

Perhaps we could resolve the open bits or pair down the tests in #144 to just cover what's represented in the spec change and then merge that one instead? cc@yurydelendik

@rossberg
Copy link
Member

rossberg commented Nov 2, 2024

@dtig, that sounds okay, but I still wonder whether there should be a more thorough discussion. These changes appear to have happened rather last-minute, without much feedback from even the subgroup. They almost look like just waving through whatever artefacts a couple of hand-selected implementations happened to produce first.

If we consider the additional changes from #140, then the set of behaviours already becomes so numerous and selectively ad-hoc that it would be cleaner and more future-proof to simply say that the result for the edge cases of the trunc instructions could be anything, i.e., is plain non-deterministic. Is there a practical reason why that would not be sufficient?

Re the either syntax: Can you elaborate what the problems were, so I can address them? On the wasm-3.0 branch I hope it works fine, I recently fixed a couple of bugs in its translation to JS, in case that was an issue. The state of the threads proposal is a whole different story. :)

@dtig
Copy link
Member

dtig commented Nov 5, 2024

@dtig, that sounds okay, but I still wonder whether there should be a more thorough discussion. These changes appear to have happened rather last-minute, without much feedback from even the subgroup. They almost look like just waving through whatever artefacts a couple of hand-selected implementations happened to produce first.

Thinking back, the reason I personally thought this was fine to merge was (a) proposal is niche enough that we don't expect several implementations to want to target it. (b) this proposal introduces an inherent tension between the specification and the implementations because the ideal thing for the specification is to underspecify in case of diverging behaviors, but implementations would want to be consistent with each other for portability of applications.

If we consider the additional changes from #140, then the set of behaviours already becomes so numerous and selectively ad-hoc that it would be cleaner and more future-proof to simply say that the result for the edge cases of the trunc instructions could be anything, i.e., is plain non-deterministic. Is there a practical reason why that would not be sufficient?

The practical reason is specific to browser VMs (and maybe can be extended to other implementations too?), and is that they need to be interoperable with each other, so even if the spec is underspecified, implementations should still have the same results on the same hardware. Now whether these tests should be in the spec repository is something we could discuss but we definitely need tests to hedge against generating the wrong (but still allowed by the spec) codegen on a particular hardware.

Re the either syntax: Can you elaborate what the problems were, so I can address them? On the wasm-3.0 branch I hope it works fine, I recently fixed a couple of bugs in its translation to JS, in case that was an issue. The state of the threads proposal is a whole different story. :)

Oh the wonky bit was that the tests imply that the set of allowable values is smaller than it actually is whicch probably is another reinforcing factor for having a broader discussion about what should live in the spec tests. I also remember this not working out of the box, but that might have been because there are slight differences between V8's infra for running spec tests - I can try again with the Wasm 3.0 branch and report back.

@rossberg
Copy link
Member

rossberg commented Nov 6, 2024

@dtig, hm, I'm afraid I'm even more confused now.

Thinking back, the reason I personally thought this was fine to merge was (a) proposal is niche enough that we don't expect several implementations to want to target it.

Once standardised, everybody has to implement a feature according to the specification. How does it matter that it's niche? Are you implying that relaxed SIMD is too niche to be standardised?

(b) this proposal introduces an inherent tension between the specification and the implementations because the ideal thing for the specification is to underspecify in case of diverging behaviors, but implementations would want to be consistent with each other for portability of applications.

What is the purpose of having a specification and standard, other than defining what can be relied upon across all implementations?

If there is a tension, then either (a) the requirement is not realistic, or (b) its specification is not adequate, or (c) implementations are lazy. Either way, something has to be done about it.

The practical reason is specific to browser VMs (and maybe can be extended to other implementations too?), and is that they need to be interoperable with each other

Why would you say that this is only relevant to web engines? Portability is a — if not the — central value proposition of Wasm, and should be considered equally relevant to other ecosystems.

so even if the spec is underspecified, implementations should still have the same results on the same hardware. Now whether these tests should be in the spec repository is something we could discuss but we definitely need tests to hedge against generating the wrong (but still allowed by the spec) codegen on a particular hardware.

It did not occur to me that cross-implementation consistency of relaxed behaviour was an intended requirement. The overview literally says it "depends on the implementation", which suggests the contrary. In fact, my recollection is that we concluded that you cannot even rely on the same implementation producing the same results between multiple runs. And indeed, it's unclear how "hardware-defined" behaviour could even be specified (what is "equal" hardware?), or enforced with tests, or make sense at all given practices like virtualisation.

Regardless, the current spec and the tests don't achieve that either. AFAICS, enumerating all known behaviours doesn't contribute to addressing this requirement. So in that regard, nothing changes, no matter whether spec and tests allowed 4, 8 or 2^128 edge case behaviours.

@dtig
Copy link
Member

dtig commented Nov 8, 2024

We're talking past each other, this is likely a much easier over a high bandwidth discussion. There's a SIMD sungroup meeting scheduled on Nov 22nd, would that work for you?

@dtig, hm, I'm afraid I'm even more confused now.

Thinking back, the reason I personally thought this was fine to merge was (a) proposal is niche enough that we don't expect several implementations to want to target it.

Once standardised, everybody has to implement a feature according to the specification. How does it matter that it's niche? Are you implying that relaxed SIMD is too niche to be standardised?

I'm not implying that relaxed SIMD is too niche to be standardized - I'm implying that it's possible that not all implementations will care about the relaxed semantics or the proposal and fall back to the strict semantics if needed.

(b) this proposal introduces an inherent tension between the specification and the implementations because the ideal thing for the specification is to underspecify in case of diverging behaviors, but implementations would want to be consistent with each other for portability of applications.

What is the purpose of having a specification and standard, other than defining what can be relied upon across all implementations?

If there is a tension, then either (a) the requirement is not realistic, or (b) its specification is not adequate, or (c) implementations are lazy. Either way, something has to be done about it.

The specification yes, but I thought this discussion was focused on the tests - or are you considering them both to be the same? My opinion was always that (b) the specification isn't adequate to explain the nuances of hardware support for this proposal. But after several CG discussions, the vote was in favor of under specifying relaxed operations and I'm unwilling to participate in relitigating that particular discussion.

The practical reason is specific to browser VMs (and maybe can be extended to other implementations too?), and is that they need to be interoperable with each other

Why would you say that this is only relevant to web engines? Portability is a — if not the — central value proposition of Wasm, and should be considered equally relevant to other ecosystems.

Applications will still be portable, because the hardware edge cases aren't something applications rely on (which is also what the goal of this proposal is to relax). But implementations do care about generating the same code on the same hardware.

so even if the spec is underspecified, implementations should still have the same results on the same hardware. Now whether these tests should be in the spec repository is something we could discuss but we definitely need tests to hedge against generating the wrong (but still allowed by the spec) codegen on a particular hardware.

It did not occur to me that cross-implementation consistency of relaxed behaviour was an intended requirement. The overview literally says it "depends on the implementation", which suggests the contrary. In fact, my recollection is that we concluded that you cannot even rely on the same implementation producing the same results between multiple runs. And indeed, it's unclear how "hardware-defined" behaviour could even be specified (what is "equal" hardware?), or enforced with tests, or make sense at all given practices like virtualisation.

Regardless, the current spec and the tests don't achieve that either. AFAICS, enumerating all known behaviours doesn't contribute to addressing this requirement. So in that regard, nothing changes, no matter whether spec and tests allowed 4, 8 or 2^128 edge case behaviours.

@rossberg
Copy link
Member

rossberg commented Nov 8, 2024

I am talking about both tests and specification, and their relation to each other and to implementations.

The problem isn't that the spec is underspecifying, but that it apparently is not underspecifying enough. It is a bug (in either spec or implementations) if the spec allows less than what implementations do in the wild. But judging from #140, such an inconsistency is exactly the situation we are in right now: implementations do something that is not covered by the spec, even after this PR. And that needs fixing.

Hence, either we are willing to force implementations to comply with what is actually specified right now, or we need to relax the spec further. I would expect you'd rather do the latter.

That's why I suggested to conservatively make it ultimately relaxed by just wholly under-specifying the result for the trunc edge cases in question. Because real-world behaviour appears to be diverse enough that any attempt to enumerate all possibilities will never be useful nor exhaustive wrt to possible future implementations (and further relaxing it in the future to bless new implementations would technically be a breaking change, so isn't an option).

One nuance here is that the JS/web specs can impose additional restrictions, and consequently, JS/web tests may reject more behaviours than the core tests. But they can never allow more, as that would obviously be a violation of the core standard. So the kind of solution you seem to be envisioning with moving the more relaxed tests elsewhere does not fix the situation.

There's a SIMD sungroup meeting scheduled on Nov 22nd, would that work for you?

Hm, that's Friday night over here, I'm afraid that doesn't work for me. But I could make any other day.

@ppenzin
Copy link

ppenzin commented Nov 27, 2024

@rossberg we can probably find a slot that works if you want to chat.

Since we are talking about NaNs passed to integer conversion ops, I think relaxing further is probably a reasonable way forward.

@dtig
Copy link
Member

dtig commented Mar 12, 2025

Chatted with @rossberg offline, and we landed on:

  • For relaxed cases where the hardware differences exceed 2-4 well defined cases, the core spec and the tests should under specify/test to be maximally inclusive of future implementations (ex: potential new hardware backend with slightly different semantics)
  • Web based implementations can and should still test edge cases but that can live elsewhere which we could decide - WPT tests are an option
  • one thing that would be good to discuss though is how to document the nuances of the special cases. These could be in notes for the instruction, an appendix, or some user facing documentation like the intrinsics documentation are also a possibility. If the next edition of the subgroup meeting has slots available, I'd like to add this to the agenda so we can resolve it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants