8353500: [s390x] Intrinsify Unsafe::setMemory #24480

Draft: wants to merge 2 commits into master

@offamitkumar (Member) commented Apr 7, 2025

Unsafe::setMemory intrinsic implementation for s390x.

Stub Code:

StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes)
--------------------------------------------------------------------------------
  0x000003ffb04b63c0:   ogrk	%r1,%r2,%r3
  0x000003ffb04b63c4:   nill	%r1,7
  0x000003ffb04b63c8:   je	0x000003ffb04b6410
  0x000003ffb04b63cc:   nill	%r1,3
  0x000003ffb04b63d0:   je	0x000003ffb04b6460
  0x000003ffb04b63d4:   nill	%r1,1
  0x000003ffb04b63d8:   jlh	0x000003ffb04b64a0
  0x000003ffb04b63dc:   risbg	%r4,%r4,48,55,8
  0x000003ffb04b63e2:   risbgz	%r1,%r3,32,63,62
  0x000003ffb04b63e8:   je	0x000003ffb04b6402
  0x000003ffb04b63ec:   nopr
  0x000003ffb04b63ee:   nopr
  0x000003ffb04b63f0:   sth	%r4,0(%r2)
  0x000003ffb04b63f4:   sth	%r4,2(%r2)
  0x000003ffb04b63f8:   agfi	%r2,4
  0x000003ffb04b63fe:   brct	%r1,0x000003ffb04b63f0
  0x000003ffb04b6402:   nilf	%r3,2
  0x000003ffb04b6408:   ber	%r14
  0x000003ffb04b640a:   sth	%r4,0(%r2)
  0x000003ffb04b640e:   br	%r14
  0x000003ffb04b6410:   risbg	%r4,%r4,48,55,8
  0x000003ffb04b6416:   risbg	%r4,%r4,32,47,16
  0x000003ffb04b641c:   risbg	%r4,%r4,0,31,32
  0x000003ffb04b6422:   risbgz	%r1,%r3,32,63,60
  0x000003ffb04b6428:   je	0x000003ffb04b6446
  0x000003ffb04b642c:   nopr
  0x000003ffb04b642e:   nopr
  0x000003ffb04b6430:   stg	%r4,0(%r2)
  0x000003ffb04b6436:   stg	%r4,8(%r2)
  0x000003ffb04b643c:   agfi	%r2,16
  0x000003ffb04b6442:   brct	%r1,0x000003ffb04b6430
  0x000003ffb04b6446:   nilf	%r3,8
  0x000003ffb04b644c:   ber	%r14
  0x000003ffb04b644e:   stg	%r4,0(%r2)
  0x000003ffb04b6454:   br	%r14
  0x000003ffb04b6456:   nopr
  0x000003ffb04b6458:   nopr
  0x000003ffb04b645a:   nopr
  0x000003ffb04b645c:   nopr
  0x000003ffb04b645e:   nopr
  0x000003ffb04b6460:   risbg	%r4,%r4,48,55,8
  0x000003ffb04b6466:   risbg	%r4,%r4,32,47,16
  0x000003ffb04b646c:   risbgz	%r1,%r3,32,63,61
  0x000003ffb04b6472:   je	0x000003ffb04b6492
  0x000003ffb04b6476:   nopr
  0x000003ffb04b6478:   nopr
  0x000003ffb04b647a:   nopr
  0x000003ffb04b647c:   nopr
  0x000003ffb04b647e:   nopr
  0x000003ffb04b6480:   st	%r4,0(%r2)
  0x000003ffb04b6484:   st	%r4,4(%r2)
  0x000003ffb04b6488:   agfi	%r2,8
  0x000003ffb04b648e:   brct	%r1,0x000003ffb04b6480
  0x000003ffb04b6492:   nilf	%r3,4
  0x000003ffb04b6498:   ber	%r14
  0x000003ffb04b649a:   st	%r4,0(%r2)
  0x000003ffb04b649e:   br	%r14
  0x000003ffb04b64a0:   risbgz	%r1,%r3,32,63,63
  0x000003ffb04b64a6:   je	0x000003ffb04b64c2
  0x000003ffb04b64aa:   nopr
  0x000003ffb04b64ac:   nopr
  0x000003ffb04b64ae:   nopr
  0x000003ffb04b64b0:   stc	%r4,0(%r2)
  0x000003ffb04b64b4:   stc	%r4,1(%r2)
  0x000003ffb04b64b8:   agfi	%r2,2
  0x000003ffb04b64be:   brct	%r1,0x000003ffb04b64b0
  0x000003ffb04b64c2:   nilf	%r3,1
  0x000003ffb04b64c8:   ber	%r14
  0x000003ffb04b64ca:   stc	%r4,0(%r2)
  0x000003ffb04b64ce:   br	%r14
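For readers less familiar with s390x, the control flow of the stub above can be modelled in portable C++. This is a sketch inferred from the disassembly, not code from the patch: the names `pick_elem_size` and `set_memory_model` are hypothetical, and `memcpy` stands in for the element-wide stores. It does not reproduce the per-element atomicity that the stub gets from single-instruction `stc`/`sth`/`st`/`stg` stores. The stub ORs `dest` and `size` to pick the widest element size that divides both, replicates the fill byte with `risbg`, stores two elements per loop iteration, and finishes with at most one tail element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the unsafe_setmemory stub's control flow (not HotSpot code).
// Picks the widest element size (8, 4, 2 or 1) that divides both the destination
// address and the byte count, mirroring the ogrk/nill/branch dispatch.
std::size_t pick_elem_size(std::uintptr_t dest, std::size_t size) {
  const std::uintptr_t bits = dest | size;  // ogrk %r1,%r2,%r3
  if ((bits & 7) == 0) return 8;            // nill %r1,7 ; je -> 8-byte loop
  if ((bits & 3) == 0) return 4;            // nill %r1,3 ; je -> 4-byte loop
  if ((bits & 1) != 0) return 1;            // nill %r1,1 ; jlh -> byte loop
  return 2;                                 // fall through -> 2-byte loop
}

void set_memory_model(unsigned char* dest, std::size_t size, unsigned char value) {
  assert(dest != nullptr || size == 0);
  const std::size_t elem = pick_elem_size(reinterpret_cast<std::uintptr_t>(dest), size);

  // Replicate the fill byte across a 64-bit pattern (the stub uses risbg for this).
  // All bytes are equal, so copying the low-address bytes is endianness-agnostic.
  std::uint64_t pattern = value;
  pattern |= pattern << 8;
  pattern |= pattern << 16;
  pattern |= pattern << 32;

  // Main loop: two elements per iteration, size / (2 * elem) iterations.
  for (std::size_t n = size / (2 * elem); n != 0; --n) {
    std::memcpy(dest, &pattern, elem);
    std::memcpy(dest + elem, &pattern, elem);
    dest += 2 * elem;
  }
  // Tail: one more element if the `elem` bit of size is set.
  if ((size & elem) != 0) {
    std::memcpy(dest, &pattern, elem);
  }
}
```

For example, an 8-byte-aligned destination with `size == 16` takes the 8-byte path with one loop iteration and no tail, while an odd destination always ends up in the byte loop regardless of `size`.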

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8353500: [s390x] Intrinsify Unsafe::setMemory (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24480/head:pull/24480
$ git checkout pull/24480

Update a local copy of the PR:
$ git checkout pull/24480
$ git pull https://git.openjdk.org/jdk.git pull/24480/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24480

View PR using the GUI difftool:
$ git pr show -t 24480

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24480.diff

@bridgekeeper bot commented Apr 7, 2025

👋 Welcome back amitkumar! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk bot commented Apr 7, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot changed the title 8353500 8353500: [s390x] Intrinsify Unsafe::setMemory Apr 7, 2025
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 7, 2025
@openjdk bot commented Apr 7, 2025

@offamitkumar The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge bot commented Apr 7, 2025

Webrevs

@offamitkumar (Member, Author) commented:

With the patch:

Benchmark                       (aligned)  (size)  Mode  Cnt   Score   Error  Units
MemorySegmentZeroUnsafe.panama       true       1  avgt   30   2.351 ± 0.015  ns/op
MemorySegmentZeroUnsafe.panama       true       2  avgt   30   2.655 ± 0.020  ns/op
MemorySegmentZeroUnsafe.panama       true       3  avgt   30   2.614 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama       true       4  avgt   30   2.783 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama       true       5  avgt   30   2.760 ± 0.014  ns/op
MemorySegmentZeroUnsafe.panama       true       6  avgt   30   2.891 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true       7  avgt   30   2.697 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama       true       8  avgt   30   2.769 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama       true      15  avgt   30   3.689 ± 0.016  ns/op
MemorySegmentZeroUnsafe.panama       true      16  avgt   30   3.127 ± 0.009  ns/op
MemorySegmentZeroUnsafe.panama       true      63  avgt   30  15.900 ± 0.046  ns/op
MemorySegmentZeroUnsafe.panama       true      64  avgt   30   4.140 ± 0.057  ns/op
MemorySegmentZeroUnsafe.panama       true     255  avgt   30  53.748 ± 0.872  ns/op
MemorySegmentZeroUnsafe.panama       true     256  avgt   30   9.245 ± 0.013  ns/op
MemorySegmentZeroUnsafe.panama      false       1  avgt   30   2.346 ± 0.020  ns/op
MemorySegmentZeroUnsafe.panama      false       2  avgt   30   2.647 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama      false       3  avgt   30   2.617 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       4  avgt   30   2.786 ± 0.008  ns/op
MemorySegmentZeroUnsafe.panama      false       5  avgt   30   2.755 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false       6  avgt   30   2.892 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama      false       7  avgt   30   2.699 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       8  avgt   30   2.765 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false      15  avgt   30   3.691 ± 0.015  ns/op
MemorySegmentZeroUnsafe.panama      false      16  avgt   30   3.175 ± 0.053  ns/op
MemorySegmentZeroUnsafe.panama      false      63  avgt   30  15.892 ± 0.028  ns/op
MemorySegmentZeroUnsafe.panama      false      64  avgt   30  15.122 ± 0.347  ns/op
MemorySegmentZeroUnsafe.panama      false     255  avgt   30  53.588 ± 0.315  ns/op
MemorySegmentZeroUnsafe.panama      false     256  avgt   30  52.775 ± 0.169  ns/op
MemorySegmentZeroUnsafe.unsafe       true       1  avgt   30   2.333 ± 0.216  ns/op
MemorySegmentZeroUnsafe.unsafe       true       2  avgt   30   1.878 ± 0.092  ns/op
MemorySegmentZeroUnsafe.unsafe       true       3  avgt   30   2.301 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe       true       4  avgt   30   2.400 ± 0.201  ns/op
MemorySegmentZeroUnsafe.unsafe       true       5  avgt   30   2.666 ± 0.052  ns/op
MemorySegmentZeroUnsafe.unsafe       true       6  avgt   30   2.209 ± 0.084  ns/op
MemorySegmentZeroUnsafe.unsafe       true       7  avgt   30   3.086 ± 0.009  ns/op
MemorySegmentZeroUnsafe.unsafe       true       8  avgt   30   2.294 ± 0.217  ns/op
MemorySegmentZeroUnsafe.unsafe       true      15  avgt   30   4.631 ± 0.013  ns/op
MemorySegmentZeroUnsafe.unsafe       true      16  avgt   30   2.164 ± 0.124  ns/op
MemorySegmentZeroUnsafe.unsafe       true      63  avgt   30  13.959 ± 0.042  ns/op
MemorySegmentZeroUnsafe.unsafe       true      64  avgt   30   3.078 ± 0.211  ns/op
MemorySegmentZeroUnsafe.unsafe       true     255  avgt   30  51.435 ± 0.712  ns/op
MemorySegmentZeroUnsafe.unsafe       true     256  avgt   30   7.879 ± 0.140  ns/op
MemorySegmentZeroUnsafe.unsafe      false       1  avgt   30   2.486 ± 0.169  ns/op
MemorySegmentZeroUnsafe.unsafe      false       2  avgt   30   2.163 ± 0.065  ns/op
MemorySegmentZeroUnsafe.unsafe      false       3  avgt   30   2.307 ± 0.011  ns/op
MemorySegmentZeroUnsafe.unsafe      false       4  avgt   30   2.489 ± 0.121  ns/op
MemorySegmentZeroUnsafe.unsafe      false       5  avgt   30   2.653 ± 0.025  ns/op
MemorySegmentZeroUnsafe.unsafe      false       6  avgt   30   2.830 ± 0.161  ns/op
MemorySegmentZeroUnsafe.unsafe      false       7  avgt   30   3.086 ± 0.008  ns/op
MemorySegmentZeroUnsafe.unsafe      false       8  avgt   30   3.124 ± 0.189  ns/op
MemorySegmentZeroUnsafe.unsafe      false      15  avgt   30   4.634 ± 0.015  ns/op
MemorySegmentZeroUnsafe.unsafe      false      16  avgt   30   4.552 ± 0.194  ns/op
MemorySegmentZeroUnsafe.unsafe      false      63  avgt   30  13.977 ± 0.031  ns/op
MemorySegmentZeroUnsafe.unsafe      false      64  avgt   30  14.310 ± 0.177  ns/op
MemorySegmentZeroUnsafe.unsafe      false     255  avgt   30  52.244 ± 1.414  ns/op
MemorySegmentZeroUnsafe.unsafe      false     256  avgt   30  53.824 ± 0.580  ns/op
Finished running test 'micro:java.lang.foreign.MemorySegmentZeroUnsafe'

Without the patch:

Benchmark                       (aligned)  (size)  Mode  Cnt   Score   Error  Units
MemorySegmentZeroUnsafe.panama       true       1  avgt   30   2.368 ± 0.029  ns/op
MemorySegmentZeroUnsafe.panama       true       2  avgt   30   2.647 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama       true       3  avgt   30   2.615 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama       true       4  avgt   30   2.782 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true       5  avgt   30   2.760 ± 0.014  ns/op
MemorySegmentZeroUnsafe.panama       true       6  avgt   30   2.889 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama       true       7  avgt   30   2.702 ± 0.017  ns/op
MemorySegmentZeroUnsafe.panama       true       8  avgt   30   2.766 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true      15  avgt   30   3.748 ± 0.045  ns/op
MemorySegmentZeroUnsafe.panama       true      16  avgt   30   3.122 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama       true      63  avgt   30  24.901 ± 0.106  ns/op
MemorySegmentZeroUnsafe.panama       true      64  avgt   30  20.841 ± 0.154  ns/op
MemorySegmentZeroUnsafe.panama       true     255  avgt   30  24.498 ± 0.233  ns/op
MemorySegmentZeroUnsafe.panama       true     256  avgt   30  24.290 ± 0.050  ns/op
MemorySegmentZeroUnsafe.panama      false       1  avgt   30   2.345 ± 0.012  ns/op
MemorySegmentZeroUnsafe.panama      false       2  avgt   30   2.648 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false       3  avgt   30   2.619 ± 0.008  ns/op
MemorySegmentZeroUnsafe.panama      false       4  avgt   30   2.784 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       5  avgt   30   2.756 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false       6  avgt   30   2.892 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false       7  avgt   30   2.702 ± 0.011  ns/op
MemorySegmentZeroUnsafe.panama      false       8  avgt   30   2.765 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama      false      15  avgt   30   3.702 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false      16  avgt   30   3.121 ± 0.010  ns/op
MemorySegmentZeroUnsafe.panama      false      63  avgt   30  25.130 ± 0.058  ns/op
MemorySegmentZeroUnsafe.panama      false      64  avgt   30  24.891 ± 0.128  ns/op
MemorySegmentZeroUnsafe.panama      false     255  avgt   30  24.385 ± 0.061  ns/op
MemorySegmentZeroUnsafe.panama      false     256  avgt   30  24.444 ± 0.076  ns/op
MemorySegmentZeroUnsafe.unsafe       true       1  avgt   30  19.611 ± 0.495  ns/op
MemorySegmentZeroUnsafe.unsafe       true       2  avgt   30  18.797 ± 0.126  ns/op
MemorySegmentZeroUnsafe.unsafe       true       3  avgt   30  22.808 ± 0.075  ns/op
MemorySegmentZeroUnsafe.unsafe       true       4  avgt   30  18.797 ± 0.047  ns/op
MemorySegmentZeroUnsafe.unsafe       true       5  avgt   30  22.934 ± 0.114  ns/op
MemorySegmentZeroUnsafe.unsafe       true       6  avgt   30  19.580 ± 0.061  ns/op
MemorySegmentZeroUnsafe.unsafe       true       7  avgt   30  22.798 ± 0.063  ns/op
MemorySegmentZeroUnsafe.unsafe       true       8  avgt   30  18.029 ± 0.689  ns/op
MemorySegmentZeroUnsafe.unsafe       true      15  avgt   30  22.736 ± 0.034  ns/op
MemorySegmentZeroUnsafe.unsafe       true      16  avgt   30  17.799 ± 0.276  ns/op
MemorySegmentZeroUnsafe.unsafe       true      63  avgt   30  22.777 ± 0.033  ns/op
MemorySegmentZeroUnsafe.unsafe       true      64  avgt   30  19.271 ± 0.017  ns/op
MemorySegmentZeroUnsafe.unsafe       true     255  avgt   30  22.758 ± 0.068  ns/op
MemorySegmentZeroUnsafe.unsafe       true     256  avgt   30  22.752 ± 0.057  ns/op
MemorySegmentZeroUnsafe.unsafe      false       1  avgt   30  19.115 ± 0.069  ns/op
MemorySegmentZeroUnsafe.unsafe      false       2  avgt   30  22.795 ± 0.067  ns/op
MemorySegmentZeroUnsafe.unsafe      false       3  avgt   30  22.754 ± 0.057  ns/op
MemorySegmentZeroUnsafe.unsafe      false       4  avgt   30  22.797 ± 0.064  ns/op
MemorySegmentZeroUnsafe.unsafe      false       5  avgt   30  22.803 ± 0.078  ns/op
MemorySegmentZeroUnsafe.unsafe      false       6  avgt   30  22.738 ± 0.044  ns/op
MemorySegmentZeroUnsafe.unsafe      false       7  avgt   30  22.815 ± 0.074  ns/op
MemorySegmentZeroUnsafe.unsafe      false       8  avgt   30  22.732 ± 0.026  ns/op
MemorySegmentZeroUnsafe.unsafe      false      15  avgt   30  22.754 ± 0.063  ns/op
MemorySegmentZeroUnsafe.unsafe      false      16  avgt   30  22.743 ± 0.042  ns/op
MemorySegmentZeroUnsafe.unsafe      false      63  avgt   30  23.250 ± 1.193  ns/op
MemorySegmentZeroUnsafe.unsafe      false      64  avgt   30  22.838 ± 0.182  ns/op
MemorySegmentZeroUnsafe.unsafe      false     255  avgt   30  22.748 ± 0.033  ns/op
MemorySegmentZeroUnsafe.unsafe      false     256  avgt   30  22.740 ± 0.039  ns/op
Finished running test 'micro:java.lang.foreign.MemorySegmentZeroUnsafe'
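One way to read the two tables side by side is to divide the old score by the new one for matching rows. The small C++ helper below does that for a few rows copied from the tables above; a ratio above 1 means the patched build is faster. It is only arithmetic on the reported averages and ignores the error columns; the row selection and names are illustrative.

```cpp
#include <cassert>
#include <cstdio>

// Scores (ns/op, lower is better) copied from the two tables above.
struct Row {
  const char* name;
  double before_ns;  // without the patch
  double after_ns;   // with the patch
};

// Ratio > 1.0 means the patched build is faster for that row.
double speedup(const Row& r) {
  assert(r.after_ns > 0.0);
  return r.before_ns / r.after_ns;
}

const Row kRows[] = {
    {"unsafe aligned=true  size=1",   19.611,  2.333},
    {"unsafe aligned=true  size=256", 22.752,  7.879},
    {"unsafe aligned=false size=256", 22.740, 53.824},
    {"panama aligned=true  size=256", 24.290,  9.245},
    {"panama aligned=false size=256", 24.444, 52.775},
};

// Prints one line per row: name and speedup factor.
void print_speedups() {
  for (const Row& r : kRows) {
    std::printf("%-30s %5.2fx\n", r.name, speedup(r));
  }
}
```

For these rows the ratios come out at roughly 8.4x, 2.9x, 0.42x, 2.6x and 0.46x: the aligned cases improve substantially, while both unaligned 256-byte cases are more than twice as slow with the patch as without it.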

__ z_risbg(tmp, size, 32, 128/* risbgz */ + 63, 64 - exact_log2(2 * elem_size), 0); // just do the right shift and set cc
__ z_bre(L_Tail);

__ align(16); // loop alignment
Contributor:

align(32) would be more helpful:

  • The instruction fetch engine fetches octoword (32-byte) bundles.
  • The tight loop is < 32 bytes, so it fits in one bundle and does not cross a cache-line boundary.
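This point can be checked against the stub listing in the PR description. The sketch below uses a hypothetical helper, `crosses_block`, to test whether a loop body straddles a 32-byte octoword boundary; the start addresses and lengths are read off the listing (first store through the end of the 4-byte `brct`). With `align(16)`, the 2-, 8- and 1-byte loops start 16 bytes into an octoword and spill into the next one, while the 4-byte loop happens to start on an octoword boundary.

```cpp
#include <cassert>
#include <cstdint>

// True if the byte range [start, start + len) touches more than one
// `block`-aligned chunk of `block` bytes. `block` must be a power of two.
bool crosses_block(std::uint64_t start, std::uint64_t len, std::uint64_t block) {
  assert(len > 0 && block != 0 && (block & (block - 1)) == 0);
  return (start / block) != ((start + len - 1) / block);
}

// Loop bodies read off the stub listing: address of the first store and
// length through the end of the 4-byte brct.
struct Loop {
  std::uint64_t start;
  std::uint64_t len;
};
const Loop kFill2 = {0x000003ffb04b63f0ULL, 18};  // sth, sth, agfi, brct
const Loop kFill8 = {0x000003ffb04b6430ULL, 22};  // stg, stg, agfi, brct
const Loop kFill4 = {0x000003ffb04b6480ULL, 18};  // st,  st,  agfi, brct
const Loop kFill1 = {0x000003ffb04b64b0ULL, 18};  // stc, stc, agfi, brct
```

With a 32-byte-aligned start, any loop body of at most 32 bytes stays within a single block, which is the property the `align(32)` suggestion relies on.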

// multiple of 2
do_setmemory_atomic_loop(2, dest, size, byteVal, _masm);

__ align(16);
Contributor:

What is this alignment good for?

Contributor:

Branch target alignment. There is no fall-through path from before this point. Should it be 32?
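The cost of such alignment can be read off the stub listing: after the 2-byte `br %r14` at 0x000003ffb04b6454, five 2-byte `nopr` pad up to the branch target at 0x000003ffb04b6460, and because that padding follows an unconditional branch it is never executed. The hypothetical helper below computes the padding for a given alignment; it assumes, as the listing suggests, that padding is emitted as 2-byte `nopr` instructions.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of padding needed to advance `pc` to the next multiple of
// `alignment` (a power of two); 0 if `pc` is already aligned.
std::uint64_t padding_to(std::uint64_t pc, std::uint64_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (alignment - (pc & (alignment - 1))) & (alignment - 1);
}

// Assumption: padding is filled with 2-byte nopr instructions, as in the listing.
std::uint64_t nopr_count(std::uint64_t pc, std::uint64_t alignment) {
  return padding_to(pc, alignment) / 2;
}
```

For the branch target at 0x000003ffb04b6460, `align(32)` would cost nothing extra because that address is already 32-byte aligned. For the padding before the 8-byte loop, which starts at 0x000003ffb04b642c right after a `je`, `align(32)` would grow the padding from 4 to 20 bytes.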

__ z_ogrk(rScratch1, dest, size);

__ z_nill(rScratch1, 7);
__ z_bre(L_fill8Bytes); // branch if 0
Contributor:

Please use z_braz() to reflect the check semantics.

__ z_nill(rScratch1, 3);
__ z_bre(L_fill4Bytes); // branch if 0
Contributor:

See above.

__ z_nill(rScratch1, 1);
__ z_brne(L_fillBytes); // branch if not 0
Contributor:

Please use z_brnaz() to reflect the check semantics.

@@ -415,10 +415,13 @@ inline void Assembler::z_rosbg( Register r1, Register r2, int64_t spos3, int64_t
}
inline void Assembler::z_risbg( Register r1, Register r2, int64_t spos3, int64_t epos4, int64_t nrot5, bool zero_rest) { // Rotate then INS selected bits. -- z196
const int64_t len = 48;
Contributor:

These changes are not necessary if the bool zero_rest parameter is used to control what happens to the untouched destination bits.
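To illustrate the suggestion: RISBG rotates the second operand left, inserts a selected bit range (IBM numbering, bit 0 is the most significant) into the first operand, and, in the `risbgz` form, zeroes all other bits. The patch requests that form by adding 128 to the end position (`128/* risbgz */ + 63`); the existing `zero_rest` parameter can express the same thing directly. The function below is a simplified software model written for illustration, not HotSpot code; it assumes `start <= end` (no wrap-around range) and does not model the condition code.

```cpp
#include <cassert>
#include <cstdint>

// Simplified software model of RISBG (ROTATE THEN INSERT SELECTED BITS), not
// HotSpot code. Bits are numbered 0 (most significant) to 63 (least significant).
// Assumes start <= end (no wrap-around range); the condition code is not modelled.
std::uint64_t risbg_model(std::uint64_t r1, std::uint64_t r2, unsigned start,
                          unsigned end, unsigned rot, bool zero_rest) {
  assert(start <= end && end <= 63);
  rot &= 63;
  const std::uint64_t rotated = rot ? (r2 << rot) | (r2 >> (64 - rot)) : r2;
  const unsigned width = end - start + 1;
  const std::uint64_t mask =
      (width == 64 ? ~0ULL : ((1ULL << width) - 1)) << (63 - end);
  // zero_rest == true corresponds to the risbgz form.
  const std::uint64_t rest = zero_rest ? 0 : (r1 & ~mask);
  return (rotated & mask) | rest;
}

// Byte replication sequence from the listing:
// risbg %r4,%r4,48,55,8 ; risbg %r4,%r4,32,47,16 ; risbg %r4,%r4,0,31,32
std::uint64_t replicate_byte(std::uint64_t v) {
  v = risbg_model(v, v, 48, 55, 8, false);   // low byte -> low 2 bytes
  v = risbg_model(v, v, 32, 47, 16, false);  // low 2 bytes -> low 4 bytes
  v = risbg_model(v, v, 0, 31, 32, false);   // low 4 bytes -> all 8 bytes
  return v;
}
```

For example, `risbg_model(0, size, 32, 63, 62, true)` yields `size / 4`, the iteration count of the 2-byte loop (`risbgz %r1,%r3,32,63,62` in the listing), whereas the replication steps pass `zero_rest = false` so that the bytes built so far survive.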

// inc_counter_np(SharedRuntime::_unsafe_set_memory_ctr);

{
NearLabel L_fill8Bytes, L_fill4Bytes, L_fillBytes, L_exit;
Contributor:

Unused label: L_exit

__ bind(L_Loop);
__ store_sized_value(byteVal, Address(dest, 0), elem_size);
__ store_sized_value(byteVal, Address(dest, elem_size), elem_size);
__ z_agfi(dest, 2 * elem_size);
Contributor:

Why not aghi?
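Some background for the question: AGFI is a 6-byte RIL-format instruction with a 32-bit signed immediate, while AGHI is a 4-byte RI-format instruction with a 16-bit signed immediate. The loop increment is `2 * elem_size`, at most 16, so it fits AGHI's range, and each loop body would shrink by 2 bytes. The sketch below (hypothetical helper names; store and `brct` lengths as seen in the stub listing) computes the loop sizes both ways.

```cpp
#include <cassert>
#include <cstdint>

// Instruction lengths in bytes, by z/Architecture instruction format.
const unsigned kStoreRX = 4;   // st, sth, stc (RX-a)
const unsigned kStoreRXY = 6;  // stg (RXY-a)
const unsigned kAgfi = 6;      // RIL-a, 32-bit signed immediate
const unsigned kAghi = 4;      // RI-a, 16-bit signed immediate
const unsigned kBrct = 4;      // RI-b

// True if `imm` fits aghi's signed 16-bit immediate field.
bool fits_simm16(std::int64_t imm) { return imm >= -32768 && imm <= 32767; }

// Bytes in one loop body: two stores, one pointer increment, one brct.
unsigned loop_bytes(unsigned elem_size, bool use_aghi) {
  assert(elem_size == 1 || elem_size == 2 || elem_size == 4 || elem_size == 8);
  assert(fits_simm16(2 * elem_size));  // the increment always fits aghi
  const unsigned store = (elem_size == 8) ? kStoreRXY : kStoreRX;
  return 2 * store + (use_aghi ? kAghi : kAgfi) + kBrct;
}
```

With AGHI, the 1-, 2- and 4-byte loop bodies would be exactly 16 bytes, so a 16-byte-aligned start would keep each of them within a single 32-byte block; the 8-byte loop would still be 20 bytes.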

@TheRealMDoerr (Contributor) commented:

Since this is taken from #24254, maybe you can review that one, too?

@offamitkumar offamitkumar marked this pull request as draft April 8, 2025 10:05
@openjdk openjdk bot removed the rfr Pull request is ready for review label Apr 8, 2025
Register tmp = Z_R1; // R1 is free at this point

if (elem_size > 1) {
__ rotate_then_insert(byteVal, byteVal, 64 - 2 * 8 , 63 - 8, 8, 0);
@TheRealMDoerr (Contributor) commented Apr 8, 2025:

The last argument appears to be a bool, so the value should rather be false.


__ z_nill(rScratch1, 7);
__ z_braz(L_fill8Bytes); // branch if 0

Contributor:

Extra newline.


3 participants