8359965: Enable paired pushp and popp instruction usage for APX enabled CPUs #25889
Just a drive-by comment, as this isn't code I normally have much to do with, but to me it would look a lot cleaner to define push_paired/pop_paired (maybe abbreviating directly to pushp/popp?) rather than passing the boolean.
Hi David (@dholmes-ora), Thanks for the suggestion! Thanks, |
Like @dholmes-ora, I also prefer a new function (in MacroAssembler) instead of flags. Though I like the names The shorter PS: |
@@ -795,6 +795,22 @@ void MacroAssembler::pop_d(XMMRegister r) {
  addptr(rsp, 2 * Interpreter::stackElementSize);
}

void MacroAssembler::push(Register src, bool is_pair) {
  if (is_pair && VM_Version::supports_apx_f()) {
    pushp(src);
What does is_pair signify here? You are just pushing one register. Did you intend to use has_matching_pop?
}

void MacroAssembler::pop(Register dst, bool is_pair) {
  if (is_pair && VM_Version::supports_apx_f()) {
Same as above, new argument suggestion: please use has_matching_push.
I understand your purpose here is to delegate the responsibility for balancing the PPX pair to the user.
For a cleaner interface, I think we can also maintain a RAII-style APXPushPopPairTracker in the stub snippets that use push/pop instruction sequences, and wrap the actual assembler call underneath. The idea here is to catch the balancing error upfront, as PPX is purely a performance hint. Instructions with this hint have the same functional semantics as those without. PPX hints set by the compiler that violate the balancing rule may turn off the PPX optimization, but they will not affect program semantics.
// Note: as posted, pushp/popp/push/pop are called without an assembler
// instance; a compilable version would need a MacroAssembler* to call through.
class APXPushPopPairTracker {
 private:
  int _counter;
 public:
  APXPushPopPairTracker() : _counter(0) {
  }
  ~APXPushPopPairTracker() {
    assert(_counter == 0, "Push/pop pair mismatch");
  }
  void push(Register reg, bool has_matching_pop) {
    if (has_matching_pop && VM_Version::supports_apx_f()) {
      Assembler::pushp(reg);
      incrementCounter();
    } else {
      Assembler::push(reg);
    }
  }
  void pop(Register reg, bool has_matching_push) {
    if (has_matching_push && VM_Version::supports_apx_f()) {
      Assembler::popp(reg);
      decrementCounter();
    } else {
      Assembler::pop(reg);
    }
  }
  void incrementCounter() {
    _counter++;
  }
  void decrementCounter() {
    _counter--;
  }
};
For a cleaner interface, I think we can also maintain a RAII style APXPushPopPairTracker ...
Using the suggested code as a base, Vamsi and I tinkered with the idea some more! Here is what we came up with. This also tracks the correct order of registers being pushed/popped (haven't compiled it, so it might have some syntax bugs).
@dholmes-ora would you mind sharing your opinion? We seem to be making things more complicated, but hopefully in a good way?
Also included a sample usage in a stub.
#define __ _masm->
class PushPopTracker {
 private:
  int _counter;
  MacroAssembler* _masm;
  static const int REGS = 32; // Increase as needed
  int regs[REGS];
 public:
  PushPopTracker(MacroAssembler* masm) : _counter(0), _masm(masm) {}
  ~PushPopTracker() {
    assert(_counter == 0, "Push/pop pair mismatch");
  }
  void push(Register reg) {
    assert(_counter < REGS, "Push/pop overflow");
    regs[_counter++] = reg.encoding();
    if (VM_Version::supports_apx_f()) {
      __ pushp(reg);
    } else {
      __ push(reg);
    }
  }
  void pop(Register reg) {
    assert(_counter > 0, "Push/pop underflow");
    // The most recently pushed register is at index _counter - 1
    assert(regs[_counter - 1] == reg.encoding(), "Push/pop pair mismatch: %d != %d", regs[_counter - 1], reg.encoding());
    _counter--;
    if (VM_Version::supports_apx_f()) {
      __ popp(reg);
    } else {
      __ pop(reg);
    }
  }
};
address StubGenerator::generate_intpoly_montgomeryMult_P256() {
  __ align(CodeEntryAlignment);
  /*...*/
  address start = __ pc();
  __ enter();
  PushPopTracker s(_masm);
  s.push(r12); //1
  s.push(r13); //2
  s.push(r14); //3
#ifdef _WIN64
  s.push(rsi); //4
  s.push(rdi); //5
#endif
  s.push(rbp); //6
  __ movq(rbp, rsp);
  __ andq(rsp, -32);
  __ subptr(rsp, 32);
  // Register Map
  const Register aLimbs = c_rarg0; // c_rarg0: rdi | rcx
  const Register bLimbs = rsi;     // c_rarg1: rsi | rdx
  const Register rLimbs = r8;      // c_rarg2: rdx | r8
  const Register tmp1 = r9;
  const Register tmp2 = r10;
  /*...*/
  __ movq(rsp, rbp);
  s.pop(rbp); //5
#ifdef _WIN64
  s.pop(rdi); //4
  s.pop(rsi); //3
#endif
  s.pop(r14); //2
  s.pop(r13); //1
  s.pop(r12); //0
  __ leave();
  __ ret(0);
  return start;
}
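To see the order check in isolation, here is a minimal sketch that models the tracker without any HotSpot types. Everything in it is hypothetical: OrderedPairTracker, kMaxDepth, and the use of plain ints for registers are stand-ins, and a mismatch is reported through a return value rather than a HotSpot assert so the behaviour can be observed.

```cpp
#include <cassert>

// Hypothetical, HotSpot-free model of the tracker: registers are plain ints.
class OrderedPairTracker {
 public:
  static constexpr int kMaxDepth = 32;

  OrderedPairTracker() : _depth(0), _ok(true) {}

  void push(int reg) {
    assert(_depth < kMaxDepth);  // overflow of the tracking array
    _regs[_depth++] = reg;
  }

  // Returns false if reg is not the most recently pushed register.
  bool pop(int reg) {
    // The top of the stack lives at index _depth - 1.
    if (_depth == 0 || _regs[_depth - 1] != reg) {
      _ok = false;
      return false;
    }
    --_depth;
    return true;
  }

  // True only if every pop matched and nothing is left on the stack.
  bool balanced() const { return _ok && _depth == 0; }

 private:
  int _regs[kMaxDepth];
  int _depth;
  bool _ok;
};

// Usage: a correctly nested sequence passes, a swapped pop order does not.
void example() {
  OrderedPairTracker good;
  good.push(12);
  good.push(13);
  assert(good.pop(13));
  assert(good.pop(12));
  assert(good.balanced());

  OrderedPairTracker bad;
  bad.push(12);
  bad.push(13);
  assert(!bad.pop(12));  // r12 is not on top, r13 is
  assert(!bad.balanced());
}
```

In the real stub generator the mismatch would presumably be an assert at generation time; returning a bool here is only to keep the sketch observable.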
@vamsi-parasa, It's better to make this as a subclass of MacroAssembler in src/hotspot/cpu/x86/macroAssembler_x86.hpp and pass Tracker as an argument to push / pop for a cleaner interface.
I don't think it's possible? Unless I am missing something..
- A subclass has an instance of the base class (i.e. the memory allocation of PushPopTracker would have the MacroAssembler base class with extra fields appended), and MacroAssembler has already been allocated (i.e. you can't tack more fields onto the end of the underlying memory of the existing MacroAssembler..)
- If it's a subclass, there is no reason to pass it as a parameter, because it already would have the parent's instance? Also, the extra parameter to push/pop (flag) was what I had originally objected to? (i.e. I would like push/pop to still just take one register as a parameter..)
- This class is sort of a stripped-down implementation of reference counting; we want the stack-allocated variable (i.e. explicit constructor call) and the implicit destructor calls (i.e. inserted by g++ on all function exits). That is, we must have a stack-allocated variable for it to be deallocated (and the destructor called for the assert check)
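The third point, that a stack-allocated object gets its destructor run on every exit from the generator function, can be illustrated with a small standalone sketch. ScopedBalanceCheck, generate, and the caller-owned balanced_out flag are hypothetical names invented for this illustration; the real tracker would assert instead of writing to a flag.

```cpp
#include <cassert>

// Hypothetical scope guard: on destruction it reports whether pushes and
// pops were balanced, via a flag owned by the caller.
class ScopedBalanceCheck {
 public:
  explicit ScopedBalanceCheck(bool* balanced_out)
      : _depth(0), _out(balanced_out) {}
  // Runs on every exit from the enclosing scope, including early returns.
  ~ScopedBalanceCheck() { *_out = (_depth == 0); }
  void on_push() { ++_depth; }
  void on_pop() { --_depth; }

 private:
  int _depth;
  bool* _out;
};

// A generator function with two C++ return paths.
void generate(bool early_exit, bool* balanced_out) {
  ScopedBalanceCheck check(balanced_out);  // stack-allocated on purpose
  check.on_push();
  if (early_exit) {
    return;  // pop forgotten here; the destructor still runs and reports it
  }
  check.on_pop();
}

void example() {
  bool balanced = false;
  generate(false, &balanced);
  assert(balanced);   // normal path: one push, one pop
  generate(true, &balanced);
  assert(!balanced);  // early return: destructor saw depth 1
}
```

This only covers exits of the C++ generator function, not branches inside the generated assembly.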
Here is an attempt to make it a subclass? And sample usage...
class PushPopTracker : public MacroAssembler {
 private:
  int _counter;
  static const int REGS = 32; // Increase as needed
  int regs[REGS];
 public:
  // MacroAssembler(CodeBuffer* code) is the only constructor?
  PushPopTracker() : MacroAssembler(???), _counter(0) {} //FIXME???
  ~PushPopTracker() {
    assert(_counter == 0, "Push/pop pair mismatch");
  }
  void push(Register reg) {
    assert(_counter < REGS, "Push/pop overflow");
    regs[_counter++] = reg.encoding();
    if (VM_Version::supports_apx_f()) {
      Assembler::pushp(reg);
    } else {
      Assembler::push(reg);
    }
  }
  /*...*/
};
address StubGenerator::generate_intpoly_montgomeryMult_P256() {
  __ align(CodeEntryAlignment);
  /*...*/
  address start = __ pc();
  __ enter();
  PushPopTracker s(???); //FIXME?
  s.push(r12, /* Extra parm? */); //1
Hi Jatin (@jatin-bhateja) and Vlad (@vpaprotsk),
There's one more issue to be considered. The C++ PushPopTracker code will run at stub generation time. There are code blocks which do a single push onto the stack but, due to multiple exit paths, have multiple pops, as illustrated below. Won't this reference-counting approach fail in such a scenario, since the stub code is generated all at once during the stub generation phase?
# begin stack frame
push(r21)
# exit condition 1
pop(r21)
# exit condition 2
pop(r21)
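A minimal sketch of why a generation-time counter trips on this pattern: the generator emits all three instructions in a straight line, regardless of which path the stub takes at run time. GenTimeCounter and depth_after_generating_two_exit_stub are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical count-only tracker, updated while the stub is being generated.
struct GenTimeCounter {
  int depth = 0;
  void push() { ++depth; }
  void pop() { --depth; }
};

// Models generating a stub with one push and a pop on each of two exit paths.
int depth_after_generating_two_exit_stub() {
  GenTimeCounter c;
  c.push();  // begin stack frame: push(r21)
  c.pop();   // exit condition 1:  pop(r21)
  c.pop();   // exit condition 2:  pop(r21)
  return c.depth;
}

void example() {
  // Each run-time path is balanced (one push, one pop), yet the
  // generation-time count ends at -1, so a destructor check of
  // "counter == 0" would report a mismatch that does not exist.
  assert(depth_after_generating_two_exit_stub() == -1);
}
```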
Now that I had my fun writing an array-backed stack.. (and with David's comment too..) I can admit that the point of the entire C++ Tracker class is to 'just' add an assert; it doesn't actually functionally add to the original code, but does add better JIT/stub compile-time checking.
@vamsi-parasa you are right.. if there are ifs and multiple exit paths in the assembler itself, the Tracker won't be able to catch it (multiple exit paths in the generator are just fine though). I was thinking about this problem too last night... a hack/'solution' would be to disable such checking with a default flag in the constructor... 'fairly trivial' but it just adds to the complexity even more. And the assert was the point of the class to begin with... I do think such stubs are rare?
There is some value in improved checking, but enough? Writing stubs is already a 'you should know assembler very well' thing, so those checks only improve things marginally overall? As David says, it's for the compiler folks to decide :)
/label add hotspot-compiler-dev
Seems very complicated to me. Really this is for compiler folk to discuss. And as noted above this "tracker" class only helps where the push/pop are paired in the same scope. Personally I think a "pushp" that is defined to be a "push-paired" when available, else a regular "push", would suffice in terms of API design. But again this is for compiler folk to determine.
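The API shape described here, a paired push that uses the hinted form when available and a regular push otherwise, can be sketched without any HotSpot types. In this minimal sketch MockAssembler, its apx_supported field (standing in for VM_Version::supports_apx_f()), and the push_paired/pop_paired names are all hypothetical; the mock only records mnemonics as strings and is not the PR's implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the assembler: it only records mnemonics.
struct MockAssembler {
  bool apx_supported;                // stands in for VM_Version::supports_apx_f()
  std::vector<std::string> emitted;  // the recorded "instruction stream"

  explicit MockAssembler(bool apx) : apx_supported(apx) {}

  void push(const std::string& reg) { emitted.push_back("push " + reg); }
  void pop(const std::string& reg) { emitted.push_back("pop " + reg); }
  void pushp(const std::string& reg) { emitted.push_back("pushp " + reg); }
  void popp(const std::string& reg) { emitted.push_back("popp " + reg); }

  // Paired variants: hinted form when available, plain form otherwise.
  void push_paired(const std::string& reg) {
    if (apx_supported) pushp(reg); else push(reg);
  }
  void pop_paired(const std::string& reg) {
    if (apx_supported) popp(reg); else pop(reg);
  }
};

// Emits a balanced save/restore of one register and returns the stream.
std::vector<std::string> emit_save_restore(bool apx) {
  MockAssembler masm(apx);
  masm.push_paired("r12");
  masm.pop_paired("r12");
  assert(masm.emitted.size() == 2);  // exactly one push and one pop
  return masm.emitted;
}
```

With this shape the call sites carry no boolean flag; balancing the pairs stays the stub author's responsibility, as it is in the flag-based version.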
The goal of this PR is to enhance the existing x86 assembly stubs using PUSH and POP instructions with paired PUSHP/POPP instructions which are part of Intel APX technology.
In Intel APX, the PUSHP and POPP instructions are modern, compact replacements for the legacy PUSH and POP, designed to work seamlessly with the expanded set of 32 general-purpose registers (R0–R31). Unlike their predecessors, they use the new APX (REX2-based) encoding, enabling more uniform and efficient instruction formats. These instructions improve code density, simplify register access, and are optimized for performance on APX-enabled CPUs.
Pairing PUSHP and POPP in Intel APX provides CPU-level benefits such as more efficient instruction decoding, better stack pointer tracking, and improved register dependency management. Their uniform encoding allows for streamlined execution, reduced pipeline stalls, and potential micro-op fusion, all of which enhance performance and power efficiency. This pairing helps the processor optimize speculative execution and register lifetimes, making code faster and more scalable on modern architectures.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25889/head:pull/25889
$ git checkout pull/25889
Update a local copy of the PR:
$ git checkout pull/25889
$ git pull https://git.openjdk.org/jdk.git pull/25889/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 25889
View PR using the GUI difftool:
$ git pr show -t 25889
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25889.diff