[FEA] Qualification : Uniquely identify apps when app names are the same #1590
Comments
Logical plan is not directly in the event log, correct. Can we parse out a virtual layer on top of the physical plan that looks like a logical plan? Eventlog1: Eventlog2: The abstraction of the above physical plans should be: Scan tableA, Scan tableB -> Join of tableA+tableB. As long as the abstract layer is the same, we treat them as the same job, IMO.
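One way to picture the proposed virtual layer is a small sketch that walks a simplified physical plan, splices out operators that exist only at the physical level (exchanges, sorts, codegen wrappers), and maps every physical join variant to a generic `Join`. The `PlanNode` class, the operator sets, the example plans, and the rendering format are all hypothetical and illustrative, not the tool's actual code. The operator names are real Spark physical operators, but the lists are not exhaustive.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PlanNode:
    """Hypothetical, simplified physical plan node."""
    name: str
    table: Optional[str] = None
    children: List["PlanNode"] = field(default_factory=list)


# Physical join variants that all represent a logical Join (illustrative list).
JOIN_OPERATORS = {"SortMergeJoin", "BroadcastHashJoin", "ShuffledHashJoin",
                  "BroadcastNestedLoopJoin", "CartesianProduct"}
# Operators with no logical counterpart; their children are spliced into the parent.
PHYSICAL_ONLY = {"Exchange", "BroadcastExchange", "Sort", "WholeStageCodegen",
                 "InputAdapter", "ColumnarToRow", "AQEShuffleRead"}


def abstract(node: PlanNode) -> Tuple[tuple, ...]:
    """Map a physical plan to a tuple of (label, children) pairs."""
    kids = tuple(k for child in node.children for k in abstract(child))
    if node.name in PHYSICAL_ONLY:
        return kids  # drop this node, keep its children
    if node.name == "Scan":
        label = f"Scan {node.table}"
    elif node.name in JOIN_OPERATORS:
        label = "Join"
    else:
        label = node.name  # e.g. Filter and Project keep their names
    return ((label, kids),)


def render(nodes: Tuple[tuple, ...]) -> str:
    """Render the abstract layer as nested text, e.g. Join(Scan a, Scan b)."""
    return ", ".join(f"{label}({render(kids)})" if kids else label
                     for label, kids in nodes)


# Hypothetical event log 1: sort-merge join with shuffles on both sides.
event_log_1_plan = PlanNode("WholeStageCodegen", children=[
    PlanNode("SortMergeJoin", children=[
        PlanNode("Sort", children=[PlanNode("Exchange", children=[PlanNode("Scan", "tableA")])]),
        PlanNode("Sort", children=[PlanNode("Exchange", children=[PlanNode("Scan", "tableB")])]),
    ])])

# Hypothetical event log 2: broadcast hash join over the same tables.
event_log_2_plan = PlanNode("BroadcastHashJoin", children=[
    PlanNode("Scan", "tableA"),
    PlanNode("BroadcastExchange", children=[PlanNode("Scan", "tableB")]),
])

print(render(abstract(event_log_1_plan)))  # Join(Scan tableA, Scan tableB)
print(abstract(event_log_1_plan) == abstract(event_log_2_plan))  # True
```

Both plans collapse to `Join(Scan tableA, Scan tableB)`, so they would be treated as the same job. Child order is preserved, so a join with its sides swapped would yield a different abstraction; a real implementation might need to canonicalize that.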
Thanks @viadea
@leewyang is working on this and his current approach is hashing on
I see. I will discuss more details with @leewyang offline.
That is the reason why I still prefer the logical plan over any type of physical plan.
Proposal scope for the effort to allow for SQL query matching:
@mattahrens the
Yes, that is the intent. Then there can be follow-up work in qualx (or a common utility) to do the actual alignment with hashing logic, etc. Let me know if this sounds reasonable.
The expectation for this issue has changed from uniquely identifying apps based on differing SQL between CPU and GPU runs to just generating downstream JSON-based SQLPlanInfo files for consumption by QualX, which will then parse them and create a normalised version of the plan to compare between the two. Creating a separate issue for this.
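A minimal sketch of the CPU/GPU normalisation step that QualX might perform, assuming each node in the SQLPlanInfo JSON carries `nodeName` and `children` fields as in Spark's `SparkPlanInfo`. It strips a `Gpu` prefix from operator names and splices out row/columnar transitions and codegen wrappers. The `SPLICE_OUT` list, the function names, and the example plans are hypothetical; real CPU and GPU plans can also differ by operator substitutions that this sketch does not handle.

```python
import json
from typing import Any, Dict, List

# Hypothetical list of transition/wrapper operators to splice out; not exhaustive.
SPLICE_OUT = {"ColumnarToRow", "RowToColumnar", "InputAdapter",
              "GpuColumnarToRow", "GpuRowToColumnar", "GpuCoalesceBatches"}


def normalize(node: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the normalised node as a one-element list, or its children if spliced out."""
    children = [n for c in node.get("children", []) for n in normalize(c)]
    name = node["nodeName"]
    # Codegen wrappers typically carry a stage id, e.g. "WholeStageCodegen (1)".
    if name in SPLICE_OUT or name.startswith("WholeStageCodegen"):
        return children
    if name.startswith("Gpu"):
        name = name[len("Gpu"):]  # e.g. "GpuFilter" -> "Filter"
    return [{"nodeName": name, "children": children}]


def plan_signature(plan_json: str) -> str:
    """Canonical JSON string of the normalised plan, suitable for comparing or hashing."""
    return json.dumps(normalize(json.loads(plan_json)), sort_keys=True)


# Hypothetical CPU plan for a simple filtered projection.
cpu_plan = json.dumps({"nodeName": "WholeStageCodegen (1)", "children": [
    {"nodeName": "Project", "children": [
        {"nodeName": "Filter", "children": [
            {"nodeName": "ColumnarToRow", "children": [
                {"nodeName": "Scan parquet tableA", "children": []}]}]}]}]})

# Hypothetical GPU plan for the same query.
gpu_plan = json.dumps({"nodeName": "GpuColumnarToRow", "children": [
    {"nodeName": "GpuProject", "children": [
        {"nodeName": "GpuFilter", "children": [
            {"nodeName": "GpuScan parquet tableA", "children": []}]}]}]})

print(plan_signature(cpu_plan) == plan_signature(gpu_plan))  # True
```

Producing a canonical string (sorted keys) rather than comparing trees directly makes it easy to hash the result later, which fits the follow-up alignment work mentioned above.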
Is your feature request related to a problem? Please describe.
There are scenarios where different application code can have the same app name (with distinct app IDs). In such a scenario, it is hard to know which apps are good candidates to test and how to move the corresponding query to the test environment.
Since the app name is typically the key we group by, we need a different key that helps uniquely identify job runs.
Describe the solution you'd like
Having a hash associated with the SQL plan (the physical plan is readily available in event logs), along with more identifying metadata such as the tables being read, the columns of each table being read, and their durations, will help augment the existing grouping key.
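The augmented grouping key could be sketched as the app name plus a digest over the normalised plans and the table/column reads. `grouping_key` and its input shapes are hypothetical, not an existing API. Inputs are sorted so the order in which SQL executions ran does not change the key. This sketch leaves durations out of the hash because they vary between runs of the same job; they could travel alongside the key as metadata instead.

```python
import hashlib
from typing import Dict, Iterable


def grouping_key(app_name: str,
                 normalized_plans: Iterable[str],
                 columns_read: Dict[str, Iterable[str]]) -> str:
    """Hypothetical composite key: app name + digest of plans and table/column reads."""
    digest = hashlib.sha256()
    # Sort plans so the key does not depend on the order SQL executions ran in.
    for plan in sorted(normalized_plans):
        digest.update(plan.encode("utf-8") + b"\x00")  # NUL separates entries
    digest.update(b"\x01")  # marks the boundary between plans and reads
    # Canonical order for tables and for the columns read from each table.
    for table in sorted(columns_read):
        cols = ",".join(sorted(set(columns_read[table])))
        digest.update(f"{table}:{cols}".encode("utf-8") + b"\x00")
    return f"{app_name}#{digest.hexdigest()[:16]}"


# Two different jobs that happen to share an app name.
etl_join = grouping_key("nightly_etl",
                        ["Join(Scan tableA, Scan tableB)"],
                        {"tableA": ["id", "amount"], "tableB": ["id"]})
etl_agg = grouping_key("nightly_etl",
                       ["Aggregate(Scan tableC)"],
                       {"tableC": ["region"]})
print(etl_join != etl_agg)  # True
```

Keeping the app name as a readable prefix means existing reports grouped by app name stay recognisable, while the digest suffix separates distinct jobs that share that name.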