-
Notifications
You must be signed in to change notification settings - Fork 4
Don't merge: static args and hacks #731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ep using concat_where
…on4py into use_precompile_perf_measurements
…on4py into use_precompile_perf_measurements
@@ -20,7 +20,7 @@ | |||
import cProfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these changes are in #729
if __debug__: | ||
log.debug( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
important to have this change for performance measurement in the greenline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you want to compare to fortran openacc or granule:
We noticed that there is an expensive debug print in the blueline with loglevel > 7?
So for performance run with Fortran ICON log level 1
@@ -167,6 +179,9 @@ def time_integration( | |||
second_order_divdamp_factor, | |||
do_prep_adv, | |||
) | |||
|
|||
cuda.runtime.deviceSynchronize() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this syncthread
@@ -145,8 +145,9 @@ def _mch_ch_r04b09_config(): | |||
|
|||
def _jablownoski_Williamson_config(): | |||
icon_run_config = Icon4pyRunConfig( | |||
dtime=datetime.timedelta(seconds=300.0), | |||
dtime=datetime.timedelta(seconds=150.0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adjusted timestep for r2b7, but probably not needed for performance measurement
@@ -132,7 +133,24 @@ def offset_providers(self): | |||
for key, value in self.offset_provider_mapping.items(): | |||
try: | |||
method, *args = value | |||
offset_providers[key] = method(*args) if args else method() | |||
offset_provider = method(*args) if args else method() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the -1 hacks, see also #728
#cpd=[constants.CPD], | ||
#iau_wgt_dyn=[self._config.iau_wgt_dyn], | ||
#is_iau_active=[self._config.is_iau_active], | ||
#limited_area=[self._grid.limited_area], | ||
#iadv_rhotheta=[self._config.iadv_rhotheta], | ||
#igradp_method=[self._config.igradp_method], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if all of these are enabled, then gtfn will error in some transformation (for the mch_dsl experiment), the current selection is a random set, not the minimal set...
xp = data_alloc.import_array_ns(self._backend) | ||
tmp = xp.where(self._metric_state_nonhydro.scaling_factor_for_3d_divdamp.ndarray > 0.0) | ||
print(tmp, flush=True) | ||
self.starting_vertical_index_for_3d_divdamp = xp.min(tmp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is broken, triggered in mch_icon_ch2
xp.where(self._metric_state_nonhydro.scaling_factor_for_3d_divdamp.ndarray > 0.0) | ||
) | ||
xp = data_alloc.import_array_ns(self._backend) | ||
tmp = xp.where(self._metric_state_nonhydro.scaling_factor_for_3d_divdamp.ndarray > 0.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this where is missing 2 arguments? but didn't have time what the code should do...
Mandatory Tests Please make sure you run these tests via comment before you merge!
Optional Tests To run benchmarks you can use:
To run tests and benchmarks with the DaCe backend you can use:
In case your change might affect downstream icon-exclaim, please consider running
For more detailed information please look at CI in the EXCLAIM universe. |
cscs-ci run dace |
Following @havogt 's PR: #731 Added pre-compiled static args to stencils, e.g.: ```python self.calculate_nabla2_for_theta = calculate_nabla2_for_theta.with_backend( self._backend ).compile( vertical_start=[0], vertical_end=[self._grid.num_levels], offset_provider=self._grid.offset_providers, ) ``` --------- Co-authored-by: Edoardo Paone <[email protected]>
This PR should be rebased (to keep the edits in the driver for perf measurements) or closed. |
Following @havogt 's PR: #731 Added pre-compiled static args to stencils, e.g.: ```python self.calculate_nabla2_for_theta = calculate_nabla2_for_theta.with_backend( self._backend ).compile( vertical_start=[0], vertical_end=[self._grid.num_levels], offset_provider=self._grid.offset_providers, ) ``` --------- Co-authored-by: Edoardo Paone <[email protected]>
Consider extracting some of this into separate PRs and possibly merge:
compile
with static args could be extracted into a separate PR and tested, here it's only done for the "combined programs", the individual ones in the dycore are onlycompile
d but not with static args. At some point this worked with the granule, but didn't manage to get it working now.-1
hacks need to be cleaned, see the separate PR Don't merge: hack skip values #728