Reducing temporary allocations in CClosure #400
Conversation
Unless you have observed some convincing performance gains, I would vote against such a patch, since the code is now longer and less readable. In fact, it does not even feel correct. For instance, the old code was doing
while the new code does
So you have added an extra call to |
Well, I believe it's more uniform than the previous version and thus easier to read. It's pretty clear that 0-lifting is the identity (and it is actually implemented that way). |
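For readers outside the thread, here is a generic, hypothetical sketch of the trade-off being discussed; the names are made up and this is not the actual CClosure code.

```ocaml
(* Old style: special-case a shift of 0 so that no call is made at all. *)
let old_style liftn k v =
  if k = 0 then v else liftn k v

(* New style: always call the lifting function, relying on the fact
   that lifting by 0 is the identity.  One redundant call, but a more
   uniform shape that is arguably easier to read. *)
let new_style liftn k v = liftn k v
```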
@MatejKosik would it be possible to benchmark this PR? Thanks! |
Is the benchmark code available somewhere? |
LGTM, indeed this is more uniform. I don't believe this introduces a bug. |
@MatejKosik Is pendulum busy these days? If not, it would be good to bench this. |
@maximedenes Currently pendulum is busy testing two of Matej's PRs but there is nothing preventing to add a job to the queue so I just did it for this PR. Output here. BTW I rebased this PR before testing it. The rebase is here: https://github.com/Zimmi48/coq/commits/coq-pr-400 |
FTR Travis build of the rebase is here: https://travis-ci.org/Zimmi48/coq/builds/239057025 |
Ok, so the rebase has green lights from Travis. @silene we did not discuss this PR during the WG. Are you ok with it being merged? |
Performance test results:
|
I haven't really changed my opinion on this patch. On the one side, we manually unroll Array.map for the sake of performances, and on the other side, we explicitly add dead code for the sake of uniformity. We should really make clear what our policy is with respect to performance-critical code like CClosure. |
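As an illustration of what "manually unrolling Array.map" means here, a generic sketch with made-up names (not the actual kernel code):

```ocaml
(* Combinator version: the partial application [f k] is allocated as a
   closure and handed to Array.map. *)
let map_with_combinator f k v = Array.map (f k) v

(* "Manually unrolled" version: the loop is written by hand and [f] is
   applied to both of its arguments at once, so no intermediate closure
   needs to be allocated (assuming [f] takes two arguments). *)
let map_by_hand f k v =
  let n = Array.length v in
  if n = 0 then [||]
  else begin
    let r = Array.make n (f k v.(0)) in
    for i = 1 to n - 1 do r.(i) <- f k v.(i) done;
    r
  end
```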
Let me add that performance testing didn't show any significant gain from this PR, except on the odd-order package, and even there the improvement was not that impressive. |
I'm preaching for my own parish here, but a series of one-percent-ish gains can end up making a big difference. Small streams make big rivers! |
Maybe a way to proceed is to test this patch with Flambda. |
What should be done about this? Is someone willing to do the testing of this patch with Flambda? |
I am willing but I need to find a slot to update my oexp stuff. |
If you add a "needs: benchmark" tag, I promise that at some point I will be able to send all those PRs to my bench queue. But the current ETA is > 1 month. |
FTR, Flambda is likely to have no effect here, because this patch is only about array allocations, a matter over which Flambda does nothing. |
It seems with_stats is only used in cbv after this. |
WG discussion: we rebase, benchmark again, and if no noticeable improvement, we close. |
Please also bench with |
Force-pushed from 6d4a515 to 13b9854
In several cases, an intermediate array was generated only to be mapped over immediately. We now do everything in one pass. As a by-product, this patch also provides a little code refactoring.
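A minimal, hypothetical sketch of the kind of rewrite this commit describes (generic names, not the actual kernel code):

```ocaml
(* Before: build an intermediate array, then immediately map over it,
   so two arrays are allocated. *)
let before g f (v : 'a array) =
  let tmp = Array.map g v in
  Array.map f tmp

(* After: do everything in one pass; only the final array is allocated. *)
let after g f (v : 'a array) = Array.map (fun x -> f (g x)) v
```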
Although intended as a macro, the with_stats function was calling lazy + Lazy.force, which has a non-negligible cost in the implementation, as it allocates and goes through the write barrier. We simply expand the definition twice, breaking the abstraction but gaining in performance.
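A hypothetical sketch of the pattern described in this commit, with made-up names (the real CClosure code differs in its details):

```ocaml
let stats = ref false

(* Before: every measured call goes through a lazy thunk.  Building the
   [lazy] allocates, and forcing it goes through Lazy.force and the
   write barrier. *)
let with_stats (c : 'a Lazy.t) : 'a =
  if !stats then begin
    (* ...update timers/counters here... *)
    Lazy.force c
  end else Lazy.force c

let reduce_before reduce x = with_stats (lazy (reduce x))

(* After: the body of [with_stats] is expanded at the call site, so the
   computation is invoked directly and no thunk is allocated. *)
let reduce_after reduce x =
  if !stats then begin
    (* ...update timers/counters here... *)
    reduce x
  end else reduce x
```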
Force-pushed from 13b9854 to 9ab99e3
Bench running here: https://ci.inria.fr/coq/job/benchmark-part-of-the-branch/365/. @ejgallego How do you bench with flambda? |
I'm not sure anyone has adapted the script to do a Flambda test, but it should be very easy. Steps to build Coq with Flambda:
So indeed, someone should create a Flambda test job in Jenkins with a new field for adding custom flags. By the way, it still bothers me a lot that in our benchmarks we are testing the performance of the OCaml compiler itself a lot. IMHO |
Really? On what files? |
All. While this is a constant factor, it is quite large and may distort the relative analysis. |
Well, I didn't check what |
This is incorrect. |
Indeed, only for the standard library, |
Did you do anything special? In principle, only the stdlib should generate pre-compiled native files. Other developments can get the same behavior by passing the |
Yes never mind, I was under a different setup [that replaces the full |
I think I have already made this mistake in public something like 10 times... |
No worries. I'm aware that these flags are a bit messy. But the short story is that we are not doing anything crazy with the default settings, AFAIK. |
So, if you are curious, what I was doing was taking a library [mathcomp in this case] and compiling it as if its dependencies were calculated globally, together with the stdlib. This is in anticipation of the compositional build system PR for libraries. |
@ppedrot did you bench against flambda? |
@maximedenes Nope, I have no idea how to modify the script running on Jenkins. But the last bench showed essentially no change, I think we can close this PR anyways. |
Ok, thanks for the update, so closing. |
These are two small patches that prevent CClosure from over-allocating for nothing. They also provide a little code factorization as a by-product.
This is an innocuous, trivially sound patch that can only improve performance, and it could be merged quickly, but since it touches the kernel I prefer to advertise it here.