
Feature request: quad-double precision. #184
Open
cosurgi opened this issue Jan 17, 2020 · 89 comments

@cosurgi commented Jan 17, 2020

Hi,

as suggested in math #303 I repost this here:

How feasible would it be to integrate quad-double precision? It has 212 bits in the significand and is faster than MPFR.

I have just recently integrated Boost.Multiprecision into YADE, a fairly large numerical computation package.

Float128 isn't enough for my needs; YADE would greatly benefit from quad-double (as it is much faster than MPFR). YADE can also serve as a really large testing ground. I have implemented CGAL numerical traits and Eigen numerical traits to use Multiprecision, and all of them are tested daily in our pipeline - see an example run here, or a more specific example for float128: test (need to scroll up) and check.

@cosurgi (Author) commented Jan 17, 2020

@ckormanyos

How feasible would it be to integrate quad-double precision...

This is a good question. I would combine that also with software support for float128_t for those platforms lacking quadmath. In fact, I would see full cross-platform availability of a float128_t (be it a hardware backend or pure software) as a prerequisite before going for a float256_t.

This would definitely be great, especially if float128 uses the compiler/assembler version when available.

Then there is the topic of design questions.

Extract float128_t/256_t from the same template?
Support other digit splits, etc.
Lots of questions.

Yeah, it definitely depends on the approach. The simplest one is of course to use qd-2.3.22.tar.gz verbatim and only add the necessary multiprecision wrapper. A much more interesting approach is to extract the design used to "quadruple" the precision there, in such a way that quad-float128 would simply be another instantiation of the same design. 512 bits is exactly what I will need in my calculations.
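For reference, a rough sketch of the "pair of floats" idea underlying such designs - the error-free transforms (Knuth's two-sum and an FMA-based two-product) from which a double-double addition can be built. This is purely illustrative and not the qd-2.3.22 API; all names are invented for the sketch and special values (infinities, NaNs) are ignored:

#include <cmath>
#include <utility>

// Exact sum: returns (s, e) with s + e == a + b and s == fl(a + b).
std::pair<double, double> two_sum(double a, double b)
{
  const double s = a + b;
  const double v = s - a;
  const double e = (a - (s - v)) + (b - v);
  return { s, e };
}

// Exact product via FMA: returns (p, e) with p + e == a * b.
std::pair<double, double> two_prod(double a, double b)
{
  const double p = a * b;
  return { p, std::fma(a, b, -p) };
}

// A pair of doubles representing hi + lo, i.e. roughly 2 x 53 bits of precision.
struct dd { double hi; double lo; };

// Simplified double-double addition built from the transforms above
// (no renormalization subtleties or special-value handling).
dd dd_add(const dd& x, const dd& y)
{
  auto [s, e] = two_sum(x.hi, y.hi);
  e += x.lo + y.lo;
  auto [hi, lo] = two_sum(s, e);
  return { hi, lo };
}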

@ckormanyos (Member) commented Jan 17, 2020

A much more interesting approach is to extract the design used to "quadruple" the precision there, in such a way that quad-float128 would simply be another instantiation of the same design

I was thinking more of the binary128 and binary256 formats that are mentioned in IEEE 754:2008. But these would be (at least in their portable and generic form) software-intensive and thus lose the hardware acceleration that is the very thing you seek. So that is not exactly what the original request is about.

The simplest one is of course to use qd-2.3.22.tar.gz verbatim and only add the necessary multiprecision wrapper...

This might unfortunately have licensing inconsistencies, in the sense that using the code directly with the BSL might be problematic.

I think that binary128/256 is an idea we could talk about. But as I mentioned previously, I think there would be several important design considerations. Another idea would be to create a set of popular backends intended to wrap a given type for the "number" template in Boost.Multiprecision. That might be a way to navigate the licensing differences. I mean, you could even wrap the old FORTRAN77 REAL*16 that way.

I think I'd like to continue this discussion and see what possibilities there are.

Kind regards, Chris

@cosurgi (Author) commented Jan 17, 2020

Since binary128/256 would be software-intensive, I would prefer another approach. When hardware support appears in future CPUs, adding such binary support should be fairly easy. As you said, it is not a matter for this thread.

a set of popular backends intended to wrap a given type for the "number" template in Boost.Multiprecision

This solution is much more interesting for me. I think that you have already done that with __float128 and mpfr:: (the file /usr/include/mpreal.h from the debian package libmpfrc++-dev), am I correct?

The idea of using the "quadrupling" method described in that paper is also a worthwhile direction for me, even if it means completely reimplementing it in Boost from scratch. That will take a lot of time, though.

So I could summarise the following:

  1. backend intended to wrap quad-double. The user has to install and configure qd-2.3.22.tar.gz by himself, but then boost would immediately "understand" and wrap it.
  2. "quadrupling" precision approach, a pure boost-only solution.

@ckormanyos (Member) commented Jan 18, 2020

a set of popular backends intended to wrap a given type for the "number" template in Boost.Multiprecision

  1. backend intended to wrap quad-double. The user has to install and configure qd-2.3.22.tar.gz by himself, but then boost would immediately "understand" and wrap it.

Correct. There are already Boost.Multiprecision "float" backend wrappers for __float128/quadmath, GMP/MPIR, MPFR and an "int" wrapper for tommath, and these require that the user environment is able to find and link to the required target-specific lib.

I'm not sure what it means for an existing UDT number library to be important/cool/special enough to be wrapped with a supported backend in Boost? I'm also a bit hesitant to directly use DD/QD prototypes when looking at the licensing of the original work. Nonetheless, DD and QD have been around for a while. In fact, I used them heavily about 10-12 years ago. Back then there was a different DD package with 106 binary digits. Maybe it was the predecessor of the work we are mentioning in this post.

Even though I'm not particularly fond of the "FP hack" required for x86 architecture in these packages, and I'm also not sure exactly how portable DD and QD are, these libs are quite useful.

This idea seems feasible to me.

Kind regards, Chris

@pabristow (Contributor)

The downside of double-double (and, I presume, of quad-double) has been shown to be some uncertainty about epsilon and consequent difficulties in estimating precision and uncertainties.

The Apple/Darwin platforms all show 'different' behaviour in many Boost.Math tests.

So the 106 or 212 significand bits may be over-optimistic. If you value speed more then ...

@ckormanyos (Member)

The downside of double-double (and, I presume, of quad-double) has been shown to be some uncertainty about epsilon and consequent difficulties in estimating precision and uncertainties.

Indeed. Interestingly enough, two of the greater challenges in potentially wrapping DD/QD and higher orders would be getting the types to behave like proper C++ types and figuring out how to mix the function name calls within BSL headers.

These are the two challenges I'm facing in another (unpublished) project that wraps the high precision type of a popular computer algebra system for Boost.Multiprecision.

Still, I think the DD/QD topic might be interesting to pursue in a feasibility study. Perhaps some of the epsilon inconsistencies could be addressed if the concepts of unity, comparison, frexp, ldexp are established more clearly in small add-on functionalities beyond the raw original implementation.

Kind regards, Chris

@jzmaddock (Collaborator)

This is a good idea, and in principle easy enough.

As mentioned above, epsilon is likely to be the main sticking point; there are two possible definitions:

  • A strict interpretation of the standard is to pick the smallest number which can be added to 1 and yield a different result - for these types this is numeric_limits<double>::min() or even denorm_min(). This is the definition that gcc on Apple used, and it's pretty useless for any real work.
  • A loose definition would be 2^(1-B) if we have B bits of precision (53*4 in this case). This is more useful generally, but can still lead to nasty surprises (see the sketch below).
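As a tiny numeric illustration of the gap between the two candidates (just a sketch, assuming the quad-double layout with 4*53 = 212 bits mentioned above):

#include <cmath>
#include <cstdio>
#include <limits>

int main()
{
  // "Strict" candidate: for a multi-component type, 1 + x can differ from 1
  // even for a denormal x, because a low-order component can absorb it.
  const double strict_eps = std::numeric_limits<double>::denorm_min();

  // "Loose" candidate: one ulp of a 212-bit significand, i.e. 2^(1 - 212).
  const double loose_eps = std::ldexp(1.0, 1 - 212);

  std::printf("strict: %g\nloose : %g\n", strict_eps, loose_eps);
}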

BTW, some time back I tried pretty hard to get Boost.Math working with Apple's "double double" and just couldn't get everything working - so much of the reasoning about what happens to a number under operation X breaks for these types that it's a bit of a losing battle. That said, we have since added cpp_dec_float and mpf_float, both of which have hidden guard digits and seem to be working OK. In principle, these types aren't that different, I hope.

@ckormanyos (Member)

How feasible would it be to integrate quad-double precision? It has 212 bits in the significand and is faster than MPFR.

This is a good idea, and in principle easy enough.

Thanks John, for your experienced advice. That's what I was seeking in terms of a kind of nod.

If there are no other blocking points, then I will write this up ASAP as a feasibility study for Boost GSoC 2020. The timing of this user-raised issue is, I believe, perfect for that potential venue.

Kind regards, Chris

@NAThompson (Contributor)

I'm curious about @cosurgi's use case. I feel like the current Boost.Multiprecision covers the intended use well as it stands, with maybe only a factor of 2 available for a performance increase. But if you're using quad-double, well, performance can't be a huge issue anyway.

@cosurgi (Author) commented Jan 18, 2020

@NAThompson even a two-times performance difference means waiting one month versus two months for a result :)

My use case will be quantum mechanics and quantum chromodynamics. Currently I am preparing the YADE source code for all of this. The experimental measurements are in some cases more precise than 20 or even 30 significant digits, so performing high-precision calculations is a big deal here. Since I plan to calculate volumes significantly larger than the Planck length, I have estimated that my target precision is about 150 digits, which is 512 bits, and it has to be as fast as possible. I know it will take years to achieve, but that's the goal.

My current strategy is to make sure that YADE is 100% compatible with Boost.Multiprecision, so that when new, faster and more precise types appear I will be able to benefit from them instantly - and to hope that someday CPUs with a native 512-bit floating-point type will appear :)

@cosurgi (Author) commented Jan 18, 2020

BTW: MPFR is about 10-times slower. The difference in waiting one month and 10 months for a result becomes really meaningful.

@ckormanyos (Member)

MPFR is about 10-times slower. The difference in waiting one month and 10 months for a result...

So let's prefer the 1 month.

@ckormanyos (Member) commented Jan 19, 2020

This is a good idea, and in principle easy enough.
As mentioned above already, epsilon is likely to be the main sticking point, ...

Yes.

BTW some time back I tried pretty hard to get Boost.Math working with apples "double double" and just couldn't get everything working

This comment rang a bell in my mind. So I mounted a drive from 2007. As it turns out, I developed my own version of q_float in March 2005. I had completely forgotten about this. It appears as though my original work used add/sub/mul/div routines originally from a developer known as K. Briggs in doubledouble. But these had subsequently been modified in a file called quad_float, that I had discovered in something at that time called WinNTL-5_3_2, which appears to live on to the present day in newer versions.

I got remarkably far with this code, and am somewhat surprised that I had forgotten about it. If we pursue this topic, I will be sharing my work in a smaller e-mail circle.

In that work, it looks like I simply empirically selected 31 for digits10 and calculated base-2 digits from that. Using this, I selected epsilon to simply be 10^-30. But I had curiously failed to implement any number for max_digits10.

Best regards, Chris

@jzmaddock (Collaborator)

Interesting comments and use cases - thanks for this.

There's one other potential blocker that springs to mind: the special functions are kind of tricky to implement for types with large bit counts, but small exponent ranges. quad_float is perhaps the archetype here :(

There's an open bug for tgamma on this: #150, but it wouldn't surprise me if there are others lurking once you go hunting.

Question: can double_double be doubled? And doubled again etc? In other words could these be implemented via nested templates?

@ckormanyos (Member) commented Jan 19, 2020

The simplest one is of course to use qd-2.3.22.tar.gz verbatim and only add the necessary multiprecision wrapper.

One thing I mentioned about this package is licensing of the code.

I believe I am getting this interpretation right, though I am not yet fully sure... But I also notice that the x86-FPU fix encloses, at the top-level call mechanism, all functions using the package. This means, as is confirmed in the DD/QD documentation, that the code will have a hard time being used with multithreading on x86. I would prefer a package that sets and resets the x86-FPU fix within the individual subroutines that need it, thereby allowing for a synchronization mechanism when trying for multithreading.
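For illustration, a minimal sketch of the kind of scoped set-and-restore guard described here, using glibc's <fpu_control.h> (x86-specific, and only relevant when the code actually runs on the legacy x87 FPU rather than SSE). The class name is invented for this sketch:

#include <fpu_control.h>

class scoped_x87_precision_fix
{
public:
  scoped_x87_precision_fix()
  {
    _FPU_GETCW(saved_);  // remember the current x87 control word

    // Force 53-bit (double) rounding precision instead of 64-bit extended.
    fpu_control_t cw = static_cast<fpu_control_t>((saved_ & ~_FPU_EXTENDED) | _FPU_DOUBLE);
    _FPU_SETCW(cw);
  }

  ~scoped_x87_precision_fix() { _FPU_SETCW(saved_); }  // restore on scope exit

private:
  fpu_control_t saved_ { };
};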

So at present, I am not aware of an existing DD or QD that would fully satisfy my personal requirements. That being said, one could focus on a popular one, but there seem to be several popular ones dating all the way back to the doubledouble package I mentioned above.

Best regards, Chris

@ckormanyos (Member) commented Jan 19, 2020

Question: can double_double be doubled? And doubled again etc? In other words could these be implemented via nested templates?

You read my mind. I think the answers are yes and yes. But I am also wary of such nested templates, having once gotten poor results with an integer type using something like this. That experience was more than 20 years ago, though; I was less experienced at programming C++ back then, and compilers have gotten much (massively much) better at handling nested templates since then.
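Purely as a sketch of the shape such nesting could take (arithmetic, renormalization and numeric_limits are omitted; the names here are invented and not a proposed API):

template <typename T>
struct double_up
{
  T hi { };  // leading part
  T lo { };  // trailing correction, |lo| <= ulp(hi)/2 after renormalization

  constexpr double_up() = default;
  constexpr double_up(T h, T l = T { }) : hi { h }, lo { l } { }

  // Arithmetic would be written once, in terms of T, and reused at every level.
};

using dd_sketch = double_up<double>;             // double-double
using qd_sketch = double_up<double_up<double>>;  // quad-double via nesting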

@jzmaddock (Collaborator)

But I also notice that the x86-FPU fix encloses, at the top-level call mechanism, all functions using the package

Wouldn't that only be required for code executing on the old x87 co-processor, and not x64 code using the SSE registers?

@ckormanyos (Member)

But I also notice that the x86-FPU fix encloses, at the top-level call mechanism, all functions using the package

Wouldn't that only be required for code executing on the old x87 co-processor, and not x64 code using the SSE registers?

Yes. I have also just sent you some work in another thread. In that work, the x86-FPU fix is needed for that, and only that, legacy FPU.

@ckormanyos (Member)

write this up ASAP as a feasibility study for Boost GSoC 2020. The timing of this user-raised issue is, I believe, perfect for that potential venue.

This has been done near the bottom of this page
Boost.Multiprecision: Double-Double and Quad-Double Types.

@jzmaddock (Collaborator)

@cosurgi : There's a very tentative version in draft PR referenced above. I would welcome feedback if you'd like to give it a try and see how far you can get.

There are some issues, and some possibly controversial design choices:

  • I've fixed up add/subtract/divide to handle infinities correctly. However, for the ultimate performance, this is perhaps not such a good idea?
  • The one and only very basic test case has a number of failures. String IO is horribly broken; printing out the value 64 does not result in "64", for example. This is a direct consequence of some of the arithmetic operators - for example, 2^10 does not yield exactly 1024 :(
  • I've done no testing of special functions or Boost.Math support generally.
  • Many more tests are required, although there are likely to be so many failures from our existing boilerplated tests it's hard to know quite how to proceed. Iostream round tripping isn't even worth trying - it will never work correctly IMO.

Anyhow, if you have any real world code you can plug this into, it would be interesting to hear how you get on. It's all a bit too "fast and loose" for me!

@cosurgi (Author) commented Feb 3, 2020

@jzmaddock I'm sorry for the late reply. This looks very interesting. Yes, I have real-world code - YADE - with a full-blown testing suite for math functions; I will try to use it this week or the next.

Just to be sure - should I take this commit?

The issues:

  • YADE heavily depends on correct handling of NaN and Inf. Without them CGAL does not work at all.
  • Ouch: string IO roundtripping is very important for me.
  • I have tests of all math functions; they are tested against Python mpmath. I picked the tolerances by experimenting with the results. That's why string IO is important - strings are the only way to get high-precision numbers into Python.
  • Ouch again. Maybe we could craft something here? Even using cpp_bin_float as a helper type is acceptable for me. String IO was never supposed to be fast. It just should be accurate.

See some example math tests results for MPFR 150 - scroll up to see the math functions being tested by the earlier mentioned script.

@cosurgi (Author) commented Feb 3, 2020

Oh, BTW, if 2^10 is 1023.99999999(9)... then it's acceptable for me, because the error will be smaller than the tolerance level. We only need working round tripping.

@jzmaddock (Collaborator)

Oh, BTW, if 2^10 is 1023.99999999(9)... then it's acceptable for me, because the error will be smaller than the tolerance level. We only need working round tripping.

This is actually the reason that round tripping doesn't work - I mean it's approximately correct (to within an epsilon or so), but without a correctly functioning pow() we basically can't implement correct string output. Actually even the basic arithmetic operations aren't correctly rounded either... as long as string IO is approximately correct (to within an epsilon or so - ie drop one digit and it would round to the "correct" result), wouldn't your tests still function OK?
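For concreteness, this is the kind of round-trip check being discussed. It is only a sketch, written against the cpp_double_double alias and header name that appear later in this thread (the draft PR's naming at the time of this comment may have differed):

#include <boost/multiprecision/cpp_double_fp.hpp>

#include <iomanip>
#include <limits>
#include <sstream>

// Returns true if x survives a text round trip at max_digits10 precision.
template <typename Num>
bool round_trips(const Num& x)
{
  std::stringstream strm { };
  strm << std::setprecision(std::numeric_limits<Num>::max_digits10) << x;

  Num y { };
  strm >> y;

  return x == y;
}

int main()
{
  using dd_type = boost::multiprecision::cpp_double_double;

  // Exact round tripping requires correctly rounded string output,
  // which is exactly what the comment above says is not yet available.
  return round_trips(dd_type { 1 } / 3) ? 0 : 1;
}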

@cosurgi (Author) commented Feb 3, 2020

wouldn't your tests still function OK?

Heh. I suppose that all the math tests would work, but the round-tripping test would fail ;) I suppose I could implement an exception for this one.

@jzmaddock (Collaborator)

Meanwhile... there are a few issues with the current version - mostly order-of-initialization issues in numeric_limits<qd_real>. Give me a day or so and I'll push a fix, as they prevent tgamma and others from working.

@jzmaddock (Collaborator)

Never mind, fix pushed. The C99 functions, including tgamma etc., are now working. Note that they are not quite standard-conforming, in that they don't adhere to the non-normative Annex F, which deals with the handling of infinities and NaNs.

@LegalizeAdulthood

Is there a repo with a GitHub workflow for the tests I can clone?

@ckormanyos (Member)

Is there a repo with a GitHub workflow for the tests I can clone?

Yes, certainly. Again, I think we got pretty close to stability.

  • I'm torn between two worlds, the old double-double and newer implementations.
  • I also forgot to mention: we were getting stuck and tripped up on infs, NaNs and similar edge cases.

So if we can just get stable operations and edge-case handling, this thing will be done. The formal testing and plug-in to Multiprecision was/is about done, but I need to get the branch and see if it still plays with the Boost develop branch.

@ckormanyos (Member)

Is there a repo with a GitHub workflow for the tests I can clone?

Yes, Richard (@LegalizeAdulthood), we did this work in the 2021 Boost GSoC.

The repo is here.

I'll take a few minutes ASAP to see if it shakes out well when merging develop into our branch, and then we can start to boogie.

Cc: @mborland and @jzmaddock

@ckormanyos (Member)

OK @LegalizeAdulthood in this sync attempt I'm attempting to merge Boost.Develop Multiprecision into our branch at GSoC2021 to get this thing ready for action.

@ckormanyos (Member) commented Dec 15, 2024

attempting to merge Boost.Develop Multiprecision into our branch at GSoC2021 to get this thing ready for action.

It worked rather well. But ONLY the cpp_double_fp_backend is exercised in CI at the moment. Quad is still pending.

The breakdown for double-something is as follows:

  • Running only cpp_double_fp_backend (no quad backend in CI yet).
  • Most tests pass for double-float, double-double, double-long double and (if available) double-__float128.
  • Algebra, limits and all normal stuff such as elementary transcendental functions all pass.
  • There are five (5) files failing on specfun.
  • Of these specfun failures, some fail simple tolerances. Other failures are due to limited exponent range. These don't bother me.
  • Some very few selected specfun failures are due to confusion between zero and NaN or similar. These are pretty much the last remaining true problems in double-whatever.

I did, in fact, merge this to the develop of the GSoC branch even with certain failures of specfun.

So what does this mean?

  • It is difficult to reach the quality needed for Multiprecision backend.
  • But at least cpp_double_fp_backend is close to getting there.
  • Algebra, most elementary transcendental functions and limits work fine.
  • I will be pursuing the final edge cases that fail.
  • We need to review add/sub/mul/div/sqrt to ensure that the right, optimal algorithms are being used.
  • Edge cases of zeros/NaNs seem to need work.
  • Some specfun tests whose results have huge exponents need to be excluded from double-double testing.
  • Need to restore test_exp.cpp which I removed temporarily from the Jamfile.

Cc: @LegalizeAdulthood and @jzmaddock and @mborland

@ckormanyos (Member) commented Dec 15, 2024

I found multi-floats to be very useful in my research area and believe it will play an even more important role in the future. If you have the chance to pick up this project again and need some help, I'd like to contribute or collaborate with you.

Hello @AuroraDysis based on interest from @LegalizeAdulthood as well as my own, I am, in fact, picking up this project again.

I think at the moment the project is struggling right at the double-double level to actually get the right algorithms for add/sub/mul/div/sqrt and a few more. I looked at the links to the Julia-based code you gave and the double-double primitives look quite good. Have you ever converted any of these DD/QD primitives to the C/C++ domain? I can start that if need be.

@AuroraDysis

Hi @ckormanyos, I'm excited to hear about the renewed interest in this project.

I think at the moment the project is struggling right at the double-double level to actually get the right algorithms for add/sub/mul/div/sqrt and a few more. I looked at the links to the Julia-based code you gave and the double-double primitives look quite good. Have you ever converted any of these DD/QD primitives to the C/C++ domain? I can start that if need be.

As for your question, I have not yet had the opportunity to convert it to C/C++.

@ckormanyos (Member) commented Dec 23, 2024

Is there a repo with a GitHub workflow for the tests I can clone?

Hi @LegalizeAdulthood, just for info, we are finishing the extreme edge cases of the cpp_double_fp_backend<T> template at the moment. We have a few tiny issues with subnormals, NaNs, infinities and zeros.

The type is, however, robust enough to use for, let's say, regular calculations such as those that do not over/underflow or reach NaNs.

Just for fun, I plugged the cpp_double_double backend having precision of 32 decimal digits into a full Mandelbrot calculation. I compared this with cpp_dec_float<32>. The double-double type is about twice as fast (see table below).

Some may consider this to be slow, as I do, compared to the expectations for this type. But I definitely think this type is worth pursuing, for several reasons. The safety checks (infinities, NaNs, divide-by-zero, etc.) that make this type well-behaved also slow it down considerably. I could imagine a version that throws away the checks and is specifically called unsafe. Furthermore, I feel that the double-double backend is (or could be) particularly well-suited for GPU programming if we ever get around to that. So the initial timing report indicates quite some potential down the road.

VERY Initial Timing Report

  • Full classic Mandelbrot image $2048{\times}2048$ pixels at 32 decimal digit precision.
  • cpp_double_double versus cpp_dec_float<32>
  • $2000$ iterations, half-width $1.35$, centered around the point $(0.0, -0.75i)$.
  • Using 15 cores multithreading on Intel Core-i9.
number type                        time (s)
cpp_double_double                  27
cpp_dec_float<32>                  52

Cc: @mborland and @sinandredemption and @cosurgi and @jzmaddock

@ckormanyos (Member) commented Jan 5, 2025

So, we are just about done cleaning up and optimizing cpp_double_fp_backend. We have working cpp_double_float, cpp_double_double , cpp_double_long_double and (with the right compilers) cpp_double_float128.

I squeezed down the perf a bit on:

  • Full classic Mandelbrot image $2048{\times}2048$ pixels at 32 decimal digit precision.
  • cpp_double_double versus cpp_dec_float<32>
  • $2000$ iterations, half-width $1.35$, centered around the point $(0.0, -0.75i)$.
  • Using 15 cores multithreading on Intel Core-i9.
number type                        time (s)
cpp_double_double                  19
cpp_dec_float<32>                  52
cpp_bin_float<32, digit_base_10>   89

See the image below. Also note that low digit counts are one of the few and rare parameter ranges in which dec_float might have the upper hand on bin_float perf. So cpp_double_fp_backend<double> is really fast here.

There could/would be a lot more to push down if we enabled something like a template parameter:

const bool SkipEdgeChecks = false
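Hypothetically, such a switch could look something like the following. This is purely illustrative; the parameter name comes from the comment above, everything else is invented here and is not the actual backend code:

#include <cmath>
#include <limits>

template <typename FloatType, bool SkipEdgeChecks = false>
struct double_fp_sketch
{
  FloatType hi { };
  FloatType lo { };

  friend double_fp_sketch operator+(const double_fp_sketch& a, const double_fp_sketch& b)
  {
    if constexpr (!SkipEdgeChecks)
    {
      // Safe variant: screen out non-finite operands before doing arithmetic.
      if (!(std::isfinite(a.hi) && std::isfinite(b.hi)))
      {
        return { std::numeric_limits<FloatType>::quiet_NaN(), FloatType { } };
      }
    }

    // The fast path (error-free transforms etc.) would go here.
    return { a.hi + b.hi, a.lo + b.lo };
  }
};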

Image

Cc: @sinandredemption and @cosurgi and @jzmaddock and @mborland

@ckormanyos (Member) commented Jan 5, 2025

I have a good mind to finish the docs and try for a review among our authors and to get cpp_double_fp_backend into 1.88 in the early spring.

Cc: @sinandredemption and @cosurgi and @jzmaddock and @mborland

@LegalizeAdulthood

Which version of the library were you using for the QD backend?

@ckormanyos (Member) commented Jan 6, 2025

Which version of the library were you using for the QD backend?

None of them. Neither of them. No previously published QD is used as a library. Nor is any previously published DD used as a library.

Hi Richard (@LegalizeAdulthood)

  • In this first step, we are only handling the cpp_double_fp_backend.
  • The quad-fp-backend is scheduled for later.
  • We will not be using any published library code. This is self-written code. From a subjective viewpoint, there was no suitable backend code around the internet that worked in all edge cases. So I rewrote the algorithms from old and new literature to better handle efficiency, min/max/exactness situations.

@LegalizeAdulthood

@ckormanyos I see, very interesting! I found the existing QD library to be pretty crufty as well. I'm not sure if it was because of the Fortran support or just the author's inclinations.

So I rewrote the algorithms from old and new literature to better handle efficiency, min/max/exactness situations.

With the existing QD library I did notice that some tests failed around the edge cases (see my cmake branch on my fork). That made me a little nervous, but I didn't have time to dig into the details of the literature to understand the failure.

Is your rewrite available on GitHub?

@ckormanyos (Member)

So I rewrote the algorithms from old and new literature to better handle efficiency, min/max/exactness situations.

Is your rewrite available on GitHub?

Yes. It is header-only C++ at the moment located here. The code is very new and in an infant state, but it is getting there.

If you clone that repo, it is a fork of what we call the Boost.Multiprecision submodule.

To use it:

  • Locate the directory include.
  • Add that include directory to your compiler include path.
  • Then in your code #include <boost/multiprecision/cpp_double_fp_backend.hpp>.
  • You can start by using one of the prefabricated types like boost::multiprecision::cpp_double_double and use it just like you would use ordinary double.
  • See example below.

We can/will be modifying code internals in the future. So these algorithms like add/sub/mul/div/sqrt can evolve. Feel free to comment, add change requests or get involved if you like.


#include <boost/multiprecision/cpp_double_fp.hpp>

#include <iostream>
#include <iomanip>
#include <limits>
#include <sstream>

auto main() -> int
{
  // The prefabricated double-double type: an unevaluated sum of two doubles.
  using dd_type = boost::multiprecision::cpp_double_double;

  // One third, computed in double-double precision.
  dd_type frac = dd_type { 1 } / 3;

  std::stringstream strm { };

  // Print with the full decimal precision of the type.
  strm << std::setprecision(std::numeric_limits<dd_type>::digits10) << frac;

  std::cout << strm.str() << std::endl;
}

@ckormanyos (Member) commented Jan 7, 2025

Hello @LegalizeAdulthood

One other real benefit of our Multiprecision types is that they interoperate seamlessly with Boost.Math, essentially out of the box. A lot of great math can be done this way.

In the example below, I check that (for cpp_double_double)

$$ \Gamma\left(1/2\right){\approx}\sqrt{\pi} $$


#include <boost/math/constants/constants.hpp>
#include <boost/math/special_functions/gamma.hpp>
#include <boost/multiprecision/cpp_double_fp.hpp>

#include <iostream>
#include <iomanip>
#include <limits>
#include <sstream>

auto main() -> int
{
  using dd_type = boost::multiprecision::cpp_double_double;

  constexpr dd_type half = dd_type { 1 } / 2;

  // Gamma(1/2), computed by Boost.Math directly on the double-double type.
  const dd_type g_half { boost::math::tgamma(half) };

  // sqrt(pi) for comparison: the two values should agree to nearly full precision.
  const dd_type sqrt_pi { sqrt(boost::math::constants::pi<dd_type>()) };

  std::stringstream strm { };

  strm << "g_half : " << std::setprecision(std::numeric_limits<dd_type>::digits10) << g_half  << '\n';
  strm << "sqrt_pi: " << std::setprecision(std::numeric_limits<dd_type>::digits10) << sqrt_pi << '\n';

  std::cout << strm.str() << std::endl;
}

This is a great achievement in generic programming basically out of the box. We worked hard for this over the years, and @jzmaddock has been our guide and led this great effort for years.

Cc: @mborland and @NAThompson

@cosurgi (Author) commented Jan 8, 2025

I am now running a benchmark of yade --stdperformance 5 against boost::multiprecision::cpp_double_double. It is a simple simulation in which 10000 spheres (imagine they are grains of sand) fall down inside a box and settle. Below is an image from the YADE paper in which we implemented high-precision calculations with MPFR inside YADE.

Image

@cosurgi (Author) commented Jan 8, 2025

Forgot to mention: I am using the develop branch with the latest commit, 9f34658.

@cosurgi (Author) commented Jan 8, 2025

However, to compile YADE with boost::multiprecision::cpp_double_double I had to get past some compiler errors concerning MPFR.

In this configuration YADE uses boost::multiprecision::cpp_double_double as the base precision type, but higher-precision types are also used and compiled in simultaneously. These can be either boost::multiprecision::mpfr_float_backend or boost::multiprecision::cpp_bin_float, depending on configuration options.

At first I used MPFR, and there were compiler errors such as:

/usr/include/boost/math/constants/constants.hpp:313:3: error: no matching function for call to 
‘boost::math::constants::detail::constant_catalan
<boost::multiprecision::number<boost::multiprecision::backends::mpfr_float_backend<62, boost::multiprecision::allocate_dynamic>, boost::multiprecision::et_off> 
>::get(boost::math::constants::construction_traits
<boost::multiprecision::number<boost::multiprecision::backends::mpfr_float_backend<62, boost::multiprecision::allocate_dynamic>, boost::multiprecision::et_off>, 
boost::math::policies::policy<boost::math::policies::default_policy, boost::math::policies::default_policy, 
boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, 
boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, 
boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy> >::type)’

  313 |   BOOST_DEFINE_MATH_CONSTANT(catalan, 9.159655941772190150546035149323841107e-01, "9.15965594177219015054603514932384110774149374281672134266498119621763019776254769479356512926115106248574422619e-01")

/usr/include/boost/multiprecision/mpfr.hpp:2270:37: note: candidate: ‘template<int N> static const 
 2270 |    static inline const result_type& get(const std::integral_constant<int, N>&)
      |                                     ^~~
/usr/include/boost/multiprecision/mpfr.hpp:2270:37: note:   template argument deduction/substitution failed:

I think this error may be because I used the Linux system-wide installation of Boost 1.74 and only replaced the /usr/include/boost/multiprecision directory with the new Multiprecision code. Hence it is a mix of two different boost versions in the same directory, so some incompatibilities between Boost.Math and Boost.Multiprecision might have arisen.

But if you think it might be due to latest changes in Boost.Multiprecision I could try to make a minimal failing example and we could move this into a separate issue.

But I was able to compile yade despite these errors and run the benchmarks. I could simply comment out the offending line, because these errors were in diagnostics/testing part of the code - the code not used by yade --stdperformance.

I will post the benchmark results once they finish.

@ckormanyos (Member)

I am now running a benchmark of yade --stdperformance 5 against boost::multiprecision::cpp_double_double

Awesome Janek (@cosurgi) thank you. Now I'm scared to see how the infant cpp_double_fp_backend faces its first tough real world challenge.

Hence it is a mix of two different boost versions in the same directory, so some incompatibilities between Boost.Math and Boost.Multiprecision might have arisen.

It would be best to have both Math and Multiprecision synchronized. In your case, however, if these are the only two Boost libraries you need, then you could actually make use of their new standalone feature. These two libraries are specifically and purposely standalone and do not rely on any other parts of Boost. This all happened about 2 years ago so right around that 1.74 time.

But if you think it might be due to latest changes in Boost.Multiprecision we could move this into a separate issue.

It does seem like a coincidence that you encounter function-resolution problems around 62 digits, which is similar to the range of cpp_double_double. But at the moment I can only speculate. I think the best bet is to synchronize both the Math and the Multiprecision submodules to the state of today and include these two before your system include of 1.74.

Thanks Janek and let's see how this thing shakes out.

@ckormanyos (Member)

Forgot to mention. I am using develop branch with latest commit

Hi @cosurgi this is really hot off the press, but I am actually cycling the cpp_double_fp_backend-branch in the real Multiprecision submodule right now. And everything just went GREEN here. So you might like to get on that branch and PR 648 for the bleeding edge.

@cosurgi (Author) commented Jan 8, 2025

Not such good news. A single run of yade --stdperformance with boost::multiprecision::float128 took about 15 minutes. Now an hour has passed with boost::multiprecision::cpp_double_double and the single run still hasn't finished.

@ckormanyos (Member)

Not such good news. A single run of yade --stdperformance with boost::multiprecision::float128 took about 15 minutes. Now an hour has passed with boost::multiprecision::cpp_double_double and the single run still hasn't finished.

That is kind of a preliminary disaster. Let's see if it finishes at all. One problem I encountered in my first performance run was that interconversions between cpp_double_double and any other MP type rely on string conversions at the moment. So if there are mixed MP types, that could explain the disaster.

Otherwise we would need to find out how and why this is so disastrously slooooowwwww.

@ckormanyos (Member)

how and why this is so disastrously slooooowwwww

Another point: you are doing full physics, probably with lots of use of elementary transcendental functions. At the moment, cpp_double_fp_backend only has specializations for exp() and log(). So you are using excruciatingly slow elementary functions from the default catalogue of Multiprecision (like using a Taylor series for sin(), etc.).

I have been doing exclusively algebra-only tests to mark the real performance of the raw type. On the other hand, we could be experiencing a real disaster...

@ckormanyos (Member) commented Jan 8, 2025

I will take a day and run John's suite of performance tests that stress floating-point operations and report back when finished. We did some of these during the GSoC with @sinandredemption and they were looking good. It is time to reproduce these tests (which admittedly I have not done this year, but will).

@cosurgi (Author) commented Jan 8, 2025

The first run is finished; the very preliminary result is that cpp_double_double is 7.3 times slower than float128.

Indeed, I am using several transcendental and elementary functions here. Of all of them, I think sqrt(…) and pow(…, 2) are used the most.

EDIT: oh yeah, and probably a bit of the trigonometric sin(…), cos(…), tan(…) functions too.

@ckormanyos (Member) commented Jan 8, 2025

The first run is finished; the very preliminary result is that cpp_double_double is 7.3 times slower than float128.

Thank you, Janek, for this preliminary result, even if it is somewhat challenging.

I just ran algebraic tests locally, cpp_double_double versus cpp_bin_float<32>, and the double-double type was about 3-10 times faster except for string operations. I will publish the table tomorrow.

Hmmmm... I wonder if you are facing off a software-emulated type with a hardware type ( float128). That would not ever be a fair fight. I will publish more detailed results tomorrow. Just for fun, you could run Yade in that test with something like one of either cpp_bin_float or cpp_dec_float configured for 32 decimal digits of precision?

I think we need to see what you are actually testing here. I suppose the elementary transcendental functions could make a huge difference but that huge would surprise me. Well, we will eventually get to the bottom of this.

@cosurgi (Author) commented Jan 8, 2025

So if there are mixed MP types that could explain the disaster.

Yade supports mixed MP types. But in this particular simulation they are not used. Everything is done with cpp_double_double. So no string conversions involved.

@cosurgi (Author) commented Jan 8, 2025

It would be best to have both Math and Multiprecision synchronized.

Yes, upgrading Boost.Math solved the compilation problem. Now I can compile yade with new type just with this super small diff:

--- a/lib/high-precision/Real.hpp
+++ b/lib/high-precision/Real.hpp
@@ -130,10 +130,10 @@ namespace math {
-#include <boost/multiprecision/float128.hpp>
+#include <boost/multiprecision/cpp_double_fp.hpp>
 namespace yade {
 namespace math {
-       using UnderlyingReal = boost::multiprecision::float128;
+       using UnderlyingReal = boost::multiprecision::cpp_double_double;
 }
 }

The full math diagnostics which I have implemented in YADE also now compiles without problems. This might come in useful, because I was testing almost all math functions; see Table 4 in the YADE paper. (The large errors in cpp_bin_float have since been fixed: #264.)

you could run Yade in that test with something like one of either cpp_bin_float or cpp_dec_float configured for 32 decimal digits of precision?

You are right, I will try boost::multiprecision::cpp_bin_float<32> and boost::multiprecision::mpfr_float_backend<32>

Are you sure it should be 32 decimal places? digits10 says it's 31…

Also I have in mind testing cpp_double_long_double and cpp_double_float128 against cpp_bin_float and mpfr_float_backend. This will take some time though.

@cosurgi (Author) commented Jan 8, 2025

facing off a software-emulated type with a hardware type (float128). That would not ever be a fair fight.

Are you sure float128 is hardware? I was thinking that it is offered by the compiler as either __float128 or _Quad, neither of which has associated hardware assembly instructions; the compiler emulates them. I could be wrong, though.

But yeah, I will now compare against cpp_bin_float and mpfr_float_backend.
