Feature request: quad-double precision. #184
This definitely would be great, especially if float128 uses the compiler/assembler version when available.
Yeah, it definitely depends on the approach. The simplest one is of course to use qd-2.3.22.tar.gz verbatim and only add the necessary multiprecision wrapper. A much more interesting approach is to extract the design used to "quadruple" the precision in there, in such a way that quad-float128 would simply be another template of the same design. 512 bits is exactly what I will need in my calculations.
I was thinking more of the binary128 and binary256 formats that are mentioned in IEEE 754:2008. But these would be (at least in their portable and generic form) software-intensive and would thus lose the hardware acceleration that is the very thing you seek. So that is not exactly what the original request is about.
This might unfortunately have licensing inconsistencies, in the sense that using the code directly under the BSL might be problematic. I think that binary128/256 is an idea we could talk about. But as I mentioned previously, I think there would be several important design considerations. Another idea would be to create a set of popular backends intended to wrap a given type for the "number" template in Boost.Multiprecision. That might be a way to navigate through the licensing differences. I mean, you could even wrap the old FORTRAN77 REAL*16 that way. I think I'd like to continue this discussion and see what possibilities there are. Kind regards, Chris
Since binary128/256 would be software-intensive, I would prefer another approach. When hardware support appears in future CPUs, adding such binary support should be fairly easy. As you said, it is not a matter for this thread.
This solution is much more interesting for me. I think that you have already done that. Also, the idea to use the "quadrupling" method described in that paper is a worthwhile direction for me, even if it means completely reimplementing this idea in Boost from scratch. This will take a lot of time though. So I could summarise as follows:
Correct. There are already Boost.Multiprecision "float" backend wrappers for __float128/quadmath, GMP/MPIR and MPFR, and an "int" wrapper for tommath, and these require that the user environment is able to find and link to the required target-specific lib. I'm not sure what it means for an existing UDT number library to be important/cool/special enough to be wrapped with a supported backend in Boost. I'm also a bit hesitant to use DD/QD prototypes directly when looking at the licensing of the original work. Nonetheless, DD and QD have been around for a while. In fact, I used them heavily about 10-12 years ago. Back then there was a different DD package with 106 binary digits; maybe it was the predecessor of the work we are mentioning in this post. Even though I'm not particularly fond of the "FP hack" required for the x86 architecture in these packages, and I'm also not sure exactly how portable DD and QD are, these libs are quite useful. This idea seems feasible to me. Kind regards, Chris
The downside of double-double (and I presume of quad-double) has been shown to be some uncertainty in estimating epsilon and consequent difficulties in estimating precision and uncertainties. The Apple/Darwin platforms all show 'different' behaviour in many Boost.Math tests. So the 106 or 212 significand bits may be over-optimistic. If you value speed more, then ...
Indeed. Interestingly enough, two of the greater challenges in potentially wrapping DD/QD and higher orders would be getting the types to behave like proper C++ types and figuring out how to mix the function-name calls within BSL headers. These are the two challenges I'm facing in another (unpublished) project that wraps the high-precision type of a popular computer algebra system for Boost.Multiprecision. Still, I think the DD/QD topic might be interesting to pursue in a feasibility study. Perhaps some of the epsilon inconsistencies could be addressed if the concepts of unity, comparison, frexp and ldexp are established more clearly in small add-on functionalities beyond the raw original implementation. Kind regards, Chris
This is a good idea, and in principle easy enough. As mentioned above already, epsilon is likely to be the main sticking point; there are two possible definitions:
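Presumably the two candidates are the usual ones for a double-double (my formulation, stated as an assumption, since the original list did not survive):

$$\varepsilon_{\text{prec}} = 2^{\,1-2\cdot 53} = 2^{-105} \approx 2.5\times 10^{-32},$$

treating the pair as a single 106-bit significand, versus the "smallest $\varepsilon$ such that $1+\varepsilon\neq 1$": since $(\mathrm{hi},\mathrm{lo}) = (1,\ 2^{-1074})$ is a perfectly representable pair, that definition collapses all the way down to

$$\varepsilon_{\min} = 2^{-1074} \approx 4.9\times 10^{-324},$$

the denormal minimum of a double. The two values differ by nearly 300 orders of magnitude, which is exactly why reasoning about precision for these types is awkward.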
BTW, some time back I tried pretty hard to get Boost.Math working with Apple's "double double" and just couldn't get everything working - so much of the reasoning about what happens to a number under operation X breaks for these types that it's a bit of a losing battle. That said, we have since added cpp_dec_float and mpf_float, both of which have hidden guard digits and seem to be working OK. In principle, these types aren't that different, I hope.
Thanks, John, for your experienced advice. That's what I was seeking in terms of a kind of nod. If there are no other blocking points, then I will write this up ASAP as a feasibility study for Boost GSoC 2020. The timing of this user-raised issue is, I believe, perfect for that potential venue. Kind regards, Chris
I'm curious about @cosurgi's use case. I feel like the current Boost.Multiprecision covers the intended use well as it stands, with only maybe a factor of 2 available for performance increase. But if you're using quad-double, well, performance can't be a huge issue anyway.
@NAThompson even a two-times performance increase means waiting one month instead of two months for a result :) My use case will be quantum mechanics and quantum chromodynamics. Currently I am preparing the yade source code for all of this. The experimental measurements in some cases are more precise than 20 or even 30 significant digits. Performing high-precision calculations is a big deal here. Since I plan to calculate volumes significantly larger than the Planck length, I have estimated that my target required precision is about 150 digits, which is about 512 bits, and it has to be as fast as possible. I know it will take years to achieve, but that's the goal. My current strategy is to make sure that yade is 100% compatible with Boost.Multiprecision, so that when new, faster and more precise types appear, I will be able to benefit from them instantly. And I hope that someday CPUs with a native 512-bit floating-point type will appear :)
BTW: MPFR is about 10 times slower. The difference between waiting one month and ten months for a result becomes really meaningful.
So let's prefer the 1 month. |
Yes.
This comment rang a bell in my mind, so I mounted a drive from 2007. As it turns out, I had developed my own version of such a doubled type back then. I got remarkably far with that code, and am somewhat surprised that I had forgotten about it. If we pursue this topic, I will be sharing my work in a smaller e-mail circle. In that work, it looks like I simply empirically selected 31 for the decimal digit count. Best regards, Chris
Interesting comments and use cases - thanks for this. There's one other potential blocker that springs to mind: the special functions are kind of tricky to implement for types with large bit counts but small exponent ranges; quad_float is perhaps the archetype here :( There's an open bug for tgamma on this: #150, but it wouldn't surprise me if there aren't others lurking once you go hunting. Question: can double_double be doubled? And doubled again, etc.? In other words, could these be implemented via nested templates?
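For illustration only, here is a minimal sketch of what such a nested "doubling" could look like, built on Knuth's error-free two-sum. The names are hypothetical and this is the "sloppy" addition variant, not the actual Boost code:

```cpp
template <typename T>
struct doubled_fp
{
    T hi { };
    T lo { };

    // Knuth's two-sum: a + b == s + err exactly, provided T's addition
    // and subtraction are correctly rounded (true for IEEE double).
    static doubled_fp two_sum(const T& a, const T& b)
    {
        const T s  = a + b;
        const T bb = s - a;
        return { s, (a - (s - bb)) + (b - bb) };
    }

    friend doubled_fp operator-(const doubled_fp& x) { return { -x.hi, -x.lo }; }

    friend doubled_fp operator+(const doubled_fp& x, const doubled_fp& y)
    {
        doubled_fp s = two_sum(x.hi, y.hi);   // exact sum of the high parts
        s.lo = s.lo + (x.lo + y.lo);          // fold in the low parts ("sloppy")
        return two_sum(s.hi, s.lo);           // renormalize: |lo| <= ulp(hi)/2
    }

    friend doubled_fp operator-(const doubled_fp& x, const doubled_fp& y)
    {
        return x + (-y);
    }
};

using dd_real      = doubled_fp<double>;   // double-double: ~2 x 53 bits
using qd_like_real = doubled_fp<dd_real>;  // doubled again: ~4 x 53 bits
```

One caveat that ties into the epsilon discussion above: two-sum is only provably error-free when T's arithmetic is correctly rounded, which double-double's is not, so each nesting level loses a little rigor.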
One thing I mentioned about this package is the licensing of the code. I believe I am interpreting it correctly, though I am not yet fully sure... But I also notice that the x86-FPU fix encloses, at the top-level call mechanism, all functions using the package. This means, as is confirmed in the DD/QD documentation, that the code will have a hard time being used in multithreading on x86. I would prefer a package that sets and resets the x86-FPU fix within the individual subroutines that need it, thereby allowing for a synchronization mechanism when trying for multithreading. So at present, I am not aware of an existing kind of DD or QD that would fully satisfy my personal requirements. That being said, one could focus on a popular one; there seem to be several popular ones dating back a long way. Best regards, Chris
You read my mind. I think the answers are yes and yes. But I am also wary of such nested templates, having once gotten poor results with an integer type using something like this. But that experience was more than 20 years ago; I was less experienced at programming C++ back then, and compilers have gotten much (massively much) better at handling nested templates since then.
Wouldn't that only be required for code executing on the old x87 co-processor, and not x64 code using the SSE registers? |
Yes. I have also just sent you some work in another thread. In that work, the x86-FPU fix is needed only for that legacy FPU, and only for it.
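For the record, a sketch of the kind of control-word fix being discussed, set and reset per subroutine rather than around the whole program (glibc, 32-bit x86 only; the macros are from `<fpu_control.h>`; x64/SSE builds do not need this):

```cpp
#include <fpu_control.h>  // glibc-specific, legacy x87 FPU only

// Force the x87 FPU to round to 53-bit significands so that the
// double-double error-free transformations remain exact.
void fpu_fix_start(fpu_control_t* old_cw)
{
    _FPU_GETCW(*old_cw);  // save the caller's control word
    fpu_control_t cw =
        static_cast<fpu_control_t>((*old_cw & ~_FPU_EXTENDED) | _FPU_DOUBLE);
    _FPU_SETCW(cw);
}

void fpu_fix_end(const fpu_control_t* old_cw)
{
    fpu_control_t cw = *old_cw;
    _FPU_SETCW(cw);       // restore the caller's control word
}
```

Scoping the save/restore to each subroutine, instead of around the top-level call, is what would leave room for a synchronization scheme under multithreading.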
This has been done near the bottom of this page.
@cosurgi: There's a very tentative version in the draft PR referenced above. I would welcome feedback if you'd like to give it a try and see how far you can get. There are some issues, and some possibly controversial design choices:
Anyhow, if you have any real-world code you can plug this into, it would be interesting to hear how you get on. It's all a bit too "fast and loose" for me!
@jzmaddock I'm sorry for the late reply. This looks very interesting. Yes, I have real-world code, YADE, with a full-blown testing suite for math functions; I will try to use it this week or next. Just to be sure - I should take this commit? The issues:
See some example math-test results for MPFR at 150 digits - scroll up to see the math functions being tested by the earlier-mentioned script.
Oh, BTW, if 2^10 comes out as 1023.99999999(9)... then it's acceptable for me, because the error will be smaller than the tolerance level. We only need working round-tripping.
This is actually the reason that round-tripping doesn't work - I mean it's approximately correct (to within an epsilon or so), but without a correctly functioning pow() we basically can't implement correct string output. Actually, even the basic arithmetic operations aren't correctly rounded either... As long as string I/O is approximately correct (to within an epsilon or so - i.e. drop one digit and it would round to the "correct" result), wouldn't your tests still function OK?
Heh. I suppose that all the math tests would work, but the round-tripping test would fail ;) I suppose I could implement an exception for this one.
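For reference, the round-tripping test in question amounts to something like this (a generic sketch, not yade's actual test code; it works with any Boost.Multiprecision number type):

```cpp
#include <boost/lexical_cast.hpp>
#include <limits>
#include <sstream>

// Returns true if x survives a text round trip: print with max_digits10
// (the digit count guaranteed to be loss-free) and read the string back.
template <typename T>
bool round_trips(const T& x)
{
    std::ostringstream os;
    os.precision(std::numeric_limits<T>::max_digits10);
    os << x;
    return boost::lexical_cast<T>(os.str()) == x;
}
```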
Meanwhile... there are a few issues with the current version - mostly order-of-initialization issues in numeric_limits&lt;qd_real&gt;. Give me a day or so and I'll push a fix, as they prevent tgamma and others from working.
Never mind, fix pushed. The C99 functions including tgamma etc. are now working; note that they are not quite std-conforming in that they don't adhere to the non-normative Annex F, which deals with handling of infinities and NaNs.
Is there a repo with a GitHub workflow for the tests that I can clone?
Yes, certainly. Again, I think we got pretty close to stability.
So if we can just get stable operations and edge-case handling, this thing will be done. The formal testing and plug-in to Multiprecision was/is about done, but I need to get the branch and see if it still plays well with the Boost develop branch.
Yes, Richard (@LegalizeAdulthood), we did this work in the 2021 Boost GSoC. The repo is here. I'll take a few minutes ASAP to see if it shakes out well when merging develop into our branch, and then we can start to boogie. Cc: @mborland and @jzmaddock
OK @LegalizeAdulthood, in this sync attempt I'm merging the Boost develop branch of Multiprecision into our branch at GSoC2021 to get this thing ready for action.
It worked rather well - but only in part. The breakdown on the double-double/quad-double work is as follows:
I did, in fact, merge this to the develop of the GSoC branch, even with certain failures in specfun. So what does this mean?
Cc: @LegalizeAdulthood and @jzmaddock and @mborland
Hello @AuroraDysis, based on interest from @LegalizeAdulthood as well as my own, I am, in fact, picking up this project again. I think at the moment the project is struggling right at the double-double level to actually get the right algorithms for add/sub/mul/div/sqrt and a few more. I looked at the links to the Julia-based code you gave, and the double-double primitives look quite good. Have you ever converted any of these DD/QD primitives to the C/C++ domain? I can start that if need be.
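For concreteness, the multiplication primitive in such DD/QD codes is usually the FMA-based two-product; a sketch of its C++ form (hypothetical naming, not the project's code):

```cpp
#include <cmath>

struct dd_parts { double hi; double lo; };

// Two-product: a * b == p + err exactly, provided std::fma is a true
// fused multiply-add (a single rounding for a * b + c).
inline dd_parts two_prod(double a, double b)
{
    const double p   = a * b;
    const double err = std::fma(a, b, -p);
    return { p, err };
}
```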
Hi @ckormanyos, I'm excited to hear about the renewed interest in this project.
As for your question, I have not yet had the opportunity to convert it to C/C++. |
Hi @LegalizeAdulthood, just for info: we are finishing the extreme edge cases of the type. It is, however, robust enough to use for, let's say, regular calculations such as those that do not over/underflow or reach NaNs. Just for fun, I plugged the new backend into a first benchmark. Some may consider this to be slow, as I do, compared to the expectations for this type. But I definitely think this type is worth pursuing, for several reasons. The safety checks (infinities, NaNs, divide-by-zero, etc.) that make this type well-behaved also slow it down considerably. I could imagine a version that throws away checks and is specifically called unsafe. Furthermore, I feel that the double-double backend is/could be particularly well-suited for GPU programming if we ever get around to that. So the initial timing report indicates quite some potential down the road. VERY Initial Timing Report
Cc: @mborland and @sinandredemption and @cosurgi and @jzmaddock
So, we are just about done cleaning up and optimizing the backend. I squeezed down the perf a bit on:
See the image below. Note also that low digit counts are one of the few and rare parameter ranges in which this type stands out. There could/would be a lot more to push down if we enabled something like a template parameter:
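Something like the following, tying back to the "unsafe" variant imagined earlier in the thread (an illustration only; the hypothetical name and the real backend's signature may differ):

```cpp
// Hypothetical knob: a second template parameter selecting a fast path
// that omits the inf/NaN/overflow handling in add/sub/mul/div for speed.
template <typename FloatType, bool UnsafeOmitSpecialValueChecks = false>
class cpp_double_fp_backend_sketch
{
    // if constexpr (UnsafeOmitSpecialValueChecks) { /* skip checks */ }
};
```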
Cc: @sinandredemption and @cosurgi and @jzmaddock and @mborland
I have a good mind to finish the docs, try for a review among our authors, and get this type into Boost.Multiprecision proper. Cc: @sinandredemption and @cosurgi and @jzmaddock and @mborland
Which version of the library were you using for the QD backend?
Hi Richard (@LegalizeAdulthood) - none of them. No previously published QD is used as a library, nor is any previously published DD used as a library.
@ckormanyos I see, very interesting! I found the existing QD library to be pretty crufty as well. I'm not sure if that was because of the Fortran support or just the author's inclinations.
With the existing QD library I did notice that some tests failed around the edge cases (see my cmake branch on my fork). That made me a little nervous, but I didn't have time to dig into the details of the literature to understand the failure. Is your rewrite available on GitHub?
Yes. It is header-only C++ at the moment, located here. The code is very new and in an infant state, but it is getting there. If you clone that repo, it is a fork of what we call the Boost.Multiprecision submodule. To use it:
We can/will be modifying code internals in the future, so algorithms like add/sub/mul/div/sqrt can evolve. Feel free to comment, add change requests, or get involved if you like.
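For anyone wanting to try it, a minimal usage sketch, assuming the header and type names that appear in the yade diff later in this thread (cpp_double_fp.hpp / cpp_double_double); these may change as the branch evolves:

```cpp
#include <boost/multiprecision/cpp_double_fp.hpp>
#include <iostream>
#include <limits>

int main()
{
    using boost::multiprecision::cpp_double_double;

    const cpp_double_double third = cpp_double_double(1) / 3;

    std::cout.precision(std::numeric_limits<cpp_double_double>::digits10);
    std::cout << third << '\n';  // roughly 31-32 significant decimal digits
}
```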
Hello @LegalizeAdulthood. One other real benefit of our Multiprecision types is that they seamlessly interoperate with Boost.Math. A lot of great math can be done this way. In the example below, I check a well-known special-function identity in double-double precision.
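A sketch along those lines (not the original snippet, which did not survive; it leans on the tgamma support mentioned earlier and checks tgamma(1/2) against sqrt(pi)):

```cpp
#include <boost/math/constants/constants.hpp>
#include <boost/math/special_functions/gamma.hpp>
#include <boost/multiprecision/cpp_double_fp.hpp>
#include <iomanip>
#include <iostream>
#include <limits>

int main()
{
    using boost::multiprecision::cpp_double_double;

    // tgamma(1/2) == sqrt(pi); both sides evaluated in double-double.
    const cpp_double_double g = boost::math::tgamma(cpp_double_double(0.5));
    const cpp_double_double s = sqrt(boost::math::constants::pi<cpp_double_double>());

    std::cout << std::setprecision(std::numeric_limits<cpp_double_double>::digits10)
              << g << '\n'
              << s << '\n';  // should agree to ~31 decimal digits
}
```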
This is a great achievement in generic programming, basically out of the box. We worked hard for this over the years, and @jzmaddock has been our guide and has led this great effort for years. Cc: @mborland and @NAThompson
I am now running a benchmark of the new type.
Forgot to mention: I am using the develop branch with the latest commit 9f34658.
However, compiling yade with the new type needed some care. In this configuration, yade uses the new type as its underlying Real. At first shot I used MPFR, and there were some compiler errors such as:
I think this error may be because I used the Linux system-wide installation of Boost 1.74 and only replaced the Multiprecision headers. But if you think it might be due to the latest changes in Boost.Multiprecision, I could try to make a minimal failing example and we could move this into a separate issue. I was nonetheless able to compile yade despite these errors and run the benchmarks. I could simply comment out the offending line, because these errors were in the diagnostics/testing part of the code - code not used by the benchmark. I will post the benchmark results once they finish.
Awesome, Janek (@cosurgi), thank you. Now I'm scared to see how the infant type performs.
It would be best to have both Math and Multiprecision synchronized. In your case, however, if these are the only two Boost libraries you need, then you could actually make use of their new standalone feature. These two libraries are specifically and purposely standalone and do not rely on any other parts of Boost. This all happened about two years ago, so right around that 1.74 time.
It does seem like a coincidence that you encounter function-resolution problems around 62 digits, which is similar to the digit range of these types. Thanks, Janek, and let's see how this thing shakes out.
Hi @cosurgi, this is really hot off the press, but I am actually cycling the tests right now.
Not such good news. A single run of the benchmark is taking far longer than expected.
That is kind of a preliminary disaster. Let's see if it finishes at all. One problem I encountered in my first performance run was that interconversions between the backend and other types were costly. Otherwise, we would need to find out how and why this is so disastrously slooooowwwww.
Another point: you are doing full physics, probably with lots of use of elementary transcendental functions. At the moment, I have been doing exclusively algebra-only tests to mark the real performance of the raw type. On the other hand, we could be experiencing a real disaster...
I will take a day and run John's suite of performance tests that stress floating-point operations and report back when finished. We did some of these during the GSoC with @sinandredemption and they were looking good. It is time to reproduce these tests (which admittedly I have not done this year, but will). |
The first run is finished; the very preliminary result is that the new type is slower here. Indeed, I am using several transcendental and elementary functions. Of all of them, I think the transcendental ones dominate. EDIT: oh yeah, and probably a bit of trigonometric functions too.
Thank you, Janek, for this preliminary result, even though it is somewhat challenging. I just ran algebraic tests locally. Hmmmm... I wonder if you are facing off a software-emulated type against a hardware type? I think we need to see what you are actually testing here. I suppose the elementary transcendental functions could make a huge difference, but a difference that huge would surprise me. Well, we will eventually get to the bottom of this.
Yade supports mixed MP types, but in this particular simulation they are not used. Everything is done with the single underlying type.
Yes, upgrading Boost.Math solved the compilation problem. Now I can compile yade with the new type with just this super-small diff:

```diff
--- a/lib/high-precision/Real.hpp
+++ b/lib/high-precision/Real.hpp
@@ -130,10 +130,10 @@ namespace math {
-#include <boost/multiprecision/float128.hpp>
+#include <boost/multiprecision/cpp_double_fp.hpp>
 namespace yade {
 namespace math {
-	using UnderlyingReal = boost::multiprecision::float128;
+	using UnderlyingReal = boost::multiprecision::cpp_double_double;
 }
 }
```

Also, the full math diagnostics which I have implemented in yade now compile without problems. This might come in useful, because I was testing almost all math functions; see Table 4 in the YADE paper. (The large errors shown there for a few functions are a separate matter.)
You are right, I will try that. Are you sure it should be 32 decimal places? Also, I have in mind testing other configurations.
Are you sure? But yeah, I will now compare against the previous float128 setup.
Hi,
as suggested in Math issue #303, I repost this here:
How feasible would it be to integrate quad-double precision? It has 212 bits in the significand and is faster than MPFR:
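A quick sanity check of those figures, as a math note: quad-double concatenates four 53-bit doubles, so

$$4 \times 53 = 212\ \text{significand bits}, \qquad 212 \cdot \log_{10} 2 \approx 63.8\ \text{decimal digits}.$$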
I have just recently integrated Boost.Multiprecision with a fairly large numerical-computation software, YADE.
Float128 isn't enough for my needs; yade would greatly benefit from quad-double (as it is much faster than MPFR). Also, yade can serve as a really large testing ground. I have implemented CGAL numerical traits and EIGEN numerical traits to use Multiprecision, and all of them are tested daily in our pipeline - see an example run here, or a more specific example for float128: test (need to scroll up) and check the results.