Implement floating point conversion with ryu #365

Merged on Dec 4, 2021 (45 commits)

Commits
ba61414
Implement floating point conversion with ryu
la-wu Feb 22, 2021
7d80321
Use manual strictness
la-wu Feb 22, 2021
356ed6c
Use checked shifts
la-wu Feb 23, 2021
c0a0583
Use builtin float-to-word conversion functions
la-wu Feb 25, 2021
ff2210d
Use builtin conversion to Bool
la-wu Feb 25, 2021
0bb5a01
Remove dependency on array package
la-wu Feb 25, 2021
de1d174
Handle non-exhaustive patterns
la-wu Feb 25, 2021
8dd7f16
Try using prim conversions directly
la-wu Feb 25, 2021
23d5cfe
Revert "Try using prim conversions directly"
la-wu Feb 28, 2021
755f58f
Dispatch to slow cast when builtin unavailable
la-wu Feb 28, 2021
4635e2b
Try bumping min version to 8.4.x
la-wu Feb 28, 2021
76b5e2e
Fix log10pow5 approximation and add unit test
la-wu Aug 8, 2021
b5f7086
Re-export floatDec and doubleDec to maintain public API
la-wu Aug 8, 2021
648bfae
Improve documentation and fixes for initial code review
la-wu Aug 8, 2021
d172abf
Improve table generation documentation and clean-up
la-wu Aug 8, 2021
5f3dce5
Improve documentation of f2s and d2s and cleanup
la-wu Aug 8, 2021
f1c6275
Use stricter integral types and annotate fromIntegral usages
la-wu Sep 18, 2021
c2c2c87
Add module descriptions and fix typos
la-wu Sep 18, 2021
f6497c2
Use internal FloatFormat instead of GHC.Float.FFFormat
la-wu Sep 18, 2021
60db980
Use monomorphic helpers for remaining integral conversions used by Re…
la-wu Sep 23, 2021
6ec7e2d
Remove usage of TemplateHaskell in RealFloat
la-wu Sep 23, 2021
a73c84e
Fix LUT usage on big-endian systems
la-wu Sep 24, 2021
3ccfd47
Add header for endianness detection
la-wu Sep 27, 2021
6b3eaaa
Fix big-endian word16 packing in fast digit formatting
la-wu Sep 29, 2021
abf6e04
Fix big-endian word128 read from raw addr
la-wu Sep 29, 2021
f771cd5
Clean up unused functions
la-wu Oct 5, 2021
b394896
Fix incorrect reciprocal function usage
la-wu Oct 5, 2021
9815597
Add more test coverage and fix doc example
la-wu Oct 5, 2021
c0648bb
Use quickcheck equality property in tests
la-wu Oct 6, 2021
d87d3ae
Format haddock headers more similarly to existing ones
la-wu Oct 6, 2021
5500d59
Use simpler reciprocal math for 32-bit words
la-wu Oct 7, 2021
7d7d7fa
Use boxed arithmetic in logic flow
la-wu Oct 12, 2021
906d6db
Support ghc 9.2 prim word changes
la-wu Oct 12, 2021
046a42b
Fix 32-bit support
la-wu Oct 12, 2021
8fafed4
Skip conversion to Double before fixed Float formatting
la-wu Nov 7, 2021
dde95e2
Tweak doc wording and add examples
la-wu Nov 7, 2021
415ac6f
Rename FExponent to FScientific
la-wu Nov 15, 2021
0474332
Use an opaque FloatFormat type for compatibility
la-wu Nov 15, 2021
9be8170
Rename float fixed-format to standard-format and other naming tweaks
la-wu Nov 17, 2021
f67df50
Encourage inlining by removing partial application
la-wu Nov 17, 2021
a01cb00
Fix some haddock links and accidental monospacing
la-wu Nov 22, 2021
0cc5417
Add explanation about difference between implementation and reference…
la-wu Nov 22, 2021
12436a2
Clarify default precision
la-wu Nov 22, 2021
d8dac2a
Point to ryu paper for more details
la-wu Nov 22, 2021
b70918b
Fix non-exhaustive warning for ghc 9.2
la-wu Nov 28, 2021
2 changes: 2 additions & 0 deletions Data/ByteString/Builder.hs
@@ -252,6 +252,7 @@ module Data.ByteString.Builder
, stringUtf8

, module Data.ByteString.Builder.ASCII
, module Data.ByteString.Builder.RealFloat

) where

@@ -261,6 +262,7 @@ import Data.ByteString.Builder.Internal
import qualified Data.ByteString.Builder.Prim as P
import qualified Data.ByteString.Lazy.Internal as L
import Data.ByteString.Builder.ASCII
import Data.ByteString.Builder.RealFloat

import Data.String (IsString(..))
import System.IO (Handle, IOMode(..), withBinaryFile)
27 changes: 1 addition & 26 deletions Data/ByteString/Builder/ASCII.hs
@@ -81,6 +81,7 @@ import Data.ByteString.Lazy as L
import Data.ByteString.Builder.Internal (Builder)
import qualified Data.ByteString.Builder.Prim as P
import qualified Data.ByteString.Builder.Prim.Internal as P
import Data.ByteString.Builder.RealFloat (floatDec, doubleDec)

import Foreign
import Foreign.C.Types
@@ -89,16 +90,6 @@ import Foreign.C.Types
-- Decimal Encoding
------------------------------------------------------------------------------


-- | Encode a 'String' using 'P.char7'.
{-# INLINE string7 #-}
string7 :: String -> Builder
string7 = P.primMapListFixed P.char7

------------------------------------------------------------------------------
-- Decimal Encoding
------------------------------------------------------------------------------

-- Signed integers
------------------

@@ -163,22 +154,6 @@ wordDec :: Word -> Builder
wordDec = P.primBounded P.wordDec


-- Floating point numbers
-------------------------

-- TODO: Use Bryan O'Sullivan's double-conversion package to speed it up.

-- | /Currently slow./ Decimal encoding of an IEEE 'Float'.
{-# INLINE floatDec #-}
floatDec :: Float -> Builder
floatDec = string7 . show

-- | /Currently slow./ Decimal encoding of an IEEE 'Double'.
{-# INLINE doubleDec #-}
doubleDec :: Double -> Builder
doubleDec = string7 . show


------------------------------------------------------------------------------
-- Hexadecimal Encoding
------------------------------------------------------------------------------
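
Editor's note: the removed definitions above were `string7 . show`, and the new module states that it matches `show`'s output. A minimal sketch of how one might spot-check that claim, assuming a bytestring release that contains this change (0.11.2 or later, as far as I know); `agreesWithShow` and `samples` are hypothetical names introduced only for this illustration:

```haskell
import Data.ByteString.Builder (doubleDec, string7, toLazyByteString)

-- | True when the new encoder and the old @string7 . show@ definition
-- produce the same bytes for x. (Hypothetical helper, not part of the PR.)
agreesWithShow :: Double -> Bool
agreesWithShow x =
  toLazyByteString (doubleDec x) == toLazyByteString (string7 (show x))

-- | A few hand-picked values, including zeros, infinities and NaN.
samples :: [Double]
samples = [0, -0.0, 0.1, 1.0e23, 1234567.0, 1.0e7, 1 / 0, -1 / 0, 0 / 0]

main :: IO ()
main = print (all agreesWithShow samples)
```

This is only a handful of fixed values; the PR's own test suite uses a QuickCheck equality property (commit c0648bb), which is the stronger check.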
272 changes: 272 additions & 0 deletions Data/ByteString/Builder/RealFloat.hs
@@ -0,0 +1,272 @@
-- |
-- Module : Data.ByteString.Builder.RealFloat
-- Copyright : (c) Lawrence Wu 2021
-- License : BSD-style
-- Maintainer : [email protected]
--
-- Floating point formatting for @ByteString.Builder@
--
-- This module primarily exposes `floatDec` and `doubleDec` which do the
-- equivalent of converting through @'Data.ByteString.Builder.string7' . 'show'@.
--
-- It also exposes `formatFloat` and `formatDouble` with an API similar to
-- `GHC.Float.formatRealFloat`.
--
-- NB: The float-to-string conversions exposed by this module match `show`'s
-- output (specifically with respect to default rounding and length). In
-- particular, there are boundary cases where the closest and \'shortest\'
-- string representations are not used. Mentions of \'shortest\' in the docs
-- below are with this caveat.
--
-- For example, for fidelity, we match `show` on the output below.
--
-- >>> show (1.0e23 :: Float)
-- "1.0e23"
-- >>> show (1.0e23 :: Double)
-- "9.999999999999999e22"
-- >>> floatDec 1.0e23
-- "1.0e23"
-- >>> doubleDec 1.0e23
-- "9.999999999999999e22"
--
-- Simplifying, we can build a shorter, lossless representation by just using
-- @"1.0e23"@ since the floating point values that are 1 ULP away are
Member

What is "ULP"?

Contributor Author

Member

Good to know! Could you link to the article?

Note BTW that haddock supports Markdown-style hyperlinks, e.g. [ULP](https://en.wikipedia.org/wiki/Unit_in_the_last_place).
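
Editor's note: to make "1 ULP away" concrete, the sketch below steps the raw IEEE 754 bit pattern of a `Double` by one in each direction, mirroring the doctest in the module header. It uses only `base`; `neighbours` is a hypothetical helper name and is only meaningful for finite, positive, non-maximal inputs.

```haskell
import GHC.Float (castDoubleToWord64, castWord64ToDouble)
import Numeric (showHex)

-- | The doubles one ULP below and one ULP above x, obtained by
-- decrementing and incrementing the underlying 64-bit pattern.
-- (Hypothetical helper; assumes x is finite, positive and not maxBound-like.)
neighbours :: Double -> (Double, Double)
neighbours x = (castWord64ToDouble (w - 1), castWord64ToDouble (w + 1))
  where
    w = castDoubleToWord64 x

main :: IO ()
main = do
  let x = 1.0e23 :: Double
  putStrLn (showHex (castDoubleToWord64 x) "")  -- 44b52d02c7e14af6
  print x                                       -- 9.999999999999999e22
  print (neighbours x)  -- (9.999999999999997e22,1.0000000000000001e23)
```

The expected values in the comments are the ones quoted in the module header under review.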

--
-- >>> showHex (castDoubleToWord64 1.0e23) []
-- "44b52d02c7e14af6"
-- >>> castWord64ToDouble 0x44b52d02c7e14af5
-- 9.999999999999997e22
-- >>> castWord64ToDouble 0x44b52d02c7e14af6
-- 9.999999999999999e22
-- >>> castWord64ToDouble 0x44b52d02c7e14af7
-- 1.0000000000000001e23
--
-- In particular, we could use the exact boundary if it is the shortest
-- representation and the original floating number is even. To experiment with
-- the shorter rounding, refer to
-- `Data.ByteString.Builder.RealFloat.Internal.acceptBounds`. This will give us
--
-- >>> floatDec 1.0e23
-- "1.0e23"
-- >>> doubleDec 1.0e23
-- "1.0e23"
--
-- For more details, please refer to the
-- <https://dl.acm.org/doi/10.1145/3192366.3192369 Ryu paper>.
Comment on lines +15 to +55
Member

I think this section is a bit too technical for most users. If we want to keep it, we should probably move it out of the module header and to the bottom of the page.

Contributor Author

Is there such a thing as a footer? This was added per one of our discussions above about clarifying 'shortest'.

Member

You can insert chunks of documentation at selected places in the export list. See e.g. Data.ByteString.Lazy.
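
Editor's note: a minimal sketch of the Haddock feature mentioned here, a named documentation chunk referenced from the export list. The module contents, `double`, and the chunk name `$notes` are hypothetical and chosen only for illustration; this is not the layout the PR ended up with.

```haskell
module Main
  ( main
  , double
    -- * Implementation notes
    -- $notes
  ) where

-- | Doubles its argument.
double :: Int -> Int
double x = x * 2

-- $notes
-- Longer, more technical remarks can live here. Haddock renders this chunk
-- at the position of @$notes@ in the export list, so it appears after the
-- documented exports rather than in the module header. Markdown-style links
-- such as [ULP](https://en.wikipedia.org/wiki/Unit_in_the_last_place) work too.

main :: IO ()
main = print (double 21)
```

GHC itself treats the chunk as an ordinary comment, so the placement only affects the generated documentation.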


module Data.ByteString.Builder.RealFloat
  ( floatDec
  , doubleDec

  -- * Custom formatting
  , formatFloat
  , formatDouble
  , FloatFormat
  , standard
  , standardDefaultPrecision
  , scientific
  , generic
  ) where

import Data.ByteString.Builder.Internal (Builder)
import qualified Data.ByteString.Builder.RealFloat.Internal as R
import qualified Data.ByteString.Builder.RealFloat.F2S as RF
import qualified Data.ByteString.Builder.RealFloat.D2S as RD
import qualified Data.ByteString.Builder.Prim as BP
import GHC.Float (roundTo)
import GHC.Word (Word64)
import GHC.Show (intToDigit)

-- | Returns a rendered Float. Matches `show` in displaying in standard or
-- scientific notation
--
-- @
-- floatDec = 'formatFloat' 'generic'
-- @
{-# INLINABLE floatDec #-}
floatDec :: Float -> Builder
floatDec = formatFloat generic

-- | Returns a rendered Double. Matches `show` in displaying in standard or
-- scientific notation
--
-- @
-- doubleDec = 'formatDouble' 'generic'
-- @
{-# INLINABLE doubleDec #-}
doubleDec :: Double -> Builder
doubleDec = formatDouble generic

-- | Format type for use with `formatFloat` and `formatDouble`.
data FloatFormat = MkFloatFormat FormatMode (Maybe Int)

-- | Standard notation with `n` decimal places
standard :: Int -> FloatFormat
standard n = MkFloatFormat FStandard (Just n)

-- | Standard notation with the \'default precision\' (decimal places matching `show`)
standardDefaultPrecision :: FloatFormat
standardDefaultPrecision = MkFloatFormat FStandard Nothing

-- | Scientific notation with \'default precision\' (decimal places matching `show`)
scientific :: FloatFormat
scientific = MkFloatFormat FScientific Nothing

-- | Standard or scientific notation depending on the exponent. Matches `show`
generic :: FloatFormat
generic = MkFloatFormat FGeneric Nothing

-- | ByteString float-to-string format
data FormatMode
  = FScientific     -- ^ scientific notation
  | FStandard       -- ^ standard notation with `Maybe Int` digits after the decimal
  | FGeneric        -- ^ dispatches to scientific or standard notation based on the exponent
  deriving Show

-- TODO: support precision argument for FGeneric and FScientific
-- | Returns a rendered Float. Returns the \'shortest\' representation in
-- scientific notation and takes an optional precision argument in standard
-- notation. Also see `floatDec`.
--
-- With standard notation, the precision argument is used to truncate (or
-- extend with 0s) the \'shortest\' rendered Float. The \'default precision\' does
-- no such modifications and will return as many decimal places as the
-- representation demands.
--
-- e.g.
--
-- >>> formatFloat (standard 1) 1.2345e-2
-- "0.0"
-- >>> formatFloat (standard 10) 1.2345e-2
-- "0.0123450000"
-- >>> formatFloat standardDefaultPrecision 1.2345e-2
-- "0.012345"
-- >>> formatFloat scientific 12.345
-- "1.2345e1"
-- >>> formatFloat generic 12.345
-- "12.345"
{-# INLINABLE formatFloat #-}
formatFloat :: FloatFormat -> Float -> Builder
formatFloat (MkFloatFormat fmt prec) = \f ->
  let (RF.FloatingDecimal m e) = RF.f2Intermediate f
      e' = R.int32ToInt e + R.decimalLength9 m in
  case fmt of
    FGeneric ->
      case specialStr f of
        Just b -> b
        Nothing ->
          if e' >= 0 && e' <= 7
             then sign f `mappend` showStandard (R.word32ToWord64 m) e' prec
             else BP.primBounded (R.toCharsScientific (f < 0) m e) ()
    FScientific -> RF.f2s f
    FStandard ->
      case specialStr f of
        Just b -> b
        Nothing -> sign f `mappend` showStandard (R.word32ToWord64 m) e' prec

-- TODO: support precision argument for FGeneric and FScientific
-- | Returns a rendered Double. Returns the \'shortest\' representation in
-- scientific notation and takes an optional precision argument in standard
-- notation. Also see `doubleDec`.
--
-- With standard notation, the precision argument is used to truncate (or
-- extend with 0s) the \'shortest\' rendered Double. The \'default precision\'
-- does no such modifications and will return as many decimal places as the
-- representation demands.
--
-- e.g.
--
-- >>> formatDouble (standard 1) 1.2345e-2
-- "0.0"
-- >>> formatDouble (standard 10) 1.2345e-2
-- "0.0123450000"
-- >>> formatDouble standardDefaultPrecision 1.2345e-2
-- "0.012345"
-- >>> formatDouble scientific 12.345
-- "1.2345e1"
-- >>> formatDouble generic 12.345
-- "12.345"
{-# INLINABLE formatDouble #-}
formatDouble :: FloatFormat -> Double -> Builder
formatDouble (MkFloatFormat fmt prec) = \f ->
  let (RD.FloatingDecimal m e) = RD.d2Intermediate f
      e' = R.int32ToInt e + R.decimalLength17 m in
  case fmt of
    FGeneric ->
      case specialStr f of
        Just b -> b
        Nothing ->
          if e' >= 0 && e' <= 7
             then sign f `mappend` showStandard m e' prec
             else BP.primBounded (R.toCharsScientific (f < 0) m e) ()
    FScientific -> RD.d2s f
    FStandard ->
      case specialStr f of
        Just b -> b
        Nothing -> sign f `mappend` showStandard m e' prec

-- | Char7 encode a 'Char'.
{-# INLINE char7 #-}
char7 :: Char -> Builder
char7 = BP.primFixed BP.char7

-- | Char7 encode a 'String'.
{-# INLINE string7 #-}
string7 :: String -> Builder
string7 = BP.primMapListFixed BP.char7

-- | Encodes a `-` if input is negative
sign :: RealFloat a => a -> Builder
sign f = if f < 0 then char7 '-' else mempty

-- | Special rendering for NaN, Infinity, and 0. See
-- RealFloat.Internal.NonNumbersAndZero
specialStr :: RealFloat a => a -> Maybe Builder
specialStr f
  | isNaN f          = Just $ string7 "NaN"
  | isInfinite f     = Just $ sign f `mappend` string7 "Infinity"
  | isNegativeZero f = Just $ string7 "-0.0"
  | f == 0           = Just $ string7 "0.0"
  | otherwise        = Nothing

-- | Returns a list of decimal digits in a Word64
digits :: Word64 -> [Int]
digits w = go [] w
  where go ds 0 = ds
        go ds c = let (q, r) = R.dquotRem10 c
                   in go ((R.word64ToInt r) : ds) q

-- | Show a floating point value in standard notation. Based on GHC.Float.showFloat
showStandard :: Word64 -> Int -> Maybe Int -> Builder
showStandard m e prec =
  case prec of
    Nothing
      | e <= 0 -> char7 '0'
               `mappend` char7 '.'
               `mappend` string7 (replicate (-e) '0')
               `mappend` mconcat (digitsToBuilder ds)
      | otherwise ->
          let f 0 s rs = mk0 (reverse s) `mappend` char7 '.' `mappend` mk0 rs
              f n s [] = f (n-1) (char7 '0':s) []
              f n s (r:rs) = f (n-1) (r:s) rs
           in f e [] (digitsToBuilder ds)
    Just p
      | e >= 0 ->
          let (ei, is') = roundTo 10 (p' + e) ds
              (ls, rs) = splitAt (e + ei) (digitsToBuilder is')
           in mk0 ls `mappend` mkDot rs
      | otherwise ->
          let (ei, is') = roundTo 10 p' (replicate (-e) 0 ++ ds)
              -- ds' should always be non-empty but use redundant pattern
              -- matching to silence warning
              ds' = if ei > 0 then is' else 0:is'
              (ls, rs) = splitAt 1 $ digitsToBuilder ds'
           in mk0 ls `mappend` mkDot rs
      where p' = max p 0
  where
    mk0 ls = case ls of [] -> char7 '0'; _ -> mconcat ls
    mkDot rs = if null rs then mempty else char7 '.' `mappend` mconcat rs
    ds = digits m
    digitsToBuilder = fmap (char7 . intToDigit)
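
Editor's note: a short usage sketch of the API added by this file, assuming a bytestring release that ships `Data.ByteString.Builder.RealFloat` (0.11.2 or later, as far as I know). `render` is a hypothetical helper for turning a `Builder` into a `String`; the expected outputs in the comments are taken from the doctests above.

```haskell
import Data.ByteString.Builder (Builder, toLazyByteString)
import Data.ByteString.Builder.RealFloat
  (doubleDec, formatDouble, generic, scientific, standard)
import qualified Data.ByteString.Lazy.Char8 as L8

-- | Run a Builder and view the resulting ASCII bytes as a String.
-- (Hypothetical helper, not part of the PR.)
render :: Builder -> String
render = L8.unpack . toLazyByteString

main :: IO ()
main = do
  putStrLn (render (doubleDec 1.0e23))                      -- 9.999999999999999e22
  putStrLn (render (formatDouble (standard 10) 1.2345e-2))  -- 0.0123450000
  putStrLn (render (formatDouble scientific 12.345))        -- 1.2345e1
  putStrLn (render (formatDouble generic 12.345))           -- 12.345
```

`FloatFormat` is exported without its constructor, so formats can only be built through `standard`, `standardDefaultPrecision`, `scientific` and `generic`.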
