KeyError when calling 'dask_histogram.boost.Histogram().fill()' with a dask dataframe #161

@RobinTimTom

Dear experts,
I am starting to use dask and dask_histogram, and I get a KeyError when I try to fill a dask_histogram.boost.Histogram from the columns of a dask dataframe, as in the example below:

import numpy as np
import dask.dataframe as dd
import dask_histogram.boost as dhb

# this is reproducible
d = {
    'A': np.random.normal(0., 1., 100000),
    'W': np.random.uniform(0.2, 0.8, 100000),
}
ddf = dd.from_dict(d, npartitions=10)

h = dhb.Histogram(
    dhb.axis.Regular(10, -3, 3),
    storage=dhb.storage.Weight()
).fill(ddf['A'], weight=ddf['W']).compute()
print(h)

This example gives me:

Traceback (most recent call last):
  File "/gpfs/home/belle2/rlebouch/darkphotontodimuons/background_rejection/testdask.py", line 15, in <module>
    ).fill(ddf['A'], weight=ddf['W']).compute()
                                      ^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/base.py", line 372, in compute
    (result,) = compute(self, traverse=False, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/base.py", line 653, in compute
    dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/base.py", line 422, in collections_to_dsk
    dsk = opt(dsk, keys, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask_histogram/core.py", line 514, in optimize
    dsk = fuse_roots(dsk, keys=keys)  # type: ignore
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/blockwise.py", line 1564, in fuse_roots
    new = toolz.merge(layer, *[layers[dep] for dep in deps])
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/toolz/dicttoolz.py", line 39, in merge
    rv.update(d)
  File "<frozen _collections_abc>", line 836, in __iter__
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/blockwise.py", line 641, in __iter__
    return iter(self._dict)
                ^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/blockwise.py", line 607, in _dict
    dsk = _make_blockwise_graph(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/blockwise.py", line 958, in _make_blockwise_graph
    itertools.product(*[range(dims[i]) for i in out_indices])
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/blockwise.py", line 958, in <listcomp>
    itertools.product(*[range(dims[i]) for i in out_indices])
                              ~~~~^^^
KeyError: '.0'

Is it really possible to fill a histogram from a dask dataframe?

I currently use:
Name: dask-histogram
Version: 2024.12.1

Name: dask
Version: 2024.12.1

Name: boost_histogram
Version: 1.4.1
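
For reference, the result expected from the fill above can be sketched eagerly with plain NumPy, bypassing dask entirely. This is only a cross-check under stated assumptions, not a fix for the KeyError: the fixed seed and the names `sumw` and `sumw2` are illustrative and not part of the original script. The two arrays correspond to what a `Weight` storage holds per bin (`value` is the sum of weights, `variance` is the sum of squared weights).

```python
import numpy as np

# Fixed seed so the cross-check is repeatable (an assumption; the
# original script draws unseeded random numbers).
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 100_000)
w = rng.uniform(0.2, 0.8, 100_000)

# Same binning as Regular(10, -3, 3): 10 equal-width bins on [-3, 3].
edges = np.linspace(-3.0, 3.0, 11)

# Per-bin sum of weights and sum of squared weights.
# Entries outside [-3, 3] are dropped here, whereas boost_histogram
# would place them in the underflow/overflow bins.
sumw, _ = np.histogram(a, bins=edges, weights=w)
sumw2, _ = np.histogram(a, bins=edges, weights=w**2)

print(sumw)
print(sumw2)
```

Once the dask-based fill works, the `value` and `variance` fields of the in-range bins should agree with these arrays for the same input data.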
