Closed
Description
When combine is applied to a grouped DataFrame (created with groupby), the shape of the output depends on the element type of the column being aggregated. Specifically, aggregating the vector-valued column v with :v=>sum is expected to return one array per group. Instead, the elements of each summed array are spread over separate rows as scalar values, so the result has four rows instead of two.
julia> a = DataFrame(k=[1,1,2,2], v=[[1,2],[3,4],[5,6],[7,8]], t=[1,2,3,4])
4×3 DataFrame
 Row │ k      v       t
     │ Int64  Array…  Int64
─────┼──────────────────────
   1 │     1  [1, 2]      1
   2 │     1  [3, 4]      2
   3 │     2  [5, 6]      3
   4 │     2  [7, 8]      4
julia> combine(groupby(a, :k), :v=>sum)
4×2 DataFrame
 Row │ k      v_sum
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     1      6
   3 │     2     12
   4 │     2     14
julia> combine(groupby(a, :k), :t=>sum)
2×2 DataFrame
 Row │ k      t_sum
     │ Int64  Int64
─────┼──────────────
   1 │     1      3
   2 │     2      7
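A likely explanation, assuming DataFrames.jl's documented rule that combine spreads a vector returned by a transformation over several rows: for each group, sum over the v column adds the element vectors together and therefore returns a vector rather than a scalar. The sketch below (illustrative only, reusing the data from this report) checks this for one group and for the full combine call.

```julia
using DataFrames

a = DataFrame(k=[1, 1, 2, 2], v=[[1, 2], [3, 4], [5, 6], [7, 8]], t=[1, 2, 3, 4])

# Summing the vectors in group k == 1 adds them element-wise: [1, 2] + [3, 4]
group_sum = sum(a.v[a.k .== 1])
println(group_sum)  # [4, 6]

# Each group's result is a vector, so combine spreads it over rows,
# giving four rows (two per group) instead of two
res = combine(groupby(a, :k), :v => sum)
println(nrow(res))  # 4
```

The t column does not show this effect because sum over integers returns a single Int64 per group.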
Expected:
julia> combine(groupby(a, :k), :v=>sum)
2×2 DataFrame
 Row │ k      v_sum
     │ Int64  Array…
─────┼────────────────
   1 │     1  [4, 6]
   2 │     2  [12, 14]
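If the goal is one array per group, as in the expected output, one possible workaround is to wrap the per-group result in Ref. This assumes DataFrames.jl's documented behavior of unwrapping a Ref-wrapped return value and storing it as a single cell; it is a sketch, not a description of how this issue was resolved. The output column name v_sum is chosen explicitly here for illustration.

```julia
using DataFrames

a = DataFrame(k=[1, 1, 2, 2], v=[[1, 2], [3, 4], [5, 6], [7, 8]], t=[1, 2, 3, 4])

# Wrapping the per-group vector in Ref makes combine treat it as one value,
# so each group produces a single row whose v_sum cell holds the whole array
res = combine(groupby(a, :k), :v => (x -> Ref(sum(x))) => :v_sum)
println(res)
```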