@@ -830,6 +830,72 @@ julia> combine(df, names(df) .=> sum, names(df) .=> prod)
If you would prefer the result to have the same number of rows as the source
data frame, use `select` instead of `combine`.

+ Note that a `DataFrame` can store values of any type in its columns. For example,
+ below we show how one can store a `Tuple`:
+
+ ```
+ julia> df2 = combine(df, All() .=> extrema)
+ 1×2 DataFrame
+  Row │ A_extrema  B_extrema
+      │ Tuple…     Tuple…
+ ─────┼───────────────────────
+    1 │ (1, 4)     (1.0, 4.0)
+ ```
+
+ Later you might want to expand the tuples into separate columns storing the computed
+ minima and maxima. This can be achieved by passing multiple target column names for the output.
+ In the example below we show how this can be done in combination with a function
+ so that we can generate target column names conditional on source column names:
+
+ ```
+ julia> combine(df2, All() .=> identity .=> [c -> first(c) .* ["_min", "_max"]])
+ 1×4 DataFrame
+  Row │ A_min  A_max  B_min    B_max
+      │ Int64  Int64  Float64  Float64
+ ─────┼────────────────────────────────
+    1 │     1      4      1.0      4.0
+ ```
+
+ Note that in this example we needed to pass `identity` explicitly, as otherwise the
+ function `c -> first(c) .* ["_min", "_max"]` would be treated as a transformation
+ and not as a rule for generating target column names.
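
To see what the target-name function produces in isolation, the sketch below calls the same anonymous function by hand using only Base Julia. It assumes the function receives the source column name as a string; the names `name_rule` and `targets` are illustrative and not part of DataFrames.jl.

```julia
# Illustrative sketch: the naming rule from the example above, called by hand.
# `name_rule` and `targets` are hypothetical names used only here.
name_rule = c -> first(c) .* ["_min", "_max"]

# `first` on a string returns its first character, so "A_extrema" gives 'A';
# broadcasting `*` then concatenates that character with each suffix.
targets = name_rule("A_extrema")

println(targets)  # ["A_min", "A_max"]
```

Because the rule keeps only the first character, it fits single-letter column names such as `A` and `B`; longer names would need a different way of stripping the `_extrema` suffix.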
+
+ You might want to transform the source data frame into the result
+ we have just shown in one step. This can be achieved with the following expression:
+
+ ```
+ julia> combine(df, All() .=> Ref∘extrema .=> [c -> c .* ["_min", "_max"]])
+ 1×4 DataFrame
+  Row │ A_min  A_max  B_min    B_max
+      │ Int64  Int64  Float64  Float64
+ ─────┼────────────────────────────────
+    1 │     1      4      1.0      4.0
+ ```
+
+ Note that in this case we needed to add a `Ref` call in the `Ref∘extrema` operation specification.
+ This is needed because otherwise `combine` iterates the contents of the value returned
+ by the operation specification function, which in our case is a tuple of numbers, and tries
+ to expand it, so one gets an error:
+
+ ```
+ julia> combine(df, names(df) .=> extrema .=> [c -> c .* ["_min", "_max"]])
+ ERROR: ArgumentError: 'Tuple{Int64, Int64}' iterates 'Int64' values,
+ which doesn't satisfy the Tables.jl `AbstractRow` interface
+ ```
+
+ Note that we used `Ref` as it is the container typically used in DataFrames.jl when one
+ wants to store a single value; in general, however, it could be another iterator. Here is an example
+ in which the tuple returned by `extrema` is wrapped in a `Tuple`, producing the same result:
+
+ ```
+ julia> combine(df, names(df) .=> tuple∘extrema .=> [c -> c .* ["_min", "_max"]])
+ 1×4 DataFrame
+  Row │ A_min  A_max  B_min    B_max
+      │ Int64  Int64  Float64  Float64
+ ─────┼────────────────────────────────
+    1 │     1      4      1.0      4.0
+ ```
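
The difference between the failing and the working calls can be seen without DataFrames.jl at all. The sketch below uses only Base Julia and a made-up vector `v` (not the `df` from the examples) to compare what iteration yields for the bare result of `extrema` with what it yields once the result is wrapped.

```julia
# Illustrative sketch in plain Julia; `v` is made-up data, not the `df` above.
v = [1, 2, 3, 4]

bare = extrema(v)               # the tuple (1, 4)
wrapped = (tuple ∘ extrema)(v)  # the one-element tuple ((1, 4),)

# Iterating the bare tuple yields the two numbers one by one.
println(collect(bare))          # [1, 4]

# Iterating the wrapped tuple yields a single element: the whole (min, max) pair.
println(collect(wrapped))       # [(1, 4)]

# `Ref` likewise holds the pair as one value, retrieved with `[]`.
println((Ref ∘ extrema)(v)[])   # (1, 4)
```

In both wrapped forms the `(min, max)` pair stays together as one unit, which is what allows it to be spread across the two target columns.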
+
## Handling of Columns Stored in a `DataFrame`

Functions that transform a `DataFrame` to produce a