@@ -812,14 +812,21 @@ julia> df = DataFrame(A=1:4, B=4.0:-1.0:1.0)
    3 │     3      2.0
    4 │     4      1.0
 
-julia> combine(df, names(df) .=> sum)
+julia> combine(df, All() .=> sum)
 1×2 DataFrame
  Row │ A_sum  B_sum
      │ Int64  Float64
 ─────┼────────────────
    1 │    10     10.0
 
-julia> combine(df, names(df) .=> sum, names(df) .=> prod)
+julia> combine(df, All() .=> sum, All() .=> prod)
+1×4 DataFrame
+ Row │ A_sum  B_sum    A_prod  B_prod
+     │ Int64  Float64  Int64   Float64
+─────┼─────────────────────────────────
+   1 │    10     10.0      24     24.0
+
+julia> combine(df, All() .=> [sum prod]) # the same using 2-dimensional broadcasting
 1×4 DataFrame
  Row │ A_sum  B_sum    A_prod  B_prod
      │ Int64  Float64  Int64   Float64
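One way to confirm that the `All()` form added in this hunk can replace the removed `names(df)` form is to compare the two results directly. This is a sketch that assumes a DataFrames.jl 1.x release in which `All()` can be broadcast with `.=>`, as the examples in this diff rely on. The variable names are illustrative only.

```julia
using DataFrames

df = DataFrame(A=1:4, B=4.0:-1.0:1.0)

# Removed form: broadcast `=>` over the vector of column names.
old_style = combine(df, names(df) .=> sum)

# Added form: broadcast over the `All()` selector instead.
new_style = combine(df, All() .=> sum)

# 2-dimensional broadcasting: two columns against a 1×2 row of functions
# gives a 2×2 grid of operations; the output order A_sum, B_sum, A_prod, B_prod
# matches the table printed above.
sum_and_prod = combine(df, All() .=> [sum prod])

println(new_style)
println(sum_and_prod)
```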
@@ -830,7 +837,11 @@ julia> combine(df, names(df) .=> sum, names(df) .=> prod)
 If you would prefer the result to have the same number of rows as the source
 data frame, use `select` instead of `combine`.
 
-Note that a `DataFrame` can store values of any type as its columns, for example
+In the remainder of this section we will discuss some of the more advanced topics
+related to operation specification syntax, so you may decide to skip them if you
+want to focus on the most common usage patterns.
+
+A `DataFrame` can store values of any type as its columns. For example,
 below we show how one can store a `Tuple`:
 
 ```
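The `Tuple` storage described just above can be reproduced with a short sketch, again assuming DataFrames.jl 1.x. The name `df2` matches the one used in the following hunk; the checks only look at the generated column names and the stored tuples.

```julia
using DataFrames

df = DataFrame(A=1:4, B=4.0:-1.0:1.0)

# `extrema` returns a (minimum, maximum) tuple for each column, and `combine`
# stores each whole tuple in a single cell of a one-row result.
df2 = combine(df, All() .=> extrema)

println(df2)
```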
@@ -844,27 +855,51 @@ julia> df2 = combine(df, All() .=> extrema)
 
 Later you might want to expand the tuples into separate columns storing the computed
 minima and maxima. This can be achieved by passing multiple columns for the output.
-In the example below we show how this can be done in combination with a function
-so that we can generate target column names conditional on source column names:
+Here is an example of how this can be done by writing the column names by hand for a single
+input column:
+
+```
+julia> combine(df2, "A_extrema" => identity => ["A_min", "A_max"])
+1×2 DataFrame
+ Row │ A_min  A_max
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+```
+
+You can extend it to handle all columns in `df2` using broadcasting:
 
 ```
-julia> combine(df2, All() .=> identity .=> [c -> first(c) .* ["_min", "_max"]])
+julia> combine(df2, All() .=> identity .=> [["A_min", "A_max"], ["B_min", "B_max"]])
 1×4 DataFrame
  Row │ A_min  A_max  B_min    B_max
      │ Int64  Int64  Float64  Float64
 ─────┼────────────────────────────────
    1 │     1      4      1.0      4.0
 ```
 
-Note that in this example we needed to pass `identity` explicitly as otherwise the
-functions generated with `c -> first(c) .* ["_min", "_max"]` would be treated as transformations
-and not as rules for target column names generation.
+This approach works, but can be improved. Instead of writing all the column names
+manually, we can use a function to specify target column names
+conditional on source column names:
+
+```
+julia> combine(df2, All() .=> identity .=> c -> first(c) .* ["_min", "_max"])
+1×4 DataFrame
+ Row │ A_min  A_max  B_min    B_max
+     │ Int64  Int64  Float64  Float64
+─────┼────────────────────────────────
+   1 │     1      4      1.0      4.0
+```
+
+Note that in this example we needed to pass `identity` explicitly, because in
+`All() => (c -> first(c) .* ["_min", "_max"])` the right-hand side would be
+treated as a transformation and not as a rule for generating target column names.
 
 You might want to perform the transformation of the source data frame into the result
 we have just shown in one step. This can be achieved with the following expression:
 
 ```
-julia> combine(df, All() .=> Ref∘extrema .=> [c -> c .* ["_min", "_max"]])
+julia> combine(df, All() .=> Ref∘extrema .=> c -> c .* ["_min", "_max"])
 1×4 DataFrame
  Row │ A_min  A_max  B_min    B_max
      │ Int64  Int64  Float64  Float64
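The three naming approaches in this hunk (explicit names, a naming function applied to `df2`, and the one-step `Ref∘extrema` form applied to `df`) should all produce the same table. The following is a sketch under the same DataFrames.jl 1.x assumption. The comment on `first(c)` is our reading of the printed output rather than something the text states: for a single source column, the naming function appears to receive the name as a `String`, so `first(c)` is its first character.

```julia
using DataFrames

df = DataFrame(A=1:4, B=4.0:-1.0:1.0)
df2 = combine(df, All() .=> extrema)

# Target names written by hand: one vector of names per source column.
by_hand = combine(df2, All() .=> identity .=> [["A_min", "A_max"], ["B_min", "B_max"]])

# Target names generated from the source name (assumed to be a String such as
# "A_extrema"), so `first(c)` is 'A' and `.*` appends both suffixes to it.
by_rule = combine(df2, All() .=> identity .=> c -> first(c) .* ["_min", "_max"])

# One step from `df`: `Ref` wraps each tuple so that it is expanded as one row.
one_step = combine(df, All() .=> Ref∘extrema .=> c -> c .* ["_min", "_max"])

println(one_step)
```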
@@ -873,28 +908,18 @@ julia> combine(df, All() .=> Ref∘extrema .=> [c -> c .* ["_min", "_max"]])
 ```
 
 Note that in this case we needed to add a `Ref` call in the `Ref∘extrema` operation specification.
-The reason why this is needed is that instead `combine` iterates the contents of the value returned
-by the operation specification function and tries to expand it, which in our case is a tuple of numbers,
+Without `Ref`, `combine` iterates the contents of the value returned by the operation specification function,
+which in our case is a tuple of numbers, and tries to expand it assuming that each produced value specifies one row,
 so one gets an error:
 
 ```
-julia> combine(df, names(df) .=> extrema .=> [c -> c .* ["_min", "_max"]])
+julia> combine(df, All() .=> extrema .=> [c -> c .* ["_min", "_max"]])
 ERROR: ArgumentError: 'Tuple{Int64, Int64}' iterates 'Int64' values,
 which doesn't satisfy the Tables.jl `AbstractRow` interface
 ```
 
 Note that we used `Ref` as it is a container that is typically used in DataFrames.jl when one
-wants to store one value, however, in general it could be another iterator. Here is an example
-when the tuple returned by `extrema` is wrapped in a `Tuple`, producing the same result:
-
-```
-julia> combine(df, names(df) .=> tuple∘extrema .=> [c -> c .* ["_min", "_max"]])
-1×4 DataFrame
- Row │ A_min  A_max  B_min    B_max
-     │ Int64  Int64  Float64  Float64
-─────┼────────────────────────────────
-   1 │     1      4      1.0      4.0
-```
+wants to store one row; however, in general it could be another iterator (e.g. a tuple).
 
 ## Handling of Columns Stored in a `DataFrame`
 
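The role of the `Ref` wrapper, and the `tuple∘extrema` alternative from the removed lines, can be compared in one script. This is a sketch assuming DataFrames.jl 1.x. `name_rule` is an illustrative name, and the last check simply expects the `ArgumentError` shown earlier when no wrapper is used.

```julia
using DataFrames

df = DataFrame(A=1:4, B=4.0:-1.0:1.0)
name_rule = c -> c .* ["_min", "_max"]

# `Ref` holds exactly one element, so the (min, max) tuple inside it is read as one row.
with_ref = combine(df, All() .=> Ref∘extrema .=> name_rule)

# Wrapping the result in a 1-tuple plays the same role as `Ref`.
with_tuple = combine(df, All() .=> tuple∘extrema .=> name_rule)

# With no wrapper, the (min, max) tuple itself is iterated as if it held rows,
# which is expected to raise an ArgumentError.
unwrapped_fails = try
    combine(df, All() .=> extrema .=> name_rule)
    false
catch err
    err isa ArgumentError
end

println(with_ref)
```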