Commit cf62a94

apply review suggestions
1 parent 8ab6971 commit cf62a94

1 file changed

+49
-24
lines changed

docs/src/man/working_with_dataframes.md

Lines changed: 49 additions & 24 deletions
@@ -812,14 +812,21 @@ julia> df = DataFrame(A=1:4, B=4.0:-1.0:1.0)
 3 │ 3 2.0
 4 │ 4 1.0

-julia> combine(df, names(df) .=> sum)
+julia> combine(df, All() .=> sum)
 1×2 DataFrame
 Row │ A_sum B_sum
 │ Int64 Float64
 ─────┼────────────────
 1 │ 10 10.0

-julia> combine(df, names(df) .=> sum, names(df) .=> prod)
+julia> combine(df, All() .=> sum, All() .=> prod)
+1×4 DataFrame
+Row │ A_sum B_sum A_prod B_prod
+│ Int64 Float64 Int64 Float64
+─────┼─────────────────────────────────
+1 │ 10 10.0 24 24.0
+
+julia> combine(df, All() .=> [sum prod]) # the same using 2-dimensional broadcasting
 1×4 DataFrame
 Row │ A_sum B_sum A_prod B_prod
 │ Int64 Float64 Int64 Float64
@@ -830,7 +837,11 @@ julia> combine(df, names(df) .=> sum, names(df) .=> prod)
 If you would prefer the result to have the same number of rows as the source
 data frame, use `select` instead of `combine`.

-Note that a `DataFrame` can store values of any type as its columns, for example
+In the remainder of this section we will discuss some of the more advanced topics
+related to operation specification syntax, so you may decide to skip them if you
+want to focus on the most common usage patterns.
+
+A `DataFrame` can store values of any type as its columns, for example
 below we show how one can store a `Tuple`:

 ```
@@ -844,27 +855,51 @@ julia> df2 = combine(df, All() .=> extrema)

 Later you might want to expand the tuples into separate columns storing the computed
 minima and maxima. This can be achieved by passing multiple columns for the output.
-In the example below we show how this can be done in combination with a function
-so that we can generate target column names conditional on source column names:
+Here is an example of how this can be done by writing the column names by hand for a single
+input column:
+
+```
+julia> combine(df2, "A_extrema" => identity => ["A_min", "A_max"])
+1×2 DataFrame
+Row │ A_min A_max
+│ Int64 Int64
+─────┼──────────────
+1 │ 1 4
+```
+
+You can extend it to handle all columns in `df2` using broadcasting:

 ```
-julia> combine(df2, All() .=> identity .=> [c -> first(c) .* ["_min", "_max"]])
+julia> combine(df2, All() .=> identity .=> [["A_min", "A_max"], ["B_min", "B_max"]])
 1×4 DataFrame
 Row │ A_min A_max B_min B_max
 │ Int64 Int64 Float64 Float64
 ─────┼────────────────────────────────
 1 │ 1 4 1.0 4.0
 ```

-Note that in this example we needed to pass `identity` explicitly as otherwise the
-functions generated with `c -> first(c) .* ["_min", "_max"]` would be treated as transformations
-and not as rules for target column names generation.
+This approach works, but can be improved. Instead of writing all the column names
+manually we can use a function as a way to specify target column names
+conditional on source column names:
+
+```
+julia> combine(df2, All() .=> identity .=> c -> first(c) .* ["_min", "_max"])
+1×4 DataFrame
+Row │ A_min A_max B_min B_max
+│ Int64 Int64 Float64 Float64
+─────┼────────────────────────────────
+1 │ 1 4 1.0 4.0
+```
+
+Note that in this example we needed to pass `identity` explicitly as with
+`All() => (c -> first(c) .* ["_min", "_max"])` the right-hand side part would be
+treated as a transformation and not as a rule for target column names generation.

 You might want to perform the transformation of the source data frame into the result
 we have just shown in one step. This can be achieved with the following expression:

 ```
-julia> combine(df, All() .=> Ref∘extrema .=> [c -> c .* ["_min", "_max"]])
+julia> combine(df, All() .=> Ref∘extrema .=> c -> c .* ["_min", "_max"])
 1×4 DataFrame
 Row │ A_min A_max B_min B_max
 │ Int64 Int64 Float64 Float64
@@ -873,28 +908,18 @@ julia> combine(df, All() .=> Ref∘extrema .=> [c -> c .* ["_min", "_max"]])
 ```

 Note that in this case we needed to add a `Ref` call in the `Ref∘extrema` operation specification.
-The reason why this is needed is that instead `combine` iterates the contents of the value returned
-by the operation specification function and tries to expand it, which in our case is a tuple of numbers,
+Without `Ref`, `combine` iterates the contents of the value returned by the operation specification function,
+which in our case is a tuple of numbers, and tries to expand it assuming that each produced value specifies one row,
 so one gets an error:

 ```
-julia> combine(df, names(df) .=> extrema .=> [c -> c .* ["_min", "_max"]])
+julia> combine(df, All() .=> extrema .=> [c -> c .* ["_min", "_max"]])
 ERROR: ArgumentError: 'Tuple{Int64, Int64}' iterates 'Int64' values,
 which doesn't satisfy the Tables.jl `AbstractRow` interface
 ```

 Note that we used `Ref` as it is a container that is typically used in DataFrames.jl when one
-wants to store one value, however, in general it could be another iterator. Here is an example
-when the tuple returned by `extrema` is wrapped in a `Tuple`, producing the same result:
-
-```
-julia> combine(df, names(df) .=> tuple∘extrema .=> [c -> c .* ["_min", "_max"]])
-1×4 DataFrame
-Row │ A_min A_max B_min B_max
-│ Int64 Float64
-─────┼────────────────────────────────
-1 │ 1 4 1.0 4.0
-```
+wants to store one row; however, in general it could be another iterator (e.g. a tuple).

 ## Handling of Columns Stored in a `DataFrame`

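A note on the `All() .=> [sum prod]` line added in the first hunk: the sketch below is not part of the commit and only illustrates what two-dimensional broadcasting of `.=>` produces. It assumes that `names(df)` can stand in for `All()` so that the intermediate matrix of pairs can be inspected directly; the variable names `ops` and `res` are illustrative only.

```julia
using DataFrames

df = DataFrame(A=1:4, B=4.0:-1.0:1.0)

# A 2-element vector of column names broadcast against a 1×2 matrix of
# functions gives a 2×2 matrix of `name => function` pairs.
ops = names(df) .=> [sum prod]
@show size(ops)

# `combine` walks the matrix in column-major order, so both sums come
# before both products in the result.
res = combine(df, ops)
@show names(res)
```

This ordering is why the output columns in the hunk read `A_sum B_sum A_prod B_prod` rather than being grouped by source column.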
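A note on the `Ref∘extrema` discussion in the last two hunks: the sketch below is not part of the commit. It applies the same idea to a single column with hand-written target names, combining the `=> ["A_min", "A_max"]` form from the third hunk with the `Ref` wrapping. The variable name `res` is illustrative only.

```julia
using DataFrames

df = DataFrame(A=1:4, B=4.0:-1.0:1.0)

# `extrema` returns one tuple (minimum, maximum); wrapping it in `Ref`
# makes `combine` treat the tuple as a single row whose two elements
# are assigned to the two target columns.
res = combine(df, :A => Ref∘extrema => ["A_min", "A_max"])
@show res
```

Dropping `Ref` here would hand `combine` the bare tuple, which is the situation the hunk shows ending in the `ArgumentError`.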