Commit 8ab6971

advanced transformation examples
1 parent d1d0b54 commit 8ab6971

1 file changed

+66
-0
lines changed

docs/src/man/working_with_dataframes.md

Lines changed: 66 additions & 0 deletions
@@ -830,6 +830,72 @@ julia> combine(df, names(df) .=> sum, names(df) .=> prod)
If you would prefer the result to have the same number of rows as the source
data frame, use `select` instead of `combine`.

Note that a `DataFrame` can store values of any type in its columns. For example,
below we show how one can store a `Tuple`:

```
julia> df2 = combine(df, All() .=> extrema)
1×2 DataFrame
 Row │ A_extrema  B_extrema
     │ Tuple…     Tuple…
─────┼───────────────────────
   1 │ (1, 4)     (1.0, 4.0)
```

Later you might want to expand the tuples into separate columns storing the computed
minima and maxima. This can be achieved by passing multiple target column names for the output.
The example below does this with a function, so that the target column names
are generated from the source column names:

```
julia> combine(df2, All() .=> identity .=> [c -> first(c) .* ["_min", "_max"]])
1×4 DataFrame
 Row │ A_min  A_max  B_min    B_max
     │ Int64  Int64  Float64  Float64
─────┼────────────────────────────────
   1 │     1      4      1.0      4.0
```

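To see what the naming function in the example above produces, it can be called on its own in plain Julia. This is only a sketch: `make_names` is a hypothetical name introduced here, and it assumes the function receives the source column name as a string. Since `first` returns just the first character of that string, the approach relies on the source columns having one-letter prefixes.

```julia
# Hypothetical stand-in for the naming function used in the example above.
make_names = c -> first(c) .* ["_min", "_max"]

# `first` takes only the first character of the source column name.
make_names("A_extrema")      # ["A_min", "A_max"]

# A longer, made-up source name is truncated to its first character as well.
make_names("Price_extrema")  # ["P_min", "P_max"]
```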
Note that in this example we needed to pass `identity` explicitly, as otherwise the
function `c -> first(c) .* ["_min", "_max"]` would be treated as a transformation
and not as a rule for generating target column names.

You might want to transform the source data frame into the result
shown above in one step. This can be achieved with the following expression:

```
julia> combine(df, All() .=> Ref∘extrema .=> [c -> c .* ["_min", "_max"]])
1×4 DataFrame
 Row │ A_min  A_max  B_min    B_max
     │ Int64  Int64  Float64  Float64
─────┼────────────────────────────────
   1 │     1      4      1.0      4.0
```

Note that in this case we needed to add a `Ref` call in the `Ref∘extrema` operation specification.
This is needed because otherwise `combine` iterates the contents of the value returned
by the operation specification function and tries to expand it. In our case that value is a tuple of numbers,
so one gets an error:

```
julia> combine(df, names(df) .=> extrema .=> [c -> c .* ["_min", "_max"]])
ERROR: ArgumentError: 'Tuple{Int64, Int64}' iterates 'Int64' values,
which doesn't satisfy the Tables.jl `AbstractRow` interface
```

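The difference between a bare tuple and a `Ref`-wrapped tuple can be seen in plain Julia, independently of DataFrames.jl. The sketch below only illustrates what iteration yields in each case. It does not reproduce how `combine` processes the value internally, and the input vector is made up for illustration.

```julia
# Illustrative input; any numeric vector would do.
t = extrema([1, 2, 3, 4])   # (1, 4)

# Iterating the bare tuple yields the individual numbers.
collect(t)                  # [1, 4]

# A Ref holds exactly one value, and iterating it yields the whole tuple.
r = Ref(t)
length(r)                   # 1
first(r)                    # (1, 4)
r[]                         # (1, 4), dereferencing gives the same value
```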
Note that we used `Ref` as it is the container typically used in DataFrames.jl when one
wants to store one value. In general, however, it could be another iterator. Here is an example
in which the tuple returned by `extrema` is wrapped in a `Tuple`, producing the same result:

```
julia> combine(df, names(df) .=> tuple∘extrema .=> [c -> c .* ["_min", "_max"]])
1×4 DataFrame
 Row │ A_min  A_max  B_min    B_max
     │ Int64  Int64  Float64  Float64
─────┼────────────────────────────────
   1 │     1      4      1.0      4.0
```

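As a small illustration of the composition used above, `tuple∘extrema` first applies `extrema` and then passes its result to `tuple`, giving a one-element tuple whose only element is the pair of extrema. The name `wrap` and the input vector are assumptions made for this sketch.

```julia
# Hypothetical name for the composed function from the example above.
wrap = tuple ∘ extrema

# Illustrative input; the result is a one-element tuple holding the extrema pair.
w = wrap([1.0, 2.0, 3.0, 4.0])   # ((1.0, 4.0),)

length(w)                        # 1
first(w)                         # (1.0, 4.0)
```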
## Handling of Columns Stored in a `DataFrame`

Functions that transform a `DataFrame` to produce a
