-
-
Notifications
You must be signed in to change notification settings - Fork 3
WIP: POC Redesign extraction #256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This would specify "factor columns from |
Yes, I'm glad it is easy to understand.
Not sure I fully understand your question. Variables can also be manually set, for example
The only limitation is that the function provided should return a logical vector. In terms of implementation, the function is applied to the names and to the values, precisely to accommodate cases like this. |
I don't get this one. I would think it would return columns where the values are in all caps. |
The same function is applied to column names and column values? 🤔 |
It could mean both things imho. But here it is specifying variables, not values. For filtering by value there is td <- within(teal.data::teal_data(), {
df <- data.frame(A = as.factor(letters[1:5]), Ab = letters[1:5])
m <- matrix()
})
spec <- datasets("df") & variables(function(x){x==toupper(x)})
resolver(spec, td) Still selects A as the name is in capital letters
Yes, if there is a single argument that should accept |
Thanks, it does explain things a bit. |
I updated the branch to make it easy to reflect updates on selections from the possible choices. Besides some internal functions, it now exports a new key function Here is a full example with a shiny module and teal: library("teal")
library("shiny")
# Initialize the teal app
mod <- function(label, x, y) {
module(
ui = function(id, x, y) {
ns <- NS(id)
div(
module_input_ui(ns("x"), "x", x),
module_input_ui(ns("y"), "y", y),
shiny::tagList(
shiny::textOutput(ns("text"))
)
)
},
server = function(id, data, x, y) {
moduleServer(id, function(input, output, session) {
x_in <- module_input_server("x", x, data)
y_in <- module_input_server("y", y, data)
output$text <- shiny::renderText({
req(x_in(), y_in())
l <- lapply(
list(x = x_in, y = y_in),
function(sel) {
if (sum(lengths(sel()))) {
paste0(
"Object: ", sel()$datasets, "\nVariables: ",
paste0(sel()$variables, collapse = ", ")
)
} else {
"No info??"
}
}
)
unlist(l)
})
})
},
ui_args = list(x = x, y = y),
server_args = list(x = x, y = y)
)
} Here I create three different specifications, you can pass them to the app and they will work: iris_numeric <- datasets("IRIS") & variables(is.numeric)
mtcars_char <- datasets("MTCARS") & variables(is.character)
iris_numeric_or_mtcars_char <- iris_numeric | mtcars_char
app <- init(
data = teal.data::teal_data(IRIS = iris, MTCARS = mtcars),
modules = modules(
mod(x = datasets("IRIS") & variables(is.factor),
y = datasets("MTCARS") & variables(is.numeric))
)
)
shinyApp(app$ui, app$server) |
I've been experimenting a bit and it works well for some simple cases: env <- within(teal.data::teal_data(), {
iris <- iris
iris2 <- cbind(iris, A = 1)
m <- diag(5)
d <- cbind(m, B = 1)
}) However it easily fails when multiple inputs are acceptable: env2 <- within(env, {
iris2$a <- iris2$Species
})
# Get the levels of the factors:
sapply(env2, function(x){if(is.data.frame(x) & any(sapply(x, is.factor))){
levels(x[ , sapply(x, is.factor)])
}}) We would still to ensure that the output is valid within some limits (only allow named logical values? only allow the names of data objects, variables, or values?). When multiple options are valid on the previous step evaluating in further steps could create a confusion on how to limit to those selected by the user or on the structure returned (list of list of lists). I am thinking for example on the allowing selecting all the levels from variables that starts with ARM but at most one variable can be selected). This direct evaluation makes a tad harder to compose the extraction. Say we want the variable ARM of ADSL and then in a different input the ARMCD. With the current design we can reuse the dataset specification and append the variable specification. With a direct function, app developers will need to do that themselves: adam <- within(teal.data::teal_data(), {
penguins <- penguins
ADSL <- data.frame(ARM = 1, ARMCD = 2)
})
is_ADSL <- function(x) {
x == "ADSL"
}
is_ARM <- function(x) {
x == "ARM"
}
is_ARMCD <- function(x) {
x == "ARMCD"
}
sapply(names(adam), function(x, env){
if (is_ADSL(x)) {
nam <- names(env[[x]])
nam[is_ARM(nam)]
}
}, env = adam)
#> $ADSL
#> [1] "ARM"
#>
#> $penguins
#> NULL
sapply(names(adam), function(x, env){
if (is_ADSL(x)) {
nam <- names(env[[x]])
nam[is_ARMCD(nam)]
}
}, env = adam)
#> $ADSL
#> [1] "ARMCD"
#>
#> $penguins
#> NULL Created on 2025-04-23 with reprex v2.1.1 We might still need to evaluate several times the function in case the data has changed due to user updated selections. So the benefit of having a single eager function is diminished. On the plus side I like that this would make it a lot easier for new/different classes (On the current implementation I think there is room for improvement on supporting them). But I don't think there will be requests/interest to many new classes (Bioconductor emphasis on reusing them will make it easy to support different classes that inherit from a single one). But let's see the team's and @gogonzo feedback. |
If it comes to the readability - please consider
Then we would require a function of
Yes. We can make a simple decorator with some "validation" on the returned value. That validation might need some additional information though.
Definitely yes. Note the flow: filter panel -> encoding -> output. Encoding needs to be reactive on the changes in filter panel.
I was thinking about this as this might be one of the biggest challenges in here. Recently I am more interacting with some sort of configuration files in which things are more explicit. How about enabling different logic depending on the dataset selected via simple case when clause?
|
|
One of the feedback from Nina was if there would be a sensible default for app developers. I guess if we go to this implementation it could be even simpler as the app developer could pick the first of whatever makes sense for their modules.
This validation is what led to multiple methods required for new classes on the current design. Accessing and extracting is one thing but then we need to make sure the values retrieved makes sense.
Having a switch clause won't help reducing the length of code when the input to show the users is from the same dataset. Soon we will have requests to provide a template/default or simplify that part of the API. Even if we didn't, these switch clauses would be repeated on multiple user input it could becomes too verbose imho. PS: @chlebowa feel free to give some feedback about the design itself too. Not sure if you like the dual purpose of the functions provided on the current implementation or if you would prefer for easy extension having direct access to the data. PS2: As the increment is about to finish we'll need to take a decision soon to where this is heading and the next steps. |
Signed-off-by: Lluís Revilla <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing logical operators implemented on type classes will simplify the code in multiple places. In POC we can still use c()
in the examples and app wouldn't fail. API should be only a replacement to the "explicit" class specification.
# make sure this works
c(
datasets(is.data.frame, select = "df1"),
variables(is.numeric)
)
# this is just an alternative syntax to above
datasets(is.data.frame, select = "df1") & variables(is.numeric)
Additionaly, we don't have transform-constructor and it is only possible to create a "transform"
using c()
or logical-operators, which is the opposite of how the design should be implemented. Problem is that transform
is already a base method and we can't create a different function than transform(_data, ...)
. Probably, we need other name for this class.
helper_input <- function(id, | ||
label, | ||
multiple = FALSE) { | ||
shiny::selectInput( | ||
id, | ||
label, | ||
choices = NULL, | ||
selected = NULL, | ||
multiple = multiple | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is short enough to use it directly
R/types.R
Outdated
"Invalid selection" = check_input(type) | ||
) | ||
if (is.function(x)) { | ||
x <- list(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it a list?
x <- list(x) | |
x <- list(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I has some challenges combining different types when a single function is provided and this was the simplest way to solve them that I found. If we go with direct access to the qenv with tidyselect helpers we will need to replace that and capture the expression or something similar.
R/resolver.R
Outdated
|
||
# Evaluate if the function applied to the data | ||
# but we need to return the name of the data received | ||
functions_data <- function(spec_criteria, names_data, data) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
functions_names
and function_data
could be replaced to handle tidyselect
syntax. Implementation is pretty simple:
foo <- function(data, ...) {
eval_select(rlang::expr(c(...)), data)
}
iris |> foo(starts_with("Sepal"))
# Sepal.Length Sepal.Width
# 1 2
iris |> foo(where(is.numeric))
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1 2 3 4
Remaining question is how to implement eval_select
for datasets
or for other classes. maybe we should have own "eval_select" equivalent to handle multiple types (qenv, data.frame)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works for example:
teal.code::qenv() |> within({
aa <- 1
ab <- 2
ba <- 3
bb <- NULL
}) |> as.list() |> foo(starts_with("a") | where(is.null), -aa)
# ab bb
# 4 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems much cleaner than what I implemented 👍👍👍. The exploration for other classes I think is worth it, specially seeing it already works with lists (which I thought it wouldn't work). Extending to multiple classes might be then simpler
R/ops_transform.R
Outdated
@@ -0,0 +1,134 @@ | |||
#' @export | |||
Ops.transform <- function(e1, e2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After digging in the implementation I think tidyselect
inspired syntax is little bit off from the origin. I think tidyselect
makes sense for names
argument e.g. datasets(names = starts_with(...))
.
Tidyselect allows to combine different "selection helpers" together with logical operators like datasets(starts_with("Sepal") | !ends_with("Width"))
. I find combining "type" objects (datasets
, variables
, ...) with logical operators complicated and confusing. For example what would be a result of:
datasets(is.data.frame, select = "iris") & variables(starts_with("Sepal")) &
!datasets(is.data.frame, select = "mtcars") & variables(is.numeric)
I think it doesn't make sense to copy tidyselect syntax and maintain the operators on our side. And I would stick to use operators for a single argument (names
) rather than for the whole "transform" object. Maintaining ops for the whole transform class might be (and it is already) very complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After digging in the implementation I think
tidyselect
inspired syntax is little bit off from the origin. I thinktidyselect
makes sense fornames
argument e.g.datasets(names = starts_with(...))
.
So you wouldn't allow tidyselect syntax on the select =
? And I take that you would rename x
to names
on the types 👍 .
Tidyselect allows to combine different "selection helpers" together with logical operators like
datasets(starts_with("Sepal") | !ends_with("Width"))
. I find combining "type" objects (datasets
,variables
, ...) with logical operators complicated and confusing. For example what would be a result of:datasets(is.data.frame, select = "iris") & variables(starts_with("Sepal")) & !datasets(is.data.frame, select = "mtcars") & variables(is.numeric)
The negation currently only takes into account the specification on x
argument (as you can see with (!datasets(is.data.frame, select = "mtcars"))
), so this would theoretically results in an empty selection as it would select all data.frames and then exclude them.
Given !
precedence it is only applied to the datasets, so it would be possible to select numeric variables and all those that start with Sepal of the data.frames selected (none on this case).
Currently there is no method for !
on the "transform" class. ie this returns an error: !(datasets(is.data.frame, select = "mtcars") & variables(is.numeric))
.
datasets(is.data.frame, select = "iris") & variables(starts_with("Sepal")) &
!datasets(is.data.frame, select = "mtcars") & variables(is.numeric)
I think it doesn't make sense to copy tidyselect syntax and maintain the operators on our side.
The problem with relying on tidyselect is that it only applies to data.frames, it won't work with other classes not even matrices. If this is okay, and we try to apply some tricks as Pawel suggested above for other methods I agree this in principle is less work for us.
And I would stick to use operators for a single argument (
names
) rather than for the whole "transform" object. Maintaining ops for the whole transform class might be (and it is already) very complex.
I agree that operators are more complex than what I thought initially. This is partly due to operators working by classes. If we keep them for the names
argument and (only?) allow tidyselect
helpers then we don't even need do do that ourselves.
Co-authored-by: Dawid Kałędkowski <[email protected]> Signed-off-by: Lluís Revilla <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You made me realize that tidyselect is easier and a good path forward 👍
Removing logical operators implemented on type classes will simplify the code in multiple places.
Will do that, as they will only be applied within the tidyselect environment.
In POC we can still use c() in the examples and app wouldn't fail. API should be only a replacement to the "explicit" class specification.
# make sure this works c( datasets(is.data.frame, select = "df1"), variables(is.numeric) ) # this is just an alternative syntax to above datasets(is.data.frame, select = "df1") & variables(is.numeric)Additionaly, we don't have transform-constructor and it is only possible to create a "transform" using c() or logical-operators, which is the opposite of how the design should be implemented.
Sorry, I first thought about the grammar and then on the implementation and this reflects on the code.
Problem is that transform is already a base method and we can't create a different function than transform(_data, ...). Probably, we need other name for this class.
What would be a good name? Some ideas:
specification
but it can be technical and doesn't explain what it does,selection
closer todplyr::select
and might help users,input
closer to shiny/web development.
R/types.R
Outdated
"Invalid selection" = check_input(type) | ||
) | ||
if (is.function(x)) { | ||
x <- list(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I has some challenges combining different types when a single function is provided and this was the simplest way to solve them that I found. If we go with direct access to the qenv with tidyselect helpers we will need to replace that and capture the expression or something similar.
R/ops_transform.R
Outdated
@@ -0,0 +1,134 @@ | |||
#' @export | |||
Ops.transform <- function(e1, e2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After digging in the implementation I think
tidyselect
inspired syntax is little bit off from the origin. I thinktidyselect
makes sense fornames
argument e.g.datasets(names = starts_with(...))
.
So you wouldn't allow tidyselect syntax on the select =
? And I take that you would rename x
to names
on the types 👍 .
Tidyselect allows to combine different "selection helpers" together with logical operators like
datasets(starts_with("Sepal") | !ends_with("Width"))
. I find combining "type" objects (datasets
,variables
, ...) with logical operators complicated and confusing. For example what would be a result of:datasets(is.data.frame, select = "iris") & variables(starts_with("Sepal")) & !datasets(is.data.frame, select = "mtcars") & variables(is.numeric)
The negation currently only takes into account the specification on x
argument (as you can see with (!datasets(is.data.frame, select = "mtcars"))
), so this would theoretically results in an empty selection as it would select all data.frames and then exclude them.
Given !
precedence it is only applied to the datasets, so it would be possible to select numeric variables and all those that start with Sepal of the data.frames selected (none on this case).
Currently there is no method for !
on the "transform" class. ie this returns an error: !(datasets(is.data.frame, select = "mtcars") & variables(is.numeric))
.
datasets(is.data.frame, select = "iris") & variables(starts_with("Sepal")) &
!datasets(is.data.frame, select = "mtcars") & variables(is.numeric)
I think it doesn't make sense to copy tidyselect syntax and maintain the operators on our side.
The problem with relying on tidyselect is that it only applies to data.frames, it won't work with other classes not even matrices. If this is okay, and we try to apply some tricks as Pawel suggested above for other methods I agree this in principle is less work for us.
And I would stick to use operators for a single argument (
names
) rather than for the whole "transform" object. Maintaining ops for the whole transform class might be (and it is already) very complex.
I agree that operators are more complex than what I thought initially. This is partly due to operators working by classes. If we keep them for the names
argument and (only?) allow tidyselect
helpers then we don't even need do do that ourselves.
R/resolver.R
Outdated
|
||
# Evaluate if the function applied to the data | ||
# but we need to return the name of the data received | ||
functions_data <- function(spec_criteria, names_data, data) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems much cleaner than what I implemented 👍👍👍. The exploration for other classes I think is worth it, specially seeing it already works with lists (which I thought it wouldn't work). Extending to multiple classes might be then simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor comments b/c I got a commit notification.
…sengineering/teal.transform into redesign_extraction@main
Linked to insightsengineering/NEST-roadmap#36
Pull Request
Redesign extraction
This branch/PR is a POC of specification of user input to be extracted.
See tests added about how it works (currently most if not all of them should work), but simple example of specification:
Features:
Examples
Resolve input to be used by module developer
The
module_input_ui
andmodule_input_server
help the developer to get the required data.Merge arbitrary data
The
merge_module_srv
accepts the inputs and merge arbitrary data processed by themodule_input_server
, this is done thanks toextract_input
the responsible to obtain the requested data fromdata()
.MAE
TODO
Data extraction and preparation design.
UI update based on partial resolution.
Check it is simple for app developers.
Use tidyselect instead of own operators and resolution.
Update API to not rely on operators on this package (rely only on those on tidyselect)
Test extraction on non data.frame objects: MAE objects/matrices, lists...
Check missing corners: test required changes on current modules:
tmg: tm_missing_data, tm_g_distribution, and tm_data_table;
tmc: tm_a_mmrm and tm_t_events
Only up to the server body, check the defaults,
8 hours each module, mmrm = 10 hours.
Verify value specification: Some modules request variable A vs other variables or a numeric input.
20 hours
Verify bookmarking works
Verify it works on multiple data merges from multiple sources (cfr: [Bug]: Forced merge precludes using modules without joinin keys #258 )
Check it is simple for module developers (less important).
UI validation: removing unnecessary input UI or gray out when no input is possible (just one choice). Example: selection that allows to select data.frames if they have column A, there is no need to select a variable.
Documentation/help
Check transition/notify dependencies/users
Update module packages to use that feature
Set sensible defaults to specifications
What should be the constructor of the function shuold have as a default for the extract spec (i.e. is it data.frame? take all numeric columns? Only select the first one (with all available options)?
Each module make have different requirements.