-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Docker in Docker compatibility: add param "host_temp_dir" to get_source() and get_destination() #644
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughThis PR refactors temporary directory handling across several modules by centralizing the configuration to a single constant ( Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant ExecutorUtil as "Executor Util"
participant Constants as "Constants Module"
participant Docker as "Docker Executor"
User->>ExecutorUtil: Call get_connector_executor(host_temp_dir)
alt host_temp_dir is provided
ExecutorUtil->>ExecutorUtil: Use the provided host_temp_dir
else
ExecutorUtil->>Constants: Retrieve TEMP_DIR
ExecutorUtil->>ExecutorUtil: Use TEMP_DIR as default
end
ExecutorUtil->>Docker: Initialize Docker execution with CLI args mapping
Docker->>Docker: map_cli_args: if container_path == DEFAULT_AIRBYTE_CONTAINER_TEMP_DIR \n then map to TEMP_DIR
Docker-->>User: Run container with mapped temporary directory
Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
airbyte/_executors/util.py (1)
239-239
: 💡 Verification agent🧩 Analysis chain
Type annotation issue in the volumes dictionary
There's a type mismatch here. The DockerExecutor expects
dict[Path, str] | None
for the volumes parameter, but sincehost_temp_dir
can be astr
, we're passingdict[Path | str, str]
.Should we convert
host_temp_dir
to a Path object to maintain type consistency, wdyt?
🏁 Script executed:
#!/bin/bash # Check the DockerExecutor class for its volumes parameter type annotation cat airbyte/_executors/docker.py | grep -n "volumes" | head -n 15Length of output: 399
Fix required: Adjust volumes dictionary type annotations
In
airbyte/_executors/util.py
at line 239, the keyhost_temp_dir
might be a string, but since the DockerExecutor expectsvolumes
as adict[Path, str]
(as seen inairbyte/_executors/docker.py
), we could converthost_temp_dir
to aPath
(e.g.,Path(host_temp_dir)
) to ensure type consistency. Would you consider that change? wdyt?
🧹 Nitpick comments (3)
airbyte/_executors/docker.py (1)
80-85
: Consider variable naming in the conditional temp directory mapping.The current implementation overwrites the loop variable
local_volume
when the container path matches the default temporary directory. This works but overwriting loop variables can make code harder to follow.Maybe we could use a clearer approach? wdyt?
- # If the container path corresponds to the container's temporary directory, - # then the local temporary directory path is used (as the local volume - # path can be overridden). - if container_path == DEFAULT_AIRBYTE_CONTAINER_TEMP_DIR: - local_volume = TEMP_DIR + # If the container path corresponds to the container's temporary directory, + # then the local temporary directory path is used (as the local volume + # path can be overridden). + effective_local_volume = local_volume + if container_path == DEFAULT_AIRBYTE_CONTAINER_TEMP_DIR: + effective_local_volume = TEMP_DIRAnd then use
effective_local_volume
in the next conditional block.🧰 Tools
🪛 Ruff (0.8.2)
84-84:
for
loop variablelocal_volume
overwritten by assignment target(PLW2901)
🪛 GitHub Actions: Run Linters
[warning] 84-84: PLW2901
for
loop variablelocal_volume
overwritten by assignment targetairbyte/constants.py (1)
68-70
: Simplify the environment variable access logic.The current implementation uses both
os.environ
andos.getenv()
to check and access the environment variable, which could lead to a KeyError if the variable doesn't exist.How about using only
os.getenv()
for a cleaner implementation? wdyt?-TEMP_DIR: Path = Path( - os.environ["AIRBYTE_TEMP_DIR"] if os.getenv("AIRBYTE_TEMP_DIR") else tempfile.gettempdir() -) +TEMP_DIR: Path = Path(os.getenv("AIRBYTE_TEMP_DIR") or tempfile.gettempdir())Also, you'll need to add two blank lines before this definition to fix the formatting error.
🧰 Tools
🪛 Ruff (0.8.2)
68-68: Expected 2 blank lines after class or function definition, found (1)
Add missing blank line(s)
(E305)
airbyte/sources/util.py (1)
99-101
: Fix the line length in docstring.The line exceeds the maximum line length of 100 characters (it's 103 characters).
How about breaking it into two lines? wdyt?
- host_temp_dir: If set, along with docker_image, this replaces the volume exposing the temporary + host_temp_dir: If set, along with docker_image, this replaces the volume exposing the + temporary files directory, ensuring compatibility when the Docker engine runs on a different host🧰 Tools
🪛 Ruff (0.8.2)
99-99: Line too long (103 > 100)
(E501)
🪛 GitHub Actions: Run Linters
[error] 99-99: E501 Line too long (103 > 100)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
airbyte/_executors/docker.py
(2 hunks)airbyte/_executors/util.py
(3 hunks)airbyte/_util/temp_files.py
(2 hunks)airbyte/constants.py
(2 hunks)airbyte/destinations/util.py
(3 hunks)airbyte/sources/util.py
(3 hunks)
🧰 Additional context used
🧠 Learnings (1)
airbyte/_util/temp_files.py (1)
Learnt from: niyasrad
PR: airbytehq/PyAirbyte#368
File: airbyte/_util/temp_files.py:26-27
Timestamp: 2025-03-26T16:16:35.341Z
Learning: When `OVERRIDE_TEMP_DIR` is `None`, we should not supply the `dir` argument to `tempfile` functions to preserve the existing behavior.
🪛 Ruff (0.8.2)
airbyte/constants.py
68-68: Expected 2 blank lines after class or function definition, found (1)
Add missing blank line(s)
(E305)
airbyte/_executors/docker.py
84-84: for
loop variable local_volume
overwritten by assignment target
(PLW2901)
airbyte/sources/util.py
99-99: Line too long (103 > 100)
(E501)
airbyte/destinations/util.py
60-60: Line too long (103 > 100)
(E501)
🪛 GitHub Actions: Run Linters
airbyte/constants.py
[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Please run 'ruff format' to fix code style issues in this file.
airbyte/_executors/docker.py
[warning] 84-84: PLW2901 for
loop variable local_volume
overwritten by assignment target
airbyte/sources/util.py
[error] 99-99: E501 Line too long (103 > 100)
airbyte/_executors/util.py
[error] 258-258: Argument "volumes" to "DockerExecutor" has incompatible type "dict[Path | str, str]"; expected "dict[Path, str] | None" [arg-type]
airbyte/destinations/util.py
[error] 60-60: E501 Line too long (103 > 100)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (6)
airbyte/sources/util.py (1)
58-58
: LGTM! The new parameter implementation looks good.The
host_temp_dir
parameter is well-typed and properly passed to the underlying executor. This addition will improve compatibility in Docker-in-Docker environments.Also applies to: 123-124
airbyte/destinations/util.py (2)
31-31
: LGTM: Great addition for Docker-in-Docker support!The addition of this parameter allows for greater flexibility when working with Docker containers where host paths may differ from container paths.
77-77
: LGTM: Properly passing the new parameterThe
host_temp_dir
parameter is correctly passed to the executor creation function.airbyte/_executors/util.py (3)
18-18
: LGTM: Updated import to use centralized TEMP_DIRThe import has been updated to use the centralized
TEMP_DIR
constant rather than the previousTEMP_DIR_OVERRIDE
.
134-134
: LGTM: Consistent parameter additionThe
host_temp_dir
parameter addition here matches the one inget_destination
, providing a consistent API across the codebase.
229-231
: LGTM: Proper fallback for host_temp_dirThe code now uses
TEMP_DIR
as a fallback whenhost_temp_dir
is not provided, which centralizes the temporary directory configuration.
@@ -30,7 +30,7 @@ def as_temp_files(files_contents: list[dict | str]) -> Generator[list[str], Any, | |||
mode="w+t", | |||
delete=False, | |||
encoding="utf-8", | |||
dir=TEMP_DIR_OVERRIDE or None, | |||
dir=TEMP_DIR, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Consider the previous behavior with temporary directories.
According to previous feedback, when the temporary directory override is None
, the code should not supply the dir
argument to preserve existing behavior. With the current change, TEMP_DIR
is always provided.
Would it make sense to check if TEMP_DIR
corresponds to the system's default temp directory and omit the parameter in that case? wdyt?
🏁 Script executed:
#!/bin/bash
# Description: Check the difference between system temp dir and TEMP_DIR
# Get the temp dir from tempfile and compare with TEMP_DIR
cat airbyte/constants.py | grep -A 3 'TEMP_DIR: Path'
python -c "import tempfile; print('System temp dir:', tempfile.gettempdir())"
Length of output: 328
Conditional TEMP_DIR Parameter Suggestion
It looks like we're always passing TEMP_DIR
when calling the tempfile function. Given that in airbyte/constants.py
, TEMP_DIR
is set to tempfile.gettempdir()
when no override is provided (i.e., when AIRBYTE_TEMP_DIR
is not set), wouldn’t it be better to conditionally omit the dir
argument so that we preserve the original behavior? For instance, you might consider:
- Only supplying
dir=TEMP_DIR
ifTEMP_DIR
does not equaltempfile.gettempdir()
- Retaining the default behavior when the temporary directory override is effectively
None
Would it make sense to implement a check like this? wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TEMP_DIR
is never None and checking that TEMP_DIR
is different from tempfile.gettempdir()
seems unnecessary, just verbose, since the result will be the same. wdyt?
airbyte/destinations/util.py
Outdated
host_temp_dir: If set, along with docker_image, this replaces the volume exposing the temporary | ||
files directory, ensuring compatibility when the Docker engine runs on a different host | ||
(e.g., Docker in Docker), where paths may differ. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line exceeds maximum length - consider reformatting?
The docstring line exceeds the 100 character limit (103 > 100), which is failing the linter checks. Could we maybe break this into multiple lines for better readability, wdyt?
- host_temp_dir: If set, along with docker_image, this replaces the volume exposing the temporary
- files directory, ensuring compatibility when the Docker engine runs on a different host
- (e.g., Docker in Docker), where paths may differ.
+ host_temp_dir: If set, along with docker_image, this replaces the volume exposing the
+ temporary files directory, ensuring compatibility when the Docker engine runs on a
+ different host (e.g., Docker in Docker), where paths may differ.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
host_temp_dir: If set, along with docker_image, this replaces the volume exposing the temporary | |
files directory, ensuring compatibility when the Docker engine runs on a different host | |
(e.g., Docker in Docker), where paths may differ. | |
host_temp_dir: If set, along with docker_image, this replaces the volume exposing the | |
temporary files directory, ensuring compatibility when the Docker engine runs on a | |
different host (e.g., Docker in Docker), where paths may differ. |
🧰 Tools
🪛 Ruff (0.8.2)
60-60: Line too long (103 > 100)
(E501)
🪛 GitHub Actions: Run Linters
[error] 60-60: E501 Line too long (103 > 100)
68cfa92
to
11e6840
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
airbyte/constants.py (2)
68-70
: New TEMP_DIR implementationThe TEMP_DIR constant now provides a more robust way to determine the temporary directory, either from an environment variable or falling back to the system default. This supports the Docker-in-Docker compatibility goal nicely!
One small style issue: there's a linting error about missing blank lines after function definition. Would you mind adding another blank line before TEMP_DIR declaration to satisfy the linter? wdyt?
def _str_to_bool(value: str) -> bool: """Convert a string value of an environment values to a boolean value.""" return bool(value) and value.lower() not in {"", "0", "false", "f", "no", "n", "off"} TEMP_DIR: Path = Path(🧰 Tools
🪛 Ruff (0.8.2)
68-68: Expected 2 blank lines after class or function definition, found (1)
Add missing blank line(s)
(E305)
68-70
: Minor simplification of conditional expressionThe conditional expression for determining TEMP_DIR could be simplified slightly. Currently, it uses both
os.environ["KEY"]
andos.getenv("KEY")
in the same expression. Sinceos.getenv()
already handles the case when the environment variable doesn't exist, you could simplify this.Would this cleaner approach work for you? wdyt?
-TEMP_DIR: Path = Path( - os.environ["AIRBYTE_TEMP_DIR"] if os.getenv("AIRBYTE_TEMP_DIR") else tempfile.gettempdir() -) +TEMP_DIR: Path = Path( + os.getenv("AIRBYTE_TEMP_DIR") or tempfile.gettempdir() +)🧰 Tools
🪛 Ruff (0.8.2)
68-68: Expected 2 blank lines after class or function definition, found (1)
Add missing blank line(s)
(E305)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
airbyte/_executors/docker.py
(2 hunks)airbyte/_executors/util.py
(3 hunks)airbyte/_util/temp_files.py
(2 hunks)airbyte/constants.py
(2 hunks)airbyte/destinations/util.py
(3 hunks)airbyte/sources/util.py
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- airbyte/destinations/util.py
- airbyte/sources/util.py
- airbyte/_util/temp_files.py
- airbyte/_executors/util.py
🧰 Additional context used
🪛 Ruff (0.8.2)
airbyte/_executors/docker.py
84-84: for
loop variable local_volume
overwritten by assignment target
(PLW2901)
airbyte/constants.py
68-68: Expected 2 blank lines after class or function definition, found (1)
Add missing blank line(s)
(E305)
🪛 GitHub Actions: Run Linters
airbyte/_executors/docker.py
[warning] 84-84: PLW2901 for
loop variable local_volume
overwritten by assignment target
airbyte/constants.py
[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Please run 'ruff format' to fix code style issues in this file.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
airbyte/_executors/docker.py (1)
11-11
: New import added for TEMP_DIR constantThe import of TEMP_DIR from constants module is introduced to support the new temporary directory path handling capability. This aligns well with the goal of creating Docker-in-Docker compatibility.
airbyte/constants.py (1)
7-7
: Import of tempfile moduleGood addition of the tempfile module to access the system's temporary directory location. This enables the code to fallback to the system's temp directory when no environment variable is specified.
11e6840
to
e69dff3
Compare
… methods (docker in docker compatibility)
e69dff3
to
0cb8c4f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
airbyte/constants.py (2)
68-70
: Could we simplify the environment variable check, wdyt?The current implementation uses both
os.getenv()
andos.environ[]
for the same environment variable. A simpler approach might be:TEMP_DIR: Path = Path( - os.environ["AIRBYTE_TEMP_DIR"] if os.getenv("AIRBYTE_TEMP_DIR") else tempfile.gettempdir() + os.getenv("AIRBYTE_TEMP_DIR", tempfile.gettempdir()) )This change avoids the potential
KeyError
from direct dictionary access and accomplishes the same goal with less code.🧰 Tools
🪛 Ruff (0.8.2)
68-68: Expected 2 blank lines after class or function definition, found (1)
Add missing blank line(s)
(E305)
71-79
: Are there implications when using host_temp_dir with Docker in Docker?The docstring explains the general usage of TEMP_DIR well. Since this PR is specifically addressing Docker in Docker compatibility, would it be helpful to add a note about how this constant relates to the new
host_temp_dir
parameter and any considerations for Docker in Docker environments?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
airbyte/_executors/docker.py
(2 hunks)airbyte/_executors/util.py
(3 hunks)airbyte/_util/temp_files.py
(2 hunks)airbyte/constants.py
(2 hunks)airbyte/destinations/util.py
(3 hunks)airbyte/sources/util.py
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- airbyte/sources/util.py
- airbyte/_util/temp_files.py
- airbyte/destinations/util.py
- airbyte/_executors/docker.py
- airbyte/_executors/util.py
🧰 Additional context used
🪛 Ruff (0.8.2)
airbyte/constants.py
68-68: Expected 2 blank lines after class or function definition, found (1)
Add missing blank line(s)
(E305)
🪛 GitHub Actions: Run Linters
airbyte/constants.py
[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Please run 'ruff format' to fix code style issues in this file.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
airbyte/constants.py (1)
7-7
: Good addition of the tempfile module.The tempfile import is appropriately added to support the new TEMP_DIR implementation. This aligns well with the PR objective of enhancing Docker in Docker compatibility by centralizing temporary directory configuration.
|
||
TEMP_DIR_OVERRIDE: Path | None = ( | ||
Path(os.environ["AIRBYTE_TEMP_DIR"]) if os.getenv("AIRBYTE_TEMP_DIR") else None | ||
TEMP_DIR: Path = Path( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an additional blank line after function definition.
The linter expects 2 blank lines after a function definition, but there's only 1. This is causing the pipeline to fail with a formatting error.
def _str_to_bool(value: str) -> bool:
"""Convert a string value of an environment values to a boolean value."""
return bool(value) and value.lower() not in {"", "0", "false", "f", "no", "n", "off"}
TEMP_DIR: Path = Path(
You can fix this and other formatting issues by running ruff format
as suggested in the pipeline failure.
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.8.2)
68-68: Expected 2 blank lines after class or function definition, found (1)
Add missing blank line(s)
(E305)
Hello @nicob3y, I apologize for the delay in reviewing your contribution. Could you please send me a DM on Slack so we can continue the review? |
Compatibility with an environment where Docker is used and its engine is not running on the same host as PyAirbyte (e.g., Docker in Docker).
The issue lies in accessing temporary files during container execution. If the paths to these files are not identical in the PyAirbyte and Docker environments, the system cannot function properly.
To make this configuration usable, I propose adding a new parameter
host_temp_dir
to theget_source()
andget_destination()
methods.This parameter will override the temporary directory path of the volume mounted in the container; I have adapted the
map_cli_args()
method of theDockerExecutor
to be compatible with this parameter.Summary by CodeRabbit
New Features
Refactor
host_temp_dir
parameter for better clarity and functionality.Documentation