feat(source-file): Add custom http proxy support #62451
Open
aaronsteers wants to merge 19 commits into master from aj/feat/source-file/add-custom-proxy-support
Changes from 16 commits
Commits
133f35d feat(source-file): Add HTTP proxy URL and custom CA certificate support (devin-ai-integration[bot])
5997150 fix: Bump version to 0.6.0 and apply pre-commit formatting fixes (devin-ai-integration[bot])
921a0a3 docs(source-file): Add changelog entry for version 0.6.0 (devin-ai-integration[bot])
8c2bc2a Merge branch 'master' into devin/1749618047-add-http-proxy-support (aaronsteers)
35b527a docs: Add proxy investigation plan and test files for debugging (devin-ai-integration[bot])
2523de8 add working proxy script, do some clean up (aaronsteers)
389eadb refactored implementation (aaronsteers)
3d860b7 Delete airbyte-integrations/connectors/source-file/test_direct_config… (aaronsteers)
711d7fd Update airbyte-integrations/connectors/source-file/integration_tests/… (aaronsteers)
2c7d860 Apply suggestions from code review (aaronsteers)
03256f9 Merge remote-tracking branch 'origin/master' into aj/feat/source-file… (aaronsteers)
1f23fba poetry lock (aaronsteers)
ad20ad6 fix (aaronsteers)
eec798f misc updates/fixes (aaronsteers)
92f1c39 add comment to clarify random ip (aaronsteers)
e0ff9bc fix(source-file): Update proxy unit tests to match environment variab… (devin-ai-integration[bot])
54dd48b create as combined certs bundle including built-in system certs (aaronsteers)
c26570d Update airbyte-integrations/connectors/source-file/integration_tests/… (aaronsteers)
3b55028 Merge branch 'master' into aj/feat/source-file/add-custom-proxy-support (aaronsteers)
42 changes: 42 additions & 0 deletions
airbyte-integrations/connectors/source-file/integration_tests/proxy_intercept_script.py
@@ -0,0 +1,42 @@
# Copyright (c) 2025 Airbyte, Inc., all rights reserved.
"""A mitmproxy intercept script.

This proves whether the proxy is working by intercepting a specific URL
and modifying the response to return a different CSV.

Usage:
```bash
# First launch the proxy server:
uvx --from=mitmproxy mitmdump --listen-port 8080 -s integration_tests/proxy_intercept_script.py

# If the secrets file doesn't exist, create it and open it in an editor to provide the proxy CA cert:
cp integration_tests/proxy_test_config.json.template secrets/proxy_test_config.json
code secrets/proxy_test_config.json

# Now launch the connector:
poetry run python main.py discover --config secrets/proxy_test_config.json
```
"""

from mitmproxy import http


def response(flow: http.HTTPFlow) -> None:
    """Intercept ALL httpbin responses and replace the body with a fixed CSV."""
    if "httpbin.org" in flow.request.pretty_host:
        modified_csv = "intercepted_column,proxy_status\nproxy,INTERCEPTED\ntest,SUCCESS\nverification,CONFIRMED"
        flow.response.text = modified_csv
        flow.response.headers["content-type"] = "text/csv"
        flow.response.status_code = 200

        print("🎯 PROXY INTERCEPTED REQUEST!")
        print(f" URL: {flow.request.pretty_url}")
        print(f" Method: {flow.request.method}")
        print(f" User-Agent: {flow.request.headers.get('User-Agent', 'Not set')}")
        print(f" Modified response: {modified_csv}")
        print("=" * 60)


def request(flow: http.HTTPFlow) -> None:
    """Log ALL requests to prove proxy is receiving traffic."""
    print(f"📡 PROXY RECEIVED REQUEST: {flow.request.method} {flow.request.pretty_url}")
    print(f" Headers: {dict(flow.request.headers)}")
10 changes: 10 additions & 0 deletions
...yte-integrations/connectors/source-file/integration_tests/proxy_test_config.json.template
@@ -0,0 +1,10 @@
{
  "dataset_name": "proxy_investigation_test",
  "format": "csv",
  "url": "https://httpbin.org/base64/a2V5LHZhbHVlCmZvbyxiYXIKYW5zd2VyLDQyCnF1ZXN0aW9uLHdobyBrbm93cw==",
  "provider": {
    "storage": "HTTPS",
    "proxy_url": "http://localhost:8080",
    "ca_certificate": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"
  }
}
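The `url` in this template uses httpbin's `/base64/` endpoint, which returns the decoded payload as the response body. A minimal sketch of how to tell a direct response from an intercepted one: decode the payload locally and compare its CSV header with the header of the CSV hard-coded in `proxy_intercept_script.py`. The helper names `csv_header` and `was_intercepted` are hypothetical and not part of the PR.

```python
import base64
import csv
import io

# Payload copied from the `url` field of proxy_test_config.json.template.
ENCODED_PAYLOAD = "a2V5LHZhbHVlCmZvbyxiYXIKYW5zd2VyLDQyCnF1ZXN0aW9uLHdobyBrbm93cw=="
# Body substituted by proxy_intercept_script.py.
INTERCEPTED_CSV = "intercepted_column,proxy_status\nproxy,INTERCEPTED\ntest,SUCCESS\nverification,CONFIRMED"


def csv_header(text: str) -> list[str]:
    """Return the header row of a CSV document (hypothetical helper)."""
    return next(csv.reader(io.StringIO(text)))


def was_intercepted(columns: list[str]) -> bool:
    """Hypothetical check: True if the columns match the proxy's replacement CSV."""
    return columns == csv_header(INTERCEPTED_CSV)


direct_csv = base64.b64decode(ENCODED_PAYLOAD).decode("utf-8")
print(csv_header(direct_csv))       # columns expected without the proxy
print(csv_header(INTERCEPTED_CSV))  # columns expected through the proxy
```

If the columns reported by `discover` match the second header, the request went through the intercepting proxy.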
213 changes: 181 additions & 32 deletions
airbyte-integrations/connectors/source-file/poetry.lock
Large diffs are not rendered by default.
@@ -3,7 +3,7 @@ requires = ["poetry-core>=1.0.0"]
 build-backend = "poetry.core.masonry.api"

 [tool.poetry]
-version = "0.5.35"
+version = "0.6.0"
 name = "source-file"
 description = "Source implementation for File"
 authors = ["Airbyte <[email protected]>"]

@@ -47,6 +47,7 @@ pytest-mock = "^3.6.1"
 pytest = "^8.0.0"
 requests-mock = "^1.9.3"
 pytest-docker = "==3.0.0"
+ruff = "^0.12.1"


 [tool.poe]
121 changes: 121 additions & 0 deletions
airbyte-integrations/connectors/source-file/source_file/proxy.py
@@ -0,0 +1,121 @@
# Copyright (c) 2025 Airbyte, Inc., all rights reserved.
"""Proxy config constants and helper functions."""

import os
import tempfile
from logging import Logger
from pathlib import Path


# Constants for proxy configuration keys
PROXY_PARENT_CONFIG_KEY = "http_proxy"
PROXY_URL_CONFIG_KEY = "proxy_url"
PROXY_CA_CERTIFICATE_CONFIG_KEY = "proxy_ca_certificate"


# Our hard-coded exclude list:
AIRBYTE_NO_PROXY_ENTRIES = [
    # Local and loopback
    "localhost",
    "127.0.0.1",
    "*.local",
    # Cloud metadata endpoints
    "169.254.169.254",  # Special link-local IP for metadata servers (AWS, Azure, etc.)
    "metadata.google.internal",  # GCP
    # Airbyte control/telemetry
    "*.airbyte.io",
    "*.airbyte.com",
    "connectors.airbyte.com",
    # Third-party telemetry
    "sentry.io",
    "api.segment.io",
    "*.sentry.io",
    "*.datadoghq.com",
    "app.datadoghq.com",
]


def _get_no_proxy_entries_from_env_var() -> list[str]:
    """Return a list of entries from the NO_PROXY environment variable."""
    if "NO_PROXY" in os.environ:
        return [x.strip() for x in os.environ["NO_PROXY"].split(",") if x.strip()]

    return []


def _get_no_proxy_string() -> str:
    """Return a string to be used as the NO_PROXY environment variable.

    This ensures that requests to these hosts bypass the proxy.
    """
    # Merge and dedupe our hardcoded list with any already-set `NO_PROXY` env var
    return ",".join(
        filter(
            None,  # Remove any None/Falsey values
            list(
                set(
                    # Combine and dedupe:
                    _get_no_proxy_entries_from_env_var() + AIRBYTE_NO_PROXY_ENTRIES
                )
            ),
        )
    )


def _install_ca_certificate(ca_cert_file_text: str) -> Path:
    """Install the CA certificate for the proxy.

    This involves saving the text to a local file and then setting
    the appropriate environment variables to use this certificate.

    Returns the path to the temporary CA certificate file.
    """
    with tempfile.NamedTemporaryFile(
        mode="w",
        delete=False,
        prefix="airbyte-custom-ca-cert-",
        suffix=".pem",
        encoding="utf-8",
    ) as temp_file:
        temp_file.write(ca_cert_file_text)
        temp_file.flush()

        os.environ["REQUESTS_CA_BUNDLE"] = temp_file.name
        os.environ["CURL_CA_BUNDLE"] = temp_file.name
        os.environ["SSL_CERT_FILE"] = temp_file.name

        return Path(temp_file.name).absolute()


def configure_custom_http_proxy(
    http_proxy_config: dict[str, str],
    *,
    logger: Logger,
    proxy_url: str | None = None,
    ca_cert_file_text: str | None = None,
) -> None:
    """Initialize the proxy environment variables.

    `http_proxy_config` is the connector config's "http_proxy" entry; it is
    scanned for the proxy URL and CA certificate settings.

    If `proxy_url` and/or `ca_cert_file_text` are provided, they will override the
    values in `http_proxy_config`.

    The function will no-op if neither input option provides a proxy URL.
    """
    proxy_url = proxy_url or http_proxy_config.get(PROXY_URL_CONFIG_KEY)
    ca_cert_file_text = ca_cert_file_text or http_proxy_config.get(PROXY_CA_CERTIFICATE_CONFIG_KEY)

    if proxy_url:
        logger.info(f"Using custom proxy URL: {proxy_url}")

        if ca_cert_file_text:
            # Install the CA certificate if provided, and set CA-related env vars:
            cert_file_path = _install_ca_certificate(ca_cert_file_text)
            logger.info(f"Using custom installed CA certificate: {cert_file_path!s}")

        # Set the remaining proxy config env vars:
        os.environ["NO_PROXY"] = _get_no_proxy_string()
        os.environ["HTTP_PROXY"] = proxy_url
        os.environ["HTTPS_PROXY"] = proxy_url
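`_get_no_proxy_string` merges the entries through a `set`, so the order of hosts in `NO_PROXY` can differ between runs. A minimal sketch, not the PR's implementation, of an order-preserving merge, followed by a check of how the standard library's `urllib.request` reads the resulting variables. `merge_no_proxy`, `proxies_seen_by_urllib`, and `DEFAULT_ENTRIES` are hypothetical names; `DEFAULT_ENTRIES` is a shortened stand-in for `AIRBYTE_NO_PROXY_ENTRIES`. Whether wildcard entries such as `*.local` are honored depends on the HTTP client, so the sketch only checks plain host names.

```python
import os
import urllib.request
from unittest.mock import patch

# Shortened, hypothetical stand-in for AIRBYTE_NO_PROXY_ENTRIES.
DEFAULT_ENTRIES = ["localhost", "127.0.0.1", "169.254.169.254"]


def merge_no_proxy(existing: str, defaults: list[str]) -> str:
    """Merge a comma-separated NO_PROXY value with defaults, keeping first-seen order."""
    from_env = [entry.strip() for entry in existing.split(",") if entry.strip()]
    # dict.fromkeys dedupes while preserving insertion order.
    return ",".join(dict.fromkeys(from_env + defaults))


def proxies_seen_by_urllib(proxy_url: str, no_proxy: str) -> dict[str, str]:
    """Return the proxy settings urllib derives from the environment variables."""
    env = {"HTTP_PROXY": proxy_url, "HTTPS_PROXY": proxy_url, "NO_PROXY": no_proxy}
    # patch.dict restores the real environment when the block exits.
    with patch.dict(os.environ, env, clear=True):
        return urllib.request.getproxies_environment()


no_proxy = merge_no_proxy("internal.example.com, localhost", DEFAULT_ENTRIES)
proxies = proxies_seen_by_urllib("http://localhost:8080", no_proxy)
print(no_proxy)
print(urllib.request.proxy_bypass_environment("localhost", proxies))
print(urllib.request.proxy_bypass_environment("example.com", proxies))
```

Keeping the user's existing `NO_PROXY` entries first makes the resulting value stable and easier to compare in logs and tests.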
118 changes: 118 additions & 0 deletions
airbyte-integrations/connectors/source-file/unit_tests/test_proxy_certificate_support.py
@@ -0,0 +1,118 @@
# Copyright (c) 2025 Airbyte, Inc., all rights reserved.

import os
from pathlib import Path
from unittest.mock import Mock, mock_open, patch

import pytest
from source_file.client import Client, URLFile

from airbyte_cdk.entrypoint import logger


class TestProxyCertificateSupport:
    """Test proxy and certificate support for HTTPS provider"""

    def test_https_with_proxy_only(self):
        """Test HTTPS provider with proxy_url configuration"""
        http_proxy_config = {"proxy_url": "http://proxy.company.com:8080"}

        with patch.dict("os.environ", {}, clear=True), patch("source_file.client.configure_custom_http_proxy") as mock_configure:
            client = Client(
                dataset_name="test", url="https://example.com/test.csv", provider={"storage": "HTTPS"}, http_proxy=http_proxy_config
            )

            mock_configure.assert_called_once_with(http_proxy_config=http_proxy_config, logger=logger)

    def test_https_with_certificate_only(self):
        """Test HTTPS provider with proxy_ca_certificate configuration"""
        test_cert = "-----BEGIN CERTIFICATE-----\nMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA...\n-----END CERTIFICATE-----"
        http_proxy_config = {"proxy_ca_certificate": test_cert}

        with (
            patch.dict("os.environ", {}, clear=True),
            patch("source_file.proxy._install_ca_certificate") as mock_install,
            patch("source_file.client.configure_custom_http_proxy") as mock_configure,
        ):
            mock_install.return_value = Path("/tmp/test_cert.pem")

            client = Client(
                dataset_name="test", url="https://example.com/test.csv", provider={"storage": "HTTPS"}, http_proxy=http_proxy_config
            )

            mock_configure.assert_called_once_with(http_proxy_config=http_proxy_config, logger=logger)

    def test_https_with_proxy_and_certificate(self):
        """Test HTTPS provider with both proxy_url and proxy_ca_certificate"""
        test_cert = "-----BEGIN CERTIFICATE-----\ntest\n-----END CERTIFICATE-----"
        http_proxy_config = {"proxy_url": "https://secure-proxy.company.com:3128", "proxy_ca_certificate": test_cert}

        with patch.dict("os.environ", {}, clear=True), patch("source_file.client.configure_custom_http_proxy") as mock_configure:
            client = Client(
                dataset_name="test", url="https://example.com/test.csv", provider={"storage": "HTTPS"}, http_proxy=http_proxy_config
            )

            mock_configure.assert_called_once_with(http_proxy_config=http_proxy_config, logger=logger)

    def test_https_without_proxy_or_certificate(self):
        """Test HTTPS provider without proxy or certificate (regression test)"""
        with patch.dict("os.environ", {}, clear=True), patch("source_file.client.configure_custom_http_proxy") as mock_configure:
            client = Client(dataset_name="test", url="https://example.com/test.csv", provider={"storage": "HTTPS"}, http_proxy=None)

            mock_configure.assert_not_called()

    def test_https_with_user_agent_and_proxy(self):
        """Test HTTPS provider with user_agent and proxy_url"""
        http_proxy_config = {"proxy_url": "http://proxy.test.com:8080"}

        with (
            patch.dict("os.environ", {"AIRBYTE_VERSION": "1.2.3"}, clear=True),
            patch("source_file.client.configure_custom_http_proxy") as mock_configure,
        ):
            client = Client(
                dataset_name="test",
                url="https://example.com/test.csv",
                provider={"storage": "HTTPS", "user_agent": True},
                http_proxy=http_proxy_config,
            )

            mock_configure.assert_called_once_with(http_proxy_config=http_proxy_config, logger=logger)

    def test_certificate_installation(self):
        """Test certificate installation creates temporary file and sets environment variables"""
        test_cert = "-----BEGIN CERTIFICATE-----\ntest\n-----END CERTIFICATE-----"

        with patch("tempfile.NamedTemporaryFile") as mock_temp_file, patch.dict("os.environ", {}, clear=True):
            mock_file = mock_open()
            mock_temp_file.return_value.__enter__.return_value = mock_file.return_value
            mock_file.return_value.name = "/tmp/test_cert.pem"

            from source_file.proxy import _install_ca_certificate

            result_path = _install_ca_certificate(test_cert)

            mock_file.return_value.write.assert_called_once_with(test_cert)
            mock_file.return_value.flush.assert_called_once()

            assert os.environ.get("REQUESTS_CA_BUNDLE") == "/tmp/test_cert.pem"
            assert os.environ.get("CURL_CA_BUNDLE") == "/tmp/test_cert.pem"
            assert os.environ.get("SSL_CERT_FILE") == "/tmp/test_cert.pem"

    def test_proxy_environment_variables_set(self):
        """Test that proxy configuration sets the correct environment variables"""
        http_proxy_config = {
            "proxy_url": "http://proxy.test.com:8080",
            "proxy_ca_certificate": "-----BEGIN CERTIFICATE-----\ntest\n-----END CERTIFICATE-----",
        }

        with patch.dict("os.environ", {}, clear=True), patch("source_file.proxy._install_ca_certificate") as mock_install:
            mock_install.return_value = Path("/tmp/test_cert.pem")

            from source_file.proxy import configure_custom_http_proxy

            configure_custom_http_proxy(http_proxy_config=http_proxy_config, logger=logger)

            assert os.environ.get("HTTP_PROXY") == "http://proxy.test.com:8080"
            assert os.environ.get("HTTPS_PROXY") == "http://proxy.test.com:8080"
            assert "NO_PROXY" in os.environ
            mock_install.assert_called_once_with("-----BEGIN CERTIFICATE-----\ntest\n-----END CERTIFICATE-----")
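`test_certificate_installation` mocks `tempfile.NamedTemporaryFile`, so it does not check what actually lands on disk. A minimal sketch of a complementary check that writes a real temporary file and reads it back. `install_ca_certificate` and `check_certificate_round_trip` are hypothetical stand-ins written for this example; the first is modeled on `_install_ca_certificate` but is not imported from the connector.

```python
import os
import tempfile
from pathlib import Path
from unittest.mock import patch

TEST_CERT = "-----BEGIN CERTIFICATE-----\ntest\n-----END CERTIFICATE-----"


def install_ca_certificate(ca_cert_file_text: str) -> Path:
    """Hypothetical stand-in: write the cert to a temp file and export its path."""
    with tempfile.NamedTemporaryFile(
        mode="w", delete=False, prefix="example-ca-cert-", suffix=".pem", encoding="utf-8"
    ) as temp_file:
        temp_file.write(ca_cert_file_text)
    for var in ("REQUESTS_CA_BUNDLE", "CURL_CA_BUNDLE", "SSL_CERT_FILE"):
        os.environ[var] = temp_file.name
    return Path(temp_file.name).absolute()


def check_certificate_round_trip() -> bool:
    """Return True if the written file and the environment variables match expectations."""
    # patch.dict snapshots os.environ and restores it when the block exits.
    with patch.dict(os.environ):
        cert_path = install_ca_certificate(TEST_CERT)
        try:
            content_ok = cert_path.read_text(encoding="utf-8") == TEST_CERT
            env_ok = Path(os.environ["SSL_CERT_FILE"]) == cert_path
        finally:
            cert_path.unlink()  # delete=False means cleanup is the caller's job
    return content_ok and env_ok


print(check_certificate_round_trip())
```

Because the file is created with `delete=False`, the sketch removes it explicitly once the assertions have been evaluated.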