Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more logging options #630

Open
yohannj opened this issue Mar 21, 2025 · 0 comments
Open

Add more logging options #630

yohannj opened this issue Mar 21, 2025 · 0 comments

Comments

@yohannj
Copy link
Contributor

yohannj commented Mar 21, 2025

Context

Current logging strategy

Currently PyAirbyte forces the different loggers to write to disk (get_global_file_logger, new_passthrough_file_logger for connectors and get_global_stats_logger for global stats).

Those loggers can be seen as being hardcoded in the codebase. get_global_file_logger and get_global_stats_logger being singletons and new_passthrough_file_logger being an hardcoded logger implementation that each connector relies on by default.

The only configuration available is using an environment variable called AIRBYTE_LOGGING_ROOT to determine where should those logs be written to.

First rework approach

A first approach has been worked on to allow overriding the global logger (PR), but it would still be missing the loggers used by connectors.

Special behavior

One of the important feature of the existing implementation is that new_passthrough_file_logger chooses a random filename to store logs on disk.
It allows running PyAirbyte multiple times in a row while keeping separate files.

Design suggestion

Proposal 1: Choose logger behavior using environment variables

We could propose three modes:

  • FILE_ONLY: Current behavior, the default
  • CONSOLE_ONLY: Writing on stdout/stderr
  • FILE_AND_CONSOLE: Keep the current behavior but also write all logs on stdout/stderr

Proposal 2: Add customization methods in logs.py (introduces breaking changes)

We can split get_global_file_logger and new_passthrough_file_logger into the following methods:

# Global logger
def get_global_logger() -> logging.Logger:
    pass

def set_global_logger_handlers(*handlers: logging.Handler) -> None:
    pass

def global_logger_file_handler() -> logging.FileHandler:
    pass

# Global stats logger
def get_global_stats_logger() -> logging.Logger:
	pass

def set_global_stats_logger_handlers(*handlers: logging.Handler) -> None:
    pass

def global_stats_logger_file_handler() -> logging.FileHandler:
    pass

# Passthrough logger
def new_passthrough_logger(connector_name: str) -> logging.Logger:
    pass

def set_passthrough_logger_factory_handlers(*handlers: Callable[[str, ULID], logging.Handler]) -> None:
    pass

def passthrough_logger_file_factory_handler(connector_name: str, ulid: ULID) -> logging.FileHandler:
    pass

Proposal 2's breaking change

While the default behavior will be the same, functions will not have the same name.
We could keep the current function names and only add new ones and add the set_... and ..._handler methods, but the code quality would drop.

Rejected design ideas

We could rely on a dictConfig or fileConfig.
However it doesn't fit well with the need to have a random file name on disk to avoid mixing logs from a connector being used in different situations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant