Commit 52c3935

kacpermuda authored and dauinh committed
chore: Update docstring for DatabaseInfo in OpenLineage provider (apache#45638)
* chore: Update docstring for DatabaseInfo in OpenLineage provider

  Signed-off-by: Kacper Muda <[email protected]>

* chore: Update docstring for DatabaseInfo in OpenLineage provider

  Signed-off-by: Kacper Muda <[email protected]>

---------

Signed-off-by: Kacper Muda <[email protected]>
1 parent 1e7ec69 commit 52c3935

1 file changed: +61 -4 lines changed

providers/src/airflow/providers/openlineage/sqlparser.py

Lines changed: 61 additions & 4 deletions
@@ -82,10 +82,67 @@ class DatabaseInfo:
     :param database: Takes precedence over parsed database name.
     :param information_schema_columns: List of columns names from information schema table.
     :param information_schema_table_name: Information schema table name.
-    :param use_flat_cross_db_query: Specifies if single information schema table should be used
-        for cross-database queries (e.g. for Redshift).
-    :param is_information_schema_cross_db: Specifies if information schema contains
-        cross-database data.
+    :param use_flat_cross_db_query: Specifies whether a single, "global" information schema table should
+        be used for cross-database queries (e.g., in Redshift), or if multiple, per-database "local"
+        information schema tables should be queried individually.
+
+        If True, assumes a single, universal information schema table is available
+        (for example, in Redshift, the `SVV_REDSHIFT_COLUMNS` view)
+        [https://docs.aws.amazon.com/redshift/latest/dg/r_SVV_REDSHIFT_COLUMNS.html].
+        In this mode, we query only `information_schema_table_name` directly.
+        Depending on the `is_information_schema_cross_db` argument, you can also filter
+        by database name in the WHERE clause.
+
+        If False, treats each database as having its own local information schema table containing
+        metadata for that database only. As a result, one query per database may be generated
+        and then combined (often via `UNION ALL`).
+        This approach is necessary for dialects that do not maintain a single global view of
+        all metadata or that require per-database queries.
+        Depending on the `is_information_schema_cross_db` argument, queries can
+        include or omit database information in both identifiers and filters.
+
+        See `is_information_schema_cross_db` which also affects how final queries are constructed.
+    :param is_information_schema_cross_db: Specifies whether database information should be tracked
+        and included in queries that retrieve schema information from the information_schema_table.
+        In short, this determines whether queries are capable of spanning multiple databases.
+
+        If True, database identifiers are included wherever applicable, allowing retrieval of
+        metadata from more than one database. For instance, in Snowflake or MS SQL
+        (where each database is treated as a top-level namespace), you might have a query like:
+
+        ```
+        SELECT ...
+        FROM db1.information_schema.columns WHERE ...
+        UNION ALL
+        SELECT ...
+        FROM db2.information_schema.columns WHERE ...
+        ```
+
+        In Redshift, setting this to True together with `use_flat_cross_db_query=True` allows
+        adding database filters to the query, for example:
+
+        ```
+        SELECT ...
+        FROM SVV_REDSHIFT_COLUMNS
+        WHERE
+        SVV_REDSHIFT_COLUMNS.database == db1 # This is skipped when False
+        AND SVV_REDSHIFT_COLUMNS.schema == schema1
+        AND SVV_REDSHIFT_COLUMNS.table IN (table1, table2)
+        OR ...
+        ```
+
+        However, certain databases (e.g., PostgreSQL) do not permit true cross-database queries.
+        In such dialects, enabling cross-database support may lead to errors or be unnecessary.
+        Always consult your dialect's documentation or test sample queries to confirm if
+        cross-database querying is supported.
+
+        If False, database qualifiers are ignored, effectively restricting queries to a single
+        database (or making the database-level qualifier optional). This is typically
+        safer for databases that do not support cross-database operations or only provide a
+        two-level namespace (schema + table) instead of a three-level one (database + schema + table).
+        For example, some MySQL or PostgreSQL contexts might not need or permit cross-database queries at all.
+
+        See `use_flat_cross_db_query` which also affects how final queries are constructed.
     :param is_uppercase_names: Specifies if database accepts only uppercase names (e.g. Snowflake).
     :param normalize_name_method: Method to normalize database, schema and table names.
         Defaults to `name.lower()`.
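
To make the interaction of the two flags easier to follow, here is a minimal, standalone sketch of the query shapes the new docstring describes. It is not the provider's implementation. The helper `build_information_schema_query`, the `tables` mapping (database -> schema -> table names), and the column names `database_name`, `schema_name`, and `table_name` are hypothetical placeholders chosen for illustration. The sketch also assumes that in per-database mode the database appears only in the identifier, not in the filter. Note that the docstring's `==` is pseudo-SQL; the sketch emits `=`.

```python
from __future__ import annotations


def quote(value: str) -> str:
    """Render a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def predicate(db: str | None, schema: str, names: list[str], with_db_filter: bool) -> str:
    """Build one parenthesised filter; column names are illustrative placeholders."""
    parts = []
    if with_db_filter and db is not None:
        parts.append(f"database_name = {quote(db)}")
    parts.append(f"schema_name = {quote(schema)}")
    in_list = ", ".join(quote(name) for name in names)
    parts.append(f"table_name IN ({in_list})")
    return "(" + " AND ".join(parts) + ")"


def build_information_schema_query(
    tables: dict[str | None, dict[str, list[str]]],
    information_schema_table_name: str,
    use_flat_cross_db_query: bool,
    is_information_schema_cross_db: bool,
) -> str:
    """Hypothetical helper, NOT the provider's API: mirrors the docstring's two modes."""
    if use_flat_cross_db_query:
        # One "global" table; the database is at most a WHERE filter.
        predicates = [
            predicate(db, schema, names, is_information_schema_cross_db)
            for db, schemas in tables.items()
            for schema, names in schemas.items()
        ]
        return (
            f"SELECT * FROM {information_schema_table_name} WHERE "
            + " OR ".join(predicates)
        )

    # One "local" table per database, combined with UNION ALL.
    selects = []
    for db, schemas in tables.items():
        source = information_schema_table_name
        if is_information_schema_cross_db and db is not None:
            # Assumption: the database qualifies the identifier only.
            source = f"{db}.{information_schema_table_name}"
        predicates = [
            predicate(db, schema, names, False) for schema, names in schemas.items()
        ]
        selects.append(f"SELECT * FROM {source} WHERE " + " OR ".join(predicates))
    return " UNION ALL ".join(selects)


tables = {"db1": {"schema1": ["table1", "table2"]}, "db2": {"schema2": ["table3"]}}

# Redshift-like: a single flat view, filtered by database.
flat = build_information_schema_query(tables, "SVV_REDSHIFT_COLUMNS", True, True)

# Snowflake-like: one db-qualified information schema per database.
per_db = build_information_schema_query(tables, "information_schema.columns", False, True)

print(flat)
print(per_db)
```

With `is_information_schema_cross_db=False`, the same helper drops the database filter in flat mode and the database qualifier in per-database mode, which matches the "This is skipped when False" comment in the docstring's Redshift example.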