Commit 76d88fb

fix!: remove out-of-date BigQuery ML protocol buffers (#1178)

deps!: BigQuery Storage and pyarrow are required dependencies (#776)
fix!: use nullable `Int64` and `boolean` dtypes in `to_dataframe` (#786)
feat!: destination tables are no-longer removed by `create_job` (#891)
feat!: In `to_dataframe`, use `dbdate` and `dbtime` dtypes from db-dtypes package for BigQuery DATE and TIME columns (#972)
fix!: automatically convert out-of-bounds dates in `to_dataframe`, remove `date_as_object` argument (#972)
feat!: mark the package as type-checked (#1058)
feat!: default to DATETIME type when loading timezone-naive datetimes from Pandas (#1061)
feat: add `api_method` parameter to `Client.query` to select `INSERT` or `QUERY` API (#967)
fix: improve type annotations for mypy validation (#1081)
feat: use `StandardSqlField` class for `Model.feature_columns` and `Model.label_columns` (#1117)
docs: Add migration guide from version 2.x to 3.x (#1027)

Release-As: 3.0.0
1 parent 35aeaa6 commit 76d88fb

File tree

274 files changed: +5282 −2797 lines


.coveragerc

+1
@@ -6,6 +6,7 @@ fail_under = 100
 show_missing = True
 omit =
   google/cloud/bigquery/__init__.py
+  google/cloud/bigquery_v2/*  # Legacy proto-based types.
 exclude_lines =
   # Re-enable the standard pragma
   pragma: NO COVER

README.rst

+1-4
@@ -1,7 +1,7 @@
 Python Client for Google BigQuery
 =================================

-|GA| |pypi| |versions|
+|GA| |pypi| |versions|

 Querying massive datasets can be time consuming and expensive without the
 right hardware and infrastructure. Google `BigQuery`_ solves this problem by

@@ -140,6 +140,3 @@ In this example all tracing data will be published to the Google

 .. _OpenTelemetry documentation: https://opentelemetry-python.readthedocs.io
 .. _Cloud Trace: https://cloud.google.com/trace
-
-
-

UPGRADING.md

+185-1
@@ -11,6 +11,190 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

# 3.0.0 Migration Guide

## New Required Dependencies

Some of the previously optional dependencies are now *required* in `3.x` versions of the
library, namely
[google-cloud-bigquery-storage](https://pypi.org/project/google-cloud-bigquery-storage/)
(minimum version `2.0.0`) and [pyarrow](https://pypi.org/project/pyarrow/) (minimum
version `3.0.0`).

The behavior of some of the package "extras" has thus also changed:

* The `pandas` extra now requires the [db-dtypes](https://pypi.org/project/db-dtypes/)
  package.
* The `bqstorage` extra has been preserved for compatibility reasons, but it is now a
  no-op and should be omitted when installing the BigQuery client library.

**Before:**
```
$ pip install google-cloud-bigquery[bqstorage]
```

**After:**
```
$ pip install google-cloud-bigquery
```

* The `bignumeric_type` extra has been removed, as the `BIGNUMERIC` type is now
  supported automatically. That extra should thus no longer be used.

**Before:**
```
$ pip install google-cloud-bigquery[bignumeric_type]
```

**After:**
```
$ pip install google-cloud-bigquery
```
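If you need to verify at runtime that an environment already satisfies the new minimums, a plain version-string comparison suffices. The helper below is an illustrative sketch of ours, not a library API:

```py
def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare dotted version strings numerically, ignoring non-numeric parts."""
    def parse(version: str):
        return tuple(int(part) for part in version.split(".") if part.isdigit())
    return parse(installed) >= parse(minimum)

# google-cloud-bigquery 3.x requires pyarrow >= 3.0.0 and
# google-cloud-bigquery-storage >= 2.0.0:
print(meets_minimum("6.0.1", "3.0.0"))  # True
print(meets_minimum("1.9.0", "2.0.0"))  # False
```

In application code, the installed version strings would come from `importlib.metadata.version(...)`.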
## Type Annotations

The library is now type-annotated and declares itself as such. If you use a static
type checker such as `mypy`, you might start getting errors in places where the
`google-cloud-bigquery` package is used.

It is recommended to update your code and/or type annotations to fix these errors, but
if this is not feasible in the short term, you can temporarily ignore type annotations
in `google-cloud-bigquery`, for example by using a special `# type: ignore` comment:

```py
from google.cloud import bigquery  # type: ignore
```

But again, this is only recommended as a possible short-term workaround if immediately
fixing the type check errors in your project is not feasible.

## Re-organized Types

The auto-generated parts of the library have been removed, and the proto-based types formerly
found in `google.cloud.bigquery_v2` have been replaced by a new implementation (but
see the [section](#legacy-protobuf-types) below).

For example, the standard SQL data types should now be imported from a new location:

**Before:**
```py
from google.cloud.bigquery_v2 import StandardSqlDataType
from google.cloud.bigquery_v2.types import StandardSqlField
from google.cloud.bigquery_v2.types.standard_sql import StandardSqlStructType
```

**After:**
```py
from google.cloud.bigquery import StandardSqlDataType
from google.cloud.bigquery.standard_sql import StandardSqlField
from google.cloud.bigquery.standard_sql import StandardSqlStructType
```

The `TypeKind` enum defining all possible SQL types for schema fields has been renamed
and is no longer nested under `StandardSqlDataType`:

**Before:**
```py
from google.cloud.bigquery_v2 import StandardSqlDataType

if field_type == StandardSqlDataType.TypeKind.STRING:
    ...
```

**After:**
```py
from google.cloud.bigquery import StandardSqlTypeNames

if field_type == StandardSqlTypeNames.STRING:
    ...
```

## Issuing queries with `Client.create_job` preserves destination table

The `Client.create_job` method no longer removes the destination table from a
query job's configuration. The destination table for the query can thus be
explicitly defined by the user.
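For example, a query job configuration in the REST API representation that names an explicit destination table now reaches the backend unchanged. The project, dataset, and table IDs below are placeholders, and this is a sketch rather than an excerpt from the library's documentation:

```py
# REST-style query job configuration with an explicit destination table.
# In 3.x, Client.create_job submits this as-is instead of stripping the
# destinationTable entry.
job_config = {
    "query": {
        "query": "SELECT 17 AS answer",
        "useLegacySql": False,
        "destinationTable": {
            "projectId": "my-project",       # placeholder
            "datasetId": "my_dataset",       # placeholder
            "tableId": "query_destination",  # placeholder
        },
    }
}

# With credentials configured, the job would be submitted along these lines:
# client = bigquery.Client()
# job = client.create_job(job_config)
```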
## Changes to data types when reading a pandas DataFrame

The default dtypes returned by the `to_dataframe` method have changed.

* The BigQuery `BOOLEAN` data type now maps to the pandas `boolean` dtype.
  Previously, it mapped to the pandas `bool` dtype when the column contained no
  `NULL` values, and to the pandas `object` dtype when `NULL` values were
  present.
* The BigQuery `INT64` data type now maps to the pandas `Int64` dtype.
  Previously, it mapped to the pandas `int64` dtype when the column contained no
  `NULL` values, and to the pandas `float64` dtype when `NULL` values were
  present.
* The BigQuery `DATE` data type now maps to the pandas `dbdate` dtype, which
  is provided by the
  [db-dtypes](https://googleapis.dev/python/db-dtypes/latest/index.html)
  package. If any date value is outside the range of
  [pandas.Timestamp.min](https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.min.html)
  (1677-09-22) and
  [pandas.Timestamp.max](https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.max.html)
  (2262-04-11), the data type maps to the pandas `object` dtype. The
  `date_as_object` parameter has been removed.
* The BigQuery `TIME` data type now maps to the pandas `dbtime` dtype, which
  is provided by the
  [db-dtypes](https://googleapis.dev/python/db-dtypes/latest/index.html)
  package.
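The practical effect of the nullable dtypes is that integer and boolean columns keep their types even when `NULL` values are present, using `pandas.NA` as the missing-value marker. A standalone pandas illustration:

```py
import pandas as pd

# The nullable Int64 dtype keeps integers intact in the presence of NULLs;
# the old behavior coerced such columns to float64.
ints = pd.Series([1, 2, None], dtype="Int64")
print(ints.dtype)         # Int64
print(ints.isna().sum())  # 1

# Likewise, boolean columns with NULLs no longer degrade to the object dtype.
flags = pd.Series([True, None, False], dtype="boolean")
print(flags.dtype)        # boolean
```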
## Changes to data types when loading a pandas DataFrame

In the absence of schema information, pandas columns with naive
`datetime64[ns]` values, i.e. without timezone information, are recognized and
loaded using the `DATETIME` type. For columns with timezone-aware
`datetime64[ns, UTC]` values, on the other hand, the `TIMESTAMP` type continues
to be used.
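The distinction hinges on the pandas dtype of the column, which you can inspect directly before loading; an illustrative sketch:

```py
import pandas as pd

# Timezone-naive values -> datetime64[ns] -> loaded as DATETIME.
naive = pd.Series(pd.to_datetime(["2022-01-01 12:00:00"]))
print(naive.dtype)  # datetime64[ns]

# Timezone-aware values -> datetime64[ns, UTC] -> loaded as TIMESTAMP.
aware = pd.Series(pd.to_datetime(["2022-01-01 12:00:00"], utc=True))
print(aware.dtype)  # datetime64[ns, UTC]
```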
## Changes to `Model`, `Client.get_model`, `Client.update_model`, and `Client.list_models`

The types of several `Model` properties have been changed.

- `Model.feature_columns` now returns a sequence of `google.cloud.bigquery.standard_sql.StandardSqlField`.
- `Model.label_columns` now returns a sequence of `google.cloud.bigquery.standard_sql.StandardSqlField`.
- `Model.model_type` now returns a string.
- `Model.training_runs` now returns a sequence of dictionaries, as received from the [BigQuery REST API](https://cloud.google.com/bigquery/docs/reference/rest/v2/models#Model.FIELDS.training_runs).

<a name="legacy-protobuf-types"></a>
## Legacy Protocol Buffers Types

For compatibility reasons, the legacy proto-based types still exist as static code
and can be imported:

```py
from google.cloud.bigquery_v2 import Model  # a subclass of proto.Message
```

Mind, however, that importing them will issue a warning, because aside from
being importable, these types **are not maintained anymore**. They may differ
both from the types in `google.cloud.bigquery` and from the types supported on
the backend.

### Maintaining compatibility with `google-cloud-bigquery` version 2.0

If you maintain a library or system that needs to support both
`google-cloud-bigquery` version 2.x and 3.x, it is recommended that you detect
when version 2.x is in use and convert properties that use the legacy protocol
buffer types, such as `Model.training_runs`, into the types used in 3.x.

Call the [`to_dict`
method](https://proto-plus-python.readthedocs.io/en/latest/reference/message.html#proto.message.Message.to_dict)
on the protocol buffers objects to get a JSON-compatible dictionary.

```py
from google.cloud.bigquery_v2 import Model

training_run: Model.TrainingRun = ...
training_run_dict = training_run.to_dict()
```
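One way to structure such a shim is to branch on the installed major version and normalize everything to dictionaries. The helper below is a sketch of ours, not a library API; it assumes only that 2.x training runs expose `to_dict()` and that 3.x training runs are already dicts:

```py
def normalize_training_runs(runs, bigquery_version: str):
    """Return training runs as plain dicts under both 2.x and 3.x."""
    major = int(bigquery_version.split(".")[0])
    if major < 3:
        # 2.x: training runs are proto-plus messages exposing to_dict().
        return [run.to_dict() for run in runs]
    # 3.x: training runs are already JSON-compatible dicts from the REST API.
    return list(runs)
```

In application code, `bigquery_version` would come from `google.cloud.bigquery.__version__`.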

# 2.0.0 Migration Guide

@@ -56,4 +240,4 @@ distance_type = enums.Model.DistanceType.COSINE
from google.cloud.bigquery_v2 import types

distance_type = types.Model.DistanceType.COSINE
```

docs/bigquery/legacy_proto_types.rst

+14
@@ -0,0 +1,14 @@
+Legacy proto-based Types for Google Cloud Bigquery v2 API
+=========================================================
+
+.. warning::
+    These types are provided for backward compatibility only, and are not maintained
+    anymore. They might also differ from the types supported on the backend. It is
+    therefore strongly advised to migrate to the types found in :doc:`standard_sql`.
+
+    Also see the :doc:`3.0.0 Migration Guide<../UPGRADING>` for more information.
+
+.. automodule:: google.cloud.bigquery_v2.types
+    :members:
+    :undoc-members:
+    :show-inheritance:
@@ -1,7 +1,7 @@
 Types for Google Cloud Bigquery v2 API
 ======================================

-.. automodule:: google.cloud.bigquery_v2.types
+.. automodule:: google.cloud.bigquery.standard_sql
    :members:
    :undoc-members:
    :show-inheritance:

docs/conf.py

+1-1
@@ -109,12 +109,12 @@
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
 exclude_patterns = [
+    "google/cloud/bigquery_v2/**",  # Legacy proto-based types.
     "_build",
     "**/.nox/**/*",
     "samples/AUTHORING_GUIDE.md",
     "samples/CONTRIBUTING.md",
     "samples/snippets/README.rst",
-    "bigquery_v2/services.rst",  # generated by the code generator
 ]

 # The reST default role (used for this markup: `text`) to use for all

docs/index.rst

+2-1
@@ -30,7 +30,8 @@ API Reference
 Migration Guide
 ---------------

-See the guide below for instructions on migrating to the 2.x release of this library.
+See the guides below for instructions on migrating from older to newer *major* releases
+of this library (from ``1.x`` to ``2.x``, or from ``2.x`` to ``3.x``).

 .. toctree::
    :maxdepth: 2

docs/reference.rst

+17-2
@@ -202,9 +202,24 @@ Encryption Configuration
 Additional Types
 ================

-Protocol buffer classes for working with the Models API.
+Helper SQL type classes.

 .. toctree::
    :maxdepth: 2

-   bigquery_v2/types
+   bigquery/standard_sql
+
+
+Legacy proto-based Types (deprecated)
+=====================================
+
+The legacy type classes based on protocol buffers.
+
+.. deprecated:: 3.0.0
+   These types are provided for backward compatibility only, and are not maintained
+   anymore.
+
+.. toctree::
+   :maxdepth: 2
+
+   bigquery/legacy_proto_types

docs/snippets.py

-4
@@ -30,10 +30,6 @@
     import pandas
 except (ImportError, AttributeError):
     pandas = None
-try:
-    import pyarrow
-except (ImportError, AttributeError):
-    pyarrow = None

 from google.api_core.exceptions import InternalServerError
 from google.api_core.exceptions import ServiceUnavailable

docs/usage/pandas.rst

+35-3
@@ -14,12 +14,12 @@ First, ensure that the :mod:`pandas` library is installed by running:

     pip install --upgrade pandas

-Alternatively, you can install the BigQuery python client library with
+Alternatively, you can install the BigQuery Python client library with
 :mod:`pandas` by running:

 .. code-block:: bash

-    pip install --upgrade google-cloud-bigquery[pandas]
+    pip install --upgrade 'google-cloud-bigquery[pandas]'

 To retrieve query results as a :class:`pandas.DataFrame`:

@@ -37,6 +37,38 @@ To retrieve table rows as a :class:`pandas.DataFrame`:
    :start-after: [START bigquery_list_rows_dataframe]
    :end-before: [END bigquery_list_rows_dataframe]

+The following data types are used when creating a pandas DataFrame.
+
+.. list-table:: Pandas Data Type Mapping
+   :header-rows: 1
+
+   * - BigQuery
+     - pandas
+     - Notes
+   * - BOOL
+     - boolean
+     -
+   * - DATETIME
+     - datetime64[ns], object
+     - The object dtype is used when there are values not representable in a
+       pandas nanosecond-precision timestamp.
+   * - DATE
+     - dbdate, object
+     - The object dtype is used when there are values not representable in a
+       pandas nanosecond-precision timestamp.
+
+       Requires the ``db-dtypes`` package. See the `db-dtypes usage guide
+       <https://googleapis.dev/python/db-dtypes/latest/usage.html>`_
+   * - FLOAT64
+     - float64
+     -
+   * - INT64
+     - Int64
+     -
+   * - TIME
+     - dbtime
+     - Requires the ``db-dtypes`` package. See the `db-dtypes usage guide
+       <https://googleapis.dev/python/db-dtypes/latest/usage.html>`_

 Retrieve BigQuery GEOGRAPHY data as a GeoPandas GeoDataFrame
 ------------------------------------------------------------

@@ -60,7 +92,7 @@ As of version 1.3.0, you can use the
 to load data from a :class:`pandas.DataFrame` to a
 :class:`~google.cloud.bigquery.table.Table`. To use this function, in addition
 to :mod:`pandas`, you will need to install the :mod:`pyarrow` library. You can
-install the BigQuery python client library with :mod:`pandas` and
+install the BigQuery Python client library with :mod:`pandas` and
 :mod:`pyarrow` by running:

 .. code-block:: bash