
Commit c8a8384

Merge branch 'main' into b296390934-index-return-types
2 parents: bb24078 + 3392800

12 files changed: +149 / -129 lines

CHANGELOG.md

Lines changed: 37 additions & 0 deletions
@@ -4,6 +4,43 @@
 
 [1]: https://pypi.org/project/bigframes/#history
 
+## [1.28.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v1.27.0...v1.28.0) (2024-12-11)
+
+
+### Features
+
+* (Series | DataFrame).plot.bar ([#1152](https://github.com/googleapis/python-bigquery-dataframes/issues/1152)) ([0fae2e0](https://github.com/googleapis/python-bigquery-dataframes/commit/0fae2e0291ec8d22341b5b543e8f1b384f83cd3c))
+* `bigframes.bigquery.vector_search` supports `use_brute_force` and `fraction_lists_to_search` parameters ([#1158](https://github.com/googleapis/python-bigquery-dataframes/issues/1158)) ([131edc3](https://github.com/googleapis/python-bigquery-dataframes/commit/131edc3d79f46d35a25422f0db7f150e63e8f561))
+* Add `ARIMAPlus.predict_explain()` to generate forecasts with explanation columns ([#1177](https://github.com/googleapis/python-bigquery-dataframes/issues/1177)) ([05f8b4d](https://github.com/googleapis/python-bigquery-dataframes/commit/05f8b4d2b2b5f624097228e65a3c42364fc40d36))
+* Add client_endpoints_override to bq options ([#1167](https://github.com/googleapis/python-bigquery-dataframes/issues/1167)) ([be74b99](https://github.com/googleapis/python-bigquery-dataframes/commit/be74b99977cfbd513def5b7e439de6b7706c0712))
+* Add support for temporal types in dataframe's describe() method ([#1189](https://github.com/googleapis/python-bigquery-dataframes/issues/1189)) ([2d564a6](https://github.com/googleapis/python-bigquery-dataframes/commit/2d564a6a9925b69c7e9a15b532fb66ad68c3e264))
+* Allow join-free alignment of analytic expressions ([#1168](https://github.com/googleapis/python-bigquery-dataframes/issues/1168)) ([daef4f0](https://github.com/googleapis/python-bigquery-dataframes/commit/daef4f0c7c5ff2d0a4e9a6ffefeb81f43780ac8b))
+* Series.isin supports bigframes.Series arg ([#1195](https://github.com/googleapis/python-bigquery-dataframes/issues/1195)) ([0d8a16b](https://github.com/googleapis/python-bigquery-dataframes/commit/0d8a16ba77a66dce544d0a7cf411fca0adc2a694))
+* Update llm.TextEmbeddingGenerator to 005 ([#1186](https://github.com/googleapis/python-bigquery-dataframes/issues/1186)) ([3072d38](https://github.com/googleapis/python-bigquery-dataframes/commit/3072d382c6ff57bdb37d7e080c794c67dbf6e701))
+
+
+### Bug Fixes
+
+* Fix error loading local dataframes into bigquery ([#1165](https://github.com/googleapis/python-bigquery-dataframes/issues/1165)) ([5b355ef](https://github.com/googleapis/python-bigquery-dataframes/commit/5b355efde122ed76b1cff39900ab8f94f5a13a30))
+* Fix null index join with 'on' arg ([#1153](https://github.com/googleapis/python-bigquery-dataframes/issues/1153)) ([9015c33](https://github.com/googleapis/python-bigquery-dataframes/commit/9015c33e73675ebb2299487dce3295732ea0527e))
+* Fix series.isin using local path always ([#1202](https://github.com/googleapis/python-bigquery-dataframes/issues/1202)) ([a44eafd](https://github.com/googleapis/python-bigquery-dataframes/commit/a44eafdd95eb1b994dc82411640b61fd0a78a492))
+
+
+### Performance Improvements
+
+* Update df.corr, df.cov to be used with more than 30 columns case. ([#1161](https://github.com/googleapis/python-bigquery-dataframes/issues/1161)) ([9dcf1aa](https://github.com/googleapis/python-bigquery-dataframes/commit/9dcf1aa918919704dcf4d12b05935b22fb502fc6))
+
+
+### Documentation
+
+* Add a code sample using `bpd.options.bigquery.ordering_mode = "partial"` ([#909](https://github.com/googleapis/python-bigquery-dataframes/issues/909)) ([f80d705](https://github.com/googleapis/python-bigquery-dataframes/commit/f80d70503b80559a0b1fe64434383aa3e028bf9b))
+* Add snippet for creating boosted tree model ([#1142](https://github.com/googleapis/python-bigquery-dataframes/issues/1142)) ([a972668](https://github.com/googleapis/python-bigquery-dataframes/commit/a972668833a454fb18e6cb148697165edd46e8cc))
+* Add snippet for evaluating a boosted tree model ([#1154](https://github.com/googleapis/python-bigquery-dataframes/issues/1154)) ([9d8970a](https://github.com/googleapis/python-bigquery-dataframes/commit/9d8970ac1f18b2520a061ac743e767ca8593cc8c))
+* Add snippet for predicting classifications using a boosted tree model ([#1156](https://github.com/googleapis/python-bigquery-dataframes/issues/1156)) ([e7b83f1](https://github.com/googleapis/python-bigquery-dataframes/commit/e7b83f166ef56e631120050103c2f43f454fce44))
+* Add third party `pandas.Index methods` and docstrings ([#1171](https://github.com/googleapis/python-bigquery-dataframes/issues/1171)) ([a970294](https://github.com/googleapis/python-bigquery-dataframes/commit/a9702945286fbe500ade4d0f0c14cc60a8aa00eb))
+* Fix Bigframes.Pandas.General_Function missing docs ([#1164](https://github.com/googleapis/python-bigquery-dataframes/issues/1164)) ([de923d0](https://github.com/googleapis/python-bigquery-dataframes/commit/de923d01b904b96cc51dfd526b6a412f28ff10c4))
+* Update `bigframes.pandas.Index` docstrings ([#1144](https://github.com/googleapis/python-bigquery-dataframes/issues/1144)) ([557ab8d](https://github.com/googleapis/python-bigquery-dataframes/commit/557ab8df526fcf743af0a609ec7ec636b00d0c0b))
+
 ## [1.27.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v1.26.0...v1.27.0) (2024-11-16)
 
 
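For orientation, a minimal sketch of one of the new entry points from the Features list above, `(Series | DataFrame).plot.bar`. It assumes a configured BigQuery session, illustrative data, and that the accessor mirrors pandas' `plot.bar` signature and returns a matplotlib Axes; it is not taken from the release notes.

```python
import bigframes.pandas as bpd

df = bpd.DataFrame({"fruit": ["apple", "pear", "plum"], "sales": [10, 25, 7]})

# Assumed to behave like pandas' DataFrame.plot.bar: fetches the (small)
# result locally and draws a matplotlib bar chart.
ax = df.plot.bar(x="fruit", y="sales")
ax.figure.savefig("sales.png")
```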

bigframes/core/blocks.py

Lines changed: 2 additions & 4 deletions
@@ -2025,7 +2025,7 @@ def isin(self, other: Block):
         assert len(other.value_columns) == 1
         unique_other_values = other.expr.select_columns(
             [other.value_columns[0]]
-        ).aggregate((), by_column_ids=(other.value_columns[0],))
+        ).aggregate((), by_column_ids=(other.value_columns[0],), dropna=False)
         block = self
         # for each original column, join with other
         for i in range(len(self.value_columns)):
@@ -2039,9 +2039,7 @@ def _isin_inner(self: Block, col: str, unique_values: core.ArrayValue) -> Block:
    expr, (l_map, r_map) = self._expr.relational_join(
        unique_values, ((col, unique_values.column_ids[0]),), type="left"
    )
-    expr, matches = expr.project_to_id(
-        ops.eq_op.as_expr(ex.const(True), r_map[const])
-    )
+    expr, matches = expr.project_to_id(ops.notnull_op.as_expr(r_map[const]))

    new_index_cols = tuple(l_map[idx_col] for idx_col in self.index_columns)
    new_value_cols = tuple(
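The change above swaps an equality-against-True projection for a null check on the joined right-hand key. A rough illustration of that membership-by-left-join pattern, sketched with plain pandas rather than the internal Block/ArrayValue APIs (bigframes runs the equivalent over BigQuery):

```python
import pandas as pd

left = pd.DataFrame({"x": [1, 2, 3, 4]})
other = pd.Series([2, 4, 4], name="x")

# Deduplicate the right side (the real code now keeps NULL groups via dropna=False).
right = pd.DataFrame({"x_r": other.drop_duplicates()})

joined = left.merge(right, left_on="x", right_on="x_r", how="left")

# After a left join, a row "is in" the other set exactly when the joined
# right-hand key is non-null -- the notnull projection in the diff.
matches = joined["x_r"].notna()
print(matches.tolist())  # [False, True, False, True]
```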

bigframes/dataframe.py

Lines changed: 50 additions & 13 deletions
@@ -517,6 +517,17 @@ def select_dtypes(self, include=None, exclude=None) -> DataFrame:
         )
         return DataFrame(self._block.select_columns(selected_columns))
 
+    def _select_exact_dtypes(
+        self, dtypes: Sequence[bigframes.dtypes.Dtype]
+    ) -> DataFrame:
+        """Selects columns without considering inheritance relationships."""
+        columns = [
+            col_id
+            for col_id, dtype in zip(self._block.value_columns, self._block.dtypes)
+            if dtype in dtypes
+        ]
+        return DataFrame(self._block.select_columns(columns))
+
     def _set_internal_query_job(self, query_job: Optional[bigquery.QueryJob]):
         self._query_job = query_job
 
@@ -2437,13 +2448,9 @@ def agg(
         aggregations = [agg_ops.lookup_agg_func(f) for f in func]
 
         for dtype, agg in itertools.product(self.dtypes, aggregations):
-            if not bigframes.operations.aggregations.is_agg_op_supported(
-                dtype, agg
-            ):
-                raise NotImplementedError(
-                    f"Type {dtype} does not support aggregation {agg}. "
-                    f"Share your usecase with the BigQuery DataFrames team at the {constants.FEEDBACK_LINK}"
-                )
+            agg.output_type(
+                dtype
+            )  # Raises exception if the agg does not support the dtype.
 
         return DataFrame(
             self._block.summarize(
@@ -2512,7 +2519,10 @@ def melt(
 
     def describe(self, include: None | Literal["all"] = None) -> DataFrame:
         if include is None:
-            numeric_df = self._drop_non_numeric(permissive=False)
+            numeric_df = self._select_exact_dtypes(
+                bigframes.dtypes.NUMERIC_BIGFRAMES_TYPES_RESTRICTIVE
+                + bigframes.dtypes.TEMPORAL_NUMERIC_BIGFRAMES_TYPES
+            )
             if len(numeric_df.columns) == 0:
                 # Describe eligible non-numeric columns
                 return self._describe_non_numeric()
@@ -2540,9 +2550,11 @@ def describe(self, include: None | Literal["all"] = None) -> DataFrame:
             raise ValueError(f"Unsupported include type: {include}")
 
     def _describe_numeric(self) -> DataFrame:
-        return typing.cast(
+        number_df_result = typing.cast(
             DataFrame,
-            self._drop_non_numeric(permissive=False).agg(
+            self._select_exact_dtypes(
+                bigframes.dtypes.NUMERIC_BIGFRAMES_TYPES_RESTRICTIVE
+            ).agg(
                 [
                     "count",
                     "mean",
@@ -2555,16 +2567,41 @@ def _describe_numeric(self) -> DataFrame:
                 ]
             ),
         )
+        temporal_df_result = typing.cast(
+            DataFrame,
+            self._select_exact_dtypes(
+                bigframes.dtypes.TEMPORAL_NUMERIC_BIGFRAMES_TYPES
+            ).agg(["count"]),
+        )
+
+        if len(number_df_result.columns) == 0:
+            return temporal_df_result
+        elif len(temporal_df_result.columns) == 0:
+            return number_df_result
+        else:
+            import bigframes.core.reshape.api as rs
+
+            original_columns = self._select_exact_dtypes(
+                bigframes.dtypes.NUMERIC_BIGFRAMES_TYPES_RESTRICTIVE
+                + bigframes.dtypes.TEMPORAL_NUMERIC_BIGFRAMES_TYPES
+            ).columns
+
+            # Use reindex after join to preserve the original column order.
+            return rs.concat(
+                [number_df_result, temporal_df_result],
+                axis=1,
+            )._reindex_columns(original_columns)
 
     def _describe_non_numeric(self) -> DataFrame:
         return typing.cast(
             DataFrame,
-            self.select_dtypes(
-                include={
+            self._select_exact_dtypes(
+                [
                     bigframes.dtypes.STRING_DTYPE,
                     bigframes.dtypes.BOOL_DTYPE,
                     bigframes.dtypes.BYTES_DTYPE,
-                }
+                    bigframes.dtypes.TIME_DTYPE,
+                ]
             ).agg(["count", "nunique"]),
         )
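Taken together, these hunks mean `DataFrame.describe()` no longer drops date/datetime/timestamp columns: numeric columns keep the full summary, while temporal columns contribute a "count" row. A hedged usage sketch, assuming a configured BigQuery session and illustrative data:

```python
import datetime
import bigframes.pandas as bpd

df = bpd.DataFrame(
    {
        "value": [1.5, 2.0, 3.5],
        "event_time": [  # expected to land as a datetime dtype
            datetime.datetime(2024, 1, 1, 12, 0),
            datetime.datetime(2024, 6, 1, 12, 0),
            datetime.datetime(2024, 12, 1, 12, 0),
        ],
    }
)

# "value" gets count/mean/std/min/quartiles/max; "event_time" now shows up
# with a "count" entry instead of being silently excluded.
print(df.describe().to_pandas())
```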

bigframes/dtypes.py

Lines changed: 13 additions & 4 deletions
@@ -18,7 +18,7 @@
 import datetime
 import decimal
 import typing
-from typing import Dict, Literal, Union
+from typing import Dict, List, Literal, Union
 
 import bigframes_vendored.constants as constants
 import geopandas as gpd  # type: ignore
@@ -211,7 +211,7 @@ class SimpleDtypeInfo:
 
 # Corresponds to the pandas concept of numeric type (such as when 'numeric_only' is specified in an operation)
 # Pandas is inconsistent, so two definitions are provided, each used in different contexts
-NUMERIC_BIGFRAMES_TYPES_RESTRICTIVE = [
+NUMERIC_BIGFRAMES_TYPES_RESTRICTIVE: List[Dtype] = [
     FLOAT_DTYPE,
     INT_DTYPE,
 ]
@@ -222,7 +222,16 @@ class SimpleDtypeInfo:
 ]
 
 
-## dtype predicates - use these to maintain consistency
+# Temporal types that are considered as "numeric" by Pandas
+TEMPORAL_NUMERIC_BIGFRAMES_TYPES: List[Dtype] = [
+    DATE_DTYPE,
+    TIMESTAMP_DTYPE,
+    DATETIME_DTYPE,
+]
+TEMPORAL_BIGFRAMES_TYPES = TEMPORAL_NUMERIC_BIGFRAMES_TYPES + [TIME_DTYPE]
+
+
+# dtype predicates - use these to maintain consistency
 def is_datetime_like(type_: ExpressionType) -> bool:
     return type_ in (DATETIME_DTYPE, TIMESTAMP_DTYPE)
 
@@ -630,7 +639,7 @@ def can_coerce(source_type: ExpressionType, target_type: ExpressionType) -> bool
         return True  # None can be coerced to any supported type
     else:
         return (source_type == STRING_DTYPE) and (
-            target_type in (DATETIME_DTYPE, TIMESTAMP_DTYPE, TIME_DTYPE, DATE_DTYPE)
+            target_type in TEMPORAL_BIGFRAMES_TYPES
        )
 
 
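A small sanity-check sketch of the new groupings; the constant names come from this diff, and the assertions are illustrative rather than exhaustive:

```python
import bigframes.dtypes as dtypes

# DATE/DATETIME/TIMESTAMP count as "numeric-ish" for describe(); TIME is temporal only.
assert dtypes.DATE_DTYPE in dtypes.TEMPORAL_NUMERIC_BIGFRAMES_TYPES
assert dtypes.TIME_DTYPE in dtypes.TEMPORAL_BIGFRAMES_TYPES
assert dtypes.TIME_DTYPE not in dtypes.TEMPORAL_NUMERIC_BIGFRAMES_TYPES

# can_coerce() now routes STRING -> temporal checks through the shared list.
assert dtypes.can_coerce(dtypes.STRING_DTYPE, dtypes.TIMESTAMP_DTYPE)
```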

bigframes/operations/aggregations.py

Lines changed: 0 additions & 11 deletions
@@ -579,14 +579,3 @@ def lookup_agg_func(key: str) -> typing.Union[UnaryAggregateOp, NullaryAggregate
         return _AGGREGATIONS_LOOKUP[key]
     else:
         raise ValueError(f"Unrecognize aggregate function: {key}")
-
-
-def is_agg_op_supported(dtype: dtypes.Dtype, op: AggregateOp) -> bool:
-    if dtype in dtypes.NUMERIC_BIGFRAMES_TYPES_PERMISSIVE:
-        return True
-
-    if dtype in (dtypes.STRING_DTYPE, dtypes.BOOL_DTYPE, dtypes.BYTES_DTYPE):
-        return isinstance(op, (CountOp, NuniqueOp))
-
-    # For all other types, support no aggregation
-    return False
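With this helper gone, dtype support is checked by asking the op itself, as the `dataframe.py` hunk above shows: `agg.output_type(dtype)` raises when the combination is unsupported. A hedged sketch of that pattern; the `"mean"` lookup key and the exact exception type are assumptions:

```python
import bigframes.dtypes as dtypes
import bigframes.operations.aggregations as agg_ops

mean_op = agg_ops.lookup_agg_func("mean")  # assumed registry key
mean_op.output_type(dtypes.FLOAT_DTYPE)  # supported: returns the result dtype

try:
    mean_op.output_type(dtypes.STRING_DTYPE)  # unsupported combination
except Exception as exc:
    print(f"mean over STRING rejected: {exc}")
```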

bigframes/series.py

Lines changed: 1 addition & 1 deletion
@@ -719,7 +719,7 @@ def nsmallest(self, n: int = 5, keep: str = "first") -> Series:
 
     def isin(self, values) -> "Series" | None:
         if isinstance(values, (Series,)):
-            self._block.isin(values._block)
+            return Series(self._block.isin(values._block))
         if not _is_list_like(values):
             raise TypeError(
                 "only list-like objects are allowed to be passed to "

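A hedged usage sketch of the fix: passing another BigQuery DataFrames Series to `isin` now returns a boolean Series instead of silently returning None (assumes a configured session; data is illustrative):

```python
import bigframes.pandas as bpd

s = bpd.Series([1, 2, 3, 4])
allowed = bpd.Series([2, 4, 4])

mask = s.isin(allowed)  # previously evaluated the block but returned None
print(s[mask].to_pandas())  # keeps only the values present in `allowed`
```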
bigframes/version.py

Lines changed: 1 addition & 1 deletion
@@ -12,4 +12,4 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-__version__ = "1.27.0"
+__version__ = "1.28.0"

tests/system/conftest.py

Lines changed: 1 addition & 1 deletion
@@ -1358,4 +1358,4 @@ def cleanup_cloud_functions(session, cloudfunctions_client, dataset_id_permanent
         # backend flakiness.
         #
         # Let's stop further clean up and leave it to later.
-        traceback.print_exception(exc)
+        traceback.print_exception(type(exc), exc, None)
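The one-argument `traceback.print_exception(exc)` form only exists on Python 3.10+; the three-argument `(type, value, tb)` form used in the fix also works on older interpreters (presumably the motivation here), and passing `None` for the traceback prints just the exception. A quick stand-alone illustration:

```python
import traceback

try:
    raise ValueError("cleanup failed")
except ValueError as exc:
    # Works on Python < 3.10 as well; prints "ValueError: cleanup failed"
    # without stack frames because the traceback argument is None.
    traceback.print_exception(type(exc), exc, None)
```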

tests/system/load/test_llm.py

Lines changed: 6 additions & 2 deletions
@@ -39,8 +39,12 @@ def llm_remote_text_df(session, llm_remote_text_pandas_df):
 
 
 @pytest.mark.flaky(retries=2)
-def test_llm_gemini_configure_fit(llm_fine_tune_df_default_index, llm_remote_text_df):
-    model = llm.GeminiTextGenerator(model_name="gemini-pro", max_iterations=1)
+def test_llm_gemini_configure_fit(
+    session, llm_fine_tune_df_default_index, llm_remote_text_df
+):
+    model = llm.GeminiTextGenerator(
+        session=session, model_name="gemini-pro", max_iterations=1
+    )
 
     X_train = llm_fine_tune_df_default_index[["prompt"]]
     y_train = llm_fine_tune_df_default_index[["label"]]

tests/system/small/test_dataframe.py

Lines changed: 37 additions & 8 deletions
@@ -2671,11 +2671,11 @@ def test_dataframe_agg_int_multi_string(scalars_dfs):
 
 
 @skip_legacy_pandas
-def test_df_describe(scalars_dfs):
+def test_df_describe_non_temporal(scalars_dfs):
     scalars_df, scalars_pandas_df = scalars_dfs
-    # pyarrows time columns fail in pandas
+    # excluding temporal columns here because BigFrames cannot perform percentiles operations on them
     unsupported_columns = ["datetime_col", "timestamp_col", "time_col", "date_col"]
-    bf_result = scalars_df.describe().to_pandas()
+    bf_result = scalars_df.drop(columns=unsupported_columns).describe().to_pandas()
 
     modified_pd_df = scalars_pandas_df.drop(columns=unsupported_columns)
     pd_result = modified_pd_df.describe()
@@ -2709,12 +2709,14 @@ def test_df_describe(scalars_dfs):
 def test_df_describe_non_numeric(scalars_dfs, include):
     scalars_df, scalars_pandas_df = scalars_dfs
 
-    non_numeric_columns = ["string_col", "bytes_col", "bool_col"]
+    # Excluding "date_col" here because in BigFrames it is used as PyArrow[date32()], which is
+    # considered numerical in Pandas
+    target_columns = ["string_col", "bytes_col", "bool_col", "time_col"]
 
-    modified_bf = scalars_df[non_numeric_columns]
+    modified_bf = scalars_df[target_columns]
     bf_result = modified_bf.describe(include=include).to_pandas()
 
-    modified_pd_df = scalars_pandas_df[non_numeric_columns]
+    modified_pd_df = scalars_pandas_df[target_columns]
     pd_result = modified_pd_df.describe(include=include)
 
     # Reindex results with the specified keys and their order, because
@@ -2726,8 +2728,35 @@ def test_df_describe_non_numeric(scalars_dfs, include):
     ).rename(index={"unique": "nunique"})
 
     pd.testing.assert_frame_equal(
-        pd_result[non_numeric_columns].astype("Int64"),
-        bf_result[non_numeric_columns],
+        pd_result.astype("Int64"),
+        bf_result,
+        check_index_type=False,
+    )
+
+
+@skip_legacy_pandas
+def test_df_describe_temporal(scalars_dfs):
+    scalars_df, scalars_pandas_df = scalars_dfs
+
+    temporal_columns = ["datetime_col", "timestamp_col", "time_col", "date_col"]
+
+    modified_bf = scalars_df[temporal_columns]
+    bf_result = modified_bf.describe(include="all").to_pandas()
+
+    modified_pd_df = scalars_pandas_df[temporal_columns]
+    pd_result = modified_pd_df.describe(include="all")
+
+    # Reindex results with the specified keys and their order, because
+    # the relative order is not important.
+    bf_result = bf_result.reindex(["count", "nunique"])
+    pd_result = pd_result.reindex(
+        ["count", "unique"]
+        # BF counter part of "unique" is called "nunique"
+    ).rename(index={"unique": "nunique"})
+
+    pd.testing.assert_frame_equal(
+        pd_result.astype("Float64"),
+        bf_result.astype("Float64"),
         check_index_type=False,
     )
 
