
Commit d6d371d

jsondai authored and copybara-github committed
docs: update docstrings for rapid evaluation library.
PiperOrigin-RevId: 634051814
1 parent 339f8b6 · commit d6d371d

File tree: 3 files changed (+74 -15 lines)

vertexai/preview/evaluation/_eval_tasks.py (+4 -1)

@@ -54,7 +54,8 @@ class EvalTask:
     models and their settings, and assess the quality of the model's generated
     text.
 
-    Dataset details:
+    Dataset Details:
+
     Default dataset column names:
       * content_column_name: "content"
       * reference_column_name: "reference"
@@ -74,11 +75,13 @@ class EvalTask:
     dataset must contain `instruction` and `context` column.
 
     Metrics Details:
+
     The supported metrics, metric bundle descriptions, grading rubrics, and
     the required input fields can be found on the Vertex AI public
     documentation page [Evaluation methods and metrics](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval).
 
     Usage:
+
     1. To perform bring-your-own-prediction(BYOP) evaluation, provide the model
     responses in the response column in the dataset. The response column name
     is "response" by default, or specify `response_column_name` parameter to
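The BYOP flow described in this docstring can be sketched in a few lines. The example below is a hedged illustration, not part of the commit: it assumes the pointwise "summarization_quality" metric can be passed to `EvalTask` by name and that `evaluate()` can be called without a `model` when responses are already present in the dataset.

```python
# Minimal bring-your-own-prediction (BYOP) sketch based on the docstring above.
# The metric name and column layout are assumptions; adjust to your task.
import pandas as pd

from vertexai.preview.evaluation import EvalTask

byop_dataset = pd.DataFrame({
    "instruction": ["Summarize the following text."],
    "context": ["The quick brown fox jumps over the lazy dog."],
    # Pre-computed model responses go in the default "response" column;
    # pass `response_column_name` if your column is named differently.
    "response": ["A fox jumps over a dog."],
})

eval_task = EvalTask(
    dataset=byop_dataset,
    metrics=["summarization_quality"],
)

# No `model` argument: the responses already present in the dataset are scored.
result = eval_task.evaluate()
print(result.summary_metrics)
```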

vertexai/preview/evaluation/metrics/_base.py (+65 -5)

@@ -20,7 +20,67 @@
 
 
 class PairwiseMetric:
-    """The Side-by-side(SxS) Pairwise Metric."""
+    """The Side-by-side(SxS) Pairwise Metric.
+
+    A model-based evaluation metric that compares two generative models
+    side-by-side, and allows users to A/B test their generative models to
+    determine which model is performing better on the given evaluation task.
+
+    For more details on when to use pairwise metrics, see
+    [Evaluation methods and metrics](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval#pointwise_versus_pairwise).
+
+    Result Details:
+
+      * In `EvalResult.summary_metrics`, win rates for both the baseline and
+        candidate model are computed, showing the rate at which each model
+        performs better on the given task. The win rate is computed as the
+        number of times the candidate model performs better than the baseline
+        model divided by the total number of examples. The win rate is a
+        number between 0 and 1.
+
+      * In `EvalResult.metrics_table`, a pairwise metric produces three
+        evaluation results for each row in the dataset:
+          * `pairwise_choice`: an enumeration that indicates whether the
+            candidate or the baseline model performs better.
+          * `explanation`: the AutoRater's rationale behind each verdict,
+            using chain-of-thought reasoning. These explanations help users
+            scrutinize the AutoRater's judgment and build appropriate trust
+            in its decisions.
+          * `confidence`: a score between 0 and 1 that signifies how
+            confident the AutoRater was in its verdict. A score closer to 1
+            means higher confidence.
+
+    See the [documentation page](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval#understand-results)
+    for more details on understanding the metric results.
+
+    Usage:
+
+      ```
+      from vertexai.generative_models import GenerativeModel
+      from vertexai.preview.evaluation import EvalTask, PairwiseMetric
+
+      baseline_model = GenerativeModel("gemini-1.0-pro")
+      candidate_model = GenerativeModel("gemini-1.5-pro")
+
+      pairwise_summarization_quality = PairwiseMetric(
+          metric="summarization_quality",
+          baseline_model=baseline_model,
+      )
+
+      eval_task = EvalTask(
+          dataset=pd.DataFrame({
+              "instruction": [...],
+              "context": [...],
+          }),
+          metrics=[pairwise_summarization_quality],
+      )
+
+      pairwise_results = eval_task.evaluate(
+          prompt_template="instruction: {instruction}. context: {context}",
+          model=candidate_model,
+      )
+      ```
+    """
 
     def __init__(
         self,
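The "Result Details" section above can be made concrete with a short continuation of the docstring's usage example. This is a hedged sketch, not part of the commit: it assumes `eval_task` and `candidate_model` come from that example and that the `metrics_table` column names end with the field names listed above.

```python
# Continues the PairwiseMetric usage example above. Column names in
# `metrics_table` are assumed to embed the metric name as a prefix; adjust to
# what your EvalResult actually returns.
pairwise_results = eval_task.evaluate(
    prompt_template="instruction: {instruction}. context: {context}",
    model=candidate_model,
)

# Win rates for the candidate and baseline models, each between 0 and 1.
print(pairwise_results.summary_metrics)

# Per-row verdicts: the AutoRater's choice, its chain-of-thought explanation,
# and a confidence score between 0 and 1.
metrics_table = pairwise_results.metrics_table
verdict_columns = [
    col for col in metrics_table.columns
    if col.endswith(("pairwise_choice", "explanation", "confidence"))
]
print(metrics_table[verdict_columns])
```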
@@ -37,8 +97,8 @@ def __init__(
         Args:
             metric: The Side-by-side(SxS) pairwise evaluation metric name.
             baseline_model: The baseline model for the Side-by-side(SxS) comparison.
-            use_reference: Whether to use reference to compute the metric. If specified,
-                the reference column is required in the dataset.
+            use_reference: Whether to use reference to compute the metric. If
+                specified, the reference column is required in the dataset.
             version: The metric version to use for evaluation.
         """
         self._metric = metric
@@ -74,8 +134,8 @@ class CustomMetric:
     Attributes:
         name: The name of the metric.
         metric_function: The evaluation function. Must use the dataset row/instance
-          as the metric_function input. Returns per-instance metric result as a
-          dictionary. The metric score must mapped to the CustomMetric.name as key.
+            as the metric_function input. Returns per-instance metric result as a
+            dictionary. The metric score must be mapped to the CustomMetric.name as key.
     """
 
     def __init__(
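A minimal sketch of the `CustomMetric` contract described above; it is illustrative, assuming `CustomMetric` is importable from `vertexai.preview.evaluation`, and the metric name and scoring function are made up for the example.

```python
# Illustrative CustomMetric: the function receives one dataset row/instance and
# returns a dict keyed by the metric's name, per the Attributes section above.
from vertexai.preview.evaluation import CustomMetric  # assumed import path


def word_count(instance: dict) -> dict:
    # Score each instance by the number of words in its response.
    return {"word_count": len(instance["response"].split())}


word_count_metric = CustomMetric(name="word_count", metric_function=word_count)

# The custom metric can then be listed alongside built-in metrics, e.g.:
# eval_task = EvalTask(dataset=..., metrics=[word_count_metric])
```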

vertexai/preview/evaluation/prompt_template.py (+5 -9)

@@ -27,18 +27,14 @@ class PromptTemplate:
     values using the `assemble` method, providing flexibility in generating
     dynamic prompts.
 
-    Example Usage:
+    Usage:
 
       ```
-    template_str = "Hello, {name}! Today is {day}. How are you?"
-    prompt_template = PromptTemplate(template_str)
-    completed_prompt = prompt_template.assemble(name="John", day="Monday")
-    print(completed_prompt)
+      template_str = "Hello, {name}! Today is {day}. How are you?"
+      prompt_template = PromptTemplate(template_str)
+      completed_prompt = prompt_template.assemble(name="John", day="Monday")
+      print(completed_prompt)
       ```
-
-    Attributes:
-        template: The template string containing placeholders for replacement.
-        placeholders: A set of placeholder names from the template string.
     """
 
     def __init__(self, template: str):
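Since `EvalTask.evaluate` above takes a `prompt_template` keyed on dataset column names, the same pattern can be sketched with `PromptTemplate`. The import path and the `placeholders` attribute follow the docstring this commit trims; everything else is an assumption for illustration.

```python
# Assemble an evaluation prompt whose placeholders mirror the dataset columns.
from vertexai.preview.evaluation import PromptTemplate  # assumed import path

eval_prompt = PromptTemplate("instruction: {instruction}. context: {context}")
print(eval_prompt.placeholders)  # a set such as {"instruction", "context"}
print(eval_prompt.assemble(
    instruction="Summarize the following text.",
    context="The quick brown fox jumps over the lazy dog.",
))
```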
