applications/rag/README.md (+23 -33)
@@ -1,6 +1,6 @@
 # RAG-on-GKE Application
 
-**NOTE:** This solution is in beta/a work in progress - please expect friction while using it.
+**NOTE:** This solution is in beta. Please expect friction while using it.
 
 This is a sample to deploy a RAG application on GKE. Retrieval Augmented Generation (RAG) is a popular approach for boosting the accuracy of LLM responses, particularly for domain specific or private data sets. The basic idea is to have a semantically searchable knowledge base (often using vector search), which is used to retrieve relevant snippets for a given prompt to provide additional context to the LLM. Augmenting the knowledge base with additional data is typically cheaper than fine tuning and is more scalable when incorporating current events and other rapidly changing data spaces.
 
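The retrieve-then-augment flow described in the README paragraph of this hunk can be sketched in a few lines. This is a toy illustration under stated assumptions, not the sample's implementation: `embed`, `cosine`, `retrieve`, `augment`, and `KNOWLEDGE_BASE` are hypothetical names, and a word-count vector stands in for a real embedding model and vector store.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory knowledge base; the real sample stores embeddings in a database.
KNOWLEDGE_BASE = [
    "The pasta at Luigi's is fresh and the service is quick.",
    "Parking near the stadium is limited on game days.",
    "Luigi's closes early on Sundays.",
]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(prompt, snippets, k=2):
    """Return the k snippets most similar to the prompt."""
    query = embed(prompt)
    ranked = sorted(snippets, key=lambda s: cosine(query, embed(s)), reverse=True)
    return ranked[:k]


def augment(prompt, context):
    """Prepend the retrieved snippets to the original prompt."""
    joined = "\n".join(f"- {snippet}" for snippet in context)
    return f"Context:\n{joined}\n\nQuestion: {prompt}"
```

Calling `augment(question, retrieve(question, KNOWLEDGE_BASE))` yields the text that would be sent to the LLM in place of the bare question.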
@@ -32,7 +32,7 @@ CLUSTER_REGION=us-central1
 ```
 2. Use the following instructions to create a GKE cluster. We recommend using Autopilot for a simpler setup.
 
-##### Autopilot
+##### Autopilot (recommended)
 
 RAG requires the latest Autopilot features, available on GKE cluster version `1.29.1-gke.1575000`+
1. To create a GKE Standard cluster using Terraform, follow the [instructions here](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/infrastructure/README.md). Use the preconfigured node pools in `/infrastructure/platform.tfvars` as this solution requires T4s and L4s.
export USER_PROMPT="How to deploy a container on K8s?"
@@ -119,6 +120,7 @@ curl 127.0.0.1:8080/generate -X POST \
 }
 EOF
 ```
+
 * At the end of the smoke test with the TGI server, stop port forwarding by using Ctrl-C on the original terminal.
 
 5. Verify the frontend chat interface is setup:
@@ -145,10 +147,10 @@ This step generates the vector embeddings for your input dataset. Currently, the
 1. Create a CloudSQL user to access the database: `gcloud sql users create rag-user-notebook --password=${SQL_PASSWORD:?} --instance=pgvector-instance --host=%`
 
 2. Go to the Jupyterhub service endpoint in a browser:
* IAP enabled: Read terraform output `jupyter_uri` or use command: `kubectl get managedcertificates jupyter-managed-cert -n $NAMESPACE --output jsonpath='{.status.domainStatus[0].domain}'`
+* Open Google Cloud Console IAM to verify that the user has role `IAP-secured Web App User`
+* Wait for the domain status to be `Active`
 3. Login with placeholder credentials [TBD: replace with instructions for IAP]:
 * username: user
 * password: use `terraform output jupyter_password` to fetch the password value
@@ -167,40 +169,28 @@ This step generates the vector embeddings for your input dataset. Currently, the
 * `os.environ['KAGGLE_KEY']`
 
 9. Run all the cells in the notebook. This will generate vector embeddings for the input dataset (`denizbilginn/google-maps-restaurant-reviews`) and store them in the `pgvector-instance` via a Ray job.
-* Once submitted, Ray will take several minutes to create the runtime environment and optionally scale up Ray worker nodes. During this time, the job status will remain PENDING.
-* When the job status is SUCCEEDED, the vector embeddings have been generated and we are ready to launch the frontend chat interface.
+* If the Ray job has FAILED, re-run the cell.
+* When the Ray job has SUCCEEDED, we are ready to launch the frontend chat interface.
 
-### Launch the Frontend Chat Interface
+### Access the Frontend Chat Interface
 
-#### Accessing the Frontend with IAP Disabled
+#### With IAP Disabled
 1. Setup port forwarding for the frontend: `kubectl port-forward service/rag-frontend -n $NAMESPACE 8080:8080 &`
 
 2. Go to `localhost:8080` in a browser & start chatting! This will fetch context related to your prompt from the vector embeddings in the `pgvector-instance`, augment the original prompt with the context & query the inference model (`mistral-7b`) with the augmented prompt.
 
-#### Accessing the Frontend with IAP Enabled
-1. Verify IAP is Enabled
-
-* Ensure that IAP is enabled on Google Cloud Platform (GCP) for your application. If you encounter any errors, try re-enabling IAP.
-
-2. Verify User Role
-
-* Make sure you have the role `IAP-secured Web App User` assigned to your user account. This role is necessary to access the application through IAP.
-
-3. Verify Domain is Active
-* Make sure the domain is active using commend:
-`kubectl get managedcertificates frontend-managed-cert -n rag --output jsonpath='{.status.domainStatus[0].status}'`
-
-3. Retrieve the Domain
-
-* Read terraform output `frontend_uri` or use the following command to find the domain created by IAP for accessing your service:
-`kubectl get managedcertificates frontend-managed-cert -n $NAMESPACE --output jsonpath='{.status.domainStatus[0].domain}'`
-
-4. Access the Frontend
+#### With IAP Enabled
+1. Verify that IAP is enabled on Google Cloud Platform (GCP) for your application. If you encounter any errors, try re-enabling IAP.
+2. Verify that you have the role `IAP-secured Web App User` assigned to your user account. This role is necessary to access the application through IAP.
+3. Verify the domain is active using command:
+`kubectl get managedcertificates frontend-managed-cert -n rag --output jsonpath='{.status.domainStatus[0].status}'`
+3. Read terraform output `frontend_uri` or use the following command to find the domain created by IAP for accessing your service:
+`kubectl get managedcertificates frontend-managed-cert -n $NAMESPACE --output jsonpath='{.status.domainStatus[0].domain}'`
+4. Open your browser and navigate to the domain you retrieved in the previous step to start chatting!
 
-* Open your browser and navigate to the domain you retrieved in the previous step to start chatting!
+#### Prompt Examples
 
-#### Prompts Example
-3. [TODO: Add some example prompts for the dataset].
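The `With IAP Enabled` steps in this hunk check the managed certificate status by hand. A minimal sketch of automating that wait, assuming `kubectl` is on the PATH and already pointed at the cluster: `wait_for_status` and `kubectl_cert_status` are hypothetical helpers, not part of the sample, and the attempt count and polling interval are arbitrary.

```python
import subprocess
import time


def kubectl_cert_status(namespace, cert="frontend-managed-cert"):
    """Run the status command from the README and return its raw output."""
    cmd = [
        "kubectl", "get", "managedcertificates", cert,
        "-n", namespace,
        "--output", "jsonpath={.status.domainStatus[0].status}",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def wait_for_status(fetch_status, expected="Active", attempts=30,
                    delay_s=10.0, sleep=time.sleep):
    """Poll fetch_status() until it returns `expected`.

    Returns True as soon as the status matches, or False once all
    attempts are used up. `sleep` is injectable so the loop can be
    exercised without real delays.
    """
    for attempt in range(attempts):
        if fetch_status().strip() == expected:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

A call such as `wait_for_status(lambda: kubectl_cert_status("rag"))` would block until the certificate reports `Active` or the attempts run out. The same helper could be pointed at `jupyter-managed-cert` by passing a different `cert` name.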