
Commit 19221db

Merge pull request #57 from Tanzania-AI-Community/documentation
Add documentation and small code modifications for modularity
2 parents 864baa7 + 583ad5b commit 19221db

File tree: 9 files changed (+258, −67 lines)

app/config.py

+16-6
@@ -2,7 +2,7 @@
 This module sets the env configs for our WhatsApp app.
 """
 
-from typing import Optional
+from typing import Literal, Optional
 import os
 from pydantic_settings import BaseSettings, SettingsConfigDict
 from pydantic import SecretStr, field_validator
@@ -66,22 +66,32 @@ class LLMSettings(BaseSettings):
         case_sensitive=False,
         env_nested_delimiter="__",
     )
-    # Together AI settings
+
+    # AI provider api key
     llm_api_key: Optional[SecretStr] = None
 
     # Model selection
    llm_model_options: dict = {
         "llama_405b": "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
         "llama_70b": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
         "mixtral": "mistralai/Mixtral-8x7B-Instruct-v0.1",
+        "gpt-4o": "gpt-4o",
+        "gpt-4o_mini": "gpt-4o-mini",
     }
-    llm_model_name: str = llm_model_options["llama_405b"]
 
-    # Embedding model
-    embedding_model: str = "BAAI/bge-large-en-v1.5"
+    embedder_model_options: dict = {
+        "bge-large": "BAAI/bge-large-en-v1.5", # 1024 dimensions
+        "text-embedding-3-small": "text-embedding-3-small", # 1536 dimensions
+    }
 
-    # Exercise generator model
+    """
+    XXX: FILL YOUR AI PROVIDER AND MODEL CHOICES HERE (DEFAULTS ARE PREFILLED)
+    - make sure your choice of LLM, embedder, and ai_provider are compatible
+    """
+    ai_provider: Literal["together", "openai"] = "together"
+    llm_model_name: str = llm_model_options["llama_405b"]
     exercise_generator_model: str = llm_model_options["llama_70b"]
+    embedding_model: str = embedder_model_options["bge-large"]
 
 
 def initialize_settings():
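
For illustration, a deployment that opts into the OpenAI stack would change the defaults in the XXX block roughly as sketched below. The keys and values come from the option dicts in this diff, but this is only an example, not something the commit sets; the matching vector size also has to be updated in app/database/models.py (next diff).

# Illustrative OpenAI defaults for the XXX block in LLMSettings (not part of this commit)
ai_provider: Literal["together", "openai"] = "openai"
llm_model_name: str = llm_model_options["gpt-4o"]
exercise_generator_model: str = llm_model_options["gpt-4o_mini"]
embedding_model: str = embedder_model_options["text-embedding-3-small"]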

app/database/models.py

+8-1
@@ -417,7 +417,14 @@ class Chunk(SQLModel, table=True):
     content_type: Optional[str] = Field(
         max_length=30
     ) # exercise, text, image, etc. (to define later) - maybe add index in future
-    embedding: Any = Field(sa_column=Column(Vector(1024))) # BAAI/bge-large-en-v1.5
+
+    """
+    XXX: FILL IN THE EMBEDDING LENGTH FOR YOUR EMBEDDINGS
+    - Default is set to 1024 (for bge-large vectors)
+    - Replace with 1536 for text-embedding-3-small if using OpenAI's embedder
+    """
+    embedding: Any = Field(sa_column=Column(Vector(1024)))
+
     top_level_section_index: Optional[str] = Field(max_length=10, default=None)
     top_level_section_title: Optional[str] = Field(max_length=100, default=None)
     created_at: Optional[datetime] = Field(
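
Following the XXX note above, an OpenAI-based setup would swap the column dimension for 1536. A one-line sketch of that change (illustrative only; the commit itself keeps 1024):

# 1536 dimensions for OpenAI's text-embedding-3-small, per the comment above
embedding: Any = Field(sa_column=Column(Vector(1536)))

Since this alters an existing column, it would normally also require a database migration (see the migrations notes in the architecture docs below).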

app/services/llm_service.py

-5
@@ -2,7 +2,6 @@
 import logging
 import asyncio
 from typing import List, Optional
-from openai import AsyncOpenAI
 from openai.types.chat import ChatCompletionMessageToolCall
 
 from app.database.models import Message, MessageRole, User
@@ -41,10 +40,6 @@ def is_locked(self) -> bool:
 
 class LLMClient:
     def __init__(self):
-        self.client = AsyncOpenAI(
-            base_url="https://api.together.xyz/v1",
-            api_key=llm_settings.llm_api_key.get_secret_value(),
-        )
         self.logger = logging.getLogger(__name__)
         self._processors: dict[int, MessageProcessor] = {}
 

app/utils/embedder.py

+5-3
@@ -1,10 +1,12 @@
-# This is in scripts/database for now but will be moved to app/database
 from typing import List
 from app.config import llm_settings
 from together import Together
+from openai import OpenAI
 
-client = Together(
-    api_key=llm_settings.llm_api_key.get_secret_value(),
+client = (
+    OpenAI(api_key=llm_settings.llm_api_key.get_secret_value())
+    if llm_settings.ai_provider == "openai"
+    else Together(api_key=llm_settings.llm_api_key.get_secret_value())
 )
 
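
Both SDKs expose the same embeddings.create interface, so code using this shared client stays provider-agnostic. A minimal usage sketch, assuming the configured embedding_model is compatible with the selected provider (the helper name embed_text is illustrative, not from the repository):

from typing import List

from app.config import llm_settings
from app.utils.embedder import client  # provider-aware client defined in the diff above


def embed_text(text: str) -> List[float]:
    # Together and OpenAI both return the vector at response.data[0].embedding
    response = client.embeddings.create(
        model=llm_settings.embedding_model,
        input=text,
    )
    return response.data[0].embedding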

app/utils/llm_utils.py

+7-4
@@ -12,10 +12,13 @@
 # Set up basic logging configuration
 logger = logging.getLogger(__name__)
 
-llm_client = openai.AsyncOpenAI(
-    base_url="https://api.together.xyz/v1",
-    api_key=llm_settings.llm_api_key.get_secret_value(),
-)
+if llm_settings.ai_provider == "together":
+    llm_client = openai.AsyncOpenAI(
+        base_url="https://api.together.xyz/v1",
+        api_key=llm_settings.llm_api_key.get_secret_value(),
+    )
+else:
+    llm_client = openai.AsyncOpenAI(api_key=llm_settings.llm_api_key.get_secret_value())
 
 
 def num_tokens_from_string(string: str, encoding_name: str = "cl100k_base") -> int:
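
Because llm_client is an AsyncOpenAI instance in both branches (Together serves an OpenAI-compatible API at the base_url above), call sites do not need to know which provider is active. A rough usage sketch under that assumption; the ask wrapper is illustrative, not a function from the repository:

import asyncio

from app.config import llm_settings
from app.utils.llm_utils import llm_client  # provider-aware async client from the diff above


async def ask(prompt: str) -> str:
    # Same chat.completions call whether the client points at Together or OpenAI
    response = await llm_client.chat.completions.create(
        model=llm_settings.llm_model_name,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Example: print(asyncio.run(ask("Summarize chapter 3 in two sentences.")))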

docs/en/ARCHITECTURE.md

+38-3
@@ -1,4 +1,5 @@
 # Platform infrastructure
+
 <div align="center">
 
 ![Twiga Architecture](https://github.com/user-attachments/assets/33e4e394-b724-4ea4-af2a-7e75f93615aa)
@@ -7,8 +8,42 @@
 
 This diagram is an overview of the infrastructure for the first iteration of Twiga in production. We appreciate simple architectures and want to minimize the number of platforms we use, all while maintaining good performance.
 
-# Code infrastructure
-...tbd
+# Code architecture
+
+We have designed Twiga's backend for simplicity and modularity.
+
+## `app`
+
+Everything used to run the Twiga application is within the `app` folder. Requests coming from WhatsApp users (via the Meta API) are first received by the endpoints in the `app/main.py` file (the `webhooks` endpoint). Incoming WhatsApp webhook signatures are verified by the decorators in `app/security.py`, and then the `handle_request` function in `app/services/messaging_service.py` routes each request in the right direction depending on the type of request and the state of the user.
+
+All environment variables are fetched from `app/config.py`, so whenever you need them, just import the settings into your file.
+
+> [!Note]
+>
+> Don't use `dotenv`, just use our settings.
+
+The AI-related code is mainly handled in `app/services/llm_service.py`. Conveniently, if you're planning on creating any new tools, you can create them in the `app/tools/` folder. Just follow the convention we've set.
+
+We'll leave it up to you to explore the rest.
+
+> [!Warning]
+>
+> If anything here appears off it may not be up to date. Let us know 😁
+
+## `scripts`
+
+Within the `scripts` folder we keep files that developers run intermittently. Look in there if you want to populate your own version of the database with some textbook data.
+
+## `tests`
+
+> [!Note]
+>
+> We haven't written tests yet, but they're on the roadmap.
 
 # Database schema
-...tbd
+
+We're using tiangolo's [SQLModel](https://sqlmodel.tiangolo.com/) as an [ORM](https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping) to interact with the Neon Postgres database in this project. Instead of statically sharing the database schema here (it is likely to change over time), we refer you to the `app/database/models.py` file, which should contain everything you need to know about the tables used in Twiga. We also have an [entity-relationship diagram](https://drive.google.com/file/d/10dKIW6I6_d-712rt0s-7KltTWTmBjRIP/view?usp=sharing) (ERD) giving an overview of the table relations, but it is not consistently maintained and may not exactly match the current database schema.
+
+## `migrations`
+
+This folder keeps track of the database history. We use [_alembic_](https://medium.com/@kasperjuunge/how-to-get-started-with-alembic-and-sqlmodel-288700002543) migrations. Unless you want to use _alembic_ for your own copy of the database, you can ignore this folder. If you're in the core team and have access to our Neon database, it might be good to know how it works and why we use it.
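
As an aside on the settings note in the `app` section above: importing the shared settings object looks like the snippet below, mirroring how app/utils/embedder.py and app/utils/llm_utils.py already do it in this commit (the printed fields are just examples).

from app.config import llm_settings

# Read configuration through the shared pydantic settings object instead of dotenv
print(llm_settings.ai_provider)     # "together" or "openai"
print(llm_settings.llm_model_name)  # e.g. "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo"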

0 commit comments
