-
Notifications
You must be signed in to change notification settings - Fork 249
Split extension into an extension and a library package #580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -177,6 +184,10 @@ def psql_cmd(cmd: str) -> str: | |||
|
|||
|
|||
def test_vectorizer_timescaledb(): | |||
with psycopg.connect(db_url("test")) as con: | |||
with con.cursor() as cur: | |||
cur.execute("create extension ai cascade") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't understand why we're installing the extension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the thing that sends the ping to the lamda is still in the extension so on our cloud there is a dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added logic to the install() function to set up the ai extension if executor_url is set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we take an advisory lock before attempting install/upgrade? bail if we don't get it.
); | ||
|
||
insert into ai.app_version ("name", version) | ||
values ('ai', '__version__') on conflict ("name") do update set version = excluded.version; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check the version? If the version we are trying to install/upgrade is already present, abort? If you're running N apps in a load balancer, we probably want the first to win and the rest to skip.
projects/pgai/.python-version
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was committed by mistake in the first place? Seems inappropriate on a library that can run on many python versions
-> leaving ai.execute_vectorizer in extension | ||
-> ai._vectorizer_job which calls ai.execute_vectorizer is moved to dbapp | ||
|
||
-> creating the index still needs to be handled (probably through the worker) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work for lambda-based workers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For lamba stuff it's done in the timescaledb job. That actually already works
projects/pgai/db/build.py
Outdated
def error_if_pre_release() -> None: | ||
# Note: released versions always have the output sql file | ||
# commited into the repository. | ||
output_file = output_sql_file() | ||
command = ( | ||
"just ext build-install" | ||
if "ROOT_JUSTFILE" in os.environ | ||
else "just build-install" | ||
if "PROJECT_JUSTFILE" in os.environ | ||
else "python3 build.py build-install" | ||
) | ||
if not Path(output_file).exists(): | ||
print( | ||
textwrap.dedent(f""" | ||
WARNING: You're trying to install a pre-release version of pgai. | ||
This is not supported, and there is no upgrade path. | ||
|
||
Instead, install an official release from https://github.com/timescale/pgai/releases. | ||
|
||
If you are certain that you want to install a pre-release version, run: | ||
`{command}` | ||
""") | ||
) | ||
exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is quite right. Also, I'm not sure it's necessary. People won't be using the just
commands to install the SQL anymore. Maybe this check needs to go into the Python library?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this was dead code. I removed it and added a check in the python code.
projects/pgai/pgai/cli.py
Outdated
default="postgres://postgres@localhost:5432/postgres", | ||
show_default=True, | ||
) | ||
def install(db_url: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is upgrading
the same as installing
over an existing installation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes made
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, but approving to unblock.
raise notice '%', _sql; | ||
execute _sql; | ||
|
||
if _rec.relname != 'vectorizer_id_seq' THEN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we not change the ownership of vectorizer_id_seq
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a code comment. The reason is it's linked to the vectorizer table so you change the owner of the table instead.
.github/workflows/ci.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes aren't tested by CI because we use pull_request_target
to build it. I attempted to run just pgai db ci
locally, and it didn't pass, so I don't have confidence that these tests will succeed when this merges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I (accidentally) ran them by creating a PR to this branch and they are red: https://github.com/timescale/pgai/actions/runs/14193958551/job/39764748178?pr=598
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
run manually through workflow_dispatch. everything passes now https://github.com/timescale/pgai/actions/runs/14205814536
projects/pgai/db/build.py
Outdated
[ | ||
"uv run --no-project pgspot --ignore-lang=plpython3u", | ||
'--proc-without-search-path "ai._vectorizer_job(job_id integer,config pg_catalog.jsonb)"', | ||
f"{output_sql_file()}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed a commit with this fix.
USER root | ||
|
||
RUN pip install --break-system-packages uv==0.6.3 | ||
RUN mkdir /py/ && uv venv --directory /py/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is missing correctly setting up dependencies in the container.
RUN mkdir /py/ && uv venv --directory /py/ | |
COPY pyproject.toml uv.lock /py/ | |
RUN uv sync --directory /py --no-install-project --only-dev --frozen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I do that in the just command docker-sync now. Let's review after merge.
projects/pgai/db/README.md
Outdated
just docker-shell | ||
|
||
//inside shell 1 | ||
uv pip install fastapi[standard] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the suggested change to the dockerfile, this becomes redundant.
uv pip install fastapi[standard] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah these were just outdated docs docker-sync takes care of this
projects/pgai/db/README.md
Outdated
uv sync --active | ||
cd ../extension && just build && just install && cd ../pgai/db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should eliminate these extra manual steps required to run the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah outdate docs this is already not necessary. fixing doc
, installed_at timestamptz not null default pg_catalog.clock_timestamp() | ||
); | ||
|
||
--check if the app has already been installed, error if so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why error? Because it's the only way to stop with the rest of the installation, or because we actually see this as being erroneous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes because its the only way to stop the install
Purpose
This PR separates the codebase into two distinct components:
vectorizer and text2sql functionality
The key advantage: Portability. The pgai Python library can now be used on any PostgreSQL
installation, including cloud platforms that don't support the pgai extensions.
Architecture
The pgai library installs database objects (tables, schemas, functions) through the
pgai install
function using scripts compiled from the
projects/pgai/db
directory. The migration management followsthe extension's proven approach:
ai.migration_app
tableThe approach of using the
ai.migration_app
table allows us to maintains a single upgrade script compatible with any previous version, eliminating the need for version-specific upgrade paths.Migration Strategy
To transition from the previous monolithic extension:
pgai/db. Since the vectorizer code was mostly in their own files, these files moved mostly unchanged.
extension while preserving their functionality
ai.migration_app
After upgrading to the next extension version, vectorizer components remain intact but are no longer managed by the extension. The extension does not upgrade the vectorizer components before releasing them, since it now lack the required code. Instead, the pgai library can then independently upgrade these components regardless of which extension version previously created them.