
Use the configured OpenAI Base URL for Automations #1065


Merged: 1 commit into khoj-ai:master on Jan 11, 2025

Conversation

@arcuru (Contributor) commented Jan 10, 2025

This change makes Automations (and possibly other entrypoints) use the configured OpenAI-compatible server if one has been set. Without this change, they try to use the hardcoded OpenAI provider.

All the other calls in this file use a similar method to pass in the base URL.

I have not been able to manually test this because the Docker image is taking an extremely long time to build locally.
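
For context, this is roughly how a base URL selects an OpenAI-compatible server with the OpenAI Python client (a hedged sketch; Khoj's wrapper code is not shown here, and make_client is an illustrative name):

```python
from openai import OpenAI

# Hedged illustration: how an OpenAI-compatible server is selected via base_url.
# If base_url is None, the client defaults to https://api.openai.com/v1 --
# the "hardcoded OpenAI provider" behavior this PR avoids for Automations.
def make_client(api_key: str, api_base_url: str | None = None) -> OpenAI:
    return OpenAI(api_key=api_key, base_url=api_base_url)

# e.g. a LiteLLM proxy: make_client("sk-...", "http://localhost:4000/v1")
```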

@debanjum (Member) left a comment

Good catch! Didn't realize we weren't passing api_base_url to the send_message_to_model_wrapper_sync method
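
A hedged sketch of the fix's shape (only the function name and the api_base_url parameter appear in this thread; everything else below is an assumption, with a stub standing in for Khoj's actual wrapper):

```python
# Stub standing in for Khoj's actual wrapper (assumed signature).
def send_message_to_model_wrapper_sync(message, model, api_base_url=None):
    ...  # falls back to the hardcoded OpenAI endpoint if api_base_url is None

# Before: api_base_url omitted -> hardcoded OpenAI provider.
send_message_to_model_wrapper_sync(message="query", model="gpt-4o")

# After: forward the configured OpenAI-compatible server URL,
# matching how the other calls in this file pass it in.
send_message_to_model_wrapper_sync(
    message="query", model="gpt-4o", api_base_url="http://localhost:4000/v1"
)
```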

@arcuru (Contributor, Author) commented Jan 10, 2025

I was able to build this locally for testing and hit at least one more issue related to sending the LLM calls to an OpenAI-compatible server, which may also need to be fixed for this scenario.

I think the root of the problem shows up in these logs:

```
server-1    | [18:46:28.889975] DEBUG    khoj.processor.conversation.utils:       utils.py:537
server-1    |                            Fallback to default chat model
server-1    |                            tokenizer: gpt-4o.
server-1    |                            Configure tokenizer for model:
server-1    |                            local/chat/phi4:14b in Khoj settings to
server-1    |                            improve context stuffing.
```

It appears that the name "gpt-4o" is hardcoded as the default tokenizer, and this call failed even after I created a "gpt-4o" chat model in Khoj, added a "gpt-4o" endpoint on my LiteLLM server, and even configured my OpenAI key for the gpt-4o model directly. Setting a tokenizer for that model in server/admin/ had no effect.

There are further errors in the logs after this, but they seem to come from chunking while running the query to create the automation. While it's good that this fails immediately, I really hope you're not just feeding the output of the selected GUI options into the LLM to convert it into a cron job template.

TBH, I've hit so many GUI and setup bugs while trying to self-host Khoj that I'm not going to spend more time on it. I filed the bugs that probably also impact your cloud service so you can fix them for your paying customers.

@debanjum (Member)

Hey @arcuru, it's unfortunate that you've hit multiple issues setting up automations for self-hosted Khoj. I wonder if you're hitting the same timeout issues as #1035 (comment). Anyway, let me look into self-hosted Khoj + OpenAI API proxy + Automation setups and see what can be improved for a less annoying experience.

Until then I've answered some of your concerns below:

> It appears that the name "gpt-4o" is hardcoded as the default tokenizer, and this call failed even after I created a "gpt-4o" chat model in Khoj, added a "gpt-4o" endpoint on my LiteLLM server, and even configured my OpenAI key for the gpt-4o model directly. Setting a tokenizer for that model in server/admin/ had no effect.

This isn't an error, just a debug log (notice the [18:46:28.889975] DEBUG prefix) indicating that the prompt size calculation will be less accurate. We fall back to the gpt-4o tokenizer if we can't infer the tokenizer to use for the current chat model, e.g. in scenarios like this where the OpenAI API is being used with non-OpenAI models.
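
For illustration, the fallback roughly takes this shape (an assumed sketch using tiktoken; the actual logic lives in khoj/processor/conversation/utils.py and may differ):

```python
import tiktoken

# Assumed shape of the fallback described above, not Khoj's verbatim code.
def get_tokenizer(chat_model_name: str, configured_encoding: str | None = None):
    if configured_encoding:
        # An explicitly configured tokenizer (e.g. "cl100k_base") wins.
        return tiktoken.get_encoding(configured_encoding)
    try:
        # Known OpenAI model names ("gpt-4o") resolve directly.
        return tiktoken.encoding_for_model(chat_model_name)
    except KeyError:
        # Unknown models like "local/chat/phi4:14b" trigger the DEBUG log
        # above and fall back to the gpt-4o tokenizer for token counting.
        return tiktoken.encoding_for_model("gpt-4o")
```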

This only actually becomes a problem if the max prompt size set for the chat model is close to the actual max prompt size of the model and your chat history starts hitting that limit. You could set max prompt size = 10K for phi4, given it has a 14K context window (which is small by modern standards).
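
To make that concrete, here's an illustrative history-truncation loop (not Khoj's actual code) showing why headroom between max prompt size and the real context window matters when the fallback tokenizer may miscount:

```python
def truncate_history(messages: list[str], tokenizer, max_prompt_size: int) -> list[str]:
    # Keep the most recent messages that fit within max_prompt_size tokens.
    kept, used = [], 0
    for message in reversed(messages):
        n_tokens = len(tokenizer.encode(message))
        if used + n_tokens > max_prompt_size:
            break
        kept.append(message)
        used += n_tokens
    return list(reversed(kept))

# With max_prompt_size=10_000 for a ~14K-context model, there is headroom
# even if the fallback gpt-4o tokenizer miscounts phi4 tokens somewhat.
```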

Not sure why you're seeing this when the chat model is set to gpt-4o, though.

> I really hope you're not just feeding the output of the selected GUI options into the LLM to convert it into a cron job template.

The LLM isn't used to set the crontime when you've explicitly specified it via the GUI. We used to have the LLM set the cron job schedule when we allowed creating an automation directly from chat (e.g. you tell Khoj in chat "Share synthetic biology news every Tuesday at 9pm").

The LLM is used to convert your original query into an automation query and an email subject. So if you say "Notify me if it's going to rain today", it converts that into the chat query "Is it going to rain today?" and the email subject "Rain Notification". The result of the chat query "Is it going to rain today?" is then compared against your original query ("Notify me ...") to decide whether a notification email should be sent.
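
Sketched in pseudocode-style Python (every name below is illustrative, inferred from the description above rather than taken from the codebase):

```python
def run_automation(original_query: str) -> None:
    # 1. The LLM rewrites the user's request into a standalone chat query
    #    and an email subject (the crontime comes straight from the GUI).
    chat_query = "Is it going to rain today?"   # derived by the LLM
    subject = "Rain Notification"               # derived by the LLM

    # 2. Execute the chat query like a normal Khoj chat request.
    result = execute_chat_query(chat_query)

    # 3. Compare the result against the original request to decide
    #    whether a notification email is warranted at all.
    if condition_met(original_query, result):
        send_email(subject=subject, body=result)
```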

Nonetheless, appreciate the PR and feedback!

@debanjum merged commit 6e0c767 into khoj-ai:master on Jan 11, 2025 (6 checks passed).
@AlipAbdullah commented Jan 14, 2025 via email
