Ollama - failed to embed chunk #1836

Closed
3 tasks done
spaasis opened this issue Jul 26, 2024 · 6 comments
Labels
area:indexing Relates to embedding and indexing ide:jetbrains Relates specifically to JetBrains extension kind:bug Indicates an unexpected problem or unintended behavior

Comments

@spaasis
Contributor

spaasis commented Jul 26, 2024

Before submitting your bug report

Relevant environment info

- OS: Windows 11
- Continue: 0.0.56
- IDE: JetBrains Rider 2024.1.4
- Model: ollama
- config.json:
  
  "models": [
    {
      "model": "codellama:latest",
      "title": "codellama",
      "completionOptions": {},
      "apiBase": "http://X.X.X.X:8080/ollama",
      "contextLength": 4000,
      "provider": "ollama",
      "requestOptions": {
        "headers": {
          "Authorization": "Bearer eyJ..."
        }
      }
    }
  ],
  "tabAutocompleteModel": {
    "disable": true,
    "title": "Autocomplete",
    "provider": "ollama",
    "model": "starcoder2:3b",
    "apiBase": "http://X.X.X.X:8080/ollama",
    "requestOptions": {
      "headers": {
        "Authorization": "Bearer eyJ.."
      }
    }
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text:latest",
    "apiBase": "http://X.X.X.X:8080/ollama/",
    "requestOptions": {
      "headers": {
        "Authorization": "Bearer eyJ.."
      }
    }
  }

Description

Hi! I'm testing Continue's integration with our Open WebUI instance (https://github.com/open-webui/open-webui).

I got all the other pieces working, but the API calls to /api/embeddings fail. See the log entry below.

However, if I make the API call manually, it succeeds and returns embeddings:

curl -X 'POST' \
  'http://X.X.X.X:8080/ollama/api/embeddings' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer eyJ...' \
  -H 'Content-Type: application/json' \
  -d '{
  "model":"nomic-embed-text:latest",
  "prompt":"<chunk contents>"
}'

I tried reading through the source code to figure out what differs between my manual API call and the one Continue makes, but couldn't find a difference. Is there a debug log I could turn on to see the actual API calls?
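One way to see the request a fetch-based client really sends is to build a `Request` from the same arguments and log it before sending. The sketch below is hypothetical (`describeRequest` and `withRequestLogging` are not part of Continue) and assumes a Node 18+ runtime with the global Fetch API. Its value is that it shows headers the runtime fills in by itself, such as the default `Content-Type` for a string body, which a hand-written curl command never reveals.

```typescript
// Hypothetical debugging helpers; not part of Continue's codebase.
// Assumes Node 18+ (global Request/Response/fetch).
type FetchLike = (input: string | URL, init?: RequestInit) => Promise<Response>;

// Describe the request as the Fetch API will see it, including headers
// filled in by default (e.g. Content-Type for a string body).
function describeRequest(input: string | URL, init?: RequestInit): string[] {
  const preview = new Request(input, init);
  const lines = [`${preview.method} ${preview.url}`];
  preview.headers.forEach((value, name) => {
    // Header names come back lowercased; keep bearer tokens out of logs.
    lines.push(`  ${name}: ${name === "authorization" ? "<redacted>" : value}`);
  });
  if (typeof init?.body === "string") {
    lines.push(`  body: ${init.body}`);
  }
  return lines;
}

// Wrap any fetch-like function so every call is logged before it is sent.
function withRequestLogging(
  inner: FetchLike,
  log: (line: string) => void = console.log,
): FetchLike {
  return (input, init) => {
    describeRequest(input, init).forEach((line) => log(line));
    return inner(input, init);
  };
}

// Usage: an embeddings call shaped like the curl command, but without an
// explicit Content-Type header. The inner "fetch" is a stub, so nothing is sent.
const loggedFetch = withRequestLogging(async () => new Response("{}"));
void loggedFetch("http://127.0.0.1:8080/ollama/api/embeddings", {
  method: "POST",
  headers: { Authorization: "Bearer <token>" },
  body: JSON.stringify({ model: "nomic-embed-text:latest", prompt: "hello" }),
});
```

With Node's built-in fetch, the logged headers include `content-type: text/plain;charset=UTF-8`, because the Fetch standard assigns that type to string bodies when no `Content-Type` is given. Other fetch implementations may differ, so check the one your client actually uses.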

Btw, the apiBase configuration for embeddingsProvider requires a trailing /, but the other model configurations don't ;) Before I spotted this, I got a bunch of "method not allowed" messages.
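A plausible reason the trailing slash matters is standard relative-URL resolution: if a client builds the endpoint with something like `new URL("api/embeddings", apiBase)`, the last path segment of a base without a trailing slash is replaced rather than extended. That is an assumption about Continue's URL construction, not a reading of its code. The sketch shows the behavior with the WHATWG `URL` class, plus a hypothetical `joinApiBase` helper that normalizes the base first.

```typescript
// Standard WHATWG URL resolution (Node and browsers): a relative path
// replaces the base's last segment unless the base ends with "/".
const withoutSlash = new URL("api/embeddings", "http://127.0.0.1:8080/ollama").href;
const withSlash = new URL("api/embeddings", "http://127.0.0.1:8080/ollama/").href;

// Hypothetical helper: always treat apiBase as a directory so that
// ".../ollama" and ".../ollama/" resolve to the same endpoint.
function joinApiBase(apiBase: string, path: string): string {
  const base = apiBase.endsWith("/") ? apiBase : `${apiBase}/`;
  // Strip leading slashes so the path resolves relative to the base,
  // not against the server root.
  return new URL(path.replace(/^\/+/, ""), base).href;
}

console.log(withoutSlash); // http://127.0.0.1:8080/api/embeddings
console.log(withSlash); // http://127.0.0.1:8080/ollama/api/embeddings
console.log(joinApiBase("http://127.0.0.1:8080/ollama", "/api/embeddings")); // http://127.0.0.1:8080/ollama/api/embeddings
```

Without the slash, the request would land on `/api/embeddings` at the server root instead of the `/ollama` proxy route, which could explain the "method not allowed" responses.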

Let me know if I can help you debug further. Loving Continue so far!

To reproduce

Open any project and start indexing

Log output

[2024-07-26T08:36:23] Failed to generate embedding for <filename> with provider: OllamaEmbeddingsProvider::nomic-embed-text:latest: Error: Failed to embed chunk: {"detail":[{"type":"model_attributes_type","loc":["body"],"msg":"Input should be a valid dictionary or object to extract fields from",
"input":"{\"model\":\"nomic-embed-text:latest\",\"prompt\":\"<chunk contents>\"}"}]}
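The error body looks like a FastAPI request-validation error (Open WebUI's backend is built on FastAPI, and `model_attributes_type` is a Pydantic v2 error type). The telling detail is that `input` is the JSON text itself, a string, rather than a parsed object. As far as I understand FastAPI, it parses the body as JSON only when `Content-Type` is a JSON media type or is absent; otherwise the body model receives raw text and fails with exactly this message. The TypeScript sketch below is a simplified, hypothetical model of that decision, not FastAPI's actual code.

```typescript
// Simplified, hypothetical model of how a FastAPI-style server decides
// whether to parse a request body as JSON before validating it.
type ParsedBody =
  | { kind: "json"; value: unknown }
  | { kind: "raw"; value: string };

function parseBody(contentType: string | null, raw: string): ParsedBody {
  if (contentType === null) {
    // No Content-Type header at all: attempt JSON.
    return { kind: "json", value: JSON.parse(raw) };
  }
  // Ignore parameters such as "; charset=UTF-8" and compare the media type.
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  const [mainType, subType = ""] = mediaType.split("/");
  if (mainType === "application" && (subType === "json" || subType.endsWith("+json"))) {
    return { kind: "json", value: JSON.parse(raw) };
  }
  // Anything else (text/plain, form data, ...) is passed on as raw text.
  return { kind: "raw", value: raw };
}

const MSG = "Input should be a valid dictionary or object to extract fields from";

// A body model with fields needs an object; a plain string cannot supply them.
function validateEmbeddingBody(body: ParsedBody): string | null {
  if (body.kind === "raw") return MSG;
  const v = body.value;
  if (typeof v !== "object" || v === null || Array.isArray(v)) return MSG;
  return null;
}

const payload = JSON.stringify({ model: "nomic-embed-text:latest", prompt: "hello" });
console.log(validateEmbeddingBody(parseBody("text/plain;charset=UTF-8", payload)));
console.log(validateEmbeddingBody(parseBody("application/json", payload))); // null
```

The same JSON payload passes or fails depending only on the header, which matches the symptom of an identical body working from curl but not from the extension.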
@dosubot dosubot bot added area:indexing Relates to embedding and indexing ide:jetbrains Relates specifically to JetBrains extension kind:bug Indicates an unexpected problem or unintended behavior labels Jul 26, 2024
@spaasis
Contributor Author

spaasis commented Jul 26, 2024

To clarify: the question is more about what embedding API call is actually sent than about debugging the error message from the API itself. Since the manual curl call works, I believe there's just a slight difference in the data being sent.

@Patrick-Erichsen
Collaborator

Patrick-Erichsen commented Jul 26, 2024

Good catch on the trailing slash! That's an Ollama-specific issue, since we're doing some additional URL construction. I pushed a fix to resolve it.

Thanks for verifying that the curl call works on your end. Unfortunately, we don't have any debug logs for embeddings at the moment. Your best bet would probably be to run Continue locally following our contributing guidelines (https://github.com/continuedev/continue/blob/main/CONTRIBUTING.md) and set some debug breakpoints.

From the error message you have, does anything seem unusual in the chunk that is getting embedded?

@spaasis
Contributor Author

spaasis commented Jul 27, 2024

I'll try a local run on Monday, but here are the raw logs (file name changed, but the slashes are as they were) and the curl call for one file's embedding. It seems every single file fails, so I doubt it's related to the chunk contents:

Log:

[2024-07-27T08:08:13] Failed to generate embedding for D:/Code/Project\.editorconfig with provider: OllamaEmbeddingsProvider::nomic-embed-text:latest:
Error: Failed to embed chunk: {"detail":[{"type":"model_attributes_type","loc":["body"],"msg":"Input should be a valid dictionary or object to extract fields from",
"input":"{\"model\":\"nomic-embed-text:latest\",\"prompt\":\"# Remove the line below if you want to inherit .editorconfig settings from higher directories\\r\\nroot = true\\r\\n\\r\\n# All files\\r\\n[*]\\r\\ncharset = utf-8\\r\\n# indent_size intentionally not specified in this section.\\r\\nindent_style = space # Use soft tabs (spaces) for indentation.\\r\\ninsert_final_newline = false\\r\\ntrim_trailing_whitespace = true\\r\\n\\r\\n# ReSharper properties\\r\\nresharper_wrap_array_initializer_style = chop_if_long\\r\\nresharper_wrap_object_and_collection_initializer_style = chop_if_long\\r\\n\\r\\n# JSON files\\r\\n[*.json]\\r\\nindent_size = 2\\r\\n\\r\\n# Markdown files\\r\\n[*.md]\\r\\nindent_size = 2\\r\\ntrim_trailing_whitespace = false\\r\\n\\r\\n# PowerShell scripts\\r\\n[*.ps1]\\r\\nindent_size = 4\\r\\n\\r\\n[*.{xml,xsd}]\\r\\nmax_line_length = off\\r\\nend_of_line = lf\\r\\nindent_size = 2\\r\\n\\r\\n# Visual Studio XML project files\\r\\n[*.{csproj,vcxproj,vcxproj.filters,proj,projitems,shproj}]\\r\\nindent_size = 2\\r\\nmax_line_length = off\\r\\nend_of_line = lf\\r\\n\\r\\n# Visual Studio and .NET related XML config files\\r\\n[*.{props,targets,ruleset,config,nuspec,resx,vsixmanifest,vsct}]\\r\\nindent_size = 2\\r\\nmax_line_length = off\\r\\nend_of_line = lf\\r\\n\\r\\n# YAML files\\r\\n[*.{yml,yaml}]\\r\\nindent_size = 2\\r\\n\\r\\n# C# files\\r\\n[*.{cs,cshtml}]\\r\\n\"}"}]} 

Curl:

curl -X 'POST' \
  'http://sykeai:8080/ollama/api/embeddings' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer eyJ...' \
  -H 'Content-Type: application/json' \
  -d '{"model":"nomic-embed-text:latest","prompt":"# Remove the line below if you want to inherit .editorconfig settings from higher directories\\r\\nroot = true\\r\\n\\r\\n# All files\\r\\n[*]\\r\\ncharset = utf-8\\r\\n# indent_size intentionally not specified in this section.\\r\\nindent_style = space # Use soft tabs (spaces) for indentation.\\r\\ninsert_final_newline = false\\r\\ntrim_trailing_whitespace = true\\r\\n\\r\\n# ReSharper properties\\r\\nresharper_wrap_array_initializer_style = chop_if_long\\r\\nresharper_wrap_object_and_collection_initializer_style = chop_if_long\\r\\n\\r\\n# JSON files\\r\\n[*.json]\\r\\nindent_size = 2\\r\\n\\r\\n# Markdown files\\r\\n[*.md]\\r\\nindent_size = 2\\r\\ntrim_trailing_whitespace = false\\r\\n\\r\\n# PowerShell scripts\\r\\n[*.ps1]\\r\\nindent_size = 4\\r\\n\\r\\n[*.{xml,xsd}]\\r\\nmax_line_length = off\\r\\nend_of_line = lf\\r\\nindent_size = 2\\r\\n\\r\\n# Visual Studio XML project files\\r\\n[*.{csproj,vcxproj,vcxproj.filters,proj,projitems,shproj}]\\r\\nindent_size = 2\\r\\nmax_line_length = off\\r\\nend_of_line = lf\\r\\n\\r\\n# Visual Studio and .NET related XML config files\\r\\n[*.{props,targets,ruleset,config,nuspec,resx,vsixmanifest,vsct}]\\r\\nindent_size = 2\\r\\nmax_line_length = off\\r\\nend_of_line = lf\\r\\n\\r\\n# YAML files\\r\\n[*.{yml,yaml}]\\r\\nindent_size = 2\\r\\n\\r\\n# C# files\\r\\n[*.{cs,cshtml}]\\r\\n"}'

Curl response:

{
  "embedding": [
    1.3733816146850586,
    1.6329410076141357,
    -1.99997079372406,
    -0.805694580078125,
    -0.4809744954109192,
    ----
  ]
}
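The successful response has the shape `{ "embedding": number[] }`. If a client should fail loudly on anything else (for example the `{"detail": ...}` validation error in the log above) rather than reporting a vague failure, a small type guard is enough. This is a hypothetical sketch; the field name is taken from the response shown here, not from an API specification.

```typescript
// Hypothetical shape check based on the response above: { embedding: number[] }.
interface EmbeddingResponse {
  embedding: number[];
}

function isEmbeddingResponse(value: unknown): value is EmbeddingResponse {
  if (typeof value !== "object" || value === null) return false;
  const embedding = (value as { embedding?: unknown }).embedding;
  return (
    Array.isArray(embedding) &&
    embedding.length > 0 &&
    embedding.every((x) => typeof x === "number" && Number.isFinite(x))
  );
}

function parseEmbedding(text: string): number[] {
  const parsed: unknown = JSON.parse(text);
  if (!isEmbeddingResponse(parsed)) {
    // Surface the server's body (e.g. a validation error) in the message.
    throw new Error(`Unexpected embeddings response: ${text}`);
  }
  return parsed.embedding;
}

console.log(parseEmbedding('{"embedding":[1.37,1.63,-1.99]}').length); // 3
```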

@simonoscr

I had the same issue. I thought Content-Type: application/json was the default and set automatically, but it seems that isn't the case. I added it, and now it works.

I can see that you're also missing Content-Type in the embeddingsProvider part of your config.json, but you included it in your curl request. That may be why your curl request works but your config doesn't.

@spaasis
Contributor Author

spaasis commented Jul 28, 2024

That's it, thanks @simonoscr! I also figured it was set automatically since the other configs didn't need it, but adding Content-Type fixed it.

Working config:

  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text:latest",
    "apiBase": "http://XX.XX.XX.XX:8080/ollama/",
    "requestOptions": {
      "headers": {
        "Authorization": "Bearer eyJ...",
        "Content-Type": "application/json"
      }
    }
  },

I opened PR #1855 to hopefully fix this by default.
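Fixing this by default amounts to adding a JSON `Content-Type` to the embeddings request unless the user already set one, so that user-supplied `requestOptions.headers` still win. The helper below is hypothetical (`buildEmbeddingsRequest` is not Continue's API, and this is not necessarily how #1855 implements it); it assumes Node 18+ globals and the `/api/embeddings` payload used in the curl calls above.

```typescript
// Hypothetical sketch: build the embeddings request with a JSON Content-Type
// unless the user's requestOptions.headers already provide one.
function buildEmbeddingsRequest(
  apiBase: string,
  model: string,
  prompt: string,
  userHeaders: Record<string, string> = {},
): Request {
  const headers = new Headers(userHeaders);
  // Headers lookups are case-insensitive, so "content-type" and
  // "Content-Type" in the user's config are both respected.
  if (!headers.has("Content-Type")) {
    headers.set("Content-Type", "application/json");
  }
  // Treat apiBase as a directory so a missing trailing "/" is harmless.
  const base = apiBase.endsWith("/") ? apiBase : `${apiBase}/`;
  return new Request(new URL("api/embeddings", base), {
    method: "POST",
    headers,
    body: JSON.stringify({ model, prompt }),
  });
}

const req = buildEmbeddingsRequest(
  "http://127.0.0.1:8080/ollama",
  "nomic-embed-text:latest",
  "hello",
  { Authorization: "Bearer <token>" },
);
console.log(req.url); // http://127.0.0.1:8080/ollama/api/embeddings
console.log(req.headers.get("content-type")); // application/json
```

Because the header is set explicitly, the Fetch API does not fall back to `text/plain;charset=UTF-8` for the string body.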

@spaasis spaasis closed this as completed Jul 29, 2024
@Patrick-Erichsen
Collaborator

Thank you @spaasis and @simonoscr for seeing this through to completion! Appreciate the PR to fix the behavior for everyone.

3 participants