Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture #12466
Conversation
- Adds MoE-based embedding model supporting multilingual embeddings.
- Selects architecture variant based on hyperparameter detection (MoE layers).
- Removes unnecessary subclass initialization checks for clarity.

https://www.nomic.ai/blog/posts/nomic-embed-text-v2

Co-authored-by: Jared Van Bortel <[email protected]>
Force-pushed from 3baf094 to e07039b
The MoE model is now using the correct tokenizer (XLMRoberta), and norm_w is now correctly set to false. Getting an MSE of about 6e-7 compared to the HF embeddings with a simple prompt, so the implementation should be ready to use.
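For anyone who wants to reproduce that kind of check, here is a rough sketch (not the exact script used above): it assumes llama-server is running locally with --embeddings on the converted GGUF at the path shown later in this thread, and uses the sentence-transformers loading recipe from the model card as the reference. The prefix is passed explicitly on both sides.

```python
# Sketch only: compare a llama.cpp embedding against the HF reference for one prompt.
import requests
from sentence_transformers import SentenceTransformer

prompt = 'search_document: Hello!'

# Embedding from the converted GGUF via the llama.cpp server
# (the OAI-compatible endpoint returns one embedding per input string).
resp = requests.post('http://localhost:8080/v1/embeddings', json={'input': [prompt]}).json()
gguf_emb = resp['data'][0]['embedding']

# Reference embedding from the original model; normalize so both sides are comparable.
model = SentenceTransformer('nomic-ai/nomic-embed-text-v2-moe', trust_remote_code=True)
hf_emb = model.encode([prompt], normalize_embeddings=True)[0]

mse = sum((a - b) ** 2 for a, b in zip(gguf_emb, hf_emb)) / len(hf_emb)
print(f'MSE vs HF: {mse:.1e}')
```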
Looks good overall, just a small comment
src/llama-graph.cpp (Outdated)

```diff
@@ -907,31 +907,38 @@ ggml_tensor * llm_graph_context::build_moe_ffn(
         cb(cur, "ffn_moe_weighted", il);
     }

-    ggml_tensor * up = build_lora_mm_id(up_exps, cur, selected_experts); // [n_ff, n_expert_used, n_tokens]
-    cb(up, "ffn_moe_up", il);
+    ggml_tensor * tmp = build_lora_mm_id(up_exps, cur, selected_experts); // [n_ff, n_expert_used, n_tokens]
```
I think we can still call this `up`, right? There are no other places where we re-assign another value to `tmp`.
The only reason to call it `tmp` would be that that's what the (non-MoE) `build_ffn` calls it, which makes it easier to compare the two functions. In that function, `up`, `down`, and `gate` refer to weight tensors, not output tensors. But `up` is fine here.
Based on the documentation, I came up with the following usage example for the `search_document` instruction:

```sh
llama-server \
    -m models/nomic-embed-text-v2-moe/ggml-model-f16.gguf \
    --embeddings
```

```sh
curl http://localhost:8080/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{"input": ["search_document: Hello!", "search_document: ¡Hola!", "search_document: Goodbye"]}' | jq
```

Is this correct?

Could you also show an example of how the `search_query` prefix is to be used?
Yes, that's the basic way the prefixes should be used.

`search_query:` is a prefix, used the same way as `search_document:`:

```python
import requests

def dot(va, vb):
    return sum(a * b for a, b in zip(va, vb))

def embed(texts):
    resp = requests.post('http://localhost:8080/v1/embeddings', json=dict(input=texts)).json()
    return [d['embedding'] for d in resp['data']]

docs = ['嵌入很酷', '骆驼很酷']  # 'embeddings are cool', 'llamas are cool'
docs_embed = embed(['search_document: ' + d for d in docs])

query = '跟我讲讲嵌入'  # 'tell me about embeddings'
query_embed = embed(['search_query: ' + query])[0]

print(f'query: {query!r}')
for d, e in zip(docs, docs_embed):
    print(f'similarity {dot(query_embed, e):.2f}: {d!r}')
```

Output:
@cebtenzzre The readme at https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe says that the max sequence length is 512. Is this correct, or is it 2048 as specified in the model configuration?