I want to build a CLI program that outputs embeddings for arbitrary input. To do that, I need to run inference with an embedding model, and I chose NV-Embed-v2. My framework of choice is Candle, though I also looked at mistral.rs.
Essentially, I'm trying to reproduce this usage example, https://huggingface.co/nvidia/NV-Embed-v2, but in Rust with Candle.
I started from Candle's Mistral example, because NV-Embed-v2's Hugging Face page says: Model Details / Base Decoder-only LLM: Mistral-7B-v0.1.
I replaced the model id in the example code with nvidia/NV-Embed-v2 and was able to download the weights from Hugging Face, but loading the config fails with:
Error: missing field `vocab_size` at line 101 column 1
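To see which fields the downloaded config.json actually exposes at its top level (and so which of the fields Candle's Mistral config expects are missing), I did a quick scan of the file. serde_json would be the proper tool; the sketch below is dependency-free, and the sample config is a made-up excerpt, not NV-Embed's real file:

```rust
// Naive top-level key scanner for a JSON object, just for eyeballing a
// config.json. Collects every depth-1 string that is followed by ':'.
fn top_level_keys(json: &str) -> Vec<String> {
    let chars: Vec<char> = json.chars().collect();
    let mut keys = Vec::new();
    let mut depth = 0i32;
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            '{' | '[' => depth += 1,
            '}' | ']' => depth -= 1,
            '"' => {
                // Scan the whole string literal, honoring escapes.
                let start = i + 1;
                let mut j = start;
                while j < chars.len() && chars[j] != '"' {
                    if chars[j] == '\\' {
                        j += 1;
                    }
                    j += 1;
                }
                let s: String = chars[start..j].iter().collect();
                // A key is a depth-1 string whose next non-space char is ':'.
                let mut k = j + 1;
                while k < chars.len() && chars[k].is_whitespace() {
                    k += 1;
                }
                if depth == 1 && k < chars.len() && chars[k] == ':' {
                    keys.push(s);
                }
                i = j; // resume after the closing quote
            }
            _ => {}
        }
        i += 1;
    }
    keys
}

fn main() {
    // Hypothetical excerpt -- the real config.json will differ; the point
    // is to list which fields are actually present at the top level.
    let config = r#"{
        "model_type": "nvembed",
        "hidden_size": 4096,
        "decoder_config": {"vocab_size": 32000}
    }"#;
    for key in top_level_keys(config) {
        println!("{key}");
    }
    // prints: model_type, hidden_size, decoder_config (one per line)
}
```

In my case this made it clear that the fields Candle's deserializer wants are not where it expects them, which is why I resorted to hardcoding below.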
I then hardcoded the values from the JSON config on Hugging Face into a newly created candle_transformers::models::mistral::Config instance. After that, Mistral::new(&config, vb) fails with:
Error: cannot find tensor model.embed_tokens.weight
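To diagnose this, I also listed the tensor names stored in the downloaded checkpoint. A .safetensors file starts with an 8-byte little-endian length followed by that many bytes of JSON header, so the names can be read with no dependencies. The prefixed tensor name below is hypothetical, just to illustrate what a mismatch would look like:

```rust
/// Extract the JSON header of a .safetensors buffer: 8-byte LE length,
/// then that many bytes of JSON describing every tensor by name.
fn safetensors_header(buf: &[u8]) -> Option<String> {
    let len = u64::from_le_bytes(buf.get(..8)?.try_into().ok()?) as usize;
    let json = buf.get(8..8 + len)?;
    String::from_utf8(json.to_vec()).ok()
}

fn main() {
    // Build a tiny in-memory checkpoint so the sketch is self-contained;
    // in practice you'd read model.safetensors from disk with std::fs::read.
    // The prefix "some_prefix." is made up for illustration.
    let header =
        br#"{"some_prefix.embed_tokens.weight":{"dtype":"F32","shape":[2,2],"data_offsets":[0,16]}}"#;
    let mut file = Vec::new();
    file.extend_from_slice(&(header.len() as u64).to_le_bytes());
    file.extend_from_slice(header);
    file.extend_from_slice(&[0u8; 16]); // raw tensor data

    // The keys of this JSON object are the tensor names to compare against
    // the name Candle is looking for (model.embed_tokens.weight).
    println!("{}", safetensors_header(&file).unwrap());
}
```

My suspicion is that the checkpoint stores the Mistral tensors under a different prefix than the `model.` that Candle's Mistral loader expects (possibly alongside extra layers). If it turns out to be only a prefix difference, candle_nn's VarBuilder has a pp(...) method to descend into a prefix before calling Mistral::new, but I'm not sure that alone would be enough here.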
Is there a way around this? Are there other Candle-based open-source projects I could use as inspiration, or is this a common mistake that is easy to diagnose?