Endpoint for requesting text-to-embedding inference.
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
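A minimal sketch of how this header might be assembled for an HTTP client in Python; the token value is a placeholder, not a real credential:

```python
# Sketch: building the Bearer authentication header.
# AUTH_TOKEN is a placeholder for your actual auth token.
AUTH_TOKEN = "your-auth-token"

headers = {
    "Authorization": f"Bearer {AUTH_TOKEN}",
    "Content-Type": "application/json",
}
```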
Request body (application/json): text-to-embedding conversion parameters.
Input text(s) to generate embeddings for. Can be a single string or an array of strings (max 2048 items). Each input is limited to 8192 tokens, and the total request is limited to 300k tokens.
"This is a sample text for embedding generation."
The embedding model to use. Available models can be retrieved via the GET /api/v1/client/models endpoint, which can also be used to look up a model by slug, check specific limits and features, and verify LoRA availability.
"Bge_M3_FP16"
If true, the embedding result is returned directly in the response instead of only a download URL. Optional parameter.
false
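A hedged end-to-end sketch of the request under stated assumptions: the base URL, the endpoint path, and the name of the optional direct-return flag ("include_result") are placeholders for illustration only; the input limits and model value come from this reference.

```python
import requests

# Sketch of a text-to-embedding request. The endpoint URL and the
# "include_result" flag name are assumptions, not confirmed by this
# reference; adjust them to the actual API.
AUTH_TOKEN = "your-auth-token"
BASE_URL = "https://api.example.com"  # placeholder base URL

payload = {
    # Single string or array of strings (max 2048 items,
    # 8192 tokens per input, 300k tokens per request).
    "input": ["This is a sample text for embedding generation."],
    "model": "Bge_M3_FP16",
    # Hypothetical name for the optional flag that returns the
    # embedding directly instead of only a download URL.
    "include_result": True,
}

response = requests.post(
    f"{BASE_URL}/embeddings",  # placeholder path
    headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
    json=payload,
)
response.raise_for_status()
print(response.json())
```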
ID of the inference request.
Information from the success endpoint.
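A short sketch of how the response body might be read. The field names used here ("id", "result", "download_url") are assumptions; the reference only states that an inference request ID is returned and that the embedding is either available via a download URL or inline when the direct-return flag is set.

```python
def parse_embedding_response(data: dict) -> dict:
    # Field names are assumptions for illustration, not the
    # confirmed response schema.
    return {
        "request_id": data.get("id"),            # ID of the inference request (assumed key)
        "embedding": data.get("result"),         # present only when direct return was requested (assumed key)
        "download_url": data.get("download_url"), # otherwise fetch the result from here (assumed key)
    }
```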