Fetching the live model list
Before you start building, call the models endpoint (`GET /api/v1/client/models`) to discover what’s available. The response contains a `data` array of model objects. Each object contains:
| Field | Description |
|---|---|
| `name` | Human-friendly display name (for UI only — never send this to the API). |
| `slug` | The exact string to pass as the `model` parameter in any task endpoint. |
| `inference_types` | Array of task types this model supports (e.g. `["txt2img"]`, `["img2video", "txt2video"]`). |
| `info.limits` | Min/max constraints for parameters like width, height, steps, frames, fps, etc. Fields vary per model type. |
| `info.features` | Capability flags — e.g. `supports_guidance`, `supports_negative_prompt`, `supports_steps`, `supports_last_frame`. Not every model includes this field; some return it as an empty object. |
| `info.defaults` | Recommended default values for each parameter. Not every model includes this field. |
| `loras` | Array of available LoRA adapters (`display_name` + `name`). Present only on models that support LoRAs. |
| `languages` | Array of supported languages, each with available voices. Present only on speech models. |
Not every model returns all fields. For example, a transcription model may return an empty `info`, while an image-generation model will include detailed limits, features, and defaults. Always check for the presence of fields before using them. To narrow the list to a single task type, use the `filter[inference_types]` query parameter.
How model selection works
- Every task endpoint requires a `model` parameter. Pass the `slug` value returned by the models endpoint — not the display name.
- Quality ↔ Speed trade-off. Larger models often yield higher quality but cost more and take longer. See the Pricing page for per-task rates.
- Versioning & lifecycle. Models may be updated, superseded, or deprecated. Re-fetch the model list periodically to stay current.
Supported tasks
The table below shows which task types deAPI supports. To see which models are currently available for a given task, query the models endpoint with the corresponding `filter[inference_types]` value.
| Task | inference_types value | What it does |
|---|---|---|
| Text-to-Image | txt2img | Generate images from text prompts — concept art, prototyping, creative exploration. |
| Image-to-Image | img2img | Transform existing images — style transfer, edits, inpainting, outpainting. |
| Text-to-Speech | txt2audio | Turn text into natural voice — narration, accessibility, product voices. Supports voice cloning and voice design. |
| Text-to-Music | txt2music | Generate music tracks from text — background music, jingles, songs with vocals. |
| Video-to-Text | video2text | Transcribe video by URL (YouTube, Twitch, Kick, TikTok, X) into text. |
| Audio-to-Text | audio2text | Transcribe audio by URL into text — subtitles, notes, search, accessibility. |
| Video File-to-Text | video_file2text | Transcribe an uploaded video file into text. |
| Audio File-to-Text | audio_file2text | Transcribe an uploaded audio file into text. |
| Image-to-Text (OCR) | img2txt | Extract text and meaning from images — OCR, descriptions, accessibility. |
| Text-to-Video | txt2video | Generate short AI video clips from a text prompt. |
| Image-to-Video | img2video | Animate a still image into a short video clip. |
| Text-to-Embedding | txt2embedding | Create vector embeddings — search, RAG, semantic similarity, clustering. |
| Background Removal | img-rmbg | Remove background from images — product photos, portraits, compositing. |
| Image Upscale | img-upscale | Upscale images to higher resolution. |
Some models support multiple tasks (e.g. both `txt2img` and `img2img`). The models endpoint will list all supported `inference_types` for each model.

Choosing the right model
When the models endpoint returns several options for the same task, use these guidelines:
- Image generation — Start with the fastest model for iteration. Increase steps and resolution for final quality. If the model object includes a non-empty `loras` array, you can use LoRA adapters for style control.
- Speech generation (TTS) — Check the model’s `languages` array for available languages and voices. Use `info.defaults` for recommended speed and format settings. The endpoint supports three modes: `custom_voice` (preset speakers), `voice_clone` (clone from reference audio), and `voice_design` (create voice from a text description). Not all models support all modes — check model capabilities before selecting a mode.
- Music generation — Provide a text description (caption) of the desired music style. Optionally include lyrics (use `"[Instrumental]"` for instrumental tracks), bpm, keyscale, and timesignature to fine-tune the output. Check `info.limits` for the supported duration range and inference steps. Use fewer steps with turbo models (e.g. 8) and more steps with base models (e.g. 32+).
- Transcription (Video/Audio-to-Text) — Transcription models support both URL-based and file-upload transcription. For long content, enable timestamps (`include_ts: true`). URL-based transcription works with YouTube, Twitch, Kick, and X/Twitter.
- OCR (Image-to-Text) — Check `info.limits` for the maximum supported image dimensions. For complex layouts, consider multiple passes or post-processing.
- Video generation — Start with low frame counts to validate aesthetics, then scale up. Check `info.limits.max_frames`, `min_frames`, and `max_fps` for each model. Some models support a last_frame feature (see `info.features.supports_last_frame`).
- Embeddings — Check `info.limits.max_input_tokens` and `max_total_tokens` for batch sizing. Use for semantic search, clustering, and retrieval-augmented generation (RAG).
- Background removal — Check `info.limits.max_width` and `max_height` for the maximum supported resolution.
- Image upscale — Check `info.limits` for input size constraints.
Parameter limits & resolution rules
Each model defines its own limits in the `info.limits` object. These limits vary between models and task types. Common fields include:
- Dimensions: `min_width`, `max_width`, `min_height`, `max_height`, and (for image models) `resolution_step` — the value that width/height must be divisible by.
- Steps: `min_steps`, `max_steps` — how many inference steps the model supports.
- Video-specific: `min_frames`, `max_frames`, `min_fps`, `max_fps`.
- Text-specific: `max_input_tokens`, `max_total_tokens` (for embedding models), `min_text`, `max_text` (for speech models).

Always round `width` and `height` to a multiple of the model’s `resolution_step` before sending the request.
Some models do not support guidance. Check `info.features.supports_guidance` — if it’s false, do not send a `guidance` value, or set it to 0.
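A minimal sketch of applying these rules, assuming `info.limits` uses the field names listed above (the sample limits values are hypothetical):

```python
def snap_to_limits(width: int, height: int, limits: dict) -> tuple[int, int]:
    """Clamp width/height to the model's min/max and round down to a
    multiple of resolution_step (field names from info.limits)."""
    step = limits.get("resolution_step", 1)

    def snap(value: int, lo: int, hi: int) -> int:
        value = max(lo, min(hi, value))  # clamp into [lo, hi]
        return (value // step) * step    # round down to a step multiple

    return (
        snap(width, limits["min_width"], limits["max_width"]),
        snap(height, limits["min_height"], limits["max_height"]),
    )

# Hypothetical limits object
limits = {"min_width": 256, "max_width": 1536,
          "min_height": 256, "max_height": 1536, "resolution_step": 64}
snap_to_limits(1000, 3000, limits)  # → (960, 1536)
```

The same pattern applies to steps, frames, and fps: clamp the requested value into the model's min/max range before submitting.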
API usage examples
1. Discover models for your task via `GET /api/v1/client/models`.
2. Submit a task with the chosen `slug`; the response returns a `request_id`.
3. Poll results with `GET /api/v1/client/request-status/{request_id}`.
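The polling step can be sketched as follows. The status path comes from this page; the base URL is a placeholder, and the `"completed"`/`"failed"` status values are assumptions — check the execution-modes docs for the actual response schema:

```python
import json
import time
import urllib.request

BASE = "https://api.example.com"  # placeholder base URL

def status_url(request_id: str) -> str:
    # Polling endpoint: GET /api/v1/client/request-status/{request_id}
    return f"{BASE}/api/v1/client/request-status/{request_id}"

def poll(request_id: str, interval: float = 2.0, timeout: float = 120.0) -> dict:
    """Poll the status endpoint until the task finishes or times out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        with urllib.request.urlopen(status_url(request_id)) as resp:
            payload = json.load(resp)
        # Assumed terminal statuses — verify against the API reference
        if payload.get("status") in ("completed", "failed"):
            return payload
        time.sleep(interval)
    raise TimeoutError(f"request {request_id} did not finish in {timeout}s")
```

For production use, prefer webhooks or WebSockets (see Execution Modes) over tight polling loops.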
Best practices
- Resolve models during integration. Fetch the model list when building your integration and re-fetch periodically (e.g. daily or on deployment) to stay current. There’s no need to call it on every request — the list doesn’t change that often.
- Respect `info.limits` and `info.defaults`. Use the returned defaults as a starting point. Stay within min/max boundaries to avoid unexpected rounding or errors. Note that some required fields (like `seed` for image generation) may not be listed in the model response — refer to the task endpoint docs for the full set of required parameters.
- Pin slugs only when you need reproducibility. If you need consistent results across calls, keep the same slug and seed. But check the model list periodically — a slug may be retired and replaced.
- Budget before scaling. Larger models and higher resolution/steps cost more — see the Pricing page for per-task rates.
- Handle deprecation gracefully. If a model returns an error, re-fetch the model list and switch to a suitable alternative.
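A sketch of the deprecation-fallback pattern, with the HTTP calls abstracted behind caller-supplied callables (`submit` and `list_models` are hypothetical hooks — wire them to your own client):

```python
def submit_with_fallback(submit, list_models, slug: str, task: str):
    """Try `slug`; on error, re-fetch the model list and retry with the
    first alternative that supports `task`. Re-raises if none is found."""
    try:
        return submit(slug)
    except Exception:
        for model in list_models():
            alt = model["slug"]
            if alt != slug and task in model.get("inference_types", []):
                return submit(alt)
        raise
```

In practice you would also log the failure and distinguish deprecation errors from transient ones before falling back.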
For AI agents & LLMs
If you are an AI agent, MCP client, or LLM integrating with deAPI:
- Call `GET /api/v1/client/models` at the start of your session to get the current model list. Do not rely on model slugs from training data, cached documentation, or prior conversations — they may be outdated.
- Use `filter[inference_types]` to narrow down to the task you need (e.g. `txt2img`, `txt2audio`).
- Read `info.limits` and `info.defaults` from the response to construct valid request parameters. Also consult the task endpoint docs for required fields that may not appear in the model response (e.g. `seed` for image generation).
- Pass the `slug` field (not `name`) as the `model` parameter in task endpoints.
- If a model slug returns an error, re-fetch the model list — the model may have been deprecated or replaced.
Related docs
- Model Selection endpoint — the live API spec for fetching models.
- Pricing — cost per task and model tier.
- Execution Modes — sync, async, webhooks, WebSockets.