deAPI gives you a unified API across multiple open-source models running on a decentralized GPU cloud. Models evolve frequently; always rely on the live model list (endpoint at the bottom of this page) for current availability and slugs.

How model selection works

  • Every task requires a model parameter. Example: model: "Flux1schnell" for Text-to-Image or model: "WhisperLargeV3" for speech/video transcription.
  • Display names vs. API slugs. In tables and UI we show human-friendly names (e.g., “FLUX.1-schnell”). The API accepts stable slugs. Use the Models endpoint to fetch the exact slug strings.
  • Quality ↔ Speed trade-off. Larger models often yield higher quality but cost more and take longer. Use our Price Calculator on the homepage to estimate cost before running large jobs.
  • Versioning & lifecycle. Models may be updated, superseded, or deprecated. Your application should resolve the model slug at runtime (from the live list) or pin to a specific version string if reproducibility is critical.
  • Safety & acceptable use. Follow the Terms of Service. Some content types may be blocked or filtered. See the Safety section in each task’s docs.
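
Resolving slugs at runtime, as recommended above, can be as simple as looking up a display name in the Models response. The sketch below assumes each entry carries `name` and `slug` fields; the actual field names may differ, so check the live `/client/models` response before relying on them.

```python
def resolve_slug(models: list, display_name: str) -> str:
    """Map a human-friendly display name to its API slug.
    `models` is the parsed list from GET /api/v1/client/models;
    the "name"/"slug" keys are assumptions, not confirmed field names."""
    for m in models:
        if m.get("name") == display_name:
            return m["slug"]
    raise LookupError(f"No model named {display_name!r} is currently available")

# Example with a hypothetical payload:
models = [
    {"name": "FLUX.1-schnell", "slug": "Flux1schnell"},
    {"name": "Whisper large-v3", "slug": "WhisperLargeV3"},
]
print(resolve_slug(models, "FLUX.1-schnell"))  # Flux1schnell
```

Fetching once at startup (or on a timer) keeps your client correct even as models are added or superseded.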

Supported tasks & models (curated)

The table below lists the core, production-ready models available today. For the authoritative list (including experimental or newly added ones), use the Models endpoint.
| Service (Task) | Short Summary | What it's for | Display Name | API Model Slug | Task key(s) |
|---|---|---|---|---|---|
| Text-to-Image | Generate images from text | Concept art, prototyping, creative exploration | FLUX.1-schnell | Flux1schnell | txt2img |
| Text-to-Image | Generate images from text | Photorealistic visuals with exceptional clarity and high-fidelity detail | Z-Image-Turbo INT8 | ZImageTurbo_INT8 | txt2img |
| Text-to-Speech | Turn text into natural voice | Narration, accessibility, product voices in multiple languages and tones | Kokoro-82M | Kokoro | txt2audio |
| Video-to-Text | Transcribe video into text | Transcribe YouTube, Twitch VODs, Kick streams, X/Twitter videos (subtitles, captions, SEO) | Whisper large-v3 | WhisperLargeV3 | video2txt |
| Image/Text-to-Video | Generate short AI videos | Cinematic motion, transitions, stylization | LTX Video-0.9.8 13B | Ltxv_13B_0_9_8_Distilled_FP8 | img2video, txt2video |
| Image-to-Text | Extract meaning from images | Descriptions, OCR, accessibility, moderation | Nanonets OCR S F16 | Nanonets_Ocr_S_F16 | img2txt |
| Audio-to-Text | Convert audio into text | Subtitles, notes, search, accessibility (multi-language) | Whisper large-v3 | WhisperLargeV3 | video2txt |
| Text-to-Embedding | Create vector embeddings | Search, RAG, semantic similarity, clustering | BGE M3 | Bge_M3_FP16 | txt2embedding |
| Image-to-Image | Transform existing images | Style transfer, edits, in/outpainting | QwenImageEdit-Plus (NF4) | QwenImageEdit_Plus_NF4 | img2img |
| Image Background Removal | Remove background from images | Product photos, portraits, cutouts, compositing, e-commerce assets | Ben2 | Ben2 | img-rmbg |
Display names and example slugs above are provided for clarity; always fetch the live model list for the exact slug strings currently enabled on the network.

Resolution requirements

Models that generate images or video require dimensions (width, height) divisible by a specific step value. If you provide dimensions that don’t match, the API will automatically round them up to the nearest valid value.
| Model | Task | Resolution Step | Example valid sizes |
|---|---|---|---|
| Flux1schnell | txt2img | 128 | 512, 640, 768, 896, 1024 |
| ZImageTurbo_INT8 | txt2img | 16 | 512, 528, 544, 768, 1024 |
| Ltxv_13B_0_9_8_Distilled_FP8 | txt2video, img2video | 16 | 256, 512, 768 |

For example, if you request width: 500 with Flux1schnell, the output will be 512px wide (rounded up to the nearest multiple of 128).
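
If you want to know the final dimensions before submitting a job, the server's round-up behavior is easy to mirror locally. This is a client-side sketch of the rounding rule described above, not the server's actual code:

```python
import math

def round_up_to_step(value: int, step: int) -> int:
    """Round a dimension up to the nearest multiple of `step`,
    mirroring the API's automatic rounding described in the docs."""
    return math.ceil(value / step) * step

print(round_up_to_step(500, 128))  # 512 (Flux1schnell)
print(round_up_to_step(512, 128))  # 512 (already valid, unchanged)
print(round_up_to_step(530, 16))   # 544 (ZImageTurbo_INT8 / LTX)
```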

Picking the right model

  • Image Generation (Text-to-Image): Start with Flux1schnell or ZImageTurbo_INT8 for fast iteration. Increase steps/resolution for quality; control style via prompt and (optionally) LoRAs.
  • Speech Generation (TTS): Kokoro offers natural prosody across multiple languages/voices. Choose voices/language in the payload; test speed vs. quality for your use case.
  • Text Transcription (Video/Audio-to-Text): WhisperLargeV3 is a strong general-purpose baseline with robust multilingual support. Supports direct URL transcription from YouTube, Twitch, Kick, and X/Twitter — including VODs and finished stream recordings. For long videos, enable timestamps and chunking.
  • Text Recognition (Image-to-Text): Nanonets_Ocr_S_F16 targets clear captions and text extraction. For complex layouts, consider multiple passes or post-processing.
  • Video Generation (Image-to-Video / Text-to-Video): Ltxv_13B_0_9_8_Distilled_FP8 is suited for short clips and stylized motion. Start with low duration/frames to validate aesthetics, then scale up.
  • Embedding (Text-to-Embedding): Bge_M3_FP16 provides dense vector embeddings for semantic search, clustering, and retrieval-augmented generation (RAG). Use for similarity queries or knowledge base indexing.
  • Image Transformation (Image-to-Image): For edits and style transfer, start with QwenImageEdit_Plus_NF4. Use fewer steps for quick drafts and increase steps for higher fidelity; combine with masks or control prompts for targeted changes.
  • Background Removal (Image Background Removal): Ben2 provides fast, high-quality background removal for images up to 2048×2048px. Ideal for product photos, portraits, and generating transparent PNGs for downstream compositing or design work.
For a deeper discussion of strengths/limits and typical parameter sets per task, see Model Selection.
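
The recommendations above collapse into a simple per-task default table your client can start from. Slugs are taken from the curated table on this page; treat them as starting points and re-resolve against the live model list before sending requests:

```python
# Default model per task key, based on the curated table above.
# Always re-check against GET /api/v1/client/models at runtime.
DEFAULT_MODELS = {
    "txt2img": "Flux1schnell",
    "txt2audio": "Kokoro",
    "video2txt": "WhisperLargeV3",
    "img2txt": "Nanonets_Ocr_S_F16",
    "txt2video": "Ltxv_13B_0_9_8_Distilled_FP8",
    "img2video": "Ltxv_13B_0_9_8_Distilled_FP8",
    "txt2embedding": "Bge_M3_FP16",
    "img2img": "QwenImageEdit_Plus_NF4",
    "img-rmbg": "Ben2",
}

def default_model(task: str) -> str:
    """Return a reasonable starting model for a task key."""
    try:
        return DEFAULT_MODELS[task]
    except KeyError:
        raise ValueError(f"No default model for task {task!r}")
```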

API usage examples

1) List models (fetch slugs at runtime)
curl -X GET "https://api.deapi.ai/api/v1/client/models" \
  -H "Authorization: Bearer $DEAPI_API_KEY"
2) Use a model in Text-to-Image
curl -X POST "https://api.deapi.ai/api/v1/client/txt2img" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "prompt": "isometric cozy cabin at dusk, soft rim light, artstation trending",
    "model": "Flux1schnell",
    "width": 768,
    "height": 768,
    "steps": 4,
    "guidance": 0,
    "seed": 12345,
    "loras": []
  }'
Flux1schnell has max_steps: 10 and does not support guidance (guidance must be 0). Always check model limits via the Models endpoint.
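
Catching these limit violations client-side avoids a wasted round trip. The sketch below encodes only the two Flux1schnell limits noted above (max_steps: 10, guidance must be 0); fetch the full, current limits from the Models endpoint rather than hardcoding them:

```python
def validate_flux_schnell(payload: dict) -> None:
    """Client-side sanity checks for a Flux1schnell txt2img payload,
    based on the limits noted in these docs. Not an exhaustive check;
    confirm current limits via GET /api/v1/client/models."""
    if payload.get("steps", 1) > 10:
        raise ValueError("Flux1schnell supports at most 10 steps")
    if payload.get("guidance", 0) != 0:
        raise ValueError("Flux1schnell does not support guidance; set it to 0")

validate_flux_schnell({"steps": 4, "guidance": 0})  # OK, no exception
```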
3) Use a model in Video-to-Text (YouTube)
curl -X POST "https://api.deapi.ai/api/v1/client/vid2txt" \
  -H "Authorization: Bearer $DEAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "include_ts": true,
    "model": "WhisperLargeV3"
  }'
Jobs return a request_id. Poll for results with GET /api/v1/client/request-status/{request_id}.
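
A minimal polling loop looks like the sketch below. The HTTP call is abstracted behind a `fetch_status` callable so the control flow is clear; the `"done"`/`"failed"` status values are assumptions for illustration, so check the Get Results docs for the actual status field and values:

```python
import time

def poll_result(fetch_status, request_id: str,
                interval: float = 2.0, timeout: float = 300.0) -> dict:
    """Poll GET /api/v1/client/request-status/{request_id} until the
    job finishes. `fetch_status` is any callable that performs the HTTP
    request and returns the parsed JSON status dict."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(request_id)
        if status.get("status") == "done":      # assumed terminal value
            return status
        if status.get("status") == "failed":    # assumed terminal value
            raise RuntimeError(f"Job {request_id} failed: {status}")
        time.sleep(interval)
    raise TimeoutError(f"Job {request_id} did not finish in {timeout}s")
```

In production, prefer a modest interval (a few seconds) with a hard timeout so a stuck job doesn't poll forever.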

Best practices

  • Resolve the model list dynamically. Don’t hardcode slugs—fetch once at startup or periodically.
  • Pin versions for reproducibility. If you need bit-for-bit repeats (e.g., in T2I), pin model version + set seed.
  • Budget before scaling. Larger models and higher resolution/steps cost more—use the calculator on the homepage.
  • Handle deprecation. Implement a fallback path if a model becomes unavailable (e.g., switch to a recommended successor).
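
The deprecation-handling advice above can be implemented as a preference list checked against the live model list, so a removed slug degrades gracefully instead of failing outright. The preference order below is a hypothetical example, not an official recommendation:

```python
def pick_model(preferred: list, available: set) -> str:
    """Return the first preferred slug still enabled on the network.
    `available` should be built from GET /api/v1/client/models."""
    for slug in preferred:
        if slug in available:
            return slug
    raise RuntimeError(f"None of {preferred} are currently available")

# Hypothetical fallback order for text-to-image:
print(pick_model(["Flux1schnell", "ZImageTurbo_INT8"],
                 {"ZImageTurbo_INT8", "WhisperLargeV3"}))  # ZImageTurbo_INT8
```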

See also

  • Model Selection: how to choose the best model per task and budget.
  • Endpoints: Text-to-Image, Text-to-Speech, Image-to-Text (OCR), Image Background Removal, Image-to-Image, Video-to-Text, Audio-to-Text, Image-to-Video, Text-to-Video, Upload Video File, Upload Audio File, Text-to-Embedding, Get Results, Check Balance.

Live models (API)

Use the endpoint below to retrieve the current, authoritative list of models (with slug strings to use in requests):
GET https://api.deapi.ai/api/v1/client/models
The response includes stable slug values to pass as the model parameter in any task endpoint.