POST /api/v1/client/txt2audio
cURL
curl --request POST \
  --url https://api.deapi.ai/api/v1/client/txt2audio \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'text=A beautiful sunset over mountains' \
  --form model=Kokoro \
  --form lang=en-us \
  --form speed=1 \
  --form format=flac \
  --form sample_rate=24000 \
  --form mode=custom_voice \
  --form voice=af_sky \
  --form 'ref_audio=<string>' \
  --form 'ref_text=<string>' \
  --form 'instruct=<string>' \
  --form webhook_url=https://your-server.com/webhooks/deapi \
  --form ref_audio.0='@example-file' \
  --form ref_audio.1='@example-file'
{
  "data": {
    "request_id": "c08a339c-73e5-4d67-a4d5-231302fbff9a"
  }
}
Text-to-Speech converts text into natural-sounding audio. The endpoint supports three TTS modes via the mode parameter:
  • custom_voice (default) — Use a preset voice from the model’s voice library. Requires the voice parameter.
  • voice_clone — Clone a voice from a short reference audio clip. Requires the ref_audio parameter (3–10 seconds, max 10 MB). Optionally provide ref_text with a transcript of the reference audio for improved accuracy.
  • voice_design — Create a new voice from a natural language description. Requires the instruct parameter (e.g. "A warm female voice with a British accent").
Prerequisite: To ensure a successful request, you must first consult the Model Selection endpoint to identify a valid model slug, check specific limits and features, and verify available languages and voices.
Mode-specific required fields:
  • custom_voice: voice is required.
  • voice_clone: ref_audio is required. ref_text is optional but recommended.
  • voice_design: instruct is required.
If mode is omitted, the API defaults to custom_voice.
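The mode rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name missing_mode_field and the REQUIRED_BY_MODE table are hypothetical, and the check only mirrors the three rules listed in this section.

```python
# Hypothetical client-side check; field and mode names come from this page.
REQUIRED_BY_MODE = {
    "custom_voice": "voice",
    "voice_clone": "ref_audio",
    "voice_design": "instruct",
}


def missing_mode_field(params):
    """Return the name of the missing mode-specific field, or None if it is set."""
    # An omitted (or null) mode falls back to custom_voice, as documented.
    mode = params.get("mode") or "custom_voice"
    if mode not in REQUIRED_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    required = REQUIRED_BY_MODE[mode]
    return None if params.get(required) else required
```

For example, missing_mode_field({"mode": "voice_clone"}) reports ref_audio, while a dictionary with no mode and a voice value passes the check.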

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Headers

Accept
enum<string>
default:application/json
required
Available options:
application/json

Body

multipart/form-data

Audio generation parameters. Supports three TTS modes: custom_voice (default, preset speakers), voice_clone (clone from reference audio), voice_design (create voice from description).

text
string
required

Text to be converted to speech

Example:

"A beautiful sunset over mountains"

model
string
required

The model to use for speech generation. Available models can be retrieved via the GET /api/v1/client/models endpoint.

Example:

"Kokoro"

lang
string
required

Language to be used during audio generation

Example:

"en-us"

speed
number
required

Generated audio speech speed

Example:

1

format
string
required

Audio output format

Example:

"flac"

sample_rate
number
required

Sample rate of generated audio

Example:

24000
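As an illustration of how the text fields map onto a multipart/form-data body, here is a minimal sketch that uses only the Python standard library. The helpers encode_multipart and build_request are hypothetical names, the encoder handles text fields only (no file parts such as ref_audio), and the request object is built but not sent.

```python
import uuid
import urllib.request

API_URL = "https://api.deapi.ai/api/v1/client/txt2audio"


def encode_multipart(fields):
    """Encode text-only form fields as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(str(value))
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_request(token, fields):
    """Build (but do not send) a POST request for the txt2audio endpoint."""
    body, content_type = encode_multipart(fields)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
    )
```

A caller would pass the six required fields (text, model, lang, speed, format, sample_rate) plus whichever field the chosen mode needs, then hand the result to urllib.request.urlopen. The values shown on this page, such as Kokoro and en-us, are examples; valid values depend on the selected model.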

mode
enum<string> | null

TTS mode: custom_voice (default), voice_clone, or voice_design. Determines which fields are required.

Available options:
custom_voice,
voice_clone,
voice_design
Example:

"custom_voice"

voice
string | null

Name of the voice to be used. Required for custom_voice mode.

Example:

"af_sky"

ref_audio
file | null

Reference audio file for voice cloning. Supported formats: mp3, wav, flac, ogg, m4a. Maximum size: 10 MB. Duration must be between 3 and 10 seconds (model-specific limits may apply). Required for voice_clone mode.
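The format and size limits for ref_audio can be checked locally before upload. This is a rough sketch under stated assumptions: ref_audio_problems is a hypothetical helper, "10 MB" is read conservatively as 10,000,000 bytes because this page does not define the unit, the format is inferred from the file extension only, and the duration limit is not checked because that would require decoding the audio.

```python
import os

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".m4a"}
# Assumption: conservative reading of the documented 10 MB limit.
MAX_BYTES = 10 * 1000 * 1000


def ref_audio_problems(filename, size_bytes):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

The size can be obtained with os.path.getsize(path) for a file on disk.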

ref_text
string | null

Optional transcript of the reference audio for improved voice cloning accuracy.

instruct
string | null

Natural language voice description for voice_design mode (e.g. "A warm female voice with a British accent"), or style/emotion control in custom_voice mode.

webhook_url
string<uri> | null

Optional HTTPS URL that receives webhook notifications when the job status changes (processing, completed, failed). Maximum length: 2048 characters. See Webhook Documentation for payload structure and authentication details.

Maximum string length: 2048
Example:

"https://your-server.com/webhooks/deapi"
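The two webhook_url constraints stated above (HTTPS scheme, at most 2048 characters) are easy to verify before submitting a job. A minimal sketch follows; is_valid_webhook_url is a hypothetical helper, and the server may apply further checks not described on this page.

```python
from urllib.parse import urlparse

MAX_WEBHOOK_URL_LENGTH = 2048


def is_valid_webhook_url(url):
    """Check the documented constraints: HTTPS scheme and maximum length."""
    if len(url) > MAX_WEBHOOK_URL_LENGTH:
        return False
    parsed = urlparse(url)
    # Require an https scheme and a non-empty host component.
    return parsed.scheme == "https" and bool(parsed.netloc)
```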

Response

ID of the inference request.

data
object

Contains request_id, the ID of the inference request.
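Based on the sample response shown on this page, the request ID can be read from data.request_id. The sketch below assumes that shape; extract_request_id is a hypothetical helper, and error responses are not covered because this page does not describe them.

```python
import json


def extract_request_id(response_body):
    """Return data.request_id from a JSON response body, or raise ValueError."""
    payload = json.loads(response_body)
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, dict) or "request_id" not in data:
        raise ValueError("response does not contain data.request_id")
    return data["request_id"]
```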