Audio/Video Transcription
Unified transcription — auto-detects URL or file upload for audio and video sources (YouTube, X, Twitch, Kick, TikTok, X/Twitter Spaces).
(vid2txt, aud2txt, videofile2txt, audiofile2txt, transcribe). Accepts URLs or multipart file uploads. Returns a request_id for status polling.
video_url / audio_url, or upload a file via multipart form-data.
slug and check supported languages.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
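As a minimal sketch, the header described above could be assembled like this. The helper name build_auth_headers is an illustrative assumption and not part of the API.

```python
def build_auth_headers(token: str) -> dict[str, str]:
    """Return the Authorization header in the form `Bearer <token>`."""
    token = token.strip()
    if not token:
        raise ValueError("auth token must not be empty")
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be merged with any other request headers before the call is sent.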
Headers
application/json
Body
Transcription parameters. Provide exactly one of source_url or source_file.
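The "exactly one of" rule can be checked on the client before a request is sent. The following is a sketch only: the function name check_source is hypothetical, and it assumes an omitted field is represented as None.

```python
from typing import Optional


def check_source(source_url: Optional[str] = None,
                 source_file: Optional[bytes] = None) -> str:
    """Return the name of the field that is set, enforcing that exactly one is given."""
    has_url = source_url is not None
    has_file = source_file is not None
    # Both set or both missing violates the "exactly one" rule.
    if has_url == has_file:
        raise ValueError("provide exactly one of source_url or source_file")
    return "source_url" if has_url else "source_file"
```

Knowing which field is set also tells the caller whether to send a JSON body (URL) or a multipart form-data upload (file).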
Whether the transcription should include timestamps.
The model to use for transcription. Available models can be retrieved via the GET /api/v1/client/models endpoint.
"WhisperLargeV3"
URL of video/audio to transcribe (YouTube, Twitter/X, Twitch, Kick, TikTok, Twitter Spaces). Mutually exclusive with source_file.
"https://www.youtube.com/watch?v=jNQXAC9IVRw"
Audio or video file to transcribe. Supported audio: aac, mpeg, ogg, wav, webm, flac (max 20 MB). Supported video: mp4, mpeg, quicktime, avi, wmv, ogg (max 50 MB). Total request body is capped at 75 MB. Mutually exclusive with source_url.
If true, the result is returned directly in the response rather than only as a download URL.
Optional HTTPS URL to receive webhook notifications for job status changes.
Maximum length: 2048
"https://your-server.com/webhooks/deapi"
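The upload limits and webhook constraints listed above lend themselves to a pre-flight check. The sketch below rests on several assumptions that the source does not confirm: that "MB" means 1,000,000 bytes, that the listed formats are matched by a short type name such as wav or mp4, and that 2048 is the maximum length of the webhook URL. The function names are hypothetical, and the 75 MB total request body cap is not checked here.

```python
from urllib.parse import urlparse

MB = 1_000_000  # assumption: decimal megabytes; the source does not say
AUDIO_TYPES = {"aac", "mpeg", "ogg", "wav", "webm", "flac"}
VIDEO_TYPES = {"mp4", "mpeg", "quicktime", "avi", "wmv", "ogg"}
LIMITS = {"audio": (AUDIO_TYPES, 20 * MB), "video": (VIDEO_TYPES, 50 * MB)}
MAX_WEBHOOK_URL_LENGTH = 2048  # assumption: 2048 is a length cap


def check_upload(kind: str, media_type: str, size_bytes: int) -> None:
    """Raise ValueError if the file type is unsupported or the file is too large."""
    if kind not in LIMITS:
        raise ValueError(f"kind must be 'audio' or 'video', got {kind!r}")
    allowed, max_bytes = LIMITS[kind]
    if media_type.lower() not in allowed:
        raise ValueError(f"unsupported {kind} type: {media_type}")
    if size_bytes > max_bytes:
        raise ValueError(f"{kind} file exceeds {max_bytes // MB} MB")


def check_webhook_url(url: str) -> None:
    """Raise ValueError unless the URL is absolute, HTTPS, and within the length cap."""
    if len(url) > MAX_WEBHOOK_URL_LENGTH:
        raise ValueError("webhook URL is too long")
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("webhook URL must be an absolute HTTPS URL")
```

Both helpers return None on success, so they can be called in sequence before the request is built. The server remains the authority on what it accepts.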
Response
ID of the inference request.
Information from success endpoint
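Once a response arrives, the request ID needs to be pulled out for status polling. This is a hedged sketch: the source names the value request_id but does not show the response layout, so the code assumes the key sits either at the top level of the JSON body or one level down under a "data" wrapper. The function name extract_request_id is hypothetical.

```python
import json


def extract_request_id(response_body: str) -> str:
    """Return the request_id string from a JSON response body."""
    payload = json.loads(response_body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object in the response")
    # Assumption: check the top level first, then a "data" wrapper object.
    for obj in (payload, payload.get("data")):
        if isinstance(obj, dict):
            request_id = obj.get("request_id")
            if isinstance(request_id, str) and request_id:
                return request_id
    raise ValueError("response does not contain a request_id")
```

If the real response nests the ID differently, only the lookup loop needs to change.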