
Create Transcription

POST /v1/audio/transcriptions

Transcribes audio into text. Supports multiple models and response formats.

Request

Content-Type: multipart/form-data

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | file | Yes | Audio file (max 25 MB). Formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm |
| model | string | Yes | Model ID: whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-transcribe-diarize, groq/whisper-large-v3, groq/whisper-large-v3-turbo |
| language | string | No | ISO 639-1 language code (e.g., en, es) |
| response_format | string | No | json (default), text, srt, verbose_json, vtt |
| temperature | number | No | 0 to 1. Default 0. |
| prompt | string | No | Optional text to guide the transcription style |
| timestamp_granularities[] | string[] | No | Timestamp granularities to include: segment, word. Only available with the verbose_json response format. |
| stream | boolean | No | true for a streaming SSE response. Only gpt-4o-transcribe and gpt-4o-mini-transcribe. |
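The file constraints above can be checked client-side before uploading. A minimal Python sketch, using the limits from the parameter table; the helper and constant names are illustrative, not part of the API:

```python
from pathlib import Path

# Limits taken from the parameter table; the helper itself
# is an illustrative client-side check, not an API call.
ALLOWED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga",
                   "m4a", "ogg", "wav", "webm"}
MAX_BYTES = 25 * 1024 * 1024  # 25 MB upload limit

def check_audio_file(path: str) -> None:
    """Raise ValueError before uploading an invalid file."""
    p = Path(path)
    ext = p.suffix.lstrip(".").lower()
    if ext not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {ext}")
    if p.stat().st_size > MAX_BYTES:
        raise ValueError("file exceeds the 25 MB limit")
```

Rejecting bad uploads locally avoids spending a round trip on a request the server would refuse anyway.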

Example

curl -X POST "https://api.osmapi.com/v1/audio/transcriptions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -F file=@audio.mp3 \
  -F model=whisper-1
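The same request can be made from Python with only the standard library by hand-building the multipart body. This is a sketch, not an official client: encode_multipart and transcribe are illustrative helper names, and the endpoint and auth header simply mirror the curl example above.

```python
import io
import json
import os
import urllib.request
import uuid

API_URL = "https://api.osmapi.com/v1/audio/transcriptions"

def encode_multipart(fields, file_field, filename, file_bytes):
    """Build a multipart/form-data body by hand (stdlib only)."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():  # plain form fields
        buf.write((f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="{name}"\r\n'
                   f"\r\n{value}\r\n").encode())
    buf.write((f"--{boundary}\r\n"
               f'Content-Disposition: form-data; name="{file_field}"; '
               f'filename="{filename}"\r\n'
               f"Content-Type: application/octet-stream\r\n\r\n").encode())
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()

def transcribe(path, model="whisper-1"):
    """POST the audio file and return the transcribed text."""
    with open(path, "rb") as f:
        boundary, body = encode_multipart(
            {"model": model}, "file", os.path.basename(path), f.read())
    req = urllib.request.Request(
        API_URL, data=body, method="POST",
        headers={
            "Authorization": f"Bearer {os.environ['OSM_API_KEY']}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        })
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

In practice a library such as requests handles the multipart encoding for you; the manual version just shows what is on the wire.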

Response

{
  "text": "Hello, this is a transcription test."
}
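With response_format=verbose_json and segment timestamps, each segment carries start and end times, which makes rendering SRT client-side straightforward. A sketch assuming the common Whisper-style segment shape ({"start": float, "end": float, "text": str}); the helper names are illustrative:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm stamp SRT uses."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render verbose_json segments as a numbered SRT cue list."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(f"{i}\n{srt_timestamp(seg['start'])} --> "
                    f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

If you only need subtitles, requesting response_format=srt directly is simpler; the converter is useful when you want both the structured timestamps and a subtitle file from one request.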

Response Headers

| Header | Description |
| --- | --- |
| x-request-id | Unique request identifier |
| x-osm-response-cost | Request cost in USD |
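Both headers arrive as plain strings; the cost header parses as a float, so per-request spend can be logged alongside the request id. A small illustrative helper (not part of any client library):

```python
def summarize_response(headers) -> str:
    """Log line from a response-headers mapping: id plus USD cost."""
    rid = headers.get("x-request-id", "unknown")
    cost = float(headers.get("x-osm-response-cost", "0"))
    return f"request {rid} cost ${cost:.6f}"
```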
