
Create Transcription

POST /v1/audio/transcriptions

Transcribes audio into text. Supports multiple models and response formats.

Request

Content-Type: multipart/form-data

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | file | Yes | Audio file (max 25 MB). Formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm |
| model | string | Yes | Model ID: whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-transcribe-diarize, groq/whisper-large-v3, groq/whisper-large-v3-turbo |
| language | string | No | ISO 639-1 language code (e.g., en, es) |
| response_format | string | No | json (default), text, srt, verbose_json, vtt |
| temperature | number | No | 0 to 1. Default 0. |
| prompt | string | No | Optional text to guide the transcription style |
| timestamp_granularities[] | string[] | No | Timestamp granularities to include: segment, word. Only available with the verbose_json response format. |
| stream | boolean | No | true for a streaming SSE response. Only gpt-4o-transcribe and gpt-4o-mini-transcribe. |
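The file constraints above can be checked client-side before uploading. A minimal Python sketch, using the limits from the parameter table; the helper and constant names are illustrative, not part of the API:

```python
from pathlib import Path

# Limits taken from the parameter table; the helper itself
# is an illustrative client-side check, not an API call.
ALLOWED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga",
                   "m4a", "ogg", "wav", "webm"}
MAX_BYTES = 25 * 1024 * 1024  # 25 MB upload limit

def check_audio_file(path: str) -> None:
    """Raise ValueError before uploading an invalid file."""
    p = Path(path)
    ext = p.suffix.lstrip(".").lower()
    if ext not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {ext}")
    if p.stat().st_size > MAX_BYTES:
        raise ValueError("file exceeds the 25 MB limit")
```

Rejecting bad uploads locally avoids spending a round trip on a request the server would refuse anyway.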

Example

curl -X POST "https://api.osmapi.com/v1/audio/transcriptions" \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -F file=@audio.mp3 \
  -F model=whisper-1
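The same request can be made from Python with only the standard library by hand-building the multipart body. This is a sketch, not an official client: encode_multipart and transcribe are illustrative helper names, and the endpoint and auth header simply mirror the curl example above.

```python
import io
import json
import os
import urllib.request
import uuid

API_URL = "https://api.osmapi.com/v1/audio/transcriptions"

def encode_multipart(fields, file_field, filename, file_bytes):
    """Build a multipart/form-data body by hand (stdlib only)."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():  # plain form fields
        buf.write((f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="{name}"\r\n'
                   f"\r\n{value}\r\n").encode())
    buf.write((f"--{boundary}\r\n"
               f'Content-Disposition: form-data; name="{file_field}"; '
               f'filename="{filename}"\r\n'
               f"Content-Type: application/octet-stream\r\n\r\n").encode())
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()

def transcribe(path, model="whisper-1"):
    """POST the audio file and return the transcribed text."""
    with open(path, "rb") as f:
        boundary, body = encode_multipart(
            {"model": model}, "file", os.path.basename(path), f.read())
    req = urllib.request.Request(
        API_URL, data=body, method="POST",
        headers={
            "Authorization": f"Bearer {os.environ['OSM_API_KEY']}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        })
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

In practice a library such as requests handles the multipart encoding for you; the manual version just shows what is on the wire.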

Response

{
  "text": "Hello, this is a transcription test."
}
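With response_format=verbose_json and segment timestamps, each segment carries start and end times, which makes rendering SRT client-side straightforward. A sketch assuming the common Whisper-style segment shape ({"start": float, "end": float, "text": str}); the helper names are illustrative:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm stamp SRT uses."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render verbose_json segments as a numbered SRT cue list."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(f"{i}\n{srt_timestamp(seg['start'])} --> "
                    f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

If you only need subtitles, requesting response_format=srt directly is simpler; the converter is useful when you want both the structured timestamps and a subtitle file from one request.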

Response Headers

| Header | Description |
| --- | --- |
| x-request-id | Unique request identifier |
| x-osm-response-cost | Request cost in USD |
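Both headers arrive as plain strings; the cost header parses as a float, so per-request spend can be logged alongside the request id. A small illustrative helper (not part of any client library):

```python
def summarize_response(headers) -> str:
    """Log line from a response-headers mapping: id plus USD cost."""
    rid = headers.get("x-request-id", "unknown")
    cost = float(headers.get("x-osm-response-cost", "0"))
    return f"request {rid} cost ${cost:.6f}"
```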
