Pre-Recorded Audio (STT)

Transcribe uploaded audio files with speaker diarization and multi-language support.

Limited promo: 50% off (July & August 2026)

All speech-to-text model rates are half price through 31 August 2026. The pricing table below lists both the regular and promo rates.

The API follows OpenAI- and Deepgram-compatible patterns, so an existing Deepgram SDK client can be pointed at it by changing the base URL.

SDK integration

Use the official Deepgram SDK with our API endpoint. The first snippet configures the Node.js SDK; the second configures the Python SDK.

import { createClient } from '@deepgram/sdk';

const client = createClient('YOUR_API_KEY', {
  global: {
    fetch: { options: { url: 'https://api.greenpt.ai/v1' } },
  },
});

from deepgram import DeepgramClient
from deepgram.environment import DeepgramClientEnvironment

# Create environment for GreenPT API
greenpt_env = DeepgramClientEnvironment(
    base="https://api.greenpt.ai",
    production="wss://api.greenpt.ai",
    agent="wss://api.greenpt.ai",
)

deepgram = DeepgramClient("YOUR_API_KEY", environment=greenpt_env)

Endpoint

POST https://api.greenpt.ai/v1/listen

Request body

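Required and optional parameters. As in the example requests, the audio goes in the request body and every other parameter is passed in the query string.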

Parameter	Type	Required	Description
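`file`	binary	Yes	The audio file to transcribe (WAV, MP3, FLAC, etc.), sent as the raw request body. For remote audio, send a JSON body with a `url` field instead, as in the URL example.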
`model`	string	No	Speech model to use. Defaults to `"green-s"`.
`language`	string	No	Language code (e.g. `"en"`, `"fr"`, `"de"`). Defaults to `"en"` if not specified.
`diarize`	boolean	No	Deprecated. Enable speaker diarization (always uses `v1`). Prefer `diarize_model`.
`diarize_model`	string	No	Enables diarization and selects the model version: `"latest"` (currently v2), `"v2"`, or `"v1"`. Do not also set `diarize`.
`punctuate`	boolean	No	Add punctuation and capitalization to transcript.
`smart_format`	boolean	No	Apply formatting to transcript output for improved readability.
`filler_words`	boolean	No	Include filler words like "uh" and "um" in transcript.
`numerals`	boolean	No	Convert numbers from written format to numerical format.
`sentiment`	boolean	No	Analyze sentiment throughout the transcript.
`topics`	boolean	No	Detect topics throughout the transcript.
`intents`	boolean	No	Recognize speaker intent throughout the transcript.
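
The example requests on this page pass these options as query-string parameters. A minimal Python sketch of assembling that URL follows. It assumes booleans are written as lowercase `true`/`false`, as in the curl examples; `build_listen_url` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import urlencode

LISTEN_ENDPOINT = "https://api.greenpt.ai/v1/listen"


def build_listen_url(**params):
    """Return the /v1/listen URL with the given options as query parameters.

    Hypothetical helper. Options set to None are skipped; booleans are
    written as lowercase "true"/"false", matching the curl examples.
    """
    query = {}
    for name, value in params.items():
        if value is None:
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"
        query[name] = value
    return f"{LISTEN_ENDPOINT}?{urlencode(query)}" if query else LISTEN_ENDPOINT


url = build_listen_url(model="green-s", language="en", diarize_model="v2", punctuate=True)
print(url)
```

Leaving an option out of the call keeps it out of the URL, so the server-side defaults from the table apply.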

Example request: local file

curl \
  --request POST \
  --header 'Authorization: Token YOUR_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.greenpt.ai/v1/listen?model=green-s&language=en&diarize_model=v2&punctuate=true'

Example request: URL / bucket

curl \
  --request POST \
  --header 'Authorization: Token YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{"url":"https://static.deepgram.com/examples/Bueller-Life-moves-pretty-fast.wav"}' \
  --url 'https://api.greenpt.ai/v1/listen?model=green-s&language=en&diarize_model=v2&punctuate=true'
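
For clients that do not use curl or the SDK, the same URL-based request can be sketched with the Python standard library. The helper name `build_remote_audio_request` is hypothetical; the code only builds the request object and mirrors the headers and JSON body of the curl example above. Sending it and handling errors is left to the caller.

```python
import json
from urllib.request import Request


def build_remote_audio_request(api_key, audio_url, listen_url):
    """Build (but do not send) a POST request for audio hosted at audio_url.

    Hypothetical helper mirroring the curl example: a JSON body with a
    "url" field and a "Token" Authorization header.
    """
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return Request(
        listen_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_remote_audio_request(
    "YOUR_API_KEY",
    "https://static.deepgram.com/examples/Bueller-Life-moves-pretty-fast.wav",
    "https://api.greenpt.ai/v1/listen?model=green-s&language=en",
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
print(request.get_method(), request.full_url)
```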

Example response

{
  "metadata": {
    "request_id": "a847f427-4ad5-4d67-9b95-db801e58251c",
    "duration": 25.933313,
    "channels": 1,
    "created": "2024-05-12T18:57:13.426Z"
  },
  "results": {
    "channels": [
      {
        "alternatives": [
          {
            "transcript": "Hi, thanks for calling support. Yeah, my account is locked.",
            "confidence": 0.98,
            "words": [
              { "word": "hi", "start": 0.5, "end": 0.8, "confidence": 0.99, "speaker": 0 },
              { "word": "thanks", "start": 0.8, "end": 1.1, "confidence": 0.98, "speaker": 0 },
              { "word": "for", "start": 1.1, "end": 1.25, "confidence": 0.99, "speaker": 0 },
              { "word": "calling", "start": 1.25, "end": 1.6, "confidence": 0.98, "speaker": 0 },
              { "word": "support", "start": 1.6, "end": 2.1, "confidence": 0.97, "speaker": 0 },
              { "word": "yeah", "start": 2.4, "end": 2.7, "confidence": 0.98, "speaker": 1 },
              { "word": "my", "start": 2.7, "end": 2.85, "confidence": 0.99, "speaker": 1 },
              { "word": "account", "start": 2.85, "end": 3.2, "confidence": 0.98, "speaker": 1 },
              { "word": "is", "start": 3.2, "end": 3.35, "confidence": 0.99, "speaker": 1 },
              { "word": "locked", "start": 3.35, "end": 3.8, "confidence": 0.97, "speaker": 1 }
            ]
          }
        ]
      }
    ]
  }
}
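
With diarization enabled, each entry in `words` carries a `speaker` index, so a readable dialogue can be rebuilt by grouping consecutive words from the same speaker. The sketch below is one way to do that, using a shortened copy of the response above; `speaker_turns` is a hypothetical helper.

```python
def speaker_turns(words):
    """Group consecutive words by speaker into (speaker, text) turns."""
    turns = []
    for word in words:
        speaker = word.get("speaker")
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(word["word"])
        else:
            turns.append((speaker, [word["word"]]))
    return [(speaker, " ".join(tokens)) for speaker, tokens in turns]


# A shortened copy of the `words` array from the example response.
words = [
    {"word": "hi", "start": 0.5, "end": 0.8, "speaker": 0},
    {"word": "thanks", "start": 0.8, "end": 1.1, "speaker": 0},
    {"word": "yeah", "start": 2.4, "end": 2.7, "speaker": 1},
    {"word": "my", "start": 2.7, "end": 2.85, "speaker": 1},
]

for speaker, text in speaker_turns(words):
    print(f"Speaker {speaker}: {text}")
```

The word-level entries are lowercase and unpunctuated in the example, so the rebuilt turns are too; the punctuated text lives in `transcript`.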

Complete SDK example

Full working examples with the Deepgram SDK: a local-file transcription in Node.js, followed by a URL-based transcription in Python.

import { createClient } from '@deepgram/sdk';
import fs from 'fs';

const client = createClient('YOUR_API_KEY', {
  global: {
    fetch: { options: { url: 'https://api.greenpt.ai/v1' } },
  },
});

async function transcribeLocalFile() {
  const { result, error } = await client.listen.prerecorded.transcribeFile(
    fs.readFileSync('path/to/audio.wav'),
    {
      model: 'green-s',
      language: 'en',
      punctuate: true,
      diarize_model: 'v2',
    },
  );

  if (error) throw error;
  console.log(result);
}

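// Report failures instead of leaving the promise rejection unhandled.
transcribeLocalFile().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});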

from deepgram import DeepgramClient
from deepgram.environment import DeepgramClientEnvironment

# Create environment for GreenPT API
greenpt_env = DeepgramClientEnvironment(
    base="https://api.greenpt.ai",
    production="wss://api.greenpt.ai",
    agent="wss://api.greenpt.ai",
)

deepgram = DeepgramClient("YOUR_API_KEY", environment=greenpt_env)

# For URL-based transcription
response = deepgram.listen.v1.media.transcribe_url(
    url="https://static.deepgram.com/examples/Bueller-Life-moves-pretty-fast.wav",
    model="green-s",
    language="en",
    punctuate=True,
    diarize_model="v2",
    smart_format=True,
)

print(response)

Available models

Choose the model that fits your language and use case.

`green-s`: GreenS

Reliable speech-to-text for single-language audio. Great for recordings, podcasts, and archived content.

Supported languages: English, German, Spanish, French, Italian, Dutch, Portuguese, Romanian, Bulgarian, Catalan, Danish, Finnish, Swedish.

`green-s-pro`: GreenS Pro

Advanced model with automatic language detection. Ideal for international content and mixed-language recordings.

Supported languages: English, German, Dutch, Swedish, Turkish.

Multilingual: set `language=multi` for automatic language detection across languages in the same file.

Multilingual mode

Transcribe conversations where speakers switch between languages.

With green-s-pro, set language=multi to transcribe audio where multiple languages are spoken. The model automatically detects and transcribes each language as speakers switch.

Languages supported in multilingual mode: English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, Dutch.

Price difference

Multilingual processing costs more than single-language processing. For pre-recorded audio, the regular rate is €0.28/hour for multilingual versus €0.23/hour for monolingual.

Output differences

With language=multi, the response adds a languages array and a language field per word:

"alternatives": [{
  "transcript": "No recuerdo mi bank password.",
  "languages": ["es", "en"],
  "words": [
    { "word": "no", "language": "es" },
    { "word": "recuerdo", "language": "es" },
    { "word": "bank", "language": "en" }
  ]
}]
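
The per-word `language` field makes it possible to see where a speaker switches language. As a minimal sketch, the hypothetical helper `language_runs` below collapses consecutive words in the same language into runs, using the words from the fragment above.

```python
from itertools import groupby


def language_runs(words):
    """Collapse consecutive words in the same language into (language, text) runs."""
    return [
        (language, " ".join(w["word"] for w in group))
        for language, group in groupby(words, key=lambda w: w.get("language"))
    ]


words = [
    {"word": "no", "language": "es"},
    {"word": "recuerdo", "language": "es"},
    {"word": "bank", "language": "en"},
]
print(language_runs(words))
```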

Speaker diarization

Diarization labels which speaker said each word. Use `diarize_model` to enable it and pick the model version in one parameter; do not also set `diarize`:

Value	Description
`latest`	Newest generally available diarizer (currently v2).
`v2`	Diarization V2 with improved speaker attribution.
`v1`	Previous diarizer.

Diarization V2 improves speaker-attribution accuracy and reduces labeling errors. It works across all pre-recorded STT models (green-s, green-s-pro) and languages at no extra cost.

The legacy diarize=true flag still works but always uses v1. Do not set both diarize and diarize_model; requests that include both are rejected. Diarization V2 is not available for live streaming.
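
Because requests that combine `diarize` and `diarize_model` are rejected, it can be convenient to catch that case before sending anything. The following is a hypothetical client-side check, not part of the API or SDK; the server still performs its own validation.

```python
VALID_DIARIZE_MODELS = {"latest", "v2", "v1"}


def check_diarization_params(params):
    """Raise ValueError for diarization settings this page describes as invalid.

    Hypothetical client-side check; returns params unchanged when they pass.
    """
    if "diarize" in params and "diarize_model" in params:
        raise ValueError("set either diarize or diarize_model, not both")
    model = params.get("diarize_model")
    if model is not None and model not in VALID_DIARIZE_MODELS:
        raise ValueError(f"unknown diarize_model: {model!r}")
    return params


params = check_diarization_params({"model": "green-s", "diarize_model": "v2"})
print(params)
```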

Minimum SDK version

To set `diarize_model` through the official Deepgram SDK, use @deepgram/sdk 5.3.0+ (Node.js) or deepgram-sdk 7.2.0+ (Python). These are the releases that made batch diarization v2 generally available. Over plain HTTP, `diarize_model` is just a query parameter and works with any client.

Available features

Add-on capabilities for pre-recorded transcription.

Feature	Description	`green-s`	`green-s-pro`
Speaker diarization	Identify different speakers in the audio.	Yes	Yes
Entity detection	Detect names, dates, and other entities in the transcript.	Yes	Yes
Language detection	Auto-detect the spoken language with `detect_language` or `multi`. Off by default; without it, `language` defaults to `en`.	Yes	Yes
Profanity filter	Filter or mask profanity in the transcript.	Yes	Yes
Speech intent & topics	Detect topics and speaker intent.	Yes	Yes
Summarization	Generate a summary of the transcript (English recommended).	Yes	Yes
Smart formatting	Improved punctuation and readability (English only).	Yes	-

Pricing

Pre-recorded transcription rates per hour of audio.

New pricing, now live: these model rates are updated and in effect today. The add-on features below are a separate launch promo and free for now.

Model	Regular rate	Promo (Jul–Aug 2026, −50%)
`green-s`: all supported languages	€0.23 / hour	€0.12 / hour
`green-s-pro`: monolingual	€0.23 / hour	€0.12 / hour
`green-s-pro`: multilingual	€0.28 / hour	€0.14 / hour

All prices in EUR, excl. taxes. green-s and green-s-pro monolingual share the same pre-recorded rate. Choose green-s-pro for multilingual mode or additional language options.

Additional features

Launch promo: these add-ons are free right now. The prices shown are the standard per-hour rates that apply once the promo ends; you are not charged for them today.

Feature	Rate (promo: free now)
Redaction	€0.10 / hour
Entity detection	€0.08 / hour
Keyterm prompting	€0.07 / hour

Speaker diarization is included free for pre-recorded audio. Multichannel audio is billed per channel, so 2-channel audio is charged at double the per-hour rate.
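
As a rough illustration of how these rules combine, the sketch below estimates the model cost for a recording. The rates are copied from the pricing table above; the helper `estimate_cost` is hypothetical, ignores add-on features and taxes, and assumes cost scales linearly with duration, which this page does not state explicitly.

```python
from decimal import Decimal

# Per-hour rates in EUR copied from the pricing table: (regular, promo).
# Keys are (model, multilingual).
RATES = {
    ("green-s", False): (Decimal("0.23"), Decimal("0.12")),
    ("green-s-pro", False): (Decimal("0.23"), Decimal("0.12")),
    ("green-s-pro", True): (Decimal("0.28"), Decimal("0.14")),
}


def estimate_cost(model, hours, channels=1, multilingual=False, promo=False):
    """Estimate the transcription cost in EUR, excluding taxes and add-ons.

    Multichannel audio is billed per channel, so the rate is multiplied
    by the channel count.
    """
    regular, promo_rate = RATES[(model, multilingual)]
    rate = promo_rate if promo else regular
    return rate * Decimal(str(hours)) * channels


print(estimate_cost("green-s", hours=10, channels=2))  # 4.60
```

The promo rates are looked up rather than computed as half the regular rate, because the table lists them rounded to whole cents.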
