Live Audio Streaming (STT)

Real-time speech-to-text transcription using WebSocket connections for live audio streams.

Limited promo: 50% off (July & August 2026)

All speech-to-text model rates are half price through 31 August 2026. The pricing table below lists both the regular and promo rates.

Our API follows OpenAI- and Deepgram-compatible patterns.

SDK integration

Use the official Deepgram SDK with our WebSocket endpoint.

import { createClient, LiveTranscriptionEvents } from '@deepgram/sdk';

const client = createClient('YOUR_API_KEY', {
  global: {
    websocket: { options: { url: 'wss://api.greenpt.ai/v1' } },
  },
});

// Set up live transcription
const connection = client.listen.live({
  model: 'green-s',
  language: 'en',
  smart_format: true,
});

connection.on(LiveTranscriptionEvents.Transcript, (data) => {
  console.log(data.channel.alternatives[0].transcript);
});

connection.on(LiveTranscriptionEvents.Open, () => {
  // Send audio data when connection is open
  connection.send(audioData);
});

from deepgram import DeepgramClient, LiveTranscriptionEvents, LiveOptions
from deepgram.environment import DeepgramClientEnvironment

# Create environment for GreenPT API
greenpt_env = DeepgramClientEnvironment(
    base="https://api.greenpt.ai",
    production="wss://api.greenpt.ai",
    agent="wss://api.greenpt.ai",
)

deepgram = DeepgramClient("YOUR_API_KEY", environment=greenpt_env)
dg_connection = deepgram.listen.websocket.v("1")

def on_message(self, result, **kwargs):
    sentence = result.channel.alternatives[0].transcript
    if len(sentence) == 0:
        return
    print(f"speaker: {sentence}")

dg_connection.on(LiveTranscriptionEvents.Transcript, on_message)

options = LiveOptions(
    model="green-s",
    language="en",
    interim_results=True,
    diarize=True,
    smart_format=True,
)

dg_connection.start(options)

WebSocket endpoint

wss://api.greenpt.ai/v1/listen

Handshake

Parameters for the WebSocket connection.

Parameter	Type	Required	Description
`Authorization`	header	Yes	API key for authentication. Format: `Token YOUR_API_KEY`.
`encoding`	query	No	Audio encoding format (e.g. `linear16`, `opus`).
`sample_rate`	query	No	Sample rate of audio (e.g. `16000`, `24000`).
`language`	query	No	Language code (e.g. `en`, `es`, `fr`). Defaults to `en` if not specified.
`interim_results`	query	No	Receive partial transcription results as audio is processed.
`diarize`	query	No	Enable speaker diarization to identify different speakers.
`punctuate`	query	No	Add punctuation and capitalization to transcript.
`smart_format`	query	No	Apply formatting to transcript output for improved readability.
`vad_events`	query	No	Enable voice activity detection events.
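
As a rough sketch of how the handshake parameters above fit together, the following Python snippet builds the connection URL and the `Authorization` header. The helper names `build_listen_url` and `auth_headers` are hypothetical, and rendering boolean flags as lowercase `true`/`false` is an assumption based on the `interim_results=true` value used in the connection example on this page.

```python
from urllib.parse import urlencode

# Endpoint as documented on this page.
LISTEN_ENDPOINT = "wss://api.greenpt.ai/v1/listen"


def build_listen_url(**params):
    """Return the WebSocket URL with the given query parameters.

    Parameters set to None are skipped; booleans are rendered as
    lowercase "true"/"false" (an assumption, see the note above).
    """
    query = {}
    for name, value in params.items():
        if value is None:
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"
        query[name] = str(value)
    if not query:
        return LISTEN_ENDPOINT
    return f"{LISTEN_ENDPOINT}?{urlencode(query)}"


def auth_headers(api_key):
    """Return the handshake header in the documented `Token <key>` format."""
    return {"Authorization": f"Token {api_key}"}


url = build_listen_url(
    encoding="linear16",
    sample_rate=16000,
    language="en",
    interim_results=True,
)
headers = auth_headers("YOUR_API_KEY")
```

Keeping URL construction in one helper makes it harder to send a malformed query string when optional flags such as `diarize` or `vad_events` are toggled at runtime.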

Connection example

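// Browsers cannot set custom headers on a WebSocket, so this example uses
// the `ws` package for Node.js to send the required Authorization header.
import WebSocket from 'ws';

const ws = new WebSocket(
  'wss://api.greenpt.ai/v1/listen?encoding=linear16&sample_rate=16000&language=en&interim_results=true',
  { headers: { Authorization: 'Token YOUR_API_KEY' } },
);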

ws.onopen = () => {
  console.log('WebSocket connected');
  // Start sending audio data
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'Results') {
    console.log('Transcript:', data.channel.alternatives[0].transcript);
    console.log('Is final:', data.is_final);
  }
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

Send audio

How to send audio data and control the stream.

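// Convert audio buffer to base64 and send
function sendAudioChunk(audioBuffer) {
  const bytes = new Uint8Array(audioBuffer);
  let binary = '';
  // Build the string byte by byte: spreading a large buffer into
  // String.fromCharCode can exceed the engine's argument limit.
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  ws.send(btoa(binary));
}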

// Close the stream when done
function closeStream() {
  ws.send(
    JSON.stringify({
      type: 'CloseStream',
    }),
  );
}

// Keep connection alive
function keepAlive() {
  ws.send(
    JSON.stringify({
      type: 'KeepAlive',
    }),
  );
}
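
For readers working in Python, here is a minimal sketch of the same operations: base64-encoding a chunk and building the `CloseStream` and `KeepAlive` control messages. It also splits raw PCM into fixed-duration chunks. The 100 ms chunk length and all helper names are illustrative assumptions, not values taken from the API; the only fixed fact used is that `linear16` audio has two bytes per sample.

```python
import base64
import json


def chunk_linear16(pcm, sample_rate=16000, chunk_ms=100, channels=1):
    """Split raw 16-bit PCM bytes into chunks of roughly `chunk_ms` each.

    The 100 ms default is an illustrative choice, not an API requirement.
    Chunk boundaries always fall between whole sample frames.
    """
    samples_per_chunk = sample_rate * chunk_ms // 1000
    size = samples_per_chunk * 2 * channels  # 2 bytes per 16-bit sample
    if size <= 0:
        raise ValueError("chunk size must be positive")
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


def encode_chunk(chunk):
    """Base64-encode one chunk, mirroring `sendAudioChunk` above."""
    return base64.b64encode(chunk).decode("ascii")


def control_message(kind):
    """Build a control message such as `CloseStream` or `KeepAlive`."""
    if kind not in ("CloseStream", "KeepAlive"):
        raise ValueError(f"unknown control message: {kind}")
    return json.dumps({"type": kind})


# One second of 16 kHz mono silence is 32,000 bytes.
chunks = chunk_linear16(bytes(32000))
payloads = [encode_chunk(c) for c in chunks]
close_msg = control_message("CloseStream")
```

Each entry in `payloads` would be passed to the WebSocket client's send method in order, followed by `close_msg` once the audio source is exhausted.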

Receive transcription

Example `Results` message, including the transcript, per-word timings, and confidence scores.

{
  "type": "Results",
  "channel": {
    "alternatives": [{
      "confidence": 0.98,
      "transcript": "Hello, world! Welcome to GreenPT!",
      "words": [{
        "confidence": 0.99,
        "end": 0.5,
        "punctuated_word": "Hello,",
        "start": 0.1,
        "word": "hello"
      }, {
        "confidence": 0.98,
        "end": 0.8,
        "punctuated_word": "world!",
        "start": 0.6,
        "word": "world"
      }]
    }]
  },
  "duration": 2,
  "is_final": true,
  "metadata": {
    "model_info": {
      "name": "nova-2",
      "version": "1.0.0"
    },
    "request_id": "987fcdeb-51a2-43b7-91e4-c95bafcda21a"
  },
  "start": 0,
  "speech_final": true
}
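
To illustrate how a client might consume messages like the one above, here is a small Python sketch that stores finalized segments and the latest interim text separately. The `TranscriptBuffer` class is hypothetical. Treating `is_final: true` as "this segment will not be revised again" is an assumption carried over from Deepgram-style APIs, not something this page states.

```python
import json


class TranscriptBuffer:
    """Collect final segments; keep only the most recent interim text."""

    def __init__(self):
        self.final_segments = []
        self.interim = ""

    def feed(self, raw_message):
        """Process one JSON message; return True if it was a `Results` message."""
        data = json.loads(raw_message)
        if data.get("type") != "Results":
            return False  # any other message type is ignored here
        alternatives = data.get("channel", {}).get("alternatives", [])
        text = alternatives[0].get("transcript", "") if alternatives else ""
        if data.get("is_final"):
            # Assumed: a final segment supersedes any pending interim text.
            if text:
                self.final_segments.append(text)
            self.interim = ""
        else:
            self.interim = text
        return True

    def text(self):
        """Return finalized text followed by the pending interim text."""
        parts = self.final_segments + ([self.interim] if self.interim else [])
        return " ".join(parts)


buffer = TranscriptBuffer()
buffer.feed('{"type": "Results", "is_final": false, '
            '"channel": {"alternatives": [{"transcript": "Hello"}]}}')
buffer.feed('{"type": "Results", "is_final": true, '
            '"channel": {"alternatives": [{"transcript": "Hello, world!"}]}}')
```

Separating interim from final text lets a UI show provisional words immediately while only persisting segments the service has marked as final.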

Complete SDK example

Full working example with the Deepgram SDK.

import { createClient, LiveTranscriptionEvents } from '@deepgram/sdk';
import fetch from 'cross-fetch';

const url = 'YOUR_LIVESTREAM_URL';

const client = createClient('YOUR_API_KEY', {
  global: {
    websocket: { options: { url: 'wss://api.greenpt.ai/v1' } },
  },
});

// Set up live transcription
const connection = client.listen.live({
  model: 'green-s',
  language: 'en',
  smart_format: true,
});

// Listen for events from the live transcription connection
connection.on(LiveTranscriptionEvents.Open, () => {
  connection.on(LiveTranscriptionEvents.Close, () => {
    console.log('Connection closed.');
  });

  connection.on(LiveTranscriptionEvents.Transcript, (data) => {
    console.log(data.channel.alternatives[0].transcript);
  });

  connection.on(LiveTranscriptionEvents.Metadata, (data) => {
    console.log(data);
  });

  connection.on(LiveTranscriptionEvents.Error, (err) => {
    console.error(err);
  });

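  // Fetch the audio stream and send it to the live transcription connection
  fetch(url)
    .then((r) => r.body)
    .then((res) => {
      if (res) {
        res.on('readable', () => {
          // read() returns null once the internal buffer is drained
          const chunk = res.read();
          if (chunk !== null) connection.send(chunk);
        });
      }
    })
    .catch((err) => console.error('Failed to fetch audio stream:', err));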
});

View complete Deepgram SDK documentation → developers.deepgram.com/sdks/sdk-features

Available models

Choose the model that fits your language and use case.

`green-s`: GreenS

Reliable speech-to-text for single-language streams. Great for meetings, podcasts, and voice assistants.

Supported languages: English, German, Spanish, French, Italian, Dutch, Portuguese, Romanian, Bulgarian, Catalan, Danish, Finnish, Swedish.

`green-s-pro`: GreenS Pro

Advanced model with automatic language detection. Handles multiple languages in the same stream.

Supported languages: English, German, Dutch, Swedish, Turkish.

Multilingual: set `language=multi` for automatic language detection across languages in the same stream.

Multilingual mode

Transcribe conversations where speakers switch between languages.

With `green-s-pro`, set `language=multi` to transcribe live audio where multiple languages are spoken. The model automatically detects and transcribes each language as speakers switch.

Languages supported in multilingual mode: English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, Dutch.

Price difference

Multilingual processing costs more than single-language processing. For live transcription, the regular rate is €0.37/hour for multilingual versus €0.31/hour for monolingual.

Output differences

With `language=multi`, the response adds a `languages` array and a `language` field per word:

"alternatives": [{
  "transcript": "No recuerdo mi bank password.",
  "languages": ["es", "en"],
  "words": [
    { "word": "no", "language": "es" },
    { "word": "recuerdo", "language": "es" },
    { "word": "bank", "language": "en" }
  ]
}]
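
One way to use the per-word `language` field is to group consecutive words by language, for example to route each run to a different downstream step. The sketch below does this in Python; `language_runs` is a hypothetical helper, and the sample data is copied from the fragment above.

```python
from itertools import groupby


def language_runs(alternative):
    """Group consecutive words that share a `language` tag.

    Returns a list of (language, "joined words") tuples in spoken order.
    Words without a `language` field are grouped under None.
    """
    words = alternative.get("words", [])
    return [
        (lang, " ".join(w["word"] for w in run))
        for lang, run in groupby(words, key=lambda w: w.get("language"))
    ]


# Sample taken from the response fragment above.
alternative = {
    "transcript": "No recuerdo mi bank password.",
    "languages": ["es", "en"],
    "words": [
        {"word": "no", "language": "es"},
        {"word": "recuerdo", "language": "es"},
        {"word": "bank", "language": "en"},
    ],
}
runs = language_runs(alternative)
```

Because `groupby` only merges adjacent items, a speaker who switches back and forth produces one run per switch rather than one bucket per language.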

Available features

Add-on capabilities for live transcription.

Feature	Description	`green-s`	`green-s-pro`
Speaker diarization	Identify different speakers in the audio.	Yes	Yes
Language detection	Auto-detect spoken language with `multi` (off by default; defaults to `en`).	Yes	Yes
Profanity filter	Filter or mask profanity in the transcript.	Yes	Yes
Speech intent & topics	Detect topics and speaker intent.	Yes	Yes
Smart formatting	Improved punctuation and readability (English only).	Yes	-

Summarization: not available for live streaming. Use the pre-recorded API for transcript summaries.

Pricing

Live transcription rates per hour of audio.

New pricing, now live: these model rates are updated and in effect today. The add-on features below are a separate launch promo and free for now.

Model	Regular rate	Promo (Jul–Aug 2026, −50%)
`green-s`: all supported languages	€0.31 / hour	€0.16 / hour
`green-s-pro`: monolingual	€0.31 / hour	€0.16 / hour
`green-s-pro`: multilingual	€0.37 / hour	€0.19 / hour

All prices in EUR, excl. taxes.

Additional features

Launch promo: these add-ons are free right now. The prices shown are the standard per-hour rates that apply once the promo ends; you are not charged for them today.

Feature	Rate (promo: free now)
Redaction	€0.10 / hour
Entity detection	€0.08 / hour
Streaming diarization	€0.10 / hour
Keyterm prompting	€0.07 / hour

Multichannel audio is billed per channel, so 2-channel audio is charged at double the per-hour rate.
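
As a worked example of the per-channel billing rule, the Python sketch below estimates a bill from the regular rates listed above. The function and dictionary keys are hypothetical, promo discounts and taxes are left out, and billing add-ons per channel after the launch promo ends is an assumption that this page does not confirm.

```python
from decimal import Decimal

# Regular per-hour rates in EUR, copied from the tables above.
MODEL_RATES = {
    "green-s": Decimal("0.31"),
    "green-s-pro": Decimal("0.31"),
    "green-s-pro-multi": Decimal("0.37"),  # hypothetical key for language=multi
}
ADDON_RATES = {
    "redaction": Decimal("0.10"),
    "entity_detection": Decimal("0.08"),
    "streaming_diarization": Decimal("0.10"),
    "keyterm_prompting": Decimal("0.07"),
}


def estimate_cost(model, hours, channels=1, addons=(), addons_free=True):
    """Estimate the regular-rate cost in EUR, before taxes and promos.

    Assumes add-ons are billed per channel once they stop being free.
    """
    rate = MODEL_RATES[model]
    if not addons_free:
        rate += sum((ADDON_RATES[a] for a in addons), Decimal("0"))
    # Multichannel audio is billed per channel.
    return rate * Decimal(str(hours)) * channels


# 10 hours of 2-channel audio on green-s: 0.31 * 10 * 2 = 6.20 EUR.
stereo_call = estimate_cost("green-s", hours=10, channels=2)

# 1 hour of multilingual audio with paid redaction: 0.37 + 0.10 = 0.47 EUR.
with_addons = estimate_cost(
    "green-s-pro-multi", hours=1, addons=["redaction"], addons_free=False
)
```

`Decimal` is used instead of floats so that sums of euro amounts do not pick up binary rounding noise.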
