GreenPT Docs

Live Audio Streaming (STT)

Real-time speech-to-text transcription using WebSocket connections for live audio streams.

WebSocket

The API follows OpenAI- and Deepgram-compatible patterns.

SDK integration

Use the official Deepgram SDK with our WebSocket endpoint.

import { createClient, LiveTranscriptionEvents } from '@deepgram/sdk';

const client = createClient('YOUR_API_KEY', {
  global: {
    websocket: { options: { url: 'wss://api.greenpt.ai/v1' } },
  },
});

// Setup live transcription
const connection = client.listen.live({
  model: 'green-s',
  language: 'en',
  smart_format: true,
});

connection.on(LiveTranscriptionEvents.Transcript, (data) => {
  console.log(data.channel.alternatives[0].transcript);
});

connection.on(LiveTranscriptionEvents.Open, () => {
  // Send audio once the connection is open.
  // audioData is a placeholder for your raw audio bytes (e.g. a Buffer).
  connection.send(audioData);
});

# Python: the same setup with the Deepgram Python SDK
from deepgram import DeepgramClient, LiveTranscriptionEvents, LiveOptions
from deepgram.environment import DeepgramClientEnvironment

# Create environment for GreenPT API
greenpt_env = DeepgramClientEnvironment(
    base="https://api.greenpt.ai",
    production="wss://api.greenpt.ai",
    agent="wss://api.greenpt.ai",
)

deepgram = DeepgramClient("YOUR_API_KEY", environment=greenpt_env)
dg_connection = deepgram.listen.websocket.v("1")

def on_message(self, result, **kwargs):
    sentence = result.channel.alternatives[0].transcript
    if len(sentence) == 0:
        return
    print(f"speaker: {sentence}")

dg_connection.on(LiveTranscriptionEvents.Transcript, on_message)

options = LiveOptions(
    model="green-s",
    language="en",
    interim_results=True,
    diarize=True,
    smart_format=True,
)

dg_connection.start(options)

WebSocket endpoint

wss://api.greenpt.ai/v1/listen

Handshake

Parameters for the WebSocket connection.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | header | Yes | API key for authentication. Format: Token YOUR_API_KEY. |
| encoding | query | No | Audio encoding format (e.g. linear16, opus). |
| sample_rate | query | No | Sample rate of the audio in Hz (e.g. 16000, 24000). |
| language | query | No | Language code (e.g. en, es, fr). |
| interim_results | query | No | Receive partial transcription results as audio is processed. |
| diarize | query | No | Enable speaker diarization to identify different speakers. |
| punctuate | query | No | Add punctuation and capitalization to the transcript. |
| smart_format | query | No | Apply formatting to the transcript output for improved readability. |
| vad_events | query | No | Enable voice activity detection events. |
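
The query parameters above can be assembled into a connection URL programmatically. The following is a minimal sketch in Python using only the standard library. The helper names build_listen_url and auth_headers are illustrative and not part of any SDK, and the sketch assumes boolean flags are sent as lowercase true/false, matching the interim_results=true form used in this page's connection example.

```python
from urllib.parse import urlencode

# Endpoint taken from this page.
LISTEN_ENDPOINT = "wss://api.greenpt.ai/v1/listen"


def build_listen_url(**params):
    """Return the endpoint URL with the given handshake query parameters.

    Parameters set to None are skipped. Booleans are rendered as lowercase
    "true"/"false" (an assumption based on the connection example).
    """
    query = {}
    for name, value in params.items():
        if value is None:
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"
        query[name] = value
    if not query:
        return LISTEN_ENDPOINT
    return f"{LISTEN_ENDPOINT}?{urlencode(query)}"


def auth_headers(api_key):
    """Build the Authorization header in the documented 'Token <key>' format."""
    return {"Authorization": f"Token {api_key}"}


url = build_listen_url(
    encoding="linear16",
    sample_rate=16000,
    language="en",
    interim_results=True,
)
print(url)
```

Note that the browser WebSocket constructor cannot set request headers, so a helper like auth_headers is only relevant to server-side clients that allow custom handshake headers.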

Connection example

const ws = new WebSocket(
  'wss://api.greenpt.ai/v1/listen?encoding=linear16&sample_rate=16000&language=en&interim_results=true',
);

ws.onopen = () => {
  console.log('WebSocket connected');
  // Start sending audio data
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'Results') {
    console.log('Transcript:', data.channel.alternatives[0].transcript);
    console.log('Is final:', data.is_final);
  }
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};
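
When interim_results=true is set, the same stretch of audio may be reported several times before it is finalized. One common way to handle this, sketched here in Python, is to append only final segments and keep the latest interim text separately. The sketch assumes that a message with is_final set to true supersedes the interim messages for that segment, as in Deepgram-style streaming. TranscriptAccumulator is a hypothetical helper, not part of the API.

```python
import json


class TranscriptAccumulator:
    """Collects finalized transcript segments and tracks the latest interim text."""

    def __init__(self):
        self.final_segments = []
        self.interim = ""

    def handle_message(self, raw):
        """Process one JSON text message received from the WebSocket."""
        data = json.loads(raw)
        if data.get("type") != "Results":
            return  # ignore metadata and other message types
        alternatives = data.get("channel", {}).get("alternatives", [])
        if not alternatives:
            return
        text = alternatives[0].get("transcript", "")
        if data.get("is_final"):
            # Assumption: a final result replaces the pending interim text.
            if text:
                self.final_segments.append(text)
            self.interim = ""
        else:
            self.interim = text

    def text(self):
        """Return finalized text followed by any pending interim text."""
        parts = self.final_segments + ([self.interim] if self.interim else [])
        return " ".join(parts)


acc = TranscriptAccumulator()
acc.handle_message(json.dumps({
    "type": "Results", "is_final": False,
    "channel": {"alternatives": [{"transcript": "hello wor"}]},
}))
acc.handle_message(json.dumps({
    "type": "Results", "is_final": True,
    "channel": {"alternatives": [{"transcript": "Hello, world!"}]},
}))
print(acc.text())
```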

Send audio

How to send audio data and control the stream.

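// Convert audio buffer to base64 and send
function sendAudioChunk(audioBuffer) {
  const bytes = new Uint8Array(audioBuffer);
  let binary = '';
  // Build the string byte by byte: spreading a large buffer into
  // String.fromCharCode can exceed the engine's argument limit
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  ws.send(btoa(binary));
}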

// Close the stream when done
function closeStream() {
  ws.send(
    JSON.stringify({
      type: 'CloseStream',
    }),
  );
}

// Keep connection alive
function keepAlive() {
  ws.send(
    JSON.stringify({
      type: 'KeepAlive',
    }),
  );
}
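
Live audio is usually sent in small, fixed-duration chunks rather than one large buffer. The sketch below (Python, standard library only) computes the chunk size for linear16 audio, which uses 2 bytes per sample, and splits a buffer accordingly. It also builds the KeepAlive and CloseStream control messages shown above. The function names, the 100 ms chunk duration, and the mono channel count are illustrative assumptions, not documented requirements. How each chunk is encoded before sending should follow the example above.

```python
import json

BYTES_PER_SAMPLE = 2  # linear16 = 16-bit signed PCM


def chunk_size_bytes(sample_rate, chunk_ms=100, channels=1):
    """Number of bytes covering chunk_ms milliseconds of linear16 audio."""
    return sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE * channels


def iter_chunks(pcm, size):
    """Yield consecutive slices of pcm; the last one may be shorter."""
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


def control_message(kind):
    """Serialize a control message such as KeepAlive or CloseStream."""
    return json.dumps({"type": kind})


size = chunk_size_bytes(16000)                # 100 ms at 16 kHz mono
one_second = bytes(16000 * BYTES_PER_SAMPLE)  # one second of silence
chunks = list(iter_chunks(one_second, size))
print(size, len(chunks))
print(control_message("KeepAlive"))
```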

Receive transcription

Example Results message with word-level timing and confidence.

{
  "type": "Results",
  "channel": {
    "alternatives": [{
      "confidence": 0.98,
      "transcript": "Hello, world! Welcome to GreenPT!",
      "words": [{
        "confidence": 0.99,
        "end": 0.5,
        "punctuated_word": "Hello,",
        "start": 0.1,
        "word": "hello"
      }, {
        "confidence": 0.98,
        "end": 0.8,
        "punctuated_word": "world!",
        "start": 0.6,
        "word": "world"
      }]
    }]
  },
  "duration": 2,
  "is_final": true,
  "metadata": {
    "model_info": {
      "name": "nova-2",
      "version": "1.0.0"
    },
    "request_id": "987fcdeb-51a2-43b7-91e4-c95bafcda21a"
  },
  "start": 0,
  "speech_final": true
}
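
The words array carries per-word timing, which is useful for captions or highlighting. As a sketch, the following Python snippet turns a Results message shaped like the example above into (start, end, text) tuples. word_timings is a hypothetical helper, and falling back to word when punctuated_word is absent is an assumption.

```python
import json


def word_timings(message):
    """Return (start, end, text) for each word of the first alternative."""
    alternatives = message.get("channel", {}).get("alternatives", [])
    if not alternatives:
        return []
    timings = []
    for w in alternatives[0].get("words", []):
        # Prefer the punctuated form when the message provides it.
        text = w.get("punctuated_word") or w.get("word", "")
        timings.append((w["start"], w["end"], text))
    return timings


message = json.loads("""
{
  "type": "Results",
  "channel": {"alternatives": [{
    "transcript": "Hello, world!",
    "words": [
      {"word": "hello", "punctuated_word": "Hello,", "start": 0.1, "end": 0.5},
      {"word": "world", "punctuated_word": "world!", "start": 0.6, "end": 0.8}
    ]
  }]},
  "is_final": true
}
""")

for start, end, text in word_timings(message):
    print(f"{start:.1f}-{end:.1f}s  {text}")
```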

Complete SDK example

Full working example with the Deepgram SDK.

import { createClient, LiveTranscriptionEvents } from '@deepgram/sdk';
import fetch from 'cross-fetch';

const url = 'YOUR_LIVESTREAM_URL';

const client = createClient('YOUR_API_KEY', {
  global: {
    websocket: { options: { url: 'wss://api.greenpt.ai/v1' } },
  },
});

// Setup live transcription
const connection = client.listen.live({
  model: 'green-s',
  language: 'en',
  smart_format: true,
});

// Listen for events from the live transcription connection
connection.on(LiveTranscriptionEvents.Open, () => {
  connection.on(LiveTranscriptionEvents.Close, () => {
    console.log('Connection closed.');
  });

  connection.on(LiveTranscriptionEvents.Transcript, (data) => {
    console.log(data.channel.alternatives[0].transcript);
  });

  connection.on(LiveTranscriptionEvents.Metadata, (data) => {
    console.log(data);
  });

  connection.on(LiveTranscriptionEvents.Error, (err) => {
    console.error(err);
  });

  // Fetch the audio stream and send it to the live transcription connection
  fetch(url)
    .then((r) => r.body)
    .then((res) => {
      if (res) {
        res.on('readable', () => {
          // read() returns null when no data is buffered; skip those calls
          const chunk = res.read();
          if (chunk) connection.send(chunk);
        });
      }
    });
});

View complete Deepgram SDK documentation → developers.deepgram.com/sdks/sdk-features

Available models

Choose the model that fits your language and use case.

green-s: GreenS

Reliable speech-to-text for single-language streams. Great for meetings, podcasts, and voice assistants.

Supported languages: English, German, Spanish, French, Italian, Dutch, Portuguese, Romanian.

green-s-pro: GreenS Pro

Advanced model with automatic language detection. Handles multiple languages in the same stream.

Supported languages: English, German, Dutch, Swedish, Turkish.

Multilingual: set language=multi for automatic language detection across languages in the same stream.

Multilingual mode

Transcribe conversations where speakers switch between languages.

With green-s-pro, set language=multi to transcribe live audio where multiple languages are spoken. The model automatically detects and transcribes each language as speakers switch.

Languages supported in multilingual mode: English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, Dutch.

Price difference

Multilingual processing costs more than single-language processing. For live transcription with green-s-pro, the rate is €1.04/hour in multilingual mode versus €0.78/hour in monolingual mode.

Output differences

With language=multi, the response adds a languages array and a language field per word:

"alternatives": [{
  "transcript": "No recuerdo mi bank password.",
  "languages": ["es", "en"],
  "words": [
    { "word": "no", "language": "es" },
    { "word": "recuerdo", "language": "es" },
    { "word": "bank", "language": "en" }
  ]
}]
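
The per-word language field makes it possible to split a transcript into single-language runs, for example to route each run to a different downstream step. A minimal Python sketch using the words from the example above follows. language_runs is an illustrative name, not part of the API.

```python
from itertools import groupby


def language_runs(words):
    """Group consecutive words that share a language into (language, text) runs."""
    runs = []
    for language, group in groupby(words, key=lambda w: w.get("language")):
        runs.append((language, " ".join(w["word"] for w in group)))
    return runs


words = [
    {"word": "no", "language": "es"},
    {"word": "recuerdo", "language": "es"},
    {"word": "bank", "language": "en"},
]
print(language_runs(words))
```

Because groupby only merges adjacent items, a speaker who switches back to an earlier language produces a new run rather than being merged into the first one, which preserves the order of the conversation.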

Available features

Add-on capabilities for live transcription.

| Feature | Description | green-s | green-s-pro |
| --- | --- | --- | --- |
| Speaker diarization | Identify different speakers in the audio. | Yes | Yes |
| Language detection | Automatically detect spoken language. | Yes | Yes |
| Profanity filter | Filter or mask profanity in the transcript. | Yes | Yes |
| Speech intent & topics | Detect topics and speaker intent. | Yes | Yes |
| Smart formatting | Improved punctuation and readability (English only). | Yes | - |

Summarization: not available for live streaming. Use the pre-recorded API for transcript summaries.

Pricing

Live transcription rates per hour of audio.

| Model | Rate |
| --- | --- |
| green-s: all supported languages | €0.65 / hour |
| green-s-pro: monolingual | €0.78 / hour |
| green-s-pro: multilingual | €1.04 / hour |

All prices in EUR, excl. taxes.
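
For budgeting, the rates above can be turned into a quick estimate. This is a sketch only: the rates are copied from the pricing table, but the linear per-second proration and the rounding to cents are assumptions, since this page does not describe billing granularity. estimate_cost is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

# EUR per hour of audio, copied from the pricing table (excl. taxes).
# green-s has a single rate for all supported languages; it is keyed
# as "monolingual" here purely for lookup convenience.
RATES_EUR_PER_HOUR = {
    ("green-s", "monolingual"): Decimal("0.65"),
    ("green-s-pro", "monolingual"): Decimal("0.78"),
    ("green-s-pro", "multilingual"): Decimal("1.04"),
}


def estimate_cost(model, mode, seconds):
    """Estimate the cost in EUR for the given seconds of audio, rounded to cents."""
    rate = RATES_EUR_PER_HOUR[(model, mode)]
    # Assumption: usage is prorated linearly by the second.
    cost = rate * Decimal(seconds) / Decimal(3600)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost("green-s", "monolingual", 90 * 60))        # 90 minutes
print(estimate_cost("green-s-pro", "multilingual", 30 * 60))   # 30 minutes
```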
