Live Audio Streaming (STT)
Real-time speech-to-text transcription using WebSocket connections for live audio streams. Our API follows OpenAI- and Deepgram-compatible patterns.
SDK integration
Use the official Deepgram SDK with our WebSocket endpoint.
import { createClient, LiveTranscriptionEvents } from '@deepgram/sdk';

// Point the SDK's WebSocket client at the GreenPT endpoint
const client = createClient('YOUR_API_KEY', {
  global: {
    websocket: { options: { url: 'wss://api.greenpt.ai/v1' } },
  },
});

// Setup live transcription
const connection = client.listen.live({
  model: 'green-s',
  language: 'en',
  smart_format: true,
});

connection.on(LiveTranscriptionEvents.Transcript, (data) => {
  console.log(data.channel.alternatives[0].transcript);
});

connection.on(LiveTranscriptionEvents.Open, () => {
  // Send audio data when connection is open (audioData is your own audio chunk)
  connection.send(audioData);
});

from deepgram import DeepgramClient, LiveTranscriptionEvents, LiveOptions
from deepgram.environment import DeepgramClientEnvironment

# Create environment for GreenPT API
greenpt_env = DeepgramClientEnvironment(
    base="https://api.greenpt.ai",
    production="wss://api.greenpt.ai",
    agent="wss://api.greenpt.ai",
)

deepgram = DeepgramClient("YOUR_API_KEY", environment=greenpt_env)
dg_connection = deepgram.listen.websocket.v("1")


def on_message(self, result, **kwargs):
    sentence = result.channel.alternatives[0].transcript
    if len(sentence) == 0:
        return
    print(f"speaker: {sentence}")


dg_connection.on(LiveTranscriptionEvents.Transcript, on_message)

options = LiveOptions(
    model="green-s",
    language="en",
    interim_results=True,
    diarize=True,
    smart_format=True,
)
dg_connection.start(options)

WebSocket endpoint
wss://api.greenpt.ai/v1/listen

Handshake
Parameters for the WebSocket connection.
| Parameter | Type | Required | Description |
|---|---|---|---|
| Authorization | header | Yes | API key for authentication. Format: Token YOUR_API_KEY. |
| encoding | query | No | Audio encoding format (e.g. linear16, opus). |
| sample_rate | query | No | Sample rate of audio (e.g. 16000, 24000). |
| language | query | No | Language code (e.g. en, es, fr). |
| interim_results | query | No | Receive partial transcription results as audio is processed. |
| diarize | query | No | Enable speaker diarization to identify different speakers. |
| punctuate | query | No | Add punctuation and capitalization to transcript. |
| smart_format | query | No | Apply formatting to transcript output for improved readability. |
| vad_events | query | No | Enable voice activity detection events. |
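The parameters in the table map onto a query string plus one header. The following is a minimal Python sketch of assembling them with the standard library only. The helper names `build_listen_url` and `auth_header` are hypothetical, and writing booleans as lowercase `true`/`false` is an assumption based on the `interim_results=true` form used in the connection example.

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.greenpt.ai/v1/listen"  # endpoint given in this document


def build_listen_url(**params):
    """Append handshake query parameters to the endpoint URL.

    Booleans become lowercase "true"/"false" (an assumption, see lead-in);
    parameters set to None are left out.
    """
    query = {}
    for name, value in params.items():
        if value is None:
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"
        query[name] = str(value)
    return f"{BASE_URL}?{urlencode(query)}" if query else BASE_URL


def auth_header(api_key):
    """Authorization header in the documented "Token YOUR_API_KEY" format."""
    return {"Authorization": f"Token {api_key}"}


url = build_listen_url(
    encoding="linear16",
    sample_rate=16000,
    language="en",
    interim_results=True,
)
print(url)
print(auth_header("YOUR_API_KEY"))
```

The resulting URL and header can be handed to whichever WebSocket client you use; how headers are supplied depends on that client.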
Connection example
const ws = new WebSocket(
  'wss://api.greenpt.ai/v1/listen?encoding=linear16&sample_rate=16000&language=en&interim_results=true',
);

ws.onopen = () => {
  console.log('WebSocket connected');
  // Start sending audio data
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'Results') {
    console.log('Transcript:', data.channel.alternatives[0].transcript);
    console.log('Is final:', data.is_final);
  }
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

Send audio
How to send audio data and control the stream.
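As a Python counterpart to the JavaScript helpers in this section, here is a minimal sketch of the same three operations: base64-encoding an audio chunk and building the `CloseStream` and `KeepAlive` control messages. The function names are hypothetical, and `send` stands in for the send method of whatever WebSocket client you use (an assumption of this sketch).

```python
import base64
import json


def encode_audio_chunk(chunk: bytes) -> str:
    """Base64-encode a raw audio chunk, matching the btoa-based JavaScript helper."""
    return base64.b64encode(chunk).decode("ascii")


def control_message(message_type: str) -> str:
    """Serialize a control message such as CloseStream or KeepAlive."""
    return json.dumps({"type": message_type})


def stream_chunks(chunks, send):
    """Send each audio chunk, then ask the server to close the stream.

    `send` is any callable that transmits one text message
    (hypothetical stand-in for a WebSocket client's send method).
    """
    for chunk in chunks:
        send(encode_audio_chunk(chunk))
    send(control_message("CloseStream"))


# Demonstration with a list standing in for the socket.
sent = []
stream_chunks([b"\x00\x01", b"\x02\x03"], sent.append)
print(sent)
```

Keeping the encoding and message construction separate from the socket makes them easy to check without a live connection.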
// Convert audio buffer to base64 and send
function sendAudioChunk(audioBuffer) {
  const base64Audio = btoa(
    String.fromCharCode(...new Uint8Array(audioBuffer)),
  );
  ws.send(base64Audio);
}

// Close the stream when done
function closeStream() {
  ws.send(
    JSON.stringify({
      type: 'CloseStream',
    }),
  );
}

// Keep connection alive
function keepAlive() {
  ws.send(
    JSON.stringify({
      type: 'KeepAlive',
    }),
  );
}

Receive transcription
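Incoming messages can be handled along these lines in Python. This is a minimal sketch, assuming each message arrives as a JSON text frame shaped like the example response in this section. `TranscriptCollector` is a hypothetical helper, and treating `is_final: true` as "this segment will not be revised" is an assumption drawn from Deepgram-compatible conventions rather than something this page states.

```python
import json


class TranscriptCollector:
    """Keeps finalized segments and the most recent interim hypothesis."""

    def __init__(self):
        self.final_segments = []
        self.interim = ""

    def handle(self, raw_message: str) -> None:
        data = json.loads(raw_message)
        if data.get("type") != "Results":
            return  # ignore other message types
        alternatives = data.get("channel", {}).get("alternatives", [])
        if not alternatives:
            return
        transcript = alternatives[0].get("transcript", "")
        if not transcript:
            return
        if data.get("is_final"):
            # Assumed: a final segment replaces any pending interim text.
            self.final_segments.append(transcript)
            self.interim = ""
        else:
            self.interim = transcript

    def text(self) -> str:
        parts = self.final_segments + ([self.interim] if self.interim else [])
        return " ".join(parts)


collector = TranscriptCollector()
collector.handle(json.dumps({
    "type": "Results",
    "is_final": False,
    "channel": {"alternatives": [{"transcript": "Hello"}]},
}))
collector.handle(json.dumps({
    "type": "Results",
    "is_final": True,
    "channel": {"alternatives": [{"transcript": "Hello, world!"}]},
}))
print(collector.text())
```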
Example response format with detailed information.
{
  "type": "Results",
  "channel": {
    "alternatives": [{
      "confidence": 0.98,
      "transcript": "Hello, world! Welcome to GreenPT!",
      "words": [{
        "confidence": 0.99,
        "end": 0.5,
        "punctuated_word": "Hello,",
        "start": 0.1,
        "word": "hello"
      }, {
        "confidence": 0.98,
        "end": 0.8,
        "punctuated_word": "world!",
        "start": 0.6,
        "word": "world"
      }]
    }]
  },
  "duration": 2,
  "is_final": true,
  "metadata": {
    "model_info": {
      "name": "nova-2",
      "version": "1.0.0"
    },
    "request_id": "987fcdeb-51a2-43b7-91e4-c95bafcda21a"
  },
  "start": 0,
  "speech_final": true
}

Complete SDK example
Full working example with the Deepgram SDK.
import { createClient, LiveTranscriptionEvents } from '@deepgram/sdk';
import fetch from 'cross-fetch';

const url = 'YOUR_LIVESTREAM_URL';

const client = createClient('YOUR_API_KEY', {
  global: {
    websocket: { options: { url: 'wss://api.greenpt.ai/v1' } },
  },
});

// Setup live transcription
const connection = client.listen.live({
  model: 'green-s',
  language: 'en',
  smart_format: true,
});

// Listen for events from the live transcription connection
connection.on(LiveTranscriptionEvents.Open, () => {
  connection.on(LiveTranscriptionEvents.Close, () => {
    console.log('Connection closed.');
  });

  connection.on(LiveTranscriptionEvents.Transcript, (data) => {
    console.log(data.channel.alternatives[0].transcript);
  });

  connection.on(LiveTranscriptionEvents.Metadata, (data) => {
    console.log(data);
  });

  connection.on(LiveTranscriptionEvents.Error, (err) => {
    console.error(err);
  });

  // Fetch the audio stream and send it to the live transcription connection
  fetch(url)
    .then((r) => r.body)
    .then((res) => {
      if (res) {
        res.on('readable', () => {
          // read() returns null once the internal buffer is drained
          let chunk;
          while ((chunk = res.read()) !== null) {
            connection.send(chunk);
          }
        });
      }
    });
});

View complete Deepgram SDK documentation → developers.deepgram.com/sdks/sdk-features
Available models
Choose the model that fits your language and use case.
green-s: GreenS
Reliable speech-to-text for single-language streams. Great for meetings, podcasts, and voice assistants.
Supported languages: English, German, Spanish, French, Italian, Dutch, Portuguese, Romanian.
green-s-pro: GreenS Pro
Advanced model with automatic language detection. Handles multiple languages in the same stream.
Supported languages: English, German, Dutch, Swedish, Turkish.
Multilingual: use multi for automatic language detection across languages in the same stream.
Multilingual mode
Transcribe conversations where speakers switch between languages.
With green-s-pro, set language=multi to transcribe live audio where
multiple languages are spoken. The model automatically detects and
transcribes each language as speakers switch.
Languages supported in multilingual mode: English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, Dutch.
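Because each word in a multilingual response carries its own language field (see "Output differences" below), a transcript can be split into per-language runs. The following Python sketch shows one way to do that; `language_runs` is a hypothetical helper, and the sample words are taken from the example in that section.

```python
from itertools import groupby


def language_runs(words):
    """Group consecutive words that share the same "language" value.

    Returns a list of (language, text) pairs in spoken order.
    """
    runs = []
    for language, group in groupby(words, key=lambda w: w.get("language")):
        runs.append((language, " ".join(w["word"] for w in group)))
    return runs


words = [
    {"word": "no", "language": "es"},
    {"word": "recuerdo", "language": "es"},
    {"word": "bank", "language": "en"},
]
print(language_runs(words))
```

Grouping only consecutive words keeps the spoken order intact, so a speaker who switches back and forth produces several short runs rather than one bucket per language.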
Price difference
Multilingual processing costs more than single-language processing. For live transcription with green-s-pro: €1.04/hour multilingual vs €0.78/hour monolingual.
Output differences
With language=multi, the response adds a languages array and a language
field per word:
"alternatives": [{
  "transcript": "No recuerdo mi bank password.",
  "languages": ["es", "en"],
  "words": [
    { "word": "no", "language": "es" },
    { "word": "recuerdo", "language": "es" },
    { "word": "bank", "language": "en" }
  ]
}]

Available features
Add-on capabilities for live transcription.
| Feature | Description | green-s | green-s-pro |
|---|---|---|---|
| Speaker diarization | Identify different speakers in the audio. | Yes | Yes |
| Language detection | Automatically detect spoken language. | Yes | Yes |
| Profanity filter | Filter or mask profanity in the transcript. | Yes | Yes |
| Speech intent & topics | Detect topics and speaker intent. | Yes | Yes |
| Smart formatting | Improved punctuation and readability (English only). | Yes | - |
Summarization: not available for live streaming. Use the pre-recorded API for transcript summaries.
Pricing
Live transcription rates per hour of audio.
| Model | Rate |
|---|---|
| green-s: all supported languages | €0.65 / hour |
| green-s-pro: monolingual | €0.78 / hour |
| green-s-pro: multilingual | €1.04 / hour |
All prices in EUR, excl. taxes.
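To make the rates concrete, here is a small Python sketch that estimates the cost of a session from its audio duration. The rates are copied from the table above. The `estimate_cost` helper, the `"monolingual"` label used for green-s, linear per-second proration, and rounding to whole cents are all assumptions of this sketch, since this page does not describe billing granularity.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rates in EUR per hour of audio, copied from the pricing table.
RATES_EUR_PER_HOUR = {
    ("green-s", "monolingual"): Decimal("0.65"),
    ("green-s-pro", "monolingual"): Decimal("0.78"),
    ("green-s-pro", "multilingual"): Decimal("1.04"),
}


def estimate_cost(model, mode, audio_seconds):
    """Estimated cost in EUR, excl. taxes (assumes linear per-second billing)."""
    rate = RATES_EUR_PER_HOUR[(model, mode)]
    cost = rate * Decimal(audio_seconds) / Decimal(3600)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# A 90-minute stream on each tier.
for key in RATES_EUR_PER_HOUR:
    print(key, estimate_cost(*key, 90 * 60))
```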