📋 Frontier AI System Architecture Documentation - Technical reference and development guide
⚡ Real-Time Processing

Overview

Frontier AI provides fast AI coaching during live sales calls through a real-time processing pipeline. The system handles transcript streams, question detection, and AI-powered response generation. Note that Recall.ai operates from US East, which adds some latency for EU/UK users.

Processing Architecture

Real-Time AI Pipeline

The real-time processing uses OpenAI Queue for immediate AI responses during live calls, while Humanloop handles post-call analysis.

WebSocket Communication Flow

Real-time communication between client applications and the system uses WebSockets for bidirectional data flow.

Real-Time Processing Components

Durable Objects

CallServer - Manages individual call state and WebSocket connections

  • Maintains ephemeral call state
  • Buffers transcript segments for processing
  • Handles WebSocket message routing
  • Manages AI coaching state
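
As an illustrative sketch only (class and field names here are hypothetical, not the actual implementation), the ephemeral state and transcript buffering a CallServer-style Durable Object keeps might look like:

```typescript
// Hypothetical sketch of ephemeral call state with transcript
// buffering. Segments accumulate until enough text has arrived to be
// worth forwarding for AI analysis (e.g. to QuestionsServer).
interface TranscriptSegment {
  speaker: string;
  text: string;
  timestampMs: number;
}

class CallState {
  private buffer: TranscriptSegment[] = [];
  coachingEnabled = true;

  addSegment(segment: TranscriptSegment): void {
    this.buffer.push(segment);
  }

  // Drain the buffer once the combined text crosses a size threshold,
  // returning the segments to forward downstream.
  drainIfReady(minChars = 80): TranscriptSegment[] {
    const totalChars = this.buffer.reduce((n, s) => n + s.text.length, 0);
    if (totalChars < minChars) return [];
    const drained = this.buffer;
    this.buffer = [];
    return drained;
  }
}
```

Because a Durable Object is single-threaded per call, state like this needs no locking; the real object would also persist anything that must survive hibernation.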

QuestionsServer - Dedicated AI question detection

  • Analyzes transcript segments for customer questions
  • Uses OpenAI Queue for fast AI processing
  • Maintains question detection state
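
The detection itself is AI-powered via the OpenAI Queue, but a cheap pre-filter heuristic (a sketch under that assumption; the regex and function name are illustrative) could decide which transcript segments are worth an API call at all:

```typescript
// Illustrative pre-filter only: segments that neither end in "?" nor
// open with a common interrogative are unlikely to be questions, so
// they need not be sent to the model.
const QUESTION_STARTERS =
  /^(what|how|why|when|where|who|can|could|do|does|is|are|will|would)\b/i;

function looksLikeQuestion(segment: string): boolean {
  const text = segment.trim();
  return text.endsWith("?") || QUESTION_STARTERS.test(text);
}
```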

FeedbackServer - AI coaching and suggestions

  • Generates real-time coaching suggestions
  • Processes RAG queries against knowledge base
  • Uses OpenAI Queue for immediate responses
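
The retrieval step of a RAG query can be sketched as scoring knowledge-base entries against a query embedding by cosine similarity and keeping the top matches. This is a generic illustration, not the production retriever; in practice the embeddings would come from an embedding model rather than hand-written arrays:

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the ids of the k knowledge-base entries most similar to the
// query embedding; these would then be stuffed into the AI prompt.
function topK(
  query: number[],
  docs: { id: string; embedding: number[] }[],
  k: number,
): string[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((d) => d.id);
}
```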

OpenAI Queue (Real-Time)

Purpose-built for real-time AI processing during live calls:

Features:

  • Rate limiting to prevent OpenAI API overload
  • Connection pooling for efficient API usage
  • Direct API calls for minimal latency
  • Error handling with circuit breaker patterns
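
Two of the features above, rate limiting and circuit breaking, compose naturally into a single admission gate. A minimal sketch (hypothetical, not the production queue) might look like:

```typescript
// Minimal admission gate: a token bucket for rate limiting plus a
// circuit breaker that opens after consecutive failures, protecting
// the OpenAI API from overload and cascade failures.
class OpenAIGate {
  private tokens: number;
  private failures = 0;

  constructor(
    private capacity: number,
    private breakerThreshold: number,
  ) {
    this.tokens = capacity;
  }

  // A request may proceed only if the breaker is closed and a
  // rate-limit token is available.
  tryAcquire(): boolean {
    if (this.failures >= this.breakerThreshold) return false; // breaker open
    if (this.tokens <= 0) return false; // rate limited
    this.tokens -= 1;
    return true;
  }

  recordFailure(): void { this.failures += 1; }
  recordSuccess(): void { this.failures = 0; } // closes the breaker
  refill(): void { this.tokens = this.capacity; } // called on a timer
}
```

A production breaker would also support a half-open state that probes the API before fully closing; that is elided here for brevity.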

Message Types

The WebSocket connection supports these real-time message types:

Message Type          Direction         Purpose
TRANSCRIPT_UPDATE     Server → Client   Real-time transcription chunks
QUESTION_DETECTED     Server → Client   AI-detected customer questions
RESPONSE_SUGGESTION   Server → Client   RAG-powered response suggestions
CALL_STATUS_CHANGE    Server → Client   Call state updates
PARTICIPANT_UPDATE    Server → Client   Speaker changes and identification
COACHING_FEEDBACK     Server → Client   Real-time coaching suggestions
USER_ACTION           Client → Server   User interactions and preferences
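
In TypeScript these message types could be modeled as a discriminated union, which lets the compiler enforce exhaustive handling. The payload field names below are illustrative assumptions, not the actual wire schema:

```typescript
// Server-to-client messages as a discriminated union on `type`.
type ServerMessage =
  | { type: "TRANSCRIPT_UPDATE"; speaker: string; text: string }
  | { type: "QUESTION_DETECTED"; question: string }
  | { type: "RESPONSE_SUGGESTION"; suggestion: string; sources: string[] }
  | { type: "CALL_STATUS_CHANGE"; status: string }
  | { type: "PARTICIPANT_UPDATE"; participants: string[] }
  | { type: "COACHING_FEEDBACK"; feedback: string };

// The single client-to-server message type.
type ClientMessage = { type: "USER_ACTION"; action: string };

// Exhaustive routing: adding a new type without a case here becomes a
// compile error, since the switch must cover every variant.
function describe(msg: ServerMessage): string {
  switch (msg.type) {
    case "TRANSCRIPT_UPDATE": return `${msg.speaker}: ${msg.text}`;
    case "QUESTION_DETECTED": return `Question: ${msg.question}`;
    case "RESPONSE_SUGGESTION": return `Suggest: ${msg.suggestion}`;
    case "CALL_STATUS_CHANGE": return `Status: ${msg.status}`;
    case "PARTICIPANT_UPDATE": return `Speakers: ${msg.participants.join(", ")}`;
    case "COACHING_FEEDBACK": return `Coach: ${msg.feedback}`;
  }
}
```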

Performance Characteristics

  • Transcript Processing: low webhook-to-client latency (Recall.ai's US East location adds round-trip latency for EU/UK users)
  • Question Detection: AI analysis typically under 1 second
  • Response Generation: RAG queries typically 1-2 seconds
  • WebSocket Delivery: Low latency edge-to-client delivery

Scalability

  • Concurrent Calls: Serverless architecture scales with demand
  • Edge Computing: Cloudflare edge network reduces latency for EU/UK users
  • Auto-scaling: Durable Objects scale based on demand
  • Rate Limiting: Prevents API quota exhaustion

Error Recovery

  • WebSocket Reconnection: Automatic client reconnection
  • State Recovery: Durable Objects maintain call state
  • Graceful Degradation: Continues without AI if services fail
  • Circuit Breakers: Prevent cascade failures to OpenAI API
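
One common way to implement the automatic client reconnection listed above is an exponential backoff schedule with a cap, so a flapping connection does not hammer the server. A sketch (constants are illustrative):

```typescript
// Delay before reconnection attempt `attempt` (0-indexed): doubles
// each try from a small base, capped so retries never wait too long.
// Production code often adds random jitter to avoid thundering herds.
function reconnectDelayMs(attempt: number, baseMs = 250, capMs = 10_000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}
```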

Real-Time Data Flow

This architecture delivers AI responses in roughly 1-2 seconds end to end while maintaining reliability and scalability for production usage.