Real-Time Processing
Overview
Frontier AI provides fast AI coaching during live sales calls through a real-time processing pipeline. The system handles transcript streams, question detection, and AI-powered response generation. Note that Recall.ai operates from US East, which adds some latency for EU/UK users.
Processing Architecture
Real-Time AI Pipeline
The real-time processing uses OpenAI Queue for immediate AI responses during live calls, while Humanloop handles post-call analysis.
WebSocket Communication Flow
Real-time communication between client applications and the system uses WebSockets for bidirectional data flow.
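The bidirectional flow can be sketched from the client side as a parse-and-dispatch step over inbound frames. This is a minimal illustration, assuming JSON-encoded messages with a `type` field; the function and handler names are not part of the documented API.

```typescript
type Handler = (payload: Record<string, unknown>) => void;

// Parse one inbound WebSocket frame and route it by its `type` field.
// Unknown types are ignored so newer server messages don't break older clients.
function routeFrame(raw: string, handlers: Record<string, Handler>): boolean {
  const { type, ...payload } = JSON.parse(raw) as { type: string } & Record<string, unknown>;
  const handler = handlers[type];
  if (!handler) return false;
  handler(payload);
  return true;
}
```

A client would attach this inside `ws.onmessage`, registering one handler per message type it cares about.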
Real-Time Processing Components
Durable Objects
CallServer - Manages individual call state and WebSocket connections
- Maintains ephemeral call state
- Buffers transcript segments for processing
- Handles WebSocket message routing
- Manages AI coaching state
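The transcript-buffering behavior described above can be sketched as a small batching structure; the class name, flush threshold, and method signatures are illustrative, not the actual CallServer implementation.

```typescript
// Buffers transcript segments and releases them in batches for AI processing,
// as a CallServer-style Durable Object might. Flush size is an assumption.
class TranscriptBuffer {
  private segments: string[] = [];

  constructor(private readonly flushSize: number = 5) {}

  // Append a segment; returns the full batch when the buffer fills, else null.
  push(segment: string): string[] | null {
    this.segments.push(segment);
    if (this.segments.length >= this.flushSize) {
      const batch = this.segments;
      this.segments = []; // reset for the next batch
      return batch;
    }
    return null;
  }
}
```

Batching keeps AI calls per transcript reasonable while the ephemeral state lives inside the Durable Object, which serializes access per call.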
QuestionsServer - Dedicated AI question detection
- Analyzes transcript segments for customer questions
- Uses OpenAI Queue for fast AI processing
- Maintains question detection state
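One way a QuestionsServer could avoid spending an AI call on every segment is a cheap textual pre-filter before the OpenAI Queue. This heuristic is purely illustrative and is not described in the source; the actual detection is AI-based.

```typescript
// Cheap pre-filter: does this transcript segment plausibly contain a question?
// Segments that pass would then go to the AI for real classification.
function looksLikeQuestion(segment: string): boolean {
  const trimmed = segment.trim();
  if (trimmed.endsWith("?")) return true;
  // Common interrogative openers (illustrative, not exhaustive).
  return /^(what|how|why|when|where|who|can|could|do|does|is|are|will)\b/i.test(trimmed);
}
```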
FeedbackServer - AI coaching and suggestions
- Generates real-time coaching suggestions
- Processes RAG queries against knowledge base
- Uses OpenAI Queue for immediate responses
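The retrieval step of a RAG query can be sketched as ranking knowledge-base chunks by cosine similarity to the question embedding. The data shapes here are stand-ins; the actual vector store and embedding model are not specified in this section.

```typescript
interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k chunks most similar to the query embedding; their text would
// be stuffed into the prompt that generates the response suggestion.
function topK(query: number[], chunks: Chunk[], k: number): string[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k)
    .map((c) => c.text);
}
```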
OpenAI Queue (Real-Time)
Purpose-built for real-time AI processing during live calls:
Features:
- Rate limiting to prevent OpenAI API overload
- Connection pooling for efficient API usage
- Direct API calls for minimal latency
- Error handling with circuit breaker patterns
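The rate-limiting feature above is commonly implemented as a token bucket; here is a minimal sketch under that assumption. Capacity and refill rate are illustrative numbers, not the queue's actual configuration.

```typescript
// Token-bucket rate limiter: requests spend one token; tokens refill over time.
// A request that cannot acquire a token would be queued rather than dropped.
class TokenBucket {
  private tokens: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerMs: number,
    private last: number = Date.now(),
  ) {
    this.tokens = capacity;
  }

  tryAcquire(now: number = Date.now()): boolean {
    // Refill proportionally to elapsed time, capped at capacity.
    this.tokens = Math.min(this.capacity, this.tokens + (now - this.last) * this.refillPerMs);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A circuit breaker would sit one layer above this: after N consecutive OpenAI errors it stops issuing calls entirely for a cooldown window, which is what prevents cascade failures.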
Message Types
The WebSocket connection supports these real-time message types:
| Message Type | Direction | Purpose |
|---|---|---|
| TRANSCRIPT_UPDATE | Server → Client | Real-time transcription chunks |
| QUESTION_DETECTED | Server → Client | AI-detected customer questions |
| RESPONSE_SUGGESTION | Server → Client | RAG-powered response suggestions |
| CALL_STATUS_CHANGE | Server → Client | Call state updates |
| PARTICIPANT_UPDATE | Server → Client | Speaker changes and identification |
| COACHING_FEEDBACK | Server → Client | Real-time coaching suggestions |
| USER_ACTION | Client → Server | User interactions and preferences |
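The table maps naturally onto a TypeScript discriminated union, which lets the compiler enforce exhaustive handling. The payload fields below are assumptions for illustration; only the `type` names come from the table.

```typescript
// One variant per message type; payload shapes are illustrative assumptions.
type RealtimeMessage =
  | { type: "TRANSCRIPT_UPDATE"; speaker: string; text: string; ts: number }
  | { type: "QUESTION_DETECTED"; question: string }
  | { type: "RESPONSE_SUGGESTION"; suggestion: string; sources: string[] }
  | { type: "CALL_STATUS_CHANGE"; status: string }
  | { type: "PARTICIPANT_UPDATE"; speaker: string }
  | { type: "COACHING_FEEDBACK"; feedback: string }
  | { type: "USER_ACTION"; action: string };

// Per the table, USER_ACTION is the only client → server message type.
function isServerToClient(m: RealtimeMessage): boolean {
  return m.type !== "USER_ACTION";
}
```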
Performance Characteristics
- Transcript Processing: Fast webhook-to-client processing; Recall.ai's US East location adds transit latency for EU/UK calls
- Question Detection: AI analysis typically under 1 second
- Response Generation: RAG queries typically 1-2 seconds
- WebSocket Delivery: Low latency edge-to-client delivery
Scalability
- Concurrent Calls: Serverless architecture scales with demand
- Edge Computing: Cloudflare edge network reduces latency for EU/UK users
- Auto-scaling: Durable Objects scale based on demand
- Rate Limiting: Prevents API quota exhaustion
Error Recovery
- WebSocket Reconnection: Automatic client reconnection
- State Recovery: Durable Objects maintain call state
- Graceful Degradation: Continues without AI if services fail
- Circuit Breakers: Prevent cascade failures to OpenAI API
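The automatic reconnection above is typically paced with exponential backoff so a flapping connection doesn't hammer the server. A minimal sketch, with illustrative base and cap values:

```typescript
// Delay before reconnection attempt `attempt` (0-based): doubles each try,
// capped so long outages don't produce absurd waits. Values are assumptions.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

In practice a small random jitter is usually added to each delay so many clients disconnected by the same outage don't all reconnect in lockstep.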
Real-Time Data Flow
This architecture provides fast AI processing with response times in the 1-2 second range while maintaining reliability and scalability for production usage.