📋 Frontier AI System Architecture Documentation - Technical reference and development guide
⚡ Real-Time Processing

Overview

Frontier AI provides fast AI coaching during live sales calls through a real-time processing pipeline. The system handles transcript streams, question detection, and AI-powered response generation. Note that Recall.ai operates from US East, which adds some latency for EU/UK users.

Processing Architecture

Real-Time AI Pipeline

The real-time processing uses OpenAI Queue for immediate AI responses during live calls, while Humanloop handles post-call analysis.

WebSocket Communication Flow

Real-time communication between client applications and the system uses WebSockets for bidirectional data flow.

Real-Time Processing Components

Durable Objects

CallServer - Manages individual call state and WebSocket connections

  • Maintains ephemeral call state
  • Buffers transcript segments for processing
  • Handles WebSocket message routing
  • Manages AI coaching state
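
As an illustrative sketch only (class and field names here are hypothetical, not the actual implementation), the ephemeral state and transcript buffering a CallServer-style Durable Object keeps might look like:

```typescript
// Hypothetical sketch of ephemeral call state with transcript
// buffering. Segments accumulate until enough text has arrived to be
// worth forwarding for AI analysis (e.g. to QuestionsServer).
interface TranscriptSegment {
  speaker: string;
  text: string;
  timestampMs: number;
}

class CallState {
  private buffer: TranscriptSegment[] = [];
  coachingEnabled = true;

  addSegment(segment: TranscriptSegment): void {
    this.buffer.push(segment);
  }

  // Drain the buffer once the combined text crosses a size threshold,
  // returning the segments to forward downstream.
  drainIfReady(minChars = 80): TranscriptSegment[] {
    const totalChars = this.buffer.reduce((n, s) => n + s.text.length, 0);
    if (totalChars < minChars) return [];
    const drained = this.buffer;
    this.buffer = [];
    return drained;
  }
}
```

Because a Durable Object is single-threaded per call, state like this needs no locking; the real object would also persist anything that must survive hibernation.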

QuestionsServer - Dedicated AI question detection

  • Analyzes transcript segments for customer questions
  • Uses OpenAI Queue for fast AI processing
  • Maintains question detection state
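
The detection itself is AI-powered via the OpenAI Queue, but a cheap pre-filter heuristic (a sketch under that assumption; the regex and function name are illustrative) could decide which transcript segments are worth an API call at all:

```typescript
// Illustrative pre-filter only: segments that neither end in "?" nor
// open with a common interrogative are unlikely to be questions, so
// they need not be sent to the model.
const QUESTION_STARTERS =
  /^(what|how|why|when|where|who|can|could|do|does|is|are|will|would)\b/i;

function looksLikeQuestion(segment: string): boolean {
  const text = segment.trim();
  return text.endsWith("?") || QUESTION_STARTERS.test(text);
}
```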

FeedbackServer - AI coaching and suggestions

  • Generates real-time coaching suggestions
  • Processes RAG queries against knowledge base
  • Uses OpenAI Queue for immediate responses
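
The retrieval step of a RAG query can be sketched as scoring knowledge-base entries against a query embedding by cosine similarity and keeping the top matches. This is a generic illustration, not the production retriever; in practice the embeddings would come from an embedding model rather than hand-written arrays:

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the ids of the k knowledge-base entries most similar to the
// query embedding; these would then be stuffed into the AI prompt.
function topK(
  query: number[],
  docs: { id: string; embedding: number[] }[],
  k: number,
): string[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((d) => d.id);
}
```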

OpenAI Queue (Real-Time)

Purpose-built for real-time AI processing during live calls:

Features:

  • Rate limiting to prevent OpenAI API overload
  • Connection pooling for efficient API usage
  • Direct API calls for minimal latency
  • Error handling with circuit breaker patterns
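
Two of the features above, rate limiting and circuit breaking, compose naturally into a single admission gate. A minimal sketch (hypothetical, not the production queue) might look like:

```typescript
// Minimal admission gate: a token bucket for rate limiting plus a
// circuit breaker that opens after consecutive failures, protecting
// the OpenAI API from overload and cascade failures.
class OpenAIGate {
  private tokens: number;
  private failures = 0;

  constructor(
    private capacity: number,
    private breakerThreshold: number,
  ) {
    this.tokens = capacity;
  }

  // A request may proceed only if the breaker is closed and a
  // rate-limit token is available.
  tryAcquire(): boolean {
    if (this.failures >= this.breakerThreshold) return false; // breaker open
    if (this.tokens <= 0) return false; // rate limited
    this.tokens -= 1;
    return true;
  }

  recordFailure(): void { this.failures += 1; }
  recordSuccess(): void { this.failures = 0; } // closes the breaker
  refill(): void { this.tokens = this.capacity; } // called on a timer
}
```

A production breaker would also support a half-open state that probes the API before fully closing; that is elided here for brevity.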

Message Types

The WebSocket connection supports these real-time message types:

Message Type          Direction         Purpose
TRANSCRIPT_UPDATE     Server → Client   Real-time transcription chunks
QUESTION_DETECTED     Server → Client   AI-detected customer questions
RESPONSE_SUGGESTION   Server → Client   RAG-powered response suggestions
CALL_STATUS_CHANGE    Server → Client   Call state updates
PARTICIPANT_UPDATE    Server → Client   Speaker changes and identification
COACHING_FEEDBACK     Server → Client   Real-time coaching suggestions
USER_ACTION           Client → Server   User interactions and preferences
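
In TypeScript these message types could be modeled as a discriminated union, which lets the compiler enforce exhaustive handling. The payload field names below are illustrative assumptions, not the actual wire schema:

```typescript
// Server-to-client messages as a discriminated union on `type`.
type ServerMessage =
  | { type: "TRANSCRIPT_UPDATE"; speaker: string; text: string }
  | { type: "QUESTION_DETECTED"; question: string }
  | { type: "RESPONSE_SUGGESTION"; suggestion: string; sources: string[] }
  | { type: "CALL_STATUS_CHANGE"; status: string }
  | { type: "PARTICIPANT_UPDATE"; participants: string[] }
  | { type: "COACHING_FEEDBACK"; feedback: string };

// The single client-to-server message type.
type ClientMessage = { type: "USER_ACTION"; action: string };

// Exhaustive routing: adding a new type without a case here becomes a
// compile error, since the switch must cover every variant.
function describe(msg: ServerMessage): string {
  switch (msg.type) {
    case "TRANSCRIPT_UPDATE": return `${msg.speaker}: ${msg.text}`;
    case "QUESTION_DETECTED": return `Question: ${msg.question}`;
    case "RESPONSE_SUGGESTION": return `Suggest: ${msg.suggestion}`;
    case "CALL_STATUS_CHANGE": return `Status: ${msg.status}`;
    case "PARTICIPANT_UPDATE": return `Speakers: ${msg.participants.join(", ")}`;
    case "COACHING_FEEDBACK": return `Coach: ${msg.feedback}`;
  }
}
```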

Performance Characteristics

  • Transcript Processing: low webhook-to-client latency (Recall.ai's US East location adds round-trip latency for EU/UK users)
  • Question Detection: AI analysis typically under 1 second
  • Response Generation: RAG queries typically 1-2 seconds
  • WebSocket Delivery: Low latency edge-to-client delivery

Scalability

  • Concurrent Calls: Serverless architecture scales with demand
  • Edge Computing: Cloudflare edge network reduces latency for EU/UK users
  • Auto-scaling: Durable Objects scale based on demand
  • Rate Limiting: Prevents API quota exhaustion

Error Recovery

  • WebSocket Reconnection: Automatic client reconnection
  • State Recovery: Durable Objects maintain call state
  • Graceful Degradation: Continues without AI if services fail
  • Circuit Breakers: Prevent cascade failures to OpenAI API
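
One common way to implement the automatic client reconnection listed above is an exponential backoff schedule with a cap, so a flapping connection does not hammer the server. A sketch (constants are illustrative):

```typescript
// Delay before reconnection attempt `attempt` (0-indexed): doubles
// each try from a small base, capped so retries never wait too long.
// Production code often adds random jitter to avoid thundering herds.
function reconnectDelayMs(attempt: number, baseMs = 250, capMs = 10_000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}
```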

Real-Time Data Flow

This architecture delivers AI responses in roughly 1-2 seconds end to end while maintaining reliability and scalability for production usage.