Data Architecture
Overview
Frontier AI stores its data in Neon PostgreSQL as the primary database and uses specialized Cloudflare services for real-time state (Durable Objects), vector search (Vectorize), object storage (R2), and caching (KV).
Database Schema
The system uses a normalized PostgreSQL schema with clear relationships between users, calls, and analysis data.
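The relationships can be pictured as users owning calls and calls owning analysis rows. The sketch below is illustrative only: the interfaces, field names, and the `findOrphans` helper are assumptions, not the actual schema. It shows the referential rule that a foreign key constraint would enforce in PostgreSQL.

```typescript
// Hypothetical row shapes; the real column names are not given in this document.
interface User {
  id: string;
  email: string;
}

interface Call {
  id: string;
  userId: string; // assumed foreign key to User.id
  startedAt: string;
}

interface Analysis {
  id: string;
  callId: string; // assumed foreign key to Call.id
  summary: string;
}

// Returns the ids of rows whose foreign key points at a missing parent,
// which is the kind of row a FOREIGN KEY constraint would reject.
function findOrphans(users: User[], calls: Call[], analyses: Analysis[]): string[] {
  const userIds = new Set(users.map((u) => u.id));
  const callIds = new Set(calls.map((c) => c.id));
  const orphanCalls = calls.filter((c) => !userIds.has(c.userId)).map((c) => c.id);
  const orphanAnalyses = analyses.filter((a) => !callIds.has(a.callId)).map((a) => a.id);
  return orphanCalls.concat(orphanAnalyses);
}
```

In a normalized schema these checks live in the database itself, so application code never needs to run them by hand.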
Data Layer Architecture
Primary Database (Neon PostgreSQL)
Purpose: Persistent storage for all application data
- User accounts and authentication data
- Call records and metadata
- Transcript storage and indexing
- Analysis results and summaries
- User settings and preferences
Key Features:
- Serverless Scaling: Automatic scaling based on demand
- Branching: Database branches for development/staging
- Connection Pooling: Efficient connection management
- Backup & Recovery: Automated daily backups
Vector Database (Cloudflare Vectorize)
Purpose: AI embeddings and semantic search
- Knowledge base document embeddings
- Transcript semantic search
- RAG (Retrieval Augmented Generation) queries
- Similar call detection
Integration Flow:
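A minimal sketch of the query side of this flow, assuming embeddings have already been produced by a model and stored alongside an id. It uses an in-memory array as a stand-in for the vector index and does not call the Vectorize API; `StoredVector`, `cosineSimilarity`, and `topK` are hypothetical names.

```typescript
interface StoredVector {
  id: string;
  values: number[];
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every stored vector against the query and keeps the k best matches.
function topK(
  query: number[],
  index: StoredVector[],
  k: number,
): { id: string; score: number }[] {
  return index
    .map((v) => ({ id: v.id, score: cosineSimilarity(query, v.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The same ranking step serves both transcript search and similar call detection; for RAG, the text behind the top matches would be passed to the language model as context.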
Object Storage (Cloudflare R2)
Purpose: Large file storage and static assets
- Call recordings (when available)
- Knowledge base documents
- User-uploaded files
- System assets and backups
Cache Layer (Cloudflare KV)
Purpose: Fast key-value storage
- Session data and temporary state
- API response caching
- Bot mapping for Recall.ai integration
- User preference caching
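The caching uses above follow a cache-aside pattern: read from the cache, fall back to the source on a miss, and store the result with an expiry. The sketch below models that with an in-memory map rather than the KV binding; `TtlCache` and its injected clock are assumptions made so the behavior is easy to follow.

```typescript
interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value if it is still fresh, otherwise loads and stores it.
  getOrLoad(key: string, load: () => T): T {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.value;
    const value = load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Drops an entry so the next read goes back to the source of truth.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

With Cloudflare KV the expiry would typically be set at write time through the `expirationTtl` option of `put`, and `invalidate` corresponds to deleting the key after the underlying row changes.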
Ephemeral Storage (Durable Objects)
Purpose: Real-time state management
- Active WebSocket connections
- Transcript buffering during calls
- AI processing state
- Temporary coaching data
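Transcript buffering can be sketched as a small in-memory queue that hands off batches once it reaches a size limit. This is a plain class, not a Durable Object; `TranscriptBuffer`, the segment fields, and the flush callback are hypothetical, and the real flush target (for example a PostgreSQL bulk insert) is left to the caller.

```typescript
interface TranscriptSegment {
  seq: number; // assumed monotonically increasing sequence number
  speaker: string;
  text: string;
}

class TranscriptBuffer {
  private pending: TranscriptSegment[] = [];
  private readonly maxSegments: number;
  private readonly flush: (batch: TranscriptSegment[]) => void;

  constructor(maxSegments: number, flush: (batch: TranscriptSegment[]) => void) {
    this.maxSegments = maxSegments;
    this.flush = flush;
  }

  // Queues a segment and flushes automatically once the batch is full.
  add(segment: TranscriptSegment): void {
    this.pending.push(segment);
    if (this.pending.length >= this.maxSegments) this.drain();
  }

  // Flushes whatever is pending, ordered by sequence number.
  // Call this when the call ends so a partial batch is not lost.
  drain(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending.slice().sort((a, b) => a.seq - b.seq);
    this.pending = [];
    this.flush(batch);
  }
}
```

Batching keeps per-segment writes off the database during a live call, at the cost of losing at most one unflushed batch if the in-memory state disappears.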
Data Flow Patterns
Real-Time Data Flow
Post-Call Analysis Flow
Data Consistency & Integrity
ACID Compliance
Neon PostgreSQL provides full ACID compliance for critical data:
- Atomicity: Each transaction either completes fully or rolls back entirely
- Consistency: Data integrity constraints enforced
- Isolation: Concurrent operations don't interfere
- Durability: Committed data survives system failures
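Atomicity is the property most visible to application code. The toy model below imitates commit and rollback on an in-memory store: work runs against a copy, and the copy replaces the original only if nothing throws. `Store`, `Table`, and `transaction` are hypothetical names, and this is an illustration of the idea rather than how PostgreSQL implements it.

```typescript
type Table = { [id: string]: string };

interface Store {
  calls: Table;
  analyses: Table;
}

// Runs `work` against a copy of the store. Returns the copy if `work`
// finishes (commit) or the untouched original if it throws (rollback).
function transaction(store: Store, work: (draft: Store) => void): Store {
  const draft: Store = {
    calls: { ...store.calls },
    analyses: { ...store.analyses },
  };
  try {
    work(draft);
    return draft;
  } catch {
    return store;
  }
}
```

If writing a call row succeeds but writing its analysis row fails, the caller gets back the original store, so no half-written call is ever visible.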
Eventual Consistency
The distributed components are eventually consistent:
- Durable Objects: State eventually propagates to PostgreSQL
- Vectorize: Embeddings updated asynchronously
- KV Store: Cached values can lag behind the source of truth until they are invalidated or expire
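- R2 Storage: R2 itself is documented as strongly consistent for reads after writes, so staleness applies only to copies served from a cache in front of it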
Data Validation
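One common approach is to validate incoming records at the application boundary before they reach the database. The sketch below is an assumption about what such a check might look like: `CallRecordInput`, its fields, and `validateCallRecord` are hypothetical and do not come from the actual codebase.

```typescript
interface CallRecordInput {
  userId: string;
  startedAt: string; // ISO 8601 timestamp
  durationSeconds: number;
}

type ValidationResult =
  | { ok: true; value: CallRecordInput }
  | { ok: false; errors: string[] };

// Checks an untrusted payload and collects every problem instead of
// stopping at the first one.
function validateCallRecord(input: unknown): ValidationResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["input must be an object"] };
  }
  const record = input as { [key: string]: unknown };
  const userId = record["userId"];
  const startedAt = record["startedAt"];
  const durationSeconds = record["durationSeconds"];
  const errors: string[] = [];

  if (typeof userId !== "string" || userId.length === 0) {
    errors.push("userId must be a non-empty string");
  }
  if (typeof startedAt !== "string" || Number.isNaN(Date.parse(startedAt))) {
    errors.push("startedAt must be a parseable date string");
  }
  if (
    typeof durationSeconds !== "number" ||
    !Number.isFinite(durationSeconds) ||
    durationSeconds < 0
  ) {
    errors.push("durationSeconds must be a non-negative number");
  }

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: {
      userId: userId as string,
      startedAt: startedAt as string,
      durationSeconds: durationSeconds as number,
    },
  };
}
```

Database constraints remain the last line of defense; a check like this mainly produces clearer error messages earlier in the request.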
Performance Optimization
Read Optimization
- Database Indexing: Strategic indexes on frequently queried columns
- Connection Pooling: Efficient database connection management
- Query Optimization: Optimized SQL queries with proper joins
- Caching Strategy: KV cache for frequently accessed data
Write Optimization
- Batch Processing: Bulk operations for transcript data
- Async Processing: Background jobs for heavy operations
- Write-Behind Caching: Immediate response with delayed persistence
- Partitioning: Table partitioning for time-series data
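Time-series partitioning can be illustrated by routing each row to a partition named after its month. The naming scheme (`<table>_<year>_<month>`, in UTC) and both helper functions are assumptions for the sake of the example; in PostgreSQL the routing itself would normally be handled by declarative range partitioning rather than application code.

```typescript
// Hypothetical naming scheme: one partition per UTC calendar month.
function partitionFor(table: string, timestamp: Date): string {
  const year = timestamp.getUTCFullYear();
  const monthNumber = timestamp.getUTCMonth() + 1; // getUTCMonth is zero-based
  const month = (monthNumber < 10 ? "0" : "") + monthNumber;
  return `${table}_${year}_${month}`;
}

// Groups rows by target partition so each group can be written in one bulk insert.
function groupByPartition<T extends { createdAt: Date }>(
  table: string,
  rows: T[],
): { [partition: string]: T[] } {
  const groups: { [partition: string]: T[] } = {};
  for (const row of rows) {
    const name = partitionFor(table, row.createdAt);
    const bucket = groups[name] || [];
    bucket.push(row);
    groups[name] = bucket;
  }
  return groups;
}
```

Monthly partitions also make retention cheap to enforce, since an expired month can be dropped as a whole instead of deleted row by row.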
Storage Optimization
| Data Type | Storage Solution | Retention Policy |
|---|---|---|
| User Data | Neon PostgreSQL | Indefinite |
| Call Metadata | Neon PostgreSQL | 2 years |
| Transcripts | Neon PostgreSQL | 1 year |
| Call Recordings | R2 Storage | 6 months |
| Vector Embeddings | Vectorize | 1 year |
| Cache Data | KV Store | 24 hours |
| Session State | Durable Objects | Call duration |
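The retention periods in the table can be expressed as data and checked with a small helper. This is a sketch: the `RETENTION` map, `expiresAt`, and `isExpired` are hypothetical names, month arithmetic is done in UTC, and Session State is omitted because its lifetime is tied to the call rather than to a fixed period.

```typescript
type Retention = { months: number } | { hours: number } | "indefinite";

// Values mirror the retention table above.
const RETENTION: { [dataType: string]: Retention } = {
  "User Data": "indefinite",
  "Call Metadata": { months: 24 },
  "Transcripts": { months: 12 },
  "Call Recordings": { months: 6 },
  "Vector Embeddings": { months: 12 },
  "Cache Data": { hours: 24 },
};

// Returns the moment a record stops being retained, or null if it never expires.
function expiresAt(dataType: string, createdAt: Date): Date | null {
  const rule = RETENTION[dataType];
  if (rule === undefined) throw new Error(`no retention rule for ${dataType}`);
  if (rule === "indefinite") return null;
  if ("hours" in rule) {
    return new Date(createdAt.getTime() + rule.hours * 60 * 60 * 1000);
  }
  // Note: setUTCMonth rolls over when the target month is shorter,
  // e.g. 31 August plus 6 months lands in early March.
  const result = new Date(createdAt.getTime());
  result.setUTCMonth(result.getUTCMonth() + rule.months);
  return result;
}

function isExpired(dataType: string, createdAt: Date, now: Date): boolean {
  const deadline = expiresAt(dataType, createdAt);
  return deadline !== null && now.getTime() >= deadline.getTime();
}
```

A scheduled cleanup job could call `isExpired` for each data type, or translate the same map into per-store mechanisms such as TTLs on cache entries.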
Data Security & Compliance
Encryption
- At Rest: All data encrypted in Neon PostgreSQL
- In Transit: TLS 1.3 for all API communications
- Application Level: Sensitive data encrypted before storage
- Key Management: Doppler for secure secret management
Access Control
- Role-Based Access: Granular permissions per user role
- API Authentication: Clerk-based authentication for all endpoints
- Database Security: Row-level security for multi-tenant data
- Audit Logging: Comprehensive access logging for compliance
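Role-based access and tenant isolation can be sketched in a few lines. The roles, actions, and permission matrix below are invented for illustration, and `visibleCalls` only mimics in application code the filtering that a PostgreSQL row-level security policy would apply inside the database.

```typescript
// Hypothetical roles and actions; the real permission model is not described here.
type Role = "admin" | "manager" | "member";
type Action = "call:read" | "call:delete" | "settings:write";

const PERMISSIONS: { [R in Role]: Action[] } = {
  admin: ["call:read", "call:delete", "settings:write"],
  manager: ["call:read", "settings:write"],
  member: ["call:read"],
};

// True when the role's permission list includes the requested action.
function can(role: Role, action: Action): boolean {
  return PERMISSIONS[role].indexOf(action) !== -1;
}

interface CallRow {
  id: string;
  tenantId: string;
}

// Keeps only the rows that belong to the caller's tenant.
function visibleCalls(rows: CallRow[], tenantId: string): CallRow[] {
  return rows.filter((row) => row.tenantId === tenantId);
}
```

Enforcing the tenant filter in the database rather than in each query means a forgotten `WHERE` clause cannot leak another tenant's rows.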
This architecture gives real-time AI applications a reliable, scalable, and secure foundation for data processing.