Learning Content
Module Overview
Building an accurate AI model in development is only half the battle. Deploying it to production—where it processes real data, integrates with business systems, and impacts actual users—introduces new challenges: scalability, reliability, security, and operability. Poor deployment decisions can turn a great model into an unusable system.
This module teaches deployment strategies tailored to Malta businesses, from batch processing to real-time APIs, cloud vs. on-premise decisions, and how to roll out AI systems safely without disrupting operations.
🔑 Key Concept: Deployment Patterns Matter as Much as Algorithms
A 90% accurate model deployed poorly (slow predictions, frequent downtime) delivers less value than an 85% accurate model deployed excellently (fast, reliable, well-integrated). Focus on production engineering, not just model tuning.
Deployment Architecture Patterns
Pattern 1: Batch Prediction
How It Works: Model runs on schedule (daily, hourly, weekly) to score large datasets. Predictions stored in database for business applications to consume.
Example Use Cases:
- iGaming Churn Prediction: Score all active players nightly for churn risk. CRM system shows risk scores next morning for retention team.
- Retail Demand Forecasting: Generate weekly sales forecasts for all products every Sunday night.
- Credit Risk Scoring: Score all loan applications daily in overnight batch process.
Advantages:
- Simple architecture—doesn't require real-time infrastructure
- Cost-effective—batch compute cheaper than always-on real-time APIs
- Easy to debug—can rerun batch if issues occur
- Handles large volumes efficiently
Limitations:
- Not real-time—predictions lag by hours/days
- Can't respond to immediate events (e.g., fraud detection during transaction)
- Wasted compute—scores every record, including users who may never need a prediction
When to Use: When real-time predictions aren't needed, processing large volumes, or minimizing infrastructure costs
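The batch pattern above can be sketched in a few lines. This is a minimal illustration, not a production job: `load_active_players` and `predict_churn_risk` are hypothetical stand-ins for a CRM database query and a trained model, so the scheduling flow is self-contained and runnable.

```python
from datetime import datetime, timezone

def load_active_players():
    # Stand-in for a CRM database query returning player features.
    return [
        {"player_id": 101, "days_inactive": 12, "avg_stake": 4.5},
        {"player_id": 102, "days_inactive": 1, "avg_stake": 20.0},
    ]

def predict_churn_risk(player):
    # Stand-in for model.predict_proba(); a real job would score a whole
    # feature matrix in one vectorized call for efficiency.
    return min(1.0, player["days_inactive"] / 30)

def run_nightly_batch():
    # Scheduled (e.g. via cron) to run overnight; results are written to a
    # predictions table that the CRM reads the next morning.
    scored_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            "player_id": p["player_id"],
            "churn_risk": round(predict_churn_risk(p), 3),
            "scored_at": scored_at,
        }
        for p in load_active_players()
    ]

if __name__ == "__main__":
    for row in run_nightly_batch():
        print(row)
```

The key property of the pattern is visible here: predictions are computed once on a schedule and persisted, so the consuming application only ever does a cheap lookup.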
Pattern 2: Real-Time REST API
How It Works: Model hosted as always-on API endpoint. Business applications send requests, receive immediate predictions (milliseconds).
Example Use Cases:
- Fraud Detection: Each transaction sent to API during checkout, blocked if fraud risk high
- Product Recommendations: Website calls API when user loads page, displays personalized product suggestions
- Chatbot Responses: Natural language API processes user messages, returns intelligent responses
Advantages:
- Real-time responses—predictions made when needed
- Scalable—can handle variable traffic with auto-scaling
- Efficient—only predict for users/events requiring it
- Integrates easily with web/mobile applications
Limitations:
- Complex infrastructure—requires API hosting, load balancing, monitoring
- Higher costs—always-on servers vs. scheduled batch
- Latency critical—users won't wait seconds for predictions
- Requires robust error handling (what if API down?)
When to Use: When decisions must be made immediately, user-facing applications, or event-driven responses needed
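The "what if the API is down?" question above deserves concrete handling. Below is a hedged sketch of a client-side wrapper with a latency budget and a graceful fallback; `call_api` is injected as a parameter so the logic is testable without a live endpoint (in production it would be an HTTP call with a hard timeout, e.g. a 200ms request timeout), and the function and fallback names are hypothetical.

```python
import time

# Generic "popular games" list served when the model API is unavailable.
FALLBACK_RECOMMENDATIONS = ["popular_game_1", "popular_game_2"]

def get_recommendations(player_id, call_api, budget_ms=200):
    start = time.perf_counter()
    try:
        items = call_api(player_id)  # may raise on timeout / connection error
    except Exception:
        # API down or too slow: degrade gracefully to generic content
        # rather than showing the user an error.
        return {"items": FALLBACK_RECOMMENDATIONS, "source": "fallback",
                "latency_ms": None}
    latency_ms = (time.perf_counter() - start) * 1000
    # A monitoring hook would record latency_ms against the budget here.
    return {"items": items, "source": "model", "latency_ms": latency_ms}
```

The design choice worth noting: the caller always gets *something* to display, and the `source` field lets monitoring count how often the system is running degraded.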
Pattern 3: Streaming/Event-Driven
How It Works: Model listens to event stream (Kafka, AWS Kinesis), processes events in real-time, publishes predictions back to stream.
Example Use Cases:
- Real-Time Anomaly Detection: Monitor transaction stream, flag anomalies immediately
- Dynamic Pricing: Monitor demand signals, adjust prices in real-time
- IoT Predictive Maintenance: Sensor data streamed, predict equipment failures before they occur
When to Use: High-volume event processing, sub-second latency requirements, complex event processing pipelines
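The real-time anomaly detection use case can be sketched without a broker: here a plain iterable of events stands in for a Kafka/Kinesis consumer loop, and a rolling-window z-score stands in for whatever model the pipeline actually runs. Both substitutions are assumptions for illustration.

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(events, window_size=50, threshold=3.0):
    """Flag events whose amount is > `threshold` standard deviations
    above the recent rolling window of amounts."""
    window = deque(maxlen=window_size)
    flagged = []
    for event in events:
        amount = event["amount"]
        if len(window) >= 10:  # need enough history to estimate spread
            mu, sigma = mean(window), stdev(window)
            if sigma > 0 and (amount - mu) / sigma > threshold:
                # In production: publish the alert back to an output
                # topic instead of collecting it in a list.
                flagged.append(event)
        window.append(amount)
    return flagged
```

The structure is the point: state (the window) lives inside the processor, events flow through one at a time, and outputs are themselves events, which is what distinguishes streaming from request/response.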
Pattern 4: Edge/Mobile Deployment
How It Works: Model runs directly on user's device (mobile app, browser, IoT device) rather than cloud server.
Example Use Cases:
- Mobile Game AI: Opponent AI runs on player's phone, no server needed
- Offline Recommendations: App provides recommendations even without internet connection
- Privacy-Preserving AI: Sensitive data never leaves device (e.g., health data analysis)
When to Use: Privacy requirements, offline functionality needed, or minimizing latency/bandwidth
Cloud vs. On-Premise Deployment
Cloud Deployment (AWS, Azure, Google Cloud)
Advantages:
- Elasticity: Auto-scale compute resources based on demand
- Managed Services: Cloud providers handle infrastructure, focus on models not servers
- Pay-as-You-Go: Only pay for what you use, no upfront hardware investment
- Geographic Distribution: Deploy globally close to users
Disadvantages:
- Cost Unpredictability: Bills can spike unexpectedly with high usage
- Vendor Lock-In: Tied to cloud provider's services and pricing
- Data Residency Concerns: For GDPR, may need EU-specific regions
Popular Malta Cloud Options:
- AWS (Amazon): Most comprehensive, but also the most complex. EU regions: Frankfurt, Ireland. Cost: €500-€10,000+/month depending on usage.
- Azure (Microsoft): Strong enterprise integration. EU regions: Netherlands, Ireland. Similar cost to AWS.
- Google Cloud: Strong ML tools (Vertex AI). EU regions: Belgium, Netherlands.
On-Premise Deployment (Your Own Servers)
Advantages:
- Full Control: Complete control over hardware, network, security
- Predictable Costs: Fixed hardware investment, no surprise bills
- Data Sovereignty: Data never leaves your infrastructure
Disadvantages:
- High Upfront Investment: €20K-€200K+ for capable server infrastructure
- Operational Overhead: Need IT team to maintain servers, updates, security
- No Elasticity: Can't scale dynamically; over-provision or risk capacity limits
When On-Premise Makes Sense: Highly sensitive data (financial, healthcare), regulatory constraints preventing cloud, or existing unused server capacity
Hybrid: Best of Both Worlds
Common Hybrid Approach:
- Store sensitive data on-premise (Malta servers)
- Process and train models in cloud (EU regions for GDPR compliance)
- Deploy predictions back on-premise or via private cloud connection
Deployment Rollout Strategy
Phase 1: Shadow Mode (2-4 Weeks)
- AI runs in production but predictions aren't acted upon
- Log predictions alongside actual outcomes for validation
- Verify accuracy matches test set performance
- Identify integration issues without user impact
Phase 2: Canary Deployment (1-2 Weeks)
- Route 5-10% of traffic to new AI system
- 90-95% still uses old system (or no AI)
- Monitor performance, errors, user feedback closely
- If issues detected, easy to stop and fix
Phase 3: Gradual Rollout (2-4 Weeks)
- Incrementally increase traffic: 10% → 25% → 50% → 100%
- Monitor continuously at each stage
- Pause or rollback if metrics degrade
Phase 4: Full Deployment + Monitoring
- 100% traffic on new AI system
- Continuous monitoring and optimization
- Periodic retraining (quarterly or when drift detected)
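The canary and gradual-rollout phases above need a traffic-splitting mechanism. A common approach (sketched here with hypothetical function names) is deterministic hash bucketing: hashing the user ID rather than choosing randomly keeps each user on the same variant across requests, and raising the percentage from 10% to 25% to 50% only ever *adds* users to the new system, never flips existing ones back.

```python
import hashlib

def in_rollout(user_id: str, percent: int) -> bool:
    # Stable bucket in 0..99 derived from the user ID; the same user
    # always lands in the same bucket.
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

def route(user_id: str, percent: int) -> str:
    return "new_ai_system" if in_rollout(user_id, percent) else "old_system"
```

Because `bucket < 10` implies `bucket < 25`, everyone in the 10% canary stays in as the rollout widens, which keeps the user experience consistent and makes metrics comparable across phases.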
Malta Case Study: iGaming Recommendation Engine Deployment
Company: Malta iGaming operator, 1.8M players, deploying game recommendation AI
Requirements:
- Real-time recommendations (player loads website, sees personalized game suggestions within 200ms)
- Handle peak traffic: 15,000 concurrent users during major sports events
- GDPR-compliant (EU data residency)
- 99.5% uptime (downtime means lost revenue)
Deployment Architecture Decision:
- Pattern: Real-Time REST API (need immediate predictions)
- Infrastructure: AWS EU (Ireland) region for GDPR compliance
- API Framework: FastAPI (Python) in Docker containers
- Scaling: Auto-scaling (2-10 instances based on traffic)
- Load Balancing: AWS Application Load Balancer distributes traffic
- Database: Redis cache for fast lookups (player profiles, recent games)
- Monitoring: CloudWatch metrics + PagerDuty alerts
Cost Analysis:
- API hosting (AWS EC2): €800/month (avg 4 instances)
- Load balancer: €200/month
- Redis cache: €150/month
- Data transfer: €300/month
- Total: €1,450/month (~€17K/year)
Deployment Rollout (8 Weeks):
Week 1-2: Shadow Mode
- API live but not called by website
- Internal testing team triggered predictions
- Discovered API timeout issues under concurrent load
- Fixed: increased connection pool size, optimized database queries
Week 3-4: Canary (10% Traffic)
- 10% of players see AI recommendations, 90% see generic "popular games"
- Monitored: API latency (avg 145ms ✓), error rate (0.3% ✓), click-through rate
- Result: 28% increase in click-through rate for AI recommendations vs. generic
- User feedback positive, no complaints
Week 5-6: Gradual Rollout (25% → 50%)
- Incrementally increased traffic
- Week 5: Load spike during Champions League final (20K concurrent users)
- Auto-scaling triggered: scaled from 4 to 8 instances automatically
- Latency spiked to 320ms temporarily, then stabilized at 160ms
- System handled peak load successfully ✓
Week 7-8: Full Deployment (100%)
- All players see AI recommendations
- Average API latency: 152ms (within 200ms target ✓)
- Uptime: 99.7% (exceeded 99.5% SLA ✓)
- Business Impact: 32% increase in game engagement (players playing more games per session)
Ongoing Optimization (Months 3-6):
- Implemented Redis caching for frequent players—reduced API latency to 95ms
- Added A/B testing framework—continuously test recommendation algorithm variants
- Quarterly model retraining—maintain accuracy as player preferences evolve
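The Redis caching optimisation above follows the cache-aside pattern. In this sketch a plain dict with expiry timestamps stands in for Redis so the block is self-contained; with redis-py the two helpers would map to `r.get(key)` and `r.setex(key, ttl, value)`. All names here are illustrative.

```python
import json
import time

_cache = {}  # in-memory stand-in for a Redis instance

def cache_get(key):
    entry = _cache.get(key)
    if entry is None or entry["expires_at"] < time.time():
        return None
    return entry["value"]

def cache_set(key, value, ttl_seconds=300):
    _cache[key] = {"value": value, "expires_at": time.time() + ttl_seconds}

def recommendations_for(player_id, compute):
    key = f"recs:{player_id}"
    cached = cache_get(key)
    if cached is not None:
        return json.loads(cached), "cache"   # hit: skip the model entirely
    recs = compute(player_id)                # miss: expensive model call
    cache_set(key, json.dumps(recs))         # store for subsequent requests
    return recs, "model"
```

The TTL is the tuning knob: a short TTL keeps recommendations fresh for frequent players, a longer one maximises the latency win, which is how a cache hit can bring the observed API latency well under the uncached figure.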
Deployment Success Factors:
- Gradual rollout prevented "big bang" failure—issues caught in canary phase
- Auto-scaling handled unpredictable traffic spikes (major sports events)
- EU cloud region ensured GDPR compliance (critical for Malta gaming license)
- Monitoring infrastructure detected issues and raised alerts before users were impacted
Deployment Checklist
Before deploying AI to production:
- ☐ Performance Requirements Defined: Latency targets, throughput capacity, uptime SLA
- ☐ Deployment Pattern Selected: Batch, real-time API, streaming, or edge
- ☐ Infrastructure Provisioned: Cloud/on-premise servers, load balancing, databases
- ☐ Security Configured: Authentication, encryption, access controls, GDPR compliance
- ☐ Monitoring & Alerts: Dashboards, alerts for errors/latency/accuracy degradation
- ☐ Rollback Plan: Documented and tested process to revert to previous version
- ☐ Shadow Mode Tested: Verified production accuracy matches test set
- ☐ Canary Deployment: Tested on 5-10% traffic without issues
- ☐ Load Testing: Verified system handles expected + 2x peak traffic
- ☐ Documentation: API docs, integration guides, troubleshooting runbooks
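The load-testing item on the checklist can be approached with dedicated tools, but the core measurement is simple enough to sketch. Below, `fake_predict` simulates the endpoint with a fixed delay; a real test would fire HTTP requests at a staging deployment (e.g. with locust or k6) and check the p95 latency against the SLA. The function names are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_predict(_request_id):
    time.sleep(0.001)  # simulated model inference latency
    return "ok"

def run_load_test(handler, n_requests=200, concurrency=20):
    def timed_call(i):
        start = time.perf_counter()
        handler(i)
        return (time.perf_counter() - start) * 1000

    # Concurrent callers approximate simultaneous users hitting the API.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies_ms = sorted(pool.map(timed_call, range(n_requests)))
    p95 = latencies_ms[int(0.95 * len(latencies_ms))]
    return {"requests": len(latencies_ms), "p95_ms": p95}
```

To exercise the "expected + 2x peak" criterion, run this once at expected concurrency and again at double the peak figure, and verify p95 stays inside the latency target both times.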
Key Takeaways
- Four deployment patterns: Batch (scheduled processing), Real-Time API (immediate predictions), Streaming (event-driven), Edge (on-device)
- Cloud deployment offers elasticity and managed services; on-premise offers control and predictable costs
- For Malta businesses: Use EU cloud regions (Ireland, Frankfurt, Netherlands) for GDPR compliance
- Gradual rollout de-risks deployment: shadow mode → canary (10%) → gradual (25%/50%) → full (100%)
- Auto-scaling critical for unpredictable traffic (e.g., iGaming during major sports events)
- Monitor continuously: latency, error rates, accuracy, uptime—detect issues before users complain
- Real-time APIs cost more but enable user-facing AI; batch is cheaper for backend analytics
- Always have rollback plan—ability to revert quickly is production safety net
- Typical cloud deployment costs for Malta SME: €1K-€5K/month depending on usage