
CadooVision Technical Deep Dive: How We Built Exercise Form Verification

CadooVision Architecture Overview

CadooVision is Cadoo's server-side computer vision pipeline running on Google Cloud Run. It processes workout videos through five stages: pose detection, side selection, rep counting, form scoring, and credit calculation.

Technology Stack

  • Pose Detection: Google MediaPipe Pose — 33 keypoints per frame with X/Y coordinates and visibility confidence
  • Signal Processing: scipy.signal.find_peaks with dynamic prominence thresholds
  • Angle Calculation: 3-point inverse cosine geometry
  • Infrastructure: Google Cloud Run (auto-scaling), Firebase Storage (video hosting)
  • Backend: NestJS/TypeScript (Ledger), Prisma ORM, PostgreSQL
  • Frontend: Flutter (iOS + Android) with on-device MediaPipe for practice mode

Stage 1: Pose Detection

Every frame of the uploaded video is processed through MediaPipe Pose, extracting 33 body keypoints. Each keypoint includes X/Y coordinates (normalized 0-1) and a visibility confidence score.

Stage 2: Automatic Side Selection

CadooVision auto-detects which side of the body (left or right) is more visible to the camera. It samples the middle 25% of frames for stability and compares visibility confidence across six keypoints per side: shoulder, elbow, wrist, hip, knee, and ankle. The more visible side is used for all subsequent analysis, so users don't need to worry about which side faces the camera.
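A minimal sketch of the side-selection step, under stated assumptions: the per-frame dictionary layout (keys such as `left_knee`), the use of a plain mean to compare sides, and the tie-break toward the left side are all hypothetical choices made for illustration. Only the middle-25% window and the six joints come from the description above.

```python
from statistics import mean

# Joints compared per side (from the article); the key naming is hypothetical.
JOINTS = ("shoulder", "elbow", "wrist", "hip", "knee", "ankle")


def select_side(frames):
    """Return "left" or "right", whichever side has higher mean visibility.

    `frames` is a list of dicts mapping names such as "left_knee" to a
    visibility confidence in [0, 1].
    """
    n = len(frames)
    if n == 0:
        raise ValueError("no frames to analyse")
    # Middle 25% of the clip: skip the first and last 37.5% of frames.
    start = int(n * 0.375)
    end = max(start + 1, int(n * 0.625))
    window = frames[start:end]

    def side_score(side):
        return mean(
            frame[f"{side}_{joint}"] for frame in window for joint in JOINTS
        )

    # Assumption: ties go to the left side.
    return "left" if side_score("left") >= side_score("right") else "right"
```

Sampling only the middle of the clip ignores the frames where the user is still walking into position or reaching for the phone, which is presumably why the window is described as more stable.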

Stage 3: Rep Counting via Signal Processing

Rather than counting "movements," CadooVision tracks the distance between specific keypoint pairs over time and uses signal processing to identify repetitions:

Exercise-Specific Signals

| Exercise | Signal | Detection | Prominence |
|---|---|---|---|
| Pushups | Shoulder-wrist Y-distance | Peak detection (arms extended = top) | 0.15 |
| Squats | Shoulder-knee Y-distance | Valley detection (knees bent = bottom) | 0.20 |
| Situps | Shoulder-hip distance (both axes) | Dual-axis peak detection | 0.20 |

Situp dual-axis detection: Camera orientation varies. Side-facing cameras capture horizontal movement better, while front-facing cameras capture vertical movement better. CadooVision runs peak detection on the X and Y axes independently and uses whichever axis yields more reps.
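The dual-axis idea can be sketched as follows. To keep the example dependency-free, `find_prominent_peaks` is a simplified pure-Python stand-in for `scipy.signal.find_peaks(signal, prominence=...)`, not the production code: it only considers strict local maxima and ignores plateaus. The function names, the fixed 0.20 default, and the tie-break toward the X axis are assumptions.

```python
def find_prominent_peaks(signal, prominence):
    """Indices of strict local maxima whose prominence is >= `prominence`."""
    peaks = []
    n = len(signal)
    for i in range(1, n - 1):
        if not (signal[i - 1] < signal[i] > signal[i + 1]):
            continue
        # Lowest point on each side before the signal rises above this peak.
        left_min = right_min = signal[i]
        j = i - 1
        while j >= 0 and signal[j] <= signal[i]:
            left_min = min(left_min, signal[j])
            j -= 1
        k = i + 1
        while k < n and signal[k] <= signal[i]:
            right_min = min(right_min, signal[k])
            k += 1
        # Prominence: height above the higher of the two surrounding lows.
        if signal[i] - max(left_min, right_min) >= prominence:
            peaks.append(i)
    return peaks


def count_situp_reps(dx, dy, prominence=0.20):
    """Run peak detection on both axes and keep the axis with more reps."""
    peaks_x = find_prominent_peaks(dx, prominence)
    peaks_y = find_prominent_peaks(dy, prominence)
    # Assumption: on a tie, prefer the X axis.
    if len(peaks_x) >= len(peaks_y):
        return "x", peaks_x
    return "y", peaks_y
```

Valley detection for squats can reuse the same routine by negating the signal first, since a valley in the original signal is a peak in its negation.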

Signal-Derived Rep Boundaries

Instead of arbitrary fixed windows around each peak, rep boundaries come from the signal itself:

  • For peak-detected reps (pushups, situps): find the local minimum (trough) between consecutive peaks
  • For valley-detected reps (squats): find the local maximum between consecutive valleys
  • Each rep gets a natural start_frame, middle_frame (peak/valley), and end_frame
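The boundary rule above can be sketched like this for peak-detected reps. The article does not say how the start of the first rep or the end of the last rep is chosen, so this sketch assumes the lowest sample before the first peak and after the last peak; `rep_boundaries` is a hypothetical name.

```python
def rep_boundaries(signal, peaks):
    """Split a signal into reps using the trough between consecutive peaks.

    Returns a list of (start_frame, middle_frame, end_frame) tuples.
    """
    def argmin(lo, hi):
        # Index of the smallest sample in signal[lo:hi + 1].
        return min(range(lo, hi + 1), key=lambda i: signal[i])

    if not peaks:
        return []
    # Assumption: the first rep starts at the lowest point before the first
    # peak, and the last rep ends at the lowest point after the last peak.
    cuts = [argmin(0, peaks[0])]
    for a, b in zip(peaks, peaks[1:]):
        cuts.append(argmin(a, b))
    cuts.append(argmin(peaks[-1], len(signal) - 1))
    return [(cuts[i], peaks[i], cuts[i + 1]) for i in range(len(peaks))]
```

Adjacent reps share a boundary frame, so every frame between the first and last cut belongs to some rep. For valley-detected reps, passing the negated signal and the valley indices gives the mirrored behaviour.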

Signal preprocessing includes interpolation of missing frames, filtering of off-screen coordinates (Y < 0 or Y > 1), and exercise-specific outlier removal.
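As an illustration of the first two cleanup steps, the sketch below treats off-screen samples as missing and then fills gaps by linear interpolation. The choice of linear interpolation, the edge handling, and the name `clean_signal` are assumptions; the exercise-specific outlier removal is not shown because the article does not describe it.

```python
def clean_signal(ys):
    """Drop off-screen samples, then linearly interpolate the gaps.

    `ys` holds one normalized Y value per frame, or None where the pose
    detector returned nothing.
    """
    # Off-screen filter from the article: Y < 0 or Y > 1 is treated as missing.
    vals = [y if y is not None and 0.0 <= y <= 1.0 else None for y in ys]
    known = [i for i, v in enumerate(vals) if v is not None]
    if not known:
        return []
    out = list(vals)
    # Assumption: leading/trailing gaps are filled with the nearest valid value.
    for i in range(known[0]):
        out[i] = vals[known[0]]
    for i in range(known[-1] + 1, len(vals)):
        out[i] = vals[known[-1]]
    # Linear interpolation between each pair of neighboring valid samples.
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = vals[a] + t * (vals[b] - vals[a])
    return out
```

Filling gaps keeps the signal evenly sampled, which matters because peak prominence is measured against neighboring samples.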

Stage 4: Per-Rep Form Scoring

For every detected rep, CadooVision samples poses across the full movement window (start_frame to end_frame), calculates joint angles using 3-point geometry, and scores against biomechanical standards.

Visibility gate: Any keypoint with visibility confidence below 0.3 is excluded from angle calculations.
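A minimal sketch of the 3-point inverse cosine calculation with the visibility gate applied. The `(x, y, visibility)` tuple layout and the decision to return `None` for a gated angle are assumptions made for this example.

```python
import math

VISIBILITY_GATE = 0.3  # threshold from the article


def joint_angle(a, b, c):
    """Angle at vertex b, in degrees, for points given as (x, y, visibility).

    Returns None when any point fails the visibility gate or the angle is
    undefined (two points coincide).
    """
    if any(p[2] < VISIBILITY_GATE for p in (a, b, c)):
        return None
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return None
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp against floating-point drift before taking the inverse cosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
```

For an elbow angle, `a`, `b`, and `c` would be the shoulder, elbow, and wrist. Note that normalized X and Y are scaled by image width and height respectively, so angles computed directly on normalized coordinates are distorted for non-square video; the article does not say whether CadooVision rescales to pixels first.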

Pushup Form: 3 Components

| Component | Weight | Angle | Ideal | Fail |
|---|---|---|---|---|
| Depth | 40% | Elbow (shoulder-elbow-wrist) at bottom | ≤ 90° | > 140° |
| Lockout | 25% | Elbow at top | ≥ 160° | < 140° |
| Body Line | 35% | Hip (shoulder-hip-ankle) deviation from 180° | ≤ 10° dev | > 25° dev |
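The pushup table gives weights plus an ideal and a fail threshold per component, but not how values between the two thresholds are scored. The sketch below assumes linear interpolation between fail (0.0) and ideal (1.0), which is a guess rather than the documented method; `component_score` and `pushup_score` are hypothetical names.

```python
def component_score(value, ideal, fail):
    """Map a measurement to [0, 1]: 1.0 at `ideal`, 0.0 at `fail`.

    Works for both directions (ideal below fail, or ideal above fail).
    Linear interpolation in between is an assumption, not from the article.
    """
    t = (value - fail) / (ideal - fail)
    return max(0.0, min(1.0, t))


# Weights and thresholds from the pushup table.
PUSHUP_WEIGHTS = {"depth": 0.40, "lockout": 0.25, "body_line": 0.35}


def pushup_score(bottom_elbow, top_elbow, hip_angle):
    """Weighted form score for one pushup rep; all angles in degrees."""
    scores = {
        "depth": component_score(bottom_elbow, ideal=90, fail=140),
        "lockout": component_score(top_elbow, ideal=160, fail=140),
        "body_line": component_score(abs(180 - hip_angle), ideal=10, fail=25),
    }
    return sum(PUSHUP_WEIGHTS[k] * scores[k] for k in scores)
```

Under this assumption, a rep with a 115° elbow at the bottom, a 150° elbow at the top, and a 162.5° hip angle sits halfway between ideal and fail on every component and scores about 0.5. The same `component_score` helper would fit the squat thresholds with different inputs.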

Squat Form: 3 Components

| Component | Weight | Angle | Ideal | Fail |
|---|---|---|---|---|
| Depth | 40% | Knee (hip-knee-ankle) at bottom | ≤ 90° | > 140° |
| Lockout | 25% | Knee at top | ≥ 160° | < 140° |
| Torso | 35% | Torso (shoulder-hip-knee) uprightness | ≥ 80° | < 50° |

Situp Form: 3 Components

| Component | Weight | Metric | Ideal | Fail |
|---|---|---|---|---|
| Range of Motion | 45% | Trunk angle change (max - min) | ≥ 40° | < 20° |
| Knee Stability | 30% | Knee angle std deviation from 90° | ≤ 15° | > 30° |
| Control | 25% | Trunk angle smoothness (std dev) | ≤ 30 | > 50 |

Stage 5: Credit Calculation

The form scoring system doubles as anti-cheat:

  1. Any rep scoring below 60% is discarded (it earns no credit and does not count as attempted)
  2. attemptedReps = count of reps scoring ≥ 60%
  3. averageFormScore = mean of attempted reps' scores
  4. creditedReps = attemptedReps × averageFormScore
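The four rules above can be sketched in a few lines. The function name is hypothetical, and returning 0.0 for the average when no rep passes is an assumption; the article does not say whether the credited value is rounded, so the sketch leaves it as a float.

```python
def credit_reps(form_scores, threshold=0.60):
    """Apply the four credit rules to per-rep form scores in [0, 1]."""
    attempted = [s for s in form_scores if s >= threshold]
    attempted_reps = len(attempted)
    # Guard the empty case so the mean is never taken over zero reps.
    average = sum(attempted) / attempted_reps if attempted_reps else 0.0
    credited = attempted_reps * average
    return attempted_reps, average, credited
```

For scores of 0.9, 0.8, 0.5, and 0.7, the 0.5 rep is discarded, leaving 3 attempted reps with an average of about 0.8 and roughly 2.4 credited reps. Because the count is multiplied by the mean, the credited value equals the sum of the passing scores.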

Data Response Format

CadooVision returns to the Ledger backend:

  • cvTotalReps — all reps detected (including bad form)
  • cvAttemptedReps — reps scoring ≥ 60%
  • cvCreditedReps — attemptedReps × averageFormScore
  • cvAverageFormScore — mean form score (0.0-1.0)
  • cvFormScores[] — per-rep scores in order (-1 = not calculable)
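One way to picture the payload is as a small record type. The field names come from the list above; the Python types, the `CvResult` and `build_result` names, and the rule that a rep without a calculable score cannot count as attempted are assumptions for illustration only.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class CvResult:
    cvTotalReps: int
    cvAttemptedReps: int
    cvCreditedReps: float
    cvAverageFormScore: float
    cvFormScores: List[float]


def build_result(rep_scores):
    """Build the payload from per-rep scores; None means "not calculable"."""
    # Assumption: a rep without a calculable score cannot pass the 60% gate.
    passing = [s for s in rep_scores if s is not None and s >= 0.60]
    average = sum(passing) / len(passing) if passing else 0.0
    return CvResult(
        cvTotalReps=len(rep_scores),
        cvAttemptedReps=len(passing),
        cvCreditedReps=len(passing) * average,
        cvAverageFormScore=average,
        # -1 is the sentinel the article gives for "not calculable".
        cvFormScores=[-1 if s is None else s for s in rep_scores],
    )
```

Calling `asdict(build_result(scores))` yields a plain dictionary that could be serialized to JSON for the backend.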

The Activity's reps field uses attemptedReps (not totalReps), ensuring only form-verified reps count toward challenge completion and payouts.

On-Device vs Server Pipeline

CadooVision runs in two modes:

  • On-device (Flutter/MediaPipe): Real-time practice mode — guides positioning, detects practice reps, gives immediate form feedback before recording starts
  • Server-side (Cloud Run): Full analysis of recorded video — production-grade rep counting, form scoring, and credit calculation for challenge payouts

The on-device pipeline helps users position themselves correctly and shows their form score in real time. The server pipeline is the source of truth for credited reps and payouts.

Build With CadooVision

Interested in integrating form-verified exercise data into your product? Contact partnerships@cadoo.com.
