CadooVision Architecture Overview
CadooVision is Cadoo's server-side computer vision pipeline running on Google Cloud Run. It processes workout videos through five stages: pose detection, side selection, rep counting, form scoring, and credit calculation.
Technology Stack
- Pose Detection: Google MediaPipe Pose — 33 keypoints per frame with X/Y coordinates and visibility confidence
- Signal Processing: scipy.signal.find_peaks with dynamic prominence thresholds
- Angle Calculation: 3-point inverse cosine geometry
- Infrastructure: Google Cloud Run (auto-scaling), Firebase Storage (video hosting)
- Backend: NestJS/TypeScript (Ledger), Prisma ORM, PostgreSQL
- Frontend: Flutter (iOS + Android) with on-device MediaPipe for practice mode
Stage 1: Pose Detection
Every frame of the uploaded video is processed through MediaPipe Pose, extracting 33 body keypoints. Each keypoint includes X/Y coordinates (normalized 0-1) and a visibility confidence score.
Stage 2: Automatic Side Selection
CadooVision auto-detects which side of the body (left or right) is more visible to the camera. It samples the middle 25% of frames for stability and compares visibility confidence across six keypoints per side: shoulder, elbow, wrist, hip, knee, and ankle. The dominant side is used for all subsequent analysis, so users do not need to specify which side faces the camera.
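A minimal sketch of the side-selection idea, assuming visibility scores arrive as a NumPy array of shape (frames, 33) indexed by the standard MediaPipe Pose landmark numbers. The function name `select_side`, the use of a plain mean, the centered window, and the tie-break toward the left side are illustrative assumptions, not details confirmed by CadooVision.

```python
import numpy as np

# MediaPipe Pose landmark indices: shoulder, elbow, wrist, hip, knee, ankle.
LEFT = [11, 13, 15, 23, 25, 27]
RIGHT = [12, 14, 16, 24, 26, 28]

def select_side(visibility: np.ndarray, fraction: float = 0.25) -> str:
    """Pick the side with higher mean visibility over the middle frames.

    visibility: array of shape (num_frames, 33) with per-keypoint confidence.
    """
    n = visibility.shape[0]
    if n == 0:
        raise ValueError("no frames to analyse")
    # Centered window covering `fraction` of the clip (assumed placement).
    window = max(1, int(round(n * fraction)))
    start = (n - window) // 2
    middle = visibility[start:start + window]
    left_score = middle[:, LEFT].mean()
    right_score = middle[:, RIGHT].mean()
    # Ties go to the left side (arbitrary choice for this sketch).
    return "left" if left_score >= right_score else "right"
```

Restricting the comparison to the middle of the clip keeps setup and teardown frames, where the user may be walking into position, from influencing the choice.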
Stage 3: Rep Counting via Signal Processing
Rather than counting "movements," CadooVision tracks the distance between specific keypoint pairs over time and uses signal processing to identify repetitions:
Exercise-Specific Signals
| Exercise | Signal | Detection | Prominence |
|---|---|---|---|
| Pushups | Shoulder-wrist Y-distance | Peak detection (arms extended = top) | 0.15 |
| Squats | Shoulder-knee Y-distance | Valley detection (knees bent = bottom) | 0.20 |
| Situps | Shoulder-hip distance (both axes) | Dual-axis peak detection | 0.20 |
Situp dual-axis detection: Camera orientation varies. Side-facing cameras capture horizontal movement better, while front-facing cameras capture vertical movement better. CadooVision runs peak detection on the X and Y axes independently and uses whichever axis finds more reps.
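The detection scheme in the table can be sketched with `scipy.signal.find_peaks`. This is an illustration under assumptions: the helper names `detect_reps` and `detect_situp_reps` are hypothetical, the prominence values are passed in as fixed numbers from the table (how the thresholds are adjusted dynamically is not described here), and ties between axes go to the Y axis.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_reps(signal: np.ndarray, prominence: float, valleys: bool = False) -> np.ndarray:
    """Return the frame indices of rep peaks, or of valleys if valleys=True."""
    # A valley in the signal is a peak in the negated signal.
    data = -np.asarray(signal, dtype=float) if valleys else np.asarray(signal, dtype=float)
    indices, _ = find_peaks(data, prominence=prominence)
    return indices

def detect_situp_reps(dx: np.ndarray, dy: np.ndarray, prominence: float = 0.20) -> np.ndarray:
    """Run peak detection on both axes and keep the axis that finds more reps."""
    peaks_x = detect_reps(dx, prominence)
    peaks_y = detect_reps(dy, prominence)
    return peaks_x if len(peaks_x) > len(peaks_y) else peaks_y
```

For example, pushups would call `detect_reps(signal, 0.15)` and squats `detect_reps(signal, 0.20, valleys=True)`. Prominence filters out small wobbles that rise only slightly above their surroundings, which is why it suits noisy pose signals better than a fixed height threshold.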
Signal-Derived Rep Boundaries
Instead of arbitrary fixed windows around each peak, rep boundaries come from the signal itself:
- For peak-detected reps (pushups, situps): find the local minimum (trough) between consecutive peaks
- For valley-detected reps (squats): find the local maximum between consecutive valleys
- Each rep gets a natural start_frame, middle_frame (peak/valley), and end_frame
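The boundary rule above can be sketched as follows for peak-detected reps. The function name `rep_boundaries` is hypothetical, and the handling of the first rep's start and the last rep's end (searching for the lowest point out to the clip edges) is an assumption, since the text only defines boundaries between consecutive peaks.

```python
import numpy as np

def rep_boundaries(signal, peaks):
    """Return (start_frame, middle_frame, end_frame) for each peak-detected rep."""
    signal = np.asarray(signal, dtype=float)
    peaks = [int(p) for p in peaks]
    if not peaks:
        return []
    # Trough (local minimum) between each pair of consecutive peaks.
    troughs = [
        p0 + int(np.argmin(signal[p0:p1 + 1]))
        for p0, p1 in zip(peaks[:-1], peaks[1:])
    ]
    # Assumed edge handling: lowest point before the first and after the last peak.
    first = int(np.argmin(signal[:peaks[0] + 1]))
    last = peaks[-1] + int(np.argmin(signal[peaks[-1]:]))
    edges = [first, *troughs, last]
    return [(edges[i], peaks[i], edges[i + 1]) for i in range(len(peaks))]
```

For valley-detected reps such as squats, the same function can be applied to the negated signal, so that valleys become peaks and the boundaries fall on the local maxima between them.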
Signal processing includes: interpolation of missing frames, off-screen coordinate filtering (Y < 0 or Y > 1), and exercise-specific outlier removal.
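As a rough sketch of the first two cleaning steps, the snippet below marks off-screen Y coordinates as missing and fills all gaps by linear interpolation. The name `clean_coordinate` and the choice of linear interpolation are assumptions; the exercise-specific outlier removal is not shown because its rules are not described here.

```python
import numpy as np

def clean_coordinate(y):
    """Drop off-screen values (Y < 0 or Y > 1) and interpolate missing frames."""
    y = np.asarray(y, dtype=float).copy()
    # Treat off-screen coordinates the same as frames with no detection (NaN).
    y[(y < 0) | (y > 1)] = np.nan
    valid = ~np.isnan(y)
    if not valid.any():
        return y  # nothing to interpolate from
    frames = np.arange(len(y))
    # Linear interpolation between the nearest valid frames (assumed method).
    y[~valid] = np.interp(frames[~valid], frames[valid], y[valid])
    return y
```

Note that `np.interp` holds the nearest valid value constant for gaps at the very start or end of the clip rather than extrapolating.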
Stage 4: Per-Rep Form Scoring
For every detected rep, CadooVision samples poses across the full movement window (start_frame to end_frame), calculates joint angles using 3-point geometry, and scores against biomechanical standards.
Visibility gate: Any keypoint with confidence < 0.3 is excluded from angle calculations.
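The 3-point angle calculation combined with the visibility gate can be illustrated as below. This is a sketch that assumes each keypoint is an (x, y, visibility) tuple; the function name `joint_angle` and the choice to return `None` for a gated angle are illustrative, not confirmed details.

```python
import math

def joint_angle(a, b, c, min_visibility=0.3):
    """Angle at vertex b (degrees) formed by points a-b-c, or None if gated.

    Each point is (x, y, visibility).
    """
    # Visibility gate: skip the angle if any keypoint is below the threshold.
    if min(a[2], b[2], c[2]) < min_visibility:
        return None
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return None  # degenerate: two keypoints coincide
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

For an elbow angle, `a`, `b`, and `c` would be the shoulder, elbow, and wrist keypoints of the selected side.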
Pushup Form: 3 Components
| Component | Weight | Angle | Ideal | Fail |
|---|---|---|---|---|
| Depth | 40% | Elbow (shoulder-elbow-wrist) at bottom | ≤ 90° | > 140° |
| Lockout | 25% | Elbow at top | ≥ 160° | < 140° |
| Body Line | 35% | Hip (shoulder-hip-ankle) deviation from 180° | ≤ 10° dev | > 25° dev |
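To show how the weights and thresholds in the pushup table could combine into one score, here is a sketch. The table gives only the Ideal and Fail thresholds, so the linear interpolation between them is an assumption, as are the function names `component_score` and `pushup_score`.

```python
def component_score(value, ideal, fail):
    """Map a measurement to [0, 1]: 1.0 at the ideal threshold, 0.0 at the fail threshold.

    Linear in between (assumed). Works whether ideal is below or above fail.
    """
    t = (value - fail) / (ideal - fail)
    return max(0.0, min(1.0, t))

def pushup_score(bottom_elbow_deg, top_elbow_deg, hip_deg):
    """Weighted pushup form score using the table's weights and thresholds."""
    depth = component_score(bottom_elbow_deg, ideal=90, fail=140)
    lockout = component_score(top_elbow_deg, ideal=160, fail=140)
    body_line = component_score(abs(180 - hip_deg), ideal=10, fail=25)
    return 0.40 * depth + 0.25 * lockout + 0.35 * body_line
```

Under these assumptions, a rep with a 115° elbow at the bottom, a 150° elbow at the top, and a perfectly straight body scores 0.40 × 0.5 + 0.25 × 0.5 + 0.35 × 1.0 = 0.675.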
Squat Form: 3 Components
| Component | Weight | Angle | Ideal | Fail |
|---|---|---|---|---|
| Depth | 40% | Knee (hip-knee-ankle) at bottom | ≤ 90° | > 140° |
| Lockout | 25% | Knee at top | ≥ 160° | < 140° |
| Torso | 35% | Torso (shoulder-hip-knee) uprightness | ≥ 80° | < 50° |
Situp Form: 3 Components
| Component | Weight | Metric | Ideal | Fail |
|---|---|---|---|---|
| Range of Motion | 45% | Trunk angle change (max - min) | ≥ 40° | < 20° |
| Knee Stability | 30% | Knee angle std deviation from 90° | ≤ 15° | > 30° |
| Control | 25% | Trunk angle smoothness (std dev) | ≤ 30 | > 50 |
Stage 5: Credit Calculation
The form scoring system doubles as anti-cheat:
- Any rep scoring below 60% is discarded (not counted at all)
- attemptedReps = count of reps scoring ≥ 60%
- averageFormScore = mean of attempted reps' scores
- creditedReps = attemptedReps × averageFormScore
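The credit rules above amount to a few lines of arithmetic. The sketch below assumes per-rep scores are floats in the range 0.0 to 1.0, that a score of -1 (not calculable) is excluded like any other score below the threshold, and that creditedReps is left unrounded. The function name `calculate_credit` is hypothetical, and the dictionary keys mirror the fields CadooVision returns to the Ledger backend.

```python
def calculate_credit(form_scores, threshold=0.60):
    """Apply the 60% gate and compute attempted, average, and credited reps."""
    # Reps below the threshold are discarded entirely.
    attempted = [s for s in form_scores if s >= threshold]
    attempted_reps = len(attempted)
    average = sum(attempted) / attempted_reps if attempted else 0.0
    return {
        "cvTotalReps": len(form_scores),
        "cvAttemptedReps": attempted_reps,
        "cvCreditedReps": attempted_reps * average,
        "cvAverageFormScore": average,
        "cvFormScores": list(form_scores),
    }
```

With scores of 0.9, 0.5, 0.8, -1, and 0.7, three reps pass the gate, the average is 0.8, and the credited total is 3 × 0.8 = 2.4 reps.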
Data Response Format
CadooVision returns to the Ledger backend:
- cvTotalReps: all reps detected (including bad form)
- cvAttemptedReps: reps scoring ≥ 60%
- cvCreditedReps: attemptedReps × averageFormScore
- cvAverageFormScore: mean form score (0.0-1.0)
- cvFormScores[]: per-rep scores in order (-1 = not calculable)
The Activity's reps field uses attemptedReps (not totalReps), ensuring only form-verified reps count toward challenge completion and payouts.
On-Device vs Server Pipeline
CadooVision runs in two modes:
- On-device (Flutter/MediaPipe): Real-time practice mode — guides positioning, detects practice reps, gives immediate form feedback before recording starts
- Server-side (Cloud Run): Full analysis of recorded video — production-grade rep counting, form scoring, and credit calculation for challenge payouts
The on-device pipeline helps users position themselves correctly and shows their form score in real time. The server pipeline is the source of truth for credited reps and payouts.
Build With CadooVision
Interested in integrating form-verified exercise data into your product? Contact partnerships@cadoo.com.







