Experimentation Constraints
Seed Input
Rules for running funnel experiments: max 3 variants, 3-4 week cycles, revenue per ad click as north star metric, with supporting metrics and winner criteria
raw_input/experimentation_constraints.md
Rules and principles governing how Twofold Health designs and tests funnel variations. This file is read by Agent 5 when generating variation specs with content variations for A/B testing.
Testing Principles
- Max 3 variants tested simultaneously — keeps results interpretable and traffic allocation practical
- No staggered starts — all variants in a test launch and end together
- Each test runs 3-4 weeks, giving every variant time to collect enough traffic for statistical significance
- Implementation time is not a constraint — assume the team can build any variation quickly; optimize for learning, not ease of implementation
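The testing principles above can be checked mechanically before a test launches. The following is a minimal sketch, not a prescribed implementation: the `VariantRun` type, the `validate_test_plan` function, and the reading of "3-4 weeks" as 21 to 28 days are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date

MAX_VARIANTS = 3
MIN_DAYS, MAX_DAYS = 21, 28  # assumed reading of "3-4 weeks"


@dataclass(frozen=True)
class VariantRun:
    """Hypothetical record of one variant's scheduled run."""
    name: str
    start: date
    end: date


def validate_test_plan(variants: list[VariantRun]) -> list[str]:
    """Return a list of rule violations; an empty list means the plan passes."""
    if not variants:
        return ["plan has no variants"]
    problems = []
    if len(variants) > MAX_VARIANTS:
        problems.append(f"{len(variants)} variants exceeds the max of {MAX_VARIANTS}")
    # No staggered starts: every variant must share one (start, end) pair.
    if len({(v.start, v.end) for v in variants}) > 1:
        problems.append("variants do not share the same start and end dates")
    for v in variants:
        days = (v.end - v.start).days
        if not MIN_DAYS <= days <= MAX_DAYS:
            problems.append(f"{v.name} runs {days} days, outside {MIN_DAYS}-{MAX_DAYS}")
    return problems
```

Returning every violation at once, rather than failing on the first, lets a spec author fix a plan in one pass.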
Variation Design Principles
- Structure and content are not independent: when comparing variations, either test structure with fixed content or test content within a fixed structure. Vary both in the same test only if the design still isolates the effect of one axis
- No baseline requirement — the current homepage does not need to be included as a control. All variations can be new approaches
- Content variations enable A/B testing — each variation spec should include 2-3 content versions per key touchpoint so the team can test copy independently after structure is validated
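As an illustration of the "2-3 content versions per key touchpoint" rule, a variation spec's content can be modeled as a mapping from touchpoint to its copy versions. This is only a sketch: the touchpoint names, the placeholder copy, and the `touchpoints_out_of_range` helper are hypothetical and do not come from the actual spec format.

```python
MIN_VERSIONS, MAX_VERSIONS = 2, 3


def touchpoints_out_of_range(content: dict[str, list[str]]) -> dict[str, int]:
    """Return {touchpoint: version_count} for touchpoints outside the 2-3 range."""
    return {
        touchpoint: len(versions)
        for touchpoint, versions in content.items()
        if not MIN_VERSIONS <= len(versions) <= MAX_VERSIONS
    }


# Hypothetical spec content: structure is fixed, only copy varies per touchpoint.
spec_content = {
    "hero_headline": ["Headline copy A", "Headline copy B"],
    "signup_cta": ["CTA copy A"],  # only one version, so it gets flagged
}
print(touchpoints_out_of_range(spec_content))
```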
Metrics Framework
Primary Metric
- Revenue per ad click — the single north star metric. Captures the full funnel from ad spend to revenue
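For concreteness, the primary metric is total revenue attributed to a variant divided by all ad clicks routed to it, including clicks that never converted. The sketch below assumes that attribution has already been done upstream; the function name and the example figures are illustrative only.

```python
def revenue_per_ad_click(total_revenue: float, ad_clicks: int) -> float:
    """Revenue attributed to a variant divided by ALL ad clicks sent to it."""
    if ad_clicks <= 0:
        raise ValueError("revenue per ad click is undefined without ad clicks")
    return total_revenue / ad_clicks


# Illustrative numbers only: 2,000 clicks producing 1,470.00 in revenue.
print(round(revenue_per_ad_click(1470.0, 2000), 3))
```

Dividing by every click, not only by signups or paying users, is what lets the metric reflect losses at each funnel step.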
Supporting Metrics (for diagnosis, not decisioning)
- Funnel completion rate (ad click → signup)
- User activation rate (signup → first recording)
- Subscription conversion rate (trial → paid)
- Time-to-signup
- Drop-off points within each variation
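The supporting metrics can be derived from simple stage counts per variation. The sketch below is one possible way to do that, under some assumptions: the argument names are hypothetical, time-to-signup is summarized as a median in seconds, and the counts are assumed to come from whatever analytics source the team already uses.

```python
from statistics import median
from typing import Optional


def supporting_metrics(
    ad_clicks: int,
    signups: int,
    first_recordings: int,
    trials: int,
    paid: int,
    seconds_to_signup: list[float],
) -> dict[str, Optional[float]]:
    """Diagnostic rates for one variation; None marks a rate with no denominator."""

    def rate(numerator: int, denominator: int) -> Optional[float]:
        return numerator / denominator if denominator else None

    return {
        "funnel_completion_rate": rate(signups, ad_clicks),        # ad click -> signup
        "activation_rate": rate(first_recordings, signups),        # signup -> first recording
        "subscription_conversion_rate": rate(paid, trials),        # trial -> paid
        "median_seconds_to_signup": median(seconds_to_signup) if seconds_to_signup else None,
    }
```

Comparing these rates side by side across variations points to the step where a variation loses the most users, which is the diagnostic role these metrics are meant to play.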
Winner Criteria
- Statistical significance required — do not call a winner without it
- Primary metric decides the winner — revenue per ad click
- Tiebreaker: if the difference in the primary metric is not statistically significant, use funnel completion rate, then activation rate
- Sample size — ensure each variant gets enough traffic for significance. If traffic is limited, reduce variant count rather than running underpowered tests
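The winner criteria can be sketched as a pairwise decision procedure plus a traffic check. Everything here is an assumption layered on the rules above: the 0.05 significance level, the 80% power, the large-sample z approximation (per-click revenue is usually skewed, so a real analysis may need a more robust test), and sizing the test on funnel completion rate as a proxy. All function and key names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, variance
from typing import Optional

ALPHA = 0.05  # assumed significance level; the source does not fix one


def mean_diff_p_value(a: list[float], b: list[float]) -> float:
    """Two-sided p-value for a difference in means (large-sample z approximation)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 1.0 if mean(a) == mean(b) else 0.0
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def pick_winner(a: dict, b: dict) -> Optional[str]:
    """Compare two variants: primary metric first, then the tiebreakers in order.

    'revenue' and 'signed_up' hold one value per ad click; 'activated' holds
    one 0/1 value per signup. Returns None when nothing is significant.
    """
    for key in ("revenue", "signed_up", "activated"):
        if mean_diff_p_value(a[key], b[key]) < ALPHA:
            return a["name"] if mean(a[key]) > mean(b[key]) else b["name"]
    return None


def clicks_per_variant(p_base: float, p_target: float,
                       alpha: float = ALPHA, power: float = 0.8) -> int:
    """Approximate clicks per variant needed to detect p_base vs p_target."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    spread = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil(z * z * spread / (p_base - p_target) ** 2)


def affordable_variants(total_clicks: int, needed_per_variant: int, cap: int = 3) -> int:
    """How many variants the expected traffic can power, capped at the max of 3."""
    return min(cap, total_clicks // needed_per_variant)
```

If `affordable_variants` returns fewer variants than planned, the rule above says to drop variants rather than run the test underpowered. With three variants, several pairwise comparisons are made, so a stricter significance level may be appropriate.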