Experimentation Constraints

Seed Input

Rules for running funnel experiments — max 3 variants, 3-4 week cycles, revenue per ad click as north star metric, with supporting metrics and winner criteria

raw_input/experimentation_constraints.md

Experimentation Constraints

Rules and principles governing how Twofold Health designs and tests funnel variations. This file is read by Agent 5 when generating variation specs with content variations for A/B testing.

Testing Principles

  • Max 3 variants tested simultaneously — keeps results interpretable and traffic allocation practical
  • No staggered starts — all variants in a test launch and end together
  • 3-4 week test duration per test for statistical significance
  • Implementation time is not a constraint — assume the team can build any variation quickly; optimize for learning, not ease of implementation

Variation Design Principles

  • Structure and content are not independent — when comparing variations, either test structure with fixed content, OR test content within a fixed structure. Do not test both axes simultaneously unless one is isolated
  • No baseline requirement — the current homepage does not need to be included as a control. All variations can be new approaches
  • Content variations enable A/B testing — each variation spec should include 2-3 content versions per key touchpoint so the team can test copy independently after structure is validated

Metrics Framework

Primary Metric

  • Revenue per ad click — the single north star metric. Captures the full funnel from ad spend to revenue

Supporting Metrics (for diagnosis, not decisioning)

  • Funnel completion rate (ad click → signup)
  • User activation rate (signup → first recording)
  • Subscription conversion rate (trial → paid)
  • Time-to-signup
  • Drop-off points within each variation

Winner Criteria

  • Statistical significance required — do not call a winner without it
  • Primary metric decides the winner — revenue per ad click
  • Tiebreaker — if primary metric is within noise, use funnel completion rate, then activation rate
  • Sample size — ensure each variant gets enough traffic for significance. If traffic is limited, reduce variant count rather than running underpowered tests