ab-test-analysis

Analyze A/B test results with statistical significance, sample size validation, confidence intervals, and ship/extend/stop recommendations. Use when evaluating experiment results, checking if a test reached significance, interpreting split test data, or deciding whether to ship a variant.

Category: PM

name: ab-test-analysis
description: "Analyze A/B test results with statistical significance, sample size validation, confidence intervals, and ship/extend/stop recommendations. Use when evaluating experiment results, checking if a test reached significance, interpreting split test data, or deciding whether to ship a variant."

A/B Test Analysis

Evaluate A/B test results with statistical rigor and translate findings into clear product decisions.

Context

You are analyzing A/B test results for $ARGUMENTS.

If the user provides data files (CSV, Excel, or analytics exports), read and analyze them directly. Generate Python scripts for statistical calculations when needed.

Instructions

  • Understand the experiment:

    - What was the hypothesis?
    - What was changed (the variant)?
    - What is the primary metric? Any guardrail metrics?
    - How long did the test run?
    - What is the traffic split?

  • Validate the test setup:

    - Sample size: Is the sample large enough for the expected effect size?
      - Use the formula (per group): n = 2 × (Z_α/2 + Z_β)² × p × (1 − p) / MDE², where p is the baseline conversion rate, MDE is the absolute minimum detectable effect, Z_α/2 ≈ 1.96 for α = 0.05, and Z_β ≈ 0.84 for 80% power
      - Flag if the test is underpowered (<80% power)
    - Duration: Did the test run for at least 1-2 full business cycles?
    - Randomization: Any evidence of sample ratio mismatch (SRM)?
    - Novelty/primacy effects: Was there enough time to wash out initial behavior changes?
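The sample size and SRM checks above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a prescribed implementation: the function names `required_sample_size` and `srm_p_value` are hypothetical, the sample size uses the normal-approximation formula with an explicit power term, and the SRM check assumes a simple chi-squared goodness-of-fit test against the planned traffic split.

```python
import math
from statistics import NormalDist


def required_sample_size(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per group for a two-sided two-proportion test.

    baseline_rate: control conversion rate (e.g. 0.10)
    mde_abs: absolute minimum detectable effect (e.g. 0.02 for +2 points)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = 2 * baseline_rate * (1 - baseline_rate)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def srm_p_value(n_control, n_variant, expected_control_share=0.5):
    """Chi-squared goodness-of-fit p-value (1 degree of freedom) for SRM."""
    total = n_control + n_variant
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_variant - expected_variant) ** 2 / expected_variant)
    # With 1 degree of freedom, chi2 is a squared standard normal,
    # so the upper tail is 2 * (1 - Phi(sqrt(chi2))).
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))


if __name__ == "__main__":
    print(required_sample_size(baseline_rate=0.10, mde_abs=0.02))
    print(srm_p_value(n_control=5000, n_variant=5400))
```

With a 10% baseline and a 2-point absolute MDE, this comes out at roughly 3,500 users per group. A very small SRM p-value (many teams use a strict cutoff such as 0.001, which is a convention rather than a rule from this skill) suggests the randomization or logging is broken and the results should not be trusted until it is explained.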

  • Calculate statistical significance:

    - Conversion rate for control and variant
    - Relative lift: (variant - control) / control × 100
    - p-value: Using a two-tailed z-test or chi-squared test
    - Confidence interval: 95% CI for the difference
    - Statistical significance: Is p < 0.05?
    - Practical significance: Is the lift meaningful for the business?

    If the user provides raw data, generate and run a Python script to calculate these.
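As one possible shape for such a script, the sketch below computes the quantities listed above from aggregate counts using a two-tailed two-proportion z-test. The function name `analyze_conversion` and the example counts are hypothetical. It assumes samples large enough for the normal approximation, uses the pooled standard error for the test, and the unpooled standard error for the confidence interval.

```python
import math
from statistics import NormalDist


def analyze_conversion(conv_control, n_control, conv_variant, n_variant, alpha=0.05):
    """Two-tailed two-proportion z-test with a CI for the absolute difference."""
    rate_c = conv_control / n_control
    rate_v = conv_variant / n_variant
    diff = rate_v - rate_c

    # Pooled standard error: used for the test under H0 (no difference).
    pooled = (conv_control + conv_variant) / (n_control + n_variant)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_variant))
    if se_pooled == 0:
        raise ValueError("No variation in outcomes; the z-test is undefined.")
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error: used for the confidence interval.
    se_unpooled = math.sqrt(rate_c * (1 - rate_c) / n_control
                            + rate_v * (1 - rate_v) / n_variant)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    return {
        "control_rate": rate_c,
        "variant_rate": rate_v,
        "relative_lift_pct": diff / rate_c * 100,
        "z": z,
        "p_value": p_value,
        "ci_low": diff - z_crit * se_unpooled,
        "ci_high": diff + z_crit * se_unpooled,
        "significant": p_value < alpha,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    result = analyze_conversion(500, 10_000, 580, 10_000)
    for key, value in result.items():
        print(f"{key}: {value}")
```

For very small samples or rare events, an exact test is usually a safer choice than this approximation.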

  • Check guardrail metrics:

    - Did any guardrail metrics (revenue, engagement, page load time) degrade?
    - A winning primary metric with degraded guardrails may not be a true win
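A guardrail check could look like the following sketch, which works from summary statistics (mean, standard deviation, sample size) of a continuous metric. The function name `guardrail_degraded` and the example numbers are hypothetical, and the one-sided test with a normal approximation is an assumption that only holds for large samples.

```python
import math
from statistics import NormalDist


def guardrail_degraded(mean_c, sd_c, n_c, mean_v, sd_v, n_v,
                       higher_is_better=True, alpha=0.05):
    """One-sided large-sample test: is the variant worse on this guardrail?

    Returns (degraded, p_worse), where p_worse is the one-sided p-value
    for the hypothesis that the variant is not worse than control.
    """
    se = math.sqrt(sd_c ** 2 / n_c + sd_v ** 2 / n_v)
    z = (mean_v - mean_c) / se
    if not higher_is_better:
        z = -z  # e.g. page load time: an increase counts as a degradation
    p_worse = NormalDist().cdf(z)  # small when the variant is clearly worse
    return p_worse < alpha, p_worse


if __name__ == "__main__":
    # Made-up page load times in seconds; lower is better.
    print(guardrail_degraded(1.20, 0.5, 10_000, 1.30, 0.5, 10_000,
                             higher_is_better=False))
```

A flagged guardrail does not decide the outcome by itself; it signals that the trade-off needs a closer look before shipping.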

  • Interpret results:
    | Outcome | Recommendation |
    |---|---|
    | Significant positive lift, no guardrail issues | Ship it: roll out to 100% |
    | Significant positive lift, guardrail concerns | Investigate: understand trade-offs before shipping |
    | Not significant, positive trend | Extend the test: need more data or larger effect |
    | Not significant, flat | Stop the test: no meaningful difference detected |
    | Significant negative lift | Don't ship: revert to control, analyze why |
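The decision rules above can be expressed as a small function. This is a sketch under stated assumptions: the name `recommend` and the `flat_band_pct` parameter are hypothetical, and the cutoff that separates a "positive trend" from "flat" is a judgment call the source does not define. A non-significant negative trend is not covered by the rules, so the sketch treats it as "Stop".

```python
def recommend(p_value, relative_lift_pct, guardrails_ok,
              alpha=0.05, flat_band_pct=1.0):
    """Map test results to a ship / investigate / extend / stop decision.

    flat_band_pct is a hypothetical threshold: a non-significant lift below
    it is treated as flat rather than as a positive trend.
    """
    significant = p_value < alpha
    if significant and relative_lift_pct > 0:
        return "Ship" if guardrails_ok else "Investigate"
    if significant and relative_lift_pct < 0:
        return "Don't ship"
    if relative_lift_pct >= flat_band_pct:
        return "Extend"
    # Assumption: flat or slightly negative non-significant results stop the test.
    return "Stop"


if __name__ == "__main__":
    print(recommend(p_value=0.012, relative_lift_pct=16.0, guardrails_ok=True))
```

Statistical significance alone is not enough here; the practical significance of the lift still needs a business judgment before acting on a "Ship" result.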

  • Provide the analysis summary:

       ## A/B Test Results: [Test Name]
    
       **Hypothesis**: [What we expected]
       **Duration**: [X days] | **Sample**: [N control / M variant]
    
       | Metric | Control | Variant | Lift | p-value | Significant? |
       |---|---|---|---|---|---|
       | [Primary] | X% | Y% | +Z% | 0.0X | Yes/No |
       | [Guardrail] | ... | ... | ... | ... | ... |
    
       **Recommendation**: [Ship / Extend / Stop / Investigate]
       **Reasoning**: [Why]
       **Next steps**: [What to do]

    Think step by step. Save as markdown. Generate Python scripts for calculations if raw data is provided.
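One way to fill in the summary template programmatically is sketched below. The function `render_summary`, its parameters, and the row dictionary keys are hypothetical names chosen for illustration. Rates are assumed to be fractions (0.05 for 5%), and significance is assumed to mean p < 0.05.

```python
def render_summary(test_name, hypothesis, days, n_control, n_variant,
                   rows, recommendation, reasoning, next_steps, alpha=0.05):
    """Render the analysis summary as a Markdown string.

    rows: list of dicts with keys metric, control, variant, lift_pct, p_value.
    """
    lines = [
        f"## A/B Test Results: {test_name}",
        "",
        f"**Hypothesis**: {hypothesis}",
        f"**Duration**: {days} days | **Sample**: "
        f"{n_control:,} control / {n_variant:,} variant",
        "",
        "| Metric | Control | Variant | Lift | p-value | Significant? |",
        "|---|---|---|---|---|---|",
    ]
    for row in rows:
        significant = "Yes" if row["p_value"] < alpha else "No"
        lines.append(
            f"| {row['metric']} | {row['control']:.2%} | {row['variant']:.2%} "
            f"| {row['lift_pct']:+.1f}% | {row['p_value']:.3f} | {significant} |"
        )
    lines += [
        "",
        f"**Recommendation**: {recommendation}",
        f"**Reasoning**: {reasoning}",
        f"**Next steps**: {next_steps}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up values for illustration only.
    report = render_summary(
        "New checkout button", "A clearer CTA increases checkout conversion",
        14, 10_000, 10_000,
        [{"metric": "Checkout conversion", "control": 0.05, "variant": 0.058,
          "lift_pct": 16.0, "p_value": 0.0123}],
        "Ship", "Significant lift with no guardrail issues", "Roll out to 100%",
    )
    print(report)
```

The returned string can be written to a `.md` file to satisfy the "save as markdown" step.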

