A/B Test Setup - A Rigorous Guide to A/B Test Design and Execution
Skill Overview
A/B Test Setup is a structured set of guidelines for designing A/B tests. Through mandatory hypothesis locking, metric definition, and execution-readiness checks, it ensures every experiment is statistically rigorous and operationally ready from the design phase, preventing invalid experiments caused by design flaws.
Applicable Scenarios
Core Features
Frequently Asked Questions
How much traffic is needed to run an A/B test?
Required traffic depends on the baseline conversion rate and the minimum detectable effect (MDE). For example, if the baseline conversion rate is 5% and you want to detect a 10% relative increase (i.e., an absolute increase of 0.5 percentage points), then at a 95% confidence level and 80% statistical power, each variant needs roughly 31,000 samples. If daily traffic is limited, you may need to extend the test duration or reconsider whether the MDE is realistic.
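Under the usual normal approximation for a two-sided, two-proportion z-test, that calculation can be sketched in Python (the function name and defaults are illustrative, not part of this skill):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(p_base, relative_mde, alpha=0.05, power=0.80):
    """Per-variant sample size for a two-sided two-proportion z-test
    (normal approximation)."""
    p_var = p_base * (1 + relative_mde)        # expected variant rate
    delta = p_var - p_base                     # absolute effect to detect
    p_bar = (p_base + p_var) / 2               # average rate under H1
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for power = 0.80
    n = 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2
    return ceil(n)

# 5% baseline, 10% relative lift (0.5 pp absolute), 95% confidence, 80% power
print(sample_size_per_variant(0.05, 0.10))  # → 31235 per variant
```

Halving the MDE roughly quadruples the required sample size (it scales as 1/delta²), which is why an overly ambitious MDE can make a test infeasible on limited traffic.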
If I see clearly positive results during the experiment, can I stop early?
No. Stopping early undermines the statistical validity of the experiment: repeatedly peeking at results and stopping at the first significant reading inflates the false-positive rate well above the nominal level, so results that look good mid-test may simply be noise. You must adhere to the sample size determined during the design phase unless the experiment is terminated early due to a severe guardrail-metric failure. This skill explicitly reminds teams not to stop early just because it "looks good."
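To see why peeking inflates false positives, here is a small self-contained simulation (all names and parameters are illustrative). It runs A/A tests in which both arms have an identical 5% conversion rate, yet an experimenter who checks a z-test ten times and stops at the first "significant" reading declares a winner far more often than the nominal 5%:

```python
import random
from math import sqrt

def peeking_false_positive_rate(n_experiments=1000, n_per_arm=1000,
                                checks=10, seed=0):
    """Fraction of A/A tests (no real effect) declared 'significant'
    by an experimenter who peeks `checks` times and stops at the
    first z-score beyond 1.96."""
    rng = random.Random(seed)
    p = 0.05                      # identical 5% conversion rate in both arms
    step = n_per_arm // checks
    hits = 0
    for _ in range(n_experiments):
        a = b = n = 0
        for _ in range(checks):
            for _ in range(step):
                a += rng.random() < p
                b += rng.random() < p
            n += step
            pooled = (a + b) / (2 * n)
            if 0 < pooled < 1:
                se = sqrt(2 * pooled * (1 - pooled) / n)
                if abs(a / n - b / n) / se > 1.96:  # "looks good" → stop early
                    hits += 1
                    break
    return hits / n_experiments

# With ten peeks, the realized false-positive rate typically lands several
# times above the nominal 5%, even though nothing differs between the arms.
print(peeking_false_positive_rate())
```

If a team genuinely needs interim looks, the principled alternative is a sequential design with adjusted thresholds, not ad-hoc early stopping.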
What are guardrail metrics and why are they important?
Guardrail metrics are key indicators that must not significantly decline during an experiment; they prevent systemic harm from "local optimizations." For example, a redesign that increases click-through rate might reduce user retention or increase load times. If guardrail metrics show significant negative changes, even if the primary metric wins, that variant should not be promoted. This skill requires guardrail metrics to be clearly defined during the experiment design phase and gives them veto power during result analysis.
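The veto rule described above can be sketched as a small decision helper (the `MetricResult` type, metric names, and the first-result-is-primary convention are hypothetical, not an API defined by this skill):

```python
from dataclasses import dataclass

@dataclass
class MetricResult:
    name: str
    delta: float          # relative change vs. control (e.g. +0.08 = +8%)
    p_value: float
    is_guardrail: bool = False

def ship_decision(results, alpha=0.05):
    """Guardrails hold veto power: any statistically significant decline
    on a guardrail blocks the launch, regardless of the primary metric.
    Convention assumed here: results[0] is the primary metric."""
    for r in results:
        if r.is_guardrail and r.delta < 0 and r.p_value < alpha:
            return f"HOLD: guardrail '{r.name}' significantly degraded"
    primary = results[0]
    if primary.delta > 0 and primary.p_value < alpha:
        return "SHIP: primary metric won and guardrails are intact"
    return "NO-SHIP: primary metric did not significantly improve"

# CTR wins, but the retention guardrail significantly declines: veto fires.
results = [
    MetricResult("click_through_rate", +0.08, 0.01),
    MetricResult("d7_retention", -0.03, 0.02, is_guardrail=True),
]
print(ship_decision(results))
```

Defining the guardrail list and its significance threshold at design time, as the skill requires, is what makes this veto mechanical rather than a post-hoc negotiation.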