scientific-critical-thinking
Evaluate scientific claims and evidence quality. Use for assessing experimental design validity, identifying biases and confounders, applying evidence grading frameworks (GRADE, Cochrane Risk of Bias), or teaching critical analysis. Best for understanding evidence quality and identifying flaws. For writing formal peer reviews, use peer-review.
Scientific Critical Thinking
Skill Overview
A professional tool for systematically evaluating the quality of scientific research: reviewing experimental design, identifying biases, assessing levels of evidence, and applying the GRADE and Cochrane frameworks to judge the reliability of conclusions.
Applicable Scenarios
Core Functions
Frequently Asked Questions
What is the difference between scientific critical thinking and formal peer review?
Scientific critical thinking is mainly used for personal research evaluation, judging evidence quality, or internal quality checks before peer review. It does not produce formal review reports or feedback for authors. If you need to write an official peer review report, you should use the peer-review skill.
How can you judge whether an observational study supports causal conclusions?
Observational studies alone cannot directly prove causation. You need to assess: temporal order (whether the cause precedes the effect), whether there is a dose-response relationship, whether confounding factors are controlled, whether the biological mechanism is plausible, and whether different studies are consistent. If a paper uses correlational language ("correlated", "associated") to state causal conclusions, that is a red flag.
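The criteria in the answer above can be captured as a simple screening checklist. This is a minimal sketch: the criterion names and the `screen_causal_claim` helper are illustrative, not a formal scoring instrument.

```python
# Hypothetical checklist for screening an observational study's causal claim.
# Criterion names mirror the answer above; this is not an official framework.
CAUSAL_CRITERIA = [
    "temporal_order",          # does the putative cause precede the effect?
    "dose_response",           # does more exposure mean more effect?
    "confounders_controlled",  # are major confounders measured and adjusted?
    "plausible_mechanism",     # is there a credible biological mechanism?
    "consistency",             # do independent studies agree?
]

def screen_causal_claim(assessment: dict) -> str:
    """Return a coarse verdict from per-criterion True/False judgments."""
    unmet = [c for c in CAUSAL_CRITERIA if not assessment.get(c, False)]
    if not unmet:
        return "causal language may be defensible (still not proof)"
    return "treat as associational; unmet criteria: " + ", ".join(unmet)

verdict = screen_causal_claim({
    "temporal_order": True,
    "dose_response": True,
    "confounders_controlled": False,  # e.g. no adjustment for smoking status
    "plausible_mechanism": True,
    "consistency": True,
})
print(verdict)  # flags confounders_controlled as unmet
```

Even a fully checked list does not prove causation; it only indicates that causal language is less likely to be a red flag.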
What are the "downgrade" criteria in GRADE assessment?
The GRADE system starts from study design type (RCTs initially rated as high quality, observational studies initially as low quality), then considers five downgrading factors: risk of bias (randomization, blinding, adequacy of follow-up), inconsistency (conflicting results across studies), indirectness (whether population, intervention, outcomes align with the target), imprecision (wide confidence intervals, small sample sizes), and publication bias. Each serious problem leads to one level of downgrade.
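The starting-level and downgrade rules described above can be sketched mechanically. The numeric mapping (4 = high down to 1 = very low) and the function name are illustrative conveniences, not part of any official GRADE tooling.

```python
# Illustrative sketch of GRADE's starting level plus one-level downgrades.
LEVELS = {4: "high", 3: "moderate", 2: "low", 1: "very low"}

DOWNGRADE_FACTORS = (
    "risk_of_bias", "inconsistency", "indirectness",
    "imprecision", "publication_bias",
)

def grade_rating(is_rct: bool, serious_concerns: set) -> str:
    """RCTs start at high (4), observational studies at low (2);
    each serious concern drops the rating one level, floored at very low."""
    level = 4 if is_rct else 2
    for factor in DOWNGRADE_FACTORS:
        if factor in serious_concerns:
            level -= 1
    return LEVELS[max(level, 1)]

# An RCT with serious risk of bias and imprecision ends up two levels down.
print(grade_rating(True, {"risk_of_bias", "imprecision"}))  # low
```

Real GRADE assessments also allow upgrading observational studies (e.g. for large effects), which this sketch omits.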
How can you identify p-hacking and selective reporting in studies?
Check whether all pre-specified outcomes are reported (compare the study registry protocol with the published paper), whether multiple analyses were performed but only significant results reported, whether hypotheses were modified after seeing the results (HARKing), whether excessive subgroup analyses were conducted without multiple-comparison correction, and whether p-values are suspiciously clustered around 0.05.
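Two of the checks above are mechanical enough to sketch in code: comparing the registry's pre-specified outcomes against the published ones, and measuring how many p-values cluster just under 0.05. The function names and the 0.04–0.05 window are illustrative assumptions.

```python
# Sketch of two mechanical selective-reporting checks; names are hypothetical.

def unreported_outcomes(registered, published):
    """Outcomes pre-specified in the study registry but absent from the paper."""
    return [o for o in registered if o not in published]

def suspicious_p_fraction(p_values, lo=0.04, hi=0.05):
    """Fraction of reported p-values falling just under the 0.05 threshold."""
    just_under = [p for p in p_values if lo <= p < hi]
    return len(just_under) / len(p_values)

missing = unreported_outcomes(
    registered=["mortality", "hospitalization", "quality_of_life"],
    published=["hospitalization"],
)
print(missing)  # ['mortality', 'quality_of_life']

print(suspicious_p_fraction([0.048, 0.041, 0.049, 0.30]))  # 0.75
```

A high just-under-threshold fraction is a prompt for closer reading, not proof of misconduct; a handful of p-values can cluster by chance.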
Can results from small-sample studies be trusted?
Even if statistically significant, small-sample studies should be treated cautiously: effect sizes may be exaggerated, confidence intervals are often wide, and studies may be severely underpowered. Pay attention to whether an a priori power analysis was conducted, whether the effect size is practically meaningful, whether other studies replicate the finding, and whether results are overinterpreted. A single small-sample study typically provides only low-quality evidence.
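The "wide confidence intervals" point above follows directly from how the standard error scales with sample size: for a fixed standard deviation, the 95% CI half-width for a mean shrinks as 1/sqrt(n). A minimal sketch with illustrative numbers:

```python
import math

def ci_half_width(sd, n, z=1.96):
    """Approximate 95% CI half-width for a sample mean (normal approximation)."""
    return z * sd / math.sqrt(n)

# With sd = 10, the interval half-width drops tenfold only when n grows 100x.
for n in (10, 100, 1000):
    print(n, round(ci_half_width(sd=10.0, n=n), 2))
# 10   6.2
# 100  1.96
# 1000 0.62
```

This is also why a "significant" small-n result can still be practically uninformative: the interval may span everything from a negligible to a dramatic effect.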