Sample Size Calculation

Introduction

Fundamentally, research seeks to uncover new knowledge.

Sample size calculation determines the optimal number of participants required for a study: enough to answer the research question, but no more than necessary.

  • Striking this balance ensures sufficient statistical power to detect meaningful effects while minimizing unnecessary resource expenditure and ethical costs, particularly in areas like drug trials.
  • Moreover, sample size affects statistical significance: in a large sample, even a small effect can become statistically significant, whereas in a small sample, even a large effect may not reach significance.
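To illustrate the second point, here is a minimal sketch that assumes a two-sample z-test on a standardized mean difference (difference in means divided by a common standard deviation of 1). The function name and the chosen effect sizes and group sizes are hypothetical; a t-test would give an even larger p-value for the 5-per-group case.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(standardized_effect, n_per_group):
    """Two-sided p-value of a z-test for an observed standardized mean difference.

    Assumes both groups have SD = 1, so the standard error of the
    difference is sqrt(1/n + 1/n) = sqrt(2/n).
    """
    z = standardized_effect / sqrt(2 / n_per_group)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Small effect (0.1 SD) in a large study versus large effect (0.8 SD) in a tiny one
print(f"0.1 SD, 2000 per group: p = {two_sided_p(0.1, 2000):.4f}")
print(f"0.8 SD, 5 per group:    p = {two_sided_p(0.8, 5):.4f}")
```

The small effect is highly significant while the much larger effect is not, purely because of the sample sizes.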

Numerous statistical software packages are available (e.g., SPSS, R, PS: Power and Sample Size Calculation), and the software used should be clearly specified in the research report.



Key Elements of Sample Size Calculation

Type of research question and associated designs

  • Therapeutic
    • Randomized controlled trials (RCTs) for binomial (e.g., success/failure) or continuous outcomes (e.g., blood pressure).
  • Aetiology
    • Cohort or case-control studies, with considerations for confounders and effect size.
  • Diagnostic
    • Cross-sectional studies, focusing on sensitivity and specificity.
  • Descriptive
    • Surveys and observational studies that estimate a frequency (e.g. prevalence) to a given precision, often requiring larger sample sizes.

Study variables

  • Exposures
    • Independent variables influencing the outcome.
  • Outcomes
    • Dependent variables measured in the study, such as the rate of an event occurring (binomial) or a continuous measurement.

Standard items

  • Power
    • Probability of detecting a true effect (typically 80%); equal to 1 - β, where β is the Type II error rate.
  • Alpha level (α)
    • Type I error rate (usually 0.05), representing the probability of rejecting the null hypothesis when it is true.
  • Effect size
    • Magnitude of the difference or association to be detected.
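In most sample size formulas the three standard items combine through a single multiplier, (z_{1-α/2} + z_{1-β})², divided by the squared effect size. The sketch below assumes a two-sided test and uses a hypothetical function name; it relies only on Python's standard-library NormalDist. Because the effect size enters squared, halving it roughly quadruples the required sample size.

```python
from statistics import NormalDist


def z_multiplier(alpha=0.05, power=0.80):
    """Return (z_{1-alpha/2} + z_{1-beta})^2 for a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the Type I error rate
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    return (z_alpha + z_beta) ** 2


for alpha, power in [(0.05, 0.80), (0.05, 0.90), (0.01, 0.80), (0.01, 0.90)]:
    print(f"alpha={alpha}, power={power:.0%}: multiplier = {z_multiplier(alpha, power):.2f}")
```

Moving from 80% to 90% power at α = 0.05 raises the multiplier from about 7.85 to about 10.51, i.e. roughly a third more participants.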



Randomized Controlled Trials

For binomial outcomes, the chi-square test is commonly used.

  • The event rates, typically denoted as p0 (control group) and p1 (intervention group), are often derived from previous research or clinical expertise.
  • The alternative hypothesis may be expressed in terms of either two proportions or relative risk.
  • The ratio of control subjects to intervention subjects is represented by m.
  • Detecting small differences or ruling out rare events (e.g. uncommon adverse effects) in RCTs often requires larger sample sizes.

[Figure: Chi-square test for RCT sample size estimation]
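The inputs above (p0, p1, m, α and power) can be combined with one common normal-approximation formula for comparing two proportions, without continuity correction. This is a minimal sketch under that assumption, with a hypothetical function name and illustrative event rates; PS and other packages may apply a continuity correction and report somewhat larger numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_sample_size(p0, p1, alpha=0.05, power=0.80, m=1.0):
    """Return (n_intervention, n_control) for comparing two proportions.

    p0: expected event rate in the control group
    p1: expected event rate in the intervention group
    m:  ratio of control subjects to intervention subjects
    Normal approximation, two-sided test, no continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + m * p0) / (1 + m)  # pooled event rate under the null hypothesis
    numerator = (z_alpha * sqrt((1 + 1 / m) * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p0 * (1 - p0) / m)) ** 2
    n_intervention = numerator / (p1 - p0) ** 2
    return ceil(n_intervention), ceil(m * n_intervention)


print(two_proportion_sample_size(0.20, 0.10))       # (199, 199) with 1:1 allocation
print(two_proportion_sample_size(0.20, 0.10, m=2))  # (155, 309) with two controls per intervention subject
```

Unequal allocation (m = 2) lowers the intervention arm but raises the total, so 1:1 is usually the most efficient design when both arms are equally easy to recruit.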

For continuous outcomes, the t-test is commonly used to compare means between two groups.

  • The effect size (δ) represents the difference in means between these groups.
  • The standard deviation (σ), often derived from previous studies, measures the variability within the groups.
  • Continuous outcomes typically require smaller sample sizes than binomial outcomes to detect comparable effects. The standard calculation assumes equal variances (a common σ) in both groups.
  • Importantly, the t-test assumes that the data are approximately normally distributed within each group, so it is most suitable when the standard deviation is relatively narrow; a wide standard deviation relative to the mean often signals skewed data.

[Figure: t-test for RCT sample size estimation]
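For continuous outcomes, a minimal sketch uses the normal-approximation formula n = (1 + 1/m) σ² (z_{1-α/2} + z_{1-β})² / δ² for the intervention group, with m as the ratio of controls to intervention subjects. The function name and the example values (δ = 5, σ = 10) are hypothetical. Software based on the t distribution, such as PS, may report one or two more participants per group.

```python
from math import ceil
from statistics import NormalDist


def two_mean_sample_size(delta, sigma, alpha=0.05, power=0.80, m=1.0):
    """Return (n_intervention, n_control) to detect a difference in means delta.

    Assumes a common standard deviation sigma in both groups (equal variances)
    and a two-sided test; uses the normal approximation to the t-test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n_intervention = (1 + 1 / m) * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return ceil(n_intervention), ceil(m * n_intervention)


print(two_mean_sample_size(delta=5, sigma=10))    # (63, 63)
print(two_mean_sample_size(delta=2.5, sigma=10))  # halving delta roughly quadruples n
```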

To account for anticipated drop-outs or loss to follow-up, researchers should typically inflate the sample size by 10-20%.

  • However, to ensure adequate power and precision, the exact adjustment should be tailored to the specific study context, considering the expected dropout rate.
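One common way to apply the adjustment is to divide the calculated sample size by (1 - expected dropout proportion), so that the target number remains after the dropouts; simply multiplying by (1 + dropout) falls slightly short. A minimal sketch, with a hypothetical function name:

```python
from math import ceil


def inflate_for_dropout(n, dropout):
    """Recruitment target so that about n participants remain after dropout.

    If a fraction `dropout` is lost, n / (1 - dropout) * (1 - dropout) = n remain.
    round() guards against floating-point noise before rounding up.
    """
    if not 0 <= dropout < 1:
        raise ValueError("dropout must be in [0, 1)")
    return ceil(round(n / (1 - dropout), 9))


print(inflate_for_dropout(199, 0.10))  # 222, versus 219 from 199 * 1.10
```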



Cohort Design

A cohort study starts with a group of people who do not have the disease of interest and follows them over time to investigate whether exposure to certain factors increases the risk of developing the disease.

Sample size calculations for binomial outcomes in cohort studies follow the same principles as those used in randomized controlled trials (i.e. the chi-square test), but with the additional challenge of accounting for confounders.

  • A common rule of thumb, especially when cases are few, is at least 10 outcome events for each confounder included in the analysis.
  • A categorical confounder counts as the number of its categories minus one under this rule.
Consequently, multiple confounders can substantially inflate the necessary sample size.

  • To address this challenge, researchers can consider strategies such as composite outcomes, adopting a case-control design (if feasible), or carefully considering recruitment rates based on disease prevalence and study duration.
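The events-per-confounder rule of thumb can be turned into a rough minimum cohort size: count the model parameters the confounders use, multiply by 10 to get the events needed, and divide by the expected risk of the outcome. This is a minimal sketch; the function names, the three confounders and the 5% outcome risk are hypothetical, and whether the exposure term itself should also be counted is a modelling choice not covered here.

```python
from math import ceil


def parameters_for(categories=None):
    """Parameters a confounder uses: 1 if continuous (None), else categories - 1."""
    return 1 if categories is None else categories - 1


def min_cohort_size(confounders, outcome_risk, events_per_parameter=10):
    """Return (events_needed, participants_needed) under the events-per-confounder rule."""
    parameters = sum(parameters_for(c) for c in confounders.values())
    events_needed = events_per_parameter * parameters
    # round() guards against floating-point noise before rounding up
    return events_needed, ceil(round(events_needed / outcome_risk, 9))


# Hypothetical confounders: age (continuous), sex (2 categories), smoking (3 categories)
confounders = {"age": None, "sex": 2, "smoking": 3}
print(min_cohort_size(confounders, outcome_risk=0.05))  # (40, 800)
```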

Similarly, additional participants should be recruited to account for anticipated drop-outs or loss to follow-up.

NOTE: When calculating sample sizes for different aspects of a study (e.g., primary outcome, secondary outcomes, subgroup analysis, or accounting for confounders), it is essential to choose the largest calculated sample size. This ensures adequate power for all planned analyses.



Case Control

Case-control studies retrospectively compare exposure histories between individuals with a disease (cases) and those without (controls).

  • The alternative hypothesis can be expressed as either two proportions or an odds ratio (ψ).
  • The two proportions compared are the exposure rates among cases and controls.
[Figure: Case-control sample size estimation]
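When the alternative hypothesis is stated as an odds ratio ψ, it can be converted into the two exposure proportions: with exposure p0 among controls, the exposure among cases is p1 = ψ p0 / (1 + p0 (ψ - 1)). The two proportions then go into the same normal-approximation formula as for an RCT (no continuity correction). A minimal sketch with hypothetical function names and illustrative inputs (p0 = 0.30, ψ = 2):

```python
from math import ceil, sqrt
from statistics import NormalDist


def exposure_in_cases(p0, odds_ratio):
    """Exposure proportion among cases implied by exposure p0 in controls and the odds ratio."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


def case_control_sample_size(p0, odds_ratio, alpha=0.05, power=0.80, m=1.0):
    """Return (n_cases, n_controls); m is the number of controls per case."""
    p1 = exposure_in_cases(p0, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + m * p0) / (1 + m)  # pooled exposure proportion under the null hypothesis
    n_cases = (z_alpha * sqrt((1 + 1 / m) * p_bar * (1 - p_bar))
               + z_beta * sqrt(p1 * (1 - p1) + p0 * (1 - p0) / m)) ** 2 / (p1 - p0) ** 2
    return ceil(n_cases), ceil(m * n_cases)


print(case_control_sample_size(p0=0.30, odds_ratio=2.0))  # (141, 141)
```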

Although case-control studies are retrospective, confounders must be considered just as carefully as in cohort studies.

For rare diseases (outcomes), where the number of available cases is limited, increasing the control-to-case ratio (m) can enhance study power, but the gains diminish beyond a ratio of about four.
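The diminishing returns can be seen with a standard approximation: relative to unlimited controls, m controls per case gives an efficiency of about m / (m + 1), and the number of cases needed falls to about n₁ (m + 1) / (2m), where n₁ is the requirement at 1:1. A minimal sketch, assuming that approximation, hypothetical function names, and an illustrative 1:1 requirement of 141 cases:

```python
from math import ceil


def relative_efficiency(m):
    """Approximate efficiency of m controls per case relative to unlimited controls."""
    return m / (m + 1)


def cases_needed(n_cases_1to1, m):
    """Approximate cases needed with m controls per case, given the 1:1 requirement.

    Keeping the variance of the comparison equal requires
    n * (1 + 1/m) = 2 * n_1to1, i.e. n = n_1to1 * (m + 1) / (2m).
    """
    return ceil(n_cases_1to1 * (m + 1) / (2 * m))


for m in range(1, 7):
    n = cases_needed(141, m)
    print(f"m={m}: efficiency={relative_efficiency(m):.2f}, cases={n}, controls={m * n}")
```

Going from 1 to 4 controls per case cuts the cases needed from 141 to 89, while further increases save only a few cases at the cost of many more controls.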

Similarly, additional participants should be recruited to account for anticipated non-response or missing data.



Descriptive Study

Prevalence and incidence are measures of disease frequency within a population.

  • Prevalence represents the proportion of individuals with a disease at a specific point in time, encompassing both new and existing cases.
  • Incidence, on the other hand, measures the rate of new cases occurring within a defined population and time period.

Epi Info can be used to calculate sample sizes for descriptive studies.

  • Within the StatCalc module, select the "Population Survey" option.
  • The expected frequency (%) of the outcome variable, typically sourced from existing literature, is a crucial input.
  • For simple random sampling, design effect and cluster size can be set to 1.
  • The acceptable margin of error, typically set at 5%, determines the study's precision. A narrower margin of error necessitates a larger sample size.

[Figure: StatCalc in Epi Info]
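The inputs listed above (expected frequency, margin of error, confidence level, design effect and, optionally, population size) feed a standard textbook formula for estimating a proportion: n₀ = z² p (1 - p) / d², adjusted for a finite population N as n₀ N / (N - 1 + n₀) and then multiplied by the design effect. It is assumed, not confirmed here, that this matches StatCalc's calculation; its output may differ slightly. The function name and example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def survey_sample_size(expected_pct, margin_pct=5.0, confidence=0.95,
                       population=None, design_effect=1.0):
    """Sample size to estimate a proportion within +/- margin_pct percentage points."""
    p = expected_pct / 100
    d = margin_pct / 100
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * p * (1 - p) / d ** 2  # very large (infinite) population
    if population is not None:
        n = n * population / (population - 1 + n)  # finite population correction
    return ceil(n * design_effect)


print(survey_sample_size(50))                    # 385
print(survey_sample_size(50, population=10000))  # 370
```

An expected frequency of 50% gives the largest (most conservative) sample size when no prior estimate is available.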

NOTE: Accurate estimation of rare occurrences generally demands substantially larger samples.
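One way to see why: a fixed absolute margin such as 5% is useless for a 1% prevalence, so the margin is often set as a fraction of the expected proportion instead. The sketch below assumes a hypothetical relative margin of 20% of p and 95% confidence; the function name is also hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_for_relative_precision(p, relative_margin=0.20, confidence=0.95):
    """Sample size so the margin of error is a fixed fraction of the expected proportion p."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    d = relative_margin * p  # absolute margin shrinks as p gets rarer
    return ceil(z ** 2 * p * (1 - p) / d ** 2)


for p in (0.20, 0.05, 0.01):
    print(f"p={p:.0%}: n={n_for_relative_precision(p)}")
```

With the same relative precision, a 1% prevalence needs roughly 25 times as many participants as a 20% prevalence.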



Summary

For complex studies (e.g. surveys or diagnostic test research), consulting a statistician is strongly recommended.

  • Their expertise ensures that appropriate statistical methods are employed, enhancing the study's credibility and rigor.

Sample size calculation is also a key element when critically appraising a research paper.

  • An insufficient sample size can leave even a clinically significant effect statistically non-significant, making the result inconclusive.


