theory.toys – Polygenic Score

Genetic Architecture

Heritability (h²): 0.5
Number of causal variants (M):

GWAS Discovery Cohort

Sample size (N):
Include GWAS sampling noise (realistic)

Test Population & Portability

Test cohort size: 1000

Population:

Display Options

Show regression line
Show y=x identity line

Understanding Polygenic Scores

A polygenic score (PGS) predicts an individual's phenotype (e.g., height, disease risk) from their genotype using effect sizes estimated from genome-wide association studies (GWAS).

How PGS Works

The polygenic score is a weighted sum over the M variants: PGS = Σᵢ β̂ᵢ × xᵢ, where β̂ᵢ is the effect size of variant i estimated in the GWAS discovery cohort and xᵢ is the individual's genotype at that variant (0, 1, or 2 copies of the effect allele).
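The weighted sum above is a matrix-vector product once genotypes are stored as an individuals-by-variants array. The sketch below is illustrative only: the function name `polygenic_score` and the toy genotypes and effect sizes are made up for the example, not taken from the simulation.

```python
import numpy as np

def polygenic_score(genotypes, beta_hat):
    """Return one score per individual.

    genotypes: (n_individuals, M) array of effect-allele counts (0, 1, or 2).
    beta_hat:  (M,) array of GWAS effect-size estimates.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    beta_hat = np.asarray(beta_hat, dtype=float)
    return genotypes @ beta_hat  # sum over variants of beta_hat_i * x_i

# Toy data (hypothetical): two individuals, three variants.
genotypes = np.array([[0, 1, 2],
                      [2, 0, 1]])
beta_hat = np.array([0.10, -0.20, 0.05])

scores = polygenic_score(genotypes, beta_hat)
print(scores)  # first individual: 0*0.10 + 1*(-0.20) + 2*0.05 = -0.10
```

The score is only meaningful relative to other individuals scored with the same β̂ᵢ, which is why accuracy is judged by correlation with the phenotype rather than by the raw value.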

Prediction Accuracy (R²)

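R² = correlation(true phenotype, PGS)² is the fraction of phenotypic variance the score explains
R² ≤ h² in expectation: the score uses only genotypes, so it cannot explain more than the genetic share of the variance (a finite test cohort can land slightly above h² by chance)
R² → h² as GWAS sample size N → ∞, because the β̂ᵢ converge to the true effects
Larger N reduces sampling noise: the sampling variance of each β̂ᵢ shrinks roughly as 1/N
More causal variants (M) require a larger N: with h² fixed, each variant explains about h²/M of the variance, so individual effects are smaller and harder to estimate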
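A widely used approximation for independent causal variants makes the dependence on N and M explicit: R² ≈ h² / (1 + M / (N·h²)). The sketch below compares that formula with a small simulation. It rests on simplifying assumptions that are not stated on this page: variants are independent (no LD), genotypes are standardized, and GWAS noise is drawn directly as N(0, 1/N) per variant instead of running a real association scan. The function names are hypothetical.

```python
import numpy as np

def expected_r2(h2, M, N):
    """Approximate expected R^2 for M independent causal variants and GWAS size N."""
    return h2 / (1.0 + M / (N * h2))

def simulate_r2(h2, M, N, n_test=1000, seed=0):
    """Simulate one test cohort and return corr(phenotype, PGS)^2."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.95, size=M)          # effect-allele frequencies
    beta = rng.normal(0.0, np.sqrt(h2 / M), size=M)  # true effects, total variance ~ h2
    # Assumption: independent sampling noise with variance 1/N per variant.
    beta_hat = beta + rng.normal(0.0, np.sqrt(1.0 / N), size=M)

    x = rng.binomial(2, freqs, size=(n_test, M)).astype(float)  # 0/1/2 genotypes
    z = (x - 2 * freqs) / np.sqrt(2 * freqs * (1 - freqs))      # standardized genotypes
    phenotype = z @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n_test)
    pgs = z @ beta_hat
    return np.corrcoef(phenotype, pgs)[0, 1] ** 2

# Sweep the discovery sample size at h2 = 0.5, M = 100.
for N in (1_000, 10_000, 100_000):
    print(N, round(expected_r2(0.5, 100, N), 3), round(simulate_r2(0.5, 100, N), 3))
```

The simulated values scatter around the formula because the test cohort is finite; that scatter is also why a single run can come out slightly above h².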

Portability Problem

A PGS trained in one population (e.g., European ancestry) often performs poorly in other populations (e.g., African or East Asian ancestry) due to:

Different allele frequencies: Effect alleles common in one population may be rare in another
Different LD structure: Linkage disequilibrium patterns vary across populations, so tag SNPs don't track causal variants equally well
Different genetic architecture: Effect sizes may differ due to gene-environment or gene-gene interactions

In this simulation, selecting "Different ancestry" simulates portability loss by using different allele frequencies and adding noise to the effect-phenotype relationship.
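One way to mimic that kind of portability loss is sketched below. This is not the simulation's actual code: the frequency shift (SD 0.2), the effect perturbation (SD 1.0), the GWAS noise (SD 0.3), and the helper name `cohort_r2` are all hypothetical choices made for illustration.

```python
import numpy as np

def cohort_r2(beta_true, beta_hat, freqs, h2, n_test, rng):
    """R^2 of a PGS in a test cohort with the given allele frequencies."""
    x = rng.binomial(2, freqs, size=(n_test, freqs.size)).astype(float)
    genetic = x @ beta_true
    # Scale environmental noise so the genetic share of variance is h2 in this cohort.
    env_sd = np.sqrt(genetic.var() * (1.0 - h2) / h2)
    phenotype = genetic + rng.normal(0.0, env_sd, size=n_test)
    pgs = x @ beta_hat
    return np.corrcoef(phenotype, pgs)[0, 1] ** 2

rng = np.random.default_rng(1)
M, h2, n_test = 200, 0.5, 1000

freqs = rng.uniform(0.05, 0.95, size=M)          # discovery-population frequencies
beta = rng.normal(0.0, 1.0, size=M)              # true per-allele effects
beta_hat = beta + rng.normal(0.0, 0.3, size=M)   # GWAS estimates (hypothetical noise)

# Same ancestry: same frequencies and same true effects as the discovery cohort.
r2_same = cohort_r2(beta, beta_hat, freqs, h2, n_test, rng)

# Different ancestry (assumed model): shifted frequencies and perturbed effects.
freqs_other = np.clip(freqs + rng.normal(0.0, 0.2, size=M), 0.02, 0.98)
beta_other = beta + rng.normal(0.0, 1.0, size=M)
r2_other = cohort_r2(beta_other, beta_hat, freqs_other, h2, n_test, rng)

print(round(r2_same, 3), round(r2_other, 3))
```

In this sketch most of the drop comes from the perturbed effects, since β̂ᵢ no longer track the target population's true effects. Shifted frequencies alone mainly change which variants contribute most of the variance.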

Experiment

Increase GWAS sample size to see R² approach h²
Toggle sampling noise off to see the theoretical maximum (perfect effect estimates)
Increase number of causal variants to see how polygenic architecture affects prediction
Switch to different ancestry to see portability loss