Understanding Polygenic Scores
A polygenic score (PGS) predicts an individual's phenotype (e.g., height, disease risk)
from their genotype using effect sizes estimated from genome-wide association studies (GWAS).
How PGS Works
The polygenic score is a weighted sum over the variants included in the score: PGS = Σᵢ β̂ᵢ xᵢ
where β̂ᵢ is the effect size of variant i estimated in the GWAS discovery cohort and xᵢ is the individual's genotype at that variant (0, 1, or 2 copies of the effect allele).
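As a minimal sketch, assuming NumPy and genotypes stored as an individuals × variants matrix of effect-allele counts, the weighted sum is a single matrix-vector product. The function name `polygenic_score` and the numbers in the example are made up for illustration.

```python
import numpy as np

def polygenic_score(genotypes, beta_hat):
    """Hypothetical helper: PGS_j = sum_i beta_hat_i * x_ji for each individual j.

    genotypes: (n_individuals, n_variants) array of 0/1/2 effect-allele counts
    beta_hat:  (n_variants,) effect sizes from the GWAS discovery cohort
    """
    genotypes = np.asarray(genotypes, dtype=float)
    beta_hat = np.asarray(beta_hat, dtype=float)
    if genotypes.shape[1] != beta_hat.shape[0]:
        raise ValueError("need exactly one effect size per variant")
    # Matrix-vector product = weighted sum of allele counts per individual
    return genotypes @ beta_hat

# Three individuals scored on three variants (illustrative values only)
X = np.array([[0, 1, 2],
              [2, 2, 0],
              [1, 0, 1]])
print(polygenic_score(X, [0.10, -0.05, 0.20]))
```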
Prediction Accuracy (R²)
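- R² = correlation(true phenotype, PGS)², the proportion of phenotypic variance explained by the score
- R² ≤ h² in expectation, where h² is the heritability (a score cannot predict more than the genetic component; in a finite test sample R² can exceed h² slightly by chance)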
- R² → h² as the GWAS sample size N → ∞, because larger N reduces sampling noise in the effect size estimates β̂ᵢ
- More causal variants (M) require a larger N to estimate all effects accurately, because the same heritability is spread across more, smaller effects
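These trends can be made concrete with a widely used approximation derived from Daetwyler et al. (2008), which assumes M independent causal variants that are all in the score, small effects of similar size, and a target sample from the same population as the GWAS: R² ≈ h² / (1 + M / (N h²)). The helper `expected_r2` below is a hypothetical sketch of that formula, not code from the simulation.

```python
def expected_r2(h2: float, n_gwas: float, m_causal: float) -> float:
    """Approximate expected PGS R^2 in the discovery population.

    Assumes independent, fully scored causal variants with small effects;
    this is an approximation, not an exact result.
    """
    if not 0 < h2 <= 1:
        raise ValueError("h2 must be in (0, 1]")
    if n_gwas <= 0 or m_causal <= 0:
        raise ValueError("n_gwas and m_causal must be positive")
    return h2 / (1 + m_causal / (n_gwas * h2))

h2 = 0.5
for n in (1_000, 10_000, 100_000, 1_000_000):
    # Compare a less polygenic (M = 1,000) and a more polygenic (M = 10,000) trait
    print(n, round(expected_r2(h2, n, 1_000), 3), round(expected_r2(h2, n, 10_000), 3))
```

Because N and M enter only through the ratio M/N, a tenfold increase in M needs a tenfold increase in N to keep R² unchanged.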
Portability Problem
PGS trained in one population (e.g., European ancestry) often perform poorly in other populations
(e.g., African, East Asian) due to:
- Different allele frequencies: Effect alleles common in one population may be rare in another
- Different LD structure: Linkage disequilibrium patterns vary across populations, so tag SNPs don't track causal variants equally well
- Different genetic architecture: Effect sizes may differ due to gene-environment or gene-gene interactions
In this simulation, selecting "Different ancestry" models portability loss by changing the allele frequencies and adding noise to the effect-phenotype relationship.
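The "Different ancestry" option can be sketched as follows, assuming NumPy and a deliberately simple model: independent causal variants scored directly (no LD or tagging), a random shift of the discovery allele frequencies, and Gaussian noise added to the true effects in the target population. The function `simulate_target_r2`, its parameters, and its default values are hypothetical choices for illustration, not the simulation's actual code.

```python
import numpy as np

def simulate_target_r2(m_variants=200, n_target=5_000, h2=0.5,
                       freq_shift_sd=0.0, effect_noise_sd=0.0, seed=0):
    """Hypothetical portability sketch: score a target cohort with discovery effects.

    freq_shift_sd:   SD of the random shift added to discovery allele frequencies
    effect_noise_sd: SD of the noise added to the true effects in the target,
                     as a multiple of the SD of the discovery effects
    """
    rng = np.random.default_rng(seed)
    p_disc = rng.uniform(0.05, 0.95, m_variants)     # discovery allele frequencies
    beta = rng.normal(0.0, 1.0, m_variants)          # discovery (true) effects

    # Target frequencies: discovery frequencies plus a random shift, kept in (0, 1)
    p_target = np.clip(p_disc + rng.normal(0.0, freq_shift_sd, m_variants), 0.01, 0.99)
    # Target effects: discovery effects plus noise (stand-in for G x E or G x G)
    beta_target = beta + rng.normal(0.0, effect_noise_sd * beta.std(), m_variants)

    X = rng.binomial(2, p_target, size=(n_target, m_variants))
    g = X @ beta_target                              # genetic value in the target
    e = rng.normal(0.0, np.sqrt(g.var() * (1 - h2) / h2), n_target)
    y = g + e                                        # phenotype with heritability ~h2
    pgs = X @ beta                                   # score uses discovery effects
    return np.corrcoef(y, pgs)[0, 1] ** 2

print(simulate_target_r2())                                         # same ancestry
print(simulate_target_r2(freq_shift_sd=0.15, effect_noise_sd=1.0))  # different ancestry
```

Because the causal variants are scored with their true discovery effects, leaving `effect_noise_sd=0` makes the score equal the target genetic value, so the frequency shift on its own costs no accuracy in this sketch; the drop comes from the perturbed effects. In real data, frequency differences reduce accuracy mainly through estimation error and tagging, which this sketch leaves out.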
Experiment
- Increase the GWAS sample size (N) to see R² approach h²
- Toggle sampling noise off to see the theoretical maximum, R² ≈ h², reached with perfect effect estimates
- Increase the number of causal variants (M) to see how a more polygenic architecture lowers R² at a fixed N
- Switch to "Different ancestry" to see portability loss
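The first three experiments can be approximated in a few lines, as a sketch under stated assumptions: independent causal variants, standardized genotypes, a phenotype with unit variance, and GWAS estimation replaced by adding Gaussian noise with variance 1/N to each true effect (roughly the sampling variance of a marginal regression coefficient in that setting). `simulate_pgs_r2` and its defaults are hypothetical, not the simulation's own implementation.

```python
import numpy as np

def simulate_pgs_r2(n_gwas=10_000, m_causal=500, h2=0.5,
                    sampling_noise=True, n_test=3_000, seed=1):
    """Hypothetical sketch: R^2 of a PGS in a held-out cohort from the same population."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(0.05, 0.95, m_causal)                     # allele frequencies
    beta = rng.normal(0.0, np.sqrt(h2 / m_causal), m_causal)  # true effects; sum of squares ~h2

    beta_hat = beta.copy()
    if sampling_noise:
        # Stand-in for GWAS estimation error: variance ~1/N per standardized effect
        beta_hat += rng.normal(0.0, np.sqrt(1.0 / n_gwas), m_causal)

    x = rng.binomial(2, p, size=(n_test, m_causal))           # 0/1/2 genotypes
    z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))                # standardized genotypes
    g = z @ beta                                              # genetic value
    y = g + rng.normal(0.0, np.sqrt(1 - h2), n_test)          # phenotype variance ~1
    pgs = z @ beta_hat                                        # score from estimated effects
    return np.corrcoef(y, pgs)[0, 1] ** 2

for n in (1_000, 10_000, 100_000):
    print("N =", n, round(simulate_pgs_r2(n_gwas=n), 3))
print("no sampling noise:", round(simulate_pgs_r2(sampling_noise=False), 3))
print("M = 2,000:", round(simulate_pgs_r2(m_causal=2_000), 3))
```

Under these assumptions, R² should rise toward h² as `n_gwas` grows, sit near h² with `sampling_noise=False`, and drop when `m_causal` increases at a fixed `n_gwas`.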