The massive Human Genome Project launched nearly 30 years ago brought hope that scientists would finally decipher the root cause of human disease. However, reality has not quite met expectations, as scientists soon discovered the human genome is mind-bogglingly complex.
The human genome contains a phenomenal 3 billion base pairs, the building blocks of DNA, so fully understanding human genetics is incredibly difficult, if not impossible.
According to geneticist Adam Rutherford, the author of A Brief History of Everyone Who Ever Lived, we are only just scratching the surface.
Nonetheless, genome-wide association studies or GWAS are increasingly used to study the underlying biological nature of diseases and the variation of human traits.
The first GWAS was published in 2002 (1), and almost twenty years on, studies have advanced dramatically – from scanning the genomes of just a few hundred to now tens of thousands of people.
Genome-wide association studies and genetic risk factors
For nearly two decades, researchers have used GWAS to search for genetic links to diseases and human traits from obesity to cancer, Alzheimer’s to depression, and even height and educational attainment.
By hunting through the genomes of tens of thousands of people, and in one study even a million, scientists have uncovered common genetic variations — for example, single-letter changes in DNA sequences, known as single nucleotide polymorphisms or SNPs — in groups of individuals with the same disease or trait.
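The comparison described above can be sketched for a single SNP. The article does not say which statistical test any given study used, so the following is only an illustrative assumption: a simple allelic test that compares how often a variant allele appears in people with a disease (cases) and without it (controls). The function name and the allele counts are hypothetical. Real studies repeat a test of this kind across very many SNPs and typically use more elaborate models.

```python
def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Return (odds_ratio, chi_square) for a 2x2 table of allele counts.

    case_alt / case_ref: variant and reference allele counts among cases.
    control_alt / control_ref: the same counts among controls.
    """
    # Odds of carrying the variant allele in cases relative to controls.
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)

    total = case_alt + case_ref + control_alt + control_ref
    # Pearson chi-square statistic for a 2x2 table (no continuity correction).
    numerator = total * (case_alt * control_ref - case_ref * control_alt) ** 2
    denominator = (
        (case_alt + case_ref)
        * (control_alt + control_ref)
        * (case_alt + control_alt)
        * (case_ref + control_ref)
    )
    return odds_ratio, numerator / denominator


# Hypothetical counts, for illustration only.
odds_ratio, chi_square = allelic_association(60, 40, 40, 60)
print(odds_ratio, chi_square)
```

An odds ratio above 1 means the variant allele is more common among cases in this made-up sample. It indicates a correlation, not a cause.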
One of the major aims of GWAS is to predict which people or populations are most at risk of developing a certain illness or condition, maybe even before birth – creating a genetic risk profile.
Basically, by tallying up the number of genetic variants or SNPs found in an individual associated with a particular disease, researchers can calculate a polygenic risk score. This is used to predict the likelihood of developing a particular disease or trait.
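As a rough illustration of that tallying, here is a minimal sketch of a polygenic risk score. The SNP names, the weights and the individual's genotype are all hypothetical. The weighted version, in which each variant is scaled by an effect size, is a common refinement and is an assumption here rather than something the studies cited in this article are stated to use.

```python
from typing import Dict


def polygenic_risk_score(genotype: Dict[str, int], weights: Dict[str, float]) -> float:
    """Sum the risk-allele counts (0, 1 or 2 per SNP), each multiplied by its weight."""
    return sum(weight * genotype.get(snp, 0) for snp, weight in weights.items())


# Hypothetical SNP identifiers and effect sizes, for illustration only.
weights = {"snp_a": 0.5, "snp_b": 0.25, "snp_c": 1.0}
unit_weights = {snp: 1.0 for snp in weights}

# Hypothetical individual: number of risk alleles carried at each SNP.
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

simple_tally = polygenic_risk_score(person, unit_weights)  # plain count of risk alleles
weighted_score = polygenic_risk_score(person, weights)     # count scaled by effect size
print(simple_tally, weighted_score)
```

A score like this only ranks individuals relative to one another. On its own it says nothing about environment, which is one reason such scores tend to be weak predictors.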
However, polygenic risk factors are typically weak predictors and their predictive power can vary quite significantly across studies.
For instance, a major focus of GWAS has been educational attainment and socioeconomic status, as well as disorders like obesity and depression which seem to be more prevalent in low-income communities (2).
One such study used polygenic risk scores to examine the link between genetics and neighbourhoods for teen pregnancy and poor educational outcomes (3).
The authors found that children living in worse-off neighbourhoods do not have an increased genetic risk of obesity or mental health disorders. However, future physical and mental health problems could be predicted by both postal code (environment) and genetic code.
Another recent study attempting to determine a genetic basis for risk tolerance or “risky behaviour” found that differences between individuals are, in fact, mainly caused by environmental factors that can interact with genetic factors (4).
Caution is needed when interpreting GWAS results
Most researchers would probably agree that GWAS provide important insights and some pretty compelling data. But they are not without limitations. Many additional factors affect how genes are expressed.
How the overall gene network is connected
Many different factors — seemingly hundreds and thousands — can be involved in determining a single human trait. And the influence of each SNP or genetic variation can be tiny.
In their “omnigenic” model, Boyle et al. suggest that gene regulatory networks are “sufficiently interconnected” that all genes expressed in disease-relevant cells are liable to affect the core disease-related genes (5). The 2017 study raised doubts as to whether funders should continue to pour money into GWAS.
One of the study’s authors, Jonathan Pritchard, a geneticist from Stanford University, argues that common illnesses may, in fact, be linked to hundreds of thousands of DNA variants.
The authors also suggest that many GWAS “hits” may not be relevant to the disease being targeted and may therefore not be suitable as potential therapeutic targets. The hits are instead what they refer to as “peripheral” variants that influence other genes that are directly linked to the illness. Therefore, most heritability can be explained by effects on genes outside the “core pathways”.
The authors suggest that without understanding how all gene networks are connected, it will remain difficult to deduce any meaningful information from GWAS. And many other geneticists seem to agree.
Rare genetic variants
Some of the variants associated with diseases, and indeed, human traits, are so rare that they cannot be detected by GWAS (6,7).
As an example, previous attempts to uncover the genes responsible for height puzzled scientists. The hundreds of common gene variants originally identified as being linked to height only seemed to have a minuscule effect on the actual trait – they only explained a mere 16% of differences in height across a population (8). This has since been referred to as “missing heritability”.
Now, geneticists have discovered that most of this missing heritability for height is found in rare gene variants (9). The paper published on 23 April in Nature helps confirm that our existing understanding of genetics is, indeed, not broken.
But the findings again highlight difficulties in attempting to pinpoint the exact source of human inheritance – be that a common trait or a heritable disease.
Furthermore, just because you have a certain gene variant doesn’t necessarily mean a particular trait connected to that variant will occur. The variant is simply correlated with that trait, which could arise under the right conditions (10,11). Enter stage left: epigenetics.
Epigenetics – the link between nature and nurture
Epigenetics can influence a wide variety of illnesses and behaviours, including cancers and neurodegenerative disorders. The importance of epigenetics has led to the growth of epigenome-wide association studies (EWAS), alongside GWAS.
Epigenetic alterations are combinations of DNA methylation (which “tags” regions of DNA), histone modifications, chromatin remodelling, and microRNA. All of these processes can promote or inhibit gene expression without making structural changes to the DNA itself – and are often triggered by the environment.
To this end, one study published on 16 February in the American Journal of Physical Anthropology showed that socioeconomic status is associated with DNA methylation at a large number of sites across the genome – more than 2,500 sites, across more than 1,500 genes – which are also known to play a crucial role in human health (12).
Recent studies also suggest epigenetics may be an underlying mechanism used by the body to remember experiences of certain life events, such as poverty. And these “memories” can be passed on to future generations.
GWAS can be time-consuming and expensive
Due to the high costs and time involved in sequencing the entire genome of millions of people, as well as the sheer number of nucleotides that make up the human genome — around six billion — scientists typically only scan for a small subset of SNPs.
This means GWAS currently focus on a mere snapshot of the genome. But as databases continue to grow, so too does our knowledge.
Perhaps researchers will one day be able to sequence the full genomes of millions of people to uncover the roots of common diseases and, hopefully, develop novel therapies. But this will likely take a lot more time and money.
What can genome-wide association studies really tell us?
The influence of genes and the environment on human traits remains controversial. Some scientists argue that too much “hard evidence” is derived from genetic studies. This is further complicated by numerous external factors as well as a number of other limitations.
After nearly two decades of research, scientists still do not fully understand the complexities of the human genome and the pathways involved in human disease. One thing is for sure: the complete set of human genes and their associated biological functions is highly complex.
Moreover, many of the variations that contribute to disease and differences in human traits and behaviour are so subtle that it’s akin to looking for a needle in a haystack.
So, are GWAS useful? Despite the known limitations and uncertainties, yes. Anything that adds to our existing knowledge is useful. Furthermore, each new GWAS hit contributes to developing a more complete picture of the highly convoluted human gene network.
As the number of sequenced genomes continues to grow — thanks to initiatives like the UK Biobank which currently contains data on more than half a million individuals — so too does our potential to derive reasonable correlations.
But like any new scientific finding, genetic predictions have a high risk of being misinterpreted and should, therefore, be taken with a grain of salt.
References
(1) Ozaki, K. et al. Functional SNPs in the lymphotoxin-α gene that are associated with susceptibility to myocardial infarction. Nature Genetics (2002). DOI: 10.1038/ng1047
(2) Martin, N. Getting to the genetic and environmental roots of educational inequality. Science of Learning (2018). DOI: 10.1038/s41539-018-0021-1
(3) Belsky, D.W. et al. Genetics and the geography of health, behaviour and attainment. Nature Human Behaviour (2019). DOI: 10.1038/s41562-019-0562-1
(4) Karlsson Linnér, R. et al. Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences. Nature Genetics (2019). DOI: 10.1038/s41588-018-0309-3
(5) Boyle, E.A., Li, Y.I., and Pritchard, J.K. An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell (2017). DOI: 10.1016/j.cell.2017.05.038
(6) Tennessen, J.A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science (2012). DOI: 10.1126/science.1219240
(7) Nelson, M. R. et al. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science (2012). DOI: 10.1126/science.1217876
(8) Wood, A.R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nature Genetics (2014). DOI: 10.1038/ng.3097
(9) Wainschtein, P. et al. Recovery of trait heritability from whole genome sequence data. Preprint at bioRxiv (2019). DOI: 10.1101/588020
(10) Polderman, T.J.C. et al. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nature Genetics (2015). DOI: 10.1038/ng.3285
(11) Lee, J.J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nature Genetics (2018). DOI: 10.1038/s41588-018-0147-3
(12) McDade, T.W. et al. Genome‐wide analysis of DNA methylation in relation to socioeconomic status during development and early adulthood. American Journal of Physical Anthropology (2019). DOI: 10.1002/ajpa.23800