The missing heritability problem
In my last post I described the transition from candidate gene studies to genome-wide association studies, and argued that the corresponding change in methods, from a handful of genes of presumed biological relevance to the whole genome, has transformed our understanding of the genetic basis of complex traits. In this post I discuss why, despite this success, we still have not accounted for all the genetic influences we expect to find.
As I discussed previously, genome-wide association studies (GWAS) have been extremely successful in identifying genetic variants associated with a range of disease outcomes – countless replicable associations have emerged over the last few years. Even so, the proportion of variability in specific traits accounted for so far is much less than twin, family and adoption studies would lead us to expect. Those studies suggest that 50% or more of the variance in many complex traits is attributable to genetic influences. The individual variants identified by GWAS, however, are each associated with a very small proportion of variance in the trait of interest (typically 0.1% or less), so that together they account for only a small fraction of that total. This has become known as the “missing heritability” problem. Where are the other genes? Should we be seeking common genetic variants of smaller and smaller effect, in larger and larger studies? Or is there a role for rare variants (i.e., those which occur with a low frequency in a particular population, typically a minor allele frequency less than 5%), which may have a larger effect?
It is clear that some missing heritability will be accounted for by variants that have not yet been identified via GWAS. Most GWAS genotyping chips don’t capture rare variants very well, yet evolutionary theory predicts that mutations which strongly influence complex phenotypes will tend to occur at low frequencies: under the neutral model of evolution, variants with large effects are predicted to be rare. However, under the same model, while rare variants of large effect constitute the majority of causal variants, they still contribute only a small proportion of phenotypic variance in a population, precisely because they are rare. Common variants of small effect, on the other hand, contribute a greater overall proportion of variance. Newer methods exploit this by using a less stringent threshold for including variants identified via GWAS – instead of only including those that reach “genome-wide significance” (i.e., a P-value < 10⁻⁸ – see my earlier post), they also include those which reach a much more modest level of statistical evidence (e.g., P < 0.5). This much more inclusive approach has shown that, when considered together, common genetic variants do in fact seem to account for a substantial proportion of expected heritability.
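To make the arithmetic behind this concrete, here is a minimal Python sketch. It assumes a simple additive model in which a variant with minor allele frequency p and per-allele effect β (in phenotypic standard deviation units) explains 2p(1 − p)β² of trait variance, and it builds a basic polygenic score by summing effect-weighted allele counts for every variant below a chosen P-value threshold. The function names, variant labels, effect sizes and P-values are all hypothetical, chosen for illustration rather than taken from any real study, and the scoring function is a simplified stand-in for the thresholding methods described above, not a reproduction of any published pipeline.

```python
def variance_explained(maf, beta):
    """Proportion of trait variance explained by one biallelic variant.

    Assumes an additive model, Hardy-Weinberg proportions, and a trait
    standardised to variance 1, so beta is in standard deviation units.
    """
    return 2.0 * maf * (1.0 - maf) * beta ** 2


def polygenic_score(genotypes, variants, p_threshold):
    """Sum of beta * effect-allele count over variants with P < p_threshold.

    genotypes: dict mapping variant id -> effect-allele count (0, 1 or 2).
    variants:  list of dicts with keys "id", "beta" and "p", as they might
               come from a discovery GWAS (hypothetical format).
    """
    return sum(
        v["beta"] * genotypes.get(v["id"], 0)
        for v in variants
        if v["p"] < p_threshold
    )


# Hypothetical variants: one rare with a large effect, one common with a small effect.
rare_large = variance_explained(maf=0.001, beta=0.5)
common_small = variance_explained(maf=0.30, beta=0.05)
print(f"rare, large effect:   {rare_large:.5%} of variance")
print(f"common, small effect: {common_small:.5%} of variance")

# Hypothetical discovery results and one person's genotypes.
variants = [
    {"id": "snp1", "beta": 0.05, "p": 3e-9},
    {"id": "snp2", "beta": 0.02, "p": 0.004},
    {"id": "snp3", "beta": -0.01, "p": 0.2},
    {"id": "snp4", "beta": 0.01, "p": 0.7},
]
person = {"snp1": 1, "snp2": 2, "snp3": 0, "snp4": 2}

strict = polygenic_score(person, variants, p_threshold=1e-8)  # snp1 only
lenient = polygenic_score(person, variants, p_threshold=0.5)  # snp1, snp2, snp3
print(f"score at P < 1e-8: {strict:.3f}")
print(f"score at P < 0.5:  {lenient:.3f}")
```

With these made-up numbers, the common variant explains about twice as much variance as the rare one despite having an effect ten times smaller, and relaxing the threshold from 10⁻⁸ to 0.5 brings two more variants into the score. In real analyses the weights come from a discovery sample and the score is then evaluated in an independent sample.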
In other words, complex traits, such as most disease outcomes but also those behavioural traits of interest to psychologists, are highly polygenic – that is, they are influenced by a very large number of common genetic variants of very small effect. This, in turn, explains why we have yet to reliably identify specific genetic variants associated with many psychological and behavioural traits – while the latest GWAS of traits such as height and weight (by the GIANT Consortium) include data on over 250,000 individuals, no comparable collection of data exists for most psychological and behavioural traits. This situation is changing, though – a recent GWAS of educational attainment combined data on over 125,000 individuals and identified three genetic loci at genome-wide significance, although these were associated with very small effects (as we would expect). Excitingly, these findings have recently been replicated. Another large GWAS, this time of schizophrenia, identified 108 loci associated with the disease, putting this psychiatric condition on a par with traits such as height and weight in terms of our understanding of the underlying genetics.
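Why sample size matters so much can be sketched with a rough power calculation. The Python snippet below uses a normal approximation, assuming the expected test statistic for a single variant is about the square root of N times the proportion of variance it explains, and the 10⁻⁸ significance threshold mentioned earlier in this post. The function name and the 0.02% effect size are illustrative assumptions, not figures taken from a particular study.

```python
from statistics import NormalDist


def gwas_power(n, variance_explained, alpha=1e-8):
    """Approximate power to detect one variant in a sample of size n.

    Normal approximation for a two-sided test: the expected z-statistic
    is taken to be sqrt(n * variance_explained), which is reasonable
    only when variance_explained is small.
    """
    z = NormalDist()
    z_crit = -z.inv_cdf(alpha / 2.0)          # two-sided critical value
    expected_z = (n * variance_explained) ** 0.5
    upper_tail = 1.0 - z.cdf(z_crit - expected_z)
    lower_tail = z.cdf(-z_crit - expected_z)
    return upper_tail + lower_tail


# Hypothetical variant explaining 0.02% of trait variance.
effect = 0.0002
for n in (10_000, 125_000, 250_000):
    print(f"N = {n:>7,}: power = {gwas_power(n, effect):.4f}")
```

Under these assumptions, a variant of this size is essentially undetectable in 10,000 people, has a modest chance of detection in 125,000, and becomes likely to be found in 250,000.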
The success of the GWAS method is remarkable – the recent schizophrenia GWAS, for example, has provided a number of intriguing new biological targets for further study. It should only be a matter of time (and sample size) before we begin to identify variants associated with personality, cognitive ability and so on. Once we do, we will understand more about the biological basis for these traits, and finally begin to account for the missing heritability.
References:
Munafò, M.R., & Flint, J. (2014). Schizophrenia: genesis of a complex disease. Nature, 511, 412-3.
Rietveld, C.A., et al. (2013). GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science, 340, 1467-71.
@MarcusMunafo
@BristolTARG