Deleteriousness versus pathogenicity

What's the difference between deleteriousness and pathogenicity? Box 1 from the article "Guidelines for investigating causality of sequence variants in human disease" has the following definitions:

The popular Combined Annotation Dependent Depletion (CADD) tool predicts deleteriousness across all bases in the human genome. It works by first identifying recently fixed alleles. Fixed alleles are straightforward to identify thanks to projects such as the now-retired HapMap project and the 1000 Genomes Project, which catalogue allele frequencies across various populations. To identify alleles that were fixed recently, the CADD team looked for alleles that differ between the human genome and the inferred human-chimpanzee ancestral genome; chimpanzees are among the most genetically similar extant species to us. The assumption is that these recently fixed alleles either had a selective advantage or are at least not deleterious, because deleterious alleles would have been removed by negative (purifying) selection before they could become fixed.
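
The filtering step described above can be sketched in a few lines of Python. This is only an illustration, not the CADD pipeline: the `Site` record, its field names, the toy data, and the 0.95 frequency cut-off are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Site:
    chrom: str
    pos: int
    ancestral: str       # inferred human-chimpanzee ancestral allele
    human: str           # allele carried by (most) humans at this site
    derived_freq: float  # frequency of the non-ancestral allele in humans

def recently_fixed(sites, min_derived_freq=0.95):
    """Keep sites where humans carry a (nearly) fixed derived allele.

    The 0.95 threshold is an illustrative assumption, not a value
    taken from the text above.
    """
    return [
        s for s in sites
        if s.human != s.ancestral and s.derived_freq >= min_derived_freq
    ]

# Hypothetical toy data
sites = [
    Site("chr1", 100, "A", "G", 1.00),  # derived and fixed: kept
    Site("chr1", 200, "C", "C", 0.00),  # human matches the ancestor: dropped
    Site("chr1", 300, "T", "C", 0.40),  # derived but still polymorphic: dropped
]

fixed = recently_fixed(sites)
print([(s.chrom, s.pos) for s in fixed])
```

Only the first site survives the filter, since it is the only one where the human allele differs from the ancestral allele and the derived allele is at (or near) fixation.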

To build a classifier that can identify sites in the genome that are under purifying selection, the CADD team simulated an equal number of alleles and contrasted them against the set of recently fixed alleles. Specifically, they built a support vector machine (SVM) classifier using features from the two sets of alleles; these features are 63 distinct annotations on the alleles. Basically, alleles that have survived purifying selection should have an annotation profile that is distinct from that of the simulated alleles, which have never been exposed to selection. The SVM classifier was then applied to the entire genome, covering all possible single nucleotide variants and small insertions/deletions. Each variant is given a raw score; quoting the CADD info page:

"Raw" CADD scores come straight from the model, and are interpretable as the extent to which the annotation profile for a given variant suggests that that variant is likely to be "observed" (negative values) vs "simulated" (positive values).

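To make the sign convention in the quote concrete, here is a minimal sketch of a linear SVM trained to separate "observed" from "simulated" variants. It is a toy under stated assumptions, not CADD's actual training code: two made-up numeric features stand in for the 63 annotations, the data are random draws, and the classifier is a simple hinge-loss model trained by subgradient descent (the function names and hyperparameters are hypothetical). Observed variants are labelled -1 and simulated variants +1, so the raw decision value is negative for observed-like profiles and positive for simulated-like ones.

```python
import random

def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=50, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) by subgradient descent."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            if margin < 1:
                # Inside the margin or misclassified: move towards the example
                w = [wj + eta * (y[i] * xj - lam * wj) for wj, xj in zip(w, X[i])]
                b += eta * y[i]
            else:
                # Correct with enough margin: only apply the L2 shrinkage
                w = [wj - eta * lam * wj for wj in w]
    return w, b

def raw_score(w, b, x):
    """Signed decision value: negative = observed-like, positive = simulated-like."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

# Hypothetical annotation profiles with two features each
rng = random.Random(42)
observed = [[rng.gauss(-2, 1), rng.gauss(-2, 1)] for _ in range(100)]
simulated = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(100)]
X = observed + simulated
y = [-1] * len(observed) + [+1] * len(simulated)

w, b = train_linear_svm(X, y)
print(raw_score(w, b, [-2.0, -2.0]))  # observed-like profile
print(raw_score(w, b, [2.0, 2.0]))    # simulated-like profile
```

The first printed score should be negative and the second positive, mirroring how a raw CADD score is read: the more positive the value, the more the variant's annotation profile resembles the simulated set rather than the set that survived selection.
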
Further reading

Ingo Helbig has a very nice write-up of CADD on the Beyond the Ion Channel blog. I also have a short summary of SVMs in my machine learning GitHub repository.

