About
This undergraduate course covers a broad range of bioinformatics methods as they are applied to the study of infectious diseases. The course is organized into topics, each anchored by recent, high-impact papers from the scientific literature. For each paper, we will cover the overall theme, the context of the specific study, and the underlying model and algorithm, and then run a simplified version of the analysis in the laboratory section.
Learning objectives
- To develop a fundamental understanding of the concepts underlying the analysis of genetic sequence variation from infectious disease outbreaks (genetic distances, maximum likelihood).
- To gain basic command-line literacy.
- To become acquainted with popular software tools used for the analysis of infectious disease sequence data.
Outline
- Databases
  - NCBI GenBank
  - scoring matrices
  - BLAST queries
- Alignment
  - Smith-Waterman and related algorithms
  - homology search and domain prediction
- Genetic diversity
  - measures of diversity (entropy)
  - genetic distances
  - virus nomenclature
  - molecular epidemiology (genetic clustering)
- Building trees
  - Distance-based methods (neighbor-joining)
  - Rooting (outgroup, midpoint)
  - 16S rRNA
- Measuring rates of evolution
  - Markov chain models (Jukes-Cantor)
  - Rates of evolution
  - Probability and maximum likelihood
  - Detecting selection
- Molecular clocks
  - Rescaling trees
  - Root-to-tip methods
  - Dating zoonoses
- Modeling epidemics
  - Compartmental models
  - Kingman’s coalescent
  - Bayesian inference
  - Demographic growth models (skylines)
- Next-generation sequencing
  - NGS data formats
  - Short-read mapping
  - RNA-Seq analysis
- Genomics
  - de novo assembly of NGS data
  - metagenomics
  - novel pathogens
GitHub repository
All code used to implement this website can be obtained on GitHub.
License
These course materials, with the exception of the data sets associated with publications from other parties, are released under the Creative Commons Attribution-ShareAlike 4.0 license. Under this license you are free to copy, modify and redistribute this content, even for commercial purposes, so long as you give appropriate credit and distribute any derived content under this same license.