About

This undergraduate course surveys bioinformatics as it is applied to the study of infectious diseases. The course is organized into topics, each anchored by recent, high-impact papers in the scientific literature. For each paper, we will cover the overall theme, the context of the specific study, and the underlying model and algorithm, and then run a simplified version of the analysis in the laboratory section.

Learning objectives

To develop a fundamental understanding of the concepts underlying the analysis of genetic sequence variation from infectious disease outbreaks (genetic distances, maximum likelihood).
To gain basic command-line literacy.
To become acquainted with popular software tools used for the analysis of infectious disease sequence data.

Outline

Databases
- NCBI GenBank
- scoring matrices
- BLAST queries
Alignment
- Smith-Waterman and related algorithms
- homology search and domain prediction
Genetic diversity
- measures of diversity (entropy)
- genetic distances
- virus nomenclature
- molecular epidemiology (genetic clustering)
Building trees
- Distance-based methods (neighbor-joining)
- Rooting (outgroup, midpoint)
- 16S rRNA
Measuring rates of evolution
- Markov chain models (Jukes-Cantor)
- Rates of evolution
- Probability and maximum likelihood
- Detecting selection
Molecular clocks
- Rescaling trees
- Root-to-tip methods
- Dating zoonoses
Modeling epidemics
- Compartmental models
- Kingman’s coalescent
- Bayesian inference
- Demographic growth models (skylines)
Next-generation sequencing
- NGS data formats
- Short-read mapping
- RNA-Seq analysis
Genomics
- de novo assembly of NGS data
- metagenomics
- novel pathogens
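
To give a flavor of the laboratory exercises, two of the topics listed above (genetic distances and the Jukes-Cantor model) can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the course materials: the function names and the two toy sequences are made up for this example, and it assumes the sequences are already aligned to the same length.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of comparable aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap or an ambiguous base.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p):
    """Expected substitutions per site under Jukes-Cantor, given a p-distance."""
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy aligned sequences (hypothetical, for illustration only).
s1 = "ACGTACGTAC"
s2 = "ACGTTCGTAA"

p = p_distance(s1, s2)
d = jukes_cantor_distance(p)
print(f"p-distance = {p:.3f}, Jukes-Cantor distance = {d:.3f}")
```

The corrected distance is always at least as large as the raw p-distance, because the Jukes-Cantor model accounts for multiple substitutions at the same site that a simple count of differences cannot see.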

GitHub repository

All code used to implement this website can be obtained on GitHub.

License

These course materials, with the exception of the data sets associated with publications from other parties, are released under the Creative Commons Attribution-ShareAlike 4.0 license. Under this license you are free to copy, modify and redistribute this content, even for commercial purposes, so long as you give appropriate credit and distribute any derived content under the same license.

Bioinformatics of Infectious Diseases
