About

This undergraduate course surveys bioinformatics as it is applied to the study of infectious diseases. The course is organized into topics, each anchored by a recent, high-impact paper from the scientific literature. For each paper, we will cover the overall theme, the context of the specific study, and the underlying model and algorithm, and then run a simplified version of the analysis in the laboratory section.

Learning objectives

Outline

  1. Databases
    • NCBI GenBank
    • scoring matrices
    • BLAST queries
  2. Alignment
    • Smith-Waterman and related algorithms
    • homology search and domain prediction
  3. Genetic diversity
    • measures of diversity (entropy)
    • genetic distances
    • virus nomenclature
    • molecular epidemiology (genetic clustering)
  4. Building trees
    • Distance-based methods (neighbor-joining)
    • Rooting (outgroup, midpoint)
    • 16S rRNA
  5. Measuring rates of evolution
    • Markov chain models (Jukes-Cantor)
    • Rates of evolution
    • Probability and maximum likelihood
    • Detecting selection
  6. Molecular clocks
    • Rescaling trees
    • Root-to-tip methods
    • Dating zoonoses
  7. Modeling epidemics
    • Compartmental models
    • Kingman’s coalescent
    • Bayesian inference
    • Demographic growth models (skylines)
  8. Next-generation sequencing
    • NGS data formats
    • Short-read mapping
    • RNA-Seq analysis
  9. Genomics
    • de novo assembly of NGS data
    • metagenomics
    • novel pathogens

GitHub repository

All code used to implement this website can be obtained on GitHub.

License

These course materials, with the exception of the data sets associated with publications from other parties, are released under the Creative Commons Attribution-ShareAlike 4.0 license. You are free to copy, modify and redistribute this content, even for commercial purposes, so long as you give appropriate credit and distribute any derived content under this same license.