Population genetics:

  • smcsmc [c++, python] Inference of the ancestral recombination graph and demographic events from whole-genome sequence data.
  • stdpopsim [python] A community-maintained standard library of population genetic models.
    • Install from PyPI.
  • aavcf [c++] Convert allelic encodings from major/minor to ancestral/derived in VCF files.
    • Install from source.
  • ancient_african_admixture [python, R, Snakemake] Processing pipelines and analysis for all results shown in Cole et al. 2021, currently under review.
    • This code is documented and may be useful for those wanting a real-world example of using SMCSMC, but it may change before the publication of my paper.
  • IMa3_Input [python] A Snakemake pipeline for creating IMa3 input from a VCF file via the Popgen Pipeline Platform.
    • Install from source.
  • genomic_interval_pipeline [Rust] Creates an HDF5 database for Keras out of genomic regions and their annotations. A (much faster) drop-in replacement for the Basset pre-processing pipeline.
  • deepstab [python] All-in-one processing tool and deep learning model to predict RNA stability from sequence.
    • The entire pipeline is usable but almost completely undocumented.
  • lanceotron [python] A deep learning model for peak calling of next-generation sequencing data such as ATAC-seq, ChIP-seq, etc.
    • Install from pip.


  • blda [python, R] Weighted latent Dirichlet allocation for bulk ATAC-seq.
    • Install from GitHub.
  • scJaccard [Rust] Pure Rust computation of the continuous Jaccard index at the single-cell level.
    • Install from source.
  • wgba [python] Which genome build again? Infers genome build from interval files like BED and bigWig.
    • Install from pip.
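
For readers unfamiliar with the continuous Jaccard index mentioned under scJaccard, here is a minimal sketch in Python. It assumes the common weighted (Ruzicka) form, the sum of element-wise minima divided by the sum of element-wise maxima; scJaccard itself may define or compute it differently, and the function name `continuous_jaccard` is made up for illustration.

```python
from typing import Sequence


def continuous_jaccard(a: Sequence[float], b: Sequence[float]) -> float:
    """Weighted (Ruzicka) Jaccard index of two non-negative vectors.

    Assumption: this is the generalisation meant by "continuous Jaccard";
    it is an illustration, not code taken from scJaccard.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if any(x < 0 for x in a) or any(y < 0 for y in b):
        raise ValueError("values must be non-negative")

    # Sum of element-wise minima plays the role of the intersection size.
    numerator = sum(min(x, y) for x, y in zip(a, b))
    # Sum of element-wise maxima plays the role of the union size.
    denominator = sum(max(x, y) for x, y in zip(a, b))

    # Convention chosen here: two all-zero vectors count as identical.
    return 1.0 if denominator == 0 else numerator / denominator


if __name__ == "__main__":
    # On 0/1 vectors this reduces to the ordinary set-based Jaccard index.
    print(continuous_jaccard([1, 1, 0], [1, 0, 0]))
    print(continuous_jaccard([1.0, 0.0, 2.0], [1.0, 1.0, 0.0]))
```

On binary vectors the result matches the usual set-based Jaccard index, which is why this form is a natural fit for per-cell signal vectors.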


  • ggsource [R] Automagically remembers how a plot was generated by editing PDF metadata with ExifTool and providing a drop-in replacement for ggplot functions.
  • cookbook [python] A cookbook built in Flask and made static with Frozen-Flask. Automatically deployed to GitHub Pages through CircleCI.

Text processing and analysis:

  • goldi [c++, R] Identification of multi-word terms (such as in Gene Ontology) in free-form text with application to the biomedical literature.
    • Install from CRAN.
  • yamldoc [python] A dependency-free documentation generator for YAML-formatted data.
    • Install from PyPI.
  • allerID [R, shiny] Web application to identify allergens in lists of ingredients using goldi.

Just for fun:

  • rpg_epic_converter [python] A small Discord bot implementing a graph-based conversion strategy between resources in a role-playing game, EPIC RPG.
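
As a rough idea of what a graph-based conversion strategy can look like, the sketch below treats resources as nodes and trades as directed edges weighted by an exchange rate, then walks the graph breadth-first while multiplying rates. The resource names and rates are made up for illustration and are not the real EPIC RPG values, and `conversion_rate` is a hypothetical helper, not the bot's actual code.

```python
from collections import deque
from fractions import Fraction


def conversion_rate(trades, source, target):
    """Units of `target` obtained from one unit of `source`, or None if unreachable.

    `trades` maps (from_resource, to_resource) to an exchange rate.
    Breadth-first search returns the route with the fewest trades,
    which is not necessarily the route with the best rate.
    """
    # Build an adjacency list: resource -> [(neighbour, rate), ...]
    graph = {}
    for (src, dst), rate in trades.items():
        graph.setdefault(src, []).append((dst, Fraction(rate)))

    queue = deque([(source, Fraction(1))])
    seen = {source}
    while queue:
        resource, amount = queue.popleft()
        if resource == target:
            return amount
        for neighbour, rate in graph.get(resource, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, amount * rate))
    return None


# Hypothetical trade table: 2 wood -> 1 fish, 3 fish -> 1 apple.
EXAMPLE_TRADES = {
    ("wood", "fish"): Fraction(1, 2),
    ("fish", "apple"): Fraction(1, 3),
}

if __name__ == "__main__":
    print(conversion_rate(EXAMPLE_TRADES, "wood", "apple"))
    print(conversion_rate(EXAMPLE_TRADES, "apple", "wood"))
```

Using `Fraction` keeps chained rates exact. If several routes exist and the best rate matters, a shortest-path search over log-rates would be the natural next step.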