nekrutenkoLab | 2014

Galaxy G

Galaxy is an open, web-based platform for reproducible, data-intensive biomedical research. It is used by thousands of users worldwide to make sense of large datasets generated by next-generation sequencing technologies. Our main site performs over 5,000 analyses daily.

Mutation dynamics M

Next-generation sequencing technologies allow us to detect nucleotide changes at very high resolution. Unfortunately, the signal is often obscured by noise. We are using cutting-edge technologies such as ddPCR to overcome these challenges and understand mutational flicker.

Reproducibility R

Many of today's publications utilizing advanced sequencing technologies cannot be readily reproduced due to the lack of primary data, software, and analytical details. We are working on raising awareness of these issues and developing software for making modern life sciences transparent.

  • dan.jpg

    Dan Blankenberg | 2006 - G M

    In addition to hacking on the Galaxy framework itself, Dan has wrapped many popular tools, developed a set of utilities for FASTQ manipulation, and written Galaxy's own variant caller.
  • daveb.jpg

    Dave Bouvier | 2012 - G R

    Dave works on Galaxy's AppStore, the ToolShed. He oversees tool migration, manages tool dependencies and their tests, and is working on implementing a framework for testing tools in real time. He is also wrapping some of the popular tools for NGS analysis.
  • marten.jpg

    Martin Cech | 2013 - G

    Martin uses his extensive web-development expertise to overhaul Galaxy's reporting system and redesign data libraries to be responsive for very large datasets. He is also working on implementing Galaxy's capabilities for interacting with Illumina machines such as the MiSeq.
  • john.jpg

    John Chilton | 2013 - G

    John is making Galaxy scalable so that it can effectively use very large computational infrastructure, and is re-engineering the software framework to handle thousands of samples.
  • nate.jpg

    Nate Coraor | 2008 - G

    Nate administers the entire Galaxy ecosystem, including all hardware, networking, and software aspects. He is also the keeper of Galaxy's source code and the master of its release process.
  • dorine.jpg

    Dorine Francheteau | 2012 - 2013 G

    Dorine has been curbing the abuse of Galaxy by rogue users. She is also working on redesigning Galaxy's set of tools for the analysis of next-generation sequencing data.
  • jen.jpg

    Jen Jackson | 2009 - G

    Jen is responsible for Galaxy's user support. She administers Galaxy's mailing lists, develops educational materials, and produces screencasts. In addition, she is running Galaxy workshops throughout the country.
  • greg.jpg

    Greg Von Kuster | 2007 - G R

    Greg has developed key components of Galaxy, including data libraries and the sample tracking system. He is working on the development of our AppStore, the Galaxy ToolShed.
  • anton.jpg

    Anton Nekrutenko | 2003 - G M R

    Anton is the custodian of the lab, spending most of his time in the exciting pursuit of funding and unable to answer most of his e-mails. If you really need to reach him, e-mail twice.
  • boris.jpg

    Boris Rebolledo Jaramillo | 2011 - G M

    Boris is a Fulbright graduate student working on the analysis of a large number of mitochondrial genomes. He is an expert in the detection and analysis of low-frequency variants in next-generation sequencing data.
  • nick.jpg

    Nicholas Stoler | 2013 - G M

    Nick is an NIH predoctoral graduate student in Computation, Bioinformatics, and Statistics. He works on the detection of variants from RNA-seq data and on the development of reliable approaches for finding indels in non-diploid genomes.

  • Galaxy

    Galaxy Team G R

    Together with the group of James Taylor at Johns Hopkins, we make up the Galaxy Team. The team develops and maintains the Galaxy codebase, public website, data, and AMIs, and provides extensive user and developer support.
2014 Blankenberg et al. Dissemination of scientific software with Galaxy ToolShed Genome Biol. G R
2014 Dickins et al. Controlling for contamination in re-sequencing studies with a reproducible web-based phylogenetic approach Biotechniques G M R
2014 Blankenberg et al. Wrangling Galaxy's reference data Bioinformatics G R
2013 Sandve et al. Ten simple rules for reproducible computational research PLoS CompBio. R
2013 Altschul et al. The anatomy of successful computational biology software Nat Biotech. G
2013 Bar-Yaacov et al. RNA-DNA differences in human mitochondria restore ancestral form of 16S ribosomal RNA Genome Res. M
2013 Goecks et al. Web-based visual analysis for high-throughput genomics BMC Genomics G
2012 Goecks et al. NGS analyses by visualization with Trackster. Nat Biotechnol. G
2012 Nekrutenko & Taylor Next-generation sequencing data interpretation: enhancing reproducibility and accessibility NRG R
2012 Hillman-Jackson et al. Using Galaxy to perform large-scale interactive data analyses Curr Protoc Bioinf. G
2011 Afgan et al. Harnessing cloud computing with Galaxy Cloud Nat Biotech. G
2011 Goto et al. A massively parallel sequencing approach uncovers ancient origins and high genetic variability of endangered Przewalski's horses GBE p
2011 Blankenberg et al. Making whole genome multiple alignments usable for biologists Bioinformatics G
2011 Goto et al. Dynamics of mitochondrial heteroplasmy in three families investigated via a repeatable re-sequencing study Genome Biol. G M R
2011 Blankenberg et al. Integrating diverse databases into an unified analysis framework: a Galaxy approach Database G
2010 Afgan et al. CloudMan: delivering cloud compute clusters BMC Bioinf. G
2010 Goecks et al. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences Genome Biol. G
2010 Blankenberg et al. Manipulation of FASTQ data with Galaxy. Bioinformatics G
2010 Bock et al. Web-based analysis of (Epi-) genome data using EpiGRAPH and Galaxy Methods Mol Biol. G
2010 Schuster et al. Complete Khoisan and Bantu genomes from southern Africa Nature G
2010 Blankenberg et al. Galaxy: a web-based genome analysis tool for experimentalists Curr Prot Mol Biol. G
2009 Kosakovsky Pond et al. Windshield splatter analysis with the Galaxy metagenomic pipeline Genome Res. G R
2009 Han et al. Transcriptome of embryonic and neonatal mouse cortex by high-throughput RNA sequencing PNAS p
2009 Dickins & Nekrutenko High-resolution mapping of evolutionary trajectories in a phage GBE M
2008 Wadhawan et al. Wheels within wheels: clues to the evolution of the Gnas and Gnal loci MBE M
2007 Taylor et al. Using galaxy to perform large-scale interactive data analyses Curr Prot Bioinf. G
2008 Lazarus et al. Toward the commoditization of translational genomic research: Design and implementation features of the Galaxy genomic workbench Summit on Trans. Bioinf. G
2007 Miller et al. 28-way vertebrate alignment and conservation track in the UCSC Genome Browser Genome Res. G
2007 Szklarczyk et al. Rapid asymmetric evolution of a dual-coding tumor suppressor INK4a/ARF locus contradicts its function PNAS M
2007 Blankenberg et al. A framework for collaborative analysis of ENCODE data: making large-scale analyses biologist-friendly Genome Res. G
2007 Chung et al. A first look at ARFome: dual-coding genes in mammalian genomes. PLoS CompBio. p
2006 Nekrutenko & He Functionality of unspliced XBP1 is required to explain evolution of overlapping reading frames TIG p
2006 Wilson et al. mNSC1 shows no evidence of protein-coding capacity. Gene p
2006 Chung et al. Rapid and asymmetric divergence of duplicate genes in the human gene coexpression network BMC Bioinf. p
2005 Giardine et al. Galaxy: a platform for interactive large-scale genome analysis Genome Res. G
2005 Nekrutenko et al. Oscillating evolution of a mammalian locus with overlapping reading frames: an XLalphas/ALEX relay PLoS Genet. p
2004 Nekrutenko A Identification of novel exons from rat-mouse comparisons JME p
2004 Miller et al. Comparative genomics Annu Rev Genomics Hum Genet p
2004 Gibbs et al. Genome sequence of the Brown Norway rat yields insights into mammalian evolution Nature p
2004 Nekrutenko Reconciling the numbers: ESTs versus protein-coding genes MBE p
2003 Li et al. Detection of gene duplications and block duplications in eukaryotic genomes J Struct Funct Genomics p
2003 Nekrutenko et al. ETOPE: Evolutionary test of predicted exons NAR p
2003 Nekrutenko et al. An evolutionary approach reveals a high protein-coding capacity of the human genome TIG p
2003 Thomas et al. Evolutionary dynamics of oncogenes and tumor suppressor genes: higher intensities of purifying selection than other genes MBE p
2003 Nekrutenko & Baker Subgenome-specific markers in allopolyploid cotton Gossypium hirsutum: implications for evolutionary analysis of polyploids Gene p
2002 Kaessmann et al. Signatures of domain shuffling in the human genome Genome Res. p
2002 Nekrutenko et al. The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study Genome Res. p
2001 Nekrutenko & Li Transposable elements are found in a large number of human protein-coding genes TIG p
2001 Li et al. Evolutionary analyses of the human genome Nature p
2000 Gu et al. Densities, length proportions, and other distributional features of repetitive sequences in the human genome estimated from 430 megabases of genomic sequence Gene p
2001 Adkins et al. Bushbaby growth hormone is much more similar to nonprimate growth hormones than to rhesus monkey and human growth hormones MBE p
2000 Nekrutenko & Li Assessment of compositional heterogeneity within and between eukaryotic genomes Genome Res. p
2000 Makova et al. Evolution of microsatellite alleles in four species of mice (genus Apodemus) J Mol Evol p
2000 Nekrutenko et al. Isolation of binary species-specific PCR-based markers and their value for diagnostic applications Gene p
1999 Nekrutenko et al. Representational difference analysis to distinguish cryptic species Mol Ecol. p
1998 Nekrutenko et al. Cytosolic isocitrate dehydrogenase in humans, mice, and voles and phylogenetic analysis of the enzyme family MBE p
G = Galaxy, M = Mutational dynamics, R = Reproducibility, p = Previous work.
Istvan Albert Penn State G
Benjamin Dickins Nottingham Trent M
Geremy Goecks GWU G
Sergei Kosakovsky Pond UCSD M G
Ross Hardison Penn State G
Kateryna Makova Penn State M G
Webb Miller Penn State G R
James Taylor Johns Hopkins G R
Pittsburgh Supercomputing Center Pitt & CMU G R
Texas Advanced Computing Center UTexas G R
G = Galaxy, M = Mutational dynamics, R = Reproducibility.
IBIOS/BMB554 Foundations of data-driven life sciences
BMMB501 Core concepts in biomolecular sciences