Xiaoqiu Huang
Xiaoqiu (See-ow-chew) Huang, Professor
Department of Computer Science, Iowa State University
226 Atanasoff Hall, Ames, IA 50011
EM: xqhuang AT iastate DOT edu Phone: 515-294-2432 Fax: 515-294-0258
Google Scholar Citations:
Profile
Teaching
-
Introduction to Data Structures: Com S 228
-
Computational Techniques for Genome Assembly and Analysis:
ComS/BCB 551
Research
Xiaoqiu Huang, with his collaborators, has developed algorithms and software in four areas of
bioinformatics:
sequence alignment, sequence assembly, gene finding and tree construction.
He is also interested in understanding the prevalence and implications of
gene flow in the evolution of asexual filamentous fungal pathogens (see blow).
In sequence alignment, a linear-space algorithm was developed for computing
k best local alignments between two sequences (Huang and Miller 1991),
significantly reducing the quadratic space complexity of the
Smith-Waterman algorithm.
According to
Michael Waterman, the Smith-Waterman local alignment algorithm was developed to
address the weakness of the
Needleman-Wunsch global alignment algorithm in comparing homologous
genomic segments with conserved regions (such as exons) separated by divergent regions (such as introns).
Although the Huang-Miller algorithm could be used to find many pairs of conserved
regions (in separate, unordered alignments) between two genomic segments,
a linear-space global alignment algorithm was designed to compute
one alignment of genomic segments with all pairs of conserved
regions in similarity blocks and all pairs of divergent regions in difference blocks
(in the natural order), precisely showing the boundary between
similarity and difference blocks (Huang and Chao 2003).
An appropriate substitution matrix (e.g. at the maximum-likelihood evolutionary distance)
was selected in alignment of protein sequences to suit their conservation level (Huang 2008).
Multiple sets of alignment parameters with different levels of stringency
were dynamically selected to suit various kinds of sequence regions
with different levels of conservation (Huang and Brutlag 2007).
In sequence assembly, Huang was encouraged by
Ross Overbeek in 1990 to develop
a contig assembly program (CAP).
As one of the first sequence assembly programs based on the overlap-layout-alignment approach,
CAP was special in using a maximum similarity algorithm,
instead of a minimum distance algorithm, for computing overlaps between sequences,
in generating contigs in a decreasing order of overlap quality without the need
to determine the orientation of sequences before the layout stage, and in using
a special alignment algorithm for generating a multiple alignment of sequences in a contig (Huang 1992).
Paired-end reads and base quality values were used in the well-known CAP3 program (Huang and Madan 1999).
A whole-genome assembly program named PCAP was developed to handle million of
Sanger sequences in assembly of large animal genomes (Huang et al. 2003).
PCAP was used in several large whole-genome assembly projects
such as chimpanzee (Nature 437: 69-87), chicken (Nature 432: 695-716),
and platypus (Nature 453: 175-183).
The de novo feature of the PCAP chimpanzee genome assembly
was useful in a study
(BMC Genomics 7: 15).
A unique feature of PCAP was the introduction of a superword array in finding
overlaps between sequences (Huang et al. 2006).
Unlike suffix arrays,
superword arrays allow base mismatches (Huang 2017).
In gene finding, a linear-space DNA-protein alignment algorithm
was developed to find exon-intron structures in a genomic sequence (Huang and Zhang 1996).
This algorithm, along with a linear-space DNA-cDNA alignment algorithm,
was combined with fast sequence comparison methods into a sequence
annotation package named AAT (Huang et al. 1997).
AAT was used to find gene structures in the first plant genome project (Nature 402: 761-768).
Genes and other functional elements are often located in C+G rich regions of
the genome, which can be found with a local content algorithm (Huang 1994).
A useful feature of this algorithm is that the lengths of regions found
are solely determined by their contents, not by any user-provided window size.
In tree construction,
a new formulation of phylogenetic reconstruction based on the maximum sequence similarity
was described (Huang and Vingron 2009).
A hierarchical phylogeny construction algorithm for managing a large number of
taxa was proposed (Das and Huang 2019).
Horizontal theory of molecular evolution
After examining highly variable genomic regions in several asexual
fungal pathogens such as Verticillium dahliae (Huang 2014),
Fusarium virguliforme (Huang et al. 2016), and
Fusarium oxysporum (Huang 2019), Huang (in 2019) proposed
the horizontal theory of molecular evolution, suggesting that
in a major eukaryotic lineage such as fungi,
most of the variation within and between recently-emerged asexual
pathogenic species are due to horizontal transfer of genetic variants
that are beneficial under certain conditions,
maintain genetic structure, or generate genetic variation (structural variants
as well as nucleotide substitutions).
Horizontal transfer refers to the movement of genetic information between genomes
other than by the vertical transmission of DNA from parent to offspring.
This occurs on a regular and frequent basis (ongoing) when pathogens carry supernumerary
chromosomes flanked by complementary subtelomeres that are host-specific
and contain helicase-like genes. Supernumerary chromosomes are enriched in
transposons and pathogenicity genes
associated with heterokaryotic incompatibility (het) genes.
Core chromosomes are flanked by the same type of complementary subtelomeres
as supernumerary chromosomes, allowing the exchange of genes between
core and supernumerary chromosomes by homologous recombination over
their highy similar subtelomeres.
Pathogenicity genes in core chromosomes are maintained through
their association with het genes, because a loss of
a region containing pathogenicity and het genes
can result in incompatible het interactions triggering
local and surrounding cell death.
Horizontal theory concerning structural variants complements
sexual reproduction
and
the neutral theory of molecular evolution concerning nucleotide substitutions
by providing alternative solutions to
Muller's ratchet
and
Haldane's dilemma.
Selected Publications
Huang X, Miller W. (1991)
A Time-Efficient, Linear-Space Local Similarity Algorithm.
Advances in Applied Mathematics 12:337-357.
DOI
Huang X. (1992)
A Contig Assembly Program Based on Sensitive Detection of Fragment Overlaps.
Genomics 14: 18-25.
DOI
Huang X. (1994)
An Algorithm for Identifying Regions of a DNA Sequence that Satisfy a Content Requirement.
Bioinformatics 10: 219-225.
DOI
Huang X, Zhang J. (1996)
Methods for Comparing a DNA Sequence with a Protein Sequence.
Bioinformatics 12: 497-506.
DOI
Huang X, Adams MD, Zhou H, Kerlavage AR. (1997)
A Tool for Analyzing and Annotating Genomic Sequences.
Genomics 46: 37-45.
DOI
Huang X, Madan A. (1999)
CAP3: A DNA Sequence Assembly Program.
Genome Research 9: 868-877.
DOI
Huang X, Chao K-M. (2003)
A Generalized Global Alignment Algorithm.
Bioinformatics 19: 228-233.
DOI
Huang X, Wang J, Aluru S, Yang S-P, Hillier L. (2003)
PCAP: A Whole-Genome Assembly Program.
Genome Research 13: 2164-2170.
DOI
Ye L, Huang X. (2005)
MAP2: Multiple Alignment of Syntenic Genomic Sequences.
Nucleic Acids Research 33: 162-170.
DOI
Huang X, Yang S-P, Chinwalla A, Hillier L,
Minx P, Mardis E, Wilson R. (2006)
Application of a Superword Array in Genome Assembly.
Nucleic Acids Research 34: 201-205.
DOI
Huang X, Brutlag DL. (2007)
Dynamic Use of Multiple Parameter Sets in Sequence Alignment.
Nucleic Acids Research 35: 678-686.
DOI
Huang X. (2008)
Sequence Alignment with an Appropriate Substitution Matrix.
Journal of Computational Biology 15: 129-138.
DOI
Huang X, Vingron M. (2009)
Maximum Similarity: A New Formulation of Phylogenetic Reconstruction
Journal of Computational Biology 16: 887-896.
DOI
Huang X. (2014)
Horizontal Transfer Generates Genetic Variation in an Asexual Pathogen.
PeerJ 2: e650.
DOI
Huang X, Das A, Sahu BB, Srivastava SK, Leandro LF,
O'Donnell K, Bhattacharyya MK. (2016)
Identification of Highly Variable Supernumerary Chromosome Segments in an Asexual Pathogen.
PLoS ONE 11: e0158183.
DOI
Huang X. (2017)
Sequence Assembly. In Keith JM. (ed.)
Bioinformatics - Volume 1: Data, Sequence Analysis, and Evolution.
Humana Press, Second Edition, 35-45.
DOI
Das A, Huang X. (2019)
HPC: Hierarchical Phylogeny Construction.
PLoS ONE 14: e0221357.
DOI
Huang X. (2019)
Host-specific subtelomere: Genomic architecture of pathogen emergence in asexual filamentous fungi.
doi: https://doi.org/10.1101/721753
DOI
Computer Programs
We have developed a number of
computer programs
for analysis of DNA and protein sequences.
The programs below are used by scientists around the world in their research.
SIM: A local sequence alignment program,
MAP: A multiple sequence alignment program,
CAP3 and PCAP: Sequence and genome assembly programs
(downloading),
AAT: A set of programs for finding genes in DNA sequences
(downloading).