Efficient Haplotype Inference on Pedigrees and Haplotype Based Disease Gene Mapping
April 13, 2004
Date: Tuesday, April 13, 2004
Time: 11am-12:15pm
Location: Woodward 149
Jing Li
Ph.D. Candidate, Department of Computer Science and Engineering, University of California, Riverside
Abstract: With the completion of the Human Genome Project, an (almost) complete human genomic DNA sequence has become available. An important next step in human genomics is to determine genetic variations among humans and the correlation between genetic variations and phenotypic variations (such as disease status, quantitative traits, etc.). The patterns of human DNA sequence variations can be described by SNP (single nucleotide polymorphism) haplotypes. However, humans are diploid and, in practice, haplotype data cannot be collected directly, especially in large-scale sequencing projects, mainly because of cost. Instead, genotype data are collected routinely in such projects. Hence, efficient and accurate computational methods and programs for inferring haplotypes from genotypes are in high demand. We are interested in the haplotype inference problem on pedigrees and in haplotype-based association mapping methods for identifying disease genes. We study haplotype reconstruction on pedigree data under the Mendelian law of inheritance and the minimum recombination principle. We prove that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. We propose an iterative algorithm, called block-extension, based on blocks of consecutive resolved marker loci. It is very efficient and accurate for data sets requiring few recombinants. We also present a polynomial-time exact algorithm for haplotype reconstruction without recombinants. This algorithm first identifies all the necessary constraints based on the Mendelian law and the zero-recombinant assumption, and represents them as a system of linear equations over the cyclic group Z2. All feasible haplotype configurations can then be obtained by Gaussian elimination.
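To make the last step concrete, the following is a minimal sketch of Gaussian elimination over Z2, where addition is XOR. It is a generic solver, not the PedPhase implementation: the function name `solve_gf2` and the input layout (a 0/1 coefficient matrix plus a 0/1 right-hand side) are assumptions made for illustration, and how the Mendelian constraints are turned into equations is not reproduced here.

```python
def solve_gf2(rows, rhs):
    """Solve A x = b over Z2 (hypothetical helper, for illustration only).

    rows: list of equal-length lists of 0/1 coefficients
    rhs:  list of 0/1 right-hand sides, one per row
    Returns (particular_solution, free_columns), or None if inconsistent.
    """
    n_vars = len(rows[0]) if rows else 0
    # Augmented matrix [A | b]; copied so the inputs are not modified.
    aug = [list(r) + [b] for r, b in zip(rows, rhs)]
    pivot_cols = []
    r = 0  # index of the next pivot row
    for c in range(n_vars):
        # Look for a row at or below r with a 1 in column c.
        pivot = next((i for i in range(r, len(aug)) if aug[i][c]), None)
        if pivot is None:
            continue  # no pivot here: column c is a free variable
        aug[r], aug[pivot] = aug[pivot], aug[r]
        # Clear column c in every other row (row addition over Z2 is XOR).
        for i in range(len(aug)):
            if i != r and aug[i][c]:
                aug[i] = [x ^ y for x, y in zip(aug[i], aug[r])]
        pivot_cols.append(c)
        r += 1
    # Remaining rows have all-zero coefficients; a 1 on the right is 0 = 1.
    if any(row[-1] for row in aug[r:]):
        return None
    # Particular solution with every free variable set to 0.
    solution = [0] * n_vars
    for i, c in enumerate(pivot_cols):
        solution[c] = aug[i][-1]
    free_cols = [c for c in range(n_vars) if c not in pivot_cols]
    return solution, free_cols
```

If the system is consistent and has k free columns, it has 2^k solutions in total, which is one way to read the statement that all feasible configurations can be enumerated once elimination is done.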
For genotypes with missing alleles, we develop an effective integer linear programming (ILP) formulation of the MRHC problem and a branch-and-bound strategy that uses a partial order relationship (and some other special relationships) among variables to decide the branching order. When multiple solutions exist, the best haplotype configuration is selected using a maximum likelihood approach. The ILP algorithm works for any pedigree structure, regardless of the number of recombinants, and is effective for problems of any practical size. We have implemented the above algorithms in a software package called PedPhase and tested them on simulated data sets as well as on a real data set. The results show that the algorithms perform very well. Haplotype information is very valuable for disease gene association mapping, an important problem in biomedical research. We also develop a new algorithmic method for haplotype mapping of case-control data based on a density-based clustering algorithm, and propose a new haplotype (dis)similarity measure. The method treats haplotype segments as data points in a high-dimensional space, and clusters are then identified using a density-based clustering algorithm. A Z-score based on the numbers of cases and controls in a cluster can be used as an indicator of the degree of association between the cluster and the disease under study. Preliminary experimental results on an independent simulated data set, and on a real data set with a known disease gene location, show that our method can predict gene locations with high accuracy, even when the rate of phenocopies is high.
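The abstract does not give the exact form of the Z-score. As a rough sketch, one common choice is a binomial Z-score that compares the number of cases observed in a cluster with the number expected if cases and controls were spread across clusters in their overall proportions. The function name `cluster_z_score` and this particular formula are assumptions, not necessarily the statistic used in the talk.

```python
import math


def cluster_z_score(cases_in_cluster, controls_in_cluster,
                    total_cases, total_controls):
    """Binomial Z-score for case enrichment in one cluster (assumed form).

    Null hypothesis: each haplotype in the cluster is a case with
    probability p = total_cases / (total_cases + total_controls).
    """
    n = cases_in_cluster + controls_in_cluster
    total = total_cases + total_controls
    if n == 0 or total_cases == 0 or total_controls == 0:
        raise ValueError("need a non-empty cluster and both cases and controls")
    p = total_cases / total
    expected = n * p                      # expected cases in the cluster
    std = math.sqrt(n * p * (1.0 - p))    # binomial standard deviation
    return (cases_in_cluster - expected) / std
```

Under this assumed form, a large positive score marks a cluster with more cases than expected by chance, and a score near zero marks a cluster whose case/control mix matches the overall sample.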
Biography: Jing Li is currently a Ph.D. candidate in the Department of Computer Science and Engineering at the University of California, Riverside. He received a B.S. in Statistics from Peking University, Beijing, China in July 1995 and an M.S. in Statistical Genetics from Creighton University in August 2000. He was a winner of the ACM Student Research Competition in 2003. His recent research interests include bioinformatics and computational molecular biology, algorithms, and statistical genetics. He is particularly interested in developing algorithms for haplotype inference and haplotype-based disease gene mapping.