Querying and Mining Chemical Databases for Drug Discovery
- Degree Grantor:
- University of California, Santa Barbara. Computer Science
- Degree Supervisor:
- Ambuj K. Singh
- Place of Publication:
- [Santa Barbara, Calif.]
- University of California, Santa Barbara
- Creation Date:
- Issued Date:
- Computer Science
- Drug Discovery,
Machine Learning, and
- Dissertations, Academic and Online resources
- Ph.D.--University of California, Santa Barbara, 2012
Drug discovery and development has exploded into a multibillion-dollar industry. Unfortunately, despite a steady increase in pharmaceutical research, the number of new drugs discovered has been, at best, flat. The low productivity of current approaches to drug discovery has been ascribed to a number of factors, including a narrow focus on a single protein target and undesirable effects, such as toxicity, that are detected too late in the discovery process. In this dissertation, I propose strategies to combat this low productivity and show that integrating the principles of statistical significance and diversity into the molecular analysis framework can accelerate the rate of drug discovery.
In the first part of my thesis, I explore the importance of mining statistically significant patterns from large collections of scientific data and demonstrate their utility in drug discovery. I show that over-represented subgraphs in molecular databases are correlated with biological activity and can be used to learn accurate classification models. Furthermore, statistically significant pharmacophoric patterns can be employed to predict the binding mechanisms between small molecules and protein targets. Finally, I show that mining discriminative subgraphs from protein-protein interaction networks allows us to learn the complex network-encoded logic functions that decide the clinical outcomes of diseases.
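As a rough illustration of what "statistically significant" can mean for an over-represented pattern, the sketch below computes a binomial tail p-value for the support of a single substructure. It is not the dissertation's method: the independence assumption, the function name `binomial_tail_p_value`, the background probability, and all the counts are hypothetical and chosen only for the example.

```python
from math import comb


def binomial_tail_p_value(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p).

    Assumption (for illustration only): each of the n molecules contains
    the pattern independently with background probability p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: a substructure expected in 5% of molecules by
# chance is observed in 12 of 50 active molecules.
p_value = binomial_tail_p_value(n=50, k=12, p=0.05)

# The 0.01 cutoff is an arbitrary choice for this sketch.
is_significant = p_value < 0.01
```

A small p-value here says only that the observed support would be unlikely under the assumed background model; a real analysis over many candidate subgraphs would also need to account for multiple testing.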
In the second part of my thesis, I explore the importance of structural diversity in top-k queries, and develop index structures to answer such queries in a scalable manner. First, I explore the importance of modeling attractive and repulsive dimensions in molecular analysis and demonstrate their utility in going beyond traditional similarity or distance measures. Next, I show that diversity-aware top-k answer sets are informationally denser than traditional top-k answer sets.
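The idea of a diversity-aware top-k answer set can be sketched with a simple greedy filter: walk the candidates in order of relevance and keep one only if it is sufficiently far from everything already kept. This is a minimal sketch under stated assumptions, not the index structures developed in the dissertation; the Jaccard distance over feature sets, the `min_distance` threshold, the function names, and the example molecules are all hypothetical.

```python
def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """Return 1 - |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def diverse_top_k(candidates, k, min_distance):
    """Greedily pick up to k names, highest score first.

    candidates: iterable of (name, score, features) tuples.
    A candidate is kept only if its Jaccard distance to every item
    already kept is at least min_distance.
    """
    selected = []
    for name, score, features in sorted(candidates, key=lambda c: c[1], reverse=True):
        if len(selected) >= k:
            break
        if all(jaccard_distance(features, f) >= min_distance for _, _, f in selected):
            selected.append((name, score, features))
    return [name for name, _, _ in selected]


# Hypothetical molecules described by sets of structural features.
candidates = [
    ("mol_a", 0.95, frozenset({"ring", "amine", "halogen"})),
    ("mol_b", 0.93, frozenset({"ring", "amine", "halogen", "ester"})),
    ("mol_c", 0.80, frozenset({"sulfonamide", "ketone"})),
    ("mol_d", 0.70, frozenset({"ring", "hydroxyl"})),
]

plain = diverse_top_k(candidates, k=2, min_distance=0.0)
diverse = diverse_top_k(candidates, k=2, min_distance=0.5)
```

With the threshold at 0.0 the function reduces to an ordinary top-k by score and returns the two near-duplicates `mol_a` and `mol_b`. With the threshold at 0.5 it skips `mol_b` and returns `mol_a` and `mol_c`, which cover more distinct structural features in the same number of answers.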
Overall, this thesis proposes core indexing and mining algorithms that extend the current state of the art in computer science research. Among the various applications of the developed algorithms, their impact on drug discovery is the unifying theme that binds the chapters together. However, these methods are also applicable in other scientific domains, such as software bug mining and the analysis of communication graphs, social networks, sensor networks, and transportation networks.
- Physical Description:
- 1 online resource (293 pages)
- UCSB electronic theses and dissertations
- Catalog System Number:
- Sayan Ranu, 2012
- In Copyright
- Copyright Holder:
- Sayan Ranu
Access: This item is restricted to on-campus access only. Please check our FAQs or contact UCSB Library staff if you need additional assistance.