Alexandria Digital Research Library

Querying and Mining Chemical Databases for Drug Discovery

Author:
Ranu, Sayan
Degree Grantor:
University of California, Santa Barbara. Computer Science
Degree Supervisor:
Ambuj K. Singh
Place of Publication:
[Santa Barbara, Calif.]
Publisher:
University of California, Santa Barbara
Creation Date:
2012
Issued Date:
2012
Topics:
Computer Science
Keywords:
Drug Discovery
Indexing
Data Mining
Machine Learning
Databases
Genres:
Dissertations, Academic and Online resources
Dissertation:
Ph.D.--University of California, Santa Barbara, 2012
Description:

Drug discovery and development has exploded into a multi-billion-dollar industry. Unfortunately, despite a steady increase in pharmaceutical research, the number of new drugs discovered has been, at best, flat. The low productivity of current approaches to drug discovery has been ascribed to a number of factors, including a narrow focus on a single protein target and undesirable effects, such as toxicity, that are discovered too late in the discovery process. In this dissertation, I propose strategies to combat the low productivity of current drug-discovery techniques and show that by integrating the principles of statistical significance and diversity into the molecular analysis framework, we can accelerate the rate of drug discovery.

In the first part of my thesis, I explore the importance of mining statistically significant patterns from large collections of scientific data and demonstrate their utility in drug discovery. I show that over-represented subgraphs in molecular databases are correlated with biological activity and can be used to learn accurate classification models. Furthermore, statistically significant pharmacophoric patterns can be employed to predict the binding mechanisms between small molecules and protein targets. Finally, I show that mining discriminative subgraphs from protein-protein interaction networks allows us to learn the complex network-encoded logic functions that decide the clinical outcomes of diseases.

In the second part of my thesis, I explore the importance of structural diversity in top-k queries, and develop index structures to answer such queries in a scalable manner. First, I explore the importance of modeling attractive and repulsive dimensions in molecular analysis and demonstrate their utility in going beyond traditional similarity or distance measures. Next, I show that diversity-aware top-k answer sets are informationally denser than traditional top-k answer sets.

Overall, this thesis proposes core indexing and mining algorithms that extend the current state of the art in computer science research. Among the various applications of the developed algorithms, impact in the field of drug discovery acts as the unifying theme binding all of the chapters together. However, these methods are also applicable in other scientific domains such as software bug mining, analysis of communication graphs, social networks, sensor networks, and transportation networks.

Physical Description:
1 online resource (293 pages)
Format:
Text
Collection(s):
UCSB electronic theses and dissertations
ARK:
ark:/48907/f3765c7m
ISBN:
9781267294821
Catalog System Number:
990037519120203776
Rights:
In Copyright
Copyright Holder:
Sayan Ranu
Access: This item is restricted to on-campus access only.