ARTIFICIAL INTELLIGENCE RESEARCH LABORATORY
    Center for Computational Intelligence, Learning, and Discovery
    Department of Computer Science


Learning Classifiers From Autonomous, Semantically Heterogeneous, Distributed Data


Personnel

Faculty


Ph.D. Alumni


Current Ph.D. Students


Project Summary

Advances in networks, sensors, storage, computing, and high-throughput data acquisition have led to a proliferation of autonomous, distributed data sources in many areas of human activity. New discoveries in the biological, physical, and social sciences and in engineering are being driven by our ability to discover, share, integrate, and analyze disparate types of data. Statistically based machine learning algorithms offer some of the most cost-effective approaches to the discovery of experimentally testable predictive models and hypotheses from data. However, the large size, distributed nature, and autonomy of the data sources (and the attendant differences in access, queries allowed, processing capabilities, structure, organization, and underlying data models and data semantics) present hurdles to the effective use of machine learning. This research aims to overcome these hurdles by developing efficient, resource-aware distributed algorithms and software services to support collaborative, integrative knowledge acquisition in such a setting. The research team will implement, deploy, and evaluate the resulting algorithms using benchmark data sets, associated data models and ontologies, and user-specified inter-ontology mappings on a distributed test-bed of networked databases and services at Iowa State University and Kansas State University. The resulting open-source software can potentially transform collaborative e-science in the same way that the Web has transformed information sharing. Broader impacts of this research include enhanced opportunities for research-based training of graduate and undergraduate students, interdisciplinary collaborations, participation of under-represented groups, and development of increasingly sophisticated software to support collaborative, integrative e-science.


Funding

At present, the primary source of funding for this project is:

Additional support for the project has come from:

The project has benefited from work supported by related, but non-overlapping grants including:

This work builds on the results of an NSF-supported ITR project:


Publications

Books

  1. Honavar, V., Caragea, D., Koul, N., Zhang, J., Silvescu, A. and Lin, H. (2014). Learning Predictive Models from Statistical Queries against Big Data. In press.

Refereed Journal and Conference Publications

  1. Lee, S. and Honavar, V. (2013). Transportability of a Causal Effect from Multiple Environments. In: Proceedings of the 27th Conference on Artificial Intelligence (AAAI 2013).

  2. Lee, S. and Honavar, V. (2013). Causal Transportability of Experiments on Controllable Subsets of Variables: z-Transportability. In: Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI 2013).

  3. Lin, H. and Honavar, V. (2013). Learning Classifiers from Chains of Multiple Interlinked RDF Data Stores. In: IEEE Big Data Congress. Best Student Paper Award.

  4. Lin, H., Lee, S., Bui, N. and Honavar, V. (2013). Learning Classifiers from Distributional Data. In: IEEE Big Data Congress.

  5. Bui, N. and Honavar, V. (2013). On the Utility of Abstraction in Labeling Actors in Social Networks. In: The 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.

  6. Andorf, C., Honavar, V. and Sen, T. (2013). Predicting the Binding Patterns of Proteins: A Study Using Yeast Protein Interaction Networks. PLOS One 8(2): e56833, doi: 10.1371/journal.pone.0056833

  7. Letao Qi, Harris T. Lin, Vasant Honavar: Clustering remote RDF data using SPARQL update queries. In: The 4th International Workshop on Graph Data Management: Techniques and Applications (GDM 2013), ICDE Workshops 2013: 236-242

  8. Xue, L., Jordan, R., El-Manzalawy, Y., Dobbs, D., and Honavar, V. (2013) DockRank: Ranking Docked Conformations Using Partner-Specific Sequence Homology Based Protein Interface Prediction. Proteins: Structure, Function and Bioinformatics.

  9. El-Manzalawy, Y., Dobbs, D., and Honavar, V. (2012). Predicting protective bacterial antigens using random forest classifiers. ACM Conference on Bioinformatics and Computational Biology. pp. 426-433, 2012.

  10. Jordan, R., El-Manzalawy, Y., Dobbs, D., and Honavar, V. (2012). Predicting protein-protein interface residues using local surface structural similarity. BMC Bioinformatics 2012, 13:41 doi:10.1186/1471-2105-13-41. Highly Accessed.

  11. Tao, J., Slutzki, G., and Honavar, V. (2012). PSpace Tableau Algorithms for Acyclic Modalized ALC. Journal of Automated Reasoning. Vol. 49. pp. 551-582

  12. Tu, K. and Honavar, V. (2012). Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars. In: Proceedings of EMNLP-CoNLL 2012 : Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. pp. 1324-1334.

  13. Walia, R., Caragea, C., Lewis, B., Towfic, F., Terribilini, M., El-Manzalawy, Y., Dobbs, D., Honavar, V. (2012). Protein-RNA Interface Residue Prediction Using Machine Learning: An Assessment of the State of the Art. BMC Bioinformatics 13:89 doi:10.1186/1471-2105-13-89.

  14. El-Manzalawy, Y., Dobbs, D., and Honavar, V. (2011). Predicting MHC-II binding affinity using multiple instance regression. IEEE/ACM Transactions on Computational Biology and Bioinformatics. DOI: 10.1109/TCBB.2010.94

  15. Lin, H., Koul, N., and Honavar, V. (2011). Learning Relational Bayesian Classifiers from RDF Data. In: Proceedings of the International Semantic Web Conference (ISWC 2011). In press.

  16. Silvescu, A. and Honavar, V. (2011). Abstraction Super-structuring Normal Forms: Towards a Theory of Structural Induction. In: The Proceedings of the Solomonoff 85th Memorial Conference. Springer-Verlag Lecture Notes in Artificial Intelligence. In press.

  17. Tu, K. and Honavar, V. (2011). On the Utility of Curricula in Unsupervised Learning of Grammars. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI 2011) pp. 1523-1528.

  18. Tu, K., Ouyang, X., Han, D., Yu, Y., and Honavar, V. (2011). Exemplar-based Robust Coherent Biclustering. In: Proceedings of the SIAM Conference on Data Mining (SDM 2011). pp. 884-895.

  19. Yakhnenko, O., and Honavar, V. (2011). Multi-Instance Multi-Label Learning for Image Classification with Large Vocabularies. In: British Machine Vision Conference. In press.

  20. Caragea, C., Silvescu, A., Caragea, D. and Honavar, V. (2010). Abstraction-Augmented Markov Models. In: Proceedings of the IEEE Conference on Data Mining (ICDM 2010). IEEE Press. pp. 68-77.

  21. Caragea, C. Silvescu, A., Caragea, D., and Honavar, V. (2010). Semi-supervised prediction of protein subcellular localization using abstraction augmented Markov models. BMC Bioinformatics. doi: 10.1186/1471-2105-11-S8-S6.

  22. Koul, N., Bui, N., and Honavar, V. (2010). Scalable, Updatable Predictive Models for Sequence Data. In: Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2010).

  23. Koul, N. and Honavar, V. (2010). Learning in the Presence of Ontology Mapping Errors. In: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. pp. 291-296. ACM Press.

  24. Pandit, S., and Honavar, V. (2010). Ontology-Guided Extraction of Complex Nested Relationships from Text. IEEE Conference on Tools With Artificial Intelligence (ICTAI 2010). pp. 173-178.

  25. Sanghvi, B., Koul, N., and Honavar, V. (2010). Identifying and Eliminating Inconsistencies in Mappings across Hierarchical Ontologies. In: Springer-Verlag Lecture Notes in Computer Science Vol. 6427, pp. 999-1008. Berlin: Springer.

  26. Bao, J., Voutsadakis, G., Slutzki, G., and Honavar, V. (2009). Package-Based Description Logics. In: Modular Ontologies: Concepts, Theories and Techniques for Knowledge Modularization. Lecture Notes in Computer Science Vol. 5445, pp. 349-371.

  27. Bromberg, F., Margaritis, D., and Honavar, V. (2009). Efficient Markov Network Structure Discovery from Independence Tests. Journal of Artificial Intelligence Research. Vol. 35. pp. 449-485.

  28. Caragea, C., Caragea, D., and Honavar, V. (2009). Learning Link-Based Classifiers from Ontology-Extended Textual Data. In: Proceedings of the IEEE Conference on Tools with Artificial Intelligence.

  29. Caragea, C., Caragea, D., and Honavar, V. (2009). Learning Link-Based Classifiers from Ontology-Extended Distributed Data. In: Proceedings of the 8th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE). Springer-Verlag Lecture Notes in Computer Science Vol. 5871 pp 1139-1146, Berlin: Springer.

  30. El-Manzalawy, Y. and Honavar, V. (2009). MICCLLR: Multiple-Instance Learning using Class Conditional Log Likelihood Ratio. In: Proceedings of the 12th International Conference on Discovery Science (DS 2009). Springer-Verlag Lecture Notes in Computer Science Vol. 5808, pp. 80-91, Berlin: Springer.

  31. Koul, N., and Honavar, V. (2009). Design and Implementation of a Query Planner for Data Integration. In: Proceedings of the IEEE Conference on Tools with Artificial Intelligence.

  32. Pham, H., Santhanam, G., McCalley, J., and Honavar, V. (2009). BenSOA: a Flexible Service-Oriented Architecture for Power System Asset Management. In Proceedings of the North American Power Symposium (NAPS).

  33. Silvescu, A., Caragea, C. and Honavar, V. (2009). Combining Super-structuring and Abstraction on Sequence Classification. IEEE Conference on Data Mining (ICDM 2009).

  34. Yakhnenko, O., and Honavar, V. (2009). Multi-Modal Hierarchical Dirichlet Process Model for Predicting Image Annotation and Image-Object Label Correspondence. In: Proceedings of the SIAM Conference on Data Mining, SIAM. pp. 281-294

  35. Bao, J., Voutsadakis, G., Slutzki, G., and Honavar, V. (2008). On the Decidability of Role Mappings between Modular Ontologies. In: Proceedings of the 23rd Conference on Artificial Intelligence (AAAI-2008), Menlo Park, CA: AAAI Press, pp. 400-405.

  36. Caragea, D., Cook, D., Wickham, H., and Honavar, V. (2008). Visual Methods for Examining SVM Classifiers. In: Simeon J. Simoff, Michael H. Bohlen, Arturas Mazeika (Eds.): Visual Data Mining - Theory, Techniques and Tools for Visual Analytics. Springer-Verlag Lecture Notes in Computer Science Vol. 4404, pp. 136-153.

  37. Koul, N., Caragea, C., Bahirwani, V., Caragea, D., and Honavar, V. (2008). Learning Classifiers from Large Databases Using Statistical Queries. In: Proceedings of the ACM/IEEE/WIC Conference on Web Intelligence, pp. 923-926.

  38. Tu, K., and Honavar, V. (2008). Unsupervised Learning of Probabilistic Context-Free Grammar using Iterative Biclustering. In: International Colloquium on Grammatical Inference (ICGI-2008). Springer-Verlag Lecture Notes in Computer Science Vol. 5278, pp. 224-237.

  39. Voutsadakis, G., Bao, J., Slutzki, G., and Honavar, V. (2008). F-ALCI: A Fully Contextualized, Federated Logic for the Semantic Web. Proceedings of the ACM/IEEE/WIC Conference on Web Intelligence, Sydney, Australia.

  40. Yakhnenko, O. and Honavar, V. (2008). Annotating Images and Image Objects using a Hierarchical Dirichlet Process Model. 9th International Workshop on Multimedia Data Mining (SIGKDD MDM 2008), Las Vegas, ACM.

  41. Bao, J., Slutzki, G., and Honavar, V. (2007). A Semantic Importing Approach to Knowledge Reuse from Multiple Ontologies. In: Proceedings of the 22nd Conference on Artificial Intelligence (AAAI-2007). Vancouver, Canada. pp. 1304-1309. AAAI Press.

  42. Bao, J., Slutzki, G., and Honavar, V. (2007). Privacy-Preserving Reasoning on the Semantic Web. IEEE/WIC/ACM Conference on Web Intelligence. IEEE. pp. 791-797

Book Chapters

  1. Bao, J., Slutzki, G., and Honavar, V. (2009). P-DL: A Semantic Importing Approach to Selective Knowledge Reuse in Modular Ontologies. In: Ontology Modularization. Parent, C., Spaccapietra, S., and Stuckenschmidt, H. (Ed). Berlin: Springer.

  2. Honavar, V. and Caragea, D. (2008). Towards a Semantics-Enabled Infrastructure for Knowledge Acquisition from Distributed Data. In: Next Generation Data Mining. Kargupta, H. et al. (ed). Taylor and Francis.

  3. Caragea, C. and Honavar, V. (2008). Machine Learning in Computational Biology. In: Encyclopedia of Database Systems, (Raschid, L., Editor), Springer.

  4. Caragea, D., Cook, D., Wickham, H., and Honavar, V. (2008). Visual Methods for Examining SVM Classifiers. In: Visual Data Mining: Theory, Techniques, and Tools for Visual Analytics. Springer.

  5. Caragea, D. and Honavar, V. (2008). Learning Classifiers from Distributed Data. In: Encyclopedia of Database Technologies and Applications, Ferraggine, V.E., Doorn, J.H., and Rivero, L.C. (Ed). New York: Idea Group.

  6. Caragea, D. and Honavar, V. (2008). Learning Classifiers from Semantically Heterogeneous Data. In: Encyclopedia of Data Warehousing and Mining, Wang, J. (ed).

Plenary Talks and Invited Lectures at Conferences

  1. Honavar, V. (2007). Invited Talk, Making Biology and Medicine a Predictive Science. NSF Workshop on Biomedical Informatics. Oregon, 2007.

  2. Honavar, V. (2007). Invited Talk, Knowledge Acquisition from Semantically Disparate Distributed Data. NSF Workshop on Next Generation Data Mining and Cyber‐Enabled Discovery, Baltimore, Maryland, 2007.

  3. Honavar, V. (2007). Invited Lecture. On Selective Sharing and Reuse of Ontologies. Semantic Technologies Conference, San Jose, CA, May 2007.

Tutorials

  1. Honavar, V. and Caragea, D. Tutorial: Collaborative Knowledge Acquisition from Semantically Disparate, Distributed Data Sources, 2006 International Symposium on Collaborative Technologies and Systems, Las Vegas, Nevada, USA, May 2006.

  2. Honavar, V. and Caragea, D. Semantic Web Technologies for Collaborative Knowledge Acquisition, International Conference on Digital Information Management, Bangalore, India, December 2006.

Invited Colloquia

  1. Honavar, V. (2008). Invited Colloquium, Semantics-Enabled Infrastructure for Collaborative, Integrative e‐Science. School of Information Technology, Jawaharlal Nehru University, New Delhi, India, December 2008.

  2. Honavar, V. (2008). Invited Talk, Computational Sciences. High Performance Computing Center, Jawaharlal Nehru University, New Delhi, India, December 2008.

  3. Honavar, V. (2008). Invited Colloquium, Semantics-Enabled Infrastructure for Collaborative, Integrative e‐Science. Yahoo!, Bangalore, India, January 2008.

  4. Honavar, V. (2007). Semantic Web for Collaborative Knowledge Acquisition. HP Research labs, Bangalore, India.

  5. Honavar, V. (2006). Invited Colloquium, Algorithms and Software for Knowledge Acquisition from Semantically Heterogeneous, Distributed Data Sources. Dept. of Electrical and Computer Engineering. University of Iowa.

  6. Honavar, V. (2006). Invited Colloquium, Algorithms and Software for Collaborative Discovery in Systems Biology. Dept. Biostatistics, Bioinformatics and Epidemiology. Medical University of South Carolina.

  7. Honavar, V. (2005). Invited Talk, Algorithms and Software for Knowledge Acquisition from Semantically Heterogeneous, Distributed, Autonomous Information Sources. Google Research.

Doctoral Dissertations

  1. Neeraj Koul (Computer Science), Ph.D., 2011. Thesis: Learning Predictive Models from Massive, Semantically Disparate Data.

  2. George Voutsadakis (Computer Science, with Giora Slutzki), Ph.D., 2010. Thesis: Federated Description Logics for the Semantic Web.

  3. Cornelia Caragea (Computer Science), Ph.D., 2009. Thesis: Abstraction-Based Probabilistic Models for Sequence Classification.

  4. Oksana Yakhnenko (Honavar). Ph.D., Computer Science, 2009. Thesis: Generative and Discriminative Models for Learning from Weakly-Labeled and Unlabeled Data with Application to Learning from Images and Text.

  5. Adrian Silvescu (Honavar). Ph.D., Computer Science, 2008. Thesis: Structural Induction: Towards Automatic Ontology Elicitation

  6. Jie Bao (Honavar). Ph.D., Computer Science, 2007. Thesis: Representing and Reasoning with Modular Ontologies.

  7. Jyotishman Pathak (Honavar). Ph.D., Computer Science, 2007. Thesis: Interactive and Verifiable Web Service Composition, Reformulation, and Adaptation.

  8. Dae-Ki Kang (2006). Abstraction, Aggregation and Recursion for Generating Accurate and Simple Classifiers. Doctoral Dissertation. Department of Computer Science, Iowa State University.

  9. Jun Zhang (2005). Learning Ontology Aware Classifiers. Doctoral Dissertation. Department of Computer Science, Iowa State University.

  10. Doina Caragea (2004). Learning Classifiers From Distributed, Semantically Heterogeneous, Autonomous Data Sources. Doctoral Dissertation. Department of Computer Science, Iowa State University.

Selected Publications from the Earlier ITR project

  1. Bao, J., Caragea, D., and Honavar, V. (2006). On the Semantics of Linking and Importing in Modular Ontologies. In: Proceedings of the International Semantic Web Conference (ISWC 2006), Lecture Notes in Computer Science Vol. 4273, pp. 72-86. Berlin: Springer.

  2. Bao, J., Caragea, D., and Honavar, V. (2006). A Tableau Based Federated Reasoning Algorithm for Modular Ontologies. In: Proceedings of the ACM/IEEE/WIC Conference on Web Intelligence. IEEE Press. pp. 404-410.

  3. Bao, J., Caragea, D., and Honavar, V. Modular Ontologies - A Formal Investigation of Semantics and Expressivity. In Proceedings of the First Asian Semantic Web Conference, Beijing, China, Springer-Verlag Lecture Notes in Computer Science Vol. 4185, pp. 616-631. Best Paper Award, 2006.

  4. Kang, D-K., Silvescu, A. and Honavar, V. RNBL-MN: A Recursive Naive Bayes Learner for Sequence Classification. Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006). Lecture Notes in Computer Science, Berlin: Springer-Verlag. pp. 45-54, 2006.

  5. Vasile, F., Silvescu, A., Kang, D-K., and Honavar, V. TRIPPER: An Attribute Value Taxonomy Guided Rule Learner. Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Berlin: Springer-Verlag. pp. 55-59, 2006.

  6. Zhang, J., Kang, D-K., Silvescu, A. and Honavar, V. Learning Compact and Accurate Naive Bayes Classifiers from Attribute Value Taxonomies and Data. Knowledge and Information Systems. Vol. 9. No. 2. pp. 157-179, 2006.

  7. Caragea, D., Zhang, J., Bao, J., Pathak, J., and Honavar, V. (2005). Algorithms and Software for Collaborative Discovery from Autonomous, Semantically Heterogeneous Information Sources (Invited paper). In: Proceedings of the 16th International Conference on Algorithmic Learning Theory. Lecture Notes in Computer Science. Singapore. Vol. 3734. pp. 13-44. Berlin: Springer-Verlag.

  8. Caragea, D., Silvescu, A., Pathak, J., Bao, J., Andorf, C., Dobbs, D., and Honavar, V. (2005). Information Integration and Knowledge Acquisition from Semantically Heterogeneous Biological Data Sources. In: Data Integration in Life Sciences (DILS 2005) Springer-Verlag Lecture Notes in Computer Science. San Diego. Vol. 3615. pp. 175-190. Berlin: Springer-Verlag.

  9. Caragea, D., Bao, J., Pathak, J., Andorf, C., Dobbs, D., and Honavar, V. Information Integration from Semantically Heterogeneous Biological Data Sources. Proceedings of the Sixteenth International Workshop on Databases and Expert Systems Applications (DEXA 05), Copenhagen, IEEE Computer Society. pp. 580-584, 2005.

  10. Kang, D-K., Zhang, J., Silvescu, A., and Honavar, V. Multinomial Event Model Based Abstraction for Sequence and Text Classification. Proceedings of the Symposium on Abstraction, Reformulation, and Approximation (SARA 2005), Edinburgh, UK, Berlin: Springer-Verlag. Vol. 3607. pp. 134-148, 2005.

  11. Yakhnenko, O., Silvescu, A., and Honavar, V. Discriminatively Trained Markov Model for Sequence Classification. IEEE Conference on Data Mining (ICDM 2005), Houston, Texas, IEEE Press, 2005.

  12. Zhang, J., Caragea, D. and Honavar, V. (2005). Learning Ontology-Aware Classifiers. In: Proceedings of the 8th International Conference on Discovery Science. Springer-Verlag Lecture Notes in Computer Science. Singapore. Vol. 3735. pp. 308-321. Berlin: Springer-Verlag.

  13. Caragea, D., Pathak, J., and Honavar, V. (2004). Learning Classifiers from Semantically Heterogeneous Data. In: Proceedings of the International Conference on Ontologies, Databases, and Applications of Semantics (ODBASE 2004), Agia Napa, Cyprus, 2004.

  14. Caragea, D., Pathak, J. and Honavar, V. (2004). Learning Classifiers from Semantically Heterogeneous Data. In: International Conference on Ontologies, Databases, and Applications of Semantics (ODBASE 2004). Springer-Verlag Lecture Notes in Computer Science. Agia Napa, Cyprus. Vol. 3291. pp. 963-980. Springer-Verlag.

  15. Caragea, D., Silvescu, A., and Honavar, V. (2004). A Framework for Learning from Distributed Data Using Sufficient Statistics and its Application to Learning Decision Trees. International Journal of Hybrid Intelligent Systems. Vol. 1. pp. 80-89.

  16. Kang, D-K., Silvescu, A., Zhang, J., and Honavar, V. (2004). Generation of Attribute Value Taxonomies from Data for Data-Driven Construction of Accurate and Compact Classifiers. In: Proceedings of the IEEE International Conference on Data Mining.

  17. Zhang, J. and Honavar, V. (2004). AVT-NBL - An Algorithm for Learning Compact and Accurate Naive Bayes Classifiers from Attribute Value Taxonomies and Data. In: Proceedings of the IEEE International Conference on Data Mining.

  18. Atramentov, A., Leiva, H., and Honavar, V. (2003). A Multi-Relational Decision Tree Learning Algorithm - Implementation and Experiments. In: Proceedings of the Thirteenth International Conference on Inductive Logic Programming. Berlin: Springer-Verlag.

  19. Zhang, J. and Honavar, V. (2003). Learning Decision Tree Classifiers from Attribute Value Taxonomies and Partially Specified Data. In: Proceedings of the International Conference on Machine Learning (ICML-03). Washington, DC. In press.

  20. Zhang, J., Silvescu, A., and Honavar, V. (2002). Ontology-Driven Induction of Decision Trees at Multiple Levels of Abstraction. In: Proceedings of Symposium on Abstraction, Reformulation, and Approximation. Berlin: Springer-Verlag.

  21. Caragea, D., Silvescu, A., and Honavar, V. (2001). Invited Chapter. Towards a Theoretical Framework for Analysis and Synthesis of Agents That Learn from Distributed Dynamic Data Sources. In: Emerging Neural Architectures Based on Neuroscience. Berlin: Springer-Verlag.

Software