Artificial Intelligence Research Laboratory
Department of Computer Science
Iowa State University
Interactive Visual Overviews of Large Multi-Dimensional Datasets
Recent advances in high-throughput data acquisition, digital storage, and computer and communications technologies have made it possible to gather, store, and transmit large volumes of data. Translating these advances into fundamental gains in understanding of the respective domains requires sophisticated computational tools that assist in the knowledge discovery process. Given the large volumes of data and the broad range of scientifically relevant and potentially complex interrelationships that might be of interest, machine learning or data mining algorithms offer one of the most practical and cost-effective approaches to data-driven knowledge discovery. However, fully automated knowledge discovery is beyond the current state of the art in artificial intelligence, and we still need the "little-understood ability of human beings to 'see the big picture' and 'know where to look' when presented with visual data".
The proposed research seeks to develop sophisticated dynamic graphics tools for interactive exploratory analysis of very large datasets. Used in conjunction with data mining algorithms, these tools will enable the user to overview both the data space and the complex relationships discovered by the algorithms. This would significantly enhance the utility of machine learning algorithms for interactive, data-driven knowledge discovery from large, high-dimensional datasets.
The proposed research brings together a team of researchers with complementary interests and expertise in statistics and visualization, artificial intelligence, machine learning, bioinformatics, and databases and information management. The team will develop a modular and extensible software toolbox for user-driven, computer-assisted, interactive exploration of extremely large, high-dimensional datasets. This research focuses on tools for visualizing large, multi-dimensional data.
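As a rough illustration of what such a dynamic graphics tool might compute, the sketch below precomputes a stack of 2-D projections of a high-dimensional dataset and rasterizes each one into a fixed-size image of per-pixel counts, so the cost of displaying a frame does not depend on the number of cases. All names here (`random_frame`, `density_image`, `build_stack`) are hypothetical and not taken from the project's software. The sketch also draws independent random projection planes, whereas a grand tour would interpolate smoothly between planes; that simplification is an assumption made to keep the example short.

```python
import math
import random


def random_frame(dim, rng):
    """Return two orthonormal vectors in R^dim (a random 2-D projection plane)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm_a = math.sqrt(sum(x * x for x in a))
    a = [x / norm_a for x in a]
    # Gram-Schmidt: remove the component of b along a, then normalize.
    dot = sum(x * y for x, y in zip(a, b))
    b = [y - dot * x for x, y in zip(a, b)]
    norm_b = math.sqrt(sum(y * y for y in b))
    b = [y / norm_b for y in b]
    return a, b


def density_image(data, frame, bins=32, radius=1.0):
    """Project each case onto the frame and count cases per pixel.

    The image is always bins x bins, however many cases there are, and
    storing counts instead of overdrawn points keeps dense regions visible.
    """
    a, b = frame
    image = [[0] * bins for _ in range(bins)]
    for row in data:
        u = sum(x * w for x, w in zip(row, a))
        v = sum(x * w for x, w in zip(row, b))
        # Map [-radius, radius] to pixel indices; clamp outliers to the border.
        i = min(bins - 1, max(0, int((u + radius) / (2 * radius) * bins)))
        j = min(bins - 1, max(0, int((v + radius) / (2 * radius) * bins)))
        image[j][i] += 1
    return image


def build_stack(data, n_frames, seed=0, bins=32, radius=1.0):
    """Precompute one density image per random projection plane."""
    rng = random.Random(seed)
    dim = len(data[0])
    return [density_image(data, random_frame(dim, rng), bins, radius)
            for _ in range(n_frames)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in data: 1000 cases in 6 dimensions.
    cases = [[rng.uniform(-0.4, 0.4) for _ in range(6)] for _ in range(1000)]
    stack = build_stack(cases, n_frames=10, bins=16)
    print(len(stack), "frames of", len(stack[0]), "x", len(stack[0][0]), "pixels")
```

A viewer could then step forward or backward through `stack` by index, much like frames of a movie, without touching the raw data again.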
The objective is to scale up proven visual methods to work with larger amounts of data. The researchers plan to integrate the dynamic graphics tools of the grand tour with machine learning algorithms to provide an interactive, user-centered collection of tools for knowledge discovery from very large datasets. Specifically, machine learning algorithms will be an integral part of the knowledge discovery toolbox, along with the dynamic graphics tools. These include dimensionality reduction techniques (e.g., principal component analysis, independent component analysis), clustering, neural networks, statistical methods (discriminant analysis, density estimation), decision tree or rule induction algorithms, and feature extraction and feature subset selection techniques.
Two approaches to scaling the tour algorithm to datasets of arbitrary size will be explored. The first uses texture maps/images to display the projected data: a separate process generates grand tour projections of the data, renders them as maps, and stores them in a stack. The second selects useful subsets of cases (for example, using support vector machines and related machine learning algorithms) and chooses projections intelligently. The visual interface for the tour will pull projections/images out of the stack as needed to provide a smooth movie of the data, and will offer interaction tools like those of a video recorder: play, forward/backward, freeze frame, zoom, and pan. In addition, ways to compensate for overplotting and ways to interact with images to facilitate drill-down will be explored. The integration of these tools with data mining algorithms will enable the user to:
© Vasant Honavar, 1999.