Next-generation Internet search techniques will greatly improve the ability to sift through the massive, ever-changing body of information posted to the Web and help people use that information to identify critical issues such as homeland security concerns or imminent disease outbreaks, said William H. Hsu, Ph.D., an associate professor of computer and information sciences and director of the Laboratory for Knowledge Discovery in Databases at Kansas State University.
Hsu recently highlighted his research on extracting information from blog user profiles and news articles, and on developing interactive techniques for spatial data cleaning and visualization, during a presentation at the Ohio Supercomputer Center titled “Text-Driven Link Mining: A Multimodal Information Extraction and Annotation Approach.”
In the Web 2.0 realm, Hsu and his team are working to predict links on social networking sites using models that incorporate both user interests and topological features of the network. A rudimentary example of this kind of application is online shopping: you’ve ordered Movie A, and at checkout the site suggests you might also like Movies B, C or D.
“Increasing precision and recall of link existence will help us find relevant features,” Hsu said.
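To make the idea of combining interests with network structure, and of measuring precision and recall for predicted links, a little more concrete, here is a minimal sketch. It is not Hsu’s method: it assumes a simple hand-weighted blend of two standard overlap measures (the Jaccard coefficient of two users’ neighbor sets, a common topological link-prediction feature, and the Jaccard coefficient of their declared interests). The weight `alpha`, the threshold, the function names and the toy users are all hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """|A & B| / |A | B|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def link_score(u, v, friends, interests, alpha=0.5):
    """Hypothetical weighted blend of a topological and an interest-based score."""
    topo = jaccard(friends[u] - {v}, friends[v] - {u})  # shared neighbors
    interest = jaccard(interests[u], interests[v])       # shared interests
    return alpha * topo + (1 - alpha) * interest

def predict_links(friends, interests, threshold, alpha=0.5):
    """Score every pair not already linked; keep pairs at or above the threshold."""
    return {
        frozenset((u, v))
        for u, v in combinations(sorted(friends), 2)
        if v not in friends[u]
        and link_score(u, v, friends, interests, alpha) >= threshold
    }

def precision_recall(predicted, actual):
    """Precision = TP / |predicted|; recall = TP / |actual|."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall

# Invented toy data: observed (symmetric) friendships and declared interests.
friends = {
    "alice": {"bob", "carol", "eve"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "eve": {"alice"},
}
interests = {
    "alice": {"hiking", "movies"},
    "bob": {"movies", "chess"},
    "carol": {"hiking", "movies"},
    "dave": {"chess"},
    "eve": {"chess"},
}
# Links that later turned out to exist (held out from `friends`).
actual = {frozenset(p) for p in [("bob", "carol"), ("dave", "eve"), ("alice", "dave")]}

predicted = predict_links(friends, interests, threshold=0.4)
precision, recall = precision_recall(predicted, actual)
print(sorted(tuple(sorted(p)) for p in predicted))
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy data the sketch predicts three links, two of which are real, so precision and recall both come out at about 0.67. Raising the threshold to 0.6 keeps only the bob–carol link, which lifts precision to 1.0 but drops recall to about 0.33, the usual trade-off between the two measures.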
For his work with K-State's National Agricultural Biosecurity Center, Hsu is also creating a system that extracts relevant information about disease outbreaks from the Web. The goals are twofold: first, to develop search and crawl tools that comb news stories, social media outlets and other public online sources for details about diseases, including initial outbreak dates and locations, the numbers of people or animals affected, and how the diseases have spread; second, to present this information in maps or timelines that help analysts mitigate the spread of human and animal diseases.
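As a rough illustration of the extract-then-present pipeline, the sketch below pulls a date, a place and an affected count out of short report texts and orders the results as a timeline. It is a deliberately simple rule-based stand-in, not the NABC system: it assumes ISO-formatted dates, a tiny hypothetical gazetteer of place names and a hypothetical list of host words, and the sample reports are invented. Real news text would also raise the ambiguity problems Hsu describes (for example, “Ohio” as a state or a river).

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical gazetteer; a real system would use a large place-name database.
GAZETTEER = {"Kansas", "Nebraska", "Ohio"}

# Assumes ISO dates (YYYY-MM-DD) for simplicity.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
# "<number> <host>", e.g. "1,200 cattle"; the host words are illustrative only.
COUNT_RE = re.compile(r"\b(\d[\d,]*)\s+(people|cattle|pigs|birds|animals)\b", re.IGNORECASE)

@dataclass
class OutbreakRecord:
    when: date
    location: str
    count: int
    host: str

def extract(text):
    """Return an OutbreakRecord if the text has a date, a known place and a count."""
    d = DATE_RE.search(text)
    c = COUNT_RE.search(text)
    place = next(
        (p for p in sorted(GAZETTEER) if re.search(rf"\b{re.escape(p)}\b", text)),
        None,
    )
    if not (d and c and place):
        return None
    year, month, day = map(int, d.groups())
    count = int(c.group(1).replace(",", ""))
    return OutbreakRecord(date(year, month, day), place, count, c.group(2).lower())

def timeline(texts):
    """Extract records from each text, drop misses, and sort by date."""
    records = [r for r in map(extract, texts) if r is not None]
    return sorted(records, key=lambda r: r.when)

reports = [
    "2009-06-12: Officials in Nebraska confirmed 1,200 cattle affected.",
    "Report dated 2009-05-30 says 45 pigs in Kansas showed symptoms.",
    "No details were released today.",
]
for r in timeline(reports):
    print(r.when.isoformat(), r.location, r.count, r.host)
```

The sorted records are what a map or timeline view would consume; the third report yields nothing because it lacks a date, a place and a count.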
Challenges, Hsu said, include resolving ambiguous terms, recognizing images, and combining useful information that is embedded in text with data extracted from databases.
Finally, in the annotation domain, he is using visual analytics to assist with spatial markup and data cleaning of heterogeneous data drawn from natural language sources. While today’s search programs depend on the text captions that accompany photos, the data mining research at K-State and its partner institutions is leading to technology that will allow search engines to “look” at the images on the Web themselves.
Hsu’s work is funded, in part, by the Department of Defense and the Department of Homeland Security. He was visiting the Ohio Supercomputer Center to explore potential collaborative opportunities with OSC’s Cyberinfrastructure and Software Development team.
_________________
The Ohio Supercomputer Center (OSC) is a catalytic partner of Ohio universities and industries that provides a reliable high performance computing infrastructure for a diverse statewide/regional community including education, academic research, industry, and state government. Funded by the Ohio Board of Regents, OSC promotes and stimulates computational research and education in order to act as a key enabler for the state's aspirations in advanced technology, information systems, and advanced industries. For additional information, visit http://www.osc.edu.