Information Extraction: Natural Language, Spatiotemporal Machine Learning, and Link Analysis Approaches
This talk surveys open problems and current approaches in information extraction, a subtopic of information retrieval whose goal is to automatically obtain categorized, semantically well-defined data from unstructured content such as text and web documents. I will first describe three active research topics in information extraction: (1) recognizing textual entailment, the problem of automatically detecting when the meaning of one short piece of text logically follows from that of another; (2) question answering, the problem of automatically responding to a query posed in natural language; and (3) update summarization, the problem of automatically generating a brief restatement of the main points of a text for a user who has already read a given set of earlier articles. Next, I will present a current project in natural language processing and machine learning aimed at synthesizing and improving methodologies for these tasks. I will then relate these tasks to our continuing research in document categorization, event extraction, and spatial data mining. Finally, I will present some recent results and preliminary work in the related field of weblog analysis, focusing on link analysis approaches, to which information extraction techniques are also applicable.
