George Mason University
CDS/CCDS/Statistics Colloquium Series
Seminar Announcement


Online Text Processing of News Articles

Elizabeth Leeds Hohman

Naval Surface Warfare Center
Advanced Computation Division


Research 1, Room 301, Fairfax Campus
George Mason University, 4400 University Drive, Fairfax, VA 22030

Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
Date: October 19, 2007



ABSTRACT

We look at a modification of the traditional vector space model for text representation that allows us to compute vectors for text documents as they arrive over time. That is, documents are represented by an approximation to their TF-IDF vectors without requiring a full corpus of articles. This is achieved by using an exponential window for computing the document frequency of words and by managing a changing lexicon. A graph model is developed to provide a reduced representation of the documents. Graph nodes represent document topics and evolve in time. A data set of recent health news articles is used to demonstrate the concepts. Methods are presented for visualizing the data and the content of the graph nodes.
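
As a rough illustration of the idea of an exponential window over document frequencies, the following minimal Python sketch keeps exponentially decayed document-frequency counts and uses them to weight each incoming document. It is not the speaker's implementation. The class name StreamingTfidf, the decay and prune_below parameters, the log(1 + N/df) weighting, and the pruning rule for the lexicon are all assumptions made for this example.

```python
import math
from collections import Counter


class StreamingTfidf:
    """Approximate TF-IDF for a document stream (illustrative sketch only)."""

    def __init__(self, decay=0.99, prune_below=1e-3):
        self.decay = decay              # assumed forgetting factor applied per document
        self.prune_below = prune_below  # assumed threshold for dropping stale words
        self.doc_freq = {}              # word -> exponentially decayed document frequency
        self.n_docs = 0.0               # exponentially decayed count of documents seen

    def update(self, tokens):
        """Fold one new document into the decayed statistics."""
        self.n_docs = self.decay * self.n_docs + 1.0
        # Age every word in the lexicon and drop words that have faded out.
        for word in list(self.doc_freq):
            self.doc_freq[word] *= self.decay
            if self.doc_freq[word] < self.prune_below:
                del self.doc_freq[word]
        # Each distinct word in the new document counts once toward its frequency.
        for word in set(tokens):
            self.doc_freq[word] = self.doc_freq.get(word, 0.0) + 1.0

    def vectorize(self, tokens):
        """Update the statistics, then return a unit-length sparse vector (dict)."""
        self.update(tokens)
        counts = Counter(tokens)
        vec = {
            word: tf * math.log(1.0 + self.n_docs / self.doc_freq[word])
            for word, tf in counts.items()
        }
        norm = math.sqrt(sum(value * value for value in vec.values()))
        return {word: value / norm for word, value in vec.items()} if norm else {}


if __name__ == "__main__":
    model = StreamingTfidf(decay=0.9)
    model.vectorize(["flu", "vaccine", "flu"])
    print(model.vectorize(["flu", "outbreak"]))
```

In this sketch a word seen in many recent documents receives a lower weight than a word that is new to the stream, and words that have not appeared for a long time eventually leave the lexicon, so no full corpus needs to be stored.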