Data Loam – Weaving the Fabric of Data
The global body of information is now large enough for correlations to emerge within the data. Data objects may self-organise according to consistent, if perhaps complex, modes of interaction between their attributes. This project centres on pattern discovery in long-format text documents. How can we search better? How can we find related articles when they do not contain the same keywords? The test database for this project comprises the first 250,000 pages of Wikipedia, indexed against a dictionary of 115,000 known English words. The image is a heat map in which a point is plotted wherever a word is found on a page. The rows are index words: for example, “chair”, “table” and “bed” cluster together as “furniture”. The columns are documents: Wikipedia pages on Nietzsche and Socrates, for example, are grouped as philosophers. A new image is generated after each round of groupings, and the data “self-organises” into hierarchies that appear as building densities in the heat map. This closed loop continues and evolves until stable, or perhaps dynamically evolving, patterns emerge.
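The indexing and one round of grouping can be illustrated with a minimal sketch. The text above does not say how similarity is measured or how groups are formed, so the Jaccard overlap, the 0.5 threshold and the single-link merging used here are assumptions chosen for illustration, as are the function names and the four toy documents standing in for Wikipedia pages.

```python
import re
from itertools import combinations


def build_incidence(documents, dictionary):
    """Map each dictionary word (a heat-map row) to the set of documents containing it."""
    vocab = set(dictionary)
    rows = {word: set() for word in dictionary}
    for doc_id, text in documents.items():
        for token in set(re.findall(r"[a-z]+", text.lower())):
            if token in vocab:
                rows[token].add(doc_id)
    return rows


def transpose(rows):
    """Map each document (a heat-map column) to the set of index words found in it."""
    columns = {}
    for word, doc_ids in rows.items():
        for doc_id in doc_ids:
            columns.setdefault(doc_id, set()).add(word)
    return columns


def jaccard(a, b):
    """Overlap of two sets: 1.0 when identical, 0.0 when disjoint (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group(items, threshold=0.5):
    """One round of grouping: single-link merge of items whose sets overlap strongly."""
    parent = {key: key for key in items}

    def find(key):
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path compression
            key = parent[key]
        return key

    for a, b in combinations(items, 2):
        if jaccard(items[a], items[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for key in items:
        groups.setdefault(find(key), []).append(key)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy corpus; the real test database is 250,000 Wikipedia pages.
documents = {
    "furniture_1": "A chair and a table stood beside the bed.",
    "furniture_2": "The bed, the table and the chair were oak.",
    "philosophy_1": "Nietzsche wrote about virtue and truth.",
    "philosophy_2": "Socrates questioned virtue and truth.",
}
dictionary = ["chair", "table", "bed", "virtue", "truth"]

rows = build_incidence(documents, dictionary)
word_groups = group(rows)                 # rows that cluster together
document_groups = group(transpose(rows))  # columns that cluster together
```

In this toy case the furniture words fall into one row group and the two philosophy pages into one column group. Repeating the grouping on the merged rows and columns would give the next level of the hierarchy, which is one possible reading of the closed loop described above.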