December 25th, 2005

I was thinking today about how a web crawler works, what it actually does and why. It occurred to me that during sleep, among other things, the mind must be re-indexing in a manner similar to that of a web crawler.

Think of the mind as having a sort of “state”, a vast set of relational data in motion. At any given moment a snapshot of this state is virtually unique. In this model, without constant re-indexing, the data and relations become more and more jumbled, to the point where they are ultimately useless.

In fact, the idea of “working memory” may just be a simplification of the daily process of re-indexing the entire collection of data. We constantly need to have new information parsed, integrated, and given a quick index in order to be immediately useful. We also need to have all of the information, old and new, parsed and re-indexed in order to keep the relations up-to-date and free from the clutter that is caused by constantly moving the data.

Imagine the state of a hard drive after a full day of adding and deleting information, moving files around, etc. There are holes and gaps everywhere. In an operating system's file system, the (directory) scheme used to store data and the content of the data itself are tightly coupled. Directories contain subdirectories where data is aggregated based on content, and this continues recursively. In the brain (and on the Internet) the storage scheme and the relations between content are loosely coupled. What this means is that data doesn’t actually need to (and practically never does) exist in any one locale in order to maintain strong relations. Search abstracts away the actual location of data and provides relations based on the links between pieces of it. The data in question can and will exist anywhere; the links and contextual relationships are what’s important.
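To make the "location doesn't matter" idea a little more concrete, here is a minimal sketch of an inverted index, the kind of structure a search engine might keep. The page names, locations, and text are all made up for illustration, and a real index would be far more elaborate; the only point is that the lookup never consults where a page lives.

```python
from collections import defaultdict

# Hypothetical pages. The "location" field is stored but never used by the index.
pages = {
    "page_a": {"location": "Australia", "text": "sleep and memory consolidation"},
    "page_b": {"location": "Canada", "text": "memory in web crawler design"},
    "page_c": {"location": "Canada", "text": "hard drive defragmentation"},
}


def build_index(pages):
    """Map each word to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, page in pages.items():
        for word in page["text"].lower().split():
            index[word].add(page_id)
    return index


index = build_index(pages)

# Pages sharing the word "memory" are related by content, not by locale.
related = sorted(index["memory"])
```

Here `related` ends up holding one page from each country: the relation comes entirely from shared content, which is roughly the loose coupling described above.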

Unlike the hard drive example above, the nightly “crawl” would need not only to contend with data that has been moved (as a hard drive defragmentation does), but also to rebuild and strengthen relationships by following relational links and re-indexing.
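As a rough sketch of what such a crawl might look like, assuming a tiny in-memory "web" instead of real pages: start from a seed, follow links breadth-first, rebuild the word index from scratch, and count inbound links as a crude stand-in for relationship strength. The graph, the names `web` and `crawl`, and the inbound-link counting are my own illustrative assumptions, not a description of how any real crawler (or brain) works.

```python
from collections import defaultdict, deque

# Hypothetical link graph. "missing" is linked to but no longer exists,
# and "orphan" exists but nothing links to it.
web = {
    "home": {"text": "sleep notes", "links": ["memory", "crawler"]},
    "memory": {"text": "working memory notes", "links": ["home"]},
    "crawler": {"text": "crawler index notes", "links": ["memory", "missing"]},
    "orphan": {"text": "unlinked notes", "links": []},
}


def crawl(web, seed):
    """Breadth-first crawl from seed (assumed to exist in web).

    Returns a freshly rebuilt word index and a count of inbound links
    seen for each page that was reached through a link.
    """
    index = defaultdict(set)
    inbound = defaultdict(int)
    seen = {seed}
    queue = deque([seed])
    while queue:
        page_id = queue.popleft()
        page = web[page_id]
        # Re-index the page's content.
        for word in page["text"].split():
            index[word].add(page_id)
        # Follow links, skipping targets that have moved or been deleted.
        for target in page["links"]:
            if target not in web:
                continue
            inbound[target] += 1
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index, inbound


index, inbound = crawl(web, "home")
```

Two things fall out of this toy version: the dangling link to "missing" is simply dropped, and "orphan" never makes it into the index because nothing points to it. Data without links is, for the purposes of the crawl, as good as gone.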

Not only does the structure of the Internet provide a model for the structure of the human mind, but the tools we use to maintain and navigate it represent processes that occur naturally during daily maintenance.

2 comments

“It seems not only does the structure of the Internet provide a model of the structure of the human mind…”

The Internet is one big mess. I’m guessing there are a lot of psychiatrists trying to fix it.

The fact that the Internet is “one big mess” only furthers my point. Although the Internet is chaotic, a huge disorganized database, we search it daily and seemingly effortlessly find what we need.

We can do this because it has been crawled and its content indexed. This means that a search for a keyword may turn up two pages, one hosted in Australia and the other in Canada. What they have in common are links and content, not location. The authors of the two pages need not know of one another, now or ever. The information has been indexed as pertinent to your search and therefore the pages are returned.
