Laboratory Note - The Seeds of Monster Space

The story so far: this blog contains the lab notes of a research project to use machine learning and big-data analytics to invent a totally new kind of monster. Our initial analysis probed the concept of a monster, and we conducted an unethical experiment in which we put Mr. Spock on the Nostromo in place of the Xenomorph from the movie Alien. Also, in groundbreaking work, we found a way to turn the children's show character Cookie Monster into an actually terrifying monster (well, sort of). With the theoretical foundations of our work prepared, we now move to the next phase: gathering and analyzing data. To do this, we need monsters. Lots of monsters. Indeed, we need all of the monsters.

We want to document every monster that we can – every creature from books, movies, games, creepypastas, TV shows, comics, mythology, oral tradition, and anywhere else they can be found. We don’t want to simply list them, but to document them. We want to vivisect our monsters to understand how they work, and what makes them scary. We want to fit each monster into its place in the space of all monsters. In other words, we want to create monster space.

Along the way, we will use our intermediate results to try out methods for building new monsters. In these dangerous and unauthorized experiments, we will build monsters based on our models of monster space, write stories about them, and put those stories into the world. With any luck, our newly developed creatures will keep billions of people from sleeping.

A journey of a thousand miles begins with a single step, the era of telephony began with “Mr. Watson come here,” and our project has similarly humble beginnings: we will start with the one-hop scrape.

The One-Hop Scrape

We are going to use data about monsters found in Wikipedia. Why Wikipedia? Because there are monsters there. Potentially a lot of monsters. Also, there are links between pages. The article for King Kong links to the article about Godzilla. Godzilla links to Creature from the Black Lagoon, which links to the article about the movie The Thing. And so on. Our observation is that articles about monsters link to more articles about monsters. If this phenomenon holds across much of Wikipedia’s monster pages, then we can crawl through the network of inter-page links, traveling from monster to monster and adding each one to a curated database of articles about monsters.
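To make the idea of a "hop" concrete, here is a minimal sketch in Python. It does not touch Wikipedia at all: the `LINKS` dictionary is a tiny, hand-made stand-in for the inter-article graph, and the non-monster links in it (Empire State Building, Tokyo) are made up for illustration. The function name `one_hop` is also just my own label for this sketch.

```python
from typing import Dict, Iterable, List, Set

# Toy link graph standing in for Wikipedia. Hypothetical data for
# illustration only; nothing here was actually scraped.
LINKS: Dict[str, List[str]] = {
    "King Kong": ["Godzilla", "Empire State Building"],
    "Godzilla": ["Creature from the Black Lagoon", "Tokyo"],
    "Creature from the Black Lagoon": ["The Thing"],
}


def one_hop(seeds: Iterable[str], links: Dict[str, List[str]]) -> Set[str]:
    """Return every page linked from at least one seed, excluding the seeds."""
    seed_set = set(seeds)
    neighbors: Set[str] = set()
    for page in seed_set:
        # Pages with no known outgoing links contribute nothing.
        for target in links.get(page, []):
            if target not in seed_set:
                neighbors.add(target)
    return neighbors


if __name__ == "__main__":
    print(sorted(one_hop(["King Kong"], LINKS)))
```

Note that the result mixes monsters and non-monsters. A hop only follows links; it has no idea what a monster is.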

Suppose we repeat this process several times, using the previous cycle’s list of monster pages as the seeds for the next round. This is called bootstrapping. Visually, the process looks like this:

As of this blog post, we haven’t even gone around the bootstrapping cycle once. Here’s what we have done so far:

Brainstorm a list of 92 monsters. These are the Seed Monsters.
Look up the Wikipedia pages for those monsters.
Use the Wikipedia spider to visit each of the 92 seed pages, and retrieve all of the pages that each seed page links to.

In other words, we’re at step 3 in the picture.
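For reference, the full bootstrapping cycle could be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the link graph is a toy dictionary, and `is_monster` is a hypothetical placeholder for whatever filtering step ends up separating monster pages from everything else (here it is just a lookup in a hand-labeled set). The sketch also expands only the monsters found in the previous round, which is a shortcut I chose to avoid revisiting pages.

```python
from typing import Callable, Dict, Iterable, List, Set


def bootstrap(
    seeds: Iterable[str],
    links: Dict[str, List[str]],
    is_monster: Callable[[str], bool],
    rounds: int,
) -> Set[str]:
    """Grow a set of monster pages by repeated one-hop expansion."""
    known: Set[str] = set(seeds)
    frontier: Set[str] = set(seeds)
    for _ in range(rounds):
        # Collect every page linked from the current frontier.
        candidates: Set[str] = set()
        for page in frontier:
            candidates.update(links.get(page, []))
        # Keep only unseen pages that pass the (hypothetical) monster filter.
        new_monsters = {p for p in candidates - known if is_monster(p)}
        if not new_monsters:
            break  # nothing new was found, so further rounds cannot help
        known |= new_monsters
        frontier = new_monsters  # the new monsters seed the next round
    return known


# Toy data: the links and the labels below are made up for illustration.
toy_links = {
    "King Kong": ["Godzilla", "Roger Ebert"],
    "Godzilla": ["Creature from the Black Lagoon", "United Kingdom"],
    "Creature from the Black Lagoon": ["The Thing"],
}
hand_labeled = {"King Kong", "Godzilla", "Creature from the Black Lagoon", "The Thing"}

if __name__ == "__main__":
    print(sorted(bootstrap(["King Kong"], toy_links, hand_labeled.__contains__, rounds=3)))
```

In this sketch, the work described in this post corresponds to the link-collection half of the first round. The filtering half is the part that still has to be worked out.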

The results of the one-hop Wikipedia scrape

The set of seed monsters (which you can find on my GitHub page) is a list of 92 Wikipedia pages. Each one either discusses a single monster (e.g. Dragon, Zombie, Ghost), covers a movie where monsters fight each other (e.g. Gamera vs. Zigra), or is a “list” page, like List of Monster Movies, that has a lot of monster pages in its immediate “one hop” space in the Wikipedia inter-article graph.

The seed pages were just the monsters I could think of off the top of my head, no doubt influenced by growing up in the United States in the age of CRT-based televisions and Blockbuster video stores. In other words, it’s a biased sample.

Using the Wikipedia spider, I compiled a list of all the Wikipedia pages that were one hop away from a seed page. It turns out that 7078 Wikipedia pages lie one hop away from the seed monsters. Most of these pages are definitely not about monsters; they cover topics like Romance Film, United Kingdom, and Roger Ebert. But there are monsters there. New ones that were not in my seed list, including monsters I had never heard of before. Just from skimming a few of the 7078 pages I retrieved, I found:

In other words, the bootstrapping process seems to work. I started with a bunch of monsters I knew, gathered adjacent Wikipedia pages and found new monsters that I wasn’t previously aware of.

But how well does it work? What is the ratio of true positives (monster pages) to false positives (non-monster pages)? The only way to know for sure is to examine each of the 7078 pages, give each one a monster or non-monster label, then compute the ratio. That’s a lot of work!

To get a very rough picture of the purity of the scraping results, I randomly chose 60 of the 7078 pages. I looked at each page and gave it a monsterosity score from 0 (totally non-monster) to 3 (obviously describes a monster). Pages received a score of 1 if they discussed topics related to horror or the unknown, and a score of 2 if they discussed an imaginary entity.
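Drawing that kind of sample takes only a few lines. The sketch below assumes the scraped titles are already in a Python list; the placeholder names `page_0` through `page_7077` and the helper `sample_pages` are mine, and the fixed seed is only there so the same 60 pages come back on every run.

```python
import random
from typing import List, Sequence


def sample_pages(pages: Sequence[str], k: int, seed: int) -> List[str]:
    """Draw k distinct pages uniformly at random, reproducibly."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return rng.sample(pages, k)  # sampling without replacement


# Placeholder titles standing in for the 7078 scraped pages.
pages = [f"page_{i}" for i in range(7078)]
sample = sample_pages(pages, k=60, seed=0)

if __name__ == "__main__":
    print(len(sample), "pages to score by hand")
```

Sampling without replacement matters here: scoring the same page twice would waste one of only 60 hand-labeled slots.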

I sorted these randomly chosen pages by their score and arranged them into this color-coded, three-column table.

In my random sample of 60 pages, I found three monsters, seven imaginary entities that could potentially be portrayed as monsters in a story, and nine pages that somehow dealt with imaginary worlds, or seemed to be related to monsters in some strong way. This is obviously a very subjective rating system, but the results are nevertheless promising. The bootstrapping process does collect monsters in its dragnet. We need some way of further refining the results, but it’s a start!
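To get a feel for how much (or how little) a sample of 60 says about all 7078 pages, here is a back-of-the-envelope sketch. It rests on two assumptions of mine: that the three groups above correspond to scores 3, 2, and 1 respectively, and that the remaining pages in the sample all scored 0. The Wilson score interval is a standard textbook way to put error bars on a proportion from a small sample; it is my choice for this illustration, not something the scoring exercise itself used.

```python
import math
from typing import Tuple

TOTAL_PAGES = 7078
SAMPLE_SIZE = 60

# Counts from the hand-scored sample, keyed by monsterosity score.
# Mapping the three groups to scores 3, 2, 1 is an assumption.
score_counts = {3: 3, 2: 7, 1: 9}
# Assumption: every other sampled page scored 0.
score_counts[0] = SAMPLE_SIZE - sum(score_counts.values())


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


point = score_counts[3] / SAMPLE_SIZE
low, high = wilson_interval(score_counts[3], SAMPLE_SIZE)

if __name__ == "__main__":
    print(f"score-3 share of sample: {point:.1%}")
    print(f"rough 95% range: {low:.1%} to {high:.1%}")
    print(f"implied monster pages: {low * TOTAL_PAGES:.0f} to {high * TOTAL_PAGES:.0f}")
```

The range comes out wide, which is the honest takeaway: 60 pages are enough to show that monsters are in the net, but not enough to pin down how many.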

Monster-adjacent pages

Even though most of the 7078 pages retrieved via the one-hop scrape are not about specific monsters, they are all monster-adjacent pages: each one is one hop away from at least one monster. A brief perusal of these uncovers some interesting material, stuff that seems like it could be woven into a horror or monster story. For example:

These aren’t necessarily scary or terrifying concepts on their own. But I think they’re kind of creepy-sounding.

This blog is about monsters. But more fundamentally, it’s also about amplifying creativity. Creativity has been described as the process of “connecting the dots” between already-existing ideas to create new ones. The more dots you have (the more things you know), the more raw material you have to form new connections.

As a quick little side project, I’m going to connect two dots. The first dot is the Wikipedia page for the word Pareidolia. Maybe it’s just me, but the word itself sounds creepy. Speaking of creepiness, the second dot I’m going to use to create something new is a psychology paper I came across titled “On the nature of creepiness.” The paper describes a small-scale study that tries to define what the sensation of creepiness is and to uncover some factors that make a person “creepy.”

Using pareidolia as the main story idea, and with scientifically based findings on creepiness to guide me, I created …The Coincidence Man!

Works Cited

McAndrew, Francis T., and Sara S. Koehnke. “On the nature of creepiness.” New Ideas in Psychology, Volume 43, 2016, Pages 10-15.