July 31, 2012

Idea: Text Supercuts

You’re familiar with supercuts, those obsessively edited videos that show you, for example, every utterance of the phrase “Damn it” from every season of the show “24”.

Well, here’s a supercut of a different kind: the text supercut. Need an example?

Here’s every use of the word “cat” from Dr. Seuss’s book The Cat in the Hat:

Cat cat. Cat cat Cat cat. Cat cat. cat. cat. cat cat. Cat Cat Cat cat. cat. cat Cat cat Cat cat, cat. cat cat! Cat

[My first effort was a supercut of every instance of “whale” in Moby Dick. But it turns out the word appears so many times that it ended up being way too large to reasonably share in a blog post.]


Love the idea. Reminds me of the silly idea I had the other day about a “Where’s Waldo” audio book…

How about a pictorial supercut? For example, cutting every Waldo out of the entire Where’s Waldo series and arranging them all on one page.

A little more time-consuming than just ‘grep’, unfortunately :)

I had a similar thought recently when reading a Nancy Drew book to my daughter, though instead of pulling out a specific word, it would be all of the adverbs. That Carolyn Keene liked her adverbs.
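Pulling out every adverb properly needs a part-of-speech tagger, but a crude approximation is easy to sketch: collect the words ending in “-ly”. This is a hypothetical heuristic, not a grammar check, and the function name `ly_words` is made up for illustration. It misses adverbs such as “soon” or “very” and wrongly keeps words like “family” or “jolly”.

```python
import re

# Crude heuristic (an assumption, not real adverb detection): treat any word of
# four or more letters ending in "ly" as an adverb. A real solution would use a
# part-of-speech tagger instead.
LY_WORD = re.compile(r"\b[A-Za-z]{2,}ly\b")


def ly_words(text: str) -> list[str]:
    """Return every word ending in 'ly', in order of appearance."""
    return LY_WORD.findall(text)


if __name__ == "__main__":
    print(" ".join(ly_words("Nancy quickly and quietly opened the family safe.")))
    # prints: quickly quietly family
```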

You need more than grep, since grep simply shows the lines where the word occurs. A Perl script would be simple enough, and I think that (or a similar tool) was what David used.

The pattern would be /\bcat[^\w\s]?\s/ig
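For anyone without Perl handy, the same idea can be sketched in Python. This is a minimal sketch assuming plain-text input: the function name `text_supercut` and the sample sentence are hypothetical, and the regular expression is the Perl pattern above translated to Python’s `re` module, with `re.IGNORECASE` standing in for `/i` and `findall` for `/g`.

```python
import re

# The Perl pattern /\bcat[^\w\s]?\s/ig in Python syntax: a word boundary,
# "cat", at most one punctuation character, then one whitespace character.
CAT_PATTERN = re.compile(r"\bcat[^\w\s]?\s", re.IGNORECASE)


def text_supercut(text: str, pattern: re.Pattern = CAT_PATTERN) -> str:
    """Join every match of `pattern` in `text`, trimming the trailing whitespace."""
    return " ".join(match.strip() for match in pattern.findall(text))


if __name__ == "__main__":
    sample = "The Cat in the Hat. Look at that cat! A catalog of cats, cat."
    print(text_supercut(sample))
    # prints: Cat cat!
```

Because the pattern insists on whitespace after the match, a “cat” at the very end of the text (the final “cat.” in the sample) is not collected, while “catalog” and “cats” are correctly skipped.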

Regarding Where’s Waldo, there’s been some discussion on an algorithm to do that here.

This post reminds me of a piece of “concept poetry” that I did in a college class. I noticed that my professor would randomly underline some words in my papers as she proofread them. It seemed like a subconscious part of her reading process. So when we had to write a poem for class, I took one of my old papers and strung together all the words that she had underlined. Here’s an excerpt (the whole thing is 262 words):

modern science religion more and more concerned citizens they both continue to coexist religion holds struggling population alive and inquiring swells and lulls in check for many centuries supernatural concrete observation money and patronage modern grant applicant more impressing the duke theology reasonable and open-minded enough amenable to a radical shift if the evidence pointed to it discover something significant power outage de facto foundation for all biology scientific theory explanation

Speaking of Dr. Seuss, I was wondering the other day if Kool & The Gang’s Jungle Boogie had fewer words in it than Green Eggs and Ham. Turns out the song has fewer than half as many:
“Aaaaaaah!, about, boogie, down, feel, flow, funk, get, I’m, it, jungle, let, say, talking’, the, till, uh, up, with, ya, y’all.”
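Counting a vocabulary like this is itself a small text-processing job. Here is a minimal sketch, assuming words are separated by whitespace and compared case-insensitively after stripping punctuation from their edges (apostrophes inside a word, as in “I’m”, are kept). The function name `vocabulary` is hypothetical. For comparison, Green Eggs and Ham is famously limited to 50 distinct words, so anything under 25 is “fewer than half”.

```python
import string

# Straight and curly quotes/punctuation to strip from the ends of each token.
EDGE_PUNCTUATION = string.punctuation + "“”‘’"


def vocabulary(text: str) -> set[str]:
    """Return the distinct words of `text`, lowercased, with edge punctuation stripped."""
    words = (token.strip(EDGE_PUNCTUATION).lower() for token in text.split())
    return {word for word in words if word}


if __name__ == "__main__":
    song = ("Aaaaaaah!, about, boogie, down, feel, flow, funk, get, I’m, it, jungle, "
            "let, say, talking’, the, till, uh, up, with, ya, y’all.")
    print(len(vocabulary(song)))
    # prints: 21
```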

I thought this post was going to be about shaving words into people’s hair.

So something along the lines of a KWOC (keyword out of context) concordance, then?
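Assuming the comment refers to a keyword-out-of-context (KWOC) index, which lists each keyword as a heading followed by the places it occurs, a minimal sketch might look like this. It assumes one “place” is one line of the input and that words are lowercase runs of letters and apostrophes; the function name `kwoc_index` is hypothetical.

```python
import re
from collections import defaultdict

# A "word" here is a run of lowercase letters and apostrophes (an assumption).
WORD = re.compile(r"[a-z']+")


def kwoc_index(text: str, keywords: set[str]) -> dict[str, list[int]]:
    """Map each keyword to the 1-based numbers of the lines that contain it."""
    index: dict[str, list[int]] = defaultdict(list)
    for line_number, line in enumerate(text.splitlines(), start=1):
        # A set ensures each line is listed at most once per keyword.
        for word in set(WORD.findall(line.lower())):
            if word in keywords:
                index[word].append(line_number)
    return dict(index)


if __name__ == "__main__":
    text = "The cat sat.\nThe hat sat on the cat.\nNo hat here."
    index = kwoc_index(text, {"cat", "hat"})
    for keyword in sorted(index):
        print(keyword.upper(), index[keyword])
    # prints:
    # CAT [1, 2]
    # HAT [2, 3]
```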

I’d like to see every utterance of “Ya!” in Fargo.

Re: lar3ry’s post
A simple “grep -o” will do. No need to use the powerful but unreadable Perl.

Re: Justin’s post

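The closest I could get with “grep -o” is:

egrep -o '\b[Cc]at[[:punct:]]?[[:space:]]'

That is the same pattern, with two changes. Escapes such as \w and \s are taken literally inside a bracket expression, so [^\w\s] and \s become the POSIX classes [[:punct:]] and [[:space:]]. And I could not get GNU egrep’s -i to work on my OS X system, hence the [Cc] instead of c.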

Since my original comment was off the top of my head, here are some better patterns (using the [Cc] construct):


(I like the last one, because it only allows the word “cat” with or without one or more punctuation marks in the current locale.)
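The behaviour described for the last pattern, “cat” with zero or more trailing punctuation marks, can be sketched in Python. This is a hypothetical variant, not a reconstruction of the commenter’s pattern: Python’s `re` module has no POSIX `[[:punct:]]` class, so `[^\w\s]*` approximates it, and the lookahead `(?=\s|$)` also accepts a “cat” at the very end of the text. The function name `find_cats` is made up for illustration.

```python
import re

# Hypothetical variant: "cat" as a whole word, then zero or more punctuation
# characters, then whitespace or the end of the text. [^\w\s] approximates
# POSIX [[:punct:]], which Python's re module does not support.
CAT_WITH_PUNCT = re.compile(r"\bcat[^\w\s]*(?=\s|$)", re.IGNORECASE)


def find_cats(text: str) -> list[str]:
    """Return each standalone 'cat' together with any punctuation attached to it."""
    return CAT_WITH_PUNCT.findall(text)


if __name__ == "__main__":
    print(find_cats("Cat?! The cat, the cats, the catalog, and one more cat"))
    # prints: ['Cat?!', 'cat,', 'cat']
```

The lookahead keeps the trailing whitespace out of each match, so no stripping is needed, and words such as “cats”, “cat’s”, and “bobcat” are rejected.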

I used Perl in my original comment only because the regular expressions are so “expressive,” not because it was the only tool that could be used.

I missed 5 on the U.S. naming in 10 min.