Saturday, December 28, 2013

Lyric Analytics

I was messing around with the text mining (tm) package in R and looking for something to comb through.  I looked through some other blogs and websites to see how people were using it:  mining presidential speeches and debates is one of the more notable uses.  That got me thinking about a place where we focus mostly on how things are said and not really on what is actually said, i.e. music.  For some people, what is said is important, and I would argue that for most artists who "make it," the words eventually become an important part of what catapults them into fame :-)



One of the greatest rock and roll icons, who began making albums slightly before my ears were ready for them, is "The Boss".  Lots of albums, lots of words.  Although Bruce Springsteen was definitely communicating a lot of powerful ideas in his music, ultimately it's the passion he sings with, the way he miraculously manages to have a sax woven into most of his songs, or just having come up with the nickname "The Boss" that makes him awesome.  Anyway, I thought I would look at the album Born to Run as a primer in what I'll call "lyric mining".

Below is a graph showing the most frequent words in the album:  those mentioned at least 5 times across all of its songs.  For those of us familiar with the album, just looking at these words is enough to hear the music.

Number of times words are used in the album Born to Run
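For anyone who wants to try the same kind of count, here is a minimal sketch in Python rather than the R tm package used for the graph above. The text, the `tokenize` and `frequent_words` helpers, and the threshold are all made up for illustration; a real run would load the full lyrics and would probably also drop stop words, which this sketch does not do.

```python
import re
from collections import Counter

# Hypothetical stand-in text; the real input would be the album's full lyrics.
songs = {
    "song_a": "night falls and the night calls one more time",
    "song_b": "one street one night one way out",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def frequent_words(docs, min_count):
    """Count words across all documents and keep those seen at least min_count times."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return {word: n for word, n in counts.items() if n >= min_count}

# The post uses a threshold of 5; the toy text here is short, so 3 is used instead.
top = frequent_words(songs, min_count=3)
print(top)
```

The counts are pooled over every song first and only then filtered, which matches the "at least 5 times across the album" rule described above.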

Below is a heat map showing words and the songs they appear in (the lighter the color, the more usage).  I chose only words that were mentioned at least 5 times in the album; otherwise the map was too large.  Along the unlabeled axes are dendrograms (basically a graphical way to show how things are associated or how similar they are - more on how to read them here).  Each cell in the heat map has a bar graph showing the relative usage of the word.  The highest count is 10 times in one song:  the word "one" in the song "She's the One".

Born to Run album Heatmap

In terms of what is said, you'll notice that the song "Night" is associated with "Jungleland" (based on the height of the lines and the two being in the same "clade" on the dendrogram) in terms of the words used, at least those that are used at least 5 times.  Here's how this looks when they are graphed against each other:

Night lyrics graphed against Jungleland lyrics for words mentioned at least 5 times

Alternatively, "Tenth Avenue Freeze-Out" is an outlier in terms of word usage:  it sits relatively unconnected from the other songs in the dendrogram on the x-axis.

On the y-axis you can see the associations among different words and their usage.  "Night" and "One", even though they are both used a lot, are not distributed the same way (they sit in different "clades"), meaning that when "The Boss" is belting it out, he's using these words in different places - different songs.
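To make "associated" a little more concrete, here is a small Python sketch of one way to score how alike two songs' word profiles are. It uses cosine similarity, which is my assumption for illustration; the heat map's dendrograms came from R and may well have used a different distance and clustering method. The counts and the helper names `cosine_similarity` and `most_similar_pair` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical word counts per song (not the album's real numbers).
counts = {
    "song_a": Counter({"night": 4, "street": 2}),
    "song_b": Counter({"night": 2, "street": 1}),
    "song_c": Counter({"one": 5}),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (1.0 = same profile)."""
    dot = sum(a[w] * b[w] for w in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar_pair(profiles):
    """Return the pair of song names whose word profiles are most alike."""
    names = sorted(profiles)
    pairs = [(x, y) for i, x in enumerate(names) for y in names[i + 1:]]
    return max(pairs, key=lambda p: cosine_similarity(profiles[p[0]], profiles[p[1]]))

closest = most_similar_pair(counts)
print(closest)
```

Songs that use the same words in the same proportions score near 1 and would end up close together in a dendrogram; a song that shares no frequent words with the rest scores 0 against them and would sit off on its own.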

Which brings up an interesting point about great albums (in my opinion):  their distribution of themes.  "Born to Run" definitely has some great themes in it, and while I won't interpret the meaning of each song, we can see those themes in how the words are distributed across the songs.  The word distribution shows that the great themes are spread across the album, and that it's not just one song that makes this album great.



Thursday, December 19, 2013

What Twitter Says About A&E and Duck Dynasty

I was looking on Twitter and noticed a lot of people talking about Duck Dynasty and then about the consequences for A&E.  I was curious about what was being said.  I mined Twitter for #A&E and #duckdynasty and got the following wordclouds:

#A&E
#duckdynasty ("amp" is the ampersand)


These were for words mentioned at least 20 times over a few thousand tweets.  Clearly, people have a wider variety of things to say about A&E and its move regarding Phil Robertson, whereas people hashtagging Duck Dynasty itself are mostly memorializing/supporting Phil.  One hashtag is an A&E bash whereas the other seems like a support network.  The two hashtags are definitely connected, but across the Twittersphere they seem to be communicating two different things:  maybe the same people are using both, but with two different emotions.  No sentiment analysis here...maybe at a later date.
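A side note on the stray "amp" in the #duckdynasty cloud:  it is most likely left over from the HTML entity "&amp;" once punctuation is stripped from the tweet text.  Here is a small Python sketch (not the R code behind the wordclouds) showing how decoding entities before counting makes it disappear.  The tweets and the `count_words` helper are made up for illustration.

```python
import html
import re
from collections import Counter

# Hypothetical tweets; tweet text often arrives with "&" encoded as "&amp;".
tweets = [
    "Boycott #A&amp;E until Phil is back",
    "I stand with Phil #duckdynasty &amp; family",
]

def count_words(texts, unescape):
    """Count lowercase alphabetic tokens, optionally decoding HTML entities first."""
    counts = Counter()
    for text in texts:
        if unescape:
            text = html.unescape(text)  # turns "&amp;" back into "&"
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

raw = count_words(tweets, unescape=False)
clean = count_words(tweets, unescape=True)
print(raw["amp"], clean["amp"])
```

Applying the same minimum-count cutoff (20 in the post) after this cleanup step would keep "amp" out of the cloud entirely.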

Saturday, December 7, 2013

OKC Thunder: Too close for comfort?

The Thunder this season have already played some pretty exciting/anxiety-causing basketball.  It's not just the excitement of watching new and old players return to the court; the games seem to have been nail-biters back to back.  Peering around in the data from the good people at Basketball Reference, I wanted to see if this feeling was actually supported by the data when comparing this season so far to last season.  What better way to visualize it than with the rgexf package in R?

So this season the Thunder have had some close games.  So have many other teams in the West.  Most pundits agree the West is stacked and there are a lot more close games to come.  What is interesting is that so far the Thunder have had way more games than last season in which the point differential between the two teams is 3 or less.  In fact, as of 12/4/2013 they have the most in the NBA.
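The "3 or less" tally is easy to reproduce from a list of final scores.  Here is a minimal Python sketch of the idea; the scores below are invented, not real results from Basketball Reference, and `close_game_counts` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical results: (away team, away points, home team, home points).
games = [
    ("OKC", 101, "GSW", 100),
    ("GSW", 112, "OKC", 113),
    ("OKC", 95, "SAS", 105),
    ("DEN", 88, "OKC", 90),
    ("DET", 99, "CHI", 97),
]

def close_game_counts(results, max_margin=3):
    """Count, per team, the games decided by max_margin points or fewer."""
    counts = Counter()
    for away, away_pts, home, home_pts in results:
        if abs(away_pts - home_pts) <= max_margin:
            counts[away] += 1
            counts[home] += 1
    return counts

close = close_game_counts(games)
leader, n_close = close.most_common(1)[0]
print(leader, n_close)
```

Running the same function on each season's results up to the same date gives the season-over-season comparison.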
















Here is a visualization of the match-ups.  In 2012 the Thunder had two such games by early December:  against the Pistons and the Spurs.  This season we have already had two games against the Warriors decided by 3 or less (as most may recall).  It is also interesting to note where the western teams are when comparing 2012 to 2013 so far.  In general it will be interesting to see how this year pans out with the season starting out this way.  OKC fans have been biting their nails this season...perhaps more than any other fans and definitely more than last season around this time.  So in general, fans...the data supports the emotion :-)

*A few notes on the networks:  Thunder away games are in blue; Thunder home games are in black, along with every other NBA game.  The Warriors had two games against the Thunder in 2013, but only one edge is shown in the network (sorry, couldn't work that one out).
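One possible way around the single-edge issue is to collapse repeated match-ups into one edge that carries a game count as its weight.  This is only a sketch of that idea in Python, not a fix for the rgexf networks themselves; the match-up list and the `weighted_edges` helper are hypothetical.

```python
from collections import Counter

# Hypothetical list of close games as (away, home) pairs.
close_games = [("OKC", "GSW"), ("GSW", "OKC"), ("SAS", "OKC")]

def weighted_edges(pairs):
    """Collapse repeated match-ups into one undirected edge with a game count."""
    weights = Counter(tuple(sorted(pair)) for pair in pairs)
    return [(a, b, n) for (a, b), n in sorted(weights.items())]

edges = weighted_edges(close_games)
print(edges)
```

Sorting each pair treats home and away as the same match-up, so the home/away coloring would have to be stored separately if it still matters for the picture.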

2012 Match-ups within 3 
2013 Match-ups within 3