Friday, April 28, 2017

Retrieving Reading Levels with R

For those who don't work in education or aren't aware, there is a measurement for a child's reading level called a Lexile ® Level.  This level can be obtained from several different reading assessments.  The measurement can be used to match a child's reading level to books at the same level, so the company maintains a database of books to which it has assigned levels.  There are other assessments that measure how well a child is reading, but I'm not aware of any systems that assign levels to books as comprehensively.  The library at the school where I work lacked Lexile Levels for its books.  This is unfortunate because teachers are not able to provide students with books at their respective Lexile Levels.  Fortunately, though, the library had a list of ISBNs for the books.

On the Lexile website there is a way to search for books by ISBN to retrieve their Lexile ® Level if the book is available in their database.  Entering every ISBN by hand is a task fit for something not human.

rvest to the rescue.

Below is the script to retrieve the Lexile Levels of books if a list of ISBNs is available.  This was an incredible time saver provided by some R code, and hopefully someone else out there can use it.
 
library(rvest)
library(httr)
library(htmltools)
library(dplyr)
##Prep for things used later
url<-"https://www.lexile.com/fab/results/?keyword="
url2<-"https://lexile.com/book/details/"
##CSV file with ISBN numbers
dat1<-read.csv("~isbns.csv",header=FALSE)
##dat1<-data.frame(dat1[203:634,])
dat<-as.character(dat1[,1])%>%trimws()
##dat<-dat[41:51]
blank<-as.character("NA")
blank1<-as.character("NA")
##blank2<-as.character("NA")
##blank3<-as.character("NA")
all<-data.frame("A","B","C")
colnames(all)<-c("name","lexiledat","num")
all<-data.frame(all[-1,])
for(i in dat) {
sites<-paste(url,i,sep="")
x <- GET(sites, add_headers('user-agent' = 'r'))
webpath<-x$url%>%includeHTML%>%read_html()
##Book Name
name<-webpath%>%html_nodes(xpath="///div[2]/div/div[2]/h4/a")%>%html_text()%>%trimws()
##Lexile Range
lexile<-webpath%>%html_nodes(xpath="///div[2]/div/div[3]/div[1]")%>%html_text()%>%trimws()%>%as.character()
##CSS change sometimes
lexiledat<-ifelse(is.na(lexile[2])==TRUE,lexile,lexile[2])
test1<-data.frame(lexiledat,NA)
##Breaks every now and then when adding Author/Pages
##Author Name
##author<-webpath%>%html_nodes(xpath='///div[2]/div/div[2]/span')%>%html_text()%>%as.character()%>%trimws()
##author<-sub("by: ","",author)
##Pages
##pages<-webpath%>%html_nodes(xpath='///div[2]/div/div[2]/div/div[1]')%>%html_text()%>%as.character()%>%trimws()
##pages<-sub("Pages: ","",pages)
##Some books not found, this excludes them and replaces with NA values
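##is.na() on the whole data frame returns more than one value, which if() rejects in recent R versions, so test the single lexile value
df<-if(is.na(lexiledat)) data.frame(blank,blank1) else data.frame(name,lexiledat,stringsAsFactors = FALSE)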
colnames(df)<-c("name","lexiledat")
df$num <- i
all<-bind_rows(all,df)
}
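##all1 is not defined in this script; presumably it holds the results of an earlier batch run (see the commented-out subsetting lines above)
master<-rbind(all1,all)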
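
The "not found" handling in the loop is the part most likely to trip people up, so here is a minimal sketch of that step pulled out into two small functions that can be tried without hitting the website.  The helper names pick_lexile and make_row are hypothetical (they are not in the script above), and the ISBNs and titles are made-up placeholders, not real lookups.

```r
# Hypothetical helper: choose the Lexile text from the scraped node text.
# Mirrors the script's rule: use the second match when present, otherwise the first.
pick_lexile <- function(lexile) {
  if (length(lexile) >= 2 && !is.na(lexile[2])) {
    lexile[2]
  } else if (length(lexile) >= 1) {
    lexile[1]
  } else {
    NA_character_
  }
}

# Hypothetical helper: build one result row, filling in NA when the book was not found.
make_row <- function(isbn, name, lexile) {
  lexiledat <- pick_lexile(lexile)
  found <- !is.na(lexiledat) && length(name) >= 1
  data.frame(
    name = if (found) name[1] else NA_character_,
    lexiledat = if (found) lexiledat else NA_character_,
    num = isbn,
    stringsAsFactors = FALSE
  )
}

# Placeholder inputs standing in for what the scraper would return.
rows <- list(
  make_row("9780000000001", "Example Title", c("Lexile", "700L")),
  make_row("9780000000002", character(0), character(0))
)
results <- do.call(rbind, rows)
print(results)
```

Returning a real NA (rather than the string "NA") keeps is.na() usable on the final table, which may or may not matter depending on how the results get exported.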
   
Link to code


Friday, March 24, 2017

Neural Networks for Learning Lyrics

I created a Twitter account inspired by a couple of Twitter accounts that applied a particular machine learning technique to learn how two (at the time) presidential hopefuls spoke. I thought, why not see what a model like this could do with lyrics from my favorite rock n roll artist?
Long short-term memory (LSTM) is a type of recurrent neural network (RNN) that can be used to produce sentences or phrases by learning from text. The two Twitter accounts that inspired this were @deeplearnthebern and @deepdrumpf, which use this technique to produce phrases and sentences.
I scraped a little more than 300 of his songs and fed them to an LSTM model using R and the mxnet library. I relied primarily on mxnet.io to build and train the model; it is a great site with great tools.  The tutorials on the site are very helpful, particularly this one.
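
To give a rough idea of what "learning from text" means in practice, here is a minimal base R sketch of how text is commonly prepared for a character-level model: each character gets an integer id, and the model is asked to predict the next character at every position.  This is not the code from the repository and it does not use mxnet; the toy string, the window size, and all variable names are assumptions made for illustration.

```r
# Toy text standing in for the scraped lyrics (placeholder, not real data).
text <- "hello world"

# Split into single characters and build a character -> integer id dictionary.
chars <- strsplit(text, "")[[1]]
vocab <- sort(unique(chars))
char_to_id <- setNames(seq_along(vocab), vocab)

# Encode the text as a vector of integer ids.
ids <- unname(char_to_id[chars])

# Cut the ids into fixed-length windows.
# Each target window is the input window shifted forward by one character.
window <- 4
n <- length(ids) - window
inputs  <- t(sapply(seq_len(n), function(i) ids[i:(i + window - 1)]))
targets <- t(sapply(seq_len(n), function(i) ids[(i + 1):(i + window)]))

# Decoding the ids with the dictionary recovers the original text.
decoded <- paste(vocab[ids], collapse = "")
print(decoded)
```

Matrices shaped like inputs and targets are the kind of thing a sequence model trains on; generating new text then amounts to repeatedly predicting the next id and mapping it back through vocab.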



The repository that contains the code for the scraper and other information is here.
Follow deeplearnbruce for tweets that are hopefully entertaining for Springsteen fans or anyone else. 

Saturday, December 10, 2016

Supreme Court Politics

I had wanted to post this before the US election, but time constraints didn't allow it.  With the potential for new Supreme Court Justices in the next four years, many voters, notably single-issue voters, rallied behind Donald Trump for his apparent support for a conservative justice.  Most of the people I spoke with were primarily concerned with the potential appointment of a justice who could help concentrate efforts on overturning abortion.  For a more in-depth look at Trump, his stances, and commentary on overturning abortion, I found this article to be helpful.

I had been curious about how the political leanings of the Justices have changed over time and whether opportunities like a conservative bench with a Republican president have occurred in the past.  I wondered whether an appointment during the Trump administration would change anything.

The graph below shows the number of abortions over time under different presidents, and each point shows the political split of the bench (Democrat-Republican).


Political leanings on a bench aren't indicative of a pro-life vote.  However, leaning more conservative politically does provide potential for a favorable pro-life vote.  I found this information interesting since the split on the court has predominantly been conservative for the last 40 years and only recently has it become more liberal.  The graph speaks for itself, and the code for it is available here.

This information isn't necessarily put out here to change minds on this issue.  That's best done around tables where both sides can listen, but I thought this was helpful for understanding some of the numbers and history behind this particular circumstance.