I needed to find the most common values in a column of an R data frame. A while back, I found the tm package for text mining. It turns out there is a paper on the package that illustrates how to use it, and it covers practically what I needed. A few tweaks later, it worked. The only thing left is to share the code with the lot of you:
library(tm)
# yesterday is a data frame; its Link column holds the text to count
corpus <- Corpus(VectorSource(yesterday$Link))
# rows are terms, columns are documents (one per element of Link)
tdm <- TermDocumentMatrix(corpus)
m <- as.matrix(tdm)
# this was the key line, specifically the rowSums function:
# it totals each term's count across all documents
v <- sort(rowSums(m), decreasing = TRUE)
# the printed names came out with a leading space, so trim whitespace
# and let cat() put one term per line
top_terms <- trimws(names(v))
cat(head(top_terms, 3), sep = "\n")
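To see what the rowSums step is doing without installing tm, here is a minimal sketch in base R. The links in this toy `yesterday` data frame are made up for illustration, and the hand-built matrix only imitates a term-document matrix by treating each whole link as one term, which may not match how tm splits the text. The `table()` line at the end is an alternative I am assuming fits the simple case where each value counts as a single item; it is not what the code above does.

```r
# Toy stand-in for the post's data frame; these links are invented.
yesterday <- data.frame(
  Link = c("a.com", "b.com", "a.com", "c.com", "a.com", "b.com"),
  stringsAsFactors = FALSE
)

# Hand-built imitation of a term-document matrix:
# one row per distinct link, one column per element of Link.
terms <- unique(yesterday$Link)
m <- sapply(yesterday$Link, function(doc) as.integer(terms == doc))
rownames(m) <- terms

# rowSums() totals each row, i.e. each term's count across all documents.
v <- sort(rowSums(m), decreasing = TRUE)

# Base-R alternative when every value is a single item: tabulate, then sort.
counts <- sort(table(yesterday$Link), decreasing = TRUE)

# Print the three most frequent links, one per line.
cat(head(names(v), 3), sep = "\n")
```

Both `v` and `counts` rank the links the same way here. The tm route earns its keep when each entry contains several words that need to be counted separately.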
June 12, 2014
How to Find the Most Popular Elements of a Dataset