Wordcloud using R

A wordcloud is a cloud of words (a visualization technique) that gives prominence to each word based on how frequently it is used in the source text. Here is one example of how to create a wordcloud using R.
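To make the "prominence based on frequency" idea concrete, here is a minimal base R sketch of the counting step that sits behind any wordcloud. The two comments are made up for illustration and are not taken from the data set used in this post.

```r
# Two short made-up comments standing in for the source text (illustrative only).
comments <- c(
  "Rural internet access needs better connectivity",
  "Internet banking needs rural connectivity and internet awareness"
)

# Lower-case the text, split on anything that is not a letter, drop empty tokens.
words <- unlist(strsplit(tolower(comments), "[^a-z]+"))
words <- words[nzchar(words)]

# Count each word and sort so the most frequent words come first.
freq <- sort(table(words), decreasing = TRUE)
print(head(freq, 3))
```

In a wordcloud, the words at the top of this table would be drawn largest.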

Exercise: Read the comments on a news article or blog post and find the trending keywords in the comments on that topic.

Learning:

  • Removing stop words is an iterative process, because the “tm” library call tm_map(x, content_transformer(removeWords), stopwords()) does not remove all of them.
  • content_transformer() needs to be used in later versions of the “tm” library. Functions such as tolower(), which comes from base R rather than from “tm”, return a plain character vector instead of a text document, so they must be wrapped.
  • A standalone wordcloud reflects frequently used words but not the sentiment about them. A wordcloud should be used along with sentiment analysis of the posts.
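The iterative stop-word cleanup from the first point can be sketched in base R, without “tm”, roughly as follows. The comments, the helper word_freq() and both stop-word lists are hypothetical and only illustrate the loop of counting, inspecting, extending the list and counting again.

```r
# Made-up comments (illustrative only).
comments <- c(
  "we can make internet access better",
  "we need to make banking simple and also make it free",
  "internet and banking can also help farmers"
)

# Hypothetical helper: tokenize, drop the given stop words, return sorted counts.
word_freq <- function(texts, stop_words) {
  words <- unlist(strsplit(tolower(texts), "[^a-z]+"))
  words <- words[nzchar(words) & !(words %in% stop_words)]
  sort(table(words), decreasing = TRUE)
}

# Pass 1: a small stand-in for a standard stop-word list.
stop_list <- c("we", "to", "and", "it")
pass1 <- word_freq(comments, stop_list)
print(head(pass1, 5))   # inspect: filler words such as "make" still rank high

# Pass 2: extend the list with the filler words spotted above and recount.
stop_list <- c(stop_list, "can", "make", "also")
pass2 <- word_freq(comments, stop_list)
print(head(pass2, 5))
```

After the second pass only the topic words remain at the top, which is the state you want before drawing the cloud.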

Sample comments below:

  1. We should first take care of all connectivity issues in remote areas. Then we should make the rural masses aware of what their rights are and how can they use it. We should make a simple user friendly app wherein the rural masses will get all-in-one assistance (agriculture,health, animal care,bus time table, online mandi trading and online banking facility.
  2. In my opinion, a citizen in rural india who opens a bank account under the PRADHAN MANTRI JAN DHAN YOJNA should be given a free sim card which has internet facility activated by default and lifetime validity.This would encuorage more and more people to open accounts and will lead them with access to mobile facillty,Interet Facility.

R Code:

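  • install.packages(c("tm", "wordcloud", "RColorBrewer"))
  • library(tm)
  • library(wordcloud)      # provides wordcloud()
  • library(RColorBrewer)   # provides brewer.pal()
  • tp <- read.csv("D:/Customer/Srini/TP.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE)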
  • tpPost <- tp$Post
  • tpc <- Corpus(VectorSource(tpPost))
  • tpc <- tm_map(tpc,content_transformer(removePunctuation))
  • tpc <- tm_map(tpc,content_transformer(removeNumbers))
  • tpc <- tm_map(tpc,content_transformer(tolower))
  • tpc <- tm_map(tpc,content_transformer(removeWords),stopwords("english"))
  • tpc <- tm_map(tpc,content_transformer(stripWhitespace))
  • inspect(tpc[1:10])
  • stopWords <- c("can","make","get","need","use","sir","like","also","open","call","netfor","for","new","ca","issued","become","cscs","will","lack","first","care","take")
  • tpc <- tm_map(tpc,content_transformer(removeWords),stopWords)
  • tpc <- tm_map(tpc,content_transformer(stripWhitespace))
  • inspect(tpc[1:10])
  • wordcloud(tpc,min.freq=200,max.words=Inf,random.order=FALSE)

    image 

  • pal2 <- brewer.pal(8,"Dark2")
  • wordcloud(tpc,min.freq=200,max.words=Inf,random.order=FALSE,rot.per=.15,colors=pal2)

    image

  • wordcloud(tpc,min.freq=200,max.words=Inf,random.order=FALSE,scale = c(8,.2),rot.per=.15,colors=pal2)

image

Until next time…

Guru
