R and Cassandra

I often find myself turning to R for basic statistical analyses that are either impossible in Microsoft Excel or too tedious to do by hand. Recently, I needed to analyze data stored in Cassandra, starting with a histogram of message sizes. I began by:

  • Grepping email logs for the data of interest,
  • Capturing the output to a CSV,
  • Opening the CSV in Excel,
  • Calculating frequency statistics,
  • Charting them.

Awfully manual … there must be a better way! Enter the powers of R.

A quick Google search led me to RCassandra, which allows me to do the following:

[code language=”r”]
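library(RCassandra)
# Connect to the Cassandra Thrift interface (9160 is its default port)
conn <- RC.connect(host="localhost", port=9160L)
RC.login(conn, username = "user", password="user")
# Select the keyspace, then fetch rows from the "MyData" column family
RC.use(conn, "MINE")
data <- RC.get.range.slices(conn, "MyData", rlimit=10)
# Close the connection when done
RC.close(conn)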
[/code]

Then it’s easy to calculate my summary statistics, do some box plots, and get on with the rest of my job.
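
For illustration, here is a rough sketch of what that last step might look like in base R. It doesn't touch Cassandra at all: the `messages` data frame, its `size_bytes` column, and the simulated values are all made up for the example, so swap in whatever shape your own data comes back in.

```r
# Hypothetical stand-in for the rows fetched from Cassandra: the column name
# `size_bytes` and the simulated values are assumptions for illustration only.
set.seed(42)
messages <- data.frame(size_bytes = round(rlnorm(500, meanlog = 8, sdlog = 1)))

# Min, quartiles, median, mean and max in one call.
size_summary <- summary(messages$size_bytes)
print(size_summary)

# Compute the histogram bins without drawing, to get the frequency table.
size_hist <- hist(messages$size_bytes, breaks = 20, plot = FALSE)
freq_table <- data.frame(
  bin_start = head(size_hist$breaks, -1),
  bin_end   = tail(size_hist$breaks, -1),
  count     = size_hist$counts
)
print(freq_table)

# Draw the histogram and a box plot of the same values.
plot(size_hist, main = "Message sizes", xlab = "Size (bytes)")
boxplot(messages$size_bytes, horizontal = TRUE,
        main = "Message sizes", xlab = "Size (bytes)")
```

Note that `breaks = 20` is only a suggestion to `hist()`, which picks "pretty" bin edges near that number. That's pretty much the whole Excel workflow in a dozen lines.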

As a footnote, nice to see that the code highlighter I’m using actually supports R!
