{"id":42,"date":"2013-06-12T13:35:03","date_gmt":"2013-06-12T19:35:03","guid":{"rendered":"http:\/\/xiaan.com\/blog\/?p=42"},"modified":"2024-02-09T15:06:42","modified_gmt":"2024-02-09T22:06:42","slug":"r-and-cassandra","status":"publish","type":"post","link":"https:\/\/xiaan.com\/blog\/2013\/06\/r-and-cassandra\/","title":{"rendered":"R and Cassandra"},"content":{"rendered":"\n<p>I often find myself turning to R to perform basic statistical analyses that are either impossible in Microsoft Excel or too tedious to do by hand. Recently, I was faced with the challenge of analyzing data stored in Cassandra and started with the goal of creating a histogram of message sizes. I began my efforts by:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Grepping email logs for the data of interest,<\/li>\n\n\n\n<li>Capturing the output to a CSV,<\/li>\n\n\n\n<li>Opening the CSV in Excel,<\/li>\n\n\n\n<li>Calculating frequency statistics, and<\/li>\n\n\n\n<li>Charting them.<\/li>\n<\/ul>\n\n\n\n<p>Awfully manual &#8230; there must be a better way! 
Enter the powers of R.<\/p>\n\n\n\n<p>A quick Google search led me to <a href=\"http:\/\/www.rforge.net\/RCassandra\/\">RCassandra<\/a>, which allows me to do the following:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#24292e;--cbp-line-number-width:calc(1 * 0.6 * .875rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:flex;align-items:center;padding:16px 0 0 16px;width:100%;text-align:left;background-color:#ffffff\"><span style=\"background:#2f363c;padding:0.3rem 0.5rem 0.2rem;border-radius:1rem;font-size:0.8em;line-height:1;height:1.25rem;text-align:center;display:inline-flex;align-items:center;justify-content:center;color:#ffffff\">R<\/span><\/span><span role=\"button\" tabindex=\"0\" data-code=\"library(RCassandra)\nconn = RC.connect(host=&quot;localhost&quot;, port=9160L)\nRC.login(conn, username = &quot;user&quot;, password=&quot;user&quot;)\nRC.use(conn, &quot;MINE&quot;)\ndata &lt;- RC.get.range.slices(conn, &quot;MyData&quot;, rlimit=10)\nRC.close(conn)\" style=\"color:#24292e;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre 
class=\"shiki github-light\" style=\"background-color: #fff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #005CC5\">library<\/span><span style=\"color: #24292E\">(RCassandra)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E36209\">conn<\/span><span style=\"color: #24292E\"> <\/span><span style=\"color: #D73A49\">=<\/span><span style=\"color: #24292E\"> <\/span><span style=\"color: #E36209\">RC.connect<\/span><span style=\"color: #24292E\">(<\/span><span style=\"color: #E36209\">host<\/span><span style=\"color: #D73A49\">=<\/span><span style=\"color: #032F62\">&quot;localhost&quot;<\/span><span style=\"color: #24292E\">, <\/span><span style=\"color: #E36209\">port<\/span><span style=\"color: #D73A49\">=<\/span><span style=\"color: #005CC5\">9160L<\/span><span style=\"color: #24292E\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E36209\">RC.login<\/span><span style=\"color: #24292E\">(conn, <\/span><span style=\"color: #E36209\">username<\/span><span style=\"color: #24292E\"> <\/span><span style=\"color: #D73A49\">=<\/span><span style=\"color: #24292E\"> <\/span><span style=\"color: #032F62\">&quot;user&quot;<\/span><span style=\"color: #24292E\">, <\/span><span style=\"color: #E36209\">password<\/span><span style=\"color: #D73A49\">=<\/span><span style=\"color: #032F62\">&quot;user&quot;<\/span><span style=\"color: #24292E\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E36209\">RC.use<\/span><span style=\"color: #24292E\">(conn, <\/span><span style=\"color: #032F62\">&quot;MINE&quot;<\/span><span style=\"color: #24292E\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #24292E\">data <\/span><span style=\"color: #D73A49\">&lt;-<\/span><span style=\"color: #24292E\"> <\/span><span style=\"color: #E36209\">RC.get.range.slices<\/span><span style=\"color: #24292E\">(conn, <\/span><span style=\"color: #032F62\">&quot;MyData&quot;<\/span><span style=\"color: #24292E\">, <\/span><span 
style=\"color: #E36209\">rlimit<\/span><span style=\"color: #D73A49\">=<\/span><span style=\"color: #005CC5\">10<\/span><span style=\"color: #24292E\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E36209\">RC.close<\/span><span style=\"color: #24292E\">(conn)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Then it&#8217;s easy to calculate my summary statistics, do some box plots, and get on with the rest of my job.<\/p>\n\n\n\n<p>As a footnote, nice to see that the code highlighter I&#8217;m using actually supports R!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I often find myself turning to R to perform basic statistical analyses that are either impossible in Microsoft Excel or too tedious to do by hand. Recently, I was faced with the challenge of analyzing data stored in Cassandra and started with the goal of creating a histogram of message sizes. I began my efforts by: Awfully &#8230;<a class=\"post-readmore\" href=\"https:\/\/xiaan.com\/blog\/2013\/06\/r-and-cassandra\/\">read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[10,11],"tags":[],"class_list":["post-42","post","type-post","status-publish","format-standard","hentry","category-big-data","category-r"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p467qb-G
","jetpack_likes_enabled":false,"_links":{"self":[{"href":"https:\/\/xiaan.com\/blog\/wp-json\/wp\/v2\/posts\/42","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/xiaan.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/xiaan.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/xiaan.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/xiaan.com\/blog\/wp-json\/wp\/v2\/comments?post=42"}],"version-history":[{"count":9,"href":"https:\/\/xiaan.com\/blog\/wp-json\/wp\/v2\/posts\/42\/revisions"}],"predecessor-version":[{"id":228,"href":"https:\/\/xiaan.com\/blog\/wp-json\/wp\/v2\/posts\/42\/revisions\/228"}],"wp:attachment":[{"href":"https:\/\/xiaan.com\/blog\/wp-json\/wp\/v2\/media?parent=42"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/xiaan.com\/blog\/wp-json\/wp\/v2\/categories?post=42"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/xiaan.com\/blog\/wp-json\/wp\/v2\/tags?post=42"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}