{"id":342,"date":"2017-01-19T12:29:43","date_gmt":"2017-01-19T17:29:43","guid":{"rendered":"https:\/\/my.vanderbilt.edu\/universityfundingprograms\/?p=342"},"modified":"2017-01-24T10:52:00","modified_gmt":"2017-01-24T15:52:00","slug":"update-on-a-trans-institutional-big-data-infrastructure-at-vanderbilt","status":"publish","type":"post","link":"https:\/\/my.vanderbilt.edu\/universityfundingprograms\/2017\/01\/update-on-a-trans-institutional-big-data-infrastructure-at-vanderbilt\/","title":{"rendered":"Update on \u201cA Trans-Institutional Big Data Infrastructure at Vanderbilt\u201d"},"content":{"rendered":"<div id=\"attachment_339\" style=\"width: 210px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-339\" class=\"wp-image-339\" src=\"https:\/\/cdn.vanderbilt.edu\/t2-my\/my-prd\/wp-content\/uploads\/sites\/2261\/2017\/01\/Will-French_2.jpg\" alt=\"Will French_2\" width=\"200\" height=\"186\" \/><p id=\"caption-attachment-339\" class=\"wp-caption-text\">Will French, ACCRE&#8217;s Manager of Research Computing Operations<\/p><\/div>\n<p><em>Written by Will French, ACCRE&#8217;s\u00a0<span class=\"s1\">Manager of Research Computing Operations<\/span><\/em><\/p>\n<p>The world is drowning in data. This is the reality that has become increasingly apparent across virtually all areas of industry, medicine, business, higher education, and research, among other sectors, over the last several years. In 2012, <a href=\"http:\/\/www.bbc.com\/news\/business-26383058\">IBM stated that 2.5 exabytes (i.e. 2.5 billion gigabytes) of data were being generated worldwide <em>per day<\/em><\/a>. 
Dealing with \u201cbig data\u201d \u2013 that is, effectively managing the flood of data being generated and extracting useful information from it \u2013 is undoubtedly one of the major challenges of the 21<sup>st<\/sup> century.<\/p>\n<p><a href=\"http:\/\/engineering.vanderbilt.edu\/people\/bd\">A number of researchers at Vanderbilt are actively exploring methods for tackling big data problems<\/a>. While the term \u201cbig data\u201d can mean many different things across research domains and areas of inquiry, the underlying challenge is often the same: how can one scale up to larger data sets without running out of memory or waiting days or weeks for an analysis to complete? Many methods for data storage, management, and analysis become far less efficient or effective once the size of the data crosses some critical threshold. Web companies such as Google and Yahoo began encountering these challenges in the early 2000s and developed new technologies that are only now making their way into academic research.<\/p>\n<p>With support from the university, the <a href=\"http:\/\/www.accre.vanderbilt.edu\/\">Advanced Computing Center for Research and Education<\/a> (ACCRE) is building a data storage and computing environment designed and optimized for <a href=\"http:\/\/www.vanderbilt.edu\/strategicplan\/trans-institutional-programs\/tips-2015awards\/bigdata.php\">big data<\/a>. The project launched in 2015, thanks in large part to a Vanderbilt <a href=\"http:\/\/vanderbilt.edu\/strategicplan\/trans-institutional-programs\/tipshome.php\">Trans-Institutional Program<\/a> (TIPs) award. The new environment is centered on the Hadoop ecosystem, which is widely used at large web companies and throughout the data science industry today. Over 100 Vanderbilt and VUMC research groups currently use ACCRE resources for a variety of demanding storage, backup, and computing applications. 
However, the existing ACCRE environment was designed around assumptions about how data are stored and analyzed that may limit its ability to tackle big data problems efficiently. The new environment is being designed specifically with the challenges of big data in mind.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignright wp-image-334\" src=\"https:\/\/cdn.vanderbilt.edu\/t2-my\/my-prd\/wp-content\/uploads\/sites\/2261\/2017\/01\/ACCRE.png\" alt=\"ACCRE\" width=\"225\" height=\"225\" \/>ACCRE currently manages a test Hadoop system that a number of researchers and students across campus use to evaluate the benefits of the new system. The test system also lets ACCRE staff build the expertise needed to run and manage the system while providing invaluable insight into the specific challenges of each researcher\u2019s applications. A new production-scale system will be purchased and deployed to campus over the next few years. Once available, the production system will open up research on problems that were previously out of reach, and researchers already using the Hadoop ecosystem will be able to continue their analyses at a significantly larger scale.<\/p>\n<p>One of the major users of the current Hadoop environment is a group of over 50 students in Professor Daniel Fabbri\u2019s big data course (CS 4266\/5266). This is the third consecutive year that ACCRE has managed the environment for Professor Fabbri\u2019s course. Students access the system remotely, take part in detailed training sessions led by Professor Fabbri and ACCRE staff, and complete homework assignments on big data problems, such as mining Wikipedia data for interesting trends in topics or content. According to Fabbri, these experiences are invaluable for Vanderbilt students: \u201cCourse projects previously were limited to small data sets. 
Now, with the cluster, students are analyzing much larger data sets and are able to study real-world problems.\u201d This real-world experience is a huge benefit of using the Hadoop environment managed by ACCRE. \u201cStudents can now experience the power and pain of working with big data sets with the cluster. These experiences will make Vanderbilt students extremely competitive in the job market,\u201d notes Fabbri.<\/p>\n<p>In addition to building a new big data system, the ACCRE TIPs project devotes a significant portion of its effort to creating immersive educational experiences for undergraduates through a ten-week <a href=\"http:\/\/www.accre.vanderbilt.edu\/?page_id=3063\">ACCRE Scholars summer research program<\/a>, in which students use ACCRE resources to complete a research project with a Vanderbilt faculty member. This past summer, six Vanderbilt undergraduates participated in the inaugural program, working on projects spanning electronic structure calculations, single nucleotide polymorphisms, and computational fluid dynamics. Two of the summer students also traveled to the annual Supercomputing conference in Austin, Texas, to present their summer research projects and interact with other students and researchers in the community. This summer, ACCRE plans to host a similar number of students as it continues to build and develop this exciting new big data environment.<\/p>\n<p>Be sure to visit the blog page often for updates on our TIPs project. We also encourage you to leave comments or ask questions in the space provided below.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Written by Will French, ACCRE&#8217;s\u00a0Manager of Research Computing Operations The world is drowning in data. This is the reality that has become increasingly apparent across virtually all areas of industry, medicine, business, higher education, and research, among other sectors, over the last several years. 
In 2012, IBM stated that 2.5 exabytes (i.e. 2.5 billion gigabytes)&#8230;<\/p>\n","protected":false},"author":6209,"featured_media":334,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1,12],"tags":[],"class_list":["post-342","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news","category-tips-2015"],"_links":{"self":[{"href":"https:\/\/my.vanderbilt.edu\/universityfundingprograms\/wp-json\/wp\/v2\/posts\/342","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/my.vanderbilt.edu\/universityfundingprograms\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/my.vanderbilt.edu\/universityfundingprograms\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/my.vanderbilt.edu\/universityfundingprograms\/wp-json\/wp\/v2\/users\/6209"}],"replies":[{"embeddable":true,"href":"https:\/\/my.vanderbilt.edu\/universityfundingprograms\/wp-json\/wp\/v2\/comments?post=342"}],"version-history":[{"count":4,"href":"https:\/\/my.vanderbilt.edu\/universityfundingprograms\/wp-json\/wp\/v2\/posts\/342\/revisions"}],"predecessor-version":[{"id":352,"href":"https:\/\/my.vanderbilt.edu\/universityfundingprograms\/wp-json\/wp\/v2\/posts\/342\/revisions\/352"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/my.vanderbilt.edu\/universityfundingprograms\/wp-json\/wp\/v2\/media\/334"}],"wp:attachment":[{"href":"https:\/\/my.vanderbilt.edu\/universityfundingprograms\/wp-json\/wp\/v2\/media?parent=342"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/my.vanderbilt.edu\/universityfundingprograms\/wp-json\/wp\/v2\/categories?post=342"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/my.vanderbilt.edu\/universityfundingprograms\/wp-json\/wp\/v2\/tags?post=342"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}