Data Intensive Summer School

·         July 8 – 10, 2013, 10:00 – 4:00 CST

·         Marquette University, Milwaukee, WI

·         Raynor Memorial Library, Room 330b, 1355 West Wisconsin Avenue, Milwaukee, WI 53233

·         Fee – Free; registration fee has been waived

·         Registration required - Look for “In person (Marquette University)”

·         Participants are required to provide their own laptops.


The Virtual Data Intensive Summer School will be held from 10:00 AM to 4:00 PM CST, July 8 through July 10. To accommodate participants from different time zones, a short lunch break will be held at roughly 1:00-1:30 CST each day. All times given below are CST and are subject to minor changes.

Monday, July 8

10:00-11:00         Robert Sinkovits, San Diego Supercomputer Center
                                Introduction to summer school and basics of data intensive computing

11:00-12:00         Steve Tuecke, University of Chicago
                                Globus Online for Research Data Management

12:00-12:15         Break

12:15-1:15           Yoav Freund, University of California, San Diego
                                MapReduce, Hadoop and Spark

1:15-1:45              Lunch

1:45-4:00              Richard Marciano, University of North Carolina
                                Jeff Heard, Renaissance Computing Institute (RENCI)
                                Data management         

Tuesday, July 9

10:00-1:00           Chris Fariss, University of California, San Diego
                                Introduction to R and statistical analysis of data using R
                                (break at approximately 11:30)

1:00-1:30              Lunch

1:30-2:00              Chris Fariss, University of California, San Diego
                                Statistical analysis of data using R (continued)

2:00-4:00              Charles Elkan, University of California, San Diego
                                The landscape of analytics: a personal view

Wednesday, July 10

10:00-1:00           Amy Szczepanski, University of Tennessee, Knoxville
                                Introduction to Visualization with R
                                (break at approximately 11:30)

1:00-1:30              Lunch

2:00-4:00              Dean Abbott, Abbott Analytics Inc.
                                Text mining

Preparing for the virtual summer school

Several of the instructors have requested that you preinstall software on your laptop. Given the large number of participants and the compressed schedule, we ask that you install the following before the summer school starts.


·         R, version 3.0.1 (Good Sport) or later

·         R packages: ggplot2 (version 0.9.3 or later), scales, ggmap, googleVis, and igraph

·         Download uber cars data set from

·         KNIME:

·         Google Chrome browser:


Prior knowledge of R is not required, but we do assume that you have some programming experience and familiarity with basic programming concepts (variables, arrays, loops, branching, etc.). You may find it helpful to acquaint yourself with basic R syntax ahead of time. Reading the first two chapters of the following online introduction is recommended

Prior knowledge of KNIME is not necessary, but you may find it helpful to familiarize yourself with the software. The KNIMEtech website contains very useful introductory material that will help you get started.

Data Intensive Summer School 2013

The Virtual School of Computational Science and Engineering (VSCSE) is organizing a data intensive summer school from July 8 – 10, 2013. Marquette University, in collaboration with the University of Wisconsin - Milwaukee and other local research entities as part of the Southeast Wisconsin High Performance Cyberinfrastructure (SeWhip), invites all local researchers, industry practitioners, and students interested in big data and data intensive computing to join a free summer school taught by experts in the field.

The Data Intensive Summer School focuses on the skills needed to manage, process, and gain insight from large amounts of data. It is targeted at researchers from the physical, biological, economic, and social sciences who are beginning to drown in data. We will cover the nuts and bolts of data intensive computing, common tools and software, predictive analytics algorithms, data management, and non-relational database models. Given the short duration of the summer school, the emphasis will be on providing a solid foundation that attendees can use as a starting point for advanced topics of particular relevance to their work.


Prerequisites:

·         Experience working in a Linux environment

·         Familiarity with relational database models

·         Examples and assignments will most likely use R, MATLAB and Weka. We do not require experience in these languages or tools, but you should already have an understanding of basic programming concepts (loops, conditionals, functions, arrays, variables, scoping, etc.)  


    Robert Sinkovits, San Diego Supercomputer Center

Course topics:

·         Nuts and bolts of data intensive computing

·         Computer hardware, storage devices and file systems

·         Cloud storage

·         Data compression

·         Networking and data movement

·         Data management

·         Digital libraries and archives

·         Data management plans

·         Access control, integrity and provenance

·         Introduction to R programming

·         Introduction to Weka

·         Predictive analytics

·         Standard algorithms: k-means clustering, decision trees, SVM

·         Over-fitting and trusting results

·         Dealing with missing data

·         ETL (extract, transform, load)

·         The ETL life cycle

·         ETL tools from scripts to commercial solutions

·         Non-relational databases

·         Brief refresher on relational model

·         Survey of non-relational models and technologies

·         Visualization

·         Presentation of data for maximum insight

·         R and the ggplot2 package

NOTE: Participants are required to provide their own laptops.

Summer School Logistics

The data intensive summer school will be presented live via high-definition video conferencing technology on the Marquette University campus in Room 330b of Raynor Memorial Library, 1355 West Wisconsin Avenue, Milwaukee, WI 53233.

Parking is available for a fee. Please visit the university parking webpage for more details.

The workshop is free, but registration is required at https:/  Look for “In person (Marquette University)”.


For more information see You may also contact and

This Virtual School of Computational Science and Engineering is supported in part by the NSF under award number OCI-1041313. Local participation is supported by Marquette University and the Southeast Wisconsin High Performance Cyberinfrastructure (