Data Intensive Summer School
· July 8 – 10, 2013, 10:00 AM – 4:00 PM CST
· Marquette University, Milwaukee, WI
· Raynor Memorial Library, Room 330b, 1355 West Wisconsin Avenue, Milwaukee, WI – 53233
· Fee – Free; registration fee has been waived
· Registration required - https://portal.xsede.org/course-calendar. Look for “In person (Marquette University)”
· Participants are required to provide their own laptops.
The Virtual Data Intensive Summer School will be held from 10:00 AM to 4:00 PM CST, July 8 through July 10. To accommodate participants from different time zones, a short lunch break will be held at roughly 1:00-1:30 PM CST each day. All times given below are CST and are subject to minor changes.
Monday, July 8
Sinkovits, San Diego Supercomputer Center
Introduction to summer school and basics of data intensive computing
Tuecke, University of Chicago
Globus Online for Research Data Management
Freund, University of California, San Diego
MapReduce, Hadoop and Spark
Marciano, University of North Carolina
Jeff Heard, Renaissance Computing Institute (RENCI)
Tuesday, July 9
Fariss, University of California, San Diego
Introduction to R and statistical analysis of data using R
(break at approximately 11:30)
1:30-2:00 Chris Fariss, University of California, San Diego
Statistical analysis of data using R (continued)
Elkan, University of California, San Diego
The landscape of analytics: a personal view
Wednesday, July 10
Szczepanski, University of Tennessee, Knoxville
Introduction to Visualization with R
(break at approximately 11:30)
2:00-4:00 Dean Abbott, Abbott Analytics Inc.
Preparing for the virtual summer school
Several of the instructors have requested that you preinstall software on your laptop. Given the large number of participants and the compressed schedule, we ask that you comply and do this before the start of the summer school.
· R, version 3.0.1 (Good Sport) or later http://www.r-project.org/
· R packages: ggplot2 (version 0.9.3 or later), scales, ggmap, googleVis, and igraph
· Download the Uber cars data set from http://www.infochimps.com/datasets/uber-anonymized-gps-logs
· KNIME: http://www.knime.org/
Prior knowledge of R is not required, but we do assume that you have some programming experience and familiarity with basic programming concepts (variables, arrays, loops, branching, etc.). You may find it helpful to acquaint yourself with basic R syntax ahead of time. We recommend reading the first two chapters of the following online introduction: http://cran.r-project.org/doc/manuals/R-intro.html
Prior knowledge of KNIME is not necessary, but you may find it helpful to familiarize yourself with the software. The KNIMEtech website contains very useful introductory material that will help you get started.
Data Intensive Summer School 2013
The Virtual School of Computational Science and Engineering (VSCSE) is organizing a data intensive summer school on July 8 – 10, 2013. Marquette University, in collaboration with the University of Wisconsin - Milwaukee and other local research entities as part of the Southeast Wisconsin High Performance Cyberinfrastructure (SeWhip), invites all local researchers, industry practitioners, and students interested in big data and data intensive computing to join a free summer school taught by experts in the field.
The Data Intensive Summer School focuses on the skills needed to manage, process, and gain insight from large amounts of data. It is targeted at researchers from the physical, biological, economic, and social sciences who are beginning to drown in data. We will cover the nuts and bolts of data intensive computing, common tools and software, predictive analytics algorithms, data management, and non-relational database models. Given the short duration of the summer school, the emphasis will be on providing a solid foundation that attendees can use as a starting point for advanced topics of particular relevance to their work.
· Experience working in a Linux environment
· Familiarity with relational database models
· Examples and assignments will most likely use R, MATLAB and Weka. We do not require experience in these languages or tools, but you should already have an understanding of basic programming concepts (loops, conditionals, functions, arrays, variables, scoping, etc.)
Robert Sinkovits, San Diego Supercomputer Center
· Nuts and bolts of data intensive computing
· Computer hardware, storage devices and file systems
· Cloud storage
· Data compression
· Networking and data movement
· Data management
· Digital libraries and archives
· Data management plans
· Access control, integrity and provenance
· Introduction to R programming
· Introduction to Weka
· Predictive analytics
· Standard algorithms: k-means clustering, decision trees, SVM
· Over-fitting and trusting results
· Dealing with missing data
· ETL (extract, transform, and load)
· The ETL life cycle
· ETL tools from scripts to commercial solutions
· Non-relational databases
· Brief refresher on relational model
· Survey of non-relational models and technologies
· Presentation of data for maximum insight
· R and the ggplot2 package
NOTE: Participants are required to provide their own laptops.
Summer School Logistics
The data intensive summer school will be presented live via high-definition video conferencing technology on the Marquette University campus in Room 330b of Raynor Memorial Library, 1355 West Wisconsin Avenue, Milwaukee, WI – 53233.
Parking is available for a fee. Please visit the university parking webpage for more details.
The workshop is free, but registration is required at https://portal.xsede.org/course-calendar. Look for “In person (Marquette University)”.
For more information, see http://www.vscse.org/summerschool/2013/index.html. You may also contact Praveen.email@example.com or firstname.lastname@example.org.
This Virtual School of Computational Science and Engineering is supported in part by the NSF under award number OCI-1041313. Local participation is supported by Marquette University and the Southeast Wisconsin High Performance Cyberinfrastructure (http://www.sewhip.org).