Data science with the tidyverse
My goal is to create an environment for data science where you can spend your precious mental energy on the problem at hand, rather than fighting with a programming language. To this end, I've developed a number of packages that make individual parts of the
process easier (ggplot2 for visualisation, dplyr for data manipulation, tidyr for data tidying, ...). Recently I've been thinking more about how the pieces fit together. In the words of Hal Abelson, "No matter how complex and polished the individual operations are,
it is often the quality of the glue that most directly determines the power of the system."
In this talk, I'll discuss the idea of the tidyverse, a set of conventions that ties together disparate packages to provide a uniform interface for doing data science. Along the way, you'll learn about tidy data, tibbles, list columns, pure functions,
referential integrity, piping, and why ggplot2 should never have existed.
Hadley is Chief Scientist at RStudio and a member of the R Foundation. He builds tools (both computational and cognitive) that make data science easier, faster, and more fun. His work includes packages for data science (ggplot2, dplyr, tidyr), data ingest
(readr, readxl, haven), and principled software development (roxygen2, testthat, devtools). He is also a writer, educator, and frequent speaker promoting the use of R for data science. Learn more on his homepage,
Join us for light refreshments and meet our guest from 3:45 to 4:00 in the lobby of Duncan Hall. The colloquium begins at 4:00 and ends at 5:00. Open to the general public.
Monday, September 19, 2016
4:00 PM to 5:00 PM
McMurtry Auditorium, RM1055