JHU-Coursera Data Science Specialization and Interest in MOOCs


I completed this specialization more than a year ago, but I decided to write about it again because I'm teaching Data Visualization at Universidad de Chile, and the JHU materials have proven to be an excellent resource for my students. If you want to read more about this specialization, you can do that here.

Why do I recommend this specialization?

During my major I had to take four statistics courses covering probability, inference and econometrics. In those courses I learned, among other things, that the OLS estimator in a linear model is BLUE (the Best Linear Unbiased Estimator, under the Gauss-Markov assumptions), but I never learned how to report statistical information, and that is a problem.

What did I learn in this specialization? Everything from asking questions to making inferences and publishing results, plus something really important called Reproducible Research. The specialization focuses on Reproducible Research and on communicating results. Most courses have both quizzes and projects. I had the chance to share projects with people whose approach was totally different from mine, and I learned a lot from that.

I liked it because I had no knowledge of R, and I needed R to complete my thesis on Structural Equation Modeling because my advisor, Edgar E. Kausel, is cool and wanted the thesis to be reproducible. The courses are well structured and focused on practical applications rather than on statistical theory. At first it was hard, as I had to read a lot and write a lot of code that is not needed in programs such as SPSS or Stata, but that code is a fundamental piece of open science.

MOOCs and people's interest

I am lucky enough to have students from quite different backgrounds, such as Engineering and Obstetrics. Why would a student of Obstetrics or English Literature be interested in Data Visualization? When I uploaded the syllabus, I stated that the first weeks of the course would cover Google Sheets and R, so that students would first learn to process data.

Talking to my students, I realized that some of them wanted to create elegant plots, understand Public Health statistics, or use R to analyze texts, as in this article by Julia Silge. It was surprisingly good to see that they found my course useful. Some of them had heard about R but had never taken a MOOC, partly because many MOOCs are in English.

When I am introduced as someone who knows statistics, I am often asked about Big Data and how it is going to change our lives. Some journalist and economist friends call or email me to ask about my book and which MOOC I recommend for learning R, because they run into cell limits in spreadsheet software or cannot use the proprietary software from work on their own laptops.

The good and the bad about JHU Data Science Specialization

I recommend this specialization because, in my opinion, its strengths outweigh its weaknesses, and the weaknesses relate more to the learning experience than to the content.

Good points

Bad points

Courses description

In case you want a detailed description, here's the content of each course.

Course 1 • The Data Scientist’s Toolbox


This course teaches you how to set up a GitHub account and sync files. The only quizzes and assignments are those related to configuring and using GitHub.

Course 2 • R Programming


Course 3 • Getting and Cleaning Data


Course 4 • Exploratory Data Analysis


Course 5 • Reproducible Research


Course 6 • Statistical Inference


Course 7 • Regression Models


Course 8 • Practical Machine Learning


Course 9 • Developing Data Products


Here's my Course project.

Course 10 • Data Science Capstone


The sole purpose of this course was to write a Shiny application for text prediction. The project required me to study a lot and to use everything I had learned during the specialization.

When I took this specialization, SwiftKey was paying attention to what the students were doing with techniques such as the Empirical Bayes method (the Good-Turing estimator, which Turing himself used to decrypt messages) to build an efficient application within Shiny's limits. Also, the best students who had a blog could be accepted as R-Bloggers writers.
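The core idea behind Good-Turing estimation for text prediction is that the share of words seen exactly once, N_1 / N, estimates the probability of meeting a word never seen before. It also replaces a raw count c with an adjusted count c* = (c + 1) * N_{c+1} / N_c, where N_c is the number of distinct words that occur exactly c times. The following is only a minimal Python sketch under my own assumptions, not my capstone code, which was written in R. The function name `good_turing` is hypothetical. The rule of keeping the raw count when N_{c+1} is zero is also a simplification I chose. Practical implementations usually smooth the N_c values instead.

```python
from collections import Counter


def good_turing(tokens):
    """Unsmoothed Good-Turing estimates for a list of tokens (illustrative sketch).

    Returns (p_unseen, adjusted):
      p_unseen -- probability mass reserved for unseen words, N_1 / N
      adjusted -- maps each observed count c to c* = (c + 1) * N_{c+1} / N_c
    """
    counts = Counter(tokens)                  # c(w): how often each word occurs
    n_total = sum(counts.values())            # N: total number of tokens
    freq_of_freq = Counter(counts.values())   # N_c: number of words seen exactly c times

    p_unseen = freq_of_freq[1] / n_total      # mass for words never seen in training

    adjusted = {}
    for c, n_c in freq_of_freq.items():
        n_c_plus_1 = freq_of_freq.get(c + 1, 0)
        # Assumption: when N_{c+1} is zero the raw formula would give 0,
        # so this sketch simply keeps the original count instead.
        adjusted[c] = (c + 1) * n_c_plus_1 / n_c if n_c_plus_1 else float(c)
    return p_unseen, adjusted


p_unseen, adjusted = good_turing("a a b b c d e".split())
print(p_unseen, adjusted)
```

In this toy corpus, three of the seven tokens are words seen only once, so 3/7 of the probability mass is set aside for unseen words. Each singleton's count is discounted from 1 to 2 * 2 / 3. In a text predictor, the same reserved mass is what lets the app assign a nonzero probability to word sequences that never appeared in the training data.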

Here's my Course Project.