The Hitchhiker's Guide to Ggplot2

Published: 2016-11-30
Updated: 2017-10-10

"Any bleeder knows that books are never finished, only abandoned."
César A. Hidalgo

About the book

You can find the book here. Less than two weeks ago R 3.4.2 was released, so we updated our R version and packages to rebuild the book and update most of the examples. The present version of the book is a major update that simplifies many examples and introduces a new approach to working with TTF/OTF fonts on different systems.

Besides my full-time position I also teach, and a significant fraction of my students are more than ten years older than me, work full time, and attend night courses. Feedback from these students, who face serious time constraints, has constantly shaped my lectures and the examples in the book. I try to keep it simple and effective.

The last major update was all about redoing all of our examples using the Tidyverse. Why? Base R was OK for us, but as the Rstats world changes, newer and better tools become available and we can't ignore that! For example, dplyr is a great R package that makes code easier to learn and read, and it also lets people focus on important Data Science concepts (e.g. tidy data) rather than getting distracted by syntax.
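As a rough illustration of that readability argument, here is a minimal sketch that computes the same summary twice, once with base R and once with dplyr. It is not taken from the book: the built-in mtcars dataset, the 100 hp threshold and the object names are assumptions chosen only for this example.

```r
library(dplyr)

# Base R: mean miles per gallon by number of cylinders,
# keeping only cars with more than 100 horsepower
base_result <- aggregate(mpg ~ cyl, data = mtcars[mtcars$hp > 100, ], FUN = mean)

# dplyr: the same steps, written in the order you would describe them
dplyr_result <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))

print(base_result)
print(dplyr_result)
```

Both versions return one row per cylinder count with the same means. The dplyr version reads as filter, then group, then summarise, which is the kind of clarity we were after.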

This is a book that may look complete, but changes in R packages constantly demand changes in the examples it contains. This is why the electronic format is perfect for this kind of work. Trapping it inside a dead-tree book would ultimately be a waste of time and resources, in my view.

Besides being my first book, this is also my first collaborative work. I wrote it in a 50-50 collaboration with Jodie Burchell. Jodie is an amazing data scientist. I highly recommend reading her blog, Standard Error, where you can find really good material on Reproducible Research and more.

This is a technical book. Its aim is to go straight to the point, and the writing style is similar to a recipe with detailed instructions. It is assumed that you know the basics of R and that you want to learn how to create beautiful plots.

Each chapter explains how to create a different type of plot and takes you step by step from a basic plot to a highly customised graph. The chapters are ordered by degree of difficulty.
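To give a feel for that step-by-step progression, here is a minimal sketch that starts from a basic scatter plot and adds customisations one layer at a time. It is not one of the book's recipes: the mtcars dataset, the labels, the palette and the object names are assumptions made up for this illustration.

```r
library(ggplot2)

# Step 1: a basic scatter plot of weight against fuel efficiency
p_basic <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Step 2: add a title and readable axis labels
p_labelled <- p_basic +
  labs(
    title = "Fuel efficiency by car weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon"
  )

# Step 3: colour the points by number of cylinders and change the theme
p_custom <- p_labelled +
  aes(colour = factor(cyl)) +
  scale_colour_brewer(palette = "Dark2", name = "Cylinders") +
  theme_minimal()

print(p_custom)
```

Each step reuses the previous object, so you can stop at whichever level of customisation you need.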

Every chapter is independent of the others. You can read the whole book or go straight to a section of interest, and we are confident that you will be able to follow the instructions and reproduce our examples without reading the earlier chapters.

In total this book contains 237 pages (letter paper size) of different recipes to obtain an acceptable aesthetic result. You can download the book for free (yes, really!) from Leanpub.

How did the book start?

Almost a year ago I finished writing the eleventh tutorial in a series on using ggplot2 I created with Jodie Burchell.

I asked Jodie to co-author some blog entries when I found her blog and realised that it reflected my own interest in Data Science. The book grew out of those entries on our blogs.

A few weeks later those tutorials evolved into an ebook. The reason was that what we had started to write was unexpectedly successful. We even got retweets from important people in the R community such as Hadley Wickham. Finally, the book was released on Leanpub.

We also included a pack that contains the Rmd files that we used to generate every chart that is displayed in the book.

Why Leanpub?

Leanpub is a platform where you can easily write your book using MS Word, among other writing software, and it even has GitHub and Dropbox integration. We went for R Markdown with LaTeX output, which shows that Leanpub is both easy to use and flexible.

What's more, Leanpub lets the audience download your books for free, if you allow it, or you can define a price range with a suggested price. The website pays authors a royalty of 90% minus 50 cents per sale, which is favourable for authors compared to other platforms. You can also sell your books with additional exercises, video lessons, etc.

For example, last year I updated all the examples contained in the book just a few days after ggplot2 2.2 was released, and my readers received a notification email right after I uploaded the new version. Readers can download newer versions of your books for free, whether or not they paid for them.

If that's not enough, Leanpub allows you to create bundles and sell your books as a set, or you can charge a different price for your book plus additional material such as R Markdown notebooks, instructional videos and more.

What did I learn from my first book?

At the moment I am teaching Data Visualization, and from my students I have learned that good visualizations come after they learn the visualization concepts. Coding clearly helps, but coding comes after the fundamentals.

It would be better to teach visualization fundamentals first rather than in parallel with coding, and this applies especially when part of your audience has never written code before.

I got a lot of feedback from my students last term. It was really helpful for improving the book and for dividing some steps into smaller pieces to make the Grammar of Graphics easier to understand.

The interested reader may find some remarkable books that can be read before mine. I highly recommend:

Those are really good books that show the fundamentals of Data Visualisation and provide the key concepts and rules needed to communicate effectively with data.