We entered the "Big Data" era but we still believe in chemtrails

Some of my friends are reaching an age where they decide to have babies and raise children. These lines are dedicated to them, especially to those who believe in chemtrails and are against vaccines.

Reproducible research means making it easier for others to verify findings and build upon them, by sharing scientific claims together with their data and software code. Computer science has created amazing tools for data analysis and, at the same time, exposed limitations in our ability to evaluate published findings.

If we had a true culture of sharing, the norm would be that most scientific articles, no matter how standard, polemical or novel their claims are, would come accompanied by both the data and the code used in the analysis. Of course this applies to empirical articles, which in my opinion are much harder to produce than purely theoretical ones.

A controversial scientific claim is not the same as an unprovable scientific claim. If I postulate something like "Vaccines Don't Cause Autism" after I conduct a controlled study, then my study is replicated when an investigator at another institution conducts a study addressing the same question, collects their own data, analyzes it independently of me, and publishes their own findings, which do not contradict mine.

The claim "Vaccines Cause Autism" is not controversial; it's obscene and it makes me angry. Science works by verifying facts, and fact checking is hard, so please stop being disrespectful to people like Dr. Alexander Fleming and the many others who worked hard for a brighter future for mankind.

To dispel any doubt, because I do believe in the scientific method, let's suppose somebody claims there is a relationship between vaccines and autism. A claim like that, discredited by the way, should appear in a peer-reviewed journal where experts validate how the study was conducted. As a minimum desirable condition, the investigator should make the analytic data of the study publicly available.

If that happened, I would still doubt the study, so I would ask for the analytic data and the computer code for the data analysis to be made publicly available. When the code is run on the analytic data, the findings should be identical to the published results. This step is no less important. I insist on code because spreadsheets in software like Excel or Google Sheets are hard to trace; the analysis for a study like that should be written in R or Python, with data available in CSV or a similar format, so that anyone can trace the results without licensing limitations.
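The check described above can be sketched in a few lines of Python. This is only an illustration: the data, the column names (`group`, `participants`, `cases`), the published figure and the helper functions are all made up for the example and do not come from any real study.

```python
import csv
import io
import math

# Hypothetical analytic data (made up for this sketch); a real study
# would publish it as a CSV file next to the article.
ANALYTIC_CSV = """group,participants,cases
vaccinated,1000,12
unvaccinated,1000,12
"""

# Hypothetical value that the imaginary article reports.
PUBLISHED_RISK_RATIO = 1.0


def load_rows(csv_text):
    """Parse CSV text into a dict keyed by group name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["group"]: row for row in reader}


def risk_ratio(rows):
    """Risk among the vaccinated divided by risk among the unvaccinated."""
    def risk(group):
        return int(rows[group]["cases"]) / int(rows[group]["participants"])
    return risk("vaccinated") / risk("unvaccinated")


def reproduces(computed, published, tolerance=1e-9):
    """True when the recomputed value matches the published one."""
    return math.isclose(computed, published, abs_tol=tolerance)


rows = load_rows(ANALYTIC_CSV)
computed = risk_ratio(rows)
print("reproduced" if reproduces(computed, PUBLISHED_RISK_RATIO) else "mismatch")
```

Anyone with a Python interpreter can rerun this and see whether the number in the article comes out of the data, which is exactly what a spreadsheet full of hand-edited cells makes hard.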

What if the study that links vaccines to autism was opportunistic in its timing, and it would be difficult to find a similar context in which to repeat it? Then the study is not replicable.

What if the data for the study is so bad that it is not just cherry-picked but turns out to be totally fabricated? In that case you have "RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children", a famous study by Andrew Wakefield that earned him the inglorious honour of losing his medical license. I still don't know why so many anti-vaccine blogs insist on that article but turn a blind eye to what happened later.

New technologies are increasing the rate of data collection, creating datasets that are more complex and extremely high-dimensional. That's why reproducibility is important. In the meantime, I work on creating the analytic data that powers the Observatory of Economic Complexity, and I try to make that process efficient, traceable and open, so you can go to GitHub and inspect the code, detect bottlenecks or suggest speed improvements.