Ask A Data Scientist: Career Guidance for Aspiring Data Scientists

by Pat Lapomarda, on June 16, 2016

Question:

"I work with a younger colleague interested in a career path as a data scientist. Is there any guidance or resource that you can suggest that I can provide for her to leverage? She already does business and data analysis and is working on advanced Tableau training." - David Maillet

Perfect timing on this question! A former colleague of mine, Will Tewalt, asked me almost exactly the same question yesterday. We recently listed 5 Inexpensive Ways To Learn About Data Online. The first four are a great place to start, but my recommendation is the Data Science Specialization on Coursera by Johns Hopkins University. It is a nine-course certification program and sounds like a perfect fit for your colleague. I began the courses last year, and the instructors, Brian Caffo, Roger Peng, and Jeff Leek, are excellent! It really covers all the bases for becoming proficient in today's data science world!

The first course in the specialization, The Data Scientist's Toolbox, gets you started with R/RStudio and Git/GitHub. It also introduces RMarkdown, which is a great way to make sure your analytics are reproducible. It's an all-too-common problem: when a new analyst picks up someone else's work, it's usually easier to start over than to pick up where their predecessor left off. This concept is reinforced throughout the entire specialization, especially in the second and third courses, R Programming and Getting & Cleaning Data, and it is the key focus of the fifth course, Reproducible Research.

The fourth course, Exploratory Data Analysis, covers plotting with both R's base plotting system and Hadley Wickham's ggplot2 package. ggplot2 is great, but I'm still a huge Tableau-ficionado for visualizing data, so I'm very happy to hear your colleague is already working in Tableau! This course also covers hierarchical and k-means clustering, and clustering is part of the upcoming Tableau 10 release.

Statistical Inference, the sixth course in the Data Science Specialization, is a primer (or refresher) in statistics. It's by far the most complex course in the series, but Brian Caffo, the instructor, does a great job with the material. I've recommended this course specifically to junior analysts who want to move into predictive modeling or machine learning, so that they have a solid foundation on which to build.

The seventh and eighth courses, Regression Models and Practical Machine Learning, are filled with practical knowledge and will help your colleague make the transition from data analyst to data scientist. That said, make sure she takes Statistical Inference first, so she's doing real science and not junk science.

The final course, Developing Data Products, covers some slick tools, like Shiny, rCharts, googleVis, and Slidify. It will also walk her through creating her own R package, so she can become the next Hadley Wickham! To earn the certificate at the end of the specialization, she'll also need to complete a final Capstone project. I haven't yet had time to fit in the Capstone, so if she begins the specialization now, we might end up in the same session!

Thanks to David Maillet for such a great question; I hope this has been insightful!

If you have a data science question, go ahead and ask here! Be sure to subscribe to the Arkatechture blog to receive instant updates whenever a new Ask A Data Scientist (AADS) post goes up, every third Thursday!