Ask A Data Scientist: Which Algorithm Do You Use?
by Pat Lapomarda, on October 20, 2016
I recently read an article on KDNuggets about the Top Algorithms Used by Data Scientists. The top one was Regression:
(Source: KDNuggets)
I frequently use regression, since it provides very clear interpretability. My experience is that in industry this is critical, since most traditional applications of predictive modeling have been used as enhanced rate estimates meant to out-perform expert opinions or moving averages, such as loss/claim prediction in banking or insurance. By using regression, data scientists can find common ground with experts through a review of the effects by interpreting the coefficients (all other things equal, a 1% rise in factor X results in a 2% rise in target Y, etc.).
The second algorithm, clustering, is one that Tableau 10 just made much more accessible. It's the go-to algorithm for making sense of populations by grouping them into subsets based upon their native characteristics. Traditionally, Marketers were the primary users of clustering. They'd build complex demographic segmentations that allowed them to easily pick up on differences in behavior that they could turn into results without building new models for each campaign. Clustering creates a reusable model that can be re-calibrated for many different uses without going back through the pain of customized variable selection.
Cluster modeling is a great part of any budding data scientists toolkit, but what data software's clustering package is best for your purposes and experience level? In the next installment of Ask a Data Scientist we will compare two approaches; Tableau, the "new kid" on the clustering block, and R, a tried and true clustering stalwart. So, stay tuned, and thanks for reading!