Ask A Data Scientist: Using Statistical Measures To Choose A Classifier

by Pat Lapomarda, on May 19, 2016

Welcome to Ask A Data Scientist where the data experts at Arkatechture will be answering audience-generated questions on a monthly basis! First up, we have a question that our Director of Data Science, Pat Lapomarda, was asked recently by a colleague. If you have a pressing data question, feel free to leave a comment below or submit your question here! Without further ado, let's get started:


Question:

“How do you choose among three classifiers using the four standard statistical measures produced in Azure Machine Learning?”

Let’s start this off with a brief explanation for those who read this question as gibberish: 

AzureML is a "drag-and-drop" cloud service that enables anyone to easily build, deploy, and share predictive analytics solutions. Platforms like this make it easy to build classifiers and other predictive models.

A classifier is a specific type of predictive model that takes things and puts them into groups. Classifiers that are used every day include medical tests, like cancer tests, pregnancy tests, Lyme disease tests, etc. These all have two possible outcomes: the patient has the condition or they do not.

Classifiers can have more than two outcomes, but for this example let’s pretend our classifiers are three made-up cancer tests. How do you determine which test is the best? 

Let’s start with some numbers: the statistical measures that AzureML produces for these 3 imaginary tests are:

[Table: Accuracy, Precision, Recall, and F1 for the three classifiers A, B, and C]

To understand these statistics, let’s look at the Confusion Matrix, which outlines the different types of outcomes using its 4 quadrants - True Positive, False Positive, True Negative, & False Negative:

[Image: Confusion Matrix showing the True Positive, False Positive, False Negative, and True Negative quadrants]
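To make those quadrants concrete, here's a minimal sketch in Python using scikit-learn (a stand-in for AzureML, which computes this for you); the ten patients and their test results are made up purely for illustration:

```python
from sklearn.metrics import confusion_matrix

# 1 = "has cancer", 0 = "does not" -- ten hypothetical patients
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # what's actually true
y_pred = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]  # what the test said

# For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")  # TP=2  FP=1  FN=1  TN=6
```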

For the sake of simplicity, let's focus on the default measures that AzureML produced: Accuracy, Precision, Recall, and F1 (a combination of Precision & Recall, specifically their harmonic mean).

[Image: Accuracy = (TP + TN) / (TP + TN + FP + FN)]

Accuracy measures the correct diagnoses (both for and against cancer) out of all patients tested. When testing for something like cancer, Accuracy can be misleading: cancer has a less than 1% incidence rate, so 99% Accuracy can be achieved by simply always predicting against cancer. Accuracy doesn't help much when predicting cancer, but if you're predicting something with a higher incidence rate, it can be helpful.
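Here's that trap in a few lines of Python (scikit-learn again, with a made-up 1,000-patient population at a 1% incidence rate): a "test" that always says "no cancer" scores 99% Accuracy while catching zero real cases.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical population: 1,000 patients, 10 of whom (1%) actually have cancer
y_true = [1] * 10 + [0] * 990

# A useless "test" that always predicts "no cancer"
y_pred = [0] * 1000

print(accuracy_score(y_true, y_pred))  # 0.99 -- looks impressive
print(recall_score(y_true, y_pred))    # 0.0  -- misses every real case
```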

[Image: Precision = TP / (TP + FP)]

Precision measures how many of the patients diagnosed with cancer actually have it, so it tells you how often the test raises a false alarm (a False Positive). In the cancer example, a False Positive would mean giving chemotherapy to someone without cancer. The challenge with optimizing for Precision alone is that people who actually have cancer can be missed (False Negatives). Those people would not get treated, which is a much worse outcome than going through unnecessary treatment.
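Using the same made-up ten patients from the confusion-matrix sketch above, Precision works out like this:

```python
from sklearn.metrics import precision_score

# Same hypothetical ten patients as before (TP=2, FP=1, FN=1, TN=6)
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]

# Of the 3 patients the test diagnosed with cancer, 2 actually have it
print(precision_score(y_true, y_pred))  # 2 / (2 + 1) = 0.667
```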

[Image: Recall = TP / (TP + FN)]

Recall measures how many of those who actually have cancer were diagnosed correctly. It is the flip side of the False Negative Rate: the higher the Recall, the fewer actual cancer cases are missed. The problem with Recall is that, in this case, getting to 100% could be achieved by assuming everyone has cancer. This would, very reasonably, make the 99% of patients who are cancer-free very dissatisfied.
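The "everyone has cancer" trick looks like this with the same hypothetical patients; Recall jumps to 100% while Precision collapses:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]

# The honest test misses one of the 3 real cases
print(recall_score(y_true, y_pred))  # 2 / (2 + 1) = 0.667

# Diagnose everyone: Recall hits 1.0, but Precision drops to 3 / 10
y_all_positive = [1] * len(y_true)
print(recall_score(y_true, y_all_positive))     # 1.0
print(precision_score(y_true, y_all_positive))  # 0.3
```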

[Image: F1 = 2 × (Precision × Recall) / (Precision + Recall)]

F1 is a hybrid measure that combines Precision & Recall, so it takes both False Negative and False Positive errors into account. By default, F1 assumes that the costs of the two errors are equal, but the weighting can be customized to favor one over the other. Since AzureML uses the default F1 formula, weighting the errors differently wasn't an option here.
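If you score the model yourself, you can lean on the more general F-beta score, where beta > 1 puts extra weight on Recall (i.e., on avoiding False Negatives). A quick sketch with scikit-learn, using made-up predictions where Precision and Recall differ:

```python
from sklearn.metrics import f1_score, fbeta_score

# Hypothetical predictions: Precision = 3/6 = 0.50, Recall = 3/4 = 0.75
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0]

print(f1_score(y_true, y_pred))               # 0.60 -- both errors weighted equally
print(fbeta_score(y_true, y_pred, beta=2.0))  # 0.68 -- Recall counts more
```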

Now that we understand how AzureML calculated the statistics for our imaginary cancer tests, which one is the best?

[Table: Accuracy, Precision, Recall, and F1 for the three classifiers A, B, and C]

In this case, if the two kinds of errors cost about the same and can be balanced against each other, then B wins on F1. However, if it's really important to catch the positive cases, as it is when diagnosing cancer, then you may want C's higher Recall and live with the higher False Positive Rate!
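As a rough sketch of that decision rule (the numbers below are hypothetical placeholders, not the values from the table above): rank by F1 when the two errors cost about the same, and rank by Recall when missing a real case is the expensive mistake.

```python
# Hypothetical scores for three classifiers -- placeholders, not the post's real table
scores = {
    "A": {"precision": 0.70, "recall": 0.60, "f1": 0.65},
    "B": {"precision": 0.75, "recall": 0.72, "f1": 0.73},
    "C": {"precision": 0.55, "recall": 0.90, "f1": 0.68},
}

# Errors roughly equally costly: pick the best F1
print(max(scores, key=lambda c: scores[c]["f1"]))      # B

# Missing a real case is the costly error: pick the best Recall
print(max(scores, key=lambda c: scores[c]["recall"]))  # C
```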

 

Hopefully you learned a little something from this post - if you have a pressing data science question, submit it here or leave it in the comments below! And be sure to subscribe to the Arka Blog so you never miss a new AADS post!
