We wrapped up 2016 with a reflection on the topics we covered. We closed by giving readers this synopsis of the year's topics:
Therefore, when planning a predictive solution, starting & ending with the user of the model and taking time to understand the critical aspects of how & why the solution is needed are paramount.
At Arkatechture, We Build Data Driven Business. Staying true to that principle, we reviewed the traffic for each edition of Ask a Data Scientist and found that November's topic, "How Does Clustering in R vs. Tableau Work?", had the most traffic. It was different because we didn't just talk about data science, we did some. So our New Year's resolution for 2017 is to change Ask a Data Scientist up a bit. Instead of writing about data science, we are going to show you how to do data science!
To kick off this installment, we are going to walk through the process of discovering a business problem, quantifying the problem, and then finding a solution. Many Data Science processes start with "Get Data". While it's tempting to dive right into building something, the question is: build what? This is not a new challenge. The answer is found by practicing the CRISP-DM (Cross-Industry Standard Process for Data Mining) process. Notice that while the process is centered on data, it starts by looping between Business Understanding and Data Understanding. Once you're ready, the next iterative work loops between Data Preparation and Modeling until your model is ready for Evaluation and then Deployment. You can then repeat the process to create more value or refine your understanding so you can successfully solve the original Business Problem.
Arkatechture uses the Slack messaging platform. The team implemented it to streamline communication within the company, since different team members were using different instant messaging tools. Recently, however, team members & leaders have been leaving channels and limiting their use of Slack, defeating the purpose of the tool altogether.
This is a great problem to dig into! As a Data & Analytics company, Arkatechture has its own Data Lake where it stores data from the applications it uses, such as Slack, so the data is readily available. Arkatechture also has experience with the Slack API, making a potential deployment more cost-effective.
To kick this off, we organized an in-person meeting with 3 groups:
The conversations that occur on Slack can often lead to distraction. Some company-wide Slack channels have a high level of activity, so if you don't view these channels for 2+ hours, you can miss a lot. This causes some employees to watch the channels constantly to keep up with the conversations, or to spend a good chunk of time going back to read what they've missed.
Every channel has a topic, with the idea that only that topic is discussed in the channel. However, irrelevant conversation quite often takes place in some of these channels. This can frustrate some employees, causing them to leave channels in which they should be involved.
One of Slack's main purposes in our company is to answer questions quickly. However, in the Arka-family, we love to pull each other's legs, so when someone asks a question, there is usually a little bantering involved before the question is actually answered. While this is all just some light-hearted fun, it can cause some employees to turn away from Slack and use tools like email, where questions are answered more slowly.
In order to use Data Science to solve this problem, we need to start using the Slack data we have and determine if we can observe the behaviors being described in Slack. Only then can we discuss solving the problem with an analytic. But right now, we don't even know if we can quantify & measure what is being described as the Business Problem.
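One way to begin checking the first complaint (busy channels where a two-hour absence means missing a lot) is to count messages per channel in two-hour windows. The following is only a minimal Python sketch under stated assumptions, not how our Data Lake is actually queried: the `channel` and `ts` field names, the helper functions, and the sample messages are all hypothetical and made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 2 * 60 * 60  # two-hour buckets, matching the "2+ hours" complaint


def messages_per_window(messages):
    """Count messages per (channel, window start) pair.

    Assumes each message is a dict with a 'channel' name and a 'ts'
    string holding epoch seconds (hypothetical field names).
    """
    counts = Counter()
    for msg in messages:
        ts = float(msg["ts"])
        # Round the timestamp down to the start of its two-hour window
        window_start = int(ts // WINDOW_SECONDS) * WINDOW_SECONDS
        counts[(msg["channel"], window_start)] += 1
    return counts


def busiest_windows(counts, top_n=3):
    """Return the top_n busiest windows as (channel, UTC datetime, count)."""
    return [
        (channel, datetime.fromtimestamp(start, tz=timezone.utc), n)
        for (channel, start), n in counts.most_common(top_n)
    ]


# Made-up sample data, for illustration only
sample = [
    {"channel": "general", "ts": "1483228800.0"},
    {"channel": "general", "ts": "1483228860.5"},
    {"channel": "general", "ts": "1483232400.0"},
    {"channel": "general", "ts": "1483236000.0"},
    {"channel": "pizzachallenge", "ts": "1483229000.0"},
]

counts = messages_per_window(sample)
for channel, start, n in busiest_windows(counts):
    print(channel, start.isoformat(), n)
```

If windows with very high counts show up regularly in company-wide channels, that would be a first sign that the distraction employees describe can be measured.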
For any predictive analytical project, we start by determining a target variable of interest that will become the dependent variable in the analytic we produce. We should also determine if this is a ranking, estimating, or classification problem, and create the target as such. Slack is mostly semi-structured text, so we will need to develop features that quantify these questions. As a start, we explored the Slack data, building a word cloud for each channel. Here is the one for our #pizzachallenge channel:
When we saw this and compared it to the results of the Pizza Challenge, we knew that we had successfully captured the messages by channel.
To see how we created the word clouds, click here.
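For readers who want a feel for the idea first, a word cloud is essentially a picture of word counts. Here is a minimal Python sketch of counting words per channel. It is not necessarily the method we used: the message structure, the `word_frequencies_by_channel` helper, the small stopword list, and the sample messages are all assumptions made up for illustration.

```python
import re
from collections import Counter, defaultdict

# A deliberately tiny, hypothetical stopword list; a real one would be much longer
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "for", "i", "it"}


def word_frequencies_by_channel(messages):
    """Return a mapping of channel name to a Counter of its word frequencies."""
    freqs = defaultdict(Counter)
    for msg in messages:
        # Lowercase the text and keep runs of letters and apostrophes as words
        words = re.findall(r"[a-z']+", msg["text"].lower())
        freqs[msg["channel"]].update(w for w in words if w not in STOPWORDS)
    return freqs


# Made-up sample messages, for illustration only
sample_messages = [
    {"channel": "pizzachallenge", "text": "The pepperoni pizza is the best pizza"},
    {"channel": "pizzachallenge", "text": "I vote for the mushroom pizza"},
    {"channel": "general", "text": "Is the meeting in the big room?"},
]

freqs = word_frequencies_by_channel(sample_messages)
print(freqs["pizzachallenge"].most_common(1))  # [('pizza', 3)]
```

The per-channel counts can then be handed to whichever plotting tool you prefer, which draws each word at a size proportional to its count.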
Check back next month to see if we can quantify the problem and find a solution!