Slackalytics Part One: Slack Channel Word Clouds With R
by Pat Lapomarda, on January 26, 2017
We wrapped up 2016 with a reflection on the topics we covered. We closed by giving readers this synopsis of the year's topics:
Therefore, when planning a predictive solution, starting & ending with the user of the model and taking time to understand the critical aspects of how & why the solution is needed are paramount.
At Arkatechture, We Build Data Driven Business, so in staying true to that principle, we reviewed the traffic for each edition of Ask A Data Scientist and found that November's topic, "How Does Clustering in R vs. Tableau Work?", had the most traffic. It was different because we didn't just talk about Data Science, we did some. So our New Year's Resolution for 2017 is to change Ask a Data Scientist up a bit. Instead of writing about data science, we are going to show you how to do data science!
To kick off this installment, we are going to walk through the process of discovering a business problem, quantifying the problem, and then finding a solution. Many Data Science processes start with "Get Data". While it's tempting to dive right into building something, the question is: build what? This is not a new challenge, and the answer is found by practicing the CRISP-DM process. Notice that while the process is centered on data, it starts by looping between Business Understanding and Data Understanding. Once you're ready, the next iterative work loops between Data Preparation and Modeling until your model is ready for Evaluation and then Deployment. Then you can repeat the process to create more value or refine your understanding so you can successfully solve the original Business Problem.
The Business Problem
Arkatechture uses the Slack messaging platform. It was implemented by the team to streamline communication within the company, since different team members were using different Instant Messaging tools. A recent development is that team members & leaders are leaving channels and limiting their use of Slack, defeating the purpose of the tool altogether.
This is a great problem to dig into! As a Data & Analytics company, Arkatechture has its own Data Lake where it stores data from the applications it uses, such as Slack, so the data is readily available. Also, Arkatechture has experience with the Slack API, making a potential deployment more cost-effective.
To kick this off, we organized an in-person meeting with 3 groups:
- Leaders - who had abandoned some (or all) of the Slack channels and could help define & quantify the problem.
- Data Science Team - who were charged with coming up with a solution to the problem using data.
- Marketing Team - who observed & took notes to develop this post, which helped the rest of the group to focus.
Qualifying the Problem:
The conversations that occur on Slack can often lead to distraction. Some company-wide Slack channels have a high level of activity, so if you don't view these channels for 2+ hours, you can miss a lot. This causes some employees to view the channels very actively to keep up with the conversations, or spend a good chunk of time going back to read the conversations they've missed.
Irrelevant Content In Channels
Every channel has a topic, with the idea that only that topic is discussed in the channel. However, quite often irrelevant conversation takes place in some of these channels. This can frustrate some employees, causing them to leave channels in which they should be involved.
One of Slack's main purposes in our company is to answer questions quickly. However, in the Arka-family, we love to pull each other's legs, so when someone asks a question, there is usually a little bantering involved before the question is actually answered. While this is all just some light-hearted fun, it can cause some employees to turn away from Slack and use tools like email, where questions are answered less quickly.
Quantifying the Problem:
In order to use Data Science to solve this problem, we need to start using the Slack data we have and determine if we can observe the behaviors being described in Slack. Only then can we discuss solving the problem with an analytic. But right now, we don't even know if we can quantify & measure what is being described as the Business Problem.
What Needs To Be Quantified:
- Who is being distracted?
- Measure both sides of the conversation (Consumers & Contributors)
- Time it takes for a question to be answered
- Channel Topic vs. Channel Conversation
- Number of participants in a channel vs. amount of irrelevant content
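As a rough illustration of how two of these items might be turned into numbers, the sketch below counts distinct contributors per channel and measures how long each question waits for its first reply from someone else. The message records, the field names, and the rule that a message ending in "?" counts as a question are all assumptions made up for this example. They are not the real Slack export schema or the approach we settled on, and Python is used here only for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical message records. The field names and contents are invented
# for this sketch and do not reflect the real Slack export schema.
messages = [
    {"channel": "general", "user": "ann", "ts": "2017-01-10 09:00:00",
     "text": "Where is the Q4 deck?"},
    {"channel": "general", "user": "bob", "ts": "2017-01-10 09:02:00",
     "text": "Did you check under the pizza boxes"},
    {"channel": "general", "user": "cat", "ts": "2017-01-10 09:07:30",
     "text": "It is on the shared drive"},
    {"channel": "random", "user": "bob", "ts": "2017-01-10 10:00:00",
     "text": "Anyone up for lunch?"},
]


def parse_ts(value):
    """Parse the timestamp format used in the sample records."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


def contributors_per_channel(msgs):
    """Count the distinct users who posted in each channel."""
    users = defaultdict(set)
    for m in msgs:
        users[m["channel"]].add(m["user"])
    return {channel: len(names) for channel, names in users.items()}


def seconds_to_first_reply(msgs):
    """For each message ending in '?', return the seconds until the next
    message in the same channel from a different user (None if no reply)."""
    by_channel = defaultdict(list)
    for m in sorted(msgs, key=lambda m: parse_ts(m["ts"])):
        by_channel[m["channel"]].append(m)

    results = []
    for channel, thread in by_channel.items():
        for i, m in enumerate(thread):
            # Assumption: a trailing question mark marks a question.
            if not m["text"].rstrip().endswith("?"):
                continue
            reply = next(
                (r for r in thread[i + 1:] if r["user"] != m["user"]), None
            )
            wait = None
            if reply is not None:
                wait = (parse_ts(reply["ts"]) - parse_ts(m["ts"])).total_seconds()
            results.append((channel, m["user"], wait))
    return results


print(contributors_per_channel(messages))
print(seconds_to_first_reply(messages))
```

Note that the first reply is not necessarily the answer. In the sample records the first response is banter, which is exactly the behavior described above, so a real measure would need some way to tell answers apart from replies.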
For any predictive analytical project, we start by determining a target variable of interest that will become the dependent variable in the analytic we produce. We should also determine if this is a ranking, estimating, or classification problem, and create the target as such. Slack is mostly semi-structured text, so we will need to develop features that quantify these questions. As a start, we explored the Slack data, building a word cloud for each channel. Here is the one for our #pizzachallenge channel:
When we saw this and compared it to the results of the Pizza Challenge, we knew that we had successfully captured the messages by channel.
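A word cloud is essentially a picture of word frequencies, so the underlying step is counting words per channel. The sketch below shows that counting step only. It is written in Python for illustration and is not the R code we used; the sample messages, the field names, and the very short stop-word list are all invented for the example.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption for this sketch);
# a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "to", "of",
              "for", "in", "on", "i", "was"}


def word_counts_by_channel(msgs):
    """Return a Counter of non-stop-words for each channel."""
    counts = defaultdict(Counter)
    for m in msgs:
        # Lowercase the text and keep runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", m["text"].lower())
        counts[m["channel"]].update(w for w in words if w not in STOP_WORDS)
    return counts


# Made-up sample messages, not real Slack data.
sample = [
    {"channel": "pizzachallenge", "text": "The pepperoni pizza was the best pizza"},
    {"channel": "pizzachallenge", "text": "I vote for the pepperoni"},
    {"channel": "general", "text": "Standup is in five minutes"},
]

counts = word_counts_by_channel(sample)
for channel, counter in counts.items():
    print(channel, counter.most_common(3))
```

The per-channel counts are what a word cloud renderer would scale into font sizes, so checking them against what you know about a channel is a quick way to confirm the messages were grouped correctly.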
To see how we created the word clouds, click here.
Check back next month to see if we can quantify the problem and find a solution!