Three Simple Rules for Building Data Products that People Will Actually Use
Tim Trefren is one of the founders at Mixpanel, the most advanced analytics platform for web & mobile applications. He has many years of experience building compelling, accessible interfaces to data. To learn more, check out the Mixpanel engineering blog.
Building data products is not easy.
Many people are uncomfortable with numbers, and even more don't really understand statistics. It's very, very easy to overwhelm people with numbers, charts, and tables - and yet numbers are more important than ever. The trend toward running companies in a data-driven way is only growing, which means more programmers will be spending time building data products. These might be internal reporting tools (like the dashboards that your CEO will use to run the company) or, like Mixpanel, you might be building external-facing data analysis products for your customers.
Either way, the question is: how do you build usable interfaces to data that still give deep insights?
We've spent the last 6 years at Mixpanel working on this problem. In that time, we've come up with a few simple rules that apply to almost everyone:
Help your users understand and trust the data they are looking at
Strike the right balance between ease and power
Support rapid iteration and quick feedback loops
Help your users understand and trust the data they are looking at
Your users are busy. They're behind schedule and stressed out. Add to that the general discomfort many people have with numbers and it can be very difficult to build interfaces to data that people will actually use.
You can make things easy for them by providing context for the data they're looking at and by providing proof points that help them trust the data in the first place. This can be achieved with a combination of lots of little things - there’s no silver bullet - and the specific techniques you use depend a lot on the audience and on the medium.
Let's talk about two specific examples:
An executive dashboard (audience: CEO, goal: understand the KPIs of the business)
A product dashboard (audience: product managers/designers/engineers, goal: understand product usage patterns)
Giving context for data
The executive dashboard is going to have a smaller number of high-level metrics. Here, you're going to want to give context explicitly with very clear titles, labels, and descriptions of what the user is looking at. This would be overboard for something people look at every day, but when you only look at something once a month or once a quarter you need to tell more of a story.
For the product dashboard you still need context, but it doesn't have to be so explicit: your users live here and they don't need the same static description wasting UI space every day. The important thing is making any labels that you do have very clear - good, human readable names (e.g. "Add item to cart" vs "add_cart_v2") go a very long way to ease comprehension.
In the Mixpanel interface, we've made a point of using clear labels for data. For example, we send metadata about browser information over the wire as part of our data payload. What we send looks something like "{ '$browser': 'ie' }", but that's not what we're going to show the user - it's too much work to parse. Instead, we let them filter by "Browser" and "Internet Explorer".
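As a rough illustration of that translation step, here is a minimal Python sketch. The lookup tables and the `display_label` helper are hypothetical names made up for this example, not Mixpanel's actual code; the only detail taken from above is the "$browser" / "ie" payload.

```python
# Hypothetical lookup tables; a real product would have many more entries.
PROPERTY_LABELS = {"$browser": "Browser"}
VALUE_LABELS = {"$browser": {"ie": "Internet Explorer"}}


def display_label(prop, value):
    """Return (property label, value label), falling back to the raw strings."""
    prop_label = PROPERTY_LABELS.get(prop, prop)
    value_label = VALUE_LABELS.get(prop, {}).get(value, value)
    return prop_label, value_label


payload = {"$browser": "ie"}
labels = [display_label(prop, value) for prop, value in payload.items()]
print(labels)  # [('Browser', 'Internet Explorer')]
```

Falling back to the raw key when no label exists means an unmapped property still shows up in the interface, just less nicely, instead of disappearing.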
Helping users trust data
If people are going to use this data to make decisions they need to trust it! This is particularly important when you are reporting unusual results - you need to help the user understand that this isn't just a data collection or reporting error.
Like providing context, there are a number of techniques you can use for building trust with users. Your users will have a number of questions about the numbers they are seeing:
where is the data from?
how was the calculation made?
what caused that crazy spike?
In the executive dashboard, you’ll want to answer these questions and clear up any other ambiguities explicitly in very clear prose. For example, if you’re reporting on revenue numbers and combining data from multiple sources (Salesforce, app database, Stripe, etc.), call it out: “Raw revenue numbers are pulled from the Stripe API. Projected revenue is based on expected Bookings for next month from Salesforce.”
In the product dashboard, the key to answering these questions will be making the numbers verifiable. You don't need to explain it in words, just make it possible for the user to verify the number they're seeing. The best way to do this is to make it possible to access the raw data behind the rolled up numbers you're reporting.
For example, if you're reporting on the revenue generated by a product, users should be able to dig into the raw list of transactions (with all metadata attached) that you are using to calculate monthly revenue. This allows them to verify the methodology (they can sum the transactions themselves) and verify that the data is good (no duplicate or weird looking transactions).
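To make that concrete, here is a small Python sketch of the two checks a user might run against the raw list: re-summing the transactions and looking for duplicates. The transaction records, field names, and helper functions are all invented for illustration.

```python
from collections import Counter
from decimal import Decimal

# Hypothetical raw transactions behind a monthly revenue number.
transactions = [
    {"id": "t1", "amount": Decimal("19.99"), "month": "2015-03"},
    {"id": "t2", "amount": Decimal("5.00"), "month": "2015-03"},
    {"id": "t2", "amount": Decimal("5.00"), "month": "2015-03"},  # duplicate row
    {"id": "t3", "amount": Decimal("42.50"), "month": "2015-04"},
]


def monthly_revenue(rows, month):
    """Sum the raw amounts for one month, exactly as a user could by hand."""
    return sum((r["amount"] for r in rows if r["month"] == month), Decimal("0"))


def duplicate_ids(rows):
    """Return transaction ids that appear more than once."""
    counts = Counter(r["id"] for r in rows)
    return sorted(tid for tid, n in counts.items() if n > 1)


print(monthly_revenue(transactions, "2015-03"))  # 29.99
print(duplicate_ids(transactions))  # ['t2']
```

Here the duplicate "t2" row inflates March revenue by 5.00, which is exactly the kind of problem a user can only spot if the raw rows are available.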
If you can get users to trust the data they are looking at, you’re much closer to adding real value, and helping them make data-driven decisions.
Strike the right balance between ease and power
It's tempting to make everything possible in the UI - typically, the logic goes:
we have the capability in the backend
someone might want it
might as well put it in, can't hurt, right?
Wrong! This is a failure pattern! People will not use the tools you build if they are too complex. If the first thing that greets people when they try to look at their data is a bewildering array of toggles and controls, you will fail.
The very simple rule of thumb I would offer is: avoid checkboxes, radio buttons, and tabs - if you have these kinds of configuration options, you're probably trying to do too much.
This is particularly important when you're first getting started with a new data product. Starting simple will make it much easier for you to get people to try it and for you to gather feedback - over time, it's okay to get more complex, as long as the complexity you add solves real problems for your users.
For example, Mixpanel offers a report called "retention" that helps you understand how sticky your product is. It answers the question "how many of my users come back and use the product again each day/week/month?".
When we first built it, that's the only question it answered. Over time, we've added additional functionality, making it possible to compare retention across highly configurable segments of users. We were careful, however, to add this complexity in an incremental way, and with incremental UI complexity: the first-time user is not asked to configure a segment, they just get immediate results. If you want the additional power, you have it - but we avoid forcing everyone down that path, because it's too intimidating.
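For a sense of what the simplest version of that question looks like in code, here is a hedged Python sketch. It is not how Mixpanel computes retention; the data shape (each user mapped to the set of weeks since signup in which they were active) and the `retention_curve` name are assumptions made for this example.

```python
def retention_curve(active_weeks, num_weeks):
    """Fraction of users active in each of weeks 1..num_weeks after signup.

    active_weeks maps a user id to the set of week offsets (0 = signup week)
    in which that user did something.
    """
    total = len(active_weeks)
    if total == 0:
        return [0.0] * num_weeks
    return [
        sum(1 for weeks in active_weeks.values() if k in weeks) / total
        for k in range(1, num_weeks + 1)
    ]


# Hypothetical activity for four users.
activity = {
    "a": {0, 1, 2},
    "b": {0, 1},
    "c": {0},
    "d": {0, 2},
}
print(retention_curve(activity, 3))  # [0.5, 0.5, 0.0]
```

Note that the function takes no segment argument at all: the first-time user gets an answer straight away, and segmentation can be layered on later by filtering `active_weeks` before calling it.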
The spectrum of power for data products ranges from "hard but powerful" (writing a complicated query against a database of some kind) to "easy but weak" (looking at a PDF full of static pie charts). If your goal is making data accessible to a wide audience, you want to be in the middle - structured reports with flexibility built in. This is how you solve 80%+ of problems for your users. Too far either way and you narrow your reach.
To do this, first decide on a type of analysis. Maybe you want to analyze time intervals (e.g. web page response times or total time on site), or you want to measure the number of times something happens. Each option lends itself to a certain type of interface, so you need to make a choice.
Within that framework, allow the user to tweak the inputs (in a limited number of ways!), focusing on the 80% solution.
Generally, that means user configurable:
date ranges
data filters (for example, "all api requests < 200ms response time")
analysis filters (total counts, unique counts, sums, averages, etc.)
Again, be careful to avoid adding all the bells and whistles. Most of the time they will scare away new users, make it harder to do common actions, and not get used much in the first place. Not building a feature isn't the end of the world. You'll almost always be wrong the first time, and not having something is a great way to get feedback - we do this all the time at Mixpanel.
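Those three inputs are enough to sketch a structured report. The Python below is one possible shape, assuming in-memory rows with made-up "user", "day", and "value" fields; `run_report` and `AGGREGATIONS` are hypothetical names, not an existing API.

```python
from datetime import date

# The only analysis choices we expose; anything else is deliberately unsupported.
AGGREGATIONS = {
    "total": len,
    "unique": lambda rows: len({r["user"] for r in rows}),
    "sum": lambda rows: sum(r["value"] for r in rows),
    "average": lambda rows: sum(r["value"] for r in rows) / len(rows) if rows else 0.0,
}


def run_report(rows, start, end, data_filter=None, analysis="total"):
    """Apply a date range, an optional data filter, then one aggregation."""
    selected = [
        r for r in rows
        if start <= r["day"] <= end and (data_filter is None or data_filter(r))
    ]
    return AGGREGATIONS[analysis](selected)


# Hypothetical API requests; "value" is the response time in milliseconds.
requests = [
    {"user": "u1", "day": date(2015, 1, 1), "value": 120},
    {"user": "u1", "day": date(2015, 1, 2), "value": 250},
    {"user": "u2", "day": date(2015, 1, 2), "value": 180},
    {"user": "u3", "day": date(2015, 2, 1), "value": 90},
]


def fast(r):
    """Data filter: all api requests < 200ms response time."""
    return r["value"] < 200


january = (date(2015, 1, 1), date(2015, 1, 31))
print(run_report(requests, *january, data_filter=fast, analysis="total"))  # 2
print(run_report(requests, *january, data_filter=fast, analysis="average"))  # 150.0
```

The fixed `AGGREGATIONS` table is the point of the design: users pick from a short list instead of writing arbitrary queries, which keeps the interface small.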
Also, if it's a feature that just makes something easier (rather than possible), you may not need it yet. Things that would drive a programmer crazy ("aghhhhh, so automatable!") are nothing to someone who spends an hour a day doing data entry into Salesforce. In the long run, you should absolutely help your users save time - but skipping it in the short term (particularly in the early days of building your product) can help you focus on building more critical functionality ASAP.
Support rapid iteration and quick feedback loops
"Fast" is one of the most important attributes for real-world usage - when something is slow, people don't use it (Google has investigated this, as have others, and I’m certain you’ve experienced the frustration of a slow web product first hand).
This is a good general rule for all products, but speed is particularly relevant when you're working with data. You will often be doing things that are by nature slow, and anything you can do to speed them up will really help people actually use the things that you build. If you present people with a spinner for 30 seconds, they’re not going to stick around to see your results, no matter how good they are.
Our goal at Mixpanel has been to make queries very fast, and to support rapid exploration of data. A typical user session looks like:
look at a single user action (such as "add item to cart")
dig into metadata ("split out by country")
dig further ("let's look only at people in the UK")
... (continue)
end with a very specific segment ("users in the UK between age 20 and 30 who have been signed up for more than 6 months")
Each step of the way, we do a very fast query and immediately update the results displayed. This lets the user continually focus their search until they end up with the information they're looking for, and each step informs the next. If queries took a few seconds, this workflow would not be awesome. Being fast lets us do things differently.
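The shape of that session can be sketched in a few lines of Python. This only illustrates the step-by-step narrowing, with each step reusing the previous result; the in-memory list stands in for a fast query backend, and the event fields and helper names are invented for the example.

```python
def narrow(rows, predicate):
    """Keep only the rows matching the predicate."""
    return [r for r in rows if predicate(r)]


def split_by(rows, key):
    """Count rows per distinct value of one metadata field."""
    counts = {}
    for r in rows:
        counts[r[key]] = counts.get(r[key], 0) + 1
    return counts


# Hypothetical events with user metadata attached.
events = [
    {"action": "add item to cart", "country": "UK", "age": 25, "months_signed_up": 8},
    {"action": "add item to cart", "country": "UK", "age": 41, "months_signed_up": 12},
    {"action": "add item to cart", "country": "US", "age": 23, "months_signed_up": 7},
    {"action": "view item", "country": "UK", "age": 27, "months_signed_up": 9},
    {"action": "add item to cart", "country": "UK", "age": 22, "months_signed_up": 3},
]

# Step 1: look at a single user action.
cart = narrow(events, lambda e: e["action"] == "add item to cart")
# Step 2: split out by country.
by_country = split_by(cart, "country")
# Step 3: look only at people in the UK.
uk = narrow(cart, lambda e: e["country"] == "UK")
# Step 4: end with a very specific segment.
segment = narrow(uk, lambda e: 20 <= e["age"] <= 30 and e["months_signed_up"] > 6)

print(by_country)  # {'UK': 3, 'US': 1}
print(len(segment))  # 1
```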
This ability to explore is key. It’s very common for users to have only a vague idea what they’re looking for at first, and making it fast & easy for them to refine the search parameters lets them get real value out of the product quickly.
Value, in the end, is what users will judge you on - so the faster you can deliver results, the better.
Conclusion
Data that people can’t understand is just noise. Data that people can understand and trust is knowledge.
When presenting data to users, it’s critical to help them gain that understanding and trust.
At Mixpanel, that means giving lots of context for the data we display, making that data verifiable, and limiting UI complexity. We also strive for speed, because it keeps users happy. These all work together toward our goal: to provide powerful tools that don’t overwhelm, and to make those tools easy to use.
Hopefully, you can use some of the lessons we’ve learned in your own products.