Detailed data types you can use for documentation prioritization

Data analysis is a valuable way to learn more about what documentation tasks to prioritize above others. My post (and talk), Just Add Data, presented at Write the Docs Portland in 2019, talk about this broadly. In this post I want to cover in detail a number of different data types that can lead to valuable insights for prioritization.

This list of data types is long, but I promise each one contains value for a technical writer. These types of data might come from your own collection, a user research organization, the business development department, marketing organization, or product management organization:

  • User research reports
  • Support cases
  • Forum threads and questions
  • Product usage metrics
  • Search strings
  • Tags on bugs or issues
  • Education/training course content and questions
  • Customer satisfaction survey

More documentation-specific data types:

  • Documentation feedback
  • Site metrics
  • Text analysis metrics
  • Download/last accessed numbers
  • Topic type metrics
  • Topic metadata
  • Contribution data
  • Social media analytics

Many of these data types are best used in combination with others.

User research reports

User research reports can contain a lot of valuable data that you can use for documentation. 

  • Types of customers being interviewed
  • Customer use cases and problems
  • Types of studies being performed

This can give you insight into both what the company finds valuable to study (so some insight into internal priorities) but also direct customer feedback about things that are confusing or the ways that they use the product. The types of customers that are interviewed can provide valuable audience or persona-targeting information, allowing you to better calibrate the information in your documentation. See How to use data in user research when you have no web analytics on the Gov.UK site for more details about what you can do with user research data.

Support cases

Support cases can help you better understand customer problems. Specific metrics include:

  • Number of cases
  • Frequency of cases
  • Categories of questions
  • Customer environments and licenses

With these you can compile metrics about specific customer problems, the frequency of problems, and the types of customers and customer environments that are encountering specific problems, allowing you to better understand target customers, or customers that might be using your documentation more than others. Support cases are also rich data for common customer problems, providing a good way to gather new use cases and subjects for topics. 

Forum threads and questions

These can be internal forums (like Splunk Answers for Splunk) or external ones, like Reddit or StackOverflow.

  • Common questions
  • Common categories
  • Frequently unanswered questions
  • Post titles

If you’re trying to understand what people are struggling with, or get a better sense of how people are using specific functionality, forum threads can help you understand. The types of questions that people ask and how they phrase them can also help make it clear what kinds of configuration combinations might make specific functions harder for customers. Based on the question types and frequencies that you see, you might be able to fine-tune existing documentation to make it more user-centric and easily findable, or supplement content with additional specific examples. 

Product usage metrics

Some examples of product usage metrics are as follows:

  • Time in product
  • Intra-product clicks
  • Types of data ingested
  • Types of content created
  • Amount of content created

Even if you don’t have specific usage data introspecting the product, you can gather metrics about how people are interacting with the purchase and activation process, and extrapolate accordingly.

  • Number of downloads and installs
  • License activations and types
  • Daily and monthly active users

You can use this type of data to better understand how people are spending their time in your product, and what features or functionality they’re using. Even if a customer has purchased or installed the product, it’s even more valuable to find out if they’re actually using it, and if so, how.

If your product is only in beta, and you want more data to help you prioritize an overall documentation backlog, such as topics that are tied to a specific release, you can use some product usage data to understand where people are spending more of their time, and draw conclusions about what to prioritize based on that.

Maybe the under-utilized features could use more documentation, or more targeted documentation. Maybe the features themselves need work.  Be careful not to draw overly-simplistic conclusions about the data that you see from product usage metrics. Keep context in mind at all times. 

Search strings

You can gather search strings from HTTP referer data from web searches performed on external search sites such as Google or DuckDuckGo, or from internal search services. It’s pretty unlikely that you’ll be able to gather search strings from external sites given the widespread implementation of HTTPS, but internal search services can be vital and valuable data sources for this.

Look at specific search strings to find out what people are looking for, and what people are searching that’s landing them on specific documentation pages. Maybe they’re searching for something and landing on the wrong page, and you can update your topic titles to help.

JIRA or issue data

You can use metrics from your issue tracking services to better understand product quality, as well as customer confusion.

  • Number of issues/bugs
  • Categories/tags/components of issues/bugs
  • Frequency of different types of issues being created/closed

Issue tags or bug components can help you identify categories of the product where there are lots of problems or perhaps customer confusion. This is especially useful data if you’re an open source product and want to get a good understanding of where there are issues that might need more decision support or guidance in the documentation. 

Training courses

If you have an education department, or produce training courses about your product, these are quite useful to gather data from. Some examples of data you might find useful:

  • Questions asked by customers
  • Questions asked by course developers
  • Use cases covered by content in courses
  • Enrollment in courses
  • Categories of courses offered

Also useful to correlate this with other data to help identify verticals of customers interested in different topics. Because education and training courses cover more hands-on material, it can be an excellent source of use case examples, as well as occasions where decision support and guidance is needed. 

Customer surveys

Customer surveys especially cover surveys like satisfaction surveys and sentiment analysis surveys. By reviewing the qualitative statements and types of questions asked in the surveys, you can gain valuable insights and information like:

  • What do people think about the product?
  • What do people want more help with?
  • How do people think about the product?
  • How do people feel about the product?
  • What does the company want to know from customers? 
  • What are the company priorities?

This can also help you think about how the documentation you write has a real effect on peoples’ interactions with the product, and can shift sentiment in one way or another.

Documentation feedback

Direct feedback on your documentation is a vital source of data if you can get it. 

  • Qualitative comments about the documentation
  • Usefulness votes (yes/no)
  • Ratings

Even if you don’t have a direct feedback mechanism on your website, you can collect documentation feedback from internal and external customers by paying attention in conversations with people and even asking them directly if they have any documentation feedback. Qualitative comments and direct feedback can be vital for making improvements to specific areas. 

Site metrics

If your documentation is on a website, you can use web access logs to gather important site metrics, such as the following:

  • Page views
  • Session data like time on page
  • Referer data
  • Link clicks
  • Button clicks
  • Bounce rate
  • Client IP

Site metrics like page views, session data, referer data, and link clicks can help you understand where people are coming to your docs from, how long they are staying on the page, how many readers there are, and where they’re going after they get to a topic. You can also use this data to understand better how people interact with your documentation. Are readers using a version switcher on your page? Are they expanding or collapsing information sections on the page to learn more? Maybe readers are using a table of contents to skip to specific parts of specific topics.  

You can split this data by IP address to understand groups of topics that specific users are clustering around, to better understand how people use the documentation.

Text analysis metrics

Data about the actual text on your documentation site is also useful to help understand the complexity of the documentation on your site.

  • Flesch-Kincaid readability score
  • Inclusivity level
  • Length of sentences and headers
  • Style linter

You can assess the readability or usability of the documentation, or even the grade level score for the content to understand how consistent your documentation is. Identify the length of sentences and headers to see if they match best practices in the industry for writing on the web. You can even scan content against a style linter to identify inconsistencies of documentation topics against a style guide.

Download metrics

If you don’t have site metrics for your documentation site, because the documentation is published only via PDF or another medium, you can still use metrics from that. 

  • Download numbers 
  • Download dates and times
  • Download categories and types

You can use these metrics to gather interest about what people want to be reading offline, or how frequently people are accessing your documentation. You can also correlate this data with product usage data and release cycles to determine how frequently people access the documentation compared with release dates, and the number of people accessing the documentation compared with the number of people using a product or service.

Topic type metrics

If you use strict topic typing at your documentation organization, you can use topic type metrics as an additional metadata layer for documentation data analysis. Even if you don’t, you can manually categorize organize your documentation by type to gather this data.

  • What are the topic types?
  • How many topic types are there?
  • How many topics are there of each type?

Understanding topic types can help you understand how reader interaction patterns can vary for your documentation by type, or whether your developer documentation has predominantly different types of documentation compared with your user documentation, and better understand what types of documentation are written for which audiences.

Topic metadata

Metadata about documentation topics is also incredibly valuable as a correlation data source. You can correlate topic metadata like the following information:

  • What are the titles?
  • Average length of a topic?
  • Last updated and creation dates
  • Versions that different topics apply for

You can correlate it with site metrics, to see if longer topics are viewed less-frequently than shorter topics, or identify outliers in those data points. You can also manually analyze the topic titles to identify if there are patterns (good or bad) that exist.

Contribution data

If you have information about who is writing documentation, and when, you can use these types of data:

  • Last updated dates
  • Authors/contributors
  • Amount of information added or removed

Contribution data can tell you how frequently specific topics were updated to add new information, and by whom, and how much information was added or removed. You can identify frequency patterns, clusters over time, as well as consistent contributors.

It’s useful to split this data by other features, or correlate it with other metrics, especially site metrics. You can then identify things like:

  • Last updated dates by topic
  • Last updated dates by product
  • Last updated dates over time

to see if there are correlations between updates and page views. Perhaps more frequently updated content is viewed more often.

Social media analytics

  • Social media referers
  • Link clicks from social media sites

If you publicize your documentation using social media, you can track the interest in the documentation from those sites. If you’re curious about social media referers leading people to your documentation, and see whether or not people are getting to your documentation in that way. Maybe your support team is responding to people on twitter with links to your documentation, and you want to better understand how frequently that happens and how frequently people click through those links to the documentation…

You can also identify whether or not, and how, people are sharing your documentation on social media by using data crawled or retrieved from those sites’ APIs, and looking for instances of links to your documentation. This can help you get a better sense of how people are using your documentation, how they’re talking about it, how they feel about it, and whether or not you have an organic community out there on the web sharing your documentation. 

Beyond documentation data

I hope that this detail has given you a better understanding of different types of data, beyond documentation data, that are available to you as a technical writer to draw valuable conclusions from. By analyzing these types of data, you are prepared for prioritizing your documentation task list, but also better able to understand the customers of your product and documentation. Even if only some of these are available to you, I hope they are useful. Be sure to read Just Add Data: Using data to prioritize your documentation for the full explanation of how to use data in this way. 

The Concepts Behind the Book: How to Measure Anything

I just finished reading How to Measure Anything: Finding the Value of Intangibles in Business by Douglas Hubbard. It discusses fascinating concepts about measurement and observability, but they are tendrils that you must follow among mentions of Excel, statistical formulas, and somewhat dry consulting anecdotes. For those of you that might want to focus mainly on the concepts rather than the literal statistics and formulas behind implementing his framework, I wanted to share the concepts that resonated with me. If you want to read a more thorough summary, I recommend the summary on Less Wrong, also titled How to Measure Anything.

The premise of the book is that people undertake many business decisions and large projects with the idea that success of the decisions or projects can’t be measured, and thus they aren’t measured. It seems a large waste of money and effort if you can’t measure the success of such projects and decisions, and so he developed a consulting business and a framework, Applied Information Economics (AIE)to prove that you can measure such things.

Near the end of his book on page 267, he summarizes his philosophy as six main points:

1. If it’s really that important, it’s something you can define. If it’s something you think exists at all, then it’s something that you’ve already observed somehow.

2. If it’s something important and something uncertain, then you have a cost of being wrong and a chance of being wrong.

3. You can quantify your current uncertainty with calibrated estimates.

4. You can compute the value of additional information by knowing the “threshold” of the measurement where it begins to make a difference compared to your existing uncertainty.

5. Once you know what it’s worth to measure something, you can put the measurement effort in context and decide on the effort it should take.

6. Knowing just a few methods for random sampling, controlled experiments, or even just improving on the judgment of experts can lead to a significant reduction in uncertainty.

To restate those points:

  1. Define what you want to know. Consider ways that you or others have measured similar problems. What you want to know might be easier to see than you thought.
  2. It’s valuable to measure things that you aren’t certain about if they are important to be certain about.
  3. Make estimates about what you think will happen, and calibrate those estimates to understand just how uncertain you are about outcomes.
  4. Determine a level of certainty that will help you feel more confident about a decision. Additionally, determine how much information will be needed to get you there.
  5. Determine how much effort it might take to gather that information.
  6. Understand that it probably takes less effort than you think to reduce uncertainty.

The crux of the book revolves around restating measurement from “answer a specific question” to “reduce uncertainty based on what you know today”.

Measure to reduce uncertainty

Before reading this book, I thought about data analysis as a way to find an answer to a question. I’d go in with a question, I’d find data, and thanks to that data, I’d magically know the answer. However, that approach only works with specifically-defined questions and perfect data. If I want to know “how many views did a specific documentation topic get last week” I can answer that straightforwardly with website metrics.

However, if I want to know “Was the guidance about how to perform a task more useful after I rewrote it?” there was really no way to know the answer to that question. Or so I thought.

Hubbard’s book makes the crucial distinction that data doesn’t need to exist to directly answer that question. It merely needs to make you more certain of the likely answer. You can make a guess about whether or not it was useful, carefully calibrating your guess based on your knowledge of similar scenarios, and then perform data analysis or measurement to improve the accuracy of your guess. If you’re not very certain of the answer, it doesn’t take much data or measurement to make you more certain, and thus increase your confidence in an outcome. However, the more certain you are, the more measurement you need to perform to increase your certainty.

Start by decomposing the problem

If you think what you want to measure isn’t measurable, Hubbard encourages you to think again, and decompose the problem. To use my example, and #1 on his list, I want to measure whether or not a documentation topic was more useful after I rewrote it. As he points out with his first point, the problem is likely more observable than I might think at first.

“Decompose the measurement so that it can be estimated from other measurements. Some of these elements may be easier to measure and sometimes the decomposition itself will have reduced uncertainty.”

I can decompose the question that I’m trying to answer, and consider how I might measure usefulness of a topic. Maybe something is more useful if it is viewed more often, or if people are sharing the link to the topic more frequently, or if there are qualitative comments in surveys or forums that refer to it. I can think about how I might tell someone that a topic is useful, what factors of the topic and information about it I might point to. Does it come up first when you search for a specific customer question? Maybe then search rankings for relevant keywords are an observable metric that could help me measure utility of a topic.

You can also perform extra research to think of ways to measure something.

“Consider your findings from secondary research: Look at how others measured similar issues. Even if their specific findings don’t relate to your measurement problem, is there anything you can salvage from the methods they used?”

Is it business critical to measure this?

Before I invest a lot of time and energy performing measurements, I want to make sure (to Hubbard’s second point in his list) that the question I am attempting to answer, what I am trying to measure, is important enough to merit measurement. This is also tied to points four, five, and six: does the importance of the knowledge outweigh the difficulty of the measurement? It often does, especially because (to his sixth point), the measurement is often easier to obtain than it might seem at first.

Estimate what you think you’ll measure

To Hubbard’s third point, a calibrated estimate is important when you do a measurement. I need to be able to estimate what “success” might look like, and what reasonable bounds of success I might expect are.

Make estimates about what you think will happen, and calibrate those estimates to understand just how uncertain you are about outcomes.

To continue with my question about a rewritten topic’s usefulness, let’s say that I’ve determined that added page views, elevated search rankings, and link shares on social media will mean the project is a success. I’d then want to estimate what number of each of those measurements might be meaningful.

To use page views as an example for estimation, If page views increase by 1%, it might not be meaningful. But maybe 5% is a meaningful increase? I can use that as a lower bound for my estimate. I can also think about a likely upper bound. A 1000% increase would be unreasonable, but maybe I could hope that page views would double, and I’d see a 100% increase in page views! I can use that as an upper bound. By considering and dismissing the 1% and 1000% numbers, I’m also doing some calibration of my estimates—essentially gut checking them with my expertise and existing knowledge. The summary of How to Measure Anything that I linked in the first paragraph addresses calibration of estimates in more detail, as does the book itself!

After I’ve settled on a range of measurement outcomes, I can assess how confident I am that this might happen. Hubbard calls this a Confidence Interval. I might only be 60% certain that page views will increase by at least 5% but they won’t increase more than 100%. This gives me a lot of uncertainty to reduce when I start measuring page views.

One way to start reducing my uncertainty about these percentage increases might be to look at the past page views of this topic, to try to understand what regular fluctuation in page views might be over time. I can look at the past 3 months, week by week, and might discover that 5% is too low to be meaningful, and a more reasonable signifier of success would be a 10% or higher increase in page views.

Estimating gives me a number that I am attempting to reduce uncertainty about, and performing that initial historical measurement can already help me reduce some uncertainty. Now I can be 100% certain that a successful change to the topic should show more than 5% page views on a week-to-week basis, and maybe am 80% certain that a successful change would show 10% or more page views.

When doing this, keep in mind another point of Hubbards:

“a persistent misconception is that unless a measurement meets an arbitrary standard….it has no value….what really makes a measurement of high value is a lot of uncertainty combined with a high cost of being wrong.”

If you’re choosing to undertake a large-scale project that will cost quite a bit if you get it wrong, you likely want to know in advance how to measure the success of that project. This point also underscores his continued emphasis on reducing uncertainty.

For my (admittedly mild) example, it isn’t valuable for me to declare that I can’t learn anything from page view data unless  3 months have passed. I can likely reduce uncertainty enough with two weeks of data to learn something valuable, especially if my uncertainty level is in relatively low (in this example, in the 40-70% range).

Measure just enough, not a lot

Hubbard talks about the notion of a Rule of Five:

There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population.

Knowing the median value of a population can go a long way in reducing uncertainty. Even if you can only get a seemingly-tiny sample of data, this rule of five makes it clear that even that small sample can be incredibly valuable for reducing uncertainty about a likely value. You don’t have to know all of something to know something important about it.

Do something with what you’ve learned

After you perform measurements or do some data analysis and reduce your uncertainty, then it’s time to do something with what you’ve learned. Given my example, maybe my rewrite increased page views of the topic by 20%, something I’m now fairly certain is a significant degree, and it is now higher in the search results. I’ve now sufficiently reduced my uncertainty about whether or not the changes made this topic more useful, and I can now rewrite similar topics to use a similar content pattern with confidence. Or at least, more confidence than I had before.

Overall summary

My super abbreviated summary of the book would then be to do the following:

  1. Start by decomposing the problem
  2. Ask is it business critical to measure this?
  3. Estimate what you think you’ll measure
  4. Measure just enough, not a lot
  5. Do something with what you’ve learned

I recommend the book (with judicious skimming), especially if you need some conceptual discussion to help you unravel how best to measure a specific problem. As I read the book, I took numerous notes about how I might be able to measure something like support case deflection with documentation, or how to prioritize new features for product development (or documentation). I also considered how customers might better be able to identify valuable data sources for measuring security posture or other events in their data if they followed many of the practices outlined in this book.