Detailed data types you can use for documentation prioritization

Data analysis is a valuable way to learn more about what documentation tasks to prioritize above others. My post (and talk), Just Add Data, presented at Write the Docs Portland in 2019, talk about this broadly. In this post I want to cover in detail a number of different data types that can lead to valuable insights for prioritization.

This list of data types is long, but I promise each one contains value for a technical writer. These types of data might come from your own collection, a user research organization, the business development department, marketing organization, or product management organization:

  • User research reports
  • Support cases
  • Forum threads and questions
  • Product usage metrics
  • Search strings
  • Tags on bugs or issues
  • Education/training course content and questions
  • Customer satisfaction survey

More documentation-specific data types:

  • Documentation feedback
  • Site metrics
  • Text analysis metrics
  • Download/last accessed numbers
  • Topic type metrics
  • Topic metadata
  • Contribution data
  • Social media analytics

Many of these data types are best used in combination with others.

User research reports

User research reports can contain a lot of valuable data that you can use for documentation. 

  • Types of customers being interviewed
  • Customer use cases and problems
  • Types of studies being performed

This can give you insight into both what the company finds valuable to study (so some insight into internal priorities) but also direct customer feedback about things that are confusing or the ways that they use the product. The types of customers that are interviewed can provide valuable audience or persona-targeting information, allowing you to better calibrate the information in your documentation. See How to use data in user research when you have no web analytics on the Gov.UK site for more details about what you can do with user research data.

Support cases

Support cases can help you better understand customer problems. Specific metrics include:

  • Number of cases
  • Frequency of cases
  • Categories of questions
  • Customer environments and licenses

With these you can compile metrics about specific customer problems, the frequency of problems, and the types of customers and customer environments that are encountering specific problems, allowing you to better understand target customers, or customers that might be using your documentation more than others. Support cases are also rich data for common customer problems, providing a good way to gather new use cases and subjects for topics. 

Forum threads and questions

These can be internal forums (like Splunk Answers for Splunk) or external ones, like Reddit or StackOverflow.

  • Common questions
  • Common categories
  • Frequently unanswered questions
  • Post titles

If you’re trying to understand what people are struggling with, or get a better sense of how people are using specific functionality, forum threads can help you understand. The types of questions that people ask and how they phrase them can also help make it clear what kinds of configuration combinations might make specific functions harder for customers. Based on the question types and frequencies that you see, you might be able to fine-tune existing documentation to make it more user-centric and easily findable, or supplement content with additional specific examples. 

Product usage metrics

Some examples of product usage metrics are as follows:

  • Time in product
  • Intra-product clicks
  • Types of data ingested
  • Types of content created
  • Amount of content created

Even if you don’t have specific usage data introspecting the product, you can gather metrics about how people are interacting with the purchase and activation process, and extrapolate accordingly.

  • Number of downloads and installs
  • License activations and types
  • Daily and monthly active users

You can use this type of data to better understand how people are spending their time in your product, and what features or functionality they’re using. Even if a customer has purchased or installed the product, it’s even more valuable to find out if they’re actually using it, and if so, how.

If your product is only in beta, and you want more data to help you prioritize an overall documentation backlog, such as topics that are tied to a specific release, you can use some product usage data to understand where people are spending more of their time, and draw conclusions about what to prioritize based on that.

Maybe the under-utilized features could use more documentation, or more targeted documentation. Maybe the features themselves need work.  Be careful not to draw overly-simplistic conclusions about the data that you see from product usage metrics. Keep context in mind at all times. 

Search strings

You can gather search strings from HTTP referer data from web searches performed on external search sites such as Google or DuckDuckGo, or from internal search services. It’s pretty unlikely that you’ll be able to gather search strings from external sites given the widespread implementation of HTTPS, but internal search services can be vital and valuable data sources for this.

Look at specific search strings to find out what people are looking for, and what people are searching that’s landing them on specific documentation pages. Maybe they’re searching for something and landing on the wrong page, and you can update your topic titles to help.

JIRA or issue data

You can use metrics from your issue tracking services to better understand product quality, as well as customer confusion.

  • Number of issues/bugs
  • Categories/tags/components of issues/bugs
  • Frequency of different types of issues being created/closed

Issue tags or bug components can help you identify categories of the product where there are lots of problems or perhaps customer confusion. This is especially useful data if you’re an open source product and want to get a good understanding of where there are issues that might need more decision support or guidance in the documentation. 

Training courses

If you have an education department, or produce training courses about your product, these are quite useful to gather data from. Some examples of data you might find useful:

  • Questions asked by customers
  • Questions asked by course developers
  • Use cases covered by content in courses
  • Enrollment in courses
  • Categories of courses offered

Also useful to correlate this with other data to help identify verticals of customers interested in different topics. Because education and training courses cover more hands-on material, it can be an excellent source of use case examples, as well as occasions where decision support and guidance is needed. 

Customer surveys

Customer surveys especially cover surveys like satisfaction surveys and sentiment analysis surveys. By reviewing the qualitative statements and types of questions asked in the surveys, you can gain valuable insights and information like:

  • What do people think about the product?
  • What do people want more help with?
  • How do people think about the product?
  • How do people feel about the product?
  • What does the company want to know from customers? 
  • What are the company priorities?

This can also help you think about how the documentation you write has a real effect on peoples’ interactions with the product, and can shift sentiment in one way or another.

Documentation feedback

Direct feedback on your documentation is a vital source of data if you can get it. 

  • Qualitative comments about the documentation
  • Usefulness votes (yes/no)
  • Ratings

Even if you don’t have a direct feedback mechanism on your website, you can collect documentation feedback from internal and external customers by paying attention in conversations with people and even asking them directly if they have any documentation feedback. Qualitative comments and direct feedback can be vital for making improvements to specific areas. 

Site metrics

If your documentation is on a website, you can use web access logs to gather important site metrics, such as the following:

  • Page views
  • Session data like time on page
  • Referer data
  • Link clicks
  • Button clicks
  • Bounce rate
  • Client IP

Site metrics like page views, session data, referer data, and link clicks can help you understand where people are coming to your docs from, how long they are staying on the page, how many readers there are, and where they’re going after they get to a topic. You can also use this data to understand better how people interact with your documentation. Are readers using a version switcher on your page? Are they expanding or collapsing information sections on the page to learn more? Maybe readers are using a table of contents to skip to specific parts of specific topics.  

You can split this data by IP address to understand groups of topics that specific users are clustering around, to better understand how people use the documentation.

Text analysis metrics

Data about the actual text on your documentation site is also useful to help understand the complexity of the documentation on your site.

  • Flesch-Kincaid readability score
  • Inclusivity level
  • Length of sentences and headers
  • Style linter

You can assess the readability or usability of the documentation, or even the grade level score for the content to understand how consistent your documentation is. Identify the length of sentences and headers to see if they match best practices in the industry for writing on the web. You can even scan content against a style linter to identify inconsistencies of documentation topics against a style guide.

Download metrics

If you don’t have site metrics for your documentation site, because the documentation is published only via PDF or another medium, you can still use metrics from that. 

  • Download numbers 
  • Download dates and times
  • Download categories and types

You can use these metrics to gather interest about what people want to be reading offline, or how frequently people are accessing your documentation. You can also correlate this data with product usage data and release cycles to determine how frequently people access the documentation compared with release dates, and the number of people accessing the documentation compared with the number of people using a product or service.

Topic type metrics

If you use strict topic typing at your documentation organization, you can use topic type metrics as an additional metadata layer for documentation data analysis. Even if you don’t, you can manually categorize organize your documentation by type to gather this data.

  • What are the topic types?
  • How many topic types are there?
  • How many topics are there of each type?

Understanding topic types can help you understand how reader interaction patterns can vary for your documentation by type, or whether your developer documentation has predominantly different types of documentation compared with your user documentation, and better understand what types of documentation are written for which audiences.

Topic metadata

Metadata about documentation topics is also incredibly valuable as a correlation data source. You can correlate topic metadata like the following information:

  • What are the titles?
  • Average length of a topic?
  • Last updated and creation dates
  • Versions that different topics apply for

You can correlate it with site metrics, to see if longer topics are viewed less-frequently than shorter topics, or identify outliers in those data points. You can also manually analyze the topic titles to identify if there are patterns (good or bad) that exist.

Contribution data

If you have information about who is writing documentation, and when, you can use these types of data:

  • Last updated dates
  • Authors/contributors
  • Amount of information added or removed

Contribution data can tell you how frequently specific topics were updated to add new information, and by whom, and how much information was added or removed. You can identify frequency patterns, clusters over time, as well as consistent contributors.

It’s useful to split this data by other features, or correlate it with other metrics, especially site metrics. You can then identify things like:

  • Last updated dates by topic
  • Last updated dates by product
  • Last updated dates over time

to see if there are correlations between updates and page views. Perhaps more frequently updated content is viewed more often.

Social media analytics

  • Social media referers
  • Link clicks from social media sites

If you publicize your documentation using social media, you can track the interest in the documentation from those sites. If you’re curious about social media referers leading people to your documentation, and see whether or not people are getting to your documentation in that way. Maybe your support team is responding to people on twitter with links to your documentation, and you want to better understand how frequently that happens and how frequently people click through those links to the documentation…

You can also identify whether or not, and how, people are sharing your documentation on social media by using data crawled or retrieved from those sites’ APIs, and looking for instances of links to your documentation. This can help you get a better sense of how people are using your documentation, how they’re talking about it, how they feel about it, and whether or not you have an organic community out there on the web sharing your documentation. 

Beyond documentation data

I hope that this detail has given you a better understanding of different types of data, beyond documentation data, that are available to you as a technical writer to draw valuable conclusions from. By analyzing these types of data, you are prepared for prioritizing your documentation task list, but also better able to understand the customers of your product and documentation. Even if only some of these are available to you, I hope they are useful. Be sure to read Just Add Data: Using data to prioritize your documentation for the full explanation of how to use data in this way.