Where search keyword data comes from

May 27, 2025

As someone who does a lot of armchair-yet-informed speculation about how various things get defined in music metadata, I found this deep dive into how Google has formed keywords and entity relationships over the years fascinating.

Drawing on her years of experience in the content industry, Deborah Carver explores where SEO keyword data comes from in CT No.112: Demystify your algorithms: The origin of keywords:

If you receive a list of target keywords or a keyword research spreadsheet, do you know how they were sourced? Where exactly does your keyword data come from? Your answer is likely “from my browser extension,” “from our audience editorial team,” “from my keyword research tool” or “from our digital marketing agency.”

But of course, those keywords had to come from somewhere else first! She details the tools that Google has offered over the years and defines the datasets that contribute to the keywords Google identifies, including the data that SEO tools add on top.

As she points out, this investment in keyword data benefits Google because it feeds the company’s primary source of revenue, advertising:

Giving away raw data and free training on its software has proved wildly profitable for Google. It’s a juggernaut business strategy in the open source tradition. The more people who learn to use keyword data, the more people understand search engine marketing, the more ads Google can sell.

Perhaps the most fascinating aspect for me, as a fellow content professional, is her clarification about the different ways that people use search keywords compared with how they use natural language:

Keyword research also helps stakeholders understand that people don’t use marketing speak in their search terms. People also don’t search with the same words they use to talk or chat or tweet. Imagine if, when suggesting dinner, your partner said, “pizza near me now.” No one talks that way.

Many people search that way—17,800 times each month across the globe, on average. Search language is its own behavior, one that’s ever-evolving.

I don’t think that “search language” as a behavior will change anytime soon, even with the rise of LLMs. Google and other search engines are adding LLMs to both the processing of search queries and the results themselves in a purported effort to make searching “easier” and “more natural”. So far, though, it seems to me that adding LLM output to search results is diluting their quality, as people realize that their trained habits for finding information no longer work the same way.

Likewise, using an LLM directly to produce information requires specific language techniques to get better results. Just as search language has evolved over time, prompt engineering is emerging as its own type of language. Whether prompt engineering is a skill that everyday people will need to pick up, as they did search language, remains to be seen.

The technology might be changing, but at the end of the day, we’re all just learning new ways to speak to machines.