Listening to Music while Sheltering in Place

The world is, to varying degrees, sheltering in place during this global coronavirus pandemic. In March, the pandemic began to affect me personally: 

  • I started working from home on March 6th. 
  • Governor Gavin Newsom announced on March 11 that any gatherings over 250 people were strongly discouraged, effectively cancelling all concerts for the month of March. 
  • On March 16th, the mayor of San Francisco, along with officials in several other Bay Area counties, announced a shelter-in-place order. 

Ever since then, I’ve been at home. Given all these changes in my life, I was curious what new patterns I might see in my music listening habits. 

I went to my last concert on March 7th, just before large gatherings were prohibited. With events increasingly cancelled nationwide, and touring musicians postponing and cancelling shows, Beatport hosted the first livestream festival, “ReConnect. A Global Music Series”, on March 27th. Many more followed. 

Industry-wide studies and data analyses have attempted to unpack the pandemic’s influence on the music industry. Analytics startup Chartmetric is digging into genre-based listening and geographical listening habits, while Billboard and Nielsen are conducting a periodic entertainment tracker survey.

Because I’m me, and I have so much data about my music listening patterns, I wanted to explore what trends might be emerging in my personal habits. I analyzed March, April, and May of 2020, and in some cases compared that period against the same period in 2019, 2018, and 2017. The screenshots of data visualizations in this blog post reflect data through May 15th, so the analysis and comparison are incomplete, given that May 2020 is not yet over. 

Looking at my listening habits during this time period, with key dates highlighted, it’s clear that the very beginning of the crisis didn’t have much of an effect on my listening behavior. After the shelter-in-place order, however, the amount of time I spent listening to music increased. Since that increase, it’s remained fairly steady.

Screenshot of an area chart depicting listening duration: ranging from 100 minutes, with a couple of spikes to 500 minutes, but hovering around a max of 250 minutes per day for much of January and February; then, starting in March, a new range of about 250 to 450 minutes per day, with a couple of outliers of nearly 700 minutes of listening activity, and a couple of outliers with only 90 minutes of listening activity.

Key dates such as the first case in the United States, the first case in California, and the first case in the Bay Area are highlighted along with other pandemic-relevant dates.

Listening behavior during March, April, and May over time

When I started my analysis, I looked at my basic listening count from traditional music listening sources. I use Last.fm to scrobble my listening behavior in iTunes, Spotify, and the web from sites like YouTube, SoundCloud, Bandcamp, Hype Machine, and more. 

Chart depicting 2700 total listens for 2017, 2000 total listens for 2018, and 2300 total listens for 2019 during March, April, and May, compared to 3000 total listens in that same period in 2020.

If you just look at 2018 through 2020, it seems like my listening habits are trending upward, culminating in 2020. But compared against 2017, it isn’t much of a difference. I listened to 25% fewer tracks in 2018 than in 2017, 19% more tracks in 2019 than in 2018, and 25% more tracks in 2020 than in 2019. 
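Those year-over-year figures are simple percentage changes. A minimal sketch, using the approximate totals read off the chart; because the chart values are rounded, the results won’t exactly match the percentages quoted above, which come from the underlying data:

```python
def pct_change(prev: float, curr: float) -> float:
    """Change from prev to curr, expressed as a percentage of prev."""
    return (curr - prev) / prev * 100

# Approximate March-May listen totals read off the chart (rounded):
listens = {2017: 2700, 2018: 2000, 2019: 2300, 2020: 3000}
for year in (2018, 2019, 2020):
    print(year, round(pct_change(listens[year - 1], listens[year])))
```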

Chart depicting total weekday listens versus total weekend listens during March, April, and May of 2017 through 2020: roughly 2400 weekday listens vs 200 weekend listens in 2017, 2000 vs 100 in 2018, 2100 vs 300 in 2019, and 2500 vs 200 in 2020.

If I break that down by when I was listening by comparing my weekend and weekday listening habits from the previous 3 years to now, there’s still perhaps a bit of an increase, but nothing much. 

With just the data points from Last.fm, there aren’t really any notable patterns. But number of tracks listened to on Spotify, SoundCloud, YouTube, or iTunes provides an incomplete perspective of my listening habits. If I expand the data I’m analyzing to include other types of listening—concerts attended and livestreams watched—and change the data point that I’m analyzing to the amount of time that I spend listening, instead of the number of tracks that I’ve listened to, it gets a bit more interesting. 

Chart shows roughly 12000 minutes spent listening in 2017, 10000 in 2018, 12000 in 2019, and 22000 in 2020.

While the number of tracks I listened to from 2019 to 2020 increased only 25%, the amount of time I spent listening to music increased by 74%, a full 150 hours more than the previous year during this time period. And May isn’t even over yet! 

It’s worth briefly noting that I’m estimating, rather than directly calculating, the amount of time spent listening to music tracks and attending live music events. To make this calculation, I’m using an estimate of 3 hours for each concert attended, 4 hours for each DJ set attended, 8 hours for each festival attended, and an estimate of 4 minutes for each track listened to, based on the average of all the tracks I’ve purchased over the past two years. Livestreamed sets are easier to track, but some of those are estimates as well because I didn’t start keeping track until the end of April.
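The estimate described above amounts to a weighted sum of counts. A minimal sketch, using the per-event durations stated in the text; the example counts are hypothetical placeholders, not my actual numbers:

```python
# Estimated minutes per unit, as described above.
MINUTES_PER = {
    "track": 4,       # average length of tracks purchased over the past two years
    "concert": 180,   # ~3 hours per concert
    "dj_set": 240,    # ~4 hours per DJ set
    "festival": 480,  # ~8 hours per festival
}

def estimate_listening_minutes(counts):
    """Convert counts of tracks and events into an estimated total of minutes."""
    return sum(MINUTES_PER[kind] * n for kind, n in counts.items())

# Hypothetical counts for a three-month period:
print(estimate_listening_minutes({"track": 3000, "concert": 1}))  # 12180
```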

I spent an extra 150 hours listening to music this year during this time—but when was I spending this time listening? If I break down the amount of time I spent listening by weekend compared with weekdays, it’s obvious:

Chart depicts 10000 weekday minutes and 5000 weekend minutes spent listening in 2017, 9500 weekday minutes and 4500 weekend minutes in 2018, 14000 weekday minutes and 2000 weekend minutes in 2019, and 12000 weekday minutes and 13000 weekend minutes in 2020

Before shelter-in-place, I’d spend most of my weekends outside, hanging out with friends, or attending concerts, DJ sets, and the occasional day party. Now that I’m spending my weekends largely inside and at home, coupled with the number of livestreaming festivals, I’m spending much more of that time listening to music. 

I was curious if perhaps working from home might reveal new weekday listening habits too, but the pattern remains fairly consistent. I also haven’t worked from home for an extended period before, so I don’t have a baseline to compare it with. 

It’s clear that weekends are when I’m doing most of my new listening, and that this new listening likely isn’t coming from my traditional listening habits. If I split the amount of time that I spend listening to music by the type of listening that I’m doing, the source of the added time spent listening is clear.

Depicts 11000 minutes of track listens and 1000 minutes of time spent at concerts in 2017, 8000 minutes spent listening to music tracks and 2000 minutes spent at concerts in 2018, 10000 minutes spent listening to music tracks and 3000 minutes spent at concerts in 2019, and 12000 minutes spent listening to music tracks and 9000 minutes listening to livestreams, with a sliver of 120 minutes spent at a single concert in 2020

Hello, livestreams. If you look closely you can also spy the sliver of a concert that I attended on March 7th.

Livestreams dominate, and so does Shazam

All of the livestreams I’ve been watching have primarily been DJ sets. Ordinarily, when I’m at a DJ set, I spend a good amount of time Shazamming the tracks I’m hearing. I want to identify the tracks that I’m enjoying so much on the dancefloor so I can track them down, buy them, and dig into the back catalog of those artists. 

So I requested my Shazam data to see what’s happening now that I’m home, with unlimited, shameless, and convenient access to Shazam.

For the time period that I have Shazam data for, the ratio of Shazam activity to the number of livestreams watched is fairly consistent, at roughly 10 successful Shazams per livestream. 

Chart details largely duplicated in surrounding text, but of note is a spike of 6 livestreams with only 30 or so songs shazammed, while the next few weeks show a fairly tight interlock of shazam activity with number of livestreams

Given the correlation in the Shazam data, as well as my continued focus on watching DJ sets, I wanted to explore my artist discovery statistics as well. Especially since my overall listening activity hadn’t seemed to shift much, I was betting that my artist discovery statistics had been increasing during this time. If I look at just the past few years, there seems to be a direct increase during this time period. 

Chart depicts 260ish artists discovered in March, April, and May of 2018, 280 discovered in 2019, and 360 discovered in 2020. A second chart shows the same data but adds 2017, with 390 artists discovered.

However, after I add 2017 into the list as well, the pattern doesn’t seem like much of a pattern at all. Perhaps by the end of May, there will be a correlation or an outsized increase. But at least for now, the added number of livestreams I’ve been watching doesn’t seem to be producing an equivalently high number of artist discoveries, even though discoveries are elevated compared with the last two years. 

It could also be that the artists I’m discovering in the livestreams haven’t yet had a substantial effect on my non-livestream listening patterns, even if there are 91 hours of music (and counting) in my quarandjed playlist, where I store the tracks that catch my ear in a quarantine DJ set. Adding music to a playlist, of course, is not the same thing as listening to it. 

Livestreaming as concert replacement?

Shelter-in-place brought with it a slew of event cancellations and postponements, and my live events calendar was severely affected. As of now, 15 concerts have been affected in the following ways:

Chart depicts 6 concerts cancelled and 9 postponed

The amount of time that I spend at concerts compared with watching livestreams is also starkly different.

Chart depicts 1000 minutes spent at concerts in 2017, 2000 minutes at concerts in 2018, 2500 minutes at concerts in 2019, and 8000 minutes spent watching livestreams, with a topper of 120 minutes at a concert in 2020

I’ve spent 151 hours (and counting) watching livestreams, the rough equivalent of 50 concerts—my entire concert attendance of last year. This is almost certainly because I’m often listening to livestreams, rather than watching them happen.

Concerts require dedication—a period of time where you can’t really do anything else, a monetary investment, and travel to and from the show. Livestreams don’t have any of that, save a voluntary donation. That makes it easier to turn on a stream while I’m doing other things. While listening to a livestream, I often avoid engaging with the streaming experience. Unless the chat is a cozy few hundred folks at most, it’s a tire fire of trolls and not a pleasant experience. That, coupled with the fact that sitting on my couch watching a screen is inherently less engaging than standing in a club with music and people surrounding me, means that I’m often multitasking while livestreams are happening.

The attraction for me is that these streams are live, and they’re an event to tune into; if you don’t, you might miss it. Because it’s live, you have the opportunity to create a shared collective experience. The chatrooms that accompany live video streams on YouTube, Twitch, and especially Facebook’s Watch Party feature for Facebook Live videos are what foster this shared experience. For me, it’s about that experience, so much so that I started a chat thread for Jamie xx’s 2020 Essential Mix so that my friends and I could experience and react to the set live. This personal experience runs contrary to the conclusion drawn in a Hypebot article by Bobby Owsinski, Our Music Consumption Habits Are Changing, But Will They Remain That Way?: “Given the choice, people would rather watch something than just listen.” Given the choice, I’d rather have a shared collective experience with music than just sit alone on my couch and listen to it. 

Of course, with shelter-in-place, I haven’t been given a choice between attending concerts and watching livestreamed shows. It’s clear that without a choice, I’ll take whatever approximation of live music I can find.

 

What it takes to get to a concert

Ticket buying in the modern era is pretty brutal. You find out your favorite artist is coming to town, and with any luck, you discover this before the tickets go on sale. Then you start planning to get tickets: you set up a calendar reminder with a link to the site, and you get ready. If there are presales, you ask friends or check emails — if you’re a dedicated concertgoer, you probably get emails from the promoters, venues, and maybe even your favorite artists’ fan clubs — to track down the codes.

Then you get ready, mouse pointer cued up at 9:59, waiting for tickets to go on sale. The time flips, it’s 10:00 AM, and you click! You’re prepared to quickly select 2, best available (or GA floor, because who wants a balcony seat), and add to cart. But wait! You see the dreaded message: you’re in a queue. Now all you can do is desperately stare at the webpage, hoping nothing changes. What if a browser extension interferes? What if your browser freezes up? Finally, you’re out of the queue. You go to select your tickets, but wait. GA is all gone. All that’s left is the seated Loge. For a band that you dance to. Or worse, it’s already sold out. All that time, all that anxiety, all that preparation, only to get shut down. 

And that’s just the presale. You’ll do the whole thing over again at the next presale, or during the general onsale, hoping that the artist and the venue were strategic enough to set some tickets aside for each sale. If it comes down to it, you might have to show up to the venue an hour early (or more) before the show starts to get one of the limited tickets available at the door. 

That’s everything a dedicated concertgoer goes through to get concert tickets. Thing is, according to Kaitlyn Tiffany in The Atlantic, that’s also what modern ticket scalpers do.

This week was a brutal one for ticket sales for me and my friends. A show at a 2000+ capacity venue sold out within a few minutes during the presale, and a second show added later also sold out within minutes. The Format announced their first live dates in years: 2 shows each in NYC and Chicago, and 1 show in Phoenix. The presale tickets for all the shows sold out within a minute; the Phoenix sale was plagued by ticket website issues but still managed to sell out by the end of the day. By the time the general ticket sales happened, they’d announced an additional show in each city. The general sales also sold out within minutes, and Phoenix ended up with a third show before the day was up. 

How does it happen? And why do we put ourselves through this?! 

It’s important to note that buying concert tickets at all is a privilege. Some people (like me) make it a lifestyle to go to concerts and DJ sets. Others save their money and spend big to get great seats to see favorite artists in arena shows. But it takes money, time, and a bit of luck (or planning) to get tickets and get to a show. 

Whether or not you manage to get tickets to a show depends on several factors: 

  • Did you hear about the show before the tickets went on sale?
  • Did you have enough money at the time tickets went on sale (and in general) to afford the tickets?
  • Is your work schedule stable enough to know that you can go to the show if you buy tickets immediately when they go on sale?

If any one of these factors doesn’t work out, then you don’t have tickets to the show. Whether or not you get the opportunity to see an artist perform in concert at all is up to a whole other set of factors, subject to the careful strategies of the music industry combined with the artistic whims of the performers. 

If an artist doesn’t have a big enough fanbase in your city, or if your city isn’t geographically convenient and doesn’t have suitable music venues, the artist probably won’t stop there. Even if they do stop, the venue size can play a crucial role in whether or not you’ll get tickets to the show—will they be available, and will you even want them? 

Artists, especially after they’ve “gotten big”, can crave smaller, more intimate shows. But those are the shows that tend to sell out in a minute—especially if the fanbase in a certain city is larger than anticipated, or if the artist is only playing a limited number of shows and ends up drawing people from outside the ordinary reach of a venue.

Other times, artists can analyze the size of their fanbase in a city and then choose a venue—without considering whether the venue size is appropriate for their type of music. Bon Iver played 20,000+ seat arenas on their last tour, even though they’re famous for their intimate music and have YouTube videos with hundreds of thousands of views of Justin Vernon playing to just 1 fan. Even if an artist’s fanbase is large enough to fill an arena, the fans still might not want to buy tickets to see them in one. 

Beyond those considerations, artists can’t always play the venues they want to play due to promoter restrictions or other industry partnerships, sometimes leading to uncharacteristic bookings at oddly-sized or oddly-shaped venues: DJs playing a concert hall, rock bands in a semi-seated venue, or possibly even skipping a city entirely. 

The venue an artist chooses (or is forced to choose) can be a key factor when you’re deciding if you want to get tickets. But the artist (and their tour manager, and others) have still more to do before this concert happens. 

Then the ticket prices have to be set. Venues and promoters have set costs and prices that end up as effective ticket minimums for many shows, but artists certainly have a level of influence as well. High-profile artists like Taylor Swift have chosen on past tours to make affordable tickets available to their fans.

And therein lies the rub: artists can price competitively, or highly, knowing they can charge a certain price and still sell out their show (or nearly sell it out). But they can also price affordably, hoping that legitimate fans will be able to snap up tickets when they go on sale, rather than delaying their purchase and being forced to buy from scalpers. 

OK so we’re still trying to buy these concert tickets. You’ve heard about the show, the artist has booked the venue and priced their tickets, you’ve got the money, you’ve got the time, you are ready at 10am on a Friday (or a Wednesday or a Thursday for those sweet sweet presale tickets). Where are you buying your tickets?

Ticket sites range from the homegrown (see: Bottom of the Hill) to the new kids on the block (Big Neon, Tixr), the budding behemoths (Eventbrite, AXS, Etix), and the (despised) old guard (TicketWeb/Ticketmaster/LiveNation). If you’re rushing to buy online tickets, you also need to prepare for the site experience. 

If it isn’t a site you’ve used before, you might want to consider whether it requires an account to buy tickets. If it does, you have to make one and make sure you’re signed in before you try to buy the tickets. You also want to consider whether the show is big enough that you’ll end up in a queue to buy the tickets, and whether the site is reliable enough to handle the load of a lot of people trying to buy tickets without crashing or throwing an error. 

Beyond site reliability, you have to consider your personal threshold for every ticket-buyer’s worst nightmare: fees. Almost every ticket purchase includes fees. How high do the fees need to be before you abandon your ticket purchase entirely? 

The irony of paying ticket fees is that most fans (myself included) dislike them because for so long the fees were hidden: last-minute additions to your total, spiking the cost of $35 tickets to $60 at times. But it can be argued that transparently-disclosed fees are acceptable, and even necessary, to provide a resilient, secure, reliable ticketing site—as well as to pay the promoters working hard to make sure your favorite band actually stops in your city.

Artists, promoters, venues, and ticketing sites do a lot to try to prevent ticket scalpers from bombing the market and selling out a show in minutes, only to relist the tickets minutes later at unbelievable prices: innovations in ticket technology, new marketplaces, and just plain making it harder to get tickets.

What makes a ticket purchaser legitimate? Probably some pattern of purchasing tickets in a specific geographic region and in clusters of genres, likely combined with some fraud analysis. I wonder how suspicious my own ticket purchasing habits must look to the algorithms at times. As long as we’re attempting to define what a legitimate ticket purchaser looks like, we can consider who deserves the presale codes for shows.

There’s a notion that only “real fans” deserve first access to presale codes and tickets. But how do you verify and validate true fans? You could use specific digital consumption patterns, such as those that are probably used to give out Spotify presale codes, but those are limited to only those listening habits that are directly observable in digital data. Artists want people to buy tickets to their shows—that’s why often, presale codes are straightforward to track down.

Most often, getting tickets to a show is a matter of knowing the right people at the right time, people who might have information you don’t. Songkick is there to fill in the gaps, alongside emails and texts from promoters and venues. But ultimately, nothing beats having a community of fans. And that was the thing that fascinated me about the article in The Atlantic about modern ticket scalpers: my friends and I use many of the same tactics to buy tickets. It’s a privilege and a challenge to get the tickets we want, but we love going to concerts. And often, it feels like the only way these days we can help artists make money. 

Unbiased data analysis with the data-to-everything platform: unpacking the Splunk rebrand in an era of ethical data concerns

Splunk software provides powerful data collection, analysis, and reporting functionality. The new slogan, “data is for doing”, alongside taglines like “the data-to-everything platform” and “turn data into answers”, aims to bring the company to the forefront of data powerhouses, where it rightly belongs (I’m biased; I work for Splunk).

There is nuance in those phrases that can’t be adequately expressed in marketing materials, but that is crucial for doing ethical and unbiased data analysis, ultimately helping you find better answers with your data and do even better things with it.

Start with the question

If you start attempting to analyze data without an understanding of a question you’re trying to answer, you’re going to have a bad time. This is something I really appreciate about moving away from the slogan “listen to your data” (even though I love a good music pun). Listening to your data implies that you should start with the data, when in fact you should start with what you want to know and why you want to know it. You start with a question.

Data analysis starts with a question, and because I’m me, I want to answer a fairly complex question: what kind of music do I like to listen to? This overall question, also called an objective function in data science, can direct my data analysis. But first, I want to evaluate my question. If I’m going to turn my data into doing, I want to consider the ethics and the bias of my question.

Consider what you want to know, and why you want to know it so that you can consider the ethics of the question. 

  • Is this question ethical to ask? 
  • Is it ethical to use data to answer it? 
  • Could you ask a different question that would be more ethical and still help you find useful, actionable answers? 
  • Does your question contain inherent bias? 
  • How might the biases in your question affect the results of your data analysis? 

Questions like “How can we identify fans of this artist so that we can charge them more money for tickets?” or “What’s the highest fee that we can add to tickets where people will still buy the tickets?” could be good for business, or help increase profits, but they’re unethical. You’d be using data to take actions that are unfair, unequal, and unethical. Just because Splunk software can help you bring data to everything doesn’t mean that you should. 

Break down the question into answerable pieces

If I’ve decided that it’s ethical to use data to help answer my question, then it’s time to consider how I’ll perform my data analysis. Before I try to answer the question, I want to consider the following:

  • Is this question small enough to answer with data?
  • What data do I need to help me answer this question?
  • How much data do I need to help me answer this question?

I can turn data into answers, but I have to be careful about the answers that I look for. If I don’t consider the small questions that make up the big question, I might end up with biased answers. (For more on this, see my .conf17 talk with Celeste Tretto).

So if I consider “What kind of music do I like to listen to?”, I might recognize right away that the question is too broad. There are many things that could change the answer to that question. I’ll want to consider how my subjective preferences (what I like listening to) might change depending on what I’m doing at the time: commuting, working out, writing technical documentation, or hanging out on the couch. I need to break the question down further. 

A list of questions that might help me answer my overall question could be: 

  • What music do I listen to while I’m working? When am I usually working?
  • What music do I listen to while I’m commuting? When am I usually commuting?
  • What music do I listen to when I’m relaxing? When am I usually relaxing?
  • What are some characteristics of the music that I listen to?
  • What music do I listen to more frequently than other music?
  • What music have I purchased or added to a library? 
  • What information about my music taste isn’t captured in data?
  • Do I like all the music that I listen to?

As I’m breaking down the larger question of “What kind of music do I like to listen to?”, the most important question I can ask is “What kind of music do I think I like to listen to?”. This question matters because data analysis isn’t as simple as turning data into answers. That can make for catchy marketing, but the nuance here lies in using the data you have to reduce uncertainty about what you think the answer might be. The book How to Measure Anything by Douglas Hubbard covers this concept of data analysis as uncertainty reduction in great detail, but essentially the crux is that for a sufficiently valuable and complex question, there is no single objective answer (or else we would’ve found it already!). 

So I must consider, right at the start, what I think the answer (or answers) to my overall question might be. Since I want to know what kind of music I like, I therefore want to ask myself what kind of music I think I might like. Because “liking” and “kind of music” are subjective characteristics, there can be no single true answer that is objective truth. Very few, if any, complex questions have objectively true answers, especially those that can be found in data. 

So I can’t turn data into answers for my overall question, “What kind of music do I like?” but I can turn it into answers for more simple questions that are rooted in fact. The questions I listed earlier are much easier to answer with data, with relative certainty, because I broke up the complex, somewhat subjective question into many objective questions. 

Consider the data you have

After you have your questions, look for the answers! Consider the data that you have, and whether or not it is sufficient and appropriate to answer the questions. 

The flexibility of Splunk software means that you don’t have to consider the questions you’ll ask of the data before you ingest it. Structured or unstructured, you can ask questions of your data, but you might have to work harder to fully understand the context of the data to accurately interpret it. 

Before you analyze and interpret the data, you’ll want to gather context about the data, like:

  • Is the dataset complete? If not, what data is missing?
  • Is the data correct? If not, in what ways could it be biased or inaccurate?
  • Is the data similar to other datasets you’re using? If not, how is it different?

This additional metadata (data about your datasets) can provide crucial context necessary to accurately analyze and interpret data in an unbiased way. For example, if I know there is data missing in my analysis, I need to consider how to account for that missing data. I can add additional (relevant and useful) data, or I can acknowledge how the missing data might or might not affect the answers I get.
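For example, one quick way to surface missing data is to diff the dates that appear in a listening dataset against the full date range it’s supposed to cover. A minimal sketch; the scrobble dates here are made up for illustration:

```python
from datetime import date, timedelta

def missing_days(observed, start, end):
    """Return the days in [start, end] with no records at all: candidate gaps."""
    span = {start + timedelta(days=i) for i in range((end - start).days + 1)}
    return sorted(span - set(observed))

# Hypothetical scrobble dates with a two-day gap:
scrobbled = [date(2020, 3, 1), date(2020, 3, 2), date(2020, 3, 5)]
print(missing_days(scrobbled, date(2020, 3, 1), date(2020, 3, 5)))
# [datetime.date(2020, 3, 3), datetime.date(2020, 3, 4)]
```

A gap found this way might mean I wasn’t listening, or that scrobbling silently failed — distinguishing the two is exactly the kind of context-gathering described above.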

After gathering context about your datasets, you’ll also want to consider if the data is appropriate to answer the question(s) that you want to answer. 

In my case, I’ll want to assess the following aspects of the datasets: 

  • Is using the audio features API data from Spotify the best way to identify characteristics in music I listen to? 
  • Could another dataset be better? 
  • Should I make my own dataset? 
  • Does the data available to me align with what matters for my data analysis? 

You can see a small example of how journalist Matt Daniels of The Pudding considered what data was relevant to answering the question “How popular is male falsetto?” for the Vox YouTube series Earworm, starting at 1:45 in this clip. For about 90 seconds, Matt and the host of the show, Estelle Caswell, discuss the process of selecting the right data to answer their question, eventually choosing a smaller, but more relevant, dataset. 

Is more data always better? 

Data is valuable when it’s in context and applied with consideration for the problem that I’m trying to solve. Collecting data about my schedule may seem overly intrusive or irrelevant, but applied to the broader question of “what kind of music do I like to listen to?”, it can add valuable insights and possibly shift the overall answer, because I’ve applied that additional data with consideration for the question that I’m trying to answer.

Splunk published a white paper to accompany the rebranding, and it contains some excellent points. One of them that I want to explore further is the question:

“how complete, how smart, are these decisions if you’re ignoring vast swaths of your data?” 

On the one hand, having more data available can be valuable. I am able to get a more valuable answer to “what kind of music do I like” because I’m able to consider additional, seemingly irrelevant data about how I spend my time while I’m listening to music. However, there are many times when you want to ignore vast swaths of your data. 

The most important aspect to consider when adding data to your analysis is not quantity, but quality. Rather than focusing on how much data you might be ignoring, I’d suggest focusing on which data you might be ignoring, for which questions, and affecting which answers. You might have a lot of ignored data, but put your focus on the small amount of it that can make a big difference in the answers you find.

As the academics in “I got more data, my model is more refined, but my estimator is getting worse! Am I just dumb?” make clear with their crucial finding:

“More data lead to better conclusions only when we know how to take advantage of their information. In other words, size does matter, but only if it is used appropriately.”

The most important aspect of adding data to an analysis is exactly as the academics point out: it’s only more helpful if you know what to do with it. If you aren’t sure how to use additional data you have access to, it can distract you from what you’re trying to answer, or even make it harder to find useful answers because of the scale of the data you’re attempting to analyze. 

Douglas Hubbard in the book How to Measure Anything makes the case that doing data analysis is not about gathering the most data possible to produce the best answer possible. Instead, it’s about measuring to reduce uncertainty in the possible answers and measuring only what you need to know to make a better decision (based on the results of your data analysis). As a result, such a focused analysis often doesn’t require large amounts of data — rough calculations and small samples of data are often enough. More data might lead to greater precision in your answer, but it’s a tradeoff between time, effort, cost, and precision. (I also blogged about the high-level concepts in the book).

If I want to answer my question “What kind of music do I like to listen to?” I don’t need the listening data of every user on the Last.fm service, nor do I need metadata for songs I’ve never heard to help me identify song characteristics I might like. Because I want to answer a specific question, it’s important that I identify the specific data that I need to answer it—restricted by affected user, existence in another dataset, time range, type, or whatever else.

If you want more evidence, the notion that more data is always better is also neatly upended by the Nielsen Norman Group in Why You Only Need to Test with 5 Users and the follow-up How Many Test Users in a Usability Study?.

Keep context alongside the data

Indeed, the white paper talks about bringing people to a world where they can take action without worrying about where their data is, or where it comes from. But it’s important to still consider where the data comes from, even if you no longer have to worry about it because you use Splunk software. Keeping context about the data alongside the data is an essential part of data analysis.

For example, it’s important for me to keep track of the fact that the song characteristics I might use to identify the type of music I like come from a dataset crafted by Spotify, or that my listening behavior is tracked by the service Last.fm. Last.fm can only track certain types of listening behavior on certain devices, and Spotify has their own biases in creating a set of audio characteristics.

If I lose track of this seemingly-mundane context when analyzing my data, I can potentially incorrectly interpret my data and/or draw inaccurate conclusions about what kind of music I like to listen to, based purely on the limitations of the data available to me. If I don’t know where my data is coming from, or what it represents, then it’s easy to find biased answers to questions, even though I’m using data to answer them.

If you have more data than you need, this also makes keeping context close to your data more difficult. The more data, the more room for error when trying to track contextual meaning. Splunk software includes metadata fields for data that can help you keep some context with the data, such as where it came from, but other types of context you’d need to track yourself.

More data can not only complicate your analysis, but it can also create security and privacy concerns if you keep a lot of data around for longer than you need it. If I want to know what kind of music I like to listen to, I might be comfortable doing data analysis to answer that question, identifying the characteristics of music that I like, and then removing all of the raw data that led me to that conclusion out of privacy or security concerns. Or I could drop the metadata for all songs that I’ve ever listened to, and keep only the metadata for some songs. I’d want to consider, again, how much data I really need to keep around. 

Turn data into answers—mostly

So I’ve broken down my overall question into smaller, more answerable questions, I’ve considered the data I have, and I’ve kept the context alongside the data I have. Now I can finally turn it into answers, just like I was promised!

It turns out I can take a corpus of my personal listening data and combine it with a dataset of my personal music libraries to weight the songs in the listening dataset. I can also assess the frequency of listens to further weight the songs in my analysis and formulate a ranking of songs in order of how much I like them. I’d probably also want to split that ranking by what I was doing while I was listening to the music, to eliminate outliers from the dataset that might bias the results. All the small questions that feed into the overall question are coming to life.
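The weighting steps above can be sketched in a few lines of Python. This is a minimal illustration rather than my actual analysis: the song names are placeholders, and the 2x boost for library membership is an assumed weight, not a formula from my real pipeline.

```python
from collections import Counter

def rank_songs(listens, library):
    """Rank songs by listen count, boosting songs that also live in my
    personal music library (the 2x boost is an assumed weight)."""
    counts = Counter(song for song, _activity in listens)
    scores = {
        song: count * (2.0 if song in library else 1.0)
        for song, count in counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Placeholder (song, activity) pairs; the activity field is what I'd
# use to split the ranking and drop outliers, omitted here for brevity
listens = [
    ("Song A", "working"), ("Song B", "workout"), ("Song B", "workout"),
    ("Song B", "commute"), ("Song C", "commute"),
]
library = {"Song A"}
print(rank_songs(listens, library))  # ['Song B', 'Song A', 'Song C']
```

Splitting by activity before ranking would just mean filtering the `listens` list on the activity field first, once per activity of interest.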

After I have that ranking, I could use additional metadata from another source, such as the Spotify audio features API, to identify the characteristics of the top-ranked songs, and ostensibly then be able to answer my overall question: what kind of music do I like to listen to?
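A sketch of that last step, assuming responses shaped like the Spotify audio features API (which returns fields such as danceability, energy, valence, and tempo). The sample values here are made up, standing in for real API responses:

```python
import statistics

def summarize_features(features):
    """Average audio features across my top-ranked tracks to describe
    the kind of music I like."""
    keys = ("danceability", "energy", "valence", "tempo")
    return {k: round(statistics.mean(f[k] for f in features), 3) for k in keys}

# Made-up values shaped like Spotify audio features API responses
sample = [
    {"danceability": 0.71, "energy": 0.80, "valence": 0.45, "tempo": 124.0},
    {"danceability": 0.65, "energy": 0.62, "valence": 0.55, "tempo": 118.0},
]
print(summarize_features(sample))
# {'danceability': 0.68, 'energy': 0.71, 'valence': 0.5, 'tempo': 121.0}
```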

By following all these steps, I turned my data into answers! And now I can turn my data into doing, by taking action on those characteristics. I can of course seek out new music based on those characteristics, but I can also book the ideal DJs for my birthday party, create or join a community of music lovers with similar taste in music, or even delete any music from my library that doesn’t match those characteristics. Maybe the only action I would take is self-reflection, and see if what the data has “told” me is in line with what I think is true about myself.

It is possible to turn data into answers, and turn data into doing, with caution and attention to all the ways that bias can be introduced into the data analysis process. But there’s still one more way that data analysis could result in biased outcomes: communicating results. 

Carefully communicate data findings

After I find the answers in my data, I need to carefully communicate them to avoid bias. If I want to tell all my friends that I figured out what kind of music I like to listen to, I want to make sure that I’m telling them that carefully so that they can take the appropriate and ethical action in response to what I tell them. 

I’ll want to present the answers in context. I need to describe the findings with the relevant qualifiers: I like music with these specific characteristics, and when I say I like this music I mean this is the kind of music that I listen to while doing things I enjoy, like working out, writing, or sitting on my couch. 

I also need to make clear what kind of action might be appropriate or ethical to take in reaction to this information. Maybe I want to find more music that has these characteristics, or I’d like to expand my taste, or I want to see some live shows and DJ sets that would feature music that has these characteristics. Actions that support those ends would be appropriate, but can also risk being unethical. What if someone learns of these characteristics, and chooses to then charge me more money than other people (whose taste in music is unknown) to see specific DJ sets or concerts featuring music with those characteristics? 

Data, per the white paper, “must be brought not only to every action and decision, but to every department.” Because of that, it’s important to consider how that happens. Share relevant parts of the process that led to the answers you found from the data. Communicate the results in a way that can be easily understood by your audience. This Medium post by Cecelia Shao, a product manager at Comet.ml, covers important points about how to communicate the results of data analysis. 

Use data for good

I wanted to talk through the data analysis process in the context of the rebranded slogans and marketing content so that I could unpack additional nuance that marketing content can’t convey. I know how easy it is to introduce bias into data analysis, and how easily data analysis can be applied to unethical questions, or used to take unethical actions.

As the white paper aptly points out, the value of data is not merely in having it, but in how you use it to create positive outcomes. You need to be sure you’re using data safely and intelligently, because with great access to data comes great responsibility. 

Go forth and use the data-to-everything platform to turn data into doing…the right thing. 

Disclosure: I work for Splunk. Thanks to my colleagues Chris Gales, Erica Chen, and Richard Brewer-Hay for the feedback on drafts of this post. While colleagues reviewed this post and provided feedback, the content is my own and represents my own views rather than those of Splunk the company. 

Streaming, the cloud, and music interactions: are libraries a thing of the past?

Several years ago I wrote about fragmented music libraries and music discovery. In light of the overwhelming popularity of Spotify and the dominance of streaming music (Spotify, Apple Music, Amazon Music, Tidal, and others), I’m curious if music libraries even exist anymore. Or, if they exist today, will they continue to exist? 

My guess is that the only people still maintaining music libraries are DJs, fervent music fans (like myself), or people that aren’t using streaming music at all (due to age, lack of interest, or lack of availability due to markets or internet speeds). 

I was chatting with a friend of mine that has a collection of vinyl records, but she only ever listens to vinyl if she’s relaxing on the weekend. Oftentimes she’s just asking Alexa to play some music, without much attention to where that music is coming from. With Amazon Music bundled into Amazon Prime for many members, people can be totally unaware that they’re using a streaming service at all. I’d hazard that this interaction pattern is true for most people, especially those that never enjoyed maintaining a music library but instead collected CDs and records because that was the only way to be able to listen to music at all. 

Even my own habits are changing, perhaps equally due to time constraints as due to current music technology services. I used to carefully curate playlists for sharing with others, listening in the car, mix CDs, and for radio shows. These days I make playlists for many of those same purposes on Spotify, but the songs in my “actual” music library (iTunes) aren’t categorized into playlists at all anymore, and I give the playlists I make on my iPhone random names like “Aaa yay” to make the playlists easier to find, rather than to describe the contents. 

I’m limited by storage size in terms of what I can add to my iPhone, just like I was with my iPod, but that shapes my experience of the music. Since I’m limited to a smaller catalogue, I’m able to sit with the music more and create more distinct memories. There are still songs that remind me of being in Berlin in 2011, limited to the songs that I added to my iPod before I left the United States because the internet I had access to in Germany was too slow to download new music and add it to my iPod. 

Nowadays, I am less motivated to carefully manage my iTunes library because it’s only on one device, whereas I can access my Spotify library across multiple devices. That’s the one I find myself carefully creating folders of playlists for, organizing and sorting tracks and playlists. A primary reason for the success of Spotify for my listening habits is the social and collaborative nature of it. It’s easy to share tracks with others, make a playlist for a DJ set that I went to to share with others, contribute to a weekly collaborative playlist with a community of fellow music-lovers, or to follow playlists created by artists and DJs I love. My local library can give me a lot, but it can’t give me that community interaction.

Indeed, in 2015 that’s something I identified as lacking. I felt that it was harder to feel part of a music culture, writing:

“It’s harder than it used to be to feel connected with music. It’s not a stream or a subculture one is tapped into anymore, because it’s so distributed on the web. There’s so much music, and it lives in so many different services, that the music culture has imploded a bit.”

I feel completely differently these days, thanks to a vibrant live music community in San Francisco. I loathe Facebook, but the groups that I’m a part of on that site enable me to feel connected to a greater music scene and community that supplement my connection to music and music discovery. Ironically, Facebook groups have also helped my music culture experience become more local. The music blogs that I used to be able to tap into are now largely defunct, or have multiple functions (The Burning Ear also running Vinyl Me, Please, or All Things Go also providing news and an annual festival in DC). Instead, yet another way I discover new music is by paying attention to the artists and DJs that people in these Facebook groups are talking about and posting tracks and albums from. 

Despite the challenges of a local music library, I keep buying digital music partially because I made a promise to myself when I was younger that I’d do so when I could afford to, partially to support musicians and producers, and partially because I distrust that streaming services will stick around with all the music I might want to listen to. I’d rather “own” it, at least as best as I can when it’s a digital file that risks deletion and decomposition over time. 

Music discovery in the past was equal parts discovery and collection, with a hefty dose of listening after I collected new music.

A flowchart showing Discover -> Collect -> Listen in a triangle, with Listen connecting back to Discover.

I’d do the following when discovering new music:

  • Writing down song lyrics while listening to the radio or while working my retail job, then later looking up the tracks to check out albums from the library to rip to my family computer.
  • Following music blogs like The Burning Ear, All Things Go, Earmilk, Stereogum, Line of Best Fit, then downloading what I liked best from their site from MediaFire or MegaUpload to save to my own library.
  • Trawling through illicit LiveJournal communities or invite-only torrent sites to download discographies for artists I already liked, or might like.

Over time, those music blogs shifted to using SoundCloud, the online communities and torrent sites shuttered, and I started listening to more music on streaming sites instead. The loop stopped going from discovery to collection, and instead went from discovery to like and back to discovery again. 

Find a new track, listen, click the heart or the plus sign, and move on. Rarely do you remember to go back and listen to your fully-compiled list of saved tracks (or even if you do, trying to listen to the whole thing on shuffle will be limited by the web app, thanks SoundCloud). 

A flowchart showing a cycle from discover to like and back again using arrows.

This type of cycle is faster than the old cycle, and more focused on engagement with the service (rather than the music), less on collecting and more on consuming. In some ways, downloading music was like this too. When I accidentally deleted my entire music library in 2012, the tatters of my library that I was able to recover from my iPod were a scant representation of my full collection, but included in that library were discographies that I would likely never listen to. Now that it’s been years, there have been a few occasions where I go back and discover that an artist I listen to now is in that graveyard of deleted songs, but even knowing that, I’m not sure I would’ve gotten to it any sooner. I was always collecting more than I was listening to. 

Streaming music lets me collect in the same way, but without the personal risk. It just makes me dependent on a third-party entity that permits me to access the tracks that they store for me. I end up with lists of liked tracks across multiple different services, none of which I fully control. These days my music discovery is largely driven by 3 services: Spotify, Shazam, and SoundCloud. Spotify pushes algorithmic recommendations to me, Shazam enables me to discover what track the DJ is currently playing when I’m out at a DJ set, and SoundCloud lets me listen to recorded DJ sets and has excellent autoplay recommendations. In all of them I have lists of tracks that I may never revisit after saving them. Some of them I’ll never be able to revisit, because they’ve been deleted or the service has lost the rights to the track. 

In 2015 I lamented the fragmentation of music discovery, but looking back, my music discovery was always shared across services, devices, and methods—the central iTunes library was what tied the radio songs, the library CDs, the discography downloads, and the music blog tracks together. The real issue is that the primary music discovery modes of today are service-dependent, and each of those services provides their own constructs of a music library. I mentioned in 2015 that:

“my library is all over the place. iTunes is still the main home of my music—I can afford to buy new music when I want —but I frequent Spotify and SoundCloud to check out new music. I sync my iTunes library to Google Play Music too, so I can listen to it at work.” 

While this is still largely true, I largely consume Spotify when I’m at work, listen to SoundCloud sets or tracks from iTunes when I’m on-the-go with my phone, and listen to Spotify or iTunes when I’m on my personal laptop. That’s essentially 2.5 places that I keep a music library, and while I maintain a purchase pipeline of tracks from Spotify and SoundCloud into my iTunes library, it’s a fraction of my discoveries that make it into my collection for the long term. The days of a true central collection of my library are long since past. 

It seems a feat, with all these digital cloud music services streaming music into our ears, to maintain a local music library. Indeed, what’s the point of holding onto your local files when it becomes so difficult to access them? iTunes is becoming the Apple Music app, with the Apple Music streaming service front and center. Spotify is, well, Spotify. And SoundCloud continues to flounder, yet provides an essential service of underground music and DJ sets. Google Play Music exists, but only has a web-based player (no desktop client) to make it easier to access and listen to your local library after you’ve mirrored it to the cloud. Streaming is convenient. But streaming music lets others own your content for you, granting you subscription access to it at best, ruining the quality of your music listening experience at worst. 

A recent essay by Dave Holmes in Esquire talks about “The Deleted Years”, or the years that we stored music on iPods, but since Spotify and other streaming services, have largely moved on from. As he puts it, 

“From 2003 to 2012, music was disposable and nothing survived.”

Perhaps it’s more true that from 2012 onward, music is omnipresent and yet more disposable. It can disappear into the void of a streaming service, and we’ll never even know we saved it. At least an abandoned iPod gives us a tangible record of our past habits. 

As Vicki Boykis wrote about SoundCloud in 2017:

“I’m worried that, for internet music culture, what’s coming is the loss of a place that offered innumerable avenues for creativity, for enjoyment, for discovery of music that couldn’t and wouldn’t be created anywhere else. And, like everyone who has ever invested enough emotion in an online space long enough to make it their own, I’m wondering what’s next.”

I’ll be here, discovering, collecting, liking, and listening for what’s next.

Music streaming and sovereignty

As the music industry moves away from downloads and toward building streaming platforms, international sovereignty becomes more of a barrier to people listening to music and discussing it with others, because they don’t have access to the same music on the same platforms. As Sean Michaels points out in The Morning News several years ago:

“one of the undocumented glitches in the current internet is all its asymmetrical licensing rules. I can’t use Spotify in Canada (yet). Whenever I’m able to, there’s no guarantee that Spotify Canada’s music library will match Spotify America’s. Just as Netflix Canada is different than Netflix US, and YouTube won’t let me see Jon Stewart. As we move away from downloads and toward streaming, international sovereignty is going to become more and more of a barrier to common discussions of music.”

Location has always been a challenge to music access, but it’s important to keep in mind that the internet and music streaming has not been an equitable boon to music access—it is still controlled.

Planning and analyzing my concert attendance with Splunk

This past year I added some additional datasets to the Splunk environment I use to analyze my music: information about tickets that I’ve purchased, and information about upcoming concerts.

Ticket purchase analysis

I started keeping track of the tickets that I’ve purchased over the years, which gave me good insights about ticket fees associated with specific ticket sites and concert promoters.  

Based on the data that I’ve accumulated so far, Ticketmaster doesn’t have the highest fees for concert tickets. Instead, Live Nation does. This distinction is relatively meaningless when you realize they’ve been the same company since 2010.

However, the ticket site isn’t the strongest indicator of fees, so I decided to split the data further by promoter to identify if specific promoters had higher fees than others.

Based on that data you can see that the one show I went to promoted by AT&T had fees of nearly 37%, and that shows promoted by Live Nation (through their evolution and purchase by Ticketmaster) also had fees around 26%. Shows promoted by independent venues have somewhat higher fees than average, hovering around 25% for 1015 Folsom and Mezzanine, but shows promoted by organizations whose only purpose is promotion tend to have slightly lower fees, such as Select Entertainment with 18%, Popscene with 16.67%, and KC Turner Presents with 15.57%.
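The calculation behind these percentages is straightforward: total added fees divided by total base ticket price, grouped by promoter. A minimal Python sketch, with hypothetical ticket records standing in for my real purchase data:

```python
from collections import defaultdict

def fee_percent_by_promoter(tickets):
    """Average added fees as a percentage of base ticket price,
    grouped by promoter."""
    totals = defaultdict(lambda: [0.0, 0.0])  # promoter -> [fees, base price]
    for t in tickets:
        totals[t["promoter"]][0] += t["fees"]
        totals[t["promoter"]][1] += t["base_price"]
    return {p: round(100 * fees / base, 2) for p, (fees, base) in totals.items()}

# Hypothetical ticket records, not my real purchase history
tickets = [
    {"promoter": "Popscene", "base_price": 15.00, "fees": 2.50},
    {"promoter": "Live Nation", "base_price": 50.00, "fees": 13.00},
]
print(fee_percent_by_promoter(tickets))  # {'Popscene': 16.67, 'Live Nation': 26.0}
```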

I realized I might want to refine this, so I recalculated this data, limiting it to promoters from which I’ve bought at least two tickets.

It’s a much more even spread in this case, ranging from 25% to 11% in fees. However, you can see that the same patterns exist— for the shows I’ve bought tickets to, the independent venues average 22-25% in fees, while dedicated independent promoters are 16% or less in added fees, with corporate promoters like Another Planet, JAM, and Goldenvoice filling the middle of the data ranging from 18% to 22%.

I also attempted to determine how I’m discovering concerts. This data is entirely reliant on my memory, with no other data to back it up, but it’s pretty fascinating to track.

It’s clear that Songkick has become a vital service in my concert-going planning, helping me discover 46 shows, while friends and email newsletters from venues kept me in the know for 19 and 14 shows respectively. Social media contributes as well, with a Facebook community (raptors) and Instagram making appearances with 10 and 2 discoveries respectively.

Concert data from Songkick

Because Songkick is so vital to my concert discovery, I wanted to amplify the information I get from the service. In addition to tracking artists on the site, I wanted to proactively gather information about artists coming to the SF Bay Area and compare that with my listening habits. To do this, I wrote a Songkick alert action in Python to run in Splunk.

Songkick does an excellent job for the artists that I’m already tracking, but there are some artists that I might have just recently discovered but am not yet tracking. To reduce the likelihood of missing fast-approaching concerts for these newly-discovered artists, I set up an alert to look for concerts for artists that I’ve discovered this year and have listened to at least 5 times.

To make sure I’m also catching other artists I care about, I use another alert to call the Songkick API for every artist that is above a calculated threshold. That threshold is based on the average listens for all artists that I’ve seen live, so this search helps me catch approaching concerts for my historical favorite artists.
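That threshold calculation can be sketched in plain Python, outside of Splunk; the artists and listen counts here are placeholder numbers, not my real listening history:

```python
def listen_threshold(listens_by_artist, seen_live):
    """Average listen count across artists I've seen live; artists
    above this line are worth a Songkick API lookup."""
    seen_counts = [listens_by_artist[a] for a in seen_live if a in listens_by_artist]
    return sum(seen_counts) / len(seen_counts)

# Placeholder data, not my real listening history
listens = {"Beirut": 44, "Bob Mould": 21, "Lane 8": 9, "The Rapture": 60}
seen = {"Beirut", "Bob Mould"}

threshold = listen_threshold(listens, seen)  # (44 + 21) / 2 = 32.5
above = [a for a, n in listens.items() if n > threshold]
print(above)  # ['Beirut', 'The Rapture']
```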

To be honest, I also did this largely so that I could learn how to write an alert action in Splunk software. Alert actions are essentially bits of custom Python code that you can dispatch with the results of a search in Splunk. The two alert examples I gave are both saved searches that run every day and update an index. I built a dashboard to visualize the results.

I wanted to use log data to confirm which artists were being sent to Songkick with my API request, even if no events were returned. To do this I added a logging statement in my Python code for the alert action, and then visualized the log statements (with the help of a lookup to match the artist_mbid with the artist name) to display the artists that had no upcoming concerts at all, or had no SF concerts.
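The logging pattern is simple: emit a log line for every artist queried, whether or not events come back. A hedged sketch of the idea, where `fetch_events` stands in for the real Songkick API call (the actual alert action code differs):

```python
import logging
import sys

logger = logging.getLogger("songkick_alert")
logging.basicConfig(stream=sys.stderr, level=logging.INFO)

def query_songkick(artist_mbid, fetch_events):
    """Query upcoming events for an artist, logging every request so a
    later search can confirm which artists were sent to Songkick.

    `fetch_events` stands in for the real Songkick API call; the
    artist name is rejoined later from the MBID via a Splunk lookup.
    """
    events = fetch_events(artist_mbid)
    if not events:
        # This log line is what feeds the "no upcoming concerts" panel
        logger.info("no_upcoming_events artist_mbid=%s", artist_mbid)
    return events
```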

For those artists without concerts in the San Francisco Bay Area, I wanted to know where they were going instead, so that I could identify possible travel locations for the future.

It seems like Paris is the place to be for several of these artists—there might be a festival that LAUER, Max Cooper, George Fitzgerald, and Gerald Toto are all playing at, or they just happen to all be visiting that city on their tours.

I’m planning to publish a more detailed blog post about the alert action code in the future on the Splunk blogs site, but until then I’ll be off looking up concert tickets to these upcoming shows….

Making Concert Decisions with Splunk

The annual Noise Pop music festival starts this week, and I purchased a badge this year, which means I get to go to any show that’s a part of the festival without buying a dedicated ticket.

That means I have a lot of choices to make this week! I decided to use data to assess (and validate) some of the harder choices I needed to make, so I built a dashboard, “Who Should I See?” to help me out.

First off, the Wednesday night show. Albert Hammond, Jr. of the Strokes is playing, but more people are talking about the Baths show the same night. Maybe I should go see Baths instead?

Screen capture showing two inputs, one with Baths and one with Albert Hammond, Jr, resulting in count of listens compared for each artist (6 vs 39) and listens over time for each artist. Baths has 1 listen before 2012, and 1 listen each year for 2016 until this year. Albert Hammond, Jr has 8 listens before 2010, and a consistent yet reducing number over time, with 5 in 2011 and 4 in 2015, but just a couple since then.

If I’m making my decisions purely based on listen count, it’s clear that I’m making the right choice to see Albert Hammond, Jr. It is telling, though, that I’ve listened to Baths more recently than him, which might have contributed to my indecision.

The other night I’m having a tough time deciding about is Saturday night. Beirut is playing, but across the Bay in Oakland. Two other interesting artists are playing closer to home, Bob Mould and River Whyless. I wouldn’t normally care about this so much, but I know my Friday night shows will keep me busy and leave me pretty tired. So which artist should I go see?

3 inputs on a dashboard this time, Beirut, Bob Mould, and River Whyless are the three artists being compared. Beirut has 44 listens, Bob Mould has 21, River Whyless has 3. Beirut has frequent listens over time, peaking at 6 before 2010, but with peaks at 5 in 2011 and 2019. Bob Mould has 6 listens pre-2009, but only 3 in 2010 and after that, 1 a year at most. River Whyless has 1 listen in April, and 2 in December of 2018.

It’s pretty clear that I’m making the right choice to go see Beirut, especially given my recent renewed interest thanks to their new album.

I also wanted to be able to consider whether I should see a band at all! This isn’t as relevant this week thanks to the Noise Pop badge, but the dashboard evaluates whether the number of listens I have for an artist exceeds a threshold, calculated from the total number of listens for all artists that I’ve seen live in concert. If an artist has more listens than the threshold, I return advice to “Go to the concert!” but if they don’t, I recommend “Only if it’s cheap, yo.”
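That eval statement amounts to a one-line comparison; here is a Python equivalent, with an illustrative threshold value rather than my real calculated one:

```python
def concert_advice(artist_listens, threshold):
    """Compare an artist's listen count against the seen-live average."""
    if artist_listens > threshold:
        return "Go to the concert!"
    return "Only if it's cheap, yo."

# Illustrative listen counts and threshold
print(concert_advice(60, 32.5))  # prints "Go to the concert!"
print(concert_advice(9, 32.5))   # prints "Only if it's cheap, yo."
```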

Because I don’t need to make this decision for Noise Pop artists, I picked a few that I’ve been wanting to see lately: Lane 8, Luttrell, and The Rapture.

4 dashboard panels, 3 of which ask "Should I go see (artist) at all?" one for each artist, Lane 8, Luttrell, and The Rapture. Lane 8 and Luttrell both say "Only go if it's cheap, yo." and The Rapture says "Go to the concert!". The fourth panel shows frequent listening for The Rapture, especially from 2008-2012, with a recent peak in 2018. Lane 8 spikes at the end of the graph, and Luttrell is a small blip at the end of the graph.

While my interest in Lane 8 has spiked recently, there still aren’t enough cumulative listens to put them over the threshold. Same for Luttrell. However, The Rapture has enough to put me over the threshold (likely due to the fact that I’ve been listening to them for over 10 years), so I should go to the concert! I’m going to see The Rapture in May, so I am gleefully obeying my eval statement!

On a more digressive note, it’s clear to me that this evaluation needs some refinement to actually reflect my true concert-going sentiments. Currently, the threshold averages all the listens for all artists that I’ve seen live. It doesn’t restrict that average to consider only the listens that occur before seeing an artist live, which might make it more accurate. That calculation would also be fairly complex, given that it would need to account for artists that I’ve seen multiple times.

However, number of listens over time doesn’t alone reflect interest in going to a concert. It might be useful to also consider time spent listening, beyond count of listens for an artist. This is especially relevant when considering electronic music, or DJ sets, because I might only have 4 listen counts for an artist, but if that comprises 8 hours of DJ sets by that artist that I’ve listened to, that is a pretty strong signal that I would likely enjoy seeing that artist perform live.

I thought that I’d need to get direct access to the MusicBrainz database in order to get metadata like that, but it turns out that the Last.fm API makes some available through their track.getInfo endpoint, so I just found a new project! In the meantime I am able to at least calculate duration for tracks that exist in my iTunes library.
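A sketch of what that refinement could look like. The response shape follows my reading of the track.getInfo endpoint (duration returned in milliseconds, as a string), and the numbers are hypothetical:

```python
def track_duration_ms(response):
    """Read a track's duration from a track.getInfo JSON payload.

    Per my reading of the Last.fm API, duration comes back in
    milliseconds as a string; this response shape is an assumption.
    """
    return int(response["track"]["duration"])

def listening_time_minutes(listen_count, duration_ms):
    """Weight listens by track length: a few listens of a long DJ set
    can outweigh many listens of a short track."""
    return listen_count * duration_ms / 60000

# Hypothetical 7-minute track listened to 4 times
sample = {"track": {"name": "Example Set", "duration": "420000"}}
print(listening_time_minutes(4, track_duration_ms(sample)))  # 28.0
```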

I now have a new avenue to explore with this project, collecting that data and refining this calculation. Reach out on Twitter to let me know what you might consider adding to this calculation to craft a data-driven concert-going decision-making dashboard.

If you’re interested in this app, it is open sourced and available on Splunkbase. I’ll commit the new dashboard to the app repo soon!