How do you make large-scale harm visible on the individual level?

Teams that build security and privacy tools like Brave Browser, Tor Browser, Signal, Telegram, and others focus on usability and feature parity of these tools in an effort to more effectively acquire users from Google Chrome, iMessage, Google Hangouts, WhatsApp, and others. 

Do people fail to adopt these more secure and private tools because they aren’t as usable as what they’re already using, or because it requires too much effort to switch?

I mean, of course it’s both. You need to make the effort to switch, and in order to switch you need viable alternatives to switch to. And that’s where the usability and feature parity of Brave Browser and Signal compared with Google Chrome and WhatsApp come in. 

But if we’re living in a world where feature parity and usability are a foregone conclusion, and we are, then what? What needs to happen to drive a large-scale shift away from data-consuming and privacy-invading tools and toward those that don’t collect data and aggressively encrypt our messages? 

To me, that’s where it becomes clear that the amorphous effects of widespread data collection, though well chronicled in blog posts, books, and shows like The Social Dilemma, don’t often lead to real change unless a personal threat is felt. 

Marginalized and surveilled communities adopt tools like Signal or FireChat to protect their privacy and security, because their privacy and security are actively under threat. For everyone else, privacy and security are still under threat, but indirectly. Lacking a clear event (or series of events) tied to direct personal harm, people don’t often abandon a platform. 

If I don’t see how using Google Chrome, YouTube, Facebook, Instagram, Twitter, and other sites and tools causes direct harm to me, I have little incentive to make a change, despite the evidence of aggregate harm to society: amplified societal divisions, active disinformation campaigns, and more. 

Essays that expose the “dark side” of social media and algorithms attempt to identify distinct personal harms caused by these systems. Essays like James Bridle’s on YouTube, Something is wrong on the internet (2017); Adrian Chen’s about what social media content moderators experience, The Laborers Who Keep Dick Pics and Beheadings Out of Your Facebook Feed (2014); and Casey Newton’s on the same subject, The secret lives of Facebook moderators in America (2019), gain widespread attention for the problems they expose, but don’t necessarily lead people to abandon the platforms, nor lead the platforms themselves to take action. 

These theorists and journalists are making a serious attempt to make large-scale harm caused by these platforms visible on an individual level, but nothing is changing. Is it the fault of the individual, or the platform?

Spoiler: it’s always “both.” And here we can draw an analogy to climate change too. As with climate change, the effects of these platforms and companies are so amorphous that it’s possible to point to alternate explanations, at least for a time. Dramatically worsening wildfires in the Western United States get blamed on poor fire policy; worsening tropical storms get blamed on weaker wind patterns (or stronger ones? I don’t study wind). 

One could argue that climate change is the result of mechanization and industrialization in general, and would be happening without the companies currently contributing to it. Perhaps the dark side of the internet is just the dark side of reality, and no worse than it would be without these platforms and companies contributing. 

The truth is, it’s both. We live in a “yes, and” world. Climate change is causing, contributing to, and intensifying the effects of wildfires and the strength and frequency of tropical storms and hurricanes. Platform algorithms are causing, contributing to, and intensifying the effects of misinformation campaigns and violence on social media and the internet. 

And much like the companies that contributed to climate change knew what was happening (as The Guardian reported in Shell and Exxon’s secret 1980s climate change warnings), Facebook, Google, and others know that their algorithms are actively contributing to societal harm, but they aren’t doing enough about it. 

So what’s next? 

  • Do we continue to attempt to make the individual feel the pain of the community in an effort to cause individual change? 
  • Do we use laws and policy to constrain the use of algorithms for specific purposes, in an effort to regulate the effects away?
  • Do we build alternate tools with the same functionality and take users away from the harm-causing tools? 
  • Do we use our power as laborers to strike against the harm caused by the tools that we build? 

With climate change (and with data security and privacy too), we’re already taking all of these approaches. What else might be out there? What else can we do to bring about change? 

Repersonalizing Digital Communications: Against Standardizing and Interfering Mediations

Back in 2013 I wrote a blog post reacting to Cristina Vanko’s project to handwrite her text messages for one week. At the time, I focused on how Cristina introduced slowness into a digital medium that often operates as a conversation because responses are so immediate and frequent. Since 2013, texting has grown more popular, and instant messaging has woven its way into our work environments as well. Reintroducing that slowness is still relevant, though careful notification settings can also help recapture it. 

What I want to focus on is the way that her project repersonalizes the digital medium of communication, adding her handwriting and therefore more of her personality into the messages that she sends. I thought of this project again while watching a talk from Jonathan Zong for the Before and Beyond Typography Online Conference. In his talk, he points out that “writing is a form of identity representation”, with handwriting being “highly individualized and expressive”, while “in contrast, digital writing makes everyone’s writing look the same. People’s communications are filtered through the standardized letterforms of a font.” 

The project he discusses in part of that talk, Biometric Sans, “elongates letterforms in response to the typing speed of the individual”, providing another way to reembody personality in digitally mediated communications. He describes the font as “a gesture toward the reembodiment of typography, the reintroduction of the hand in digital writing.” It’s an explicit repersonalization of digitally mediated communication, much as Cristina Vanko’s handwritten text messages were. Both projects seek to repersonalize, and thereby rehumanize, the coldly standardized digital communication formats that we rely on. 

Without resorting to larger projects, we find other ways to repersonalize our digital communications: sharing stickers (I’m rather fond of Rejoinders), crafting new expressions (lol) and words, and even sending voice responses (at times accidentally) in text messages. In this way we can poke at the boundaries of the digital communication methods sanitized by standardized fonts for all users.

While Jonathan stayed rather focused on the typographic mediation of digital communication, given the topic of the conference, I want to expand this notion of repersonalizing digital communication. Fonts are not the only mechanism by which digital communications are mediated and standardized: the tools that we use to create the text displayed in those fonts do just as much (if not more). 

The tools that mediate and standardize our text in other ways are, of course, automatic correction, predictive text, and the software keyboards themselves.

Apple is frustratingly subtle about automatic correction (autocorrect), oftentimes changing a perfectly legitimate word that you’ve typed into a word with a completely different meaning. It’s likely that autocorrect is attempting to “accelerate” your communications by guessing what you’re trying to type. This guess, mediating your input to alter the output, often interferes with your desired meaning. When this interfering mediation fails (which is often), you’re instead slowed down, forced to identify that your intended input has been unintentionally transformed, fix it, perhaps fix it again, and only then send your message.
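To make that “guess” concrete, here is a toy sketch in Python. It is not Apple’s algorithm, which isn’t public; it assumes a made-up table of word frequencies and simply picks the most frequent known word within one edit of what you typed. Even this crude rule reproduces the interference described above: a perfectly legitimate word gets swapped for a more common neighbor.

```python
# A toy autocorrect, NOT Apple's algorithm: it replaces a typed word with
# the most frequent known word within one edit, even when the typed word
# is itself a valid word. Word frequencies are made up for illustration.

WORD_FREQ = {"skip": 900, "ship": 400, "slip": 300, "up": 5000}  # hypothetical
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1}
    replaces = {left + c + right[1:] for left, right in splits if right for c in LETTERS}
    inserts = {left + c + right for left, right in splits for c in LETTERS}
    return deletes | transposes | replaces | inserts


def autocorrect(word):
    """Return the most frequent known word among `word` and its one-edit neighbors."""
    candidates = [w for w in {word} | edits1(word) if w in WORD_FREQ]
    if not candidates:
        return word  # nothing known nearby, so leave the input alone
    return max(candidates, key=WORD_FREQ.get)


print(" ".join(autocorrect(w) for w in "slip up".split()))  # skip up
```

Real keyboards likely weigh many more signals (context, personal typing history), but the trade-off is the same: the more aggressively a system favors common words, the more often it overrides uncommon but intended ones.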

Google, meanwhile, more often preemptively mediates your text. Predictive text in Google Mail “helps” you by suggesting commonly-typed words or responses.

Screenshot of Google Mail draft, with the text Here are some suggestions about what I might be typing next.  Do you want to go to the store? Maybe to the movies? What about to the mall?  What do you listen to? Sofi Tukker? What other DJs do you have? Where "have?" is a predictive suggestion and not actually typed.

This is another form of interference (in my mind), distracting you from what you’re actually trying to communicate and instead inserting you into a conflict with the software, fighting a standardized communication suggestion while you seek to express your point (and your personality) with a clear communication. Often, it can be distractingly bland or comical.
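Predictive text can be sketched the same way. The toy model below is an illustration, not how Gmail’s suggestions actually work: it counts which word most often follows each word in a small hypothetical corpus and suggests that word. It shows why suggestions drift toward the bland: the most common continuation wins by construction.

```python
from collections import Counter, defaultdict

# A toy next-word predictor, NOT Google's implementation: it suggests the
# word that most often followed the previous word in a hypothetical corpus.

CORPUS = [  # hypothetical "past messages"
    "thank you for the update",
    "thank you so much",
    "thank you for the invite",
    "great thanks",
]


def build_bigrams(sentences):
    """Map each word to a Counter of the words that follow it."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers


def suggest(followers, last_word):
    """Return the most common follower of `last_word`, or None if it was never seen."""
    counts = followers.get(last_word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


model = build_bigrams(CORPUS)
print(suggest(model, "thank"))  # you
print(suggest(model, "you"))    # for
```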

Screenshot of Google Mail smart responses, showing one that says "Thank you, I will do that." another that says "thank you!" and a third that says "Will do, thank you!"

In Google Mail, this focus on standardized predictive responses also further perpetuates the notion of email as a “task to be completed” rather than an opportunity to interact, communicate, or share something of yourself with someone else. 

Software keyboards also mediate and effectively standardize digital communications. Personally, I dislike software keyboards because I can’t touch-type on them (frustrated, I tweeted about this in January). Lacking any hardware feedback or orientation, I frequently have to stare at the keyboard while I’m typing. I’m less able to focus on what I’m trying to say because I’m busy focusing on how to literally type it. This forced slowness imposes a maximum speed at which you can communicate your thoughts, and effectively pushes you to rely on software-enabled shortcuts such as autocorrect, predictive text, or actual programmed shortcuts (such as replacing “omw” with “On my way!”), rather than writing or typing at the speed of your thoughts (or close to it). Because of this limitation, I often write out more abstract ideas longhand, or reluctantly open my computer, so that I have the privilege of direct input-to-output translation with little or no software mediation. 

In a talk last June at the SF Public Library, Tom Mullaney discussed the mediation of software keyboards in depth, pointing out that software keyboards (or IMEs, as he referred to them) do not serve as mechanical interpreters of what we type. Instead, they use input methods to transcribe text, and those input methods can adapt to be more efficient. He used the term “hypography” for the practice of writing when your input does not directly match the output. That happens when you use a programmed shortcut like omw, but also when you type a character that isn’t represented on a key, such as ö, or when you type in a language with a non-Latin script, where a specific sequence of keystrokes produces a fully formed character. Your input maps to an output, rather than the output matching the input. 
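Hypography is easy to see in a toy input method. The sketch below assumes a hypothetical shortcut table and a made-up convention where typing `"` after a letter adds an umlaut; real IMEs and keyboard layouts use their own, far richer rules. Either way, the keystrokes you type are not the characters that come out.

```python
import unicodedata

# A toy "input method" illustrating hypography: keystrokes (input) map to
# different text (output). The shortcut table and the '"' convention are
# hypothetical, not any real keyboard's configuration.

SHORTCUTS = {"omw": "On my way!"}  # programmed text replacement
COMBINING = {'"': "\u0308"}        # '"' after a letter adds a combining diaeresis


def transcribe(keystrokes):
    """Turn a keystroke sequence into output text."""
    if keystrokes in SHORTCUTS:
        return SHORTCUTS[keystrokes]
    out = []
    for key in keystrokes:
        if key in COMBINING and out:
            # Attach the combining mark to the previous character, then
            # compose e.g. "o" + U+0308 into the single character "ö".
            out[-1] = unicodedata.normalize("NFC", out[-1] + COMBINING[key])
        else:
            out.append(key)
    return "".join(out)


print(transcribe("omw"))     # On my way!
print(transcribe('scho"n'))  # schön
```

Python’s `unicodedata.normalize("NFC", ...)` is what composes the base letter and the combining diaeresis into the single precomposed character ö.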

These inputs are often standardized, allowing you to learn the shortcuts over time and accelerating your communications. But autocorrect and predictive text keep changing with new iterations: new words or phrases that interfere with what you type, turning a “slip up” into a “skip up,” encouraging you to respond to an email with a bland “Great, thanks!”, or attempting to anticipate the entire rest of your sentence after you’ve written only a few words. Because I also have a German keyboard configured, my predictive text will occasionally “correct” an English typo into a German word, or capitalize generic English nouns by mistakenly applying German rules. 

All of these interfering and distracting mediations that accelerate and decelerate our digital communications, alongside our ongoing efforts to repersonalize those communications, have me wondering: What do we lose when our digital communications are accelerated by expectations of instantaneous responses? What do we lose when they’re decelerated by the interfering mediation of autocorrect? What do we lose when our communications are standardized by fonts, predictive text, and suggested responses?

Listening to Music while Sheltering in Place

The world is, to varying degrees, sheltering in place during this global coronavirus pandemic. In March, the pandemic started to affect me personally: 

  • I started working from home on March 6th. 
  • On March 11th, Governor Gavin Newsom announced that gatherings of more than 250 people were strongly discouraged, effectively cancelling all concerts for the rest of March. 
  • On March 16th, the mayor of San Francisco, along with officials in several other Bay Area counties, announced a shelter-in-place order. 

Ever since then, I’ve been at home. Given all these changes in my life, I was curious what new patterns I might see in my music listening habits. 

I went to my last concert on March 7th, before large gatherings were prohibited. As gatherings were cancelled nationwide and touring musicians postponed and cancelled events, Beatport hosted the first livestream festival, “ReConnect. A Global Music Series”, on March 27th. Many more followed. 

Industry-wide studies and data analyses have attempted to unpack how the pandemic is influencing the music industry. The analytics startup Chartmetric is digging into genre-based and geographic listening habits, and Billboard and Nielsen are conducting a periodic entertainment tracker survey.

Because I’m me, and I have so much data about my music listening patterns, I wanted to explore what trends might be emerging in my personal habits. I analyzed March, April, and May 2020, and in some cases compared that period against the same months in 2019, 2018, and 2017. The data visualization screenshots in this blog post reflect data through May 15th, so the analysis and comparison are incomplete, given that May 2020 isn’t over yet. 

Looking at my listening habits during this time period, with key dates highlighted, it’s clear that the very beginning of the crisis didn’t have much of an effect on my listening behavior. However, after the shelter-in-place order, the amount of time I spent listening to music increased. After that increase it’s remained fairly steady.

Screenshot of an area chart depicting daily listening duration: for much of January and February it ranges from about 100 to 250 minutes per day, with a couple of spikes to 500 minutes; starting in March it ranges from about 250 to 450 minutes per day, with a couple of outliers near 700 minutes and a couple with only 90 minutes of listening activity.

Key dates such as the first case in the United States, the first case in California, and the first case in the Bay Area are highlighted along with other pandemic-relevant dates.

Listening behavior during March, April, and May over time

When I started my analysis, I looked at my basic listening count from traditional music listening sources. I use Last.fm to scrobble my listening behavior in iTunes, Spotify, and the web from sites like YouTube, SoundCloud, Bandcamp, Hype Machine, and more. 

Chart depicting 2700 total listens for 2017, 2000 total listens for 2018, and 2300 total listens for 2019 during March, April, and May, compared to 3000 total listens in that same period in 2020.

If you just look at 2018 to 2020, it seems like my listening habits are trending upward, maybe with a culmination in 2020. But comparing against 2017, it isn’t much of a difference. I listened to 25% fewer tracks in 2018 compared with 2017, 19% more tracks in 2019 compared with 2018, and 25% more tracks in 2020 compared with 2019. 
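Each of those percentages is relative to the previous year, which is why they don’t read as a simple trend. A minimal sketch of that year-over-year calculation, using hypothetical round numbers rather than my actual counts:

```python
# Year-over-year percent change for a series of yearly totals.
# The example counts are hypothetical round numbers, not my actual data.

def yoy_changes(totals):
    """Map each year (after the first) to its percent change from the previous year."""
    years = sorted(totals)
    return {
        year: round((totals[year] - totals[prev]) / totals[prev] * 100, 1)
        for prev, year in zip(years, years[1:])
    }


counts = {2017: 2000, 2018: 1500, 2019: 1800, 2020: 2250}  # hypothetical
print(yoy_changes(counts))  # {2018: -25.0, 2019: 20.0, 2020: 25.0}
```

Because the changes compound, chaining the three quoted figures (0.75 × 1.19 × 1.25 ≈ 1.12) puts 2020 only about 12% above 2017, which matches how small the difference looks once 2017 is included.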

Chart depicting total weekday and weekend listens during March, April, and May of 2017, 2018, 2019, and 2020: roughly 2400 weekday vs. 200 weekend listens in 2017, 2000 vs. 100 in 2018, 2100 vs. 300 in 2019, and 2500 vs. 200 in 2020

If I break that down by comparing my weekday and weekend listening habits from the previous 3 years to now, there’s perhaps a slight increase, but nothing much. 

With just the data points from Last.fm, there aren’t really any notable patterns. But the number of tracks I’ve listened to on Spotify, SoundCloud, YouTube, or iTunes gives an incomplete picture of my listening habits. If I expand the data to include other types of listening (concerts attended and livestreams watched) and measure the amount of time I spend listening instead of the number of tracks I’ve listened to, it gets a bit more interesting. 

Chart shows roughly 12000 minutes spent listening in 2017, 10000 in 2018, 12000 in 2019, and 22000 in 2020

While the number of tracks I listened to from 2019 to 2020 increased only 25%, the amount of time I spent listening to music increased by 74%, a full 150 hours more than the previous year during this time period. And May isn’t even over yet! 
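The two figures in that sentence also pin down the baseline. Writing T₂₀₁₉ and T₂₀₂₀ for total listening time in March through mid-May of each year, a 150-hour increase that amounts to 74% implies:

$$
T_{2020} - T_{2019} = 150\ \text{h}, \qquad \frac{T_{2020} - T_{2019}}{T_{2019}} = 0.74
$$

$$
T_{2019} = \frac{150\ \text{h}}{0.74} \approx 203\ \text{h} \approx 12{,}200\ \text{min}, \qquad T_{2020} = T_{2019} + 150\ \text{h} \approx 353\ \text{h} \approx 21{,}200\ \text{min}
$$

That is roughly consistent with the chart’s approximate 12,000 and 22,000 minutes.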

It’s worth briefly noting that I’m estimating, rather than directly calculating, the amount of time spent listening to music tracks and attending live music events. To make this calculation, I’m using an estimate of 3 hours for each concert attended, 4 hours for each DJ set attended, 8 hours for each festival attended, and an estimate of 4 minutes for each track listened to, based on the average of all the tracks I’ve purchased over the past two years. Livestreamed sets are easier to track, but some of those are estimates as well because I didn’t start keeping track until the end of April.
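Here is a minimal sketch of that estimate in Python. The per-item durations come from the paragraph above; the category names, the function, and the example counts are hypothetical, chosen only to show the arithmetic.

```python
# Estimate listening minutes from counts, using the per-item durations
# described above. Livestream minutes are passed in directly because they
# are tracked (or estimated) separately. Names and counts are hypothetical.

MINUTES_PER_ITEM = {
    "concert": 3 * 60,   # 3 hours per concert
    "dj_set": 4 * 60,    # 4 hours per DJ set
    "festival": 8 * 60,  # 8 hours per festival
    "track": 4,          # 4 minutes per track
}


def estimate_minutes(counts, livestream_minutes=0):
    """Return estimated minutes per category plus a 'total' key."""
    breakdown = {
        kind: counts.get(kind, 0) * minutes
        for kind, minutes in MINUTES_PER_ITEM.items()
    }
    breakdown["livestream"] = livestream_minutes
    breakdown["total"] = sum(breakdown.values())
    return breakdown


example = estimate_minutes({"track": 2500, "concert": 2, "festival": 1}, livestream_minutes=600)
print(example["total"])  # 11440
```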

I spent an extra 150 hours listening to music this year during this time—but when was I spending this time listening? If I break down the amount of time I spent listening by weekend compared with weekdays, it’s obvious:

Chart depicts 10000 weekday minutes and 5000 weekend minutes spent listening in 2017, 9500 weekday minutes and 4500 weekend minutes in 2018, 14000 weekday minutes and 2000 weekend minutes in 2019, and 12000 weekday minutes and 13000 weekend minutes in 2020

Before shelter-in-place, I’d spend most of my weekends outside, hanging out with friends, or attending concerts, DJ sets, and the occasional day party. Now that I’m spending my weekends largely inside and at home, coupled with the number of livestreaming festivals, I’m spending much more of that time listening to music. 

I was curious if perhaps working from home might reveal new weekday listening habits too, but the pattern remains fairly consistent. I also haven’t worked from home for an extended period before, so I don’t have a baseline to compare it with. 
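Splitting time by weekday and weekend only needs the date of each listening session. A minimal sketch, assuming hypothetical (date, minutes) records and counting Saturday and Sunday as the weekend:

```python
from datetime import date

# Split listening minutes into weekday and weekend totals.
# The (date, minutes) records below are hypothetical.

def split_weekday_weekend(records):
    """Sum minutes by whether each date falls on a weekday or the weekend."""
    totals = {"weekday": 0, "weekend": 0}
    for day, minutes in records:
        # date.weekday(): Monday is 0 ... Sunday is 6, so 5 and 6 are the weekend.
        key = "weekend" if day.weekday() >= 5 else "weekday"
        totals[key] += minutes
    return totals


records = [
    (date(2020, 4, 3), 240),  # Friday
    (date(2020, 4, 4), 480),  # Saturday
    (date(2020, 4, 5), 420),  # Sunday
    (date(2020, 4, 6), 200),  # Monday
]
print(split_weekday_weekend(records))  # {'weekday': 440, 'weekend': 900}
```

One design choice worth flagging: this buckets each record by its calendar date, so a Friday-night set that runs past midnight counts entirely as a weekday unless it is split into two records.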

It’s clear that weekends are when I’m doing most of my new listening, and that this new listening likely isn’t coming from my traditional listening habits. If I split the amount of time that I spend listening to music by the type of listening that I’m doing, the source of the added time spent listening is clear.

Depicts 11000 minutes of track listens and 1000 minutes of time spent at concerts in 2017, 8000 minutes spent listening to music tracks and 2000 minutes spent at concerts in 2018, 10000 minutes spent listening to music tracks and 3000 minutes spent at concerts in 2019, and 12000 minutes spent listening to music tracks and 9000 minutes listening to livestreams, with a sliver of 120 minutes spent at a single concert in 2020

Hello, livestreams. If you look closely you can also spy the sliver of a concert that I attended on March 7th.

Livestreams dominate, and so does Shazam

All of the livestreams I’ve been watching have primarily been DJ sets. Ordinarily, when I’m at a DJ set, I spend a good amount of time Shazamming the tracks I’m hearing. I want to identify the tracks that I’m enjoying so much on the dancefloor so I can track them down, buy them, and dig into the back catalog of those artists. 

So I requested my Shazam data to see what’s happening now that I’m home, with unlimited, shameless, and convenient access to Shazam.

For the time period that I have Shazam data for, the relationship between Shazam activity and the number of livestreams watched is fairly consistent: roughly 10 successful Shazams per livestream.
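What this paragraph describes is a steady ratio rather than a correlation coefficient. A minimal sketch of computing that ratio week by week and overall, with hypothetical weekly counts:

```python
# Shazams per livestream, week by week and overall.
# The weekly numbers are hypothetical.

weekly = [
    # (livestreams watched, successful Shazams)
    (3, 30),
    (5, 48),
    (4, 42),
]


def per_week_ratios(rows):
    """Shazams per livestream for each week; weeks with no livestreams are skipped."""
    return [round(shazams / streams, 1) for streams, shazams in rows if streams]


def overall_ratio(rows):
    """Total Shazams divided by total livestreams."""
    streams = sum(s for s, _ in rows)
    shazams = sum(z for _, z in rows)
    return shazams / streams if streams else 0.0


print(per_week_ratios(weekly))          # [10.0, 9.6, 10.5]
print(round(overall_ratio(weekly), 1))  # 10.0
```

Dividing totals, rather than averaging the weekly ratios, keeps a light week with only one or two livestreams from skewing the overall figure.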

Chart details largely duplicated in surrounding text, but of note is a spike of 6 livestreams with only 30 or so songs shazammed, while the next few weeks show a fairly tight interlock of shazam activity with number of livestreams

Given how closely my Shazam activity tracks my livestream watching, as well as my continued focus on DJ sets, I wanted to explore my artist discovery statistics as well. Especially since my track listening hadn’t shifted much, I was betting that my artist discovery numbers had been increasing during this time. If I look at just the past few years, there seems to be a direct increase during this time period. 

Chart depicts 260ish artists discovered in March, April, and May of 2018, 280 discovered in 2019, and 360 discovered in 2020. Second chart shows the same data but adds 2017, with 390 artists discovered

However, after I add 2017 to the comparison, the pattern doesn’t seem like much of a pattern at all. Perhaps by the end of May there will be a correlation or an outsized increase. But at least for now, the extra livestreams I’ve been watching don’t seem to be producing an equivalently high number of artist discoveries, even though discoveries are elevated compared with the last two years. 
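Counting discoveries depends on how “discovered” is defined. One plausible definition, assumed here rather than taken from Last.fm, is that an artist counts as discovered in a window if their first-ever scrobble falls inside it. A minimal sketch with hypothetical scrobbles:

```python
from datetime import date

# Count "discovered" artists under an assumed definition: an artist is
# discovered in a window if their first-ever scrobble falls inside it.
# The scrobbles below are hypothetical.

scrobbles = [
    (date(2019, 11, 2), "Artist D"),
    (date(2020, 3, 20), "Artist D"),
    (date(2020, 3, 21), "Artist A"),
    (date(2020, 4, 9), "Artist B"),
    (date(2020, 6, 1), "Artist C"),
]


def discovered_in(scrobbles, start, end):
    """Return artists whose first scrobble falls within [start, end], sorted by name."""
    first_seen = {}
    for day, artist in sorted(scrobbles):
        first_seen.setdefault(artist, day)  # keep only the earliest date
    return sorted(a for a, d in first_seen.items() if start <= d <= end)


print(discovered_in(scrobbles, date(2020, 3, 1), date(2020, 5, 31)))  # ['Artist A', 'Artist B']
```

Under this definition the count depends on how far back the scrobble history goes: the earlier the history starts, the more artists count as already seen.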

It could also be that the artists I’m discovering in the livestreams haven’t yet had a substantial effect on my non-livestream listening patterns, even though there are 91 hours of music (and counting) in my quarandjed playlist, where I store the tracks that catch my ear in a quarantine DJ set. Adding music to a playlist, of course, is not the same thing as listening to it. 

Livestreaming as concert replacement?

Shelter-in-place brought with it a slew of event cancellations and postponements, and my live events calendar was severely affected. As of now, 15 concerts have been affected:

Chart depicts 6 concerts cancelled and 9 postponed

The amount of time that I spend at concerts compared with watching livestreams is also starkly different.

Chart depicts 1000 minutes spent at concerts in 2017, 2000 minutes at concerts in 2018, 2500 minutes at concerts in 2019, and 8000 minutes spent watching livestreams, with a topper of 120 minutes at a concert in 2020

I’ve spent 151 hours (and counting) watching livestreams, the rough equivalent of 50 concerts, or my entire concert attendance last year. That total is so high almost certainly because I’m often listening to livestreams rather than watching them.
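The concert equivalent follows from the estimate of 3 hours per concert used earlier:

$$
\frac{151\ \text{h}}{3\ \text{h per concert}} \approx 50\ \text{concerts}
$$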

Concerts require dedication—a period of time where you can’t really do anything else, a monetary investment, and travel to and from the show. Livestreams don’t have any of that, save a voluntary donation. That makes it easier to turn on a stream while I’m doing other things. While listening to a livestream, I often avoid engaging with the streaming experience. Unless the chat is a cozy few hundred folks at most, it’s a tire fire of trolls and not a pleasant experience. That, coupled with the fact that sitting on my couch watching a screen is inherently less engaging than standing in a club with music and people surrounding me, means that I’m often multitasking while livestreams are happening.

The attraction for me is that these streams are live: they’re an event to tune into, and if you don’t, you might miss it. Because it’s live, you have the opportunity to create a shared collective experience. The chatrooms that accompany live video streams on YouTube and Twitch, and especially Facebook’s Watch Party feature for Facebook Live videos, are what foster this shared experience. For me, it’s about that experience, so much so that I started a chat thread for Jamie xx’s 2020 Essential Mix so that my friends and I could experience and react to the set live. This personal experience runs contrary to the conclusion drawn in this article on Hypebot called Our Music Consumption Habits Are Changing, But Will They Remain That Way? by Bobby Owsinski: “Given the choice, people would rather watch something than just listen.” Given the choice, I’d rather have a shared collective experience with music than just sit alone on my couch and listen to it. 

Of course, with shelter-in-place, I haven’t been given a choice between attending concerts and watching livestreamed shows. It’s clear that without a choice, I’ll take whatever approximation of live music I can find.

 

What it takes to get to a concert

Ticket buying in the modern era is pretty brutal. You find out your favorite artist is coming to town, and with any luck, you discover this before the tickets go on sale. Then you start planning: you set up a calendar reminder with a link to the ticketing site. If there are presales, you track down the codes by asking friends or checking your email (if you’re a dedicated concertgoer, you probably get emails from promoters, venues, and maybe even your favorite artists’ fan clubs).

Then you get ready, mouse pointer cued up at 9:59, waiting until tickets go on sale. The time flips, it’s 10:00 AM and you click! Prepared to quickly select 2, best available (or GA floor, because who wants a balcony seat), and add to cart. But wait! You see the dreaded message. You’re in a queue. Now all you can do is desperately stare at the webpage, hoping nothing changes. What if a browser extension interferes? What if your browser freezes up? Finally, you’re out of the queue. You go to select your tickets, but wait. GA is all gone. All that’s left is the seated Loge. For a band that you dance to. Or worse, it’s already sold out. All that time, all that anxiety, all that preparation, only to get shut down. 

And that’s just the presale. You’ll do the whole thing over again at the next presale, or during the general onsale, hoping that the artist and the venue were strategic enough to set aside some tickets for each sale. If it comes down to it, you might have to show up at the venue an hour or more before the show starts to get one of the limited tickets available at the door. 

That’s everything a dedicated concertgoer goes through to get concert tickets. Thing is, according to Kaitlyn Tiffany in The Atlantic, that’s also what modern ticket scalpers do.

This week was a brutal one for ticket sales for me and my friends. A show at a 2000+ capacity venue sold out within a few minutes of the presale, and a second show added later also sold out within minutes. The Format announced their first live dates in years: two shows each in NYC and Chicago, and one in Phoenix. Presale tickets for every show sold out within a minute, except in Phoenix, where the sale was plagued by ticket website issues but still sold out by the end of the day. By the time general ticket sales started, the band had announced an additional show in each city. Those general sales also sold out within minutes, and Phoenix ended up with a third show before the day was over. 

How does it happen? And why do we put ourselves through this?! 

It’s important to note that buying concert tickets at all is a privilege. Some people (like me) make it a lifestyle to go to concerts and DJ sets. Others save their money and spend big to get great seats to see favorite artists in arena shows. But it takes money, time, and a bit of luck (or planning) to get tickets and get to a show. 

Whether or not you manage to get tickets to a show depends on several factors: 

  • Did you hear about the show before the tickets went on sale?
  • Did you have enough money at the time tickets went on sale (and in general) to afford the tickets?
  • Is your work schedule stable enough to know that you can go to the show if you buy tickets immediately when they go on sale?

If any one of these factors doesn’t work out, then you don’t have tickets to the show. Whether or not you get the opportunity to see an artist perform in concert at all is up to a whole other set of factors, subject to the careful strategies of the music industry combined with the artistic whims of the performers. 

If an artist doesn’t have a big enough fanbase in your city, and if it isn’t geographically convenient with available music venues, the artist probably won’t stop in your city. Even if they stop, the venue size can play a crucial role in whether or not you’ll get tickets to the show—will they be available, and will you even want them? 

Artists, especially after they’ve “gotten big”, can crave smaller, more intimate shows. But those are the shows that tend to sell out in a minute, especially if the fanbase in a certain city is larger than anticipated, or if the artist is playing only a limited number of shows and ends up drawing people from beyond a venue’s usual reach.

Other times, artists analyze the size of their fanbase in a city and then choose a venue without considering whether its size suits their type of music. Bon Iver played 20,000+ seat arenas on their last tour, even though they’re famous for their intimate music, and YouTube videos of Justin Vernon playing to just one fan have hundreds of thousands of views. Even if an artist’s fanbase is large enough to fill an arena, the fans still might not want to buy tickets to see them in one. 

Beyond those considerations, artists can’t always play the venues they want to play due to promoter restrictions or other industry partnerships, sometimes leading to uncharacteristic bookings at oddly-sized or oddly-shaped venues: DJs playing a concert hall, rock bands in a semi-seated venue, or possibly even skipping a city entirely. 

The venue an artist chooses (or is forced to choose) can be a key factor when you’re deciding if you want to get tickets. But the artist (and their tour manager, and others) have still more to do before this concert happens. 

Then the ticket prices have to be set. Venues and promoters surely have costs and prices that end up as effective ticket minimums for many shows, but artists have some influence as well. Some high-profile artists, such as Taylor Swift, have chosen on past tours to make affordable tickets available to their fans.

And therein lies the rub: artists can price competitively, or highly, knowing they can charge a certain price and still sell out their show (or nearly sell it out). But they can also price affordably, hoping that legitimate fans will be able to snap up tickets when they go on sale, rather than delaying their purchase and being forced to buy from scalpers. 

OK, so we’re still trying to buy these concert tickets. You’ve heard about the show, the artist has booked the venue and priced the tickets, you’ve got the money, you’ve got the time, and you’re ready at 10 AM on a Friday (or a Wednesday or Thursday for those sweet, sweet presale tickets). Where are you buying your tickets?

Ticket sites range from the homegrown (see: Bottom of the Hill) to new kids on the block (Big Neon, Tixr), budding behemoths (Eventbrite, AXS, Etix), and the (despised) old guard (TicketWeb/Ticketmaster/LiveNation). If you’re rushing to buy tickets online, you also need to prepare for the site experience. 

If it isn’t a site you’ve used before, consider whether it requires an account to buy tickets. If it does, you have to create one and make sure you’re signed in before you try to buy. You also want to consider whether the show is big enough that you’ll end up in a queue, and whether the site is reliable enough to handle a lot of people trying to buy tickets at once without crashing or throwing an error. 

Beyond site reliability, you have to consider your personal threshold for every ticket-buyer’s worst nightmare: fees. Almost every ticket purchase includes fees. How high do the fees need to be before you abandon your ticket purchase entirely? 

Of course, the irony of paying ticket fees is that most fans (myself included) dislike paying them because for so long the fees have been hidden: last-minute additions to your total, spiking the cost of $35 tickets to $60 at times. But it can be argued that transparently disclosed fees are acceptable, and even necessary to provide a resilient, secure, reliable ticketing site, as well as to pay the promoters working hard to make sure your favorite band actually stops in your city.

Artists, promoters, venues, and ticketing sites do a lot to try to prevent ticket scalpers from swarming the market, selling out a show in minutes, and relisting the tickets minutes later at unbelievable prices. Their efforts include innovations in ticket technology, new marketplaces, and just plain making it harder to get tickets.

What makes a ticket purchaser legitimate? Probably some degree of purchasing tickets in a specific geographic region and in clusters of genres, likely combined with some fraud analysis. That makes me wonder how suspicious my own ticket-purchasing habits must look to the algorithms at times. And as long as we’re attempting to define what a legitimate ticket purchaser looks like, we can also consider who deserves the presale codes for shows.

There’s a notion that only “real fans” deserve first access to presale codes and tickets. But how do you verify and validate true fans? You could use specific digital consumption patterns, such as those that are probably used to give out Spotify presale codes, but those are limited to listening habits that are directly observable in digital data. Artists want people to buy tickets to their shows, which is why presale codes are often straightforward to track down.

Most often, getting tickets to a show is a matter of knowing the right people at the right time, people who might have information you don’t. Songkick is there to fill in the gaps, alongside emails and texts from promoters and venues. But ultimately, nothing beats having a community of fans. And that was what fascinated me about the article in The Atlantic about modern ticket scalpers: my friends and I use many of the same tactics to buy tickets. It’s a privilege and a challenge to get the tickets we want, but we love going to concerts. And often, it feels like it’s the only way these days we can help artists make money. 

Problems with Indexing Datasets like Web Pages

Google has created a dataset search for researchers, or for the average person, looking for datasets. On the one hand, this is a cool idea. Datasets are hard to find in many cases, and this ostensibly makes the datasets and accompanying research easier to find.
In my opinion, this dataset search is problematic for two main reasons.

1. Positioning Google as a one-stop-shop for research is risky.

There’s consistent evidence that many people (especially college students who don’t work with their library) start and end their research with Google, rather than using scholarly databases, limiting the potential quality of their research. (There’s also something to be said here about the limiting of access to quality research behind exploitative and exclusionary paywalls, but that’s for another discussion).
Google’s business goal of being the first and last stop for information hunts makes sense for them as a company. But such a goal doesn’t necessarily improve academic research, or the knowledge that people derive based on information returned from search results.

2. Datasets without datasheets easily lead to bias.

The dataset search is clearly focused on indexing as many datasets as possible and making them more available. The cost of that is continued sloppy data analysis and research, due to the lack of standardized Datasheets for Datasets (for example) that fully expose the contents and limitations of datasets.
The existing information about these datasets follows a schema filled in by the dataset author or, perhaps more specifically, by the site hosting the dataset. It’s encouraging that datasets have dates associated with them, but I’m curious where the dataset descriptions are coming from.
Only the name and description fields are required before a dataset appears in the search, which limits how much the search can tell you about any dataset. Is the description for a given dataset any higher quality than the Knowledge Panels that show up in some Google search results? How can we as users independently validate the accuracy of the dataset schema information?
The quality and detail of the description field vary widely across datasets (I did a cursory scan of the results of a keyword search for “cheese”), indicating that a required plain-text field doesn’t do much to assure quality or useful information.
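
Google’s dataset search reads schema.org Dataset markup that sites embed in their pages. The Python sketch below builds a hypothetical example of that markup as JSON-LD with just the two required fields filled in, plus a hypothetical missing_required check. It only tests that the fields exist, which is the point: a vague description satisfies the requirement just as well as a thorough one.

```python
import json

# The two properties described above as required for a dataset to appear.
REQUIRED = ("name", "description")

def missing_required(markup: dict) -> list:
    """Return required properties that are absent or blank (presence check only)."""
    return [key for key in REQUIRED if not str(markup.get(key, "")).strip()]

# Hypothetical markup whose description says almost nothing useful.
vague = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Cheese data",
    "description": "This dataset contains data about cheese that may be useful to researchers.",
}

print(json.dumps(vague, indent=2))
print(missing_required(vague))  # [] : passes, yet says nothing about coverage or collection
```

Nothing in a presence check can tell a careful description from a careless one, so quality has to be assessed some other way.
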
When datasets are easier to find, that can lead to better data insights for data analysts. However, it can just as easily lead to off-base analyses if someone misuses data that they found based on a keyword search, either intentionally or, more likely, because they don’t fully understand the limitations of a dataset.
Some vital limitations to understand when selecting a dataset for data analysis include:
  • What does the data cover?
  • Who collected the data?
  • For what purpose was the data collected?
  • What features exist in the data?
  • Which fields were collected and which were derived?
  • If fields were derived, how were they derived?
  • What assumptions were made when collecting the data?
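
Datasheets for Datasets proposes documenting a dataset by answering questions like these. As a minimal Python sketch, not that proposal’s actual format, the questions above can be recorded alongside a dataset and checked for gaps; the Datasheet class and its field names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class Datasheet:
    """Hypothetical datasheet: one field per question in the list above."""
    coverage: Optional[str] = None              # What does the data cover?
    collected_by: Optional[str] = None          # Who collected the data?
    purpose: Optional[str] = None               # For what purpose was it collected?
    features: Optional[str] = None              # What features exist in the data?
    collected_vs_derived: Optional[str] = None  # Which fields were collected vs. derived?
    derivation: Optional[str] = None            # How were derived fields derived?
    assumptions: Optional[str] = None           # What assumptions were made?

def unanswered(sheet: Datasheet) -> list:
    """Return the names of questions the datasheet leaves blank."""
    return [f.name for f in fields(sheet) if not getattr(sheet, f.name)]

sheet = Datasheet(coverage="Cheese sales, 2015 to 2017", collected_by="A hypothetical retailer")
print(unanswered(sheet))
# ['purpose', 'features', 'collected_vs_derived', 'derivation', 'assumptions']
```

Surfacing the unanswered questions next to a search result is one way limitations could be made as visible as the datasets themselves.
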

Until these limitations are made as visible as the datasets themselves, I struggle to feel encouraged by this dataset search in its current form.

Ultimately, making information more easily accessible while removing or obscuring indicators that can help researchers assess the quality of the information is risky and creates new burdens for researchers.

Unbiased data analysis with the data-to-everything platform: unpacking the Splunk rebrand in an era of ethical data concerns

Splunk software provides powerful data collection, analysis, and reporting functionality. The new slogan, “data is for doing,” alongside taglines like “the data-to-everything platform” and “turn data into answers,” aims to bring the company to the forefront of data powerhouses, where it rightly belongs (I’m biased: I work for Splunk).

There is nuance in those phrases that can’t be adequately expressed in marketing materials but that is crucial for doing ethical and unbiased data analysis, helping you find better answers in your data and do even better things with them.

Start with the question

If you start analyzing data without understanding the question you’re trying to answer, you’re going to have a bad time. This is something I really appreciate about moving away from the slogan “listen to your data” (even though I love a good music pun). Listening to your data implies that you should start with the data, when in fact you should start with what you want to know and why you want to know it. You start with a question.

Data analysis starts with a question, and because I’m me, I want to answer a fairly complex question: what kind of music do I like to listen to? This overall question directs my data analysis, much as an objective function in data science defines what an analysis is trying to achieve. But first, I want to evaluate my question. If I’m going to turn my data into doing, I want to consider the ethics and the bias of my question.

Consider what you want to know, and why you want to know it, so that you can consider the ethics of the question. 

  • Is this question ethical to ask? 
  • Is it ethical to use data to answer it? 
  • Could you ask a different question that would be more ethical and still help you find useful, actionable answers? 
  • Does your question contain inherent bias? 
  • How might the biases in your question affect the results of your data analysis? 

Questions like “How can we identify fans of this artist so that we can charge them more money for tickets?” or “What’s the highest fee that we can add to tickets where people will still buy the tickets?” could be good for business, or help increase profits, but they’re unethical. You’d be using data to take actions that are unfair, unequal, and unethical. Just because Splunk software can help you bring data to everything doesn’t mean that you should. 

Break down the question into answerable pieces

If I’ve decided that it’s ethical to use data to help answer my question, then it’s time to consider how I’ll perform my data analysis. Before I try to answer the question, I want to consider the following:

  • Is this question small enough to answer with data?
  • What data do I need to help me answer this question?
  • How much data do I need to help me answer this question?

I can turn data into answers, but I have to be careful about the answers that I look for. If I don’t consider the small questions that make up the big question, I might end up with biased answers. (For more on this, see my .conf17 talk with Celeste Tretto).

So if I consider “What kind of music do I like to listen to?”, I might recognize right away that the question is too broad. There are many things that could change the answer to that question. I’ll want to consider how my subjective preferences (what I like listening to) might change depending on what I’m doing at the time: commuting, working out, writing technical documentation, or hanging out on the couch. I need to break the question down further. 

A list of questions that might help me answer my overall question could be: 

  • What music do I listen to while I’m working? When am I usually working?
  • What music do I listen to while I’m commuting? When am I usually commuting?
  • What music do I listen to when I’m relaxing? When am I usually relaxing?
  • What are some characteristics of the music that I listen to?
  • What music do I listen to more frequently than other music?
  • What music have I purchased or added to a library? 
  • What information about my music taste isn’t captured in data?
  • Do I like all the music that I listen to?
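
Several of these questions depend on knowing what I was doing while I listened. A minimal Python sketch of one way to approach that, assuming a hypothetical weekday schedule (the hours and the activity_at helper are made up, not drawn from any real data source), labels each listen by activity and then counts listens per activity and track:

```python
from collections import Counter
from datetime import datetime

def activity_at(ts: datetime) -> str:
    """Guess what I was doing at a given time from a hypothetical weekday schedule."""
    if ts.weekday() >= 5:       # Saturday or Sunday
        return "relaxing"
    if 9 <= ts.hour < 17:       # assumed working hours
        return "working"
    if ts.hour in (8, 17):      # assumed commute hours
        return "commuting"
    return "relaxing"

# Hypothetical listens: (timestamp, track)
listens = [
    (datetime(2019, 3, 4, 8, 30), "track_a"),   # Monday, during the assumed commute
    (datetime(2019, 3, 4, 10, 15), "track_b"),  # Monday, during assumed working hours
    (datetime(2019, 3, 9, 10, 15), "track_b"),  # Saturday
]

by_activity = Counter((activity_at(ts), track) for ts, track in listens)
print(by_activity)
```

A fixed schedule is a crude stand-in for what I was actually doing, which is exactly the kind of assumption worth recording next to the results.
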

As I’m breaking down the larger question of “What kind of music do I like to listen to?”, the most important question I can ask is “What kind of music do I think I like to listen to?”. This question matters because data analysis isn’t as simple as turning data into answers. That can make for catchy marketing, but the nuance here lies in using the data you have to reduce uncertainty about what you think the answer might be. The book How to Measure Anything by Douglas Hubbard covers this concept of data analysis as uncertainty reduction in great detail, but essentially the crux is that for a sufficiently valuable and complex question, there is no single objective answer (or else we would’ve found it already!). 

So I must consider, right at the start, what I think the answer (or answers) to my overall question might be. Since I want to know what kind of music I like, I want to ask myself what kind of music I think I like. Because “liking” and “kind of music” are subjective, there can be no single, objectively true answer. Very few complex questions, if any, have objectively true answers, especially answers that can be found in data. 

So I can’t turn data into answers for my overall question, “What kind of music do I like?”, but I can turn it into answers for simpler questions that are rooted in fact. The questions I listed earlier are much easier to answer with data, with relative certainty, because I broke up the complex, somewhat subjective question into many objective questions. 

Consider the data you have

After you have your questions, look for the answers! Consider the data that you have, and whether or not it is sufficient and appropriate to answer the questions. 

The flexibility of Splunk software means that you don’t have to decide which questions you’ll ask of the data before you ingest it. Whether your data is structured or unstructured, you can ask questions of it, but you might have to work harder to fully understand the context of the data to interpret it accurately. 

Before you analyze and interpret the data, you’ll want to gather context about the data, like:

  • Is the dataset complete? If not, what data is missing?
  • Is the data correct? If not, in what ways could it be biased or inaccurate?
  • Is the data similar to other datasets you’re using? If not, how is it different?

This additional metadata (data about your datasets) can provide crucial context necessary to accurately analyze and interpret data in an unbiased way. For example, if I know there is data missing in my analysis, I need to consider how to account for that missing data. I can add additional (relevant and useful) data, or I can acknowledge how the missing data might or might not affect the answers I get.
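
As one small illustration of the first question, “Is the dataset complete?”, this Python sketch (the missing_days helper and the dates are hypothetical) lists the days in a range with no recorded listens:

```python
from datetime import date, timedelta

def missing_days(listen_dates, start: date, end: date) -> list:
    """Return every day in [start, end] that has no recorded listens."""
    seen = set(listen_dates)
    days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [d for d in days if d not in seen]

# Hypothetical listen dates (one entry per listen, so duplicates are expected).
listens = [date(2019, 3, 1), date(2019, 3, 2), date(2019, 3, 2), date(2019, 3, 5)]
print(missing_days(listens, date(2019, 3, 1), date(2019, 3, 5)))
# [datetime.date(2019, 3, 3), datetime.date(2019, 3, 4)]
```

The code can only find the gaps. Whether a gap means I didn’t listen to anything or that the tracking wasn’t running is the context I have to supply myself.
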

After gathering context about your datasets, you’ll also want to consider if the data is appropriate to answer the question(s) that you want to answer. 

In my case, I’ll want to assess the following aspects of the datasets: 

  • Is using the audio features API data from Spotify the best way to identify characteristics in music I listen to? 
  • Could another dataset be better? 
  • Should I make my own dataset? 
  • Does the data available to me align with what matters for my data analysis? 

You can see a small example of this in the Vox YouTube series Earworm, where the journalist Matt Daniels of The Pudding considers which data is relevant to the question “How popular is male falsetto?”, starting at 1:45 in this clip. For about 90 seconds, Matt and the show’s host, Estelle Caswell, discuss how they selected the right data, including the size of the dataset; they eventually chose a smaller but more relevant dataset to answer their question. 

Is more data always better? 

Data is valuable when it’s in context and applied with consideration for the problem that I’m trying to solve. Collecting data about my schedule may seem overly intrusive or irrelevant, but applied to the broader question “what kind of music do I like to listen to?”, it can add valuable insights and possibly shift the overall answer, because I’ve applied that additional data with that question in mind.

Splunk published a white paper to accompany the rebranding, and it contains some excellent points. One of them that I want to explore further is the question:

“how complete, how smart, are these decisions if you’re ignoring vast swaths of your data?” 

On the one hand, having more data available can be valuable. I am able to get a more valuable answer to “what kind of music do I like” because I’m able to consider additional, seemingly irrelevant data about how I spend my time while I’m listening to music. However, there are many times when you want to ignore vast swaths of your data. 

The most important aspect to consider when adding data to your analysis is not quantity, but quality. Rather than focusing on how much data you might be ignoring, I’d suggest instead focusing on which data you might be ignoring, for which questions, and affecting which answers. You might have a lot of ignored data, but put your focus on the small amount of data that can make a big difference in the answers you find in the data.

As the academics in “I got more data, my model is more refined, but my estimator is getting worse! Am I just dumb?” make clear with their crucial finding:

“More data lead to better conclusions only when we know how to take advantage of their information. In other words, size does matter, but only if it is used appropriately.”

The most important aspect of adding data to an analysis is exactly as the academics point out: it’s only more helpful if you know what to do with it. If you aren’t sure how to use additional data you have access to, it can distract you from what you’re trying to answer, or even make it harder to find useful answers because of the scale of the data you’re attempting to analyze. 

Douglas Hubbard in the book How to Measure Anything makes the case that doing data analysis is not about gathering the most data possible to produce the best answer possible. Instead, it’s about measuring to reduce uncertainty in the possible answers and measuring only what you need to know to make a better decision (based on the results of your data analysis). As a result, such a focused analysis often doesn’t require large amounts of data — rough calculations and small samples of data are often enough. More data might lead to greater precision in your answer, but it’s a tradeoff between time, effort, cost, and precision. (I also blogged about the high-level concepts in the book).
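
How to Measure Anything illustrates this with what it calls the Rule of Five: for a random sample of five values from any population, there is a 93.75% chance that the population’s median lies between the smallest and largest sampled value. The Python sketch below computes that probability exactly and then checks it by simulation on a made-up, skewed population (hypothetical minutes of music per day); the function names are my own.

```python
import random
import statistics

def exact_coverage(n: int) -> float:
    """Chance that the population median falls between the min and max of n random
    samples. It misses only if all n samples land on the same side of the median,
    which happens with probability 2 * (1/2) ** n."""
    return 1 - 2 * 0.5 ** n

def simulated_coverage(population, n=5, trials=20_000, seed=1):
    """Estimate the same probability by repeatedly sampling from a population."""
    rng = random.Random(seed)
    median = statistics.median(population)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, n)
        if min(sample) <= median <= max(sample):
            hits += 1
    return hits / trials

# Hypothetical, skewed population: minutes of music per day over many days.
pop_rng = random.Random(0)
population = [pop_rng.lognormvariate(3, 1) for _ in range(10_001)]

print(exact_coverage(5))  # 0.9375
print(round(simulated_coverage(population), 3))
```

The simulated value should land close to the exact one, even though five samples say almost nothing about the population’s precise shape. That is the tradeoff Hubbard describes: a rough but useful reduction in uncertainty for very little data.
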

If I want to answer my question “What kind of music do I like to listen to?” I don’t need the listening data of every user on the Last.fm service, nor do I need metadata for songs I’ve never heard to help me identify song characteristics I might like. Because I want to answer a specific question, it’s important that I identify the specific data that I need to answer it—restricted by affected user, existence in another dataset, time range, type, or whatever else.
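
In code, identifying the specific data can be as plain as filtering before any analysis happens. This Python sketch uses hypothetical records and a hypothetical needed_for_question helper, restricting by user and time range:

```python
from datetime import datetime

# Hypothetical records from a service that tracks many users' listening.
records = [
    {"user": "me", "track": "track_a", "ts": datetime(2019, 3, 4, 10)},
    {"user": "someone_else", "track": "track_b", "ts": datetime(2019, 3, 4, 11)},
    {"user": "me", "track": "track_c", "ts": datetime(2017, 6, 1, 9)},
]

def needed_for_question(records, user, start, end):
    """Keep only what the question needs: one user's listens in [start, end)."""
    return [r for r in records if r["user"] == user and start <= r["ts"] < end]

subset = needed_for_question(records, "me", datetime(2019, 1, 1), datetime(2020, 1, 1))
print([r["track"] for r in subset])  # ['track_a']
```

Filtering first keeps the rest of the analysis small and makes explicit which data the answer depends on.
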

If you want more evidence, the notion that more data is always better is also neatly upended by the Nielsen-Norman Group in Why You Only Need to Test with 5 Users and the follow-up How Many Test Users in a Usability Study?.
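
The reasoning in those articles rests on a simple model from Jakob Nielsen and Tom Landauer: the share of usability problems found with n test users is 1 - (1 - L)^n, where L is the share a single user uncovers, which Nielsen reports as averaging about 31% across projects. A short Python sketch of that formula, assuming the 31% default, shows the diminishing returns:

```python
def share_found(n_users: int, l: float = 0.31) -> float:
    """Nielsen and Landauer's model: proportion of usability problems found with
    n_users test users, where l is the proportion a single user finds."""
    return 1 - (1 - l) ** n_users

for n in (1, 3, 5, 15):
    print(n, round(share_found(n), 2))
```

With five users the model gives roughly 0.84, in line with the articles’ figure of about 85%, and each additional user adds less than the one before.
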

Keep context alongside the data

The white paper talks about bringing people to a world where they can take action without worrying about where their data is or where it comes from. But it’s still important to consider where the data comes from, even if using Splunk software means you don’t have to worry about where it lives. Keeping context about the data alongside the data is part of doing data analysis well.

For example, it’s important for me to keep track of the fact that the song characteristics I might use to identify the type of music I like come from a dataset crafted by Spotify, and that my listening behavior is tracked by the service Last.fm. Last.fm can only track certain types of listening behavior on certain devices, and Spotify has its own biases in how it defines a set of audio characteristics.

If I lose track of this seemingly mundane context when analyzing my data, I might misinterpret it or draw inaccurate conclusions about what kind of music I like to listen to, purely because of the limitations of the data available to me. If I don’t know where my data is coming from, or what it represents, it’s easy to find biased answers to questions, even though I’m using data to answer them.

Having more data than you need also makes keeping context close to your data more difficult: the more data, the more room for error when tracking what it means. Splunk software includes metadata fields that can help you keep some context with the data, such as where it came from, but you need to track other types of context yourself.
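
Here is a plain Python sketch of that idea, not a Splunk API: each hypothetical Record carries a source and a known caveat, loosely echoing the source metadata field Splunk attaches to events, so the limitations travel with the data when datasets are combined.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    """A data point that carries its own context (hypothetical structure)."""
    data: dict
    source: str       # where the data came from
    caveat: str = ""  # known limitations of that source

def caveats_by_source(records):
    """Collect the known limitations of every source feeding an analysis."""
    out = {}
    for r in records:
        out.setdefault(r.source, set()).add(r.caveat)
    return out

listens = [Record({"track": "track_a"}, "Last.fm",
                  "only tracks certain listening behavior on certain devices")]
features = [Record({"track": "track_a", "energy": 0.4}, "Spotify audio features",
                   "Spotify's own definitions of each characteristic")]

for source, notes in caveats_by_source(listens + features).items():
    print(source, "->", sorted(notes))
```

The caveat strings are written by hand; the code only keeps them from getting separated from the data they describe.
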

More data can not only complicate your analysis but also create security and privacy concerns, especially if you keep a lot of data around for longer than you need it. If I want to know what kind of music I like to listen to, I might be comfortable doing data analysis to answer that question, identifying the characteristics of music that I like, and then deleting all of the raw data that led me to that conclusion out of privacy or security concerns. Or I could drop the metadata for most of the songs I’ve ever listened to and keep only the metadata for the songs my analysis relies on. Either way, I’d want to consider, again, how much data I really need to keep around. 
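
A minimal Python sketch of the summarize-then-delete approach, with a hypothetical summarize_then_discard helper, keeps only the most-played tracks and then empties the raw list. Clearing a list in memory is not secure deletion: copies on disk, in backups, or held by a streaming service need their own handling.

```python
from collections import Counter

def summarize_then_discard(raw_listens: list, top_n: int = 3) -> list:
    """Keep only the top_n most-played tracks, then empty the raw list in place.

    This only clears the in-memory list; it does not erase copies elsewhere.
    """
    counts = Counter(listen["track"] for listen in raw_listens)
    summary = [track for track, _ in counts.most_common(top_n)]
    raw_listens.clear()
    return summary

raw = [{"track": t} for t in ["track_a", "track_b", "track_a", "track_c", "track_a", "track_b"]]
print(summarize_then_discard(raw, top_n=2))  # ['track_a', 'track_b']
print(raw)                                   # []
```
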

Turn data into answers—mostly

So I’ve broken down my overall question into smaller, more answerable questions, I’ve considered the data I have, and I’ve kept the context alongside the data I have. Now I can finally turn it into answers, just like I was promised!

It turns out I can take a corpus of my personal listening data and combine it with a dataset of my personal music libraries to weight the songs in the listening dataset. I can also assess the frequency of listens to further weight the songs in my analysis and formulate a ranking of songs in order of how much I like them. I’d probably also want to split that ranking by what I was doing while I was listening to the music, to eliminate outliers from the dataset that might bias the results. All the small questions that feed into the overall question are coming to life.
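
A minimal Python sketch of that weighting, assuming listens have already been labeled with an activity and that a song in my library counts double (the LIBRARY_WEIGHT of 2.0 and the rank_by_activity helper are hypothetical choices, not a recommended value):

```python
from collections import defaultdict

# Hypothetical weight: songs I bought or added to a library count double.
LIBRARY_WEIGHT = 2.0

def rank_by_activity(listens, library):
    """Rank tracks per activity by weighted listen count.

    listens: iterable of (activity, track) pairs, one per listen.
    library: set of tracks I purchased or added to a library.
    Returns {activity: [tracks, highest weighted count first]}.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for activity, track in listens:
        scores[activity][track] += LIBRARY_WEIGHT if track in library else 1.0
    return {
        activity: sorted(counts, key=lambda t: (-counts[t], t))
        for activity, counts in scores.items()
    }

listens = [("working", "track_a"), ("working", "track_b"), ("working", "track_b"),
           ("working", "track_a"), ("working", "track_c"), ("relaxing", "track_c")]
print(rank_by_activity(listens, library={"track_a"}))
# {'working': ['track_a', 'track_b', 'track_c'], 'relaxing': ['track_c']}
```

Without the library weight, track_a and track_b would tie; the weight is an assumption that ownership signals liking, and it directly changes the ranking.
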

After I have that ranking, I could use additional metadata from another source, such as the Spotify audio features API, to identify the characteristics of the top-ranked songs, and ostensibly then be able to answer my overall question: what kind of music do I like to listen to?
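
Here is a Python sketch of that last step. It averages a few characteristics across the top-ranked tracks. The feature names match fields in Spotify’s audio features objects, but the values and the feature_profile helper are made up for illustration.

```python
from statistics import mean

# Field names from Spotify's audio features objects; values below are made up.
FEATURES = ("danceability", "energy", "valence")

def feature_profile(top_tracks, audio_features):
    """Average each feature across the top-ranked tracks that have feature data."""
    rows = [audio_features[t] for t in top_tracks if t in audio_features]
    if not rows:
        return {}
    return {f: round(mean(row[f] for row in rows), 3) for f in FEATURES}

audio_features = {
    "track_a": {"danceability": 0.8, "energy": 0.6, "valence": 0.4},
    "track_b": {"danceability": 0.6, "energy": 0.4, "valence": 0.2},
}
# track_x has no feature data, so it is silently left out: a coverage gap worth noting.
print(feature_profile(["track_a", "track_b", "track_x"], audio_features))
# {'danceability': 0.7, 'energy': 0.5, 'valence': 0.3}
```

The profile inherits every assumption upstream: the activity labels, the library weight, and Spotify’s definitions of each characteristic.
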

By following all these steps, I turned my data into answers! And now I can turn my data into doing, by taking action on those characteristics. I can of course seek out new music based on those characteristics, but I can also book the ideal DJs for my birthday party, create or join a community of music lovers with similar taste in music, or even delete any music from my library that doesn’t match those characteristics. Maybe the only action I would take is self-reflection, and see if what the data has “told” me is in line with what I think is true about myself.

It is possible to turn data into answers, and turn data into doing, with caution and attention to all the ways that bias can be introduced into the data analysis process. But there’s still one more way that data analysis could result in biased outcomes: communicating results. 

Carefully communicate data findings

After I find the answers in my data, I need to carefully communicate them to avoid bias. If I want to tell all my friends that I figured out what kind of music I like to listen to, I want to make sure that I’m telling them that carefully so that they can take the appropriate and ethical action in response to what I tell them. 

I’ll want to present the answers in context. I need to describe the findings with the relevant qualifiers: I like music with these specific characteristics, and when I say I like this music I mean this is the kind of music that I listen to while doing things I enjoy, like working out, writing, or sitting on my couch. 

I also need to make clear what kind of action might be appropriate or ethical to take in response to this information. Maybe I want to find more music that has these characteristics, or I’d like to expand my taste, or I want to see some live shows and DJ sets that feature music with these characteristics. Actions that support those ends would be appropriate, but the same information could also be used unethically. What if someone learns of these characteristics and then chooses to charge me more money than other people (whose taste in music is unknown) to see specific DJ sets or concerts featuring music with those characteristics? 

Data, per the white paper, “must be brought not only to every action and decision, but to every department.” Because of that, it’s important to consider how that happens. Share relevant parts of the process that led to the answers you found from the data. Communicate the results in a way that can be easily understood by your audience. This Medium post by Cecelia Shao, a product manager at Comet.ml, covers important points about how to communicate the results of data analysis. 

Use data for good

I wanted to talk through the data analysis process in the context of the rebranded slogans and marketing content so that I could unpack additional nuance that marketing content can’t convey. I know how easy it is to introduce bias into data analysis, and how easily data analysis can be applied to unethical questions, or used to take unethical actions.

As the white paper aptly points out, the value of data is not merely in having it, but in how you use it to create positive outcomes. You need to be sure you’re using data safely and intelligently, because with great access to data comes great responsibility. 

Go forth and use the data-to-everything platform to turn data into doing…the right thing. 

Disclosure: I work for Splunk. Thanks to my colleagues Chris Gales, Erica Chen, and Richard Brewer-Hay for the feedback on drafts of this post. While colleagues reviewed this post and provided feedback, the content is my own and represents my own views rather than those of Splunk the company. 

Streaming, the cloud, and music interactions: are libraries a thing of the past?

Several years ago I wrote about fragmented music libraries and music discovery. In light of the overwhelming popularity of Spotify and the dominance of streaming music (Spotify, Apple Music, Amazon Music, Tidal, and others), I’m curious if music libraries even exist anymore. Or, if they exist today, will they continue to exist? 

My guess is that the only people still maintaining music libraries are DJs, fervent music fans (like myself), or people that aren’t using streaming music at all (due to age, lack of interest, or lack of availability due to markets or internet speeds). 

I was chatting with a friend of mine who has a collection of vinyl records, but she only listens to vinyl when she’s relaxing on the weekend. Most of the time she just asks Alexa to play some music, without much attention to where that music is coming from. With Amazon Music bundled into Amazon Prime for many members, people can be totally unaware that they’re using a streaming service at all. I’d hazard that this interaction pattern is true for most people, especially those who never enjoyed maintaining a music library and only collected CDs and records because that was the only way to listen to music at all. 

Even my own habits are changing, perhaps as much because of time constraints as because of current music technology services. I used to carefully curate playlists for sharing with others, for listening in the car, for mix CDs, and for radio shows. These days I make playlists for many of those same purposes on Spotify, but the songs in my “actual” music library (iTunes) aren’t categorized into playlists at all anymore, and I give the playlists I make on my iPhone random names like “Aaa yay” to make them easier to find, rather than to describe their contents. 

I’m limited by storage size in terms of what I can add to my iPhone, just like I was with my iPod, and that limit shapes my experience of the music. Since I’m limited to a smaller catalogue, I’m able to sit with the music more and create more distinct memories. There are still songs that remind me of being in Berlin in 2011, because I was limited to the songs I’d added to my iPod before I left the United States: the internet I had access to in Germany was too slow to download new music and add it to my iPod. 

Nowadays, I am less motivated to carefully manage my iTunes library because it’s on only one device, whereas I can access my Spotify library across multiple devices. Spotify is the library I find myself carefully organizing, creating folders of playlists and sorting tracks and playlists. A primary reason Spotify suits my listening habits is its social and collaborative nature. It’s easy to share tracks with others, make a playlist from a DJ set I went to and share it, contribute to a weekly collaborative playlist with a community of fellow music lovers, or follow playlists created by artists and DJs I love. My local library can give me a lot, but it can’t give me that community interaction.

Indeed, in 2015 that’s something I identified as lacking. I felt that it was harder to feel part of a music culture, writing:

“It’s harder than it used to be to feel connected with music. It’s not a stream or a subculture one is tapped into anymore, because it’s so distributed on the web. There’s so much music, and it lives in so many different services, that the music culture has imploded a bit.”

I feel completely differently these days, thanks to a vibrant live music community in San Francisco. I loathe Facebook, but the groups I’m part of on that site help me feel connected to a greater music scene and community that supplement my connection to music and music discovery. Ironically, Facebook groups have also helped my music culture experience become more local. The music blogs that I used to tap into are now largely defunct or have taken on multiple functions (The Burning Ear also running Vinyl Me, Please, or All Things Go also providing news and an annual festival in DC). Instead, another way I discover new music is by paying attention to the artists and DJs that people in these Facebook groups are talking about and posting tracks and albums from. 

Despite the challenges of a local music library, I keep buying digital music: partly because I promised myself when I was younger that I’d do so once I could afford to, partly to support musicians and producers, and partly because I don’t trust that streaming services will stick around with all the music I might want to listen to. I’d rather “own” it, at least as well as I can when it’s a digital file that risks deletion and decay over time. 

Music discovery in the past was equal parts discovery and collection, with a hefty dose of listening after I collected new music.

A flowchart showing Discover -> Collect -> Listen in a triangle, with Listen connecting back to Discover.

I’d do the following when discovering new music:

  • Writing down song lyrics while listening to the radio or while working my retail job, then later looking up the tracks to check out albums from the library to rip to my family computer.
  • Following music blogs like The Burning Ear, All Things Go, Earmilk, Stereogum, and Line of Best Fit, then downloading the tracks I liked best from MediaFire or MegaUpload links on their sites to save to my own library.
  • Trolling through illicit LiveJournal communities or invite-only torrent sites to download discographies for artists I already liked, or might like.

Over time, those music blogs shifted to using SoundCloud, the online communities and torrent sites shut down, and I started listening to more music on streaming sites instead. The loop stopped going from discovery to collection and instead went from discovery to a like and back to discovery again. 

Find a new track, listen, click the heart or the plus sign, and move on. Rarely do you remember to go back and listen to your fully-compiled list of saved tracks (or even if you do, trying to listen to the whole thing on shuffle will be limited by the web app, thanks SoundCloud). 

A flowchart showing a cycle from discover to like and back again using arrows.

This type of cycle is faster than the old one, more focused on engagement with the service than with the music, and less about collecting than about consuming. In some ways, downloading music was like this too. When I accidentally deleted my entire music library in 2012, the tatters I was able to recover from my iPod were a scant representation of my full collection, but that library had included discographies I would likely never have listened to. Years later, I’ve occasionally gone back and discovered that an artist I listen to now was in that graveyard of deleted songs, but even knowing that, I’m not sure I would’ve gotten to them any sooner. I was always collecting more than I was listening to. 

Streaming music lets me collect in the same way, but without the personal risk. It just makes me dependent on a third party that permits me to access the tracks it stores for me. I end up with lists of liked tracks across multiple services, none of which I fully control. These days my music discovery is largely driven by three services: Spotify, Shazam, and SoundCloud. Spotify pushes algorithmic recommendations to me, Shazam lets me identify the track a DJ is currently playing when I’m out at a set, and SoundCloud lets me listen to recorded DJ sets and offers excellent autoplay recommendations. In all of them I have lists of tracks that I may never revisit after saving them. Some I’ll never be able to revisit, because they’ve been deleted or the service has lost the rights to them. 

In 2015 I lamented the fragmentation of music discovery, but looking back, my music discovery was always spread across services, devices, and methods; the central iTunes library was what tied the radio songs, the library CDs, the discography downloads, and the music blog tracks together. The real issue is that today’s primary modes of music discovery are service-dependent, and each of those services provides its own construct of a music library. I mentioned in 2015 that:

“my library is all over the place. iTunes is still the main home of my music—I can afford to buy new music when I want —but I frequent Spotify and SoundCloud to check out new music. I sync my iTunes library to Google Play Music too, so I can listen to it at work.” 

While this is still largely true, these days I mostly use Spotify at work, listen to SoundCloud sets or tracks from iTunes when I’m on the go with my phone, and listen to Spotify or iTunes on my personal laptop. That’s essentially 2.5 places where I keep a music library, and while I maintain a purchase pipeline of tracks from Spotify and SoundCloud into my iTunes library, only a fraction of my discoveries make it into my collection for the long term. The days of a truly central library are long past. 

With all these cloud music services streaming music into our ears, keeping a local music library seems like a feat. Indeed, what’s the point of holding onto your local files when they become so difficult to access? iTunes is becoming the Apple Music app, with the Apple Music streaming service front and center. Spotify is, well, Spotify. SoundCloud continues to flounder, yet provides an essential service for underground music and DJ sets. Google Play Music exists, but it offers only a web-based player (no desktop client) for listening to your local library after you’ve mirrored it to the cloud. Streaming is convenient. But streaming lets others own your content for you, granting you subscription access to it at best and ruining the quality of your music listening experience at worst. 

A recent essay by Dave Holmes in Esquire describes “The Deleted Years”: the years when we stored music on iPods, a habit most of us have left behind since Spotify and other streaming services arrived. As he puts it:

“From 2003 to 2012, music was disposable and nothing survived.”

Perhaps it’s more true that from 2012 onward, music is omnipresent and yet more disposable. It can disappear into the void of a streaming service, and we’ll never even know we saved it. At least an abandoned iPod gives us a tangible record of our past habits. 

As Vicki Boykis wrote about SoundCloud in 2017:

“I’m worried that, for internet music culture, what’s coming is the loss of a place that offered innumerable avenues for creativity, for enjoyment, for discovery of music that couldn’t and wouldn’t be created anywhere else. And, like everyone who has ever invested enough emotion in an online space long enough to make it their own, I’m wondering what’s next.”

I’ll be here, discovering, collecting, liking, and listening for what’s next.

Music streaming and sovereignty

As the music industry moves away from downloads and toward building streaming platforms, international sovereignty becomes more of a barrier to people listening to music and discussing it with others, because they don’t have access to the same music on the same platforms. As Sean Michaels pointed out in The Morning News several years ago:

one of the undocumented glitches in the current internet is all its asymmetrical licensing rules. I can’t use Spotify in Canada (yet). Whenever I’m able to, there’s no guarantee that Spotify Canada’s music library will match Spotify America’s. Just as Netflix Canada is different than Netflix US, and YouTube won’t let me see Jon Stewart. As we move away from downloads and toward streaming, international sovereignty is going to become more and more of a barrier to common discussions of music.

Location has always been a challenge to music access, but it’s important to keep in mind that the internet and music streaming have not been an equitable boon to music access. Access is still controlled.