Streaming, the cloud, and music interactions: are libraries a thing of the past?

Several years ago I wrote about fragmented music libraries and music discovery. In light of the overwhelming popularity of Spotify and the dominance of streaming music (Spotify, Apple Music, Amazon Music, Tidal, and others), I’m curious if music libraries even exist anymore. Or, if they exist today, will they continue to exist? 

My guess is that the only people still maintaining music libraries are DJs, fervent music fans (like myself), or people who aren’t using streaming music at all (due to age, lack of interest, or lack of availability because of their market or internet speeds). 

I was chatting with a friend of mine who has a collection of vinyl records, but she only listens to vinyl when she’s relaxing on the weekend. Most of the time she just asks Alexa to play some music, without paying much attention to where that music is coming from. With Amazon Music bundled into Amazon Prime for many members, people can be totally unaware that they’re using a streaming service at all. I’d hazard that this interaction pattern holds for most people, especially those who never enjoyed maintaining a music library and instead collected CDs and records because that was the only way to listen to music at all. 

Even my own habits are changing, perhaps as much because of time constraints as because of current music services. I used to carefully curate playlists to share with others, to listen to in the car, to burn onto mix CDs, and to play on radio shows. These days I make playlists for many of those same purposes on Spotify, but the songs in my “actual” music library (iTunes) aren’t organized into playlists at all anymore, and I give the playlists I make on my iPhone random names like “Aaa yay” so they’re easier to find, rather than to describe their contents. 

I’m limited by storage size in terms of what I can add to my iPhone, just like I was with my iPod, but that limit shapes my experience of the music. Because I’m limited to a smaller catalogue, I’m able to sit with the music more and create more distinct memories. There are still songs that remind me of being in Berlin in 2011, when I was limited to the songs I’d added to my iPod before leaving the United States, because the internet I had access to in Germany was too slow to download new music. 

Nowadays, I am less motivated to carefully manage my iTunes library because it’s only on one device, whereas I can access my Spotify library across multiple devices. Spotify is the library I find myself carefully creating folders of playlists for, organizing and sorting tracks and playlists. A primary reason Spotify suits my listening habits is its social and collaborative nature. It’s easy to share tracks with others, make a playlist from a DJ set I attended and pass it around, contribute to a weekly collaborative playlist with a community of fellow music lovers, or follow playlists created by artists and DJs I love. My local library can give me a lot, but it can’t give me that community interaction.

Indeed, in 2015 that’s something I identified as lacking. I felt that it was harder to feel part of a music culture, writing:

“It’s harder than it used to be to feel connected with music. It’s not a stream or a subculture one is tapped into anymore, because it’s so distributed on the web. There’s so much music, and it lives in so many different services, that the music culture has imploded a bit.”

I feel completely differently these days, thanks to a vibrant live music community in San Francisco. I loathe Facebook, but the groups I’m part of on that site help me feel connected to a larger music scene and community, supplementing my connection to music and music discovery. Ironically, Facebook groups have also helped my music culture experience become more local. The music blogs I used to tap into are now largely defunct or have taken on multiple functions (The Burning Ear also running Vinyl Me, Please, or All Things Go also providing news and an annual festival in DC). So another way I now discover new music is by paying attention to the artists and DJs that people in these Facebook groups are talking about and posting tracks and albums from. 

Despite the challenges of maintaining a local music library, I keep buying digital music: partly because I promised myself when I was younger that I’d do so once I could afford to, partly to support musicians and producers, and partly because I don’t trust streaming services to stick around with all the music I might want to listen to. I’d rather “own” it, at least as best as I can when it’s a digital file that risks deletion and decomposition over time. 

Music discovery in the past was equal parts discovery and collection, with a hefty dose of listening after I collected new music.

A flowchart showing Discover -> Collect -> Listen in a triangle, with Listen connecting back to Discover.

I’d do the following when discovering new music:

  • Writing down song lyrics while listening to the radio or while working my retail job, then later looking up the tracks to check out albums from the library to rip to my family computer.
  • Following music blogs like The Burning Ear, All Things Go, Earmilk, Stereogum, and Line of Best Fit, then downloading my favorites from the MediaFire or MegaUpload links on their sites to save to my own library.
  • Trolling through illicit LiveJournal communities or invite-only torrent sites to download discographies for artists I already liked, or might like.

Over time, those music blogs shifted to using SoundCloud, the online communities and torrent sites shuttered, and I started listening to more music on streaming sites instead. The loop stopped running from discovery to collection and became discovery, like, and discovery again. 

Find a new track, listen, click the heart or the plus sign, and move on. Rarely do you remember to go back and listen to your fully-compiled list of saved tracks (or even if you do, trying to listen to the whole thing on shuffle will be limited by the web app, thanks SoundCloud). 

A flowchart showing a cycle from discover to like and back again using arrows.

This cycle is faster than the old one, more focused on engagement with the service than with the music, and more about consuming than collecting. In some ways, downloading music was like this too. When I accidentally deleted my entire music library in 2012, the tatters I was able to recover from my iPod were a scant representation of my full collection, but that full collection had included discographies I would likely never have listened to. Years later, there have been a few occasions when I’ve gone back and discovered that an artist I listen to now is in that graveyard of deleted songs, but even knowing that, I’m not sure I would’ve gotten to them any sooner. I was always collecting more than I was listening to. 

Streaming music lets me collect in the same way, but without the personal risk. It just makes me dependent on a third party that permits me to access the tracks it stores for me. I end up with lists of liked tracks across multiple different services, none of which I fully control. These days my music discovery is largely driven by three services: Spotify, Shazam, and SoundCloud. Spotify pushes algorithmic recommendations to me, Shazam tells me what track the DJ is playing when I’m out at a DJ set, and SoundCloud lets me listen to recorded DJ sets and has excellent autoplay recommendations. In all of them I have lists of tracks that I may never revisit after saving them. Some of them I’ll never be able to revisit, because they’ve been deleted or the service has lost the rights to them. 

In 2015 I lamented the fragmentation of music discovery, but looking back, my music discovery was always spread across services, devices, and methods. The central iTunes library was what tied the radio songs, the library CDs, the discography downloads, and the music blog tracks together. The real issue is that today’s primary modes of music discovery are service-dependent, and each of those services provides its own construct of a music library. I mentioned in 2015 that:

“my library is all over the place. iTunes is still the main home of my music—I can afford to buy new music when I want —but I frequent Spotify and SoundCloud to check out new music. I sync my iTunes library to Google Play Music too, so I can listen to it at work.” 

While this is still largely true, these days I mostly listen to Spotify when I’m at work, SoundCloud sets or tracks from iTunes when I’m on the go with my phone, and Spotify or iTunes when I’m on my personal laptop. That’s essentially 2.5 places where I keep a music library, and while I maintain a purchase pipeline of tracks from Spotify and SoundCloud into my iTunes library, only a fraction of my discoveries make it into my collection for the long term. The days of a truly central music library are long gone. 

It seems like a feat, with all these cloud music services streaming music into our ears, to maintain a local music library. Indeed, what’s the point of holding onto your local files when they’re becoming so difficult to access? iTunes is becoming the Apple Music app, with the Apple Music streaming service front and center. Spotify is, well, Spotify. And SoundCloud continues to flounder, yet provides an essential service for underground music and DJ sets. Google Play Music exists, but it only offers a web-based player (no desktop client) for listening to your local library after you’ve mirrored it to the cloud. Streaming is convenient. But streaming music lets others own your content for you, granting you subscription access to it at best and ruining the quality of your music listening experience at worst. 

A recent essay by Dave Holmes in Esquire talks about “The Deleted Years,” the years when we stored music on iPods, a habit most of us have moved on from since Spotify and other streaming services arrived. As he puts it, 

“From 2003 to 2012, music was disposable and nothing survived.”

Perhaps it’s more true that from 2012 onward, music is omnipresent and yet more disposable. It can disappear into the void of a streaming service, and we’ll never even know we saved it. At least an abandoned iPod gives us a tangible record of our past habits. 

As Vicki Boykis wrote about SoundCloud in 2017:

“I’m worried that, for internet music culture, what’s coming is the loss of a place that offered innumerable avenues for creativity, for enjoyment, for discovery of music that couldn’t and wouldn’t be created anywhere else. And, like everyone who has ever invested enough emotion in an online space long enough to make it their own, I’m wondering what’s next.”

I’ll be here, discovering, collecting, liking, and listening for what’s next.

Reflecting on a decade of (quantified) music listening

I recently crossed the 10-year mark of using Last.fm to track what I listen to.

From the first tape I owned (Train’s Drops of Jupiter) to the first CD (Cat Stevens Classics), the first album I discovered by roaming the stacks at the public library (The Most Serene Republic’s Underwater Cinematographer), the college radio station that shaped my adolescent music taste (WONC), and the college radio station that shaped my college experience (WESN); from tapes to CDs (with a radio Walkman all the while) to the radio in my car; from SoundCloud and MP3 music blogs to Grooveshark and later Spotify, with Windows Media Player and later an iTunes music library keeping me company throughout: it’s been quite a journey.

Some, but not all, of that journey has been captured by Last.fm over the last 10 years. Last.fm “scrobbles” what you listen to as you listen to it, keeping a record of your listening habits and behaviors. I decided to add all this data to Splunk, along with my iTunes library and a list of concerts I’ve attended over the years, to quantify my music listening, acquisition, and concert attendance habits. Let’s go.

What am I doing?

Before I get any data in, I have to know what questions I’m trying to answer; otherwise I won’t get the right data into Splunk (my data analysis system of choice, because I work there). Even if I get the right data into Splunk, I have to make sure the right fields are there to do the analysis I want. Knowing this helped me decide which scripts to prioritize for retrieving and cleaning my data (because I can’t code well enough to write my own).

I also made a list of the questions I wanted to answer with my data, and coded each question according to the types of data I would need to answer it. Things like:

  • What percentage of the songs in iTunes have I listened to?
  • What is my artist distribution over time? Do I listen to more artists now? Different ones overall?
  • What is my listen count over time?
  • What genres are my favorite?
  • How have my top 10 artists shifted year over year?
  • How do my listening habits shift around a concert? Do I listen to that artist more, or not at all?
  • What songs did I listen to a lot a few years ago, but not since?
  • What personal one hit wonders do I have, where I listen to one song by an artist way more than any other of their songs?
  • What songs do I listen to that are in Spotify but not in iTunes (that I should buy, perhaps)?
  • How many listens does each service have? Do I have a service bias?
  • How many songs are in multiple services, implying that I’ve probably bought them?
  • What’s the lag between the date a song or album was released and my first listen?
  • What geographic locations are my favorite artists from?

As the list goes on, the questions get more complex and require an increasing number of data sources. So I started with the simplest questions and began getting data in.

 

Getting data in…

I knew I wanted to get as much music data into the system as I could. However, SoundCloud isn’t providing developer API keys at the moment, and Spotify requires authentication, which is a little beyond my skills right now. MusicBrainz also has a lot of great data, but it’s heavily rate-limited, so I knew I’d need a strategy before using it as a metadata source. I was left with three initial data sources: my iTunes library, my own list of concerts I’ve gone to, and my Last.fm account data.

Last.fm provides an endpoint that allows you to get the recent tracks played by a user, which was exactly what I wanted to analyze. I started by building an add-on for Last.fm with the Splunk Add-on Builder to call this REST endpoint. It was hard. When I first tried to do this a year and a half ago, the add-on builder didn’t yet support checkpointing, so I could only pull in data if I was actively listening and Splunk was on. Because I had installed Splunk on a laptop rather than a server in ~ the cloud ~, I was pretty limited in the data I could pull in. I pretty much abandoned the process until checkpointing was supported.
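For reference, the endpoint in question is Last.fm’s user.getRecentTracks API method. The add-on builder makes the request itself, but a minimal Python sketch of the same call may make its moving parts clearer: it builds the request URL and flattens one page of the JSON response. The response field names follow Last.fm’s JSON output as best I can tell, the user name and API key are placeholders, and none of this is the add-on’s actual configuration.

```python
import json
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://ws.audioscrobbler.com/2.0/"


def recent_tracks_url(user: str, api_key: str, from_uts: Optional[int] = None,
                      page: int = 1, limit: int = 200) -> str:
    """Build a user.getRecentTracks request URL that asks for JSON."""
    params: Dict[str, Any] = {
        "method": "user.getrecenttracks",
        "user": user,
        "api_key": api_key,
        "format": "json",
        "limit": limit,  # results per page; 200 is the documented maximum
        "page": page,
    }
    if from_uts is not None:
        params["from"] = from_uts  # only scrobbles after this UNIX timestamp
    return API_ROOT + "?" + urlencode(params)


def parse_scrobbles(payload: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], int]:
    """Flatten one page of results; return (rows, total number of pages)."""
    block = payload["recenttracks"]
    tracks = block.get("track", [])
    if isinstance(tracks, dict):  # defensive: treat a lone object as a one-item list
        tracks = [tracks]
    rows = []
    for t in tracks:
        if "date" not in t:  # the "now playing" track has no timestamp yet
            continue
        rows.append({
            "artist": t["artist"]["#text"],
            "track": t["name"],
            "album": t.get("album", {}).get("#text", ""),
            "uts": int(t["date"]["uts"]),
        })
    total_pages = int(block.get("@attr", {}).get("totalPages", 1))
    return rows, total_pages


def fetch_page(user: str, api_key: str, from_uts: Optional[int] = None,
               page: int = 1) -> Tuple[List[Dict[str, Any]], int]:
    """Make the live HTTP request (needs network access and a real API key)."""
    with urlopen(recent_tracks_url(user, api_key, from_uts, page)) as resp:
        return parse_scrobbles(json.load(resp))
```

Skipping the “now playing” entry matters for Splunk: indexing a track before it has a timestamp would produce an event with no reliable time.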

After the add-on builder started supporting checkpointing, I set it up again, but ran into issues: everything from forgetting to specify the from date in my REST call to JSON path decisions that limited the number of results I could pull back at a time. I deleted the data from the add-on’s sourcetype many times, triple-checking the results each time before continuing.
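Conceptually, checkpointing plus a from date comes down to a small loop: remember the timestamp of the newest scrobble already stored, ask only for scrobbles after it, and walk every page of results instead of stopping at the first. A rough Python sketch of that logic outside Splunk follows; fetch_page is a hypothetical callable standing in for the real HTTP request, and the JSON checkpoint file format is made up for illustration.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
# Hypothetical signature: fetch_page(from_uts, page) -> (rows on that page, total pages).
FetchPage = Callable[[int, int], Tuple[List[Row], int]]


def load_checkpoint(path: Path) -> int:
    """Return the newest scrobble timestamp already stored, or 0 on the first run."""
    if not path.exists():
        return 0
    return int(json.loads(path.read_text())["last_uts"])


def save_checkpoint(path: Path, last_uts: int) -> None:
    """Persist the newest stored timestamp for the next run."""
    path.write_text(json.dumps({"last_uts": last_uts}))


def fetch_since(checkpoint_uts: int, fetch_page: FetchPage) -> Tuple[List[Row], int]:
    """Collect every scrobble newer than the checkpoint, across all result pages."""
    rows: List[Row] = []
    page, total_pages = 1, 1
    while page <= total_pages:
        # Request scrobbles starting one second after the checkpoint...
        batch, total_pages = fetch_page(checkpoint_uts + 1, page)
        # ...and filter again as a guard against anything at or before it.
        rows.extend(r for r in batch if int(r["uts"]) > checkpoint_uts)
        page += 1
    newest = max((int(r["uts"]) for r in rows), default=checkpoint_uts)
    return rows, newest
```

The ordering in a scheduled run would matter: load the checkpoint, fetch, write the rows out, and only then save the new checkpoint, so a failed run re-fetches data instead of silently skipping it.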

I used a Python script (thanks Reddit) to pull my historical data from Last.fm into Splunk, and to fill the gap between that initial backfill and the time it took me to get the add-on working, I used an NPM module. When you don’t know how to code, you’re at the mercy of the tools other people have developed. Adding the backfill data to Splunk also meant I had to raise the MAX_DAYS_AGO setting in props.conf, because by default Splunk doesn’t expect events from 10+ years ago (the default is 2,000 days, roughly five and a half years). Two scripts in two languages and one add-on builder later, I had a working solution and my Last.fm data in Splunk.

To get the iTunes data in, I used an iTunes-to-CSV script on GitHub (thanks StackExchange) to convert the Library.xml file into CSV. This worked great, but it was in a language I don’t know (Ruby), so I was once more at the mercy of a kind developer posting scripts on GitHub, and limited to whatever fields their script supported. It also only handled the backfill.
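Because Library.xml is an Apple property list, Python’s standard-library plistlib module can read it directly, without any regex. Here is a minimal sketch of a CSV conversion along the lines of that Ruby script, assuming the usual layout in which a top-level “Tracks” dictionary maps track IDs to per-track dictionaries. The column choice is illustrative, not what the Ruby script exports.

```python
import csv
import io
import plistlib

# Keys as they typically appear in each track entry of an iTunes Library.xml.
FIELDS = ["Track ID", "Name", "Artist", "Album", "Genre", "Play Count", "Date Added"]


def library_to_csv(xml_bytes: bytes) -> str:
    """Convert the raw bytes of Library.xml into CSV text, one row per track."""
    library = plistlib.loads(xml_bytes)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for track in library.get("Tracks", {}).values():
        row = {key: track.get(key, "") for key in FIELDS}
        # Tracks that were never played usually have no "Play Count" key at all.
        row["Play Count"] = track.get("Play Count", 0)
        writer.writerow(row)
    return buf.getvalue()
```

To use it, read the file in binary mode (for example with Path("Library.xml").read_bytes()) and write the returned text to a .csv file opened with newline="". Filling in 0 for the missing play count keeps never-played songs in the export, which matters for questions like what percentage of the library I’ve actually heard.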

I’m still trying to sort out the regex and determine whether it’s possible to parse the iTunes Library.xml file in its entirety and add it to Splunk without too much of a headache, and/or set things up so that I can add new songs to Splunk as I add them to the library, without converting the entries some other way. It’s a work in progress, but I’m pretty close to getting it working thanks to help from some regex gurus in the Splunk community.
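As an alternative to regex, a property-list parser could also cover the incremental case: parse the whole file, then keep only the tracks whose Date Added is newer than the last import. A sketch using Python’s plistlib, assuming the caller stores the time of the previous import somewhere (where and how is left open here):

```python
import plistlib
from datetime import datetime
from typing import Any, Dict, List


def tracks_added_since(xml_bytes: bytes, since: datetime) -> List[Dict[str, Any]]:
    """Return tracks whose "Date Added" is later than `since`, oldest first.

    plistlib returns naive datetimes (in UTC) by default, so `since`
    should also be a naive UTC datetime for the comparison to be valid.
    """
    library = plistlib.loads(xml_bytes)
    new_tracks = [
        track for track in library.get("Tracks", {}).values()
        if isinstance(track.get("Date Added"), datetime) and track["Date Added"] > since
    ]
    return sorted(new_tracks, key=lambda track: track["Date Added"])
```

Each returned dictionary could then be written out as one event, for example as a line of JSON, for Splunk to index. That would sidestep regex entirely, at the cost of re-reading the whole file on every run.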

For the concert data, I entered the data I had into the Lookup File Editor app and was up and running. Because of some column header choices I made when organizing the data, and because I chose to maintain a lookup rather than add the information as events, I was in for some more adventures in search, but this format made it easy to add new concerts as I attend them.

Answer these questions…with data!

I built a lot of dashboard panels to answer the questions I mentioned earlier, along with some others. I was spurred on when my brother recommended a song to me. I was pretty sure I’d heard it before, and decided to use data to verify it.

Screen image of a chart showing the earliest listens of tracks by the band VHS Collection.

I’d first heard the song he recommended, Waiting on the Summer, in March. Hipster credibility: intact. This dashboard panel now lets me answer the questions “when was the first time I listened to an artist, and which of their songs did I hear first?” I later added a second panel to compare the earliest listens with the play counts of the artist’s songs. Sometimes the first song I heard by an artist is also my most-listened one, but often it’s not.
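Under the hood, those two panels come down to a minimum and a count per track, roughly what a Splunk stats search computing min(_time) and count by track name would return. A Python sketch of the same computation, assuming scrobbles are available as (artist, track, UNIX timestamp) tuples; that shape is an assumption for illustration, not how the add-on stores events.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

Scrobble = Tuple[str, str, int]  # (artist, track, UNIX timestamp)


def first_listens(scrobbles: Iterable[Scrobble], artist: str) -> List[Tuple[str, datetime, int]]:
    """For one artist, return (track, first listen in UTC, play count), earliest first."""
    first: Dict[str, int] = {}
    plays: Counter = Counter()
    for scrobble_artist, track, uts in scrobbles:
        if scrobble_artist.lower() != artist.lower():  # case-insensitive artist match
            continue
        plays[track] += 1
        if track not in first or uts < first[track]:
            first[track] = uts
    rows = [(track, datetime.fromtimestamp(uts, tz=timezone.utc), plays[track])
            for track, uts in first.items()]
    return sorted(rows, key=lambda row: row[1])
```

Putting the play count next to the first listen is what makes the comparison possible: the top row is the first song heard, and the largest count is the most-listened song, and the two often differ.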

Another question I wanted to answer was “how many concerts have I been to, and what’s the distribution in my concert attendance?”

Screen image showing concerts attended over time, with peaks in 2010 and 2017.

It’s pretty fun to look at this chart. I went to a few concerts while I was in high school, but never more than one a month and rarely more than a few per year. The pace picked up in college, especially while I was dating someone who liked going to concerts. Attendance slowed down as I studied abroad and finished college, then picked up for a year as I got settled in a new town. But after I settled into a long-term relationship, my concert attendance dropped off, to the point where I was going to fewer shows than I did in high school. As soon as I was single again, that shifted dramatically, and now I’m going to one or more shows a month. The personal stories and patterns revealed by the data are the fun part for me.

I answered some more questions, especially those that lent themselves to fun graphs, such as: in which states have I attended the most concerts?

Screen image of a map of the contiguous United States, with Illinois highlighted in dark blue, indicating 40+ concerts attended in that state, California highlighted in a paler blue indicating 20ish shows attended there, followed by Michigan in paler blue, and finally Ohio, Wisconsin, and Missouri in very pale blue. The rest of the states are white, indicating no shows attended in those states.

It’s easy to tell where I’ve spent most of my life so far, but again the personal details tell a bigger story. I’ve spent more time living in Michigan than in California, but I’ve spent more of my time in California single, and so I’ve attended more concerts there.

Speaking of California, I also wanted to see what my most-listened-to songs were since moving to California. I used a trellis visualization to split the songs by artist, allowing me to identify artists that were more popular with me than others.

Screen image showing a "trellis" visualization of top songs since moving to California. Notable songs are Carly Rae Jepsen's "Run Away With Me," Ariana Grande's "Into You," and three CHVRCHES songs: "High Enough to Carry You Over," "Clearest Blue," and "Leave a Trace."

I really liked the CHVRCHES album Every Open Eye, so three songs from it made the list. I also spent some time with a four-song playlist featuring Adele’s Send My Love (To Your New Lover), Ariana Grande’s Into You, Carly Rae Jepsen’s Run Away With Me, and Ingrid Michaelson’s Hell No. Somehow two breakup songs and two love songs were the perfect juxtaposition for a great playlist. I liked it so much that all four songs are in this list (though only half of them are visible in this screenshot). That’s another secret behind the data.

I also wanted to do some more analytics on my concert data, and decided to figure out what my favorite venues were. I had some guesses, but wanted to see what the data said.

Screen image of most visited concert venues, with The Metro in Chicago taking the top spot with 6 visits, followed by First Midwest Bank Amphitheatre (5 visits), Fox Theater, Mezzanine, Regency Ballroom, The Greek Theatre, and The Independent with 3 visits each.

The Metro is my favorite venue in Chicago, so it’s no surprise that it came in first (I later corrected the data to use its proper name, “Metro,” so that I could drill down from the panel to a Google Maps search for the venue). First Midwest Bank Amphitheatre hosted Warped Tour, which I (apparently) attended 5 times over the years. Since moving to California, it seems like I don’t have a favorite venue based on visits alone, but it’s really The Independent, followed by Bill Graham Civic Auditorium, which doesn’t even make this list. Number of visits doesn’t automatically equate to favorite.
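Counting visits only works if each venue is spelled the same way every time, which is part of why a name correction like that matters. A small Python sketch of normalizing venue names before counting them, plus building a Google Maps search link similar to the drill-down; the alias table and the “venue” column name are hypothetical, not the actual lookup’s headers.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlencode

# Hypothetical alias table: lowercase variant spelling -> canonical venue name.
VENUE_ALIASES: Dict[str, str] = {"the metro": "Metro"}


def canonical_venue(name: str) -> str:
    """Collapse stray whitespace, then map known variant spellings to one name."""
    cleaned = " ".join(name.split())
    return VENUE_ALIASES.get(cleaned.lower(), cleaned)


def top_venues(concert_rows: Iterable[Dict[str, str]], n: int = 7) -> List[Tuple[str, int]]:
    """Count visits per canonical venue and return the n most visited."""
    counts = Counter(canonical_venue(row["venue"]) for row in concert_rows)
    return counts.most_common(n)


def maps_search_url(venue: str, city: str) -> str:
    """Build a Google Maps search URL for a venue, for use as a drill-down link."""
    return "https://www.google.com/maps/search/?" + urlencode({"api": 1, "query": f"{venue} {city}"})
```

The alias table is deliberately explicit rather than clever (for example, not stripping every leading “The”), because that rule would also mangle The Independent and The Greek Theatre.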

But what does it MEAN?

I could do data analysis like that all day. But what else do I learn by just looking at the data itself?

I can tell that Last.fm didn’t handle the shift to mobile and portable devices very well. It thrives when all of your listening happens on your laptop, and it can grab the scrobbles from your iPod or other device when you plug it into your computer. But as soon as internet-connected devices got popular (and I started using them), my overall scrobble count dropped. Beyond devices, the rise of streaming music on sites like Grooveshark and SoundCloud, which replaced the MediaFire- and MegaUpload-hosted free music shared on music blogs, also meant trouble for my data integrity. Last.fm didn’t handle listens on the web then, and only handles them through a fragile browser extension now.

Two graphs depicting distinct song listens and distinct artist listens, respectively, showing a peak and steady listening from 2008 through 2012, then a drop to a trough in 2014, followed by a recovery to about half the 2010 level and a slight rise.

Distinct songs and artists listened to in Last.fm data.

But that’s not the whole story. I also got a job in an environment where I couldn’t listen to music at work, and I wasn’t listening to much music at home either, due to other circumstances. Given that the count plummets to near zero, it’s possible there were also data issues at play. It’s imperfect, but still fascinating.
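The distinct-song and distinct-artist counts in those graphs are a per-year count of unique values. A Python sketch, again assuming scrobbles as (artist, track, UNIX timestamp) tuples and bucketing years in UTC (both assumptions, not necessarily what the dashboard did):

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Set, Tuple


def distinct_per_year(scrobbles: Iterable[Tuple[str, str, int]]) -> Dict[int, Tuple[int, int]]:
    """Map year -> (distinct songs, distinct artists) from (artist, track, uts) tuples."""
    songs: Dict[int, Set[Tuple[str, str]]] = defaultdict(set)
    artists: Dict[int, Set[str]] = defaultdict(set)
    for artist, track, uts in scrobbles:
        year = datetime.fromtimestamp(uts, tz=timezone.utc).year
        # Key songs by (artist, title) so identical titles by different artists stay separate.
        songs[year].add((artist.lower(), track.lower()))
        artists[year].add(artist.lower())
    return {year: (len(songs[year]), len(artists[year])) for year in sorted(songs)}
```

A year with no scrobbles at all simply doesn’t appear in the result rather than showing up as zero, a small reminder that in this kind of data “not listening” and “not recorded” look identical.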

What else did I learn?

Screen image showing 5 dashboard panels. Clockwise from the upper left: a trending indicator of concerts attended per month, displaying 1 for the month of December and a net decrease of 4 from the previous month; the overall number of concerts attended, 87 shows; the number of iTunes library songs with no listens, 4272; a pie chart showing that nearly 30% of the songs have 0 listens, 23% have 1 listen, and the rest have a variety of listen counts; and the total number of songs in my iTunes library, 16202.

I have a lot of songs in my iTunes library. I haven’t listened to more than a quarter of them at all, and I’ve listened to nearly another quarter only once. Together, that’s about half of my music library. If I split that by rating, however, it would get a lot more interesting. Soon.
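Those shares are simple bucket arithmetic over each track’s play count. With the dashboard’s own figures, 4,272 unplayed tracks out of 16,202 works out to about 26.4%. A Python sketch of the bucketing, assuming the play counts are available as a plain list of integers (for instance from a CSV export with missing counts filled in as 0):

```python
from collections import Counter
from typing import Dict, Iterable


def listen_share(play_counts: Iterable[int]) -> Dict[str, float]:
    """Percentage of tracks with 0 plays, exactly 1 play, and 2 or more plays."""
    buckets = Counter("0" if c == 0 else "1" if c == 1 else "2+" for c in play_counts)
    total = sum(buckets.values())
    if total == 0:
        return {}  # empty library: no shares to report
    return {key: round(100 * buckets[key] / total, 1) for key in ("0", "1", "2+")}
```

Splitting these buckets by rating, as mentioned above, would only mean grouping the same counts by an extra key before computing the percentages.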

You can’t see the fallout from my own personal Music-ocalypse in this data, because the Library.xml file doesn’t know which songs no longer point to actual files, or at least my version of it doesn’t. I’ll need higher-fidelity data to determine the “actual” size of my library and perform more analyses.

I need more data in general, and more patience, to perform the analyses to answer the more complex questions I want to answer, like my listening habits of particular artists around a concert. As it is, this is a really exciting start.

If you want more details about the actual Splunking I did for these analyses, I wrote a post on the official Splunk blog, published on January 4th: 10 Years of Listens: Analyzing My Music Data with Splunk.

Kill Legacy Apple Software


Benedict Evans pointed out in a recent newsletter, “there’s a story to be written about Apple feeling its way from a piecemeal legacy technology stack for services, evolved bit by bit from the old iPod music store of a decade ago, to an actual new unified platform, something that it is apparently building.”

I’d argue for a focused set of decoupled applications, rather than a new unified platform. iTunes has bloated beyond practicality. The App Store doesn’t work well for users or developers. Here’s where I think the future of these applications lies.


The Evolution of Music Listening

Pitchfork recently published a great longform essay on music streaming. It covered the history and present state of music streaming and raised a lot of great points. These are my reactions.

The piece discussed how “the ‘omnivore’ is the new model for the music connoisseur, and one’s diversity of listening across the high/low spectrum is now seen as the social signal of refined taste.” It would be interesting to study how this omnivorousness splits across genres, age groups, and affinities. I find myself falling into omnivore status: I’m never able to properly define my music taste by genre, and my musical affinities shift daily, weekly, and monthly, though with common themes.

Also discussed is the cost of music, whether it be licensing, royalties, or record label advances. Having to deal with the cost of music is a difficult matter. I wonder if I would have been such a voracious consumer of music if I hadn’t grown up with so many free options with the library, the radio, and later, music blogs. Now that I’m older, I make the effort to purchase music when I feel the artist deserves it, but as I distance myself (incidentally, really) from storing music on my computer, that effort becomes less important to expend.


Autobiography through (Musical) Devices (Part Rogue)

Inspired in part by Cyborgology’s Autobiography through Devices series

Autobiography Through Devices (Part 1)

Autobiography Through Devices (Part 2)

I grew up surrounded by music. Dancing wildly in the living room with my siblings to R.E.M.’s Don’t Go Back to Rockville and Rusted Root’s Ecstasy when we were toddlers remains one of my fondest childhood memories. As I grew older I kept listening to my parents’ music, including an entrenched eighties phase, and by the time I left junior high, I owned a Train tape, a Cat Stevens Classics CD, and Motion City Soundtrack’s first album, I Am The Movie, among others. I shied away from the popular music of my peers in junior high, avoiding Alkaline Trio, System of a Down, and Blink-182 (this was a mistake, I might add).


What would I say?

made an effort to the most obnoxious article

What would I say? Something you think when posting on social media sites, when offering up your opinion about something in the news, and now, the name of an app that emerged from HackPrinceton just a few days ago.

The app is so popular that its server intermittently goes down, forcing you to access a cached copy of the site or leaving you unable to post automatically to Facebook (so you screenshot the page to share it instead). It was created by Pawel, Vicky, Ugne, Daniel, Harvey, Edward, Alex, and Baxter. They didn’t win anything at HackPrinceton (per the event’s Facebook page), but now their creation has gone viral. It has been profiled on the Huffington Post, in an article titled “Your Facebook Statuses are Gibberish. Here’s Proof.”, as well as by Slate and Business Insider. Even the New Yorker has profiled the app (revealing that Baxter is, in fact, a dog).

But what is so appealing about this app? Friends and I have already used the app, and we’ve all been delighted to discover something that nonsensically “understands” us, by spitting our own words back at us. Others have had the same reaction, posting about it with #wwis or #whatwouldisay, noting how the robot just “gets” them.
