Wrapping up 2020: Spotify, SoundCloud, and Last.fm data

Another year, another Spotify Wrapped campaign, another effort to analyze the music data that I collect and compare it to what Spotify produces. This year I have last.fm listening habit data, concert attendance and ticket purchase data, livestream view activity data, my SoundCloud 2020 Playback playlist, and the tracks on my Spotify top 100 songs of 2020 playlist.

Screenshot of Spotify Wrapped header image, top artists of disclosure, lane 8, kidnap, tourist, and amtrac, top songs of apricots, atlas, idontknow, cappadocia, know your worth, minutes listened of 59,038 and top genre of house.

It’s always important to point out that the data covered in the Spotify Wrapped campaign only covers the time period from January 1st, 2020 to October 31st, 2020. I discuss the effects of this misleading time period in Communicate the data: How missing data biases data-driven decisions. Of course, writing this post on December 2nd, nearly the entire month of December is missing from my own analyses. I’ll follow up (on Twitter) about any data insights that change over the next few weeks.

Top Artists of the Year #

screenshot of spotify wrapped top artists, content duplicated in surrounding text.

Spotify says my Top 5 artists of the year are:

  1. Disclosure
  2. Lane 8
  3. Kidnap
  4. Tourist
  5. Amtrac

My own data shows some slight permutations.

Screenshot of Splunk table showing top 10 artists in order: tourist with 156 listens, amtrac with 155 listens, booka shade with 147 listens, jacques greene with 134 listens, lane 8 with 129 listens, bicep with 128 listens, kidnap with 114 listens, ben böhmer with 111 listens, cold war kids with 110 listens, and sjowgren with 99 listens

My top 5 artists are nearly the same, but much more influenced by music that I’ve purchased. The overall list instead looks like:

  1. Tourist
  2. Amtrac
  3. Booka Shade
  4. Jacques Greene
  5. Lane 8

For the second year in a row, Tourist is my top artist! Kidnap still makes it into the top 10, as my 7th most-listened-to artist so far of 2020.

Disclosure, somewhat hilariously, doesn’t even break the top 10 artists if I am relying on Last.fm data instead of only Spotify. What’s going on there? Turns out Disclosure is my 11th-most-listened to artist, with 97 total listens so far this year. If I dig a little bit deeper, looking at the song Know Your Worth which Spotify says I’ve listened to the most in 2020 by Disclosure, I can see exactly why this is happening.

Screenshot showing the track_name Know Your Worth listed 5 times, with different artist permutations each time, Khalid, Disclosure & Khalid, Disclosure & Blick Bassy, Khalid & Disclosure, and Khalid, with total listens of 20 for all permutations.

Disclosure’s latest album, ENERGY, includes a number of collaborations. Disclosure is the main artist for most of these tracks, but in some cases (like with Know Your Worth, which came out as a single February 4, 2020) the artist can be inconsistently stored by different services.

As a result, the Last.fm data has a number of different entries for the same track, with differently-listed artists for each one. Last.fm stores only one artist per track, whereas Spotify stores an array of artists for each track. This data structure decision means that Disclosure should have had about 127 total listens, and been my 7th-most-listened-to artist of 2020, instead of 11th.

This truncated screenshot shows some examples of the permutations of data that exist in my Last.fm data collection, with a total listen count of 127 for Disclosure during 2020.

Screenshot showing additional permutations of Disclosure artist data, such as Disclosure & slowthai, Disclosure & Common, and Disclosure & Channel Tres.

I had a sneaking suspicion that my Booka Shade listening habits are primarily concentrated on a few songs from an EP that he put out this year, so I dug into how many tracks my total listens for the year were spread across.

Table showing top 10 artists and total listens, with total tracks for each artist as well. Tourist has 62 tracks for 159 listens, Amtrac has 59 tracks for 155 listens, Booka Shade has 64 tracks for 147 listens, Jacques Greene has 33 tracks for 134 listens, Lane 8 has 60 tracks for 129 listens, Bicep has 46 tracks for 128 listens, Kidnap has 35 tracks for 114 listens, Ben Böhmer has 51 tracks for 111 listens, Cold War Kids has 53 tracks for 110 listens, and sjowgren has 15 tracks for 99 listens.

Instead, it turns out that my listens to Booka Shade are actually the most distributed across tracks of all of my top 10 artists. Sjowgren is also an outlier here, because they’ve never released an album, so they only have 15 songs in their overall discography yet still made the top 10 artist listens.

Returning my comparison between Spotify and Last.fm data, Amtrac and Lane 8 are in both top 5 lists. This is somewhat expected, because if I look at the top 10 list for artists that I’ve most consistently listened to—artists that I’ve listened to at least once in each month of 2020—both Amtrac and Lane 8 place high in that list.

Screenshot of a table showing top 10 consistently listened to artists, with Lane 8 being listened to at least once in all 12 months of 2020, Amtrac 11 months, Caribou 11 months, Disclosure 11 months, Elderbrook 11 months, Kidnap 11 months, Kölsch 11 months, Tourist 11 months, Ben Böhmer 10 months, and CamelPhat for 10 months.

Given that only 2 days of December have happened as I write this, it’s unsurprising that I’ve only listened to one artist in every month of 2020.

Top Songs of 2020 #

Enough about the artists—what about the songs?

Screenshot of top 5 songs from spotify wrapped, duplicated in surrounding text.

According to Spotify, my Top 5 songs of the year are:

  1. Apricots by Bicep
  2. Atlas by Bicep
  3. Idontknow by Jamie xx
  4. Cappadocia by Ben Böhmer feat. Romain Garcia
  5. Know Your Worth by Disclosure feat. Khalid

That pretty closely matches my top 5 list according to Last.fm, with some notable exceptions.

Screenshot of Splunk table with top 10 songs of last.fm data, Apricots by Bicep with 38 listens, Atlas by Bicep with 32 listens, Idontknow by Jamie xx with 22 listens, White Ferrari (Greene Edit) by Jacques Greene with 21 listens, That Home Extended by The Cinematic Orchestra with 20 listens, Lalala by Y2K and bbno$ with 19 listens, Trish’s Song by Hey Rosetta! with 18 listens, Wonderful by Burna Boy with 18 listens, Somewhere feat. Octavian by the Blaze with 17 listens, and Yes, I Know by Daphni with 17 listens.

My top 5 tracks according to Last.fm are:

  1. Apricots by Bicep (38 listens)
  2. Atlas by Bicep (32 listens)
  3. Idontknow by Jamie xx (22 listens)
  4. White Ferrari (Greene Edit) by Jacques Greene (21 listens)
  5. That Home Extended by The Cinematic Orchestra (20 listens)

The first 3 tracks match, though of course Spotify has an incomplete representation of those listens—I have 29 streams of Apricots according to Spotify.

Screenshot of Spotify wrapped showing 29 streams of Apricots by Bicep

However, since I bought the track almost as soon as it came out, I also have another 9 listens that have happened off of Spotify. There were also some mysterious things happening with Spotify and Last.fm connections around that time as well, so it’s possible some listens are missing beyond these numbers.

What’s up with the 4th track on the list, though? Where is that in Spotify’s data? It’s actually a bootleg remix of the Frank Ocean song White Ferrari that Jacques Greene shared on SoundCloud and as a free download earlier this year, so it isn’t anywhere on Spotify. It did, however, make it onto my top tracks of 2020 on SoundCloud:

Screenshot of top 13 tracks in SoundCloud, with Jacques Greene - White Ferrari (JG Edit) listed as the 11th track.

And again, this is a spot where metadata intrudes again and leads to some inconsistent counts. If I look at all the permutations of White Ferrari and Jacques Greene in my data for 2020, the total number of listens should actually be a bit higher, at 23 total listens:

Screenshot of Splunk table showing the two permutations of the Jacques Greene remix, with 21 listens for the Greene Edit version and 2 listens for the JG Edit version, for a total of 23.

This would actually make it my 3rd-most popular song of 2020 so far, and I’m listening to it as I write this paragraph, so let’s go ahead and call that total number 24 listens.

The 5th-most popular song and 7th-most popular song of 2020 make the case that I haven’t been sleeping very well this year (though I recall these tracks also showed up in 2019 as well…), because those 2 tracks comprise my “Insomnia” playlist that I use to help me fall asleep on nights when I’ve been, perhaps, staying up too late doing data analysis like this.

You can see the influence of consistent listening habits with top artist behaviors when you look at the top 10 songs that I’ve consistently listened to throughout 2020, with 2 songs by Kidnap, one by Bicep, and another by Amtrac.

Table of tracks listened to consistently in 2020, Never Come Back by Caribou listened to at least once in 8 months of 2020, Start Again by Kidnap with 8 months, Accountable by Amtrac with 7 months, Atlas by Bicep with 7 months, Calling out by Sophie Lloyd with 7 months, Made to Stray by Mount Kimbie for 7 months, Moments (Ben Böhmer Remix) by Kidnap with 7 months, Somewhere feat. Octavian by the Blaze with 7 months, The Promise by David Spinelli with 7 months, and Without You My Life Would Be Boring by The Knife with 7 months.

To me, though, this table mostly underscores how much music discovery this year involved. I didn’t return to the same songs month after month during 2020. Likely as a result of all the DJ sets I’ve been streaming (as I mentioned in my post about Listening to Music while Sheltering in Place) this has been quite a year for music discovery, and breadth of listening habits.

My top 10 songs of 2020 had a total of 222 listens across them. However, I have a total of 14,336 listens for the entire year, spread across 8,118 unique songs in total.

duplicated in surrounding text

Even with possible metadata issues, that’s still quite the distribution of behavior. Let’s dig a bit deeper into artist discovery this year.

Artist Discovery in 2020 #

In my post earlier this year about my listening behavior while sheltering in place, I discovered that my artist discovery numbers in 2020 seemed to be way up compared with 2018 and 2019, but weren’t actually that far off from 2017 numbers.

What I see when comparing my 2020 artist discovery statistics from my Last.fm data and my Spotify data is even more interesting. In contrast to what seemed to be true in last year’s post, Wrapping up the year and the decade in music: Spotify vs my data (For what it’s worth, last year’s number should have been 1074, instead of 2857 artists discovered—data analysis is difficult), Spotify’s data is much higher than the number I calculated this year.

duplicated in surrounding text

According to Spotify, I discovered 2,051 new artists, whereas my Last.fm data claims that I only discovered 1,497 artists this year.

duplicated in surrounding text

Similarly, Spotify claims that I listened to 4,179 artists this year, whereas my Last.fm data indicates that I listened to 3,715 artists.

duplicated in surrounding text

Again, this comes down to data structures and how the artist metadata is stored for each service. I wrote about the importance of quality metadata for digital streaming providers earlier this year in Why the quality of audio analysis metadatasets matters for music, but it’s also apparent that the data structures for those metadatasets are just as important for crafting data insights of varying value.

Because Spotify stores all artists that contributed to a track as an array, I can listen to a track with 4 contributing artists on it, 1 of which I’ve listened to before, and according to Spotify, I’ve now discovered 3 artists and listened to 4, whereas according to Last.fm, I’ll have either listened to 1 artist that I’ve already heard before, or a new artist, possibly called “Luciano & David Morales".

Screenshot of two artist names, Luciano, and Luciano & David Morales.

Spotify would store the second artist as Luciano, David Morales, thus allowing a more accurate count of listens for the Luciano artist. Similarly, my artist discovery data includes some flawed data, such as YouTube videos that got incorrectly recorded.

Screenshot of 3 artist names in my data, Billie Joe Armstrong of Green Day, Billy Joel and Jimmy Fallon Form 2, and Biosphere.

The Billy Joel and Jimmy Fallon duet of The Lion Sleeps Tonight never gets old, but it appears the original video is no longer on YouTube so I’m not going to link it.

Screenshot of two artist names in my data, &lez and ‘Coming of age ceremony’ Dance cover by Jimin and Jung Kook.

This becomes clear in my top 20 artist discoveries of 2020 chart, where BTS and Big Hit Labels are listed separately, although they are both indicative of one of my best friends joining BTS ARMY this year and sharing her enthusiasm with me.

Giant table of top 20 artists discovered in 2020, in order with first_discovered date last: Re.You with 85 listens starting July 12, 2020, Elliot Adamson, 75 listens, April 15 2020, Fennec, 53 listens, March 24 2020, Southern Shores, 52 listens, November 19 2020, Eelke Kleijn, 45 listens, August 10 2020, Christian Löffler, 43 listens, April 2 2020, Icarus, 35 listens, April 2 2020, Monkey Safari, 35 listens, April 15 2020, Black Motion, 34 listens, April 30 2020, BTS, 31 listens, September 29 2020, Bronson, 31 listens, May 9 2020, Love Regenerator, 30 listens, March 30 2020, Eltonnick, 29 listens, April 27 2020, Jerro, 27 listens, April 29 2020, Theo Kottis, 27 listens, June 16 2020, Dennis Cruz, 26 listens, June 22 2020, Da Capo, 25 listens, May 10 2020, Bit Hit Labels, 21 listens, June 30 2020, HYENAH, 20 listens, June 4 2020, KC Lights, 20 listens, September 22 2020

Ultimately I’m grateful that the top 20 artists of 2020 are all artists that I discovered during the pandemic and have excellent songs that I love and continue to listen to. Many of the sparklines that represent my listening activity for these artists throughout the year have spikes, but mostly my listening patterns indicate that I’ve been returning to these artists and their songs multiple times after first discovery. Some notable favorites on this list are KC Lights' track Girl and Dennis Cruz’s track El Sueño, plus the entire Fennec album Free Us Of This Feeling.

Genre Discovery in 2020 #

The most-commented-on data insight from #wrapped2020 is probably the genre discovery slide.

Screenshot of spotify screenshot showing “You listened to 801 genres this year, including 294 new ones”.

According to Spotify, I listened to 801 genres this year, including 294 new ones. I’m not even sure I could name 30 genres, let alone 300 or 800. Where are these numbers coming from?

It turns out that, much like storing artist data as an array for each song, Spotify stores genre data as an array for each artist. This means that each artist can be assigned multiple genres, thus successfully inflating the number of genres that you’ve listened to in 2020.

For example, if I use Spotify’s API developer console to retrieve the artist information for Tourist, with a Spotify ID of 2ABBMkcUeM9hdpimo86mo6, it turns out that he has 6 total genres associated with him in Spotify’s database: chillwave, electronica, indie soul, shimmer pop, tropical house, and vapor soul.

Screenshot of JSON response from Spotify API call, content duplicated in surrounding text.

I could start discussing the possible meaningless of genres as a descriptive tool, the lack of validation possible for such a signifier, the lack of clarity about how these genres were defined and also assigned to specific artists, but that’s best for another blog post.

Instead, let’s look at what little genre data I do have available to me more generally.

duplicated in surrounding text

According to Spotify, my top genres were:

  1. House
  2. Electronica
  3. Pop
  4. Afro House
  5. Organic House

All of these make sense to me, except for Organic House, because I don’t know what makes house music organic, unless it’s also grass-fed, locally-sourced, and free range. Perhaps Blond:ish is organic house.

I don’t have any genre data from Last.fm, since the service only stores user-defined tags for each artist, and those are not included in the data that I collect from Last.fm today. Instead, I have the genres assigned by iTunes for the tracks that I’ve purchased from the iTunes store.

The top 8 genres of music that I added to my iTunes library in 2020 by purchasing tracks from the iTunes store are:

  1. Dance (124 songs)
  2. Electronic (121 songs)
  3. House (78 songs)
  4. Pop (37 songs)
  5. Alternative (27 songs)
  6. Electronica (12 songs)
  7. Deep House (10 songs)
  8. Melodic House & Techno (9 songs)

duplicated in surrounding text

Clearly, this is a very selective sample, and is only tied to select purchasing habits, which are roughly correlated to my listening habits.

I shared all of this genre data to essentially look at it and go “wow, that wasn’t very insightful at all”. Let’s move on.

Time Spent Listening to Music in 2020 #

The last metric I want to unpack from Spotify’s #wrapped2020 campaign is the minutes listened data insight. According to Spotify, I spent 59,038 minutes listening to music this year.

relevant content duplicated in surrounding text

According to my own calculations, I spent roughly 81,134 minutes listening to music in 2020.

duplicated in surrounding text

Let’s talk about how both of these metrics are super flawed!

Spotify counts a song as streamed after you listen to it for more than 30 seconds (per their Spotfiy for Artists FAQ), so it’s logical to assume that this minutes listened metric likely from a calculation of “number of streams for a track” x “length of track” and then rounded and converted to minutes. It could even result from an different type of calculation, “number of total streams” x “average length of track in Spotify library”, but I have no way of knowing if either of these are accurate besides tweeting at Spotify and hoping they’ll pay attention to me.

Unfortunately for all of us, but mostly me, my own minutes listened metric is just as lazily calculated. I don’t have track length data for all the tracks that I listen to and I don’t know at what point Last.fm counts a track as being worthy of a scrobble. I do have a list of how much time I spent listening to livestreamed DJ sets online, and I do have some excellent estimation skills. I calculated my number of 81,134 minutes so far in 2020 by calculating and assuming the following:

Using those averages and estimates, I calculated the total amount of time I spent listening to music across Last.fm listening habits, concerts and DJ sets attended (no festivals this year), and livestreams that I watched online, thus arriving at 81,134 minutes. That doesn’t count any DJ sets that I listened to on SoundCloud, and certainly the combination of a 4 minute track length estimate with the uncertainty of what qualifies a track as being scrobbled makes this data insight somewhat meaningless.

Regardless, let’s compare this estimated time spent listening in minutes against the total number of minutes in a year.

Total minutes listened (81,134) as a gauge compared with total minutes in a year (525,600)

Beautiful. I still remembered to sleep this year. No matter which dataset I use, however, it’s clear that I’ve listened to more music in 2020 than in 2019. Spotify’s metric for this same time period in 2019 was 35,496 minutes. The less-flawed but less-complete metric I used last year, calculated using the track length stored in iTunes multiplied by the number of listens for that track, indicated that I spent 14,296 minutes listening to music in 2019.

As one final Spotify examination, let’s dig into the Spotify Top 100 playlist.

Top 100 Songs of 2020 Playlist #

Alongside the fancy graphics and data insights in the #wrapped2020 campaign, Spotify also creates a 100 song playlist, likely (but not definitively) the top 100 songs of the time period between January 1st, 2020 and October 31st, 2020.

Screenshot of graphics depicting 811 total listens of the spotify top 100, 69 unique artists on that playlist, andn 83 of which were discovered in 2020.

I found my playlist this year to be relatively accurate, perhaps because I spent more time listening to Spotify than I might have in previous years, or perhaps they made some internal data improvements, or both! I often spend more time listening to SoundCloud if I’m traveling a lot, listening to offline DJ sets on plane flights; or listening to Apple Music on my iPhone, with songs that I’ve added from my iTunes library. Without much time spent commuting or traveling this year, it’s likely that my listening habits remained fairly consolidated.

duplicated in surrounding text

Similarly to what I discovered about my top 10 tracks, I had relatively distributed music interests this year. The 811 total listens for all 100 songs in my Spotify playlist represent just 0.06% of my total listens in 2020 so far.

duplicated in surrounding text

Despite my overall listening habits being relatively distributed across lots of artists and songs, the Top Songs playlist is somewhat more consolidated, with 69 artists performing the 100 songs on the playlist. Nice.

duplicated in surrounding text

It’s clear that I spent most of this year exploring and discovering new artists, given that 83 of my top songs of 2020 according to Spotify were songs that I discovered in 2020.

Thanks for coming on this journey through my music data with me. I’ll be back at the actual end of the year to dive deeper into my top 10 artists of the year, top 10 consistent artists of the year, my music purchasing activity, as well as some more livestream and concert statistics to round out my 2020 year in music.

Stacked area chart showing the top 10 artists of 2020, duplicated from previously.