Wrapping up 2020: Spotify, SoundCloud, and Last.fm data
Another year, another Spotify Wrapped campaign, another effort to analyze the music data that I collect and compare it to what Spotify produces. This year I have last.fm listening habit data, concert attendance and ticket purchase data, livestream view activity data, my SoundCloud 2020 Playback playlist, and the tracks on my Spotify top 100 songs of 2020 playlist.
It’s always important to point out that the data covered in the Spotify Wrapped campaign only covers the time period from January 1st, 2020 to October 31st, 2020. I discuss the effects of this misleading time period in Communicate the data: How missing data biases data-driven decisions. Of course, writing this post on December 2nd, nearly the entire month of December is missing from my own analyses. I’ll follow up (on Twitter) about any data insights that change over the next few weeks.
Top Artists of the Year #
Spotify says my Top 5 artists of the year are:
- Disclosure
- Lane 8
- Kidnap
- Tourist
- Amtrac
My own data shows some slight permutations.
My top 5 artists are nearly the same, but much more influenced by music that I’ve purchased. The overall list instead looks like:
- Tourist
- Amtrac
- Booka Shade
- Jacques Greene
- Lane 8
For the second year in a row, Tourist is my top artist! Kidnap still makes it into the top 10, as my 7th most-listened-to artist so far of 2020.
Disclosure, somewhat hilariously, doesn’t even break the top 10 artists if I am relying on Last.fm data instead of only Spotify. What’s going on there? Turns out Disclosure is my 11th-most-listened to artist, with 97 total listens so far this year. If I dig a little bit deeper, looking at the song Know Your Worth which Spotify says I’ve listened to the most in 2020 by Disclosure, I can see exactly why this is happening.
Disclosure’s latest album, ENERGY, includes a number of collaborations. Disclosure is the main artist for most of these tracks, but in some cases (like with Know Your Worth, which came out as a single February 4, 2020) the artist can be inconsistently stored by different services.
As a result, the Last.fm data has a number of different entries for the same track, with differently-listed artists for each one. Last.fm stores only one artist per track, whereas Spotify stores an array of artists for each track. This data structure decision means that Disclosure should have had about 127 total listens, and been my 7th-most-listened-to artist of 2020, instead of 11th.
This truncated screenshot shows some examples of the permutations of data that exist in my Last.fm data collection, with a total listen count of 127 for Disclosure during 2020.
I had a sneaking suspicion that my Booka Shade listening habits are primarily concentrated on a few songs from an EP that he put out this year, so I dug into how many tracks my total listens for the year were spread across.
Instead, it turns out that my listens to Booka Shade are actually the most distributed across tracks of all of my top 10 artists. Sjowgren is also an outlier here, because they’ve never released an album, so they only have 15 songs in their overall discography yet still made the top 10 artist listens.
Returning my comparison between Spotify and Last.fm data, Amtrac and Lane 8 are in both top 5 lists. This is somewhat expected, because if I look at the top 10 list for artists that I’ve most consistently listened to—artists that I’ve listened to at least once in each month of 2020—both Amtrac and Lane 8 place high in that list.
Given that only 2 days of December have happened as I write this, it’s unsurprising that I’ve only listened to one artist in every month of 2020.
Top Songs of 2020 #
Enough about the artists—what about the songs?
According to Spotify, my Top 5 songs of the year are:
- Apricots by Bicep
- Atlas by Bicep
- Idontknow by Jamie xx
- Cappadocia by Ben Böhmer feat. Romain Garcia
- Know Your Worth by Disclosure feat. Khalid
That pretty closely matches my top 5 list according to Last.fm, with some notable exceptions.
My top 5 tracks according to Last.fm are:
- Apricots by Bicep (38 listens)
- Atlas by Bicep (32 listens)
- Idontknow by Jamie xx (22 listens)
- White Ferrari (Greene Edit) by Jacques Greene (21 listens)
- That Home Extended by The Cinematic Orchestra (20 listens)
The first 3 tracks match, though of course Spotify has an incomplete representation of those listens—I have 29 streams of Apricots according to Spotify.
However, since I bought the track almost as soon as it came out, I also have another 9 listens that have happened off of Spotify. There were also some mysterious things happening with Spotify and Last.fm connections around that time as well, so it’s possible some listens are missing beyond these numbers.
What’s up with the 4th track on the list, though? Where is that in Spotify’s data? It’s actually a bootleg remix of the Frank Ocean song White Ferrari that Jacques Greene shared on SoundCloud and as a free download earlier this year, so it isn’t anywhere on Spotify. It did, however, make it onto my top tracks of 2020 on SoundCloud:
And again, this is a spot where metadata intrudes again and leads to some inconsistent counts. If I look at all the permutations of White Ferrari and Jacques Greene in my data for 2020, the total number of listens should actually be a bit higher, at 23 total listens:
This would actually make it my 3rd-most popular song of 2020 so far, and I’m listening to it as I write this paragraph, so let’s go ahead and call that total number 24 listens.
The 5th-most popular song and 7th-most popular song of 2020 make the case that I haven’t been sleeping very well this year (though I recall these tracks also showed up in 2019 as well…), because those 2 tracks comprise my “Insomnia” playlist that I use to help me fall asleep on nights when I’ve been, perhaps, staying up too late doing data analysis like this.
You can see the influence of consistent listening habits with top artist behaviors when you look at the top 10 songs that I’ve consistently listened to throughout 2020, with 2 songs by Kidnap, one by Bicep, and another by Amtrac.
To me, though, this table mostly underscores how much music discovery this year involved. I didn’t return to the same songs month after month during 2020. Likely as a result of all the DJ sets I’ve been streaming (as I mentioned in my post about Listening to Music while Sheltering in Place) this has been quite a year for music discovery, and breadth of listening habits.
My top 10 songs of 2020 had a total of 222 listens across them. However, I have a total of 14,336 listens for the entire year, spread across 8,118 unique songs in total.
Even with possible metadata issues, that’s still quite the distribution of behavior. Let’s dig a bit deeper into artist discovery this year.
Artist Discovery in 2020 #
In my post earlier this year about my listening behavior while sheltering in place, I discovered that my artist discovery numbers in 2020 seemed to be way up compared with 2018 and 2019, but weren’t actually that far off from 2017 numbers.
What I see when comparing my 2020 artist discovery statistics from my Last.fm data and my Spotify data is even more interesting. In contrast to what seemed to be true in last year’s post, Wrapping up the year and the decade in music: Spotify vs my data (For what it’s worth, last year’s number should have been 1074, instead of 2857 artists discovered—data analysis is difficult), Spotify’s data is much higher than the number I calculated this year.
According to Spotify, I discovered 2,051 new artists, whereas my Last.fm data claims that I only discovered 1,497 artists this year.
Similarly, Spotify claims that I listened to 4,179 artists this year, whereas my Last.fm data indicates that I listened to 3,715 artists.
Again, this comes down to data structures and how the artist metadata is stored for each service. I wrote about the importance of quality metadata for digital streaming providers earlier this year in Why the quality of audio analysis metadatasets matters for music, but it’s also apparent that the data structures for those metadatasets are just as important for crafting data insights of varying value.
Because Spotify stores all artists that contributed to a track as an array, I can listen to a track with 4 contributing artists on it, 1 of which I’ve listened to before, and according to Spotify, I’ve now discovered 3 artists and listened to 4, whereas according to Last.fm, I’ll have either listened to 1 artist that I’ve already heard before, or a new artist, possibly called “Luciano & David Morales".
Spotify would store the second artist as Luciano, David Morales, thus allowing a more accurate count of listens for the Luciano artist. Similarly, my artist discovery data includes some flawed data, such as YouTube videos that got incorrectly recorded.
The Billy Joel and Jimmy Fallon duet of The Lion Sleeps Tonight never gets old, but it appears the original video is no longer on YouTube so I’m not going to link it.
This becomes clear in my top 20 artist discoveries of 2020 chart, where BTS and Big Hit Labels are listed separately, although they are both indicative of one of my best friends joining BTS ARMY this year and sharing her enthusiasm with me.
Ultimately I’m grateful that the top 20 artists of 2020 are all artists that I discovered during the pandemic and have excellent songs that I love and continue to listen to. Many of the sparklines that represent my listening activity for these artists throughout the year have spikes, but mostly my listening patterns indicate that I’ve been returning to these artists and their songs multiple times after first discovery. Some notable favorites on this list are KC Lights' track Girl and Dennis Cruz’s track El Sueño, plus the entire Fennec album Free Us Of This Feeling.
Genre Discovery in 2020 #
The most-commented-on data insight from #wrapped2020 is probably the genre discovery slide.
According to Spotify, I listened to 801 genres this year, including 294 new ones. I’m not even sure I could name 30 genres, let alone 300 or 800. Where are these numbers coming from?
It turns out that, much like storing artist data as an array for each song, Spotify stores genre data as an array for each artist. This means that each artist can be assigned multiple genres, thus successfully inflating the number of genres that you’ve listened to in 2020.
For example, if I use Spotify’s API developer console to retrieve the artist information for Tourist, with a Spotify ID of 2ABBMkcUeM9hdpimo86mo6, it turns out that he has 6 total genres associated with him in Spotify’s database: chillwave, electronica, indie soul, shimmer pop, tropical house, and vapor soul.
I could start discussing the possible meaningless of genres as a descriptive tool, the lack of validation possible for such a signifier, the lack of clarity about how these genres were defined and also assigned to specific artists, but that’s best for another blog post.
Instead, let’s look at what little genre data I do have available to me more generally.
According to Spotify, my top genres were:
- House
- Electronica
- Pop
- Afro House
- Organic House
All of these make sense to me, except for Organic House, because I don’t know what makes house music organic, unless it’s also grass-fed, locally-sourced, and free range. Perhaps Blond:ish is organic house.
I don’t have any genre data from Last.fm, since the service only stores user-defined tags for each artist, and those are not included in the data that I collect from Last.fm today. Instead, I have the genres assigned by iTunes for the tracks that I’ve purchased from the iTunes store.
The top 8 genres of music that I added to my iTunes library in 2020 by purchasing tracks from the iTunes store are:
- Dance (124 songs)
- Electronic (121 songs)
- House (78 songs)
- Pop (37 songs)
- Alternative (27 songs)
- Electronica (12 songs)
- Deep House (10 songs)
- Melodic House & Techno (9 songs)
Clearly, this is a very selective sample, and is only tied to select purchasing habits, which are roughly correlated to my listening habits.
I shared all of this genre data to essentially look at it and go “wow, that wasn’t very insightful at all”. Let’s move on.
Time Spent Listening to Music in 2020 #
The last metric I want to unpack from Spotify’s #wrapped2020 campaign is the minutes listened data insight. According to Spotify, I spent 59,038 minutes listening to music this year.
According to my own calculations, I spent roughly 81,134 minutes listening to music in 2020.
Let’s talk about how both of these metrics are super flawed!
Spotify counts a song as streamed after you listen to it for more than 30 seconds (per their Spotfiy for Artists FAQ), so it’s logical to assume that this minutes listened metric likely from a calculation of “number of streams for a track” x “length of track” and then rounded and converted to minutes. It could even result from an different type of calculation, “number of total streams” x “average length of track in Spotify library”, but I have no way of knowing if either of these are accurate besides tweeting at Spotify and hoping they’ll pay attention to me.
Unfortunately for all of us, but mostly me, my own minutes listened metric is just as lazily calculated. I don’t have track length data for all the tracks that I listen to and I don’t know at what point Last.fm counts a track as being worthy of a scrobble. I do have a list of how much time I spent listening to livestreamed DJ sets online, and I do have some excellent estimation skills. I calculated my number of 81,134 minutes so far in 2020 by calculating and assuming the following:
- An average track length of 4 minutes
- An average concert length of 3 hours
- An average DJ set length of 4 hours
- An average festival length of 8 hours
Using those averages and estimates, I calculated the total amount of time I spent listening to music across Last.fm listening habits, concerts and DJ sets attended (no festivals this year), and livestreams that I watched online, thus arriving at 81,134 minutes. That doesn’t count any DJ sets that I listened to on SoundCloud, and certainly the combination of a 4 minute track length estimate with the uncertainty of what qualifies a track as being scrobbled makes this data insight somewhat meaningless.
Regardless, let’s compare this estimated time spent listening in minutes against the total number of minutes in a year.
Beautiful. I still remembered to sleep this year. No matter which dataset I use, however, it’s clear that I’ve listened to more music in 2020 than in 2019. Spotify’s metric for this same time period in 2019 was 35,496 minutes. The less-flawed but less-complete metric I used last year, calculated using the track length stored in iTunes multiplied by the number of listens for that track, indicated that I spent 14,296 minutes listening to music in 2019.
As one final Spotify examination, let’s dig into the Spotify Top 100 playlist.
Top 100 Songs of 2020 Playlist #
Alongside the fancy graphics and data insights in the #wrapped2020 campaign, Spotify also creates a 100 song playlist, likely (but not definitively) the top 100 songs of the time period between January 1st, 2020 and October 31st, 2020.
I found my playlist this year to be relatively accurate, perhaps because I spent more time listening to Spotify than I might have in previous years, or perhaps they made some internal data improvements, or both! I often spend more time listening to SoundCloud if I’m traveling a lot, listening to offline DJ sets on plane flights; or listening to Apple Music on my iPhone, with songs that I’ve added from my iTunes library. Without much time spent commuting or traveling this year, it’s likely that my listening habits remained fairly consolidated.
Similarly to what I discovered about my top 10 tracks, I had relatively distributed music interests this year. The 811 total listens for all 100 songs in my Spotify playlist represent just 0.06% of my total listens in 2020 so far.
Despite my overall listening habits being relatively distributed across lots of artists and songs, the Top Songs playlist is somewhat more consolidated, with 69 artists performing the 100 songs on the playlist. Nice.
It’s clear that I spent most of this year exploring and discovering new artists, given that 83 of my top songs of 2020 according to Spotify were songs that I discovered in 2020.
Thanks for coming on this journey through my music data with me. I’ll be back at the actual end of the year to dive deeper into my top 10 artists of the year, top 10 consistent artists of the year, my music purchasing activity, as well as some more livestream and concert statistics to round out my 2020 year in music.