Techniques for deduplicating tracks


Each time a track is released (or rereleased), it gets a new track ID. This presents issues for my application, where it would be more convenient to have an ID for the recording itself. Do you guys have any tips to help identify which tracks are really the same?


If tracks are the same, they can share an ISRC (found under external_ids). You can get them with the Get Several Tracks endpoint.
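A minimal sketch of that idea, assuming you already have full track objects (for example from Get Several Tracks) as Python dicts with an `external_ids` field. The helper name `group_by_isrc` and the sample data are made up for illustration; tracks without an ISRC are simply left out of the grouping.

```python
from collections import defaultdict


def group_by_isrc(tracks):
    """Map each ISRC to the IDs of the tracks that carry it."""
    groups = defaultdict(list)
    for track in tracks:
        # external_ids may be missing or may not contain an "isrc" entry
        isrc = (track.get("external_ids") or {}).get("isrc")
        if isrc:
            groups[isrc.upper()].append(track["id"])
    return dict(groups)


# Hypothetical sample data shaped like Spotify track objects
tracks = [
    {"id": "a1", "name": "Song", "external_ids": {"isrc": "USABC1234567"}},
    {"id": "b2", "name": "Song", "external_ids": {"isrc": "USABC1234567"}},
    {"id": "c3", "name": "Other", "external_ids": {}},
]

# Keep only ISRCs shared by more than one track ID
same_recording = {k: v for k, v in group_by_isrc(tracks).items() if len(v) > 1}
print(same_recording)  # {'USABC1234567': ['a1', 'b2']}
```

Any ISRC that maps to more than one track ID is a candidate for "same recording, different release".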

XimzendSpotify Star
Help others find this answer and click "Accept as Solution".
If you appreciate my answer, maybe give me a Like.
Note: I'm not a Spotify employee.

ISRCs were my first intuition, but in many of the cases I looked at, a rerelease had been given a new ISRC as well, so they often have the same issue.

You can achieve this by counting how many times a track with the same name and artists occurs, then deleting all but one of those that occur more than once. However, since they recently downgraded the delete tracks endpoint, if you try to delete duplicate tracks that ARE actually the same track (i.e. they have the same ID), all of them will be removed instead of just the extra ones.

 

Here is how I am doing it in my app:

    def remove_duplicates_from_playlist(self, playlist_id: str):
        playlist_items = self.get_playlist_items(playlist_id)
        # Group distinct track URIs by (sorted lowercase artist names, lowercase track name)
        track_map = {}
        for idx, item in enumerate(playlist_items):
            uri = item["track"]["uri"]
            track_name = item["track"]["name"].lower()
            # Sorted tuple instead of a set, so the key does not depend on set ordering
            track_artists = tuple(sorted(a["name"].lower() for a in item["track"]["artists"]))
            key = (track_artists, track_name)
            if key in track_map:
                known_uris = [t["uri"] for t in track_map[key]]
                if uri not in known_uris:
                    # Don't add multiple instances of the exact same track, because they will all be removed
                    track_map[key].append({"uri": uri, "positions": [idx]})
            else:
                track_map[key] = [{"uri": uri, "positions": [idx]}]

        to_remove = []
        for tracks in track_map.values():
            if len(tracks) > 1:
                # Keep the last distinct URI in each group and remove the others
                to_remove.extend(tracks[:-1])

        if to_remove:
            # Delete endpoint is limited to 100 items
            batch = to_remove[:100]
            self.spotify_client.playlist_remove_specific_occurrences_of_items(playlist_id, batch)
            if len(to_remove) > 100:
                # Repeat the whole process if there are more than 100 to remove, as the positions of the tracks will have changed
                self.remove_duplicates_from_playlist(playlist_id)
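The two suggestions in this thread can also be combined: treat tracks as the same recording if they share an ISRC or share the same lowercase title and artists. Below is a rough sketch of that grouping using a small union-find, so that matches chain together. It works on plain dicts shaped like Spotify track objects and makes no API calls; the function name `recording_groups` and the sample data are hypothetical, and the exact title match is deliberately strict (a title like "Song - Remastered" only joins a group here if it shares an ISRC).

```python
def recording_groups(tracks):
    """Group track IDs that share an ISRC or a (title, artists) key."""
    parent = {}

    def find(x):
        # Follow parent links to the group root, shortening the path on the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # matching key -> first track ID that had it
    for t in tracks:
        tid = t["id"]
        parent.setdefault(tid, tid)
        keys = []
        isrc = (t.get("external_ids") or {}).get("isrc")
        if isrc:
            keys.append(("isrc", isrc.upper()))
        artists = tuple(sorted(a["name"].lower() for a in t["artists"]))
        keys.append(("name", t["name"].lower(), artists))
        for key in keys:
            if key in first_seen:
                union(tid, first_seen[key])
            else:
                first_seen[key] = tid

    groups = {}
    for tid in parent:
        groups.setdefault(find(tid), []).append(tid)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sample: "b" is a rerelease with a new ISRC, "c" shares that ISRC
sample = [
    {"id": "a", "name": "Song", "artists": [{"name": "X"}],
     "external_ids": {"isrc": "USAAA0000001"}},
    {"id": "b", "name": "Song", "artists": [{"name": "X"}],
     "external_ids": {"isrc": "USAAA0000002"}},
    {"id": "c", "name": "Song - Remastered", "artists": [{"name": "X"}],
     "external_ids": {"isrc": "USAAA0000002"}},
    {"id": "d", "name": "Other", "artists": [{"name": "Y"}], "external_ids": {}},
]
print(recording_groups(sample))  # [['a', 'b', 'c'], ['d']]
```

Here "a" and "b" are linked by title and artists, and "b" and "c" by ISRC, so all three end up in one group. Name-based matching can also merge genuinely different recordings (live versions with identical titles, for instance), so treat the groups as candidates rather than certainties.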
           

 

 

JPerez's app spotify-dedup achieves proper deduplication for both exact and non-exact matches, though. Check this thread for more info on this.
