Techniques for deduplicating tracks

Each time a track is released (or rereleased), it gets a new track ID. This is a problem for my application, where it would be more convenient to have a single ID per recording. Do you guys have any tips for identifying which tracks are really the same?

If tracks are the same recording, they can share an ISRC (found under external_ids in the track object). You can get them with the Get Several Tracks endpoint.
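
As a rough sketch of the ISRC idea: once you have the track objects, you can bucket their IDs by ISRC. This assumes each track is a dict shaped like a Web API track object, with the code under external_ids["isrc"]. The function name group_by_isrc and the sample IDs and ISRCs are made up for illustration, and fetching the tracks is left out.

```python
from collections import defaultdict


def group_by_isrc(tracks):
    """Bucket track IDs by ISRC; IDs of tracks without an ISRC are returned separately."""
    groups = defaultdict(list)
    missing = []
    for track in tracks:
        isrc = (track.get("external_ids") or {}).get("isrc")
        if isrc:
            # Normalise case and hyphens so "us-xxx-24-00001" matches "USXXX2400001"
            groups[isrc.replace("-", "").upper()].append(track["id"])
        else:
            missing.append(track["id"])
    return dict(groups), missing


# Hypothetical sample data (made-up IDs and ISRCs, not real tracks)
sample_tracks = [
    {"id": "track_a", "external_ids": {"isrc": "USXXX2400001"}},
    {"id": "track_b", "external_ids": {"isrc": "us-xxx-24-00001"}},
    {"id": "track_c", "external_ids": {"isrc": "GBYYY2300042"}},
    {"id": "track_d", "external_ids": {}},
]

groups, missing = group_by_isrc(sample_tracks)
```

Any bucket holding more than one ID is a candidate set of the same recording. Tracks with no ISRC still need some other comparison.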

ISRCs were my first thought, but in many of the cases I looked at, a rerelease gets a new ISRC as well, so they often have the same problem.

You can achieve this by checking how many times a track with the same name and artists occurs, and then deleting all but one of those that occur more than once. However, since they recently downgraded the delete tracks endpoint, if you try to delete duplicate tracks that ARE actually the same track (i.e. they have the same ID), all of them will be removed instead of just the extra ones.
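
To make the name-and-artists check concrete, here is a minimal standalone sketch with no API calls. It assumes track dicts with "id", "name" and "artists" fields like the Web API returns. The helper names name_artist_key and find_repeats and the sample data are made up.

```python
from collections import defaultdict


def name_artist_key(track):
    """Build a comparison key from the lowercased name and the sorted artist names."""
    name = track["name"].strip().lower()
    # Sorting makes the key independent of the order artists are listed in
    artists = tuple(sorted(a["name"].strip().lower() for a in track["artists"]))
    return (name, artists)


def find_repeats(tracks):
    """Return {key: [track ids]} for every key that occurs more than once."""
    seen = defaultdict(list)
    for track in tracks:
        seen[name_artist_key(track)].append(track["id"])
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


# Hypothetical sample data
sample = [
    {"id": "a1", "name": "Song One", "artists": [{"name": "Alpha"}, {"name": "Beta"}]},
    {"id": "a2", "name": "song one ", "artists": [{"name": "Beta"}, {"name": "Alpha"}]},
    {"id": "a3", "name": "Song One (Live)", "artists": [{"name": "Alpha"}]},
]

repeats = find_repeats(sample)
```

Note that an exact string match like this treats "Song One (Live)" as a different track, which may or may not be what you want.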

 

Here is how I am doing it in my app:

    def remove_duplicates_from_playlist(self, playlist_id: str):
        playlist_items = self.get_playlist_items(playlist_id)
        # Group playlist entries by a (name, artists) key
        track_map = {}
        for idx, item in enumerate(playlist_items):
            uri = item["track"]["uri"]
            track_name = item["track"]["name"].lower()
            # Sort the artists so the key does not depend on set iteration order
            track_artists = tuple(sorted(a["name"].lower() for a in item["track"]["artists"]))
            key = (track_name, track_artists)
            if key in track_map:
                known_uris = [t["uri"] for t in track_map[key]]
                if uri not in known_uris:
                    # Don't add multiple instances of the exact same track, because they will all be removed
                    track_map[key].append({"uri": uri, "positions": [idx]})
            else:
                track_map[key] = [{"uri": uri, "positions": [idx]}]

        # Keep the last track in each group and mark the rest for removal
        to_remove = []
        for tracks in track_map.values():
            if len(tracks) > 1:
                to_remove.extend(tracks[:-1])

        if to_remove:
            # Delete endpoint is limited to 100 items
            batch = to_remove[:100]
            self.spotify_client.playlist_remove_specific_occurrences_of_items(playlist_id, batch)
            if len(to_remove) > 100:
                # Repeat the whole process if there are more than 100 to remove, as the positions of the tracks will have changed
                self.remove_duplicates_from_playlist(playlist_id)

JPerez's app spotify-dedup handles both exact and non-exact matches properly, though. Check this thread for more info.
