How to interpret pitch and timbre values from audio_analysis?

I am currently working on a project that will read notes from a song. I would like to investigate the values for pitch and timbre, which are available in audio_analysis, in the segments section, which is very convenient and great! But how do I interpret those? What are the max/min values? Are there ranges for certain notes?

Hi Lauralicja, In general, pitch refers to the perceived fundamental frequency of a sound, but Spotify's audio analysis does not report it in Hertz (Hz). Instead, it reports how strongly each of the 12 pitch classes (C, C#, D, and so on up to B) is present in a segment. Timbre refers to the tone quality or texture of a sound, which is a complex concept that is difficult to quantify in absolute terms.

In the segments section of the audio analysis, each segment has a single pitches array with 12 values, one for each pitch class of the chromatic scale. The first value (index 0) corresponds to C, the next to C#, and so on up to B at index 11. The array describes the segment as a whole; it is not a time series of pitch changes within the segment.
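As a rough sketch of how you could read a segment's pitches array, here is a small Python example. It assumes the layout described in Spotify's Web API reference (12 values per segment, index 0 = C through index 11 = B). The names `PITCH_CLASSES` and `dominant_pitch_classes`, the 0.8 threshold, and the sample values are all made up for illustration; they are not part of the API.

```python
# Pitch class names in the order assumed for the pitches array (index 0 = C).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def dominant_pitch_classes(pitches, threshold=0.8):
    """Return the names of pitch classes whose strength is at least `threshold`.

    `threshold` is an arbitrary cutoff chosen for this sketch, not an API value.
    """
    if len(pitches) != 12:
        raise ValueError("expected 12 values, one per pitch class")
    return [name for name, value in zip(PITCH_CLASSES, pitches) if value >= threshold]


# Hypothetical segment, with values invented for illustration only.
segment = {
    "pitches": [0.12, 0.05, 0.08, 0.03, 0.91, 0.10, 0.04, 0.30, 0.06, 1.0, 0.07, 0.02]
}

print(dominant_pitch_classes(segment["pitches"]))  # ['E', 'A']
```

One strong value suggests a single note, while two or more strong values (like E and A here) suggest an interval or chord. You would likely need to tune the threshold for your own material.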

Each segment also has a single timbre array with 12 values, which together form one 12-dimensional timbre feature vector for that segment. Like the pitches array, it summarizes the segment as a whole; to see how timbre changes over time, compare the vectors of consecutive segments.

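The values in the pitches array are decimal values between 0 and 1 that describe the relative dominance of each pitch class, normalized so that the strongest pitch class in the segment is 1. They are not frequencies, and the array carries no octave information, so you can tell that a segment is dominated by A, but not whether it is A3 or A4 (440 Hz). A pure tone or single note shows one value near 1 and the rest near 0, a chord shows a few strong values, and noisy sounds have all values close to 1. As for ranges for certain notes: in 12-tone equal temperament, each octave is divided into 12 semitones, with each semitone corresponding to a frequency ratio of 2^(1/12), or approximately 1.0595. So, for example, with A4 tuned to 440 Hz, Bb4 is about 466.16 Hz. Getting actual frequencies like these would require your own pitch detection on the audio; the pitches array only gives you the pitch class.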
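To make the semitone arithmetic concrete, here is a minimal Python sketch that computes note frequencies in 12-tone equal temperament. It assumes A4 = 440 Hz as the reference; the function name `note_frequency` and its parameters are just illustrative, and none of this comes from the Spotify API itself.

```python
def note_frequency(semitones_from_a4, a4_hz=440.0):
    """Frequency of the note `semitones_from_a4` semitones above (or below) A4.

    Each semitone multiplies the frequency by 2 ** (1 / 12).
    """
    return a4_hz * 2 ** (semitones_from_a4 / 12)


print(round(note_frequency(1), 2))    # Bb4 -> 466.16
print(round(note_frequency(12), 2))   # A5  -> 880.0
print(round(note_frequency(-9), 2))   # C4  -> 261.63
```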

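The timbre values are more difficult to interpret in absolute terms. According to Spotify's Web API reference, they are unbounded values roughly centered around 0, so there is no fixed min or max. They are high-level abstractions of the spectral surface, ordered by degree of importance: the first dimension represents the average loudness of the segment, the second emphasizes brightness, the third is more closely correlated with the flatness of a sound, and the fourth with sounds that have a stronger attack. They are best used by comparing segments with each other, and they can also be used to train machine learning models that classify sounds based on their timbral characteristics.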
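Since timbre vectors are mostly useful in comparison, one simple approach is to measure the distance between the vectors of two segments. The sketch below uses plain Euclidean distance, which is my own choice for illustration and not something Spotify prescribes; the function name `timbre_distance` and the sample vectors are made up.

```python
import math


def timbre_distance(timbre_a, timbre_b):
    """Euclidean distance between two timbre vectors of equal length."""
    if len(timbre_a) != len(timbre_b):
        raise ValueError("timbre vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(timbre_a, timbre_b)))


# Hypothetical 12-value timbre vectors, invented for illustration only.
segment_a = [0.0] * 12
segment_b = [3.0, 4.0] + [0.0] * 10

print(timbre_distance(segment_a, segment_b))  # 5.0
```

A smaller distance means the two segments sound more alike in tone quality. Because the first dimension tracks loudness, you may want to drop or down-weight it if you only care about the character of the sound.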

Sorry... Kind of threw a lot at you lol I hope this helps! Let me know if you have any further questions.

Highest regards,

-Prague the Dog
