Wednesday 27 January 2010

MusicHackday: Stockholm

So in a touch more than 48hrs I'll be hoping on a plane to go the Stockholm MusicHackday. It should be excellent, if the last one I went to is any judge. I'll be joined by fellow ISMS member Mike Jewell. The hack is being formulated, but may involve The World Bank's api and some yet to be determined sources of listener statistics. Also, somehow the echonest's api will be involved because I need to leave stockholm with one of these. We may need some further assistance to get something done in 24hrs, so if you're going to be at the hack and are looking for some folk to hack with drop a line in the comments...

A bit about playlists and similarity

Sorry about the general radio silence of late. Many things going on, most of them interesting.
Lately I've been spending quite a bit of time considering various aspects of playlist generation and how they all fit together. Here are some of my lines of thought:
  1. Evaluation of a playlist. How? Along which dimension? (Good v. Bad, Appropriate v. Offensive, Interesting v. Boring)
  2. How do people in various functions create playlists? How does this process and its output compare to common (or state of the art) methods employed in automatic playlist construction. This is to say, are we doing it right? Are the correct questions even being asked?
  3. What is the relationship between notions of music similarity (or pairwise relationship in the generic) and playlist construction?

While all these ideas are interrelated, for now I'm going to pick at point (3) a bit. I'm coming to believe this is central in understanding the other two points as well, at least to an extent. There are many ways to consider how two songs are related. In music informatics this similarity is almost always content-based, even if it isn't content derived. This can include methods based on timbral or harmonic features or most tags or similar labels (though these sometimes get away from content descriptors). This paints some kind of picture but leaves out something that can be critical to manual playlist construct as it is commonly understood (e.g. in radio or the creation of a 'mixtape'), socio-cultural context. In order to have the widest array of possible playlist constructions, it is necessary to have as complete an understanding of the relationship between member songs (not just neighbors...). Put another way, the complexity of your playlist is maximally bound by the complexity of your similarity measure.
M<=Cs
Where M is some not yet existant measure of the possible semantic complexity of a playlist and s is a similar measure of the semantic complexity of the similarity measure used in the construction of that playlist. C is our fudge factor constant. Now, obviously there are plenty of situations where complex structure isn't required. But if the goal is to make playlists for a wide range of functions and settings, it will be required some times.

In practice what this means is that you can make a bag of songs from a bag of features. However, imparting long form structure is at a minimum dependant on a much more complex understanding of the relationships (eg. sim) between songs (say from social networks or radio logs...)

Anyway, this is all a bit vague right now. I'm working on some better formalization, we'll see how that goes. Anyone have any thoughts?