Wednesday 12 November 2008

speaking of making playlists

I'll be making some the old-fashioned way and broadcasting the result tonight (and every Wednesday) from 6pm GMT (about half an hour after this post goes up) till 8pm on Goldsmiths Student Radio. Live D'n'B and breakcore!

Sunday 12 October 2008

mypyspace status update

So MyPySpace has been getting a facelift. Kurt, with some input from users (we apparently have users! Who knew?), has been refactoring the RDF translators and fine-tuning the MySpace ontology as well. Most (all?) of these changes are also being reflected in the live service. While all of this has been going on, I've refactored the page scraping, crawling and downloading from its former stream-of-consciousness implementation (I believe the polite description is 'research code') into a much more sensible class architecture. It still has quite a ways to go (alpha!), but it's starting to resemble an actual library. If you want to play with my refactored bits, you can check them out like this:

> svn co https://mypyspace.svn.sourceforge.net/svnroot/mypyspace/myspaceCrawler/trunk/ myspaceCrawler

Then you can do nifty things like the following (inside your favorite Python interpreter or script; I'm using IPython here):

In [1]: import mpsUser

In [2]: gearmonkey = mpsUser.mpsUser('http://www.myspace.com/gearmonkey')

You simply give the class a valid MySpace user URL to initialize it (this is my artist page; if you want to play with this, don't feel the need to listen to my music...).

In [3]: gearmonkey.isArtist
Out[3]: True


In [4]: gearmonkey.downloadTracks('~/Music/mpsUsertest/gearmonkey/')
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/gearmonkey/1_Cheeky.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/gearmonkey/2_TrainTune.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/gearmonkey/3_Give Way.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/gearmonkey/4_En La Selva Mvt II GMO vip.mp3; creating tag from scratch
Out[4]: (4, 4)

You can then find out whether your user is an artist by checking the boolean isArtist. If it's an artist, you can download their songs with downloadTracks, which returns a tuple of (songs successfully downloaded, downloads attempted).
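If you'd rather do this from a script than an interactive session, something like the helper below checks isArtist before calling downloadTracks and turns the returned (successful, attempted) tuple into a count of failed downloads. This is only a sketch: download_if_artist is a hypothetical name, not part of MyPySpace, and FakeUser is a made-up stand-in so the snippet runs without hitting MySpace; with the real library you'd pass an mpsUser.mpsUser instance instead.

```python
def download_if_artist(user, dest_dir):
    """Download a user's tracks if they are an artist.

    Returns None for non-artists; otherwise the number of failed downloads,
    computed from the (succeeded, attempted) tuple downloadTracks returns.
    """
    if not user.isArtist:
        return None
    succeeded, attempted = user.downloadTracks(dest_dir)
    return attempted - succeeded


class FakeUser(object):
    """Hypothetical stand-in for mpsUser.mpsUser, for illustration only."""

    def __init__(self, is_artist, results):
        self.isArtist = is_artist
        self._results = results  # the (succeeded, attempted) tuple to fake

    def downloadTracks(self, dest_dir):
        return self._results


print(download_if_artist(FakeUser(True, (4, 4)), '~/Music/test/'))
print(download_if_artist(FakeUser(True, (5, 6)), '~/Music/test/'))
print(download_if_artist(FakeUser(False, (0, 0)), '~/Music/test/'))
```

Returning None rather than 0 for non-artists keeps "not an artist" distinguishable from "an artist whose downloads all worked".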

In [5]: gearmonkey.songs[0].title
Out[5]: u'Cheeky'

As part of the download process, each song is stored as an instance of the class mpsSong (more on that class in a bit).

You can use the mpsUser class to crawl the artist network like this:

In [6]: artistFriends = gearmonkey.findArtistTopFriends()

In [7]: artistFriends
Out[7]: 
[<mpsUser.mpsUser instance at 0x1a3a7b0>,
 <mpsUser.mpsUser instance at 0x1c5d788>,
 <mpsUser.mpsUser instance at 0x1a3ad50>,
 <mpsUser.mpsUser instance at 0x1a3a968>,
 <mpsUser.mpsUser instance at 0x1c7fc88>]

In [8]: artistFriends[0].artist
Out[8]: u'Mike'

In [9]: for entry in artistFriends:
   ...:     print entry.artist
   ...:    
Mike
Otto Von Schirach
GEIN
The Dead Hookers' Bridge Club
EVOL


In [10]: artistFriends[2].downloadTracks('~/Music/mpsUsertest/GEIN/')
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/GEIN/1_Life Of Sin GEIN edit.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/GEIN/2_Deadly Algorhythm GEIN Remix.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/GEIN/3_GEIN KJ Sawka Break the Enemy.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/GEIN/4_GEIN  Warden.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/GEIN/5_GEIN vsThe ChosenAbomination.mp3; creating tag from scratch
INFO:root:No ID3 header found for /Users/bfields/Music/mpsUsertest/GEIN/6_GEIN  Hell Audio rmx.mp3; creating tag from scratch
Out[10]: (6, 6)

In [11]: artistFriends[2].topFriends
Out[11]: 
[u'11187934',
 u'2123795',
 u'2177245',
 u'706581',
 u'20492111',
 u'66601290',
 u'5017015',
 u'207669100',
 u'52365642',
 u'2186134',
 u'3378431',
 u'55609497',
 u'30244',
 u'26629700',
 u'80613962',
 u'74772580',
 u'28841051',
 u'317327']

In [12]: geinArtistFriends = artistFriends[2].findArtistTopFriends()

In [13]: geinArtistFriends
Out[13]: 
[<mpsUser.mpsUser instance at 0x1d83a30>,
 <mpsUser.mpsUser instance at 0x1d95710>,
 <mpsUser.mpsUser instance at 0x1d99198>,
 <mpsUser.mpsUser instance at 0x1d95be8>,
 <mpsUser.mpsUser instance at 0x1d956c0>,
 <mpsUser.mpsUser instance at 0x1e6fa80>,
 <mpsUser.mpsUser instance at 0x1e81b98>,
 <mpsUser.mpsUser instance at 0x1da7080>,
 <mpsUser.mpsUser instance at 0x1e81b70>,
 <mpsUser.mpsUser instance at 0x1f2dee0>]

In [14]: for friend in geinArtistFriends:
   ....:     print friend.artist
   ....:    
EVOL
GUERILLA®
THE GUN
Habit Recordings
Mumblz / Delusional
Tech Itch
Lost Soul Recordings
None
Donny
NECRO THE SEXORCIST SPECIAL EDITION CD/DVD SOON!!!


And so on and so forth. Once you've initialized the songs for an artist, you can use the mpsSong class structure to find things out about the songs as well:

In [15]: gearmonkey.songs
Out[15]: 
[<mpsUser.mpsSong instance at 0x1a155d0>,
 <mpsUser.mpsSong instance at 0x1a2ff08>,
 <mpsUser.mpsSong instance at 0x1a338f0>,
 <mpsUser.mpsSong instance at 0x1a338c8>]

In [16]: for song in gearmonkey.songs:
   ....:     print song.title + " by " + song.parent.artist + " has been played " + song.playcount + " times." 
   ....:    
Cheeky by G_M_O has been played 117 times.
TrainTune by G_M_O has been played 168 times.
Give Way by G_M_O has been played 88 times.
En La Selva Mvt II GMO vip by G_M_O has been played 9 times.


There are also some simple hooks to call fftExtract on an artist's songs, but I'll save those bits for another post. One quick note: I don't believe we've fixed the bug that prevents song downloads in the US (and maybe Canada), but the URL requests have changed slightly, so if anyone tries it over there, let me know. All the scraping should be fine in the States, and everything should work everywhere else. Also, you need the mutagen ID3 tag library installed before using this.
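The hop-by-hop use of findArtistTopFriends in the session above generalizes to a breadth-first crawl of the artist network. Below is a rough sketch of that idea, not code from the library: crawl_artists and get_friends are hypothetical names, and the toy dictionary (a small subset of the friends listed in the session above) stands in for live page requests. With the real classes, get_friends would wrap findArtistTopFriends and you'd key the visited set on something hashable and stable, such as the friend IDs that topFriends returns, rather than on mpsUser instances.

```python
from collections import deque


def crawl_artists(seed, get_friends, max_depth=2):
    """Breadth-first crawl of an artist network.

    seed: starting artist identifier
    get_friends: callable returning the artist friends of an identifier
                 (a hypothetical stand-in for findArtistTopFriends)
    max_depth: number of hops to follow away from the seed
    Returns a dict mapping each reached artist to its hop distance.
    """
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if depth[current] >= max_depth:
            continue  # reached the hop limit, so don't expand this artist
        for friend in get_friends(current):
            if friend not in depth:  # skip artists already visited
                depth[friend] = depth[current] + 1
                queue.append(friend)
    return depth


# Hypothetical toy network built from part of the session output above.
toy_network = {
    'gearmonkey': ['Mike', 'GEIN', 'EVOL'],
    'GEIN': ['EVOL', 'Tech Itch', 'Habit Recordings'],
}


def friends_of(artist):
    return toy_network.get(artist, [])


print(crawl_artists('gearmonkey', friends_of, max_depth=1))
print(crawl_artists('gearmonkey', friends_of, max_depth=2))
```

Breadth-first order means every artist is recorded at its shortest hop distance from the seed, and the depth limit keeps the number of page requests bounded, which matters when every hop is a scrape.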

If any readers do give this a try, let me know your thoughts (especially interface-related ones) in the comments below.

Wednesday 17 September 2008

introductory thoughts on a playlist generation task in MIREX 2009

I'm currently knee-deep in ISMIR activities, which I'll write up more thoroughly when everything has finished. Now, however, I think it will be useful to briefly discuss a MIREX 2009 task I'll be pushing for: automatic playlist generation. It will be very useful to formalize as much of this task as possible, as soon as possible, in an effort to encourage participation and avoid the now almost customary MIREX August rush.

With that in mind, here are some of the basic starting questions, followed by my [brief and biased] thinking on them:

  1. What is a playlist?

    From Wikipedia:
    In its most general form, a playlist is simply a list of songs. The term has several specialized meanings in the realms of radio broadcasting and personal computers.
    The term originally came about in the early days of top 40 radio formats when stations would devise (and, eventually, publish) a limited list of songs to be played. ...
    As music storage and playback using personal computers became common, the term playlist was adopted by various media player software programs intended to organize and control music on a PC such as Nexus from NexTune. ...
    Some websites allow categorization, editing, and listening of playlists online, such as Project Playlist, Plurn, imeem and Webjay. ...

    This is a sensible starting place from a broad definitional standpoint, especially when we acknowledge that the exact MIREX task will obviously be a subset of this broad idea of a playlist (more on that later...).

  2. How do we quantitatively evaluate an automatically generated playlist?
    Upon brief reflection, this is clearly the sticky bit, which is why this question is being posed before any attempt to specify an exact task. Functionally speaking, if the evaluation is properly specified, the task description will practically fall out of it.
    There seem to be a few approaches to this, though many tend to revolve around the idea of mining some form of ground truth out of crowd wisdom (here and here, for instance).

  3. And of course, what exactly should the task be?
    Again, this is all very draftish, but my prime thought is to have a highly specific task of one (or all) of the following forms: given a query of songs A, B, C, the algorithm generates song D (and perhaps song E, where order matters); or, as a simple variation of that form, given A, B, E, the algorithm generates C, D. The other possibility is something along the lines of: given a query song, provide the next song that should occur in the list.
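To make the "query with A, B, C, generate D" form a bit more concrete, here is one possible scoring scheme, sketched under some loud assumptions: a set of ground-truth playlists already exists (for example, mined from crowd-sourced playlist sites), each playlist is a list of song IDs, and a submission is a predict function from a query to a single next song. The names next_song_hit_rate and predict are hypothetical, and exact-match scoring is just the simplest choice, not a settled proposal; a real evaluation would probably want something softer, since many songs could reasonably follow a given query.

```python
def next_song_hit_rate(playlists, predict, query_len=3):
    """Score a next-song predictor against ground-truth playlists.

    For each playlist longer than query_len, the first query_len songs form
    the query (A, B, C) and the song right after them is the target (D).
    Returns the fraction of queries for which predict() returned the target.
    """
    hits = 0
    trials = 0
    for playlist in playlists:
        if len(playlist) <= query_len:
            continue  # too short to form a query plus a target
        query = playlist[:query_len]
        target = playlist[query_len]
        trials += 1
        if predict(query) == target:
            hits += 1
    return float(hits) / trials if trials else 0.0


# Hypothetical toy ground truth using made-up song IDs.
ground_truth = [
    ['s1', 's2', 's3', 's4'],
    ['s2', 's3', 's4', 's5'],
    ['s1', 's2'],  # ignored: no target after a three-song query
]


def always_s4(query):
    """Trivial baseline that always guesses the same song."""
    return 's4'


print(next_song_hit_rate(ground_truth, always_s4))
```

The baseline scores 0.5 here (one hit out of two usable playlists). Having trivial baselines like this in the evaluation would help show how much any submitted algorithm actually adds.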

So anyway, those are my initial thoughts on this potential task for next year's MIREX. Please comment if you have thoughts...

Thursday 4 September 2008

A brave new world

This morning as I 'opened' my NY Times, what did I discover but a review of Google's new browser, not in the tech section or the style section, but as an editorial piece. Who would have thought that a New York Times editorial would read like something from Ars Technica? Clearly these are interesting times.

Monday 1 September 2008

in between an ICMC and an ISMIR

So I'm gathering my thoughts on ICMC (I presented this poster) while simultaneously prepping for my poster/presentation for ISMIR (paper).

ICMC was very interesting. Kurt has a nice summary of a few of the notable papers. I did actually go to Bob Sturm's presentation on Friday, and their tech has some amazing potential as well as a rather slick interface (especially for research software...). Beyond this, though, I spent a good portion of the paper sessions attending the discussions on aesthetic and philosophical considerations, partly because I have approximately no formal background in that sort of thing and partly because I'm setting out to make playlists (and mix them) in a way that's pleasing (or at least interesting). So I heard a very interesting talk by Gary Kendall presenting an attempt at an event-based descriptor schema for Electro-Acoustic music (I can't find the paper online...). While I must admit that the details of the schema's application to Electro-Acoustic music left something to be desired, I found the general idea of defining event-based descriptors for things like expectation and energy expended to be very intriguing.

I have some video from the mobile phone orchestra performance, which I'll be posting a bit later, provided it's worth looking at.

Back to prepping for ISMIR.

Tuesday 24 June 2008

A mashed08 debrief

So over the weekend I attended the BBC's hackfest, Mashed08. It was a two-day marathon where the BBC and other sponsors (Yahoo was pushing its new location tech, Fire Eagle, for instance) walked you through some newish API or otherwise accessible tech, and then you had 24 hours to make something rad. At the end, each sponsor gave out prizes for the hacks they judged to be the best at using their bits.

Yves, Kurt and I set out to make something rad as well. The basic idea was to recommend BBC radio brands (shows) based on the music in your personal collection. We (mostly, kind of) made this work.

The idea is that you generate RDF describing your audio collection (or a subset of it) using gnat, which is part of the Music Ontology tools. This RDF file contains the MusicBrainz IDs for your tracks, which can then be cross-referenced against the available playlists for BBC radio shows in order to find the best matches.
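The cross-referencing step boils down to set intersection on MusicBrainz IDs. Here's a minimal sketch of it, assuming the IDs have already been pulled out of the gnat RDF (parsing isn't shown) and the BBC playlist data has already been fetched into a dictionary of show name to played IDs. The function name, show names and IDs are all hypothetical placeholders; this isn't the hack's actual code.

```python
def matching_shows(collection_mbids, show_playlists):
    """Find radio shows whose playlists share tracks with a collection.

    collection_mbids: set of MusicBrainz IDs from your collection
    show_playlists: dict mapping a show name to an iterable of played MBIDs
    Returns a dict mapping each matching show to the set of shared MBIDs.
    """
    matches = {}
    for show, played in show_playlists.items():
        shared = collection_mbids & set(played)
        if shared:  # keep only shows with at least one track in common
            matches[show] = shared
    return matches


# Hypothetical placeholder data; real MBIDs are UUID strings.
collection = {'mbid-1', 'mbid-2', 'mbid-3'}
shows = {
    'Show A': ['mbid-1', 'mbid-9'],
    'Show B': ['mbid-7'],
    'Show C': ['mbid-2', 'mbid-3'],
}
print(matching_shows(collection, shows))
```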

The biggest problem we encountered (which seems obvious now but didn't when we started) is that a collection of non-trivial size generates way too many results to be useful, and we didn't implement any sort of ranking scheme to present them in a useful order. This could easily be done, though, and maybe one of us will add it at some point...
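As for the missing ranking, one simple possibility (an assumption on my part, not something we built) is to score each show by the share of its playlist that's already in your collection and keep only the top few. rank_shows and the data below are hypothetical placeholders, and the inputs are assumed to be the same kind of MBID set and show-to-playlist dictionary as in the matching step.

```python
def rank_shows(collection_mbids, show_playlists, top_k=5):
    """Rank shows by the share of their playlist found in the collection.

    score = shared tracks / tracks the show played, so a show ranks highly
    when most of what it plays is already in your collection. Ties are
    broken by the raw number of shared tracks; only top_k shows are kept.
    """
    scored = []
    for show, played in show_playlists.items():
        played = set(played)
        if not played:
            continue  # avoid dividing by zero for empty playlists
        shared = len(collection_mbids & played)
        if shared:
            scored.append((float(shared) / len(played), shared, show))
    scored.sort(reverse=True)  # highest score first, then most shared
    return [(show, score) for score, shared, show in scored[:top_k]]


# Hypothetical placeholder data.
collection = {'m1', 'm2', 'm3'}
shows = {
    'Show A': ['m1', 'm7', 'm8', 'm9'],
    'Show B': ['m2', 'm3'],
    'Show C': ['m5'],
}
print(rank_shows(collection, shows))
```

Normalizing by the show's playlist size keeps shows that play huge numbers of tracks from winning on raw overlap alone, though it does over-reward shows with tiny playlists; requiring a minimum number of shared tracks would be an easy tweak.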

Also, there was a TechCrunch write-up of the weekend that shortlisted our project, which is neat (The Guardian has a more comprehensive article as well). Finally, for posterity:



more pictures

Wednesday 4 June 2008

Need something to do tomorrow?

I'm going to be DJing at this event at the Dana Centre in London tomorrow. The playlist for my DJ set has been automatically generated using both social network information (from Myspace) and acoustic similarity measures. If you're intrigued, come to the event. If you can't make it, the algorithm will be presented in an ISMIR paper this year, so you'll be able to see a bit of it there (if you're the ISMIR-attending type; you know who you are...).

Thursday 22 May 2008

Greatest example of a need for data compression...

From the duke.
I believe the entropy of that set is actually zero, since both artists are functionally the same, so with a nice compression algorithm and the same fixed two-hour transfer time, the transfer rate goes to infinity.

ha!

Wednesday 16 April 2008

wherein I poke at thisismyjam

As many readers of this blog probably already know, the Echo Nest recently announced its new web-service-based platform, Analyze, along with the platform's first app (written by them), thisismyjam.com. I went over there and checked it out, and while it's not without its problems (perhaps most notably drastic song-to-song level jumps and a lack of phrase alignment), it's a fairly decent web app and has me wanting to play with their API, so it has achieved that goal at least.

You want to see my jam? reich + cage x ad nauseam =

Tuesday 18 March 2008

how much data?

In my work with Kurt on MyPySpace we've been dealing with fairly large amounts of data, at least compared to typical loads for content-based MIR research. As a point of reference, I'd say a fairly standard music similarity or classification study will have a data set on the order of 10^3 songs, while our initial research efforts have used a data set of approximately 16,000 artists and about 55,000 songs. Further, there have been some studies (this one, for instance) that use entire commercial mp3 datasets (that paper used Yahoo!'s digital download library, on the order of 10^7 songs). These papers tend to deal specifically with the issues of large datasets, because once things get that large it becomes impossible to brute-force your way out of the situation.
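A quick bit of arithmetic shows why brute force stops working. Assuming "brute force" means scoring every pair of songs once with a symmetric similarity measure, the work grows as n(n - 1)/2, so each step up in collection size costs far more than the last:

```python
def pairwise_comparisons(n):
    """Distinct song pairs an all-pairs similarity pass must score."""
    return n * (n - 1) // 2


for label, n in [('typical study', 10**3),
                 ('MyPySpace songs', 55000),
                 ('commercial catalogue', 10**7)]:
    print('%-20s %10d songs -> %.1e pairs' % (label, n, pairwise_comparisons(n)))
```

That works out to roughly 5 x 10^5 pairs for a typical study, 1.5 x 10^9 for our 55,000 songs and 5 x 10^13 for a 10^7-song catalogue, so going from 10^3 to 10^7 songs multiplies the work by about 10^8.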

So anyway, all of this has me thinking, how much music data is out there? How many musical recordings exist? Anybody know? I could google it a bit but I'm lazy.

Tuesday 26 February 2008

It was probably for the best.

Lawrence Lessig has decided not to run for congress. As they said on draftlessig.org:
Lessig: No to Running for Congress, Yes to Changing Congress

Here's the video; it's fairly self-explanatory:


I can understand the logic behind this. Really, Jackie Speier is extremely well liked across the whole state, and certainly in the district, and frankly she's a really good progressive politician (insofar as such a thing exists). So, with this in mind, Change Congress! Let's take some money out of politics.

Wednesday 20 February 2008

Draft Lessig!

Draft Lessig!

I realize that many of the people reading this might not know who Larry Lessig is, so here's a brief intro, after which it should be clear to those who know me why he's exactly the sort of congressperson I'd want (I'll take him in the Palo Alto district; the key is to get him into Congress). Larry Lessig is the founder and CEO of Creative Commons, the non-profit set up to push the copyleft ideas of the same name. He is a law professor at Stanford Law School and a founder of the school's Center for Internet and Society. Further, he is on the board of the Electronic Frontier Foundation and has worked in numerous ways to support the ideas of Free Software and Free Content in a general sense.

In response to this recent draft movement, Larry Lessig has officially started an exploratory committee. Watch this fantastic announcement video for more info about what his campaign would be about:







Change Congress!

h/t Matt Stoller over at Open Left