May 2006

View from the DJ Booth @ Olive

Last night’s gathering at Olive was a smashing success! After a couple of technical hiccups in the early going, the music flowed and the crowd thickened. Jeannie’s DJ prowess again caused heads to nod and booties to shake. We had so much fun on Thursday, we are returning to Olive for another night this Sunday, 5/28/2006 from 7 to late! We all know you don’t have to work on Monday, so what better place to pass the evening after Sunday BBQs? Come celebrate the official start of Summer with your pals.

Recent Listening Screen Shot

Last Sunday I wrote a PHP tool that displays my recently played tracks in the sidebar of my blog. I used the web services APIs from three sources to do this: Last.FM (recent tracks), MusicBrainz (album name), and Amazon (album art, label, etc.). My main motivation for writing this application was to replace the “Now Playing” application that provided similar functionality for my blog. I lost that plug-in, along with the license key, when my PC crashed a few weeks ago. I could have re-installed the Now Playing plug-in, or used one of several other plug-ins for WordPress out there, but I wanted to see how easy or hard it would be to do this myself. I considered this exercise a baby step along the way towards migrating the browsing and discovery capabilities of Orpheus from a fat client application to a web-based tool. There are miles to go before I get there, but this is a start.

I called this tool a ‘mash-up’ in the title, and to the extent that it fits Wikipedia’s definition of a “web application that seamlessly combines content from more than one source into an integrated experience,” it may loosely be considered one, provided we remove the adverb “seamlessly” from that description. Hitting up three data sources in sequence produces some unseemly latency. I could have removed MusicBrainz from the equation if Last.FM published the album name in their XML feed of recent tracks, but they don’t. So three web services it is. At any rate, this is my first mash-up, so yay for me.

And yay for you, because I’m posting the code here for others to use. It’s been tested in WordPress and Mozilla Firefox. It is a tad slow, but easy to configure and use. Be aware, there is little in the way of error handling, so if any of the three web services has problems, all goes to hell in a handbasket. I’ve seen this happen on the MusicBrainz query when I have crazy long track names, usually on classical music. This code is licensed under the Creative Commons “Attribution-NonCommercial-ShareAlike 2.5” license. For those interested in how I built this, and what I learned in the process, read on!

Before I started hitting up various web services, my first brilliant idea was to hack the iTunes API, take all of the relevant track metadata, query Amazon for album art and all kinds of other good stuff, and post it to my blog as XML for parsing. This is exactly what Brandon’s tool does, so I would essentially be rebuilding his system with fewer and different features to suit my needs. Of course, this approach required that I know C/Objective-C, which I don’t. After nodding off reading some code examples, I decided to defer my mastery of C to a later date. Ultimately, if I am going to migrate Orpheus to the web, I’ll need some simple little iTunes plug-in, but that can wait. I discovered during my research that it is possible to query the iTunes XML database directly without working through the iTunes API, providing a real-time “snapshot” of the library. But there are challenges with doing this as well, and most of the data I could get from the iTunes DB I could get elsewhere. For now, I would avoid working with any local data at all, and rely exclusively on existing web service data and only one local plug-in, AudioScrobbler.

I was already using the AudioScrobbler plug-in for iTunes to post my listening behavior to Last.FM. And, bless their hearts, Last.FM is kind enough to offer up a web services API for accessing said data (as well as much more!). So I could get a live, on-demand XML representation of my recently listened-to tracks via the Last.FM web service. As I mentioned earlier, Last.FM’s web service for recent tracks doesn’t return all of the metadata about a track. Most notably missing is the name of the album. Without the name of the album, an artist name and a track name only provide a partial picture of the song in question. Most ID3 tags describe the album name, so why isn’t it available on my recent listening tracks XML feed?

I don’t know if this ‘bug’ is related to the data the AudioScrobbler plug-in sends to Last.FM, or to Last.FM just not publishing the track data in its entirety. Whatever the reason, I needed the album name in order to build a useful query for Amazon. I decided to use MusicBrainz to attempt to determine the album name. MB’s web service is cool, but somewhat ill-suited for my very specific and unusual request. I needed to know, given an artist name and a track name, what the most likely album was that the track appeared on. This is admittedly an ass-backwards way of going about things, but I needed that question answered. Tracks, naturally, can show up on a variety of albums: the single, the EP, the LP, the bootleg, the remix, etc. My queries returned some peculiar results in a few circumstances, so I decided to employ some additional logic to decide if the album name returned from the MB query was reliable enough to use. This approach means I don’t get the name of the album in a lot of circumstances, which sucks. You can see how several of the albums have no cover art. If the code can find the (correct) album on MB, it will query the Amazon web service for album art and all the other goodies they have.
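The post does not spell out what that “additional logic” was, so the following is only a minimal sketch of one way such a reliability check could work, written in Python rather than the PHP of the actual tool. The candidate dictionaries (with `artist`, `track`, `album`, `type` and `score` keys), the `pick_album` name and the `min_score` threshold are all assumptions made for illustration; they are not the MusicBrainz response format.

```python
def normalize(text):
    """Lower-case and collapse whitespace so trivial differences don't block a match."""
    return " ".join(text.lower().split())


def pick_album(artist, track, candidates, min_score=90):
    """Return the most plausible album title for (artist, track), or None.

    `candidates` is a hypothetical list of dicts, one per release that a
    lookup service returned for the track. A candidate is trusted only if
    the artist and track names match exactly after normalization, it is a
    full album (not a single, EP, bootleg, ...), and its match score is at
    least `min_score`.
    """
    best = None
    for candidate in candidates:
        if normalize(candidate["artist"]) != normalize(artist):
            continue
        if normalize(candidate["track"]) != normalize(track):
            continue
        if candidate["type"] != "Album":
            continue
        if candidate["score"] < min_score:
            continue
        if best is None or candidate["score"] > best["score"]:
            best = candidate
    return best["album"] if best else None


# Made-up candidates for a single track lookup.
candidates = [
    {"artist": "Example Artist", "track": "Example Song",
     "album": "Example Song (Single)", "type": "Single", "score": 100},
    {"artist": "Example Artist", "track": "Example Song",
     "album": "The Example LP", "type": "Album", "score": 95},
    {"artist": "Example Artist", "track": "Example Song",
     "album": "Live Bootleg", "type": "Bootleg", "score": 80},
]

print(pick_album("example artist", "Example  Song", candidates))
print(pick_album("Someone Else", "Example Song", candidates))
```

A strict filter like this trades coverage for accuracy: when nothing passes, the function returns `None` and the track is shown without an album or cover art, which matches the behavior described above.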

Once all of the data is collected, it gets parsed and posted as an unordered HTML list. Links to Last.FM and Amazon pages are included, and mousing over the image or track listing will show what time the track was played (in London time, unfortunately…). Pretty spiffy.
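As a rough illustration of that last step, here is a small Python sketch (again, not the tool’s PHP) that turns collected track data into an unordered HTML list with the play time as a mouse-over tooltip. It also converts the timestamp from UTC to a local zone, which is one way around the “London time” annoyance. The track dictionary keys, the `render_track_list` name and the fixed UTC-7 offset are assumptions for the example.

```python
from datetime import datetime, timedelta, timezone
from html import escape

# Assumption: a fixed UTC-7 offset standing in for Pacific Daylight Time.
PACIFIC_DAYLIGHT = timezone(timedelta(hours=-7), "PDT")


def render_track_list(tracks, tz=PACIFIC_DAYLIGHT):
    """Render tracks as a <ul>; each item's title attribute shows the local play time."""
    items = []
    for track in tracks:
        # Timestamps are assumed to be Unix seconds in UTC.
        played = datetime.fromtimestamp(track["played_uts"], tz=timezone.utc).astimezone(tz)
        tooltip = played.strftime("Played %Y-%m-%d %H:%M %Z")
        label = f'{track["artist"]} - {track["name"]}'
        items.append(
            f'  <li title="{escape(tooltip)}">'
            f'<a href="{escape(track["url"])}">{escape(label)}</a></li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical track record, shaped like the output of the parsing step.
tracks = [
    {"artist": "Example Artist", "name": "Rock & Roll <Live>",
     "url": "http://example.com/track/1", "played_uts": 1148601600},
]

html_list = render_track_list(tracks)
print(html_list)
```

Escaping every value before it goes into the markup matters here because track names routinely contain characters like `&` and `<`.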

All of this was done using REST requests (no SOAP here) and PHP to parse the resulting XML files. I avoided using XSLT for processing the XML because my web server doesn’t have XSLT enabled in PHP. Plus, the data needed to get into PHP at some point, so I decided to just do the parsing in PHP using xml_parse_into_struct. I relied on several great resources to build this. These two were the most useful. Visit my site for other useful sites.
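To make the parsing step concrete, here is a minimal sketch of pulling track records out of a recent-tracks XML document. It uses Python’s standard `xml.etree.ElementTree` instead of PHP’s xml_parse_into_struct, and the sample feed below is invented for the example: its element and attribute names are assumptions, not the actual Last.FM format. Note that, as in the feed described above, there is no album element.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed; element and attribute names are assumptions for illustration.
SAMPLE_FEED = """<recenttracks user="example">
  <track>
    <artist>Example Artist</artist>
    <name>Example Song</name>
    <url>http://example.com/track/1</url>
    <date uts="1148601600">26 May 2006, 00:00</date>
  </track>
  <track>
    <artist>Another Artist</artist>
    <name>Another Song</name>
    <url>http://example.com/track/2</url>
    <date uts="1148598000">25 May 2006, 23:00</date>
  </track>
</recenttracks>"""


def parse_recent_tracks(xml_text):
    """Return a list of dicts, one per <track> element, in document order."""
    root = ET.fromstring(xml_text)
    tracks = []
    for node in root.iter("track"):
        date = node.find("date")
        uts = date.get("uts") if date is not None else None
        tracks.append({
            "artist": (node.findtext("artist") or "").strip(),
            "name": (node.findtext("name") or "").strip(),
            "url": (node.findtext("url") or "").strip(),
            # Unix timestamp of the play, or None if the feed omits it.
            "played_uts": int(uts) if uts else None,
        })
    return tracks


recent = parse_recent_tracks(SAMPLE_FEED)
for track in recent:
    print(track["artist"], "/", track["name"])
```

Each resulting dictionary carries just enough (artist and track name) to drive the follow-up album and cover-art lookups.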

Download recentListening here. Feedback is always appreciated. Except negative feedback. Keep that to your bitter self!

This coming Thursday, I’ll be spinning some records and drinking some delicious cocktails at Olive in the glorious TL, SF. I’ll be on the tables from ~6:00 to ~7:00, so come around 7:00 if you want to hear a real DJ. My co-hosts will be Turntablistress DJ Jeannie Yang and the infamous Vijay Viswanathan.

We’ll be there from 6:30 until 10:00 or so; we have the tables until 9:30. Come visit for a drink, a chat and some music. Olive has a delicious menu as well, so stop on by!

The Anatomy of the Long Tail (via Wired Magazine)

Last Friday I went to see Chris Anderson discuss “The Long Tail of Time” at the Long Now Foundation’s Friday “SALT” seminar. Chris presented a new twist on his fascinating research about The Long Tail (more detail here, here, and here). He’s got a book coming out that I’m very much looking forward to reading.

Without going into any detail (read Chris’ book instead…), the “long tail” refers to the long, flatter tail of a power-law distribution. The yellow part on the graph to the right shows the long tail of a power-law curve. The feature of a power law that is of most interest to Internet-based businesses is that the total volume of the long tail can be equal to or greater than the total volume of the ‘head’ part of the curve. Therefore, all things being equal, there is as much market potential in the long tail, the numerous ‘obscure’ products, as there is in the steep part of the curve.
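A quick back-of-the-envelope check makes the head-versus-tail claim more tangible. The sketch below assumes a pure Zipf-style power law in which the item at popularity rank k sells in proportion to 1/k. The catalog of one million items and the “head” of the top 1,000 are arbitrary numbers chosen for illustration, not figures from the talk.

```python
def zipf_volume(first_rank, last_rank, exponent=1.0):
    """Total relative sales for ranks first_rank..last_rank when sales ~ rank**-exponent."""
    return sum(rank ** -exponent for rank in range(first_rank, last_rank + 1))


CATALOG_SIZE = 1_000_000  # assumed number of distinct products
HEAD_SIZE = 1_000         # assumed number of "hits" a physical store can stock

head = zipf_volume(1, HEAD_SIZE)
tail = zipf_volume(HEAD_SIZE + 1, CATALOG_SIZE)
tail_share = tail / (head + tail)

print(f"head={head:.2f} tail={tail:.2f} tail share={tail_share:.0%}")
# prints "head=7.49 tail=6.91 tail share=48%"
```

Under these assumptions the 999,000 obscure items together account for nearly half of all sales. The split is sensitive to the inputs, though: a steeper exponent or a smaller catalog shifts volume back toward the head, so “equal to or greater than” depends on the shape of the particular market.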

Traditional businesses often overlook products with a small potential consumer base because focusing on niche products would mean less space on their shelves for more popular (and thus more profitable) products. Internet-based businesses such as Amazon and Netflix can avoid this problem because their ‘shelf space’ is effectively unlimited. It costs them nothing, and does not negatively affect the most popular products, to keep certain obscure works in their inventory. The recommendation features of these sites also help to push demand down the tail, making works more profitable that might have been otherwise ignored. Thus, the Internet and accompanying technologies have opened the door to niche products, be they independent recording artists, obscure authors, or funky products sold on eBay.

Chris’s talk gave a concise overview of the Long Tail phenomenon, and then moved on to the actual subject of the talk, the notion that there is a temporal element to the Long Tail. Essentially, Anderson was describing how many of the products inhabiting the long tail are old products, classics, cult favorites, etc. The premise is that there is a market for these works, and by making them available, businesses might tap into new or undiscovered markets. It is not just ‘classic’ products that might benefit from monetizing the long tail. Recent ‘big hits’ (best-sellers, hit albums, blockbuster movies, etc.) that lose their initial huge popularity might enjoy a much longer, if much smaller, following as the work moves from the best-seller list to the long tail. As an example, think of a John le Carré novel (I’m presently reading Absolute Friends…). Years ago, his work was on the best-seller lists, available in airport bookstores and big chain stores everywhere. But over time, new authors appear, new best-sellers are made, and older novels begin to fade from the consumer consciousness. Yet there remain people like me who want to buy and read these old best-sellers. Why shouldn’t we have access to them?

And who says that just because a book loses popularity it will continue down that path to obscurity? This ‘decay function’ in best-sellers is not in fact a one-way road. In the case of le Carré, the success of the movie The Constant Gardener reinvigorated interest in his books, and his novels are enjoying a renaissance. Six months ago, I couldn’t find le Carré’s novels (with the exception of The Constant Gardener) at my local Cody’s or Borders. They told me that they were no longer carrying his books. Yet last week, I walked into Cody’s in SF and found three of his works. Le Carré essentially went out of print; then, as the result of external factors, his works gained popularity again and worked their way back up the tail, and back onto bookshelves at independent stores. Who knows if this would have happened without the proven demand that the Long Tail illustrated.

Many believe the economics of The Long Tail portends the death of the centralized gatekeeper. Large media companies (record labels, news corporations, publishers, major booksellers, etc.) traditionally need to invest considerable resources in market analysis, determining what products have the greatest potential for success. They act as the gatekeepers of our culture. Most products never make it to market because somebody thinks they won’t be profitable. Chris calls this “pre-filtering” or something like that (he should know all about pre-filtering, since he is the editor at Wired). But it is possible we can move away from pre-filtering and let more products get onto the market, and rely on consumers to decide if the product is worthwhile. To be sure, media companies will still need to invest up-front for some projects like expensive blockbuster movies, but many other avenues for entertainment will open up if the costs of production and distribution are zero.

What would happen if we just opened the gates to content production, so that anyone, anywhere could produce an album, make a movie or write a novel and have it accessible to hundreds of millions of potential consumers? We might not need to answer this question theoretically, because it appears that our culture and our economy are headed in this direction.

A former professor of mine used to talk about “creating the technology and applications that will enable daily media consumers to become daily media producers”. I used to think this was fundamentally a bad idea. How much more amateur crap do we need out there? Yet part of building the technologies and applications that enable ordinary citizens to be producers of media is building the technologies to sift through the greater volume of content to find the good stuff.

With better search technologies, tagging, collaborative filtering and print on demand, we can not only find the information we are looking for more easily, but we can encounter works that were buried in history, bound to rot away in libraries. What if all those old rotting books, and all of our new, unfiltered content were indexed and available for our perusal? The cover story of last Sunday’s New York Times Magazine touched on this (I recommend reading the full article…):

“If you can truly incorporate all texts — past and present, multilingual — on a particular subject, then you can have a clearer sense of what we as a civilization, a species, do know and don’t know. The white spaces of our collective ignorance are highlighted, while the golden peaks of our knowledge are drawn with completeness. This degree of authority is only rarely achieved in scholarship today, but it will become routine.”

In short, there is tremendous value in making all works ever created available to be searched, indexed and discovered. The Long Tail is home to such a rich and diverse set of cultures. We should embrace it. I believe now, more so than I did a year ago, that opening the doors to the playground, making all content available to consumers and potential producers, will surely result in much richer, more diverse and more creative content. To be sure, there are many hurdles to overcome, most of them legal, but ultimately, it appears this is what our future looks like. It is a bright day for the indie artist, researcher and student; and a bad day for independent, physically situated businesses, particularly booksellers and record shops, as the closure of Cody’s in Berkeley illustrates. Major content producers and distributors may lose their role as gatekeepers of our culture, but I think they are creative and savvy enough to find a new role in the Long Tail.

This is a fast-moving area, and one that has tremendous implications for the content we create and consume, the business strategies that are built around this content, and the laws that govern it. On a closing note, I really enjoyed Chris’ presentation style. He used PowerPoint in a very refreshing way. There were no bullet points, no stupid slides with a sentence or two, and no goofy MS Office graphics. Rather, each slide was a rich graph or chart, illustrating his talking points with well-visualized data. The talk was meandering at times, but it seemed that at each intersection, for each new topic, he had some great chart to highlight his point.