technology


Google Search

This is probably one of the least informative searches I’ve ever encountered on Google. I was googling the band ‘!!!‘ (pronounced Chk Chk Chk), but evidently Google can’t handle that kind of search. Funny. You’d think someone would have encountered this bug and provided some sort of fix by now…

Orpheus Logo
Explore the Musical Universe

Orpheus Visualizer
A Dynamic Graph of Artists

Despite my Waldorfian upbringing, I’m not schooled on the details of Greek mythology – otherwise I’d attempt a wisecrack along those lines.

But rather than embarrass myself, I’ll just put it like this: My Master’s thesis from UC Berkeley’s iSchool (then S.I.M.S.) was a music exploration and discovery tool called Orpheus. It was well received in 2005, and ever since I’ve been promising/threatening to resurrect it, improve some of its more compelling features, and turn it into a web-based application. That last part is in the works, but the first two steps were to get the Orpheus server installed and running and to make the Exploration Tool/MP3 Player client available for download. These two steps, I’m thrilled to announce, are complete!

Orpheus is all about mining a wide variety of information about music (from Internet sites like Last.FM, Pitchfork Music, and Wikipedia; from structured sources like MusicBrainz and Freebase; from users’ music libraries; and from user-contributed metadata) and turning that information into knowledge that can be used to discover new artists and explore the musical universe in novel ways. It’s about finding compelling new means of exploration and discovery.

As several of our professors pointed out at the time, our approach could prove useful for exploring the corpora around many other types of media – books, academic articles, movies, etc. But my primary joy and passion is music, so music is the first and foremost focus of this project.

Without further delay, you can download the Orpheus client and start using it right away. If you are new to Orpheus, you might benefit from watching this slideshow first. Before you get too excited, the rich information that Orpheus collects hasn’t been updated in a while (I’m working on adding new feeds and cleaning up the feeds we currently have). So you might not find too much about new artists in Orpheus at the moment. But don’t let that stop you! Please be sure to contact me with any problems or bugs you encounter. And stay tuned for updates and new features.

Enjoy!

Olive DJ Booth
View From the DJ Booth @ Olive

The infamous Jeannie Yang and I will be trading off on the turntables at Olive tonight. Drop by for a drink, some delicious food and good music. Olive is under new ownership, and they have a revamped menu and some new faces behind the bar, but the same great vibe and tasty stuff. Jeannie will be spinning her eclectic house records, and I’ll be augmenting her set with intermittent old school hip-hop, soul, funk and anything else that pops into my head. I’ll be aided by the wonders of my new toy, Serato Scratch Live. Come check it out!

Arcus Logo

This weekend I started migrating the Arcus Associates website to its new home on MediaTemple.com. I hoped to complete the migration last night, but it’s dragging on into the work week. I apologize for any disruption or problems with email.

This move to a dedicated server will allow me to better manage the Arcus environment and website, improve and upgrade some of my software and tools, and support running the Orpheus backend server, as well as the database documentation and automated metadata browser (more on the reincarnation of Orpheus in a later post).

The DNS migration is presently under way. Please be patient with any returned emails or errors you receive in the next 24 hours. I’ll post an update if problems arise, or once the migration is complete.

Explore the Musical Universe

Okay, so maybe you can’t actually use Orpheus right now, but you can finally take a look at the interface and get a sense of the application. I spent a few hours this weekend taking screenshots of my beloved Master’s thesis (created with Vijay Viswanathan and Jeannie Yang) and putting them online with some explanatory text. You can check out a slideshow of Orpheus’ main functionality right here. If you’re looking for the database documentation or older project documents, you’ll still need to visit the thesis site. I’d like to say that in the coming weeks you’ll see more content online about Orpheus, but that’s probably not going to happen. The main motivation for posting this stuff online is to have something to point people to when I talk about what Orpheus did and what it’s capable of.

As part of this process, I also compiled and installed both the client and server on my local media machine. Orpheus still looks cool, and is still enormously fun and satisfying to play with. The music landscape has changed a lot in the last two years, but I still love the idea of Orpheus.

Hack This!
Lerdorf
Lerdorf discusses PHP 5
License Plates
Only in the Valley (Plates read ‘Algortm’ and ‘Hack Me’).
Beck
Beck Rocks HackDay06
Sims Reunion
SIMS reunion at Hack Day ’06

I’ve never been that enamored of Yahoo!, to be honest. For starters, the name bothers me. Call me a purist, but I don’t think words should have extraneous punctuation. Secondly, I stopped using many of Yahoo!’s (see what I mean?) features years ago as Google’s services started to come online. I snickered when the Yahoo! Music Engine launched. In brief, I felt as though much of what Yahoo! was trying to do, Google was doing better.

But they’ve been doing some really cool stuff for a year or two now. Last year they opened the Yahoo! Research Lab in Berkeley and hired a bunch of really smart SIMS kids (and poached a faculty member to found it). They’ve acquired some pretty cool companies. They hire really, really well. And this weekend they’re hosting the coolest tech event I’ve ever attended, showcasing their awesome employees and APIs and motivating the developer community to get involved.

Welcome to Yahoo! Open Hack Day ’06. I attended yesterday’s workshops and was really blown away. Yahoo! is the shit. Seriously, where else can you get the lowdown on PHP from the guy who wrote it, sit next to the person who started Flickr as you learn how to hack the Flickr API, get a tutorial on the Yahoo! UI library from the people who designed it, and then rock out to a private Beck concert, replete with a live puppet show? Punk. Rock.

This was a great idea from start to finish. Yahoo! gets lots of smart engineers playing around with their tools and services, possibly adds a few of them to their payroll, and spreads tremendous goodwill among the developer community. Attending engineers and researchers, in turn, get treated to a series of enlightening talks by the leading minds in the industry, see the internal workings of a cutting-edge company, and get to network with like-minded people.

I’m looking forward to seeing all of the cool hacks that come out of this weekend. I’m also excited to start working with all of the APIs I was exposed to (ha ha) yesterday. Hopefully this is the first of many hack days at Yahoo!

Warning: the following post is pretty boring. Move on before it’s too late, or read on if you are a geek!

In my downtime between contracts, I decided it was high time I finally automated the process of generating my resume in a variety of formats. Most people rarely need to update their resumes, and when they do, they might only need to create one or two versions — say, an HTML version and a Word version. It’s fairly easy for most people to just update their resume by hand and make a couple of different versions when the need arises.

As a consultant, however, I have more complicated needs. My contracts are generally short, ranging anywhere from one month to a year, which means 1) I have to update my resume frequently, and 2) I have a long list of consulting work that can make my resume look disorganized (a year here, six months there, etc.), so it’s ideal to be able to show a summary version of my resume and allow a greater level of detail when necessary. I consult for various kinds of organizations, and I perform different kinds of work depending on the needs of the organization, which means I want to be able to highlight different components of my skill set based on context. Moreover, my clients have different requirements for how they want to receive my resume, which means I typically need to be able to provide it in at least four formats: HTML, Word, PDF, or plain text.

Add all of these requirements together, and you can see that maintaining my resume manually is a major-league pain in the butt. It quickly becomes cumbersome to make sure all the formats are up to date, that I can provide either a detailed or a summary-level view of my work, and that all of the different versions stay in sync across formats.

My goal was to have a single, model-driven document containing all of my information, and to use that document as a ‘system of record’ from which any number of views and formats could be generated. When I was in graduate school, I actually created a resume along exactly these lines in a Document Engineering class. The document was written in XML, compliant with an XML Schema that I had designed. I refined the schema and the document and used them as my starting point. The schema itself was designed to be finely grained and applicable to any technically oriented resume. Once the model and document were completed, I turned to generating the necessary views.

For the HTML version, I wrote a series of PHP classes representing the various objects in my XML resume: Person, Qualifications, Degrees, Experience, Projects, Publications, etc. I then read in the XML document, loaded the objects, and generated two different views: an overview and a detail view. I chose to develop PHP objects rather than use a simple XML transform because ultimately I’d like to use these objects for a more graphical, dynamic representation of my skill set. For now, though, I simply wanted to generate the content to display on my website dynamically.
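For the curious, the pattern is simple enough that a stripped-down sketch fits in a few lines. The class and element names below are illustrative (my actual schema is more finely grained), but this is the gist of reading the XML system of record and spitting out an overview view:

<?php
// Stripped-down sketch: load the XML resume and render an overview view.
// Class and element names here are illustrative, not the real schema.
class Experience {
    public $client;
    public $role;
    public $summary;

    public function __construct(SimpleXMLElement $node) {
        $this->client  = (string) $node->client;
        $this->role    = (string) $node->role;
        $this->summary = (string) $node->summary;
    }
}

$resume = simplexml_load_file('resume.xml');   // the single system of record

$experiences = array();
foreach ($resume->experiences->experience as $node) {
    $experiences[] = new Experience($node);
}

// Overview view: one line per engagement. A detail view would walk the
// same objects and also print projects, technologies, dates, and so on.
echo "<ul>\n";
foreach ($experiences as $exp) {
    echo '<li>' . htmlspecialchars($exp->client . ': ' . $exp->role) . "</li>\n";
}
echo "</ul>\n";
?>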

That was the easy part. The challenge came when I needed to replicate these views as plain text, PDF, and Word. I spent a lot of time screwing around with LaTeX/TeX and various PHP classes and talking to several people before finally opting to use XSL-FO (part of XSL 1.0) and the Apache FOP processor to get the job done. I’ve got to say that the latest version of FOP (0.9.2) is the cat’s meow. Using FOP, it was possible to use a single XSL stylesheet to publish to both RTF (readable in Word) and PDF.
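Driving FOP from PHP turned out to be nothing fancier than shelling out to it. Here’s a rough sketch of the publishing step; the file names are placeholders, and it assumes the fop launcher script is on your PATH:

<?php
// Rough sketch: run Apache FOP against the same XML document and XSL-FO
// stylesheet to produce both PDF and RTF. Assumes the 'fop' launcher is on
// the PATH; file names are placeholders.
$xml = escapeshellarg('resume.xml');
$xsl = escapeshellarg('resume-fo.xsl');

exec("fop -xml $xml -xsl $xsl -pdf resume.pdf", $output, $pdfStatus);
exec("fop -xml $xml -xsl $xsl -rtf resume.rtf", $output, $rtfStatus);

if ($pdfStatus !== 0 || $rtfStatus !== 0) {
    echo "FOP reported an error; check its output for details.\n";
}
?>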

I spent more time than I would have liked solving my resume problems, but the outcome is that I never have to hand-edit my resume in multiple formats again. That’s well worth the work. My resume itself also showcases some of my technical skills in XML, PHP, and CSS. All of the code used to generate my resume is available in the code examples area of my website, along with the XML document containing my resume. If you want to use any of it to generate your own resume, feel free to download it. Just ask if you need help, and please give credit where it’s due.

[updated 6/7/6 8:45 am: fixed link] 
Today, I’ll be blogging over on The Great Whatsit, one of my favorite community blogs. My topic is “Is Technology Killing Privacy?” The post should be up on the site today around 8:00 AM New York time. This is a first hack at trying to get my head around one particular element of privacy in the digital age — the use of our information by corporations to build profiles of our behavior. Enjoy, and as always comments are welcome.

Recent Listening Screen Shot

Last Sunday I wrote a PHP tool that displays my recently played tracks in the sidebar of my blog. I used the web service APIs from three sources to do this: Last.FM (recent tracks), MusicBrainz (album name), and Amazon (album art, label, etc.). My main motivation for writing this application was to replace the “Now Playing” plug-in that provided similar functionality for my blog. I lost that plug-in, along with the license key, when my PC crashed a few weeks ago. I could have re-installed the Now Playing plug-in, or used one of several other plug-ins for WordPress out there, but I wanted to see how easy or hard it would be to do this myself. I considered this exercise a baby step toward migrating the browsing and discovery capabilities of Orpheus from a fat-client application to a web-based tool. There are miles to go before I get there, but this is a start.

I called this tool a ‘mash-up‘ in the title, and to the extent that it fits Wikipedia’s definition of a “web application that seamlessly combines content from more than one source into an integrated experience,” it may loosely be considered one, provided we remove the adverb “seamlessly” from that description. Hitting up three data sources iteratively produces some unseemly latency. I could have removed MusicBrainz from the equation if Last.FM published the album name in their XML feed of recent tracks, but they don’t. So three web services it is. At any rate, this is my first mash-up, so yay for me.

And yay for you, because I’m posting the code here for others to use. It’s been tested in WordPress and Mozilla Firefox. It is a tad slow, but easy to configure and use. Be aware that there is little in the way of error handling, so if any of the three web services has problems, all goes to hell in a handbasket. I’ve seen this happen on the MB query when I have crazy long track names, usually with classical music. This code is licensed under Creative Commons‘ “Attribution-NonCommercial-ShareAlike 2.5” license. For those interested in how I built this, and what I learned in the process, read on!

Before I started hitting up various web services, my first brilliant idea was to hack the iTunes API, take all of the relevant track metadata, query Amazon for album art and all kinds of other good stuff, and post it to my blog as XML for parsing. This is exactly what Brandon’s tool does, so I would essentially be rebuilding his system with fewer and different features to suit my needs. Of course, this approach required that I know C/Objective-C, which I don’t. After nodding off reading some code examples, I decided to defer my mastery of C to a later date. Ultimately, if I am going to migrate Orpheus to the web, I’ll need some simple little iTunes plug-in, but that can wait. I discovered during my research that it is possible to query the iTunes XML database directly without working through the iTunes API, providing a real-time “snapshot” of the library. But there are challenges with doing that as well, and most of the data I could get from the iTunes DB I could get elsewhere. For now, I decided to avoid working with any local data at all and rely exclusively on existing web service data and only one local plug-in, AudioScrobbler.

I was already using the AudioScrobbler plug-in for iTunes to post my listening behavior to Last.FM. And, bless their hearts, Last.FM is kind enough to offer up a web service API for accessing said data (as well as much more!). So I could get a live, on-demand XML representation of my recently listened-to tracks via the Last.FM web service. As I mentioned earlier, Last.FM’s web service for recent tracks doesn’t return all of the metadata about a track. Most notably missing is the name of the album. Without the album name, an artist name and a track name only provide a partial picture of the song in question. Most ID3 tags include the album name, so why isn’t it available in my recent tracks XML feed?
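If you want to see for yourself, fetching the feed is basically a one-liner. A quick sketch (the URL is the Audioscrobbler 1.0 feed as I recall it, and the username is a placeholder, so adjust accordingly):

<?php
// Quick peek at the recent-tracks feed. The URL is the Audioscrobbler 1.0
// feed as I recall it; the username is a placeholder.
$user = 'your_lastfm_username';
$xml  = file_get_contents("http://ws.audioscrobbler.com/1.0/user/$user/recenttracks.xml");
echo $xml;   // you'll see <artist>, <name>, <url>, <date> for each track, but no <album>
?>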

I don’t know if this ‘bug’ is related to the data the AudioScrobbler plug-in sends to Last.FM, or to Last.FM just not publishing the track data in its entirety. Whatever the reason, I needed the album name in order to build a useful query for Amazon. I decided to use MusicBrainz to attempt to determine the album name. MB’s web service is cool, but somewhat ill-suited to my very specific and unusual request. I needed to know, given an artist name and a track name, the most likely album that the track appeared on. This is admittedly an ass-backwards way of going about things, but I needed that question answered. Tracks, naturally, can show up on a variety of albums: the single, the EP, the LP, the bootleg, the remix, etc. My queries returned some peculiar results in a few circumstances, so I decided to employ some additional logic to decide if the album name returned from the MB query was reliable enough to use. This approach means I don’t get the name of the album in a lot of circumstances, which sucks. You can see how several of the albums have no cover art. If it can find the (correct) album on MB, the code will query the Amazon web service for album art and all the other goodies they have.
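Stripped of the web-service plumbing, that “additional logic” is just a reliability check. Something along these lines (the candidate array and its field names are mine, not MusicBrainz’s, and the threshold is arbitrary):

<?php
// Sketch of the reliability check, with the MusicBrainz plumbing stripped
// away. $candidates holds possible matches already pulled out of the MB
// response; the field names and the score threshold are mine.
function pickAlbum($artist, array $candidates) {
    foreach ($candidates as $c) {
        // Only trust a candidate whose artist matches what Last.FM reported
        // and whose match score is high enough to rule out flukes.
        if (strcasecmp($c['artist'], $artist) === 0 && $c['score'] >= 90) {
            return $c['album'];
        }
    }
    return null;   // no album name is better than a wrong one
}

// If pickAlbum() returns null, the code skips the Amazon lookup and the
// track simply shows up without cover art.
?>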

Once all of the data is collected, it gets parsed and posted as an unordered HTML list. Links to Last.FM and Amazon pages are included, and mousing over the image or track listing will show what time the track was played (in London time, unfortunately…). Pretty spiffy.
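The markup itself is nothing special; each track boils down to a list item along these lines (a hand-written example with placeholder URLs; the real code assembles it from the collected data):

<?php
// Hand-written example of one list item, with placeholder URLs. The real
// code builds these from the Last.FM / MusicBrainz / Amazon data and puts
// the play time in the title attribute so it shows up on mouseover.
$utc      = 1149999999;                       // sample timestamp from the feed
$playedAt = date('g:i a, M j', $utc);
echo '<li title="Played at ' . $playedAt . '">'
   . '<a href="http://www.example.com/amazon-album-page">'
   . '<img src="http://www.example.com/cover.jpg" alt="album art" /></a> '
   . '<a href="http://www.example.com/lastfm-track-page">Artist - Track</a>'
   . "</li>\n";
?>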

All of this was done using REST requests (no SOAP here) and PHP to parse the resulting XML files. I avoided using XSLT for processing the XML because my web server doesn’t have XSLT enabled in PHP. Plus, the data needed to get into PHP at some point anyway, so I decided to just do the parsing in PHP using xml_parse_into_struct. I relied on several great resources to build this; these two were the most useful. Visit my del.icio.us site for other useful sites.
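If you’ve never used xml_parse_into_struct, the basic pattern looks like this. A trimmed sketch (tag names follow the recent-tracks feed as I remember it; the username is a placeholder):

<?php
// Trimmed sketch of the parsing step: hand the raw XML to the expat-based
// parser and walk the flat array it returns. Tag names follow the Last.FM
// recent-tracks feed as I remember it; the username is a placeholder.
$user = 'your_lastfm_username';
$data = file_get_contents("http://ws.audioscrobbler.com/1.0/user/$user/recenttracks.xml");

$parser = xml_parser_create();
xml_parse_into_struct($parser, $data, $values, $index);
xml_parser_free($parser);

$tracks  = array();
$current = array();
foreach ($values as $v) {
    if ($v['tag'] == 'ARTIST' && $v['type'] == 'complete') {
        $current['artist'] = $v['value'];
    } elseif ($v['tag'] == 'NAME' && $v['type'] == 'complete') {
        $current['track'] = $v['value'];
    } elseif ($v['tag'] == 'TRACK' && $v['type'] == 'close') {
        $tracks[] = $current;      // one finished track, ready for the MB/Amazon lookups
        $current = array();
    }
}
?>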

Download recentListening here. Feedback is always appreciated. Except negative feedback. Keep that to your bitter self!

Last Friday I went to see Chris Anderson discuss “The Long Tail of Time” at the Long Now Foundation’s “SALT” seminar. Chris presented a new twist on his fascinating research about The Long Tail (more detail here, here, and here). He’s got a book coming out that I’m very much looking forward to reading.

Without going into any detail (read Chris’ book instead…), the “long tail” refers to the long, flatter tail of a power-law distribution. The yellow part of the graph to the right shows the long tail of a power-law curve. The feature of a power law that is of most interest to Internet-based businesses is that the total volume of the long tail can be equal to or greater than the total volume of the ‘head’ of the curve. Therefore, all things being equal, there is as much market potential in the long tail, the numerous ‘obscure’ products, as there is in the steep part of the curve.
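A back-of-the-envelope illustration (my arithmetic, not Chris’s): suppose item number k sells in proportion to 1/k, a Zipf-like curve. Then

\[
\sum_{k=1}^{100} \frac{1}{k} \approx 5.2, \qquad \sum_{k=101}^{10^6} \frac{1}{k} \approx \ln(10^6) - \ln(100) \approx 9.2,
\]

so a million obscure items collectively outsell the hundred hits by nearly two to one, even though no single one of them sells much at all.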

Traditional businesses often overlook products with a small potential consumer base because focusing on niche products would mean less space on their shelves for more popular (and thus more profitable) products. Internet-based businesses such as Amazon and Netflix avoid this problem because their ‘shelf space’ is effectively unlimited. It costs them nothing to keep obscure works in their inventory, and doing so does not negatively affect the most popular products. The recommendation features of these sites also help push demand down the tail, making works profitable that might otherwise have been ignored. Thus, the Internet and its accompanying technologies have opened the door to niche products, be they independent recording artists, obscure authors, or funky products sold on eBay.

Chris’s talk gave a concise overview of the Long Tail phenomenon, and then moved on to the actual subject of the talk: the notion that there is a temporal element to the Long Tail. Essentially, Anderson was describing how many of the products inhabiting the long tail are old products, classics, cult favorites, etc. The premise is that there is a market for these works, and that by making them available, businesses might tap into new or undiscovered markets. And it is not just ‘classic’ products that might benefit from monetizing the long tail. Recent big hits (best-sellers, hit albums, blockbuster movies, etc.) that lose their initial popularity might still enjoy a much longer, if much smaller, run of interest as they move from the best-seller list into the long tail. As an example, think of a John le Carré novel (I’m presently reading Absolute Friends…). Years ago, his work was on the bestseller lists, available in airport bookstores and big chain stores everywhere. But over time, new authors appear, new bestsellers are made, and older novels begin to fade from the consumer consciousness. Yet there remain people like me who want to buy and read these old best sellers. Why shouldn’t we have access to them?

And who says that just because a book loses popularity it will continue down the path to obscurity? This ‘decay function’ in bestsellers is not, in fact, a one-way road. In the case of le Carré, the success of the movie The Constant Gardener reinvigorated interest in his books, and his novels are enjoying a renaissance. Six months ago, I couldn’t find le Carré’s novels (with the exception of The Constant Gardener) at my local Cody’s or Borders. They told me they were no longer carrying his books. Yet last week, I walked into Cody’s in SF and found three of his works. Le Carré essentially went out of print; then, as the result of external factors, his works gained popularity again, worked their way back up the tail, and landed back on the shelves of independent stores. Who knows if this would have happened without the proven demand that the Long Tail illustrated.

Many believe the economics of The Long Tail portend the death of the centralized gatekeeper. Large media companies (record labels, news corporations, publishers, major booksellers, etc.) traditionally need to invest considerable resources in market analysis, determining which products have the greatest potential for success. They act as the gatekeepers of our culture. Most products never make it to market because somebody thinks they won’t be profitable. Chris calls this “pre-filtering” or something like that (he should know all about pre-filtering, since he is the editor at Wired). But it is possible we can move away from pre-filtering, let more products onto the market, and rely on consumers to decide if a product is worthwhile. To be sure, media companies will still need to invest up front in some projects, like expensive blockbuster movies, but many other avenues for entertainment will open up if the costs of production and distribution are zero.

What would happen if we just opened the gates to content production, so that anyone, anywhere could produce an album, make a movie, or write a novel and have it accessible to hundreds of millions of potential consumers? We might not need to answer this question theoretically, because it appears that our culture and our economy are headed in this direction.

A former professor of mine used to talk about “creating the technology and applications that will enable daily media consumers to become daily media producers”. I used to think this was fundamentally a bad idea. How much more amateur crap do we need out there? Yet part of building the technologies and applications that enable ordinary citizens to be producers of media is building the technologies to sift through the greater volume of content to find the good stuff.

With better search technologies, tagging, collaborative filtering, and print-on-demand, we can not only find the information we are looking for more easily, but also encounter works that were buried in history, bound to rot away in libraries. What if all of those old rotting books, and all of our new, unfiltered content, were indexed and available for our perusal? The cover story of last Sunday’s New York Times Magazine touched on this (I recommend reading the full article…):

“If you can truly incorporate all texts — past and present, multilingual — on a particular subject, then you can have a clearer sense of what we as a civilization, a species, do know and don’t know. The white spaces of our collective ignorance are highlighted, while the golden peaks of our knowledge are drawn with completeness. This degree of authority is only rarely achieved in scholarship today, but it will become routine.”

In short, there is tremendous value in making all works ever created available to be searched, indexed and discovered. The Long Tail is home to such a rich and diverse set of cultures. We should embrace it. I believe now, more so than I did a year ago, that opening the doors to the playground and making all content available to consumers and potential producers will surely result in much richer, more diverse and more creative content. To be sure, there are many hurdles to overcome, most of them legal, but ultimately, it appears this is what our future looks like. It is a bright day for the indie artist, researcher and student, and a bad day for independent, physically situated businesses, particularly booksellers and record shops, as the closure of Cody’s in Berkeley illustrates. Major content producers and distributors may lose their role as gatekeepers of our culture, but I think they are creative and savvy enough to find a new role in the Long Tail.

This is a fast-moving area, and one that has tremendous implications for the content we create and consume, the business strategies built around that content, and the laws that govern it. On a closing note, I really enjoyed Chris’ presentation style. He used PowerPoint in a very refreshing way. There were no bullet points, no stupid slides with a sentence or two, and no goofy MS Office graphics. Rather, each slide was a rich graph or chart, illustrating his talking points with well-visualized data. The talk meandered at times, but it seemed that at each intersection, for each new topic, he had a great chart to highlight his point.
