Author: Nathan Willis
Flickr is a massive collection of digital photographs from individuals all over the world, linked together by user-to-user relationships like “friends and family” or interest groups, and by content-based associations like keyword tags and collections. The success of the site lies in the fact that it links every entity — be it a person, a photograph, a date, or a comment — to dozens or hundreds of other entities based on those relationships. That makes it easy to jump from one picture or person that you like to others, which in turn gets you more connected to the community.
As you might guess, making that much interconnectedness run smoothly requires some pretty heavy-duty database work. Flickr runs on MySQL, the most popular open source database, and has since the beginning. Today Flickr has around a quarter of a million users and serves around 5,000 pages per minute, generating about 100,000 database queries per minute, or roughly 20 queries per page.
According to Cal Henderson, Flickr’s Web development lead, the company ruled out Oracle on cost almost as soon as it started out. “A small startup buying Oracle and Windows 2003 licenses runs out of money quickly,” he said.
Henderson says that MySQL has scaled well. “The issue is application design. If you design your app without thinking about scaling, then you’re going to have issues. Then at a certain point, after optimizing all your bad queries and better indexing, vertical partitioning into clusters is easy enough — but again, only if you think about it in your app design from early on.”
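Henderson doesn’t describe Flickr’s actual partitioning scheme. As a rough illustration of what “vertical partitioning into clusters” means, the Python sketch below assigns whole tables to separate database clusters, so each cluster holds only part of the schema and the application routes each query by the tables it touches. The table names, cluster names, and helper functions are hypothetical.

```python
# Hypothetical vertical partitioning: each table lives on exactly one cluster.
# Table and cluster names are illustrative, not Flickr's real schema.
PARTITION_MAP = {
    "users": "cluster_accounts",
    "contacts": "cluster_accounts",
    "photos": "cluster_photos",
    "tags": "cluster_photos",
    "comments": "cluster_discussion",
}


def cluster_for(table: str) -> str:
    """Return the cluster that holds `table`, failing loudly if it is unmapped."""
    try:
        return PARTITION_MAP[table]
    except KeyError:
        raise ValueError(f"no cluster configured for table {table!r}") from None


def clusters_for_query(tables: list[str]) -> set[str]:
    """Clusters a query would touch; more than one means no single-server join."""
    return {cluster_for(t) for t in tables}
```

For example, `cluster_for("photos")` returns `"cluster_photos"`, and a query over `photos` and `tags` stays on one cluster. One reason this needs planning early: a query that joins `users` with `comments` would span two clusters, so the application has to be written to avoid such joins or to combine the results itself.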
As the site grew over the course of last year, Flickr moved to a replicated architecture. Changes to the database go to a master server, while a cluster of slave servers handles searches and other data retrieval. Those read-only operations make no changes to the database, and Flickr’s statistics showed that they account for upwards of 90% of the load.
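The article doesn’t show how Flickr’s code decides which server receives a query. With MySQL replication of this kind, every write has to go to the master, while reads can be spread across the slaves. The Python sketch below illustrates that split in its simplest form; the `ReplicatedRouter` class, the host names, and the first-keyword test for reads are assumptions for illustration, not Flickr’s implementation.

```python
import itertools


class ReplicatedRouter:
    """Pick a server for each SQL statement in a master/slave setup.

    Writes go to the master; read-only statements are spread
    round-robin across the slaves. Host names are placeholders.
    """

    READ_KEYWORDS = ("SELECT", "SHOW")

    def __init__(self, master: str, slaves: list[str]):
        if not slaves:
            raise ValueError("need at least one slave")
        self.master = master
        self._slaves = itertools.cycle(slaves)  # endless round-robin iterator

    def is_read(self, sql: str) -> bool:
        # Naive check on the first keyword only; a real router must also
        # handle SELECT ... FOR UPDATE and reads inside write transactions.
        words = sql.lstrip().split(None, 1)
        return bool(words) and words[0].upper() in self.READ_KEYWORDS

    def server_for(self, sql: str) -> str:
        """Return the host that should run `sql`."""
        return next(self._slaves) if self.is_read(sql) else self.master
```

With `ReplicatedRouter("db-master", ["db-slave1", "db-slave2"])`, two SELECTs go to `db-slave1` and then `db-slave2`, while an UPDATE goes to `db-master`. A production setup also has to consider replication lag: a user who has just posted a comment may not see it if the next read lands on a slave that hasn’t caught up yet.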
Of course, that growth has continued, which means that the scaling will have to continue as well. Henderson says that the company keeps an active eye on MySQL development and has been testing newer releases internally. In particular, he hopes that within the next year the native clustering of MySQL 4.1 will be stable enough to let them make the switch.
Open on the inside
Apart from the database that manages the enormous volume of photos and ongoing discussion, a lot of disparate pieces go into making Flickr work. The list of tools Flickr employs reads like a Top 40 of open source projects. When you upload that adorable picture of your cat, it is resized by ImageMagick. The EXIF and IPTC metadata tags are extracted with Perl. Java is used for processing daemons and for FlickrLive, a real-time chat application. The Web pages you see are generated from Smarty templates and PHP scripts.
Henderson says that the availability of open source tools saved them months of development time. “The sheer amount of PHP and Perl code out there means someone’s done everything before. We use bits of PEAR and Smarty for all the templating.” He estimates that they use about 20,000 lines of open source code, compared to about 60,000 lines of in-house code.
Moreover, open source options have let Flickr respond quickly to user demand for new functionality. Take email uploading. When cameraphone users started asking whether they could email photos directly to their Flickr accounts, the Flickr team hadn’t anticipated the feature, but it quickly found a solution. The team routed the messages to Postfix, the open source mail server; the photos were then extracted and fed into a PHP script, where they entered the upload queue just like any other upload.
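The article doesn’t show the mail-handling code, and in Flickr’s pipeline the photos ended up in a PHP script. Purely to illustrate the extraction step, the sketch below uses Python’s standard `email` package to pull image attachments out of a raw message; the `extract_images` function and the fallback filename are hypothetical.

```python
from email import message_from_bytes, policy


def extract_images(raw: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, data) for every image attachment in a raw email."""
    # policy.default yields EmailMessage objects, which provide iter_attachments().
    msg = message_from_bytes(raw, policy=policy.default)
    images = []
    for part in msg.iter_attachments():
        if part.get_content_maintype() != "image":
            continue  # skip text bodies, signatures, and other file types
        # Phones don't always name their attachments, so fall back to a default.
        name = part.get_filename() or "upload.jpg"
        # For non-text parts, get_content() returns the decoded bytes.
        images.append((name, part.get_content()))
    return images
```

Postfix can deliver a message to a program by piping it through an alias, so a script like this could read the raw message from `sys.stdin.buffer` and hand each image on to the upload queue. How Flickr actually connected Postfix to its PHP code isn’t described in the article.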
Open on the outside
Flickr co-founder Stewart Butterfield said in an interview with Creative Commons that the company believes in and wants to support free culture. Consequently, the company decided to open up the Web application’s APIs last fall. Henderson told me the decision wasn’t driven by outside demand, just by the belief that it would be beneficial.
Nonetheless, outside developers responded once the APIs were open. Flickr has issued more than 500 API keys, which are free of charge, and established a Flickr-API group for demonstrating what can be done with them. There are bindings for Perl, Python, and .Net, among others, and they’re all documented. So far the APIs have spawned everything from a GTK upload tool to scripts that search Flickr for photos and assemble the results into mosaics with unusual patterns based on color and value.
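The article doesn’t say how those mosaic scripts work. A common technique is to split the target image into cells, compute each cell’s average color, and fill the cell with the candidate photo whose own average color is nearest. The Python sketch below shows only that matching step; it assumes the photos’ average colors have already been computed (fetching them through the Flickr API is left out), and all names in it are hypothetical.

```python
Color = tuple[int, int, int]  # (red, green, blue), each 0-255


def average_color(pixels: list[Color]) -> Color:
    """Mean RGB of a list of pixels, e.g. one cell of the target image."""
    if not pixels:
        raise ValueError("cannot average an empty cell")
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) // n for i in range(3))


def distance_sq(a: Color, b: Color) -> int:
    """Squared Euclidean distance in RGB space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_tile(target: Color, tiles: dict[str, Color]) -> str:
    """Name of the tile photo whose average color is closest to `target`."""
    return min(tiles, key=lambda name: distance_sq(tiles[name], target))


def build_mosaic(cells: list[list[Color]], tiles: dict[str, Color]) -> list[list[str]]:
    """Replace each cell's average color with the best-matching tile name."""
    return [[best_tile(c, tiles) for c in row] for row in cells]
```

Squared distance is enough because only the ranking matters; skipping the square root doesn’t change which tile wins. Plain RGB distance is a simplification, and some mosaic tools compare colors in a perceptual color space instead.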
The company is also encouraging its users to license their photos under Creative Commons. The site explains attribution, derivative works, and other considerations, and lets users select the terms they are comfortable with. There is a prominent link to the Creative Commons Web site for more detail. To date more than 400,000 Flickr photos have been released under a Creative Commons license, which makes Flickr far and away the largest repository of such royalty-free images.
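Creative Commons’ own license chooser reduces these considerations to two questions: whether to allow commercial uses, and whether to allow modifications (yes, no, or only if others share alike). Combined with attribution, which all six of the standard licenses require, the answers map onto a short license code. The article doesn’t describe how Flickr’s picker works internally, so the Python sketch below only illustrates that mapping; the function name and option strings are hypothetical.

```python
def choose_license(allow_commercial: bool, derivatives: str) -> str:
    """Map the two Creative Commons questions to a license code.

    `derivatives` is one of "yes", "share-alike", or "no".
    Every result includes BY (attribution required).
    """
    if derivatives not in ("yes", "share-alike", "no"):
        raise ValueError(f"unknown derivatives option: {derivatives!r}")
    parts = ["BY"]
    if not allow_commercial:
        parts.append("NC")  # NonCommercial
    if derivatives == "no":
        parts.append("ND")  # NoDerivs
    elif derivatives == "share-alike":
        parts.append("SA")  # ShareAlike
    return "CC " + "-".join(parts)
```

For example, `choose_license(False, "share-alike")` returns `"CC BY-NC-SA"`, and `choose_license(True, "yes")` returns plain `"CC BY"`.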
Fantastic, you say, so there’s finally one place where I can go to find the bad vacation photos of every cameraphone-toting idiot on the entire Internet. It’s true that most of the photographs under a Creative Commons license at Flickr are not stock-photo quality, and you might never want to use them yourself. But you can’t deny the positive implications. Flickr has a growing community of active photographers. Many of them do create good work, and more importantly they are choosing to take and share photographs first, license them second. Flickr’s efforts to raise the consciousness of photographers who are already out shooting are going to produce a desirable pool of images much faster than founding a “free stock photo” site first and then trying to attract photographers second.
Flickr is a success story built with the help of open source technology, and it’s good to see that they are giving back as well.