- By Edward Kamau -
I was recently hunting around the net for some RPMs, and the thought occurred to me that there must be an easier way to find the files and programs you need. Besides, a number of the mirrors I've tried to use can be very uncooperative: connecting is often difficult, and my connection is occasionally dropped. So I thought, why not have a network of super mirrors, mirrors that mirror all other mirrors? In short, a Comprehensive Linux Archive Network (CLAN), something akin to the Comprehensive Perl Archive Network (CPAN), but for all of Linux. Before you dismiss this as purely wishful thinking, let me tell you how I think it can be done.
The main difference between CLAN and other mirror systems such as CPAN or the Comprehensive R Archive Network (CRAN) would be the sheer scale of its storage and bandwidth requirements. This would have to be huge.
The Comprehensive Linux Archive Network
The idea behind CLAN is to have a global network of mirrors that contain all the free software that's fit to run on Linux, in all its forms and versions: RPMs, debs, tarballs, binary, source, everything. All the mirrors would be peers and would contain identical sets of software. A network-connected Linux user looking for a particular file would be able to type at the command line:
[linuxuser]$ clan -get somefile
and right away the file would be downloaded to their computer from the CLAN system. There would also be provision for wildcards and searches to find files. I'm sure lots of developers could program such a client-side interface without breaking a sweat, or perhaps adapt programs such as Debian's apt-get for the purpose. In addition to programs, CLAN would also contain HOWTOs, tutorials, magazine articles and perhaps even newsgroup postings that are pertinent to the programs in the archive.
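To make the wildcard idea concrete, here is a minimal sketch of how a client might match a pattern against a node's file index. The index contents and the `search` function are hypothetical, invented purely for illustration; a real client would query the archive rather than a hard-coded list.

```python
from fnmatch import fnmatchcase

# Hypothetical index of file names held by a CLAN node (illustrative only).
ARCHIVE_INDEX = [
    "somefile-1.0.tar.gz",
    "somefile-1.0-1.i386.rpm",
    "somefile_1.0-1_i386.deb",
    "otherfile-2.3.tar.gz",
]


def search(pattern, index=ARCHIVE_INDEX):
    """Return the index entries matching a shell-style wildcard pattern."""
    # fnmatchcase gives case-sensitive matching on every platform.
    return sorted(name for name in index if fnmatchcase(name, pattern))


if __name__ == "__main__":
    for name in search("somefile*"):
        print(name)
```

Shell-style patterns are used here only because they are what most command-line users already know; a fuller search feature could just as well work on package descriptions.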
The system itself should be only a mirror and should not, for security reasons, permit application developers or authors to upload directly into the nodes. Everything contained in the archive should be mirrored off another website. This would mean that if Jane Developer wanted her new program included in the archive, she would create a website for the software and register it with CLAN. CLAN would then mirror it and it would be available in the archive. Or she could put it on SourceForge, for example, which would itself be mirrored in the archive.
Beyond Plain Vanilla Mirroring
Developers with software in CLAN would fill out a simple web form with a short description of their program and a list of resources, mailing lists, etc. that are helpful for that program. This information would then be provided to anybody who downloaded the software. So Joe Linuxuser looking for somefile would be able to download the file along with a web page of pointers to helpful information about the program: a FAQ, a tutorial and so on. He would then install the program and browse his own private, local, CLAN-created website of helpful materials about the program.
CLAN could even be used as a feedback channel for the program developer, with comments and perhaps even contributions flowing from users back to developers. Every time you downloaded a file, you would be prompted for your feedback, which CLAN would then pass on to the registered developer. Thus, without even knowing who the developer of a program is, you could pass on comments, insights, suggestions, even monetary pledges, to them through CLAN.
Security would be of paramount importance. In order to maintain the integrity of the archive itself, access to the archive would be limited to the client clan program on the user's computer. There would be no FTP or web access to the programs in the archive. A secure download method would then be employed between the client and the server. The integrity of programs in the archive would also have to be assured. The main danger here would be getting trojaned versions of programs into the archive. Perhaps the best protection in this regard would be to have staggered updating of all the nodes. This means that the nodes would not always have the exact same set of programs. However, the advantage would be that, since all nodes would collect and verify MD5 information, they could then share this information amongst themselves to double-check that each particular file is the same on all nodes. That way, if a trojaned version of a particular file suddenly appeared and entered an archive node, it would be compared with genuine versions of the same program in other nodes and the trojan would likely be detected. This would not be an absolute guarantee, but would probably be quite robust in practice.
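The cross-check between nodes could be sketched roughly as follows. This is only an illustration under my own assumptions: the node names, the `md5_of` and `find_suspect_nodes` helpers, and the simple majority rule are all hypothetical, and a real deployment would need to handle ties and might well prefer a stronger hash than MD5.

```python
import hashlib
from collections import Counter


def md5_of(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()


def find_suspect_nodes(digests_by_node):
    """Compare the digests reported by each node for one file.

    Returns the most commonly reported digest and a sorted list of the
    nodes whose digest differs from it. With a tie, the digest that was
    encountered first wins, so a tie should be treated as inconclusive.
    """
    counts = Counter(digests_by_node.values())
    majority_digest, _ = counts.most_common(1)[0]
    suspects = sorted(
        node for node, digest in digests_by_node.items()
        if digest != majority_digest
    )
    return majority_digest, suspects


if __name__ == "__main__":
    genuine = b"genuine program bytes"
    reports = {
        "node-a": md5_of(genuine),
        "node-b": md5_of(genuine),
        "node-c": md5_of(b"trojaned program bytes"),
    }
    majority, suspects = find_suspect_nodes(reports)
    print("nodes to investigate:", suspects)
```

A majority vote only helps while most nodes still hold the genuine file, which is exactly what staggered updating is meant to ensure.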
How to pay for all this
Needless to say, all this would be expensive, from both a hardware and a bandwidth perspective. It would require terabytes of storage and a great deal of bandwidth. On the plus side, both bandwidth and terabyte storage are becoming less expensive. Perhaps each node of the network might comprise a small cluster of PC-class machines with a bunch of those new 200GB hard disks. Certainly it's not outrageously expensive to build such a cluster these days. But it will still cost serious money, so how to do it? The most feasible source of revenue would be advertising. Users would be exposed to textual advertising when using CLAN at the command line and to graphical ads when using it in the GUI. CLAN would provide an opportunity for highly targeted advertising, if the ads were tied to the specific programs and applications being downloaded. For example, a user downloading MySQL might see ads offering books on MySQL, commercial support contracts, magazine subscriptions, commercial add-on programs, etc. In fact, individual free software consultants could advertise on CLAN, so that when somebody downloads a particular program from a mirror in their region, that person also gets the consultant's name and number.
In addition to direct download advertising, CLAN would also maintain mailing lists that alert users when programs of interest are updated or become available. Such lists would be a further means of advertising and raising revenue. Finally, a CLAN website would also be maintained, with sales of CLAN merchandise, advertising and pleas for donations.
Some formula would then be devised to share the available revenues amongst the network nodes. Perhaps this could be on the basis of bandwidth used, so that a mirror that carried 10% of the total system-wide bandwidth would receive 10% of net revenues.
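As a rough sketch of such a formula, a purely proportional split might look like this. The function name, node names and figures are made up for illustration; any real formula would no doubt need minimum payments, auditing of bandwidth figures and so on.

```python
def share_revenue(net_revenue, bandwidth_by_node):
    """Split net revenue among nodes in proportion to bandwidth served."""
    total = sum(bandwidth_by_node.values())
    if total == 0:
        # No traffic at all: nothing to apportion.
        return {node: 0.0 for node in bandwidth_by_node}
    return {
        node: net_revenue * used / total
        for node, used in bandwidth_by_node.items()
    }


if __name__ == "__main__":
    # Hypothetical figures: bandwidth in arbitrary units, revenue in dollars.
    shares = share_revenue(1000, {"node-a": 10, "node-b": 90})
    for node, amount in shares.items():
        print(node, amount)
```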
Who would go for this
This still leaves the issue of who would put up the upfront money to establish a node. I believe large corporations such as HP and IBM may see this as a means of furthering the Linux cause and expanding the market, and so may be willing to do this. Linux vendors such as Red Hat and SuSE may also be willing to do this. Universities and other companies that want to raise their profile in the community may be willing to participate as well.
Various inducements could also be offered, such as the ability to put a plug or ad for the company running a particular mirror at the outset of every download from that mirror. The main impediment, of course, will be the upfront cost, which may well be in the low tens of thousands for each node in the network. However, once the network is up and running, revenues from its operation would offset at least some of the day-to-day costs, and a portion of those revenues could even be used to defray part of the startup costs for new nodes. Furthermore, since the entire system would be organized as a nonprofit, setting up and operating a node would be tax deductible in many countries.
I think the Linux world needs a simple and straightforward way of finding files on the internet. While many mirrors do exist, one often has to hunt around to find applications, particularly the more obscure ones. A system such as the one described here would make life really easy when those missing-dependency errors show up. Besides, it would take some pressure off the existing mirrors, especially when new distribution versions are released. What do you think? Say yes to CLAN!