MyHeritage.com marries open, closed, and “shared” source

Author: Tina Gasperson

Founder and CEO Gilad Japhet calls MyHeritage.com “a genealogy-based social network.” MyHeritage.com is written in PHP, but Japhet and his staff are testing the code on Windows ASP servers, using Phalanger, a “shared source” compiler that lets PHP run on .Net.

Japhet founded MyHeritage.com in 2003. “We were seeing a lot of success with social networks, but no one was targeting the most natural social unit: the family,” he says. At MyHeritage.com, users can create community sites for their families and friends. Japhet says it is like MySpace.com, but “it doesn’t only belong to one person, it belongs to the entire family. They can get in touch, and share news, files, and historical photos, and post entries to the family tree.”

MyHeritage.com also boasts a powerful search engine specifically created for genealogy. “Traditional search engines crawl the Web and index regular pages and then let you search them,” Japhet says. “That’s not good for genealogy. Most data is inside databases. Google doesn’t search the Ellis Island database.” Japhet’s engine accesses public and private databases housing immigrant and census information and returns results that include alternate spellings.
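The article does not say how MyHeritage.com matches alternate spellings. As an illustration only, the sketch below uses American Soundex, a phonetic code long used to index U.S. census records, to group surnames that sound alike. The function names `soundex` and `match_surnames` and the sample surnames are hypothetical and are not taken from MyHeritage.com's code.

```python
# Illustrative sketch only: American Soundex is an assumed stand-in for
# whatever matching technique the MyHeritage.com engine actually uses.

# Map each coded consonant to its Soundex digit.
CODES = {}
for digit, letters in {
    "1": "bfpv",
    "2": "cgjkqsxz",
    "3": "dt",
    "4": "l",
    "5": "mn",
    "6": "r",
}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character American Soundex code for a surname."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            # h and w are skipped and do not separate repeated codes.
            continue
        code = CODES.get(ch, "")  # vowels (and y) have no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def match_surnames(query, candidates):
    """Return the candidates whose Soundex code equals the query's code."""
    target = soundex(query)
    return [name for name in candidates if soundex(name) == target]


if __name__ == "__main__":
    # Hypothetical sample data, not drawn from any real database.
    print(match_surnames("Meyer", ["Meier", "Mayer", "Miller"]))
```

Soundex is deliberately coarse, so a production engine would likely combine it with other signals such as dates and places to rank the matches.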

The engine uses a signed Java client-side application, which “creates a very scalable architecture. If there are 10,000 users searching concurrently, we leverage the CPU power of all those computers.”

On the server side, MyHeritage runs on Apache, MySQL, and PHP. “When we set out to write the social network, we had to make a strategic choice of PHP or .Net,” Japhet says. “We chose PHP because of its amazing community power. We knew if we ever got stuck or needed help, we could leverage the community.

“In January 2004, when we started working on this, PHP was hardly object-oriented. We took a gamble on PHP5 in early beta. We wrote everything way before PHP5 was released, but by the time it was, our product was ready.”

As Japhet and his team benchmarked the code on different Web servers and platforms, they found a project called Phalanger. “One of the problems with PHP is that it is an interpretive language. You lack a compiler that can trap a lot of errors. We discovered this [shared source] product written by an excellent team of talented programmers in Czechoslovakia. We contacted them and they helped us compile our entire application, which was several hundreds of thousands of lines. We found that by using Phalanger and running our applications on Windows .Net, which was not something we originally intended to do, we got excellent performance.”

For Japhet, open source software has been a way to bootstrap the development and launch of his business. “The biggest benefit in choosing PHP was that we could use a large amount of third-party source and components and community-generated code. It saves you money, definitely. What is great about MySQL and PHP in particular is that from the moment we ask a tech question on the forum, we get answers in minutes. On the minus side, there’s no open source code accelerator. The only solution right now is to purchase an expensive product from Zend, which we don’t want to do.”

One of the most attractive features of MyHeritage.com is the face recognition software it provides for members. “We had the vision of harnessing face recognition technology because when you are dealing with historical photos, many people have a real problem identifying them,” Japhet says. Because face recognition is a very CPU-intensive operation, he is benchmarking it on different servers, just as he did with the PHP code on the application servers. So far, he says, the Windows servers are outperforming Linux. “It is able, on the same hardware, to do better memory management and thread and process management. It is capable of providing a higher number of face recognitions per hour.”

Japhet isn’t discarding Linux yet, though. “We are doing a lot of tweaking to make sure we are leveraging each operating system to the max. Nobody has built a face recognition system that can handle a million photos a day. It isn’t clear exactly how to structure the system. We are optimizing and learning as we go.”