There was a time in computing when performance increases could be had by designing a more complex processor or turning up the clock speed. Those days are largely behind us and the most common solution at present is to add more cores to a symmetric multiprocessing (SMP) system, but this has practical scaling limits and there are downsides to a one-size-fits-all processor architecture.
The world’s most powerful computers make use of parallel computing, using configurations such as Linux-based Beowulf clusters to distribute workloads across many thousands of processor cores. Closer to the other end of the scale, on desktop and mobile devices, compute-intensive graphics processing is handled by high-performance GPUs that are finely tuned for the task at hand.
Parallel computing and heterogeneous systems — with mixed types of computational units — are able to break through SMP scaling limits and achieve increased performance with reduced power consumption. However, they also come with their own challenges, and making parallel computing easy to use has been described as “a problem as hard as any that computer science has faced.”
With these challenges in mind the Parallella project has set out to help close the knowledge gap by developing an affordable, high performance and truly open parallel computing platform.
The Parallella Computer
In October 2012 a Kickstarter campaign raised $898,921 to develop and produce an initial run of the Parallella computer, a system equipped with a dual-core ARM A9 processor and either a 16 or 64-core Adapteva Epiphany floating-point accelerator. The project had just short of 5,000 backers, with pledges of $99 or more rewarded with at least one board with a 16-core device.
The Parallella computer was inspired in no small part by Raspberry Pi. It will be credit card-sized, with 1GB of RAM available to the host and MicroSD storage, and will provide Gigabit Ethernet, USB 2.0 and HDMI ports, plus plenty of general purpose I/O (GPIO) for expansion via daughter cards.
The Epiphany chip provides RISC floating-point cores, each with 32KB of local memory, that are connected together by an on-chip mesh network which allows one core to transparently access the memory of every other core. In contrast to GPUs, Epiphany is MIMD (multiple instruction, multiple data), meaning that cores are able to operate independently and the architecture is easier to program for a wider range of applications.
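One way to picture "transparent" access across the mesh is a single flat address space in which part of every address names a core and the rest is an offset into that core's local memory. The sketch below illustrates the idea only. The 6-bit row, 6-bit column and 20-bit offset split is an assumption made for illustration and is not taken from this article, and the function names are hypothetical.

```python
# Illustrative sketch of a flat, mesh-wide address space.
# ASSUMPTION: the 6/6/20 bit split below is chosen for illustration;
# it is not stated anywhere in this article.
ROW_BITS = 6
COL_BITS = 6
OFFSET_BITS = 20
LOCAL_MEM_BYTES = 32 * 1024  # 32KB of local memory per core, as stated above


def global_address(row, col, offset):
    """Combine a core's mesh coordinates and a local offset into one address."""
    if not 0 <= row < (1 << ROW_BITS):
        raise ValueError("row out of range")
    if not 0 <= col < (1 << COL_BITS):
        raise ValueError("col out of range")
    if not 0 <= offset < LOCAL_MEM_BYTES:
        raise ValueError("offset outside local memory")
    return (row << (COL_BITS + OFFSET_BITS)) | (col << OFFSET_BITS) | offset


def split_address(addr):
    """Recover (row, col, offset) from an address built by global_address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    col = (addr >> OFFSET_BITS) & ((1 << COL_BITS) - 1)
    row = (addr >> (COL_BITS + OFFSET_BITS)) & ((1 << ROW_BITS) - 1)
    return row, col, offset
```

Under a scheme like this, a core that wants to read a neighbour's data simply uses an address whose upper bits name that neighbour, and the mesh network routes the request. No explicit message passing is needed in the program.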
In addition to a dual-core ARM processor, the Xilinx Zynq system-on-chip that is being used provides programmable logic, which is where the interface to the Epiphany chip will be implemented. The Parallella computer is an open source hardware design and the schematics and PCB layout will be published along with the HDL source code for the Epiphany interface.
The hardware will ship with Ubuntu pre-loaded. Driver sources will also be provided and there has already been interest expressed in developing support for other distributions.
The Eclipse multicore IDE
Development is supported by the Epiphany SDK, which is based on GCC 4.7, GDB, the Eclipse IDE and the newlib C library. This was developed by our Adapteva partner, Embecosm, who also managed the introduction of the Epiphany architecture into the GCC mainline.
Brown Deer Technology have developed a fully open source OpenCL implementation and this can be used to simplify the creation of applications which use both ARM and Epiphany cores.
The project is now looking to members of the community to lead the development of support for additional languages and frameworks, with leads recently announced for Erlang and Python.
Backers have said that they will use their Parallella computers for sound processing, video encoding, 3D scanning, computer vision, neural networks, physical simulation and, importantly, learning parallel programming!
Software-defined radio is an application that frequently comes up, and Parallella is particularly well suited to it since the programmable logic it provides is situated between the ARM host, Epiphany accelerator and GPIO, allowing digital radio hardware to be more easily integrated.
The 16-core Epiphany chip delivers 26 GFLOPS of performance, and with the entire Parallella computer consuming only 5 watts it is possible to prototype compute-intensive applications within mobile device power budgets or, equally, to construct energy-efficient HPC clusters.
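To put those figures in perspective, the short calculation below works out performance per watt and per core from the numbers quoted above, and scales them to a cluster. It is a back-of-the-envelope sketch. The function names are hypothetical, and the cluster estimate assumes ideal linear scaling with no allowance for networking or other overheads.

```python
# Figures quoted in the article.
PEAK_GFLOPS = 26.0   # 16-core Epiphany chip
BOARD_WATTS = 5.0    # entire Parallella computer
CORES = 16


def gflops_per_watt(gflops=PEAK_GFLOPS, watts=BOARD_WATTS):
    """Board-level efficiency: accelerator throughput over whole-board power."""
    return gflops / watts


def gflops_per_core(gflops=PEAK_GFLOPS, cores=CORES):
    """Average throughput of each Epiphany core."""
    return gflops / cores


def cluster_estimate(boards):
    """Ideal-case (GFLOPS, watts) for a cluster of identical boards.

    ASSUMPTION: perfect linear scaling; real clusters lose some
    performance to communication and draw extra power for networking.
    """
    return boards * PEAK_GFLOPS, boards * BOARD_WATTS


if __name__ == "__main__":
    print(f"{gflops_per_watt():.1f} GFLOPS per watt")
    print(f"{gflops_per_core():.3f} GFLOPS per core")
    total_gflops, total_watts = cluster_estimate(8)
    print(f"8 boards: {total_gflops:.0f} GFLOPS at {total_watts:.0f} W")
```

On the quoted figures this comes to 5.2 GFLOPS per watt at the board level, and a hypothetical eight-board cluster would offer 208 GFLOPS within a 40 watt budget, before overheads.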
Testing a Beowulf cluster assembled from Parallella prototypes
The first prototypes went out to backers at the end of December 2012 with additional boards going out in January. These are based on an off-the-shelf Zedboard development system plus an Epiphany daughter card, and are virtually identical to the final design.
The Adapteva team is now working to meet a challenging timescale, as a beta version of the credit card-sized board is due in February with release 1.0 due to go out to thousands of backers in May.
The tool chain sources are on GitHub and SDK packages have been provided to specific backers; these will be made publicly available when the final hardware ships.
Python lead, Mark Dewing, has had some initial success with compiling the Python-on-a-Chip interpreter for Epiphany, testing this via the functional simulator provided by the SDK. Meanwhile Erlang Solutions has been working out how to approach Erlang support and will be sharing their initial thoughts on this in the coming weeks.
The focus now is on completing the design and getting hardware out to backers, as well as establishing relationships between the Parallella community and those developing the languages, frameworks and applications that are vital to achieving the goal of democratizing access to parallel computing.
Andrew is an open technology consultant and writer; community lead for the Parallella project; and Open Source Hardware User Group (OSHUG) organizer.