Amid Shortages in Apache Spark Skillsets, Training Options Proliferate


The open source Big Data scene is red hot, but organizations are now dealing with a shortage of people who have relevant deployment and management expertise. There are simply not enough skilled workers to go around, especially when it comes to one of the hottest technologies of all: Apache Spark.

According to Dice, the most in-demand technology skills are in Big Data, with Spark at the top of the list. Although the need for these skills has increased in the past few years, employers still struggle to find qualified candidates. The Taneja Group recently reported similar findings in a Cloudera-sponsored global survey of nearly 7,000 technical and managerial-level professionals working in Big Data. The survey found that nearly half of the respondents see the Big Data skills gap as the most significant challenge to deploying Spark, and one-third named complexity in learning Spark as a barrier to adoption.

According to the Taneja Group report: “Barriers to adoption [of Spark] and challenges remain, and are largely attributed to the Big Data skills gap and the ability to consume relevant training in a variety of formats (online, in-person, conference or tradeshow).”

The good news is that Spark training options are multiplying, and some of the best are free or available at low cost. MapR, which focuses on Hadoop as well as Spark, offers numerous Spark training options, and Cloudera also has an expanded Spark training curriculum. Information about Cloudera’s Spark courses, including class registration, is available from Cloudera, and MapR offers a preview of its Spark Essentials course.

How is a typical course structured? In the first part of MapR’s Spark Essentials course, students use Spark’s interactive shell to load and inspect data. The course then describes the various modes for launching a Spark application, and students go on to build and launch a standalone Spark application. MapR notes that the concepts are taught using scenarios that form the basis of hands-on labs.
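To make that outline concrete, here is a minimal sketch of what a standalone Spark application that loads and inspects data might look like. It is not taken from MapR's course materials. The file name inspect_data.py, the two-column CSV layout (name, amount), and the helper parse_record are all hypothetical, and the sketch assumes PySpark is installed.

```python
"""inspect_data.py: illustrative standalone PySpark application (hypothetical)."""
import sys


def parse_record(line):
    """Split one 'name,amount' CSV line into a (name, amount) pair.

    Pure Python, so it can be checked without a Spark installation.
    """
    name, amount = line.split(",")
    return name.strip(), float(amount)


def main(path):
    # Imported here so parse_record stays usable where PySpark is absent.
    from pyspark.sql import SparkSession

    # In a standalone application we create the session ourselves.
    spark = SparkSession.builder.appName("InspectData").getOrCreate()

    # Load: read the text file as an RDD of lines, then parse each line.
    records = spark.sparkContext.textFile(path).map(parse_record)

    # Inspect: look at a few records and count them all.
    print(records.take(5))
    print(records.count())

    # A simple aggregation: total amount per name.
    totals = records.reduceByKey(lambda a, b: a + b)
    for name, total in totals.collect():
        print(name, total)

    spark.stop()


# Run only when a data path is supplied on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

A script like this would typically be launched with spark-submit, for example `spark-submit inspect_data.py sales.csv`, where sales.csv is a placeholder file name. The same load-and-inspect calls can also be typed one at a time into Spark's interactive Python shell (pyspark), where a session named `spark` is already available.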

Cloudera University offers both instructor-led courses and on-demand training options. The courses focus not just on Spark but also on other tools in the Spark ecosystem, including Apache Impala, Apache Kudu, Apache Kafka, and Apache Hive. There is high demand for people with skills spanning these data-centric, Apache-stewarded projects.

“Cloudera University has established itself as a valuable resource for preparing data professionals across every industry. We’ve seen throughout the years that organizations which invest in training up front drive deeper results from their big data initiatives and move more quickly from proof of concept into full production environments,” said Mark Morrissey, senior director, Education Programs at Cloudera. “The skills gap continues to be the biggest hurdle in our industry.”

Other Spark training options come bundled with technology built on Spark. For example, Databricks, the company founded by the team that created Apache Spark, has announced its Databricks Community Edition (DCE), a free version of a just-in-time data platform built on top of Spark. It comes with access to free online courses that can arm you with top-notch Spark skills. With the Databricks Community Edition, users have access to 6 GB clusters as well as a cluster manager and a notebook environment to prototype simple applications.

Databricks also offers a diversified set of Spark training options, including an option where an organization can have Databricks’ trainers teach workers in their own workplace environments. Databricks’ classes are structured to minimize time requirements, too. For example, it offers an Apache Spark Programming course that can be completed in three days.

Demand for people with Spark skills will only increase, driven in part by the huge investments that powerful companies are making. Leaders at IBM have called Spark “the most important new open source project in a decade,” and the company is investing hundreds of millions of dollars in Spark-related initiatives. The bottom line is that a little Spark education can go a long way.

Learn more about Spark at Apache: Big Data, which gathers developers, operators, and users working in Big Data for education, collaboration, and more. Check out the conference schedule and register now!