All Things Graph: Computing vs. Analytics vs. Transactions
Editor's Note: This article is paid for by IBM as a Diamond-level sponsor of Apache Big Data, held May 9-12 in Vancouver, B.C., and was written by Linux.com.
Graphs model relationships between objects, and they have been in use for about 100 years in mathematics and mechanical computing. Graph representations were used much earlier than that as maps of physical networks such as nomadic trade routes. That’s because a graph is the best way to visualize information for fast and reliable human consumption. But it was the advent of the Internet and the Web that spurred sophisticated graph use in representing increasingly complex data and networks. Now, in this big data age, graph theory in computing, analytics and transactions continues to be highly popular, and there’s good reason for that.
We talked with Alaa Mahmoud, master inventor for IBM Cloud Data Services and IBM’s lead for IBM Graph, to explore how such an old concept is so uniquely fitted to modern data storage, exploration and mining, and to pick up a few tips on using all things graph.
Linux.com: As a quick insight into how graph theory is used today, can you give us a modern example?
Alaa Mahmoud: Graph databases are a great modern example. These are increasingly popular NoSQL databases that store data as vertices or nodes connected by edges, rather than storing the data in tabular form, meaning in rows and columns.
A property graph carries even more information than a basic graph, because it lets you store properties, that is, any number of key/value pairs associated with the data, on both vertices and edges. This lets you easily see more detail in data relationships.
Both intrinsic properties (those that are computed relative to the graph structure) and extrinsic properties (those that are not related to the graph structure) can be added to the graph for increasingly complex analysis. To clarify the distinction, extrinsic properties are assigned values added to an edge that are not contained in the structure.
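To make the distinction concrete, here is a minimal Python sketch. The tiny graph, the names, and the `since` property are hypothetical and are not taken from IBM Graph. Degree stands in for an intrinsic property because it is derived purely from the structure, while `since` is a value someone assigned to the edge.

```python
# Hypothetical toy graph: each edge is (source, target, properties).
edges = [
    ("alice", "bob", {"since": 2014}),    # "since" is extrinsic: assigned, not derived
    ("alice", "carol", {"since": 2015}),
    ("bob", "carol", {"since": 2016}),
]


def degree(vertex, edges):
    """Intrinsic property: computed only from the graph structure."""
    return sum(1 for src, dst, _ in edges if vertex in (src, dst))


def edge_property(src, dst, key, edges):
    """Extrinsic property: a value that was attached to the edge."""
    for s, d, props in edges:
        if (s, d) == (src, dst):
            return props.get(key)
    return None


alice_degree = degree("alice", edges)
bob_carol_since = edge_property("bob", "carol", "since", edges)
```

Removing an edge changes the degree automatically, but the `since` value changes only when someone edits it.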
For example, on a graph of national sales data in a given time period, properties including regional and local sales figures, top-selling products, and top buyers of the top products, plus any other related information, can be added to the edges of the graph. This both yields more insight and makes more complex querying of the data possible.
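The sales example can be sketched in plain Python. Every region, figure, and property name below is made up for illustration, and the `Edge` class and `regions_where` helper are hypothetical. A real property graph database would store and index this data for you.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


# Hypothetical data: national sales broken out by region on the edges.
edges = [
    Edge("national", "west", "includes", {"sales": 120_000, "top_product": "widget"}),
    Edge("national", "east", "includes", {"sales": 95_000, "top_product": "gadget"}),
    Edge("national", "south", "includes", {"sales": 40_000, "top_product": "widget"}),
]


def regions_where(edges, predicate):
    """Return target vertices whose edge properties satisfy the predicate."""
    return [e.target for e in edges if predicate(e.properties)]


# Query on edge properties: regions with sales above 50,000 ...
big_regions = regions_where(edges, lambda p: p["sales"] > 50_000)
# ... and regions where "widget" is the top product.
widget_regions = regions_where(edges, lambda p: p["top_product"] == "widget")
```

Because the figures live on the edges themselves, each new question is just another predicate over the same structure.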
Linux.com: The graph concept has been around a long time and used in many ways. In the last few years, it’s seen a resurgence in popularity. Why do you think that is?
Mahmoud: Today, everyone is talking about how to make sense of all the data we have. Graphs are uniquely suited to making sense of all the relationships in data.
This is because data relationships are always modeled as a graph, and so storing data as a graph eliminates the middle layer of work in transforming the data from the model to storage. Reducing the complexity and removing steps adds efficiency, and that’s appealing.
Using a graph query language, usually Gremlin, developed by Apache TinkerPop, makes querying the data more natural, and allows us to create more complex queries in a shorter amount of time than with tabular or other formats or databases.
This means that developers can create queries in a language, such as Java, that is the same as or similar to the code they write, which truly makes this work faster and easier.
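As a rough illustration of why traversal-style queries feel natural, the sketch below answers a "friends of friends" question by walking edges one hop at a time. It is plain Python, not Gremlin, and the `knows` data and function names are hypothetical.

```python
# Hypothetical "knows" relationships stored as an adjacency mapping.
knows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": [],
}


def out(vertices, adjacency):
    """One traversal step: follow outgoing edges from every vertex given."""
    return [nbr for v in vertices for nbr in adjacency.get(v, [])]


def friends_of_friends(person, adjacency):
    """Two hops out, minus the person and their direct friends, deduplicated."""
    direct = out([person], adjacency)
    second = out(direct, adjacency)
    exclude = set(direct) | {person}
    seen, result = set(), []
    for v in second:
        if v not in exclude and v not in seen:
            seen.add(v)
            result.append(v)
    return result


fof = friends_of_friends("alice", knows)
```

In a tabular store, the same question would typically need a self-join on a friendship table for each hop.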
Using graphs also reduces complexity because it’s a much easier, more natural way of thinking about data and the relationships within it. People don’t generally think of data relationships in a spreadsheet form. So why do the additional work of shoehorning data into tabs and rows if it only slows the work and complicates or restricts the analysis?
For all those reasons, using graph databases is very popular now and growing more so.
Linux.com: What about using a graph database in the cloud? Are the advantages similar to using the cloud overall, or are there extra advantages there for developers?
Mahmoud: Using a graph database in the cloud reduces the cost and complexity barriers for developers. And here’s how…
In an on-premises configuration, a graph database is built on top of other technologies, and that means a lot of components to work with. And in the end, it may or may not scale like you need it to. Further, working with so many pieces is cost-prohibitive for many developers.
But with a cloud-based graph database, developers get an API and credentials and they’re ready to go. Using a RESTful API, developers can use it on any computing platform and in any application that can make an HTTP request.
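Here is a minimal sketch of what "an API and credentials" can amount to in practice, using only Python's standard library. The URL, credentials, and payload shape are placeholders and are not IBM Graph's actual API, so check the service's documentation for the real endpoints. The request is built but deliberately not sent.

```python
import base64
import json
import urllib.request

# Placeholder values: substitute the endpoint and credentials your service issues.
API_URL = "https://graph.example.com/vertices"  # hypothetical endpoint
USERNAME = "my-user"                            # hypothetical credentials
PASSWORD = "my-password"


def build_add_vertex_request(properties):
    """Build (but do not send) an authenticated JSON POST request."""
    token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
    body = json.dumps(properties).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_add_vertex_request({"name": "alice"})
# Sending it would be: urllib.request.urlopen(request)
```

Nothing here is specific to one platform, which is the point: any environment that can issue an HTTP request can talk to such a service.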
With a cloud solution, like IBM Graph, your users get high availability, and also a database that automatically scales as the data grows. Developers get security features that scale as the data and infrastructure expand. There’s also security for the data at rest and in motion, and the latest patches are automatically applied.
But perhaps the greatest advantage is that the user has a consistent experience even though the service is upgraded frequently under the hood. Having a consistent experience means developers’ work isn’t interrupted or delayed by the need to continuously modify their code to match the technology they are running on top of.
Linux.com: Certainly this big data-fueled age of large databases has added momentum to all things graph. Are there any areas or use cases where you think graph structures really outperform?
Mahmoud: The most dominant use in the list of typical use cases is with social networks, in representing people and their relationships as graphs. Not only is this a more natural way of thinking of these relationships, but it allows a way to add information on the edges that can help better define those relationships and bring additional understanding.
Graphs are also popularly used in recommendation engines, such as in retail transactions, for example. It’s one thing to discover that a person buys a certain item, and quite another to also see at what price range they’re likely to buy it, and other related details surrounding the purchase, in order to more accurately predict their future purchases.
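A recommendation of this kind can be sketched as a walk over purchase edges: from a shopper to their items, to other shoppers who bought those items, and on to what else those shoppers bought. The data and the scoring rule below are hypothetical, and a real engine would also weigh edge properties such as price.

```python
from collections import Counter

# Hypothetical purchase edges: shopper -> items bought.
purchases = {
    "ann": {"tent", "stove"},
    "ben": {"tent", "lantern"},
    "cam": {"tent", "stove", "lantern"},
    "dee": {"kayak"},
}


def recommend(shopper, purchases):
    """Rank items bought by shoppers who share at least one purchase."""
    mine = purchases[shopper]
    scores = Counter()
    for other, items in purchases.items():
        if other == shopper or not (mine & items):
            continue  # skip the shopper and anyone with no overlap
        for item in items - mine:
            scores[item] += 1
    # Highest count first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked]


ann_recs = recommend("ann", purchases)
```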
Graphs are also very popular in security analysis and fraud detection, as well as in data governance and compliance.
Linux.com: Circular dependencies can be a problem – for example, with the use of some modules in open source software engineering. Are circular dependencies a problem in graph analytics or transactional graph structures?
Mahmoud: Algorithms can take care of circular dependencies in databases. The problems that do exist are not specific to graph databases, but instead are typical of NoSQL databases generally. A lot depends on implementation.
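Mahmoud does not name a specific algorithm. One common approach, shown here as a sketch with a hypothetical dependency graph, is a depth-first search that marks each vertex as unvisited, in progress, or done. Reaching a vertex that is still in progress means the walk has looped back onto its own path.

```python
# Hypothetical dependency graph containing one cycle: b -> c -> d -> b.
depends_on = {
    "a": ["b"],
    "b": ["c"],
    "c": ["d"],
    "d": ["b"],
    "e": [],
}


def has_cycle(graph):
    """Depth-first search with three states to detect a directed cycle."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {v: UNVISITED for v in graph}

    def visit(v):
        state[v] = IN_PROGRESS
        for nbr in graph.get(v, []):
            nbr_state = state.get(nbr, UNVISITED)
            if nbr_state == IN_PROGRESS:
                return True  # back edge: nbr is on the current path
            if nbr_state == UNVISITED and visit(nbr):
                return True
        state[v] = DONE
        return False

    return any(state[v] == UNVISITED and visit(v) for v in graph)


cyclic = has_cycle(depends_on)
acyclic = has_cycle({"a": ["b"], "b": []})
```

The same state tracking lets a traversal stop instead of looping forever when it meets a cycle.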
Linux.com: What are the biggest advantages of using the graph database approach?
Mahmoud: One of its primary strengths is increased productivity. For example, as mentioned before, in getting data from the model to the database, there’s no middle step. And, you can query the data in the same way you think about the data. This increases productivity by reducing complexity.
Graph databases reduce complexity in the data itself by supporting billions of edges between nodes and letting you traverse them. This too increases productivity, because you can see data relationships more readily and query them faster, with increasingly complex queries.
Linux.com: Graph computing is a way of thinking as much as it is a set of tools. How does thinking in terms of graphs, processes and traversals make you stronger or better at finding the right outputs for improved decision-making?
Mahmoud: People in general are trained or inclined to think in terms of entities and relationships. Anything you can think of can be represented in a graph. And, we’re all used to seeing relational information presented in graphs.
So, it’s not so much a new way of thinking as it is supportive of the way we already think.
Linux.com: Got any tips to share on changing the thinking or extracting more from your data using graph anything?
Mahmoud: The turning point for us, and for a lot of people, was and is cloud-based graph databases, and graph everything really. Graph databases are powerful and love complex interconnected data.
To get started, just jump in with a cloud-based solution like IBM Graph. Hands-on is the best way to get familiar with it.
Linux.com: Anything else you’d like to add to this conversation that we haven’t mentioned yet?
Mahmoud: Using graph anything is more fun than you think, and a very natural way of doing what you’re trying to do in the first place – finding and understanding relationships between entities. It’s a natural way of modeling, a natural way of thinking, and a natural way of discovering your data. There’s a reason graph databases are so popular and getting more popular every day. Give it a try and you’ll see exactly why that is.
Try out IBM Graph, IBM's enterprise-grade property graph as a service, for free.