How would you feel if you saw demand for your favorite topic, which also happens to be your line of business, grow tenfold in just two years? Vindicated, overjoyed, and a bit overstretched in trying to keep up with demand, probably.
Although Emil Eifrem never used those exact words when we discussed the past, present, and future of graphs, that’s a reasonable projection to make. Eifrem is chief executive officer and cofounder of Neo4j, a graph database company that claims to have popularized the term “graph database” and to be the leader in the graph database category.
Eifrem and Neo4j’s story and insights are interesting because through them we can trace what is shaping up to be a foundational technology stack for the 2020s and beyond: graphs.
Graph analytics and graph databases
Eifrem cofounded Neo4j in 2007 after he stumbled upon the applicability of graphs in applications with highly interconnected data. His initiation came while working as a software architect on an enterprise content management solution. Trying to model and apply connections between items, actors, and groups using a relational database ended up taking half of the team’s time. That was when Eifrem realized that they were trying to fit a square peg in a round hole. He thought there had to be a better way, and set out to make it happen.
When we spoke for the first time in 2017, Eifrem had been singing the “graphs are foundational, graphs are everywhere” tune for a while. He still is, but things are different today.
What was then an early adopter game has snowballed to the mainstream today, and it’s still growing. “Graph Relates Everything” is how Gartner put it when including graphs in its top 10 data and analytics technology trends for 2021. At Gartner’s recent Data & Analytics Summit 2021, graph also was front and center.
Interest is expanding as graph data takes on a role in master data management, tracking laundered money, connecting Facebook friends, and powering page ranking in a dominant search engine. Panama Papers researchers, NASA engineers, and Fortune 500 leaders all use graphs.
According to Eifrem, Gartner analysts are seeing explosive growth in demand for graph. Back in 2018, about 5% of Gartner’s inquiries on AI and machine learning were about graphs. In 2019, that jumped to 20%. From 2020 until today, 50% of inquiries are about graphs.
AI and machine learning are in extremely high demand, and graph is among the hottest topics in this domain. But the concept dates back to the 18th century, when Leonhard Euler laid the foundation of graph theory.
Euler was a Swiss scientist and engineer whose solution to the Seven Bridges of Königsberg problem essentially invented graph theory. What Euler did was to model the city’s land masses and the bridges connecting them as nodes and edges in a graph.
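As a rough illustration of that model, the sketch below encodes the seven bridges as edges between four land masses and counts how many bridges touch each one. The labels A through D and the function names are assumptions made for this example. Euler’s observation was that a walk crossing every bridge exactly once can exist only if zero or two land masses are touched by an odd number of bridges.

```python
from collections import Counter

# Hypothetical labels: A is the central island, B and C are the two river
# banks, D is the eastern land mass. Each pair is one bridge (an edge).
BRIDGES = [
    ("A", "B"), ("A", "B"),  # two bridges between the island and one bank
    ("A", "C"), ("A", "C"),  # two bridges between the island and the other bank
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]


def degrees(edges):
    """Count how many edges touch each node."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return dict(counts)


def has_euler_walk(edges):
    """Apply Euler's parity condition.

    Assumes the graph is connected, which holds for Königsberg; the
    function does not check connectivity itself.
    """
    odd_nodes = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return len(odd_nodes) in (0, 2)


print(degrees(BRIDGES))         # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_euler_walk(BRIDGES))  # False
```

All four land masses have an odd number of bridges, so no such walk exists, which is the answer Euler reached.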
That formed the basis for many graph algorithms that can tackle real-world problems. Google’s PageRank is probably the best-known graph algorithm, helping score web page authority. Other graph algorithms are applied to use cases including recommendations, fraud detection, network analysis, and natural language processing, constituting the domain of graph analytics.
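To give a feel for how such an algorithm scores authority, here is a simplified power-iteration sketch of the idea behind PageRank. It is not Google’s production system. The damping factor of 0.85 is the value commonly cited for PageRank, while the three-page link graph and the function name are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing rank along outgoing links.

    `links` maps each page to the list of pages it links to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's rank evenly across the pages it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web of three pages: a and b both link to c, and c links to a.
example_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
for page, score in sorted(pagerank(example_links).items()):
    print(page, round(score, 3))
```

In this toy graph, page c ends up with the highest score because two pages link to it, and page b ends up lowest because nothing links to it.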
Graph databases also serve a variety of use cases, both operational and analytical. A key advantage they have over other databases is their ability to model highly interconnected domains intuitively and to execute queries over them quickly. That’s pretty important in an increasingly interconnected world, Eifrem argues:
“When we first went to market, supply chain was not a use case for us. The average manufacturing company would have a supply chain two to three levels deep. You can store that in a relational database; it’s doable with a few hops [or degrees of separation]. Fast-forward to today, and any company that ships stuff taps into this global fine-grained mesh, spanning continent to continent.
“All of a sudden, a ship blocks the Suez Canal, and then you have to figure out how that affects your business. The only way you can do that is by digitizing it, and then you can reason about it and do cascading effects. In 2021, you’re no longer talking about two to three hops. You’re talking about supply chains that are 20, 30 levels deep. That requires using a graph database. It’s an example of this wind behind our back.”
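The “cascading effects” reasoning Eifrem describes can be sketched as a breadth-first traversal over a supplier graph. This is a minimal illustration in plain Python, not Neo4j’s API, and the company names and function name are made up for the example.

```python
from collections import deque


def downstream_impact(supplies, disrupted):
    """Return {company: hops} for every company reachable from `disrupted`.

    `supplies` maps each company to the companies it supplies directly.
    """
    hops = {disrupted: 0}
    queue = deque([disrupted])
    while queue:
        current = queue.popleft()
        for customer in supplies.get(current, []):
            if customer not in hops:  # visit each company only once
                hops[customer] = hops[current] + 1
                queue.append(customer)
    del hops[disrupted]  # report only the affected downstream companies
    return hops


# Hypothetical supply chain: a blocked port feeds a shipper, and so on.
SUPPLIES = {
    "port": ["shipper"],
    "shipper": ["parts_maker", "retailer"],
    "parts_maker": ["assembler"],
    "assembler": ["retailer"],
}

print(downstream_impact(SUPPLIES, "port"))
# {'shipper': 1, 'parts_maker': 2, 'retailer': 2, 'assembler': 3}
```

The traversal is the same whether the chain is three levels deep or 30. In a relational database, each extra hop typically means another join, which is the friction Eifrem points to.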
Knowledge graphs, graph data science, and machine learning
The graph database category is actually a fragmented one. Although they did not always go by that name, graph databases have existed for a long time. RDF databases, based on Semantic Web technology and dating back about 20 years, are an early branch of graph databases.
Crawling and categorizing content on the web is a very hard problem to solve without semantics and metadata. This is why Google adopted the technology in 2010 by acquiring Metaweb.
What we get by connecting data, and adding semantics to information, is an interconnected network that is more than the sum of its parts. This graph-shaped amalgamation of data points, relationships, metadata, and meaning is what we call a knowledge graph. Google introduced the term in 2012, and it’s now used far and wide.
Knowledge graph use cases are booming. Reaching peak attention in Gartner’s hype cycle for AI in 2020, applications are trickling down from the Googles and Facebooks of the world to mid-market companies and beyond. Typical use cases include data integration and virtualization, data mesh, catalogs, metadata, and knowledge management, as well as discovery and exploration.
But there’s another use of graphs that is blossoming: graph data science and machine learning. We have connected data, and we want to store it in a graph, so graph data science and graph analytics are the natural next step, said Alicia Frame, Neo4j graph data science director.
“Once you’ve got your data in the database, you can start looking for what you know is there, so that’s your knowledge graph use case,” Frame said. “I can start writing queries to find what I know is in there, to find the patterns that I’m looking for. That’s where data scientists get started — I’ve got connected data, I want to store it in the right shape.
“But then the natural progression from there is I can’t possibly write every query under the sun. I don’t know what I don’t know. I don’t necessarily know what I’m looking for, and I can’t manually sift through billions of nodes. So, you want to start applying machine learning to find patterns, anomalies, and trends.”
As Frame pointed out, graph machine learning is a booming subdomain of AI, with cutting edge research and applications. Graph neural networks operate on graph structures, as opposed to other types of neural networks that operate on vectors. What this means in practice is that they can leverage additional information.
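To make that “additional information” concrete, the sketch below runs one round of neighbor averaging, a heavily simplified stand-in for the message-passing step that many graph neural networks build on. It has no learned weights, and the node features, edges, and function name are hypothetical.

```python
def aggregate_neighbors(features, edges):
    """Replace each node's feature vector with the mean of its own
    vector and the vectors of its direct neighbors."""
    neighbors = {node: [] for node in features}
    for u, v in edges:  # edges are treated as undirected
        neighbors[u].append(v)
        neighbors[v].append(u)

    updated = {}
    for node, own in features.items():
        group = [own] + [features[m] for m in neighbors[node]]
        # Average the vectors position by position.
        updated[node] = [sum(column) / len(group) for column in zip(*group)]
    return updated


# Hypothetical data: a and b are connected, c is isolated.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
edges = [("a", "b")]
print(aggregate_neighbors(features, edges))
# {'a': [0.5, 0.5], 'b': [0.5, 0.5], 'c': [0.0, 0.0]}
```

A model that sees only the original vectors treats a and b as unrelated. After aggregation, each carries a trace of the other, which is the relationship signal a vector-only model would miss.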
Neo4j was among the first graph databases to expand its offering to data scientists, and Eifrem went as far as to predict that by 2030, every machine learning model will use relationships as a signal. Google started doing this a few years ago, and it’s proven that relationships are strong predictors of behavior.
What will naturally happen, Eifrem went on to add, is that machine learning models that use relationships via graphs will outcompete those that don’t. And organizations that use better models will outcompete everyone else — a case of Adam Smith’s “invisible hand.”
The four pillars of graph adoption
This confluence of graph analytics, graph databases, graph data science, machine learning, and knowledge graphs is what makes graph a foundational technology. It’s what’s driving use cases and adoption across the board, as well as the evolution from databases to platforms that Neo4j also exemplifies. Taking a decade-long view, Eifrem noted, there are four pillars on which this transition is based.
The first pillar is the move to the cloud. Though it’s probably never going to be a cloud-only world, we are quickly going from on-premises first to cloud-first to database-as-a-service (DBaaS). Neo4j was among the first graph databases to feature a DBaaS offering, being in the cohort of open source vendors Google partnered with in 2019. It’s going well, and AWS and Azure are next in line, Eifrem said. Other vendors are pursuing similar strategies.
The second pillar is the emphasis on developers. This is another well established trend in the industry, and it goes hand-in-hand with open source and cloud. It all comes down to removing friction in trying out and adopting software. Having a version of the software that is free to use means adoption can happen in a bottom-up way, with open source having the added benefit of community. DBaaS means going from test cases to production can happen organically.
The third pillar is graph data science. As Frame noted, graph really fills the fundamental requirement of representing data in a faithful way. The real world isn’t rows and columns — it’s connected concepts, and it’s really complex. There’s this extended network topology that data scientists want to reason about, and graph can capture this complexity. So it’s all about removing friction, and the rest will follow.
The fourth pillar is the evolution of the graph model itself. The commercial depth of adoption today, although rapidly growing, is not on par with the benefits that graph can bring in terms of performance and scalability, as well as intuitiveness, flexibility, and agility, Eifrem said. User experience for developers and data scientists alike needs to improve even further, and then graph can be the No. 1 choice for new applications going forward.
There are actually many steps being taken in that direction. Some of them may come in the form of acronyms such as GraphQL and GQL. They may seem cryptic, but they’re actually a big deal. GraphQL is a way for front-end and back-end developer teams to meet in the middle, unifying access to databases. GQL is a cross-industry effort to standardize graph query languages, the first one that the ISO adopted in the 30-plus years since SQL was formally standardized.
But there’s more: the graph effect actually goes beyond software. In another booming category, AI chips, graph plays an increasingly important role. This is a topic in and of itself, but it’s worth noting how, from ambitious upstarts like Blaize, Graphcore, and NeuReality to incumbents like Intel, there is emphasis on leveraging graph structure and properties in hardware, too.
For Eifrem, this is a fascinating line of innovation, but like SSDs before it, one that Neo4j will not rush to support until it sees mainstream adoption in datacenters. This may happen sooner rather than later, but Eifrem sees the end game as a generational change in databases.
After a long period of stagnation in terms of database innovation, NoSQL opened the gates around a decade ago. Today we have NewSQL and time-series databases. What’s going to happen over the next three to five years, Eifrem predicts, is that a few generational database companies are going to be crowned. There may be two, or five, or seven more per category, but not 20, so we’re due for consolidation.
Whether you subscribe to that view, and which vendors you place your bets on, is open for discussion. What seems like a safe bet, however, is the emergence of graph as a foundational technology stack for the 2020s and beyond.
VentureBeat