Advanced data science, machine learning and the power of knowledge graphs: what can we expect from this combination?

This article is contributed by Maya Natarajan, Sr. Director Product Marketing, Neo4j.

From bridging data silos and building data structures to accelerating the adoption of machine learning (ML) and artificial intelligence (AI), knowledge graphs are a foundational technology that lets companies go beyond digital transformation. The Alan Turing Institute, the UK's national institute for data science and AI, describes knowledge graphs as the best way "to encode knowledge for use at scale in open, scalable and decentralized systems". That makes them an ideal foundation for advanced data science initiatives. So why aren't they more widely known and used?

That is a problem. Business leaders know the value of their data and are acutely aware that it holds the answers to their most pressing business questions. Yet the insights they need to improve decision-making and business performance are not easy to extract. Hence the widespread interest in machine learning.

Knowledge graphs can help organizations move machine learning out of the lab and into useful production. They act as a special, non-disruptive layer of information on top of an organization's complex data. They embed intelligence into that data and dramatically increase its value, without changing the existing data infrastructure. Let's see how.

Stimulate action by providing assurance or insight

Knowledge graphs enhance existing technologies by enabling better data management, better predictions and more innovation, in part because they power AI and machine learning. In practice, knowledge graph use cases fall into two groups: action graphs and decision graphs. An action graph drives action by providing assurance or insight. Action graphs automate processes for better outcomes by supporting data assurance, discovery and understanding; examples include data lineage, data provenance, data governance, compliance and risk management.
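Data lineage, the first example above, is essentially a graph traversal: to know what a report depends on, you walk its "derived from" relationships upstream, and to assess the impact of a change, you walk them downstream. The sketch below assumes a hypothetical in-memory lineage graph with made-up dataset names (`risk_report`, `market_feed` and so on); a real deployment would store these relationships in the graph database and query them there.

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets it is
# derived from (think DERIVED_FROM relationships in a knowledge graph).
DERIVED_FROM = {
    "risk_report": ["exposure_table", "fx_rates"],
    "exposure_table": ["trades_raw", "counterparties"],
    "fx_rates": ["market_feed"],
    "trades_raw": [],
    "counterparties": [],
    "market_feed": [],
}


def upstream_lineage(graph, target):
    """Return every dataset `target` depends on, directly or indirectly,
    using a breadth-first walk over the derived-from edges."""
    seen = set()
    queue = deque(graph.get(target, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


def downstream_impact(graph, source):
    """Return every dataset affected if `source` changes, by inverting the
    edges (parent -> consumers) and reusing the same traversal."""
    consumers = {}
    for child, parents in graph.items():
        for parent in parents:
            consumers.setdefault(parent, []).append(child)
    return upstream_lineage(consumers, source)


print(sorted(upstream_lineage(DERIVED_FROM, "risk_report")))
print(sorted(downstream_impact(DERIVED_FROM, "market_feed")))
```

The first call lists the five upstream sources of `risk_report`; the second shows that a change to `market_feed` reaches `fx_rates` and, through it, `risk_report`. The same two queries underpin the provenance and compliance questions mentioned above.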

A great example of an action graph is a knowledge graph that tracks objects in space, both functioning and defunct equipment. The ASTRIAGraph project monitors Earth orbit for space objects, including functional hardware and other "space junk", with the goal of ensuring safety, security and sustainability. Using a knowledge graph, the team can bring together disparate data from the space domain to locate and track objects ranging from the size of a mobile phone to the largest satellites. ASTRIAGraph predicts their trajectories, minimizes risk and provides complete visibility. To maximize decision-making intelligence, ASTRIAGraph organizes information and builds models of the space domain and its environment.

Paving the way for AI

The real magic of knowledge graphs comes into play when you use them to support AI and machine learning by uncovering patterns and anomalies. A decision graph surfaces trends in data to augment analytics, machine learning and data science initiatives. It's no surprise, then, that Gartner recently stated, "Up to 50% of Gartner's AI-themed inquiries involve a discussion of the use of graph technology."

We see this when talking to customers. For many of the data science teams we work with, the typical graph technology journey runs from an action graph to sophisticated decision graphs that power AI and machine learning, with knowledge graphs at the center.

Supporting every stage of the ML pipeline

From finding data and training machine learning models to analyzing predictions and applying the results, knowledge graphs improve every step of the machine learning process.

In the first stage, data discovery, knowledge graphs can provide data lineage, tracking the data that feeds machine learning. In the next stage, model training, knowledge graphs support graph feature engineering, using either simple graph queries or more complex graph algorithms such as centrality and community detection. The results of these algorithms can be written back into the knowledge graph, enriching it further.
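To make "graph feature engineering" concrete, here is a minimal sketch in plain Python, assuming a small hypothetical graph of two tightly knit groups of accounts joined by one relationship. It computes three common features (degree centrality, PageRank by power iteration, and communities by label propagation) and then "writes them back" as node properties. The graph, the function names and the deterministic tie-breaking rule are illustrative choices, not how any particular graph database implements these algorithms.

```python
from collections import Counter

# Hypothetical undirected graph: accounts a-d and e-h form two dense groups,
# joined by a single relationship between d and e.
EDGES = [
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
    ("e", "f"), ("e", "g"), ("e", "h"), ("f", "g"), ("f", "h"), ("g", "h"),
    ("d", "e"),
]


def adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank; assumes every node has at least one neighbour."""
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        new = {node: (1 - damping) / n for node in adj}
        for node, nbrs in adj.items():
            share = damping * rank[node] / len(nbrs)
            for nbr in nbrs:
                new[nbr] += share
        rank = new
    return rank


def label_propagation(adj, max_rounds=20):
    """Community detection: each node adopts its neighbours' most common label.
    On a tie, keep the current label if it is among the winners, else take the
    largest label, so this sketch is deterministic."""
    labels = {node: node for node in adj}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adj):
            counts = Counter(labels[nbr] for nbr in adj[node])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            new_label = labels[node] if labels[node] in candidates else max(candidates)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def enrich(edges):
    """Compute the features and return them as per-node properties."""
    adj = adjacency(edges)
    degree, rank, community = degree_centrality(adj), pagerank(adj), label_propagation(adj)
    return {
        node: {"degree": degree[node], "pagerank": rank[node], "community": community[node]}
        for node in sorted(adj)
    }


node_properties = enrich(EDGES)
for node, props in node_properties.items():
    print(node, props)
```

In this toy graph the two bridge accounts, `d` and `e`, get the highest centrality scores, and label propagation separates the two groups. Features like these can then sit alongside ordinary tabular columns in a training set.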

The next step in sophistication is the use of graph embeddings. Graph embeddings encode the nodes and relationships of a knowledge graph into a form suitable for machine learning, typically fixed-length numeric vectors. In effect, embeddings turn your knowledge graph into numbers that "learn" its structural features. Relationships are highly predictive of behavior, so using connected, contextualized features maximizes the predictive power of machine learning models.
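A minimal illustration of how an embedding "turns a graph into numbers" is a random-projection approach in the spirit of FastRP. Each node starts with a random vector, the vectors are repeatedly averaged over neighbours, and the normalized results of each round are summed. This is a simplified sketch under stated assumptions: it uses Gaussian rather than sparse random vectors, a hypothetical toy graph, and hypothetical function names, and it is not a reproduction of any product's algorithm.

```python
import math
import random

# Hypothetical graph: two dense groups of four nodes joined by d-e.
EDGES = [
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
    ("e", "f"), ("e", "g"), ("e", "h"), ("f", "g"), ("f", "h"), ("g", "h"),
    ("d", "e"),
]


def adjacency(edges):
    """Undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v)) / (
        math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    )


def random_projection_embeddings(adj, dim=128, weights=(1.0, 1.0), seed=42):
    """FastRP-style sketch: propagate random vectors over neighbourhoods and
    sum the L2-normalised result of each propagation round, weighted."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    current = {n: [rng.gauss(0.0, 1.0) for _ in range(dim)] for n in nodes}
    embedding = {n: [0.0] * dim for n in nodes}
    for w in weights:
        # One round: every node takes the mean of its neighbours' vectors.
        nxt = {
            n: [sum(current[m][i] for m in adj[n]) / len(adj[n]) for i in range(dim)]
            for n in nodes
        }
        for n in nodes:
            embedding[n] = [e + w * x for e, x in zip(embedding[n], normalize(nxt[n]))]
        current = nxt
    return embedding


emb = random_projection_embeddings(adjacency(EDGES))
print(round(cosine(emb["a"], emb["b"]), 2), round(cosine(emb["a"], emb["h"]), 2))
```

Nodes that share neighbourhoods (`a` and `b`) end up with much more similar vectors than nodes in the other group (`a` and `h`), which is exactly the relational signal a downstream model can then exploit.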

Once a machine learning model has been developed, data scientists can use knowledge graphs for investigations and counterfactual analyses to check whether the model is useful and its predictions are accurate.
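A counterfactual analysis on a graph often boils down to asking, "Would the prediction change if this relationship did not exist?" The sketch below assumes a hypothetical fraud-risk setting: accounts linked by shared devices, plus a toy scoring function that stands in for a trained model. It re-scores an account with one relationship removed. All names and the scoring rule are illustrative assumptions.

```python
# Hypothetical relationships: pairs of accounts that share a device.
SHARES_DEVICE = {("acct1", "acct2"), ("acct2", "acct3"), ("acct3", "acct4")}
# Hypothetical accounts already flagged as fraudulent.
FLAGGED = {"acct3"}


def neighbours(edges, node):
    """Accounts linked to `node`, treating each pair as undirected."""
    return {v for u, v in edges if u == node} | {u for u, v in edges if v == node}


def risk_score(edges, node, flagged):
    """Toy stand-in for a trained model: the fraction of the account's
    neighbours that are already flagged."""
    nbrs = neighbours(edges, node)
    return len(nbrs & flagged) / len(nbrs) if nbrs else 0.0


def counterfactual(edges, node, flagged, removed_edge):
    """Score `node` as-is and again with one relationship removed.
    `removed_edge` must match the stored tuple exactly."""
    before = risk_score(edges, node, flagged)
    after = risk_score(edges - {removed_edge}, node, flagged)
    return before, after


print(counterfactual(SHARES_DEVICE, "acct2", FLAGGED, ("acct2", "acct3")))
```

Here the score for `acct2` drops from 0.5 to 0.0 when its link to the flagged `acct3` is removed, showing that this single relationship drives the prediction. That kind of explanation helps data scientists judge whether a model's reasoning is sound.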

Knowledge graphs in action

Let's look at some real-world examples. UBS, for instance, has built a detailed data lineage and governance tool that gives it strong transparency into the data flows feeding its risk reporting, helping it comply with financial regulations.

Another example is NASA, which has decades of mission experience that had not been well catalogued. NASA built an application enriched by a knowledge graph to sift through millions of documents, reports, project data, lessons learned, scientific research, medical analyses, geospatial data and computer logs. As a result, an Apollo-era insight from the 1960s solved a problem affecting Orion, its 21st-century crewed spacecraft. That saved a million dollars of taxpayers' money by eliminating two years of work that would otherwise have gone into reinventing the wheel.

And in life sciences, a major global pharmaceutical company is using knowledge graphs to help clinicians know when best to intervene in complex diseases. Its data science team used graph algorithms to find patients with specific types and patterns of patient journeys, and then to find other patients with similar experiences. This information is used to train the company's machine learning model, analyze its predictions and feed the results back to clinicians so they can make better decisions. And the scale is substantial: this company's knowledge graph contains three years of patient visits, tests and diagnoses across tens of billions of records.
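Finding "others with similar experiences" is a node-similarity problem. One common choice is Jaccard similarity over the events each patient is connected to in the graph. The sketch below assumes hypothetical, made-up patient journeys and event labels. It is an illustration of the technique, not the company's actual pipeline.

```python
# Hypothetical patient journeys: the set of events (visits, tests, diagnoses)
# each patient node is connected to in the knowledge graph.
JOURNEYS = {
    "p1": {"visit:cardiology", "test:ecg", "dx:arrhythmia"},
    "p2": {"visit:cardiology", "test:ecg", "dx:arrhythmia", "test:echo"},
    "p3": {"visit:dermatology", "test:biopsy"},
    "p4": {"visit:cardiology", "test:echo"},
}


def jaccard(a, b):
    """Shared events divided by all distinct events across both journeys."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(journeys, patient, k=2):
    """Return the k patients whose journeys overlap most with `patient`,
    highest score first (ties broken by patient id)."""
    target = journeys[patient]
    scores = [
        (other, jaccard(target, events))
        for other, events in journeys.items()
        if other != patient
    ]
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:k]


print(most_similar(JOURNEYS, "p1"))
```

For `p1`, this ranks `p2` first (0.75) and `p4` second (0.25). At the scale described above, the same comparison would run inside the graph platform rather than over in-memory Python sets.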

With the power of knowledge graphs behind them, AI and machine learning models can better represent relationships. This means organizations that use them can interpret complex data more accurately, put context back into that data and train AI to be a trusted partner.

This is a powerful trend that we are seeing more and more in data science. No wonder the C-suite is waking up to this innovation.

Maya Natarajan is senior director of product marketing at Neo4j, the leader in native graph databases.

Norma A. Roth