Amazon Neptune Updates: The Future of Machine Learning, Data Science, and Graph Databases

Data models and query languages are certainly a rather dry topic for those who don’t belong to the inner circle of enthusiasts. Graph data models and query languages are no exception to that rule, but there is one main reason I have tried to keep track of developments in that area.

Graph is the fastest-growing area of databases, which in turn are the largest segment of enterprise software. A good example: a recent series of funding rounds culminated in Neo4j’s $325 million Series F financing round, which took its valuation above $2 billion.

Neo4j is one of the oldest graph database vendors and is now also the best-funded one. But that doesn’t mean it’s the only one worth noting. Amazon entered the graph database market with Neptune in 2018. Since then, it has made a lot of progress.

Today, Amazon announced support for openCypher, the open source query language based on Neo4j’s Cypher. Let’s take the opportunity to clarify what this means and how it relates to the future of graph databases, and to revisit the interesting developments in Neptune’s support for machine learning and data science.

Build a bridge with openCypher

Developers can now use the popular graph query language openCypher with Amazon Neptune, which gives them more choices for building or migrating graph applications. Neptune now supports the three most popular graph query languages: openCypher, Gremlin, and SPARQL.

In addition, Neptune adds support for Bolt, Neo4j’s binary protocol. This is positioned as a feature that lets customers keep using the existing tools they are accustomed to, and more specifically Neo4j tools. But there are other reasons why this is important.

There are two main data models used to model graphs: RDF and Labeled Property Graph (LPG). Neptune supports both, with SPARQL acting as the RDF query language and Gremlin acting as the LPG query language. Gremlin has a lot going for it, because it has almost ubiquitous support and provides a lot of control over graph traversal. But that can also be a problem.
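To make the difference between the two data models concrete, here is a minimal sketch in plain Python. The people, the `KNOWS` relationship, the `http://example.org/` namespace, and the `lpg_to_triples` helper are all hypothetical, and this is not how Neptune stores data internally. It only shows the same facts once as a labeled property graph and once as RDF-style triples.

```python
# Hypothetical example data: two people and one "knows" relationship.

# Labeled property graph (LPG): nodes and edges both carry key/value properties.
lpg = {
    "nodes": {
        "alice": {"label": "Person", "properties": {"name": "Alice"}},
        "bob": {"label": "Person", "properties": {"name": "Bob"}},
    },
    "edges": [
        {"from": "alice", "to": "bob", "label": "KNOWS", "properties": {"since": 2018}},
    ],
}

# RDF: everything is a (subject, predicate, object) triple.
EX = "http://example.org/"  # hypothetical namespace


def lpg_to_triples(graph):
    """Flatten an LPG into RDF-style triples; edge properties are dropped."""
    triples = []
    for node_id, node in graph["nodes"].items():
        triples.append((EX + node_id, "rdf:type", EX + node["label"]))
        for key, value in node["properties"].items():
            triples.append((EX + node_id, EX + key, value))
    for edge in graph["edges"]:
        triples.append((EX + edge["from"], EX + edge["label"].lower(), EX + edge["to"]))
    return triples


triples = lpg_to_triples(lpg)
for triple in triples:
    print(triple)
```

Note that the `since` property on the edge has no direct home in a plain triple, which is one of the practical differences between the two models.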

Gremlin, part of the Apache TinkerPop project, is an imperative query language. In contrast to declarative query languages such as SQL, Cypher, and SPARQL, Gremlin queries need to not only express what to retrieve, but also specify how to retrieve it. In that respect, Gremlin is similar to a programming language.
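As an illustration of that contrast, the sketch below holds the same question in all three languages as Python strings. The schema (`Person` nodes, `KNOWS` edges, a `name` property, and the `ex:` namespace) is assumed for the example and does not come from any real dataset. The queries are only printed here, not sent to a database.

```python
# Hypothetical schema: Person nodes linked by KNOWS edges, each with a "name" property.
# Question asked in each language: "What are the names of the people Alice knows?"

QUERIES = {
    # Declarative (LPG): describe the pattern to match; the engine plans the traversal.
    "openCypher": (
        "MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person) "
        "RETURN b.name"
    ),
    # Declarative (RDF): describe triple patterns; the engine plans the joins.
    "SPARQL": (
        "PREFIX ex: <http://example.org/> "
        "SELECT ?name WHERE { "
        "?a ex:name 'Alice' . ?a ex:knows ?b . ?b ex:name ?name }"
    ),
    # Imperative (LPG): spell out each traversal step in order.
    "Gremlin": (
        "g.V().hasLabel('Person').has('name', 'Alice')"
        ".out('KNOWS').values('name')"
    ),
}

for language, query in QUERIES.items():
    print(f"{language}:\n  {query}\n")
```

The Gremlin version reads as a sequence of steps (start at all vertices, filter, walk outgoing edges, read a value), while the openCypher and SPARQL versions describe the shape of the result and leave the execution order to the engine.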

Amazon Neptune architecture. With openCypher support, Neptune adds more flexibility to its arsenal.


Not all users are comfortable using Gremlin in all scenarios. But until now, if you wanted to use the LPG model on Neptune, Gremlin was the only option. Amazon seems to acknowledge this, even though it employs some major contributors to Apache TinkerPop. Adding support for openCypher makes working with Neptune’s LPG engine more familiar.

Neptune supports both LPG and RDF by hosting two different engines internally, one for each data model. Adding support for openCypher doesn’t change that, at least not yet. But RDF* might. RDF*, also known as RDF-star, is an update to the RDF standard that also allows it to model LPG graphs.
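The core idea of RDF* can be sketched in a few lines of Python: a triple is allowed to be the subject of another triple, which gives an LPG-style edge property somewhere to live. The namespace, the data, and the `edge_properties` helper are hypothetical, and nested tuples merely stand in for quoted triples. This is an illustration of the concept, not of Neptune’s implementation.

```python
EX = "http://example.org/"  # hypothetical namespace

# A plain RDF triple: (subject, predicate, object).
knows = (EX + "alice", EX + "knows", EX + "bob")

# RDF-star lets a triple appear as the subject (or object) of another triple,
# which is how an LPG-style edge property can be expressed.
# Turtle-star equivalent:  << ex:alice ex:knows ex:bob >> ex:since 2018 .
annotated = (knows, EX + "since", 2018)


def edge_properties(statements, edge):
    """Collect the properties attached to one quoted triple."""
    return {pred: obj for subj, pred, obj in statements if subj == edge}


statements = [knows, annotated]
print(edge_properties(statements, knows))
```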

There is ongoing work in that area in both the RDF and LPG working groups. Besides Amazon with Neptune, other RDF vendors are also adding experimental support for openCypher. The big picture here is GQL, a work in progress that has been approved by ISO.

GQL is a new standard for graph query languages, aimed at unifying today’s fragmented landscape. The expectation is that GQL will do for graph databases what SQL did for relational databases. Amazon is actively involved in both the RDF* and GQL efforts.

Ultimately, this could allow Neptune to unify the two engines it runs separately today. But Amazon isn’t the only story here. The promise is that what Amazon could do internally, all graph database users could do across systems: use a single data model and a single query language.

Data Science and Machine Learning Features: Notebooks and Graph Neural Networks

GQL still has some way to go. Standardization efforts are always complex, and adoption is not guaranteed. However, Neptune also demonstrates another important development in graph databases: the integration of data science and machine learning capabilities.

Developing graph applications and navigating graph results is greatly facilitated by IDEs and visual exploration tools tailored for this purpose. Many graph database vendors have built such tools into their products, but until recently Neptune relied solely on third-party integrations.

The way Neptune’s team chose to address this gap was to develop AWS Graph Notebook. Notebooks are very popular with data scientists and machine learning practitioners, who use them to combine code, data, visualizations, and documentation, and to collaborate.

AWS Graph Notebook is an open source Python package for Jupyter notebooks that supports graph visualization. It supports both Gremlin and SPARQL, and will eventually support openCypher as well. Notebooks were initially adopted by the data science and machine learning crowds, but Amazon seems to believe that they are also popular among developers.


Neptune ML is the codename Amazon has given to the integration of the Neptune graph database with the graph machine learning capabilities of SageMaker and DGL.


It remains to be seen whether that bet will pay off. What is certain, however, is that providing notebook support enhances Neptune’s appeal for data science and machine learning use cases. But that’s not all Neptune has to offer there: enter Neptune ML.

Amazon is touting Neptune ML as a way to make easy, fast, and accurate predictions on graphs using Graph Neural Networks (GNNs). Neptune ML is powered by Amazon SageMaker and the open source Deep Graph Library (DGL), to which Amazon contributes.

GNNs are a relatively new branch of deep learning with an interesting property: because the data is modeled as a graph, the additional contextual information the graph captures can be leveraged to train deep learning algorithms. GNNs are considered the cutting edge of machine learning and can make more accurate predictions than traditional neural networks.
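How a graph supplies that extra context can be shown with a deliberately simplified sketch of one neighbor-aggregation step in plain Python. The toy graph, the feature vectors, and the function names are hypothetical. A real GNN layer (for example one built with DGL) would also apply learned weights and a non-linearity, so this shows the general idea only, not the Neptune ML implementation.

```python
# Hypothetical toy graph: node -> list of neighbours (undirected).
neighbours = {
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["a"],
}

# Hypothetical input features, one small vector per node.
features = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.0, 3.0],
}


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def message_passing_round(neighbours, features):
    """One unweighted aggregation step: average a node's own vector with
    the mean of its neighbours' vectors. A real GNN layer would also apply
    learned weights and a non-linearity."""
    updated = {}
    for node, own in features.items():
        context = mean([features[n] for n in neighbours[node]])
        updated[node] = mean([own, context])
    return updated


updated = message_passing_round(neighbours, features)
print(updated)
```

After one round, each node’s vector already mixes in information from its neighbors, which is context that a model looking at each row in isolation would not see.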

GNNs and graph databases are a natural match. GNNs can be used for node-level and edge-level prediction; that is, they can infer additional data and connections in the graph. They can be used to train models and make inferences for use cases such as fraud prediction, ad targeting, customer 360, recommendations, identity resolution, and knowledge graph completion.
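Edge-level prediction can be sketched as scoring pairs of nodes that are not yet connected. The example below assumes node embeddings are already available (here they are hand-picked, hypothetical values standing in for the output of a trained GNN) and uses the sigmoid of a dot product as a simple scoring function. It is an illustrative stand-in, not the method Neptune ML uses.

```python
import math
from itertools import combinations

# Hypothetical node embeddings, standing in for the output of a trained GNN.
embeddings = {
    "alice": [0.9, 0.1],
    "bob": [0.8, 0.2],
    "carol": [-0.7, 0.6],
}

# Edges already present in the graph (undirected, stored as sorted pairs).
existing_edges = {("alice", "bob")}


def link_score(u, v):
    """Sigmoid of the dot product: a common, simple edge-level scoring function."""
    dot = sum(x * y for x, y in zip(embeddings[u], embeddings[v]))
    return 1.0 / (1.0 + math.exp(-dot))


# Score every node pair that is not yet connected, highest score first.
candidates = [
    pair for pair in combinations(sorted(embeddings), 2)
    if pair not in existing_edges
]
ranked = sorted(candidates, key=lambda pair: link_score(*pair), reverse=True)

for u, v in ranked:
    print(f"{u} - {v}: {link_score(u, v):.3f}")
```

The same pattern underlies use cases such as recommendations or knowledge graph completion: candidate connections are ranked by score, and the top ones are proposed as likely missing edges.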

Again, Neptune isn’t the only one to incorporate notebooks and machine learning into its offering. These features not only cater to the data science and machine learning crowds, but also improve the developer and end-user experience. Better tools, better data, and better analytics all result in better end-user applications. That’s what every vendor is aiming for.
