Graph databases, such as Neo4j, Apache Spark GraphX, DataStax Enterprise Graph, IBM Graph, JanusGraph, TigerGraph, AnzoGraph, the graph portion of Azure Cosmos DB, and the subject of this review, Amazon Neptune, are good for several kinds of applications that involve highly connected data sets: recommendations based on social graphs, fraud detection, real-time product recommendations, and intrusion detection in network and IT operations. In these areas, traditional relational databases tend to become slow and inefficient, because answering the questions requires complex SQL joins over large data sets.
Neptune is a fully managed graph database service with ACID properties and immediate consistency. At its core is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latency. Neptune supports two of the most popular open graph query languages: Apache TinkerPop Gremlin and the W3C standard SPARQL. The popular Cypher Query Language (CQL) used in Neo4j started off proprietary but later became open source through the openCypher project.
Gremlin and SPARQL address different kinds of graph databases. Gremlin, like CQL, is for property graph databases; SPARQL is for Resource Description Framework (RDF) triples, a data model designed for the web. Gremlin is a graph traversal language; SPARQL is a query language with SELECT and WHERE clauses.
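To make the contrast concrete, here is a toy, in-memory Python sketch that answers the same question, "whom does Alice know?", against both data models. It is not Neptune code and not how any real engine stores data. The property graph side walks edges roughly the way the Gremlin traversal `g.V().has('person','name','Alice').out('knows').values('name')` would; the RDF side matches triple patterns roughly the way `SELECT ?friendName WHERE { ?p ex:name "Alice" . ?p ex:knows ?f . ?f ex:name ?friendName }` would, with `ex:` bound to `http://example.org/`. The vertex IDs, the namespace, and all function names are made up for illustration.

```python
# Toy, in-memory illustration of two graph data models (hypothetical; not Neptune's API).
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# --- Property graph: vertices and edges both carry key/value properties ---
vertices: Dict[str, Dict[str, str]] = {
    "v1": {"label": "person", "name": "Alice"},
    "v2": {"label": "person", "name": "Bob"},
    "v3": {"label": "person", "name": "Carol"},
}
# (source vertex, edge label, target vertex, edge properties)
edges: List[Tuple[str, str, str, Dict[str, int]]] = [
    ("v1", "knows", "v2", {"since": 2015}),
    ("v1", "knows", "v3", {"since": 2019}),
]


def traverse_knows(start_name: str) -> List[str]:
    """Traversal style: find the start vertex, then walk out along 'knows' edges."""
    start = {vid for vid, props in vertices.items() if props["name"] == start_name}
    targets = [dst for src, label, dst, _ in edges if src in start and label == "knows"]
    return sorted(vertices[v]["name"] for v in targets)


# --- RDF: every fact is a (subject, predicate, object) triple ---
EX = "http://example.org/"
triples: Set[Triple] = {
    (EX + "alice", EX + "name", "Alice"),
    (EX + "bob", EX + "name", "Bob"),
    (EX + "carol", EX + "name", "Carol"),
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "alice", EX + "knows", EX + "carol"),
}


def unify(pattern: Triple, triple: Triple, bindings: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend bindings so that pattern matches triple, or return None if it cannot."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it, or check its existing binding
            if result.setdefault(term, value) != value:
                return None
        elif term != value:  # constant: must match exactly
            return None
    return result


def match(patterns: List[Triple], bindings: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every binding that satisfies all patterns (a naive nested-loop join)."""
    if not patterns:
        yield bindings
        return
    for triple in triples:
        extended = unify(patterns[0], triple, bindings)
        if extended is not None:
            yield from match(patterns[1:], extended)


def select_friends(name: str) -> List[str]:
    """Pattern-matching style, mirroring the SELECT ... WHERE query in the text."""
    query = [
        ("?p", EX + "name", name),
        ("?p", EX + "knows", "?f"),
        ("?f", EX + "name", "?friendName"),
    ]
    return sorted(b["?friendName"] for b in match(query, {}))


print(traverse_knows("Alice"))  # ['Bob', 'Carol']
print(select_friends("Alice"))  # ['Bob', 'Carol']
```

Both styles return the same names, but notice where the data lives: the property graph keeps the `since` year directly on the edge, while plain RDF triples have no slot for edge properties, so recording it would take additional triples.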
Amazon Neptune lets you use both Gremlin and SPARQL in a single database instance, but the two can’t see each other’s data. Supporting both lets new users figure out which works better for their needs.
The Neptune documentation has samples using the Gremlin-Groovy, Gremlin-Java, and Gremlin-Python variants of Gremlin. Neptune accepts Gremlin from the Gremlin Console, through HTTP REST calls, and from Java, Python, .NET, and Node.js programs. On the SPARQL side, Neptune supports the Eclipse RDF4J console and workbench, HTTP REST calls, and Java programs.
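For the HTTP REST route, the following Python sketch builds (but does not send) one request for each language using only the standard library. It assumes the conventions shown in Neptune's documentation: a JSON body with a `gremlin` key posted to the `/gremlin` path, and a form-encoded `query` parameter posted to the `/sparql` path, both on the default port 8182. The hostname is a placeholder, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder; substitute your own Neptune cluster endpoint.
ENDPOINT = "https://your-neptune-endpoint:8182"


def gremlin_request(traversal: str) -> urllib.request.Request:
    """Build a POST carrying a Gremlin traversal as a JSON body."""
    body = json.dumps({"gremlin": traversal}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT + "/gremlin",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def sparql_request(query: str) -> urllib.request.Request:
    """Build a POST carrying a SPARQL query as a form-encoded 'query' field."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT + "/sparql",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


g = gremlin_request("g.V().limit(1)")
s = sparql_request("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")
print(g.full_url)  # https://your-neptune-endpoint:8182/gremlin
print(s.full_url)  # https://your-neptune-endpoint:8182/sparql
```

Sending either request with `urllib.request.urlopen(req)` only works from a client that can reach the cluster, which normally means from inside the same VPC. If IAM database authentication is enabled, requests must also be signed with AWS Signature Version 4, which this sketch omits.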