Neo4j is an open-source graph database implemented in Java. The developers describe Neo4j as an “embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables”. According to the DB-Engines ranking, Neo4j is the most popular graph database.
As a graph database, Neo4j focuses more on the relationships between values than on the commonalities among sets of values (such as collections of documents or tables of rows). Neo4j is small enough to be embedded into nearly any application. At the same time, it can store tens of billions of nodes and as many edges. Let’s get started with Neo4j!
The screenshot above shows the Dashboard tab of Neo4j’s webadmin interface. It gives summary totals of the nodes and relationships in the current installation, along with a timeline of their creation. So far we have nothing.
Click on the Data browser tab. We will use this tab to perform CRUD operations on our database.
Click + Node to create a new node.
A node is a vertex between edges, and it may hold data as a set of key-value pairs. The box at the top left shows the node number (auto-incremented). Now click + Add property to add key-value pairs; you can add as many as you want.
Let’s add another node with [firstName: Mark, lastName: Anthony].
Now let’s add a friend relationship between Mark and John. Click + Relationship to add a relationship. Enter the from node number, the to node number, and the relationship type (just a string). Then click Create.
Just like nodes, relationships can contain properties. Click the + Add Property button and enter the properties [firstmet : “SXSW Conference”, LinkedIn : “Yes”] so we can keep track of where John and Mark first met and whether they connected on LinkedIn.
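To make the data model concrete, here is a minimal sketch in plain Python of what we have built so far: two nodes and one relationship, each carrying key-value properties. This is only an illustration, not Neo4j's API. The class names, the node numbers, John's properties, and the direction of the relationship are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int                                   # node number shown in the web UI
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int                                # "from" node number
    end: int                                  # "to" node number
    type: str                                 # relationship type, just a string
    properties: dict = field(default_factory=dict)


# Node numbers and John's properties are assumed for illustration.
john = Node(1, {"firstName": "John"})
mark = Node(2, {"firstName": "Mark", "lastName": "Anthony"})

# The direction (Mark -> John) is an assumption; the text does not specify it.
friendship = Relationship(
    start=mark.id,
    end=john.id,
    type="friend",
    properties={"firstmet": "SXSW Conference", "LinkedIn": "Yes"},
)

print(friendship.type, friendship.properties["firstmet"])
```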
Now let’s click the switch view mode button and see what our graph looks like.
The Style button brings up a menu where you can choose which profile is used for rendering the graph visualization. To see more useful information on the diagram, click Style and then New Profile. This will take you to the “Create new visualization profile” page.
Enter “Social Network” as the new visualization profile name. Select Box from the Show as dropdown box. Change the Label from {id} to {id}: {prop.firstName} {prop.lastName}. You may also change the font size or color. Click Save. Now your graph should look like:
Although the web interface is an easy way to make a few edits, we need a more powerful interface.
There are several languages that interoperate with Neo4j: Java code, REST, Cypher, Ruby console, and others. The one we’ll use today is called Gremlin. Gremlin is a graph traversal language written in the Groovy programming language. You do not need any knowledge of Groovy to use Gremlin. The Gremlin console is available in the Web Admin; just click the Console link at the top, and choose Gremlin.
Since Gremlin is a general-purpose graph traversal language, it uses general mathematical graph terms. Where Neo4j calls a graph data point a node, Gremlin prefers vertex, and where Neo4j says relationship, Gremlin says edge. g is a variable that represents the graph object. Graph actions are functions called on it.
Gremlin operations can be treated as a series of pipes. Each pipe takes a collection as input and pushes a collection as output. A collection may have one item, many items, or no items at all. The items may be vertices, edges, or property values. Gremlin, Pipes, and several other projects are parts of a bigger system developed by TinkerPop.
For example, the outE pipe takes in a collection of vertices and sends out a collection of edges. The series of pipes is called a pipeline and expresses declaratively what the problem is (not the steps to solve the problem like in imperative programming).
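The pipe idea can be modeled outside Gremlin with Python generators. The sketch below is an analogy only, not Gremlin syntax: out_e and in_v are hypothetical function names that mimic a pipe from vertices to edges and a pipe from edges to vertices, and the edge list is a stand-in for the graph.

```python
# Each edge is a tuple: (out_vertex, label, in_vertex).
EDGES = [("mark", "friend", "john")]


def out_e(vertices):
    """Pipe: takes a collection of vertices, emits their outgoing edges."""
    for v in vertices:
        for edge in EDGES:
            if edge[0] == v:
                yield edge


def in_v(edges):
    """Pipe: takes a collection of edges, emits the vertex each one points to."""
    for _tail, _label, head in edges:
        yield head


# Chaining pipes forms a pipeline; nothing runs until the result is consumed.
result = list(in_v(out_e(["mark"])))
print(result)  # ['john']

# A pipeline may also produce an empty collection.
print(list(in_v(out_e(["john"]))))  # []
```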
Gremlin is built on top of a Java project named Pipes. Suppose that Mark would like to know the colleagues of his friend John.
The query would look something like this:
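The shape of that traversal can also be sketched in plain Python. This is a model of the idea rather than Gremlin code: the edge labels "friend" and "colleague", the names Alice and Bob, and the helper out are all assumptions made for illustration.

```python
# Adjacency list keyed by (vertex, edge label).
# Only Mark and John come from the text; everything else is hypothetical.
GRAPH = {
    ("Mark", "friend"): ["John"],
    ("John", "colleague"): ["Alice", "Bob"],
}


def out(vertices, label):
    """Follow outgoing edges with the given label from each input vertex."""
    for v in vertices:
        yield from GRAPH.get((v, label), [])


# Start at Mark, step to his friends, then step to their colleagues.
colleagues = list(out(out(["Mark"], "friend"), "colleague"))
print(colleagues)  # ['Alice', 'Bob']
```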
A Gremlin query works much like jQuery in the way it expresses the traversal process. Consider the following HTML snippet:
* section 1
* section 2
Getting the section 1 string using jQuery
Getting the section 1 string using Gremlin
To get a collection containing just one specific vertex, we can filter it from the list of all nodes. This is what we have been doing so far using g.V.filter{}. The V property is the list of all nodes, from which we select the desired subset. But when we want the vertex itself, we need to call next(). This method retrieves the first vertex from the pipeline. It’s similar to the difference between an array of one element and the element itself. To see the difference, call the class property on the result of filter and on the result of next, and compare the returned objects.
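Python's built-in filter and next behave in a comparable way, which makes the distinction easy to try outside the Gremlin console. The snippet is an analogy, not Gremlin; the vertex dictionaries and their ids are assumed.

```python
# Stand-in for the list of all vertices; ids are assumed.
vertices = [
    {"id": 1, "firstName": "John"},
    {"id": 2, "firstName": "Mark"},
]

# filter() returns a lazy pipeline-like object, not the vertex.
matches = filter(lambda v: v["firstName"] == "Mark", vertices)
print(type(matches).__name__)  # filter

# next() pulls the first item out of it: the vertex itself.
mark = next(matches)
print(type(mark).__name__)  # dict
```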
Pipe-processing units are called steps in Gremlin, and they fall into the following types:
Transform steps: take an object and emit a transformation of it.
Filter steps: decide whether to allow an object to pass or not.
Side-effect (sideEffect) steps: pass the object along, but yield some side effect.
Branch steps: decide which step to take next.
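The four step types can be illustrated with small Python generators. This is a sketch of the concepts only; the function names (transform, keep_if, side_effect, branch) are hypothetical and do not correspond to Gremlin's step names or signatures.

```python
def transform(items, fn):
    """Transform step: emit fn(item) for each incoming item."""
    for item in items:
        yield fn(item)


def keep_if(items, predicate):
    """Filter step: let an item pass only if predicate(item) is true."""
    for item in items:
        if predicate(item):
            yield item


def side_effect(items, action):
    """Side-effect step: run action(item), then pass the item on unchanged."""
    for item in items:
        action(item)
        yield item


def branch(items, predicate, if_true, if_false):
    """Branch step: choose which function handles each item."""
    for item in items:
        yield if_true(item) if predicate(item) else if_false(item)


seen = []  # filled by the side-effect step
pipeline = transform([1, 2, 3, 4], lambda x: x * 10)
pipeline = keep_if(pipeline, lambda x: x > 10)
pipeline = side_effect(pipeline, seen.append)
pipeline = branch(pipeline, lambda x: x >= 30,
                  lambda x: f"big:{x}", lambda x: f"small:{x}")

result = list(pipeline)
print(result)  # ['small:20', 'big:30', 'big:40']
print(seen)    # [20, 30, 40]
```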
For a complete list of steps in each type, refer to this. Groovy also has a map function (a la mapreduce) named collect() and a reduce function named inject(). Using these, you can perform mapreduce-like queries.
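For readers who have not met the map and reduce pattern, here is the same idea in Python, where map plays the role of collect() and functools.reduce plays the role of inject(). The data is invented for illustration; the friends counts are not from the graph built above.

```python
from functools import reduce

# Hypothetical per-person values to aggregate.
people = [
    {"firstName": "John", "friends": 3},
    {"firstName": "Mark", "friends": 5},
]

# Map: turn each person into a single number.
counts = list(map(lambda p: p["friends"], people))

# Reduce: fold the numbers into one result, starting from 0.
total = reduce(lambda acc, n: acc + n, counts, 0)

print(counts, total)  # [3, 5] 8
```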
Graph traversal is sufficient, but it is not business friendly. Business people do not talk in edges and vertices. Gremlin lets us create new steps that are semantically meaningful for the data stored in the graph.
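The idea of a domain-specific step can be sketched in Python by wrapping a generic traversal in a function with a business-level name. This models the concept only and is not Gremlin's mechanism for defining steps; the helper names and the extra vertex Alice are assumptions.

```python
# Each edge is (out_vertex, label, in_vertex). Alice is hypothetical.
EDGES = [
    ("Mark", "friend", "John"),
    ("John", "friend", "Alice"),
]


def out(vertices, label):
    """Generic step: follow outgoing edges with the given label."""
    for v in vertices:
        for tail, edge_label, head in EDGES:
            if tail == v and edge_label == label:
                yield head


def friends(vertices):
    """Domain step: callers ask for friends, not for 'friend' edges."""
    return out(vertices, "friend")


def friends_of_friends(vertices):
    """Domain steps compose just like the generic ones."""
    return friends(friends(vertices))


print(list(friends(["Mark"])))             # ['John']
print(list(friends_of_friends(["Mark"])))  # ['Alice']
```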