Getting started with Neo4j

Neo4j is an open-source graph database implemented in Java. The developers describe Neo4j as an “embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables”. According to the DB-Engines ranking, Neo4j is the most popular graph database.

As a graph database, Neo4j focuses more on the relationships between values than on the commonalities among sets of values (such as collections of documents or tables of rows). Neo4j is small enough to be embedded into nearly any application, yet it can store tens of billions of nodes and as many edges. Let’s get started with Neo4j!

Download and Installation

Download the latest community release for your platform from the Neo4j download page (in my case it is 1.9.1 for Windows).
Extract the archive to your preferred location (in my case it is C:\neo4j\).
Double-click Neo4j.bat in the bin directory. A lot of output will appear, then that window will close and leave a blank command window.
- If you don’t have a JDK installed, you need to install one first and configure its environment variables.
Now open your browser and navigate to http://localhost:7474. If everything went fine, you will see the welcome message of Neo4j’s webadmin interface.

When you close this message you will see the webadmin interface. You can reopen the welcome message by clicking the Guide button in the top-right corner of the webadmin.

Neo4j’s Web Interface

The screenshot above shows the Dashboard tab of Neo4j’s webadmin interface. It gives summary totals of the nodes and relationships in the current installation, along with a timeline of their creation. So far we have nothing.

Click on the Data browser tab. We will use this tab to perform CRUD operations on our database.

Click + Node to create a new node.

A node is a vertex that edges connect, and it may hold data as a set of key-value pairs. The box on the top-left shows the node number (auto-incremented). Now click + Add property to add key-value pairs; you can add as many as you want.

Let’s add another node with [firstName: Mark, lastName: Anthony].

Now let’s add a friend relationship between Mark and John. Click + Relationship to add a relationship. Enter the from-node number, the to-node number, and the relationship type (just a string). Then click Create.

Just like nodes, relationships can contain properties. Click the + Add Property button and enter the properties [firstMet : “SXSW Conference”, LinkedIn : “Yes”] so we can keep track of where John and Mark first met and whether they connected on LinkedIn.

Now let’s click the switch view mode button and see what our graph looks like.

The Style button brings up a menu where you can choose which profile is used for rendering the graph visualization. To see more useful information on the diagram, click Style and then New Profile. This will take you to the “Create new visualization profile” page.

Enter “Social Network” as the new visualization profile name. Select Box from the Show as dropdown box. Change the Label from {id} to {id}: {prop.firstName} {prop.lastName}. You may also change the font size or color. Click Save. Now your graph should look like this:

Although the web interface is an easy way to make a few edits, we need a more powerful interface.

Neo4j via Gremlin

There are several ways to work with Neo4j: Java code, REST, Cypher, a Ruby console, and others. The one we’ll use today is Gremlin, a graph traversal language written in the Groovy programming language. You do not need any knowledge of Groovy to use Gremlin. The Gremlin console is available in the webadmin; just click the Console link at the top and choose Gremlin.

Since Gremlin is a general-purpose graph traversal language, it uses general mathematical graph terms. Where Neo4j calls a graph data point a node, Gremlin prefers vertex, and where Neo4j says relationship, Gremlin says edge. g is a variable that represents the graph object; graph actions are functions called on it.

g.V gets the graph vertices
g.E gets the graph edges
g.v(0) gets vertex 0.

g.v(1).map() lists all vertex 1 properties
g.v(1).firstName returns property firstName of vertex 1

g.V.filter { } gets (walks) the vertices for which the filter condition between the curly braces {} (a closure, in Groovy terminology) evaluates to true. The it keyword inside the closure represents the current object and is populated automatically. For example: g.V.filter{it.firstName == 'Mark'}
g.v(2).outE gets the outgoing edges of the given vertex(2).
g.v(1).inE gets the incoming edges for the given vertex(1).
g.v(2).outE.firstMet gets the firstMet property of the outgoing edges of vertex 2. The same works with inE.

inV gets the vertex that an edge points into (its head). You can then retrieve any property of that vertex.

outV gets the vertex that an edge comes out of (its tail).
out is shorthand for outE.inV
in is shorthand for inE.outV
g.addVertex([propertyName : 'Property value']) adds a vertex to the graph with the specified properties.
g.addEdge(vertex1, vertex2, 'relationName') adds an edge from vertex1 to vertex2 with the relation relationName.
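
To make the direction of these steps concrete, here is a minimal Python sketch of a property graph. It is only a model of the idea, not the Neo4j or Gremlin API: the function names, the vertex ids, and John's properties are assumptions chosen to match the example built in the web interface (Mark is vertex 2, John is vertex 1, and the friend edge runs from Mark to John).

```python
# Toy in-memory property graph. The ids and John's properties are assumptions;
# this models the idea only and is not the Neo4j or Gremlin API.
vertices = {
    1: {"firstName": "John"},
    2: {"firstName": "Mark", "lastName": "Anthony"},
}
edges = [
    {"out": 2, "label": "friend", "in": 1,
     "props": {"firstMet": "SXSW Conference", "LinkedIn": "Yes"}},
]

def out_e(vertex_id):
    """Edges leaving the vertex (Gremlin: outE)."""
    return [e for e in edges if e["out"] == vertex_id]

def in_e(vertex_id):
    """Edges arriving at the vertex (Gremlin: inE)."""
    return [e for e in edges if e["in"] == vertex_id]

def out(vertex_id):
    """Vertices reached over outgoing edges (Gremlin: out, i.e. outE.inV)."""
    return [e["in"] for e in out_e(vertex_id)]

def in_(vertex_id):
    """Vertices reached over incoming edges (Gremlin: in, i.e. inE.outV)."""
    return [e["out"] for e in in_e(vertex_id)]

print(out_e(2)[0]["props"]["firstMet"])  # SXSW Conference
print(out(2))   # [1]
print(in_(1))   # [2]
print(out(1))   # []
```

Note how out(1) is empty: John has an incoming friend edge but no outgoing one, so direction matters when you pick between out and in.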

Pipes

Gremlin operations can be treated as a series of pipes. Each pipe takes a collection as input and pushes a collection as output. A collection may have one item, many items, or no items at all. The items may be vertices, edges, or property values. Gremlin, Pipes, and several other projects are parts of a larger stack developed by TinkerPop.

For example, the outE pipe takes in a collection of vertices and sends out a collection of edges. The series of pipes is called a pipeline and expresses declaratively what the problem is (not the steps to solve the problem like in imperative programming).
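
The pipe idea can be sketched with Python generators: each function below takes a collection and emits a collection, and chaining them forms a pipeline. This is an illustrative model under assumptions, not how Pipes is actually implemented; the function names and the toy data are hypothetical.

```python
# Hypothetical toy data: vertex properties and (from, label, to) edges.
VERTICES = {1: {"firstName": "John"}, 2: {"firstName": "Mark"}}
EDGES = [(2, "friend", 1)]

def all_vertices():
    """Source pipe: emits every vertex id (similar in spirit to g.V)."""
    yield from VERTICES

def keep(ids, predicate):
    """Filter pipe: passes on only ids whose properties satisfy predicate."""
    for vid in ids:
        if predicate(VERTICES[vid]):
            yield vid

def out(ids, label):
    """Traversal pipe: vertices in, adjacent vertices out."""
    for vid in ids:
        for src, lbl, dst in EDGES:
            if src == vid and lbl == label:
                yield dst

def prop(ids, key):
    """Property pipe: vertices in, property values out."""
    for vid in ids:
        yield VERTICES[vid][key]

def friends_of(name):
    """Pipeline: all vertices -> filter by name -> friend edges -> names."""
    matching = keep(all_vertices(), lambda p: p["firstName"] == name)
    return prop(out(matching, "friend"), "firstName")

print(list(friends_of("Mark")))  # ['John']
print(list(friends_of("John")))  # []
```

The second call shows a pipeline whose output collection has no items at all, which is a perfectly valid result.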

Gremlin is built on top of a Java project named Pipes. Suppose that Mark would like to know the colleagues of his friend John.

The query will look something like this:

A Gremlin query works much like jQuery in the way it expresses the traversal process. Consider the following HTML snippet:

    <ul id="navigation">
      <li>
        <a name="section1">section 1</a>
      </li>
      <li>
        <a name="section2">section 2</a>
      </li>
    </ul>

Getting the section 1 string using jQuery
- $('[id=navigation]').children('li').children('[name=section1]').text()
Getting the section 1 string using Gremlin
- g.V.filter{it.id=='navigation'}.out.filter{it.tag=='li'}.
  out.filter{it.name=='section1'}.text

Pipeline vs. Vertex

To get a collection containing just one specific vertex, we can filter it from the list of all nodes. This is what we have been doing so far with g.V.filter{}. The V property is the list of all nodes, from which we select the desired subset. But when we want the vertex itself, we need to call next(). This method retrieves the first vertex from the pipeline. It’s similar to the difference between an array of one element and the element itself. To see the difference, inspect the class property of the result of filter and of the result of next(), and compare the returned objects.
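
The same distinction exists in Python between a generator and the item it yields, so a small sketch may help. This is only an analogy under assumptions (the vertex dictionaries and ids are made up), not Gremlin itself.

```python
# Hypothetical vertices; the ids are assumptions for illustration.
vertices = [
    {"id": 1, "firstName": "John"},
    {"id": 2, "firstName": "Mark"},
]

# A lazy "pipeline" that will emit the matching vertices.
pipeline = (v for v in vertices if v["firstName"] == "Mark")
print(type(pipeline).__name__)  # generator

# next() pulls the first item out of the pipeline: the vertex itself.
vertex = next(pipeline)
print(type(vertex).__name__)    # dict
print(vertex["id"])             # 2
```

Calling next() again on an exhausted pipeline raises StopIteration in Python, so pull the vertex out once and keep a reference to it.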

Pipe-processing units are called steps in Gremlin, and they fall into the following types.

Transform steps: take an object and emit a transformation of it.
Filter steps: decide whether to allow an object to pass or not.
Side-effect steps: pass the object along, but yield some side effect.
Branch steps: decide which step to take next.

For a complete list of the steps of each type, refer to the Gremlin documentation. Groovy also has a map function (à la MapReduce) named collect() and a reduce function named inject(). Using these, you can perform MapReduce-like queries.
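
As a rough analogy, Python's built-in map() and functools.reduce() play the roles that collect() and inject() play in Groovy. The sketch below totals the lengths of some first names; the data is a made-up stand-in for values that a traversal might emit.

```python
from functools import reduce

# Stand-in for property values emitted by a traversal (assumed data).
first_names = ["John", "Mark"]

# Map phase: transform each item (comparable to Groovy's collect).
lengths = list(map(len, first_names))

# Reduce phase: fold the items into one value, starting from 0
# (comparable to Groovy's inject with an initial value).
total = reduce(lambda acc, n: acc + n, lengths, 0)

print(lengths)  # [4, 4]
print(total)    # 8
```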

Domain-Specific Steps

Graph traversal is sufficient, but it is not business friendly. Business people do not talk in terms of edges and vertices. Gremlin lets us create new steps that are semantically meaningful for the data stored in the graph.
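
The idea can be illustrated with a small Python sketch in which generic traversal steps are wrapped in named, domain-level steps. This is not Gremlin's own mechanism for defining steps; the Traversal class, its method names, and the colleague Alice are all hypothetical and serve only to show the shape of the technique.

```python
# Hypothetical (from, label, to) edges; Alice is a made-up vertex.
EDGES = [
    ("Mark", "friend", "John"),
    ("John", "colleague", "Alice"),
]

class Traversal:
    """A chainable collection of vertices (an illustrative model only)."""

    def __init__(self, items):
        self.items = list(items)

    def out(self, label):
        """Generic step: follow outgoing edges that carry the given label."""
        return Traversal(
            dst
            for vertex in self.items
            for src, lbl, dst in EDGES
            if src == vertex and lbl == label
        )

    # Domain-specific steps: thin, readable wrappers over the generic step.
    def friends(self):
        return self.out("friend")

    def colleagues(self):
        return self.out("colleague")

    def to_list(self):
        return list(self.items)

# Reads like the business question: colleagues of Mark's friends.
print(Traversal(["Mark"]).friends().colleagues().to_list())  # ['Alice']
```

The domain steps add no new traversal power; they only give existing traversals names that match the vocabulary of the people asking the questions.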