Understanding Neo4J: Comprehensive Guide for Data Enthusiasts

Shikha Sharma 14 Feb, 2023 • 11 min read

Introduction

For decades, the data management space has been dominated by relational databases (RDBMS), which is why an RDBMS has been the default choice whenever we need to store any volume of data. That default no longer holds: we now face a flood of unstructured and semi-structured data, which calls for technology suited to it. Many alternative solutions on the market prove better than relational databases at solving specific categories of problems. Neo4j is one such database; it is the world’s leading open-source graph database and stores data in the form of nodes and relationships. This highly scalable, schema-free (NoSQL) database is developed in Java.

Learning Objectives

Here is what we’ll learn by reading this guide thoroughly:

  • A common understanding of graph databases and how they differ from relational databases.

  • Knowledge of the graph-related terms, i.e., nodes and relationships.

  • An understanding of Neo4j and its special features.

  • An understanding of Cypher Query Language (CQL) for interacting with Neo4J, and of how to set up the Neo4J environment to run CQL queries.

  • Insights into creating different types of nodes and relationships using CQL in Neo4J.

Overall, by reading this guide, we will gain a comprehensive understanding of how to use Neo4J and CQL to store data and interact with it. We will be equipped with the knowledge and ability to use graph databases effectively.

This article was published as a part of the Data Science Blogathon.

Table of Contents

  1. Diving into the Basic Terms of Neo4J
    1.1. What is a Graph Database?
    1.2. What is Node?
    1.3. What is a Relationship?
  2. The Features Offered By Neo4J
  3. What is Cypher Query Language(CQL)?
  4. Setting Up Neo4J Environment
    4.1.  Starting the Server
  5. Creating Nodes in Neo4J
  6. Creating Relationships in Neo4J
  7. Conclusion

Diving into the Basic Terms of Neo4J

Neo4j - Quick Guide

Source: www.tutorialspoint.com

What is a Graph Database?

Today, much of the data we work with describes relationships between objects, and sometimes those relationships are more valuable than the data itself.

Relational databases store data in tables because highly structured data fits easily into rows and columns. They can hold the same data as a graph database, but they cannot express the relationships within it as directly as a graph database can.

What is a Graph Database? {Definition, Use Cases & Benefits} | Neo4J

Source: phoenixnap.com

Graph databases, on the other hand, store relationships as first-class entities. The term “graph” refers to a pictorial representation of a set of objects in which pairs of objects are connected by links. Every graph is made up of two elements: nodes (vertices) and relationships (edges).

A graph database is used to model the data in the form of a graph, i.e., in the form of nodes and relationships. Let’s understand the terms node and relationship in detail!

What is Node?

A node represents an entity (an object or data record) in a graph. It can have zero or more labels and zero or more properties. A label names the kind of entity a node represents; for example, we can create nodes labeled Employee, Student, Company, User, Person, Technology, etc.

A node can have any number of data values, known as properties, and similar nodes can have different properties. For example, suppose a person named Michel works for Microsoft, Jennifer works for Neo4J, and Justin does not work for any company. Michel and Jennifer each have a property ‘works’ with the values Microsoft and Neo4J, respectively, but Justin has no ‘works’ property at all; this is possible in Neo4J because it is a schema-free database.
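As a rough sketch, the Michel, Jennifer, and Justin example could be written in Cypher as follows. The Person label and the variable names are assumptions made for illustration; only the ‘works’ property comes from the example above.

```cypher
// Three nodes with the same (assumed) Person label.
// Only two of them carry the 'works' property; Justin simply omits it.
CREATE (michel:Person {name: "Michel", works: "Microsoft"}),
       (jennifer:Person {name: "Jennifer", works: "Neo4J"}),
       (justin:Person {name: "Justin"})
RETURN michel, jennifer, justin;
```

No schema change is needed to leave the property out, which is what the schema-free model allows.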

Neo4J

Source: neo4j.com

What is a Relationship?

A relationship describes an association between two nodes, with the restriction that it must have both a start node and an end node. Relationships connect nodes and tell us how data records relate to one another; for example, Jennifer likes Graphs, or Jennifer works for Neo4J. A relationship can also have properties; for example, Jennifer has been friends with Michel since 2018.
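The three example relationships might be sketched in Cypher like this. The labels (Person, Technology, Company), the relationship type names, and the ‘since’ property key are all assumptions chosen for illustration.

```cypher
// One CREATE with three patterns; the variable 'jennifer' is bound in the
// first pattern and reused in the next two, so all three relationships
// start from the same node.
CREATE (jennifer:Person {name: "Jennifer"})-[:IS_FRIENDS_WITH {since: 2018}]->(michel:Person {name: "Michel"}),
       (jennifer)-[:LIKES]->(:Technology {name: "Graphs"}),
       (jennifer)-[:WORKS_FOR]->(:Company {name: "Neo4J"});
```

Each relationship has exactly one start node and one end node, and the friendship carries its own property.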

Some common examples of graph databases are Neo4J, Oracle NoSQL Database, HyperGraphDB, OrientDB, AllegroGraph, GraphBase, and InfiniteGraph.

The Features Offered By Neo4J

Neo4j (Network Exploration and Optimization 4 Java) is the most famous graph database management system used to represent data in the form of graphs. Instead of using the old-school format of rows and columns, Neo4J stores data in the form of nodes and relationships. It is mostly used when the data has many interconnected relationships or multiple connections between nodes.

Neo4J is a schema-optional database management system, which means we don’t need a predefined schema to store relational or non-relational data in it. It is well suited to managing networked data because there is no need to set up primary key or foreign key constraints, and we can add any relationship between any nodes. Common areas where Neo4J has proven well suited include fraud detection, network diagrams, real-time product recommendation, and graph-based search of digital assets.

Special Features of Neo4J

Below are the notable features of Neo4j, which explain why it is one of the most popular NoSQL graph databases.

  • Flexible schema: Neo4j follows the graph data model, in which a graph has nodes (entities) that are connected to each other by relationships. Node and relationship data is stored as key-value pairs, known as properties.
    Neo4j supports a flexible schema in which we can add or remove properties as required.

  • ACID properties: Neo4j supports the ACID (Atomicity, Consistency, Isolation, and Durability) properties.

  • Scalability and reliability: Neo4j lets us scale the database by increasing the number of reads/writes and the data volume without affecting data integrity or query-processing speed. Neo4j also supports replication for data safety and reliability.

  • Built-in web application: Neo4J ships with the Neo4j Browser, a built-in web application used to create and retrieve graph data.

  • Indexing: Neo4j supports indexing using Apache Lucene.

  • Drivers: Neo4j can work with various drivers. Some of them are:

    • REST API: Neo4J has a REST API for working with various programming languages and frameworks such as Java, Spring, Scala, etc.

    • JavaScript: Neo4J has JavaScript drivers for working with Node.js and UI MVC frameworks.

    • Cypher API and Native Java API: To develop Java applications, Neo4J supports two types of Java API: the Cypher API and the Native Java API. Apart from that, it also allows you to work with other databases such as MongoDB, Cassandra, etc.

General Features of Neo4J:

In addition to the above features, Neo4J has some general features that make it a strong choice for building our projects. Some important ones are:

  • Neo4J supports UNIQUE constraints to prevent duplicate data.

  • Neo4J supports Native graph storage with Native GPE(Graph Processing Engine).

  • Neo4J also allows us to export the query data to JSON file format and XLS file format.

  • A major drawback of SQL-like languages is the complexity of joins. To retrieve related or connected data in a relational database, we have to use joins, which can make queries complex and difficult to write. In Neo4J, we don’t need complex joins because it is a graph database and connected nodes are linked directly by relationships. So, we can easily retrieve a node’s adjacent nodes or relationship details without using joins.
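To illustrate the last point, here is a minimal sketch of a query that would typically need two joins in SQL. It assumes a hypothetical graph of Person and Company nodes connected by WORKS_FOR relationships; none of these names come from a real dataset.

```cypher
// Find Jennifer's colleagues: follow WORKS_FOR out to her company,
// then follow WORKS_FOR back in from everyone else who works there.
MATCH (p:Person {name: "Jennifer"})-[:WORKS_FOR]->(c:Company)<-[:WORKS_FOR]-(colleague:Person)
RETURN colleague.name AS colleague, c.name AS company;
```

The whole traversal is expressed as one pattern, with no join conditions to spell out.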

What is Cypher Query Language(CQL)?

Cypher is a powerful, declarative, pattern-matching query language offered by Neo4j for interacting with the data stored in the database. Cypher uses an ASCII-art syntax to describe graph patterns. It is easy to learn and lets us create and retrieve relationships between data without resorting to difficult constructs such as joins. CQL follows an SQL-like syntax that is simple and human-readable.

ASCII-Art Syntax: The ASCII-Art syntax is used by Neo4J to create patterns and depict graphs.

(A)-[:works_for]->(B)

  • Parentheses “( )” are used in Neo4j to represent nodes.

  • The arrow symbol “->” is used to define the direction of the relationship.

  • Square brackets “[ ]” are used in Neo4J to hold the type of relationship between the nodes, like [:works_for].

So we can decode the above ASCII-Art syntax as follows: A and B are two nodes related to each other, where A works for B.
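As a small sketch of how this pattern is used in a query, the two statements below match the same works_for relationship. The variable names a, b, and r are arbitrary, and each query is meant to be run separately.

```cypher
// Bind the relationship to a variable r so its type can be returned too.
MATCH (a)-[r:works_for]->(b)
RETURN a, type(r) AS relationship_type, b;

// The same pattern written right to left; the arrow still points from a to b.
MATCH (b)<-[r:works_for]-(a)
RETURN a, type(r) AS relationship_type, b;
```

Reversing the arrow changes only how the pattern is drawn, not which node is the start of the relationship.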

To run Cypher queries, we first have to install and set up the Neo4J environment on our PC or laptop. Let’s get started!

Setting Up Neo4J Environment

Follow the steps below to download Neo4J on your Windows laptop:

Step 1: Go to the official Neo4J site, which will take you to its homepage. Click the download button in the top-right corner of the page.


Step 2: You are now on the download page, where two options are available: the Community Edition and the Enterprise Edition of Neo4j. Click on the Community Edition to download the Neo4J software.

Step 3: The download starts for the community version compatible with your operating system (for example, 64-bit or 32-bit). You can find the downloaded file, named “neo4j-community_windows-x64_3_1_1.exe”, in your downloads folder.

Step 4: Install this “.exe” file by accepting the license agreement to install the Neo4J server.

You’re all set with the software installation and can proceed to the built-in browser.

Starting the Server

Open the Neo4j application from your Windows Start menu and create or choose a database. Here, we’ll choose the automatically created database named “default.graphdb”.

Then go to the URL http://localhost:7474/browser/ in your browser to access Neo4J. You can also type “:server connect” into the editor box at the top of the page and press the “play” button on the right.

You can use the default username/password (neo4j/neo4j) to log in to the database. After that, set a new password of your own choice. Now, you can run the Cypher queries in this available environment.

Creating Nodes in Neo4J

As we already discussed, a node is a data record in a graph database. To create nodes using Cypher, we use the CREATE clause. With the help of the CREATE clause, we can create the following types of nodes:

1. Creating a Single Node

To create a single node in Neo4j, specify the name of the node to be created along with the CREATE statement; you can end it with an optional semicolon (;). The name given here is a variable that refers to the node within the query; it is not stored as a label or property of the node.

Syntax:-

CREATE (node_name);

Example:-

In the screenshot below, we created a node “person” using Cypher Query Language.

CREATE (person);

Creating a Single Node

Verification

To verify that the node was created, you can execute the following query, which returns every node in the database:

MATCH (n) RETURN n

2. Creating Multiple Nodes

To create multiple nodes, use the CREATE clause of Neo4j CQL and pass the names of the nodes you want to create, each in its own parentheses and separated by commas.

Syntax:-

CREATE (node1_name), (node2_name);

Example:-

In the below screenshot, we created two nodes, “person” and “user,” using Cypher Query Language.

CREATE (person), (user);

Creating Multiple Nodes

3. Creating a Node with a Label

In Neo4J, we can create nodes with labels to group or classify them.

Syntax:-

CREATE (node_name:label_name);

Example:-

In the below screenshot, we created a node “Einstein” with the label “Scientist” using the CREATE clause of Cypher Query Language.

CREATE (Einstein:Scientist);

Creating a Node with a Label
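Once nodes carry a label, that label can be used to retrieve just that group. The following is a minimal sketch that assumes at least one node with the Scientist label exists; the variable name s is arbitrary.

```cypher
// Return only the nodes that carry the Scientist label,
// instead of every node in the database.
MATCH (s:Scientist)
RETURN s;
```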

4. Creating a Node with Multiple Labels

Here, we can create a single node with multiple labels by separating the labels with a colon “:”

Syntax:-

CREATE (node_name:label1_name:label2_name:labeln_name);

Example:-

In the below screenshot, we created a node “Einstein” with the labels “Scientist” and “Person” using the CREATE clause of Cypher Query Language.

CREATE (Einstein:Scientist:Person);

Creating a Node with Multiple Labels
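To check which labels a node ended up with, Cypher's built-in labels() function can be used. This sketch assumes a node with both the Scientist and Person labels exists; the alias node_labels is just an illustrative name.

```cypher
// Match nodes that have both labels and list all labels on each one.
MATCH (n:Scientist:Person)
RETURN n, labels(n) AS node_labels;
```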

5. Creating a Node with Properties

In Neo4J, properties are key-value pairs in which a node stores its data. We can use the CREATE clause to create a node with properties; the properties are specified as comma-separated pairs within curly braces “{ }”.

Syntax:-

CREATE (node_name:label_name {key1:value1,key2:value2,....});

Example:-

In the below screenshot, we created a node “Einstein” with the label “Scientist” and the properties name, date of birth, and country using the CREATE clause of Cypher Query Language.

CREATE (Einstein:Scientist {name: "Albert Einstein", DOB: "14-March-1879", Country: "Germany"});

Creating a Node with Properties
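Stored properties can then be used to find a node and read its values. The sketch below assumes a Scientist node with the name, DOB, and Country properties from the example; the column aliases are illustrative.

```cypher
// Filter by one property and return the individual property values.
MATCH (s:Scientist)
WHERE s.name = "Albert Einstein"
RETURN s.name AS name, s.DOB AS date_of_birth, s.Country AS country;
```

If a matched node lacks one of these properties, the corresponding column is returned as null rather than causing an error.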

6. Returning the Created Node

In Neo4J, to view the newly/recently created node, we can use the RETURN clause with CREATE statement.

Syntax:-

CREATE (node_name:label_name {key1:value1,key2:value2,....}) RETURN node_name;

Example:-

In the below screenshot, we displayed the newly created node “Einstein.”

CREATE (Einstein:Scientist {name: "Albert Einstein", DOB: "14-March-1879", Country: "Germany"})
RETURN Einstein;

Returning the created node

Creating Relationships in Neo4J

As we already discussed, a relationship is an element of a graph database that connects the nodes of a graph. Relationships give the data direction and a type, and together with nodes they form patterns. We use the CREATE clause to create relationships between nodes in Cypher. With the help of the CREATE clause, we can create the following things:

1. Creating Relationships

In Neo4J, we can use the CREATE clause to create a relationship between nodes. All we need to do is specify the relationship within square brackets “[ ]”, placed between a hyphen “-” and an arrow “->” according to the direction of the relationship, as we can see in the syntax below.

Syntax:-

CREATE (node1_name)-[:RelationshipType]->(node2_name)

Example:-

As shown in the below screenshot, first of all, we need to create two nodes, “Einstein” and “DE,” using the CREATE clause.

CREATE (Einstein:Scientist {name: "Albert Einstein", DOB: "14-March-1879"})
CREATE (DE:Country {name: "Germany"})

Creating Relationships

Now, we have to create a relationship called “SCIENTIST_OF” between these two nodes:

CREATE (Einstein)-[r:SCIENTIST_OF]->(DE)


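Finally, return both nodes (Einstein and DE) to see the created relationship. Because the variables Einstein and DE are only valid within a single query, run the CREATE and RETURN statements together as one query.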

RETURN Einstein, DE
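As an alternative sketch, both nodes and the relationship can be created in a single pattern, which keeps every variable in scope for the RETURN. The SCIENTIST_OF spelling of the relationship type is an assumption chosen to match the later examples.

```cypher
// Create both nodes and the relationship between them in one pattern,
// then return all three so the relationship is shown as well.
CREATE (Einstein:Scientist {name: "Albert Einstein", DOB: "14-March-1879"})-[r:SCIENTIST_OF]->(DE:Country {name: "Germany"})
RETURN Einstein, r, DE;
```

Note that running this in addition to the statements above would create a second pair of nodes, since CREATE never reuses existing ones.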


2. Creating Relationships Between Existing Nodes

In Neo4J, we can also create the relationship between the existing nodes with the help of the MATCH clause.

Syntax:-

MATCH (x:label1_name), (y:label2_name)
   WHERE x.name = "node1_name" AND y.name = "node2_name"
CREATE (x)-[:Relationship_name]->(y)
RETURN x,y

Example:-

In this example, we are creating the relationship between two existing nodes, “scientist” and “country.”

MATCH (x:Scientist), (y:Country) WHERE x.name = "Albert Einstein" AND y.name = "Germany" 
CREATE (x)-[r:SCIENTIST_OF]->(y)
RETURN x,y
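To confirm that the relationship now exists, we could match the whole pattern rather than the two nodes separately. This is a minimal sketch that assumes the Scientist and Country nodes and the SCIENTIST_OF relationship from the example; type() is a built-in Cypher function.

```cypher
// Match the pattern end to end and report the relationship type by name.
MATCH (x:Scientist)-[r:SCIENTIST_OF]->(y:Country)
RETURN x.name AS scientist, type(r) AS relationship, y.name AS country;
```

If the query returns no rows, the relationship was not created between nodes with those labels.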


3. Creating Relationships with Labels and Properties

In Neo4J, we can use the CREATE clause to create a relationship with a type and properties. The name before the colon is a variable that refers to the relationship within the query.

Syntax:-

CREATE (node1_name)-[variable_name:Relationship_Type {key1:value1, key2:value2, ...}]->(node2_name)

Example:-

In the below screenshot, we created the relationship between two nodes “scientist” and “country,” along with the properties of the relationship using Cypher Query Language.

MATCH (x:Scientist), (y:Country) WHERE x.name = "Albert Einstein" AND y.name = "Germany" 
CREATE (x)-[r:SCIENTIST_OF {Papers:4, Patents:50}]->(y)
RETURN x,y
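Relationship properties can be read back in the same way as node properties. The following sketch assumes the SCIENTIST_OF relationship with its Papers and Patents properties exists between the two nodes; the column aliases are illustrative.

```cypher
// Bind the relationship to r and return its properties.
MATCH (:Scientist {name: "Albert Einstein"})-[r:SCIENTIST_OF]->(:Country {name: "Germany"})
RETURN r.Papers AS papers, r.Patents AS patents;
```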


4. Creating a Complete Path

In Neo4j, we can use the CREATE clause to create a path, that is, a continuous chain of nodes and relationships.

Syntax:-

CREATE p = (Node1_name {properties})-[:Relationship_Type]->
   (Node2_name {properties})-[:Relationship_Type]->(Node3_name {properties})
RETURN p

Example:-

Let’s first create the third node, namely, “Nobel_Prize.”

CREATE (NP:prize {name: "Nobel_Prize"})

Now we are going to create a complete path between three nodes, namely, “scientist,” “country,” and “prize.” Note that CREATE always creates every node in the pattern, so this query adds three new nodes rather than reusing existing ones.

CREATE path=(Einstein {name: "Albert Einstein"})-[:SCIENTIST_OF]->(DE {name: "Germany"})-[:WINNER_OF]->(prize {name: "Nobel_Prize"})
RETURN path
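A created path can later be matched and inspected with the built-in nodes() and length() functions. This sketch assumes a path with the SCIENTIST_OF and WINNER_OF relationship types starting at a node named "Albert Einstein"; the aliases path_nodes and hops are illustrative.

```cypher
// Match any two-step path with these relationship types, starting
// from the node whose name property is "Albert Einstein".
MATCH p = ({name: "Albert Einstein"})-[:SCIENTIST_OF]->()-[:WINNER_OF]->()
RETURN nodes(p) AS path_nodes, length(p) AS hops;
```

Here length(p) counts the relationships in the path, not the nodes.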


Conclusion

In this guide, we have discussed some important topics that will help you start your career in data analysis. Using these practical topics as a reference, you can better understand Neo4J and graph databases, which will help you prepare for interviews and set you on the path to becoming a data analyst, developer, data scientist, etc. The key takeaways from this blog are:

  1. We discussed graph databases, which are useful whenever we want to store our data in the form of nodes and relationships.

  2. Neo4J is a schema-optional database management system and a graph database used to store massive volumes of data in the form of graphs.

  3. Although many databases are available, Neo4J is popular because of features such as its flexible schema, ACID properties, and join-free queries.

  4. Just as SQL is used to interact with relational databases, Cypher Query Language (CQL) is used to interact with Neo4j. It is a powerful declarative pattern-matching query language.

  5. Finally, we discussed various practical CQL queries to create nodes and relationships.

 The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
