Graph Analytics Serverless for AuraDB

This Jupyter notebook is hosted here in the Neo4j Graph Data Science Client Github repository.

The notebook shows how to use the graphdatascience Python library to create, manage, and use a GDS Session.

We consider a graph of people and fruits, which we’re using as a simple example to show how to connect your AuraDB instance to a GDS Session, run algorithms, and eventually write back your analytical results to the AuraDB database. We will cover all management operations: creation, listing, and deletion.

If you are using self managed DB, follow this example.

1. Prerequisites

This notebook requires having an AuraDB instance available and have the Graph Analytics Serverless feature enabled for your project.

You also need to have the graphdatascience Python library installed, version 1.15 or later.

%pip install "graphdatascience>=1.15" python-dotenv

from dotenv import load_dotenv

# This allows to load required secrets from `.env` file in local directory
# This can include Aura API Credentials and Database Credentials.
# If file does not exist this is a noop.
load_dotenv(".env")

2. Aura API credentials

The entry point for managing GDS Sessions is the GdsSessions object, which requires creating Aura API credentials.

import os

from graphdatascience.session import AuraAPICredentials, GdsSessions

# you can also use AuraAPICredentials.from_env() to load credentials from environment variables
api_credentials = AuraAPICredentials(
    client_id=os.environ["CLIENT_ID"],
    client_secret=os.environ["CLIENT_SECRET"],
    # If your account is a member of several project, you must also specify the project ID to use
    project_id=os.environ.get("PROJECT_ID", None),
)

sessions = GdsSessions(api_credentials=api_credentials)

3. Creating a new session

A new session is created by calling sessions.get_or_create() with the following parameters:

A session name, which lets you reconnect to an existing session by calling get_or_create again.
The DbmsConnectionInfo containing the address, user name and password to an AuraDB instance
The session memory.
The cloud location.
A time-to-live (TTL), which ensures that the session is automatically deleted after being unused for the set time, to avoid incurring costs.

See the API reference documentation or the manual for more details on the parameters.

from graphdatascience.session import AlgorithmCategory, SessionMemory

# Explicitly define the size of the session
memory = SessionMemory.m_8GB

# Estimate the memory needed for the GDS session
memory = sessions.estimate(
    node_count=20,
    relationship_count=50,
    algorithm_categories=[AlgorithmCategory.CENTRALITY, AlgorithmCategory.NODE_EMBEDDING],
)

from datetime import timedelta

from graphdatascience.session import DbmsConnectionInfo

# Identify the AuraDB instance
# you can also use DbmsConnectionInfo.from_env() to load credentials from environment variables
db_connection = DbmsConnectionInfo(
    uri=os.environ["NEO4J_URI"],
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
)

# Create a GDS session!
gds = sessions.get_or_create(
    # we give it a representative name
    session_name="people_and_fruits",
    memory=SessionMemory.m_4GB,
    db_connection=db_connection,
    ttl=timedelta(minutes=2),
)

4. Listing sessions

You can use sessions.list() to see the details for each created session.

from pandas import DataFrame

gds_sessions = sessions.list()

# for better visualization
DataFrame(gds_sessions)

5. Adding a dataset

We assume that the configured AuraDB instance is empty. We will add our dataset using standard Cypher.

In a more realistic scenario, this step is already done, and we would just connect to the existing database.

data_query = """
  CREATE
    (dan:Person {name: 'Dan',     age: 18, experience: 63, hipster: 0}),
    (annie:Person {name: 'Annie', age: 12, experience: 5, hipster: 0}),
    (matt:Person {name: 'Matt',   age: 22, experience: 42, hipster: 0}),
    (jeff:Person {name: 'Jeff',   age: 51, experience: 12, hipster: 0}),
    (brie:Person {name: 'Brie',   age: 31, experience: 6, hipster: 0}),
    (elsa:Person {name: 'Elsa',   age: 65, experience: 23, hipster: 1}),
    (john:Person {name: 'John',   age: 4, experience: 100, hipster: 0}),

    (apple:Fruit {name: 'Apple',   tropical: 0, sourness: 0.3, sweetness: 0.6}),
    (banana:Fruit {name: 'Banana', tropical: 1, sourness: 0.1, sweetness: 0.9}),
    (mango:Fruit {name: 'Mango',   tropical: 1, sourness: 0.3, sweetness: 1.0}),
    (plum:Fruit {name: 'Plum',     tropical: 0, sourness: 0.5, sweetness: 0.8})

  CREATE
    (dan)-[:LIKES]->(apple),
    (annie)-[:LIKES]->(banana),
    (matt)-[:LIKES]->(mango),
    (jeff)-[:LIKES]->(mango),
    (brie)-[:LIKES]->(banana),
    (elsa)-[:LIKES]->(plum),
    (john)-[:LIKES]->(plum),

    (dan)-[:KNOWS]->(annie),
    (dan)-[:KNOWS]->(matt),
    (annie)-[:KNOWS]->(matt),
    (annie)-[:KNOWS]->(jeff),
    (annie)-[:KNOWS]->(brie),
    (matt)-[:KNOWS]->(brie),
    (brie)-[:KNOWS]->(elsa),
    (brie)-[:KNOWS]->(jeff),
    (john)-[:KNOWS]->(jeff);
"""

# making sure the database is actually empty
assert gds.run_cypher("MATCH (n) RETURN count(n)").squeeze() == 0, "Database is not empty!"

# let's now write our graph!
gds.run_cypher(data_query)

gds.run_cypher("MATCH (n) RETURN count(n) AS nodeCount")

6. Projecting Graphs

Now that we have imported a graph to our database, we can project it into our GDS Session. We do that by using the gds.graph.project() endpoint.

The remote projection query that we are using selects all Person nodes and their LIKES relationships, and all Fruit nodes and their LIKES relationships. Additionally, we project node properties for illustrative purposes. We can use these node properties as input to algorithms, although we do not do that in this notebook.

G, result = gds.graph.project(
    "people-and-fruits",
    """
    CALL {
        MATCH (p1:Person)
        OPTIONAL MATCH (p1)-[r:KNOWS]->(p2:Person)
        RETURN
          p1 AS source, r AS rel, p2 AS target,
          p1 {.age, .experience, .hipster } AS sourceNodeProperties,
          p2 {.age, .experience, .hipster } AS targetNodeProperties
        UNION
        MATCH (f:Fruit)
        OPTIONAL MATCH (f)<-[r:LIKES]-(p:Person)
        RETURN
          p AS source, r AS rel, f AS target,
          p {.age, .experience, .hipster } AS sourceNodeProperties,
          f { .tropical, .sourness, .sweetness } AS targetNodeProperties
    }
    RETURN gds.graph.project.remote(source, target, {
      sourceNodeProperties: sourceNodeProperties,
      targetNodeProperties: targetNodeProperties,
      sourceNodeLabels: labels(source),
      targetNodeLabels: labels(target),
      relationshipType: type(rel)
    })
    """,
)

str(G)

7. Running Algorithms

You can run algorithms on the constructed graph using the standard GDS Python Client API. See the other tutorials for more examples.

print("Running PageRank ...")
pr_result = gds.pageRank.mutate(G, mutateProperty="pagerank")
print(f"Compute millis: {pr_result['computeMillis']}")
print(f"Node properties written: {pr_result['nodePropertiesWritten']}")
print(f"Centrality distribution: {pr_result['centralityDistribution']}")

print("Running FastRP ...")
frp_result = gds.fastRP.mutate(
    G,
    mutateProperty="fastRP",
    embeddingDimension=8,
    featureProperties=["pagerank"],
    propertyRatio=0.2,
    nodeSelfInfluence=0.2,
)
print(f"Compute millis: {frp_result['computeMillis']}")
# stream back the results
gds.graph.nodeProperties.stream(G, ["pagerank", "fastRP"], separate_property_columns=True, db_node_properties=["name"])

8. Writing back to AuraDB

The GDS Session’s in-memory graph was projected from data in our specified AuraDB instance. Write back operations will thus persist the data back to the same AuraDB. Let’s write back the results of the PageRank and FastRP algorithms to the AuraDB instance.

# if this fails once with some error like "unable to retrieve routing table"
# then run it again. this is a transient error with a stale server cache.
gds.graph.nodeProperties.write(G, ["pagerank", "fastRP"])

Of course, we can just use .write modes as well. Let’s run Louvain in write mode to show:

gds.louvain.write(G, writeProperty="louvain")

We can now use the gds.run_cypher() method to query the updated graph. Note that the run_cypher() method will run the query on the AuraDB instance.

gds.run_cypher(
    """
    MATCH (p:Person)
    RETURN p.name, p.pagerank AS rank, p.louvain
     ORDER BY rank DESC
    """
)

9. Deleting the session

Now that we have finished our analysis, we can delete the session. The results that we produced were written back to our AuraDB instance, and will not be lost. If we computed additional things that we did not write back, those will be lost.

Deleting the session will release all resources associated with it, and stop incurring costs.

sessions.delete(session_name="people_and_fruits")

# or gds.delete()

# let's also make sure the deleted session is truly gone:
sessions.list()

# Lastly, let's clean up the database
gds.run_cypher("MATCH (n:Person|Fruit) DETACH DELETE n")