Gremlin

Thu, 04 Aug 2022 17:42:32 GMT

 Properties

Key Value
Identifier gremlin
Name Gremlin
Type Topic
Creation timestamp Thu, 04 Aug 2022 17:42:32 GMT
Modification timestamp Thu, 19 Sep 2024 06:53:49 GMT
Tags lpg

Gremlin is the graph traversal language of Apache TinkerPop. It is a functional, data-flow language that lets users succinctly express complex traversals (queries) over their application's property graph. Every Gremlin traversal is composed of a sequence of (potentially nested) steps, and each step performs an atomic operation on the data stream. Every step is either a map-step (transforming the objects in the stream), a filter-step (removing objects from the stream), or a sideEffect-step (computing statistics about the stream). The Gremlin step library builds on these three fundamental operations to provide a rich collection of steps that users can compose to ask any conceivable question of their data, since Gremlin is Turing complete.
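The three kinds of step can be illustrated without a graph at all. The sketch below is a conceptual model in plain Python generators, not Gremlin's implementation; the helper names map_step, filter_step and side_effect_step and the sample data are made up for illustration.

```python
# Conceptual model only: each "step" consumes a stream and yields a new stream.

def map_step(fn, stream):
    """map-step: transform every object in the stream."""
    for obj in stream:
        yield fn(obj)

def filter_step(pred, stream):
    """filter-step: drop the objects that do not satisfy the predicate."""
    for obj in stream:
        if pred(obj):
            yield obj

def side_effect_step(fn, stream):
    """sideEffect-step: pass objects through unchanged while recording something."""
    for obj in stream:
        fn(obj)
        yield obj

people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Charlie", "age": 35},
]

seen_ages = []  # filled in by the side-effect step

# Compose the steps into one lazy pipeline, in the order the data flows.
stream = iter(people)
stream = filter_step(lambda p: p["age"] >= 30, stream)
stream = side_effect_step(lambda p: seen_ages.append(p["age"]), stream)
stream = map_step(lambda p: p["name"], stream)

names = list(stream)  # nothing runs until the stream is consumed
print(names, seen_ages)
```

Like a Gremlin traversal, the pipeline is lazy: building it does no work, and the steps run only when the final stream is iterated.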

Traversal Steps

  • Lambda steps (and derived steps)
    • map
      • id, label, match, path, select, order, ...
    • flatMap
      • coalesce, in, inE, inV, out, ...
    • filter
      • and, coin, has, is, or, where, ...
    • sideEffect
      • aggregate, inject, profile, property, subgraph, ...
    • branch
      • choose, repeat, union, ...
  • Other steps
    • barrier, cap, ...
  • Step modulators
    • as, by, emit, option, ...
  • Predicates
    • gt, eq, lt, neq, within, without, ...

Bulk Ingestion

The example below bulk-ingests data into JanusGraph with the Gremlin-Python library from two CSV files, one for vertices and one for edges. The code reads each file, turns its rows into vertices or edges, and loads them into JanusGraph.

Vertices CSV (vertices.csv)

id,label,name,age
1,person,Alice,30
2,person,Bob,25
3,person,Charlie,35

Edges CSV (edges.csv)

source,target,label
1,2,knows
2,3,knows
1,3,works_with
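Before wiring these files to a graph, it may help to see how Python's csv.DictReader presents each row. The sketch below parses the vertex data from an in-memory string (the VERTICES_CSV constant is only a stand-in for the vertices.csv file) and shows that every value arrives as a string, which is why numeric fields such as age need an explicit conversion.

```python
import csv
import io

# Stand-in for vertices.csv so the snippet runs without any files on disk.
VERTICES_CSV = """id,label,name,age
1,person,Alice,30
2,person,Bob,25
3,person,Charlie,35
"""

# DictReader uses the header line as keys and yields one mapping per data row.
rows = list(csv.DictReader(io.StringIO(VERTICES_CSV)))

first = rows[0]
print(first)

# Every value is a str, including id and age, so convert where a number is wanted.
ages = [int(row["age"]) for row in rows]
print(ages)
```

Because the id values are also strings, the vertex and edge files must be read the same way so that lookups by id compare like with like.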

Python code

import csv
from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Connect to the TinkerPop server; keep a handle on the connection so it can be closed later
connection = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = Graph().traversal().withRemote(connection)

# Function to load vertices from CSV
def load_vertices(csv_file):
    # newline='' is what the csv module recommends when opening CSV files
    with open(csv_file, mode='r', newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            # Add each vertex, with its id as a unique property, and other properties
            # (CSV values are strings, so age is converted to an int)
            g.addV(row['label']).property('id', row['id']) \
             .property('name', row['name']) \
             .property('age', int(row['age'])) \
             .iterate()

# Function to load edges from CSV
def load_edges(csv_file):
    with open(csv_file, mode='r', newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            # Create edges by linking vertices based on the source and target vertex IDs
            g.V().has('id', row['source']).as_('src') \
             .V().has('id', row['target']).as_('tgt') \
             .addE(row['label']).from_('src').to('tgt').iterate()

# Load the vertices first, so that every edge can find both of its endpoints
load_vertices('vertices.csv')
load_edges('edges.csv')

# Close the connection
connection.close()

Explanation

  1. Connection Setup:
    • We connect to JanusGraph using DriverRemoteConnection over the WebSocket endpoint ws://localhost:8182/gremlin
  2. Load Vertices:
    • The load_vertices() function reads the vertices.csv file, and for each row, it creates a vertex with a label (e.g., person), and attaches properties like id, name and age
    • The id field is stored as an ordinary property that uniquely identifies the vertex; it is not the graph's internal vertex ID
  3. Load Edges:
    • The load_edges() function reads the edges.csv file. For each row, it finds the source and target vertices using the id property and creates an edge between them using the specified edge label (e.g., knows, works_with)
  4. Close Connection:
    • After loading the data, the connection to the Gremlin server is closed
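The steps above send one request per CSV row. For larger files, one possible variation is to chain several addV steps into a single traversal so that each request creates a batch of vertices. The sketch below is only an illustration under that assumption: chunked and add_vertex_batch are hypothetical helper names, g is expected to be a Gremlin-Python traversal source like the one in the example, and whether batching helps (and what batch size suits) depends on the server.

```python
from itertools import islice

def chunked(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def add_vertex_batch(g, batch):
    """Create every vertex in `batch` with a single traversal (one request)."""
    if not batch:
        return
    t = g
    for row in batch:
        # Each addV starts a new vertex; the property steps that follow apply to it.
        t = (t.addV(row['label'])
              .property('id', row['id'])
              .property('name', row['name'])
              .property('age', int(row['age'])))
    t.iterate()

# Possible usage (assumes `g` and an open csv.DictReader named `reader`):
#     for batch in chunked(reader, 100):
#         add_vertex_batch(g, batch)
```

The batch size of 100 in the usage comment is an arbitrary placeholder, not a recommended value.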

Assumptions:

  • The id field in the vertices CSV is used to uniquely identify each vertex
  • The edges CSV references the vertices by their id values in the source and target fields
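Both assumptions can be checked before anything is sent to the graph. The sketch below is a hypothetical pre-flight check (the function name validate_csvs is not part of any library): it takes two open text streams, reports duplicate vertex ids, and reports edges whose source or target does not appear in the vertices file.

```python
import csv
import io

def validate_csvs(vertices_file, edges_file):
    """Return a list of problems; an empty list means both assumptions hold."""
    problems = []
    seen_ids = set()

    # Assumption 1: every vertex id is unique.
    for row in csv.DictReader(vertices_file):
        if row['id'] in seen_ids:
            problems.append(f"duplicate vertex id: {row['id']}")
        seen_ids.add(row['id'])

    # Assumption 2: every edge endpoint refers to a known vertex id.
    for row in csv.DictReader(edges_file):
        for end in ('source', 'target'):
            if row[end] not in seen_ids:
                problems.append(
                    f"edge {row['source']}->{row['target']} has unknown {end}: {row[end]}"
                )
    return problems

# Small in-memory demonstration; with real files, pass open(..., newline='') handles.
demo_vertices = io.StringIO("id,label,name,age\n1,person,Alice,30\n2,person,Bob,25\n")
demo_edges = io.StringIO("source,target,label\n1,2,knows\n2,3,knows\n")
demo_problems = validate_csvs(demo_vertices, demo_edges)
print(demo_problems)
```

In the demonstration the second edge points at vertex 3, which is absent from the vertex data, so one problem is reported.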



Notes
Thu, 19 Sep 2024 06:59:53 GMT
TinkerPop server and console

Server

$ bin/gremlin-server.sh start
$ bin/gremlin-server.sh stop
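After starting the server, a quick way to confirm that something is accepting connections is to try opening a TCP socket to it. The sketch below assumes the server listens on localhost port 8182, as in the WebSocket URL used in the ingestion example; server_is_listening is a hypothetical helper name. It only shows that the port is open, not that the Gremlin endpoint itself is healthy.

```python
import socket

def server_is_listening(host="localhost", port=8182, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not reachable.
        return False

if __name__ == "__main__":
    print("listening" if server_is_listening() else "not reachable")
```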

Console (client)

# In the terminal
$ bin/gremlin.sh

# In the Gremlin console
gremlin> :remote connect tinkerpop.server conf/remote.yaml
gremlin> :remote console
gremlin> g.V()
gremlin> g.E()
gremlin> :exit

