Gremlin
Thu, 04 Aug 2022 17:42:32 GMT — Properties
Properties
Key | Value |
---|---|
Identifier | gremlin |
Name | Gremlin |
Type | Topic |
Creation timestamp | Thu, 04 Aug 2022 17:42:32 GMT |
Modification timestamp | Thu, 19 Sep 2024 06:53:49 GMT |
Tags
lpg
Gremlin is the graph traversal language of Apache TinkerPop. It is a functional, data-flow language that lets users succinctly express complex traversals on (or queries of) their application's property graph. Every Gremlin traversal is composed of a sequence of (potentially nested) steps, and each step performs an atomic operation on the data stream. Every step is either a map-step (transforming the objects in the stream), a filter-step (removing objects from the stream), or a sideEffect-step (computing statistics about the stream). The Gremlin step library extends these three fundamental operations into a rich collection of steps that users can compose to ask any conceivable question of their data, because Gremlin is Turing complete.
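
As a rough illustration of the three step kinds, the sketch below mimics a traversal with plain Python generators. It is an analogy only, not Gremlin or TinkerPop code: the helper names `map_step`, `filter_step`, and `side_effect_step` and the sample data are hypothetical.

```python
from typing import Callable, Iterable, Iterator


def map_step(stream: Iterable, fn: Callable) -> Iterator:
    """map-step: transform each object in the stream."""
    for obj in stream:
        yield fn(obj)


def filter_step(stream: Iterable, predicate: Callable) -> Iterator:
    """filter-step: drop objects that fail the predicate."""
    for obj in stream:
        if predicate(obj):
            yield obj


def side_effect_step(stream: Iterable, fn: Callable) -> Iterator:
    """sideEffect-step: pass objects through unchanged while recording something."""
    for obj in stream:
        fn(obj)
        yield obj


# Hypothetical sample data standing in for the vertices of a property graph.
people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Charlie", "age": 35},
]

seen_ages = []  # statistics gathered by the side effect

# Steps compose into one lazy pipeline; the innermost step runs first.
pipeline = map_step(
    side_effect_step(
        filter_step(people, lambda p: p["age"] >= 30),
        lambda p: seen_ages.append(p["age"]),
    ),
    lambda p: p["name"],
)

names = list(pipeline)
```

In Gremlin itself, comparable roles are played by steps such as `has` (filter), `aggregate` (sideEffect), and `select` (map), which run inside the graph engine rather than in client-side generators.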
Traversal Steps
- Lambda steps (and derived steps)
  - map: `id`, `label`, `match`, `path`, `select`, `order`, ...
  - flatMap: `coalesce`, `in`, `inE`, `inV`, `out`, ...
  - filter: `and`, `coin`, `has`, `is`, `or`, `where`, ...
  - sideEffect: `aggregate`, `inject`, `profile`, `property`, `subgraph`, ...
  - branch: `choose`, `repeat`, `union`, ...
- Other steps: `barrier`, `cap`, ...
- Step modulators: `as`, `by`, `emit`, `option`, ...
- Predicates: `gt`, `eq`, `lt`, `neq`, `within`, `without`, ...
Bulk Ingestion
To bulk ingest data into JanusGraph using the Gremlin-Python library from two CSV files—one for vertices and another for edges—you can follow the example below. The code reads the CSV files, processes them into vertices and edges, and loads them into JanusGraph.
Vertices CSV (vertices.csv)
id,label,name,age
1,person,Alice,30
2,person,Bob,25
3,person,Charlie,35
Edges CSV (edges.csv)
source,target,label
1,2,knows
2,3,knows
1,3,works_with
Python code
import csv
from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Connect to the Gremlin Server; keep a handle on the connection so it can be closed later
connection = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = Graph().traversal().withRemote(connection)

# Function to load vertices from CSV
def load_vertices(csv_file):
    with open(csv_file, mode='r', newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            # Add each vertex, with its id as a unique property, and other properties
            # (DictReader yields strings, so age is converted to int explicitly)
            g.addV(row['label']).property('id', row['id']) \
                .property('name', row['name']) \
                .property('age', int(row['age'])) \
                .iterate()

# Function to load edges from CSV
def load_edges(csv_file):
    with open(csv_file, mode='r', newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            # Create edges by linking vertices based on the source and target vertex IDs
            g.V().has('id', row['source']).as_('src') \
                .V().has('id', row['target']).as_('tgt') \
                .addE(row['label']).from_('src').to('tgt').iterate()

# Load the vertices first, then the edges that reference them
load_vertices('vertices.csv')
load_edges('edges.csv')

# Close the connection
connection.close()
Explanation
- Connection Setup:
  - We connect to JanusGraph using `DriverRemoteConnection` over WebSocket (`ws://localhost:8182/gremlin`).
- Load Vertices:
  - The `load_vertices()` function reads the `vertices.csv` file and, for each row, creates a vertex with a label (e.g., `person`) and attaches properties such as `id`, `name`, and `age`.
  - The `id` field is added as a property to uniquely identify the vertex.
- Load Edges:
  - The `load_edges()` function reads the `edges.csv` file. For each row, it finds the source and target vertices using the `id` property and creates an edge between them with the specified edge label (e.g., `knows`, `works_with`).
- Close Connection:
  - After loading the data, the connection to the Gremlin server is closed.
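
The vertex loader described above issues one request per CSV row. For larger files, one possible refinement is to chain several `addV` steps into a single traversal so that each request carries a batch of vertices. The sketch below is an assumption layered on the example, not part of it: `chunked`, `add_vertex_batch`, and the batch size of 100 are hypothetical, and `g` is expected to be a Gremlin-Python traversal source like the one created in the Python code.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def add_vertex_batch(g, rows):
    """Chain one addV per row and submit the whole batch with a single iterate()."""
    if not rows:
        return
    traversal = g  # the first addV is called on the traversal source itself
    for row in rows:
        traversal = (
            traversal.addV(row['label'])
            .property('id', row['id'])
            .property('name', row['name'])
            .property('age', int(row['age']))
        )
    traversal.iterate()


# Possible usage (requires `import csv` and a connected traversal source `g`):
#
# with open('vertices.csv', newline='') as file:
#     for batch in chunked(csv.DictReader(file), 100):
#         add_vertex_batch(g, batch)
```

Very large batches may run into server-side limits such as request size or evaluation timeouts, so the batch size is worth tuning against the actual deployment.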
Assumptions:
- The `id` field in the vertices CSV is used to uniquely identify each vertex
- The edges CSV references the vertices by their `id` values in the `source` and `target` fields
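
Both assumptions can be checked before any data is sent to the server. The following sketch uses only the standard library; the function name `validate_csv_rows` and the inline sample data are hypothetical, chosen to mirror the CSV files above.

```python
import csv
import io


def validate_csv_rows(vertex_rows, edge_rows):
    """Return a list of problems: duplicate vertex ids and edges pointing at unknown ids."""
    problems = []
    seen_ids = set()
    for row in vertex_rows:
        if row['id'] in seen_ids:
            problems.append(f"duplicate vertex id: {row['id']}")
        seen_ids.add(row['id'])
    for row in edge_rows:
        for field in ('source', 'target'):
            if row[field] not in seen_ids:
                problems.append(
                    f"edge {row['source']}->{row['target']} has unknown {field}: {row[field]}"
                )
    return problems


# Inline copies of the sample files, so the sketch runs without touching disk.
VERTICES_CSV = """id,label,name,age
1,person,Alice,30
2,person,Bob,25
3,person,Charlie,35
"""
EDGES_CSV = """source,target,label
1,2,knows
2,3,knows
1,3,works_with
"""

vertex_rows = list(csv.DictReader(io.StringIO(VERTICES_CSV)))
edge_rows = list(csv.DictReader(io.StringIO(EDGES_CSV)))
problems = validate_csv_rows(vertex_rows, edge_rows)
print(problems or "CSV data is consistent")
```

Because `csv.DictReader` yields every value as a string, the ids are compared as strings here, which matches how the loader stores and looks them up.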
Notes
TinkerPop server and console
Server
$ bin/gremlin-server.sh start
$ bin/gremlin-server.sh stop
Console (client)
# In the terminal
$ bin/gremlin.sh
# In the Gremlin console
gremlin> :remote connect tinkerpop.server conf/remote.yaml
gremlin> :remote console
gremlin> g.V()
gremlin> g.E()
gremlin> :exit