This is the third in a series of articles about getting started with Neo4j in Python. In the first two articles I discussed getting a Neo4j instance set up on your system and how to use the python-embedded bindings to create some nodes and relationships in a sample graph.
In this article we'll progress to (hopefully) more challenging territory by taking the HydraGraph that we created in article 2 and exposing it as a RESTful API.
To achieve this I used a couple of python libraries within the script to deliver the primary functionality.
Firstly, for interacting with the underlying graph, I chose the Neo4jRestClient created by Javier de la Rosa, which is a user-friendly Python library built on top of the Neo4j REST service. One thing I did note while working on this script (and across the article series as a whole) is that the number of available blogs and articles on this particular library is still relatively small, so I am pleased to be able to add to that growing body of work through this effort.
To create and expose an API I generally use the lightweight Flask framework, which is built on top of Werkzeug and Jinja2.
One thing also worth noting is that when I use Flask locally, I tend to set up a virtual environment. This is described more fully in this excellent article.
Right, let's get down to business. The goal of the script is to handle incoming HTTP requests, translate each request into a Cypher query and return the results of that query as well-formed JSON in the HTTP response. This article describes the steps required to do this locally on your own machine. A later article will illustrate how to deploy this as a live service on an Apache2 instance. As ever I'm running Ubuntu 12.04 64-bit via VirtualBox on my trusty MacBook Pro.
When running locally, the endpoint is defined as a URL route (the definition is different for a deployed instance); in this case I've chosen /_api/hydraGraph. This is how you will access your service when you are testing it later on, so choose something that describes your service well. For one of our clients we have a solution that requires a double-digit number of services, and if they were badly named here, debugging would be a serious challenge.
The way I usually think about the incoming HTTP request is that it hits the @app.route decorator, which fires the api_response() function. Flask doesn't seem too particular about what you name this function, just as long as it is positioned directly under the @app.route routing. I then use api_response() as a pseudo main method that calls out to the various other functions within the script. In this case I create the Neo4j connection directly and then declare two local variables which call out to two functions.
These two functions, getNodes and getRels, are the real heart of the script. Let's look at getNodes in further detail. When called, it requires a db connection to be passed in, which ensures we are talking to the correct graph and also means I can reuse the function across scripts pretty easily.
The getNodes function starts with a Cypher query construct which enables us to query the underlying graph. To learn more about the Cypher query language in Neo4j there are some good tutorials on the Neo4j site.
In this simple case we just want to query the graph and get back all the nodes and their properties.
Executing this query against the underlying graph returns a list of objects in RAW format. This can then be manipulated using some of the standard Python list-manipulation methods. Here I implemented an iteration over the list of node objects, popping them off the data structure and assigning the various object attributes to local variables. One point to note is that in the loop over the querySequenceObject I intentionally omit the root node, as I have no use for it in the final output.
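The loop described above can be sketched roughly as follows. This is not the original script: it runs against a hand-written stand-in for the RAW result rather than a live query. The row shape (one node dict per row, with self and data keys) follows the Neo4j REST representation of a node, while the sample properties, the collect_nodes name and the idea of recognising the root node by a URL ending in /node/0 are assumptions made for illustration.

```python
# Stand-in for the RAW result of a "return every node" Cypher query.
# Assumption: each row is a one-element list holding a single node dict.
sample_rows = [
    [{"self": "http://localhost:7474/db/data/node/0", "data": {}}],
    [{"self": "http://localhost:7474/db/data/node/1",
      "data": {"name": "Node A", "description": "First sample node"}}],
    [{"self": "http://localhost:7474/db/data/node/2",
      "data": {"name": "Node B", "description": "Second sample node"}}],
]


def collect_nodes(rows, root_suffix="/node/0"):
    """Pop each row off the list and pull out the fields we care about."""
    rows = list(rows)  # work on a copy so the caller's list is untouched
    collected = []
    while rows:
        node = rows.pop(0)[0]
        self_url = node["self"]
        if self_url.endswith(root_suffix):
            continue  # assumed root node: skip it, it is not wanted in the output
        props = node["data"]
        collected.append((self_url, props.get("name"), props.get("description")))
    return collected


nodes = collect_nodes(sample_rows)
```

Each tuple still carries the node's full URL at this point; turning that URL into a usable id is a separate step.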
An interesting complexity which needs to be addressed is ensuring we can identify the correct UID for each node in the graph. This is important because eventually you will need to link relationships to nodes, and you can use these UIDs as the start and end points of each relationship.
If you print the self variable taken from a node object, you will see that a URL string is returned. This has something to do with the choice of RAW as the output type from the Cypher query, although I haven't found an alternative to this yet (please do let me know if there is an obvious trick that I am missing here).
To correctly identify the node we need to undertake some URL parsing. The first step is to use the existing Python urlparse library, which is imported at the top of the script. Once applied, the urlparse function leaves us with the tail of the URL, which we can handle with some regex (a note of thanks to my co-founder James for his help with this bit).
The helper function takes in the unparsed URL string and returns everything after the last '/' in the string, which is the id of the node in the graph. This value is passed back to the caller.
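A minimal sketch of such a helper is shown below. The function name get_node_id and the exact regular expression are assumptions rather than the original code. Note also that in Python 3 the urlparse function lives in urllib.parse, whereas the Python 2 module was simply called urlparse.

```python
import re
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse


def get_node_id(url):
    """Return everything after the last '/' in the URL path, or None."""
    path = urlparse(url).path  # e.g. "/db/data/node/42"
    match = re.search(r"/([^/]+)$", path)
    return match.group(1) if match else None


node_id = get_node_id("http://localhost:7474/db/data/node/42")
```

The id comes back as a string; convert it with int() if you need a numeric value.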
The line that builds the output performs two operations on the local variables that we have derived from the node object popped off the result of the Cypher query. The inner operation calls the createNodeJSON function and passes those variables (description among them) to it. This function simply parses these data points and returns a well-formed JSON object. This JSON object is then appended to the nodeJSON list, which works as a handy JSON object store: no matter how many nodes are on the graph, they will all be iterated through, parsed and transformed into JSON objects, and eventually returned to the api_response() function at the start of the script.
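As an illustration of that append-and-build pattern, here is a small sketch. The names createNodeJSON and nodeJSON come from the description above, but the argument list (node_id, name, description), the output keys and the sample values are assumptions, not the original implementation.

```python
import json


def createNodeJSON(node_id, name, description):
    """Bundle one node's values into a JSON-serialisable dict."""
    return {"id": node_id, "name": name, "description": description}


nodeJSON = []  # the store that collects one entry per node
sample_values = [
    ("1", "Node A", "First sample node"),
    ("2", "Node B", "Second sample node"),
]
for node_id, name, description in sample_values:
    # Inner call builds the object, outer call appends it to the store.
    nodeJSON.append(createNodeJSON(node_id, name, description))

serialised = json.dumps(nodeJSON)
```

Keeping the entries as plain dicts until the very end means the whole list can be serialised in a single json.dumps call.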
Next I access the relationships in the graph through the getRels() function. It is defined in a very similar manner to getNodes; however, we need to handle the additional complexity that comes from each relationship having a start node and an end node.
Looking in further detail at the getRels() function, we can again use a Cypher query to access the graph and create the response list, querySequenceObject, which holds the result object for each relationship on the graph.
Now, armed with this, I iterate over the list of relationship objects and use the various helper functions to parse the urls and extract the required information.
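A rough sketch of that relationship handling follows, again against hand-written sample data rather than a live graph. The self, start, end and type keys follow the Neo4j REST representation of a relationship; the helper names last_segment and createRelJSON, the output keys and the LINKS_TO type are assumptions. A plain string split stands in here for the urlparse and regex helper described earlier.

```python
def last_segment(url):
    """Return the part of the URL after its final '/'."""
    return url.rstrip("/").rsplit("/", 1)[-1]


def createRelJSON(rel):
    """Hypothetical helper: reduce a raw relationship dict to ids and type."""
    return {
        "id": last_segment(rel["self"]),
        "source": last_segment(rel["start"]),  # id of the start node
        "target": last_segment(rel["end"]),    # id of the end node
        "type": rel["type"],
    }


# Stand-in for the RAW result of a "return every relationship" query.
sample_rels = [
    [{"self": "http://localhost:7474/db/data/relationship/7",
      "start": "http://localhost:7474/db/data/node/1",
      "end": "http://localhost:7474/db/data/node/2",
      "type": "LINKS_TO",
      "data": {}}],
]

relJSON = [createRelJSON(row[0]) for row in sample_rels]
```

The source and target values are the same ids produced for the nodes, which is what lets a consumer of the API join relationships back to nodes.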
Returning to the top of the script and the api_response() function, we can now see the two local variables, one for the nodes and rels for the relationships, which contain the well-formed JSON objects created above. These in turn are appended to a result JSON object which, using the Flask json.dumps method, is prepared for the HTTP response that Flask will return for us.
We can create the HTTP response as follows:
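The body of that response might be assembled along these lines. This sketch uses the standard-library json module and stops short of the Flask call itself; the nodes and rels keys of the result object and the sample data are assumptions.

```python
import json

# Sample output of the two helper functions (illustrative values only).
nodes = [{"id": "1", "name": "Node A", "description": "First sample node"}]
rels = [{"id": "7", "source": "1", "target": "2", "type": "LINKS_TO"}]

# Combine both lists into a single result object and serialise it.
result = {"nodes": nodes, "rels": rels}
body = json.dumps(result)

# In a Flask view, this string could then be returned with something like
# Response(body, mimetype="application/json").
```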
One point to note: due to cross-domain issues, we typically wrap all of our HTTP requests in callbacks (i.e. JSONP). Thus, there is additional code to handle, store and append these wrappers to each incoming request and outgoing response, which I've included for completeness. Just make sure to append ?callback=[random string] to the end of your URL.
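The wrapping itself can be sketched as below. The function name wrap_jsonp is hypothetical, and the callback-name check is an extra precaution added for this example rather than something taken from the original script. In a Flask view, the callback value would typically be read from request.args.get("callback").

```python
import json
import re

# Only allow callback names that look like plain JavaScript identifiers.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$.]*")


def wrap_jsonp(payload, callback=None):
    """Return (body, mimetype), wrapping the JSON in callback(...) if given."""
    body = json.dumps(payload)
    if callback is None:
        return body, "application/json"
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid callback name")
    return "{0}({1});".format(callback, body), "application/javascript"


plain = wrap_jsonp({"a": 1})
wrapped = wrap_jsonp({"a": 1}, "cb123")
```

Rejecting unexpected callback names keeps a caller from injecting arbitrary script into the response.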
So there you have it. In order to get the script running make sure you've got your virtual environment activated (if you are using one), start the Neo4j server (see part 2 for info on this) and you can run the script with the following command:
Also worth noting is the Flask debugger which helps when you have any issues getting your requests to correctly resolve. That is controlled by this snippet of code which is appended to the bottom of your script.
So now that the script is up and running we can navigate to the target local url in your browser (eg: