How to Build a GraphQL API for Text Analytics with Python, Flask and Fauna
GraphQL is a query language and server-side runtime for building APIs. You can also think of it as the syntax you write to describe the data you want from an API. For you as a backend developer, this means that you can expose a single endpoint on your server to handle GraphQL queries from client applications, as opposed to the many endpoints you’d need to create with REST to handle specific kinds of requests and in turn serve data from those endpoints.
With REST, if a client needs new data that is not already provisioned, you’d need to create new endpoints for it and update the API docs. GraphQL makes it possible to send queries to a single endpoint. These queries are passed on to the server, handled by its predefined resolver functions, and the requested information is returned over the network.
Running Flask server
Flask is a minimalist framework for building Python servers. I always use Flask to expose a GraphQL API that serves my machine learning models. Requests from client applications are then forwarded by the GraphQL gateway. Overall, a microservice architecture lets us use the right technology for each job and enables advanced patterns like schema federation.
In this article, we will start small with an implementation of the so-called Levenshtein distance. We will use the well-known NLTK library and expose the Levenshtein distance functionality through the GraphQL API. I assume that you are familiar with basic GraphQL concepts like building GraphQL mutations.
Note: We will be working with the free and open source example repository with the following:
In the project, Pipenv is used for managing the Python dependencies. If you are located in the project folder, you can create the virtual environment with this:
…and install dependencies from Pipfile.
We usually define a couple of script aliases in our Pipfile to ease our development workflow.
It allows us to run our dev environment easily with a command alias, as follows:
The Flask server should then be exposed by default at port 5000. You can immediately move on to the GraphQL Playground, which serves as an IDE for live documentation and query execution for GraphQL servers. GraphQL Playground uses so-called GraphQL introspection to fetch information about our GraphQL types. The following code initializes our Flask server:
It is good practice to use a WSGI server when running in a production environment. Therefore, we have to set up a script alias for gunicorn with:
Levenshtein distance (edit distance)
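The Levenshtein distance, also known as edit distance, is a string metric. It is defined as the minimum number of single-character edits needed to change one character sequence $a$ into another one $b$. If we denote the lengths of these sequences $|a|$ and $|b|$ respectively, the distance is $\operatorname{lev}_{a,b}(|a|,|b|)$, where:

$$
\operatorname{lev}_{a,b}(i,j) =
\begin{cases}
\max(i,j) & \text{if } \min(i,j) = 0, \\
\min
\begin{cases}
\operatorname{lev}_{a,b}(i-1,j) + 1 \\
\operatorname{lev}_{a,b}(i,j-1) + 1 \\
\operatorname{lev}_{a,b}(i-1,j-1) + 1_{(a_i \neq b_j)}
\end{cases} & \text{otherwise.}
\end{cases}
$$

Here $1_{(a_i \neq b_j)}$ is the indicator function, equal to 0 when $a_i = b_j$ and equal to 1 otherwise, and $\operatorname{lev}_{a,b}(i,j)$ is the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$. For more on the theoretical background, feel free to check out the Wiki.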
In practice, let’s say that someone misspelt ‘machine learning’ and wrote ‘machinlt lerning’. We would need to make the following edits:
Edit | Edit type | Word state |
---|---|---|
0 | – | Machinlt lerning |
1 | Substitution | Machinet lerning |
2 | Deletion | Machine lerning |
3 | Insertion | Machine learning |
For these two strings, we get a Levenshtein distance equal to 3. The Levenshtein distance has many applications, such as spell checkers, correction systems for optical character recognition, and similarity calculations.
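The recursive definition of the distance can be translated almost line for line into Python. The following is a minimal sketch, not the code from the example repository: the function name `levenshtein_recursive` is hypothetical, and memoization with `functools.lru_cache` is an added assumption to keep the recursion from recomputing the same prefixes.

```python
from functools import lru_cache


def levenshtein_recursive(a: str, b: str) -> int:
    """Edit distance computed directly from the recursive definition."""

    @lru_cache(maxsize=None)
    def lev(i: int, j: int) -> int:
        # Distance between the first i characters of a and the first j of b.
        if min(i, j) == 0:
            return max(i, j)
        substitution_cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(
            lev(i - 1, j) + 1,  # deletion
            lev(i, j - 1) + 1,  # insertion
            lev(i - 1, j - 1) + substitution_cost,  # match or substitution
        )

    return lev(len(a), len(b))


# The worked example from the table: one substitution, one deletion, one insertion.
print(levenshtein_recursive("machinlt lerning", "machine learning"))  # 3
```

The recursion depth grows with the combined length of the inputs, so this form is best suited to short strings such as single words or phrases.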
Building a GraphQL server with Graphene in Python
We will build the following schema in our article:
Each GraphQL schema is required to have at least one query. We usually define our first query in order to health check our microservice. The query can be called like this:
query {
healthcheck
}
However, the main function of our schema is to enable us to calculate the Levenshtein distance. We will use variables to pass dynamic parameters in the following GraphQL document:
We have defined our schema so far in SDL format. In the Python ecosystem, however, we do not have libraries like graphql-tools, so we need to define our schema with the code-first approach. The schema is defined as follows using the Graphene library:
We have followed the best practices for overall schema and mutations. Our input object type is written in Graphene as follows:
Each time we execute our mutation in the GraphQL Playground:
With the following variables:
{
"input": {
"s1": "test1",
"s2": "test2"
}
}
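Outside the Playground, the mutation document and its variables travel together in a single JSON request body. The sketch below shows one way that body could be assembled. The mutation name `levenshteinDistance`, the input type name `LevenshteinDistanceInput`, and the selected `distance` field are assumptions made for illustration; only the `input`, `s1`, and `s2` keys come from the variables shown above.

```python
import json

# Hypothetical mutation document: the field name `levenshteinDistance`,
# the input type name and the selected `distance` field are assumptions.
MUTATION = """
mutation levenshteinDistance($input: LevenshteinDistanceInput!) {
  levenshteinDistance(input: $input) {
    distance
  }
}
"""


def build_payload(s1: str, s2: str) -> str:
    """Serialize the document and its variables into one GraphQL request body."""
    body = {
        "query": MUTATION,
        "variables": {"input": {"s1": s1, "s2": s2}},
    }
    return json.dumps(body)


payload = build_payload("test1", "test2")
print(json.loads(payload)["variables"])
```

Keeping the strings in `variables` rather than interpolating them into the document means the same mutation text can be reused for every pair of inputs.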
We obtain the Levenshtein distance between our two input strings. For our simple example of the strings test1 and test2, we get 1. We can leverage the well-known NLTK library for natural language processing (NLP). The following code is executed from the resolver:
It is also straightforward to implement the Levenshtein distance ourselves using, for example, an iterative matrix, but I would suggest not reinventing the wheel and using the default NLTK functions instead.
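For readers curious about what the iterative matrix approach looks like, here is a minimal sketch. It is not the code used in the example repository; the function name `levenshtein_iterative` is hypothetical, and it keeps only two rows of the matrix in memory at a time rather than the full table.

```python
def levenshtein_iterative(s1: str, s2: str) -> int:
    """Edit distance via dynamic programming, keeping only two matrix rows."""
    # previous[j] holds the distance between the processed prefix of s1 and s2[:j].
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # turning s1[:i] into "" takes i deletions
        for j, c2 in enumerate(s2, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + (c1 != c2),  # match or substitution
                )
            )
        previous = current
    return previous[-1]


print(levenshtein_iterative("test1", "test2"))  # 1
```

With unit costs for all three edit types, this should agree with NLTK's `edit_distance` under its default settings, which is the function I would still reach for in production code.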
Serverless GraphQL APIs with Fauna
First off some introductions, before we jump right in. It’s only fair that I give Fauna a proper introduction as it is about to make our lives a whole lot easier.
Fauna is a serverless database service that handles all optimization and maintenance tasks, so that developers don’t have to bother about them and can focus on developing their apps and shipping to market faster.
Again, serverless doesn’t actually mean “no servers”. Put simply, it means that you can get things working without having to set everything up from scratch. Some apps that use serverless concepts don’t have a backend service written from scratch; they use cloud functions, which are essentially scripts hosted on cloud platforms that handle necessary tasks like login, registration, serving data, etc.
Where does Fauna fit into all of this? When we build servers, we need to provision them with a database, and that usually means running an instance of the database ourselves. With serverless technology like Fauna, we can shift that workload to the cloud and focus on writing our auth systems and implementing the business logic for our app. Fauna also manages things like maintenance and scaling, which are usually a concern for systems that use conventional databases.
If you are interested in getting more info about Fauna and its features, check the Fauna docs. Let’s get started with building our GraphQL API the serverless way.
Requirements
- Fauna account: An account with Fauna is all you need for this session, so click here to go to the sign up page.
Creating a Fauna Database
- Log in to your Fauna account once you have created it. On the dashboard, you should see a button to create a new database. Click on it and you should see a small form for the name of the database, resembling the one below:
I call mine “graphqlbyexample” but you can call yours anything you wish. Please ignore the option to pre-populate with demo data; we don’t need that for this demo. Click “Save” and you should be brought to a new screen as shown below:
Adding a GraphQL Schema to Fauna
- In order to get our GraphQL server up and running, Fauna allows us to upload our own GraphQL schema. On the page we are currently on, you should see a GraphQL option; select that and it will prompt you to upload a schema file. This file usually contains a raw GraphQL schema and is saved with either the .gql or .graphql file extension. Let’s create our schema and upload it to Fauna to spin up our server.
- Create a new file anywhere you like. I’m creating it in the same directory as our previous app, because it has no impact on it. I’m calling it schema.gql and we will add the following to it:
Here, we simply define our data types in tandem with our two tables (Notes and User). Save this and go back to that page to upload the schema.gql file that we just created. Once that is done, Fauna processes it and takes us to a new page: our GraphQL API playground.
We have literally created a GraphQL server by simply uploading that really simple schema to Fauna. To highlight some of the really cool features that Fauna has, observe:
- Fauna automatically generates collections for us. If you notice, we did not create any collections (these translate to tables, if you are only familiar with relational databases). Fauna is a NoSQL database in which collections play the role of tables and documents the role of rows. If we go to the collections option and click on it, we will see the collections that were auto-generated on our behalf, courtesy of the schema file that we uploaded.
- Fauna automatically creates indexes on our behalf: head over to the indexes option and see what indexes have been created on behalf of the API. Fauna is a document-oriented database and does not have primary keys or foreign keys as relational databases do for search and indexing purposes. Instead, we create indexes in Fauna to help with data retrieval.
- Fauna automatically generates GraphQL queries and mutations, as well as API docs, on our behalf: This is one of my personal favorites and I can’t seem to get over just how efficiently Fauna does this. Fauna is able to intelligently generate some queries that it thinks you might want in your newly created API. Head back to the GraphQL option and click on the “Docs” tab to open up the docs in the playground.
As you can see, two queries and a handful of mutations were already auto-generated (even though we did not add them to our schema file). You can click on each one in the docs to see the details.
Testing our server
Let’s test out some of these queries and mutations from the playground. We will also use our server outside of the playground (by the way, it is a fully functional GraphQL server).
Testing from the Playground
- We will first test our server by creating a new user with the predefined createUser mutation, as follows:
If we go to the collections option and choose User, we should see our newly created entry (a document, aka a row) in our User collection.
- Let’s create a new note and associate it with a user as the author via its document ref id, which is a special ID generated by Fauna for all documents for the sake of references like this, much like a key in relational tables. To find the ID for the user we just created, simply navigate to the collection; in the list of documents you should see the option (a copy icon) to copy the Ref ID:
Once you have this, you can create a new note and associate it with the user as follows:
- Let’s make a query this time to get data from the database. Currently, we can fetch a user by ID or fetch a note by its ID. Let’s see that in action:
You may have been wondering: what if we wanted to fetch info on all users? Currently, we can’t do that because Fauna did not generate that query automatically for us, but we can update our schema. Let’s add our custom query to our schema.gql file, as follows. Note that this is an update to the file, so don’t clear everything out of it; just add this to it:
Once you have added this, save the file and click on the update schema option in the playground to upload the file again. It should take a few seconds to update. Once it’s done, we will be able to use our newly created query, as follows:
Don’t forget that, as opposed to having all the info about users served (namely: name, email, password), we can choose which fields we want because this is GraphQL. And not just that, it’s Fauna’s GraphQL, so feel free to specify more fields if you want.
Testing from outside the playground: using Python (requests library)
Now that we’ve seen that our API works from the playground, let’s see how we can actually use it from an application outside the playground environment, using Python’s requests library. If you don’t have it installed, kindly install it using pip as follows:
pip install requests
- Before we write any code, we need to get our API key from Fauna, which is what will let us communicate with our API from outside the playground. Head over to Security on your dashboard and, on the Keys tab, select the option to create a new key. It should bring up a form like this:
Leave the database option as the current one, change the role of the key from Admin to Server, and then save. It’ll generate a new key secret for you that you must copy and save somewhere safe, most probably as an environment variable.
- For this I’m going to create a simple script to demonstrate, so add a new file to your current working directory or anywhere you wish, and call it whatever you like. I’m calling mine test.py. In this file we’ll add the following:
Here we add a couple of imports, including the requests library, which we use to send the requests, as well as the os module, used here to load our environment variables, which is where I stored the Fauna secret key we got in the previous step.
Note the URL where the request is to be sent. It is obtained from the Fauna GraphQL playground here:
Next, we create the query to be sent. This example shows a simple fetch query to find a user by ID (which is one of the queries automatically generated by Fauna). We then retrieve the key from the environment variable and store it in a variable called token, and create a dictionary to represent our headers. This is, after all, an HTTP request, so we can set headers here. In fact, we have to, because Fauna will look for our secret key in the headers of our request.
- The concluding part of the code shows how we use the requests library to create the request, and is as follows:
We create a request object and check whether the request went through via its status_code. If it went well, we print the response from the server; otherwise, we print an error message. Let’s run test.py and see what it returns.
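The steps described for test.py can be sketched roughly as below. This is an illustration under stated assumptions rather than the original script: the endpoint URL should be copied from your own playground, `FAUNA_SECRET` is an assumed name for the environment variable, the `Bearer` authorization scheme is an assumption about how the key is passed, and the selected fields depend on your schema. To keep the sketch independent of any installed package, the HTTP call is passed in as the `post` argument, which would normally be `requests.post`.

```python
import os
from typing import Any, Callable, Dict, Optional

# Assumed endpoint; copy the actual URL from your Fauna GraphQL playground.
FAUNA_GRAPHQL_URL = "https://graphql.fauna.com/graphql"

# Hypothetical query based on the auto-generated findUserByID;
# the selected fields depend on your own schema.
FIND_USER = """
query FindUser($id: ID!) {
  findUserByID(id: $id) {
    name
    email
  }
}
"""


def run_query(
    post: Callable[..., Any],
    query: str,
    variables: Optional[Dict[str, Any]] = None,
    token: Optional[str] = None,
) -> Dict[str, Any]:
    """Send a GraphQL query using `post` (for example requests.post)."""
    # FAUNA_SECRET is an assumed name for the environment variable.
    token = token or os.environ["FAUNA_SECRET"]
    headers = {"Authorization": f"Bearer {token}"}
    response = post(
        FAUNA_GRAPHQL_URL,
        json={"query": query, "variables": variables or {}},
        headers=headers,
    )
    # Anything other than 200 is treated as a failed request.
    if response.status_code != 200:
        raise RuntimeError(f"Query failed with status {response.status_code}")
    return response.json()
```

With the requests library installed, a call would look like `run_query(requests.post, FIND_USER, {"id": "<your ref id>"})`, where the ref ID is the one copied from the dashboard. Note that GraphQL servers commonly report query errors inside a 200 response, so it is worth checking the returned dictionary for an `errors` key as well.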
Conclusion
In this article, we covered creating GraphQL servers from scratch and looked at creating servers right from Fauna without having to do much work. We also saw some of the cool perks that come with the serverless system that Fauna provides, and went on to see how we could test our servers and validate that they work.
Hopefully, this was worth your time and taught you a thing or two about GraphQL, serverless, Fauna, Flask, and text analytics. To learn more about Fauna, you can also sign up for a free account and try it out yourself!
The post How to Build a GraphQL API for Text Analytics with Python, Flask and Fauna appeared first on CSS-Tricks.