A Beginner’s Handbook for RedisGears in Python

Ajay Kumar Reddy 29 Mar, 2023


Introduction

Redis is a popular in-memory data store that can serve as a cache, database, or message broker. It suits high-performance, real-time applications that need low-latency data access. Strings, lists, sets, geospatial indexes, and HyperLogLogs are some of the data types Redis supports, and all of them can be used from Python. You can also extend its data structures by loading Redis modules.

To process Redis data, one usually connects to Redis from one of the programming languages it supports, pulls the data out of Redis into the client application, and processes it there. Almost all developers working with databases follow this pattern. But did you know that Redis provides an alternative, RedisGears? With RedisGears, the data processing takes place inside Redis itself, in memory and close to the data, which avoids moving the data around and can make the processing much more efficient.

Learning Objectives

How to download and start using RedisGears
Understanding RedisGears
Learning the different functions used in RedisGears
Understanding how to integrate Redis Streams with RedisGears

This article was published as a part of the Data Science Blogathon.

What is RedisGears? Where does it Fit in?

RedisGears is a flexible data processing framework built on top of Redis that developers can use to create and run functions that implement data flows in Redis. The framework handles the distribution and deployment of these functions, which simplifies the process for the developer. The RedisGears API is user-friendly, making it easy to use from different languages.

Developers can leverage RedisGears to process Redis data in real time by writing Python or JavaScript code, and they can share the workload by running it in parallel across a Redis Cluster. RedisGears can also work closely with Redis Streams, an advanced data structure for processing real-time, high-volume data streams. This allows for effective data processing in Redis with multiple models while remaining highly programmable. In this article, we will get introduced to RedisGears in Python.

With a built-in C API and Python interpreter, RedisGears allows you to write and run Python scripts within Redis. This lets you process data directly in memory, close to where Redis stores it, which makes processing faster and more efficient. The scripts can be deployed on servers with no changes required. One disadvantage is that RedisGears is still at an early stage and does not include many functions for data processing, but this is expected to change with the release of RedisGears 2.0 in the near future. It can be used in applications that need tasks such as real-time analytics, data processing, and machine learning.

Downloading RedisGears Module

To use RedisGears from Python, one first needs the RedisGears module and then the corresponding Python client. This section covers downloading the module. RedisGears is one of the modules provided by Redis, so it needs a Redis server to run in. If you do not have one, you can download the official image from DockerHub, which will get you started in a few minutes. The following command pulls the redislabs/redismod image, which contains the RedisGears module.

$ docker pull redislabs/redismod:edge

This downloads the redislabs/redismod image, which bundles the latest release of Redis with a selection of Redis modules, including the RedisGears module.

After downloading the image from DockerHub, it’s time to run it; running the image starts a Redis server. Because we want to use the RedisGears module, we need to load it when calling docker run, as follows:


Note: Type the entire command on a single line, not across multiple lines.

$ docker run -p 6379:6379 redislabs/redismod:edge --loadmodule /var/opt/redislabs/lib/modules/redisgears.so Plugin /var/opt/redislabs/modules/rg/plugin/gears_python.so

The above command starts a Redis server and publishes port 6379 of the container on port 6379 of the host, because that is the port Redis listens on. The --loadmodule option loads the RedisGears module as the server starts, and the Plugin argument points RedisGears at its Python plugin so that it can run Python functions.
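If you launch the container from a script rather than typing the command by hand, building it as a list of arguments avoids the line-break problem that the note above warns about. The following is only a minimal sketch: the helper name build_docker_command and its host_port parameter are made up for illustration, and the paths are simply the ones used in the command above.

```python
import shlex

# Paths copied from the docker run command in this article.
GEARS_MODULE = "/var/opt/redislabs/lib/modules/redisgears.so"
PYTHON_PLUGIN = "/var/opt/redislabs/modules/rg/plugin/gears_python.so"


def build_docker_command(image="redislabs/redismod:edge", host_port=6379):
    """Return the docker run command as a list of arguments (hypothetical helper)."""
    return [
        "docker", "run",
        "-p", f"{host_port}:6379",   # host port : container port
        image,
        "--loadmodule", GEARS_MODULE,
        "Plugin", PYTHON_PLUGIN,
    ]


argv = build_docker_command()
# Joined with spaces, this is the single-line form of the command.
single_line = " ".join(shlex.quote(arg) for arg in argv)
print(single_line)
```

On a machine where Docker is installed, the list form can be handed to subprocess.run(argv) as it is.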

Now, to test the RedisGears module, launch the Redis CLI by running redis-cli in a terminal, then type the following command and press Enter:

127.0.0.1:6379> RG.PYEXECUTE "GearsBuilder().run()"

Here we invoked a command called RG.PYEXECUTE. All RedisGears commands start with the prefix RG, and RG.PYEXECUTE is the command that runs a RedisGears function written in Python. In this case it executes the function “GearsBuilder().run()”, which iterates over the keyspace and returns the keys it finds. At present, there are no keys saved on the server, so it returns an empty list.
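The same command can also be sent from a Python program, since redis-py exposes a generic execute_command method. The sketch below is an illustration under stated assumptions: run_gears_script is a hypothetical helper, and FakeClient is a stand-in that only records what it is asked to send, so the example runs without a server. With a real server you would pass a redis.Redis instance instead.

```python
def run_gears_script(client, script):
    """Send a RedisGears Python function to the server via RG.PYEXECUTE.

    `client` is anything with an execute_command method, for example a
    redis.Redis instance (hypothetical helper, not part of any library).
    """
    return client.execute_command("RG.PYEXECUTE", script)


class FakeClient:
    """Stand-in for redis.Redis that records commands instead of sending them."""

    def __init__(self):
        self.sent = []

    def execute_command(self, *args):
        self.sent.append(args)
        # Canned reply assumed for an empty keyspace: no results, no errors.
        return [[], []]


fake = FakeClient()
reply = run_gears_script(fake, "GearsBuilder().run()")
print(fake.sent[0])
print(reply)
```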

Exploring RedisGears in Python

To use the RedisGears commands from Python, we need to install the gearsclient library, which is a Python client for RedisGears. At the time of writing, the gearsclient library only supports Python versions below 3.8; I’m using Python 3.7.15 to run it. To install the gearsclient library, type the following in the command prompt:

$ pip install git+https://github.com/RedisGears/redisgears-py.git

This installs the gearsclient library for Python, and we are ready to go. To test whether the library is working, run the following code:

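# GearsRemoteBuilder builds a pipeline that is executed on the Redis server
from gearsclient import GearsRemoteBuilder as GB

# No data flow operations here: run() is the action that returns the result
print(GB().run())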

This will output ([], []) to the terminal, which means that there are no keys stored in the Redis server and that there were no errors. The first list is for keys, and the second list is for errors. If we look closely at the above code, we can see that it is similar to what we ran directly in redis-cli.

Here GearsRemoteBuilder, which we imported as GB, is a context builder: it takes in the data from Redis, passes it through data flow operations that change the data, and finishes with an action. In the above example, there are no data flow operations; we directly used the action, run(). GB() therefore reads the whole keyspace and returns it unchanged, and because the keyspace is empty, the result is an empty list. Let’s try adding data to Redis and then see what the code outputs.
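To make the read, transform, act sequence concrete, here is a small local imitation of the builder pattern in plain Python. It is not the gearsclient implementation: LocalBuilder and its in-memory data dictionary are invented for illustration, and only the chaining style mirrors GB(). Operations are merely recorded until run() is called, which is when the data is read and pushed through each step.

```python
class LocalBuilder:
    """Toy pipeline builder: records steps, executes them only on run()."""

    def __init__(self, data):
        self.data = data      # stands in for the Redis keyspace
        self.steps = []       # data flow operations, in the order added

    def map(self, func):
        self.steps.append(("map", func))
        return self           # returning self allows chaining

    def filter(self, func):
        self.steps.append(("filter", func))
        return self

    def run(self):
        # Reader stage: turn every key into a record.
        records = [{"key": k, "value": v} for k, v in self.data.items()]
        # Operation stage: apply each recorded step in order.
        for kind, func in self.steps:
            if kind == "map":
                records = [func(r) for r in records]
            else:
                records = [r for r in records if func(r)]
        # Action stage: return a (results, errors) pair.
        return records, []


empty = LocalBuilder({}).run()
print(empty)  # ([], [])

keys = LocalBuilder({"Database": "Redis"}).map(lambda r: r["key"]).run()
print(keys)   # (['Database'], [])
```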

from gearsclient import GearsRemoteBuilder as GB
import redis
import pprint

pp = pprint.PrettyPrinter()
client = redis.Redis(host='localhost', port=6379)

# Store a simple key-value pair and a hash
client.set("Database", "Redis")
client.hset('user:Tharun', mapping={'age': 22, 'height': 178})

# Read everything back through RedisGears
pp.pprint(GB().run())

In the above code, we add some data to Redis: a simple key-value pair and a hash. Then we execute GB().run(). Here pprint is used for better formatting of the output.

The first list in the output has two elements, one for each item stored in Redis: one for the key-value pair and the other for the hash that we stored with the above code. Each element in the list is itself a dictionary. The first key in the dictionary is event, which is filled in when the function is event-driven. We have not written any event-driven function, so it is set to None. The remaining keys are self-explanatory: the key name, the type of the data, and its corresponding value. Finally, the second list is empty, as there were no errors.
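As a rough illustration of the structure just described, the records for the two keys might look like the dictionaries below. The field names follow the description above (event, key, type, value), but the exact values are an assumption; in particular, hash fields are assumed to come back as strings. The summarize helper is hypothetical.

```python
# Hand-written stand-ins for the first list returned by GB().run();
# the real output may differ in detail.
records = [
    {"event": None, "key": "Database", "type": "string", "value": "Redis"},
    {"event": None, "key": "user:Tharun", "type": "hash",
     "value": {"age": "22", "height": "178"}},
]
errors = []  # the second list: empty when nothing went wrong


def summarize(record):
    """Describe one record in a single line (hypothetical helper)."""
    return f"{record['key']} ({record['type']}): {record['value']}"


for record in records:
    print(summarize(record))
```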

Reader Type and Keys Pattern Matching

We have seen that the data from Redis flows into GB(), which converts it into something called Records. A Record represents any piece of data in a data flow, and Records pass from one function to the next. By default, GB() reads all the keys from the Redis server. This can be changed through the reader parameter of GB(). The default is reader = “KeysReader”, which reads all the keys and their values from the server. There are also other readers, such as KeysOnlyReader, which reads only the keys and not their values, and StreamReader, which reads only Stream messages.
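The difference between the first two readers can be pictured with two small local functions. These are imitations written for this article, not gearsclient code: the store dictionary stands in for the Redis keyspace, and the Records are simplified to the fields needed to show the contrast.

```python
# In-memory stand-in for the Redis keyspace.
store = {
    "Database": "Redis",
    "user:Tharun": {"age": "22", "height": "178"},
}


def keys_reader(data):
    """Imitates KeysReader: one Record per key, carrying the value too."""
    return [{"key": key, "value": value} for key, value in data.items()]


def keys_only_reader(data):
    """Imitates KeysOnlyReader: one Record per key, name only."""
    return list(data)


print(keys_only_reader(store))  # ['Database', 'user:Tharun']
print(keys_reader(store)[0])    # {'key': 'Database', 'value': 'Redis'}
```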

We can also provide a pattern so that only specific keys are read. For example, to read all the keys starting with “user”, we specify “user*”, that is, “user” followed by “*”, and pass it to run(). The output will then contain only keys whose names start with “user”.
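Key patterns of this kind behave much like shell wildcards. As a local approximation only, Python’s fnmatch module can show which keys a pattern such as “user*” would select; Redis does its own matching on the server, and its rules are not guaranteed to agree with fnmatch in every corner case. The key names below are sample data.

```python
from fnmatch import fnmatchcase

keys = ["Database", "user:Tharun", "user:Sandya", "superuser:root"]


def matching_keys(pattern, all_keys):
    """Return the keys that match a glob-style pattern (case-sensitive)."""
    return [key for key in all_keys if fnmatchcase(key, pattern)]


print(matching_keys("user*", keys))   # ['user:Tharun', 'user:Sandya']
print(matching_keys("*user*", keys))  # ['user:Tharun', 'user:Sandya', 'superuser:root']
```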

Let’s try both the reader type and key pattern matching. In this example, we will read only keys, not their values, and only the keys starting with “student”, which can be done as follows:

from gearsclient import GearsRemoteBuilder as GB
import redis
import pprint

pp = pprint.PrettyPrinter()
client = redis.Redis(host='localhost', port=6379)
client.set("Database", "Redis")

# Two student hashes and two teacher hashes
client.hset('student:Sandya', mapping={'age': 21, 'height': 167})
client.hset('student:Charan', mapping={'age': 22, 'height': 172})
client.hset('teacher:Vishwas', mapping={'age': 35, 'height': 165})
client.hset('teacher:Tharun', mapping={'age': 40, 'height': 169})

# Read key names only, restricted to keys that start with "student"
pp.pprint(GB(reader='KeysOnlyReader').run('student*'))

In the above code, we have created four hashes, two for students and two for teachers. In GB(), we have set the reader to KeysOnlyReader, and we have used pattern matching in run().

In the output, the first list contains only keys, and every key starts with student. The same result can also be achieved with a filter function:

GB(reader="KeysOnlyReader").filter(lambda x: x.startswith('student')).run()

This filter function is similar to Python’s built-in filter function: it takes in a function, checks each input against it, and keeps only the matching data. At the start, all the keys are scanned and turned into Records. These Records are passed to the filter function. As our Records contain both students and teachers, the lambda keeps only the students; each input Record is denoted by x. The filtered Records are then sent to run(), which outputs them. This is the typical data flow in RedisGears: data is turned into Records, moves from one function to the next, changes along the way, and is finally output.
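Because this step mirrors Python’s built-in filter, the same selection can be tried on a plain list first. In the sketch below, the list of keys stands in for the Records produced by KeysOnlyReader, which are assumed to arrive as plain strings, as the lambda above implies.

```python
# Sample keys standing in for the Records a keys-only reader would emit.
records = ["Database", "student:Sandya", "student:Charan",
           "teacher:Vishwas", "teacher:Tharun"]


def is_student(key):
    """Predicate with the same logic as the lambda passed to filter()."""
    return key.startswith("student")


students = list(filter(is_student, records))
print(students)  # ['student:Sandya', 'student:Charan']
```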

Mapping and CountBy in RedisGears

In the previous section, we saw the filter function, which is similar to Python’s filter function. There is also a map function, which is likewise similar to Python’s map function. The map function takes in the Records and modifies them. For example, we have student Records, and each student Record has multiple key-value pairs such as age and height. What if we want only the height? We can use the map function to extract just the height value from each student Record. This can be done using the code below:

from gearsclient import GearsRemoteBuilder as GB
import pprint as pp

# Keep only the height field of each student hash
pp.pprint(GB().map(lambda x: x['value']['height']).run('student*'))

The database already has two students because we saved them in the previous code. When the above code is run, the first list in the output contains only the height values of the student hashes. First, GB() scans the keys that match the pattern passed to run(), here student*, and turns them into input Records; these Records then flow into the map function. Each Record’s value is itself a set of key-value pairs (age and its value, height and its value), so the lambda in map takes each input Record and keeps only the height. The transformed Records are then returned by run().
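The map step can be rehearsed locally in the same way. The records below are hand-written stand-ins for what KeysReader would produce for the two student hashes. The field values are assumed to arrive as strings, so a second map converts them to integers before any arithmetic.

```python
# Stand-ins for the student Records; the values are assumed to be strings.
records = [
    {"key": "student:Sandya", "value": {"age": "21", "height": "167"}},
    {"key": "student:Charan", "value": {"age": "22", "height": "172"}},
]

# Same lambda as in the RedisGears pipeline: keep only the height field.
heights = list(map(lambda x: x["value"]["height"], records))
print(heights)  # ['167', '172']

# Convert to integers before doing arithmetic on the heights.
heights_cm = list(map(int, heights))
print(sum(heights_cm) / len(heights_cm))  # 169.5
```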

Counting is another functionality provided by RedisGears, through the countby() function. It counts the occurrences of values within the given Records. For example, how many students are 22 years old? Let’s try this out using gearsclient in Python:

from gearsclient import GearsRemoteBuilder as GB
import redis
import pprint

pp = pprint.PrettyPrinter()
client = redis.Redis(host='localhost', port=6379)

# Four students (three of them aged 22) and two teachers
client.hset('student:Sandya', mapping={'age': 21, 'height': 167, 'score': 95})
client.hset('student:Charan', mapping={'age': 22, 'height': 172, 'score': 82})
client.hset('student:Zack', mapping={'age': 22, 'height': 175, 'score': 99})
client.hset('student:Khan', mapping={'age': 22, 'height': 159, 'score': 87})
client.hset('teacher:Vishwas', mapping={'age': 35, 'height': 165})
client.hset('teacher:Tharun', mapping={'age': 40, 'height': 169})

# Count how many student Records share each age
pp.pprint(GB().countby(lambda x: x['value']['age']).run("student*"))

In the above code, we save the data of four students and two teachers in the Redis server. Our aim is to count the occurrences of each age in the students’ data, for example how many students are aged 22. The pattern passed to run() tells the reader to pick up only the student keys, and the resulting input Records flow directly into countby(). In that function, we use a lambda to extract the age from each hash. Now let’s look at the output.

In the output, we see two lists, which is the typical shape of the returned result. The first list holds two dictionaries, each containing a key and a value. The key is the age, because we extracted the age from each Record in countby(), and the value is the number of times that key occurs in the input Records. The age 22 occurs three times in the student Records, and the age 21 occurs once. In other words, three students are 22 years old and one is 21.
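Conceptually, countby() extracts a key from every Record and tallies how often each key occurs. A minimal local equivalent using collections.Counter is sketched below. The sample records are stand-ins for the four student hashes, count_by is a hypothetical helper, and the key/value dictionary layout of the result is an assumption about how the counts are reported.

```python
from collections import Counter

# Stand-ins for the four student Records stored above.
records = [
    {"key": "student:Sandya", "value": {"age": "21"}},
    {"key": "student:Charan", "value": {"age": "22"}},
    {"key": "student:Zack", "value": {"age": "22"}},
    {"key": "student:Khan", "value": {"age": "22"}},
]


def count_by(extractor, items):
    """Tally items by the key that `extractor` returns for each one."""
    counts = Counter(extractor(item) for item in items)
    return [{"key": key, "value": count} for key, count in counts.items()]


result = count_by(lambda x: x["value"]["age"], records)
print(result)  # [{'key': '21', 'value': 1}, {'key': '22', 'value': 3}]
```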

Conclusion

RedisGears is a programmable engine for data processing built by Redis. It can be used to process data in real time. RedisGears supports multiple languages, such as Python and those that run on the JVM, and it simplifies complex workflows through its simple-to-use API. It’s a powerful tool for handling real-time processing with ease and efficiency, and its ability to work across Redis clusters makes it even more versatile.

The key takeaways from this article include the following:

RedisGears speeds up data pre-processing because the processing takes place inside Redis.
It allows different operations to be performed on the data, from counting to aggregation.
RedisGears can be used to write event-driven functions too.
RedisGears also works when the database is distributed across a cluster.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.