Move over Pandas, there’s a new tool in town.
Given the speed of change in our field, we tend to avoid overly confident predictions, but one thing we're pretty sure of is that the data sets you're asked to analyse are going to get larger and more complex in future. We also predict you're not going to want to wait around for the results.
Terality promises to deliver the same results from your analytics processes as Pandas, but up to 20 times faster. The best part: you won't have to learn new syntax to do it.
If you’re a Python user, you’ll know that the Pandas library makes it possible to view a .csv file as a data table. It’s more than just the visible representation of the data, though, and the expressive data structures allow us to manipulate the data for efficient analysis. If you’re familiar with Pandas, you’ll also know that it doesn’t handle very large data sets well.
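As a quick illustration of what that looks like in practice, here is a minimal sketch of loading CSV data into a Pandas table and summarising it. The column names and figures are made up for the example, and an in-memory string stands in for a real .csv file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a .csv file on disk.
csv_text = """city,product,units
London,widget,12
Paris,widget,7
London,gadget,5
Paris,gadget,9
"""

# read_csv accepts a file path or a file-like object;
# StringIO keeps this sketch self-contained.
df = pd.read_csv(io.StringIO(csv_text))

# Keep only the orders of more than 5 units, then total them per city.
big_orders = df[df["units"] > 5]
units_by_city = big_orders.groupby("city")["units"].sum()

print(df)
print(units_by_city)
```

This works comfortably on a small table like this one. The trouble starts when the file no longer fits in your machine's memory.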
The problems with Pandas have been known for a while, and alternatives for handling big data workloads are not new. One option is to get a bigger machine with more memory, but that has significant cost implications for most users. Tools like Spark address the issue through batch processing distributed across a cluster, caching data in memory to speed things up. Terality is different because it moves the processing to the cloud.
Once you install Terality and create an account, the processing is no longer carried out on your machine but in the cloud. Their Python client can be imported into a Jupyter notebook, and the syntax is the same as Pandas. In other words, you won't have to learn to code in a new way.
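To show what "same syntax" could mean in practice, here is a rough sketch in which the analysis is written once and the library is passed in as an argument. It runs locally with Pandas. The commented-out lines at the end are an assumption about how a drop-in client such as Terality's would be swapped in; they are not taken from its documentation, and the function name, file path and sample data are hypothetical.

```python
import io

import pandas

# Hypothetical sample data standing in for a real .csv file.
csv_text = """city,product,units
London,widget,12
Paris,widget,7
London,gadget,5
Paris,gadget,9
"""


def mean_units_by_city(frame_lib, csv_source):
    """Run one pandas-style analysis with whichever library is passed in.

    frame_lib is any module exposing the Pandas API (read_csv, groupby, ...).
    """
    df = frame_lib.read_csv(csv_source)
    return df.groupby("city")["units"].mean()


# Local run with plain Pandas.
result = mean_units_by_city(pandas, io.StringIO(csv_text))
print(result)

# With a drop-in client, only the first argument would change
# (assumption: the client exposes the same read_csv/groupby calls):
# import terality
# result = mean_units_by_city(terality, "path/to/large_file.csv")
```

The analysis code itself stays untouched. Only the import, and where the work actually runs, would change.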
The data is currently processed on Amazon Web Services using secured S3 cloud storage, based in Frankfurt and subject to GDPR rules. They promise to delete all data after 3 days. There is a free version of the service which allows you to process 500GB of data a month. After that, processing is billed by the size of the operation you plan to carry out. Give it a try at https://www.terality.com.