In this tutorial, you will learn how to use Python to build a data pipeline into an Apache Cassandra database running in a Docker container. The primary purpose of this project is to gain a better understanding of NoSQL databases and of the situations where a NoSQL database may be more appropriate than a relational database. By the end of this tutorial, my goal is for you to have a basic understanding of how to (1) set up a Docker container running Apache Cassandra and (2) use Python to establish a data pipeline to Apache Cassandra.
If you have beginner experience with Python and you’re looking to jump into the data engineering world, or if you’re a data engineer looking to expand your skill set, this tutorial will be particularly useful for you.
Database: Apache Cassandra
Tool(s): Docker, TablePlus
Completing the tutorial will allow you to put the following on your resume:
Developed a data pipeline using Python to insert a large data set into an Apache Cassandra NoSQL database on a Docker container and incorporated best practices for the extraction, transformation, and loading of the data.
You should have a basic understanding of programming in Python.