Site Impact – Big Data Pipeline

Stewart Smith

Archive article - published on April 30 2020

Site Impact needed a way to modernize and scale out its data management procedures. We were tasked with architecting a cloud-run platform built to scale its business operations by presenting normalized data for better analysis.

The challenge

Site Impact reached out to us to discuss how to tackle the uptime and scale problems of their current data management platform. Their workflow for using the data was also piecemeal and relied on a lot of manual data manipulation.

The solution

After assessing the project during a Discovery phase, we scoped an MVP that would deliver what the client needed. We focused heavily on Data Science and built a multi-functional data pipeline: the client provides data, the pipeline runs it through standardizing and deduping processes, and anyone in their organization can then analyze the exported data wherever they choose.
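To make the standardizing and deduping step concrete, here is a minimal, library-free sketch in Python. It is an illustration only, not the code delivered to the client: the field names (`email`, `city`), the normalization rules (trim and lowercase), and the "keep the first record seen" policy are all assumptions chosen for the example.

```python
from typing import Iterable


def standardize(record: dict) -> dict:
    """Normalize keys and string values so equivalent records compare equal.

    Assumed rules for this sketch: trim whitespace and lowercase everything.
    """
    return {
        key.strip().lower(): value.strip().lower() if isinstance(value, str) else value
        for key, value in record.items()
    }


def dedupe(records: Iterable[dict], key_fields: tuple) -> list:
    """Standardize each record, then keep the first one seen per key."""
    seen = set()
    unique = []
    for record in records:
        clean = standardize(record)
        key = tuple(clean.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(clean)
    return unique


# Hypothetical input: the first two rows describe the same contact.
raw = [
    {"Email": " Ada@Example.com ", "City": "London"},
    {"email": "ada@example.com", "city": "london"},
    {"email": "bob@example.com", "city": "Paris"},
]
deduped = dedupe(raw, key_fields=("email",))
```

Standardizing before comparing is the important design choice here: without it, the first two rows would look like different contacts and both would survive the dedupe.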

The results

Key Technologies:

BigQuery – Executes efficient SQL queries on 400 GB tables with hundreds of millions of rows of data, some spanning 600+ columns.

Composer – Airflow provides a dependency-driven ETL pipeline that runs all required manipulations and automatically loads the results into BigQuery.

Dataproc – PySpark code running on Dataproc's compute, built to handle petabytes of data.
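The idea behind a dependency-driven pipeline can be sketched without Airflow itself. The example below uses only Python's standard-library `graphlib` to run tasks in an order that respects their dependencies. The task names and the no-op task bodies are hypothetical placeholders, and this is not Airflow's API; in a real Composer deployment, Airflow's scheduler resolves the same kind of graph.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks
# that must finish before it can start.
dependencies = {
    "standardize": {"ingest"},
    "dedupe": {"standardize"},
    "load_to_bigquery": {"dedupe"},
}


def run_pipeline(deps: dict, tasks: dict) -> list:
    """Run every task after its prerequisites and return the execution order."""
    order = []
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()  # placeholder for the real work of the task
        order.append(name)
    return order


# No-op stand-ins for the real task bodies.
tasks = {
    name: (lambda: None)
    for name in ("ingest", "standardize", "dedupe", "load_to_bigquery")
}
executed = run_pipeline(dependencies, tasks)
```

Declaring what each step depends on, rather than hard-coding a sequence, lets new steps be added without rewriting the run order by hand.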

We focused heavily on Data Science and built a multi-functional data pipeline that allowed the client to provide data to run through standardizing and deduping processes in real time or batch form.

About Site Impact, LLC

Site Impact is one of the leading providers of data and marketing resources, specializing in multi-channel direct marketing services.

Industry: Advertising & Marketing

Primary project location: United States

About WALTLabs.io, LLC

At WALT Labs, we provide professional services around application modernization and cloud strategy.

Products: Google Cloud Platform

First Published on July 30, 2020
