Portrait of Harshit Shukla, Data Engineer

Harshit Shukla

Hi, I’m Harshit—a Data Engineer who automates chaos by choice, and writes ETLs so robust they make coffee jealous.

With 4+ years wrangling data for global enterprises, I transform spreadsheets into insights, and legacy warehouses into cloud-native solutions. My pipelines on Azure and Databricks scale faster than my caffeine intake!

I optimize Spark jobs until they run smoother than butter on a GPU, and my Python scripts can *sometimes* predict my next snack.

Whether collaborating across teams or taming jungle-sized datasets for German automotive giants, I live for building scalable data platforms—and chasing “why did this job fail at 2am?” mysteries.

Let’s build the future of data, automate the boring parts, and have a laugh when Airflow goes down right before a demo!

Let’s Connect

Projects

Flipkart Data Analysis

Ever wondered what drives e-commerce giants behind the scenes? I built and optimized scalable ETL pipelines to dissect Flipkart’s massive datasets—finding insights hidden in the clicks, carts, and checkouts. Sometimes, even my queries needed a coffee break!

Technologies used: Python Spark SQL Databricks
View Project

Apache Spark Analysis

From confessions to config files, this project dives deep into Spark—creating streamlined notebooks that decode SQL logic and Spark magic. It’s where data meets velocity, and where I make distributed computing look easy… most days!

Technologies used: Python Spark SQL Databricks
View Project

Skills

Python

If there’s a bug in my Python code, I’ll find it—usually after coffee. I’ve wrangled data and built scripts faster than you can say “import this”.

SQL

My SQL queries are optimized for breakfast, lunch, and big data crunches. I dream in SELECTs and sometimes forget semicolons—not proud, just honest!

PySpark

I’ve orchestrated PySpark pipelines and survived cluster chaos. I code distributed tasks more smoothly than Spark’s own DAG scheduler.

Databricks

My notebooks in Databricks are so organized, even my future self thanks me. I wrangle data and run jobs with a click and a bit of luck.

Microsoft Azure

I confidently deploy scalable data pipelines on Azure—because why fear the cloud when you can automate it?

Airflow

I schedule data jobs and chase DAG errors before breakfast. My Airflow graphs look so good, they could hang in a data museum.

Power BI

I turn cold, hard data into dashboards so lively, even executives stare. If you spot a typo in a chart, it’s just to check if you’re paying attention!

Git

My commits are frequent and descriptive, unless it’s Friday at 5 PM. I resolve merge conflicts like a diplomat who pairs well with donuts.

Docker

I containerize with skill. My local builds are cleaner than my kitchen—both have great uptime!

Kubernetes

I orchestrate containers and pods—my clusters stay happy, except when they rebel (usually Mondays).

Cloud Storage

I store, retrieve, and secure petabytes of data—because the cloud should keep data dry, not developers waiting!

ETL

My ETL jobs transform messy data into analytics gold. And yes, I document every step—because chaos shouldn’t show up in production.

Data Warehousing

I architect data warehouses sturdy enough to survive quarterly reviews (and random pivots from management).

Contact Me

Email

Inbox zero is a work in progress—send me your thoughts, questions, or favorite Python memes. I reply faster if you say “urgent bug” or offer coffee.

LinkedIn

I enjoy connecting with fellow data enthusiasts, recruiters, and the occasional motivational speaker. If your message includes “cutting-edge tech,” I’ll probably reply within the hour!

GitHub

Check out my code, open issues, or fork a repo! I review pull requests with the same care I use to debug production code (that is, a lot).

Reach out, collaborate, or just say hi. I promise, my responses are more reliable than your WiFi!