Available for new roles

Durai Shanmugaraj

Data Engineer · Walmart Global Tech

4 years building large-scale data pipelines, ETL systems, and data warehouse solutions. Specializing in PySpark, Kafka, BigQuery, and cloud-native architectures, with pipelines processing 10M+ daily transactions.

10M+ Daily Transactions
500GB Data / Day
50% Cost Reduction
4 yrs Experience

Experience

May 2025 – Present · Walmart Global Tech · Chennai, India

Data Engineer III

  • Designed a cross-system earnings reconciliation framework that integrates Offer, Driver Management, Earnings, and Payments data into a single source of truth for driver payouts.
  • Led development of NEC and INC driver tax dashboards, providing Ops teams with real-time visibility for regulatory compliance.
  • Uncovered $2M+ payment gaps by reconciling incentive calculation logic across LMD systems, enabling corrected payouts.
  • Ingested 500K+ daily JSON events (100GB/day) from Kafka into Google Cloud Storage for scalable driver earnings analytics.
  • Created 10+ BigQuery external/managed tables and materialized views, reducing ad-hoc query setup time by 40%.
  • Designed optimized BigQuery transformations and partitioned models, reducing query latency by 35%.
  • Built and managed 10+ Scala Spark pipelines processing 500GB/day with complex transformation logic.
  • Deployed 10+ Airflow DAGs automating data workflows, reducing manual intervention by 60%.
  • Reduced infrastructure cost by 50% by migrating persistent Dataproc clusters to Dataproc Serverless batch workloads.
Kafka · BigQuery · Scala Spark · Airflow · GCP · Dataproc

June 2022 – May 2025 · LTIMindtree · Client: Citi Bank

Data Engineer

  • Designed a scalable risk detection pipeline using PySpark, Hive, Presto, and AWS S3, reducing false positives by 15% on 10M+ daily transactions across 5+ regions.
  • Migrated Oracle data warehouse to AWS EMR for AML monitoring, reducing query times by 20% and saving $50K/year in licensing costs.
  • Transformed and structured 10M+ daily financial records into JSON format for Investigation team risk analysis workflows.
  • Developed AML detection rules in PySpark, identifying 2K+ fraudulent transactions per month.
  • Optimized Spark jobs, reducing processing time by 20% for a 1TB dataset (3B records).
  • Resolved 95% of SIT/UAT issues using Hive, Presto, and Spark SQL, ensuring smooth production deployment.
PySpark · AWS EMR · Hive · Presto · S3 · AML

Technical Skills

Languages

Python · Scala · SQL · PySpark · Shell

Big Data

Apache Spark · Apache Kafka · Apache Hive · Presto/Trino · Apache Hudi · Delta Lake

Cloud

AWS · GCP · Azure · BigQuery · Databricks · Dataproc · AWS EMR

Scheduling & CI/CD

Apache Airflow · Autosys · Jenkins · GitHub · Astronomer

Data Warehousing

Data Lakes · PostgreSQL · MySQL · Star Schema · SCD

Emerging Tech

Generative AI · Agentic AI · Data Mesh · DataOps · GitHub Copilot

Awards & Certifications

Shooting Star Award

Recognized for innovative Spark optimizations that reduced cluster costs by 30% and cut pipeline latency by 20%.

Super Crew Spot Award

Honored for rapid upskilling in AWS cloud, enabling seamless migration of 10+ legacy pipelines to AWS EMR within 3 months.

Databricks Certified

Associate Developer for Apache Spark 3.0

AWS Certified

Cloud Practitioner — Amazon Web Services

Education

Panimalar Engineering College

Bachelor of Engineering — Computer Science and Engineering

June 2018 – May 2022 · Chennai, Tamil Nadu

Let's work together

Open to senior Data Engineer and Staff DE roles at product-based companies.

+91 88258 45010