E-Commerce - Data Engineering

Tech Stack: Python, AWS, PySpark, MySQL, Tableau, ETL

  • Extracted an e-commerce dataset, loaded it into a database, transformed it according to business requirements, and stored the results in a data warehouse connected to Tableau to visualize historical trends.
  • Set up an RDS MySQL instance and optimized data ingestion from S3 to RDS using a Lambda function.
  • Automated the ETL job (Glue with PySpark) to distribute data processing for potentially larger datasets (400K rows).
  • Used S3 with Athena as a staging layer during the ETL job and loaded the transformed data into a Redshift warehouse.
  • Connected Redshift to Tableau and translated business requirements into actionable reports.
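
The S3-to-RDS ingestion step described above might look roughly like the sketch below. It is only an illustration under stated assumptions: the function names, the CSV layout, and the batch size are hypothetical, and the actual S3 download and MySQL insert calls are left out so the logic stays self-contained.

```python
import csv
import io
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def batched_rows(csv_text, batch_size=1000):
    """Yield lists of row tuples (header skipped) sized for batched inserts.

    batch_size is an assumed tuning knob, not a value from the project.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    batch = []
    for row in reader:
        batch.append(tuple(row))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```

Inside a Lambda handler, each batch could then be handed to a MySQL client's multi-row insert call, which typically reduces round trips compared with inserting one row at a time.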