F1 Data Analysis with Azure
Tech Stack: Python, Azure Databricks, PySpark, Spark SQL, Azure Data Lake Storage (ADLS), Delta Lake, Azure Data Factory
GitHub URL: Project Link
- Analyzed Formula 1 (F1) racing data (1950 to present) sourced from the Ergast Developer API, using Azure Databricks, PySpark, and Spark SQL.
- Imported the data into Azure Data Lake Storage (ADLS), then used Databricks notebooks to ingest it into a raw layer, applying an explicit schema and storing it in the columnar Parquet format.
- Transformed the ingested data in Databricks notebooks and loaded the results into the presentation layer, where they feed interactive dashboards for analysis.
- Scheduled and monitored the data pipeline with Azure Data Factory, and later migrated the pipeline to a Delta Lakehouse architecture to support GDPR compliance and enable time travel.
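
The ingestion and time travel steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the mount point `/mnt/formula1`, the CSV source format, the `circuits` table, its column subset, and the helper names `layer_path`, `ingest_circuits`, and `read_delta_version` are all assumptions made for the example. The `spark` argument is the SparkSession that a Databricks notebook provides.

```python
# Hypothetical mount point for the ADLS container (not from the project itself).
BASE_PATH = "/mnt/formula1"

# Illustrative subset of columns for a circuits file, written as a Spark DDL
# schema string. The real project schema may differ.
CIRCUITS_SCHEMA = (
    "circuitId INT, circuitRef STRING, name STRING, "
    "location STRING, country STRING"
)


def layer_path(layer: str, table: str) -> str:
    """Build the storage path for a table in a layer such as raw or presentation."""
    return f"{BASE_PATH}/{layer}/{table}"


def ingest_circuits(spark, source_file: str) -> str:
    """Read a CSV file with an explicit schema and store it as Parquet in the raw layer."""
    df = (
        spark.read
        .option("header", True)
        .schema(CIRCUITS_SCHEMA)  # apply the schema instead of inferring it
        .csv(source_file)         # assumes the source file is CSV
    )
    target = layer_path("raw", "circuits")
    df.write.mode("overwrite").parquet(target)
    return target


def read_delta_version(spark, path: str, version: int):
    """Delta Lake time travel: load the table as it was at the given version."""
    return spark.read.format("delta").option("versionAsOf", version).load(path)
```

In a notebook, `ingest_circuits(spark, "<path to source file>")` would write the raw-layer Parquet output, and `read_delta_version(spark, path, 0)` would return the first committed version of a Delta table at `path`, assuming that table was written in Delta format.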