This project demonstrates a complete end-to-end data analytics lifecycle for a complex healthcare dataset. The primary goal was to transform raw operational data into actionable business intelligence for improving cost management and patient satisfaction.
The solution showcases the crucial integration of Data Modeling (Star Schema), Advanced SQL, and Statistical Analysis in Python to validate hypotheses, identify key cost drivers, and deliver strategic recommendations.
The process involved:
- Data Understanding and profiling in Python (Pandas).
- Designing and implementing a Star Schema in MySQL.
- Performing rigorous Data Cleaning and Transformation using advanced SQL techniques.
- Integrating SQL and Python for Statistical Analysis and validation.
- Drawing Strategic Insights and preparing final reports.
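The modeling step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual DDL: it uses Python's built-in `sqlite3` in place of MySQL so it runs without a server, and every table and column name (`dim_patient`, `dim_provider`, `fact_visit`, `visit_cost`) as well as the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Two dimension tables and one fact table that references them.
conn.executescript("""
CREATE TABLE dim_patient (
    patient_id INTEGER PRIMARY KEY,
    age_group  TEXT NOT NULL
);
CREATE TABLE dim_provider (
    provider_id INTEGER PRIMARY KEY,
    specialty   TEXT NOT NULL
);
CREATE TABLE fact_visit (
    visit_id    INTEGER PRIMARY KEY,
    patient_id  INTEGER NOT NULL REFERENCES dim_patient(patient_id),
    provider_id INTEGER NOT NULL REFERENCES dim_provider(provider_id),
    visit_cost  REAL NOT NULL
);
""")

conn.executemany("INSERT INTO dim_patient VALUES (?, ?)",
                 [(1, "18-40"), (2, "65+")])
conn.executemany("INSERT INTO dim_provider VALUES (?, ?)",
                 [(1, "Cardiology"), (2, "Orthopedics")])
conn.executemany("INSERT INTO fact_visit VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 250.0), (2, 2, 1, 400.0), (3, 2, 2, 150.0)])
conn.commit()

# Typical star-schema query: join the fact table to a dimension and aggregate.
rows = conn.execute("""
    SELECT p.specialty, SUM(f.visit_cost) AS total_cost
    FROM fact_visit AS f
    JOIN dim_provider AS p ON p.provider_id = f.provider_id
    GROUP BY p.specialty
    ORDER BY p.specialty
""").fetchall()

# A fact row pointing at a non-existent patient should be rejected.
try:
    conn.execute("INSERT INTO fact_visit VALUES (4, 99, 1, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The foreign keys are what keep the fact table consistent with its dimensions; in MySQL the equivalent constraints would be declared on an InnoDB table.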
The dataset consists of eight related operational tables covering patient visits, costs, and providers. The data is used strictly for educational and portfolio purposes. Note: Due to the sensitivity of healthcare data and licensing, the raw data files are not shared in this repository.
This repository contains all code and final outputs for the project.
| Folder/File | Description |
|---|---|
| `1_Healthcare_Data_CSV_Files/` | (NOT INCLUDED in repo due to license/sensitivity) Placeholder for the 8 raw operational CSV tables. |
| `2_Healthcare_Data_Final_Deliverables/` | Contains the finalized outputs, reports, and code scripts. |
| `README.md` | Project documentation and overview (this file). |
- `01_Data_Understanding_and_Profiling.ipynb` (Jupyter Notebook)
- `02_Star_Schema_DDL_and_Transformation.sql` (SQL Script)
- `03_Advanced_SQL_EDA.sql` (SQL Script)
- `04_Python_Statistical_Analysis.ipynb` (Jupyter Notebook)
- `05_Final_Analysis_Report.pdf` (PDF Report)
- `06_Presentation_Slides.pptx` (PowerPoint Slides)
- Data Modeling: Star Schema design, Relational Database implementation, Normalization, Primary/Foreign Key Constraint Management.
- Advanced SQL: Window Functions, Subqueries, `CASE` logic for conditional aggregation, Data Cleaning and Transformation.
- Python: Pandas for Data Understanding and ETL staging, Statistical Analysis (Correlation/Distribution), Seaborn for Visualization.
- Business Intelligence: Translating complex findings into actionable insights and strategic recommendations for business stakeholders.
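As a rough illustration of two of the SQL techniques listed above, the sketch below runs a `CASE`-based conditional aggregation and a `RANK()` window function. It is not taken from the project scripts: it uses Python's built-in `sqlite3` instead of MySQL (window functions need SQLite 3.25 or newer), and the `visits` table, its columns, the 400 cost threshold, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE visits (
        visit_id   INTEGER PRIMARY KEY,
        department TEXT,
        cost       REAL,
        satisfied  INTEGER  -- 1 = satisfied, 0 = not satisfied
    )
""")
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", [
    (1, "Cardiology", 500.0, 1),
    (2, "Cardiology", 300.0, 0),
    (3, "Orthopedics", 200.0, 1),
    (4, "Orthopedics", 400.0, 1),
])

# Conditional aggregation: CASE picks which rows contribute to each SUM.
summary = conn.execute("""
    SELECT department,
           COUNT(*) AS visit_count,
           SUM(CASE WHEN satisfied = 1 THEN 1 ELSE 0 END) AS satisfied_visits,
           SUM(CASE WHEN cost >= 400 THEN cost ELSE 0 END) AS high_cost_total
    FROM visits
    GROUP BY department
    ORDER BY department
""").fetchall()

# Window function: rank visits by cost within each department,
# keeping every row instead of collapsing them as GROUP BY would.
ranked = conn.execute("""
    SELECT visit_id, department, cost,
           RANK() OVER (PARTITION BY department ORDER BY cost DESC) AS cost_rank
    FROM visits
    ORDER BY department, cost_rank
""").fetchall()

for row in summary:
    print(row)
for row in ranked:
    print(row)
```

The same two queries should carry over to MySQL 8.0 or later, which also supports window functions.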
- Database Design Precedes Analysis: Designing a scalable Star Schema before any analysis protects data integrity and makes querying easier.
- SQL-Python Synergy: SQL handles transformation efficiently, while Python is better suited to granular Statistical Modeling and hypothesis validation.
- Actionable Strategy: Analytical metrics need to link directly to strategic outcomes (Cost Management, Quality Improvement) to have business impact.
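The SQL-Python split described above might look like the following sketch: SQL reduces visit-level rows to one row per patient, and Python then tests whether two of the resulting metrics move together. Everything here is assumed for illustration, including the `visits` table, the `length_of_stay` and `cost` columns, and the sample rows. The project itself names Pandas for the statistical step; a hand-written Pearson correlation is swapped in here so the example needs only the standard library.

```python
import math
import sqlite3

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# In-memory SQLite stands in for MySQL; table, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (patient_id INTEGER, length_of_stay REAL, cost REAL)"
)
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", [
    (1, 2, 1000.0), (1, 4, 2000.0),
    (2, 1, 600.0),
    (3, 5, 2400.0), (3, 7, 3600.0),
])

# Step 1 (SQL): collapse visit-level rows into one row per patient.
per_patient = conn.execute("""
    SELECT patient_id,
           AVG(length_of_stay) AS avg_stay,
           AVG(cost)           AS avg_cost
    FROM visits
    GROUP BY patient_id
    ORDER BY patient_id
""").fetchall()

# Step 2 (Python): check whether longer stays go with higher cost.
stays = [row[1] for row in per_patient]
costs = [row[2] for row in per_patient]
r = pearson(stays, costs)
print(f"Pearson r between average stay and average cost: {r:.3f}")
```

With real data, `pandas.read_sql_query` followed by `DataFrame.corr()` would cover the same two steps in fewer lines.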
- You can also view this project on my personal portfolio website: Wix Portfolio - Healthcare Data Analytics
Mukesh Shirke
- [GitHub Profile Link]
- [Portfolio Site Link]
- [LinkedIn Profile Link]
This project is for educational and portfolio purposes only.