Data Science Group Project
Topic: "Correlating Lifestyle Factors with Rising Stroke Incidence in Young Adults"
- Investigated how lifestyle factors (stress, alcohol consumption, smoking, and unhealthy eating behaviors) correlate with the rising incidence of stroke in young adults.
- Used Python libraries (Pandas, NumPy, SciPy, Matplotlib, and Seaborn) to clean, prepare, manipulate, and visualize 5 large datasets: Stress Data, Alcohol Consumption Data, Smoking Data, Unhealthy Eating Behaviors Data, and Stroke Data.
- Conducted statistical tests, including Chi-square tests and ANOVA, to identify statistically significant associations in each dataset.
- Applied classification and regression models (Linear Regression, Logistic Regression, K-Nearest Neighbors, Decision Trees, and Random Forest Classifiers) using Scikit-learn to the Stroke Data to explore how stress, alcohol consumption, smoking, and unhealthy eating behaviors relate to stroke incidence in young adults.
- Project link:
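
Illustrative sketch: the workflow described in the bullets above (a Chi-square test, a one-way ANOVA, then a comparison of Scikit-learn classifiers) might look roughly like the following. This is not the project's actual code. The data is synthetic, and the column names (smoker, heavy_drinker, stress_score, poor_diet, stroke), the effect sizes, and the helper function names are all assumptions made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, f_oneway
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical feature names; the real datasets may use different columns.
FEATURES = ["smoker", "heavy_drinker", "stress_score", "poor_diet"]


def make_synthetic_data(n=2000, seed=0):
    """Build a made-up dataset; it carries no real-world findings."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "smoker": rng.integers(0, 2, n),         # 0 = no, 1 = yes
        "heavy_drinker": rng.integers(0, 2, n),
        "stress_score": rng.normal(5.0, 2.0, n),
        "poor_diet": rng.integers(0, 2, n),
    })
    # Assumed (invented) effect sizes on the log-odds scale.
    logit = (-3.0 + 1.0 * df["smoker"] + 0.6 * df["heavy_drinker"]
             + 0.25 * df["stress_score"] + 0.5 * df["poor_diet"])
    prob = 1.0 / (1.0 + np.exp(-logit.to_numpy()))
    df["stroke"] = rng.binomial(1, prob)
    return df


def run_statistical_tests(df):
    """Chi-square test (smoker vs. stroke) and ANOVA (stress by stroke)."""
    table = pd.crosstab(df["smoker"], df["stroke"])
    chi2, p_chi, dof, _expected = chi2_contingency(table)
    groups = [g["stress_score"].to_numpy() for _, g in df.groupby("stroke")]
    f_stat, p_anova = f_oneway(*groups)
    return {"chi2": chi2, "p_chi": p_chi, "dof": dof,
            "f": f_stat, "p_anova": p_anova}


def compare_classifiers(df, seed=0):
    """Fit two classifiers and return held-out accuracy for each."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df["stroke"], test_size=0.25,
        stratify=df["stroke"], random_state=seed)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100,
                                                random_state=seed),
    }
    return {name: model.fit(X_train, y_train).score(X_test, y_test)
            for name, model in models.items()}


if __name__ == "__main__":
    data = make_synthetic_data()
    print(run_statistical_tests(data))
    print(compare_classifiers(data))
```

With real data, accuracy alone can be misleading when stroke cases are rare, so metrics such as recall or ROC AUC would likely be worth reporting alongside it.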