NYC House Price Prediction Model
Year: 2024
Explored COVID-19’s impact on NYC real estate by cleaning and visualizing sales data, identifying key features, and predicting house prices using Lasso Regression. Applied hypothesis testing, and used binary encoding to transform categorical features.
Skills: Data cleaning and visualization, Hypothesis testing, Feature engineering, Predictive modelling
Project Overview:
The goal of this project was to explore the impact of COVID-19 on NYC real estate prices and to build a reliable model for predicting house prices. I cleaned the NYC house sales data, conducted hypothesis tests, and explored and visualized the data to identify the features most relevant to house prices. I then converted categorical and other non-numerical features into numerical values using binary encoding and built the prediction model with Lasso Regression.
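As a rough illustration of the binary encoding step, the sketch below maps each category to an integer and spreads that integer's bits across a few 0/1 columns. It is not the project's actual code: the helper name binary_encode and the sample borough column are assumptions made for the example, and a library encoder may assign codes differently.

```python
def binary_encode(values):
    """Binary-encode a list of category labels.

    Returns (rows, codes): rows holds one list of 0/1 bits per input value,
    codes maps each category to the integer whose bits were used.
    """
    categories = sorted(set(values))
    codes = {category: index for index, category in enumerate(categories)}
    # Number of bits needed to represent the largest code (at least one column).
    width = max(1, (len(categories) - 1).bit_length())
    rows = [
        [(codes[value] >> bit) & 1 for bit in reversed(range(width))]
        for value in values
    ]
    return rows, codes


# Hypothetical categorical column; the real dataset's columns are not shown here.
boroughs = ["Manhattan", "Brooklyn", "Queens", "Bronx", "Staten Island", "Brooklyn"]
encoded, mapping = binary_encode(boroughs)

for name, bits in zip(boroughs, encoded):
    print(f"{name:<14} -> {bits}")
```

Five categories fit into three columns here, whereas one-hot encoding would need five, which is why binary encoding is often chosen when a categorical feature has many levels.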
Reflection:
This project marked my first significant experience with feature engineering, as I had previously focused on deep learning models and neural networks. In hindsight, I realize I was too cautious about removing features that were only loosely related to house prices. The resulting dataset had too many dimensions, which complicated the model and left its performance below the target threshold.
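To make the role of loosely related features more concrete, here is a minimal sketch of Lasso fitted by cyclic coordinate descent on made-up data. It is an illustration only, not the model built in the project: the function names, the toy data, and the penalty value alpha=0.5 are all assumptions, and the columns are assumed to be centred with no intercept.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; exactly 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimise (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1.

    Assumes centred columns, no intercept, and no all-zero column.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared length of feature j
            for i in range(n):
                prediction = sum(X[i][k] * w[k] for k in range(p))
                partial_residual = y[i] - prediction + X[i][j] * w[j]
                rho += X[i][j] * partial_residual
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n)
    return w


# Toy data (assumed): feature 0 drives the target, feature 1 barely matters.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [3.0 * strong + 0.1 * weak for strong, weak in X]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
print(weights)
```

On this toy data the strong feature's coefficient shrinks from 3 to about 2.75, while the weak feature's coefficient becomes exactly zero. The penalty can prune some weak features on its own, but how many it removes depends on alpha and on how the features correlate, so it does not replace deliberate feature selection beforehand.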
If I were to approach a similar project again, I would prioritize deeper domain research to better understand the factors that truly influence the target variable, in this case house prices. This would enable more precise feature selection and, in turn, a more streamlined and effective model. It was a valuable lesson, and I'm now better prepared to build more efficient models in the future.
Skills Acquired:
Data Science, Data Visualization, Hypothesis Testing, Feature Engineering, Data Cleaning, Model Evaluation