AI CASE STUDY

Predicting home prices with machine learning magic



Role: AI Specialist
Tools: RapidMiner
PDF version: AI Report (PDF)


Project overview

In this academic project, I partnered with Karina Diana Templer to develop a predictive model for estimating housing prices. Framed as work for a fictional real estate agency, the objective was to improve pricing accuracy using supervised machine learning. We used a housing dataset from Melbourne, Australia (sourced via Kaggle) so that the models faced real-world complexity and the approach could plausibly carry over to other markets.

Methods

We explored two supervised learning models:
  • Multiple Linear Regression: Used for its simplicity and interpretability, this model analyzed how multiple property features affect selling prices.
  • Random Forest: An ensemble model combining decision trees, which delivered stronger performance in terms of R² score and RMSE, making it a more reliable choice for our predictive goals.
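To make the two evaluation metrics concrete, here is a minimal Python sketch of how R² and RMSE are calculated. The project itself was built in RapidMiner, so this is only an illustration: the function names and the sample prices are made up for the example and are not taken from the Melbourne dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: the typical size of a prediction error, in price units."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Coefficient of determination: the share of price variance the model explains."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative prices only (hypothetical, not from the project data).
actual_prices = [500_000, 750_000, 1_000_000, 1_250_000]
predicted_prices = [550_000, 700_000, 1_050_000, 1_200_000]

print(rmse(actual_prices, predicted_prices))       # every error is 50,000, so RMSE is 50000.0
print(r_squared(actual_prices, predicted_prices))  # close to 1 means most variance is explained
```

A lower RMSE and a higher R² both point to a better fit, which is why the two are usually read together when comparing regression models.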

Data preparation

The dataset required careful preprocessing to improve the accuracy and reliability of our models. We began by identifying and removing records with missing or inconsistent values that could skew predictions. Next, we filtered out extreme outliers that didn’t reflect typical market behavior, so that the models would generalize better. Finally, we excluded non-essential features that added noise rather than insight, keeping the dataset lean and focused on the most influential variables.
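The three preprocessing steps could look roughly like this in plain Python. This is a sketch under assumptions, not the RapidMiner process used in the project: the column names (Rooms, Price, Address), the sample rows, and the 1.5 × IQR outlier rule are all chosen for illustration, since the exact thresholds and features are not specified here.

```python
import statistics


def drop_incomplete(rows, required):
    """Step 1: remove records that are missing any required value."""
    return [r for r in rows if all(r.get(k) is not None for k in required)]


def filter_price_outliers(rows, key="Price", k=1.5):
    """Step 2: keep rows whose price lies within k * IQR of the quartiles (assumed rule)."""
    values = [r[key] for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if low <= r[key] <= high]


def keep_features(rows, features):
    """Step 3: drop columns that are not used for modelling."""
    return [{k: r[k] for k in features} for r in rows]


# Hypothetical sample: one missing price and one extreme price.
prices = [600_000, 650_000, 700_000, 750_000, 800_000,
          850_000, 900_000, 9_000_000, None]
raw_rows = [
    {"Rooms": 2 + i % 3, "Price": p, "Address": f"{i} Example St"}
    for i, p in enumerate(prices)
]

complete = drop_incomplete(raw_rows, required=["Rooms", "Price"])
typical = filter_price_outliers(complete)
cleaned = keep_features(typical, features=["Rooms", "Price"])

print(len(raw_rows), len(complete), len(cleaned))  # 9 8 7
```

The order matters: missing values are removed first so that the quartiles used for the outlier filter are computed on complete records only.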

Ethical considerations

We prioritized responsible AI principles:
  • Fairness: Minimizing bias in data and predictions.
  • Transparency: Making model logic and outcomes understandable.
  • Accountability: Considering the broader impact of automated pricing decisions.

Results & model performance

To evaluate model performance, we split the dataset into 70% for training and 30% for testing. Random Forest fit the training data more closely, while Linear Regression held up more consistently on unseen data.

Random Forest achieved 86% accuracy on the training set and 71% on the test set. The 15-point drop suggests the model learned some patterns specific to the training data that did not carry over to unseen properties.

Linear Regression reached 77% accuracy on the training set and 73% on the test set. Its training score was lower, but the gap was only 4 points and its test score was marginally higher.

These results suggest that Random Forest is more powerful at learning complex patterns, while Linear Regression offers a better balance between training and test accuracy, so each model is valuable depending on the use case.
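The 70/30 split described above can be sketched in a few lines of Python. Again, this is an illustration rather than the RapidMiner setup used in the project: the helper name, the fixed seed, and the placeholder records are assumptions made for the example.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows, then cut it into training and test portions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder records standing in for cleaned property rows.
records = list(range(100))
train, test = train_test_split(records)

print(len(train), len(test))  # 70 30
```

Shuffling before cutting avoids a split that simply follows the order of the file, and scoring each model on both portions is what exposes a gap between training and test results.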

Potential applications

A predictive model of this kind could support strategic decision-making across the housing sector. It could help real estate investors evaluate property potential, give appraisers pricing benchmarks, inform urban planners' development strategies, and offer insights to market analysts. With continued refinement and training on more diverse datasets, the model could be extended to other regions and become a versatile tool for real estate markets beyond Melbourne.