Xome is one of the largest home search engines in the world. At the heart of Xome’s mission is a constant search for a better way of doing things in real estate, which is one of the reasons Investopedia voted Xome Best Real Estate App for Auctions in 2023.
That includes leveling the playing field with the Xome auction platform, improving the number of home options on the app, and providing the Xome Value Estimate on properties through the Xome Valuation Model. All of these empower people to find clarity and comfort when making one of the largest and most important transactions they’ll make in their lifetime.
Our whole purpose at Xome is to help keep the dream of home ownership alive. But we can’t do that without the features, products, and teams that power that mission. Xome’s automated digital solutions are powered by data and machine learning, and we continue to invest more resources into the Xome platform to help our customers win.
Because we want to allow the Xome Valuation Model to learn and become smarter with every update, our teams use a machine learning lifecycle to quickly iterate enhancements and create a better experience for Xome’s users.
What is the machine learning lifecycle?
The machine learning lifecycle is an iterative process for developing and deploying machine learning models, including automated valuation models (AVMs). It consists of various stages that guide teams through a machine learning project to ensure that it is well-planned, efficient, and effective in addressing the problem it is attempting to solve.
Types of machine learning models
There are four main types of machine learning models that the lifecycle helps to develop and deploy:
- Supervised learning models
- Unsupervised learning models
- Semi-supervised learning models
- Reinforcement learning models
The learning model used specifically for AVMs is a supervised learning model, in which the model is trained on labeled data sets. Each data point has both input features and corresponding output labels. The model learns to map input features to the correct output labels during training and is then used to make predictions on new data.
Common types of supervised learning models include regression, recommendation, and classification models, all of which can be developed and deployed through the machine learning lifecycle.
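To make the idea of labeled data concrete, here is a minimal sketch of supervised learning in Python. It assumes a single input feature (square footage) and made-up sale prices as labels, and it fits a simple least-squares line. The function names and numbers are illustrative only and are not how the Xome Valuation Model is built.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one feature: label ~ slope * feature + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(model, x):
    """Apply the learned mapping to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept

# Labeled training data (hypothetical values): each input feature has an output label.
sqft = [1000, 1500, 2000, 2500]
price = [200_000, 300_000, 400_000, 500_000]

model = fit_line(sqft, price)      # training: learn the feature-to-label mapping
estimate = predict(model, 1800)    # prediction on a property the model has not seen
print(f"Estimated price for 1,800 sq ft: {estimate:,.0f}")
```

A production AVM would use many features and a far richer model, but the pattern is the same: learn from labeled examples, then predict on new data.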
The machine learning lifecycle, explained
The machine learning lifecycle has six core steps. It starts with defining the problem and planning the project. Next comes collecting, cleaning, processing, and managing the data that will be used to train and test the model. The third step is designing and engineering the model itself.
After engineering the model, the team evaluates it and then deploys it to development and production. The last step is to set up monitoring and maintenance to continually improve the model.
Let’s dive deeper into each step of the cycle, including the impact on AVMs like the one that powers the Xome Value Estimate.
1. Problem definition and planning
The planning phase is what lays the groundwork for a successful project. During this phase, key decisions and preparations are made to define the project’s objectives, scope, requirements, and feasibility.
This phase typically starts with defining the business problem and assessing whether it needs a machine learning solution in the first place. Business leaders and teams need to determine whether the data and resources required to develop the model are available, whether any legal constraints apply, and whether the proposed solution can be made robust and scalable.
If the project is judged worthwhile and feasible, it then moves into the next phase of collecting and preparing data.
2. Data collection and preparation
Data, training, and testing are three major components of machine learning. But before a model can be trained and tested, the data must first be collected and prepared for processing.
For real estate AVMs, features can include fields such as square footage, lot size, and number of bedrooms and bathrooms. An AVM tries to learn the relationship between these features and a property’s value.
For example, AVM vendors draw on the Home Price Index and other data sources for property data and price trends, which help the model learn the relationship between a property’s features and its price. The estimated valuation is the end result.
Next, the data needs to be cleaned to handle missing values, duplicates, and outliers that can negatively impact the performance of the machine learning model. Once the data is cleaned and its quality is verified, the next processing step is to examine the feature distributions and run statistical analysis on the data.
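The cleaning steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the record fields (`id`, `sqft`, `price`), the median fill for missing values, and the 1.5 × IQR outlier rule are all choices made for this example, not a description of Xome’s actual data pipeline.

```python
from statistics import median, quantiles

def clean(records):
    # 1. Duplicates: keep the first record seen for each property id.
    seen, unique = set(), []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            unique.append(r)

    # 2. Missing values: fill a missing square footage with the median of known values.
    fill = median(r["sqft"] for r in unique if r["sqft"] is not None)
    filled = [{**r, "sqft": fill if r["sqft"] is None else r["sqft"]} for r in unique]

    # 3. Outliers: drop prices outside 1.5 * IQR of the quartiles (an assumed rule).
    q1, _, q3 = quantiles([r["price"] for r in filled], n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [r for r in filled if low <= r["price"] <= high]

# Hypothetical raw records with one duplicate, one missing value, and one outlier.
raw = [
    {"id": 1, "sqft": 1000, "price": 200_000},
    {"id": 2, "sqft": None, "price": 300_000},
    {"id": 2, "sqft": None, "price": 300_000},
    {"id": 3, "sqft": 2000, "price": 400_000},
    {"id": 4, "sqft": 1500, "price": 310_000},
    {"id": 5, "sqft": 1800, "price": 9_000_000},
    {"id": 6, "sqft": 1200, "price": 250_000},
]

cleaned = clean(raw)
print([r["id"] for r in cleaned])
```

Whether to fill, drop, or flag a problematic record is a judgment call that depends on the data and the model, so treat the rules here as placeholders.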
Finally, the data must be managed in storage solutions such as databases or data warehouses, with pipelines that extract, transform, and load (ETL) data into them. Data maintenance and governance rules are then set to ensure that the data remains high quality, secure, and readily available.
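As a rough illustration of an extract, transform, and load pipeline, the sketch below reads a small CSV, converts the fields to typed values, and loads them into an in-memory SQLite database. The CSV contents, the `properties` table, and its columns are hypothetical and stand in for whatever sources and storage a real team would use.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; a real pipeline would read from files, APIs, or feeds.
RAW_CSV = """id,sqft,price
1,1000,200000
2,1500,300000
"""

def extract(text):
    """Extract: parse raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert string fields into typed tuples."""
    return [(int(r["id"]), int(r["sqft"]), float(r["price"])) for r in rows]

def load(conn, rows):
    """Load: write the rows into the storage layer."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS properties "
        "(id INTEGER PRIMARY KEY, sqft INTEGER, price REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO properties VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
row_count = conn.execute("SELECT COUNT(*) FROM properties").fetchone()[0]
print(row_count)
```

Using `INSERT OR REPLACE` keyed on the property id means the pipeline can be re-run without creating duplicate rows, which is one simple way to keep stored data consistent.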
3. Designing the model
The design and engineering phase uses all the information gathered in the planning and data collection phases to build the model. In this phase, data scientists and machine learning engineers design, build, and optimize the machine learning models to solve the specific problem defined in the earlier stages of the lifecycle.
The primary goal of the model engineering phase is to create a high-performing and accurate machine learning model. For the Xome Valuation Model, that goal is to accurately predict estimated valuations of properties.
First, identify the target variable to predict and select the type of model, whether it’s a basic regression model, recommendation algorithm, or classification model. Next, the model’s configuration settings, also called hyperparameters, are tuned to find the values that result in the best model performance.
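One common way to choose hyperparameters is a grid search: try each candidate value, score it on validation data, and keep the best. The sketch below assumes a deliberately simple k-nearest-neighbors regressor, where `k` is the hyperparameter, and made-up data. It is an illustration of the technique, not the model or the search procedure used for the Xome Valuation Model.

```python
def knn_predict(train, x, k):
    """Predict a price as the average of the k training points nearest in square footage."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(price for _, price in nearest) / k

def mean_abs_error(train, data, k):
    """Average absolute prediction error on `data` for a given k."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in data) / len(data)

# Hypothetical (square footage, price) pairs.
train = [(1000, 200_000), (1200, 240_000), (1500, 300_000),
         (1800, 360_000), (2000, 400_000), (2400, 480_000)]
valid = [(1100, 220_000), (1600, 320_000), (2200, 440_000)]

grid = [1, 2, 3, 5]  # candidate hyperparameter values to try
scores = {k: mean_abs_error(train, valid, k) for k in grid}
best_k = min(scores, key=scores.get)
print(best_k, round(scores[best_k]))
```

The key design point is that the candidates are scored on validation data the model was not fitted to, so the chosen setting reflects how well the model generalizes rather than how well it memorizes.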
Once the benchmark ML model is set, the model must be trained and validated using the preprocessed data from the data preparation phase. The training process for AVMs involves feeding the model input data and adjusting its weights and other internal parameters to minimize the prediction error, as measured by a loss function.
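The training loop described above can be sketched with gradient descent on a mean squared error loss. This example assumes a one-feature linear model, features scaled to thousands of square feet, prices scaled to hundreds of thousands of dollars, and a hypothetical learning rate and epoch count. Real AVMs use different model families and optimizers, so read this as a sketch of the idea only.

```python
def mse(w, b, xs, ys):
    """Loss function: mean squared error of the predictions w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, lr=0.1, epochs=2000):
    """Adjust weight w and bias b step by step to reduce the loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the MSE loss with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical scaled data: x in thousands of sq ft, y in hundreds of thousands of dollars.
xs = [1.0, 1.5, 2.0, 2.5]
ys = [2.0, 3.0, 4.0, 5.0]

initial_loss = mse(0.0, 0.0, xs, ys)
w, b = train(xs, ys)
final_loss = mse(w, b, xs, ys)
print(f"w={w:.3f}, b={b:.3f}, loss {initial_loss:.3f} -> {final_loss:.6f}")
```

Scaling the inputs keeps the gradient steps stable; with raw square footage and dollar values, the same learning rate would cause the updates to diverge.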
4. Evaluating the model
During model evaluation, the data is divided into three sets: training, validation, and test. The training set is used to optimize weights, while the validation set helps determine whether the model is overfitting or moving in the right direction.
The model then makes predictions on the unseen test set so its performance can be assessed. Real estate AVMs are typically compared with industry standards to ensure accuracy.
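A three-way split like the one described above might look like the following sketch. The 70/15/15 proportions and the fixed random seed are assumptions chosen for the example, and the list of integers stands in for real property records.

```python
import random

def split(rows, train_frac=0.70, valid_frac=0.15, seed=42):
    """Shuffle the rows, then cut them into train, validation, and test sets."""
    shuffled = rows[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for a reproducible split
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]     # whatever remains is held out for testing
    return train, valid, test

rows = list(range(100))  # placeholder for 100 property records
train_set, valid_set, test_set = split(rows)
print(len(train_set), len(valid_set), len(test_set))
```

For housing data, a team might prefer to split by sale date rather than at random so the test set reflects predicting future sales; that choice depends on how the model will be used.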
If the model is too complex or trains for too long on a specific sample, it can start to learn irrelevant information and be “overfitted.” When a model is overfitted, it’s unable to generalize effectively to new data and understand real patterns.
One way to detect this is to set aside part of the dataset as a test set that the model never sees during training. If the error rate is high on the test data but low on the training data, that could be a sign of overfitting.
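The train-versus-test comparison can be sketched as follows. To make the effect obvious, the example uses a toy “model” that simply memorizes its training examples, and it flags overfitting when the test error exceeds the training error by more than an assumed gap. The data, the memorizing model, and the gap threshold are all hypothetical.

```python
def mean_abs_error(model, data):
    """Average absolute error of `model` over (feature, label) pairs."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)

def looks_overfit(train_error, test_error, max_gap):
    """Flag a model whose test error is much higher than its training error."""
    return (test_error - train_error) > max_gap

# Hypothetical (square footage, price) pairs.
train = [(1000, 210_000), (1500, 290_000), (2000, 405_000), (2500, 495_000)]
test = [(1200, 240_000), (1800, 360_000), (2200, 440_000)]

# A toy model that memorizes training prices and falls back to the average otherwise.
lookup = dict(train)
avg_price = sum(price for _, price in train) / len(train)

def memorizer(sqft):
    return lookup.get(sqft, avg_price)

train_error = mean_abs_error(memorizer, train)  # perfect on data it has seen
test_error = mean_abs_error(memorizer, test)    # poor on data it has not seen
overfit = looks_overfit(train_error, test_error, max_gap=0.10 * avg_price)
print(train_error, test_error, overfit)
```

A low training error paired with a high test error is the signal to look for; the right size for the acceptable gap is something each team would calibrate for its own model.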
If the model’s performance is not satisfactory, further adjustments are made. The training and evaluation process may be repeated until an acceptable result is achieved.
5. Model deployment
After the model has been tested and validated and performs well on evaluation metrics, it’s deployed to development and production. The model should be integrated with the application or system where it will be used for predictions and classifications on new, unseen data.
This is an iterative phase where the model performance is evaluated and tested in production to further tune the model based on user acceptance.
6. Monitoring and maintenance
From there, the final stage is monitoring and maintaining the model. There should be automated monitoring procedures set up to measure the model’s performance constantly and ensure that it continues to work effectively and reliably.
For the Xome Valuation Model, this includes monitoring performance metrics, checking data quality for errors and biases, tracking the health of hardware and software, and reviewing the response from end users. Deep problems in a model can take time to surface, so it should also be tested against a variety of cases where problems could occur.
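An automated performance check could be as simple as the sketch below, which compares recent estimates with actual sale prices and raises a flag when the median absolute percentage error crosses a threshold. The metric choice, the 10% threshold, and the sample numbers are assumptions for illustration and do not describe Xome’s monitoring setup.

```python
from statistics import median

def median_abs_pct_error(estimates, sale_prices):
    """Median of |estimate - sale price| / sale price across recent sales."""
    return median(abs(e - s) / s for e, s in zip(estimates, sale_prices))

def needs_attention(estimates, sale_prices, threshold=0.10):
    """Return True when the error metric exceeds the (assumed) alert threshold."""
    return median_abs_pct_error(estimates, sale_prices) > threshold

# Hypothetical recent sales and two batches of model estimates.
sale_prices = [200_000, 300_000, 400_000]
healthy_estimates = [205_000, 310_000, 380_000]
drifted_estimates = [260_000, 390_000, 520_000]

healthy_alert = needs_attention(healthy_estimates, sale_prices)
drifted_alert = needs_attention(drifted_estimates, sale_prices)
print(healthy_alert, drifted_alert)
```

Running a check like this on a schedule, and alerting when it fires, is one way to notice that a model’s accuracy is slipping before end users do.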
Even when everything is working as expected, the work continues in order to keep improving the model. When improvements need to be made, the machine learning lifecycle restarts.
How Xome is solving machine learning challenges
The Xome Valuation Model strives to solve traditional machine learning challenges as one of the most advanced real estate valuation models in the industry. Xome’s model consists of several main models, built on smart AI, that use unbiased historical and updated property data.
That data is then used to adjust the estimated prices of a property and learn meaningful insights from the relationship between property features and different property types. The model continues to improve with every iteration, with dozens of sub-models that branch out from the main models with their own unique characteristics.
The Xome Valuation Model team is constantly evaluating and re-evaluating the data quality of input data to make sure it meets the business objective, while continually monitoring model performance. And that’s thanks to our dedication to the machine learning lifecycle as a mechanism for quick iteration, allowing us to update and enhance our model with speed and at scale.