Zillow is one of the largest real estate and rental marketplaces in the world, with a database of more than 100 million homes in the US. The company pioneered data-driven, automated home value estimates with the Zestimate.
On this episode, we speak with Zillow's Chief Analytics Officer and Chief Economist, Dr. Stan Humphries, to learn how Zillow uses data science, statistics, artificial intelligence, and big data to make real estate predictions as part of the digital transformation of real estate.
Dr. Stan Humphries is the chief analytics officer of Zillow Group, a portfolio of the largest and most vibrant real estate and home-related brands on the web and mobile. Stan is the co-author of the New York Times Best Seller “Zillow Talk: The New Rules of Real Estate.” Michael Krigsman is an industry analyst and host of CXOTALK.
For more information, see https://www.cxotalk.com/episode/data-science-zillow-stan-humphries-chief-analytics-officer
Check out more CXOTALK episodes: https://cxotalk.com/episodes
Follow us on Twitter: https://twitter.com/cxotalk
From the transcript:
(01:31) I’ve been with Zillow since the very beginning back in 2005, when what became Zillow was just a glimmer in our eye. Back then, I worked a lot on algorithms and some product development pieces; kind of a lot of the data pieces within the organization. We launched Zillow in February of 2006, and back then, I think people familiar with Zillow now may not remember that during our first couple of years, between 2006 and 2008, all you could find on Zillow was really the public record information about homes, displayed on a map. And then a Zestimate, which is an estimate of the home value of every single home, and then a bunch of housing indices to help people understand what was happening to prices in their local markets. But we really grew the portfolio of offerings to help consumers from there and added in, ultimately, For Sale listings, a mortgage marketplace, a home improvement marketplace, and then, along the way, also brought in other brands. So now, Zillow Group includes not only the Zillow brand itself, Zillow.com, but also Trulia, as well as StreetEasy in New York, Naked Apartments, which is a rental website in New York, HotPads, and a few other brands as well. So it’s really kind of grown over the years, and last month, all those brands combined drew about 171 million unique users online. So, it’s been a lot of fun kind of seeing it evolve over the years.
(06:13) How has the Zestimate changed?
(06:19) If you look at when we first rolled out in 2006, the Zestimate was a valuation that we placed on every single home that we had in our database at that time, which was 43 million homes. And, in order to create that valuation on 43 million homes, it ran about once a month and we pushed a couple terabytes of data through about 34 thousand statistical models, which, compared to what had been done previously, was an enormously more computationally sophisticated process. But if you flash forward to today — well, actually, I should first give you context on what our accuracy was back then. Back in 2006 when we launched, we were at about 14% median absolute percent error on 43 million homes. What we've done since is gone from 43 million homes to 110 million homes today, where we put valuations on all 110 million homes. And we've driven our accuracy down to about 5% today, which we think, from a machine learning perspective, is actually quite impressive, because those 43 million homes that we started with in 2006 tended to be in the largest metropolitan areas where there was a lot of transactional velocity. There were a lot of sales and price signals with which to train the models.
(07:52) As we went from 43 million to 110 million homes, you're now getting out into places like Idaho and Arkansas where there are just fewer sales to look at. And it would have been impressive if we had simply kept our error rate at 14% while getting out to places that are harder to estimate. But not only did we more than double our coverage from 43 to 110 million homes, we also almost tripled our accuracy, bringing the error from 14% down to 5%.
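As an aside for readers unfamiliar with the metric: median absolute percent error, the accuracy figure quoted above, is the median of each home's absolute estimation error expressed as a percentage of its actual sale price. The sketch below (illustrative only, not Zillow's code; the toy numbers are made up) shows how it can be computed:

```python
# Illustrative sketch: computing median absolute percent error,
# the accuracy metric discussed in the interview.

def median_absolute_percent_error(estimates, sale_prices):
    """Median of |estimate - actual| / actual, expressed as a percent."""
    errors = sorted(
        abs(est - actual) / actual * 100
        for est, actual in zip(estimates, sale_prices)
    )
    mid = len(errors) // 2
    if len(errors) % 2:
        return errors[mid]
    # Even number of homes: average the two middle errors.
    return (errors[mid - 1] + errors[mid]) / 2

# Toy data: estimated values vs. eventual sale prices.
estimates = [310_000, 455_000, 198_000, 720_000]
sale_prices = [300_000, 430_000, 210_000, 700_000]
print(round(median_absolute_percent_error(estimates, sale_prices), 2))  # 4.52
```

Because it uses the median rather than the mean, the metric is robust to a handful of wildly mis-valued homes, which matters when scoring accuracy across a database of 110 million properties.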
(08:22) Now, the hidden story of how we were able to achieve that was basically by throwing enormously more data at the problem, collecting more data, and getting a lot more sophisticated algorithmically in what we're doing, which requires us to use more computers. Just to give context: I said that back when we launched, we built 34 thousand statistical models every single month. Today, we update the Zestimate every single night, and in order to do that, we generate somewhere between 7 and 11 million statistical models every single night. Then, when we're done with that process, we throw them away, and we repeat again the next night. So, it's a big data problem.