In an attempt to be a bit more inclusive with my blog I’ve resolved to push out content that offers some commentary on issues that might be of interest to more than just a community of R users (don’t worry, I still sit in front of the R machine for like 10 hrs a day so the quasi-technical and programming stuff won’t go away). Some things I’m going to try to comment more on:

  • Interesting posts from the blog-o-sphere involving natural resources or public policy
  • Cool journal articles I’ve read
  • Interesting popular media takes on data or data analysis in general

My first order of business is a book review of sorts. Since NPR won’t stop hounding me about all the cool shit (data analytics, real estate economics) in the book Zillow Talk: The new rules of real estate, I decided to read it. I was underwhelmed.

Let me get something out of the way right off the bat: If this were a cool coffee table book about trends in real estate pricing data I would not have written this post.

However, I do not believe this was just supposed to be a cool coffee table book about real estate trends. The foreword and introduction to the book make it quite clear that the authors believe:

  1. They have better data on real estate transactions than everybody else
  2. They have the technical chops to analyze that data in ways it has never been leveraged before.
  3. The trends they are presenting are not simply interesting correlations. They have done the statistically rigorous work of uncovering the drivers of interesting real estate phenomena.

Ok, so that’s my jumping off point: if the book were just about cool trends I would enjoy it, but it’s not. It’s about statistically significant effects that Zillow has uncovered in their data mining….and I’m going to review it as such.

Next, the obligatory disclaimer: Zillow does objectively have a ton of smart people working for them. They have people with Ph.D’s in Economics from schools that I could never get into. It is entirely plausible that everything in the book is rock solid and way better than anything I could ever do. If that is indeed the case, I will point out that none of that rock-solid analysis made it into the book. Despite the fact that they make some pretty bold claims about what they have ‘uncovered using the data’, the authors don’t really show very much of their work.

Two things are going to come up frequently in the following discussion. If you are familiar with concepts of ‘controlling’ for factors in empirical analysis and establishing ‘marginal effects’ you should probably skip this. If not, here is a brief crash course on some important statistical jargon:

A quick aside on what I mean by “control for”: let’s start with the basic hedonic pricing model. This model says that the price per square foot of a home is a function of its attributes. A really simple version of this model would say that the price/sq.ft of home i, in location j, P_{ij} is determined by the number of bedrooms and the number of bathrooms.

P_{ij}= \alpha + \beta_{1}Bedrooms + \beta_{2}Bathrooms

In a super simplified example where

  • \beta_{1}= value of an additional bedroom = $60
  • \beta_{2}= value of an additional bathroom = $50
  • \alpha= intercept term = $100

Ok, so what does it mean to control for something? If we observe that homes with 3 bedrooms tend to sell for $110/sq.ft more than homes with 2 bedrooms we might be tempted to say that adding a bedroom adds $110/sq.ft to the value of a home. But homes with more bedrooms tend to have more bathrooms. So can we really add $110/sq.ft to the value of a home just by adding a bedroom? Or are we confusing the value of an extra bedroom with the value of an additional bathroom?

The way to answer this question is to ‘control’ for the number of bathrooms when producing our estimate of the value of an additional bedroom. We ‘control’ for the presence of a variable in a model by adding that variable into the model.

The statistical principle at work here is partial correlation. That is, we observe that home prices and number of bedrooms in the home are positively correlated and that homes with 3 bedrooms tend to sell for around $110/sq.ft more than homes with 2 bedrooms. But it’s incomplete to say that an additional bedroom causes a $110/sq.ft rise in home price because number of bedrooms is positively correlated with a third variable: number of bathrooms.
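
To make the ‘control’ idea concrete, here is a minimal Python sketch. The six homes are made up (hypothetical, noise-free data generated from the toy coefficients above) and the helper names are mine. With these numbers, regressing price on bedrooms alone should give roughly $110, while adding bathrooms to the model should recover the $60 bedroom effect.

```python
# Hypothetical, noise-free homes built from the toy model in this post:
# price/sq.ft = 100 + 60*bedrooms + 50*bathrooms
ALPHA, BETA_BED, BETA_BATH = 100, 60, 50

# (bedrooms, bathrooms): homes with more bedrooms tend to have more bathrooms
homes = [(2, 1), (2, 2), (3, 2), (3, 3), (4, 3), (4, 4)]
beds = [bed for bed, _ in homes]
baths = [bath for _, bath in homes]
prices = [ALPHA + BETA_BED * bed + BETA_BATH * bath for bed, bath in homes]


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance; cov(xs, xs) is the variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Naive model: price on bedrooms only (bathrooms left out).
naive_bed = cov(beds, prices) / cov(beds, beds)

# Controlled model: price on bedrooms AND bathrooms.
# Solve the two-variable least squares normal equations by hand.
s_bb, s_tt, s_bt = cov(beds, beds), cov(baths, baths), cov(beds, baths)
s_by, s_ty = cov(beds, prices), cov(baths, prices)
det = s_bb * s_tt - s_bt ** 2
controlled_bed = (s_tt * s_by - s_bt * s_ty) / det
controlled_bath = (s_bb * s_ty - s_bt * s_by) / det

print(f"bedroom effect, no control:           {naive_bed:.1f}")
print(f"bedroom effect, controlling for bath: {controlled_bed:.1f}")
print(f"bathroom effect:                      {controlled_bath:.1f}")
```

The naive number is inflated because, in this made-up sample, each extra bedroom comes with one extra bathroom on average, so the bedroom coefficient soaks up the bathroom value too.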

A related concept is the ‘marginal effect’ of a variable. The marginal effect of an additional bedroom on the value of a home in our example is $60. ‘Marginal Effect’ basically means: if we held everything else in the model constant, what is the effect of a 1 unit change in the variable of interest? For this example think of it like this:

A house with 3 bedrooms and 2 bathrooms would be worth $380/sq.ft according to our model:

P_{ij} = 100 + (60*3) + (50*2) = 380

A house with 4 bedrooms and 2 bathrooms (adding a bedroom but keeping everything else the same) is worth:

P_{ij} = 100 + (60*4) + (50*2) = 440

And a house with 4 bedrooms and 3 bathrooms (one more of each relative to the first house):

P_{ij} = 100 + (60*4) + (50*3) = 490

The marginal effect of an additional bedroom is $60 because that is the increase in home price we observe if we add 1 bedroom and keep everything else in the model the same.
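
The same arithmetic as a few lines of Python, in case that is easier to poke at. This is only a sketch of the toy model; the function name `price_per_sqft` is mine, not anything from the book.

```python
# Coefficients from the toy model in this post (price per sq.ft).
ALPHA, BETA_BED, BETA_BATH = 100, 60, 50


def price_per_sqft(bedrooms, bathrooms):
    """Toy hedonic model: intercept plus a value per bedroom and per bathroom."""
    return ALPHA + BETA_BED * bedrooms + BETA_BATH * bathrooms


base = price_per_sqft(3, 2)        # 3 bed, 2 bath
extra_bed = price_per_sqft(4, 2)   # add a bedroom, hold bathrooms fixed
extra_both = price_per_sqft(4, 3)  # then add a bathroom, hold bedrooms fixed

# Marginal effect: change one variable by 1 unit, keep everything else the same.
marginal_bedroom = extra_bed - base
marginal_bathroom = extra_both - extra_bed

print(base, extra_bed, extra_both)
print(marginal_bedroom, marginal_bathroom)
```

Because the toy model is linear, the marginal effect of a bedroom is the same no matter how many bathrooms the house has. That would not hold if the model included interaction terms.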

Ok, on with our discussion:

The Starbucks Effect
My first beef with the book is the chapter that I think is pretty well-known by now in a lot of circles: The Starbucks Effect. Basically, the authors claim to have found evidence that, not only do home prices tend to appreciate faster when the home is in close proximity to a Starbucks, but that they have determined that having a Starbucks close-by is a DRIVER of home price appreciation.

The critical thing to keep in mind here is that (I believe) there are some very good reasons to believe that home prices in the vicinity of a Starbucks might appreciate faster than non-Starbucks proximate homes…One of those reasons is that Starbucks might be very good at identifying up-and-coming neighborhoods to locate their stores in. By that I mean Starbucks doesn’t cause home price appreciation but, because their business model caters to the young upwardly mobile demographic, it is important for their business to be able to identify areas that are “ripe for gentrification” (or are just generally in high demand).

It is important to be clear about this: although the book is obviously meant to be kind of a coffee table conversation starter (despite the lip service given to data analysis there is very little in the way of technical detail on how any of their analysis was carried out), the authors are very careful to: i) acknowledge that what I pointed out above is a legitimate reaction to the claim that Starbucks drives nearby property values higher, and ii) state that they considered this possibility and tested for it.

How did they test to make sure that the presence of a Starbucks is a driver and not just a correlated feature of ‘hot neighborhoods’?

According to the book, here is what they did: they tested for the presence of an overall ‘Coffee Shop Effect’ by comparing homes in close proximity to a Starbucks and homes in close proximity to a Dunkin’ Donuts.

In fact, between 1997 and 2012, homes now located near Starbucks and Dunkin’ Donuts followed similar historical trajectories, substantially outpacing the overall home value appreciation. But their paths diverged during the recent housing recovery such that, today, homes near Dunkin’ Donuts have appreciated 80% since 1997 whereas homes near Starbucks have appreciated 96%, almost doubling their value.

I have a few issues with this comparison:

First, I think Dunkin’ Donuts v. Starbucks is a bit of a strawman. From what I know of Dunkin’ Donuts, their target demographic is not young professionals with amazing credit and lots of disposable income. So if the story really is that Starbucks is just good at identifying neighborhoods that are about to take off, comparing them to Dunkin’ Donuts is useless, because even if Dunkin’ Donuts was equally good at identifying neighborhoods about to ‘take off’, they would probably choose not to locate there precisely because it’s not their market.

If I am right in assuming that Dunkin’ Donuts and Starbucks are going after different segments of the coffee/breakfast/morning ritual market, there are some unaccounted for trends baked into this bad comparison: we know that during the housing market crash properties in the middle income family demographic were hit a lot harder than luxury town-homes marketed at the newly married-2yrs out of law school-couple. Why?

  1. these properties were more heavily leveraged (purchased with smaller down payments) and were sold to people that couldn’t afford an underwater property so were more likely to be foreclosed on in the crisis.
  2. these properties were also more likely to appeal to middle income families – exactly the kind of people who couldn’t get mortgage loans after the great shake-out because banks were/are requiring big down payments and pristine credit

So it’s entirely possible (I think) that the ‘Starbucks properties’ (chic town homes with a bunch of marble shit everywhere) didn’t get punished in the crash (because the kind of people who buy these properties don’t need FHA loans and shit, they have a lot of cash) the same way the ‘Dunkin’ Donuts properties’ did.

Ok, I admit most of my above criticisms are supposition…the, “You say the effect is because of this, I say it could be because of something different.” This is a valid concern because the authors unequivocally claim to have found a specific effect and I feel they didn’t do enough to show that the result they found wasn’t driven by something different from what they claim. I believe my above concerns are valid, HOWEVER, if I’m going to criticize I owe the authors something more than, “I came up with an alternate story that you did not disprove, ergo I’m right”…so here it is:

The bigger issue for me is that the Starbucks v. Dunkin’ Donuts ‘control’ is kind of on the right intellectual track but it’s half-assed. The problem with assigning causality to the observation that properties proximate to Starbucks appreciated faster than other properties is that we really need to control for the universe of other things that make housing prices go up and down. The binary comparison of Starbucks-adjacent properties and Dunkin’ Donuts-adjacent properties is a decent start but it’s too ad-hoc.

The idea of assigning causality is conceptually rooted in the idea that if we can control for all of the other things (other than being located near a Starbucks) that make any two properties different in value, the difference in price remaining after we control for all that other shit must be attributable to the last remaining source of heterogeneity…distance from the nearest Starbucks.

I don’t like to criticize without something constructive so let me offer this by way of answering the question Zillow would ask me if they saw this (“well, how would you do it?”):

In the policy analysis literature, there is a pretty well-vetted way of partialing out multiple sources of variation in order to get to the true effect of a particular policy. The approach of quasi-control (or more recently synthetic-control) is pretty well established as a credible way to gauge the marginal impact of a policy. In this case, we can think of the policy as being ‘located near a Starbucks.’

The key then would be to create some ‘partner’ neighborhoods for the neighborhoods in close proximity to a Starbucks. These ‘partner’ neighborhoods would need to be statistically similar to the Starbucks neighborhoods in every aspect except the presence of a Starbucks.

One way to create these ‘partner’ neighborhoods is a method called synthetic control. I highly recommend this paper by Abadie, Diamond, and Hainmueller on the synthetic control methodology in natural experiments.
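
For a rough feel of the idea, here is a toy Python sketch. Everything in it is an assumption on my part: the price indices are made up, the three ‘donor’ neighborhoods are hypothetical, and the brute-force grid search stands in for the proper constrained optimization (which also matches on covariates, not just past prices). The goal is just to find non-negative weights that sum to 1 and make a blend of non-Starbucks neighborhoods track the Starbucks neighborhood before the store shows up.

```python
# Hypothetical home value indices (made up for illustration).
# The first PRE entries are years before the Starbucks opened, the rest are after.
PRE = 3
starbucks_hood = [100.0, 107.0, 113.0, 125.0, 138.0]
donors = {
    "A": [100.0, 104.0, 110.0, 115.0, 121.0],
    "B": [100.0, 110.0, 116.0, 123.0, 131.0],
    "C": [100.0, 101.0, 103.0, 104.0, 106.0],
}
names = list(donors)
STEPS = 10  # weights move in increments of 1/STEPS


def blend(weights, t):
    """Weighted average of the donor neighborhoods in year t."""
    return sum(w * donors[name][t] for w, name in zip(weights, names))


def pre_period_error(weights):
    """Squared gap between the real neighborhood and the blend before entry."""
    return sum((starbucks_hood[t] - blend(weights, t)) ** 2 for t in range(PRE))


# Brute-force search over non-negative weights that sum to 1.
best_weights, best_error = None, float("inf")
for i in range(STEPS + 1):
    for j in range(STEPS + 1 - i):
        k = STEPS - i - j
        weights = (i / STEPS, j / STEPS, k / STEPS)
        error = pre_period_error(weights)
        if error < best_error:
            best_weights, best_error = weights, error

# Gap between the real neighborhood and its synthetic partner after entry.
gaps = [starbucks_hood[t] - blend(best_weights, t)
        for t in range(PRE, len(starbucks_hood))]

print(dict(zip(names, best_weights)), gaps)
```

If the synthetic partner tracks the real neighborhood well before the store opens, the post-opening gaps are the candidate ‘effect’. Whether those gaps are distinguishable from noise is a separate question this sketch does not answer.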

Once the ‘partner’ neighborhoods are established, one could use the familiar Difference-in-Differences estimation technique to establish whether the Starbucks neighborhoods grew faster or slower than the ‘partner’ neighborhoods and whether this Starbucks Effect is statistically significant.
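
The Difference-in-Differences point estimate itself is just subtraction. A minimal sketch with made-up index values (hypothetical numbers, not Zillow data, and the variable names are mine):

```python
# Hypothetical average home value indices for a handful of neighborhoods,
# before and after a Starbucks opens in the 'Starbucks' group.
starbucks_pre = [100.0, 102.0, 98.0]
starbucks_post = [150.0, 156.0, 144.0]
partner_pre = [100.0, 101.0, 99.0]
partner_post = [140.0, 143.0, 137.0]


def mean(xs):
    return sum(xs) / len(xs)


# First difference: change over time within each group.
starbucks_change = mean(starbucks_post) - mean(starbucks_pre)
partner_change = mean(partner_post) - mean(partner_pre)

# Second difference: how much more the Starbucks group changed.
did_estimate = starbucks_change - partner_change

print(starbucks_change, partner_change, did_estimate)
```

This only gives the point estimate. To say anything about statistical significance you would estimate it in a regression with an interaction term and get standard errors, and you would want to check that the two groups were on similar trends beforehand.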

An even better way to establish a ‘Starbucks Effect’ would be a natural experiment. This might be impossible but I doubt it. I bet you could find neighborhoods located in areas that have restrictions on the presence of chain stores (I don’t think these restrictions are rampant but they do exist in many places). If you had an area that had actual laws/regulations prohibiting Starbucks from entering you might find some neighborhoods that Starbucks would like to locate in but can’t. If you could find a set of these neighborhoods and compare them to the Starbucks neighborhoods, then you would have a pretty strong case for having found evidence for or against ‘The Starbucks Effect.’

Chapter 18: The Gayborhood Phenomenon
The claim embedded here is that property values in recent years have appreciated more in gay-friendly neighborhoods than elsewhere and that this fact is an encouraging statement about social change.

The tactic employed in this chapter will be familiar to anyone who has ever reviewed or evaluated a seemingly interesting and ultimately disappointing statistical analysis.

Step 1: make an interesting claim with broad societal implications.

In this case that claim is that real estate data tells us an especially interesting and important story about the expansion of rights for the LGBT community over the past several decades.

Step 2: reduce the big claim down to some kernel that can be shown empirically.

The undercurrent to the first 3 or 4 pages of this chapter is that if home prices in ‘gay neighborhoods’ are rising faster than elsewhere that would be a sign of society’s increasingly progressive attitudes toward the LGBT community.

Step 3: Provide a ‘good story’. It’s also usually important that the ‘good story’ have the, “I never thought about it that way but it makes total sense” factor.

In this case the story is that during years of peak homophobia (1970s) these enclaves that are now hot (the Castro in SF, Andersonville in Chicago) were cheaper and more tolerant than surrounding areas…providing affordable and safe communities for gay men. Next, as society slowly began to recognize ‘gay rights and culture’, these areas began appreciating, aided by the fact that, because these neighborhoods had high densities of people with a shared lifestyle, residents were willing to reinvest in the community.

Step 4: Provide evidence for Step 3.

This is where the disappointment comes in: the data analysis offered in support of the claim that ‘gay neighborhoods are now coveted (p.267)’ comes in three forms:

  • An observation that in the post-year 2000 world, real estate prices in San Francisco’s Castro neighborhood were 40% higher than the average for the SF Metro Area.
  • A plot showing that home prices in Capitol Hill (Seattle), The Castro (San Francisco), Andersonville (Chicago), and Greenwich Village (New York) all appreciated faster than the city-wide average.
  • A bold, unsubstantiated claim on p. 268 that, “trust us, this is also happening in Albuquerque, El Paso, Louisville, and Virginia Beach” (no stats provided).

Here again I don’t see much evidence that the authors really tried to get at the marginal effect of ‘gay-friendliness.’ Sure, property values in the Castro, Andersonville (Chicago), and Capitol Hill (Seattle) appreciated faster than surrounding neighborhoods but how do we know that this isn’t a ‘gentrification effect’? Or a ‘cupcake store effect’, or even a ‘mustache effect’? I bet if you go to the Castro right now you’ll see more mustaches per capita than in surrounding neighborhoods….does that mean mustaches cause property values to rise?

I would also love to know, in their analysis, how a neighborhood got classified as ‘gay’. I accept the premise (although this is hotly debated in a lot of circles) that Census data post 1990 contains information on sexual orientation. Sure, for places like the Castro in SF or Oak Lawn in Dallas the classification is pretty obvious…but what exactly was the classification scheme here? Did a neighborhood just need to have more gay residents than surrounding neighborhoods? I imagine there might be a lot of neighborhoods with almost no gay population to speak of that could get classified as gay under this scheme by virtue of being surrounded by really, really gay-intolerant neighborhoods. Maybe a neighborhood had to have a certain share of gay residents (30%, 40%, more than the national average, etc.)? In this case, did the authors test the sensitivity of their conclusions to this arbitrary cut-off value? If they did, it didn’t make it into the book.
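
Here is what I mean by testing sensitivity to the cut-off, as a small Python sketch. The data are entirely made up (hypothetical shares and appreciation rates for five imaginary neighborhoods), and the cut-off values are arbitrary picks of mine. The point is only the mechanics: reclassify under each cut-off and see whether the ‘premium’ survives.

```python
# Hypothetical neighborhoods:
# (share of same-sex couple households, price appreciation over the period)
neighborhoods = [(0.01, 0.50), (0.02, 0.90), (0.04, 0.55), (0.08, 0.60), (0.15, 0.95)]


def mean(xs):
    return sum(xs) / len(xs)


def premium(cutoff):
    """Appreciation gap between neighborhoods at or above the cut-off and the rest."""
    flagged = [growth for share, growth in neighborhoods if share >= cutoff]
    rest = [growth for share, growth in neighborhoods if share < cutoff]
    # Assumes the cut-off leaves at least one neighborhood in each group.
    return mean(flagged) - mean(rest)


premiums = {cutoff: premium(cutoff) for cutoff in (0.02, 0.05, 0.10)}
for cutoff, gap in premiums.items():
    print(f"cutoff {cutoff:.0%}: premium {gap:+.1%}")
```

In this made-up sample the estimated premium moves around quite a bit depending on where the line is drawn, which is exactly the kind of thing a reader would want to see reported.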

In the interest of being even-handed, the authors do cite a paper from the Journal of Urban Economics by Christafore and Leguizamon that found a 1.1% price growth premium to having an additional gay couple in the neighborhood, if the neighborhood was socially liberal (as measured by a small proportion of people that supported the Defense of Marriage Act (DOMA)). The study also found that, in socially conservative neighborhoods, an additional gay couple caused a 1% decline in property values.

I don’t really have anything bad to say about this paper; I enjoyed it immensely. I’m not totally convinced that it shows what the authors say it does, mostly because they didn’t really control (in a methodological sense) for the fact that gay couples tend to self-select into high-amenity areas (the common story goes that gay couples face a higher cost to having children and therefore generally have fewer and therefore have more money to spend on housing upgrades like premium fixtures and designer kitchens and whatnot). They did what I consider an ad-hoc control by only including areas in the study that contained (what they felt were) relatively homogeneous properties. This isn’t a bad approach – it’s one I’ve taken many times myself – but it does tend to burden the reader in the sense that I now have to decide if I trust the authors’ ad-hoc filter.

Just to bring it back to Zillow’s data and analysis before I close out: kudos to them for citing a good paper from the scientific literature that supports their (Zillow’s) claim that there is a price premium to living in the ‘gayborhood.’ However, the claim made throughout this book is that they (Zillow) have the data and data technicians to uncover these interesting hidden relationships. This claim is not supported in any way by deferring to the peer-reviewed literature rather than their own analysis to justify the claim that property values are rising faster in gay neighborhoods because society is coming to ‘covet’ gay enclaves.