Finding the perfect house using open data

—Saturday, May 24 2014

My childhood backyard.

Growing up in rural Mississippi had its perks. Most days my brother and I would come home from school, grab a snack from the fridge, and head outside to fish in the pond tucked behind the tree line or play basketball with the neighborhood kids. It was very much a community where everyone knew everyone. I had known many of the kids in my graduating class since second grade, and my parents had known their parents since high school or earlier.

As beautiful as my hometown was, it was, like many small towns, economically depressed and devoid of all but the necessities. As I grew older, I became frustrated by this. We had one small grocery store, one stop light, two movie rental places and not a single fast food restaurant. We had no book stores, no electronics stores, no big-box stores and only a couple of places to grab a bite to eat.

Truth be told, the gas station (one of those big ones, forever known as “The BP” long after a name change) was where we picked up most of our takeout. When Highway 72 was expanded to four lanes, a nearby gas station was converted to a “super” station. It was packed with a pizza place, fresh sub sandwiches, and the best chicken strips that have ever graced the space below a heat lamp. It was the community watering hole.

The lack of access eventually wore on me. As I started to grow my design skills (my Hail Mary escape from factory work), I would hear of my peers in larger cities going out to eat, to the movies, or just having a beer at a nearby bar. Beer, by the way, was illegal where I grew up. Not just the consumption of it, the mere possession of it.

Portland - Oregon by Patrick M. License CC BY-NC-SA 2.0

By 2007, after a couple of years on the outskirts of Memphis, TN, I had finally had enough and convinced my wife to move to Portland, OR. We knew very little about Portland at the time. In fact, we knew so little about the Pacific Northwest that we moved in the dead of winter, in a sports car, driving through the Cascade mountain range. It’s not exactly something I’m proud of: a hilariously ill-conceived cross-country trip.

When we moved to Portland we decided we wanted to be in the center of it all. There was so much life around us; so much happening relative to our rural upbringing. I wanted to be as close to Downtown as I could be so I could go as often as I liked.

We eventually settled on the Pearl District, a relatively new residential development on land previously occupied by a rail yard. Finally, I would have the access that I once craved. Almost anything I wanted was a short walk or streetcar ride away. All that I wished for growing up, I would have.

Fast forward 7 years and we’re still in the Pearl District. We’ve added a member to our family who now desires access to the things I had while growing up. He wants a yard to play in, a basketball goal and a proper house where he can get excited without disturbing our neighbors.

It’s an interesting scenario. The Pearl District and Downtown aren’t exactly teeming with affordable single-family homes, and the further you move out of the city center, generally the less access you have to the many things a dense urban area has to offer. But, in reality, I only wished for a couple of things out of a new location.

The journey begins

After thinking about the problem, I decided to list out a set of criteria for a location where I would want to live. Other factors will eventually come into play, but I wanted to narrow down the city into “target zones”, that is, zones that meet a set of defined criteria.

  • Walking distance to a grocery store: Living across the street from a grocery store has spoiled me.
  • Walking distance to a rail stop: This will allow me to get to other locations in the city without a car relatively quickly. One could argue the bus system is just as good, but I would argue that it isn’t and I much prefer rail.

I defined walking distance as ~5 blocks, but ~10 blocks is still a pretty sane distance. I want to be close to a grocery store and close to a MAX or Streetcar stop. Unfortunately, none of the real estate applications I tried had a feature like this so I decided to create what I needed using open data that I had already been working with for some time now.

Gathering the data

This section makes heavy use of GDAL and PostGIS, both of which can be installed using Homebrew on OS X.

We’ll need three open datasets: Portland building footprints, OpenStreetMap polygons and TriMet rail stops.

After downloading the data, we’ll want to reproject the building dataset to EPSG:4326 so that all of the data shares the same coordinate system. EPSG:4326 (WGS84) is a widely used coordinate reference system and the one the other two datasets already use, so I went with that.

ogr2ogr -t_srs EPSG:4326 -f "ESRI Shapefile" building-footprints.shp  Building_Footprints_pdx.shp

Note: EPSG:2913, the original projection of the building dataset, would probably be the more accurate choice if we were concerned with a high level of precision.

Now we need to create a PostGIS-enabled Postgres database:

createdb portland
psql -c "create extension postgis" -d portland

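Finally, we need to import the datasets into our freshly created Postgres database. See the shp2pgsql docs for an explanation of the flags used. Essentially, we want to read the files as latin1 (-W; there are some encoding errors in the building dataset), force 2D geometry (-t 2D), create an index on the geometry column (-I) and tag everything with SRID 4326 (-s). The -d flag drops and recreates any existing table, and -D uses the faster dump format for loading.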

shp2pgsql -W "latin1" -t 2D -I -D -d -s 4326 building-footprints.shp building_footprints | psql -d portland
shp2pgsql -W "latin1" -t 2D -I -D -d -s 4326 osm-polygons.shp osm_polygons | psql -d portland
shp2pgsql -W "latin1" -t 2D -I -D -d -s 4326 trimet-rail-stops.shp trimet_rail_stops | psql -d portland

After we have the data imported into Postgres, we can begin to find target geometries (buildings) that meet the criteria we set. The first thing we want to do is find the zones around all supermarkets using st_expand, which grows each geometry’s bounding box by a fixed amount in every direction. Our units are decimal degrees and 0.0045 is about the desired distance of ~5 blocks. We’re not too worried about being a little off here.
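To sanity-check that figure, the buffer can be converted to meters. The sketch below is not part of the original workflow: it assumes a spherical-earth approximation of about 111,320 m per degree and a latitude of 45.52°N for Portland, and the helper name deg_to_meters is hypothetical. Because degrees of longitude shrink with latitude, the same 0.0045 covers less ground east-west than north-south.

```shell
# Rough ground size of a buffer given in decimal degrees.
# Assumptions: spherical earth, ~111,320 m per degree at the equator.
# Usage: deg_to_meters <degrees> <latitude in degrees>
deg_to_meters() {
  awk -v deg="$1" -v lat="$2" 'BEGIN {
    pi = atan2(0, -1)
    m_per_deg = 111320
    ns = deg * m_per_deg                        # north-south extent
    ew = deg * m_per_deg * cos(lat * pi / 180)  # east-west extent
    printf "north-south: %.0f m, east-west: %.0f m\n", ns, ew
  }'
}

# 45.52 is an assumed latitude for central Portland.
deg_to_meters 0.0045 45.52
```

At Portland’s latitude this works out to roughly 500 m north-south and 350 m east-west, so the zones are somewhat narrower than they are tall. That is within the “a little off” tolerance accepted above.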

select st_expand(geom, 0.0045) as zone, 5 as score from osm_polygons where osm_polygons.shop='supermarket'

[Image: zones around supermarkets]

At this point, we have large rectangle geometries. As you can tell from the image above, some buildings lie in overlapping zones. We want to score those buildings for every zone they intersect with. In similar fashion, we want to find the zones around rail stops and public parks.
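The rail-stop zones might look something like the sketch below, which only writes a small SQL file so it can be inspected before being run. It assumes every row in trimet_rail_stops should count, reuses the same 0.0045 buffer and score of 5 as the supermarket query, and uses a hypothetical file name, rail-stop-zones.sql.

```shell
# Write the rail-stop zone query to a file (the file name is hypothetical).
cat > rail-stop-zones.sql <<'SQL'
-- Assumption: all rail stops are scored equally, with the same buffer
-- and score as the supermarket zones.
select st_expand(geom, 0.0045) as zone, 5 as score
from trimet_rail_stops;
SQL

# Once the data is imported, run it against the database created earlier:
# psql -f rail-stop-zones.sql -d portland
```

A parks query would follow the same pattern with a different filter on osm_polygons. Which column and value identify a park depends on how the OSM extract was built, so that filter is left out here.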

Next, we want to target individual buildings instead of merely drawing a large box around a zone. We can find all buildings that intersect a zone using st_intersects.

select * from supermarket_zones inner join buildings on st_intersects(supermarket_zones.zone, buildings.geom) where buildings.subarea='City of Portland'

Finally, we want to group identical geometries, sum the score and stuff the target buildings into a new table. We don’t necessarily need to add the target buildings to a new table, but continuously running these queries across the entire building set can be slow.

select sum(score) as rank, gid, geom into target_homes from target_buildings group by 2, 3;

All of these steps are combined in a single script, score-buildings.sql, which can be run using psql -f score-buildings.sql -d portland.
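As a rough sketch of how the pieces above could fit together, the following writes one possible version of score-buildings.sql. It is a reconstruction under assumptions, not the original file: the names rail_stop_zones and zones are made up, buildings is treated as an alias for the imported building_footprints table, and rail stops get the same buffer and score as supermarkets.

```shell
# Write one possible version of score-buildings.sql
# (a reconstruction under assumptions, not the original script).
cat > score-buildings.sql <<'SQL'
-- Allow the script to be re-run.
drop table if exists target_homes;

with supermarket_zones as (
  select st_expand(geom, 0.0045) as zone, 5 as score
  from osm_polygons
  where shop = 'supermarket'
),
rail_stop_zones as (  -- assumption: scored like supermarkets
  select st_expand(geom, 0.0045) as zone, 5 as score
  from trimet_rail_stops
),
zones as (
  select zone, score from supermarket_zones
  union all
  select zone, score from rail_stop_zones
),
target_buildings as (
  -- One row per (building, zone) pair, so overlapping zones add up.
  select buildings.gid, buildings.geom, zones.score
  from zones
  inner join building_footprints as buildings
    on st_intersects(zones.zone, buildings.geom)
  where buildings.subarea = 'City of Portland'
)
select sum(score) as rank, gid, geom
into target_homes
from target_buildings
group by gid, geom;
SQL

# Run it against the database created earlier:
# psql -f score-buildings.sql -d portland
```

Afterwards, a quick select rank, count(*) from target_homes group by rank order by rank shows how many buildings fall into each score, which helps when picking thresholds for styling.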

Styling the results

Mapbox has a great crash course on TileMill. I highly recommend you check it out if you’re unfamiliar with it.

The last major thing we need to do is to visualize the results on a map. For this, I’ll use the open source editor TileMill by Mapbox. This step is pretty straightforward since we’ve already done the hard work of extracting and scoring the buildings we’re interested in. We only need to supply TileMill with the name of the table we stored our target homes in.

Finally, let’s fill each building with a color based on its score using CartoCSS.

#homes {
  [rank >= 5] {
    polygon-fill: #ea97ca;
  }

  [rank >= 10] {
    polygon-fill: #3581ac;
  }

  [rank >= 15] {
    polygon-fill: #4dac26;
  }

  [rank >= 20] {
    polygon-fill: #a9bb29;
  }
}

We can add more layers to help give some context to our results. Here’s what I ended up with after adding layers for rail lines, rail stops and supermarkets.

[Screenshot: the finished map with rail lines, rail stops and supermarkets]

And that’s it. Now I have a good idea of locations I can check out in my quest to find a house near a supermarket and rail line. Nothing like quickly grabbing a beer for a backyard barbecue and a little game of basketball. And hey, when the Trail Blazers play? I’ll just walk a few blocks and hop on the MAX.