Wednesday, March 3, 2010

How to geocode and find distance between two addresses using ASP.NET

Many websites these days let users find location-based points of interest. Examples include a website that lists the schools in your zipcode, or the restaurants within x miles of your address (or current location). Users can often also sort or group the points of interest by proximity. In this article, I'll walk through a step-by-step process for geocoding addresses and finding the distance between two points on a map.

There is no way to calculate distances from zipcodes and addresses alone. We first need the coordinates (latitude and longitude) for those zipcodes and addresses. The process of finding the latitude/longitude for a location is called geocoding. Once we have the coordinates, we can use the Haversine formula to calculate the great-circle ("as the crow flies") distance between them.

Step 1: How to geocode a zipcode (convert a zipcode into latitude & longitude)

The easiest way to do this is to get the data from the US Census website - http://www.census.gov/geo/www/gazetteer/places2k.html . The file of interest is "ZCTAs (ZIP Code Tabulation Areas)". Below is sample code that parses this file:

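// Requires: using System.IO;
string line;
using (TextReader tr = File.OpenText(@"C:\zcta5.txt"))
{
    while ((line = tr.ReadLine()) != null)
    {
        // Fixed-width columns: Substring(start offset, length)
        string zip = line.Substring(2, 5);
        string latitude = line.Substring(136, 10).Trim();
        string longitude = line.Substring(146, 11).Trim();
        //Code to save this to database
    }
}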

Note: It would be faster to bulk-load this file than to parse it line by line. The above code is just for reference.
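If the coordinates are going to be stored as numbers rather than strings, each line can be converted along the following lines. This is only a sketch: ZctaParser and TryParseLine are hypothetical names that are not part of the code above, and the column offsets are simply the ones used in the loop above, assumed to match the layout of the downloaded file.

```csharp
using System;
using System.Globalization;

public static class ZctaParser
{
    // Hypothetical helper. The offsets (2, 136, 146) are copied from the
    // parsing loop above and are assumed to match the file layout.
    public static bool TryParseLine(string line, out string zip,
                                    out double latitude, out double longitude)
    {
        zip = string.Empty;
        latitude = 0;
        longitude = 0;

        // The longitude field ends at offset 146 + 11 = 157.
        if (line == null || line.Length < 157)
            return false;

        zip = line.Substring(2, 5);

        // NumberStyles.Float tolerates the padding spaces and the sign.
        // InvariantCulture keeps "." as the decimal separator regardless
        // of the machine's regional settings.
        return double.TryParse(line.Substring(136, 10), NumberStyles.Float,
                               CultureInfo.InvariantCulture, out latitude)
            && double.TryParse(line.Substring(146, 11), NumberStyles.Float,
                               CultureInfo.InvariantCulture, out longitude);
    }
}
```

A line that is too short or contains non-numeric text in the coordinate columns makes the method return false, so the caller can skip or log that line instead of failing on an exception.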

Step 2: How to geocode an address? This is important because we'd like to geocode the actual addresses provided by the user. For example, the user might provide an address and look for restaurants within 5 miles of it. Geocoding addresses is also necessary for non-US zipcodes (or postal codes), and for any list of addresses you might be working with (for example, you might have a list of schools' addresses, and your website might let users find schools within 50 miles of a zipcode).

To geocode such addresses we will use the Google Maps API's geocoding service. This is a RESTful web service that takes the address and other parameters and returns the latitude and longitude for it. A sample URI looks like this - http://maps.google.com/maps/geo?q=[your-address]&output=csv&key=[your-key]. You'll have to sign up at http://code.google.com/apis/maps/signup.html to obtain the "key" parameter. Once you have the key, you can run all your addresses through the service and obtain the coordinates. Sample C# code is below:

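// Requires: using System; using System.Net; and a reference to System.Web
using (WebClient webClient = new WebClient())
{
    foreach (var address in AddressList)
    {
        Uri uri = new Uri("http://maps.google.com/maps/geo?q=" + System.Web.HttpUtility.UrlEncode(address) + "&output=csv&key=[your key]");

        // The CSV response is split on commas; latitude and longitude are fields 2 and 3
        string[] geoCodedAddress = webClient.DownloadString(uri).Split(',');
        if (geoCodedAddress.Length < 4) continue; // unexpected response, skip this address

        string latitude = geoCodedAddress[2];
        string longitude = geoCodedAddress[3];

        //Save the extracted coordinates
    }
}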

Note: In a console application, the System.Web namespace will not expose the HttpUtility class by default. The solution is to add a reference to the System.Web assembly.
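The loop above takes fields 2 and 3 of the response without looking at the rest of it. A slightly more defensive version might check the response first, roughly as sketched below. GeocodeCsv and TryParse are hypothetical names, and the sketch assumes that the CSV response is a single line of four fields (status code, accuracy, latitude, longitude) and that a status of 200 means the lookup succeeded; check this against the service's documentation before relying on it.

```csharp
using System;
using System.Globalization;

public static class GeocodeCsv
{
    // Hypothetical helper. Assumed response format (one line):
    //   status,accuracy,latitude,longitude
    // with status "200" assumed to indicate success.
    public static bool TryParse(string response,
                                out double latitude, out double longitude)
    {
        latitude = 0;
        longitude = 0;

        if (string.IsNullOrEmpty(response))
            return false;

        string[] parts = response.Trim().Split(',');
        if (parts.Length < 4 || parts[0] != "200")
            return false;

        // Parse with InvariantCulture so "." is always the decimal separator.
        return double.TryParse(parts[2], NumberStyles.Float,
                               CultureInfo.InvariantCulture, out latitude)
            && double.TryParse(parts[3], NumberStyles.Float,
                               CultureInfo.InvariantCulture, out longitude);
    }
}
```

With this in place, the body of the loop could pass the downloaded string to GeocodeCsv.TryParse and save the coordinates only when it returns true.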

Step 3: How to calculate the distance between two coordinates (two pairs of latitude and longitude)? Since the earth is spherical (well, not exactly), we use what is known as the Haversine formula to calculate the distance. The reason is that the distance here is a curve along the surface of the earth, as opposed to a straight line between two points. I won't go into the actual formula here; below is sample C# code that implements it:

// Returns the distance in miles (3960 is the approximate radius of the earth in miles)
public double GetDistance(Coordinate location1, Coordinate location2)
{
    // Differences in latitude and longitude, converted from degrees to radians
    double diffLat = Math.PI / 180 * Convert.ToDouble((location1.Latitude - location2.Latitude));
    double diffLon = Math.PI / 180 * Convert.ToDouble((location1.Longitude - location2.Longitude));
    // Math.Min(1, ...) guards Math.Asin against rounding errors slightly above 1
    return 3960 * 2 * Math.Asin(Math.Min(1, Math.Sqrt(
      Math.Sin(diffLat / 2) * Math.Sin(diffLat / 2) +
      Math.Cos(Math.PI / 180 * Convert.ToDouble(location1.Latitude)) *
      Math.Cos(Math.PI / 180 * Convert.ToDouble(location2.Latitude)) *
      Math.Sin(diffLon / 2) * Math.Sin(diffLon / 2))));
}
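
For readers who want to check the code against the math, the expression that GetDistance evaluates can be written as follows. The symbols are introduced here for illustration and do not appear in the code: \varphi_1 and \varphi_2 are the two latitudes in radians, \Delta\varphi and \Delta\lambda are the differences in latitude and longitude in radians, and R is the radius of the earth (3960 miles in the code).

```latex
d = 2R \,\arcsin\!\left( \min\!\left( 1,\;
      \sqrt{ \sin^{2}\!\frac{\Delta\varphi}{2}
           + \cos\varphi_1 \,\cos\varphi_2 \,\sin^{2}\!\frac{\Delta\lambda}{2} }
    \right) \right)
```

As a quick check, take two points on the equator that are 90 degrees of longitude apart: \Delta\varphi = 0 and \Delta\lambda = \pi/2, so the square root is \sqrt{1/2}, the arcsine is \pi/4, and d = R\pi/2, a quarter of the earth's circumference.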

Note: Coordinate is just a struct that exposes Latitude and Longitude as properties. The conversions to double can be removed if they are not needed (for example, when the properties are already of type double).
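To tie this back to the "restaurants within 5 miles" scenario, the sketch below shows one possible shape for the Coordinate struct and a radius filter built on the same calculation. The Proximity class, DistanceMiles and WithinRadius are hypothetical names, the struct is assumed to hold doubles, and DistanceMiles simply restates the Haversine calculation above so that the example is self-contained.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the Coordinate type described in the note above.
public struct Coordinate
{
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

public static class Proximity
{
    // Same approximate earth radius (in miles) as GetDistance above.
    private const double EarthRadiusMiles = 3960;

    private static double ToRadians(double degrees)
    {
        return Math.PI / 180 * degrees;
    }

    // Restates the Haversine calculation for double-valued coordinates.
    public static double DistanceMiles(Coordinate a, Coordinate b)
    {
        double sinLat = Math.Sin(ToRadians(a.Latitude - b.Latitude) / 2);
        double sinLon = Math.Sin(ToRadians(a.Longitude - b.Longitude) / 2);
        double h = sinLat * sinLat
                 + Math.Cos(ToRadians(a.Latitude)) * Math.Cos(ToRadians(b.Latitude))
                 * sinLon * sinLon;
        return EarthRadiusMiles * 2 * Math.Asin(Math.Min(1, Math.Sqrt(h)));
    }

    // Hypothetical helper: returns the candidates that lie within
    // radiusMiles of origin, sorted nearest first.
    public static List<Coordinate> WithinRadius(Coordinate origin,
                                                IEnumerable<Coordinate> candidates,
                                                double radiusMiles)
    {
        var result = new List<Coordinate>();
        foreach (Coordinate candidate in candidates)
        {
            if (DistanceMiles(origin, candidate) <= radiusMiles)
                result.Add(candidate);
        }
        result.Sort((x, y) =>
            DistanceMiles(origin, x).CompareTo(DistanceMiles(origin, y)));
        return result;
    }
}
```

This computes the distance to every candidate in memory, which is fine for a short list. For a large table of locations it would probably be worth narrowing the candidates first (for example, by a latitude/longitude range in the database query) and only then applying the exact distance check.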

In conclusion, we covered extracting geocoded zipcodes from the census data, geocoding arbitrary addresses using the Google Maps API, and finally using the Haversine formula to calculate the distance between two geocoded locations (that is, between two latitude/longitude pairs). This should allow us to find the distance between any pair of locations, given full or partial addresses or zipcodes.
