Using Azure Functions to Geocode an input file

I was recently asked by a customer whether Azure offered any geocoding services. Not having a clue, I beetled off to investigate potential options; the two that came immediately to mind were Azure Functions and Azure Data Factory. This post looks at the initial Functions PoC I put together – it should be noted that if you were to do this in production you'd need to think about the implications of function timeouts and supporting restarts.

Building on my recent post about using Azure Logic Apps to FTP files into Azure Blob Storage, I'll now walk you through a simple example Azure Function that monitors a given Azure Blob container and processes any new file that arrives. The code and sample file supporting this post are available in the following repo: https://github.com/ianalderman/AZF_Samples. You will also need a Bing Maps API key, which you can sign up for at: https://www.microsoft.com/maps/create-a-bing-maps-key.aspx.

As a heads up, as part of this walkthrough we will be adding NuGet packages into our function; it's relatively painless, don't worry 🙂

Create Function App

Let's start by creating a function app from the portal: click New, type Function App, select the Function App published by Microsoft and click Create.

On the new function app blade enter:

  • App name: <a unique name for the function app>
  • Resource Group: select Create new and enter a name for the resource group
  • Location: select your location – in my example I chose North Europe

Click Create

Create our function

Once the function app has been provisioned we can create our function. Navigate to the function app and click New Function. This presents a screen where you can select the type of function you would like to create. In our case we are going for BlobTrigger-CSharp; this template creates a default trigger where we specify a storage account and a location to monitor, and any new file arriving there will cause the function to run. So click on BlobTrigger-CSharp.

This will present you with some initial information you will need to fill in:

  • Name your function: GeoCodeFile
  • Path: inbound/{name}
  • Storage account connection
    • Click new
    • Click Create New
      • Enter a descriptive unique name for the storage account
      • Click OK
  • Click Create

At this point we have created a basic function which will monitor the new storage account for any new blobs created in a container named "inbound" (don't worry, we'll create that shortly) – this is what we configured in the Path element above. The {name} token will hold the name of the blob that was created and maps to a variable of the same name used within the function. When you clicked Create it should have taken you to the Develop pane, and you can see that the default code uses a variable called name:
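For reference, the default BlobTrigger-CSharp template looked roughly like this at the time of writing (your generated code may differ slightly) – the blob contents arrive as a string called myBlob and the blob's file name arrives as name:

public static void Run(string myBlob, string name, TraceWriter log)
{
    // myBlob holds the blob contents; name holds the blob's file name from the {name} token in the path
    log.Info($"C# Blob trigger function processed blob\n Name: {name} \n Size: {myBlob.Length} Bytes");
}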

Before we go any further let's drop a file into our storage account and make sure it all works as expected. I prefer to use Azure Storage Explorer, however for simplicity we'll stick with the portal for this post – feel free to use Storage Explorer if you are familiar with it. So in the portal navigate to the storage account created above (a quick way is to use the Search resources box at the top of the screen and type in the account name), then on the storage account blade click Blobs.

On the Blob service blade click on +Container, for Name enter inbound and click Create, your newly created container will now appear in the Blob service blade.  To save some time later on repeat this process to create another container called processed.
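If you would rather script this step than click through the portal, the same containers can be created with the WindowsAzure.Storage client library. This is only an optional sketch; it assumes you have the storage account's connection string to hand:

using Microsoft.WindowsAzure.Storage;

// Parse the connection string for the storage account we created above
var account = CloudStorageAccount.Parse("<your storage connection string>");
var blobClient = account.CreateCloudBlobClient();

// Create the two containers used in this walkthrough if they do not already exist
blobClient.GetContainerReference("inbound").CreateIfNotExists();
blobClient.GetContainerReference("processed").CreateIfNotExists();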

With that done select our inbound container and the container blade will open.  On this blade click Upload, browse for a sample file to upload and simply click Upload.

Return now to our function app – again, you can quickly find it using the search box at the top of the screen. Select our newly created function and click Monitor; you should see an entry in the Invocation log. Select the invocation and have a look at the log: you can see that it has output the file name and size, which should hopefully come as no surprise after looking at the default function code that's created!

Hmm, NuGets…

Sadly these are not the McDonald's type, but rather the handy packages that you can import into your solutions to save time and avoid reinventing the wheel. For our example we will make use of two handy packages: CsvHelper, for processing our CSV address file, and Geocoding.net, which makes connecting to the Bing Maps API a breeze. If we were using Visual Studio we would happily use the Package Manager to load the NuGets into our solution and simply reference them from our code – it's a little different in Azure Functions.

Open the Develop pane for our function and click on View Files (highlighted in the yellow box below)

At the moment you can see there are two files: function.json and run.csx. In the screenshot above you can see the contents of the function.json file; it describes the bindings and state of your function, and you can see our trigger defined in it at the moment. run.csx contains the actual code for the function. We are now going to introduce a new file, project.json. A great description of these files, and of the various steps we will take as part of this walkthrough, can be found at: https://docs.microsoft.com/en-us/azure/azure-functions/functions-reference-csharp .
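If you can't see the screenshot, the function.json for our trigger should look roughly like this at this point (the connection value will be whatever name the portal generated for your storage account setting):

{
  "bindings": [
    {
      "name": "myBlob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "inbound/{name}",
      "connection": "<your storage account connection>"
    }
  ],
  "disabled": false
}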

In your preferred editor (I used Visual Studio Code for my example) create a new file and paste the following into it:

{
  "frameworks": {
    "net46": {
      "dependencies": {
        "CsvHelper": "2.16.3",
        "Geocoding.net": "3.6.0"
      }
    }
  }
}

Save the file as project.json. As you can see, the contents of this file define the NuGet packages, and the versions of them, that we need for our function to run. Our next challenge is to upload the json file so our function can use it. In our Function App pane click on Function app settings (yellow box below).

From there click on Go to Kudu (red box); this will open a new tab, and you can find out more about Kudu on its wiki. The screen is split into two sections. The top half is the part we will be working with today; it allows you to manage the files that make up your solution. The bottom half is a familiar command window for managing your app. Click on site and then wwwroot and you should see a folder for our function, GeoCodeFile. Click on GeoCodeFile and you should now see the same two files we saw earlier: function.json and run.csx.

To upload our project.json simply drag and drop it onto the top half of the Kudu window.

As an interesting aside, if you look at the screenshot above you will notice that each time we navigated down a folder in the explorer window the command window changed its working directory. You can also see in the screenshot that our project.json file is now uploaded.

Configure Environment Variable

We have one final piece of setup to do for our code – you will need your Bing Maps API key at this stage. Return to the Function app settings pane we used earlier and this time click on Configure app settings. In the blade that opens, scroll down and, under App settings, add a new key called BingMapsKey with your API key as the value. Once done, click Save.
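App settings are surfaced to the function as environment variables, so later on we will read the key from our code with a line like this:

string bingMapsKey = Environment.GetEnvironmentVariable("BingMapsKey");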

Your Bing Maps key will now be available for your functions and we’re now ready to start building up our function.

More Bindings

We need to define the output binding for the blob that will contain our geocoded addresses. In the function pane click Integrate and then New output. From the list of options that appears select Azure Blob Storage and click Select.

On the editor screen that appears we need to change the Path to be processed/{rand-guid}.csv. Earlier on we created a container called processed, and here we define it as the target for our geocoded file. The {rand-guid} token means Azure Functions will use a random GUID (Globally Unique Identifier) as the file name, and the ".csv" gives the file a csv extension, as can be seen below:

We also need to change the Storage account connection to the connection we created earlier for our new storage account. Once done, click Save.
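Behind the scenes this adds an output binding to function.json, which should end up looking something like the snippet below – the binding name outputBlob is what we will use as the parameter name in our code, and your connection name will differ:

{
  "name": "outputBlob",
  "type": "blob",
  "direction": "out",
  "path": "processed/{rand-guid}.csv",
  "connection": "<your storage account connection>"
}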

Let’s Code

OK, it's time to build up our function – the raw source code is available at this link if you'd like to simply cut and paste it in one go. Navigate to the Develop pane for our function; over the next few sections we will build up the code for the completed function.

Define references

To start with we will need to pull in the references our function requires:

#r "Microsoft.WindowsAzure.Storage"

using System;
using Geocoding.Microsoft;
using CsvHelper;
using Microsoft.WindowsAzure.Storage.Blob;

As noted in the developer reference mentioned earlier, certain assemblies are included by default, and a small subset of external assemblies can be referenced using the #r syntax. We have used this above to enable the Azure Storage libraries. We have also pulled in the namespaces from our NuGet packages: Geocoding.Microsoft (from Geocoding.net) and CsvHelper.

Helper code

Now we need to define a struct to store our geocoded result so we add the code below after defining references:

public struct LatLong {
    public double Lat;
    public double Long;
}

Next up we build our Geocoding helper function:

private static LatLong GeoCodePostCode(string AddressToParse) {
    // Create a Bing Maps geocoder using the API key stored in the BingMapsKey app setting
    BingMapsGeocoder geocoder = new BingMapsGeocoder(Environment.GetEnvironmentVariable("BingMapsKey"));

    // Geocode returns a collection of candidate matches for the address
    var addresses = geocoder.Geocode(AddressToParse);

    // For simplicity we take the coordinates of the first match
    LatLong result = new LatLong();
    result.Lat = addresses.First().Coordinates.Latitude;
    result.Long = addresses.First().Coordinates.Longitude;
    return result;
}

So this function takes in a simple string (named AddressToParse) representing the address we need to geocode – it could be a simple post code or a whole address – and returns the struct we defined previously (LatLong). We use the BingMapsGeocoder class from the Geocoding package, passing in the API key stored in the environment variable we defined in App settings. The geocoder returns a collection of potential matches; for simplicity in this example we simply take the first entry, store its coordinates in the result variable and return it at the end.
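As a quick illustration, calling the helper looks like this (the address here is purely an example):

LatLong coords = GeoCodePostCode("10 Downing Street, London, SW1A 2AA");
// coords.Lat and coords.Long now hold the coordinates of the first match returned by Bing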

Run!

Now that we have the supporting bits in place, let's work on the "entry point" – the bit of code that is executed by Azure Functions directly. You'll remember the default code had a Run function – we're updating that a bit here:

public static void Run(CloudBlockBlob myBlob, out string outputBlob, TraceWriter log)
{
    log.Info($"Processing file:{myBlob.Name}");

    // This string accumulates the contents of the output CSV file
    string markedUp = string.Empty;

    // Open the incoming blob as a stream so CsvHelper can read it
    using (var stream = myBlob.OpenRead())
    {
        using (CsvReader csv = new CsvReader(new StreamReader(stream)))
        {
            // Loop through each row of the CSV file
            while (csv.Read())
            {
                log.Info($"Input Post Code:{csv.GetField(4)}");

                // Combine the address columns into a single string and geocode it
                LatLong geocoderesult = GeoCodePostCode(csv.GetField(0) + "," + csv.GetField(1) + "," + csv.GetField(2) + ", " + csv.GetField(3) + "," + csv.GetField(4));

                // Write the original columns back out with the latitude and longitude appended
                markedUp += csv.GetField(0) + "," + csv.GetField(1) + "," + csv.GetField(2) + "," + csv.GetField(3) + "," + csv.GetField(4) + "," + geocoderesult.Lat + "," + geocoderesult.Long + "\r\n";
                log.Info($"Lat:{geocoderesult.Lat}");
            }
        }
    }

    // Assigning the out parameter causes Azure Functions to write the new blob via the output binding
    outputBlob = markedUp;
}

Let's go through this, shall we? Right at the top we change the binding type so that our new blob is passed to the function (as myBlob) as a CloudBlockBlob object, which exposes a number of useful properties and methods, rather than as the plain string we had before. We also output a string which is written to the new blob (outputBlob).

Our first step is pretty simple: we output to the log the name of the file we are processing, then we set up the variable which will contain the contents of our new CSV file. The next part is a bit of a faff; my efforts with CsvHelper required a stream to work with, so we open the blob as a stream for CsvHelper to process. With the while statement we loop through the CSV file until the end, and for each line we output to the log the post code column we're checking (column 5 in our CSV file).

Next up we call our helper function, combining each of the address fields in the CSV file into a full address to geocode. Once we have that, it's a simple matter of writing out the entry again and appending the lat and long columns.
When we run out of rows we assign the whole string we have built up to the variable representing the output blob, and Azure Functions takes care of creating and writing the blob for us.

Let’s geocode…

To test our function we need some addresses to look up. For my sample data I tracked down some addresses from the UK open data portal, an extract of which ended up in my sample CSV file, available at: https://github.com/ianalderman/AZF_Samples/blob/master/InputFiles/GeoCodeFile/address.csv

To take our new function for a test drive upload the sample file to the inbound container, wait a minute or so and check out the processed container – bingo!

OK, I know that when you open it there are some commas that should have been escaped, and I also know that if you run more than about 3.5K records through it the function times out 🙂 Hopefully, though, this has shown you some of the potential power of Azure Functions. Look out for a future post where I use a similar technique with Azure Data Factory…

2 Comments

  • Reza 02/12/2016 at 08:48

    Great article. One option to avoid timeouts for larger datasets is to use another function to chunk your original input file and use a queue to inform your geocode function that there is work to be done. In this way you could even scale out your geocode function to ensure that the end to end process completes within a certain time.

  • Deploying your Azure Function via ARM Template – My Thought Lab 20/12/2016 at 22:27

    […] I have added the last parameter – what’s going on here?  For my simple GeoCodeFile Function there is only one connection for the storage account where the files are manipulated, in building […]

