Project Description
You can try our GNA Parser at https://hipcoder.com to generate a GNA Code for your address!
Imagine different entries in your database with these addresses.
| Unit 6B, 12 Argyle St. MELBOURNE VIC 3002
| Flat 6B 12 Argyle STREET Melbourne Victoria 3002
| 6b/12 argyle st melbourne
They would all be pointing to the same address. But how can you use these for data science when they all look so different to the computer?
Plus Codes
The best solution would be if we use something like PLUS CODES, an encoding for latitude/longitude to represent addresses. The codes look like this, and anyone can generate them. It doesn't even need google, because it's simply an algorithm. Anyone can implement it.
But where do we get our coordinates from? Do we want to rely on external parties like Google?
Our Idea : GNA Codes
First, take a look at GNAF.
The Geocoded National Address File or GNAF has all the addresses in Australia PLUS their latitude longitude coordinates. We can derive Plus Codes from the coordinates - we call them GNA Codes.
GNA Parser
To produce GNA Codes we first clean addresses using our parser. It standardises punctuations, capitalizations, flats/units notations, level/floor styles and the likes in addresses into a normalized address format.
No machine learning was used. It's a simple yet effective algorithm, taking into account australia post addressing ruleset and is our biggest contribution for this project. We have more details about the algorithm at our github page.
From these cleaned addresses, it becomes possible to query them effectively onto the GNAF database, allowing us to calculate the GNA Codes.
The Demo
You can try our GNA Parser at https://hipcoder.com.
Through the usage of a web portal such as these, stakeholders can determine what their GNA code for their address is. Those with ambiguous addresses like "Corner of two streets" addresses can use such a portal to nominate a proper location, which would then be standardizable for use in data science and the like.
But as the first steps, the algorithm is a good first step that would greatly help standardization.
How it can benefit data science
In summary, with GNA Codes we can have interoperability between datasets from different sources. All without relying on external services like Google.