GNA Codes

Project Info

Team Members


Deddy and 2 other members with unpublished profiles.

Project Description


You can try our GNA Parser at https://hipcoder.com to generate a GNA Code for your address!


Imagine your database has several entries containing these addresses:

| Unit 6B, 12 Argyle St. MELBOURNE VIC 3002
| Flat 6B 12 Argyle STREET Melbourne Victoria 3002
| 6b/12 argyle st melbourne

All three point to the same place. But how can you use them for data science when they look so different to a computer?

Plus Codes

The best solution would be to use something like Plus Codes, an open encoding that turns a latitude/longitude pair into a short code identifying a location. Anyone can generate them. They don't even need Google, because the encoding is simply an algorithm that anyone can implement.
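To show that Plus Codes really are "just an algorithm", here is a minimal Python sketch of the Open Location Code encoding for a standard 10-character code. It follows our reading of the public Open Location Code scheme (a 20-symbol alphabet, alternating latitude/longitude digits in pairs, a "+" after the eighth character) and deliberately skips the optional extra grid digits and code shortening. The function name `encode_plus_code` is our own, not part of any library.

```python
import math

# The 20 symbols used by Open Location Code (no vowels).
CODE_ALPHABET = "23456789CFGHJMPQRVWX"
# A 10-character code has 5 digit pairs; the last pair resolves 1/8000 of a degree.
STEPS_PER_DEGREE = 20 ** 3  # 8000


def encode_plus_code(lat: float, lng: float) -> str:
    """Encode a latitude/longitude pair as a 10-character Plus Code (sketch)."""
    # Clamp latitude and wrap longitude into [-180, 180).
    lat = min(max(lat, -90.0), 90.0)
    lng = (lng + 180.0) % 360.0 - 180.0

    # Work in whole 1/8000-degree steps to avoid floating-point drift per digit.
    lat_val = math.floor((lat + 90.0) * STEPS_PER_DEGREE)
    lng_val = math.floor((lng + 180.0) * STEPS_PER_DEGREE)
    # Latitude 90 would produce an out-of-range top digit, so keep it in the last cell.
    lat_val = min(lat_val, 180 * STEPS_PER_DEGREE - 1)

    # Peel off base-20 digits from least to most significant.
    digits = []
    for _ in range(5):
        digits.append(CODE_ALPHABET[lng_val % 20])
        digits.append(CODE_ALPHABET[lat_val % 20])
        lng_val //= 20
        lat_val //= 20

    # Reverse so each pair reads latitude digit, then longitude digit.
    code = "".join(reversed(digits))
    return code[:8] + "+" + code[8:]
```

For example, `encode_plus_code(47.365590, 8.524997)` returns `"8FVC9G8F+6X"`. Converting to integer steps once, instead of repeatedly dividing floats, keeps points that sit exactly on a cell boundary from landing in the wrong cell.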

But where do we get our coordinates from? Do we want to rely on external parties like Google?

Our Idea: GNA Codes

First, take a look at G-NAF.

The Geocoded National Address File (G-NAF) lists the addresses in Australia plus their latitude/longitude coordinates. We can derive Plus Codes from these coordinates; we call them GNA Codes.

GNA Parser

To produce GNA Codes, we first clean addresses with our parser. It standardises punctuation, capitalisation, flat/unit notation, level/floor styles and the like into a normalised address format.

No machine learning is involved. It is a simple rule-based algorithm that follows the Australia Post addressing rules, and it is our biggest contribution to this project. More details about the algorithm are on our GitHub page.
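To give a feel for the rule-based approach, here is a heavily simplified Python sketch that maps the three example addresses above onto the same structured fields. It is not our actual parser: the abbreviation tables are a tiny hand-picked subset in the spirit of the Australia Post abbreviations list, the field names are hypothetical, and level/floor handling is left out.

```python
import re

# Hypothetical, tiny subset of abbreviation rules; a real parser needs the full lists.
STREET_TYPES = {"STREET": "ST", "ST": "ST", "ROAD": "RD", "RD": "RD"}
STATES = {"VICTORIA": "VIC", "VIC": "VIC"}
UNIT_WORDS = {"UNIT", "FLAT", "APARTMENT", "APT"}


def parse_address(raw: str) -> dict:
    """Split a free-text address into normalised fields (simplified sketch)."""
    # Upper-case and turn commas/full stops into spaces.
    text = re.sub(r"[.,]", " ", raw.upper())
    # Rewrite "6B/12" as "UNIT 6B 12" so every unit style looks the same.
    text = re.sub(r"\b(\w+)\s*/\s*(\d+\w*)\b", r"UNIT \1 \2", text)
    tokens = text.split()

    fields = dict.fromkeys(
        ["unit", "number", "street", "street_type", "suburb", "state", "postcode"]
    )
    i = 0
    if tokens[0] in UNIT_WORDS:  # "UNIT 6B", "FLAT 6B", ...
        fields["unit"] = tokens[1]
        i = 2
    fields["number"] = tokens[i]
    i += 1

    # Street name runs until the first recognised street type.
    street = []
    while i < len(tokens) and tokens[i] not in STREET_TYPES:
        street.append(tokens[i])
        i += 1
    fields["street"] = " ".join(street)
    if i < len(tokens):
        fields["street_type"] = STREET_TYPES[tokens[i]]
        i += 1

    # Suburb runs until a state name or a numeric postcode.
    suburb = []
    while i < len(tokens) and tokens[i] not in STATES and not tokens[i].isdigit():
        suburb.append(tokens[i])
        i += 1
    fields["suburb"] = " ".join(suburb) or None
    if i < len(tokens) and tokens[i] in STATES:
        fields["state"] = STATES[tokens[i]]
        i += 1
    if i < len(tokens) and tokens[i].isdigit():
        fields["postcode"] = tokens[i]
    return fields
```

With this sketch, "Unit 6B, 12 Argyle St. MELBOURNE VIC 3002" and "Flat 6B 12 Argyle STREET Melbourne Victoria 3002" produce identical fields, and "6b/12 argyle st melbourne" agrees on every field it contains but has no state or postcode. Rules this naive break on names like "12 St Kilda Rd", where "ST" is read as the street type, which is why real addressing rules need more context than a lookup table.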

Once addresses are cleaned, they can be matched reliably against the G-NAF database to retrieve their coordinates, from which we calculate the GNA Codes.
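Matching a cleaned address against G-NAF can be sketched as a filter over address records. This is a simplified illustration, not our implementation: `GnafRecord` is a hypothetical flattened row (the real G-NAF release spreads address parts and geocodes across several related tables), and we assume that unit, number and street must match exactly while a missing suburb, state or postcode acts as a wildcard. Only a single unambiguous match returns coordinates, which would then be encoded as a Plus Code to give the GNA Code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Fields that must agree exactly, and fields that may be missing from the input.
EXACT_FIELDS = ("unit", "number", "street", "street_type")
OPTIONAL_FIELDS = ("suburb", "state", "postcode")


@dataclass(frozen=True)
class GnafRecord:
    """Hypothetical flattened G-NAF row (real G-NAF uses several joined tables)."""
    unit: Optional[str]
    number: str
    street: str
    street_type: str
    suburb: str
    state: str
    postcode: str
    latitude: float
    longitude: float


def matches(parsed: dict, record: GnafRecord) -> bool:
    """True if the record agrees with every field the cleaned address provides."""
    if any(parsed.get(f) != getattr(record, f) for f in EXACT_FIELDS):
        return False
    return all(
        parsed.get(f) is None or parsed[f] == getattr(record, f)
        for f in OPTIONAL_FIELDS
    )


def lookup_coordinates(
    parsed: dict, records: List[GnafRecord]
) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for a unique match, else None."""
    found = [r for r in records if matches(parsed, r)]
    if len(found) != 1:
        return None  # no match, or ambiguous (e.g. same street in two suburbs)
    return found[0].latitude, found[0].longitude
```

A linear scan is fine for a demo; with the full G-NAF you would first index records by something like (number, street, street type) so each lookup only compares a handful of candidates.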

The Demo

You can try our GNA Parser at https://hipcoder.com.

With a web portal like this, stakeholders can look up the GNA Code for their address. Those with ambiguous addresses, such as "corner of two streets" addresses, could use such a portal to nominate a precise location, which could then be standardised for use in data science and elsewhere.

As a first step, though, the algorithm alone would already greatly help standardisation.

How it can benefit data science

In summary, GNA Codes make datasets from different sources interoperable, all without relying on external services like Google.
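Once each dataset carries a GNA Code column, connecting them becomes an ordinary key join. A minimal Python sketch, assuming rows are plain dictionaries with a hypothetical `gna_code` field (any table library's join would do the same job):

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def join_on_gna_code(left: Iterable[dict], right: Iterable[dict],
                     key: str = "gna_code") -> List[dict]:
    """Inner-join two lists of row dicts on their GNA Code column."""
    # Index the right-hand rows by code; several rows may share one location.
    by_code: Dict[str, List[dict]] = defaultdict(list)
    for row in right:
        if row.get(key):
            by_code[row[key]].append(row)

    joined = []
    for row in left:
        # Rows without a code (e.g. unresolved addresses) simply find no partner.
        for match in by_code.get(row.get(key), []):
            # Right-hand values win on clashing column names (simple sketch).
            joined.append({**row, **match})
    return joined
```

Because the key comes from coordinates rather than address text, rows written as "Flat 6B 12 Argyle STREET" in one dataset and "6b/12 argyle st" in another still meet, as long as both resolved to the same G-NAF address.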


#geocoding algorithm parser

Data Story


We used the G-NAF database to get geocoded locations.

We used the Vicmap Address dataset (https://discover.data.vic.gov.au/dataset/vicmap-address1) and the School Locations 2022 list to test our parser.

We found a lot of ambiguous "corner of" addresses (the corner of two streets), but these need to be handled not just by the algorithm, but also through portals and policy-making.
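Spotting these "corner of" addresses is easy to automate, even if resolving them is not. A hedged Python sketch that flags them with a regular expression; the patterns (CORNER, CNR, and "X St & Y Rd" or "X St and Y Rd" style pairs) are our guesses at common spellings, not an exhaustive rule set:

```python
import re

# Assumed common ways of writing an intersection instead of a street number.
CORNER_PATTERN = re.compile(r"\b(CORNER|CNR)\b|&|\bAND\b", re.IGNORECASE)


def is_corner_address(address: str) -> bool:
    """Flag addresses that describe an intersection rather than a single site."""
    return bool(CORNER_PATTERN.search(address))
```

Flagged rows can then go to manual review, or to a portal where the owner nominates a precise location. The word boundaries keep names like "Anderson St" from being flagged, though a street genuinely named with "and" would still need a human check.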


Evidence of Work

Video

Homepage

Team DataSets

School Locations 2022

Description of Use: We found a number of schools that use "corner of" addresses. Someone should probably tell these schools to use proper, unambiguous addresses! In any case, the data was used to test our parser.


Vicmap Address

Description of Use: Used some of these addresses to validate our parser.


Geoscape Geocoded National Address File (G-NAF)

Description of Use: Used to get the geolocation of addresses once they are cleaned.


Australia Post Address Abbreviations

Description of Use: Used by our parser to standardise addresses.


Challenge Entries

Better use of data - Connecting addresses in datasets

Addresses in datasets are a complicated thing. For example, they may have typos and abbreviations, they might rely on the context of the dataset (is this Melbourne, Australia, or Melbourne, Florida?), and they may be more or less specific. Given an address or dataset containing addresses, how can we discover useful connections with other datasets given a large corpus of potential information?

Eligibility: Participants must use one or more datasets from Data.Vic to be eligible.

5 teams have entered this challenge.