FunnelCat

Project Info

Team Members


Maedi, Tayela Prichard, Tim Clark

Project Description


The Problem

"addresses were originally created for human-to-human communication within a given context, not for data science. How can we solve addresses and more easily connect our datasets?"

  • An address that is human-readable isn't understood by machines
  • An address that is machine-readable isn't understood at all by humans

Existing models build a parallel world to mirror ours, in which addresses are IDs associated with latitudes and longitudes. The user must then fuzzy-search for their address and select a pre-made entity. When the address is not found, the system breaks down: user input is not easily integrated into this 'non-matching' model of the real world. In other words, real-world address data does not match existing geolocation data.

The Solution

FunnelCat bridges the gap between human-readable and machine-readable locations. Any location can quickly be converted to a structured hierarchy that represents the proximity of one address to another. The more addresses that are entered into the system (regardless of their format), the more accurate the results become.

Because a street address is an entirely human construct, we need a model centred on entities that are close to other entities, rather than imposing a narrow lat/long pinpoint model and requiring addresses to match it strictly. Human beings create new addresses all the time; the address comes first, and only then do we distill a place down to a single point. Let's work with the data as it's generated, rather than reverse-engineering our pre-built models to match a string.

After all, an address is more than a point in space; it's a name needed for memorisation and directions. Have you ever given directions and said to start at Latitude: 37.818637° S, Longitude: 144.9637° E, then take a right at 37.8150° S, 144.9665° E, bearing 60 degrees north until you hit Town Hall?

FunnelCat highlights what can be achieved by "going back to basics": simple human-readable addresses that can quickly be converted to machine-readable addresses, which are then easily categorised and geolocated.


#address #geolocation

Data Story


Data Format

Addresses are entered as plain strings in traditional human-readable formats, such as:
- 1 Milky Road, Beautiful Town, Victoria, Australia
- 7 Sensational Avenue, Korma Creek, Victoria

They are then converted into their most basic parts (for example street number, street name, town, state and country).
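A minimal sketch of this conversion, assuming the input is comma-separated with the street first and broader regions after it. The function name `split_address` and the field names are hypothetical and are not taken from the FunnelCat code.

```python
import re


def split_address(address):
    """Split a comma-separated address string into labelled parts.

    Hypothetical helper: assumes "<number> <street>, <region>, <region>, ...".
    """
    parts = [p.strip() for p in address.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty address")

    # Separate a leading street number from the street name, if present.
    match = re.match(r"(\d+)\s+(.+)", parts[0])
    if match:
        number, street = int(match.group(1)), match.group(2)
    else:
        number, street = None, parts[0]

    # Everything after the street is kept in order, most specific first.
    return {"number": number, "street": street, "regions": parts[1:]}


example = split_address("1 Milky Road, Beautiful Town, Victoria, Australia")
```

Here `example["regions"]` keeps the town, state and country in the order they were written, so an address that omits the country simply yields a shorter list.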

Data Structure

Internally FunnelCat is implemented as a search tree.
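One way such a search tree could look is sketched below, assuming a nested dictionary keyed from the broadest part of an address (country) down to the most specific (street). The names `to_path`, `insert` and `find` are illustrative assumptions, not FunnelCat's actual API.

```python
def to_path(address):
    """Return the address parts ordered from broadest to most specific."""
    parts = [p.strip().lower() for p in address.split(",") if p.strip()]
    return parts[::-1]


def insert(tree, address):
    """Add an address to the tree, creating one nested level per part."""
    node = tree
    for part in to_path(address):
        node = node.setdefault(part, {})


def find(tree, address):
    """Return every full path in the tree that ends with the given address.

    The query may omit broader context (for example the country), so the
    match is attempted starting from every node, not only from the root.
    """
    path = to_path(address)
    if not path:
        return []
    matches = []

    def walk(node, prefix):
        current = node
        for part in path:
            if part not in current:
                break
            current = current[part]
        else:
            matches.append(prefix + path)
        for key, child in node.items():
            walk(child, prefix + [key])

    walk(tree, [])
    return matches


tree = {}
insert(tree, "1 Milky Road, Beautiful Town, Victoria, Australia")
insert(tree, "7 Sensational Avenue, Korma Creek, Victoria, Australia")
hits = find(tree, "7 Sensational Avenue, Korma Creek, Victoria")
```

In this sketch a query with a single hit is resolved immediately, while a query returning more than one path is ambiguous and would need further handling.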

What happens when multiple addresses match when funnelling data through the tree?

When multiple matches are possible, the address is flagged for manual intervention. Realistically, most datasets will have enough context, such as country and state, to keep manual intervention to a minimum. Other techniques can improve accuracy on big datasets. For example, the unit numbers and street numbers on a given street fall within a particular range. When a new address arrives without a country or state, and its number is within the range of one candidate match but outside the range of another, it most likely belongs to the first.
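The range heuristic can be sketched as follows, assuming each candidate street already has a known (lowest, highest) street number collected from earlier addresses. The function `disambiguate` and the sample ranges are hypothetical and only illustrate the idea.

```python
def disambiguate(number, candidates):
    """Pick the single candidate whose known number range contains `number`.

    `candidates` maps a candidate label to a (low, high) tuple of street
    numbers seen so far. Returns None when zero or several candidates fit,
    meaning the address should be flagged for manual intervention.
    """
    in_range = [
        label
        for label, (low, high) in candidates.items()
        if low <= number <= high
    ]
    if len(in_range) == 1:
        return in_range[0]
    return None


# Made-up ranges for two streets that share a name and town.
candidates = {
    "Main Street, Melbourne, Victoria": (1, 250),
    "Main Street, Melbourne, Florida": (1000, 4200),
}
choice = disambiguate(120, candidates)
```

With these made-up ranges, number 120 fits only the Victorian street, so that candidate is chosen; a number that fits neither range, or both, returns `None` and falls back to manual review.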


Evidence of Work

Video

Project Image

Team DataSets

School Locations 2022

Description of Use: This dataset is a series of addresses from which FunnelCat builds a search tree. Although simple, FunnelCat highlights what can be achieved by "going back to basics", by having simple human-readable addresses which can quickly be converted to machine-readable addresses, which are easily categorised and geolocated.

Data Set

Challenge Entries

Better use of data - Connecting addresses in datasets

Addresses in datasets are complicated. For example, they may contain typos and abbreviations, they might rely on the context of the dataset (is this Melbourne, Australia, or Melbourne, Florida?), and they may be more or less specific. Given an address or a dataset containing addresses, how can we discover useful connections with other datasets given a large corpus of potential information?

Eligibility: Participants must use one or more datasets from Data.Vic to be eligible.

5 teams have entered this challenge.