Machine learning based entity resolution to the rescue
Databases almost always end up with duplicate entries, but machine learning based entity resolution (a.k.a. record linkage, fuzzy matching, etc.) can help resolve them.
Entity resolution typically requires:
1) Deduplication (removal of exact copies of records)
2) Record Linkage (identifying records that may reference the same business)
3) Canonicalization (ensuring data with more than one representation are in a standardised form)
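The three steps above can be sketched in a few lines of Python. This is only an illustration using the standard library's difflib, not the machine learning approach that csvdedupe takes; the function names, the 0.85 similarity threshold, and the sample business names are all assumptions made up for the example.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case and collapse whitespace so trivial variations compare equal."""
    return " ".join(name.lower().split())


def deduplicate(records):
    """Step 1: drop exact copies, keeping the first occurrence of each record."""
    return list(dict.fromkeys(records))


def link(records, threshold=0.85):
    """Step 2: greedily group records whose normalised names look alike.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    clusters = []
    for record in records:
        for cluster in clusters:
            score = SequenceMatcher(
                None, normalise(record), normalise(cluster[0])
            ).ratio()
            if score >= threshold:
                cluster.append(record)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([record])
    return clusters


def canonicalise(cluster):
    """Step 3: pick one representative per cluster (here, the longest spelling)."""
    return max(cluster, key=len)


# Hypothetical sample data, not taken from the challenge dataset.
records = ["Acme Pty Ltd", "Acme Pty Ltd", "ACME PTY LTD.", "Widget Co"]
unique = deduplicate(records)
clusters = link(unique)
canonical = [canonicalise(c) for c in clusters]
```

A pairwise string comparison like this only scales to small datasets; tools such as csvdedupe instead learn which fields matter from labelled training pairs.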
Only steps 1 and 2 were addressed during this challenge. Out of 47404 records, 1920 unique businesses were identified using csvdedupe (https://github.com/dedupeio/csvdedupe).
Perhaps you can even use this during form filling and validation to reduce any further duplicates.
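As a rough idea of what a form validation check might look like, the sketch below compares a newly entered business name against existing ones before it is saved. It uses difflib.get_close_matches from the Python standard library rather than csvdedupe, and the function name, cutoff, and sample names are hypothetical.

```python
from difflib import get_close_matches


def possible_duplicates(new_name, existing_names, cutoff=0.85):
    """Return existing names that closely resemble the name being entered.

    The cutoff is an illustrative assumption; a real form would need tuning.
    """
    # Map a normalised spelling back to the first original spelling seen.
    lookup = {}
    for name in existing_names:
        lookup.setdefault(" ".join(name.lower().split()), name)

    query = " ".join(new_name.lower().split())
    hits = get_close_matches(query, list(lookup), n=5, cutoff=cutoff)
    return [lookup[hit] for hit in hits]


# Hypothetical existing entries, not taken from the challenge dataset.
existing = ["Acme Pty Ltd.", "Widget Co"]
warnings = possible_duplicates("acme pty ltd", existing)
no_warnings = possible_duplicates("Zenith Holdings", existing)
```

If the returned list is non-empty, the form could ask the user to confirm whether they mean one of the existing businesses before creating a new record.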
NB. Excel was used for step 1 and csvdedupe for step 2. Since csvdedupe is simply a CLI program, the only evidence of work is the training data generated by the program.
Evidence of Work
IP Australia Govhack 2018 sample data
Bounty: Finding all the like needles in the haystack