Marcela - Team leader, Programmer (Python) and data processor
Liam - Programmer (Python, TypeScript, C++)
Leonard - Data processor, project flow multi-tasker, the little bits
Jerem - Video/Audio creator
Claudia - Creative skills: artist / designer, project flow multi-tasker
According to the Australian Electoral Commission (2020), over 16,500,000 Australians are enrolled to vote in national elections. This number continues to grow, as enrolment and voting are compulsory for every Australian citizen aged 18 and over. Politics can be a complex and confusing topic to understand, yet this understanding directly impacts Australia as a country, as well as the lives of every Australian, through the words, actions, and outcomes of the leaders we elect to lead our country.
So, do you know what’s happening in parliament and what’s being discussed? Is it based on what you’ve heard in the media? Or something your family and friends have said? Maybe you’re not too sure. It’s easy to get caught up in what different sources say, and it can be quite a challenge to know exactly what politicians have actually said. Don’t worry, For the Record is here to help!
For the Record is an interactive prototype that uses publicly available government data to let users search for key topic words of interest. It visualises the data across a timeline, making it easy to follow the trend of topics discussed in parliament through the years and to see who spoke about them. Users can easily access the transcripts that contain the topics they are interested in learning about.
Our data is sourced from ‘PM Transcripts’ on the website of the Australian Government’s Department of the Prime Minister and Cabinet. The data is visualised and simple to follow, making For the Record user friendly, accurate and reliable.
Voting is a crucial right and responsibility that Australians share, shaping the future of Australia and our everyday lives. Politics is confusing enough as it is. For the Record ignores the noise generated by external sources and makes politicians accountable for their own words.
Victoria Challenge: Learning from the Past
How might we use data assets from Victoria and other jurisdictions to better understand bushfire events and their effects on communities and the environment?
Data set: CFA Total Fire Ban History since 1945
Hint and tip: Try searching "fire" for a report of when "fire" has been mentioned in the transcripts, as well as a visual display of fire bans over the years
Victoria Challenge: Citizen Science
How might we create a citizen science experiment to support a better understanding of what is happening in the State of Victoria?
Australia (National) Challenge: The language of leadership
In times of crisis words can inspire and unite us, but they can also provoke division and conflict. How has the language of Australia’s leaders changed over time? How can we represent these changes in public discourse within a historical timeline?
Data set: PM Transcripts repository
Region: International: Awareness, understanding and respect – How can Open Data help the #BLM movement?
The Black Lives Matter (#BLM) movement is not new, and neither are racial injustices. However, in 2020 a series of racially motivated deaths, brutality and profiling in the US sent shockwaves around the world. Over 15 million people took to the streets around the world to protest and demand change. What can Open Government Data do to help the movement?
Hint and tip: Try searching "Aboriginal" to see how often these topics are discussed in the transcripts
Our demonstrable product.
When assessing the datasets, we decided to make “For the Record”, a tool for cross-referencing when topics were mentioned in the Prime Ministers' transcripts. The website of our chosen dataset, 'pmtranscripts.pmc.gov.au', provides a search function for looking up key words across all 41,740 transcripts. We asked ourselves how we could take this further, and envisioned a tool that makes extracting, visualising and interacting with this information easier.
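The cross-referencing idea can be illustrated with a minimal sketch that counts, per year, how many transcripts mention a search term. The record layout (`date` and `text` fields), the sample entries and the `mentions_per_year` helper are all assumptions made for illustration, not the project's actual code or data.

```python
from collections import Counter

# Hypothetical sample records; the real transcripts and field names may differ.
transcripts = [
    {"date": "1996-05-10", "text": "Gun control and national firearms laws."},
    {"date": "1996-07-02", "text": "Further remarks on gun ownership."},
    {"date": "2019-11-12", "text": "Cyber security and bushfire response."},
]


def mentions_per_year(records, keyword):
    """Count the transcripts per year whose text contains the keyword."""
    needle = keyword.lower()
    counts = Counter()
    for record in records:
        # Case-insensitive substring match on the transcript text.
        if needle in record["text"].lower():
            year = int(record["date"][:4])  # assumes ISO dates (YYYY-MM-DD)
            counts[year] += 1
    return dict(sorted(counts.items()))


print(mentions_per_year(transcripts, "gun"))
```

A plain substring match keeps the sketch short, but it would also count words such as "begun" for the term "gun"; matching on whole words would be a natural refinement.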
XML files containing these transcripts were available for download and could be manipulated with Python. We cleaned the files and stored their contents in new data structures. The length of some of these transcripts initially caused Excel cell overflow, so we moved to JSON as our preferred data format. We found that extracting close to 42,000 transcript entries would produce around 25 gigabytes of data, which meant we would need to use an external database.
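As a rough illustration of the XML-to-JSON step, the sketch below parses one transcript and turns it into a JSON-ready record using only the Python standard library. The element names (`transcript-id`, `title`, `prime-minister`, `release-date`, `content`), the sample document and the `transcript_to_record` helper are assumptions; the real PM Transcripts schema may differ.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample; the real PM Transcripts XML may use other element names.
SAMPLE_XML = """<transcript>
  <transcript-id>12345</transcript-id>
  <title>Press Conference</title>
  <prime-minister>Example, Pat</prime-minister>
  <release-date>01/02/1996</release-date>
  <content>Good morning.   Today we discuss
  gun laws.</content>
</transcript>"""


def transcript_to_record(xml_text):
    """Parse one transcript document into a flat dictionary of strings."""
    root = ET.fromstring(xml_text)

    def field(tag):
        # A missing element becomes an empty string; runs of whitespace collapse.
        return " ".join(root.findtext(tag, default="").split())

    return {
        "id": field("transcript-id"),
        "title": field("title"),
        "prime_minister": field("prime-minister"),
        "release_date": field("release-date"),
        "content": field("content"),
    }


record = transcript_to_record(SAMPLE_XML)
print(json.dumps(record, indent=2))
```

Unlike a spreadsheet cell, a JSON string has no fixed length limit, which is why long transcripts fit without overflow.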
We had many AWS services to choose from, such as Elasticsearch, S3, RDS, DynamoDB and Comprehend. However, some of these services, like Comprehend, took a while to process the often large transcripts, which left our program slow. Our next step forward had to be a step back: we re-evaluated our options and decided to use Python to pre-process our data.
By loading the XML files into Python directly, we were able to isolate and extract the key data from the transcripts themselves. These raw transcripts were then processed with the automatic keyword extraction algorithm provided by the “rake-nltk” package.
Functionally, this tool quickly removes the most commonly used English words and certain punctuation, leaving the key statements and phrases made throughout the transcripts.
The library also makes use of a novel and independent method for automatically extracting keywords, as sequences of one or more words, from individual documents. The algorithm behind this list generation is described in the paper "Automatic Keyword Extraction from Individual Documents", available on ResearchGate. It was really interesting to watch how different terms appeared at different times throughout Australian history. For example, phrases containing the word “gun” were prevalent in the 1990s, while cyber security buzzwords are more common in modern-day Australia.
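To show the idea behind this kind of keyword extraction, here is a simplified, standard-library sketch of the RAKE approach: text is split into candidate phrases at stopwords and punctuation, each word is scored by its degree divided by its frequency, and a phrase scores the sum of its word scores. This is not the rake-nltk implementation. The tiny stopword list, the sample sentence and the function names are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption); rake-nltk relies on NLTK's list.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on",
             "our", "the", "to", "we", "will"}


def candidate_phrases(text):
    """Split text into runs of non-stopwords, breaking at punctuation."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts every word in the phrase, including the word itself.
            degree[word] += len(phrase)
    scores = {p: sum(degree[w] / freq[w] for w in p) for p in set(phrases)}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


text = "We will strengthen national gun laws. The gun laws are a priority for the government."
for phrase, score in rake(text):
    print(f"{score:5.1f}  {' '.join(phrase)}")
```

Longer phrases made of words that co-occur often rise to the top, so "strengthen national gun laws" ranks above the single words "priority" and "government" in this sample.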
Currently, For the Record exists as a prototype for GovHack.