Project Description
I use AI to analyse political debates in the House of Reps, highlighting constructive and divisive moments. By making more data about trends in parliamentary debates available, I aim to improve democracy and showcase responsible AI use.
Motivation
It took me until Saturday lunchtime to settle on a project, and I was largely inspired by the Department of Home Affairs
report "Strengthening Australian democracy: A practical agenda for democratic resilience".
Specifically, some of the mentors explained how important community connectedness is, and that in Australia
we have this incredible asset of a democratic society, which (like all assets) should be looked after.
Reading through the report, I came across the example of Disagree Better:
Disagree Better – Amid intense political polarisation in the United States, the National Governors Association (NGA) is encouraging governors across the country to reduce partisan animosity and ‘disagree better’ by fostering respectful debate and modelling positive ways of working through policy problems. Building on the promising effects of a video which featured two opposing governor candidates advocating for bipartisanship and pro-democratic norms, the NGA has designed a toolkit of customisable public-facing interventions such as organising ‘service projects’ to bring communities together through volunteerism, recording an ad or writing an op-ed with someone from another party, and hosting debates at colleges and universities that model healthy conflict.
That made me think about whether AI could be a useful tool in helping us Disagree Better in Australia. And given
the importance of political discourse in our democracy, using AI to analyse parliamentary discourse seemed like the
logical next step.
While of course what happens in Parliament is political, this project is not meant to be a debate about one party versus
another, but rather about how we could all learn to get along better and set meaningful examples that improve
outcomes for all Australians.
In putting together the interactive dashboards, I have avoided taking an us-versus-them view on anything. The dashboard
allows anyone to explore the dimensions of the data, and see (and understand) how AI has classified the discourse,
and leaves everyone to draw their own conclusions!
Data Story
A big motivation for me in participating in GovHack is to make public datasets more accessible, so as part of this
challenge I am providing a full copy of the JSON version of the Hansard dataset for over 10 years, along with the source code I used to download all the Hansard files and convert them from XML to JSON.
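As a rough illustration of what the XML-to-JSON step involves, here is a minimal Python sketch using only the standard library. The element and attribute names (`sitting`, `speech`, `speaker`, `party`) and the function `sitting_to_dict` are simplified placeholders I made up for this example. They are not the real Hansard schema, which is more deeply nested, and this is not the project's actual conversion code.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified sitting record (NOT the real Hansard schema).
SAMPLE_XML = """
<sitting date="2023-08-01">
  <speech speaker="Member A" party="X">I thank the member for the question.</speech>
  <speech speaker="Member B" party="Y">I rise to speak on the bill.</speech>
</sitting>
"""


def sitting_to_dict(xml_text):
    """Flatten one sitting's XML into a plain dict that json can serialise."""
    root = ET.fromstring(xml_text.strip())
    return {
        "date": root.get("date"),
        "speeches": [
            {
                "speaker": speech.get("speaker"),
                "party": speech.get("party"),
                "text": (speech.text or "").strip(),
            }
            for speech in root.iter("speech")
        ],
    }


record = sitting_to_dict(SAMPLE_XML)
print(json.dumps(record, indent=2))
```

The idea is simply to pull the fields of interest out of the XML tree once, so that everyone downstream can work with flat records instead of re-parsing each sitting.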
This means everyone has a dataset that is far more usable than PDF or XML of individual sittings for several reasons:
- Ease of Analysis and Processing:
- Structured Data: JSON is a structured data format, making it easy to parse and analyze with programming languages and data analysis tools. This allows for efficient searching, filtering, and extraction of specific information. PDFs and XML, while structured in their own ways, require more complex parsing and extraction techniques.
- Machine Readability: JSON is inherently machine-readable, allowing for automated analysis and integration with other datasets or applications. This enables researchers, journalists, and developers to extract insights and patterns from the Hansard data without manual intervention. PDFs and XML, while machine-readable to some extent, require more complex processing for machine understanding.
- Data Integration and Interoperability:
- Standard Format: JSON is a widely adopted standard for data exchange, making it easily compatible with various systems and applications. This allows for seamless integration of the Hansard data with other datasets or tools for comprehensive analysis and research. PDFs and XML, while standard formats, require more complex transformations for integration with other datasets or tools.
- API Friendliness: JSON is commonly used in APIs, making it easy to access and interact with the Hansard data programmatically. This enables developers to build applications and services that leverage the Hansard data for various purposes. PDFs and XML, while accessible through APIs, require more complex handling for programmatic interaction.
- Efficiency and Scalability:
- Compact Size: JSON is a relatively compact data format, making it efficient to store and transmit large volumes of data. This matters for a dataset spanning over 10 years of Hansard transcripts, which would typically be larger in PDF or XML format.
- Database Compatibility: JSON is natively supported by many modern databases, making it easy to store and manage the Hansard data for efficient querying and retrieval. PDFs and XML, while storable in databases, require more complex handling and indexing for efficient querying.
- Accessibility and Openness:
- Text-Based: JSON is a text-based format, making it accessible to a wide range of users and tools. This promotes openness and transparency, enabling more people to access and analyze the Hansard data for various purposes. PDFs require specific software for viewing and text extraction, and XML, while also text-based, is more verbose and usually needs a dedicated parser for analysis.
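To make the ease-of-analysis point concrete, here is a small Python sketch of how JSON records can be searched and counted in a few lines. The field names (`speaker`, `text`), the sample records, and the helper functions are assumptions made up for this example; the published dataset's actual structure may differ.

```python
import json
from collections import Counter

# Hypothetical records for illustration only; not taken from the real dataset.
HANSARD_JSON = """
[
  {"speaker": "Member A", "text": "I thank the member for the question."},
  {"speaker": "Member B", "text": "I rise to speak on the Bill."},
  {"speaker": "Member A", "text": "This bill deserves bipartisan support."}
]
"""

speeches = json.loads(HANSARD_JSON)


def speeches_mentioning(records, keyword):
    """Return the records whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [r for r in records if needle in r["text"].lower()]


def count_by_speaker(records):
    """Count how many records each speaker contributed."""
    return Counter(r["speaker"] for r in records)


bill_speeches = speeches_mentioning(speeches, "bill")
print(len(bill_speeches), "speeches mention 'bill'")
print(count_by_speaker(speeches).most_common())
```

Because the records are already plain lists and dictionaries after `json.loads`, the same approach carries over to loading them into a data frame or a database.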
In summary: A JSON version of the Hansard dataset for over 10 years offers significant advantages in terms of ease of analysis, data integration, efficiency, and accessibility. It enables researchers, journalists, developers, and the public to leverage the vast amount of parliamentary information for various purposes, promoting transparency, accountability, and informed decision-making.