Project Description
Deliver best-in-class data regulation and curation, data governance, and privacy and security by deploying a downloadable, customisable open source GPT model and training the AI on open government data.
Most people and organisations today use publicly available GPT models and feed them personal, behavioural, and organisational data, without realising that they are reinforcing the providers' monopoly and global data dominance.
To give an analogy: it resembles the time when everyone was quick to hand their personal data to search engines and social networks. In the meantime, these overseas entities have become too large and powerful to control and regulate.
Our ambitious proposal is simple: create and train our own Australian Government GPT models to deliver efficiency to Australians, while our local governments retain control over regulating and curating what the model is trained on.
This GovHack project seeks to harness the power of open government health-related data and cutting-edge artificial intelligence (AI) technology to provide the public with an innovative and accessible platform for information retrieval.
By integrating an advanced open source GPT model such as H2O.ai's h2oGPT, this project aims to revolutionise how individuals find and process health information as a sample case, thereby contributing to improved public health outcomes and knowledge dissemination.
Given that privacy and security are a concern with a public GPT model, our proposal is to use an open source AI model backed by a private database, which gives full control over data curation and AI training.
By leveraging open government health datasets, the project addresses the increasing demand for accurate, up-to-date information among the public.
The deck used for the presentation can be seen here:
https://docs.google.com/presentation/d/1vkDSUEF_nUgN8eskd8q2gTfsGmloyjq6YC9h61CwmIg/
Data Story
Our team used contextual data from health.gov.au to train a GPT model (h2o.ai) to answer questions that help define various health and medical management terms.
This open source technology can be customised for different organisations and can run on local machines, so it can be tailored to each organisation's specific needs. It also supports connecting to custom cloud data servers for data use and storage.
For the video demo, h2oGPT was run on our local machine, and contextual, language-based data from health.gov.au was used to train the LLM.
The question-and-answer demo in the video shows that the LLM performs competitively. Besides a contextual Q&A bot for health-related matters, initial use cases we can see include a Centrelink-GPT (social security) and an ATO-GPT (taxation), among many others.
A challenge we have identified is the model's inability to read charts and images, which are critical for conveying data and insights. A proposed short-term solution is to clean up the data using foundational rule-based data cleaning techniques such as concatenation and expansion. This can be done with tools such as Microsoft Excel or with code such as Python. A long-term solution would be the integration of other models such as an image interpretation, or a
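The rule-based cleanup mentioned above could look roughly like the following Python sketch. It rests on assumptions not stated in the project: "concatenation" is read here as joining the cells of a table row into one sentence-like string, and "expansion" as replacing abbreviations with their full forms. The function names and the small abbreviation table are hypothetical and only for illustration.

```python
import re

# Hypothetical abbreviation table (an assumption for this sketch);
# a real one would be built by whoever curates the dataset.
ABBREVIATIONS = {
    "GP": "general practitioner",
    "PBS": "Pharmaceutical Benefits Scheme",
    "MBS": "Medicare Benefits Schedule",
}


def expand_abbreviations(text, table=ABBREVIATIONS):
    """Expansion: replace known abbreviations (whole words only) with full forms."""
    if not table:
        return text
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in table) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)


def concatenate_row(row, columns):
    """Concatenation: join the non-empty cells of a row into one text string."""
    parts = []
    for col in columns:
        value = str(row.get(col, "")).strip()
        if value:  # skip missing or blank cells
            parts.append(f"{col}: {value}")
    return "; ".join(parts) + "." if parts else ""


def clean_rows(rows, columns):
    """Apply concatenation, then expansion, to every row."""
    return [expand_abbreviations(concatenate_row(r, columns)) for r in rows]
```

Each row of a chart's underlying table becomes a plain sentence that a text-only LLM can ingest. It does not recover information that exists only in an image.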
Another challenge we have identified is the time needed to train the model, as all GPT models take time to train. As a metaphor: once downloaded and used, this GPT starts out as a "baby". It needs to be fed the right data before it can respond to queries. While this is a perpetual, natural challenge for GPT models, it can be turned into a massive opportunity to regulate the data that is being "ingested" by the LLM during training.
The key difference of this proposal is the use of a private GPT model that accesses secure, curated databases, which allows the data to be regulated and curated. This is unlike most GPT setups available at the time of this proposal.
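The idea of a private model that only sees curated data can be illustrated with a minimal Python sketch. This is not how h2oGPT works internally and not the team's implementation. The `Document`, `CuratedStore`, and `build_prompt` names, the `approved` flag, and the keyword-overlap scoring are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    source: str
    text: str
    approved: bool = False  # hypothetical flag set by a human curator


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class CuratedStore:
    """A private document store; only curator-approved documents are retrievable."""

    def __init__(self):
        self._docs = []

    def add(self, doc):
        self._docs.append(doc)

    def retrieve(self, question, top_k=1):
        """Return up to top_k approved documents ranked by shared-word count."""
        q = tokenize(question)
        scored = [(len(q & tokenize(d.text)), d) for d in self._docs if d.approved]
        scored = [(score, d) for score, d in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for _, d in scored[:top_k]]


def build_prompt(question, docs):
    """Build the text handed to a locally hosted model (the model call is out of scope)."""
    context = "\n".join(f"[{d.source}] {d.text}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In this sketch the curation step is the `approved` flag: unapproved material never reaches the prompt, so what the model can draw on is decided by whoever controls the store. A production setup would likely replace the keyword overlap with embedding-based search.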
With the sample integration of a private GPT and language model, the project aims to revolutionise how individuals find and process health information, thereby contributing to improved public health outcomes and knowledge dissemination.
By leveraging open government health datasets, the project addresses the increasing demand for accurate, up-to-date health information among the public. With the AI's natural language processing capabilities, users can engage in intuitive and human-like conversations to inquire about health topics, receive guidance on wellness practices, and access relevant resources. This approach is poised to bridge the gap between complex health data and layperson comprehension, enabling users to make informed decisions about their health and well-being.
The proposed integration of AI and open data aligns with the principles of transparency, accessibility, and citizen empowerment promoted by GovHack. This project serves as a testament to the potential of collaboration between technology and public data to enhance public services and information delivery. As we navigate a rapidly evolving health landscape, this initiative paves the way for a more informed and empowered society by transforming data into actionable insights that impact individuals' lives.