Indigenous Storyteller is a unique project that taps into the power of Open Data and Generative AI to create informative, engaging, and immersive stories based on indigenous cultures. Focusing on Australian indigenous languages, the project uses an open-source dataset provided by the Queensland government, which contains a wealth of information on the local indigenous tongues.
Currently, AI has limited comprehension of these languages due to scarcity of data. This project aims to bridge this gap by providing AI with the necessary data to understand and interpret these languages effectively.
The data input to the system includes language name, pronunciation, common words, geographical locations, images, and other related information. While AI may not produce informative contents independently, this additional data allows it to weave fascinating narratives based on indigenous culture, highlighting the uniqueness and diversity of different tribes' languages and cultures.
Unleashing the Power of Open Data with Generative AI
In this project, our aim was to leverage the power of generative AI and open data to understand and generate content related to indigenous languages in Queensland, Australia. The primary challenge we encountered was the limited amount of available data for these languages, making it difficult for AI systems to comprehend and work with them effectively. To overcome this obstacle, we utilized an open source dataset obtained from the Queensland government's website, specifically the interactive Indigenous languages map.
The Indigenous languages map dataset offers a comprehensive snapshot of indigenous languages in Queensland. It encompasses over 150 Aboriginal and Torres Strait Islander language groups, providing valuable details such as where these languages were spoken, dialects, sample words, and relevant images from the State Library's collection. The dataset was procured as part of the State Library's Indigenous Languages Project, which aims to raise awareness of the linguistic diversity in Queensland and support language research and community language revival efforts.
Applying AI to Indigenous Languages
Due to the scarcity of data for indigenous languages, AI systems have traditionally struggled to comprehend and generate content accurately. However, by integrating this open data into our project, we sought to empower AI algorithms to better understand and work with these languages.
We developed a generative AI system that utilized the Queensland government's Indigenous languages map dataset as an external data source. By training our AI model on this dataset, it gained a deeper understanding of the indigenous languages, enabling it to generate content relevant to these languages with greater accuracy and context.
Potential Impact and Future Directions
Our project's utilization of open data and innovative AI techniques has significant potential impact. Firstly, it enhances our understanding and appreciation of the linguistic diversity and rich cultural heritage of indigenous languages in Queensland. Secondly, it creates opportunities for language research and community-driven language revival initiatives.
By providing AI systems with an extensive dataset of indigenous language information, we hope to contribute to the documentation, preservation, and revitalization of these traditional languages. Furthermore, this project serves as a stepping stone towards AI-driven initiatives that respectfully work alongside indigenous communities, acknowledging their role as the true custodians of language heritage and knowledge.
The project successfully combined generative AI and open data to address the challenge of limited data availability for indigenous languages in Queensland. By leveraging the open source Indigenous languages map dataset, we enabled AI models to better grasp and generate content related to these languages. Our goal is to contribute to language heritage preservation, revival, and community-driven research. This project emphasizes the importance of recognizing and respecting the Traditional Owners, Elders, language custodians, and community members who hold the core ownership of language knowledge.
- Junchen You
- OpenAI August 19 version (GPT-3.5 & GPT-4)