
OpenMind

Project Info

Team Name


BhutekoBhat


Team Members


Mahesh, Bibek

Project Description


OpenMind: Government AI with Uncompromising Accuracy

OpenMind is an accuracy-first AI system designed specifically for government systems. It addresses the critical challenge of deploying AI in governmental contexts where 99%+ accuracy is required and hallucination is unacceptable.

We achieve 99%+ accuracy by using a multi-state model and providing complete transparency into the full flow the system takes to generate a response for each query.

Key Innovations

Dataset-Bounded AI responds only on the basis of the available data. The model politely declines any request that cannot be answered from the given data rather than attempting a response.
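The dataset-bounded behaviour above can be sketched as a simple gate in front of the query pipeline. This is a hypothetical illustration, not the project's actual code: the topic index and function names are assumptions.

```python
# Hypothetical dataset-bounded gate: answer only when the question maps to
# a known dataset topic; otherwise decline politely instead of guessing.
KNOWN_TOPICS = {"budget", "employee", "portfolio"}  # assumed topic index

def answer(question):
    if not any(topic in question.lower() for topic in KNOWN_TOPICS):
        return "Sorry, I can only answer questions about the loaded datasets."
    return "QUERY_DATASET"  # hand off to the SQL tool

answer("What is the capital of France?")       # politely declined
answer("How many employee records are there?")  # handed off to the pipeline
```

In the real system this gate is played by the Relevance Checker Tool described below, backed by the LLM rather than a keyword set.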

The deployment provides a trace for each and every query asked of the model. This is essential in government work for auditing and validating responses.
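A per-query audit trace can be as simple as an append-only log of every state the query passes through. The sketch below is an assumption about what such a trace might record; the field names and sample entries are illustrative only.

```python
import json
import time

# Hypothetical per-query audit trace: each state appends an entry so the
# full path through the system can be reviewed later.
def record(trace, state_name, detail):
    trace.append({"ts": time.time(), "state": state_name, "detail": detail})

trace = []
record(trace, "BasicTool", "SELECT COUNT(*) FROM employees")
record(trace, "AccuracyChecker", "validated by second model")

# Serialise for storage alongside the response, for later auditing.
audit_log = json.dumps(trace, indent=2)
```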

Complete citations and referenced datasets accompany every response, so the user can easily verify the answer or dig deeper into the underlying data.

The model's reasoning is validated by a second, higher-accuracy model to ensure the response stems from correct logic and the correct dataset.

Architecture Diagram

How does it work?

We used LangGraph state machines to orchestrate the whole workflow. The chatbot (the actual LLM) sits in the middle and orchestrates four tools:
- BasicTool (the SQL executor for now)
- Relevance Checker Tool
- Accuracy Checker Tool
- Citation and Sourcing Checker Tool

After running a query through the BasicTool to fetch the data, the chatbot checks the result for relevance and accuracy, then attaches citations to the data. This pipeline increases the accuracy of the platform.
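The flow above can be sketched, without the LangGraph dependency, as a plain-Python state machine. Tool names mirror the document, but every implementation here is a hypothetical stub, not the project's code.

```python
# Plain-Python sketch of the chatbot-orchestrated tool pipeline.
# All tool bodies are stubs standing in for the real implementations.

def basic_tool(state):
    # BasicTool: run the generated SQL against the dataset (stubbed here).
    state["rows"] = [("HR", 138)]
    state["source"] = "hr_employee_dataset"  # assumed source identifier
    return state

def relevance_checker(state):
    # Relevance Checker Tool: confirm the data actually answers the question.
    state["relevant"] = bool(state["rows"])
    return state

def accuracy_checker(state):
    # Accuracy Checker Tool: a second model would validate the reasoning;
    # stubbed as a pass when relevant data exists.
    state["accurate"] = state["relevant"] and bool(state["rows"])
    return state

def citation_tool(state):
    # Citation and Sourcing Checker Tool: attach the dataset references.
    state["citations"] = [state["source"]] if state["accurate"] else []
    return state

def chatbot(question):
    # The chatbot orchestrates the tool states in sequence.
    state = {"question": question}
    for step in (basic_tool, relevance_checker, accuracy_checker, citation_tool):
        state = step(state)
    if not state["accurate"]:
        return "Sorry, I cannot answer that from the available data."
    return f"Answer: {state['rows']} (sources: {state['citations']})"
```

In the actual deployment, LangGraph's graph/conditional-edge machinery replaces this fixed loop, letting the chatbot route between states dynamically.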


Data Story


We evaluated the models with different datasets and prompts and found that grok-4o-mini ran the correct queries at a low token cost. This makes the solution affordable and effective.

We also wrote a simple evaluation framework, eval.py, which takes a ground truth and checks whether the model's response matches it.
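eval.py itself is not shown on this page; the following is a minimal sketch of what such a ground-truth harness might look like. The function names, normalisation rule, and sample case are all assumptions.

```python
# Hypothetical eval.py-style harness: compare model responses against
# ground-truth answers and report the fraction that match.

def normalize(text):
    # Compare answers case- and whitespace-insensitively.
    return " ".join(text.lower().split())

def evaluate(cases, ask):
    # cases: list of (question, ground_truth) pairs
    # ask:   callable mapping a question to the model's answer
    correct = sum(normalize(ask(q)) == normalize(truth) for q, truth in cases)
    return correct / len(cases)

cases = [("How many employees are in the dataset?", "138")]
accuracy = evaluate(cases, lambda q: "138")  # stubbed model answer
```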

The data was critical for driving the evaluation framework, for choosing the appropriate model, and for designing the correct stages and states in the LangGraph model.


Evidence of Work

Video

Team DataSets

Budget 2024-2025 and Portfolio Budget Statements (PBS) - Tables and Dataset

Description of Use We used this dataset to evaluate the model's calculation abilities because it contains a lot of numerical data. We curated prompts so that the model can generate queries that compute aggregated values such as min, max, and count over this dataset.
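The kind of aggregate query the model is expected to generate over the budget tables can be illustrated with an in-memory SQLite table. The schema and values below are made up for illustration; the real dataset's structure is not shown on this page.

```python
import sqlite3

# Toy budget table standing in for the real Budget/PBS dataset.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE budget (portfolio TEXT, amount REAL)")
con.executemany(
    "INSERT INTO budget VALUES (?, ?)",
    [("Health", 120.5), ("Education", 98.2), ("Health", 40.0)],
)

# The model generates aggregates like COUNT, MIN, and MAX over such tables.
row = con.execute(
    "SELECT COUNT(*), MIN(amount), MAX(amount) FROM budget"
).fetchone()
# row == (3, 40.0, 120.5)
```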

Data Set

HR / Employee Dataset

Description of Use We used this dataset to evaluate the model and to refine the prompt based on its responses. We achieved 99.3% accuracy in an evaluation over 138 questions prepared from this dataset.

Data Set

Challenge Entries

An Accurate and Trustworthy Chatbot for Data Interactions

How can government agencies deploy conversational and analytical AI to interrogate complex datasets while maintaining the high degree of accuracy and auditability standards required for official decision-making?

Ask your data, trust the answer

Eligibility: Open to all, but special consideration will be given to teams with a local NT lead.

24 teams have entered this challenge.