The international AI community has long wrestled with the limitations of English-centric benchmarks. As OpenAI notes, about 80% of the world’s population does not speak English as their primary language. In a country like India—with 22 official languages, many regional tongues, and code-mixed forms like “Hinglish”—this linguistic richness presents a major challenge for AI systems.
Existing multilingual benchmarks focus mainly on translation tasks or multiple-choice formats and are often saturated, with many models achieving near-ceiling performance. This makes them poor instruments for measuring real progress in understanding culture, nuance, and everyday reasoning in non-English contexts.
What Is IndQA?
IndQA (short for “Indian Questions & Answers”) is OpenAI’s new benchmark, launched in November 2025, designed to evaluate how well AI models understand, reason, and respond to prompts grounded in Indian languages and cultural contexts.
Key features:
- Contains 2,278 questions, created natively (not simply translated) in Indian languages.
- Covers 12 languages: Bengali, English, Hindi, Hinglish (code-mixed), Kannada, Marathi, Odia, Telugu, Gujarati, Malayalam, Punjabi, and Tamil.
- Spans 10 cultural domains: Architecture & Design, Arts & Culture, Everyday Life, Food & Cuisine, History, Law & Ethics, Literature & Linguistics, Media & Entertainment, Religion & Spirituality, and Sports & Recreation.
- Developed in collaboration with 261 domain experts, including journalists, linguists, artists, and scholars; each question is accompanied by an ideal answer and a grading rubric.
- Employs a rubric-based scoring system: Each response is evaluated against expert-defined criteria with weighted points, rather than simple “right/wrong” grading.
- Uses adversarial filtering: only questions that prior state-of-the-art models struggled with were retained, leaving clear headroom for improvement.
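The adversarial-filtering step above can be sketched in a few lines of Python. Everything here is illustrative: the threshold, the question IDs, and the prior-model scores are invented for the example, not taken from IndQA's actual pipeline.

```python
# Illustrative sketch of adversarial filtering (hypothetical data and
# threshold; not IndQA's actual pipeline or numbers).

def adversarial_filter(questions: list[str],
                       prior_scores: dict[str, list[float]],
                       threshold: float = 0.5) -> list[str]:
    """Keep only questions whose best prior-model rubric score (in [0, 1])
    falls below the threshold, i.e. questions current models struggle with."""
    return [q for q in questions if max(prior_scores[q]) < threshold]

prior_scores = {
    "q1": [0.90, 0.85],  # prior models already handle this well: drop it
    "q2": [0.30, 0.45],  # every prior model struggled: keep it
}
print(adversarial_filter(["q1", "q2"], prior_scores))  # ['q2']
```

Filtering on the best prior score, rather than the average, means a question survives only if no existing model handles it well, which is what keeps the benchmark hard.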
Why It Matters
For the Indian Context & Beyond
India is one of the world’s most linguistically diverse nations and a major market for AI tools. A benchmark like IndQA signals a crucial shift: from asking “Can the model translate?” to “Can the model understand the culture, nuance, and lived experience of the user?”
For AI Systems and Developers
For developers, data scientists, and SaaS builders across South Asia and other multilingual regions, this benchmark highlights several imperatives:

- Multilingual support is no longer optional—models must handle regional languages, code-mixing, and colloquial usage.
- Cultural grounding is essential: a model may know facts, but can it reason using local literary references, food culture, media idioms, or religious contexts?
- Evaluation sophistication: rubric-based scoring emphasizes depth, reasoning, and relevance—not just keyword accuracy.
- Localized datasets: benchmarking against region-specific data helps identify performance gaps for real-world optimization.
For the Research Community
IndQA advances inclusive AI by measuring how well systems actually work for non-English speakers and culturally diverse contexts. It sets a template for similar benchmarks in other regions, such as Africa, Southeast Asia, and the Middle East, where language and culture play defining roles in communication.

Implications for Developers and AI Builders
For backend engineers, AI developers, and SaaS founders building multilingual chatbots or RAG-based systems, IndQA provides a clear roadmap:
- Build for regional languages and cultural nuance—not just English.
- Test models with real cultural knowledge questions—covering topics like local cuisine, festivals, literature, or daily life.
- Design your own rubric-based evaluation frameworks: prompts in native languages, expert-defined criteria, and weighted scoring.
- Treat cultural context and localized reasoning as defining product differentiators.
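As one concrete starting point, a rubric-based evaluation framework of the kind described above might be organized as follows. The schema, field names, and the sample Hinglish question are hypothetical sketches for illustration, not IndQA's actual data format.

```python
from dataclasses import dataclass

# Hypothetical schema for a rubric-based eval case (illustrative only;
# field names and contents are not taken from IndQA).

@dataclass
class RubricItem:
    criterion: str  # expert-defined requirement, stated in plain language
    weight: float   # relative importance of this criterion

@dataclass
class EvalCase:
    prompt: str        # question written natively in the target language
    ideal_answer: str  # expert-written reference answer
    rubric: list[RubricItem]

def grade(case: EvalCase, met: list[bool]) -> float:
    """Weighted fraction of rubric criteria a model response satisfied.
    `met` would come from an expert or LLM grader; here it is given directly."""
    total = sum(item.weight for item in case.rubric)
    earned = sum(item.weight for item, ok in zip(case.rubric, met) if ok)
    return earned / total

# Invented Hinglish example case.
# Prompt translates as: "What dishes are made at home for Diwali?"
case = EvalCase(
    prompt="Diwali par ghar mein kaun se pakwan banaye jaate hain?",
    ideal_answer="Traditional sweets and snacks such as ladoo, chakli, and karanji.",
    rubric=[
        RubricItem("Mentions at least two traditional festival foods", 2.0),
        RubricItem("Replies in the same code-mixed Hinglish register", 1.0),
    ],
)
print(grade(case, [True, False]))  # earned 2 of 3 weighted points
```

Putting weights on individual criteria lets the grader award partial credit for depth and cultural grounding instead of collapsing everything into a binary right/wrong mark.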

Challenges & Caveats
- Not a language leaderboard: IndQA is not intended to rank languages against each other; the question sets differ by language, so cross-language scores are not directly comparable.
- Adversarial structure: Questions were selected to challenge even the most advanced models, so low scores indicate opportunity for progress, not failure.
- Cultural scope: While broad, India’s linguistic and cultural diversity is even greater—future versions may expand to more dialects and domains.
- Performance gaps: Even leading AI models scored only about 34.9%, indicating a long road ahead in mastering multilingual and multicultural reasoning.
IndQA marks a defining step in the evolution of inclusive AI evaluation. It recognizes that true intelligence in AI is not just about data or accuracy—it’s about context, language, and culture.
Future AI systems aiming for global relevance will need to demonstrate proficiency not only in understanding diverse languages but also in reasoning within their cultural frameworks. IndQA paves the way for this shift, pushing the next generation of AI systems to be not only smarter but also more culturally aware and connected.
The experts behind IndQA
- A Nandi Award-winning Telugu actor and screenwriter with over 750 films
- A Marathi journalist and editor at Tarun Bharat
- A scholar of Kannada linguistics and a dictionary editor
- An International Chess Grandmaster who coaches top-100 chess players
- A Tamil writer, poet, and cultural activist advocating for social justice, caste equity, and literary freedom
- An award-winning Punjabi music composer
- A Gujarati heritage curator and conservation specialist
- An award-winning Malayalam poet and performance artist
- A professor of history, specializing in Bengal’s rich cultural heritage
- A professor of architecture, focusing on Odishan temples