Smart Audits: A Critical Appraisal of iCISA’s AI-in-Audit Showcase
The iCISA building of the CAG in Noida (Image: iCISA website)
In the upcoming iCISA volume, the focus should shift from adjectives to analytics — using data tables to quantify the impact of digital verification and AI tools.
By P. Sesh Kumar
New Delhi, November 5, 2025 — The 10th edition of the journal of iCISA (the CAG of India’s International Centre for Information Systems Audit) is an energetic compendium on artificial intelligence in public audit. It reads like a festival brochure for the future: full of promise, clever diagrams, and sweeping claims.
But when the dust settles, three questions matter: has AI improved the quality of audit evidence, the speed of audit cycles, and the coverage and spread of audit scrutiny — and where is the demonstrable public value?
This review finds that the publication largely celebrates potential rather than proving performance. Two SAI-India pieces — one by Ashish Kumar Shukla and Sameer Asif, and another attributed to Ajay Yeshwant that describes image analysis, collusion detection and an IIT-Madras LLM — explain concepts and pipelines, but stop short of quantifying outcomes: no cycle-time reductions, no recoveries, no percentages of error prevented, no coverage gains tied to AI.
The omission of concrete comparisons to peer SAIs (NAO-UK, GAO-USA, and ANAO-Australia) further weakens the case, especially because those SAIs have published auditable frameworks, cross-government diagnostics, and results narratives that anchor the hype to hard numbers and governance tests.
The sales pitch versus the scoreboard
If one judged by tone alone, iCISA’s compilation suggests SAI India is entering an AI-first era: image-based verification tools, no-code network analytics to reveal bid-rigging, and a domain-specific LLM trained on audit artifacts with IIT-Madras to assist auditors in real time.
The direction is right. The problem is evidence. For each initiative, the publication describes what the tool can do; it rarely shows what the tool has done in the field, with dates, entities, samples, error rates, and independently verifiable impact.
Without that, the reader gets an R&D prospectus rather than an audit performance account. A high-energy brochure is not a benefits realisation report.
Quality
Quality, in audit terms, must be traced to better evidence and fewer misstatements. The publication’s SAI-India article outlines image analysis for verification and a graph-based approach for collusion signals. Yet there is no sensitivity-specificity profile for the image classifier, no false-positive/false-negative rates, no inter-rater reliability against human auditors, and no post-hoc validation on adjudicated bid-rigging cases. Assertions of “smarter audits” are not substitutes for a measurement plan.
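None of these measures is exotic. As a purely illustrative sketch, here is how a published pilot could report them, in Python with scikit-learn; every label and count below is fabricated for illustration:

```python
# Minimal sketch of the quality metrics an image-verification pilot could
# publish. All labels below are fabricated for illustration only.
from sklearn.metrics import confusion_matrix, cohen_kappa_score

# 1 = discrepancy confirmed on the ground, 0 = no discrepancy (hypothetical)
ground_truth  = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
model_flags   = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]
auditor_calls = [1, 0, 1, 1, 0, 1, 1, 0, 0, 1]  # independent human review

tn, fp, fn, tp = confusion_matrix(ground_truth, model_flags).ravel()
sensitivity = tp / (tp + fn)   # true-positive rate: discrepancies caught
specificity = tn / (tn + fp)   # true-negative rate: clean sites cleared
fpr, fnr = fp / (fp + tn), fn / (fn + tp)

# Inter-rater reliability between the model and human auditors
kappa = cohen_kappa_score(auditor_calls, model_flags)

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
print(f"false-positive rate={fpr:.2f} false-negative rate={fnr:.2f}")
print(f"model-auditor agreement (Cohen's kappa)={kappa:.2f}")
```

A pilot that publishes even this small table per deployment, per year, would let readers judge “smarter audits” against a baseline rather than against adjectives.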
Speed
Speed is measurable: planning days saved, sample-selection time reduced, lead-sheet preparation automated. The anthology does not report cycle-time deltas, say, “average performance audit planning shrank from 8 weeks to 3 using the LLM’s retrieval and scoping prompts,” or “financial audit journal-entry testing shrank by 60% due to automated anomaly triage.” Without such numbers, readers cannot tell if AI is cutting work or merely adding steps.
Coverage and spread
Coverage is where AI should shine: full-population tests, stratified risk layers, and anomaly maps that pull auditors beyond traditional samples. Here again, the narrative is conceptual. There are no before-and-after coverage ratios, no expansion from, say, 2% transaction testing to 100% rules-based scanning plus targeted ML drill-downs, and no state-wise or scheme-wise “spread” metrics indicating the breadth of deployment across Indian ministries or states.
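To make the coverage point concrete, here is a minimal sketch in Python with pandas, using wholly invented data; the 50,000 approval threshold and the just-below-threshold splitting rule are hypothetical stand-ins for whatever rules a real engagement would codify:

```python
# Illustrative coverage comparison: a 2% random sample vs. a rules-based
# scan of the full population. Data, columns, and thresholds are invented.
import pandas as pd

txns = pd.DataFrame({
    "amount": [9_800, 49_999, 120, 50_000, 7_500, 49_998, 300, 49_500],
    "vendor": ["A", "B", "C", "B", "A", "B", "D", "B"],
})

# Traditional approach: examine a small random sample
sample_rate = 0.02
sample = txns.sample(frac=sample_rate, random_state=1)

# Full-population rule: amounts just below a (hypothetical) 50,000
# approval threshold, a classic splitting signal
flagged = txns[(txns["amount"] >= 49_000) & (txns["amount"] < 50_000)]

print(f"coverage before: {sample_rate:.0%} of transactions")
print(f"coverage after: 100%, with {len(flagged)} exceptions "
      f"for targeted ML drill-down")
```

The before-and-after coverage ratio is precisely the kind of number the anthology never prints.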
The two SAI-India chapters: what they say, and what they don’t
1) “Artificial Intelligence & Machine Learning: Introduction — The Dawn of Intelligent Machines” — Ashish Kumar Shukla & Sameer Asif
This piece is a lucid primer. It covers AI/ML concepts, cross-sector applications, and ethical considerations with readable clarity. For training new auditors, it is useful. For proving public value, it is thin. The essay does not present SAI India case studies with quantified outcomes, does not map model classes to audit assertions, and does not report any completed audits where AI materially altered the opinion, secured recoveries, or changed a government program. It is best read as a classroom chapter, not as a results chapter.
2) “SAI India’s AI initiatives” — Ajay Yeshwant, describing image-based verification, collusion analytics and an IIT-Madras domain LLM
This article is closer to the ground. It claims: a user-friendly image-verification app; network analytics to detect coordinated bidding using graph theory and Apriori; and a domain LLM trained on inspection reports, draft paragraphs and SARs to power a chatbot for real-time precedent retrieval. The design choices make sense. What’s missing is operational proof.
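The article does not disclose its pipeline, so nothing below should be read as SAI India’s implementation. As a hedged illustration of the technique class it names, here is a minimal co-bidding graph sketch in Python with networkx, using fabricated tenders and an arbitrary flagging threshold:

```python
# Sketch of co-bidding network analysis of the kind the article describes.
# Tender and vendor identifiers are fabricated; the threshold is arbitrary.
from itertools import combinations
import networkx as nx

tenders = {
    "T1": ["V1", "V2", "V3"],
    "T2": ["V1", "V2"],
    "T3": ["V1", "V2", "V4"],
    "T4": ["V3", "V5"],
}

# Weighted vendor graph: edge weight = number of tenders a pair co-bid in
G = nx.Graph()
for bidders in tenders.values():
    for a, b in combinations(sorted(bidders), 2):
        w = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)

# Flag pairs whose co-bidding frequency exceeds a (hypothetical) threshold;
# a real pilot would benchmark these flags against concluded cartel cases.
suspicious = [(a, b, d["weight"]) for a, b, d in G.edges(data=True)
              if d["weight"] >= 3]
print(suspicious)  # e.g. [('V1', 'V2', 3)]
```

The method is standard; what would make it an audit result is the validation the next paragraph asks for.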
There are no pilot baselines, no precision/recall for collusion flags benchmarked against concluded vigilance cases, no time series showing draft-paragraph throughput before and after the LLM, no external validation by the Public Accounts Committee (PAC) or comments from line ministries, and no linkage to citizen outcomes such as fraud deterred or benefits reclaimed. In a Supreme Audit Institution, methods matter; but results matter more.
The conspicuous silence on NAO-UK, GAO-USA, ANAO-Australia
The publication does not showcase the measurable, publicly documented AI-related work of principal peer SAIs. That is a missed chance, partly because these offices have published concrete artefacts that translate easily into SAI India’s own performance language.
NAO-UK has reported on the use of AI in government, surveying adoption status, governance gaps, and readiness across departments: an outward-facing diagnostic that set the stage for oversight and prioritisation.
It continues to embed data analytics and automation in its digital audit platform, tying modernisation to audit quality assurance, not just curiosity. NAO’s published impact statement shows billions in positive financial impact; while not all of that is “AI,” it demonstrates a culture of quantifying benefits per pound that AI initiatives are expected to reinforce.
GAO-USA built and updates an AI Accountability Framework structured around governance, data, performance, and monitoring, explicitly crafted for auditors and agencies, with questions and procedures that exam teams can lift into engagements. GAO has already applied this lens, issuing dozens of recommendations to federal agencies on AI implementations: an outputs-to-outcomes chain that SAI India’s publication does not mirror.
ANAO-Australia has pivoted to governance-first audits of AI, exemplified by its 2024–25 examination of the tax authority’s AI use, while openly stating it is building deeper technical assurance capacity. Its Insights and corporate plan explain how AI is being trialled internally and how lessons will flow into audit methodology and entity guidance.
None of this appears in iCISA’s anthology as concrete benchmarks, nor as comparative tables showing “what peers measure and publish” versus “what SAI India currently discloses.” The effect is subtle but consequential: the reader is left with ideas, not standards; aspirations, not yardsticks.
Why this matters
India’s audit system lives or dies by credibility in Parliament and with citizens — the latter are significant stakeholders according to its Vision and Mission statement. If “AI in audit” remains a slogan in conference halls, it risks becoming another “mission mode” label detached from the ledgers it is supposed to clean. The fix is not more celebratory chapters; it is evidence architecture: baselines, pilots, external validation, and public-facing impact dashboards. SAI India does not need to “match hype”; it needs to publish proofs.
What a results-ready AI audit story from SAI India should contain, immediately
It should begin with named pilots in three audit lines (financial, compliance, and performance), each with a before/after dataset and a peer-reviewable protocol:
- Financial audit anomaly analytics: publish the share of journal entries auto-triaged, manual hours saved, exceptions escalated, and misstatements corrected, by entity and year. Include reviewer agreement rates between machine triage and engagement-partner decisions, and show where the tool was wrong, and why.
- Procurement collusion detection: pick concluded tenders where investigative bodies have already established cartel behaviour. Run the graph-Apriori engine retrospectively and publish AUROC, precision/recall, and lift against random (see the evaluation sketch after this list). Then run it prospectively on live data and document confirmed actions: tenders re-bid, vendors blacklisted, savings realised.
- Performance audit scoping with the domain LLM: instrument the entire scoping workflow. Publish planning-time reduction, breadth of evidence marshalled, number of relevant precedents surfaced per hour, and the proportion of LLM-suggested leads that survived auditor scepticism and appeared in final audit findings.
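For the retrospective benchmarking the procurement bullet calls for, the arithmetic is standard. A minimal sketch in Python with scikit-learn, using fabricated scores and labels; a real pilot would substitute adjudicated tender outcomes:

```python
# Illustrative evaluation of collusion flags against adjudicated outcomes.
# Labels and scores are invented; a real pilot would use concluded cases.
from sklearn.metrics import roc_auc_score, precision_score, recall_score

# 1 = tender where cartel behaviour was established (hypothetical)
adjudicated = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
risk_score  = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.3, 0.2, 0.6, 0.5]
flags = [1 if s >= 0.5 else 0 for s in risk_score]

auroc = roc_auc_score(adjudicated, risk_score)
precision = precision_score(adjudicated, flags)
recall = recall_score(adjudicated, flags)

# Lift over random: precision among flagged tenders vs. the base rate
base_rate = sum(adjudicated) / len(adjudicated)
lift = precision / base_rate

print(f"AUROC={auroc:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} lift={lift:.1f}x")
```

These four numbers, published per pilot, are the difference between a flagging engine and an audit tool.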
Wrap these with PAC-aligned metrics: recommendations accepted, actions implemented, rupees recovered/avoided. Unless AI changes these, it remains a lab project, not a public-value engine.
A closer read of the two SAI-India articles
Ashish Kumar Shukla & Sameer Asif deliver a solid pedagogical primer. It situates AI/ML historically and thematically and is appropriate for capacity building. The weakness is its distance from the audit floor: it does not map models to assertions (existence, completeness, valuation), does not discuss materiality thresholds for automated tests, and does not offer even a small, anonymised SAI India case where AI altered scope, testing, or opinion. In a training academy’s journal, that is forgivable; in a compilation meant to showcase use, it is a gap.
The SAI India initiatives article by Ajay Yeshwant is stronger on architecture: image classification for on-ground verification, graph analytics for tenders, and an IIT-Madras domain LLM for retrieval and drafting support. But it remains results-agnostic: it does not disclose datasets, hyperparameters, fairness checks on beneficiary analytics, or governance guardrails against hallucinations in the LLM. There is no adverse-impact testing across languages and states, no disclosure of how many “flags” survived auditor review, and no explanation of how the PAC or ministries have reacted to AI-assisted findings. Absent those, the public-facing story is incomplete.
Why leave out NAO, GAO and ANAO?
Perhaps the editorial choice was to foreground SAI community contributions beyond the usual Anglosphere. Fair. But a results narrative benefits from standards and comparators. NAO’s government-wide AI diagnostic frames adoption reality, not just aspirations; GAO’s framework hands auditors a ready-to-apply rulebook; ANAO’s governance audits show how to begin when technical model-assurance capacity is still maturing. iCISA’s edition would have been stronger had it included a short “What good looks like” appendix harvesting these artefacts into SAI India’s own playbook.
Way forward: from brochure to balance sheet
First, commit to public metrics. For every AI initiative, publish one quality metric, one speed metric, and one coverage metric, quarterly. Tie them to PAC-meaningful outcomes: recommendations implemented and money saved or safeguarded.
Second, adopt a peer framework. Lift GAO’s four-pillar AI Accountability Framework into SAI India’s audit manuals, with India-specific procedures and checklists, and publish how often they are applied and with what results.
Third, institutionalise external validation. Ask a rotating panel (academics, ex-CAG officers, data-science leads from GSTN/UIDAI/RBI) to review model performance and hallucination control, and publish the reviews with management responses.
Fourth, tell the story with evidence. In the next iCISA volume, replace adjectives with tables: show that image verification reduced field-visit time by X% while catching Y additional discrepancies; show that network analytics led to Z retenderings and ₹N savings; show that the domain LLM cut scoping time in half and increased the diversity of sources cited in draft paragraphs by a measurable factor.
When the scoreboard speaks, the slogan writes itself.
(This is an opinion piece, and views expressed are those of the author only)