Project Overview
AI Investigator
Supporting Investigations at MTA
A secure AI assistant to help investigators work with complex cases
MTA · INNO call · December 2025
The Problem
Investigations are becoming cognitively overwhelming.
Data Overload
Modern investigations generate huge amounts of data from multiple sources.
Fragmented Information
Information is spread across many systems and formats.
Time Pressure
Investigators must connect facts under tight deadlines.
Hidden Knowledge
Important reasoning often stays only in people's heads.
Key Insight
Investigators spend too much time managing information instead of analysing it.
The Real Challenge
It is not about having more data. It is about making sense of it all while maintaining rigorous standards.
Why This Is a Risk for MTA
Operational and legal risks.
Slower Investigations
Manual information processing creates bottlenecks and delays case resolution.
Missed Connections
Higher risk of missing critical links between evidence, people, and events.
Inconsistent Reasoning
Different approaches between cases lead to quality variations.
Legal Vulnerability
Greater exposure if decisions cannot be explained or justified later.
This is Not an IT Problem
This affects case quality and legal certainty.
Institutional Risk
When experienced investigators leave, their reasoning methods often leave with them.
Why Existing Tools Are Not Enough
Why current solutions do not solve this.
Public AI Tools
Cannot be used with sensitive investigation data. Security and confidentiality requirements prohibit cloud-based AI.
Cloud Systems
Conflict with data protection requirements. Sensitive data cannot leave controlled environments.
Existing Investigation Tools
Focus on storage, not reasoning. They manage documents but don't help analyse them.
Human Analysis Alone
Does not scale with growing data volumes. Cognitive limits are real.
What Investigators Need
Support, not more data or more screens.
The Gap
There is no ready-made solution that combines AI capability with the security and compliance requirements of law enforcement.
Our Proposal
AI Investigator – concept overview.
Secure & Internal
A secure AI assistant that runs entirely inside MTA infrastructure.
Organises, Not Replaces
Helps investigators organise evidence. Does not replace professional judgment.
Supports Thinking
Supports the thinking process, not the decision-making process.
Human Verification
Always requires human verification before any output is used.
Important Clarification
The system does not make decisions. Investigators remain fully responsible.
Design Principle
AI as assistant, human as authority. This is not automation; it is augmentation.
What the System Actually Helps With
Practical support for investigators.
Summarising Large Case Files
Quickly distil key facts from hundreds of pages of documents.
Highlighting Connections
Surface links between people, events, and transactions that might otherwise be missed.
Supporting Hypothesis Work
Help formulate and compare investigative hypotheses systematically.
Preserving Reasoning
Capture investigative reasoning for later review or court explanation.
The Outcome
Investigators see the full picture faster and more clearly.
Practical Focus
Every feature is designed around real investigative workflows, not abstract AI capabilities.
Data Security and Legal Compliance
Security and compliance by design.
On-Premise Only
Runs fully on MTA premises. Complete infrastructure control.
No Data Leaves
No data leaves the organisation. Air-gapped operation possible.
Designed to Comply With:
GDPR
Data protection rights
LED
Law Enforcement Directive
EU AI Act
High-risk systems requirements
Compliance First
Legal compliance is built in, not added later.
By Design
Security and compliance requirements shaped the architecture from the beginning.
Why This Fits the INNO Call
Why this is an innovation project.
Introduces AI into Daily Work
Brings AI capabilities into real investigative workflows for the first time.
Focuses on Adoption
Prioritises practical use and user acceptance over technical complexity.
Addresses Real Constraints
Designed around actual operational, legal, and security requirements.
Tests Safe AI Use
Explores how AI can be deployed safely in law enforcement contexts.
Key Point
This project is about uptake of AI, not research.
Innovation Focus
The innovation is in making AI practical and safe for law enforcement, not in AI research itself.
Pilot Scope
Controlled pilot, not full deployment.
This is Very Important
One Department
Investigations only
Limited Use Cases
Specific, defined scenarios
Limited Data Types
Controlled data scope
Clear Time Frame
Defined evaluation period
Pilot Goal
To test whether investigators actually use and benefit from the system.
Why Limited?
A controlled scope allows proper evaluation before any decision about broader deployment.
Expected Results of the Pilot
What success looks like.
Faster Understanding
Investigators grasp complex cases more quickly.
Reduced Cognitive Load
Less mental overhead managing information.
Better Traceability
Investigative reasoning is documented and reviewable.
Clear Decision Point
Evidence-based decision whether to scale further.
Evidence-Based Approach
The pilot produces evidence, not assumptions.
Measurable Outcomes
Success criteria will be defined before the pilot starts and measured throughout.
Why INNO Funding Is Needed
Why this cannot be done as normal procurement.
No Ready-Made Solution Exists
There is no off-the-shelf product that meets all security, compliance, and functional requirements.
Needs Controlled Testing
Must be tested in real conditions with real users before any commitment.
High Legal and Ethical Requirements
Law enforcement AI requires careful validation that commercial procurement cannot provide.
Requires Validation Before Scaling
Investment in scaling only makes sense after pilot success is demonstrated.
The Role of INNO
INNO enables safe innovation where commercial products fall short.
Innovation Gap
The gap between what exists and what's needed requires funded innovation, not standard procurement.
Next Steps
Next steps after this interview.
Agreement on Pilot Scope
Confirm department, use cases, and data types
Full INNO Application
Complete application with detailed budget and timeline
Detailed Technical and Legal Planning
Architecture, compliance framework, integration plan
Pilot Implementation and Evaluation
Deploy, test, measure, and decide
Ready to Proceed
The concept is ready. Agreement is needed to move forward.
Timeline
Each step builds on the previous one. Clear milestones ensure accountability throughout.
What Is Being Built?
A sovereign AI system for investigative intelligence.
AI Investigator in One Sentence
An air-gapped, on-premise AI system that helps investigators structure evidence, generate hypotheses, and preserve institutional knowledge, all while ensuring full data sovereignty and legal compliance.
Sovereign
100% on-premise, air-gapped. No data leaves the building. No cloud dependencies.
Intelligent
AI agents that retrieve, reason, and synthesize with full transparency and provenance.
Human-Centric
Augments investigators, not replaces them. Every AI output requires human verification.
Why "Air-Gapped"?
Sensitive case data never touches the internet. The system operates on isolated hardware within your secure facility.
Open Standards
Built on W3C PROV-O for provenance, ISO 20022 for financial data, and open-source AI models.
Why Is This Needed?
Addressing the "Cognitive Bottleneck" in modern investigations.
The Challenge
Cases now generate up to 1TB of data each, a 10x increase over the past decade. Meanwhile, experienced investigators are retiring faster than new recruits can be trained. Manual analysis simply cannot keep up.
- Information Overload
- Loss of Institutional Memory
- Fragmented Data Sources
The Solution
A sovereign, air-gapped AI system designed to augment human intelligence, not replace it. It uses RAG (Retrieval-Augmented Generation) to ensure grounded, verifiable answers.
- Efficiency: Automate routine synthesis.
- Evolution: Investigators train agents to automate complex tasks.
- Interoperability: Works alongside legacy systems & EU partners.
- Compliance: Native LED & GDPR adherence.
Core Objective
To create a "Digital Partner" that structures explicit data into a Knowledge Graph, freeing investigators to focus on expert judgment and high-level strategy.
Tacit Knowledge
"We can know more than we can tell" (Polanyi). The system is designed to capture the "why" behind decisions, not just the "what".
RAG Architecture
Retrieval-Augmented Generation ensures the AI speaks only from verified documents, sharply reducing the "hallucinations" common in public LLMs.
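A minimal sketch of the grounded-retrieval idea, assuming a toy three-document corpus and simple term-overlap scoring (the real system would use vector embeddings and a local model): every returned passage carries the ID of the file it came from, which is what makes answers traceable.

```python
from collections import Counter

# Hypothetical mini-corpus: each passage keeps its source document ID
# so any answer can be traced back to the file it came from.
CORPUS = {
    "report_004.pdf": "Camera 04 captures a blue van departing at 02:15.",
    "lab_note_002.txt": "Blood sample positive for Zolpidem, a sedative.",
    "alarm_log.csv": "Silent alarm at the Central Data Facility at 02:35.",
}

def tokenize(text: str) -> Counter:
    return Counter(text.lower().split())

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the top-k passages paired with their source IDs (the provenance)."""
    q = tokenize(query)
    scored = []
    for doc_id, passage in CORPUS.items():
        overlap = sum((q & tokenize(passage)).values())  # shared-term count
        scored.append((overlap, doc_id, passage))
    scored.sort(reverse=True)
    return [(doc_id, passage) for score, doc_id, passage in scored[:k] if score > 0]

# The generation step is then constrained to these passages only, and the
# answer is returned together with its source list.
hits = retrieve("blue van camera")
```

The generation step never sees text outside `hits`, so every sentence in the answer can cite a concrete file.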
Four Pillars of Trust
The principles that make AI safe for investigations.
Answers You Can Always Trace Back
The system never "makes things up." Every answer it gives is based only on the case documents that investigators have provided.
- Each conclusion can be traced back to specific files
- Investigators can always see where an answer comes from
- No hidden logic, no black-box results
→ Suitable for legal and audit review
Information is Structured, Not Just Stored
Instead of treating everything as loose documents, the system organises information into clear building blocks: entities, events, and evidence.
Investigators can confirm, reject, or link these elements as the case develops, creating a clear, structured view over time.
→ This helps keep complex cases understandable and consistent
Support for Professional Judgment
The system helps investigators think. It does not decide for them.
- Suggests possible explanations based on existing data
- Highlights alternative interpretations or risks
- Encourages investigators to double-check assumptions
→ Reduces blind spots without replacing expertise
Full Human Control at All Times
Investigators remain fully in charge.
- Every AI output must be reviewed by a human
- Investigators can correct or reject suggestions
- The system improves only through confirmed human input
→ Responsibility always stays with the investigator
Design Philosophy
These four pillars ensure the AI is a transparent assistant, not an autonomous decision-maker. The system amplifies human capability while preserving accountability.
EU AI Act Alignment
These pillars directly address high-risk AI requirements: transparency (Art. 13), human oversight (Art. 14), and accuracy (Art. 15).
Trust Through Design
Every interface element reinforces these principles, from verification buttons to provenance trails to the human-in-the-loop workflow.
How Does It Work?
Four core mechanisms that power the system.
1. Verifiable Intelligence (RAG)
The system uses Retrieval-Augmented Generation to ensure Provenance. Every AI output is grounded in uploaded case files with a clear chain of custody. No "black box" answers; only verifiable facts.
2. The Knowledge Object (KO)
Information is treated as discrete Knowledge Objects (Entities, Events, Evidence). These are verified, disputed, or linked by investigators, building a digital twin of the investigation.
3. Capturing Expert Intuition
The system actively generates hypotheses and simulates risks, prompting investigators to examine data in new ways and reflect on their reasoning.
4. Human Supervision & Agent Evolution
Investigators act as supervisors, not just users. By correcting and verifying AI outputs, they actively train the agents, gradually delegating more autonomy while retaining strategic control.
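The Knowledge Object mechanism described above can be sketched as a small data structure; the names (`KnowledgeObject`, `verify`, `dispute`, `link`) are illustrative assumptions, not the project's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeObject:
    ko_id: str
    kind: str                 # "Entity", "Event", or "Evidence"
    summary: str
    source: str               # provenance: the document this KO came from
    status: str = "unverified"
    links: list = field(default_factory=list)  # IDs of related KOs

    def verify(self):
        # status changes only through explicit human review
        self.status = "verified"

    def dispute(self):
        self.status = "disputed"

    def link(self, other: "KnowledgeObject"):
        self.links.append(other.ko_id)

van = KnowledgeObject("ko-1", "Entity", "Blue van 771-BKV", "report_004.pdf")
sighting = KnowledgeObject("ko-2", "Event", "Van departs at 02:15", "cam_04.log")
sighting.link(van)
sighting.verify()
```

Confirming, disputing, and linking these objects over time is what builds the "digital twin" of the investigation.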
UI Implementation
The interface reflects these pillars via the Articulation Modal and Verification Buttons (Verify/Dispute), ensuring user agency over the Knowledge Graph.
Requirements Alignment
FUNC-01: RAG Pipeline
FUNC-115: Entity Confirmation
FUNC-105: Human Oversight Interface
What Does It Deliver?
Tailored AI roles for diverse investigation needs.
AI Assistant
For Patrol & Field Officers
- Voice-First Interface: Hands-free queries in Estonian/English.
- Quick Synthesis: "Summarize the last 3 reports on Suspect X".
- Procedural Guidance: Real-time access to protocols.
AI Analyst
For Intelligence Units
- Hypothesis Generation: "Suggest 3 explanations for these transactions".
- Scenario Simulation: "What if our unit freezes these assets?"
- Devil's Advocate: AI challenges the investigator's bias.
Audio-Secretary
For Investigators & Admin
- Auto-Transcription: Secure, on-premise meeting logs.
- Action Item Extraction: Automatically lists tasks from voice notes.
- Interview Analysis: Flags inconsistencies in statements.
AML Specialist
For Financial Crime
- DeFi & NFT Analysis: Tracking off-chain and on-chain assets.
- Cross-Border Patterns: Detecting complex laundering schemes.
- UBO Unravelling: Visualizing ownership chains.
Analytical Personas
The system allows users to switch "Personas" (e.g., Skeptic, Brainstormer) to get different perspectives on the same evidence (FUNC-13).
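Persona switching can be sketched as a mapping from persona to instruction; the persona names and wording here are illustrative assumptions, not the system's real prompts.

```python
# Hypothetical persona-to-instruction mapping: the same evidence is framed
# differently for the local model depending on the selected persona.
PERSONAS = {
    "Skeptic": "Challenge every claim; list what evidence is missing.",
    "Brainstormer": "Propose alternative explanations, even unlikely ones.",
    "Devil's Advocate": "Argue against the investigator's current hypothesis.",
}

def build_prompt(persona: str, evidence: list[str]) -> str:
    instruction = PERSONAS[persona]
    return instruction + "\n\nEvidence:\n" + "\n".join(f"- {e}" for e in evidence)

prompt = build_prompt("Skeptic", ["Van seen at 02:15", "Alibi unverified"])
```

The evidence stays identical across personas; only the analytical framing changes.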
How Does It Handle Data?
The complete data lifecycle, from ingestion to insight.
Data Flow Pipeline
1. Ingest: documents, audio, structured data
2. Index: vector embeddings + graph nodes
3. Analyse: AI retrieval + reasoning
4. Verify: human confirmation
5. Output: reports, visualizations
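The flow above can be sketched as plain function stages; the stage names are inferred from the deck, not the system's real module names, and each stand-in body is deliberately trivial.

```python
# Minimal end-to-end sketch of the five-stage data flow.
def ingest(raw):            # documents, audio, structured data
    return [r.strip() for r in raw]

def index(items):           # stand-in for embeddings + graph nodes
    return {i: item for i, item in enumerate(items)}

def analyse(store, query):  # stand-in for AI retrieval + reasoning
    return [v for v in store.values() if query in v]

def human_verify(findings): # every finding needs explicit human confirmation
    return [(f, "verified") for f in findings]

def report(verified):
    return "\n".join(f"[{status}] {f}" for f, status in verified)

out = report(human_verify(analyse(index(ingest([" van seen ", " alibi "])), "van")))
```

Note that human verification sits between analysis and reporting: nothing reaches an output without passing through it.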
Data Sources Supported
- Documents: PDF, Word, Excel, email archives
- Media: Audio recordings, video transcripts
- Structured: ISO 20022 bank data, registry exports
- External: Europol SIENA, Interpol notices
Data Protection Guarantees
- Sovereignty: 100% on-premise, air-gapped
- Encryption: At rest and in transit (AES-256)
- Retention: Smart Forgetting automates GDPR compliance
- Audit: Full PROV-O provenance trail
Critical Privacy Safeguard
No data leaves your network. The AI models run locally. External queries (bank inquiries, registry lookups) require explicit human approval before execution.
LED Compliance
Data processing follows the Law Enforcement Directive (LED). Purpose limitation, data minimization, and access controls are built into the architecture.
Smart Forgetting
Automated retention policies ensure data is archived or purged according to legal requirements. The system tracks when and why data expires.
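A hedged sketch of the Smart Forgetting idea: the retention periods below are illustrative placeholders, not legal guidance; in the real system they would come from the LED and national law, and the decision would be logged to the audit trail.

```python
from datetime import date, timedelta

# Illustrative retention periods per data category (placeholders only).
RETENTION = {"case_file": timedelta(days=5 * 365), "cctv": timedelta(days=30)}

def expiry_action(category: str, stored_on: date, today: date) -> str:
    """Decide whether a record is kept or due for purge, recording why."""
    deadline = stored_on + RETENTION[category]
    if today < deadline:
        return "keep"
    # the system tracks when and why data expires (audit trail)
    return f"purge: {category} retention of {RETENTION[category].days} days exceeded"

print(expiry_action("cctv", date(2025, 1, 1), date(2025, 3, 1)))
```

Because the reason string travels with the decision, a later audit can reconstruct exactly why a record was purged.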
Key System Requirements
Derived from EU AI Act and LEA Operational Needs.
| ID | Requirement | Description | Priority |
|---|---|---|---|
| FUNC-01 | RAG Pipeline | All outputs must be grounded in verifiable source docs. | Critical |
| FUNC-105 | Human Oversight | Dedicated UI for human review and override of AI flags. | Critical |
| FUNC-125 | Data Sovereignty | Policy-Based Access Control (PBAC) enforces legal data sharing agreements. | Critical |
| FUNC-106 | No Emotion Rec | Strict prohibition of emotion recognition features. | Critical |
| FUNC-03 | Hypothesis Gen | Generate plausible explanations connecting evidence. | High |
| FUNC-150 | FRIA Generator | Generate Fundamental Rights Impact Assessment reports for EU AI Act. | High |
| FUNC-114 | Inst. Memory | Knowledge Base for "Lessons Learned" and patterns. | High |
Regulatory Compliance
The system is built "Compliance-First", adhering strictly to the EU AI Act's requirements for High-Risk AI systems in Law Enforcement.
AI Assistant Alternatives
Why a custom air-gapped solution? Evaluating AI Assistant against market alternatives.
| Dimension | Palantir Gotham | IBM i2 Analyst | ChatGPT/Claude | AI Assistant |
|---|---|---|---|---|
| Cost & Licensing | | | | |
| Pricing Model | Per-user annual, €50k+ / analyst | Perpetual + maint., €25k + 20% | Metered API, variable | Tiered models, €36k (infra) |
| Interoperability | Closed ecosystem, hard to integrate | Legacy, manual export | High, cloud-only | Open standards, ISO 20022 / JIT ready |
| Sovereignty & Security | | | | |
| Data Sovereignty | Configurable, on-prem costly | Full control, on-prem | None, US cloud | Air-gapped, 100% sovereign |
| LED Compliance | Possible, audit needed | Possible, manual | Non-compliant, data export | Native, by design |
| Knowledge Management | | | | |
| Tacit Knowledge | No, explicit only | No, visual only | No, stateless | Core feature, Intelligence Cycle |
Market Analysis
- Lock-in: Proprietary formats hold data hostage. AI Assistant uses open W3C standards.
- vs Palantir: Designed for structured data fusion, not capturing expert intuition. Cost-prohibitive for small agencies.
- vs ChatGPT: Public LLMs violate Data Sovereignty and LED compliance.
Unique Value Proposition
- Sovereign: Air-gapped & On-premise.
- Specialized: Built for Tacit Knowledge.
- Predictable: Fixed hardware cost.
- Compliant: Automated "Smart Forgetting".
System Context
High-level interactions between the Investigation Team and the System.
Privacy Safeguard
Manual Confirmation: Any privacy-affecting request to external data sources (e.g., bank inquiry) requires explicit human confirmation within the UI before execution. The AI cannot autonomously trigger these actions.
Strategic Interoperability
Designed for the EU ecosystem. Native support for ISO 20022 and JIT Workspaces ensures seamless cross-border cooperation and concurrent use with legacy tools (FUNC-160, FUNC-162).
Container Architecture
Internal components and AI Agent orchestration.
Agentic Workflow
Specialized agents handle distinct tasks (Retrieval vs. Reasoning). The Evaluation Engine constantly monitors agent performance.
Live Demo: AI Summarization Module
The AI ingests a stream of structured Knowledge Objects (KOs), representing disparate pieces of evidence, and synthesizes them into a coherent executive summary.
"At 02:35, silent alarm at Central Data Facility. Rear door unsecured. Guard J. Kask found unconscious".
"Camera 04 captures Blue Van (771-BKV) departing at 02:15. Driver unidentifiable. Logs 02:00-02:30 deleted".
"Suspect A. Tamm (Owner 771-BKV) claims alibi: 'Night Market 22:00-03:00'. Status: UNVERIFIED".
"USB Drive (Ev-001) recovered near rack 14. Contains encrypted partition. Traces of 'DarkSide' ransomware signature."
"Guard J. Kask blood sample positive for Zolpidem (sedative). Dosage consistent with forced ingestion approx 01:30."
"Vehicle 771-BKV detected by camera #442 (Pärnu Hwy) heading South at 02:45. Speed: 110km/h."
"Market vendor M. Tamm (no relation) states stall #42 was closed at 22:00. Contradicts Suspect A's alibi."
"Wallet 0x7a...f2 linked to A. Tamm received 2.5 BTC at 03:15. Sender wallet flagged as 'DarkSide Affiliate'."
"A. Tamm: Prior conviction (2021) for cyber-facilitated fraud. Known associate of 'The Broker' (Suspect B)."
"Firewall alert 02:10: Outbound SSH connection to IP 185.x.x.x (Moldova). 4.2GB data exfiltrated."
"Latent print lifted from Server Rack 14 handle. Match: A. Tamm (99.9% confidence)."
"Patrol unit reports individual matching description of 'The Broker' entering vehicle 771-BKV at 01:45."
"Post on 'BreachForums' at 03:30: 'Fresh gov database for sale. Estonia origin.' User: 'SilentNight'."
"Vehicle 771-BKV intercepted at 04:00. Laptop (Ev-002) found under passenger seat. Driver A. Tamm detained."
"Ev-002 contains SSH keys matching Central Data Facility server. Browser history shows access to 'BreachForums'."
"Suspect B ('The Broker') apprehended at safehouse. Confirms A. Tamm was hired for physical access."
Output (Mistral 7B)
Provenance Trace (PROV-O)
System Specs
Model: Mistral 7B (Ollama)
Input: JSON-LD Stream
Context: 8k Tokens
Mode: Air-gapped (Offline)
Why Summarize KOs?
Raw data is overwhelming. By summarizing structured KOs instead of raw text, the AI reduces hallucination risk because it is constrained to the "facts" already validated in the Knowledge Graph.
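The constraint described above can be sketched as prompt construction: only KOs already marked verified in the Knowledge Graph are passed to the local model, so the summary cannot draw on unvalidated text. The field names and prompt wording are illustrative assumptions.

```python
kos = [
    {"id": "ko-1", "status": "verified", "text": "Silent alarm at 02:35."},
    {"id": "ko-2", "status": "disputed", "text": "Alibi: Night Market 22:00-03:00."},
    {"id": "ko-3", "status": "verified", "text": "Print on rack 14 matches A. Tamm."},
]

def grounded_prompt(kos: list[dict]) -> str:
    # Disputed or unverified KOs are filtered out before the model sees anything.
    facts = [f"[{k['id']}] {k['text']}" for k in kos if k["status"] == "verified"]
    return ("Summarize ONLY the facts below. Cite each fact by its [id].\n"
            + "\n".join(facts))

prompt = grounded_prompt(kos)  # this string would be sent to the local Mistral 7B
```

Because every fact carries its KO ID into the prompt, the summary's citations map directly back to nodes in the Knowledge Graph.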
AI Assistant Desktop
Interactive prototype of the investigation workspace.
This prototype demonstrates the multi-widget dashboard interface. Use the sidebar to switch between views (Home, Dashboard, Documents, Analysis, Chat) and the persona selector to see role-specific configurations.
Widget-Based UI
The dashboard uses a flexible widget grid that adapts to each user's role. Investigators see the graph and timeline; analysts see scenarios and hypotheses.
Personas
Select a persona from the dropdown to see how the interface adapts: Investigator, Analyst, Supervisor, Prosecutor, Auditor, AML Specialist, or Audio Secretary.
The Intelligence Cycle: From Data to Insight
Making AI Reasoning Auditable and Explainable
Left panel shows the three-stage micro loop: Retrieve (via GraphRAG), Reason (via Graph-of-Thought), and Synthesize (via PROV-O). Right panel displays the macro loop showing the complete 1→2→3→4→5 cycle with live telemetry: PROV nodes created, confidence scores, and token consumption. The simulation demonstrates how tacit knowledge flows from raw experience through AI-mediated articulation to structured, reusable knowledge objects. Press "Run" to start the simulation. Colored nodes represent knowledge objects at different lifecycle stages.
Theoretical Backbone
This framework is the theoretical backbone that the system operationalizes. Each stage maps to specific system components and AI agents in the architecture.
Why this Cycle?
Traditional models say tacit knowledge can be made explicit but often skip the "how". This framework splits the process into two distinct actions: first explaining the reasoning (Articulation), then organizing it (Structuring). This provides a practical way to build AI that assists with each specific cognitive task.
Cycle Stages
- 1. Experience: (Dewey) grounds knowledge in action: learning by doing, not in isolation.
- 2. Articulation: (Dennett, Polanyi) treats explanation as an elicited process requiring guided prompts.
- 3. Structuring: (Peirce) formalizes tacit insights through abductive reasoning.
- 4. Consolidation: (Weick) requires community sensemaking before knowledge enters the canon.
- 5. Innovation: (Whitehead) ensures knowledge remains dynamic, allowing pruning and refinement.
Dynamic vs. Linear
Traditional models present a linear spiral. This approach models investigation as a complex adaptive system with multiple feedback loops. Innovation can trigger new Experience; Consolidation can require return to Articulation.
Why Five Stages?
Each stage transforms knowledge through specific cognitive and social mechanisms. Unlike four-stage models, this separates Articulation from Structuring because they require different AI interventions: dialogue-based elicitation vs. abductive formalization.