AI Nearest Future

AI for Investigators

Improving knowledge transfer and operational efficiency in the investigation agency

Iren Irbe

PhD researcher in applied informatics, Tallinn University

Head of Unit, Investigations Department, Estonian Tax and Customs Board

AI Near Future

Kurzweil Curve: exponential technological progress (Ray Kurzweil's exponential growth curve).

The Problem

Investigations are becoming cognitively overwhelming.

Data Overload

Modern investigations generate huge amounts of data (often over 1 TB per case). Current tools do not adequately support information processing.

Fragmented Information

Information is spread across multiple systems and different formats.

Time Pressure

Investigators must connect facts under tight deadlines.

Undocumented Experience

Important work-related experience and decision rationale often remain only in people's heads.

Key Insight

Investigators spend too much time managing information and too little time analysing it.

Information Overload (Miller, 1956)

Miller's research (1956) established that human working memory can hold approximately 7±2 items simultaneously. When data volume exceeds this cognitive capacity, processing degrades. Modern intelligence environments routinely exceed these limits, creating the "cognitive bottleneck" that severs the human connection required for tacit knowledge transfer.

Current Investigation Challenges

Slow Investigations

Manual information processing creates bottlenecks and delays case resolution times.

Missed Critical Connections

Higher risk that critical links between evidence, people, and events remain undiscovered.

Inconsistent Reasoning

Different approaches to similar cases result in quality variations.

Legal Vulnerability

Greater risk in situations where decisions cannot be adequately explained or justified later.

Shortcomings of Existing Solutions

Public AI Tools

Not suitable for processing sensitive investigation data. Security and confidentiality requirements prohibit the use of cloud-based AI solutions.

Cloud-Based Systems

Conflict with data protection requirements. Sensitive data must not leave the controlled environment.

Existing Investigation Tools

Focus on document storage, not analysis. They store information but do not support reasoning or establishing connections.

Human Analysis Alone

Does not scale with growing data volumes. Human cognitive limits are real.

Project as Applied Research Output

Based on research on tacit knowledge in high-stress work.

Focus

Decision-making under overload
Explaining reasoning
Knowledge loss when people leave

Solution

Practical tool based on research
Legally and organisationally compliant

Approach

Start from real work practices, not technology

Tacit Knowledge

Experiential knowledge that is hard to put into words, but manifests in skilled performance — intuition, pattern recognition, an "eye" for situations.

Proposed Solution — AI Investigator (Concept)

A secure AI assistant for investigators

A local, air-gapped AI solution that helps investigators organise information gathered through voice and text, think through possible explanations, and preserve important work-related knowledge — all while ensuring data never leaves the investigation agency and all activities comply with the law.

The system analyses previous cases, including audio recordings and notes, identifies recurring patterns, and helps consider possible future developments and preventive actions based on them.

Core Principles

Secure and Internal

Runs entirely on Estonian Tax and Customs Board (MTA) infrastructure
No data is transmitted to cloud services or outside the agency
Data use and sharing comply with applicable legal restrictions (GDPR, LED, EU AI Act)

Verifiable and Transparent

All system responses are based on specific and verifiable source documents
The system clearly shows what conclusions and connections are based on
High-risk features (e.g., emotion recognition) are deliberately excluded

Organises and Analyses Past Events

Consolidates past cases, events, and evidence
Helps identify recurring behavioural and activity patterns
Supports thinking through risks and possible future developments

Supports Thinking, Not Prediction

Offers possible explanations and scenarios, not definitive predictions
Helps consider different behavioural patterns and their impact
Does not draw conclusions or make decisions on behalf of humans

Human Is Always the Decision-Maker

All system outputs require human confirmation
The investigator decides which patterns and scenarios to consider
Responsibility always remains with the investigator

Institutional Memory and Compliance

Helps preserve recurring patterns and lessons learned so they do not disappear with departing staff
Supports compliance with EU AI Act requirements, including preparing Fundamental Rights Impact Assessments (FRIA)

AI Support for Investigations

Investigation Impact

▸Faster understanding of complex cases
▸Reduced cognitive load
▸Clear, traceable reasoning (audit / court-ready)

Investigation Functions

What it does for the user

▸Summarises large case files
▸Finds connections (people, events, transactions)
▸Supports hypothesis testing
▸Preserves investigation logic

AI Foundations

How it is technically enabled

▸Processes large volumes of text and audio (incl. transcription)
▸Combines data from multiple sources (documents, emails, interviews)
▸Detects patterns and links across data
▸Handles foreign languages
▸Runs fully inside agency (secure, no external data sharing)

Practical Roles

AI Assistant

For Field Officers

Voice Control: hands-free queries in Estonian and English
Quick Summaries: e.g., "Summarise the last reports on suspect X"
Procedural Support: quick access to guidelines and protocols during work

AI Analyst

For Investigation Units

Hypothesis Generation: helps think through different scenarios
Impact Assessment: e.g., what may happen when certain measures are applied
Critical View: draws attention to possible bias or missed connections

Audio Secretary

For Investigators and Support Staff

Auto-Transcription: local transcription of meetings and conversations
Action Item Extraction: highlights action points from voice notes
Interview Support: helps spot inconsistencies in statements

AML Support

For Financial Crime Investigations

Complex Financial Flow Analysis: including new digital assets
Cross-Border Pattern Detection: helps see connections between different countries
UBO Unravelling: simple visualisation of ownership chains

Project Activities and Legal Assurance

Activity	Description
Project Preparation and Management	Project coordination and scope definition.
Solution and Infrastructure	On-premise server cluster, required hardware, and secure infrastructure for the solution.
Legal and Ethical Validation	Compliance with GDPR, LED, and EU AI Act (incl. FRIA).
Real-World Deployment	Using the solution in controlled conditions with investigators.
Training and User Support	User training and support during the development phase.
Measurement and Evaluation	Analysis of project outcomes and decision on further scaling.

Data Processing Logic and Protection

Data Flow Step by Step

Step	Stage	Description
1	Ingestion	Investigation-related materials are loaded into the system: documents, audio files, pictures, and structured data.
2	Indexing	Data is made searchable and interconnected for easy analysis.
3	Analysis	The system helps summarise information and surface connections, based solely on existing data.
4	Human Verification	All results are reviewed by the investigator and confirmed or corrected.
5	Output	Overviews, summaries, and visual views are generated to support the investigation.
6	Smart Forgetting	The system ensures data is kept only as long as the law permits. When the retention period expires, data is automatically archived or deleted — with a clear justification.
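Steps 4 and 5 can be illustrated with a minimal Python sketch. The `AIResult` record, the `ReviewStatus` values, and both helper functions are hypothetical names introduced here, not part of the described system. The only point shown is that nothing reaches Output until an investigator has confirmed or corrected it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ReviewStatus(Enum):
    PENDING = "pending"      # produced by the Analysis step, not yet reviewed
    CONFIRMED = "confirmed"  # investigator accepted the text as-is
    CORRECTED = "corrected"  # investigator replaced the text


@dataclass
class AIResult:
    text: str
    status: ReviewStatus = ReviewStatus.PENDING
    reviewer: Optional[str] = None


def review(result: AIResult, reviewer: str, correction: Optional[str] = None) -> AIResult:
    """Record the investigator's decision (hypothetical helper)."""
    if correction is not None:
        result.text = correction
        result.status = ReviewStatus.CORRECTED
    else:
        result.status = ReviewStatus.CONFIRMED
    result.reviewer = reviewer
    return result


def releasable(results: List[AIResult]) -> List[AIResult]:
    """Only human-reviewed results may reach the Output step."""
    return [r for r in results if r.status is not ReviewStatus.PENDING]
```

A result that nobody has reviewed stays `PENDING` and is filtered out by `releasable`, which mirrors the rule that all results are reviewed before they are used.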

Supported Data Types

Documents: PDFs, Word and Excel files, emails
Audio and Video: recordings and their transcribed versions
Structured Data: bank transactions, registry data
Cooperation Channels: Europol, Interpol, and other authorised channels

How Data Protection Is Ensured

Data remains entirely within the organisation
All data is encrypted
The system maintains precise records of where information comes from and how it was used
Data usage is always controllable and auditable

Legal Compliance

Data processing complies with law enforcement requirements:

Data is used only for a specific purpose
Only necessary information is collected
Access is strictly limited

Retention Periods and Deletion

Every data type has a legally mandated lifecycle. The system enforces these automatically.

Data Type	Retention Period	Legal Basis	Exception Possible?
Criminal Case File (evidence, protocols)	10 yrs (general); 15 yrs (1st degree crimes); permanent (crimes against humanity)	KrMS § 209(2); VVm § 6(1)–(4)	Yes, if archival value (ArhS § 2(3)–(4), § 8; VVm § 6(5))
Surveillance File (wiretaps, surveillance)	Up to 50 years	KrMS § 126¹²(3)	No (CPC § 126¹²(3), strict limit)
Court File (hearing protocols, decisions)	10 yrs (after entry into force)	KrMS § 160¹(6)–(7)	Yes, if archival value (ArhS § 2(3)–(4); § 8(2))
DNA / Fingerprint Data	Until criminal record deletion	KrMS § 206(4)	Yes, upon acquittal (CPC § 206(4), immediate deletion)
AI Reasoning Logs (provenance, decisions)	Min 6 months; technical docs 10 yrs after market placement	EU AI Act Art. 12(1), Art. 18(1), Art. 19(1)	Yes, extendable on dispute (LED Art. 16(3)(a)(b); GDPR Art. 18(1))
Personal Data in Case File	Until purpose is fulfilled	IKS § 17; LED Art. 4(1)(e), Art. 5	No, except as prescribed by law (IKS § 17(2); § 25(3)–(4))
Pattern Database Entries (anonymous modus operandi)	Indefinite (no personal data)	LED Art. 4(1)(c)	Not applicable (anonymous data; GDPR does not apply)
Audit Log (who, when, what)	At least 3 years	IKS § 36; LED Art. 25(2)	Yes, extendable (IKS § 36(5); E-ITS/ISKE security class)
Protocols and Recordings (interrogations, observations)	With case file (10–15 yrs)	KrMS § 146, § 148, § 150(4)	Yes, with case file (VVm § 6, follows main document)
Public Information under PIA	5–50 years (depending on document type)	AvTS § 42	Yes, if archival value (ArhS § 2(3); § 8(1)–(2))

Automatic Deletion

The system notifies the investigator 30 days before the deadline. If no exception is requested, data is deleted automatically.

This prevents investigators from keeping data "just in case", out of fear that it might be needed later.
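The notification and deletion rule can be sketched as a small decision function. This is only an illustration under stated assumptions: `RetentionRecord` and `retention_action` are hypothetical names, deletion is assumed to happen on the deadline date itself, and a logged exception is assumed to suspend deletion entirely.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

NOTICE_DAYS = 30  # notification lead time stated in the text


@dataclass
class RetentionRecord:
    item_id: str
    deadline: date                          # end of the legal retention period
    exception_reason: Optional[str] = None  # set when a justified exception is logged


def retention_action(record: RetentionRecord, today: date) -> str:
    """Return 'keep', 'notify', 'delete', or 'retain_exception'."""
    if record.exception_reason:
        return "retain_exception"
    if today >= record.deadline:
        return "delete"
    if today >= record.deadline - timedelta(days=NOTICE_DAYS):
        return "notify"
    return "keep"
```

Keeping the rule in one pure function makes it easy to test against each retention period before it is wired to any real storage.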

Requesting Exceptions

Extended retention requires a justified request. The system logs all exceptions and their justifications.

Example: archival-value documents, pending challenges, international cooperation.

LED vs GDPR

Law Enforcement Directive (LED) — Directive 2016/680 — is the primary legal act for processing personal data by law enforcement agencies.

GDPR applies additionally when data is processed outside criminal investigations (e.g., administrative cases).

Retention Period Details

Criminal Case File: VVm RT I, 02.09.2011, 5 § 6: periods depend on severity. Permanent retention for genocide, crimes against humanity, etc.

Surveillance File: CPC § 126¹² § 3: upon conviction until criminal record deletion, max 50 yrs; upon acquittal up to 5 yrs; upon case closure also up to 5 yrs.

AI Logs: AI Act Art. 12 and Art. 19: automatically generated logs are kept for at least 6 months. Art. 18: technical documentation is kept for 10 years after market placement.

Personal Data: PDPA § 17: data retained until purpose is achieved. LED Art. 5: regular review and deletion as needed.

Audit Log: PDPA § 36: log data is recorded automatically. LED Art. 25: logs must make it possible to establish who consulted or disclosed data, who received it, and when.

Data Lifecycle and System Context

How data moves within an air-gapped environment and how traceability and deletion are ensured.

Data Flow in an Air-Gapped System

Data flow: external sources → USB → case file storage → AI processing → export → secure archive with automatic deletion times.

What Data Is Collected and Where Does It Go?

Data Type	Input Method	Storage	Deletion
Device Data (phones, computers)	Via specialised software → USB → upload	Case file storage, isolated	On case file closure or retention period expiry
Registry Queries (banks, databases)	Manual confirmation before each query	Case file storage + provenance log	Automatic expiration control
Court Decisions, Orders	USB or manual upload	Case file storage	Per retention requirements
General Documents (laws, procedures)	By the Administrator	Separate shared knowledge base	Version-control based
AI Outputs (summaries, hypotheses)	Generated by the system	Within the case file + provenance graph	With the case file

Frequently Asked Questions About Data

Question	Answer
Do I need to re-upload data when returning to a case file?	No. All data loaded into a case file is retained until the file is closed or deleted. The investigator can immediately continue where they left off.
How does AI learn from case file documents?	AI does not train on user data. A pre-trained model is used. Only anonymous modus operandi is extracted from documents into the pattern database.
Does a "superdatabase" emerge where everything is cross-queryable?	No. Each case file is fully isolated. Cross-searching between case files is technically impossible. The pattern database contains only anonymous information.
Who deletes data and when?	The system tracks retention deadlines automatically. The investigator sets exceptions. On export, deletion times are set automatically (LED, AI Act requirements).
How can data be recovered after deletion?	The provenance graph shows where data originated. If needed, it can be re-queried from sources (if the source still permits).
How to export case file documents?	Reports + documents are exported to a secure archive. Automatic deletion times are set according to document type and legal requirements.

FAQ

Where is data stored? → Only on agency servers, within case files
Where does it go? → Deleted per retention deadlines
How does AI learn? → Uses pre-trained model; does not train on user data
Do case files cross? → No, each case file is fully isolated

Air-Gapped Workflow

Current practice: data is copied from "black computers" to USB → transferred to the analysis network → stored in regional storage (up to 100 TB). Access is via RDP; printers are allowed, disks are not.

Provenance Tracking

Provenance = data history (where it came from and how it got here).

Source (e.g., which system, document, interview)
When it was created or retrieved
What transformations were applied (e.g., summarised, translated)
Links to original records

The provenance graph shows the full path of the data — so you can trace it back and re-query the original source if needed.
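As a rough illustration of the four provenance fields listed above, the sketch below models each record as a node that links back to the record it was derived from. `ProvenanceNode` and `trace_to_origin` are hypothetical names, and a real provenance graph (for example one based on PROV-O) would allow several parents per node; a single `derived_from` link is assumed here to keep the example short.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ProvenanceNode:
    node_id: str
    source: str                                      # which system, document, or interview
    created_at: datetime                             # when it was created or retrieved
    transformation: Optional[str] = None             # e.g. "summarised", "translated"
    derived_from: Optional["ProvenanceNode"] = None  # link to the original record


def trace_to_origin(node: ProvenanceNode) -> List[str]:
    """Walk the derived_from links back to the original record."""
    path = []
    current: Optional[ProvenanceNode] = node
    while current is not None:
        path.append(current.node_id)
        current = current.derived_from
    return path


# Usage: an interview is transcribed, then summarised.
when = datetime(2025, 1, 10, tzinfo=timezone.utc)
interview = ProvenanceNode("interview-01", "interview", when)
transcript = ProvenanceNode("transcript-01", "audio transcription", when, "transcribed", interview)
summary = ProvenanceNode("summary-01", "AI summary", when, "summarised", transcript)
print(trace_to_origin(summary))
```

Descriptive metadata (file type, tags, case ID) would sit beside these fields; it is the `derived_from` chain that answers where a piece of data came from.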

Metadata vs Provenance

Metadata = data about the data (descriptive labels).

File type, author, date
Keywords, tags
Case ID, document category

Metadata helps organise and find data, but it does not explain its origin or processing history.

Interoperability

Built-in support for ISO 20022 and JIT workspaces for cross-border cooperation (FUNC-160, FUNC-162).

AI Usage Restrictions

Situation	Reason	Action
Classified Materials (state secrets, NATO)	Air-gapped environment is necessary but not sufficient. Separate accreditation to the relevant classification level is required (ISKE, NATO security class), along with an AI model assessment for classified processing.	Separate accredited environment; or manual processing
Source Protection Cases (informants)	Source identities must be protected	Source identities must not reach any log or pattern database.
Court Prohibition (specific order)	A court may prohibit automated processing in a specific case.	AI functions are blocked at the case file level
Data Subject Objection (GDPR Art. 21, where applicable)	In administrative cases, the data subject may object to profiling.	Manual review; AI output does not affect the decision
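One way to picture the table is as a lookup from restriction flags on a case file to required actions. The sketch below is an assumption-laden illustration: the `Restriction` enum and both functions are hypothetical, and only the court prohibition is treated as an outright block because that is the only row whose action says so.

```python
from enum import Enum
from typing import Dict, List, Set


class Restriction(Enum):
    CLASSIFIED = "classified materials"
    SOURCE_PROTECTION = "source protection"
    COURT_PROHIBITION = "court prohibition"
    DATA_SUBJECT_OBJECTION = "data subject objection"


# Actions paraphrased from the table above.
ACTIONS: Dict[Restriction, str] = {
    Restriction.CLASSIFIED: "separate accredited environment or manual processing",
    Restriction.SOURCE_PROTECTION: "keep source identities out of logs and the pattern database",
    Restriction.COURT_PROHIBITION: "block AI functions at the case file level",
    Restriction.DATA_SUBJECT_OBJECTION: "manual review; AI output does not affect the decision",
}


def ai_functions_blocked(restrictions: Set[Restriction]) -> bool:
    """Only a court prohibition is stated to block AI functions outright."""
    return Restriction.COURT_PROHIBITION in restrictions


def required_actions(restrictions: Set[Restriction]) -> List[str]:
    """List the actions for a case file, in the order of the table."""
    return [ACTIONS[r] for r in Restriction if r in restrictions]
```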

Live Demo: AI Summarisation Module

The AI ingests a stream of structured Knowledge Objects (KOs), representing different evidence sources, and synthesises them into a coherent, readable summary.

Input View (Knowledge Objects) Token Usage: 842/8192

// INGESTED EVIDENCE STREAM (JSON-LD)

KO-001 (Incident Report):
"At 02:35, silent alarm at Central Data Facility. Rear door unsecured. Guard J. Kask found unconscious".

KO-002 (Surveillance Log):
"Camera 04 captures Blue Van (771-BKV) departing at 02:15. Driver unidentifiable. Logs 02:00–02:30 deleted".

KO-003 (Suspect Interview):
"Suspect A. Tamm (Owner 771-BKV) claims alibi: 'Night Market 22:00–03:00'. Status: UNVERIFIED".

KO-004 (Forensics Preliminary):
"USB Drive (Ev-001) recovered near rack 14. Contains encrypted partition. Traces of 'DarkSide' ransomware signature."

KO-005 (Toxicology Report):
"Guard J. Kask blood sample positive for Zolpidem (sedative). Dosage consistent with forced ingestion approx 01:30."

KO-006 (ANPR Hit):
"Vehicle 771-BKV detected by camera #442 (Pärnu Hwy) heading South at 02:45. Speed: 110km/h."

KO-007 (Witness Statement):
"Market vendor M. Tamm (no relation) states stall #42 was closed at 22:00. Contradicts Suspect A's alibi."

KO-008 (Financial Intel):
"Wallet 0x7a...f2 linked to A. Tamm received 2.5 BTC at 03:15. Sender wallet flagged as 'DarkSide Affiliate'."

KO-009 (Background Check):
"A. Tamm: Prior conviction (2021) for cyber-facilitated fraud. Known associate of 'The Broker' (Suspect B)."

KO-010 (Network Log):
"Firewall alert 02:10: Outbound SSH connection to IP 185.x.x.x (Moldova). 4.2GB data exfiltrated."

KO-011 (Physical Evidence):
"Latent print lifted from Server Rack 14 handle. Match: A. Tamm (99.9% confidence)."

KO-012 (Suspect B Sighting):
"Patrol unit reports individual matching description of 'The Broker' entering vehicle 771-BKV at 01:45."

KO-013 (Dark Web Chatter):
"Post on 'BreachForums' at 03:30: 'Fresh gov database for sale. Estonia origin.' User: 'SilentNight'."

KO-014 (Vehicle Search):
"Vehicle 771-BKV intercepted at 04:00. Laptop (Ev-002) found under passenger seat. Driver A. Tamm detained."

KO-015 (Laptop Forensics):
"Ev-002 contains SSH keys matching Central Data Facility server. Browser history shows access to 'BreachForums'."

KO-016 (Arrest Report):
"Suspect B ('The Broker') apprehended at safehouse. Confirms A. Tamm was hired for physical access."

Task: Synthesize KOs into Executive Briefing.

Figure 8: Simulation of multi-source evidence summarisation.

System Specs

● Online

Model: Mistral 7B (Ollama)

Input: JSON-LD Stream

Context: 8k Tokens

Mode: Air-gapped (Offline)

Why Summarize KOs?

When the AI summarises structured KOs (rather than arbitrary free text), the risk of hallucination decreases because the summary must rely on the "facts" already recorded in the knowledge graph.
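A minimal sketch of how such a KO-constrained prompt might be assembled is shown below. The function name and the instruction wording are assumptions made for illustration, not the prototype's actual prompt; the idea is simply that the model sees only recorded KOs, each tagged with its id.

```python
from typing import Dict


def build_summary_prompt(kos: Dict[str, str], task: str) -> str:
    """Assemble a prompt that exposes only recorded KOs, each tagged with its id."""
    evidence = "\n".join(f"[{ko_id}] {text}" for ko_id, text in sorted(kos.items()))
    return (
        "Use only the Knowledge Objects listed below. "
        "After every statement, cite the supporting KO id in square brackets. "
        "If the KOs do not support a statement, leave it out.\n\n"
        f"{evidence}\n\n"
        f"Task: {task}"
    )


# Usage with two KOs from the demo stream.
demo_kos = {
    "KO-002": "Camera 04 captures Blue Van (771-BKV) departing at 02:15.",
    "KO-001": "At 02:35, silent alarm at Central Data Facility.",
}
print(build_summary_prompt(demo_kos, "Synthesize KOs into Executive Briefing."))
```

Sorting by KO id keeps the prompt deterministic, so the same evidence set always produces the same input to the model.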

Assistant Desktop

Interactive prototype of the investigation workspace.

This prototype demonstrates the multi-widget dashboard. Use the sidebar to switch between views (Home, Dashboard, Documents, Analysis, Chat) and select a role from the dropdown to see role-specific configurations.

Widget-Based UI

The dashboard adapts to the user's role. The investigator sees graphs and timelines; the analyst sees scenarios and hypotheses.

Roles

Select a role from the dropdown to see how the interface adapts: Investigator, Analyst, Supervisor, Prosecutor, Auditor, AML Specialist, Audio Secretary.

Interface

Interactive prototype of the voice-first assistant designed for high-stress environments.

Key Features

1. Voice-First Interaction

Prioritizing voice lowers the cognitive barrier for articulating tacit knowledge, encouraging storytelling and in-the-moment narration.

2. Conversational Externalization

The AI acts as a Socratic partner, using "Intuition Pumps" to elicit hidden assumptions during the conversation.

3. Groundedness (GraphRAG)

Every answer is anchored in the Knowledge Graph. The UI explicitly links generated insights back to their source KOs.
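The link from generated text back to source KOs can be checked mechanically. The sketch below assumes that answers cite KOs inline as `KO-` followed by three digits, as in the demo stream, and uses a deliberately naive sentence splitter; `check_grounding` is a hypothetical helper, not a GraphRAG API.

```python
import re
from typing import Dict, List, Set

KO_REF = re.compile(r"KO-\d{3}")


def check_grounding(answer: str, known_ids: Set[str]) -> Dict[str, List[str]]:
    """Flag sentences with no KO citation and citations to unknown KOs."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    uncited = [s for s in sentences if not KO_REF.search(s)]
    unknown = sorted({ref for ref in KO_REF.findall(answer) if ref not in known_ids})
    return {"uncited_sentences": uncited, "unknown_references": unknown}
```

An answer would be shown to the investigator only with these findings attached, so that unsupported statements are visible rather than silently accepted.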

4. Context-Aware Adaptation

Adapts interface and suggestions based on the user's current role and location.

5. EASCI Integration

Seamlessly bridges the gap between capturing raw Experience and Articulating it into structured knowledge.

Try it: Click the microphone icon in the prototype to simulate a voice capture session.

Cognitive Load Theory

Sweller (1988). Working memory is limited. In high-stress situations, the cognitive load of typing (visual-motor) competes with the task. Voice (auditory-verbal) uses a separate channel, reducing interference.

Socratic Method

The AI doesn't just record; it asks "Why?". "Why did you check the trunk first?" This forces the expert to make their implicit reasoning explicit.

Voice Efficiency

Speaking is more than 3x faster than typing (150 wpm vs 40 wpm). In high-stress environments, typing is a friction point that prevents knowledge capture.

Presenter Notes

Interactive Demo: This isn't a screenshot. It's the actual code running in an iframe.
Why Voice? It's not just convenience. It's about cognitive load. Police officers can't type while assessing a threat.
Socratic Partner: Emphasize that the AI is active, not passive. It probes for details.
EASCI Integration: This is the "E" (Experience) and "A" (Articulation) part of the loop happening in real-time.

The Intelligence Cycle

Making AI Reasoning Auditable and Explainable

Micro Loop (Real-Time Inference)

Retrieve: GraphRAG

Reason: GoT Reasoning

Synthesize: PROV-O Graph

Figure 5: The Intelligence Cycle in Action.
Left panel shows the three-stage micro loop: Retrieve (via GraphRAG), Reason (via Graph-of-Thought), and Synthesize (via PROV-O). Right panel displays the macro loop (1→2→3→4→5) with live telemetry: PROV nodes created, confidence scores, and token consumption. The simulation demonstrates how tacit knowledge flows from raw experience through AI-mediated articulation to structured, reusable knowledge objects. Press "Run" to start the simulation. Coloured nodes represent knowledge objects at different lifecycle stages.

Theoretical Backbone

This framework underpins the system's operational logic. Each stage maps to specific modules and AI agent tasks in the architecture.

Why This Cycle?

Traditional models say tacit knowledge can be made explicit but often skip the "how". This framework splits the process into two distinct actions: first explaining the reasoning (Articulation), then organising it (Structuring). This provides a practical way to build AI that assists with each specific cognitive task.

Cycle Stages

1. Experience: (Dewey) grounds knowledge in action: learning by doing, not in isolation.
2. Articulation: (Dennett, Polanyi) treats explanation as an elicited process requiring guided prompts.
3. Structuring: (Peirce) formalises tacit insights through abductive reasoning.
4. Consolidation: (Weick) requires community sensemaking before knowledge enters the canon.
5. Innovation: (Whitehead) ensures knowledge remains dynamic, allowing pruning and refinement.

Dynamic, Not Linear

Traditional models present a linear spiral. Investigation is actually a complex adaptive system with multiple feedback loops: new knowledge can trigger new Experience; Consolidation can require return to Articulation.

Why Five Stages?

Each stage transforms knowledge through specific cognitive mechanisms. Unlike four-stage models, this separates Articulation from Structuring because they require different AI interventions: dialogue-based elicitation vs. abductive formalisation.

Interactive Demo: Mobile + Desktop

Collaborative hypothesis workflow — from field officer to analyst in real time.

Mobile App: voice-based capture in the field

Desktop App: Investigation Platform (A4 format)

Field Capture

Officers use voice to capture observations during patrol, interviews, or inspections. The mobile interface prioritises speed and minimal cognitive load.

Deep Analysis

Analysts access the full knowledge graph, entity relationships, and reasoning chains through the desktop platform's multi-widget layout.

Mobile App

Field officers use voice input to quickly capture evidence and observations at the scene.

Desktop App

The analyst sees the knowledge graph updating in real time and can immediately work with new evidence.

Synchronisation

Information captured on mobile appears instantly on the desktop graph — collaboration without delay.

Data Separation and Legal Compliance

How the system ensures compliance with the Law Enforcement Directive (LED), GDPR, the EU AI Act, and Estonian law

Principle: Data vs Patterns

The system does not retain personal data across cases. Only anonymous modus operandi is preserved — a crime scheme or method that is completely separated from specific individuals and individual cases.

Data Flow and Separation

Data flow from case file to pattern database: personal data is removed, only abstract modus operandi is preserved.

Dos and Don'ts

What the System Does NOT Do	What IS Done
Cross-case data sharing between case files	Each case file is fully isolated from other case files
Storing personal data in the pattern database	Only anonymous and abstract modus operandi is stored
Training algorithms on personal or case data	The algorithm uses a pre-trained model
Transmitting data to cloud or external servers	Data remains fully under the organisation's control
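The isolation rule in the first row can be illustrated at the API level. In the sketch below, `CaseFileStore` and `CaseSession` are hypothetical names: a session is handed only its own case's documents and has no method for reaching another case. A real deployment would presumably enforce this at the storage and access-control layers as well, which this in-memory example does not attempt.

```python
from typing import Dict, List


class CaseFileStore:
    """In-memory stand-in for isolated case file storage (hypothetical)."""

    def __init__(self) -> None:
        self._documents: Dict[str, List[str]] = {}

    def add(self, case_id: str, document: str) -> None:
        self._documents.setdefault(case_id, []).append(document)

    def open_session(self, case_id: str) -> "CaseSession":
        # A session receives a copy of its own case's documents, never the whole store.
        return CaseSession(case_id, list(self._documents.get(case_id, [])))


class CaseSession:
    """Search scope is fixed to one case file; there is no cross-case method."""

    def __init__(self, case_id: str, documents: List[str]) -> None:
        self.case_id = case_id
        self._documents = documents

    def search(self, term: str) -> List[str]:
        needle = term.lower()
        return [d for d in self._documents if needle in d.lower()]
```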

Legal Compliance

Requirement	Implemented Measures and Solutions
*Data Protection and Privacy*
Estonian Constitution § 26 — Privacy of private life	Case-based data isolation; role-based access control (RBAC); all queries are logged for audit trail purposes.
Estonian Constitution § 43 — Secrecy of communications	Encrypted communication data storage; access only through court order ID binding; automatic data expiry checks.
LED Art. 4 — Lawfulness and fairness	Data is processed solely for law enforcement purposes in a transparent and traceable manner.
LED Art. 4(1)(b) — Purpose limitation	Each case file is strictly isolated; data is not used for other purposes.
LED Art. 4(1)(c) — Data minimisation	Only abstract schemes are stored in the pattern database; personal data is not retained.
Estonian PDPA § 14, § 15 — Processing principles and lawfulness	Built-in purpose limitations; law enforcement processing lawfulness per § 15; automatic data quality validation; encrypted data transmission (TLS 1.3).
Estonian PDPA § 20 — Special categories of personal data	Processing of special category data (race, ethnicity, political views, religion, health, biometrics) only in cases prescribed by law; additional security measures.
LED Art. 10 — Processing of special categories of data	Special category personal data processing only when strictly necessary; appropriate safeguards; automatic classification and restrictions.
Estonian PDPA § 43 — Security measures	Data encryption at rest (AES-256); RBAC-based role management; full audit logging; automatic backup.
GDPR Art. 5, 6 (where applicable)	When processing administrative or non-criminal investigation data: data subject consent or legitimate interest; mandatory data subject notification.
*Human Oversight and Automated Decision-Making*
Estonian Constitution § 22 — Presumption of innocence	AI does not make guilt-determining decisions; the system supports the investigator and does not replace court rulings.
LED Art. 11 — Automated decision-making	AI does not make automated decisions; all results are confirmed by the investigator; profile-based decisions without human intervention are prohibited.
EU AI Act Art. 14(1)–(4) — Human oversight	The investigator can override, correct, or ignore AI output at any time; the system can be stopped with a "stop" button; the UI displays limitations and capabilities.
EU AI Act Art. 6 — High-risk systems	Completed Fundamental Rights Impact Assessment (FRIA); technical documentation per Article 11; risk assessment log; conformity declaration.
*Transparency and Right to Explanation*
Estonian Constitution § 15 — Right to effective proceedings	AI reasoning chain export in PDF or JSON format; provenance graph enables step-by-step challenge of the decision process.
Estonian Constitution § 24 — Right to fair trial	AI outputs are transparent, explainable, and accessible to the defence.
Estonian Constitution § 44(3) — Right to access data	Built-in Data Subject Access Request (DSAR) export; personal data query report is generated automatically.
EU AI Act Art. 86 — Right to explanation	Each AI output includes an explanation (XAI); provenance graph displays inputs, inferences, and sources.
EU AI Act Art. 13 — Transparency	The system user guide and UI explain AI capabilities, limitations, and intended use cases.
EU AI Act Art. 50 — User notification	Users are clearly informed they are interacting with AI; outputs are marked as AI-generated.
*Data Quality and Evidence*
LED Art. 7 — Data quality	AI-based conclusions are clearly distinguished from facts; the investigator confirms data accuracy before further use.
Estonian CPC § 63 — Concept of evidence	AI is an investigative aid; documents prepared by the investigator based on AI analysis may qualify as evidence within the meaning of § 63 (other document).
Estonian CPC § 64 — Conditions for evidence collection	Full traceability is ensured: each AI output references the original source and maintains data integrity.
Estonian CPC § 146 — Procedural action protocol	Documents prepared with AI assistance comply with protocol format requirements: date, author, criminal case number, course, and results of the action.
Estonian CPC § 150 — Audio and video recording	A report based on AI analysis may rely on material recorded under CPC § 150; recordings are unaltered and added to the case file.
*Security and Logging*
Estonian PDPA § 36 — Logging	Logged: collection, modification, reading, transmission, combination, and deletion. Logs are retained for at least 3 years.
LED Art. 25(1) — Logging obligation	Automatically logged: collection, modification, query, disclosure (incl. transmission), combination, deletion. Logs ensure traceability and help detect unauthorised access.
E-ITS (ISKE) — Security measures	Compliance with the Estonian information security standard: security class is determined by data confidentiality, integrity, and availability; ISKE catalogue measures are applied.
*Retention and Deletion*
LED Art. 5 — Retention periods	Personal data is retained only as long as necessary for the purpose. The system automatically tracks deadlines and notifies of expiry.
EU AI Act Art. 12 — Log retention	AI system logs are retained for at least 6 months. Provenance graphs and decision logs are exported to archive.
Estonian PIA § 12 — Document retention	Documents subject to archiving obligation are exported separately. Automatic deletion times according to document type.
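The logging rows above name six operation types and the "who, when, what" content of each entry. A minimal sketch of such a log follows; `Operation`, `AuditEntry`, and `AuditLog` are hypothetical names, and tamper protection, retention, and storage are deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class Operation(Enum):
    COLLECTION = "collection"
    MODIFICATION = "modification"
    READING = "reading"
    TRANSMISSION = "transmission"
    COMBINATION = "combination"
    DELETION = "deletion"


@dataclass(frozen=True)
class AuditEntry:
    user_id: str          # who
    timestamp: datetime   # when
    operation: Operation  # what was done
    object_id: str        # to which record


class AuditLog:
    """Append-only in this sketch: entries can be added and read, not edited."""

    def __init__(self) -> None:
        self._entries: List[AuditEntry] = []

    def record(self, user_id: str, operation: Operation, object_id: str,
               when: Optional[datetime] = None) -> AuditEntry:
        entry = AuditEntry(user_id, when or datetime.now(timezone.utc), operation, object_id)
        self._entries.append(entry)
        return entry

    def by_user(self, user_id: str) -> List[AuditEntry]:
        return [e for e in self._entries if e.user_id == user_id]
```

Frozen entries mean a recorded operation cannot be modified in place, which is the property an auditor would rely on when checking for unauthorised access.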

Retention Periods and Deletion

Data Type	Retention Period	Legal Basis	Exception Possible?
Criminal Case File (evidence, protocols)	10 yrs (general); 15 yrs (1st degree crimes); permanent (crimes against humanity)	KrMS § 209(2); VVm § 6(1)–(4)	Yes, if archival value (ArhS § 2(3)–(4), § 8; VVm § 6(5))
Surveillance File (wiretaps, surveillance)	Up to 50 years	KrMS § 126¹²(3)	No (CPC § 126¹²(3), strict limit)
Court File (hearing protocols, decisions)	10 yrs (after entry into force)	KrMS § 160¹(6)–(7)	Yes, if archival value (ArhS § 2(3)–(4); § 8(2))
DNA / Fingerprint Data	Until criminal record deletion	KrMS § 206(4)	Yes, upon acquittal (CPC § 206(4), immediate deletion)
AI Reasoning Logs (provenance, decisions)	Min 6 months; technical docs 10 yrs after market placement	EU AI Act Art. 12(1), Art. 18(1), Art. 19(1)	Yes, extendable on dispute (LED Art. 16(3)(a)(b); GDPR Art. 18(1))
Personal Data in Case File	Until purpose is fulfilled	IKS § 17; LED Art. 4(1)(e), Art. 5	No, except as prescribed by law (IKS § 17(2); § 25(3)–(4))
Pattern Database Entries (anonymous modus operandi)	Indefinite (no personal data)	LED Art. 4(1)(c)	Not applicable (anonymous data; GDPR does not apply)
Audit Log (who, when, what)	At least 3 years	IKS § 36; LED Art. 25(2)	Yes, extendable (IKS § 36(5); E-ITS/ISKE security class)
Protocols and Recordings (interrogations, observations)	With case file (10–15 yrs)	KrMS § 146, § 148, § 150(4)	Yes, with case file (VVm § 6, follows main document)
Public Information under PIA	5–50 years (depending on document type)	AvTS § 42	Yes, if archival value (ArhS § 2(3); § 8(1)–(2))

Glossary of Abbreviations

RBAC: Role-Based Access Control. Users see only what their role permits.
TLS 1.3: Transport Layer Security — encrypted protocol for secure data transmission.
AES-256: Advanced Encryption Standard — symmetric encryption with 256-bit key.
FRIA: Fundamental Rights Impact Assessment — required by the EU AI Act for high-risk systems.
XAI: Explainable AI. Every AI output includes an understandable justification.
DSAR: Data Subject Access Request — a query by the data subject about their personal data (GDPR Art. 15).
Provenance Graph: A visual graph showing data origin and processing history.

Modus Operandi

Latin for "mode of operating". In criminalistics, it refers to a criminal's characteristic behavioural pattern.

Example: "Money is moved through 3 countries using 5 shell companies" — without names, dates, or amounts.
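A minimal sketch of how a case-specific money trail might be reduced to an anonymous pattern of this kind. The `Transfer` fields, the `to_pattern` helper, and the sample data are hypothetical and invented for illustration; the slides do not describe how the pattern database is actually populated. Counting alone does not guarantee anonymity in every case, so a real system would need further review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One hop in a money trail, as it might appear inside a case file."""
    sender: str
    receiver: str
    receiver_country: str
    receiver_is_shell: bool
    amount_eur: float
    date: str


def to_pattern(transfers):
    """Keep only counts; names, dates and amounts are not copied."""
    return {
        "hops": len(transfers),
        "countries": len({t.receiver_country for t in transfers}),
        "shell_companies": len(
            {t.receiver for t in transfers if t.receiver_is_shell}
        ),
    }


def describe(pattern):
    """Render the pattern in the style of the slide's example sentence."""
    return (
        f"Money is moved through {pattern['countries']} countries "
        f"using {pattern['shell_companies']} shell companies"
    )


# Fictional data, for illustration only.
trail = [
    Transfer("Person A", "Shell 1", "EE", True, 90000.0, "2024-03-01"),
    Transfer("Shell 1", "Shell 2", "LV", True, 88000.0, "2024-03-02"),
    Transfer("Shell 2", "Person B", "CY", False, 85000.0, "2024-03-05"),
]
pattern = to_pattern(trail)
print(describe(pattern))
```

The resulting dictionary holds three integers and nothing else, which is the property the pattern database relies on.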

LED vs GDPR

Law Enforcement Directive (LED) — Directive 2016/680 — is the primary legal act for processing personal data by law enforcement agencies.

GDPR applies additionally when data is processed outside criminal investigations (e.g., administrative cases).

Requesting Exceptions

Extended retention requires a justified request. The system logs all exceptions and their justifications.

Examples: archival-value documents, pending challenges, international cooperation.

Automatic Deletion

The system notifies the investigator 30 days before the deadline. If no exception is requested, data is deleted automatically.

This prevents investigators from keeping data "just in case", out of fear that it might be needed later.
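The notify-then-delete rule can be sketched as a small daily check. This is an illustration under stated assumptions, not the system's implementation: the `Record` class, its field names, and the `next_action` function are hypothetical; only the 30-day notice window and the "delete unless an exception was requested" rule come from the slide.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

NOTICE_DAYS = 30  # from the slide: notify 30 days before the deadline


@dataclass
class Record:
    record_id: str
    retention_deadline: date
    # Set only when a justified exception has been approved (assumption:
    # an exception simply moves the effective deadline to a later date).
    exception_until: Optional[date] = None


def next_action(record: Record, today: date) -> str:
    """Return 'keep', 'notify' or 'delete' for one record on a given day."""
    deadline = record.exception_until or record.retention_deadline
    if today >= deadline:
        return "delete"
    if today >= deadline - timedelta(days=NOTICE_DAYS):
        return "notify"
    return "keep"


record = Record("case-1", retention_deadline=date(2030, 1, 31))
for day in (date(2029, 12, 1), date(2030, 1, 1), date(2030, 1, 31)):
    print(day, next_action(record, day))
```

Keeping the decision in a pure function of the record and the current date makes the rule easy to test and to log alongside each exception.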

Retention Period Explanations

Criminal Case File: VVm RT I, 02.09.2011, 5 § 6: periods depend on severity. Permanent retention for genocide, crimes against humanity, etc.

Surveillance File: CPC § 126¹² § 3: upon conviction until criminal record deletion, max 50 yrs; upon acquittal up to 5 yrs; upon case closure also up to 5 yrs.

AI Logs: AI Act Art. 12: automatically generated logs min 6 months. Art. 18: technical documentation 10 years after market placement.

Personal Data: PDPA § 17: data retained until purpose is achieved. LED Art. 5: regular review and deletion as needed.

Audit Log: PDPA § 36: log data automatically recorded. LED Art. 25: logs must enable identification of the sender, recipient, and timing of data.
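A hedged sketch of what one audit log entry could look like. The operation names follow the logging obligation listed in the compliance table (collection, modification, query, disclosure, combination, deletion), and the fields follow the "who, when, what" description plus the recipient. The class and function names, the JSON-lines format, and the rule that a disclosure must name a recipient are assumptions made for this example.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from enum import Enum


class Operation(str, Enum):
    COLLECTION = "collection"
    MODIFICATION = "modification"
    QUERY = "query"
    DISCLOSURE = "disclosure"
    COMBINATION = "combination"
    DELETION = "deletion"


@dataclass(frozen=True)
class AuditEntry:
    user_id: str          # who
    timestamp: str        # when (ISO 8601, UTC)
    operation: Operation  # what
    record_id: str
    recipient: str = ""   # filled in for disclosures


def make_entry(user_id, operation, record_id, recipient="", now=None):
    """Build one entry; unknown operations raise ValueError."""
    op = Operation(operation)
    if op is Operation.DISCLOSURE and not recipient:
        raise ValueError("a disclosure entry needs a recipient")
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return AuditEntry(user_id, ts, op, record_id, recipient)


def to_json_line(entry):
    """Serialise an entry as one line of JSON for an append-only log file."""
    data = asdict(entry)
    data["operation"] = entry.operation.value
    return json.dumps(data, sort_keys=True)


entry = make_entry("inv-7", "query", "case-42")
print(to_json_line(entry))
```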

Abductive Reasoning

"Abduction is the process of forming explanatory hypotheses. It is the only logical operation which introduces any new idea." (C.S. Peirce, 1931)

Linear (Deductive)

Premise: Rule

↓

Premise: Case

↓

Conclusion (Certain)

Fragile: If one premise fails, the chain breaks.

Branching (Abductive)

Observation: "Van at Scene"

H1: Delivery ✗

H2: Collusion ✓

H3: Coercion ?

Best Explanation Selected

Resilient: Survives uncertainty by weighing options.

Linear Reasoning (Deductive)

IF suspect has motive

AND suspect has means

AND suspect at scene

THEN suspect is guilty

Problem: Premises must be certain.

Abductive Reasoning (Detective)

OBSERVATIONS:

Warehouse breach (02:00-04:00)
Logs deleted
Van 771-BKV on camera

HYPOTHESES:

H1: A. Tamm & J. Kask colluding (Confidence: 0.73)

H2: J. Kask victim (Confidence: 0.21)

H3: Legitimate delivery (Confidence: 0.06)

NEXT STEPS:

Verify Tamm's access logs to test H1.
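One way to read the confidence values above is as probabilities that are revised when the next step returns a result. The sketch below applies Bayes' rule to the three hypotheses. This is an assumption about how such scores could be updated, not a description of the system's method, and the likelihood numbers are invented for illustration.

```python
def update(confidences, likelihoods):
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    joint = {h: confidences[h] * likelihoods[h] for h in confidences}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("evidence is impossible under every hypothesis")
    return {h: value / total for h, value in joint.items()}


# Confidence values from the slide, read here as prior probabilities.
prior = {
    "H1 collusion": 0.73,
    "H2 Kask victim": 0.21,
    "H3 legitimate delivery": 0.06,
}

# ASSUMPTION: illustrative probabilities of the finding "Tamm's access
# card was used during the breach window" under each hypothesis.
likelihood = {
    "H1 collusion": 0.9,
    "H2 Kask victim": 0.3,
    "H3 legitimate delivery": 0.1,
}

posterior = update(prior, likelihood)
ranked = sorted(posterior, key=posterior.get, reverse=True)
for name in ranked:
    print(f"{name}: {posterior[name]:.2f}")
```

The ranking is only as good as the likelihood estimates, which is one reason the final selection stays with the analyst.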

The Logic of Investigation

Type	Formula	Certainty
Deduction	Rule + Case = Result	Certain
Induction	Case + Result = Rule	Probabilistic
Abduction	Result + Rule = Case	Creative / Plausible

Expert reasoning in policing is primarily abductive: guessing the cause from the effects.

The AI Performance Gap

Current AI models struggle with abduction. On the ART Benchmark, AI scores ~69% vs 91% for humans (Bhagavatula et al., 2020).

Implication: Full automation of the "conclusion" phase is not possible. The AI generates hypotheses, but the human must select the best one.

TacitFlow Implementation (Phase 3)

Current State: Manual abductive reasoning by analysts.
Planned: AI generates 3-5 competing hypotheses (e.g., "Collusion" vs "Coercion") and suggests discriminating evidence.
Goal: Support the analyst's "Satisficing" process by surfacing relevant precedents.
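"Discriminating evidence" can be made concrete as the check whose possible outcomes would shift belief between hypotheses the most. The sketch below scores candidate checks by expected information gain, which is one common choice and an assumption here, not the planned TacitFlow method. The check names and all probabilities are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def expected_information_gain(prior, p_positive_given_h):
    """Expected drop in entropy after observing a yes/no check result."""
    gain = entropy(prior)
    negative = [1 - p for p in p_positive_given_h]
    for outcome in (p_positive_given_h, negative):
        joint = [p * l for p, l in zip(prior, outcome)]
        p_outcome = sum(joint)
        if p_outcome > 0:
            posterior = [j / p_outcome for j in joint]
            gain -= p_outcome * entropy(posterior)
    return gain


def most_discriminating(prior, checks):
    """Return the name of the check with the highest expected gain."""
    return max(
        checks, key=lambda name: expected_information_gain(prior, checks[name])
    )


# Two competing hypotheses, in order: "Collusion", "Coercion".
prior = [0.5, 0.5]

# ASSUMPTION: probability that each check comes back positive under
# each hypothesis. Values are illustrative only.
checks = {
    "threatening messages on phone": [0.1, 0.9],
    "van seen on camera": [0.5, 0.5],
}

print(most_discriminating(prior, checks))
```

A check that is equally likely under every hypothesis, like the second one, scores zero: whatever it returns, it cannot separate the candidates.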

Inference to the Best Explanation

The modern name for abduction. Given surprising observations, generate hypotheses that would explain them, then select the best based on explanatory virtues (simplicity, scope).

Satisficing

Herbert Simon (1956). Accepting a solution that is "good enough" rather than optimal. Experts satisfice by recognizing situations quickly. TacitFlow supports this by surfacing relevant precedents.

ACL Findings 2025

The RECV benchmark decomposes 1,500 claims into deductive vs abductive atoms. Deductive items stay solvable, but every model craters on abductive rows (Dougrez-Lewis et al., 2025).

Presenter Notes

Sherlock Holmes Logic: Holmes didn't deduce; he abducted. He guessed the best explanation.
The Gap: AI is great at math (deduction) and patterns (induction), but terrible at creative guessing (abduction).
Human Role: This is why the human is essential. The AI proposes; the human decides.

Adversarial Debate Model

"One agent proposes a hypothesis; another's only job is to find flaws. This stress-tests theories and avoids confirmation bias."

A1 (Proposer): "Hypothesis: It's Tamm. He was at the scene."

A2 (Critic): "Flaw found: Alibi is unverified. Camera 2 is empty."

✓ Outcome: New Task → Verify Alibi

Single-Agent Reasoning

1. Agent generates hypothesis

2. Agent evaluates own hypothesis

3. Agent confirms own reasoning

4. THEN hypothesis accepted

Problem: Confirmation bias.

Adversarial Debate (TacitFlow)

Proposer (Agent 1):

"It's Tamm. He was at the scene."

Critic (Agent 2):

Flaw 1: Alibi is unverified

Flaw 2: Camera 2 shows nothing

Flaw 3: Motive unclear

Outcome:

New Task: Verify Alibi → Test hypothesis.
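The proposer and critic exchange above can be sketched as a single debate round. Both agents are stubs here: in the planned system they would be language-model calls, and the `Hypothesis` structure, function names, and status strings are hypothetical. The sketch only shows the control flow, in which every flaw the critic raises becomes a verification task.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    statement: str
    # Supporting point -> True if already verified, False otherwise.
    support: dict


def propose():
    """Stub for the Proposer agent (a model call in a real system)."""
    return Hypothesis(
        statement="It's Tamm. He was at the scene.",
        support={
            "alibi": False,
            "camera 2 footage": False,
            "motive": False,
            "van on camera": True,
        },
    )


def critique(hypothesis):
    """Stub for the Critic agent: list every unverified supporting point."""
    return [
        point for point, verified in hypothesis.support.items() if not verified
    ]


def debate_round(hypothesis):
    """Turn the critic's flaws into tasks; nothing is decided automatically."""
    flaws = critique(hypothesis)
    tasks = [f"Verify: {flaw}" for flaw in flaws]
    status = "ready for human review" if not flaws else "needs work"
    return status, tasks


status, tasks = debate_round(propose())
print(status)
for task in tasks:
    print(task)
```

Even a hypothesis with no open flaws is only marked as ready for review, in line with the principle that the human is the decision-maker.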

TacitFlow Implementation (Roadmap: Phase 3)

Current State (Pilot): Single-agent reasoning with human oversight; manual critique by analysts.
Planned Enhancement: Dual-agent debate where Critic is incentivized solely to find logical flaws.
Basis: "Debate" models (Irving et al., 2018) and "Reflexion" (Shinn et al., 2023) self-correction.

Cognitive Rationale

Goal: Prevent "groupthink" and confirmation bias (Nickerson, 1998).

Mechanism: Dual-agent debate forces explicit consideration of disconfirming evidence (Irving et al., 2018).

IJCAI Logical Reasoning Survey

IJCAI 2025's survey splits reasoning gaps into logical QA vs logical consistency. Solver-based pipelines need NL→symbolic translators plus SAT/FOL tooling yet drop facts. Adversarial debate sidesteps this by keeping reasoning in natural language. (Cheng et al., 2025)

Presenter Notes

This is "Future Work" but critical for credibility.
Admit that LLMs are prone to "sycophancy" (agreeing with the user).
The "Critic" agent is the solution to this.

TacitFlow Alternatives

Why a custom air-gapped solution? Evaluating TacitFlow against market alternatives.

Dimension	Palantir Gotham	IBM i2 Analyst	ChatGPT/Claude	TacitFlow
Cost & Licensing
Pricing Model	Per-user annual (€50k+ / analyst)	Perpetual + maintenance (€25k + 20%)	Metered API (variable)	Tiered models (€36k infra)
Lock-in Risk	High (proprietary format)	Medium (some export)	High (cloud-only)	Low (open standards)
Sovereignty & Security
Data Sovereignty	Configurable (on-prem costly)	Full control (on-prem)	None (US cloud)	Air-gapped (100% sovereign)
LED Compliance	Possible (audit needed)	Possible (manual)	Non-compliant (data export)	Native (by design)
Knowledge Management
Tacit Knowledge	No (explicit only)	No (visual only)	No (stateless)	Core feature (EASCI Framework)

Market Analysis

Lock-in: Proprietary formats hold data hostage. TacitFlow uses open W3C standards.
vs Palantir: Palantir is for explicit data fusion, not tacit reasoning. Cost-prohibitive for small agencies.
vs ChatGPT: Public LLMs violate Data Sovereignty and LED compliance.

Unique Value Proposition

Sovereign: Air-gapped & On-premise.
Specialized: Built for Tacit Knowledge.
Predictable: Fixed hardware cost.
Compliant: Automated "Smart Forgetting".

Architecture

Internal components and AI agent orchestration.

C4Context
    title System Context Diagram for AI Investigator
    Person(investigator, "Investigator", "Analyst, Officer, Manager")
    Enterprise_Boundary(b0, "Agency Boundary") {
        System(ai_system, "AI Investigator", "RAG analysis, hypothesis generation, knowledge capture")
    }
    System_Ext(data_sources, "Data Sources", "Banks, Registries, Logs")
    System_Ext(integrations, "EU Integrations", "Europol, Interpol, JITs")
    Rel(investigator, ai_system, "Queries & Reviews")
    Rel(ai_system, data_sources, "Ingests data")
    Rel(ai_system, integrations, "Shares patterns")
    UpdateLayoutConfig($c4ShapeInRow="4", $c4BoundaryInRow="1")
    UpdateRelStyle(investigator, ai_system, $textColor="blue", $lineColor="blue")

Division of Labour

Different agents handle distinct tasks (retrieval and synthesis vs reasoning). The Evaluation Engine continuously monitors output quality and verifiability.

AI in Daily Work

AI is used in real investigations.

Practical Benefits Are Measurable

Evaluating whether the solution actually makes work faster and clearer.

The Solution Fits Real Work Conditions

Legal, security, and operational requirements are accounted for from the start.

AI Is Used Safely

Clear rules define how and for what purposes AI may be used.
