AI Nearest Future

AI for Investigators

Improving knowledge transfer and operational efficiency in the investigation agency

Iren Irbe

PhD researcher in applied informatics, Tallinn University

Head of Unit, Investigations Department, Estonian Tax and Customs Board


AI Near Future

Figure: Ray Kurzweil's exponential growth curve (exponential technological progress).

The Problem

Investigations are becoming cognitively overwhelming.

Data Overload

Modern investigations generate huge amounts of data (often over 1 TB per case). Current tools do not adequately support information processing.

Fragmented Information

Information is spread across multiple systems and different formats.

Time Pressure

Investigators must connect facts under tight deadlines.

Undocumented Experience

Important work-related experience and decision rationale often remain only in people's heads.

Key Insight

Investigators spend too much time managing information and too little time analysing it.

Information Overload (Miller, 1956)

Miller's research (1956) established that human working memory can hold approximately 7±2 items simultaneously. When data volume exceeds this cognitive capacity, processing degrades. Modern intelligence environments routinely exceed these limits, creating the "cognitive bottleneck" that severs the human connection required for tacit knowledge transfer.


Current Investigation Challenges

Slow Investigations

Manual information processing creates bottlenecks and delays case resolution times.

Missed Critical Connections

Higher risk that critical links between evidence, people, and events remain undiscovered.

Inconsistent Reasoning

Different approaches to similar cases result in quality variations.

Legal Vulnerability

Greater risk in situations where decisions cannot be adequately explained or justified later.


Shortcomings of Existing Solutions

Public AI Tools

Not suitable for processing sensitive investigation data. Security and confidentiality requirements prohibit the use of cloud-based AI solutions.

Cloud-Based Systems

Conflict with data protection requirements. Sensitive data must not leave the controlled environment.

Existing Investigation Tools

Focus on document storage, not analysis. They store information but do not support reasoning or establishing connections.

Human Analysis Alone

Does not scale with growing data volumes. Human cognitive limits are real.


Project as Applied Research Output

Based on research on tacit knowledge in high-stress work.

Focus

  • Decision-making under overload
  • Explaining reasoning
  • Knowledge loss when people leave

Solution

  • Practical tool based on research
  • Legally and organisationally compliant

Approach

  • Start from real work practices, not technology

Tacit Knowledge

Experiential knowledge that is hard to put into words, but manifests in skilled performance — intuition, pattern recognition, an "eye" for situations.


Proposed Solution — AI Investigator (Concept)

A secure AI assistant for investigators

A local, air-gapped AI solution that helps investigators organise information gathered through voice and text, think through possible explanations, and preserve important work-related knowledge — all while ensuring data never leaves the investigation agency and all activities comply with the law.

The system analyses previous cases, including audio recordings and notes, identifies recurring patterns, and helps consider possible future developments and preventive actions based on them.

Core Principles

Secure and Internal

  • Runs entirely on the infrastructure of the Estonian Tax and Customs Board (MTA)
  • No data is transmitted to cloud services or outside the agency
  • Data use and sharing comply with applicable legal restrictions (GDPR, LED, EU AI Act)

Verifiable and Transparent

  • All system responses are based on specific and verifiable source documents
  • The system clearly shows what conclusions and connections are based on
  • High-risk features (e.g., emotion recognition) are deliberately excluded

Organises and Analyses Past Events

  • Consolidates past cases, events, and evidence
  • Helps identify recurring behavioural and activity patterns
  • Supports thinking through risks and possible future developments

Supports Thinking, Not Prediction

  • Offers possible explanations and scenarios, not definitive predictions
  • Helps consider different behavioural patterns and their impact
  • Does not draw conclusions or make decisions on behalf of humans

Human Is Always the Decision-Maker

  • All system outputs require human confirmation
  • The investigator decides which patterns and scenarios to consider
  • Responsibility always remains with the investigator

Institutional Memory and Compliance

  • Helps preserve recurring patterns and lessons learned so they do not disappear with departing staff
  • Supports compliance with EU AI Act requirements, including preparing Fundamental Rights Impact Assessments (FRIA)

AI Support for Investigations

Investigation Impact

  • Faster understanding of complex cases
  • Reduced cognitive load
  • Clear, traceable reasoning (audit / court-ready)

Investigation Functions

What it does for the user

  • Summarises large case files
  • Finds connections (people, events, transactions)
  • Supports hypothesis testing
  • Preserves investigation logic

AI Foundations

How it is technically enabled

  • Processes large volumes of text and audio (incl. transcription)
  • Combines data from multiple sources (documents, emails, interviews)
  • Detects patterns and links across data
  • Handles foreign languages
  • Runs fully inside agency (secure, no external data sharing)

Practical Roles

AI Assistant

For Field Officers

  • Voice Control: hands-free queries in Estonian and English
  • Quick Summaries: e.g., "Summarise the last reports on suspect X"
  • Procedural Support: quick access to guidelines and protocols during work

AI Analyst

For Investigation Units

  • Hypothesis Generation: helps think through different scenarios
  • Impact Assessment: e.g., what may happen when certain measures are applied
  • Critical View: draws attention to possible bias or missed connections

Audio Secretary

For Investigators and Support Staff

  • Auto-Transcription: local transcription of meetings and conversations
  • Action Item Extraction: highlights action points from voice notes
  • Interview Support: helps spot inconsistencies in statements

AML Support

For Financial Crime Investigations

  • Complex Financial Flow Analysis: including new digital assets
  • Cross-Border Pattern Detection: helps see connections between different countries
  • UBO Unravelling: simple visualisation of ownership chains
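
The ownership-chain idea behind UBO unravelling can be illustrated with a small graph walk. This is only a sketch under stated assumptions: the entities and shares in `OWNERSHIP` are invented for illustration, `effective_ownership` is a hypothetical helper rather than part of the described system, and the ownership graph is assumed to contain no cycles.

```python
from collections import defaultdict

# Hypothetical ownership edges: (owner, owned entity, share as a fraction).
OWNERSHIP = [
    ("Person A", "Holding 1", 0.60),
    ("Person B", "Holding 1", 0.40),
    ("Holding 1", "Target Co", 0.50),
    ("Person A", "Target Co", 0.10),
]


def effective_ownership(edges, target):
    """Return the effective share of `target` held by each ultimate owner.

    Shares are multiplied along each ownership path and summed across paths.
    Assumes the ownership graph has no cycles.
    """
    owners_of = defaultdict(list)
    for owner, owned, share in edges:
        owners_of[owned].append((owner, share))

    totals = defaultdict(float)

    def walk(entity, share_so_far):
        if not owners_of[entity]:
            # No recorded owners: treat this entity as an ultimate owner.
            totals[entity] += share_so_far
            return
        for owner, share in owners_of[entity]:
            walk(owner, share_so_far * share)

    walk(target, 1.0)
    return dict(totals)
```

Calling `effective_ownership(OWNERSHIP, "Target Co")` combines Person A's direct and indirect holdings into one figure, which is the kind of result a visualisation of ownership chains would need to display.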

Project Activities and Legal Assurance

  • Project Preparation and Management: project coordination and scope definition.
  • Solution and Infrastructure: on-premise server cluster, required hardware, and secure infrastructure for the solution.
  • Legal and Ethical Validation: compliance with GDPR, LED, and EU AI Act (incl. FRIA).
  • Real-World Deployment: using the solution in controlled conditions with investigators.
  • Training and User Support: user training and support during the development phase.
  • Measurement and Evaluation: analysis of project outcomes and decision on further scaling.

Data Processing Logic and Protection

Data Flow Step by Step

  • 1. Ingestion: investigation-related materials are loaded into the system: documents, audio files, pictures, and structured data.
  • 2. Indexing: data is made searchable and interconnected for easy analysis.
  • 3. Analysis: the system helps summarise information and surface connections, based solely on existing data.
  • 4. Human Verification: all results are reviewed by the investigator and confirmed or corrected.
  • 5. Output: overviews, summaries, and visual views are generated to support the investigation.
  • 6. Smart Forgetting: the system ensures data is kept only as long as the law permits. When the retention period expires, data is automatically archived or deleted, with a clear justification.
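
The human verification gate between analysis and output can be sketched as a small state model. This is a minimal illustration, not the project's implementation; the names `Finding`, `Status`, `verify`, and `export` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"        # produced by analysis, not yet reviewed
    CONFIRMED = "confirmed"    # accepted by the investigator
    CORRECTED = "corrected"    # amended by the investigator


@dataclass
class Finding:
    text: str
    source_ids: list[str]                 # documents the finding is based on
    status: Status = Status.PENDING
    reviewer: Optional[str] = None


def verify(finding: Finding, reviewer: str, corrected_text: Optional[str] = None) -> Finding:
    """Human verification: the investigator confirms or corrects a finding."""
    if corrected_text is not None:
        finding.text = corrected_text
        finding.status = Status.CORRECTED
    else:
        finding.status = Status.CONFIRMED
    finding.reviewer = reviewer
    return finding


def export(findings: list[Finding]) -> list[str]:
    """Output: refuse to export while any finding is still unreviewed."""
    unreviewed = [f for f in findings if f.status is Status.PENDING]
    if unreviewed:
        raise ValueError(f"{len(unreviewed)} finding(s) still await human verification")
    return [f"{f.text} [sources: {', '.join(f.source_ids)}]" for f in findings]
```

The design choice illustrated here is that export fails outright rather than silently skipping unreviewed findings, so an unverified result cannot reach a report by omission.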

Supported Data Types

  • Documents: PDFs, Word and Excel files, emails
  • Audio and Video: recordings and their transcribed versions
  • Structured Data: bank transactions, registry data
  • Cooperation Channels: Europol, Interpol, and other authorised channels

How Data Protection Is Ensured

  • Data remains entirely within the organisation
  • All data is encrypted
  • The system maintains precise records of where information comes from and how it was used
  • Data usage is always controllable and auditable

Legal Compliance

Data processing complies with law enforcement requirements:

  • Data is used only for a specific purpose
  • Only necessary information is collected
  • Access is strictly limited

Retention Periods and Deletion

Every data type has a legally mandated lifecycle. The system enforces these automatically.

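Each entry lists the retention period, the legal basis, and whether an exception is possible (with the basis for the exception in parentheses). Abbreviations: KrMS = Code of Criminal Procedure (CPC); IKS = Personal Data Protection Act (PDPA); AvTS = Public Information Act (PIA); ArhS = Archives Act; VVm = Government regulation; lg = subsection.

Criminal Case File (evidence, protocols)
  • Retention period: 10 yrs (general); 15 yrs (1st degree crimes); permanent (crimes against humanity)
  • Legal basis: KrMS § 209 lg 2; VVm § 6 lg 1–4
  • Exception possible? Yes, if archival value (ArhS § 2 §§ 3–4, § 8; VVm § 6 § 5)

Surveillance File (wiretaps, surveillance)
  • Retention period: up to 50 years
  • Legal basis: KrMS § 126¹² lg 3
  • Exception possible? No (CPC § 126¹² § 3, strict limit)

Court File (hearing protocols, decisions)
  • Retention period: 10 yrs (after entry into force)
  • Legal basis: KrMS § 160¹ lg 6–7
  • Exception possible? Yes, if archival value (ArhS § 2 §§ 3–4; § 8 § 2)

DNA / Fingerprint Data
  • Retention period: until criminal record deletion
  • Legal basis: KrMS § 206 lg 4
  • Exception possible? Yes, upon acquittal (CPC § 206 § 4, immediate deletion)

AI Reasoning Logs (provenance, decisions)
  • Retention period: min 6 months; technical docs 10 yrs after placing on the market
  • Legal basis: EU AI Act Art. 12 lg 1, Art. 18 lg 1, Art. 19 lg 1
  • Exception possible? Yes, extendable on dispute (LED art. 16 lg 3(a)(b); GDPR art. 18 lg 1)

Personal Data in Case File
  • Retention period: until purpose is fulfilled
  • Legal basis: IKS § 17; LED Art. 4(1)(e), Art. 5
  • Exception possible? No, except as prescribed by law (IKS § 17 lg 2; § 25 lg 3–4)

Pattern Database Entries (anonymous modus operandi)
  • Retention period: indefinite (no personal data)
  • Legal basis: LED Art. 4(1)(c)
  • Exception possible? Not applicable (anonymous data, GDPR does not apply)

Audit Log (who, when, what)
  • Retention period: at least 3 years
  • Legal basis: IKS § 36; LED Art. 25(2)
  • Exception possible? Yes, extendable (IKS § 36 lg 5; E-ITS/ISKE security class)

Protocols and Recordings (interrogations, observations)
  • Retention period: with case file (10–15 yrs)
  • Legal basis: KrMS § 146, § 148, § 150 lg 4
  • Exception possible? Yes, with case file (VVm § 6, follows main document)

Public Information (under PIA)
  • Retention period: 5–50 years (depending on document type)
  • Legal basis: AvTS § 42
  • Exception possible? Yes, if archival value (ArhS § 2 § 3; § 8 §§ 1–2)

Automatic Deletion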

The system notifies the investigator 30 days before the deadline. If no exception is requested, data is deleted automatically.

This prevents investigators from keeping data "just in case", out of fear that it might be needed later.

Requesting Exceptions

Extended retention requires a justified request. The system logs all exceptions and their justifications.

Examples: archival-value documents, pending challenges, international cooperation.
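
The deadline logic described above (30-day notice, automatic deletion, logged exceptions) could look roughly like this. It is a sketch under assumptions: `RetentionRecord` and `retention_action` are hypothetical names, and a granted exception is modelled simply as an open-ended hold, whereas a real system would likely give exceptions their own review dates.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

NOTICE_DAYS = 30  # notification lead time stated in the text


@dataclass
class RetentionRecord:
    item_id: str
    expires_on: date                         # end of the legally mandated period
    exception_reason: Optional[str] = None   # justification, if an exception was granted


def retention_action(record: RetentionRecord, today: date) -> str:
    """Return 'hold', 'delete', 'notify', or 'retain' for one item."""
    if record.exception_reason:
        return "hold"      # justified exception on record: no automatic deletion
    if today >= record.expires_on:
        return "delete"    # deadline reached and no exception requested
    if today >= record.expires_on - timedelta(days=NOTICE_DAYS):
        return "notify"    # inside the 30-day notice window
    return "retain"
```

Because the function takes `today` as a parameter instead of reading the clock, the same rule can be replayed later to show an auditor why an item was deleted on a given date.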

LED vs GDPR

Law Enforcement Directive (LED) — Directive 2016/680 — is the primary legal act for processing personal data by law enforcement agencies.

GDPR applies additionally when data is processed outside criminal investigations (e.g., administrative cases).

Retention Period Details

Criminal Case File: VVm RT I, 02.09.2011, 5 § 6: periods depend on severity. Permanent retention for genocide, crimes against humanity, etc.

Surveillance File: CPC § 126¹² § 3: upon conviction until criminal record deletion, max 50 yrs; upon acquittal up to 5 yrs; upon case closure also up to 5 yrs.

AI Logs: AI Act Art. 12: automatically generated logs min 6 months. Art. 18: technical documentation 10 years after market placement.

Personal Data: PDPA § 17: data retained until purpose is achieved. LED Art. 5: regular review and deletion as needed.

Audit Log: PDPA § 36: log data automatically recorded. LED Art. 25: logs must enable identification of the sender, recipient, and timing of data.


Data Lifecycle and System Context

How data moves within an air-gapped environment and how traceability and deletion are ensured.

Data Flow in an Air-Gapped System

Diagram components: External Sources (devices such as phones and computers; registries and banks; court decisions and orders; procedures and laws) → USB air-gap bridge → AI INVESTIGATOR (analysis network), comprising Case File Storage (uploaded documents, device databases, metadata + timestamps), AI Processing (analysis and summaries, hypothesis generation, provenance tracking), Pattern Database (anonymous modus operandi, no personal data), and Export (reports in PDF and JSON, provenance graph) → Secure Archive (exported reports, automatic deletion times per LED/AI Act requirements).
Data flow: external sources → USB → case file storage → AI processing → export → secure archive with automatic deletion times.

What Data Is Collected and Where Does It Go?

Each entry lists the input method, the storage location, and the deletion rule.

Device Data (phones, computers)
  • Input method: via specialised software → USB → upload
  • Storage: case file storage, isolated
  • Deletion: on case file closure or retention period expiry

Registry Queries (banks, databases)
  • Input method: manual confirmation before each query
  • Storage: case file storage + provenance log
  • Deletion: automatic expiration control

Court Decisions, Orders
  • Input method: USB or manual upload
  • Storage: case file storage
  • Deletion: per retention requirements

General Documents (laws, procedures)
  • Input method: by the administrator
  • Storage: separate shared knowledge base
  • Deletion: version-control based

AI Outputs (summaries, hypotheses)
  • Input method: generated by the system
  • Storage: within the case file + provenance graph
  • Deletion: with the case file

Frequently Asked Questions About Data

  • Do I need to re-upload data when returning to a case file? → No. All data loaded into a case file is retained until the file is closed or deleted. The investigator can immediately continue where they left off.
  • How does AI learn from case file documents? → AI does not train on user data. A pre-trained model is used. Only anonymous modus operandi is extracted from documents into the pattern database.
  • Does a "super-database" emerge where everything is cross-queryable? → No. Each case file is fully isolated. Cross-searching between case files is technically impossible. The pattern database contains only anonymous information.
  • Who deletes data and when? → The system tracks retention deadlines automatically. The investigator sets exceptions. On export, deletion times are set automatically (LED, AI Act requirements).
  • How can data be recovered after deletion? → The provenance graph shows where data originated. If needed, it can be re-queried from sources (if the source still permits).
  • How are case file documents exported? → Reports and documents are exported to a secure archive. Automatic deletion times are set according to document type and legal requirements.

FAQ
  • Where is data stored? → Only on agency servers, within case files
  • Where does it go? → Deleted per retention deadlines
  • How does AI learn? → Uses pre-trained model; does not train on user data
  • Do case files cross? → No, each case file is fully isolated

Air-Gapped Workflow

Current practice: data is copied from "black computers" to USB → transferred to analysis network → stored in regional storage (up to 100 TB). Access is via RDP; printers are allowed, disks are not.

Provenance Tracking

Provenance = data history (where it came from and how it got here).

  • Source (e.g., which system, document, interview)
  • When it was created or retrieved
  • What transformations were applied (e.g., summarised, translated)
  • Links to original records

The provenance graph shows the full path of the data — so you can trace it back and re-query the original source if needed.
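
Tracing a derived item back to its original record can be sketched as a walk over "derived from" links. This is an illustration only: `ProvNode` and `trace_to_origin` are hypothetical names, each record is assumed to have at most one parent, and the `derived_from` link is loosely modelled on the idea of derivation in W3C PROV rather than on the system's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ProvNode:
    node_id: str
    source: str                       # e.g. which system, document, interview
    created_at: datetime              # when it was created or retrieved
    transformation: Optional[str]     # e.g. "summarised", "translated"; None for originals
    derived_from: Optional[str]       # node_id of the record this was derived from


def trace_to_origin(nodes: dict[str, ProvNode], node_id: str) -> list[ProvNode]:
    """Follow derived_from links from a node back to the original record."""
    path = []
    seen = set()
    current = node_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in provenance links")
        seen.add(current)
        node = nodes[current]
        path.append(node)
        current = node.derived_from
    return path
```

The last element of the returned path is the original record, which is the point from which the source could be re-queried if needed.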

Metadata vs Provenance

Metadata = data about the data (descriptive labels).

  • File type, author, date
  • Keywords, tags
  • Case ID, document category

Metadata helps organise and find data, but it does not explain its origin or processing history.

Interoperability

Built-in support for ISO 20022 and JIT workspaces for cross-border cooperation (FUNC-160, FUNC-162).


AI Usage Restrictions

Each entry lists the situation, the reason for the restriction, and the resulting action.

Classified Materials (state secrets, NATO)
  • Reason: an air-gapped environment is necessary but not sufficient. Separate accreditation to the relevant classification level is required (ISKE, NATO security class), together with an AI model assessment for classified processing.
  • Action: separate accredited environment, or manual processing.

Source Protection Cases (informants)
  • Reason: source identities must be protected.
  • Action: source identities must not reach any log or pattern database.

Court Prohibition (specific order)
  • Reason: a court may prohibit automated processing in a specific case.
  • Action: AI functions are blocked at the case file level.

Data Subject Objection (GDPR Art. 21, where applicable)
  • Reason: in administrative cases, the data subject may object to profiling.
  • Action: manual review; AI output does not affect the decision.
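
Restrictions like those listed above could be checked as a gate at the case file level before any AI function runs. This is a hedged sketch: `CaseFlags` and `ai_allowed` are hypothetical names, and treating a data subject objection as a full block is only one possible reading of "manual review". Source protection is not covered here, since it concerns what may be written to logs rather than whether AI may run.

```python
from dataclasses import dataclass


@dataclass
class CaseFlags:
    classified: bool = False              # state secrets, NATO material
    environment_accredited: bool = False  # accredited for the relevant classification level
    court_prohibition: bool = False       # court order against automated processing
    subject_objection: bool = False       # objection to profiling (administrative cases)


def ai_allowed(flags: CaseFlags) -> tuple[bool, str]:
    """Return whether AI functions may run for this case file, with the reason."""
    if flags.court_prohibition:
        return False, "AI functions blocked at case file level (court order)"
    if flags.classified and not flags.environment_accredited:
        return False, "separate accredited environment or manual processing required"
    if flags.subject_objection:
        return False, "manual review; AI output must not affect the decision"
    return True, "AI functions available"
```

Returning the reason alongside the decision means the refusal itself can be shown to the user and recorded.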

Live Demo: AI Summarisation Module

The AI ingests a stream of structured Knowledge Objects (KOs), representing different evidence sources, and synthesises them into a coherent, readable summary.

Input View (Knowledge Objects) Token Usage: 842/8192
// INGESTED EVIDENCE STREAM (JSON-LD)
KO-001 (Incident Report):
"At 02:35, silent alarm at Central Data Facility. Rear door unsecured. Guard J. Kask found unconscious."
KO-002 (Surveillance Log):
"Camera 04 captures Blue Van (771-BKV) departing at 02:15. Driver unidentifiable. Logs 02:00–02:30 deleted."
KO-003 (Suspect Interview):
"Suspect A. Tamm (Owner 771-BKV) claims alibi: 'Night Market 22:00–03:00'. Status: UNVERIFIED."
KO-004 (Forensics Preliminary):
"USB Drive (Ev-001) recovered near rack 14. Contains encrypted partition. Traces of 'DarkSide' ransomware signature."
KO-005 (Toxicology Report):
"Guard J. Kask blood sample positive for Zolpidem (sedative). Dosage consistent with forced ingestion approx 01:30."
KO-006 (ANPR Hit):
"Vehicle 771-BKV detected by camera #442 (Pärnu Hwy) heading South at 02:45. Speed: 110km/h."
KO-007 (Witness Statement):
"Market vendor M. Tamm (no relation) states stall #42 was closed at 22:00. Contradicts Suspect A's alibi."
KO-008 (Financial Intel):
"Wallet 0x7a...f2 linked to A. Tamm received 2.5 BTC at 03:15. Sender wallet flagged as 'DarkSide Affiliate'."
KO-009 (Background Check):
"A. Tamm: Prior conviction (2021) for cyber-facilitated fraud. Known associate of 'The Broker' (Suspect B)."
KO-010 (Network Log):
"Firewall alert 02:10: Outbound SSH connection to IP 185.x.x.x (Moldova). 4.2GB data exfiltrated."
KO-011 (Physical Evidence):
"Latent print lifted from Server Rack 14 handle. Match: A. Tamm (99.9% confidence)."
KO-012 (Suspect B Sighting):
"Patrol unit reports individual matching description of 'The Broker' entering vehicle 771-BKV at 01:45."
KO-013 (Dark Web Chatter):
"Post on 'BreachForums' at 03:30: 'Fresh gov database for sale. Estonia origin.' User: 'SilentNight'."
KO-014 (Vehicle Search):
"Vehicle 771-BKV intercepted at 04:00. Laptop (Ev-002) found under passenger seat. Driver A. Tamm detained."
KO-015 (Laptop Forensics):
"Ev-002 contains SSH keys matching Central Data Facility server. Browser history shows access to 'BreachForums'."
KO-016 (Arrest Report):
"Suspect B ('The Broker') apprehended at safehouse. Confirms A. Tamm was hired for physical access."
Task: Synthesize KOs into Executive Briefing.
Figure 8: Simulation of multi-source evidence summarisation.
System Specs
● Online

Model: Mistral 7B (Ollama)

Input: JSON-LD Stream

Context: 8k Tokens

Mode: Air-gapped (Offline)

Why Summarize KOs?

When the AI summarises from structured KOs (rather than from arbitrary free text), the risk of hallucination decreases because the summary must rely on the "facts" already recorded in the knowledge graph.
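
One way to make this reliance on recorded facts checkable is to require every claim in a summary to cite a known KO. The sketch below is an assumption, not the demo's actual mechanism: it supposes the model is instructed to emit one claim per line with citations in the form `[KO-003]`, and `ungrounded_claims` is a hypothetical helper. It verifies only that citations exist and point to real KOs, not that the cited KO actually supports the claim.

```python
import re

# Two Knowledge Objects from the demo input, keyed by ID.
KOS = {
    "KO-003": "Suspect A. Tamm (Owner 771-BKV) claims alibi: 'Night Market 22:00–03:00'. Status: UNVERIFIED",
    "KO-007": "Market vendor M. Tamm (no relation) states stall #42 was closed at 22:00. Contradicts Suspect A's alibi.",
}

CITATION = re.compile(r"\[(KO-\d{3})\]")


def ungrounded_claims(claims: list[str], kos: dict[str, str]) -> list[str]:
    """Return claims that cite no KO at all or cite an unknown KO ID."""
    problems = []
    for claim in claims:
        cited = CITATION.findall(claim)
        if not cited or any(ko_id not in kos for ko_id in cited):
            problems.append(claim)
    return problems
```

Claims returned by this check would be candidates for removal or for explicit human review before the briefing is shown.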


Assistant Desktop

Interactive prototype of the investigation workspace.

This prototype demonstrates the multi-widget dashboard. Use the sidebar to switch between views (Home, Dashboard, Documents, Analysis, Chat) and select a role from the dropdown to see role-specific configurations.

Widget-Based UI

The dashboard adapts to the user's role. The investigator sees graphs and timelines; the analyst sees scenarios and hypotheses.

Roles

Select a role from the dropdown to see how the interface adapts: Investigator, Analyst, Supervisor, Prosecutor, Auditor, AML Specialist, Audio Secretary.


Interface

Interactive prototype of the voice-first assistant designed for high-stress environments.

Key Features

1. Voice-First Interaction

Prioritizing voice lowers the cognitive barrier for articulating tacit knowledge, encouraging storytelling and in-the-moment narration.

2. Conversational Externalization

The AI acts as a Socratic partner, using "Intuition Pumps" to elicit hidden assumptions during the conversation.

3. Groundedness (GraphRAG)

Every answer is anchored in the Knowledge Graph. The UI explicitly links generated insights back to their source KOs.

4. Context-Aware Adaptation

Adapts interface and suggestions based on the user's current role and location.

5. EASCI Integration

Seamlessly bridges the gap between capturing raw Experience and Articulating it into structured knowledge.

Try it: Click the microphone icon in the prototype to simulate a voice capture session.

Cognitive Load Theory

Sweller (1988). Working memory is limited. In high-stress situations, the cognitive load of typing (visual-motor) competes with the task. Voice (auditory-verbal) uses a separate channel, reducing interference.

Socratic Method

The AI doesn't just record; it asks "Why?". "Why did you check the trunk first?" This forces the expert to make their implicit reasoning explicit.

Voice Efficiency

Speaking is roughly 3–4x faster than typing (about 150 wpm vs 40 wpm). In high-stress environments, typing is a friction point that prevents knowledge capture.

Presenter Notes
  • Interactive Demo: This isn't a screenshot. It's the actual code running in an iframe.
  • Why Voice? It's not just convenience. It's about cognitive load. Police officers can't type while assessing a threat.
  • Socratic Partner: Emphasize that the AI is active, not passive. It probes for details.
  • EASCI Integration: This is the "E" (Experience) and "A" (Articulation) part of the loop happening in real-time.

The Intelligence Cycle

Making AI Reasoning Auditable and Explainable

Interactive simulation. Micro Loop (Real-Time Inference): Retrieve (GraphRAG) → Reason (GoT Reasoning) → Synthesize (PROV-O Graph). Live telemetry: current phase, PROV node count (PROV), confidence (CONF), and token usage (TOKENS).
Figure 5: The Intelligence Cycle in Action.
Left panel shows the three-stage micro loop: Retrieve (via GraphRAG), Reason (via Graph-of-Thought), and Synthesize (via PROV-O). Right panel displays the macro loop (1→2→3→4→5) with live telemetry: PROV nodes created, confidence scores, and token consumption. The simulation demonstrates how tacit knowledge flows from raw experience through AI-mediated articulation to structured, reusable knowledge objects. Press "Run" to start the simulation. Coloured nodes represent knowledge objects at different lifecycle stages.

Theoretical Backbone

This framework underpins the system's operational logic. Each stage maps to specific modules and AI agent tasks in the architecture.

Why This Cycle?

Traditional models say tacit knowledge can be made explicit but often skip the "how". This framework splits the process into two distinct actions: first explaining the reasoning (Articulation), then organising it (Structuring). This provides a practical way to build AI that assists with each specific cognitive task.

Cycle Stages

  • 1. Experience: (Dewey) grounds knowledge in action: learning by doing, not in isolation.
  • 2. Articulation: (Dennett, Polanyi) treats explanation as an elicited process requiring guided prompts.
  • 3. Structuring: (Peirce) formalises tacit insights through abductive reasoning.
  • 4. Consolidation: (Weick) requires community sensemaking before knowledge enters the canon.
  • 5. Innovation: (Whitehead) ensures knowledge remains dynamic, allowing pruning and refinement.

Dynamic, Not Linear

Traditional models present a linear spiral. Investigation is actually a complex adaptive system with multiple feedback loops: new knowledge can trigger new Experience; Consolidation can require return to Articulation.

Why Five Stages?

Each stage transforms knowledge through specific cognitive mechanisms. Unlike four-stage models, this separates Articulation from Structuring because they require different AI interventions: dialogue-based elicitation vs. abductive formalisation.


Interactive Demo: Mobile + Desktop

Collaborative hypothesis workflow — from field officer to analyst in real time.

Mobile App: voice-based capture in the field
Desktop App: Investigation Platform (A4 format)

Field Capture

Officers use voice to capture observations during patrol, interviews, or inspections. The mobile interface prioritises speed and minimal cognitive load.

Deep Analysis

Analysts access the full knowledge graph, entity relationships, and reasoning chains through the desktop platform's multi-widget layout.

Mobile App

Field officers use voice input to quickly capture evidence and observations at the scene.

Desktop App

The analyst sees the knowledge graph updating in real time and can immediately work with new evidence.

Synchronisation

Information captured on mobile appears instantly on the desktop graph — collaboration without delay.


Data Separation and Legal Compliance

How the system ensures compliance with the Law Enforcement Directive (LED), GDPR, the EU AI Act, and Estonian law

Principle: Data vs Patterns

The system does not retain personal data across cases. Only anonymous modus operandi is preserved — a crime scheme or method that is completely separated from specific individuals and individual cases.

Data Flow and Separation

Diagram components: CASE FILE A (documents, personal data, transaction data; isolated environment) → Pattern Extraction (personal data is removed; abstract behavioural pattern is preserved) → PATTERN DATABASE (e.g., Money Laundering Scheme #127, Fraud Pattern #89, Network Topology; anonymous, depersonalised).
Data flow from case file to pattern database: personal data is removed, only abstract modus operandi is preserved.
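
The separation step can be illustrated with an allow-list: only fields explicitly approved as abstract pattern attributes are copied, so any unknown or personal field is dropped by default. The field names in `ALLOWED_PATTERN_FIELDS` and the function `extract_pattern` are hypothetical. Real depersonalisation is harder than filtering keys, since free text inside an allowed field could still identify a person and would need separate review.

```python
# Hypothetical allow-list of abstract, non-personal pattern attributes.
ALLOWED_PATTERN_FIELDS = {"scheme_type", "steps", "instruments", "jurisdiction_count"}


def extract_pattern(case_record: dict) -> dict:
    """Copy only allow-listed fields; everything else stays in the case file."""
    return {
        key: case_record[key]
        for key in sorted(ALLOWED_PATTERN_FIELDS)
        if key in case_record
    }
```

An allow-list is used here rather than a deny-list of personal fields because a newly added field is then excluded until someone deliberately approves it.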

Dos and Don'ts

Each entry pairs what the system does NOT do with what is done instead.

  • NOT done: cross-case data sharing between case files. Done: each case file is fully isolated from other case files.
  • NOT done: storing personal data in the pattern database. Done: only anonymous and abstract modus operandi is stored.
  • NOT done: training algorithms on personal or case data. Done: the algorithm uses a pre-trained model.
  • NOT done: transmitting data to cloud or external servers. Done: data remains fully under the organisation's control.

Legal Compliance

Each entry lists the requirement, followed by the implemented measures and solutions.

Data Protection and Privacy

  • Estonian Constitution § 26 (Inviolability of private life): Case-based data isolation; role-based access control (RBAC); all queries are logged for audit trail purposes.
  • Estonian Constitution § 43 (Secrecy of communications): Encrypted communication data storage; access only through court order ID binding; automatic data expiry checks.
  • LED Art. 4 (Lawfulness and fairness): Data is processed solely for law enforcement purposes in a transparent and traceable manner.
  • LED Art. 4(1)(b) (Purpose limitation): Each case file is strictly isolated; data is not used for other purposes.
  • LED Art. 4(1)(c) (Data minimisation): Only abstract schemes are stored in the pattern database; personal data is not retained.
  • Estonian PDPA § 14, § 15 (Processing principles and lawfulness): Built-in purpose limitations; law enforcement processing lawfulness per § 15; automatic data quality validation; encrypted data transmission (TLS 1.3).
  • Estonian PDPA § 20 (Special categories of personal data): Processing of special category data (race, ethnicity, political views, religion, health, biometrics) only in cases prescribed by law; additional security measures.
  • LED Art. 10 (Processing of special categories of data): Special category personal data processing only when strictly necessary; appropriate safeguards; automatic classification and restrictions.
  • Estonian PDPA § 43 (Security measures): Data encryption at rest (AES-256); RBAC-based role management; full audit logging; automatic backup.
  • GDPR Art. 5, 6 (where applicable): When processing administrative or non-criminal investigation data: data subject consent or legitimate interest; mandatory data subject notification.

Human Oversight and Automated Decision-Making

  • Estonian Constitution § 22 (Presumption of innocence): AI does not make guilt-determining decisions; the system supports the investigator and does not replace court rulings.
  • LED Art. 11 (Automated decision-making): AI does not make automated decisions; all results are confirmed by the investigator; profile-based decisions without human intervention are prohibited.
  • EU AI Act Art. 14(1)–(4) (Human oversight): The investigator can override, correct, or ignore AI output at any time; the system can be stopped with a "stop" button; the UI displays limitations and capabilities.
  • EU AI Act Art. 6 (High-risk systems): Completed Fundamental Rights Impact Assessment (FRIA); technical documentation per Article 11; risk assessment log; conformity declaration.

Transparency and Right to Explanation

  • Estonian Constitution § 15 (Right to effective proceedings): AI reasoning chain export in PDF or JSON format; provenance graph enables step-by-step challenge of the decision process.
  • Estonian Constitution § 24 (Right to fair trial): AI outputs are transparent, explainable, and accessible to the defence.
  • Estonian Constitution § 44(3) (Right to access data): Built-in Data Subject Access Request (DSAR) export; personal data query report is generated automatically.
  • EU AI Act Art. 86 (Right to explanation): Each AI output includes an explanation (XAI); provenance graph displays inputs, inferences, and sources.
  • EU AI Act Art. 13 (Transparency): The system user guide and UI explain AI capabilities, limitations, and intended use cases.
  • EU AI Act Art. 50 (User notification): Users are clearly informed they are interacting with AI; outputs are marked as AI-generated.

Data Quality and Evidence

  • LED Art. 7 (Data quality): AI-based conclusions are clearly distinguished from facts; the investigator confirms data accuracy before further use.
  • Estonian CPC § 63 (Concept of evidence): AI is an investigative aid; documents prepared by the investigator based on AI analysis may qualify as evidence within the meaning of § 63 (other document).
  • Estonian CPC § 64 (Conditions for evidence collection): Full traceability is ensured: each AI output references the original source and maintains data integrity.
  • Estonian CPC § 146 (Procedural action protocol): Documents prepared with AI assistance comply with protocol format requirements: date, author, criminal case number, course, and results of the action.
  • Estonian CPC § 150 (Audio and video recording): A report based on AI analysis may rely on material recorded under CPC § 150; recordings are unaltered and added to the case file.

Security and Logging

  • Estonian PDPA § 36 (Logging): Logged: collection, modification, reading, transmission, combination, and deletion. Logs are retained for at least 3 years.
  • LED Art. 25(1) (Logging obligation): Automatically logged: collection, modification, query, disclosure (incl. transmission), combination, deletion. Logs ensure traceability and help detect unauthorised access.
  • E-ITS (ISKE) (Security measures): Compliance with the Estonian information security standard: security class is determined by data confidentiality, integrity, and availability; ISKE catalogue measures are applied.

Retention and Deletion

  • LED Art. 5 (Retention periods): Personal data is retained only as long as necessary for the purpose. The system automatically tracks deadlines and notifies of expiry.
  • EU AI Act Art. 12 (Log retention): AI system logs are retained for at least 6 months. Provenance graphs and decision logs are exported to archive.
  • Estonian PIA § 12 (Document retention): Documents subject to archiving obligation are exported separately. Automatic deletion times according to document type.
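
Case-based isolation combined with logging of every access attempt can be sketched as follows. This is a simplified illustration: `CaseAccess`, `AuditEntry`, and `Operation` are hypothetical names, access is modelled as per-case assignment rather than full role management, and the operation types mirror those listed for LED Art. 25(1).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Operation(Enum):
    # Operation types listed for LED Art. 25(1) in the text.
    COLLECTION = "collection"
    MODIFICATION = "modification"
    QUERY = "query"
    DISCLOSURE = "disclosure"
    COMBINATION = "combination"
    DELETION = "deletion"


@dataclass(frozen=True)
class AuditEntry:
    timestamp: datetime
    user: str
    case_id: str
    operation: Operation
    allowed: bool


class CaseAccess:
    def __init__(self):
        self._assignments: dict[str, set[str]] = {}   # case_id -> users assigned to it
        self.log: list[AuditEntry] = []

    def assign(self, case_id: str, user: str) -> None:
        self._assignments.setdefault(case_id, set()).add(user)

    def request(self, user: str, case_id: str, operation: Operation) -> bool:
        """Allow access only to assigned cases; log every attempt, including refusals."""
        allowed = user in self._assignments.get(case_id, set())
        self.log.append(AuditEntry(datetime.now(timezone.utc), user, case_id, operation, allowed))
        return allowed
```

Logging refused attempts as well as successful ones is what lets the log help detect unauthorised access, not just document authorised use.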

Retention Periods and Deletion

Data Type | Retention Period | Legal Basis | Exception Possible?
Criminal Case File (evidence, protocols) | 10 yrs (general); 15 yrs (1st degree crimes); permanent (crimes against humanity) | CPC § 209(2); VVm § 6(1)–(4) | Yes, if archival value (ArhS § 2(3)–(4), § 8; VVm § 6(5))
Surveillance File (wiretaps, surveillance) | Up to 50 years | CPC § 126¹²(3) | No (strict limit under CPC § 126¹²(3))
Court File (hearing protocols, decisions) | 10 yrs (after entry into force) | CPC § 160¹(6)–(7) | Yes, if archival value (ArhS § 2(3)–(4); § 8(2))
DNA/Fingerprint Data | Until deletion of criminal record data | CPC § 206(4) | Yes, upon acquittal (immediate deletion under CPC § 206(4))
AI Reasoning Logs (provenance, decisions) | Min 6 months; technical docs 10 yrs after market placement | EU AI Act Art. 12(1), Art. 18(1), Art. 19(1) | Yes, extendable if challenged (LED Art. 16(3)(a)(b); GDPR Art. 18(1))
Personal Data in the case file | Until purpose is fulfilled | PDPA § 17; LED Art. 4(1)(e), Art. 5 | No, except as prescribed by law (PDPA § 17(2); § 25(3)–(4))
Pattern Database Entries (anonymous modus operandi) | Indefinite (no personal data) | LED Art. 4(1)(c) | Not applicable (anonymous data, GDPR does not apply)
Audit Log (who, when, what) | At least 3 years | PDPA § 36; LED Art. 25(2) | Yes, extendable (PDPA § 36(5); depends on E-ITS/ISKE security class)
Protocols and Recordings (interrogations, observations) | With case file (10–15 yrs) | CPC § 146, § 148, § 150(4) | Yes, with case file (VVm § 6: follows main document)
Public Information (under PIA) | 5–50 years (document type) | PIA § 42 | Yes, if archival value (ArhS § 2(3); § 8(1)–(2))
Glossary of Abbreviations
RBAC
Role-Based Access Control. Users see only what their role permits.
TLS 1.3
Transport Layer Security — encrypted protocol for secure data transmission.
AES-256
Advanced Encryption Standard — symmetric encryption with 256-bit key.
FRIA
Fundamental Rights Impact Assessment — required by the EU AI Act for high-risk systems.
XAI
Explainable AI. Every AI output includes an understandable justification.
DSAR
Data Subject Access Request — a query by the data subject about their personal data (GDPR Art. 15).
Provenance Graph
A visual graph showing data origin and processing history.
Modus Operandi

Latin for "mode of operating". In criminalistics, it refers to a criminal's characteristic behavioural pattern.

Example: "Money is moved through 3 countries using 5 shell companies" — without names, dates, or amounts.

LED vs GDPR

Law Enforcement Directive (LED) — Directive 2016/680 — is the primary legal act for processing personal data by law enforcement agencies.

GDPR applies additionally when data is processed outside criminal investigations (e.g., administrative cases).

Requesting Exceptions

Extended retention requires a justified request. The system logs all exceptions and their justifications.

Example: archival-value documents, pending challenges, international cooperation.

Automatic Deletion

The system notifies the investigator 30 days before the deadline. If no exception is requested, data is deleted automatically.

This prevents investigators from keeping data "just in case", out of fear that it might be needed later.
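
The notify-then-delete rule described above can be sketched as a small decision function. This is a sketch under stated assumptions: `RetainedItem`, `retention_action`, and the four action labels are hypothetical, and only the 30-day notice period comes from the slide.

```python
from dataclasses import dataclass
from datetime import date, timedelta

NOTICE_PERIOD = timedelta(days=30)  # from the slide: notify 30 days before the deadline


@dataclass
class RetainedItem:
    item_id: str
    delete_on: date                  # end of the applicable retention period
    exception_granted: bool = False  # set once a justified request is approved


def retention_action(item: RetainedItem, today: date) -> str:
    """Return 'hold', 'delete', 'notify', or 'keep' for one item."""
    if item.exception_granted:
        return "hold"    # extended retention; the justification is logged elsewhere
    if today >= item.delete_on:
        return "delete"  # deadline reached and no exception was requested
    if today >= item.delete_on - NOTICE_PERIOD:
        return "notify"  # inside the 30-day notice window
    return "keep"
```

For an item with `delete_on = date(2025, 6, 30)`, the function returns `"notify"` from 31 May onward and `"delete"` on 30 June unless an exception has been granted.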

Retention Period Explanations

Criminal Case File: VVm RT I, 02.09.2011, 5 § 6: periods depend on severity. Permanent retention for genocide, crimes against humanity, etc.

Surveillance File: CPC § 126¹² § 3: upon conviction until criminal record deletion, max 50 yrs; upon acquittal up to 5 yrs; upon case closure also up to 5 yrs.

AI Logs: AI Act Art. 12: automatically generated logs min 6 months. Art. 18: technical documentation 10 years after market placement.

Personal Data: PDPA § 17: data retained until purpose is achieved. LED Art. 5: regular review and deletion as needed.

Audit Log: PDPA § 36: log data automatically recorded. LED Art. 25: logs must enable identification of the sender, recipient, and timing of data.


Abductive Reasoning

"Abduction is the process of forming explanatory hypotheses. It is the only logical operation which introduces any new idea." (C.S. Peirce, 1931)

Linear (Deductive)

Premise: Rule
Premise: Case
Conclusion (Certain)

Fragile: If one premise fails, the chain breaks.

Branching (Abductive)

Observation: "Van at Scene"
H1: Delivery
H2: Collusion
H3: Coercion
?
Best Explanation Selected

Resilient: Survives uncertainty by weighing options.

Linear Reasoning (Deductive)

IF suspect has motive

AND suspect has means

AND suspect at scene

THEN suspect is guilty

Problem: Premises must be certain.

Abductive Reasoning (Detective)

OBSERVATIONS:

  • Warehouse breach (02:00-04:00)
  • Logs deleted
  • Van 771-BKV on camera

HYPOTHESES:

H1: A. Tamm & J. Kask colluding (Confidence: 0.73)

H2: J. Kask victim (Confidence: 0.21)

H3: Legitimate delivery (Confidence: 0.06)

NEXT STEPS:

Verify Tamm's access logs to test H1.
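
The hypothesis list above maps naturally onto a ranked data structure. The sketch below reuses the slide's labels and confidence values; the `Hypothesis` class and `rank` function are hypothetical, and the confidences are illustrative scores rather than calibrated probabilities.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hypothesis:
    label: str
    description: str
    confidence: float  # illustrative score from the slide, not a calibrated probability


def rank(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Order competing hypotheses from most to least plausible."""
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)


hypotheses = [
    Hypothesis("H2", "J. Kask victim", 0.21),
    Hypothesis("H1", "A. Tamm & J. Kask colluding", 0.73),
    Hypothesis("H3", "Legitimate delivery", 0.06),
]

ranked = rank(hypotheses)
leading = ranked[0]
# The leading hypothesis is the next thing to test, not a conclusion.
print(f"Test next: {leading.label} ({leading.description})")
```

Keeping the weaker hypotheses in the list, rather than discarding them, is what lets the ranking change when new evidence arrives.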

The Logic of Investigation

Type | Formula | Certainty
Deduction | Rule + Case = Result | Certain
Induction | Cases = Rule | Probabilistic
Abduction | Result + Rule = Case | Creative / Plausible

Expert reasoning in policing is primarily abductive: guessing the cause from the effects.

The AI Performance Gap

AI models struggle with abduction. On the ART benchmark, the best baseline model scored ~69% vs ~91% for humans (Bhagavatula et al., 2020).

Implication: Full automation of the "conclusion" phase is not possible. The AI generates hypotheses, but the human must select the best one.

TacitFlow Implementation (Phase 3)
  • Current State: Manual abductive reasoning by analysts.
  • Planned: AI generates 3-5 competing hypotheses (e.g., "Collusion" vs "Coercion") and suggests discriminating evidence.
  • Goal: Support the analyst's "Satisficing" process by surfacing relevant precedents.
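
One way to read "suggests discriminating evidence" is: prefer the check whose outcome the competing hypotheses disagree on most. The sketch below is an assumption about how that could be scored, not the planned TacitFlow design; the `predictions` table is invented for illustration, and only the confidence values come from the earlier slide.

```python
from typing import Dict

# confidence per hypothesis (values from the warehouse example)
confidence: Dict[str, float] = {"H1": 0.73, "H2": 0.21, "H3": 0.06}

# predictions[check][hypothesis]: would this hypothesis expect a positive result?
# Invented values, for illustration only.
predictions: Dict[str, Dict[str, bool]] = {
    "Tamm's badge used 02:00-04:00": {"H1": True, "H2": False, "H3": False},
    "Van 771-BKV registered to a courier": {"H1": False, "H2": False, "H3": True},
    "Warehouse alarm triggered": {"H1": True, "H2": True, "H3": True},
}


def split_score(expected: Dict[str, bool]) -> float:
    """Confidence mass on the smaller side of the split.

    A score of 0 means every hypothesis predicts the same outcome,
    so the check cannot tell them apart.
    """
    positive = sum(confidence[h] for h, v in expected.items() if v)
    negative = sum(confidence[h] for h, v in expected.items() if not v)
    return min(positive, negative)


best_check = max(predictions, key=lambda check: split_score(predictions[check]))
print(best_check)
```

With these invented predictions the badge check scores highest, which is consistent with the "verify Tamm's access logs" next step in the example; the alarm check scores 0 because all three hypotheses expect it.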
Inference to the Best Explanation

The modern name for abduction. Given surprising observations, generate hypotheses that would explain them, then select the best based on explanatory virtues (simplicity, scope).

Satisficing

Herbert Simon (1956). Accepting a solution that is "good enough" rather than optimal. Experts satisfice by recognizing situations quickly. TacitFlow supports this by surfacing relevant precedents.
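
Satisficing differs from optimising in that the search stops at the first acceptable option. A minimal sketch, assuming precedents arrive as (name, similarity) pairs and that a fixed threshold defines "good enough"; the function name, data, and threshold are hypothetical.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def satisfice(options: Iterable[T], score: Callable[[T], float], threshold: float) -> Optional[T]:
    """Return the first option that is good enough, without scoring the rest."""
    for option in options:
        if score(option) >= threshold:
            return option
    return None  # nothing met the threshold


# Invented precedents with similarity scores, for illustration only.
precedents = [("case-A", 0.40), ("case-B", 0.80), ("case-C", 0.95)]
chosen = satisfice(precedents, score=lambda p: p[1], threshold=0.75)
print(chosen)
```

Here `case-B` is returned even though `case-C` scores higher, because the search ends as soon as the threshold is met.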

ACL Findings 2025

The RECV benchmark decomposes 1,500 claims into deductive vs abductive atoms. Deductive items stay solvable, but every model craters on abductive rows (Dougrez-Lewis et al., 2025).

Presenter Notes
  • Sherlock Holmes Logic: Holmes didn't deduce; he abducted. He guessed the best explanation.
  • The Gap: AI is great at math (deduction) and patterns (induction), but terrible at creative guessing (abduction).
  • Human Role: This is why the human is essential. The AI proposes; the human decides.

Adversarial Debate Model

"One agent proposes a hypothesis; another's only job is to find flaws. This stress-tests theories and avoids confirmation bias."

A1
Proposer: "Hypothesis: It's Tamm. He was at the scene."
A2
Critic: "Flaw found: Alibi is unverified. Camera 2 is empty."
Outcome: New Task → Verify Alibi

Single-Agent Reasoning

1. Agent generates hypothesis

2. Agent evaluates own hypothesis

3. Agent confirms own reasoning

4. THEN hypothesis accepted

Problem: Confirmation bias.

Adversarial Debate (TacitFlow)

Proposer (Agent 1):

"It's Tamm. He was at the scene."

Critic (Agent 2):

Flaw 1: Alibi is unverified

Flaw 2: Camera 2 shows nothing

Flaw 3: Motive unclear

Outcome:

New Task: Verify Alibi → Test hypothesis.
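
The proposer/critic exchange above can be sketched as one debate round that turns every flaw into a follow-up task. Both agents are plain stand-in functions here (a real system would call a model), and all names and evidence values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Hypothesis:
    statement: str
    evidence: Dict[str, bool]  # evidence item -> has it been verified?


def propose() -> Hypothesis:
    """Stand-in for the Proposer agent."""
    return Hypothesis(
        "It's Tamm. He was at the scene.",
        {"presence at scene": True, "alibi": False, "motive": False},
    )


def criticise(hypothesis: Hypothesis) -> List[str]:
    """Stand-in for the Critic agent: its only job is to list weaknesses."""
    return [item for item, verified in hypothesis.evidence.items() if not verified]


def debate_round(hypothesis: Hypothesis) -> Tuple[List[str], List[str]]:
    """Run one round and convert each flaw into a verification task."""
    flaws = criticise(hypothesis)
    tasks = [f"Verify {item}" for item in flaws]
    return flaws, tasks


hypothesis = propose()
flaws, tasks = debate_round(hypothesis)
print(tasks)
```

The round never accepts or rejects the hypothesis itself; it only produces tasks, which keeps the decision with the human analyst.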

TacitFlow Implementation (Roadmap: Phase 3)

  • Current State (Pilot): Single-agent reasoning with human oversight; manual critique by analysts.
  • Planned Enhancement: Dual-agent debate where Critic is incentivized solely to find logical flaws.
  • Basis: "Debate" models (Irving et al., 2018) and "Reflexion" (Shinn et al., 2023) self-correction.
Cognitive Rationale

Goal: Prevent "groupthink" and confirmation bias (Nickerson, 1998).

Mechanism: Dual-agent debate forces explicit consideration of disconfirming evidence (Irving et al., 2018).

IJCAI Logical Reasoning Survey

IJCAI 2025's survey splits reasoning gaps into logical QA vs logical consistency. Solver-based pipelines need NL→symbolic translators plus SAT/FOL tooling yet drop facts. Adversarial debate sidesteps this by keeping reasoning in natural language. (Cheng et al., 2025)

Presenter Notes
  • This is "Future Work" but critical for credibility.
  • Admit that LLMs are prone to "sycophancy" (agreeing with the user).
  • The "Critic" agent is the solution to this.

TacitFlow Alternatives

Why a custom airgapped solution? Evaluating TacitFlow against market alternatives.

Dimension | Palantir Gotham | IBM i2 Analyst | ChatGPT/Claude | TacitFlow

Cost & Licensing
Pricing Model | Per-user annual (€50k+ / analyst) | Perpetual + maint (€25k + 20%) | Metered API (Variable) | Tiered models (€36k infra)
Lock-in Risk | High (Proprietary fmt) | Medium (Some export) | High (Cloud-only) | Low (Open Standards)

Sovereignty & Security
Data Sovereignty | Configurable (On-prem costly) | Full control (On-prem) | None (US Cloud) | Air-gapped (100% Sovereign)
LED Compliance | Possible (Audit needed) | Possible (Manual) | Non-compliant (Data export) | Native (By Design)

Knowledge Management
Tacit Knowledge | No (Explicit only) | No (Visual only) | No (Stateless) | Core Feature (EASCI Framework)

Market Analysis
  • Lock-in: Proprietary formats hold data hostage. TacitFlow uses open W3C standards.
  • vs Palantir: Palantir is for explicit data fusion, not tacit reasoning. Cost-prohibitive for small agencies.
  • vs ChatGPT: Public LLMs violate Data Sovereignty and LED compliance.
Unique Value Proposition
  • Sovereign: Air-gapped & On-premise.
  • Specialized: Built for Tacit Knowledge.
  • Predictable: Fixed hardware cost.
  • Compliant: Automated "Smart Forgetting".

Architecture

Internal components and AI agent orchestration.

C4Context
  title System Context Diagram for AI Investigator
  Person(investigator, "Investigator", "Analyst, Officer, Manager")
  Enterprise_Boundary(b0, "Agency Boundary") {
    System(ai_system, "AI Investigator", "RAG analysis, hypothesis generation, knowledge capture")
  }
  System_Ext(data_sources, "Data Sources", "Banks, Registries, Logs")
  System_Ext(integrations, "EU Integrations", "Europol, Interpol, JITs")
  Rel(investigator, ai_system, "Queries & Reviews")
  Rel(ai_system, data_sources, "Ingests data")
  Rel(ai_system, integrations, "Shares patterns")
  UpdateLayoutConfig($c4ShapeInRow="4", $c4BoundaryInRow="1")
  UpdateRelStyle(investigator, ai_system, $textColor="blue", $lineColor="blue")

Division of Labour

Different agents handle distinct tasks (retrieval and synthesis vs reasoning). The Evaluation Engine continuously monitors output quality and verifiability.


AI in Daily Work

AI is used in real investigations.

Practical Benefits Are Measurable

Evaluating whether the solution actually makes work faster and clearer.

The Solution Fits Real Work Conditions

Legal, security, and operational requirements are accounted for from the start.

AI Is Used Safely

Clear rules define how and for what purposes AI may be used.