AI Nearest Future
AI for Investigators
Improving knowledge transfer and operational efficiency in the investigation agency
Iren Irbe
PhD researcher in applied informatics, Tallinn University
Head of Unit, Investigations Department, Tax and Customs Board of Estonia
The Problem
Investigations are becoming cognitively overwhelming.
Data Overload
Modern investigations generate huge amounts of data (often over 1 TB per case). Current tools do not adequately support information processing.
Fragmented Information
Information is spread across multiple systems and different formats.
Time Pressure
Investigators must connect facts under tight deadlines.
Undocumented Experience
Important work-related experience and decision rationale often remain only in people's heads.
Key Insight
Investigators spend too much time managing information and too little time analysing it.
Information Overload (Miller, 1956)
Miller's research (1956) established that human working memory can hold approximately 7±2 items simultaneously. When data volume exceeds this cognitive capacity, processing degrades. Modern intelligence environments routinely exceed these limits, creating the "cognitive bottleneck" that severs the human connection required for tacit knowledge transfer.
Current Investigation Challenges
Slow Investigations
Manual information processing creates bottlenecks and delays case resolution times.
Missed Critical Connections
Higher risk that critical links between evidence, people, and events remain undiscovered.
Inconsistent Reasoning
Different approaches to similar cases result in quality variations.
Legal Vulnerability
Greater risk in situations where decisions cannot be adequately explained or justified later.
Shortcomings of Existing Solutions
Public AI Tools
Not suitable for processing sensitive investigation data. Security and confidentiality requirements prohibit the use of cloud-based AI solutions.
Cloud-Based Systems
Conflict with data protection requirements. Sensitive data must not leave the controlled environment.
Existing Investigation Tools
Focus on document storage, not analysis. They store information but do not support reasoning or establishing connections.
Human Analysis Alone
Does not scale with growing data volumes. Human cognitive limits are real.
Project as Applied Research Output
Based on research on tacit knowledge in high-stress work.
Focus
- Decision-making under overload
- Explaining reasoning
- Knowledge loss when people leave
Solution
- Practical tool based on research
- Legally and organisationally compliant
Approach
- Start from real work practices, not technology
Tacit Knowledge
Experiential knowledge that is hard to put into words, but manifests in skilled performance — intuition, pattern recognition, an "eye" for situations.
Proposed Solution — AI Investigator (Concept)
A secure AI assistant for investigators
A local, air-gapped AI solution that helps investigators organise information gathered through voice and text, think through possible explanations, and preserve important work-related knowledge — all while ensuring data never leaves the investigation agency and all activities comply with the law.
The system analyses previous cases, including audio recordings and notes, identifies recurring patterns, and helps consider possible future developments and preventive actions based on them.
Core Principles
Secure and Internal
- Runs entirely on MTA (Estonian Tax and Customs Board) infrastructure
- No data is transmitted to cloud services or outside the agency
- Data use and sharing comply with applicable legal restrictions (GDPR, LED, EU AI Act)
Verifiable and Transparent
- All system responses are based on specific and verifiable source documents
- The system clearly shows what conclusions and connections are based on
- High-risk features (e.g., emotion recognition) are deliberately excluded
Organises and Analyses Past Events
- Consolidates past cases, events, and evidence
- Helps identify recurring behavioural and activity patterns
- Supports thinking through risks and possible future developments
Supports Thinking, Not Prediction
- Offers possible explanations and scenarios, not definitive predictions
- Helps consider different behavioural patterns and their impact
- Does not draw conclusions or make decisions on behalf of humans
Human Is Always the Decision-Maker
- All system outputs require human confirmation
- The investigator decides which patterns and scenarios to consider
- Responsibility always remains with the investigator
Institutional Memory and Compliance
- Helps preserve recurring patterns and lessons learned so they do not disappear with departing staff
- Supports compliance with EU AI Act requirements, including preparing Fundamental Rights Impact Assessments (FRIA)
AI Support for Investigations
Investigation Impact
- Faster understanding of complex cases
- Reduced cognitive load
- Clear, traceable reasoning (audit / court-ready)
Investigation Functions
What it does for the user
- Summarises large case files
- Finds connections (people, events, transactions)
- Supports hypothesis testing
- Preserves investigation logic
AI Foundations
How it is technically enabled
- Processes large volumes of text and audio (incl. transcription)
- Combines data from multiple sources (documents, emails, interviews)
- Detects patterns and links across data
- Handles foreign languages
- Runs fully inside the agency (secure, no external data sharing)
Practical Roles
AI Assistant
For Field Officers
- Voice Control: hands-free queries in Estonian and English
- Quick Summaries: e.g., "Summarise the last reports on suspect X"
- Procedural Support: quick access to guidelines and protocols during work
AI Analyst
For Investigation Units
- Hypothesis Generation: helps think through different scenarios
- Impact Assessment: e.g., what may happen when certain measures are applied
- Critical View: draws attention to possible bias or missed connections
Audio Secretary
For Investigators and Support Staff
- Auto-Transcription: local transcription of meetings and conversations
- Action Item Extraction: highlights action points from voice notes
- Interview Support: helps spot inconsistencies in statements
AML Support
For Financial Crime Investigations
- Complex Financial Flow Analysis: including new digital assets
- Cross-Border Pattern Detection: helps see connections between different countries
- UBO Unravelling: simple visualisation of ownership chains
Project Activities and Legal Assurance
| Activity | Description |
|---|---|
| Project Preparation and Management | Project coordination and scope definition. |
| Solution and Infrastructure | On-premise server cluster, required hardware, and secure infrastructure for the solution. |
| Legal and Ethical Validation | Compliance with GDPR, LED, and EU AI Act (incl. FRIA). |
| Real-World Deployment | Using the solution in controlled conditions with investigators. |
| Training and User Support | User training and support during the development phase. |
| Measurement and Evaluation | Analysis of project outcomes and decision on further scaling. |
Data Processing Logic and Protection
Data Flow Step by Step
| Step | Stage | Description |
|---|---|---|
| 1 | Ingestion | Investigation-related materials are loaded into the system: documents, audio files, pictures, and structured data. |
| 2 | Indexing | Data is made searchable and interconnected for easy analysis. |
| 3 | Analysis | The system helps summarise information and surface connections, based solely on existing data. |
| 4 | Human Verification | All results are reviewed by the investigator and confirmed or corrected. |
| 5 | Output | Overviews, summaries, and visual views are generated to support the investigation. |
| 6 | Smart Forgetting | The system ensures data is kept only as long as the law permits. When the retention period expires, data is automatically archived or deleted — with a clear justification. |
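The human verification gate between steps 3 and 5 can be sketched as follows. This is an illustration only, not the project's implementation: the class and function names (`ReviewStatus`, `AnalysisResult`, `review`, `releasable`) and the `KO-…` identifiers are assumptions introduced for the example.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewStatus(Enum):
    PENDING = "pending"      # produced by the system, not yet reviewed
    CONFIRMED = "confirmed"  # investigator accepted the result as-is
    CORRECTED = "corrected"  # investigator edited the result
    REJECTED = "rejected"    # investigator discarded the result


@dataclass
class AnalysisResult:
    text: str
    source_ids: list[str]  # hypothetical IDs of the source records
    status: ReviewStatus = ReviewStatus.PENDING


def review(result, decision, corrected_text=None):
    """Step 4: record the investigator's decision on one result."""
    if decision is ReviewStatus.PENDING:
        raise ValueError("a review must confirm, correct, or reject")
    if decision is ReviewStatus.CORRECTED:
        if not corrected_text:
            raise ValueError("a correction needs the corrected text")
        result.text = corrected_text
    result.status = decision
    return result


def releasable(results):
    """Step 5: only human-reviewed results reach the output stage."""
    allowed = {ReviewStatus.CONFIRMED, ReviewStatus.CORRECTED}
    return [r for r in results if r.status in allowed]
```

Anything still `PENDING` or `REJECTED` never reaches an overview or summary, which mirrors the rule that all results are confirmed or corrected by the investigator first.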
Supported Data Types
- Documents: PDFs, Word and Excel files, emails
- Audio and Video: recordings and their transcribed versions
- Structured Data: bank transactions, registry data
- Cooperation Channels: Europol, Interpol, and other authorised channels
How Data Protection Is Ensured
- Data remains entirely within the organisation
- All data is encrypted
- The system maintains precise records of where information comes from and how it was used
- Data usage is always controllable and auditable
Legal Compliance
Data processing complies with law enforcement requirements:
- Data is used only for a specific purpose
- Only necessary information is collected
- Access is strictly limited
Retention Periods and Deletion
Every data type has a legally mandated lifecycle. The system enforces these automatically.
| Data Type | Retention Period | Legal Basis | Exception Possible? |
|---|---|---|---|
| Criminal Case File (evidence, protocols) | 10 yrs (general); 15 yrs (1st-degree crimes); permanent (crimes against humanity) | KrMS § 209 lg 2; VVm § 6 lg 1–4 | Yes, if archival value (ArhS § 2 §§ 3–4, § 8; VVm § 6 § 5) |
| Surveillance File (wiretaps, surveillance) | Up to 50 years | KrMS § 126¹² lg 3 | No (CPC § 126¹² § 3, strict limit) |
| Court File (hearing protocols, decisions) | 10 yrs (after entry into force) | KrMS § 160¹ lg 6–7 | Yes, if archival value (ArhS § 2 §§ 3–4; § 8 § 2) |
| DNA / Fingerprint Data | Until criminal record deletion | KrMS § 206 lg 4 | Yes, upon acquittal (CPC § 206 § 4, immediate deletion) |
| AI Reasoning Logs (provenance, decisions) | Min 6 months; technical docs 10 yrs after market placement | EU AI Act Art. 12 lg 1, Art. 18 lg 1, Art. 19 lg 1 | Yes, extendable on dispute (LED Art. 16 lg 3(a)(b); GDPR Art. 18 lg 1) |
| Personal Data in Case File | Until purpose is fulfilled | IKS § 17; LED Art. 4(1)(e), Art. 5 | No, except as prescribed by law (IKS § 17 lg 2; § 25 lg 3–4) |
| Pattern Database Entries (anonymous modus operandi) | Indefinite (no personal data) | LED Art. 4(1)(c) | Not applicable (anonymous data; GDPR does not apply) |
| Audit Log (who, when, what) | At least 3 years | IKS § 36; LED Art. 25(2) | Yes, extendable (IKS § 36 lg 5; E-ITS/ISKE security class) |
| Protocols and Recordings (interrogations, observations) | With case file (10–15 yrs) | KrMS § 146, § 148, § 150 lg 4 | Yes, with case file (VVm § 6, follows main document) |
| Public Information (under PIA) | 5–50 years (by document type) | AvTS § 42 | Yes, if archival value (ArhS § 2 § 3; § 8 §§ 1–2) |
Automatic Deletion
The system notifies the investigator 30 days before the deadline. If no exception is requested, data is deleted automatically.
This prevents situations where investigators keep data "just in case", out of fear that it might be needed later.
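The notification and deletion rule can be sketched as a small date calculation. The 30-day notice comes from the text above; everything else (function names, the returned labels, and the simplification that an expired record is deleted rather than archived) is an assumption made for the example.

```python
from datetime import date, timedelta

NOTICE_DAYS = 30  # from the text: notify 30 days before the deadline


def add_years(d, years):
    """Shift a date by whole years; 29 Feb falls back to 28 Feb."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retention_action(start, retention_years, today, exception_granted=False):
    """Return 'retain', 'notify', 'delete', or 'hold' for one record."""
    deadline = add_years(start, retention_years)
    if today >= deadline:
        # A logged, justified exception keeps the record past the deadline.
        return "hold" if exception_granted else "delete"
    if today >= deadline - timedelta(days=NOTICE_DAYS):
        return "notify"
    return "retain"
```

For a record with a 10-year period starting on 1 March 2015, the function returns `notify` from 30 January 2025 and `delete` from 1 March 2025 unless an exception has been granted.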
Requesting Exceptions
Extended retention requires a justified request. The system logs all exceptions and their justifications.
Example: archival-value documents, pending challenges, international cooperation.
LED vs GDPR
Law Enforcement Directive (LED) — Directive 2016/680 — is the primary legal act for processing personal data by law enforcement agencies.
GDPR applies additionally when data is processed outside criminal investigations (e.g., administrative cases).
Retention Period Details
Criminal Case File: VVm RT I, 02.09.2011, 5 § 6: periods depend on severity. Permanent retention for genocide, crimes against humanity, etc.
Surveillance File: CPC § 126¹² § 3: upon conviction until criminal record deletion, max 50 yrs; upon acquittal up to 5 yrs; upon case closure also up to 5 yrs.
AI Logs: AI Act Art. 12: automatically generated logs min 6 months. Art. 18: technical documentation 10 years after market placement.
Personal Data: PDPA § 17: data retained until purpose is achieved. LED Art. 5: regular review and deletion as needed.
Audit Log: PDPA § 36: log data automatically recorded. LED Art. 25: logs must enable identification of the sender, recipient, and timing of data.
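A minimal sketch of a who/when/what audit log follows. The operation names mirror the logging categories this deck lists for PDPA § 36 (collection, modification, reading, transmission, combination, deletion). The hash chaining used for tamper evidence is an assumption added for illustration and is not described in the source; `AuditLog` and its methods are hypothetical names.

```python
import hashlib
import json
from datetime import datetime, timezone

OPERATIONS = {"collection", "modification", "reading",
              "transmission", "combination", "deletion"}


class AuditLog:
    """Append-only log; each entry stores the hash of the previous one."""

    def __init__(self):
        self.entries = []

    def record(self, user, operation, object_id, when=None):
        if operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {operation}")
        entry = {
            "who": user,
            "when": (when or datetime.now(timezone.utc)).isoformat(),
            "what": operation,
            "object": object_id,
            "prev": self.entries[-1]["hash"] if self.entries else "",
        }
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)
        return entry

    @staticmethod
    def _digest(entry):
        # Hash every field except the hash itself, in a stable key order.
        body = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def verify(self):
        """True if no entry was altered or removed from the middle."""
        prev = ""
        for e in self.entries:
            if e["prev"] != prev or e["hash"] != self._digest(e):
                return False
            prev = e["hash"]
        return True
```

Because each entry commits to the one before it, editing or dropping an earlier entry makes `verify()` fail, which is one simple way to keep data usage auditable.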
Data Lifecycle and System Context
How data moves within an air-gapped environment and how traceability and deletion are ensured.
Data Flow in an Air-Gapped System
What Data Is Collected and Where Does It Go?
| Data Type | Input Method | Storage | Deletion |
|---|---|---|---|
| Device Data (phones, computers) | Via specialised software → USB → upload | Case File Storage, isolated | On case file closure or retention period expiry |
| Registry Queries (banks, databases) | Manual confirmation before each query | Case File Storage + provenance log | Automatic expiration control |
| Court Decisions, Orders | USB or manual upload | Case File Storage | Per retention requirements |
| General Documents (laws, procedures) | By the Administrator | Separate shared knowledge base | Version-control based |
| AI Outputs (summaries, hypotheses) | Generated by the system | Within the case file + provenance graph | With the case file |
Frequently Asked Questions About Data
| Question | Answer |
|---|---|
| Do I need to re-upload data when returning to a case file? | No. All data loaded into a case file is retained until the file is closed or deleted. The investigator can immediately continue where they left off. |
| How does AI learn from case file documents? | AI does not train on user data. A pre-trained model is used. Only anonymous modus operandi is extracted from documents into the pattern database. |
| Does a "superdatabase" emerge where everything is cross-queryable? | No. Each case file is fully isolated. Cross-searching between case files is technically impossible. The pattern database contains only anonymous information. |
| Who deletes data and when? | The system tracks retention deadlines automatically. The investigator sets exceptions. On export, deletion times are set automatically (LED, AI Act requirements). |
| How can data be recovered after deletion? | The provenance graph shows where data originated. If needed, it can be re-queried from sources (if the source still permits). |
| How to export case file documents? | Reports + documents are exported to a secure archive. Automatic deletion times are set according to document type and legal requirements. |
FAQ
- Where is data stored? → Only on agency servers, within case files
- Where does it go? → Deleted per retention deadlines
- How does AI learn? → Uses pre-trained model; does not train on user data
- Do case files cross? → No, each case file is fully isolated
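Case file isolation can be illustrated at the API level with a store that only ever searches one case file at a time. This is a simplified sketch under stated assumptions: `CaseFileStore`, its methods, and the `user_cases` access list are hypothetical, and real isolation would also be enforced at the storage and infrastructure level.

```python
class CaseFileStore:
    """Keeps one independent document index per case file."""

    def __init__(self):
        self._cases = {}  # case_id -> {doc_id: text}

    def add(self, case_id, doc_id, text):
        self._cases.setdefault(case_id, {})[doc_id] = text

    def search(self, case_id, term, user_cases):
        """Search a single case file the user is assigned to."""
        if case_id not in user_cases:
            raise PermissionError(f"no access to case {case_id}")
        docs = self._cases.get(case_id, {})
        needle = term.lower()
        return sorted(d for d, t in docs.items() if needle in t.lower())
```

The interface deliberately offers no method that iterates over all case files, so a query can never return documents from a case other than the one named in the call.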
Air-Gapped Workflow
Current practice: data is copied from "black computers" to USB → transferred to analysis network → stored in regional storage (up to 100 TB). Access is via RDP; printers are allowed, disks are not.
Provenance Tracking
Provenance = data history (where it came from and how it got here).
- Source (e.g., which system, document, interview)
- When it was created or retrieved
- What transformations were applied (e.g., summarised, translated)
- Links to original records
The provenance graph shows the full path of the data — so you can trace it back and re-query the original source if needed.
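The four provenance elements listed above map naturally onto a small record type, and tracing back is a graph walk. The sketch below is a plain-Python stand-in, not PROV-O itself; `ProvenanceNode`, `trace_to_origin`, and the node IDs are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ProvenanceNode:
    node_id: str
    source: str                       # e.g. system, document, interview
    retrieved_at: datetime            # when it was created or retrieved
    transformation: str = "original"  # e.g. "summarised", "translated"
    derived_from: list[str] = field(default_factory=list)  # links to inputs


def trace_to_origin(node_id, graph):
    """Walk derived_from links back to the original records."""
    origins, stack, seen = [], [node_id], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        node = graph[current]
        if node.derived_from:
            stack.extend(node.derived_from)
        else:
            origins.append(node.node_id)  # no inputs: this is an original
    return sorted(origins)
```

Given a summary derived from a transcript and a bank record, `trace_to_origin` returns the IDs of the original recording and the bank record, which are the sources that would be re-queried.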
Metadata vs Provenance
Metadata = data about the data (descriptive labels).
- File type, author, date
- Keywords, tags
- Case ID, document category
Metadata helps organise and find data, but it does not explain its origin or processing history.
Interoperability
Built-in support for ISO 20022 and JIT workspaces for cross-border cooperation (FUNC-160, FUNC-162).
AI Usage Restrictions
| Situation | Reason | Action |
|---|---|---|
| Classified Materials (state secrets, NATO) | Air-gapped environment is necessary but not sufficient. Separate accreditation to the relevant classification level is required (ISKE, NATO security class). AI model assessment for classified processing. | Separate accredited environment, or manual processing |
| Source Protection Cases (informants) | Source identities must be protected. | Source identities must not reach any log or pattern database. |
| Court Prohibition (specific order) | A court may prohibit automated processing in a specific case. | AI functions are blocked at the case file level. |
| Data Subject Objection (GDPR Art. 21, where applicable) | In administrative cases, the data subject may object to profiling. | Manual review; AI output does not affect the decision. |
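The restrictions in the table can be read as a gate evaluated per case file before any AI function runs. The sketch below is one possible encoding: the flag names, the returned mode labels, and the deliberately naive substring-based scrubbing are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class CaseFlags:
    classified: bool = False         # state secrets, NATO material
    accredited_env: bool = False     # environment accredited for that level
    court_prohibition: bool = False  # court barred automated processing
    subject_objection: bool = False  # data subject objected to profiling


def ai_mode(flags):
    """Return 'blocked', 'manual_review', or 'enabled' for a case file."""
    if flags.court_prohibition:
        return "blocked"
    if flags.classified and not flags.accredited_env:
        return "blocked"  # fall back to manual processing
    if flags.subject_objection:
        return "manual_review"  # AI output must not affect the decision
    return "enabled"


def scrub_source_identities(log_fields, protected_names):
    """Drop any log field whose value mentions a protected source."""
    lowered = [n.lower() for n in protected_names]
    return {k: v for k, v in log_fields.items()
            if not any(n in str(v).lower() for n in lowered)}
```

Checking the court prohibition first means the strictest rule wins when several flags are set on the same case file.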
Live Demo: AI Summarisation Module
The AI ingests a stream of structured Knowledge Objects (KOs), representing different evidence sources, and synthesises them into a coherent, readable summary.
"At 02:35, silent alarm at Central Data Facility. Rear door unsecured. Guard J. Kask found unconscious".
"Camera 04 captures Blue Van (771-BKV) departing at 02:15. Driver unidentifiable. Logs 02:00–02:30 deleted".
"Suspect A. Tamm (Owner 771-BKV) claims alibi: 'Night Market 22:00–03:00'. Status: UNVERIFIED".
"USB Drive (Ev-001) recovered near rack 14. Contains encrypted partition. Traces of 'DarkSide' ransomware signature."
"Guard J. Kask blood sample positive for Zolpidem (sedative). Dosage consistent with forced ingestion approx 01:30."
"Vehicle 771-BKV detected by camera #442 (Pärnu Hwy) heading South at 02:45. Speed: 110km/h."
"Market vendor M. Tamm (no relation) states stall #42 was closed at 22:00. Contradicts Suspect A's alibi."
"Wallet 0x7a...f2 linked to A. Tamm received 2.5 BTC at 03:15. Sender wallet flagged as 'DarkSide Affiliate'."
"A. Tamm: Prior conviction (2021) for cyber-facilitated fraud. Known associate of 'The Broker' (Suspect B)."
"Firewall alert 02:10: Outbound SSH connection to IP 185.x.x.x (Moldova). 4.2GB data exfiltrated."
"Latent print lifted from Server Rack 14 handle. Match: A. Tamm (99.9% confidence)."
"Patrol unit reports individual matching description of 'The Broker' entering vehicle 771-BKV at 01:45."
"Post on 'BreachForums' at 03:30: 'Fresh gov database for sale. Estonia origin.' User: 'SilentNight'."
"Vehicle 771-BKV intercepted at 04:00. Laptop (Ev-002) found under passenger seat. Driver A. Tamm detained."
"Ev-002 contains SSH keys matching Central Data Facility server. Browser history shows access to 'BreachForums'."
"Suspect B ('The Broker') apprehended at safehouse. Confirms A. Tamm was hired for physical access."
Output (Mistral 7B)
Provenance Trace (PROV-O)
System Specs
Model: Mistral 7B (Ollama)
Input: JSON-LD Stream
Context: 8k Tokens
Mode: Air-gapped (Offline)
Why Summarize KOs?
When the AI summarises structured KOs (rather than arbitrary free text), the risk of hallucination decreases because the summary must rely on the "facts" already recorded in the knowledge graph.
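One way to make this grounding checkable is to number the KOs in the prompt and then verify that every summary line cites an existing KO. The sketch below does not call a model; the `KO-<n>` ID format, the one-sentence-per-line convention, and both function names are assumptions introduced for the example.

```python
import re


def build_prompt(kos):
    """Render KOs as labelled facts and ask for cited sentences only."""
    lines = [f"[{ko['id']}] {ko['text']}" for ko in kos]
    return ("Summarise using only the facts below. Write one sentence per "
            "line and end each with the IDs it relies on, e.g. [KO-1].\n"
            + "\n".join(lines))


def check_citations(summary, kos):
    """Return (unknown_ids, uncited_lines) for a generated summary."""
    known = {ko["id"] for ko in kos}
    unknown, uncited = set(), []
    for line in summary.splitlines():
        line = line.strip()
        if not line:
            continue
        cited = set(re.findall(r"\[(KO-\d+)\]", line))
        if not cited:
            uncited.append(line)   # claim with no source at all
        unknown |= cited - known   # claim citing a KO that does not exist
    return sorted(unknown), uncited
```

A summary that passes this check is not guaranteed to be correct, but any line that fails it can be flagged for the investigator before it is shown as grounded.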
Assistant Desktop
Interactive prototype of the investigation workspace.
This prototype demonstrates the multi-widget dashboard. Use the sidebar to switch between views (Home, Dashboard, Documents, Analysis, Chat) and select a role from the dropdown to see role-specific configurations.
Widget-Based UI
The dashboard adapts to the user's role. The investigator sees graphs and timelines; the analyst sees scenarios and hypotheses.
Roles
Select a role from the dropdown to see how the interface adapts: Investigator, Analyst, Supervisor, Prosecutor, Auditor, AML Specialist, Audio Secretary.
Interface
Interactive prototype of the voice-first assistant designed for high-stress environments.
Key Features
1. Voice-First Interaction
Prioritizing voice lowers the cognitive barrier for articulating tacit knowledge, encouraging storytelling and in-the-moment narration.
2. Conversational Externalization
The AI acts as a Socratic partner, using "Intuition Pumps" to elicit hidden assumptions during the conversation.
3. Groundedness (GraphRAG)
Every answer is anchored in the Knowledge Graph. The UI explicitly links generated insights back to their source KOs.
4. Context-Aware Adaptation
Adapts interface and suggestions based on the user's current role and location.
5. EASCI Integration
Seamlessly bridges the gap between capturing raw Experience and Articulating it into structured knowledge.
Try it: Click the microphone icon in the prototype to simulate a voice capture session.
Cognitive Load Theory
Sweller (1988). Working memory is limited. In high-stress situations, the cognitive load of typing (visual-motor) competes with the task. Voice (auditory-verbal) uses a separate channel, reducing interference.
Socratic Method
The AI doesn't just record; it asks "Why?". "Why did you check the trunk first?" This forces the expert to make their implicit reasoning explicit.
Voice Efficiency
Speaking is roughly three to four times faster than typing (about 150 wpm vs 40 wpm, a ratio of 3.75). In high-stress environments, typing is a friction point that prevents knowledge capture.
Presenter Notes
- Interactive Demo: This isn't a screenshot. It's the actual code running in an iframe.
- Why Voice? It's not just convenience. It's about cognitive load. Police officers can't type while assessing a threat.
- Socratic Partner: Emphasize that the AI is active, not passive. It probes for details.
- EASCI Integration: This is the "E" (Experience) and "A" (Articulation) part of the loop happening in real-time.
The Intelligence Cycle
Making AI Reasoning Auditable and Explainable
Left panel shows the three-stage micro loop: Retrieve (via GraphRAG), Reason (via Graph-of-Thought), and Synthesize (via PROV-O). Right panel displays the macro loop (1→2→3→4→5) with live telemetry: PROV nodes created, confidence scores, and token consumption. The simulation demonstrates how tacit knowledge flows from raw experience through AI-mediated articulation to structured, reusable knowledge objects. Press "Run" to start the simulation. Coloured nodes represent knowledge objects at different lifecycle stages.
Theoretical Backbone
This framework underpins the system's operational logic. Each stage maps to specific modules and AI agent tasks in the architecture.
Why This Cycle?
Traditional models say tacit knowledge can be made explicit but often skip the "how". This framework splits the process into two distinct actions: first explaining the reasoning (Articulation), then organising it (Structuring). This provides a practical way to build AI that assists with each specific cognitive task.
Cycle Stages
1. Experience (Dewey): grounds knowledge in action (learning by doing, not in isolation).
2. Articulation (Dennett, Polanyi): treats explanation as an elicited process requiring guided prompts.
3. Structuring (Peirce): formalises tacit insights through abductive reasoning.
4. Consolidation (Weick): requires community sensemaking before knowledge enters the canon.
5. Innovation (Whitehead): ensures knowledge remains dynamic, allowing pruning and refinement.
Dynamic, Not Linear
Traditional models present a linear spiral. Investigation is actually a complex adaptive system with multiple feedback loops: new knowledge can trigger new Experience; Consolidation can require return to Articulation.
Why Five Stages?
Each stage transforms knowledge through specific cognitive mechanisms. Unlike four-stage models, this separates Articulation from Structuring because they require different AI interventions: dialogue-based elicitation vs. abductive formalisation.
Interactive Demo: Mobile + Desktop
Collaborative hypothesis workflow — from field officer to analyst in real time.
Field Capture
Officers use voice to capture observations during patrol, interviews, or inspections. The mobile interface prioritises speed and minimal cognitive load.
Deep Analysis
Analysts access the full knowledge graph, entity relationships, and reasoning chains through the desktop platform's multi-widget layout.
Mobile App
Field officers use voice input to quickly capture evidence and observations at the scene.
Desktop App
The analyst sees the knowledge graph updating in real time and can immediately work with new evidence.
Synchronisation
Information captured on mobile appears instantly on the desktop graph — collaboration without delay.
Data Separation and Legal Compliance
How the system ensures compliance with the Law Enforcement Directive (LED), GDPR, the EU AI Act, and Estonian law
Principle: Data vs Patterns
The system does not retain personal data across cases. Only anonymous modus operandi is preserved — a crime scheme or method that is completely separated from specific individuals and individual cases.
Data Flow and Separation
Dos and Don'ts
| What the System Does NOT Do | What IS Done |
|---|---|
| Cross-case data sharing between case files | Each case file is fully isolated from other case files |
| Storing personal data in the pattern database | Only anonymous and abstract modus operandi is stored |
| Training algorithms on personal or case data | The algorithm uses a pre-trained model |
| Transmitting data to cloud or external servers | Data remains fully under the organisation's control |
Legal Compliance
| Requirement | Implemented Measures and Solutions |
|---|---|
| Data Protection and Privacy | |
| Estonian Constitution § 26 — Privacy of private life | Case-based data isolation; role-based access control (RBAC); all queries are logged for audit trail purposes. |
| Estonian Constitution § 43 — Secrecy of communications | Encrypted communication data storage; access only through court order ID binding; automatic data expiry checks. |
| LED Art. 4 — Lawfulness and fairness | Data is processed solely for law enforcement purposes in a transparent and traceable manner. |
| LED Art. 4(1)(b) — Purpose limitation | Each case file is strictly isolated; data is not used for other purposes. |
| LED Art. 4(1)(c) — Data minimisation | Only abstract schemes are stored in the pattern database; personal data is not retained. |
| Estonian PDPA § 14, § 15 — Processing principles and lawfulness | Built-in purpose limitations; law enforcement processing lawfulness per § 15; automatic data quality validation; encrypted data transmission (TLS 1.3). |
| Estonian PDPA § 20 — Special categories of personal data | Processing of special category data (race, ethnicity, political views, religion, health, biometrics) only in cases prescribed by law; additional security measures. |
| LED Art. 10 — Processing of special categories of data | Special category personal data processing only when strictly necessary; appropriate safeguards; automatic classification and restrictions. |
| Estonian PDPA § 43 — Security measures | Data encryption at rest (AES-256); RBAC-based role management; full audit logging; automatic backup. |
| GDPR Art. 5, 6 (where applicable) | When processing administrative or non-criminal investigation data: data subject consent or legitimate interest; mandatory data subject notification. |
| Human Oversight and Automated Decision-Making | |
| Estonian Constitution § 22 — Presumption of innocence | AI does not make guilt-determining decisions; the system supports the investigator and does not replace court rulings. |
| LED Art. 11 — Automated decision-making | AI does not make automated decisions; all results are confirmed by the investigator; profile-based decisions without human intervention are prohibited. |
| EU AI ACT Art. 14(1)–(4) — Human oversight | The investigator can override, correct, or ignore AI output at any time; the system can be stopped with a "stop" button; the UI displays limitations and capabilities. |
| EU AI ACT Art. 6 — High-risk systems | Completed Fundamental Rights Impact Assessment (FRIA); technical documentation per Article 11; risk assessment log; conformity declaration. |
| Transparency and Right to Explanation | |
| Estonian Constitution § 15 — Right to effective proceedings | AI reasoning chain export in PDF or JSON format; provenance graph enables step-by-step challenge of the decision process. |
| Estonian Constitution § 24 — Right to fair trial | AI outputs are transparent, explainable, and accessible to the defence. |
| Estonian Constitution § 44(3) — Right to access data | Built-in Data Subject Access Request (DSAR) export; personal data query report is generated automatically. |
| EU AI ACT Art. 86 — Right to explanation | Each AI output includes an explanation (XAI); provenance graph displays inputs, inferences, and sources. |
| EU AI ACT Art. 13 — Transparency | The system user guide and UI explain AI capabilities, limitations, and intended use cases. |
| EU AI ACT Art. 50 — User notification | Users are clearly informed they are interacting with AI; outputs are marked as AI-generated. |
| Data Quality and Evidence | |
| LED Art. 7 — Data quality | AI-based conclusions are clearly distinguished from facts; the investigator confirms data accuracy before further use. |
| Estonian CPC § 63 — Concept of evidence | AI is an investigative aid; documents prepared by the investigator based on AI analysis may qualify as evidence within the meaning of § 63 (other document). |
| Estonian CPC § 64 — Conditions for evidence collection | Full traceability is ensured: each AI output references the original source and maintains data integrity. |
| Estonian CPC § 146 — Procedural action protocol | Documents prepared with AI assistance comply with protocol format requirements: date, author, criminal case number, course, and results of the action. |
| Estonian CPC § 150 — Audio and video recording | A report based on AI analysis may rely on material recorded under CPC § 150; recordings are unaltered and added to the case file. |
| Security and Logging | |
| Estonian PDPA § 36 — Logging | Logged: collection, modification, reading, transmission, combination, and deletion. Logs are retained for at least 3 years. |
| LED Art. 25(1) — Logging obligation | Automatically logged: collection, modification, query, disclosure (incl. transmission), combination, deletion. Logs ensure traceability and help detect unauthorised access. |
| E-ITS (ISKE) — Security measures | Compliance with the Estonian information security standard: security class is determined by data confidentiality, integrity, and availability; ISKE catalogue measures are applied. |
| Retention and Deletion | |
| LED Art. 5 — Retention periods | Personal data is retained only as long as necessary for the purpose. The system automatically tracks deadlines and notifies of expiry. |
| EU AI ACT Art. 12 — Log retention | AI system logs are retained for at least 6 months. Provenance graphs and decision logs are exported to archive. |
| Estonian PIA § 12 — Document retention | Documents subject to archiving obligation are exported separately. Automatic deletion times according to document type. |
Glossary of Abbreviations
- RBAC: Role-Based Access Control. Users see only what their role permits.
- TLS 1.3: Transport Layer Security, an encrypted protocol for secure data transmission.
- AES-256: Advanced Encryption Standard, symmetric encryption with a 256-bit key.
- FRIA: Fundamental Rights Impact Assessment, required by the EU AI Act for high-risk systems.
- XAI: Explainable AI. Every AI output includes an understandable justification.
- DSAR: Data Subject Access Request, a query by the data subject about their personal data (GDPR Art. 15).
- Provenance Graph: A visual graph showing data origin and processing history.
Modus Operandi
Latin for "mode of operating". In criminalistics, it refers to a criminal's characteristic behavioural pattern.
Example: "Money is moved through 3 countries using 5 shell companies" — without names, dates, or amounts.
LED vs GDPR
Law Enforcement Directive (LED) — Directive 2016/680 — is the primary legal act for processing personal data by law enforcement agencies.
GDPR applies additionally when data is processed outside criminal investigations (e.g., administrative cases).
Requesting Exceptions
Extended retention requires a justified request. The system logs all exceptions and their justifications.
Example: archival-value documents, pending challenges, international cooperation.
Automatic Deletion
The system notifies the investigator 30 days before the deadline. If no exception is requested, data is deleted automatically.
This prevents investigators from keeping data "just in case", out of fear that it might be needed later.
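The notification and deletion rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's actual implementation: the `Record` class, the `status_on` function, and all field names are hypothetical. Only the 30-day notice window and the rule that a logged, justified exception suspends deletion come from the text above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

NOTICE_WINDOW = timedelta(days=30)  # from the text: notify 30 days before the deadline


@dataclass
class Record:
    """Hypothetical retention record; field names are illustrative only."""
    record_id: str
    deletion_deadline: date
    exception_justification: Optional[str] = None  # logged reason for extended retention


def status_on(record: Record, today: date) -> str:
    """Return the action the system would take for this record on a given day."""
    if record.exception_justification:
        # A justified exception was requested, so automatic deletion is suspended.
        return "retain (exception logged)"
    if today >= record.deletion_deadline:
        return "delete"
    if today >= record.deletion_deadline - NOTICE_WINDOW:
        return "notify investigator"
    return "retain"


if __name__ == "__main__":
    rec = Record("case-001", deletion_deadline=date(2030, 6, 30))
    for day in (date(2030, 5, 1), date(2030, 6, 1), date(2030, 6, 30)):
        print(day, status_on(rec, day))
```

Checking the exception first means a record with a logged justification is never deleted by the schedule alone, which matches the requirement that exceptions are explicit and auditable.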
Retention Period Explanations
Criminal Case File: VVm RT I, 02.09.2011, 5 § 6: periods depend on severity. Permanent retention for genocide, crimes against humanity, etc.
Surveillance File: CPC § 126¹² § 3: upon conviction until criminal record deletion, max 50 yrs; upon acquittal up to 5 yrs; upon case closure also up to 5 yrs.
AI Logs: AI Act Art. 12: automatically generated logs min 6 months. Art. 18: technical documentation 10 years after market placement.
Personal Data: PDPA § 17: data retained until purpose is achieved. LED Art. 5: regular review and deletion as needed.
Audit Log: PDPA § 36: log data automatically recorded. LED Art. 25: logs must enable identification of the sender, recipient, and timing of data.
Abductive Reasoning
"Abduction is the process of forming explanatory hypotheses. It is the only logical operation which introduces any new idea." (C.S. Peirce, 1931)
Linear (Deductive)
Fragile: If one premise fails, the chain breaks.
Branching (Abductive)
Resilient: Survives uncertainty by weighing options.
Linear Reasoning (Deductive)
IF suspect has motive
AND suspect has means
AND suspect at scene
THEN suspect is guilty
Problem: Premises must be certain.
Abductive Reasoning (Detective)
OBSERVATIONS:
- Warehouse breach (02:00-04:00)
- Logs deleted
- Van 771-BKV on camera
HYPOTHESES:
H1: A. Tamm & J. Kask colluding (Confidence: 0.73)
H2: J. Kask victim (Confidence: 0.21)
H3: Legitimate delivery (Confidence: 0.06)
NEXT STEPS:
Verify Tamm's access logs to test H1.
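One way to read the "next step" is as a Bayesian update: a check is run, and the confidences are revised according to how well each hypothesis predicted the result. The sketch below assumes the confidence values behave like probabilities (they sum to 1.0 in the example). The `update_confidences` function is hypothetical, and the likelihood values are invented purely for illustration; they are not results from any case or from TacitFlow.

```python
def update_confidences(priors, likelihoods):
    """Bayes' rule: posterior is proportional to prior * P(evidence | hypothesis)."""
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("evidence is impossible under every hypothesis")
    return {h: v / total for h, v in unnormalised.items()}


# Confidences from the slide example.
priors = {"H1 collusion": 0.73, "H2 Kask victim": 0.21, "H3 legitimate delivery": 0.06}

# ASSUMED values, invented for illustration: probability of finding that Tamm's
# access logs show NO entry during the breach window, under each hypothesis.
likelihoods = {"H1 collusion": 0.10, "H2 Kask victim": 0.60, "H3 legitimate delivery": 0.70}

posteriors = update_confidences(priors, likelihoods)
for name, p in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {p:.2f}")
```

With these assumed likelihoods, H2 overtakes H1 after the update. A single discriminating check can reorder the hypotheses, which is why the ranking is treated as provisional until the analyst has tested it.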
The Logic of Investigation
| Type | Formula | Certainty |
|---|---|---|
| Deduction | Rule + Case = Result | Certain |
| Induction | Cases = Rule | Probabilistic |
| Abduction | Result + Rule = Case | Creative / Plausible |
Expert reasoning in policing is primarily abductive: guessing the cause from the effects.
The AI Performance Gap
Current AI models struggle with abduction. On the ART Benchmark, AI scores ~69% vs 91% for humans (Bhagavatula et al., 2020).
Implication: Full automation of the "conclusion" phase is not possible. The AI generates hypotheses, but the human must select the best one.
TacitFlow Implementation (Phase 3)
- Current State: Manual abductive reasoning by analysts.
- Planned: AI generates 3-5 competing hypotheses (e.g., "Collusion" vs "Coercion") and suggests discriminating evidence.
- Goal: Support the analyst's "Satisficing" process by surfacing relevant precedents.
Inference to the Best Explanation
The modern name for abduction. Given surprising observations, generate hypotheses that would explain them, then select the best based on explanatory virtues (simplicity, scope).
Satisficing
Herbert Simon (1956). Accepting a solution that is "good enough" rather than optimal. Experts satisfice by recognizing situations quickly. TacitFlow supports this by surfacing relevant precedents.
ACL Findings 2025
The RECV benchmark decomposes 1,500 claims into deductive vs abductive atoms. Deductive items remain solvable, but the performance of every model drops sharply on abductive items (Dougrez-Lewis et al., 2025).
Presenter Notes
- Sherlock Holmes Logic: Holmes didn't deduce; he abducted. He guessed the best explanation.
- The Gap: AI is great at math (deduction) and patterns (induction), but terrible at creative guessing (abduction).
- Human Role: This is why the human is essential. The AI proposes; the human decides.
Adversarial Debate Model
"One agent proposes a hypothesis; another's only job is to find flaws. This stress-tests theories and avoids confirmation bias."
Single-Agent Reasoning
1. Agent generates hypothesis
2. Agent evaluates own hypothesis
3. Agent confirms own reasoning
4. THEN hypothesis accepted
Problem: Confirmation bias.
Adversarial Debate (TacitFlow)
Proposer (Agent 1):
"It's Tamm. He was at the scene."
Critic (Agent 2):
Flaw 1: Alibi is unverified
Flaw 2: Camera 2 shows nothing
Flaw 3: Motive unclear
Outcome:
New Task: Verify Alibi → Test hypothesis.
TacitFlow Implementation (Roadmap: Phase 3)
- Current State (Pilot): Single-agent reasoning with human oversight; manual critique by analysts.
- Planned Enhancement: Dual-agent debate where Critic is incentivized solely to find logical flaws.
- Basis: "Debate" models (Irving et al., 2018) and "Reflexion" (Shinn et al., 2023) self-correction.
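The planned proposer/critic loop can be sketched as follows. This is a hypothetical illustration of the control flow only: `debate`, `DebateResult`, and `stub_critic` are invented names, the critic is a hard-coded stand-in rather than a language model, and the rule that a hypothesis passes a round only when the critic raises no flaws is an assumption made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DebateResult:
    hypothesis: str
    passed_round: bool
    open_tasks: List[str] = field(default_factory=list)


def debate(hypothesis: str, critic: Callable[[str], List[str]]) -> DebateResult:
    """Run one debate round: the critic lists flaws, and each flaw becomes a task.

    Assumption for this sketch: the hypothesis passes the round only if the
    critic finds no flaws. The final decision stays with the human analyst.
    """
    flaws = critic(hypothesis)
    tasks = [f"Resolve: {flaw}" for flaw in flaws]
    return DebateResult(hypothesis, passed_round=not flaws, open_tasks=tasks)


def stub_critic(hypothesis: str) -> List[str]:
    """Stand-in for the Critic agent; returns the flaws from the slide example."""
    if "Tamm" in hypothesis:
        return ["Alibi is unverified", "Camera 2 shows nothing", "Motive unclear"]
    return []


result = debate("It's Tamm. He was at the scene.", stub_critic)
print(result.passed_round)
for task in result.open_tasks:
    print(task)
```

Keeping the critic as a separate callable mirrors the design intent: the component that proposes a hypothesis never evaluates it, so the flaws arrive as explicit tasks rather than being silently dismissed.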
Cognitive Rationale
Goal: Prevent "groupthink" and confirmation bias (Nickerson, 1998).
Mechanism: Dual-agent debate forces explicit consideration of disconfirming evidence (Irving et al., 2018).
IJCAI Logical Reasoning Survey
IJCAI 2025's survey splits reasoning gaps into logical QA vs logical consistency. Solver-based pipelines need natural-language-to-symbolic translators plus SAT/FOL tooling, yet they can still drop facts in translation. Adversarial debate sidesteps this by keeping reasoning in natural language (Cheng et al., 2025).
Presenter Notes
- This is "Future Work" but critical for credibility.
- Admit that LLMs are prone to "sycophancy" (agreeing with the user).
- The "Critic" agent is the solution to this.
TacitFlow Alternatives
Why a custom air-gapped solution? Evaluating TacitFlow against market alternatives.
| Dimension | Palantir Gotham | IBM i2 Analyst | ChatGPT/Claude | TacitFlow |
|---|---|---|---|---|
| Cost & Licensing | | | | |
| Pricing Model | Per-user annual (€50k+ / analyst) | Perpetual + maintenance (€25k + 20%) | Metered API (variable) | Tiered models (€36k infra) |
| Lock-in Risk | High (proprietary format) | Medium (some export) | High (cloud-only) | Low (open standards) |
| Sovereignty & Security | | | | |
| Data Sovereignty | Configurable (on-prem costly) | Full control (on-prem) | None (US cloud) | Air-gapped (100% sovereign) |
| LED Compliance | Possible (audit needed) | Possible (manual) | Non-compliant (data export) | Native (by design) |
| Knowledge Management | | | | |
| Tacit Knowledge | No (explicit only) | No (visual only) | No (stateless) | Core feature (EASCI Framework) |
Market Analysis
- Lock-in: Proprietary formats hold data hostage. TacitFlow uses open W3C standards.
- vs Palantir: Palantir is for explicit data fusion, not tacit reasoning. Cost-prohibitive for small agencies.
- vs ChatGPT: Public LLMs violate data sovereignty and LED compliance requirements.
Unique Value Proposition
- Sovereign: Air-gapped & On-premise.
- Specialized: Built for Tacit Knowledge.
- Predictable: Fixed hardware cost.
- Compliant: Automated "Smart Forgetting".
Architecture
Internal components and AI agent orchestration.
Division of Labour
Different agents handle distinct tasks (retrieval and synthesis vs reasoning). The Evaluation Engine continuously monitors output quality and verifiability.
AI in Daily Work
AI is used in real investigations.
Practical Benefits Are Measurable
Evaluation checks whether the solution actually makes work faster and clearer.
The Solution Fits Real Work Conditions
Legal, security, and operational requirements are accounted for from the start.
AI Is Used Safely
Clear rules define how and for what purposes AI may be used.