This case study builds upon earlier research on AI knowledge agents in maritime operations (published here), transitioning from conceptual foundations to the practical design, implementation, and ongoing internal evaluation of a regulatory compliance system in the shipping industry.
The international shipping industry operates under a dense, continuously evolving regulatory framework. This case study documents our engineering effort to design, implement, and evaluate an artificial intelligence-driven decision-support system tailored for maritime compliance. Built on Google Cloud Platform (GCP) and implemented primarily in Rust, the system uses a hybrid Retrieval-Augmented Generation (RAG) architecture. By combining vector semantic search with a temporal knowledge graph, the application transforms fragmented regulatory data into an auditable, expert-in-the-loop compliance tool. The architecture is designed to satisfy rigorous maritime and cybersecurity standards, and internal benchmarking has shown measurable improvements in regulatory precision and audit readiness.
Regulatory Context and Problem Framing
The international shipping industry operates under a web of regulatory requirements originating from the International Maritime Organization (IMO), flag states, classification societies, and company-specific management systems. Safety of Life at Sea (SOLAS), the MARPOL pollution prevention conventions, the ISM Code, and various ISO-based integrated management systems together define a compliance landscape that is both voluminous and highly consequential.
In a baseline state, regulatory information is typically fragmented across thousands of PDF documents, shared drives, intranet systems, and hard-copy manuals. With the IMO issuing 50–80 amendments and circulars annually, maintaining a current, vessel-specific view of regulatory applicability is a major administrative challenge. Officers and shore staff often spend considerable time locating relevant passages across multiple systems. Internal audits require substantial manual effort to assemble evidence of compliance, and the training burden is heavy in both cost and crew time. Retrospective analyses routinely link findings and detentions from Port State Control (PSC) and from inspections such as SIRE to outdated or misinterpreted regulatory knowledge rather than to the physical absence of equipment. These difficulties are amplified by multilingual crews who need authoritative guidance while source regulations remain predominantly in English.
The objective of this project was not to build a generic conversational chatbot, but to design a system capable of providing reliable, explainable, and auditable answers to precise regulatory questions—such as, “Which version of SOLAS II‑2/10.5 applies to this particular bulk carrier?” or “Which on-board procedure implements this MARPOL requirement?”
The project’s problem statement was framed around three core questions:
- Can regulatory queries from vessels and shore offices be answered quickly and consistently, with full traceability to underlying conventions, class rules, and internal procedures?
- Can the organization automatically track temporal regulatory applicability as amendments are published?
- Can this system be introduced without creating new vulnerabilities regarding cybersecurity, legal liability, or over-reliance on opaque AI behavior?
System Architecture and Technology Selection
The resulting system combines several AI and data management techniques, driven strictly by the constraints of the maritime domain rather than algorithmic novelty alone. The architecture is divided into four main subsystems: ingestion and normalization; hybrid retrieval; guarded generation; and an edge–cloud deployment split.
Core Application Logic
A foundational decision was to implement the core application logic and API gateways in Rust, a systems programming language that provides memory safety guarantees without runtime garbage collection. In a safety-critical domain like maritime compliance, system reliability is paramount. Rust’s ownership model enforces memory safety at compile time and, in safe code, eliminates entire classes of vulnerabilities (e.g., null pointer dereferences, buffer overflows, and data races) that are common in C/C++.
Beyond security, Rust provides critical operational advantages for maritime edge deployment:
- Resource Efficiency: Deploying models on resource-constrained vessel hardware requires minimal memory footprint and CPU overhead. Rust’s zero-cost abstractions make it highly suited for this environment.
- Predictable Concurrency: The system utilizes Tokio, a mature asynchronous Rust runtime, to efficiently handle highly concurrent I/O-bound operations (database querying, graph traversal, and external API calls) with predictable latency.
- Hexagonal Architecture: The Rust codebase was structured using ports and adapters (hexagonal architecture). This isolated the core regulatory reasoning logic from external dependencies, allowing engineers to swap vector databases or LLM endpoints without rewriting the business logic.
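The ports-and-adapters idea can be illustrated with a minimal sketch. This is not the project's actual code: the names `RetrievalPort`, `Passage`, `ComplianceService`, and `InMemoryRetriever` are hypothetical, and the in-memory adapter with substring matching merely stands in for a real vector-search adapter.

```rust
/// A retrieved passage with its citation. (Hypothetical type for illustration.)
#[derive(Debug, Clone, PartialEq)]
pub struct Passage {
    pub citation: String,
    pub text: String,
}

/// Port: the only thing the core logic knows about a retrieval backend.
pub trait RetrievalPort {
    fn search(&self, query: &str, top_k: usize) -> Vec<Passage>;
}

/// Core service: generic over the port, so it never names a concrete database.
pub struct ComplianceService<R: RetrievalPort> {
    retriever: R,
}

impl<R: RetrievalPort> ComplianceService<R> {
    pub fn new(retriever: R) -> Self {
        Self { retriever }
    }

    /// Returns the citations of the passages that would back an answer.
    pub fn supporting_citations(&self, question: &str) -> Vec<String> {
        self.retriever
            .search(question, 3)
            .into_iter()
            .map(|p| p.citation)
            .collect()
    }
}

/// Adapter: an in-memory stand-in. A database-backed adapter would
/// implement the same trait without any change to `ComplianceService`.
pub struct InMemoryRetriever {
    pub passages: Vec<Passage>,
}

impl RetrievalPort for InMemoryRetriever {
    fn search(&self, query: &str, top_k: usize) -> Vec<Passage> {
        // Naive case-insensitive substring match, used only for the sketch.
        let needle = query.to_lowercase();
        self.passages
            .iter()
            .filter(|p| p.text.to_lowercase().contains(&needle))
            .take(top_k)
            .cloned()
            .collect()
    }
}
```

Because `ComplianceService` depends only on the trait, swapping the storage backend means writing one new `impl RetrievalPort`, and the same in-memory adapter doubles as a test fixture.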
Cloud Infrastructure: Google Cloud Platform (GCP)
The shore-based infrastructure is heavily integrated with Google Cloud Platform (GCP), chosen for its compliance certifications and enterprise-grade security controls.
- Orchestration: Google Kubernetes Engine (GKE) manages containerized deployments, enabling consistent environments across development, testing, and production, while facilitating rolling updates with minimal downtime.
- AI & Machine Learning: Large Language Models (LLMs) and text-embedding models are served dynamically through Google Cloud Vertex AI. Routing logic selects lighter model configurations for straightforward factual retrieval to minimize latency and cost, while reserving higher-capability models for complex, multi-hop regulatory interpretations.
- Security & Identity: Google Cloud Identity and Access Management (IAM) and Google Secret Manager ensure that cryptographic keys, database credentials, and TLS certificates are securely rotated and structurally isolated from the application code.
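The model-routing rule described in the AI & Machine Learning item might look like the following sketch. The `QueryProfile` fields and the "more than one hop" cut-off are assumptions made for illustration; the source does not state the actual routing criteria.

```rust
/// Which class of model should serve a query. (Hypothetical names.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelTier {
    Light,
    HighCapability,
}

/// Features of a query that routing could inspect. (Assumed for this sketch.)
pub struct QueryProfile {
    /// Number of graph relationships the answer needs to traverse.
    pub graph_hops: u32,
    /// True when the question asks for interpretation, not a lookup.
    pub needs_interpretation: bool,
}

/// Route simple factual lookups to the light tier and everything else
/// to the higher-capability tier.
pub fn select_tier(profile: &QueryProfile) -> ModelTier {
    if profile.graph_hops > 1 || profile.needs_interpretation {
        ModelTier::HighCapability
    } else {
        ModelTier::Light
    }
}
```

Keeping the rule as a pure function makes the routing decision easy to log alongside each answer, which helps when auditing why a given model served a given query.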
Hybrid Retrieval: PostgreSQL and Temporal Knowledge Graphs
Rather than utilizing a niche vector database, we selected PostgreSQL extended with pgvector. This prioritized operational simplicity, mature backup/replication, and the ACID compliance vital for maintaining an immutable audit trail of regulatory queries. The pgvector extension implements Hierarchical Navigable Small World (HNSW) indexing, allowing low-latency nearest-neighbor searches over dense embeddings.
However, pure semantic vector search is insufficient for regulatory compliance, as it cannot reliably track structural dependencies or temporal amendments. To resolve this, a Graph Database was integrated. The graph schema explicitly models entities (regulations, vessels, certificates, procedures) and their relationships ("requires," "supersedes," "implements"). Crucially, the graph introduces temporal modeling: linking old and new regulatory versions alongside metadata like effective dates and transition periods. This hybrid approach allows the system to traverse compliance chains based on a vessel's build date and flag, rather than simply retrieving the most recently published text.
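As a rough illustration of temporal traversal, the sketch below walks "supersedes" links from the newest version of a regulation back to the first version that applies to a vessel's keel-laying date. The types, the single date criterion, and the version identifiers are assumptions; a real graph would also weigh flag state, ship type, and transition periods.

```rust
/// Calendar date; derived ordering compares year, then month, then day.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Date {
    pub year: i32,
    pub month: u8,
    pub day: u8,
}

/// One version of a regulation. (Hypothetical shape for this sketch.)
#[derive(Debug, Clone, PartialEq)]
pub struct RegulationVersion {
    pub id: &'static str,
    /// Ships with keel laid on or after this date fall under this version.
    pub applies_to_keels_from: Date,
    /// Index of the version this one supersedes, if any.
    pub supersedes: Option<usize>,
}

/// Follow supersedes links from `newest` until a version applies to the
/// vessel. The loop is bounded by the slice length so that a malformed
/// cycle cannot hang the traversal.
pub fn applicable_version(
    versions: &[RegulationVersion],
    newest: usize,
    keel_laid: Date,
) -> Option<&RegulationVersion> {
    let mut idx = newest;
    for _ in 0..versions.len() {
        let version = versions.get(idx)?;
        if keel_laid >= version.applies_to_keels_from {
            return Some(version);
        }
        idx = version.supersedes?;
    }
    None
}
```

A vessel built before the newest amendment's cut-off therefore resolves to an older version, which is exactly the case that a "most recently published" retrieval would get wrong.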
Data Pipelines, Guardrails, and Edge Deployment
Domain-Aware Chunking and Multilingual Support
Standard NLP chunking methodologies destroy the hierarchical context of maritime law. The ingestion pipeline was custom-built to parse PDFs and Word documents while explicitly respecting regulatory boundaries (chapters, annexes, regulation numbers). Pattern matching and Named Entity Recognition (NER) tag each text chunk with metadata, ensuring the LLM always maintains a connection to the specific applicability conditions of the text. Early experiments with naive segmentation significantly degraded retrieval relevance, proving that aligning the chunking strategy with maritime domain expertise is foundational.
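A simplified sketch of boundary-aware chunking follows. It assumes headings appear on their own lines and begin with "Chapter " or "Regulation ", which is a deliberate simplification; the pipeline described above uses pattern matching and NER on parsed PDF and Word content, and the `Chunk` type here is hypothetical.

```rust
/// A chunk that carries its position in the regulatory hierarchy.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub chapter: Option<String>,
    pub regulation: Option<String>,
    pub text: String,
}

/// Split a document at chapter and regulation headings so that no chunk
/// spans two regulations, tagging each chunk with its enclosing headings.
pub fn chunk_by_regulation(document: &str) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut chapter: Option<String> = None;
    let mut regulation: Option<String> = None;
    let mut body = String::new();

    for line in document.lines() {
        let trimmed = line.trim();
        let is_chapter = trimmed.starts_with("Chapter ");
        let is_regulation = trimmed.starts_with("Regulation ");
        if is_chapter || is_regulation {
            // A heading closes whatever chunk was being accumulated.
            flush(&mut chunks, &chapter, &regulation, &mut body);
            if is_chapter {
                chapter = Some(trimmed.to_string());
                regulation = None; // a new chapter resets the regulation
            } else {
                regulation = Some(trimmed.to_string());
            }
        } else if !trimmed.is_empty() {
            if !body.is_empty() {
                body.push(' ');
            }
            body.push_str(trimmed);
        }
    }
    flush(&mut chunks, &chapter, &regulation, &mut body);
    chunks
}

/// Emit the accumulated body as a chunk, if there is any text.
fn flush(
    chunks: &mut Vec<Chunk>,
    chapter: &Option<String>,
    regulation: &Option<String>,
    body: &mut String,
) {
    if !body.is_empty() {
        chunks.push(Chunk {
            chapter: chapter.clone(),
            regulation: regulation.clone(),
            text: std::mem::take(body),
        });
    }
}
```

The contrast with fixed-size windowing is that every chunk keeps the chapter and regulation it belongs to, so applicability metadata can be attached per chunk instead of being inferred later.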
To support multilingual crews without compromising legal accuracy, the system leverages the Google Cloud Translation API. User queries are translated to English for semantic search against the authoritative English corpus. The generated response is then translated back to the user’s native language, with programmatic rules ensuring that specific maritime acronyms (e.g., LSA, ISM, PSC) remain safely untranslated to preserve technical meaning.
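One way to keep acronyms untranslated is to mask them with placeholders before translation and restore them afterwards. The sketch below shows only that masking step. The placeholder format `[[n]]` and the function names are assumptions, the translation call itself is omitted, and whether a given translation service passes such placeholders through unchanged would need to be verified.

```rust
/// Acronyms that must survive translation untouched (from the text above).
pub const PROTECTED: [&str; 3] = ["LSA", "ISM", "PSC"];

/// Replace each protected acronym with an index placeholder such as `[[1]]`.
/// Surrounding punctuation, e.g. the brackets in "(PSC)", is preserved.
pub fn mask_acronyms(text: &str) -> String {
    text.split(' ')
        .map(|word| {
            // Strip leading and trailing punctuation to find the bare token.
            let core = word.trim_matches(|c: char| !c.is_alphanumeric());
            match PROTECTED.iter().position(|a| *a == core) {
                Some(i) => word.replacen(core, &format!("[[{}]]", i), 1),
                None => word.to_string(),
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Restore the original acronyms after the text has been translated.
pub fn unmask_acronyms(text: &str) -> String {
    let mut out = text.to_string();
    for (i, acronym) in PROTECTED.iter().enumerate() {
        out = out.replace(&format!("[[{}]]", i), acronym);
    }
    out
}
```

In use, the translation step would sit between the two calls: mask the English answer, translate the masked text, then unmask the result.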
Automated Content Lifecycle: Versioning and Deprecation
In maritime compliance, regulations do not simply vanish when amended. Transition periods, retroactive requirements, and "grandfather" clauses based on a vessel’s keel-laying date mean that multiple versions of the same SOLAS or MARPOL regulation may be simultaneously valid across a single fleet. A critical failure point in standard RAG architectures is "document drift"—when newly issued regulations are ingested into a vector database alongside outdated ones, causing the LLM to retrieve conflicting requirements.
To solve this, the architecture implements a programmatic document lifecycle and deprecation engine. When a new regulatory amendment is published, the ingestion pipeline extracts the effective_from and applies_to metadata. Within the temporal knowledge graph, a new entity node is created and linked to the older version via a SUPERSEDES edge. The system utilizes a strongly typed state machine (mapping states such as Pending, Active, Transition, and Retired) to manage these regulatory transitions automatically.
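The strongly typed state machine could be sketched as follows. The four states come from the text above; the event names and the set of permitted transitions are assumptions made for illustration.

```rust
/// Lifecycle states named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LifecycleState {
    Pending,
    Active,
    Transition,
    Retired,
}

/// Events that move a regulation version between states. (Assumed names.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LifecycleEvent {
    EffectiveDateReached,
    SuccessorPublished,
    TransitionPeriodEnded,
}

/// Apply an event to a state. Any pair not listed is rejected, so an
/// illegal jump such as Pending to Retired cannot happen silently.
pub fn next_state(
    state: LifecycleState,
    event: LifecycleEvent,
) -> Result<LifecycleState, String> {
    use LifecycleEvent::*;
    use LifecycleState::*;
    match (state, event) {
        (Pending, EffectiveDateReached) => Ok(Active),
        (Active, SuccessorPublished) => Ok(Transition),
        (Transition, TransitionPeriodEnded) => Ok(Retired),
        (s, e) => Err(format!("invalid transition: {:?} on {:?}", s, e)),
    }
}
```

Encoding the transitions in one exhaustive `match` means that adding a new state or event forces the compiler to revisit every rule.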
Crucially, outdated content is never hard-deleted. Historical auditability is a strict maritime requirement (e.g., a DPA may need to know exactly what a SOLAS regulation stated during an incident two years prior). Instead, the system employs soft deletion. Superseded vector chunks in PostgreSQL are flagged with an is_current: false boolean and a valid_until timestamp.
During a standard query, the API gateway automatically injects strict active-date filters into the pgvector search, so that retired content is excluded before it can reach the LLM's context window. If a user explicitly queries historical context, the temporal graph routes the query to the archived vector embeddings. Furthermore, before a document is fully suppressed for a specific query, the system evaluates the queried vessel's metadata (e.g., flag state, build date). If a retired clause still applies to that vessel, the graph database forces the retrieval of the older, technically "outdated" text. This ensures the LLM synthesizes compliance answers that are legally accurate for that specific vessel's operational profile, rather than blindly applying the newest fleet-wide amendment.
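The filtering logic can be sketched in memory as below. In the system described, the same predicate would be expressed as a filter on the pgvector query, not applied to a Rust slice. The `StoredChunk` fields mirror the `is_current` and `valid_until` flags mentioned above, while `valid_from`, the `yyyymmdd` integer date encoding, and `QueryMode` are assumptions.

```rust
/// A stored chunk with its soft-deletion metadata. Dates are encoded as
/// yyyymmdd integers purely to keep the sketch dependency-free.
#[derive(Debug, Clone, PartialEq)]
pub struct StoredChunk {
    pub citation: &'static str,
    pub is_current: bool,
    pub valid_from: u32,
    /// `None` means the chunk has not been superseded.
    pub valid_until: Option<u32>,
}

/// Standard queries see only current text; historical queries see the
/// text that was in force on a given day.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryMode {
    Current,
    AsOf(u32),
}

/// Select the chunks that are eligible for vector search under `mode`.
pub fn eligible(chunks: &[StoredChunk], mode: QueryMode) -> Vec<&StoredChunk> {
    chunks
        .iter()
        .filter(|c| match mode {
            QueryMode::Current => c.is_current,
            QueryMode::AsOf(day) => {
                // In force from valid_from (inclusive) to valid_until (exclusive).
                c.valid_from <= day && c.valid_until.map_or(true, |end| day < end)
            }
        })
        .collect()
}
```

Because nothing is deleted, the historical view is just a different predicate over the same rows.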
Deterministic Guardrails and Human-in-the-Loop
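The LLM operates strictly in a retrieval-augmented generation (RAG) mode: it is instructed to answer only from the retrieved context rather than from knowledge encoded in its training weights. Because prompting alone cannot guarantee this behavior, the constraint is verified after generation.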
A deterministic validation layer acts as a guardrail. When the LLM generates a response, it must provide exact citations. The Rust backend automatically cross-checks these citations against the retrieved context window. If a generated regulation code does not exist in the retrieved text, the response is flagged. Furthermore, the system computes a composite confidence score based on vector similarity, graph centrality, and LLM token probabilities. Responses scoring below strict thresholds are suppressed and routed to human domain experts (e.g., DPAs or marine superintendents) for manual review.
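A minimal sketch of the citation cross-check and confidence gate follows. The three signals are those named above, but the weights (0.5, 0.2, 0.3), the assumption that each signal is normalized to the range 0 to 1, and all type and function names are illustrative; the source does not specify how the composite score is computed.

```rust
/// Return every cited regulation code that does not appear verbatim in
/// any retrieved passage. A non-empty result means the response is flagged.
pub fn unsupported_citations<'a>(
    citations: &[&'a str],
    retrieved_context: &[&str],
) -> Vec<&'a str> {
    citations
        .iter()
        .copied()
        .filter(|c| !retrieved_context.iter().any(|passage| passage.contains(*c)))
        .collect()
}

/// Signals feeding the composite score, each assumed to lie in [0, 1].
pub struct Signals {
    pub vector_similarity: f64,
    pub graph_centrality: f64,
    pub token_probability: f64,
}

/// Weighted sum of the signals. The weights are placeholders for this sketch.
pub fn composite_confidence(s: &Signals) -> f64 {
    0.5 * s.vector_similarity + 0.2 * s.graph_centrality + 0.3 * s.token_probability
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Disposition {
    Release,
    EscalateToExpert,
}

/// Escalate when any citation is unsupported or confidence is too low.
pub fn dispose(unsupported: usize, confidence: f64, threshold: f64) -> Disposition {
    if unsupported > 0 || confidence < threshold {
        Disposition::EscalateToExpert
    } else {
        Disposition::Release
    }
}
```

The citation check is deliberately a plain string comparison: it is deterministic and cheap, and it fails closed, since a citation that merely paraphrases a regulation code will be flagged instead of trusted.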
Evaluation, Testing, and Organizational Impact
The knowledge agent is currently undergoing structured internal evaluation utilizing company-specific SMS modules alongside public maritime regulations. Rather than focusing purely on algorithmic novelty, the evaluation framework emphasizes operational precision, search efficiency, and audit readiness.
- Expert Benchmark Dataset: We spent significant time annotating a dataset of regulatory questions covering common interpretive challenges, edge cases, and vessel-specific applicability rules. This dataset serves as both training material for refining retrieval and a benchmark for measuring system accuracy.
- Precision Improvements via Hybrid Retrieval: Evaluations demonstrated that combining domain-aware chunking with temporal knowledge graph traversal significantly outperformed pure vector search. It allows the system to reliably execute multi-hop reasoning (e.g., matching a newly amended SOLAS chapter to the specific company SMS procedure that dictates crew compliance).
- Hallucination Mitigation: The integration of deterministic citation checkers and strict confidence thresholds reduced hallucination rates to negligible levels, fulfilling the mandate for auditability and traceability.
- Catalyzing SMS Data Hygiene: A profoundly valuable, albeit unexpected, result of the implementation was the exposure of legacy data quality issues. The process of mapping internal company procedures into the knowledge graph immediately surfaced outdated, contradictory, or missing SMS instructions. The AI deployment effectively forced a rigorous organizational cleanup of the Safety Management System, proving that preparing for AI yields operational benefits even before the model is queried.
Conclusion and Future Roadmap
Building AI systems for regulated, safety-critical physical industries requires a fundamental departure from standard chatbot architectures. In the maritime domain, the value of AI lies not in generating novel text, but in structuring, retrieving, and operationalizing existing legal frameworks with absolute traceability.
This case study demonstrates that reliability is a function of the entire architecture—from the memory safety of Rust and the scalability of Google Cloud, to domain-aware chunking and temporal graph routing. Furthermore, this project confirms that AI in maritime compliance must be positioned as a decision-support tool, keeping liability and ultimate authority firmly in the hands of human operators.
Future iterations of this architecture will focus on predictive compliance—transitioning the agent from a reactive querying engine to a proactive monitoring tool that actively cross-references vessel itineraries with expiring certificates and upcoming regional regulations. Ultimately, by respecting the physical and legal constraints of the operating environment, AI can serve as a highly secure, cognitive multiplier for maritime safety and operational readiness.