BOOST.DocumentManager

The authoritative domain component for documents in the BOOST platform. DocumentManager owns document lifecycle, business decisions, and persistence. It is the only component allowed to write authoritative document state to the tenant database.

Architecture

                    ┌─────────────────────────────────┐
                    │       DocumentManagerApp        │
                    │         (Entry Point)           │
                    └────────────────┬────────────────┘

              ┌──────────────────────┼──────────────────────┐
              │                      │                      │
              ▼                      ▼                      ▼
   ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐
   │TenantDatabaseMgr │   │DocumentManagerBoss│   │    AppConfig     │
   │ (DataSources)    │   │ (AMQP Connection) │   │                  │
   └────────┬─────────┘   └────────┬─────────┘   └──────────────────┘
            │                      │
            │    ┌─────────────────┼─────────────────┐
            │    │                 │                 │
            ▼    ▼                 ▼                 ▼
   ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
   │TenantDocument   │   │TenantDocument   │   │TenantDocument   │
   │Processor        │   │Processor        │   │Processor        │
   │(Tenant A)       │   │(Tenant B)       │   │(Tenant N)       │
   └─────────────────┘   └─────────────────┘   └─────────────────┘
          │                      │                      │
          └──────────────────────┼──────────────────────┘

                    JMS Message Selector:
                    "tenantUUID = '{tenantId}'"

Pipeline Position

DocumentExtractor (bytes → text)
    │
    ▼
DocumentProfessor (text → proposals)
    │
    ▼
DocumentManager (proposals → authoritative state)  ◄── You are here

Mental Model

| Component | Responsibility |
|---|---|
| DocumentExtractor | bytes → text |
| DocumentProfessor | text → meaning (proposals) |
| DocumentManager | meaning → authoritative state |

Message Flow

Input Commands

cmd.document.process

Triggers the document processing pipeline. Sent by BOOST.IntegrationsServer when a document is saved.

json
{
  "documentUUID": "742dba26-...",
  "organizationId": "org-123"
}

cmd.document.retry

Retries processing for failed or needs-review documents.

json
{
  "documentUUID": "742dba26-...",
  "organizationId": "org-123",
  "fromStep": "extraction"
}

| Field | Description |
|---|---|
| fromStep | Optional. "extraction" or "interpretation". Defaults to "extraction". |
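
The fromStep default can be sketched as a small helper. This is a minimal sketch, assuming the field arrives as a nullable string; the class and method names are hypothetical, and rejecting unknown values is an assumption rather than documented behavior.

```java
/** Hypothetical helper; not part of the documented DocumentManager API. */
final class RetryStepSketch {
    static final String DEFAULT_STEP = "extraction";

    /** Returns the step to restart from, applying the documented default. */
    static String resolveFromStep(String fromStep) {
        if (fromStep == null || fromStep.isEmpty()) {
            return DEFAULT_STEP;
        }
        switch (fromStep) {
            case "extraction":
            case "interpretation":
                return fromStep;
            default:
                // Assumption: unknown values are rejected rather than ignored.
                throw new IllegalArgumentException("Unsupported fromStep: " + fromStep);
        }
    }
}
```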

Input Events

evt.document.text.extracted

Received from DocumentExtractor when text extraction completes.

json
{
  "documentUUID": "742dba26-...",
  "entityUUID": "5d16c711-...",
  "organizationId": "org-123",
  "success": true,
  "textRef": "742dba26-.../extracted-text/Invoice.txt",
  "textBucket": "boost.app.tenant1",
  "engineUsed": "PDFBOX",
  "processingTimeMs": 1234
}

evt.document.interpreted

Received from DocumentProfessor with interpretation proposals.

json
{
  "documentUUID": "742dba26-...",
  "entityUUID": "5d16c711-...",
  "organizationId": "org-123",
  "success": true,
  "documentTypeProposal": {
    "documentTypeUUID": "doctype-invoice-001",
    "confidence": 0.9
  },
  "partyProposals": [...],
  "fieldProposals": [...],
  "matchedRuleUUIDs": [...]
}

Output Events

evt.document.processing.started

Emitted when document processing begins.

json
{
  "documentUUID": "742dba26-...",
  "entityUUID": "5d16c711-...",
  "organizationId": "org-123",
  "status": "processing",
  "timestamp": 1706184000000
}

evt.document.ready

Emitted when document processing completes successfully.

json
{
  "documentUUID": "742dba26-...",
  "entityUUID": "5d16c711-...",
  "organizationId": "org-123",
  "status": "ready",
  "documentTypeUUID": "doctype-invoice-001",
  "partyUUID": "party-acme-001",
  "timestamp": 1706184000000
}

evt.document.needs.review

Emitted when manual review is required.

json
{
  "documentUUID": "742dba26-...",
  "entityUUID": "5d16c711-...",
  "organizationId": "org-123",
  "status": "needs_review",
  "reason": "Document type confidence below threshold (0.72). Multiple party candidates (2).",
  "timestamp": 1706184000000
}

evt.document.failed

Emitted when processing fails.

json
{
  "documentUUID": "742dba26-...",
  "entityUUID": "5d16c711-...",
  "organizationId": "org-123",
  "status": "failed",
  "errorMessage": "Text extraction failed: Unsupported content type",
  "timestamp": 1706184000000
}

Document Lifecycle

DocumentManager owns all document state transitions after ingestion.

RECEIVED (set by IntegrationsServer)
    │
    ▼ cmd.document.process
PROCESSING
    │
    ▼ evt.document.text.extracted
TEXT_EXTRACTED
    │
    ▼ evt.document.interpreted
INTERPRETED
    │
    ├──► READY (confidence ≥ auto_accept threshold)
    ├──► NEEDS_REVIEW (confidence between thresholds)
    └──► FAILED (error occurred)

| Status | Description |
|---|---|
| RECEIVED | Document ingested by IntegrationsServer |
| PROCESSING | DocumentManager has started processing |
| TEXT_EXTRACTED | Text extraction completed |
| INTERPRETED | Interpretation completed |
| READY | All processing complete, document ready for use |
| NEEDS_REVIEW | Manual review required (low confidence or conflicts) |
| FAILED | Processing failed |

Valid Transitions

| From | To |
|---|---|
| RECEIVED | PROCESSING |
| PROCESSING | TEXT_EXTRACTED, FAILED |
| TEXT_EXTRACTED | INTERPRETED, FAILED |
| INTERPRETED | READY, NEEDS_REVIEW, FAILED |
| NEEDS_REVIEW | READY, FAILED |
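
The transitions above map naturally onto an enum with an allow-list per state. The following is a minimal sketch, not the actual DocumentStatus class: the name DocStatusSketch and the canTransitionTo method are hypothetical, and only the transitions listed in the table are encoded.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the lifecycle enum; encodes only the listed transitions. */
enum DocStatusSketch {
    RECEIVED, PROCESSING, TEXT_EXTRACTED, INTERPRETED, READY, NEEDS_REVIEW, FAILED;

    private static final Map<DocStatusSketch, Set<DocStatusSketch>> ALLOWED =
            new EnumMap<>(DocStatusSketch.class);

    static {
        ALLOWED.put(RECEIVED, EnumSet.of(PROCESSING));
        ALLOWED.put(PROCESSING, EnumSet.of(TEXT_EXTRACTED, FAILED));
        ALLOWED.put(TEXT_EXTRACTED, EnumSet.of(INTERPRETED, FAILED));
        ALLOWED.put(INTERPRETED, EnumSet.of(READY, NEEDS_REVIEW, FAILED));
        ALLOWED.put(NEEDS_REVIEW, EnumSet.of(READY, FAILED));
        // READY and FAILED have no outgoing transitions in the table.
    }

    /** True only if the table lists target as reachable from this state. */
    boolean canTransitionTo(DocStatusSketch target) {
        return ALLOWED.getOrDefault(this, Collections.emptySet()).contains(target);
    }
}
```

A caller would check `current.canTransitionTo(next)` before writing the new status and log a warning when the check fails, in line with the "Invalid state transition" row under Error Handling.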

Business Decision Engine

DocumentManager applies business rules to interpretation proposals.

Confidence Thresholds

| Threshold | Default | Behavior |
|---|---|---|
| auto_accept | 0.85 | Proposals with confidence ≥ this are automatically accepted |
| needs_review | 0.60 | Proposals with confidence between thresholds require review |

Decision Logic

  1. Document Type: Accept if confidence ≥ auto_accept threshold
  2. Party Selection: Pick highest confidence candidate if ≥ auto_accept
  3. Field Extraction: Accept fields individually based on confidence
  4. Multiple Candidates: Flag for review if there are multiple high-confidence party candidates

Review Triggers

  • Document type confidence below auto_accept threshold
  • Party confidence below auto_accept threshold
  • Multiple party candidates with similar confidence
  • Required fields missing or low confidence
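
The document type and party triggers above can be sketched as a function that collects review reasons. This is an illustration only, not the actual DecisionEngine: the class and method names are hypothetical, "high-confidence" is assumed to mean at or above auto_accept, and the required-field trigger is left out because the field schema is not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of the review triggers for document type and party. */
final class DecisionSketch {
    static final double AUTO_ACCEPT = 0.85; // documented default

    /** Returns the review reasons; an empty list means nothing triggered a review. */
    static List<String> reviewReasons(double docTypeConfidence, List<Double> partyConfidences) {
        List<String> reasons = new ArrayList<>();
        if (docTypeConfidence < AUTO_ACCEPT) {
            reasons.add(String.format(Locale.ROOT,
                    "Document type confidence below threshold (%.2f).", docTypeConfidence));
        }
        // Assumption: a "high-confidence" candidate is one at or above auto_accept.
        long strongParties = partyConfidences.stream().filter(c -> c >= AUTO_ACCEPT).count();
        if (strongParties == 0) {
            reasons.add("Party confidence below threshold.");
        } else if (strongParties > 1) {
            reasons.add("Multiple party candidates (" + strongParties + ").");
        }
        return reasons;
    }

    /** Maps the collected reasons onto the status values used in the output events. */
    static String decide(double docTypeConfidence, List<Double> partyConfidences) {
        return reviewReasons(docTypeConfidence, partyConfidences).isEmpty() ? "ready" : "needs_review";
    }
}
```

Joining the reasons with a space gives a string in the same shape as the reason field of evt.document.needs.review.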

Multi-Tenant Architecture

Similar to BOOST.DeliveryBoy, DocumentManager uses per-tenant processors:

java
// Main entry point
DocumentManagerApp app = new DocumentManagerApp();
app.start();

// Inside start():
List<CustomerDataSource> tenants = dbManager.getCustomerDataSources();
for (CustomerDataSource tenant : tenants) {
    documentManagerBoss.createProcessorForTenant(tenant);
}

Message Filtering

Each TenantDocumentProcessor uses JMS message selectors to filter by tenantUUID:

java
String selector = "tenantUUID = '" + organizationId + "'";
MessageConsumer consumer = session.createConsumer(queue, selector);

This ensures each tenant's processor only receives messages for that tenant.
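
Because the selector is built by string concatenation, a tenant identifier containing a single quote would produce an invalid selector. JMS selectors escape a quote inside a string literal by doubling it, so a defensive builder might look like the sketch below. The class and method names are hypothetical, and nothing here implies the real identifiers ever contain quotes.

```java
/** Hypothetical helper that builds the tenantUUID selector with quote escaping. */
final class TenantSelectorSketch {

    /** Returns a selector matching messages whose tenantUUID property equals tenantId. */
    static String forTenant(String tenantId) {
        if (tenantId == null || tenantId.isEmpty()) {
            throw new IllegalArgumentException("tenantId must not be empty");
        }
        // In a JMS selector string literal, a single quote is written as two single quotes.
        String escaped = tenantId.replace("'", "''");
        return "tenantUUID = '" + escaped + "'";
    }
}
```

The result can be passed to session.createConsumer(queue, selector) exactly as in the snippet above.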

When publishing messages, DocumentManager sets the tenantUUID property:

java
private void sendMessage(MessageProducer producer, String json) throws JMSException {
    TextMessage message = session.createTextMessage(json);
    message.setStringProperty("tenantUUID", organizationId);
    producer.send(message);
}

For detailed information about tenant isolation across all services, see Multi-Tenant Messaging.

Configuration

Configuration is loaded from config.properties.

properties
# AMQP Broker
broker_url=amqp://172.16.200.32:5672

# Input queues
queue.cmd.document.process=cmd.document.process
queue.cmd.document.retry=cmd.document.retry
queue.evt.document.text.extracted=evt.document.text.extracted
queue.evt.document.interpreted=evt.document.interpreted

# Output queues
queue.evt.document.processing.started=evt.document.processing.started
queue.evt.document.ready=evt.document.ready
queue.evt.document.needs.review=evt.document.needs.review
queue.evt.document.failed=evt.document.failed
queue.evt.document.updated=evt.document.updated

# Commands to other services
queue.cmd.document.extractText=cmd.document.extractText
queue.cmd.document.interpret=cmd.document.interpret

# Master database for tenant lookup
master.jdbc.url=jdbc:mysql://10.245.10.35:3306/qeeping?useSSL=false&serverTimezone=UTC
master.jdbc.username=qeeping-master
master.jdbc.password=****

# Business rules
confidence.threshold.auto_accept=0.85
confidence.threshold.needs_review=0.60

# Connection pool settings
hikari.minimum_idle=2
hikari.maximum_pool_size=10
hikari.idle_timeout=300000
hikari.max_lifetime=600000
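
As an illustration of how the business-rule keys might be read, the sketch below loads the two thresholds with java.util.Properties and falls back to the documented defaults. It is not the actual AppConfig: the class name is hypothetical, and the check that needs_review does not exceed auto_accept is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical threshold holder; the key names and defaults come from config.properties. */
final class ThresholdConfigSketch {
    final double autoAccept;
    final double needsReview;

    ThresholdConfigSketch(Properties props) {
        autoAccept = Double.parseDouble(
                props.getProperty("confidence.threshold.auto_accept", "0.85"));
        needsReview = Double.parseDouble(
                props.getProperty("confidence.threshold.needs_review", "0.60"));
        // Assumption: a review band only makes sense if needs_review <= auto_accept.
        if (needsReview > autoAccept) {
            throw new IllegalArgumentException("needs_review must not exceed auto_accept");
        }
    }

    /** Parses properties-formatted text, e.g. the contents of config.properties. */
    static ThresholdConfigSketch parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ThresholdConfigSketch(props);
    }
}
```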

Package Structure

src/main/java/com/luqon/boost/documentmanager/
├── DocumentManagerApp.java           # Entry point
├── DocumentManagerBoss.java          # Manages tenant processors (like DispatcherBoss)
├── TenantDocumentProcessor.java      # Per-tenant message handler (like PersonalDispatcher)
├── config/
│   └── AppConfig.java                # Configuration loader
├── db/
│   ├── TenantDatabaseManager.java    # Multi-tenant datasource initialization
│   └── CustomerDataSource.java       # Tenant database wrapper
├── model/
│   └── DocumentStatus.java           # Lifecycle states enum
└── service/
    └── DecisionEngine.java           # Business rule evaluation

Key Design Decisions

| Aspect | Choice | Rationale |
|---|---|---|
| Multi-tenant | Per-tenant processors | Isolation, independent scaling |
| Message filtering | JMS selectors | Efficient tenant message routing |
| Database | jOOQ with schema mapping | Type-safe SQL, multi-tenant support |
| Connection pooling | HikariCP | High performance, configurable |
| Ack mode | CLIENT_ACKNOWLEDGE | Don't lose messages on failure |
| State machine | Explicit transitions | Prevent invalid state changes |
| Proposals vs Decisions | Apply thresholds | Business rules in one place |

Explicit Non-Responsibilities

DocumentManager must NOT:

  • Perform PDF parsing or OCR (DocumentExtractor's job)
  • Interpret raw text directly (DocumentProfessor's job)
  • Apply heuristic extraction logic
  • Depend on user authentication (no JWT)
  • Act as a generic workflow engine

Dependencies

| Dependency | Version | Purpose |
|---|---|---|
| Qpid JMS Client | 2.10.0 | AMQP messaging |
| jOOQ | 3.19.6 | Type-safe SQL |
| HikariCP | 6.2.1 | Connection pooling |
| MySQL Connector | 9.2.0 | Database driver |
| SLF4J + Logback | 2.0.16 / 1.5.16 | Logging |
| com.luqon.json | 1h | JSON parsing/building |

Building

bash
cd BOOST.DocumentManager
mvn clean package -DskipTests

Produces a fat JAR at target/BOOST.DocumentManager-0.0.1-SNAPSHOT.jar.

Running

bash
java -jar target/BOOST.DocumentManager-0.0.1-SNAPSHOT.jar

The service will:

  1. Load configuration from config.properties
  2. Initialize database connections for all tenants
  3. Create per-tenant document processors
  4. Start consuming messages

Startup Log

Starting BOOST DocumentManager...
Initializing TenantDatabaseManager...
Registered tenant: Acme Corp (org-123) -> database: qeeping_acme
Registered tenant: Beta Inc (org-456) -> database: qeeping_beta
TenantDatabaseManager initialized with 2 tenants
Initializing DocumentManagerBoss...
Creating document processors for 2 tenants...
TenantDocumentProcessor started for tenant: Acme Corp (org-123)
TenantDocumentProcessor started for tenant: Beta Inc (org-456)
BOOST DocumentManager started successfully
Active tenant processors: 2

Integration with Other Services

BOOST.IntegrationsServer

When a document is saved:

java
// In DocumentService.createFromMailbox()
document.store();
Secretary.sendDocumentProcessCommand(document.getUuid(), organizationId);

BOOST.DocumentExtractor

DocumentManager sends:

cmd.document.extractText → DocumentExtractor

DocumentExtractor responds:

evt.document.text.extracted → DocumentManager

BOOST.DocumentProfessor

DocumentManager sends:

cmd.document.interpret → DocumentProfessor

DocumentProfessor responds:

evt.document.interpreted → DocumentManager

Database Tables

DocumentManager writes to these tables:

| Table | Operations |
|---|---|
| DOCUMENTS | Update status, documentTypeUUID, partyUUID, processingError |
| DOCUMENT_FIELDS | Insert/update extracted field values |

DOCUMENTS Updates

sql
UPDATE DOCUMENTS SET
  status = 'ready',
  documentTypeUUID = '...',
  partyUUID = '...',
  updatedAt = NOW(),
  updatedByUUID = 'SYSTEM'
WHERE UUID = '...'

DOCUMENT_FIELDS Inserts

sql
INSERT INTO DOCUMENT_FIELDS
  (UUID, documentUUID, fieldName, rawValue, convertedValue, confidence, ruleUUID, createdAt)
VALUES
  ('...', '...', 'invoiceNumber', 'INV-2025-001', 'INV-2025-001', 0.9, 'rule-002', NOW())

Error Handling

| Scenario | Behavior |
|---|---|
| Document not found | Log error, skip processing |
| Invalid state transition | Log warning, skip |
| Text extraction failed | Set status FAILED, publish evt.document.failed |
| Interpretation failed | Set status FAILED, publish evt.document.failed |
| S3 fetch failed | Record error, throw exception to trigger redelivery |
| Database error | Exception thrown, message NOT acknowledged |

Message Resilience

DocumentManager ensures no messages are lost through several mechanisms:

CLIENT_ACKNOWLEDGE Mode

Messages are only acknowledged after successful processing:

java
// In onMessage()
try {
    // Process message...
    message.acknowledge();  // Only after success
} catch (Exception e) {
    // Don't acknowledge - broker will redeliver
}

Redelivery Protection

To prevent "poison messages" (messages that fail forever), DocumentManager tracks redelivery count:

java
private static final int MAX_REDELIVERY_COUNT = 5;

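// JMSXDeliveryCount is 1 on the first delivery, so redeliveries = count - 1
// (this derivation is inferred from the Failure Flow, where the 6th attempt is dropped)
int redeliveryCount = message.getIntProperty("JMSXDeliveryCount") - 1;
if (redeliveryCount >= MAX_REDELIVERY_COUNT) {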
    log.error("Message exceeded max redelivery count, dropping");
    message.acknowledge();  // Prevent infinite loop
    return;
}

Failure Flow

1st attempt → Exception → No ack → Broker redelivers
2nd attempt → Exception → No ack → Broker redelivers
...
5th attempt → Exception → No ack → Broker redelivers
6th attempt → Max exceeded → Acknowledge → Message dropped (logged)
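
The flow above can be checked with a small simulation. This sketch assumes the redelivery count is derived as JMSXDeliveryCount minus one (the delivery count is 1 on the first attempt), which is what makes the sixth attempt the one that is dropped. The class and method names are hypothetical.

```java
/** Hypothetical simulation of the poison-message guard. */
final class RedeliveryGuardSketch {
    static final int MAX_REDELIVERY_COUNT = 5;

    /** deliveryCount mirrors JMSXDeliveryCount: 1 on the first delivery. */
    static boolean shouldDrop(int deliveryCount) {
        int redeliveryCount = deliveryCount - 1; // assumption, see lead-in
        return redeliveryCount >= MAX_REDELIVERY_COUNT;
    }

    /** Counts attempts for a message whose processing always fails. */
    static int attemptsUntilDropped() {
        int deliveryCount = 0;
        while (true) {
            deliveryCount++;
            if (shouldDrop(deliveryCount)) {
                return deliveryCount; // acknowledged and dropped on this attempt
            }
            // Processing fails here: no acknowledge, so the broker redelivers.
        }
    }
}
```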

Database State Preservation

Even when messages fail, document state is preserved:

  • recordProcessingError() saves the error message to the DOCUMENTS table
  • Document status is set to FAILED
  • Error details are available for debugging and retry

Temporary vs Permanent Failures

| Failure Type | Behavior | Rationale |
|---|---|---|
| S3 connection error | Throw exception (retry) | May be temporary network issue |
| Document not found | Log and skip | Permanent - retrying won't help |
| Invalid content type | Record error, no retry | Permanent - file won't change |
| Database error | Throw exception (retry) | May be temporary |
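
One way to express the split between temporary and permanent failures is with two exception types that decide whether the message is acknowledged. This is a sketch under stated assumptions, not the service's actual error model: both exception classes and the shouldAcknowledge method are hypothetical.

```java
/** Hypothetical failure policy: permanent failures are acknowledged, temporary ones are not. */
final class FailurePolicySketch {

    /** Hypothetical: a failure that may succeed on a later delivery (S3, database). */
    static final class TransientFailure extends RuntimeException {
        TransientFailure(String message) { super(message); }
    }

    /** Hypothetical: a failure that retrying cannot fix (missing document, bad content type). */
    static final class PermanentFailure extends RuntimeException {
        PermanentFailure(String message) { super(message); }
    }

    /** Returns true if the message should be acknowledged, false to leave it for redelivery. */
    static boolean shouldAcknowledge(Runnable processingStep) {
        try {
            processingStep.run();
            return true;  // success: acknowledge
        } catch (PermanentFailure e) {
            return true;  // record the error elsewhere, then acknowledge: a retry will not help
        } catch (TransientFailure e) {
            return false; // no acknowledge: a later delivery may succeed
        }
    }
}
```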

Idempotency Considerations

Processing the same message multiple times should be safe:

  • Status transitions are validated (can't go backward)
  • Database updates use optimistic locking where possible
  • External calls (S3, downstream services) should be idempotent
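
The first two points can be illustrated with a conditional update: the write succeeds only if the stored status still matches what the caller expected, so a redelivered message becomes a no-op. In the sketch below an in-memory map stands in for the DOCUMENTS table, and all names are hypothetical. A SQL counterpart might add the expected status to the WHERE clause of the UPDATE, which is likewise an assumption about the schema.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory stand-in for status updates guarded by an expected value. */
final class IdempotentUpdateSketch {
    private final ConcurrentMap<String, String> statusByDocument = new ConcurrentHashMap<>();

    void register(String documentUuid, String status) {
        statusByDocument.put(documentUuid, status);
    }

    /** Applies expected -> next only if the stored status still equals expected. */
    boolean transition(String documentUuid, String expected, String next) {
        // ConcurrentMap.replace(key, oldValue, newValue) is an atomic compare-and-set.
        return statusByDocument.replace(documentUuid, expected, next);
    }

    String statusOf(String documentUuid) {
        return statusByDocument.get(documentUuid);
    }
}
```

Handling the same evt.document.text.extracted message twice would call transition twice with the same arguments. The second call returns false and leaves the status untouched.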

Logging

Logs are written to:

  • Console (stdout)
  • logs/documentmanager.log (rolled daily, retained for 30 days)

Configure levels in logback.xml.