BOOST.DocumentManager
The authoritative domain component for documents in the BOOST platform. DocumentManager owns document lifecycle, business decisions, and persistence. It is the only component allowed to write authoritative document state to the tenant database.
Architecture
                   ┌─────────────────────────────────┐
                   │       DocumentManagerApp        │
                   │          (Entry Point)          │
                   └────────────────┬────────────────┘
                                    │
           ┌────────────────────────┼────────────────────────┐
           │                        │                        │
           ▼                        ▼                        ▼
┌─────────────────────┐  ┌─────────────────────┐  ┌─────────────────────┐
│  TenantDatabaseMgr  │  │ DocumentManagerBoss │  │      AppConfig      │
│    (DataSources)    │  │  (AMQP Connection)  │  │                     │
└──────────┬──────────┘  └──────────┬──────────┘  └─────────────────────┘
           │                        │
           │      ┌─────────────────┼─────────────────┐
           │      │                 │                 │
           ▼      ▼                 ▼                 ▼
┌─────────────────────┐  ┌─────────────────────┐  ┌─────────────────────┐
│   TenantDocument    │  │   TenantDocument    │  │   TenantDocument    │
│      Processor      │  │      Processor      │  │      Processor      │
│     (Tenant A)      │  │     (Tenant B)      │  │     (Tenant N)      │
└──────────┬──────────┘  └──────────┬──────────┘  └──────────┬──────────┘
           │                        │                        │
           └────────────────────────┼────────────────────────┘
                                    │
                          JMS Message Selector:
                       "tenantUUID = '{tenantId}'"
Pipeline Position
DocumentExtractor    (bytes → text)
        │
        ▼
DocumentProfessor    (text → proposals)
        │
        ▼
DocumentManager      (proposals → authoritative state)   ◄── You are here
Mental Model
| Component | Responsibility |
|---|---|
| DocumentExtractor | bytes → text |
| DocumentProfessor | text → meaning (proposals) |
| DocumentManager | meaning → authoritative state |
Message Flow
Input Commands
cmd.document.process
Triggers the document processing pipeline. Sent by BOOST.IntegrationsServer when a document is saved.
{
  "documentUUID": "742dba26-...",
  "organizationId": "org-123"
}
cmd.document.retry
Retries processing for failed or needs-review documents.
{
  "documentUUID": "742dba26-...",
  "organizationId": "org-123",
  "fromStep": "extraction"
}
| Field | Description |
|---|---|
| fromStep | Optional. "extraction" or "interpretation". Defaults to "extraction". |
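A minimal sketch of how the optional fromStep value could be mapped to a restart point, falling back to extraction when the field is absent. The RetryStep enum and its parse method are hypothetical names, not part of the documented code, and rejecting unknown values (rather than silently defaulting) is an assumption.

```java
import java.util.Locale;

/** Hypothetical restart point for cmd.document.retry (illustrative only). */
enum RetryStep {
    EXTRACTION,      // restart at text extraction
    INTERPRETATION;  // restart at interpretation

    /** Maps the optional "fromStep" field; null or blank defaults to EXTRACTION. */
    static RetryStep parse(String fromStep) {
        if (fromStep == null || fromStep.isBlank()) {
            return EXTRACTION;
        }
        return switch (fromStep.trim().toLowerCase(Locale.ROOT)) {
            case "extraction" -> EXTRACTION;
            case "interpretation" -> INTERPRETATION;
            // Assumption: unknown values are rejected instead of defaulted.
            default -> throw new IllegalArgumentException("Unknown fromStep: " + fromStep);
        };
    }
}
```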
Input Events
evt.document.text.extracted
Received from DocumentExtractor when text extraction completes.
{
  "documentUUID": "742dba26-...",
  "entityUUID": "5d16c711-...",
  "organizationId": "org-123",
  "success": true,
  "textRef": "742dba26-.../extracted-text/Invoice.txt",
  "textBucket": "boost.app.tenant1",
  "engineUsed": "PDFBOX",
  "processingTimeMs": 1234
}
evt.document.interpreted
Received from DocumentProfessor with interpretation proposals.
{
  "documentUUID": "742dba26-...",
  "entityUUID": "5d16c711-...",
  "organizationId": "org-123",
  "success": true,
  "documentTypeProposal": {
    "documentTypeUUID": "doctype-invoice-001",
    "confidence": 0.9
  },
  "partyProposals": [...],
  "fieldProposals": [...],
  "matchedRuleUUIDs": [...]
}
Output Events
evt.document.processing.started
Emitted when document processing begins.
{
  "documentUUID": "742dba26-...",
  "entityUUID": "5d16c711-...",
  "organizationId": "org-123",
  "status": "processing",
  "timestamp": 1706184000000
}
evt.document.ready
Emitted when document processing completes successfully.
{
  "documentUUID": "742dba26-...",
  "entityUUID": "5d16c711-...",
  "organizationId": "org-123",
  "status": "ready",
  "documentTypeUUID": "doctype-invoice-001",
  "partyUUID": "party-acme-001",
  "timestamp": 1706184000000
}
evt.document.needs.review
Emitted when manual review is required.
{
  "documentUUID": "742dba26-...",
  "entityUUID": "5d16c711-...",
  "organizationId": "org-123",
  "status": "needs_review",
  "reason": "Document type confidence below threshold (0.72). Multiple party candidates (2).",
  "timestamp": 1706184000000
}
evt.document.failed
Emitted when processing fails.
{
  "documentUUID": "742dba26-...",
  "entityUUID": "5d16c711-...",
  "organizationId": "org-123",
  "status": "failed",
  "errorMessage": "Text extraction failed: Unsupported content type",
  "timestamp": 1706184000000
}
Document Lifecycle
DocumentManager owns all document state transitions after ingestion.
RECEIVED  (set by IntegrationsServer)
   │
   ▼  cmd.document.process
PROCESSING
   │
   ▼  evt.document.text.extracted
TEXT_EXTRACTED
   │
   ▼  evt.document.interpreted
INTERPRETED
   │
   ├──► READY         (confidence ≥ auto_accept threshold)
   ├──► NEEDS_REVIEW  (confidence between thresholds)
   └──► FAILED        (error occurred)
| Status | Description |
|---|---|
| RECEIVED | Document ingested by IntegrationsServer |
| PROCESSING | DocumentManager has started processing |
| TEXT_EXTRACTED | Text extraction completed |
| INTERPRETED | Interpretation completed |
| READY | All processing complete, document ready for use |
| NEEDS_REVIEW | Manual review required (low confidence or conflicts) |
| FAILED | Processing failed |
Valid Transitions
| From | To |
|---|---|
| RECEIVED | PROCESSING |
| PROCESSING | TEXT_EXTRACTED, FAILED |
| TEXT_EXTRACTED | INTERPRETED, FAILED |
| INTERPRETED | READY, NEEDS_REVIEW, FAILED |
| NEEDS_REVIEW | READY, FAILED |
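The transition table can be encoded directly in the lifecycle enum so that every status update is checked in one place. This is a sketch that assumes DocumentStatus (listed under Package Structure) is a plain Java enum with these constant names; the allowedNext and canTransitionTo methods are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

/** Sketch of the lifecycle enum; constants follow the status table. */
enum DocumentStatus {
    RECEIVED, PROCESSING, TEXT_EXTRACTED, INTERPRETED, READY, NEEDS_REVIEW, FAILED;

    /** Allowed next states, mirroring the Valid Transitions table. */
    Set<DocumentStatus> allowedNext() {
        return switch (this) {
            case RECEIVED -> EnumSet.of(PROCESSING);
            case PROCESSING -> EnumSet.of(TEXT_EXTRACTED, FAILED);
            case TEXT_EXTRACTED -> EnumSet.of(INTERPRETED, FAILED);
            case INTERPRETED -> EnumSet.of(READY, NEEDS_REVIEW, FAILED);
            case NEEDS_REVIEW -> EnumSet.of(READY, FAILED);
            // The table lists no outgoing transitions for READY or FAILED.
            case READY, FAILED -> EnumSet.noneOf(DocumentStatus.class);
        };
    }

    boolean canTransitionTo(DocumentStatus next) {
        return allowedNext().contains(next);
    }
}
```

With this check in place, a late or duplicate event for a document that has already moved on is rejected, which corresponds to the "Invalid state transition" row under Error Handling.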
Business Decision Engine
DocumentManager applies business rules to interpretation proposals:
Confidence Thresholds
| Threshold | Default | Behavior |
|---|---|---|
| auto_accept | 0.85 | Proposals with confidence ≥ this are automatically accepted |
| needs_review | 0.60 | Proposals with confidence ≥ this but below auto_accept require review |
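A sketch of how the two thresholds could partition the confidence of a single proposal. The Verdict and ConfidenceThresholds names are hypothetical, and so is the handling of scores below needs_review, which the table above does not specify; here they are treated as rejected. The constructor's check that needs_review does not exceed auto_accept is also an assumption about sensible configuration.

```java
/** Hypothetical outcome of comparing one proposal's confidence to the thresholds. */
enum Verdict { ACCEPT, REVIEW, REJECT }

/** Sketch of threshold evaluation; defaults match the table (0.85 / 0.60). */
final class ConfidenceThresholds {
    private final double autoAccept;
    private final double needsReview;

    ConfidenceThresholds(double autoAccept, double needsReview) {
        if (needsReview < 0.0 || autoAccept > 1.0 || needsReview > autoAccept) {
            throw new IllegalArgumentException(
                "Expected 0 <= needs_review <= auto_accept <= 1, got "
                + needsReview + " / " + autoAccept);
        }
        this.autoAccept = autoAccept;
        this.needsReview = needsReview;
    }

    static ConfidenceThresholds defaults() {
        return new ConfidenceThresholds(0.85, 0.60);
    }

    Verdict evaluate(double confidence) {
        if (confidence >= autoAccept) {
            return Verdict.ACCEPT;   // confidence >= auto_accept
        }
        if (confidence >= needsReview) {
            return Verdict.REVIEW;   // needs_review <= confidence < auto_accept
        }
        return Verdict.REJECT;       // assumption: below needs_review
    }
}
```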
Decision Logic
- Document Type: Accept if confidence ≥ auto_accept threshold
- Party Selection: Pick highest confidence candidate if ≥ auto_accept
- Field Extraction: Accept fields individually based on confidence
- Multiple Candidates: Flag for review if multiple high-confidence party candidates
Review Triggers
- Document type confidence below auto_accept threshold
- Party confidence below auto_accept threshold
- Multiple party candidates with similar confidence
- Required fields missing or low confidence
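The document type and party triggers can be collected into a human-readable reason like the one in the evt.document.needs.review example. This sketch covers only those two triggers (not required fields). The PartyCandidate, ReviewOutcome and ReviewCheck types are hypothetical, as is the 0.05 margin for "similar confidence", since this README does not define how close two candidates must be. Treating an empty candidate list as a review trigger is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical shape of one entry in partyProposals. */
record PartyCandidate(String partyUUID, double confidence) {}

/** Auto-selected party (or null) plus any review reasons. */
record ReviewOutcome(String partyUUID, List<String> reasons) {
    boolean needsReview() { return !reasons.isEmpty(); }
    String reason() { return String.join(" ", reasons); }
}

final class ReviewCheck {
    // Assumption: candidates within this margin of the best score count as "similar".
    static final double SIMILARITY_MARGIN = 0.05;

    private ReviewCheck() {}

    static ReviewOutcome evaluate(double docTypeConfidence,
                                  List<PartyCandidate> parties,
                                  double autoAccept) {
        List<String> reasons = new ArrayList<>();
        if (docTypeConfidence < autoAccept) {
            reasons.add(String.format(Locale.ROOT,
                "Document type confidence below threshold (%.2f).", docTypeConfidence));
        }
        if (parties.isEmpty()) {
            reasons.add("No party candidates."); // assumption: treated as a trigger
            return new ReviewOutcome(null, List.copyOf(reasons));
        }
        PartyCandidate best = parties.stream()
            .max(Comparator.comparingDouble(PartyCandidate::confidence))
            .orElseThrow();
        long similar = parties.stream()
            .filter(p -> p.confidence() >= best.confidence() - SIMILARITY_MARGIN)
            .count();
        boolean partyClear = true;
        if (similar > 1) {
            reasons.add("Multiple party candidates (" + similar + ").");
            partyClear = false;
        }
        if (best.confidence() < autoAccept) {
            reasons.add(String.format(Locale.ROOT,
                "Party confidence below threshold (%.2f).", best.confidence()));
            partyClear = false;
        }
        // Pick the highest-confidence party only when no party trigger fired.
        return new ReviewOutcome(partyClear ? best.partyUUID() : null, List.copyOf(reasons));
    }
}
```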
Multi-Tenant Architecture
Similar to BOOST.DeliveryBoy, DocumentManager uses per-tenant processors:
// Main entry point
DocumentManagerApp app = new DocumentManagerApp();
app.start();
// Inside start():
List<CustomerDataSource> tenants = dbManager.getCustomerDataSources();
for (CustomerDataSource tenant : tenants) {
    documentManagerBoss.createProcessorForTenant(tenant);
}
Message Filtering
Each TenantDocumentProcessor uses JMS message selectors to filter by tenantUUID:
String selector = "tenantUUID = '" + organizationId + "'";
MessageConsumer consumer = session.createConsumer(queue, selector);
This ensures each tenant's processor only receives messages for that tenant.
When publishing messages, DocumentManager sets the tenantUUID property:
private void sendMessage(MessageProducer producer, String json) throws JMSException {
    TextMessage message = session.createTextMessage(json);
    // Tag every outgoing message so tenant-filtered consumers can select it
    message.setStringProperty("tenantUUID", organizationId);
    producer.send(message);
}
For detailed information about tenant isolation across all services, see Multi-Tenant Messaging.
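The consumer selector is built by string concatenation, so an organizationId containing a single quote would break the expression. JMS message selector syntax represents a single quote inside a string literal by doubling it; the helper below applies that rule. TenantSelector is a hypothetical name, not part of the documented code, and if tenant IDs are always UUIDs this is purely defensive.

```java
/** Hypothetical helper for building the per-tenant JMS message selector. */
final class TenantSelector {
    private TenantSelector() {}

    /** Returns tenantUUID = '<id>' with embedded single quotes doubled, per JMS selector syntax. */
    static String forTenant(String organizationId) {
        if (organizationId == null || organizationId.isEmpty()) {
            throw new IllegalArgumentException("organizationId is required");
        }
        return "tenantUUID = '" + organizationId.replace("'", "''") + "'";
    }
}
```

A processor would then call session.createConsumer(queue, TenantSelector.forTenant(organizationId)) in place of the inline concatenation.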
Configuration
Configuration is loaded from config.properties.
# AMQP Broker
broker_url=amqp://172.16.200.32:5672
# Input queues
queue.cmd.document.process=cmd.document.process
queue.cmd.document.retry=cmd.document.retry
queue.evt.document.text.extracted=evt.document.text.extracted
queue.evt.document.interpreted=evt.document.interpreted
# Output queues
queue.evt.document.processing.started=evt.document.processing.started
queue.evt.document.ready=evt.document.ready
queue.evt.document.needs.review=evt.document.needs.review
queue.evt.document.failed=evt.document.failed
queue.evt.document.updated=evt.document.updated
# Commands to other services
queue.cmd.document.extractText=cmd.document.extractText
queue.cmd.document.interpret=cmd.document.interpret
# Master database for tenant lookup
master.jdbc.url=jdbc:mysql://10.245.10.35:3306/qeeping?useSSL=false&serverTimezone=UTC
master.jdbc.username=qeeping-master
master.jdbc.password=****
# Business rules
confidence.threshold.auto_accept=0.85
confidence.threshold.needs_review=0.60
# Connection pool settings
hikari.minimum_idle=2
hikari.maximum_pool_size=10
hikari.idle_timeout=300000
hikari.max_lifetime=600000
Package Structure
src/main/java/com/luqon/boost/documentmanager/
├── DocumentManagerApp.java # Entry point
├── DocumentManagerBoss.java # Manages tenant processors (like DispatcherBoss)
├── TenantDocumentProcessor.java # Per-tenant message handler (like PersonalDispatcher)
├── config/
│ └── AppConfig.java # Configuration loader
├── db/
│ ├── TenantDatabaseManager.java # Multi-tenant datasource initialization
│ └── CustomerDataSource.java # Tenant database wrapper
├── model/
│ └── DocumentStatus.java # Lifecycle states enum
└── service/
  └── DecisionEngine.java # Business rule evaluation
Key Design Decisions
| Aspect | Choice | Rationale |
|---|---|---|
| Multi-tenant | Per-tenant processors | Isolation, independent scaling |
| Message filtering | JMS selectors | Efficient tenant message routing |
| Database | jOOQ with schema mapping | Type-safe SQL, multi-tenant support |
| Connection pooling | HikariCP | High performance, configurable |
| Ack mode | CLIENT_ACKNOWLEDGE | Don't lose messages on failure |
| State machine | Explicit transitions | Prevent invalid state changes |
| Proposals vs Decisions | Apply thresholds | Business rules in one place |
Explicit Non-Responsibilities
DocumentManager must NOT:
- Perform PDF parsing or OCR (DocumentExtractor's job)
- Interpret raw text directly (DocumentProfessor's job)
- Apply heuristic extraction logic
- Depend on user authentication (no JWT)
- Act as a generic workflow engine
Dependencies
| Dependency | Version | Purpose |
|---|---|---|
| Qpid JMS Client | 2.10.0 | AMQP messaging |
| jOOQ | 3.19.6 | Type-safe SQL |
| HikariCP | 6.2.1 | Connection pooling |
| MySQL Connector | 9.2.0 | Database driver |
| SLF4J + Logback | 2.0.16 / 1.5.16 | Logging |
| com.luqon.json | 1h | JSON parsing/building |
Building
cd BOOST.DocumentManager
mvn clean package -DskipTests
Produces a fat JAR at target/BOOST.DocumentManager-0.0.1-SNAPSHOT.jar.
Running
java -jar target/BOOST.DocumentManager-0.0.1-SNAPSHOT.jar
The service will:
- Load configuration from config.properties
- Initialize database connections for all tenants
- Create per-tenant document processors
- Start consuming messages
Startup Log
Starting BOOST DocumentManager...
Initializing TenantDatabaseManager...
Registered tenant: Acme Corp (org-123) -> database: qeeping_acme
Registered tenant: Beta Inc (org-456) -> database: qeeping_beta
TenantDatabaseManager initialized with 2 tenants
Initializing DocumentManagerBoss...
Creating document processors for 2 tenants...
TenantDocumentProcessor started for tenant: Acme Corp (org-123)
TenantDocumentProcessor started for tenant: Beta Inc (org-456)
BOOST DocumentManager started successfully
Active tenant processors: 2
Integration with Other Services
BOOST.IntegrationsServer
When a document is saved:
// In DocumentService.createFromMailbox()
document.store();
Secretary.sendDocumentProcessCommand(document.getUuid(), organizationId);
BOOST.DocumentExtractor
DocumentManager sends:
cmd.document.extractText → DocumentExtractor
DocumentExtractor responds:
evt.document.text.extracted → DocumentManager
BOOST.DocumentProfessor
DocumentManager sends:
cmd.document.interpret → DocumentProfessor
DocumentProfessor responds:
evt.document.interpreted → DocumentManager
Database Tables
DocumentManager writes to these tables:
| Table | Operations |
|---|---|
| DOCUMENTS | Update status, documentTypeUUID, partyUUID, processingError |
| DOCUMENT_FIELDS | Insert/update extracted field values |
DOCUMENTS Updates
UPDATE DOCUMENTS SET
    status = 'ready',
    documentTypeUUID = '...',
    partyUUID = '...',
    updatedAt = NOW(),
    updatedByUUID = 'SYSTEM'
WHERE UUID = '...';
DOCUMENT_FIELDS Inserts
INSERT INTO DOCUMENT_FIELDS
    (UUID, documentUUID, fieldName, rawValue, convertedValue, confidence, ruleUUID, createdAt)
VALUES
    ('...', '...', 'invoiceNumber', 'INV-2025-001', 'INV-2025-001', 0.9, 'rule-002', NOW());
Error Handling
| Scenario | Behavior |
|---|---|
| Document not found | Log error, skip processing |
| Invalid state transition | Log warning, skip |
| Text extraction failed | Set status FAILED, publish evt.document.failed |
| Interpretation failed | Set status FAILED, publish evt.document.failed |
| S3 fetch failed | Record error, throw exception to trigger redelivery |
| Database error | Exception thrown, message NOT acknowledged |
Message Resilience
DocumentManager guards against message loss through several mechanisms:
CLIENT_ACKNOWLEDGE Mode
Messages are only acknowledged after successful processing:
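// In onMessage(), on a session created with CLIENT_ACKNOWLEDGE
try {
    // Process message...
    message.acknowledge(); // Only after success
} catch (Exception e) {
    // Don't acknowledge. acknowledge() covers every message the session has consumed,
    // so recover the session to get this message redelivered before a later ack swallows it.
    try {
        session.recover();
    } catch (JMSException recoverError) {
        log.error("Session recover failed", recoverError);
    }
}
Redelivery Protection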
To prevent "poison messages" (messages that fail forever), DocumentManager tracks redelivery count:
private static final int MAX_REDELIVERY_COUNT = 5;
// JMSXDeliveryCount is 1 on the first delivery, so redeliveries = deliveryCount - 1
int redeliveryCount = message.getIntProperty("JMSXDeliveryCount") - 1;
if (redeliveryCount >= MAX_REDELIVERY_COUNT) {
    log.error("Message exceeded max redelivery count, dropping");
    message.acknowledge(); // Prevent infinite loop
    return;
}
Failure Flow
1st attempt → Exception → No ack → Broker redelivers
2nd attempt → Exception → No ack → Broker redelivers
...
5th attempt → Exception → No ack → Broker redelivers
6th attempt → Max exceeded → Acknowledge → Message dropped (logged)
Database State Preservation
Even when messages fail, document state is preserved:
- recordProcessingError() saves the error message to the DOCUMENTS table
- Document status is set to FAILED
- Error details are available for debugging and retry
Temporary vs Permanent Failures
| Failure Type | Behavior | Rationale |
|---|---|---|
| S3 connection error | Throw exception (retry) | May be a temporary network issue |
| Document not found | Log and skip | Permanent - retrying won't help |
| Invalid content type | Record error, no retry | Permanent - file won't change |
| Database error | Throw exception (retry) | May be temporary |
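The redelivery limit and the temporary-versus-permanent split can be expressed as two small decisions: whether to drop a message before processing, and what to do after a failure. This sketch assumes the JMSXDeliveryCount value (1 on first delivery) is passed in and that mapping each exception to a kind is left to the caller; FailureKind, AckAction and FailurePolicy are hypothetical names.

```java
/** Hypothetical classification of a failure, following the table above. */
enum FailureKind {
    TEMPORARY,  // e.g. S3 connection error, database error: retrying may help
    PERMANENT   // e.g. document not found, invalid content type: retrying won't help
}

/** What the listener should do with the current message after a failure. */
enum AckAction {
    ACKNOWLEDGE,        // stop retrying; any error has already been recorded or logged
    RECOVER_FOR_RETRY   // leave unacknowledged and call session.recover() for redelivery
}

final class FailurePolicy {
    static final int MAX_REDELIVERY_COUNT = 5;

    private FailurePolicy() {}

    /** Pre-processing check; deliveryCount is JMSXDeliveryCount (1 on first delivery). */
    static boolean exceedsRedeliveryLimit(int deliveryCount) {
        int redeliveries = deliveryCount - 1;   // the 6th delivery is the 5th redelivery
        return redeliveries >= MAX_REDELIVERY_COUNT;
    }

    /** Post-failure decision based on whether a retry could succeed. */
    static AckAction onFailure(FailureKind kind) {
        return kind == FailureKind.TEMPORARY
            ? AckAction.RECOVER_FOR_RETRY
            : AckAction.ACKNOWLEDGE;
    }
}
```

Under this policy attempts 1 through 5 are processed and the 6th delivery is dropped before processing, matching the failure flow described earlier.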
Idempotency Considerations
Processing the same message multiple times should be safe:
- Status transitions are validated (can't go backward)
- Database updates use optimistic locking where possible
- External calls (S3, downstream services) should be idempotent
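One way to make repeated deliveries harmless is to apply each status change as a compare-and-set: the update succeeds only if the document is still in the expected prior state, so a duplicate event finds the status already advanced and becomes a no-op. The in-memory sketch below uses ConcurrentHashMap.replace(key, oldValue, newValue) to stand in for a conditional UPDATE ... WHERE UUID = ? AND status = ?; the Stage enum and StatusStore class are hypothetical and do not reflect the actual jOOQ code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Minimal lifecycle subset for the sketch. */
enum Stage { RECEIVED, PROCESSING, TEXT_EXTRACTED, INTERPRETED }

/** Hypothetical in-memory stand-in for the DOCUMENTS status column. */
final class StatusStore {
    private final ConcurrentMap<String, Stage> statusByDocument = new ConcurrentHashMap<>();

    void put(String documentUUID, Stage status) {
        statusByDocument.put(documentUUID, status);
    }

    Stage get(String documentUUID) {
        return statusByDocument.get(documentUUID);
    }

    /**
     * Atomically moves a document from expected to next. Returns false and changes
     * nothing if the document is missing or in any other state, e.g. because a
     * duplicate message already applied this transition.
     */
    boolean advance(String documentUUID, Stage expected, Stage next) {
        return statusByDocument.replace(documentUUID, expected, next);
    }
}
```

In SQL terms the same idea is a conditional UPDATE whose affected-row count (0 or 1) tells the handler whether this delivery did the work.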
Logging
Logs written to:
- Console (stdout)
- logs/documentmanager.log (rolling daily, 30 days retention)
Configure levels in logback.xml.