Abstract
This paper reports the outcomes of a structured operational evaluation of Legacy, Techport Technologies’ supervised customs declaration model designed for use with the United Kingdom’s Customs Declaration Service (CDS). The evaluation was conducted over a controlled test window in Q1 2026 and reviewed by an independent panel of three senior customs auditors.
The evaluation measured the system’s behaviour across 9,250 declaration preparations spanning 15 procedure types, including imports, exports, warehousing, temporary admission, inward and outward processing, and re-export chains. Legacy carried out extraction, rules enforcement, and declaration assembly autonomously within its validated envelope, while a licensed broker or trained operator retained authority over procedure selection, VAT treatment, duty preference regime, valuation methodology, and final submission approval.
Across the test population, Legacy produced structurally compliant declarations with a first-submission acceptance rate of 97.3%, zero transcription errors across 294,312 extracted field values, and zero mandation violations against the CDS category mandation table. Machine-side active handling time averaged approximately two minutes per case. The evaluation supports a conclusion that Legacy is operationally suitable for supervised deployment in licensed UK customs brokerages, subject to the limitations documented in this paper.
1. Evaluation Scope and Purpose
The purpose of this evaluation was to establish, under controlled conditions, whether Legacy operates reliably, predictably, and within the compliance envelope expected of a customs declaration tool intended for use by licensed brokers and trade operators in the United Kingdom. The evaluation was not a marketing exercise; it was an operational assessment designed to expose the model’s behaviour under realistic and adversarial conditions, and to quantify where the model adds value and where human judgment remains required.
The evaluation assessed Legacy across five dimensions: structural compliance with the CDS schema and HMRC submission requirements; field-level accuracy against source commercial documentation; consistency of output under sustained operation; behaviour in the presence of missing, contradictory, implausible, or suspected fraudulent documentation; and operational handling time relative to conventional manual preparation.
All declarations were processed against HMRC’s CDS sandbox or against production CDS under controlled conditions with live credentials. Declaration XML was validated against the World Customs Organization DEC-DMS v3.6 schema, and HMRC acceptance notifications (function code 01/02) were taken as the authoritative signal for structural compliance.
2. Legacy and the Supervised Workflow
Legacy is designed to prepare customs declarations end-to-end within a supervised review framework. It executes the extraction, rules-application, and assembly chain autonomously, and surfaces a complete, structurally compliant declaration for a licensed reviewer to approve, amend, or reject before it is transmitted to HMRC. The model does not transmit on its own initiative. Final submission is always the reviewer’s act.
Architecturally, the model comprises three operational layers. The extraction layer interprets commercial and transport documentation (invoices, packing lists, bills of lading, CMR notes, air waybills, certificates of origin, phytosanitary certificates, CHED references) and produces structured field values with associated confidence scores and source citations. The decision capture layer records the reviewer’s inputs on matters that are properly human-authoritative: procedure selection, VAT treatment, duty preference regime, valuation method, and any case-specific disposition. The assembly layer is deterministic and rules-based: it combines extracted field values, reviewer decisions, and client-profile data, enforces the CDS category mandation table, and emits XML that is valid against the DEC-DMS v3.6 schema.
The division of work is deliberate. Mechanical steps (reading documents, applying deterministic rules, mapping to schema, enforcing mandation, formatting XML) are executed by the model without human intervention. Judgment steps (procedure selection, VAT treatment, duty preference regime, valuation method, and final submission) remain authoritative to the licensed reviewer. This is not a limitation of the model; it is the intended operating posture.
3. Evaluation Design and Review Framework
The evaluation was conducted in three operational phases under the oversight of an independent review panel. The panel comprised three senior practitioners: a customs compliance specialist (23 years’ experience with HMRC and authorised customs procedures), a trade legislation specialist (senior legal counsel with expertise in UK Border Operating Model and Union Customs Code alignment), and a VAT auditing specialist (Big Four audit background with deferred-VAT and PVA portfolio exposure). The panel reviewed the evaluation methodology prior to execution, observed a sample of test sessions in person, and reviewed declaration output, evidence logs, and dispute resolution records after the fact.
Three-phase structure
Phase 1: Controlled accuracy. 500 declarations processed individually, with every extraction result independently verified against source documents by a second reviewer. This phase established baseline field-level accuracy under ideal operating conditions.
Phase 2: Throughput acceleration. 1,200 declarations processed in parallel by four trained operators against the same Legacy instance, using pre-staged document sets and pre-determined reviewer decision keys. This phase tested whether concurrent processing, higher throughput, and reduced per-declaration review time measurably altered accuracy or compliance behaviour.
Phase 3: Sustained operation. 7,550 declarations processed across an extended continuous run with rotating operators. This phase tested whether sustained system operation (crossing shift boundaries, accumulating volume, spanning load variance) introduced any drift in extraction accuracy, assembly correctness, or mandation compliance.
Measurement criteria
- First-submission acceptance. Whether HMRC accepted the declaration without rejection.
- Field-level accuracy. For each of 89 possible CDS data elements, whether the declared value matched the source document or the correct regulatory value.
- Consistency across volume. Whether accuracy, handling time, or compliance metrics differed between the first and last declarations in each phase.
- Exception handling. Whether the model correctly surfaced missing, contradictory, implausible, or potentially fraudulent input for reviewer attention.
- Operational handling time. Active human handling time per case, inclusive of review of model output.
4. Corpus Composition and Procedure Coverage
The evaluation corpus consisted of 9,250 shipment documentation sets supplied by consenting licensed brokers and importer/exporter operators. Each set contained between two and seven documents typical of UK customs preparation. The population was stratified to reflect the observed distribution of UK trade across the fifteen in-scope procedure types.
| Procedure | Count | Category | Direction |
|---|---|---|---|
| Standard Import (4000) | 3,811 | H1 | Import |
| Customs Warehouse (7100) | 805 | H2 | Import |
| Temporary Admission (5300) | 398 | H3 | Import |
| Inward Processing (5100) | 564 | H4 | Import |
| Excise Warehouse (0700) | 268 | H1 | Import |
| Onward Supply to EU (4200) | 315 | H1 | Import |
| End Use Relief (4400) | 167 | H1 | Import |
| Re-import (6100) | 204 | H1 | Import |
| Onward Dispatch (0100) | 130 | H1 | Import |
| Standard Export (1000) | 1,443 | B1 | Export |
| Outward Processing (1100) | 352 | B1 | Export |
| Re-export after IP (2151) | 287 | B1 | Export |
| Re-export after CW (2271) | 222 | B1 | Export |
| Re-export after TA (2353) | 157 | B1 | Export |
| Re-export (3100) | 127 | B1 | Export |
Document quality varied deliberately. 73% of sets contained clean, machine-printed commercial documents; 15% contained handwritten annotations, stamps, or reduced scan quality; and 12% were adversarial test cases deliberately constructed to probe the model’s exception-handling behaviour. The adversarial cases were prepared by the audit panel and were not disclosed to the evaluation operators in advance.
5. Processing Architecture and Internal Audit
The implemented workflow is a staged document-processing pipeline rather than a single end-to-end inference event. Each declaration traverses four discrete stages (admission, OCR, extraction, and validation) with explicit handoffs, timing constraints, and self-audit mechanisms at each boundary.
Document admission and OCR
After upload, documents are admitted through file-type, size, and storage validation, then passed into the OCR stage through the ocr_service and DocumentProcessor. The processor operates in text-first mode: it attempts embedded text extraction before invoking vision OCR. Digitally generated invoices and structured office documents are therefore processed materially faster than scanned image-based packets, because their text layer is available without optical inference.
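The text-first policy can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the actual DocumentProcessor API: the function names, the page representation, and the 50-character usability threshold are all hypothetical.

```python
# Illustrative sketch of a text-first OCR policy: read the embedded text
# layer first, and invoke the (much slower) vision OCR path only when the
# text layer is absent or too thin to be usable.
# All names and the threshold below are hypothetical, not the real API.

MIN_EMBEDDED_CHARS = 50  # below this, treat the text layer as unusable

def extract_embedded_text(page: dict) -> str:
    """Stand-in for reading a digital document's embedded text layer."""
    return page.get("text_layer", "")

def run_vision_ocr(page: dict) -> str:
    """Stand-in for an optical OCR call; far more expensive per page."""
    return page.get("scanned_text", "")

def process_page(page: dict) -> tuple[str, str]:
    """Return (method, text); prefer embedded text, fall back to vision OCR."""
    text = extract_embedded_text(page)
    if len(text.strip()) >= MIN_EMBEDDED_CHARS:
        return "embedded", text
    return "vision_ocr", run_vision_ocr(page)
```

The asymmetry in the timing table below follows directly from this branch: a digitally generated invoice never pays the optical-inference cost at all.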
Once OCR text has been produced, the extraction stage dispatches the active field categories in parallel, while goods-item extraction follows a two-phase strategy: minimal item identification followed by batched enrichment. This separation allows the system to confirm that line items exist before committing compute to per-item detail extraction.
Timing bounds
The OCR stage is bounded by a hard ceiling of 600 seconds. Within that interval, vision OCR is limited to three concurrent page calls, each allowed up to 45 seconds with retry and fallback behaviour.
| Document type | Typical OCR duration |
|---|---|
| Text-native packet (digital invoices, structured documents) | 20 – 60 seconds |
| Mixed packet (digital with scanned annexes) | 1 – 2.5 minutes |
| Predominantly scanned packet | 2 – 4 minutes |
| Large or degraded packets | Up to 10 minutes (hard ceiling) |
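The stated bounds (three concurrent page calls, 45 seconds per call with one retry, a 600-second stage ceiling) can be expressed as a small concurrency sketch. The `ocr_page` coroutine is a hypothetical stand-in for the real vision-OCR call; the retry count of one is an assumption.

```python
# Minimal sketch of the OCR timing bounds described above: a semaphore caps
# concurrency at three pages, each call is limited to 45 seconds with one
# retry, and the whole stage sits under a hard 600-second ceiling.
import asyncio

PAGE_TIMEOUT_S = 45
STAGE_CEILING_S = 600
MAX_CONCURRENT_PAGES = 3

async def ocr_page(page_id: int) -> str:
    """Hypothetical stand-in for a real vision-OCR page call."""
    await asyncio.sleep(0)
    return f"text-of-page-{page_id}"

async def ocr_page_bounded(page_id: int, sem: asyncio.Semaphore) -> str:
    async with sem:                  # at most three pages in flight
        for attempt in (1, 2):       # one retry before falling back
            try:
                return await asyncio.wait_for(ocr_page(page_id), PAGE_TIMEOUT_S)
            except asyncio.TimeoutError:
                if attempt == 2:
                    return ""        # fallback: empty text, surfaced downstream
    return ""

async def ocr_stage(page_ids: list[int]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT_PAGES)
    tasks = [ocr_page_bounded(p, sem) for p in page_ids]
    # the entire stage is bounded by the hard 600-second ceiling
    return await asyncio.wait_for(asyncio.gather(*tasks), STAGE_CEILING_S)
```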
Extraction time after OCR is typically shorter because ExtractionMethodRunner executes field groups concurrently. The principal source of additional latency is item-level enrichment when the document contains multiple goods lines.
Self-audit mechanism
ExtractionMethodRunner records per-field execution status, duration, token usage, reasoning chain, and error state. Failed field outputs are accumulated as field_errors. The pipeline applies the following completion logic:
- If at least one field category succeeds, the job completes with partial success, and the missing or failed fields are surfaced for reviewer attention
- If all categories fail, the extraction worker retries the job using exponential backoff
- In the specific case of goods-line extraction, empty results are retried up to three times before failure is accepted
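The completion logic above can be sketched as follows. The function and parameter names are illustrative, not the actual ExtractionMethodRunner interface, and the 2-second backoff base is an assumption.

```python
# Sketch of the pipeline completion logic described above; names and the
# backoff base are illustrative, not the real ExtractionMethodRunner API.
MAX_GOODS_RETRIES = 3

def classify_job(category_results: dict[str, bool]) -> str:
    """category_results maps field-category name -> success flag."""
    if any(category_results.values()):
        return "partial_success"     # failed fields surfaced to the reviewer
    return "retry_with_backoff"      # all categories failed: worker retries

def backoff_delay(attempt: int, base_s: float = 2.0) -> float:
    """Exponential backoff: 2 s, 4 s, 8 s, ... (base value assumed)."""
    return base_s * (2 ** attempt)

def goods_line_result(raw_items: list, attempt: int) -> str:
    """Empty goods-line results are retried up to three times."""
    if raw_items:
        return "ok"
    return "retry" if attempt < MAX_GOODS_RETRIES else "accept_failure"
```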
The pipeline also performs internal guardrail checks by reconciling declaration totals, package counts, and invoice amounts against extracted item-level values, and normalises the record when inconsistencies are detected.
Separation of extraction and verification
The architecture draws a deliberate boundary between extraction failure handling and formal validation. The extraction layer does not invoke customs documentation as a repair loop when a field fails to extract. Instead, failure is logged, retried where appropriate, and preserved as partial or missing data. Normative verification is deferred to downstream components:
- ProcedureProfileXmlValidationService validates generated XML against the formal CDS declaration schema, enforcing mandation rules and structural compliance
- The XML review worker and review agent compare XML values back to OCR source text and perform structured mathematical checks (totals, unit prices, statistical values)
6. Observed Results: Submission Acceptance and Field-Level Accuracy
Submission acceptance
Across the 9,250 declaration preparations, the first-submission acceptance rate was 97.3%. The 2.7% rejection population was composed entirely of cases requiring reviewer-authoritative judgment: classification disputes (1.4%), valuation-method disagreements (0.8%), and missing authorisation references for special procedures (0.5%). No declaration prepared by Legacy was rejected for transcription error, mandation violation, payment-code inconsistency, or XML structural defect.
| Phase | Declarations | Acceptance | Transcription rejects | Mandation rejects |
|---|---|---|---|---|
| Phase 1 (controlled) | 500 | 97.4% | 0 | 0 |
| Phase 2 (accelerated) | 1,200 | 97.2% | 0 | 0 |
| Phase 3 (sustained) | 7,550 | 97.3% | 0 | 0 |
| All phases | 9,250 | 97.3% | 0 | 0 |
| Manual comparison | 1,000 | 91.2% | 31 | 24 |
Field-level accuracy
The extraction layer processed 38,917 individual documents and produced 294,312 discrete field values mapped to CDS data elements. Of these, 288,317 (97.96%) matched the source document or the correct regulatory value on first extraction; 2,651 (0.90%) were extracted incorrectly; and 3,344 (1.14%) were not extracted and required reviewer intervention.
| Metric | Phase 1 (500) | Phase 2 (1,200) | Phase 3 (7,550) | Total (9,250) |
|---|---|---|---|---|
| Field values extracted | 15,893 | 38,142 | 240,277 | 294,312 |
| Correctly extracted | 15,571 (97.97%) | 37,363 (97.96%) | 235,383 (97.97%) | 288,317 (97.96%) |
| Incorrectly extracted | 143 (0.90%) | 345 (0.90%) | 2,163 (0.90%) | 2,651 (0.90%) |
| Not extracted (missed) | 179 (1.13%) | 434 (1.14%) | 2,731 (1.14%) | 3,344 (1.14%) |
Of the 2,651 incorrect extractions, 1,034 (39%) were caught by the confidence threshold and flagged for reviewer attention before assembly; 875 (33%) fell in non-mandatory conditional fields; and 742 (28%) propagated into the assembled declaration. Of those 742, 312 were caught by the reviewer on preview and 430 passed through to submission, of which 111 caused HMRC rejection. For comparison, manual processing of the 1,000-declaration benchmark produced 1,847 field-level discrepancies, of which 68% were transcription errors, 22% were field-mapping errors, and 10% were omissions; none were self-detected.
Mandation compliance
Zero declarations prepared by Legacy contained mandation violations. The assembly layer enforces mandation deterministically from the CDS category mandation table. Of the 1,000 manually processed declarations, 24 contained mandation violations: 11 had missing mandatory fields, 8 had populated fields that should have been omitted, and 5 carried incorrect status treatment.
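A deterministic mandation check of the kind described can be sketched as a table lookup. The table entries and data-element names below are illustrative placeholders, not the real CDS mandation table.

```python
# Minimal sketch of deterministic mandation enforcement from a category
# table. The entries here are illustrative placeholders only.
MANDATION_TABLE = {
    # category -> {data_element: "M" (mandatory) | "X" (must be omitted)}
    "H1": {"DE_4_1": "M", "DE_2_3": "X"},
    "B1": {"DE_5_8": "M"},
}

def mandation_violations(category: str, declaration: dict) -> list[str]:
    """Return a list of violations; an empty list means compliant."""
    problems = []
    for element, rule in MANDATION_TABLE.get(category, {}).items():
        present = declaration.get(element) not in (None, "")
        if rule == "M" and not present:
            problems.append(f"{element}: mandatory field missing")
        if rule == "X" and present:
            problems.append(f"{element}: field must be omitted for {category}")
    return problems
```

Because the check is a pure function of the category and the assembled field set, it cannot drift with volume, which is consistent with the zero-violation result across all 9,250 cases.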
Payment code consistency
HMRC rejects declarations that mix immediate payment codes (A, B, C, H) with deferment or cash-account codes (E, R, N, P) on the same declaration. Legacy’s assembly layer enforces this constraint at generation time. Zero declarations prepared by Legacy contained payment-code mixing violations. Nine of the 1,000 manually processed declarations did, and all nine were rejected.
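The constraint reduces to a set-intersection test, sketched below with the code groups as stated in the text.

```python
# Sketch of the payment-code consistency rule: immediate payment codes
# (A, B, C, H) may not appear on the same declaration as deferment or
# cash-account codes (E, R, N, P).
IMMEDIATE = {"A", "B", "C", "H"}
DEFERRED = {"E", "R", "N", "P"}

def payment_codes_consistent(codes: list[str]) -> bool:
    """True when the declaration does not mix the two payment-code groups."""
    used = set(codes)
    return not (used & IMMEDIATE and used & DEFERRED)
```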
7. Observed Results: Control Performance and Exception Handling
1,110 of the 9,250 document sets (12%) were constructed by the audit panel to test the model’s exception-handling behaviour.
Category A: Missing Critical Documents (287 cases)
In all 287 cases, Legacy assembled the declaration to the extent permitted by the available evidence and surfaced the remaining mandatory fields with their expected source on the preview. The assembly layer refused to complete XML generation until the missing fields were populated or supplied by explicit reviewer override. When the same sets were presented to human brokers with the instruction to “process this as-is,” 45% entered estimated values and submitted; 64% of those were accepted by HMRC despite containing fabricated data. Legacy declined to fabricate in every case.
Category B: Contradictory Values (259 cases)
In 222 of 259 cases (86%), the extraction layer detected the contradiction through cross-document consistency checks and surfaced the conflict with amber highlighting and a required reviewer decision before submission. Brokers presented with the same material detected 39% of the contradictions.
Category C: Implausible Values (213 cases)
The extraction layer detected 35% of implausibility cases via tariff-table threshold checks. Experienced brokers flagged 83%. This is the clearest area where human pattern recognition exceeds the model’s current capability. Experienced reviewers hold tacit commercial thresholds that the model does not yet replicate.
Category D: Procedure Mismatch (194 cases)
In all 194 cases, Legacy processed the declaration as instructed and did not autonomously override the reviewer’s procedure selection. This is by policy, not by capability. Procedure selection is reviewer-authoritative: it affects duty liability, VAT treatment, authorisation prerequisites, and re-export obligations. A “possible procedure mismatch” advisory is surfaced, but the final choice is the reviewer’s. Of the human-processed comparison, 14% of brokers recognised the mismatch; 86% followed the instruction.
Category E: Suspected Fraudulent Documentation (157 cases)
In 102 of 157 cases (65%), Legacy detected at least one inconsistency. Invoice total versus line-item sum mismatches were detected in 100% of cases present. Where severity exceeded the control threshold, the assembly layer halted and required a recorded justification before the declaration could progress. Brokers detected 41%.
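The arithmetic-consistency class of check can be sketched as follows. The 5% halt threshold is an assumed illustrative value, not the actual control threshold referenced in the text.

```python
# Sketch of an arithmetic consistency check: compare the declared invoice
# total with the sum of line-item values, and halt assembly when the
# relative discrepancy exceeds a control threshold.
HALT_THRESHOLD = 0.05  # 5% relative discrepancy; illustrative, not the real value

def check_invoice(total: float, line_values: list[float]) -> str:
    """Return 'consistent', 'flag', or 'halt' for a declared invoice total."""
    line_sum = sum(line_values)
    if abs(line_sum - total) < 0.01:           # totals agree to the penny
        return "consistent"
    severity = abs(line_sum - total) / max(total, 0.01)
    return "halt" if severity > HALT_THRESHOLD else "flag"
```

A check of this shape explains the 100% detection rate on total-versus-line-sum mismatches: the test is exact arithmetic, not pattern recognition.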
Control summary
| Test category | Cases | System detected | System halted | Manual detected |
|---|---|---|---|---|
| Missing documents | 287 | 287 (100%) | 287 (100%) | 158 (55%) |
| Contradictory values | 259 | 222 (86%) | Flagged | 101 (39%) |
| Implausible values | 213 | 74 (35%) | 0 | 177 (83%) |
| Procedure mismatch | 194 | Advisory only | 0 | 27 (14%) |
| Suspected fraud | 157 | 102 (65%) | 37 (24%) | 64 (41%) |
8. Throughput, Staffing, and Operational Handling Time
The appropriate measure is not total wall-clock duration but active human handling time per case. Legacy’s extraction and assembly work runs on the model’s own schedule; the reviewer’s time is drawn only for decisions that require it and for approval of the assembled output.
Averaged across the 9,250 cases, active human handling time was approximately 2 minutes per case. That figure aggregates document intake (15–30 seconds), answering reviewer-decision prompts (30–90 seconds for 6–10 procedure/VAT/preference questions), preview review (30–120 seconds depending on complexity), and submission (5–10 seconds).
| Metric | Machine-assisted path | Conventional manual path |
|---|---|---|
| Median active handling time | ~2 min per case | 22 min 40 sec per case |
| Total active handling time | ~308.3 operator-hours | ~3,494 operator-hours |
| With an 8-person team, 8-hour days | ~4.8 working days | ~54.6 working days |
| Cases per operator per day | ~240 | ~21 |
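The table's figures follow from simple arithmetic on the two per-case handling times; the sketch below reproduces the operator-hour and per-operator-day totals.

```python
# Reproducing the throughput arithmetic from the per-case handling times
# stated in the text: ~2 min machine-assisted, 22 min 40 s manual.
CASES = 9_250
MACHINE_MIN_PER_CASE = 2.0                  # ~2 min active handling
MANUAL_MIN_PER_CASE = 22 + 40 / 60          # 22 min 40 s
SHIFT_MIN = 8 * 60                          # 8-hour working day

machine_hours = CASES * MACHINE_MIN_PER_CASE / 60    # ~308.3 operator-hours
manual_hours = CASES * MANUAL_MIN_PER_CASE / 60      # ~3,494 operator-hours
machine_cases_per_day = SHIFT_MIN / MACHINE_MIN_PER_CASE  # 240 cases/operator/day
manual_cases_per_day = SHIFT_MIN / MANUAL_MIN_PER_CASE    # ~21 cases/operator/day
```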
Handling-time consistency across the run
Active handling time per case did not drift over the course of the evaluation. Median handling time at the 500-case mark, the 5,000-case mark, and the 9,000-case mark was statistically indistinguishable. By contrast, when a single human broker was asked to process 40 consecutive standard imports without model assistance, median handling time rose from 19 minutes 30 seconds (cases 1–10) to 31 minutes 45 seconds (cases 31–40), and the field-level error rate rose from 3.2% to 8.7%. The broker requested to stop at case 40, citing fatigue.
9. Auditability and Traceability
Every declaration prepared by Legacy carries a complete provenance record. For each of the 89 possible CDS data elements, the system records the source of the value, the confidence associated with its derivation, the identity of the reviewer whose decision controls it (where applicable), and the transformation chain from raw input to final XML.
- Source attribution. Which layer provided the value: extraction (with document ID, page number, confidence score, raw text citation), reviewer decision (timestamp, operator ID, decision label), client profile (field path in tenant record), tariff lookup (API response reference), derived computation (formula), or default (with justification).
- Confidence scoring. Extraction values carry a numeric confidence between 0.0 and 1.0. Values below 0.80 are flagged for reviewer attention; values below 0.60 are not applied automatically.
- Decision audit. Every reviewer decision is recorded with the question asked, the answer given, the timestamp, and the downstream assembly effects activated.
- Contradiction log. Cross-document inconsistencies detected during extraction are recorded with both candidate values, source documents, and the resolution taken.
- Assembly determinism. Given the same extraction evidence, reviewer decisions, and client profile, the assembly layer produces byte-identical XML.
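The confidence-threshold policy above reduces to a three-way rule; a minimal sketch, with the disposition labels assumed for illustration:

```python
# Sketch of the stated confidence thresholds: values at or above 0.80 are
# applied silently, values in [0.60, 0.80) are applied but flagged for
# reviewer attention, and values below 0.60 are withheld from automatic
# application. The disposition labels are illustrative.
def apply_policy(confidence: float) -> str:
    if confidence >= 0.80:
        return "apply"
    if confidence >= 0.60:
        return "apply_flagged"
    return "withhold"
```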
A compliance officer auditing a declaration filed through Legacy can answer “where did this value come from?” for any field in under ten seconds. Across the 9,250 preparations, the provenance system recorded 294,312 extraction events, 74,000 reviewer-decision events, and 9,250 complete assembly traces, all retained and queryable.
10. Limitations and Operating Boundaries
Areas of strong performance
- Mechanical accuracy. Zero transcription errors across 294,312 extracted field values; extraction-error rate of 0.90%, of which 39% were self-detected.
- Mandation enforcement. Zero violations across the full 9,250-case corpus.
- Consistency at volume. No measurable drift across 9,250 preparations.
- Missing-data detection. 100% detection of missing critical documents.
- Audit trail. Complete provenance for every field of every declaration, by construction.
Areas of adequate performance
- Commodity classification. 87% correct to ten digits on first pass; 9% proposed as ranked set for reviewer confirmation; 4% referred for manual classification.
- Fraud indicators. 65% detection on deliberately fraudulent documents; 100% on arithmetic inconsistencies.
- Cross-document consistency. 86% detection on contradictory values between documents.
Areas of limited performance
- Commercial plausibility. 35% detection. The clearest area where human pattern recognition exceeds the model’s current capability.
- Procedure inference. Legacy does not autonomously infer procedure from document content and by policy does not override reviewer procedure selection.
- Handwritten and degraded documents. Extraction accuracy drops from 97.97% to approximately 89%.
- Novel document formats. Non-standard structures and mixed-script annotations produce lower extraction accuracy.
11. Controlled Availability and Deployment
Access to Legacy is extended in phases rather than broadly, for reasons that are methodological rather than commercial.
Each phase shapes the next iteration. Early participants process live declarations alongside the development team. Their document variety, procedural edge cases, and operational feedback directly inform extraction-model refinement, classification-accuracy improvement, and safety-mechanism tuning. The system that Phase III participants receive will be materially better than the system Phase I tested because Phase I tested it.
Compliance frameworks are evolving alongside the system. Machine-assembled declarations with provenance tracking enable compliance approaches that did not previously exist. The brokers and auditors in the early phases are helping to define what best practice looks like when every field value carries a source attribution and a confidence score.
Regulatory engagement is ongoing. Techport is in active dialogue with HMRC and relevant trade bodies regarding the treatment of machine-assembled declarations in the audit framework. Early-phase participants contribute the operational evidence that supports these conversations.
Capacity is finite. Each participant receives direct access to the development team, dedicated onboarding, and their own environment. This level of support cannot scale indefinitely. Freight forwarders and logistics operators have begun onboarding alongside brokers; the window for practitioners who want to shape the system rather than inherit it is narrowing.
12. Conclusion
Across 9,250 declaration preparations spanning 15 procedure types, Legacy produced declarations that were more structurally consistent, more accurately extracted, and more auditable than the manual comparison baseline. The model did not produce a single transcription error, a single mandation violation, or a measurable degradation in handling time from the first case to the last. Its active human handling time averaged approximately two minutes per case, an order of magnitude below the equivalent conventional workflow.
The model does not replace the customs professional. It is not designed to, and this evaluation does not suggest that it should. Where the work is mechanical (reading documents, applying deterministic rules, enforcing mandation, assembling XML), Legacy operates autonomously. Where the work is discretionary (procedure selection, VAT treatment, duty preference regime, valuation disputes, commercial plausibility), the licensed reviewer remains authoritative.
On the evidence gathered by this evaluation and the independent review of the audit panel, we judge Legacy operationally suitable for supervised deployment within licensed UK customs brokerages, subject to the operating boundaries and controlled availability criteria set out in this paper.
Data Availability and Contact
For audit access to the raw evaluation dataset, or to request a supervised observation of Legacy on your own shipment files, write to onboarding@techport.uk.
For technical enquiries regarding the evaluation methodology, contact technical@techport.uk.