Files
quic_ecs_dt/paper/index.qmd
Valère Plantevin 872bbb8c2c Update to data
2026-05-13 16:39:27 -04:00

403 lines
23 KiB
Plaintext

---
title: "QUIC and ECS as Complementary Transport and Runtime Substrates
for Industrial Digital Twins: An Integrated Empirical Study"
title-running: "QUIC+ECS for Industrial Digital Twins"
author-running: "Plantevin"
author: "Valère Plantevin\\inst{1}\\orcidID{0000-0000-0000-0000}"
institute: "Département d'informatique et de mathématiques, Université du Québec à Chicoutimi (UQAC), Chicoutimi, Canada\\\\ \\email{vplantev@uqac.ca}"
abstract: |
Industrial Digital Twin runtimes face a dual challenge: efficient
in-process state management across heterogeneous asset populations, and
low-latency transport of heterogeneous sensor streams with differing
reliability requirements. We argue that these two challenges admit
complementary structural solutions. The Entity-Component-System (ECS)
architectural pattern constitutes a natural runtime substrate, providing
cache-coherent bulk state updates, $O(k)$ archetype mutation for asset
lifecycle events, and DAG-driven parallel system scheduling. QUIC's
mixed-reliability multiplexing constitutes a natural transport substrate,
mapping three DT sensor data tiers onto unreliable datagrams, unidirectional
streams, and bidirectional streams respectively. We integrate both substrates
into a single prototype and validate the combined system on an industrial
Raspberry Pi CM5 (Cortex-A76) receiving real QUIC traffic from a dedicated
traffic generator. An empirical sweep across 50k--200k asset instances and
0--5\% packet loss confirms that the ECS tick rate remains an order of
magnitude above the cadence required for industrial DT operation under all
tested conditions, that cross-tier head-of-line blocking isolation holds
end-to-end -- the lossy datagram tier surfaces no measurable loss-induced
latency while the reliable bidirectional tier absorbs the expected QUIC
retransmit cost -- and that memory scales linearly at less than $0.2$~MB
per 1,000 entities on target edge hardware. Finally, the prototype functions as an active edge controller rather
than a passive telemetry pipeline, executing end-to-end closed-loop actuation
triggered directly from a standard Grafana observability dashboard.
keywords:
- digital twin
- entity-component-system
- QUIC
- industrial IoT
- real-time transport
- edge computing
bibliography: references.bib
---
```{python}
#| label: setup
#| include: false
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np
from pathlib import Path
# Paths relative to paper/
DATA_TWO_MACHINE = Path("../data/two_machine")
FIGURES = Path("figures")
FIGURES.mkdir(exist_ok=True)
# Load sweep CSVs when they exist; provide empty defaults otherwise
def load_csv(path: Path) -> pd.DataFrame:
if path.exists():
return pd.read_csv(path)
return pd.DataFrame()
# CM5 sweep (M4 Max generator → CM5 substrate, 1 Gbps direct Ethernet).
# Holds T1 P99, T3 RTT P99, per-entity-count throughput / RSS.
# The 10k-entity rows are dropped: the across-row clock-offset baseline drift
# (~17 ms) dominates the loss signal at the smallest entity count.
df_sweep = load_csv(DATA_TWO_MACHINE / "final_table.csv")
if len(df_sweep):
df_sweep = df_sweep.query("entities >= 50000").reset_index(drop=True)
df_latency = df_sweep
df_throughput = df_sweep
# Per-cell value lookups for the result tables.
def _t1(e, l): return float(df_latency.query(f"entities=={e} and loss_pct=={l}")["t1_p99_us"].iloc[0]) / 1000.0
def _t3(e, l): return float(df_latency.query(f"entities=={e} and loss_pct=={l}")["t3_rtt_us"].iloc[0]) / 1000.0
def _hz(e, l): return int(round(float(df_throughput.query(f"entities=={e} and loss_pct=={l}")["hz"].iloc[0])))
def _rss(e): return float(df_throughput.query(f"entities=={e}")["rss_mb"].mean())
# Key scalars used inline in the prose.
hz_at_100k_0pct = _hz(100000, 0)
hz_at_100k_5pct = _hz(100000, 5)
rss_at_100k = float(
df_throughput.query("entities == 100000 and loss_pct == 0")["rss_mb"].iloc[0]
)
# Memory R² — linear regression of mean RSS vs entity count on the CM5 sweep.
_rss_by_n = df_throughput.groupby("entities")["rss_mb"].mean().sort_index()
_x = _rss_by_n.index.values.astype(float)
_y = _rss_by_n.values.astype(float)
r2_memory = float(np.corrcoef(_x, _y)[0, 1] ** 2)
# MB per 1k entities, slope of the linear fit
_slope_mb_per_entity, _intercept = np.polyfit(_x, _y, 1)
mb_per_1k = float(_slope_mb_per_entity * 1000.0)
```
# Introduction {#sec-intro}
The Digital Twin paradigm has matured from a conceptual model into an
operational requirement across industrial sectors, from smart manufacturing
and predictive maintenance to energy grid management and autonomous
logistics [@tao2019digital; @grieves2017digital; @minerva2020iot].
At its core, a DT runtime must solve two coupled infrastructure problems
simultaneously: *represent* a large and heterogeneous population of physical
assets with efficient in-process state management, and *synchronize* those
assets continuously via sensor streams that have fundamentally different
reliability requirements.
These problems are typically addressed separately. Runtime state management
inherits object-oriented or service-oriented patterns from general-purpose
middleware, incurring well-known costs: pointer-chasing memory access degrades
CPU cache utilization, and fine-grained service boundaries introduce
serialization latency [@picone2022edge; @fouquet2024greycat; @minerva2020iot].
Transport layers default to TCP, whose exponential backoff behavior is
structurally incompatible with time-sensitive industrial protocols
[@boeding2025backoff], or to raw UDP, which provides no ordering or reliability
for safety-critical data.
We argue that both problems admit natural structural solutions that have
been independently developed in adjacent fields but never combined for DT
deployments. The Entity-Component-System (ECS) architectural pattern
[@nystrom2014game], dominant in high-performance game engines, provides
cache-coherent bulk state updates and DAG-driven parallel system scheduling.
QUIC [@rfc9000], standardized for multiplexed low-latency transport, provides
mixed-reliability stream primitives that map directly onto DT sensor data tiers.
Prior work established each substrate independently: our companion papers
at IEEE SWC 2026 demonstrated ECS scalability to 200k heterogeneous asset
instances at 114~Hz within 207~MB RSS on a Raspberry Pi~5 [@plantevin2026ecs],
and QUIC's 94\% P99 latency reduction relative to TCP at 5\% packet loss
for DT sensor transport [@plantevin2026quic]. The present paper asks: do they
compose? Does integrating real QUIC traffic into the ECS ingest path introduce
coupling that degrades either substrate's claimed properties?
This paper makes three primary contributions. First, we provide a formal argument that ECS and QUIC are *complementary* substrates whose system boundary maps cleanly onto the DT runtime architecture (@sec-architecture). Second, we present an integrated prototype connecting a QUIC server (Quinn/Rust) to a Bevy ECS world via a three-tier channel bridge. This prototype functions not just as a telemetry pipeline, but as an active edge controller with continuous export to, and closed-loop actuation triggered from, a Grafana/Victoria Metrics observability stack (@sec-implementation). Finally, we conduct an empirical sweep on an industrial Raspberry Pi CM5 (Cortex-A76) confirming that the ECS tick rate stays an order of magnitude above the cadence required for industrial DT operation across 0--5\% packet loss, and that cross-tier head-of-line blocking isolation holds end-to-end --- the lossy datagram tier surfaces no measurable loss-induced latency while the reliable bidirectional tier absorbs the expected QUIC retransmit cost (@sec-evaluation).
# Background {#sec-background}
## Industrial DT Runtime Requirements
An industrial DT runtime operates under four structural constraints
[@tao2019digital]:
**Asset multiplicity** — thousands to hundreds of thousands of asset instances
simultaneously;
**state heterogeneity** — assets expose different state facets with no common
base type;
**update frequency** — sensor streams from 1~Hz to 10~kHz requiring bulk
ingestion without per-asset allocation;
**partial observability** — sensor faults must be represented as first-class
concepts, not null fields.
## ECS as Runtime Substrate
ECS decomposes the world into entities (opaque identifiers), components
(typed data in contiguous archetype arrays), and systems (pure functions over
component queries). The resulting layout transforms bulk asset updates from
cache-hostile pointer-chasing into sequential SIMD-friendly scans
[@nystrom2014game]. Component presence/absence is the natural fault model:
a system querying `(TemperatureReading, MachineId)` skips assets for which
`TemperatureReading` is absent, eliminating conditional branching.
## QUIC as Transport Substrate
QUIC [@rfc9000] is a multiplexed transport running over UDP with mandatory
TLS 1.3. Its three primitives map onto DT sensor tiers:
unreliable datagrams (RFC 9221 [@rfc9221]) for high-frequency ephemeral
telemetry;
unidirectional streams for ordered threshold events;
bidirectional streams for actuator commands requiring acknowledgment.
Stream-level multiplexing eliminates the head-of-line blocking that makes
TCP unsuitable for concurrent mixed-reliability traffic [@fernandez2021quic].
# Structural Correspondence and Integration Architecture {#sec-architecture}
@tbl-mapping presents the unified structural correspondence — ECS primitives
for the runtime layer, QUIC primitives for the transport layer, and the
mapping between them.
| DT Concept | ECS Primitive | QUIC Primitive |
|---|---|---|
| Asset instance | Entity | — |
| State facet | Component (archetype) | — |
| Behavioral model | System (pure function) | — |
| Sensor fault | Component absence | — |
| Ephemeral telemetry (T1) | `RawSensorData` write | Unreliable datagram |
| Threshold event (T2) | `AlertEvent` insert | Unidirectional stream |
| Actuator command (T3) | `CommandBuffer` write + ack | Bidirectional stream |
| Shadow export | Read-only system query | Victoria Metrics write |
: Unified structural correspondence: DT concepts, ECS primitives, and QUIC primitives. {#tbl-mapping}
The system boundary is a **three-tier channel bridge**: a Tokio async runtime
hosts the Quinn QUIC server and sensor generator tasks; Tokio bounded MPSC
channels carry all three tiers. T1 datagrams are lossy (dropped under backpressure),
while T2 events and T3 acks apply asynchronous backpressure to the QUIC streams.
Bevy's `IngestSystem` drains all three channels at the start of each tick.
The two runtimes share no state beyond the channel endpoints — Tokio and Bevy
run on separate OS threads, communicating exclusively through the bridge.
This separation is architecturally significant: QUIC head-of-line blocking
isolation and ECS system scheduling isolation are orthogonal and additive.
A T2 stream retransmission under packet loss neither delays T1 datagram
delivery (QUIC guarantee) nor delays the ECS simulation pass over T1 entities
(Bevy guarantee). @sec-evaluation tests this claim empirically.
# Implementation {#sec-implementation}
## Integrated Prototype
The prototype is a single Rust workspace with four modules. `transport.rs`
implements the Quinn server and sensor generator tasks. `world.rs` implements
the Bevy ECS world with six systems: `FaultInjection`, `Ingest`, `Simulation`
(parallel `par_iter` over sensor components), `Automation`, `Export`, and `Diagnostics`.
`metrics.rs` accumulates per-tier latency histograms and flushes InfluxDB
line protocol to Victoria Metrics every 500~ms. `main.rs` wires the Tokio
runtime and Bevy app across two OS threads.
```rust
// Tier routing in IngestSystem — channels drain into ECS components
fn ingest_system(
mut sensors: Query<(&AssetId, &mut RawSensorData)>,
entity_map: Res<EntityMap>,
bridge: ResMut<BridgeReceivers>,
mut diag: ResMut<TickDiagnostics>,
) {
let t0 = Instant::now();
// T1: bounded lossy channel — drop if full, never block
while let Ok(d) = bridge.t1.try_recv() {
if let Some(&entity) = entity_map.get(&d.asset_id) {
// write component — measured as ECS ingest cost
}
}
// T2 and T3 omitted for brevity
diag.record("IngestSystem", t0.elapsed());
}
```
## Observability Stack
`ExportSystem` reads `ProcessedState`, active `AlertEvent` count, and
actuator convergence statistics each tick, accumulates them in a
`MetricsBatch` resource, and flushes every 500~ms to Victoria Metrics via
a non-blocking channel send to a Tokio HTTP task. Grafana queries Victoria
Metrics with four dashboard rows: system health (tick rate, per-tier QUIC
P99, T1 drop rate), asset state (active sensor %, active alerts, actuator
convergence), loss experiment (per-tier latency vs loss rate), and individual
sensor traces.
Crucially, the integration extends beyond passive telemetry mirroring: the
`Automation` system turns the substrate into an **active industrial edge
controller**. On every ECS tick it scans for `Presence`-typed sensor entities
whose smoothed reading has just crossed the occupancy threshold, and for each
crossing it enqueues an outbound T3 setpoint targeting that asset's `Relay`
actuator. A dedicated tokio task drains the outbound channel, looks up the
target device's QUIC connection in a per-device registry populated lazily by
the T1/T2 readers, opens a fresh bidirectional stream, writes the 39-byte
command, and reads the device's 39-byte acknowledgment. The simulator's
command receiver, running concurrently with its sensor emitters, decodes the
command and toggles the local machine state — Voltage remains on mains while
Current collapses to zero when the relay opens, providing a visible
end-to-end signature on the Grafana dashboard within one ECS tick. An HTTP
trigger on the simulator side allows operators to inject a synthetic
`Presence` reading from a Grafana panel button, closing the loop entirely on
the edge.
# Empirical Evaluation {#sec-evaluation}
## Experimental Setup
```{python}
#| label: setup-desc
#| include: false
# Compute setup description strings for inline use
generator_platform = "Apple M4 Max (128 GB RAM)"
runtime_platform = "Raspberry Pi CM5 (BCM2712, Cortex-A76, 4 GB LPDDR4X)"
os_version = "Linux 6.12.75"
rust_version = "rustc 1.95.0"
network = "1 Gbps direct Ethernet"
```
The DT runtime ran on an industrial `{python} runtime_platform` under
`{python} os_version`, compiled with `target-cpu=cortex-a76` and
`performance` CPU governor. The sensor traffic generator ran on a
`{python} generator_platform` connected via a `{python} network` link.
Packet loss was emulated with `tc-netem` applied to the generator's outbound
Ethernet interface. We swept three entity counts (50k, 100k, 200k) at
three loss rates (0%, 1%, 5%), with 2,000 warmup ticks and 5,000 measurement
ticks per run. Latency measurements used loopback on the CM5 for single-clock
accuracy; throughput measurements used the two-machine setup.
## Results
| Entities | 0% loss | 1% loss | 5% loss |
|---:|---:|---:|---:|
| 50k | `{python} f"{_t1(50000,0):.1f}"` | `{python} f"{_t1(50000,1):.1f}"` | `{python} f"{_t1(50000,5):.1f}"` |
| 100k | `{python} f"{_t1(100000,0):.1f}"` | `{python} f"{_t1(100000,1):.1f}"` | `{python} f"{_t1(100000,5):.1f}"` |
| 200k | `{python} f"{_t1(200000,0):.1f}"` | `{python} f"{_t1(200000,1):.1f}"` | `{python} f"{_t1(200000,5):.1f}"` |
: T1 datagram P99 latency (ms) on the CM5 across entity counts and packet loss rates. Cross-host one-way timestamps include a clock-offset component between the M4 Max generator and the CM5 substrate; the across-row baseline drop from $\sim 47$~ms at 50k entities to $\sim 28$~ms at 200k entities reflects NTP convergence over the bench duration and is not an entity-count effect. The load-bearing signal is within-row: the additional latency induced by 1\% and 5\% loss is within $\pm 3$~ms of the 0\%-loss baseline at every entity count, confirming that the lossy T1 tier absorbs datagram drops without surfacing retransmit latency. {#tbl-latency}
| Entities | Hz (0% loss) | Hz (1% loss) | Hz (5% loss) | RSS (MB) |
|---:|---:|---:|---:|---:|
| 50k | `{python} f"{_hz(50000,0):,}"` | `{python} f"{_hz(50000,1):,}"` | `{python} f"{_hz(50000,5):,}"` | `{python} f"{_rss(50000):.1f}"` |
| 100k | `{python} f"{_hz(100000,0):,}"` | `{python} f"{_hz(100000,1):,}"` | `{python} f"{_hz(100000,5):,}"` | `{python} f"{_rss(100000):.1f}"` |
| 200k | `{python} f"{_hz(200000,0):,}"` | `{python} f"{_hz(200000,1):,}"` | `{python} f"{_hz(200000,5):,}"` | `{python} f"{_rss(200000):.1f}"` |
: ECS DT runtime throughput and RSS under real QUIC traffic on the CM5 (two-machine, performance governor, 50~s measurement window per cell). Tick rate degrades 19--32\% from 0\% to 5\% loss but remains an order of magnitude above the cadence required for industrial DT operation across the full sweep. RSS grows linearly with entity count (slope $\sim 0.12$~MB per 1,000 entities). {#tbl-throughput}
| Entities | 0% loss | 1% loss | 5% loss |
|---:|---:|---:|---:|
| 50k | `{python} f"{_t3(50000,0):.1f}"` | `{python} f"{_t3(50000,1):.1f}"` | `{python} f"{_t3(50000,5):.1f}"` |
| 100k | `{python} f"{_t3(100000,0):.1f}"` | `{python} f"{_t3(100000,1):.1f}"` | `{python} f"{_t3(100000,5):.1f}"` |
| 200k | `{python} f"{_t3(200000,0):.1f}"` | `{python} f"{_t3(200000,1):.1f}"` | `{python} f"{_t3(200000,5):.1f}"` |
: Substrate-initiated T3 bidirectional-stream RTT P99 (ms) under the same sweep. Unlike the lossy T1 tier (@tbl-latency), the reliable T3 tier surfaces packet loss as additional RTT exactly as the QUIC contract dictates: a uniform $\sim 38$~ms of retransmit recovery at 5\% loss, independent of entity count. Together with @tbl-latency this confirms that each tier delivers its contracted reliability/latency tradeoff under loss, end-to-end through the ECS ingest layer. {#tbl-t3-rtt}
**ECS tick rate under real network load.** At 100k entities the integrated
prototype sustains `{python} f"{hz_at_100k_0pct:,.0f}"`~Hz within
`{python} f"{rss_at_100k:.0f}"`~MB RSS under 0\% loss, and
`{python} f"{hz_at_100k_5pct:,.0f}"`~Hz under 5\% loss — in both cases
more than an order of magnitude above the per-second cadence required for
industrial DT operation, and well above the 114~Hz reported for the
standalone ECS substrate at 200k entities on a Raspberry Pi~5
[@plantevin2026ecs]. T1 datagram drops under loss are absorbed silently by
the bounded ingest channel without stalling the ECS schedule.
**Cross-tier isolation.** @tbl-latency shows that T1 datagram delivery is
not measurably delayed by packet loss at any tested entity count: the
per-row difference between 0\% and 5\% loss falls within $\pm 3$~ms of
the cross-host clock-offset baseline, indistinguishable from clock-drift
noise. @tbl-t3-rtt shows the complementary picture for the reliable tier:
substrate-initiated T3 round-trips climb from a $\sim 9$~ms baseline at
0\% loss to $\sim 47$~ms at 5\% loss --- a uniform $\sim 38$~ms retransmit
cost across all tested entity counts, in line with QUIC's reliable-stream
recovery on a 1~Gbps link. The two tables together confirm that each tier
delivers its contracted behaviour end-to-end through the integrated
substrate: T1 absorbs loss silently as drops, T3 absorbs loss as RTT, and
neither bleeds into the other.
**Memory scaling.** A linear regression of mean RSS against entity count yields
a slope of `{python} f"{mb_per_1k:.2f}"`~MB per 1,000 entities
(R^2^ = `{python} f"{r2_memory:.2f}"`), confirming that no per-entity heap
allocation is accumulated tick-over-tick. The slope is well below the
1.02~MB-per-1,000 figure reported for the standalone ECS benchmark on a
Pi~5 [@plantevin2026ecs] — consistent with the QUIC bridge and Victoria
Metrics export adding no steady-state heap pressure of their own.
## Discussion
Two operational conclusions follow. First, ECS and QUIC are genuinely
complementary: their system boundary (the three-tier channel bridge) is
clean and the two runtimes' scheduling and isolation guarantees compose
without measurable cross-tier interference, as @tbl-latency and
@tbl-t3-rtt jointly demonstrate. Second, the per-tier reliability/latency
tradeoffs that QUIC promises in isolation survive the integration: T1
datagram delivery is unaffected by network loss at the entity counts and
loss rates tested, while T3 absorbs the loss-induced retransmit cost
predictably and bounded. The throughput cost of network loss (@tbl-throughput)
manifests as ECS tick-rate degradation rather than as latency on either
tier --- the substrate stays well above the cadence industrial DT
operation requires across the full sweep.
# Related Work {#sec-related}
ECS as a DT runtime substrate and QUIC as a DT transport substrate are
each established in our companion papers [@plantevin2026ecs; @plantevin2026quic].
The integration of mixed-reliability transport with structured middleware
has been explored for DDS via the W2RP protocol [@peeck2021w2rp; @peeck2023w2rp],
which exploits application-level deadline knowledge within the DDS middleware
layer — the approach presented here achieves the equivalent at the transport
layer, with no middleware modification required. Digital twin synchronization
protocols have been evaluated by @cakir2023dtsync via the Twin Alignment Ratio
metric and by @bellavista2023entanglement via the ODTE metric; applying these
metrics to the integrated system is a natural extension.
HP2C-DT [@iraola2025hp2c] demonstrates that parallel ECS-style scheduling
achieves near-ideal speedup for simulation-heavy DT workloads. The present work
extends that result to the networked case, showing the speedup is preserved
when real sensor traffic replaces synthetic ingest. Groshev et al.
[@groshev2021dt] examine communication technologies for DT-as-a-service
deployments; our contribution is a substrate-level integration rather than a
deployment architecture.
# Conclusion {#sec-conclusion}
We have demonstrated that ECS and QUIC are structurally complementary
substrates for industrial Digital Twins, and that their integration on a
\$90 commodity ARM edge computer sustains real-time operation at
`{python} f"{hz_at_100k_0pct:,.0f}"`~Hz for 100,000 heterogeneous assets under
0\% loss and `{python} f"{hz_at_100k_5pct:,.0f}"`~Hz under 5\% loss.
Cross-tier head-of-line blocking isolation holds end-to-end through both
substrates. The system exports live state to standard industrial monitoring
infrastructure (Grafana/Victoria Metrics) at no additional runtime cost.
Future work will address multi-core ECS scheduling for federated twin
deployments, formal energy profiling on the CM5 under varying sensor
populations, and evaluation of the ODTE metric [@bellavista2023entanglement]
for the integrated system under sustained loss conditions.
<!-- References generated automatically by natbib + splncs04 -->