# quic_ecs_dt — Project Guide for Claude

## What & why

Source repo for **"QUIC and ECS as Complementary Transport and Runtime Substrates for Industrial Digital Twins: An Integrated Empirical Study"** — submitted to **UCAmI 2026** (Track 2: *Internet of EveryThing (IoT, People & Processes) and Sensors*; primary topic *IoE interoperability, integration and performance*, secondary topic *IoE experimental results and deployment scenarios*). Single-author (Plantevin, UQAC). Third paper in a sequence; the first two are at IEEE SWC 2026:

- `plantevin2026ecs` — ECS as runtime substrate for industrial DT (200k assets @ 114 Hz on Pi 5).
- `plantevin2026quic` — QUIC partial reliability for DT sensor streams (94% P99 reduction vs TCP at 5% loss).

**UCAmI hypothesis (the composition question):** prior work shows ECS and QUIC each work as substrates *independently*. Does integrating real QUIC traffic into a Bevy ECS ingest path introduce coupling that degrades either one's claimed properties? The paper argues no, and measures it on a real CM5 ↔ M4 Max two-machine deployment.

## Architecture

Three-tier QUIC ↔ ECS bridge, headless Bevy runtime. **T1/T2 are inbound (device → substrate); T3 is outbound (substrate → device, actuator commands):**

| Tier | QUIC primitive | Direction | Use case | Channel cap | Sender |
|------|----------------|-----------|----------|-------------|--------|
| T1 | Unreliable datagrams (RFC 9221) | device → substrate | High-freq ephemeral telemetry; drops OK | 1024 | `T1Sender::send_lossy` (try_send, drop on full) |
| T2 | Unidirectional streams | device → substrate | Ordered threshold events; reliable | 512 | `T2Sender::send` (await, backpressure) |
| T3 | Bidirectional streams | **substrate → device** | Actuator commands w/ ACK | 256 | `T3OutboundSender::try_send` of `OutboundT3 { target_device, sensor_id, raw_value, sensor_type }` |

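A minimal, std-only sketch of why the per-tier sender newtypes help. It swaps in `std::sync::mpsc::sync_channel` for `tokio::sync::mpsc` (where `send` is `async` and `try_send` fails with `Full` or `Closed`), and `Msg`, `t1_channel` and `t2_channel` are hypothetical names, not the crate's API. The point is structural: the T1 and T2 senders are distinct types, so T1 can only drop and T2 can only wait.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Stand-in for the real `QuicMessage`; only here so the sketch compiles.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Msg(pub u32);

/// T1: lossy tier. Never blocks; a full channel means the item is dropped.
pub struct T1Sender(SyncSender<Msg>);

/// T2: reliable tier. Blocks the caller while the channel is full (backpressure).
pub struct T2Sender(SyncSender<Msg>);

impl T1Sender {
    /// Returns `false` when the message was dropped (channel full or receiver gone).
    pub fn send_lossy(&self, m: Msg) -> bool {
        match self.0.try_send(m) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
        }
    }
}

impl T2Sender {
    /// Waits for capacity; hands the message back only if the receiver is gone.
    pub fn send(&self, m: Msg) -> Result<(), Msg> {
        self.0.send(m).map_err(|e| e.0)
    }
}

/// Each constructor returns a distinct sender type, so passing a T1 sender
/// where a T2 sender is expected fails to compile.
pub fn t1_channel(cap: usize) -> (T1Sender, Receiver<Msg>) {
    let (tx, rx) = sync_channel(cap);
    (T1Sender(tx), rx)
}

pub fn t2_channel(cap: usize) -> (T2Sender, Receiver<Msg>) {
    let (tx, rx) = sync_channel(cap);
    (T2Sender(tx), rx)
}
```

With a capacity of 2, a third `send_lossy` returns `false` and the message is gone, while a blocked `T2Sender::send` would simply wait for the ECS side to drain the channel.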
QUIC server runs on a dedicated OS thread with a Tokio multi-thread runtime. T1/T2 decoded `QuicMessage`s (39 B fixed LE: 16 UUID + 2 sensor_id + 8 f64 + 8 ts + 4 seq + 1 sensor_type) flow into per-tier `tokio::sync::mpsc` channels and are drained by Bevy's `ingest_system` in `PreUpdate`, gated by `run_if(in_state(ServerState::Started))`. T3 flows the other way: `automation_system` constructs `OutboundT3` items and the tokio-side `drain_outbound_t3` task opens bi-streams to the target device. The per-tier sender newtypes (in [substrate/src/transport/mod.rs](substrate/src/transport/mod.rs)) make tier mixups a type error. Pattern in [substrate/src/transport/ecs.rs](substrate/src/transport/ecs.rs).

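The 39 B layout can be sketched with the standard library's `to_le_bytes`/`from_le_bytes`. This assumes the 16 UUID bytes are copied verbatim (only the numeric fields are little-endian) and that `ts` is a `u64` in an unspecified unit; the field names and `FrameError` are illustrative, and the real codec's `WireError` may distinguish more cases.

```rust
use std::convert::TryInto;

/// Hypothetical mirror of the 39-byte frame; field names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Frame {
    pub device: [u8; 16], // UUID bytes, copied verbatim (stand-in for uuid::Uuid)
    pub sensor_id: u16,
    pub value: f64,
    pub ts: u64, // timestamp; the unit is an assumption left open here
    pub seq: u32,
    pub sensor_type: u8,
}

/// 16 + 2 + 8 + 8 + 4 + 1 = 39 bytes.
pub const FRAME_LEN: usize = 16 + 2 + 8 + 8 + 4 + 1;

#[derive(Debug, PartialEq)]
pub enum FrameError {
    BadLength(usize),
}

pub fn encode(f: &Frame) -> [u8; FRAME_LEN] {
    let mut buf = [0u8; FRAME_LEN];
    buf[0..16].copy_from_slice(&f.device);
    buf[16..18].copy_from_slice(&f.sensor_id.to_le_bytes());
    buf[18..26].copy_from_slice(&f.value.to_le_bytes());
    buf[26..34].copy_from_slice(&f.ts.to_le_bytes());
    buf[34..38].copy_from_slice(&f.seq.to_le_bytes());
    buf[38] = f.sensor_type;
    buf
}

pub fn decode(b: &[u8]) -> Result<Frame, FrameError> {
    if b.len() != FRAME_LEN {
        return Err(FrameError::BadLength(b.len()));
    }
    let mut device = [0u8; 16];
    device.copy_from_slice(&b[0..16]);
    // The slice-to-array conversions cannot fail after the length check above.
    Ok(Frame {
        device,
        sensor_id: u16::from_le_bytes(b[16..18].try_into().unwrap()),
        value: f64::from_le_bytes(b[18..26].try_into().unwrap()),
        ts: u64::from_le_bytes(b[26..34].try_into().unwrap()),
        seq: u32::from_le_bytes(b[34..38].try_into().unwrap()),
        sensor_type: b[38],
    })
}
```

A fixed-size frame keeps decoding branch-free and lets a T2 reader pull exactly 39 bytes per message without a length prefix.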
**T3 actuator-command protocol.** The substrate's `automation_system` decides to actuate (e.g. Presence < 1.0 ⇒ Relay = stop) and pushes an `OutboundT3` onto the outbound channel. The tokio `drain_outbound_t3` pops it, looks up the target device's `quinn::Connection` in a `ConnectionRegistry` (populated by `read_datagrams` / `read_one_uni_stream` on first sight of each device UUID), then **spawns one task per command** to do `conn.open_bi() → write 39 B → finish → read 39 B ack`. Per-task spawning means a single stuck `read_exact` can't stall the pipeline. Latency from `open_bi()` to ack-receipt is recorded as `substrate_latency_us{tier="t3"}` and a successful ack increments `substrate_received_total{tier="t3"}`. Misses (`substrate_t3_outbound_no_route_total`), drops (`substrate_t3_outbound_dropped_total`), and bi-stream errors (`substrate_t3_outbound_errors_total`) each have their own counter.

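The per-command accounting can be sketched synchronously. `Command`, `T3Stats` and `dispatch` are hypothetical; the `exchange` closure stands in for the `open_bi` → write 39 B → `finish` → read 39 B ack sequence, a `u64` handle stands in for the `quinn::Connection` found in the registry, and drop accounting is left out of the sketch.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Hypothetical, trimmed-down outbound command (the real one also carries
/// `sensor_id` and `sensor_type`).
pub struct Command {
    pub target_device: [u8; 16],
    pub raw_value: f64,
}

/// One field per counter named in the text; latencies are in microseconds.
#[derive(Default, Debug)]
pub struct T3Stats {
    pub received: u64,         // substrate_received_total{tier="t3"}
    pub no_route: u64,         // substrate_t3_outbound_no_route_total
    pub errors: u64,           // substrate_t3_outbound_errors_total
    pub latency_us: Vec<u128>, // substrate_latency_us{tier="t3"}
}

/// `exchange` stands in for open_bi -> write 39 B -> finish -> read 39 B ack
/// over the connection handle `conn`.
pub fn dispatch<F>(cmd: &Command, routes: &HashMap<[u8; 16], u64>, stats: &mut T3Stats, exchange: F)
where
    F: FnOnce(u64, &Command) -> Result<(), String>,
{
    let Some(&conn) = routes.get(&cmd.target_device) else {
        stats.no_route += 1; // device never seen on T1/T2, so there is no connection to use
        return;
    };
    let started = Instant::now(); // timed from stream open to ack receipt
    match exchange(conn, cmd) {
        Ok(()) => {
            stats.received += 1;
            stats.latency_us.push(started.elapsed().as_micros());
        }
        Err(_) => stats.errors += 1,
    }
}
```

Only acknowledged commands contribute a latency sample, so error and no-route outcomes cannot pull the T3 RTT distribution down.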
**Connection registry.** `Arc<std::sync::RwLock<HashMap<Uuid, quinn::Connection>>>`. `quinn::Connection` is internally `Arc`; one simulator process commonly hosts 7 device UUIDs sharing one connection. Registry insert is idempotent (`ensure_registered`). On `conn.closed().await` returning, `handle_incoming` purges every key whose `Connection::stable_id()` matches the closed connection.

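A std-only sketch of the registry semantics described above: idempotent insert on first sighting and purge by connection. `Conn` stands in for `quinn::Connection` (whose `stable_id()` returns a `usize`) and `[u8; 16]` for `uuid::Uuid`; apart from `ensure_registered`, the method names are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Stand-in for `quinn::Connection`: only the stable id matters here.
#[derive(Clone, Debug, PartialEq)]
pub struct Conn {
    pub stable_id: usize,
}

/// Raw UUID bytes, standing in for `uuid::Uuid`.
pub type DeviceId = [u8; 16];

#[derive(Clone, Default)]
pub struct ConnectionRegistry(Arc<RwLock<HashMap<DeviceId, Conn>>>);

impl ConnectionRegistry {
    /// Idempotent: the first connection seen for a device wins; later calls are no-ops.
    pub fn ensure_registered(&self, device: DeviceId, conn: &Conn) {
        // Cheap read-locked check first; the read guard is dropped at the end of this statement.
        let known = self.0.read().unwrap().contains_key(&device);
        if known {
            return;
        }
        // `entry` re-checks under the write lock, so two racing first sightings insert once.
        self.0.write().unwrap().entry(device).or_insert_with(|| conn.clone());
    }

    pub fn lookup(&self, device: &DeviceId) -> Option<Conn> {
        self.0.read().unwrap().get(device).cloned()
    }

    /// Drops every device routed over the closed connection; returns how many were removed.
    pub fn purge(&self, stable_id: usize) -> usize {
        let mut map = self.0.write().unwrap();
        let before = map.len();
        map.retain(|_, c| c.stable_id != stable_id);
        before - map.len()
    }
}
```

Because one connection can carry several device UUIDs, purging by `stable_id` rather than by UUID removes all of them in a single pass.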
**Target hardware:** CM5 (BCM2712, Cortex-A76, 4 GB) as DT runtime; M4 Max as traffic generator; 1 Gbps direct Ethernet. Both rigs are in hand; benchmark sweeps live on the CM5.

## Repo map

```
quic_ecs_dt/
├── paper/                      Quarto + LNCS source — single index.qmd, refs in references.bib
├── substrate/                  Rust crate: Bevy 0.18 + Quinn 0.11 + rustls 0.23 + Tokio
│   └── src/
│       ├── main.rs             App::new, MinimalPlugins, EcsQuicTransportPlugin, ObservabilityPlugin
│       ├── lib.rs              re-exports
│       ├── config.rs           figment chain: defaults → config.toml → APP_* env (split on "__")
│       ├── observability.rs    metrics-exporter-prometheus on :9100
│       ├── transport/
│       │   ├── mod.rs          QuicMessage codec + tier sender newtypes + OutboundT3
│       │   ├── ecs.rs          EcsQuicTransportPlugin: tokio thread + bridge + registry + drain spawn
│       │   ├── server.rs       bind_endpoint + accept_loop + read_datagrams + read_uni_streams
│       │   │                   + drain_outbound_t3 + synthetic_t3_driver + ConnectionRegistry
│       │   └── state.rs        ServerState{Starting, Started}
│       └── world/
│           ├── mod.rs          WorldPlugin (5 systems wired into Pre/Update/Post)
│           ├── components.rs   Asset, DeviceId, SensorId, SensorTypeTag, RawSensorData, SmoothedValue, threshold_for
│           ├── resources.rs    SensorRegistry, DiagnosticsState, ExportSampleState
│           ├── systems.rs      ingest, simulation, automation, export, diagnostics
│           └── tests.rs        8 unit tests inc. automation_dispatches_relay_stop
├── simulator/                  Rust crate: Quinn client + sensor generators + T3 receiver
│   ├── src/
│   │   ├── main.rs             CLI driver + HTTP-trigger task + T1 inline loop
│   │   ├── lib.rs              module exports
│   │   ├── client.rs           SimulatorClient (connect, send_datagram, send_uni_stream, request, close)
│   │   ├── commands.rs         run_command_receiver (substrate → device T3 accept-bi loop)
│   │   ├── emitters.rs         run_t2_emitter (T1 lives inline in main.rs)
│   │   └── profile.rs          SensorProfile (single | industrial), generate_value
│   └── tests/                  T1, T2, end-to-end full-loop integration tests
├── data/
│   ├── two_machine/            CM5 ↔ M4 Max sweep — final_table.csv (load-bearing for the paper)
│   └── local/                  loopback sweeps (scaling.csv, cross_tier.csv)
├── scripts/
│   ├── bench-loss.sh           M6 sweep entities×loss → data/two_machine/final_table.csv
│   ├── bench-scaling.sh        T1 rate sweep + optional synthetic-T3 cross-tier mode
│   ├── bench-client.sh         M8 client driver (run from Mac when substrate is on CM5)
│   ├── demo.sh                 full-stack demo: certs + build + VM/Grafana + sub + sim
│   ├── setup-cm5.sh            CM5 provisioning (apt + cargo install)
│   └── verify-netem.sh         confirm tc-netem is shaping in the right direction (BIDI=1 for ifb mode)
├── monitoring/                 docker-compose: VictoriaMetrics + Grafana auto-provisioned
├── dashboards/                 runtime.json + sensors.json
├── certs/                      gitignored, regenerated by `make certs`
├── Cargo.toml                  workspace
└── Makefile                    render, preview, build, build-cm5, deploy-cm5, monitoring-up
```

## Status

**Code (substrate + simulator):**

| Area | State |
|------|-------|
| `AppConfig` figment loader (defaults → TOML → env with `__` split) | Done — [substrate/src/config.rs](substrate/src/config.rs). Env override actually works (`Env::prefixed("APP_").split("__")`); discovered late that the previous chain silently ignored env vars |
| 39 B wire codec | Done — [substrate/src/transport/mod.rs](substrate/src/transport/mod.rs), 5 unit tests |
| Quinn server lifecycle + TLS | Done — `bind_endpoint` + `accept_loop` in [substrate/src/transport/server.rs](substrate/src/transport/server.rs); `ServerState{Starting, Started}` in [state.rs](substrate/src/transport/state.rs); explicit `TransportConfig` w/ 256 KiB datagram recv buffer; dev cert via `make certs`, rustls `aws-lc-rs` provider installed in [main.rs](substrate/src/main.rs) |
| T1 demux (datagrams → ECS) | Done. `read_datagrams` reader; decode errors non-fatal; channel-full drops silent; per-stream counters in debug summary. Calls `ensure_registered` on first decode so outbound T3 can route to this device |
| T2 demux (uni streams → ECS) | Done. `read_uni_streams` accepts streams, spawns one task per stream that reads 39 B chunks until EOF; decode failure resets the stream via `recv.stop(0)`; `t2.send().await` honours backpressure; first decode also calls `ensure_registered` |
| T3 outbound (ECS → device) | Done. `drain_outbound_t3` task pops `OutboundT3` items, looks up the target device's `Connection` in `ConnectionRegistry`, **spawns one task per command** to do `open_bi → write 39 B → finish → read ack`. Per-task spawning prevents a single stuck `read_exact` from stalling the pipeline. Records `substrate_latency_us{tier="t3"}` on success; counts no-route / dropped / errors separately. The old simulator-initiated T3 inbound path (`T3Sender` / `T3Inbound` / `accept_bi_streams`) is **gone** |
| Connection registry (Uuid → Connection) | Done — `Arc<RwLock<HashMap<Uuid, quinn::Connection>>>`; idempotent insert via `ensure_registered`; purged in `handle_incoming` after `conn.closed().await` using `Connection::stable_id()` |
| Synthetic T3 driver (bench only) | Done. `synthetic_t3_driver` task in [server.rs](substrate/src/transport/server.rs) spawned by `accept_loop` when `APP_NETWORK__SYNTHETIC_T3_RATE_HZ > 0`. Round-robins over registered devices, toggles `raw_value` between 0/1, pushes through the same outbound channel `automation_system` uses |
| ECS components + 5 systems | Done — [world/](substrate/src/world/). Entities = `(Asset, DeviceId, SensorId, SensorTypeTag, RawSensorData, SmoothedValue)` per (device, sensor). 5 systems: `ingest` (PreUpdate, drains T1+T2), `simulation` (Update, rolling mean + threshold-crossings counter), `automation` (Update, Presence-cross → `t3_out.try_send(OutboundT3{Relay setpoint})` + local mirror), `export` (PostUpdate, per-second metric sample), `diagnostics` (PostUpdate, per-second `tick_hz` log) |
| Schedule rate-gating | Done — `MinimalPlugins.set(ScheduleRunnerPlugin::run_loop(1/tick_rate_hz))` in [main.rs](substrate/src/main.rs) |
| Prometheus exporter + Grafana | Done. `metrics-exporter-prometheus` on :9100 via `ObservabilityPlugin`. Runtime metrics: `substrate_received_total{tier}`, `substrate_dropped_total{tier=t1}`, `substrate_decode_errors_total{tier}`, `substrate_t3_outbound_*_total`, `substrate_latency_us{tier}` histograms, `substrate_tick_hz`, `substrate_entities`, `substrate_channel_depth{tier}`, `substrate_rss_bytes`. Sensor data: `sensor_aggregate{type, stat=count\|mean\|min\|max}`. Dashboards: [dashboards/runtime.json](dashboards/runtime.json) + [dashboards/sensors.json](dashboards/sensors.json) |
| Simulator binary | Done — [simulator/src/main.rs](simulator/src/main.rs). Clap flags: `--addr`, `--server-name`, `--cert`, `--profile {single, industrial}`, `--sensor-type`, `--sensor-id`, `--rate-hz`, `--t2-rate-hz`, `--count`, `--devices`. `industrial` profile fans out to **7 sensors per device** on ids 0..6 (Temperature/Humidity/Pressure/Voltage/Current/Presence/Relay). HTTP trigger on `:9002` (`POST /trigger`) pushes Presence=0 over T2 — operator-facing demo entry point. T1/T2 emitters check `engine_running` per tick; when `false`, Current waveform drops to ~0 while Voltage stays at ~230 V |
| Simulator command receiver | Done — [simulator/src/commands.rs](simulator/src/commands.rs). `run_command_receiver` loops on `conn.accept_bi()`, decodes 39 B, flips `engine_running` on `sensor_type == Relay` setpoints, writes 39 B ack. Spawned by `main.rs` post-connect. `new_engine_state()` constructor exported for integration tests |
| End-to-end test harness | **18 tests, all green.** 5 codec unit tests; 8 world unit tests (incl. `automation_dispatches_relay_stop_when_presence_drops`); 2 T1 + 2 T2 integration tests; 1 **full closed-loop** test (`simulator/tests/end_to_end_full_loop.rs`: Presence < 1.0 → substrate T3 → `engine_running` flips to false; then Presence > 1.0 → flips back) |
| Benchmark scripts | Done. [bench-loss.sh](scripts/bench-loss.sh) — entity × loss sweep, **bidirectional `tc-netem` via `ifb` on the CM5** (BIDI=1 default). [bench-scaling.sh](scripts/bench-scaling.sh) — T1 rate sweep + optional substrate-side `APP_NETWORK__SYNTHETIC_T3_RATE_HZ`. [verify-netem.sh](scripts/verify-netem.sh) — sanity-check netem on the right interface in the right direction (BIDI=1 mode covers ingress via ifb) |
| CM5 deploy | Done — `make build-cm5 && make deploy-cm5`; [setup-cm5.sh](scripts/setup-cm5.sh) provisions deps. Bench has been run end-to-end on CM5; data lives in [data/two_machine/final_table.csv](data/two_machine/final_table.csv) |

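The `simulation` and `automation` rows above describe two small pieces of per-entity state. A sketch of each, under stated assumptions: a fixed-size window for the rolling mean (the window length is not given here), and a Presence threshold of 1.0 that fires only on crossings. `RollingMean`, `PresenceWatch` and `Relay` are hypothetical names; in the substrate each returned setpoint would become an `OutboundT3` pushed with `t3_out.try_send(...)`. Whether `automation` reads the raw or the smoothed value is not stated in this guide.

```rust
use std::collections::VecDeque;

/// Fixed-window rolling mean; the window length is an assumption.
pub struct RollingMean {
    window: usize,
    buf: VecDeque<f64>,
    sum: f64,
}

impl RollingMean {
    pub fn new(window: usize) -> Self {
        assert!(window > 0, "window must be non-empty");
        Self { window, buf: VecDeque::with_capacity(window + 1), sum: 0.0 }
    }

    /// Adds a sample and returns the mean over the last `window` samples.
    pub fn push(&mut self, x: f64) -> f64 {
        self.buf.push_back(x);
        self.sum += x;
        if self.buf.len() > self.window {
            self.sum -= self.buf.pop_front().unwrap();
        }
        self.sum / self.buf.len() as f64
    }
}

/// Hypothetical setpoints; how "stop"/"run" map onto `raw_value` is not stated here.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Relay {
    Stop,
    Run,
}

/// Edge-triggered watcher: emits a setpoint only when Presence crosses the
/// threshold, not on every tick it stays on one side of it.
pub struct PresenceWatch {
    threshold: f64,
    below: bool,
    pub crossings: u64,
}

impl PresenceWatch {
    pub fn new(threshold: f64) -> Self {
        // Assumes the engine starts running, i.e. Presence starts above threshold.
        Self { threshold, below: false, crossings: 0 }
    }

    pub fn observe(&mut self, presence: f64) -> Option<Relay> {
        let now_below = presence < self.threshold;
        if now_below == self.below {
            return None; // no crossing, nothing to send
        }
        self.below = now_below;
        self.crossings += 1;
        Some(if now_below { Relay::Stop } else { Relay::Run })
    }
}
```

Edge-triggering keeps the outbound T3 channel quiet while Presence stays low, instead of enqueueing a command every tick.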
**Paper:**

| Area | State |
|------|-------|
| Track + topics chosen | Done — UCAmI Track 2 (IoE and Sensors); primary *IoE interoperability, integration and performance*; secondary *IoE experimental results and deployment scenarios* |
| Abstract | Done. Honest framing: "tick rate remains an order of magnitude above the cadence required" (not "stable"), mixed-reliability isolation as the T1-vs-T3 story, 0.12 MB/1k slope |
| Tables 2/3/4 from real CM5 data | Done. Native markdown tables driven by inline `{python}` values reading from `data/two_machine/final_table.csv`; cross-refs (`@tbl-latency`, `@tbl-throughput`, `@tbl-t3-rtt`) resolve in the LNCS LaTeX output. Earlier `display(Markdown(...))` approach didn't register with Quarto's cross-ref filter; switched to native md tables with inline-python cells |
| `fig-isolation` | **Dropped.** Cross-tier story now told by `tbl-latency` + `tbl-t3-rtt` (T1 flat under loss, T3 absorbs ~38 ms retransmit). Cleaner than the loopback fig. `data/local/cross_tier.csv` is still on disk but the paper no longer reads it |
| Architecture §3 + Table 1 | Updated for substrate-initiated T3. Table 1 T3 row reads "OutboundT3 enqueue + ack \| Bidirectional stream (server-initiated)"; the connection-registry / per-device routing is described in the prose |
| Implementation §4 Automation paragraph | Updated for the new outbound T3 path; describes the per-device registry, the per-command bi-stream, and the simulator-side `run_command_receiver` engine-state flip |
| Discussion + Conclusion | Honest now: drops the unbacked "<5% IngestSystem drain" and "Grafana adds no overhead" claims; conclusion populates both 0%-loss and 5%-loss Hz from data |
| Render | Clean against LNCS LaTeX template (`make render` → 10-page PDF, no Quarto warnings) |

## Roadmap

Treat the milestone log as historical. The paper-side work below tracks what's *left* before camera-ready.

- **M1 — Wire codec & root config.** ✅ 2026-05-04.
- **M2 — Quinn server + TLS.** ✅ 2026-05-06.
- **M3 — Simulator client.** ✅ Done. `SimulatorClient` + CLI driver + waveform profiles + HTTP trigger + closed-loop command receiver.
- **M4 — ECS world.** ✅ Done. 5 systems wired; automation closes the T3 loop.
- **M5 — Observability.** ✅ Done. Both dashboards live; metrics exposed via prometheus scrape.
- **M6 — Benchmark harness.** ✅ Done. `bench-loss.sh` + `bench-scaling.sh` + `verify-netem.sh` (last one added when egress-only netem was masking the inbound T1 loss path; now `ifb` ingress shaping is default).
- **M7 — CM5 cross-compile & deploy.** ✅ Done. Multiple sweeps shipped from CM5.
- **M8 — Two-machine run + paper render.** ✅ Done. Paper renders against [data/two_machine/final_table.csv](data/two_machine/final_table.csv); all inline scalars and tables populate from real numbers.
- **M9 — T3 inversion (substrate-initiated actuator commands).** ✅ 2026-05-13. The paper's Table 1 said T3 was "actuator commands" but the code had it inverted (device → substrate RPC). Refactored to match the paper: substrate opens bi-streams, simulator's `run_command_receiver` accepts. Full closed-loop integration test landed.
- **M10 — Abstract submission polish.** ⏳ In progress. Top-of-paper fixes shipped (abstract framing, contributions paragraph, Table 1 T3 row, Architecture §3 backpressure paragraph, author affiliation, `(author?)` cite markers). Remaining polish is full-paper-only (Implementation §4 module-list lies, code listing with fake types, Observability §4.2 push-vs-pull mismatch, Experimental Setup §5.1 stale tc-netem / tick counts / loopback-vs-two-machine sentence). None block abstract submission.

**Open polish items** (not blocking abstract submission):

- §4.1 *Integrated Prototype* still lists six systems including a non-existent `FaultInjection`; module list says `transport.rs` / `world.rs` / `metrics.rs` / `main.rs` but the actual layout is `transport/`, `world/`, `observability.rs`, `config.rs`, `main.rs`, `lib.rs` plus a separate `simulator` crate.
- §4.1 code listing uses fictional types (`AssetId`, `EntityMap`, `TickDiagnostics`). Easier to drop the listing than to rewrite faithfully.
- §4.2 *Observability Stack* describes a push model with InfluxDB line protocol; actual code uses `metrics-exporter-prometheus` exposing `/metrics` for VM scrape.
- §5.1 *Experimental Setup* needs three updates: tc-netem direction (now bidirectional via `ifb`), "2,000 warmup ticks and 5,000 measurement ticks" → "20 s warmup + 50 s window (wall-clock)", and drop the "loopback for latency / two-machine for throughput" sentence (all numbers are from the two-machine sweep now).

## Conventions

- **Rust:** edition 2024; workspace at root with `simulator` + `substrate`.
- **Pinned crates:** Bevy 0.18, Quinn 0.11, rustls 0.23, Tokio 1 (full), figment 0.10 (toml + env), uuid 1.23 (v4), serde 1.
- **Config:** `figment` chain — defaults → `config.toml` → env `APP_*` with `__` nesting (e.g. `APP_NETWORK__SERVER_PORT=9000`, `APP_NETWORK__SYNTHETIC_T3_RATE_HZ=100`).
- **Bevy:** headless — `MinimalPlugins` only; do not pull rendering plugins.
- **Tokio↔Bevy:** keep the dedicated-thread + mpsc pattern in [transport/ecs.rs](substrate/src/transport/ecs.rs); do not block the ECS schedule on async work.
- **Paper:** Quarto + LNCS template ([paper/_extensions/template.tex](paper/_extensions/template.tex), [paper/_quarto.yml](paper/_quarto.yml)). **Never commit `llncs.cls` or `splncs04.bst`** — CTAN licensing; download per [README.md](README.md). For tables in LaTeX target, use native markdown tables with `: Caption {#tbl-foo}` syntax and inline `{python}` cells, **not** `display(Markdown(...))` chunks — Quarto's cross-ref filter doesn't pick the latter up in LaTeX output.
- **Data:** raw CSVs under `data/` are committed; `*_processed.csv` is gitignored. Paper figures consume `data/two_machine/final_table.csv` exclusively (the previous `data/loopback/` was renamed to `data/two_machine/` once it became the real CM5 sweep).
- **Errors:** `anyhow` (with `.context()`) for internal startup paths; `thiserror` for boundary types we want to match against (e.g. `WireError` in the codec).
- **Warnings:** let real warnings show. No `#[allow(dead_code)]`, `_var` blanket suppression, or `PhantomData` shims to silence the compiler — warnings are honest TODO markers and disappear when the consuming code lands.

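The `__` nesting convention can be illustrated with a plain function that mirrors what `Env::prefixed("APP_").split("__")` is expected to do: strip the prefix, split on `__`, and lowercase each segment (figment lowercases env keys by default). `env_to_key` is a hypothetical helper for reasoning about overrides, not part of figment or this crate.

```rust
/// Maps an environment variable name to the dotted config key it would
/// override, or `None` when the prefix does not match. Illustrative only.
pub fn env_to_key(var: &str, prefix: &str, sep: &str) -> Option<String> {
    let rest = var.strip_prefix(prefix)?;
    if rest.is_empty() {
        return None;
    }
    // Each `sep`-delimited segment becomes one nesting level, lowercased.
    let parts: Vec<String> = rest.split(sep).map(|p| p.to_ascii_lowercase()).collect();
    Some(parts.join("."))
}
```

Single underscores inside a segment survive, which is why `APP_NETWORK__SYNTHETIC_T3_RATE_HZ` lands on one field, `synthetic_t3_rate_hz`, under the `network` table.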
## Known deferrals

- **Channel ownership is per-host, not per-connection.** All connections share the same inbound mpsc channels and the outbound T3 channel. Fairness under N-device load relies on tokio scheduling. Acceptable for "one ECS world per host".
- **No graceful shutdown.** The `quic-runtime` thread parks on `pending()`; spawned tasks are orphaned at process exit. Fine for research runs.
- **Bind failure is fatal.** `OnEnter(Starting)` panics if `bind_endpoint` fails.
- **T3 outbound concurrency is unbounded.** `drain_outbound_t3` spawns one task per command. Under sustained T1 ingest beyond ~10k msg/s the per-command tasks queue behind the tokio scheduler and T3 P99 climbs into the hundreds of ms (throughput still holds). If we ever need strict T3 latency isolation under heavy T1 load, add a `tokio::sync::Semaphore` cap or a dedicated runtime/thread for T3.
- **NTP drift over a long bench shifts the across-row T1 P99 baseline.** Visible in `tbl-latency` (47 ms at 50k → 28 ms at 200k). The within-row Δ is what speaks to isolation; the across-row absolutes don't. Paper caption explains this.
- **Schedule rate-gating is approximate.** Observed `tick_hz` runs ~85% of target on macOS dev; tighter on the CM5.

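The semaphore cap suggested for T3 can be sketched with std threads. `Semaphore` here is a hand-rolled `Mutex` + `Condvar` stand-in and `drain_capped` is a hypothetical drain loop; with tokio, the equivalent would hold an `Arc<tokio::sync::Semaphore>`, call `acquire_owned().await` in `drain_outbound_t3` before spawning, and move the resulting `OwnedSemaphorePermit` into the task so the slot frees when the task ends.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore (std stand-in for `tokio::sync::Semaphore`).
pub struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl Semaphore {
    pub fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), cv: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    pub fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.cv.wait(p).unwrap();
        }
        *p -= 1;
    }

    pub fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

/// Drain-loop sketch: one worker per command, at most `cap` in flight.
/// Returns the peak number of commands observed in flight at once.
pub fn drain_capped(commands: usize, cap: usize) -> usize {
    assert!(cap > 0, "a zero cap would block the drain loop forever");
    let sem = Arc::new(Semaphore::new(cap));
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let mut workers = Vec::new();
    for _ in 0..commands {
        sem.acquire(); // the loop stalls here once `cap` commands are outstanding
        let (permit, live, max) = (Arc::clone(&sem), Arc::clone(&in_flight), Arc::clone(&peak));
        workers.push(thread::spawn(move || {
            let now = live.fetch_add(1, Ordering::SeqCst) + 1;
            max.fetch_max(now, Ordering::SeqCst);
            thread::sleep(Duration::from_millis(2)); // stand-in for open_bi .. read ack
            live.fetch_sub(1, Ordering::SeqCst);
            permit.release(); // hand the slot back to the drain loop
        }));
    }
    for w in workers {
        w.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

Acquiring in the drain loop, rather than inside each task, bounds the number of live tasks as well as concurrent bi-streams; once the cap is hit, commands back up in the 256-slot outbound channel, where `try_send` starts dropping once it is full.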
## Run / verify

```bash
make certs         # dev TLS (ECDSA P-256, SAN: localhost/cm5.local/127.0.0.1/::1)
make build         # cargo build --release native
make build-cm5     # aarch64 cross-build
make deploy-cm5    # scp to $CM5_HOST
make render        # paper PDF
make preview       # live-reload paper at :4848
make monitoring-up # docker-compose VM + Grafana
```

**Tests.** `cargo test --workspace` runs codec unit tests + world unit tests + 5 integration tests (T1, T2, full closed-loop) in [simulator/tests/](simulator/tests/). Each integration test calls `bind_endpoint` + `accept_loop` in-process on `127.0.0.1:0`. The full-loop test stands up the real outbound machinery (`accept_loop` + `drain_outbound_t3`) and asserts the engine-state flag flips in both directions.

**Metrics scrape.** With `metrics_enabled = true` (default):

```bash
curl http://127.0.0.1:9100/metrics
```

`make monitoring-up` brings up VictoriaMetrics + Grafana auto-provisioned at <http://localhost:3000> (admin / admin); the dashboards mount live from [dashboards/](dashboards/) so JSON edits re-import within ~10 s.

**Full-stack demo.** [scripts/demo.sh](scripts/demo.sh) brings up certs + cargo build + monitoring stack + substrate + simulator and tails the simulator's progress log. Industrial profile by default; Presence dips below threshold every few seconds, triggering substrate-initiated T3 Relay setpoints, visible on the operator dashboard as Current collapsing to ~0 A while Voltage holds.

```bash
./scripts/demo.sh # defaults
PROFILE=single RATE_HZ=100 DEVICES=20 ./scripts/demo.sh
KEEP_MONITORING=1 ./scripts/demo.sh # leave VM + Grafana running on exit
```

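On the simulator side, the effect visible in the demo reduces to a shared flag. This sketch assumes `engine_running` is an `AtomicBool` shared (e.g. via `Arc`) between `run_command_receiver` and the emitters (the guide names `new_engine_state()` but not its type), that Relay's `sensor_type` byte is 6, and that `raw_value == 0.0` means stop; `apply_command` and `current_amps` are hypothetical helpers.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Assumed `sensor_type` byte for Relay (not stated in this guide).
pub const RELAY: u8 = 6;

/// Applies one decoded command; returns `true` if it was a Relay setpoint.
/// Assumes `raw_value == 0.0` means stop and anything else means run.
pub fn apply_command(engine_running: &AtomicBool, sensor_type: u8, raw_value: f64) -> bool {
    if sensor_type != RELAY {
        return false; // non-Relay commands leave the engine state untouched
    }
    engine_running.store(raw_value != 0.0, Ordering::SeqCst);
    true
}

/// Per-tick Current sample: collapses to 0 A while the engine is stopped.
pub fn current_amps(engine_running: &AtomicBool, nominal_amps: f64) -> f64 {
    if engine_running.load(Ordering::SeqCst) {
        nominal_amps
    } else {
        0.0
    }
}
```

Voltage generation never needs to read the flag, which is consistent with Voltage holding at ~230 V while Current collapses.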
**Manual two-process run.** From the repo root:

```bash
# shell 1 — server
cargo run -p substrate

# shell 2 — client
cargo run -p simulator -- --profile industrial --rate-hz 100 --count 0 --devices 4
```

Simulator flags (see `cargo run -p simulator -- --help`): `--addr`, `--server-name`, `--cert`, `--profile {single, industrial}`, `--sensor-type`, `--sensor-id`, `--rate-hz` (T1 datagram rate; `0` disables T1), `--t2-rate-hz` (T2 event rate; `0` disables T2), `--count` (T1 count; `0` = until Ctrl-C), `--devices`. **No simulator-side T3 flag** — T3 is substrate-initiated. Per-second `progress` lines show `t1_sent`/`t2_sent`/`engine={running,stopped}`.

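The `industrial` profile's fan-out can be sketched as a plain enumeration. The discriminants below are an assumption chosen to match the 0..6 sensor-id order listed under *Status*; `SensorType`, `INDUSTRIAL` and `industrial_streams` are hypothetical names.

```rust
/// Hypothetical sensor-type codes, assumed to follow the 0..6 sensor-id order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum SensorType {
    Temperature = 0,
    Humidity = 1,
    Pressure = 2,
    Voltage = 3,
    Current = 4,
    Presence = 5,
    Relay = 6,
}

/// Industrial profile: seven sensors per device; sensor_id = position in this list.
pub static INDUSTRIAL: [SensorType; 7] = [
    SensorType::Temperature,
    SensorType::Humidity,
    SensorType::Pressure,
    SensorType::Voltage,
    SensorType::Current,
    SensorType::Presence,
    SensorType::Relay,
];

/// Every (device index, sensor_id, type) stream the industrial profile emits.
pub fn industrial_streams(devices: usize) -> Vec<(usize, u16, SensorType)> {
    (0..devices)
        .flat_map(|d| {
            INDUSTRIAL
                .iter()
                .enumerate()
                .map(move |(i, &t)| (d, i as u16, t))
        })
        .collect()
}
```

With `--devices 4`, as in the manual run above, this yields 28 (device, sensor) streams, each of which maps to one ECS entity on the substrate.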
**Bidirectional netem on the CM5.** [scripts/bench-loss.sh](scripts/bench-loss.sh) applies `tc netem loss N%` bidirectionally via an `ifb` ingress-redirect (`BIDI=1` default). [scripts/verify-netem.sh](scripts/verify-netem.sh) confirms it lands on the right interface:

```bash
./scripts/verify-netem.sh <peer-ip> end0 5        # egress only
BIDI=1 ./scripts/verify-netem.sh <peer-ip> end0 5 # both directions via ifb
```

## Key references

- Prior self-citations: `plantevin2026ecs`, `plantevin2026quic` (both IEEE SWC 2026, "to appear").
- QUIC: RFC 9000 (core), RFC 9221 (unreliable datagrams).
- DT foundations: Tao et al. 2019; Grieves & Vickers 2017; Minerva et al. 2020.
- ECS: Nystrom 2014, *Game Programming Patterns*.
- Mixed-reliability transport: Peeck et al. (W2RP for DDS).
- DT sync metrics: Çakır et al. 2023 (Twin Alignment Ratio); Bellavista et al. 2023 (ODTE).
- Industrial QUIC/IIoT: Fernández et al. 2021; Boeding et al. 2025.
- Full bibliography: [paper/references.bib](paper/references.bib).