An analytical survey of the physical, economic, and architectural forces relocating compute from the data center to the industrial gateway, and an argument for why the gateway represents the current equilibrium point of the edge computing wave.
Preamble
The history of computing has been, in one reading, a sequence of oscillations between centralization and distribution. The mainframe gave way to the minicomputer, which gave way to the personal computer, which gave way to client-server, which gave way to the web, which gave way to the cloud. Each cycle has been narrated by its participants as the final synthesis. None has been.
The current oscillation — labeled, variously, edge computing, fog computing, cloudlet computing, and multi-access edge computing — represents another phase shift. Computation is migrating from hyperscaler data centers toward the points where data is generated. The popular narrative treats this as a unified phenomenon. It is not. Edge has become an umbrella term that conceals at least four distinct architectural patterns, each driven by different forces and constrained by different realities.
This article does three things. First, it disambiguates the term edge computing by proposing a layered taxonomy. Second, it surveys the physical, economic, and regulatory forces driving compute toward the network's periphery. Third, and most centrally, it argues that the gateway layer — the on-premises industrial computing tier between device and cloud — represents the current equilibrium point of this migration. The gateway is not the final destination of the edge wave, but it is where the present forces converge: hardware capability, security topology, administrative boundary, and economic incentive align here in a way that they do not align at the device edge or the network edge.
The argument is intended to be useful both to system architects designing industrial IoT deployments and to researchers studying the political economy of distributed computing.
A Taxonomy of "Edge"
Conflation is the principal source of confusion in edge computing discourse. When a hyperscaler markets edge, it typically means a regional point of presence ~50-100 km from end users. When a telco operator markets edge, it typically means compute co-located with radio access network infrastructure. When an industrial automation vendor markets edge, it typically means a gateway or PLC on the plant floor. When a microcontroller vendor markets edge, it typically means inference running directly on the sensing device. These are not the same thing, and architectural recommendations drawn from one context routinely fail when transferred to another.
A more rigorous treatment distinguishes at least four tiers:
Device edge. Computation co-resident with the sensor or actuator. Hardware ranges from 32-bit ARM Cortex-M microcontrollers (operating under 1 W) to small ARM Cortex-A application processors. Memory is typically measured in kilobytes to low megabytes. Software stacks are bare metal, RTOS-based (FreeRTOS, Zephyr), or stripped-down Linux. The workloads suited to this tier are dominated by signal conditioning, simple thresholding, lightweight inference (TinyML), and protocol translation.
Gateway / on-premises edge. Computation at an aggregation point co-located with the equipment it serves but not embedded within individual devices. Hardware is typically gateway-class: ARM Cortex-A53/A72/A76 or modest x86, with 512 MB to 16 GB of RAM, operating in the 5-25 W envelope. The defining characteristic is sufficient capability to run a full Linux distribution with containers, multiple concurrent workloads, and modest ML inference, while remaining cheap enough to deploy in quantity. This is the tier the present article principally concerns.
Network edge / multi-access edge computing (MEC). Computation operated by a network provider, typically a telecommunications carrier, hosted at or near radio access network infrastructure or aggregation sites. Standardized under ETSI GS MEC 003 [1] and related specifications. Hardware is server-class. Round-trip latency to end devices is on the order of 1-10 ms in 5G URLLC scenarios. Administrative ownership lies with the carrier, not the application owner — a fact whose significance is frequently underestimated.
Regional edge. Computation operated by hyperscalers or content delivery networks at metro-area facilities. Examples include AWS Local Zones, Azure Edge Zones, Google Distributed Cloud Edge, Cloudflare Workers, and the broader CDN compute layer (Fastly Compute@Edge, Akamai EdgeWorkers). Round-trip latency to end devices is typically 10-40 ms within-region. Hardware is data-center-class.
These tiers differ along at least seven independent dimensions: round-trip latency, energy envelope, computational capacity, administrative ownership, security perimeter, deployment density, and economic cost per unit of compute. A workload's optimal placement is determined by which dimensions it is most sensitive to, and conflating tiers obscures these trade-offs.
The remainder of this article concerns the second tier — the gateway — and the reasons it has emerged as the dominant locus of present-day edge deployment in industrial contexts.
Theoretical Drivers of the Edge Migration
Why is computation moving outward at all? Five forces, of varying age and analytical precision, account for the migration.
2.1 The Latency Floor
The most-cited driver is also the most physically inescapable: the speed of light. In fiber, signals propagate at approximately 200,000 km/s, roughly two-thirds of the vacuum speed of light. The path between Frankfurt and Northern Virginia, the geographic centers of European and North American cloud capacity, runs approximately 6,200 km each way, so a round trip covers about 12,400 km of fiber and has a theoretical minimum round-trip time of 62 ms before any switching, queuing, or protocol overhead. Observed RTTs in practice are 80-100 ms.
This floor is not a function of bandwidth, protocol design, or budget. It is a function of physics. For control loops that must close within 10 ms (a routine requirement in motion control, robotics, and certain process control applications), propagation alone limits the one-way fiber path to approximately 1,000 km, and any processing time in the loop shrinks that radius further. In practice, this means on-premises or, in the most aggressive scenarios, metro-area edge.
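The propagation floor can be reproduced with a few lines of arithmetic. The sketch below assumes the 200,000 km/s figure quoted above and ignores switching, queuing, and protocol overhead; the function names are illustrative.

```python
# Propagation-delay floor for a fiber path (ignores switching and queuing).
FIBER_SPEED_KM_PER_S = 200_000  # approx. two-thirds of c, as quoted in the text


def min_rtt_ms(one_way_km: float) -> float:
    """Theoretical minimum round-trip time for a given one-way fiber distance."""
    return 2 * one_way_km / FIBER_SPEED_KM_PER_S * 1000


def max_one_way_km(rtt_budget_ms: float) -> float:
    """Longest one-way fiber path whose propagation delay alone fits the budget."""
    return rtt_budget_ms / 1000 * FIBER_SPEED_KM_PER_S / 2


if __name__ == "__main__":
    print(min_rtt_ms(6_200))     # Frankfurt to Northern Virginia, one way
    print(max_one_way_km(10))    # 10 ms control-loop budget
```

Under these assumptions a 6,200 km one-way path gives about 62 ms, and a 10 ms budget allows about 1,000 km one way before any processing time is counted.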
A subtler point: the latency that matters is not the median (p50) but the tail (p99 or p99.9). Internet paths exhibit substantial latency variance from queuing, BGP convergence events, and microbursts. A path with 80 ms median RTT may exhibit 200-500 ms tail latency. Control systems must be designed against the tail, not the median, which further compresses the effective latency budget available for cloud-mediated control loops.
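The difference between designing against the median and against the tail can be made concrete. The following is a minimal sketch using a nearest-rank percentile; the RTT samples are hypothetical and the function names are illustrative.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) of a non-empty sample list."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def fits_budget(samples: list[float], budget_ms: float, p: float = 99.0) -> bool:
    """True only if the chosen tail percentile stays within the budget."""
    return percentile(samples, p) <= budget_ms


# Hypothetical RTT measurements in ms: mostly 80 ms with a few slow outliers.
rtts = [80.0] * 97 + [220.0, 350.0, 480.0]
```

With these samples the median sits at 80 ms, yet a 100 ms budget checked at p99 fails, which is the situation the paragraph above describes.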
2.2 Data Gravity and Bandwidth Economics
Jim Gray's "Distributed Computing Economics" [2] articulated, in 2003, what has since been re-discovered repeatedly: it is generally cheaper to ship computation to data than data to computation, once data volumes exceed a modest threshold. The principle has been re-popularized as data gravity [3].
Industrial sensing makes the arithmetic concrete. A vibration sensor on a rotating asset, sampling at 25.6 kHz across three axes with 24-bit resolution, produces approximately 230 kB of raw data per second per asset, or roughly 20 GB per day. A medium-sized plant with 500 such sensors generates roughly 10 TB per day of raw vibration data alone. Streaming this to a hyperscaler at standard egress rates is economically infeasible. Even if it were affordable, the cellular or fiber uplink would be saturated.
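The data-volume arithmetic is easy to check. The sketch below assumes uncompressed samples packed at exactly 3 bytes each and decimal units (1 GB = 10^9 bytes); the variable names are illustrative.

```python
# Raw data volume for the vibration-sensing example (decimal units).
SAMPLE_RATE_HZ = 25_600
AXES = 3
BYTES_PER_SAMPLE = 3          # 24-bit resolution, assumed tightly packed
SENSORS = 500
SECONDS_PER_DAY = 86_400

bytes_per_second = SAMPLE_RATE_HZ * AXES * BYTES_PER_SAMPLE
bytes_per_day = bytes_per_second * SECONDS_PER_DAY
plant_bytes_per_day = bytes_per_day * SENSORS

print(f"per asset: {bytes_per_second / 1e3:.1f} kB/s, {bytes_per_day / 1e9:.1f} GB/day")
print(f"plant:     {plant_bytes_per_day / 1e12:.2f} TB/day")
```

Under these assumptions the raw rate is 230.4 kB/s per asset, just under 20 GB per asset per day, and just under 10 TB per day for 500 sensors.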
The economic argument is straightforward: most raw industrial data has no analytical value in its raw form. Spectral features, envelope statistics, and anomaly scores carry the diagnostic signal. Compute that extracts these features at the gateway reduces uplink volume by two to three orders of magnitude while preserving — and often improving — analytical fidelity.
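As an illustration of this kind of reduction, the sketch below condenses one window of samples into a handful of time-domain statistics. It is a simplified stand-in: spectral and envelope features would normally need an FFT library, and the feature set and function name here are assumptions rather than a recommended pipeline.

```python
import math


def window_features(samples: list[float]) -> dict[str, float]:
    """Summarize one window of vibration samples as a few scalar features."""
    n = len(samples)
    if n == 0:
        raise ValueError("empty window")
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    rms = math.sqrt(sum(x * x for x in samples) / n)
    peak = max(abs(x) for x in samples)
    # Kurtosis (non-excess form): rises when the signal contains impulsive events.
    kurtosis = (sum(c ** 4 for c in centered) / n) / variance ** 2 if variance > 0 else 0.0
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms if rms > 0 else 0.0,
        "kurtosis": kurtosis,
    }


# One second of a synthetic 50 Hz sine sampled at 25.6 kHz (illustrative input).
signal = [math.sin(2 * math.pi * 50 * t / 25_600) for t in range(25_600)]
features = window_features(signal)
```

Here 25,600 samples per axis collapse into four scalars per window; only those scalars, plus any anomaly score derived from them, would need to leave the gateway.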
2.3 Partition Tolerance as an Operational Requirement
The CAP theorem [4][5] formalizes a constraint that industrial systems have always faced: in the presence of network partitions, distributed systems must choose between consistency and availability. The choice is not optional; it is forced by the physical reality of network failure.
For safety-critical and many process-critical applications, availability is non-negotiable. The control logic must continue to function when the wide-area link to the cloud is unreachable. The cloud may be the system of record, the orchestrator of long-term policy, and the locus of analytics, but it cannot be the locus of moment-to-moment control unless the network between cloud and asset is treated as a hard real-time fabric — which, on commercial cellular or internet links, it is not.
The architectural implication is that the control plane may live in the cloud, but the data plane — the loop that closes the control action — must reside locally. The gateway is the natural location for the data plane: close enough to the asset to be partition-tolerant, capable enough to host substantial logic.
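A minimal sketch of this split, assuming a simple threshold rule as the local control action and a caller-supplied `send` function for the uplink (both hypothetical): the decision is taken locally on every step, while reports to the cloud are queued and flushed only when the link is available.

```python
from collections import deque


class LocalLoop:
    """Closes the control loop locally; cloud reporting is best-effort."""

    def __init__(self, limit: float, max_buffered: int = 1000):
        self.limit = limit                        # policy value, e.g. pushed from the cloud
        self.outbox = deque(maxlen=max_buffered)  # oldest reports drop first when full

    def step(self, reading: float, uplink_up: bool, send) -> str:
        # Data plane: the decision never waits on the wide-area link.
        action = "shutdown" if reading > self.limit else "run"
        self.outbox.append({"reading": reading, "action": action})
        # Reporting: flush queued messages only while the uplink is available.
        while uplink_up and self.outbox:
            send(self.outbox[0])   # if send raises, the report stays queued
            self.outbox.popleft()
        return action
```

The bounded queue is a design choice for constrained storage: during a long partition the gateway keeps controlling the asset and sacrifices the oldest reports rather than exhausting memory.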
2.4 Regulatory Fragmentation and Data Sovereignty
The legal environment for cross-border data transfer has become substantially more constrained over the past decade. The General Data Protection Regulation [6], the NIS2 Directive [7], China's Cybersecurity Law and its Personal Information Protection Law, India's Digital Personal Data Protection Act, sector-specific rules (automotive in Germany, medical data in most jurisdictions, financial data nearly everywhere), and the Schrems II ruling [8] have collectively made "all data automatically flows to the cloud" an architecturally and legally precarious default.
Edge processing supports data sovereignty by enabling data minimization at the source. Personally identifiable or commercially sensitive data can be processed, transformed, and aggregated in-territory; only derived, anonymized, or aggregated outputs cross jurisdictional boundaries. This is not a marginal compliance benefit — for some workloads in some jurisdictions, it is the only legally permissible architecture.
2.5 Privacy as an Architectural Primitive
Beyond regulatory compliance, a growing class of techniques treats privacy as a first-class architectural concern. Differential privacy [9] adds calibrated noise to released statistics to bound information leakage. Federated learning [10] trains models across distributed devices without centralizing raw data. Secure aggregation [11] permits computing sums across a population without any party, including the aggregator, observing individual contributions. Confidential computing [12] protects the computation itself by running it inside hardware-isolated enclaves.
Each of these techniques shifts work toward the data's location. The gateway, as the aggregation point for a population of devices, is the natural locus for the noise addition, federated update, or secure aggregation operations these techniques require. The mathematical structure of the techniques implies the architectural location.
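The pairwise-masking idea behind secure aggregation can be shown in a toy form. This sketch is heavily simplified relative to the protocol in [11]: a single seeded random generator stands in for pairwise key agreement, no party drops out, and the function names are illustrative.

```python
import random

MODULUS = 2 ** 32


def masked_inputs(values: list[int], seed: int = 0) -> list[int]:
    """Mask each party's value with pairwise random offsets that cancel in the sum.

    Simplification: one shared RNG stands in for the pairwise key agreement
    a real protocol would use, and no party is allowed to drop out.
    """
    rng = random.Random(seed)
    masked = [v % MODULUS for v in values]
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            r = rng.randrange(MODULUS)             # secret shared by parties i and j
            masked[i] = (masked[i] + r) % MODULUS  # party i adds the mask
            masked[j] = (masked[j] - r) % MODULUS  # party j subtracts it
    return masked


def aggregate(masked: list[int]) -> int:
    """The aggregator sees only masked values, yet recovers the true sum."""
    return sum(masked) % MODULUS
```

Each mask is added once and subtracted once, so the masks vanish from the total while every individual masked value looks random to the aggregator.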
Why the Gateway, Specifically?
The forces of Section 2 push computation outward. They do not, by themselves, determine where on the path it lands. The gateway has emerged as the dominant locus in industrial settings, and the reasons are worth examining carefully.
[Figure: Gateway equilibrium across edge placement criteria (bar chart)]
3.1 The Computational Viability Threshold
The device edge is severely resource-constrained. A typical industrial sensor microcontroller operates under 100 mW, with kilobytes of RAM. It can run a quantized neural network classifier — TinyML — but cannot host a general-purpose operating system, a container runtime, or a non-trivial application stack. Its software is delivered as monolithic firmware, updated infrequently, and difficult to evolve.
The regional edge, by contrast, is capable but distant. The 10-40 ms RTT is acceptable for many applications but rules out the tightest control loops. More significantly, the regional edge is administered by a third party: workloads are subject to the operational policies of the carrier or hyperscaler hosting them, which are not always compatible with the security model of the industrial operator.
The gateway occupies a position with no equivalent on either side. A modern industrial gateway with an ARM Cortex-A53 quad-core, a neural processing unit delivering 2-6 TOPS [13][14], and 2-8 GB of RAM is capable of:
- Running a full Linux distribution
- Hosting a container runtime (containerd, Podman) or orchestrator (K3s, KubeEdge)
- Executing modest ML inference (image classification, anomaly detection, predictive maintenance models)
- Maintaining persistent application state in embedded databases
- Bridging industrial protocols (Modbus, OPC UA, EtherNet/IP) to cloud-native ones (MQTT, HTTPS, gRPC)
It is the smallest computing platform on which a general-purpose distributed systems toolchain functions, and the largest computing platform compatible with the energy and cost envelopes acceptable in industrial deployment.
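To make the protocol-bridging capability in the list above concrete, the sketch below decodes a 32-bit float from two Modbus holding registers and shapes it into a topic and JSON payload suitable for an MQTT publish. It uses only the standard library and does not talk to a device or broker; the register word order, topic layout, and field names are assumptions, and word order in particular varies between devices.

```python
import json
import struct


def registers_to_float(high: int, low: int) -> float:
    """Combine two 16-bit registers (assumed big-endian word order) into a float32."""
    return struct.unpack(">f", struct.pack(">HH", high, low))[0]


def to_mqtt_message(asset_id: str, metric: str, unit: str,
                    registers: tuple[int, int]) -> tuple[str, str]:
    """Build a (topic, JSON payload) pair; topic layout and field names are hypothetical."""
    value = registers_to_float(*registers)
    topic = f"plant/{asset_id}/{metric}"
    payload = json.dumps({"value": round(value, 3), "unit": unit})
    return topic, payload


# Registers 0x41CC, 0x0000 encode the float32 value 25.5.
topic, payload = to_mqtt_message("pump-7", "temperature", "degC", (0x41CC, 0x0000))
```

Keeping the decoding and the message shaping in separate functions lets the same payload format be reused when the source is OPC UA or EtherNet/IP rather than Modbus.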
3.2 Administrative and Security Boundary Alignment
In any non-trivial industrial network, the gateway is already the boundary between operational technology (OT) and information technology (IT). It is where firewalls terminate, where VPN tunnels emerge, where protocol translation occurs, and where access control is enforced. The IEC 62443 zones-and-conduits model [15] formalizes this: the gateway sits at a conduit between an OT zone and an external zone.
Locating compute at this boundary aligns the security perimeter with the workload perimeter. Workloads executing at the gateway can access OT data without crossing additional trust boundaries, while their outputs traverse the existing IT conduit. The alternative — running workloads in the cloud and pulling raw OT data outward — creates a new attack surface for every workload, multiplying the conduits that must be defended.
This alignment is more than convenience. In threat models drawn from real industrial incidents — Triton, Industroyer, the various ransomware events that have crossed the IT/OT boundary — the gateway is precisely the locus that must be hardened. Putting compute there does not introduce new risk; it consolidates risk that was already there.
3.3 Aggregation Density and Statistical Workloads
Many of the most economically valuable analytical workloads in industrial settings are statistical or population-scale: anomaly detection across a fleet of similar assets, predictive maintenance models trained on cohort behavior, energy optimization across an aggregate load. These workloads inherently require aggregation across multiple devices.
The device edge cannot perform aggregation by definition — it is per-device. The cloud can, but at the cost of streaming raw data and tolerating partition risk. The gateway is the smallest tier at which N-to-1 aggregation occurs naturally: it serves a population of devices, by construction. Compute resources need not be replicated per-device; they can be amortized.
This has implications for ML workloads in particular. Federated learning works at the gateway tier even when it does not work at the device tier — gateways have the memory and compute to participate, the connectivity to coordinate, and the aggregation scope to produce statistically meaningful local updates.
3.4 Workload Mobility via Orchestration
Until recently, gateway-class hardware was managed as fixed-function appliances: a firmware image, infrequently updated, with limited operational flexibility. The development over the past five years that has most changed the economics of edge computing is the maturation of lightweight orchestration: K3s [16], KubeEdge [17], OpenYurt, Akri, and the broader CNCF edge ecosystem.
These tools bring container orchestration — declarative workload specification, rolling updates, observability, service mesh — to gateway-class hardware. A gateway becomes a node in a fleet, managed with the same toolchain as the cloud. Workloads are specified declaratively (Kubernetes manifests, Helm charts) and converge to desired state via GitOps controllers (Argo CD, Flux, Fleet).
The consequence is that gateways are no longer static appliances; they are programmable computing platforms with substantial portability of workload between cloud and edge. The same container image that runs in the cloud for development can run on a gateway in production with minimal modification. This eliminates the historical penalty of edge deployment — bespoke build pipelines, manual update procedures, fragile fleet management — and is the practical enabler that has made gateway-tier edge computing economically viable at scale.
Architectural Patterns
A taxonomy and a set of drivers are insufficient; the architectures that have emerged at the gateway tier are worth examining in their own right.
4.1 Stream Processing at the Edge
Many industrial workloads are most naturally expressed as streaming computations: incoming sensor data is transformed, windowed, aggregated, and emitted as derived streams. The architectural patterns that have emerged in cloud-scale stream processing — variants of the lambda architecture [18] and the simpler kappa architecture [19] — have edge-tier analogs.
Apache NiFi has emerged as a common workflow tool at the gateway tier (its MiNiFi subproject targets exactly this niche). Single-node Kafka deployments, or lighter-weight alternatives like Redpanda, run on capable gateways. For more sophisticated stream processing, Apache Flink and newer projects such as Arroyo extend the model.
The fundamental questions at the gateway tier are the same as at cloud scale (exactly-once semantics in the presence of failures, watermarking for out-of-order data, state management across restarts), but the resource envelope is different. Gateway stream processors must run in hundreds of megabytes of RAM, not tens of gigabytes, which has produced a generation of stream processing tools designed specifically for this constraint.
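Watermarking for out-of-order data can be sketched in a few lines. The example below implements event-time tumbling windows with a fixed allowed lateness, entirely in memory; it does not address durable state or exactly-once delivery, and the class and parameter names are illustrative rather than taken from any of the tools named above.

```python
from collections import defaultdict


class TumblingMean:
    """Event-time tumbling windows with a fixed allowed lateness (in-memory only)."""

    def __init__(self, window_s: int, allowed_lateness_s: int):
        self.window_s = window_s
        self.lateness_s = allowed_lateness_s
        self.open = defaultdict(list)   # window start -> values seen so far
        self.max_event_time = 0         # assumes non-negative event times

    def add(self, event_time: int, value: float) -> list[tuple[int, float]]:
        """Ingest one event; return (window_start, mean) for windows that just closed."""
        self.max_event_time = max(self.max_event_time, event_time)
        watermark = self.max_event_time - self.lateness_s
        start = event_time - event_time % self.window_s
        if start + self.window_s > watermark:   # window still open: accept the event
            self.open[start].append(value)
        # Otherwise the event is too late for its window and is dropped.
        results = []
        for s in sorted(self.open):
            if s + self.window_s <= watermark:
                values = self.open.pop(s)
                results.append((s, sum(values) / len(values)))
        return results
```

A window is emitted only once the watermark passes its end, so an event that arrives slightly out of order still lands in the right window, while one that arrives after the window has closed is discarded.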
4.2 ML Inference at the Edge
The case for ML inference at the gateway is not primarily about training; it is about runtime. Training a model is a one-time (or periodic) cloud-scale workload. Running the trained model — inference — is a continuous, latency-sensitive workload that benefits substantially from edge placement.
The technical foundations are well established. Quantization-aware training [20] reduces model weights from 32-bit floating point to 8-bit integer with minimal accuracy loss, shrinking the weight memory footprint by 4× and speeding up inference on hardware with native integer arithmetic. Knowledge distillation [21] trains compact "student" models from larger "teacher" models. Structured pruning removes redundant parameters. Neural architecture search [22] discovers compact architectures purpose-built for edge deployment.
Runtime tooling has matured commensurately: TensorFlow Lite, ONNX Runtime, ExecuTorch, and NVIDIA TensorRT for the higher-capability gateways with discrete GPUs. Hardware acceleration has converged on dedicated neural processing units: the NPU in Rockchip's RK3588 delivers 6 TOPS [13], the Hailo-8 add-in accelerator delivers 26 TOPS [14], NXP's i.MX 8M Plus integrates a 2.3 TOPS NPU, and similar units appear across nearly all modern industrial SoCs.
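The 8-bit mapping at the heart of quantization can be illustrated directly. The sketch below applies a plain affine quantization to a list of weights after the fact; it is not quantization-aware training itself, which simulates this mapping during training, and the function names are illustrative.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float, int]:
    """Affine-quantize floats to int8; returns (q_values, scale, zero_point)."""
    lo = min(min(weights), 0.0)   # the representable range must include 0.0
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 255 or 1.0            # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)     # integer that represents 0.0
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q: list[int], scale: float, zero_point: int) -> list[float]:
    """Map int8 values back to approximate floats."""
    return [(v - zero_point) * scale for v in q]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)
```

Each weight now occupies one byte instead of four, and the reconstruction error per weight stays within one quantization step (`scale`).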
The result is that workloads that would have been considered cloud-only as recently as 2018 — real-time object detection, anomaly detection on multivariate time series, conditional speech recognition — are now routinely deployed at the gateway tier.
4.3 Federated Learning and Privacy-Preserving Aggregation
The original federated averaging algorithm [10] was designed for mobile devices, but its assumptions — partial participation, statistical heterogeneity, communication-constrained updates — apply directly to gateway-tier deployments. Each gateway trains a local update on its locally observed population of devices; periodic updates are aggregated across gateways, typically via a coordinator in the cloud, to produce a global model that benefits from cross-population information without centralizing raw data.
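The aggregation step of federated averaging [10] reduces to a sample-weighted mean of the gateways' model weights. A minimal sketch, assuming each gateway reports a flat weight vector and the number of local samples it trained on (the function name and data layout are illustrative):

```python
def federated_average(updates: list[tuple[list[float], int]]) -> list[float]:
    """Sample-weighted average of per-gateway model weights.

    Each update is (weights, n_samples); gateways that saw more data count more.
    """
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no samples reported")
    dim = len(updates[0][0])
    return [sum(w[k] * n for w, n in updates) / total for k in range(dim)]


# Two hypothetical gateways with 100 and 300 local samples.
global_weights = federated_average([([1.0, 0.0], 100), ([3.0, 2.0], 300)])
```

Only the weight vectors and sample counts reach the coordinator; the raw device data that produced them stays on each gateway.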
Practical deployment must contend with several non-trivial challenges. Statistical heterogeneity (the non-IID nature of data across gateways) degrades convergence; FedProx [23] counters it by adding a proximal term that keeps local updates close to the global model, and SCAFFOLD by using control variates to correct client drift. Communication efficiency is improved by gradient compression and quantization. Privacy attacks via gradient inversion [24] motivate differential privacy and secure aggregation [11].
The current state of the art is that federated learning is operationally viable for many industrial ML workloads, but it remains substantially more complex to deploy and operate than centralized training. The benefit must justify the operational cost, which is most often the case when centralizing the data would itself carry regulatory or commercial risk.
4.4 GitOps and Declarative Edge Management
The principle is straightforward: the desired state of the fleet is described in a version-controlled repository, and gateways converge toward that state via pull-based controllers. The controller (Argo CD, Flux) observes the gap between observed and desired state and applies the changes required to close it.
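The reconciliation step can be reduced to a pure function that compares desired and observed state and returns the actions needed to close the gap. This is a conceptual sketch, not how Argo CD or Flux are implemented; the state representation (workload name mapped to image tag) and the action names are assumptions.

```python
def reconcile(desired: dict[str, str],
              observed: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return the actions that move `observed` (workload -> image) to `desired`."""
    actions = []
    for name, image in sorted(desired.items()):
        if name not in observed:
            actions.append(("deploy", name, image))
        elif observed[name] != image:
            actions.append(("update", name, image))
    for name in sorted(observed.keys() - desired.keys()):
        actions.append(("remove", name, observed[name]))
    return actions


# Hypothetical fleet state: desired comes from the repository, observed from the gateway.
desired = {"collector": "collector:1.4", "inference": "inference:2.0"}
observed = {"collector": "collector:1.3", "legacy-bridge": "bridge:0.9"}
plan = reconcile(desired, observed)
```

Because the function depends only on the two states, a gateway that was offline for days computes the same plan on reconnect as one that never disconnected.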
For a fleet of thousands of gateways, this model has substantial operational advantages over imperative push-based management. Updates are atomic and can be rolled back. The system tolerates gateways that are temporarily offline; they catch up when reconnected. The full operational history of the fleet is captured in Git history, providing auditability that satisfies most compliance regimes.
The pattern has been borrowed nearly without modification from cloud-native operations. Its successful translation to the edge tier is one of the major operational advances of recent years.
4.5 Edge-Native State Management
Application state at the gateway must be durable across restarts, consistent under concurrent access, and synchronizable with central systems when connectivity permits. The workhorse remains SQLite, which is mature, embedded, and well-understood. For time-series data, DuckDB has emerged as a strong analytical option, while VictoriaMetrics and the single-node InfluxDB serve operational metrics workloads.
For multi-master scenarios — where multiple gateways or a gateway and the cloud both write the same logical data — Conflict-free Replicated Data Types (CRDTs) [25] provide a principled foundation for eventual consistency without centralized coordination. The literature on CRDTs is mature; productization for industrial edge use is still emerging but accelerating (Automerge, Yjs, and similar libraries are increasingly used).
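One of the simplest CRDTs described in [25], the grow-only counter, shows how replicas converge without coordination. The sketch below is a minimal in-memory version; the class layout and replica identifiers are illustrative.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        # Each replica only ever writes its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())
```

A gateway and the cloud can each count events while partitioned; after exchanging state in either order, any number of times, both report the same total.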
Open Problems
The gateway tier is not a solved system. Several open problems warrant attention.
Fleet observability at scale. A thousand gateways each emitting telemetry produce a telemetry firehose of their own. The selection of which metrics to emit, at what granularity, with what sampling, and how to aggregate them centrally without losing diagnostic fidelity is an unsolved design problem. OpenTelemetry has improved the situation at the cloud tier, but edge-specific guidance remains thin.
Confidential computing at the edge. Trusted execution environments — Intel SGX, ARM TrustZone, AMD SEV — provide hardware-isolated execution suitable for workloads handling sensitive data. Their adoption in gateway-class hardware is uneven: TrustZone is widely available but constrained in capability, while richer enclaves are largely absent from the ARM application processors that dominate the gateway market. For workloads that require enclave-level protection at the gateway, options remain limited.
Model lifecycle management. Deploying a new inference model to a fleet of thousands of gateways without breaking the workloads that depend on it requires A/B testing, canary rollouts, and rollback discipline. Tooling exists at the cloud tier; its adaptation to the edge tier is incomplete.
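One building block of a canary rollout is a stable way to decide which gateways receive a new model first. A minimal sketch, assuming gateways are identified by a string ID and cohorts are chosen by hashing (the function name and bucket scheme are illustrative, not an established tool's behavior):

```python
import hashlib


def in_canary(gateway_id: str, model_version: str, percent: int) -> bool:
    """Deterministically place a gateway in the canary cohort for one model version."""
    digest = hashlib.sha256(f"{model_version}:{gateway_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100   # stable bucket in 0..99
    return bucket < percent
```

Because the bucket depends only on the gateway ID and model version, widening the rollout from 5 to 25 percent keeps every gateway that was already in the cohort, and a different model version draws a different cohort.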
Schema evolution. When the data model evolves — a new field, a renamed metric, a different unit — gateways and cloud must remain coherent. Schema registries, contract testing, and explicit versioning are well understood in cloud-native architectures. Their consistent application across the gateway boundary is an area of ongoing engineering practice rather than a solved problem.
Energy-proportional computing. Gateways typically run 24/7. Idle power consumption matters, both for thermal envelopes and operating cost. The dynamic voltage and frequency scaling that desktop and server processors employ is less effective on the embedded SoCs that dominate the gateway market. Energy efficiency at the gateway tier is an underexplored area of systems research.
The Equilibrium Argument
To return to the central claim: the gateway is not the final destination of the edge computing wave. It is the current equilibrium point.
The forces are dynamic. Silicon density continues to improve; the device tier will absorb workloads that today require gateway-class compute. Confidential computing will mature on embedded silicon, dissolving some of the gateway's advantage in security perimeter alignment. 5G and 6G ultra-reliable low-latency communication, where actually deployed at scale, may pull some workloads toward the network edge. Each of these shifts will reapportion workloads among the four tiers.
But for the foreseeable horizon — call it five to ten years — the gateway tier is where the present forces converge. Its hardware is capable enough to run general-purpose workloads. Its administrative position aligns with established security perimeters. Its aggregation density matches the statistical structure of valuable industrial workloads. Its tooling has matured to the point where it is no longer a bespoke development effort but an extension of cloud-native operations.
For the practitioner designing an industrial IoT deployment today, the implications are direct. Treat the gateway as a first-class computing platform, not a protocol bridge. Invest in workload portability between gateway and cloud, not lock-in to either. Plan for orchestration, observability, and model lifecycle management from the outset, not as afterthoughts. The gateway tier rewards engineering investment in proportion to that investment, and penalizes deployments that treat it as a static appliance.
The edge is not where compute is going. It is where compute, in part, already lives. The gateway is where, for now, most of it is settling.
References
[1] ETSI. Mobile Edge Computing (MEC); Framework and Reference Architecture. ETSI GS MEC 003, 2016 (revised).
[2] J. Gray. "Distributed Computing Economics." Microsoft Research Technical Report MSR-TR-2003-24, 2003.
[3] D. McCrory. "Data Gravity in the Clouds." 2010.
[4] E. A. Brewer. "Towards Robust Distributed Systems." Keynote, PODC 2000.
[5] S. Gilbert and N. Lynch. "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services." ACM SIGACT News, 33(2), 2002.
[6] European Parliament and Council. Regulation (EU) 2016/679 (General Data Protection Regulation). 2016.
[7] European Parliament and Council. Directive (EU) 2022/2555 (NIS2). 2022.
[8] Court of Justice of the European Union. Data Protection Commissioner v. Facebook Ireland and Schrems (Case C-311/18), 2020.
[9] C. Dwork. "Differential Privacy." Proceedings of ICALP, 2006.
[10] H. B. McMahan et al. "Communication-Efficient Learning of Deep Networks from Decentralized Data." AISTATS, 2017.
[11] K. Bonawitz et al. "Practical Secure Aggregation for Privacy-Preserving Machine Learning." ACM CCS, 2017.
[12] Confidential Computing Consortium. A Technical Analysis of Confidential Computing. 2021.
[13] Rockchip Electronics. RK3588 Technical Reference Manual. 2022.
[14] Hailo Technologies. Hailo-8 AI Accelerator Datasheet. 2021.
[15] International Electrotechnical Commission. IEC 62443-3-3: Industrial Communication Networks — Network and System Security — Part 3-3: System Security Requirements and Security Levels. 2013.
[16] Rancher Labs / SUSE. K3s: Lightweight Kubernetes. https://k3s.io
[17] CNCF. KubeEdge: A Kubernetes-native Edge Computing Framework. https://kubeedge.io
[18] N. Marz and J. Warren. Big Data: Principles and Best Practices of Scalable Realtime Data Systems. Manning, 2015.
[19] J. Kreps. "Questioning the Lambda Architecture." O'Reilly Radar, 2014.
[20] B. Jacob et al. "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference." CVPR, 2018.
[21] G. Hinton, O. Vinyals, and J. Dean. "Distilling the Knowledge in a Neural Network." NeurIPS Deep Learning Workshop, 2014.
[22] B. Zoph and Q. V. Le. "Neural Architecture Search with Reinforcement Learning." ICLR, 2017.
[23] T. Li et al. "Federated Optimization in Heterogeneous Networks." MLSys, 2020.
[24] L. Zhu, Z. Liu, and S. Han. "Deep Leakage from Gradients." NeurIPS, 2019.
[25] M. Shapiro et al. "Conflict-Free Replicated Data Types." SSS, 2011.
This article is part of the Modibus technical series. The Modibus MB213 gateway platform, with its containerized application environment, NPU-accelerated inference capability, and declarative fleet management, is engineered around the architectural patterns described here. For technical discussion or platform inquiries, contact [info@modibus.com](mailto:info@modibus.com).