Synthetic Data Simulation
Today’s Internet infrastructure is always-on: routers and switches draw nearly the same power when idle as they do at peak load. The majority of their consumption is fixed idle draw, which means large portions of the backbone keep burning energy even during the deep traffic valleys of nights and weekends.
Therefore, we want to understand what network control strategies (or protocol modifications) would allow us to safely power down routers or links during low-demand periods, and how much energy savings this could realistically yield in practice—without harming performance.
To explore this, we designed a discrete-time network simulator that lets us test different sleep-mode strategies under realistic traffic conditions. The simulator models a 100-node, scale-free (ISP/WAN-like) topology, where each node can be standard hardware or an upgraded, more efficient variant. Traffic follows a weekly residential-plus-business pattern, with prime-time peaks, late-night valleys, heavy-tailed flow sizes, and optional overlays for heavy users and live events.
Our goal is to determine whether statements of the following form can be validated:
“Turning down up to X% of nodes during low-demand periods can save Y% of energy. Achieving equivalent savings would otherwise require upgrading Z% of the hardware fleet.”
In other words, we want to quantify an equivalence between software interventions and hardware upgrades: how much energy can be saved with intelligent, SDN-style control alone, and how those savings compare to traditional capex-driven efficiency gains.
Related Work
The idea of reducing the energy footprint of communication networks has a long history in the “green networking” literature. Early studies established that routers and switches are not energy-proportional: their power draw remains nearly constant across load levels, with idle devices often consuming 70%–90% of peak power.
This motivated proposals to make networks more energy-aware by putting idle components to sleep, scaling link rates, or both. For example, Gupta and Singh [SIGCOMM 2003] argued that interfaces and components should support low-power states, while Nedevschi et al. [NSDI 2008] demonstrated—using both simulations and real topologies such as Abilene and Intel—that sleeping and rate adaptation could substantially cut backbone energy use while bounding queueing delay.
Standardization followed, most notably IEEE 802.3az Energy-Efficient Ethernet (EEE) [IEEE Std 802.3az-2010], which introduced Low-Power Idle (LPI) at the PHY layer with non-zero wake/refresh times. These mechanisms showed that sleep states are technically feasible, but also highlighted their limitations: wake-up penalties are non-negligible, and careful coordination is required to avoid harming latency-sensitive traffic.
With the rise of Software-Defined Networking (SDN), researchers have explored how centralized control can dynamically consolidate traffic and selectively disable underutilized links or switches. Studies such as Heller et al. [NSDI 2010], Bianzino et al. [Computer Networks 2012], and Chiaraviglio et al. [IEEE Communications Surveys & Tutorials 2017] showed that energy-aware traffic engineering using SDN can reduce network power consumption in both ISP backbones and data centers while maintaining throughput and connectivity.
This line of work confirms that software control hooks exist today that could implement energy-saving policies in practice. However, most of these studies focus narrowly on maximizing energy reduction, without systematically comparing software-based approaches against other efficiency levers.
Our contribution is to revisit this space with a new angle: treating software-only sleeping policies as a first-class alternative to hardware upgrades, and explicitly quantifying the equivalent savings between the two. Whereas prior work demonstrated the feasibility of sleeping and SDN-based consolidation, we ask:
“How much energy can be saved in practice with conservative, SLO-guarded sleeping strategies—and how does this compare to the savings from replacing equipment with more efficient hardware?”
This framing matters in 2025 and beyond, because operators face both urgent decarbonization goals and practical constraints: hardware refreshes are costly, slow, and come with embedded emissions, while software interventions can be deployed quickly and at scale. By casting the problem as a trade-off between software control vs. hardware upgrades, our work highlights a novel and relevant decision axis for operators and policymakers.
Traffic Profile
Energy savings from sleeping only matter if traffic is uneven: networks burn nearly the same energy at 03:00 as at 21:00, even though demand can differ by an order of magnitude. To capture this effect, we model a 7-day synthetic traffic profile that mixes residential and business demand:
- Residential traffic: dominates evenings, with a sharp peak around 21:00 local time (streaming/video), a steady decline overnight, and a trough between 02:00 and 05:00.
- Business traffic: demand peaks mid-day on weekdays between 11:00 and 14:00, then falls off after working hours and on weekends.
- Weekend effect: Residential demand is slightly higher, while business use is lower.
- Heavy users & bursts: A small fraction of users generate very large flows (Pareto-heavy tail), and optional “event overlays” can model live spikes (e.g., sports).
This profile is inspired by Cloudflare Radar, which publishes per-country Internet traffic statistics. For the U.S., Radar shows a deep night-time valley, a strong evening peak, and business-day mid-day bumps—consistent with the pattern our synthetic model reproduces.
By running the simulator over a full week (7 days × 24 hours), we capture both diurnal cycles and weekday/weekend differences. This makes it clear that the opportunity for savings lies in valleys, not peaks: software policies can safely turn down equipment when capacity is under-used, then wake it before the next surge.
The chart below shows the normalized residential and business components of the synthetic profile, repeated across a 7-day week. Residential (blue) dominates evenings, while business (red) drives mid-day weekdays.
Configuration
Our simulator is designed to capture the main ingredients that matter for energy vs. performance in backbone networks. Below we outline its components, the assumptions we make, and the parameters we sweep in the experiments. Full source code is available here.
Topology
Topology affects path redundancy, which in turn governs how much capacity can be safely powered down:
- Scale-free graph (default). We model the network as a 100-node, power-law degree graph, with a few high-degree “hubs” and many low-degree leaves. This structure resembles ISP/WAN backbones, where hubs represent core routers and leaves represent regional aggregation.
- Fat-tree (optional). The simulator also supports Clos/fat-tree topologies, which resemble data center fabrics (pods, spines, leaves) with uniform path diversity. Fat-tree typically permits more safe sleeping because of richer alternative routes. A construction sketch for the default topology follows this list.
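As an illustration (assuming a networkx-style graph construction, which is one natural way to build it), the scale-free default could look roughly like this:

```python
import networkx as nx

def build_topology(n=100, m=2, seed=42, spine_frac=0.18):
    """Scale-free (Barabási-Albert) graph: a few hubs, many leaves."""
    g = nx.barabasi_albert_graph(n, m, seed=seed)
    # High-degree hubs stand in for core routers; low-degree leaves for
    # regional aggregation. The top spine_frac of nodes is never slept.
    hubs = sorted(g.nodes, key=g.degree, reverse=True)[: int(spine_frac * n)]
    return g, set(hubs)

g, spine = build_topology()
print(nx.is_connected(g), len(spine))   # True 18
```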
Node & Link Energy Model¶
Vendor numbers vary, but literature agrees most power is fixed idle draw, not load-sensitive. Wake transitions are non-zero and must be modeled:
- Baseline hardware (Juniper QFX5120-like): about 210 W at idle and 60 W dynamic, with 10 W at sleep and 20 ms of wake penalty.
- Upgraded hardware (Arista 7050X-like): around 40% lower idle/dynamic consumption, with modest latency and capacity improvements.
- Links (EEE model): about 2 W at idle, 0.1 W at sleep, 2 ms of wake penalty, and per-link sleep only allowed if utilization is below thresholds.
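In code, these profiles and the per-tick energy accounting they imply might look like the following sketch. The numbers are copied from the list above; the upgraded profile’s sleep draw is an assumption (kept equal to baseline), and the function itself is illustrative:

```python
from dataclasses import dataclass

@dataclass
class HwProfile:
    idle_w: float    # fixed draw while awake, independent of load
    dyn_w: float     # additional draw at 100% utilization
    sleep_w: float   # draw while asleep
    wake_ms: float   # wake transition penalty

BASELINE = HwProfile(idle_w=210.0, dyn_w=60.0, sleep_w=10.0, wake_ms=20.0)
UPGRADED = HwProfile(idle_w=126.0, dyn_w=36.0, sleep_w=10.0, wake_ms=20.0)  # ~40% lower draw
LINK_EEE = HwProfile(idle_w=2.0, dyn_w=0.0, sleep_w=0.1, wake_ms=2.0)

def energy_wh(p: HwProfile, util: float, asleep: bool, tick_min: float = 5.0) -> float:
    """Energy for one tick: sleep draw, or idle plus load-proportional draw."""
    watts = p.sleep_w if asleep else p.idle_w + p.dyn_w * util
    return watts * tick_min / 60.0
```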
Sleep Policies
These mechanisms enforce operator-safe bounds. Without them, aggressive sleeping would destabilize routing or increase loss/latency:
- Adaptive consolidation. At low load, we re-rank nodes by utilization, keep a small “spine” of high-degree nodes, and greedily sleep the rest subject to guardrails.
- Guardrails.
  - Adaptive guard (default): require `capacity > headroom × (last tick’s aggregate load / ρ_target)`.
  - Fixed guard: enforce a minimum percentage of baseline capacity.
- Articulation avoidance. Nodes critical to connectivity are never slept.
- SLO guardrail. If rolling latency/drops exceed thresholds, nodes are woken automatically.
- Hysteresis. Nodes/links must stay in a state for at least 5 ticks before flipping, reducing churn.
- Link-level sleep. Links can enter low-power idle (LPI) if underutilized, with their own hysteresis.
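Putting these pieces together, here is a compressed sketch of one consolidation tick. Hysteresis and the SLO auto-wake are omitted for brevity; `util`, `capacity`, and `last_load` are hypothetical per-tick inputs, and articulation points are checked once per tick here rather than re-validated after each sleep decision:

```python
import networkx as nx

def consolidation_step(g, util, capacity, spine, last_load,
                       rho_target=0.60, headroom=1.20, sleep_cap=0.5):
    """Return the set of nodes this tick is allowed to put to sleep."""
    required = headroom * last_load / rho_target             # adaptive guard
    protected = set(spine) | set(nx.articulation_points(g))  # never slept
    candidates = sorted((n for n in g if n not in protected),
                        key=lambda n: util[n])               # least loaded first
    asleep, alive_cap = set(), sum(capacity.values())
    for n in candidates:
        if len(asleep) >= sleep_cap * g.number_of_nodes():
            break                                            # sleep cap reached
        if alive_cap - capacity[n] < required:
            continue                                         # would breach guard
        asleep.add(n)
        alive_cap -= capacity[n]
    return asleep
```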
Routing & Performance¶
Routing determines which capacity is actually usable at any moment, so our shortest-path and queueing model is the bridge between sleeping decisions and user-visible outcomes. If sleeping pushes load onto longer or more congested paths, it will show up immediately:
- Routing: Shortest-path by latency among alive nodes/links. Flows drop if no path exists.
- Queueing proxy: We approximate delay inflation as `base_latency / (1 − ρ)`. Links beyond `ρ ≈ 0.99` cap out.
- Performance metrics: We monitor `latency` (`mean`, `p50`, `p95`), `packet drops`, `p95 link utilization`, `p95 queue delay`, `number of flaps`, and `SLO breach ticks`.
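The queueing proxy is small enough to state directly (a sketch; the cap mirrors the `ρ ≈ 0.99` limit above):

```python
def effective_latency_ms(base_latency_ms: float, rho: float,
                         rho_cap: float = 0.99) -> float:
    """First-order delay inflation: base / (1 - ρ), capped near saturation."""
    rho = min(max(rho, 0.0), rho_cap)
    return base_latency_ms / (1.0 - rho)
```

For example, a 5 ms link at ρ = 0.5 looks like 10 ms, and at ρ = 0.9 like 50 ms, which is how sleeping-induced detours surface in the latency metrics.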
Sweep
We sweep two dimensions:
- Sleep caps: `0–90%` of nodes (in `10%` steps).
- Upgrade fractions: `0–100%` upgraded nodes (in `10%` steps).
For each run, we record energy, latency, drops, and ops metrics, and generate the six baseline charts plus the equivalence analysis.
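In code, the sweep reduces to two small loops (a sketch; `run_simulation` is a hypothetical stand-in for the entry point in simulate.py):

```python
def run_simulation(sleep_cap: float, upgrade_frac: float) -> dict:
    """Hypothetical stand-in for simulate.py's entry point."""
    return {"sleep_cap": sleep_cap, "upgrade_frac": upgrade_frac}  # + metrics

# Software sweep: sleep caps 0-90% with no upgrades.
sleep_runs = [run_simulation(sleep_cap=c / 100, upgrade_frac=0.0)
              for c in range(0, 100, 10)]
# Hardware sweep: upgrade fractions 0-100% with no sleeping.
upgrade_runs = [run_simulation(sleep_cap=0.0, upgrade_frac=u / 100)
                for u in range(0, 110, 10)]
```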
Design Decisions
We intentionally choose conservative defaults:
- Guards keep `p95 link utilization < 1` and queueing bounded.
- Articulation avoidance preserves connectivity.
- Hysteresis reduces sleep/wake oscillations.
- Hardware profiles are realistic mid-market values (not worst-case or best-case).
This ensures that reported savings are plausible for operators, not optimistic artifacts of overly aggressive assumptions.
Experiment Parameters
To generate the results shown in the next section, we ran the simulator over a full week (7 days) at 5-minute tick resolution, producing a detailed view of both weekday and weekend behavior.
Here is the exact command we used:
```bash
python3 simulate.py \
  --days 7 \
  --tick-min 5 \
  --topology scale_free \
  --policy adaptive \
  --pairs 160 \
  --guard-mode adaptive \
  --rho-target 0.60 \
  --headroom 1.20 \
  --spine-frac 0.18 \
  --res-share 0.85 \
  --heavy-user-share 0.015 \
  --emit-timeseries \
  --log-progress
```
Explanation of key parameters
- Horizon (`--days 7`): Simulates a full 7-day week to capture weekday/weekend differences.
- Granularity (`--tick-min 5`): Each tick represents 5 minutes, giving 2016 ticks per week. This is fine enough to capture diurnal patterns while keeping runs tractable.
- Topology (`--topology scale_free`): A 100-node scale-free graph models WAN/ISP-like backbones (a few high-degree hubs, many leaves). This structure tends to produce realistic bottlenecks.
- Policy (`--policy adaptive`): Adaptive sleeping with consolidation, which ranks nodes by utilization, keeps a minimal “spine” of hubs, and sleeps the rest subject to guardrails.
- Traffic pairs (`--pairs 160`): Each tick samples about 160 random origin–destination (OD) pairs with heavy-tailed volumes. This defines the offered load. Lowering this number reduces load and makes sleeping easier, while increasing it stresses the network. We keep 160 for realism.
- Guard mode (`--guard-mode adaptive`): Uses utilization-aware guards rather than a fixed fraction of capacity. At each tick, `required capacity ≈ headroom × (U_last / ρ_target)`, where:
  - `ρ_target = 0.60`: tries to keep link utilization under 60% on average (conservative).
  - `headroom = 1.20`: provides a 20% cushion for bursts and measurement error.
  - Together, this pair is a “vanilla conservative” setting an operator would accept: with these defaults, required capacity ≈ 1.20/0.60 = 2× the last tick’s load.
- Spine fraction (`--spine-frac 0.18`): Ensures at least 18% of high-degree nodes are always on, even in valleys. This prevents over-pruning the backbone.
- Residential share (`--res-share 0.85`): The traffic mix is 85% residential and 15% business. This matches Cloudflare Radar data showing that residential traffic dominates U.S. fixed broadband.
- Heavy user share (`--heavy-user-share 0.015`): About 1.5% of flows are boosted to mimic “power users” who generate multiple TB/month. This captures the heavy-tail distribution observed in practice.
- Emit timeseries (`--emit-timeseries`): Writes per-tick energy and sleep data for post-processing (e.g., the hourly savings chart).
- Log progress (`--log-progress`): Prints simulation progress every 10% of ticks for long runs.
What the simulator records
As mentioned earlier, for each sleep-cap run (0–90% in 10% steps), the simulator outputs:
- Energy (Wh), reported as % savings vs. the all-on baseline.
- Latency (mean, p50, p95), representing user-visible delay.
- Drops as proxy for throughput not delivered.
- Utilization headroom, where `p95(max link ρ)` ensures links do not saturate.
- Queueing delay, where `p95(max link queue ms)` guards against hidden congestion.
- Stability metrics, such as total sleeps/wakes, short flaps, and SLO breach ticks.
- Structural safety, tracked as the articulation nodes that were never slept and the average alive-capacity fraction.
These are stored in per-chart CSV files (for visualization) and per-cap JSON files (for ops safety analysis).
Results
Energy Consumption vs. Sleep Cap
Figure 2 — Energy Consumption vs. Sleep Cap (Weekly, % of Baseline).
What “sleep cap” means. A sleep cap of X% means the policy is allowed to sleep up to X% of nodes at the same time, but will only do so when the guardrails (utilization target and headroom, articulation avoidance, SLO auto-wake) say it is safe. In practice, the actual number of sleeping nodes varies with demand—fewer at peaks, more in valleys.
What to look for. The line shows energy consumption as a percentage of the 0% cap baseline (100% at the left). As the allowed sleep cap increases, weekly energy declines—gently at first (guard clamps during busy hours) and more noticeably as the policy can harvest the night-time valleys. In the current run, the weekly average energy consumption falls to 90.7% at a 50% cap (about 9.3% in aggregate energy savings, with fewer than 10 nodes sleeping on average) and 80.8% at a 90% cap (about 19.2% in aggregate energy savings, with about 20 sleeping on average). As a reminder, the weekly energy savings aggregate valleys and peaks: if we plot hourly savings, we will see 25%–35% reductions overnight when sleeping is most active.
Why this is significant. Unlike a fixed “capacity-alive” rule, the adaptive, SLO-based guard lets the network sleep more in valleys and clamp in peaks, so we harvest savings where users will not feel it. That is the operating point an ISP would want: conservative at busy times, opportunistic at night—delivering between 9% and 19% weekly reductions without touching hardware.
Latency vs. Sleep Cap
Figure 3 — Latency vs. Sleep Cap (Weekly, % of Baseline).
What to look for. We plot latency as a percentage change vs. the 0% cap baseline (week-aggregate). Changes are extremely small: the `mean` and `p50` are within a ±0.1% range, and `p95` does not exceed +0.25%. The slight uptick around 20%–40% cap can happen when removing low-utilization peripheral nodes marginally shifts traffic onto the core (slightly higher base path length or queueing). At higher caps, consolidation and the adaptive guard keep performance flat again. This is ideally what we want to see: guardrails preserve SLOs while sleeping harvests energy in valleys.
Why this is significant. Expressed relative to the baseline, the curves show no meaningful degradation: the policy’s adaptive guard and SLO auto-wake keep the mean and tail latency essentially flat even as energy drops in Figure 2. In other words, the simulator demonstrates the operational safety of sleeping: the network saves energy in valleys without pushing users into higher delay.
Drops vs. Sleep Cap
Figure 4 — Drops vs. Sleep Cap (Weekly, % of Baseline).
What to look for. We plot drops as a percentage change vs. the 0% cap baseline (week-aggregate). In our run, drops rise only about +3.6% from the 0% sleep cap to the 90% cap. The line increases gently because the offered load dominates, and the adaptive guard clamps sleeping during busy hours. Any small increase reflects (a) occasional detours when some periphery is slept and (b) the fact that we count conservative queueing rejections (`ρ > ~1.2`) as drops.
Why this is significant. Even as we see meaningful energy reductions when the sleep cap increases, drops barely move (a few percentage points across the whole week). That is ideally what we want: the SLO guard and adaptive utilization guard grant sleep opportunities in valleys while preserving reliability during peaks.
Energy Consumption vs. Hardware Upgrades
Figure 5 — Energy Consumption vs. Upgraded Nodes (Weekly, % of Baseline).
What “node upgrades” means. Here we model replacing a fraction of nodes with a more efficient hardware profile (lower idle/dynamic power, slight latency/capacity improvement). No topology changes: routing and traffic stay the same. This is an alternative path to energy savings, with its own costs (procurement, install windows, embedded emissions).
What to look for. Energy consumption falls in a roughly linear fashion as more of the fleet is upgraded. In our run, a full upgrade (100% of nodes) reduces weekly energy consumption to about 60% of the baseline. This 40% reduction represents the upper bound of what this hardware profile can achieve. The smooth, monotone decline also shows that each incremental upgrade contributes steadily to savings, without sudden thresholds or diminishing returns in this range.
Why this matters. This chart illustrates the hardware path to energy savings: replacing switches/routers with more efficient models directly reduces idle and dynamic draw. The gains are significant, but they come at the cost of capital expenditure, rollout time, and embedded emissions from manufacturing. Compared to software-based sleeping, hardware upgrades deliver deeper long-term reductions, but they require greater investment and slower deployment.
Latency vs. Hardware Upgrades
Figure 6 — Latency vs. Upgraded Nodes (Weekly, % of Baseline).
What to look for. We plot the percentage change in latency vs. the 0%-upgrade baseline. In our run, the curves are monotone down: by 100% upgrades, `mean` latency improves by about 10.9%, and the tail (`p95`) improves by 10.6%. Even at 50% upgrades, we already see a noticeable improvement (about −2.1% for the mean). This is the expected behavior: faster boxes and modest capacity gains reduce both base path delay and queueing.
Why this matters. We saw earlier that sleeping preserves performance (a flat latency trend), while this figure shows hardware upgrades improve it. Together with the energy results in Figure 5, this completes the story: software saves energy quickly with no performance penalty, while hardware saves energy and also reduces latency, at a higher cost in time, capital, and embedded emissions. They are complementary levers.
Drops vs. Hardware Upgrades
Figure 7 — Drops vs. Upgraded Nodes (Weekly, % of Baseline).
What to look for. We plot drops as a percentage change vs. the 0%-upgrade baseline. In our run, the curve is essentially flat to slightly negative: moving from 0% to 100% hardware upgrades reduces weekly drops by only 0.04% (from 92,215 to 92,179 in absolute terms). That tiny downward drift is expected: upgrades add a bit of capacity and trim base latency, so marginal queueing rejections become slightly less likely.
Why this is significant. Together with Figure 6, this confirms that hardware upgrades improve performance (lower latency and a hair fewer drops) while energy consumption falls almost linearly. In other words, the hardware upgrade path buys both energy savings and light reliability gains. This complements the software path (sleeping control strategies), which preserved drops while delivering immediate energy savings.
Discussion
Our results suggest that software-defined sleeping can deliver meaningful energy savings without harming performance, while hardware upgrades yield both energy savings and latency improvements. Several factors influence the magnitude and robustness of these findings:
- Topology realism. We evaluated scale-free topologies (ISP/WAN-like). Data center fat-tree topologies would likely permit more safe sleeping due to richer path diversity, but the qualitative equivalence pattern between software and hardware interventions should remain.
- Guardrails. Our adaptive guard and SLO auto-wake mechanism enforce conservative safety margins. Loosening thresholds (e.g., higher `ρ_target`, lower `headroom`) increases savings but trades away burst slack. Operators can tune these knobs based on risk appetite.
- Hardware profiles. We modeled mid-market switches: Juniper QFX5120-like (baseline) and Arista 7050X-like (upgrade). Other fleets will differ in absolute percentages, but the equivalence pattern persists.
- Limitations. Queueing is modeled as a first-order approximation: we do not simulate control-plane instabilities or failures. Results are energy-focused—latency improvements from hardware upgrades appear as a side effect. Carbon-aware scheduling (time-varying intensity) is not yet incorporated.
With conservative guardrails, a software-only sleeping approach yields 9% energy savings at a 50% sleep cap and 19% at a 90% sleep cap, with no performance harm. Achieving the same savings via hardware upgrades requires replacing 23.5% and 48.6% of the network fleet, respectively, while also improving latency.
The equivalence curve below summarizes this key trade-off—for any sleep cap, it shows the fraction of hardware upgrades that would deliver the same energy saving:
Figure 8 — Software Sleep Cap vs. Hardware Upgrade Equivalence.
What to look for. The equivalence curve shows, for each sleep cap, the fraction of hardware upgrades needed to achieve the same energy savings. In our run, a 50% sleep cap delivers 9% in weekly energy savings, equivalent to upgrading 23.5% of the network fleet. At a 90% sleep cap, we achieve 19% in energy savings, which equates to 48.6% of the fleet upgraded. The relationship is smooth and monotone: higher sleep caps map to steadily larger fractions of upgrades.
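Mechanically, the equivalence curve comes from inverting the (monotone) upgrade-savings curve at each sleep cap’s savings level. Here is a sketch under the simplifying assumption that the upgrade curve is exactly linear; Figure 5 shows it is only roughly so, which explains the small differences from the reported 23.5% and 48.6%:

```python
import numpy as np

upgrade_fracs = np.linspace(0.0, 1.0, 11)   # 0-100% in 10% steps
upgrade_savings = 0.40 * upgrade_fracs      # ~linear, 40% saved at full upgrade

for cap, savings in [(0.5, 0.093), (0.9, 0.192)]:   # headline points, Figure 2
    # Invert the monotone upgrade curve: savings -> fleet fraction.
    frac = np.interp(savings, upgrade_savings, upgrade_fracs)
    print(f"sleep cap {cap:.0%} ≈ upgrading {frac:.1%} of the fleet")
# -> sleep cap 50% ≈ upgrading 23.2% of the fleet
# -> sleep cap 90% ≈ upgrading 48.0% of the fleet
```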
Why this is significant. This figure quantifies the core trade-off: software sleeping can immediately substitute for a substantial portion of hardware upgrades in terms of energy savings, without substantial capital expenditures, rollout delays, or embedded emissions. Hardware still has long-term benefits, especially for latency, but the chart makes clear that operators can achieve a meaningful share of those energy savings via control policies alone.
Broader implications. Wang et al. [ISCA’24] show that cloud operators face a tension: minimizing emissions requires heterogeneous fleets (deploying newer, more efficient SKUs where grids are carbon-intensive, while tolerating older, less efficient ones where grids are clean), but SKU diversity creates operational headaches in procurement and lifecycle management.
The parallel here is that software control can mitigate some of this complexity for network operators:
- In regions where the grid is already clean, software sleeping avoids generating additional embedded emissions from unnecessary hardware replacement.
- In regions where the grid is carbon-intensive, hardware upgrades deliver real decarbonization gains.
Taken together, the strategy is hybrid: use software interventions where hardware replacement would not cut net emissions, and target hardware upgrades where they matter most. This approach balances immediate, low-cost savings with deeper, targeted reductions, without imposing unmanageable operational complexity.
Going one step further, nothing prevents operators from stacking both approaches—applying sleeping policies even after hardware upgrades—since upgraded hardware still consumes non-zero energy. In theory, compounded savings are possible.
Conclusion
We began with a simple observation: today’s Internet infrastructure is always-on. Routers and switches draw nearly the same power when idle as at peak load, with most of their consumption locked into fixed idle draw. As a result, large portions of the backbone waste energy during the valleys of night-time and weekend traffic.
Our guiding research question was whether network control strategies—implemented in software—could safely power down routers or links during low-demand periods, and how much energy this could realistically save without harming performance. To explore this, we built a discrete-time simulator of a 100-node, scale-free (ISP/WAN-like) topology, with weekly diurnal traffic mixes and configurable hardware profiles. The goal was to test whether statements of the following form could be validated:
“Turning down up to X% of nodes during low-demand periods can save Y% of energy. Achieving equivalent savings would otherwise require upgrading Z% of the hardware fleet.”
Our results provide a clear answer. With conservative guardrails in place, software-only sleeping achieves 9% weekly savings at a 50% sleep cap and 19% at a 90% cap, with no measurable impact on latency or drops. To achieve the same reductions through hardware alone, operators would need to replace 23.5% and 48.6% of the fleet, respectively. Hardware upgrades deliver deeper reductions (up to 40% in our modeled profiles) and improve latency, but they require capital expenditure, longer rollout times, and come with embedded emissions.
The two approaches are therefore complementary rather than competing. Software strategies offer immediate, low-cost reductions deployable via over-the-air updates, buying time and savings today. Hardware upgrades provide deeper efficiency gains and performance improvements over the long term. And because upgraded hardware still consumes power, operators can layer software sleeping on top of hardware upgrades, compounding the effect. Together, they form a hybrid path: software where embedded emissions would outweigh hardware benefits (e.g., in regions with already clean grids), and hardware where decarbonization is most urgent.
Improving the energy efficiency of backbone networks is not optional—it is central to the sustainability of the Internet. After all: “No energy is greener than the energy we don’t use.”
Future Work
Building on these results, several next steps can deepen realism and bring the approach closer to deployment:
- Real traces. Ingest traffic matrices from GÉANT, Abilene, or CAIDA to validate results on real topologies and demand patterns. Compare hourly and weekly equivalence curves.
- Emulation & lab experiments. Prototype an SDN controller that implements consolidation and the SLO guard, and measure real switch/router power consumption under dynamic sleep/wake policies.
- Carbon-aware scheduling. Extend the guard to incorporate time-varying grid carbon intensity. Prioritize sleeping during high-carbon hours, and combine with renewable-friendly routing.
Together, these directions form the roadmap for the next stage of this project: moving from synthetic simulation to trace-driven validation, then toward real-world emulation and carbon-aware deployment.