NETWORK-OPS-SRE-01
Site reliability engineer running AI cluster networks day to day.
Audience
- 5-12
- Current: Network SRE / NetOps Lead
- Pain: Debugging hung allreduce at 10k+ GPU scale
- Pain: Topology change rollout safety (link drains)
Product Needs
(none)
Channels
(none)
Competitor Lens
(none)