Site Reliability Engineer · Austin, TX

Hyunsuk Bang

I build and operate large-scale Kubernetes platforms at Tesla, with a focus on networking, eBPF, and the reliability of the underlying infrastructure.

About

I'm a Site Reliability Engineer at Tesla with a strong passion for computer networking and platform engineering. I enjoy translating systems concepts into practical infrastructure — Kubernetes, Cilium, eBPF, and the tooling that keeps clusters healthy at scale.

Experience

Site Reliability Engineer

Tesla

2025 — Present

  • Automated lifecycle management of Kubernetes clusters running NVIDIA and AMD GPUs, including health monitoring and auto-remediation of unhealthy nodes.
  • Contributed to Cilium (CNCF): fixed a race condition in the IPsec key rotation path that caused packet drops during rolling restarts of the agent. The fix uses pinned BPF maps to defer peer advertisement until kernel XFRM states are ready (PR #44701).
  • Drove the migration from kube-proxy to Cilium's kube-proxy-replacement across all clusters, auditing the Cilium codebase to validate impact and correctness of service routing.
  • Led the migration of Kubernetes clusters from Ingress NGINX to Gateway API + Envoy.
  • Enabled secure arbitrary code execution for AI workloads by architecting a sandboxed runtime on Kata Containers backed by Firecracker microVMs — VM-level isolation inside Kubernetes for running untrusted code.

Site Reliability Engineer Intern

Tesla · Fremont, CA

May 2024 — Dec 2024

  • Migrated compute-intensive Kubernetes clusters from AWS to an on-prem data center, achieving up to 90% cost savings.
  • Built a tool that queries Mimir and Prometheus across 300+ Kubernetes clusters to identify over- and under-utilized workloads and automatically email their owners.
  • Analyzed metric cardinality in Mimir to achieve a 10% reduction in cluster workload, and built a Grafana dashboard for ongoing high-cardinality monitoring.
  • Streamlined Cilium maintenance on ArgoCD and authored technical documentation and best practices.
  • Used Prometheus, Grafana, OpenTelemetry, and Tempo to collect, visualize, and analyze metrics and traces for faster incident resolution.

Research Assistant

Illinois Institute of Technology · Chicago, IL

May 2023 — May 2025

  • Researched and built test plans to evaluate the performance and accuracy of a BPF compiler.
  • Designed packet-processing applications reaching 10 Gbps in-kernel and 100 Gbps with DPDK, using Receive Side Scaling and parallel processing.
  • Built automation scripts to ensure replicability of test results across runs.
  • Collaborated with Fermilab on a joint project that leveraged DPDK to generate measurement output.
  • Documented findings, methodologies, and procedures in technical reports and presentations.

Publications

Patchwork: A Traffic Capture and Analysis Platform for Network Experiments on a Federated Testbed

Nishanth Shyamkumar, Hyunsuk Bang, Bjoern Sagstad, Prajwal Somendyapanahalli Venkateshmurthy, Sean Cummings, Nik Sultana

ACM Internet Measurement Conference (IMC '25) · Madison, WI · October 2025

Patchwork is an open-source, user-deployed traffic capture and analysis platform that runs as an experiment on the FABRIC federated testbed. It offloads packet processing to FPGA NICs and DPDK to scale to line rate, and has been running weekly for over a year to produce a testbed-wide profile of how researchers collectively use FABRIC's network.

A Survey on Packet Filtering

Nik Sultana, Hyunsuk Bang, Elena Yulaeva, Ricky K. P. Mok, kc claffy, Richard Mortier

ACM SIGCOMM Computer Communication Review · Vol. 54, Issue 3 · July 2024

A community survey (91 participants) of how packet filtering tools and techniques are used across the networking research and practitioner communities. The paper surfaces prevalent tools, pain points, and unmet needs, and proposes future research directions in packet filtering.

Projects

pcap → BPF Compiler & Simulator

C, Go · bpfsimulator.com

CRA Honorable Mention

  • Built a compiler that translates pcap filter expressions into Berkeley Packet Filter (BPF) programs for more efficient and accurate packet filtering.
  • Designed and implemented control-flow-graph optimization passes to improve BPF execution speed.
  • Extended tcpdump's transport-layer capture to work over IPv6.
  • Contributed back to the open-source community — enhanced the "Caper expansion" section on tcpdump.org.
  • Wrote a Go web server that visualizes step-by-step packet processing for user-uploaded pcap files (~2.1K requests/month).
  • Received Honorable Mention from the Computing Research Association (CRA) for Outstanding Undergraduate Research.

Raft Consensus Algorithm

Go · Distributed Systems

  • Implemented the Raft consensus algorithm in Go, using test-driven development to build robust leader election and log replication.
  • Wrote automated tests that simulate network partitions and recoveries to verify fault tolerance across cluster nodes.
  • Applied machine-learning techniques to tune election timeouts and heartbeat intervals, reducing leader re-election times.

Skills

Platform & Orchestration

  • Kubernetes
  • Helm
  • ArgoCD
  • Kubernetes Operators
  • Docker
  • Ansible
  • Jenkins

Networking

  • Cilium
  • BPF / eBPF
  • DPDK
  • TCP/IP
  • gRPC

Observability

  • Prometheus
  • Grafana
  • OpenTelemetry
  • Splunk

Cloud & Systems

  • AWS
  • Linux
  • Bash
  • PostgreSQL
  • Redis

Education

Illinois Institute of Technology

M.S. & B.S. in Computer Science · GPA 3.91

Sep 2021 — May 2025

Coursework spanning Data Structures, Algorithms, Systems Programming, Databases, Functional Programming, Computer Networking, Software Defined Networking, and Operating Systems.