eBPF in Plain English
- eBPF lets programmers run custom, sandboxed bytecode inside the Linux kernel without changing kernel source code or loading kernel modules.
- eBPF is event-driven: each eBPF program is an event handler that runs when the kernel reaches a predefined attachment point, called a “hook”.
- eBPF programs exchange data with user-space programs through eBPF maps, which are key-value stores.
Memory is partitioned between kernel space and user space in the Linux architecture. Kernel space is where the core kernel code and device drivers run. Code running in kernel space has complete access to all hardware, including the CPU, memory, and storage. All other processes run in user space and depend on the kernel for hardware access. User-space processes talk to the kernel through system calls to perform privileged tasks such as disk or network I/O.
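To make the system call boundary concrete, here is a minimal sketch in Python. It assumes a POSIX-like system, where each `os.*` function used below is a thin wrapper around the system call of the same name. The pipe is only a convenient kernel-managed object for the demonstration.

```python
import os

# Each call below crosses from user space into the kernel and back.
read_fd, write_fd = os.pipe()            # pipe(2): the kernel allocates the buffer
written = os.write(write_fd, b"hello")   # write(2): bytes are copied into kernel space
os.close(write_fd)                       # close(2)
data = os.read(read_fd, 16)              # read(2): bytes are copied back to user space
os.close(read_fd)

print(written, data)                     # prints: 5 b'hello'
```

The process never touches the pipe buffer directly. It can only ask the kernel to act on its behalf, which is the constraint the following paragraphs discuss.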
While this split isolates processes safely, the syscall interface is insufficient in some circumstances, and developers want the flexibility to run custom code directly in the kernel without modifying its source. For this, Linux offers loadable kernel modules, which can be loaded into the kernel at runtime.
Kernel modules, however, pose security risks because they can execute arbitrary code in kernel space. A kernel module with buggy code can easily crash the kernel. With eBPF, Linux offers a mechanism for running safe, verified, sandboxed programs in kernel space.
So What is eBPF?
How does eBPF Work?
eBPF programs are event-driven: they are attached to specific events, and the kernel runs them whenever one of those events occurs. A program can store information in maps, write to ring buffers, or call a subset of kernel functions defined by a special API. The kernel manages the map and ring buffer structures, and multiple eBPF programs can access the same map to share data.
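The event-driven model can be pictured with a small toy simulation. This is plain Python, not real eBPF: `ToyKernel`, the handler functions, and the map name `calls` are all hypothetical names invented for illustration, and the event name is only an example label. The sketch shows two handlers attached to the same event and sharing one kernel-owned map.

```python
from collections import defaultdict


class ToyKernel:
    """Toy stand-in for the kernel: owns the maps and dispatches events."""

    def __init__(self):
        self.maps = {}                   # map name -> key-value store
        self.hooks = defaultdict(list)   # event name -> attached handlers

    def create_map(self, name):
        return self.maps.setdefault(name, defaultdict(int))

    def attach(self, event, program):
        self.hooks[event].append(program)

    def fire(self, event, ctx):
        # Run every program attached to this event, in attachment order.
        for program in self.hooks[event]:
            program(ctx, self.maps)


def count_by_pid(ctx, maps):
    maps["calls"][ctx["pid"]] += 1       # per-process counter


def count_total(ctx, maps):
    maps["calls"]["total"] += 1          # same map, different key


kernel = ToyKernel()
kernel.create_map("calls")
kernel.attach("sys_enter_openat", count_by_pid)
kernel.attach("sys_enter_openat", count_total)

for pid in (100, 100, 200):              # three simulated events
    kernel.fire("sys_enter_openat", {"pid": pid})

snapshot = dict(kernel.maps["calls"])    # what a user-space reader would see
print(snapshot)
```

In real eBPF the handlers would be verified bytecode and the map would live in kernel memory, but the division of labour is the same: the kernel owns the data structures and decides when each program runs.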
eBPF programs follow these steps:
- The bytecode of the eBPF program is sent to the kernel along with a program type that determines where the program needs to be attached, which in-kernel helper functions the verifier will allow to be called, whether network packet data can be accessed directly, and what type of object will pass as the first argument to the program.
- The kernel runs a verifier on the bytecode. The verifier runs several security checks on the bytecode, ensuring that the program terminates and does not contain any loop that could lock up the kernel. It also simulates the execution of the eBPF program and checks the state of the virtual machine at every step to ensure the register and stack states are valid. Finally, it uses the program type to restrict which kernel functions the program may call.
- The bytecode is JIT-compiled into native code and attached to the specified location.
- When the specified event occurs, the program is executed and writes data to the ring buffer or the map.
- The map or ring buffer can be read by the user space to get the program result.
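The steps above can be sketched with a toy model. The instruction set, `verify`, and `run` below are hypothetical and far simpler than the real eBPF verifier and virtual machine. The sketch borrows only two real conventions: `r1` holds the context argument on entry, and `r0` holds the return value at exit. JIT compilation is skipped, and a plain dict stands in for the map.

```python
def verify(program, max_insns=64):
    """Toy verifier: raise ValueError unless every path is safe and reaches exit."""
    if not 0 < len(program) <= max_insns:
        raise ValueError("program is empty or too long")
    pending = [(0, frozenset({"r1"}))]   # (pc, initialised registers)
    seen = set()
    while pending:
        state = pending.pop()
        if state in seen:                # this state was already explored
            continue
        seen.add(state)
        pc, ready = state
        if pc >= len(program):
            raise ValueError("execution can fall off the end of the program")
        op, *args = program[pc]
        if op == "mov":                  # ("mov", dst, imm)
            pending.append((pc + 1, ready | {args[0]}))
        elif op == "add":                # ("add", dst, src)
            if not {args[0], args[1]} <= ready:
                raise ValueError(f"insn {pc}: read of uninitialised register")
            pending.append((pc + 1, ready))
        elif op == "jeq":                # ("jeq", reg, imm, offset)
            reg, _imm, offset = args
            if reg not in ready:
                raise ValueError(f"insn {pc}: read of uninitialised register")
            if offset < 0:
                raise ValueError(f"insn {pc}: backward jump (possible loop)")
            pending.append((pc + 1, ready))           # branch not taken
            pending.append((pc + 1 + offset, ready))  # branch taken
        elif op == "exit":               # ("exit",)
            if "r0" not in ready:
                raise ValueError(f"insn {pc}: r0 is not set at exit")
        else:
            raise ValueError(f"insn {pc}: unknown opcode {op!r}")


def run(program, ctx):
    """Toy interpreter; only call it on programs that passed verify()."""
    regs, pc = {"r1": ctx}, 0
    while True:
        op, *args = program[pc]
        if op == "mov":
            regs[args[0]] = args[1]
        elif op == "add":
            regs[args[0]] += regs[args[1]]
        elif op == "jeq" and regs[args[0]] == args[1]:
            pc += args[2]
        elif op == "exit":
            return regs["r0"]
        pc += 1


# Returns 1 when the context (a port number here) is 80, otherwise 0.
port_is_80 = [
    ("mov", "r0", 1),
    ("jeq", "r1", 80, 1),                # port 80: skip the next instruction
    ("mov", "r0", 0),
    ("exit",),
]
looping = [("mov", "r0", 0), ("jeq", "r0", 0, -2), ("exit",)]

verify(port_is_80)                       # load time: checked once, before attaching
hits = {}                                # stands in for an eBPF map
for port in (80, 443, 80):               # each iteration is one "event"
    verdict = run(port_is_80, port)
    hits[verdict] = hits.get(verdict, 0) + 1

try:
    verify(looping)
    rejected = False
except ValueError:
    rejected = True

print(hits, rejected)                    # user space reads the "map"
```

Because the toy verifier rejects every backward jump, all accepted programs form a forward-only graph, so exploring every path is guaranteed to finish. The real verifier is considerably more sophisticated, but the principle of proving safety before the program is ever attached is the same.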
What are the benefits of eBPF?
eBPF is most commonly used to trace and profile userspace processes and, more recently, as a way to enhance observability capabilities. It has many distinct benefits over other methods:
- eBPF programs are sandboxed and verified, ensuring that they cannot crash the kernel or stall it in a loop. This makes them safer than kernel modules.
- eBPF shifts packet filtering from user space to kernel space, which avoids superfluous packet copies and gives a large speed boost. Programs also run fast because they are JIT-compiled to native code.
- Using eBPF does not require modifying kernel source code or writing full-fledged kernel modules. An eBPF application is simple to create and run.
If you want to get your hands dirty, several open-source tools let you build custom programs that are loaded into the kernel at runtime.
eBPF is a fantastic addition to the Linux kernel. The ability to execute code in the kernel in a safe and sandboxed manner is useful for observability, network traffic management, and containerisation.
- ebpf.io — A gateway to discover all the basics of eBPF, including a listing of the main related projects and of community resources.
- Cilium’s BPF and XDP Reference Guide — In-depth documentation about most features and aspects of eBPF.
- BPF Documentation — Index for BPF-related documentation coming with the Linux kernel.
- linux/Documentation/networking/filter.rst — eBPF specification (somewhat outdated; information should still be valid, but not exhaustive).
- BPF Design Q&A — Frequently Asked Questions on the decisions behind the BPF infrastructure.
- HOWTO interact with BPF subsystem — Frequently Asked Questions about contributing to eBPF development.
User Space eBPF
- uBPF — Written in C. Contains an interpreter, a JIT compiler for x86_64 architecture, an assembler and a disassembler.
- A generic implementation — With support for FreeBSD kernel, FreeBSD user space, Linux kernel, Linux user space and macOS user space. Used for the VALE software switch’s BPF extension module.
- rbpf — Written in Rust. Interpreter for Linux, macOS and Windows, and JIT-compiler for x86_64 under Linux.
- PREVAIL — A user space verifier for eBPF using an abstract interpretation layer, with support for loops.
- oster — Written in Go. A tool for tracing execution of Go programs by attaching eBPF to uprobes.
- wachy — A tracing profiler that aims to make eBPF uprobe-based debugging easier to use. This is done by displaying traces in a UI next to the source code and allowing interactive drilldown analysis.
Originally published at https://www.israelo.io on April 6, 2022.