Building BPF applications with libbpf-bootstrap
Get started with your own BPF application quickly and painlessly with libbpf-bootstrap scaffolding, which takes care of all the mundane setup steps and lets you dive right into BPF fun and minimize the necessary boilerplate. We'll take a look at what libbpf-bootstrap provides and how everything is tied together.
Why libbpf-bootstrap?
BPF is an amazing kernel technology that lets anyone take a peek under the covers of how the kernel functions, without intense kernel development experience and without spending tons of time setting things up for kernel development. BPF also eliminates the risk of crashing your OS while doing that. Once you get up to speed with BPF, it puts a lot of fun and power in your hands.
But getting started with BPF can still be intimidating, in large part because setting up a build workflow for even a simple "Hello, World"-like BPF application requires a bunch of steps that can be frustrating for a new BPF developer. It's not really all that complicated, but knowing the necessary steps is an (unnecessarily) hard part, which probably demotivates a lot of people from even trying, despite all the interest in and promise of BPF.
libbpf-bootstrap is a scaffolding playground that sets up as many things as possible for beginners, letting them dive straight into writing BPF programs and tinkering with them without the unnecessary frustrations of initial setup. It takes into account best practices developed in the BPF community over the last few years and provides a modern and convenient workflow with, arguably, the best BPF user experience to date. libbpf-bootstrap relies on libbpf and uses a simple Makefile. For users needing more advanced setups, it should be a good starting point. At the very least, if the Makefile can't be used directly, it's simple enough to just transfer the logic to whichever build system needs to be used.
libbpf-bootstrap currently has two demo BPF applications available: minimal
and bootstrap
. minimal
is exactly that – the most minimal BPF application
that compiles, loads, and runs a simple BPF equivalent of printf("Hello, World!")
. Being the most minimal one, it also doesn't impose many
requirements on Linux kernel recentness and should run fine on quite old
kernel versions.
minimal
is great for quick experimentation and trying things out locally,
but it's not set up to reflect the setup of a production-intended BPF-based
application deployable across a variety of kernels. bootstrap
is such an
example. bootstrap
demo shows off a real-world approach to building a
minimal, but fully functional and portable
BPF application. To that end, it does rely on BPF CO-RE
and kernel BTF support, so make sure that your Linux
kernel is built with CONFIG_DEBUG_INFO_BTF=y
Kconfig. See libbpf
README
for the list of Linux distributions that have everything already set up for you.
If you'd like to minimize the hassle of building a custom kernel, just stick
with recent enough versions of any of the major Linux distros.
Additionally, bootstrap
demonstrates BPF global variables usage (Linux 5.5+)
and BPF ring buffer use (Linux 5.8+). Neither of those
features is mandatory for building a useful BPF application, but they bring huge
usability improvements and are the way that modern BPF applications are built,
so I've added examples of using them to the basic bootstrap
example.
Prerequisites
BPF is a very dynamic technology that is constantly being developed and evolved. This means that new features and capabilities are added all the time, so depending on which of them you need, you might need newer kernel versions. But BPF community takes backwards compatibility extremely seriously, which means that old Linux kernels will still run BPF applications just fine, provided you don't need the very latest feature sets. So the simpler and more conservative your BPF application logic and feature set is, the higher the chances are that you'll be able to run your BPF application on old kernels.
Having said that, the BPF user experience gets better all the time, and BPF in more recent kernel versions provides profound usability improvements. So if you are just getting started and don't have strict requirements to support outdated Linux kernel versions, make your life less painful and use the latest kernel version you can get your hands on.
BPF program code is normally written in the C language, with some code organization conventions added to let libbpf make sense of the BPF code structure and properly load everything into the kernel. Clang is the compiler used for BPF code compilation, and it's generally recommended to use the latest Clang you can. Still, Clang 10 or newer should work fine for most BPF features, but some more advanced BPF CO-RE features might require Clang 11 or even 12 (e.g., for some of the more recent and advanced CO-RE relocation built-ins).
libbpf-bootstrap bundles with it libbpf (as a Git submodule) and bpftool (for
x86-64 architecture only) to avoid dependency on any specific (and potentially
outdated) versions available in your Linux distribution. Your system should
also have zlib
(libz-dev
or zlib-devel
package) and libelf
(libelf-dev
or elfutils-libelf-devel
package) installed. Those are
dependencies of libbpf
necessary to compile and run it properly.
This is not a primer on BPF technology itself, so some familiarity with basic concepts like BPF programs, BPF maps, and BPF hooks (attach points) is assumed. If you need a refresher on BPF fundamentals, these resources should be a good starting point.
In the rest of this post I'll walk you through the structure of
libbpf-bootstrap, its Makefile
and both minimal
and bootstrap
examples. We'll look at libbpf conventions
and structuring BPF C code for use with libbpf as a BPF program loader, as well
as how to interact with your BPF programs from the user-space using libbpf
APIs.
Libbpf-bootstrap overview
Here's the contents of the libbpf-bootstrap
repository:
$ tree
.
├── libbpf
│   ├── ...
│   ...
├── LICENSE
├── README.md
├── src
│   ├── bootstrap.bpf.c
│   ├── bootstrap.c
│   ├── bootstrap.h
│   ├── Makefile
│   ├── minimal.bpf.c
│   ├── minimal.c
│   ├── vmlinux_508.h
│   └── vmlinux.h -> vmlinux_508.h
└── tools
    ├── bpftool
    └── gen_vmlinux_h.sh

16 directories, 85 files
libbpf-bootstrap
bundles libbpf as a submodule in libbpf/
sub-directory to
avoid depending on system-wide libbpf availability and version.
tools/
contains bpftool
binary, which is used to build BPF
skeletons
of your BPF code. Similarly to libbpf, it's bundled to avoid depending on
system-wide bpftool availability and its version being sufficiently
up-to-date.
Additionally, bpftool can be used to generate your own vmlinux.h
header
with all the Linux kernel type definitions. Chances are you won't need to do
that because libbpf-bootstrap already provides pre-generated
vmlinux.h
in src/
sub-directory. It is based on default kernel config for Linux 5.8
with a bunch of extra BPF-related functionality enabled. This means it should
have lots of commonly needed kernel types and constants already. Due to BPF
CO-RE, vmlinux.h
doesn't have to match
your kernel configuration and version exactly. But if nevertheless you do need to
generate your custom vmlinux.h
, feel free to check
tools/gen_vmlinux_h.sh
script to see how it can be done.
Beyond self-explanatory LICENSE
and README.md
the rest of libbpf-bootstrap
is contained in a src/
sub-directory.
Makefile defines the necessary build rules to compile all the supplied (and your custom ones) BPF apps. It follows a simple file naming convention:
- <app>.bpf.c files are the BPF C code that contains the logic to be executed in the kernel context;
- <app>.c is the user-space C code, which loads BPF code and interacts with it throughout the lifetime of the application;
- optional <app>.h is a header file with the common type definitions, shared by both BPF and user-space code of the application.
So, minimal.c
and minimal.bpf.c
form the minimal
BPF demo app. And
bootstrap.c
, bootstrap.bpf.c
, and bootstrap.h
are the bootstrap
BPF
app. Simple.
Minimal app
minimal
is a good example to start with. Consider it a minimalistic
playground for trying BPF things out. It doesn't use BPF CO-RE, so you can use
older kernels and just include your system kernel headers for kernel type
definitions. It's not the best approach for building production-ready
applications and tools, but is good enough for local experimentation.
The BPF side
Here's the BPF-side code (minimal.bpf.c) in its entirety:
// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
/* Copyright (c) 2020 Facebook */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "Dual BSD/GPL";

int my_pid = 0;

SEC("tp/syscalls/sys_enter_write")
int handle_tp(void *ctx)
{
	int pid = bpf_get_current_pid_tgid() >> 32;

	if (pid != my_pid)
		return 0;

	bpf_printk("BPF triggered from PID %d.\n", pid);

	return 0;
}
#include <linux/bpf.h>
includes some basic BPF-related types and constants
necessary for using the kernel-side BPF APIs (e.g., BPF helper function
flags). This header is needed for the bpf_helpers.h
header, included next.
bpf_helpers.h
is provided by libbpf
and contains most-often used macros,
constants, and BPF helper definitions, which are used by virtually every
existing BPF application. bpf_get_current_pid_tgid()
above is an example of
such BPF helper.
LICENSE
variable defines the license of your BPF code. Specifying the
license is mandatory and is enforced by the kernel. Some BPF functionality is
unavailable to non-GPL-compatible code. Note the special SEC("license")
annotation. SEC()
(provided by bpf_helpers.h
) puts variables and functions
into the specified sections. SEC("license")
, along with some other section names,
is the convention dictated by libbpf
, so make sure you stick to it.
Next, we see the use of an exciting BPF feature: global variables. int my_pid = 0;
does exactly what you'd expect: it defines a global variable which BPF
code can read and update just like any user-space C code would do with
a global variable. It's extremely convenient and also performant to use BPF
global variables for maintaining the state of your BPF program. Additionally,
such global variables can be read and written from the user-space side. This
feature is available starting from Linux 5.5 version. It is frequently used
for things like configuring BPF application with extra settings, low-overhead
stats, etc. It can also be used to pass data back-and-forth between in-kernel
BPF code and user-space control code.
SEC("tp/syscalls/sys_enter_write") int handle_tp(void *ctx) { ... }
defines
the BPF program which will be loaded into the kernel. It is represented as
a normal C function in a specially-named section (using SEC()
macro).
Section name defines what type of BPF program libbpf should create and
how/where it could be attached in the kernel. In this case, we define
a tracepoint BPF program, which will be called each time a write()
syscall
is invoked from any user-space application.
There could be many BPF programs defined within the same BPF C code file. They could have different types (i.e., SEC() annotations). E.g., you can have a few different BPF programs, each for a different tracepoint or some other kernel event (e.g., a network packet being processed). You can also define multiple BPF programs with the same SEC() attribute: libbpf will handle that just fine. All BPF programs defined within the same BPF C code file share all the global state (like the my_pid variable, but also any BPF maps, if used). This is frequently utilized to coordinate a few collaborating BPF programs.
Let's now look at what handle_tp
BPF program is doing:
	int pid = bpf_get_current_pid_tgid() >> 32;

	if (pid != my_pid)
		return 0;
This part gets the PID (or "TGID" in internal kernel terminology) encoded in
upper 32 bits of bpf_get_current_pid_tgid()
's return value. It then checks
if the process triggering write()
syscall is our minimal
process. This is
quite important on a busy system, because most probably lots of unrelated
processes are going to issue write()
s, making it really hard to experiment
with your own BPF code on your own terms. my_pid
global variable is going to
be initialized with the actual PID of the minimal
process from the
user-space code below.
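To make the bit arithmetic concrete, here is a small user-space sketch of how the 64-bit value is laid out. It is only an illustration: `pack_pid_tgid()`, `tgid_of()`, and `tid_of()` are hypothetical names, and in a real BPF program the value comes from the `bpf_get_current_pid_tgid()` helper rather than being built by hand.

```c
#include <stdint.h>

/* Hypothetical helper: mimics the layout of the value returned by
 * bpf_get_current_pid_tgid(), with the TGID (what user space calls the
 * process ID) in the upper 32 bits and the kernel-level PID (the thread
 * ID) in the lower 32 bits. */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t tid)
{
	return ((uint64_t)tgid << 32) | tid;
}

/* Upper half: the process ID, as returned by getpid() in user space. */
static uint32_t tgid_of(uint64_t pid_tgid)
{
	return (uint32_t)(pid_tgid >> 32);
}

/* Lower half: the thread ID. */
static uint32_t tid_of(uint64_t pid_tgid)
{
	return (uint32_t)pid_tgid;
}
```

Shifting right by 32, as `handle_tp` does, is therefore the same as `tgid_of()` here, which is why the result can be compared against the value that user space obtains from `getpid()`.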
bpf_printk("BPF triggered from PID %d.\n", pid);
This is a BPF equivalent of printf("Hello, world!\n")
! It emits formatted
string to the special file at /sys/kernel/debug/tracing/trace_pipe
, which you
can cat to see its contents from the console (make sure you use sudo
or run
under root):
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
<...>-3840345 [010] d... 3220701.101143: bpf_trace_printk: BPF triggered from PID 3840345.
<...>-3840345 [010] d... 3220702.101265: bpf_trace_printk: BPF triggered from PID 3840345.
The bpf_printk() helper and the trace_pipe file are not intended to be used in production, but they are indispensable for debugging BPF code and getting insights into what your BPF program is doing. As there is no BPF debugger yet, bpf_printk() is usually the fastest and most convenient way to debug a problem in BPF code.
That's it for the BPF-side of minimal
app. Feel free to add any extra code to the
body of handle_tp()
BPF program and extend it according to your needs.
The user-space side
Let's now look at how things are tied together from user-space (minimal.c), skipping some pretty obvious parts (please check the full sources anyways).
#include "minimal.skel.h"
This includes a BPF skeleton of the BPF code in minimal.bpf.c
. It is
auto-generated by bpftool in one of Makefile steps and reflects the high-level
structure of minimal.bpf.c
. It also simplifies the BPF code deployment
logistics by embedding contents of the compiled BPF object code inside the
header file, which gets included from the user-space code. No extra files to
deploy alongside your application binary, just include the header and forget about
it.
BPF skeleton is purely a
libbpf
construct, kernel knows nothing about it. But it is a huge quality of life improvement for BPF development process, so consider familiarizing yourself with it. See the blog post for some more details about BPF skeleton.
libbpf-bootstrap BPF skeletons are generated into src/.output/<app>.skel.h
after successful make
invocation. To get a better intuition about it, here's
a high-level overview of the skeleton for minimal.bpf.c
:
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */

/* THIS FILE IS AUTOGENERATED! */
#ifndef __MINIMAL_BPF_SKEL_H__
#define __MINIMAL_BPF_SKEL_H__

#include <stdlib.h>
#include <bpf/libbpf.h>

struct minimal_bpf {
	struct bpf_object_skeleton *skeleton;
	struct bpf_object *obj;
	struct {
		struct bpf_map *bss;
	} maps;
	struct {
		struct bpf_program *handle_tp;
	} progs;
	struct {
		struct bpf_link *handle_tp;
	} links;
	struct minimal_bpf__bss {
		int my_pid;
	} *bss;
};

static inline void minimal_bpf__destroy(struct minimal_bpf *obj) { ... }
static inline struct minimal_bpf *minimal_bpf__open_opts(const struct bpf_object_open_opts *opts) { ... }
static inline struct minimal_bpf *minimal_bpf__open(void) { ... }
static inline int minimal_bpf__load(struct minimal_bpf *obj) { ... }
static inline struct minimal_bpf *minimal_bpf__open_and_load(void) { ... }
static inline int minimal_bpf__attach(struct minimal_bpf *obj) { ... }
static inline void minimal_bpf__detach(struct minimal_bpf *obj) { ... }

#endif /* __MINIMAL_BPF_SKEL_H__ */
It has the struct bpf_object *obj;
which can be passed to libbpf
API functions. It also has maps
, progs
, and links
"sections", that
provide direct access to BPF maps and programs defined in your BPF code
(e.g., handle_tp
BPF program). These references can be passed to
libbpf APIs directly to do something extra with BPF map/program/link.
Skeleton can also optionally have bss
, data
, and rodata
sections that
allow direct (no extra syscalls needed) access to BPF global variables
from user-space. In this case, our my_pid
BPF variable corresponds to
the bss->my_pid
field.
Now onto what main()
does in our minimal
app:
int main(int argc, char **argv)
{
	struct minimal_bpf *skel;
	int err;

	/* Set up libbpf errors and debug info callback */
	libbpf_set_print(libbpf_print_fn);
libbpf_set_print()
provides a custom callback for all libbpf logs. This is
extremely useful, especially during active development, because it lets you
capture helpful libbpf debug logs. By default, libbpf will log only
error-level messages, if something goes wrong. Debug logs, though, are helpful
to get an extra context on what's going on and debug problems faster.
If you ever need to report some problem with libbpf and/or your libbpf-based application (e.g., by sending email to the bpf@vger.kernel.org mailing list), please always include full debug logs from libbpf.
In minimal
's case, libbpf_print_fn()
just emits everything to stderr.
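As a rough illustration of what such a callback can look like, here is a self-contained sketch. To keep it compilable without libbpf headers it uses a stand-in `enum demo_print_level` in place of libbpf's `enum libbpf_print_level`; the `demo_verbose` flag and the `demo_log()` wrapper are likewise hypothetical and exist only so the callback can be exercised on its own. With libbpf, a function of this shape would be registered through `libbpf_set_print()`.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for libbpf's enum libbpf_print_level, so that this sketch
 * compiles without libbpf headers. */
enum demo_print_level { DEMO_WARN, DEMO_INFO, DEMO_DEBUG };

/* Hypothetical switch, e.g. set from a -v command-line flag. */
static bool demo_verbose = false;

/* Same shape as a libbpf print callback: a level, a printf-style format,
 * and a va_list. Debug messages are dropped unless verbose mode is on;
 * everything else is written to stderr. */
static int demo_print_fn(enum demo_print_level level, const char *format,
			 va_list args)
{
	if (level == DEMO_DEBUG && !demo_verbose)
		return 0;
	return vfprintf(stderr, format, args);
}

/* Variadic wrapper so the callback can be called directly in this sketch. */
static int demo_log(enum demo_print_level level, const char *format, ...)
{
	va_list args;
	int ret;

	va_start(args, format);
	ret = demo_print_fn(level, format, args);
	va_end(args);
	return ret;
}
```

Filtering on the level inside the callback is a design choice: it keeps normal runs quiet while still making full debug logs one flag away when a problem needs to be reported.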
	/* Bump RLIMIT_MEMLOCK to allow BPF sub-system to do anything */
	bump_memlock_rlimit();
This is a somewhat confusing, but necessary, step that pretty much any
realistic BPF application has to do. It bumps kernel's internal per-user
memory limit to allow BPF sub-system to allocate necessary resources for your
BPF programs, maps, etc. This limitation most probably is going away soon, but
for now you have to bump RLIMIT_MEMLOCK
limit one way or
another. Doing it through setrlimit(RLIMIT_MEMLOCK, ...)
, as minimal
code
is doing, is the simplest and the most convenient way.
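For reference, a helper along these lines could look like the following sketch. It is an assumption about the general shape rather than a copy of the repository's code: the name `try_bump_memlock_rlimit()` is hypothetical, and unlike a typical command-line tool it returns an error code instead of exiting, so the caller decides what to do.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Try to lift RLIMIT_MEMLOCK to "unlimited". Returns 0 on success and -1
 * on failure (typically when the process lacks the privileges needed to
 * raise its hard limit). */
static int try_bump_memlock_rlimit(void)
{
	struct rlimit rlim_new = {
		.rlim_cur = RLIM_INFINITY,
		.rlim_max = RLIM_INFINITY,
	};

	if (setrlimit(RLIMIT_MEMLOCK, &rlim_new)) {
		perror("setrlimit(RLIMIT_MEMLOCK)");
		return -1;
	}
	return 0;
}
```

Raising the hard limit usually requires root (or an equivalent capability), which BPF applications of this kind generally need anyway.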
	/* Load and verify BPF application */
	skel = minimal_bpf__open_and_load();
	if (!skel) {
		fprintf(stderr, "Failed to open and load BPF skeleton\n");
		return 1;
	}
Now, using the auto-generated BPF skeleton, prepare and load the BPF programs into the kernel and let the BPF verifier check them. If this step succeeds, your BPF code is correct and ready to be attached to whatever BPF hooks you need.
	/* ensure BPF program only handles write() syscalls from our process */
	skel->bss->my_pid = getpid();
But first, we need to communicate our PID to BPF code, so that it can filter
out irrelevant write()
invocations from unrelated processes. This sets
my_pid
BPF global variable value directly, through the memory-mapped
region. As mentioned above, this is how the user-space can access (read and
write) BPF global variables.
	/* Attach tracepoint handler */
	err = minimal_bpf__attach(skel);
	if (err) {
		fprintf(stderr, "Failed to attach BPF skeleton\n");
		goto cleanup;
	}

	printf("Successfully started!\n");
We can finally attach handle_tp
BPF program, which by now is readily
awaiting in the kernel, to the corresponding kernel tracepoint. This
"activates" the BPF program and the kernel will start executing our custom BPF
code in the kernel context in response to each write()
syscall invocation!
libbpf is able to automatically determine where to attach BPF program to by looking at its special
SEC()
annotation. This doesn't work for all possible BPF program types, but it does for lots of them: tracepoints, kprobes, and quite a few others. Additionally, libbpf provides extra APIs to do the attachment programmatically.
	for (;;) {
		/* trigger our BPF program */
		fprintf(stderr, ".");
		sleep(1);
	}
Endless loop here makes sure that handle_tp
BPF program stays attached in
the kernel until user kills the process (e.g., by pressing Ctrl-C
). Also, it
will generate the write()
syscall invocation periodically (once a second)
through fprintf(stderr, ...)
call. This way it's possible to "monitor"
internals of the kernel from handle_tp
and how the state changes over time.
cleanup:
	minimal_bpf__destroy(skel);
	return -err;
}
If any of the previous steps go wrong, minimal_bpf__destroy()
will clean up
all the resources (both in the kernel and in the user-space). It's a good
practice to make sure you always do this, but even if your application crashes
without cleaning up, the kernel will still clean up resources. Well, in most
cases, at least. There are some BPF program types which will stay active in
the kernel even if the owner user-space process dies, so make sure to check
that, if necessary.
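If you want the cleanup path to run on Ctrl-C as well, one common approach is to replace the endless loop with a loop that polls a flag set from a signal handler. The following is a minimal sketch of that pattern in plain C; the names `exiting`, `sig_handler()`, and `install_exit_handlers()` are illustrative choices, not something this post's code is guaranteed to use.

```c
#include <signal.h>

/* Set from the signal handler, polled by the main loop. */
static volatile sig_atomic_t exiting = 0;

static void sig_handler(int sig)
{
	(void)sig;
	exiting = 1;
}

/* Install the handler for Ctrl-C (SIGINT) and SIGTERM.
 * Returns 0 on success, -1 if either handler could not be installed. */
static int install_exit_handlers(void)
{
	if (signal(SIGINT, sig_handler) == SIG_ERR)
		return -1;
	if (signal(SIGTERM, sig_handler) == SIG_ERR)
		return -1;
	return 0;
}
```

With this in place, the main loop becomes `while (!exiting) { ... }` and execution falls through to the cleanup label once a signal arrives, so the skeleton's destroy function always gets a chance to run.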
And that's pretty much it for the minimal
application. BPF skeleton use
makes all this pretty straightforward.
Makefile
Now that we looked at the minimal
app, we have enough context to look at
what Makefile
does to compile everything into a final executable. I'll skip some necessary
boilerplate parts and instead concentrate only on the essentials.
INCLUDES := -I$(OUTPUT)
CFLAGS := -g -Wall
ARCH := $(shell uname -m | sed 's/x86_64/x86/')
Here we define some extra parameters used during the compilation. By default,
all intermediate files will be written under src/.output/
sub-directory, so
this directory is added into the include path for C compiler to find BPF
skeletons and libbpf headers. All the user-space files are compiled with
debug info (-g
) and without any optimizations to make it easier to debug
them. ARCH
captures the host OS architecture, which is passed along into BPF
code compilation step later for use with low-level tracing helper macros (in
libbpf's bpf_tracing.h
).
APPS = minimal bootstrap
This is a list of available applications. If you copy/paste minimal
or
bootstrap
and create your own copy, just add the name of your application
here to make it build. Each app defines corresponding make target, so you can
build just relevant files with:
$ make minimal
The whole build process happens in a few steps. First, libbpf is built as
a static library and its API headers are installed into .output/
:
# Build libbpf
$(LIBBPF_OBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(OUTPUT)/libbpf
$(call msg,LIB,$@)
$(Q)$(MAKE) -C $(LIBBPF_SRC) BUILD_STATIC_ONLY=1 \
OBJDIR=$(dir $@)/libbpf DESTDIR=$(dir $@) \
INCLUDEDIR= LIBDIR= UAPIDIR= \
install
If you'd like to build against system-wide libbpf
shared library, you can
remove this step and adjust compilation rules accordingly.
The next step builds BPF C code (*.bpf.c
) into a compiled object file:
# Build BPF code
$(OUTPUT)/%.bpf.o: %.bpf.c $(LIBBPF_OBJ) $(wildcard %.h) vmlinux.h | $(OUTPUT)
$(call msg,BPF,$@)
$(Q)$(CLANG) -g -O2 -target bpf -D__TARGET_ARCH_$(ARCH) $(INCLUDES) -c $(filter %.c,$^) -o $@
$(Q)$(LLVM_STRIP) -g $@ # strip useless DWARF info
We use Clang to do this. -g
is mandatory to make Clang emit BTF information.
-O2
is also necessary for BPF compilation. -D__TARGET_ARCH_$(ARCH)
defines
necessary macro for bpf_tracing.h
header dealing with low-level struct pt_regs
macro. You can disregard that if you are not dealing with kprobes and
struct pt_regs
. Finally, we strip DWARF info out from the generated .o
file, as it's never used and is mostly just a compilation artifact of Clang.
BTF is the only info necessary for BPF functionality and that one is preserved during stripping. It's important to reduce the size of a
.bpf.o
file because it will get embedded into the final application binary through BPF skeleton, so there is no need to increase its size with unneeded DWARF data.
Now that we have .bpf.o
file generated, bpftool
is used to generate
a corresponding BPF skeleton header (.skel.h
) with bpftool gen skeleton
command:
# Generate BPF skeletons
$(OUTPUT)/%.skel.h: $(OUTPUT)/%.bpf.o | $(OUTPUT)
$(call msg,GEN-SKEL,$@)
$(Q)$(BPFTOOL) gen skeleton $< > $@
With that, we make sure that whenever BPF skeleton is updated, user-space
parts of the application are rebuilt as well, because they need to embed BPF
skeleton during the compilation. The compilation of user-space .c
→ .o
is
pretty straightforward otherwise:
# Build user-space code
$(patsubst %,$(OUTPUT)/%.o,$(APPS)): %.o: %.skel.h
$(OUTPUT)/%.o: %.c $(wildcard %.h) | $(OUTPUT)
$(call msg,CC,$@)
$(Q)$(CC) $(CFLAGS) $(INCLUDES) -c $(filter %.c,$^) -o $@
Finally, using only user-space .o
file (together with libbpf.a
static
library) the final binary is generated. -lelf
and -lz
are
dependencies of libbpf and need to be provided explicitly to the compiler:
# Build application binary
$(APPS): %: $(OUTPUT)/%.o $(LIBBPF_OBJ) | $(OUTPUT)
$(call msg,BINARY,$@)
$(Q)$(CC) $(CFLAGS) $^ -lelf -lz -o $@
That's it, after running through these few steps, you'll end up with a small
user-space binary that embeds compiled BPF code through BPF skeleton and has
statically linked libbpf in it, so doesn't depend on system-wide libbpf
availability. The result is a small (200KB), fast, stand-alone binary, just like
Brendan Gregg asked.
Bootstrap app
Now that we covered minimal
app and how compilation is done in Makefile
,
we'll go through some extra BPF features demonstrated by bootstrap
app.
bootstrap
is how I'd write a production-ready BPF application in the modern
BPF Linux environment. It relies on BPF CO-RE (read why
here) and requires Linux kernel built
with CONFIG_DEBUG_INFO_BTF=y
(see
here).
bootstrap
traces exec()
syscalls (with SEC("tp/sched/sched_process_exec") handle_exec
BPF program), roughly corresponding to a spawning of a new
process (ignoring the fork()
part, for simplicity). Additionally, it traces
exit()
s (with SEC("tp/sched/sched_process_exit") handle_exit
BPF program)
to know when each process exits. These two BPF programs, working together,
let you capture interesting information about any new process, like the binary's
filename, as well as to measure the lifetime of the process and collect
interesting stats when the process dies, like exit code or amount of consumed
resources, etc. I find it a great starting point to dive into the kernel
internals and observe how things really work under the hood.
bootstrap
is also using argp API
(part of libc) for command-line argument parsing. Please check out
"Step-by-Step into Argp" tutorial
for the great intro into the argp
usage. This is how optional minimum process
lifetime duration is parsed (see min_duration_ns
read-only variable below;
use sudo ./bootstrap -d 100
to show only processes that existed for at least
100ms), as well as verbose mode flag (try sudo ./bootstrap -v
), enabling
libbpf
debug logs.
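To give a feel for the argp side, here is a minimal, self-contained sketch that parses a `-d` duration and a `-v` flag. It assumes glibc (argp is not part of standard C), and every name in it (`struct demo_env`, `demo_opts`, `demo_parse_arg()`, `demo_parse_args()`) is hypothetical, chosen for this illustration only.

```c
#include <argp.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical settings struct filled in by the parser. */
struct demo_env {
	bool verbose;
	long min_duration_ms;
};

static const struct argp_option demo_opts[] = {
	{ "verbose", 'v', NULL, 0, "Verbose debug output", 0 },
	{ "duration", 'd', "DURATION-MS", 0,
	  "Minimum process duration (ms) to report", 0 },
	{ 0 },
};

/* Called by argp once per option; state->input carries our struct. */
static error_t demo_parse_arg(int key, char *arg, struct argp_state *state)
{
	struct demo_env *env = state->input;

	switch (key) {
	case 'v':
		env->verbose = true;
		break;
	case 'd':
		errno = 0;
		env->min_duration_ms = strtol(arg, NULL, 10);
		if (errno || env->min_duration_ms <= 0)
			return EINVAL; /* reject non-numeric or non-positive input */
		break;
	default:
		return ARGP_ERR_UNKNOWN;
	}
	return 0;
}

static const struct argp demo_argp = {
	.options = demo_opts,
	.parser = demo_parse_arg,
};

/* Returns 0 on success, or an errno-style code on a parse error.
 * ARGP_NO_EXIT makes argp report errors to the caller instead of exiting. */
static int demo_parse_args(int argc, char **argv, struct demo_env *env)
{
	return argp_parse(&demo_argp, argc, argv, ARGP_NO_EXIT, NULL, env);
}
```

Passing the settings struct through the `input` argument of `argp_parse()` avoids a global variable, although a single global `env` is also a perfectly reasonable choice for a small tool.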
Includes: vmlinux.h, libbpf and app headers
Here's the include section on the BPF side of things (bootstrap.bpf.c):
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
#include "bootstrap.h"
This differs from minimal.bpf.c
in that we now use vmlinux.h
header file,
which includes all the types from the Linux kernel in one file. It comes
pre-generated
with libbpf-bootstrap, but one can also generate the custom one with bpftool
(see gen_vmlinux_h.sh).
All the types in vmlinux.h come with an extra __attribute__((preserve_access_index)) applied, which makes Clang generate BPF CO-RE relocations, allowing libbpf to adapt your BPF code to the specific memory layout of the host kernel, even if it differs from the one that vmlinux.h was originally generated from. This is a crucial aspect of building a portable pre-compiled BPF application that doesn't require the entire Clang/LLVM toolchain to be deployed alongside it on the target system. The alternative is the BCC way of compiling BPF code at runtime, which comes with a bunch of downsides.
Keep in mind that vmlinux.h
can't be combined with other system-wide kernel
headers, as you'll inevitably run into type redefinitions and conflicts. So
please stick with using just vmlinux.h
, libbpf-provided headers, and your
application's custom headers to avoid unnecessary headaches.
In addition to bpf_helpers.h
we also use a few extra libbpf-provided headers,
bpf_tracing.h
and bpf_core_read.h
, which provide some extra macros for
writing BPF CO-RE-based tracing BPF apps.
Finally, bootstrap.h
contains common type definitions, shared between BPF and
user-space code of the bootstrap
app (for BPF ringbuf, see below).
BPF maps
bootstrap
demonstrates the use of BPF maps, the BPF abstraction for a generic data container. Many different things are modeled as BPF maps: from simple arrays and hash maps to per-socket and per-task local storage, BPF perf and ring buffers, and even some more exotic uses. The important thing is that most BPF maps allow looking up, updating, and deleting their elements by some key. Some BPF maps allow extra (or alternative) operations, like the BPF ring buffer, which allows enqueuing data, but never deleting it from the BPF side. BPF maps are one means of sharing state between (potentially many) BPF programs and user-space. The other one (more performant and convenient for storing simple plain data) is BPF global variables (which under the hood still use BPF maps).
In bootstrap
's case, we define BPF map named exec_start
of type
BPF_MAP_TYPE_HASH
(a hash map) with the maximum size of 8192 entries, the
key is of pid_t
type and the value is a 64-bit unsigned integer, storing the
nanosecond-granularity timestamp of process's exec event. This is a so-called
BTF-defined map. SEC(".maps")
annotation is mandatory to let libbpf know
that it needs to create the corresponding BPF map in the kernel and wire
everything properly in the BPF code:
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 8192);
	__type(key, pid_t);
	__type(value, u64);
} exec_start SEC(".maps");
Adding/updating entries in such hashmap is simple:
	pid_t pid;
	u64 ts;

	/* remember time exec() was executed for this PID */
	pid = bpf_get_current_pid_tgid() >> 32;
	ts = bpf_ktime_get_ns();
	bpf_map_update_elem(&exec_start, &pid, &ts, BPF_ANY);
bpf_map_update_elem()
BPF helper takes pointers to the map itself, key and
value pointers, as well as extra flags, which in this case (BPF_ANY
) tell it to
either add a new key, or update the existing one.
Notice how the second BPF program (handle_exit
) is looking up the element
from the same BPF map and subsequently deletes it. This shows how the
exec_start
map is shared between the two BPF programs:
	pid_t pid;
	u64 *start_ts;
	...
	start_ts = bpf_map_lookup_elem(&exec_start, &pid);
	if (start_ts)
		duration_ns = bpf_ktime_get_ns() - *start_ts;
	...
	bpf_map_delete_elem(&exec_start, &pid);
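The lookup result naturally feeds a small filtering decision: report the exit only if the process lived long enough. The sketch below models that decision in plain user-space C, with the map lookup replaced by a pointer that is NULL when no entry was found. The function name `should_report_exit()` and the exact filtering rules are assumptions made for illustration, not a transcription of the BPF program.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decide whether an exit event is worth reporting.
 *
 * start_ts:        result of the map lookup, NULL if no exec was recorded
 * now_ns:          current timestamp in nanoseconds
 * min_duration_ns: minimum lifetime to report, 0 disables filtering
 * duration_ns:     out parameter, set to the process lifetime (0 if unknown)
 */
static bool should_report_exit(const uint64_t *start_ts, uint64_t now_ns,
			       uint64_t min_duration_ns, uint64_t *duration_ns)
{
	*duration_ns = 0;

	if (start_ts)
		*duration_ns = now_ns - *start_ts;
	else if (min_duration_ns)
		return false; /* lifetime unknown, cannot satisfy the filter */

	if (min_duration_ns && *duration_ns < min_duration_ns)
		return false; /* process did not live long enough */

	return true;
}
```

Note that the NULL check is not optional in real BPF code either: the verifier rejects programs that dereference a map lookup result without checking it first.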
Read-only BPF configuration variables
bootstrap
, as opposed to minimal
, is using a read-only global variable:
const volatile unsigned long long min_duration_ns = 0;
const volatile
part is important, it marks the variable as read-only for BPF
code and user-space code. In exchange, it makes the specific value of
min_duration_ns
variable known to the BPF verifier during the BPF program
verification time. This (due to a more detailed knowledge) allows BPF
verifier to prune dead code, if the read-only value provably rules out some
code paths. This property is often desirable for some more advanced use cases,
like dealing with various compatibility checks and extra configuration.
volatile
is necessary to make sure Clang doesn't optimize away the variable altogether, ignoring user-space provided value. Without it, Clang is free to just assume 0 and remove the variable completely, which is not at all what we want.
From the user-space part (in bootstrap.c),
there is a slight difference in initializing such read-only global variables.
They need to be set before BPF skeleton is loaded into the kernel. So,
instead of using a single-step bootstrap_bpf__open_and_load()
, we need to
separately first bootstrap_bpf__open()
the skeleton, set read-only
variable values, and only then bootstrap_bpf__load()
skeleton into the
kernel:
	/* Load and verify BPF application */
	skel = bootstrap_bpf__open();
	if (!skel) {
		fprintf(stderr, "Failed to open and load BPF skeleton\n");
		return 1;
	}

	/* Parameterize BPF code with minimum duration parameter */
	skel->rodata->min_duration_ns = env.min_duration_ms * 1000000ULL;

	/* Load & verify BPF programs */
	err = bootstrap_bpf__load(skel);
	if (err) {
		fprintf(stderr, "Failed to load and verify BPF skeleton\n");
		goto cleanup;
	}
Note that such read-only variables are part of rodata
section in the
skeleton (not data
or bss
): skel->rodata->min_duration_ns
. After the BPF
skeleton is loaded, user-space code can only read the value of the read-only
variable. BPF code can also only ever read such variables. BPF verifier will
reject the BPF program if it detects an attempt to write to such a variable.
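One small detail in the user-space snippet above is the `1000000ULL` multiplier, which forces the millisecond-to-nanosecond conversion to be done in 64-bit arithmetic. A tiny sketch of the same conversion as a stand-alone helper (the names `NSEC_PER_MSEC` and `ms_to_ns()` are introduced here for illustration):

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* Convert milliseconds to nanoseconds using 64-bit arithmetic, so that
 * large values do not overflow a 32-bit intermediate result.
 * ms is assumed to be non-negative. */
static uint64_t ms_to_ns(long ms)
{
	return (uint64_t)ms * NSEC_PER_MSEC;
}
```

For example, 5000 ms is 5,000,000,000 ns, which no longer fits into 32 bits; doing the multiplication in `int` would silently produce a wrong threshold.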
BPF ring buffer
bootstrap
is using BPF ring buffer map heavily for preparing and sending
data back to user-space. It's using the bpf_ringbuf_reserve()
/bpf_ringbuf_submit()
combo for best usability
and performance. Please check the BPF ring buffer post
for more thorough coverage. That post goes through a very similar
functionality in detail, looking at examples in a separate
bpf-ringbuf-examples
repo. It should also give you a pretty good idea how to use BPF perf buffer,
if you choose to do so.
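On the user-space side, consuming ring buffer data boils down to a callback that receives a pointer to each record and its size. The following sketch shows what such a callback might look like. The `struct demo_event` layout and all `demo_` names are hypothetical stand-ins for the record type an application would share between its BPF and user-space code; only the callback's shape (context pointer, data pointer, size, integer return) follows the usual ring buffer sample callback convention.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical event layout, standing in for the record type shared
 * between BPF and user-space code. */
struct demo_event {
	int pid;
	int ppid;
	unsigned int exit_code;
	unsigned long long duration_ns;
	char comm[16];
	bool exit_event;
};

/* Format one event as a human-readable line. Returns the snprintf()
 * result, i.e. the length the full line needs. */
static int demo_format_event(const struct demo_event *e, char *buf, size_t len)
{
	if (e->exit_event)
		return snprintf(buf, len, "EXIT %s pid=%d ppid=%d code=%u (%llums)",
				e->comm, e->pid, e->ppid, e->exit_code,
				e->duration_ns / 1000000);
	return snprintf(buf, len, "EXEC %s pid=%d ppid=%d",
			e->comm, e->pid, e->ppid);
}

/* Same shape as a ring buffer sample callback: opaque context, pointer
 * to the record, and its size. Returns 0 to keep consuming records and
 * a negative value to stop. */
static int demo_handle_event(void *ctx, void *data, size_t data_sz)
{
	char line[128];

	(void)ctx;
	if (data_sz < sizeof(struct demo_event))
		return -1; /* truncated or unexpected record */

	demo_format_event(data, line, sizeof(line));
	puts(line);
	return 0;
}
```

Keeping the formatting separate from the callback makes the output logic easy to test without a running kernel-side producer.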
BPF CO-RE
BPF CO-RE (Compile Once – Run Everywhere) is a pretty big topic, covered
separately in a dedicated blog post,
please make sure to check it out as well. Here's one example from
bootstrap.bpf.c
of using BPF CO-RE features to read data from kernel's
struct task_struct
:
e->ppid = BPF_CORE_READ(task, real_parent, tgid);
In non-BPF world, it would be written as just e->ppid = task->real_parent->tgid;
,
but BPF verifier requires an extra effort because of the risk of reading an
arbitrary kernel memory. BPF_CORE_READ()
takes care of this in a succinct
manner and records necessary BPF CO-RE relocations along the way, allowing
libbpf to adjust all the field offsets to the specific memory layout of the
host kernel. Refer to this post
for more examples.
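Conceptually, a chained read like this turns into one checked memory read per pointer dereference. The following user-space model illustrates the idea under loud assumptions: `struct demo_task` is a heavily trimmed stand-in for the kernel's `struct task_struct`, and `demo_probe_read()` merely imitates a checked kernel-memory read with `memcpy()`. It does not model the CO-RE relocations that the real macro also records.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily trimmed stand-in for the kernel's task_struct. */
struct demo_task {
	int tgid;
	struct demo_task *real_parent;
};

/* Stand-in for a checked kernel-memory read: copies size bytes from src
 * to dst, and for a NULL source zeroes dst and reports failure instead
 * of crashing. */
static int demo_probe_read(void *dst, size_t size, const void *src)
{
	if (!src) {
		memset(dst, 0, size);
		return -1;
	}
	memcpy(dst, src, size);
	return 0;
}

/* Roughly what a two-step chain like task->real_parent->tgid turns into:
 * first read the parent pointer, then read the field through it.
 * task itself is assumed to be a valid pointer. */
static int demo_read_ppid(const struct demo_task *task)
{
	struct demo_task *parent = NULL;
	int ppid = 0;

	demo_probe_read(&parent, sizeof(parent), &task->real_parent);
	demo_probe_read(&ppid, sizeof(ppid), parent ? &parent->tgid : NULL);
	return ppid;
}
```

The point of the model is the structure, not the mechanics: every hop through a pointer becomes an explicit, failure-tolerant read, which is the tedium that the macro hides.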
Conclusion
This should do it for the broad coverage of libbpf-bootstrap
and various
BPF/libbpf aspects. Hopefully, libbpf-bootstrap
will allow you to get over
the initial hurdle of getting everything set up to get started with BPF
development and instead will let you spend more time on BPF itself and
tinkering with kernel observability, tracing, what have you. That, after all,
is the most exciting part of using BPF (for me, at least).
For the more seasoned BPF developers, it should have demonstrated a way to set up everything with modern BPF usability boosters like BPF skeleton, BPF ringbuf, BPF CO-RE (just in case you haven't followed BPF development closely).
So please check the GitHub repo and give it a go. PRs with bug fixes and improvements, as well as any suggestions, are always welcome. Have fun with BPF!