eBPF Technology Practice White Paper
Inspur Information, Cloud Computing Industry Research Report, 2023-11-15, 64 pages
Contents
eBPF Overview
eBPF Technology Introduction
2.1 eBPF
2.1.1 eBPF
2.1.2 JIT
2.1.3
2.2 eBPF
2.2.1 BCC
2.2.2 bpftrace
2.2.3 libbpf
2.2.4 libbpf-bootstrap
2.2.5 cilium-ebpf
2.2.6 Coolbpf
eBPF Application Scenarios and Practice
3.1 eBPF
3.1.1
3.1.2 eBPF
3.2 eBPF IO
3.2.1 IO
3.2.2 bpftrace IO
3.3 eBPF
3.3.1 Linux
3.3.2 eBPF Linux
3.4 eBPF
3.4.1
3.4.2 eBPF
Challenges and Outlook

eBPF Overview
eBPF, Linux

eBPF Technology Introduction

2.1 eBPF
eBPF, BPF, map
1. eBPF Program, LLVM/Clang
2. bpf()
3. Verifier
4. JIT Compiler
5. HOOK
6. HOOK, eBPF map
Figure 2-1-1: eBPF basic architecture and usage diagram

2.1.1 eBPF
BPF, Verifier
1. eBPF, ELF, section, map, BTF
2. map, Helper, Extern, kconfig
1. BPF, convert_ctx_access, CTX
2. CO-RE (BTF)
3. Verifier: eBPF, DAG, bytecode class, BPF_CALL

2.1.2 JIT
CPU, JIT (Just In Time)
Figure 2-1-2: Role of the JIT

2.1.3
eBPF, kprobe
root, CAP_BPF capability, kernel.unprivileged_bpf_disabled sysctl, false, bpf()
eBPF, map, BPF

1. Map implementation
maps section, bpf, eBPF map, anon_inodefs, fd
BPF map fd, cmd BPF_PROG_LOAD, bpf
bpf_map_lookup_elem, bpf_map_update_elem, bpf_map_delete_elem, helper function

2. Map performance
eBPF map types: Array, Percpu Array, Hash, Percpu Hash, Lru Hash, Percpu Lru Hash, Lpm
lookup helper: Array, Percpu Array, Hash, Percpu Hash

2.1.3.1 Common eBPF program types
2-1-1 eBPF
tracepoint, kprobe, perf_event
xdp, sock_ops, sk_msg, sk_skb, sk_reuseport, socket_filter, cgroup_sock_addr
LSM, flow_dissector, lwt_in, lwt_out, lwt_xmit

1. kprobe
kprobe, Linux, Probe, kretprobe, eBPF

2. XDP
XDP eXpress
Data Path, XDP, Linux, eBPF, XDP native, DDoS, 4
2-1-3 XDP
2-1-2 XDP: XDP_DROP (DDoS), XDP_PASS, XDP_TX / XDP_REDIRECT / XDP_ABORTED

3. TC
Linux TC (Linux Traffic Control), QoS (Quality of Service), HTTP, FTP, SSH
eBPF, TC, sch_handle_ingress, sch_handle_egress, HOOK, TC ingress, TC egress
eBPF/TC, sk_buff, eBPF map

4. Sock_ops
sock_ops, eBPF, socket, TCP 15, BPF, IP, port, TCP client, server
1. buffer size, RTT
2. SYN RTO, SYN-ACK RTO
3. ECN, TCP, DCTCP (DataCenter TCP)

5. LSM
LSM (Linux Security Modules), MAC (Mandatory Access Control), RBAC (Role-Based Access Control), hook
SELinux, AppArmor, BPF LSM, eBPF LSM, LSM Hook

2.2 eBPF

2.2.1 BCC
BCC, python API, libbpf, bpf, bpf.o, LLVM, kernel-devel

2.2.2 bpftrace
bpftrace, BPF, BCC, DSL, LLVM, libbcc.so

2.2.3 libbpf
libbpf, Linux tools/lib/bpf, C, BPF, kernel
1. BTF, BTF.h, bpf, vmlinux.h
2. clang-11, eBPF, preserve_access_index, field relocation
3. bpf, Verifier, call

2.2.4 libbpf-bootstrap
libbpf-bootstrap, libbpf, eBPF

2.2.5 cilium-ebpf
cilium-ebpf, Go, BPF, eBPF, Cilium/eBPF
1. go, cgo
2. Makefile
3. CO-RE
4. Cilium, eBPF

2.2.6 Coolbpf
Coolbpf, CORE (Compile Once-Run Everywhere), BCC, Coolbpf & BTF, eBPF
2-2-1 Coolbpf
1. Coolbpf: python, rust, go, c, lwcb, eBPF
Generic library: eBPF API, eBPF map, eBPF program
Language bindings
Language library
2-2-2 Coolbpf
2. bpf.c, bpf.so, bpf.o, pip install Coolbpf, lwcb, eBPF
2-2-3 Coolbpf
3. Coolbpf, eBPF, ioctl, map, prog, JIT
2-2-4 Coolbpf

eBPF Application Scenarios and Practice

3.1 eBPF

3.1.1
IO
1. Wireshark, tcpdump
2. Ping, Traceroute
3. SNMP (Simple Network Management Protocol)
ECS, OS, VPC
IO, I/O
1. diskstats: iostat, tsar-io, vmstat -d, I/O
IOPS, I/O, BPS
2. /proc/$pid/io: pidstat -d, I/O
3. taskstat: iotop, I/O, IOWait
swapping, Linux, OOM killer
1. malloc(), Valgrind memcheck, CPU
2. libtcmalloc
3. gdb, free()
Linux
1. sysstat: sar, mpstat, pidstat; CPU, I/O
2. top: CPU
3. Perf: Linux, perf sched
4. Latencytop: Linux

3.1.2 eBPF
IO, eBPF
PingTrace: eBPF, ICMP (ICMP_ECHO, ICMP_ECHOREPLY)
3-1-1 PingTrace
1. ICMP, PingTrace
2. eBPF
3. PingTrace
sysak, sysak pingtrace, PingTrace

IO, eBPF, iofsstat
3-1-2 iofsstat
iofsstat output fields:
comm
pid (id)
cnt_rd
bw_rd
cnt_wr
bw_wr
inode
filepath
containterId: xxxxxx
xxx-stat: r_iops (iops)
xxx-stat: w_iops (iops)
xxx-stat: r_bps (bps)
xxx-stat: w_bps (bps)
xxx-stat: wait (io)
xxx-stat: r_wait (io)
xxx-stat: w_wait (io)
xxx-stat: util% (io utils)

Coolbpf, eBPF, memleak
1. memleak, eBPF
2. memleak, eBPF
3. memleak
4. memleak
5. memleak
CPU

eBPF, irqoff
3-1-3 irqoff
sysak irqoff -help
-t THRESH(ms)
-f LOGFILE
duration(s)
irqoff, duration, worker
TIME(irqoff)          CPU   COMM          TID      LAT(us)
2022-05-05_11:45:19   3     kworker/3:0   379531   1000539
owner_func
process_one_work
worker_thread
kthread
ret_from_fork
1. log header, 5, CPU ID
2. log header
3. worker

3.2 eBPF IO

3.2.1 IO
I/O, IO, hypervisor
1. (guest file offset) to (guest block LBA): ext4, LBA
2. (guest block LBA) to (guest vdisk offset): QCOW2, data cluster
3. (guest vdisk offset) to host block LBA
guest file offset, guest block LBA, host block LBA
IO, Linux, blktrace, perf, iostat

3.2.2 bpftrace IO

3.2.2.1 bpftrace-based virtualization IO path tracing solution
Main functions include 2 points:
1. IO
2. IO

3.2.2.2

3.2.2.2.1 IO
QEMU-KVM, IO
3-2-1 QEMU-KVM IO
1. Linux, IO, VFS, device mapper
2. virtio, virtio-blk / virtio-scsi, virtio-pci (QEMU, vhost)
3. IO
1.
virtio
2. virtio, VFS
3. VFS

3.2.2.2.2 bpftrace
1. eBPF, Dtrace, systemtap
2. kprobe, uprobe, tracepoint, usdt
bpftrace, IO
1. IO
2. IO, qcow2, ocfs2
1. virtio-blk, IO, qcow2, L1, L2
2. qcow2, lun, ocfs2, extent map
3-2-2

3.2.2.2.3 eBPF IO
eBPF, IO, recorder, reporter
1. IO, probe, bpftrace
2. raw, reporter
3-2-3 IO, record, reporter
1. reporter, IO
2. recorder, IO
3. report, IO
3-2-4 IO reporter

3.2.2.2.4 eBPF IO
qemu, io, 10, IO
1. 1 direct IO; 2 IO
2. IO: 1) IO, qemu, vfs; 2) IO, ocfs2, direct IO
3-2-5 IO

3.3 eBPF

3.3.1 Linux

3.3.1.1 Linux
100G/200G, Linux
C1, 8, 1W, 1% CPU, 100 PPS (Packet Per Second)
10G, 64, 2000 PPS, 1488 PPS, 84B
100G, 2 PPS, 50
CPU, Cache Miss, TLB, Cache, Cache Miss 65
netfilter, Cache Miss
RSS

3.3.1.2 Linux
eBPF, Offload, DPDK

3.3.1.2.1 Offload
Offload, CPU, Cache Miss, checksum/gro, ovs offload, rdma

3.3.1.2.2 DPDK
DPDK (Data Plane Development Kit), Hugepages, CPU

3.3.2 eBPF Linux

3.3.2.1

3.3.2.1.1 eBPF
Offload, DPDK, eBPF
1. eBPF
2. eBPF
3. eBPF
4. eBPF
eBPF, Offload

3.3.2.1.2 Linux eBPF
Linux, eBPF

3.3.2.1.2.1 eBPF
1. socket, eBPF, tcp_bpf_sendmsg_redirect
2. TC egress, sch_handle_egress, eBPF
3-3-1

3.3.2.1.2.2 eBPF
1. XDP: offload XDP, native XDP, generic XDP; native, offloaded, generic; native XDP, poll, receive_skb; generic XDP, eBPF, receive_skb
2. TC ingress, sch_handle_ingress
3-3-2

3.3.2.2
InCloud OS, Cilium CNI, eBPF
1. TC eBPF, Pod
2. sockops/sockmap, Pod
3. TC eBPF, Pod
4. Socket, clusterIP
5. XDP, NodePort
6. DSR, NodePort
7. TC eBPF, Pod, DNS

3.3.2.2.1 TC eBPF Pod
eBPF, Pod, host, TC ingress
Versus Calico CNI:
TCP Throughput (1 stream): 21.41%
TCP_RR (1 process): 28.59%
TCP_CRR (1 process): 40.17%
3-3-3 TC eBPF Pod

3.3.2.2.2 sockops/sockmap Pod
eBPF, tcp socket
map, Pod
eBPF, sendmsg, map, socket, socket queue, Pod, netfilter, host
Versus Calico:
TCP Throughput (1 stream): 94.42%
TCP_RR (1 process): 200.82%
TCP_CRR (1 process): 35.04%
3-3-4 sockops/sockmap Pod

3.3.2.2.3 TC eBPF Pod
eBPF, Pod, host, TC ingress, pod, CNI, Calico, host netfilter
Versus Calico:
TCP Throughput (1 stream): 24%
TCP_RR (1 process): 43.51%
TCP_CRR (1 process): 55.62%
3-3-5 TC eBPF Pod

3.3.2.2.4 Socket clusterIP
Socket, clusterIP, DNAT, kube-proxy
140%, CPU 46%
3-3-6 Socket clusterIP

3.3.2.2.5 XDP NodePort
XDP, NodePort, netfilter, kube-proxy ipvs
138%, CPU 65%
3-3-7 XDP NodePort

3.3.2.2.6 DSR NodePort
eBPF, DSR, NodePort, Pod, kube-proxy
136%, CPU 47%
3-3-8 DSR NodePort

3.3.2.2.7 TC eBPF
InCloud OS, Cilium-eBPF, DNS, K8S DNS, CoreDNS + Nodelocaldns, TC eBPF
3-3-9 TC eBPF
1. DNS, TC eBPF, host
2. eBPF, TC, DNS, CoreDNS, host
DNS: 20%

3.4 eBPF
Linux

3.4.1
eBPF
3-4-1 eBPF
eBPF, BPF, JIT, CO-RE, API, eBPF map

3.4.2 eBPF
KeyarchOS, KSecure, eBPF

3.4.2.1 KSecure
CIS, CPU
3-4-1 KSecure

3.4.2.2 eBPF
hook, eBPF, LSM, syscall, network, kprobe
3-4-2 KSecure

3.4.2.3 KSecure

3.4.2.3.1 eBPF-LSM
hook, LSM (Linux Security Modules), Linux Kernel 5.7, LSM eBPF, BPF-LSM, SELinux, AppArmor, LSM hook, eBPF
3-4-3 KSecure
Figure 3-4-4: File protection example flowchart
1. eBPF, KSecure Agent, eBPF LSM Hook
2. KSecure, YAML
3-4-2 /test/test/bin/test/bin/-
3. Agent, YAML, eBPF-map
4. LSM hook, eBPF
5. eBPF, eBPF-map
6. hook, eBPF
7. eBPF, eBPF-map, Ring buffer, Agent
8. Agent

3.4.2.3.2 eBPF
eBPF, kprobe, tracepoint, MITRE ATT&CK (Adversarial Tactics, Techniques and Common Knowledge), KSecure
3-4-5
1. eBPF
2. Ring buffer, eBPF
4. shell
KSecure
3-4-6 shell
1. yaml, shell
3-4-3 shell
connect
num: 0, 1, 2, 3
shell
type: IPv4, IPv6
2. BPF, eBPF, tracepoint, kprobe, shell, connect
3. shell, Shell()
5. Ring buffer, BPF
6. shell, eBPF
3.4.2.4 eBPF
2. Rootkit

Challenges and Outlook
eBPF
1. bpftrace
2. eBPF, root
3. Linux, eBPF
4. eBPF
The future of eBPF in the Linux Kernel
eBPF
1. eBPF
2. Verifier, Rust
3. CO-RE, Helper
4. eBPF
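
Section 3.3.1.1 mentions a 10G link, 64-byte packets, an 84B frame and the figure 1488 PPS. As a minimal sketch of the arithmetic behind such line-rate figures, the following assumes standard Ethernet framing: a 64-byte minimum frame plus 8 bytes of preamble and 12 bytes of inter-frame gap, giving 84 bytes on the wire. The function names `line_rate_pps` and `per_packet_budget_ns` are hypothetical and do not come from the white paper.

```python
def line_rate_pps(link_bps: float, wire_bytes: int = 84) -> float:
    """Theoretical maximum packets per second at full line rate.

    wire_bytes is the per-packet size on the wire; 84 assumes a 64-byte
    minimum Ethernet frame + 8-byte preamble + 12-byte inter-frame gap.
    """
    return link_bps / (wire_bytes * 8)


def per_packet_budget_ns(pps: float) -> float:
    """Time available to process one packet, in nanoseconds."""
    return 1e9 / pps


if __name__ == "__main__":
    for name, bps in (("10G", 10e9), ("100G", 100e9)):
        pps = line_rate_pps(bps)
        print(f"{name}: {pps / 1e6:.2f} Mpps, "
              f"{per_packet_budget_ns(pps):.2f} ns per packet")
```

Under these assumptions a 10G link tops out at roughly 14.88 million minimum-size packets per second, which leaves a budget of a few tens of nanoseconds per packet. This is why the cache-miss costs listed in Section 3.3.1.1 matter for the kernel network path.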