parca-agent: [BUG] fail to unwind stack trace, lose data in one thread

Describe the bug

Run a simple cpp demo, but failing to parse symbols and failing to get profiling info from one thread.

I am build the agent and parca server from the latest main branch.

To Reproduce

#include <iostream>
#include <thread>
#include <chrono>
#include <cstring>
#include <unistd.h>
#include <sys/epoll.h>

void epollf() {
    int epollFd = epoll_create1(0);
    if (epollFd == -1) {
        std::cerr << "Failed to create epoll instance: " << std::strerror(errno) << std::endl;
        return;
    }

    struct epoll_event event;
    event.events = EPOLLIN;

    while (true) {
        struct epoll_event events[10];
        int numEvents = epoll_wait(epollFd, events, 1000, 0);

        if (numEvents == -1) {
            std::cerr << "epoll_wait error: " << std::strerror(errno) << std::endl;
            break;
        }

    }

    close(epollFd);
}

int main() {
    std::thread epollThread(epollf);

    while (true) {

    }

    epollThread.join();

    return 0;
}

clang++ -g -o jjjttt main.cpp -lpthread ./jjjttt

Expected behavior

Profiling with all symobls.

Screenshots or Profiles (using https://pprof.me)

image

Only one thread is proflied…

image

The other thread has

Logs

sudo cat /sys/kernel/debug/tracing/trace_pipe

          jjjttt-2177205 [005] d.h.. 5107453.799553: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2177205 [005] d.h.. 5107453.799553: bpf_trace_printk: ~checking shards now
          jjjttt-2177205 [005] d.h.. 5107453.799554: bpf_trace_printk: [info] found chunk
          jjjttt-2177205 [005] d.h.. 5107453.799554: bpf_trace_printk: pid 2177205 tgid 2177205
          jjjttt-2177205 [005] d.h.. 5107453.799555: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2177205 [005] d.h.. 5107453.799555: bpf_trace_printk: traversing stack using .eh_frame information!!
          jjjttt-2177205 [005] d.h.. 5107453.799556: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          jjjttt-2177205 [005] d.h.. 5107453.799556: bpf_trace_printk: [debug] Within unwinding machinery loop
          jjjttt-2177205 [005] d.h.. 5107453.799557: bpf_trace_printk: ## frame: 0
          jjjttt-2177205 [005] d.h.. 5107453.799557: bpf_trace_printk: 	current pc: 559071098394
          jjjttt-2177205 [005] d.h.. 5107453.799557: bpf_trace_printk: 	current sp: 7fff9cabaa10
          jjjttt-2177205 [005] d.h.. 5107453.799558: bpf_trace_printk: 	current bp: 7fff9cabaa20
          jjjttt-2177205 [005] d.h.. 5107453.799558: bpf_trace_printk: ~about to check shards found=1
          jjjttt-2177205 [005] d.h.. 5107453.799558: bpf_trace_printk: ~checking shards now
          jjjttt-2177205 [005] d.h.. 5107453.799559: bpf_trace_printk: [info] found chunk
          jjjttt-2177205 [005] d.h.. 5107453.799559: bpf_trace_printk: le offset: 559071097000
          jjjttt-2177205 [005] d.h.. 5107453.799560: bpf_trace_printk: ========== left 74625 right 74883
          jjjttt-2177205 [005] d.h.. 5107453.799560: bpf_trace_printk: 	.done
          jjjttt-2177205 [005] d.h.. 5107453.799560: bpf_trace_printk: 	=> table_index: 74651
          jjjttt-2177205 [005] d.h.. 5107453.799560: bpf_trace_printk: 	=> adjusted pc: 1394
          jjjttt-2177205 [005] d.h.. 5107453.799561: bpf_trace_printk: 	cfa type: 1, offset: 16 (row pc: 1374)
          jjjttt-2177205 [005] d.h.. 5107453.799563: bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7fff9cabaa28.

agent log no obvious error.

parca server log(some symbols are missing as well)

Has storage symbolization request failed" err=“fetching metadata: debuginfo metadata not found” and ErrUnknownPC

level=debug name=parca ts=2023-09-25T08:57:31.519863966Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=69389d485a9793dbe873f0ea2c93e02efaa9aa3d pc_range_start=0x28700 pc_range_end=0x1bc4c1
level=debug name=parca ts=2023-09-25T08:57:31.519933345Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=3146e5a5e66e1fd6ab59a39e486a9d246bc675c0 pc_range_start=0x2090 pc_range_end=0x2b375
level=debug name=parca ts=2023-09-25T08:57:31.519963794Z caller=symbolizer.go:299 component=symbolizer buildid=unknown msg="storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"
level=debug name=parca ts=2023-09-25T08:57:31.520031807Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=e22ba7829a55a0dec2201a0b6dac7ba236118561 pc_range_start=0xe030 pc_range_end=0x1e19b
level=debug name=parca ts=2023-09-25T08:57:31.520112609Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=f90a926c28da963c0888cca1180ef28bb88f059b pc_range_start=0x2030 pc_range_end=0x2488
level=debug name=parca ts=2023-09-25T08:57:31.520184515Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=d7435f9680bf6c38f11856638bbbc8a2b755e054 pc_range_start=0x2030 pc_range_end=0x554d0
level=debug name=parca ts=2023-09-25T08:57:31.520219354Z caller=symbolizer.go:299 component=symbolizer buildid=unknown msg="storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"
level=debug name=parca ts=2023-09-25T08:57:31.520289037Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=927b5ce7c32980646eaf7527a8b3d68c3da5ffea pc_range_start=0x10a0 pc_range_end=0x12d8
level=debug name=parca ts=2023-09-25T08:57:31.520358214Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=5ea13eaa57911db0 pc_range_start=0xafb0 pc_range_end=0x4c023
level=debug name=parca ts=2023-09-25T08:57:31.520386623Z caller=symbolizer.go:299 component=symbolizer buildid=unknown msg="storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"
level=debug name=parca ts=2023-09-25T08:57:31.520453982Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=4b60568bc25641b07e93f84f98c77093f3b279f7 pc_range_start=0x6030 pc_range_end=0x1833e
level=debug name=parca ts=2023-09-25T08:57:31.520527563Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=7ab150049702ea494c7c9b681d86dc2d59c89ee1 pc_range_start=0xb030 pc_range_end=0x4ebcd
level=debug name=parca ts=2023-09-25T08:57:31.520600582Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=0665411b8b4fc787710290aacecac4b069bf89fe pc_range_start=0x4030 pc_range_end=0x4718
level=debug name=parca ts=2023-09-25T08:57:31.520681385Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=55447a37514c4a317439786251326b5f762d31392f6f6b7835704d635279724779346e4d6e78414a6a2f6d647a5247354a536e6e38616e6c31636c5954612f38496435624e72466c744770475332794967704c pc_range_start=0x46aaa0 pc_range_end=0xd53aa8
level=debug name=parca ts=2023-09-25T08:57:31.520760407Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=7bf66e3d4dd23edad286982c17e7df08d1ab6fb2 pc_range_start=0x10a0 pc_range_end=0x12e2
level=debug name=parca ts=2023-09-25T08:57:31.520839323Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=7d78d330de33842118d405b4afe4d58878d71b41 pc_range_start=0xb030 pc_range_end=0x73617
**level=debug name=parca ts=2023-09-25T08:57:31.520871855Z caller=symbolizer.go:299 component=symbolizer buildid=unknown msg="storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"**
level=debug name=parca ts=2023-09-25T08:57:31.520939133Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=cc278076d1c0ecf82b5248dfffe7e2d43feafbb7 pc_range_start=0x7030 pc_range_end=0x7ee8
level=debug name=parca ts=2023-09-25T08:57:31.521048469Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=cf6dd909194b0a9dd65eb30f7ec27c4002d0329d pc_range_start=0x3f030 pc_range_end=0x1fdb34
level=debug name=parca ts=2023-09-25T08:57:31.521126984Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=3192c14e4fe2624ff4675ae5a5edfa8a599aa9db pc_range_start=0x2030 pc_range_end=0x2578
level=debug name=parca ts=2023-09-25T08:57:31.521195289Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=d11886324c0b0501c46b595089d667e6894943af pc_range_start=0x3030 pc_range_end=0xa8e7
level=debug name=parca ts=2023-09-25T08:57:31.521273608Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=6673306d4d734c383466637842373044627642442f5a4b76785f546b716f435f2d506957696b447a492f4270345f73557834666d616445414b796b64555f2f4e524d525f67667a4c365a554361577041657232 pc_range_start=0x401e40 pc_range_end=0x49b0a85
level=debug name=parca ts=2023-09-25T08:57:31.521462171Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=229b7dc509053fe4df5e29e8629911f0c3bc66dd pc_range_start=0x28700 pc_range_end=0x1bc501
level=debug name=parca ts=2023-09-25T08:57:31.521507517Z caller=symbolizer.go:299 component=symbolizer buildid=036dde3b7b7d0159 msg="storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"
level=debug name=parca ts=2023-09-25T08:57:31.521520533Z caller=symbolizer.go:289 component=symbolizer msg="mapping of location is empty, skipping"
level=debug name=parca ts=2023-09-25T08:57:31.521590103Z caller=symbolizer.go:516 component=symbolizer msg="symbolizing locations" build_id=e37fe1a879783838de78cbc8c80621fa685d58a2 count=2
level=debug name=parca ts=2023-09-25T08:57:31.522622451Z caller=symbolizer.go:590 component=symbolizer **msg="symbolized location" build_id=e37fe1a879783838de78cbc8c80621fa685d58a2 address=651472 lines_count=0 err="seek to PC: ErrUnknownPC" liner_type=*addr2line.DwarfLiner
level=debug name=parca ts=2023-09-25T08:57:31.523523449Z caller=symbolizer.go:590 component=symbolizer msg="symbolized location" build_id=e37fe1a879783838de78cbc8c80621fa685d58a2 address=654128 lines_count=0 err="seek to PC: ErrUnknownPC" liner_type=*addr2line.DwarfLiner**
level=debug name=parca ts=2023-09-25T08:57:31.52356687Z caller=symbolizer.go:191 component=symbolizer msg="symbolization loop completed"
level=debug name=parca ts=2023-09-25T08:57:37.245414226Z caller=badger.go:452 msg=GetOrCreateStacktraces stacktrace_keys_len=49
level=debug name=parca ts=2023-09-25T08:57:37.246795774Z caller=badger.go:58 msg="writeRequests called. Writing to value log"
level=debug name=parca ts=2023-09-25T08:57:37.246872034Z caller=badger.go:58 msg="Writing to memtable"
level=debug name=parca ts=2023-09-25T08:57:37.246937884Z caller=badger.go:58 msg="Sending updates to subscribers"
level=debug name=parca ts=2023-09-25T08:57:37.246969165Z caller=badger.go:58 msg="2 entries written"
level=debug name=parca ts=2023-09-25T08:57:37.266391517Z caller=server.go:314 msg="finished call" protocol=grpc grpc.component=server grpc.service=parca.profilestore.v1alpha1.ProfileStoreService grpc.method=WriteRaw grpc.method_type=unary peer.address=127.0.0.1:59764 grpc.start_time=2023-09-25T08:57:37Z grpc.request.deadline=2023-09-25T08:59:37Z grpc.code=OK grpc.time_ms=24.529

Software (please complete the following information):

  • Parca Agent Version: [parca-agent --version]
  • Parca Server Version (if applicable): [parca --version]

Workload (please complete the following information):

  • Runtime (if applicable):
  • Compiler (if applicable):

Environment (please complete the following information):

  • Linux Distribution: [cat /etc/*-release] DISTRIB_ID=Ubuntu DISTRIB_RELEASE=22.04 DISTRIB_CODENAME=jammy DISTRIB_DESCRIPTION=“Ubuntu 22.04.3 LTS” PRETTY_NAME=“Ubuntu 22.04.3 LTS” NAME=“Ubuntu” VERSION_ID=“22.04” VERSION=“22.04.3 LTS (Jammy Jellyfish)” VERSION_CODENAME=jammy ID=ubuntu ID_LIKE=debian HOME_URL=“https://www.ubuntu.com/” SUPPORT_URL=“https://help.ubuntu.com/” BUG_REPORT_URL=“https://bugs.launchpad.net/ubuntu/” PRIVACY_POLICY_URL=“https://www.ubuntu.com/legal/terms-and-policies/privacy-policy” UBUNTU_CODENAME=jammy
  • Linux Version: [uname -r] 5.15.0-78-generic
  • Arch: [uname -m] x86_64
  • Kubernetes Version (if applicable): [kubectl --version] no
  • Container Runtime (if applicable): no

Additional context

About this issue

  • Original URL
  • State: open
  • Created 9 months ago
  • Comments: 17 (13 by maintainers)

Most upvoted comments

I meet a similar problem in ubuntu 22.04 as well. I try to disable swap for a process with cgroup v2 like the following:

cgcreate -g memory:parca
cd /sys/fs/cgroup/parca/ && echo 0 > memory.swap.max
cgexec -g memory:parca <comm>

more symbols are resolved after disabling swap, but I can still see errors for the above process

bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7ffde661c8f8.

@javierhonduco .

The latest update is this issue seems to be related to some security enhancements we patched on ubuntu. Still trying to identify the specific change.

update: with another environment, the symbol resolution looks fine here and no “rip should not be zero” for the simple example.

and the bookmark looks like this(without disabling swap):

[vdso] PageStats { swapped: 0, present: 1, unmapped: 2, total: 2 }
[vsyscall] PageStats { swapped: 0, present: 0, unmapped: 1, total: 1 }
[vvar] PageStats { swapped: 0, present: 0, unmapped: 4, total: 4 }
[stack] PageStats { swapped: 1, present: 2, unmapped: 33, total: 33 }
/path/to/the/example PageStats { swapped: 1, present: 2, unmapped: 5, total: 5 }
/usr/lib/libgcc_s.so.1 PageStats { swapped: 2, present: 24, unmapped: 37, total: 37 }
/usr/lib/libm.so.6 PageStats { swapped: 2, present: 72, unmapped: 237, total: 237 }
[heap] PageStats { swapped: 2, present: 0, unmapped: 33, total: 33 }
/usr/lib/ld-linux-x86-64.so.2 PageStats { swapped: 4, present: 49, unmapped: 53, total: 53 }
/usr/lib/libc.so.6 PageStats { swapped: 7, present: 259, unmapped: 576, total: 576 }
anon PageStats { swapped: 12, present: 514, unmapped: 2072, total: 2072 }
/usr/lib/libstdc++.so.6.0.32 PageStats { swapped: 14, present: 348, unmapped: 631, total: 631 }

environment:

Linux ***** 6.4.12-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 24 Aug 2023 00:38:14 +0000 x86_64 GNU/Linux
parca: 0.19
parca-agent: 0.25.1

more symbols are resolved after disabling swap

More symbols are resolved or more stacks are unwound and sent to the server?

Thanks for trying disabling SWAP! Could you try running https://github.com/javierhonduco/bookmark on your machine while your applications is running and send the statistics belonging to your process’ main executable and dynamic libraries? Sorry that I don’t provide good instructions on how to install it or run it yet, but cargo run should do

it’s weird that the symbol resolution becomes as poor as it used to be with another a few runs…

/path/to/my/program PageStats { swapped: 0, present: 4, unmapped: 5, total: 5 }
/usr/lib/x86_64-linux-gnu/libc.so.6 PageStats { swapped: 0, present: 231, unmapped: 539, total: 539 }
[vvar] PageStats { swapped: 0, present: 0, unmapped: 4, total: 4 }
[vdso] PageStats { swapped: 0, present: 1, unmapped: 2, total: 2 }
/usr/lib/x86_64-linux-gnu/libm.so.6 PageStats { swapped: 0, present: 80, unmapped: 231, total: 231 }
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30 PageStats { swapped: 0, present: 349, unmapped: 553, total: 553 }
anon PageStats { swapped: 0, present: 16, unmapped: 2072, total: 2072 }
[vsyscall] PageStats { swapped: 0, present: 0, unmapped: 1, total: 1 }
/usr/lib/x86_64-linux-gnu/libgcc_s.so.1 PageStats { swapped: 0, present: 25, unmapped: 32, total: 32 }
/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 PageStats { swapped: 0, present: 59, unmapped: 59, total: 59 }
[stack] PageStats { swapped: 0, present: 3, unmapped: 33, total: 33 }
[heap] PageStats { swapped: 0, present: 2, unmapped: 33, total: 33 }