libs: [BUG] Timestamp incorrect in event logs, dated in the future by 7 days/1 week

Describe the bug

While analyzing event and debug logs generated by Falco, we discovered that the events were dated in the future.

For example, looking at around 1677527444 (about Mon, 27 Feb 2023 19:50:44 GMT), we identified the following debug entry carrying the timestamp 2023-03-06T11:56:34.693685797Z:

{
  "hostname": "redacted",
  "output": "Falco internal: syscall event drop. 116192 system calls dropped in last second.",
  "output_fields": {
    "ebpf_enabled": "0",
    "n_drops": "116192",
    "n_drops_buffer_clone_fork_enter": "0",
    "n_drops_buffer_clone_fork_exit": "0",
    "n_drops_buffer_connect_enter": "0",
    "n_drops_buffer_connect_exit": "0",
    "n_drops_buffer_dir_file_enter": "0",
    "n_drops_buffer_dir_file_exit": "0",
    "n_drops_buffer_execve_enter": "0",
    "n_drops_buffer_execve_exit": "0",
    "n_drops_buffer_open_enter": "0",
    "n_drops_buffer_open_exit": "0",
    "n_drops_buffer_other_interest_enter": "0",
    "n_drops_buffer_other_interest_exit": "0",
    "n_drops_buffer_total": "116192",
    "n_drops_bug": "0",
    "n_drops_page_faults": "0",
    "n_drops_scratch_map": "0",
    "n_evts": "430758"
  },
  "priority": "Debug",
  "rule": "Falco internal: syscall event drop",
  "time": "2023-03-06T11:56:34.693685797Z"
}

Around the wall-clock time shown below:

date +%s ; date -u
1677527950
Mon 27 Feb 2023 07:59:10 PM UTC

The following event was observed with timestamp 1678109229443642000, 2023-03-06T13:27:09.443642063Z:

{
  "hostname": "redacted",
  "priority": "Warning",
  "rule": "Read sensitive file untrusted",
  "source": "syscall",
  "tags": [
    "T1020",
    "T1083",
    "T1212",
    "T1552",
    "T1555",
    "container",
    "filesystem",
    "host",
    "mitre_credential_access",
    "mitre_discovery"
  ],
  "time": "2023-03-06T13:27:09.443642063Z",
  "output_fields": {
    "container.id": "host",
    "container.image.repository": null,
    "evt.time": 1678109229443642000,
    "fd.name": "/etc/pam.d/common-session",
    "proc.aname[2]": "systemd",
    "proc.aname[3]": null,
    "proc.aname[4]": null,
    "proc.cmdline": "sshd -D -R",
    "proc.name": "sshd",
    "proc.pid": 3225785,
    "proc.pname": "sshd",
    "user.loginuid": -1,
    "user.name": "root"
  }
}

How to reproduce it

Unsure, installation was from HEAD @ 0de9af9 via systemd on-metal.

Generate events and look at log messages?

Expected behaviour

Timestamp should be accurate / not in the future.

Since the epoch times are fairly close, 1678109229443642000 (nanoseconds) and 1677527950 (seconds), perhaps the string generated in the time field is incorrect (~7 days in the future, i.e. roughly 604800000000000 nanoseconds)?
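For reference, the observed offset works out to 1678109229 − 1677527950 = 581279 seconds ≈ 6.7 days, while a full week is 604800 seconds, i.e. 6.048 × 10^14 nanoseconds.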

Environment

  • Falco version:
{
  "default_driver_version": "e1d0fd9b043f1c7dfd91c9d030c11cfe2c062931",
  "driver_api_version": "4.0.0",
  "driver_schema_version": "2.0.0",
  "engine_version": "16",
  "falco_version": "0.34.1-1-da4e36e4",
  "libs_version": "e1d0fd9b043f1c7dfd91c9d030c11cfe2c062931",
  "plugin_api_version": "2.0.0"
}
  • System info:
{
  "machine": "x86_64",
  "nodename": "redacted",
  "release": "5.15.90-custom",
  "sysname": "Linux",
  "version": "#1 SMP Mon Sep 27 00:00:00 UTC 2010"
}
  • OS:
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
  • Kernel:
Linux redacted 5.15.90-custom #1 SMP Mon Sep 27 00:00:00 UTC 2010 x86_64 GNU/Linux
  • Installation method:
  • from source, HEAD @ 0de9af9 via DEB/systemd on-metal

Additional context

Possibly related to https://github.com/falcosecurity/libs/pull/595 ? (I did a search of prior PRs related to time.)

Discussion on Slack community: https://kubernetes.slack.com/archives/CMWH3EH32/p1677517061599169

About this issue

  • State: closed
  • Created a year ago
  • Comments: 41 (41 by maintainers)


Most upvoted comments

Just updated the PR and added a comment, https://github.com/falcosecurity/libs/pull/1003#issuecomment-1483538662

Removed the (WIP) tag and submitted a new commit (and rebased with fixes addressing the suggestions).

I tested a build with these two commits applied as a patch and first impressions are very positive; Falco is logging events with the right timestamps!

Summary:

After chatting in #932 and a Slack thread, we discovered that /proc/1/cmdline did not return the right boot time as we had thought. This was verified on a host in my org.

This led us to do a round of debugging, running the following commands on the problematic systems:

* `uptime -s`, `who -b`, `stat /proc /proc/1 /proc/1/cmdline`, `cat /proc/stat | grep btime`: revealed that the output from `who -b` was incorrect, but the output from `uptime` was correct

* `ls -lahtr`: revealed that `/proc/uptime` had timestamps closer to when the system was started

Most importantly, `cat /proc/stat | grep btime` reported a btime value closest to the actual boot time. Speaking to a few engineers in my org also revealed that the same parts of the kernel feed values into /proc/uptime and /proc/stat’s btime, so they are “essentially” equivalent.

The Slack thread also discussed ways of getting the system boot time:

* `btime` in `/proc/stat`

* calculation via `clock_gettime(CLOCK_REALTIME) - clock_gettime(CLOCK_BOOTTIME)`

* calculation via `time(NULL) - sysinfo().uptime`

Overall, the maintainers preferred btime from /proc/stat for the following reasons:

* value does not depend on calculation using current timestamp

* btime is "static" and doesn't change once set

* btime is available in kernels from 2008

* CLOCK_BOOTTIME is available in kernels from 2011 (2.6.38)

@gnosek mentioned:

By scraping btime from /proc/stat, it is both the heaviest and most likely to succeed

And this PR (as it is right now) is a result of that discussion:

* use `btime` from `/proc/stat` for boot time (see the sketch after this list)

* use `/proc/<pid>/root/proc/1/cmdline` for container start time
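For illustration, here is a minimal sketch of reading btime from /proc/stat, written as an assumption of how such a read could look rather than the actual libs change:

/* Sketch: read the system boot time (seconds since the epoch) from the
 * "btime" line of /proc/stat. Returns 0 if the value cannot be read,
 * so the caller can fall back to another method. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint64_t read_btime_seconds(void)
{
    FILE *f = fopen("/proc/stat", "r");
    char line[512];
    uint64_t btime = 0;

    if (f == NULL)
    {
        return 0;
    }

    while (fgets(line, sizeof(line), f) != NULL)
    {
        if (strncmp(line, "btime ", 6) == 0)
        {
            sscanf(line + 6, "%" SCNu64, &btime);
            break;
        }
    }

    fclose(f);
    return btime;
}

Since btime is written once at boot and never changes, the result does not depend on when it is read, which is the "static" property the maintainers called out above.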

ACK, this would certainly be a super simple refactor …

You know what, ahah, there are actually no issues here; sites like epochconverter automatically switch the time unit depending on the number of digits. Therefore, in epochconverter,

  • 12721000000 is GMT: Saturday 10 February 2373 19:06:40
  • 1946299696411786 is GMT: Thursday 4 September 2031 14:48:16.411

Using another tool, e.g. https://www.epoch101.com/, you’ll discover that:

  • 12721000000, parsed as nanoseconds (i.e. 12721 in milliseconds), means Thu, 01 Jan 1970 00:00:12 GMT
  • 1946299696411786, parsed as nanoseconds (i.e. 1946299696 in milliseconds), means Fri, 23 Jan 1970 12:38:19 GMT

Basically, the event is happening ~22 days after you booted up your PC (and that is validated by your uptime output).

The issue here is not about event timestamps, but about why your /proc/1 has a very low (near-epoch) change time…
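To make the unit confusion concrete, here is a small illustration (my own, not from the thread) that interprets the same raw value as seconds, milliseconds, and nanoseconds:

/* Illustration of the unit confusion: the same raw number read as
 * seconds, milliseconds and nanoseconds since the epoch. */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static void print_as(const char *unit, uint64_t divisor, uint64_t raw)
{
    time_t secs = (time_t)(raw / divisor);
    char buf[64];

    strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S UTC", gmtime(&secs));
    printf("%-12s -> %s\n", unit, buf);
}

int main(void)
{
    uint64_t raw = 12721000000ULL; /* the /proc/1 timestamp discussed above */

    print_as("seconds", 1ULL, raw);              /* year 2373: what epochconverter guessed */
    print_as("milliseconds", 1000ULL, raw);      /* ~147 days after the epoch (May 1970) */
    print_as("nanoseconds", 1000000000ULL, raw); /* ~12 seconds after the epoch */

    return 0;
}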

Unfortunate news:

The change didn’t fix the timestamp on my nodes; it’s possible that the /proc/1/cmdline change is insufficient to produce the right timestamp – confirmed on both an x86-64 host and an aarch64 host.

Changed the PR to (WIP)

Within my org, there may be separate issues affecting /proc that are specific to x86-64 and aarch64 respectively – I’ll investigate further on that front with internal engineers.

EDIT: I’m also discussing with my team whether we can patch our builds to use the fallback method with clock_gettime() instead 🤔
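For context, a minimal sketch of that fallback (my own illustration, not the libs code): derive the boot time as CLOCK_REALTIME minus CLOCK_BOOTTIME, i.e. the current wall-clock time minus the time elapsed since boot.

/* Fallback sketch: boot time (ns since the epoch) = CLOCK_REALTIME - CLOCK_BOOTTIME.
 * Unlike btime, the result depends on the current timestamp at call time. */
#define _GNU_SOURCE /* CLOCK_BOOTTIME on older glibc */
#include <stdint.h>
#include <time.h>

static uint64_t boot_time_ns_fallback(void)
{
    struct timespec now, since_boot;

    if (clock_gettime(CLOCK_REALTIME, &now) != 0 ||
        clock_gettime(CLOCK_BOOTTIME, &since_boot) != 0)
    {
        return 0;
    }

    uint64_t now_ns = (uint64_t)now.tv_sec * 1000000000ULL + (uint64_t)now.tv_nsec;
    uint64_t up_ns = (uint64_t)since_boot.tv_sec * 1000000000ULL + (uint64_t)since_boot.tv_nsec;

    return now_ns - up_ns;
}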

Here is some output from scap_open:

schan@host:~/libs/build$ date -u && date +%s && date +%s%N && uptime && stat /proc/1/cmdline; sudo ./libscap/examples/01-open/scap-open --modern_bpf --evt_type execve --num_events 10

Thu 23 Mar 2023 11:00:42 PM UTC

1679612442

1679612442422449367

 23:00:42 up 56 min,  1 user,  load average: 1.58, 1.63, 1.70

  File: /proc/1/cmdline
  Size: 0               Blocks: 0          IO Block: 1024   regular empty file
Device: 15h/21d Inode: 29705       Links: 1
Access: (0444/-r--r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-02-15 20:36:33.542000000 +0000
Modify: 2023-02-15 20:36:33.542000000 +0000
Change: 2023-02-15 20:36:33.542000000 +0000
 Birth: -

[SCAP-OPEN]: Hello!

---------------------- SCAP SOURCE ----------------------
* Modern BPF probe, 1 ring buffer every 1 CPUs
-----------------------------------------------------------

--------------------- CONFIGURATIONS ----------------------
* Print single event type: 0 (`-1` means no event to print).
* Run until '10' events are catched.
-----------------------------------------------------------

---------------------- INTERESTING SYSCALLS ----------------------
* All sc codes are enabled!
------------------------------------------------------------------

************ proc_cmdline: /proc/1/cmdline
************ proc_cmdline: /proc/1/cmdline
************ boot_time: 1676493393542000000

* OK! modern BPF probe correctly loaded: NO VERIFIER ISSUES :)
* Live capture in progress...
* Press CTRL+C to stop the capture

------------------ EVENT: 0 TID:73151
------ HEADER
timestamp: 1676496781579307218
tid: 73151
len: 34
type: 0
num params: 2
------
------ PARAMS
PARAM 0: 176
PARAM 1: 113
------
------------------

---------------------- STATS -----------------------
Events correctly captured (SCAP_SUCCESS): 10
Seen by driver: 247
Time elapsed: 0 s
Number of dropped events: 0
Number of timeouts: 1
Number of 'next' calls: 11
Number of dropped events caused by full buffer (total / all buffer drops - includes all categories below, likely higher than sum of syscall categories): 0
Number of dropped events caused by full buffer (n_drops_buffer_clone_fork_enter syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_clone_fork_exit syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_execve_enter syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_execve_exit syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_connect_enter syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_connect_exit syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_open_enter syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_open_exit syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_dir_file_enter syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_dir_file_exit syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_other_interest_enter syscall category): 0
Number of dropped events caused by full buffer (n_drops_buffer_other_interest_exit syscall category): 0
Number of dropped events caused by full scratch map: 0
Number of dropped events caused by invalid memory access (page faults): 0
Number of dropped events caused by an invalid condition in the kernel instrumentation (bug): 0
Number of preemptions: 0
Number of events skipped due to the tid being in a set of suppressed tids: 0
Number of threads currently being suppressed: 0
-----------------------------------------------------

[SCAP-OPEN]: Bye!

Hey y’all,

Our internal kernel folks are still investigating the /proc/1 epoch time problem, but they have a suggestion for Falco:

Instead of /proc/1, can /proc/1/cmdline be used instead? This is apparently similar to what Docker is doing (something about procfs / container creation time?). A rough sketch of this idea appears after the strace output below.

user@host:~$ strace -v -etrace=statx -y stat /proc/1 /proc/1/cmdline 
statx(AT_FDCWD, "/proc/1", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_blksize=1024, stx_attributes=0, stx_nlink=9, stx_uid=0, stx_gid=0, stx_mode=S_IFDIR|0555, stx_ino=21510, stx_size=0, stx_blocks=0, stx_attributes_mask=STATX_ATTR_AUTOMOUNT|STATX_ATTR_MOUNT_ROOT|STATX_ATTR_DAX, stx_atime={tv_sec=12, tv_nsec=17000000} /* 1970-01-01T00:00:12.017000000+0000 */, stx_btime={tv_sec=0, tv_nsec=0}, stx_ctime={tv_sec=12, tv_nsec=17000000} /* 1970-01-01T00:00:12.017000000+0000 */, stx_mtime={tv_sec=12, tv_nsec=17000000} /* 1970-01-01T00:00:12.017000000+0000 */, stx_rdev_major=0, stx_rdev_minor=0, stx_dev_major=0, stx_dev_minor=21}) = 0
  File: /proc/1
  Size: 0             Blocks: 0          IO Block: 1024   directory
Device: 15h/21d    Inode: 21510       Links: 9
Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 1970-01-01 00:00:12.017000000 +0000
Modify: 1970-01-01 00:00:12.017000000 +0000
Change: 1970-01-01 00:00:12.017000000 +0000
 Birth: -

statx(AT_FDCWD, "/proc/1/cmdline", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_blksize=1024, stx_attributes=0, stx_nlink=1, stx_uid=0, stx_gid=0, stx_mode=S_IFREG|0444, stx_ino=70683560, stx_size=0, stx_blocks=0, stx_attributes_mask=STATX_ATTR_AUTOMOUNT|STATX_ATTR_MOUNT_ROOT|STATX_ATTR_DAX, stx_atime={tv_sec=1679590867, tv_nsec=687421205} /* 2023-03-23T17:01:07.687421205+0000 */, stx_btime={tv_sec=0, tv_nsec=0}, stx_ctime={tv_sec=1679590867, tv_nsec=687421205} /* 2023-03-23T17:01:07.687421205+0000 */, stx_mtime={tv_sec=1679590867, tv_nsec=687421205} /* 2023-03-23T17:01:07.687421205+0000 */, stx_rdev_major=0, stx_rdev_minor=0, stx_dev_major=0, stx_dev_minor=21}) = 0
  File: /proc/1/cmdline
  Size: 0             Blocks: 0          IO Block: 1024   regular empty file
Device: 15h/21d    Inode: 70683560    Links: 1
Access: (0444/-r--r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-03-23 17:01:07.687421205 +0000
Modify: 2023-03-23 17:01:07.687421205 +0000
Change: 2023-03-23 17:01:07.687421205 +0000
 Birth: -
+++ exited with 0 +++
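For illustration, a rough sketch of the suggestion (my reading of it, not the actual PR): use the change time of /proc/1/cmdline as the boot-time estimate.

/* Sketch: approximate boot time from the ctime of /proc/1/cmdline,
 * which procfs stamps when PID 1's entry is materialized. */
#include <stdint.h>
#include <sys/stat.h>

static uint64_t boot_time_ns_from_pid1_cmdline(void)
{
    struct stat st;

    if (stat("/proc/1/cmdline", &st) != 0)
    {
        return 0;
    }

    return (uint64_t)st.st_ctim.tv_sec * 1000000000ULL + (uint64_t)st.st_ctim.tv_nsec;
}

Note that this matches the scap-open output above: the printed boot_time 1676493393542000000 is exactly the 2023-02-15 20:36:33.542 change time of /proc/1/cmdline, which explains why the event header timestamp in that run (1676496781579307218) also lands on 2023-02-15 instead of the actual capture date.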

Yep 😦 A solution could be to fall back to the old method if stat /proc/1 leads to weird results (i.e. a result before the year 2000). I think the new approach, which was introduced by Melissa if I recall correctly, is more stable, so I’d avoid killing it.

On your side @Happy-Dude, it would be amazing to understand why, on all your machines, /proc has this strange change time (~epoch).

I’ve updated our test branch removing this proc scan and using the old fallback logic https://github.com/Andreagit97/libs/tree/timestamp_issue. Would you like to give it a try @Happy-Dude?

mount | grep proc

schan@host:~/libs/build$ mount | grep proc
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)

Hey team,

Did some preliminary investigation:

Will do some more digging as time permits. My next “excavation” will be where/how the timestamp is recorded when an event is traced/processed.

I’ve seen this when using the Falco libs separately from Falco. Tried to reproduce it in sinsp-example/scap-example from the head of libs and haven’t been able to replicate it yet. Will keep digging.