rancher: [ipsec] router logs have repeated message about - "querying policy failed: No such file or directory"

NOTE: There is no functional impact of these messages on the regular operations of ipsec.

Rancher server version - v1.4.0-rc3 ipsec template version: v0.0.4(rancher/net:v0.8.6)

Steps to reproduce the problem: Deploy few services

ipsec-ipsec-router-1 logs have repeated message about - “querying policy failed: No such file or directory” :

1/26/2017 1:49:11 PM16[KNL] querying policy failed: No such file or directory (2)
1/26/2017 1:49:11 PM16[IKE] sending keep alive to 104.198.249.177[4500]
1/26/2017 1:49:11 PM09[KNL] querying policy failed: No such file or directory (2)
1/26/2017 1:49:11 PM09[IKE] sending keep alive to 35.184.95.198[4500]
1/26/2017 1:49:11 PM10[KNL] querying policy failed: No such file or directory (2)
1/26/2017 1:49:11 PM10[IKE] sending keep alive to 35.184.95.198[4500]
1/26/2017 1:49:11 PM15[KNL] querying policy failed: No such file or directory (2)
1/26/2017 1:49:11 PM15[IKE] sending keep alive to 104.198.249.177[4500]
1/26/2017 1:49:11 PM16[KNL] querying policy failed: No such file or directory (2)
1/26/2017 1:49:11 PM16[IKE] sending keep alive to 104.198.249.177[4500]
1/26/2017 1:49:11 PM09[KNL] querying policy failed: No such file or directory (2)
1/26/2017 1:49:11 PM09[IKE] sending keep alive to 35.184.95.198[4500]
1/26/2017 1:49:11 PM10[KNL] querying policy failed: No such file or directory (2)
1/26/2017 1:49:11 PM10[IKE] sending keep alive to 35.184.95.198[4500]
1/26/2017 1:49:11 PM15[KNL] querying policy failed: No such file or directory (2)
1/26/2017 1:49:11 PM15[IKE] sending keep alive to 104.198.249.177[4500]
1/26/2017 1:49:11 PM16[KNL] querying policy failed: No such file or directory (2)
1/26/2017 1:49:11 PM16[IKE] sending keep alive to 104.198.249.177[4500]
1/26/2017 1:49:11 PM09[KNL] querying policy failed: No such file or directory (2)
1/26/2017 1:49:11 PM09[IKE] sending keep alive to 35.184.95.198[4500]
1/26/2017 1:49:11 PM10[KNL] querying policy failed: No such file or directory (2)
1/26/2017 1:49:11 PM10[IKE] sending keep alive to 35.184.95.198[4500]
1/26/2017 1:49:11 PM15[KNL] querying policy failed: No such file or directory (2)
1/26/2017 1:49:11 PM15[IKE] sending keep alive to 104.198.249.177[4500]
1/26/2017 1:49:11 PM16[KNL] querying policy failed: No such file or directory (2)

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 15
  • Comments: 27 (3 by maintainers)

Most upvoted comments

The log is being printed from the following line in file src/libcharon/plugins/kernel_netlink/kernel_netlink_ipsec.c:

2595                 case NLMSG_ERROR:
2596                 {
2597                     struct nlmsgerr *err = NLMSG_DATA(hdr);
2598                     DBG1(DBG_KNL, "querying policy failed: %s (%d)",
2599                          strerror(-err->error), -err->error);
2600                     break;
2601                 }

Currently the log level in rancher/ipsec is set to 1.

Reading through the code, for every ipsec connection, charon is periodically trying to find if there was any traffic destined to a policy with the given traffic selector … in Rancher’s use case, ipsec polices are managed outside of charon. Due to this, every time there is a policy lookup, it fails and the debug message is printed.

Thread 4 "charon" hit Breakpoint 1, query_policy (this=0x662640, id=0x7fbc5590cc70, data=0x7fbc5590cc98, use_time=0x7fbc5590cc98) at kernel_netlink_ipsec.c:2598
2598    in kernel_netlink_ipsec.c
(gdb) bt
#0  query_policy (this=0x662640, id=0x7fbc5590cc70, data=0x7fbc5590cc98, use_time=0x7fbc5590cc98) at kernel_netlink_ipsec.c:2598
#1  0x00007fbc5bdbf814 in query_policy (this=0x637e80, id=0x7fbc5590cc70, data=0x7fbc5590cc98, use_time=0x7fbc5590cc98) at kernel/kernel_interface.c:492
#2  0x00007fbc5bdc8cb7 in update_usetime (this=0x7fbc3c006660, inbound=true) at sa/child_sa.c:570
#3  0x00007fbc5bdc8e6c in get_usestats (this=0x7fbc3c006660, inbound=true, time=0x7fbc5590cd60, bytes=0x0, packets=0x0) at sa/child_sa.c:623
#4  0x00007fbc5bdcbab7 in get_use_time (this=0x7fbc38000ec0, inbound=true) at sa/ike_sa.c:340
#5  0x00007fbc5bdccbb0 in send_dpd (this=0x7fbc38000ec0) at sa/ike_sa.c:771
#6  0x00007fbc5bdc65e4 in execute (this=0x7fbc28000e60) at processing/jobs/send_dpd_job.c:57
#7  0x00007fbc5c088036 in process_job (this=0x6355d0, worker=0x66cac0) at processing/processor.c:235
#8  0x00007fbc5c088338 in process_jobs (worker=0x66cac0) at processing/processor.c:321
#9  0x00007fbc5c09f82a in thread_main (this=0x66cc70) at threading/thread.c:331
#10 0x00007fbc5b8656ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007fbc5b39782d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) detach

We recently upgraded our Rancher server to a new host and version 1.4.1 and are also experiencing this issue. Symptoms are that services work after initial launch, but eventually go down due to network failure. router logs have same message reported in this issue:

3/7/2017 7:01:57 AM16[KNL] querying policy failed: No such file or directory (2)
3/7/2017 7:01:57 AM16[IKE] sending keep alive to 54.245.4.160[4500]

Network manager logs have errors as well:

3/7/2017 6:28:49 AMnet.ipv4.conf.docker0.route_localnet = 1
3/7/2017 6:28:57 AMtime="2017-03-07T13:28:57Z" level=info msg="CNI down" cid=fef6cce8f5eb4e0c4ca3ac9f68841b6199ffa8a6b5a7731dfe988d80dbafe07c networkMode=ipsec
3/7/2017 6:28:58 AMtime="2017-03-07T13:28:58Z" level=info msg="Applying new port rules"
3/7/2017 6:28:59 AMtime="2017-03-07T13:28:59Z" level=info msg="Setting up resolv.conf for ContainerId [ede119334be234102189c523f225ab1a558a370ef4082a52379250a8fb758e60]"

Rancher server

Rancher Server: v1.4.1
Docker: 1.13.0-rc4, build 88862e7
Rancher OS: v0.8.1

Hosts

Rancher OS v 0.4.4
Docker:  1.10.3, build 20f81dd

Thinking docker version on server might need to be downgraded and/or hosts might need to be upgraded. Any info others have is appreciated though.