harvest: Getting "error reading api response => 408 Request Timeout" when running in K8s
Describe the bug
We have multiple pollers running in K8s. One of them works fine on local laptops under Docker, but in K8s it throws 408 errors.
Environment
- Harvest version: 21.05.3-2
- Command line arguments used: See below
- OS: Kubernetes
- Install method: docker
- ONTAP Version: 9.3P4
To Reproduce
Deployment:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: CLUSTER_NAME
  namespace: netapp-metrics
  labels:
    app: CLUSTER_NAME
spec:
  replicas: 1
  selector:
    matchLabels:
      app: CLUSTER_NAME
  template:
    metadata:
      labels:
        app: CLUSTER_NAME
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: ZONE_NAME
      volumes:
        - name: zapi-default-config
          configMap:
            name: zapi-default-config
        - name: harvest-poller-config
          secret:
            secretName: harvest-poller-config
      containers:
        - name: CLUSTER_NAME
          image: REGISTRY/harvest:21.05.3-2
          args:
            - "--config"
            - "/harvest.yml"
            - "--poller"
            - "CLUSTER_NAME"
          resources:
            limits:
              memory: 2G
              cpu: 1000m
            requests:
              memory: 1G
              cpu: 500m
          ports:
            - name: http
              containerPort: 12990
          volumeMounts:
            - name: zapi-default-config
              mountPath: /opt/harvest/conf/zapi/default.yaml
              subPath: default.yaml
            - name: harvest-poller-config
              mountPath: /harvest.yml
              subPath: harvest.yml
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: CLUSTER_NAME
  name: CLUSTER_NAME
  namespace: netapp-metrics
spec:
  ports:
    - port: 12990
      targetPort: 12990
      name: metrics
  selector:
    app: CLUSTER_NAME
  type: ClusterIP
---
kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
metadata:
  name: CLUSTER_NAME
  namespace: netapp-metrics
  labels:
    prometheus: netapp
    prometheusEnv: prd
spec:
  selector:
    matchLabels:
      app: CLUSTER_NAME
  endpoints:
    - port: metrics
      interval: 1m
      path: /metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: CLUSTER_NAME-perf
  namespace: netapp-metrics
  labels:
    app: CLUSTER_NAME-perf
spec:
  replicas: 1
  selector:
    matchLabels:
      app: CLUSTER_NAME-perf
  template:
    metadata:
      labels:
        app: CLUSTER_NAME-perf
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: ZONE_NAME
      volumes:
        - name: zapi-default-config
          configMap:
            name: zapi-default-config
        - name: harvest-poller-config
          secret:
            secretName: harvest-poller-config
      containers:
        - name: CLUSTER_NAME-perf
          image: REGISTRY/harvest:21.05.3-2
          args:
            - "--config"
            - "/harvest.yml"
            - "--poller"
            - "CLUSTER_NAME-perf"
          resources:
            limits:
              memory: 2G
              cpu: 1000m
            requests:
              memory: 1G
              cpu: 500m
          ports:
            - name: http
              containerPort: 12990
          volumeMounts:
            - name: zapi-default-config
              mountPath: /opt/harvest/conf/zapi/default.yaml
              subPath: default.yaml
            - name: harvest-poller-config
              mountPath: /harvest.yml
              subPath: harvest.yml
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: CLUSTER_NAME-perf
  name: CLUSTER_NAME-perf
  namespace: netapp-metrics
spec:
  ports:
    - port: 12990
      targetPort: 12990
      name: metrics
  selector:
    app: CLUSTER_NAME-perf
  type: ClusterIP
---
kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
metadata:
  name: CLUSTER_NAME-perf
  namespace: netapp-metrics
  labels:
    prometheus: netapp
    prometheusEnv: prd
spec:
  selector:
    matchLabels:
      app: CLUSTER_NAME-perf
  endpoints:
    - port: metrics
      interval: 1m
      path: /metrics
Poller Config:
netapp-harvest/docker/harvest.yml
---
Pollers:
  CLUSTER_NAME:
    datacenter: DATACENTER_NAME
    collectors:
      - Zapi
    addr: CLUSTER_NAME
    exporters:
      - promethues
  CLUSTER_NAME-perf:
    datacenter: DATACENTER_NAME
    collectors:
      - Zapiperf
    addr: CLUSTER_NAME
    exporters:
      - promethues
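The harvest.yml excerpt above shows only the Pollers section, yet both pollers reference an exporter named promethues (note the spelling) and the Deployments expose containerPort 12990. A minimal sketch of the Exporters section that would have to exist in the same file for that reference to resolve, assuming a Prometheus exporter (exact parameter names can differ between Harvest releases):

Exporters:
  promethues:                # key must match the string referenced by each poller above
    exporter: Prometheus
    addr: 0.0.0.0            # listen on all interfaces so the Service/ServiceMonitor can scrape it
    port: 12990              # must agree with containerPort and the Service targetPort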
default.yaml:
netapp-harvest/docker/default.yaml
---
collector: Zapi
# Order here matters!
schedule:
  - instance: 300s
  - data: 50s
client_timeout: 40
objects:
  Node: node.yaml
  Aggregate: aggr.yaml
  Volume: volume.yaml
  SnapMirror: snapmirror.yaml
  Disk: disk.yaml
  Shelf: shelf.yaml
  Status: status.yaml
  Subsystem: subsystem.yaml
  Lun: lun.yaml
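One way to narrow this down (a debugging sketch, not the deployed config and not a confirmed fix) is to point the K8s poller at a cut-down default.yaml that polls only the object reported in the error below (Aggregate) and raises client_timeout; this helps rule out client-side timing even though the 408 is ultimately returned by ONTAP:

---
collector: Zapi
# Debugging variation: single object, generous timeouts.
schedule:
  - instance: 300s
  - data: 180s          # longer data interval so polls cannot overlap
client_timeout: 180     # raised from 40 seconds while isolating the timeout
objects:
  Aggregate: aggr.yaml  # the collector that logs the 408 below

If the 408 still appears with a single object and a long timeout, the network path between the K8s node and the cluster management LIF becomes a more likely suspect than the Harvest configuration.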
Expected behavior
The poller should collect metrics.
Actual behavior
5:50PM ERR goharvest2/cmd/poller/collector/collector.go:303 > error="error reading api response => 408 Request Timeout" Poller=CLUSTER_NAME collector=Zapi:Aggregate stack=[{"func":"New","line":"35","source":"errors.go"},{"func":"(*Client).invoke","line":"369","source":"client.go"},{"func":"(*Client).InvokeBatchWithTimers","line":"281","source":"client.go"},{"func":"(*Zapi).PollData","line":"350","source":"zapi.go"},{"func":"(*task).Run","line":"60","source":"schedule.go"},{"func":"(*AbstractCollector).Start","line":"269","source":"collector.go"},{"func":"goexit","line":"1371","source":"asm_amd64.s"}]
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 19 (1 by maintainers)
I will try your suggestions, but it will take me some time as something came up. I will post the results back as soon as I can.
I will try the same from kubectl and see if that works too.
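Related to the kubectl idea above, a throwaway Pod pinned to the same zone as the pollers can check basic HTTPS reachability of the cluster management LIF from inside K8s. This is only a sketch: the image tag, the zone label, and CLUSTER_NAME are placeholders/assumptions, and it exercises only the HTTPS port, not the ZAPI endpoint itself.

apiVersion: v1
kind: Pod
metadata:
  name: zapi-connectivity-test
  namespace: netapp-metrics
spec:
  nodeSelector:
    topology.kubernetes.io/zone: ZONE_NAME   # same zone as the poller Deployments
  restartPolicy: Never
  containers:
    - name: curl
      image: curlimages/curl:latest
      # -k skips TLS verification; the point is only whether the HTTPS port
      # answers quickly from this node, not what it returns.
      args: ["-sk", "-o", "/dev/null", "-w", "%{http_code} %{time_total}s\n",
             "--max-time", "60", "https://CLUSTER_NAME/"]

A fast 200/302 here while the poller still hits 408s would point at the ZAPI/ONTAPI request path or request timing rather than basic reachability.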