spinnaker: Error fetching new jobs from Travis and Travis trigger stops working after upgrading to >= 1.16

Issue Summary:

Travis pipeline trigger stopped working for us after upgrading to 1.16.

Cloud Provider(s):

N/A

Environment:

Pipeline runs are triggered after Travis (travis-ci.com) builds finish.

Feature Area:

Travis service of Igor

Description:

Travis pipeline triggers stopped working for us after upgrading to >= 1.16. We tried 1.17 and 1.18; both have the same issue. We used to keep the Travis config in igor-local.yaml, but the issue persists after re-applying the config with Halyard. The Igor logs show this error continuously:

2020-02-18 17:57:40.957  WARN 1 --- [ix-travis-ci-10] c.n.s.igor.travis.service.TravisService  : An error occurred while fetching new jobs from Travis.

retrofit.RetrofitError: 500 Internal Server Error
	at retrofit.RetrofitError.httpError(RetrofitError.java:40) ~[retrofit-1.9.0.jar:na]
	at retrofit.RestAdapter$RestHandler.invokeRequest(RestAdapter.java:388) ~[retrofit-1.9.0.jar:na]
	at retrofit.RestAdapter$RestHandler.invoke(RestAdapter.java:240) ~[retrofit-1.9.0.jar:na]
	at com.sun.proxy.$Proxy160.jobs(Unknown Source) ~[na:na]
	at com.netflix.spinnaker.igor.travis.service.TravisService.lambda$null$6(TravisService.java:317) ~[igor-web.jar:na]
	at java.base/java.util.stream.IntPipeline$1$1.accept(IntPipeline.java:180) ~[na:na]
	at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:108) ~[na:na]
	at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:699) ~[na:na]
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[na:na]
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[na:na]
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[na:na]
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:na]
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[na:na]
	at com.netflix.spinnaker.igor.travis.service.TravisService.lambda$getJobs$8(TravisService.java:327) ~[igor-web.jar:na]
	at com.netflix.spinnaker.hystrix.SimpleJava8HystrixCommand.run(SimpleJava8HystrixCommand.java:52) ~[kork-hystrix-7.5.1.jar:7.5.1]
	at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:302) ~[hystrix-core-1.5.18.jar:1.5.18]
	at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:298) ~[hystrix-core-1.5.18.jar:1.5.18]
	at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:46) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.Observable.unsafeSubscribe(Observable.java:10327) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:51) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.Observable.unsafeSubscribe(Observable.java:10327) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:41) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:30) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.Observable.unsafeSubscribe(Observable.java:10327) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.internal.operators.OperatorSubscribeOn$SubscribeOnSubscriber.call(OperatorSubscribeOn.java:100) ~[rxjava-1.3.8.jar:1.3.8]
	at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:56) ~[hystrix-core-1.5.18.jar:1.5.18]
	at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:47) ~[hystrix-core-1.5.18.jar:1.5.18]
	at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction.call(HystrixContexSchedulerAction.java:69) ~[hystrix-core-1.5.18.jar:1.5.18]
	at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55) ~[rxjava-1.3.8.jar:1.3.8]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]

Steps to Reproduce:

Downgrading to 1.15 fixes the Travis trigger. The issue returns after upgrading to >= 1.16.

Additional Details:

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 27

Most upvoted comments

I’ll have a look 👍

I asked our support guy from Travis in Slack, and I just got confirmation they have fixed it at their end, so I’m gonna close this issue. Please re-open if you still have issues!

I got the following response from Travis support:

We are working on this issue. We will let you know asap. Thanks.

Hopefully they’ll have this fixed on their end soon.

@jervi Hi, I got this confirmation back from Travis support: “In summary, the issue is that there are a bunch of complex database queries, which crash the entire system when left to run beyond a specific time frame thus making the entire system unusable for everyone. We have explored options to optimize this, however, there are so many interconnected parts that it will require significant engineering work to improve this as it is.” Looks like we should just remove the state param to work around this?
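
For illustration only, the workaround suggested above (dropping the state param and filtering on the client instead) might look roughly like the sketch below. It is not Igor's code. It assumes the Travis API v3 /jobs endpoint accepts limit, offset and state query parameters and that each returned job carries a "state" field; jobs_url and filter_by_state are hypothetical helper names, and the base URL is a placeholder.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode


def jobs_url(base: str, states: Optional[List[str]] = None,
             limit: int = 100, offset: int = 0) -> str:
    """Build a job-listing URL (hypothetical helper).

    The state filter is only added when `states` is non-empty, so
    passing None yields the "no state param" form of the request.
    """
    params = {"limit": limit, "offset": offset}
    if states:
        # Assumption: multiple states are sent as a comma-separated list.
        params["state"] = ",".join(states)
    return f"{base.rstrip('/')}/jobs?{urlencode(params)}"


def filter_by_state(jobs: List[Dict], states: List[str]) -> List[Dict]:
    """Keep only jobs whose "state" is in `states`.

    This stands in for the server-side filter once the state param
    has been removed from the request.
    """
    wanted = set(states)
    return [job for job in jobs if job.get("state") in wanted]
```

The trade-off is that the unfiltered request returns jobs in every state, so more pages may have to be fetched before enough finished jobs are seen.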

Thanks! I tried a couple of times. None of the permutations of the state params work. I also submitted a ticket to Travis asking about the error.