harbor: Wrong image pull times cause tag retention to remove the wrong images

Expected behavior and actual behavior:

  • Problem: Older pushed images were not actually pulled by any user or system, yet the “Pull Time” column shows an unexpected recent time that does not match the “PULL TIME” shown for the tag itself. Our tag retention removes the wrong images because older images keep getting newer pull times.
  • Expected solution: The image pull time is fixed and tag retention no longer removes the wrong images.

Steps to reproduce the problem:

  • Checking pull times manually: Projects -> choose project -> choose repository:
      – Check “Pull Time” in the last column: “4/13/21, 2:34 AM”
      – Check “PULL TIME” when hovering over the tag name in the “Tags” column: “1/7/21, 2:19 PM”
      – The pull times don’t match

  • Tag retention dry run matches the “Pull Time” in the last column: “2021/04/12 23:34:27”

| Digest     | Tag             | Kind  | Labels | PushedTime          | PulledTime          | CreatedTime         | Retention |
|------------|-----------------|-------|--------|---------------------|---------------------|---------------------|-----------|
| sha256:XXX | master-39       | image |        | 2021/04/12 14:31:00 | 2021/04/12 22:45:38 | 2021/04/12 14:31:00 | RETAIN    |
| sha256:XXX | master-38       | image |        | 2021/04/07 06:02:22 | 2021/04/12 22:59:29 | 2021/04/07 06:02:22 | RETAIN    |
| sha256:XXX | development-200 | image |        | 2021/04/07 05:58:36 | 2021/04/12 22:59:33 | 2021/04/07 05:58:36 | RETAIN    |
| sha256:XXX | master-36       | image |        | 2021/01/22 09:28:44 | 2021/04/12 23:33:28 | 2021/01/22 09:28:44 | DEL       |
| sha256:XXX | master-35       | image |        | 2021/01/15 13:31:57 | 2021/04/12 23:33:54 | 2021/01/15 13:31:57 | DEL       |
| sha256:XXX | master-34       | image |        | 2021/01/07 11:53:00 | 2021/04/12 23:34:27 | 2021/01/07 11:53:00 | RETAIN    |

  • Checking Harbor logs for the specific image: Logs -> Resource -> type the selected image -> the last pull time matches the one shown when hovering over the tag name in the “Tags” column: “1/7/21, 2:19 PM”
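
The three timestamps above can be lined up with a small sketch. It assumes the dry-run table is in UTC and the UI shows local time at UTC+3; both are inferred from the reported values, not confirmed anywhere in the issue. Under that assumption the dry-run “PulledTime” for master-34 is the same instant as the “Pull Time” column, while the pull recorded in the logs is about three months older.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the UI renders local time at UTC+3 and the dry-run log uses UTC.
# The offset is inferred from the values in the report, not stated in it.
ASSUMED_LOCAL = timezone(timedelta(hours=3))

# "PulledTime" of master-34 from the dry-run table.
dry_run_pulled = datetime(2021, 4, 12, 23, 34, 27, tzinfo=timezone.utc)

# The same instant as the UI would show it under the assumed offset.
ui_pulled = dry_run_pulled.astimezone(ASSUMED_LOCAL)
print(ui_pulled.strftime("%Y-%m-%d %H:%M"))  # 2021-04-13 02:34

# Last pull according to the Harbor logs and the tag tooltip ("1/7/21, 2:19 PM").
log_last_pull = datetime(2021, 1, 7, 14, 19, tzinfo=ASSUMED_LOCAL)

# Distance between the pull the logs know about and the recorded pull time.
gap = ui_pulled - log_last_pull
print(gap.days)  # 95
```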

Versions:

  • harbor version: 2.2.0
  • docker engine version: 20.10.5, build 55c4c88
  • docker-compose version: 1.23.2, build 1110ad01

Additional context:

  • Harbor config files:
  • Log files:

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 24 (16 by maintainers)

Most upvoted comments

Yes. (Sorry I missed this.) Scanning should not update pull time. We should consider adding “Last scan time” field if enough people think it is useful.

After discussion, we decided to implement it in this way:

  1. Add a configuration option, “Skip Scanner Pull Update”. If the Harbor admin checks it, pulls made by the scanner will update neither the pull time nor the pull count.
  2. A “Last Scan Time” would only indicate the last time the scanner pulled the artifact, not whether the scan succeeded or failed, so it seems to be of no use in the UI. Usually, a Harbor user checks the scan report and gets the scan time from it, which is the accurate scan time the user wants to know. So we will not add a “Last Scan Time” column to the artifact table.
  3. To distinguish the scanner robot account from user-created robot accounts, we add a UUID to the scanner robot account name. The UUID is generated at installation, and the scanner robot account is named like this: robot$<projectName>+<Scanner UUID (8byte)>-<Scanner Name>-<UUID>. Harbor detects the scanner robot account by the prefix robot$<projectName>+<Scanner UUID (8byte)>.
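
The prefix check in point 3, combined with the option from point 1, could look roughly like the sketch below. It is not Harbor's code: the function names, the `skip_scanner_pull_update` flag name, and the sample UUID value are hypothetical, and the 8-byte scanner UUID is assumed to be an 8-character string.

```python
def scanner_robot_prefix(project_name: str, scanner_uuid: str) -> str:
    """Build the prefix robot$<projectName>+<Scanner UUID> described above."""
    return f"robot${project_name}+{scanner_uuid}"


def is_scanner_robot(username: str, project_name: str, scanner_uuid: str) -> bool:
    """Treat an account as the scanner robot if its name starts with the prefix."""
    return username.startswith(scanner_robot_prefix(project_name, scanner_uuid))


def should_update_pull_time(username: str, project_name: str,
                            scanner_uuid: str,
                            skip_scanner_pull_update: bool) -> bool:
    """Skip the pull time/count update only for scanner pulls when the option is on."""
    if skip_scanner_pull_update and is_scanner_robot(username, project_name, scanner_uuid):
        return False
    return True


# Hypothetical values for illustration only.
SCANNER_UUID = "a1b2c3d4"
scanner_account = f"robot$library+{SCANNER_UUID}-Trivy-0f9e8d7c"
user_account = "robot$library+ci-deployer"

print(should_update_pull_time(scanner_account, "library", SCANNER_UUID, True))  # False
print(should_update_pull_time(user_account, "library", SCANNER_UUID, True))     # True
```

With the option unchecked, every pull still updates the pull time, which matches the behavior reported in this issue.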

We’re considering not updating the pull time when scanning, or adding some extra fields to distinguish user pulls from scan pulls.

@qnetter should we consider fixing it?

I agree that pulls from clients are what matter. That is why I suggest distinguishing between last PULLED time and last SCANNED time (which should not update last pulled time). If there were an indicator next to the last scanned time of S(uccessful)/U(nsuccessful)/-(could not scan), that would be good too.

I think you are scanning images, which updates the pull time. @wy65701436 @heww is it possible not to update the pull time when just scanning?

We will find a solution to fix this issue in harbor 2.5.0. cc @xaleeks

@wy65701436 thanks for working on this 👍🏼

I want to comment on what you mentioned here:

Before moving forward, I would like to point out that this is not actually a bug fix, but a change in behavior that we should be aware of and decide if it is good to go.

I see your point there, however I think it’s important to distinguish pull operations performed against Harbor from actual external clients that consume the artifacts, versus pulls that are performed by an internal system component.

When considering retention policies (for example), we generally care about the consumption (i.e. pulls) of the images from the perspective of Harbor consumers, not consumption (pulls) by internal Harbor tooling that fulfils some other function (such as periodically scanning the images).

If folks enable the SCAN_ALL job, retention policies become useless, and I’m not sure that was ever the intent, right? So I would agree that this may not be considered a pure bug; however, it does negatively interact with another functionality that becomes unusable as a result of the current behaviour.
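
To make that interaction concrete, here is a simplified model of a “retain the most recently pulled N artifacts” rule. It is only a sketch, not Harbor's retention implementation, and the tags and timestamps are hypothetical. Once a scan of all images overwrites the recorded pull times, the ranking follows scan order instead of client usage.

```python
from datetime import datetime


def retain_most_recently_pulled(pull_times, keep):
    """Return the tags kept by a simplified 'most recently pulled N' rule."""
    ranked = sorted(pull_times, key=pull_times.get, reverse=True)
    return set(ranked[:keep])


# Hypothetical last pulls by real clients.
client_pulls = {
    "app-1": datetime(2021, 1, 7, 14, 19),
    "app-2": datetime(2021, 4, 10, 9, 0),
    "app-3": datetime(2021, 4, 12, 9, 0),
}

# Hypothetical scan-all run that pulls every image one after another.
scan_pulls = {
    "app-3": datetime(2021, 4, 12, 23, 0),
    "app-2": datetime(2021, 4, 12, 23, 1),
    "app-1": datetime(2021, 4, 12, 23, 2),
}

# What ends up recorded when scanner pulls also update the pull time.
recorded = {tag: max(client_pulls[tag], scan_pulls[tag]) for tag in client_pulls}

kept_by_client_pulls = retain_most_recently_pulled(client_pulls, keep=2)
kept_by_recorded = retain_most_recently_pulled(recorded, keep=2)

print(sorted(kept_by_client_pulls))  # ['app-2', 'app-3']
print(sorted(kept_by_recorded))      # ['app-1', 'app-2']
```

In this model the image clients used most recently (app-3) is dropped, while the one last pulled by a client in January (app-1) is retained.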

In any case, I just wanted to share my 2 cents, and I’m happy to see the team is going to address this!