triplet-reid: Unable to get the loss below 0.7 even when testing multiple learning rates.

I have tried many different learning rates and optimizers, but I have not once seen the min loss drop below 0.69.

If I use learning_rate = 1e-2:

iter:    20, loss min|avg|max: 0.713|2.607|60.013, batch-p@3: 4.43%, ETA: 6:01:18 (0.87s/it)
iter:    40, loss min|avg|max: 0.696|1.204|21.239, batch-p@3: 5.99%, ETA: 5:58:53 (0.86s/it)
iter:    60, loss min|avg|max: 0.696|1.643|25.543, batch-p@3: 4.69%, ETA: 5:36:32 (0.81s/it)
iter:    80, loss min|avg|max: 0.695|1.679|42.339, batch-p@3: 7.03%, ETA: 5:58:01 (0.86s/it)
iter:   100, loss min|avg|max: 0.694|1.806|47.572, batch-p@3: 6.51%, ETA: 6:08:57 (0.89s/it)
iter:   120, loss min|avg|max: 0.695|1.200|21.791, batch-p@3: 4.43%, ETA: 6:14:15 (0.90s/it)
iter:   140, loss min|avg|max: 0.694|2.744|87.940, batch-p@3: 5.47%, ETA: 6:21:29 (0.92s/it)

If I use learning_rate = 1e-6:

iter:    20, loss min|avg|max: 0.741|14.827|440.151, batch-p@3: 1.04%, ETA: 6:23:26 (0.92s/it)
iter:    40, loss min|avg|max: 0.712|9.662|146.125, batch-p@3: 2.86%, ETA: 6:03:24 (0.87s/it)
iter:    60, loss min|avg|max: 0.697|3.944|100.707, batch-p@3: 4.17%, ETA: 6:10:44 (0.89s/it)
iter:    80, loss min|avg|max: 0.695|2.408|75.002, batch-p@3: 2.86%, ETA: 5:44:48 (0.83s/it)
iter:   100, loss min|avg|max: 0.694|2.272|67.504, batch-p@3: 2.86%, ETA: 6:03:45 (0.88s/it)
iter:   120, loss min|avg|max: 0.694|1.091|17.292, batch-p@3: 2.86%, ETA: 5:42:45 (0.83s/it)
iter:   140, loss min|avg|max: 0.693|1.069|15.975, batch-p@3: 5.73%, ETA: 5:46:48 (0.84s/it)
...
iter:   900, loss min|avg|max: 0.693|0.694| 0.709, batch-p@3: 2.08%, ETA: 5:15:00 (0.78s/it)
iter:   920, loss min|avg|max: 0.693|0.693| 0.701, batch-p@3: 2.34%, ETA: 5:39:12 (0.85s/it)
iter:   940, loss min|avg|max: 0.693|0.694| 0.704, batch-p@3: 5.99%, ETA: 5:46:12 (0.86s/it)
iter:   960, loss min|avg|max: 0.693|0.693| 0.705, batch-p@3: 2.86%, ETA: 5:24:59 (0.81s/it)
iter:   980, loss min|avg|max: 0.693|0.693| 0.700, batch-p@3: 3.65%, ETA: 5:39:47 (0.85s/it)
iter:  1000, loss min|avg|max: 0.693|0.693| 0.698, batch-p@3: 3.39%, ETA: 5:27:59 (0.82s/it)
iter:  1020, loss min|avg|max: 0.693|0.693| 0.700, batch-p@3: 6.51%, ETA: 5:36:38 (0.84s/it)
iter:  1040, loss min|avg|max: 0.693|0.694| 0.699, batch-p@3: 2.86%, ETA: 5:22:05 (0.81s/it)
...
iter:  1640, loss min|avg|max: 0.693|0.693| 0.694, batch-p@3: 2.60%, ETA: 5:09:58 (0.80s/it)
iter:  1660, loss min|avg|max: 0.693|0.693| 0.694, batch-p@3: 2.08%, ETA: 5:48:27 (0.90s/it)
iter:  1680, loss min|avg|max: 0.693|0.693| 0.694, batch-p@3: 4.43%, ETA: 5:23:23 (0.83s/it)
iter:  1700, loss min|avg|max: 0.693|0.693| 0.694, batch-p@3: 6.51%, ETA: 5:25:04 (0.84s/it)
iter:  1720, loss min|avg|max: 0.693|0.693| 0.694, batch-p@3: 3.12%, ETA: 5:39:08 (0.87s/it)

What does this effectively mean? I also came across the phrase “nonzero triplets never decreases”, but I'm not quite sure what that refers to either. My best guess about the 0.693 plateau is sketched below.
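
My current reading, from the soft-margin formulation rather than anything confirmed in the repo: with margin = 'soft', the per-triplet loss is softplus(d_ap - d_an) = ln(1 + exp(d_ap - d_an)). If the network collapses every image to (nearly) the same embedding, then d_ap ≈ d_an, the argument is roughly 0, and the loss sits at exactly ln 2 ≈ 0.693, which matches the plateau in the logs above:

import numpy as np

def soft_margin_triplet_loss(d_ap, d_an):
    # softplus of (anchor-positive minus anchor-negative) distance:
    # ln(1 + exp(d_ap - d_an))
    return np.log1p(np.exp(d_ap - d_an))

# A collapsed embedding makes all pairwise distances (nearly) equal,
# so the hardest-positive and hardest-negative distances cancel:
print(soft_margin_triplet_loss(0.0, 0.0))  # 0.6931... == ln(2)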


I am using the VGG dataset, with a file structure like this:

class_a/file.jpg
class_b/file.jpg
class_c/file.jpg
...

I build pids and fids like this:

import os
import glob

pids, fids = [], []  # identity label and file path for every image
classes = [path for path in os.listdir(DATA_DIR) if os.path.isdir(os.path.join(DATA_DIR, path))]
for c in classes:
    # every .jpg inside a class directory belongs to that identity
    for file in glob.glob(os.path.join(DATA_DIR, c, "*.jpg")):
        pids.append(c)
        fids.append(file)

where DATA_DIR is the root directory of the VGG dataset.
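
For what it's worth, a quick sanity check over those lists (a hypothetical snippet, not part of the repo) can confirm the structure was picked up correctly; batch-hard sampling also needs at least batch_k images per identity:

from collections import Counter

counts = Counter(pids)
print(f"{len(counts)} identities, {len(fids)} images total")

# with batch_k = 2, an identity with fewer than 2 images can never
# form a positive pair inside a batch
too_few = [p for p, n in counts.items() if n < 2]
print(f"{len(too_few)} identities have fewer than 2 images")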

Most upvoted comments

I have learned two important things in the last week: check your dataset about 5,000,000 times, or write tests 😉. And don't let the iteration count sway your assumptions at all. I used the parameters below and, as a last effort, let it run overnight:

tech = 'batch_hard'  # one of 'batch_hard', 'batch_sample', 'batch_all'
arch = 'resnet_v1_50'
batch_k = 2
batch_p = 18
learning_rate = 3e-5
epsilon = 1e-8
optimizer_name = 'Adam'  # 'Adam', 'MO', or 'RMS'
train_iterations = 50000
decay_start_iteration = 20000
checkpoint_frequency = 1000
net_input_size = (256, 256)
embedding_dim = 128
margin = 'soft'
metric = 'euclidean'  # or 'sqeuclidean'
output_model = config.feature_model_dir + "tmp"
out_dir = output_model + "/save/"
log_every = 5
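
For context, batch_p and batch_k control the PK batch sampling: each batch draws batch_p identities and batch_k images per identity, i.e. 18 × 2 = 36 images with these settings. A minimal sketch of that sampling, under my reading of the scheme (sample_pk_batch is a hypothetical helper, not the repo's actual code):

import random

def sample_pk_batch(pids, fids, P=18, K=2):
    # group file paths by identity
    by_pid = {}
    for pid, fid in zip(pids, fids):
        by_pid.setdefault(pid, []).append(fid)
    # only identities with at least K images can fill their K slots
    eligible = [p for p, files in by_pid.items() if len(files) >= K]
    batch = []
    for pid in random.sample(eligible, P):
        batch.extend((pid, f) for f in random.sample(by_pid[pid], K))
    return batch  # P * K (pid, fid) pairs: 36 with these settings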

By iteration 25,000 pretty much every mean loss is below 0.7, averaging 0.312, and the average min loss is < 0.05, but, slightly worryingly, the max loss has not changed at all (still in the 1-4 range). Looking forward to the implementation!

As in one of the toy datasets, or a small, easy dataset? haha!

OMG, I completely misunderstood checkpointing. Well, half misunderstood it. Downloading now…