ulid: Duplicate IDs

I’m inserting around 1000 rows into the DB in very quick succession.

I decided to output the ULIDs so I could catch any dupes.

Here are the relevant parts:

01F9VG0NT0ACNSHJM2J4G6WH3E
01F9VG0NT0ACNSHJM2J8WRTTJE
01F9VG0NT0ACNSHJM2JAC9W6YT
01F9VG0NT0ACNSHJM2JDWKQS65
01F9VG0NT0ACNSHJM2J8KV2SB2
01F9VG0NT0ACNSHJM2J4G6WH3E
Error 1062: Duplicate entry '01F9VG0NT0ACNSHJM2J4G6WH3E' for key 'PRIMARY'
01F9VG0NT0ACNSHJM2N53XTXJH
01F9VG0NT0ACNSHJM2N8BWJXPF
01F9VG0NT0ACNSHJM2N8QXRBDM
01F9VG0NT0ACNSHJM2N53XTXJH
Error 1062: Duplicate entry '01F9VG0NT0ACNSHJM2N53XTXJH' for key 'PRIMARY'
01F9VG0NT0ACNSHJM2SMR13HZ2
01F9VG0NT0ACNSHJM2SQVZW987
01F9VG0NT0ACNSHJM2SVQJ87QQ
01F9VG0NT0ACNSHJM2SMR13HZ2
Error 1062: Duplicate entry '01F9VG0NT0ACNSHJM2SMR13HZ2' for key 'PRIMARY'
01F9VG0NT0ACNSHJM30062WA6F
01F9VG0NT0ACNSHJM30297Z6DR
01F9VG0NT0ACNSHJM30062WA6F
Error 1062: Duplicate entry '01F9VG0NT0ACNSHJM30062WA6F' for key 'PRIMARY'
01F9VG0NT0ACNSHJM39T0D5B1B
01F9VG0NT0ACNSHJM39ZVD159R
01F9VG0NT0ACNSHJM3A59EMVZC
01F9VG0NT0ACNSHJM39XSN0A84
01F9VG0NT0ACNSHJM3A34FHCAR
01F9VG0NT0ACNSHJM3A5CKZKJT
01F9VG0NT0ACNSHJM39T0D5B1B
Error 1062: Duplicate entry '01F9VG0NT0ACNSHJM39T0D5B1B' for key 'PRIMARY'
01F9VG0NT0ACNSHJM3APRCBXA7
01F9VG0NT0ACNSHJM3ATGM0XHQ
01F9VG0NT0ACNSHJM3APRCBXA7
Error 1062: Duplicate entry '01F9VG0NT0ACNSHJM3APRCBXA7' for key 'PRIMARY'
01F9VG0NT0ACNSHJM3CTT1B09C
01F9VG0NT0ACNSHJM3CZHHK5QT
01F9VG0NT0ACNSHJM3D12FM3C4
01F9VG0NT0ACNSHJM3CXQGM6GM
01F9VG0NT0ACNSHJM3D3H1M9EE
01F9VG0NT0ACNSHJM3D4REME36
01F9VG0NT0ACNSHJM3CTT1B09C
Error 1062: Duplicate entry '01F9VG0NT0ACNSHJM3CTT1B09C' for key 'PRIMARY'
01F9VG0NT0ACNSHJM3DPF00J7M
01F9VG0NT0ACNSHJM3DG1C0KAC
01F9VG0NT0ACNSHJM3DQR5G89S
01F9VG0NT0ACNSHJM3DV3SHQ6M
01F9VG0NT0ACNSHJM3DY6049MW
01F9VG0NT0ACNSHJM3DPF00J7M
Error 1062: Duplicate entry '01F9VG0NT0ACNSHJM3DPF00J7M' for key 'PRIMARY'
01F9VG0NT0ACNSHJM3VYDT1FKJ
01F9VG0NT0ACNSHJM3VYDT1FKJ
Error 1062: Duplicate entry '01F9VG0NT0ACNSHJM3VYDT1FKJ' for key 'PRIMARY'

My code is broken into three parts, as it’s a complex, long-running process:

Stored in a struct:

now := time.Now().UTC()
entropy := ulid.Monotonic(rand.New(rand.NewSource(now.UnixNano())), 0)

I generate the ID (m is a struct):

job.ID = common.GenUlid(m.now, m.entropy)

Func:

func GenUlid(now time.Time, entropy io.Reader) string {
	return ulid.MustNew(ulid.Timestamp(now), entropy).String()
}

Thanks for your time.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 24 (10 by maintainers)

Most upvoted comments

I mean, ULID generation is actually perfectly deterministic. It consists of two parts: 48 bits of time, and 80 bits of “entropy” which is just 10 bytes that you claim are random. If you provide the same time and entropy components you get the same ULID.

The package gives you some helpers to generate these parts.

The ulid.Timestamp helper lets you pass a time.Time in order to get a uint64 that represents that time in Unix milliseconds. If you run ULID generation in a loop, pass time.Now in each loop iteration, and the whole thing runs entirely in the same millisecond, then the time part of every ULID will be the same. If it splits over 2 different milliseconds, then you will have some ULIDs with one time component and some with another. If each loop iteration is expensive (say, because it also has to generate cryptographically secure random data each time), then it’s likely to take longer, and your final set of ULIDs will have many unique time components.

The package models entropy as an io.Reader for convenience, but every time you make a ULID it just reads 10 bytes and uses them. If all you care about is not getting duplicates, then you can just write an io.Reader that returns [0 0 0 0 0 0 0 0 0 1] on the first read, [0 0 0 0 0 0 0 0 0 2] on the second, all the way up to [0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff], and then loops back. This maximizes uniqueness, but it’s of course totally predictable. The package’s Monotonic helper does basically this, but adds in a configurable amount of entropy to avoid the predictability.

So I guess I’m saying there shouldn’t really be any surprises here, and, as far as I know, there are no bugs in the package or anything. It’s just a question of understanding what’s going on and using the tools we give you correctly.

I am confident the problem is not in this package.

e.g.

package main

import (
	"encoding/binary"
	"fmt"
	"sync"
	"sync/atomic"
	"testing"
	"time"

	"github.com/oklog/ulid/v2"
)

type notReallyEntropy uint64

// Read fills the first 8 of the requested bytes with an atomically
// incremented big-endian counter, so no two reads (even concurrent ones)
// ever return the same value; any remaining bytes are left untouched.
func (u *notReallyEntropy) Read(p []byte) (int, error) {
	v := atomic.AddUint64((*uint64)(u), 1)
	binary.BigEndian.PutUint64(p, v)
	return len(p), nil
}

func Test3(_ *testing.T) {
	var entropy notReallyEntropy
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			t := time.Now()
			v, err := ulid.New(ulid.Timestamp(t), &entropy)
			if err != nil {
				fmt.Printf("err: %s\n", err)
			} else {
				fmt.Printf("%s\n", v.String())
			}
			wg.Done()
		}()
	}
	wg.Wait()
}
$ go test -v -run Test3 > out
$ wc -l out ; cat out | sort -u | wc -l
    1004 out
    1004

I had the same problem when using goroutines, and using crypto/rand solved it.

example:

package main

import (
	crand "crypto/rand"
	"fmt"
	"math/rand"
	"sync"
	"testing"
	"time"

	"github.com/oklog/ulid/v2"
)

func Test1(_ *testing.T) {
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			t := time.Now()
			entropy := ulid.Monotonic(rand.New(rand.NewSource(t.UnixNano())), 0)
			v, err := ulid.New(ulid.Timestamp(t), entropy)
			if err != nil {
				fmt.Printf("err: %s\n", err)
			} else {
				fmt.Printf("%s\n", v.String())
			}
			wg.Done()
		}()
	}
	wg.Wait()
}

func Test2(_ *testing.T) {
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			t := time.Now()
			v, err := ulid.New(ulid.Timestamp(t), crand.Reader)
			if err != nil {
				fmt.Printf("err: %s\n", err)
			} else {
				fmt.Printf("%s\n", v.String())
			}
			wg.Done()
		}()
	}
	wg.Wait()
}

Running Test1 generates duplicate IDs (I found this by running sort -u); Test2 does not.

$ go test -v -run Test1 > out
$ wc -l out; cat out | sort -u | wc -l
    1004 out
     849

$ go test -v -run Test2 > out
$ wc -l out; cat out | sort -u | wc -l
    1004 out
    1004

Can you share the code that causes the duplicates? I also don’t see any duplicates when generating 10k IDs.

https://play.golang.org/p/GZotlS37bJf

@giautm

c3i1og23q5669c1odvgg is xid: https://github.com/rs/xid

@paulm17 Just in case you didn’t notice this: I’m not the repo owner, so don’t take my “I can’t reproduce this” as the owner’s “I won’t fix this”. So, I’m not sure it makes sense to close this if you are still seeing this weird behaviour. A short code example that reproduces the issue, at least on your system, would be really nice to have.