openssl: PEM/DER decoding of PKCS8 RSA private keys are 80 times slower in 3.0

We have two sample programs to parse PKCS8 RSA private keys. First to parse one in PEM format:

#include <openssl/evp.h>
#include <openssl/x509.h>
#include <openssl/pem.h>

static char data[] = "-----BEGIN PRIVATE KEY-----\nMIIJQQIBADANBgkqhkiG9w0BAQEFAASCCSswggknAgEAAoICAQCqD9qzDx5C7niP8lvd8uXwQNHL\nJjakRyEzfbV5FJtJujgOXjF5POm7WICAgtdrct+g5v/SfYvgNkbs06E7th1zFY7cmEimJOIVuDsG\nlMUySlYioxW9QvTERgq3e9NONKYPEabGwMzh6ZYjHYSaKZUDAMlV58XhXLIf4+AAyVoSF7jihbs5\nGb+noksQ4alKxiJDKr/9ijfC+HMj4VQEc41Mib78+ImXMRzPuCj2IPSorY4svMsWLgjm8+4FV/Ps\nSkd2zOeSffeUm6Oa4F0rWnmOyRk6WBKIv6OW2mj3OTh6f7VOBy3tn9xUkY+wFcY6OrFfwgbH2kJ+\n5aibj00mrL4FXLLVKbYp5YAZFvEvujKuAX2M7ZocS95XHwfbQK3ol6wCNK75NLOTroVQtlH/5gcN\niJtxTiW81hwY17IhPwVBhq1ynb+9YEzNoUuqfsEvhtHeRLdU/YYOiXXQVESCYXgQ5ApFRExfVu1t\n8a/MibNIPM+nJIcJh80BNbl2fL9ShMJXuBgRz/Yyp//1NNoC8/MkO/vAaZBCrStOZStZLRyi0Aye\nvvw0a7IIS82h4R6cjte1Vfnd+eX4MOd98a/AHJ/4ecp9lIySQyVf5NrcGTz1A5LtyZdi4teGwBl+\nX2qp4RMu3uJE5+T1xBGRO0wdqJeGLw9jCck9wHP4d6DxUR7J+wIDAQABAoICAA1ytIItY2C6l+kW\nKsKZ5yoBDjYI3xBNmagHPFcHVKJXagBk3XevY/JPNNY0wpE6I8oHClrcV7fSwvgOYjUlGR4VKddy\n6WfOCdza1TwXfqKT80zI5bqyNUXiHg3VamfITQtrA2u7KliBDsDXIDnKqQB0SveSnPjNyj4wWHHn\nekps+s9a8Ou6iAfbEyGSHr+NfH8gPc9wYyl1WTGQq4Kwmo9fYy2A/+xnU1Zvwzl3cLF4DAKoqUyn\nNkgBKSTeCCl61DzmRje042OqjRz9uhBoFq2+ZFLTTR/oO6j9u4g1S6yQNcemVLDyT8uWOS0dA7Bu\nHMGsR7n9Hf4H7jXi9qBkz5/e5mDUHp79ccDUW3gTHDz0Mw15A9JXJd7R1XNkCvNmrTMN9eCDdQRW\nrPMryIX9+ljVswxjrSnSnTOa69Xmc/z90LEgCgYmUhyiax+F1T0AWycMwCdz67BO+rLYuEbcOBo0\nh35uVUeNwrerpFt7kUHBMBRtxkozoAvy1eZEAKifIcceeEkZGQDlzbUjXSAXaSWa3XQFq+lCu5Kn\n88tgjEDSLU8jVIfxdCO3y7pSCqzf02T9f91tb4f3zi/nxc19Ksc5tbC1vj1hMeOKLdDT02r8ygkr\n4Mrfc6AaXQWYpRapx3OA8tNydCnUW1LXq7rQsC12JYn0IKAa5bFx7Ecmi/2JAoIBAQDIZ0es+2D/\nbLVBH6DOGP/eAZzeILwbMOvjm1hB/us6zwTY4MIKWGiA1dkWF4ILNSaGLIbZ+sO6tLzL0LtkynqG\nuWm/25YY8sccwCA6cy8AQ9adN2zJUyWlD+O6tJ0PRim6nV5lnKAmgzSBX6BcTT7GN3BaB/u2JlBq\ncf2F2gkz2eyyOHPKvmT/DoEeTaTSDyAuBAN6lrgTfFI8cszPkMky5dlFz+9kZEC/GKJ33cgYTWxi\nMhKxN+TwIBulghM/LgZ2atgQ0VdRTkqfcSrG+i3XkzhQHc8VAVHzfkHqe1kOfv4ZvFy5QhTOAF0L\nk+sM8P7MIy+PMexpcPqXY89maBQpAoIBAQDZPbXR6tFkTDt/QqGjytbk7CjDzt1KIZPZ4EmFkeXL\nSQjC7NfJu+w8tmpG9V8hN/dCQo2NNaSTNwnX/eikOC/OJuqf11mr3JV+81N82fOz0c6nqZaMBQ7Y\ncZcr9/77pzmACIEe3hY5g+oKO54ENt8h/FYEYPjtn01WyKpXciLFo1Lt/qjxYUwNPRVKrICvDjmN\nCeWhZmpvjl0fyFB6tXLlcAL3Ocp5rbMzhf2emisnnXwbrPmgzmkJ/Wn0wpgk+8SxKA3eeeWKXWgb\nSMs/bbBEIM6MOQ1un8Seg0fVpSGBvCnVtFkwT27nCv6j7COrYkE6BZxj0K5NzyH0aA4LD9GDAoIB\nADuD6JZnxUO1/hJMGU57wCknY3XYVOTiX3ul280lrqg1aOQbw6Sc4tQ4LhNQge9gJoO8X4QG4+/j\n0xnYcH6bX035bH1s8iOQni9co3WYVYIHo4nnNuiHR+vAT0pYbzhlBumD6M/Wdv1ZA9PUGWSwEA9/\n0V77dfZ/ZGxoU/lXalo6wv+eoky4xHe20AO23VcA5PalfH8AmcQ3rJiFI2wVPJtgBWmlOhwfZdca\nss1UUSNeguyaoFB/H/9sGanKenrN6V9rlaVQ9lSQIrs9OY4EKG8YKqYoZCKB1NuySFMhtK4IauAr\nv4HJLTKMixVwJWMfgxwO6wXktqgNxG4HV0W7bRkCggEAYzixS8BxhNrgrd5UD4h8oDBQ6iYYolw1\nuGSdj/k0OKYR713XrVc8rfovDlvR6E00jLnzBxUCJw8TWuiokiDrjL/vl7P3S+zDBynB7xtpGK9y\nMNffX/KLdkZjYnyxpGUbeSPpPZz4D6r1gVj7cjdRsKcc7oEQERAaddHPI4OI6DYRkYwnw5/J6Z4F\nlIa3e70GgimMDSzG3k7qr7KBN5qacLq5UAvAM9UnLRg832zQ2xYt8kIN/eloxlxNQbKDZRjtHHEL\n7JpGQe0puJSF6GGECYnmbNs+DFHCrxeM/sKeTDAR936Y4dzV7YbzCRG4tPV6jzKy3FAa3IUHoCbK\nizjdWwKCAQBc7bQBV7ERpzvd9Ur5Ra7lIxW+0X9gD2U+nIdc7kt5NtvegOFu+OxlAbx7vfm5mx/m\nU9ugBgoUsProO4H6dLfyBbNpydHJvfkycJ65KiBgtr11Upm+8deeZTlckL3Aj2wQGJBKzKyDjBOw\nIHPCCPEczAFc/J6J3JPGcVK9ZCOAUQ2ZiuUl6TItCOCZJ2rpRJS8mpD3Am8dan2fz2UZSN7Xhnvl\nMTip8/M7UmYn9W9IaOybqa8STXBetJExkI2UsdQgZWXyuYXddCcVqPsrHe43leHUZKJq+H8OBdUU\nRCQED9fnCZSloFH2pGE9tQzlbktWmFmqsJib+X70SfLtkpRe\n-----END PRIVATE KEY-----\n";

int main(void)
{
  BIO* bio = BIO_new_mem_buf(data, -1);

  for (int i = 0; i < 1000; ++i) {
    BIO_reset(bio);
    EVP_PKEY* pkey = PEM_read_bio_PrivateKey(bio, NULL, NULL, NULL);
    EVP_PKEY_free(pkey);
  }

  BIO_free(bio);
}

With OpenSSL 1.1.1k, it takes about 0.017s to execute, while with master (6ef2f71ac70aff99da277be4a554e3b1fe739050) it takes about 1.4s.

Next one to decode a DER formatted private key:

#include <openssl/evp.h>
#include <openssl/x509.h>
#include <openssl/pem.h>

static char data[] = "\x30\x82\x09\x41\x02\x01\x00\x30\x0d\x06\x09\x2a\x86\x48\x86\xf7\x0d\x01\x01\x01\x05\x00\x04\x82\x09\x2b\x30\x82\x09\x27\x02\x01\x00\x02\x82\x02\x01\x00\xaa\x0f\xda\xb3\x0f\x1e\x42\xee\x78\x8f\xf2\x5b\xdd\xf2\xe5\xf0\x40\xd1\xcb\x26\x36\xa4\x47\x21\x33\x7d\xb5\x79\x14\x9b\x49\xba\x38\x0e\x5e\x31\x79\x3c\xe9\xbb\x58\x80\x80\x82\xd7\x6b\x72\xdf\xa0\xe6\xff\xd2\x7d\x8b\xe0\x36\x46\xec\xd3\xa1\x3b\xb6\x1d\x73\x15\x8e\xdc\x98\x48\xa6\x24\xe2\x15\xb8\x3b\x06\x94\xc5\x32\x4a\x56\x22\xa3\x15\xbd\x42\xf4\xc4\x46\x0a\xb7\x7b\xd3\x4e\x34\xa6\x0f\x11\xa6\xc6\xc0\xcc\xe1\xe9\x96\x23\x1d\x84\x9a\x29\x95\x03\x00\xc9\x55\xe7\xc5\xe1\x5c\xb2\x1f\xe3\xe0\x00\xc9\x5a\x12\x17\xb8\xe2\x85\xbb\x39\x19\xbf\xa7\xa2\x4b\x10\xe1\xa9\x4a\xc6\x22\x43\x2a\xbf\xfd\x8a\x37\xc2\xf8\x73\x23\xe1\x54\x04\x73\x8d\x4c\x89\xbe\xfc\xf8\x89\x97\x31\x1c\xcf\xb8\x28\xf6\x20\xf4\xa8\xad\x8e\x2c\xbc\xcb\x16\x2e\x08\xe6\xf3\xee\x05\x57\xf3\xec\x4a\x47\x76\xcc\xe7\x92\x7d\xf7\x94\x9b\xa3\x9a\xe0\x5d\x2b\x5a\x79\x8e\xc9\x19\x3a\x58\x12\x88\xbf\xa3\x96\xda\x68\xf7\x39\x38\x7a\x7f\xb5\x4e\x07\x2d\xed\x9f\xdc\x54\x91\x8f\xb0\x15\xc6\x3a\x3a\xb1\x5f\xc2\x06\xc7\xda\x42\x7e\xe5\xa8\x9b\x8f\x4d\x26\xac\xbe\x05\x5c\xb2\xd5\x29\xb6\x29\xe5\x80\x19\x16\xf1\x2f\xba\x32\xae\x01\x7d\x8c\xed\x9a\x1c\x4b\xde\x57\x1f\x07\xdb\x40\xad\xe8\x97\xac\x02\x34\xae\xf9\x34\xb3\x93\xae\x85\x50\xb6\x51\xff\xe6\x07\x0d\x88\x9b\x71\x4e\x25\xbc\xd6\x1c\x18\xd7\xb2\x21\x3f\x05\x41\x86\xad\x72\x9d\xbf\xbd\x60\x4c\xcd\xa1\x4b\xaa\x7e\xc1\x2f\x86\xd1\xde\x44\xb7\x54\xfd\x86\x0e\x89\x75\xd0\x54\x44\x82\x61\x78\x10\xe4\x0a\x45\x44\x4c\x5f\x56\xed\x6d\xf1\xaf\xcc\x89\xb3\x48\x3c\xcf\xa7\x24\x87\x09\x87\xcd\x01\x35\xb9\x76\x7c\xbf\x52\x84\xc2\x57\xb8\x18\x11\xcf\xf6\x32\xa7\xff\xf5\x34\xda\x02\xf3\xf3\x24\x3b\xfb\xc0\x69\x90\x42\xad\x2b\x4e\x65\x2b\x59\x2d\x1c\xa2\xd0\x0c\x9e\xbe\xfc\x34\x6b\xb2\x08\x4b\xcd\xa1\xe1\x1e\x9c\x8e\xd7\xb5\x55\xf9\xdd\xf9\xe5\xf8\x30\xe7\x7d\xf1\xaf\xc0\x1c\x9f\xf8\x79\xca\x7d\x94\x8c\x92\x43\x25\x5f\xe4\xda\xdc\x19\x3c\xf5\x03\x92\xed\xc9\x97\x62\xe2\xd7\x86\xc0\x19\x7e\x5f\x6a\xa9\xe1\x13\x2e\xde\xe2\x44\xe7\xe4\xf5\xc4\x11\x91\x3b\x4c\x1d\xa8\x97\x86\x2f\x0f\x63\x09\xc9\x3d\xc0\x73\xf8\x77\xa0\xf1\x51\x1e\xc9\xfb\x02\x03\x01\x00\x01\x02\x82\x02\x00\x0d\x72\xb4\x82\x2d\x63\x60\xba\x97\xe9\x16\x2a\xc2\x99\xe7\x2a\x01\x0e\x36\x08\xdf\x10\x4d\x99\xa8\x07\x3c\x57\x07\x54\xa2\x57\x6a\x00\x64\xdd\x77\xaf\x63\xf2\x4f\x34\xd6\x34\xc2\x91\x3a\x23\xca\x07\x0a\x5a\xdc\x57\xb7\xd2\xc2\xf8\x0e\x62\x35\x25\x19\x1e\x15\x29\xd7\x72\xe9\x67\xce\x09\xdc\xda\xd5\x3c\x17\x7e\xa2\x93\xf3\x4c\xc8\xe5\xba\xb2\x35\x45\xe2\x1e\x0d\xd5\x6a\x67\xc8\x4d\x0b\x6b\x03\x6b\xbb\x2a\x58\x81\x0e\xc0\xd7\x20\x39\xca\xa9\x00\x74\x4a\xf7\x92\x9c\xf8\xcd\xca\x3e\x30\x58\x71\xe7\x7a\x4a\x6c\xfa\xcf\x5a\xf0\xeb\xba\x88\x07\xdb\x13\x21\x92\x1e\xbf\x8d\x7c\x7f\x20\x3d\xcf\x70\x63\x29\x75\x59\x31\x90\xab\x82\xb0\x9a\x8f\x5f\x63\x2d\x80\xff\xec\x67\x53\x56\x6f\xc3\x39\x77\x70\xb1\x78\x0c\x02\xa8\xa9\x4c\xa7\x36\x48\x01\x29\x24\xde\x08\x29\x7a\xd4\x3c\xe6\x46\x37\xb4\xe3\x63\xaa\x8d\x1c\xfd\xba\x10\x68\x16\xad\xbe\x64\x52\xd3\x4d\x1f\xe8\x3b\xa8\xfd\xbb\x88\x35\x4b\xac\x90\x35\xc7\xa6\x54\xb0\xf2\x4f\xcb\x96\x39\x2d\x1d\x03\xb0\x6e\x1c\xc1\xac\x47\xb9\xfd\x1d\xfe\x07\xee\x35\xe2\xf6\xa0\x64\xcf\x9f\xde\xe6\x60\xd4\x1e\x9e\xfd\x71\xc0\xd4\x5b\x78\x13\x1c\x3c\xf4\x33\x0d\x79\x03\xd2\x57\x25\xde\xd1\xd5\x73\x64\x0a\xf3\x66\xad\x33\x0d\xf5\xe0\x83\x75\x04\x56\xac\xf3\x2b\xc8\x85\xfd\xfa\x58\xd5\xb3\x0c\x63\xad\x29\xd2\x9d\x33\x9a\xeb\xd5\xe6\x73\xfc\xfd\xd0\xb1\x20\x0a\x06\x26\x52\x1c\xa2\x6b\x1f\x85\xd5\x3d\x00\x5b\x27\x0c\xc0\x27\x73\xeb\xb0\x4e\xfa\xb2\xd8\xb8\x46\xdc\x38\x1a\x34\x87\x7e\x6e\x55\x47\x8d\xc2\xb7\xab\xa4\x5b\x7b\x91\x41\xc1\x30\x14\x6d\xc6\x4a\x33\xa0\x0b\xf2\xd5\xe6\x44\x00\xa8\x9f\x21\xc7\x1e\x78\x49\x19\x19\x00\xe5\xcd\xb5\x23\x5d\x20\x17\x69\x25\x9a\xdd\x74\x05\xab\xe9\x42\xbb\x92\xa7\xf3\xcb\x60\x8c\x40\xd2\x2d\x4f\x23\x54\x87\xf1\x74\x23\xb7\xcb\xba\x52\x0a\xac\xdf\xd3\x64\xfd\x7f\xdd\x6d\x6f\x87\xf7\xce\x2f\xe7\xc5\xcd\x7d\x2a\xc7\x39\xb5\xb0\xb5\xbe\x3d\x61\x31\xe3\x8a\x2d\xd0\xd3\xd3\x6a\xfc\xca\x09\x2b\xe0\xca\xdf\x73\xa0\x1a\x5d\x05\x98\xa5\x16\xa9\xc7\x73\x80\xf2\xd3\x72\x74\x29\xd4\x5b\x52\xd7\xab\xba\xd0\xb0\x2d\x76\x25\x89\xf4\x20\xa0\x1a\xe5\xb1\x71\xec\x47\x26\x8b\xfd\x89\x02\x82\x01\x01\x00\xc8\x67\x47\xac\xfb\x60\xff\x6c\xb5\x41\x1f\xa0\xce\x18\xff\xde\x01\x9c\xde\x20\xbc\x1b\x30\xeb\xe3\x9b\x58\x41\xfe\xeb\x3a\xcf\x04\xd8\xe0\xc2\x0a\x58\x68\x80\xd5\xd9\x16\x17\x82\x0b\x35\x26\x86\x2c\x86\xd9\xfa\xc3\xba\xb4\xbc\xcb\xd0\xbb\x64\xca\x7a\x86\xb9\x69\xbf\xdb\x96\x18\xf2\xc7\x1c\xc0\x20\x3a\x73\x2f\x00\x43\xd6\x9d\x37\x6c\xc9\x53\x25\xa5\x0f\xe3\xba\xb4\x9d\x0f\x46\x29\xba\x9d\x5e\x65\x9c\xa0\x26\x83\x34\x81\x5f\xa0\x5c\x4d\x3e\xc6\x37\x70\x5a\x07\xfb\xb6\x26\x50\x6a\x71\xfd\x85\xda\x09\x33\xd9\xec\xb2\x38\x73\xca\xbe\x64\xff\x0e\x81\x1e\x4d\xa4\xd2\x0f\x20\x2e\x04\x03\x7a\x96\xb8\x13\x7c\x52\x3c\x72\xcc\xcf\x90\xc9\x32\xe5\xd9\x45\xcf\xef\x64\x64\x40\xbf\x18\xa2\x77\xdd\xc8\x18\x4d\x6c\x62\x32\x12\xb1\x37\xe4\xf0\x20\x1b\xa5\x82\x13\x3f\x2e\x06\x76\x6a\xd8\x10\xd1\x57\x51\x4e\x4a\x9f\x71\x2a\xc6\xfa\x2d\xd7\x93\x38\x50\x1d\xcf\x15\x01\x51\xf3\x7e\x41\xea\x7b\x59\x0e\x7e\xfe\x19\xbc\x5c\xb9\x42\x14\xce\x00\x5d\x0b\x93\xeb\x0c\xf0\xfe\xcc\x23\x2f\x8f\x31\xec\x69\x70\xfa\x97\x63\xcf\x66\x68\x14\x29\x02\x82\x01\x01\x00\xd9\x3d\xb5\xd1\xea\xd1\x64\x4c\x3b\x7f\x42\xa1\xa3\xca\xd6\xe4\xec\x28\xc3\xce\xdd\x4a\x21\x93\xd9\xe0\x49\x85\x91\xe5\xcb\x49\x08\xc2\xec\xd7\xc9\xbb\xec\x3c\xb6\x6a\x46\xf5\x5f\x21\x37\xf7\x42\x42\x8d\x8d\x35\xa4\x93\x37\x09\xd7\xfd\xe8\xa4\x38\x2f\xce\x26\xea\x9f\xd7\x59\xab\xdc\x95\x7e\xf3\x53\x7c\xd9\xf3\xb3\xd1\xce\xa7\xa9\x96\x8c\x05\x0e\xd8\x71\x97\x2b\xf7\xfe\xfb\xa7\x39\x80\x08\x81\x1e\xde\x16\x39\x83\xea\x0a\x3b\x9e\x04\x36\xdf\x21\xfc\x56\x04\x60\xf8\xed\x9f\x4d\x56\xc8\xaa\x57\x72\x22\xc5\xa3\x52\xed\xfe\xa8\xf1\x61\x4c\x0d\x3d\x15\x4a\xac\x80\xaf\x0e\x39\x8d\x09\xe5\xa1\x66\x6a\x6f\x8e\x5d\x1f\xc8\x50\x7a\xb5\x72\xe5\x70\x02\xf7\x39\xca\x79\xad\xb3\x33\x85\xfd\x9e\x9a\x2b\x27\x9d\x7c\x1b\xac\xf9\xa0\xce\x69\x09\xfd\x69\xf4\xc2\x98\x24\xfb\xc4\xb1\x28\x0d\xde\x79\xe5\x8a\x5d\x68\x1b\x48\xcb\x3f\x6d\xb0\x44\x20\xce\x8c\x39\x0d\x6e\x9f\xc4\x9e\x83\x47\xd5\xa5\x21\x81\xbc\x29\xd5\xb4\x59\x30\x4f\x6e\xe7\x0a\xfe\xa3\xec\x23\xab\x62\x41\x3a\x05\x9c\x63\xd0\xae\x4d\xcf\x21\xf4\x68\x0e\x0b\x0f\xd1\x83\x02\x82\x01\x00\x3b\x83\xe8\x96\x67\xc5\x43\xb5\xfe\x12\x4c\x19\x4e\x7b\xc0\x29\x27\x63\x75\xd8\x54\xe4\xe2\x5f\x7b\xa5\xdb\xcd\x25\xae\xa8\x35\x68\xe4\x1b\xc3\xa4\x9c\xe2\xd4\x38\x2e\x13\x50\x81\xef\x60\x26\x83\xbc\x5f\x84\x06\xe3\xef\xe3\xd3\x19\xd8\x70\x7e\x9b\x5f\x4d\xf9\x6c\x7d\x6c\xf2\x23\x90\x9e\x2f\x5c\xa3\x75\x98\x55\x82\x07\xa3\x89\xe7\x36\xe8\x87\x47\xeb\xc0\x4f\x4a\x58\x6f\x38\x65\x06\xe9\x83\xe8\xcf\xd6\x76\xfd\x59\x03\xd3\xd4\x19\x64\xb0\x10\x0f\x7f\xd1\x5e\xfb\x75\xf6\x7f\x64\x6c\x68\x53\xf9\x57\x6a\x5a\x3a\xc2\xff\x9e\xa2\x4c\xb8\xc4\x77\xb6\xd0\x03\xb6\xdd\x57\x00\xe4\xf6\xa5\x7c\x7f\x00\x99\xc4\x37\xac\x98\x85\x23\x6c\x15\x3c\x9b\x60\x05\x69\xa5\x3a\x1c\x1f\x65\xd7\x1a\xb2\xcd\x54\x51\x23\x5e\x82\xec\x9a\xa0\x50\x7f\x1f\xff\x6c\x19\xa9\xca\x7a\x7a\xcd\xe9\x5f\x6b\x95\xa5\x50\xf6\x54\x90\x22\xbb\x3d\x39\x8e\x04\x28\x6f\x18\x2a\xa6\x28\x64\x22\x81\xd4\xdb\xb2\x48\x53\x21\xb4\xae\x08\x6a\xe0\x2b\xbf\x81\xc9\x2d\x32\x8c\x8b\x15\x70\x25\x63\x1f\x83\x1c\x0e\xeb\x05\xe4\xb6\xa8\x0d\xc4\x6e\x07\x57\x45\xbb\x6d\x19\x02\x82\x01\x00\x63\x38\xb1\x4b\xc0\x71\x84\xda\xe0\xad\xde\x54\x0f\x88\x7c\xa0\x30\x50\xea\x26\x18\xa2\x5c\x35\xb8\x64\x9d\x8f\xf9\x34\x38\xa6\x11\xef\x5d\xd7\xad\x57\x3c\xad\xfa\x2f\x0e\x5b\xd1\xe8\x4d\x34\x8c\xb9\xf3\x07\x15\x02\x27\x0f\x13\x5a\xe8\xa8\x92\x20\xeb\x8c\xbf\xef\x97\xb3\xf7\x4b\xec\xc3\x07\x29\xc1\xef\x1b\x69\x18\xaf\x72\x30\xd7\xdf\x5f\xf2\x8b\x76\x46\x63\x62\x7c\xb1\xa4\x65\x1b\x79\x23\xe9\x3d\x9c\xf8\x0f\xaa\xf5\x81\x58\xfb\x72\x37\x51\xb0\xa7\x1c\xee\x81\x10\x11\x10\x1a\x75\xd1\xcf\x23\x83\x88\xe8\x36\x11\x91\x8c\x27\xc3\x9f\xc9\xe9\x9e\x05\x94\x86\xb7\x7b\xbd\x06\x82\x29\x8c\x0d\x2c\xc6\xde\x4e\xea\xaf\xb2\x81\x37\x9a\x9a\x70\xba\xb9\x50\x0b\xc0\x33\xd5\x27\x2d\x18\x3c\xdf\x6c\xd0\xdb\x16\x2d\xf2\x42\x0d\xfd\xe9\x68\xc6\x5c\x4d\x41\xb2\x83\x65\x18\xed\x1c\x71\x0b\xec\x9a\x46\x41\xed\x29\xb8\x94\x85\xe8\x61\x84\x09\x89\xe6\x6c\xdb\x3e\x0c\x51\xc2\xaf\x17\x8c\xfe\xc2\x9e\x4c\x30\x11\xf7\x7e\x98\xe1\xdc\xd5\xed\x86\xf3\x09\x11\xb8\xb4\xf5\x7a\x8f\x32\xb2\xdc\x50\x1a\xdc\x85\x07\xa0\x26\xca\x8b\x38\xdd\x5b\x02\x82\x01\x00\x5c\xed\xb4\x01\x57\xb1\x11\xa7\x3b\xdd\xf5\x4a\xf9\x45\xae\xe5\x23\x15\xbe\xd1\x7f\x60\x0f\x65\x3e\x9c\x87\x5c\xee\x4b\x79\x36\xdb\xde\x80\xe1\x6e\xf8\xec\x65\x01\xbc\x7b\xbd\xf9\xb9\x9b\x1f\xe6\x53\xdb\xa0\x06\x0a\x14\xb0\xfa\xe8\x3b\x81\xfa\x74\xb7\xf2\x05\xb3\x69\xc9\xd1\xc9\xbd\xf9\x32\x70\x9e\xb9\x2a\x20\x60\xb6\xbd\x75\x52\x99\xbe\xf1\xd7\x9e\x65\x39\x5c\x90\xbd\xc0\x8f\x6c\x10\x18\x90\x4a\xcc\xac\x83\x8c\x13\xb0\x20\x73\xc2\x08\xf1\x1c\xcc\x01\x5c\xfc\x9e\x89\xdc\x93\xc6\x71\x52\xbd\x64\x23\x80\x51\x0d\x99\x8a\xe5\x25\xe9\x32\x2d\x08\xe0\x99\x27\x6a\xe9\x44\x94\xbc\x9a\x90\xf7\x02\x6f\x1d\x6a\x7d\x9f\xcf\x65\x19\x48\xde\xd7\x86\x7b\xe5\x31\x38\xa9\xf3\xf3\x3b\x52\x66\x27\xf5\x6f\x48\x68\xec\x9b\xa9\xaf\x12\x4d\x70\x5e\xb4\x91\x31\x90\x8d\x94\xb1\xd4\x20\x65\x65\xf2\xb9\x85\xdd\x74\x27\x15\xa8\xfb\x2b\x1d\xee\x37\x95\xe1\xd4\x64\xa2\x6a\xf8\x7f\x0e\x05\xd5\x14\x44\x24\x04\x0f\xd7\xe7\x09\x94\xa5\xa0\x51\xf6\xa4\x61\x3d\xb5\x0c\xe5\x6e\x4b\x56\x98\x59\xaa\xb0\x98\x9b\xf9\x7e\xf4\x49\xf2\xed\x92\x94\x5e";

int main(void)
{
  BIO* bio = BIO_new_mem_buf(data, 2373);
  PKCS8_PRIV_KEY_INFO* info = d2i_PKCS8_PRIV_KEY_INFO_bio(bio, NULL);
  BIO_free(bio);

  for (int i = 0; i < 1000; ++i) {
    EVP_PKEY* privKey = EVP_PKCS82PKEY(info);
    EVP_PKEY_free(privKey);
  }

  PKCS8_PRIV_KEY_INFO_free(info);
}

This takes about 0.005s with 1.1.1k, while it takes about 0.44s with master. I’m using x64 linux, with Core i7-8700K.

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Comments: 40 (39 by maintainers)

Commits related to this issue

Most upvoted comments

There may be some structural issues to address before doing that. This collect_extra_decoder function seems to called 2000 times in a single private key parsing operation.

Poking around the code in the profile, it seems OpenSSL is quadratic in the number of decoders you have, and there are quite a lot of them. I don’t fully follow what OSSL_DECODER_CTX_add_extra is doing, but it seems OpenSSL now models generic “container” formats like PEM, PrivateKeyInfo, etc., as pluggable conversion functions with input/output formats? And then, before it does anything, it looks like OSSL_DECODER_CTX_add_extra tries to build all possible paths through these conversion functions, matching inputs to outputs?

That first layer of this path-building starts with a set of leaf “decoder instances” and then, for each one (twice, actually), iterates over every “decoder” to find the ones that can chain backwards to that decoder instances input type. That means decoders are tapped on the head 2 * decoder_instances * decoders times. (And then the next layer adds another O(decoders) iterations. I suspect this would also go exponential if the wrong series of container formats were added to the system.)

I couldn’t find any documentation on why this path-builder design was chosen. It is a little odd to me because one shouldn’t need to explore the whole tree of paths to parse one input. I think this comes out of OpenSSL working backwards from all possible outputs, rather than working forwards from the one input you have. The way these container formats are designed (and generally when parsing things), the forwards is the natural direction:

  • Based on PEM_read_PrivateKey being called, we know the input is PEM. PEM is a type string (BEGIN WHATEVER) followed by base64-encoded data.
  • Run the PEM decoder to get out the type string and base64-decode the data. This decoder can be key-independent. It’s just fancy base64.
  • Based on the type string, call the appropriate decoder on the data. This could be made pluggable by allowing providers to register decoders for PEM types
    • In @hannob’s sample, the type is RSA PRIVATE KEY, so you find the provider(s) that support RSA PRIVATE KEY and ask them to parse the data. They should parse an RSAPrivateKey ASN.1 structure and that’s the end of it.
    • In the original report, the type is PRIVATE KEY. That means the type is PKCS#8 PrivateKeyInfo. That too is a generic, provider-independent format that, more-or-less, expands to another (type, data) pair. This time the type is an AlgorithmIdentifier.
      • Call the PKCS#8 PrivateKeyInfo decoder to extract the type and data. This decoder can also be key-independent.
      • Now, based on the type OID in the PrivateKeyInfo, call the appropriate decoder. This likewise can be made pluggable by allowing providers to register decoders for PKCS#8 types.
      • If the type is rsaEncryption, find the provider(s) that support that and ask them to parse the data. They should again parse an RSAPrivateKey ASN.1 structure and that’s the end of it.

At no point do you need to explore every possible chain of decoders. The only operation you need is “I have an input of format X. Call the function that parses X”. Where there need only be a single implementation, that can be a direct function call. Where you wish it to be pluggable, that only needs a simple lookup. A container format decoder is just something that needs to do that dispatch a second time.

@mattcaswell I like your improvement but it adds complexity of the system - and @davidben describes smth that looks to me like a simplification…

Yes, I agree but it’s probably more of a 3.3 thing whereas I think we can get my patch into 3.2. It’s not completely clear to me if we can do what @davidben suggests without breaking changes/new API. It’s at the least a significant refactor. I do think we should do it though.

Regardless of my patch I think @davidben makes good points and we do need to look at the design of the decoder subsystem.

This reminds me that we had/have code that tries all possible formats and key types, even when we already know which format and key type it is. There are just too many calls to those functions.

Incorporating my recent performance fixes, as well as my libctx refactor, this drops from 1.315s in 3.0 to 0.982s (1.34x speedup). 1.1 is still 0.038s, so this remains an issue.

Yes the OSSL_LIB_CTX locking is particularly horrific (as mentioned in #17116) and ossl_lib_ctx_get_data will be a primary path for hitting that code - so its not a big surprise that we’re running into problems as a result.