irods: VERIFY_CHKSUM_KW failure leaves a current data object in a bad state
- master
- 4-2-stable
Bug Report
iRODS Version, OS and Version
4.2.7, Linux, Ubuntu bionic
What did you try to do?
rcDataObjPut a file whose MD5 is 6a57b28b63eede75341f2e06b428cf7a to a replication resource using VERIFY_CHKSUM_KW, while intentionally passing an incorrect checksum of "My hovercraft is full of eels".
Expected behavior
The put should fail with USER_CHKSUM_MISMATCH, and the data object should either not be created or be created in a stale state.
Observed behavior (including steps to reproduce, if applicable)
The put fails with USER_CHKSUM_MISMATCH, but a data object is still created with no checksum and a size of 0 bytes. The data object is marked as current (shown as & in ils -L output).
Moreover, any client subsequently running ichksum on the data object will generate a checksum of 6a57b28b63eede75341f2e06b428cf7a (MD5, or the equivalent for other schemes), rather than the checksum of a file of size 0 bytes.
This leaves a data object having “no contents” while paradoxically having a correct checksum that doesn’t match the retrievable data.
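The inconsistency described above can be detected from catalog values alone. The following is a minimal sketch, not part of the iRODS API: the function name zero_size_checksum_mismatch is hypothetical, and it assumes the checksum is stored as a plain hexadecimal MD5 string, as in the ils -L output in this report.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* MD5 digest of zero bytes of input. */
#define MD5_OF_EMPTY_INPUT "d41d8cd98f00b204e9800998ecf8427e"

/*
 * Hypothetical helper (not an iRODS function): returns true when a replica
 * is recorded with size 0 but carries an MD5 checksum that cannot belong
 * to empty content. A missing checksum is not treated as a mismatch,
 * because there is nothing to compare.
 */
bool zero_size_checksum_mismatch(long long catalog_size, const char *catalog_md5) {
    if (catalog_size != 0) {
        return false; /* only the zero-size case is examined here */
    }
    if (catalog_md5 == NULL || catalog_md5[0] == '\0') {
        return false; /* no checksum recorded */
    }
    return strcmp(catalog_md5, MD5_OF_EMPTY_INPUT) != 0;
}
```

Applied to the replica in this report after ichksum has run (size 0, checksum 6a57b28b63eede75341f2e06b428cf7a), the helper returns true.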
Data file:
C program:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <rodsClient.h>
/*
Tested on iRODS 4.2.7
This program puts a new data object using rcDataObjPut with the
VERIFY_CHKSUM_KW parameter set to an intentionally incorrect
checksum. This simulates a checksum verification failure.
On a replication resource, one replica is left in a current (&) state
having size 0 bytes and no checksum.
I would expect there to be either no data object at all, or for the
data object to be marked stale.
Moreover, any client subsequently running ichksum on the data object
will generate a checksum of 6a57b28b63eede75341f2e06b428cf7a (MD5, or
equivalent for other schemes), rather than the checksum of a file of
size 0 bytes.
This leaves a data object having "no contents" while paradoxically
having a correct checksum that doesn't match the retrievable data.
*/
int main(int argc, char *argv[]) {
    setenv(SP_OPTION, "verify-repro", 1);

    char *local_file = "eels.txt";
    // Real checksum is "6a57b28b63eede75341f2e06b428cf7a";
    char *local_checksum = "My hovercraft is full of eels";
    char *remote_path = "/testZone/home/irods/eels.txt";
    int status;
    rodsEnv env;
    rErrMsg_t errmsg;
    rcComm_t *conn = NULL;

    status = getRodsEnv(&env);
    if (status != 0) {
        fprintf(stderr, "Failed to load iRODS environment\n");
        return 1;
    }

    conn = rcConnect(env.rodsHost, env.rodsPort, env.rodsUserName,
                     env.rodsZone, NO_RECONN, &errmsg);
    if (!conn) {
        fprintf(stderr, "Failed to connect\n");
        return 1;
    }
    fprintf(stderr, "Connected to %s:%d zone %s as %s\n",
            env.rodsHost, env.rodsPort, env.rodsZone,
            env.rodsUserName);

    init_client_api_table();
    status = clientLogin(conn, "", "");
    if (status != 0) {
        fprintf(stderr, "Failed to login\n");
        rcDisconnect(conn);
        return 1;
    }

    dataObjInp_t obj_inp;
    memset(&obj_inp, 0, sizeof obj_inp);
    obj_inp.openFlags = O_WRONLY;
    obj_inp.createMode = 0750;
    snprintf(obj_inp.objPath, MAX_NAME_LEN, "%s", remote_path);
    addKeyVal(&obj_inp.condInput, DEF_RESC_NAME_KW, env.rodsDefResource);
    // Ask the server to verify the uploaded data against this (wrong) checksum
    addKeyVal(&obj_inp.condInput, VERIFY_CHKSUM_KW, local_checksum);

    status = rcDataObjPut(conn, &obj_inp, local_file);
    if (status != 0) {
        char *err_subname;
        const char *err_name = rodsErrorName(status, &err_subname);
        fprintf(stderr, "Failed to put data object: error %d %s\n",
                status, err_name);
        if (status == USER_CHKSUM_MISMATCH) {
            fprintf(stderr, "Got the expected failure\n");
            status = 0;
        } else {
            fprintf(stderr, "Did not reproduce the expected failure\n");
        }
    }

    rcDisconnect(conn);
    if (status == 0) {
        return status;
    }
    return 1;
}
Makefile:
CC = gcc
CFLAGS = -g -std=gnu99 -Wall -Werror
LIBS = -lirods_client -lirods_common -lstdc++ -lssl -lcrypto -lgssapi_krb5 -lpthread -ldl -lm -lrt -ljansson
PREFIX = /opt/irods
CPPFLAGS = -I$(PREFIX)/include -I$(PREFIX)/include/irods
LDFLAGS = -L$(PREFIX)/lib -L$(PREFIX)/lib/irods/externals
all: verify-repro
verify-repro: verify-repro.c
	$(CC) $(CFLAGS) $(CPPFLAGS) -o verify-repro verify-repro.c $(LDFLAGS) $(LIBS)
test: verify-repro
	./verify-repro
clean:
	rm -f verify-repro
.PHONY: all test clean
Example:
$ make -f Makefile.repro test PREFIX=/home/keith/.local/miniconda/envs/baton-dev
gcc -I/home/keith/.local/miniconda/envs/baton-dev/include -I/home/keith/.local/miniconda/envs/baton-dev/include/irods -o verify-repro verify-repro.c -L/home/keith/.local/miniconda/envs/baton-dev/lib -L/home/keith/.local/miniconda/envs/baton-dev/lib/irods/externals -lirods_client -lirods_common -lstdc++ -lssl -lcrypto -lgssapi_krb5 -lpthread -ldl -lm -lrt -ljansson
./verify-repro
Connected to localhost:1247 zone testZone as irods
Failed to put data object: error -314000 USER_CHKSUM_MISMATCH
Got the expected failure
$ ils -L
/testZone/home/irods:
irods 0 replResc;unixfs2 0 2021-02-08.14:58 & eels.txt
generic /var/lib/irods/iRODS/VaultRy/home/irods/eels.txt
$ ichksum eels.txt
eels.txt 6a57b28b63eede75341f2e06b428cf7a
Total checksum performed = 1, Failed checksum = 0
$ ils -L
/testZone/home/irods:
irods 0 replResc;unixfs2 0 2021-02-08.14:58 & eels.txt
6a57b28b63eede75341f2e06b428cf7a generic /var/lib/irods/iRODS/VaultRy/home/irods/eels.txt
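For anyone scripting a check for this state, the replica line of the ils -L output above can be parsed as sketched below. This is only an illustration under stated assumptions: the ils_replica struct and both function names are hypothetical, the field order is taken from the 4.2.7 output shown in this report, the status column is assumed to be either & or absent, and data object names containing spaces are not handled.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for this sketch; not an iRODS type. */
typedef struct {
    char owner[64];
    int replica_number;
    char resource[128];   /* resource hierarchy, e.g. "replResc;unixfs2" */
    long long size;       /* size recorded in the catalog */
    char modified[32];
    bool current;         /* true when the "&" marker is present */
    char name[256];
} ils_replica;

/* Parse the first line of an `ils -L` replica entry. Returns true on success. */
bool parse_ils_replica_line(const char *line, ils_replica *out) {
    char flag_or_name[256];
    memset(out, 0, sizeof *out);
    int n = sscanf(line, "%63s %d %127s %lld %31s %255s %255s",
                   out->owner, &out->replica_number, out->resource,
                   &out->size, out->modified, flag_or_name, out->name);
    if (n == 7 && strcmp(flag_or_name, "&") == 0) {
        out->current = true;
        return true;
    }
    if (n == 6) {
        /* No status marker: the sixth token is the data object name. */
        out->current = false;
        snprintf(out->name, sizeof out->name, "%s", flag_or_name);
        return true;
    }
    return false;
}

/* True for the state reported here: marked current, recorded size 0. */
bool is_current_and_empty(const ils_replica *r) {
    return r->current && r->size == 0;
}
```

A current, zero-size replica is not wrong in itself, since empty files are legitimate. The check only narrows down candidates that are worth comparing against the source file.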
About this issue
- State: closed
- Created 3 years ago
- Comments: 24 (12 by maintainers)
Commits related to this issue
- [#5400] rsDataObjPut always honors the VERIFY_CHKSUM_KW keyword. — committed to korydraughn/irods by korydraughn 3 years ago
- [#5400] rsDataObjPut always honors the VERIFY_CHKSUM_KW keyword. — committed to irods/irods by korydraughn 3 years ago
This is quite possible. Let’s confirm/deny that and leave this issue to address this use case.
We’ll open another issue for the “ichksum should only be calculating against the size of bytes in the catalog, not the whole file on disk” discussion.
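To make the distinction in that follow-up concrete, here is a minimal sketch of digesting only the number of bytes recorded in the catalog rather than the whole file on disk. It is not how ichksum is implemented. The function name is hypothetical, and FNV-1a is used purely as a short stand-in digest so the example stays self-contained, whereas iRODS itself uses MD5 or SHA-256.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: digest at most `limit` bytes from the current
 * position of `fp`. FNV-1a (64-bit) stands in for a real checksum scheme.
 * With limit set to the catalog size, bytes on disk beyond that size
 * do not influence the result.
 */
uint64_t fnv1a_first_bytes(FILE *fp, long long limit) {
    uint64_t hash = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
    long long consumed = 0;
    int c;
    while (consumed < limit && (c = fgetc(fp)) != EOF) {
        hash ^= (uint64_t)(unsigned char)c;
        hash *= 0x100000001b3ULL;          /* FNV-1a 64-bit prime */
        consumed++;
    }
    return hash;
}
```

With a catalog size of 0, as in this report, the helper reads nothing and returns the digest of empty input, whatever the file on disk contains.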