fclones: macOS - fclones does not preserve extended attributes

When deduplicating (reflink), fclones does not preserve each file's extended attributes; every deduplicated file simply ends up with the attributes of the first file.

Example:

I created three identical files:

# find ~ > hello.txt

# cp hello.txt hello1.txt
# cp hello1.txt hello2.txt

# ls
hello.txt  hello1.txt  hello2.txt

# shasum *
750c7735502f7c6072d8b4c9239697302d393480  hello.txt
750c7735502f7c6072d8b4c9239697302d393480  hello1.txt
750c7735502f7c6072d8b4c9239697302d393480  hello2.txt

Then I added sample extended attributes to two of them:

# xattr -w test "" hello.txt
# xattr -w test2 "" hello1.txt

# xattr -l *
hello.txt: test:
hello1.txt: test2:

and then deduplicated using fclones:

# fclones group . | fclones dedupe
[2022-08-31 14:31:39.160] fclones:  info: Started grouping
[2022-08-31 14:31:39.173] fclones:  info: Scanned 7 file entries
[2022-08-31 14:31:39.174] fclones:  info: Found 3 (148.3 MB) files matching selection criteria
[2022-08-31 14:31:39.174] fclones:  info: Found 2 (98.9 MB) candidates after grouping by size
[2022-08-31 14:31:39.174] fclones:  info: Found 2 (98.9 MB) candidates after grouping by paths
[2022-08-31 14:31:39.177] fclones:  info: Found 2 (98.9 MB) candidates after grouping by prefix
[2022-08-31 14:31:39.178] fclones:  info: Found 2 (98.9 MB) candidates after grouping by suffix
[2022-08-31 14:31:39.195] fclones:  info: Found 2 (98.9 MB) redundant files
[2022-08-31 14:31:39.202] fclones:  info: Started deduplicating
[2022-08-31 14:31:39.377] fclones:  info: Processed 2 files and reclaimed up to 98.9 MB space

As expected, fclones found 2 duplicates and deduplicated them, but it overwrote their extended attributes with those of the first file:

# xattr -l *
hello.txt: test:
hello1.txt: test:
hello2.txt: test:

I guess fclones just does the equivalent of cp -c sourceFile destinationFile, creating a clone of the source file, which unfortunately is not enough. The destination's metadata should not be changed.

Extended attributes should be preserved the same way file names are; names are just basic attributes, after all :).

Done manually, the right way would be:

mv destinationFile tempFile
cp -ca sourceFile destinationFile
gcp --preserve=all --attributes-only tempFile destinationFile
rm tempFile

Unfortunately, this requires GNU cp (on macOS it can be installed with Homebrew: brew install coreutils). I am sure there are smarter ways to achieve the same :) What matters here is the result.
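The four manual steps can be wrapped in a small shell function with rollback on failure. This is only a sketch of the manual workaround, not how fclones works internally. The names relink_keep_attrs, CLONE_CP and ATTR_CP are hypothetical, and the defaults assume macOS cp for cloning and a GNU cp installed as gcp for copying attributes:

```shell
#!/bin/sh
# Sketch only: replace DST with a clone of SRC while keeping DST's own metadata.
# relink_keep_attrs, CLONE_CP and ATTR_CP are hypothetical names, not part of fclones.

# Defaults follow the commands in the post: macOS cp for cloning, GNU cp (gcp)
# for copying attributes. Both can be overridden from the environment.
: "${CLONE_CP:=cp -ca}"
: "${ATTR_CP:=gcp}"

relink_keep_attrs() {
    src=$1
    dst=$2
    tmp="$dst.relink-tmp.$$"   # temporary name next to dst, so mv stays on one volume

    # 1. Move the duplicate aside so its metadata survives.
    mv "$dst" "$tmp" || return 1

    # 2. Clone the source into the duplicate's place ($CLONE_CP is split on purpose).
    if ! $CLONE_CP "$src" "$dst"; then
        mv -f "$tmp" "$dst"     # roll back
        return 1
    fi

    # 3. Copy only the attributes from the saved duplicate back onto the clone.
    if ! $ATTR_CP --preserve=all --attributes-only "$tmp" "$dst"; then
        rm -f "$dst"
        mv -f "$tmp" "$dst"     # roll back
        return 1
    fi

    # 4. Drop the saved copy.
    rm -f "$tmp"
}
```

With the files from the example it would be called as relink_keep_attrs hello.txt hello1.txt. The sequence is not atomic: if the process is interrupted between the steps, the duplicate is left under its temporary name.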

I also tried another deduplicator (jdupes) on the same dataset. Result:

# jdupes --dedupe -r .
Scanning: 4 files, 1 items (in 1 specified)
[SRC] ./hello.txt
-##-> ./hello1.txt
-##-> ./hello2.txt

# xattr -l *
hello.txt: test:
hello1.txt: test2:

So this is definitely doable.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 26 (26 by maintainers)

Most upvoted comments

I can use specific C APIs from Apple SDK in Rust.