osxphotos: Exception in Export

Describe the bug

I’m receiving this error when using export:

Exception ignored in: <function ExportDBInMemory.__del__ at 0x105633040>
Traceback (most recent call last):
  File "/Users/eecue/mambaforge/lib/python3.9/site-packages/osxphotos/export_db.py", line 901, in __del__
    self.close()
  File "/Users/eecue/mambaforge/lib/python3.9/site-packages/tenacity/__init__.py", line 326, in wrapped_f
    return self(f, *args, **kw)
  File "/Users/eecue/mambaforge/lib/python3.9/site-packages/tenacity/__init__.py", line 406, in __call__
    do = self.iter(retry_state=retry_state)
  File "/Users/eecue/mambaforge/lib/python3.9/site-packages/tenacity/__init__.py", line 363, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x48a68b790 state=finished raised ProgrammingError>]
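For context on the traceback: tenacity retries the wrapped call and, once all attempts are exhausted, raises a RetryError chained from the last underlying exception (the ProgrammingError above). A minimal stdlib-only sketch of that behavior — `RetryError` and `retry_call` here are illustrative stand-ins, not tenacity's actual API:

```python
class RetryError(Exception):
    """Stand-in for tenacity.RetryError."""
    pass

def retry_call(fn, attempts=3):
    # Retry fn up to `attempts` times; if every attempt fails, raise a
    # RetryError chained from the last exception, as tenacity does.
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
    raise RetryError(f"all {attempts} attempts failed") from last_exc

def flaky():
    # Stand-in for the ProgrammingError raised inside ExportDB close()
    raise RuntimeError("simulated failure")

try:
    retry_call(flaky)
except RetryError as exc:
    print(type(exc.__cause__).__name__)  # prints "RuntimeError"
```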

Note that the exports actually seem to work. Here’s the code:

photo = photosdb.photos(uuid=[uuid])[0]
pe = osxphotos.PhotoExporter(photo)
e = osxphotos.ExportOptions()
e.convert_to_jpeg = True
e.download_missing = True
e.overwrite = True
# e.use_photokit = True
pe.export("/tmp", f"photo_{hash}.jpeg", options=e)

Desktop (please complete the following information):

  • OS: Ventura 13.2.1
  • osxphotos version 0.57.3

About this issue

  • State: open
  • Created a year ago
  • Comments: 48 (34 by maintainers)

Most upvoted comments

@oPromessa There were bugs in 0.59.2, notably a memory leak, that would have caused it to slow down. There were also some unnecessary calls to the AlbumInfo code that were pretty slow, so I’ve refactored those out. Give 0.59.3 a try. In my testing with my 125GB library, exporting to an external Thunderbolt 4 drive, it’s slightly faster than 0.58.2.

Issues found on 0.59.1. Will file specific reports for each one!

  1. The export database jumped from 500MB to 3.9GB in size!
  • Understood; related to the JSON size.
  2. Although my log file presents the profile information (normally printed at the end of the run), the process is still running and the exit code is not caught (which I normally catch and report). Will probably be forced to kill it.
  • Maybe with a smaller JSON and no --profile the run will finish normally. Will test.
  3. Some files have errors while exporting (possibly due to some exiftool error), then re-attempt and succeed… only to generate a delete file at the end (with the --update option). (…)
  • Added more info on #872 in this comment. I’m trying to debug it… I do not yet master the --breakpoint option 😉
  4. (Small issue) The overall counter is not being incremented. Counter is always 0: “counter at 0/850000”.
  • Fixed!
  5. While troubleshooting a specific pic with an export error, query --uuid with CSV output worked correctly, but the json option fails. Using Python 3.11 for this one in particular; for the main export I still use Python 3.9.
  • Should be fixed (a Projects folder in Photos would be the odd case!)

Thanks for the run times; that’s helpful! I’ve implemented a shallow JSON method that is sufficient for export purposes and will add it, which will bring the export db size back down.

Actually, I do have a project in the test suite but wasn’t testing the asdict()/json() output for this. Added a test (currently failing), and now I can work on a fix.

Another option for the json issue is to offer both a “shallow” and a “deep” json method. The deep method (0.59.1) is needed for multithreading / multiprocessing. The shallow method (0.58.1) should be sufficient for export.
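A minimal sketch of the shallow-vs-deep trade-off (the classes and methods below are illustrative stand-ins, not osxphotos code): the deep form recursively expands nested objects so another process can reconstruct them, while the shallow form keeps only references and produces far smaller output.

```python
import json

class Album:
    def __init__(self, title, photo_uuids):
        self.title = title
        self.photo_uuids = photo_uuids

class Photo:
    def __init__(self, uuid, album):
        self.uuid = uuid
        self.album = album

    def json_shallow(self):
        # Keep only a reference to the album: enough for export purposes.
        return json.dumps({"uuid": self.uuid, "album": self.album.title})

    def json_deep(self):
        # Fully expand the nested album: enough to rebuild state in
        # another process, but much larger on disk.
        return json.dumps({
            "uuid": self.uuid,
            "album": {"title": self.album.title,
                      "photos": self.album.photo_uuids},
        })

album = Album("Vacation", [f"uuid-{i}" for i in range(1000)])
p = Photo("uuid-0", album)
print(len(p.json_shallow()) < len(p.json_deep()))  # prints True
```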

@oPromessa Give v0.59.1 a try. I believe it should restore performance to what you’d experienced under 0.57.0. I’ll still need to work on the multi-threaded code / lock files, but I’m putting that on the back burner again as I’ve got several other projects demanding attention.

@oPromessa the lock file code is contained in two utility functions, lock_filename() and unlock_filename(). I can easily add an option for --no-lock or just disable these two for now (make them no-ops but retain the code for when they’re eventually needed). Will take a look at what’s the best option. Interestingly, in my profiling these take basically no time, but I’m running on a fast APFS-formatted SSD, not a NAS. I really wish there were a way I could “mock” the NAS (slow network, flaky copies, etc.) in testing to catch these types of things.

      682    0.001    0.000    0.056    0.000 utils.py:425(lock_filename)
     1048    0.002    0.000    0.055    0.000 utils.py:443(unlock_filename)
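The flag-gated no-op idea described above could look something like the sketch below. The function names come from the comment; the bodies and the `USE_LOCK_FILES` switch (which a hypothetical --no-lock option would clear) are illustrative assumptions, not the osxphotos implementation.

```python
import os

USE_LOCK_FILES = True  # a hypothetical --no-lock option would set this False

def lock_filename(path):
    """Atomically create path.lock; return True if the lock was acquired."""
    if not USE_LOCK_FILES:
        return True  # no-op: pretend the lock was acquired
    try:
        # O_CREAT | O_EXCL fails if the lock file already exists,
        # making creation an atomic test-and-set on most filesystems.
        fd = os.open(path + ".lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False

def unlock_filename(path):
    """Remove path.lock if present; no-op when locking is disabled."""
    if not USE_LOCK_FILES:
        return
    try:
        os.remove(path + ".lock")
    except FileNotFoundError:
        pass
```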

While profiling, I found some other inefficiencies where I can pull code out of a loop to further speed up export. Will implement these.

CLI Performance

Thanks so much. Tonight I’ll run the CLI export with --profile on version 0.58.1 so as to have a baseline.

API Workers

In the API testing I’ve done, one can see an improvement of about 36% for 251 pics. Of course, as you pointed out, the API interface does not have all the other (nice) options: update, delete, exiftool, templates, etc.

Summary results

Pics  Workers  Time (s)  CPU, LAN, NAS           Remarks
251   1        119.4311  iMac i7, LAN 1GB, NAS   No exiftool, templates, etc.
251   40       76.1256   iMac i7, LAN 1GB, NAS   No exiftool, templates, etc.

Workers: 1

2023-04-02 18:38:40.624183 -- Processing database /Users/Shared/Pictures/iPhoto Shared Library.photoslibrary/database/photos.db
2023-04-02 18:38:55.741986 -- Done processing details from Photos library.
2023-04-02 18:38:56.508798 -- Exporting 251 photos to /Volumes/photo/Multi.Photos using 1 workers
2023-04-02 18:40:55.939137 -- Exported 251 photos to /Volumes/photo/Multi.Photos in 119.4311 seconds

Workers: 40 (default)

2023-04-02 18:42:12.651548 -- Processing database /Users/Shared/Pictures/iPhoto Shared Library.photoslibrary/database/Photos.sqlite
2023-04-02 18:42:27.883363 -- Done processing details from Photos library.
2023-04-02 18:42:28.669775 -- Exporting 251 photos to /Volumes/photo/Multi.Photos using 40 workers
2023-04-02 18:43:44.795103 -- Exported 251 photos to /Volumes/photo/Multi.Photos in 76.1256 seconds

I did some benchmarking. On my M1 Mac Mini (8 cores), the sweet spot appears to be 8 thread workers. The Python default is number of processors * 5, but this appears to offer no benefit. This makes sense to me, as a single thread per core allows each thread to saturate its core while reducing the number of thread joins. Using more threads doesn’t get more out of the CPU but increases the thread joins (which, from profiling, appear to consume much of the time). All this testing was done to a fast external SSD connected directly to the Mac Mini via Thunderbolt 4. I did this to include the file transfer time in the test, as exporting to the same SSD uses copy-on-write and is very fast regardless of how many threads are used.
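The benchmark setup above can be sketched as follows: run the same I/O-bound workload under different `max_workers` values and compare wall times. Note the pool's default has varied by Python version (historically cpu_count * 5; min(32, cpu_count + 4) on 3.8+). `fake_export` and `run_with_workers` are illustrative stand-ins, not the osxphotos benchmark code.

```python
import concurrent.futures

def fake_export(i):
    # Stand-in for exporting one photo; the real work is I/O bound.
    return i * 2

def run_with_workers(n_workers, n_tasks=100):
    # One pool per run, mirroring the benchmark: same work, varying max_workers.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_workers) as ex:
        return sum(ex.map(fake_export, range(n_tasks)))

# Results are identical regardless of worker count; only wall time differs.
print(run_with_workers(8) == run_with_workers(1))  # prints True
```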

[Screenshot: benchmark results, 2023-04-01]

I believe this is now fixed in v0.59.0. I’ve also added an example script that performs concurrent export. In some quick testing, it does appear that on my M1 mini, there is a performance boost using concurrent export. Note that this only works on Python 3.11+ and only applies to the API; the osxphotos export CLI does not yet support concurrent export.

❯ python examples/concurrent_export.py $TESTDIR --added-after 2023-03-01 --workers 40
Exporting 653 photos to /Volumes/OWC_SDD/rhet/test using 40 workers
Exported 579 photos to /Volumes/OWC_SDD/rhet/test in 7.1581 seconds

❯ python examples/concurrent_export.py $TESTDIR --added-after 2023-03-01 --workers 1
Exporting 653 photos to /Volumes/OWC_SDD/rhet/test using 1 workers
Exported 579 photos to /Volumes/OWC_SDD/rhet/test in 19.0723 seconds

So I’ve got a very rough alpha implementation of a thread-safe export, and it appears to actually be slower than single-threaded. The test case was exporting 615 photos from my library to my internal SSD (M1 Mac Mini, 8 cores) with 8 max workers (concurrent.futures.ThreadPoolExecutor). The concurrent test took about twice as long as the single-threaded case. Looking at the profile data, it looks like a lot of time is spent managing threads:

     5542    9.896    0.002    9.896    0.002 {method 'acquire' of '_thread.lock' objects}
      615    0.001    0.000    9.850    0.016 _base.py:646(__exit__) # <-- this is in concurrent.futures
      615    0.001    0.000    9.849    0.016 thread.py:216(shutdown)
      622    0.001    0.000    9.848    0.016 threading.py:1080(join)
      622    0.001    0.000    9.846    0.016 threading.py:1118(_wait_for_tstate_lock)

To get this fully working I’ll also need to parallelize the exiftool code (which currently runs as a singleton process).
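One common way to parallelize a singleton helper is to give each thread its own instance via `threading.local()` instead of sharing one process. This is a sketch of that pattern, not the osxphotos design; `ExifToolProc` here is a stand-in class.

```python
import threading

_local = threading.local()

class ExifToolProc:
    def __init__(self):
        # Stand-in for spawning a real exiftool subprocess.
        self.owner = threading.get_ident()

def get_exiftool():
    # Lazily create one instance per thread; repeated calls in the
    # same thread reuse it, so no cross-thread sharing occurs.
    if not hasattr(_local, "proc"):
        _local.proc = ExifToolProc()
    return _local.proc

print(get_exiftool() is get_exiftool())  # prints True
```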

I tried ProcessPoolExecutor(), but it doesn’t work because the ExportDB object cannot be pickled, so I’d have to refactor that too.
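The pickling problem is easy to reproduce: an object holding a live sqlite3 connection (as an export database would) cannot be handed to a ProcessPoolExecutor worker, because pickle cannot serialize the connection handle. `InMemoryDB` below is a stand-in, not the osxphotos ExportDB class.

```python
import pickle
import sqlite3

class InMemoryDB:
    def __init__(self):
        # A live connection handle cannot be serialized across processes.
        self.conn = sqlite3.connect(":memory:")

try:
    pickle.dumps(InMemoryDB())
    print("picklable")
except (TypeError, pickle.PicklingError):
    print("not picklable")  # prints this: sqlite3.Connection can't be pickled
```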

Confirmed that I no longer see the memory db warning.