beets: UTF-8 Character Encoding Breaks Somewhere on Windows

Problem

I’ve tried a few Unicode character replacements for the regular colon “:”. None of them work. I have an example that creates a folder called:

日比野則彦 - 2004 - Metal Gear Solid 3– Snake Eater– The First Bite

Notice the – instead of the intended character (em dash, not en dash or hyphen). Yet I can use that character when renaming a file normally in Windows 10. I’m running Win10 x64, and I run beets in the standard Windows CMD console.

Running this command in verbose (-vv) mode:

beet -vv import "e:\use\test\01. Snake Eater (Abstracted Camouflage).opus"
plugin paths:
Sending event: pluginload
library database: e:\songs\beetslibrary.bib
library directory: e:\s2
Sending event: library_opened
Sending event: import_begin
Sending event: import_task_created
Sending event: import_task_start
Looking up: e:\use\test\01. Snake Eater (Abstracted Camouflage).opus
Tagging Hibino Norihiko - Metal Gear Solid 3: Snake Eater: The First Bite
Searching for discovered album ID: 48746b35-d2bf-4199-9e44-69c2869b03bc
Candidate: 日比野則彦 - Metal Gear Solid 3: Snake Eater: The First Bite
Success. Distance: 0.31
Album ID match recommendation is Recommendation.low
Search terms: Hibino Norihiko - Metal Gear Solid 3: Snake Eater: The First Bite
Album might be VA: False
Sending event: albuminfo_received
Candidate: 日比野則彦 - Metal Gear Solid 3: Snake Eater: The First Bite
Duplicate.
Sending event: albuminfo_received
Candidate: Harry Gregson‐Williams & 日比野則彦 - Metal Gear Solid 3: Snake Eater
Success. Distance: 0.90
Sending event: albuminfo_received
Candidate: Harry Gregson‐Williams & 日比野則彦 - Metal Gear Solid 3: Snake Eater
Success. Distance: 0.89
Sending event: albuminfo_received
Candidate: Dan Bull feat. VI Seconds - Metal Gear Solid
Success. Distance: 0.83
Sending event: albuminfo_received
Candidate: Cynthia Harrell - Snake Eater -abstracted camouflage-
Success. Distance: 0.70
Evaluating 5 candidates.

e:\use\test\01. Snake Eater (Abstracted Camouflage).opus (1 items)
Sending event: before_choose_candidate
Correcting tags from:
    Hibino Norihiko - Metal Gear Solid 3: Snake Eater: The First Bite
To:
    日比野則彦 - Metal Gear Solid 3: Snake Eater: The First Bite
URL:
    https://musicbrainz.org/release/48746b35-d2bf-4199-9e44-69c2869b03bc
(Similarity: 68.8%) (missing tracks, artist) (CD, 2004, JP, コナミ株式会社)
Missing tracks (5/6 - 83.3%):
 ! Infiltration Into the Jungle (#2) (4:13)
 ! Escape (#3) (2:00)
 ! Chivalry (#4) (3:20)
 ! The Treading Behemoth (#5) (2:07)
 ! Snake Eater (Japanese version) (#6) (5:00)
Apply, More candidates, Skip, Use as-is, as Tracks, Group albums,
Enter search, enter Id, aBort?
Enter one of A, M, S, U, T, G, E, I, B: a
Sending event: import_task_choice
Sending event: import_task_apply
0 of 1 items replaced
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: database_change
Sending event: before_item_moved
Sending event: item_moved
Sending event: database_change
Sending event: database_change
Sending event: write
Sending event: after_write
Sending event: database_change
Sending event: import_task_files
Sending event: album_imported
Sending event: import
Sending event: cli_exit

Here’s a link to the music files that trigger the bug (if relevant): 01. Snake Eater (Abstracted Camouflage).zip (dummy file)

Setup

  • OS: Win10 x64
  • beets version 1.4.3
  • Python version 3.6.0
  • Turning off plugins made problem go away (yes/no): no

My configuration (output of beet config) is:

directory: e:\s2\
library: e:\songs\beetslibrary.bib
import:
    copy: no
    move: yes
    write: yes
    timid: no
per_disc_numbering: yes


replace:
replace:
    '[\\/]': _
    '^\.': _
    '[\x00-\x1f]': _
    '[<>"\?\*\|]': _
    '\.$': _
    '\s+$': ''
    '^\s+': ''
    '\:': –

paths:
    default: $albumartist - $original_year - $album/$track. $title 

(A hasty copy paste led to my config having two “replace” lines but it shouldn’t affect the bug).
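For anyone following along, here is a rough sketch of what the replace rules above should produce for this album name. It is not beets' actual code: `apply_replacements` is a hypothetical helper that applies each regex in order to a single path component, and `\u2013` is my assumption for the dash in the last rule.

```python
import re

# Rules copied from the config above, applied in order.
# Assumption: "\u2013" stands in for the dash used as the colon replacement.
REPLACEMENTS = [
    (re.compile(r'[\\/]'), '_'),
    (re.compile(r'^\.'), '_'),
    (re.compile(r'[\x00-\x1f]'), '_'),
    (re.compile(r'[<>"\?\*\|]'), '_'),
    (re.compile(r'\.$'), '_'),
    (re.compile(r'\s+$'), ''),
    (re.compile(r'^\s+'), ''),
    (re.compile(r'\:'), '\u2013'),
]


def apply_replacements(component, replacements=REPLACEMENTS):
    """Hypothetical helper: run every rule over one path component."""
    for pattern, repl in replacements:
        component = pattern.sub(repl, component)
    return component


album = 'Metal Gear Solid 3: Snake Eater: The First Bite'
# ascii() keeps the result readable even on a console that cannot show the dash.
print(ascii(apply_replacements(album)))
```

If the rules are read correctly, both colons come out as the dash, so any garbling has to happen when the config value is read or when the filename is written, not in the regex step.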

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 19 (16 by maintainers)

Most upvoted comments

Even with my console (both cmd and PowerShell) set to UTF-8, the filename that gets written ends up garbled. Edit: this happens even when the config file is saved as UTF-8 with a byte order mark (BOM).

edit:

>>> open('e:\\config.yaml', 'r').encoding
'cp1252'
>>> open('e:\\config.yaml', 'r', encoding="utf-8").encoding
'utf-8'

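So without an explicit encoding, Python opens the file as cp1252 on this machine. Passing the encoding where the config file is opened would probably be a quicker fix than a workaround anywhere else. Where does beets open the config file?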
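To illustrate the effect of that default, here is a small self-contained sketch. It says nothing about how beets itself loads its config. It just writes a throwaway config fragment as UTF-8 and reads it back both ways. The file contents and the use of `\u2013` as the dash are assumptions for the demo.

```python
import os
import tempfile

# Hypothetical config fragment; "\u2013" is a stand-in for the dash in the replace rule.
config_text = "replace:\n    '\\:': \u2013\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'config.yaml')

    # Save the file as UTF-8, the way a text editor would.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(config_text)

    # Read it the way Python does when cp1252 is the default encoding.
    with open(path, 'r', encoding='cp1252') as f:
        misread = f.read()

    # Read it with the encoding given explicitly.
    with open(path, 'r', encoding='utf-8') as f:
        correct = f.read()

# ascii() shows the code points without depending on the console encoding.
print(ascii(misread))
print(ascii(correct))
```

The three UTF-8 bytes of the dash are decoded as three separate cp1252 characters in the first read, which is the kind of garbling that would then end up in the folder name.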