OpenSearch: Mapping `char_filter` for `'# => _hashsign_'` not working from version 2.4.0

Describe the bug

We have the analyzer defined below:

from typing import Final

from elasticsearch_dsl import analyzer, char_filter


STD_WITH_SPECIAL_CHARS: Final = analyzer(
    'std_with_special_chars',
    type='custom',
    tokenizer='standard',
    filter=['lowercase'],
    char_filter=char_filter(
        'social_char_filter',
        type='mapping',
        mappings=[
            '# => _hashsign_',
            '@ => _atsign_'
        ],
    ),
)

On version 2.3.0 and below, this works as expected:

STD_WITH_SPECIAL_CHARS.simulate('test #test @test', using=es_connection)

returns:

{
    'tokens': [
        {
            'token': 'test', 
            'start_offset': 0, 
            'end_offset': 4, 
            'type': '<ALPHANUM>', 
            'position': 0
        }, 
        {
            'token': '_hashsign_test', 
            'start_offset': 5, 
            'end_offset': 10, 
            'type': '<ALPHANUM>', 
            'position': 1
        }, 
        {
            'token': '_atsign_test', 
            'start_offset': 11, 
            'end_offset': 16, 
            'type': '<ALPHANUM>', 
            'position': 2
        }
    ]
}

On version 2.4.0 and above, it returns:

{
    'tokens': [
        {
            'end_offset': 4,
            'position': 0,
            'start_offset': 0,
            'token': 'test',
            'type': '<ALPHANUM>'
        },
        {
            'end_offset': 10,
            'position': 1,
            'start_offset': 6,
            'token': 'test',
            'type': '<ALPHANUM>'
        },
        {
            'end_offset': 16,
            'position': 2,
            'start_offset': 11,
            'token': '_atsign_test',
            'type': '<ALPHANUM>'
        }
    ]
}

The # is no longer converted to _hashsign_, while the @ mapping still works, so the problem appears to be specific to the # character.
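To make the difference between the two responses explicit, here is a minimal sketch that compares them position by position. The token data is copied from the outputs above (only the compared fields are kept); the helper name `diff_tokens` is made up for illustration.

```python
# Token lists copied from the two responses above (only the fields compared here).
tokens_2_3_0 = [
    {"token": "test", "start_offset": 0, "end_offset": 4},
    {"token": "_hashsign_test", "start_offset": 5, "end_offset": 10},
    {"token": "_atsign_test", "start_offset": 11, "end_offset": 16},
]
tokens_2_4_0 = [
    {"token": "test", "start_offset": 0, "end_offset": 4},
    {"token": "test", "start_offset": 6, "end_offset": 10},
    {"token": "_atsign_test", "start_offset": 11, "end_offset": 16},
]


def diff_tokens(old, new):
    """Return (position, old_token, new_token) for every position whose token dict differs."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(old, new))
        if a != b
    ]


differences = diff_tokens(tokens_2_3_0, tokens_2_4_0)
for position, old, new in differences:
    print(position, old, "->", new)
```

Only position 1 differs. Its start_offset moves from 5 to 6, which suggests the # at index 5 reaches the standard tokenizer unmapped and is dropped there as punctuation.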

To Reproduce

Steps to reproduce the behavior:

  1. Create an analyzer with a mapping char_filter that contains the mapping '# => _hashsign_'
  2. Analyze text containing a hashtag
  3. Observe that the # is not converted in the output
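The steps above can also be tried without elasticsearch_dsl by posting the analyzer components inline to the `_analyze` endpoint. The sketch below only builds the JSON body; the function name `build_analyze_body` is hypothetical, and sending the request (host, authentication, HTTP client) is left out because it depends on the cluster setup.

```python
import json


def build_analyze_body(text):
    """Build an `_analyze` request body with the analyzer components defined inline."""
    return {
        "tokenizer": "standard",
        "filter": ["lowercase"],
        "char_filter": [
            {
                "type": "mapping",
                "mappings": ["# => _hashsign_", "@ => _atsign_"],
            }
        ],
        "text": text,
    }


body = build_analyze_body("test #test @test")
print(json.dumps(body, indent=2))
# Send as: POST /_analyze with this JSON body (for example via curl or any HTTP client).
```

Running the same body against a 2.3.0 node and a 2.4.0 node should show whether the difference comes from the server rather than from the client library.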

Expected behavior

Tokens containing # should have the # converted to _hashsign_

Plugins

N/A

Host/Environment (please complete the following information):

  • OS: N/A
  • Version 2.4.0

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 16 (15 by maintainers)

Most upvoted comments

I’ll see if I can get some time this week to work on a PR.

This definitely seems like a regression to me. The old code also appeared to have some handling for comments, yet this mapping still worked. Also, I'm not sure why a comment would ever be needed inside the mappings definition here.
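The comment-handling theory can be illustrated with a toy model. This is not OpenSearch code: `parse_rules` and `apply_rules` are hypothetical helpers, the replacement is a plain sequential `str.replace` rather than a real mapping char filter, and the assumption that 2.4.0 drops rules starting with # as comments is only the suspicion raised above.

```python
def parse_rules(rules, skip_hash_comments):
    """Parse 'source => target' rules into (source, target) pairs.

    When skip_hash_comments is True, any rule whose first non-blank character
    is '#' is dropped as a comment (the suspected 2.4.0 behaviour, an assumption).
    """
    parsed = []
    for rule in rules:
        stripped = rule.strip()
        if skip_hash_comments and stripped.startswith("#"):
            continue
        source, target = (part.strip() for part in stripped.split("=>", 1))
        parsed.append((source, target))
    return parsed


def apply_rules(text, parsed):
    """Apply each replacement in order; a simplification of a real mapping char filter."""
    for source, target in parsed:
        text = text.replace(source, target)
    return text


RULES = ["# => _hashsign_", "@ => _atsign_"]
TEXT = "test #test @test"

print(apply_rules(TEXT, parse_rules(RULES, skip_hash_comments=False)))
print(apply_rules(TEXT, parse_rules(RULES, skip_hash_comments=True)))
```

With comment skipping enabled, the # rule is silently discarded while the @ rule survives, which matches the symptom reported in this issue.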