parglare: Problems with the use of EMPTY

parglare version: 0.12.0
Python version: Python 3.6.9 |Anaconda custom (64-bit)
Operating System: Mac OS-X Catalina 10.15.4

Description

I’m developing a parser for Korean that works on the morpheme and part-of-speech output of a CNN-based morphological analyzer. The grammar is substantially ambiguous, and the GLR parser has been working well, returning the alternative parses I was hoping for wherever the input is ambiguous.

However, as I fill out the grammar, some uses of EMPTY rules (written explicitly or introduced via ‘?’ or ‘*’) cause complete parse failures that I wasn’t expecting. I can apparently avoid them by replacing each use of EMPTY with a set of alternative rules that spell out every combination of the optional symbols, but this makes the grammar very unwieldy. I’d appreciate help understanding why the EMPTY versions fail.
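
To make the cost of that workaround concrete, here is a minimal sketch in plain Python (it does not use parglare) that expands ‘?’ and ‘*’ suffixes into explicit alternatives. The helper names `expand_optionals` and `format_rule` are hypothetical, and the sketch relies on the standard equivalence that `x*` is an optional `x+`.

```python
from itertools import product


def expand_optionals(symbols):
    """Return every alternative obtained by keeping or dropping each optional symbol."""
    choices = []
    for sym in symbols:
        if sym.endswith("?"):
            # 'x?' contributes either 'x' or nothing.
            choices.append([[sym[:-1]], []])
        elif sym.endswith("*"):
            # 'x*' is treated as an optional 'x+'.
            choices.append([[sym[:-1] + "+"], []])
        else:
            choices.append([[sym]])
    return [[s for part in combo for s in part] for combo in product(*choices)]


def format_rule(name, symbols):
    """Render the expanded alternatives in the grammar's rule layout."""
    alternatives = expand_optionals(symbols)
    # An alternative with no symbols left can only be written as EMPTY.
    body = "\n    | ".join(" ".join(alt) or "EMPTY" for alt in alternatives)
    return f"{name}: {body};"


print(format_rule("singleNounPhrase", ["determiner?", "noun+"]))
print(format_rule("copulaPhrase",
                  ["adverb*", "copula", "verbSuffix*", "predicateEndingSuffix?"]))
```

A rule with n optional symbols expands to 2^n alternatives, so the `copulaPhrase` rule from the grammar below would already need eight.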

The GLRParser is being used like this:

    import os
    from parglare import Grammar, GLRParser
    self.grammar = Grammar.from_file(os.path.join(os.path.dirname(__file__), "./korean.pg"))
    self.parser = GLRParser(self.grammar, debug=True, build_tree=True)

What I Did

Below are working and failing versions of a segment of the full grammar. The string I am parsing is as follows (a sequence of morphemes & part-of-speech tags):

 자전거:NNG; 를:JKO; 있:VV; 어요:SEF; .:SF;

The failing version of the grammar yields the error below, which shows that the parser got as far as the sentence-final period morpheme before failing:

*** LEAVING ERROR REPORTING MODE.
	Tokens expected: verbToNounModifyingForm, nominalizingSuffix, clauseConnector, adnominalSuffix
	Tokens found: [<sentenceEnd(.:SF;)>]
	Error:  Error at 1:30:"; 어요:SEF;  **> .:SF; " => Expected: adnominalSuffix or clauseConnector or nominalizingSuffix or verbToNounModifyingForm but found <sentenceEnd(.:SF;)>
Error at 1:30:"; 어요:SEF;  **> .:SF; " => Expected: adnominalSuffix or clauseConnector or nominalizingSuffix or verbToNounModifyingForm but found <sentenceEnd(.:SF;)>

Here’s a working grammar segment; the // <-- comments mark the rule I’m concerned with, singleNounPhrase. In this version, the “determiner” terminal is made optional by giving singleNounPhrase two alternatives, one with the determiner and one without:

sentence:               interjection* sentence1
                    |   sentenceJoiningAdverb? sentence1;

sentence1:              subordinateClause* clause sentenceEnd;

subordinateClause:      clause clauseConnector punctuation*;

clause:                 phrase* verbPhrase
                    |   phrase* complement? copulaPhrase;

phrase:                 topic
                    |   subject
                    |   object
                    |   adjectivalPhrase
                    |   adverbialPhrase
                    |   nounPhrase;

topic:                  nounPhrase topicMarker;
subject:                nounPhrase subjectMarker;
object:                 nounPhrase objectMarker;
complement:             nounPhrase complementMarker?;

nounPhrase:             singleNounPhrase;

singleNounPhrase:       determiner noun+			// <--  
                    |   noun+;					// <--

noun:                   simpleNoun
                    |   nominalForm
                    |   nominalizedVerb
                    |   verbModifiedToNoun;


nominalizedVerb:        clause nominalizingSuffix;
verbModifiedToNoun:     clause verbToNounModifyingForm;



adjectivalPhrase:       adjective+ nounPhrase;

adjective:              clause adnominalSuffix
                    |   possessive;

possessive:             simpleNoun+ possessiveMarker;
copulaPhrase:           adverb* copula verbSuffix* predicateEndingSuffix?;
adverbialPhrase:        nounPhrase adverbialParticle auxiliaryParticle*
                    |   verb adverbialParticle auxiliaryParticle*;

verbPhrase:             verb verbSuffix* predicateEndingSuffix?;
verb:                   simpleVerb;
interjection:           interjectionTerminal punctuation*;

terminals
    sentenceEnd:            /[^:]+:(SF);/;
    interjectionTerminal:   /[^:]+:(IC);/;
    punctuation:            /[^:]+:(SP|SS|SE|SO|SW|SWK);/;
    clauseConnector:        /[^:]+:(EC|CCF|CCMOD|CCNOM);/;
    topicMarker:            /[^:]+:(TOP);/;
    objectMarker:           /[^:]+:(JKO);/;
    subjectMarker:          /[^:]+:(JKS);/;
    complementMarker:       /[^:]+:(JKC);/;
    conjunction:            /[^:]+:(JC|CON);/;
    determiner:             /[^:]+:(MM);/;
    auxiliaryParticle:      /[^:]+:(JX);/;
    possessiveMarker:       /[^:]+:(JKG);/;
    nounModifyingSuffix:    /[^:]+:(XSN|JKV);/;      // # eg, 님, 들, 아/야 (vocative), todo: these should all have particle definitions
    nominalizingSuffix:     /[^:]+:(ETN);/;
    adnominalSuffix:        /[^:]+:(ETM);/;
    verbSuffix:             /[^:]+:(EP|TNS);/;
    predicateEndingSuffix:  /[^:]+:(SEF|EF);/;
    negative:               /[^:]+:(NEG);/;
    verbCombiner:           /고:(EC|CCF);/;
    honorificMarker:        /(으시|시):EP;/;
    verbModifier:           /[^:]+:(VMOD);/;
    verbNominal:            /[^:]+:(VNOM);/;
    adverbialParticle:      /[^:]+:(JKB);/;
    quotationSuffix:        /[^:]+:(QOT);/;
    shortQuotationSuffix:   /[^:]+:(SQOT);/;
    sentenceJoiningAdverb:  /[^:]+:MAJ;/;
    simpleNoun:             /[^:]+:(NNG|NNP|NNB|NR|SL|NP|SN);/;
    adverb:                 /[^:]+:(MAG);/;
    simpleVerb:             /[^:]+:(VV|VVD|VHV);/;
    descriptiveVerb:        /[^:]+:(VA|VCP|VCN|VAD|VHA);/;
    auxiliaryVerbConnector: /[^:]+:(EC);/;
    auxiliaryVerbForm:      /[^:]+:(EC);/;
    copula:                 /(되:VV)|([^:]+:(VCP|VCN));/;
    number:                 /[^:]+:(SN|NR);/;
    counter:                /[^:]+:(NNB|NNG);/;
    nominalForm:            /[^:]+:(NNOM);/;
    verbToNounModifyingForm: /[^:]+:(NMOD);/;
    nominalVerbForm:        /[^:]+:(VNOM);/;
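
As a sanity check on the terminals, the sketch below uses Python's `re` module to show which terminal pattern each morpheme of the test sentence matches. It copies only a subset of the terminals above, and it simplifies tokenization by splitting on whitespace and requiring a full match, which is an assumption and not a description of how parglare recognizes tokens. The helper name `candidate_terminals` is hypothetical.

```python
import re

# Subset of the terminal definitions above, with the patterns copied verbatim.
TERMINALS = {
    "sentenceEnd": r"[^:]+:(SF);",
    "objectMarker": r"[^:]+:(JKO);",
    "predicateEndingSuffix": r"[^:]+:(SEF|EF);",
    "simpleNoun": r"[^:]+:(NNG|NNP|NNB|NR|SL|NP|SN);",
    "simpleVerb": r"[^:]+:(VV|VVD|VHV);",
}


def candidate_terminals(token):
    """Names of the terminals whose pattern matches the whole token."""
    return sorted(
        name for name, pattern in TERMINALS.items() if re.fullmatch(pattern, token)
    )


sentence = "자전거:NNG; 를:JKO; 있:VV; 어요:SEF; .:SF;"
for token in sentence.split():
    print(token, candidate_terminals(token))
```

Under these assumptions each morpheme maps to exactly one of the listed terminals, with the final `.:SF;` matching only `sentenceEnd`, which agrees with the token reported in the error message.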

Here’s the failing version, using ‘?’ for the optional determiner. All other rules are identical (the terminals are the same as above, but not repeated here):

sentence:               interjection* sentence1
                    |   sentenceJoiningAdverb? sentence1;

sentence1:              subordinateClause* clause sentenceEnd;

subordinateClause:      clause clauseConnector punctuation*;

clause:                 phrase* verbPhrase
                    |   phrase* complement? copulaPhrase;

phrase:                 topic
                    |   subject
                    |   object
                    |   adjectivalPhrase
                    |   adverbialPhrase
                    |   nounPhrase;

topic:                  nounPhrase topicMarker;
subject:                nounPhrase subjectMarker;
object:                 nounPhrase objectMarker;
complement:             nounPhrase complementMarker?;

nounPhrase:             singleNounPhrase;

singleNounPhrase:       determiner? noun+;   // <----
noun:                   simpleNoun
                    |   nominalForm
                    |   nominalizedVerb
                    |   verbModifiedToNoun;


nominalizedVerb:        clause nominalizingSuffix;
verbModifiedToNoun:     clause verbToNounModifyingForm;



adjectivalPhrase:       adjective+ nounPhrase;       
adjective:              clause adnominalSuffix
                    |   possessive;

possessive:             simpleNoun+ possessiveMarker;
copulaPhrase:           adverb* copula verbSuffix* predicateEndingSuffix?;
adverbialPhrase:        nounPhrase adverbialParticle auxiliaryParticle*
                    |   verb adverbialParticle auxiliaryParticle*;

verbPhrase:             verb verbSuffix* predicateEndingSuffix?;
verb:                   simpleVerb;
interjection:           interjectionTerminal punctuation*;
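
To see where EMPTY enters the failing version, the following plain-Python sketch computes which nonterminals can derive the empty string. It covers only part of the grammar, and the rules are desugared by hand: names such as `phrase_0`, `noun_1`, and `determiner_opt` are assumptions about how the ‘*’, ‘+’, and ‘?’ operators might be expanded, not confirmed parglare output. An empty list stands for an EMPTY alternative.

```python
def nullable_symbols(grammar):
    """Fixed-point computation of the nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            if head in nullable:
                continue
            # all() over an empty alternative is True, so [] marks the head as nullable.
            if any(all(sym in nullable for sym in alt) for alt in alternatives):
                nullable.add(head)
                changed = True
    return nullable


# Hand-desugared fragment of the failing grammar (assumed expansion names).
GRAMMAR = {
    "clause": [["phrase_0", "verbPhrase"]],
    "phrase_0": [["phrase_1"], []],
    "phrase_1": [["phrase_1", "phrase"], ["phrase"]],
    "phrase": [["object"], ["nounPhrase"]],
    "object": [["nounPhrase", "objectMarker"]],
    "nounPhrase": [["singleNounPhrase"]],
    "singleNounPhrase": [["determiner_opt", "noun_1"]],
    "determiner_opt": [["determiner"], []],
    "noun_1": [["noun_1", "noun"], ["noun"]],
    "noun": [["simpleNoun"], ["nominalizedVerb"]],
    "nominalizedVerb": [["clause", "nominalizingSuffix"]],
    "verbPhrase": [["verb", "verbSuffix_0", "predicateEndingSuffix_opt"]],
    "verb": [["simpleVerb"]],
    "verbSuffix_0": [["verbSuffix_1"], []],
    "verbSuffix_1": [["verbSuffix_1", "verbSuffix"], ["verbSuffix"]],
    "predicateEndingSuffix_opt": [["predicateEndingSuffix"], []],
}

print(sorted(nullable_symbols(GRAMMAR)))
```

With `determiner?`, `singleNounPhrase` begins with a nullable symbol, so the cycle noun → nominalizedVerb → clause → phrase → nounPhrase → singleNounPhrase → noun+ becomes a left recursion hidden behind an empty prefix, whereas the working version reaches `noun+` directly. Whether that difference is what triggered the failure is not stated in this thread, so treat it as one plausible reading only.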

About this issue

State: closed
Created 4 years ago
Comments: 18 (10 by maintainers)

Commits related to this issue

Add regression test for issue #112 — committed to igordejanovic/parglare by igordejanovic 4 years ago
Extend regression tests for issue #112 — committed to igordejanovic/parglare by igordejanovic 4 years ago

Most upvoted comments

I’ve just finished the rework. It is on the master branch. It should now correctly handle all context-free grammars. You can see the regression test for your issue.

Thanks for the contribution. I’m closing this issue. Feel free to reopen it if you notice anything wrong.

igordejanovic on Jun 14, 2020