parglare: Problems with the use of EMPTY

  • parglare version: 0.12.0
  • Python version: Python 3.6.9 |Anaconda custom (64-bit)
  • Operating System: Mac OS-X Catalina 10.15.4

Description

I’m developing a parser for Korean based on morphological transformations of Korean text by a CNN-based analyzer. The grammar is substantially ambiguous and I’ve been finding that the GLR parser is working well, giving the alternate parsings I was hoping for in the case of these ambiguities.

However, as I fill out the grammar, some uses of EMPTY rules (explicitly or via ‘?’ or ‘*’) are causing complete parsing failures I wasn’t expecting. I can apparently avoid these by recoding the EMPTY use with multiple alternate rules permuting all the optionals, but this results in a very unwieldy grammar and I’m wondering if I can get help understanding the failures with EMPTY.
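The workaround described above (replacing each optional with explicit alternatives) can be sketched mechanically. This is only an illustration of why the grammar gets unwieldy, not part of parglare: `expand_optionals` is a hypothetical helper, it treats a rule body as a plain list of symbol strings, and it expands only `?` (not `*`).

```python
from itertools import product


def expand_optionals(symbols):
    """Expand every `sym?` in a rule body into explicit alternatives.

    Hypothetical helper for illustration only; it is not a parglare API.
    """
    choices = []
    for sym in symbols:
        if sym.endswith("?"):
            # An optional symbol is either present or absent.
            choices.append([[sym[:-1]], []])
        else:
            choices.append([[sym]])
    # Pick one choice per position and concatenate the pieces.
    return [sum(combo, []) for combo in product(*choices)]


if __name__ == "__main__":
    for alternative in expand_optionals(["determiner?", "noun+"]):
        print(" ".join(alternative))
    # A made-up rule with three optionals already needs 2**3 alternatives.
    print(len(expand_optionals(["a?", "b?", "c?", "d"])))
```

Each additional optional doubles the number of alternatives, which is why writing them out by hand does not scale.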

The GLRParser is being used like this:

    import os
    from parglare import Grammar, GLRParser
    self.grammar = Grammar.from_file(os.path.join(os.path.dirname(__file__), "./korean.pg"))
    self.parser = GLRParser(self.grammar, debug=True, build_tree=True)

What I Did

Below are working and failing versions of a segment of the full grammar. The string I am parsing is as follows (a sequence of morphemes & part-of-speech tags):

 자전거:NNG; 를:JKO; 있:VV; 어요:SEF; .:SF;

The failing version of the grammar yields this error, which shows that the parser got all the way to the sentence-final period morpheme:

*** LEAVING ERROR REPORTING MODE.
	Tokens expected: verbToNounModifyingForm, nominalizingSuffix, clauseConnector, adnominalSuffix
	Tokens found: [<sentenceEnd(.:SF;)>]
	Error:  Error at 1:30:"; 어요:SEF;  **> .:SF; " => Expected: adnominalSuffix or clauseConnector or nominalizingSuffix or verbToNounModifyingForm but found <sentenceEnd(.:SF;)>
Error at 1:30:"; 어요:SEF;  **> .:SF; " => Expected: adnominalSuffix or clauseConnector or nominalizingSuffix or verbToNounModifyingForm but found <sentenceEnd(.:SF;)>

Here’s a working grammar segment; the // <-- comments point at the rule I’m concerned with, singleNounPhrase. In this version, the optional “determiner” terminal is made optional by using two alternates of the singleNounPhrase rule:

sentence:               interjection* sentence1
                    |   sentenceJoiningAdverb? sentence1;

sentence1:              subordinateClause* clause sentenceEnd;

subordinateClause:      clause clauseConnector punctuation*;

clause:                 phrase* verbPhrase
                    |   phrase* complement? copulaPhrase;

phrase:                 topic
                    |   subject
                    |   object
                    |   adjectivalPhrase
                    |   adverbialPhrase
                    |   nounPhrase;

topic:                  nounPhrase topicMarker;
subject:                nounPhrase subjectMarker;
object:                 nounPhrase objectMarker;
complement:             nounPhrase complementMarker?;

nounPhrase:             singleNounPhrase;

singleNounPhrase:       determiner noun+			// <--  
                    |   noun+;					// <--

noun:                   simpleNoun
                    |   nominalForm
                    |   nominalizedVerb
                    |   verbModifiedToNoun;


nominalizedVerb:        clause nominalizingSuffix;
verbModifiedToNoun:     clause verbToNounModifyingForm;



adjectivalPhrase:       adjective+ nounPhrase;

adjective:              clause adnominalSuffix
                    |   possessive;

possessive:             simpleNoun+ possessiveMarker;
copulaPhrase:           adverb* copula verbSuffix* predicateEndingSuffix?;
adverbialPhrase:        nounPhrase adverbialParticle auxiliaryParticle*
                    |   verb adverbialParticle auxiliaryParticle*;

verbPhrase:             verb verbSuffix* predicateEndingSuffix?;
verb:                   simpleVerb;
interjection:           interjectionTerminal punctuation*;

terminals
    sentenceEnd:            /[^:]+:(SF);/;
    interjectionTerminal:   /[^:]+:(IC);/;
    punctuation:            /[^:]+:(SP|SS|SE|SO|SW|SWK);/;
    clauseConnector:        /[^:]+:(EC|CCF|CCMOD|CCNOM);/;
    topicMarker:            /[^:]+:(TOP);/;
    objectMarker:           /[^:]+:(JKO);/;
    subjectMarker:          /[^:]+:(JKS);/;
    complementMarker:       /[^:]+:(JKC);/;
    conjunction:            /[^:]+:(JC|CON);/;
    determiner:             /[^:]+:(MM);/;
    auxiliaryParticle:      /[^:]+:(JX);/;
    possessiveMarker:       /[^:]+:(JKG);/;
    nounModifyingSuffix:    /[^:]+:(XSN|JKV);/;      // # eg, 님, 들, 아/야 (vocative), todo: these should all have particle definitions
    nominalizingSuffix:     /[^:]+:(ETN);/;
    adnominalSuffix:        /[^:]+:(ETM);/;
    verbSuffix:             /[^:]+:(EP|TNS);/;
    predicateEndingSuffix:  /[^:]+:(SEF|EF);/;
    negative:               /[^:]+:(NEG);/;
    verbCombiner:           /고:(EC|CCF);/;
    honorificMarker:        /(으시|시):EP;/;
    verbModifier:           /[^:]+:(VMOD);/;
    verbNominal:            /[^:]+:(VNOM);/;
    adverbialParticle:      /[^:]+:(JKB);/;
    quotationSuffix:        /[^:]+:(QOT);/;
    shortQuotationSuffix:   /[^:]+:(SQOT);/;
    sentenceJoiningAdverb:  /[^:]+:MAJ;/;
    simpleNoun:             /[^:]+:(NNG|NNP|NNB|NR|SL|NP|SN);/;
    adverb:                 /[^:]+:(MAG);/;
    simpleVerb:             /[^:]+:(VV|VVD|VHV);/;
    descriptiveVerb:        /[^:]+:(VA|VCP|VCN|VAD|VHA);/;
    auxiliaryVerbConnector: /[^:]+:(EC);/;
    auxiliaryVerbForm:      /[^:]+:(EC);/;
    copula:                 /(되:VV)|([^:]+:(VCP|VCN));/;
    number:                 /[^:]+:(SN|NR);/;
    counter:                /[^:]+:(NNB|NNG);/;
    nominalForm:            /[^:]+:(NNOM);/;
    verbToNounModifyingForm: /[^:]+:(NMOD);/;
    nominalVerbForm:        /[^:]+:(VNOM);/;
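
To see how the sample input lines up with the terminals above, here is a rough sketch using Python's `re` module on a handful of the patterns, copied verbatim. It assumes the input can be split on whitespace and each piece matched as a whole; parglare's own recognizers match at a position in the input instead, so this only approximates the lexing step. `candidate_terminals` is a hypothetical helper name.

```python
import re

# A subset of the terminal patterns from the grammar, copied verbatim.
TERMINALS = {
    "sentenceEnd": r"[^:]+:(SF);",
    "objectMarker": r"[^:]+:(JKO);",
    "predicateEndingSuffix": r"[^:]+:(SEF|EF);",
    "simpleNoun": r"[^:]+:(NNG|NNP|NNB|NR|SL|NP|SN);",
    "simpleVerb": r"[^:]+:(VV|VVD|VHV);",
    "copula": r"(되:VV)|([^:]+:(VCP|VCN));",
    "counter": r"[^:]+:(NNB|NNG);",
}


def candidate_terminals(token):
    """Return the names of all terminals whose pattern matches the whole token."""
    return [name for name, pattern in TERMINALS.items()
            if re.fullmatch(pattern, token)]


SAMPLE = "자전거:NNG; 를:JKO; 있:VV; 어요:SEF; .:SF;"
matches = {tok: candidate_terminals(tok) for tok in SAMPLE.split()}

if __name__ == "__main__":
    for tok, names in matches.items():
        print(tok, names)
```

Under these assumptions the first morpheme matches both `simpleNoun` and `counter`, while the other four each match exactly one of the listed terminals.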


Here’s the failing version, using ‘?’ for the optional determiner. All other rules are identical (the terminals are the same as above, but not repeated here):

sentence:               interjection* sentence1
                    |   sentenceJoiningAdverb? sentence1;

sentence1:              subordinateClause* clause sentenceEnd;

subordinateClause:      clause clauseConnector punctuation*;

clause:                 phrase* verbPhrase
                    |   phrase* complement? copulaPhrase;

phrase:                 topic
                    |   subject
                    |   object
                    |   adjectivalPhrase
                    |   adverbialPhrase
                    |   nounPhrase;

topic:                  nounPhrase topicMarker;
subject:                nounPhrase subjectMarker;
object:                 nounPhrase objectMarker;
complement:             nounPhrase complementMarker?;

nounPhrase:             singleNounPhrase;

singleNounPhrase:       determiner? noun+;   // <----
noun:                   simpleNoun
                    |   nominalForm
                    |   nominalizedVerb
                    |   verbModifiedToNoun;


nominalizedVerb:        clause nominalizingSuffix;
verbModifiedToNoun:     clause verbToNounModifyingForm;



adjectivalPhrase:       adjective+ nounPhrase;       
adjective:              clause adnominalSuffix
                    |   possessive;

possessive:             simpleNoun+ possessiveMarker;
copulaPhrase:           adverb* copula verbSuffix* predicateEndingSuffix?;
adverbialPhrase:        nounPhrase adverbialParticle auxiliaryParticle*
                    |   verb adverbialParticle auxiliaryParticle*;

verbPhrase:             verb verbSuffix* predicateEndingSuffix?;
verb:                   simpleVerb;
interjection:           interjectionTerminal punctuation*;

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 18 (10 by maintainers)

Most upvoted comments

I’ve just finished the rework. It is on the master branch. It should now correctly handle all context-free grammars. You can see the regression test for your issue.

Thanks for the contribution. I’m closing this issue. Feel free to reopen it if you notice anything wrong.