parglare: Problems with the use of EMPTY
- parglare version: 0.12.0
- Python version: Python 3.6.9 |Anaconda custom (64-bit)
- Operating System: macOS Catalina 10.15.4
Description
I’m developing a parser for Korean based on morphological transformations of Korean text by a CNN-based analyzer. The grammar is substantially ambiguous and I’ve been finding that the GLR parser is working well, giving the alternate parsings I was hoping for in the case of these ambiguities.
However, as I fill out the grammar, some uses of EMPTY rules (explicitly or via ‘?’ or ‘*’) are causing complete parsing failures I wasn’t expecting. I can apparently avoid these by replacing each EMPTY use with multiple alternative rules that enumerate every combination of the optional symbols, but this results in a very unwieldy grammar, and I’m wondering if I can get help understanding the failures with EMPTY.
The GLRParser is being used like this:
import os
from parglare import Grammar, GLRParser
self.grammar = Grammar.from_file(os.path.join(os.path.dirname(__file__), "./korean.pg"))
self.parser = GLRParser(self.grammar, debug=True, build_tree=True)
What I Did
Below are working and failing versions of a segment of the full grammar. The string I am parsing is as follows (a sequence of morphemes & part-of-speech tags):
자전거:NNG; 를:JKO; 있:VV; 어요:SEF; .:SF;
The failing version of the grammar yields this error, showing that it got all the way to the period sentence-final morpheme.
*** LEAVING ERROR REPORTING MODE.
Tokens expected: verbToNounModifyingForm, nominalizingSuffix, clauseConnector, adnominalSuffix
Tokens found: [<sentenceEnd(.:SF;)>]
Error: Error at 1:30:"; 어요:SEF; **> .:SF; " => Expected: adnominalSuffix or clauseConnector or nominalizingSuffix or verbToNounModifyingForm but found <sentenceEnd(.:SF;)>
Error at 1:30:"; 어요:SEF; **> .:SF; " => Expected: adnominalSuffix or clauseConnector or nominalizingSuffix or verbToNounModifyingForm but found <sentenceEnd(.:SF;)>
Here’s a working grammar segment; the // <-- comments point at the rule I’m concerned with, singleNounPhrase. In this version, the optional “determiner” terminal is made optional by using two alternates of the singleNounPhrase rule:
sentence: interjection* sentence1
| sentenceJoiningAdverb? sentence1;
sentence1: subordinateClause* clause sentenceEnd;
subordinateClause: clause clauseConnector punctuation*;
clause: phrase* verbPhrase
| phrase* complement? copulaPhrase;
phrase: topic
| subject
| object
| adjectivalPhrase
| adverbialPhrase
| nounPhrase;
topic: nounPhrase topicMarker;
subject: nounPhrase subjectMarker;
object: nounPhrase objectMarker;
complement: nounPhrase complementMarker?;
nounPhrase: singleNounPhrase;
singleNounPhrase: determiner noun+ // <--
| noun+; // <--
noun: simpleNoun
| nominalForm
| nominalizedVerb
| verbModifiedToNoun;
nominalizedVerb: clause nominalizingSuffix;
verbModifiedToNoun: clause verbToNounModifyingForm;
adjectivalPhrase: adjective+ nounPhrase;
adjective: clause adnominalSuffix
| possessive;
possessive: simpleNoun+ possessiveMarker;
copulaPhrase: adverb* copula verbSuffix* predicateEndingSuffix?;
adverbialPhrase: nounPhrase adverbialParticle auxiliaryParticle*
| verb adverbialParticle auxiliaryParticle*;
verbPhrase: verb verbSuffix* predicateEndingSuffix?;
verb: simpleVerb;
interjection: interjectionTerminal punctuation*;
terminals
sentenceEnd: /[^:]+:(SF);/;
interjectionTerminal: /[^:]+:(IC);/;
punctuation: /[^:]+:(SP|SS|SE|SO|SW|SWK);/;
clauseConnector: /[^:]+:(EC|CCF|CCMOD|CCNOM);/;
topicMarker: /[^:]+:(TOP);/;
objectMarker: /[^:]+:(JKO);/;
subjectMarker: /[^:]+:(JKS);/;
complementMarker: /[^:]+:(JKC);/;
conjunction: /[^:]+:(JC|CON);/;
determiner: /[^:]+:(MM);/;
auxiliaryParticle: /[^:]+:(JX);/;
possessiveMarker: /[^:]+:(JKG);/;
nounModifyingSuffix: /[^:]+:(XSN|JKV);/; // # eg, 님, 들, 아/야 (vocative), todo: these should all have particle definitions
nominalizingSuffix: /[^:]+:(ETN);/;
adnominalSuffix: /[^:]+:(ETM);/;
verbSuffix: /[^:]+:(EP|TNS);/;
predicateEndingSuffix: /[^:]+:(SEF|EF);/;
negative: /[^:]+:(NEG);/;
verbCombiner: /고:(EC|CCF);/;
honorificMarker: /(으시|시):EP;/;
verbModifier: /[^:]+:(VMOD);/;
verbNominal: /[^:]+:(VNOM);/;
adverbialParticle: /[^:]+:(JKB);/;
quotationSuffix: /[^:]+:(QOT);/;
shortQuotationSuffix: /[^:]+:(SQOT);/;
sentenceJoiningAdverb: /[^:]+:MAJ;/;
simpleNoun: /[^:]+:(NNG|NNP|NNB|NR|SL|NP|SN);/;
adverb: /[^:]+:(MAG);/;
simpleVerb: /[^:]+:(VV|VVD|VHV);/;
descriptiveVerb: /[^:]+:(VA|VCP|VCN|VAD|VHA);/;
auxiliaryVerbConnector: /[^:]+:(EC);/;
auxiliaryVerbForm: /[^:]+:(EC);/;
copula: /(되:VV)|([^:]+:(VCP|VCN));/;
number: /[^:]+:(SN|NR);/;
counter: /[^:]+:(NNB|NNG);/;
nominalForm: /[^:]+:(NNOM);/;
verbToNounModifyingForm: /[^:]+:(NMOD);/;
nominalVerbForm: /[^:]+:(VNOM);/;
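As a quick sanity check on the terminals, here is a minimal Python sketch that reports which of the regexes above match each morpheme in the sample input. It uses only the standard `re` module, not parglare. The `TERMINALS` subset and the `classify` helper are hypothetical names introduced for illustration, and splitting on whitespace is an assumption that only approximates how the parser skips whitespace between tokens.

```python
import re

# A subset of the terminal regexes, copied verbatim from the grammar above.
TERMINALS = {
    "sentenceEnd": r"[^:]+:(SF);",
    "objectMarker": r"[^:]+:(JKO);",
    "determiner": r"[^:]+:(MM);",
    "predicateEndingSuffix": r"[^:]+:(SEF|EF);",
    "simpleNoun": r"[^:]+:(NNG|NNP|NNB|NR|SL|NP|SN);",
    "simpleVerb": r"[^:]+:(VV|VVD|VHV);",
    "counter": r"[^:]+:(NNB|NNG);",
    "copula": r"(되:VV)|([^:]+:(VCP|VCN));",
}


def classify(text):
    """Return (token, names of matching terminals) for each whitespace-separated token."""
    result = []
    for token in text.split():
        names = [name for name, rx in TERMINALS.items() if re.fullmatch(rx, token)]
        result.append((token, names))
    return result


SAMPLE = "자전거:NNG; 를:JKO; 있:VV; 어요:SEF; .:SF;"

for token, names in classify(SAMPLE):
    print(token, names)
```

With this subset, the first morpheme matches both simpleNoun and counter, so there is already some lexical ambiguity before any EMPTY rule is involved, while the final `.:SF;` matches only sentenceEnd, which is the token named in the error report.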
Here’s the failing version, using ‘?’ for the optional determiner. All other rules are identical (the terminals are the same as above, but not repeated here):
sentence: interjection* sentence1
| sentenceJoiningAdverb? sentence1;
sentence1: subordinateClause* clause sentenceEnd;
subordinateClause: clause clauseConnector punctuation*;
clause: phrase* verbPhrase
| phrase* complement? copulaPhrase;
phrase: topic
| subject
| object
| adjectivalPhrase
| adverbialPhrase
| nounPhrase;
topic: nounPhrase topicMarker;
subject: nounPhrase subjectMarker;
object: nounPhrase objectMarker;
complement: nounPhrase complementMarker?;
nounPhrase: singleNounPhrase;
singleNounPhrase: determiner? noun+; // <----
noun: simpleNoun
| nominalForm
| nominalizedVerb
| verbModifiedToNoun;
nominalizedVerb: clause nominalizingSuffix;
verbModifiedToNoun: clause verbToNounModifyingForm;
adjectivalPhrase: adjective+ nounPhrase;
adjective: clause adnominalSuffix
| possessive;
possessive: simpleNoun+ possessiveMarker;
copulaPhrase: adverb* copula verbSuffix* predicateEndingSuffix?;
adverbialPhrase: nounPhrase adverbialParticle auxiliaryParticle*
| verb adverbialParticle auxiliaryParticle*;
verbPhrase: verb verbSuffix* predicateEndingSuffix?;
verb: simpleVerb;
interjection: interjectionTerminal punctuation*;
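The two versions differ only in singleNounPhrase. The manual workaround (one alternative per combination of optionals) can be sketched mechanically in plain Python, which also shows why it gets unwieldy: n optional symbols produce 2^n alternatives. `expand_optionals` is a hypothetical helper written for this illustration, not part of parglare. It only handles the ‘?’ operator, and if every symbol in a production is optional, one of the generated alternatives will be an empty string.

```python
from itertools import product


def expand_optionals(symbols):
    """Expand every symbol ending in '?' into 'present' and 'absent' variants.

    Returns one right-hand side string per combination of choices.
    """
    choices = []
    for sym in symbols:
        if sym.endswith("?"):
            # Optional symbol: either keep it (without the '?') or drop it.
            choices.append([sym[:-1], None])
        else:
            choices.append([sym])
    return [
        " ".join(s for s in combo if s is not None)
        for combo in product(*choices)
    ]


alternatives = expand_optionals(["determiner?", "noun+"])
print("singleNounPhrase: " + "\n    | ".join(alternatives) + ";")
```

For the rule in question this yields the two alternatives used in the working grammar, `determiner noun+` and `noun+`.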
About this issue
- State: closed
- Created 4 years ago
- Comments: 18 (10 by maintainers)
Commits related to this issue
- Add regression test for issue #112 — committed to igordejanovic/parglare by igordejanovic 4 years ago
- Extend regression tests for issue #112 — committed to igordejanovic/parglare by igordejanovic 4 years ago
I’ve just finished the rework, and it is on the master branch. It should now correctly handle all context-free grammars. You can see the regression test for your issue.
Thanks for the contribution. I’m closing this issue. Feel free to reopen it if you notice anything wrong.