docs: Validation Error: 'cop' not expected to have children inappropriate for Old Irish

I’ve gotten the following error when trying to validate an upcoming Old Irish treebank:

[L3 Syntax leaf-aux-cop] 'cop' not expected to have children (4:nda:cop --> 3:no:compound)

I know the copula is restricted in many European languages, particularly modern ones; however, the Old Irish copula is very complex. It inflects for person, number, tense, voice, and more. It has specific “conjunct” forms, used only in close compounds with preceding words.

One of these enclitic copula forms is causing the validation error in the sentence amal nondafrecṅdirccsa “for that I am present”. Here a relative construction is created by placing the semantically empty verbal particle no before the conjunct copula form, da “I am”. This allows the nasal n to be infixed between no and da, which gives the copula relative force: nonda “that I am”.

This semantically empty particle, no, can also be compounded with verbs in much the same way as meaningful verbal particles can, and so the dependency relation used to attach any such particle to a verb, compound:prt, is also used to attach it to a copula. But copulas are not expected to have children. Can this be changed for Old Irish?
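To make the complaint concrete, the Python sketch below re-implements the check only as the error message describes it: any word attached as `cop` (or, by assumption, `aux`) must be a leaf. It is not the UD validator's code. The `Token` type and `leaf_aux_cop_errors` are hypothetical names, and apart from ids 3 and 4, which come from the message, the fragment (including frecṅdirc as the root predicate) is an illustrative assumption.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    upos: str
    head: int    # 0 marks the root
    deprel: str


# ASSUMPTION: the rule flags any word attached as aux or cop that has dependents.
LEAF_RELATIONS = {"aux", "cop"}


def leaf_aux_cop_errors(tokens):
    """Return one message per dependent of a word attached as aux/cop."""
    by_id = {t.id: t for t in tokens}
    errors = []
    for child in tokens:
        parent = by_id.get(child.head)
        # Subtypes count as their universal relation, e.g. cop:x -> cop.
        if parent is not None and parent.deprel.split(":")[0] in LEAF_RELATIONS:
            errors.append(
                f"'{parent.deprel}' not expected to have children "
                f"({parent.id}:{parent.form}:{parent.deprel} --> "
                f"{child.id}:{child.form}:{child.deprel})"
            )
    return errors


# Hypothetical fragment: only ids 3 and 4 come from the error message.
fragment = [
    Token(3, "no", "PART", 4, "compound:prt"),
    Token(4, "nda", "AUX", 5, "cop"),
    Token(5, "frecṅdirc", "ADJ", 0, "root"),
]

for message in leaf_aux_cop_errors(fragment):
    print(message)
```

Because the check only looks at the head's relation, the particle is reported no matter which relation it itself carries.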

About this issue

  • State: closed
  • Created a year ago
  • Comments: 34 (34 by maintainers)

Most upvoted comments

Thanks for tagging me @nschneid . I’m not sure if I’d be much help here because as @AdeDoyle points out, modern Irish is very different to Old Irish. The copula in modern Irish gives us a headache in UD - but this seems to be a whole new can of worms 😉 I’d imagine Dorus Fransen would also be able to contribute.

Re the discussion of subject/predicate in the example of is mise rí na hÉireann “I am king of Ireland”. The confusion around this merely comes from the English phrasing of such a sentence. In fact, a more accurate translation is really “The king of Ireland is me” (not you, or him). Mise is the emphatic form that is explicitly telling you the new information. I wrote about this in the context of UD in my thesis (page 64) https://doras.dcu.ie/21014/1/Teresa_PhDThesis_final.pdf

So copular constructions are usually analysed as COP PRED SUBJ.

I also reference in that section of the thesis that there is an argument that the copula in Irish is really just a linking particle between a subject and predicate. I didn’t follow that analysis, but it’s worth reflecting on…

But the main discussion here is about no and (1) what it should be attached to and (2) what the relation is.

For (2) I would probably disagree with the compound analysis, if you say that no is semantically empty. We use compound:prt in the Modern Irish treebank for particle verbs (give up, lay out, etc.).

Other preverb particles (like ní) are attached as advmod. If it’s not adverbial, what about mark:prt?

I do see your argument that the copula contains the subject am rí “I_am a king”, at rí “you_are a king”, is rí “he/it_is a king”. As an Irish speaker I get that 😃 Maybe it’s similar to the pronominal prepositions in the sense of being marked morphologically.

My observation and suggestion for getting around the issue of copulas not having dependents (based on limited knowledge of Old Irish and the discussion above) is this: I’d argue that you could attach nda to the root as nsubj, focusing on the nominal feature of that word instead of the copular feature. There is no labelled subject in the analysis of the sentence as it stands, which is strange in itself. A syntactic analysis without a labelled copula is fine, but one without a subject seems suspicious. The morph features of nda could then capture the copular aspect (so that you don’t lose that information). And then attaching no and sa to nda wouldn’t be such an issue. UPOS = PRON?
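Under this suggestion the leaf complaint disappears, because nda is no longer attached as cop. A hypothetical sketch of the re-annotated fragment as plain (id, form, upos, head, deprel) tuples: nda keeps UPOS AUX here (the suggestion also floats PRON, which a relation-based check would not care about) but is attached as nsubj, and no and sa hang off it. The ids, the predicate token and the placeholder relation dep for sa are assumptions, not the treebank's actual annotation.

```python
# (id, form, upos, head, deprel); head 0 marks the root.
# Hypothetical re-annotation following the nsubj suggestion.
fragment = [
    (3, "no", "PART", 4, "compound:prt"),  # relation kept from the original annotation
    (4, "nda", "AUX", 5, "nsubj"),         # copula form reanalysed as the subject
    (5, "frecṅdirc", "ADJ", 0, "root"),    # assumed predicate
    (6, "sa", "PRON", 4, "dep"),           # emphatic particle; relation is a placeholder
]


def children(tokens, head_id):
    """Forms of all words whose head is head_id, in sentence order."""
    return [form for (_id, form, _upos, head, _rel) in tokens if head == head_id]


def leaf_violations(tokens):
    """(head, child) id pairs where the head is attached as aux or cop."""
    rel = {tid: deprel.split(":")[0] for (tid, _f, _u, _h, deprel) in tokens}
    return [(head, tid) for (tid, _f, _u, head, _r) in tokens
            if rel.get(head) in {"aux", "cop"}]


print(children(fragment, 4))      # ['no', 'sa']
print(leaf_violations(fragment))  # []
```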

It is quite common in various languages that the verb inflects for person and number, but I do not see why it should mean that the verb should also be seen as the subject. If such a language has a verbal copula, it is natural that the copula also inflects for person and number to cross-reference the subject.

To return to the original topic of this thread: In amal nondafrecṅdirccsa “for that I am present”, the particle non is not separated from da by a space. It seems odd to first separate it as a self-standing syntactic word, only to attach it back to da via the compound relation. To me, the need to use a relation like compound is a strong indicator that nonda should stay as one syntactic word. There might be indicators of the opposite though. The example does not show it but if it is possible to write no as a separate orthographic word (taking into account that orthography was not standardized in the times of Old Irish), then we may either want to make no a syntactic word, as I understand it is done now, and have the attachment issue again, or define an exceptional word with space “no da” for Old Irish.

fixed, flat, and goeswith should be understood as headless relations. There is a technical head in the data format (the first word) but linguistically there is no asymmetry asserted between head and modifier.

If you think the copula is linguistically the head, then since UD understands copulas as functional support for other predicates, presumably that predicate should be annotated as the head of no as well.

An analogy might be made to the English contraction didn’t: even though n’t seems to modify the auxiliary, since auxiliaries are not allowed to be heads we say in UD that both are dependents of the main predicate.
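A hypothetical mini-tree for English “I didn’t go”, written as a Python dict, illustrates the point: did and n’t both attach to the lexical predicate go, so the auxiliary stays a leaf. The relations follow common UD English practice (aux for did, advmod for n’t); the data structure and the `dependents` helper are only for illustration.

```python
# id -> (form, upos, head, deprel); head 0 marks the root.
tree = {
    1: ("I", "PRON", 4, "nsubj"),
    2: ("did", "AUX", 4, "aux"),
    3: ("n't", "PART", 4, "advmod"),  # modifies the predicate, not "did"
    4: ("go", "VERB", 0, "root"),
}


def dependents(tree, head_id):
    """Forms of the words attached to head_id, in sentence order."""
    return [tree[i][0] for i in sorted(tree) if tree[i][2] == head_id]


print(dependents(tree, 4))  # ['I', 'did', "n't"]
print(dependents(tree, 2))  # []: the auxiliary has no children
```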

(There are plenty of linguistic arguments for function words being heads in certain languages, but if you want that you should use a different framework, like SUD.)

So if the subject is 3rd person plural, the copula will have different forms depending on whether noun(s)-subject(s) are present?

In fact, it is the 3rd person plural form of the copula in any case. It’s the use of the 3rd singular that’s alarming with overt subjects. It’s used with 1st and 2nd person sg. and pl. as well as with 3rd sg.

Stifter describes this better than I do, “All independent personal pronouns except for the 3rd pl. can be construed with the 3rd sg. of the copula: is mé ‘it is I,’ is tú ‘it is thou’, is é ‘it is he’, is sí ‘it is she’, is ed ‘it is it’, is sní ‘it is we’, is sib ‘it is you.’ Only the 3rd pl. always takes the 3rd pl. form of the copula: it é ‘it is they’ …” (Stifter, David [2006]. Sengoidelc. Syracuse University Press, p. 171.).

All copula forms other than 3rd sg. and pl., therefore, inflect for subject only when no overt subject is present. 3rd plural is already inflected correctly. Forgive the Simon & Garfunkel vibe of the following examples:

No overt subject

Sg.

  1. am inis “I am an island”
  2. at inis “thou art an island”
  3. is inis “he/she/it is an island”

Pl.

  1. ammi insi “we are islands”
  2. adi insi “you are islands”
  3. it insi “they are islands”

BUT

Overt subject

Sg.

  1. is mé “it is I”
  2. is tú “it is thou”
  3. is é/sí/ed “it is he/she/it”

Pl.

  1. is sní “it is we”
  2. is sib “it is you”
  3. it é “it is they” (lit. “they are they”)

If the 3rd plural formation with no overt subject followed the pattern of using is, this could cause confusion with the 3rd singular masc. Both would be is é, as the 3rd sg. masc. and 3rd pl. pronouns are the same, é. This, perhaps, is the reason this formation resisted the use of is in this one position, at least, until a discrete 3rd plural personal pronoun, iad, emerged.
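The pattern in the tables above can be summarised as a small lookup, sketched here in Python. It covers only the present-tense forms and the pronoun-subject construction listed above (no other tenses, conjunct or relative forms), the Sing/Plur labels simply borrow UD's Number values, and the names `WITHOUT_OVERT_SUBJECT` and `copula_form` are hypothetical.

```python
# Present-tense copula forms from the tables above, keyed by (person, number).
WITHOUT_OVERT_SUBJECT = {
    (1, "Sing"): "am",
    (2, "Sing"): "at",
    (3, "Sing"): "is",
    (1, "Plur"): "ammi",
    (2, "Plur"): "adi",
    (3, "Plur"): "it",
}


def copula_form(person, number, overt_pronoun_subject):
    """Pick the copula form for a subject of the given person and number.

    With an overt independent pronoun (is mé, is tú, ...) every person
    takes 3rd sg. 'is', except the 3rd plural, which keeps 'it' (it é).
    """
    if overt_pronoun_subject and (person, number) != (3, "Plur"):
        return "is"
    return WITHOUT_OVERT_SUBJECT[(person, number)]


print(copula_form(1, "Sing", False))  # am  (am inis)
print(copula_form(1, "Sing", True))   # is  (is mé)
print(copula_form(3, "Plur", True))   # it  (it é)
```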

Regardless, it is clear that this formation is interpreted by Stifter (as well as in other learning and grammar material) as the 3rd sg. and pl. forms of the copula being used to express the subject(s), with the following pronouns being the predicates: “it is me”, “it is you”, “it is they”, etc. As this is analogous to what happens with the copula in modern Irish treebanks, however, it seemed logical to me to treat this use of the copula as modern Irish treebanks do, at least until the question of words depending on the copula was raised. The same cannot be done where there is no overt subject.

If I want to say (perhaps as an answer to “what are Ireland and Britain?”) “They are islands”, will the copula change its form to signal that now it includes the subject because the nouns are not present? Or will it still be it insi?

Specifically as a response to an interrogative, I personally suspect an emphatic particle would be likely to be used as well, it insi-som “they are islands”, however, in general constructions something like it insi would be perfectly acceptable. Compare, for example, with Wb. 1c7 .i. it huissi ɫ. itcointfi “i.e. they are worthy or they are proper”.

Yes. Tests like this one must be based on the relation alone because of the possible promotion in case of ellipsis. (There are other tests though, that check the compatibility of some relations with some UPOS tags.) … Teresa writes about morph features, not about the UPOS tag. Having AUX as nsubj would be quite strange, although the current version of the validator probably won’t flag it; but I thought you wanted to re-tag it as PRON.

I see she has suggested UPOS = PRON, though that would be very hard to square with the grammar, and diachronic development of the language. There is no pronoun there in Old Irish, it seems there never was, and these copula forms never develop into a pronoun in later forms of Irish. If anything, I’d be inclined to change it to VERB, but I think if I can get away with doing so, leaving it as AUX would be the right choice. Particularly as it seems it has always been an inflecting auxiliary verb, even in the prehistory of the language. For example, Stifter (p. 120) reconstructs the Proto-Celtic forms:

Sg.

  1. *emmi “I am”
  2. *esi “thou art”
  3. *esti “he/she/it is”

Pl.

  1. *emmosi “we are”
  2. *etesi “you are”
  3. *(s)inti “they are”

It may have been the case, in the prehistory of the language, that forms such as these would have been used with independent personal subject pronouns, and the inflection marked agreement. We can only conjecture without attestation. By the Old Irish period, however, this was not the case. The copula was the primary expression of the subject, and emphatic particles, etc., could inflect to emphasise this subject without any need or expectation that there should be another subject pronoun.

As a compromise between UD and the grammar as it is understood in the books, I like the interpretation that the copula can be promoted in the same way as if pro-dropping were occurring, using nsubj instead of cop, as @tlynn747 suggested, but maintaining the POS tag which identifies it as also being an auxiliary verb. This allows the more intuitive linguistic understanding of the Old Irish copula to be realised, while also highlighting that it contains the subject, which is arguably the more important component.

If there is a noun acting as the subject, the form of the copula does not change (citing @AdeDoyle’s example: is dorcha in adaig “the night is dark”).

This is true only in persons and numbers other than the 3rd plural. As you’ll see in another example I gave, it insi ériu ocus albu “Ireland and Britain are islands” (lit. “they_are islands Ireland and Britain”), the third plural takes the form it “they are”.

So if the subject is 3rd person plural, the copula will have different forms depending on whether noun(s)-subject(s) are present? If I want to say (perhaps as an answer to “what are Ireland and Britain?”) “They are islands”, will the copula change its form to signal that now it includes the subject because the nouns are not present? Or will it still be it insi?

I understood that the validator would fail it either if it was POS tagged as AUX or related by cop, but if it’s based on the relation alone

Yes. Tests like this one must be based on the relation alone because of the possible promotion in case of ellipsis. (There are other tests though, that check the compatibility of some relations with some UPOS tags.)
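Why a relation-only check treats these cases differently can be sketched in a few lines of Python. The function below is a hypothetical stand-in, not the validator: it inspects only the deprel of the head, so a promoted copula (root) or an AUX attached as nsubj may have children, while anything attached as cop may not, whatever its UPOS. The separate UPOS/relation compatibility tests mentioned above are deliberately not modelled, and advmod for no is only a placeholder.

```python
def leaf_violations(tokens):
    """tokens: (id, form, upos, head, deprel) tuples.

    Return (head, child) id pairs where the head is attached as aux or cop.
    UPOS is ignored on purpose: the decision rests on the relation alone.
    """
    rel = {tid: deprel.split(":")[0] for (tid, _f, _u, _h, deprel) in tokens}
    return [(head, tid) for (tid, _f, _u, head, _r) in tokens
            if rel.get(head) in {"aux", "cop"}]


# Hypothetical fragments (ids and relations for illustration only).
promoted = [(1, "no", "PART", 2, "advmod"), (2, "nda", "AUX", 0, "root")]
as_nsubj = [(1, "no", "PART", 2, "advmod"), (2, "nda", "AUX", 3, "nsubj"),
            (3, "frecṅdirc", "ADJ", 0, "root")]
as_cop_pron = [(1, "no", "PART", 2, "advmod"), (2, "nda", "PRON", 3, "cop"),
               (3, "frecṅdirc", "ADJ", 0, "root")]

print(leaf_violations(promoted))     # []
print(leaf_violations(as_nsubj))     # []
print(leaf_violations(as_cop_pron))  # [(2, 1)]
```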

I really like @tlynn747's suggestion

Teresa writes about morph features, not about the UPOS tag. Having AUX as nsubj would be quite strange, although the current version of the validator probably won’t flag it; but I thought you wanted to re-tag it as PRON. Then it would go very well with the nsubj relation but perhaps the features would be strange (if, for example, you need Tense). But there are no UD-wide restrictions on what features you can use with what UPOS category, so you will be able to explain it in documentation and then register the features you need with PRON.

this […] is it a function word or a content word?

When ‘this’ is the entire NP, there is no real distinction between function/content - it is the only exponent of the subject NP (would be nominative in a nom/acc language). If it’s used as a determiner, then it’s safe to call it a function word (“this <-det- book”), and then the lexical noun is the head.

then I would expect “itself” would be dependent on the subject

Yes, at least in the English corpora, emphatic reflexives are annotated as dependents of the NP they modify, using the label nmod:npmod (examples). If they are not contiguous, it is usually interpreted as a dependent of the verb and labeled obl:npmod.

if “this” and “is” were considered a single word in English

If that were the case, we would need to do one of the following things:

  1. Assume this is a multiword token that needs to be broken up, in the same way the French “au” is subtokenized into “a” + “le”. Then each subtoken gets its own deprel etc.
  2. Assume that this is primarily an argument role filler. In this case, there is no copula, and we have a nominal sentence with no verbal component
  3. Assume this is primarily a verbal word, in which case we are looking at pro-drop, and the only expression of the subject is simply the agreement behavior of the verb’s morphology. In this case, the word is really a regular pro-drop copula, which is either the root if there is no additional predicate, or a cop dependent otherwise.

From everything written above, I understood that the situation is 3., i.e. the word in question is like Latin “est” with no overt subject, or Italian “è” in “è bellissimo”, ‘it’s beautiful!’, which would definitely be root(bellissimo), cop(bellissimo,è).

Should only the verb-headed no use the relation compound:prt?

I’m not sure any of these should be compound:prt. What is the reasoning for calling it a compound? Is it a morphological reason, or is there a special sense/dictionary entry for no+nda? From the examples it looks like it’s a clause level particle that is free to combine with any predication (copula or otherwise), so I would have probably gone with discourse (if it’s like an interjection) or advmod if there are signs that it’s adverbial.

In UD, in deciding what the head should be (see Syntax), we have a couple of principles that might be useful here:

So in the case you describe:

We could complicate it more by adding back in the emphatic particle, sa, without an explicit predicate, nondasa “that I (as opposed to you or anybody else) am” = no (PART) + nda (AUX) + sa (PRON). In instances like these no predicate is explicitly stated, and the subject is represented only by the copula. I don’t see any way that examples such as these can be reconciled with what you’re suggesting.

You can make the copula the head (promotion by head elision) in the absence of a content word.

In instances like these no predicate is explicitly stated, and the subject is represented only by the copula. I don’t see any way that examples such as these can be reconciled with what you’re suggesting.

Sure, this is similar to English “I didn’t!”. In such situations, UD promotes the auxiliary to take the place of the missing lexical predicate. See the guidelines on promotion here.
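A hypothetical sketch of promotion in Python: in “I didn’t!” the auxiliary becomes the root and can then take dependents, and the Old Irish nondasa can be treated the same way, with nda as root. UPOS stays AUX in both cases. The relations chosen for no and sa are placeholders (advmod, dep), not settled choices, and the `root_and_children` helper is only for illustration.

```python
# (id, form, upos, head, deprel); head 0 marks the root.
# "I didn't!": with no lexical predicate, the auxiliary is promoted to root.
didnt = [
    (1, "I", "PRON", 2, "nsubj"),
    (2, "did", "AUX", 0, "root"),
    (3, "n't", "PART", 2, "advmod"),
    (4, "!", "PUNCT", 2, "punct"),
]

# Hypothetical Old Irish analogue "nondasa": the copula nda is promoted the
# same way; the relations for no and sa are placeholders.
nondasa = [
    (1, "no", "PART", 2, "advmod"),
    (2, "nda", "AUX", 0, "root"),
    (3, "sa", "PRON", 2, "dep"),
]


def root_and_children(tokens):
    """Return the root's form and the forms of its dependents."""
    root_id, root_form = next((t[0], t[1]) for t in tokens if t[3] == 0)
    return root_form, [t[1] for t in tokens if t[3] == root_id]


print(root_and_children(didnt))    # ('did', ['I', "n't", '!'])
print(root_and_children(nondasa))  # ('nda', ['no', 'sa'])
```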

the particle itself does not have any semantic force like n’t that would allow us to argue that it itself is modifying anything.

I would say that it’s modifying the predication as a whole, much like we attach discourse dependents or other stance adverbials to the sentence root. Note that modifiers of auxiliaries and copulas in general attach to the lexical predicate in UD. For example in the tree for “Ugh, indeed this is still not so simple”, everything depends on “simple”, including modifiers which ostensibly belong to the sentence as a whole or to the auxiliary in a non-UD analysis:
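A rough Python sketch of such a tree, with punctuation omitted and relations assumed to follow common UD English practice, shows the shape being described: every word, including the copula and the sentence-level modifiers, attaches to simple, and the copula itself has no dependents.

```python
# id -> (form, head, deprel); head 0 marks the root. Punctuation omitted.
tree = {
    1: ("Ugh", 8, "discourse"),
    2: ("indeed", 8, "advmod"),
    3: ("this", 8, "nsubj"),
    4: ("is", 8, "cop"),
    5: ("still", 8, "advmod"),
    6: ("not", 8, "advmod"),
    7: ("so", 8, "advmod"),
    8: ("simple", 0, "root"),
}

heads = {head for (_form, head, _rel) in tree.values() if head != 0}
print(heads)  # {8}: every non-root word attaches to "simple"

copula_children = [form for (form, head, _rel) in tree.values() if head == 4]
print(copula_children)  # []
```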

In a simple English phrase like it isn’t, I suspect that both is and n’t would be dependent on the subject, the neuter pronoun, it

No, based on the promotion guidelines above, “is” would be the root.

the copula acts more like the Latin verb est “it is”

Sure, but in Latin too, we would consider “est” to be a copula as soon as the predicate is present:

This is how the UD Latin treebanks are annotated as well. See also this recent paper about harmonizing UD Latin annotation, which strongly adheres to the copula-as-auxiliary premise.

In UD, if there is no predicate separate from the copula, the copula gets “promoted” to predicate. In that case it can have dependents. But when it attaches to the predicate as cop it cannot. Either way it is tagged as AUX.

It may feel strange to make something the dependent of the copula only when there is no main predicate, and the dependent of the main predicate otherwise. But that is how UD works as a compromise across languages. Every language has some constructions that are a bit awkward in UD.
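The rule in the last two paragraphs can be condensed into a small Python function. It is a hypothetical illustration, not part of any UD tool: given a copula, an optional predicate and the words that seem to modify the copula, it returns the attachments UD would expect. It assumes the clause is the main clause, and the particle relation in the example (advmod) is only one of the options floated in this thread.

```python
def attach_copula(copula_id, predicate_id, modifiers):
    """Return {id: (head, deprel)} for a copula and the words that 'go with' it.

    modifiers: (id, deprel) pairs for words such as a preverbal particle or
    negation. Assumes the clause is the main clause, so a promoted copula
    becomes the root (head 0).
    """
    if predicate_id is None:
        # No separate predicate: the copula is promoted and may take dependents.
        arcs = {copula_id: (0, "root")}
        host = copula_id
    else:
        # Predicate present: the copula is a leaf attached as cop,
        # and its would-be modifiers go to the predicate instead.
        arcs = {copula_id: (predicate_id, "cop")}
        host = predicate_id
    for word_id, deprel in modifiers:
        arcs[word_id] = (host, deprel)
    return arcs


# Hypothetical ids: 1 = no, 2 = nda, 3 = predicate; advmod is a placeholder.
print(attach_copula(2, None, [(1, "advmod")]))  # {2: (0, 'root'), 1: (2, 'advmod')}
print(attach_copula(2, 3, [(1, "advmod")]))     # {2: (3, 'cop'), 1: (3, 'advmod')}
```

Either way the copula keeps its AUX tag; only its attachment, and therefore whether it may have children, depends on whether a predicate is present.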