This document formally introduces and defines version 2 of the Emu Query Language (EQL).
Most of the EBNF was adapted from John (2012).

Operator | Meaning |
---|---|
\# | Result modifier (projection) |
, | Parameter list separator |
== | Equality |
!= | Inequality |
=~ | Regular expression match |
!~ | Regular expression non-match |
= | Equality |
\> | Greater than |
\>= | Greater than or equal to |
< | Less than |
<= | Less than or equal to |
\| | Alternatives separator |
& | Conjunction of equal rank |
^ | Dominance conjunction |
-> | Sequence operator |

Bracket | Meaning |
---|---|
' | Quotes a literal string |
( | Function parameter list begin |
) | Function parameter list end |
[ | Sequence or dominance enclosing begin bracket |
] | Sequence or dominance enclosing end bracket |

Function | Meaning |
---|---|
Start | Start |
Medial | Medial |
End | Final |
Num | Count |

Non-terminal | Production (conditions in parentheses) |
---|---|
EQL | = KONJA \| SEQA \| DOMA; |
DOMA | = '[', (KONJA \| DOMA \| SEQA), '^', (KONJA \| DOMA \| SEQA), ']'; (levels must be hierarchically or autosegmentally associated) |
SEQA | = '[', (KONJA \| SEQA \| DOMA), '->', (KONJA \| SEQA \| DOMA), ']'; (levels must be linearly associated) |
KONJA | = {'['}, EA, {'&', EA}, {']'}; (levels must be linearly associated) |
EA | = ETIKETTA \| FUNKA; |
ETIKETTA | = ['#'], EBENE, ('=' \| '==' \| '!=' \| '=~' \| '!~'), ETIKETTALTERNATIVEN; |
FUNKA | = POSA \| NUMA; |
POSA | = POSFKT, '(', EBENE, ',', EBENE, ')', '=', ('0' \| '1'); (levels must be hierarchically or autosegmentally associated; the second level determines the semantics) |
NUMA | = 'Num', '(', EBENE, ',', EBENE, ')', VOP, INTPN; (levels must be hierarchically or autosegmentally associated; the first level determines the semantics) |
ETIKETTALTERNATIVEN | = ETIKETT, {'\|', ETIKETT}; |
ETIKETT | = ETIKETTIERUNG \| ("'", ETIKETTIERUNG, "'"); (EBENE is a level from the annotation structure of the database; ETIKETTIERUNG is a freely chosen character string or a legal-labels class from the annotation schema) |
POSFKT | = 'Start' \| 'Medial' \| 'End'; |
VOP | = '=' \| '==' \| '!=' \| '>' \| '<' \| '<=' \| '>='; |
INTPN | = '0' \| INTP; |
INTP | = DIGIT - '0', {DIGIT}; |
DIGIT | = '0' \| '1' \| '2' \| '3' \| '4' \| '5' \| '6' \| '7' \| '8' \| '9'; |

A query may contain at most one result modifier ‘#’ (hashtag).
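
To make the grammar and the one-hashtag rule more concrete, here is a minimal Python sketch of a recursive-descent parser for a subset of EQL2. It is an illustration only, not emuR's implementation: the names `tokenize`, `Parser`, `parse` and `count_hashes` are hypothetical, only label conditions (ETIKETTA) are handled, the function terms (POSA, NUMA) are left out, and unquoted labels containing ‘-’ or ‘>’ are not supported.

```python
import re

# Tokens: multi-character operators first, then brackets and single-character
# operators, then quoted labels, and finally bare level names or labels.
TOKEN = re.compile(r"\s*(->|==|!=|=~|!~|[\[\]&^|#=]|'[^']*'|[^\s\[\]&^|#=!'>-]+)")
LABEL_OPS = ("=", "==", "!=", "=~", "!~")


def tokenize(query):
    query = query.strip()
    tokens, pos = [], 0
    while pos < len(query):
        match = TOKEN.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}: {query[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def term(self):
        # A bracketed term is a SEQA ('->'), a DOMA ('^') or a bracketed KONJA.
        if self.peek() == "[":
            self.take("[")
            node = self.term()
            if self.peek() in ("->", "^"):
                op = self.take()
                node = (op, node, self.term())
            self.take("]")
            return node
        return self.konja()

    def konja(self):
        parts = [self.etiketta()]
        while self.peek() == "&":
            self.take("&")
            parts.append(self.etiketta())
        return ("&", *parts) if len(parts) > 1 else parts[0]

    def etiketta(self):
        hashed = self.peek() == "#"
        if hashed:
            self.take("#")
        level, op = self.take(), self.take()
        if op not in LABEL_OPS:
            raise ValueError(f"unsupported operator {op!r}")
        labels = [self.take().strip("'")]
        while self.peek() == "|":
            self.take("|")
            labels.append(self.take().strip("'"))
        return {"level": level, "op": op, "labels": labels, "hashed": hashed}


def parse(query):
    parser = Parser(tokenize(query))
    tree = parser.term()
    if parser.peek() is not None:
        raise ValueError(f"trailing input: {parser.peek()!r}")
    return tree


def count_hashes(tree):
    """Count result modifiers so that a caller can reject queries with more than one."""
    if isinstance(tree, dict):
        return int(tree["hashed"])
    return sum(count_hashes(child) for child in tree[1:])


print(count_hashes(parse("[Text=spring & #Accent=S]")))   # 1
print(count_hashes(parse("[#Text=spring & #Accent=S]")))  # 2, which EQL2 rejects
```

Operators become the first element of a tuple and label conditions become dictionaries, which keeps the tree easy to inspect. The sketch is deliberately more permissive than the EBNF in places, for example in how many brackets it accepts around a conjunction.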
Here the arguments are joined with -> (followed by).

Semantics: unit a on level L precedes unit b on level L. Condition: the arguments must come from the same level or from parallel levels.

A sequence of m I segments on the Phonetic level:

[#Phonetic = m -> #Phonetic = I]

[Phonetic = m -> Phonetic = I]

(The segment list has the start time of m and the end time of I.)

As above, but here we want only the m segment (start time of m, end time of m):

[#Phonetic = m -> Phonetic = I]

As above, but here we want only the I segment:

[Phonetic = m -> # Phonetic = I]

2.2. Multiple sequences

This concerns a sequence of arguments a1, a2, a3, a4, … It must be bracketed like this:

[[[[a1-> a2] ->a3]->a4] ->a5]

All sequences of [m I n]:

[[Phonetic = m -> Phonetic = I] -> Phonetic = n]

All sequences of ‘john could lend’ (from the Text level):

[ [Text = john -> Text = could ] -> Text = lend]

‘the’, then any two words, then ‘managed’ (all from the Text level):

[ [ [Text = the -> Text != x ] -> Text !=x ] -> Text = managed]
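
The left-nested bracketing required for multiple sequences is easy to get wrong by hand. As a small illustration, the following Python sketch builds the nested query string from a flat list of terms. The helper name `nest_sequence` is hypothetical and not part of emuR.

```python
from functools import reduce


def nest_sequence(terms):
    """Join terms with '->' using the left-nested bracketing [[a1 -> a2] -> a3]."""
    if not terms:
        raise ValueError("at least one term is required")
    # reduce folds from the left, so each step wraps everything built so far.
    return reduce(lambda left, right: f"[{left} -> {right}]", terms)


print(nest_sequence(["Text = john", "Text = could", "Text = lend"]))
# [[Text = john -> Text = could] -> Text = lend]
```

A single term is returned unchanged, because a sequence needs at least two arguments before any brackets are required.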
TODO
The emuR package arranges bundles (utterances) in sessions. Converted legacy EMU databases have a single default session ‘0000’ that contains all bundles. The ‘utts’ column of every segment list is therefore prefixed with the session name, so its entries start with ‘0000:’, for example ‘0000:msajc003’.
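
When segment lists from both systems are compared, the session prefix has to be taken into account. The Python sketch below splits a ‘utts’ entry into session and bundle name. The function name `split_utt` is hypothetical, and mapping an unprefixed legacy name to the default session ‘0000’ is an assumption made for this illustration.

```python
def split_utt(utt, default_session="0000"):
    """Split 'session:bundle' into its parts; unprefixed names get the default session."""
    session, sep, bundle = utt.partition(":")
    if not sep:
        # No colon found: treat the whole string as a legacy bundle name (assumption).
        return default_session, utt
    return session, bundle


print(split_utt("0000:msajc003"))  # ('0000', 'msajc003')
print(split_utt("msajc094"))       # ('0000', 'msajc094')
```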
> emu.query('andosl','*','[Text=spring & #Accent=S]')
moving data from Tcl to R
Read 1 records
segment list from database: andosl
query was: [Text=spring & #Accent=S]
labels start end utts
1 spring 2288.959 2704.466 msajc094
> emu.query('andosl','*','[#Text=spring & #Accent=S]')
moving data from Tcl to R
Read 1 records
segment list from database: andosl
query was: [#Text=spring & #Accent=S]
labels start end utts
1 spring 2288.959 2704.466 msajc094
The hash character has no effect in either query.
> query(andosl,"[Text=spring & #Accent=S]",resultType='emusegs')
segment list from database: andosl
query was: [Text=spring & #Accent=S]
labels start end utts
1 S 2288.975 2704.475 0000:msajc094
emuR returns the same segment (same item), but with the label of the hashed attribute.
> query(andosl,"[#Text=spring & #Accent=S]",resultType='emusegs')
Error in query.database.eql.KONJA(database, qTrim) :
Only one hashtag allowed in linear query term: #Text=spring & #Accent=S
EQL2 throws an error here: to fulfil the request, each item would have to be returned twice, once with its Text label and once with its Accent label.
> emu.query('ae','*',"[Text!=beautiful|futile ^ Phoneme=u:]")
moving data from Tcl to R
Read 4 records
segment list from database: ae
query was: [Text!=beautiful|futile ^ Phoneme=u:]
labels start end utts
1 new 475.802 666.743 msajc057
2 futile 571.999 1091.000 msajc010
3 to 1091.000 1222.389 msajc010
4 beautiful 2033.739 2604.489 msajc003
I assume that the legacy query engine ignores the OR operator ‘|’ when it is combined with the not-equal operator ‘!=’.
> query(ae,"[Text!=beautiful|futile ^ Phoneme=u:]",resultType='emusegs')
segment list from database: ae
query was: [Text!=beautiful|futile ^ Phoneme=u:]
labels start end utts
1 to 1091.025 1222.375 0000:msajc010
2 new 475.825 666.725 0000:msajc057
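
The emuR result is consistent with reading ‘!=’ followed by alternatives as "matches none of the alternatives". The Python sketch below spells out that reading, using the four labels from the legacy output as candidates. The function name `label_matches` is hypothetical, and the code only illustrates the semantics; it is not how emuR evaluates queries.

```python
def label_matches(label, op, alternatives):
    """Evaluate one label condition against a list of '|'-separated alternatives."""
    if op in ("=", "=="):
        return label in alternatives
    if op == "!=":
        return label not in alternatives
    raise ValueError(f"unsupported operator: {op}")


# The four labels listed in the legacy result for this query.
candidates = ["new", "futile", "to", "beautiful"]
kept = [w for w in candidates if label_matches(w, "!=", ["beautiful", "futile"])]
print(kept)  # ['new', 'to']
```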
> emu.query("andosl","*","[[Syllable=W -> Syllable=W] ^ [Phoneme=n-> Phoneme=S]]")
*** stack smashing detected ***: /usr/lib/R/bin/exec/R terminated
*** caught segfault ***
address 0x726f66ff, cause 'memory not mapped'
> emu.query("andosl","*","[[Syllable=W -> Syllable=W] ^ [Phoneme=n->Phoneme=S]]")
Error in structure(.External("dotTcl", ..., PACKAGE = "tcltk"), class = "tclObj") :
[tcl] Error in constructor: error compiling query: Expected closing bracket.
The errors are caused by missing blanks around the ‘->’ operator. This query string works well: ‘[[Syllable=W -> Syllable=W] ^ [Phoneme=n -> Phoneme=S]]’
emuR accepts these queries without blanks.
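
If the same query string has to work in both systems, the blanks around ‘->’ can be normalised before the query is sent. A minimal Python sketch, assuming that ‘->’ never occurs inside a quoted label (the function name `pad_arrows` is hypothetical):

```python
import re


def pad_arrows(query):
    """Put exactly one blank on each side of every '->' operator."""
    return re.sub(r"\s*->\s*", " -> ", query)


print(pad_arrows("[[Syllable=W -> Syllable=W] ^ [Phoneme=n->Phoneme=S]]"))
# [[Syllable=W -> Syllable=W] ^ [Phoneme=n -> Phoneme=S]]
```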
emuR also accepts the double equals sign ‘==’ as the equality operator.
emuR EQL2 can query labels by matching regular expressions with the ‘=~’ (match) and ‘!~’ (no match) operators.

#### Example
> query(andosl,"Text=~.*tz.*",resultType='emusegs')
segment list from database: andosl
query was: Text=~.*tz.*
labels start end utts
1 blitzed 1586.875 2112.475 0000:msadb081
2 blitzed 1540.225 2022.475 0000:msajc081
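
The pattern ‘.*tz.*’ in the example suggests that the regular expression has to match the whole label, not just a part of it. The Python sketch below illustrates ‘=~’ and ‘!~’ under that assumption. The function name `regex_filter` and the label list are made up for the illustration, and Python's `re` syntax may differ in detail from the regular expression engine emuR uses.

```python
import re


def regex_filter(labels, pattern, negate=False):
    """Keep labels that fully match the pattern ('=~'), or that do not ('!~')."""
    rx = re.compile(pattern)
    return [label for label in labels if (rx.fullmatch(label) is None) == negate]


labels = ["blitzed", "spring", "new"]
print(regex_filter(labels, ".*tz.*"))               # ['blitzed']
print(regex_filter(labels, ".*tz.*", negate=True))  # ['spring', 'new']
```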
John, Tina. 2012. “Emu Speech Database System.” PhD thesis, LMU-Munich.