This vignette is considered deprecated! Its content has been moved to the EMU-SDMS manual (expanded and updated). Specifically, see the chapter "The emuDB Format".
This document describes the emuDB format that is used by the emuR package and shows how to create and interact with this format. The emuDB format is meant as a simple, general-purpose way of storing speech databases that may contain complex, rich, hierarchical annotations as well as derived and complementary speech data. These different components are described throughout this document, with examples of how to generate and manipulate them. This document is meant as a practical guide and reference for the emuDB format. The examples given below can be executed in any R session with the emuR package installed and may of course be adapted to your personal needs.
First let us have a look at the general structure of an emuDB. Whenever we use a name like _XXX in the following, we imply a varying prefix name (or base name) before the _ while the XXX is an obligatory string, e.g. _bndl implies names such as rec1_bndl, rec2_bndl of the type bundle folder. The extension .json denotes a text file in JSON format.
The database structure is basically a set of files and folders that adhere to a certain structure and naming convention (see Figure below).
emuDB file & folder structure
The database root directory must contain a single _DBconfig.json file which, as the name implies, contains the configuration options of the database such as its level definitions, how these levels are linked in the database hierarchy and what is displayed in the EMU-webApp. The database root folder also contains arbitrarily named session folders ending with _ses, e.g. 0000_ses. These session folders can be used to logically group the recordings of a database. All files belonging to a single recording are contained in a so-called bundle folder described below. A possible grouping into sessions could for instance be that all recordings of a speaker AAA are contained in one session called AAA_ses.
Each session folder can contain any number of _bndl folders, e.g. rec1_bndl rec2_bndl ... rec9_bndl. All the files belonging to a recording, i.e. all files describing the same time line of events, are stored in the corresponding bundle folder. This must include the actual recording (.wav) and can contain optional derived / complementary signal files in the SSFF format such as formants (.fms) or the fundamental frequency (.f0), both of which can be generated using the wrassp package. Each bundle folder must also contain the annotation file (_annot.json) of that bundle. This file contains the actual annotations including the hierarchical linking information. JSON schema files are provided to ensure the syntactic integrity of the database (see the dist/schemaFiles/ directory of the EMU-webApp GitHub repository). The following restrictions apply:
The database root directory must contain exactly one _DBconfig.json file. It is obligatory that the prefix of the _DBconfig.json file matches the value of the field name within the _DBconfig.json file (which specifies the official name of the emuDB), and that the root folder of the emuDB has the same prefix name as well. It is recommended, but not obligatory, that the root folder has the suffix _emuDB.
Session folder names must end with the suffix _ses. Their prefixes can be chosen by the database maintainer.
Bundle folder names must end with the suffix _bndl. Their prefixes can be chosen by the database maintainer; prefixes must be unique within a session but not across sessions.
Signal files (media files as well as SSFF files) must have the same prefix as the _bndl folder prefix, e.g. the signal file in bundle rec1_bndl must have the name rec1.wav to be recognized by the emuDB system.
The annotation file of a bundle called basename_bndl must have the same prefix as its bundle: basename_annot.json.
Files that do not follow this naming convention will simply be ignored by the database interaction functions of the emuR package (for instance additional audio channels stored in individual audio files).
Optional files that may also be included in the database root directory are the _bundleList.json files. These files specify which annotator is assigned to which bundles. They are used by EMU-websocket-protocol servers that implement user management to assign the correct bundles to the annotators. The serve() function implemented in the emuR package does not support user management, which means that these files will simply be ignored by this function.
For more detailed information about the file formats used see the File descriptions section of this document. Let us now have a look at creating a new emuDB.
There are multiple ways of creating emuDBs. The two main strategies are to either convert existing databases or file collections to the new format or to create new databases from scratch. Refer to the emuR_intro vignette (command: vignette("emuR_intro", package="emuR")) on how existing databases can be converted; in the following, the latter strategy is described.
To create an emuDB from scratch simply call:
# load the package
library(emuR)
create_emuDB(name = 'fromScratchDB',
             targetDir = tempdir(),
             verbose = FALSE)
This will create an empty emuDB that has no ssffTrackDefinitions or levelDefinitions and contains no sessions or bundles. Adding these to the emuDB is described in the next section.
The initial step in manipulating or generally interacting with a database is to load it into your current R session.
# generate path to the empty fromScratchDB created above
dbPath = file.path(tempdir(), 'fromScratchDB_emuDB')
# load database
dbHandle = load_emuDB(dbPath, verbose = FALSE)
print(dbHandle)
## [1] "<emuDBhandle> (dbName = 'fromScratchDB', basePath = '/private/var/folders/yk/8z9tn7kx6hbcg_9n4c1sld980000gn/T/RtmpUNaIbs/fromScratchDB_emuDB')"
This will load the database into its cached form for quick access to the data. Note that if a large emuDB has never been loaded and no cache has previously been generated, this can take a while to complete. Once a cache is present, only altered annotation files have to be updated, which reduces load times dramatically. As you can see, the load_emuDB() function returns a database handle. This emuDBhandle is used to reference the loaded database in most database interaction functions of the emuR package.
Next, let us look at some actual database manipulation functions. The general prefix naming convention of the database manipulation functions for loaded databases is:
add_XXX: add a new instance of XXX / set_XXX: set the current instance of XXX
list_XXX: list the current instances of XXX / get_XXX: get the current instance of XXX
remove_XXX: remove existing instances of XXX
Unlike other systems the EMU Speech Database Management System requires the user to formally define the structure of the database. An essential structural element of any emuDB are its levels. A level is a more general term for what is often referred to as a “tier”. It is more general in the sense that people usually expect tiers to contain time information. Levels can either contain time information if they are of the type “EVENT” or of the type “SEGMENT” but are timeless if they are of the type “ITEM”. Generally speaking, every unit of annotation is referred to as an “ITEM” in the context of an emuDB and “EVENT”s and “SEGMENT”s are special instances of these containing time information in the form of sample values.
The EMU system generally distinguishes between the actual representations of a structural element which are contained within the database and their formal definitions. An example of an actual representation would be a level contained in an annotation file that contains “SEGMENT”s that annotate a recording. The corresponding formal definition would be this level’s level definition, which specifies and validates the level’s existence within the database.
NOTE: if instances are mentioned in the course of this document, the actual representations are meant. Formal definitions are referred to as such.
As the already loaded ‘fromScratchDB’ does not contain any formal definitions of structural elements including levels we will begin by adding such a formal definition in the form of a new level definition:
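The command for this step does not appear in the text, so the following is a minimal sketch of it. It assumes that the emuR package is attached and that dbHandle is the handle returned by load_emuDB() above; the level name and type are taken from the listing that follows.

```r
# assumes library(emuR) is attached and dbHandle comes from load_emuDB() above
# add a level definition called 'Phonetic' whose items carry time information
add_levelDefinition(dbHandle,
                    name = 'Phonetic',
                    type = 'SEGMENT')
```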
To check if this action was successful we can simply list the current level definitions by calling:
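A sketch of the listing call, again assuming the dbHandle loaded above; list_levelDefinitions() is the same function that is used later in this document.

```r
# assumes library(emuR) is attached and dbHandle comes from load_emuDB() above
# list the level definitions of the loaded emuDB
list_levelDefinitions(dbHandle)
```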
## name type nrOfAttrDefs attrDefNames
## 1 Phonetic SEGMENT 1 Phonetic;
Alternatively, a summary of the emuDB gives us this as well as additional information:
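The output below looks like the result of calling the generic summary() function on the database handle; a minimal sketch, assuming the dbHandle loaded above:

```r
# assumes library(emuR) is attached and dbHandle comes from load_emuDB() above
# print a summary of the emuDB and its configuration
summary(dbHandle)
```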
## Name: fromScratchDB
## UUID: 7e4e80ec-092e-4785-97ce-3226cfc8a361
## Directory: /private/var/folders/yk/8z9tn7kx6hbcg_9n4c1sld980000gn/T/RtmpUNaIbs/fromScratchDB_emuDB
## Session count: 0
## Bundle count: 0
## Annotation item count: 0
## Label count: 0
## Link count: 0
##
## Database configuration:
##
## SSFF track definitions:
## NULL
##
## Level definitions:
## name type nrOfAttrDefs attrDefNames
## 1 Phonetic SEGMENT 1 Phonetic;
##
## Link definitions:
## NULL
Let us add a further level definition that will contain the orthographic word transcriptions for the words uttered in our recordings. This level will be of the type “ITEM” meaning that elements contained within the level are sequentially ordered but do not contain any time information:
# add
add_levelDefinition(dbHandle,
name = 'Word',
type = 'ITEM')
# list
list_levelDefinitions(dbHandle)
## name type nrOfAttrDefs attrDefNames
## 1 Phonetic SEGMENT 1 Phonetic;
## 2 Word ITEM 1 Word;
Finally, we could remove one of the level definitions with the function remove_levelDefinition(), which we will not invoke here as we still wish to use these level definitions.
NOTE: If there are actual instances of annotation items (“SEGMENT”s, “EVENT”s or “ITEM”s) present in the emuDB it will not be possible to remove the level definition. These items would have to be removed first.
Each level definition can contain multiple attributes, the most common and currently only supported attribute being a label ("type": "STRING"). Thus it is possible to have multiple parallel labels in a single level. This means that a single annotation item instance can contain multiple labels while sharing other properties such as the start and duration information. This can be quite useful when modeling certain types of data. An illustrative example of this would be the ‘Phonetic’ level created above. It is often the case that databases contain both the phonetic transcript using IPA UTF-8 symbols as well as using the Speech Assessment Methods Phonetic Alphabet (SAMPA). This is a perfect choice for using multiple attribute definitions within a single level:
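The listing below corresponds to the attribute definitions of the ‘Phonetic’ level. A sketch of the call that produces such a listing, assuming the dbHandle loaded above:

```r
# assumes library(emuR) is attached and dbHandle comes from load_emuDB() above
# list the attribute definitions of the 'Phonetic' level
list_attributeDefinitions(dbHandle,
                          levelName = 'Phonetic')
```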
## name level type hasLabelGroups hasLegalLabels
## 1 Phonetic Phonetic STRING FALSE FALSE
Even though we have not added a single attribute definition to the ‘Phonetic’ level definition, it already contains the obligatory attribute definition that has the same name as its level. This indicates that it is the primary attribute of that level. To follow the above example let us now add a further attribute definition to the level definition that will contain the SAMPA versions of our annotations.
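A minimal sketch of this step, assuming the dbHandle loaded above. The attribute name 'SAMPA' is taken from the listing that follows; the attribute type is left at its default, "STRING".

```r
# assumes library(emuR) is attached and dbHandle comes from load_emuDB() above
# add a second attribute definition to the 'Phonetic' level
add_attributeDefinition(dbHandle,
                        levelName = 'Phonetic',
                        name = 'SAMPA')
# list the attribute definitions again to check the result
list_attributeDefinitions(dbHandle,
                          levelName = 'Phonetic')
```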
## NULL
## name level type hasLabelGroups hasLegalLabels
## 1 Phonetic Phonetic STRING FALSE FALSE
## 2 SAMPA Phonetic STRING FALSE FALSE
As you might have guessed from the columns hasLabelGroups and hasLegalLabels in the return value of the list_attributeDefinitions() function, attribute definitions can also contain two further fields. The legalLabels field contains an array of strings that specifies the labels that are legal (i.e. allowed / valid) for the given attribute. As the EMU-webApp won’t allow the annotator to enter any labels that are not specified in this array, this is a simple way of ensuring that a level has a consistent label set.
For example, let’s say we wish to annotate only the following vowels in the Phonetic level: /i/, /iː/, /u/, /uː/, /ə/ which in SAMPA correspond to /i/, /i:/, /u/, /u:/, /@/. Let us now add these as legal labels to the Phonetic as well as the SAMPA attribute of the Phonetic level.
ipaVowels = c('i', 'iː', 'u', 'uː', 'ə')
sampaVowels = c('i', 'i:', 'u', 'u:', '@')
# set legalLabels values for phonetic attributeDefinition
set_legalLabels(dbHandle,
levelName = 'Phonetic',
attributeDefinitionName = 'Phonetic',
legalLabels = ipaVowels)
# get
get_legalLabels(dbHandle,
levelName = 'Phonetic',
attributeDefinitionName = 'Phonetic')
## [1] "i" "iː" "u" "uː" "ə"
# set legalLabels values for SAMPA attributeDefinition
set_legalLabels(dbHandle,
levelName = 'Phonetic',
attributeDefinitionName = 'SAMPA',
legalLabels = sampaVowels)
# get
get_legalLabels(dbHandle,
levelName = 'Phonetic',
attributeDefinitionName = 'SAMPA')
## [1] "i" "i:" "u" "u:" "@"
NOTE: The legalLabels as well as the labelGroups field described below are optional. If not set the attribute definition can contain any label and no labelGroups may be referenced in the query string.
A further optional field is the labelGroups field. It contains specifications of groups of labels that can be referenced by a name given to the group while querying the emuDB. Say we wish to reference all the long vowels in our Phonetic attribute definition with the name ‘long’ and all our short vowels with the name ‘short’. Let us now update our emuDB to contain these label groups:
# add long vowels
add_attrDefLabelGroup(dbHandle,
levelName = 'Phonetic',
attributeDefinitionName = 'Phonetic',
labelGroupName = 'long',
labelGroupValues = c('iː', 'uː'))
# add short vowels
add_attrDefLabelGroup(dbHandle,
levelName = 'Phonetic',
attributeDefinitionName = 'Phonetic',
labelGroupName = 'short',
labelGroupValues = c('i', 'u', 'ə'))
# list
list_attrDefLabelGroups(dbHandle,
levelName = 'Phonetic',
attributeDefinitionName = 'Phonetic')
## name values
## 1 long iː; uː
## 2 short i; u; ə
NOTE: It is also possible to define label groups for the entire DB. For more information on this see the R documentation for the add/list/remove_labelGroups functions.
INFO: For users who are familiar with or transitioning from the legacy EMU system, the label groups correspond to the unfavorably named ‘Legal Labels’ entries of the GTemplate Editor (i.e. legal entries in the .tpl file) of the legacy system. In the new system the legalLabels entries specify the legal / allowed label values of an attribute definition, while the label groups specify groups of labels that can be referenced by the names given to the groups while performing queries.
An essential and very powerful conceptual and structural element of any emuDB is its hierarchy. Using hierarchical structures is highly recommended but not a must. Hierarchical annotations allow for complex, rich data modeling and are often cleaner representations of the annotations at hand. The permitted hierarchical relationships in an emuDB are expressed through link definitions between level definitions. There are three types of valid links: ONE_TO_MANY, MANY_TO_MANY and ONE_TO_ONE.
As the names imply, these links specify the permitted relationships between instances of annotation items of one level and those of another. The structure of the hierarchy of the ‘ae’ demo database that comes with the emuR package can be seen below. It is a reasonably complex hierarchy that shows how hierarchical annotation structures can be used to accurately model data. “ITEM” levels that do not contain time information inherit their time information from the levels they are linked to. This de-referencing of time information is provided by the querying mechanism of the emuR package.
Hierarchical structure of ‘ae’ emuDB
Let us now add a link definition to link the ‘Phonetic’ level to the ‘Word’ level created above:
# add
add_linkDefinition(dbHandle,
type = 'ONE_TO_MANY',
superlevelName = 'Word',
sublevelName = 'Phonetic')
The simple hierarchical structure of our ‘fromScratchDB’ now looks like this:
Hierarchical structure of ‘fromScratchDB’ emuDB
Up until now we have defined the structure of our database. An essential part that is missing is of course the recordings that we wish to analyze. To import audio files, referred to as media files in the context of an emuDB, into the database one simply has to do the following:
# get path to folder containing wav files
# (in this case wav files that come with the wrassp package)
fp = system.file('extdata', package='wrassp')
# import media files into emuDB session called filesFromWrassp
import_mediaFiles(dbHandle,
                  dir = fp,
                  targetSessionName = 'filesFromWrassp',
                  verbose = FALSE)
# list session
list_sessions(dbHandle)
## name
## 1 filesFromWrassp
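The second listing, which shows one row per bundle, corresponds to the output of list_bundles(). A sketch of that call, assuming the dbHandle used above:

```r
# assumes library(emuR) is attached and the media files were imported as above
# list the bundles of all sessions
list_bundles(dbHandle)
```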
## session name
## 1 filesFromWrassp lbo001
## 2 filesFromWrassp lbo002
## 3 filesFromWrassp lbo003
## 4 filesFromWrassp lbo004
## 5 filesFromWrassp lbo005
## 6 filesFromWrassp lbo006
## 7 filesFromWrassp lbo007
## 8 filesFromWrassp lbo008
## 9 filesFromWrassp lbo009
We have now added a new session called ‘filesFromWrassp’ to the ‘fromScratchDB’ containing a new bundle for each of our imported media files. These bundles adhere to the structure we have specified above. Note however that the levels in the annotation files (_annot.json) that were created during the import are still empty. These will have to be created manually at a later stage using the EMU-webApp. To list the files that are part of the emuDB call:
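A sketch of the file listing, assuming the dbHandle used above. Wrapping the call in head() is an assumption based on the six rows shown in the output.

```r
# assumes library(emuR) is attached and the media files were imported as above
# show the first rows of the file listing of the emuDB
head(list_files(dbHandle))
```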
## # A tibble: 6 x 4
## session bundle file absolute_file_path
## <chr> <chr> <chr> <chr>
## 1 filesFromW… lbo001 lbo001.wav /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 2 filesFromW… lbo001 lbo001_ann… /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 3 filesFromW… lbo002 lbo002.wav /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 4 filesFromW… lbo002 lbo002_ann… /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 5 filesFromW… lbo003 lbo003.wav /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 6 filesFromW… lbo003 lbo003_ann… /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
The emuR package also provides a mechanism for adding files to preexisting bundle folders as this can be quite tedious to perform manually due to the nested folder structure of an emuDB. Let us create a set of files that contain the zero-crossing-rate values of the wav files we added above and, for the sake of demonstration, save them to a different location to then re-add them to the database.
# list all wav files in new emuDB
wavFilePaths = list.files(dbPath,
                          pattern = "wav$",
                          full.names = TRUE,
                          recursive = TRUE)
# create folder to store zcr values in
outDirPath = file.path(tempdir(), 'zcranaVals')
dir.create(outDirPath)
# calculate zero-crossing-rate files
# using zcrana function of wrassp package
library(wrassp)
zcrana(listOfFiles = wavFilePaths,
outputDirectory = outDirPath)
# add zcr files to emuDB
add_files(dbHandle,
dir = outDirPath,
fileExtension = 'zcr',
targetSessionName = 'filesFromWrassp')
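To check that the .zcr files were added, the file listing can be repeated. This sketch assumes the dbHandle used above; the use of head() is again an assumption based on the six rows of output.

```r
# assumes library(emuR) is attached and add_files() was called as above
# the listing should now contain a .zcr file for each bundle
head(list_files(dbHandle))
```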
## # A tibble: 6 x 4
## session bundle file absolute_file_path
## <chr> <chr> <chr> <chr>
## 1 filesFromW… lbo001 lbo001.wav /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 2 filesFromW… lbo001 lbo001.zcr /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 3 filesFromW… lbo001 lbo001_ann… /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 4 filesFromW… lbo002 lbo002.wav /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 5 filesFromW… lbo002 lbo002.zcr /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 6 filesFromW… lbo002 lbo002_ann… /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
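A further important structural element of any emuDB are the so-called ssffTracks (often simply referred to as tracks). These ssffTracks reference data that is stored in the Simple Signal File Format (SSFF) in the corresponding _bndl folders. The two main types of data are derived data (e.g. formants or the fundamental frequency calculated from the recording) and complementary data (e.g. articulatory data such as EPG recorded alongside the audio).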
Let us now add an ssffTrackDefinition to our database and calculate the SSFF files at the same time:
# add track and calculate SSFF files by specifying
# one of the signal processing functions the wrassp package provides
# (in this case the forest (formant estimation) function)
add_ssffTrackDefinition(dbHandle,
name = 'formantValues',
columnName = 'fm',
fileExtension = 'fms',
onTheFlyFunctionName = 'forest')
# list
list_ssffTrackDefinitions(dbHandle)
# show head of list_files to check if files were added
head(list_files(dbHandle))
INFO: to see the fileExtension and columnName defaults produced by the various signal processing functions of the wrassp package see ?wrasspOutputInfos. For a list of all the available signal processing functions that the wrassp package provides see ?wrassp.
As you might have noticed, the .zcr files we added in the previous section are listed as being part of the bundles but have no ssffTrackDefinition associated with them. Let’s fix that by adding another ssffTrackDefinition to the database:
# add
add_ssffTrackDefinition(dbHandle,
name = 'zeroCrossing',
columnName = 'zcr',
fileExtension = 'zcr')
# list
list_ssffTrackDefinitions(dbHandle)
## name columnName fileExtension
## 1 formantValues fm fms
## 2 zeroCrossing zcr zcr
INFO: as the get_trackdata() function can perform signal processing functions and calculates all necessary values in real time, it is seldom necessary to define ssffTracks for tracks produced by the wrassp package. For complementary data as well as data that has to be manipulated manually (e.g. manual formant corrections) this is still a feasible and necessary option. Also, if you wish to display SSFF data in the EMU-webApp it is necessary to pre-calculate the ssffTracks as the web application cannot perform real-time calculations.
Note also that there are currently two special ssffTrackDefinitions. They are special in the sense that if they have either the name “FORMANTS” or the name “EPG” the EMU-webApp will expect the corresponding SSFF files to be formatted in a specific way and will also display them differently to the other tracks. If the track is named “FORMANTS” and this track is assigned to be overlaid on the spectrogram, the EMU-webApp will frequency-align the formant contours to the spectrogram and will permit these contours to be manually corrected. If the track is called “EPG” and the EMU-webApp is configured to display this track in the twoDimCanvases, it will display an EPG plot of the data (see the File descriptions section of this document for more information on twoDimCanvases).
Before we can start manually annotating our speech database we have to configure our ‘fromScratchDB’ to contain information about how the database is to be displayed by the EMU-webApp. The EMU-webApp subdivides different ways to look at an emuDB into so called perspectives. These perspectives, between which you can switch in the web application, contain information on what levels are displayed, which ssffTracks are drawn, and so on. Let us list the current perspectives of our database:
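A sketch of the listing call, assuming the dbHandle used above; the table that follows has the shape of the value returned by list_perspectives().

```r
# assumes library(emuR) is attached and dbHandle refers to 'fromScratchDB'
# list the perspectives the EMU-webApp will offer for this emuDB
list_perspectives(dbHandle)
```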
## name signalCanvasesOrder levelCanvasesOrder
## 1 default OSCI; SPEC
As you can see, there is already a perspective available called ‘default’. This perspective was automatically added to the emuDB during the import of our mediaFiles. It currently only displays the oscillogram (“OSCI”) followed by the spectrogram (“SPEC”). “OSCI” and “SPEC” can be viewed as predefined tracks that are always available to the EMU-webApp. Using the add/remove_perspective() functions we could now add and remove as many additional perspectives to the database as we like. For now we will maintain the ‘default’ perspective and add the order in which we would like to display our levels.
# get order array of levels of default perspective
get_levelCanvasesOrder(dbHandle,
perspectiveName = 'default')
## NULL
# set order array of levels of default perspective
set_levelCanvasesOrder(dbHandle,
perspectiveName = 'default',
order = c('Phonetic'))
# get order array of levels of default perspective
get_levelCanvasesOrder(dbHandle,
perspectiveName = 'default')
## [1] "Phonetic"
As you can see we only added the “Phonetic” and not the “Word” level to be displayed in the “default” perspective as only levels of the type “SEGMENT” or “EVENT” are allowed to be displayed. All “ITEM” levels can be viewed by clicking the “showHierarchy” button in the top menu bar of the EMU-webApp and choosing an appropriate path through the hierarchy.
As the final configuration step let us also add the ssffTracks we defined and calculated above to the “default” perspective:
# get order array of signals of default perspective
get_signalCanvasesOrder(dbHandle,
perspectiveName = 'default')
## [1] "OSCI" "SPEC"
# set order array of signals of default perspective
set_signalCanvasesOrder(dbHandle,
perspectiveName = 'default',
order = c("OSCI", "SPEC", "formantValues", "zeroCrossing"))
# get order array of signals of default perspective
get_signalCanvasesOrder(dbHandle,
perspectiveName = 'default')
## [1] "OSCI" "SPEC" "formantValues" "zeroCrossing"
We have now completed the configuration of the ‘fromScratchDB’ emuDB. By calling the function serve(dbHandle) we can now start a server in our R session and connect the EMU-webApp to our database to visualize and annotate the emuDB.
INFO: the EMU-webApp is highly configurable and only a small subset of the configuration options are available through the emuR package. More complex visualization configurations can be achieved by manually editing the _DBconfig.json file and reloading the database. For a comprehensive list of all the available fields in the _DBconfig.json and their meanings see the File descriptions section of this document.
Autobuilding is a process that lets the emuDB maintainer semi-automatically build hierarchical structures from preexisting annotations by linking annotational units together. To have some preexisting annotations to play with, let us convert a TextGridCollection and load the newly created emuDB into the current R session.
# create demo data in folder provided by the tempdir() function
create_emuRdemoData(dir = tempdir())
# get the path to a folder containing .wav & .TextGrid files that is part of the demo data
path2folder = file.path(tempdir(), "emuR_demoData", "TextGrid_collection")
# convert this TextGridCollection to the emuDB format
convert_TextGridCollection(path2folder, dbName = "myTGcolDB",
                           targetDir = tempdir(), verbose = FALSE)
# load database
dbHandle = load_emuDB(file.path(tempdir(), "myTGcolDB_emuDB"), verbose = FALSE)
By inspecting the emuDB we can see that it has eleven levelDefinitions but no linkDefinitions. This means that it will not be possible to perform hierarchical queries on this emuDB, as there is no explicit hierarchical information in the database.
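A sketch of the inspection step, assuming dbHandle now refers to the converted ‘myTGcolDB’. The two calls correspond to the level listing and the NULL (no link definitions) shown below.

```r
# assumes library(emuR) is attached and dbHandle refers to 'myTGcolDB'
# list the level definitions created by the conversion
list_levelDefinitions(dbHandle)
# list the link definitions (none exist yet)
list_linkDefinitions(dbHandle)
```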
## name type nrOfAttrDefs attrDefNames
## 1 Utterance SEGMENT 1 Utterance;
## 2 Intonational SEGMENT 1 Intonational;
## 3 Intermediate SEGMENT 1 Intermediate;
## 4 Word SEGMENT 1 Word;
## 5 Accent SEGMENT 1 Accent;
## 6 Text SEGMENT 1 Text;
## 7 Syllable SEGMENT 1 Syllable;
## 8 Phoneme SEGMENT 1 Phoneme;
## 9 Phonetic SEGMENT 1 Phonetic;
## 10 Tone EVENT 1 Tone;
## 11 Foot SEGMENT 1 Foot;
## NULL
As it is a very laborious task to manually link ITEMs together using the EMU-webApp and the hierarchical information is already implicitly contained in the time information of the SEGMENTs and EVENTs of each level (see figure below), the emuR package provides a function to build these hierarchical structures from this information.
Example annotation structure after convert_TextGridCollection()
For the sake of brevity let’s focus on just three of the eleven levels. We will use the autobuild_linkFromTimes() function to build the following hierarchical structure:
Hierarchical structure to be produced by autobuild_linkFromTimes()
The convertSuperlevel argument of the autobuild_linkFromTimes() function, which we will set to TRUE in the example below, tells the function to convert the super level to a level of type ITEM. As this is a risky procedure, because all the time information will be deleted from the “Syllable” level, the function automatically creates a backup of the level called “Syllable-autobuildBackup”. Before we can invoke the autobuild function we must, however, first add a linkDefinition to our emuDB that specifies the type of relationship that our levels have:
# add linkDefinition
add_linkDefinition(dbHandle, type = "ONE_TO_MANY",
superlevelName = "Syllable",
sublevelName = "Phoneme")
# list
list_linkDefinitions(dbHandle)
## type superlevelName sublevelName
## 1 ONE_TO_MANY Syllable Phoneme
# invoke autobuild function
autobuild_linkFromTimes(dbHandle,
superlevelName = "Syllable",
sublevelName = "Phoneme",
convertSuperlevel = TRUE,
verbose = FALSE)
# list
list_levelDefinitions(dbHandle)
## name type nrOfAttrDefs attrDefNames
## 1 Utterance SEGMENT 1 Utterance;
## 2 Intonational SEGMENT 1 Intonational;
## 3 Intermediate SEGMENT 1 Intermediate;
## 4 Word SEGMENT 1 Word;
## 5 Accent SEGMENT 1 Accent;
## 6 Text SEGMENT 1 Text;
## 7 Syllable ITEM 1 Syllable;
## 8 Phoneme SEGMENT 1 Phoneme;
## 9 Phonetic SEGMENT 1 Phonetic;
## 10 Tone EVENT 1 Tone;
## 11 Foot SEGMENT 1 Foot;
## 12 Syllable-autobuildBackup SEGMENT 1 Syllable-autobuildBackup;
As we can see, we have now converted the original “Syllable” level to the type ITEM and the backup level was added to the emuDB. Let us now perform the same procedure for the “Phoneme” and “Phonetic” levels:
# add linkDefinition
add_linkDefinition(dbHandle, type = "MANY_TO_MANY",
superlevelName = "Phoneme",
sublevelName = "Phonetic")
# list
list_linkDefinitions(dbHandle)
## type superlevelName sublevelName
## 1 ONE_TO_MANY Syllable Phoneme
## 2 MANY_TO_MANY Phoneme Phonetic
# invoke autobuild function
autobuild_linkFromTimes(dbHandle,
superlevelName = "Phoneme",
sublevelName = "Phonetic",
convertSuperlevel = TRUE,
verbose = FALSE)
# list
list_levelDefinitions(dbHandle)
## name type nrOfAttrDefs attrDefNames
## 1 Utterance SEGMENT 1 Utterance;
## 2 Intonational SEGMENT 1 Intonational;
## 3 Intermediate SEGMENT 1 Intermediate;
## 4 Word SEGMENT 1 Word;
## 5 Accent SEGMENT 1 Accent;
## 6 Text SEGMENT 1 Text;
## 7 Syllable ITEM 1 Syllable;
## 8 Phoneme ITEM 1 Phoneme;
## 9 Phonetic SEGMENT 1 Phonetic;
## 10 Tone EVENT 1 Tone;
## 11 Foot SEGMENT 1 Foot;
## 12 Syllable-autobuildBackup SEGMENT 1 Syllable-autobuildBackup;
## 13 Phoneme-autobuildBackup SEGMENT 1 Phoneme-autobuildBackup;
This time we chose to add a linkDefinition of the type MANY_TO_MANY between the two levels. This is because reduction processes can cause multiple phonemes to be produced as a single phone, and insertion processes can cause a single phoneme to be produced as multiple phones. We have now created the hierarchical structure shown above that we were aiming for.
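With the link definitions and links in place, hierarchical queries become possible. The following is only a sketch, assuming the dbHandle of ‘myTGcolDB’ after both autobuild calls; the query string is an illustrative EMU Query Language expression (regular-expression match =~ and dominance operator ^) and is not taken from the original text.

```r
# assumes library(emuR) is attached and both autobuild calls above were run
# find all Syllable items that dominate at least one Phoneme item
sl = query(dbHandle, "[Syllable =~ .* ^ Phoneme =~ .*]")
head(sl)
```

As “Syllable” is now a timeless ITEM level, any time information in the result has to be derived from linked levels that carry time information.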
The DBconfig file, as mentioned above, contains the configuration options of the database. People familiar with the legacy EMU system will recognize this as the replacement file for the legacy template (.tpl) file. By convention, strings written entirely in capital letters indicate a constant that usually has a special meaning. This is also the case with such strings found in the DBconfig ("STRING", "ITEM", "SEGMENT", "EVENT", "OSCI", … ).
The _DBconfig.json
file contains the following fields:
"name"
specifying the name of the database
"UUID"
a unique ID given to each database
"mediafileExtension"
the main media file extension (currently only uncompressed mono 16-bit .wav files are supported in every component of the EMU system; this is also the recommended audio format for the EMU-SDMS)
"ssffTrackDefinitions"
an array of definitions defining the SSFF tracks of the database. Each ssffTrackDefinition consists of:
"name"
the name of the ssffTrackDefinition
"columnName"
the name of the column of the associated SSFF file. For more information on the columns that the various functions of the wrassp package produce, see the track fields of the wrasspOutputInfos object that is part of the wrassp package. Further, although the SSFF file format is a binary file format, it has a plain text header, which means that if you open an SSFF file in the text editor of your choice you will be able to see the columns contained within it. Another way of accessing column information about a specific SSFF file is to use the wrassp function res = read.AsspDataObj('/path/2/SSFF/file') to read the file from the file system. names(res) will then give you the names of the columns present in this file. NOTE: The SSFF file format uses the term column, whereas the EMU system uses the term track / ssffTrack. Both refer to the same data.
"fileExtension"
the file extension of the associated SSFF file (also see ?wrasspOutputInfos for the default extensions produced by the wrassp functions)
"levelDefinitions"
array of definitions defining the levels of the database. A level is a more general term for what is often referred to as a tier. It is more general in the sense that people quite often expect tiers to contain time information. Levels, however, can either contain time information if they are of the type "EVENT" or "SEGMENT", or be timeless if they are of the type "ITEM". Each levelDefinition consists of:
"name"
the name of the levelDefinition
"type"
specifying the type of the level (either "ITEM" | "EVENT" | "SEGMENT")
"attributeDefinitions"
an array of definitions defining the attributes of the level. Each attributeDefinition consists of:
"name"
the name of the attributeDefinition
"type"
specifying the type of the attribute (currently only "STRING" is permitted)
"labelGroups"
an (optional) array containing label group definitions. These can be used as a shorthand notation for querying certain groups of labels. Each label group consists of:
"name"
name of the label group. This will be the value used in a query to refer to this group.
"values"
array of strings representing the labels
"legalLabels"
(optional) array of strings specifying which labels are valid / legal for this attribute definition. The EMU-webApp adheres to this set of values and will not let the annotator enter any values other than the ones specified in this field. This can be used to ensure consistent label sets within levels.
"anagestConfig"
if specified (optional), this will convert the level into a special type of level for labeling articulatory data. This will also serve as a marker for the EMU-webApp to treat this level differently. This optional field may only be set for levels of the type "EVENT". It consists of:
"verticalPosSsffTrackName"
name of the ssffTrack containing the vertical position data
"velocitySsffTrackName"
name of the ssffTrack containing the velocity data
"autoLinkLevelName"
name of the level to which the created events will be linked
"multiplicationFactor"
factor to multiply with (either -1 | 1)
"threshold"
a value between 0 and 1 defining the threshold
"gestureOnOffsetLabels"
array containing two strings that specify the gesture on- and offset labels
"maxVelocityOnOffsetLabels"
array containing two strings that specify the maximum velocity on- and offset labels
"constrictionPlateauBeginEndLabels"
array containing two strings that specify the constriction plateau begin and end labels
"maxConstrictionLabel"
string specifying the maximum constriction label
"linkDefinitions"
an array of definitions defining the links between levels of the database. The combination of all link definitions specifies the hierarchy of the database. Each linkDefinition consists of:
"type"
specifying the type of link (either "ONE_TO_MANY" | "MANY_TO_MANY" | "ONE_TO_ONE")
"superlevelName"
specifies the name of the super-level
"sublevelName"
specifies the name of the sub-level
"labelGroups"
an (optional) array containing label group definitions. These can be used as a shorthand notation for querying certain groups of labels. Compared to the "labelGroups" that can be defined within an attributeDefinition, the labelGroups defined here are globally defined for the entire database. Each label group consists of:
"name"
name of the label group
"values"
array of strings containing labels
"EMUwebAppConfig"
specifies the configuration options intended for the EMU-webApp, i.e. how the database is to be displayed. This field can contain all the configuration options that are specified in the EMU-webApp’s configuration schema (see the dist/schemaFiles/emuwebappConfigSchema.json file of the EMU-webApp GitHub repository). The "EMUwebAppConfig" contains the following fields:
"main"
main behavior options
"autoConnect"
connect to the "serverUrl" on initial load of the webApp to automatically load a database (mainly used for development)
"serverUrl"
default server URL that is displayed in the connect modal (and used if "autoConnect" is set to true). The default, "ws://localhost:17890", points to the server started by the serve() function of the emuR package.
"serverTimeoutInterval"
the maximum amount of time (in milliseconds) that the EMU-webApp waits for the server to respond
"comMode"
communication mode that the EMU-webApp is in. Currently the only available option is "WS" (websocket).
"catchMouseForKeyBinding"
whether the mouse has to be inside the labeler for key bindings to work
"keyMappings"
keyboard shortcut definitions. For the sake of brevity not every key-code is shown (see the schema for the full list).
"toggleSideBarLeft"
integer value that represents the key-code that toggles the left side bar (== bundleList side bar)
"toggleSideBarRight"
integer value that represents the key-code that toggles the right side bar (== perspective side bar)
"spectrogramSettings"
specifies the default settings of the spectrogram. The possible settings are:
"windowSizeInSecs"
specifies the window size in seconds
"rangeFrom"
specifies the lowest frequency (in Hz) that will be displayed by the spectrogram
"rangeTo"
specifies the highest frequency (in Hz) that will be displayed by the spectrogram
"dynamicRange"
specifies the dynamic range for the maximum (in dB)
"window"
specifies the window type (either "BARTLETT" | "BARTLETTHANN" | "BLACKMAN" | "COSINE" | "GAUSS" | "HAMMING" | "HANN" | "LANCZOS" | "RECTANGULAR" | "TRIANGULAR")
"preEmphasisFilterFactor"
specifies the pre-emphasis factor (in the formula s’(k) = s(k) - preEmphasisFilterFactor * s(k-1))
"transparency"
specifies the transparency of the spectrogram (integer from 0 to 255)
"drawHeatMapColors"
(optional) whether the spectrogram should be drawn using heat-map colors (either true or false)
"heatMapColorAnchors"
(optional) specifies the heat-map color anchors (array of the form [[255, 0, 0], [0, 255, 0], [0, 0, 255]])
"perspectives"
array containing perspective configurations. Each "perspective" consists of:
"name"
name of the perspective
"signalCanvases"
configuration options for the signalCanvases
"order"
array specifying the order in which the ssffTracks are to be displayed. Note that the ssffTrack names “OSCI” and “SPEC” are always available in addition to the ssffTracks defined in the database.
"assign"
array of configuration options to assign one ssffTrack to another, effectively creating a visual overlay of one track over another. Each array element consists of:
"signalCanvasName"
name of the signal specified in the "order" array
"ssffTrackName"
name of the ssffTrack to overlay onto "signalCanvasName"
"minMaxValLims"
array of configuration options to limit the y-axis range that is displayed for a specified SSFF track. Each array element consists of:
"ssffTrackName"
name specifying which ssffTrack should be limited
"minVal"
minimum value which defines the lower y-axis limit
"maxVal"
maximum value which defines the upper y-axis limit
"contourLims"
array containing contour limit values that specify an index range that is to be displayed. As a track / column can contain multi-dimensional data (e.g. 4 formant values per time stamp / 256 DFT values per time stamp / …), it is possible to specify an index range that determines which values should be displayed (e.g. display formant 2 through 4). Each array element consists of:
"ssffTrackName"
name specifying which ssffTrack should be limited
"minContourIdx"
minimum contour index to display (starts at index 0)
"maxContourIdx"
maximum contour index to display
"contourColors"
array to specify the colors of individual contours. This overrides the default of automatically calculating distinct colors for each contour. Each array element consists of:
"ssffTrackName"
name of the ssffTrack for which the colors are defined
"colors"
array of rgb strings (e.g. ["rgb(238,130,238)", "rgb(127,255,212)"]) to specify the color of each contour (first value = first contour color and so on)
"levelCanvases"
configuration options for the levelCanvases
"order"
array specifying the order in which the levels are to be displayed. Note that only levels of the type “EVENT” or “SEGMENT” can be displayed as "levelCanvases".
"twoDimCanvases"
configuration options for the 2D canvas
"order"
array specifying the order in which the twoDimDrawingDefinitions are to be displayed. Note that currently only a single twoDimDrawingDefinition can be displayed, so this array may currently only contain a single element.
"twoDimDrawingDefinitions"
array containing two-dimensional drawing definitions. Each two-dimensional drawing definition consists of:
"dots"
array containing dot definitions. Each dot definition consists of:
"name"
name of the dot
"xSsffTrack"
ssffTrackName of the track that contains the x-axis values
"xContourNr"
contour number of the track that contains the x-axis values
"ySsffTrack"
ssffTrackName of the track that contains the y-axis values
"yContourNr"
contour number of the track that contains the y-axis values
"color"
rgb color string specifying the color given to the dot
"connectLines"
array specifying which of the dots specified in the "dots" definition array should be connected by a line
"fromDot"
dot from which the line should start
"toDot"
dot to which the line should go
"type"
rgb string defining the color of the line
"staticDots"
array containing static dot definitions
"name"
name of the static dots
"xNameCoordinate"
x coordinate specifying the location where the name should be drawn
"yNameCoordinate"
y coordinate specifying the location where the name should be drawn
"xCoordinates"
array of x coordinates (e.g. [300, 300, 900, 900, 300])
"yCoordinates"
array of y coordinates (e.g. [880, 2540, 2540, 880, 880])
"connect"
boolean value that specifies whether to connect the static dots with lines
"color"
rgb string specifying the color of the static dots
"staticContours"
array containing static contour definitions
"name"
name of the static contour
"xSsffTrack"
ssffTrackName of the track that contains the x-axis values
"xContourNr"
contour number of the track that contains the x-axis values
"ySsffTrack"
ssffTrackName of the track that contains the y-axis values
"yContourNr"
contour number of the track that contains the y-axis values
"connect"
boolean value that specifies whether to connect the static dots with lines
"color"
rgb string specifying the color of the static contour
"labelCanvasConfig"
configuration options for the label canvases
"addTimeMode"
mode to add / subtract time to boundaries
"addTimeValue"
number of samples added to / subtracted from boundaries
"newSegmentName"
default label given to a newly added SEGMENT (default is "" == empty string)
"newEventName"
default label given to a newly added EVENT (default is "" == empty string)
"restrictions"
"playback"
boolean value specifying whether to allow audio playback
"correctionTool"
boolean value specifying whether the correction tool is available
"editItemSize"
boolean value specifying whether to allow changing the size of an ITEM (i.e. move boundaries)
"editItemName"
boolean value specifying whether to allow changing the label of an ITEM
"deleteItemBoundary"
boolean value specifying whether to allow the deletion of boundaries
"deleteItem"
boolean value specifying whether to allow the deletion of entire ITEMs
"deleteLevel"
boolean value specifying whether to allow the deletion of entire levels
"addItem"
boolean value specifying whether to allow the adding of new ITEMs
"drawCrossHairs"
boolean value specifying whether to draw the cross hairs on signal canvases
"drawSampleNrs"
boolean value specifying whether to draw the sample numbers in the OSCI canvas if zoomed in close enough to see samples (mainly for debugging / development purposes)
"drawZeroLine"
boolean value specifying whether to draw the zero value line in the OSCI canvas
"bundleComments"
boolean value specifying whether to allow the annotator to add comments to bundles she / he has annotated. A bundle comment field will show up in the bundle list side bar for each bundle if this is set to true. Note that the server has to support saving these comments, which the serve() function of the emuR package doesn’t.
"bundleFinishedEditing"
boolean value specifying whether to allow the annotator to mark when she / he has finished annotating a bundle. A finished editing toggle button will show up in the bundle list side bar for each bundle if this is set to true. Note that the server has to support saving this information, which the serve() function of the emuR package doesn’t.
"showPerspectivesSidebar"
boolean value specifying whether to show the perspectives side bar
"activeButtons"
specifications of which top-/bottom-menu buttons should be active / displayed by the EMU-webApp
"addLevelSeg"
boolean value specifying whether to show the add SEGMENT level button in the top menu bar
"addLevelEvent"
boolean value specifying whether to show the add EVENT level button in the top menu bar
"renameSelLevel"
boolean value specifying whether to allow the user to rename the currently selected level
"downloadTextGrid"
boolean value specifying whether to allow the user to download the current annotation as a TextGrid file by displaying a download TextGrid button in the top menu bar
"downloadAnnotation"
boolean value specifying whether to allow the user to download the current annotation as an annotJSON file by displaying a download annotJSON button in the top menu bar
"specSettings"
boolean value specifying whether to show the spec. settings button in the top menu bar
"connect"
boolean value specifying whether to display the connect button in the top menu bar
"clear"
boolean value specifying whether to display the clear button in the top menu bar
"deleteSingleLevel"
boolean value specifying whether to allow the user to delete a level containing time information
"resizeSingleLevel"
boolean value specifying whether to allow the user to resize a level
"saveSingleLevel"
boolean value specifying whether to allow the user to download a single level in the ESPS/waves+ format
"resizeSignalCanvas"
boolean value specifying whether to allow the user to resize the signalCanvases ("OSCI", "SPEC", …)
"openDemoDB"
boolean value specifying whether to show the open demoDB button
"saveBundle"
boolean value specifying whether to show the save button in the bundle list side bar for each bundle
"openMenu"
boolean value specifying whether the open bundle list side bar button (== ☰) is displayed
"showHierarchy"
boolean value specifying whether to show the “show hierarchy” button
"editEMUwebAppConfig"
boolean value specifying whether to show the “edit EMUwebAppConfig” button
"demoDBs"
array of strings specifying which demoDBs to display in the open demo drop-down menu. Currently available demo databases are ["ae", "ema", "epgdorsal"]
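To make the field list above more concrete, here is a minimal sketch of what a _DBconfig.json could look like. It uses only fields described above, and the nesting is inferred from those descriptions. All values are illustrative assumptions rather than defaults: the database name, the placeholder UUID, the level names Phoneme and Phonetic, and the track definition (which assumes formant data with the column name fm stored in files with the extension fms).

```json
{
  "name": "myFirst",
  "UUID": "00000000-0000-0000-0000-000000000000",
  "mediafileExtension": "wav",
  "ssffTrackDefinitions": [
    {
      "name": "FORMANTS",
      "columnName": "fm",
      "fileExtension": "fms"
    }
  ],
  "levelDefinitions": [
    {
      "name": "Phoneme",
      "type": "ITEM",
      "attributeDefinitions": [
        { "name": "Phoneme", "type": "STRING" }
      ]
    },
    {
      "name": "Phonetic",
      "type": "SEGMENT",
      "attributeDefinitions": [
        { "name": "Phonetic", "type": "STRING" }
      ]
    }
  ],
  "linkDefinitions": [
    {
      "type": "MANY_TO_MANY",
      "superlevelName": "Phoneme",
      "sublevelName": "Phonetic"
    }
  ],
  "EMUwebAppConfig": {
    "perspectives": [
      {
        "name": "default",
        "signalCanvases": {
          "order": ["OSCI", "SPEC"],
          "assign": [],
          "contourLims": []
        },
        "levelCanvases": {
          "order": ["Phonetic"]
        },
        "twoDimCanvases": {
          "order": []
        }
      }
    ]
  }
}
```

Note that only the Phonetic level appears in the "levelCanvases" order array, because the timeless ITEM level Phoneme cannot be displayed as a level canvas.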
The _annot.json files contain the actual annotation information as well as the hierarchical linking information. Legacy EMU users should note that all the information that used to be split into several ESPS/waves+ label files as well as a .hlb file is now contained in this single file.
The _annot.json
file contains the following fields:
"name"
specifies the name of the annotation file (has to be equal to the bundle folder prefix as well as the _annot.json prefix)
"annotates"
specifies the (relative) media file path that this _annot.json file annotates
"sampleRate"
specifies the sample rate of the annotation (should be the same as the sample rate of the file listed in "annotates")
"levels"
contains an array of level annotation information. Each element consists of:
"name"
specifying the name of the level
"items"
array containing the annotational units (i.e. items) of the level. Each item consists of:
"id"
unique ID of the item (only unique within the _annot.json file / bundle, not globally for the emuDB)
"sampleStart"
contains the start sample value of a “SEGMENT” item
"sampleDur"
contains the sample duration value of a “SEGMENT” item. Note that the EMU-webApp supports neither overlapping “SEGMENT”s nor “SEGMENT” sequences containing gaps. This implies that each sample is explicitly and unambiguously associated with a single “SEGMENT”. This means that the "sampleStart" value of a following “SEGMENT” has to be "sampleStart" + "sampleDur" + 1 of the previous “SEGMENT”.
"samplePoint"
contains the sample point value of an “EVENT” item
"labels"
array containing the labels that belong to this item. Each element consists of:
"name"
specifying the "attributeDefinition" that this label is for
"value"
specifying the label value
"links"
array containing links between two items. These links have to adhere to the links specified in the "linkDefinitions" of the corresponding emuDB. Each link consists of:
"fromID"
ID value of the item to link from (i.e. item in super-level)
"toID"
ID value of the item to link to (i.e. item in sub-level)
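As an illustration of the fields listed above, the following is a minimal sketch of an _annot.json file for a hypothetical bundle named rec1. It is restricted to the fields described in this document, so a real file may contain more. The level names, labels, sample rate and sample values are made-up assumptions. The two “SEGMENT” items follow the rule stated above: the first starts at sample 100 with a duration of 99, so the second has to start at 100 + 99 + 1 = 200.

```json
{
  "name": "rec1",
  "annotates": "rec1.wav",
  "sampleRate": 20000,
  "levels": [
    {
      "name": "Phoneme",
      "items": [
        {
          "id": 1,
          "labels": [
            { "name": "Phoneme", "value": "t" }
          ]
        }
      ]
    },
    {
      "name": "Phonetic",
      "items": [
        {
          "id": 2,
          "sampleStart": 100,
          "sampleDur": 99,
          "labels": [
            { "name": "Phonetic", "value": "t" }
          ]
        },
        {
          "id": 3,
          "sampleStart": 200,
          "sampleDur": 149,
          "labels": [
            { "name": "Phonetic", "value": "H" }
          ]
        }
      ]
    }
  ],
  "links": [
    { "fromID": 1, "toID": 2 },
    { "fromID": 1, "toID": 3 }
  ]
}
```

Here a single item of the Phoneme level is linked to two items of the Phonetic level, which is one of the cases that a MANY_TO_MANY linkDefinition between these two levels permits.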