databrary

library(databraryr)
#> Welcome to the databraryr package.

Vignette Info

Databrary is a powerful tool for storing and sharing video data and documentation with other researchers. With the databraryr package, it becomes even more powerful. Rather than interact with Databrary through a web browser, users can write their own code to download participant data or even specific files.

I wrote the Databrary API so I could better understand how the site works under the hood and streamline my own analysis and data sharing workflows. Let’s get started.

Registering

Access to most of the material on Databrary requires prior [registration](https://nyu.databrary.org/user/register) and authorization from an institution. Authorization requires a formal agreement by the institution, but you create an account ID (email) and a secure password as soon as you register. Then, when you log in with your new credentials, you’ll choose whom to request authorization from: an existing institution (if yours is on the list), a new institution (if yours isn’t), or an existing authorized investigator (if you are a student, postdoc, or collaborator).

First steps (while you await authorization)

But even before formal authorization is complete, a user can access the public materials on Databrary. For this vignette, we’ll assume you fall into this category.

First, we need to download and install the devtools package and then the databraryr package from GitHub.

To install devtools, run install.packages('devtools'). To install databraryr, run devtools::install_github("databrary/databraryr").
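The two installation steps can be combined into a short script. The following is a minimal sketch, not an official installer: the helper names needs_install() and install_databraryr() are made up for this example, and the install calls are wrapped in a function so that nothing is downloaded until you call it.

```r
# Hypothetical helper (not part of any package): TRUE when `pkg` is not
# yet installed in the current R library paths.
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# Hypothetical wrapper: install devtools and databraryr only when missing.
install_databraryr <- function() {
  if (needs_install("devtools")) {
    install.packages("devtools")
  }
  if (needs_install("databraryr")) {
    devtools::install_github("databrary/databraryr")
  }
  # Report (invisibly) whether databraryr is now available.
  invisible(!needs_install("databraryr"))
}
```

Calling install_databraryr() once per machine should be enough; later sessions only need library(databraryr).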

It’s a good idea to check that your installation worked by loading it into your local workspace.

library(databraryr)

Then, try this command to pull data about Databrary’s founders:

databraryr::list_people()
#>   id sortname prename                       affiliation
#> 1  5   Adolph   Karen               New York University
#> 2  6  Gilmore Rick O. The Pennsylvania State University
#> 3  7  Millman   David               New York University
#>                                url               orcid
#> 1 http://www.psych.nyu.edu/adolph/                <NA>
#> 2     http://gilmore-lab.github.io 0000-0002-7676-3982
#> 3                             <NA>                <NA>

Note that this command returns a data frame (tibble) with columns that include the first name (prename), last name (sortname), affiliation, lab or personal website, and ORCID ID if available.

Databrary assigns each person and institution on the system a unique number called a ‘party id’. When we run list_people(people_list = 1:25), we are asking the system for information about all of the people whose party ids are between 1 and 25. Let’s try it:

databraryr::list_people(people_list = 1:25)
#>    id        sortname       prename               orcid
#> 1   1           Simon         Dylan 0000-0002-2793-1679
#> 2   3         Steiger          Lisa                <NA>
#> 3   4           Byrne        Andrea                <NA>
#> 4   5          Adolph         Karen                <NA>
#> 5   6         Gilmore       Rick O. 0000-0002-7676-3982
#> 6   7         Millman         David                <NA>
#> 7  11   Tamis-LeMonda     Catherine                <NA>
#> 8  13             Roy Lina Wictoren                <NA>
#> 9  14        Franchak          John                <NA>
#> 10 16       Professor    Suzanne Q.                <NA>
#> 11 17 Jimenez-Robbins        Carmen                <NA>
#> 12 18             Coe           Jon                <NA>
#> 13 19             Foo         Vicky                <NA>
#> 14 20          Gordon         Peter                <NA>
#> 15 24            Chan        Gladys                <NA>
#>                              affiliation                              url
#> 1                                   <NA>                             <NA>
#> 2                              Databrary                             <NA>
#> 3                              Databrary                             <NA>
#> 4                    New York University http://www.psych.nyu.edu/adolph/
#> 5      The Pennsylvania State University     http://gilmore-lab.github.io
#> 6                    New York University                             <NA>
#> 7                    New York University                             <NA>
#> 8                                    NYU                             <NA>
#> 9    University of California, Riverside            http://padlab.ucr.edu
#> 10                             Databrary                             <NA>
#> 11                                  <NA>                             <NA>
#> 12                                  <NA>                             <NA>
#> 13                                  <NA>                             <NA>
#> 14 Teachers College, Columbia University                             <NA>
#> 15                                   NYU                             <NA>

It’s a bit slow, but you should see information about people beginning with Dylan Simon, the developer who designed and built most of the Databrary system, and ending with Gladys Chan, a graphic designer who created the Databrary and Datavyu logos and other graphic identity elements.
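Because the result is an ordinary data frame, it can be filtered with base R. The sketch below copies four rows from the output above into a small data frame (so it runs without a network connection) and keeps only the people who have an ORCID on file; with a live result you would substitute the data frame returned by list_people().

```r
# Four rows copied from the list_people(people_list = 1:25) output above.
people <- data.frame(
  id = c(1L, 5L, 6L, 14L),
  sortname = c("Simon", "Adolph", "Gilmore", "Franchak"),
  prename = c("Dylan", "Karen", "Rick O.", "John"),
  orcid = c("0000-0002-2793-1679", NA, "0000-0002-7676-3982", NA),
  stringsAsFactors = FALSE
)

# Keep only the rows whose orcid is not missing.
with_orcid <- people[!is.na(people$orcid), c("id", "sortname", "orcid")]
with_orcid
```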

You can also try seeing what’s new on Databrary. The get_db_stats() command gives you information about the newly authorized people, institutions, and newly uploaded datasets. Try this:

databraryr::get_db_stats("stats")
#>                  date investigators affiliates institutions datasets_total
#> 1 2023-10-23 13:42:23          1730        679          768           1613
#>   datasets_shared n_files    hours       TB
#> 1             954  236998 113517.7 51.74739
databraryr::get_db_stats("people")
#> # A tibble: 5 × 4
#>      id sortname prename affiliation             
#>   <int> <chr>    <chr>   <chr>                   
#> 1 11557 Roth     Guy     Ben-Gurion university   
#> 2 11783 Fouhey   David   New York University     
#> 3 11824 Gervain  Judit   CNRS                    
#> 4  2620 Herrera  Carol   Brigham Young University
#> 5 11844 Reschke  Peter   BYU
databraryr::get_db_stats("institutions")
#> NULL
databraryr::get_db_stats("datasets")
#> # A tibble: 7 × 7
#>      id name                    body  creation owners permission publicsharefull
#>   <int> <chr>                   <chr> <chr>    <list>      <int> <lgl>          
#> 1  1658 test-volume             "tes… 2023-10… <df>            1 FALSE          
#> 2  1657 PLAYProject_PENNS       "Hom… 2023-10… <df>            1 FALSE          
#> 3  1655 Learning to Walk over … "Thi… 2023-10… <df>            1 FALSE          
#> 4  1653 Mitten data sharing fo… "Dat… 2023-10… <df>            1 FALSE          
#> 5  1652 LR-MEL Year 1           "Thi… 2023-10… <df>            1 FALSE          
#> 6  1632 Academic Language of P… "Thi… 2023-07… <df>            1 FALSE          
#> 7  1651 Parent-Child Conversat… "Vid… 2023-10… <df>            1 FALSE

Depending on when you run this command, there may or may not be new items.
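Since a call can return NULL when there is nothing new (as get_db_stats("institutions") did above), code that processes the result should check for that case first. Here is one possible guard; count_new() is a hypothetical helper written for this example, not a databraryr function.

```r
# Hypothetical helper: count rows in a result, treating NULL as
# "nothing new" instead of failing.
count_new <- function(result) {
  if (is.null(result)) 0L else nrow(result)
}

count_new(NULL)                  # the NULL case shown above
count_new(data.frame(id = 1:5))  # a stand-in for a five-row result
```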

Once you are authorized

Congratulations! Your institution has approved your access to Databrary’s identifiable data. Now, it’s time to set up databraryr so you can access these materials.

Once you are authorized, you will gain access to a much wider range of materials on Databrary. When that happens, you’ll load the package with library(databraryr) and then run login_db(email = "<YOUR_EMAIL@PROVIDER.COM>"), substituting your actual Databrary account for <YOUR_EMAIL@PROVIDER.COM>, of course. I prefer to give the package name when I do this, so the following is how I do the same thing.

databraryr::login_db(email = "<YOUR_EMAIL@PROVIDER.COM>")

If this is the first time you’ve logged in, you will be asked to enter your Databrary password in a separate window. If everything works out, you should see a “Login successful” message at the R console. Congratulations, you are ready to access Databrary’s restricted shared information along with any private, unshared information you have access to.

NOTE: You can save yourself some time if you store your Databrary login (email) as an environment variable:

  1. Install the usethis package via install.packages('usethis').
  2. Run usethis::edit_r_environ(). This will open your .Renviron file in a new window.
  3. Edit the .Renviron file by adding a line with DATABRARY_LOGIN="youremail@yourinstitution.edu", substituting your actual Databrary login email.
  4. Save the file, and restart R.

Now, you can run Sys.getenv("DATABRARY_LOGIN") and it will return your Databrary login.

And going forward, you can use Sys.getenv("DATABRARY_LOGIN") wherever you would enter your Databrary login:

databraryr::login_db(email = Sys.getenv("DATABRARY_LOGIN"))
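Sys.getenv() returns an empty string when a variable is not set, so a typo in .Renviron would silently pass an empty email to login_db(). A small wrapper can catch this early. This is only a sketch; get_databrary_login() is a hypothetical helper, not part of databraryr.

```r
# Hypothetical helper: read DATABRARY_LOGIN and fail early if it is unset.
get_databrary_login <- function() {
  email <- Sys.getenv("DATABRARY_LOGIN", unset = "")
  if (!nzchar(email)) {
    stop("DATABRARY_LOGIN is not set; add it to .Renviron and restart R.")
  }
  email
}
```

You could then log in with databraryr::login_db(email = get_databrary_login()).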

NOTE: You can also save yourself even more time by storing your Databrary user account (email) and password in your computer’s secure credentials database using the keyring package. The keyring package uses the encrypted file that your computer’s operating system uses for storing other passwords. There are alternative ways of storing user credentials, but this is the recommended one.

To do this, use the store and overwrite parameters in login_db():

databraryr::login_db(email = "<YOUR_EMAIL@PROVIDER.COM>", store = TRUE,
                     overwrite = TRUE)

This overwrites and securely stores your credentials, so that the next time you log in, you need only use this command:

databraryr::login_db(email = "<YOUR_EMAIL@PROVIDER.COM>")

or if you’ve stored your email as an environment variable:

databraryr::login_db(email = Sys.getenv("DATABRARY_LOGIN"))

Logging out

The package also has a log out command.

databraryr::logout_db()

Accessing data

Databrary is a data library, one specialized for storing and sharing video. Let’s see how to use databraryr to access data.

We’ll start simply. Let’s download a test video from volume 1 on Databrary.

The download_video() function handles this for us. Running it with the default parameters downloads a simple test video with numbers that increment. The file is stored in a temporary directory created by the file system using the function tempdir(). The download_video() function returns a character string with the full file name.

download_video()

Depending on your operating system, the following commands may open the file so that you can play it with your default video player.

nums_vid <- download_video()
system(paste0("open ", nums_vid))

Or, you can navigate to the temporary directory to open and play the video manually. Use tempdir() to find the directory where test.mp4 is stored.
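The open command used above is specific to macOS. One way to make the step portable is to pick the opener from the operating system name. The sketch below covers macOS and Linux only and assumes xdg-open is available on Linux; open_command() is a hypothetical helper, and on Windows base R’s shell.exec() is the usual route instead.

```r
# Hypothetical helper: build a shell command that opens `path` with the
# default application. `sysname` defaults to the current operating system.
open_command <- function(path, sysname = Sys.info()[["sysname"]]) {
  opener <- switch(sysname,
    Darwin = "open",      # macOS
    Linux = "xdg-open",   # most Linux desktops
    stop("No opener known for: ", sysname)
  )
  # shQuote() protects paths that contain spaces.
  paste(opener, shQuote(path))
}

open_command("test.mp4", sysname = "Darwin")
```

The resulting string can be passed to system(), for example system(open_command(nums_vid)).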

Now, let’s see what other files are shared in volume 1. This takes a moment to run because there are many files in this volume.

vol1_df <- list_assets_in_volume()

The command returns a data frame we can manipulate using standard R commands. Here are the variables in the data frame.

names(vol1_df)
#>  [1] "asset_id"       "asset_type_id"  "duration"       "segment"       
#>  [5] "name"           "permission"     "size"           "mimetype"      
#>  [9] "extension"      "asset_type"     "transcodable"   "classification"

The asset_type variable tells us the type of the data file.

unique(vol1_df$asset_type)
#> [1] "MPEG-4 video"           "Comma-separated values" "Portable document"

We can summarize the number of files using the stats::xtabs() function:

stats::xtabs(~ asset_type, data = vol1_df)
#> asset_type
#> Comma-separated values           MPEG-4 video      Portable document 
#>                      1                     14                      1

So, most of the files in this volume are videos. Here is a table of the six longest videos (head() returns six rows by default).

vol1_df |>
  dplyr::filter(asset_type == "MPEG-4 video") |>
  dplyr::select(name, duration) |>
  dplyr::mutate(hrs = duration/(60*60*1000)) |>
  dplyr::select(name, hrs) |>
  dplyr::arrange(dplyr::desc(hrs)) |>
  head() |>
  knitr::kable(format = 'html')
name              hrs
Florian           0.1839478
Rick              0.1839172
Florian part 1    0.1762317
Rick part 1       0.1760950
Florian part 2    0.1055644
Rick part 2       0.1052267
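The divisor 60*60*1000 in the pipeline above implies that duration is stored in milliseconds. Under that assumption, the conversion can be written as a pair of small helpers (hypothetical names, defined here only for illustration):

```r
# Milliseconds in one hour: 60 min * 60 s * 1000 ms.
ms_per_hour <- 60 * 60 * 1000

# Hypothetical helpers for converting a duration.
ms_to_hours <- function(ms) ms / ms_per_hour
hours_to_minutes <- function(hrs) hrs * 60

ms_to_hours(ms_per_hour)      # one hour's worth of milliseconds
hours_to_minutes(0.1839478)   # the longest video above, in minutes
```

The longest video in the table works out to about 11 minutes.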

Accessing metadata

Imagine you are interested in knowing more about this volume, the people who created it, or the agencies that funded it.

The list_volume_owners() function returns a data frame with information about the people who created and “own” this particular dataset. The function has a parameter this_vol_id which is an integer, unique across Databrary, that refers to the specific dataset. The list_volume_owners() function uses volume 1 as the default.

list_volume_owners()
#>   vol_id person_id sortname prename
#> 1      1         5   Adolph   Karen
#> 2      1         6  Gilmore Rick O.

The command (and many like it) can be “vectorized” using the purrr package.

purrr::map(1:15, list_volume_owners) |> 
  purrr::list_rbind()
#>    vol_id person_id      sortname   prename
#> 1       1         5        Adolph     Karen
#> 2       1         6       Gilmore   Rick O.
#> 3       2         6       Gilmore   Rick O.
#> 4       4         5        Adolph     Karen
#> 5       5         5        Adolph     Karen
#> 6       7         5        Adolph     Karen
#> 7       8        11 Tamis-LeMonda Catherine
#> 8       9         5        Adolph     Karen
#> 9      10        20        Gordon     Peter
#> 10     11         5        Adolph     Karen
#> 11     11        11 Tamis-LeMonda Catherine
#> 12     11        32       Karasik      Lana
#> 13     15        70     Messinger    Daniel

The list_volume_metadata() command gives slightly more information.

list_volume_metadata()
#>   vol_id                                     name
#> 1      1 Databrary sponsored workshops and events
#>                                                     owners permission
#> 1 Adolph, Karen; Gilmore, Rick O.; Staff; Admin, Databrary          1
#>                               doi
#> 1 https://doi.org/10.17910/B7159Q

This command can also be “vectorized.”

purrr::map(c(1:50), list_volume_metadata) |>
  purrr::list_rbind()
#>    vol_id
#> 1       1
#> 2       2
#> 3       4
#> 4       5
#> 5       7
#> 6       8
#> 7       9
#> 8      10
#> 9      11
#> 10     15
#> 11     16
#> 12     23
#> 13     24
#> 14     27
#> 15     28
#> 16     29
#> 17     30
#> 18     31
#> 19     32
#> 20     33
#> 21     34
#> 22     35
#> 23     36
#> 24     37
#> 25     38
#> 26     42
#> 27     43
#> 28     44
#> 29     45
#> 30     46
#> 31     47
#> 32     49
#> 33     50
#>                                                                                                                                                                                       name
#> 1                                                                                                                                                 Databrary sponsored workshops and events
#> 2                                                                                                                              Head-mounted camera views of adults in natural environments
#> 3                                                                                                                                   Crawling and walking infants see the world differently
#> 4                                                           No bridge too high: Infants decide whether to cross based on the probability of falling not the severity of the potential fall
#> 5                                                                                                            Ledge and wedge: Younger and older adults' perception of action possibilities
#> 6  Language, cognitive, and socio-emotional skills from 9 months until their transition to first grade in U.S. children from African-American, Dominican, Mexican, and Chinese backgrounds
#> 7                                                                                                                                         Children's social and motor play on a playground
#> 8                                                                                                                                Numerical Cognition Without Words: Evidence from Amazonia
#> 9                                                                                                                                               The Ties That Bind: Cradling in Tajikistan
#> 10                                             Facial expressions in 6-month old infants and their parents in the still face paradigm and attachment at 15 months in the Strange Situation
#> 11                                                                                                         Excerpt volume: Human quadrupeds, primate quadrupedalism, and Uner Tan Syndrome
#> 12                                                                                                                 An analysis of optic flow observed by infants during natural activities
#> 13                                                                                                        Excerpt volume: Specificity of learning: Why infants fall over a veritable cliff
#> 14                               Preliminary investigation of visual attention to human figures in photographs: Potential considerations for the design of aided AAC visual scene displays
#> 15                                                                                                                        Excerpt volume: Learning in the development of infant locomotion
#> 16                                                                             Where infants look determines how they see: eye movements and object perception performance in 3-month-olds
#> 17                                                                                                                                        The Child Affective Facial Expression (CAFE) set
#> 18                                                                                Four-month-olds' discrimination of optic flow patterns depicting different directions of observer motion
#> 19                                                                                          Spatio-temporal tuning of coherent motion evoked responses in 4–6 month old infants and adults
#> 20                                                                                                                                  Representing exact number visually using mental abacus
#> 21                                                                                                         Development of infants’ attention to faces during the first year (eye-tracking)
#> 22                                                                                                           Number as a cognitive technology: Evidence from Pirahã language and cognition
#> 23                                                                                                                        Measuring the development of social attention using free-viewing
#> 24                                                                                                                               Visual search and attention to faces during early infancy
#> 25                                                                                                           The development of predictive processes in children’s discourse understanding
#> 26                                                                                                                   Head camera clips: parent infant object play at 36 to 57 weeks of age
#> 27                                                                                                                                             Statistical learning by 8-month-old infants
#> 28                                                                                                                                              Children use syntax to learn verb meanings
#> 29                                                                                                                                            Cultural transmission of social essentialism
#> 30                                                                        Different Gestalt processing for different actions? Comparing object-directed reaching and looking time measures
#> 31                                                                                                                        Examples of rhythmical stereotypical behaviors at 4 and 7 months
#> 32                                                                                                         Cortical responses to optic flow and motion contrast across patterns and speeds
#> 33                                                                                                                                                                      Biological Motions
#>                                                      owners permission
#> 1  Adolph, Karen; Gilmore, Rick O.; Staff; Admin, Databrary          1
#> 2                                          Gilmore, Rick O.          1
#> 3                                             Adolph, Karen          1
#> 4                                             Adolph, Karen          1
#> 5                                             Adolph, Karen          1
#> 6                                  Tamis-LeMonda, Catherine          1
#> 7                                             Adolph, Karen          1
#> 8                                             Gordon, Peter          1
#> 9    Karasik, Lana; Tamis-LeMonda, Catherine; Adolph, Karen          1
#> 10                                        Messinger, Daniel          1
#> 11                             Adolph, Karen; Shapiro, Liza          1
#> 12                                         Gilmore, Rick O.          1
#> 13                                            Adolph, Karen          1
#> 14                                        Wilkinson, Krista          1
#> 15                                            Adolph, Karen          1
#> 16                                           Johnson, Scott          1
#> 17                            LoBue, Vanessa; Thrasher, Cat          1
#> 18                                         Gilmore, Rick O.          1
#> 19                                         Gilmore, Rick O.          1
#> 20                                        Frank, Michael C.          1
#> 21                                        Frank, Michael C.          1
#> 22                                        Frank, Michael C.          1
#> 23                                        Frank, Michael C.          1
#> 24                                        Frank, Michael C.          1
#> 25                                        Frank, Michael C.          1
#> 26                                          Smith, Linda B.          1
#> 27                                           Saffran, Jenny          1
#> 28                                         Naigles, Letitia          1
#> 29                                         Rhodes, Marjorie          1
#> 30                                           Vishton, Peter          1
#> 31                                       Fabricius, William          1
#> 32                                         Gilmore, Rick O.          1
#> 33                                   Bertenthal, Bennett I.          1
#>                                doi
#> 1  https://doi.org/10.17910/B7159Q
#> 2  https://doi.org/10.17910/B7WC7S
#> 3  https://doi.org/10.17910/B7RP4H
#> 4  https://doi.org/10.17910/B7MW2K
#> 5  https://doi.org/10.17910/B7H592
#> 6  https://doi.org/10.17910/B7CC74
#> 7  https://doi.org/10.17910/B77P4V
#> 8  https://doi.org/10.17910/B73W2X
#> 9   https://doi.org/10.17910/b7.11
#> 10 https://doi.org/10.17910/B7059D
#> 11 https://doi.org/10.17910/B7VC7G
#> 12 https://doi.org/10.17910/B7QP46
#> 13 https://doi.org/10.17910/B7KW28
#> 14 https://doi.org/10.17910/B7G59R
#> 15 https://doi.org/10.17910/B7BC7T
#> 16 https://doi.org/10.17910/B76P4J
#> 17 https://doi.org/10.17910/B7301K
#> 18 https://doi.org/10.17910/B7Z593
#> 19 https://doi.org/10.17910/B7TG6T
#> 20 https://doi.org/10.17910/B7PP4W
#> 21 https://doi.org/10.17910/B7K01X
#> 22 https://doi.org/10.17910/B7F59F
#> 23 https://doi.org/10.17910/B79G65
#> 24 https://doi.org/10.17910/B75P47
#> 25 https://doi.org/10.17910/B72018
#> 26 https://doi.org/10.17910/B7SG6H
#> 27 https://doi.org/10.17910/B7NP4K
#> 28 https://doi.org/10.17910/B7J01M
#> 29 https://doi.org/10.17910/B7D594
#> 30 https://doi.org/10.17910/B78G6V
#> 31 https://doi.org/10.17910/B74S3K
#> 32 https://doi.org/10.17910/B7101Z
#> 33 https://doi.org/10.17910/B7W884

The permission variable indicates the level of access the current user has to a volume.

So, if you are not logged in to Databrary, only data that are visible to the public will be returned. Assuming you are not logged in, the above commands will show volumes with permission equal to 1. The permission field derives from a set of constants the system uses.

db_constants <- assign_constants()
db_constants$permission
#> [1] "NONE"   "PUBLIC" "SHARED" "READ"   "EDIT"   "ADMIN"

Databrary indexes the permission array beginning with 0, so a permission value of 1 corresponds to “PUBLIC”, which is the second element of the R vector shown above. The 1 therefore means that the volumes shown above are all visible to the public, and to you.

Volumes that you own but have not shared, and that are therefore not visible to the public, will have permission equal to 5, or “ADMIN”. We can’t demonstrate this to you because we don’t have privileges on the same unshared volume, but you can try it on a volume you’ve created but not yet shared.
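Because R vectors are indexed from 1 while the permission codes start at 0, looking up a name means adding one to the code. A minimal sketch, with the permission names copied from the assign_constants() output above and a hypothetical helper permission_label():

```r
# Permission names as shown in the db_constants$permission output above.
permission_names <- c("NONE", "PUBLIC", "SHARED", "READ", "EDIT", "ADMIN")

# Hypothetical helper: translate a 0-based permission code into its name.
# R vectors are 1-based, so shift the code by one.
permission_label <- function(code) {
  permission_names[code + 1]
}

permission_label(1)   # "PUBLIC"
permission_label(5)   # "ADMIN"
```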

The list_volume() command returns even more extensive information about volume 1. The list_volume_funding() command returns information about any funders listed for the project. Again, the default volume is 1.

list_volume_funding()
#> # A tibble: 2 × 4
#>   vol_id funder_id funder_name                                             award
#>    <dbl>     <int> <chr>                                                   <chr>
#> 1      1 100000001 National Science Foundation (NSF)                       BCS-…
#> 2      1 100000071 National Institute of Child Health and Human Developme… U01-…

This can also be “vectorized.”

purrr::map(c(1:20), list_volume_funding) |>
  purrr::list_rbind()
#> # A tibble: 25 × 4
#>    vol_id funder_id funder_name                                            award
#>     <int>     <int> <chr>                                                  <chr>
#>  1      1 100000001 National Science Foundation (NSF)                      BCS-…
#>  2      1 100000071 National Institute of Child Health and Human Developm… U01-…
#>  3      2 100000001 National Science Foundation (NSF)                      BCS-…
#>  4      3        NA <NA>                                                   <NA> 
#>  5      4 100000071 National Institute of Child Health and Human Developm… R37-…
#>  6      5 100000071 National Institute of Child Health and Human Developm… R37-…
#>  7      6        NA <NA>                                                   <NA> 
#>  8      7 100000071 National Institute of Child Health and Human Developm… R37-…
#>  9      8 100000001 National Science Foundation (NSF)                      0721…
#> 10      9 100000071 National Institute of Child Health and Human Developm… R37-…
#> # ℹ 15 more rows
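Volumes without a listed funder appear as rows of NA, so they are worth dropping before summarizing. The sketch below copies the first five rows of the output above into a data frame (omitting the award column, which is truncated in the printout) and counts the number of rows per funder.

```r
# First five rows copied from the combined funding output above
# (award and funder_name omitted).
funding <- data.frame(
  vol_id = c(1L, 1L, 2L, 3L, 4L),
  funder_id = c(100000001L, 100000071L, 100000001L, NA, 100000071L)
)

# Drop volumes with no listed funder, then count rows per funder id.
funded <- funding[!is.na(funding$funder_id), ]
funder_counts <- table(funded$funder_id)
funder_counts
```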

The list_volume_links() command returns information about any external (web) links that have been added to a volume, such as to related publications or a GitHub repo.

list_volume_links()
#> # A tibble: 2 × 3
#>   vol_id link_name                                       url                    
#>    <dbl> <chr>                                           <chr>                  
#> 1      1 Video as data (Invited article in APS Observer) http://www.psychologic…
#> 2      1 2016-12-16 NIH PLAY workshop videocast          https://videocast.nih.…