How to use GetLattesData

Marcelo Perlin

2017-11-12

Lattes is a unique platform for academic curricula, and the largest of its kind. There you can find information about the academic work of all Brazilian scholars, including institution of PhD, current employer, field of work, metadata for all publications, and more. It is a unique and reliable source of information for bibliometric studies.

I’ve been working with Lattes data for some time. Here I present a short list of papers that have used this data.

Package GetLattesData is a collection of functions I’ve been using to access the dataset. Its main innovation is the possibility of downloading data directly from Lattes, without any manual work or captcha solving.

Example of usage

Let’s consider a simple example of downloading information for a group of scholars. I selected a couple of colleagues at my university. Their Lattes ids can be easily found on the Lattes website. After searching for a name, note the internet address of the resulting CV, such as http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4713546D3. The Lattes id is the final 10-character code of this address. In our example, it is 'K4713546D3'.
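
If you collect many CV addresses, the id can also be pulled out programmatically. The following is a minimal sketch in base R. The helper extract_lattes_id is hypothetical (it is not part of GetLattesData), it handles a single address at a time, and it assumes the id is always made of 10 letters or digits following 'id=', as in the address above.

```r
# Hypothetical helper (not part of GetLattesData): read the Lattes id
# from the 'id' query parameter of a single CV address.
extract_lattes_id <- function(url) {
  # Assumption: the id is exactly 10 alphanumeric characters after 'id='
  pattern <- "(?<=[?&]id=)[A-Za-z0-9]{10}"
  m <- regmatches(url, regexpr(pattern, url, perl = TRUE))
  # regmatches() returns character(0) when nothing matches
  if (length(m) == 0) NA_character_ else m
}

cv.url <- 'http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4713546D3'
my.id <- extract_lattes_id(cv.url)
print(my.id)
```

The function returns NA when the address carries no id, which makes it easy to spot addresses that were copied incorrectly.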

Since we all work in the business department of UFRGS, the impact of our publications is locally set by the Qualis ranking of Management, Accounting and Tourism ('ADMINISTRAÇÃO PÚBLICA E DE EMPRESAS, CIÊNCIAS CONTÁBEIS E TURISMO'). Qualis is the local journal ranking in Brazil. You can read more about Qualis in Wikipedia and here.

Now, based on these two pieces of information, the vector of ids and the Qualis field, we use GetLattesData to download all up-to-date information available in Lattes:

library(GetLattesData)

# ids from EA-UFRGS
my.ids <- c('K4713546D3', 'K4440252H7', 'K4723925J2')

# qualis for the field of management
field.qualis = 'ADMINISTRAÇÃO PÚBLICA E DE EMPRESAS, CIÊNCIAS CONTÁBEIS E TURISMO'

l.out <- gld_get_lattes_data(id.vec = my.ids, field.qualis = field.qualis)
## 
## Downloading file  /tmp/RtmpNc2uWB/K4713546D3_2017-11-12.zip
## Downloading file  /tmp/RtmpNc2uWB/K4440252H7_2017-11-12.zip
## Downloading file  /tmp/RtmpNc2uWB/K4723925J2_2017-11-12.zip
## Reading  K4713546D3_2017-11-12.zip -  Marcelo Scherer Perlin
##  Found 19 published papers
##  Found 1 accepted paper(s)
##  Found 10 supervisions
##  Found 2 published books
##  Found 0 book chapters
##  Found 17 conference papers
## Reading  K4440252H7_2017-11-12.zip -  Marcelo Brutti Righi
##  Found 47 published papers
##  Found 2 accepted paper(s)
##  Found 4 supervisions
##  Found 2 published books
##  Found 1 book chapters
##  Found 0 conference papers
## Reading  K4723925J2_2017-11-12.zip -  Denis Borenstein
##  Found 65 published papers
##  Found 0 accepted paper(s)
##  Found 95 supervisions
##  Found 1 published books
##  Found 6 book chapters
##  Found 89 conference papers

The output l.out is a list with the following dataframes:

names(l.out)
## [1] "tpesq"             "tpublic.published" "tpublic.accepted" 
## [4] "tsupervisions"     "tbooks"            "tconferences"

The first is a dataframe with information about researchers:

tpesq <- l.out$tpesq
str(tpesq)
## 'data.frame':    3 obs. of  9 variables:
##  $ name           : chr  "Marcelo Scherer Perlin" "Marcelo Brutti Righi" "Denis Borenstein"
##  $ last.update    : Date, format: "2017-11-09" "2017-11-10" ...
##  $ phd.institution: chr  "University of Reading" "Universidade Federal de Santa Maria" "University of Strathclyde"
##  $ phd.start.year : num  2007 2013 1991
##  $ phd.end.year   : num  2010 2015 1995
##  $ country.origin : chr  "Brasil" "Brasil" "Brasil"
##  $ major.field    : chr  "CIENCIAS_SOCIAIS_APLICADAS" "CIENCIAS_SOCIAIS_APLICADAS" "ENGENHARIAS"
##  $ minor.field    : chr  "Administração" "Administração" "Engenharia de Produção"
##  $ id.file        : chr  "K4713546D3_2017-11-12.zip" "K4440252H7_2017-11-12.zip" "K4723925J2_2017-11-12.zip"

The second dataframe contains information about all published papers, including Qualis and SJR:

dplyr::glimpse(l.out$tpublic.published)
## Observations: 131
## Variables: 11
## $ name          <chr> "Marcelo Scherer Perlin", "Marcelo Scherer Perli...
## $ article.title <chr> "Análise do Perfil dos Acadêmicos e de suas Publ...
## $ year          <dbl> 2017, 2017, 2017, 2017, 2017, 2016, 2016, 2016, ...
## $ language      <chr> "Português", "Inglês", "Inglês", "Português", "I...
## $ journal.title <chr> "RAC. Revista de Administração Contemporânea (Im...
## $ ISSN          <chr> "1415-6555", "1678-6971", "1389-4420", "1062-940...
## $ order.aut     <dbl> 2, 3, 3, 2, 1, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, ...
## $ n.authors     <dbl> 3, 3, 5, 3, 5, 1, 2, 4, 2, 2, 2, 1, 3, 3, 3, 2, ...
## $ qualis        <chr> "A2", "B1", "A2", "A2", NA, "B1", "B1", "A1", "B...
## $ SJR           <dbl> NA, NA, 0.481, 0.525, 2.029, NA, NA, 0.615, NA, ...
## $ H.SJR         <int> NA, NA, 29, 27, 50, NA, NA, 45, NA, NA, NA, NA, ...

Other dataframes in l.out include information about accepted papers, supervisions, books and conferences.
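
A quick way to get a feel for all of these dataframes at once is to count the rows of each element of the list. The sketch below uses a small toy list, toy.l.out, as a stand-in for l.out so that it runs without downloading anything. Its contents are made up for illustration only.

```r
# Toy stand-in for l.out; the real list comes from gld_get_lattes_data()
toy.l.out <- list(
  tpesq  = data.frame(name = c('Researcher A', 'Researcher B')),
  tbooks = data.frame(name = c('Researcher A', 'Researcher A', 'Researcher B'),
                      year = c(2010, 2012, 2015))
)

# Number of rows in each dataframe of the list, as a named integer vector
n.rows <- vapply(toy.l.out, nrow, integer(1))
print(n.rows)
```

Replacing toy.l.out with l.out should give the same kind of overview for the real data.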

An application of GetLattesData

GetLattesData makes it easy to create academic reports for a large number of researchers. Below, we plot the number of publications for each researcher, broken down by Qualis ranking.

tpublic.published <- l.out$tpublic.published

library(ggplot2)

p <- ggplot(tpublic.published, aes(x = qualis)) +
  geom_bar(position = 'identity') + facet_wrap(~name) +
  labs(x = paste0('Qualis: ', field.qualis))
print(p)
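
If you prefer numbers to a figure, the same kind of count can be tabulated with base R. This is a minimal sketch on made-up data: toy.papers only mimics the name and qualis columns of tpublic.published, and its values are invented for illustration.

```r
# Toy data mimicking two columns of tpublic.published (values are invented)
toy.papers <- data.frame(
  name   = c('Researcher A', 'Researcher A', 'Researcher A',
             'Researcher B', 'Researcher B'),
  qualis = c('A1', 'A2', NA, 'A1', 'A1'),
  stringsAsFactors = FALSE
)

# Cross-tabulate researcher against Qualis level;
# useNA = 'ifany' keeps papers without a Qualis classification visible
tab.qualis <- table(toy.papers$name, toy.papers$qualis, useNA = 'ifany')
print(tab.qualis)
```

Keeping the NA column is a deliberate choice here, since papers in journals without a Qualis classification are otherwise silently dropped from the count.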

We can also use dplyr to do some simple assessment of academic productivity:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
my.tab <- tpublic.published %>%
  group_by(name) %>%
  summarise(n.papers = n(),
            max.SJR = max(SJR, na.rm = TRUE),
            mean.SJR = mean(SJR, na.rm = TRUE),
            n.A1.qualis = sum(qualis == 'A1', na.rm = TRUE),
            n.A2.qualis = sum(qualis == 'A2', na.rm = TRUE),
            median.authorship = median(as.numeric(order.aut), na.rm = TRUE))

knitr::kable(my.tab)
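|name                   | n.papers| max.SJR|  mean.SJR| n.A1.qualis| n.A2.qualis| median.authorship|
|:----------------------|--------:|-------:|---------:|-----------:|-----------:|-----------------:|
|Denis Borenstein       |       65|   3.674| 1.3193333|          23|          15|                 2|
|Marcelo Brutti Righi   |       47|   1.767| 0.4340968|           7|          18|                 1|
|Marcelo Scherer Perlin |       19|   2.029| 0.7397143|           2|           4|                 1|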
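
One caveat when running this kind of summary on a larger group: if a researcher has no paper with an SJR value, max() with na.rm = TRUE returns -Inf with a warning, and mean() returns NaN. The sketch below shows one possible guard in base R. The helper safe_max is hypothetical (not part of GetLattesData or dplyr) and the data in toy.sjr is invented.

```r
# Hypothetical helper: maximum that yields NA, instead of -Inf plus a
# warning, when every value is missing
safe_max <- function(x) {
  if (all(is.na(x))) return(NA_real_)
  max(x, na.rm = TRUE)
}

# Toy data: researcher 'B' has no SJR information at all
toy.sjr <- data.frame(name = c('A', 'A', 'B', 'B'),
                      SJR  = c(0.5, NA, NA, NA))

# Apply the guarded maximum within each researcher
max.by.name <- tapply(toy.sjr$SJR, toy.sjr$name, safe_max)
print(max.by.name)
```

The same helper could be used inside summarise() in place of max(SJR, na.rm = TRUE).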