
Tools to calculate SII and its extensions

Tian-Yuan Huang

This vignette shows how to use the siie package to calculate SII and its extensions, which were introduced in the paper “Superior identification index: Quantifying the capability of academic journals to recognize good research” (https://doi.org/10.1007/s11192-022-04372-z). First, we construct a data set manually, supposing that there are 10,000 papers from 26 journals, each with a citation count.

set.seed(19960822)
nr_of_rows <- 1e4  # number of simulated papers
data.frame(
  Id = seq_len(nr_of_rows),
  Journal = sample(LETTERS,nr_of_rows,replace = TRUE),
  CiteCount = sample(1:100,nr_of_rows,replace = TRUE)
) -> journal_table

To get the SII (Superior Identification Index) and SIE (Superior Identification Efficiency) for the 26 journals (represented by letters), we can:

library(siie)
library(tidyfst)
#> 
#> Life's short, use R.

journal_table %>% siie(group = "Journal",index = "CiteCount")
#> Key: <Journal>
#>     Journal superior_no total_no        sii        sie
#>      <char>       <int>    <int>      <num>      <num>
#>  1:       A          44      393 0.04251208 0.11195929
#>  2:       B          44      380 0.04251208 0.11578947
#>  3:       C          39      381 0.03768116 0.10236220
#>  4:       D          46      385 0.04444444 0.11948052
#>  5:       E          43      358 0.04154589 0.12011173
#>  6:       F          38      372 0.03671498 0.10215054
#>  7:       G          43      415 0.04154589 0.10361446
#>  8:       H          42      386 0.04057971 0.10880829
#>  9:       I          42      376 0.04057971 0.11170213
#> 10:       J          41      368 0.03961353 0.11141304
#> 11:       K          37      390 0.03574879 0.09487179
#> 12:       L          37      392 0.03574879 0.09438776
#> 13:       M          38      372 0.03671498 0.10215054
#> 14:       N          28      397 0.02705314 0.07052897
#> 15:       O          42      384 0.04057971 0.10937500
#> 16:       P          51      415 0.04927536 0.12289157
#> 17:       Q          36      364 0.03478261 0.09890110
#> 18:       R          39      408 0.03768116 0.09558824
#> 19:       S          45      399 0.04347826 0.11278195
#> 20:       T          40      387 0.03864734 0.10335917
#> 21:       U          31      384 0.02995169 0.08072917
#> 22:       V          47      392 0.04541063 0.11989796
#> 23:       W          30      344 0.02898551 0.08720930
#> 24:       X          28      383 0.02705314 0.07310705
#> 25:       Y          40      401 0.03864734 0.09975062
#> 26:       Z          44      374 0.04251208 0.11764706
#>     Journal superior_no total_no        sii        sie

Note that the default superior cutoff (parameter p) is 10, meaning that the top 10% of papers are regarded as superior. To use a different p, say 1 (the top 1%), we can:

journal_table %>% siie(group = "Journal",index = "CiteCount",p = 1)
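
To make the roles of superior_no, total_no and p concrete, here is a minimal base-R sketch on a small toy data set. It is not the package's implementation. The toy data, the variable names and the rank-based cutoff rule (a paper is superior if it ranks within the top ceiling(n * p / 100) papers) are assumptions for illustration, and siie() may treat ties or rounding differently. The two ratios do agree with the table above: for journal A, sie = 44/393 ≈ 0.112 and sii = 44/1035 ≈ 0.0425, where 1035 is the sum of the superior_no column.

```r
# Hypothetical toy data: 10 papers in 3 journals (not the vignette's journal_table)
toy <- data.frame(
  Journal   = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"),
  CiteCount = c(100, 5, 3, 90, 80, 2, 1, 4, 6, 7)
)

p <- 20                      # superior cutoff in percent (top 20% here)
n <- nrow(toy)

# Assumed rule: a paper is superior if it ranks within the top p% by citations
paper_rank  <- rank(-toy$CiteCount, ties.method = "min")
is_superior <- paper_rank <= ceiling(n * p / 100)

# Per-journal counts of superior papers and of all papers
superior_no <- tapply(is_superior, toy$Journal, sum)
total_no    <- tapply(toy$CiteCount, toy$Journal, length)

res <- data.frame(
  Journal     = names(superior_no),
  superior_no = as.integer(superior_no),
  total_no    = as.integer(total_no)
)
# SII: the journal's share of all superior papers
res$sii <- res$superior_no / sum(res$superior_no)
# SIE: the share of the journal's own papers that are superior
res$sie <- res$superior_no / res$total_no
res
```

In this sketch SII depends on journal size (a large journal can capture many superior papers simply by publishing more), while SIE normalises by the journal's own output.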

To get the PRP (Paper Rank Percentile) for the 26 journals, we can:

prp(journal_table,group = "Journal",index = "CiteCount")
#>     Journal total_no      prp
#>      <char>    <int>    <num>
#>  1:       X      383 53.53256
#>  2:       M      372 52.88790
#>  3:       U      384 51.88940
#>  4:       R      408 51.10132
#>  5:       H      386 51.09964
#>  6:       W      344 51.05587
#>  7:       G      415 50.99173
#>  8:       O      384 50.49888
#>  9:       N      397 50.40763
#> 10:       Q      364 50.40338
#> 11:       Y      401 49.54594
#> 12:       F      372 49.45449
#> 13:       K      390 49.19364
#> 14:       L      392 48.90227
#> 15:       V      392 48.76166
#> 16:       J      368 48.68158
#> 17:       S      399 48.64158
#> 18:       B      380 48.47558
#> 19:       C      381 48.46646
#> 20:       A      393 48.43221
#> 21:       D      385 48.41839
#> 22:       T      387 48.31010
#> 23:       Z      374 47.36270
#> 24:       E      358 47.31212
#> 25:       P      415 46.86055
#> 26:       I      376 46.53165
#>     Journal total_no      prp
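
The exact definition used by prp() is not spelled out here, so the following base-R sketch rests on a loudly stated assumption: PRP is taken to be the mean rank percentile of a journal's papers, where the most cited paper overall has rank 1. Under that reading a lower PRP means the journal's papers sit closer to the top of the citation ranking, which would fit the pattern in the two tables above (journal P has the highest sie and one of the lowest prp values). The toy data and all names are hypothetical, and the package may compute ranks or handle ties differently.

```r
# Hypothetical toy data: 10 papers in 3 journals (not the vignette's journal_table)
toy <- data.frame(
  Journal   = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"),
  CiteCount = c(100, 5, 3, 90, 80, 2, 1, 4, 6, 7)
)

# Assumption: rank 1 is the most cited paper; ties share their average rank
paper_rank <- rank(-toy$CiteCount, ties.method = "average")

# Convert each rank to a percentile of the whole data set
rank_percentile <- paper_rank / nrow(toy) * 100

# Assumed PRP: mean rank percentile per journal
prp_toy <- tapply(rank_percentile, toy$Journal, mean)

prp_table <- data.frame(
  Journal  = names(prp_toy),
  total_no = as.integer(table(toy$Journal)[names(prp_toy)]),
  prp      = as.numeric(prp_toy)
)
# Sort in descending order of prp, as in the output above
prp_table[order(-prp_table$prp), ]
```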

Finally, to draw the p-SIE curve for journals A, B and C, we can:

library(ggplot2)

p_sie(journal_table,group = "Journal",
      index = "CiteCount",to_compare = c("A","B","C")) -> p_sie_df

p_sie_df
#>      Journal     p         sie
#>       <char> <int>       <num>
#>   1:       A     1 0.005089059
#>   2:       B     1 0.010526316
#>   3:       C     1 0.007874016
#>   4:       A     2 0.030534351
#>   5:       B     2 0.026315789
#>  ---                          
#> 296:       B    99 1.000000000
#> 297:       C    99 1.000000000
#> 298:       A   100 1.000000000
#> 299:       B   100 1.000000000
#> 300:       C   100 1.000000000
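
Each row of this table is the SIE of one journal at one cutoff p, and at p = 100 every paper counts as superior, so sie reaches 1. The base-R sketch below rebuilds a table of the same shape on toy data by repeating the SIE calculation over a grid of cutoffs. It is an illustration only: the toy data, the coarse grid of p values and the rank-based cutoff rule are assumptions, not the internals of p_sie().

```r
# Hypothetical toy data: 10 papers in 3 journals (not the vignette's journal_table)
toy <- data.frame(
  Journal   = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"),
  CiteCount = c(100, 5, 3, 90, 80, 2, 1, 4, 6, 7)
)

n <- nrow(toy)
paper_rank <- rank(-toy$CiteCount, ties.method = "min")

# One block of rows per cutoff p: share of each journal's papers in the top p%
sie_curve <- do.call(rbind, lapply(seq(10, 100, by = 10), function(p) {
  is_superior <- paper_rank <= ceiling(n * p / 100)   # assumed cutoff rule
  sie <- tapply(is_superior, toy$Journal, mean)
  data.frame(Journal = names(sie), p = p, sie = as.numeric(sie))
}))

head(sie_curve)
```

If a journal's papers were a random draw from all papers, its expected SIE at cutoff p would be p itself, so a reference line with slope 1 in a p-SIE plot separates journals that sit above chance from those below it.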

p_sie_df %>%
  ggplot(aes(p/100,sie,color = Journal)) +
  geom_point() +
  geom_line() +
  geom_abline(slope = 1,linetype = "dashed") +
  scale_x_continuous(labels = tidyfst::percent) +
  scale_y_continuous(labels = tidyfst::percent) +
  labs(x = "p",y = "SIE") +
  theme_bw() +
  theme(legend.position = c(0.8, 0.3),
        legend.background = element_rect(linewidth=0.5,
                                         color = "black",linetype="solid"))


Note that tidyfst::percent is used to format the x- and y-axis labels as percentages.
