Benchmarking


Keywords are classified automatically through community detection, and several clustering algorithms can be used for this step. This vignette provides a benchmark showing how the algorithms differ on networks of several sizes.
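
To make the idea concrete before the benchmark itself, here is a minimal sketch that runs three community detection algorithms on a small, well-known graph and counts the groups each one finds. It assumes the igraph package is installed and uses the Zachary karate club graph as a stand-in; it is not the keyword network benchmarked in this vignette. The group_* functions in tidygraph wrap igraph's cluster_* functions, and the sketch assumes the functions of the same name used below behave the same way.

```r
library(igraph)

# Stand-in network (assumption): Zachary's karate club, 34 nodes
g <- make_graph("Zachary")

# Three of igraph's community detection algorithms
algorithms <- list(
  fast_greedy = cluster_fast_greedy,
  louvain     = cluster_louvain,
  walktrap    = cluster_walktrap
)

# Number of groups each algorithm finds on the same graph
group_counts <- sapply(algorithms, function(f) {
  length(unique(membership(f(g))))
})
group_counts
```

The counts generally differ between algorithms even on the same graph, which is what the benchmark measures at larger scale.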

library(akc)
library(dplyr)
library(stringr)  # provides str_extract(), used below

Then we prepare the data. The built-in data table bibli_data_table is used here.

bibli_data_table %>% 
  keyword_clean() %>% 
  keyword_merge() -> clean_data

Next, the combinations of network size and community detection algorithm to be tested are set up:

100:300 -> topn_sample
ls("package:akc") %>% 
  str_extract("^group.+") %>% 
  na.omit() %>% 
  setdiff(c("group_biconnected_component",
            "group_components",
            "group_optimal")) -> com_detect_fun_list
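
The selection logic above (keep names that start with "group", then drop a few by name) can be sketched in base R. The vector of names here is a hypothetical sample, not the actual list of exports from the package.

```r
# Hypothetical sample of exported function names
fun_names <- c("keyword_clean", "keyword_group", "group_louvain",
               "group_optimal", "group_walktrap")

# Keep only names that begin with "group"; "keyword_group" does not match
# because the pattern is anchored at the start of the string
matched <- grep("^group.+", fun_names, value = TRUE)

# Drop the algorithms that should not be benchmarked
selected <- setdiff(matched, c("group_optimal"))
selected
```

Anchoring the pattern with `^` is what keeps `keyword_group` itself out of the list of algorithms.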

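# Accumulate one row per (algorithm, top) run
all = tibble()
for(i in com_detect_fun_list){
    for(j in topn_sample){
      # Time the grouping step; na.omit() drops any NA entries of the timing
      system.time({
        clean_data %>% 
          keyword_group(top = j, com_detect_fun = get(i)) %>% 
          as_tibble() -> grouped_network_table
      }) %>% na.omit() -> time_info
      # Number of nodes and number of groups in the resulting network
      grouped_network_table %>% nrow() -> node_no
      grouped_network_table %>% distinct(group) %>% nrow() -> group_no
      # Mean and standard deviation of the group sizes
      grouped_network_table %>% 
        count(group) %>% 
        summarise(mean(n)) %>% 
        .[[1]] -> group_avg_node_no
      grouped_network_table %>% 
        count(group) %>% 
        summarise(sd(n)) %>% 
        .[[1]] -> group_sd_node_no
      # c() coerces everything to character; the columns are converted
      # back to numeric after the loop
      c(com_detect_fun = i, 
        topn = j,
        node_no = node_no, group_no = group_no,
        avg = group_avg_node_no,
        sd = group_sd_node_no, time_info[1:3]) %>% 
        bind_rows(all, .) -> all
    }
}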
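
As a small illustration of the timing step in the loop, the sketch below times an arbitrary expression with system.time() and inspects the result. The sorting workload is a placeholder chosen only so that there is something to time.

```r
# Time a placeholder workload; objects assigned inside the braces
# remain available afterwards, as in the loop above
timing <- system.time({
  sorted <- sort(runif(1e5))
})

# The first three entries are user, system and elapsed time (in seconds)
time_names <- names(timing)[1:3]
time_names

# Elapsed wall-clock time in seconds
elapsed_seconds <- timing[["elapsed"]]
elapsed_seconds
```

This is why the loop can both record `time_info[1:3]` and keep using `grouped_network_table` after the timed block.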

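res = all %>% 
  # columns 2 to 9 were stored as character; convert and round them
  mutate_at(2:9, function(x) as.numeric(x) %>% round(2)) %>% 
  # keep one row per algorithm and network size
  distinct(com_detect_fun, node_no, .keep_all = TRUE) %>% 
  # drop topn and the user/system times, leaving the elapsed time
  select(-topn, -contains("self")) %>% 
  setNames(c("com_detect_fun","No. of total nodes","No. of total groups",
             "Average node number in each group","Standard deviation of node number",
             "Computer running time for keyword_group function"))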
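
The cleanup step can be illustrated on a toy table. This is a sketch on hypothetical values, not the benchmark data; it uses across(), which supersedes mutate_at() from dplyr 1.0.0 onward, and shows how distinct() collapses runs whose `top` values produced the same number of nodes.

```r
library(dplyr)

# Hypothetical raw results: all columns are character, as after the loop
raw <- tibble(
  com_detect_fun = c("group_a", "group_a", "group_b"),
  topn           = c("100", "101", "100"),
  node_no        = c("103", "103", "103"),
  elapsed        = c("0.1712", "0.1698", "0.1634")
)

cleaned <- raw %>%
  # convert every column except the algorithm name to rounded numerics
  mutate(across(-com_detect_fun, ~ round(as.numeric(.x), 2))) %>%
  # the first row per (algorithm, node count) is kept
  distinct(com_detect_fun, node_no, .keep_all = TRUE) %>%
  select(-topn)
cleaned
```

Here the two `group_a` runs end up as a single row because both produced a network with 103 nodes.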

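The results are listed below. The running time is the elapsed time reported by system.time(), in seconds.

com_detect_fun	No. of total nodes	No. of total groups	Average node number in each group	Standard deviation of node number	Computer running time for keyword_group function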
group_edge_betweenness	103	36	2.86	9.17	0.50
group_edge_betweenness	207	68	3.04	12.53	2.98
group_edge_betweenness	326	89	3.66	13.12	10.03
group_fast_greedy	103	5	20.60	8.17	0.17
group_fast_greedy	207	5	41.40	24.36	0.18
group_fast_greedy	326	6	54.33	34.77	0.19
group_infomap	103	1	103.00	NA	0.17
group_infomap	207	4	51.75	94.83	0.22
group_infomap	326	6	54.33	114.98	0.34
group_label_prop	103	1	103.00	NA	0.16
group_label_prop	207	1	207.00	NA	0.17
group_label_prop	326	1	326.00	NA	0.18
group_leading_eigen	103	4	25.75	9.57	0.17
group_leading_eigen	207	5	41.40	19.19	0.18
group_leading_eigen	326	7	46.57	35.15	0.22
group_louvain	103	5	20.60	12.14	0.16
group_louvain	207	8	25.88	14.11	0.17
group_louvain	326	9	36.22	19.08	0.18
group_spinglass	103	5	20.60	5.13	1.66
group_spinglass	207	8	25.88	13.38	4.04
group_spinglass	326	8	40.75	12.07	7.30
group_walktrap	103	103	1.00	0.00	0.16
group_walktrap	207	207	1.00	0.00	0.17
group_walktrap	326	326	1.00	0.00	0.17
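
The NA and 0.00 entries in the standard deviation column follow from how sd() behaves on group sizes. The sketch below uses hypothetical group labels to show the three cases: several groups of different sizes, a single group holding every node, and one group per node.

```r
# Hypothetical labels: six keywords split into three groups
membership <- c(1, 1, 1, 2, 2, 3)
group_sizes <- as.vector(table(membership))
avg_size <- mean(group_sizes)  # 2
sd_size  <- sd(group_sizes)    # 1

# One group containing all 103 nodes: sd() of a single value is NA
single_group_sd <- sd(as.vector(table(rep(1, 103))))

# Every node in its own group: all sizes equal 1, so sd() is 0
singleton_sd <- sd(as.vector(table(seq_len(103))))
```

A single group gives an undefined standard deviation, while all-singleton groups give a standard deviation of zero.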

sessionInfo()
#> R version 4.5.1 (2025-06-13 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#> 
#> Matrix products: default
#>   LAPACK version 3.12.1
#> 
#> locale:
#> [1] LC_COLLATE=C                               
#> [2] LC_CTYPE=Chinese (Simplified)_China.utf8   
#> [3] LC_MONETARY=Chinese (Simplified)_China.utf8
#> [4] LC_NUMERIC=C                               
#> [5] LC_TIME=Chinese (Simplified)_China.utf8    
#> 
#> time zone: Asia/Shanghai
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.49        
#>  [5] cachem_1.1.0      knitr_1.49        htmltools_0.5.8.1 rmarkdown_2.29   
#>  [9] lifecycle_1.0.4   cli_3.6.5         sass_0.4.9        jquerylib_0.1.4  
#> [13] compiler_4.5.1    rstudioapi_0.17.1 tools_4.5.1       evaluate_1.0.1   
#> [17] bslib_0.8.0       yaml_2.3.10       rlang_1.1.6       jsonlite_1.8.9
