The akc package classifies keywords automatically through community detection, and it can use different algorithms for the clustering. This vignette provides a benchmark that compares several algorithms on networks of different sizes.
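To make the idea of comparing algorithms concrete, here is a minimal sketch on a toy graph. It assumes the igraph package is installed and that the group_* functions benchmarked in this vignette correspond to igraph's cluster_* algorithms; that correspondence is an assumption, not something this vignette confirms. The graph itself is hypothetical.

```r
library(igraph)

# Hypothetical toy graph: two triangles (A-B-C and D-E-F) joined by one bridge edge C-D
g <- graph_from_literal(A-B-C-A, D-E-F-D, C-D)

# Run two different community detection algorithms on the same graph
fg <- cluster_fast_greedy(g)  # greedy modularity optimisation
wt <- cluster_walktrap(g)     # based on short random walks

# Compare the number of communities and their sizes
length(fg)
sizes(fg)
length(wt)
sizes(wt)
```

Different algorithms can partition the same network differently, which is why the benchmark records the number of groups and the group sizes alongside the running time.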
First, we’ll load the needed packages.
library(akc)
library(dplyr)
library(stringr)  # provides str_extract(), used below
Then, we prepare the data. The built-in data table bibli_data_table is used here.
bibli_data_table %>%
  keyword_clean() %>%
  keyword_merge() -> clean_data
Next, we define the combinations of network sizes and community detection algorithms to be tested:
100:300 -> topn_sample
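# collect the exported functions whose names start with "group",
# then drop the three that are excluded from the benchmark
ls("package:akc") %>%
  str_extract("^group.+") %>%
  na.omit() %>%
  setdiff(c("group_biconnected_component",
            "group_components",
            "group_optimal")) -> com_detect_fun_list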
Finally, we’ll implement the computation and record the results.
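Before the loop, it may help to see how algorithm names are selected and then turned into functions. The following is a minimal base R sketch: the vector exported is a hypothetical stand-in for the output of ls("package:akc"), and toupper is used only to illustrate the lookup with get().

```r
# Hypothetical export list standing in for ls("package:akc")
exported <- c("keyword_clean", "group_louvain", "group_components",
              "keyword_group", "group_optimal", "group_walktrap")

# Keep the names that start with "group", then drop the excluded ones
candidates <- grep("^group.+", exported, value = TRUE)
fun_names <- setdiff(candidates, c("group_components", "group_optimal"))
fun_names

# get() resolves a name given as a string to the object bound to that name,
# so a character vector of function names can drive a loop
f <- get("toupper")
f("louvain")
```

The benchmark loop uses the same lookup, get(i), to pass each algorithm to keyword_group() by name.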
all = tibble()
for(i in com_detect_fun_list){
  for(j in topn_sample){
    # time the grouping step; the grouped table is reused for the statistics below
    system.time({
      clean_data %>%
        keyword_group(top = j,com_detect_fun = get(i)) %>%
        as_tibble -> grouped_network_table
    }) %>% na.omit -> time_info
    # number of nodes and number of groups
    grouped_network_table %>% nrow -> node_no
    grouped_network_table %>% distinct(group) %>% nrow -> group_no
    # mean and standard deviation of the group sizes
    grouped_network_table %>%
      count(group) %>%
      summarise(mean(n)) %>%
      .[[1]] -> group_avg_node_no
    grouped_network_table %>%
      count(group) %>%
      summarise(sd(n)) %>%
      .[[1]] -> group_sd_node_no
    # append one row per (algorithm, network size) combination
    c(com_detect_fun = i,
      topn = j,
      node_no = node_no,group_no = group_no,
      avg = group_avg_node_no,
      sd = group_sd_node_no,time_info[1:3]) %>%
      bind_rows(all,.) -> all
  }
}
res = all %>%
  mutate_at(2:9,function(x) as.numeric(x) %>% round(2)) %>%
  distinct(com_detect_fun,node_no,.keep_all = TRUE) %>%
  select(-topn,-contains("self")) %>%
  setNames(c("com_detect_fun","No. of total nodes","No. of total groups",
             "Average node number in each group","Standard deviation of node number",
             "Computer running time for keyword_group function"))
The results are displayed in the following table.
knitr::kable(res)
com_detect_fun | No. of total nodes | No. of total groups | Average node number in each group | Standard deviation of node number | Computer running time for keyword_group function |
---|---|---|---|---|---|
group_edge_betweenness | 103 | 36 | 2.86 | 9.17 | 0.50 |
group_edge_betweenness | 207 | 68 | 3.04 | 12.53 | 2.98 |
group_edge_betweenness | 326 | 89 | 3.66 | 13.12 | 10.03 |
group_fast_greedy | 103 | 5 | 20.60 | 8.17 | 0.17 |
group_fast_greedy | 207 | 5 | 41.40 | 24.36 | 0.18 |
group_fast_greedy | 326 | 6 | 54.33 | 34.77 | 0.19 |
group_infomap | 103 | 1 | 103.00 | NA | 0.17 |
group_infomap | 207 | 4 | 51.75 | 94.83 | 0.22 |
group_infomap | 326 | 6 | 54.33 | 114.98 | 0.34 |
group_label_prop | 103 | 1 | 103.00 | NA | 0.16 |
group_label_prop | 207 | 1 | 207.00 | NA | 0.17 |
group_label_prop | 326 | 1 | 326.00 | NA | 0.18 |
group_leading_eigen | 103 | 4 | 25.75 | 9.57 | 0.17 |
group_leading_eigen | 207 | 5 | 41.40 | 19.19 | 0.18 |
group_leading_eigen | 326 | 7 | 46.57 | 35.15 | 0.22 |
group_louvain | 103 | 5 | 20.60 | 12.14 | 0.16 |
group_louvain | 207 | 8 | 25.88 | 14.11 | 0.17 |
group_louvain | 326 | 9 | 36.22 | 19.08 | 0.18 |
group_spinglass | 103 | 5 | 20.60 | 5.13 | 1.66 |
group_spinglass | 207 | 8 | 25.88 | 13.38 | 4.04 |
group_spinglass | 326 | 8 | 40.75 | 12.07 | 7.30 |
group_walktrap | 103 | 103 | 1.00 | 0.00 | 0.16 |
group_walktrap | 207 | 207 | 1.00 | 0.00 | 0.17 |
group_walktrap | 326 | 326 | 1.00 | 0.00 | 0.17 |
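Two details help when reading the table. The running time column is the elapsed time in seconds reported by system.time(), because the code keeps the first three timing entries and then drops those whose names contain "self". The NA entries in the standard deviation column appear in rows with a single group, since the standard deviation of a single group size is undefined. The sketch below illustrates both points with base R only; the timed expression and the membership vector are hypothetical stand-ins, not data from the benchmark.

```r
# Timing: system.time() returns user, system, and elapsed times in seconds.
# The sorted vector is an arbitrary stand-in for the keyword_group() call.
timing <- system.time({
  sorted <- sort(runif(1e5))
})
kept <- timing[1:3]  # user.self, sys.self, elapsed
names(kept)

# Dropping the entries whose names contain "self" leaves only the elapsed time
elapsed_only <- kept[!grepl("self", names(kept))]
names(elapsed_only)

# Group sizes: hypothetical membership vector with six nodes in three groups
membership <- c(1, 1, 1, 2, 2, 3)
group_sizes <- as.vector(table(membership))
avg_size <- mean(group_sizes)
sd_size <- sd(group_sizes)

# With a single group there is only one size, so sd() returns NA
single_group_sd <- sd(103)
is.na(single_group_sd)
```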
The session information is shown below:
sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=Chinese (Simplified)_China.utf8
#> [3] LC_MONETARY=Chinese (Simplified)_China.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=Chinese (Simplified)_China.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.29 R6_2.5.1 jsonlite_1.8.0 magrittr_2.0.3
#> [5] evaluate_0.16 highr_0.9 stringi_1.7.8 cachem_1.0.6
#> [9] rlang_1.0.4 cli_3.3.0 rstudioapi_0.13 jquerylib_0.1.4
#> [13] bslib_0.4.0 rmarkdown_2.14 tools_4.2.1 stringr_1.4.0
#> [17] xfun_0.32 yaml_2.3.5 fastmap_1.1.0 compiler_4.2.1
#> [21] htmltools_0.5.3 knitr_1.39 sass_0.4.2