CiteSource provides three custom metadata fields for labeling
citation records: cite_source, cite_label, and
cite_string. Most workflows use cite_source to
identify the database and cite_label to track the review
stage (search, screened, final). The cite_string field
provides a third dimension for cases where you need to distinguish
between variations of a search strategy within the same source.
The most common use case is within-source string
comparison: you are testing multiple query formulations in a
single database before finalizing your search strategy, and you want to
compare how each performs without conflating the query variation with
the source identity. Encoding the variations as separate
cite_source values would work, but it loses the ability to
aggregate results at the database level. Using cite_string
keeps the database identity intact while enabling a separate axis of
analysis.
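To make the two axes concrete, here is a minimal base R sketch on made-up data. The record IDs and the extra "Scopus" row are purely hypothetical and are not part of the example that follows. It only illustrates why keeping the database in cite_source and the query variant in cite_string lets you summarise at either level.

```r
# Hypothetical toy data: one row per imported record (values are illustrative only).
records <- data.frame(
  record_id   = c("r1", "r2", "r3", "r2", "r4", "r1"),
  cite_source = c("WoS", "WoS", "WoS", "WoS", "WoS", "Scopus"),
  cite_string = c("string 1", "string 1", "string 2", "string 2", "string 2", "string 1"),
  stringsAsFactors = FALSE
)

# Database-level view: distinct records per source, ignoring query variation.
distinct_by_source <- tapply(records$record_id, records$cite_source,
                             function(x) length(unique(x)))

# String-level view within one database: distinct records per query variation.
wos <- records[records$cite_source == "WoS", ]
distinct_by_string <- tapply(wos$record_id, wos$cite_string,
                             function(x) length(unique(x)))

distinct_by_source
distinct_by_string
```

Had the two WoS strings been encoded as separate cite_source values, the database-level count would have to be rebuilt by hand from the string-level groups.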
In this example, five search strings were run in Web of Science. We
use cite_source to record the database and
cite_string to label each query variation, then compare
their performance against a set of benchmark studies.
file_path <- "../vignettes/new_benchmark_data/"
citation_files <- list.files(path = file_path, pattern = "\\.ris", full.names = TRUE)
citation_files
#> [1] "../vignettes/new_benchmark_data/benchmark_15.ris"
#> [2] "../vignettes/new_benchmark_data/search1_166.ris"
#> [3] "../vignettes/new_benchmark_data/search2_278.ris"
#> [4] "../vignettes/new_benchmark_data/search3_302.ris"
#> [5] "../vignettes/new_benchmark_data/search4_460.ris"
#> [6] "../vignettes/new_benchmark_data/search5_495.ris"
The key difference from a standard import: cite_source
is the same database (“WoS”) for all search strings, while
cite_string differentiates the query variations. The
benchmark file gets cite_source = NA and
cite_label = "benchmark".
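With only six files the metadata table is easy to type by hand, as the vignette does next. With many query variants it could instead be derived from the file names. The sketch below is one possible way to do that in base R. It assumes the `searchN_count.ris` naming pattern seen in the listing above, and the object name `metadata` is hypothetical.

```r
# File names as listed above (directory prefix omitted for brevity).
files <- c("benchmark_15.ris", "search1_166.ris", "search2_278.ris",
           "search3_302.ris", "search4_460.ris", "search5_495.ris")

# Assumption: search files are named "search<N>_<count>.ris".
is_search <- grepl("^search[0-9]+_", files)
string_no <- sub("^search([0-9]+)_.*$", "\\1", files)

# Same columns as the hand-written table: the database is constant for
# search files, and only cite_strings varies.
metadata <- data.frame(
  files        = files,
  cite_sources = ifelse(is_search, "WoS", NA_character_),
  cite_labels  = ifelse(is_search, "search", "benchmark"),
  cite_strings = ifelse(is_search, paste("string", string_no), NA_character_),
  stringsAsFactors = FALSE
)

metadata
```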
imported_tbl <- tibble::tribble(
  ~files,             ~cite_sources, ~cite_labels, ~cite_strings,
  "benchmark_15.ris", NA,            "benchmark",  NA,
  "search1_166.ris",  "WoS",         "search",     "string 1",
  "search2_278.ris",  "WoS",         "search",     "string 2",
  "search3_302.ris",  "WoS",         "search",     "string 3",
  "search4_460.ris",  "WoS",         "search",     "string 4",
  "search5_495.ris",  "WoS",         "search",     "string 5"
) |>
  dplyr::mutate(files = paste0(file_path, files))
raw_citations <- read_citations(metadata = imported_tbl, verbose = FALSE)
#> Note: the following cite_label value(s) are not in the standard vocabulary (search / screened / final): benchmark. Phase-analysis functions expect these exact labels.
unique_citations <- dedup_citations(raw_citations)
#> formatting data...
#> identifying potential duplicates...
#> identified duplicates!
#> flagging potential pairs for manual dedup...
#> 1716 citations loaded...
#> 1217 duplicate citations removed...
#> 499 unique citations remaining!
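As a quick sanity check on the deduplication log, the counts can be reconciled by hand. The sketch below assumes that the numeric suffix in each file name is that file's record count. This is an inference from the names, not something the package states.

```r
files <- c("benchmark_15.ris", "search1_166.ris", "search2_278.ris",
           "search3_302.ris", "search4_460.ris", "search5_495.ris")

# Assumption: the number before ".ris" is the record count of that file.
n_per_file <- as.integer(sub("^.*_([0-9]+)\\.ris$", "\\1", files))

n_loaded    <- sum(n_per_file)        # compare with "citations loaded" in the log
n_removed   <- 1217L                  # "duplicate citations removed" from the log
n_remaining <- n_loaded - n_removed   # compare with "unique citations remaining"

# Share of loaded records that were duplicates, in percent.
dup_rate <- round(100 * n_removed / n_loaded, 1)

c(loaded = n_loaded, remaining = n_remaining, dup_rate = dup_rate)
```

Under that assumption the totals match the log (1716 loaded, 499 remaining), and roughly 71% of loaded records are duplicates. That is unsurprising for five variants of one query run in the same database.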
n_unique <- count_unique(unique_citations)
# Compare by string rather than source
string_comparison <- compare_sources(unique_citations, comp_type = "strings")
initial_records <- calculate_initial_records(unique_citations)
create_initial_record_table(initial_records)
Record Counts
| | Records Imported¹ | Distinct Records² |
|---|---|---|
| WoS | 1701 | 495 |
| NA | 4 | 4 |
| Total | 1705 | 499 |
¹ Number of records imported from each source.
² Number of records after internal source deduplication.
The upset plot shows how records are distributed across string combinations. This tells you which strings are finding records the others miss and how much overlap exists between query variations.
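What an upset plot counts can be sketched without any plotting: each bar is the number of records found by one exact combination of strings. The example below uses a small, entirely hypothetical membership matrix (not the vignette's data) and base R only.

```r
# Hypothetical membership matrix: rows are deduplicated records, columns are
# search strings; TRUE means the string retrieved the record.
membership <- matrix(
  c(TRUE,  TRUE,  FALSE,
    TRUE,  FALSE, FALSE,
    TRUE,  TRUE,  TRUE,
    FALSE, FALSE, TRUE,
    TRUE,  TRUE,  FALSE),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("rec", 1:5), c("string 1", "string 2", "string 3"))
)

# Label each record by the exact set of strings that found it.
combo <- apply(membership, 1, function(found) {
  paste(colnames(membership)[found], collapse = " & ")
})

# One count per exact combination, i.e. one bar in an upset plot.
combo_counts <- sort(table(combo), decreasing = TRUE)
combo_counts
```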
plot_contributions() shows unique and shared record
counts for each string. Strings with a high proportion of unique records
are contributing coverage that the other strings miss; strings with
mostly shared records may be redundant.
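The unique versus shared split can be illustrated on toy data as well. This is a sketch of the idea only, not of how plot_contributions() computes its values internally. The membership matrix is hypothetical.

```r
# Hypothetical record-by-string membership (TRUE = string retrieved the record).
membership <- cbind(
  "string 1" = c(TRUE,  TRUE,  TRUE,  FALSE, TRUE),
  "string 2" = c(TRUE,  FALSE, TRUE,  FALSE, TRUE),
  "string 3" = c(FALSE, FALSE, TRUE,  TRUE,  FALSE)
)

# Number of strings that found each record.
found_by <- rowSums(membership)

# Per string: records only it found vs. records also found by another string.
unique_n <- colSums(membership & found_by == 1)
shared_n <- colSums(membership & found_by > 1)

rbind(unique = unique_n, shared = shared_n)
```

In this toy case "string 2" contributes no unique records, which is the pattern that would flag a query variant as potentially redundant.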
Filtering to the benchmark records and using the record-level table shows exactly which benchmark studies each string found — and which were missed entirely.
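Independently of the package functions, the underlying check is a set comparison between the benchmark studies and what each string retrieved. The sketch below uses hypothetical IDs to show per-string benchmark recall and the benchmark studies that no string found.

```r
# Hypothetical identifiers: "b*" are benchmark studies, "x*" are other records.
benchmark <- c("b1", "b2", "b3", "b4")
retrieved <- list(
  "string 1" = c("b1", "x1", "x2"),
  "string 2" = c("b1", "b2", "x3"),
  "string 3" = c("b2", "b3", "x1", "x4")
)

# Benchmark-by-string matrix: TRUE where the string retrieved the study.
found <- sapply(retrieved, function(ids) benchmark %in% ids)
rownames(found) <- benchmark

# Share of benchmark studies each string retrieved.
recall <- colMeans(found)

# Benchmark studies that no string retrieved.
missed <- benchmark[rowSums(found) == 0]

found
recall
missed
```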
detailed_records <- calculate_detailed_records(unique_citations, n_unique)
create_detailed_record_table(detailed_records)
Record Summary
| | Records Imported¹ | Distinct Records² | Unique Records³ | Non-unique Records⁴ | Source Contribution %⁵ | Source Unique Contribution %⁶ | Source Unique %⁷ |
|---|---|---|---|---|---|---|---|
| WoS | 1701 | 495 | 1701 | -1206 | 99.2% | 100.0% | 343.6% |
| NA | 4 | 4 | NA | NA | 0.8% | NA | NA |
| Total | 1705 | 499⁸ | 1701 | -1206 | NA | NA | NA |
¹ Number of raw records imported from each database.
² Number of records after internal source deduplication.
³ Number of records not found in another source.
⁴ Number of records found in at least one other source.
⁵ Percent distinct records contributed to the total number of distinct records.
⁶ Percent of unique records contributed to the total unique records.
⁷ Percentage of records that were unique from each source.
⁸ Total citations discovered (after internal and cross-source deduplication).
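The percentage columns appear to be consistent with simple ratios of the printed counts. The sketch below reproduces them by hand. The formulas are inferred from the numbers in the table, not taken from the package documentation, so treat them as an assumption.

```r
# Counts copied from the Record Summary table.
distinct       <- c(WoS = 495, "NA" = 4)
total_distinct <- 499
wos_unique     <- 1701   # "Unique Records" printed for WoS

# Source Contribution %: distinct records per source over all distinct records.
contribution_pct <- round(100 * distinct / total_distinct, 1)

# Source Unique %: unique records over distinct records for the source.
source_unique_pct <- round(100 * wos_unique / distinct[["WoS"]], 1)

# Non-unique Records: distinct minus unique.
non_unique <- distinct[["WoS"]] - wos_unique

contribution_pct
source_unique_pct
non_unique
```

The Source Unique % above 100% and the negative non-unique count both follow from the Unique Records value (1701) exceeding Distinct Records (495). With every search sharing one cite_source, these source-level columns are probably best read with caution.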
| Scenario | Recommended field |
|---|---|
| Different databases (PubMed, Scopus, WoS) | cite_source |
| Same database, different query variations | cite_string |
| Hand searching, citation chasing alongside database searches | cite_string (method) + cite_source (target) |
| Tracking records through review stages | cite_label |
For most reviews, cite_source and
cite_label are sufficient. cite_string becomes
valuable when you are doing pre-search validation with multiple query
variants, or when you want to distinguish supplementary search methods
from the primary database searches while keeping both associated with
the same source.