parse_names() is an optional, standalone utility for cleaning author name strings. It does two things:

- Reformats each name as "First Last" (or other styles).
- Splits each name into components (first, last, particle, suffix), returned as the "parts" attribute.

It is not called by any reader or network builder. bibnets matches entity labels verbatim; you opt in to normalisation by calling this function yourself.

Bibliometric exports use three incompatible conventions.
parse_names() recognises all three; the rule is decided per
string.

| Input | Convention | Detected by |
|---|---|---|
| "Saqr, Mohammed" | Last, First | the comma |
| "WANG Y" | SURNAME Initials (Scopus/bibnets) | trailing uppercase 1–3 letter token |
| "Mohammed Saqr" | First Last | default for comma-less, non-initial |

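
The detection rules in the table can be approximated in a few lines of base R. This is only a sketch of the decision order (comma first, then trailing initials, then the default), not the package's implementation; `guess_convention()` is a hypothetical helper invented for illustration.

```r
# Hypothetical helper for illustration only; not part of bibnets.
guess_convention <- function(x) {
  # Rule 1: any comma means "Last, First".
  has_comma <- grepl(",", x, fixed = TRUE)
  # Rule 2: trailing token of 1-3 uppercase ASCII letters, e.g. the "Y" in "WANG Y".
  trailing_initials <- grepl("\\s[A-Z]{1,3}$", x, perl = TRUE)
  # Rule 3: everything else is treated as "First Last".
  ifelse(has_comma, "last_first",
         ifelse(trailing_initials, "surname_initials", "first_last"))
}

conventions <- guess_convention(
  c("Saqr, Mohammed", "WANG Y", "Mohammed Saqr", "MOHAMMED LI"))
conventions
# last_first, surname_initials, first_last, surname_initials
```

The last input shows the cost of the rule: an uppercase given-first name with a short surname is classified as surname-first.
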
parse_names(c("Saqr, Mohammed", "WANG Y", "Mohammed Saqr"))
#> [1] "Mohammed Saqr" "Y WANG"        "Mohammed Saqr"
#> attr(,"parts")
#>         original    first last particle suffix   type
#> 1 Saqr, Mohammed Mohammed Saqr     <NA>   <NA> person
#> 2         WANG Y        Y WANG     <NA>   <NA> person
#> 3  Mohammed Saqr Mohammed Saqr     <NA>   <NA> person

A comma always means Last, First. For comma-less strings
the surname_first argument controls interpretation:

- "auto" (default): surname-first if and only if the trailing token looks like initials (all-uppercase, 1–3 letters). This is the bibnets-takes-precedence bias: native bibnets/Scopus labels parse correctly with no extra arguments, and ordinary mixed-case "First Last" is never misread.
- "yes" / TRUE: force surname-first.
- "no" / FALSE: force given-first (comma-less strings are returned unchanged).

parse_names("Wang Yong", surname_first = "yes") # force surname-first
#> [1] "Yong Wang"
#> attr(,"parts")
#>    original first last particle suffix   type
#> 1 Wang Yong  Yong Wang     <NA>   <NA> person
parse_names("WANG Y", surname_first = "no") # force given-first
#> [1] "WANG Y"
#> attr(,"parts")
#>   original first last particle suffix   type
#> 1   WANG Y  WANG    Y     <NA>   <NA> person

Particles and suffixes are handled, and detection is case-insensitive so it works on bibnets’ upper-cased labels:

parse_names(c("van der Berg, Jan", "Smith, John, Jr.",
"DE LA CRUZ, ANA", "VAN DER BERG J"))
#> [1] "Jan van der Berg" "John Smith Jr"    "ANA DE LA CRUZ"   "J VAN DER BERG"
#> attr(,"parts")
#>            original first  last particle suffix   type
#> 1 van der Berg, Jan   Jan  Berg  van der   <NA> person
#> 2  Smith, John, Jr.  John Smith     <NA>     Jr person
#> 3   DE LA CRUZ, ANA   ANA  CRUZ    DE LA   <NA> person
#> 4    VAN DER BERG J     J  BERG  VAN DER   <NA> person

Group / corporate authors, NA, and empty strings are left untouched.

The format argument selects the output style:

nm <- c("Saqr, Mohammed", "van der Berg, Jan", "Garcia Marquez, Gabriel Jose")
data.frame(
  first_last    = parse_names(nm),
  last_initials = parse_names(nm, format = "last_initials"),
  last          = parse_names(nm, format = "last")
)
#>                    first_last       last_initials           last
#> 1               Mohammed Saqr             Saqr M.           Saqr
#> 2            Jan van der Berg     van der Berg J.   van der Berg
#> 3 Gabriel Jose Garcia Marquez Garcia Marquez G.J. Garcia Marquez

The "parts" attribute

The parsed components ride along on every call, independent of
format:
x <- parse_names(c("van der Berg, Jan", "Smith, John, Jr."))
attr(x, "parts")
#>            original first  last particle suffix   type
#> 1 van der Berg, Jan   Jan  Berg  van der   <NA> person
#> 2  Smith, John, Jr.  John Smith     <NA>     Jr person

type is one of "person", "organization", "empty", "missing".

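
Because the components arrive as an ordinary data frame, they can be recombined into label styles that format does not offer. The sketch below does not call bibnets: it uses a hand-built stand-in for the "parts" data frame, with column names copied from the output above, to show one way to build "Surname, I." labels and to confirm that as.vector() drops the attribute.

```r
# Stand-in for attr(x, "parts"); values copied from the printed output above.
parts <- data.frame(
  original = c("van der Berg, Jan", "Smith, John, Jr."),
  first    = c("Jan", "John"),
  last     = c("Berg", "Smith"),
  particle = c("van der", NA),
  suffix   = c(NA, "Jr"),
  type     = c("person", "person"),
  stringsAsFactors = FALSE
)

# Re-attach the particle to the surname where one exists.
surname <- ifelse(is.na(parts$particle),
                  parts$last,
                  paste(parts$particle, parts$last))

# Custom label style: "Surname, I."
labels <- paste0(surname, ", ", substr(parts$first, 1, 1), ".")
labels
# "van der Berg, J." and "Smith, J."

# A character vector carrying a "parts" attribute, like parse_names() output.
x <- structure(c("Jan van der Berg", "John Smith Jr"), parts = parts)
plain <- as.vector(x)  # as.vector() strips all attributes from an atomic vector
is.null(attr(plain, "parts"))
# TRUE
```
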
parse_names() works on one flat character
vector. It is not a data-frame function.
bibnets readers store authors as a list-column: each paper has a variable number of authors, so the cell holds a vector, not a single string.
papers <- data.frame(id = c("P1", "P2", "P3"), stringsAsFactors = FALSE)
papers$authors <- list(
  c("Saqr, Mohammed", "Lopez, Ana"),
  c("SAQR M", "Lopez, Ana"),
  c("Saqr, Mohammed", "Chen, Wei"))
papers$authors
#> [[1]]
#> [1] "Saqr, Mohammed" "Lopez, Ana"
#>
#> [[2]]
#> [1] "SAQR M"     "Lopez, Ana"
#>
#> [[3]]
#> [1] "Saqr, Mohammed" "Chen, Wei"

Map the function over the list-column with lapply():

papers$authors <- lapply(papers$authors, parse_names,
format = "last_initials")
papers$authors
#> [[1]]
#> [1] "Saqr M."  "Lopez A."
#> attr(,"parts")
#>         original    first  last particle suffix   type
#> 1 Saqr, Mohammed Mohammed  Saqr     <NA>   <NA> person
#> 2     Lopez, Ana      Ana Lopez     <NA>   <NA> person
#>
#> [[2]]
#> [1] "SAQR M."  "Lopez A."
#> attr(,"parts")
#>     original first  last particle suffix   type
#> 1     SAQR M     M  SAQR     <NA>   <NA> person
#> 2 Lopez, Ana   Ana Lopez     <NA>   <NA> person
#>
#> [[3]]
#> [1] "Saqr M." "Chen W."
#> attr(,"parts")
#>         original    first last particle suffix   type
#> 1 Saqr, Mohammed Mohammed Saqr     <NA>   <NA> person
#> 2      Chen, Wei      Wei Chen     <NA>   <NA> person

A flat character column (or a network’s from / to) can be passed to parse_names() directly, with no lapply().

Node identity in bibnets is fixed when the network is built (labels
are upper-cased and matched verbatim). Two spellings of one author merge
into a single node only if normalised before
author_network().
Here "Saqr, Mohammed" and "SAQR M" are the
same person written two ways. After normalising they both become
SAQR M., so the Saqr–Lopez collaboration is correctly
counted as 2:
net <- author_network(papers, type = "collaboration")
net
#> # bibnets network: author_collaboration | 3 nodes · 2 edges | counting: full
#>       from      to weight count
#> 1 LOPEZ A. SAQR M.      2     2
#> 2  CHEN W. SAQR M.      1     1

Had we built the network first and called parse_names() on from / to afterwards, the two spellings would already have been counted as two separate nodes, too late to merge by relabelling.

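
The effect of the ordering can be reproduced without bibnets. The base-R sketch below counts co-author pairs on the same three papers, first on the raw labels and then on normalised ones. Both `to_last_initial()` and `pair_counts()` are hypothetical stand-ins written for this illustration: the first is a toy substitute for parse_names(format = "last_initials") that only handles the two spellings used here, and the second only mimics verbatim, upper-cased matching of labels.

```r
papers <- list(
  c("Saqr, Mohammed", "Lopez, Ana"),
  c("SAQR M", "Lopez, Ana"),
  c("Saqr, Mohammed", "Chen, Wei"))

# Toy normaliser (hypothetical): handles "Last, First" and "SURNAME I" only.
to_last_initial <- function(x) {
  has_comma <- grepl(",", x, fixed = TRUE)
  last  <- ifelse(has_comma, sub(",.*$", "", x), sub("\\s+\\S+$", "", x))
  given <- ifelse(has_comma, sub("^[^,]*,\\s*", "", x), sub("^.*\\s+", "", x))
  paste0(last, " ", substr(given, 1, 1), ".")
}

# Count unordered co-author pairs, matching upper-cased labels verbatim.
pair_counts <- function(authors) {
  pairs <- unlist(lapply(authors, function(a) {
    a <- sort(unique(toupper(a)))
    if (length(a) < 2) return(character(0))
    apply(combn(a, 2), 2, paste, collapse = " | ")
  }))
  table(pairs)
}

raw_counts  <- pair_counts(papers)                           # no normalisation
norm_counts <- pair_counts(lapply(papers, to_last_initial))  # normalise first

length(raw_counts)   # 3 distinct pairs, each seen once
length(norm_counts)  # 2 distinct pairs
as.vector(norm_counts["LOPEZ A. | SAQR M."])  # 2
```

On the raw labels the two spellings of Saqr produce two different pairs with Lopez, each counted once; after normalising they collapse into one pair counted twice.
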
The network object is a data frame (from, to, weight, count) with an extra bibnets_network class for printing.

You can relabel from / to directly, but parse_names() is graph-blind. Edges, pairing, weight and count are preserved, but:

- distinct authors can collapse onto the same label (e.g. with "last_initials"), and
- bibnets does not re-aggregate the resulting duplicate edges.

net$from <- as.vector(parse_names(net$from, format = "last"))
net$to <- as.vector(parse_names(net$to, format = "last"))
net
#> # bibnets network: author_collaboration | 3 nodes · 2 edges | counting: full
#>    from   to weight count
#> 1 LOPEZ SAQR      2     2
#> 2  CHEN SAQR      1     1

Use as.vector() when assigning back so the "parts" attribute is not carried on the column.

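
If relabelling does leave duplicate edges, they can be merged by hand. The sketch below uses a plain data frame with made-up rows as a stand-in for a network object, and a simple sub() call in place of parse_names(). Summing weight and count is an assumption that suits full counting; other counting schemes may need a different rule, and undirected edges stored in both orientations would need their endpoints ordered first.

```r
# Made-up edge list standing in for a bibnets network (illustration only).
edges <- data.frame(
  from   = c("LOPEZ A.", "LOPEZ M.", "CHEN W."),
  to     = c("SAQR M.", "SAQR M.", "SAQR M."),
  weight = c(2, 1, 1),
  count  = c(2, 1, 1),
  stringsAsFactors = FALSE
)

# Relabel to surname only; a stand-in for parse_names(..., format = "last").
edges$from <- sub(" .*$", "", edges$from)
edges$to   <- sub(" .*$", "", edges$to)

# "LOPEZ A." and "LOPEZ M." now share a label, giving two LOPEZ-SAQR rows.
# Merge duplicate (from, to) pairs by summing their weight and count.
merged <- aggregate(cbind(weight, count) ~ from + to, data = edges, FUN = sum)
merged
```

The example also shows the risk named above: two different authors called Lopez are silently merged once only the surname is kept.
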
- The auto heuristic is biased toward the bibnets/Scopus surname-first convention and may misread uppercase "GIVEN SURNAME" when the surname is 1–3 letters (e.g. "MOHAMMED LI"). Pass surname_first = "no" to override.
- An inverted name that leads with the suffix (e.g. "Jr., Sammy Davis") is not specially handled.

- parse_names(x): vector in, vector out, with a "parts" attribute.
- lapply(df$authors, parse_names): for the authors list-column.
- Normalise before author_network() for correct node merging.
- format = "first_last" / "last_initials" / "last"; surname_first = "auto" / "yes" /
"no".