This package aggregates the lexicogrammatical and functional features described by Biber (1985) and widely used for text-type, register, and genre classification tasks.
The scripts are not really taggers. Rather, they use either udpipe or spaCy (via spacyr) part-of-speech tagging and dependency parsing to summarize and aggregate patterns.
Because they rely on existing part-of-speech tagging, the accuracy of the resulting counts depends on the accuracy of the tagging. Thus, texts with irregular spellings, non-normative punctuation, etc. will likely produce unreliable outputs.
The package provides one function, biber(), which takes either udpipe- or spacyr-tagged text and produces a data frame of features for each document.
For example,
library(spacyr)
library(pseudobibeR)

# Start spaCy with the small English model
spacy_initialize(model = "en_core_web_sm")

# Parse one named document, then aggregate its features
features <- biber(
  spacy_parse(
    c("doc_1" = "The task was done by Steve"),
    dependency = TRUE,
    tag = TRUE,
    pos = TRUE
  )
)
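Since biber() also accepts udpipe-tagged text, the same sentence could be processed with udpipe instead. The following is a minimal sketch, not taken from the package itself. It assumes that the udpipe package is installed, that the default English model can be downloaded (here into a temporary directory), and that the output of udpipe_annotate() can be passed to biber() directly.

```r
library(udpipe)
library(pseudobibeR)

# Assumption: fetch the default English model into a temporary directory.
model_info <- udpipe_download_model(language = "english", model_dir = tempdir())
ud_model <- udpipe_load_model(file = model_info$file_model)

# Tag and dependency-parse one document, then aggregate its features.
features <- biber(
  udpipe_annotate(
    ud_model,
    x = "The task was done by Steve",
    doc_id = "doc_1"
  )
)
```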
pseudobibeR uses testthat for unit testing. To avoid having to distribute spaCy or udpipe models for tests (these models can be many megabytes), the tests use saved output. Specifically, in the tests/testthat/text-samples/ directory, samples.tsv contains sample sentences for tests. Each line contains a document ID and then a sample text, separated by a tab character. parse-samples.R can be run to parse these sentences and save them to an RDS file.

If you update samples.tsv, you must run parse-samples.R to get the new parsed sentences.
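The read-parse-save workflow described above can be sketched in base R. This is an illustration only and not the contents of parse-samples.R: the helper names read_samples() and cache_parsed_samples() are hypothetical, and the sketch assumes the TSV file has no header row. The parser is passed in as a function so that the file handling can be tried without a spaCy or udpipe model.

```r
# Hypothetical helper: read a two-column, tab-separated sample file
# (document ID, then text) into a named character vector.
# Assumption: the file has no header row.
read_samples <- function(path) {
  samples <- read.delim(
    path,
    header = FALSE,
    col.names = c("doc_id", "text"),
    quote = "",
    stringsAsFactors = FALSE
  )
  setNames(samples$text, samples$doc_id)
}

# Hypothetical helper: parse the samples with a caller-supplied function
# and save the result to an RDS file for later use in tests.
cache_parsed_samples <- function(tsv_path, rds_path, parse_fn) {
  parsed <- parse_fn(read_samples(tsv_path))
  saveRDS(parsed, rds_path)
  invisible(parsed)
}
```

With spacyr initialized, parse_fn could be something like function(x) spacyr::spacy_parse(x, dependency = TRUE, tag = TRUE, pos = TRUE), and a test could then load the saved parse with readRDS().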