Title: Data for Morpheme Tokenization
Version: 1.2.0
Description: Provides data about morphemes, the smallest units of meaning in a language.
License: Apache License (≥ 2)
Encoding: UTF-8
RoxygenNote: 7.1.2
URL: https://github.com/macmillancontentscience/morphemepiece.data
BugReports: https://github.com/macmillancontentscience/morphemepiece.data/issues
Suggests: testthat (≥ 3.0.0)
Depends: R (≥ 3.5.0)
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2022-04-18 17:19:26 UTC; jonth
Author: Jonathan Bratt [aut], Jon Harmon [aut, cre], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]
Maintainer: Jon Harmon <jonthegeek@gmail.com>
Repository: CRAN
Date/Publication: 2022-04-18 17:42:28 UTC

Generate the inst path

Description

Generate the inst path

Usage

.get_path(filetype, n_tokens)

Arguments

filetype

Character scalar; the type of file, like "lookup" or "vocab".

n_tokens

Integer scalar; the number of tokens used for that file.

Value

Character scalar; the path to the file.


Load an RDS from inst Dir

Description

Load an RDS from inst Dir

Usage

.load_inst_rds(filetype, n_tokens)

Arguments

filetype

Character scalar; the type of file, like "lookup" or "vocab".

n_tokens

Integer scalar; the number of tokens used for that file.

Value

The R object.
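
The two internal helpers above are documented only by their signatures, so the following is a rough sketch of how a path builder and an RDS loader could fit together, not the package's actual code. The function names `sketch_get_path()` and `sketch_load_rds()`, the extra `dir` argument, and the `<filetype>_<n_tokens>.rds` naming scheme are all hypothetical; the real helpers may locate and name files differently.

```r
# Hypothetical path builder: the file naming scheme is an assumption
# made for illustration, not the scheme used by .get_path().
sketch_get_path <- function(filetype, n_tokens, dir) {
  file.path(dir, paste0(filetype, "_", n_tokens, ".rds"))
}

# Hypothetical loader: read the RDS file found at the generated path.
sketch_load_rds <- function(filetype, n_tokens, dir) {
  readRDS(sketch_get_path(filetype, n_tokens, dir))
}

# Round trip in a temporary directory so nothing outside it is touched.
demo_dir <- tempfile("inst_sketch")
dir.create(demo_dir)
demo_path <- sketch_get_path("vocab", 2L, demo_dir)
saveRDS(c(cat = 0L, s = 1L), demo_path)

loaded <- sketch_load_rds("vocab", 2L, demo_dir)
```

Keeping path construction in one function means the loader never has to know how files are named, which is presumably why the package separates the two steps.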


Load a Morphemepiece Lookup

Description

A morphemepiece lookup is a named character vector. The names of the vector are the words, and the values are the space-separated morpheme breakdowns of those words.

Usage

morphemepiece_lookup()

Value

A named character vector.

Examples

head(morphemepiece_lookup())
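
To show how a vector of this shape can be used, here is a minimal sketch that works on a small made-up lookup rather than the packaged data. The words, the breakdowns, and the helper `split_morphemes()` are illustrative assumptions; the real lookup may mark morphemes differently.

```r
# Toy stand-in for the value of morphemepiece_lookup(): names are words,
# values are space-separated morphemes. These entries are invented.
toy_lookup <- c(
  unhappiness = "un happy ness",
  walked = "walk ed",
  cats = "cat s"
)

# Hypothetical helper: return the morphemes of one word as a character
# vector, or character(0) when the word is not in the lookup.
split_morphemes <- function(word, lookup) {
  breakdown <- unname(lookup[word])
  if (is.na(breakdown)) {
    return(character(0))
  }
  strsplit(breakdown, " ", fixed = TRUE)[[1]]
}

morphemes <- split_morphemes("unhappiness", toy_lookup)
missing_word <- split_morphemes("zebra", toy_lookup)
```

Indexing with `lookup[word]` instead of `lookup[[word]]` returns `NA` for an unknown word rather than raising an error, which makes the missing case easy to handle.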

Load a Morphemepiece Vocabulary

Description

A morphemepiece vocabulary is a named integer vector with class "morphemepiece_vocabulary". The names of the vector are the morphemes, and the values are the integer identifiers of those tokens. The vocabulary is 0-indexed for compatibility with Python implementations.

Usage

morphemepiece_vocab()

Value

A morphemepiece_vocabulary.

Examples

head(morphemepiece_vocab())
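
Because the identifiers start at 0 while R vectors are indexed from 1, an identifier cannot be used directly as a position. The sketch below illustrates this with a small made-up vocabulary; the entries are invented, and the toy vector is a plain named integer vector without the "morphemepiece_vocabulary" class.

```r
# Toy stand-in for the value of morphemepiece_vocab(): names are morphemes,
# values are 0-based integer identifiers. These entries are invented.
toy_vocab <- c("[PAD]" = 0L, un = 1L, happy = 2L, ness = 3L)

# Morphemes to identifiers: index by name.
ids <- unname(toy_vocab[c("un", "happy", "ness")])

# Identifiers to morphemes: match on the values rather than using the
# identifiers as positions, since position 0 does not exist in R.
tokens <- names(toy_vocab)[match(ids, toy_vocab)]

# Using identifier 0 as a position selects nothing.
nothing <- names(toy_vocab)[0L]

# Matching on the value recovers the morpheme with identifier 0.
pad_token <- names(toy_vocab)[match(0L, toy_vocab)]
```

Matching on values also avoids assuming that the identifiers are stored in consecutive order.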
