
Type: Package
Title: Google's Compact Language Detector 2
Version: 1.2.6
Description: Bindings to Google's C++ library Compact Language Detector 2 (see https://github.com/cld2owners/cld2#readme for more information). Probabilistically detects over 80 languages in plain text or HTML. For mixed-language input it returns the top three detected languages and their approximate proportion of the total classified text bytes (e.g. 80% English and 20% French out of 1000 bytes). There is also a 'cld3' package on CRAN which uses a neural network model instead.
License: Apache License 2.0
Encoding: UTF-8
URL: https://docs.ropensci.org/cld2/ https://ropensci.r-universe.dev/cld2
BugReports: https://github.com/ropensci/cld2/issues
Imports: Rcpp
LinkingTo: Rcpp
RoxygenNote: 6.0.1
Suggests: testthat, readtext, cld3
NeedsCompilation: yes
Packaged: 2025-03-22 19:58:34 UTC; jeroen
Author: Jeroen Ooms [aut, cre], Dirk Sites [cph] (Author of CLD2 C++ library)
Maintainer: Jeroen Ooms <jeroenooms@gmail.com>
Repository: CRAN
Date/Publication: 2025-03-22 20:30:17 UTC

Compact Language Detector 2

Description

The function detect_language() is vectorised: it guesses the language of each string in text, or returns NA if the language could not be reliably determined. The function detect_language_mixed() is not vectorised and analyses the entire character vector as a whole. Its output includes the top 3 detected languages with their relative proportions, and the total number of text bytes that were reliably classified.
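
The difference between the two functions can be sketched as follows. This is a minimal illustration, assuming the cld2 package is installed; the variable names per_string and whole are chosen here for illustration, and the exact structure of the second result is not spelled out above, so it is only inspected with str().

```r
library(cld2)

text <- c("To be or not to be?", "Ce n'est pas grave.")

# Vectorised: one guess (or NA) for each element of the input
per_string <- detect_language(text)
print(per_string)

# Not vectorised: the whole character vector is analysed as one document
whole <- detect_language_mixed(text)
str(whole)
```

The first call is the natural choice for a column of short, single-language strings; the second suits one longer document that may mix languages.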

Usage

detect_language(text, plain_text = TRUE, lang_code = TRUE)

detect_language_mixed(text, plain_text = TRUE)

Arguments

text

a character vector with text to classify, or a connection to read from

plain_text

if FALSE, the input is treated as HTML: tags are skipped and HTML entities are expanded

lang_code

return a language code instead of name
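
As a rough sketch of how these arguments combine, the snippet below classifies a small HTML fragment twice, once returning a code and once returning a name. The HTML string is made up for illustration, and a fragment this short may not be classified reliably, in which case NA is returned.

```r
library(cld2)

# Hypothetical HTML fragment with tags and an entity
html <- "<p>Ce n'est pas grave, on verra &ccedil;a demain matin.</p>"

# plain_text = FALSE: skip the tags and expand the entity before classifying
as_code <- detect_language(html, plain_text = FALSE)

# lang_code = FALSE: return the language name rather than its code
as_name <- detect_language(html, plain_text = FALSE, lang_code = FALSE)

print(as_code)
print(as_name)
```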

Examples

# Vectorized function
text <- c("To be or not to be?", "Ce n'est pas grave.", "Nou breekt mijn klomp!")
detect_language(text)

## Not run: 
# Read HTML from connection
detect_language(url('http://www.un.org/ar/universal-declaration-human-rights/'), plain_text = FALSE)

# More detailed classification output
detect_language_mixed(
  url('http://www.un.org/fr/universal-declaration-human-rights/'), plain_text = FALSE)

detect_language_mixed(
  url('http://www.un.org/zh/universal-declaration-human-rights/'), plain_text = FALSE)

## End(Not run)
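
The connection examples above need network access. An offline variant, sketched here on the assumption that a local file() connection is handled the same way as url(), writes a few lines to a temporary file and reads them back. The file contents and the name tmp are illustrative only.

```r
library(cld2)

# Write some sample text to a temporary file (illustrative content)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Dit is een korte Nederlandse tekst.",
             "Hij staat in een tijdelijk bestand op schijf."), tmp)

# Pass an unopened file connection, as with url() in the examples above
result <- detect_language_mixed(file(tmp), plain_text = TRUE)
str(result)

unlink(tmp)
```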
