When working with textual data and corpora, encoding issues are a frequent and nasty problem. Sometimes they are obvious because garbled “trash” characters appear, but they may also cause warnings and errors that are difficult to understand. This vignette explains what polmineR users should be aware of to avoid problems with encodings.
Windows: latin-1; macOS and Linux: UTF-8
When it was initially developed, the Corpus Workbench (CWB) worked with latin-1 encodings.
To query corpora, query strings are entered on the terminal or passed in via a script. The encoding of user input is assumed to match the locale of the session, as reported by localeToCharset(). polmineR provides a wrapper around localeToCharset(), encoding(), that assumes the session charset is “UTF-8” when localeToCharset() returns NA.
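The fallback described above can be sketched in a few lines of base R. This is only an illustration, not polmineR's actual implementation: the helper name session_charset() is hypothetical, and the conversion step assumes a corpus encoded as latin-1, which the text mentions only as the CWB's original encoding.

```r
# Hypothetical helper (not part of polmineR): report the session charset,
# falling back to "UTF-8" when the locale does not reveal one.
session_charset <- function() {
  charset <- utils::localeToCharset()[1]  # may be NA, e.g. in the "C" locale
  if (is.na(charset)) "UTF-8" else charset
}

session_charset()

# Assumption for illustration: the query was typed in a UTF-8 session and
# the corpus is encoded as latin-1, so the query string is recoded first.
query <- "Gr\u00fc\u00dfe"  # "Grüße", written with escapes to stay portable
query_latin1 <- iconv(query, from = "UTF-8", to = "latin1")

# The two strings show the same characters but differ in their byte length.
nchar(query, type = "bytes")
nchar(query_latin1, type = "bytes")
```

Comparing byte lengths is a quick way to check whether a string has actually been recoded, because non-ASCII characters such as ü and ß take two bytes in UTF-8 but one byte in latin-1.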