| Type: | Package | 
| Title: | Semi-Supervised Model for Geographical Document Classification | 
| Version: | 0.9.2 | 
| Maintainer: | Kohei Watanabe <watanabe.kohei@gmail.com> | 
| Description: | Semi-supervised model for geographical document classification (Watanabe 2018) <doi:10.1080/21670811.2017.1293487>. This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Russian, Hebrew, Arabic, Turkish, Japanese and Chinese (Simplified and Traditional). | 
| License: | MIT + file LICENSE | 
| URL: | https://github.com/koheiw/newsmap | 
| BugReports: | https://github.com/koheiw/newsmap/issues | 
| LazyData: | TRUE | 
| Encoding: | UTF-8 | 
| Depends: | R (≥ 3.5), methods | 
| Imports: | utils, Matrix, quanteda (≥ 2.1), quanteda.textstats, stringi | 
| Suggests: | testthat | 
| Language: | en-GB | 
| RoxygenNote: | 7.3.2 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-07-10 08:04:17 UTC; watan | 
| Author: | Kohei Watanabe [aut, cre, cph], Stefan Müller [aut], Dani Madrid-Morales [aut], Katerina Tertytchnaya [aut], Ke Cheng [aut], Chung-hong Chan [aut], Claude Grasland [aut], Giuseppe Carteny [aut], Elad Segev [aut], Dai Yamao [aut], Barbara Ellynes Zucchi Nobre Silva [aut], Lanabi la Lova [aut], Lungta Seki [aut] | 
| Repository: | CRAN | 
| Date/Publication: | 2025-07-10 12:50:12 UTC | 
Evaluate classification accuracy in precision and recall
Description
Evaluate classification accuracy in precision and recall
Usage
accuracy(x, y)
Arguments
| x | vector of predicted classes | 
| y | vector of true classes | 
Examples
class_pred <- c('US', 'GB', 'US', 'CN', 'JP', 'FR', 'CN') # prediction
class_true <- c('US', 'FR', 'US', 'CN', 'KP', 'EG', 'US') # true class
acc <- accuracy(class_pred, class_true)
print(acc)
summary(acc)
Compute average feature entropy (AFE)
Description
AFE measures the randomness of feature occurrences in labelled documents.
Usage
afe(x, y, smooth = 1)
Arguments
| x | a dfm for features | 
| y | a dfm for labels | 
| smooth | a numeric value for smoothing to include all the features | 
Coerce various objects to coefficients_textmodel
This is a helper function used in summary.textmodel_*.
Description
Coerce various objects to coefficients_textmodel
This is a helper function used in summary.textmodel_*.
Usage
as.coefficients_textmodel(x)
Arguments
| x | an object to be coerced | 
Coerce various objects to statistics_textmodel
Description
This is a helper function used in summary.textmodel_*.
Usage
as.statistics_textmodel(x)
Arguments
| x | an object to be coerced | 
Assign the summary.textmodel class to a list
Description
Assign the summary.textmodel class to a list
Usage
as.summary.textmodel(x)
Arguments
| x | a named list | 
Extract coefficients for features
Description
Extract coefficients for features
Usage
## S3 method for class 'textmodel_newsmap'
coef(object, n = 10, select = NULL, ...)
## S3 method for class 'textmodel_newsmap'
coefficients(object, n = 10, select = NULL, ...)
Arguments
| object | a Newsmap model fitted by  | 
| n | the number of coefficients to extract. | 
| select | returns the coefficients for the selected class; specify by the
names of rows in  | 
| ... | not used. | 
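A minimal usage sketch, assuming the newsmap and quanteda packages are installed. It reuses the toy corpus from the textmodel_newsmap() examples and relies only on the `object` and `n` arguments listed above. The object name `top_feats` is made up for illustration.

```r
library(quanteda)
library(newsmap)

# Toy corpus and model, as in the textmodel_newsmap() examples.
text_en <- c(text1 = "This is an article about Ireland.",
             text2 = "The South Korean prime minister was re-elected.")
toks_en <- tokens(text_en)
label_dfm_en <- dfm(tokens_lookup(toks_en, data_dictionary_newsmap_en, levels = 3))
feat_dfm_en <- dfm(toks_en, tolower = FALSE)
model_en <- textmodel_newsmap(feat_dfm_en, label_dfm_en)

# Ask for the five highest-scoring features per class instead of the default ten.
top_feats <- coef(model_en, n = 5)
str(top_feats)
```

Inspecting the result with str() avoids assuming a particular return structure.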
Seed geographical dictionary in Arabic
Description
Seed geographical dictionary in Arabic
Author(s)
Dai Yamao daiyamao@scs.kyushu-u.ac.jp
Seed geographical dictionary in German
Description
Seed geographical dictionary in German
Author(s)
Stefan Müller mullers@tcd.ie
Seed geographical dictionary in English
Description
Seed geographical dictionary in English
Author(s)
Kohei Watanabe watanabe.kohei@gmail.com
Seed geographical dictionary in Spanish
Description
Seed geographical dictionary in Spanish
Author(s)
Dani Madrid-Morales dani.madrid@my.cityu.edu.hk
Seed geographical dictionary in French
Description
Seed geographical dictionary in French
Author(s)
Claude Grasland claude.grasland@parisgeo.cnrs.fr
Seed geographical dictionary in Hebrew
Description
Seed geographical dictionary in Hebrew
Author(s)
Elad Segev eladseg@gmail.com
Seed geographical dictionary in Italian
Description
Seed geographical dictionary in Italian
Author(s)
Giuseppe Carteny giuseppe.carteny@unimi.it
Seed geographical dictionary in Japanese
Description
Seed geographical dictionary in Japanese
Author(s)
Kohei Watanabe watanabe.kohei@gmail.com
Seed geographical dictionary in Portuguese
Description
Seed geographical dictionary in Portuguese
Author(s)
Barbara Ellynes Zucchi Nobre Silva barbara@zucchi.science
Seed geographical dictionary in Russian
Description
Seed geographical dictionary in Russian
Author(s)
Katerina Tertytchnaya katerina.tertytchnaya@gmail.com
Lanabi la Lova l.lalova@lse.ac.uk
Seed geographical dictionary in Turkish
Description
Seed geographical dictionary in Turkish
Author(s)
Lungta Seki yahoo.co.jp0409@gmail.com
Seed geographical dictionary in Chinese (simplified)
Description
Seed geographical dictionary in Chinese (simplified)
Author(s)
Ke Cheng kecheng.ac@gmail.com
Seed geographical dictionary in Chinese (traditional)
Description
Seed geographical dictionary in Chinese (traditional)
Author(s)
Chung-hong Chan chainsawtiney@gmail.com
Prediction method for textmodel_newsmap
Description
Predict document classes using a trained Newsmap model.
Usage
## S3 method for class 'textmodel_newsmap'
predict(
  object,
  newdata = NULL,
  confidence = FALSE,
  rank = 1L,
  type = c("top", "all"),
  rescale = FALSE,
  min_conf = -Inf,
  min_n = 0L,
  ...
)
Arguments
| object | a fitted Newsmap textmodel. | 
| newdata | dfm on which prediction should be made. | 
| confidence | if  | 
| rank | rank of the class to be predicted. Only used when  | 
| type | if  | 
| rescale | if  | 
| min_conf | return  | 
| min_n | set the minimum number of polarity words in documents. | 
| ... | not used. | 
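The arguments above can be tried on the toy model from the textmodel_newsmap() examples. This is a hedged sketch, assuming the newsmap and quanteda packages are installed. It uses only arguments that appear in the usage signature, and the names `pred_top` and `pred_all` are made up for illustration.

```r
library(quanteda)
library(newsmap)

# Toy corpus and model, as in the textmodel_newsmap() examples.
text_en <- c(text1 = "This is an article about Ireland.",
             text2 = "The South Korean prime minister was re-elected.")
toks_en <- tokens(text_en)
label_dfm_en <- dfm(tokens_lookup(toks_en, data_dictionary_newsmap_en, levels = 3))
feat_dfm_en <- dfm(toks_en, tolower = FALSE)
model_en <- textmodel_newsmap(feat_dfm_en, label_dfm_en)

# Default call: type = "top" and rank = 1L, so one predicted class per document.
pred_top <- predict(model_en)

# type = "all" requests the result for every class rather than only the top one.
pred_all <- predict(model_en, type = "all")

print(pred_top)
print(dim(pred_all))
```

With `newdata = NULL`, as here, the prediction is made on the documents used for training, which is what the package's own example does.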
Print methods for textmodel features estimates
This is a helper function used in print.summary.textmodel.
Description
Print methods for textmodel features estimates
This is a helper function used in print.summary.textmodel.
Usage
## S3 method for class 'coefficients_textmodel'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
| x | a coefficients_textmodel object | 
| digits | minimal number of significant digits, see
 | 
| ... | additional arguments not used | 
Implements print methods for textmodel_statistics
Description
Implements print methods for textmodel_statistics
Usage
## S3 method for class 'statistics_textmodel'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
| x | a textmodel_wordscore_statistics object | 
| digits | minimal number of significant digits, see
 | 
| ... | further arguments passed to or from other methods | 
Print method for summary.textmodel
Description
Print method for summary.textmodel
Usage
## S3 method for class 'summary.textmodel'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
| x | a  | 
| digits | minimal number of significant digits, see
 | 
| ... | additional arguments not used | 
Calculate micro and macro average measures of accuracy
Description
This function calculates micro-average precision (p) and recall (r) and
macro-average precision (P) and recall (R) based on a confusion matrix from
accuracy().
Usage
## S3 method for class 'textmodel_newsmap_accuracy'
summary(object, ...)
Arguments
| object | output of accuracy() | 
| ... | not used. | 
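The difference between the two averages can be shown in base R. The sketch below is not the package's implementation: it recomputes the counts directly from the two vectors of the accuracy() example, and how classes with an undefined ratio are handled is an assumption stated in the comments.

```r
# Predicted and true classes from the accuracy() example.
class_pred <- c("US", "GB", "US", "CN", "JP", "FR", "CN")
class_true <- c("US", "FR", "US", "CN", "KP", "EG", "US")
classes <- sort(union(class_pred, class_true))

# Per-class counts of true positives, false positives and false negatives.
tp <- sapply(classes, function(k) sum(class_pred == k & class_true == k))
fp <- sapply(classes, function(k) sum(class_pred == k & class_true != k))
fn <- sapply(classes, function(k) sum(class_pred != k & class_true == k))

# Micro-average: pool the counts over classes, then take the ratio.
micro_p <- sum(tp) / (sum(tp) + sum(fp))
micro_r <- sum(tp) / (sum(tp) + sum(fn))

# Macro-average: take the ratio per class, then average over classes.
# Assumption: classes with an undefined ratio (0 / 0) are skipped.
macro_p <- mean(tp / (tp + fp), na.rm = TRUE)
macro_r <- mean(tp / (tp + fn), na.rm = TRUE)

round(c(p = micro_p, r = micro_r, P = macro_p, R = macro_r), 3)
```

Because each document has exactly one predicted and one true class, every misclassification counts once as a false positive and once as a false negative. Micro-average precision and recall therefore coincide in this setting, while the macro averages weight rare and frequent classes equally.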
Semi-supervised Bayesian multinomial model for geographical document classification
Description
Train a Newsmap model to predict the geographical focus of documents, using labels given by a dictionary.
Usage
textmodel_newsmap(
  x,
  y,
  label = c("all", "max"),
  smooth = 1,
  boolean = FALSE,
  drop_label = TRUE,
  verbose = quanteda_options("verbose"),
  entropy = c("none", "global", "local", "average"),
  ...
)
Arguments
| x | a dfm or fcm created by  | 
| y | a dfm or a sparse matrix that records the class membership of the
documents. It can be created applying  | 
| label | if "max", uses only labels for the maximum value in each row of
 | 
| smooth | a value added to the frequency of words to smooth likelihood ratios. | 
| boolean | if  | 
| drop_label | if  | 
| verbose | if  | 
| entropy | [experimental] the scheme used to compute the entropy that
regularizes likelihood ratios. The entropy of features is computed over
labels if  | 
| ... | additional arguments passed to internal functions. | 
Details
Newsmap learns associations between words and classes as likelihood
ratios, based on the features in x and the labels in y. Large
likelihood ratios tend to concentrate on a small number of features, but
weighting by the entropy of feature frequencies over labels or documents
helps to disperse the distribution.
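The idea of a smoothed likelihood ratio can be illustrated with a toy computation in base R. This is a sketch under simplifying assumptions, not the package's code: the count matrix, its class and feature names, and the exact normalization are made up for illustration. Only `smooth` mirrors an argument of textmodel_newsmap().

```r
# Hypothetical counts of three features in documents grouped by two labels.
counts <- matrix(c(5, 0, 2,
                   0, 4, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("ie", "kr"),
                                 c("dublin", "seoul", "minister")))
smooth <- 1

llr <- matrix(0, nrow(counts), ncol(counts), dimnames = dimnames(counts))
for (k in rownames(counts)) {
  # Smoothed frequencies of each feature inside and outside class k.
  in_class <- counts[k, ] + smooth
  out_class <- colSums(counts[rownames(counts) != k, , drop = FALSE]) + smooth
  # Log ratio of the feature's relative frequency inside vs. outside the class.
  llr[k, ] <- log(in_class / sum(in_class)) - log(out_class / sum(out_class))
}

round(llr, 2)
```

In this toy example, features that occur in only one class ("dublin", "seoul") receive large positive scores for that class, while a feature shared by both classes ("minister") stays close to zero. Smoothing keeps the ratio finite for features that never occur in a class.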
References
Kohei Watanabe. 2018. "Newsmap: semi-supervised approach to geographical news classification." Digital Journalism 6(3): 294-309.
Examples
library(quanteda)
text_en <- c(text1 = "This is an article about Ireland.",
             text2 = "The South Korean prime minister was re-elected.")
# tokenise the texts and look up place names in the English seed dictionary
toks_en <- tokens(text_en)
label_toks_en <- tokens_lookup(toks_en, data_dictionary_newsmap_en, levels = 3)
label_dfm_en <- dfm(label_toks_en)            # labels (y)
feat_dfm_en <- dfm(toks_en, tolower = FALSE)  # features (x)
# train the model, then predict the classes of the training documents
model_en <- textmodel_newsmap(feat_dfm_en, label_dfm_en)
predict(model_en)