textTopicsWordCloud(): top_frequent = NULL and ngram_select = "estimate"

Performance & Robustness
* topicsGrams() speed-up: Rebuilt the n-gram and per-document frequency computation as a single
sparse-matrix pass with quanteda, replacing the slow per-n-gram regex counting loop
(major runtime improvement on medium/large datasets).
* Memory-safe output: freq_per_user now avoids accidental sparse → dense coercion
(the “allocating GiB” warning). It supports automatic wide/long output, returning
long format when wide output would be too large.
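The single-pass counting and the wide/long switch described above can be illustrated with a small base-R sketch. It is not the package's implementation, which uses quanteda sparse matrices. count_ngrams_long() and to_wide_or_long() are hypothetical helpers, the whitespace tokenizer is a simplification, and max_cells is an assumed cut-off. All (document, n-gram) pairs are counted in one aggregation, and the result stays in long form (the same triplet layout a sparse matrix stores) unless the dense wide table would be small.

```r
# Hypothetical helpers, not part of the topics package.
# count_ngrams_long(): count 1..n_max-grams per document and return them in
# long (triplet) form: one row per non-zero (document, n-gram) cell.
count_ngrams_long <- function(texts, n_max = 2) {
  rows <- list()
  for (doc_id in seq_along(texts)) {
    # Simplified tokenizer: lower-case and split on whitespace.
    tokens <- strsplit(tolower(texts[[doc_id]]), "[[:space:]]+")[[1]]
    tokens <- tokens[nzchar(tokens)]
    for (n in seq_len(n_max)) {
      if (length(tokens) < n) next
      grams <- vapply(seq_len(length(tokens) - n + 1),
                      function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
                      character(1))
      rows[[length(rows) + 1]] <- data.frame(doc = doc_id, ngram = grams)
    }
  }
  all_rows <- do.call(rbind, rows)
  # One aggregation over every (document, n-gram) pair, instead of one
  # regex search over all texts per n-gram.
  counts <- aggregate(list(freq = rep(1L, nrow(all_rows))),
                      by = list(doc = all_rows$doc, ngram = all_rows$ngram),
                      FUN = sum)
  counts[order(counts$doc, counts$ngram), ]
}

# to_wide_or_long(): build the dense document-by-n-gram table only when it is
# small enough; max_cells is an assumed cut-off.
to_wide_or_long <- function(counts, max_cells = 1e6) {
  n_docs <- length(unique(counts$doc))
  n_grams <- length(unique(counts$ngram))
  if (as.numeric(n_docs) * n_grams > max_cells) {
    return(counts)  # keep the long format; no dense allocation
  }
  xtabs(freq ~ doc + ngram, data = counts)  # wide: zeros filled in
}

counts <- count_ngrams_long(c("lack of sleep", "black cat lacking sleep"))
wide <- to_wide_or_long(counts)
```

Because counting works on whole tokens rather than substring regexes, "lack" is never counted inside "black" or "lacking", which matches the exact word-boundary behaviour listed for topicsGrams().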

Harmonization with topicsDtm()
* Aligned settings & reproducibility: topicsGrams() now mirrors the topicsDtm()
preprocessing controls (e.g., lower, punctuation/numbers removal, removalword,
shuffle, seed, threads, and an optional stemming/lemmatization hook) and returns
a saved settings list in the output.
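A minimal base-R sketch of why a saved settings list helps: the preprocessing options are applied and then stored next to the result, so exactly the same options can be re-applied to new texts. preprocess_texts() and its argument names are hypothetical and only echo some of the controls listed above; the real topicsGrams()/topicsDtm() arguments and defaults may differ.

```r
# Hypothetical sketch, not the topics package API: apply a few preprocessing
# controls and return them as a settings list next to the tokens.
preprocess_texts <- function(texts, lower = TRUE, remove_punct = TRUE,
                             remove_numbers = TRUE, removalword = character(0),
                             shuffle = FALSE, seed = 42) {
  settings <- list(lower = lower, remove_punct = remove_punct,
                   remove_numbers = remove_numbers, removalword = removalword,
                   shuffle = shuffle, seed = seed)
  if (lower) texts <- tolower(texts)
  if (remove_punct) texts <- gsub("[[:punct:]]", " ", texts)
  if (remove_numbers) texts <- gsub("[[:digit:]]+", " ", texts)
  tokens <- lapply(strsplit(texts, "[[:space:]]+"), function(tok) {
    tok <- tok[nzchar(tok)]          # drop empty strings from leading spaces
    tok[!tok %in% removalword]       # drop user-supplied words
  })
  if (shuffle) {
    set.seed(seed)                   # same seed -> same document order
    tokens <- tokens[sample(length(tokens))]
  }
  list(tokens = tokens, settings = settings)
}

out <- preprocess_texts(c("I lack 2 hours of Sleep!", "Sleep, sleep."),
                        removalword = "of")
# Re-apply exactly the same settings to new texts:
again <- do.call(preprocess_texts, c(list(texts = "Of 3 cats"), out$settings))
```

Storing seed together with shuffle is what makes a shuffled run repeatable; note that set.seed() also changes the global random-number state of the R session.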

* topicsTutorialData(): New utility function to download and prepare long-text essay data directly from
Hugging Face. Supports custom sample_size, min_word_count, max_word_count, and seed.
* topicsPlotOverview(): New high-level plotting function for structured overviews. Supports
side-by-side comparisons (ngrams), 1D layouts, and 2D 3x3 grids with a central distribution plot.
* topicsTest():
  * x_variable and y_variable now fully support factors and character vectors.
  * test_method is now assigned per variable. The package automatically detects binary data
  (0/1 or 2-level factors) and applies logistic_regression, while linear_regression is used
  for continuous data.
  * A logistic_level string in the output list clarifies the baseline (0) vs. target (1) mapping.
* topicsPreds() can
now be accessed via descriptive aliases: topicsPredict(), topicsAssess(), and topicsClassify().
* topicsPlot() for better aesthetic consistency.
* text-package.
* topicsGrams() now uses exact word-boundary matching for n-grams (e.g., “lack” is matched only
as a standalone word, excluding partial matches like “black” or “lacking”).
* topicsTest().
* creat_plot help function.
* rJava to suggest to enable compatibility with the text-package.
* scatter_legend_dots_alpha and scatter_legend_bg_dots_alpha parameters for the topicsPlot() function.
* logistic_regression.
* occurance_rate to topicsGrams()
* removal_mode, removal_rate_most and removal_rate_least to topicsGrams()
* ngram_window = c(1) now supported by topicsDtm()
* topicsPlot() with ngrams: size in the dot legend will be based on prevalence if
scatter_legend_dot_size = “prevalence”, and the popouts are not transparent.
* generate_scatter_plot.
* highlight_topic_words is set to NULL in the topicsPlot() function.
* topicsGrams(), including removing top_n and treating n-gram types differently.
* stopwords function to topicsGrams().
* pmi calculation.
* ngrams_max parameter in topicsPlot().
* allowed_word_overlap in
topicsPlot() for plotting the most prevalent topics.
* highlight_topic_words parameter to add different colours for a word list.
* stopwords removal for topicsGrams().
* ngrams_max functionality to topicsPlot().
* save_dir and load_dir from all functions; only topicsPlot() now has save_dir as an option.
* prevalence.
* p_adjust_method to topicsPlots().
* scatter_show_axis_values to topicsPlot().
* n_most_prevalent_topics.
* default to linear_regression unless the variable contains only 0s and 1s; i.e., different
tests can now be applied to different axes.
* dtm for downstream use in other functions.
* topicsPred() function including num_iteration, sampling_interval, burn_in.
* create_new_dtm for creating a new dtm for new data
* topics dimension for training using textTrainRegression().
* topicsTest() incl. x_variable, y_variable and controls
* pmi_threshold (experimental) to
topicsDtm()
* split procedure in the topicsDtm()
* topicsDtm()
* p_threshold to p_alpha
* p_alpha from the topicsTest() function to the topicsPlots() function
* topicsTest()
* text-package
* topicsPlot().
* topicsTest().
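The per-variable test selection described in the topicsTest() entries above can be sketched in base R. is_binary(), choose_test_method() and fit_topic_test() are hypothetical helpers, not the package internals: a variable with exactly two distinct values that are either 0/1 or factor/character levels is modelled with glm(..., family = binomial()), and anything else with lm(). As in glm(), the first factor level is the baseline (0) and the second the target (1), which is what the logistic_level string reports in this sketch.

```r
# Hypothetical sketch of per-variable test selection (not the topics internals).
is_binary <- function(x) {
  x <- x[!is.na(x)]
  two_values <- length(unique(x)) == 2
  # Binary means two distinct values that are factor/character levels or 0/1.
  two_values && (is.factor(x) || is.character(x) || all(x %in% c(0, 1)))
}

choose_test_method <- function(x) {
  if (is_binary(x)) "logistic_regression" else "linear_regression"
}

fit_topic_test <- function(y, topic_prevalence) {
  if (choose_test_method(y) == "logistic_regression") {
    y <- factor(y)
    # glm() treats the first level as failure (0) and the second as success (1).
    fit <- glm(y ~ topic_prevalence, family = binomial())
    list(method = "logistic_regression", fit = fit,
         logistic_level = paste0(levels(y)[1], " (0) vs. ", levels(y)[2], " (1)"))
  } else {
    # The linear branch assumes a numeric outcome.
    list(method = "linear_regression", fit = lm(y ~ topic_prevalence))
  }
}

res <- fit_topic_test(c("no", "yes", "no", "yes", "yes", "no"),
                      c(0.1, 0.4, 0.3, 0.2, 0.5, 0.15))
```

Selecting the method per variable is what lets the x and y axes of a plot use different tests.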