The topics package enables Differential Language Analysis using words, phrases and topics.
Please reference our tutorial article when using the package:
Language visualisation methods for psychological assessments
and
Ackermann L., Zhuojun G. & Kjell O.N.E. (2024). An R-package for visualizing text in topics. https://github.com/theharmonylab/topics. DOI: zenodo.org/records/11165378
This Getting Started tutorial walks through the central functions of the topics package.
In an example where the topics are used to predict the PHQ-9 score, the pipeline can be run as follows:
1. Data Preprocessing
To preprocess the data, run the following command:
library(topics)
#> This is topics: your text's new best friend (version 0.50).
#> Please note that the topics package requires you to download and install java from www.java.com.
#> For more information about the topics package see www.r-topics.org and www.r-text.org.
dtm <- topicsDtm(
  data = dep_wor_data$Depword)
# Check the results from the dtm and refine stopwords and removal rates if necessary
dtm_evaluation <- topicsDtmEval(
  dtm)
dtm_evaluation$frequency_plot
2. Model Training
To train the LDA model, run the following command:
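The training command did not survive on this page, although later steps rely on an object named `model`. The following is a minimal sketch of what the call may look like, assuming `topicsModel()` accepts the arguments `dtm`, `num_topics` and `num_iterations`; the argument names and values here are assumptions and should be checked against `?topicsModel`.

```r
library(topics)

# `dtm` is the document-term matrix created in step 1.
# The argument names and values below are assumptions; see ?topicsModel.
model <- topicsModel(
  dtm = dtm,
  num_topics = 20,        # assumed number of topics; tune for your data
  num_iterations = 1000)  # assumed number of sampling iterations
```

More topics give finer-grained themes but require more text per topic to be stable, so it may be worth comparing a few settings.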
3. Model Inference
To infer the topic-term distribution of the documents, run the following command:
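The inference command is also missing from this page, although `topicsTest()` below expects an object named `preds`. The following is a minimal sketch, assuming `topicsPreds()` takes the trained `model` and the raw text column as `data`; both argument names are assumptions to verify with `?topicsPreds`.

```r
library(topics)

# `model` is the LDA model trained in step 2.
# The argument names below are assumptions; see ?topicsPreds.
preds <- topicsPreds(
  model = model,
  data = dep_wor_data$Depword)  # the same text column used to build the dtm
```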
4. Statistical Analysis
To analyze the relationship between the topics and the prediction variable, run the following command:
test <- topicsTest(
  data = dep_wor_data,
  model = model,
  preds = preds,
  x_variable = "PHQ9tot",
  controls = c("Age"),
  test_method = "linear_regression")
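To give an intuition for a per-topic linear regression with a control variable, here is a conceptual sketch in base R on simulated data. It is not the internals of `topicsTest()`: the model formula, the direction of the regression, the FDR correction and the simulated columns `t_1` to `t_3` are all assumptions made for illustration.

```r
set.seed(1)
n <- 200

# Simulated stand-ins (hypothetical): an outcome, a control,
# and three per-document topic proportions.
sim <- data.frame(
  PHQ9tot = rnorm(n),
  Age = rnorm(n, mean = 40, sd = 10),
  t_1 = runif(n),
  t_2 = runif(n),
  t_3 = runif(n))
sim$PHQ9tot <- sim$PHQ9tot + 2 * sim$t_1  # build in one true association

topic_cols <- c("t_1", "t_2", "t_3")

# Fit one regression per topic: outcome ~ topic + control.
results <- do.call(rbind, lapply(topic_cols, function(topic) {
  fit <- lm(reformulate(c(topic, "Age"), response = "PHQ9tot"), data = sim)
  coefs <- summary(fit)$coefficients
  data.frame(
    topic = topic,
    estimate = coefs[topic, "Estimate"],
    p_value = coefs[topic, "Pr(>|t|)"])
}))

# Adjust for testing several topics at once (FDR is an assumed choice).
results$p_adjusted <- p.adjust(results$p_value, method = "fdr")
results
```

Because one test is run per topic, some correction for multiple comparisons is usually needed before deciding which topics count as significant.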
5. Visualization
To visualize the significant topics as wordclouds, run the following command:
plot_list <- topicsPlot(
  model = model,
  test = test,
  figure_format = "png")
# showing some of the plots
plot_list$square1
#> $t_5
Differentiating balance and harmony through natural language analysis: A cross-national exploration of two understudied wellbeing-related concepts
The list below consists of papers that analyze human language in ways similar to what is possible with topics.
Methods Articles
Gaining insights from social media language: Methodologies and challenges. Kern et al., (2016). Psychological Methods.
Computer Science: Python Software
DLATK: Differential language analysis toolkit. Schwartz, H. A., Giorgi, et al., (2017). In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations.