
SemanticDistance Dialogues

Jamie Reilly, Hannah R. Mechtenberg, Emily B. Myers, Jonathan E. Peelle

2025-08-27

Dialogues

This could be a conversation transcript or any language sample where you care about talker/interlocutor information (e.g., computing semantic distance across turns in a conversation). At minimum, your dataframe should contain a text column and a speaker/talker column.

Sample dialogue transcript included in the package

knitr::kable(head(Dialogue_Typical, 6), format = "pipe")
| text | speaker |
|:-----|:--------|
| Hi Peter. It’s nice to see you | Mary |
| Hi Mary. Hot out today | Peter |
| It sure is. | Mary |
| Did you read that book? | Peter |
| No I haven’t had time. | Mary |
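If you are starting from your own transcript rather than the bundled sample, the expected input shape can be sketched as below. This is a minimal illustration: the object name my_dialogue and its contents are hypothetical, and the column names simply mirror those of Dialogue_Typical.

```r
# Hypothetical transcript; "text" and "speaker" mirror the columns of Dialogue_Typical
my_dialogue <- data.frame(
  text = c("Hi Peter. It's nice to see you",
           "Hi Mary. Hot out today",
           "It sure is."),
  speaker = c("Mary", "Peter", "Mary"),
  stringsAsFactors = FALSE
)

# Basic checks before cleaning: both columns exist and every row has a talker
stopifnot(all(c("text", "speaker") %in% names(my_dialogue)))
stopifnot(!anyNA(my_dialogue$speaker))

str(my_dialogue)
```

One row per utterance is assumed here, with the talker label repeated on every row.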

Step 1: Clean Dialogue Transcript (clean_dialogue)

Decide on your cleaning parameters (e.g., stopwords? lemmatization?). Specify these in the argument(s) to your function calls.

Arguments to clean_dialogue() are:
dat your raw dataframe with at least one column of text AND a talker column
wordcol column name (quoted) containing the text you want cleaned
who_talking column name (quoted) containing the talker ID (will convert to factor)
omit_stops omits stopwords, T/F default is TRUE
lemmatize transforms raw word to lemmatized form, T/F default is TRUE

Dialogue_Cleaned <- clean_dialogue(dat=Dialogue_Typical, wordcol="text", who_talking="speaker", omit_stops=TRUE, lemmatize=TRUE)
knitr::kable(head(Dialogue_Cleaned, 12), format = "pipe")
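| id_row_orig | text_initialsplit | speaker | word_clean | id_row_postsplit | turn_count |
|------------:|:------------------|:--------|:-----------|-----------------:|-----------:|
| 1 | hi | Mary | NA | 1 | 1 |
| 1 | peter | Mary | peter | 2 | 1 |
| 1 | its | Mary | NA | 3 | 1 |
| 1 | its | Mary | NA | 4 | 1 |
| 1 | nice | Mary | nice | 5 | 1 |
| 1 | to | Mary | NA | 6 | 1 |
| 1 | see | Mary | see | 7 | 1 |
| 1 | you | Mary | NA | 8 | 1 |
| 2 | hi | Peter | NA | 9 | 2 |
| 2 | mary | Peter | mary | 10 | 2 |
| 2 | hot | Peter | hot | 11 | 2 |
| 2 | out | Peter | out | 12 | 2 |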
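To make the shape of this output easier to read, here is a rough base-R sketch of the same kind of transformation: one row per word, stopwords blanked to NA, and a turn counter that advances when the speaker changes. It is not the internals of clean_dialogue(). The stopword list toy_stops is a tiny hypothetical stand-in, the tokenization is simplified, and no lemmatization is attempted, so the rows will not match the package output exactly.

```r
# Illustrative sketch only; NOT how clean_dialogue() is implemented.
toy <- data.frame(
  text = c("Hi Peter. It's nice to see you", "Hi Mary. Hot out today"),
  speaker = c("Mary", "Peter"),
  stringsAsFactors = FALSE
)

# Tiny hypothetical stopword list (an assumption for this example)
toy_stops <- c("hi", "its", "to", "you")

# Lowercase, drop everything except letters and spaces, split on whitespace
normalized <- gsub("[^a-z ]", "", tolower(toy$text))
tokens <- strsplit(trimws(normalized), " +")

# One row per word, carrying along the original row id and the speaker
n_tok <- lengths(tokens)
long <- data.frame(
  id_row_orig = rep(seq_len(nrow(toy)), times = n_tok),
  text_initialsplit = unlist(tokens),
  speaker = rep(toy$speaker, times = n_tok),
  stringsAsFactors = FALSE
)

# Stopwords stay as rows but their cleaned form is set to NA
long$word_clean <- ifelse(long$text_initialsplit %in% toy_stops,
                          NA_character_, long$text_initialsplit)
long$id_row_postsplit <- seq_len(nrow(long))

# A new turn starts whenever the speaker differs from the previous original row
turn_by_row <- cumsum(c(TRUE, toy$speaker[-1] != toy$speaker[-nrow(toy)]))
long$turn_count <- turn_by_row[long$id_row_orig]

print(long)
```

Keeping stopwords as NA rows rather than dropping them preserves the original word positions, which is the pattern visible in the word_clean column above.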

Step 2: Compute Semantic Distances

Dialogue Distance Turn-to-Turn (dist_dialogue)

Averages the semantic vectors of all content words in a turn, then computes the cosine distance to the averaged semantic vectors of the content words in the subsequent turn. Note: this function only works on dialogue samples marked by a talker variable (e.g., conversation transcripts). You just need to feed it a transcript formatted with clean_dialogue. ‘dist_dialogue’ will return a summary dataframe with distance values aggregated by talker and turn (turn_count). Arguments to dist_dialogue are:
dat = dataframe w/ a dialogue sample cleaned and prepped using ‘clean_dialogue’
who_talking = column name (quoted) containing the talker ID

DialogueDists <- dist_dialogue(dat=Dialogue_Cleaned, who_talking="speaker")
knitr::kable(head(DialogueDists, 12), format = "pipe", digits=2)
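| turn_count | speaker | n_words | glo_cosdist | sd15_cosdist |
|-----------:|:--------|--------:|------------:|-------------:|
| 1 | Mary | 3 | 0.83 | 0.58 |
| 2 | Peter | 4 | 0.85 | 0.58 |
| 3 | Mary | 1 | 0.86 | 0.58 |
| 4 | Peter | 3 | 0.86 | 0.45 |
| 5 | Mary | 5 | NA | NA |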
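The turn-to-turn computation described in this step can be sketched in base R with made-up numbers. This is only an illustration under stated assumptions: the 3-dimensional vectors in toy_vectors are invented for the example and the helper cos_dist is hypothetical, whereas dist_dialogue draws on the package's own embedding tables. The final turn has no following turn to compare against, which is consistent with the NA values in the last row above.

```r
# Illustrative sketch only; the word vectors below are invented for the example.
toy_vectors <- rbind(
  peter = c(1, 0, 0),
  nice  = c(0, 1, 0),
  see   = c(0, 0, 1),
  mary  = c(1, 1, 0),
  hot   = c(0, 1, 1)
)

# Content words per turn (turn 1 = Mary, turn 2 = Peter)
turns <- list(c("peter", "nice", "see"), c("mary", "hot"))

# Step 1: average the word vectors within each turn (one row per turn)
turn_means <- t(sapply(turns, function(w) colMeans(toy_vectors[w, , drop = FALSE])))

# Step 2: cosine distance = 1 - cosine similarity (hypothetical helper)
cos_dist <- function(a, b) 1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Step 3: compare each turn with the next one; the last turn gets NA
n_turns <- nrow(turn_means)
dists <- c(
  vapply(seq_len(n_turns - 1),
         function(i) cos_dist(turn_means[i, ], turn_means[i + 1, ]),
         numeric(1)),
  NA_real_
)

print(round(dists, 2))
```

Averaging before comparing means each turn contributes a single vector regardless of its length, so long and short turns are treated alike in the distance.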
