Lots of wild operations happen in this next step that transform your
unstructured text into numeric time series objects aggregated by
conversation and interlocutor. It is important that you have a handle on
what prep_dyads() does and what processes such as
lemmatization and stopword removal mean.
prep_dyads() uses numerous regular expressions to clean and format the
data you just read into R in the previous step.
ConversationAlign
applies an ordered sequence of cleaning
steps on the road toward vectorizing your original text into a
one-word-per-row format. These steps include converting all text to
lowercase, expanding contractions, and omitting all non-alphabetic
characters (e.g., numbers, punctuation, line breaks). In addition to
text cleaning, users control options for stopword removal and
lemmatization. During formatting, prep_dyads() will prompt
you to select up to three variables for computing alignment on. This
works by joining values from a large internal lookup database to each
word in your language transcript. prep_dyads()
is
customizable via the following arguments.
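The cleaning sequence described above can be sketched in base R. This is only an illustration of the idea, not the package's internal code: the toy transcript, the `toy_contractions` table, and the `clean_text()` helper are all hypothetical names invented for this example.

```r
# Toy transcript; column names are illustrative only.
raw <- data.frame(
  Participant_ID = c("Yin", "Maya"),
  Text = c("The itsy-bitsy spider climbed up!", "Down came the rain, didn't it?"),
  stringsAsFactors = FALSE
)

# Hypothetical mini contraction table (a real one would be far larger).
toy_contractions <- c("didn't" = "did not", "it's" = "it is")

clean_text <- function(x) {
  x <- tolower(x)                                   # 1. lowercase
  for (k in names(toy_contractions)) {              # 2. expand contractions
    x <- gsub(k, toy_contractions[[k]], x, fixed = TRUE)
  }
  x <- gsub("[^a-z]+", " ", x)                      # 3. drop non-alphabetic characters
  trimws(x)
}

# 4. Vectorize to one word per row, carrying the speaker label along.
words <- strsplit(clean_text(raw$Text), " ", fixed = TRUE)
one_per_row <- data.frame(
  Participant_ID = rep(raw$Participant_ID, lengths(words)),
  Text_Prep = unlist(words),
  stringsAsFactors = FALSE
)
print(one_per_row)
```

The order matters in this sketch: contractions are expanded before punctuation is stripped, otherwise the apostrophe would already be gone when the lookup runs.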
There are two important arguments regarding stopword removal.
omit_stops
specifies whether or not to remove stopwords.
which_stoplist specifies which stopword list you would
like to apply, with the default being Temple_stops25. The
full list of choices is: none, SMART_stops, CA_orig_stops, MIT_stops, and
Temple_stops25. Stopword removal is an important, yet also
controversial, step in text cleaning.
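To make the effect of stopword removal concrete, here is a minimal base R sketch. It assumes a tiny made-up stoplist (`toy_stops`), not any of the lists shipped with the package, and it marks removed words as NA in a cleaned column, which mirrors the Text_Prep / Text_Clean pattern in the package's example output.

```r
# Hypothetical stoplist for illustration; NOT Temple_stops25.
toy_stops <- c("the", "and", "a", "of")

words <- c("the", "itsy", "bitsy", "spider", "climbed", "up",
           "the", "water", "spout")

# Stopwords become NA; all other words are kept as they are.
text_clean <- ifelse(words %in% toy_stops, NA_character_, words)

print(data.frame(Text_Prep = words, Text_Clean = text_clean))
```

Keeping the row and setting the cleaned value to NA, rather than dropping the row, preserves the original word order and makes it possible to count words before and after cleaning.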
ConversationAlign calls the textstem
package as a dependency to lemmatize your language transcript. This
converts morphological derivatives to their root forms. The default is
lemmatize=TRUE. Sometimes you want to retain language output in its native
form. If this is the case, change the argument in prep_dyads() to
lemmatize=FALSE. prep_dyads() outputs word count metrics
pre/post cleaning by dyad and interlocutor. This can be useful if you
are interested in whether one person just doesn't produce many words or
produces a great many empty utterances.
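Lemmatization is essentially a dictionary lookup from inflected forms to root forms. The sketch below imitates that lookup in base R with a hypothetical three-entry dictionary (`toy_lemmas`); it is a stand-in for the much larger dictionary that textstem uses, not a reproduction of it.

```r
# Hypothetical mini lemma dictionary: inflected form -> root form.
toy_lemmas <- c(climbed = "climb", came = "come", washed = "wash")

tokens <- c("spider", "climbed", "came", "washed", "rain")

# Replace tokens found in the dictionary; leave everything else unchanged.
hit <- tokens %in% names(toy_lemmas)
lemmas <- tokens
lemmas[hit] <- unname(toy_lemmas[tokens[hit]])

print(data.frame(token = tokens, lemma = lemmas))
```

With textstem installed, a call such as `textstem::lemmatize_words(tokens)` performs the real dictionary-based lookup on a character vector.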
This is where the magic happens. prep_dyads()
will yoke
published norms for >40 possible dimensions to every content word in
your transcript (up to 3 at a time). This join is executed by merging
your vectorized conversation transcript with a huge internal lexical
database with norms spanning over 100k English words.
prep_dyads()
will prompt you to select anywhere from 1 to 3
target dimensions at a time. Enter the number corresponding to each
dimension of interest, separated by spaces, and then hit enter (e.g., 10
14 19). ConversationAlign will append a published norm if
available (e.g., concreteness, word length) to every running word in
your transcript. These quantitative values are used in the subsequent
summarize_dyads()
step to compute alignment.
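The yoking step amounts to a left join from each cleaned word to a norms table. The following base R sketch shows the idea with a hypothetical three-word lookup (`toy_norms`); the anger values are copied from the package's example output, while the data frame names and the use of `match()` instead of a full merge are assumptions made for this illustration.

```r
# Vectorized transcript (one word per row); NA marks a removed stopword.
transcript <- data.frame(
  word_order = 1:4,
  Text_Clean = c("itsy", NA, "spider", "climb"),
  stringsAsFactors = FALSE
)

# Hypothetical miniature norms table; the real database is far larger.
toy_norms <- data.frame(
  word = c("spider", "climb", "itsy"),
  emo_anger = c(0.04, -0.09, -0.02),
  stringsAsFactors = FALSE
)

# Look up each word's norm; match() keeps the transcript's row order,
# and words without a norm (including NA) receive NA.
idx <- match(transcript$Text_Clean, toy_norms$word)
transcript$emo_anger <- toy_norms$emo_anger[idx]

print(transcript)
```

Using `match()` here avoids the row reordering that `merge()` performs by default, which matters because the word order is the time axis of the series.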
prep_dyads()
- Cleans, formats, and vectorizes conversation transcripts to a one-word-per-row format
- Yokes psycholinguistic norms for up to three dimensions at a time (from >40 possible dimensions) to each content word
- Retains metadata
Arguments to prep_dyads:
1) dat_read= name of the dataframe created during read_dyads()
2) omit_stops= TRUE/FALSE (default=TRUE) option to remove stopwords
3) lemmatize= lemmatize strings, converting each entry to its dictionary form; default is lemmatize=TRUE
4) which_stoplist= quoted argument specifying stopword list; options include none, MIT_stops, SMART, CA_OriginalStops, or Temple_stops25. Default is Temple_stops25
# Example of running the function
NurseryRhymes_Prepped <- prep_dyads(dat_read = NurseryRhymes, lemmatize = TRUE, omit_stops = TRUE, which_stoplist = "Temple_stops25")
The resulting dataset, NurseryRhymes_Prepped, is embedded as external data in the package, with 'anger' values yoked to each word.
Event_ID | Participant_ID | Exchange_Count | Turn_Count | Text_Prep | Text_Clean | emo_anger |
---|---|---|---|---|---|---|
ItsySpider | Yin | 1 | 1 | the | NA | NA |
ItsySpider | Yin | 1 | 1 | itsy | itsy | -0.02 |
ItsySpider | Yin | 1 | 1 | bitsy | bitsy | -0.02 |
ItsySpider | Yin | 1 | 1 | spider | spider | 0.04 |
ItsySpider | Yin | 1 | 1 | climbed | climb | -0.09 |
ItsySpider | Yin | 1 | 1 | up | up | -0.06 |
ItsySpider | Yin | 1 | 1 | the | NA | NA |
ItsySpider | Yin | 1 | 1 | water | water | -0.17 |
ItsySpider | Yin | 1 | 1 | spout | spout | 0.05 |
ItsySpider | Maya | 1 | 2 | down | down | 0.03 |
ItsySpider | Maya | 1 | 2 | came | come | -0.13 |
ItsySpider | Maya | 1 | 2 | the | NA | NA |
ItsySpider | Maya | 1 | 2 | rain | rain | 0.07 |
ItsySpider | Maya | 1 | 2 | and | NA | NA |
ItsySpider | Maya | 1 | 2 | washed | wash | 0.06 |
ItsySpider | Maya | 1 | 2 | the | NA | NA |
ItsySpider | Maya | 1 | 2 | spider | spider | 0.04 |
ItsySpider | Maya | 1 | 2 | out | out | 0.04 |
ItsySpider | Yin | 2 | 3 | out | out | 0.04 |
ItsySpider | Yin | 2 | 3 | came | come | -0.13 |
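
Word counts before and after cleaning can be recovered from output shaped like the table above. The sketch below re-types the first two turns by hand into a hypothetical data frame (`prepped`) and counts, per interlocutor, how many words were spoken and how many survived cleaning; it is not the package's own reporting code.

```r
# First two turns of the table above, re-typed by hand for illustration.
prepped <- data.frame(
  Participant_ID = rep(c("Yin", "Maya"), each = 9),
  Text_Prep = c("the", "itsy", "bitsy", "spider", "climbed", "up",
                "the", "water", "spout",
                "down", "came", "the", "rain", "and", "washed",
                "the", "spider", "out"),
  Text_Clean = c(NA, "itsy", "bitsy", "spider", "climb", "up",
                 NA, "water", "spout",
                 "down", "come", NA, "rain", NA, "wash",
                 NA, "spider", "out"),
  stringsAsFactors = FALSE
)

# Count non-missing words per interlocutor before and after cleaning.
n_raw <- tapply(!is.na(prepped$Text_Prep), prepped$Participant_ID, sum)
n_clean <- tapply(!is.na(prepped$Text_Clean), prepped$Participant_ID, sum)

counts <- data.frame(
  Participant_ID = names(n_raw),
  n_raw = as.vector(n_raw),
  n_clean = as.vector(n_clean)
)
print(counts)
```

A large gap between the two counts for one interlocutor would suggest that much of that person's output consisted of stopwords.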