The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Title: Prediction Data from GPT Detectors
Version: 0.1.0
Description: Researchers carried out a series of experiments passing a number of essays to different GPT detection models. Juxtaposing detector predictions for papers written by native and non-native English writers, the authors argue that GPT detectors disproportionately classify real writing from non-native English writers as AI-generated.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.2.3
Depends: R (≥ 2.10)
LazyData: true
URL: https://simonpcouch.github.io/detectors/
Suggests: knitr
NeedsCompilation: no
Packaged: 2023-10-26 14:23:49 UTC; simoncouch
Author: Simon Couch [cre, aut]
Maintainer: Simon Couch <simonpatrickcouch@gmail.com>
Repository: CRAN
Date/Publication: 2023-10-26 15:30:02 UTC

Predictions from GPT Detectors

Description

Data derived from the paper GPT detectors are biased against non-native English writers. The study authors carried out a series of experiments passing a number of essays to different GPT detection models. Juxtaposing detector predictions for papers written by native and non-native English writers, the authors argue that GPT detectors disproportionately classify real writing from non-native English writers as AI-generated.

Usage

detectors

Format

A data frame with 6,185 rows and 9 columns:

kind

Whether the essay was written by a "Human" or "AI".

.pred_AI

The class probability from the GPT detector that the inputted text was written by AI.

.pred_class

The uncalibrated class prediction, encoded as if_else(.pred_AI > .5, "AI", "Human")

detector

The name of the detector used to generate the predictions.

native

For essays written by humans, whether the essay was written by a native English writer or not. These categorizations are coarse; values of "Yes" may actually be written by people who do not write with English natively. NA indicates that the text was not written by a human.

name

A label for the experiment that the predictions were generated from.

model

For essays that were written by AI, the name of the model that generated the essay.

document_id

A unique identifier for the supplied essay. Some essays were supplied to multiple detectors. Note that some essays are AI-revised derivatives of others.

prompt

For essays that were written by AI, a descriptor for the form of "prompt engineering" passed to the model.

For more information on these data, see the source paper.

Source

doi:10.1016/j.patter.2023.100779

Examples


detectors

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.