
Quick Start Guide - jiebaR


jiebaR is an R package for Chinese text segmentation, keyword extraction, and part-of-speech tagging.

Example

Text Segmentation

You can use worker() to initialize a worker, and then use [] or segment() to do the segmentation.
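The code that produced the output below is not shown, so here is a minimal sketch of the two calling styles. The input string is an assumption chosen to match that output.

```r
library(jiebaR)

# Initialize a segmentation worker with the default arguments.
cutter <- worker()

# Style 1: segment(text, worker). The input string is an assumption.
words <- segment("This is a good day!", cutter)
print(words)

# Style 2: index the worker with [], which should give the same result.
words_bracket <- cutter["This is a good day!"]
print(words_bracket)
```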

## Loading required package: jiebaRD
## [1] "This" "is"   "a"    "good" "day"

You can also pass a file path as input.
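A hedged sketch of file input follows. The file name and its contents are hypothetical. It assumes that a string naming an existing file is treated as a file to segment, and that with the default worker settings the result is written to an output file rather than returned as a word vector. If the string does not name an existing file, it is likely segmented as ordinary text, which appears to be what the output below shows.

```r
library(jiebaR)

cutter <- worker()

# Hypothetical input file, created in the session's temporary directory.
input_path <- file.path(tempdir(), "jiebaR_input.txt")
writeLines("This is a good day!", input_path)

# Pass the path instead of the text. With the default settings this is
# assumed to write the segmented text to a new file and return a string
# describing where it went.
result <- segment(input_path, cutter)
print(result)
```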

## [1] "temp" "dat"

You can initialize multiple engines simultaneously.
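As an illustration, the sketch below creates two independent workers with different `type` values and runs both on the same text. The choice of "mix" and "mp" and the sample sentence are assumptions made for the example.

```r
library(jiebaR)

# Each call to worker() returns its own engine, so several can coexist.
cutter_mix <- worker(type = "mix")  # mixed model (the default type)
cutter_mp  <- worker(type = "mp")   # maximum-probability model

text <- "This is a good day!"

# The engines are used independently of each other.
words_mix <- segment(text, cutter_mix)
words_mp  <- segment(text, cutter_mp)

print(words_mix)
print(words_mp)
```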

Public settings of the model can be modified with $, for example cutter$symbol = T. Private settings are fixed when the engine is initialized; you can read them with cutter$PrivateVariable.
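A short sketch of reading and changing settings on an existing worker. Which fields the original example printed is an assumption inferred from the output below; `encoding`, `detect`, and `symbol` are all arguments of worker().

```r
library(jiebaR)

cutter <- worker()

# Read settings from the worker.
print(cutter$encoding)
print(cutter$detect)

# Change a public setting: keep punctuation symbols in the result.
cutter$symbol <- TRUE
with_symbols <- segment("This is a good day!", cutter)

# Switch it back so that symbols are dropped again.
cutter$symbol <- FALSE
without_symbols <- segment("This is a good day!", cutter)

print(with_symbols)
print(without_symbols)
```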

## [1] "UTF-8"
## [1] TRUE
## [1] FALSE

You can use a custom dictionary. jiebaR is able to identify new words, but adding your own words can improve accuracy. imewlconverter is a good tool for building dictionaries.
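The sketch below shows one way to supply a user dictionary through the `user` argument of worker(). The dictionary file, its two entries, and the sample sentence are hypothetical. The one-word-per-line layout is an assumption about the accepted dictionary format.

```r
library(jiebaR)

# Show the directory that holds the bundled dictionaries.
show_dictpath()

# Hypothetical user dictionary with one word per line, written as UTF-8.
user_dict <- file.path(tempdir(), "user.dict.utf8")
writeLines(enc2utf8(c("云计算", "机器学习")), user_dict, useBytes = TRUE)

# Initialize a worker that loads the custom dictionary.
cutter_custom <- worker(user = user_dict)

# Segment a sentence that contains the custom words.
custom_words <- segment("我在学习云计算和机器学习", cutter_custom)
print(custom_words)
```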

## [1] "/Library/Frameworks/R.framework/Versions/3.6/Resources/library/jiebaRD/dict"

Speech Tagging

The part-of-speech tagging functions [.tagger and tagging() tag each word in a sentence after segmentation, using labels compatible with ICTCLAS.
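A minimal tagging sketch, assuming the input "hello world" (chosen to match the output below). The result is a character vector of words whose names are the tags.

```r
library(jiebaR)

# A worker of type "tag" segments the text and tags each word.
tagger <- worker(type = "tag")

# Function form; the input string is an assumption.
tags <- tagging("hello world", tagger)
print(tags)

# The tags are stored as the names of the returned vector.
print(names(tags))

# Bracket form on the same worker.
tags_bracket <- tagger["hello world"]
```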

##     eng     eng 
## "hello" "world"

Keyword Extraction

The keyword extraction worker uses the MixSegment model to segment the text and the TF-IDF algorithm to find the keywords.
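A sketch of keyword extraction follows. The input text and `topn = 1` are assumptions, picked because the output below contains a single keyword. The names of the returned vector carry the scores.

```r
library(jiebaR)

# A worker of type "keywords"; topn limits how many keywords are returned.
keys <- worker(type = "keywords", topn = 1)

# Extract the top keyword; the input string is an assumption.
top <- keywords("words of fun", keys)
print(top)
```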

## 11.7392 
##   "fun"

Simhash Distance

The simhash worker extracts keywords from an input and computes its simhash fingerprint. Given two inputs, it extracts keywords from both and computes the Hamming distance between them.
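The two outputs below can be approximated with a sketch like this one. The input strings and `topn = 1` are assumptions. simhash() returns the fingerprint and keywords for one input, while distance() compares two inputs.

```r
library(jiebaR)

# A worker of type "simhash"; topn sets how many keywords feed the hash.
simhasher <- worker(type = "simhash", topn = 1)

# Fingerprint and keywords for a single input (input is an assumption).
fingerprint <- simhash("hello world", simhasher)
print(fingerprint$simhash)
print(fingerprint$keyword)

# Hamming distance between the fingerprints of two inputs.
cmp <- distance("hello world", "hello world!", simhasher)
print(cmp$distance)
print(cmp$lhs)
print(cmp$rhs)
```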

## $simhash
## [1] "3804341492420753273"
## 
## $keyword
## 11.7392 
## "hello"
## $distance
## [1] 0
## 
## $lhs
## 11.7392 
## "hello" 
## 
## $rhs
## 11.7392 
## "hello"

More Docs

See https://jiebaR.qinwf.com/

More Information and Issues

https://github.com/qinwf/jiebaR

https://github.com/yanyiwu/cppjieba
