52ky posted on 2022-9-28 15:58:27

CRF Chinese Word Segmentation, Open-Source Edition

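Chinese word segmentation is one of the indispensable foundational technologies of Internet applications, and it is also an essential component of other speech and language products. Since the first international Chinese word segmentation evaluation (bakeoff) in 2003, the character-based approach, which forms words by labeling each character with its position in a word, has held an overwhelming advantage. In China, this method has mostly been learned through the open-source CRF software package, but that package's overly complex code structure has limited the method's adoption. This open-source edition contains only the word segmentation decoder from the CRF package. It simplifies the complex CRF code structure and strips out everything the segmentation decoder does not need, which greatly improves the decoder's readability and makes it much easier to understand. To let learners trace and debug the code visually, two Windows project files are provided, one for VC6.0 and one for VS2008, so users of either IDE can easily get started with Chinese word segmentation. The segmentation knowledge base shipped with the package is small and its accuracy is low; it is intended only for learning the CRF segmentation algorithm. A higher-accuracy knowledge base and a faster segmentation engine (DLL or OCX) can be obtained as follows:
1) Contact nlptech360@gmail
2) Leave a message on the blog langiner.blog.51cto
3) Search for 极速分词 ("ultra-fast word segmentation") in a search engine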
CrfDeocder-windows-source\common.h
CrfDeocder-windows-source\crf_test-vc6.0.dsp
CrfDeocder-windows-source\crf_test-vc6.0.dsw
CrfDeocder-windows-source\crf_test-vs2008.sln
CrfDeocder-windows-source\crf_test-vs2008.vcproj
CrfDeocder-windows-source\crf_test.cpp
CrfDeocder-windows-source\darts.h
CrfDeocder-windows-source\feature.cpp
CrfDeocder-windows-source\feature_cache.cpp
CrfDeocder-windows-source\feature_cache.h
CrfDeocder-windows-source\feature_index.cpp
CrfDeocder-windows-source\feature_index.h
CrfDeocder-windows-source\free.model
CrfDeocder-windows-source\freelist.h
CrfDeocder-windows-source\mmap.h
CrfDeocder-windows-source\node.h
CrfDeocder-windows-source\path.h
CrfDeocder-windows-source\readme.txt
CrfDeocder-windows-source\scoped_ptr.h
CrfDeocder-windows-source\tagger.cpp
CrfDeocder-windows-source\tagger.h
CrfDeocder-windows-source\test.txt
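The description refers to character-based segmentation decoded by a CRF. As a rough illustration only (this is not code from the package's tagger.cpp), the C++ sketch below assumes a four-tag B/M/E/S scheme (Begin, Middle, End, Single-character word) and uses hypothetical, hand-picked emission and transition scores in place of the weights a real model such as free.model would supply. It runs a generic linear-chain Viterbi search for the highest-scoring tag sequence and then joins characters into words. The names `Viterbi`, `TagsToWords`, and `ToyExample` are illustrative, and the package's actual tag set and scoring may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Assumed tag set: B = begin of word, M = middle, E = end, S = single-character word.
enum Tag { B = 0, M = 1, E = 2, S = 3, kNumTags = 4 };

// Generic linear-chain Viterbi decoding (a sketch, not the package's code).
// emit[i][t]  : score of tag t at position i (e.g. summed unigram feature weights)
// trans[p][t] : score of going from tag p to tag t (e.g. bigram feature weight)
// Returns the tag sequence with the highest total score.
std::vector<int> Viterbi(const std::vector<std::vector<double>>& emit,
                         const std::vector<std::vector<double>>& trans) {
  const std::size_t n = emit.size();
  if (n == 0) return {};
  const double kNegInf = -std::numeric_limits<double>::infinity();
  std::vector<std::vector<double>> best(n, std::vector<double>(kNumTags, kNegInf));
  std::vector<std::vector<int>> back(n, std::vector<int>(kNumTags, 0));
  for (int t = 0; t < kNumTags; ++t) best[0][t] = emit[0][t];
  for (std::size_t i = 1; i < n; ++i) {
    for (int t = 0; t < kNumTags; ++t) {
      for (int p = 0; p < kNumTags; ++p) {
        const double s = best[i - 1][p] + trans[p][t] + emit[i][t];
        if (s > best[i][t]) {
          best[i][t] = s;  // best score of any path ending in tag t at position i
          back[i][t] = p;  // remember which previous tag produced it
        }
      }
    }
  }
  // Choose the best final tag, then follow the back-pointers to the start.
  int last = 0;
  for (int t = 1; t < kNumTags; ++t)
    if (best[n - 1][t] > best[n - 1][last]) last = t;
  std::vector<int> tags(n);
  tags[n - 1] = last;
  for (std::size_t i = n - 1; i > 0; --i) tags[i - 1] = back[i][tags[i]];
  return tags;
}

// Join characters into words: a word ends at every E or S tag.
std::vector<std::string> TagsToWords(const std::vector<std::string>& chars,
                                     const std::vector<int>& tags) {
  assert(chars.size() == tags.size());
  std::vector<std::string> words;
  std::string cur;
  for (std::size_t i = 0; i < chars.size(); ++i) {
    cur += chars[i];
    if (tags[i] == E || tags[i] == S) {
      words.push_back(cur);
      cur.clear();
    }
  }
  if (!cur.empty()) words.push_back(cur);  // tolerate a dangling B/M at the end
  return words;
}

// Toy demo with made-up scores (hypothetical, not taken from free.model).
std::vector<std::string> ToyExample() {
  const std::vector<std::string> chars = {"中", "文", "分", "词"};
  const double kForbidden = -1e9;  // large penalty for illegal tag bigrams
  std::vector<std::vector<double>> trans(kNumTags, std::vector<double>(kNumTags, kForbidden));
  // Legal B/M/E/S transitions get a neutral score of 0.
  trans[B][M] = trans[B][E] = 0.0;
  trans[M][M] = trans[M][E] = 0.0;
  trans[E][B] = trans[E][S] = 0.0;
  trans[S][B] = trans[S][S] = 0.0;
  // Each position strongly prefers one tag: 中/B 文/E 分/B 词/E.
  const int preferred[] = {B, E, B, E};
  std::vector<std::vector<double>> emit(chars.size(), std::vector<double>(kNumTags, 0.0));
  for (std::size_t i = 0; i < chars.size(); ++i) emit[i][preferred[i]] = 2.0;
  return TagsToWords(chars, Viterbi(emit, trans));
}
```

With these toy scores, `ToyExample()` returns the two words 中文 and 分词. Illegal bigrams such as B→B get a large negative score rather than negative infinity, which keeps every sum in the Viterbi recurrence finite.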
