[Parallel Port Programming] Lucene: Principles and Code Analysis

Posted on 2022-9-28 10:59:28
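Introduction to Lucene
Apache Lucene is an open-source search engine library that makes it easy to add full-text search to Java software. Lucene's primary job is to index every word in a document; the index makes searching far more efficient than traditional word-by-word comparison. Lucene provides a set of APIs for parsing, filtering, and analyzing documents and for building and using indexes. Its strength lies not only in being efficient and simple but, above all, in letting users customize its functionality to their own needs at any time.

What Lucene is
Lucene is a subproject of the Apache Software Foundation[4] Jakarta project and an open-source[5] full-text search engine toolkit. In other words, it is not a complete full-text search engine but a framework for building one: it provides a complete query engine and indexing engine, plus part of a text analysis engine (for two Western languages, English and German). Lucene's goal is to give software developers a simple, easy-to-use toolkit for conveniently adding full-text search to a target system, or for building a complete full-text search engine on top of it.

Lucene's contributors
Lucene was originally written by Doug Cutting, a veteran expert in full-text indexing and retrieval. He was the main developer of the V-Twin search engine[6], later worked as a senior system architect at Excite[7], and is currently researching underlying Internet infrastructure. Lucene was first published on the author's own site, lucene/, then on SourceForge[8], and at the end of 2001 it became a subproject of the Apache Software Foundation's Jakarta project: jakarta.apache.org/lucene/.

Uses, properties, and advantages of Lucene
As an open-source project, Lucene has drawn an enormous response from the open-source community since its release. Programmers have used it to build concrete full-text search applications, integrated it into all kinds of system software, and built web applications with it; some commercial software even uses Lucene as the core of its internal full-text search subsystem. The Apache Software Foundation's website uses Lucene as its full-text search engine, version 2.1 of IBM's open-source Eclipse[9] uses Lucene as the full-text indexing engine of its help subsystem, and IBM's commercial WebSphere[10] also uses Lucene. Thanks to its open-source nature, well-designed index structure, and excellent system architecture, Lucene has seen ever wider adoption.

Lucene's strengths as a full-text search engine
(1) The index file format is independent of the application platform. Lucene defines an index file format based on 8-bit bytes, so compatible systems and applications on different platforms can share the index files that have been built.
(2) On top of the inverted index used by traditional full-text search engines, Lucene implements segmented (block) indexing: it builds small index files for new documents to speed up indexing, and later merges them with the existing index to optimize it.
(3) Its well-designed object-oriented architecture lowers the learning curve for extending Lucene and makes it easy to add new features.
(4) It defines a text analysis interface that is independent of language and document format. The indexer creates index files by consuming a stream of tokens, so supporting a new language or document format only requires implementing the text analysis interface.
(5) It ships with a powerful query engine by default, so users get strong query capabilities without writing their own code. Lucene's query implementation supports Boolean operations, fuzzy queries (FuzzySearch[11]), grouped queries, and more out of the box.

Outlook for Lucene
Compared with existing commercial full-text search engines, Lucene has considerable advantages. First, it is distributed as open source (under the Apache Software License[12]). Programmers can therefore not only make full use of Lucene's powerful features but also study in depth how a full-text search engine is built and how object-oriented programming is practiced, and then write a better full-text search engine tailored to their own application. In this respect, commercial software is far less flexible than Lucene. Second, Lucene follows the open-source tradition of good architecture, with a sound and highly extensible object-oriented design. Programmers can extend Lucene in many ways, for example by adding Chinese text processing or by extending it from plain text to formats such as HTML and PDF[13]. Writing such extensions is not complicated, and because Lucene properly abstracts the platform-dependent parts of the system, extensions can easily run cross-platform. Third, since moving to the Apache Software Foundation, programmers can use the foundation's network to communicate with the developers and other programmers, share resources, and even obtain ready-made extensions. Finally, although Lucene is written in Java, open-source programmers are steadily porting it to other languages (for example, framework[14]). Because these ports follow Lucene's index file format, Lucene can run on many kinds of platforms, and system administrators can choose the language that best suits their current platform.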
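The ideas described above (an inverted index mapping each word to the documents that contain it, small segments built for new documents and later merged, token-based text analysis, and Boolean queries) can be illustrated with a small self-contained Java sketch. It does not use Lucene at all: the class `MiniIndex`, its methods, and its simple tokenizer are hypothetical stand-ins chosen for illustration, not Lucene's API or file format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical toy index illustrating the concepts in the post; NOT Lucene's API. */
public class MiniIndex {
    // Each segment maps a term to the sorted set of document ids that contain it.
    private final List<Map<String, TreeSet<Integer>>> segments = new ArrayList<>();
    // In-memory buffer that becomes a new small segment on flush().
    private Map<String, TreeSet<Integer>> buffer = new TreeMap<>();

    // Hypothetical minimal "analyzer": lowercase, then split on runs of non-letter/non-digit chars.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Index every token of the document into the buffer (the inverted index).
    public void addDocument(int docId, String text) {
        for (String token : analyze(text)) {
            buffer.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Turn the buffered postings into a new small segment; only flushed segments are searched.
    public void flush() {
        if (!buffer.isEmpty()) {
            segments.add(buffer);
            buffer = new TreeMap<>();
        }
    }

    // Merge all segments into one by taking the union of each term's postings.
    public void mergeSegments() {
        Map<String, TreeSet<Integer>> merged = new TreeMap<>();
        for (Map<String, TreeSet<Integer>> seg : segments) {
            for (Map.Entry<String, TreeSet<Integer>> e : seg.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        segments.clear();
        segments.add(merged);
    }

    public int segmentCount() {
        return segments.size();
    }

    // Boolean AND query: ids of documents containing every query term, across all segments.
    public Set<Integer> searchAll(String query) {
        Set<Integer> result = null;
        for (String term : analyze(query)) {
            Set<Integer> postings = new TreeSet<>();
            for (Map<String, TreeSet<Integer>> seg : segments) {
                postings.addAll(seg.getOrDefault(term, new TreeSet<>()));
            }
            if (result == null) result = postings; else result.retainAll(postings);
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        MiniIndex index = new MiniIndex();
        index.addDocument(1, "Lucene is a full-text search library");
        index.addDocument(2, "Lucene was written by Doug Cutting");
        index.flush();
        index.addDocument(3, "An inverted index speeds up full-text search");
        index.flush();
        System.out.println(index.searchAll("full-text search"));
        index.mergeSegments();
        System.out.println(index.segmentCount());
        System.out.println(index.searchAll("lucene"));
    }
}
```

Running `main` prints `[1, 3]`, then `1`, then `[1, 2]`. Note that this naive tokenizer treats an unbroken run of Chinese characters as a single token, which is one reason Chinese text needs a dedicated analysis extension, as the post mentions.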
Lucene原理与代码分析完整版.pdf


[Download] 10592811896.rar







Copyright 2011 - 2012 Lnqq.NET. All Rights Reserved. (ICP filing: 粤ICP备14042591号-1)


Resources on this site are collected from the Internet for user testing only; all copyrights belong to the original authors.
