Lucene.Net 2.3.1开发介绍 —— 一、接触Lucene.Net

1、引用Lucene.Net类库

找到Lucene.Net的源代码，在“C#\src\Lucene.Net”目录。打开Visual Studio，我的版本是2008，而Lucene.Net默认的是2005。先创建一个项目，简单起见，创建一个C#控制台程序。

@H_502_12@

图 1.1

然后添加Lucene.Net进项目，如图 1.2 - 1.3。

@H_502_12@

图 1.2

@H_502_12@

图 1.3

这个过程要进行一个VS2005到2008的转换。添加后，解决方案就有Lucene.Net项目了，如图1.4。

@H_502_12@

图 1.4

然后把Lucene.Net引入TestLucene项目。如图1.5 -1.6:

@H_502_12@

图1.5

@H_502_12@

图1.6

点确定后就可以了。这时候，就可以在TestLucene项目中使用Lucene.Net的API了。

2、简单示例
对Lucene.Net的操作分为建立索引，和搜索两部分。

2.1 建立索引

通过代码 2.1.1，就可以简单地建立一个索引了。代码 2.1.1将在应用程序目录下建立一个IndexDirectory目录，并在目录下创建索引文件。

代码 2.1.1

@H_502_12@

1@H_502_12@

using System;
2@H_502_12@

using System.Collections.Generic;
3@H_502_12@

using System.Text;
4@H_502_12@

5@H_502_12@

6@H_502_12@

namespace TestLucene
7@H_502_12@

{
8@H_502_12@

using Lucene.Net.Index;
9@H_502_12@

using Lucene.Net.Store;
10@H_502_12@

using Lucene.Net.Analysis;
11@H_502_12@

using Lucene.Net.Analysis.Standard;
12@H_502_12@

using Lucene.Net.Documents;
13@H_502_12@

14@H_502_12@

class Program
15@H_502_12@

{
16@H_502_12@

static void Main(string[] args)
17@H_502_12@

{
18@H_502_12@

Analyzer analyzer = new StandardAnalyzer();
19@H_502_12@

IndexWriter writer = new IndexWriter("IndexDirectory", analyzer, true);
20@H_502_12@

AddDocument(writer, "sql Server 2008 的发布", "sql Server 2008 的新特性");
21@H_502_12@

AddDocument(writer, "ASP.Net MVC框架配置与分析", "而今，微软推出了新的MVC开发框架，也就是Microsoft ASP.NET 3.5 Extensions");
22@H_502_12@

writer.Optimize();
23@H_502_12@

writer.Close();
24@H_502_12@

}
25@H_502_12@

26@H_502_12@

static void AddDocument(IndexWriter writer, string title, string content)
27@H_502_12@

{
28@H_502_12@

Document document = new Document();
29@H_502_12@

document.Add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
30@H_502_12@

document.Add(new Field("content", content, Field.Index.TOKENIZED));
31@H_502_12@

writer.AddDocument(document);
32@H_502_12@

}
33@H_502_12@

}
34@H_502_12@

}
35@H_502_12@

复制代码

2.2 搜索索引

代码2.2.1就可以搜索刚才建立的索引。

代码 2.2.1

@H_502_12@

1@H_502_12@

using System;
2@H_502_12@

using System.Collections.Generic;
3@H_502_12@

using System.Text;
4@H_502_12@

5@H_502_12@

6@H_502_12@

namespace TestLucene
7@H_502_12@

{
8@H_502_12@

using Lucene.Net.Index;
9@H_502_12@

using Lucene.Net.Store;
10@H_502_12@

using Lucene.Net.Analysis;
11@H_502_12@

using Lucene.Net.Analysis.Standard;
12@H_502_12@

using Lucene.Net.Documents;
13@H_502_12@

using Lucene.Net.Search;
14@H_502_12@

using Lucene.Net.QueryParsers;
15@H_502_12@

16@H_502_12@

class Program
17@H_502_12@

{
18@H_502_12@

static void Main(string[] args)
19@H_502_12@

{
20@H_502_12@

Analyzer analyzer = new StandardAnalyzer();
21@H_502_12@

//IndexWriter writer = new IndexWriter("IndexDirectory", true);
22@H_502_12@

//AddDocument(writer, "sql Server 2008 的发布", "sql Server 2008 的新特性");
23@H_502_12@

//AddDocument(writer, "ASP.Net MVC框架配置与分析", "而今，微软推出了新的MVC开发框架，也就是Microsoft ASP.NET 3.5 Extensions");
24@H_502_12@

//writer.Optimize();
25@H_502_12@

//writer.Close();
26@H_502_12@

27@H_502_12@

IndexSearcher searcher = new IndexSearcher("IndexDirectory");
28@H_502_12@

MultiFieldQueryParser parser = new MultiFieldQueryParser(new string[] { "title", "content" }, analyzer);
29@H_502_12@

Query query = parser.Parse("sql");
30@H_502_12@

Hits hits = searcher.Search(query);
31@H_502_12@

32@H_502_12@

for (int i = 0; i < hits.Length(); i++)
33@H_502_12@

{
34@H_502_12@

Document doc = hits.Doc(i);
35@H_502_12@

Console.WriteLine(string.Format("title:{0} content:{1}", doc.Get("title"), doc.Get("content")));
36@H_502_12@

}
37@H_502_12@

searcher.Close();
38@H_502_12@

39@H_502_12@

Console.ReadKey();
40@H_502_12@

}
41@H_502_12@

42@H_502_12@

//static void AddDocument(IndexWriter writer, string title, string content)
43@H_502_12@

//{
44@H_502_12@

// Document document = new Document();
45@H_502_12@

// document.Add(new Field("title", Field.Index.TOKENIZED));
46@H_502_12@

// document.Add(new Field("content", Field.Index.TOKENIZED));
47@H_502_12@

// writer.AddDocument(document);
48@H_502_12@

//}
49@H_502_12@

}
50@H_502_12@

}
51@H_502_12@

复制代码

运行后输出：

title:sql Server 2008 的发布 content:sql Server 2008 的新特性

2.3 疑问

2.1,2.2小节介绍了最简单的建立和搜索索引的方式。虽然代码很短，使用也很简单，但是理解起来却不是太容易。

代码 2.1.1中，先是建立了一个分词器。什么是分词器？为什么要有分词器？分词器是怎么工作的？这些问题真让人头疼。接着建立一个IndexWriter的实例，这个类是负责创建索引的，有很多构造函数，这里使用的是其中的一个。三个参数分别是：索引建立到哪个目录，用什么分词器，还有就是是否创建。如果是否创建为false,那么就是以增量的方式来创建。再下来调用了AddDocument方法,在AddDocument方法中，先组织一个Docuement对象，然后把这个对象交给IndexWriter。然后再调用Optimize优化索引，最后关闭创建过程。这里面又有什么是Document,Document是怎么往存储器里写入的？Optimize方法能干什么？问题真多。

代码2.2.1则相对简单，先是创建IndexSearcher对象实例，并指定其搜索的目录，然后构造了一个查询Query，然后查出Hits，这样就得到想要的结果了。但是这个查询的过程是什么样的呢？这个Query代表什么？Hits是怎么得出来的？结果的顺序是怎么决定的？这些又是留下来的问题。

这么多问题，不能一次说完，欲知后事如何，下面一一道来。

Lucene.Net 2.3.1开发介绍 —— 一、接触Lucene.Net

相关推荐