Lucene 2.9.4 API

Apache Lucene is a high-performance, full-featured text search engine library.

Core 
Package Description
org.apache.lucene
Top-level package.
org.apache.lucene.analysis
API and code to convert text into indexable/searchable tokens.
org.apache.lucene.analysis.standard
A fast grammar-based tokenizer constructed with JFlex.
org.apache.lucene.analysis.tokenattributes  
org.apache.lucene.document
The logical representation of a Document for indexing and searching.
org.apache.lucene.index
Code to maintain and access indices.
org.apache.lucene.messages
Native Language Support (NLS), a system for software internationalization.
org.apache.lucene.queryParser
A simple query parser implemented with JavaCC.
org.apache.lucene.search
Code to search indices.
org.apache.lucene.search.function
Programmatic control over document scores.
org.apache.lucene.search.payloads
The payloads package provides Query mechanisms for finding and using payloads.
org.apache.lucene.search.spans
The calculus of spans.
org.apache.lucene.store
Binary i/o API, used for all index data.
org.apache.lucene.util
Some utility classes.
org.apache.lucene.util.cache  
Demo 
Package Description
org.apache.lucene.demo  
org.apache.lucene.demo.html  
contrib: Analysis 
Package Description
org.apache.lucene.analysis.ar
Analyzer for Arabic.
org.apache.lucene.analysis.br
Analyzer for Brazilian Portuguese.
org.apache.lucene.analysis.cjk
Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters).
org.apache.lucene.analysis.cn
Analyzer for Chinese, which indexes unigrams (individual Chinese characters).
org.apache.lucene.analysis.cn.smart
Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.cn.smart.hhmm
SmartChineseAnalyzer Hidden Markov Model package.
org.apache.lucene.analysis.compound
A filter that decomposes the compound words found in many Germanic languages into their word parts.
org.apache.lucene.analysis.compound.hyphenation
The code for the compound word hyphenation is taken from the Apache FOP project.
org.apache.lucene.analysis.cz
Analyzer for Czech.
org.apache.lucene.analysis.de
Analyzer for German.
org.apache.lucene.analysis.el
Analyzer for Greek.
org.apache.lucene.analysis.fa
Analyzer for Persian.
org.apache.lucene.analysis.fr
Analyzer for French.
org.apache.lucene.analysis.miscellaneous
Miscellaneous TokenStreams
org.apache.lucene.analysis.ngram
Character n-gram tokenizers and filters.
org.apache.lucene.analysis.nl
Analyzer for Dutch.
org.apache.lucene.analysis.payloads
Provides various convenience classes for creating payloads on Tokens.
org.apache.lucene.analysis.position
Filter for assigning position increments.
org.apache.lucene.analysis.query
Automatically filter high-frequency stopwords.
org.apache.lucene.analysis.reverse
Filter to reverse token text.
org.apache.lucene.analysis.ru
Analyzer for Russian.
org.apache.lucene.analysis.shingle
Word n-gram filters
org.apache.lucene.analysis.sinks
Implementations of the SinkTokenizer that might be useful.
org.apache.lucene.analysis.th
Analyzer for Thai.
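
Several of the analysis packages above are described in terms of n-grams: the cjk package indexes overlapping character bigrams, and the shingle package produces word n-grams. The following is a minimal plain-Java sketch of those two ideas, not the contrib implementation; the class and method names (`NGramSketch`, `charBigrams`, `shingles`) are hypothetical, and the real tokenizers and filters handle many cases that are ignored here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Lucene API.
final class NGramSketch {

    /** Overlapping character bigrams, e.g. "ABCD" -> [AB, BC, CD]. */
    static List<String> charBigrams(String text) {
        List<String> out = new ArrayList<>();
        // Works on UTF-16 code units, so characters outside the BMP would be split.
        for (int i = 0; i + 2 <= text.length(); i++) {
            out.add(text.substring(i, i + 2));
        }
        return out;
    }

    /** Word n-grams ("shingles") of exactly n whitespace-separated words. */
    static List<String> shingles(String text, int n) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty() || n < 1) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + n <= words.length; i++) {
            StringBuilder sb = new StringBuilder(words[i]);
            for (int j = 1; j < n; j++) {
                sb.append(' ').append(words[i + j]);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // "ABCD" stands in for a run of four adjacent Han characters.
        System.out.println(charBigrams("ABCD"));
        System.out.println(shingles("please divide this sentence", 2));
    }
}
```

Bigrams of this kind let a phrase match without a dictionary-based word segmenter, at the cost of a larger index.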
contrib: Ant 
Package Description
org.apache.lucene.ant
Ant task to create Lucene indexes.
contrib: Benchmark 
Package Description
org.apache.lucene.benchmark

The benchmark contribution contains tools for benchmarking Lucene using standard, freely available corpora.

org.apache.lucene.benchmark.byTask
Benchmarking Lucene By Tasks.
org.apache.lucene.benchmark.byTask.feeds
Sources for benchmark inputs: documents and queries.
org.apache.lucene.benchmark.byTask.programmatic
Sample performance test written programmatically - no algorithm file is needed here.
org.apache.lucene.benchmark.byTask.stats
Statistics maintained when running benchmark tasks.
org.apache.lucene.benchmark.byTask.tasks
Extendable benchmark tasks.
org.apache.lucene.benchmark.byTask.utils
Utilities used for the benchmark, and for the reports.
org.apache.lucene.benchmark.quality
Search Quality Benchmarking.
org.apache.lucene.benchmark.quality.trec
Utilities for TREC-related quality benchmarking, feeding from TREC Topics and QRels inputs.
org.apache.lucene.benchmark.quality.utils
Miscellaneous utilities for search quality benchmarking: query parsing, submission reports.
org.apache.lucene.benchmark.stats  
org.apache.lucene.benchmark.utils  
contrib: Collation 
Package Description
org.apache.lucene.collation
CollationKeyFilter and ICUCollationKeyFilter convert each token into its binary CollationKey using the provided Collator, and then encode the CollationKey as a String using IndexableBinaryStringTools, to allow it to be stored as an index term.
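
The idea behind collation keys can be sketched with the JDK alone. The example below uses only `java.text.Collator` and `java.text.CollationKey`; it does not use CollationKeyFilter or IndexableBinaryStringTools, and the class and method names (`CollationKeySketch`, `keyBytes`, `compareUnsigned`) are hypothetical. It shows that comparing the binary keys byte by byte gives the locale's ordering rather than plain code-point ordering, which is why such keys are useful as index terms.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical illustration only; not part of the Lucene API.
final class CollationKeySketch {

    /** Binary collation key for a token under the given collator. */
    static byte[] keyBytes(Collator collator, String token) {
        return collator.getCollationKey(token).toByteArray();
    }

    /** Lexicographic comparison of two keys, treating each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        Collator english = Collator.getInstance(Locale.ENGLISH);
        byte[] apple = keyBytes(english, "apple");
        byte[] banana = keyBytes(english, "Banana");
        // Plain String comparison puts the uppercase "B" before the lowercase "a".
        System.out.println("String order:    " + Integer.signum("apple".compareTo("Banana")));
        // The collation keys follow the English collator instead.
        System.out.println("Collation order: " + Integer.signum(compareUnsigned(apple, banana)));
    }
}
```

The step that encodes the key bytes as a String for storage as an index term is specific to Lucene and is not shown here.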
contrib: DB 
Package Description
com.sleepycat.db  
org.apache.lucene.store.db
Berkeley DB 4.3 based implementation of Directory.
org.apache.lucene.store.je
Berkeley DB Java Edition based implementation of Directory.
contrib: Fast Vector Highlighter 
Package Description
org.apache.lucene.search.vectorhighlight
This is another highlighter implementation.
contrib: Highlighter 
Package Description
org.apache.lucene.search.highlight
The highlight package contains classes to provide "keyword in context" features typically used to highlight search terms in the text of results pages.
contrib: Instantiated 
Package Description
org.apache.lucene.store.instantiated
InstantiatedIndex, alternative RAM store for small corpora.
contrib: Lucli 
Package Description
lucli
Lucene Command Line Interface
contrib: Memory 
Package Description
org.apache.lucene.index.memory
High-performance single-document main memory Apache Lucene fulltext search index.
contrib: Misc  
Package Description
org.apache.lucene.misc  
org.apache.lucene.queryParser.analyzing
QueryParser that passes Fuzzy-, Prefix-, Range-, and WildcardQuerys through the given analyzer.
org.apache.lucene.queryParser.precedence
QueryParser designed to handle operator precedence in a more sensible fashion than the default QueryParser.
contrib: Queries 
Package Description
org.apache.lucene.search.similar
Document similarity query generators.
contrib: Query Parser 
Package Description
org.apache.lucene.queryParser.complexPhrase
QueryParser which permits complex phrase query syntax, e.g. "(john jon jonathan~) peters*"
org.apache.lucene.queryParser.core
Contains the core classes of the flexible query parser framework
org.apache.lucene.queryParser.core.builders
Contains the necessary classes to implement query builders
org.apache.lucene.queryParser.core.config
Contains the base classes used to configure the query processing
org.apache.lucene.queryParser.core.messages
Contains messages usually used by query parser implementations
org.apache.lucene.queryParser.core.nodes
Contains query nodes that are commonly used by query parser implementations
org.apache.lucene.queryParser.core.parser
Contains the necessary interfaces to implement text parsers
org.apache.lucene.queryParser.core.processors
Interfaces and implementations used by query node processors
org.apache.lucene.queryParser.core.util
Utility classes to be used with the Query Parser
org.apache.lucene.queryParser.standard
Contains the implementation of the Lucene query parser using the flexible query parser framework
org.apache.lucene.queryParser.standard.builders
Standard Lucene Query Node Builders
org.apache.lucene.queryParser.standard.config
Standard Lucene Query Configuration
org.apache.lucene.queryParser.standard.nodes
Standard Lucene Query Nodes
org.apache.lucene.queryParser.standard.parser
Lucene Query Parser
org.apache.lucene.queryParser.standard.processors
Lucene Query Node Processors
contrib: RegEx 
Package Description
org.apache.lucene.search.regex
Regular expression Query.
org.apache.regexp
This package exists to allow access to useful package protected data within Jakarta Regexp.
contrib: Snowball 
Package Description
org.apache.lucene.analysis.snowball
TokenFilter and Analyzer implementations that use Snowball stemmers.
contrib: Spatial 
Package Description
org.apache.lucene.spatial.geohash
Support for Geohash encoding, decoding, and filtering.
org.apache.lucene.spatial.geometry  
org.apache.lucene.spatial.geometry.shape  
org.apache.lucene.spatial.tier
Support for filtering based upon geographic location.
org.apache.lucene.spatial.tier.projections  
contrib: SpellChecker 
Package Description
org.apache.lucene.search.spell
Suggest alternate spellings for words.
contrib: Surround Parser 
Package Description
org.apache.lucene.queryParser.surround.parser
This package contains the QueryParser.jj source file for the Surround parser.
org.apache.lucene.queryParser.surround.query
This package contains SrndQuery and its subclasses.
contrib: Swing 
Package Description
org.apache.lucene.swing.models
Decorators for JTable TableModel and JList ListModel encapsulating Lucene indexing and searching functionality.
contrib: Wikipedia 
Package Description
org.apache.lucene.wikipedia.analysis
Tokenizer that is aware of Wikipedia syntax.
contrib: WordNet 
Package Description
org.apache.lucene.wordnet
This package uses synonyms defined by WordNet to build a Lucene index storing them, which in turn can be used for query expansion.
contrib: XML Query Parser 
Package Description
org.apache.lucene.xmlparser
Parser that produces Lucene Query objects from XML streams.
org.apache.lucene.xmlparser.builders  
Other Packages 
Package Description
org.tartarus.snowball  
org.tartarus.snowball.ext  

Apache Lucene is a high-performance, full-featured text search engine library. Here is a simple example of how to use Lucene for indexing and searching (using JUnit to check that the results are what we expect):

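    // Imports needed by this snippet (assertEquals comes from JUnit):
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

    // Store the index in memory:
    Directory directory = new RAMDirectory();
    // To store an index on disk, use this instead (FSDirectory.open takes a java.io.File):
    //Directory directory = FSDirectory.open(new File("/tmp/testindex"));
    // create=true starts a new index; at most 25000 terms are indexed per field:
    IndexWriter iwriter = new IndexWriter(directory, analyzer, true,
                                          new IndexWriter.MaxFieldLength(25000));
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    // Store the original text and index its analyzed tokens:
    doc.add(new Field("fieldname", text, Field.Store.YES,
        Field.Index.ANALYZED));
    iwriter.addDocument(doc);
    iwriter.close();

    // Now search the index:
    IndexSearcher isearcher = new IndexSearcher(directory, true); // read-only=true
    // Parse a simple query that searches for "text":
    QueryParser parser = new QueryParser("fieldname", analyzer);
    Query query = parser.parse("text");
    // No filter (null); collect at most the top 1000 hits:
    ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
    assertEquals(1, hits.length);
    // Iterate through the results:
    for (int i = 0; i < hits.length; i++) {
      Document hitDoc = isearcher.doc(hits[i].doc);
      assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
    }
    isearcher.close();
    directory.close();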

The Lucene API is divided into several packages, which are summarized in the package tables above.

To use Lucene, an application should:
  1. Create Documents by adding Fields;
  2. Create an IndexWriter and add documents to it with addDocument();
  3. Call QueryParser.parse() to build a query from a string; and
  4. Create an IndexSearcher and pass the query to its search() method.
Simple examples of code that does this are the IndexFiles and SearchFiles classes in the org.apache.lucene.demo package. To demonstrate these, try something like:
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexFiles rec.food.recipes/soups
adding rec.food.recipes/soups/abalone-chowder
  [ ... ]

> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.SearchFiles
Query: chowder
Searching for: chowder
34 total matching documents
1. rec.food.recipes/soups/spam-chowder
  [ ... thirty-four documents contain the word "chowder" ... ]

Query: "clam chowder" AND Manhattan
Searching for: +"clam chowder" +manhattan
2 total matching documents
1. rec.food.recipes/soups/clam-chowder
  [ ... two documents contain the phrase "clam chowder" and the word "manhattan" ... ]
    [ Note: "+" and "-" are canonical, but "AND", "OR" and "NOT" may be used. ]

The IndexHTML demo is more sophisticated.  It incrementally maintains an index of HTML files, adding new files as they appear, deleting old files as they disappear and re-indexing files as they change.
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML -create java/jdk1.1.6/docs/relnotes
adding java/jdk1.1.6/docs/relnotes/SMICopyright.html
  [ ... create an index containing all the relnotes ]

> rm java/jdk1.1.6/docs/relnotes/SMICopyright.html

> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML java/jdk1.1.6/docs/relnotes
deleting java/jdk1.1.6/docs/relnotes/SMICopyright.html
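
The bookkeeping behind such an incremental update can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the IndexHTML implementation: it assumes the indexer can obtain a map from file path to the last-modified time recorded at index time, plus a second map describing the files currently on disk. The class and member names (`IncrementalPlan`, `diff`, `toAdd`, `toDelete`, `toReindex`) are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not part of the Lucene demo.
final class IncrementalPlan {
    final Set<String> toAdd = new TreeSet<>();      // on disk, not yet indexed
    final Set<String> toDelete = new TreeSet<>();   // indexed, no longer on disk
    final Set<String> toReindex = new TreeSet<>();  // present in both, but modified

    /**
     * Compares the indexed state with the current directory contents.
     * Both maps go from file path to last-modified time in milliseconds.
     */
    static IncrementalPlan diff(Map<String, Long> indexed, Map<String, Long> onDisk) {
        IncrementalPlan plan = new IncrementalPlan();
        for (Map.Entry<String, Long> e : onDisk.entrySet()) {
            Long seen = indexed.get(e.getKey());
            if (seen == null) {
                plan.toAdd.add(e.getKey());
            } else if (!seen.equals(e.getValue())) {
                plan.toReindex.add(e.getKey());
            }
        }
        for (String path : indexed.keySet()) {
            if (!onDisk.containsKey(path)) {
                plan.toDelete.add(path);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Long> indexed = new TreeMap<>();
        indexed.put("relnotes/SMICopyright.html", 100L);
        indexed.put("relnotes/index.html", 100L);
        Map<String, Long> onDisk = new TreeMap<>();
        onDisk.put("relnotes/index.html", 200L);  // changed since it was indexed
        onDisk.put("relnotes/new.html", 150L);    // appeared since the last run
        IncrementalPlan plan = diff(indexed, onDisk);
        System.out.println("add:     " + plan.toAdd);
        System.out.println("delete:  " + plan.toDelete);
        System.out.println("reindex: " + plan.toReindex);
    }
}
```

In an index, re-indexing a changed file amounts to deleting its old document and adding a new one, so the three sets map onto delete and add operations.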

Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.