Package | Description
---|---
org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens.
org.apache.lucene.analysis.ar | Analyzer for Arabic.
org.apache.lucene.analysis.br | Analyzer for Brazilian Portuguese.
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters).
org.apache.lucene.analysis.cn | Analyzer for Chinese, which indexes unigrams (individual Chinese characters).
org.apache.lucene.analysis.cn.smart | Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.compound | A filter that decomposes compound words found in many Germanic languages into their word parts.
org.apache.lucene.analysis.cz | Analyzer for Czech.
org.apache.lucene.analysis.de | Analyzer for German.
org.apache.lucene.analysis.el | Analyzer for Greek.
org.apache.lucene.analysis.fa | Analyzer for Persian.
org.apache.lucene.analysis.fr | Analyzer for French.
org.apache.lucene.analysis.miscellaneous | Miscellaneous TokenStream implementations.
org.apache.lucene.analysis.ngram | Character n-gram tokenizers and filters.
org.apache.lucene.analysis.nl | Analyzer for Dutch.
org.apache.lucene.analysis.payloads | Convenience classes for creating payloads on Tokens.
org.apache.lucene.analysis.position | Filter for assigning position increments.
org.apache.lucene.analysis.query | Automatically filters high-frequency stopwords.
org.apache.lucene.analysis.reverse | Filter to reverse token text.
org.apache.lucene.analysis.ru | Analyzer for Russian.
org.apache.lucene.analysis.shingle | Word n-gram filters.
org.apache.lucene.analysis.sinks | Implementations of SinkTokenizer that might be useful.
org.apache.lucene.analysis.snowball | TokenFilter and Analyzer implementations that use Snowball stemmers.
org.apache.lucene.analysis.standard | A fast grammar-based tokenizer constructed with JFlex.
org.apache.lucene.analysis.th | Analyzer for Thai.
org.apache.lucene.collation | CollationKeyFilter and ICUCollationKeyFilter convert each token into its binary CollationKey using the provided Collator, and then encode the CollationKey as a String using IndexableBinaryStringTools, to allow it to be stored as an index term.
org.apache.lucene.document | The logical representation of a Document for indexing and searching.
org.apache.lucene.index.memory | High-performance single-document main-memory Apache Lucene fulltext search index.
org.apache.lucene.search.highlight | Classes that provide "keyword in context" features, typically used to highlight search terms in the text of results pages.
org.apache.lucene.wikipedia.analysis | Tokenizer that is aware of Wikipedia syntax.
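All of the analysis packages above follow the same pipeline pattern: a Tokenizer breaks a Reader into tokens, and one or more TokenFilters transform them. As a rough plain-Java sketch of what a whitespace tokenizer followed by a lower-case filter produces (this is only an illustration of the idea, not the actual Lucene classes):

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {
    // Rough equivalent of a WhitespaceTokenizer: split on runs of whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Rough equivalent of a LowerCaseFilter: normalize each token to lower case.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase());
        return out;
    }
}
```

In real Lucene code the same effect comes from chaining filter constructors around a tokenizer, which is why nearly every constructor listed on this page takes a TokenStream as its first argument.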
Modifier and Type | Class and Description
---|---
class | ASCIIFoldingFilter: converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists.
class | CachingTokenFilter: can be used if the token attributes of a TokenStream are intended to be consumed more than once.
class | CharTokenizer: an abstract base class for simple, character-oriented tokenizers.
class | ISOLatin1AccentFilter: Deprecated. In favor of ASCIIFoldingFilter, which covers a superset of Latin 1. This class will be removed in Lucene 3.0.
class | KeywordTokenizer: emits the entire input as a single token.
class | LengthFilter: removes words that are too long or too short from the stream.
class | LetterTokenizer: a tokenizer that divides text at non-letters.
class | LowerCaseFilter: normalizes token text to lower case.
class | LowerCaseTokenizer: performs the function of LetterTokenizer and LowerCaseFilter together.
class | NumericTokenStream: Expert: provides a TokenStream for indexing numeric values that can be used by NumericRangeQuery or NumericRangeFilter.
class | PorterStemFilter: transforms the token stream as per the Porter stemming algorithm.
class | SinkTokenizer: Deprecated. Use TeeSinkTokenFilter instead.
class | StopFilter: removes stop words from a token stream.
class | TeeSinkTokenFilter: provides the ability to set aside attribute states that have already been analyzed.
static class | TeeSinkTokenFilter.SinkTokenStream
class | TeeTokenFilter: Deprecated. Use TeeSinkTokenFilter instead.
class | TokenFilter: a TokenStream whose input is another TokenStream.
class | Tokenizer: a TokenStream whose input is a Reader.
class | WhitespaceTokenizer: a tokenizer that divides text at whitespace.
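The ASCIIFoldingFilter above maps accented and otherwise decorated characters onto plain ASCII. For common Latin accents, the same effect can be sketched in plain Java with Unicode NFD decomposition; note this covers only combining-mark accents, while Lucene's filter handles many more symbol classes:

```java
import java.text.Normalizer;

public class FoldSketch {
    // Fold accented Latin characters to their base letters by decomposing
    // to NFD (letter + combining marks) and dropping the combining marks.
    // A sketch of the idea only; ASCIIFoldingFilter covers far more cases.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```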
Modifier and Type | Field and Description |
---|---|
protected TokenStream |
TokenFilter.input
The source of tokens for this filter.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
Analyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader)
Creates a TokenStream that is allowed to be re-used
from the previous time that the same thread called
this method.
|
TokenStream |
KeywordAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader) |
TokenStream |
PerFieldAnalyzerWrapper.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader) |
TokenStream |
SimpleAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader) |
TokenStream |
StopAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader) |
TokenStream |
WhitespaceAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader) |
abstract TokenStream |
Analyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Creates a TokenStream which tokenizes all the text in the provided
Reader.
|
TokenStream |
KeywordAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader) |
TokenStream |
PerFieldAnalyzerWrapper.tokenStream(java.lang.String fieldName,
java.io.Reader reader) |
TokenStream |
SimpleAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader) |
TokenStream |
StopAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Filters LowerCaseTokenizer with StopFilter.
|
TokenStream |
WhitespaceAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader) |
Constructor and Description |
---|
ASCIIFoldingFilter(TokenStream input) |
CachingTokenFilter(TokenStream input) |
ISOLatin1AccentFilter(TokenStream input)
Deprecated.
|
LengthFilter(TokenStream in,
int min,
int max)
Build a filter that removes words that are too long or too
short from the text.
|
LowerCaseFilter(TokenStream in) |
PorterStemFilter(TokenStream in) |
StopFilter(boolean enablePositionIncrements,
TokenStream in,
java.util.Set stopWords)
Constructs a filter which removes words from the input
TokenStream that are named in the Set.
|
StopFilter(boolean enablePositionIncrements,
TokenStream input,
java.util.Set stopWords,
boolean ignoreCase)
Construct a token stream filtering the given input.
|
StopFilter(boolean enablePositionIncrements,
TokenStream input,
java.lang.String[] stopWords)
Deprecated.
Use
StopFilter.StopFilter(boolean, TokenStream, Set) instead. |
StopFilter(boolean enablePositionIncrements,
TokenStream in,
java.lang.String[] stopWords,
boolean ignoreCase)
Deprecated.
|
StopFilter(TokenStream in,
java.util.Set stopWords)
Deprecated.
Use
StopFilter.StopFilter(boolean, TokenStream, Set) instead |
StopFilter(TokenStream input,
java.util.Set stopWords,
boolean ignoreCase)
Deprecated.
|
StopFilter(TokenStream input,
java.lang.String[] stopWords)
Deprecated.
|
StopFilter(TokenStream in,
java.lang.String[] stopWords,
boolean ignoreCase)
Deprecated.
|
TeeSinkTokenFilter(TokenStream input)
Instantiates a new TeeSinkTokenFilter.
|
TeeTokenFilter(TokenStream input,
SinkTokenizer sink)
Deprecated.
|
TokenFilter(TokenStream input)
Construct a token stream filtering the given input.
|
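The enablePositionIncrements flag that appears throughout the StopFilter constructors controls whether removed stop words leave a positional "hole" behind, which matters for phrase queries. A plain-Java sketch of the behavior (not Lucene's implementation, which works on attribute states):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopSketch {
    // Remove stop words. With enablePositionIncrements, each surviving token
    // records how many positions it advanced (skipped stop words + 1), so
    // downstream consumers still "see" the hole left by a removed word.
    static List<String> filter(List<String> tokens, Set<String> stopWords,
                               boolean enablePositionIncrements,
                               List<Integer> incrementsOut) {
        List<String> out = new ArrayList<>();
        int skipped = 0;
        for (String t : tokens) {
            if (stopWords.contains(t)) { skipped++; continue; }
            out.add(t);
            incrementsOut.add(enablePositionIncrements ? skipped + 1 : 1);
            skipped = 0;
        }
        return out;
    }
}
```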
Modifier and Type | Class and Description |
---|---|
class |
ArabicLetterTokenizer
Tokenizer that breaks text into runs of letters and diacritics.
|
class |
ArabicNormalizationFilter
A
TokenFilter that applies ArabicNormalizer to normalize the orthography. |
class |
ArabicStemFilter
A
TokenFilter that applies ArabicStemmer to stem Arabic words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
ArabicAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a (possibly reused)
TokenStream which tokenizes all the text
in the provided Reader . |
TokenStream |
ArabicAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Creates a
TokenStream which tokenizes all the text in the provided Reader . |
Constructor and Description |
---|
ArabicNormalizationFilter(TokenStream input) |
ArabicStemFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
BrazilianStemFilter
A
TokenFilter that applies BrazilianStemmer . |
Modifier and Type | Method and Description |
---|---|
TokenStream |
BrazilianAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a (possibly reused)
TokenStream which tokenizes all the text
in the provided Reader . |
TokenStream |
BrazilianAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Creates a
TokenStream which tokenizes all the text in the provided Reader . |
Constructor and Description |
---|
BrazilianStemFilter(TokenStream in) |
BrazilianStemFilter(TokenStream in,
java.util.Set exclusiontable) |
Modifier and Type | Class and Description |
---|---|
class |
CJKTokenizer
CJKTokenizer is designed for Chinese, Japanese, and Korean languages.
|
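CJKTokenizer indexes overlapping character bigrams, since CJK text has no whitespace word boundaries. The bigram scheme itself is simple enough to sketch in plain Java (illustration only, not the Lucene tokenizer):

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {
    // CJK-style overlapping bigrams: each adjacent pair of characters in a
    // run of Han text becomes one token.
    static List<String> bigrams(String run) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < run.length(); i++) {
            out.add(run.substring(i, i + 2));
        }
        return out;
    }
}
```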
Modifier and Type | Method and Description |
---|---|
TokenStream |
CJKAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a (possibly reused)
TokenStream which tokenizes all the text
in the provided Reader . |
TokenStream |
CJKAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Creates a
TokenStream which tokenizes all the text in the provided Reader . |
Modifier and Type | Class and Description |
---|---|
class |
ChineseFilter
A
TokenFilter with a stop word table. |
class |
ChineseTokenizer
Tokenizes Chinese text as individual Chinese characters.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
ChineseAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a (possibly reused)
TokenStream which tokenizes all the text in the
provided Reader . |
TokenStream |
ChineseAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Creates a
TokenStream which tokenizes all the text in the provided Reader . |
Constructor and Description |
---|
ChineseFilter(TokenStream in) |
Modifier and Type | Class and Description |
---|---|
class |
SentenceTokenizer
Tokenizes input text into sentences.
|
class |
WordTokenFilter
A
TokenFilter that breaks sentences into words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
SmartChineseAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader) |
TokenStream |
SmartChineseAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader) |
Constructor and Description |
---|
WordTokenFilter(TokenStream in)
Constructs a new WordTokenFilter.
|
Modifier and Type | Class and Description |
---|---|
class |
CompoundWordTokenFilterBase
Base class for decomposition token filters.
|
class |
DictionaryCompoundWordTokenFilter
A
TokenFilter that decomposes compound words found in many Germanic languages. |
class |
HyphenationCompoundWordTokenFilter
A
TokenFilter that decomposes compound words found in many Germanic languages. |
Constructor and Description |
---|
CompoundWordTokenFilterBase(TokenStream input,
java.util.Set dictionary) |
CompoundWordTokenFilterBase(TokenStream input,
java.util.Set dictionary,
boolean onlyLongestMatch) |
CompoundWordTokenFilterBase(TokenStream input,
java.util.Set dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch) |
CompoundWordTokenFilterBase(TokenStream input,
java.lang.String[] dictionary) |
CompoundWordTokenFilterBase(TokenStream input,
java.lang.String[] dictionary,
boolean onlyLongestMatch) |
CompoundWordTokenFilterBase(TokenStream input,
java.lang.String[] dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch) |
DictionaryCompoundWordTokenFilter(TokenStream input,
java.util.Set dictionary) |
DictionaryCompoundWordTokenFilter(TokenStream input,
java.util.Set dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch) |
DictionaryCompoundWordTokenFilter(TokenStream input,
java.lang.String[] dictionary) |
DictionaryCompoundWordTokenFilter(TokenStream input,
java.lang.String[] dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch) |
HyphenationCompoundWordTokenFilter(TokenStream input,
HyphenationTree hyphenator,
java.util.Set dictionary) |
HyphenationCompoundWordTokenFilter(TokenStream input,
HyphenationTree hyphenator,
java.util.Set dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch) |
HyphenationCompoundWordTokenFilter(TokenStream input,
HyphenationTree hyphenator,
java.lang.String[] dictionary) |
HyphenationCompoundWordTokenFilter(TokenStream input,
HyphenationTree hyphenator,
java.lang.String[] dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch) |
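The dictionary, minSubwordSize, and related parameters above drive a substring search over each compound token. The core of the dictionary-based variant can be sketched in plain Java (a simplification of what DictionaryCompoundWordTokenFilter does; the real filter also keeps the original token and honors onlyLongestMatch):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CompoundSketch {
    // Emit every dictionary word found as a substring of the compound,
    // subject to a minimum subword length.
    static List<String> decompose(String word, Set<String> dictionary,
                                  int minSubwordSize) {
        List<String> parts = new ArrayList<>();
        String lower = word.toLowerCase();
        for (int start = 0; start < lower.length(); start++) {
            for (int end = start + minSubwordSize; end <= lower.length(); end++) {
                if (dictionary.contains(lower.substring(start, end))) {
                    parts.add(lower.substring(start, end));
                }
            }
        }
        return parts;
    }
}
```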
Modifier and Type | Method and Description |
---|---|
TokenStream |
CzechAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a (possibly reused)
TokenStream which tokenizes all the text in
the provided Reader . |
TokenStream |
CzechAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Creates a
TokenStream which tokenizes all the text in the provided Reader . |
Modifier and Type | Class and Description |
---|---|
class |
GermanStemFilter
A
TokenFilter that stems German words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
GermanAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a (possibly reused)
TokenStream which tokenizes all the text
in the provided Reader . |
TokenStream |
GermanAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Creates a
TokenStream which tokenizes all the text in the provided Reader . |
Constructor and Description |
---|
GermanStemFilter(TokenStream in) |
GermanStemFilter(TokenStream in,
java.util.Set exclusionSet)
Builds a GermanStemFilter that uses an exclusion table.
|
Modifier and Type | Class and Description |
---|---|
class |
GreekLowerCaseFilter
Normalizes token text to lower case using a given ("greek") charset.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
GreekAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a (possibly reused)
TokenStream which tokenizes all the text
in the provided Reader . |
TokenStream |
GreekAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Creates a
TokenStream which tokenizes all the text in the provided Reader . |
Constructor and Description |
---|
GreekLowerCaseFilter(TokenStream in) |
GreekLowerCaseFilter(TokenStream in,
char[] charset)
Deprecated.
|
Modifier and Type | Class and Description |
---|---|
class |
PersianNormalizationFilter
A
TokenFilter that applies PersianNormalizer to normalize the
orthography. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
PersianAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a (possibly reused)
TokenStream which tokenizes all the text
in the provided Reader . |
TokenStream |
PersianAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Creates a
TokenStream which tokenizes all the text in the provided
Reader . |
Constructor and Description |
---|
PersianNormalizationFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
ElisionFilter
Removes elisions from a
TokenStream . |
class |
FrenchStemFilter
A
TokenFilter that stems French words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
FrenchAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a (possibly reused)
TokenStream which tokenizes all the
text in the provided Reader . |
TokenStream |
FrenchAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Creates a
TokenStream which tokenizes all the text in the provided
Reader . |
Constructor and Description |
---|
ElisionFilter(TokenStream input)
Constructs an elision filter with standard stop words
|
ElisionFilter(TokenStream input,
java.util.Set articles)
Constructs an elision filter with a Set of stop words
|
ElisionFilter(TokenStream input,
java.lang.String[] articles)
Constructs an elision filter with an array of stop words
|
FrenchStemFilter(TokenStream in) |
FrenchStemFilter(TokenStream in,
java.util.Set exclusiontable) |
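ElisionFilter strips elided French articles such as l' and d' so that "l'avion" indexes as "avion". The operation can be sketched in plain Java (illustration of the idea, not the Lucene class):

```java
import java.util.HashSet;
import java.util.Set;

public class ElisionSketch {
    // If a token starts with a known article followed by an apostrophe
    // (l'avion, d'une), drop the article and the apostrophe.
    static String stripElision(String token, Set<String> articles) {
        int apos = token.indexOf('\'');
        if (apos < 0) apos = token.indexOf('\u2019'); // typographic apostrophe
        if (apos > 0 && articles.contains(token.substring(0, apos).toLowerCase())) {
            return token.substring(apos + 1);
        }
        return token;
    }
}
```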
Modifier and Type | Class and Description |
---|---|
class |
EmptyTokenStream
An always exhausted token stream.
|
class |
PrefixAndSuffixAwareTokenFilter
Links two
PrefixAwareTokenFilter . |
class |
PrefixAwareTokenFilter
Joins two token streams and leaves the last token of the first stream available
to be used when updating the token values in the second stream based on that token.
|
class |
SingleTokenTokenStream
A
TokenStream containing a single token. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
PrefixAwareTokenFilter.getPrefix() |
TokenStream |
PrefixAwareTokenFilter.getSuffix() |
Modifier and Type | Method and Description |
---|---|
void |
PrefixAwareTokenFilter.setPrefix(TokenStream prefix) |
void |
PrefixAwareTokenFilter.setSuffix(TokenStream suffix) |
Constructor and Description |
---|
PrefixAndSuffixAwareTokenFilter(TokenStream prefix,
TokenStream input,
TokenStream suffix) |
PrefixAwareTokenFilter(TokenStream prefix,
TokenStream suffix) |
Modifier and Type | Class and Description |
---|---|
class |
EdgeNGramTokenFilter
Tokenizes the given token into n-grams of given size(s).
|
class |
EdgeNGramTokenizer
Tokenizes the input from an edge into n-grams of given size(s).
|
class |
NGramTokenFilter
Tokenizes the input into n-grams of the given size(s).
|
class |
NGramTokenizer
Tokenizes the input into n-grams of the given size(s).
|
Constructor and Description |
---|
EdgeNGramTokenFilter(TokenStream input) |
EdgeNGramTokenFilter(TokenStream input,
EdgeNGramTokenFilter.Side side,
int minGram,
int maxGram)
Creates EdgeNGramTokenFilter that can generate n-grams in the sizes of the given range
|
EdgeNGramTokenFilter(TokenStream input,
java.lang.String sideLabel,
int minGram,
int maxGram)
Creates EdgeNGramTokenFilter that can generate n-grams in the sizes of the given range
|
NGramTokenFilter(TokenStream input)
Creates NGramTokenFilter with default min and max n-grams.
|
NGramTokenFilter(TokenStream input,
int minGram,
int maxGram)
Creates NGramTokenFilter with given min and max n-grams.
|
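The minGram/maxGram parameters above bound the substring sizes these filters emit. The two gram schemes are easy to sketch in plain Java (conceptual only; the Lucene filters also manage offsets and attribute state):

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // NGramTokenFilter-style grams: every substring of each size in
    // [minGram, maxGram], useful for substring matching.
    static List<String> ngrams(String term, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= term.length(); i++) {
                out.add(term.substring(i, i + n));
            }
        }
        return out;
    }

    // EdgeNGramTokenFilter-style grams: anchored at the front of the term,
    // useful for prefix/autocomplete matching.
    static List<String> edgeNgrams(String term, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, term.length()); n++) {
            out.add(term.substring(0, n));
        }
        return out;
    }
}
```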
Modifier and Type | Class and Description |
---|---|
class |
DutchStemFilter
A
TokenFilter that stems Dutch words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
DutchAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a (possibly reused)
TokenStream which tokenizes all the
text in the provided Reader . |
TokenStream |
DutchAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Creates a
TokenStream which tokenizes all the text in the
provided Reader . |
Constructor and Description |
---|
DutchStemFilter(TokenStream _in) |
DutchStemFilter(TokenStream _in,
java.util.Set exclusiontable)
Builds a DutchStemFilter that uses an exclusion table.
|
DutchStemFilter(TokenStream _in,
java.util.Set exclusiontable,
java.util.Map stemdictionary) |
Modifier and Type | Class and Description |
---|---|
class |
DelimitedPayloadTokenFilter
Characters before the delimiter are the "token", those after are the payload.
|
class |
NumericPayloadTokenFilter
Assigns a payload to a token based on the
Token.type() |
class |
TokenOffsetPayloadTokenFilter
Encodes each token's start and end offsets as its payload: the first 4
bytes are the start offset and the last 4 bytes are the end offset. |
class |
TypeAsPayloadTokenFilter
Makes the
Token.type() a payload. |
Constructor and Description |
---|
DelimitedPayloadTokenFilter(TokenStream input)
Construct a token stream filtering the given input.
|
DelimitedPayloadTokenFilter(TokenStream input,
char delimiter,
PayloadEncoder encoder) |
NumericPayloadTokenFilter(TokenStream input,
float payload,
java.lang.String typeMatch) |
TokenOffsetPayloadTokenFilter(TokenStream input) |
TypeAsPayloadTokenFilter(TokenStream input) |
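DelimitedPayloadTokenFilter splits each incoming token at a delimiter character: the part before it becomes the token, the part after it becomes the payload. A plain-Java sketch of that parse step (the real filter additionally runs the payload text through the supplied PayloadEncoder to produce bytes):

```java
public class PayloadSketch {
    // Split "token<delimiter>payload" into its two parts; a token without
    // the delimiter carries no payload (null here).
    static String[] split(String raw, char delimiter) {
        int at = raw.lastIndexOf(delimiter);
        if (at < 0) return new String[] { raw, null };
        return new String[] { raw.substring(0, at), raw.substring(at + 1) };
    }
}
```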
Modifier and Type | Class and Description |
---|---|
class |
PositionFilter
Sets the position increment of all tokens to a configured value, except for
the first token, which retains its original position increment.
|
Constructor and Description |
---|
PositionFilter(TokenStream input)
Constructs a PositionFilter that assigns a position increment of zero to
all but the first token from the given input stream.
|
PositionFilter(TokenStream input,
int positionIncrement)
Constructs a PositionFilter that assigns the given position increment to
all but the first token from the given input stream.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
QueryAutoStopWordAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader) |
TokenStream |
QueryAutoStopWordAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader) |
Modifier and Type | Class and Description |
---|---|
class |
ReverseStringFilter
Reverse token string, for example "country" => "yrtnuoc".
|
Constructor and Description |
---|
ReverseStringFilter(TokenStream in)
Create a new ReverseStringFilter that reverses all tokens in the
supplied
TokenStream . |
ReverseStringFilter(TokenStream in,
char marker)
Create a new ReverseStringFilter that reverses and marks all tokens in the
supplied
TokenStream . |
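Reversing token text looks pointless until you remember why it is done: indexing reversed terms turns an expensive leading-wildcard query into an efficient prefix query on the reversed field. A minimal sketch:

```java
public class ReverseSketch {
    // ReverseStringFilter-style reversal. Index reverse("country") and a
    // query like *ountry can run as the prefix query yrtnuo* instead of
    // scanning every term.
    static String reverse(String token) {
        return new StringBuilder(token).reverse().toString();
    }
}
```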
Modifier and Type | Class and Description |
---|---|
class |
RussianLetterTokenizer
A RussianLetterTokenizer is a
Tokenizer that extends LetterTokenizer
by additionally looking up letters in a given "russian charset". |
class |
RussianLowerCaseFilter
Normalizes token text to lower case using a given ("russian") charset.
|
class |
RussianStemFilter
A
TokenFilter that stems Russian words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
RussianAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a (possibly reused)
TokenStream which tokenizes all the text
in the provided Reader . |
TokenStream |
RussianAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Creates a
TokenStream which tokenizes all the text in the
provided Reader . |
Constructor and Description |
---|
RussianLowerCaseFilter(TokenStream in) |
RussianLowerCaseFilter(TokenStream in,
char[] charset)
Deprecated.
|
RussianStemFilter(TokenStream in) |
RussianStemFilter(TokenStream in,
char[] charset)
Deprecated.
Use
RussianStemFilter.RussianStemFilter(TokenStream) instead. |
Modifier and Type | Class and Description |
---|---|
class |
ShingleFilter
A ShingleFilter constructs shingles (token n-grams) from a token stream.
|
class |
ShingleMatrixFilter
A ShingleMatrixFilter constructs shingles (token n-grams) from a token stream.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
ShingleAnalyzerWrapper.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader) |
TokenStream |
ShingleAnalyzerWrapper.tokenStream(java.lang.String fieldName,
java.io.Reader reader) |
Constructor and Description |
---|
ShingleFilter(TokenStream input)
Construct a ShingleFilter with default shingle size.
|
ShingleFilter(TokenStream input,
int maxShingleSize)
Constructs a ShingleFilter with the specified shingle size from the given
TokenStream input |
ShingleFilter(TokenStream input,
java.lang.String tokenType)
Construct a ShingleFilter with the specified token type for shingle tokens.
|
ShingleMatrixFilter(TokenStream input,
int minimumShingleSize,
int maximumShingleSize)
Creates a shingle filter using default settings.
|
ShingleMatrixFilter(TokenStream input,
int minimumShingleSize,
int maximumShingleSize,
java.lang.Character spacerCharacter)
Creates a shingle filter using default settings.
|
ShingleMatrixFilter(TokenStream input,
int minimumShingleSize,
int maximumShingleSize,
java.lang.Character spacerCharacter,
boolean ignoringSinglePrefixOrSuffixShingle)
Creates a shingle filter using the default
ShingleMatrixFilter.TokenSettingsCodec . |
ShingleMatrixFilter(TokenStream input,
int minimumShingleSize,
int maximumShingleSize,
java.lang.Character spacerCharacter,
boolean ignoringSinglePrefixOrSuffixShingle,
ShingleMatrixFilter.TokenSettingsCodec settingsCodec)
Creates a shingle filter with ad hoc parameter settings.
|
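A shingle is a token n-gram: "please divide this" with a shingle size of 2 yields "please divide" and "divide this". The core construction can be sketched in plain Java (illustration only; by default the real ShingleFilter also emits the original unigrams and tracks positions):

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleSketch {
    // For each position, emit the shingles of every size from 2 up to
    // maxShingleSize, joining the words with a single space.
    static List<String> shingles(List<String> tokens, int maxShingleSize) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            StringBuilder sb = new StringBuilder(tokens.get(i));
            for (int n = 2; n <= maxShingleSize && i + n <= tokens.size(); n++) {
                sb.append(' ').append(tokens.get(i + n - 1));
                out.add(sb.toString());
            }
        }
        return out;
    }
}
```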
Modifier and Type | Class and Description |
---|---|
class |
DateRecognizerSinkTokenizer
Deprecated.
Use
DateRecognizerSinkFilter and TeeSinkTokenFilter instead. |
class |
TokenRangeSinkTokenizer
Deprecated.
Use
TokenRangeSinkFilter and TeeSinkTokenFilter instead. |
class |
TokenTypeSinkTokenizer
Deprecated.
Use
TokenTypeSinkFilter and TeeSinkTokenFilter instead. |
Modifier and Type | Class and Description |
---|---|
class |
SnowballFilter
A filter that stems words using a Snowball-generated stemmer.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
SnowballAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a (possibly reused)
StandardTokenizer filtered by a
StandardFilter , a LowerCaseFilter ,
a StopFilter , and a SnowballFilter |
TokenStream |
SnowballAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Constructs a
StandardTokenizer filtered by a StandardFilter , a LowerCaseFilter , a StopFilter ,
and a SnowballFilter |
Constructor and Description |
---|
SnowballFilter(TokenStream input,
SnowballProgram stemmer) |
SnowballFilter(TokenStream in,
java.lang.String name)
Construct the named stemming filter.
|
Modifier and Type | Class and Description |
---|---|
class |
StandardFilter
Normalizes tokens extracted with
StandardTokenizer . |
class |
StandardTokenizer
A grammar-based tokenizer constructed with JFlex.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
StandardAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader)
Deprecated.
|
TokenStream |
StandardAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
|
Constructor and Description |
---|
StandardFilter(TokenStream in)
Constructs a StandardFilter filtering the given input.
|
Modifier and Type | Class and Description |
---|---|
class |
ThaiWordFilter
A TokenFilter that uses a BreakIterator to break each Thai Token into
separate Tokens, one per Thai word. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
ThaiAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader) |
TokenStream |
ThaiAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader) |
Constructor and Description |
---|
ThaiWordFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
CollationKeyFilter
Converts each token into its
CollationKey , and then
encodes the CollationKey with IndexableBinaryStringTools , to allow
it to be stored as an index term. |
class |
ICUCollationKeyFilter
Converts each token into its
com.ibm.icu.text.CollationKey , and
then encodes the CollationKey with IndexableBinaryStringTools , to
allow it to be stored as an index term. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
CollationKeyAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader) |
TokenStream |
ICUCollationKeyAnalyzer.reusableTokenStream(java.lang.String fieldName,
java.io.Reader reader) |
TokenStream |
CollationKeyAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader) |
TokenStream |
ICUCollationKeyAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader) |
Constructor and Description |
---|
CollationKeyFilter(TokenStream input,
java.text.Collator collator) |
ICUCollationKeyFilter(TokenStream input,
Collator collator) |
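The collation filters index a Collator's binary collation key in place of the raw token text, so that term ordering in the index (and therefore range queries and sorting) follows locale rules rather than code-point order. The underlying JDK API they rely on can be demonstrated directly with java.text.Collator:

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

public class CollationSketch {
    // The locale-aware binary key that CollationKeyFilter would encode
    // (via IndexableBinaryStringTools) and store as the index term.
    static byte[] keyBytes(String token, Locale locale) {
        CollationKey key = Collator.getInstance(locale).getCollationKey(token);
        return key.toByteArray();
    }

    // At PRIMARY strength, accent and case differences do not affect
    // comparison, so "cote" and "côte" collate as equal.
    static boolean primaryEqual(String a, String b, Locale locale) {
        Collator c = Collator.getInstance(locale);
        c.setStrength(Collator.PRIMARY);
        return c.compare(a, b) == 0;
    }
}
```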
Modifier and Type | Field and Description |
---|---|
protected TokenStream |
AbstractField.tokenStream |
Modifier and Type | Method and Description |
---|---|
TokenStream |
Fieldable.tokenStreamValue()
The TokenStream for this field to be used when indexing, or null.
|
TokenStream |
Field.tokenStreamValue()
The TokenStream for this field to be used when indexing, or null.
|
TokenStream |
NumericField.tokenStreamValue()
Returns a
NumericTokenStream for indexing the numeric value. |
Modifier and Type | Method and Description |
---|---|
void |
Field.setTokenStream(TokenStream tokenStream)
Expert: sets the token stream to be used for indexing and causes isIndexed() and isTokenized() to return true.
|
void |
Field.setValue(TokenStream value)
Deprecated.
|
Constructor and Description |
---|
Field(java.lang.String name,
TokenStream tokenStream)
Create a tokenized and indexed field that is not stored.
|
Field(java.lang.String name,
TokenStream tokenStream,
Field.TermVector termVector)
Create a tokenized and indexed field that is not stored, optionally with
storing term vectors.
|
Modifier and Type | Class and Description |
---|---|
class |
SynonymTokenFilter
Injects additional tokens for synonyms of token terms fetched from the
underlying child stream; the child stream must deliver lowercase tokens
for synonyms to be found.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
MemoryIndex.keywordTokenStream(java.util.Collection keywords)
Convenience method; Creates and returns a token stream that generates a
token for each keyword in the given collection, "as is", without any
transforming text analysis.
|
TokenStream |
PatternAnalyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Creates a token stream that tokenizes all the text in the given Reader;
This implementation forwards to
tokenStream(String, String) and is
less efficient than tokenStream(String, String) . |
TokenStream |
PatternAnalyzer.tokenStream(java.lang.String fieldName,
java.lang.String text)
Creates a token stream that tokenizes the given string into token terms
(aka words).
|
Modifier and Type | Method and Description |
---|---|
void |
MemoryIndex.addField(java.lang.String fieldName,
TokenStream stream)
Equivalent to
addField(fieldName, stream, 1.0f) . |
void |
MemoryIndex.addField(java.lang.String fieldName,
TokenStream stream,
float boost)
Iterates over the given token stream and adds the resulting terms to the index;
Equivalent to adding a tokenized, indexed, termVectorStored, unstored,
Lucene
Field . |
Constructor and Description |
---|
SynonymTokenFilter(TokenStream input,
SynonymMap synonyms,
int maxSynonyms)
Creates an instance for the given underlying stream and synonym table.
|
Modifier and Type | Method and Description |
---|---|
static TokenStream |
TokenSources.getAnyTokenStream(IndexReader reader,
int docId,
java.lang.String field,
Analyzer analyzer)
A convenience method that tries a number of approaches to getting a token stream.
|
static TokenStream |
TokenSources.getAnyTokenStream(IndexReader reader,
int docId,
java.lang.String field,
Document doc,
Analyzer analyzer)
A convenience method that tries to first get a TermPositionVector for the specified docId, then, falls back to
using the passed in
Document to retrieve the TokenStream. |
TokenStream |
WeightedSpanTermExtractor.getTokenStream() |
static TokenStream |
TokenSources.getTokenStream(Document doc,
java.lang.String field,
Analyzer analyzer) |
static TokenStream |
TokenSources.getTokenStream(IndexReader reader,
int docId,
java.lang.String field) |
static TokenStream |
TokenSources.getTokenStream(IndexReader reader,
int docId,
java.lang.String field,
Analyzer analyzer) |
static TokenStream |
TokenSources.getTokenStream(java.lang.String field,
java.lang.String contents,
Analyzer analyzer) |
static TokenStream |
TokenSources.getTokenStream(TermPositionVector tpv) |
static TokenStream |
TokenSources.getTokenStream(TermPositionVector tpv,
boolean tokenPositionsGuaranteedContiguous)
Low level api.
|
TokenStream |
Scorer.init(TokenStream tokenStream)
Called to init the Scorer with a
TokenStream . |
TokenStream |
QueryScorer.init(TokenStream tokenStream) |
TokenStream |
QueryTermScorer.init(TokenStream tokenStream) |
Modifier and Type | Method and Description |
---|---|
java.lang.String |
Highlighter.getBestFragment(TokenStream tokenStream,
java.lang.String text)
Highlights chosen terms in a text, extracting the most relevant section.
|
java.lang.String[] |
Highlighter.getBestFragments(TokenStream tokenStream,
java.lang.String text,
int maxNumFragments)
Highlights chosen terms in a text, extracting the most relevant sections.
|
java.lang.String |
Highlighter.getBestFragments(TokenStream tokenStream,
java.lang.String text,
int maxNumFragments,
java.lang.String separator)
Highlights terms in the text , extracting the most relevant sections
and concatenating the chosen fragments with a separator (typically "...").
|
TextFragment[] |
Highlighter.getBestTextFragments(TokenStream tokenStream,
java.lang.String text,
boolean mergeContiguousFragments,
int maxNumFragments)
Low level api to get the most relevant (formatted) sections of the document.
|
java.util.Map |
WeightedSpanTermExtractor.getWeightedSpanTerms(Query query,
TokenStream tokenStream)
Creates a Map of
WeightedSpanTerms from the given Query and TokenStream . |
java.util.Map |
WeightedSpanTermExtractor.getWeightedSpanTerms(Query query,
TokenStream tokenStream,
java.lang.String fieldName)
Creates a Map of
WeightedSpanTerms from the given Query and TokenStream . |
java.util.Map |
WeightedSpanTermExtractor.getWeightedSpanTermsWithScores(Query query,
TokenStream tokenStream,
java.lang.String fieldName,
IndexReader reader)
Creates a Map of
WeightedSpanTerms from the given Query and TokenStream . |
TokenStream |
Scorer.init(TokenStream tokenStream)
Called to init the Scorer with a
TokenStream . |
TokenStream |
QueryScorer.init(TokenStream tokenStream) |
TokenStream |
QueryTermScorer.init(TokenStream tokenStream) |
void |
SimpleSpanFragmenter.start(java.lang.String originalText,
TokenStream tokenStream) |
void |
Fragmenter.start(java.lang.String originalText,
TokenStream tokenStream)
Initializes the Fragmenter.
|
void |
SimpleFragmenter.start(java.lang.String originalText,
TokenStream stream) |
void |
NullFragmenter.start(java.lang.String s,
TokenStream tokenStream) |
Constructor and Description |
---|
TokenGroup(TokenStream tokenStream) |
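The highlight package's "keyword in context" behavior boils down to scoring fragments of the source text by the query terms they contain and returning the best ones. A heavily simplified plain-Java sketch of the idea (the real Highlighter uses a TokenStream, a Scorer, and a Fragmenter rather than raw string search):

```java
public class FragmentSketch {
    // Return a window of the text centered on the first occurrence of the
    // term, or null if the term does not occur. fragSize is assumed to be
    // at least the term length.
    static String bestFragment(String text, String term, int fragSize) {
        int at = text.toLowerCase().indexOf(term.toLowerCase());
        if (at < 0) return null;
        int start = Math.max(0, at - (fragSize - term.length()) / 2);
        int end = Math.min(text.length(), start + fragSize);
        return text.substring(start, end);
    }
}
```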
Modifier and Type | Class and Description |
---|---|
class |
WikipediaTokenizer
Extension of StandardTokenizer that is aware of Wikipedia syntax.
|
Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.