For more information on past and future Lucene versions, please see: http://s.apache.org/luceneversions

Release 9.0.0

  • New Features (8)
  • LUCENE-9322 , LUCENE-9855 : Vector-valued fields, Lucene90 Codec
    (Mike Sokolov, Julie Tibshirani, Tomoko Uchida)
  • LUCENE-9004 , LUCENE-10040 : Approximate nearest vector search via NSW graphs
    (Mike Sokolov, Tomoko Uchida et al.)
  • LUCENE-9659 : SpanPayloadCheckQuery now supports inequalities.
    (Kevin Watters, Gus Heck)
  • LUCENE-9589 : Swedish Minimal Stemmer
    (janhoy)
  • LUCENE-9313 : Add SerbianAnalyzer based on the snowball stemmer.
    (Dragan Ivanovic)
  • LUCENE-10095 : Add NepaliAnalyzer based on the snowball stemmer.
    (Robert Muir)
  • LUCENE-10096 : Add TamilAnalyzer based on the snowball stemmer.
    (Robert Muir)
  • LUCENE-10102 : Add JapaneseCompletionFilter for Input Method-aware auto-completion
    (Tomoko Uchida, Robert Muir, Jun Ohtani)
  • System Requirements (1)
  • LUCENE-8738 : Move to Java 11 as minimum Java version.
    (Adrien Grand, Uwe Schindler)
  • API Changes (44)
  • LUCENE-8638 : Remove many deprecated methods and classes including FST.lookupByOutput(), LegacyBM25Similarity and Jaspell suggester.
  • LUCENE-8982 : Separate out native code to another module to allow cpp build with gradle. This also changes the name of the native "posix-support" library to LuceneNativeIO.
    (Zachary Chen, Dawid Weiss)
  • LUCENE-9562 : All binary analysis packages (and corresponding Maven artifacts) with names containing '-analyzers-' have been renamed to '-analysis-'.
    (Dawid Weiss)
  • LUCENE-8474 : RAMDirectory and associated deprecated classes have been removed.
    (Dawid Weiss)
  • LUCENE-3041 : The deprecated Weight#extractTerms() method has been removed
    (Alan Woodward, Simon Willnauer, David Smiley, Luca Cavanna)
  • LUCENE-8805 : StoredFieldVisitor#stringField now takes a String rather than a byte[] that stores the UTF-8 bytes of the stored string.
    (Namgyu Kim via Adrien Grand)
  • LUCENE-8811 : BooleanQuery#setMaxClauseCount() and #getMaxClauseCount() have moved to IndexSearcher. The checks are now implemented using a QueryVisitor and apply to all queries, rather than only booleans.
    (Atri Sharma, Adrien Grand, Alan Woodward)
  • LUCENE-8909 : The deprecated IndexWriter#getFieldNames() method has been removed.
    (Adrien Grand, Munendra S N)
  • LUCENE-8948 : Change "name" argument in ICU factories to "form". Here, "form" is named after "Unicode Normalization Form".
    (Tomoko Uchida)
  • LUCENE-8933 : Validate JapaneseTokenizer user dictionary entry.
    (Tomoko Uchida)
  • LUCENE-8905 : Better defence against malformed arguments in TopDocsCollector
    (Atri Sharma)
  • LUCENE-9089 : FST Builder renamed FSTCompiler with fluent-style Builder.
    (Bruno Roustant)
  • LUCENE-9212 : Deprecated Intervals.multiterm() methods that take a bare Automaton have been removed
    (Alan Woodward)
  • LUCENE-9264 : SimpleFSDirectory has been removed in favor of NIOFSDirectory.
    (Yannick Welsch)
  • LUCENE-9281 : Use java.util.ServiceLoader to load codec components and analysis factories, for compatibility with the Java Module System. This allows loading factories without META-INF/services from a Java module that exposes the factory in its module descriptor. This breaks backwards compatibility, as custom analysis factories must now also implement the default constructor (see MIGRATE.md).
    (Uwe Schindler, Dawid Weiss)
  • LUCENE-9307 : BufferedIndexInput#setBufferSize has been removed.
    (Adrien Grand)
  • LUCENE-9340 : SimpleBindings#add(SortField) has been removed.
    (Alan Woodward)
  • LUCENE-9462 : Fields without positions should still return MatchIterator.
    (Alan Woodward, Dawid Weiss)
  • LUCENE-9516 : Removed the ability to replace the IndexingChain / DocConsumer in Lucene's IndexWriter. The interface was not sufficient to replace the functionality efficiently with reasonable effort.
    (Simon Willnauer)
  • LUCENE-9317 LUCENE-9318 LUCENE-9319 LUCENE-9558 LUCENE-9600 : Clean up package name conflicts between modules. See MIGRATE.md for details.
    (David Ryan, Tomoko Uchida, Uwe Schindler, Dawid Weiss)
  • LUCENE-9646 : Set BM25Similarity discountOverlaps via the constructor
    (Patrick Marty via Bruno Roustant)
  • LUCENE-9480 : Make DataInput's skipBytes(long) abstract, as the implementation was not performant. IndexInput's API is unaffected: skipBytes() is implemented via seek().
    (Greg Miller)
  • LUCENE-9796 : SortedDocValues no longer extends BinaryDocValues, as binaryValue() was not performant. See MIGRATE.md for details.
    (Robert Muir)
  • LUCENE-9853 : JapaneseAnalyzer should use CJKWidthCharFilter for full-width and half-width character normalization.
    (Tomoko Uchida)
  • LUCENE-9387 : Removed CodecReader#ramBytesUsed.
    (Adrien Grand)
  • LUCENE-9334 : Require consistency between data-structures on a per-field basis. A field across all documents within an index must be indexed with the same index options and data-structures. As a consequence of this, doc values updates are only applicable for fields that are indexed with doc values only.
    (Mayya Sharipova, Adrien Grand, Simon Willnauer)
  • LUCENE-9047 : Directory API is now little endian.
    (Ignacio Vera, Adrien Grand)
  • LUCENE-9948 : No longer require the user to specify whether-or-not a field is multi-valued in LongValueFacetCounts (detect automatically based on what is indexed).
    (Greg Miller)
  • LUCENE-9843 : Remove compression option on default codec's docvalues.
    (Jack Conradson)
  • LUCENE-9204 : SpanQuery and its subclasses have been moved from core/ into the queries/ module.
    (Alan Woodward)
  • LUCENE-9454 : Analyzer no longer has a mutable version field.
    (Alan Woodward)
  • LUCENE-9956 : Expose the getBaseQuery, getDrillDownQueries APIs from DrillDownQuery
    (Gautam Worah)
  • LUCENE-8143 : SpanBoostQuery has been removed.
    (Alan Woodward)
  • LUCENE-9998 : Remove the unused parameter fis from StoredFieldsWriter.finish() and TermVectorsWriter.finish(), including their subclasses.
    (kkewwei)
  • LUCENE-7020 : TieredMergePolicy#setMaxMergeAtOnceExplicit has been removed. TieredMergePolicy no longer sets a limit on the maximum number of segments that can be merged at once via a forced merge.
    (Adrien Grand, Shawn Heisey)
  • LUCENE-10027 : The DirectoryReader open API that takes an indexCommit and a leafSorter now has an extra parameter, minSupportedMajorVersion.
    (Mayya Sharipova)
  • LUCENE-9620 : Added a (sometimes) faster implementation for IndexSearcher#count that relies on the new Weight#count API. The Weight#count API represents a cleaner way for Query classes to optimize their counting method.
    (Gautam Worah, Adrien Grand)
  • LUCENE-10089 : Add a method to SortField that enables or disables the numeric sort optimization, which uses the points index to skip over non-competitive documents. The optimization is enabled by default as of 9.0.
    (Mayya Sharipova, Adrien Grand)
  • LUCENE-10115 : Add an extension point, BaseQueryParser#getFuzzyDistance, to allow custom query parsers to determine the similarity distance for fuzzy queries.
    (Chris Hegarty)
  • LUCENE-10132 : Support addition of diagnostics by custom merge policies
    (Chris Hegarty)
  • LUCENE-9325 : Sort is now final, and the `setSort()` method has been removed
    (Alan Woodward)
  • LUCENE-9431 : The UnifiedHighlighter's WEIGHT_MATCHES flag is now set by default, provided its requirements are met. It can be disabled by overriding getFlags.
    (Animesh Pandey, David Smiley)
  • LUCENE-10158 : Add a new interface Unwrappable to the utils package to allow code to unwrap wrappers/delegators that are added by Lucene's testing framework. This will allow testing new MMapDirectory implementation based on JDK Project Panama.
    (Uwe Schindler)
  • LUCENE-10260 : LucenePackage class has been removed. The implementation string can be retrieved from Version.getPackageImplementationVersion().
    (Uwe Schindler, Dawid Weiss)
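LUCENE-8811 above moves the clause-count limit to IndexSearcher and enforces it with a QueryVisitor over the whole query tree, rather than per BooleanQuery. The following is a minimal, JDK-only sketch of that idea under stated assumptions: the Node, Leaf, Compound and Visitor types and the checkClauseCount helper are hypothetical stand-ins, not Lucene's Query or QueryVisitor API, and the real check's exception type and counting rules may differ.

```java
import java.util.Arrays;
import java.util.List;

public class ClauseCountSketch {

    // Hypothetical visitor; a stand-in for Lucene's QueryVisitor, not its real API.
    interface Visitor {
        void consumeLeaf(Node leaf);
    }

    // Hypothetical query node; a stand-in for Lucene's Query.
    interface Node {
        void visit(Visitor visitor);
    }

    // A leaf clause, e.g. a single term.
    static final class Leaf implements Node {
        final String term;

        Leaf(String term) {
            this.term = term;
        }

        @Override
        public void visit(Visitor visitor) {
            visitor.consumeLeaf(this);
        }
    }

    // A compound clause whose children may themselves be compounds.
    static final class Compound implements Node {
        final List<Node> children;

        Compound(Node... children) {
            this.children = Arrays.asList(children);
        }

        @Override
        public void visit(Visitor visitor) {
            for (Node child : children) {
                child.visit(visitor); // recurse so nested clauses are counted too
            }
        }
    }

    /** Counts leaf clauses in the whole tree, failing as soon as the limit is exceeded. */
    static int checkClauseCount(Node root, int maxClauseCount) {
        int[] count = {0};
        root.visit(leaf -> {
            if (++count[0] > maxClauseCount) {
                throw new IllegalStateException("too many clauses, limit is " + maxClauseCount);
            }
        });
        return count[0];
    }

    public static void main(String[] args) {
        Node query = new Compound(new Leaf("a"), new Compound(new Leaf("b"), new Leaf("c")));
        System.out.println(checkClauseCount(query, 1024)); // 3
        try {
            checkClauseCount(query, 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // too many clauses, limit is 2
        }
    }
}
```

Because the visitor descends into nested compounds, the limit applies to the total number of leaf clauses in the query, not only to the clauses of one top-level boolean.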
  • Improvements (48)
  • LUCENE-10234 : Added Automatic-Module-Name to all JARs. This is the first step to enable full Java module system (JMS) support in later Lucene versions. At the moment, the automatic names should not be considered stable.
    (Dawid Weiss, Uwe Schindler)
  • LUCENE-10182 : TestRamUsageEstimator used RamUsageTester.sizeOf throughout, making some of the tests trivial. Now, it compares results from RamUsageEstimator with those from RamUsageTester. To prevent this error in the future, RamUsageTester.sizeOf was renamed to ramUsed.
    (Uwe Schindler, Dawid Weiss, Stefan Vodita)
  • LUCENE-10129 : RamUsageEstimator overloads the shallowSizeOf method for primitive arrays to avoid falling back on shallowSizeOf(Object), which could lead to performance traps.
    (Robert Muir, Uwe Schindler, Stefan Vodita)
  • LUCENE-10139 : ExternalRefSorter now returns a covariant subtype of BytesRefIterator that is Closeable.
    (Dawid Weiss)
  • LUCENE-10135 : Correct passage selector behavior for long matching snippets.
    (Dawid Weiss)
  • LUCENE-9960 : Avoid unnecessary top element replacement for equal elements in PriorityQueue.
    (Dawid Weiss)
  • LUCENE-9633 : Improve match highlighter behavior for degenerate intervals (on non-existing positions).
    (Dawid Weiss)
  • LUCENE-9618 : Do not call IntervalIterator.nextInterval after NO_MORE_DOCS is returned.
    (Patrick Zhai)
  • LUCENE-9576 : Improve ConcurrentMergeScheduler settings by default, assuming modern I/O. Previously Lucene was too conservative, jumping through hoops to detect if disks were SSD-backed. In many common modern cases (VMs, RAID arrays, containers, encrypted mounts, non-Linux OS), the pessimistic heuristics were wrong, resulting in slower indexing performance. Heuristics were also complex and would trigger JDK issues even on unrelated mount points. Merge scheduler defaults are now modernized and the heuristics removed. Users with spinning disks that want to maximize I/O performance should tweak ConcurrentMergeScheduler.
    (Robert Muir)
  • LUCENE-9463 : Add a query match region retrieval component, along with passage scoring and formatting, for building custom highlighters.
    (Alan Woodward, Dawid Weiss)
  • LUCENE-9370 : RegExp query is no longer lenient about inappropriate backslashes and follows the Java Pattern policy for rejecting illegal syntax.
    (Mark Harwood)
  • LUCENE-9336 : RegExp query now supports \w \W \d \D \s \S expressions. This is a break with previous behaviour where these were (mis)interpreted as literally the characters w W d etc.
    (Mark Harwood)
  • LUCENE-8757 : When provided with an ExecutorService to run queries across multiple threads, IndexSearcher now groups small segments together, up to 250k docs per slice.
    (Atri Sharma via Adrien Grand)
  • LUCENE-8857 : Introduce custom tiebreakers in TopDocs.merge for breaking ties between docs with equal scores. Also remove the ability of TopDocs.merge to set shard indices.
    (Atri Sharma, Adrien Grand, Simon Willnauer)
  • LUCENE-8958 : Shared count early termination for relevance sorted indices
    (Atri Sharma)
  • LUCENE-8937 : Avoid aggressive stemming on numbers in the FrenchMinimalStemmer.
    (Adrien Gallou via Tomoko Uchida)
  • LUCENE-8596 : Kuromoji user dictionary now accepts entries containing hash mark (#) that were previously treated as beginning a line-ending comment
    (Satoshi Kato and Masaru Hasegawa via Michael Sokolov)
  • LUCENE-9109 : Use StackWalker to implement TestSecurityManager's detection of JVM exit
    (Uwe Schindler)
  • LUCENE-9110 : Refactor stack analysis in tests to use generalized LuceneTestCase methods that use StackWalker
    (Uwe Schindler)
  • LUCENE-9206 : IndexMergeTool gets additional options to control the merging. This tool no longer forceMerge(1)s to a single segment by default. If you rely upon this behavior, pass -max-segments 1 instead.
    (Robert Muir)
  • LUCENE-9220 : Upgrade snowball to 2.0. New snowball stemmers: Hindi, Indonesian, Nepali, Serbian, and Tamil. New stoplist: Indonesian. Adds gradle 'snowball' task to regenerate and ease future upgrades.
    (Robert Muir, Dawid Weiss)
  • LUCENE-9354 : Improvements to snowball french stopwords list, so that it is less aggressive.
    (Philippe Ouellet)
  • LUCENE-9114 : Improve ValueSourceScorer's Default Cost Implementation
    (Atri Sharma, David Smiley)
  • LUCENE-9074 : Introduce Slice Executor For Dynamic Runtime Execution Of Slices
    (Atri Sharma)
  • LUCENE-9280 : Add the ability for field comparators to skip non-competitive documents. Creating a TopFieldCollector with a totalHitsThreshold less than Integer.MAX_VALUE instructs Lucene to skip non-competitive documents whenever possible. For numeric sort fields, skipping works when the same field is indexed with both doc values and points; in this case, it is assumed that the points and doc values contain the same data.
    (Mayya Sharipova, Jim Ferenczi, Adrien Grand)
  • LUCENE-9449 : Enhance DocComparator to provide an iterator over competitive documents when searching with "after". This iterator can quickly position on the desired "after" document skipping all documents and segments before "after". Also redesign numeric comparators to provide skipping functionality by default.
    (Mayya Sharipova, Jim Ferenczi)
  • LUCENE-9527 : Upgrade javacc to 7.0.4, regenerate query parsers.
    (Dawid Weiss)
  • LUCENE-9531 : Consolidated CharStream and FastCharStream classes: these have been moved from each query parser package to org.apache.lucene.queryparser.charstream
    (Dawid Weiss)
  • LUCENE-9450 : Use BinaryDocValues for the taxonomy index instead of StoredFields. Add backwards compatibility tests for the taxonomy index.
    (Gautam Worah, Michael McCandless)
  • LUCENE-9605 : Update snowball to d8cf01ddf37a, adds Yiddish stemmer.
    (Robert Muir)
  • LUCENE-8982 : Make NativeUnixDirectory pure Java using the FileChannel direct IO flag, and rename it to DirectIODirectory.
    (Zach Chen, Uwe Schindler, Mike McCandless, Dawid Weiss)
  • LUCENE-9674 : Implement faster advance on VectorValues using binary search.
    (Anand Kotriwal, Mike Sokolov)
  • LUCENE-9794 : Speed up implementations of DataInput.skipBytes().
    (Greg Miller)
  • LUCENE-9898 : Removes no longer used scorePayload method from BM25Similarity
    (Pieter van Boxtel)
  • LUCENE-9850 : Switch to PFOR encoding for doc IDs (instead of FOR).
    (Greg Miller)
  • LUCENE-9929 : Add NorwegianNormalizationFilter, which does the same as ScandinavianNormalizationFilter except it does not fold oo->ø and ao->å.
    (janhoy, Robert Muir, Adrien Grand)
  • LUCENE-9535 : Improve DocumentsWriterPerThreadPool to prefer larger instances.
    (Adrien Grand)
  • LUCENE-10000 : MultiCollectorManager now has parity with MultiCollector with respect to how it handles CollectionTerminationException and setMinCompetitiveScore calls.
    (Greg Miller)
  • LUCENE-10019 : Align file starts in CFS files to have proper alignment (8 bytes)
    (Uwe Schindler)
  • LUCENE-9662 : Make CheckIndex concurrent by parallelizing index check across segments.
    (Zach Chen, Mike McCandless, Dawid Weiss, Robert Muir)
  • LUCENE-9476 : Add new getBulkPath API to DirectoryTaxonomyReader to more efficiently retrieve FacetLabels for multiple facet ordinals at once. This API is 2-4% faster than iteratively calling getPath. The getPath API now throws an IAE instead of returning null if the ordinal is out of bounds.
    (Gautam Worah, Mike McCandless)
  • LUCENE-10113 : Use VarHandles to access int/long/short primitive types in byte arrays. This improves readability and performance of encoding/decoding of primitives to index file format in input/output classes like DataInput / DataOutput and codecs.
    (Uwe Schindler, Robert Muir)
  • LUCENE-10112 : Improve LZ4 Compression performance with direct primitive read/writes.
    (Tim Brooks, Uwe Schindler, Robert Muir, Adrien Grand)
  • LUCENE-10125 : Optimize primitive writes in OutputStreamIndexOutput.
    (Uwe Schindler, Robert Muir, Adrien Grand)
  • LUCENE-10143 : Delegate primitive writes in RateLimitedIndexOutput.
    (Uwe Schindler, Robert Muir, Adrien Grand)
  • LUCENE-10145 , LUCENE-10153 : Faster flushes and merges of points by leveraging VarHandles.
    (Adrien Grand)
  • LUCENE-10201 : Spatial-Extras: Upgrade Spatial4j to 0.8, improving a variety of minor things. See the release notes: https://github.com/locationtech/spatial4j/releases/tag/spatial4j-0.8
    (David Smiley)
  • LUCENE-10062 : Switch taxonomy faceting to use numeric doc values for storing ordinals instead of binary doc values with its own custom encoding.
    (Greg Miller)
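LUCENE-10113 reads and writes primitives in byte arrays through VarHandles, and LUCENE-9047 (under API Changes) makes the Directory API little endian. The sketch below combines the two ideas using only the JDK's MethodHandles.byteArrayViewVarHandle (Java 9+); the class and helper names are hypothetical, and Lucene's own DataInput/DataOutput code is not reproduced here.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class VarHandleLittleEndianSketch {
    // View a byte[] as if it were an int[] / long[], using little-endian byte order.
    private static final VarHandle INT_LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
    private static final VarHandle LONG_LE =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Plain get/set on byte-array views allow unaligned offsets.
    static void writeInt(byte[] buf, int offset, int value) {
        INT_LE.set(buf, offset, value);
    }

    static int readInt(byte[] buf, int offset) {
        return (int) INT_LE.get(buf, offset);
    }

    static void writeLong(byte[] buf, int offset, long value) {
        LONG_LE.set(buf, offset, value);
    }

    static long readLong(byte[] buf, int offset) {
        return (long) LONG_LE.get(buf, offset);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[12];
        writeInt(buf, 0, 0x01020304);
        writeLong(buf, 4, -2L);
        System.out.println(buf[0]); // 4: least significant byte comes first
        System.out.println(readInt(buf, 0) == 0x01020304); // true
        System.out.println(readLong(buf, 4)); // -2
    }
}
```

A single VarHandle access replaces the manual shift-and-mask loops otherwise needed to assemble multi-byte values, which is the readability and performance gain the entry describes.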
  • Bug fixes (15)
  • LUCENE-9686 : Fix read past EOF handling in DirectIODirectory.
    (Zach Chen, Julie Tibshirani)
  • LUCENE-8663 : NRTCachingDirectory.slowFileExists may open a file while it's inaccessible.
    (Dawid Weiss)
  • LUCENE-9117 : RamUsageEstimator hangs with AOT compilation. Removed any attempt to estimate Long.valueOf cache size.
    (Cleber Muramoto, Dawid Weiss)
  • LUCENE-9290 : Don't assume that different XYPoint have different hash code
    (Ignacio Vera via Mike Drob)
  • LUCENE-9372 : Fix paths for cygwin/msys before gradle wrapper jar lookup.
    (Peter Barna)
  • LUCENE-9365 : FuzzyQuery was missing matches when prefix length was equal to the term length
    (Mark Harwood, Mike Drob)
  • LUCENE-9580 : Fix bug in the polygon tessellator when introducing collinear edges during polygon splitting.
    (Ignacio Vera)
  • LUCENE-9930 : The Ukrainian analyzer was reloading its dictionary for every new TokenStreamComponents, which could lead to memory leaks.
    (Alan Woodward)
  • LUCENE-9940 : The order of disjuncts in DisjunctionMaxQuery does not matter for equality checks
    (Alan Woodward)
  • LUCENE-9971 : Requesting facet counts for unseen dimensions in SortedSetDocValuesFacetCounts and ConcurrentSortedSetDocValuesFacetCounts now returns null / -1 instead of throwing IllegalArgumentException, as specified in the Facets Javadoc.
    (Alexander Lukyanchikov)
  • LUCENE-9823 : Prevent unsafe rewrites for SynonymQuery and CombinedFieldQuery. Before, rewriting could slightly change the scoring when weights were specified.
    (Naoto Minami via Julie Tibshirani)
  • LUCENE-10047 : Fix a value de-duping bug in LongValueFacetCounts and RangeFacetCounts
    (Greg Miller)
  • LUCENE-10101 , LUCENE-9281 : Use getField() instead of getDeclaredField() to minimize the security impact of analysis SPI discovery.
    (Uwe Schindler)
  • LUCENE-10114 : Remove unused byte order mark in Lucene90PostingsWriter. This was initially introduced by accident in Lucene 8.4.
    (Uwe Schindler)
  • LUCENE-10140 : Fix cases where minimizing interval iterators could return incorrect matches
    (Nikolay Khitrin, Alan Woodward)
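LUCENE-9940 makes DisjunctionMaxQuery equality independent of the order of its disjuncts. One way to get that property, sketched here as an assumption rather than Lucene's actual code, is to compare the disjuncts as a multiset (element to occurrence count) whose hashCode is also order-independent. Strings stand in for sub-queries, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public final class OrderInsensitiveDisjunction {
    private final float tieBreaker;
    // Multiset of disjuncts (disjunct -> number of occurrences); insertion order is not kept.
    private final Map<String, Integer> disjuncts = new HashMap<>();

    OrderInsensitiveDisjunction(float tieBreaker, String... queries) {
        this.tieBreaker = tieBreaker;
        for (String q : queries) {
            disjuncts.merge(q, 1, Integer::sum);
        }
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof OrderInsensitiveDisjunction)) return false;
        OrderInsensitiveDisjunction other = (OrderInsensitiveDisjunction) o;
        return Float.compare(tieBreaker, other.tieBreaker) == 0
            && disjuncts.equals(other.disjuncts);
    }

    @Override
    public int hashCode() {
        // Map.hashCode sums entry hashes, so it does not depend on insertion order.
        return Objects.hash(tieBreaker, disjuncts);
    }

    public static void main(String[] args) {
        OrderInsensitiveDisjunction a = new OrderInsensitiveDisjunction(0.1f, "title:foo", "body:foo");
        OrderInsensitiveDisjunction b = new OrderInsensitiveDisjunction(0.1f, "body:foo", "title:foo");
        System.out.println(a.equals(b)); // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

Keeping equals and hashCode consistent matters here because queries are used as cache keys; two equal queries with different hash codes would defeat caching.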
  • Changes in Backwards Compatibility Policy (3)
  • LUCENE-9904 : regenerated UAX29URLEmailTokenizer and the corresponding analyzer with up-to-date top level domains. This may change the token sequence compared to previous Lucene versions.
    (Dawid Weiss)
  • LUCENE-9669 : DirectoryReader#open now accepts an argument to open indices created with versions older than N-1. Lucene can now open indices created with a major version of N-2 in read-only mode; opening such an index with an IndexWriter is not supported. Furthermore, Lucene only guarantees file-format compatibility, which enables reading old indices, while semantic changes such as analysis or certain encodings on top of the file format are only supported on a best-effort basis.
    (Simon Willnauer)
  • LUCENE-10232 : Fix MultiRangeQuery to confirm all dimensions for a given range match.
    (Greg Miller)
  • Build (2)
  • LUCENE-10198 : Allow external JAVA_OPTS in gradlew scripts; use sane defaults
    ([email protected], Dawid Weiss)
  • LUCENE-10163 : Move LICENSE and NOTICE files to top level to satisfy src artifact requirements
    (janhoy)
  • Other (23)
  • LUCENE-10122 : Use NumericDocValues to store taxonomy parent array
    (Patrick Zhai)
  • LUCENE-10136 : allow 'var' declarations in source code
    (Dawid Weiss)
  • LUCENE-9570 , LUCENE-9564 : Apply google java format and enforce it on source Java files. Review diffs and correct automatic formatting oddities.
    (Erick Erickson, Bruno Roustant, Dawid Weiss)
  • LUCENE-9631 : Properly override slice() on subclasses of OffsetRange.
    (Dawid Weiss)
  • LUCENE-9391 : Upgrade HPPC to 0.8.2.
    (Patrick Zhai)
  • LUCENE-10021 : Upgrade HPPC to 0.9.0. Replace usages of ...ScatterMap with ...HashMap.
    (Patrick Zhai)
  • LUCENE-8768 : Fix Javadocs build in Java 11.
    (Namgyu Kim)
  • LUCENE-9092 : upgrade randomizedtesting to 2.7.5
    (Dawid Weiss)
  • LUCENE-8656 : Deprecations in FuzzyQuery; remove compiler warnings from the queryparser code.
    (Alan Woodward, Erick Erickson)
  • LUCENE-9344 : Convert .txt files to properly formatted .md files.
    (Tomoko Uchida, Uwe Schindler)
  • LUCENE-9267 : Update MatchingQueries documentation to correct time unit.
    (Pierre-Luc Perron via Mike Drob)
  • LUCENE-9411 : Fail compilation on warnings (9x, Gradle only). This deserves mention here as well as in Lucene's CHANGES.txt since it affects both.
    (Erick Erickson, Dawid Weiss)
  • LUCENE-9077 LUCENE-9433 : Support Gradle build, remove Ant support from trunk
    (Dawid Weiss, Erick Erickson, Uwe Schindler et al.)
  • LUCENE-9215 : Replace checkJavaDocs.py with doclet
    (Robert Muir, Dawid Weiss, Uwe Schindler)
  • LUCENE-9497 : Integrate Error Prone, a static analysis tool, into compilation.
    (Dawid Weiss, Varun Thacker)
  • LUCENE-9544 : Add a Gradle regenerate script for the Nori dictionary.
    (Namgyu Kim)
  • LUCENE-9627 : Remove the unused Lucene50FieldInfosFormat codec, and slightly refactor some codecs to separate reading the header/footer from reading the content of the file.
    (Ignacio Vera)
  • LUCENE-9773 : Upgrade ICU to 68.2
    (Robert Muir)
  • LUCENE-9822 : Add assertion to PFOR exception encoding, documenting the BLOCK_SIZE assumption.
    (Greg Miller)
  • LUCENE-9883 : Turn on ecj missingEnumCaseDespiteDefault setting.
    (Zach Chen)
  • LUCENE-9705 : Make new versions of all index formats for the Lucene90 codec and move the existing ones to the backwards codecs.
    (Julie Tibshirani, Ignacio Vera)
  • LUCENE-9907 : Remove dependency on PackedInts#getReader() from the current codecs and move the method to backwards codec.
    (Ignacio Vera)
  • LUCENE-10024 : Catch NoSuchFileException when opening index directory with Luke.
    (Michael Wechner, Tomoko Uchida)

Release 8.11.0 [2021-11-16]

  • API Changes (1)
  • (No changes)
  • New Features (1)
  • (No changes)
  • Improvements (2)
  • LUCENE-9662 : Make CheckIndex concurrent by parallelizing index check across segments.
    (Zach Chen, Mike McCandless, Dawid Weiss, Robert Muir)
  • LUCENE-10103 : Make QueryCache respect Accountable queries.
    (Patrick Zhai)
  • Optimizations (2)
  • LUCENE-9673 : Substantially improve the RAM efficiency of how MemoryIndex stores postings in memory, and reduce some RAM overhead in IndexWriter's internal postings bookkeeping.
    (mashudong)
  • LUCENE-10196 : Improve IntroSorter with 3-way partitioning.
    (Bruno Roustant)
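LUCENE-10196 switches IntroSorter to 3-way partitioning, which gathers all keys equal to the pivot in one pass so that runs of duplicates are never recursed into again. The following is a plain quicksort sketch over int[] showing only the partitioning scheme; it assumes a middle-element pivot and omits the heapsort fallback that makes a sorter an introsort, so it is not Lucene's IntroSorter.

```java
import java.util.Arrays;

public class ThreeWayQuickSortSketch {
    /** Sorts a[lo..hi] (inclusive) using 3-way (Dutch national flag) partitioning. */
    static void sort(int[] a, int lo, int hi) {
        while (lo < hi) {
            int pivot = a[lo + (hi - lo) / 2]; // assumed middle pivot, for illustration only
            int lt = lo, i = lo, gt = hi;
            // Invariant: a[lo..lt-1] < pivot, a[lt..i-1] == pivot, a[gt+1..hi] > pivot
            while (i <= gt) {
                if (a[i] < pivot) {
                    swap(a, lt++, i++);
                } else if (a[i] > pivot) {
                    swap(a, i, gt--);
                } else {
                    i++;
                }
            }
            // a[lt..gt] now holds every key equal to the pivot and is never touched again.
            // Recurse into the smaller side and loop on the larger one to bound stack depth.
            if (lt - lo < hi - gt) {
                sort(a, lo, lt - 1);
                lo = gt + 1;
            } else {
                sort(a, gt + 1, hi);
                hi = lt - 1;
            }
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 5, 3, 5, 5, 2, 5};
        sort(data, 0, data.length - 1);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 5, 5, 5, 5, 5]
    }
}
```

With many duplicate keys, a 2-way partition keeps re-partitioning the equal elements, while the 3-way variant removes them from further work after one pass.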
  • Bug Fixes (6)
  • LUCENE-10111 : NormValuesWriter did not account for the bytes used by DocsWithFieldSet.
    (Lu Xugang)
  • LUCENE-10116 : SortedSetDocValuesWriter did not account for the bytes used by DocsWithFieldSet and currentValues.
    (Lu Xugang)
  • LUCENE-10070 : Skip deleted docs when accumulating facet counts for all docs.
    (Ankur Goel, Greg Miller)
  • LUCENE-10134 : ConcurrentSortedSetDocValuesFacetCounts shouldn't share liveDocs Bits across threads.
    (Ankur Goel)
  • LUCENE-10154 : NumericLeafComparator to define getPointValues.
    (Mayya Sharipova, Adrien Grand)
  • LUCENE-10208 : Ensure that the minimum competitive score does not decrease in concurrent search.
    (Jim Ferenczi, Adrien Grand)
  • Build (1)
  • LUCENE-10104 , SOLR-15631 : Upgrade forbiddenapis to version 3.2.
    (Uwe Schindler)
  • Other (1)
  • LUCENE-10098 : Add docs/links to GermanAnalyzer describing how to decompound nouns.
    (Robert Muir)

Older Releases

Release 8.10.1 [2021-10-18]

  • Bug Fixes (3)
  • LUCENE-10110 : MultiCollector now handles the case where a single leaf collector wants to skip low-scoring hits but the combined score mode doesn't allow it.
    (Jim Ferenczi)
  • LUCENE-10119 : Sort optimization with search_after can wrongly skip documents whose values are equal to the last value of the previous page
    (Nhat Nguyen)
  • LUCENE-10126 : Sort optimization with a chunked bulk scorer can wrongly skip documents
    (Nhat Nguyen, Mayya Sharipova)

Release 8.10.0 [2021-09-27]

  • API Changes (5)
  • LUCENE-9962 : DrillSideways allows sub-classes to provide "drill down" FacetsCollectors. They may provide a null collector if they choose to bypass "drill down" facet collection.
    (Greg Miller)
  • LUCENE-9902 : Change the getValue method from IntTaxonomyFacets to be protected instead of private. Users can now access the count of an ordinal directly without constructing an extra FacetLabel. Also use variable length arguments for the getOrdinal call in TaxonomyReader.
    (Gautam Worah)
  • LUCENE-10036 : Replaced the ScoreCachingWrappingScorer ctor with a static factory method that ensures unnecessary wrapping doesn't occur.
    (Greg Miller)
  • LUCENE-10027 : Add a new DirectoryReader open API that takes an indexCommit and a custom comparator for sorting leaf readers.
    (Mayya Sharipova)
  • LUCENE-7020 : TieredMergePolicy#setMaxMergeAtOnceExplicit is deprecated and the number of segments that get merged via explicit merges is unlimited by default.
    (Adrien Grand, Shawn Heisey)
  • New Features (2)
  • LUCENE-10083 : Analyzer and stemmer for Telugu language
    (Vinod Singh)
  • LUCENE-10035 : The SimpleText codec now writes skip lists.
    (wuda via Adrien Grand)
  • Improvements (12)
  • LUCENE-9944 : Allow DrillSideways users to provide their own CollectorManager without also requiring them to provide an ExecutorService.
    (Greg Miller)
  • LUCENE-9946 : Support for multi-value fields in LongRangeFacetCounts and DoubleRangeFacetCounts.
    (Greg Miller)
  • LUCENE-9965 : Added QueryProfilerIndexSearcher and ProfilerCollector to support debugging query execution strategy and timing.
    (Jack Conradson, Julie Tibshirani)
  • LUCENE-9981 : Operations.getCommonSuffix/Prefix(Automaton) is now much more efficient, from a worst case exponential down to quadratic cost in the number of states + transitions in the Automaton. These methods no longer use the costly determinize method, removing the risk of TooComplexToDeterminizeException
    (Robert Muir, Mike McCandless)
  • LUCENE-9981 : Operations.determinize now throws TooComplexToDeterminizeException based on too much "effort" spent determinizing rather than a precise state count on the resulting returned automaton, to better handle adversarial cases like det(rev(regexp("(.*a){2000}"))) that spend lots of effort but result in smallish eventual returned automata.
    (Robert Muir, Mike McCandless)
  • LUCENE-9983 : Stop sorting determinize powersets unnecessarily.
    (Patrick Zhai)
  • LUCENE-9177 : ICUNormalizer2CharFilter no longer requires normalization-inert characters as boundaries for incremental processing, vastly improving worst-case performance.
    (Michael Gibney)
  • LUCENE-10030 : Lazily evaluate score in DrillSidewaysScorer.doQueryFirstScoring
    (Grigoriy Troitskiy)
  • LUCENE-9945 : Extend DrillSideways to support exposing FacetCollectors directly.
    (Greg Miller, Sejal Pawar)
  • LUCENE-10043 : Decrease default for LRUQueryCache's skipCacheFactor to 10. This prevents caching a query clause when it is much more expensive than running the top-level query.
    (Julie Tibshirani)
  • LUCENE-5309 : Optimize facet counting for single-valued SSDV / StringValueFacetCounts.
    (Greg Miller)
  • LUCENE-9917 : The BEST_SPEED compression mode now gives up more compression ratio in exchange for faster reads.
    (Adrien Grand)
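LUCENE-10043 lowers LRUQueryCache's default skipCacheFactor to 10. The check below is a hypothetical reading of the entry, not code taken from LRUQueryCache: caching a clause is skipped when its cost, divided by skipCacheFactor, still exceeds the cost of running the top-level query (called leadCost here). The class, constant and method names are illustrative assumptions.

```java
public class SkipCacheFactorSketch {
    // Hypothetical constant mirroring the default value named in LUCENE-10043.
    static final float DEFAULT_SKIP_CACHE_FACTOR = 10f;

    /**
     * Returns true if caching a clause should be skipped because building its cache
     * entry would be far more expensive than running the top-level query.
     */
    static boolean shouldSkipCaching(long clauseCost, long leadCost, float skipCacheFactor) {
        return clauseCost / skipCacheFactor > leadCost;
    }

    public static void main(String[] args) {
        // A selective top-level query (about 1,000 docs) with a broad filter (1,000,000 docs):
        System.out.println(shouldSkipCaching(1_000_000, 1_000, DEFAULT_SKIP_CACHE_FACTOR)); // true
        // A filter only 5x more expensive than the top-level query is still cached:
        System.out.println(shouldSkipCaching(5_000, 1_000, DEFAULT_SKIP_CACHE_FACTOR)); // false
    }
}
```

A lower factor makes the cache more reluctant to materialize a broad clause when the rest of the query only visits a handful of documents.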
  • Optimizations (4)
  • LUCENE-9996 : Improved memory efficiency of IndexWriter's RAM buffer, in particular in the case of many fields and many indexing threads.
    (Adrien Grand)
  • LUCENE-10022 : Rewrite empty DisjunctionMaxQuery to MatchNoDocsQuery.
    (David Harsha via Julie Tibshirani)
  • LUCENE-10031 : Slightly faster segment merging for sorted indices.
    (Adrien Grand)
  • LUCENE-10014 : Lucene90DocValuesFormat was using too many bits per value when compressing via gcd, unnecessarily wasting index storage.
    (weizijun)
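LUCENE-10014 concerns the gcd compression path of Lucene90DocValuesFormat. The usual idea behind such compression is to store each value as (value - min) / gcd, so the bit width depends on (max - min) / gcd rather than on max - min. This sketch shows only that computation; the names are hypothetical, it ignores overflow for extreme long ranges, and it does not reproduce the codec's actual encoding or the specific bug that was fixed.

```java
public class GcdBitsSketch {
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return Math.abs(a);
    }

    /** Bits needed to store any unsigned value in [0, maxValue]. */
    static int bitsRequired(long maxValue) {
        return maxValue == 0 ? 0 : 64 - Long.numberOfLeadingZeros(maxValue);
    }

    /** Bits per value when every value v is stored as (v - min) / gcd. */
    static int bitsPerValueWithGcd(long[] values) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        long g = 0;
        for (long v : values) {
            g = gcd(g, v - min); // gcd(0, x) == |x|, so the first delta seeds g
        }
        if (g == 0) {
            return 0; // all values are equal; only min needs storing
        }
        return bitsRequired((max - min) / g);
    }

    public static void main(String[] args) {
        long[] timestamps = {1000, 3000, 5000, 9000};
        // Deltas from min: 0, 2000, 4000, 8000 -> gcd 2000 -> quotients 0..4 -> 3 bits.
        System.out.println(bitsPerValueWithGcd(timestamps)); // 3
        // Without dividing by the gcd, a delta of 8000 would need 13 bits per value.
        System.out.println(bitsRequired(9000 - 1000)); // 13
    }
}
```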
  • Bug Fixes (12)
  • LUCENE-9988 : Fix DrillSideways correctness bug introduced in LUCENE-9944
    (Greg Miller)
  • LUCENE-9964 : Duplicate long values in a document field should only be counted once when using SortedNumericDocValuesFields
    (Gautam Worah)
  • LUCENE-9999 : CombinedFieldQuery can fail with an exception when a document is missing some fields.
    (Jim Ferenczi, Julie Tibshirani)
  • LUCENE-10020 : DocComparator should not skip docs with the same docID on multiple sorts with search after
    (Mayya Sharipova, Julie Tibshirani)
  • LUCENE-10026 : Fix CombinedFieldQuery equals and hashCode, which ensures query rewrites don't drop CombinedFieldQuery clauses.
    (Julie Tibshirani)
  • LUCENE-10039 : Correct CombinedFieldQuery scoring when there is a single field.
    (Julie Tibshirani)
  • LUCENE-10046 : Counting bug fixed in StringValueFacetCounts.
    (Greg Miller)
  • LUCENE-9963 : FlattenGraphFilter is now more robust when handling incoming holes in the input token graph
    (Geoff Lawson)
  • LUCENE-10008 : Respect ignoreCase in CommonGramsFilterFactory
    (Vigya Sharma)
  • LUCENE-10060 : Ensure DrillSidewaysQuery instances never get cached.
    (Greg Miller, Zachary Chen)
  • LUCENE-10081 : KoreanTokenizer should check the max backtrace gap on whitespaces.
    (Jim Ferenczi)
  • LUCENE-10106 : Sort optimization can wrongly skip the first document of each segment
    (Nhat Nguyen)
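LUCENE-9964 ensures that a long value repeated within one document's SortedNumericDocValuesField is counted once. Because multi-valued numeric doc values are returned in sorted order for each document, duplicates are adjacent and can be skipped by comparing with the previous value. The following is a hypothetical counting helper over plain arrays, not Lucene's facet-counting code.

```java
import java.util.HashMap;
import java.util.Map;

public class DistinctPerDocCountSketch {
    /**
     * Counts, for each value, how many documents contain it. Each inner array holds one
     * document's values, assumed sorted ascending, so duplicates are adjacent.
     */
    static Map<Long, Integer> countDocsPerValue(long[][] docs) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long[] values : docs) {
            for (int i = 0; i < values.length; i++) {
                if (i > 0 && values[i] == values[i - 1]) {
                    continue; // same value repeated within this document: count it once
                }
                counts.merge(values[i], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long[][] docs = {{3, 3, 7}, {3}, {5, 7, 7, 7}};
        Map<Long, Integer> counts = countDocsPerValue(docs);
        System.out.println(counts.get(3L)); // 2
        System.out.println(counts.get(7L)); // 2
        System.out.println(counts.get(5L)); // 1
    }
}
```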
  • Other (1)
  • (No changes)

Release 8.9.0

  • API Changes (1)
  • LUCENE-9680 : IndexWriter#getFieldNames() method added to get fields present in index. This method was removed in LUCENE-8909 .
    (Oren Ovadia)
  • New Features (8)
  • LUCENE-9507 : Custom order for leaves in IndexReader and IndexWriter
    (Mayya Sharipova, Mike McCandless, Jim Ferenczi)
  • LUCENE-9575 : PatternTypingFilter has been added to allow setting a type attribute on tokens based on a configured set of regular expressions
    (Gus Heck)
  • LUCENE-9572 : TypeAsSynonymFilter has been enhanced to support ignoring some types, and to allow the generated synonyms to copy some or all flags from the original token.
    (Gus Heck)
  • LUCENE-9574 : A token filter to drop tokens that match all specified flags.
    (Gus Heck, Uwe Schindler)
  • LUCENE-9537 : Added a smoothingScore method and default implementation to the Scorable abstract class. The smoothing score allows scorers to calculate a score for a document where the search term or subquery is not present. It acts like an idf, so that documents missing terms or subqueries that are frequent in the index are penalized less than documents missing rarer terms or subqueries, and it prevents scores that are the product of terms or subqueries from going to zero. Added the implementation of the Indri AND and the IndriDirichletSimilarity from the academic Indri search engine: http://www.lemurproject.org/indri.php.
    (Cameron VandenBerg)
  • LUCENE-9694 : New tool for creating a deterministic index to enable benchmarking changes on a consistent multi-segment index even when they require re-indexing.
    (Patrick Zhai)
  • LUCENE-9385 : Add FacetsConfig option to control which drill-down terms are indexed for a FacetLabel
    (Zachary Chen)
  • LUCENE-9950 : New facet counting implementation for general string doc value fields (SortedSetDocValues / SortedDocValues) not created through FacetsConfig
    (Greg Miller)
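As a rough illustration of the smoothing idea in LUCENE-9537, the Java sketch below uses the textbook Dirichlet-smoothed estimate log((tf + mu * P(t|C)) / (docLen + mu)). The class and method names are hypothetical, and this is an assumption about the general technique rather than Lucene's IndriDirichletSimilarity code, which may differ in details such as boosting and how P(t|C) is estimated.

```java
/** Hypothetical sketch of Dirichlet-smoothed scoring; not Lucene's IndriDirichletSimilarity. */
public final class DirichletSketch {

    /**
     * Textbook Dirichlet-smoothed log-probability of a term in a document:
     * log((tf + mu * pCollection) / (docLen + mu)).
     * With tf == 0 this plays the role of a "smoothing score" for a missing term.
     */
    public static double score(int tf, int docLen, double mu, double pCollection) {
        return Math.log((tf + mu * pCollection) / (docLen + mu));
    }

    /** AND-style combination: summing logs is the log of the product of per-term probabilities. */
    public static double andScore(int[] tfs, int docLen, double mu, double[] pCollection) {
        double sum = 0.0;
        for (int i = 0; i < tfs.length; i++) {
            sum += score(tfs[i], docLen, mu, pCollection[i]);
        }
        return sum;
    }
}
```

With mu = 0 (no smoothing) a missing term gives log(0) = -Infinity, so the whole product collapses; with mu > 0 a missing frequent term (large P(t|C)) costs less than a missing rare one, matching the idf-like behavior described above.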
  • Improvements (5)
  • LUCENE-9725 : BM25FQuery was extended to handle similarities beyond BM25Similarity. It was renamed to CombinedFieldQuery to reflect its more general scope.
    (Julie Tibshirani)
  • LUCENE-9663 : Add compression to the terms dictionary of SortedSet/Sorted doc values.
    (Jaison Bi via Bruno Roustant)
  • LUCENE-9687 : Hunspell support improvements: add API for spell-checking and suggestions, support compound words, fix various behavior differences between Java and C++ implementations, improve performance
    (Peter Gromov, Dawid Weiss)
  • LUCENE-9877 : Reduce index size by increasing allowable exceptions in PForUtil from 3 to 7.
    (Greg Miller)
  • LUCENE-9935 : Enable bulk merge for stored fields with index sort.
    (Robert Muir, Adrien Grand, Nhat Nguyen)
  • Optimizations (2)
  • LUCENE-9932 : Performance improvement for BKD index building
    (neoremind)
  • LUCENE-9827 : Speed up merging of stored fields and term vectors for smaller segments.
    (Daniel Mitterdorfer, Dimitrios Liapis, Adrien Grand, Robert Muir)
  • Bug Fixes (6)
  • LUCENE-9791 : BytesRefHash.equals/find is now thread safe, fixing a Luwak/Monitor bug causing registered queries to sometimes fail to match.
    (Paweł Bugalski)
  • LUCENE-9887 : Fixed parameter use in RadixSelector.
    (liupanfeng via Adrien Grand)
  • LUCENE-9958 : Fixed performance regression for boolean queries that configure a minimum number of matching clauses.
    (Adrien Grand, Matt Weber)
  • LUCENE-9953 : LongValueFacetCounts should count each document at most once when determining the total count for a dimension. Prior to this fix, multi-valued docs could contribute more than 1 to the dimension count.
    (Greg Miller)
  • LUCENE-9967 : Do not throw NullPointerException while trying to handle another exception in ReplicaNode.start
    (Steven Schlansker)
  • LUCENE-9991 : Fix edge case failure in TestStringValueFacetCounts
    (Greg Miller)
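The difference described in LUCENE-9953 can be sketched with plain arrays, where each inner array holds one document's values. Both methods are hypothetical and only model the counting rule, not LongValueFacetCounts itself.

```java
/** Hypothetical illustration of a per-dimension total that counts each document at most once. */
public final class FacetCountSketch {

    /** Pre-fix style total: every value occurrence counts, so a multi-valued doc can add more than 1. */
    public static int naiveTotal(long[][] docs) {
        int total = 0;
        for (long[] values : docs) {
            total += values.length;
        }
        return total;
    }

    /** Fixed total: a document adds exactly 1 if it has at least one value for the dimension. */
    public static int dimensionTotal(long[][] docs) {
        int total = 0;
        for (long[] values : docs) {
            if (values.length > 0) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One multi-valued doc, one single-valued doc, one doc without values.
        long[][] docs = { {1L, 2L}, {2L}, {} };
        System.out.println(naiveTotal(docs));     // 3
        System.out.println(dimensionTotal(docs)); // 2
    }
}
```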
  • Other (4)
  • LUCENE-9836 : Removed the pure Maven build. It is no longer possible to build artifacts using Maven (this feature was no longer working correctly). Due to migration to Gradle for Lucene/Solr 9.0, the maintenance of the Maven build was no longer reasonable. POM files are generated for deployment to Maven Central only. Please use "ant generate-maven-artifacts" to produce and deploy artifacts to any repository.
    (Uwe Schindler, Dawid Weiss)
  • LUCENE-9836 : Migrate Maven tasks to use "maven-resolver-ant-tasks" instead of the no longer maintained "maven-ant-tasks".
    (Uwe Schindler)
  • LUCENE-9985 : Upgrade Jetty to 9.4.41
    (janhoy)
  • LUCENE-9976 : Fix WANDScorer assertion error.
    (Zach Chen, Adrien Grand, Dawid Weiss)
  • Release 8.8.2 [2021-04-12]

  • Bug Fixes (3)
  • LUCENE-9870 : Fix Circle2D intersectsLine t-value (distance) range clamp
    (Jørgen Nystad)
  • LUCENE-9744 : NPE on a degenerate query in MinimumShouldMatchIntervalsSource$MinimumMatchesIterator.getSubMatches().
    (Alan Woodward)
  • LUCENE-9762 : DoubleValuesSource.fromQuery (also used by FunctionScoreQuery.boostByQuery) could throw an exception when the query implements TwoPhaseIterator and when the score is requested repeatedly.
    (David Smiley, hossman)
  • Release 8.8.1 [2021-02-22]

  • Bug Fixes (1)
  • (No changes)
  • New Features (5)
  • LUCENE-9552 : New LatLonPoint query that accepts an array of LatLonGeometries.
    (Ignacio Vera)
  • LUCENE-9641 : LatLonPoint query support for spatial relationships.
    (Ignacio Vera)
  • LUCENE-9553 : New XYPoint query that accepts an array of XYGeometries.
    (Ignacio Vera)
  • LUCENE-9378 : Doc values now allow configuring how to trade compression for retrieval speed.
    (Adrien Grand)
  • LUCENE-9413 : Add CJKWidthCharFilter and its factory
    (Tomoko Uchida)
  • Improvements (3)
  • LUCENE-9455 : ExitableTermsEnum should sample timeout and interruption check before calling next().
    (Zach Chen via Bruno Roustant)
  • LUCENE-9023 : GlobalOrdinalsWithScore should not compute occurrences when the provided min is 1.
    (Jim Ferenczi)
  • LUCENE-9675 : Binary doc values fields now expose their configured compression mode in the attributes of the field info.
    (Jim Ferenczi)
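LUCENE-9455 describes sampling the timeout and interruption check before advancing, rather than checking on every call. A minimal sketch of that pattern, assuming a plain java.util.Iterator and a System.nanoTime() deadline; the class and exception names are hypothetical and this is not the ExitableTermsEnum API.

```java
import java.util.Iterator;

/** Hypothetical iterator wrapper that checks a deadline only on every CHECK_INTERVAL-th call. */
public final class SampledTimeoutIterator<T> implements Iterator<T> {

    /** Thrown when the deadline has passed or the current thread was interrupted. */
    public static final class TimeExceededException extends RuntimeException {
        public TimeExceededException(String message) {
            super(message);
        }
    }

    private static final int CHECK_INTERVAL = 16; // must be a power of two for the mask below
    private final Iterator<T> in;
    private final long deadlineNanos;
    private int calls = 0;

    public SampledTimeoutIterator(Iterator<T> in, long deadlineNanos) {
        this.in = in;
        this.deadlineNanos = deadlineNanos;
    }

    @Override
    public boolean hasNext() {
        return in.hasNext();
    }

    @Override
    public T next() {
        // Check before advancing, but only when the call counter hits the sample point,
        // so the cost of System.nanoTime() is amortized over many calls.
        if ((calls++ & (CHECK_INTERVAL - 1)) == 0) {
            // Subtraction-based comparison is the overflow-safe way to compare nanoTime values.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new TimeExceededException("time limit exceeded");
            }
            if (Thread.currentThread().isInterrupted()) {
                throw new TimeExceededException("thread interrupted");
            }
        }
        return in.next();
    }
}
```

Because the first call always hits the sample point, an already-expired deadline is detected immediately; afterwards at most CHECK_INTERVAL - 1 elements are consumed between checks.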
  • Optimizations (4)
  • LUCENE-9536 : Reduced memory usage for OrdinalMap when a segment has all values.
    (Julie Tibshirani via Adrien Grand)
  • LUCENE-9021 : QueryParser: re-use the LookaheadSuccess exception.
    (Przemek Bruski via Mikhail Khludnev)
  • LUCENE-9636 : Faster decoding of postings for some numbers of bits per value.
    (Guo Feng via Adrien Grand)
  • LUCENE-9346 : WANDScorer now supports queries that have a `minimumNumberShouldMatch` configured.
    (Xi Zachary Chen via Adrien Grand)
  • Bug Fixes (8)
  • LUCENE-9508 : DocumentsWriter was only stalling threads for 1 second, allowing documents to be indexed even when the DocumentsWriter wasn't able to keep up with flushing. Unless IndexWriter couldn't make progress due to an ill-behaving DWPT, this issue was barely noticeable.
    (Simon Willnauer)
  • LUCENE-9581 : Japanese tokenizer should discard the compound token instead of disabling the decomposition of long tokens when discardCompoundToken is activated.
    (Jim Ferenczi)
  • LUCENE-9595 : Make Component2D#withinPoint implementations consistent with ShapeQuery logic.
    (Ignacio Vera)
  • LUCENE-9606 : Wrap boolean queries generated by shape fields with a Constant score query.
    (Ignacio Vera)
  • LUCENE-9635 : BM25FQuery - Mask encoded norm long value in array lookup.
    (Yilun Cui)
  • LUCENE-9617 : Fix per-field memory leak in IndexWriter.deleteAll(). Reset next available internal field number to 0 on FieldInfos.clear(), to avoid wasting FieldInfo references.
    (Michael Froh)
  • LUCENE-9642 : When encoding triangles in ShapeField, make sure generated triangles are CCW by rotating triangle points before checking triangle orientation.
    (Ignacio Vera)
  • LUCENE-9661 : Fix deadlock in TermsEnum.EMPTY that occurs when trying to initialize TermsEnum and BaseTermsEnum at the same time
    (Namgyu Kim)
  • Other (2)
  • SOLR-14995 : Update Jetty to 9.4.34
    (Mike Drob)
  • LUCENE-9637 : Removes some unused code and replaces the Point implementation on ShapeField/ShapeQuery random tests.
    (Ignacio Vera)
  • Release 8.7.0 [2020-11-03]

  • API Changes (2)
  • LUCENE-9437 : Lucene's facet module's DocValuesOrdinalsReader.decode method is now public, making it easier for applications to decode facet ordinals into their corresponding labels
    (Ankur Goel)
  • LUCENE-9515 : IndexingChain now accepts individual primitives rather than a DocumentsWriterPerThread instance in order to create a new DocConsumer.
    (Simon Willnauer)
  • New Features (4)
  • LUCENE-9386 : RegExpQuery now has a case-insensitive matching option.
    (Mark Harwood)
  • LUCENE-8962 : Add IndexWriter merge-on-refresh feature to selectively merge small segments on getReader, subject to a configurable timeout, to improve search performance by reducing the number of small segments for searching.
    (Simon Willnauer)
  • LUCENE-9484 : Allow sorting an index after it was created. With SortingCodecReader, existing unsorted segments can be wrapped and merged into a fresh index using IndexWriter#addIndexes
    (Simon Willnauer, Adrien Grand)
  • LUCENE-9444 : Add utility class to retrieve facet labels from the taxonomy index for a facet field so such fields do not also have to be redundantly stored
    (Ankur Goel)
  • Improvements (10)
  • LUCENE-8574 : Add a new ExpressionValueSource that enforces only one value per name per hit in dependencies. ExpressionFunctionValues will no longer recompute already computed values.
    (Patrick Zhai)
  • LUCENE-9416 : Fix CheckIndex to print an invalid non-zero norm as unsigned long when detecting corruption.
  • LUCENE-9440 : FieldInfo#checkConsistency was called twice from Lucene50(60)FieldInfosFormat#read; the (redundant?) assert was removed and these checks are now performed for real.
    (Yauheni Putsykovich)
  • LUCENE-9446 : In BooleanQuery rewrite, always remove MatchAllDocsQuery filter clauses when possible.
    (Julie Tibshirani)
  • LUCENE-9501 : Improve coverage for Asserting* test classes: make sure to handle singleton doc values, and sometimes exercise Weight#scorer instead of Weight#bulkScorer for top-level queries.
    (Julie Tibshirani)
  • LUCENE-9511 : Include StoredFieldsWriter in DWPT accounting to ensure that its heap consumption is taken into account when IndexWriter stalls or should flush DWPTs.
    (Simon Willnauer)
  • LUCENE-9514 : Include TermVectorsWriter in DWPT accounting to ensure that its heap consumption is taken into account when IndexWriter stalls or should flush DWPTs.
    (Simon Willnauer)
  • LUCENE-9523 : In query shapes over shape fields, skip points while traversing the BKD tree when the relationship with the document is already known.
    (Ignacio Vera)
  • LUCENE-9539 : Use more compact data structures to represent sorted doc values in memory when sorting a segment before flush and in SortingCodecReader.
    (Simon Willnauer)
  • LUCENE-9458 : WordDelimiterGraphFilter should order tokens at the same position by endOffset to emit longer tokens first. The same graph is produced.
    (David Smiley)
  • Optimizations (4)
  • LUCENE-9395 : ConstantValuesSource now shares a single DoubleValues instance across all segments
    (Tony Xu)
  • LUCENE-9447 , LUCENE-9486 : Stored fields now get higher compression ratios on highly compressible data.
    (Adrien Grand)
  • LUCENE-9373 : FunctionMatchQuery now accepts a "matchCost" optimization hint.
    (Maxim Glazkov, David Smiley)
  • LUCENE-9510 : Indexing with an index sort is now faster by not compressing temporary representations of the data.
    (Adrien Grand)
  • Bug Fixes (6)
  • LUCENE-9427 : Fix a regression where the unified highlighter didn't produce highlights on fuzzy queries that correspond to exact matches.
    (Julie Tibshirani)
  • LUCENE-9467 : Fix NRTCachingDirectory to use Directory#fileLength to check if a file already exists instead of opening an IndexInput on the file, which might throw an AccessDeniedException in some Directory implementations.
    (Simon Willnauer)
  • LUCENE-9501 : Fix a bug in IndexSortSortedNumericDocValuesRangeQuery where it could violate the DocIdSetIterator contract.
    (Julie Tibshirani)
  • LUCENE-9401 : Include field in ComplexPhraseQuery's toString()
    (Thomas Hecker via Munendra S N)
  • LUCENE-9578 : Fix TermRangeQuery when there is no upper bound and the lower bound is the empty string excluded. This would previously match no strings at all while it should match all non-empty strings.
    (Christoph Buescher via Adrien Grand)
  • LUCENE-9524 : Fix NPE in SpanWeight#explain when no scoring is required and SpanWeight has null Similarity.SimScorer.
    (Zach Chen)
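The expected behavior after the LUCENE-9578 fix can be modeled with a small range predicate: with no upper bound and an excluded empty-string lower bound, every non-empty term matches and only the empty term is rejected. This is a hypothetical helper, not TermRangeQuery's implementation; it compares with String.compareTo (UTF-16 order), whereas Lucene compares UTF-8 bytes, which does not change the result for the empty-string bound.

```java
/** Hypothetical model of range semantics; upper == null means the range is unbounded above. */
public final class RangeSketch {

    public static boolean inRange(String term, String lower, boolean includeLower,
                                  String upper, boolean includeUpper) {
        if (lower != null) {
            int c = term.compareTo(lower);
            // Below the lower bound, or equal to an excluded lower bound: no match.
            if (c < 0 || (c == 0 && !includeLower)) {
                return false;
            }
        }
        if (upper != null) {
            int c = term.compareTo(upper);
            // Above the upper bound, or equal to an excluded upper bound: no match.
            if (c > 0 || (c == 0 && !includeUpper)) {
                return false;
            }
        }
        return true;
    }
}
```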
  • Documentation (1)
  • LUCENE-9424 : Add a performance warning to AttributeSource.captureState javadocs
    (Patrick Zhai)
  • Changes in Runtime Behavior (1)
  • LUCENE-9539 : SortingCodecReader no longer caches doc values fields. Previously, SortingCodecReader cached all doc values fields after they were loaded into memory. This reader should only be used to sort segments after the fact using IndexWriter#addIndexes.
    (Simon Willnauer)
  • Other (3)
  • LUCENE-9292 : Refactor BKD point configuration into its own class.
    (Ignacio Vera)
  • LUCENE-9470 : Make TestXYMultiPolygonShapeQueries more resilient for CONTAINS queries.
    (Ignacio Vera)
  • LUCENE-9512 : Move LockFactory stress test to be a unit/integration test.
    (Uwe Schindler, Dawid Weiss, Robert Muir)
  • Build (1)
  • Upgrade forbiddenapis to version 3.1.
    (Uwe Schindler)
  • Release 8.6.3 [2020-10-07]

  • Bug Fixes (1)
  • (No changes)
  • Bug Fixes (1)
  • LUCENE-9478 : Prevent DWPTDeleteQueue from referencing itself and leaking memory. The queue passed an implicit this reference to the next queue instance on flush, which leaked about 500 bytes of memory on each full flush, commit or getReader call.
    (Simon Willnauer)
  • Release 8.6.1 [2020-08-13]

  • Bug Fixes (1)
  • LUCENE-9443 : The UnifiedHighlighter was closing the underlying reader when there were multiple term-vector fields. This was a regression in 8.6.0.
    (David Smiley, Chris Beer)
  • Release 8.6.0 [2020-07-15]

  • API Changes (9)
  • LUCENE-9265 : SimpleFSDirectory is deprecated in favor of NIOFSDirectory.
    (Yannick Welsch)
  • LUCENE-9304 : Removed the ability to set DocumentsWriterPerThreadPool on IndexWriterConfig. DocumentsWriterPerThreadPool is a package-private final class, which made it impossible to customize.
    (Simon Willnauer)
  • LUCENE-9339 : MergeScheduler#merge no longer accepts a parameter indicating whether a new merge was found.
    (Simon Willnauer)
  • LUCENE-9330 : SortFields are now responsible for writing themselves into index headers if they are used as index sorts.
    (Alan Woodward, Uwe Schindler, Adrien Grand)
  • LUCENE-9340 : Deprecate SimpleBindings#add(SortField).
    (Alan Woodward)
  • LUCENE-9345 : MergeScheduler is now decoupled from IndexWriter. Instead it accepts a MergeSource interface that offers the basic methods to acquire pending merges, run the merge and do accounting around it.
    (Simon Willnauer)
  • LUCENE-9349 : QueryVisitor.consumeTermsMatching() now takes a Supplier<ByteRunAutomaton> to enable queries that build large automata to provide them lazily. TermsInSetQuery switches to using this method to report matching terms.
    (Alan Woodward)
  • LUCENE-9366 : DocValues.emptySortedNumeric() no longer takes a maxDoc parameter
    (Alan Woodward)
  • LUCENE-7822 : CodecUtil#checkFooter(IndexInput, Throwable) now throws a CorruptIndexException if checksums mismatch or if checksums can't be verified.
    (Martin Amirault, Adrien Grand)
  • New Features (2)
  • LUCENE-7889 : Grouping by range based on values from DoubleValuesSource and LongValuesSource
    (Alan Woodward)
  • LUCENE-8962 : Add IndexWriter merge-on-commit feature to selectively merge small segments on commit, subject to a configurable timeout, to improve search performance by reducing the number of small segments for searching
    (Michael Froh, Mike Sokolov, Mike Mccandless, Simon Willnauer)
  • Improvements (13)
  • LUCENE-9276 : Use same code-path for updateDocuments and updateDocument in IndexWriter and DocumentsWriter.
    (Simon Willnauer)
  • LUCENE-9279 : Update dictionary version for Ukrainian analyzer to 4.9.1
    (Andriy Rysin via Dawid Weiss)
  • LUCENE-8050 : PerFieldDocValuesFormat should not get the DocValuesFormat on a field that has no doc values.
    (David Smiley, Juan Rodriguez)
  • LUCENE-9304 : Removed ThreadState abstraction from DocumentsWriter which allows pooling of DWPT directly and improves the approachability of the IndexWriter code.
    (Simon Willnauer)
  • LUCENE-9324 : Add an ID to SegmentCommitInfo in order to compare commits for equality and make snapshots incremental on generational files.
    (Simon Willnauer, Mike Mccandless, Adrien Grand)
  • LUCENE-9342 : TotalHits' relation will be EQUAL_TO when the number of hits is lower than TopDocsCollector's numHits
    (Tomás Fernández Löbbe)
  • LUCENE-9353 : Metadata of the terms dictionary moved to its own file, with the `.tmd` extension. This allows checksums of metadata to be verified when opening indices and helps save seeks when opening an index.
    (Adrien Grand)
  • LUCENE-9359 : SegmentInfos#readCommit now always returns a CorruptIndexException if the content of the file is invalid.
    (Adrien Grand)
  • LUCENE-9393 : Make FunctionScoreQuery use ScoreMode.COMPLETE for creating the inner query weight when ScoreMode.TOP_DOCS is requested.
    (Tomás Fernández Löbbe)
  • LUCENE-9392 : Make FacetsConfig.DELIM_CHAR publicly accessible
    (Ankur Goel)
  • LUCENE-9397 : UniformSplit supports encodable fields metadata.
    (Bruno Roustant)
  • LUCENE-9396 : Improved truncation detection for points.
    (Adrien Grand, Robert Muir)
  • LUCENE-9402 : Let MultiCollector handle minCompetitiveScore
    (Tomás Fernández Löbbe, Adrien Grand)
  • Optimizations (8)
  • LUCENE-9254 : UniformSplit keeps FST off-heap.
    (Bruno Roustant)
  • LUCENE-8103 : DoubleValuesSource and QueryValueSource now use a TwoPhaseIterator if one is provided by the Query.
    (Michele Palmia, David Smiley)
  • LUCENE-9287 : UsageTrackingQueryCachingPolicy no longer caches DocValuesFieldExistsQuery.
    (Ignacio Vera)
  • LUCENE-9286 : FST.Arc.BitTable reads FST bytes directly. Arc is lightweight again and FSTEnum traversal is faster.
    (Bruno Roustant)
  • LUCENE-7788 : Fail precommit on unparameterised log messages and examine them for wasted work/objects
    (Erick Erickson)
  • LUCENE-9273 : Speed up geometry queries by specialising Component2D spatial operations. Instead of using a generic relate method for all relations, specialized methods are used for each one. In addition, the type of triangle is computed at deserialization time, so decoding the points of a triangle can be more selective.
    (Ignacio Vera)
  • LUCENE-9087 : Always build trees with full leaves and lower the default value for maxPointsPerLeafNode to 512.
    (Ignacio Vera)
  • LUCENE-9148 : Points now write their index in a separate file.
    (Adrien Grand)
  • Bug Fixes (14)
  • LUCENE-9259 : Fix wrong NGramFilterFactory argument name for preserveOriginal option
    (Paul Pazderski)
  • LUCENE-8849 : DocValuesRewriteMethod.visit wasn't visiting its embedded query
    (Michele Palmia, David Smiley)
  • LUCENE-9258 : DocTermsIndexDocValues assumed it was operating on a single-valued SortedDocValues field when it could be a multi-valued field used with a SortedSetSelector
    (Michele Palmia)
  • LUCENE-9164 : Ensure IW processes all internal events before it closes itself on a rollback.
    (Simon Willnauer, Nhat Nguyen, Dawid Weiss, Mike Mccandless)
  • LUCENE-8908 : Return default value from objectVal when doc doesn't match the query in QueryValueSource
    (Bill Bell, hossman, Munendra S N, Michele Palmia)
  • LUCENE-9133 : Fix for potential NPE in TermFilteredPresearcher for empty fields
    (Marvin Justice via Mike Drob)
  • LUCENE-9309 : Wait for #addIndexes merges when aborting merges.
    (Simon Willnauer)
  • LUCENE-9337 : Ensure CMS updates its thread accounting data structures consistently. CMS previously released its lock after finishing a merge before re-acquiring it to update the thread accounting data structures. This caused threading issues where concurrently finishing threads failed to pick up pending merges, causing potential thread starvation on forceMerge calls.
    (Simon Willnauer)
  • LUCENE-9314 : Single-document monitor runs were using the less efficient MultiDocumentBatch implementation.
    (Pierre-Luc Perron, Alan Woodward)
  • LUCENE-9362 : Fix equality check in ExpressionValueSource#rewrite. This fixes rewriting of inner value sources.
    (Dmitry Emets)
  • LUCENE-9405 : IndexWriter incorrectly calls closeMergeReaders twice when the merged segment is 100% deleted.
    (Michael Froh, Simon Willnauer, Mike Mccandless, Mike Sokolov)
  • LUCENE-9400 : Tessellator might build illegal polygons when several holes share the same vertex.
    (Ignacio Vera)
  • LUCENE-9417 : Tessellator might build illegal polygons when several holes are connected to the same vertex.
    (Ignacio Vera)
  • LUCENE-9418 : Fix ordered intervals over interleaved terms
    (Alan Woodward)
  • Other (12)
  • LUCENE-9257 : Always keep FST off-heap. FSTLoadMode, Reader attributes and openedFromWriter removed.
    (Bruno Roustant)
  • LUCENE-9272 : Checksums of the terms index are now verified when LeafReader#checkIntegrity is called rather than when opening the index.
    (Adrien Grand)
  • LUCENE-9270 : Update Javadoc about normalizeEntry in the Kuromoji DictionaryBuilder.
    (Namgyu Kim)
  • LUCENE-9275 : Make TestLatLonMultiPolygonShapeQueries more resilient for CONTAINS queries.
    (Ignacio Vera)
  • LUCENE-9244 : Adjust TestLucene60PointsFormat#testEstimatePointCount2Dims so it does not fail when a point is shared by multiple leaves.
    (Ignacio Vera)
  • LUCENE-9271 : ByteBufferIndexInput was refactored to work on top of the ByteBuffer API.
    (Adrien Grand)
  • LUCENE-9191 : Make LineFileDocs's random seeking more efficient, making tests using LineFileDocs faster
    (Robert Muir, Mike McCandless)
  • LUCENE-9338 : Refactors SimpleBindings to improve type safety and cycle detection
    (Alan Woodward, Adrien Grand)
  • LUCENE-9358 : Change the way the multi-dimensional BKD tree builder generates the intermediate tree representation to be equal to the one dimensional case to avoid unnecessary tree and leaves rotation.
    (Ignacio Vera)
  • LUCENE-9288 : poll_mirrors.py release script can handle HTTPS mirrors.
    (Ignacio Vera)
  • LUCENE-9232 : Fix or suppress 13 resource leak precommit warnings in lucene/replicator
    (Andras Salamon via Erick Erickson)
  • LUCENE-9398 : Always keep BKD index off-heap. BKD reader does not implement Accountable any more.
    (Ignacio Vera)
  • Build (4)
  • Upgrade forbiddenapis to version 3.0.1.
    (Uwe Schindler)
  • LUCENE-9376 : Fix or suppress 20 resource leak precommit warnings in lucene/search
    (Andras Salamon via Erick Erickson)
  • LUCENE-9380 : Fix auxiliary class warnings in Lucene
    (Erick Erickson)
  • LUCENE-9389 : Enhance gradle logging calls validation: eliminate getMessage()
    (Andras Salamon via Erick Erickson)
  • Release 8.5.2 [2020-05-26]

  • Optimizations (1)
  • LUCENE-9350 : Partial reversion of LUCENE-9068; holding Levenshtein automata on FuzzyQuery can end up blowing up query caches that use query objects as cache keys, so building the automata is now delayed to search time again.
    (Alan Woodward, Mike Drob)
  • Release 8.5.1 [2020-04-16]

  • Bug Fixes (1)
  • LUCENE-9300 : Fix corruption of the new gen field infos when doc values updates are applied on a segment created externally and added to the index with IndexWriter#addIndexes(Directory).
    (Jim Ferenczi, Adrien Grand)
  • Release 8.5.0 [2020-03-24]

  • API Changes (9)
  • LUCENE-9093 : Not an API change but a change in behavior of the UnifiedHighlighter's LengthGoalBreakIterator: it will yield Passages sized a little differently because the sizing pivot is now the center of the first match rather than its left edge.
  • LUCENE-9116 : PostingsWriterBase and PostingsReaderBase no longer support setting a field's metadata via a `long[]`.
    (Adrien Grand)
  • LUCENE-9116 : The FSTOrd postings format has been removed.
    (Adrien Grand)
  • LUCENE-8369 : Remove obsolete spatial module.
    (Nick Knize, David Smiley)
  • LUCENE-8621 : Refactor LatLonShape, XYShape, and all query and utility classes to core.
    (Nick Knize)
  • LUCENE-9218 : XY geometries API works in float space.
    (Ignacio Vera)
  • LUCENE-9212 : Intervals.multiterm() takes CompiledAutomaton rather than plain Automaton
    (Alan Woodward)
  • LUCENE-9150 : Restore support for dynamic PlanetModel in spatial3d.
    (Nick Knize)
  • LUCENE-9171 : QueryBuilder.newTermQuery() and .newSynonymQuery() now take boost parameters.
    (Alessandro Benedetti, Alan Woodward)
  • New Features (3)
  • LUCENE-8903 : Add LatLonShape and XYShape point query.
    (Ignacio Vera)
  • LUCENE-8707 : Add LatLonShape and XYShape distance query.
    (Ignacio Vera)
  • LUCENE-9238 : New XYPointField field and Queries for indexing, searching and sorting cartesian points.
    (Ignacio Vera)
  • Improvements (12)
  • LUCENE-9149 : Increase data dimension limit in BKD.
    (Nick Knize)
  • LUCENE-9102 : Add maxQueryLength option to DirectSpellChecker.
    (Andy Webb via Bruno Roustant)
  • LUCENE-9091 : UnifiedHighlighter HTML escaping should only escape essentials
    (Nándor Mátravölgyi)
  • LUCENE-9105 : UniformSplit postings format detects corrupted index and better handles IO exceptions.
    (Bruno Roustant)
  • LUCENE-9106 : UniformSplit postings format allows extension of block/line serializers.
    (Bruno Roustant)
  • LUCENE-9093 : UnifiedHighlighter's LengthGoalBreakIterator has a new fragmentAlignment option to better center the first match in the passage. Also the sizing point now pivots at the center of the first match term and not its left edge. This yields Passages that won't be identical to the previous behavior.
    (Nándor Mátravölgyi, David Smiley)
  • LUCENE-9153 : Allow WhitespaceAnalyzer to set a maxTokenLength other than the default of 255
    (Alan Woodward)
  • LUCENE-9152 : Improve line intersections with polygons when they are touching from the outside.
    (Ignacio Vera)
  • LUCENE-9123 : Add new JapaneseTokenizer constructors with discardCompoundToken option that controls whether the tokenizer emits original (compound) tokens when the mode is not NORMAL.
    (Kazuaki Hiraga via Tomoko Uchida)
  • LUCENE-9253 : KoreanTokenizer now supports custom dictionaries (system, unknown).
    (Namgyu Kim)
  • LUCENE-9171 : QueryBuilder can now use BoostAttributes on input token streams to selectively boost particular terms or synonyms in parsed queries.
    (Alessandro Benedetti, Alan Woodward)
  • LUCENE-9298 : Improve RAM accounting in BufferedUpdates when deleted doc IDs and terms are cleared.
    (Yu Binglei, Simon Willnauer)
  • Optimizations (10)
  • LUCENE-9211 : Add compression for Binary doc value fields.
    (Mark Harwood)
  • LUCENE-4702 : Better compression of terms dictionaries.
    (Adrien Grand)
  • LUCENE-9228 : Sort dvUpdates in term order before applying them if they all update a single field to the same value. This optimization can reduce flush time by around 20% for doc values update use cases.
    (Nhat Nguyen, Adrien Grand, Simon Willnauer)
  • LUCENE-9245 : Reduce AutomatonTermsEnum memory usage.
    (Bruno Roustant, Robert Muir)
  • LUCENE-9237 : Faster UniformSplit intersect TermsEnum.
    (Bruno Roustant)
  • LUCENE-9260 : LeafReader#checkIntegrity verifies checksums of CFS files.
    (Adrien Grand)
  • LUCENE-9068 : FuzzyQuery builds its Automaton up-front
    (Alan Woodward, Mike Drob)
  • LUCENE-9113 : Faster merging of SORTED/SORTED_SET doc values.
    (Adrien Grand)
  • LUCENE-9125 : Optimize Automaton.step() with binary search and introduce Automaton.next().
    (Bruno Roustant)
  • LUCENE-9147 : The index of stored fields and term vectors is now off-heap.
    (Adrien Grand)
  • Bug Fixes (11)
  • LUCENE-9084 : Fix potential deadlock due to circular synchronization in AnalyzingInfixSuggester
    (Paul Ward)
  • LUCENE-9115 : NRTCachingDirectory no longer caches files of unknown size.
    (Adrien Grand)
  • LUCENE-9144 : Fix error message on OneDimensionBKDWriter when too many points are added to the writer.
    (Ignacio Vera)
  • LUCENE-9135 : Make UniformSplit FieldMetadata counters long.
    (Bruno Roustant)
  • LUCENE-9200 : Fix TieredMergePolicy to use double (not float) math to make its merging decisions, fixing a corner-case bug uncovered by fun randomized tests
    (Robert Muir, Mike McCandless)
  • LUCENE-9099 : Unordered and ordered interval queries now correctly handle repeated subterms. Ordered intervals could supply an 'extra' minimized interval, resulting in odd matches when combined with e.g. CONTAINS queries, and unordered intervals would match duplicate subterms on the same position, so a query for UNORDERED(foo, foo) would match a document containing 'foo' only once.
    (Alan Woodward)
  • LUCENE-9250 : Add support for Circle2d#intersectsLine around the dateline.
    (Ignacio Vera)
  • LUCENE-9243 : Add fudge factor when creating a bounding box of an XYCircle.
    (Ignacio Vera)
  • LUCENE-9239 : Circle2D#WithinTriangle detects properly if a triangle is Within distance.
    (Ignacio Vera)
  • LUCENE-9251 : Fix bug in the polygon tessellator where edges with different values of #isEdgeFromPolygon were not filtered out properly.
    (Ignacio Vera)
  • LUCENE-9263 : Fix wrong transformation of distance in meters to radians in Geo3DPoint.
    (Ignacio Vera)
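LUCENE-9200 switches TieredMergePolicy from float to double math. The short Java sketch below does not reproduce that corner case; it only illustrates the general hazard it guards against: a float has a 24-bit significand, so 2^24 + 1 is the smallest positive integer it cannot represent, and two distinct sizes can collapse to the same float value. The class name is hypothetical.

```java
/** Illustration of float precision loss on large integer values (not TieredMergePolicy code). */
public final class FloatPrecision {

    /** Round-trips a long through float (24-bit significand). */
    public static long viaFloat(long v) {
        return (long) (float) v;
    }

    /** Round-trips a long through double (53-bit significand). */
    public static long viaDouble(long v) {
        return (long) (double) v;
    }

    public static void main(String[] args) {
        long v = 16_777_217L; // 2^24 + 1
        System.out.println(viaFloat(v));  // 16777216: rounded, the value changed
        System.out.println(viaDouble(v)); // 16777217: exact
    }
}
```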
  • Other (6)
  • LUCENE-9109 : Backport some changes from master (except StackWalker) to improve TestSecurityManager
    (Uwe Schindler)
  • LUCENE-9110 : Backport refactored stack analysis in tests to use generalized LuceneTestCase methods
    (Uwe Schindler)
  • LUCENE-9141 : Simplify the LatLonShapeXQuery API by adding a new abstract class called LatLonGeometry. Queries are executed with input objects that extend this class.
    (Ignacio Vera)
  • LUCENE-9194 : Simplify the XYShapeXQuery API by adding a new abstract class called XYGeometry. Queries are executed with input objects that extend this class.
    (Ignacio Vera)
  • LUCENE-9096 : Simplification of CompressingTermVectorsWriter#flushOffsets.
    (kkewwei via Adrien Grand)
  • LUCENE-9225 : Rectangle extends LatLonGeometry so it can be used in a geometry collection.
    (Ignacio Vera)
  • Release 8.4.1 [2020-01-13]

  • Bug Fixes (1)
  • (No changes)
  • API Changes (1)
  • LUCENE-9029 : Deprecate SloppyMath toRadians/toDegrees in favor of Java Math.
    (Jack Conradson via Adrien Grand)
  • New Features (1)
  • LUCENE-8620 : Add CONTAINS support for LatLonShape and XYShape.
    (Ignacio Vera)
  • Improvements (7)
  • LUCENE-9002 : Skip costly caching clause in LRUQueryCache if it makes the query many times slower.
    (Guoqiang Jiang)
  • LUCENE-9006 : WordDelimiterGraphFilter's catenateAll token is now ordered before any token parts, like WDF did.
    (David Smiley)
  • LUCENE-9028 : Introduce Intervals.multiterm()
    (Mikhail Khludnev)
  • LUCENE-9018 : ConcatenateGraphFilter now has a configurable separator.
    (Stanislav Mikulchik, David Smiley)
  • LUCENE-9036 : ExitableDirectoryReader may interrupt scanning over DocValues
    (Mikhail Khludnev)
  • LUCENE-9062 : QueryVisitor now has a consumeTermsMatching() method, allowing queries that match a class of terms to pass a ByteRunAutomaton matching that class back to the visitor.
    (Alan Woodward, David Smiley)
  • LUCENE-9073 : IntervalQuery now includes its field in toString() and explain()
    (Mikhail Khludnev)
  • Optimizations (9)
  • LUCENE-8928 : When building a kd-tree for dimensions n > 2, compute exact bounds for an inner node every N splits to improve the quality of the tree. N is defined by SPLITS_BEFORE_EXACT_BOUNDS which is set to 4.
    (Ignacio Vera, Adrien Grand)
  • BaseDirectoryReader no longer sums up the `LeafReader#numDocs` of its leaves eagerly. This especially helps when creating views of readers that hide documents, since computing the number of live documents is an expensive operation.
    (Adrien Grand)
  • LUCENE-8992 : TopFieldCollector and TopScoreDocCollector can now share minimum scores across leaves concurrently.
    (Adrien Grand, Atri Sharma, Jim Ferenczi)
  • LUCENE-8932 : BKDReader's index is now stored off-heap when the IndexInput is an instance of ByteBufferIndexInput.
    (Jack Conradson via Adrien Grand)
  • LUCENE-9024 : IntroSelector now falls back to the median of medians algorithm instead of sorting when the maximum recursion level is exceeded, providing better worst-case runtime.
    (Paul Sanwald via Adrien Grand)
  • LUCENE-8920 : The denser arcs of FST now index labels with a bitset in order to provide near constant time access.
    (Bruno Roustant, Mike Sokolov via Adrien Grand)
  • LUCENE-9027 : Use SIMD instructions to decode postings.
    (Adrien Grand)
  • LUCENE-9049 : Remove FST cached root arcs now redundant with labels indexed by bitset. This frees some on-heap FST space.
    (Jack Conradson via Bruno Roustant)
  • LUCENE-9045 : Do not use TreeMap/TreeSet in BlockTree and PerFieldPostingsFormat.
    (Bruno Roustant)
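LUCENE-9024 makes IntroSelector fall back to the median of medians algorithm for a better worst case. The Java sketch below is a standalone median-of-medians selection over an int array (groups of five, three-way partitioning), offered as an illustration of the technique under that assumption; it is not Lucene's IntroSelector, which operates on an abstract swap/compare API and only switches to this strategy after exceeding a recursion limit.

```java
import java.util.Arrays;

/** Illustrative median-of-medians selection; not Lucene's IntroSelector. */
public final class MedianOfMedians {

    /** Returns the k-th smallest element (0-based) of a, reordering a in place. */
    public static int select(int[] a, int k) {
        if (k < 0 || k >= a.length) {
            throw new IllegalArgumentException("k out of range");
        }
        int lo = 0, hi = a.length - 1;
        while (true) {
            if (lo == hi) {
                return a[lo];
            }
            int pivot = pivot(a, lo, hi);
            // Three-way partition: a[lo..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..hi] > pivot.
            int lt = lo, i = lo, gt = hi;
            while (i <= gt) {
                if (a[i] < pivot) {
                    swap(a, lt++, i++);
                } else if (a[i] > pivot) {
                    swap(a, i, gt--);
                } else {
                    i++;
                }
            }
            if (k < lt) {
                hi = lt - 1;
            } else if (k > gt) {
                lo = gt + 1;
            } else {
                return pivot;
            }
        }
    }

    /** Median of the medians of groups of five in a[lo..hi]; guarantees a reasonably balanced split. */
    private static int pivot(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        if (n <= 5) {
            return medianOfSmall(a, lo, hi);
        }
        int numGroups = (n + 4) / 5;
        int[] medians = new int[numGroups];
        for (int g = 0; g < numGroups; g++) {
            int start = lo + g * 5;
            int end = Math.min(start + 4, hi);
            medians[g] = medianOfSmall(a, start, end);
        }
        // Recursively select the median of the group medians.
        return select(medians, numGroups / 2);
    }

    /** Median of at most five elements, computed on a sorted copy. */
    private static int medianOfSmall(int[] a, int lo, int hi) {
        int[] copy = Arrays.copyOfRange(a, lo, hi + 1);
        Arrays.sort(copy);
        return copy[(copy.length - 1) / 2];
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Because the pivot is always a value taken from the current range, each pass leaves at least one element in the "equal" band, so the loop always makes progress; the group-of-five pivot bounds how unbalanced each split can be, which is what gives the linear worst case.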
  • Bug Fixes (7)
  • LUCENE-9001 : Fix race condition in SetOnce.
    (Przemko Robakowski)
  • LUCENE-9030 : Fix WordnetSynonymParser behaviour so that it behaves similarly to SolrSynonymParser.
    (Christoph Buescher via Alan Woodward)
  • LUCENE-9054 : Fix reproduceJenkinsFailures.py to not overwrite junit XML files when retrying
    (hossman)
  • LUCENE-9031 : UnsupportedOperationException on MatchesIterator.getQuery()
    (Alan Woodward, Mikhail Khludnev)
  • LUCENE-8996 : maxScore was sometimes missing from distributed grouped responses.
    (Julien Massenet, Diego Ceccarelli, Munendra S N, Christine Poerschke)
  • LUCENE-9055 : Fix the detection of lines crossing triangles through edge points.
    (Ignacio Vera)
  • LUCENE-9103 : Disjunctions could miss some hits under rare conditions.
    (Adrien Grand)
  • Other (6)
  • LUCENE-8979 : Code Cleanup: Use entrySet() for map iteration wherever possible. - Part 2
    (Koen De Groote)
  • LUCENE-8994 : Code Cleanup - Pass values to list constructor instead of empty constructor followed by addAll().
    (Koen De Groote)
  • LUCENE-8746 : Refactor EdgeTree - Introduce a Component tree that represents the tree of components (e.g. polygons). The edge tree is now just a tree of edges.
    (Ignacio Vera)
  • LUCENE-9046 : Fix wrong example in Javadoc of TermInSetQuery
    (Namgyu Kim)
  • LUCENE-8983 : Add sandbox PhraseWildcardQuery to control multi-term expansions in a phrase.
    (Bruno Roustant)
  • LUCENE-9067 : Polygon2D#contains() is now thread safe.
    (Ignacio Vera)
  • Build (2)
  • Upgrade forbiddenapis to version 2.7; upgrade Groovy to 2.4.17.
    (Uwe Schindler)
  • LUCENE-9041 : Upgrade ecj to 3.19.0 to fix sporadic precommit javadoc issues
    (Kevin Risden)
  • Release 8.3.1 [2019-12-03]

  • Bug Fixes (1)
  • LUCENE-9050 : MultiTermIntervalsSource.visit() was not calling back to its visitor.
    (Alan Woodward)
  • Release 8.3.0 [2019-11-02]

  • API Changes (5)
  • LUCENE-8909 : IndexWriter#getFieldNames(), which returns the fields present in the index, is no longer required after LUCENE-8316 and has therefore been deprecated.
    (Adrien Grand, Munendra S N)
  • LUCENE-8755 : SpatialPrefixTreeFactory now consumes the "version" parsed with Lucene's Version class. The quad and packed quad prefix trees are sensitive to this. It's recommended to pass the version, just as you should for analysis components of tokenized text; otherwise, encoding changes in future versions may be incompatible with older indexes.
    (Chongchen Chen, David Smiley)
  • LUCENE-8956 : QueryRescorer now only sorts the first topN hits instead of all initial hits.
    (Paul Sanwald via Adrien Grand)
  • LUCENE-8921 : IndexSearcher.termStatistics() no longer takes a TermStates; it takes the docFreq and totalTermFreq instead, and should not be called if docFreq <= 0. The previous implementation survives as deprecated and final, and is removed in 9.0.
    (Bruno Roustant, David Smiley, Alan Woodward)
  • LUCENE-8990 : PointValues#estimateDocCount(visitor) estimates the number of documents that would be matched by the given IntersectVisitor. The method is used to compute the cost() of ScorerSuppliers instead of PointValues#estimatePointCount(visitor).
    (Ignacio Vera, Adrien Grand)
  • New Features (6)
  • LUCENE-8936 : Add SpanishMinimalStemFilter
    (vinod kumar via Tomoko Uchida)
  • LUCENE-8764 LUCENE-8945 : Add an "export all terms and doc freqs" feature, with configurable delimiters, to Luke.
    (Leonardo Menezes, Amish Shah via Tomoko Uchida)
  • LUCENE-8747 : Composite Matches from multiple subqueries now allow access to their submatches, and a new NamedMatches API allows marking of subqueries and a simple way to find which subqueries have matched on a given document
    (Alan Woodward, Jim Ferenczi)
  • LUCENE-8769 : Introduce Range Query For Multiple Connected Ranges
    (Atri Sharma)
  • LUCENE-8960 : Introduce LatLonDocValuesPointInPolygonQuery for LatLonDocValuesField
    (Ignacio Vera)
  • LUCENE-8753 : New UniformSplitPostingsFormat (name "UniformSplit"), whose main benefits are simplicity and extensibility. New STUniformSplitPostingsFormat (name "SharedTermsUniformSplit") that shares a single internal term dictionary across fields.
    (Bruno Roustant, Juan Rodriguez, David Smiley)
  • Improvements (15)
  • LUCENE-8874 : Show SPI names instead of class names in Luke Analysis tab.
    (Tomoko Uchida)
  • LUCENE-8894 : Add APIs to find SPI names for Tokenizer/CharFilter/TokenFilter factory classes.
    (Tomoko Uchida)
  • LUCENE-8914 : Move the logic for discarding inner nodes in FloatPointNearestNeighbor to the IntersectVisitor to take advantage of the change introduced in LUCENE-7862.
    (Ignacio Vera)
  • LUCENE-8955 : Move the logic for discarding inner nodes in LatLonPoint NearestNeighbor to the IntersectVisitor to take advantage of the change introduced in LUCENE-7862.
    (Ignacio Vera)
  • LUCENE-8918 : PhraseQuery throws exceptions at construction time if it is passed null arguments.
    (Alan Woodward)
  • LUCENE-8916 : GraphTokenStreamFiniteStrings preserves all Token attributes through its finite strings TokenStreams
    (Alan Woodward)
  • LUCENE-8906 : Expose Lucene50PostingsFormat.IntBlockTermState as public so that other postings formats can re-use it.
    (Bruno Roustant)
  • LUCENE-8942 : Remove redundant parameters and improve visibility strictness in LRUQueryCache
    (Atri Sharma)
  • SOLR-13663 : Introduce <SpanPositionRange> into XML Query Parser
    (Alessandro Benedetti via Mikhail Khludnev)
  • LUCENE-8952 : Use a sort key instead of true distance in NearestNeighbor
    (Julie Tibshirani)
  • LUCENE-8620 : Tessellator now labels each edge of the generated triangles according to whether it belongs to the original polygon. This information is added to the triangle encoding.
    (Ignacio Vera)
  • LUCENE-8964 : Fix geojson shape parsing on string arrays in properties
    (Alexander Reelsen)
  • LUCENE-8976 : Use exact distance between point and bounding rectangle in FloatPointNearestNeighbor.
    (Ignacio Vera)
  • LUCENE-8966 : The Korean analyzer now splits tokens on boundaries between digits and alphabetic characters.
    (Jim Ferenczi)
  • LUCENE-8984 : MoreLikeThis MLT is biased for uncommon fields
    (Andy Hind via Anshum Gupta)
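The idea behind LUCENE-8952 above is that ranking nearest neighbors only needs a value that orders candidates the same way the true distance does, so the expensive final step of the distance formula can be deferred until results are returned. As a simplified illustration, the sketch below uses squared Euclidean distance as the sort key (skipping the square root); Lucene's own key is derived from the haversine formula and is not reproduced here. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical nearest-neighbor ranking by a monotonic sort key instead of the true distance. */
final class SortKeyNearest {

    /** Sort key: squared distance. It is monotonic in the true distance, so it ranks identically. */
    static double sortKey(double[] p, double[] q) {
        double dx = p[0] - q[0], dy = p[1] - q[1];
        return dx * dx + dy * dy;
    }

    /** True distance, only worth computing for the hits that are actually returned. */
    static double distance(double[] p, double[] q) {
        return Math.sqrt(sortKey(p, q));
    }

    /** Returns the indices of the k points closest to origin, ranked by sort key only. */
    static int[] nearest(double[][] points, double[] origin, int k) {
        Integer[] idx = new Integer[points.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> sortKey(points[i], origin)));
        int[] out = new int[Math.min(k, idx.length)];
        for (int i = 0; i < out.length; i++) out[i] = idx[i];
        return out;
    }

    public static void main(String[] args) {
        double[][] pts = {{3, 4}, {1, 1}, {0, 2}, {5, 0}};
        double[] origin = {0, 0};
        for (int i : nearest(pts, origin, 2)) {
            System.out.println(i + " -> " + distance(pts[i], origin));
        }
    }
}
```

A full sort keeps the sketch short; a real k-nearest-neighbor search would keep a bounded priority queue keyed on the same sort key.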
  • Optimizations (8)
  • LUCENE-8922 : DisjunctionMaxQuery more efficiently leverages impacts to skip non-competitive hits.
    (Adrien Grand)
  • LUCENE-8935 : BooleanQuery with no scoring clause can now early terminate the query when the total hit count is not requested.
    (Jim Ferenczi)
  • LUCENE-8941 : Matches on wildcard queries will defer building their full disjunction until a MatchesIterator is pulled
    (Alan Woodward)
  • LUCENE-8755 : spatial-extras quad and packed quad prefix trees now index points faster.
    (Chongchen Chen, David Smiley)
  • LUCENE-8860 : Add leaf-node-level optimizations in LatLonShapeBoundingBoxQuery.
    (Igor Motov via Ignacio Vera)
  • LUCENE-8968 : Improve performance of WITHIN and DISJOINT queries for Shape queries by doing just one pass whenever possible.
    (Ignacio Vera)
  • LUCENE-8939 : Introduce shared count based early termination across multiple slices
    (Atri Sharma)
  • LUCENE-8980 : Blocktree's seekExact now short-circuits and returns false if the term isn't in the min-max range of the segment. This gives a large performance gain for ID-like or time-like data that is populated sequentially.
    (Guoqiang Jiang)
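LUCENE-8980 above puts a cheap range check in front of the term dictionary lookup. A minimal sketch, assuming a hypothetical per-segment dictionary that keeps its terms as a sorted array of UTF-8 byte arrays and records the smallest and largest term; Lucene's BlockTree stores and compares these differently. Terms are compared in unsigned byte order, which is how Lucene orders BytesRef terms (`Arrays.compareUnsigned` needs Java 9 or later).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical segment term dictionary with a min/max short-circuit in seekExact. */
final class MinMaxTermDictionary {
    private final byte[][] terms; // sorted in unsigned byte order
    private final byte[] minTerm;
    private final byte[] maxTerm;
    int dictionaryLookups = 0;    // how often the (expensive) dictionary search actually runs

    MinMaxTermDictionary(String... termStrings) {
        if (termStrings.length == 0) throw new IllegalArgumentException("empty segment");
        terms = new byte[termStrings.length][];
        for (int i = 0; i < termStrings.length; i++) {
            terms[i] = termStrings[i].getBytes(StandardCharsets.UTF_8);
        }
        Arrays.sort(terms, Arrays::compareUnsigned);
        minTerm = terms[0];
        maxTerm = terms[terms.length - 1];
    }

    boolean seekExact(String term) {
        byte[] target = term.getBytes(StandardCharsets.UTF_8);
        // Short-circuit: a term outside [minTerm, maxTerm] cannot occur in this segment.
        if (Arrays.compareUnsigned(target, minTerm) < 0 || Arrays.compareUnsigned(target, maxTerm) > 0) {
            return false;
        }
        dictionaryLookups++;
        return binarySearch(target) >= 0;
    }

    /** Stand-in for the real dictionary search. */
    private int binarySearch(byte[] target) {
        int lo = 0, hi = terms.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = Arrays.compareUnsigned(terms[mid], target);
            if (cmp < 0) lo = mid + 1;
            else if (cmp > 0) hi = mid - 1;
            else return mid;
        }
        return -(lo + 1);
    }
}
```

With sequentially assigned IDs each segment covers a narrow, mostly disjoint range, so lookups for IDs that live in other segments stop at the two comparisons.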
  • Bug Fixes (2)
  • LUCENE-8755 : spatial-extras quad and packed quad prefix trees could throw a NullPointerException for certain cell edge coordinates
    (Chongchen Chen, David Smiley)
  • LUCENE-9005 : BooleanQuery.visit() would pull subVisitors from its parent visitor, rather than from a visitor for its own specific query. This could cause problems when BQ was nested under another BQ. Instead, we now pull a MUST subvisitor, pass it to any MUST subclauses, and then pull SHOULD, MUST_NOT and FILTER visitors from it rather than from the parent.
    (Alan Woodward)
  • Other (7)
  • LUCENE-8778 LUCENE-8911 LUCENE-8957 : Define analyzer SPI names as static final fields and document the names in Javadocs.
    (Tomoko Uchida, Uwe Schindler)
  • LUCENE-8758 : QuadPrefixTree: removed levelS and levelN fields which weren't used.
    (Amish Shah)
  • LUCENE-8975 : Code Cleanup: Use entrySet() for map iteration wherever possible.
    (Koen De Groote)
  • LUCENE-8993 , LUCENE-8807 : Changed all repository and download references in build files to HTTPS.
    (Uwe Schindler)
  • LUCENE-8998 : Fix OverviewImplTest.testIsOptimized reproducible failure.
    (Tomoko Uchida)
  • LUCENE-8999 : LuceneTestCase.expectThrows now propagates assert/assumption failures up to the test without wrapping them in a new assertion failure, unless the caller has explicitly expected them
    (hossman)
  • LUCENE-8062 : GlobalOrdinalsWithScoreQuery is no longer eligible for query caching.
    (Jim Ferenczi)
  • Release 8.2.0 [2019-07-26]

  • API Changes (3)
  • LUCENE-8865 : IndexSearcher now uses Executor instead of ExecutorService. This change is fully backwards compatible since ExecutorService extends Executor.
    (Simon Willnauer)
  • LUCENE-8856 : Intervals queries have moved from the sandbox to the queries module.
    (Alan Woodward)
  • LUCENE-8893 : Intervals.wildcard() and Intervals.prefix() methods now take BytesRef rather than String.
    (Alan Woodward)
  • New Features (10)
  • LUCENE-8632 : New XYShape Field and Queries for indexing and searching general cartesian geometries.
    (Nick Knize)
  • LUCENE-8891 : Snowball stemmer/analyzer for the Estonian language.
    (Gert Morten Paimla via Tomoko Uchida)
  • LUCENE-8815 : Provide a DoubleValues implementation for retrieving the value of features without requiring a separate numeric field. Note that as feature values are stored with only 8 bits of mantissa the values returned may have a delta from the original values indexed.
    (Colin Goodheart-Smithe via Adrien Grand)
  • LUCENE-8803 : Provide a FeatureSortField to allow sorting search hits by descending value of a feature. This is exposed via the factory method FeatureField#newFeatureSort.
    (Colin Goodheart-Smithe via Adrien Grand)
  • LUCENE-8784 : The KoreanTokenizer now preserves punctuation if discardPunctuation is set to false (defaults to true).
    (Namgyu Kim via Jim Ferenczi)
  • LUCENE-8812 : Add a new KoreanNumberFilter that converts Hangul numerals into numbers and handles decimal points. It is similar to the JapaneseNumberFilter.
    (Namgyu Kim)
  • LUCENE-8362 : Add doc-value support to range fields.
    (Atri Sharma via Adrien Grand)
  • LUCENE-8766 : Add monitor subproject (previously Luwak monitoring library). This allows a stream of documents to be matched against a set of registered queries in an efficient manner, for use as a monitoring or classification tool.
    (Alan Woodward)
  • LUCENE-7714 : Add a numeric range query in sandbox that takes advantage of index sorting.
    (Julie Tibshirani via Jim Ferenczi)
  • LUCENE-8859 : The completion suggester's postings format now has an option to load its internal FST off-heap.
    (Jim Ferenczi)
  • Bug Fixes (9)
  • LUCENE-8831 : Fixed LatLonShapeBoundingBoxQuery.hashCode methods.
    (Ignacio Vera)
  • LUCENE-8775 : Improve the tessellator to better handle cases where a hole shares a vertex with the polygon.
    (Ignacio Vera)
  • LUCENE-8785 : Ensure new threadstates are locked before retrieving the number of active threadstates. Without this, calling IndexWriter#deleteAll while actively indexing could cause assertion errors and potentially broken field attributes in the IndexWriter.
    (Simon Willnauer)
  • LUCENE-8804 : Forbid calls to putAttribute on frozen FieldType instances.
    (Vamshi Vijay Nakkirtha via Adrien Grand)
  • LUCENE-8828 : Removes the buggy 'disallow overlaps' boolean from Intervals.unordered(), and replaces it with a new Intervals.unorderedNoOverlaps() method
    (Alan Woodward)
  • LUCENE-8843 : Don't ignore exceptions that are thrown when trying to open a file in IOUtils#fsync.
    (Jason Tedor via Adrien Grand)
  • LUCENE-8835 : FileSwitchDirectory now respects the file extension when listing directory contents to ensure we don't expose pending deletes if both directories point to the same underlying filesystem directory.
    (Simon Willnauer)
  • LUCENE-8853 : FileSwitchDirectory now makes a best effort to place tmp files in the same directory as the target files.
    (Simon Willnauer)
  • LUCENE-8892 : Add missing closing parenthesis in MultiBoolFunction's description()
    (Florian Diebold, Munendra S N)
  • Improvements (8)
  • LUCENE-7840 : Non-scoring BooleanQuery now removes SHOULD clauses before building the scorer supplier, as opposed to eliminating them during scorer construction.
    (Atri Sharma via Jim Ferenczi)
  • LUCENE-8770 : BlockMaxConjunctionScorer now leverages two-phase iterators in order to avoid executing the second phase when scorers don't intersect.
    (Adrien Grand, Jim Ferenczi)
  • LUCENE-8818 : Fix smokeTestRelease.py encoding bug
    (janhoy)
  • LUCENE-8845 : Allow Intervals.prefix() and Intervals.wildcard() to specify their maximum allowed expansions
    (Alan Woodward)
  • LUCENE-8875 : Introduce a Collector optimized for use cases where a large number of hits is requested
    (Atri Sharma)
  • LUCENE-8848 LUCENE-7757 LUCENE-8492 : The UnifiedHighlighter now detects when parts of the query are not understood by it, and in that case avoids optimizations that would result in no highlights or slow highlighting. This generally works best in WEIGHT_MATCHES mode. Consequently, queries produced by ComplexPhraseQueryParser and the surround QueryParser will now highlight correctly.
    (David Smiley)
  • LUCENE-8793 : Luke enhanced UI for CustomAnalyzer: show detailed analysis steps.
    (Jun Ohtani via Tomoko Uchida)
  • LUCENE-8855 : Add Accountable to some Query implementations
    (ab, Adrien Grand)
  • Optimizations (8)
  • LUCENE-8796 : Use exponential search instead of binary search in IntArrayDocIdSet#advance method
    (Luca Cavanna via Adrien Grand)
  • LUCENE-8865 : Use the calling thread for execution even if IndexSearcher has an executor. Caller threads now execute at least one search on an index even when an executor is provided, to minimize thread context switching.
    (Simon Willnauer)
  • LUCENE-8868 : New storage strategy for BKD tree leaves with low cardinality. It stores each distinct value once together with its cardinality, reducing the storage cost.
    (Ignacio Vera)
  • LUCENE-8885 : Optimise BKD reader by exploiting cardinality information stored on leaves.
    (Ignacio Vera)
  • LUCENE-8896 : Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, byte[]) for several queries.
    (Ignacio Vera)
  • LUCENE-8901 : Load frequencies lazily only when needed in BlockDocsEnum and BlockImpactsEverythingEnum
    (Mayya Sharipova)
  • LUCENE-8888 : Optimize distribution of points with data dimensions in BKD tree leaves.
    (Ignacio Vera)
  • LUCENE-8311 : Phrase queries now leverage impacts.
    (Adrien Grand)
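LUCENE-8796 above swaps binary search for exponential (galloping) search in IntArrayDocIdSet#advance. Advance targets usually lie close to the current position, so galloping probes increasing offsets from the current index and then binary-searches only the bracketed window, costing O(log d) for a jump of distance d instead of O(log n) over the whole tail. The class below is a hypothetical stand-in for a sorted doc-ID iterator, not Lucene's implementation; `NO_MORE_DOCS` mirrors the value of DocIdSetIterator.NO_MORE_DOCS (Integer.MAX_VALUE).

```java
/** Hypothetical sorted doc-ID iterator whose advance() gallops from the current position. */
final class GallopingDocIds {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same sentinel value as DocIdSetIterator

    private final int[] docs; // strictly increasing doc IDs
    private int index = -1;   // position of the current doc; -1 before iteration starts

    GallopingDocIds(int... docs) {
        this.docs = docs;
    }

    int docID() {
        if (index < 0) return -1;
        return index < docs.length ? docs[index] : NO_MORE_DOCS;
    }

    int nextDoc() {
        index++;
        return docID();
    }

    /** Moves to the first doc >= target; as in Lucene, target must be greater than docID(). */
    int advance(int target) {
        int lo = index + 1; // every index in [index + 1, lo) holds a doc < target
        int hi = lo;
        int step = 1;
        // Gallop: probe lo, lo+1, lo+3, lo+7, ... until a probe reaches target or runs off the end.
        while (hi < docs.length && docs[hi] < target) {
            lo = hi + 1;
            hi += step;
            step <<= 1;
        }
        // Binary search for the first doc >= target in the bracketed window [lo, end).
        int end = Math.min(hi + 1, docs.length);
        while (lo < end) {
            int mid = (lo + end) >>> 1;
            if (docs[mid] < target) lo = mid + 1;
            else end = mid;
        }
        index = lo; // index == docs.length means the iterator is exhausted
        return docID();
    }
}
```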
  • Test Framework (1)
  • LUCENE-8825 : CheckHits now displays the shard index in case of mismatch between top hits.
    (Atri Sharma via Adrien Grand)
  • Other (6)
  • LUCENE-8847 : Code Cleanup: Remove StringBuilder.append with concatenated strings.
    (Koen De Groote via Uwe Schindler)
  • LUCENE-8861 : Script to find open GitHub PRs that need attention
    (janhoy)
  • LUCENE-8852 : ReleaseWizard tool for release managers
    (janhoy)
  • LUCENE-8838 : Remove support for Steiner points on Tessellator.
    (Ignacio Vera)
  • LUCENE-8879 : Improve BKDRadixSelector tests.
    (Ignacio Vera)
  • LUCENE-8886 : Fix TestMutablePointsReaderUtils tests.
    (Ignacio Vera)
  • Release 8.1.1 [2019-05-28]

  • Improvements (1)
  • LUCENE-8781 : FST lookup performance has been improved in many cases by encoding Arcs using full-sized arrays with gaps. The new encoding is enabled for postings in the default codec and for suggesters.
    (Mike Sokolov)
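LUCENE-8781 above trades space for speed: instead of storing only the labels a node actually has and binary-searching them, a node's arcs are laid out in an array indexed by `label - firstLabel`, with empty slots (gaps) for missing labels, so a lookup is one bounds check plus one array read. The sketch below contrasts the two layouts for a single node; it is a hypothetical illustration and does not reflect the actual FST byte encoding. Because of the gaps, such a layout only pays off for nodes whose labels are reasonably dense.

```java
import java.util.Arrays;

/** Hypothetical comparison of packed vs. direct-addressed arc lookup for one FST node. */
final class ArcLookup {

    /** Packed layout: only existing labels, sorted; O(log n) binary search. Returns -1 if absent. */
    static int packedLookup(int[] labels, int[] targets, int label) {
        int idx = Arrays.binarySearch(labels, label);
        return idx >= 0 ? targets[idx] : -1;
    }

    /** Direct layout: slot (label - firstLabel) holds the arc target, -1 marks a gap. */
    static int[] buildDirect(int[] sortedLabels, int[] targets) {
        int first = sortedLabels[0], last = sortedLabels[sortedLabels.length - 1];
        int[] slots = new int[last - first + 1];
        Arrays.fill(slots, -1);
        for (int i = 0; i < sortedLabels.length; i++) {
            slots[sortedLabels[i] - first] = targets[i];
        }
        return slots;
    }

    /** Direct lookup: O(1), a range check plus one array read. Returns -1 if absent. */
    static int directLookup(int firstLabel, int[] slots, int label) {
        int offset = label - firstLabel;
        return (offset < 0 || offset >= slots.length) ? -1 : slots[offset];
    }
}
```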
  • Release 8.1.0 [2019-05-16]

  • API Changes (2)
  • LUCENE-3041 : A query introspection API has been added. Queries should implement a visit() method, taking a QueryVisitor, and either pass the visitor down to any child queries, or call a visitX() or consumeX() method on it. All locations in the code that called Weight.extractTerms() have been changed to use this API, and the extractTerms() method has been deprecated.
    (Alan Woodward, Simon Willnauer, David Smiley, Luca Cavanna)
  • LUCENE-8735 : Directory.getPendingDeletions is now abstract to ensure subclasses override it. FilterDirectory now delegates the call, ensuring correct default behaviour for subclasses.
    (Henning Andersen)
  • New Features (1)
  • LUCENE-2562 : The well-known graphical user interface for inspecting Lucene indexes "Luke" was added as a Lucene module. It can be started from the binary distribution by calling the shell scripts in the module folder or from the source checkout by using `ant -f lucene/luke/build.xml run`. Luke provides a Swing-based user interface and can be used to open Lucene or Solr (or Elasticsearch) indexes, inspect documents, check index commits and segments, or test (custom) analyzers. It also has maintenance functions to check index structures and force merge indexes for archival. Luke was originally developed by Andrzej Bialecki, later maintained by Dmitry Kan and finally rewritten by Tomoko Uchida to use the ASF licensing compatible Swing framework (as shipped with JDKs).
    (Tomoko Uchida, Uwe Schindler)
  • Bug Fixes (10)
  • LUCENE-8736 : LatLonShapePolygonQuery returns incorrect WITHIN results with shared boundaries. Point in Polygon now correctly includes boundary points. Box and Polygon relations with triangles have also been improved to correctly include boundary points.
    (Nick Knize)
  • LUCENE-8712 : Polygon2D does not detect crossings through segment edges.
    (Ignacio Vera)
  • LUCENE-8720 : NameIntCacheLRU (in the facets module) had an int overflow bug that disabled cleaning of the cache
    (Russell A Brown)
  • LUCENE-8726 : ValueSource.asDoubleValuesSource() could leak a reference to IndexSearcher
    (Alan Woodward, Yury Pakhomov)
  • LUCENE-8719 : FixedShingleFilter can miss shingles at the end of a token stream if there are multiple paths with different lengths.
    (Alan Woodward)
  • LUCENE-8688 : TieredMergePolicy#findForcedMerges now tries to create the cheapest merges that allow the index to go down to `maxSegmentCount` segments or less.
    (Armin Braun via Adrien Grand)
  • LUCENE-8477 : Interval disjunctions could miss valid hits if some of the clauses of the disjunction are minimized away. We now rewrite intervals if a source contains a disjunction and the internal gaps matter for matching. This behaviour can be disabled if users are more interested in speed rather than accuracy of matching.
    (Alan Woodward, Jim Ferenczi)
  • LUCENE-8741 : ValueSource.fromDoubleValuesSource() was casting to Scorer instead of Scorable, leading to ClassCastExceptions
    (Markus Jelsma, Alan Woodward)
  • LUCENE-8754 : Fix ConcurrentModificationException in SegmentInfo if attributes are accessed in MergePolicy while the merge is running
    (Simon Willnauer)
  • LUCENE-8765 : Fixed validation of the number of added points in KD trees.
    (Zhao Yang via Adrien Grand)
  • Improvements (13)
  • LUCENE-8673 : Use radix partitioning when merging dimensional points instead of sorting all dimensions beforehand.
    (Ignacio Vera, Adrien Grand)
  • LUCENE-8687 : Optimise radix partitioning for points on heap.
    (Ignacio Vera)
  • LUCENE-8699 : Change HeapPointWriter to use a single byte array instead of a list of byte arrays. In addition, a new interface PointValue is added to abstract out the different formats between offline and on-heap writers.
    (Ignacio Vera)
  • LUCENE-8703 : Build point writers in the BKD tree only when they are needed.
    (Ignacio Vera)
  • LUCENE-8652 : SynonymQuery can now deboost the document frequency of each term when blending the score of the synonym.
    (Jim Ferenczi)
  • LUCENE-8631 : The Korean analyzer's user dictionary now picks the longest-matching word and discards the other matches.
    (Yeongsu Kim via Jim Ferenczi)
  • LUCENE-8732 : ConstantScoreQuery can now early terminate the query if the minimum score is greater than the constant score and total hits are not requested.
    (Jim Ferenczi)
  • LUCENE-8750 : Implement setMissingValue() on sort fields produced from DoubleValuesSource and LongValuesSource
    (Mike Sokolov via Alan Woodward)
  • LUCENE-8701 : ToParentBlockJoinQuery now creates a child scorer that disallows skipping over non-competitive documents if the score of a parent depends on the score of multiple children (avg, max, min). Additionally, the score mode `none`, which assigns a constant score to each parent, can terminate the collection of top scores early.
    (Jim Ferenczi)
  • LUCENE-8751 : Weight#matches now uses the ScorerSupplier to build scorers with a lead cost of 1 (single document).
    (Jim Ferenczi)
  • LUCENE-8752 : Japanese new era name '令和' (Reiwa) is added to the dictionary used in JapaneseTokenizer so that the analyzer handles the era name correctly. Reiwa is set to replace the Heisei Era on May 1, 2019.
    (Tomoko Uchida)
  • LUCENE-8671 : Introduced reader attributes, which allow per-IndexReader configuration of codec internals. This makes it possible to configure, per reader and per field, whether FSTs are loaded on- or off-heap
    (Simon Willnauer)
  • LUCENE-8787 : spatial-extras DateRangePrefixTree used to only parse ISO-8601 timestamps with 0 or 3 digits of milliseconds precision but now parses other lengths (although > 3 not used).
    (Thomas Lemmé via David Smiley)
  • Changes in Runtime Behavior (4)
  • LUCENE-8671 : Load FST off-heap also for ID-like fields if reader is not opened from an IndexWriter.
    (Simon Willnauer)
  • LUCENE-8730 : WordDelimiterGraphFilter always emits its original token first. This brings its behaviour into line with the deprecated WordDelimiterFilter, so that the only difference in output between the two is in the position length attribute.
    (Alan Woodward, Jim Ferenczi)
  • LUCENE-7386 : Disjunctions nested in disjunctions are now flattened. This might trigger changes in the produced scores due to changes to the order in which scores of sub clauses are summed up.
    (Adrien Grand)
  • LUCENE-8756 : MoreLikeThisQuery now respects custom term frequencies (TermFrequencyAttribute) at search time
    (Olli Kuonanoja)
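Why can flattening nested disjunctions (LUCENE-7386 above) change scores at all? Clause scores are summed as floats, and float addition is not associative, so regrouping the same sub-scores can round differently. The values below are contrived to make the effect large; in practice the differences are usually confined to the last bits. The method names are illustrative, not Lucene APIs.

```java
/** Illustrates that float score sums depend on grouping (contrived values). */
final class SumOrder {

    /** Nested disjunction: the inner clause's scores are summed first, then added to the outer one. */
    static float nested(float outer, float inner1, float inner2) {
        return outer + (inner1 + inner2);
    }

    /** Flattened disjunction: all clause scores summed left to right. */
    static float flattened(float outer, float inner1, float inner2) {
        return (outer + inner1) + inner2;
    }

    public static void main(String[] args) {
        // Near 1e8 adjacent floats are 8 apart, so adding 4 on its own rounds back down (ties-to-even).
        System.out.println(nested(1e8f, 4f, 4f) - flattened(1e8f, 4f, 4f)); // prints 8.0
    }
}
```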
  • Other (5)
  • LUCENE-8680 : Refactor EdgeTree#relateTriangle method.
    (Ignacio Vera)
  • LUCENE-8685 : Refactor LatLonShape tests.
    (Ignacio Vera)
  • LUCENE-8713 : Add Line2D tests.
    (Ignacio Vera)
  • LUCENE-8729 : Workaround: Disable accessibility doclints (Java 13+), so compilation with recent JDK succeeds.
    (Uwe Schindler)
  • LUCENE-8725 : Make TermsQuery.SeekingTermSetTermsEnum a top level class and public
    (noble)
  • Release 8.0.0 [2019-03-14]

  • API Changes (31)
  • LUCENE-8662 : TermsEnum.seekExact(BytesRef) is now abstract, and FilterLeafReader.FilterTermsEnum delegates seekExact(BytesRef).
    (Jeffery Yuan via Tomás Fernández Löbbe, Simon Willnauer)
  • LUCENE-8469 : Deprecated StringHelper.compare has been removed.
    (Dawid Weiss)
  • LUCENE-8039 : Introduce a set of "delta distance" methods in GeoDistance. These allow distance calculations, especially for paths, to take into account an "excursion" to include the specified point.
  • LUCENE-8007 : Index statistics Terms.getSumDocFreq(), Terms.getDocCount() are now required to be stored by codecs. Additionally, TermsEnum.totalTermFreq() and Terms.getSumTotalTermFreq() are now required: if frequencies are not stored they are equal to TermsEnum.docFreq() and Terms.getSumDocFreq(), respectively, because all freq() values equal 1.
    (Adrien Grand, Robert Muir)
  • LUCENE-8038 : Deprecated PayloadScoreQuery constructors have been removed
    (Alan Woodward)
  • LUCENE-8014 : Similarity.computeSlopFactor() and Similarity.computePayloadFactor() have been removed
    (Alan Woodward)
  • LUCENE-7996 : Queries are now required to produce positive scores.
    (Adrien Grand)
  • LUCENE-8099 : CustomScoreQuery, BoostedQuery and BoostingQuery have been removed
    (Alan Woodward)
  • LUCENE-8012 : Explanation now takes Number rather than float
    (Alan Woodward, Robert Muir)
  • LUCENE-8116 : SimScorer now only takes a frequency and a norm as per-document scoring factors.
    (Adrien Grand)
  • LUCENE-8113 : TermContext has been renamed to TermStates, and can now be constructed lazily if term statistics are not required
    (Alan Woodward)
  • LUCENE-8242 : Deprecated method IndexSearcher#createNormalizedWeight() has been removed
    (Alan Woodward)
  • LUCENE-8267 : Memory codecs removed from the codebase (MemoryPostings, MemoryDocValues).
    (Dawid Weiss)
  • LUCENE-8144 : Moved QueryCachingPolicy.ALWAYS_CACHE to the test framework.
    (Nhat Nguyen via Adrien Grand)
  • LUCENE-8356 : StandardFilter and StandardFilterFactory have been removed
    (Alan Woodward)
  • LUCENE-8373 : StandardAnalyzer.ENGLISH_STOP_WORD_SET has been removed
    (Alan Woodward)
  • LUCENE-8388 : Unused PostingsEnum#attributes() method has been removed
    (Alan Woodward)
  • LUCENE-8405 : TopDocs.maxScore is removed. IndexSearcher and TopFieldCollector no longer have an option to compute the maximum score when sorting by field.
    (Adrien Grand)
  • LUCENE-8411 : TopFieldCollector no longer takes a fillFields option, it now always fills fields.
    (Adrien Grand)
  • LUCENE-8412 : TopFieldCollector no longer takes a trackDocScores option. Scores need to be set on top hits via TopFieldCollector#populateScores instead.
    (Adrien Grand)
  • LUCENE-6228 : A new Scorable abstract class has been added, containing only those methods from Scorer that should be called from Collectors. LeafCollector.setScorer() now takes a Scorable rather than a Scorer.
    (Alan Woodward, Adrien Grand)
  • LUCENE-8475 : Deprecated constants have been removed from RamUsageEstimator.
    (Dimitrios Athanasiou)
  • LUCENE-8483 : Scorers may no longer take null as a Weight
    (Alan Woodward)
  • LUCENE-8352 : TokenStreamComponents is now final, and can take a Consumer<Reader> in its constructor
    (Mark Harwood, Alan Woodward, Adrien Grand)
  • LUCENE-8498 : LowerCaseTokenizer has been removed, and CharTokenizer no longer takes a normalizer function.
    (Alan Woodward)
  • LUCENE-7875 : Moved MultiFields static methods out of the class. getLiveDocs is now in MultiBits which is now public. getMergedFieldInfos and getIndexedFields are now in FieldInfos. getTerms is now in MultiTerms. getTermPositionsEnum and getTermDocsEnum were collapsed and renamed to just getTermPostingsEnum and moved to MultiTerms.
    (David Smiley)
  • LUCENE-8513 : MultiFields.getFields is now removed. Please avoid this class, and Fields in general, when possible.
    (David Smiley)
  • LUCENE-8497 : MultiTermAwareComponent has been removed, and in its place TokenFilterFactory and CharFilterFactory now expose type-safe normalize() methods. This decouples normalization from tokenization entirely.
    (Mayya Sharipova, Alan Woodward)
  • LUCENE-8597 : IntervalIterator now exposes a gaps() method that reports the number of gaps between its component sub-intervals. This can be used in a new filter available via Intervals.maxgaps().
    (Alan Woodward)
  • LUCENE-8609 : Remove IndexWriter#numDocs() and IndexWriter#maxDoc() in favor of IndexWriter#getDocStats().
    (Simon Willnauer)
  • LUCENE-8292 : Make TermsEnum fully abstract.
    (Simon Willnauer)
  • Changes in Runtime Behavior (15)
  • LUCENE-8333 : Switch MoreLikeThis.setMaxDocFreqPct to use maxDoc instead of numDocs.
    (Robert Muir, Dawid Weiss)
  • LUCENE-7837 : Indices that were created before the previous major version will now fail to open even if they have been merged with the previous major version.
    (Adrien Grand)
  • LUCENE-8020 : Similarities are no longer passed terms that don't exist by queries such as SpanOrQuery, so scoring formulas no longer require divide-by-zero hacks. IndexSearcher.termStatistics/collectionStatistics return null instead of returning bogus values for a non-existent term or field.
    (Robert Muir)
  • LUCENE-7996 : FunctionQuery and FunctionScoreQuery now return a score of 0 when the function produces a negative value.
    (Adrien Grand)
  • LUCENE-8116 : Similarities now score fields that omit norms as if the norm was 1. This might change score values on fields that omit norms.
    (Adrien Grand)
  • LUCENE-8134 : Index options are no longer automatically downgraded.
    (Adrien Grand)
  • LUCENE-8031 : Length normalization correctly reflects omission of term frequencies.
    (Robert Muir, Adrien Grand)
  • LUCENE-7444 : StandardAnalyzer no longer defaults to removing English stopwords
    (Alan Woodward)
  • LUCENE-8060 : IndexSearcher's search and searchAfter methods now only compute total hit counts accurately up to 1,000 in order to enable top-hits optimizations such as block-max WAND ( LUCENE-8135 ).
    (Adrien Grand)
  • LUCENE-8505 : IndexWriter#addIndexes will now fail if the target index is sorted but the candidate is not.
    (Jim Ferenczi)
  • LUCENE-8535 : Highlighter and FVH no longer support ToParentBlockJoinQuery and ToChildBlockJoinQuery out of the box. To highlight block-join queries, use a custom WeightedSpanTermExtractor / FieldQuery.
    (Simon Willnauer, Jim Ferenczi, Julie Tibshirani)
  • LUCENE-8563 : BM25 scores don't include the (k1+1) factor in their numerator anymore. This doesn't affect ordering as this is a constant factor which is the same for every document.
    (Luca Cavanna via Adrien Grand)
  • LUCENE-8509 : WordDelimiterGraphFilter will no longer set the offsets of internal tokens by default, preventing a number of bugs when the filter is chained with token filters that change the length of their tokens
    (Alan Woodward)
  • LUCENE-8633 : IntervalQuery scores do not use term weighting any more, the score is instead calculated as a function of the sloppy frequency of the matching intervals.
    (Alan Woodward, Jim Ferenczi)
  • LUCENE-8635 : FSTs can now remain off-heap, accessed via IndexInput, and the default codec's term dictionary (BlockTreeTermsReader) will now leave the FST for the terms index off-heap for non-primary-key fields using MMapDirectory, reducing heap usage for such fields.
    (Ankit Jain)
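LUCENE-8563 above drops the constant (k1+1) factor from BM25's numerator. Since every term's contribution is multiplied by the same positive constant, every document's total is scaled by it too, so the ranking is unchanged. The sketch below uses the textbook BM25 term score with assumed defaults k1 = 1.2 and b = 0.75; it ignores Lucene-specific details such as lossy length encoding, and all names are illustrative.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Textbook BM25 term score with and without the (k1 + 1) numerator factor (illustrative). */
final class Bm25Factor {
    static final double K1 = 1.2, B = 0.75; // assumed parameter values

    static double termScore(double idf, double tf, double docLen, double avgDocLen, boolean withK1Plus1) {
        double norm = K1 * (1 - B + B * docLen / avgDocLen);
        double s = idf * tf / (tf + norm);
        return withK1Plus1 ? (K1 + 1) * s : s;
    }

    /** Ranks documents, each given as {tf, docLen}, for a single-term query, best first. */
    static Integer[] rank(double[][] docs, double idf, double avgDocLen, boolean withK1Plus1) {
        Integer[] order = new Integer[docs.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(
                (Integer i) -> termScore(idf, docs[i][0], docs[i][1], avgDocLen, withK1Plus1)).reversed());
        return order;
    }
}
```

Both variants produce the same order; only the absolute score values shrink by a factor of 2.2 under these assumed parameters.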
  • New Features (12)
  • LUCENE-8340 : LongPoint#newDistanceFeatureQuery may be used to boost scores based on how close a value of a long field is to a configurable origin. This is typically useful to boost by recency.
    (Adrien Grand)
  • LUCENE-8482 : LatLonPoint#newDistanceFeatureQuery may be used to boost scores based on the haversine distance of a LatLonPoint field to a provided point. This is typically useful to boost by distance.
    (Ignacio Vera)
  • LUCENE-8216 : Added a new BM25FQuery in sandbox to blend statistics across several fields using the BM25F formula.
    (Adrien Grand, Jim Ferenczi)
  • LUCENE-8564 : GraphTokenFilter is an abstract class useful for token filters that need to read-ahead in the token stream and take into account graph structures. This also changes FixedShingleFilter to extend GraphTokenFilter
    (Alan Woodward)
  • LUCENE-8612 : Intervals.extend() treats an interval as if it covered a wider span than it actually does, allowing users to force minimum gaps between intervals in a phrase.
    (Alan Woodward)
  • LUCENE-8629 : New interval functions: Intervals.before(), Intervals.after(), Intervals.within() and Intervals.overlapping().
    (Alan Woodward)
  • LUCENE-8622 : Adds a minimum-should-match interval function that produces intervals spanning a subset of a set of sources.
    (Alan Woodward)
  • LUCENE-8645 : Intervals.fixField() allows you to report intervals from one field as if they came from another.
    (Alan Woodward)
  • LUCENE-8646 : New interval functions: Intervals.prefix() and Intervals.wildcard()
    (Alan Woodward)
  • LUCENE-8655 : Add a getter to the FunctionScoreQuery class to access the underlying DoubleValuesSource.
    (Gérald Quaire via Alan Woodward)
  • LUCENE-8697 : GraphTokenStreamFiniteStrings correctly handles side paths containing gaps
    (Alan Woodward)
  • LUCENE-8702 : Simplify intervals returned from vararg Intervals factory methods
    (Alan Woodward)
  • Improvements (6)
  • LUCENE-7997 : Add BaseSimilarityTestCase to sanity check similarities. SimilarityBase switches to 64-bit doubles internally to help avoid common numeric issues. Add missing range checks for similarity parameters. Improve BM25 and ClassicSimilarity's explanations.
    (Robert Muir)
  • LUCENE-8011 : Improved similarity explanations.
    (Mayya Sharipova via Adrien Grand)
  • LUCENE-4198 : Codecs now have the ability to index score impacts.
    (Adrien Grand)
  • LUCENE-8135 : Boolean queries now implement the block-max WAND algorithm in order to speed up selection of top scored documents.
    (Adrien Grand)
  • LUCENE-8279 : CheckIndex now cross-checks terms with norms.
    (Adrien Grand)
  • LUCENE-8660 : TopDocsCollectors now return an accurate count (instead of a lower bound) if the total hit count is equal to the provided threshold.
    (Adrien Grand, Jim Ferenczi)
  • Optimizations (12)
  • LUCENE-8040 : Optimize IndexSearcher.collectionStatistics, avoiding MultiFields/MultiTerms
    (David Smiley, Robert Muir)
  • LUCENE-4100 : Disjunctions now support faster collection of top hits when the total hit count is not required.
    (Stefan Pohl, Adrien Grand, Robert Muir)
  • LUCENE-7993 : Phrase queries are now faster if total hit counts are not required.
    (Adrien Grand)
  • LUCENE-8109 : Boolean queries propagate information about the minimum competitive score in order to make collection faster if there are disjunctions or phrase queries as sub queries, which know how to leverage this information to run faster.
    (Adrien Grand)
  • LUCENE-8439 : Disjunction max queries can skip blocks to select the top documents if the total hit count is not required.
    (Jim Ferenczi, Adrien Grand)
  • LUCENE-8204 : Boolean queries with a mix of required and optional clauses are now faster if the total hit count is not required.
    (Jim Ferenczi, Adrien Grand)
  • LUCENE-8448 : Boolean queries now propagate the minimum score to their sub-scorers.
    (Jim Ferenczi, Adrien Grand)
  • LUCENE-8511 : MultiFields.getIndexedFields is now optimized; does not call getMergedFieldInfos
    (David Smiley)
  • LUCENE-8507 : TopFieldCollector can now update the minimum competitive score if the primary sort is by relevancy and the total hit count is not required.
    (Jim Ferenczi)
  • LUCENE-8464 : ConstantScoreScorer now implements setMinCompetitiveScore in order to early terminate the iterator if the minimum score is greater than the constant score.
    (Christophe Bismuth via Jim Ferenczi)
  • LUCENE-8607 : MatchAllDocsQuery can shortcut when total hit count is not required
    (Alan Woodward, Adrien Grand)
  • LUCENE-8585 : Index-time jump-tables for DocValues, for O(1) advance when retrieving doc values.
    (Toke Eskildsen, Adrien Grand)
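For LUCENE-8340 and LUCENE-8482, assuming the scoring formula documented for these distance-feature queries, score = weight * pivotDistance / (pivotDistance + distance), a document at the origin scores weight and a document exactly pivotDistance away scores weight / 2. The following minimal Java sketch reproduces that formula and a haversine distance without using Lucene; the class name, method names and the mean Earth radius constant are assumptions chosen for illustration.

```java
// Hypothetical helper, not part of Lucene: reproduces the distance-feature
// scoring formula in plain Java for illustration.
final class DistanceFeatureScore {

    // Assumed mean Earth radius in meters, used only for this sketch.
    static final double EARTH_MEAN_RADIUS_METERS = 6_371_008.7714;

    private DistanceFeatureScore() {}

    // score = weight * pivotDistance / (pivotDistance + distance):
    // equals weight at distance 0 and weight / 2 at distance == pivotDistance.
    static double score(double weight, double pivotDistance, double distance) {
        return weight * pivotDistance / (pivotDistance + distance);
    }

    // Great-circle distance in meters between two lat/lon points given in
    // degrees, computed with the haversine formula.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_MEAN_RADIUS_METERS * Math.asin(Math.sqrt(h));
    }

    public static void main(String[] args) {
        // Recency boost: pivot of one day in milliseconds, document one day old.
        long dayMillis = 24L * 60 * 60 * 1000;
        System.out.println(score(2.0, dayMillis, dayMillis)); // 1.0
        // Geo boost: pivot of 1 km, document 1 degree of longitude away on the equator.
        double meters = haversineMeters(0, 0, 0, 1);
        System.out.println(score(1.0, 1000, meters));
    }
}
```

The score decays smoothly and never reaches zero, so far-away or old documents are demoted rather than excluded.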
  • Release 7.7.2 [2019-06-04]

  • Bug fixes (6)
  • LUCENE-8726 : ValueSource.asDoubleValuesSource() could leak a reference to IndexSearcher
    (Alan Woodward, Yury Pakhomov)
  • LUCENE-8735 : FilterDirectory.getPendingDeletions now forwards to the delegate even though the method is not abstract in the superclass. Previously, pending deletions were swallowed by the FilterDirectory, which undermined the IndexWriter's best-effort attempt to carry file generations forward.
    (Henning Andersen, Simon Willnauer)
  • LUCENE-8688 : TieredMergePolicy#findForcedMerges now tries to create the cheapest merges that allow the index to go down to `maxSegmentCount` segments or less.
    (Armin Braun via Adrien Grand)
  • LUCENE-8785 : Ensure new threadstates are locked before retrieving the number of active threadstates. Without this, assertion errors and potentially broken field attributes could occur in the IndexWriter when IndexWriter#deleteAll is called while actively indexing.
    (Simon Willnauer)
  • LUCENE-8720 : NameIntCacheLRU (in the facets module) had an int overflow bug that disabled cleaning of the cache
    (Russell A Brown)
  • LUCENE-8809 : Concurrent refresh and rollback can leave segment states unclosed
    (Nhat Nguyen)
  • Release 7.7.1 [2019-03-01]

  • (No Changes)

  • Release 7.7.0 [2019-02-11]

  • Changes in Runtime Behavior (1)
  • LUCENE-8527 : StandardTokenizer and UAX29URLEmailTokenizer now support Unicode 9.0, and provide Unicode UTS#51 v11.0 Emoji tokenization with the "<EMOJI>" token type.
  • Build (2)
  • LUCENE-8611 : Update randomizedtesting to 2.7.2, JUnit to 4.12, add hamcrest-core dependency.
    (Dawid Weiss)
  • LUCENE-8537 : ant test command fails under lucene/tools
    (Peter Somogyi)
  • Bug fixes (9)
  • LUCENE-8669 : Fix LatLonShape WITHIN queries that fail with multiple search polygons that share the dateline.
    (Nick Knize)
  • LUCENE-8603 : Fix the inversion of right ids for additional nouns in the Korean user dictionary.
    (Yoo Jeongin via Jim Ferenczi)
  • LUCENE-8624 : int overflow in ByteBuffersDataOutput.size().
    (Mulugeta Mammo, Dawid Weiss)
  • LUCENE-8625 : int overflow in ByteBuffersDataInput.sliceBufferList.
    (Mulugeta Mammo, Dawid Weiss)
  • LUCENE-8639 : Threadstates newly created while flushing/refreshing can cause duplicate sequence IDs in IndexWriter.
    (Simon Willnauer)
  • LUCENE-8649 : LatLonShape's within and disjoint queries can return false positives with indexed multi-shapes.
    (Ignacio Vera)
  • LUCENE-8654 : Polygon2D#relateTriangle returns the wrong answer if polygon is inside the triangle.
    (Ignacio Vera)
  • LUCENE-8650 : ConcatenatingTokenStream did not correctly clear its state in reset(), and was not propagating final position increments from its child streams correctly.
    (Dan Meehl, Alan Woodward)
  • LUCENE-8676 : The Korean tokenizer does not update the last position if the backtrace is caused by a big buffer (1024 chars).
    (Jim Ferenczi)
  • New Features (3)
  • LUCENE-8026 : ExitableDirectoryReader may now time out queries that run on points such as range queries or geo queries.
    (Christophe Bismuth via Adrien Grand)
  • LUCENE-8508 : IndexWriter can now set the created version via IndexWriterConfig#setIndexCreatedVersionMajor. This is an expert feature.
    (Adrien Grand)
  • LUCENE-8601 : Attributes set in the IndexableFieldType for each field during indexing will now be recorded into the corresponding FieldInfo's attributes, accessible at search time.
    (Murali Krishna P)
  • Improvements (8)
  • LUCENE-8463 : TopFieldCollector can now early-terminate queries when sorting by SortField.DOC.
    (Christophe Bismuth via Jim Ferenczi)
  • LUCENE-8562 : Speed up merging segments of points with data dimensions by only sorting on the indexed dimensions.
    (Ignacio Vera)
  • LUCENE-8529 : TopSuggestDocsCollector will now use the completion key to break ties between completion suggestions with identical scores.
    (Jim Ferenczi)
  • LUCENE-8575 : SegmentInfos#toString now includes attributes and diagnostics.
    (Namgyu Kim via Adrien Grand)
  • LUCENE-8548 : The KoreanTokenizer no longer splits unknown words on combining diacritics and detects script boundaries more accurately with Character#UnicodeScript#of.
    (Christophe Bismuth, Jim Ferenczi)
  • LUCENE-8581 : Change LatLonShape encoding to use 4 bytes per dimension.
    (Ignacio Vera, Nick Knize, Adrien Grand)
  • LUCENE-8527 : Upgrade JFlex dependency to 1.7.0; in StandardTokenizer and UAX29URLEmailTokenizer, increase supported Unicode version from 6.3 to 9.0, and support Unicode UTS#51 v11.0 Emoji tokenization.
  • LUCENE-8640 : Date Range format validation
    (Lucky Sharma, David Smiley via Mikhail Khludnev)
  • Optimizations (6)
  • LUCENE-8552 : FieldInfos.getMergedFieldInfos no longer does any merging if there is <= 1 segment.
    (Christophe Bismuth via David Smiley)
  • LUCENE-8590 : BufferedUpdates now uses an optimized storage for buffering docvalues updates that can save up to 80% of the heap used compared to the previous implementation and uses non-object-based data structures.
    (Simon Willnauer, Mike McCandless, Shai Erera, Adrien Grand)
  • LUCENE-8598 : Moving to the default accepted overhead ratio for packed ints in DocValuesFieldUpdates yields up to a 4x performance improvement when applying doc values updates.
    (Simon Willnauer, Adrien Grand)
  • LUCENE-8599 : Use sparse bitset to store docs in SingleValueDocValuesFieldUpdates.
    (Simon Willnauer, Adrien Grand)
  • LUCENE-8600 : Doc-value updates get applied faster by sorting with quicksort, which needs to perform fewer swaps than an in-place mergesort.
    (Adrien Grand)
  • LUCENE-8623 : Decrease I/O pressure when merging high dimensional points.
    (Ignacio Vera)
  • Test Framework (1)
  • LUCENE-8604 : TestRuleLimitSysouts now has an optional "hard limit" of bytes that can be written to stderr and stdout (anything beyond the hard limit is ignored). The default hard limit is 2 GB of logs per test class.
    (Dawid Weiss)
  • Other (3)
  • LUCENE-8573 : BKDWriter now uses FutureArrays#mismatch to compute shared prefixes.
    (Christoph Büscher via Adrien Grand)
  • LUCENE-8605 : Separate bounding box spatial logic from query logic on LatLonShapeBoundingBoxQuery.
    (Ignacio Vera)
  • LUCENE-8609 : Deprecated IndexWriter#numDocs() and IndexWriter#maxDoc() in favor of IndexWriter#getDocStats(), which returns consistent numDocs and maxDoc stats that are not subject to concurrent changes.
    (Simon Willnauer, Nhat Nguyen)
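LUCENE-8624 and LUCENE-8625 in this release fix int overflows. The sketch below is not the ByteBuffersDataOutput code; it is a hypothetical, self-contained Java illustration of the general pattern behind this class of bug, where a block count and a block size are multiplied in 32-bit arithmetic before the result is widened to long. The class and method names are assumptions.

```java
// Hypothetical illustration (not Lucene code) of an int overflow when
// computing the total size of a list of fixed-size buffers.
final class BlockSizeOverflow {

    private BlockSizeOverflow() {}

    // Buggy: fullBlocks * blockSize is computed in int arithmetic and wraps
    // around before the result is widened to long on return.
    static long sizeWithIntMath(int fullBlocks, int blockSize, int lastBlockPosition) {
        return fullBlocks * blockSize + lastBlockPosition;
    }

    // Fixed: widening one operand to long makes the multiplication 64-bit.
    static long sizeWithLongMath(int fullBlocks, int blockSize, int lastBlockPosition) {
        return (long) fullBlocks * blockSize + lastBlockPosition;
    }

    public static void main(String[] args) {
        int blockSize = 1 << 20; // 1 MiB blocks
        int fullBlocks = 4096;   // 4 GiB of full blocks
        System.out.println(sizeWithIntMath(fullBlocks, blockSize, 10));  // 10 (wrapped)
        System.out.println(sizeWithLongMath(fullBlocks, blockSize, 10)); // 4294967306
    }
}
```

Declaring the return type as long is not enough on its own: Java evaluates the multiplication in the type of its operands, so the cast has to happen before multiplying.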
  • Release 7.6.0 [2018-12-14]

  • Build (2)
  • LUCENE-8504 : Upgrade forbiddenapis to version 2.6.
    (Uwe Schindler)
  • LUCENE-8493 : Stop publishing insecure .sha1 files with releases
    (janhoy)
  • Bug fixes (13)
  • LUCENE-8479 : QueryBuilder#analyzeGraphPhrase now throws a TooManyClauses exception if the number of expanded paths reaches the BooleanQuery#maxClauseCount limit.
    (Jim Ferenczi)
  • LUCENE-8522 : throw InvalidShapeException when constructing a polygon and all points are coplanar.
    (Ignacio Vera)
  • LUCENE-8531 : QueryBuilder#analyzeGraphPhrase now creates one phrase query per finite string in the graph if the slop is greater than 0. Span queries cannot be used in this case because they don't handle slop the same way as phrase queries.
    (Steve Rowe, Uwe Schindler, Jim Ferenczi)
  • LUCENE-8524 : Add the Hangul Letter Araea (interpunct) as a separator in Nori's tokenizer. This change also removes empty terms and trims surface forms in Nori's Korean dictionary.
    (Trey Jones, Jim Ferenczi)
  • LUCENE-8550 : Fix filtering of coplanar points when creating linked list on polygon tessellator.
    (Ignacio Vera)
  • LUCENE-8549 : Polygon tessellator throws an error if some parts of the shape could not be processed.
    (Ignacio Vera)
  • LUCENE-8540 : Better handling of min/max values for Geo3d encoding.
    (Ignacio Vera)
  • LUCENE-8534 : Fix incorrect computation for triangles intersecting polygon edges in shape tessellation.
    (Ignacio Vera)
  • LUCENE-8559 : Fix bug where polygon edges were skipped when checking for intersections.
    (Ignacio Vera)
  • LUCENE-8556 : Use latitude and longitude instead of encoded values to check whether a triangle is an ear when using Morton optimization.
    (Ignacio Vera)
  • LUCENE-8586 : Intervals.or() could get stuck in an infinite loop on certain indexes
    (Alan Woodward)
  • LUCENE-8595 : Fix interleaved DV update and reset. Interleaving an update and a reset of the value for the same doc in the same updates package lost the update if the reset came first, and lost the reset if the update came first.
    (Simon Willnauer, Adrien Grand)
  • LUCENE-8592 : Fix index sorting corruption due to numeric overflow. The merge of sorted segments can produce an invalid sort if the sort field is an Integer/Long that uses reverse order and contains values equal to Integer/Long#MIN_VALUE. These values are always sorted first during a merge (instead of last because of the reverse order) due to this bug. Indices affected by the bug can be detected by running the CheckIndex command on a distribution that contains the fix (7.6+).
    (Jim Ferenczi, Adrien Grand, Mike McCandless, Simon Willnauer)
  • New Features (5)
  • LUCENE-8496 : Selective indexing - modify BKDReader/BKDWriter to allow users to select fewer dimensions for creating the index than the total number of dimensions used for field encoding. That is, dimensions 0 to N may be used to determine how to split the inner nodes, and dimensions N+1 to D are ignored and stored as data dimensions at the leaves.
    (Nick Knize)
  • LUCENE-8538 : Add a Simple WKT Shape Parser for creating Lucene Geometries (Polygon, Line, Rectangle) from WKT format.
    (Nick Knize)
  • LUCENE-8462 : Adds an Arabic snowball stemmer based on https://github.com/snowballstem/snowball/blob/master/algorithms/arabic.sbl
    (Ryadh Dahimene via Jim Ferenczi)
  • LUCENE-8554 : Add new LatLonShapeLineQuery that queries indexed LatLonShape fields by arbitrary lines.
    (Nick Knize)
  • LUCENE-8555 : Add dateline crossing support to LatLonShapeBoundingBoxQuery.
    (Nick Knize)
  • Improvements (3)
  • LUCENE-8521 : Change LatLonShape encoding to 7 dimensions instead of 6: the first 4 are index dimensions defining the bounding box of the triangle, and the remaining 3 are data dimensions defining the vertices of the triangle.
    (Nick Knize)
  • LUCENE-8557 : LeafReader.getFieldInfos is now documented and tested to return the same cached instance. MemoryIndex's impl now pre-creates the FieldInfos instead of re-calculating a new instance each time.
    (Tim Underwood, David Smiley)
  • LUCENE-8558 : Replace O(N) lookup with O(1) lookup in PerFieldMergeState#FilterFieldInfos.
    (Kranthi via Simon Willnauer)
  • Other (2)
  • LUCENE-8523 : Correct typo in JapaneseNumberFilterFactory javadocs
    (Ankush Jhalani via Alan Woodward)
  • LUCENE-8533 : Fix Javadocs of DataInput#readVInt(): Negative numbers are supported, but should be avoided.
    (Vladimir Dolzhenko via Uwe Schindler)
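LUCENE-8592 describes values equal to Integer/Long#MIN_VALUE sorting first instead of last under a reverse sort. A common way this kind of bug arises is implementing reverse order by negating values: in two's complement, -Integer.MIN_VALUE overflows back to Integer.MIN_VALUE. The Java sketch below demonstrates that pitfall in isolation; it is a hypothetical illustration with assumed names and does not reproduce Lucene's merge code.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration (not Lucene code) of the negation overflow
// pitfall in descending sorts.
final class ReverseSortOverflow {

    private ReverseSortOverflow() {}

    // Buggy: sorts ascending on the negated value. Since -Integer.MIN_VALUE
    // overflows to Integer.MIN_VALUE, that value still sorts first.
    static int[] descendingByNegation(int[] values) {
        return Arrays.stream(values)
                .boxed()
                .sorted(Comparator.comparingInt(v -> -v))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    // Correct: reverse the comparator instead of negating the values.
    static int[] descending(int[] values) {
        return Arrays.stream(values)
                .boxed()
                .sorted(Comparator.reverseOrder())
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        int[] values = {3, Integer.MIN_VALUE, 7, -1};
        // [-2147483648, 7, 3, -1]
        System.out.println(Arrays.toString(descendingByNegation(values)));
        // [7, 3, -1, -2147483648]
        System.out.println(Arrays.toString(descending(values)));
    }
}
```

Reversing the comparison, rather than the values, keeps every input in range and is safe for all int and long values.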
  • Release 7.5.1

  • Bug Fixes (1)
  • LUCENE-8454 : Fix incorrect vertex indexing and other computation errors in shape tessellation that would sometimes cause an infinite loop.
    (Nick Knize)
  • Release 7.5.0 [2018-09-24]

  • API Changes (18)
  • LUCENE-8467 : RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream are deprecated
    (Dawid Weiss)
  • LUCENE-8356 : StandardFilter is deprecated
    (Alan Woodward)
  • LUCENE-8373 : ENGLISH_STOP_WORD_SET on StandardAnalyzer is deprecated. Instead use EnglishAnalyzer.ENGLISH_STOP_WORD_SET. The default constructor for StopAnalyzer is also deprecated, and a stop word set should be explicitly passed to the constructor.
    (Alan Woodward)
  • LUCENE-8378 : Add DocIdSetIterator.range static method to return an iterator matching a range of docids
    (Mike McCandless)
  • LUCENE-8379 : Add experimental TermQuery.getTermStates method
    (Mike McCandless)
  • LUCENE-8407 : Add experimental SpanTermQuery.getTermStates method
    (David Smiley)
  • LUCENE-8390 : MatchesIteratorSupplier replaced by IOSupplier
    (Alan Woodward, David Smiley)
  • LUCENE-8397 : Add DirectoryTaxonomyWriter.getCache
    (Mike McCandless)
  • LUCENE-8387 : Add experimental IndexSearcher.getSlices API to see which slices IndexSearcher is searching concurrently when it's created with an ExecutorService
    (Mike McCandless)
  • LUCENE-8263 : TieredMergePolicy's reclaimDeletesWeight has been replaced with a new deletesPctAllowed setting to control how aggressively deletes should be reclaimed.
    (Erick Erickson, Adrien Grand)
  • LUCENE-7314 : Graduate LatLonPoint and query classes to core
    (Nick Knize)
  • LUCENE-8428 : The way that oal.util.PriorityQueue creates sentinel objects has been changed from a protected method to a java.util.function.Supplier as a constructor argument.
    (Adrien Grand)
  • LUCENE-8437 : CheckIndex.Status.cantOpenSegments and missingSegmentVersion have been removed as they were not computed correctly.
    (Adrien Grand)
  • LUCENE-8286 : The UnifiedHighlighter has a new HighlightFlag.WEIGHT_MATCHES flag that will tell this highlighter to use the new MatchesIterator API as the underlying approach to navigate matching hits for a query. This mode will highlight more accurately than any other highlighter, and can mark up phrases as one span instead of word-by-word. The UH's public internal APIs changed a bit in the process.
    (David Smiley)
  • LUCENE-8471 : IndexWriter.getFlushingBytes() returns how many bytes are currently being flushed to disk.
    (Alan Woodward)
  • LUCENE-8422 : Static helper functions for Matches and MatchesIterator implementations have been moved from Matches to MatchesUtils
    (Alan Woodward)
  • LUCENE-8343 : Suggesters now require Long (versus long, previously) from weight() method while indexing, and provide double (versus long, previously) scores at lookup time
    (Alessandro Benedetti)
  • LUCENE-8459 : SearcherTaxonomyManager now has a constructor taking already opened IndexReaders, allowing the caller to pass a FilterDirectoryReader, for example.
    (Mike McCandless)
  • Bug Fixes (13)
  • LUCENE-8445 : Tighten condition when two planes are identical to prevent constructing bogus tiles when building GeoPolygons.
    (Ignacio Vera)
  • LUCENE-8444 : Prevent building functionally identical plane bounds when constructing DualCrossingEdgeIterator.
    (Ignacio Vera)
  • LUCENE-8380 : UTF8TaxonomyWriterCache inconsistency.
    (Ruslan Torobaev, Dawid Weiss)
  • LUCENE-8164 : IndexWriter silently accepted broken payloads. This has been fixed via LUCENE-8165, which now checks for offset+length going out of bounds.
    (Robert Muir, Nhat Nguyen, Simon Willnauer)
  • LUCENE-8370 : Reproducing TestLucene{54,70}DocValuesFormat.testSortedSetVariableLengthBigVsStoredFields() failures
    (Erick Erickson)
  • LUCENE-8376 , LUCENE-8371 : ConditionalTokenFilter.end() would not propagate correctly if the last token in the stream was subsequently dropped; FixedShingleFilter did not set position increment in end()
    (Alan Woodward)
  • LUCENE-8395 : WordDelimiterGraphFilter would incorrectly insert a hole into a TokenStream if a token consisting entirely of delimiter characters was encountered, but preserve_original was set.
    (Alan Woodward)
  • LUCENE-8398 : TieredMergePolicy.getMaxMergedSegmentMB had a rounding error
    (Erick Erickson)
  • LUCENE-8429 : DaciukMihovAutomatonBuilder is no longer prone to stack overflows by enforcing a maximum term length.
    (Adrien Grand)
  • LUCENE-8441 : IndexWriter now checks doc value type for index sort fields and fails the document if they are not compatible.
    (Jim Ferenczi, Mike McCandless)
  • LUCENE-8458 : Adjust initialization condition of PendingSoftDeletes and ensure it is initialized before accepting deletes
    (Simon Willnauer, Nhat Nguyen)
  • LUCENE-8466 : IndexWriter.deleteDocs(Query... query) incorrectly applies deletes on flush if the index is sorted.
    (Adrien Grand, Jim Ferenczi, Vish Ramachandran)
  • LUCENE-8502 : Allow access to delegate in FilterCodecReader. FilterCodecReader didn't allow access to its delegate like other filter readers do. This adds a new #getDelegate method to access the wrapped reader.
    (Simon Willnauer)
  • Changes in Runtime Behavior (3)
  • LUCENE-7976 : TieredMergePolicy now respects maxSegmentSizeMB by default when executing findForcedMerges and findForcedDeletesMerges
    (Erick Erickson)
  • LUCENE-8263 : TieredMergePolicy now reclaims deleted documents more aggressively by default ensuring that no more than ~1/3 of the index size is used by deleted documents.
    (Adrien Grand)
  • LUCENE-8503 : Call #getDelegate instead of direct member access during unwrap. Filter*Reader instances accessed the delegate member directly instead of calling getDelegate(). To allow tracking access to the delegate, these methods now call #getDelegate().
    (Simon Willnauer)
  • Improvements (14)
  • LUCENE-8468 : A ByteBuffer based Directory implementation.
    (Dawid Weiss)
  • LUCENE-8447 : Add DISJOINT and WITHIN support to LatLonShape queries.
    (Nick Knize)
  • LUCENE-8440 : Add support for indexing and searching Line and Point shapes using LatLonShape encoding
    (Nick Knize)
  • LUCENE-8435 : Add new LatLonShapePolygonQuery for querying indexed LatLonShape fields by arbitrary polygons
    (Nick Knize)
  • LUCENE-8367 : Make per-dimension drill down optional for each facet dimension
    (Mike McCandless)
  • LUCENE-8396 : Add Points Based Shape Indexing and Search that decomposes shapes into a triangular mesh and indexes individual triangles as a 6 dimension point
    (Nick Knize)
  • LUCENE-8345 , GitHub PR #392: Remove instantiation of redundant wrapper classes for primitives; add wrapper class constructors to forbiddenapis.
    (Michael Braun via Uwe Schindler)
  • LUCENE-8415 : Clean up Directory contracts and JavaDoc comments.
    (Dawid Weiss)
  • LUCENE-8414 : Make segmentInfos private in IndexWriter
    (Simon Willnauer, Nhat Nguyen)
  • LUCENE-8446 : The UnifiedHighlighter's DefaultPassageFormatter now treats overlapping matches in the passage as merged (as if one larger match).
    (David Smiley)
  • LUCENE-8460 : Better argument validation in StoredField.
    (Namgyu Kim)
  • LUCENE-8432 : TopFieldComparator stops comparing documents if the index is sorted, even if hits still need to be visited to compute the hit count.
    (Nikolay Khitrin)
  • LUCENE-8422 : IntervalQuery now returns useful Matches
    (Alan Woodward)
  • LUCENE-7862 : Store the real bounds of the leaf cells in the BKD index when the number of dimensions is bigger than 1. It improves performance when there is correlation between the dimensions, for example ranges.
    (Ignacio Vera, Adrien Grand)
  • Build (1)
  • LUCENE-5143 : Stop publishing KEYS file with each version, use topmost lucene/KEYS file only. The buildAndPushRelease.py script validates that RM's PGP key is in the KEYS file. Remove unused 'copy-to-stage' and '-dist-keys' targets from ant build.
    (janhoy)
  • Other (9)
  • LUCENE-8485 : Update randomizedtesting to version 2.6.4.
    (Dawid Weiss)
  • LUCENE-8366 : Upgrade to ICU 62.1. Emoji handling now uses Unicode 11's Extended_Pictographic property.
    (Robert Muir)
  • LUCENE-8408 : original Highlighter: Remove obsolete static AttributeFactory instance in TokenStreamFromTermVector.
    (Michael Braun, David Smiley)
  • LUCENE-8420 : Upgrade OpenNLP to 1.9.0 so OpenNLP tool can read the new model format which 1.8.x cannot read. 1.9.0 can read the old format.
    (Koji Sekiguchi)
  • LUCENE-8453 : Add documentation to analysis factories of Korean (Nori) analyzer module.
    (Tomoko Uchida via Uwe Schindler)
  • LUCENE-8455 : Upgrade ECJ compiler to 4.6.1 in lucene/common-build.xml
    (Erick Erickson)
  • LUCENE-8456 : Upgrade Apache Commons Compress to v1.18
    (Steve Rowe)
  • LUCENE-765 : Improved org.apache.lucene.index javadocs.
    (Mike Sokolov)
  • LUCENE-8476 : Remove redundant nullity check and switch to optimized List.sort in the Korean user dictionary.
    (Namgyu Kim)
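LUCENE-8378 above adds DocIdSetIterator.range, which, as we understand its javadoc, matches doc ids from minDoc (inclusive) to maxDoc (exclusive). The following self-contained Java sketch imitates that contract without depending on Lucene; the class name RangeDocIdIterator is hypothetical, and only the nextDoc()/advance()/NO_MORE_DOCS conventions are borrowed from DocIdSetIterator.

```java
// Hypothetical, Lucene-free imitation of a range doc-id iterator. It follows
// the DocIdSetIterator conventions: docID() is -1 before iteration starts and
// NO_MORE_DOCS (Integer.MAX_VALUE) once the iterator is exhausted.
final class RangeDocIdIterator {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int minDoc; // inclusive
    private final int maxDoc; // exclusive
    private int doc = -1;

    RangeDocIdIterator(int minDoc, int maxDoc) {
        if (minDoc < 0 || minDoc >= maxDoc) {
            throw new IllegalArgumentException(
                    "need 0 <= minDoc < maxDoc, got " + minDoc + ", " + maxDoc);
        }
        this.minDoc = minDoc;
        this.maxDoc = maxDoc;
    }

    int docID() {
        return doc;
    }

    int nextDoc() {
        return advance(doc + 1);
    }

    // Moves to the first doc id >= target that lies in [minDoc, maxDoc).
    // Once exhausted, the iterator stays at NO_MORE_DOCS.
    int advance(int target) {
        if (doc == NO_MORE_DOCS || target >= maxDoc) {
            return doc = NO_MORE_DOCS;
        }
        return doc = Math.max(target, minDoc);
    }

    public static void main(String[] args) {
        RangeDocIdIterator it = new RangeDocIdIterator(3, 6);
        for (int d = it.nextDoc(); d != NO_MORE_DOCS; d = it.nextDoc()) {
            System.out.println(d); // prints 3, then 4, then 5
        }
    }
}
```

Because a range needs no per-document storage, advance() is a constant-time computation rather than a search.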
  • Release 7.4.1

  • Bug Fixes (4)
  • LUCENE-8365 : Fix ArrayIndexOutOfBoundsException in UnifiedHighlighter. This fixes an off-by-one error in the UnifiedHighlighter's code that is only triggered when two nested SpanNearQueries contain the same term.
    (Marc-Andre Morissette via Simon Willnauer)
  • LUCENE-8381 : Fix IndexWriter incorrectly interpreting hard-deletes as soft-deletes while wrapping readers for merges.
    (Simon Willnauer, Nhat Nguyen)
  • LUCENE-8384 : Fix missing advancement of the docValues generation while handling docValues updates in PendingSoftDeletes.
    (Simon Willnauer, Nhat Nguyen)
  • LUCENE-8472 : Always rewrite the soft-deletes merge retention query.
    (Adrien Grand, Nhat Nguyen)
  • Release 7.4.0 [2018-06-27]

  • Upgrading (1)
  • LUCENE-8344 : If you are using the AnalyzingSuggester or FuzzySuggester subclass, and if you explicitly use the preservePositionIncrements=false setting (not the default), then you ought to rebuild your suggester index. If you don't, queries or indexed data with trailing position gaps (e.g. stop words) may not work correctly.
    (David Smiley, Jim Ferenczi)
  • API Changes (3)
  • LUCENE-8242 : IndexSearcher.createNormalizedWeight() has been deprecated. Instead use IndexSearcher.createWeight(), rewriting the query first.
    (Alan Woodward)
  • LUCENE-8248 : MergePolicyWrapper is renamed to FilterMergePolicy and now also overrides getMaxCFSSegmentSizeMB
    (Mike Sokolov via Mike McCandless)
  • LUCENE-8303 : LiveDocsFormat is now only responsible for (de)serialization of live docs.
    (Adrien Grand)
  • Changes in Runtime Behavior (2)
  • LUCENE-8309 : Live docs are no longer backed by a FixedBitSet.
    (Adrien Grand)
  • LUCENE-8330 : Detach IndexWriter from MergePolicy. Instead of requiring IndexWriter as a hard dependency, MergePolicy now expects a MergeContext, which IndexWriter implements.
    (Simon Willnauer, Robert Muir, Dawid Weiss, Mike McCandless)
  • New Features (19)
  • LUCENE-8200 : Allow doc-values to be updated atomically together with a document. Doc-values updates can now be used as a soft-delete mechanism to allow keeping several versions of a document or already deleted documents around for later reuse. See "IW.softUpdateDocument(...)" for reference.
    (Simon Willnauer)
  • LUCENE-8197 : A new FeatureField makes it easy and efficient to integrate static relevance signals into the final score.
    (Adrien Grand, Robert Muir)
  • LUCENE-8202 : Add a FixedShingleFilter
    (Alan Woodward, Adrien Grand, Jim Ferenczi)
  • LUCENE-8125 : ICUTokenizer support for emoji/emoji sequence tokens.
    (Robert Muir)
  • LUCENE-8196 , LUCENE-8300 : A new IntervalQuery in the sandbox allows efficient proximity searches based on minimum-interval semantics.
    (Alan Woodward, Adrien Grand, Jim Ferenczi, Simon Willnauer, Matt Weber)
  • LUCENE-8233 : Add support for soft deletes to IndexWriter delete accounting. Soft deletes are accounted for inside the index writer and therefore also by merge policies. A SoftDeletesRetentionMergePolicy is added that allows soft-deleted documents to be selectively carried over across merges for retention policies.
    (Simon Willnauer, Mike McCandless, Robert Muir)
  • LUCENE-8237 : Add a SoftDeletesDirectoryReaderWrapper that respects soft deletes when the reader is opened from a directory.
    (Simon Willnauer, Mike McCandless, Uwe Schindler, Adrien Grand)
  • LUCENE-8229 , LUCENE-8270 : Add a method Weight.matches(LeafReaderContext, doc) that returns an iterator over matching positions for a given query and document. This allows exact hit extraction and will enable implementation of accurate highlighters.
    (Alan Woodward, Adrien Grand, David Smiley)
  • LUCENE-8249 : Implement Matches API for phrase queries
    (Alan Woodward, Adrien Grand)
  • LUCENE-8246 : Allow customizing the number of deletes a merge claims. This helps merge policies in the soft-delete case to correctly implement retention policies without triggering unnecessary merges.
    (Simon Willnauer, Mike McCandless)
  • LUCENE-8231 : A new analysis module (nori) similar to Kuromoji but to handle Korean using mecab-ko-dic and morphological analysis.
    (Robert Muir, Jim Ferenczi)
  • LUCENE-8265 : WordDelimiterFilter and WordDelimiterGraphFilter now have an option to skip tokens marked with KeywordAttribute
    (Mike Sokolov via Mike McCandless)
  • LUCENE-8297 : Add IW#tryUpdateDocValues(Reader, int, Field...). IndexWriter can update doc values for a specific term, but this might affect all documents containing the term. With tryUpdateDocValues, users can update doc-values fields for individual documents. This allows, for instance, soft-deleting individual documents.
    (Simon Willnauer)
  • LUCENE-8298 : Allow DocValues updates to reset a value. Passing a DV field with a null value to IW#updateDocValues or IW#tryUpdateDocValues will now remove the value from the provided document. This allows undeleting a soft-deleted document unless it has been claimed by a merge.
    (Simon Willnauer)
  • LUCENE-8273 : ConditionalTokenFilter allows analysis chains to skip particular token filters based on the attributes of the current token. This generalises the keyword token logic currently used for stemmers and WDF. It is integrated into CustomAnalyzer by using the `when` and `whenTerm` builder methods, and a new ProtectedTermFilter is added as an example.
    (Alan Woodward, Robert Muir, David Smiley, Steve Rowe, Mike Sokolov)
  • LUCENE-8310 : Ensure IndexFileDeleter accounts for pending deletes. Today, creating the IndexWriter fails when the directory has a pending delete. Yet this is mainly done to prevent writing still-existing files more than once. IndexFileDeleter already accounts for that for existing files, which we can now use to also take pending deletes into account, ensuring that all file generations per segment always go forward.
    (Simon Willnauer)
  • LUCENE-7960 : Add preserveOriginal option to the NGram and EdgeNGram filters.
    (Ingomar Wesp, Shawn Heisey via Robert Muir)
  • LUCENE-8335 : Enforce soft-deletes field up-front. Soft deletes field must be marked as such once it's introduced and can't be changed after the fact.
    (Nhat Nguyen via Simon Willnauer)
  • LUCENE-8332 : New ConcatenateGraphFilter for concatenating all tokens into one (or more in the event of a graph input). This is useful for fast analyzed exact-match lookup, suggesters, and as a component of a named entity recognition system. This was excised out of CompletionTokenStream in the NRT doc suggester.
    (David Smiley, Jim Ferenczi)
  • Bug Fixes (19)
  • LUCENE-8221 : MoreLikeThis.setMaxDocFreqPct can easily int-overflow on larger indexes.
  • LUCENE-8266 : Detect bogus tiles when creating a standard polygon and throw a TileException.
    (Ignacio Vera)
  • LUCENE-8234 : Fixed bug in how spatial relationship is computed for GeoStandardCircle when it covers the whole world.
    (Ignacio Vera)
  • LUCENE-8236 : Filter duplicated points when creating GeoPath shapes to avoid creation of bogus planes.
    (Ignacio Vera)
  • LUCENE-8243 : IndexWriter.addIndexes(Directory[]) did not properly preserve index file names for updated doc values fields
    (Simon Willnauer, Michael McCandless, Nhat Nguyen)
  • LUCENE-8275 : Push up #checkPendingDeletes to Directory to ensure IW fails if the directory has pending-delete files, even if the directory is filtered or a FileSwitchDirectory
    (Simon Willnauer, Robert Muir)
  • LUCENE-8244 : Do not leak open file descriptors in SearcherTaxonomyManager's refresh on exception
    (Mike McCandless)
  • LUCENE-8305 : ComplexPhraseQuery.rewrite now handles an embedded MultiTermQuery that rewrites to a MatchNoDocsQuery instead of throwing an exception.
    (Bjarke Mortensen, Andy Tran via David Smiley)
  • LUCENE-8287 : Ensure that empty regex completion queries always return no results.
    (Julie Tibshirani via Jim Ferenczi)
  • LUCENE-8317 : Prevent concurrent deletes from being applied during full flush. Future deletes could potentially be exposed to flushes/commits/refreshes if the amount of RAM used by deletes is greater than half of the IW RAM buffer.
    (Simon Willnauer)
  • LUCENE-8320 : Fix WindowsFS to correctly account for rename and hardlinks.
    (Simon Willnauer, Nhat Nguyen)
  • LUCENE-8328 : Ensure ReadersAndUpdates consistently executes under lock.
    (Nhat Nguyen via Simon Willnauer)
  • LUCENE-8325 : Fixed the smartcn tokenizer to not split UTF-16 surrogate pairs.
    (chengpohi via Jim Ferenczi)
  • LUCENE-8186 : LowerCaseTokenizerFactory now lowercases text in multi-term queries.
    (Tim Allison via Adrien Grand)
  • LUCENE-8278 : Some end-of-input, no-scheme, domain-only URL tokens were typed as <ALPHANUM> rather than <URL>.
    (Junte Zhang, Steve Rowe)
  • LUCENE-8355 : Prevent IW from opening an already dropped segment while DV updates are written.
    (Nhat Nguyen via Simon Willnauer)
  • LUCENE-8344 : TokenStreamToAutomaton (used by some suggesters) was not ignoring a trailing position increment when the preservePositionIncrement setting is false.
    (David Smiley, Jim Ferenczi)
  • LUCENE-8357 : FunctionScoreQuery.boostByQuery() and boostByValue() were producing truncated Explanations
    (Markus Jelsma, Alan Woodward)
  • LUCENE-8360 : NGramTokenFilter and EdgeNGramTokenFilter did not correctly set position increments in end()
    (Alan Woodward)
  • Other (9)
  • LUCENE-8301 : Update randomizedtesting to 2.6.0.
    (Dawid Weiss)
  • LUCENE-8299 : Geo3D wrapper uses new polygon method factory that gives better support for polygons with many points (>100).
    (Ignacio Vera)
  • LUCENE-8261 : InterpolatedProperties.interpolate and recursive property references.
    (Steve Rowe, Dawid Weiss)
  • LUCENE-8228 : removed obsolete IndexDeletionPolicy clone() requirements from the javadoc.
    (Dawid Weiss)
  • LUCENE-8219 : Use a realistic estimate of the number of nodes and links in LevensteinAutomaton.java, to save reallocation of arrays.
    (Christian Ziech)
  • LUCENE-8214 : Improve selection of testPoint for GeoComplexPolygon.
    (Ignacio Vera)
  • SOLR-10912 : Add automatic patch validation.
    (Mano Kovacs, Steve Rowe)
  • LUCENE-8122 , LUCENE-8175 : Upgrade analysis/icu to ICU 61.1.
    (Robert Muir, Adrien Grand, Uwe Schindler)
  • LUCENE-8291 : Remove QueryTemplateManager utility class from XML queryparser. This class is just a general XML transforming tool (using property files and XSLT) and has nothing to do with query parsing. It can easily be implemented using more sophisticated libraries or using XSL transformers from the JDK. This change also removes the Lucene demo webapp to prevent XSS issues in untested/unmaintained code.
    (Uwe Schindler)
  • Build (2)
  • LUCENE-7935 : Publish .sha512 hash files with the release artifacts and stop publishing .md5 hashes since the algorithm is broken
    (janhoy)
  • LUCENE-8230 : Upgrade forbiddenapis to version 2.5.
    (Uwe Schindler)
  • Documentation (1)
  • LUCENE-8238 : Improve WordDelimiterFilter and WordDelimiterGraphFilter javadocs
    (Mike Sokolov via Mike McCandless)
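LUCENE-8221 above describes an int overflow that occurs when a document-frequency limit is computed as a percentage of the index size. The sketch below illustrates the general failure mode and the usual fix, which is to widen to `long` before multiplying. The class and method names are hypothetical, and this is not the actual MoreLikeThis code.

```java
// Hypothetical illustration of computing "pct percent of numDocs".
class PercentOfDocs {
    // Naive version: numDocs * pct is evaluated in int arithmetic and wraps
    // around once the product exceeds Integer.MAX_VALUE (2,147,483,647).
    static int naive(int numDocs, int pct) {
        return numDocs * pct / 100;
    }

    // Safe version: widen to long before multiplying. For pct in [0, 100]
    // the result is <= numDocs, so narrowing back to int is lossless.
    static int safe(int numDocs, int pct) {
        return (int) ((long) numDocs * pct / 100);
    }

    public static void main(String[] args) {
        int numDocs = 50_000_000;
        int pct = 75;
        System.out.println(naive(numDocs, pct)); // prints -5449672 (wrapped)
        System.out.println(safe(numDocs, pct));  // prints 37500000
    }
}
```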
  • Release 7.3.1 [2018-05-15]

  • Bug fixes (1)
  • LUCENE-8254 : LRUQueryCache could cause IndexReader to hang on close, when shared with another reader with no CacheHelper
    (Alan Woodward, Simon Willnauer, Adrien Grand)
  • Release 7.3.0 [2018-04-04]

  • API Changes (4)
  • LUCENE-8051 : LevensteinDistance renamed to LevenshteinDistance.
    (Pulak Ghosh via Adrien Grand)
  • LUCENE-8099 : Deprecate CustomScoreQuery, BoostedQuery and BoostingQuery. Users should instead use FunctionScoreQuery, possibly combined with a Lucene expression.
    (Alan Woodward)
  • LUCENE-8104 : Remove facets module compile-time dependency on queries
    (Alan Woodward)
  • LUCENE-8145 : UnifiedHighlighter now uses a unitary OffsetsEnum rather than a list of enums
    (Alan Woodward, David Smiley, Jim Ferenczi, Timothy Rodriguez)
  • New Features (2)
  • LUCENE-2899 : Add new module analysis/opennlp, with analysis components to perform tokenization, part-of-speech tagging, lemmatization and phrase chunking by invoking the corresponding OpenNLP tools. Named entity recognition is also provided as a Solr update request processor.
    (Lance Norskog, Grant Ingersoll, Joern Kottmann, Em, Kai Gülzau, Rene Nederhand, Robert Muir, Steven Bower, Steve Rowe)
  • LUCENE-8126 : Add a new spatial prefix tree (SPT) based on Google S2 geometry. It can currently only be used with the Geo3D spatial context, and it improves both indexing time for non-point shapes and query performance.
    (Ignacio Vera, David Smiley)
  • Improvements (11)
  • LUCENE-8081 : Allow IndexWriter to opt out of flushing on indexing threads. Index/update threads try to help out by flushing pending document buffers to disk. This change adds an expert setting to opt out of this behavior unless flushing is falling behind.
    (Simon Willnauer)
  • LUCENE-8086 : spatial-extras Geo3dFactory: Use GeoExactCircle with configurable precision for non-spherical planet models.
    (Ignacio Vera via David Smiley)
  • LUCENE-8093 : TrimFilterFactory implements MultiTermAwareComponent
    (Alan Woodward)
  • LUCENE-8094 : TermInSetQuery.toString now returns "field:(A B C)"
    (Mike McCandless)
  • LUCENE-8121 : UnifiedHighlighter passage relevancy is improved for terms that are position sensitive (e.g. part of a phrase) by having an accurate freq.
    (David Smiley)
  • LUCENE-8129 : A Unicode set filter can now be specified when using ICUFoldingFilter.
    (Ere Maijala)
  • LUCENE-7966 : Build Multi-Release JARs to enable usage of optimized intrinsic methods from Java 9 for index bounds checking and array comparison/mismatch. This change introduces Java 8 replacements for those Java 9 methods and patches the compiled classes to use the optimized variants through the MR-JAR mechanism.
    (Uwe Schindler, Robert Muir, Adrien Grand, Mike McCandless)
  • LUCENE-8127 : Speed up rewriteNoScoring when there are no MUST clauses.
    (Michael Braun via Adrien Grand)
  • LUCENE-8152 : Improve consumption of doc-value iterators.
    (Horatiu Lazu via Adrien Grand)
  • LUCENE-8033 : FieldInfos now always use a dense encoding.
    (Mayya Sharipova via Adrien Grand)
  • LUCENE-8190 : Specialized cell interface to allow any spatial prefix tree to benefit from the setting setPruneLeafyBranches on RecursivePrefixTreeStrategy.
    (Ignacio Vera)
  • Bug Fixes (10)
  • LUCENE-8077 : Fixed bug in how CheckIndex verifies doc-value iterators.
    (Xiaoshan Sun via Adrien Grand)
  • SOLR-11758 : Fixed FloatDocValues.boolVal to correctly return true for all values != 0.0F
    (Munendra S N via hossman)
  • LUCENE-8121 : The UnifiedHighlighter would highlight some terms within some nested SpanNearQueries at positions where it should not have. It's fixed in the UH by switching to the SpanCollector API. The original Highlighter still has this problem ( LUCENE-2287 , LUCENE-5455 , LUCENE-6796 ). Some public but internal parts of the UH were refactored.
    (David Smiley, Steve Davids)
  • LUCENE-8120 : Fix LatLonBoundingBox's toString() method
    (Martijn van Groningen, Adrien Grand)
  • LUCENE-8130 : Fix NullPointerException from TermStates.toString()
    (Mike McCandless)
  • LUCENE-8124 : Fixed HyphenationCompoundWordTokenFilter to correctly handle hyphenation patterns with indicator >= 7.
    (Holger Bruch via Adrien Grand)
  • LUCENE-8163 : BaseDirectoryTestCase could produce random filenames that fail on Windows
    (Alan Woodward)
  • LUCENE-8174 : Fixed {Float,Double,Int,Long}Range.toString().
    (Oliver Kaleske via Adrien Grand)
  • LUCENE-8182 : Fixed BoostingQuery to apply the context boost instead of the parent query boost
    (Jim Ferenczi)
  • LUCENE-8188 : Fixed bugs in OpenNLPOpsFactory that were causing InputStreams fetched from the ResourceLoader to be leaked
    (hossman)
  • Other (8)
  • LUCENE-8111 : IndexOrDocValuesQuery Javadoc references outdated method name.
    (Kai Chan via Adrien Grand)
  • LUCENE-8106 : Add script (reproduceJenkinsFailures.py) to attempt to reproduce failing tests from a Jenkins log.
    (Steve Rowe)
  • LUCENE-8075 : Removed unnecessary null check in IntersectTermsEnum.
    (Pulak Ghosh via Adrien Grand)
  • LUCENE-8156 : Require users to not have ASM on the Ant classpath during build. This is required by LUCENE-7966 .
    (Adrien Grand, Uwe Schindler)
  • LUCENE-8161 : spatial-extras: the Spatial4j dependency has been updated from 0.6 to 0.7, which is drop-in compatible (Lucene doesn't expressly use any of the few API differences). Spatial4j 0.7 is compatible with JTS 1.15.0 and not any prior version. JTS 1.15.0 is dual-licensed to include BSD; prior versions were LGPL.
    (David Smiley)
  • LUCENE-8155 : Add back support in smoke tester to run against later Java versions.
    (Uwe Schindler)
  • LUCENE-8169 : Migrated build to use OpenClover 4.2.1 for checking code coverage.
    (Uwe Schindler)
  • LUCENE-8170 : Improve OpenClover reports (separate test from production code); enable coverage reports inside test-frameworks.
    (Uwe Schindler)
  • Build (2)
  • LUCENE-8168 : Moved Groovy scripts in build files to separate files. Update Groovy to 2.4.13.
    (Uwe Schindler)
  • LUCENE-8176 : HttpReplicatorTest awaits more than a minute for stopping Jetty threads
    (Mikhail Khludnev)
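LUCENE-7966 above mentions Java 8 replacements for Java 9 index-checking methods. As an illustration only, here is a Java 8-compatible method that mirrors the contract of Java 9's `java.util.Objects.checkIndex(int, int)`. The class name `IndexChecks` is hypothetical, and this is not Lucene's actual replacement class. In the Multi-Release JAR approach, the idea is that on Java 9+ such calls are routed to `Objects.checkIndex` instead, which the JVM can optimize.

```java
// Hypothetical Java 8 stand-in for java.util.Objects.checkIndex (Java 9+).
class IndexChecks {
    // Returns index if 0 <= index < length; otherwise throws
    // IndexOutOfBoundsException, matching the Java 9 method's contract.
    static int checkIndex(int index, int length) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException(
                "Index " + index + " out of bounds for length " + length);
        }
        return index;
    }
}
```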
  • Release 7.2.1 [2018-01-15]

  • Bug Fixes (1)
  • LUCENE-8117 : Fix advanceExact on SortedNumericDocValues produced by Lucene54DocValues.
    (Jim Ferenczi)
  • Release 7.2.0

  • API Changes (8)
  • LUCENE-8017 , LUCENE-8042 : Weight, DoubleValuesSource and related objects now implement a SegmentCacheable interface, with a single method isCacheable(LeafReaderContext) determining whether or not the object may be cached against a LeafReader.
    (Alan Woodward, Robert Muir)
  • LUCENE-8038 : Payload factors for scoring in PayloadScoreQuery are now calculated by a PayloadDecoder, instead of delegating to the Similarity.
    (Alan Woodward)
  • LUCENE-8014 : Similarity.computeSlopFactor() and Similarity.computePayloadFactor() have been deprecated.
    (Alan Woodward)
  • LUCENE-6278 : Scorer.freq() has been removed
    (Alan Woodward)
  • LUCENE-7736 : DoubleValuesSource and LongValuesSource now expose a rewrite(IndexSearcher) function.
    (Alan Woodward)
  • LUCENE-7998 : DoubleValuesSource.fromQuery() allows you to use the scores from a Query as a DoubleValuesSource.
    (Alan Woodward)
  • LUCENE-8049 : IndexWriter.getMergingSegments()'s return type was changed from Collection to Set to more accurately reflect its nature.
    (David Smiley)
  • LUCENE-8059 : TopFieldDocCollector can now early terminate collection when the sort order is compatible with the index order. As a consequence, EarlyTerminatingSortingCollector is now deprecated.
    (Adrien Grand)
  • New Features (3)
  • LUCENE-8061 : Add convenience factory methods to create BBoxes and XYZSolids directly from bounds objects.
  • LUCENE-7736 : IndexReaderFunctions expose various IndexReader statistics as DoubleValuesSources.
    (Alan Woodward)
  • LUCENE-8068 : Allow IndexWriter to write a single DWPT to disk. Adds a flushNextBuffer method to IndexWriter that allows the caller to synchronously move the next pending or the biggest non-pending index buffer to disk. This enables flushing a selected buffer to disk without hijacking an indexing thread. This is useful, for instance, if more than one IW (shard) must be maintained in a single JVM/system.
    (Simon Willnauer)
  • Bug Fixes (11)
  • LUCENE-8076 : Normalize Vincenti distance calculation for planet models that aren't normalized.
    (Ignacio Vera)
  • LUCENE-8057 : Exact circle bounds computation was incorrect.
    (Ignacio Vera)
  • LUCENE-8056 : Exact circle segment bounding suffered from precision errors.
    (Karl Wright)
  • LUCENE-8054 : Fix the exact circle case where relationships fail when the planet model has c <= ab, because the planes are constructed incorrectly.
    (Ignacio Vera)
  • LUCENE-7991 : KNearestNeighborDocumentClassifier.knnSearch no longer applies a previous boosted field's factor to subsequent unboosted fields.
    (Christine Poerschke)
  • LUCENE-7999 : Switch from int to long to track the name for the next segment to write, so that very long lived indices with very frequent refreshes or commits, and high indexing thread counts, do not overflow an int
    (Mykhailo Demianenko via Mike McCandless)
  • LUCENE-8025 : Use sumTotalTermFreq=sumDocFreq when scoring DOCS_ONLY fields that omit term frequency information, as it is equivalent in that case. Previously bogus numbers were used, and many similarities would completely degrade.
    (Robert Muir, Adrien Grand)
  • LUCENE-8045 : ParallelLeafReader did not correctly report FieldInfo.dvGen
    (Alan Woodward)
  • LUCENE-8034 : Use subtraction instead of addition to sidestep int overflow in SpanNotQuery.
    (Hari Menon via Mike McCandless)
  • LUCENE-8078 : The query cache should not cache instances of MatchNoDocsQuery.
    (Jon Harper via Adrien Grand)
  • LUCENE-8048 : Filesystems do not guarantee the order of directory updates.
    (Nikolay Martynov, Simon Willnauer, Erick Erickson)
  • Optimizations (6)
  • LUCENE-8018 : Smaller FieldInfos memory footprint by not retaining unnecessary references to TreeMap entries.
    (Julian Vassev via Adrien Grand)
  • LUCENE-7994 : Use int/int scatter map to gather facet counts when the number of hits is small relative to the number of unique facet labels
    (Dawid Weiss, Robert Muir, Mike McCandless)
  • LUCENE-8062 : GlobalOrdinalsQuery is no longer eligible for caching.
    (Jim Ferenczi)
  • LUCENE-8058 : Large instances of TermInSetQuery are no longer eligible for caching as they could break memory accounting of the query cache.
    (Adrien Grand)
  • LUCENE-8055 : MemoryIndex.MemoryDocValuesIterator returned 2 documents instead of 1.
    (Simon Willnauer)
  • LUCENE-8043 : Fix document accounting in IndexWriter to prevent writing too many documents. Once this happens, Lucene refuses to open the index and throws a CorruptIndexException.
    (Simon Willnauer, Yonik Seeley, Mike McCandless)
  • Tests (1)
  • LUCENE-8035 : Run tests with JDK-specific options: --illegal-access=deny on Java 9+.
    (Uwe Schindler)
  • Build (1)
  • LUCENE-6144 : Upgrade Ivy to 2.4.0; 'ant ivy-bootstrap' now removes old Ivy jars in ~/.ant/lib/.
    (Shawn Heisey, Steve Rowe)
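LUCENE-8034 above replaces an addition with a subtraction to sidestep int overflow. The sketch below shows the generic pattern. `a + b > limit` wraps around when `a` is close to `Integer.MAX_VALUE`, while the equivalent `a > limit - b` does not, provided `limit - b` cannot itself underflow (for example, when both are non-negative). The names are hypothetical, and this is not the SpanNotQuery source.

```java
// Hypothetical illustration of an overflow-safe comparison rewrite.
class OverflowSafeCompare {
    // Naive: a + b can exceed Integer.MAX_VALUE and wrap to a negative number,
    // making the comparison silently wrong.
    static boolean naiveGreater(int a, int b, int limit) {
        return a + b > limit;
    }

    // Rewritten with subtraction: mathematically equivalent, and safe as long
    // as limit - b does not underflow (e.g. limit >= 0 and b >= 0).
    static boolean safeGreater(int a, int b, int limit) {
        return a > limit - b;
    }
}
```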
  • Release 7.1.0 [2017-10-17]

  • Changes in Runtime Behavior (1)
  • Resolving of external entities in queryparser/xml/CoreParser is disallowed by default. See SOLR-11477 for details.
  • New Features (18)
  • LUCENE-7970 : Add a shape to Geo3D that consists of multiple planes that approximate a true circle, rather than an ellipse, for non-spherical planet models.
    (Karl Wright, Ignacio Vera)
  • LUCENE-7955 : Add support for the concept of "nearest distance" to Geo3D's GeoPath abstraction, which is the distance along the path to the point that is closest to the provided point.
    (Karl Wright)
  • LUCENE-7906 : Add spatial relationships between all currently-defined Geo shapes.
    (Ignacio Vera)
  • LUCENE-7955 : Add support for zero-width paths.
    (Karl Wright)
  • LUCENE-7936 : Add serialization and deserialization support to Geo3D.
    (Karl Wright, Ignacio Vera)
  • LUCENE-7942 : Distance computations now have the ability to accurately aggregate distances, rather than just doing sums.
    (Karl Wright)
  • LUCENE-7934 : Add a planet model interface.
    (Karl Wright)
  • LUCENE-7918 : Revamp the API for composites so that it's generic and can be used for many kinds of shapes.
    (Ignacio Vera)
  • LUCENE-7621 : Add CoveringQuery, a query whose required number of matching clauses can be defined per document.
    (Adrien Grand)
  • LUCENE-7927 : Add LongValueFacetCounts, to compute facet counts for individual numeric values
    (Mike McCandless)
  • LUCENE-7940 : Add BengaliAnalyzer.
    (Md. Abdulla-Al-Sun via Robert Muir)
  • LUCENE-7392 : Add point based LatLonBoundingBox as new RangeField Type.
    (Nick Knize)
  • LUCENE-7951 : Spatial-extras has much better Geo3d support by implementing Spatial4j abstractions: SpatialContextFactory, ShapeFactory, BinaryCodec, DistanceCalculator.
    (Ignacio Vera, David Smiley)
  • LUCENE-7973 : Update dictionary version for Ukrainian analyzer to 3.9.0
    (Andriy Rysin via Dawid Weiss)
  • LUCENE-7974 : Add FloatPointNearestNeighbor, an N-dimensional FloatPoint K-nearest-neighbor search implementation.
    (Steve Rowe)
  • LUCENE-7975 : Change the default taxonomy facets cache to a faster byte[] (UTF-8) based cache.
    (Mike McCandless)
  • LUCENE-7972 : DirectoryTaxonomyReader, in Lucene's facet module, now implements Accountable, so you can more easily track how much heap it's using.
    (Mike McCandless)
  • LUCENE-7982 : A new NormsFieldExistsQuery matches documents that have norms in a specified field
    (Colin Goodheart-Smithe via Mike McCandless)
  • Optimizations (6)
  • LUCENE-7905 : Optimize how OrdinalMap (used by SortedSetDocValuesFacetCounts and others) builds its map
    (Robert Muir, Adrien Grand, Mike McCandless)
  • LUCENE-7655 : Speed up geo-distance queries in case of dense single-valued fields when most documents match.
    (Maciej Zasada via Adrien Grand)
  • LUCENE-7897 : IndexOrDocValuesQuery now requires the range cost to be more than 8x greater than the cost of the lead iterator in order to use doc values.
    (Murali Krishna P via Adrien Grand)
  • LUCENE-7925 : Collapse duplicate SHOULD or MUST clauses by summing up their boosts.
    (Adrien Grand)
  • LUCENE-7939 : MinShouldMatchSumScorer now leverages two-phase iteration in order to be faster when used in conjunctions.
    (Adrien Grand)
  • LUCENE-7827 : AnalyzingInfixSuggester doesn't create "textgrams" when minPrefixChar=0
    (Mikhail Khludnev)
  • Bug Fixes (9)
  • LUCENE-8066 : It was still possible to construct a concave GeoExactCircle, so use a sector approach to prevent that.
    (Ignacio Vera)
  • LUCENE-7967 : The GeoDegeneratePoint isWithin() method needed allowance for numerical precision.
    (Karl Wright)
  • LUCENE-7965 : GeoBBoxFactory was constructing the wrong shape at the poles if the longitude span was greater than 180 degrees.
    (Karl Wright)
  • LUCENE-7916 : Prevent ArrayIndexOutOfBoundsException if ICUTokenizer is used with a different ICU JAR version than it is compiled against. Note that this is not recommended: lucene-analyzers-icu contains binary data structures specific to the ICU/Unicode versions it is built against.
    (Chris Koenig, Robert Muir)
  • LUCENE-7891 : Lucene's taxonomy facets now uses a non-buggy LRU cache by default.
    (Jan-Willem van den Broek via Mike McCandless)
  • LUCENE-7959 : Improve NativeFSLockFactory's exception message if it cannot create write.lock for an empty index due to bad permissions/read-only filesystem/etc.
    (Erick Erickson, Shawn Heisey, Robert Muir)
  • LUCENE-7968 : AnalyzingSuggester would sometimes order suggestions incorrectly because it did not properly break ties on the surface forms when both the weights and the analyzed forms were equal.
    (Robert Muir)
  • LUCENE-7957 : ConjunctionScorer.getChildren was failing to return all child scorers
    (Adrien Grand, Mike McCandless)
  • SOLR-11477 : Disallow resolving of external entities in queryparser/xml/CoreParser by default.
    (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
  • Build (3)
  • SOLR-11181 : Switch the order of the Maven artifact publishing procedure: deploy first instead of locally installing first, to work around a double repository push of *-sources.jar and *-javadoc.jar files.
    (Lynn Monson via Steve Rowe)
  • LUCENE-6673 : Maven build fails for target javadoc:jar.
    (Ramkumar Aiyengar, Daniel Collins via Steve Rowe)
  • LUCENE-7985 : Upgrade forbiddenapis to 2.4.1.
    (Uwe Schindler)
  • Other (5)
  • LUCENE-7948 , LUCENE-7937 : Upgrade randomizedtesting to 2.5.3 (minor fixes in test filtering for IDEs).
    (Mike Sokolov, Dawid Weiss)
  • LUCENE-7933 : LongBitSet now validates the numBits parameter.
    (Jonghoon, Mike McCandless)
  • LUCENE-7978 : Add some more documentation about setting up build environment.
    (Anton R. Yuste via Uwe Schindler)
  • LUCENE-7983 : IndexWriter.IndexReaderWarmer is now a functional interface instead of an abstract class with a single method
    (Dawid Weiss)
  • LUCENE-5753 : Update TLDs recognized by UAX29URLEmailTokenizer.
    (Steve Rowe)
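LUCENE-7925 above collapses duplicate SHOULD or MUST clauses by summing their boosts. A boost scales a clause's score, and a boolean query adds up the scores of its matching clauses. As a result, the same query appearing with boosts b1 and b2 contributes the same score as a single clause with boost b1 + b2. The sketch below models clauses as plain strings with float boosts. It is a simplified, hypothetical illustration that ignores details such as minimum-should-match, not Lucene's BooleanQuery rewrite code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: each clause is a query string plus a boost.
class ClauseCollapser {
    // Merges duplicate clauses, summing their boosts. LinkedHashMap keeps
    // the first-seen order of distinct clauses.
    static Map<String, Float> collapse(String[] queries, float[] boosts) {
        if (queries.length != boosts.length) {
            throw new IllegalArgumentException("queries and boosts differ in length");
        }
        Map<String, Float> merged = new LinkedHashMap<>();
        for (int i = 0; i < queries.length; i++) {
            merged.merge(queries[i], boosts[i], Float::sum);
        }
        return merged;
    }
}
```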
  • Release 7.0.1 [2017-10-06]

  • Bug Fixes (1)
  • LUCENE-7957 : ConjunctionScorer.getChildren was failing to return all child scorers
    (Adrien Grand, Mike McCandless)
  • Release 7.0.0 [2017-09-20]

  • New Features (8)
  • LUCENE-7703 : SegmentInfos now record the major Lucene version at index creation time.
    (Adrien Grand)
  • LUCENE-7756 : LeafReader.getMetaData now exposes the index created version as well as the oldest Lucene version that contributed to the segment.
    (Adrien Grand)
  • LUCENE-7854 : The new TermFrequencyAttribute used during analysis with a custom token stream allows indexing custom term frequencies
    (Mike McCandless)
  • LUCENE-7866 : Add a new DelimitedTermFrequencyTokenFilter that allows marking tokens with a custom term frequency ( LUCENE-7854 ). It parses a numeric value after a separator char ('|') at the end of each token and changes the term frequency to this value.
    (Uwe Schindler, Robert Muir, Mike McCandless)
  • LUCENE-7868 : Multiple threads can now resolve deletes and doc values updates concurrently, giving sizable speedups in update-heavy indexing use cases
    (Simon Willnauer, Mike McCandless)
  • LUCENE-7823 : Pure query-based Naive Bayes classifier using BM25 scores
    (Tommaso Teofili)
  • LUCENE-7838 : Knn classifier based on fuzzified term queries
    (Tommaso Teofili)
  • LUCENE-7855 : Added advanced options of the Wikipedia tokenizer to its factory.
    (Juan Pedro via Adrien Grand)
  • API Changes (23)
  • LUCENE-2605 : Classic QueryParser no longer splits on whitespace by default. Use setSplitOnWhitespace(true) to get the old behavior.
    (Steve Rowe)
  • LUCENE-7369 : Similarity.coord and BooleanQuery.disableCoord are removed.
    (Adrien Grand)
  • LUCENE-7368 : Removed query normalization.
    (Adrien Grand)
  • LUCENE-7355 : AnalyzingQueryParser has been removed as its functionality has been folded into the classic QueryParser.
    (Adrien Grand)
  • LUCENE-7407 : Doc values APIs have been switched from random access to iterators, enabling future codec compression improvements.
    (Mike McCandless)
  • LUCENE-7475 : Norms now support sparsity, so that you only pay for what is actually used.
    (Adrien Grand)
  • LUCENE-7494 : Points now have a per-field API, like doc values.
    (Adrien Grand)
  • LUCENE-7410 : Cache keys and close listeners have been refactored in order to be less trappy. See IndexReader.getReaderCacheHelper and LeafReader.getCoreCacheHelper.
    (Adrien Grand)
  • LUCENE-6819 : Index-time boosts are not supported anymore. As a replacement, index-time scoring factors should be indexed into a doc value field and combined at query time using e.g. FunctionScoreQuery.
    (Adrien Grand)
  • LUCENE-7734 : FieldType's copy constructor was widened to accept any IndexableFieldType.
    (David Smiley)
  • LUCENE-7701 : Grouping collectors have been refactored, such that groups are now defined by a GroupSelector implementation.
    (Alan Woodward)
  • LUCENE-7741 : DoubleValuesSource now has an explain() method
    (Alan Woodward, Adrien Grand)
  • LUCENE-7815 : Removed the PostingsHighlighter; you should use the UnifiedHighlighter instead, which was derived from the PostingsHighlighter. WholeBreakIterator and CustomSeparatorBreakIterator were moved to UH's package.
    (David Smiley)
  • LUCENE-7850 : Removed support for legacy numerics.
    (Adrien Grand)
  • LUCENE-7500 : Removed abstract LeafReader.fields(); instead, terms(fieldName) has been made abstract (it was formerly final). Also, MultiFields.getTerms was optimized to work directly instead of being implemented on getFields.
    (David Smiley)
  • LUCENE-7872 : TopDocs.totalHits is now a long.
    (Adrien Grand, hossman)
  • LUCENE-7868 : IndexWriterConfig.setMaxBufferedDeleteTerms is removed.
    (Simon Willnauer, Mike McCandless)
  • LUCENE-7877 : PrefixAwareTokenStream is replaced with ConcatenatingTokenStream
    (Alan Woodward, Uwe Schindler, Adrien Grand)
  • LUCENE-7867 : The deprecated Token class is now only available in the test framework
    (Alan Woodward, Adrien Grand)
  • LUCENE-7723 : DoubleValuesSource enforces implementation of equals() and hashCode()
    (Alan Woodward)
  • LUCENE-7737 : The spatial-extras module no longer has a dependency on the queries module. All uses of ValueSource are either replaced with core DoubleValuesSource extensions, or with the new ShapeValuesSource and ShapeValuesPredicate classes
    (Alan Woodward, David Smiley)
  • LUCENE-7892 : Doc-values query factory methods have been renamed so that their name contains "slow" in order to clearly indicate that they would usually be a bad choice.
    (Adrien Grand)
  • LUCENE-7899 : FieldValueQuery is renamed to DocValuesFieldExistsQuery
    (Adrien Grand, Mike McCandless)
  • Bug Fixes (7)
  • LUCENE-7626 : IndexWriter will no longer accept broken token offsets
    (Mike McCandless)
  • LUCENE-7859 : Spatial-extras PackedQuadPrefixTree bug that only revealed itself with the new pointsOnly optimizations in LUCENE-7845 .
    (David Smiley)
  • LUCENE-7871 : Fix a false-positive match in BlockJoinSelector when children have no value; add wrap methods accepting children as a DISI and extract ToParentDocValues.
    (Mikhail Khludnev)
  • LUCENE-7914 : Add a maximum recursion level in automaton recursive functions (Operations.isFinite and Operations.topsortState) to prevent large automata from overflowing the stack
    (Robert Muir, Adrien Grand, Jim Ferenczi)
  • LUCENE-7864 : IndexMergeTool is not using intermediate hard links (even if possible).
    (Dawid Weiss)
  • LUCENE-7956 : Fixed potential stack overflow error in ICUNormalizer2CharFilter.
    (Adrien Grand)
  • LUCENE-7963 : Remove useless getAttribute() in DefaultIndexingChain that causes performance drop, introduced by LUCENE-7626 .
    (Daniel Mitterdorfer via Uwe Schindler)
  • Improvements (4)
  • LUCENE-7489 : Better storage of sparse doc-values fields with the default codec.
    (Adrien Grand)
  • LUCENE-7730 : More accurate encoding of the length normalization factor thanks to the removal of index-time boosts.
    (Adrien Grand)
  • LUCENE-7901 : Original Highlighter now eagerly throws an exception if you provide components that are null.
    (Jason Gerlowski, David Smiley)
  • LUCENE-7841 : Normalize ґ to г in Ukrainian analyzer.
    (Andriy Rysin via Dawid Weiss)
  • Optimizations (7)
  • LUCENE-7416 : BooleanQuery optimizes queries that have sub-queries occurring both in SHOULD and FILTER clauses, or both in MUST/FILTER and MUST_NOT clauses.
    (Spyros Kapnissis via Adrien Grand, Uwe Schindler)
  • LUCENE-7506 : FastTaxonomyFacetCounts should use CPU in proportion to the size of the intersected set of hits from the query and documents that have a facet value, so sparse faceting works as expected
    (Adrien Grand via Mike McCandless)
  • LUCENE-7519 : Add optimized APIs to compute browse-only top level facets
    (Mike McCandless)
  • LUCENE-7589 : Numeric doc values now have the ability to encode blocks of values using different numbers of bits per value if this proves to save storage.
    (Adrien Grand)
  • LUCENE-7845 : Enhance spatial-extras RecursivePrefixTreeStrategy queries when the query is a point (for 2D) or is a simple date interval (e.g. 1 month). When the strategy is marked as pointsOnly, the result is a TermQuery.
    (David Smiley)
  • LUCENE-7874 : DisjunctionMaxQuery rewrites to a BooleanQuery when tiebreaker is set to 1.
    (Jim Ferenczi)
  • LUCENE-7828 : Speed up range queries on range fields by improving how we compute the relation between the query and inner nodes of the BKD tree.
    (Adrien Grand)
  • Other (14)
  • LUCENE-7923 : Removed FST.Arc.node field (unused).
    (Dawid Weiss)
  • LUCENE-7328 : Remove LegacyNumericEncoding from GeoPointField.
    (Nick Knize)
  • LUCENE-7360 : Remove Explanation.toHtml()
    (Alan Woodward)
  • LUCENE-7681 : MemoryIndex uses new DocValues API
    (Alan Woodward)
  • LUCENE-7753 : Make fields static when possible.
    (Daniel Jelinski via Adrien Grand)
  • LUCENE-7540 : Upgrade ICU to 59.1
    (Mike McCandless, Jim Ferenczi)
  • LUCENE-7852 : Correct copyright year(s) in lucene/LICENSE.txt file.
    (Christine Poerschke, Steve Rowe)
  • LUCENE-7719 : Generalized the UnifiedHighlighter's support for AutomatonQuery for character & binary automata. Added AutomatonQuery.isBinary.
    (David Smiley)
  • LUCENE-7873 : Due to serious problems with context class loaders in several frameworks (OSGI, Java 9 Jigsaw), the lookup of Codecs, PostingsFormats, DocValuesFormats and all analysis factories was changed to only inspect the current classloader that defined the interface class (lucene-core.jar). See MIGRATE.txt for more information!
    (Uwe Schindler, Dawid Weiss)
  • LUCENE-7883 : Lucene no longer uses the context class loader when resolving resources in CustomAnalyzer or ClassPathResourceLoader. Resources are only resolved against Lucene's class loader by default. Please use another builder method to change to a custom classloader.
    (Uwe Schindler)
  • LUCENE-5822 : Convert README to Markdown
    (Jason Gerlowski via Mike Drob)
  • LUCENE-7773 : Remove unused/deprecated token types from StandardTokenizer.
    (Ahmet Arslan via Steve Rowe)
  • LUCENE-7800 : Remove code that potentially rethrows checked exceptions from methods that don't declare them ("sneaky throw" hack).
    (Robert Muir, Uwe Schindler, Dawid Weiss)
  • LUCENE-7876 : Avoid calls to LeafReader.fields() and MultiFields.getFields() that are trivially replaced by LeafReader.terms() and MultiFields.getTerms()
    (David Smiley)
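LUCENE-7866 above describes DelimitedTermFrequencyTokenFilter, which reads a numeric value after a separator character ('|') at the end of each token and uses it as the term frequency. The sketch below shows that parsing step on a plain string. It assumes the last occurrence of the delimiter is the separator and that tokens without a delimiter keep a frequency of 1. The class is hypothetical and is not the filter's actual implementation, which operates on token attributes.

```java
// Hypothetical sketch of splitting "term|freq" tokens.
class DelimitedFreq {
    // Result of splitting a token such as "lucene|5" into term and frequency.
    static final class Parsed {
        final String term;
        final int freq;

        Parsed(String term, int freq) {
            this.term = term;
            this.freq = freq;
        }
    }

    // Splits at the last occurrence of the delimiter. Tokens without a
    // delimiter keep a frequency of 1 (an assumption of this sketch).
    static Parsed parse(String token, char delimiter) {
        int idx = token.lastIndexOf(delimiter);
        if (idx < 0) {
            return new Parsed(token, 1);
        }
        String term = token.substring(0, idx);
        // Throws NumberFormatException if the suffix is not an integer.
        int freq = Integer.parseInt(token.substring(idx + 1));
        return new Parsed(term, freq);
    }
}
```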
  • Release 6.6.5 [2018-06-03]

  • (No Changes)

  • Release 6.6.4 [2018-05-18]

  • (No Changes)

  • Release 6.6.3 [2018-03-07]

  • Build (1)
  • LUCENE-6144 : Upgrade Ivy to 2.4.0; 'ant ivy-bootstrap' now removes old Ivy jars in ~/.ant/lib/.
    (Shawn Heisey, Steve Rowe)
  • Release 6.6.2 [2017-10-18]

  • Changes in Runtime Behavior (1)
  • Resolving of external entities in queryparser/xml/CoreParser is disallowed by default. See SOLR-11477 for details.
  • Bug Fixes (1)
  • SOLR-11477 : Disallow resolving of external entities in queryparser/xml/CoreParser by default.
    (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
  • Release 6.6.1 [2017-09-07]

  • Bug Fixes (2)
  • LUCENE-7869 : Changed MemoryIndex to sort 1d points. For 1d points, PointInSetQuery.MergePointVisitor expects the points to be visited in ascending order. MemoryIndex did not do this, which could cause documents with multiple points that should match to not match.
    (Martijn van Groningen)
  • LUCENE-7878 : Fix query builder to keep the SHOULD clause that wraps multi-word synonyms.
    (Jim Ferenczi)
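LUCENE-7869 above notes that PointInSetQuery.MergePointVisitor expects 1d points in ascending order. Lucene encodes point values as fixed-width byte arrays that are meant to be compared as unsigned bytes. Sorting points for such a visitor therefore means using an unsigned lexicographic comparison. The sketch below is a hypothetical helper that illustrates this ordering. It is not the MemoryIndex change itself.

```java
import java.util.Arrays;

// Hypothetical helper: orders encoded points as unsigned byte sequences.
class PointSorter {
    // Lexicographic comparison treating each byte as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Sorts points ascending so a visitor that assumes ordered input sees them in order.
    static void sortPoints(byte[][] points) {
        Arrays.sort(points, PointSorter::compareUnsigned);
    }
}
```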
  • Release 6.6.0 [2017-06-06]

  • New Features (1)
  • LUCENE-7811 : Add a concurrent SortedSet facets implementation.
    (Mike McCandless)
  • Bug Fixes (14)
  • LUCENE-7777 : ByteBlockPool.readBytes sometimes threw ArrayIndexOutOfBoundsException when byte blocks larger than 32 KB were added
    (Mike McCandless)
  • LUCENE-7797 : The static FSDirectory.listAll(Path) method was always returning an empty array.
    (Atkins Chang via Mike McCandless)
  • LUCENE-7481 : Fixed missing rewrite methods for SpanPayloadCheckQuery and PayloadScoreQuery.
    (Erik Hatcher)
  • LUCENE-7808 : Fixed PayloadScoreQuery and SpanPayloadCheckQuery .equals and .hashCode methods.
    (Erik Hatcher)
  • LUCENE-7798 : Add .equals and .hashCode to ToParentBlockJoinSortField
    (Mikhail Khludnev)
  • LUCENE-7814 : DateRangePrefixTree (in spatial-extras) had edge-case bugs for years >= 292,000,000.
    (David Smiley)
  • LUCENE-5365 , LUCENE-7818 : Fix incorrect condition in queryparser's QueryNodeOperation#logicalAnd().
    (Olivier Binda, Amrit Sarkar, AppChecker via Uwe Schindler)
  • LUCENE-7821 : The classic and flexible query parsers, as well as Solr's "lucene"/standard query parser, should require " TO " in range queries, and accept "TO" as endpoints in range queries.
    (hossman, Steve Rowe)
  • LUCENE-7824 : Fix graph query analysis for multi-word synonym rules with common terms (e.g. "new york" and "new york city").
    (Jim Ferenczi)
  • LUCENE-7817 : Pass cached query to onQueryCache instead of null.
    (Christoph Kaser via Adrien Grand)
  • LUCENE-7831 : CodecUtil should not seek to negative offsets.
    (Adrien Grand)
  • LUCENE-7833 : ToParentBlockJoinQuery computed the min score instead of the max score with ScoreMode.MAX.
    (Adrien Grand)
  • LUCENE-7847 : Fixed all-docs-match optimization of range queries on range fields.
    (Adrien Grand)
  • LUCENE-7810 : Fix equals() and hashCode() methods of several join queries.
    (Hossman, Adrien Grand, Martijn van Groningen)
  • Improvements (5)
  • LUCENE-7782 : OfflineSorter now passes the total number of items it will write to getWriter
    (Mike McCandless)
  • LUCENE-7785 : Move dictionary for Ukrainian analyzer to external dependency.
    (Andriy Rysin via Steve Rowe, Dawid Weiss)
  • LUCENE-7801 : SortedSetDocValuesReaderState now implements Accountable so you can see how much RAM it's using
    (Robert Muir, Mike McCandless)
  • LUCENE-7792 : OfflineSorter can now run concurrently if you pass it an optional ExecutorService
    (Dawid Weiss, Mike McCandless)
  • LUCENE-7811 : Sorted set facets now use sparse storage when collecting hits, when appropriate.
    (Mike McCandless)
  • Optimizations (1)
  • LUCENE-7787 : spatial-extras HeatmapFacetCounter will now short-circuit its work when Bits.MatchNoBits is passed.
    (David Smiley)
  • Other (5)
  • LUCENE-7796 : Make the IOUtils.reThrow idiom declare an Error return type so callers can use it in a way that lets the compiler know subsequent code is unreachable. reThrow is now deprecated in favor of IOUtils.rethrowAlways, which has slightly different semantics (see javadoc).
    (Hossman, Robert Muir, Dawid Weiss)
  • LUCENE-7754 : Inner classes should be static whenever possible.
    (Daniel Jelinski via Adrien Grand)
  • LUCENE-7751 : Avoid boxing primitives only to call compareTo.
    (Daniel Jelinski via Adrien Grand)
  • LUCENE-7743 : Never call new String(String).
    (Daniel Jelinski via Adrien Grand)
  • LUCENE-7761 : Fixed comment in ReqExclScorer.
    (Pablo Pita Leira via Adrien Grand)
  • Release 6.5.1 [2017-04-27]

  • Bug Fixes (3)
  • LUCENE-7755 : Fixed join queries to not reference IndexReaders, as it could cause leaks if they are cached.
    (Adrien Grand)
  • LUCENE-7749 : Made LRUQueryCache delegate the scoreSupplier method.
    (Martin Amirault via Adrien Grand)
  • LUCENE-7769 : The UnifiedHighlighter wasn't highlighting portions of the query wrapped in BoostQuery or SpanBoostQuery.
    (David Smiley, Dmitry Malinin)
  • Other (1)
  • LUCENE-7763 : Remove outdated comment in IndexWriterConfig.setIndexSort javadocs.
    (马可阳 via Christine Poerschke)
  • Release 6.5.0 [2017-03-27]

  • API Changes (12)
  • LUCENE-7740 : Refactor Range Fields to remove Field suffix (e.g., DoubleRange), move InetAddressRange and InetAddressPoint from sandbox to misc module, and refactor all other range fields from sandbox to core.
    (Nick Knize)
  • LUCENE-7624 : TermsQuery has been renamed as TermInSetQuery and moved to core.
    (Alan Woodward)
  • LUCENE-7637 : TermInSetQuery requires that all terms come from the same field.
    (Adrien Grand)
  • LUCENE-7644 : FieldComparatorSource.newComparator() and SortField.getComparator() no longer throw IOException
    (Alan Woodward)
  • LUCENE-7643 : Replaced doc-values queries in lucene/sandbox with factory methods on the *DocValuesField classes.
    (Adrien Grand)
  • LUCENE-7659 : Added an IndexWriter#getFieldNames() method (experimental) to return all field names as visible from the IndexWriter. This is useful for IndexWriter#updateDocValues() calls, to avoid calling them with non-existent docValues fields.
    (Ishan Chattopadhyaya, Adrien Grand, Mike McCandless)
  • LUCENE-6959 : Removed ToParentBlockJoinCollector in favour of ParentChildrenBlockJoinQuery, that can return the matching children documents per parent document. This query should be executed for each matching parent document after the main query has been executed.
    (Adrien Grand, Martijn van Groningen, Mike McCandless)
  • LUCENE-7628 : Scorer.getChildren() now only returns Scorers that are positioned on the current document, and can throw an IOException. AssertingScorer checks that getChildren() is not called on an unpositioned Scorer.
    (Alan Woodward, Adrien Grand)
  • LUCENE-7702 : Removed GraphQuery in favour of simple boolean query.
    (Matt Webber via Jim Ferenczi)
  • LUCENE-7707 : TopDocs.merge now takes a boolean option telling it when to use the incoming shard index versus when to assign the shard index itself, allowing users to merge shard responses incrementally instead of once all shard responses are present.
    (Simon Willnauer, Mike McCandless)
  • LUCENE-7700 : A cleanup of merge throughput control logic. Refactored all the code previously scattered throughout the IndexWriter and ConcurrentMergeScheduler into a more accessible set of public methods (see MergePolicy.OneMergeProgress, MergeScheduler.wrapForMerge and OneMerge.mergeInit).
    (Dawid Weiss, Mike McCandless)
  • LUCENE-7734 : FieldType's copy constructor was widened to accept any IndexableFieldType.
    (David Smiley)
  • New Features (10)
  • LUCENE-7738 : Add new InetAddressRange for indexing and querying InetAddress ranges.
    (Nick Knize)
  • LUCENE-7449 : Add CROSSES relation support to RangeFieldQuery.
    (Nick Knize)
  • LUCENE-7623 : Add FunctionScoreQuery and FunctionMatchQuery
    (Alan Woodward, Adrien Grand, David Smiley)
  • LUCENE-7619 : Add WordDelimiterGraphFilter, just like WordDelimiterFilter except it produces correct token graphs so that proximity queries at search time will produce correct results
    (Mike McCandless)
  • LUCENE-7656 : Added the LatLonDocValuesField.new(Box/Distance)Query() factory methods that are the equivalent of factory methods on LatLonPoint but operate on doc values. These new methods should be wrapped in an IndexOrDocValuesQuery for best performance.
    (Adrien Grand)
  • LUCENE-7673 : Added MultiValued[Int/Long/Float/Double]FieldSource that given a SortedNumericSelector.Type can give a ValueSource view of a SortedNumericDocValues field.
    (Tomás Fernández Löbbe)
  • LUCENE-7465 : Add SimplePatternTokenizer and SimplePatternSplitTokenizer, using Lucene's regexp/automaton implementation for analysis/tokenization
    (Clinton Gormley, Mike McCandless)
  • LUCENE-7688 : Add OneMergeWrappingMergePolicy class.
    (Keith Laban, Christine Poerschke)
  • LUCENE-7686 : The near-real-time document suggester can now efficiently filter out duplicate suggestions
    (Uwe Schindler, Mike McCandless)
  • LUCENE-7712 : SimpleQueryParser now supports default fuzziness syntax, mapping foo~ to a FuzzyQuery with edit distance 2.
    (Lee Hinman, David Pilato via Mike McCandless)
  • Bug Fixes (6)
  • LUCENE-7630 : Fix (Edge)NGramTokenFilter to no longer drop payloads and preserve all attributes.
    (Nathan Gass via Uwe Schindler)
  • LUCENE-7679 : MemoryIndex was ignoring omitNorms settings on passed-in IndexableFields.
    (Alan Woodward)
  • LUCENE-7692 : PatternReplaceCharFilterFactory now implements MultiTermAware.
    (Adrien Grand)
  • LUCENE-7685 : ToParentBlockJoinQuery and ToChildBlockJoinQuery now use the rewritten child query in their equals and hashCode implementations.
    (Adrien Grand)
  • LUCENE-7698 : CommonGramsQueryFilter was producing a disconnected token graph, messing up phrase queries when it was used during query parsing
    (Ere Maijala via Mike McCandless)
  • LUCENE-7708 : ShingleFilter without unigram was producing a disconnected token graph, messing up queries when it was used during query parsing
    (Jim Ferenczi)
  • Improvements (8)
  • LUCENE-7055 : Added Weight#scorerSupplier, which allows estimating the cost of a Scorer before actually building it, in order to optimize how the query should be run, e.g. using points or doc values depending on the costs of other parts of the query.
    (Adrien Grand)
  • LUCENE-7643 : IndexOrDocValuesQuery can execute range queries using either points or doc values, depending on which one is more efficient.
    (Adrien Grand)
  • LUCENE-7662 : If index files are missing, throw CorruptIndexException instead of the less descriptive FileNotFoundException or NoSuchFileException
    (Mike Drob via Mike McCandless, Erick Erickson)
  • LUCENE-7680 : UsageTrackingQueryCachingPolicy never caches term filters anymore since they are plenty fast. This also has the side-effect of leaving more space in the history for costly filters.
    (Adrien Grand)
  • LUCENE-7677 : UsageTrackingQueryCachingPolicy now caches compound queries a bit earlier than regular queries in order to improve cache efficiency.
    (Adrien Grand)
  • LUCENE-7710 : BlockPackedReader throws CorruptIndexException and includes IndexInput description instead of plain IOException
    (Mike Drob via Mike McCandless)
  • LUCENE-7695 : ComplexPhraseQueryParser now supports query-time synonyms
    (Markus Jelsma via Mikhail Khludnev)
  • LUCENE-7747 : QueryBuilder now iterates lazily over the possible paths when building a graph query
    (Jim Ferenczi)
  • Optimizations (10)
  • LUCENE-7641 : Optimized point range queries to compute documents that do not match the range on single-valued fields when more than half the documents in the index would match.
    (Adrien Grand)
  • LUCENE-7656 : Sped up LatLonPointDistanceQuery by computing distances even less often.
    (Adrien Grand)
  • LUCENE-7661 : Sped up LatLonPointInPolygonQuery by pre-computing the relation of the polygon with a grid.
    (Adrien Grand)
  • LUCENE-7660 : Speed up LatLonPointDistanceQuery by improving the detection of whether BKD cells are entirely within the distance close to the dateline.
    (Adrien Grand)
  • LUCENE-7654 : ToParentBlockJoinQuery now implements two-phase iteration and computes scores lazily in order to be faster when used in conjunctions.
    (Adrien Grand)
  • LUCENE-7667 : BKDReader now calls `IntersectVisitor.grow()` on larger increments.
    (Adrien Grand)
  • LUCENE-7638 : Query parsers now analyze the token graph for articulation points (or cut vertices) in order to create more efficient queries for multi-token synonyms.
    (Jim Ferenczi)
  • LUCENE-7699 : Query parsers now use span queries to produce more efficient phrase queries for multi-token synonyms.
    (Matt Webber via Jim Ferenczi)
  • LUCENE-7742 : Fix places where we were unboxing and then re-boxing, as reported by FindBugs
    (Daniel Jelinski via Mike McCandless)
  • LUCENE-7739 : Fix places where we unnecessarily boxed while parsing a numeric value, as reported by FindBugs
    (Daniel Jelinski via Mike McCandless)
  • Build (7)
  • LUCENE-7653 : Update randomizedtesting to version 2.5.0.
    (Dawid Weiss)
  • LUCENE-7665 : Remove grouping dependency from the join module.
    (Martijn van Groningen)
  • SOLR-10023 : Add non-recursive 'test-nocompile' target: Only runs unit tests. Jars are not downloaded; compilation is not updated; and Clover is not enabled.
    (Steve Rowe)
  • LUCENE-7694 : Update forbiddenapis to version 2.3.
    (Uwe Schindler)
  • LUCENE-7693 : Replace "org.apache." logic in GetMavenDependenciesTask.
    (Daniel Collins, Christine Poerschke)
  • LUCENE-7726 : Fix HTML entity bugs in Javadocs to be able to build with Java 9.
    (Uwe Schindler, Hossman)
  • LUCENE-7727 : Replace end-of-life Markdown parser "Pegdown" by "Flexmark" for compatibility with Java 9.
    (Uwe Schindler)
  • Other (3)
  • LUCENE-7666 : Fix typos in lucene-join package info javadoc.
    (Tom Saleeba via Christine Poerschke)
  • LUCENE-7658 : queryparser/xml CoreParser now implements SpanQueryBuilder interface.
    (Daniel Collins, Christine Poerschke)
  • LUCENE-7715 : NearSpansUnordered simplifications.
    (Paul Elschot via Adrien Grand)
  • Release 6.4.2 [2017-03-07]

  • Bug Fixes (2)
  • LUCENE-7676 : Fixed FilterCodecReader to override more super-class methods. Also added TestFilterCodecReader class.
    (Christine Poerschke)
  • LUCENE-7717 : The UnifiedHighlighter and PostingsHighlighter were not highlighting prefix queries with multi-byte characters. TermRangeQuery is affected too.
    (Dmitry Malinin, David Smiley)
  • Release 6.4.1 [2017-02-06]

  • Build (1)
  • LUCENE-7651 : Fix Javadocs build for Java 8u121 by injecting "Google Code Prettify" without adding Javascript to Javadoc's -bottom parameter. Also update Prettify to latest version to fix Google Chrome issue.
    (Uwe Schindler)
  • Bug Fixes (3)
  • LUCENE-7657 : Fixed potential memory leak in the case that a (Span)TermQuery with a TermContext is cached.
    (Adrien Grand)
  • LUCENE-7647 : Made stored fields reclaim native memory more aggressively when configured with BEST_COMPRESSION. This could otherwise result in out-of-memory issues.
    (Adrien Grand)
  • LUCENE-7670 : AnalyzingInfixSuggester should not immediately open an IndexWriter over an already-built index.
    (Steve Rowe)
  • Release 6.4.0 [2017-01-23]

  • API Changes (6)
  • LUCENE-7533 : Classic query parser no longer allows autoGeneratePhraseQueries to be set to true when splitOnWhitespace is false (and vice-versa).
  • LUCENE-7607 : LeafFieldComparator.setScorer and SimpleFieldComparator.setScorer are declared as throwing IOException
    (Alan Woodward)
  • LUCENE-7617 : Collector construction for two-pass grouping queries is abstracted into a new Grouper class, which can be passed as a constructor parameter to GroupingSearch. The abstract base classes for the different grouping Collectors are renamed to remove the Abstract* prefix.
    (Alan Woodward, Martijn van Groningen)
  • LUCENE-7609 : The expressions module now uses the DoubleValuesSource API, and no longer depends on the queries module. Expression#getValueSource() is replaced with Expression#getDoubleValuesSource().
    (Alan Woodward, Adrien Grand)
  • LUCENE-7610 : The facets module now uses the DoubleValuesSource API, and methods that take ValueSource parameters are deprecated
    (Alan Woodward)
  • LUCENE-7611 : DocumentValueSourceDictionary now takes a LongValuesSource as a parameter, and the ValueSource equivalent is deprecated
    (Alan Woodward)
  • New Features (9)
  • LUCENE-5867 : Added BooleanSimilarity.
    (Robert Muir, Adrien Grand)
  • LUCENE-7466 : Added AxiomaticSimilarity.
    (Peilin Yang via Tommaso Teofili)
  • LUCENE-7590 : Added DocValuesStatsCollector to compute statistics on DocValues fields.
    (Shai Erera)
  • LUCENE-7587 : The new FacetQuery and MultiFacetQuery helper classes make it simpler to execute drill down when drill sideways counts are not needed
    (Emmanuel Keller via Mike McCandless)
  • LUCENE-6664 : A new SynonymGraphFilter outputs a correct graph structure for multi-token synonyms, separating out a FlattenGraphFilter that is hardwired into the current SynonymFilter. This finally makes it possible to implement correct multi-token synonyms at search time. See http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html for details.
    (Mike McCandless)
  • LUCENE-5325 : Added LongValuesSource and DoubleValuesSource, intended as type-safe replacements for ValueSource in the queries module. These expose per-segment LongValues or DoubleValues iterators.
    (Alan Woodward, Adrien Grand)
  • LUCENE-7603 : Graph token streams are now handled accurately by query parsers, by enumerating all paths and creating the corresponding queries as sub-clauses
    (Matt Weber via Mike McCandless)
  • LUCENE-7588 : DrillSideways can now run queries concurrently, and supports an IndexSearcher using an executor service to run each query concurrently across all segments in the index
    (Emmanuel Keller via Mike McCandless)
  • LUCENE-7627 : Added .intersect methods to SortedDocValues and SortedSetDocValues to allow filtering their TermsEnums with a CompiledAutomaton
    (Alan Woodward, Mike McCandless)
  • Bug Fixes (11)
  • LUCENE-7547 : JapaneseTokenizerFactory was failing to close the dictionary file it opened
    (Markus via Mike McCandless)
  • LUCENE-7562 : CompletionFieldsConsumer sometimes throws NullPointerException on ghost fields
    (Oliver Eilhard via Mike McCandless)
  • LUCENE-7533 : Classic query parser: disallow autoGeneratePhraseQueries=true when splitOnWhitespace=false (and vice-versa).
    (Steve Rowe)
  • LUCENE-7536 : ASCIIFoldingFilterFactory used to return an illegal multi-term component when preserveOriginal was set to true.
    (Adrien Grand)
  • LUCENE-7576 : Fix Terms.intersect in the default codec to detect when the incoming automaton is a special case and throw a clearer exception than NullPointerException
    (Tom Mortimer via Mike McCandless)
  • LUCENE-6989 : Fix exception handling in MMapDirectory's unmap hack support code to work with Java 9's new InaccessibleObjectException, which does not extend ReflectiveOperationException.
    (Uwe Schindler)
  • LUCENE-7581 : Lucene now prevents updating a doc values field that is used in the index sort, since this would lead to corruption.
    (Jim Ferenczi via Mike McCandless)
  • LUCENE-7570 : IndexWriter may deadlock if a commit is running while there are too many merges running and one of the merges hits a tragic exception
    (Joey Echeverria via Mike McCandless)
  • LUCENE-7594 : Fixed point range queries on floating-point types to recommend using helpers for exclusive bounds that are consistent with Double.compare.
    (Adrien Grand, Dawid Weiss)
  • LUCENE-7606 : Normalization with CustomAnalyzer would only apply the last token filter.
    (Adrien Grand)
  • LUCENE-7612 : Removed an unused dependency from the suggester to the misc module.
    (Alan Woodward)
  • Improvements (16)
  • LUCENE-7532 : Add back lost codec file format documentation
    (Shinichiro Abe via Mike McCandless)
  • LUCENE-6824 : TermAutomatonQuery now rewrites to TermQuery, PhraseQuery or MultiPhraseQuery when the word automaton is simple
    (Mike McCandless)
  • LUCENE-7431 : Allow a certain amount of overlap to be specified between the include and exclude arguments of SpanNotQuery via negative pre and/or post arguments.
    (Marc Morissette via David Smiley)
  • LUCENE-7544 : UnifiedHighlighter: add extension points for handling custom queries.
    (Michael Braun, David Smiley)
  • LUCENE-7538 : Asking IndexWriter to store a too-massive text field now throws IllegalArgumentException instead of a cryptic exception that closes your IndexWriter
    (Steve Chen via Mike McCandless)
  • LUCENE-7524 : Added more detailed explanation of how IDF is computed in ClassicSimilarity and BM25Similarity.
    (Adrien Grand)
  • LUCENE-7564 : AnalyzingInfixSuggester should close its IndexWriter by default at the end of build().
    (Steve Rowe)
  • LUCENE-7526 : Enhanced UnifiedHighlighter's passage relevancy for queries with wildcards and sometimes just terms. Added shouldPreferPassageRelevancyOverSpeed() which can be overridden to return false to eke out more speed in some cases.
    (Timothy M. Rodriguez, David Smiley)
  • LUCENE-7560 : QueryBuilder.createFieldQuery is no longer final, giving custom query parsers subclassing QueryBuilder more freedom to control how text is analyzed and converted into a query
    (Matt Weber via Mike McCandless)
  • LUCENE-7537 : Index time sorting now supports multi-valued sorts using selectors (MIN, MAX, etc.)
    (Jim Ferenczi via Mike McCandless)
  • LUCENE-7575 : UnifiedHighlighter can now highlight fields with queries that don't necessarily refer to that field (AKA requireFieldMatch==false). Disabled by default. See UnifiedHighlighter's get/setFieldMatcher.
    (Jim Ferenczi via David Smiley)
  • LUCENE-7592 : If the segments file is truncated, we now throw CorruptIndexException instead of the more confusing EOFException
    (Mike Drob via Mike McCandless)
  • LUCENE-6989 : Make MMapDirectory's unmap hack work with Java 9 EA (b150+): Unmapping uses new sun.misc.Unsafe#invokeCleaner(ByteBuffer). Java 9 now needs the same permissions as Java 8; RuntimePermission("accessClassInPackage.jdk.internal.ref") is no longer needed. Support for older Java 9 builds was removed.
    (Uwe Schindler)
  • LUCENE-7401 : Changed the way BKD trees pick the split dimension in order to ensure all dimensions are indexed.
    (Adrien Grand)
  • LUCENE-7614 : ComplexPhraseQueryParser now ignores double quotes around single-token prefix, wildcard, and range queries
    (Mikhail Khludnev)
  • LUCENE-7620 : Added LengthGoalBreakIterator, a wrapper around another BreakIterator to skip breaks that would create Passages that are too short. Only for use with the UnifiedHighlighter (and probably PostingsHighlighter).
    (David Smiley)
  • Optimizations (4)
  • LUCENE-7568 : Optimize merging when index sorting is used but the index is already sorted
    (Jim Ferenczi via Mike McCandless)
  • LUCENE-7563 : The BKD in-memory index for dimensional points now uses a compressed format, using substantially less RAM in some cases
    (Adrien Grand, Mike McCandless)
  • LUCENE-7583 : BKD writing now buffers each leaf block in heap before writing to disk, giving a small speedup in points-heavy use cases.
    (Mike McCandless)
  • LUCENE-7572 : Doc values queries now cache their hash code.
    (Adrien Grand)
  • Other (5)
  • LUCENE-7546 : Fixed references to benchmark wikipedia data and the Jenkins line-docs file
    (David Smiley)
  • LUCENE-7534 : Fix smokeTestRelease.py to run on Cygwin
    (Mikhail Khludnev)
  • LUCENE-7559 : UnifiedHighlighter: Make Passage and OffsetsEnum more exposed to allow passage creation to be customized.
    (David Smiley)
  • LUCENE-7599 : Simplify TestRandomChains using Java's built-in Predicate and Function interfaces.
    (Ahmet Arslan via Adrien Grand)
  • LUCENE-7595 : Improve RAMUsageTester in test-framework to estimate memory usage of runtime classes and work with Java 9 EA (b148+). Disable static field heap usage checker in LuceneTestCase.
    (Uwe Schindler, Dawid Weiss)
  • Build (3)
  • LUCENE-7387 : Fix defaultCodec in build.xml to account for the line ending
    (hossman)
  • LUCENE-7543 : Make changes-to-html target an offline operation, by moving the Lucene and Solr DOAP RDF files into the Git source repository under dev-tools/doap/ and then pulling release dates from those files, rather than from JIRA.
    (Mano Kovacs, hossman, Steve Rowe)
  • LUCENE-7596 : Update Groovy to version 2.4.8 to allow building with Java 9 build 148+. Also update JGit version for working-copy checks.
    (Uwe Schindler)
  • Release 6.3.0 [2016-11-08]

  • API Changes (none)
  • New Features (2)
  • LUCENE-7438 : New "UnifiedHighlighter" derivative of the PostingsHighlighter that can consume offsets from postings, term vectors, or analysis. It can highlight phrases as accurately as the standard Highlighter. Light term vectors can be used with offsets in postings for fast wildcard (MultiTermQuery) highlighting.
    (David Smiley, Timothy Rodriguez)
  • LUCENE-7490 : SimpleQueryParser now parses '*' to MatchAllDocsQuery
    (Lee Hinman via Mike McCandless)
  • Bug Fixes (13)
  • LUCENE-7507 : Upgrade morfologik-stemming to version 2.1.1 (fixes security manager issue with Polish dictionary lookup).
    (Dawid Weiss)
  • LUCENE-7472 : MultiFieldQueryParser.getFieldQuery() drops queries that are neither BooleanQuery nor TermQuery.
    (Steve Rowe)
  • LUCENE-7456 : PerFieldPostings/DocValues was failing to delegate the merge method
    (Julien MASSENET via Mike McCandless)
  • LUCENE-7468 : ASCIIFoldingFilter should not emit duplicated tokens when preserve original is on.
    (David Causse via Adrien Grand)
  • LUCENE-7484 : FastVectorHighlighter failed to highlight SynonymQuery
    (Jim Ferenczi via Mike McCandless)
  • LUCENE-7476 : JapaneseNumberFilter should not invoke incrementToken on its input after it's exhausted
    (Andy Hind via Mike McCandless)
  • LUCENE-7486 : DisjunctionMaxQuery does not work correctly with queries that return negative scores.
    (Ivan Provalov, Uwe Schindler, Adrien Grand)
  • LUCENE-7491 : Suddenly turning on dimensional points for some fields that already exist in an index but didn't previously index dimensional points could cause unexpected merge exceptions
    (Hans Lund, Mike McCandless)
  • LUCENE-6914 : Fixed DecimalDigitFilter in case of supplementary code points.
    (Hossman)
  • LUCENE-7493 : FacetCollector.search threw an unexpected exception if you asked for zero hits but wanted facets
    (Mahesh via Mike McCandless)
  • LUCENE-7505 : AnalyzingInfixSuggester returned invalid results when allTermsRequired is false and context filters are specified
    (Mike McCandless)
  • LUCENE-7429 : AnalyzerWrapper can now modify the normalization chain too and DelegatingAnalyzerWrapper does the right thing automatically.
    (Adrien Grand)
  • LUCENE-7135 : Lucene's check for 32 or 64 bit JVM now works around security manager blocking access to some properties
    (Aaron Madlon-Kay via Mike McCandless)
  • Improvements (3)
  • LUCENE-7439 : FuzzyQuery now matches all terms within the specified edit distance, even if they are short terms
    (Mike McCandless)
  • LUCENE-7496 : Better toString for SweetSpotSimilarity
    (janhoy)
  • LUCENE-7520 : Highlighter's WeightedSpanTermExtractor shouldn't attempt to expand a MultiTermQuery when its field doesn't match the field the extraction is scoped to.
    (Cao Manh Dat via David Smiley)
  • Optimizations (1)
  • LUCENE-7501 : BKDReader should not store the split dimension explicitly in the 1D case.
    (Adrien Grand)
  • Other (3)
  • LUCENE-7513 : Upgrade randomizedtesting to 2.4.0.
    (Dawid Weiss)
  • LUCENE-7452 : Block join query exception suggests how to find a doc, which violates the orthogonality requirement.
    (Mikhail Khludnev)
  • LUCENE-7438 : Renovate the Benchmark module's support for benchmarking highlighting. All highlighters are supported via SearchTravRetHighlight.
    (David Smiley)
  • Build (1)
  • LUCENE-7292 : Fix build to use "--release 8" instead of "-release 8" on Java 9 (this changed with recent EA build b135).
    (Uwe Schindler)
  • Release 6.2.1 [2016-09-20]

  • API Changes (1)
  • LUCENE-7436 : MinHashFilter's constructor, and some of its default settings, should be public.
    (Doug Turnbull via Mike McCandless)
  • Bug Fixes (4)
  • LUCENE-7417 : The standard Highlighter could throw an IllegalArgumentException when trying to highlight a query containing a degenerate case of a MultiPhraseQuery with one term.
    (Thomas Kappler via David Smiley)
  • LUCENE-7440 : Document id skipping (PostingsEnum.advance) could throw an ArrayIndexOutOfBoundsException on large index segments (>1.8B docs) with large skips.
    (yonik)
  • LUCENE-7442 : MinHashFilter's constructor should validate its args.
    (Cao Manh Dat via Steve Rowe)
  • LUCENE-7318 : Fix backwards compatibility issues around StandardAnalyzer and its components, introduced with Lucene 6.2.0. The moved classes were restored in their original packages: LowerCaseFilter and StopFilter, as well as several utility classes.
    (Uwe Schindler, Mike McCandless)
  • Release 6.2.0 [2016-08-25]

  • API Changes (1)
  • ScoringWrapperSpans was removed since it had no purpose or effect as of Lucene 5.5.
  • New Features (11)
  • LUCENE-7388 : Add point based IntRangeField, FloatRangeField, LongRangeField along with supporting queries and tests
    (Nick Knize)
  • LUCENE-7381 : Add point based DoubleRangeField and RangeFieldQuery for indexing and querying on ranges up to 4 dimensions
    (Nick Knize)
  • LUCENE-6968 : LSH (locality-sensitive hashing) filter
    (Tommaso Teofili, Andy Hind, Cao Manh Dat)
  • LUCENE-7302 : IndexWriter methods that change the index now return a long "sequence number" indicating the effective equivalent single-threaded execution order
    (Mike McCandless)
  • LUCENE-7335 : IndexWriter's commit data is now late binding, recording key/values from a provided iterable based on when the commit actually takes place
    (Mike McCandless)
  • LUCENE-7287 : UkrainianMorfologikAnalyzer is a new dictionary-based analyzer for the Ukrainian language
    (Andriy Rysin via Mike McCandless)
  • LUCENE-7373 : Directory.renameFile, which did both renaming and fsync of the directory metadata, has been deprecated; use the new separate methods Directory.rename and Directory.syncMetaData instead
    (Robert Muir, Uwe Schindler, Mike McCandless)
  • LUCENE-7355 : Added Analyzer#normalize(), which only applies normalization to an input string.
    (Adrien Grand)
  • LUCENE-7380 : Add Polygon.fromGeoJSON for more easily creating Polygon instances from a standard GeoJSON string
    (Robert Muir, Mike McCandless)
  • LUCENE-7395 : PerFieldSimilarityWrapper requires a default similarity for calculating query norm and coordination factor in Lucene 6.x. Lucene 7 will no longer have those factors.
    (Uwe Schindler, Sascha Markus)
  • SOLR-9279 : Queries module: new ComparisonBoolFunction base class
    (Doug Turnbull via David Smiley)
  • Bug Fixes (8)
  • LUCENE-6662 : Fixed potential resource leaks.
    (Rishabh Patel via Adrien Grand)
  • LUCENE-7340 : MemoryIndex.toString() could throw NPE; fixed. Renamed to toStringDebug().
    (Daniel Collins, David Smiley)
  • LUCENE-7382 : Fix bug introduced by LUCENE-7355 that used the wrong default AttributeFactory for new Tokenizers.
    (Terry Smith, Uwe Schindler)
  • LUCENE-7389 : Fix FieldType.setDimensions(...) validation for the dimensionNumBytes parameter.
    (Martijn van Groningen)
  • LUCENE-7391 : Fix performance regression in MemoryIndex's fields() introduced in Lucene 6.
    (Steve Mason via David Smiley)
  • LUCENE-7395 , SOLR-9315 : Fix PerFieldSimilarityWrapper to also delegate query norm and coordination factor using a default similarity added as a constructor parameter.
    (Uwe Schindler, Sascha Markus)
  • SOLR-9413 : Fix analysis/kuromoji's CSVUtil.quoteEscape logic, add TestCSVUtil test.
    (AppChecker, Christine Poerschke)
  • LUCENE-7419 : Fix performance bug with TokenStream.end(), where it would look up PositionIncrementAttribute every time.
    (Mike McCandless, Robert Muir)
  • Improvements (16)
  • LUCENE-7323 : Compound file writing now verifies the incoming sub-files' checksums and segment IDs, to catch hardware issues or filesystem bugs earlier
    (Robert Muir, Mike McCandless)
  • LUCENE-6766 : Index time sorting has graduated from the misc module to core, is much simpler to use via IndexWriterConfig.setIndexSort, and now works with dimensional points.
    (Adrien Grand, Mike McCandless)
  • LUCENE-5931 : Detect when an application tries to reopen an IndexReader after (illegally) removing the old index and reindexing
    (Vitaly Funstein, Robert Muir, Mike McCandless)
  • LUCENE-6171 : Lucene now passes the StandardOpenOption.CREATE_NEW option when writing new files so the filesystem enforces our write-once architecture, possibly catching externally caused issues sooner
    (Robert Muir, Mike McCandless)
  • LUCENE-7318 : StandardAnalyzer has been moved from the analysis module into core and is now the default analyzer in IndexWriterConfig
    (Robert Muir, Mike McCandless)
  • LUCENE-7345 : RAMDirectory now enforces write-once files as well
    (Robert Muir, Mike McCandless)
  • LUCENE-7337 : MatchNoDocsQuery now scores with 0 normalization factor and empty boolean queries now rewrite to MatchNoDocsQuery instead of vice versa
    (Jim Ferenczi via Mike McCandless)
  • LUCENE-7359 : Add equals() and hashCode() to Explanation
    (Alan Woodward)
  • LUCENE-7353 : ScandinavianFoldingFilterFactory and ScandinavianNormalizationFilterFactory now implement MultiTermAwareComponent.
    (Adrien Grand)
  • LUCENE-2605 : Add classic QueryParser option setSplitOnWhitespace() to control whether to split on whitespace prior to text analysis. Default behavior remains unchanged: split-on-whitespace=true.
    (Steve Rowe)
  • LUCENE-7276 : MatchNoDocsQuery now includes an optional reason for why it was used
    (Jim Ferenczi via Mike McCandless)
  • LUCENE-7355 : AnalyzingQueryParser now only applies the subset of the analysis chain that is about normalization for range/fuzzy/wildcard queries.
    (Adrien Grand)
  • LUCENE-7376 : Add support for ToParentBlockJoinQuery to fast vector highlighter's FieldQuery.
    (Martijn van Groningen)
  • LUCENE-7385 : Improve/fix assert messages in SpanScorer.
    (David Smiley)
  • LUCENE-7393 : Add ICUTokenizer option to parse Myanmar text as syllables instead of words, because the ICU word-breaking algorithm has some issues. This makes the tokenization used before Lucene 5 available again.
    (AM, Robert Muir)
  • LUCENE-7409 : Changed MMapDirectory's unmapping to be safer, but still with no guarantees. This uses a store-store barrier and yields the current thread before unmapping to allow in-flight requests to finish. The new code no longer uses WeakIdentityMap, as it delegates all ByteBuffer reads through a new ByteBufferGuard wrapper that is shared between all ByteBufferIndexInput clones.
    (Robert Muir, Uwe Schindler)
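The write-once enforcement in LUCENE-6171 rests on the JDK's StandardOpenOption.CREATE_NEW, which makes opening a file fail when a file of that name already exists. A minimal sketch of that behavior using plain java.nio, not Lucene's FSDirectory code; the class WriteOnceFiles and its writeOnce method are hypothetical:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper illustrating write-once file creation; not Lucene's implementation.
class WriteOnceFiles {
    /**
     * Creates file and writes data to it. Returns false, leaving the existing
     * file untouched, if a file with that name already exists.
     */
    static boolean writeOnce(Path file, byte[] data) {
        // CREATE_NEW asks the filesystem itself to reject reuse of an existing name.
        try (OutputStream out = Files.newOutputStream(file,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            out.write(data);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // the name was already taken: a write-once violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("write-once");
        Path file = dir.resolve("_0.si");
        System.out.println(writeOnce(file, new byte[] {1, 2, 3})); // true: new file
        System.out.println(writeOnce(file, new byte[] {4, 5, 6})); // false: name already used
    }
}
```

Because the check happens inside the open call, there is no window between "does it exist?" and "create it" in which another writer could sneak in.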
  • Optimizations (7)
  • LUCENE-7330 , LUCENE-7339 : Speed up conjunction queries.
    (Adrien Grand)
  • LUCENE-7356 : SearchGroup tweaks.
    (Christine Poerschke)
  • LUCENE-7351 : Doc id compression for points.
    (Adrien Grand)
  • LUCENE-7371 : Point values are now better compressed using run-length encoding.
    (Adrien Grand)
  • LUCENE-7311 : Cached term queries do not seek the terms dictionary anymore.
    (Adrien Grand)
  • LUCENE-7396 , LUCENE-7399 : Faster flush of points.
    (Adrien Grand, Mike McCandless)
  • LUCENE-7406 : Automaton and PrefixQuery tweaks (fewer object (re)allocations).
    (Christine Poerschke)
  • Other (6)
  • LUCENE-4787 : Fixed some highlighting javadocs.
    (Michael Dodsworth via Adrien Grand)
  • LUCENE-7334 : Update ASM dependency to 5.1.
    (Uwe Schindler)
  • LUCENE-7346 : Update forbiddenapis to version 2.2.
    (Uwe Schindler)
  • LUCENE-7360 : Explanation.toHtml() is deprecated.
    (Alan Woodward)
  • LUCENE-7372 : Factor out an org.apache.lucene.search.FilterWeight class.
    (Christine Poerschke, Adrien Grand, David Smiley)
  • LUCENE-7384 : Removed ScoringWrapperSpans, and tweaked SpanWeight.buildSimWeight() to reuse the existing Similarity instead of creating a new one.
    (David Smiley)
  • Release 6.1.0 [2016-06-17]

  • New Features (5)
  • LUCENE-7099 : Add LatLonDocValuesField.newDistanceSort to the sandbox.
    (Robert Muir)
  • LUCENE-7140 : Add PlanetModel.bisection to spatial3d
    (Karl Wright via Mike McCandless)
  • LUCENE-7069 : Add LatLonPoint.nearest, to find the nearest N points to a provided query point
    (Mike McCandless)
  • LUCENE-7234 : Added InetAddressPoint.nextDown/nextUp to easily generate range queries with excluded bounds.
    (Adrien Grand)
  • LUCENE-7300 : The misc module now has a directory wrapper that uses hard-links if applicable and supported when copying files from another FSDirectory in Directory#copyFrom.
    (Simon Willnauer)
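InetAddressPoint.nextUp/nextDown make exclusive bounds possible because an IP address is a fixed-width, unsigned, big-endian number (4 bytes for IPv4, 16 for IPv6): excluding a bound x is the same as including nextUp(x). A byte-level sketch of that idea; the class AddressSteps is hypothetical and is not Lucene's code, which works on InetAddress objects:

```java
import java.util.Arrays;

// Hypothetical byte-level sketch of "next address up/down"; not Lucene's InetAddressPoint code.
class AddressSteps {
    /** Returns address + 1, treating the bytes as an unsigned big-endian integer. */
    static byte[] nextUp(byte[] address) {
        byte[] result = address.clone();
        for (int i = result.length - 1; i >= 0; i--) {
            if (result[i] != (byte) 0xFF) {
                result[i]++;          // no carry needed beyond this byte
                return result;
            }
            result[i] = 0;            // 0xFF rolls over to 0x00; carry into the previous byte
        }
        throw new ArithmeticException("address is already the maximum value");
    }

    /** Returns address - 1, treating the bytes as an unsigned big-endian integer. */
    static byte[] nextDown(byte[] address) {
        byte[] result = address.clone();
        for (int i = result.length - 1; i >= 0; i--) {
            if (result[i] != 0) {
                result[i]--;          // no borrow needed beyond this byte
                return result;
            }
            result[i] = (byte) 0xFF;  // 0x00 rolls back to 0xFF; borrow from the previous byte
        }
        throw new ArithmeticException("address is already the minimum value");
    }

    public static void main(String[] args) {
        byte[] a = {(byte) 192, (byte) 168, 0, (byte) 255};
        // An exclusive lower bound of a becomes the inclusive lower bound nextUp(a).
        // Java bytes are signed, so 192.168.1.0 prints as [-64, -88, 1, 0].
        System.out.println(Arrays.toString(nextUp(a)));
    }
}
```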
  • API Changes (6)
  • LUCENE-7184 : Refactor LatLonPoint encoding methods to new GeoEncodingUtils helper class in core geo package. Also refactors LatLonPointTests to TestGeoEncodingUtils
    (Nick Knize)
  • LUCENE-7163 : Refactor GeoRect, Polygon, and GeoUtils tests to the geo package in core
    (Nick Knize)
  • LUCENE-7152 : Refactor GeoUtils from lucene-spatial package to
    (Nick Knize)
  • LUCENE-7141 : Switch OfflineSorter's ByteSequencesReader to BytesRefIterator
    (Mike McCandless)
  • LUCENE-7150 : Spatial3d gets useful APIs to create common shape queries, matching LatLonPoint.
    (Karl Wright via Mike McCandless)
  • LUCENE-7243 : Removed the LeafReaderContext parameter from QueryCachingPolicy#shouldCache.
    (Adrien Grand)
  • Optimizations (14)
  • LUCENE-7071 : Reduce bytes copying in OfflineSorter, giving ~10% speedup on merging 2D LatLonPoint values
    (Mike McCandless)
  • LUCENE-7105 , LUCENE-7215 : Optimize LatLonPoint's newDistanceQuery.
    (Robert Muir)
  • LUCENE-7097 : IntroSorter now recurses to 2 * log_2(count) quicksort stack depth before switching to heapsort
    (Adrien Grand, Mike McCandless)
  • LUCENE-7115 : Speed up FieldCache.CacheEntry toString by setting initial StringBuilder capacity
    (Gregory Chanan)
  • LUCENE-7147 : Improve disjoint check for geo distance query traversal
    (Ryan Ernst, Robert Muir, Mike McCandless)
  • LUCENE-7153 : GeoPointField and LatLonPoint polygon queries now support multiple polygons and holes, with memory usage independent of polygon complexity.
    (Karl Wright, Mike McCandless, Robert Muir)
  • LUCENE-7159 : Speed up LatLonPoint polygon performance.
    (Robert Muir, Ryan Ernst)
  • LUCENE-7211 : Reduce memory & GC for spatial RPT Intersects when the number of matching docs is small.
    (Jeff Wartes, David Smiley)
  • LUCENE-7235 : LRUQueryCache should not take a lock for segments that it will not cache anyway.
    (Adrien Grand)
  • LUCENE-7238 : Explicitly disable the query cache in MemoryIndex#createSearcher.
    (Adrien Grand)
  • LUCENE-7237 : LRUQueryCache now prefers returning an uncached Scorer over waiting on a lock.
    (Adrien Grand)
  • LUCENE-7261 , LUCENE-7262 , LUCENE-7264 , LUCENE-7258 : Speed up DocIdSetBuilder (which is used by TermsQuery, multi-term queries and several point queries).
    (Adrien Grand, Jeff Wartes, David Smiley)
  • LUCENE-7299 : Speed up BytesRefHash.sort() using radix sort.
    (Adrien Grand)
  • LUCENE-7306 : Speed up points indexing and merging using radix sort.
    (Adrien Grand)
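Radix sort suits point values because they are fixed-width byte arrays compared as unsigned bytes, so they can be distributed by byte value instead of compared pairwise. A minimal least-significant-byte-first sketch for that fixed-width case; the class FixedWidthRadixSort is hypothetical, and Lucene's own radix sorters are separate classes whose design may differ (for example, processing the most significant byte first):

```java
// Illustrative LSD radix sort over fixed-width byte[] keys; not Lucene's sorter.
class FixedWidthRadixSort {
    /** Sorts values in place by unsigned lexicographic byte order; every value must have length width. */
    static void sort(byte[][] values, int width) {
        byte[][] buffer = new byte[values.length][];
        // Process byte positions from last to first; each pass is a stable counting sort.
        for (int pos = width - 1; pos >= 0; pos--) {
            int[] starts = new int[257];
            for (byte[] v : values) {
                starts[(v[pos] & 0xFF) + 1]++;        // histogram, shifted by one slot
            }
            for (int b = 0; b < 256; b++) {
                starts[b + 1] += starts[b];           // starts[b] = first output index for byte b
            }
            for (byte[] v : values) {
                buffer[starts[v[pos] & 0xFF]++] = v;  // stable placement keeps earlier passes' order
            }
            System.arraycopy(buffer, 0, values, 0, values.length);
        }
    }

    public static void main(String[] args) {
        byte[][] points = {{1, 0}, {0, (byte) 200}, {0, 5}, {1, (byte) 255}};
        sort(points, 2);
        for (byte[] p : points) {
            System.out.println((p[0] & 0xFF) + "," + (p[1] & 0xFF)); // 0,5 then 0,200 then 1,0 then 1,255
        }
    }
}
```

Each pass costs O(n + 256), so the total is linear in the number of values for a fixed width, versus O(n log n) comparisons for a comparison sort.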
  • Bug Fixes (9)
  • LUCENE-7127 : Fix corner case bugs in GeoPointDistanceQuery.
    (Robert Muir)
  • LUCENE-7166 : Fix corner case bugs in LatLonPoint/GeoPointField bounding box queries.
    (Robert Muir)
  • LUCENE-7168 : Switch to a stable encoding for geo3d, remove quantization test leniency, remove dead code
    (Mike McCandless)
  • LUCENE-7301 : Multiple doc values updates to the same document within one update batch could be applied in the wrong order resulting in the wrong updated value
    (Ishan Chattopadhyaya, hossman, Mike McCandless)
  • LUCENE-7312 : Fix geo3d's x/y/z double to int encoding to ensure it always rounds down
    (Karl Wright, Mike McCandless)
  • LUCENE-7132 : BooleanQuery sometimes assigned too-low scores in cases where ranges of documents had only a single clause matching while other ranges had more than one clause matching
    (Ahmet Arslan, hossman, Mike McCandless)
  • LUCENE-7286 : Added support for highlighting SynonymQuery.
    (Adrien Grand)
  • LUCENE-7291 : Spatial heatmap faceting could mis-count when the heatmap crosses the dateline and indexed non-point shapes are much bigger than the heatmap region.
    (David Smiley)
  • LUCENE-7333 : Fix test bug where randomSimpleString() generated a filename that is a reserved device name on Windows.
    (Uwe Schindler, Mike McCandless)
  • Other (9)
  • LUCENE-7295 : TermAutomatonQuery.hashCode calculated Automaton.toDot().hash; the equivalence relationship has been replaced with object identity.
    (Dawid Weiss)
  • LUCENE-7277 : Make Query.hashCode and Query.equals abstract.
    (Paul Elschot, Dawid Weiss)
  • LUCENE-7174 : Upgrade randomizedtesting to 2.3.4.
    (Uwe Schindler, Dawid Weiss)
  • LUCENE-7205 : Remove repeated nl.getLength() calls in (Boolean|DisjunctionMax|FuzzyLikeThis)QueryBuilder.
    (Christine Poerschke)
  • LUCENE-7210 : Make TestCore*Parser's analyzer choice overridable
    (Christine Poerschke, Daniel Collins)
  • LUCENE-7263 : Make queryparser/xml/CoreParser's SpanQueryBuilderFactory accessible to deriving classes.
    (Daniel Collins via Christine Poerschke)
  • SOLR-9109 / SOLR-9121 : Allow specification of a custom Ivy settings file via system property "ivysettings.xml".
    (Misha Dmitriev, Christine Poerschke, Uwe Schindler, Steve Rowe)
  • LUCENE-7206 : Improve the ToParentBlockJoinQuery's explain by including the explain of the best matching child doc.
    (Ilya Kasnacheev, Jeff Evans via Martijn van Groningen)
  • LUCENE-7307 : Add getters to the PointInSetQuery and PointRangeQuery queries.
    (Martijn van Groningen, Adrien Grand)
  • Build (2)
  • LUCENE-7292 : Use '-release' instead of '-source/-target' during compilation on Java 9+ to ensure real cross-compilation.
    (Uwe Schindler)
  • LUCENE-7296 : Update forbiddenapis to version 2.1.
    (Uwe Schindler)
  • Release 6.0.1 [2016-05-28]

  • New Features (1)
  • LUCENE-7278 : Spatial-extras DateRangePrefixTree's Calendar is now configurable, to e.g. clear the Gregorian Change Date. Also, toString(cal) is now identical to DateTimeFormatter.ISO_INSTANT.
    (David Smiley)
  • Bug Fixes (10)
  • LUCENE-7187 : Block join queries' Weight#extractTerms(...) implementations should delegate to the wrapped weight.
    (Martijn van Groningen)
  • LUCENE-7209 : Fixed explanations of FunctionScoreQuery.
    (Adrien Grand)
  • LUCENE-7232 : Fixed InetAddressPoint.newPrefixQuery, which was generating an incorrect query when the prefix length was not a multiple of 8.
    (Adrien Grand)
  • LUCENE-7279 : JapaneseTokenizer throws ArrayIndexOutOfBoundsException on some valid inputs
    (Mike McCandless)
  • LUCENE-7188 : Remove an incorrect sanity check in NRTCachingDirectory.listAll() that led to an IllegalStateException being thrown when nothing was wrong.
    (David Smiley, yonik)
  • LUCENE-7219 : Make queryparser/xml (Point|LegacyNumeric)RangeQuery builders match the underlying queries' (lower|upper)Term optionality logic.
    (Kaneshanathan Srivisagan, Christine Poerschke)
  • LUCENE-7257 : Fixed PointValues#size(IndexReader, String), docCount, minPackedValue and maxPackedValue to skip leaves that do not have points rather than raising an IllegalStateException.
    (Adrien Grand)
  • LUCENE-7284 : GapSpans needs to implement positionsCost().
    (Daniel Bigham, Alan Woodward)
  • LUCENE-7231 : WeightedSpanTermExtractor didn't deal correctly with single-term phrase queries.
    (Eva Popenda, Alan Woodward)
  • LUCENE-7293 : Don't try to highlight GeoPoint queries
    (Britta Weber, Nick Knize, Mike McCandless, Uwe Schindler)
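The LUCENE-7232 bug concerned prefix lengths that are not a multiple of 8, where the prefix boundary falls inside a byte and that byte has to be partially masked. A sketch of computing the inclusive bounds of an IP prefix range with that partial-byte masking; the class PrefixBounds and its bounds method are hypothetical and this is not Lucene's newPrefixQuery code:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: inclusive [lower, upper] bounds of an IP prefix (CIDR) range.
// Not Lucene's InetAddressPoint.newPrefixQuery; it only shows the masking arithmetic.
class PrefixBounds {
    /** address should be an IP literal; a hostname would trigger a DNS lookup. */
    static String[] bounds(String address, int prefixLength) {
        try {
            byte[] addr = InetAddress.getByName(address).getAddress();
            if (prefixLength < 0 || prefixLength > 8 * addr.length) {
                throw new IllegalArgumentException("illegal prefixLength " + prefixLength);
            }
            byte[] lower = new byte[addr.length];
            byte[] upper = new byte[addr.length];
            for (int i = 0; i < addr.length; i++) {
                // Number of prefix bits inside byte i (0..8). The boundary byte is the one
                // where this is strictly between 0 and 8, e.g. 7 bits for a /23 in byte 2.
                int bits = Math.max(0, Math.min(8, prefixLength - 8 * i));
                int mask = (0xFF << (8 - bits)) & 0xFF;
                lower[i] = (byte) (addr[i] & mask);                     // clear the host bits
                upper[i] = (byte) ((addr[i] & mask) | (~mask & 0xFF));  // set the host bits
            }
            return new String[] {
                InetAddress.getByAddress(lower).getHostAddress(),
                InetAddress.getByAddress(upper).getHostAddress()
            };
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(" - ", bounds("192.168.1.77", 23))); // 192.168.0.0 - 192.168.1.255
    }
}
```

Masking only whole bytes would turn the /23 above into 192.168.1.0 - 192.168.1.255, silently dropping half of the range.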
  • Documentation (1)
  • LUCENE-7223 : Improve XXXPoint javadocs to make it clear that you should separately add StoredField if you want to retrieve these field values at search time
    (Greg Huber, Robert Muir, Mike McCandless)
  • Release 6.0.0 [2016-04-08]

  • System Requirements (2)
  • LUCENE-5950 : Move to Java 8 as minimum Java version.
    (Ryan Ernst, Uwe Schindler)
  • LUCENE-6069 : Lucene Core now gets compiled with Java 8 "compact1" profile, all other modules with "compact2".
    (Robert Muir, Uwe Schindler)
  • New Features (17)
  • LUCENE-6631 : Lucene Document classification
    (Tommaso Teofili, Alessandro Benedetti)
  • LUCENE-6747 : FingerprintFilter is a TokenFilter that outputs a single token which is a concatenation of the sorted and de-duplicated set of input tokens. Useful for normalizing short text in clustering/linking tasks.
    (Mark Harwood, Adrien Grand)
  • LUCENE-5735 : NumberRangePrefixTreeStrategy now includes interval/range faceting for counting ranges that align with the underlying terms as defined by the NumberRangePrefixTree (e.g. familiar date units like days).
    (David Smiley)
  • LUCENE-6711 : Use CollectionStatistics.docCount() for IDF and average field length computations, to avoid skew from documents that don't have the field.
    (Ahmet Arslan via Robert Muir)
  • LUCENE-6758 : Use docCount+1 for DefaultSimilarity's IDF, so that queries containing nonexistent fields won't distort queryNorm.
    (Terry Smith, Robert Muir)
  • SOLR-7876 : The QueryTimeout interface now has an isTimeoutEnabled method that can return false to exit from ExitableDirectoryReader wrapping at the point fields() is called.
    (yonik)
  • LUCENE-6825 : Add low-level support for block-KD trees
    (Mike McCandless)
  • LUCENE-6852 , LUCENE-6975 : Add support for points (dimensionally indexed values) to index, document and codec APIs, including a simple text implementation.
    (Mike McCandless)
  • LUCENE-6861 : Create Lucene60Codec, supporting points.
    (Mike McCandless)
  • LUCENE-6879 : Allow defining custom CharTokenizer instances without subclassing, using Java 8 lambdas or method references.
    (Uwe Schindler)
  • LUCENE-6881 : Cutover all BKD implementations to points
    (Mike McCandless)
  • LUCENE-6837 : Add N-best output support to JapaneseTokenizer.
    (Hiroharu Konno via Christian Moen)
  • LUCENE-6962 : Add per-dimension min/max to points
    (Mike McCandless)
  • LUCENE-6975 : Add ExactPointQuery, to match a single N-dimensional point
    (Robert Muir, Mike McCandless)
  • LUCENE-6989 : Add preliminary support for MMapDirectory unmapping in Java 9.
    (Uwe Schindler, Chris Hegarty, Peter Levart)
  • LUCENE-7040 : Upgrade morfologik-stemming to version 2.1.0.
    (Dawid Weiss)
  • LUCENE-7048 : Add XXXPoint.newSetQuery, to create a query that efficiently matches all documents containing any of the specified point values. This is the analog of TermsQuery, but for points instead.
    (Adrien Grand, Robert Muir, Mike McCandless)
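Why docCount+1 matters for LUCENE-6758: once IDF is computed from docCount (LUCENE-6711), a field that no document contains has docCount = 0, and a logarithm of 0 makes the weight, and with it queryNorm, degenerate. A sketch assuming the classic formula 1 + ln(docCount / (docFreq + 1)) before the change and 1 + ln((docCount + 1) / (docFreq + 1)) after it; the class ClassicIdf is hypothetical, and the exact formula should be checked against the similarity's Javadoc:

```java
// Sketch comparing a docCount-based classic IDF with and without the "+1"; illustrative only.
class ClassicIdf {
    /** 1 + ln(docCount / (docFreq + 1)): degenerates to -Infinity when docCount == 0. */
    static double idfWithoutPlusOne(long docFreq, long docCount) {
        return 1.0 + Math.log(docCount / (double) (docFreq + 1));
    }

    /** 1 + ln((docCount + 1) / (docFreq + 1)): finite for every docCount >= 0. */
    static double idfWithPlusOne(long docFreq, long docCount) {
        return 1.0 + Math.log((docCount + 1) / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // A query term on a field that no document contains: docFreq = 0, docCount = 0.
        System.out.println(idfWithoutPlusOne(0, 0)); // -Infinity
        System.out.println(idfWithPlusOne(0, 0));    // 1.0
    }
}
```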
  • API Changes (17)
  • LUCENE-7094 : BBoxStrategy and PointVectorStrategy now support PointValues (in addition to legacy numeric trie). Their APIs were changed a little and also made more consistent. PointValues/Trie is optional, DocValues is optional, stored value is optional.
    (Nick Knize, David Smiley)
  • LUCENE-6067 : Accountable.getChildResources has a default implementation returning the empty list.
    (Robert Muir)
  • LUCENE-6583 : FilteredQuery has been removed. Instead, you can construct a BooleanQuery with one MUST clause for the query, and one FILTER clause for the filter.
    (Adrien Grand)
  • LUCENE-6651 : AttributeImpl#reflectWith(AttributeReflector) was made abstract and has no reflection-based default implementation anymore.
    (Uwe Schindler)
  • LUCENE-6706 : PayloadTermQuery and PayloadNearQuery have been removed. Instead, use PayloadScoreQuery to wrap any SpanQuery.
    (Alan Woodward)
  • LUCENE-6829 : OfflineSorter, and the classes that use it (suggesters, hunspell) now do all temporary file IO via Directory instead of directly through Java's temp dir. Directory.createTempOutput creates a uniquely named IndexOutput, and the new IndexOutput.getName returns its name.
    (Dawid Weiss, Robert Muir, Mike McCandless)
  • LUCENE-6917 : Deprecate and rename NumericXXX classes to LegacyNumericXXX in favor of points
    (Mike McCandless)
  • LUCENE-6947 : SortField.missingValue is now protected. You can read its value using the new SortField.getMissingValue getter.
    (Adrien Grand)
  • LUCENE-7028 : Remove duplicate method in LegacyNumericUtils.
    (Uwe Schindler)
  • LUCENE-7052 , LUCENE-7053 : Remove custom comparators from BytesRef class and solely use natural byte[] comparator throughout codebase. This also simplifies API of BytesRefHash. It also replaces the natural comparator in ArrayUtil by Java 8's Comparator#naturalOrder().
    (Mike McCandless, Uwe Schindler, Robert Muir)
  • LUCENE-7060 : Update Spatial4j to 0.6. The package com.spatial4j.core is now org.locationtech.spatial4j.
    (David Smiley)
  • LUCENE-7058 : Add getters to various Query implementations
    (Guillaume Smet via Alan Woodward)
  • LUCENE-7064 : MultiPhraseQuery is now immutable and should be constructed with MultiPhraseQuery.Builder.
    (Luc Vanlerberghe via Adrien Grand)
  • LUCENE-7072 : Geo3DPoint always uses WGS84 planet model.
    (Robert Muir, Mike McCandless)
  • LUCENE-7056 : Geo3D classes are in different packages now.
    (David Smiley)
  • LUCENE-6952 : These classes are now abstract: FilterCodecReader, FilterLeafReader, FilterCollector, FilterDirectory. And some Filter* classes in lucene-test-framework too.
    (David Smiley)
  • SOLR-8867 : FunctionValues.getRangeScorer now takes a LeafReaderContext instead of an IndexReader, and avoids matching documents without a value in the field for numeric fields.
    (yonik)
  • Optimizations (5)
  • LUCENE-6891 : Use prefix coding when writing points in each leaf block in the default codec, to reduce the index size
    (Mike McCandless)
  • LUCENE-6901 : Optimize points indexing: use faster IntroSorter instead of InPlaceMergeSorter, and specialize 1D merging to merge sort the already sorted segments instead of re-indexing
    (Mike McCandless)
  • LUCENE-6793 : LegacyNumericRangeQuery.hashCode() is now less subject to hash collisions.
    (J.B. Langston via Adrien Grand)
  • LUCENE-7050 : TermsQuery is now cached more aggressively by the default query caching policy.
    (Adrien Grand)
  • LUCENE-7066 : PointRangeQuery is now optimized for the case where all documents have a value and all points from the segment match.
    (Adrien Grand)
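Prefix coding pays off in a leaf block because its values are sorted and therefore tend to share leading bytes. A sketch of the idea, assuming fixed-width values: find the prefix common to every value, store it once, then store only each value's suffix. The class LeafPrefixCoding and its byte layout are hypothetical and do not describe Lucene's actual leaf-block format:

```java
import java.io.ByteArrayOutputStream;

// Hypothetical shared-prefix encoding of one block of fixed-width values; not Lucene's format.
class LeafPrefixCoding {
    /** Length of the prefix shared by all (non-empty array of equal-length) values. */
    static int commonPrefixLength(byte[][] values) {
        int len = values[0].length;
        for (byte[] v : values) {
            int i = 0;
            while (i < len && v[i] == values[0][i]) {
                i++;
            }
            len = i; // shrink to what this value still shares with the first one
        }
        return len;
    }

    /** Layout: [prefixLength][prefix bytes][suffix of value 0][suffix of value 1]... */
    static byte[] encode(byte[][] values) {
        int prefix = commonPrefixLength(values);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(prefix);                            // assumes prefix < 256 in this sketch
        out.write(values[0], 0, prefix);              // shared prefix, stored once
        for (byte[] v : values) {
            out.write(v, prefix, v.length - prefix);  // per-value suffix
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[][] block = {{0, 0, 1, 7}, {0, 0, 1, 9}, {0, 0, 2, 3}};
        int raw = block.length * block[0].length;
        System.out.println(encode(block).length + " bytes instead of " + raw); // 9 bytes instead of 12
    }
}
```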
  • Changes in Runtime Behavior (3)
  • LUCENE-6789 : IndexSearcher's default Similarity is changed to BM25Similarity. Use ClassicSimilarity to get the old vector space DefaultSimilarity.
    (Robert Muir)
  • LUCENE-6886 : Reserve the .tmp file name extension for temp files, and codec components are no longer allowed to use this extension
    (Robert Muir, Mike McCandless)
  • LUCENE-6835 : Directory.listAll now returns entries in sorted order, to not leak platform-specific behavior, and "retrying file deletion" is now the responsibility of Directory.deleteFile, not the caller.
    (Robert Muir, Mike McCandless)
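For reference on the new default from LUCENE-6789, a sketch of the textbook BM25 term weight with k1 = 1.2 and b = 0.75 (the commonly used defaults, also BM25Similarity's) and the IDF variant ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). The class Bm25 is hypothetical, and Lucene's scorer differs in details such as lossy length-norm encoding, so this will not reproduce its scores exactly:

```java
// Textbook BM25 term weight; a sketch, not Lucene's BM25Similarity.
class Bm25 {
    static final double K1 = 1.2;  // controls how quickly term frequency saturates
    static final double B = 0.75;  // strength of document-length normalization

    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** tf: occurrences in the document; docLength and avgDocLength are field lengths in terms. */
    static double score(double tf, double docLength, double avgDocLength, long docFreq, long docCount) {
        double lengthNorm = K1 * (1 - B + B * docLength / avgDocLength);
        // Saturating tf component: tends to (K1 + 1) as tf grows, and equals 1 when
        // tf = 1 and the document has exactly the average length.
        double tfPart = tf * (K1 + 1) / (tf + lengthNorm);
        return idf(docFreq, docCount) * tfPart;
    }

    public static void main(String[] args) {
        System.out.println("idf              = " + idf(10, 1000));
        System.out.println("avg-length doc   = " + score(1, 100, 100, 10, 1000));
        System.out.println("double-length doc = " + score(1, 200, 100, 10, 1000));
    }
}
```

Unlike the classic tf-idf weight, the tf contribution is bounded, so repeating a term many times yields diminishing returns.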
  • Tests (1)
  • LUCENE-7009 : Add expectThrows utility to LuceneTestCase. This uses a lambda expression to encapsulate a statement that is expected to throw an exception.
    (Ryan Ernst)
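The expectThrows pattern wraps the statement under test in a lambda so that the test itself asserts which exception type comes out, and can then inspect it. A standalone sketch of such a helper; the class ExpectThrowsSketch and its ThrowingRunnable interface are written here for illustration, and LuceneTestCase's real helper may differ in naming and failure messages:

```java
// Standalone sketch of an expectThrows-style test helper; not LuceneTestCase's code.
class ExpectThrowsSketch {
    /** A block of code that may throw any Throwable (unlike java.lang.Runnable). */
    @FunctionalInterface
    interface ThrowingRunnable {
        void run() throws Throwable;
    }

    /** Runs the block and returns the thrown exception if it has the expected type. */
    static <T extends Throwable> T expectThrows(Class<T> expectedType, ThrowingRunnable runnable) {
        try {
            runnable.run();
        } catch (Throwable t) {
            if (expectedType.isInstance(t)) {
                return expectedType.cast(t); // hand it back so the caller can inspect it
            }
            throw new AssertionError("Unexpected exception type, expected "
                + expectedType.getSimpleName() + " but got " + t, t);
        }
        throw new AssertionError("Expected exception " + expectedType.getSimpleName()
            + " but no exception was thrown");
    }

    public static void main(String[] args) {
        NumberFormatException e =
            expectThrows(NumberFormatException.class, () -> Integer.parseInt("not a number"));
        System.out.println("caught: " + e.getMessage());
    }
}
```

Returning the exception, rather than just passing silently, lets a test go on to assert on its message or cause.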
  • Bug Fixes (7)
  • LUCENE-7065 : Fix the explain output for the global ordinals join query. Previously, the explanation would also report non-matching documents as matching, and with score mode average it would fail with an NPE.
    (Martijn van Groningen)
  • LUCENE-7101 : OfflineSorter had O(N^2) merge cost, and used too many temporary file descriptors, for large sorts
    (Mike McCandless)
  • LUCENE-7111 : DocValuesRangeQuery.newLongRange behaves incorrectly for Long.MAX_VALUE and Long.MIN_VALUE
    (Ishan Chattopadhyaya via Steve Rowe)
  • LUCENE-7139 : Fix bugs in geo3d's Vincenty surface distance implementation
    (Karl Wright via Mike McCandless)
  • LUCENE-7112 : WeightedSpanTermExtractor.extractUnknownQuery is only called on queries that could not be extracted.
    (Adrien Grand)
  • LUCENE-7126 : Remove GeoPointDistanceRangeQuery. This query was implemented with boolean NOT, and was incorrect for multi-valued documents.
    (Robert Muir)
  • LUCENE-7158 : Consistently use earth's WGS84 mean radius wherever our geo search implementations approximate the earth as a sphere
    (Karl Wright via Mike McCandless)
  • Other (5)
  • LUCENE-7035 : Upgrade icu4j to 56.1/unicode 8.
    (Robert Muir)
  • LUCENE-7087 : Let MemoryIndex#fromDocument(...) accept 'Iterable<? extends IndexableField>' as document instead of 'Document'.
    (Martijn van Groningen)
  • LUCENE-7091 : Add doc values support to MemoryIndex
    (Martijn van Groningen, David Smiley)
  • LUCENE-7093 : Add point values support to MemoryIndex
    (Martijn van Groningen, Mike McCandless)
  • LUCENE-7095 : Add point values support to the numeric field query time join.
    (Martijn van Groningen, Mike McCandless)
  • Release 5.5.5 [2017-10-24]

  • Changes in Runtime Behavior (1)
  • Resolving of external entities in queryparser/xml/CoreParser is disallowed by default. See SOLR-11477 for details.
  • Bug Fixes (2)
  • LUCENE-7419 : Fix performance bug with TokenStream.end(), where it would lookup PositionIncrementAttribute every time.
    (Mike McCandless, Robert Muir)
  • SOLR-11477 : Disallow resolving of external entities in queryparser/xml/CoreParser by default.
    (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
  • Release 5.5.4 [2017-02-15]

  • Bug Fixes (8)
  • LUCENE-7417 : The standard Highlighter could throw an IllegalArgumentException when trying to highlight a query containing a degenerate case of a MultiPhraseQuery with one term.
    (Thomas Kappler via David Smiley)
  • LUCENE-7657 : Fixed potential memory leak in the case that a (Span)TermQuery with a TermContext is cached.
    (Adrien Grand)
  • LUCENE-7647 : Made stored fields reclaim native memory more aggressively when configured with BEST_COMPRESSION. This could otherwise result in out-of-memory issues.
    (Adrien Grand)
  • LUCENE-7562 : CompletionFieldsConsumer sometimes throws NullPointerException on ghost fields
    (Oliver Eilhard via Mike McCandless)
  • LUCENE-7547 : JapaneseTokenizerFactory was failing to close the dictionary file it opened
    (Markus via Mike McCandless)
  • LUCENE-6914 : Fixed DecimalDigitFilter in case of supplementary code points.
    (Hossman)
  • LUCENE-7440 : Document id skipping (PostingsEnum.advance) could throw an ArrayIndexOutOfBoundsException on large index segments (>1.8B docs) with large skips.
    (yonik)
  • LUCENE-7570 : IndexWriter may deadlock if a commit is running while there are too many merges running and one of the merges hits a tragic exception
    (Joey Echeverria via Mike McCandless)
  • Other (1)
  • LUCENE-6989 : Backport MMapDirectory's unmapping code from Lucene 6.4 to use MethodHandles. This allows it to work with Java 9 (EA build 150 and later).
    (Uwe Schindler)
  • Build (3)
  • LUCENE-7543 : Make changes-to-html target an offline operation, by moving the Lucene and Solr DOAP RDF files into the Git source repository under dev-tools/doap/ and then pulling release dates from those files, rather than from JIRA.
    (Mano Kovacs, hossman, Steve Rowe)
  • LUCENE-7596 : Update Groovy to version 2.4.8 to allow building with Java 9 build 148+. Also update JGit version for working-copy checks. This does not fix all issues with Java 9, but allows building the distribution.
    (Uwe Schindler)
  • LUCENE-7651 : Backport (Lucene 6.4.1) fix for Java 8u121 to allow the documentation build to inject "Google Code Prettify" without adding JavaScript to the Javadoc bottom parameter. Unfortunately, this fix disables Prettify if Javadocs are built with Java 7, as there is no generic way in Java 7 to inject JavaScript without breaking Java 8 (and possibly paid Java 7 security updates). This fix also updates Prettify to the latest version to work around a Google Chrome issue.
    (Uwe Schindler)
  • Release 5.5.3 [2016-09-09]

  • (No Changes)
  • Release 5.5.2 [2016-06-25]

  • Bug Fixes (11)
  • LUCENE-7065 : Fix the explain output for the global ordinals join query. Previously, the explanation would also report non-matching documents as matching, and with score mode average it would fail with an NPE.
    (Martijn van Groningen)
  • LUCENE-7111 : DocValuesRangeQuery.newLongRange behaves incorrectly for Long.MAX_VALUE and Long.MIN_VALUE
    (Ishan Chattopadhyaya via Steve Rowe)
  • LUCENE-7139 : Fix bugs in geo3d's Vincenty surface distance implementation
    (Karl Wright via Mike McCandless)
  • LUCENE-7187 : Block join queries' Weight#extractTerms(...) implementations should delegate to the wrapped weight.
    (Martijn van Groningen)
  • LUCENE-7279 : JapaneseTokenizer throws ArrayIndexOutOfBoundsException on some valid inputs
    (Mike McCandless)
  • LUCENE-7219 : Make queryparser/xml (Point|LegacyNumeric)RangeQuery builders match the underlying queries' (lower|upper)Term optionality logic.
    (Kaneshanathan Srivisagan, Christine Poerschke)
  • LUCENE-7284 : GapSpans needs to implement positionsCost().
    (Daniel Bigham, Alan Woodward)
  • LUCENE-7231 : WeightedSpanTermExtractor didn't deal correctly with single-term phrase queries.
    (Eva Popenda, Alan Woodward)
  • LUCENE-7301 : Multiple doc values updates to the same document within one update batch could be applied in the wrong order resulting in the wrong updated value
    (Ishan Chattopadhyaya, hossman, Mike McCandless)
  • LUCENE-7132 : BooleanQuery sometimes assigned too-low scores in cases where ranges of documents had only a single clause matching while other ranges had more than one clause matching
    (Ahmet Arslan, hossman, Mike McCandless)
  • LUCENE-7291 : Spatial heatmap faceting could mis-count when the heatmap crosses the dateline and indexed non-point shapes are much bigger than the heatmap region.
    (David Smiley)
  • Release 5.5.1 [2016-05-05]

  • Bug Fixes (3)
  • LUCENE-7112 : WeightedSpanTermExtractor.extractUnknownQuery is only called on queries that could not be extracted.
    (Adrien Grand)
  • LUCENE-7188 : Remove an incorrect sanity check in NRTCachingDirectory.listAll() that led to an IllegalStateException being thrown when nothing was wrong.
    (David Smiley, yonik)
  • LUCENE-7209 : Fixed explanations of FunctionScoreQuery.
    (Adrien Grand)
  • Release 5.5.0 [2016-02-22]

  • New Features (6)
  • LUCENE-5868 : JoinUtil.createJoinQuery(..,NumericType,..) query-time join for LONG and INT fields with NUMERIC and SORTED_NUMERIC doc values.
    (Alexey Zelin via Mikhail Khludnev)
  • LUCENE-6939 : Add exponential reciprocal scoring to BlendedInfixSuggester, to even more strongly favor suggestions that match closer to the beginning
    (Arcadius Ahouansou via Mike McCandless)
  • LUCENE-6958 : Improved CustomAnalyzer to take class references to factories as an alternative to their SPI names. This enables compile-time safety when defining an analyzer's components.
    (Uwe Schindler, Shai Erera)
  • LUCENE-6818 , LUCENE-6986 : Add DFISimilarity implementing the divergence from independence model.
    (Ahmet Arslan via Robert Muir)
  • SOLR-4619 : Added removeAllAttributes() to AttributeSource, which removes all previously added attributes.
  • LUCENE-7010 : Added MergePolicyWrapper to allow easy wrapping of other policies.
    (Shai Erera)
  • API Changes (10)
  • LUCENE-6997 : Refactor sandboxed GeoPointField and query classes to the lucene-spatial module under the new lucene.spatial.geopoint package
    (Nick Knize)
  • LUCENE-6908 : GeoUtils static relational methods have been refactored to new GeoRelationUtils and now correctly handle large irregular rectangles, and pole crossing distance queries.
    (Nick Knize)
  • LUCENE-6900 : Grouping sortWithinGroup variables used to allow null to mean Sort.RELEVANCE. Null is no longer permitted.
    (David Smiley)
  • LUCENE-6919 : The Scorer class has been refactored to expose an iterator instead of extending DocIdSetIterator. asTwoPhaseIterator() has been renamed to twoPhaseIterator() for consistency.
    (Adrien Grand)
  • LUCENE-6973 : TeeSinkTokenFilter no longer accepts a SinkFilter (the latter has been removed). If you wish to filter the sinks, you can wrap them with any other TokenFilter (e.g. a FilteringTokenFilter). Also, you can no longer add a SinkTokenStream to an existing TeeSinkTokenFilter. If you need to share multiple streams with a single sink, chain them with multiple TeeSinkTokenFilters. DateRecognizerSinkFilter was renamed to DateRecognizerFilter and moved under analysis/common. TokenTypeSinkFilter was removed (use TypeTokenFilter instead). TokenRangeSinkFilter was removed.
    (Shai Erera, Uwe Schindler)
  • LUCENE-6980 : Default applyAllDeletes to true when opening near-real-time readers
    (Mike McCandless)
  • LUCENE-6981 : SpanQuery.getTermContexts() helper methods are now public, and SpanScorer has a public getSpans() method.
    (Alan Woodward)
  • LUCENE-6932 : IndexInput.seek implementations now throw EOFException if you seek beyond the end of the file
    (Adrien Grand, Mike McCandless)
  • LUCENE-6988 : IndexableField.tokenStream() no longer throws IOException
    (Alan Woodward)
  • LUCENE-7028 : Deprecate a duplicate method in NumericUtils.
    (Uwe Schindler)
  • Optimizations (9)
  • LUCENE-6930 : Decouple GeoPointField from NumericType by using a custom and efficient GeoPointTokenStream and TermEnum designed for GeoPoint prefix terms.
    (Nick Knize)
  • LUCENE-6951 : Improve GeoPointInPolygonQuery using a point-orientation-based line crossing algorithm, and match multi-valued docs when at least one point satisfies the polygon criteria.
    (Nick Knize)
  • LUCENE-6889 : BooleanQuery.rewrite now performs some query optimization, in particular to rewrite queries that look like: "+*:* #filter" to a "ConstantScore(filter)".
    (Adrien Grand)
  • LUCENE-6912 : Grouping's Collectors now calculate a response to needsScores() instead of always 'true'.
    (David Smiley)
  • LUCENE-6815 : DisjunctionScorer now advances two-phase iterators lazily, stopping evaluation as soon as a single one matches. The other iterators are confirmed lazily when computing score() or freq().
    (Adrien Grand)
  • LUCENE-6926 : MUST_NOT clauses now use the match cost API to run the slow bits last whenever possible.
    (Adrien Grand)
  • LUCENE-6944 : BooleanWeight no longer creates sub-scorers if BS1 is not applicable.
    (Adrien Grand)
  • LUCENE-6940 : MUST_NOT clauses execute faster, especially when they are sparse.
    (Adrien Grand)
  • LUCENE-6470 : Improve efficiency of TermsQuery constructors.
    (Robert Muir)
  • Bug Fixes (10)
  • LUCENE-6976 : BytesRefTermAttributeImpl.copyTo threw an NPE if the BytesRef was null. Added equals & hashCode, and a new test for these things.
    (David Smiley)
  • LUCENE-6932 : RAMDirectory's IndexInput was failing to throw EOFException in some cases
    (Stéphane Campinas, Adrien Grand via Mike McCandless)
  • LUCENE-6896 : Don't treat the smallest possible norm value as an infinitely long document in SimilarityBase or BM25Similarity. Add more warnings to sims that will not work well with extreme tf values.
    (Ahmet Arslan, Robert Muir)
  • LUCENE-6984 : SpanMultiTermQueryWrapper no longer modifies its wrapped query.
    (Alan Woodward, Adrien Grand)
  • LUCENE-6998 : Fix a couple places to better detect truncated index files as corruption.
    (Robert Muir, Mike McCandless)
  • LUCENE-7002 : Fixed MultiCollector to not throw a NPE if setScorer is called after one of the sub collectors is done collecting.
    (John Wang, Adrien Grand)
  • LUCENE-7027 : Fixed NumericTermAttribute to not throw IllegalArgumentException after NumericTokenStream was exhausted.
    (Uwe Schindler, Lee Hinman, Mike McCandless)
  • LUCENE-7018 : Fix GeoPointTermQueryConstantScoreWrapper to add document on first GeoPointField match.
    (Nick Knize)
  • LUCENE-7019 : Add two-phase iteration to GeoPointTermQueryConstantScoreWrapper.
    (Robert Muir via Nick Knize)
  • LUCENE-6989 : Improve MMapDirectory's unmapping checks to catch more non-working cases. The unmap-hack does not yet work with recent Java 9. Official support will come with Lucene 6.
    (Uwe Schindler)
  • Other (15)
  • LUCENE-6924 : Upgrade randomizedtesting to 2.3.2.
    (Dawid Weiss)
  • LUCENE-6920 : Improve custom function checks in expressions module to use MethodHandles and work without extra security privileges.
    (Uwe Schindler, Robert Muir)
  • LUCENE-6921 : Fix SPIClassIterator#isParentClassLoader to not require extra permissions.
    (Uwe Schindler)
  • LUCENE-6923 : Fix RamUsageEstimator to access private fields inside AccessController block for computing size.
    (Robert Muir)
  • LUCENE-6907 : Make TestParser extendable, rename test/.../xml/ NumericRangeQueryQuery.xml to NumericRangeQuery.xml
    (Christine Poerschke)
  • LUCENE-6925 : Add ForceMergePolicy class in test-framework
    (Christine Poerschke)
  • LUCENE-6945 : Factor out TestCorePlus(Queries|Extensions)Parser from TestParser, rename TestParser to TestCoreParser
    (Christine Poerschke)
  • LUCENE-6949 : Fix (potential) resource leak in SynonymFilterFactory (https://scan.coverity.com/projects/5620 CID 120656)
    (Christine Poerschke, Coverity Scan (via Rishabh Patel))
  • LUCENE-6961 : Improve Exception handling in AnalysisFactories / AnalysisSPILoader: Don't wrap exceptions occurring in a factory's ctor inside InvocationTargetException.
    (Uwe Schindler)
  • LUCENE-6965 : The expressions module's JavascriptCompiler now throws ParseException with bad function names or bad arity instead of IllegalArgumentException.
    (Tomás Fernández Löbbe, Uwe Schindler, Ryan Ernst)
  • LUCENE-6964 : String-based signatures in JavascriptCompiler replaced with better compile-time-checked MethodType; generated class files are no longer marked as synthetic.
    (Uwe Schindler)
  • LUCENE-6978 : Refactor several code places that lookup locales by string name to use BCP47 locale tag instead. LuceneTestCase now also prints locales on failing tests this way. Locale#forLanguageTag() and Locale#toString() were placed on list of forbidden signatures.
    (Uwe Schindler, Robert Muir)
  • LUCENE-6988 : You can now add IndexableFields directly to a MemoryIndex, and create a MemoryIndex from a Lucene Document.
    (Alan Woodward)
  • LUCENE-7005 : TieredMergePolicy tweaks (>= vs. >, @see get vs. set)
    (Christine Poerschke)
  • LUCENE-7006 : increase BaseMergePolicyTestCase use (TestNoMergePolicy and TestSortingMergePolicy now extend it, TestUpgradeIndexMergePolicy added)
    (Christine Poerschke)
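LUCENE-6920 and LUCENE-6964 describe resolving custom expression functions through MethodHandles and a compile-time-checked MethodType instead of string signatures. The following is a minimal JDK-only sketch of that idea, not JavascriptCompiler's code. The class `FunctionLookupSketch` and the assumed `(double, double) -> double` function shape are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical helper (not JavascriptCompiler): resolve a function by a typed
// MethodType rather than by parsing a string such as "double max(double,double)".
final class FunctionLookupSketch {
    // Assumed function shape for this sketch: (double, double) -> double.
    static final MethodType BINARY =
        MethodType.methodType(double.class, double.class, double.class);

    static MethodHandle resolve(Class<?> owner, String name) {
        try {
            // publicLookup() only sees public members, so no setAccessible call is needed.
            return MethodHandles.publicLookup().findStatic(owner, name, BINARY);
        } catch (NoSuchMethodException | IllegalAccessException e) {
            throw new IllegalArgumentException(
                "no public static " + BINARY + " method '" + name + "' in " + owner.getName(), e);
        }
    }

    static double call(MethodHandle function, double a, double b) {
        try {
            // invokeExact requires the call-site type to match BINARY exactly.
            return (double) function.invokeExact(a, b);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

A name whose signature does not match the expected type, such as the one-argument `Math.abs(double)`, is rejected at lookup time rather than failing later at a call site.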
  • Release 5.4.1 [2016-01-23]

  • Bug Fixes (9)
  • LUCENE-6910 : fix 'if ... > Integer.MAX_VALUE' check in (Binary|Numeric)DocValuesFieldUpdates.merge ( https://scan.coverity.com/projects/5620 CID 119973 and CID 120081) (Christine Poerschke, Coverity Scan (via Rishabh Patel))
  • LUCENE-6946 : SortField.equals now takes the missingValue parameter into account.
    (Adrien Grand)
  • LUCENE-6918 : LRUQueryCache.onDocIdSetEviction is only called when at least one DocIdSet is being evicted.
    (Adrien Grand)
  • LUCENE-6929 : Fix SpanNotQuery rewriting to not drop the pre/post parameters.
    (Tim Allison via Adrien Grand)
  • LUCENE-6950 : Fix FieldInfos handling of UninvertingReader, e.g. do not hide the true docvalues update generation or other properties.
    (Ishan Chattopadhyaya via Robert Muir)
  • LUCENE-6948 : Fix ArrayIndexOutOfBoundsException in PagedBytes$Reader.fill by removing an unnecessary long-to-int cast.
    (Michael Lawley via Christine Poerschke)
  • SOLR-7865 : BlendedInfixSuggester was returning too many results
    (Arcadius Ahouansou via Mike McCandless)
  • LUCENE-6970 : Fixed off-by-one error in Lucene54DocValuesProducer that could potentially corrupt doc values.
    (Adrien Grand)
  • LUCENE-2229 : Fix Highlighter's SimpleSpanFragmenter so that multiple adjacent stop words following a span no longer make the fragment unduly long.
    (Elmer Garduno, Lukhnos Liu via David Smiley)
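LUCENE-6910 fixes an `if ... > Integer.MAX_VALUE` check. The sketch below shows the general class of bug using hypothetical method names; it is not the actual `DocValuesFieldUpdates.merge` code. An `int` that has already overflowed can never compare greater than `Integer.MAX_VALUE`, so the arithmetic has to be widened to `long` first.

```java
// Illustrative only: the bug class behind LUCENE-6910, not Lucene's code.
final class OverflowCheckSketch {
    // Buggy: a + b is computed in int arithmetic and wraps on overflow,
    // so this comparison can never be true.
    static boolean exceedsIntRangeBuggy(int a, int b) {
        int sum = a + b;
        return sum > Integer.MAX_VALUE; // always false
    }

    // Fixed: widen to long before adding, so the true sum is compared.
    static boolean exceedsIntRange(int a, int b) {
        return (long) a + b > Integer.MAX_VALUE;
    }
}
```

When an exception is preferable to a boolean check, `Math.addExact(int, int)` is an alternative. It throws `ArithmeticException` on overflow.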
  • Release 5.4.0 [2015-12-14]

  • New Features (9)
  • LUCENE-6875 : New Serbian Filter.
    (Nikola Smolenski via Robert Muir, Dawid Weiss)
  • LUCENE-6720 : New FunctionRangeQuery wrapper around ValueSourceScorer (returned from ValueSource/FunctionValues.getRangeScorer()).
    (David Smiley)
  • LUCENE-6724 : Add utility APIs to GeoHashUtils to compute neighbor geohash cells
    (Nick Knize via Mike McCandless)
  • LUCENE-6737 : Add DecimalDigitFilter which folds unicode digits to basic latin.
    (Robert Muir)
  • LUCENE-6699 : Add integration of BKD tree and geo3d APIs to give fast, very accurate query to find all indexed points within an earth-surface shape
    (Karl Wright, Mike McCandless)
  • LUCENE-6838 : Added IndexSearcher#getQueryCache and #getQueryCachingPolicy.
    (Adrien Grand)
  • LUCENE-6844 : PayloadScoreQuery can include or exclude underlying span scores from its score calculations
    (Bill Bell, Alan Woodward)
  • LUCENE-6778 : Add GeoPointDistanceRangeQuery, to search for points within a "ring" (beyond a minimum distance and below a maximum distance)
    (Nick Knize via Mike McCandless)
  • LUCENE-6874 : Add a new UnicodeWhitespaceTokenizer to analysis/common that uses Unicode character properties extracted from ICU4J to tokenize text on whitespace. This tokenizer will split on non-breaking space (NBSP), too.
    (David Smiley, Uwe Schindler, Steve Rowe)
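LUCENE-6874 notes that `UnicodeWhitespaceTokenizer` also splits on non-breaking space. The JDK's `Character.isWhitespace` deliberately excludes NBSP, so a tokenizer built on it would not split there. Below is a minimal JDK-only sketch, not Lucene's tokenizer. `WhitespaceSplitSketch` is hypothetical, and its `isUnicodeWhiteSpace` only approximates the ICU-derived Unicode White_Space property that the entry describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene's UnicodeWhitespaceTokenizer.
final class WhitespaceSplitSketch {
    // Approximation of Unicode White_Space: the controls TAB..CR, NEL (U+0085),
    // and all space/line/paragraph separators (Zs, Zl, Zp), which include
    // NO-BREAK SPACE (U+00A0).
    static boolean isUnicodeWhiteSpace(int cp) {
        return (cp >= 0x09 && cp <= 0x0D) || cp == 0x85 || Character.isSpaceChar(cp);
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the current token, or -1 if none
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isUnicodeWhiteSpace(cp)) {
                if (start >= 0) {
                    tokens.add(text.substring(start, i));
                    start = -1;
                }
            } else if (start < 0) {
                start = i;
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        if (start >= 0) {
            tokens.add(text.substring(start));
        }
        return tokens;
    }
}
```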
  • API Changes (12)
  • LUCENE-6590 : Query.setBoost(), Query.getBoost() and Query.clone() are gone. In order to apply boosts, you now need to wrap queries in a BoostQuery.
    (Adrien Grand)
  • LUCENE-6716 : SpanPayloadCheckQuery now takes a List<BytesRef> rather than a Collection<byte[]>.
    (Alan Woodward)
  • LUCENE-6489 : The various span payload queries have been moved to the queries submodule, and PayloadSpanUtil is now in sandbox.
    (Alan Woodward)
  • LUCENE-6650 : The spatial module no longer uses Filter in any way. All spatial Filters are now subclass Query. The spatial heatmap/facet API now accepts a Bits parameter to filter counts.
    (David Smiley, Adrien Grand)
  • LUCENE-6803 : Deprecate sandbox Regexp Query.
    (Uwe Schindler)
  • LUCENE-6301 : org.apache.lucene.search.Filter is now deprecated. You should use Query objects instead of Filters, and the BooleanClause.Occur.FILTER clause in order to let Lucene know that a Query should be used for filtering but not scoring.
  • LUCENE-6939 : SpanOrQuery.addClause is now deprecated, clauses should all be provided at construction time.
    (Paul Elschot via Adrien Grand)
  • LUCENE-6855 : CachingWrapperQuery is deprecated and will be removed in 6.0.
    (Adrien Grand)
  • LUCENE-6870 : DisjunctionMaxQuery#add is now deprecated, clauses should all be provided at construction time.
    (Adrien Grand)
  • LUCENE-6884 : Analyzer.tokenStream() and Tokenizer.setReader() are no longer declared as throwing IOException.
    (Alan Woodward)
  • LUCENE-6849 : Expose IndexWriter.flush() method, to move all in-memory segments to disk without opening a near-real-time reader or calling fsync
    (Robert Muir, Simon Willnauer, Mike McCandless)
  • LUCENE-6911 : Add correct StandardQueryParser.getMultiFields() method, deprecate no-op StandardQueryParser.getMultiFields(CharSequence[]) method. (Christine Poerschke, Mikhail Khludnev, Coverity Scan (via Rishabh Patel))
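Under LUCENE-6590, you apply a boost by wrapping, e.g. `new BoostQuery(query, 2f)`, rather than by calling `query.setBoost(2f)`. The toy model below uses hypothetical `ToyQuery`, `ToyTermQuery` and `ToyBoostQuery` classes, which are not Lucene's. It shows one consequence of the wrapping design, assuming queries are used as cache keys: the wrapped query is never mutated, and the boost takes part in the wrapper's `equals`/`hashCode`.

```java
import java.util.Objects;

// Hypothetical stand-ins for illustration; these are not Lucene classes.
abstract class ToyQuery {
    // Score contribution for a document in which the term occurs termFreq times.
    abstract float score(int termFreq);
}

final class ToyTermQuery extends ToyQuery {
    private final String term;

    ToyTermQuery(String term) { this.term = Objects.requireNonNull(term); }

    @Override float score(int termFreq) { return (float) Math.sqrt(termFreq); }

    @Override public boolean equals(Object o) {
        return o instanceof ToyTermQuery && ((ToyTermQuery) o).term.equals(term);
    }

    @Override public int hashCode() { return term.hashCode(); }
}

// The boost lives in an immutable wrapper instead of a setter on the wrapped
// query, so no query's equals/hashCode can change after construction.
final class ToyBoostQuery extends ToyQuery {
    private final ToyQuery query;
    private final float boost;

    ToyBoostQuery(ToyQuery query, float boost) {
        this.query = Objects.requireNonNull(query);
        this.boost = boost;
    }

    @Override float score(int termFreq) { return boost * query.score(termFreq); }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ToyBoostQuery)) return false;
        ToyBoostQuery other = (ToyBoostQuery) o;
        return Float.compare(boost, other.boost) == 0 && query.equals(other.query);
    }

    @Override public int hashCode() { return 31 * query.hashCode() + Float.hashCode(boost); }
}
```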
  • Optimizations (18)
  • LUCENE-6708 : TopFieldCollector does not compute the score several times on the same document anymore.
    (Adrien Grand)
  • LUCENE-6720 : ValueSourceScorer, returned from FunctionValues.getRangeScorer(), now uses TwoPhaseIterator.
    (David Smiley)
  • LUCENE-6756 : MatchAllDocsQuery now has a dedicated BulkScorer for better performance when used as a top-level query.
    (Adrien Grand)
  • LUCENE-6746 : DisjunctionMaxQuery, BoostingQuery and BoostedQuery now create sub weights through IndexSearcher so that they can be cached.
    (Adrien Grand)
  • LUCENE-6754 : Optimized IndexSearcher.count for the cases when it can use index statistics instead of collecting all matches.
    (Adrien Grand)
  • LUCENE-6773 : Nested conjunctions now iterate over documents as if clauses were all at the same level.
    (Adrien Grand)
  • LUCENE-6777 : Reuse BytesRef when visiting term ranges in GeoPointTermsEnum to reduce GC pressure
    (Nick Knize via Mike McCandless)
  • LUCENE-6779 : Reduce memory allocated by CompressingStoredFieldsWriter to write strings larger than 64kb by an amount equal to the string's UTF-8 size.
    (Dawid Weiss, Robert Muir, shalin)
  • LUCENE-6850 : Optimize BooleanScorer for sparse clauses.
    (Adrien Grand)
  • LUCENE-6840 : Ordinal indexes for SORTED_SET/SORTED_NUMERIC fields and addresses for BINARY fields are now stored on disk instead of in memory.
    (Adrien Grand)
  • LUCENE-6878 : Speed up TopDocs.merge.
    (Daniel Jelinski via Adrien Grand)
  • LUCENE-6885 : StandardDirectoryReader (initialCapacity) tweaks
    (Christine Poerschke)
  • LUCENE-6863 : Optimized storage requirements of doc values fields when less than 1% of documents have a value.
    (Adrien Grand)
  • LUCENE-6892 : various lucene.index initialCapacity tweaks
    (Christine Poerschke)
  • LUCENE-6276 : Added TwoPhaseIterator.matchCost(), which allows confirming the least costly TwoPhaseIterators first.
    (Paul Elschot via Adrien Grand)
  • LUCENE-6898 : In the default codec, the last stored field value will not be fully read from disk if the supplied StoredFieldVisitor doesn't want it. So put your largest text field value last to benefit.
    (David Smiley)
  • LUCENE-6909 : Remove unnecessary synchronized from FacetsConfig.getDimConfig for better concurrency
    (Sanne Grinovero via Mike McCandless)
  • SOLR-7730 : Speed up SlowCompositeReaderWrapper.getSortedSetDocValues() by avoiding merging FieldInfos just to check doc value type.
    (Paul Vasilyev, Yuriy Pakhomov, Mikhail Khludnev, yonik)
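LUCENE-6773 says nested conjunctions now iterate as if all clauses were at the same level. The sketch below shows the flat "leapfrog" intersection that such flattening enables. Each clause is advanced to the current candidate, and any clause that overshoots supplies the next candidate. `LeapfrogSketch` is hypothetical and works on sorted `int[]` doc-ID lists rather than Lucene's iterators.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene's conjunction code. The clauses of nested
// conjunctions are assumed to be flattened into one list, so
// (a AND (b AND c)) is intersected exactly like (a AND b AND c).
final class LeapfrogSketch {

    // First index i >= from with docs[i] >= target, or docs.length if none.
    // A linear scan stands in for the skip data a real index would use.
    private static int advance(int[] docs, int from, int target) {
        int i = from;
        while (i < docs.length && docs[i] < target) i++;
        return i;
    }

    // Each array holds one clause's matching doc IDs in ascending order.
    static List<Integer> intersect(int[]... clauses) {
        List<Integer> matches = new ArrayList<>();
        if (clauses.length == 0 || clauses[0].length == 0) return matches;
        int[] cursor = new int[clauses.length];
        int candidate = clauses[0][0];
        while (true) {
            boolean allMatch = true;
            for (int c = 0; c < clauses.length; c++) {
                cursor[c] = advance(clauses[c], cursor[c], candidate);
                if (cursor[c] == clauses[c].length) return matches; // clause exhausted
                int doc = clauses[c][cursor[c]];
                if (doc > candidate) { // overshoot: leapfrog to this doc and restart
                    candidate = doc;
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                matches.add(candidate);
                if (candidate == Integer.MAX_VALUE) return matches;
                candidate++;
            }
        }
    }
}
```

A real implementation would advance using skip data instead of a linear scan, and would typically lead with the cheapest clause.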
  • Bug Fixes (19)
  • LUCENE-6905 : Unwrap center longitude for dateline crossing GeoPointDistanceQuery.
    (Nick Knize)
  • LUCENE-6817 : ComplexPhraseQueryParser.ComplexPhraseQuery does not display slop in toString().
    (Ahmet Arslan via Dawid Weiss)
  • LUCENE-6730 : Hyper-parameter c is ignored in term frequency NormalizationH1.
    (Ahmet Arslan via Robert Muir)
  • LUCENE-6742 : Lovins & Finnish implementation of SnowballFilter was fixed to behave exactly as specified. A bug in the snowball compiler caused differences in output of the filter in comparison to the original test data. In addition, the performance of those filters was improved significantly.
    (Uwe Schindler, Robert Muir)
  • LUCENE-6783 : Removed side effects from FuzzyLikeThisQuery.rewrite.
    (Adrien Grand)
  • LUCENE-6776 : Fix geo3d math to handle randomly squashed planet models
    (Karl Wright via Mike McCandless)
  • LUCENE-6792 : Fix TermsQuery.toString() to work with binary terms.
    (Ruslan Muzhikov, Robert Muir)
  • LUCENE-5503 : When Highlighter's WeightedSpanTermExtractor converts a PhraseQuery to an equivalent SpanQuery, it would sometimes use a slop that is too low (no highlight) or determine inOrder wrong.
    (Tim Allison via David Smiley)
  • LUCENE-6790 : Fix IndexWriter thread safety when one thread is handling a tragic exception but another is still committing
    (Mike McCandless)
  • LUCENE-6810 : Upgrade to Spatial4j 0.5 -- fixes some edge-case bugs in the spatial module. See https://github.com/locationtech/spatial4j/blob/master/CHANGES.md
    (David Smiley)
  • LUCENE-6813 : OfflineSorter no longer removes its output Path up front, and instead opens it for write with the StandardCopyOption.REPLACE_EXISTING to overwrite any prior file, so that callers can safely use Files.createTempFile for the output. This change also fixes OfflineSorter's default temp directory when running tests to use mock filesystems so e.g. we detect file handle leaks
    (Dawid Weiss, Robert Muir, Mike McCandless)
  • LUCENE-6813 : RangeTreeWriter was failing to close all file handles it opened, leading to intermittent failures on Windows
    (Dawid Weiss, Robert Muir, Mike McCandless)
  • LUCENE-6826 : Fix ClassCastException when merging a field that has no terms because they were filtered out by e.g. a FilterCodecReader
    (Trejkaz via Mike McCandless)
  • LUCENE-6823 : LocalReplicator should use System.nanoTime as its clock source for checking for expiration
    (Ishan Chattopadhyaya via Mike McCandless)
  • LUCENE-6856 : The Weight wrapper used by LRUQueryCache now delegates to the original Weight's BulkScorer when applicable.
    (Adrien Grand)
  • LUCENE-6858 : Fix ContextSuggestField to correctly wrap token stream when using CompletionAnalyzer.
    (Areek Zillur)
  • LUCENE-6872 : IndexWriter handles any VirtualMachineError, not just OOM, as tragic.
    (Robert Muir)
  • LUCENE-6814 : PatternTokenizer no longer hangs onto heap sized to the maximum input string it's ever seen, which can be a large memory "leak" if you tokenize large strings with many threads across many indices
    (Alex Chow via Mike McCandless)
  • LUCENE-6888 : Explain output of map() function now also prints default value
    (janhoy)
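LUCENE-6823 switches LocalReplicator's expiration check to `System.nanoTime`. Below is a minimal sketch of such a check, assuming a simple created-time-plus-TTL model; `ExpirationSketch` is hypothetical and is not LocalReplicator's code. `nanoTime` is meant for measuring elapsed time and is not tied to the wall clock, unlike `currentTimeMillis`. Its readings should only be compared by subtraction.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch in the spirit of LUCENE-6823, not LocalReplicator's code.
final class ExpirationSketch {
    private final long createdNanos;
    private final long ttlNanos;

    ExpirationSketch(long ttl, TimeUnit unit) {
        this(System.nanoTime(), unit.toNanos(ttl));
    }

    // Takes an explicit clock reading so tests can inject one.
    ExpirationSketch(long createdNanos, long ttlNanos) {
        this.createdNanos = createdNanos;
        this.ttlNanos = ttlNanos;
    }

    boolean isExpired() {
        return isExpiredAt(System.nanoTime());
    }

    // Compare the elapsed difference, never raw nanoTime values: nanoTime has
    // an arbitrary origin, and the subtraction stays correct even if it wraps.
    boolean isExpiredAt(long nowNanos) {
        return nowNanos - createdNanos > ttlNanos;
    }
}
```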
  • Other (26)
  • LUCENE-6899 : Upgrade randomizedtesting to 2.3.1.
    (Dawid Weiss)
  • LUCENE-6478 : Test execution can hang with java.security.debug.
    (Dawid Weiss)
  • LUCENE-6862 : Upgrade of RandomizedRunner to version 2.2.0.
    (Dawid Weiss)
  • LUCENE-6857 : Validate StandardQueryParser with NOT operator within parentheses.
    (Jigar Shah via Dawid Weiss)
  • LUCENE-6827 : Use explicit capacity ArrayList instead of a LinkedList in MultiFieldQueryNodeProcessor.
    (Dawid Weiss)
  • LUCENE-6812 : Upgrade RandomizedTesting to 2.1.17.
    (Dawid Weiss)
  • LUCENE-6174 : Improve "ant eclipse" to select right JRE for building.
    (Uwe Schindler, Dawid Weiss)
  • LUCENE-6417 , LUCENE-6830 : Upgrade ANTLR used in expressions module to version 4.5.1-1.
    (Jack Conradson, Uwe Schindler)
  • LUCENE-6729 : Upgrade ASM used in expressions module to version 5.0.4.
    (Uwe Schindler)
  • LUCENE-6738 : remove IndexWriterConfig.[gs]etIndexingChain
    (Christine Poerschke)
  • LUCENE-6755 : more tests of ToChildBlockJoinScorer.advance
    (hossman)
  • LUCENE-6571 : fix some private access level javadoc errors and warnings
    (Cao Manh Dat, Christine Poerschke)
  • LUCENE-6768 : AbstractFirstPassGroupingCollector.groupSort private member is not needed.
    (Christine Poerschke)
  • LUCENE-6761 : MatchAllDocsQuery's Scorers do not expose approximations anymore.
    (Adrien Grand)
  • LUCENE-6775 , LUCENE-6833 : Improved MorfologikFilterFactory to allow loading of custom dictionaries from ResourceLoader. Upgraded Morfologik to version 2.0.1. The 'dictionary' attribute has been reverted back and now points at the dictionary resource to be loaded instead of the default Polish dictionary.
    (Uwe Schindler, Dawid Weiss)
  • LUCENE-6797 : Make GeoCircle an interface and use a factory to create it, to eventually handle degenerate cases
    (Karl Wright via Mike McCandless)
  • LUCENE-6800 : Use XYZSolidFactory to create XYZSolids
    (Karl Wright via Mike McCandless)
  • LUCENE-6798 : Geo3d now models degenerate (too tiny) circles as a single point
    (Karl Wright via Mike McCandless)
  • LUCENE-6770 : Add javadocs that FSDirectory canonicalizes the path.
    (Uwe Schindler, Vladimir Kuzmin)
  • LUCENE-6795 : Fix various places where code used AccessibleObject#setAccessible() without a privileged block. Code without a hard requirement to do reflection was rewritten. This makes Lucene and Solr ready for Java 9 Jigsaw's module system, where reflection on Java's runtime classes is very restricted.
    (Robert Muir, Uwe Schindler)
  • LUCENE-6467 : Simplify Query.equals.
    (Paul Elschot via Adrien Grand)
  • LUCENE-6845 : SpanScorer is now merged into Spans
    (Alan Woodward, David Smiley)
  • LUCENE-6887 : DefaultSimilarity is deprecated, use ClassicSimilarity for equivalent behavior, or consider switching to BM25Similarity which will become the new default in Lucene 6.0
    (hossman)
  • LUCENE-6893 : factor out CorePlusQueriesParser from CorePlusExtensionsParser
    (Christine Poerschke)
  • LUCENE-6902 : Don't retry to fsync files / directories; fail immediately.
    (Daniel Mitterdorfer, Uwe Schindler)
  • LUCENE-6801 : Clarify in the JavaDocs of PhraseQuery that it does in fact support terms at the same position (as MultiPhraseQuery does), treating them like a conjunction. Added test.
    (David Smiley, Adrien Grand)
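LUCENE-6887 points users toward BM25Similarity as the upcoming default. The sketch below evaluates the textbook per-term factors of both models side by side. It assumes the commonly cited defaults k1 = 1.2 and b = 0.75, and it ignores norm encoding, boosts and query normalization. It is therefore not a reproduction of Lucene's `ClassicSimilarity` or `BM25Similarity`, and the `SimilaritySketch` class is hypothetical.

```java
// Hypothetical sketch of textbook scoring factors; not Lucene's similarities.
final class SimilaritySketch {
    // Classic TF-IDF style factors.
    static double classicTf(double freq) {
        return Math.sqrt(freq); // grows without bound as freq increases
    }

    static double classicIdf(long docFreq, long numDocs) {
        return 1 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // BM25 parameters (assumed defaults).
    static final double K1 = 1.2;
    static final double B = 0.75;

    static double bm25Idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Saturating term-frequency factor: approaches K1 + 1 as freq grows, and
    // is damped for documents longer than average (controlled by B).
    static double bm25Tf(double freq, double docLen, double avgDocLen) {
        double norm = K1 * (1 - B + B * docLen / avgDocLen);
        return freq * (K1 + 1) / (freq + norm);
    }
}
```

The practical difference is saturation. For example, `bm25Tf(1000, 10, 10)` stays below 2.2 (that is, k1 + 1), while `classicTf(1000)` is about 31.6.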
  • Build (2)
  • LUCENE-6732 : Improve checker for invalid source patterns to also detect javadoc-style license headers. Use Groovy to implement the checks instead of plain Ant.
    (Uwe Schindler)
  • LUCENE-6594 : Update forbiddenapis to 2.0.
    (Uwe Schindler)
  • Tests (1)
  • LUCENE-6752 : Add Math#random() to forbiddenapis.
    (Uwe Schindler, Mikhail Khludnev, Andrei Beliakov)
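LUCENE-6752 forbids `Math#random()`. The usual reason, stated here as an assumption, is reproducibility. `Math.random()` draws from a hidden global generator, so a failing randomized test cannot be replayed from its seed. Below is a minimal sketch with a hypothetical helper that takes its `Random` as a parameter.

```java
import java.util.Random;

// Hypothetical test helper: the caller supplies a seeded Random instead of
// relying on Math.random(), so a run can be replayed from the same seed.
final class SeededRandomSketch {
    static int randomDocCount(Random random) {
        return 1 + random.nextInt(1000); // 1..1000 documents
    }
}
```

Two `Random` instances created with the same seed produce the same sequence, so reporting the seed is enough to reproduce the same document count.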
  • Changes in Backwards Compatibility Policy (1)
  • LUCENE-6742 : The Lovins & Finnish implementations of SnowballFilter were fixed to now behave exactly like the original Snowball stemmer. If you have indexed text using those stemmers you may need to reindex.
    (Uwe Schindler, Robert Muir)
  • Changes in Runtime Behavior (3)
  • LUCENE-6772 : MultiCollector now catches CollectionTerminatedException and removes the collector that threw this exception from the list of sub collectors to collect.
    (Adrien Grand)
  • LUCENE-6784 : IndexSearcher's query caching is enabled by default. Run indexSearcher.setQueryCache(null) to disable.
    (Adrien Grand)
  • LUCENE-6305 : BooleanQuery.equals and hashcode do not depend on the order of clauses anymore.
    (Adrien Grand)
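LUCENE-6305 makes `BooleanQuery.equals` and `hashCode` independent of clause order. Below is a toy illustration, not Lucene's implementation. The hypothetical `ToyBooleanQuery` stores its clauses as a multiset, so reordering does not matter but duplicates still do. Encoding each clause as a string such as `MUST:title:lucene` is also an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy, not Lucene's BooleanQuery. A multiset (clause -> count)
// ignores order but keeps duplicates: in a summing scorer, two identical
// SHOULD clauses contribute twice, so they must not compare equal to one.
final class ToyBooleanQuery {
    private final Map<String, Integer> clauseCounts = new HashMap<>();

    ToyBooleanQuery(List<String> clauses) {
        for (String clause : clauses) {
            clauseCounts.merge(clause, 1, Integer::sum);
        }
    }

    @Override public boolean equals(Object o) {
        return o instanceof ToyBooleanQuery
            && ((ToyBooleanQuery) o).clauseCounts.equals(clauseCounts);
    }

    @Override public int hashCode() {
        return clauseCounts.hashCode(); // Map.hashCode sums entry hashes: order-independent
    }
}
```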
  • Release 5.3.2 [2016-01-23]

  • Bug Fixes (1)
  • SOLR-7865 : BlendedInfixSuggester was returning too many results
    (Arcadius Ahouansou via Mike McCandless)
  • Release 5.3.1 [2015-09-24]

  • Bug Fixes (3)
  • LUCENE-6774 : Remove classloader hack in MorfologikFilter.
    (Robert Muir, Uwe Schindler)
  • LUCENE-6748 : UsageTrackingQueryCachingPolicy no longer caches trivial queries like MatchAllDocsQuery.
    (Adrien Grand)
  • LUCENE-6781 : Fixed BoostingQuery to rewrite wrapped queries.
    (Adrien Grand)
  • Tests (1)
  • LUCENE-6760 , SOLR-7958 : Move TestUtil#randomWhitespace to the only Solr test that is using it. The method is not useful for Lucene tests (and easily breaks, e.g., in Java 9 caused by Unicode version updates).
    (Uwe Schindler)
  • Release 5.3.0 [2015-08-21]

  • New Features (31)
  • LUCENE-6485 : Add CustomSeparatorBreakIterator to postings highlighter which splits on any character. For example, it can be used with getMultiValueSeparator to render whole field values.
    (Luca Cavanna via Robert Muir)
  • LUCENE-6459 : Add common suggest API that mirrors Lucene's Query/IndexSearcher APIs for Document based suggester. Adds PrefixCompletionQuery, RegexCompletionQuery, FuzzyCompletionQuery and ContextQuery.
    (Areek Zillur via Mike McCandless)
  • LUCENE-6487 : Spatial Geo3D API now has a WGS84 ellipsoid world model option.
    (Karl Wright via David Smiley)
  • LUCENE-6477 : Add experimental BKD geospatial tree doc values format and queries, for fast "bbox/polygon contains lat/lon points"
    (Mike McCandless)
  • LUCENE-6526 : Asserting(Query|Weight|Scorer) now ensure scores are not computed if they are not needed.
    (Adrien Grand)
  • LUCENE-6481 : Add GeoPointField, GeoPointInBBoxQuery, GeoPointInPolygonQuery for simple "indexed lat/lon point in bbox/shape" searching.
    (Nick Knize via Mike McCandless)
  • LUCENE-5954 : The segments_N commit point now stores the Lucene version that wrote the commit as well as the Lucene version that wrote the oldest segment in the index, for faster checking of "too old" indices
    (Ryan Ernst, Robert Muir, Mike McCandless)
  • LUCENE-6519 : BKDPointInPolygonQuery is much faster by avoiding the per-hit polygon check when a leaf cell is fully contained by the polygon.
    (Nick Knize, Mike McCandless)
  • LUCENE-6549 : Add preload option to MMapDirectory.
    (Robert Muir)
  • LUCENE-6504 : Add Lucene53Codec, with norms implemented directly via the Directory's RandomAccessInput api.
    (Robert Muir)
  • LUCENE-6539 : Add new DocValuesNumbersQuery, to match any document containing one of the specified long values. This change also moves the existing DocValuesTermsQuery and DocValuesRangeQuery to Lucene's sandbox module, since in general these queries are quite slow and are only fast in specific cases.
    (Adrien Grand, Robert Muir, Mike McCandless)
  • LUCENE-6577 : Give earlier and better error message for invalid CRC.
    (Robert Muir)
  • LUCENE-6544 : Geo3D: (1) Regularize path & polygon construction, (2) add PlanetModel.surfaceDistance() (ellipsoidal calculation), (3) cache lat & lon in GeoPoint, (4) add thread-safety where missing -- Geo3dShape.
    (Karl Wright, David Smiley)
  • LUCENE-6606 : SegmentInfo.toString now confesses how the documents were sorted, when SortingMergePolicy was used
    (Christine Poerschke via Mike McCandless)
  • LUCENE-6524 : IndexWriter can now be initialized from an already open near-real-time or non-NRT reader.
    (Boaz Leskes, Robert Muir, Mike McCandless)
  • LUCENE-6578 : Geo3D can now compute the distance from a point to a shape, both inner distance and to an outside edge. Multiple distance algorithms are available.
    (Karl Wright, David Smiley)
  • LUCENE-6632 : Geo3D: Compute circle planes more accurately.
    (Karl Wright via David Smiley)
  • LUCENE-6653 : Added general purpose BytesTermAttribute to basic token attributes package that can be used for TokenStreams that solely produce binary terms.
    (Uwe Schindler)
  • LUCENE-6365 : Add Operations.topoSort, to run topological sort of the states in an Automaton
    (Markus Heiden via Mike McCandless)
  • LUCENE-6365 : Replace Operations.getFiniteStrings with a more scalable iterator API (FiniteStringsIterator)
    (Markus Heiden via Mike McCandless)
  • LUCENE-6589 : Add a new org.apache.lucene.search.join.CheckJoinIndex class that can be used to validate that an index has an appropriate structure to run join queries.
    (Adrien Grand)
  • LUCENE-6659 : Remove IndexWriter's unnecessary hard limit on max concurrency
    (Robert Muir, Mike McCandless)
  • LUCENE-6547 : Add GeoPointDistanceQuery, matching all points within the specified distance from the center point. Fix GeoPointInBBoxQuery to handle dateline crossing.
  • LUCENE-6694 : Add LithuanianAnalyzer and LithuanianStemmer.
    (Dainius Jocas via Robert Muir)
  • LUCENE-6695 : Added a new BlendedTermQuery to blend statistics across several terms.
    (Simon Willnauer, Adrien Grand)
  • LUCENE-6706 : Added a new PayloadScoreQuery that generalises the behaviour of PayloadTermQuery and PayloadNearQuery to all Span queries.
    (Alan Woodward)
  • LUCENE-6697 : Add experimental range tree doc values format and queries, based on a 1D version of the spatial BKD tree, for a faster and smaller alternative to postings-based numeric and binary term filtering. Range trees can also handle values larger than 64 bits.
    (Adrien Grand, Mike McCandless)
  • LUCENE-6647 : Add GeoHash string utility APIs
    (Nick Knize via Mike McCandless)
  • LUCENE-6710 : GeoPointField now uses full 64 bits (up from 62) to encode lat/lon
    (Nick Knize via Mike McCandless)
  • LUCENE-6580 : SpanNearQuery now allows defined-width gaps in its subqueries
    (Alan Woodward, Adrien Grand)
  • LUCENE-6712 : Use doc values to post-filter GeoPointField hits that fall in boundary cells, resulting in smaller index, faster searches and less heap used for each query
    (Nick Knize via Mike McCandless)
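LUCENE-6365 adds `Operations.topoSort` for ordering automaton states. To sketch what a topological sort computes, here is Kahn's algorithm over a hypothetical adjacency-list representation of states. This is not Lucene's implementation or its `Automaton` API, and it rejects cyclic input.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch (Kahn's algorithm), not Lucene's Operations.topoSort.
final class TopoSortSketch {
    // transitions.get(s) lists the destination states of state s.
    static int[] topoSort(List<List<Integer>> transitions) {
        int n = transitions.size();
        int[] inDegree = new int[n];
        for (List<Integer> dests : transitions) {
            for (int d : dests) inDegree[d]++;
        }
        Deque<Integer> ready = new ArrayDeque<>(); // states with no unvisited predecessors
        for (int s = 0; s < n; s++) {
            if (inDegree[s] == 0) ready.add(s);
        }
        int[] order = new int[n];
        int count = 0;
        while (!ready.isEmpty()) {
            int s = ready.poll();
            order[count++] = s;
            for (int d : transitions.get(s)) {
                if (--inDegree[d] == 0) ready.add(d);
            }
        }
        if (count != n) {
            // Some states were never released: the graph contains a cycle.
            throw new IllegalArgumentException("graph has a cycle");
        }
        return order;
    }
}
```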
  • API Changes (20)
  • LUCENE-6508 : Simplify Lock api, there is now just Directory.obtainLock() which returns a Lock that can be released (or fails with exception). Add lock verification to IndexWriter. Improve exception messages when locking fails.
    (Uwe Schindler, Mike McCandless, Robert Muir)
  • LUCENE-6371 , LUCENE-6490 : Payload collection from Spans is moved to a more generic SpanCollector framework. Spans no longer implements .hasPayload() and .getPayload() methods, and instead exposes a collect() method that allows the collection of arbitrary postings information. SpanPayloadCheckQuery and SpanPayloadNearCheckQuery have moved from the .spans package to the .payloads package.
    (Alan Woodward, David Smiley, Paul Elschot, Robert Muir)
  • LUCENE-6529 : Removed an optimization in UninvertingReader that was causing incorrect results for Numeric fields using precisionStep
    (hossman, Robert Muir)
  • LUCENE-6551 : Add missing ConcurrentMergeScheduler.getAutoIOThrottle getter
    (Simon Willnauer, Mike McCandless)
  • LUCENE-6552 : Add MergePolicy.OneMerge.getMergeInfo and rename setInfo to setMergeInfo
    (Simon Willnauer, Mike McCandless)
  • LUCENE-6525 : Deprecate IndexWriterConfig's writeLockTimeout.
    (Robert Muir)
  • LUCENE-6583 : FilteredQuery is deprecated and will be removed in 6.0. It should be replaced with a BooleanQuery which handles the query as a MUST clause and the filter as a FILTER clause.
    (Adrien Grand)
  • LUCENE-6553 : The postings, spans and scorer APIs no longer take an acceptDocs parameter. Live docs are now always checked on top of these APIs.
    (Adrien Grand)
  • LUCENE-6634 : PKIndexSplitter now takes a Query instead of a Filter to decide how to split an index.
    (Adrien Grand)
  • LUCENE-6643 : GroupingSearch from lucene/grouping was changed to take a Query object to define groups instead of a Filter.
    (Adrien Grand)
  • LUCENE-6554 : ToParentBlockJoinFieldComparator was removed because of a bug with missing values that could not be fixed. ToParentBlockJoinSortField now works with string or numeric doc values selectors. Sorting on anything other than a string or numeric field would require implementing a custom selector.
    (Adrien Grand)
  • LUCENE-6648 : All lucene/facet APIs now take Query objects where they used to take Filter objects.
    (Adrien Grand)
  • LUCENE-6640 : Suggesters now take a BitsProducer object instead of a Filter object to reduce the scope of doc IDs that may be returned, emphasizing the fact that these objects need to support random-access.
    (Adrien Grand)
  • LUCENE-6646 : Make EarlyTerminatingCollector take a Sort object directly instead of a SortingMergePolicy.
    (Christine Poerschke via Adrien Grand)
  • LUCENE-6649 : BitDocIdSetFilter and BitDocIdSetCachingWrapperFilter are now deprecated in favour of BitSetProducer and QueryBitSetProducer, which do not extend oal.search.Filter.
    (Adrien Grand)
  • LUCENE-6607 : Factor out geo3d into its own spatial3d module.
    (Karl Wright, Nick Knize, David Smiley, Mike McCandless)
  • LUCENE-6531 : PhraseQuery is now immutable and can be built using the PhraseQuery.Builder class.
    (Adrien Grand)
  • LUCENE-6570 : BooleanQuery is now immutable and can be built using the BooleanQuery.Builder class.
    (Adrien Grand)
  • LUCENE-6702 : NRTSuggester: Add a method to inject context values at index time in ContextSuggestField. Simplify ContextQuery logic for extracting contexts and add dedicated method to consider all context values at query time.
    (Areek Zillur, Mike McCandless)
  • LUCENE-6719 : NumericUtils getMinInt, getMaxInt, getMinLong, getMaxLong now return null if there are no terms for the specified field; previously these methods returned primitive values and raised an undocumented NullPointerException if there were no terms for the field.
    (hossman, Timothy Potter)
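LUCENE-6531 and LUCENE-6570 make `PhraseQuery` and `BooleanQuery` immutable, built through `PhraseQuery.Builder` and `BooleanQuery.Builder`. The hypothetical `ToyPhraseQuery` below sketches the pattern; it does not reproduce Lucene's classes or method signatures. The builder is mutable, and `build()` copies its state so that later builder calls cannot alter an already-built query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy, not Lucene's PhraseQuery.
final class ToyPhraseQuery {
    private final List<String> terms;
    private final int slop;

    private ToyPhraseQuery(List<String> terms, int slop) {
        // Defensive, unmodifiable copy: the query never shares state with its builder.
        this.terms = Collections.unmodifiableList(new ArrayList<>(terms));
        this.slop = slop;
    }

    List<String> terms() { return terms; }

    int slop() { return slop; }

    static final class Builder {
        private final List<String> terms = new ArrayList<>();
        private int slop;

        Builder add(String term) { terms.add(term); return this; }

        Builder setSlop(int slop) { this.slop = slop; return this; }

        ToyPhraseQuery build() { return new ToyPhraseQuery(terms, slop); }
    }
}
```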
  • Bug fixes (27)
  • LUCENE-6500 : ParallelCompositeReader did not always call closed listeners. This was fixed by LUCENE-6501 .
    (Adrien Grand, Uwe Schindler)
  • LUCENE-6520 : Geo3D GeoPath.done() would throw an NPE if adjacent path segments were co-linear.
    (Karl Wright via David Smiley)
  • LUCENE-5805 : QueryNodeImpl.removeFromParent was doing nothing in a costly manner
    (Christoph Kaser, Cao Manh Dat via Mike McCandless)
  • LUCENE-6533 : SlowCompositeReaderWrapper no longer caches its live docs instance since this can prevent future improvements like a disk-backed live docs
    (Adrien Grand, Mike McCandless)
  • LUCENE-6558 : Highlighters now work with CustomScoreQuery
    (Cao Manh Dat via Mike McCandless)
  • LUCENE-6560 : BKDPointInBBoxQuery now handles "dateline crossing" correctly
    (Nick Knize, Mike McCandless)
  • LUCENE-6564 : Change PrintStreamInfoStream to use thread safe Java 8 ISO-8601 date formatting (in Lucene 5.x use Java 7 FileTime#toString as workaround); fix output of tests to use same format.
    (Uwe Schindler, Ramkumar Aiyengar)
  • LUCENE-6593 : Fixed ToChildBlockJoinQuery's scorer to not refuse to advance to a document that belongs to the parent space.
    (Adrien Grand)
  • LUCENE-6591 : Never write a negative vLong
    (Robert Muir, Ryan Ernst, Adrien Grand, Mike McCandless)
  • LUCENE-6588 : Fix how ToChildBlockJoinQuery deals with acceptDocs.
    (Christoph Kaser via Adrien Grand)
  • LUCENE-6597 : Geo3D's GeoCircle now supports a world-globe diameter.
    (Karl Wright via David Smiley)
  • LUCENE-6608 : Fix potential resource leak in BigramDictionary.
    (Rishabh Patel via Uwe Schindler)
  • LUCENE-6614 : Improve partition detection in IOUtils#spins() so it works with NVMe drives.
    (Uwe Schindler, Mike McCandless)
  • LUCENE-6586 : Fix typo in GermanStemmer, causing possible wrong value for substCount.
    (Christoph Kaser via Mike McCandless)
  • LUCENE-6658 : Fix IndexUpgrader to also upgrade indexes without any segments.
    (Trejkaz, Uwe Schindler)
  • LUCENE-6677 : QueryParserBase fails to enforce maxDeterminizedStates when creating a WildcardQuery
    (David Causse via Mike McCandless)
  • LUCENE-6680 : Preserve two suggestions that have same key and weight but different payloads
    (Arcadius Ahouansou via Mike McCandless)
  • LUCENE-6681 : SortingMergePolicy must override MergePolicy.size(...).
    (Christine Poerschke via Adrien Grand)
  • LUCENE-6682 : StandardTokenizer performance bug: scanner buffer is unnecessarily copied when maxTokenLength doesn't change. Also stop silently maxing out buffer size (and effectively also max token length) at 1M chars, but instead throw an exception from setMaxTokenLength() when the given length is greater than 1M chars.
    (Piotr Idzikowski, Steve Rowe)
  • LUCENE-6696 : Fix FilterDirectoryReader.close() to never close the underlying reader several times.
    (Adrien Grand)
  • LUCENE-6334 : FastVectorHighlighter failed to highlight phrases across more than one value in a multi-valued field.
    (Chris Earle, Nik Everett via Mike McCandless)
  • LUCENE-6704 : GeoPointDistanceQuery was visiting too many term ranges, consuming too much heap for a large radius
    (Nick Knize via Mike McCandless)
  • SOLR-5882 : fix ScoreMode.Min at ToParentBlockJoinQuery
    (Mikhail Khludnev)
  • LUCENE-6718 : JoinUtil.createJoinQuery failed to rewrite queries before creating a Weight.
    (Adrien Grand)
  • LUCENE-6713 : TooComplexToDeterminizeException claims to be serializable but wasn't
    (Simon Willnauer, Mike McCandless)
  • LUCENE-6723 : Fix date parsing problems in Java 9 with date formats using English weekday/month names.
    (Uwe Schindler)
  • LUCENE-6618 : Properly set MMapDirectory.UNMAP_SUPPORTED when unmapping is not allowed by the security policy.
    (Robert Muir)
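LUCENE-6591 ensures that a negative vLong is never written. The sketch below is a hypothetical `VLongSketch`, not Lucene's `DataOutput` code. It shows why negative values are a problem for this kind of format: with 7 payload bits per byte and the high bit as a continuation flag, any non-negative long fits in at most 9 bytes. A negative long has its sign bit set and would need 10. The encoder therefore rejects negative input.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of a 7-bits-per-byte variable-length long encoding.
final class VLongSketch {
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("cannot write negative vLong: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // low 7 bits, more bytes follow
            value >>>= 7;
        }
        out.write((int) value); // final byte: continuation bit clear
        return out.toByteArray();
    }

    static long decode(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break; // last byte of this value
            shift += 7;
        }
        return value;
    }
}
```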
  • Changes in Runtime Behavior (12)
  • LUCENE-6501 : The subreader structure in ParallelCompositeReader was flattened, because the current implementation had too many hidden bugs regarding refcounting and close listeners. If you create a new ParallelCompositeReader, it will just take all leaves of the passed readers and form a flat structure of ParallelLeafReaders instead of trying to assemble the original structure of composite and leaf readers.
    (Adrien Grand, Uwe Schindler)
  • LUCENE-6537 : NearSpansOrdered no longer tries to minimize its Span matches. This means that the matching algorithm is entirely lazy. All spans returned by the previous implementation are still reported, but matching documents may now also return additional spans that were previously discarded in preference to shorter overlapping ones.
    (Alan Woodward, Adrien Grand, Paul Elschot)
  • LUCENE-6538 : Also include java.vm.version and java.runtime.version in per-segment diagnostics
    (Robert Muir, Mike McCandless)
  • LUCENE-6569 : Optimize MultiFunction.anyExists and allExists to eliminate excessive array creation in common 2 argument usage
    (Jacob Graves, hossman)
  • LUCENE-2880 : Span queries now score more consistently with regular queries.
    (Robert Muir, Adrien Grand)
  • LUCENE-6601 : FilteredQuery now always rewrites to a BooleanQuery which handles the query as a MUST clause and the filter as a FILTER clause. LEAP_FROG_QUERY_FIRST_STRATEGY and LEAP_FROG_FILTER_FIRST_STRATEGY do not guarantee anymore which iterator will be advanced first, it will depend on the respective costs of the iterators. QUERY_FIRST_FILTER_STRATEGY and RANDOM_ACCESS_FILTER_STRATEGY still consume the filter using its random-access API, however the returned bits may be called on different documents compared to before.
    (Adrien Grand)
  • LUCENE-6542 : FSDirectory's ctor now works with security policies or file systems that restrict write access.
    (Trejkaz, hossman, Uwe Schindler)
  • LUCENE-6651 : The default implementation of AttributeImpl#reflectWith(AttributeReflector) now uses AccessController#doPrivileged() to do the reflection. Please consider implementing this method in all your custom attributes, because the method will be made abstract in Lucene 6.
    (Uwe Schindler)
  • LUCENE-6639 : LRUQueryCache and CachingWrapperQuery now consider a query as "used" when the first Scorer is pulled instead of when a Scorer is pulled on the first segment on an index.
    (Terry Smith, Adrien Grand)
  • LUCENE-6579 : IndexWriter now sacrifices (closes) itself to protect the index when an unexpected, tragic exception strikes while merging.
    (Robert Muir, Mike McCandless)
  • LUCENE-6691 : SortingMergePolicy.isSorted now considers FilterLeafReader instances. EarlyTerminatingSortingCollector.terminatedEarly accessor added. TestEarlyTerminatingSortingCollector.testTerminatedEarly test added.
    (Christine Poerschke)
  • LUCENE-6609 : Add getSortField impls to many subclasses of FieldCacheSource which return the most direct SortField implementation. In many trivial sort-by-ValueSource usages, this will result in less RAM usage and more precise sorting of extreme values, since values are no longer converted to double.
    (hossman)
  • Optimizations (9)
  • LUCENE-6548 : Some optimizations for BlockTree's intersect with very finite automata
    (Mike McCandless)
  • LUCENE-6585 : Flatten conjunctions and conjunction approximations into parent conjunctions. For example a sloppy phrase query of "foo bar"~5 with a filter of "baz" will internally leapfrog foo,bar,baz as one conjunction.
    (Ryan Ernst, Robert Muir, Adrien Grand)
  • LUCENE-6325 : Reduce RAM usage of FieldInfos, and speed up lookup by number, by using an array instead of TreeMap except in very sparse cases
    (Robert Muir, Mike McCandless)
  • LUCENE-6617 : Reduce heap usage for small FSTs
    (Mike McCandless)
  • LUCENE-6616 : IndexWriter now lists the files in the index directory only once on init, and IndexFileDeleter no longer suppresses FileNotFoundException and NoSuchFileException. This also improves IndexFileDeleter to delete segments_N files last, so that in the presence of a virus checker, the index is never left in a state where an expired segments_N references non-existing files
    (Robert Muir, Mike McCandless)
  • LUCENE-6645 : Optimized the way we merge postings lists in multi-term queries and TermsQuery. This should especially help when there are lots of small postings lists.
    (Adrien Grand, Mike McCandless)
  • LUCENE-6668 : Optimized storage for sorted set and sorted numeric doc values in the case that there are few unique sets of values.
    (Adrien Grand, Robert Muir)
  • LUCENE-6690 : Sped up MultiTermsEnum.next() on high-cardinality fields.
    (Adrien Grand)
  • LUCENE-6621 : Removed two unused variables in analysis/stempel/src/java/org/egothor/stemmer/Compile.java
    (Rishabh Patel via Christine Poerschke)
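LUCENE-6585 above describes leapfrogging foo, bar and baz as a single flat conjunction. The Java sketch below shows the general leapfrog idea on plain sorted int arrays of doc IDs; it is a simplified illustration under that assumption, not Lucene's conjunction code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a flat leapfrog conjunction; not Lucene's own implementation.
final class LeapfrogConjunction {

    /** Returns the doc IDs present in every list; each list must be sorted ascending. */
    static int[] intersect(int[]... lists) {
        List<Integer> out = new ArrayList<>();
        if (lists.length == 0) {
            return new int[0];
        }
        int[] cursor = new int[lists.length]; // forward-only position in each list
        int target = -1;
        outer:
        while (true) {
            // The first list acts as the leader and proposes the next candidate doc.
            cursor[0] = advance(lists[0], cursor[0], target + 1);
            if (cursor[0] == lists[0].length) {
                break;
            }
            target = lists[0][cursor[0]];
            for (int i = 1; i < lists.length; i++) {
                cursor[i] = advance(lists[i], cursor[i], target);
                if (cursor[i] == lists[i].length) {
                    break outer; // one list is exhausted, so no further doc can match
                }
                int doc = lists[i][cursor[i]];
                if (doc > target) {
                    // Overshoot: make the leader leapfrog to this doc and start over.
                    target = doc - 1;
                    continue outer;
                }
            }
            out.add(target); // every list is positioned on the same doc
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Moves forward from index {@code from} to the first entry >= target (linear scan). */
    private static int advance(int[] list, int from, int target) {
        int i = from;
        while (i < list.length && list[i] < target) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        int[] foo = {1, 3, 5, 7, 9, 12};
        int[] bar = {2, 3, 5, 8, 9, 12};
        int[] baz = {3, 4, 9, 12, 15};
        System.out.println(Arrays.toString(intersect(foo, bar, baz))); // prints [3, 9, 12]
    }
}
```

The first array leads here only for simplicity. Real postings iterators advance using skip data rather than a linear scan, and Lucene's conjunction logic takes iterator costs into account (see the cost() API mentioned under LUCENE-6275); flattening nested conjunctions means a single loop like this one drives all terms instead of nested loops.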
  • Build (6)
  • LUCENE-6518 : Don't report false thread leaks from IBM J9 ClassCache Reaper in test framework.
    (Dawid Weiss)
  • LUCENE-6567 : Simplify payload checking in SpanPayloadCheckQuery
    (Alan Woodward)
  • LUCENE-6568 : Make rat invocation depend on ivy configuration being set up
    (Ramkumar Aiyengar)
  • LUCENE-6683 : ivy-fail goal directs people to non-existent page
    (Mike Drob via Steve Rowe)
  • LUCENE-6693 : Updated Groovy to 2.4.4, Pegdown to 1.5, Svnkit to 1.8.10. Also fixed some PermGen errors while running full build caused by these updates: Tasks are now installed from root's build.xml.
    (Uwe Schindler)
  • LUCENE-6741 : Fix jflex files to regenerate the java files correctly.
    (Uwe Schindler)
  • Test Framework (4)
  • LUCENE-6637 : Fix FSTTester to not violate file permissions on -Dtests.verbose=true.
    (Mesbah M. Alam, Uwe Schindler)
  • LUCENE-6542 : LuceneTestCase now has runWithRestrictedPermissions() to run an action with reduced permissions. This can be used to simulate special environments (e.g., read-only dirs). If tests are running without a security manager, an assume cancels test execution automatically.
    (Uwe Schindler)
  • LUCENE-6652 : Removed lots of useless Byte(s)TermAttributes all over test infrastructure.
    (Uwe Schindler)
  • LUCENE-6563 : Improve MockFileSystemTestCase.testURI to check if a path can be encoded according to local filesystem requirements. Otherwise stop test execution.
    (Christine Poerschke via Uwe Schindler)
  • Changes in Backwards Compatibility Policy (4)
  • LUCENE-6553 : The iterator returned by the LeafReader.postings method now always includes deleted docs, so you have to check for deleted documents on top of the iterator.
    (Adrien Grand)
  • LUCENE-6633 : DuplicateFilter has been deprecated and will be removed in 6.0. DiversifiedTopDocsCollector can be used instead with a maximum number of hits per key equal to 1.
    (Adrien Grand)
  • LUCENE-6653 : The workflow for consuming the TermToBytesRefAttribute was changed: getBytesRef() now does all the work and is called on each token; fillBytesRef() was removed. The implementation is free to reuse the internal BytesRef or return a new one on each call.
    (Uwe Schindler)
  • LUCENE-6682 : StandardTokenizer.setMaxTokenLength() now throws an exception if a length greater than 1M chars is given. Previously the effective max token length (the scanner's buffer) was capped at 1M chars, but getMaxTokenLength() incorrectly returned the previously requested length, even when it exceeded 1M.
    (Piotr Idzikowski, Steve Rowe)
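The DiversifiedTopDocsCollector replacement mentioned for LUCENE-6633 keeps at most a given number of hits per key, so a maximum of 1 acts like de-duplication. A non-streaming Java sketch of that selection rule follows, assuming all scored hits and their keys are already in memory; the real collector works incrementally with priority queues, and the Hit type and collect method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, non-streaming sketch of "top N hits, at most K per key".
final class DiversifiedTopN {

    /** A scored hit plus the key used for diversification (hypothetical type). */
    static final class Hit {
        final int doc;
        final float score;
        final String key;

        Hit(int doc, float score, String key) {
            this.doc = doc;
            this.score = score;
            this.key = key;
        }
    }

    /** Returns the doc IDs of up to topN hits, best first, with at most maxHitsPerKey per key. */
    static int[] collect(List<Hit> hits, int topN, int maxHitsPerKey) {
        List<Hit> sorted = new ArrayList<>(hits);
        // Highest score first; ties broken by lower doc ID so the result is deterministic.
        sorted.sort(Comparator.comparingDouble((Hit h) -> -h.score).thenComparingInt(h -> h.doc));
        Map<String, Integer> perKey = new HashMap<>();
        List<Integer> out = new ArrayList<>();
        for (Hit h : sorted) {
            if (out.size() == topN) {
                break;
            }
            int seen = perKey.getOrDefault(h.key, 0);
            if (seen < maxHitsPerKey) { // skip hits whose key already used up its quota
                perKey.put(h.key, seen + 1);
                out.add(h.doc);
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(0, 3.0f, "a"), new Hit(1, 2.5f, "a"), new Hit(2, 2.0f, "b"),
                new Hit(3, 1.5f, "a"), new Hit(4, 1.0f, "c"));
        // maxHitsPerKey = 1 behaves like de-duplication on the key.
        System.out.println(Arrays.toString(collect(hits, 3, 1))); // prints [0, 2, 4]
        System.out.println(Arrays.toString(collect(hits, 3, 2))); // prints [0, 1, 2]
    }
}
```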
  • Release 5.2.1 [2015-06-15]

  • Bug Fixes (4)
  • LUCENE-6482 : Fix class loading deadlock relating to Codec initialization, default codec and SPI discovery.
    (Shikhar Bhushan, Uwe Schindler)
  • LUCENE-6523 : NRT readers now reflect a new commit even if there is no change to the commit user data
    (Mike McCandless)
  • LUCENE-6527 : Queries now get a dummy Similarity when scores are not needed in order to not load unnecessary information like norms.
    (Adrien Grand)
  • LUCENE-6559 : TimeLimitingCollector now also checks for timeout when a new leaf reader is pulled, i.e. when moving from one segment to another, even if no hit has been collected.
    (Simon Willnauer)
  • Release 5.2.0 [2015-06-07]

  • New Features (16)
  • LUCENE-6308 , LUCENE-6385 , LUCENE-6391 : Span queries now share document conjunction/intersection code with boolean queries, and use two-phased iterators for faster intersection by avoiding loading positions in certain cases.
    (Paul Elschot, Terry Smith, Robert Muir via Mike McCandless)
  • LUCENE-6393 : Add two-phase support to SpanPositionCheckQuery and its subclasses: SpanPositionRangeQuery, SpanPayloadCheckQuery, SpanNearPayloadCheckQuery, SpanFirstQuery.
    (Paul Elschot, Robert Muir)
  • LUCENE-6394 : Add two-phase support to SpanNotQuery and refactor FilterSpans to just have an accept(Spans candidate) method for subclasses.
    (Robert Muir)
  • LUCENE-6373 : SpanOrQuery shares disjunction logic with boolean queries, and supports two-phased iterators to avoid loading positions when possible.
    (Paul Elschot via Robert Muir)
  • LUCENE-6352 , LUCENE-6472 : Added a new query time join to the join module that uses global ordinals, which is faster for subsequent joins between reopens.
    (Martijn van Groningen, Adrien Grand)
  • LUCENE-5879 : Added experimental auto-prefix terms to BlockTree terms dictionary, exposed as AutoPrefixPostingsFormat
    (Adrien Grand, Uwe Schindler, Robert Muir, Mike McCandless)
  • LUCENE-5579 : New CompositeSpatialStrategy combines speed of RPT with accuracy of SDV. Includes optimized Intersect predicate to avoid many geometry checks. Uses TwoPhaseIterator.
    (David Smiley)
  • LUCENE-5989 : Allow passing BytesRef to StringField to make it easier to index arbitrary binary tokens, and change the experimental StoredFieldVisitor.stringField API to take UTF-8 byte[] instead of String
    (Mike McCandless)
  • LUCENE-6389 : Added ScoreMode.Min that aggregates the lowest child score to the parent hit.
    (Martijn van Groningen, Adrien Grand)
  • LUCENE-6423 : New LimitTokenOffsetFilter that limits tokens to those before a configured maximum start offset.
    (David Smiley)
  • LUCENE-6422 : New spatial PackedQuadPrefixTree, a generally more efficient choice than QuadPrefixTree, especially for high precision shapes. When used, you should typically disable RPT's pruneLeafyBranches option.
    (Nick Knize, David Smiley)
  • LUCENE-6451 : Expressions now support binding keys that look like zero-argument functions
    (Jack Conradson via Ryan Ernst)
  • LUCENE-6083 : Add SpanWithinQuery and SpanContainingQuery that return spans inside of / containing other spans.
    (Paul Elschot via Robert Muir)
  • LUCENE-6454 : Added distinction between member variable and method in expression helper VariableContext
    (Jack Conradson via Ryan Ernst)
  • LUCENE-6196 : New Spatial "Geo3d" API with partial Spatial4j integration. It is a set of shapes implemented using 3D planar geometry for calculating spatial relations on the surface of a sphere. Shapes include Point, BBox, Circle, Path (buffered line string), and Polygon.
    (Karl Wright via David Smiley)
  • LUCENE-6464 : Add a new expert lookup method to AnalyzingInfixSuggester to accept an arbitrary BooleanQuery to express how contexts should be filtered.
    (Arcadius Ahouansou via Mike McCandless)
  • Optimizations (10)
  • LUCENE-6379 : IndexWriter.deleteDocuments(Query...) now detects if one of the queries is MatchAllDocsQuery and just invokes the much faster IndexWriter.deleteAll in that case
    (Robert Muir, Adrien Grand, Mike McCandless)
  • LUCENE-6388 : Optimize SpanNearQuery when payloads are not present.
    (Robert Muir)
  • LUCENE-6421 : Defer reading of positions in MultiPhraseQuery until they are needed.
    (Robert Muir)
  • LUCENE-6392 : Highlighter: reduce memory of tokens in TokenStreamFromTermVector, and add maxStartOffset limit.
    (David Smiley)
  • LUCENE-6456 : Queries that generate doc id sets that are too large for the query cache are now skipped by the cache instead of causing everything else to be evicted.
    (Adrien Grand)
  • LUCENE-6455 : Require a minimum index size to enable query caching in order not to cache, e.g., on MemoryIndex.
    (Adrien Grand)
  • LUCENE-6330 : BooleanScorer (used for top-level disjunctions) no longer decodes norms when not necessary.
    (Adrien Grand)
  • LUCENE-6350 : TermsQuery is now compressed with PrefixCodedTerms.
    (Robert Muir, Mike McCandless, Adrien Grand)
  • LUCENE-6458 : Multi-term queries matching few terms per segment now execute like a disjunction.
    (Adrien Grand)
  • LUCENE-6360 : TermsQuery rewrites to a disjunction when there are 16 matching terms or less.
    (Adrien Grand)
  • Bug Fixes (16)
  • LUCENE-329 : Fix FuzzyQuery defaults to rank exact matches highest.
    (Mark Harwood, Adrien Grand)
  • LUCENE-6378 : Fix all RuntimeExceptions to throw the underlying root cause.
    (Varun Thacker, Adrien Grand, Mike McCandless)
  • LUCENE-6415 : TermsQuery.extractTerms is a no-op (used to throw an UnsupportedOperationException).
    (Adrien Grand)
  • LUCENE-6416 : BooleanQuery.extractTerms now only extracts terms from scoring clauses.
    (Adrien Grand)
  • LUCENE-6409 : Fixed integer overflow in LongBitSet.ensureCapacity.
    (Luc Vanlerberghe via Adrien Grand)
  • LUCENE-6424 , LUCENE-6430 : Fix many bugs with mockfs filesystems in the test-framework: always consistently wrap Path, fix buggy behavior for globs, implement equals/hashcode for filtered Paths, etc.
    (Ryan Ernst, Simon Willnauer, Robert Muir)
  • LUCENE-6426 : Fix FieldType's copy constructor to also copy over the numeric precision step.
    (Adrien Grand)
  • LUCENE-6345 : Null check terms/fields in Lucene queries
    (Lee Hinman via Mike McCandless)
  • LUCENE-6400 : SolrSynonymParser should preserve original token instead of replacing it with a synonym, when expand=true and there is no explicit mapping
    (Ian Ribas, Robert Muir, Mike McCandless)
  • LUCENE-6449 : Don't throw NullPointerException if some segments are missing the field being highlighted, in PostingsHighlighter
    (Roman Khmelichek via Mike McCandless)
  • LUCENE-6427 : Added assertion about the presence of ghost bits in (Fixed|Long)BitSet.
    (Luc Vanlerberghe via Adrien Grand)
  • LUCENE-6468 : Fixed NPE with empty Kuromoji user dictionary.
    (Jun Ohtani via Christian Moen)
  • LUCENE-6483 : Ensure core closed listeners are called on the same cache key as the reader which has been used to register the listener.
    (Adrien Grand)
  • LUCENE-6486 : DocumentDictionary iterator no longer skips documents with no payloads and now returns an empty BytesRef instead
    (Marius Grama via Michael McCandless)
  • LUCENE-6505 : NRT readers now reflect segments_N filename and commit user data from previous commits
    (Mike McCandless)
  • LUCENE-6507 : Don't let NativeFSLock.close() release other locks
    (Simon Willnauer, Robert Muir, Uwe Schindler, Mike McCandless)
  • API Changes (8)
  • LUCENE-6377 : SearcherFactory#newSearcher now accepts the previous reader to simplify warming logic when opening new searchers.
    (Simon Willnauer)
  • LUCENE-6410 : Removed unused "reuse" parameter to Terms.iterator.
    (Robert Muir, Mike McCandless)
  • LUCENE-6425 : Replaced Query.extractTerms with Weight.extractTerms.
    (Adrien Grand)
  • LUCENE-6446 : Simplified Explanation API.
    (Adrien Grand)
  • LUCENE-6445 : Two new methods in Highlighter's TokenSources; the existing methods are now marked deprecated.
    (David Smiley)
  • LUCENE-6484 : Removed EliasFanoDocIdSet, which was unused.
    (Paul Elschot via Adrien Grand)
  • LUCENE-6466 : Moved SpanQuery.getSpans() and .extractTerms() to SpanWeight
    (Alan Woodward, Robert Muir)
  • LUCENE-6497 : Allow subclasses of FieldType to check frozen state
    (Ryan Ernst)
  • Other (6)
  • LUCENE-6413 : Test runner should report the number of suites completed/remaining.
    (Dawid Weiss)
  • LUCENE-5439 : Add 'ant jacoco' build target.
    (Robert Muir)
  • LUCENE-6315 : Simplify the private iterator Lucene uses internally when resolving deleted terms to matched docids.
    (Robert Muir, Adrien Grand, Mike McCandless)
  • LUCENE-6399 : Benchmark module's QueryMaker.resetInputs should call setConfig so queries can react to property changes in new rounds.
    (David Smiley)
  • LUCENE-6382 : Lucene now enforces that positions never exceed the maximum value IndexWriter.MAX_POSITION.
    (Robert Muir, Mike McCandless)
  • LUCENE-6372 : Simplified and improved equals/hashcode of span queries.
    (Paul Elschot via Adrien Grand)
  • Build (1)
  • LUCENE-6420 : Update forbiddenapis to v1.8
    (Uwe Schindler)
  • Test Framework (2)
  • LUCENE-6419 : Added two-phase iteration assertions to AssertingQuery.
    (Adrien Grand)
  • LUCENE-6437 : Randomly set CPU core count and spins, derived from test's master seed, used by ConcurrentMergeScheduler to set dynamic defaults, for better test randomization and to help tests reproduce
    (Robert Muir, Mike McCandless)
  • Release 5.1.0 [2015-04-14]

  • New Features (9)
  • LUCENE-6066 : Added DiversifiedTopDocsCollector to misc for collecting no more than a given number of results under a choice of key. Introduces new remove method to core's PriorityQueue.
    (Mark Harwood)
  • LUCENE-6191 : New spatial 2D heatmap faceting for PrefixTreeStrategy.
    (David Smiley)
  • LUCENE-6227 : Added BooleanClause.Occur.FILTER to filter documents without participating in scoring (unlike MUST).
    (Adrien Grand)
  • LUCENE-6294 : Added oal.search.CollectorManager to allow for parallelization of the document collection process on IndexSearcher.
    (Adrien Grand)
  • LUCENE-6303 : Added filter caching baked into IndexSearcher, disabled by default.
    (Adrien Grand)
  • LUCENE-6304 : Added a new MatchNoDocsQuery that matches no documents.
    (Lee Hinman via Adrien Grand)
  • LUCENE-6341 : Add a -fast option to CheckIndex.
    (Robert Muir)
  • LUCENE-6355 : IndexWriter's infoStream now also logs time to write FieldInfos during merge
    (Lee Hinman via Mike McCandless)
  • LUCENE-6339 : Added Near-real time Document Suggester via custom postings format
    (Areek Zillur, Mike McCandless, Simon Willnauer)
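LUCENE-6227 introduces BooleanClause.Occur.FILTER for clauses that must match but do not contribute to the score. The Java sketch below models that rule with hypothetical doc-to-score maps standing in for scored clauses; it illustrates the semantics only and does not use Lucene's API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of MUST vs FILTER semantics; clauses are plain doc -> score maps.
final class FilterClauseSketch {

    /**
     * A doc matches only if it appears in every MUST and every FILTER clause.
     * Its score is the sum of the MUST clause scores; FILTER clauses never add to it.
     */
    static Map<Integer, Float> combine(List<Map<Integer, Float>> must,
                                       List<Map<Integer, Float>> filter) {
        Map<Integer, Float> result = new TreeMap<>(); // sorted by doc ID
        if (must.isEmpty() && filter.isEmpty()) {
            return result;
        }
        // Any required clause can drive iteration, since a hit must appear in all of them.
        Map<Integer, Float> lead = must.isEmpty() ? filter.get(0) : must.get(0);
        for (Integer doc : lead.keySet()) {
            boolean matches = true;
            float score = 0f;
            for (Map<Integer, Float> clause : must) {
                Float s = clause.get(doc);
                if (s == null) {
                    matches = false;
                    break;
                }
                score += s; // MUST: required and scored
            }
            for (Map<Integer, Float> clause : filter) {
                if (!matches) {
                    break;
                }
                matches = clause.containsKey(doc); // FILTER: required, but not scored
            }
            if (matches) {
                result.put(doc, score);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Float> must = Map.of(1, 1.5f, 2, 0.5f, 3, 2.0f);
        Map<Integer, Float> filter = Map.of(2, 9.0f, 3, 9.0f, 4, 9.0f);
        // Docs 2 and 3 match both clauses; the filter's scores (9.0) are ignored.
        System.out.println(combine(List.of(must), List.of(filter))); // prints {2=0.5, 3=2.0}
    }
}
```

In this sketch a query made only of FILTER clauses still matches documents, but every match scores 0.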
  • Bug Fixes (11)
  • LUCENE-6368 : FST.save can truncate output (BufferedOutputStream may be closed after the underlying stream).
    (Ippei Matsushima via Dawid Weiss)
  • LUCENE-6249 : StandardQueryParser doesn't support pure negative clauses.
    (Dawid Weiss)
  • LUCENE-6190 : Spatial pointsOnly flag on PrefixTreeStrategy shouldn't switch all predicates to Intersects.
    (David Smiley)
  • LUCENE-6242 : RAM usage estimation was incorrect for SparseFixedBitSet when object alignment was different from 8.
    (Uwe Schindler, Adrien Grand)
  • LUCENE-6293 : Fixed TimSorter bug.
    (Adrien Grand)
  • LUCENE-6001 : DrillSideways hits NullPointerException for certain BooleanQuery searches.
    (Dragan Jotannovic, jane chang via Mike McCandless)
  • LUCENE-6311 : Fix NIOFSDirectory and SimpleFSDirectory so that the toString method of their IndexInputs reveals when they come from a compound file.
    (Robert Muir, Mike McCandless)
  • LUCENE-6381 : Add defensive wait time limit in DocumentsWriterStallControl to prevent hangs during indexing if a notify()/notifyAll() call is missed somewhere
    (Mike McCandless)
  • LUCENE-6386 : Correct IndexWriter.forceMerge documentation to state that up to 3X (X = current index size) spare disk space may be needed to complete forceMerge(1).
    (Robert Muir, Shai Erera, Mike McCandless)
  • LUCENE-6395 : Seeking by term ordinal was failing to set the term's bytes in MemoryIndex
    (Mike McCandless)
  • LUCENE-6429 : Removed the TermQuery(Term,int) constructor which could lead to inconsistent term statistics.
    (Adrien Grand, Robert Muir)
  • Optimizations (16)
  • LUCENE-6183 , LUCENE-5647 : Avoid recompressing stored fields and term vectors when merging segments without deletions. Lucene50Codec's BEST_COMPRESSION mode uses a higher deflate level for more compact storage.
    (Robert Muir)
  • LUCENE-6184 : Make BooleanScorer only score windows that contain matches.
    (Adrien Grand)
  • LUCENE-6161 : Speed up resolving of deleted terms to docIDs by doing a combined merge sort between deleted terms and segment terms instead of a separate merge sort for each segment. In delete-heavy use cases this can be a sizable speedup.
    (Mike McCandless)
  • LUCENE-6201 : BooleanScorer can now deal with values of minShouldMatch that are greater than one and is used when queries produce dense result sets.
    (Adrien Grand)
  • LUCENE-6218 : Don't decode frequencies or match all positions when scoring is not needed.
    (Robert Muir)
  • LUCENE-6233 : Speed up CheckIndex when the index has term vectors
    (Robert Muir, Mike McCandless)
  • LUCENE-6198 : Added the TwoPhaseIterator API, exposed on scorers which is for now only used on phrase queries and conjunctions in order to check positions lazily if the phrase query is in a conjunction with other queries.
    (Robert Muir, Adrien Grand, David Smiley)
  • LUCENE-6244 , LUCENE-6251 : All boolean queries but those that have a minShouldMatch > 1 now either propagate or take advantage of the two-phase iteration capabilities added in LUCENE-6198 .
    (Adrien Grand, Robert Muir)
  • LUCENE-6241 : FSDirectory.listAll() no longer filters out subdirectories, for faster performance. Subdirectories don't matter to Lucene. If you need to filter out non-index files with some custom usage, you may want to look at the IndexFileNames class.
    (Robert Muir)
  • LUCENE-6262 : ConstantScoreQuery does not wrap the inner weight anymore when scores are not required.
    (Adrien Grand)
  • LUCENE-6263 : MultiCollector automatically caches scores when several collectors need them.
    (Adrien Grand)
  • LUCENE-6275 : SloppyPhraseScorer now uses the same logic as ConjunctionScorer in order to advance doc IDs, which takes advantage of the cost() API.
    (Adrien Grand)
  • LUCENE-6290 : QueryWrapperFilter propagates approximations and FilteredQuery rewrites to a BooleanQuery when the filter is a QueryWrapperFilter in order to leverage approximations.
    (Adrien Grand)
  • LUCENE-6318 : Reduce RAM usage of FieldInfos when there are many fields.
    (Mike McCandless, Robert Muir)
  • LUCENE-6320 : Speed up CheckIndex.
    (Robert Muir)
  • LUCENE-4942 : Optimized the encoding of PrefixTreeStrategy indexes for non-point data: 33% smaller index, 68% faster indexing, and 44% faster searching. YMMV
    (David Smiley)
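The two-phase iteration introduced by LUCENE-6198 splits matching into a cheap approximation and a more expensive matches() confirmation. The Java sketch below applies that idea to a hypothetical phrase "foo bar" combined with another required clause, using in-memory doc-to-positions maps; the names and data layout are illustrative assumptions, not Lucene's TwoPhaseIterator API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of two-phase matching for the phrase "foo bar" ANDed with another clause.
final class TwoPhasePhraseSketch {

    /**
     * foo and bar map doc ID -> term positions (hypothetical in-memory postings);
     * other is the set of docs matched by another required clause.
     */
    static List<Integer> search(Map<Integer, int[]> foo, Map<Integer, int[]> bar, Set<Integer> other) {
        // Phase 1, approximation: docs containing both terms that also match the other clause.
        // This step only needs doc IDs, never positions.
        TreeSet<Integer> candidates = new TreeSet<>(foo.keySet());
        candidates.retainAll(bar.keySet());
        candidates.retainAll(other);
        List<Integer> hits = new ArrayList<>();
        for (int doc : candidates) {
            // Phase 2, confirmation: the costly positional check runs only on surviving candidates.
            if (isPhrase(foo.get(doc), bar.get(doc))) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** True if bar occurs at the position right after some occurrence of foo. */
    private static boolean isPhrase(int[] fooPositions, int[] barPositions) {
        for (int p : fooPositions) {
            for (int q : barPositions) {
                if (q == p + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Integer, int[]> foo = Map.of(1, new int[] {0, 5}, 2, new int[] {3}, 3, new int[] {7});
        Map<Integer, int[]> bar = Map.of(1, new int[] {6}, 2, new int[] {1}, 3, new int[] {8});
        // Doc 3 contains the phrase but is rejected in phase 1, so its positions are never checked.
        System.out.println(search(foo, bar, Set.of(1, 2))); // prints [1]
    }
}
```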
  • API Changes (21)
  • LUCENE-6204 , LUCENE-6208 : Simplify CompoundFormat: remove files() and remove files parameter to write().
    (Robert Muir)
  • LUCENE-6217 : Add IndexWriter.isOpen and getTragicException.
    (Simon Willnauer, Mike McCandless)
  • LUCENE-6218 , LUCENE-6220 : Add Collector.needsScores() and needsScores parameter to Query.createWeight().
    (Robert Muir, Adrien Grand)
  • LUCENE-4524 , LUCENE-6246 , LUCENE-6256 , LUCENE-6271 : Merge DocsEnum and DocsAndPositionsEnum into a single PostingsEnum iterator. TermsEnum.docs() and TermsEnum.docsAndPositions() are replaced by TermsEnum.postings().
    (Alan Woodward, Simon Willnauer, Robert Muir, Ryan Ernst)
  • LUCENE-6222 : Removed TermFilter, use a QueryWrapperFilter(TermQuery) instead. This will be as efficient now that queries can opt out from scoring.
    (Adrien Grand)
  • LUCENE-6269 : Removed BooleanFilter, use a QueryWrapperFilter(BooleanQuery) instead.
    (Adrien Grand)
  • LUCENE-6270 : Replaced TermsFilter with TermsQuery, use a QueryWrapperFilter(TermsQuery) instead.
    (Adrien Grand)
  • LUCENE-6223 : Move BooleanQuery.BooleanWeight to BooleanWeight.
    (Robert Muir)
  • LUCENE-1518 : Make Filter extend Query and return 0 as score.
    (Uwe Schindler, Adrien Grand)
  • LUCENE-6245 : Force Filter subclasses to implement toString API from Query.
    (Ryan Ernst)
  • LUCENE-6268 : Replace FieldValueFilter and DocValuesRangeFilter with equivalent queries that support approximations.
    (Adrien Grand)
  • LUCENE-6289 : Replace DocValuesRangeFilter with DocValuesRangeQuery which supports approximations.
    (Adrien Grand)
  • LUCENE-6266 : Remove unnecessary Directory params from SegmentInfo.toString, SegmentInfos.files/toString, and SegmentCommitInfo.toString.
    (Robert Muir)
  • LUCENE-6272 : Scorer extends DocIdSetIterator rather than DocsEnum
    (Alan Woodward)
  • LUCENE-6281 : Removed support for slow collations from lucene/sandbox. Better performance would be achieved through CollationKeyAnalyzer or ICUCollationKeyAnalyzer.
    (Adrien Grand)
  • LUCENE-6286 : Removed IndexSearcher methods that take a Filter object. A BooleanQuery with a filter clause must be used instead.
    (Adrien Grand)
  • LUCENE-6300 : PrefixFilter, TermRangeFilter and NumericRangeFilter have been removed. Use PrefixQuery, TermRangeQuery and NumericRangeQuery instead.
    (Adrien Grand)
  • LUCENE-6303 : Replaced FilterCache with QueryCache and CachingWrapperFilter with CachingWrapperQuery.
    (Adrien Grand)
  • LUCENE-6317 : Deprecate DataOutput.writeStringSet and writeStringStringMap. Use writeSetOfStrings/Maps instead.
    (Mike McCandless, Robert Muir)
  • LUCENE-6307 : Rename SegmentInfo.getDocCount -> .maxDoc, SegmentInfos.totalDocCount -> .totalMaxDoc, MergeInfo.totalDocCount -> .totalMaxDoc and MergePolicy.OneMerge.totalDocCount -> .totalMaxDoc
    (Adrien Grand, Robert Muir, Mike McCandless)
  • LUCENE-6367 : PrefixQuery now subclasses AutomatonQuery, removing the specialized PrefixTermsEnum.
    (Robert Muir, Mike McCandless)
  • Other (6)
  • LUCENE-6248 : Remove unused odd constants from StandardSyntaxParser.jj
    (Dawid Weiss)
  • LUCENE-6193 : Collapse identical catch branches in try-catch statements.
    (shalin)
  • LUCENE-6239 : Removed RAMUsageEstimator's sun.misc.Unsafe calls.
    (Robert Muir, Dawid Weiss, Uwe Schindler)
  • LUCENE-6292 : Seed StringHelper better.
    (Robert Muir)
  • LUCENE-6333 : Refactored queries to delegate their equals and hashcode impls to the super class.
    (Lee Hinman via Adrien Grand)
  • LUCENE-6343 : DefaultSimilarity javadocs had the wrong float value to demonstrate precision of encoded norms
    (András Péteri via Mike McCandless)
  • Changes in Runtime Behavior (2)
  • LUCENE-6255 : PhraseQuery now ignores leading holes and requires that positions are positive and added in order.
    (Adrien Grand)
  • LUCENE-6298 : SimpleQueryParser returns an empty query rather than null if, e.g., the terms were all stopwords.
    (Lee Hinman via Robert Muir)
  • Release 5.0.0 [2015-02-20]

  • New Features (32)
  • LUCENE-5945 : All file handling converted to NIO.2 APIs.
    (Robert Muir)
  • LUCENE-5946 : SimpleFSDirectory now uses Files.newByteChannel, for portability with custom FileSystemProviders. If you want the old non-interruptible behavior of RandomAccessFile, use RAFDirectory in the misc/ module.
    (Uwe Schindler, Robert Muir)
  • SOLR-3359 : Added analyzer attribute/property to SynonymFilterFactory.
    (Ryo Onodera via Koji Sekiguchi)
  • LUCENE-5648 : Index and search date ranges, particularly multi-valued ones. It's implemented in the spatial module as DateRangePrefixTree used with NumberRangePrefixTreeStrategy.
    (David Smiley)
  • LUCENE-5895 : Lucene now stores a unique id per-segment and per-commit to aid in accurate replication of index files
    (Robert Muir, Mike McCandless)
  • LUCENE-5889 : Add commit method to AnalyzingInfixSuggester, and allow just using .add to build up the suggester.
    (Varun Thacker via Mike McCandless)
  • LUCENE-5123 : Add a "pull" option to the postings writing API, so that a PostingsFormat now receives a Fields instance and it is responsible for iterating through all fields, terms, documents and positions.
    (Robert Muir, Mike McCandless)
  • LUCENE-5268 : Full cutover of all postings formats to the "pull" FieldsConsumer API, removing PushFieldsConsumer. Added new PushPostingsWriterBase for single-pass push of docs/positions to the postings format.
    (Mike McCandless)
  • LUCENE-5906 : Use Files.delete everywhere instead of File.delete, so that when things go wrong, you get a real exception message why.
    (Uwe Schindler, Robert Muir)
  • LUCENE-5933 : Added FilterSpans for easier wrapping of Spans instance.
    (Shai Erera)
  • LUCENE-5925 : Remove fallback logic from opening commits, instead use Directory.renameFile so that in-progress commits are never visible.
    (Robert Muir)
  • LUCENE-5820 : SuggestStopFilter should have a factory.
    (Varun Thacker via Steve Rowe)
  • LUCENE-5949 : Add Accountable.getChildResources().
    (Robert Muir)
  • SOLR-5986 : Added ExitableDirectoryReader that extends FilterDirectoryReader and enables exiting requests that take too long to enumerate over terms.
    (Anshum Gupta, Steve Rowe, Robert Muir)
  • LUCENE-5911 : Add MemoryIndex.freeze() to allow thread-safe searching over a MemoryIndex.
    (Alan Woodward, David Smiley, Robert Muir)
  • LUCENE-5969 : Lucene 5.0 has a new index format with mismatched file detection, improved exception handling, and indirect norms encoding for sparse fields.
    (Mike McCandless, Ryan Ernst, Robert Muir)
  • LUCENE-6053 : Add Serbian analyzer.
    (Nikola Smolenski via Robert Muir, Mike McCandless)
  • LUCENE-4400 : Add support for new NYSIIS Apache commons phonetic codec
    (Thomas Neidhart via Mike McCandless)
  • LUCENE-6059 : Add Daitch-Mokotoff Soundex Apache commons phonetic codec, and upgrade to Apache commons codec 1.10.
    (Thomas Neidhart via Mike McCandless)
  • LUCENE-6058 : With the upgrade to Apache commons codec 1.10, the experimental BeiderMorseFilter has changed its behavior, so any index using it will need to be rebuilt.
    (Thomas Neidhart via Mike McCandless)
  • LUCENE-6050 : Accept MUST and MUST_NOT (in addition to SHOULD) for each context passed to Analyzing/BlendedInfixSuggester
    (Arcadius Ahouansou, jane chang via Mike McCandless)
  • LUCENE-5929 : Also extract terms to highlight from block join queries.
    (Julie Tibshirani via Mike McCandless)
  • LUCENE-6063 : Allow overriding whether/how ConcurrentMergeScheduler stalls incoming threads when merges are falling behind
    (Mike McCandless)
  • LUCENE-5833 : DocumentDictionary now enumerates each value separately in a multi-valued field (not just the first value), so you can build suggesters from multi-valued fields.
    (Varun Thacker via Mike McCandless)
  • LUCENE-6077 : Added a filter cache.
    (Adrien Grand, Robert Muir)
  • LUCENE-6088 : TermsFilter implements Accountable.
    (Adrien Grand)
  • LUCENE-6034 : The default highlighter when used with QueryScorer will highlight payload-sensitive queries provided that term vectors with positions, offsets, and payloads are present. This is the only highlighter that can highlight such queries accurately.
    (David Smiley)
  • LUCENE-5914 : Add an option to Lucene50Codec to support either BEST_SPEED or BEST_COMPRESSION for stored fields.
    (Adrien Grand, Robert Muir)
  • LUCENE-6119 : Add auto-IO-throttling to ConcurrentMergeScheduler, to rate limit IO writes for each merge depending on incoming merge rate.
    (Mike McCandless)
  • LUCENE-6155 : Add payload support to MemoryIndex. The default highlighter's QueryScorer and WeightedSpanTermExtractor now have setUsePayloads(bool).
    (David Smiley)
  • LUCENE-6166 : Deletions (alone) can now trigger new merges.
    (Mike McCandless)
  • LUCENE-6177 : Add CustomAnalyzer that allows configuring analyzers the way you do in Solr's index schema. This class has a builder API to configure Tokenizers, TokenFilters, and CharFilters based on their SPI names and parameters as documented by the corresponding factories.
    (Uwe Schindler)
  • Optimizations (18)
  • LUCENE-5960 : Use a more efficient bitset, not a Set<Integer>, to track visited states.
    (Markus Heiden via Mike McCandless)
  • LUCENE-5959 : Don't allocate excess memory when building automaton in finish.
    (Markus Heiden via Mike McCandless)
  • LUCENE-5963 : Reduce memory allocations in AnalyzingSuggester.
    (Markus Heiden via Mike McCandless)
  • LUCENE-5938 : MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE is now faster on queries that match few documents by using a sparse bit set implementation.
    (Adrien Grand)
  • LUCENE-5969 : Refactor merging to be more efficient, checksum calculation is per-segment/per-producer, and norms and doc values merging no longer cause RAM spikes for latent fields.
    (Mike McCandless, Robert Muir)
  • LUCENE-5983 : CachingWrapperFilter now uses a new DocIdSet implementation called RoaringDocIdSet instead of WAH8DocIdSet.
    (Adrien Grand)
  • LUCENE-6022 : DocValuesDocIdSet checks live docs before doc values.
    (Adrien Grand)
  • LUCENE-6030 : Add norms patched compression for a small number of common values
    (Ryan Ernst)
  • LUCENE-6040 : Speed up EliasFanoDocIdSet through broadword bit selection.
    (Paul Elschot)
  • LUCENE-6033 : CachingTokenFilter now uses ArrayList not LinkedList, and has new isCached() method.
    (David Smiley)
  • LUCENE-6031 : TokenSources (in the default highlighter) converts term vectors into a TokenStream much faster, in linear time (not N*log(N)), using less memory, and with reset() implemented. Only one of offsets or positions is required of the term vector.
    (David Smiley)
  • LUCENE-6089 , LUCENE-6090 : Tune CompressionMode.HIGH_COMPRESSION for better compression and less cpu usage.
    (Adrien Grand, Robert Muir)
  • LUCENE-6034 : QueryScorer, used by the default highlighter, needn't re-index the provided TokenStream with MemoryIndex when it comes from TokenSources (term vectors) with offsets and positions.
    (David Smiley)
  • LUCENE-5951 : ConcurrentMergeScheduler detects whether the index is on SSD or not and does a better job defaulting its settings. This only works on Linux for now; other OS's will continue to use the previous defaults (tuned for spinning disks).
    (Robert Muir, Uwe Schindler, hossman, Mike McCandless)
  • LUCENE-6131 : Optimize SortingMergePolicy.
    (Robert Muir)
  • LUCENE-6133 : Improve default StoredFieldsWriter.merge() to be more efficient.
    (Robert Muir)
  • LUCENE-6145 : Make EarlyTerminatingSortingCollector able to early-terminate when the sort order is a prefix of the index-time order.
    (Adrien Grand)
  • LUCENE-6178 : Score boolean queries containing MUST_NOT clauses with BooleanScorer2, to use skip list data and avoid unnecessary scoring.
    (Adrien Grand, Robert Muir)
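LUCENE-5938 above relies on a sparse bit set when a query matches few documents. The sketch below shows the general idea under stated assumptions. Bits are grouped into 4096-bit blocks, and a block's 64 longs are allocated only when one of its bits is first set, so memory grows with the number of touched blocks rather than with maxDoc. SparseBits is a hypothetical class. It is not Lucene's SparseFixedBitSet and does not reproduce that class's layout.

```java
/**
 * Hypothetical sparse bit set: 4096-bit blocks whose storage (64 longs)
 * is allocated lazily on the first set() that lands in the block.
 */
final class SparseBits {
  private static final int BLOCK_SHIFT = 12;                           // 4096 bits per block
  private static final int LONGS_PER_BLOCK = 1 << (BLOCK_SHIFT - 6);   // 64 longs per block

  private final long[][] blocks; // null entries = blocks with no set bits
  private final int length;

  SparseBits(int length) {
    if (length < 0) throw new IllegalArgumentException("length must be >= 0");
    this.length = length;
    // Round up to whole blocks; long math avoids overflow near Integer.MAX_VALUE.
    this.blocks = new long[(int) (((long) length + (1 << BLOCK_SHIFT) - 1) >>> BLOCK_SHIFT)][];
  }

  void set(int index) {
    checkIndex(index);
    int b = index >>> BLOCK_SHIFT;
    if (blocks[b] == null) {
      blocks[b] = new long[LONGS_PER_BLOCK]; // allocate only when first needed
    }
    int word = (index >>> 6) & (LONGS_PER_BLOCK - 1);
    blocks[b][word] |= 1L << index; // Java masks a long shift count to its low 6 bits
  }

  boolean get(int index) {
    checkIndex(index);
    long[] block = blocks[index >>> BLOCK_SHIFT];
    if (block == null) return false; // untouched block: every bit is clear
    int word = (index >>> 6) & (LONGS_PER_BLOCK - 1);
    return (block[word] & (1L << index)) != 0;
  }

  /** Number of blocks that actually hold storage. */
  int allocatedBlocks() {
    int n = 0;
    for (long[] block : blocks) {
      if (block != null) n++;
    }
    return n;
  }

  private void checkIndex(int index) {
    if (index < 0 || index >= length) {
      throw new IndexOutOfBoundsException("index=" + index + ", length=" + length);
    }
  }
}
```

With two matches in a 1,000,000-bit space, only two blocks (2 x 64 longs) are allocated instead of roughly 15,600 longs for a dense bit set.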
  • API Changes (40)
  • LUCENE-5900 : Deprecated more constructors taking Version in *InfixSuggester and ICUCollationKeyAnalyzer, and removed TEST_VERSION_CURRENT from the test framework.
    (Ryan Ernst)
  • LUCENE-4535 : oal.util.FilterIterator is now an internal API.
    (Adrien Grand)
  • LUCENE-4924 : DocIdSetIterator.docID() must now return -1 when the iterator is not positioned. This change affects all classes that inherit from DocIdSetIterator, including DocsEnum and DocsAndPositionsEnum.
    (Adrien Grand)
  • LUCENE-5127 : Reduce RAM usage of FixedGapTermsIndex. Remove IndexWriterConfig.setTermIndexInterval, IndexWriterConfig.setReaderTermsIndexDivisor, and termsIndexDivisor from StandardDirectoryReader. These options have been no-ops with the default codec since Lucene 4.0. If you want to configure the interval for this term index, pass it directly to your codec, where it can also be configured per-field.
    (Robert Muir)
  • LUCENE-5388 : Remove Reader from Tokenizer's constructor and from Analyzer's createComponents. TokenStreams now always get their input via setReader.
    (Benson Margulies via Robert Muir - pull request #16)
  • LUCENE-5527 : The Collector API has been refactored to use a dedicated Collector per leaf.
    (Shikhar Bhushan, Adrien Grand)
  • LUCENE-5702 : The FieldComparator API has been refactored to a per-leaf API, just like Collectors.
    (Adrien Grand)
  • LUCENE-4246 : IndexWriter.close now always closes, even if it throws an exception. The new IndexWriterConfig.setCommitOnClose (default true) determines whether close() should commit before closing.
  • LUCENE-5608 , LUCENE-5565 : Refactor SpatialPrefixTree/Cell API. Doesn't use Strings as tokens anymore, and now iterates cells on-demand during indexing instead of building a collection. RPT now has more setters.
    (David Smiley)
  • LUCENE-5666 : Change uninverted access (sorting, faceting, grouping, etc) to use the DocValues API instead of FieldCache. For FieldCache functionality, use UninvertingReader in lucene/misc (or implement your own FilterReader). UninvertingReader is more efficient: supports multi-valued numeric fields, detects when a multi-valued field is single-valued, reuses caches of compatible types (e.g. SORTED also supports BINARY and SORTED_SET access without insanity). "Insanity" is no longer possible unless you explicitly want it. Rename FieldCache* and DocTermOrds* classes in the search package to DocValues*. Move SortedSetSortField to core and add SortedSetFieldSource to queries/, which takes the same selectors. Add helper methods to DocValues.java that are better suited for search code (never return null, etc).
    (Mike McCandless, Robert Muir)
  • LUCENE-5871 : Remove Version from IndexWriterConfig. Use IndexWriterConfig.setCommitOnClose to change the behavior of IndexWriter.close(). The default has been changed to match that of 4.x.
    (Ryan Ernst, Mike McCandless)
  • LUCENE-5965 : CorruptIndexException requires a String or DataInput resource.
    (Robert Muir)
  • LUCENE-5972 : IndexFormatTooOldException and IndexFormatTooNewException now extend from IOException.
    (Ryan Ernst, Robert Muir)
  • LUCENE-5569 : *AtomicReader/AtomicReaderContext have been renamed to *LeafReader/LeafReaderContext.
    (Ryan Ernst)
  • LUCENE-5938 : Removed MultiTermQuery.ConstantScoreAutoRewrite as MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE is usually better.
    (Adrien Grand)
  • LUCENE-5924 : Rename CheckIndex -fix option to -exorcise. This option does not actually fix the index, it just drops data.
    (Robert Muir)
  • LUCENE-5969 : Add Codec.compoundFormat, which handles the encoding of compound files. Add getMergeInstance() to codec producer APIs, which can be overridden to return an instance optimized for merging instead of searching. Add Terms.getStats() which can return additional codec-specific statistics about a field. Change instance method SegmentInfos.read() to two static methods: SegmentInfos.readCommit() and SegmentInfos.readLatestCommit().
    (Mike McCandless, Robert Muir)
  • LUCENE-5992 : Remove FieldInfos from SegmentInfosWriter.write API.
    (Robert Muir, Mike McCandless)
  • LUCENE-5998 : Simplify Field/SegmentInfoFormat to read+write methods.
    (Robert Muir)
  • LUCENE-6000 : Removed StandardTokenizerInterface. Tokenizers now use their jflex impl directly.
    (Ryan Ernst)
  • LUCENE-6006 : Removed FieldInfo.normType since it's redundant: it will be DocValuesType.NUMERIC if the field is indexed and does not omit norms, else null.
    (Robert Muir, Mike McCandless)
  • LUCENE-6013 : Removed indexed boolean from IndexableFieldType and FieldInfo, since it's redundant with IndexOptions != null.
    (Robert Muir, Mike McCandless)
  • LUCENE-6021 : FixedBitSet.nextSetBit now returns DocIdSetIterator.NO_MORE_DOCS instead of -1 when there are no more bits which are set.
    (Adrien Grand)
  • LUCENE-5953 : Directory and LockFactory APIs were restructured: Locking is now under the responsibility of the Directory implementation. LockFactory is only used by subclasses of BaseDirectory to delegate locking to an impl class. LockFactories are now singletons and are responsible for creating a Lock instance based on a Directory implementation passed to the factory method. See MIGRATE.txt for more details.
    (Uwe Schindler, Robert Muir)
  • LUCENE-6062 : Throw exception instead of silently doing nothing if you try to sort/group/etc on a misconfigured field (e.g. no docvalues, no UninvertingReader, etc).
    (Robert Muir)
  • LUCENE-6068 : LeafReader.fields() never returns null.
    (Robert Muir)
  • LUCENE-6082 : Remove abort() from codec apis.
    (Robert Muir)
  • LUCENE-6084 : IndexOutput's constructor now requires a String resourceDescription so its toString is sane
    (Robert Muir, Mike McCandless)
  • LUCENE-6087 : Allow passing custom DirectoryReader to SearcherManager
    (Mike McCandless)
  • LUCENE-6085 : Undeprecate SegmentInfo attributes, but add safety so they won't be trappy if codec tries to use them during docvalues updates.
    (Robert Muir)
  • LUCENE-6097 : Remove dangerous / overly expert IndexWriter.abortMerges and waitForMerges methods.
    (Robert Muir, Mike McCandless)
  • LUCENE-6099 : Add FilterDirectory.unwrap and FilterDirectoryReader.unwrap
    (Simon Willnauer, Mike McCandless)
  • LUCENE-6121 : CachingTokenFilter.reset() now propagates to its input if called before incrementToken(). You must now call reset() on this filter instead of calling it a priori on the input, which previously didn't work.
    (David Smiley, Robert Muir)
  • LUCENE-6147 : Make the core Accountables.namedAccountable function public
    (Ryan Ernst)
  • LUCENE-6150 : Remove staleFiles set and onIndexOutputClosed() from FSDirectory.
    (Uwe Schindler, Robert Muir, Mike McCandless)
  • LUCENE-6146 : Replaced Directory.copy() with Directory.copyFrom().
    (Robert Muir)
  • LUCENE-6149 : Infix suggesters' highlighting and allTermsRequired can be set at the constructor for non-contextual lookup.
    (Boon Low, Tomás Fernández Löbbe)
  • LUCENE-6158 , LUCENE-6165 : IndexWriter.addIndexes(IndexReader...) changed to addIndexes(CodecReader...)
    (Robert Muir)
  • LUCENE-6179 : Out-of-order scoring is not allowed anymore, so Weight.scoresDocsOutOfOrder and LeafCollector.acceptsDocsOutOfOrder have been removed and boolean queries now always score in order.
  • LUCENE-6212 : IndexWriter no longer accepts per-document Analyzer to add/updateDocument. These methods were trappy as they made it easy to accidentally index tokens that were not easily searchable.
    (Mike McCandless)
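LUCENE-4924 and LUCENE-6021 above pin down an iteration contract. docID() is -1 before the iterator is positioned, and exhaustion is signalled with the NO_MORE_DOCS sentinel (Integer.MAX_VALUE in Lucene) rather than -1. The standalone sketch below follows that contract over a plain long[] bit set. BitSetDocIterator and its static nextSetBit helper are hypothetical and are not Lucene's FixedBitSet or BitSetIterator.

```java
/**
 * Hypothetical doc iterator over a long[] bit set that follows the
 * "-1 before positioning, NO_MORE_DOCS when exhausted" contract.
 */
final class BitSetDocIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final long[] words;
  private final int numBits;
  private int doc = -1; // not yet positioned

  BitSetDocIterator(long[] words, int numBits) {
    if (numBits < 0 || (long) words.length * 64 < numBits) {
      throw new IllegalArgumentException("words too short for numBits=" + numBits);
    }
    this.words = words;
    this.numBits = numBits;
  }

  /** First set bit at or after index, or NO_MORE_DOCS (never -1). */
  static int nextSetBit(long[] words, int numBits, int index) {
    if (index >= numBits) return NO_MORE_DOCS;
    int i = index >> 6;
    long word = words[i] >>> index; // drop bits below index (shift count masked to 6 bits)
    if (word != 0) {
      int bit = index + Long.numberOfTrailingZeros(word);
      return bit < numBits ? bit : NO_MORE_DOCS;
    }
    while (++i < words.length) {
      word = words[i];
      if (word != 0) {
        int bit = (i << 6) + Long.numberOfTrailingZeros(word);
        return bit < numBits ? bit : NO_MORE_DOCS;
      }
    }
    return NO_MORE_DOCS;
  }

  int docID() {
    return doc;
  }

  int nextDoc() {
    // Guard so doc + 1 cannot overflow once exhausted.
    return doc == NO_MORE_DOCS ? NO_MORE_DOCS : advance(doc + 1);
  }

  /** Moves to the first doc >= target. */
  int advance(int target) {
    doc = nextSetBit(words, numBits, target);
    return doc;
  }
}
```

Because NO_MORE_DOCS is larger than every valid doc ID, callers can write `while ((doc = it.nextDoc()) != NO_MORE_DOCS)` without a separate -1 check.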
  • Bug Fixes (28)
  • LUCENE-5650 : Enforce read-only access to any path outside the temporary folder via security manager, and make test temp dirs absolute.
    (Ryan Ernst, Dawid Weiss)
  • LUCENE-5948 : RateLimiter now fully inits itself on init.
    (Varun Thacker via Mike McCandless)
  • LUCENE-5981 : CheckIndex obtains write.lock, since with some parameters it may modify the index, and to prevent false corruption reports, as it does not have the regular "spinlock" of DirectoryReader.open. It now implements Closeable and you must close it to release the lock.
    (Mike McCandless, Robert Muir)
  • LUCENE-6004 : Don't highlight the LookupResult.key returned from AnalyzingInfixSuggester
    (Christian Reuschling, jane chang via Mike McCandless)
  • LUCENE-5980 : Don't let document length overflow.
    (Robert Muir)
  • LUCENE-5961 : Fix the exists() method for FunctionValues returned by many ValueSources to behave properly when wrapping other ValueSources which do not exist for the specified document
    (hossman)
  • LUCENE-6039 : Add IndexOptions.NONE and DocValuesType.NONE instead of using null to mean not indexed and no doc values, renamed IndexOptions.DOCS_ONLY to DOCS, and pulled IndexOptions and DocValues out of FieldInfo into their own classes in org.apache.lucene.index
    (Simon Willnauer, Robert Muir, Mike McCandless)
  • LUCENE-6041 : Remove sugar methods FieldInfo.isIndexed and FieldInfo.hasDocValues.
    (Robert Muir, Mike McCandless)
  • LUCENE-6044 : Fix backcompat support for token filters with enablePositionIncrements=false. Also fixed backcompat for TrimFilter with updateOffsets=true. These options are supported with a match version before 4.4, and no longer valid at all with 5.0.
    (Ryan Ernst)
  • LUCENE-6042 : CustomScoreQuery explain was incorrect in some cases, such as when nested inside a boolean query.
    (Denis Lantsman via Robert Muir)
  • LUCENE-6046 : Add maxDeterminizedStates safety to determinize (which has an exponential worst case) so that if it would create too many states, it now throws an exception instead of exhausting CPU/RAM.
    (Nik Everett via Mike McCandless)
  • LUCENE-6054 : Allow repeating the empty automaton
    (Nik Everett via Mike McCandless)
  • LUCENE-6049 : Don't throw cryptic exception writing a segment when the only docs in it had fields that hit non-aborting exceptions during indexing but also had doc values.
    (Mike McCandless)
  • LUCENE-6055 : PayloadAttribute.clone() now does a deep clone of the underlying bytes.
    (Shai Erera)
  • LUCENE-6060 : Remove dangerous IndexWriter.unlock method
    (Simon Willnauer, Mike McCandless)
  • LUCENE-6062 : Pass correct fieldinfos to docvalues producer when the segment has updates.
    (Mike McCandless, Shai Erera, Robert Muir)
  • LUCENE-6075 : Don't overflow int in SimpleRateLimiter
    (Boaz Leskes via Mike McCandless)
  • LUCENE-5987 : IndexWriter will now forcefully close itself on aborting exception (an exception that would otherwise cause silent data loss).
    (Robert Muir, Mike McCandless)
  • LUCENE-6094 : Allow IW.rollback to stop ConcurrentMergeScheduler even when it's stalling because there are too many merges.
    (Mike McCandless)
  • LUCENE-6105 : Don't cache FST root arcs if the number of root arcs is small, or if the cache would be > 20% of the size of the FST.
    (Robert Muir, Mike McCandless)
  • LUCENE-6124 : Fix double-close() problems in codec and store APIs.
    (Robert Muir)
  • LUCENE-6152 : Fix double close problems in OutputStreamIndexOutput.
    (Uwe Schindler)
  • LUCENE-6139 : Highlighter: TokenGroup start & end offset getters should have been returning the offsets of just the matching tokens in the group when there's a distinction.
    (David Smiley)
  • LUCENE-6173 : NumericTermAttribute and spatial/CellTokenStream do not clone their BytesRef(Builder)s. Also equals/hashCode was missing.
    (Uwe Schindler)
  • LUCENE-6205 : Fixed intermittent concurrency issue that could cause FileNotFoundException when writing doc values updates at the same time that a merge kicks off.
    (Mike McCandless)
  • LUCENE-6192 : Fix int overflow corruption case in skip data for high frequency terms in extremely large indices
    (Robert Muir, Mike McCandless)
  • LUCENE-6093 : Don't throw NullPointerException from BlendedInfixSuggester for lookups that do not end in a prefix token.
    (jane chang via Mike McCandless)
  • LUCENE-6214 : Fixed IndexWriter deadlock when one thread is committing while another opens a near-real-time reader and an unrecoverable (tragic) exception is hit.
    (Simon Willnauer, Mike McCandless)
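LUCENE-6046 above caps determinization because subset construction can produce exponentially many DFA states. The sketch below shows the capping idea with a plain subset construction that throws once a new state would exceed maxStates. It assumes a hypothetical NFA encoding with no epsilon moves, where transitions are stored as state, then symbol, then target states. CappedDeterminizer and TooComplexException are illustrative names. This is not Lucene's Operations.determinize.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical subset construction with an upper bound on DFA states. */
final class CappedDeterminizer {
  static final class TooComplexException extends RuntimeException {
    TooComplexException(int max) {
      super("determinization would exceed " + max + " states");
    }
  }

  /** Returns the number of reachable DFA states, or throws if it exceeds maxStates. */
  static int determinize(Map<Integer, Map<Character, Set<Integer>>> nfa, int start, int maxStates) {
    Map<Set<Integer>, Integer> ids = new HashMap<>(); // DFA state = set of NFA states
    Deque<Set<Integer>> work = new ArrayDeque<>();
    Set<Integer> init = Set.of(start);
    ids.put(init, 0);
    work.add(init);

    while (!work.isEmpty()) {
      Set<Integer> current = work.poll();
      // Union the NFA targets per symbol to get each outgoing DFA transition.
      Map<Character, Set<Integer>> moves = new TreeMap<>();
      for (int s : current) {
        for (Map.Entry<Character, Set<Integer>> e : nfa.getOrDefault(s, Map.of()).entrySet()) {
          moves.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
        }
      }
      for (Set<Integer> target : moves.values()) {
        if (!ids.containsKey(target)) {
          // Fail fast instead of exhausting CPU/RAM on a blow-up.
          if (ids.size() >= maxStates) throw new TooComplexException(maxStates);
          ids.put(target, ids.size());
          work.add(target);
        }
      }
    }
    return ids.size();
  }
}
```

For the NFA of "the third symbol from the end is a" over {a, b}, subset construction reaches 2^3 = 8 states. A cap of 8 succeeds and a cap of 4 throws.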
  • Documentation (3)
  • LUCENE-5392 : Add/improve analysis package documentation to reflect analysis API changes.
    (Benson Margulies via Robert Muir - pull request #17)
  • LUCENE-6057 : Improve Sort(SortField) docs
    (Martin Braun via Mike McCandless)
  • LUCENE-6112 : Fix compile error in FST package example code
    (Tomoko Uchida via Koji Sekiguchi)
  • Tests (6)
  • LUCENE-5957 : Add option for tests to not randomize codec
    (Ryan Ernst)
  • LUCENE-5974 : Add check that backcompat indexes use default codecs
    (Ryan Ernst)
  • LUCENE-5971 : Create addBackcompatIndexes.py script to build and add backcompat test indexes for a given lucene version. Also renamed backcompat index files to use Version.toString() in filename.
    (Ryan Ernst)
  • LUCENE-6002 : Monster tests no longer fail. Most of them now have an 80 hour timeout, effectively removing the timeout. The tests that operate near the 2 billion limit now use IndexWriter.MAX_DOCS instead of Integer.MAX_VALUE. Some of the slow Monster tests now explicitly choose the default codec.
    (Mike McCandless, Shawn Heisey)
  • LUCENE-5968 : Improve error message when 'ant beast' is run on top-level modules.
    (Ramkumar Aiyengar, Uwe Schindler)
  • LUCENE-6120 : Fix MockDirectoryWrapper's close() handling.
    (Mike McCandless, Robert Muir)
  • Build (5)
  • LUCENE-5909 : Smoke tester now has better command line parsing and optionally also runs on Java 8.
    (Ryan Ernst, Uwe Schindler)
  • LUCENE-5902 : Add bumpVersion.py script to manage version increase after release branch is cut.
  • LUCENE-5962 : Rename diffSources.py to createPatch.py and make it work with all text file types.
    (Ryan Ernst)
  • LUCENE-5995 : Upgrade ICU to 54.1
    (Robert Muir)
  • LUCENE-6070 : Upgrade forbidden-apis to 1.7
    (Uwe Schindler)
  • Other (5)
  • LUCENE-5563 : Removed the sep layout, which has fallen behind on features and doesn't perform as well as other options.
    (Robert Muir)
  • LUCENE-4086 : Removed support for Lucene 3.x indexes. See migration guide for more information.
    (Robert Muir)
  • LUCENE-5858 : Moved Lucene 4 compatibility codecs to 'lucene-backward-codecs.jar'.
    (Adrien Grand, Robert Muir)
  • LUCENE-5915 : Remove Pulsing postings format.
    (Robert Muir)
  • LUCENE-6213 : Add useful exception message when commit contains segments from legacy codecs.
    (Ryan Ernst)
  • Release 4.10.4 [2015-03-03]

  • Bug fixes (12)
  • LUCENE-6019 , LUCENE-6117 : Remove -Dtests.assert to make IndexWriter infoStream sane.
    (Robert Muir, Mike McCandless)
  • LUCENE-6161 : Resolving deletes was failing to reuse DocsEnum, likely causing a substantial performance cost for use cases that frequently delete old documents
    (Mike McCandless)
  • LUCENE-6192 : Fix int overflow corruption case in skip data for high frequency terms in extremely large indices
    (Robert Muir, Mike McCandless)
  • LUCENE-6207 : Fixed consumption of several terms enums on the same sorted (set) doc values instance at the same time.
    (Tom Shally, Robert Muir, Adrien Grand)
  • LUCENE-6093 : Don't throw NullPointerException from BlendedInfixSuggester for lookups that do not end in a prefix token.
    (jane chang via Mike McCandless)
  • LUCENE-6279 : Don't let an abusive leftover _N_upgraded.si in the index directory cause index corruption on upgrade
    (Robert Muir, Mike McCandless)
  • LUCENE-6287 : Fix concurrency bug in IndexWriter that could cause index corruption (missing _N.si files) the first time 4.x kisses a 3.x index if merges are also running.
    (Simon Willnauer, Mike McCandless)
  • LUCENE-6205 : Fixed intermittent concurrency issue that could cause FileNotFoundException when writing doc values updates at the same time that a merge kicks off.
    (Mike McCandless)
  • LUCENE-6214 : Fixed IndexWriter deadlock when one thread is committing while another opens a near-real-time reader and an unrecoverable (tragic) exception is hit.
    (Simon Willnauer, Mike McCandless)
  • LUCENE-6105 : Don't cache FST root arcs if the number of root arcs is small, or if the cache would be > 20% of the size of the FST.
    (Robert Muir, Mike McCandless)
  • LUCENE-6001 : DrillSideways hits NullPointerException for certain BooleanQuery searches.
    (Dragan Jotannovic, jane chang via Mike McCandless)
  • LUCENE-6306 : Merging of doc values and norms now checks whether the merge was aborted so IndexWriter.rollback can more promptly abort a running merge.
    (Robert Muir, Mike McCandless)
  • API Changes (1)
  • LUCENE-6212 : Deprecate IndexWriter APIs that accept per-document Analyzer. These methods were trappy as they made it easy to accidentally index tokens that were not easily searchable and will be removed in 5.0.0.
    (Mike McCandless)
  • Release 4.10.3 [2014-12-29]

  • Bug fixes (12)
  • LUCENE-6046 : Add maxDeterminizedStates safety to determinize (which has an exponential worst case) so that if it would create too many states, it now throws an exception instead of exhausting CPU/RAM.
    (Nik Everett via Mike McCandless)
  • LUCENE-6054 : Allow repeating the empty automaton
    (Nik Everett via Mike McCandless)
  • LUCENE-6049 : Don't throw cryptic exception writing a segment when the only docs in it had fields that hit non-aborting exceptions during indexing but also had doc values.
    (Mike McCandless)
  • LUCENE-6060 : Deprecate IndexWriter.unlock
    (Simon Willnauer, Mike McCandless)
  • LUCENE-3229 : Overlapping ordered SpanNearQuery spans should not match.
    (Ludovic Boutros, Paul Elschot, Greg Dearing, ehatcher)
  • LUCENE-6004 : Don't highlight the LookupResult.key returned from AnalyzingInfixSuggester
    (Christian Reuschling, jane chang via Mike McCandless)
  • LUCENE-6075 : Don't overflow int in SimpleRateLimiter
    (Boaz Leskes via Mike McCandless)
  • LUCENE-5980 : Don't let document length overflow.
    (Robert Muir)
  • LUCENE-6042 : CustomScoreQuery explain was incorrect in some cases, such as when nested inside a boolean query.
    (Denis Lantsman via Robert Muir)
  • LUCENE-5948 : RateLimiter now fully inits itself on init.
    (Varun Thacker via Mike McCandless)
  • LUCENE-6055 : PayloadAttribute.clone() now does a deep clone of the underlying bytes.
    (Shai Erera)
  • LUCENE-6094 : Allow IW.rollback to stop ConcurrentMergeScheduler even when it's stalling because there are too many merges.
    (Mike McCandless)
  • Documentation (1)
  • LUCENE-6057 : Improve Sort(SortField) docs
    (Martin Braun via Mike McCandless)
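LUCENE-6075 ("Don't overflow int in SimpleRateLimiter") is a reminder that byte-rate arithmetic must be widened to long before multiplying by nanoseconds per second. The sketch below contrasts the two forms. PauseMath, its method names and its formula are illustrative assumptions and are not SimpleRateLimiter's actual code.

```java
/** Hypothetical pause computation: how long to wait for `bytes` at `bytesPerSec`. */
final class PauseMath {
  /** Buggy form: bytes * 1e9 is evaluated in int and wraps for bytes >= 3. */
  static long pauseNanosIntMath(int bytes, int bytesPerSec) {
    return bytes * 1_000_000_000 / bytesPerSec; // int multiply overflows before widening
  }

  /** Fixed form: widen to long first so the product cannot wrap for int inputs. */
  static long pauseNanosLongMath(int bytes, int bytesPerSec) {
    return (long) bytes * 1_000_000_000L / bytesPerSec;
  }
}
```

Writing 1 MB at 1 MB/s should pause for one second (1,000,000,000 ns). The int version silently returns a wrong, negative value.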
  • Release 4.10.2 [2014-10-31]

  • Bug fixes (2)
  • LUCENE-5977 : Fix tokenstream safety checks in IndexWriter to properly work across multi-valued fields. Previously some cases across multi-valued fields would happily create a corrupt index.
    (Dawid Weiss, Robert Muir)
  • LUCENE-6019 : Detect when DocValuesType illegally changes for the same field name. Also added -Dtests.asserts=true|false so we can run tests with and without assertions.
    (Simon Willnauer, Robert Muir, Mike McCandless)
  • LUCENE-5934 : Fix backwards compatibility for 4.0 indexes.
    (Ian Lea, Uwe Schindler, Robert Muir, Ryan Ernst)
  • LUCENE-5939 : Regenerate old backcompat indexes to ensure they were built with the exact release
    (Ryan Ernst, Uwe Schindler)
  • LUCENE-5952 : Improve error messages when version cannot be parsed; don't check for too old or too new major version (it's too low level to enforce here); use simple string tokenizer.
    (Ryan Ernst, Uwe Schindler, Robert Muir, Mike McCandless)
  • LUCENE-5958 : Don't let exceptions during checkpoint corrupt the index. Refactor existing OOM handling too, so you don't need to handle OOM specially for every IndexWriter method: instead, such disasters will cause IW to close itself defensively.
    (Robert Muir, Mike McCandless)
  • LUCENE-5904 : Fixed a corruption case that can happen when 1) IndexWriter is uncleanly shut down (OS crash, power loss, etc.), 2) on startup, when a new IndexWriter is created, a virus checker is holding some of the previously written but unused files open and preventing their deletion, and 3) IndexWriter writes these files again during the course of indexing. The files could then later be deleted, causing corruption. This case was detected by adding evilness to MockDirectoryWrapper so that it simulates a virus checker holding a file open and preventing deletion.
    (Robert Muir, Mike McCandless)
  • LUCENE-5916 : Static scope test components should be consistent between tests (and test iterations). Fix for FaultyIndexInput in particular.
    (Dawid Weiss)
  • LUCENE-5975 : Fix reading of 3.0-3.3 indexes, where bugs in these old index formats would result in CorruptIndexException "did not read all bytes from file" when reading the deleted docs file.
    (Patrick Mi, Robert Muir)
  • Tests (1)
  • LUCENE-5936 : Add backcompat checks to verify what is tested matches known versions
    (Ryan Ernst)
  • Release 4.10.0 [2014-09-03]

  • New Features (11)
  • LUCENE-5778 : Support hunspell morphological description fields/aliases.
    (Robert Muir)
  • LUCENE-5801 : Added (back) OrdinalMappingAtomicReader for merging search indexes that contain category ordinals from separate taxonomy indexes.
    (Nicola Buso via Shai Erera)
  • LUCENE-4175 , LUCENE-5714 , LUCENE-5779 : Index and search rectangles with spatial BBoxSpatialStrategy using most predicates. Sort documents by relative overlap of query areas or just by indexed shape area.
    (Ryan McKinley, David Smiley)
  • LUCENE-5806 : Extend expressions grammar to support array access in variables. Added helper class VariableContext to parse a complex variable into pieces.
    (Ryan Ernst)
  • LUCENE-5826 : Support proper hunspell case handling, LANG, KEEPCASE, NEEDAFFIX, and ONLYINCOMPOUND flags.
    (Robert Muir)
  • LUCENE-5815 : Add TermAutomatonQuery, a proximity query allowing you to create an arbitrary automaton, using terms on the transitions, expressing which sequence of sequential terms (including a special "any" term) are allowed. This is a generalization of MultiPhraseQuery and span queries, and enables "correct" (including position) length search-time graph synonyms.
    (Mike McCandless)
  • LUCENE-5819 : Add OrdsLucene41 block tree terms dict and postings format, to include term ordinals in the index so the optional TermsEnum.ord() and TermsEnum.seekExact(long ord) APIs work.
    (Mike McCandless)
  • LUCENE-5835 : TermValComparator can sort missing values last.
    (Adrien Grand)
  • LUCENE-5825 : Benchmark module can use custom postings format, e.g.: codec.postingsFormat=Memory
    (Varun Shenoy, David Smiley)
  • LUCENE-5842 : When opening large files (where it's too expensive to compare the checksum against all the bytes), retrieve the checksum to validate the structure of the footer; this can detect some forms of corruption, such as truncation.
    (Robert Muir)
  • LUCENE-5739 : Added DataInput.readZ(Int|Long) and DataOutput.writeZ(Int|Long) to read and write small signed integers.
    (Adrien Grand)
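LUCENE-5739 above adds readZInt/writeZInt for small signed integers. The sketch below assumes the usual scheme of zig-zag encoding followed by a variable-length int with 7 bits per byte and the high bit meaning "more bytes follow". Zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ..., so values near zero stay short whatever their sign. ZigZag is a hypothetical helper. It is not Lucene's DataInput/DataOutput, and the byte layout is assumed here rather than taken from the Lucene sources.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical zig-zag + vInt helpers for small signed integers. */
final class ZigZag {
  /** 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... */
  static int encode(int n) {
    return (n << 1) ^ (n >> 31);
  }

  static int decode(int z) {
    return (z >>> 1) ^ -(z & 1);
  }

  /** Variable-length int: low 7 bits per byte, high bit set while more bytes follow. */
  static void writeVInt(ByteArrayOutputStream out, int z) {
    while ((z & ~0x7F) != 0) {
      out.write((z & 0x7F) | 0x80);
      z >>>= 7;
    }
    out.write(z);
  }

  static void writeZInt(ByteArrayOutputStream out, int n) {
    writeVInt(out, encode(n));
  }

  /** Reads one zig-zag vInt; pos[0] is a mutable cursor advanced past the value. */
  static int readZInt(byte[] buf, int[] pos) {
    int result = 0;
    for (int shift = 0; ; shift += 7) {
      byte b = buf[pos[0]++];
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return decode(result);
      }
    }
  }
}
```

With this scheme -1 takes a single byte, while a plain vInt of -1 would need five bytes because its sign bit is set.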
  • API Changes (8)
  • LUCENE-5752 : Simplified Automaton API to be immutable.
    (Mike McCandless)
  • LUCENE-5793 : Add equals/hashCode to FieldType.
    (Shay Banon, Robert Muir)
  • LUCENE-5692 : DisjointSpatialFilter is deprecated (used by RecursivePrefixTreeStrategy)
    (David Smiley)
  • LUCENE-5771 : SpatialOperation's predicate names are now aliased to OGC standard names. Thus you can use: Disjoint, Equals, Intersects, Overlaps, Within, Contains, Covers, CoveredBy. The area requirement on the predicates was removed, and Overlaps' definition was fixed.
    (David Smiley)
  • LUCENE-5850 : Made Version handling more robust and extensible. Deprecated Constants.LUCENE_MAIN_VERSION, Constants.LUCENE_VERSION and current Version constants of the form LUCENE_X_Y. Added version constants that include bugfix number of form LUCENE_X_Y_Z. Changed Version.LUCENE_CURRENT to Version.LATEST. CheckIndex now prints the Lucene version used to write each segment.
    (Ryan Ernst, Uwe Schindler, Robert Muir, Mike McCandless)
  • LUCENE-5836 : BytesRef has been split into BytesRef, whose intended usage is to be just a reference to a section of a larger byte[], and BytesRefBuilder, which is a StringBuilder-like class for BytesRef instances.
    (Adrien Grand)
  • LUCENE-5883 : You can now change the MergePolicy instance on a live IndexWriter, without first closing and reopening the writer. This allows you to, e.g., run a special merge with UpgradeIndexMergePolicy without reopening the writer. Also, MergePolicy no longer implements Closeable; if you need to release your custom MergePolicy's resources, you need to implement close() and call it explicitly.
    (Shai Erera)
  • LUCENE-5859 : Deprecate Analyzer constructors taking Version. Use Analyzer.setVersion() to set the version an analyzer uses to replicate behavior from a specific release.
    (Ryan Ernst, Robert Muir)
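LUCENE-5836 above separates a plain reference to a section of a larger byte[] from a StringBuilder-like builder that owns and grows its buffer. The sketch below illustrates why the split matters. A view handed out by the builder shares its buffer, so it can observe later edits, while a copy is independent. ByteSlice and ByteSliceBuilder are hypothetical classes that only mirror the roles of BytesRef and BytesRefBuilder. They are not Lucene code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical reference to bytes[offset, offset + length) of a shared array. */
final class ByteSlice {
  final byte[] bytes;
  final int offset;
  final int length;

  ByteSlice(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }

  String utf8ToString() {
    return new String(bytes, offset, length, StandardCharsets.UTF_8);
  }
}

/** Hypothetical growable owner of a byte buffer that hands out slices. */
final class ByteSliceBuilder {
  private byte[] buf = new byte[8];
  private int len;

  void append(byte[] src, int off, int n) {
    if (len + n > buf.length) {
      buf = Arrays.copyOf(buf, Math.max(buf.length * 2, len + n)); // grow geometrically
    }
    System.arraycopy(src, off, buf, len, n);
    len += n;
  }

  void clear() {
    len = 0; // keeps the buffer for reuse
  }

  /** Shares the buffer: later edits may show through until the buffer is reallocated. */
  ByteSlice get() {
    return new ByteSlice(buf, 0, len);
  }

  /** Deep copy that is unaffected by later edits. */
  ByteSlice toSlice() {
    return new ByteSlice(Arrays.copyOf(buf, len), 0, len);
  }
}
```

In the usage below, the shared view sees "HEllo" after the builder is cleared and reused, while the copy still reads "hello".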
  • Optimizations (14)
  • LUCENE-5780 : Make OrdinalMap more memory-efficient, especially in case the first segment has all values.
    (Adrien Grand, Robert Muir)
  • LUCENE-5782 : OrdinalMap now sorts enums before being built in order to improve compression.
    (Adrien Grand)
  • LUCENE-5798 : Optimize MultiDocsEnum reuse.
    (Robert Muir)
  • LUCENE-5799 : Optimize numeric docvalues merging.
    (Robert Muir)
  • LUCENE-5797 : Optimize norms merging
    (Adrien Grand, Robert Muir)
  • LUCENE-5803 : Add DelegatingAnalyzerWrapper, an optimized variant of AnalyzerWrapper that doesn't allow wrapping components or readers. This wrapper class is the base class of all analyzers that just delegate to another analyzer, e.g. per field name: PerFieldAnalyzerWrapper and Solr's schema support.
    (Shay Banon, Uwe Schindler, Robert Muir)
  • LUCENE-5795 : MoreLikeThisQuery now only collects the top N terms instead of collecting all terms from the like text when building the query.
    (Alex Ksikes, Simon Willnauer)
  • LUCENE-5681 : Fix RAMDirectory's IndexInput to not do double buffering on slices (causes useless data copying, especially on random access slices). This also improves slices of NRTCachingDirectory, because the cache is based on RAMDirectory. BufferedIndexInput.wrap() was marked with a warning in javadocs. It is almost always a better idea to implement slicing on your own!
    (Uwe Schindler, Robert Muir)
  • LUCENE-5834 : Empty sorted set and numeric doc values are now singletons.
    (Adrien Grand)
  • LUCENE-5841 : Improve performance of block tree terms dictionary when assigning terms to blocks.
    (Mike McCandless)
  • LUCENE-5856 : Optimize Fixed/Open/LongBitSet to remove unnecessary AND.
    (Robert Muir)
  • LUCENE-5884 : Optimize FST.ramBytesUsed.
    (Adrien Grand, Robert Muir, Mike McCandless)
  • LUCENE-5882 : Add Lucene410DocValuesFormat, with faster term lookups for SORTED/SORTED_SET fields.
    (Robert Muir)
  • LUCENE-5887 : Remove WeakIdentityMap caching in AttributeFactory, AttributeSource, and VirtualMethod in favour of Java 7's ClassValue. Always use MethodHandles to create AttributeImpl classes.
    (Uwe Schindler)
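LUCENE-5887 above replaces WeakIdentityMap-based per-class caches with java.lang.ClassValue. ClassValue computes a value lazily the first time get(Class) is called for a class and then returns the cached value, without strongly pinning the class in a static map. The sketch below caches the interfaces implemented by a class and its superclasses. InterfaceCache is hypothetical and is not AttributeSource's actual code. Its call counter is a plain static int and is not thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical per-class cache built on java.lang.ClassValue. */
final class InterfaceCache {
  static int computations = 0; // demo counter only; not thread-safe

  private static final ClassValue<List<Class<?>>> INTERFACES = new ClassValue<List<Class<?>>>() {
    @Override
    protected List<Class<?>> computeValue(Class<?> type) {
      computations++;
      List<Class<?>> result = new ArrayList<>();
      // Collect interfaces declared on the class and each of its superclasses.
      for (Class<?> c = type; c != null; c = c.getSuperclass()) {
        Collections.addAll(result, c.getInterfaces());
      }
      return Collections.unmodifiableList(result);
    }
  };

  static List<Class<?>> interfacesOf(Class<?> type) {
    return INTERFACES.get(type); // computeValue runs on first access only
  }
}
```

Repeated lookups for the same class return the identical cached list, and computeValue runs once per class unless the value is explicitly removed.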
  • Bug Fixes (9)
  • LUCENE-5796 : Fixes the Scorer.getChildren() method for two combinations of BooleanQuery.
    (Terry Smith via Robert Muir)
  • LUCENE-5790 : Fix compareTo in MutableValueDouble and MutableValueBool, this caused incorrect results when grouping on fields with missing values.
    (海老澤 志信, hossman)
  • LUCENE-5817 : Fix hunspell zero-affix handling: previously only zero-strips worked correctly.
    (Robert Muir)
  • LUCENE-5818 , LUCENE-5823 : Fix hunspell overgeneration for short strings that also match affixes, words are only stripped to a zero-length string if FULLSTRIP option is specified in the dictionary.
    (Robert Muir)
  • LUCENE-5824 : Fix hunspell 'long' flag handling.
    (Robert Muir)
  • LUCENE-5838 : Fix hunspell when the .aff file has over 64k affixes.
    (Robert Muir)
  • LUCENE-5869 : FuzzyQuery's maxExpansions is now restricted to positive values.
    (Ryan Ernst)
  • LUCENE-5672 : IndexWriter.addIndexes() calls maybeMerge(), to ensure the index stays healthy. If you don't want merging use NoMergePolicy instead.
    (Robert Muir)
  • LUCENE-5908 : Fix Lucene43NGramTokenizer to be final
  • Test Framework (2)
  • LUCENE-5786 : Unflushed/truncated events file (hung testing subprocess).
    (Dawid Weiss)
  • LUCENE-5881 : Add "beasting" of tests: repeats the whole "test" Ant target N times with "ant beast -Dbeast.iters=N".
    (Uwe Schindler, Robert Muir, Ryan Ernst, Dawid Weiss)
  • Build (2)
  • LUCENE-5770 : Upgrade to JFlex 1.6, which has direct support for supplementary code points - as a result, ICU4J is no longer used to generate surrogate pairs to augment JFlex scanner specifications.
    (Steve Rowe)
  • SOLR-6358 : Remove VcsDirectoryMappings from idea configuration vcs.xml
    (Ramkumar Aiyengar via Steve Rowe)
  • Release 4.9.1 [2014-09-22]

  • Bug fixes (7)
  • LUCENE-5907 : Fix corruption case when opening a pre-4.x index with IndexWriter, then opening an NRT reader from that writer, then calling commit from the writer, then closing the NRT reader. This case would remove the wrong files from the index leading to a corrupt index.
    (Mike McCandless)
  • LUCENE-5919 : Fix exception handling inside IndexWriter when deleteFile throws an exception, to not over-decRef index files, possibly deleting a file that's still in use in the index, leading to corruption.
    (Mike McCandless)
  • LUCENE-5922 : DocValuesDocIdSet on 5.x and FieldCacheDocIdSet on 4.x are not cacheable.
    (Adrien Grand)
  • LUCENE-5843 : Added IndexWriter.MAX_DOCS which is the maximum number of documents allowed in a single index, and any operations that add documents will now throw IllegalStateException if the max count would be exceeded, instead of silently creating an unusable index.
    (Mike McCandless)
  • LUCENE-5844 : ArrayUtil.grow/oversize now returns a maximum of Integer.MAX_VALUE - 8 for the maximum array size.
    (Robert Muir, Mike McCandless)
  • LUCENE-5827 : Make all Directory implementations correctly fail with IllegalArgumentException if slices are out of bounds.
    (Uwe Schindler)
  • LUCENE-5897 , LUCENE-5400 : JFlex-based tokenizers StandardTokenizer and UAX29URLEmailTokenizer tokenize extremely slowly over long sequences of text partially matching certain grammar rules. The scanner default buffer size was reduced, and scanner buffer growth was disabled, resulting in much, much faster tokenization for these text sequences.
    (Chris Geeringh, Robert Muir, Steve Rowe)
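LUCENE-5843 above makes document-adding operations throw IllegalStateException when the maximum document count would be exceeded, instead of silently creating an unusable index. The sketch below shows that "check before committing to the change" pattern with a lock-free reservation. DocCountGuard is hypothetical and takes the limit as a parameter. It is not IndexWriter's implementation and does not hard-code IndexWriter.MAX_DOCS.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical guard that refuses to let a document count pass a hard limit. */
final class DocCountGuard {
  private final long maxDocs;
  private final AtomicLong pending = new AtomicLong();

  DocCountGuard(long maxDocs) {
    this.maxDocs = maxDocs;
  }

  /** Reserves room for n more documents, or throws and leaves the count unchanged. */
  void reserve(long n) {
    if (n < 0) throw new IllegalArgumentException("n must be >= 0");
    while (true) {
      long current = pending.get();
      long next = current + n; // long math keeps int-range counts from wrapping
      if (next > maxDocs) {
        throw new IllegalStateException("number of documents in the index cannot exceed "
            + maxDocs + " (current " + current + ", adding " + n + ")");
      }
      if (pending.compareAndSet(current, next)) {
        return; // reservation succeeded atomically
      }
      // Another thread changed the count concurrently; retry with the new value.
    }
  }

  long count() {
    return pending.get();
  }
}
```

Because the limit is checked before the count changes, a rejected batch leaves the counter exactly where it was.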
  • Release 4.9.0 [2014-06-25]

  • Changes in Runtime Behavior (2)
  • LUCENE-5611 : Changing the term vector options for multiple field instances by the same name in one document is no longer accepted; IndexWriter will now throw IllegalArgumentException.
    (Robert Muir, Mike McCandless)
  • LUCENE-5646 : Remove rare/undertested bulk merge algorithm in CompressingStoredFieldsWriter.
    (Robert Muir, Adrien Grand)
  • New Features (8)
  • LUCENE-5610 : Add Terms.getMin and Terms.getMax to get the lowest and highest terms, and NumericUtils.get{Min/Max}{Int/Long} to get the minimum and maximum numeric values from the provided Terms.
    (Robert Muir, Mike McCandless)
  • LUCENE-5675 : Add IDVersionPostingsFormat, a postings format optimized for primary-key (ID) fields that also record a version (long) for each ID.
    (Robert Muir, Mike McCandless)
  • LUCENE-5680 : Add ability to atomically update a set of DocValues fields.
    (Shai Erera)
  • LUCENE-5717 : Add support for multiterm queries nested inside filtered and constant-score queries to postings highlighter.
    (Luca Cavanna via Robert Muir)
  • LUCENE-5731 , LUCENE-5760 : Add RandomAccessInput, a random access API for Directory. Add DirectReader/Writer, optimized for reading packed integers directly from Directory. Add Lucene49Codec and Lucene49DocValuesFormat that make use of these.
    (Robert Muir)
  • LUCENE-5743 : Add Lucene49NormsFormat, which can compress in some cases such as very short fields.
    (Ryan Ernst, Adrien Grand, Robert Muir)
  • LUCENE-5748 : Add SORTED_NUMERIC docvalues type, which is efficient for processing numeric fields with multiple values.
    (Robert Muir)
  • LUCENE-5754 : Allow "$" as part of variable and function names in expressions module.
    (Uwe Schindler)
  • Changes in Backwards Compatibility Policy (4)
  • LUCENE-5634 : Add reuse argument to IndexableField.tokenStream. This can be used by custom field types that don't use the Analyzer but implement their own TokenStream.
    (Uwe Schindler, Robert Muir)
  • LUCENE-5640 : AttributeSource.AttributeFactory was moved to a top-level class: org.apache.lucene.util.AttributeFactory
    (Uwe Schindler, Robert Muir)
  • LUCENE-4371 : Removed IndexInputSlicer and Directory.createSlicer() and replaced with IndexInput.slice().
    (Robert Muir)
  • LUCENE-5727 , LUCENE-5678 : Remove IndexOutput.seek() and IndexOutput.setLength().
    (Robert Muir, Uwe Schindler)
  • API Changes (20)
  • LUCENE-5756 : IndexWriter now implements Accountable and IW#ramSizeInBytes() has been deprecated in favor of IW#ramBytesUsed()
    (Simon Willnauer)
  • LUCENE-5725 : MoreLikeThis#like now accepts multiple values per field. The pre-existing method has been deprecated in favor of a varargs variant for the like text.
    (Alex Ksikes via Simon Willnauer)
  • LUCENE-5711 : MergePolicy accepts an IndexWriter instance on each method rather than holding state against a single IndexWriter instance.
    (Simon Willnauer)
  • LUCENE-5582 : Deprecate IndexOutput.length (just use IndexOutput.getFilePointer instead) and IndexOutput.setLength.
    (Mike McCandless)
  • LUCENE-5621 : Deprecate IndexOutput.flush: this is not used by Lucene.
    (Robert Muir)
  • LUCENE-5611 : Simplified Lucene's default indexing chain / APIs. AttributeSource/TokenStream.getAttribute now returns null if the attribute is not present (previously it threw IllegalArgumentException). StoredFieldsWriter.startDocument no longer receives the number of fields that will be added
    (Robert Muir, Mike McCandless)
  • LUCENE-5632 : In preparation for coming Lucene versions, the Version enum constants were renamed to make them more readable. The constant for Lucene 4.9 is now "LUCENE_4_9". Version.parseLeniently() is still able to parse the old strings ("LUCENE_49"). The old identifiers were deprecated and will be removed in Lucene 5.0.
    (Uwe Schindler, Robert Muir)
  • LUCENE-5633 : Change NoMergePolicy to a singleton with no distinction between compound and non-compound types.
    (Shai Erera)
  • LUCENE-5640 : The Token class was deprecated. Since Lucene 2.9, TokenStreams use Attributes, so Token is no longer used.
    (Uwe Schindler, Robert Muir)
  • LUCENE-5679 : Consolidated IndexWriter.deleteDocuments(Term) and IndexWriter.deleteDocuments(Query) with their varargs counterparts.
    (Shai Erera)
  • LUCENE-5701 : Core closed listeners are now available in the AtomicReader API; previously they were only available in SegmentReader.
    (Adrien Grand, Robert Muir)
  • LUCENE-5706 : Removed the option to unset a DocValues field through DocValues updates.
    (Shai Erera)
  • LUCENE-5700 : Added oal.util.Accountable that is now implemented by all classes whose memory usage can be estimated.
    (Robert Muir, Adrien Grand)
  • LUCENE-5708 : Remove IndexWriterConfig.clone, so now IndexWriter simply uses the IndexWriterConfig you pass it, and you must create a new IndexWriterConfig for each IndexWriter.
    (Mike McCandless)
  • LUCENE-5678 : IndexOutput no longer allows seeking, so it is no longer required to use RandomAccessFile to write Indexes. Lucene now uses standard FileOutputStream wrapped with OutputStreamIndexOutput to write index data. BufferedIndexOutput was removed, because buffering and checksumming are provided by the JDK's FilterOutputStreams.
    (Uwe Schindler, Mike McCandless)
  • LUCENE-5703 : BinaryDocValues API changed to work like TermsEnum and not allocate or copy bytes on each access; you are responsible for cloning if you want to keep data around.
    (Adrien Grand)
  • LUCENE-5695 : DocIdSet implements Accountable.
    (Adrien Grand)
  • LUCENE-5757 : Moved RamUsageEstimator's reflection-based processing to RamUsageTester in the test-framework module.
    (Robert Muir)
  • LUCENE-5761 : Removed DiskDocValuesFormat, it was very inefficient and saved very little RAM over the default codec.
    (Robert Muir)
  • LUCENE-5775 : Deprecate JaspellLookup.
    (Mike McCandless)
  • Optimizations (18)
  • LUCENE-5603 : hunspell stemmer more efficiently strips prefixes and suffixes.
    (Robert Muir)
  • LUCENE-5599 : HttpReplicator did not properly delegate bulk read() to wrapped InputStream.
    (Christoph Kaser via Shai Erera)
  • LUCENE-5591 : pass an IOContext with estimated flush size when applying DV updates.
    (Shai Erera)
  • LUCENE-5634 : IndexWriter reuses TokenStream instances for String and Numeric fields by default.
    (Uwe Schindler, Shay Banon, Mike McCandless, Robert Muir)
  • LUCENE-5638 , LUCENE-5640 : TokenStream uses a more performant AttributeFactory by default, that packs the core attributes into one implementation (PackedTokenAttributeImpl), for faster clearAttributes(), saveState(), and restoreState(). In addition, AttributeFactory uses Java 7 MethodHandles for instantiating Attribute implementations.
    (Uwe Schindler, Robert Muir)
  • LUCENE-5609 : Changed the default NumericField precisionStep from 4 to 8 (for int/float) and 16 (for long/double), for faster indexing time and smaller indices.
    (Robert Muir, Uwe Schindler, Mike McCandless)
  • LUCENE-5670 : Add skip/FinalOutput to FST Outputs.
    (Christian Ziech via Mike McCandless)
  • LUCENE-4236 : Optimize BooleanQuery's in-order scoring. This speeds up some types of boolean queries.
    (Robert Muir)
  • LUCENE-5694 : Don't score() subscorers in DisjunctionSumScorer or DisjunctionMaxScorer unless score() is called.
    (Robert Muir)
  • LUCENE-5720 : Optimize DirectPackedReader's decompression.
    (Robert Muir)
  • LUCENE-5722 : Optimize ByteBufferIndexInput#seek() by specializing implementations. This improves random access as used by docvalues codecs if used with MMapDirectory.
    (Robert Muir, Uwe Schindler)
  • LUCENE-5730 : FSDirectory.open returns MMapDirectory for 64-bit operating systems, not just Linux and Windows.
    (Robert Muir)
  • LUCENE-5703 : BinaryDocValues producers don't allocate or copy bytes on each access anymore.
    (Adrien Grand)
  • LUCENE-5721 : Monotonic compression doesn't use zig-zag encoding anymore.
    (Robert Muir, Adrien Grand)
  • LUCENE-5750 : Speed up monotonic addressing for BINARY and SORTED_SET docvalues.
    (Robert Muir)
  • LUCENE-5751 : Speed up MemoryDocValues.
    (Adrien Grand, Robert Muir)
  • LUCENE-5767 : OrdinalMap optimizations that mostly help with low cardinalities.
    (Martijn van Groningen, Adrien Grand)
  • LUCENE-5769 : SingletonSortedSetDocValues now supports random access ordinals.
    (Robert Muir)
  • Bug fixes (11)
  • LUCENE-5738 : Ensure NativeFSLock prevents opening the file channel for the lock if the lock is already obtained by the JVM. Trying to obtain a lock that is already held within the same JVM can unlock the file, which might allow other processes to lock it even though the FileLock was never explicitly released. This behavior is operating system dependent.
    (Simon Willnauer)
  • LUCENE-5673 : MMapDirectory: Work around a "bug" in the JDK that throws a confusing OutOfMemoryError wrapped inside IOException if the FileChannel mapping failed because of lack of virtual address space. The IOException is rethrown with more useful information about the problem, omitting the incorrect OutOfMemoryError.
    (Robert Muir, Uwe Schindler)
  • LUCENE-5682 : NPE in QueryRescorer when Scorer is null
    (Joel Bernstein, Mike McCandless)
  • LUCENE-5691 : DocTermOrds lookupTerm(BytesRef) would return incorrect results if the underlying TermsEnum supports ord() and the insertion point would be at the end.
    (Robert Muir)
  • LUCENE-5618 , LUCENE-5636 : SegmentReader referenced unneeded files following doc-values updates. Now doc-values field updates are written to a separate file per field.
    (Shai Erera, Robert Muir)
  • LUCENE-5684 : Make best effort to detect invalid usage of Lucene, when IndexReader is reopened after all files in its index were removed and recreated by the application (the proper way to do this is IndexWriter.deleteAll, or opening an IndexWriter with OpenMode.CREATE)
    (Mike McCandless)
  • LUCENE-5704 : Fix compilation error with Java 8u20.
    (Uwe Schindler)
  • LUCENE-5710 : Include the inner exception as the cause and in the exception message when an immense term is hit during indexing.
    (Hinman via Mike McCandless)
  • LUCENE-5724 : CompoundFileWriter was failing to pass through the IOContext in some cases, causing NRTCachingDirectory to cache compound files when it shouldn't, then causing OOMEs.
    (Mike McCandless)
  • LUCENE-5747 : Project-specific settings for the eclipse development environment will prevent automatic code reformatting.
    (Shawn Heisey)
  • LUCENE-5768 , LUCENE-5777 : Hunspell condition checks containing character classes were buggy.
    (Clinton Gormley, Robert Muir)
  • Test Framework (2)
  • LUCENE-5622 : Fail tests if they print over the given limit of bytes to System.out or System.err.
    (Robert Muir, Dawid Weiss)
  • LUCENE-5619 : Added backwards compatibility tests to ensure we can update existing indexes with doc-values updates.
    (Shai Erera, Robert Muir)
  • Build (2)
  • LUCENE-5442 : The Ant check-lib-versions target now runs Ivy resolution transitively, then fails the build when it finds a version conflict: when a transitive dependency's version is more recent than the direct dependency's version specified in lucene/ivy-versions.properties. Exceptions are specifiable in lucene/ivy-ignore-conflicts.properties.
    (Steve Rowe)
  • LUCENE-5715 : Upgrade direct dependencies known to be older than transitive dependencies: com.sun.jersey.version:1.8->1.9; com.sun.xml.bind:jaxb-impl:2.2.2->2.2.3-1; commons-beanutils:commons-beanutils:1.7.0->1.8.3; commons-digester:commons-digester:2.0->2.1; commons-io:commons-io:2.1->2.3; commons-logging:commons-logging:1.1.1->1.1.3; io.netty:netty:3.6.2.Final->3.7.0.Final; javax.activation:activation:1.1->1.1.1; javax.mail:mail:1.4.1->1.4.3; log4j:log4j:1.2.16->1.2.17; org.apache.avro:avro:1.7.4->1.7.5; org.tukaani:xz:1.2->1.4; org.xerial.snappy:snappy-java:1.0.4.1->1.0.5
    (Steve Rowe)
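LUCENE-5721 above refers to zig-zag encoding, which maps signed integers to unsigned codes so that values of small magnitude, positive or negative, get small codes (0, -1, 1, -2, 2 become 0, 1, 2, 3, 4). The Java sketch below shows the standard transform for 64-bit values. It only illustrates the encoding itself and is not taken from Lucene's monotonic packing code; the class name is hypothetical.

```java
public class ZigZag {
    /** Interleaves signed values: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ... */
    static long encode(long v) {
        // v >> 63 is all ones for negative v and all zeros otherwise (arithmetic shift).
        return (v << 1) ^ (v >> 63);
    }

    /** Inverse of encode: the low bit records the sign, the shift undoes the interleaving. */
    static long decode(long z) {
        return (z >>> 1) ^ -(z & 1);
    }

    public static void main(String[] args) {
        for (long v : new long[] {0, -1, 1, -2, 2}) {
            System.out.println(v + " -> " + encode(v) + " -> " + decode(encode(v)));
        }
    }
}
```

Zig-zag is useful when deltas can be negative; if a scheme already guarantees non-negative deltas, the extra transform is unnecessary.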
  • Release 4.8.1 [2014-05-20]

  • Bug fixes (15)
  • LUCENE-5639 : Fix PositionLengthAttribute implementation in Token class.
    (Uwe Schindler, Robert Muir)
  • LUCENE-5635 : IndexWriter didn't properly handle IOException on TokenStream.reset(), which could leave the analyzer in an inconsistent state.
    (Robert Muir)
  • LUCENE-5599 : HttpReplicator did not properly delegate bulk read() to wrapped InputStream.
    (Christoph Kaser via Shai Erera)
  • LUCENE-5600 : HttpClientBase did not properly consume a connection if a server error occurred.
    (Christoph Kaser via Shai Erera)
  • LUCENE-5628 : Change getFiniteStrings to iterative not recursive implementation, so that building suggesters on a long suggestion doesn't risk overflowing the stack; previously it consumed one Java stack frame per character in the expanded suggestion. If you are building a suggester this is a nasty trap.
    (Robert Muir, Simon Willnauer, Mike McCandless)
  • LUCENE-5559 : Add additional argument validation for CapitalizationFilter and CodepointCountFilter.
    (Ahmet Arslan via Robert Muir)
  • LUCENE-5641 : SimpleRateLimiter would silently rate limit at 8 MB/sec even if you asked for higher rates.
    (Mike McCandless)
  • LUCENE-5644 : IndexWriter clears which threads use which internal thread states on flush, so that if an application reduces how many threads it uses for indexing, that results in a reduction of how many segments are flushed on a full-flush (e.g. to obtain a near-real-time reader).
    (Simon Willnauer, Mike McCandless)
  • LUCENE-5653 : JoinUtil with ScoreMode.Avg on a multi-valued field with more than 256 values would throw exception.
    (Mikhail Khludnev via Robert Muir)
  • LUCENE-5654 : Fix various close() methods that could suppress throwables such as OutOfMemoryError, instead returning scary messages that look like index corruption.
    (Mike McCandless, Robert Muir)
  • LUCENE-5656 : Fix rare fd leak in SegmentReader when multiple docvalues fields have been updated with IndexWriter.updateXXXDocValue and one hits exception.
    (Shai Erera, Robert Muir)
  • LUCENE-5660 : AnalyzingSuggester.build will now throw IllegalArgumentException if you give it a longer suggestion than it can handle
    (Robert Muir, Mike McCandless)
  • LUCENE-5662 : Add missing checks to Field to prevent IndexWriter.abort if a stored value is null.
    (Robert Muir)
  • LUCENE-5668 : Fix off-by-one in TieredMergePolicy
    (Mike McCandless)
  • LUCENE-5671 : Upgrade ICU version to fix an ICU concurrency problem that could cause exceptions when indexing.
    (feedly team, Robert Muir)
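LUCENE-5628 above replaces a recursive enumeration with an iterative one, so that the Java call stack no longer grows with suggestion length. The general technique is to keep pending work on an explicit, heap-allocated stack. The Java sketch below applies it to a hypothetical acyclic automaton stored as one label-to-target map per state; it is not Lucene's getFiniteStrings code, and it assumes the automaton has no cycles (a cyclic one would never terminate).

```java
import java.util.*;

public class IterativeFiniteStrings {
    /** One unit of pending work: a state reached via the given prefix. */
    private static final class Frame {
        final int state;
        final String prefix;

        Frame(int state, String prefix) {
            this.state = state;
            this.prefix = prefix;
        }
    }

    /**
     * Enumerates every string accepted by an acyclic automaton using an explicit
     * stack instead of recursion, so path length does not consume Java stack frames.
     * transitions.get(s) maps a label to the target state (hypothetical representation).
     */
    static List<String> finiteStrings(List<Map<Character, Integer>> transitions,
                                      Set<Integer> accepting, int start) {
        List<String> result = new ArrayList<>();
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame(start, ""));
        while (!stack.isEmpty()) {
            Frame f = stack.pop();
            if (accepting.contains(f.state)) {
                result.add(f.prefix);
            }
            for (Map.Entry<Character, Integer> t : transitions.get(f.state).entrySet()) {
                stack.push(new Frame(t.getValue(), f.prefix + t.getKey()));
            }
        }
        Collections.sort(result); // stack order is arbitrary; sort for stable output
        return result;
    }

    public static void main(String[] args) {
        // Automaton accepting "ab" and "ac": 0 -a-> 1, 1 -b-> 2, 1 -c-> 2; state 2 accepts.
        List<Map<Character, Integer>> t = new ArrayList<>();
        t.add(Map.of('a', 1));
        t.add(Map.of('b', 2, 'c', 2));
        t.add(Map.of());
        System.out.println(finiteStrings(t, Set.of(2), 0)); // [ab, ac]
    }
}
```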
  • Release 4.8.0 [2014-04-28]

  • System Requirements (1)
  • LUCENE-4747 , LUCENE-5514 : Move to Java 7 as minimum Java version.
    (Robert Muir, Uwe Schindler)
  • Changes in Runtime Behavior (1)
  • LUCENE-5472 : IndexWriter.addDocument will now throw an IllegalArgumentException if a Term to be indexed exceeds IndexWriter.MAX_TERM_LENGTH. To recreate previous behavior of silently ignoring these terms, use LengthFilter in your Analyzer.
    (hossman, Mike McCandless, Varun Thacker)
  • New Features (24)
  • LUCENE-5356 : Morfologik filter can accept custom dictionary resources.
    (Michal Hlavac, Dawid Weiss)
  • LUCENE-5454 : Add SortedSetSortField to lucene/sandbox, to allow sorting on multi-valued field.
    (Robert Muir)
  • LUCENE-5478 : CommonTermsQuery now allows creating custom term queries, similar to the query parser, by overriding a newTermQuery method.
    (Simon Willnauer)
  • LUCENE-5477 : AnalyzingInfixSuggester now supports near-real-time additions and updates (to change weight or payload of an existing suggestion).
    (Mike McCandless)
  • LUCENE-5482 : Improve default TurkishAnalyzer by adding apostrophe handling suitable for Turkish.
    (Ahmet Arslan via Robert Muir)
  • LUCENE-5479 : FacetsConfig subclass can now customize the default per-dim facets configuration.
    (Rob Audenaerde via Mike McCandless)
  • LUCENE-5485 : Add circumfix support to HunspellStemFilter.
    (Robert Muir)
  • LUCENE-5224 : Add iconv, oconv, and ignore support to HunspellStemFilter.
    (Robert Muir)
  • LUCENE-5493 : SortingMergePolicy and EarlyTerminatingSortingCollector support arbitrary Sort specifications.
    (Robert Muir, Mike McCandless, Adrien Grand)
  • LUCENE-3758 : Allow the ComplexPhraseQueryParser to search ordered or unordered proximity queries.
    (Ahmet Arslan via Erick Erickson)
  • LUCENE-5530 : ComplexPhraseQueryParser throws ParseException for fielded queries.
    (Erick Erickson via Tomas Fernandez Lobbe and Ahmet Arslan)
  • LUCENE-5513 : Add IndexWriter.updateBinaryDocValue which lets you update the value of a BinaryDocValuesField without reindexing the document(s).
    (Shai Erera)
  • LUCENE-4072 : Add ICUNormalizer2CharFilter, which lets you do Unicode normalization with offset correction before the tokenizer.
    (David Goldfarb, Ippei UKAI via Robert Muir)
  • LUCENE-5476 : Add RandomSamplingFacetsCollector for computing facets on a sampled set of matching hits, in cases where there are millions of hits.
    (Rob Audenaerde, Gilad Barkai, Shai Erera)
  • LUCENE-4984 : Add SegmentingTokenizerBase, abstract class for tokenizers that want to do two-pass tokenization such as by sentence and then by word.
    (Robert Muir)
  • LUCENE-5489 : Add Rescorer/QueryRescorer, to resort the hits from a first pass search using scores from a more costly second pass search.
    (Simon Willnauer, Robert Muir, Mike McCandless)
  • LUCENE-5528 : Add context to suggesters (InputIterator and Lookup classes), and fix AnalyzingInfixSuggester to handle contexts. Suggester contexts allow you to filter suggestions.
    (Areek Zillur, Mike McCandless)
  • LUCENE-5545 : Add SortRescorer and Expression.getRescorer, to resort the hits from a first pass search using a Sort or an Expression.
    (Simon Willnauer, Robert Muir, Mike McCandless)
  • LUCENE-5558 : Add TruncateTokenFilter which truncates terms to the specified length.
    (Ahmet Arslan via Robert Muir)
  • LUCENE-2446 : Added checksums to lucene index files. As of 4.8, the last 8 bytes of each file contain a zlib-crc32 checksum. Small metadata files are verified on load. Larger files can be checked on demand via AtomicReader.checkIntegrity. You can configure this to happen automatically before merges by enabling IndexWriterConfig.setCheckIntegrityAtMerge.
    (Robert Muir)
  • LUCENE-5580 : Checksums are automatically verified on the default stored fields format when performing a bulk merge.
    (Adrien Grand)
  • LUCENE-5602 : Checksums are automatically verified on the default term vectors format when performing a bulk merge.
    (Adrien Grand, Robert Muir)
  • LUCENE-5583 : Added DataInput.skipBytes. ChecksumIndexInput can now seek, but only forward.
    (Adrien Grand, Mike McCandless, Simon Willnauer, Uwe Schindler)
  • LUCENE-5588 : Lucene now calls fsync() on the index directory, ensuring that all file metadata is persisted on disk in case of power failure. This does not work on all file systems and operating systems, but Linux and MacOSX are known to work. On Windows, fsyncing a directory is not possible with Java APIs.
    (Mike McCandless, Uwe Schindler)
  • API Changes (10)
  • LUCENE-5454 : Add RandomAccessOrds, an optional extension of SortedSetDocValues that supports random access to the ordinals in a document.
    (Robert Muir)
  • LUCENE-5468 : Move offline Sort (from suggest module) to OfflineSort.
    (Robert Muir)
  • LUCENE-5493 : SortingMergePolicy and EarlyTerminatingSortingCollector take Sort instead of Sorter. BlockJoinSorter is removed, replaced with BlockJoinComparatorSource, which can take a Sort for ordering of parents and a separate Sort for ordering of children within a block.
    (Robert Muir, Mike McCandless, Adrien Grand)
  • LUCENE-5516 : MergeScheduler#merge() now accepts a MergeTrigger as well as a boolean that indicates if a new merge was found in the caller thread before the scheduler was called.
    (Simon Willnauer)
  • LUCENE-5487 : Separated bulk scorer (new Weight.bulkScorer method) from normal scoring (Weight.scorer) for those queries that can do bulk scoring more efficiently, e.g. BooleanQuery in some cases. This also simplified the Weight.scorer API by removing the two confusing booleans.
    (Robert Muir, Uwe Schindler, Mike McCandless)
  • LUCENE-5519 : TopNSearcher now allows retrieving incomplete results if the max size of the candidate queue is unknown. The queue can still be bounded in order to apply pruning while retrieving the top N, but it will not throw an exception if too many results are rejected to guarantee an absolutely correct top N result. The TopNSearcher now returns a struct-like class that indicates whether the result is complete in the sense of the top N. Consumers of this API should assert on the completeness if the bounded queue size is known ahead of time.
    (Simon Willnauer)
  • LUCENE-4984 : Deprecate ThaiWordFilter and smartcn SentenceTokenizer and WordTokenFilter. These filters would not work correctly with CharFilters and could not be safely placed at an arbitrary position in the analysis chain. Use ThaiTokenizer and HMMChineseTokenizer instead.
    (Robert Muir)
  • LUCENE-5543 : Remove/deprecate Directory.fileExists
    (Mike McCandless)
  • LUCENE-5573 : Move docvalues constants and helper methods to o.a.l.index.DocValues.
    (Dawid Weiss, Robert Muir)
  • LUCENE-5604 : Switched BytesRef.hashCode to MurmurHash3 (32 bit). TermToBytesRefAttribute.fillBytesRef no longer returns the hash code. BytesRefHash now uses MurmurHash3 for its hashing.
    (Robert Muir, Mike McCandless)
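LUCENE-5604 above switches BytesRef.hashCode to 32-bit MurmurHash3. For readers unfamiliar with the function, the following is a self-contained Java sketch of the public-domain MurmurHash3 x86_32 algorithm over a byte slice. The class name and the seed passed in main are assumptions; Lucene's own implementation, including the seed it uses, may differ.

```java
import java.nio.charset.StandardCharsets;

public class Murmur3 {
    /** MurmurHash3 x86_32 over data[offset, offset + len). */
    static int hash32(byte[] data, int offset, int len, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = offset + (len & ~3); // end of the last full 4-byte block

        for (int i = offset; i < roundedEnd; i += 4) {
            // Read a little-endian 32-bit block.
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                   | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Mix in the remaining 1-3 tail bytes (intentional switch fall-through).
        int k1 = 0;
        switch (len & 3) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
                // fall through
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
                // fall through
            case 1:
                k1 |= data[roundedEnd] & 0xff;
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }

        // Finalization: fold in the length, then avalanche the bits.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    public static void main(String[] args) {
        byte[] term = "lucene".getBytes(StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(hash32(term, 0, term.length, 0)));
    }
}
```

Hashing a slice (offset, length) rather than a whole array matches how a BytesRef addresses its bytes, so no copy is needed before hashing.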
  • Optimizations (4)
  • LUCENE-5468 : HunspellStemFilter uses 10 to 100x less RAM. It also loads all known openoffice dictionaries without error, and supports an additional longestOnly option for a less aggressive approach.
    (Robert Muir)
  • LUCENE-4848 : Use Java 7 NIO2-FileChannel instead of RandomAccessFile for NIOFSDirectory and MMapDirectory. This allows open files to be deleted on Windows when NIOFSDirectory is used; mmapped files are still locked.
    (Michael Poindexter, Robert Muir, Uwe Schindler)
  • LUCENE-5515 : Improved TopDocs#merge to create a merged ScoreDoc array whose length is at most the specified size, instead of at most from + size as before.
    (Martijn van Groningen)
  • LUCENE-5529 : Spatial search of non-point indexed shapes should be a little faster due to skipping intersection tests on redundant cells.
    (David Smiley)
  • Bug fixes (14)
  • LUCENE-5483 : Fix inaccuracies in HunspellStemFilter. Multi-stage affix-stripping, prefix-suffix dependencies, and COMPLEXPREFIXES now work correctly according to the hunspell algorithm. Removed the recursionCap parameter, as it is no longer needed: rules for recursive affix application are now driven correctly by continuation classes in the affix file.
    (Robert Muir)
  • LUCENE-5497 : HunspellStemFilter properly handles escaped terms and affixes without conditions.
    (Robert Muir)
  • LUCENE-5505 : HunspellStemFilter ignores BOM markers in dictionaries and handles varying types of whitespace in SET/FLAG commands.
    (Robert Muir)
  • LUCENE-5507 : Fix HunspellStemFilter loading of dictionaries with large amounts of aliases etc before the encoding declaration.
    (Robert Muir)
  • LUCENE-5111 : Fix WordDelimiterFilter to return offsets in correct order.
    (Robert Muir)
  • LUCENE-5555 : Fix SortedInputIterator to correctly encode/decode contexts in presence of payload
    (Areek Zillur)
  • LUCENE-5559 : Add missing argument checks to tokenfilters taking numeric arguments.
    (Ahmet Arslan via Robert Muir)
  • LUCENE-5568 : Benchmark module's "default.codec" option didn't work.
    (David Smiley)
  • SOLR-5983 : HTMLStripCharFilter treated CDATA sections incorrectly.
    (Dan Funk, Steve Rowe)
  • LUCENE-5615 : Validate per-segment delete counts at write time, to help catch bugs that might otherwise cause corruption
    (Mike McCandless)
  • LUCENE-5612 : NativeFSLockFactory no longer deletes its lock file. This cannot be done safely without the risk of deleting someone else's lock file. If you use NativeFSLockFactory, you may see write.lock hanging around from time to time: it's harmless.
    (Uwe Schindler, Mike McCandless, Robert Muir)
  • LUCENE-5624 : Ensure NativeFSLockFactory does not leak file handles if it is unable to obtain the lock.
    (Uwe Schindler, Robert Muir)
  • LUCENE-5626 : Fix bug in SimpleFSLockFactory's obtain() that sometimes threw IOException (ERROR_ACCESS_DENIED) on Windows if the lock file was created concurrently. This error is now handled the same way as in NativeFSLockFactory, by returning false.
    (Uwe Schindler, Robert Muir, Dawid Weiss)
  • LUCENE-5630 : Add missing META-INF entry for UpperCaseFilterFactory.
    (Robert Muir)
  • Tests (1)
  • LUCENE-5630 : Fix TestAllAnalyzersHaveFactories to correctly check for existence of class and corresponding Map<String,String> ctor.
    (Uwe Schindler, Robert Muir)
  • Test Framework (5)
  • LUCENE-5592 : Incorrectly reported uncloseable files.
    (Dawid Weiss)
  • LUCENE-5577 : Temporary folder and file management (and cleanup facilities)
    (Mark Miller, Uwe Schindler, Dawid Weiss)
  • LUCENE-5567 : When a suite failed with zombie threads, the failure marker and count were not propagated properly.
    (Dawid Weiss)
  • LUCENE-5449 : Rename _TestUtil and _TestHelper to remove the leading _.
  • LUCENE-5501 : Added random out-of-order collection testing (when the collector supports it) to AssertingIndexSearcher.
    (Adrien Grand)
  • Build (4)
  • LUCENE-5463 : RamUsageEstimator.(human)sizeOf(Object) is now a forbidden API.
    (Adrien Grand, Robert Muir)
  • LUCENE-5512 : Remove redundant typing (use diamond operator) throughout the codebase.
    (Furkan KAMACI via Robert Muir)
  • LUCENE-5614 : Enable building on Java 8 using Apache Ant 1.8.3 or 1.8.4 by adding a workaround for the Ant bug.
    (Uwe Schindler)
  • LUCENE-5612 : Add a new Ant target in lucene/core to test LockFactory implementations: "ant test-lock-factory".
    (Uwe Schindler, Mike McCandless, Robert Muir)
  • Documentation (1)
  • LUCENE-5534 : Add javadocs to GreekStemmer methods.
    (Stamatis Pitsios via Robert Muir)
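LUCENE-2446 above states that, as of 4.8, the last 8 bytes of each index file contain a zlib-CRC32 checksum. The Java sketch below shows the basic idea with java.util.zip.CRC32: append the checksum of everything before it as an 8-byte big-endian value, then verify by recomputing. The class and method names are hypothetical, and Lucene's actual footer carries additional fields before the checksum that this sketch omits.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChecksumFooter {
    /** Returns body followed by an 8-byte big-endian CRC32 of body. */
    static byte[] appendChecksum(byte[] body) {
        CRC32 crc = new CRC32();
        crc.update(body, 0, body.length);
        return ByteBuffer.allocate(body.length + Long.BYTES)
            .put(body)
            .putLong(crc.getValue()) // the 32-bit CRC value stored in a 64-bit slot
            .array();
    }

    /** Recomputes the CRC32 over everything except the last 8 bytes and compares. */
    static boolean verify(byte[] file) {
        if (file.length < Long.BYTES) {
            return false;
        }
        int bodyLen = file.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(file, 0, bodyLen);
        long stored = ByteBuffer.wrap(file, bodyLen, Long.BYTES).getLong();
        return stored == crc.getValue();
    }

    public static void main(String[] args) {
        byte[] file = appendChecksum("segment data".getBytes(StandardCharsets.UTF_8));
        System.out.println(verify(file)); // true
        file[0] ^= 0x01;                   // simulate a flipped bit on disk
        System.out.println(verify(file)); // false
    }
}
```

Because the checksum sits at a fixed position at the end of the file, small files can be verified on load while large ones can be verified on demand by streaming once over the body.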
  • Release 4.7.2 [2014-04-15]

  • Bug Fixes (2)
  • LUCENE-5574 : Closing a near-real-time reader no longer attempts to delete unreferenced files if the original writer has been closed; this could cause index corruption in certain cases where index files were directly changed (deleted, overwritten, etc.) in the index directory outside of Lucene.
    (Simon Willnauer, Shai Erera, Robert Muir, Mike McCandless)
  • LUCENE-5570 : Don't let FSDirectory.sync() create new zero-byte files, instead throw exception if a file is missing.
    (Uwe Schindler, Mike McCandless, Robert Muir)
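LUCENE-5570 above makes FSDirectory.sync() throw for a missing file instead of creating a new zero-byte one. With plain Java NIO, the difference comes down to the open options: FileChannel.open with WRITE but without CREATE throws NoSuchFileException when the file is absent. The sketch below shows that pattern together with FileChannel.force(true); the class and method names are hypothetical and this is not Lucene's FSDirectory code.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncWithoutCreate {
    /**
     * fsyncs an existing file. Opening with WRITE but not CREATE means a missing
     * file raises NoSuchFileException instead of silently creating an empty file.
     */
    static void sync(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true); // flush file content and metadata to the storage device
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sync-demo");
        Path existing = Files.write(dir.resolve("_0.cfs"), new byte[] {1, 2, 3});
        sync(existing);
        Path missing = dir.resolve("missing.cfs");
        try {
            sync(missing);
        } catch (NoSuchFileException e) {
            System.out.println("missing file detected: " + e.getFile());
        }
        System.out.println("created by sync? " + Files.exists(missing));
    }
}
```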
  • Release 4.7.1 [2014-04-02]

  • Changes in Runtime Behavior (1)
  • LUCENE-5532 : AutomatonQuery.equals is no longer implemented as "accepts same language". This was inconsistent with hashCode, and unnecessary for any subclasses in Lucene. If you desire this in a custom subclass, minimize the automaton.
    (Robert Muir)
  • Bug Fixes (14)
  • LUCENE-5450 : Fix getField() NPE issues with SpanOr/SpanNear when they have an empty list of clauses. This can happen for example, when a wildcard matches no terms.
    (Tim Allison via Robert Muir)
  • LUCENE-5473 : Throw IllegalArgumentException, not NullPointerException, if the synonym map is empty when creating SynonymFilter
    (帅广应 via Mike McCandless)
  • LUCENE-5432 : EliasFanoDocIdSet: Fix number of index entry bits when the maximum entry is a power of 2.
    (Paul Elschot via Adrien Grand)
  • LUCENE-5466 : query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier.
    (Koji Sekiguchi)
  • LUCENE-5502 : Fixed TermsFilter.equals that could return true for different filters.
    (Igor Motov via Adrien Grand)
  • LUCENE-5522 : FacetsConfig didn't add drill-down terms for association facet field labels.
    (Shai Erera)
  • LUCENE-5520 : ToChildBlockJoinQuery would hit ArrayIndexOutOfBoundsException if a parent document had no children
    (Sally Ang via Mike McCandless)
  • LUCENE-5532 : AutomatonQuery.hashCode was not thread-safe.
    (Robert Muir)
  • LUCENE-5525 : Implement MultiFacets.getAllDims, so you can do sparse facets through DrillSideways, for example.
    (Jose Peleteiro, Mike McCandless)
  • LUCENE-5481 : IndexWriter.forceMerge used to run a merge even if there was a single segment in the index.
    (Adrien Grand, Mike McCandless)
  • LUCENE-5538 : Fix FastVectorHighlighter bug with index-time synonyms when the query is more complex than a single phrase.
    (Robert Muir)
  • LUCENE-5544 : Exceptions during IndexWriter.rollback could leak file handles and the write lock.
    (Robert Muir)
  • LUCENE-4978 : Spatial RecursivePrefixTree queries could result in false-negatives for indexed shapes within 1/2 maxDistErr from the edge of the query shape. This meant searching for a point by the same point as a query rarely worked.
    (David Smiley)
  • LUCENE-5553 : IndexReader#ReaderClosedListener is not always invoked when IndexReader#close() is called or if refCount is 0. If an exception is thrown during internal close or on any of the close listeners some or all listeners might be missed. This can cause memory leaks if the core listeners are used to clear caches.
    (Simon Willnauer)
  • Build (1)
  • LUCENE-5511 : "ant precommit" / "ant check-svn-working-copy" now work again with any working copy format (thanks to svnkit 1.8.4).
    (Uwe Schindler)
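LUCENE-5553 above describes close listeners being skipped when an earlier one throws. A common remedy is to invoke every listener, remember the first failure, attach later failures as suppressed exceptions, and rethrow only at the end. The Java sketch below shows that pattern; the Listener interface and class name are hypothetical, and this is not Lucene's IndexReader code.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseListeners {
    /** Hypothetical callback type standing in for a reader-closed listener. */
    interface Listener {
        void onClose() throws Exception;
    }

    /**
     * Invokes every listener even if earlier ones throw, then rethrows the first
     * failure with any later failures attached as suppressed exceptions.
     */
    static void notifyListeners(List<Listener> listeners) throws Exception {
        Exception first = null;
        for (Listener listener : listeners) {
            try {
                listener.onClose();
            } catch (Exception e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) {
        List<String> cleared = new ArrayList<>();
        List<Listener> listeners = new ArrayList<>();
        listeners.add(() -> { throw new IllegalStateException("cache A failed"); });
        listeners.add(() -> cleared.add("cache B"));
        listeners.add(() -> cleared.add("cache C"));
        try {
            notifyListeners(listeners);
        } catch (Exception e) {
            System.out.println("first failure: " + e.getMessage()); // cache A failed
        }
        System.out.println("still cleared: " + cleared); // [cache B, cache C]
    }
}
```

Later listeners still run, so caches they own are released even when an earlier listener fails.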
  • Release 4.7.0 [2014-02-26]

  • New Features (25)
  • LUCENE-5336 : Add SimpleQueryParser: parser for human-entered queries.
    (Jack Conradson via Robert Muir)
  • LUCENE-5337 : Add Payload support to FileDictionary (Suggest) and make it more configurable
    (Areek Zillur via Erick Erickson)
  • LUCENE-5329 : suggest: DocumentDictionary and DocumentExpressionDictionary are now lenient for dirty documents (missing the term, weight or payload).
    (Areek Zillur via Mike McCandless)
  • LUCENE-5404 : Add .getCount method to all suggesters (Lookup); persist count metadata on .store(); Dictionary returns InputIterator; Dictionary.getWordIterator renamed to .getEntryIterator.
    (Areek Zillur)
  • SOLR-1871 : The RangeMapFloatFunction accepts an arbitrary ValueSource as target and default values.
    (Chris Harris, shalin)
  • LUCENE-5371 : Speed up Lucene range faceting from O(N) per hit to O(log(N)) per hit using segment trees; this only really starts to matter in practice if the number of ranges is over 10 or so.
    (Mike McCandless)
  • LUCENE-5379 : Add Analyzer for Kurdish.
    (Robert Muir)
  • LUCENE-5369 : Added an UpperCaseFilter to make UPPERCASE tokens.
    (ryan)
  • LUCENE-5345 : Add a new BlendedInfixSuggester, which is like AnalyzingInfixSuggester but boosts suggestions that matched tokens with lower positions.
    (Remi Melisson via Mike McCandless)
  • LUCENE-5399 : When sorting by String (SortField.STRING), you can now specify whether missing values should be sorted first (the default), using SortField.setMissingValue(SortField.STRING_FIRST), or last, using SortField.setMissingValue(SortField.STRING_LAST).
    (Rob Muir, Mike McCandless)
  • LUCENE-5099 : QueryNode should have the ability to detach from its parent node. Added QueryNode.removeFromParent(), which allows a node to be detached from its parent node.
    (Adriano Crestani)
  • LUCENE-5395 , LUCENE-5451 : Upgrade to Spatial4j 0.4.1: Parses WKT (including ENVELOPE) with extension "BUFFER"; buffering a point results in a Circle. JTS isn't needed for WKT any more but remains required for Polygons. New Shapes: ShapeCollection and BufferedLineString. Various other improvements and bug fixes too. More info: https://github.com/spatial4j/spatial4j/blob/master/CHANGES.md
    (David Smiley)
  • LUCENE-5415 : Add multitermquery (wildcards,prefix,etc) to PostingsHighlighter.
    (Mike McCandless, Robert Muir)
  • LUCENE-3069 : Add two memory resident dictionaries (FST terms dictionary and FSTOrd terms dictionary) to improve primary key lookups. The PostingsBaseFormat API is also changed so that term dictionaries get the ability to block encode term metadata, and all dictionary implementations can now plug in any PostingsBaseFormat.
    (Han Jiang, Mike McCandless)
  • LUCENE-5353 : ShingleFilter's filler token should be configurable.
    (Ahmet Arslan, Simon Willnauer, Steve Rowe)
  • LUCENE-5320 : Add SearcherTaxonomyManager over search and taxonomy index directories (i.e. not only NRT).
    (Shai Erera)
  • LUCENE-5410 : Add fuzzy and near support via '~' operator to SimpleQueryParser.
    (Lee Hinman via Robert Muir)
  • LUCENE-5426 : Make SortedSetDocValuesReaderState abstract to allow custom implementations for Lucene doc values faceting
    (John Wang via Mike McCandless)
  • LUCENE-5434 : NRT support for file systems that do not have delete-on-last-close or cannot-delete-while-referenced semantics.
    (Mark Miller, Mike McCandless)
  • LUCENE-5418 : Drilling down or sideways on a Lucene facet range (using Range.getFilter()) is now faster for costly filters (uses random access, not iteration); range facet counts now accept a fast-match filter to avoid computing the value for documents that are out of bounds, e.g. using a bounding box filter with distance range faceting.
    (Mike McCandless)
  • LUCENE-5440 : Add LongBitSet for managing more than 2.1B bits (otherwise use FixedBitSet).
    (Shai Erera)
  • LUCENE-5437 : ASCIIFoldingFilter now has an option to preserve the original token and emit it at the same position as the folded token, but only if the token was actually folded.
    (Simon Willnauer, Nik Everett)
  • LUCENE-5408 : Add spatial SerializedDVStrategy, which serializes a binary representation of a shape into BinaryDocValues. It supports exact geometry relationship calculations.
    (David Smiley)
  • LUCENE-5457 : Add SloppyMath.earthDiameter(double latitude) that returns an approximate value of the diameter of the earth at the given latitude.
    (Adrien Grand)
  • LUCENE-5979 : FilteredQuery uses the cost API to decide whether to use random-access or leap-frog to intersect the filter with the query.
    (Adrien Grand)
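The missing-value option in LUCENE-5399 decides where documents without a value land in a String sort. As a rough, Lucene-independent sketch, the two policies amount to ordering missing values first or last. In this sketch the class, enum and method names are hypothetical and `null` stands in for a missing value. It is not how SortField is implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Lucene's SortField implementation:
// a null entry stands in for a document that has no value for the sort field.
public class MissingValueSort {

    // Mirrors the idea behind SortField.STRING_FIRST / SortField.STRING_LAST.
    public enum Missing { FIRST, LAST }

    public static List<String> sort(List<String> values, Missing missing) {
        Comparator<String> natural = Comparator.naturalOrder();
        Comparator<String> cmp = (missing == Missing.FIRST)
                ? Comparator.nullsFirst(natural)
                : Comparator.nullsLast(natural);
        List<String> copy = new ArrayList<>(values); // leave the caller's list untouched
        copy.sort(cmp);
        return copy;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        docs.add("pear");
        docs.add(null); // document missing the field
        docs.add("apple");
        System.out.println(sort(docs, Missing.FIRST)); // [null, apple, pear]
        System.out.println(sort(docs, Missing.LAST));  // [apple, pear, null]
    }
}
```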
  • Build (11)
  • LUCENE-5217 , LUCENE-5420 : Maven config: get dependencies from Ant+Ivy config; disable transitive dependency resolution for all depended-on artifacts by putting an exclusion for each transitive dependency in the <dependencyManagement> section of the grandparent POM.
    (Steve Rowe)
  • LUCENE-5322 : Clean up / simplify Maven-related Ant targets.
    (Steve Rowe)
  • LUCENE-5347 : Upgrade forbidden-apis checker to version 1.4.
    (Uwe Schindler)
  • LUCENE-4381 : Upgrade analysis/icu to 52.1.
    (Robert Muir)
  • LUCENE-5357 : Upgrade StandardTokenizer and UAX29URLEmailTokenizer to Unicode 6.3; update UAX29URLEmailTokenizer's recognized top level domains in URLs and Emails from the IANA Root Zone Database.
    (Steve Rowe)
  • LUCENE-5360 : Add support for developing in Netbeans IDE.
    (Michal Hlavac, Uwe Schindler, Steve Rowe)
  • SOLR-5590 : Upgrade HttpClient/HttpComponents to 4.3.x.
    (Karl Wright via Shawn Heisey)
  • LUCENE-5385 : "ant precommit" / "ant check-svn-working-copy" now work for SVN 1.8 or Git checkouts. The Ant target prints a warning instead of failing, and tells the user how to run it on SVN 1.8 working copies.
    (Robert Muir, Uwe Schindler)
  • LUCENE-5383 : fix changes2html to link pull requests
    (Steve Rowe)
  • LUCENE-5411 : Upgrade to released JFlex 1.5.0; stop requiring a locally built JFlex snapshot jar.
    (Steve Rowe)
  • LUCENE-5465 : Solr Contrib "map-reduce" breaks Manifest of all other JAR files by adding a broken Main-Class attribute.
    (Uwe Schindler, Steve Rowe)
  • Bug fixes (14)
  • LUCENE-5285 : Improved highlighting of multi-valued fields with FastVectorHighlighter.
    (Nik Everett via Adrien Grand)
  • LUCENE-5391 : UAX29URLEmailTokenizer should not tokenize no-scheme domain-only URLs that are followed by an alphanumeric character.
    (Chris Geeringh, Steve Rowe)
  • LUCENE-5405 : If an analysis component throws an exception, Lucene logs the field name to the info stream to assist in diagnosis.
    (Benson Margulies)
  • SOLR-5661 : PriorityQueue now refuses to allocate itself if the incoming maxSize is too large
    (Raintung Li via Mike McCandless)
  • LUCENE-5228 : IndexWriter.addIndexes(Directory[]) now acquires a write lock in each Directory, to ensure that no open IndexWriter is changing the incoming indices. This also means that you cannot pass the same Directory to multiple concurrent addIndexes calls (which is unusual anyway).
    (Robert Muir, Mike McCandless)
  • LUCENE-5415 : SpanMultiTermQueryWrapper didn't handle its boost in hashcode/equals/tostring/rewrite.
    (Robert Muir)
  • LUCENE-5409 : ToParentBlockJoinCollector.getTopGroups would fail to return any groups when the joined query required more than one rewrite step
    (Peng Cheng via Mike McCandless)
  • LUCENE-5398 : NormValueSource was incorrectly casting the long value to byte, before calling Similarity.decodeNormValue.
    (Peng Cheng via Mike McCandless)
  • LUCENE-5436 : ReferenceManager#acquire can result in an infinite loop if the managed resource is abused outside of the ReferenceManager: decrementing the reference without a corresponding incRef() call can cause an infinite loop. ReferenceManager now throws IllegalStateException if the currently managed resource's ref count is 0.
    (Simon Willnauer)
  • LUCENE-5443 : Lucene45DocValuesProducer.ramBytesUsed() may throw ConcurrentModificationException.
    (Shai Erera, Simon Willnauer)
  • LUCENE-5444 : MemoryIndex didn't respect the analyzer's offset gap, and offsets were corrupted if multiple fields with the same name were added to the memory index.
    (Britta Weber, Simon Willnauer)
  • LUCENE-5447 : StandardTokenizer should break at consecutive chars matching Word_Break = MidLetter, MidNum and/or MidNumLet
    (Steve Rowe)
  • LUCENE-5462 : RamUsageEstimator.sizeOf(Object) is not used anymore to estimate memory usage of segments. This used to make SegmentReader.ramBytesUsed very CPU-intensive.
    (Adrien Grand)
  • LUCENE-5461 : ControlledRealTimeReopenThread would sometimes wait too long (up to targetMaxStaleSec) when a searcher is waiting for a specific generation, when it should have waited for at most targetMinStaleSec.
    (Hans Lund via Mike McCandless)
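The LUCENE-5436 fix can be pictured with a small, self-contained model of an acquire loop. This is a sketch under assumptions: `Resource` and `Manager` are hypothetical stand-ins for a ref-counted reader and a ReferenceManager, not Lucene classes. If tryIncRef fails on a resource that is still the current one, no retry can ever succeed. The loop therefore fails fast instead of spinning.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AcquireLoopSketch {

    // Hypothetical ref-counted resource (stand-in for an IndexReader).
    public static final class Resource {
        final AtomicInteger refCount = new AtomicInteger(1);

        // Increments only while the count is still positive.
        boolean tryIncRef() {
            int count;
            while ((count = refCount.get()) > 0) {
                if (refCount.compareAndSet(count, count + 1)) {
                    return true;
                }
            }
            return false;
        }

        public void decRef() {
            refCount.decrementAndGet();
        }

        public int getRefCount() {
            return refCount.get();
        }
    }

    // Hypothetical manager (stand-in for ReferenceManager).
    public static final class Manager {
        private volatile Resource current;

        public Manager(Resource initial) {
            current = initial;
        }

        public Resource acquire() {
            while (true) {
                Resource ref = current;
                if (ref == null) {
                    throw new IllegalStateException("manager is closed");
                }
                if (ref.tryIncRef()) {
                    return ref;
                }
                // A concurrent refresh may have swapped in a new resource, in which case
                // retrying is correct. If the dead resource is still current, someone
                // decRef'd it outside the manager and the loop could never make progress.
                if (ref.getRefCount() == 0 && ref == current) {
                    throw new IllegalStateException("managed resource has refCount 0");
                }
            }
        }
    }

    public static void main(String[] args) {
        Resource r = new Resource();
        Manager m = new Manager(r);
        Resource acquired = m.acquire(); // refCount: 1 -> 2
        acquired.decRef();               // matching release: 2 -> 1
        acquired.decRef();               // abuse, release without incRef: 1 -> 0
        try {
            m.acquire();
        } catch (IllegalStateException e) {
            System.out.println("acquire failed fast: " + e.getMessage());
        }
    }
}
```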
  • API Changes (6)
  • LUCENE-5339 : The facet module was simplified/reworked to make the APIs more approachable to new users. Note: when migrating to the new API, you must pass the Document that is returned from FacetsConfig.build() to IndexWriter.addDocument().
    (Shai Erera, Gilad Barkai, Rob Muir, Mike McCandless)
  • LUCENE-5405 : Make ShingleAnalyzerWrapper.getWrappedAnalyzer() public final
    (gsingers)
  • LUCENE-5395 : The SpatialArgsParser now only reads WKT and no longer accepts "lat, lon" etc., but it's easy to override the parseShape method if you wish.
    (David Smiley)
  • LUCENE-5414 : DocumentExpressionDictionary was renamed to DocumentValueSourceDictionary and all dependencies to the lucene-expression module were removed from lucene-suggest. DocumentValueSourceDictionary now only accepts a ValueSource instead of a convenience ctor for an expression string.
    (Simon Willnauer)
  • LUCENE-3069 : PostingsWriterBase and PostingsReaderBase are no longer responsible for encoding/decoding a block of terms. Instead, they should encode/decode each term to/from a long[] and byte[].
    (Han Jiang, Mike McCandless)
  • LUCENE-5425 : FacetsCollector and MatchingDocs use a general DocIdSet, allowing for custom implementations to be used when faceting.
    (John Wang, Lei Wang, Shai Erera)
  • Optimizations (3)
  • LUCENE-5372 : Replace StringBuffer by StringBuilder, where possible.
    (Joshua Hartman via Uwe Schindler, Dawid Weiss, Mike McCandless)
  • LUCENE-5271 : A slightly more accurate SloppyMath distance.
    (Gilad Barkai via Ryan Ernst)
  • LUCENE-5399 : Deep paging using IndexSearcher.searchAfter when sorting by fields is faster
    (Rob Muir, Mike McCandless)
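Deep paging with IndexSearcher.searchAfter works by passing the last hit of the previous page back in. Each page then only collects hits that sort after that hit, using a queue bounded by the page size, instead of re-collecting every earlier page. The following Lucene-independent sketch shows the idea over an in-memory list. `Hit`, `page` and the tie-break on doc id are hypothetical illustrations, not Lucene's collector code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SearchAfterSketch {

    // Hypothetical hit: a sort key plus the doc id used as a tie-breaker.
    public static final class Hit {
        final String key;
        final int doc;

        public Hit(String key, int doc) {
            this.key = key;
            this.doc = doc;
        }

        public String key() { return key; }
        public int doc() { return doc; }

        @Override
        public String toString() { return key + "#" + doc; }
    }

    static final Comparator<Hit> ORDER =
            Comparator.comparing(Hit::key).thenComparingInt(Hit::doc);

    // Returns the top n hits that sort strictly after 'after' (null means first page).
    // Only a queue of size n is kept, however deep the page is.
    public static List<Hit> page(List<Hit> all, Hit after, int n) {
        // Reversed order puts the worst retained hit at the head of the queue.
        PriorityQueue<Hit> queue = new PriorityQueue<>(ORDER.reversed());
        for (Hit h : all) {
            if (after != null && ORDER.compare(h, after) <= 0) {
                continue; // already returned on an earlier page
            }
            queue.add(h);
            if (queue.size() > n) {
                queue.poll(); // evict the worst hit
            }
        }
        List<Hit> result = new ArrayList<>(queue);
        result.sort(ORDER);
        return result;
    }
}
```

In Lucene itself, the `after` argument is the last ScoreDoc of the previous page's TopDocs. When sorting by fields, that ScoreDoc is a FieldDoc.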
  • Changes in Runtime Behavior (1)
  • LUCENE-5362 : IndexReader and SegmentCoreReaders now throw AlreadyClosedException if the refCount is incremented but is less than 1.
    (Simon Willnauer)
  • Documentation (2)
  • LUCENE-5384 : Add some tips for making tokenfilters and tokenizers to the analysis package overview.
    (Benson Margulies via Robert Muir - pull request #12 )
  • LUCENE-5389 : Add more guidance in the analysis documentation package overview.
    (Benson Margulies via Robert Muir - pull request #14 )
  • Release 4.6.1 [2014-01-28]

  • Bug fixes (8)
  • LUCENE-5373 : Memory usage of [Lucene40/Lucene42/Memory/Direct]DocValuesFormat was over-estimated.
    (Shay Banon, Adrien Grand, Robert Muir)
  • LUCENE-5361 : Fixed handling of query boosts in FastVectorHighlighter.
    (Nik Everett via Adrien Grand)
  • LUCENE-5374 : IndexWriter processes internal events after it closed itself internally. This rare condition can happen if an IndexWriter has internal changes that were not fully applied yet, for example when index / flush requests happen concurrently with the close or rollback call.
    (Simon Willnauer)
  • LUCENE-5394 : Fix TokenSources.getTokenStream to return payloads if they were indexed with the term vectors.
    (Mike McCandless)
  • LUCENE-5344 : Flexible StandardQueryParser behaves differently than ClassicQueryParser.
    (Adriano Crestani)
  • LUCENE-5375 : ToChildBlockJoinQuery works harder to detect misuse when the parent query incorrectly returns child documents, and throws a clear exception saying so.
    (Dr. Oleg Savrasov via Mike McCandless)
  • LUCENE-5401 : Field.StringTokenStream#end() calls super.end() now, preventing wrong term positions for fields that use StringTokenStream.
    (Michael Busch)
  • LUCENE-5377 : IndexWriter.addIndexes(Directory[]) would cause corruption on Lucene 4.6 if any index segments were Lucene 4.0-4.5.
    (Littlestar, Mike McCandless, Shai Erera, Robert Muir)
  • Release 4.6.0 [2013-11-22]

  • New Features (23)
  • LUCENE-4906 : PostingsHighlighter can now render to custom Object, for advanced use cases where String is too restrictive
    (Luca Cavanna, Robert Muir, Mike McCandless)
  • LUCENE-5133 : Changed AnalyzingInfixSuggester.highlight to return Object instead of String, to allow for advanced use cases where String is too restrictive
    (Robert Muir, Shai Erera, Mike McCandless)
  • LUCENE-5207 , LUCENE-5334 : Added expressions module for customizing ranking with script-like syntax.
    (Jack Conradson, Ryan Ernst, Uwe Schindler via Robert Muir)
  • LUCENE-5180 : ShingleFilter now creates shingles with trailing holes, for example if a StopFilter had removed the last token.
    (Mike McCandless)
  • LUCENE-5219 : Add support to SynonymFilterFactory for custom parsers.
    (Ryan Ernst via Robert Muir)
  • LUCENE-5235 : Tokenizers now throw an IllegalStateException if the consumer does not call reset() before consuming the stream. Previous versions threw NullPointerException or ArrayIndexOutOfBoundsException on a best-effort basis, which was not user-friendly.
    (Uwe Schindler, Robert Muir)
  • LUCENE-5240 : Tokenizers now throw an IllegalStateException if the consumer neglects to call close() on the previous stream before consuming the next one.
    (Uwe Schindler, Robert Muir)
  • LUCENE-5214 : Add new FreeTextSuggester, to predict the next word using a simple ngram language model. This is useful for the "long tail" suggestions, when a primary suggester fails to find a suggestion.
    (Mike McCandless)
  • LUCENE-5251 : New DocumentDictionary allows building suggesters from the contents of existing stored fields in an index: the suggestion field, a weight field and, optionally, a payload field.
    (Areek Zillur via Mike McCandless)
  • LUCENE-5261 : Add QueryBuilder, a simple API to build queries from the analysis chain directly, or to make it easier to implement query parsers.
    (Robert Muir, Uwe Schindler)
  • LUCENE-5270 : Add Terms.hasFreqs, to determine whether a given field indexed per-doc term frequencies.
    (Mike McCandless)
  • LUCENE-5269 : Add CodepointCountFilter.
    (Robert Muir)
  • LUCENE-5294 : Suggest module: add DocumentExpressionDictionary to compute each suggestion's weight using a javascript expression.
    (Areek Zillur via Mike McCandless)
  • LUCENE-5274 : FastVectorHighlighter now supports highlighting against several indexed fields.
    (Nik Everett via Adrien Grand)
  • LUCENE-5304 : SingletonSortedSetDocValues can now return the wrapped SortedDocValues
    (Robert Muir, Adrien Grand)
  • LUCENE-2844 : The benchmark module can now test the spatial module. See spatial.alg
    (David Smiley, Liviy Ambrose)
  • LUCENE-5302 : Make StemmerOverrideMap's methods public
    (Alan Woodward)
  • LUCENE-5296 : Add DirectDocValuesFormat, which holds all doc values in heap as uncompressed java native arrays.
    (Mike McCandless)
  • LUCENE-5189 : Add IndexWriter.updateNumericDocValues, to update numeric DocValues fields of documents, without re-indexing them.
    (Shai Erera, Mike McCandless, Robert Muir)
  • LUCENE-5298 : Add SumValueSourceFacetRequest for aggregating facets by a ValueSource, such as a NumericDocValuesField or an expression.
    (Shai Erera)
  • LUCENE-5323 : Add .sizeInBytes method to all suggesters (Lookup).
    (Areek Zillur via Mike McCandless)
  • LUCENE-5312 : Add BlockJoinSorter, a new Sorter implementation that makes sure to never split up blocks of documents indexed with IndexWriter.addDocuments.
    (Adrien Grand)
  • LUCENE-5297 : Allow range faceting on any ValueSource, not just NumericDocValues fields.
    (Shai Erera)
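FreeTextSuggester (LUCENE-5214) predicts the next word from n-gram statistics. The sketch below is a deliberately simplified bigram model with hypothetical names. It counts word pairs and returns the most frequent followers of the query's last word. It does not reproduce FreeTextSuggester's configurable n-gram order, its backoff scoring or its FST-based storage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bigram next-word predictor; an illustration, not Lucene's FreeTextSuggester.
public class BigramSuggester {

    // previous word -> (next word -> count)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    public void train(String text) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            counts.computeIfAbsent(words[i], k -> new HashMap<>())
                  .merge(words[i + 1], 1, Integer::sum);
        }
    }

    // Suggests up to 'num' next words for the last word of 'query', most frequent first;
    // ties are broken alphabetically so the output is deterministic.
    public List<String> suggest(String query, int num) {
        String[] words = query.toLowerCase().trim().split("\\s+");
        Map<String, Integer> next =
                counts.getOrDefault(words[words.length - 1], Collections.emptyMap());
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(next.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // higher count first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(num, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

A suggester like this is useful for the "long tail" case described above. It can still propose a continuation when no indexed suggestion matches the whole query.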
  • Bug Fixes (5)
  • LUCENE-5272 : OpenBitSet.ensureCapacity did not modify numBits, causing false assertion errors in fastSet.
    (Shai Erera)
  • LUCENE-5303 : OrdinalsCache did not use coreCacheKey, resulting in over caching across multiple threads.
    (Mike McCandless, Shai Erera)
  • LUCENE-5307 : Fix topScorer inconsistency in handling QueryWrapperFilter inside ConstantScoreQuery, which now rewrites to a query removing the obsolete QueryWrapperFilter.
    (Adrien Grand, Uwe Schindler)
  • LUCENE-5330 : IndexWriter didn't process all internal events on #getReader(), #close() and #rollback(), which caused files to be deleted at a later point in time. This could cause short-term disk pollution or OOM if in-memory directories are used.
    (Simon Willnauer)
  • LUCENE-5342 : Fixed bulk-merge issue in CompressingStoredFieldsFormat which created corrupted segments when mixing chunk sizes. Lucene41StoredFieldsFormat is not impacted.
    (Adrien Grand, Robert Muir)
  • API Changes (9)
  • LUCENE-5222 : Add SortField.needsScores(). Previously it was not possible for a custom Sort that makes use of the relevance score to work correctly with IndexSearcher when an ExecutorService is specified.
    (Ryan Ernst, Mike McCandless, Robert Muir)
  • LUCENE-5275 : Change AttributeSource.toString() to display the current state of attributes.
    (Robert Muir)
  • LUCENE-5277 : Modify FixedBitSet copy constructor to take an additional numBits parameter to allow growing/shrinking the copied bitset. You can use FixedBitSet.clone() if you only need to clone the bitset.
    (Shai Erera)
  • LUCENE-5260 : Use TermFreqPayloadIterator for all suggesters; those suggesters that can't support payloads will throw an exception if hasPayloads() is true.
    (Areek Zillur via Mike McCandless)
  • LUCENE-5280 : Rename TermFreqPayloadIterator -> InputIterator, along with associated suggest/spell classes.
    (Areek Zillur via Mike McCandless)
  • LUCENE-5157 : Rename OrdinalMap methods to clarify API and internal structure.
    (Boaz Leskes via Adrien Grand)
  • LUCENE-5313 : Move preservePositionIncrements from setter to ctor in Analyzing/FuzzySuggester.
    (Areek Zillur via Mike McCandless)
  • LUCENE-5321 : Remove Facet42DocValuesFormat. Use DirectDocValuesFormat if you want to load the category list into memory.
    (Shai Erera, Mike McCandless)
  • LUCENE-5324 : AnalyzerWrapper.getPositionIncrementGap and getOffsetGap can now be overridden.
    (Adrien Grand)
  • Optimizations (4)
  • LUCENE-5225 : The ToParentBlockJoinQuery only keeps track of the child doc ids and child scores if the ToParentBlockJoinCollector is used.
    (Martijn van Groningen)
  • LUCENE-5236 : EliasFanoDocIdSet now has an index and uses broadword bit selection to speed-up advance().
    (Paul Elschot via Adrien Grand)
  • LUCENE-5266 : Improved number of read calls and branches in DirectPackedReader.
    (Ryan Ernst)
  • LUCENE-5300 : Optimized SORTED_SET storage for fields which are single-valued.
    (Adrien Grand)
  • Documentation (1)
  • LUCENE-5211 : Better javadocs and error checking of 'format' option in StopFilterFactory, as well as comments in all snowball formatted files about specifying format option.
    (hossman)
  • Changes in backwards compatibility policy (2)
  • LUCENE-5235 : Sub classes of Tokenizer have to call super.reset() when implementing reset(). Otherwise the consumer will get an IllegalStateException because the Reader is not correctly assigned. It is important to never change the "input" field on Tokenizer without using setReader(). The "input" field must not be used outside reset(), incrementToken(), or end() - especially not in the constructor.
    (Uwe Schindler, Robert Muir)
  • LUCENE-5204 : Directory doesn't have default implementations for LockFactory-related methods, which have been moved to BaseDirectory. If you had a custom Directory implementation that extended Directory, you need to extend BaseDirectory instead.
    (Adrien Grand)
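The reset() contract described in LUCENE-5235 and LUCENE-5240 is easiest to see as a small state machine. The class below is a hypothetical, Lucene-independent whitespace tokenizer. It does not extend org.apache.lucene.analysis.Tokenizer, and its method names only mirror the workflow. It throws IllegalStateException when the consumer skips reset(), or sets a new reader without closing the previous stream, instead of failing later with an unrelated exception.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical sketch of the reset()/close() contract; not a Lucene Tokenizer subclass.
public class ContractTokenizerSketch {
    private Reader pending; // set by setReader(), handed over by reset()
    private Reader input;   // readable only between reset() and close()
    private String term;

    public void setReader(Reader reader) {
        if (pending != null || input != null) {
            throw new IllegalStateException("close() call missing");
        }
        pending = reader;
    }

    public void reset() {
        if (pending == null) {
            throw new IllegalStateException("setReader() call missing");
        }
        input = pending;
        pending = null;
    }

    // Emits whitespace-separated terms; fails clearly if reset() was skipped.
    public boolean incrementToken() {
        if (input == null) {
            throw new IllegalStateException("reset() call missing");
        }
        try {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) break; // end of the current term
                } else {
                    sb.append((char) c);
                }
            }
            if (sb.length() == 0) return false; // stream exhausted
            term = sb.toString();
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public String term() {
        return term;
    }

    public void close() {
        try {
            if (input != null) input.close();
            if (pending != null) pending.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            input = null;
            pending = null;
        }
    }
}
```

The sequence follows the consumer workflow documented for Lucene's TokenStream: reset(), then incrementToken() until it returns false, then end() and close(). The end() step is omitted here for brevity.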
  • Build (4)
  • LUCENE-5283 : Fail the build if ant test didn't execute any tests (everything filtered out).
    (Dawid Weiss, Uwe Schindler)
  • LUCENE-5249 , LUCENE-5257 : All Lucene/Solr modules should use the same dependency versions.
    (Steve Rowe)
  • LUCENE-5273 : Binary artifacts in Lucene and Solr convenience binary distributions accompanying a release, including on Maven Central, should be identical across all distributions.
    (Steve Rowe, Uwe Schindler, Shalin Shekhar Mangar)
  • LUCENE-4753 : Run forbidden-apis Ant task per module. This allows more improvements and prevents OOMs now that the number of class files has recently increased.
    (Uwe Schindler)
  • Tests (1)
  • LUCENE-5278 : Fix MockTokenizer to work better with more regular expression patterns. Previously it could only behave like CharTokenizer (where a character is either a "word" character or not), but now it gives a general longest-match behavior.
    (Nik Everett via Robert Muir)
  • Release 4.5.1 [2013-10-24]

  • Bug Fixes (8)
  • LUCENE-4998 : Fixed a few places to pass IOContext.READONCE instead of IOContext.READ
    (Shikhar Bhushan via Mike McCandless)
  • LUCENE-5242 : DirectoryTaxonomyWriter.replaceTaxonomy did not fully reset its state, which could result in exceptions being thrown, as well as incorrect ordinals returned from getParent.
    (Shai Erera)
  • LUCENE-5254 : Fixed bounded memory leak, where objects like live docs bitset were not freed from the starting reader after reopening to a new reader and closing the original one.
    (Shai Erera, Mike McCandless)
  • LUCENE-5262 : Fixed file handle leaks when multiple attempts to open an NRT reader hit exceptions.
    (Shai Erera)
  • LUCENE-5263 : Transient IOExceptions, e.g. due to disk full or file descriptor exhaustion, hit at unlucky times inside IndexWriter could lead to silently losing deletions.
    (Shai Erera, Mike McCandless)
  • LUCENE-5264 : CommonTermsQuery ignored minMustMatch if only high-frequent terms were present in the query and the high-frequent operator was set to SHOULD.
    (Simon Willnauer)
  • LUCENE-5269 : Fix bug in NGramTokenFilter where it would sometimes count unicode characters incorrectly.
    (Mike McCandless, Robert Muir)
  • LUCENE-5289 : IndexWriter.hasUncommittedChanges was returning false when there were buffered delete-by-Term.
    (Shalin Shekhar Mangar, Mike McCandless)
  • Release 4.5.0 [2013-10-05]

  • New features (15)
  • LUCENE-5084 : Added new Elias-Fano encoder, decoder and DocIdSet implementations.
    (Paul Elschot via Adrien Grand)
  • LUCENE-5081 : Added WAH8DocIdSet, an in-memory doc id set implementation based on word-aligned hybrid encoding.
    (Adrien Grand)
  • LUCENE-5098 : New broadword utility methods in oal.util.BroadWord.
    (Paul Elschot via Adrien Grand, Dawid Weiss)
  • LUCENE-5030 : FuzzySuggester now supports optional unicodeAware (default is false). If true then edits are measured in Unicode code points instead of UTF8 bytes.
    (Artem Lukanin via Mike McCandless)
  • LUCENE-5118 : SpatialStrategy.makeDistanceValueSource() now has an optional multiplier for scaling degrees to another unit.
    (David Smiley)
  • LUCENE-5091 : SpanNotQuery can now be configured with pre and post slop to act as a hypothetical SpanNotNearQuery.
    (Tim Allison via David Smiley)
  • LUCENE-4985 : FacetsAccumulator.create() is now able to create a MultiFacetsAccumulator over a mixed set of facet requests. MultiFacetsAccumulator allows wrapping multiple FacetsAccumulators, making it easy to mix existing and custom ones. TaxonomyFacetsAccumulator supports any FacetRequest which implements createFacetsAggregator and was indexed using the taxonomy index.
    (Shai Erera)
  • LUCENE-5153 : AnalyzerWrapper.wrapReader allows wrapping the Reader given to initReader.
    (Shai Erera)
  • LUCENE-5155 : FacetRequest.getValueOf and .getFacetArraysSource replaced by FacetsAggregator.createOrdinalValueResolver. This gives better options for resolving an ordinal's value by FacetAggregators.
    (Shai Erera)
  • LUCENE-5165 : Add SuggestStopFilter, to be used with analyzing suggesters, so that a stop word at the very end of the lookup query, and without any trailing token characters, will be preserved. This enables query "a" to suggest apple; see http://blog.mikemccandless.com/2013/08/suggeststopfilter-carefully-removes.html for details.
  • LUCENE-5178 : Added support for missing values to DocValues fields. AtomicReader.getDocsWithField returns a Bits of documents with a value, and FieldCache.getDocsWithField forwards to that for DocValues fields. Things like SortField.setMissingValue, FunctionValues.exists, and FieldValueFilter now work with DocValues fields.
    (Robert Muir)
  • LUCENE-5124 : Lucene 4.5 has a new Lucene45Codec with Lucene45DocValues, supporting missing values and with most datastructures residing off-heap. Added "Memory" docvalues format that works entirely in heap, and "Disk" loads no datastructures into RAM. Both of these also support missing values. Added DiskNormsFormat (in case you want norms entirely on disk).
    (Robert Muir)
  • LUCENE-2750 : Added PForDeltaDocIdSet, an in-memory doc id set implementation based on the PFOR encoding.
    (Adrien Grand)
  • LUCENE-5186 : Added CachingWrapperFilter.getFilter in order to be able to get the wrapped filter.
    (Trejkaz via Adrien Grand)
  • LUCENE-5197 : Added SegmentReader.ramBytesUsed to return approximate heap RAM used by index datastructures.
    (Areek Zillur via Robert Muir)
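The unicodeAware flag of LUCENE-5030 changes what counts as one edit. One way to see the difference is to compute plain Levenshtein distance (no transpositions) over code points and over UTF-8 bytes. This is an illustrative sketch with hypothetical helper names. FuzzySuggester itself works with Levenshtein automata over its analyzed form rather than with this dynamic-programming table.

```java
import java.nio.charset.StandardCharsets;

public class EditUnits {

    // Classic dynamic-programming Levenshtein distance over int sequences.
    static int levenshtein(int[] a, int[] b) {
        int[] prev = new int[b.length + 1];
        int[] cur = new int[b.length + 1];
        for (int j = 0; j <= b.length; j++) prev[j] = j;
        for (int i = 1; i <= a.length; i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length; j++) {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length];
    }

    // Edits measured in Unicode code points (the unicodeAware=true view).
    public static int codePointDistance(String s, String t) {
        return levenshtein(s.codePoints().toArray(), t.codePoints().toArray());
    }

    // Edits measured in UTF-8 bytes (the default view).
    public static int utf8ByteDistance(String s, String t) {
        return levenshtein(toInts(s.getBytes(StandardCharsets.UTF_8)),
                           toInts(t.getBytes(StandardCharsets.UTF_8)));
    }

    private static int[] toInts(byte[] bytes) {
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) out[i] = bytes[i] & 0xFF; // unsigned byte
        return out;
    }

    public static void main(String[] args) {
        System.out.println(codePointDistance("caf\u00e9", "cafe")); // 1
        System.out.println(utf8ByteDistance("caf\u00e9", "cafe"));  // 2
    }
}
```

The character é (U+00E9) is one code point but two UTF-8 bytes (0xC3 0xA9). Replacing it with "e" is therefore a single edit in code points but two edits in bytes (one substitution plus one deletion).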
  • Bug Fixes (16)
  • LUCENE-5116 : IndexWriter.addIndexes(IndexReader...) should drop empty (or all deleted) segments.
    (Robert Muir, Shai Erera)
  • LUCENE-5132 : Spatial RecursivePrefixTree Contains predicate will throw an NPE when there's no indexed data and maybe in other circumstances too.
    (David Smiley)
  • LUCENE-5146 : AnalyzingSuggester's sort comparator read part of the input key as the weight. As a result, the sorter never sorted by weight first: the weight is only compared when the inputs are equal, and in that case the malformed weights were identical as well.
    (Simon Willnauer)
  • LUCENE-5151 : Associations FacetsAggregators could enter an infinite loop when some result documents were missing category associations.
    (Shai Erera)
  • LUCENE-5152 : Fix MemoryPostingsFormat to not modify borrowed BytesRef from FSTEnum seek/lookup which can cause side effects if done on a cached FST root arc.
    (Simon Willnauer)
  • LUCENE-5160 : Handle the case where reading from a file or FileChannel returns -1, which could happen in rare cases where something happens to the file between the time we start the read loop (where we check the length) and when we actually do the read.
    (gsingers, yonik, Robert Muir, Uwe Schindler)
  • LUCENE-5166 : PostingsHighlighter would throw IOOBE if a term spanned the maxLength boundary, made it into the top-N and went to the formatter.
    (Manuel Amoabeng, Michael McCandless, Robert Muir)
  • LUCENE-4583 : Indexing core no longer enforces a limit on maximum length binary doc values fields, but individual codecs (including the default one) have their own limits
    (David Smiley, Robert Muir, Mike McCandless)
  • LUCENE-3849 : TokenStreams now set the position increment in end(), so we can handle trailing holes. If you have a custom TokenStream implementing end() then be sure it calls super.end().
    (Robert Muir, Mike McCandless)
  • LUCENE-5192 : IndexWriter could allow adding same field name with different DocValueTypes under some circumstances.
    (Shai Erera)
  • LUCENE-5191 : SimpleHTMLEncoder in Highlighter module broke Unicode outside BMP because it encoded UTF-16 chars instead of codepoints. The escaping of codepoints > 127 was removed (not needed for valid HTML) and missing escaping for ' and / was added.
    (Uwe Schindler)
  • LUCENE-5201 : Fixed compression bug in LZ4.compressHC when the input is highly compressible and the start offset of the array to compress is > 0.
    (Adrien Grand)
  • LUCENE-5221 : SimilarityBase did not write norms the same way as DefaultSimilarity if discountOverlaps == false and index-time boosts are present for the field.
    (Yubin Kim via Robert Muir)
  • LUCENE-5223 : Fixed IndexUpgrader command line parsing: -verbose is not required and -dir-impl option now works correctly.
    (hossman)
  • LUCENE-5245 : Fix MultiTermQuery's constant score rewrites to always return a ConstantScoreQuery to make scoring consistent. Previously it returned an empty unwrapped BooleanQuery, if no terms were available, which has a different query norm.
    (Nik Everett, Uwe Schindler)
  • LUCENE-5218 : In some cases, trying to retrieve or merge a 0-length binary doc value would hit an ArrayIndexOutOfBoundsException.
    (Littlestar via Mike McCandless)
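LUCENE-5191 shows why an escaper should either work on code points or touch only ASCII. The hypothetical encoder below escapes only the ASCII characters that are special in HTML, including ' and /. It copies everything else through unchanged, so a supplementary character such as U+1F600 (a surrogate pair in Java) survives intact. The entity spellings are assumptions for illustration, not necessarily the ones SimpleHTMLEncoder emits.

```java
// Hypothetical ASCII-only HTML escaper; an illustration, not Lucene's SimpleHTMLEncoder.
public class AsciiHtmlEscaper {

    public static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                case '/': out.append("&#x2F;"); break;
                // Every other char, including both halves of a surrogate pair, is copied
                // as-is; because no non-ASCII char is escaped, pairs are never split.
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```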
  • API Changes (13)
  • LUCENE-5094 : Add ramBytesUsed() to MultiDocValues.OrdinalMap.
    (Robert Muir)
  • LUCENE-5114 : Remove unused boolean useCache parameter from TermsEnum.seekCeil and .seekExact
    (Mike McCandless)
  • LUCENE-5128 : IndexSearcher.searchAfter throws IllegalArgumentException if searchAfter exceeds the number of documents in the reader.
    (Crocket via Shai Erera)
  • LUCENE-5129 : CategoryAssociationsContainer no longer supports null association values for categories. If you want to index categories without associations, you should add them using FacetFields.
    (Shai Erera)
  • LUCENE-4876 : IndexWriter no longer clones the given IndexWriterConfig. If you need to use the same config more than once, e.g. when sharing between multiple writers, make sure to clone it before passing to each writer.
    (Shai Erera, Mike McCandless)
  • LUCENE-5144 : StandardFacetsAccumulator renamed to OldFacetsAccumulator, and all associated classes were moved under o.a.l.facet.old. The intention is to remove it one day, once the features it covers (complements, partitions, sampling) have been migrated to the new FacetsAggregator and FacetsAccumulator API. Also, FacetRequest.createAggregator was replaced by OldFacetsAccumulator.createAggregator.
    (Shai Erera)
  • LUCENE-5149 : CommonTermsQuery now allows setting the minimum number of terms that should match for its high and low frequent sub-queries. Previously this was only supported on the low frequent terms query.
    (Simon Willnauer)
  • LUCENE-5156 : CompressingTermVectors TermsEnum no longer supports ord().
    (Robert Muir)
  • LUCENE-5161 , LUCENE-5164 : Fix default chunk sizes in FSDirectory to not be unnecessarily large (now 8192 bytes); also use chunking when writing to index files. FSDirectory#setReadChunkSize() is now deprecated and will be removed in Lucene 5.0.
    (Uwe Schindler, Robert Muir, gsingers)
  • LUCENE-5170 : Analyzer.ReuseStrategy instances are now stateless and can be reused in other Analyzer instances, which was not possible before. Lucene ships now with stateless singletons for per field and global reuse. Legacy code can still instantiate the deprecated implementation classes, but new code should use the constants. Implementors of custom strategies have to take care of new method signatures. AnalyzerWrapper can now be configured to use a custom strategy, too, ideally the one from the wrapped Analyzer. Analyzer adds a getter to retrieve the strategy for this use-case.
    (Uwe Schindler, Robert Muir, Shay Banon)
  • LUCENE-5173 : Lucene never writes segments with 0 documents anymore.
    (Shai Erera, Uwe Schindler, Robert Muir)
  • LUCENE-5178 : SortedDocValues always returns -1 ord when a document is missing a value for the field. Previously it only did this if the SortedDocValues was produced by uninversion on the FieldCache.
    (Robert Muir)
  • LUCENE-5183 : Removed BinaryDocValues.MISSING. To determine whether a document is missing a field, use getDocsWithField instead.
    (Robert Muir)
  • Changes in Runtime Behavior (2)
  • LUCENE-5178 : DocValues codec consumer APIs (iterables) return null values when the document has no value for the field.
    (Robert Muir)
  • LUCENE-5200 : The HighFreqTerms command-line tool returns the true top-N by totalTermFreq when using the -t option; it uses the term statistics (faster) and now always shows totalTermFreq in the output.
    (Robert Muir)
  • Optimizations (12)
  • LUCENE-5088 : Added TermFilter to filter docs by a specific term.
    (Martijn van Groningen)
  • LUCENE-5119 : DiskDV keeps the document-to-ordinal mapping on disk for SortedDocValues.
    (Robert Muir)
  • LUCENE-5145 : New AppendingPackedLongBuffer, a new variant of the former AppendingLongBuffer which assumes values are 0-based.
    (Boaz Leskes via Adrien Grand)
  • LUCENE-5145 : All Appending*Buffer now support bulk get.
    (Boaz Leskes via Adrien Grand)
  • LUCENE-5140 : Fixed a performance regression of span queries caused by LUCENE-4946 .
    (Alan Woodward, Adrien Grand)
  • LUCENE-5150 : Make WAH8DocIdSet able to inverse its encoding in order to compress dense sets efficiently as well.
    (Adrien Grand)
  • LUCENE-5159 : Prefix-code the sorted/sortedset value dictionaries in DiskDV.
    (Robert Muir)
  • LUCENE-5170 : Fixed several wrapper analyzers to inherit the reuse strategy of the wrapped Analyzer.
    (Uwe Schindler, Robert Muir, Shay Banon)
  • LUCENE-5006 : Simplified DocumentsWriter and DocumentsWriterPerThread synchronization and concurrent interaction with IndexWriter. DWPT is now only set up once and has no reset logic. All segment publishing and state transition from DWPT into IndexWriter is now done via an Event-Queue processed from within the IndexWriter, in order to prevent situations where DWPT or DW calling into IW caused deadlocks.
    (Simon Willnauer)
  • LUCENE-5182 : Terminate phrase searches early if max phrase window is exceeded in FastVectorHighlighter to prevent very long running phrase extraction if phrase terms are high-frequency.
    (Simon Willnauer)
  • LUCENE-5188 : CompressingStoredFieldsFormat now slices chunks containing big documents into fixed-size blocks so that requesting a single field does not necessarily force to decompress the whole chunk.
    (Adrien Grand)
  • LUCENE-5101 : CachingWrapperFilter makes it easier to plug in a custom cacheable DocIdSet implementation and uses WAH8DocIdSet by default, which should be more memory efficient than FixedBitSet on average as well as faster on small sets.
    (Robert Muir)
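LUCENE-5159 prefix-codes the sorted/sortedset value dictionaries in DiskDV. The general technique (front coding) stores each term of a sorted list as the length of the prefix it shares with the previous term, plus the remaining suffix. This is a generic sketch with hypothetical names that operates on chars for readability. It does not reproduce DiskDV's actual byte-level, on-disk layout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical front-coding sketch; not DiskDV's on-disk format.
public class FrontCoding {

    // One encoded entry: chars shared with the previous term, plus the new suffix.
    public static final class Entry {
        public final int prefixLength;
        public final String suffix;

        Entry(int prefixLength, String suffix) {
            this.prefixLength = prefixLength;
            this.suffix = suffix;
        }
    }

    public static List<Entry> encode(List<String> sortedTerms) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String term : sortedTerms) {
            int p = 0;
            int max = Math.min(prev.length(), term.length());
            while (p < max && prev.charAt(p) == term.charAt(p)) p++; // shared prefix length
            out.add(new Entry(p, term.substring(p)));
            prev = term;
        }
        return out;
    }

    public static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String term = prev.substring(0, e.prefixLength) + e.suffix;
            out.add(term);
            prev = term;
        }
        return out;
    }
}
```

Sorted dictionaries compress well this way because neighbouring terms tend to share long prefixes, for example "apple", "applet" and "apply".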
  • Documentation (2)
  • LUCENE-4894 : remove facet userguide as it was outdated. Partially absorbed into package's documentation and classes javadocs.
    (Shai Erera)
  • LUCENE-5206 : Clarify FuzzyQuery's unexpected behavior on short terms.
    (Tim Allison via Mike McCandless)
  • Changes in backwards compatibility policy (5)
  • LUCENE-5141 : CheckIndex.fixIndex(Status,Codec) is now CheckIndex.fixIndex(Status). If you used to pass a codec to this method, just remove it from the arguments.
    (Adrien Grand)
  • LUCENE-5089 , SOLR-5126 : Update to Morfologik 1.7.1. MorfologikAnalyzer and MorfologikFilter no longer support multiple "dictionaries" as there is only one dictionary available.
    (Dawid Weiss)
  • LUCENE-5170 : Changed method signatures of Analyzer.ReuseStrategy to take Analyzer. The Closeable interface was removed because the class was changed to be stateless.
    (Uwe Schindler, Robert Muir, Shay Banon)
  • LUCENE-5187 : SlowCompositeReaderWrapper constructor is now private, SlowCompositeReaderWrapper.wrap should be used instead.
    (Adrien Grand)
  • LUCENE-5101 : CachingWrapperFilter doesn't always return FixedBitSet instances anymore. Users of the join module can use oal.search.join.FixedBitSetCachingWrapperFilter instead.
    (Adrien Grand)
  • Build (2)
  • SOLR-5159 : Manifest includes non-parsed maven variables.
    (Artem Karpenko via Steve Rowe)
  • LUCENE-5193 : Add jar-src as top-level target to generate all Lucene and Solr *-src.jar.
    (Steve Rowe, Shai Erera)
  • Release 4.4.0 [2013-07-23]

  • Changes in backwards compatibility policy (18)
  • LUCENE-5085 : MorfologikFilter will no longer stem words marked as keywords
    (Dawid Weiss, Grzegorz Sobczyk)
  • LUCENE-4955 : NGramTokenFilter now emits all n-grams for the same token at the same position and preserves the position length and the offsets of the original token.
    (Simon Willnauer, Adrien Grand)
  • LUCENE-4955 : NGramTokenizer now emits n-grams in a different order (a, ab, b, bc, c) instead of (a, b, c, ab, bc) and doesn't trim trailing whitespace.
    (Adrien Grand)
  • LUCENE-5042 : The n-gram and edge n-gram tokenizers and filters now correctly handle supplementary characters, and the tokenizers have the ability to pre-tokenize the input stream similarly to CharTokenizer.
    (Adrien Grand)
  • LUCENE-4967 : NRTManager is replaced by ControlledRealTimeReopenThread, for controlling which requests must see which indexing changes, so that it can work with any ReferenceManager
    (Mike McCandless)
  • LUCENE-4973 : SnapshotDeletionPolicy no longer requires a unique String id
    (Mike McCandless, Shai Erera)
  • LUCENE-4946 : The internal sorting API (SorterTemplate, now Sorter) has been completely refactored to allow for a better implementation of TimSort.
    (Adrien Grand, Uwe Schindler, Dawid Weiss)
  • LUCENE-4963 : Some TokenFilter options that generate broken TokenStreams have been deprecated: updateOffsets=true on TrimFilter and enablePositionIncrements=false on all classes that inherit from FilteringTokenFilter: JapanesePartOfSpeechStopFilter, KeepWordFilter, LengthFilter, StopFilter and TypeTokenFilter.
    (Adrien Grand)
  • LUCENE-4963 : In order not to take position increments into account in suggesters, you now need to call setPreservePositionIncrements(false) instead of configuring the token filters to not increment positions.
    (Adrien Grand)
  • LUCENE-3907 : EdgeNGramTokenizer now supports maxGramSize > 1024, doesn't trim the input, sets position increment = 1 for all tokens and doesn't support backward grams anymore.
    (Adrien Grand)
  • LUCENE-3907 : EdgeNGramTokenFilter does not support backward grams and does not update offsets anymore.
    (Adrien Grand)
  • LUCENE-4981 : PositionFilter is now deprecated as it can corrupt token stream graphs. Since its main use case was to make query parsers generate boolean queries instead of phrase queries, it is now advised to use QueryParser.setAutoGeneratePhraseQueries(false) (for simple cases) or to override QueryParser.newFieldQuery.
    (Adrien Grand, Steve Rowe)
  • LUCENE-5018 : CompoundWordTokenFilterBase and its children DictionaryCompoundWordTokenFilter and HyphenationCompoundWordTokenFilter don't update offsets anymore.
    (Adrien Grand)
  • LUCENE-5015 : SamplingAccumulator no longer corrects the counts of the sampled categories. You should set TakmiSampleFixer on SamplingParams if required (but notice that this means slower search).
    (Rob Audenaerde, Gilad Barkai, Shai Erera)
  • LUCENE-4933 : Replace ExactSimScorer/SloppySimScorer with just SimScorer. Previously there were 2 implementations as a performance hack to support tableization of sqrt(), but this caching is removed, as sqrt is implemented in hardware on modern JVMs and it's faster not to cache.
    (Robert Muir)
  • LUCENE-5038 : MergePolicy now has a default implementation for useCompoundFile based on segment size and noCFSRatio. The default implementation was pulled up from TieredMergePolicy.
    (Simon Willnauer)
  • LUCENE-5063 : FieldCache.get(Bytes|Shorts), SortField.Type.(BYTE|SHORT) and FieldCache.DEFAULT_(BYTE|SHORT|INT|LONG|FLOAT|DOUBLE)_PARSER are now deprecated. These methods/types assume that data is stored as strings although Lucene has much better support for numeric data through (Int|Long)Field, NumericRangeQuery and FieldCache.get(Int|Long)s.
    (Adrien Grand)
  • LUCENE-5078 : TFIDFSimilarity lets you encode the norm value as any arbitrary long. As a result, encode/decodeNormValue were made abstract with their signatures changed. The default implementation was moved to DefaultSimilarity, which encodes the norm as a single-byte value.
    (Shai Erera)
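LUCENE-4955 changes NGramTokenizer to emit grams position by position (a, ab, b, bc, c) rather than grouped by length (a, b, c, ab, bc). A minimal plain-Java sketch of the new ordering, assuming min/max gram sizes of 1 and 2; `NGramOrder.grams` is a hypothetical helper, not Lucene's tokenizer API, and it works on UTF-16 chars, so it ignores the supplementary-character handling added by LUCENE-5042.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the position-major n-gram order.
class NGramOrder {
    // For each start offset, emit every gram length from minGram to maxGram
    // (shortest first) before moving to the next offset.
    static List<String> grams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int n = minGram; n <= maxGram && start + n <= text.length(); n++) {
                out.add(text.substring(start, start + n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(grams("abc", 1, 2)); // [a, ab, b, bc, c]
    }
}
```

Grouping all grams of one offset together is what lets them share a position, as described for NGramTokenFilter in the same issue.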
  • Bug Fixes (23)
  • LUCENE-4890 : QueryTreeBuilder.getBuilder() only finds interfaces on the most derived class.
    (Adriano Crestani)
  • LUCENE-4997 : Internal test framework's tests are sensitive to previous test failures and tests.failfast.
    (Dawid Weiss, Shai Erera)
  • LUCENE-4955 : NGramTokenizer now supports inputs larger than 1024 chars.
    (Adrien Grand)
  • LUCENE-4959 : Fix incorrect return value in SimpleNaiveBayesClassifier.assignClass.
    (Alexey Kutin via Adrien Grand)
  • LUCENE-4972 : DirectoryTaxonomyWriter created empty commits even if no changes were made.
    (Shai Erera, Michael McCandless)
  • LUCENE-949 : AnalyzingQueryParser can't work with leading wildcards.
    (Tim Allison, Robert Muir, Steve Rowe)
  • LUCENE-4980 : Fix issues preventing mixing of RangeFacetRequest and non-RangeFacetRequest when using DrillSideways.
    (Mike McCandless, Shai Erera)
  • LUCENE-4996 : Ensure DocInverterPerField always includes field name in exception messages.
    (Markus Jelsma via Robert Muir)
  • LUCENE-4992 : Fix constructor of CustomScoreQuery to take FunctionQuery for scoringQueries. Instead use QueryValueSource to safely wrap arbitrary queries and use them with CustomScoreQuery.
    (John Wang, Robert Muir)
  • LUCENE-5016 : SamplingAccumulator returned an inconsistent label if asked to aggregate a non-existent category. Also fixed a bug in RangeAccumulator if some readers did not have the requested numeric DV field.
    (Rob Audenaerde, Shai Erera)
  • LUCENE-5028 : Remove pointless and confusing doShare option in FST's PositiveIntOutputs
    (Han Jiang via Mike McCandless)
  • LUCENE-5032 : Fix IndexOutOfBoundsExc in PostingsHighlighter when multi-valued fields exceed maxLength
    (Tomás Fernández Löbbe via Mike McCandless)
  • LUCENE-4933 : SweetSpotSimilarity didn't apply its tf function to some queries (SloppyPhraseQuery, SpanQueries).
    (Robert Muir)
  • LUCENE-5033 : SlowFuzzyQuery was accepting too many terms (documents) when provided minSimilarity is an int > 1
    (Tim Allison via Mike McCandless)
  • LUCENE-5045 : DrillSideways.search did not work on an empty index.
    (Shai Erera)
  • LUCENE-4995 : CompressingStoredFieldsReader now only reuses an internal buffer when there is no more than 32kb to decompress. This prevents out-of-memory errors when working with large stored fields.
    (Adrien Grand)
  • LUCENE-5062 : If the spatial data for a document consisted of multiple overlapping or adjacent parts, then a CONTAINS predicate query might not match when the union of those shapes contains the query shape but none of them does individually. A flag was added to use the original faster algorithm.
    (David Smiley)
  • LUCENE-4971 : Fixed NPE in AnalyzingSuggester when there are too many graph expansions.
    (Alexey Kudinov via Mike McCandless)
  • LUCENE-5080 : Combined setMaxMergeCount and setMaxThreadCount into one setter in ConcurrentMergeScheduler: setMaxMergesAndThreads. Previously these setters would not work unless you invoked them very carefully.
    (Robert Muir, Shai Erera)
  • LUCENE-5068 : QueryParserUtil.escape() does not escape forward slash.
    (Matias Holte via Steve Rowe)
  • LUCENE-5103 : A join on a single-valued field with deleted docs scored too few docs.
    (David Smiley)
  • LUCENE-5090 : Detect mismatched readers passed to SortedSetDocValuesReaderState and SortedSetDocValuesAccumulator.
    (Robert Muir, Mike McCandless)
  • LUCENE-5120 : AnalyzingSuggester modified its FST's cached root arc if payloads are used and the entire output resided on the root arc on the first access. This caused subsequent suggest calls to fail.
    (Simon Willnauer)
  • Optimizations (7)
  • LUCENE-4936 : Improve numeric doc values compression in case all values share a common divisor. In particular, this improves the compression ratio of dates without time when they are encoded as milliseconds since Epoch. Also support TABLE compressed numerics in the Disk codec.
    (Robert Muir, Adrien Grand)
  • LUCENE-4951 : DrillSideways uses the new Scorer.cost() method to make better decisions about which scorer to use internally.
    (Mike McCandless)
  • LUCENE-4976 : PersistentSnapshotDeletionPolicy writes its state to a single snapshots_N file, and no longer requires closing
    (Mike McCandless, Shai Erera)
  • LUCENE-5035 : Compress addresses in FieldCacheImpl.SortedDocValuesImpl more efficiently.
    (Adrien Grand, Robert Muir)
  • LUCENE-4941 : Sort "from" terms only once when using JoinUtil.
    (Martijn van Groningen)
  • LUCENE-5050 : Close the stored fields and term vectors index files as soon as the index has been loaded into memory to save file descriptors.
    (Adrien Grand)
  • LUCENE-5086 : RamUsageEstimator now uses the official Java 7 API or a proprietary Oracle Java 6 API to get the HotSpot MX bean, preventing AWT classes from being loaded on Mac OS X.
    (Shay Banon, Dawid Weiss, Uwe Schindler)
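LUCENE-4936 improves numeric doc values compression when all values share a common divisor, such as dates without a time component stored as milliseconds since Epoch. The sketch below shows the general idea under stated assumptions: subtract the minimum, divide by the GCD of the deltas, and count the bits needed for the largest encoded value. `GcdEncoding` and its methods are hypothetical, not Lucene's codec code, and the sketch assumes `value - min` never overflows.

```java
// Hypothetical sketch of common-divisor encoding, not Lucene's implementation.
class GcdEncoding {
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return Math.abs(a);
    }

    static long min(long[] values) {
        long m = Long.MAX_VALUE;
        for (long v : values) m = Math.min(m, v);
        return m;
    }

    // GCD of all deltas from the minimum; 1 if every value is equal.
    static long commonDivisor(long[] values) {
        long m = min(values);
        long g = 0;
        for (long v : values) g = gcd(g, v - m);
        return g == 0 ? 1 : g;
    }

    // Each value becomes (value - min) / divisor.
    static long[] encode(long[] values) {
        long m = min(values);
        long g = commonDivisor(values);
        long[] out = new long[values.length];
        for (int i = 0; i < values.length; i++) out[i] = (values[i] - m) / g;
        return out;
    }

    static long decode(long encoded, long min, long divisor) {
        return min + encoded * divisor;
    }

    // Bits needed to store a non-negative value (at least 1).
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    public static void main(String[] args) {
        long day = 86_400_000L;    // one day in milliseconds
        long base = 15_706L * day; // an arbitrary midnight timestamp
        long[] dates = {base, base + day, base + 3 * day};
        long m = min(dates);
        long[] enc = encode(dates);
        long maxDelta = 0, maxEnc = 0;
        for (int i = 0; i < dates.length; i++) {
            maxDelta = Math.max(maxDelta, dates[i] - m);
            maxEnc = Math.max(maxEnc, enc[i]);
        }
        // prints: 28 bits without the divisor, 2 bits with it
        System.out.println(bitsRequired(maxDelta) + " bits without the divisor, "
            + bitsRequired(maxEnc) + " bits with it");
    }
}
```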
  • New Features (19)
  • LUCENE-5085 : MorfologikFilter will no longer stem words marked as keywords
    (Dawid Weiss, Grzegorz Sobczyk)
  • LUCENE-5064 : Added PagedMutable (internal), a paged extension of PackedInts.Mutable which allows for storing more than 2B values.
    (Adrien Grand)
  • LUCENE-4766 : Added a PatternCaptureGroupTokenFilter that uses Java regexes to emit multiple tokens, one for each capture group in one or more patterns.
    (Simon Willnauer, Clinton Gormley)
  • LUCENE-4952 : Expose control (protected method) in DrillSideways to force all sub-scorers to be on the same document being collected. This is necessary when using collectors like ToParentBlockJoinCollector with DrillSideways.
    (Mike McCandless)
  • SOLR-4761 : Add SimpleMergedSegmentWarmer, which just initializes terms, norms, docvalues, and so on.
    (Mark Miller, Mike McCandless, Robert Muir)
  • LUCENE-4964 : Allow arbitrary Query for per-dimension drill-down to DrillDownQuery and DrillSideways, to support future dynamic faceting methods
    (Mike McCandless)
  • LUCENE-4966 : Add CachingWrapperFilter.sizeInBytes()
    (Mike McCandless)
  • LUCENE-4965 : Add dynamic (no taxonomy index used) numeric range faceting to Lucene's facet module
    (Mike McCandless, Shai Erera)
  • LUCENE-4979 : LiveFieldValues can work with any ReferenceManager, not just ReferenceManager<IndexSearcher>
    (Mike McCandless)
  • LUCENE-4975 : Added a new Replicator module which can replicate index revisions between server and client.
    (Shai Erera, Mike McCandless)
  • LUCENE-5022 : Added FacetResult.mergeHierarchies to merge multiple FacetResult of the same dimension into a single one with the reconstructed hierarchy.
    (Shai Erera)
  • LUCENE-5026 : Added PagedGrowableWriter, a new internal packed-ints structure that grows the number of bits per value on demand, can store more than 2B values and supports random write and read access.
    (Adrien Grand)
  • LUCENE-5025 : FST's Builder can now handle more than 2.1 billion "tail nodes" while building a minimal FST.
    (Aaron Binns, Adrien Grand, Mike McCandless)
  • LUCENE-5063 : FieldCache.DEFAULT.get(Ints|Longs) now uses bit-packing to save memory.
    (Adrien Grand)
  • LUCENE-5079 : IndexWriter.hasUncommittedChanges() returns true if there are changes that have not been committed.
    (yonik, Mike McCandless, Uwe Schindler)
  • SOLR-4565 : Extend NorwegianLightStemFilter and NorwegianMinimalStemFilter to handle "nynorsk"
    (Erlend Garåsen, janhoy via Robert Muir)
  • LUCENE-5087 : Add getMultiValuedSeparator to PostingsHighlighter, for cases where you want a different logical separator between field values. This can be set to e.g. U+2029 PARAGRAPH SEPARATOR if you never want passages to span values.
    (Mike McCandless, Robert Muir)
  • LUCENE-5013 : Added ScandinavianFoldingFilterFactory and ScandinavianNormalizationFilterFactory
    (Karl Wettin via janhoy)
  • LUCENE-4845 : AnalyzingInfixSuggester finds suggestions based on matches to any tokens in the suggestion, not just based on pure prefix matching.
    (Mike McCandless, Robert Muir)
  • API Changes (3)
  • LUCENE-5077 : Make it easier to use compressed norms. Lucene42NormsFormat takes an overhead parameter, so you can easily pass a value other than PackedInts.FASTEST from your own codec.
    (Robert Muir)
  • LUCENE-5097 : Analyzer now has an additional tokenStream(String fieldName, String text) method, so wrapping by StringReader for common use is no longer needed. This method uses an internal reusable reader, which was previously only used by the Field class.
    (Uwe Schindler, Robert Muir)
  • LUCENE-4542 : HunspellStemFilter's maximum recursion level is now configurable.
    (Piotr, Rafał Kuć via Adrien Grand)
  • Build (4)
  • LUCENE-4987 : Upgrade randomized testing to version 2.0.10: Test framework may fail internally due to overly aggressive J9 optimizations.
    (Dawid Weiss, Shai Erera)
  • LUCENE-5043 : The eclipse target now uses the containing directory for the project name. This also enforces UTF-8 encoding when files are copied with filtering.
  • LUCENE-5055 : "rat-sources" target now checks also build.xml, ivy.xml, forbidden-api signatures, and parts of resources folders.
    (Ryan Ernst, Uwe Schindler)
  • LUCENE-5072 : Automatically patch javadocs generated by JDK versions before 7u25 to work around the frame injection vulnerability (CVE-2013-1571, VU#225657).
    (Uwe Schindler)
  • Tests (1)
  • LUCENE-4901 : TestIndexWriterOnJRECrash should work on any JRE vendor via Runtime.halt().
    (Mike McCandless, Robert Muir, Uwe Schindler, Rodrigo Trujillo, Dawid Weiss)
  • Changes in runtime behavior (2)
  • LUCENE-5038 : New segments written by IndexWriter are now wrapped into CFS by default. DocumentsWriterPerThread doesn't consult MergePolicy anymore to decide if a CFS must be written, instead IndexWriterConfig now has a property to enable / disable CFS for newly created segments.
    (Simon Willnauer)
  • LUCENE-5107 : Properties files written by Lucene now use UTF-8 encoding, and Unicode characters are no longer escaped. Reading legacy properties files with \u escapes is still possible.
    (Uwe Schindler, Robert Muir)
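The LUCENE-5107 behavior can be illustrated with the standard `java.util.Properties` API: `store(Writer, ...)` writes non-ASCII characters as-is, so pairing it with a UTF-8 writer yields a plain UTF-8 file, while `load(Reader)` still decodes `\u` escapes from legacy files. This is a minimal sketch of that pattern; `Utf8Properties` is a hypothetical class, and Lucene's own code for this change may be structured differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper: UTF-8 properties files without unicode escaping.
class Utf8Properties {
    // store(Writer, ...) leaves non-ASCII characters unescaped.
    static byte[] write(Properties props) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            props.store(w, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // load(Reader) still decodes backslash-u escapes, so legacy files keep working.
    static Properties read(byte[] data) {
        Properties props = new Properties();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            props.load(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("name", "caf\u00e9");
        String text = new String(write(p), StandardCharsets.UTF_8);
        System.out.println(text.contains("name=caf\u00e9")); // true: raw UTF-8, no escape
        Properties legacy = read("name=caf\\u00e9\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(legacy.getProperty("name").equals("caf\u00e9")); // true
    }
}
```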
  • Release 4.3.1 [2013-06-18]

  • Bug Fixes (12)
  • SOLR-4813 : Fix SynonymFilterFactory to allow init parameters for tokenizer factory used when parsing synonyms file.
    (Shingo Sasaki, hossman)
  • LUCENE-4935 : CustomScoreQuery wrongly applied its query boost twice (boost^2).
    (Robert Muir)
  • LUCENE-4948 : Fixed ArrayIndexOutOfBoundsException in PostingsHighlighter if you had a 64-bit JVM without compressed OOPS (IBM J9, or Oracle with a large heap or with compressed OOPS explicitly disabled).
    (Mike McCandless, Uwe Schindler, Robert Muir)
  • LUCENE-4953 : Fixed ParallelCompositeReader to inform ReaderClosedListeners of its synthetic subreaders. FieldCaches keyed on the atomic children will be purged earlier and FC insanity prevented. In addition, ParallelCompositeReader's toString() was changed to better reflect the reader structure.
    (Mike McCandless, Uwe Schindler)
  • LUCENE-4968 : Fixed ToParentBlockJoinQuery/Collector: correctly handle parent hits that had no child matches, don't throw IllegalArgumentEx when the child query has no hits, more aggressively catch cases where childQuery incorrectly matches parent documents
    (Mike McCandless)
  • LUCENE-4970 : Fix boost value of rewritten NGramPhraseQuery.
    (Shingo Sasaki via Adrien Grand)
  • LUCENE-4974 : CommitIndexTask was broken if no params were set.
    (Shai Erera)
  • LUCENE-4986 : Fixed case where a newly opened near-real-time reader fails to reflect a delete from IndexWriter.tryDeleteDocument
    (Reg, Mike McCandless)
  • LUCENE-4994 : Fix PatternKeywordMarkerFilter to have public constructor.
    (Uwe Schindler)
  • LUCENE-4993 : Fix BeiderMorseFilter to preserve custom attributes when inserting tokens with position increment 0.
    (Uwe Schindler)
  • LUCENE-4991 : Fix handling of synonyms in classic QueryParser.getFieldQuery for terms not separated by whitespace. PositionIncrementAttribute was ignored, so with default AND synonyms wrongly became mandatory clauses, and with OR, the coordination factor was wrong.
    (李威, Robert Muir)
  • LUCENE-5002 : IndexWriter#deleteAll() caused a deadlock in DWPT / DWSC if a DWPT was flushing concurrently while deleteAll() aborted all DWPTs. The IW should never wait on a DWPT via the flush control while holding on to the IW lock.
    (Simon Willnauer)
  • Optimizations (1)
  • LUCENE-4938 : Don't use an unnecessarily large priority queue in IndexSearcher methods that take top-N.
    (Uwe Schindler, Mike McCandless, Robert Muir)
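The idea behind LUCENE-4938 is that a top-N queue never needs to be larger than the number of documents that could possibly be returned, so a huge requested N should not allocate a huge heap. The sketch below applies that clamp using `java.util.PriorityQueue` as a stand-in for Lucene's own priority queue; `TopN`, `queueSize` and `topN` are hypothetical names, and scores are modeled as a plain `float[]` indexed by doc ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical top-N sketch with a queue sized to min(n, maxDoc).
class TopN {
    static int queueSize(int requestedN, int maxDoc) {
        // java.util.PriorityQueue requires an initial capacity of at least 1.
        return Math.max(1, Math.min(requestedN, maxDoc));
    }

    // Returns the doc IDs of the n highest scores, best first.
    static List<Integer> topN(float[] scores, int n) {
        if (n <= 0) return new ArrayList<>();
        int size = queueSize(n, scores.length);
        // Min-heap on score: the root is the weakest hit kept so far.
        PriorityQueue<Integer> pq =
            new PriorityQueue<Integer>(size, (a, b) -> Float.compare(scores[a], scores[b]));
        for (int doc = 0; doc < scores.length; doc++) {
            if (pq.size() < size) {
                pq.add(doc);
            } else if (scores[doc] > scores[pq.peek()]) {
                pq.poll();
                pq.add(doc);
            }
        }
        List<Integer> result = new ArrayList<>(pq);
        result.sort((a, b) -> Float.compare(scores[b], scores[a]));
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 2f, 1f};
        System.out.println(queueSize(1_000_000, scores.length)); // 3
        System.out.println(topN(scores, 1_000_000));             // [1, 2, 0]
    }
}
```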
  • Release 4.3.0 [2013-05-06]

  • Changes in backwards compatibility policy (8)
  • LUCENE-4810 : EdgeNGramTokenFilter no longer increments position for multiple ngrams derived from the same input token.
    (Walter Underwood via Mike McCandless)
  • LUCENE-4822 : KeywordMarkerFilter is now an abstract class. Subclasses need to implement #isKeyword() in order to mark terms as keywords. The existing functionality has been factored out into a new SetKeywordMarkerFilter class.
    (Simon Willnauer, Uwe Schindler)
  • LUCENE-4642 : Remove Tokenizer's and subclasses' ctors taking AttributeSource.
    (Renaud Delbru, Uwe Schindler, Steve Rowe)
  • LUCENE-4833 : IndexWriterConfig used to use LogByteSizeMergePolicy when calling setMergePolicy(null) although the default merge policy is TieredMergePolicy. IndexWriterConfig setters now throw an exception when passed null if null is not a valid value.
    (Adrien Grand)
  • LUCENE-4849 : Made ParallelTaxonomyArrays abstract with a concrete implementation for DirectoryTaxonomyWriter/Reader. Also moved it under o.a.l.facet.taxonomy.
    (Shai Erera)
  • LUCENE-4876 : IndexDeletionPolicy is now an abstract class instead of an interface. IndexDeletionPolicy, MergeScheduler and InfoStream now implement Cloneable.
    (Adrien Grand)
  • LUCENE-4874 : FilterAtomicReader and related classes (FilterTerms, FilterDocsEnum, ...) no longer forward to the filtered instance when the method has a default implementation in terms of other abstract methods.
    (Adrien Grand, Robert Muir)
  • LUCENE-4642 , LUCENE-4877 : Implementors of TokenizerFactory, TokenFilterFactory, and CharFilterFactory now need to provide at least one constructor taking Map<String,String> to be able to be loaded by the SPI framework (e.g., from Solr). In addition, TokenizerFactory needs to implement the abstract create(AttributeFactory,Reader) method.
    (Renaud Delbru, Uwe Schindler, Steve Rowe, Robert Muir)
  • API Changes (3)
  • LUCENE-4896 : Made PassageFormatter abstract in PostingsHighlighter, made members of DefaultPassageFormatter protected.
    (Luca Cavanna via Robert Muir)
  • LUCENE-4844 : Removed TaxonomyReader.getParent(); use TaxonomyReader.getParallelArrays().parents() instead.
    (Shai Erera)
  • LUCENE-4742 : Renamed spatial 'Node' to 'Cell', along with any method names and variables using this terminology.
    (David Smiley)
  • New Features (34)
  • LUCENE-4815 : DrillSideways now allows more than one FacetRequest per dimension
    (Mike McCandless)
  • LUCENE-3918 : IndexSorter has been ported to the 4.3 API and now supports sorting documents by a numeric DocValues field, or reversing the order of the documents in the index. Additionally, apps can implement their own sort criteria.
    (Anat Hashavit, Shai Erera)
  • LUCENE-4817 : Added KeywordRepeatFilter, which emits each token twice, once as a keyword and once as an ordinary token, allowing stemmers to emit a stemmed version along with the unstemmed version.
    (Simon Willnauer)
  • LUCENE-4822 : PatternKeywordMarkerFilter can mark tokens as keywords based on regular expressions.
    (Simon Willnauer, Uwe Schindler)
  • LUCENE-4821 : AnalyzingSuggester now uses the ending offset to determine whether the last token was finished or not, so that a query "i " will no longer suggest "Isla de Muerta" for example.
    (Mike McCandless)
  • LUCENE-4642 : Add create(AttributeFactory) to TokenizerFactory and subclasses with ctors taking AttributeFactory.
    (Renaud Delbru, Uwe Schindler, Steve Rowe)
  • LUCENE-4820 : Add payloads to Analyzing/FuzzySuggester, to record an arbitrary byte[] per suggestion
    (Mike McCandless)
  • LUCENE-4816 : Add WholeBreakIterator to PostingsHighlighter for treating the entire content as a single Passage.
    (Robert Muir, Mike McCandless)
  • LUCENE-4827 : Add an additional ctor to PostingsHighlighter's PassageScorer to provide the BM25 k1, b and avgdl parameters.
    (Robert Muir)
  • LUCENE-4607 : Add DocIdSetIterator.cost() and Spans.cost() for optimizing scoring.
    (Simon Willnauer, Robert Muir)
  • LUCENE-4795 : Add SortedSetDocValuesFacetFields and SortedSetDocValuesAccumulator, to compute topK facet counts from a field's SortedSetDocValues. This method only supports flat (dim/label) facets, is a bit (~25%) slower, has added cost per-IndexReader-open to compute its ordinal map, but it requires no taxonomy index and it tie-breaks facet labels in an understandable (by Unicode sort order) way.
    (Robert Muir, Mike McCandless)
  • LUCENE-4843 : Add LimitTokenPositionFilter: don't emit tokens with positions that exceed the configured limit.
    (Steve Rowe)
  • LUCENE-4832 : Add ToParentBlockJoinCollector.getTopGroupsWithAllChildDocs, to retrieve all children in each group.
    (Aleksey Aleev via Mike McCandless)
  • LUCENE-4846 : PostingsHighlighter subclasses can override where the String values come from (it still defaults to pulling from stored fields).
    (Robert Muir, Mike McCandless)
  • LUCENE-4853 : Add PostingsHighlighter.highlightFields method that takes int[] docIDs instead of TopDocs.
    (Robert Muir, Mike McCandless)
  • LUCENE-4856 : If there are no matches for a given field, return the first maxPassages sentences
    (Robert Muir, Mike McCandless)
  • LUCENE-4859 : IndexReader now exposes Terms statistics: getDocCount, getSumDocFreq, getSumTotalTermFreq.
    (Shai Erera)
  • LUCENE-4862 : It is now possible to terminate collection of a single IndexReader leaf by throwing a CollectionTerminatedException in Collector.collect.
    (Adrien Grand, Shai Erera)
  • LUCENE-4752 : New SortingMergePolicy (in lucene/misc) that sorts documents before merging segments.
    (Adrien Grand, Shai Erera, David Smiley)
  • LUCENE-4860 : Customize scoring and formatting per-field in PostingsHighlighter by subclassing and overriding the getFormatter and/or getScorer methods. This also changes Passage.getMatchTerms() to return BytesRef[] instead of Term[].
    (Robert Muir, Mike McCandless)
  • LUCENE-4839 : Added SorterTemplate.timSort, an O(n log n) stable sort algorithm that performs well on partially sorted data.
    (Adrien Grand)
  • LUCENE-4644 : Added support for the "IsWithin" spatial predicate for RecursivePrefixTreeStrategy. It's for matching non-point indexed shapes; if you only have points (1/doc) then "Intersects" is equivalent and faster. See the javadocs.
    (David Smiley)
  • LUCENE-4861 : Make BreakIterator per-field in PostingsHighlighter. This means you can override getBreakIterator(String field) to use different mechanisms for e.g. title vs. body fields.
    (Mike McCandless, Robert Muir)
  • LUCENE-4645 : Added support for the "Contains" spatial predicate for RecursivePrefixTreeStrategy.
    (David Smiley)
  • LUCENE-4898 : DirectoryReader.openIfChanged now allows opening a reader on an IndexCommit starting from a near-real-time reader (previously this would throw IllegalArgumentException).
    (Mike McCandless)
  • LUCENE-4905 : Made the maxPassages parameter per-field in PostingsHighlighter.
    (Robert Muir)
  • LUCENE-4897 : Added TaxonomyReader.getChildren for traversing a category's children.
    (Shai Erera)
  • LUCENE-4902 : Added FilterDirectoryReader to allow easy filtering of a DirectoryReader's subreaders.
    (Alan Woodward, Adrien Grand, Uwe Schindler)
  • LUCENE-4858 : Added EarlyTerminatingSortingCollector, to be used in conjunction with SortingMergePolicy, which allows queries on sorted indexes to terminate early when the sort order matches the index order.
    (Adrien Grand, Shai Erera)
  • LUCENE-4904 : Added descending sort order to NumericDocValuesSorter.
    (Shai Erera)
  • LUCENE-3786 : Added SearcherTaxonomyManager, to manage access to both IndexSearcher and DirectoryTaxonomyReader for near-real-time faceting.
    (Shai Erera, Mike McCandless)
  • LUCENE-4915 : DrillSideways now allows drilling down on fields that are not faceted.
    (Mike McCandless)
  • LUCENE-4895 : Added support for the "IsDisjointTo" spatial predicate for RecursivePrefixTreeStrategy.
    (David Smiley)
  • LUCENE-4774 : Added FieldComparator that allows sorting parent documents based on fields on the child / nested document level.
    (Martijn van Groningen)
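LUCENE-4862 lets a collector stop collecting a single leaf by throwing CollectionTerminatedException from Collector.collect, after which collection continues with the next leaf. The self-contained model below shows that control flow only. Everything in it is hypothetical: the nested `CollectionTerminatedException` is a local stand-in for Lucene's class, `LeafCollector` is an illustrative interface, and leaves are modeled as `int[]` arrays of doc IDs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of per-leaf early termination; not Lucene's classes.
class EarlyTermination {
    // Local stand-in for the exception a collector throws to abandon a leaf.
    static class CollectionTerminatedException extends RuntimeException {}

    interface LeafCollector {
        void collect(int doc);
    }

    // Collects at most 'limit' docs per leaf, then terminates that leaf only.
    static List<String> search(int[][] leaves, int limit) {
        List<String> collected = new ArrayList<>();
        for (int leaf = 0; leaf < leaves.length; leaf++) {
            final int leafOrd = leaf;
            final int[] count = {0};
            LeafCollector collector = doc -> {
                if (count[0] == limit) throw new CollectionTerminatedException();
                count[0]++;
                collected.add(leafOrd + ":" + doc);
            };
            try {
                for (int doc : leaves[leaf]) collector.collect(doc);
            } catch (CollectionTerminatedException e) {
                // Only this leaf is abandoned; the loop continues with the next one.
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        int[][] leaves = {{0, 1, 2}, {0, 1}};
        System.out.println(search(leaves, 2)); // [0:0, 0:1, 1:0, 1:1]
    }
}
```

A sorted-index collector like the EarlyTerminatingSortingCollector above would throw once it has seen enough top documents in a leaf, since the remaining documents in that leaf cannot rank higher.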
  • Optimizations (7)
  • LUCENE-4839 : SorterTemplate.merge can now be overridden in order to replace the default implementation which merges in-place by a faster implementation that could require fewer swaps at the expense of some extra memory. ArrayUtil and CollectionUtil override it so that their mergeSort and timSort methods are faster but only require up to 1% of extra memory.
    (Adrien Grand)
  • LUCENE-4571 : Speed up BooleanQuerys with minNrShouldMatch to use skipping.
    (Stefan Pohl via Robert Muir)
  • LUCENE-4863 : StemmerOverrideFilter now uses an FST to represent its overrides in memory.
    (Simon Willnauer)
  • LUCENE-4889 : UnicodeUtil.codePointCount implementation replaced with a non-array-lookup version.
    (Dawid Weiss)
  • LUCENE-4923 : Speed up BooleanQuery's processing of in-order disjunctions.
    (Robert Muir)
  • LUCENE-4926 : Speed up DisjunctionMaxQuery.
    (Robert Muir)
  • LUCENE-4930 : Reduce contention in older/buggy JVMs when using AttributeSource#addAttribute() because java.lang.ref.ReferenceQueue#poll() is implemented using synchronization.
    (Christian Ziech, Karl Wright, Uwe Schindler)
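LUCENE-4889 replaced UnicodeUtil.codePointCount with a version that does not use an array lookup. One well-known table-free way to count code points in well-formed UTF-8 is to count every byte that is not a continuation byte (bit pattern 10xxxxxx). The sketch below illustrates that technique; `Utf8Count` is a hypothetical class and this is not necessarily how Lucene's method is written.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical table-free UTF-8 code point counter; assumes well-formed input.
class Utf8Count {
    static int codePointCount(byte[] utf8, int offset, int length) {
        int count = 0;
        for (int i = offset; i < offset + length; i++) {
            // Continuation bytes look like 10xxxxxx; every other byte starts a code point.
            if ((utf8[i] & 0xC0) != 0x80) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a', e-acute, euro sign and an emoji: 1 + 2 + 3 + 4 bytes
        byte[] bytes = "a\u00e9\u20ac\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);                            // 10
        System.out.println(codePointCount(bytes, 0, bytes.length)); // 4
    }
}
```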
  • Bug Fixes (18)
  • LUCENE-4868 : SumScoreFacetsAggregator used an incorrect index into the scores array.
    (Shai Erera)
  • LUCENE-4882 : FacetsAccumulator did not allow counting the ROOT category (i.e. counting dimensions).
    (Shai Erera)
  • LUCENE-4876 : IndexWriterConfig.clone() now clones its MergeScheduler, IndexDeletionPolicy and InfoStream in order to make an IndexWriterConfig and its clone fully independent.
    (Adrien Grand)
  • LUCENE-4893 : Facet counts were multiplied by the number of times FacetsCollector.getFacetResults() was called.
    (Shai Erera)
  • LUCENE-4888 : Fixed SloppyPhraseScorer, MultiDocs(AndPositions)Enum and MultiSpansWrapper which happened to sometimes call DocIdSetIterator.advance with target<=current (in this case the behavior of advance is undefined).
    (Adrien Grand)
  • LUCENE-4899 : FastVectorHighlighter failed with StringIndexOutOfBoundsException if a single highlight phrase or term was greater than the fragCharSize producing negative string offsets.
    (Simon Willnauer)
  • LUCENE-4877 : Throw exception for invalid arguments in analysis factories.
    (Steve Rowe, Uwe Schindler, Robert Muir)
  • LUCENE-4914 : SpatialPrefixTree's Node/Cell.reset() forgot to reset the 'leaf' flag. It affects SpatialRecursivePrefixTreeStrategy on non-point indexed shapes, as of Lucene 4.2.
    (David Smiley)
  • LUCENE-4913 : FacetResultNode.ordinal was always 0 when all children are returned.
    (Mike McCandless)
  • LUCENE-4918 : Highlighter closes the given IndexReader if QueryScorer is used with an external IndexReader.
    (Simon Willnauer, Sirvan Yahyaei)
  • LUCENE-4880 : Fix MemoryIndex to consume empty terms from the tokenstream consistent with IndexWriter. Previously it discarded them.
    (Timothy Allison via Robert Muir)
  • LUCENE-4885 : FacetsAccumulator did not set the correct value for FacetResult.numValidDescendants.
    (Mike McCandless, Shai Erera)
  • LUCENE-4925 : Fixed IndexSearcher.search when the argument list contains a Sort and one of the sort fields is the relevance score. Only IndexSearchers created with an ExecutorService are concerned.
    (Adrien Grand)
  • LUCENE-4738 , LUCENE-2727 , LUCENE-2812 : Simplified DirectoryReader.indexExists so that it's more robust to transient IOExceptions (e.g. due to issues like file descriptor exhaustion), but this will also cause it to err towards returning true, for example if the directory contains a corrupted index or an incomplete initial commit. In addition, IndexWriter with OpenMode.CREATE will now succeed even if the directory contains a corrupted index.
    (Billow Gao, Robert Muir, Mike McCandless)
  • LUCENE-4928 : Stored fields and term vectors could become super slow in case of tiny documents (a few bytes). This is especially problematic when switching codecs since bulk-merge strategies can't be applied and the same chunk of documents can end up being decompressed thousands of times. A hard limit on the number of documents per chunk has been added to fix this issue.
    (Robert Muir, Adrien Grand)
  • LUCENE-4934 : Fix minor equals/hashcode problems in facet/DrillDownQuery, BoostingQuery, MoreLikeThisQuery, FuzzyLikeThisQuery, and block join queries.
    (Robert Muir, Uwe Schindler)
  • LUCENE-4504 : Fix broken sort comparator in ValueSource.getSortField, used when sorting by a function query.
    (Tom Shally via Robert Muir)
  • LUCENE-4937 : Fix incorrect sorting of float/double values (+/-0, NaN).
    (Robert Muir, Uwe Schindler)
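The float/double sorting problem fixed by LUCENE-4937 comes from comparing values with `<` and `>`: that treats -0.0 and +0.0 as equal and makes NaN "equal" to every value, which is not a consistent ordering. `Double.compare` (and `Float.compare`) define a total order instead, with -0.0 before +0.0 and NaN after positive infinity. The snippet demonstrates the difference; `DoubleOrder.naiveCompare` is a hypothetical example, and this does not claim to reproduce Lucene's exact fix.

```java
// Contrast a naive comparator with Double.compare's total order.
class DoubleOrder {
    // Naive comparator: -0.0 == 0.0, and NaN compares "equal" to everything.
    static int naiveCompare(double a, double b) {
        return a < b ? -1 : (a > b ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println(naiveCompare(-0.0, 0.0));                              // 0
        System.out.println(Double.compare(-0.0, 0.0));                            // -1
        System.out.println(naiveCompare(Double.NaN, 1.0));                        // 0
        System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY)); // 1
    }
}
```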
  • Documentation (1)
  • LUCENE-4841 : Added example SimpleSortedSetFacetsExample to show how to use the new SortedSetDocValues backed facet implementation.
    (Shai Erera, Mike McCandless)
  • Build (1)
  • LUCENE-4879 : Upgrade randomized testing to version 2.0.9: Filter stack traces on console output.
    (Dawid Weiss, Robert Muir)
  • Release 4.2.1 [2013-04-03]

  • Bug Fixes (9)
  • LUCENE-4713 : The SPI components used to load custom codecs or analysis components were fixed to also scan the Lucene ClassLoader in addition to the context ClassLoader, so Lucene is always able to find its own codecs. The special case of a null context ClassLoader is now also supported.
    (Christian Kohlschütter, Uwe Schindler)
  • LUCENE-4819 : seekExact(BytesRef, boolean) did not work correctly with Sorted[Set]DocValuesTermsEnum.
    (Robert Muir)
  • LUCENE-4826 : PostingsHighlighter was not returning the top N best scoring passages.
    (Robert Muir, Mike McCandless)
  • LUCENE-4854 : Fix DocTermOrds.getOrdTermsEnum() to not return negative ord on initial next().
    (Robert Muir)
  • LUCENE-4836 : Fix SimpleRateLimiter#pause to return the actual time spent sleeping instead of the wakeup timestamp in nanoseconds.
    (Simon Willnauer)
  • LUCENE-4828 : BooleanQuery no longer extracts terms from its MUST_NOT clauses.
    (Mike McCandless)
  • SOLR-4589 : Fixed CPU spikes and poor performance in lazy field loading of multivalued fields.
    (hossman)
  • LUCENE-4870 : Fix bug where an entire index might be deleted by IndexWriter due to incorrectly detecting whether an index exists in the directory when OpenMode.CREATE_OR_APPEND is used. This might also affect applications that set the open mode manually using DirectoryReader#indexExists.
    (Simon Willnauer)
  • LUCENE-4878 : Override getRegexpQuery in MultiFieldQueryParser to prevent NullPointerException when regular expression syntax is used with MultiFieldQueryParser.
    (Simon Willnauer, Adam Rauch)
  • Optimizations (3)
  • LUCENE-4819 : Added Sorted[Set]DocValues.termsEnum(), and optimized the default codec for improved enumeration performance.
    (Robert Muir)
  • LUCENE-4854 : Speed up TermsEnum of FieldCache.getDocTermOrds.
    (Robert Muir)
  • LUCENE-4857 : Don't unnecessarily copy stem override map in StemmerOverrideFilter.
    (Simon Willnauer)
  • Release 4.2.0 [2013-03-11]

  • Changes in backwards compatibility policy (12)
  • LUCENE-4602 : FacetFields now stores facet ordinals in a DocValues field rather than a payload. Existing indexes must therefore be rebuilt, or migrated once using FacetsPayloadMigratingReader. Since DocValues support in-memory caching, CategoryListCache was removed too.
    (Shai Erera, Michael McCandless)
  • LUCENE-4697 : FacetResultNode is now a concrete class with public members (instead of getter methods).
    (Shai Erera)
  • LUCENE-4600 : FacetsCollector is now an abstract class with two implementations: StandardFacetsCollector (the old version of FacetsCollector) and CountingFacetsCollector. FacetsCollector.create() returns the most optimized collector for the given parameters.
    (Shai Erera, Michael McCandless)
  • LUCENE-4700 : OrdinalPolicy is now per CategoryListParams, and is no longer an interface but an enum with values NO_PARENTS and ALL_PARENTS. PathPolicy was removed; you should extend FacetFields and DrillDownStream to control which categories are added as drill-down terms.
    (Shai Erera)
  • LUCENE-4547 : DocValues improvements. Simplified codec API: codecs are now only responsible for encoding and decoding docvalues; they do not need to do buffering or RAM accounting. Per-field support: added PerFieldDocValuesFormat, which allows you to use a different DocValuesFormat per field (like postings). Unified with the FieldCache API: DocValues can be accessed via the FieldCache API, so they work automatically with grouping/join/sort/function queries, etc. Simplified types: there are only 3 types (NUMERIC, BINARY, SORTED), so it is not necessary to specify, for example, that all of your binary values have the same length. Instead, the Codec API can easily optimize encoding based on any properties of the content.
    (Simon Willnauer, Adrien Grand, Mike McCandless, Robert Muir)
  • LUCENE-4757 : Cleanup and refactoring of the FacetsAccumulator, FacetRequest, FacetsAggregator and FacetResultsHandler API. If your application called FacetsCollector.create(), you should not be affected. If you wrote an Aggregator, you should migrate it to the per-segment FacetsAggregator. You can still use StandardFacetsAccumulator, which works with the old API (for now).
    (Shai Erera)
  • LUCENE-4761 : Facet packages reorganized. Fixing your import statements should be easy if you use an IDE such as Eclipse.
    (Shai Erera)
  • LUCENE-4750 : Convert DrillDown to DrillDownQuery, so you can initialize it and add drill-down categories to it.
    (Michael McCandless, Shai Erera)
  • LUCENE-4759 : Removed FacetRequest.SortBy; result categories are always sorted by value, with ties broken by category ordinal.
    (Shai Erera)
  • LUCENE-4772 : Facet associations moved to new FacetsAggregator API. You should override FacetsAccumulator and return the relevant aggregator, for aggregating the association values.
    (Shai Erera)
  • LUCENE-4748 : A FacetRequest on a non-existent field now returns an empty FacetResult instead of skipping it.
    (Shai Erera, Mike McCandless)
  • LUCENE-4806 : The default category delimiter character was changed from U+F749 to U+001F, since the latter takes 1 byte in UTF-8 versus 3 bytes for the former. Existing facet indices must be reindexed.
    (Robert Muir, Shai Erera, Mike McCandless)
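The size difference cited in LUCENE-4806 follows from UTF-8 itself. A code point up to U+007F encodes as 1 byte, while a code point in U+0800..U+FFFF (outside the surrogate range) encodes as 3 bytes. The Java sketch below checks this and shows the effect on one joined category path. The class `DelimiterSize` and its methods are hypothetical. The sketch also assumes, for illustration only, that path components are joined with the delimiter and stored as UTF-8; these entries do not describe Lucene's exact serialization.

```java
import java.nio.charset.StandardCharsets;

/** Compares the UTF-8 size of the old and new default facet delimiters (illustrative sketch). */
final class DelimiterSize {
    private DelimiterSize() {}

    static final char OLD_DELIMITER = (char) 0xF749; // private-use-area character, previous default
    static final char NEW_DELIMITER = (char) 0x001F; // ASCII "unit separator" control character

    /** Number of bytes the character occupies when encoded as UTF-8. */
    static int utf8Length(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8).length;
    }

    /** UTF-8 length of the components joined by the delimiter (hypothetical path encoding). */
    static int encodedPathLength(char delimiter, String... components) {
        return String.join(String.valueOf(delimiter), components)
                .getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For example, the path `Author`, `Bob` takes 12 bytes with the old delimiter and 10 bytes with the new one. The saving grows by 2 bytes with each additional path level.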
  • Optimizations (11)
  • LUCENE-4687 : BloomFilterPostingsFormat now lazily initializes delegate TermsEnum only if needed to do a seek or get a DocsEnum.
    (Simon Willnauer)
  • LUCENE-4677 , LUCENE-4682 : unpacked FSTs now use vInt to encode the node target, to reduce their size
    (Mike McCandless)
  • LUCENE-4678 : FST now uses a paged byte[] structure instead of a single byte[] internally, to avoid large memory spikes during building
    (James Dyer, Mike McCandless)
  • LUCENE-3298 : FST can now be larger than 2.1 GB / 2.1 B nodes.
    (James Dyer, Mike McCandless)
  • LUCENE-4690 : Performance improvements and non-hashing versions of NumericUtils.*ToPrefixCoded()
    (yonik)
  • LUCENE-4715 : CategoryListParams.getOrdinalPolicy can now return a different OrdinalPolicy per dimension, to better tune how you index facets. Also added OrdinalPolicy.ALL_BUT_DIMENSION.
    (Shai Erera, Michael McCandless)
  • LUCENE-4740 : Don't track clones of MMapIndexInput if unmapping is disabled. This reduces GC overhead.
    (Kristofer Karlsson, Uwe Schindler)
  • LUCENE-4733 : The default Lucene 4.2 codec now uses a more compact TermVectorsFormat (Lucene42TermVectorsFormat) based on CompressingTermVectorsFormat.
    (Adrien Grand)
  • LUCENE-3729 : The default Lucene 4.2 codec now uses a more compact DocValuesFormat (Lucene42DocValuesFormat). Sorted values are stored in an FST, Numerics and Ordinals use a number of strategies (delta-compression, table-compression, etc), and memory addresses use MonotonicBlockPackedWriter.
    (Simon Willnauer, Adrien Grand, Mike McCandless, Robert Muir)
  • LUCENE-4792 : Reduction of the memory required to build the doc ID maps used when merging segments.
    (Adrien Grand)
  • LUCENE-4794 : Spatial RecursivePrefixTreeStrategy's search filter: Skip calls to termsEnum.seek() when the next term is known to follow the current cell.
    (David Smiley)
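LUCENE-4678 and LUCENE-3298 above rest on the same idea. A single Java `byte[]` is indexed by `int`, so it tops out near 2^31 bytes, and growing it means allocating and copying a larger array. The Java sketch below keeps bytes in fixed-size pages and addresses them with a `long`. Each position is split into a page index (the high bits) and an in-page offset (the low bits). The class name `PagedByteStore` and the `pageBits` parameter are hypothetical. These entries do not describe Lucene's actual internal structure or page size.

```java
import java.util.ArrayList;
import java.util.List;

/** Byte store split into fixed-size pages and addressed by long positions (illustrative sketch). */
final class PagedByteStore {
    private final int pageBits;   // page size is 2^pageBits bytes
    private final int pageSize;
    private final long pageMask;  // selects the in-page offset bits
    private final List<byte[]> pages = new ArrayList<>();
    private long size;            // total number of bytes written so far

    PagedByteStore(int pageBits) {
        if (pageBits < 1 || pageBits > 30) {
            throw new IllegalArgumentException("pageBits must be in [1, 30]");
        }
        this.pageBits = pageBits;
        this.pageSize = 1 << pageBits;
        this.pageMask = pageSize - 1L;
    }

    /** Appends one byte; a new page is allocated only when the current one is full. */
    void writeByte(byte b) {
        int offset = (int) (size & pageMask);
        if (offset == 0) {
            pages.add(new byte[pageSize]); // existing pages are never copied or resized
        }
        pages.get(pages.size() - 1)[offset] = b;
        size++;
    }

    /** Reads the byte at a long position: high bits select the page, low bits the offset. */
    byte readByte(long pos) {
        if (pos < 0 || pos >= size) {
            throw new IndexOutOfBoundsException("position " + pos);
        }
        int page = (int) (pos >>> pageBits);
        int offset = (int) (pos & pageMask);
        return pages.get(page)[offset];
    }

    long size() {
        return size;
    }

    int pageCount() {
        return pages.size();
    }
}
```

Growth costs one page allocation rather than a full copy. This avoids a memory spike during building, because the old and new arrays never have to coexist.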
  • New Features (16)
  • LUCENE-4686 : New specialized DGapVInt8IntEncoder for facets (now the default).
    (Shai Erera)
  • LUCENE-4703 : Add simple PrintTaxonomyStats tool to see summary information about the facets taxonomy index.
    (Mike McCandless)
  • LUCENE-4599 : New oal.codecs.compressing.CompressingTermVectorsFormat which compresses term vectors into chunks of documents similarly to CompressingStoredFieldsFormat.
    (Adrien Grand)
  • LUCENE-4695 : Added LiveFieldValues utility class, for getting the current (live, real-time) value for any indexed doc/field. The class buffers recently indexed doc/field values until a new near-real-time reader is opened that contains those changes.
    (Robert Muir, Mike McCandless)
  • LUCENE-4723 : Add AnalyzerFactoryTask to benchmark, and enable analyzer creation via the resulting factories using NewAnalyzerTask.
    (Steve Rowe)
  • LUCENE-4728 : Unknown and not explicitly mapped queries are now rewritten against the highlighting IndexReader to obtain primitive queries before discarding the query entirely. WeightedSpanTermExtractor now builds a MemoryIndex only once even if multiple fields are highlighted.
    (Simon Willnauer)
  • LUCENE-4035 : Added ICUCollationDocValuesField, more efficient support for Locale-sensitive sort and range queries for single-valued fields.
    (Robert Muir)
  • LUCENE-4547 : Added MonotonicBlockPacked(Reader/Writer), which provide efficient random access to large amounts of monotonically increasing positive values (e.g. file offsets). Each block stores the minimum value and the average gap, and values are encoded as signed deviations from the expected value.
    (Adrien Grand)
  • LUCENE-4547 : Added AppendingLongBuffer, an append-only buffer that packs signed long values in memory and provides an efficient iterator API.
    (Adrien Grand)
  • LUCENE-4540 : It is now possible for a codec to represent norms with fewer than 8 bits per value. For performance reasons this is not done by default, but you can customize your codec (e.g. pass PackedInts.DEFAULT to Lucene42DocValuesConsumer) if you want to make this tradeoff.
    (Adrien Grand, Robert Muir)
  • LUCENE-4764 : A new Facet42Codec and Facet42DocValuesFormat provide faster facet performance at the cost of more RAM.
    (Shai Erera, Mike McCandless)
  • LUCENE-4769 : Added OrdinalsCache and CachedOrdsCountingFacetsAggregator, which uses the cache to obtain a document's ordinals. This aggregator is faster than the others but consumes much more RAM.
    (Michael McCandless, Shai Erera)
  • LUCENE-4778 : Add a getter for the delegate in RateLimitedDirectoryWrapper.
    (Mark Miller)
  • LUCENE-4765 : Add a multi-valued docvalues type (SORTED_SET). This is equivalent to building a FieldCache.getDocTermOrds at index-time.
    (Robert Muir)
  • LUCENE-4780 : Add MonotonicAppendingLongBuffer: an append-only buffer for monotonically increasing values.
    (Adrien Grand)
  • LUCENE-4748 : Added DrillSideways utility class for computing both drill-down and drill-sideways counts for a DrillDownQuery.
    (Mike McCandless)
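The LUCENE-4547 MonotonicBlockPacked entry above describes its encoding in one sentence. The Java sketch below applies that idea to a single block. It assumes the expected value of the i-th entry is `min + avgGap * i` and stores each deviation from that estimate. The class `MonotonicBlock` is hypothetical. So is the zig-zag step, which is one common way to map small signed deviations to small unsigned values. The deviations are kept in a plain `long[]`, and the sketch only reports how many bits packing them would need; it does not reproduce Lucene's on-disk format.

```java
/** One block of monotonic values encoded as deviations from a linear estimate (illustrative sketch). */
final class MonotonicBlock {
    final long min;          // first (smallest) value in the block
    final float avgGap;      // average increment between consecutive values
    final long[] deviations; // value - expected(i); a real encoder would bit-pack these

    MonotonicBlock(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty block");
        }
        min = values[0];
        avgGap = values.length == 1 ? 0f
                : (float) (values[values.length - 1] - values[0]) / (values.length - 1);
        deviations = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            deviations[i] = values[i] - expected(i);
        }
    }

    /** Linear estimate of the i-th value from the block header (min, avgGap). */
    private long expected(int i) {
        return min + (long) (avgGap * i);
    }

    /** Random access: the estimate plus the stored deviation reproduces the original value. */
    long get(int i) {
        return expected(i) + deviations[i];
    }

    /** Bits per value needed to store the zig-zag encoded deviations. */
    int bitsPerValue() {
        long orOfAll = 0;
        for (long d : deviations) {
            orOfAll |= (d << 1) ^ (d >> 63); // zig-zag: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(orOfAll));
    }
}
```

Take offsets `{100, 205, 310, 398, 512, 600}` as an example. The average gap is 100 and the deviations are 0, 5, 10, -2, 12, 0, which fit in 5 bits each after zig-zag encoding. Storing the raw maximum value, 600, would need 10 bits.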
  • API Changes (4)
  • LUCENE-4709 : FacetResultNode no longer has a residue field.
    (Shai Erera)
  • LUCENE-4716 : DrillDown.query now takes Occur, allowing you to specify whether categories should be OR'ed or AND'ed.
    (Shai Erera)
  • LUCENE-4695 : ReferenceManager.RefreshListener.afterRefresh now takes a boolean indicating whether a new reference was in fact opened, and a new beforeRefresh method notifies you when a refresh attempt is starting.
    (Robert Muir, Mike McCandless)
  • LUCENE-4794 : Spatial RecursivePrefixTreeFilter replaced by IntersectsPrefixTreeFilter and some extensible base classes.
    (David Smiley)
  • Bug Fixes (17)
  • LUCENE-4705 : Pass on FilterStrategy in FilteredQuery if the filtered query is rewritten.
    (Simon Willnauer)
  • LUCENE-4712 : MemoryIndex#normValues() threw an NPE if the field did not exist.
    (Simon Willnauer, Ricky Pritchett)
  • LUCENE-4550 : Shapes wider than 180 degrees would use too much accuracy for the PrefixTree based SpatialStrategy. For a pathological case of nearly 360 degrees and barely any height, it would generate so many indexed terms (> 500k) that it could even cause an OutOfMemoryError. Fixed.
    (David Smiley)
  • LUCENE-4704 : Make join queries override hashcode and equals methods.
    (Martijn van Groningen)
  • LUCENE-4724 : Fix bug in CategoryPath which allowed passing null or empty string components. This is forbidden now (throws an exception). Note that if you have a taxonomy index created with such strings, you should rebuild it.
    (Michael McCandless, Shai Erera)
  • LUCENE-4732 : Fixed TermsEnum.seekCeil/seekExact on term vectors.
    (Adrien Grand, Robert Muir)
  • LUCENE-4739 : Fixed bugs that prevented FSTs larger than ~1.1 GB from being saved and loaded
    (Adrien Grand, Mike McCandless)
  • LUCENE-4717 : Fixed bug where Lucene40DocValuesFormat would sometimes write an extra unused ordinal for sorted types. The bug is detected and corrected on-the-fly for old indexes.
    (Robert Muir)
  • LUCENE-4547 : Fixed bug where Lucene40DocValuesFormat was unable to encode segments that would exceed 2GB total data. This could happen in some surprising cases, for example if you had an index with more than 260M documents and a VAR_INT field.
    (Simon Willnauer, Adrien Grand, Mike McCandless, Robert Muir)
  • LUCENE-4775 : Remove SegmentInfo.sizeInBytes() and make MergePolicy.OneMerge.totalBytesSize thread safe
    (Josh Bronson via Robert Muir, Mike McCandless)
  • LUCENE-4770 : If spatial's TermQueryPrefixTreeStrategy was used to search indexed non-point shapes, then there was an edge case where a query should find a shape but it didn't. The fix is the removal of an optimization that simplifies some leaf cells into a parent. The index data for such a field is now ~20% larger. This optimization is still done for the query shape, and for indexed data for RecursivePrefixTreeStrategy. Furthermore, this optimization is enhanced to roll up beyond the bottom cell level.
    (David Smiley, Florian Schilling)
  • LUCENE-4790 : Fix FieldCacheImpl.getDocTermOrds to not bake deletes into the cached datastructure. Otherwise this can cause inconsistencies with readers at different points in time.
    (Robert Muir)
  • LUCENE-4791 : A conjunction of terms (ConjunctionTermScorer) scanned on the lowest frequency term instead of skipping, leading to potentially large performance impacts for many non-random or non-uniform term distributions.
    (John Wang, yonik)
  • LUCENE-4798 : PostingsHighlighter's formatter sometimes didn't highlight matched terms.
    (Robert Muir)
  • LUCENE-4796 , SOLR-4373 : Fix concurrency issue in NamedSPILoader and AnalysisSPILoader when doing reload (e.g. from Solr).
    (Uwe Schindler, Hossman)
  • LUCENE-4802 : Don't compute norms for drill-down facet fields.
    (Mike McCandless)
  • LUCENE-4804 : PostingsHighlighter sometimes applied terms to the wrong passage, if they started exactly on a passage boundary.
    (Robert Muir)
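LUCENE-4791 above is about how a conjunction moves through its terms. The Java sketch below is a "leapfrog" intersection. The shortest list proposes candidates, and every other list skips directly to the first doc ID at or after the candidate rather than scanning entry by entry. The class `Leapfrog` is hypothetical and works on in-memory, strictly increasing doc ID arrays using binary search. Real postings skip using skip data, and this is not the ConjunctionTermScorer code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Leapfrog intersection of sorted doc ID lists (illustrative sketch, not ConjunctionTermScorer). */
final class Leapfrog {
    private Leapfrog() {}

    /** Index of the first element >= target in list[from..], or list.length if there is none. */
    static int advance(int[] list, int from, int target) {
        int i = Arrays.binarySearch(list, from, list.length, target);
        return i >= 0 ? i : -i - 1; // binarySearch encodes a miss as -(insertionPoint) - 1
    }

    /** Returns the doc IDs present in every list; each list must be strictly increasing. */
    static List<Integer> intersect(int[]... lists) {
        List<Integer> result = new ArrayList<>();
        if (lists.length == 0) {
            return result;
        }
        int[][] sorted = lists.clone();
        Arrays.sort(sorted, Comparator.comparingInt((int[] l) -> l.length)); // rarest list leads
        int[] pos = new int[sorted.length];
        outer:
        while (pos[0] < sorted[0].length) {
            int candidate = sorted[0][pos[0]];
            for (int k = 1; k < sorted.length; k++) {
                pos[k] = advance(sorted[k], pos[k], candidate); // skip ahead instead of scanning
                if (pos[k] == sorted[k].length) {
                    break outer; // one list is exhausted, so no further matches are possible
                }
                int doc = sorted[k][pos[k]];
                if (doc > candidate) {
                    pos[0] = advance(sorted[0], pos[0], doc); // leader jumps past the gap
                    continue outer;
                }
            }
            result.add(candidate); // candidate is present in every list
            pos[0]++;
        }
        return result;
    }
}
```

For example, intersecting `{1, 3, 5, 7, 9}`, `{3, 4, 5, 9, 10, 11}` and `{2, 3, 5, 8, 9}` returns `[3, 5, 9]`. Skipping matters most when term distributions are clustered. In that case a scan would touch long runs of doc IDs that a single skip passes over.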
  • Documentation (2)
  • LUCENE-4718 : Fixed documentation of oal.queryparser.classic.
    (Hayden Muhl via Adrien Grand)
  • LUCENE-4784 , LUCENE-4785 , LUCENE-4786 : Fixed references to deprecated classes SinkTokenizer, ValueSourceQuery and RangeQuery.
    (Hao Zhong via Adrien Grand)
  • Build (4)
  • LUCENE-4654 : Test duration statistics from multiple test runs should be reused.
    (Dawid Weiss)
  • LUCENE-4636 : Upgrade ivy to 2.3.0
    (Shawn Heisey via Robert Muir)
  • LUCENE-4570 : Use the Policeman Forbidden API checker, released separately from Lucene and downloaded via Ivy.
    (Uwe Schindler, Robert Muir)
  • LUCENE-4758 : 'ant jar', 'ant compile', and 'ant compile-test' should recurse.
    (Steve Rowe)
  • Release 4.1.0 [2013-01-22]

  • Changes in backwards compatibility policy (16)
  • LUCENE-4514 : Scorer's freq() method returns an integer value indicating the number of times the scorer matches the current document. Previously this was only sometimes the case; in some cases it returned a (meaningless) floating point value. Scorer now extends DocsEnum, so it has attributes().
    (Robert Muir)
  • LUCENE-4543 : TFIDFSimilarity's index-time computeNorm is now final to match the fact that its query-time norm usage requires a FIXED_8 encoding. Override lengthNorm and/or encode/decodeNormValue to change the specifics, like Lucene 3.x.
    (Robert Muir)
  • LUCENE-3441 : The facet module now supports NRT. As a result, the following changes were made. DirectoryTaxonomyReader has a new constructor which takes a DirectoryTaxonomyWriter; use that constructor to get NRT support (or the old one for non-NRT). TaxonomyReader.refresh() was removed in favor of the static TaxonomyReader.openIfChanged method. Similar to DirectoryReader, the method returns null if no changes were made to the taxonomy, or a new TaxonomyReader instance otherwise. Instead of calling refresh(), write code similar to how you reopen a regular DirectoryReader. TaxonomyReader.openIfChanged (previously refresh()) no longer throws InconsistentTaxonomyException, and supports recreate. InconsistentTaxonomyException was removed. ChildrenArrays was pulled out of TaxonomyReader into a top-level class. TaxonomyReader was made an abstract class (instead of an interface), with methods such as close() and reference counting management pulled up from DirectoryTaxonomyReader and made final. The remaining methods stay abstract.
    (Shai Erera, Gilad Barkai)
  • LUCENE-4576 : Remove CachingWrapperFilter(Filter, boolean). This recacheDeletes option gave less than 1% speedup at the expense of cache churn (filters were invalidated on reopen if even a single delete was posted against the segment).
    (Robert Muir)
  • LUCENE-4575 : Replace IndexWriter's commit/prepareCommit versions that take commitData with setCommitData(). That allows committing changes to IndexWriter even if the commitData is the only thing that changes.
    (Shai Erera, Michael McCandless)
  • LUCENE-4565 : TaxonomyReader.getParentArray and .getChildrenArrays consolidated into one getParallelTaxonomyArrays(). You can obtain the 3 arrays that the previous two methods returned by calling parents(), children() or siblings() on the returned ParallelTaxonomyArrays.
    (Shai Erera)
  • LUCENE-4585 : Spatial PrefixTree-based strategies (either TermQuery- or RecursivePrefix-based) MAY want to re-index if used for point data. Without a re-index, an indexed point is effectively larger by ~1/2 of the smallest grid cell, and so is slightly more likely to match a query shape.
    (David Smiley)
  • LUCENE-4604 : DefaultOrdinalPolicy removed in favor of OrdinalPolicy.ALL_PARENTS. Same for DefaultPathPolicy (now PathPolicy.ALL_CATEGORIES). In addition, you can use OrdinalPolicy.NO_PARENTS to never write any parent category ordinal to the fulltree posting payload (but note that you need a special FacetsAccumulator - see javadocs).
    (Shai Erera)
  • LUCENE-4594 : Spatial PrefixTreeStrategy no longer indexes center points of non-point shapes. If you want to call makeDistanceValueSource() based on shape centers, you need to do this yourself in another spatial field.
    (David Smiley)
  • LUCENE-4615 : Replace IntArrayAllocator and FloatArrayAllocator by ArraysPool. FacetArrays no longer takes those allocators; if you need to reuse the arrays, you should use ReusingFacetArrays.
    (Shai Erera, Gilad Barkai)
  • LUCENE-4621 : FacetIndexingParams is now a concrete class (instead of DefaultFIP). Also, the entire IndexingParams chain is now immutable. If you need to override a setting, you should extend the relevant class. Additionally, FacetSearchParams is now immutable, and requires all FacetRequests to be specified at initialization time.
    (Shai Erera)
  • LUCENE-4647 : CategoryDocumentBuilder and EnhancementsDocumentBuilder are replaced by FacetFields and AssociationsFacetFields respectively. CategoryEnhancement and AssociationEnhancement were removed in favor of a simplified CategoryAssociation interface, with CategoryIntAssociation and CategoryFloatAssociation implementations. NOTE: indexes that contain category enhancements/associations are not supported by the new code and should be recreated.
    (Shai Erera)
  • LUCENE-4659 : Massive cleanup to CategoryPath API. Additionally, CategoryPath is now immutable, so you don't need to clone() it.
    (Shai Erera)
  • LUCENE-4670 : StoredFieldsWriter and TermVectorsWriter have new finish* callbacks which are called after a doc/field/term has been completely added.
    (Adrien Grand, Robert Muir)
  • LUCENE-4620 : IntEncoder/Decoder were changed to do bulk encoding/decoding. As a result, a few other classes, such as Aggregator and CategoryListIterator, were changed to handle bulk category ordinals.
    (Shai Erera)
  • LUCENE-4683 : CategoryListIterator and Aggregator are now per-segment. As such their implementations no longer take a top-level IndexReader in the constructor but rather implement a setNextReader.
    (Shai Erera)
  • New Features (14)
  • LUCENE-4226 : New experimental StoredFieldsFormat that compresses chunks of documents together in order to improve the compression ratio.
    (Adrien Grand)
  • LUCENE-4426 : New ValueSource implementations (in lucene/queries) for DocValues fields.
    (Adrien Grand)
  • LUCENE-4410 : FilteredQuery now exposes a FilterStrategy that exposes how filters are applied during query execution.
    (Simon Willnauer)
  • LUCENE-4404 : New ListOfOutputs (in lucene/misc) for FSTs wraps another Outputs implementation, allowing you to store more than one output for a single input. UpToTwoPositiveIntsOutputs was moved from lucene/core to lucene/misc.
    (Mike McCandless)
  • LUCENE-3842 : New AnalyzingSuggester, for doing auto-suggest using an analyzer. This can create powerful suggesters: if the analyzer removes stop words, then "ghost chr..." could suggest "The Ghost of Christmas Past"; if SynonymFilter is used to map wifi and wireless network to hotspot, then "wirele..." could suggest "wifi router"; token normalization like stemmers, accent removal, etc. would allow the suggester to ignore such variations.
    (Robert Muir, Sudarshan Gaikaiwari, Mike McCandless)
  • LUCENE-4446 : Lucene 4.1 has a new default index format (Lucene41Codec) that incorporates the previously experimental "Block" postings format for better search performance.
    (Han Jiang, Adrien Grand, Robert Muir, Mike McCandless)
  • LUCENE-3846 : New FuzzySuggester, like AnalyzingSuggester except it also finds completions allowing for fuzzy edits in the input string.
    (Robert Muir, Simon Willnauer, Mike McCandless)
  • LUCENE-4515 : MemoryIndex now supports adding the same field multiple times.
    (Simon Willnauer)
  • LUCENE-4489 : Added consumeAllTokens option to LimitTokenCountFilter
    (hossman, Robert Muir)
  • LUCENE-4566 : Add NRT/SearcherManager.RefreshListener/addListener to be notified whenever a new searcher is opened.
    (selckin via Shai Erera, Mike McCandless)
  • SOLR-4123 : Add per-script customizability to ICUTokenizerFactory via rule files in the ICU RuleBasedBreakIterator format.
    (Shawn Heisey, Robert Muir, Steve Rowe)
  • LUCENE-4590 : Added WriteEnwikiLineDocTask - a benchmark task for writing Wikipedia category pages and non-category pages into separate line files. extractWikipedia.alg was changed to use this task, so now it creates two files.
    (Doron Cohen)
  • LUCENE-4290 : Added PostingsHighlighter to the highlighter module. It uses offsets from the postings lists to highlight documents.
    (Robert Muir)
  • LUCENE-4628 : Added CommonTermsQuery, which executes high-frequency terms in an optional sub-query to prevent slow queries due to "common" terms like stopwords.
    (Simon Willnauer)
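The split that LUCENE-4628 describes can be sketched as a partition of the query terms by document frequency. In the Java illustration below, the cutoff is a fraction of the index size. Rare terms form the group that decides which documents match, and common terms such as stopwords are set aside for the optional sub-query. The class `CommonTermsSplit` and the parameter `maxFraction` are hypothetical and are not CommonTermsQuery's API. The sketch only classifies terms; it does not build or run any query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Splits query terms into low- and high-frequency groups by document frequency (illustrative sketch). */
final class CommonTermsSplit {
    final List<String> lowFreq = new ArrayList<>();  // rare terms: determine which documents match
    final List<String> highFreq = new ArrayList<>(); // common terms: candidates for the optional sub-query

    /**
     * @param docFreq     number of documents containing each query term (iteration order is kept)
     * @param maxDoc      number of documents in the index
     * @param maxFraction hypothetical cutoff: terms in more than this fraction of documents are "common"
     */
    CommonTermsSplit(Map<String, Integer> docFreq, int maxDoc, double maxFraction) {
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if (e.getValue() > maxFraction * maxDoc) {
                highFreq.add(e.getKey());
            } else {
                lowFreq.add(e.getKey());
            }
        }
    }
}
```

For example, take a 1,000-document index with a cutoff of 0.1, where "the" appears in 950 documents and "of" in 800. Those two terms land in the high-frequency group, while "ghost" (12) and "christmas" (30) stay in the low-frequency group that drives matching.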
  • API Changes (11)
  • LUCENE-4399 : Deprecated AppendingCodec. Lucene's term dictionaries no longer seek when writing.
    (Adrien Grand, Robert Muir)
  • LUCENE-4479 : Rename TokenStream.getTokenStream(IndexReader, int, String) to TokenStream.getTokenStreamWithOffsets, and return null on failure rather than throwing IllegalArgumentException.
    (Alan Woodward)
  • LUCENE-4472 : MergePolicy now accepts a MergeTrigger that provides information about what triggered the merge, i.e. whether it was triggered by a segment merge, a full flush, etc.
    (Simon Willnauer)
  • LUCENE-4415 : TermsFilter is now immutable. All terms need to be provided as constructor arguments.
    (Simon Willnauer)
  • LUCENE-4520 : ValueSource.getSortField no longer throws IOExceptions
    (Alan Woodward)
  • LUCENE-4537 : RateLimiter is now separated from FSDirectory and exposed via RateLimitedDirectoryWrapper. Any Directory can now be rate-limited.
    (Simon Willnauer)
  • LUCENE-4591 : CompressingStoredFields{Writer,Reader} now accept a segment suffix as a constructor parameter.
    (Renaud Delbru via Adrien Grand)
  • LUCENE-4605 : Added DocsEnum.FLAG_NONE which can be passed instead of 0 as the flag to .docs() and .docsAndPositions().
    (Shai Erera)
  • LUCENE-4617 : Remove FST.pack() method. Previously, to make a packed FST, you had to make a Builder with willPackFST=true (telling it you will later pack it), create your FST with finish(), and then call pack() to get another FST. Instead, just pass true for doPackFST to Builder, and finish() returns a packed FST.
    (Robert Muir)
  • LUCENE-4663 : Deprecate IndexSearcher.document(int, Set). This was not intended to be final, nor named document(). Use IndexSearcher.doc(int, Set) instead.
    (Robert Muir)
  • LUCENE-4684 : Made DirectSpellChecker extendable.
    (Martijn van Groningen)
  • Bug Fixes (31)
  • LUCENE-1822 : BaseFragListBuilder's hard-coded 6-char margin was too naive.
    (Alex Vigdor, Arcadius Ahouansou, Koji Sekiguchi)
  • LUCENE-4468 : Fix rareish integer overflows in Lucene41 postings format.
    (Robert Muir)
  • LUCENE-4486 : Add support for ConstantScoreQuery in Highlighter.
    (Simon Willnauer)
  • LUCENE-4485 : When CheckIndex counts terms, term/doc pairs and tokens, these counts now all exclude deleted documents.
    (Mike McCandless)
  • LUCENE-4479 : Highlighter works correctly for fields with term vector positions, but no offsets.
    (Alan Woodward)
  • SOLR-3906 : JapaneseReadingFormFilter in romaji mode will return romaji even for out-of-vocabulary kana cases (e.g. half-width forms).
    (Robert Muir)
  • LUCENE-4511 : TermsFilter might return wrong results if a field is not indexed or doesn't exist in the index.
    (Simon Willnauer)
  • LUCENE-4521 : IndexWriter.tryDeleteDocument could return true (successfully deleting the document) but then on IndexWriter close/commit fail to write the new deletions, if no other changes happened in the IndexWriter instance.
    (Ivan Vasilev via Mike McCandless)
  • LUCENE-4513 : Fixed that deleted nested docs are scored into the parent doc when using ToParentBlockJoinQuery.
    (Martijn van Groningen)
  • LUCENE-4534 : Fixed WFSTCompletionLookup and Analyzing/FuzzySuggester to allow 0 byte values in the lookup keys.
    (Mike McCandless)
  • LUCENE-4532 : DirectoryTaxonomyWriter used a timestamp to denote taxonomy index re-creation, which could cause a bug if machine clocks were not synced. Instead, it now tracks an 'epoch' version, which is incremented whenever the taxonomy is re-created or replaced.
    (Shai Erera)
  • LUCENE-4544 : Fixed off-by-1 in ConcurrentMergeScheduler that would allow 1+maxMergeCount merge threads to be created, instead of just maxMergeCount
    (Radim Kolar, Mike McCandless)
  • LUCENE-4567 : Fixed NullPointerException in analyzing, fuzzy, and WFST suggesters when no suggestions were added
    (selckin via Mike McCandless)
  • LUCENE-4568 : Fixed integer overflow in PagedBytes.PagedBytesData{In,Out}put.getPosition.
    (Adrien Grand)
  • LUCENE-4581 : GroupingSearch.setAllGroups(true) was failing to actually compute allMatchingGroups
    ([email protected] via Mike McCandless)
  • LUCENE-4009 : Improve TermsFilter.toString
    (Tim Costermans via Chris Male, Mike McCandless)
  • LUCENE-4588 : Benchmark's EnwikiContentSource was discarding last wiki document and had leaking threads in 'forever' mode.
    (Doron Cohen)
  • LUCENE-4585 : Spatial RecursivePrefixTreeFilter had some bugs that only occurred when shapes were indexed. In what appears to be rare circumstances, documents with shapes near a query shape were erroneously considered a match. In addition, it wasn't possible to index a shape representing the entire globe.
  • LUCENE-4595 : EnwikiContentSource had a thread safety problem (NPE) in 'forever' mode
    (Doron Cohen)
  • LUCENE-4587 : fix WordBreakSpellChecker to not throw AIOOBE when presented with 2-char codepoints, and to correctly break/combine terms containing non-latin characters.
    (James Dyer, Andreas Hubold)
  • LUCENE-4596 : fix a concurrency bug in DirectoryTaxonomyWriter.
    (Shai Erera)
  • LUCENE-4594 : Spatial PrefixTreeStrategy would index center-points in addition to the shape to index if it was non-point, in the same field. But sometimes the center-point isn't actually in the shape (consider a LineString), and for highly precise shapes it could cause makeDistanceValueSource's cache to load parts of the shape's boundary erroneously too. So center points aren't indexed any more; you should use another spatial field.
    (David Smiley)
  • LUCENE-4629 : IndexWriter failed to delete documents if a document block was being indexed and the Iterator threw an exception. Documents were only rolled back if the actual indexing process failed.
    (Simon Willnauer)
  • LUCENE-4608 : Handle large number of requested fragments better.
    (Martijn van Groningen)
  • LUCENE-4633 : DirectoryTaxonomyWriter.replaceTaxonomy did not refresh its internal reader, which could cause an existing category to be added twice.
    (Shai Erera)
  • LUCENE-4461 : If you added the same FacetRequest more than once, you would get inconsistent results.
    (Gilad Barkai via Shai Erera)
  • LUCENE-4656 : Fix regression in IndexWriter to work with empty TokenStreams that have no TermToBytesRefAttribute (commonly provided by CharTermAttribute), e.g., oal.analysis.miscellaneous.EmptyTokenStream.
    (Uwe Schindler, Adrien Grand, Robert Muir)
  • LUCENE-4660 : ConcurrentMergeScheduler was taking too long to un-pause incoming threads it had paused when too many merges were queued up.
    (Mike McCandless)
  • LUCENE-4662 : Add missing elided articles and prepositions to FrenchAnalyzer's DEFAULT_ARTICLES list passed to ElisionFilter.
    (David Leunen via Steve Rowe)
  • LUCENE-4671 : Fix CharsRef.subSequence method.
    (Tim Smith via Robert Muir)
  • LUCENE-4465 : Let ConstantScoreQuery's Scorer return its child scorer.
    (selckin via Uwe Schindler)
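LUCENE-4568 above fixes an integer overflow in a getPosition method. The entry does not show the code, so the Java snippet below is only a generic illustration of this class of bug; the class `PagedPosition` and its parameters are hypothetical. A position computed as `page * blockSize + offset` in `int` arithmetic wraps to a negative number once the product passes `Integer.MAX_VALUE`. Widening one operand to `long` before multiplying avoids the wrap.

```java
/** Shows why a paged position must be computed in long arithmetic (generic illustration). */
final class PagedPosition {
    private PagedPosition() {}

    /** Buggy: page * blockSize + offset is evaluated in 32-bit int arithmetic and can wrap around. */
    static long intPosition(int page, int blockSize, int offset) {
        return page * blockSize + offset; // widened to long only after the overflow has happened
    }

    /** Fixed: casting one operand to long makes the whole expression 64-bit. */
    static long longPosition(int page, int blockSize, int offset) {
        return (long) page * blockSize + offset;
    }
}
```

With 32,768-byte pages, page 70,000 and offset 5, the true position is 2,293,760,005. The `int` version instead returns -2,001,207,291.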
  • Changes in Runtime Behavior (2)
  • LUCENE-4586 : Change default ResultMode of FacetRequest to PER_NODE_IN_TREE. This only affects requests with depth>1. If you execute such requests and rely on the facet results being returned flat (i.e. no hierarchy), you should set the ResultMode to GLOBAL_FLAT.
    (Shai Erera, Gilad Barkai)
  • LUCENE-1822 : Improves the text window selection in FastVectorHighlighter by recalculating the starting margin once all phrases in the fragment have been identified. As a result, if a single word is matched in a fragment, it appears in the middle of the highlight instead of 6 characters from the beginning. Specifying a large enough fragCharSize also guarantees that short texts are represented in their entirety in a fragment.
  • Optimizations (16)
  • LUCENE-2221 : oal.util.BitUtil was modified to use Long.bitCount and Long.numberOfTrailingZeros (which are intrinsics since Java 6u18) instead of pure java bit twiddling routines in order to improve performance on modern JVMs/hardware.
    (Dawid Weiss, Adrien Grand)
  • LUCENE-4509 : Enable stored fields compression by default in the Lucene 4.1 default codec.
    (Adrien Grand)
  • LUCENE-4536 : PackedInts on-disk format is now byte-aligned (it used to be long-aligned), saving up to 7 bytes per array of values.
    (Adrien Grand, Mike McCandless)
  • LUCENE-4512 : Additional memory savings for CompressingStoredFieldsFormat.
    (Adrien Grand, Robert Muir)
  • LUCENE-4443 : Lucene41PostingsFormat no longer writes unnecessary offsets into the skipdata.
    (Robert Muir)
  • LUCENE-4459 : Improve WeakIdentityMap.keyIterator() to remove GCed keys from backing map early instead of waiting for reap(). This makes test failures in TestWeakIdentityMap disappear, too.
    (Uwe Schindler, Mike McCandless, Robert Muir)
  • LUCENE-4473 : Lucene41PostingsFormat encodes offsets more efficiently for low frequency terms (< 128 occurrences).
    (Robert Muir)
  • LUCENE-4462 : DocumentsWriter now flushes deletes, writes segment infos, and builds CFS files if necessary during segment flush rather than during publishing. Publishing was a single-threaded process, whereas now all IO- and CPU-heavy computation is done concurrently in DocumentsWriterPerThread.
    (Simon Willnauer)
  • LUCENE-4496 : Optimize Lucene41PostingsFormat when requesting a subset of the postings data (via flags to TermsEnum.docs/docsAndPositions) to use ForUtil.skipBlock.
    (Robert Muir)
  • LUCENE-4497 : Don't write PosVIntCount to the positions file in Lucene41PostingsFormat, as it's always totalTermFreq % BLOCK_SIZE.
    (Robert Muir)
  • LUCENE-4498 : In Lucene41PostingsFormat, when a term appears in only one document, write the doc id directly instead of a file pointer to a VIntBlock containing it.
    (Mike McCandless, Robert Muir)
  • LUCENE-4515 : MemoryIndex now uses Byte/IntBlockPool internally to hold terms and posting lists. All index data is represented as consecutive byte/int arrays to reduce GC cost and memory overhead.
    (Simon Willnauer)
  • LUCENE-4538 : DocValues now caches direct sources in a ThreadLocal exposed via SourceCache. Users of this API can now simply obtain an instance via DocValues#getDirectSource per thread.
    (Simon Willnauer)
  • LUCENE-4580 : DrillDown.query variants return a ConstantScoreQuery with boost set to 0.0f so that document scores are not affected by running a drill-down query.
    (Shai Erera)
  • LUCENE-4598 : PayloadIterator no longer uses the top-level IndexReader to iterate over the postings' payloads.
    (Shai Erera, Michael McCandless)
  • LUCENE-4661 : Drop default maxThreadCount to 1 and maxMergeCount to 2 in ConcurrentMergeScheduler, for faster merge performance on spinning-magnet drives
    (Mike McCandless)
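LUCENE-4497 above drops PosVIntCount because a reader can recompute it. The following is a minimal sketch of that arithmetic. It assumes, as in the block format from LUCENE-3892, that positions are written in full packed blocks of 128 and that only the leftover positions go into a trailing vInt block. The class and method names are hypothetical.

```java
// Sketch: the trailing vInt block size is derivable from totalTermFreq alone.
// Assumption: BLOCK_SIZE = 128; full blocks are packed, leftovers are vInt-encoded.
final class TrailingBlock {
    static final int BLOCK_SIZE = 128;

    /** Number of positions that fall into the trailing vInt block. */
    static long trailingCount(long totalTermFreq) {
        return totalTermFreq % BLOCK_SIZE;
    }

    /** Number of full packed blocks written before the trailing vInt block. */
    static long fullBlocks(long totalTermFreq) {
        return totalTermFreq / BLOCK_SIZE;
    }

    public static void main(String[] args) {
        long ttf = 300; // a term that occurs 300 times in the segment
        System.out.println(fullBlocks(ttf) + " packed blocks + " + trailingCount(ttf) + " vInt positions");
    }
}
```

For totalTermFreq = 300, main prints `2 packed blocks + 44 vInt positions`. The terms dictionary already stores totalTermFreq, so writing 44 again per term would be redundant.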
  • Documentation (1)
  • LUCENE-4483 : Refer to BytesRef.deepCopyOf in Term's constructor that takes BytesRef.
    (Paul Elschot via Robert Muir)
  • Build (6)
  • LUCENE-4650 : Upgrade randomized testing to version 2.0.8: make the test framework more robust under low memory conditions.
    (Dawid Weiss)
  • LUCENE-4603 : Upgrade randomized testing to version 2.0.5: print forked JVM PIDs on heartbeat from hung tests
    (Dawid Weiss)
  • Upgrade randomized testing to version 2.0.4: avoid hangs caused by shutdown hooks that never return, by calling Runtime.halt() in addition to Runtime.exit() after a short delay that allows graceful shutdown.
    (Dawid Weiss)
  • LUCENE-4451 : Memory leak per unique thread caused by RandomizedContext.contexts static map. Upgrade randomized testing to version 2.0.2
    (Mike McCandless, Dawid Weiss)
  • LUCENE-4589 : Upgraded benchmark module's Nekohtml dependency to version 1.9.17, removing the workaround in Lucene's HTML parser for the Turkish locale.
    (Uwe Schindler)
  • LUCENE-4601 : Fix ivy availability check to use typefound, so it works if called from another build file.
    (Ryan Ernst via Robert Muir)
  • Release 4.0.0 [2012-10-12]

  • Changes in backwards compatibility policy (2)
  • LUCENE-4392 : Class org.apache.lucene.util.SortedVIntList has been removed.
    (Adrien Grand)
  • LUCENE-4393 : RollingCharBuffer has been moved to the o.a.l.analysis.util package of lucene-analysis-common.
    (Adrien Grand)
  • New Features (5)
  • LUCENE-1888 : Added the option to store payloads in the term vectors (IndexableFieldType.storeTermVectorPayloads()). Note that you must store term vector positions to store payloads.
    (Robert Muir)
  • LUCENE-3892 : Add a new BlockPostingsFormat that bulk-encodes docs, freqs and positions in large (size 128) packed-int blocks for faster search performance. This was from Han Jiang's 2012 Google Summer of Code project
    (Han Jiang, Adrien Grand, Robert Muir, Mike McCandless)
  • LUCENE-4323 : Added support for an absolute maximum CFS segment size (in MiB) to LogMergePolicy and TieredMergePolicy.
    (Alexey Lef via Uwe Schindler)
  • LUCENE-4339 : Allow deletes against 3.x segments for easier upgrading. Otherwise the Lucene3x codec is still read-only: you should not set it as the default codec on IndexWriter, because it cannot write new segments.
    (Mike McCandless, Robert Muir)
  • SOLR-3441 : ElisionFilterFactory is now MultiTermAware
    (Jack Krupansky via hossman)
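The BlockPostingsFormat entry (LUCENE-3892) bulk-encodes postings in packed-int blocks of 128. The following is a simplified sketch of that idea. It picks the smallest bit width that fits the largest value in a block of non-negative ints, such as doc-ID deltas, and then packs the values back to back into longs. It illustrates the technique only and is not Lucene's actual ForUtil encoding. All names are hypothetical.

```java
import java.util.Arrays;

// Illustrative bit-packing of one block of non-negative ints (e.g. doc-ID deltas).
// Hypothetical names; not Lucene's ForUtil format.
final class PackedBlock {
    static final int BLOCK_SIZE = 128;

    /** Smallest bit width (at least 1) that can hold every value in the block. */
    static int bitsRequired(int[] values) {
        int or = 0;
        for (int v : values) {
            if (v < 0) throw new IllegalArgumentException("values must be non-negative");
            or |= v; // the OR has the same highest set bit as the maximum value
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    /** Packs values back to back, using exactly `bits` bits each. */
    static long[] pack(int[] values, int bits) {
        long[] out = new long[(values.length * bits + 63) / 64];
        long bitPos = 0;
        for (int v : values) {
            int word = (int) (bitPos >>> 6);  // index of the long being filled
            int shift = (int) (bitPos & 63);  // bit offset inside that long
            out[word] |= ((long) v) << shift;
            if (shift + bits > 64) {          // value straddles two longs
                out[word + 1] |= ((long) v) >>> (64 - shift);
            }
            bitPos += bits;
        }
        return out;
    }

    /** Reverses pack(): reads `count` values of `bits` bits each. */
    static int[] unpack(long[] packed, int count, int bits) {
        int[] out = new int[count];
        long mask = (1L << bits) - 1;
        long bitPos = 0;
        for (int i = 0; i < count; i++) {
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = packed[word] >>> shift;
            if (shift + bits > 64) {
                v |= packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (v & mask);
            bitPos += bits;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] deltas = new int[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE; i++) deltas[i] = (i * 7) % 20; // small gaps, max 19
        int bits = bitsRequired(deltas);
        long[] packed = pack(deltas, bits);
        System.out.println(bits + " bits/value, " + packed.length + " longs instead of " + BLOCK_SIZE + " ints");
        System.out.println(Arrays.equals(deltas, unpack(packed, BLOCK_SIZE, bits)));
    }
}
```

With gaps no larger than 19, each value needs 5 bits, so the 128 values fit in 10 longs. Storing them as plain ints would take 64 longs' worth of space. Decoding a fixed-width block is a tight loop, which is where the search-speed gain of bulk encoding comes from.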
  • API Changes (15)
  • LUCENE-4391 , LUCENE-4440 : All methods of Lucene40Codec but getPostingsFormatForField are now final. To reuse functionality of Lucene40, you should extend FilterCodec and delegate to Lucene40 instead of extending Lucene40Codec.
    (Adrien Grand, Shai Erera, Robert Muir, Uwe Schindler)
  • LUCENE-4299 : Added Terms.hasPositions() and Terms.hasOffsets(). Previously you had no real way to know that a term vector field had positions or offsets, since this can be configured on a per-field-per-document basis.
    (Robert Muir)
  • Removed DocsAndPositionsEnum.hasPayload() and simplified the contract of getPayload(). It returns null if there is no payload, otherwise returns the current payload. You can now call it multiple times per position if you want.
    (Robert Muir)
  • Removed FieldsEnum. Fields API instead implements Iterable<String> and exposes Iterator, so you can iterate over field names with for (String field : fields) instead.
    (Robert Muir)
  • LUCENE-4152 : added IndexReader.leaves(), which lets you enumerate the leaf atomic reader contexts for all readers in the tree.
    (Uwe Schindler, Robert Muir)
  • LUCENE-4304 : removed PayloadProcessorProvider. If you want to change payloads (or other things) when merging indexes, it's recommended to just use a FilterAtomicReader + IndexWriter.addIndexes. See the OrdinalMappingAtomicReader and TaxonomyMergeUtils in the facets module if you want an example of this.
    (Mike McCandless, Uwe Schindler, Shai Erera, Robert Muir)
  • LUCENE-4304 : Make CompositeReader.getSequentialSubReaders() protected. To get atomic leaves of any IndexReader use the new method leaves() ( LUCENE-4152 ), which lists AtomicReaderContexts including the doc base of each leaf.
    (Uwe Schindler, Robert Muir)
  • LUCENE-4307 : Renamed IndexReader.getTopReaderContext to IndexReader.getContext.
    (Robert Muir)
  • LUCENE-4316 : Deprecate Fields.getUniqueTermCount and remove it from AtomicReader. If you really want the unique term count across all fields, just sum up Terms.size() across those fields. This method only exists so that this statistic can be accessed for Lucene 3.x segments, which don't support Terms.size().
    (Uwe Schindler, Robert Muir)
  • LUCENE-4321 : Change CharFilter to extend Reader directly, as FilterReader overdelegates (read(), read(char[], int, int), skip, etc.). This made it hard to implement correct CharFilters. Now only close() is delegated by default; read(char[], int, int) and correct(int) are abstract, so it's obvious which methods you must implement. As with CharFilter in the 3.x series, the protected inner Reader is named 'input' rather than 'in'.
    (Dawid Weiss, Uwe Schindler, Robert Muir)
  • LUCENE-3309 : The expert FieldSelector API, used to load only certain fields in a stored document, has been replaced with the simpler StoredFieldVisitor API.
    (Mike McCandless)
  • LUCENE-4343 : Made Tokenizer.setReader final. This is a setter that should not be overridden by subclasses: per-stream initialization should happen in reset().
    (Robert Muir)
  • LUCENE-4377 : Remove IndexInput.copyBytes(IndexOutput, long). Use DataOutput.copyBytes(DataInput, long) instead.
    (Mike McCandless, Robert Muir)
  • LUCENE-4355 : Simplify AtomicReader's sugar methods such as termDocsEnum, termPositionsEnum, docFreq, and totalTermFreq to only take Term as a parameter. If you want to do expert things such as pass a different Bits as liveDocs, then use the flex apis (fields(), terms(), etc) directly.
    (Mike McCandless, Robert Muir)
  • LUCENE-4425 : clarify documentation of StoredFieldVisitor.binaryValue and simplify the api to binaryField(FieldInfo, byte[]).
    (Adrien Grand, Robert Muir)
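LUCENE-4321 leaves correct(int) abstract. Any CharFilter that changes the length of the text must map offsets in its output back to offsets in the original input, so that features such as highlighting point at the right characters. The following sketch shows that bookkeeping for a hypothetical filter that deletes '-' characters. It works on a String rather than extending Lucene's CharFilter, and the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter: removes '-' and remembers how to map output offsets back
// to input offsets (the job CharFilter.correct(int) does). Not a Lucene class.
final class DashRemovingFilter {
    final String output;
    // Entry i: from output offset points.get(i) onward, add diffs.get(i) to reach the input offset.
    private final List<Integer> points = new ArrayList<>();
    private final List<Integer> diffs = new ArrayList<>();

    DashRemovingFilter(String input) {
        StringBuilder sb = new StringBuilder();
        int removed = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '-') {
                removed++;                // one more input char with no output char
                points.add(sb.length());  // takes effect at the next output offset
                diffs.add(removed);
            } else {
                sb.append(c);
            }
        }
        output = sb.toString();
    }

    /** Maps an offset in the filtered output back to an offset in the original input. */
    int correct(int outputOffset) {
        int diff = 0;
        // points is non-decreasing, so the last entry <= outputOffset wins
        // (a binary search would also work).
        for (int i = 0; i < points.size() && points.get(i) <= outputOffset; i++) {
            diff = diffs.get(i);
        }
        return outputOffset + diff;
    }

    public static void main(String[] args) {
        DashRemovingFilter f = new DashRemovingFilter("wi-fi");
        // The tokenizer sees "wifi" at [0, 4); corrected offsets point into "wi-fi".
        System.out.println(f.output + " [" + f.correct(0) + ", " + f.correct(4) + ")");
    }
}
```

Here main prints `wifi [0, 5)`: the token's end offset is shifted past the removed dash, so it covers the original text "wi-fi".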
  • Bug Fixes (17)
  • LUCENE-4423 : DocumentStoredFieldVisitor.binaryField ignored offset and length.
    (Adrien Grand)
  • LUCENE-4297 : BooleanScorer2 would multiply the coord() factor twice for conjunctions: for most users this is no problem, but if you had a customized Similarity that returned something other than 1 when overlap == maxOverlap (always the case for conjunctions), then the score would be incorrect.
    (Pascal Chollet, Robert Muir)
  • LUCENE-4298 : MultiFields.getTermDocsEnum(IndexReader, Bits, String, BytesRef) did not work at all: it recursed infinitely.
    (Alberto Paro via Robert Muir)
  • LUCENE-4300 : BooleanQuery's rewrite was not always safe: if you had a custom Similarity where coord(1,1) != 1F, then the rewritten query would be scored differently.
    (Robert Muir)
  • Negative positions are no longer allowed in the positions file. If you have an index from 2.4.0 or earlier containing such negative positions, and you have already upgraded it to 3.x and then to Lucene 4.0-ALPHA or -BETA, you should run CheckIndex. If it fails, you need to upgrade again to 4.0.
    (Robert Muir)
  • LUCENE-4303 : PhoneticFilterFactory and SnowballPorterFilterFactory now load their encoders / stemmers via the ResourceLoader instead of Class.forName(). Solr users no longer have to embed these in the Solr WAR.
    (David Smiley)
  • SOLR-3737 : StempelPolishStemFilterFactory loaded its stemmer table incorrectly. Also, ensure immutability and use only one instance of this table in RAM (lazy loaded) since it's quite large.
    (sausarkar, Steven Rowe, Robert Muir)
  • LUCENE-4310 : MappingCharFilter was failing to match input strings containing non-BMP Unicode characters.
    (Dawid Weiss, Robert Muir, Mike McCandless)
  • LUCENE-4224 : Added an in-order scorer to query-time joining; the out-of-order scorer throws an UnsupportedOperationException (UOE).
    (Martijn van Groningen, Robert Muir)
  • LUCENE-4333 : Fixed an NPE in TermGroupFacetCollector when faceting on multi-valued fields.
    (Jesse MacVicar, Martijn van Groningen)
  • LUCENE-4218 : Document.get(String) and Field.stringValue() again return values for numeric fields, like Lucene 3.x and consistent with the documentation.
    (Jamie, Uwe Schindler, Robert Muir)
  • NRTCachingDirectory was always caching a newly flushed segment in RAM, instead of checking the estimated size of the segment to decide whether to cache it.
    (Mike McCandless)
  • LUCENE-3720 : fix memory-consumption issues with BeiderMorseFilter.
    (Thomas Neidhart via Robert Muir)
  • LUCENE-4401 : Fix bug where DisjunctionSumScorer would sometimes call score() on a subscorer that had already returned NO_MORE_DOCS.
    (Liu Chao, Robert Muir)
  • LUCENE-4411 : When sampling was enabled for a FacetRequest, its depth parameter was reset to the default (1), even if set otherwise.
    (Gilad Barkai via Shai Erera)
  • LUCENE-4455 : Fixed a bug in SegmentInfoPerCommit.sizeInBytes(), which returned twice the true size and computed it inefficiently. Also fixed a bug in CheckIndex that would report no deletions when a segment had deletions, and vice versa.
    (Uwe Schindler, Robert Muir, Mike McCandless)
  • LUCENE-4456 : Fixed double-counting of sizeInBytes for a segment (affects how merge policies pick merges); fixed CheckIndex's incorrect reporting of whether a segment has deletions; fixed a case where, on abort, Lucene could remove files it didn't create; fixed many cases where IndexWriter could leave leftover files (on exceptions in various places, and on reuse of a segment name after crash and recovery).
    (Uwe Schindler, Robert Muir, Mike McCandless)
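The double coord() in LUCENE-4297, like the rewrite issue in LUCENE-4300, only affected custom Similarities. The following arithmetic shows why for a two-clause conjunction where both clauses match (overlap == maxOverlap == 2). The default-style coord is overlap / maxOverlap. The custom coord and the raw score of 2.0 are hypothetical values chosen for illustration.

```java
// Worked example: applying coord() once vs twice for a full conjunction match.
final class CoordExample {
    /** Default-style coord: fraction of query clauses that matched. */
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** Hypothetical custom coord that returns 0.5 even when every clause matched. */
    static float customCoord(int overlap, int maxOverlap) {
        return 0.5f * overlap / maxOverlap;
    }

    static float correctScore(float raw, float coord) {
        return raw * coord;          // coord applied once
    }

    static float buggyScore(float raw, float coord) {
        return raw * coord * coord;  // coord applied twice, as in the bug
    }

    public static void main(String[] args) {
        float raw = 2.0f;
        float d = defaultCoord(2, 2), c = customCoord(2, 2);
        System.out.println("default: " + correctScore(raw, d) + " vs " + buggyScore(raw, d));
        System.out.println("custom:  " + correctScore(raw, c) + " vs " + buggyScore(raw, c));
    }
}
```

With the default coord, both paths give 2.0, because a factor of 1 applied twice is still 1. That is why most users never noticed the bug. With the custom coord, the buggy path yields 0.5 instead of 1.0.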
  • Optimizations (4)
  • LUCENE-4322 : Decrease lucene-core JAR size. The core JAR size had increased a lot because of generated code introduced in LUCENE-4161 and LUCENE-3892 .
    (Adrien Grand)
  • LUCENE-4317 : Improve reuse of internal TokenStreams and StringReader in oal.document.Field.
    (Uwe Schindler, Chris Male, Robert Muir)
  • LUCENE-4327 : Support out-of-order scoring in FilteredQuery for higher performance.
    (Mike McCandless, Robert Muir)
  • LUCENE-4364 : Optimize MMapDirectory to not make a mapping per-cfs-slice, instead one map per .cfs file. This reduces the total number of maps. Additionally factor out a (package-private) generic ByteBufferIndexInput from MMapDirectory.
    (Uwe Schindler, Robert Muir)
  • Build (6)
  • LUCENE-4406 , LUCENE-4407 : Upgrade to randomizedtesting 2.0.1. Worked around broken test output XML files caused by non-XML Unicode characters in strings. Added printing of failed tests at the end of a test run.
    (Dawid Weiss)
  • LUCENE-4252 : Detect/Fail tests when they leak RAM in static fields
    (Robert Muir, Dawid Weiss)
  • LUCENE-4360 : Support running the same test suite multiple times in parallel
    (Dawid Weiss)
  • LUCENE-3985 : Upgrade to randomizedtesting 2.0.0. Added support for thread leak detection. Added support for suite timeouts.
    (Dawid Weiss)
  • LUCENE-4354 : Corrected maven dependencies to be consistent with the licenses/ folder and the binary release. Some had different versions or additional unnecessary dependencies.
    (selckin via Robert Muir)
  • LUCENE-4340 : Move all non-default codec, postings format and terms dictionary implementations to lucene/codecs.
    (Adrien Grand)
  • Documentation (1)
  • LUCENE-4302 : Fix facet userguide to have HTML loose doctype like all other javadocs.
    (Karl Nicholas via Uwe Schindler)
  • Release 4.0.0-BETA [2012-08-13]

  • New features (10)
  • LUCENE-4249 : Changed the explanation of the PayloadTermWeight to use the underlying PayloadFunction's explanation as the explanation for the payload score.
    (Scott Smerchek via Robert Muir)
  • LUCENE-4069 : Added BloomFilteringPostingsFormat for use with low-frequency terms such as primary keys
    (Mark Harwood, Mike McCandless)
  • LUCENE-4201 : Added JapaneseIterationMarkCharFilter to normalize Japanese iteration marks.
    (Robert Muir, Christian Moen)
  • LUCENE-3832 : Added BasicAutomata.makeStringUnion method to efficiently create automata from a fixed collection of UTF-8 encoded BytesRef
    (Dawid Weiss, Robert Muir)
  • LUCENE-4153 : Added an option to the fast vector highlighter (via BaseFragmentsBuilder) to respect field boundaries when highlighting multivalued fields.
    (Martijn van Groningen)
  • LUCENE-4227 : Added DirectPostingsFormat, to hold all postings in memory as uncompressed simple arrays. This uses a tremendous amount of RAM but gives good search performance gains.
    (Mike McCandless)
  • LUCENE-2510 , LUCENE-4044 : Migrated Solr's Tokenizer-, TokenFilter-, and CharFilterFactories to the lucene-analysis module. The API is still experimental.
    (Chris Male, Robert Muir, Uwe Schindler)
  • LUCENE-4230 : When pulling a DocsAndPositionsEnum you can now specify whether or not you require payloads (in addition to offsets); turning one or both off may allow some codec implementations to optimize the enum implementation.
    (Robert Muir, Mike McCandless)
  • LUCENE-4203 : Add IndexWriter.tryDeleteDocument(AtomicReader reader, int docID), to attempt deletion by docID as long as the provided reader is an NRT reader and the segment has not yet been merged.
    (Mike McCandless)
  • LUCENE-4286 : Added option to CJKBigramFilter to always also output unigrams. This can be used for a unigram+bigram approach, or at index-time only for better support of short queries.
    (Tom Burton-West, Robert Muir)
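LUCENE-4069's BloomFilteringPostingsFormat relies on a key property of Bloom filters: they can cheaply answer "definitely absent". A primary-key lookup can therefore skip segments that cannot contain the key. The class below is a minimal, generic Bloom filter sketch (double hashing over a java.util.BitSet) that illustrates this property. Its sizing, hash functions and names are assumptions and do not reflect how Lucene's format is implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Minimal Bloom filter sketch: false positives are possible, false negatives are not.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a, used as the second base hash.
    private static int fnv1a(byte[] key) {
        int h = 0x811C9DC5;
        for (byte b : key) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: derive the i-th bit index from two base hashes.
    private int index(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(k), h2 = fnv1a(k);
        for (int i = 0; i < numHashes; i++) bits.set(index(h1, h2, i));
    }

    /** false means the key was definitely never added; true means "maybe". */
    boolean mightContain(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(k), h2 = fnv1a(k);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h1, h2, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter segmentKeys = new SimpleBloomFilter(1 << 16, 5);
        for (int i = 0; i < 1000; i++) segmentKeys.add("id-" + i);
        System.out.println(segmentKeys.mightContain("id-42"));   // true: key was added
        System.out.println(segmentKeys.mightContain("id-5000")); // usually false: segment can be skipped
    }
}
```

The filter pays off for low-frequency terms such as primary keys. In that case most segments do not contain a given key, and a negative answer avoids a terms-dictionary seek entirely.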
  • API Changes (12)
  • LUCENE-4138 : update of morfologik (Polish morphological analyzer) to 1.5.3. The tag attribute class has been renamed to MorphosyntacticTagsAttribute and has a different API (carries a list of tags instead of a compound tag). Upgrade of embedded morfologik dictionaries to version 1.9.
    (Dawid Weiss)
  • LUCENE-4178 : set 'tokenized' to true on FieldType by default, so that if you make a custom FieldType and set indexed = true, it's analyzed by the analyzer.
    (Robert Muir)
  • LUCENE-4220 : Removed the buggy JavaCC-based HTML parser from the benchmark module and replaced it with NekoHTML. The HTMLParser interface was cleaned up while changing method signatures.
    (Uwe Schindler, Robert Muir)
  • LUCENE-2191 : Rename Tokenizer.reset(Reader) to Tokenizer.setReader(Reader). The purpose of this method was always to set a new Reader on the Tokenizer, reusing the object. But the name was often confused with TokenStream.reset().
    (Robert Muir)
  • LUCENE-4228 : Refactored CharFilter to extend java.io.FilterReader. CharFilters filter another reader and you override correct() for offset correction.
    (Robert Muir)
  • LUCENE-4240 : Analyzer api now just takes fieldName for getOffsetGap. If the field is not analyzed (e.g. StringField), then the analyzer is not invoked at all. If you want to tweak things like positionIncrementGap and offsetGap, analyze the field with KeywordTokenizer instead.
    (Grant Ingersoll, Robert Muir)
  • LUCENE-4250 : Pass fieldName to the PayloadFunction explain method, so it parallels with docScore and the default implementation is correct.
    (Robert Muir)
  • LUCENE-3747 : Support Unicode 6.1.0.
    (Steve Rowe)
  • LUCENE-3884 : Moved ElisionFilter out of org.apache.lucene.analysis.fr package into org.apache.lucene.analysis.util.
    (Robert Muir)
  • LUCENE-4230 : When pulling a DocsAndPositionsEnum you now pass an int flags instead of the previous boolean needOffsets. Currently recognized flags are DocsAndPositionsEnum.FLAG_PAYLOADS and DocsAndPositionsEnum.FLAG_OFFSETS
    (Robert Muir, Mike McCandless)
  • LUCENE-4273 : When pulling a DocsEnum, you can pass an int flags instead of the previous boolean needsFreqs, consistent with the changes for DocsAndPositionsEnum in LUCENE-4230 . Currently the only flag is DocsEnum.FLAG_FREQS.
    (Robert Muir, Mike McCandless)
  • LUCENE-3616 : TextField(String, Reader, Store) was reduced to TextField(String, Reader), as the Store parameter didn't make sense: if you supplied Store.YES, you would only receive an exception anyway.
    (Robert Muir)
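LUCENE-4230 and LUCENE-4273 replace booleans with int flags. This lets callers combine requirements with bitwise OR and lets new options be added without changing method signatures. The following sketch shows the pattern. It uses hypothetical constants that mirror the flag names above; the numeric values are assumptions, not taken from Lucene.

```java
// Hypothetical flag constants mirroring DocsAndPositionsEnum.FLAG_OFFSETS / FLAG_PAYLOADS.
// The values are assumptions for illustration.
final class PostingsFlags {
    static final int FLAG_OFFSETS = 0x1;
    static final int FLAG_PAYLOADS = 0x2;

    static boolean wantsOffsets(int flags) {
        return (flags & FLAG_OFFSETS) != 0;
    }

    static boolean wantsPayloads(int flags) {
        return (flags & FLAG_PAYLOADS) != 0;
    }

    /** A codec could pick a cheaper enum when neither extra is requested. */
    static String chooseEnum(int flags) {
        if (flags == 0) return "positions only";
        StringBuilder sb = new StringBuilder("positions");
        if (wantsOffsets(flags)) sb.append(" + offsets");
        if (wantsPayloads(flags)) sb.append(" + payloads");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(chooseEnum(0));
        System.out.println(chooseEnum(FLAG_PAYLOADS));
        System.out.println(chooseEnum(FLAG_OFFSETS | FLAG_PAYLOADS));
    }
}
```

Passing 0 signals that neither extra is needed. This is the case where a codec implementation may be able to return a leaner enum.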
  • Optimizations (5)
  • LUCENE-4171 : Performance improvements to Packed64.
    (Toke Eskildsen via Adrien Grand)
  • LUCENE-4184 : Performance improvements to the aligned packed bits impl.
    (Toke Eskildsen, Adrien Grand)
  • LUCENE-4235 : Remove enforcing of Filter rewrite for NRQ queries.
    (Uwe Schindler)
  • LUCENE-4279 : Regenerated snowball Stemmers from snowball r554, making them substantially more lightweight. Behavior is unchanged.
    (Robert Muir)
  • LUCENE-4291 : Reduced internal buffer size for Jflex-based tokenizers such as StandardTokenizer from 32kb to 8kb.
    (Raintung Li, Steven Rowe, Robert Muir)
  • Bug Fixes (13)
  • LUCENE-4109 : BooleanQueries were not parsed correctly by the flexible query parser.
    (Karsten Rauch via Robert Muir)
  • LUCENE-4176 : Fix AnalyzingQueryParser to analyze range endpoints as bytes, so that it works correctly with Analyzers that produce binary non-UTF-8 terms such as CollationAnalyzer.
    (Nattapong Sirilappanich via Robert Muir)
  • LUCENE-4209 : Fix FSTCompletionLookup to close its sorter, so that it won't leave temp files behind in /tmp. Fix SortedTermFreqIteratorWrapper to not leave temp files behind in /tmp on Windows. Fix Sort to not leave temp files behind when /tmp is a separate volume.
    (Uwe Schindler, Robert Muir)
  • LUCENE-4221 : Fix overeager CheckIndex validation for term vector offsets.
    (Robert Muir)
  • LUCENE-4222 : TieredMergePolicy.getFloorSegmentMB was returning the size in bytes not MB
    (Chris Fuller via Mike McCandless)
  • LUCENE-3505 : Fix bug (Lucene 4.0alpha only) where boolean conjunctions were sometimes scored incorrectly. Conjunctions of only termqueries where at least one term omitted term frequencies (IndexOptions.DOCS_ONLY) would be scored as if all terms omitted term frequencies.
    (Robert Muir)
  • LUCENE-2686 , LUCENE-3505 : Fixed BooleanQuery scorers to return correct freq(). Added support for scorer navigation API (Scorer.getChildren) to all queries. Made Scorer.freq() abstract.
    (Koji Sekiguchi, Mike McCandless, Robert Muir)
  • LUCENE-4234 : Exception when FacetsCollector is used with ScoreFacetRequest, and the number of matching documents is too large.
    (Gilad Barkai via Shai Erera)
  • LUCENE-4245 : Make IndexWriter#close() and MergeScheduler#close() non-interruptible.
    (Mark Miller, Uwe Schindler)
  • LUCENE-4190 : restrict allowed filenames that a codec may create to the patterns recognized by IndexFileNames. This also fixes IndexWriter to only delete files matching this pattern from an index directory, to reduce risk when the wrong index path is accidentally passed to IndexWriter
    (Robert Muir, Mike McCandless)
  • LUCENE-4277 : Fix IndexWriter deadlock during rollback if flushable DWPT instances are already checked out and queued up but not yet flushed.
    (Simon Willnauer)
  • LUCENE-4282 : Automaton FuzzyQuery didn't always deliver all results.
    (Johannes Christen, Uwe Schindler, Robert Muir)
  • LUCENE-4289 : Fix minor idf inconsistencies/inefficiencies in highlighter.
    (Robert Muir)
  • Changes in Runtime Behavior (2)
  • LUCENE-4109 : Enable position increments in the flexible queryparser by default.
    (Karsten Rauch via Robert Muir)
  • LUCENE-3616 : Field throws exception if you try to set a boost on an unindexed field or one that omits norms.
    (Robert Muir)
  • Build (7)
  • LUCENE-4094 : Support overriding file.encoding on forked test JVMs (force via -Drandomized.file.encoding=XXX).
    (Dawid Weiss)
  • LUCENE-4189 : Test output should include timestamps (start/end for each test/suite). Added -Dtests.timestamps=[off by default].
    (Dawid Weiss)
  • LUCENE-4110 : Report long periods of forked JVM inactivity (hung tests/suites). Added -Dtests.heartbeat=[seconds] with the default of 60 seconds.
    (Dawid Weiss)
  • LUCENE-4160 : Added a property to quit the tests after a given number of failures has occurred. This is useful in combination with -Dtests.iters=N (you can start N iterations and wait for M failures, in particular M = 1). -Dtests.maxfailures=M. Alternatively, specify -Dtests.failfast=true to skip all tests after the first failure.
    (Dawid Weiss)
  • LUCENE-4115 : JAR resolution/ cleanup should be done automatically for ant clean/ eclipse/ resolve
    (Dawid Weiss)
  • LUCENE-4199 , LUCENE-4202 , LUCENE-4206 : Add a new target "check-forbidden-apis" that parses all generated .class files for use of APIs that rely on the default charset, default locale, or default timezone, and fails the build if violations are found. This ensures that Lucene/Solr does not depend on local configuration settings.
    (Uwe Schindler, Robert Muir, Dawid Weiss)
  • LUCENE-4217 : Add the possibility to run tests with Atlassian Clover loaded from IVY. A development License solely for Apache code was added in the tools/ folder, but is not included in releases.
    (Uwe Schindler)
  • Documentation (1)
  • LUCENE-4195 : Added package documentation and examples for org.apache.lucene.codecs
    (Alan Woodward via Robert Muir)
  • Release 4.0.0-ALPHA [2012-07-03]

  • More information about this release, including any errata related to the release notes, upgrade instructions, or other changes may be found online at: https://wiki.apache.org/lucene-java/Lucene4.0
  • For "contrib" changes prior to 4.0, please see: http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_6_0/lucene/contrib/CHANGES.txt
  • Changes in backwards compatibility policy (42)
  • LUCENE-1458 , LUCENE-2111 , LUCENE-2354 : Changes from flexible indexing: On upgrading to 4.0, if you do not fully reindex your documents, Lucene will emulate the new flex API on top of the old index, incurring some performance cost (up to ~10% slowdown, typically). To prevent this slowdown, use oal.index.IndexUpgrader to upgrade your indexes to the latest file format ( LUCENE-3082 ). Mixed flex/pre-flex indexes are perfectly fine: the two emulation layers (flex API on a pre-flex index, and pre-flex API on a flex index) remap access as required, so after upgrading to 4.0 you can start indexing new documents into an existing index. The postings APIs (TermEnum, TermDocsEnum, TermPositionsEnum) have been removed in favor of the new flexible indexing (flex) APIs (Fields, FieldsEnum, Terms, TermsEnum, DocsEnum, DocsAndPositionsEnum). One big difference is that fields and terms are now enumerated separately: a TermsEnum provides a BytesRef (which wraps a byte[]) per term within a single field, not a Term. Another is that when asking for a Docs/AndPositionsEnum, you now specify the skipDocs explicitly (typically the deleted docs, but in general you can provide any Bits). The term vector APIs (TermFreqVector, TermPositionVector, TermVectorMapper) have been removed in favor of the above flexible indexing APIs, which present a single-document inverted index of the document built from its term vectors. The MultiReader ctor now throws IOException. Directory.copy/Directory.copyTo now copies all files (not just index files), since what is and isn't an index file now depends on the codecs used. UnicodeUtil now uses BytesRef for UTF-8 output, and some method signatures have changed to CharSequence. These are internal APIs and subject to change suddenly.
Positional queries (PhraseQuery, *SpanQuery) now throw an exception if you use them on a field that omits positions during indexing (previously they silently returned no results). The API of FieldCache.{Byte,Short,Int,Long,Float,Double}Parser has changed: each parse method now takes a BytesRef instead of a String. If you have an existing Parser, a simple way to fix it is to invoke BytesRef.utf8ToString and pass that String to your existing parser. This will work, but performance would be better if you could change your parser to operate directly on the byte[] in the BytesRef. The internal (experimental) API of NumericUtils changed completely from String to BytesRef. Client code should never use this class, so the change would normally not affect you. If you used some of its methods to inspect terms or to create TermQueries from prefix-encoded terms, change them to use BytesRef. Please note: do not use TermQueries to search for single numeric terms. The recommended way is to create a corresponding NumericRangeQuery with equal, inclusive upper and lower bounds. TermQueries do not score correctly, so the constant-score mode of NRQ is the only correct way to handle single-value queries. NumericTokenStream now works directly on byte[] terms. If you plug a TokenFilter on top of this stream, you will likely get an IllegalArgumentException, because NTS does not support TermAttribute/CharTermAttribute. If you want to further filter or attach payloads to NTS, use the new NumericTermAttribute.
    (Mike McCandless, Robert Muir, Uwe Schindler, Mark Miller, Michael Busch)
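The FieldCache parser change above offers two migration paths. You can decode the BytesRef with utf8ToString() and reuse an existing String parser, or you can parse the bytes directly. The following sketch shows both with plain JDK types, assuming the terms are plain decimal text. Utf8Slice is a hypothetical stand-in for BytesRef (a byte[] plus offset and length), and the parser names are illustrative, not Lucene APIs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for BytesRef: a UTF-8 slice of a shared byte[].
final class Utf8Slice {
    final byte[] bytes;
    final int offset;
    final int length;

    Utf8Slice(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    String utf8ToString() {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}

// Hypothetical pre-4.0-style parser that only understands Strings.
interface LegacyIntParser {
    int parseInt(String value);
}

final class IntParsers {
    /** Path 1: decode to a String and delegate (simple, but allocates per term). */
    static int parseViaString(Utf8Slice term, LegacyIntParser legacy) {
        return legacy.parseInt(term.utf8ToString());
    }

    /** Path 2: parse ASCII digits straight from the bytes (no String allocation). */
    static int parseDirect(Utf8Slice term) {
        int i = term.offset, end = term.offset + term.length;
        boolean negative = i < end && term.bytes[i] == '-';
        if (negative) i++;
        if (i == end) throw new NumberFormatException("no digits");
        int result = 0;
        for (; i < end; i++) {
            int digit = term.bytes[i] - '0';
            if (digit < 0 || digit > 9) throw new NumberFormatException("not a digit at " + i);
            result = result * 10 + digit; // overflow is not checked in this sketch
        }
        return negative ? -result : result;
    }

    public static void main(String[] args) {
        byte[] shared = "xx42yy-17".getBytes(StandardCharsets.UTF_8);
        Utf8Slice a = new Utf8Slice(shared, 2, 2); // "42"
        Utf8Slice b = new Utf8Slice(shared, 6, 3); // "-17"
        System.out.println(parseViaString(a, Integer::parseInt) + " " + parseDirect(b));
    }
}
```

Here main prints `42 -17`. The slices share one array, just as BytesRef instances often point into a larger shared buffer, which is why the offset must be honoured.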
  • LUCENE-2858 , LUCENE-3733 : IndexReader was refactored into abstract AtomicReader, CompositeReader, and DirectoryReader. To open Directory-based indexes use DirectoryReader.open(); the corresponding method in IndexReader is now deprecated for easier migration. Only DirectoryReader supports commits, versions, and reopening with openIfChanged(). Terms, postings, docvalues, and norms can from now on only be retrieved using AtomicReader; DirectoryReader and MultiReader extend CompositeReader, only offering stored fields and access to the sub-readers (which may be composite or atomic). SlowCompositeReaderWrapper ( LUCENE-2597 ) can be used to emulate atomic readers on top of composites. Please review MIGRATE.txt for information on how to migrate old code.
    (Uwe Schindler, Robert Muir, Mike McCandless)
  • LUCENE-2265 : FuzzyQuery and WildcardQuery now operate on Unicode code points, not UTF-16 code units. For example, a wildcard "?" now matches any single Unicode character, including supplementary characters. Furthermore, the rest of the automaton package and RegexpQuery use a true Unicode code point representation.
    (Robert Muir, Mike McCandless)
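The following illustrates the difference LUCENE-2265 describes. A supplementary character such as U+1D11E (musical symbol G clef) is one code point but two UTF-16 code units in a Java String. A code-unit matcher therefore needs "??" where a code-point matcher needs a single "?". The recursive matcher below is a hypothetical sketch supporting only '?' and '*'. Lucene itself compiles wildcards into automata.

```java
// Illustrative wildcard matching: '?' = exactly one unit, '*' = any run of units.
// The "unit" is either a Unicode code point or a UTF-16 code unit.
final class CodePointWildcard {
    static boolean matches(String pattern, String text) {          // code points
        return match(pattern.codePoints().toArray(), 0, text.codePoints().toArray(), 0);
    }

    static boolean matchesCodeUnits(String pattern, String text) { // UTF-16 code units
        return match(pattern.chars().toArray(), 0, text.chars().toArray(), 0);
    }

    private static boolean match(int[] p, int pi, int[] t, int ti) {
        if (pi == p.length) return ti == t.length;
        if (p[pi] == '*') {
            for (int k = ti; k <= t.length; k++) {  // let '*' consume t[ti..k)
                if (match(p, pi + 1, t, k)) return true;
            }
            return false;
        }
        if (ti == t.length) return false;
        if (p[pi] == '?' || p[pi] == t[ti]) return match(p, pi + 1, t, ti + 1);
        return false;
    }

    public static void main(String[] args) {
        String text = "a\uD834\uDD1Ec"; // "a", U+1D11E, "c": 3 code points, 4 chars
        System.out.println(matches("a?c", text));           // true
        System.out.println(matchesCodeUnits("a?c", text));  // false
        System.out.println(matchesCodeUnits("a??c", text)); // true
    }
}
```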
  • LUCENE-2380 : The String-based FieldCache methods (getStrings, getStringIndex) have been replaced with BytesRef-based equivalents (getTerms, getTermsIndex). Also, the sort values (returned in FieldDoc.fields) when sorting by SortField.STRING or SortField.STRING_VAL are now BytesRef instances. See MIGRATE.txt for more details.
    (yonik, Mike McCandless)
  • LUCENE-2480 : Though not a change in backwards compatibility policy, pre-3.0 indexes are no longer supported. You should upgrade to 3.x first, then run optimize(), or reindex.
    (Shai Erera, Earwin Burrfoot)
  • LUCENE-2484 : Removed deprecated TermAttribute. Use CharTermAttribute and TermToBytesRefAttribute instead.
    (Uwe Schindler)
  • LUCENE-2600 : Remove IndexReader.isDeleted in favor of AtomicReader.getDeletedDocs().
    (Mike McCandless)
  • LUCENE-2667 : FuzzyQuery's defaults have changed for more performant behavior: the minimum similarity is 2 edit distances from the word, and the priority queue size is 50. To support this, FuzzyQuery now allows specifying unscaled edit distances (foobar~2). If your application depends upon the old defaults of 0.5 (scaled) minimum similarity and Integer.MAX_VALUE priority queue size, you can use FuzzyQuery(Term, float, int, int) to specify those explicitly.
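For LUCENE-2667, "foobar~2" means a term matches if it is at most two edits away from foobar. The classic dynamic-programming Levenshtein distance below counts insertions, deletions and substitutions on UTF-16 chars and makes that threshold concrete. It is an illustration only: Lucene's FuzzyQuery is implemented with Levenshtein automata and may treat transpositions differently.

```java
// Classic DP Levenshtein distance using two rolling rows (illustrative only).
final class EditDistance {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from "" to b[0..j)
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;                                     // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1,   // insertion
                                           prev[j] + 1),     // deletion
                                  prev[j - 1] + cost);       // substitution or match
            }
            int[] tmp = prev; prev = cur; cur = tmp;         // reuse arrays
        }
        return prev[b.length()];
    }

    /** True if candidate is within maxEdits of term, as in "term~maxEdits". */
    static boolean withinEdits(String term, String candidate, int maxEdits) {
        return levenshtein(term, candidate) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("foobar", "fubar"));     // 2
        System.out.println(withinEdits("foobar", "fubar", 2));  // true
    }
}
```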
  • LUCENE-2674 : MultiTermQuery.TermCollector.collect now accepts the TermsEnum as well.
    (Robert Muir, Mike McCandless)
  • LUCENE-588 : WildcardQuery and QueryParser now allow escaping with the '\' character. Previously this was impossible (you could not escape * or ?, for example). If your code somehow depends on the old behavior, you will need to change it (e.g. using "\\" to escape '\' itself).
    (Sunil Kamath, Terry Yang via Robert Muir)
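With LUCENE-588, a literal '*', '?' or '\' in a wildcard pattern must be escaped with a backslash. A hypothetical helper that escapes user-supplied literal text before a wildcard is appended might look like the sketch below. The name escapeWildcard is illustrative and not a Lucene API. For full query-parser syntax, QueryParser.escape(String) escapes a wider set of special characters.

```java
// Hypothetical helper: make '*', '?' and '\' literal in a WildcardQuery pattern.
final class WildcardEscaper {
    static String escapeWildcard(String literal) {
        StringBuilder sb = new StringBuilder(literal.length() + 8);
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c == '*' || c == '?' || c == '\\') {
                sb.append('\\'); // prefix metacharacters with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String userInput = "what?";                       // user means a literal '?'
        String pattern = escapeWildcard(userInput) + "*"; // prefix search on the literal text
        System.out.println(pattern);                      // prints what\?*
    }
}
```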
  • LUCENE-2837 : Collapsed Searcher, Searchable into IndexSearcher; removed contrib/remote and MultiSearcher (Mike McCandless); absorbed ParallelMultiSearcher into IndexSearcher as an optional ExecutorService passed to its ctor.
    (Mike McCandless)
  • LUCENE-2908 , LUCENE-4037 : Removed serialization code from lucene classes. It is recommended that you serialize user search needs at a higher level in your application.
    (Robert Muir, Benson Margulies)
  • LUCENE-2831 : Changed Weight#scorer, Weight#explain & Filter#getDocIdSet to operate on an AtomicReaderContext instead of directly on IndexReader to enable searches to be aware of IndexSearcher's context.
    (Simon Willnauer)
  • LUCENE-2839 : Scorer#score(Collector,int,int) is now public because it is called from other classes and part of public API.
    (Uwe Schindler)
  • LUCENE-2865 : Weight#scorer(AtomicReaderContext, boolean, boolean) now accepts a ScorerContext struct instead of booleans.
    (Simon Willnauer)
  • LUCENE-2882 : Cut over SpanQuery#getSpans to AtomicReaderContext to enforce per segment semantics on SpanQuery & Spans.
    (Simon Willnauer)
  • LUCENE-2236 : Similarity can now be configured on a per-field basis. See the migration notes in MIGRATE.txt for more details.
    (Robert Muir, Doron Cohen)
  • LUCENE-2315 : AttributeSource's methods for accessing attributes are now final, else it's easy to corrupt the internal states.
    (Uwe Schindler)
  • LUCENE-2814 : The IndexWriter.flush method no longer takes "boolean flushDocStores" argument, as we now always flush doc stores (index files holding stored fields and term vectors) while flushing a segment.
    (Mike McCandless)
  • LUCENE-2548 : Field names (e.g. in Term, FieldInfo) are no longer interned.
    (Mike McCandless)
  • LUCENE-2883 : The contents of o.a.l.search.function have been consolidated into the queries module and can be found at o.a.l.queries.function. See MIGRATE.txt for more information.
    (Chris Male)
  • LUCENE-2392 , LUCENE-3299 : Decoupled vector space scoring from Query/Weight/Scorer. If you extended Similarity directly before, you should extend TFIDFSimilarity instead. Similarity is now a lower-level API to implement other scoring algorithms. See MIGRATE.txt for more details.
    (David Nemeskey, Simon Willnauer, Mike McCandless, Robert Muir)
  • LUCENE-3330 : The expert visitor API in Scorer has been simplified and extended to support arbitrary relationships. To navigate to a scorer's children, call Scorer.getChildren().
    (Robert Muir)
  • LUCENE-2308 : Field is now instantiated with an instance of IndexableFieldType, of which there is a core implementation FieldType. Most properties describing a Field have been moved to IndexableFieldType. See MIGRATE.txt for more details.
    (Nikola Tankovic, Mike McCandless, Chris Male)
  • LUCENE-3396 : ReusableAnalyzerBase.TokenStreamComponents.reset(Reader) now returns void instead of boolean. If a Component cannot be reset, it should throw an Exception.
    (Chris Male)
  • LUCENE-3396 : ReusableAnalyzerBase has been renamed to Analyzer. All Analyzer implementations must now use Analyzer.TokenStreamComponents, rather than overriding .tokenStream() and .reusableTokenStream() (which are now final).
    (Chris Male)
  • LUCENE-3346 : Analyzer.reusableTokenStream() has been renamed to tokenStream() with the old tokenStream() method removed. Consequently it is now mandatory for all Analyzers to support reusability.
    (Chris Male)
  • LUCENE-3473 : AtomicReader.getUniqueTermCount() no longer throws UOE when it cannot be easily determined. Instead, it returns -1 to be consistent with this behavior across other index statistics.
    (Robert Muir)
  • LUCENE-1536 : The abstract FilteredDocIdSet.match() method is no longer allowed to throw IOException. This change was required to make it conform to the Bits interface. This method should never do I/O for performance reasons.
    (Mike McCandless, Uwe Schindler, Robert Muir, Chris Male, Yonik Seeley, Jason Rutherglen, Paul Elschot)
  • LUCENE-3559 : The methods "docFreq" and "maxDoc" on IndexSearcher were removed, as these are no longer used by the scoring system. See MIGRATE.txt for more details.
    (Robert Muir)
  • LUCENE-3533 : Removed SpanFilters, as they created large lists of objects and did not scale.
    (Robert Muir)
  • LUCENE-3606 : IndexReader and subclasses were made read-only. It is no longer possible to delete or undelete documents using IndexReader; you have to use IndexWriter now. As deleting by internal Lucene docID is no longer possible, this requires adding a unique identifier field to your index. Deleting by or relying upon Lucene docIDs is not recommended anyway, because they can change. Consequently commit() was removed and DirectoryReader.open() and openIfChanged() no longer take readOnly booleans or IndexDeletionPolicy instances. Furthermore, IndexReader.setNorm() was removed. If you need customized norm values, the recommended way to do this is by modifying Similarity to use an external byte[] or one of the new DocValues fields ( LUCENE-3108 ). Alternatively, to dynamically change norms (boost *and* length norm) at query time, wrap your AtomicReader using FilterAtomicReader, overriding FilterAtomicReader.norms(). To persist the changes on disk, copy the filtered reader to a new index using IndexWriter.addIndexes().
    (Uwe Schindler, Robert Muir)
  • LUCENE-3640 : Removed IndexSearcher.close(): because IndexSearcher no longer takes a Directory and no longer "manages" IndexReaders, the method was a no-op.
    (Robert Muir)
  • LUCENE-3684 : Add offsets into DocsAndPositionsEnum, and a new FieldInfo.IndexOptions value: DOCS_AND_POSITIONS_AND_OFFSETS.
    (Robert Muir, Mike McCandless)
  • LUCENE-2858 , LUCENE-3770 : FilterIndexReader was renamed to FilterAtomicReader and now extends AtomicReader. If you want to filter composite readers like DirectoryReader or MultiReader, filter their atomic leaves and build a new CompositeReader (e.g. MultiReader) around them.
    (Uwe Schindler, Robert Muir)
  • LUCENE-3736 : ParallelReader was split into ParallelAtomicReader and ParallelCompositeReader. Lucene 3.x's ParallelReader is now ParallelAtomicReader; but the new composite variant has improved performance as it works on the atomic subreaders. It requires that all parallel composite readers have the same subreader structure. If you cannot provide this, you can use SlowCompositeReaderWrapper to make all parallel readers atomic and use ParallelAtomicReader.
    (Uwe Schindler, Mike McCandless, Robert Muir)
  • LUCENE-2000 : clone() now returns covariant types where possible.
    (ryan)
  • LUCENE-3970 : Rename Fields.getUniqueFieldCount -> .size() and Terms.getUniqueTermCount -> .size().
    (Iulius Curt via Mike McCandless)
  • LUCENE-3514 : IndexSearcher.setDefaultFieldSortScoring was removed and replaced with per-search control via new expert search methods that take two booleans indicating whether hit scores and max score should be computed.
    (Mike McCandless)
  • LUCENE-4055 : You can't put foreign files into the index dir anymore.
  • LUCENE-3866 : CompositeReader.getSequentialSubReaders() now returns unmodifiable List<? extends IndexReader>. ReaderUtil.Gather was removed, as IndexReaderContext.leaves() is now the preferred way to access sub-readers.
    (Uwe Schindler)
  • LUCENE-4155 : oal.util.ReaderUtil, TwoPhaseCommit, TwoPhaseCommitTool classes were moved to oal.index package. oal.util.CodecUtil class was moved to oal.codecs package. oal.util.DummyConcurrentLock was removed (no longer used in Lucene 4.0).
    (Uwe Schindler)
  • Changes in Runtime Behavior (11)
  • LUCENE-2846 : omitNorms now behaves like omitTermFrequencyAndPositions: if you set omitNorms(true) for field "a" for 1000 documents but then add a document with omitNorms(false) for field "a", all documents for field "a" will have no norms. Previously, Lucene would fill the first 1000 documents with "fake norms" from Similarity.getDefault().
    (Robert Muir, Mike McCandless)
  • LUCENE-2846 : When some documents contain field "a", and others do not, the documents that don't have the field get a norm byte value of 0. Previously, Lucene would populate "fake norms" with Similarity.getDefault() for these documents.
    (Robert Muir, Mike McCandless)
  • LUCENE-2720 : IndexWriter throws IndexFormatTooOldException on open, rather than later when e.g. a merge starts.
    (Shai Erera, Mike McCandless, Uwe Schindler)
  • LUCENE-2881 : FieldInfos is now tracked per segment. Before, it was tracked per IndexWriter session, which resulted in FieldInfos that had the FieldInfo properties from all previous segments combined. Field numbers are now tracked globally across IndexWriter sessions and persisted into a _X.fnx file on successful commit. The corresponding file format changes are backwards-compatible.
    (Michael Busch, Simon Willnauer)
  • LUCENE-2956 , LUCENE-2573 , LUCENE-2324 , LUCENE-2555 : Changes from DocumentsWriterPerThread: IndexWriter now uses a DocumentsWriter per thread when indexing documents. Each DocumentsWriterPerThread indexes documents in its own private segment, and the in-memory segments are no longer merged on flush. Instead, each segment is separately flushed to disk and subsequently merged with normal segment merging. DocumentsWriterPerThread (DWPT) is now flushed concurrently based on a FlushPolicy. When a DWPT is flushed, a fresh DWPT is swapped in so that indexing may continue concurrently with flushing. The selected DWPT flushes all its RAM-resident documents to disk. Note: segment flushes don't flush all RAM-resident documents, but only the documents private to the DWPT selected for flushing. Flushing is now controlled by the FlushPolicy, which is called for every add, update or delete on IndexWriter. By default, DWPTs are flushed either on maxBufferedDocs per DWPT or on the global active used memory. Once the active memory exceeds ramBufferSizeMB, only the largest DWPT is selected for flushing, and the memory used by this DWPT is subtracted from the active memory and added to a flushing memory pool; this can lead to temporarily higher memory usage due to ongoing indexing. IndexWriter can now utilize a ramBufferSize > 2048 MB. Each DWPT can address up to 2048 MB of memory, so the ramBufferSize is now bounded by the maximum number of DWPTs available in the DocumentsWriterPerThreadPool in use. IndexWriter's net memory consumption can grow far beyond the 2048 MB limit if the application can use all available DWPTs. To prevent a DWPT from exhausting its address space, IndexWriter will forcefully flush a DWPT if its hard memory limit is exceeded. The RAMPerThreadHardLimitMB can be controlled via IndexWriterConfig and defaults to 1945 MB. Since IndexWriter flushes DWPTs concurrently, not all memory is released immediately.
    Applications should still use a ramBufferSize significantly lower than the JVM's available heap memory, since under high load multiple flushing DWPTs can consume substantial transient memory when I/O performance is slow relative to the indexing rate. IndexWriter#commit no longer blocks concurrent indexing while flushing all 'currently' RAM-resident documents to disk. However, flushes that occur while a full flush is running are queued and will happen after all DWPTs involved in the full flush are done flushing. Applications that use multiple threads during indexing and trigger a full flush (e.g. call commit() or open a new NRT reader) can use significantly more transient memory. IndexWriter#addDocument and IndexWriter#updateDocument can block indexing threads if the number of active plus flushing DWPTs exceeds a safety limit. By default, this happens if that number exceeds twice the maximum number of available thread states (DWPTPool). This safety limit prevents applications from exhausting their available memory if flushing can't keep up with concurrently indexing threads. IndexWriter only applies and flushes deletes if the maxBufferedDelTerms limit is reached during indexing; no segment flushes will be triggered due to this setting. IndexWriter#flush(boolean, boolean) no longer synchronizes on IndexWriter. A dedicated flushLock has been introduced to prevent multiple full flushes from happening concurrently. DocumentsWriter no longer writes shared doc stores.
    (Mike McCandless, Michael Busch, Simon Willnauer)
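A toy model of the flush-by-RAM rule above may help. It is a minimal sketch under simplifying assumptions (one byte counter per simulated DWPT, a single RAM budget, no maxBufferedDocs, hard per-thread limit, or stall control), and ToyFlushControl with its methods is hypothetical, not Lucene's FlushPolicy API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of flush-by-RAM: when the active memory of all per-thread buffers exceeds
 * the budget, only the single largest buffer is selected, and its bytes move from the
 * active pool to the flushing pool. Hypothetical; not Lucene's FlushPolicy.
 */
class ToyFlushControl {
    final long ramBudgetBytes;                            // stands in for ramBufferSizeMB
    final List<Long> activeBuffers = new ArrayList<>();   // bytes held per simulated DWPT
    long flushingBytes = 0;                               // memory held by buffers still flushing

    ToyFlushControl(long ramBudgetBytes, int threads) {
        this.ramBudgetBytes = ramBudgetBytes;
        for (int i = 0; i < threads; i++) activeBuffers.add(0L);
    }

    long activeBytes() {
        long sum = 0;
        for (long b : activeBuffers) sum += b;
        return sum;
    }

    /** Records bytes added by one thread; returns the buffer selected for flushing, or -1. */
    int onDocumentAdded(int thread, long bytes) {
        activeBuffers.set(thread, activeBuffers.get(thread) + bytes);
        if (activeBytes() <= ramBudgetBytes) return -1;
        int largest = 0;
        for (int i = 1; i < activeBuffers.size(); i++) {
            if (activeBuffers.get(i) > activeBuffers.get(largest)) largest = i;
        }
        flushingBytes += activeBuffers.get(largest); // moved to the flushing pool
        activeBuffers.set(largest, 0L);              // a fresh buffer is swapped in
        return largest;
    }

    /** Called when a flush finishes and its memory is actually released. */
    void onFlushCompleted(long bytes) {
        flushingBytes -= bytes;
    }
}
```

With a 100-byte budget and three threads holding 40, 50 and 20 bytes, the third add pushes active memory to 110, so only the 50-byte buffer is selected: active memory drops to 60 while 50 bytes stay in the flushing pool until onFlushCompleted is called, which is the temporary overshoot described above.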
  • LUCENE-3309 : Stored fields no longer record whether they were tokenized or not. In general you should not rely on stored fields to record any "metadata" from indexing (tokenized, omitNorms, IndexOptions, boost, etc.)
    (Mike McCandless)
  • LUCENE-3309 : Fast vector highlighter now inserts the MultiValuedSeparator for NOT_ANALYZED fields (in addition to ANALYZED fields). To ensure your offsets are correct you should provide an analyzer that returns 1 from the offsetGap method.
    (Mike McCandless)
  • LUCENE-2621 : Removed contrib/instantiated.
    (Robert Muir)
  • LUCENE-1768 : StandardQueryTreeBuilder no longer uses RangeQueryNodeBuilder for RangeQueryNodes, since these two classes were removed; TermRangeQueryNodeProcessor now creates TermRangeQueryNode instead of RangeQueryNode; the same applies to numeric nodes.
    (Vinicius Barros via Uwe Schindler)
  • LUCENE-3455 : QueryParserBase.newFieldQuery() will throw a ParseException if any of the calls to the Analyzer throw an IOException. QueryParserBase.analyzeRangePart() will throw a RuntimeException if an IOException is thrown by the Analyzer.
  • LUCENE-4127 : IndexWriter will now throw IllegalArgumentException if the first token of an indexed field has 0 positionIncrement (previously it silently corrected it to 1, possibly masking bugs). OffsetAttributeImpl will throw IllegalArgumentException if endOffset is less than startOffset, or if offsets are negative.
    (Robert Muir, Mike McCandless)
  • API Changes (35)
  • LUCENE-2302 , LUCENE-1458 , LUCENE-2111 , LUCENE-2514 : Terms are no longer required to be character based. Lucene views a term as an arbitrary byte[]: during analysis, character-based terms are converted to UTF-8 byte[], but analyzers are free to directly create terms as byte[] (NumericField does this, for example). The term data is buffered as byte[] during indexing, written as byte[] into the terms dictionary, and iterated as byte[] (wrapped in a BytesRef) by IndexReader for searching.
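Viewing terms as UTF-8 bytes also affects ordering: comparing UTF-8 bytes as unsigned values yields Unicode code point order, which can differ from Java's String.compareTo (UTF-16 code unit order) for supplementary characters. A minimal sketch with a hypothetical Utf8TermOrder helper, not Lucene's BytesRef:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not Lucene's BytesRef): terms as UTF-8 bytes, compared unsigned. */
class Utf8TermOrder {
    static byte[] toUtf8(String term) {
        return term.getBytes(StandardCharsets.UTF_8);
    }

    /** Lexicographic comparison treating each byte as an unsigned value (0..255). */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) return diff;
        }
        return a.length - b.length; // a shorter prefix sorts first
    }
}
```

For example, U+FFFD sorts after U+1F600 under String.compareTo (0xFFFD versus the surrogate 0xD83D), but before it under unsigned UTF-8 order (0xEF versus 0xF0 as the first byte). For ASCII terms, the UTF-8 form takes one byte per character instead of two for a Java char.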
  • LUCENE-1458 , LUCENE-2111 : AtomicReader now directly exposes its deleted docs (getDeletedDocs), providing a new Bits interface to directly query by doc ID.
  • LUCENE-2691 : IndexWriter.getReader() has been made package local and is now exposed via open and reopen methods on DirectoryReader. The semantics of the call is the same as it was prior to the API change.
    (Grant Ingersoll, Mike McCandless)
  • LUCENE-2566 : QueryParser: Unary operators +,-,! will not be treated as operators if they are followed by whitespace.
    (yonik)
  • LUCENE-2831 : Weight#scorer, Weight#explain, Filter#getDocIdSet, Collector#setNextReader & FieldComparator#setNextReader now expect an AtomicReaderContext instead of an IndexReader.
    (Simon Willnauer)
  • LUCENE-2892 : Add QueryParser.newFieldQuery (called by getFieldQuery by default) which takes Analyzer as a parameter, for easier customization by subclasses.
    (Robert Muir)
  • LUCENE-2953 : In addition to changes in 3.x, PriorityQueue#initialize(int) function was moved into the ctor.
    (Uwe Schindler, Yonik Seeley)
  • LUCENE-3219 : SortField type properties have been moved to an enum SortField.Type. To be consistent, CachedArrayCreator.getSortTypeID() has been changed to CachedArrayCreator.getSortType().
    (Chris Male)
  • LUCENE-3225 : Add TermsEnum.seekExact for faster seeking when you don't need the ceiling term; renamed existing seek methods to either seekCeil or seekExact; changed seekExact(ord) to return no value. Fixed MemoryCodec and SimpleTextCodec to optimize the seekExact case, and fixed places in Lucene to use seekExact when possible.
    (Mike McCandless)
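The difference between the two seek flavors can be sketched over a sorted array. ToyTermsEnum is hypothetical; its SeekStatus values follow the three outcomes of Lucene's TermsEnum.SeekStatus (FOUND, NOT_FOUND, END). Both operations cost one binary search here; in a real terms dictionary, seekExact can be cheaper because it does not have to position on the ceiling term when the target is absent.

```java
import java.util.Arrays;

/** Toy sorted term dictionary contrasting seekCeil and seekExact (hypothetical, not Lucene's TermsEnum). */
class ToyTermsEnum {
    enum SeekStatus { FOUND, NOT_FOUND, END }

    private final String[] sortedTerms;
    private int pos = -1; // current position; -1 when unpositioned

    ToyTermsEnum(String... terms) {
        this.sortedTerms = terms.clone();
        Arrays.sort(this.sortedTerms);
    }

    /** Positions on the target or, if it is absent, on the smallest term greater than it. */
    SeekStatus seekCeil(String target) {
        int idx = Arrays.binarySearch(sortedTerms, target);
        if (idx >= 0) {
            pos = idx;
            return SeekStatus.FOUND;
        }
        int insertion = -idx - 1;
        if (insertion == sortedTerms.length) {
            pos = -1; // no term is >= target
            return SeekStatus.END;
        }
        pos = insertion;
        return SeekStatus.NOT_FOUND;
    }

    /** Only answers whether the exact term exists; no ceiling term is reported. */
    boolean seekExact(String target) {
        int idx = Arrays.binarySearch(sortedTerms, target);
        pos = idx >= 0 ? idx : -1;
        return idx >= 0;
    }

    String term() {
        return pos >= 0 ? sortedTerms[pos] : null;
    }
}
```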
  • LUCENE-1536 : Filter.getDocIdSet() now takes an acceptDocs Bits interface (like Scorer) limiting the documents that can appear in the returned DocIdSet. Filters are now required to respect these acceptDocs, otherwise deleted documents may get returned by searches. Most filters will pass these Bits down to DocsEnum, but those that work on FieldCache, for example, may need to use BitsFilteredDocIdSet.wrap() to exclude non-accepted documents.
    (Mike McCandless, Uwe Schindler, Robert Muir, Chris Male, Yonik Seeley, Jason Rutherglen, Paul Elschot)
  • LUCENE-3722 : Similarity methods and collection/term statistics now take long instead of int (to enable distributed scoring of > 2B docs).
    (Yonik Seeley, Andrzej Bialecki, Robert Muir)
  • LUCENE-3761 : Generalize SearcherManager into an abstract ReferenceManager. SearcherManager remains a concrete class, but due to the refactoring, the method maybeReopen has been deprecated in favor of maybeRefresh().
    (Shai Erera, Mike McCandless, Simon Willnauer)
  • LUCENE-3859 : AtomicReader.hasNorms(field) is deprecated; instead, inspect the FieldInfo yourself to see if norms are present, which also lets you get their type.
    (Robert Muir)
  • LUCENE-2606 : Changed RegexCapabilities interface to fix thread safety, serialization, and performance problems. If you have written a custom RegexCapabilities it will need to be updated to the new API.
    (Robert Muir, Uwe Schindler)
  • LUCENE-2638 : Made MakeHighFreqTerms.TermStats public to make it more useful for API use.
    (Andrzej Bialecki)
  • LUCENE-2912 : The field-specific hashmaps in SweetSpotSimilarity were removed. Instead, use PerFieldSimilarityWrapper to return different SweetSpotSimilarity instances for different fields; this way all parameters (such as TF factors) can be customized on a per-field basis.
    (Robert Muir)
  • LUCENE-3308 : DuplicateFilter keepMode and processingMode have been converted to enums DuplicateFilter.KeepMode and DuplicateFilter.ProcessingMode respectively.
  • LUCENE-3483 : Move Function grouping collectors from Solr to grouping module.
    (Martijn van Groningen)
  • LUCENE-3606 : FieldNormModifier was deprecated, because IndexReader's setNorm() was deprecated. Furthermore, this class is broken, as it does not take position overlaps into account while recalculating norms.
    (Uwe Schindler, Robert Muir)
  • LUCENE-3936 : Renamed StringIndexDocValues to DocTermsIndexDocValues.
    (Martijn van Groningen)
  • LUCENE-1768 : Deprecated Parametric(Range)QueryNode, RangeQueryNode(Builder), ParametricRangeQueryNodeProcessor were removed.
    (Vinicius Barros via Uwe Schindler)
  • LUCENE-3820 : Deprecated constructors accepting pattern matching bounds. The input is buffered and matched in one pass.
    (Dawid Weiss)
  • LUCENE-2413 : Deprecated PatternAnalyzer in common/miscellaneous, in favor of the pattern package (CharFilter, Tokenizer, TokenFilter).
    (Robert Muir)
  • LUCENE-2413 : Removed the AnalyzerUtil in common/miscellaneous.
    (Robert Muir)
  • LUCENE-1370 : Added ShingleFilter option to output unigrams if no shingles can be generated.
    (Chris Harris via Steven Rowe)
  • LUCENE-2514 , LUCENE-2551 : JDK and ICU CollationKeyAnalyzers were changed to use pure byte keys when Version >= 4.0. This cuts sort key size approximately in half.
    (Robert Muir)
  • LUCENE-3400 : Removed DutchAnalyzer.setStemDictionary
    (Chris Male)
  • LUCENE-3431 : Removed QueryAutoStopWordAnalyzer.addStopWords* deprecated methods since they prevented reuse. Stopwords are now generated at instantiation through the Analyzer's constructors.
    (Chris Male)
  • LUCENE-3434 : Removed ShingleAnalyzerWrapper.set* and PerFieldAnalyzerWrapper.addAnalyzer since they prevent reuse. Both Analyzers should be configured at instantiation.
    (Chris Male)
  • LUCENE-3765 : Stopset ctors that previously took Set<?> or Map<?,String> now take CharArraySet and CharArrayMap respectively. Previously the behavior was confusing, and sometimes different depending on the type of set, and ultimately a CharArraySet or CharArrayMap was always used anyway.
    (Robert Muir)
  • LUCENE-3830 : Switched to NormalizeCharMap.Builder to create immutable instances of NormalizeCharMap.
    (Dawid Weiss, Mike McCandless)
  • LUCENE-4063 : FrenchLightStemmer no longer deletes repeated digits.
    (Tanguy Moal via Steve Rowe)
  • LUCENE-4122 : Replace Payload with BytesRef.
    (Andrzej Bialecki)
  • LUCENE-4132 : IndexWriter.getConfig() now returns a LiveIndexWriterConfig object which can be used to change the IndexWriter's live settings. IndexWriterConfig is used only for initializing the IndexWriter.
    (Shai Erera)
  • LUCENE-3866 : IndexReaderContext.leaves() is now the preferred way to access atomic sub-readers of any kind of IndexReader (for AtomicReaders it returns itself as only leaf with docBase=0).
    (Uwe Schindler)
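Each leaf's docBase is the number of documents in all preceding leaves, so a top-level docID maps to the pair (leaf, docID - docBase); a single AtomicReader is its own only leaf with docBase 0. A minimal sketch with a hypothetical ToyLeaves class, not Lucene's IndexReaderContext. It scans linearly so that empty leaves, which share a docBase with the following leaf, are skipped.

```java
/** Hypothetical model of leaf readers: docBase of leaf i = sum of maxDoc of leaves 0..i-1. */
class ToyLeaves {
    final int[] maxDocs;
    final int[] docBases;
    final int totalMaxDoc;

    ToyLeaves(int... leafMaxDocs) {
        maxDocs = leafMaxDocs.clone();
        docBases = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            docBases[i] = base;      // documents in all preceding leaves
            base += maxDocs[i];
        }
        totalMaxDoc = base;
    }

    /** Returns the index of the leaf holding a top-level docID, skipping empty leaves. */
    int leafIndex(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= totalMaxDoc) {
            throw new IllegalArgumentException("docID out of range: " + globalDoc);
        }
        for (int i = docBases.length - 1; i >= 0; i--) {
            if (maxDocs[i] > 0 && docBases[i] <= globalDoc) return i;
        }
        throw new AssertionError("unreachable for in-range docIDs");
    }

    /** Converts a top-level docID into the docID local to its leaf. */
    int localDoc(int globalDoc) {
        return globalDoc - docBases[leafIndex(globalDoc)];
    }
}
```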
  • New features (66)
  • LUCENE-2604 : Added RegexpQuery support to QueryParser. Regular expressions are directly supported by the standard queryparser via fieldName:/expression/, or /expression/ against the default field. Users who wish to search for literal "/" characters are advised to backslash-escape or quote those characters as needed.
    (Simon Willnauer, Robert Muir)
  • LUCENE-1606 , LUCENE-2089 : Adds AutomatonQuery, a MultiTermQuery that matches terms against a finite-state machine. Implement WildcardQuery and FuzzyQuery with finite-state methods. Adds RegexpQuery.
    (Robert Muir, Mike McCandless, Uwe Schindler, Mark Miller)
  • LUCENE-3662 : Add support for Levenshtein distance with transpositions to LevenshteinAutomata, FuzzyTermsEnum, and DirectSpellChecker.
    (Jean-Philippe Barrette-LaPierre, Robert Muir)
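For reference, the distance notion can be computed for a single pair of strings with the classic dynamic program. This is a swapped-in technique: LevenshteinAutomata match many terms against a distance bound at once, whereas the sketch below compares two strings. With transpositions enabled it computes the restricted Damerau-Levenshtein (optimal string alignment) distance, counting a swap of two adjacent characters as one edit. It works on Java chars rather than code points for brevity, and that it agrees with the automata on every input is an assumption.

```java
/** Edit distance by dynamic programming; transpositions=true gives the optimal string alignment variant. */
class EditDistance {
    static int distance(String a, String b, boolean transpositions) {
        int n = a.length(), m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete all of a's prefix
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert all of b's prefix
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                // deletion, insertion, substitution (or match)
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
                // adjacent transposition: "ab" -> "ba" counts as one edit
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[n][m];
    }
}
```

For "abcd" versus "acbd", plain Levenshtein needs two substitutions, while the transposition-aware distance is 1, so a fuzzy query with a maximum of one edit would match only in the latter mode.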
  • LUCENE-2321 : Cutover to a more RAM efficient packed-ints based representation for the in-memory terms dict index.
    (Mike McCandless)
  • LUCENE-2126 : Add new classes for data (de)serialization: DataInput and DataOutput. IndexInput and IndexOutput extend these new classes.
    (Michael Busch)
  • LUCENE-1458 , LUCENE-2111 : With flexible indexing it is now possible for an application to create its own postings codec, to alter how fields, terms, docs and positions are encoded into the index. The standard codec is the default codec. IndexWriter accepts a Codec class to obtain codecs for newly written segments.
  • LUCENE-1458 , LUCENE-2111 : Some experimental codecs have been added for flexible indexing, including pulsing codec (inlines low-frequency terms directly into the terms dict, avoiding seeking for some queries), sep codec (stores docs, freqs, positions, skip data and payloads in 5 separate files instead of the 2 used by standard codec), and int block (really a "base" for using block-based compressors like PForDelta for storing postings data).
  • LUCENE-1458 , LUCENE-2111 : The in-memory terms index used by standard codec is more RAM efficient: terms data is stored as block byte arrays and packed integers. Net RAM reduction for indexes that have many unique terms should be substantial, and initial open time for IndexReaders should be faster. These gains only apply for newly written segments after upgrading.
  • LUCENE-1458 , LUCENE-2111 : Terms data are now buffered directly as byte[] during indexing, which uses half the RAM for ASCII terms (and also numeric fields). This can improve indexing throughput for applications that have many unique terms, since it reduces how often a new segment must be flushed given a fixed RAM buffer size.
  • LUCENE-2489 : Added PerFieldCodecWrapper (in oal.index.codecs) which lets you set the Codec per field
    (Mike McCandless)
  • LUCENE-2373 : Extend Codec to use SegmentInfosWriter and SegmentInfosReader to allow customization of SegmentInfos data.
    (Andrzej Bialecki)
  • LUCENE-2504 : FieldComparator.setNextReader now returns a FieldComparator instance. You can "return this", to just reuse the same instance, or you can return a comparator optimized to the new segment.
    (yonik, Mike McCandless)
  • LUCENE-2648 : PackedInts.Iterator now supports advancing by more than a single ordinal.
    (Simon Willnauer)
  • LUCENE-2649 : Objects in the FieldCache can optionally store Bits that mark which docs have real values in the native[]
    (ryan)
  • LUCENE-2664 : Add SimpleText codec, which stores all terms/postings data in a single text file for transparency (at the expense of poor performance).
    (Sahin Buyrukbilen via Mike McCandless)
  • LUCENE-2589 : Add a VariableSizedIntIndexInput, which, when used w/ Sep*, makes it simple to take any variable sized int block coders (like Simple9/16) and use them in a codec.
    (Mike McCandless)
  • LUCENE-2597 : Add oal.index.SlowCompositeReaderWrapper, to wrap a composite reader (eg MultiReader or DirectoryReader), making it pretend it's an atomic reader. This is a convenience class (you can use MultiFields static methods directly, instead) if you need to use the flex APIs directly on a composite reader.
    (Mike McCandless)
  • LUCENE-2690 : MultiTermQuery boolean rewrites per segment.
    (Uwe Schindler, Robert Muir, Mike McCandless, Simon Willnauer)
  • LUCENE-996 : The QueryParser now accepts mixed inclusive and exclusive bounds for range queries. Example: "{3 TO 5]". QueryParser subclasses that overrode getRangeQuery will need to be changed to use the new getRangeQuery method.
    (Andrew Schurman, Mark Miller, yonik)
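A minimal sketch of the mixed-bounds syntax, using a hypothetical RangeSpec class rather than the QueryParser's own grammar: '[' and ']' mark inclusive bounds, '{' and '}' exclusive ones, and each side is read independently, so "{3 TO 5]" excludes 3 and includes 5.

```java
/** Hypothetical mini-parser for range syntax: '[' / ']' are inclusive bounds, '{' / '}' exclusive. */
final class RangeSpec {
    final String lower;
    final String upper;
    final boolean lowerInclusive;
    final boolean upperInclusive;

    private RangeSpec(String lower, String upper, boolean lowerInclusive, boolean upperInclusive) {
        this.lower = lower;
        this.upper = upper;
        this.lowerInclusive = lowerInclusive;
        this.upperInclusive = upperInclusive;
    }

    /** Parses e.g. "{3 TO 5]"; the bracket on each side is interpreted independently. */
    static RangeSpec parse(String text) {
        String s = text.trim();
        if (s.length() < 2) throw new IllegalArgumentException("not a range: " + text);
        char open = s.charAt(0);
        char close = s.charAt(s.length() - 1);
        if ((open != '[' && open != '{') || (close != ']' && close != '}')) {
            throw new IllegalArgumentException("not a range: " + text);
        }
        String[] parts = s.substring(1, s.length() - 1).trim().split("\\s+TO\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'lower TO upper': " + text);
        }
        return new RangeSpec(parts[0], parts[1], open == '[', close == ']');
    }

    /** Numeric membership check that honors each bound's inclusiveness. */
    boolean contains(long value) {
        long lo = Long.parseLong(lower);
        long hi = Long.parseLong(upper);
        boolean aboveLower = lowerInclusive ? value >= lo : value > lo;
        boolean belowUpper = upperInclusive ? value <= hi : value < hi;
        return aboveLower && belowUpper;
    }
}
```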
  • LUCENE-2742 : Add native per-field postings format support. Codec now lets you register a postings format for each field, which is in turn recorded into the index. Postings formats are maintained on a per-segment basis and can be resolved without knowing the actual postings format used for writing the segment.
    (Simon Willnauer)
  • LUCENE-2741 : Add support for multiple codecs that use the same file extensions within the same segment. Codecs now use their per-segment codec ID in the file names.
    (Simon Willnauer)
  • LUCENE-2843 : Added a new terms index impl, VariableGapTermsIndexWriter/Reader, that accepts a pluggable IndexTermSelector for picking which terms should be indexed in the terms dict. This impl stores the indexed terms in an FST, which is much more RAM efficient than FixedGapTermsIndex.
    (Mike McCandless)
  • LUCENE-2862 : Added TermsEnum.totalTermFreq() and Terms.getSumTotalTermFreq().
    (Mike McCandless, Robert Muir)
  • LUCENE-3290 : Added Terms.getSumDocFreq()
    (Mike McCandless, Robert Muir)
  • LUCENE-3003 : Added new expert class oal.index.DocTermsOrd, refactored from Solr's UnInvertedField, for accessing term ords for multi-valued fields, per document. This is similar to FieldCache in that it inverts the index to compute the ords, but differs in that it's able to handle multi-valued fields and does not hold the term bytes in RAM.
    (Mike McCandless)
  • LUCENE-3108 , LUCENE-2935 , LUCENE-2168 , LUCENE-1231 : Changes from DocValues (ColumnStrideFields): IndexWriter now supports typesafe dense per-document values stored in a column-like storage. DocValues are stored on a per-document basis, where each document's field can hold exactly one value of a given type. DocValues are provided via Fieldable and can be used in conjunction with stored and indexed values. DocValues provides an entirely RAM-resident document ID to value mapping per field, as well as a DocIdSetIterator-based, disk-resident sequential access API relying on filesystem caches. Both APIs are exposed via IndexReader and the Codec / Flex API, allowing expert users to integrate customized DocValues reader and writer implementations by extending existing Codecs. DocValues provides implementations for primitive datatypes like int, long, float, double and arrays of byte. Byte-based implementations further provide storage variants like straight or dereferenced stored bytes, fixed and variable length bytes, as well as index-time sorting based on user-provided comparators.
    (Mike McCandless, Simon Willnauer)
  • LUCENE-3209 : Added MemoryCodec, which stores all terms & postings in RAM as an FST; this is good for primary-key fields if you frequently need to look up by that field or perform deletions against it, for example in a near-real-time setting.
    (Mike McCandless)
  • SOLR-2533 : Added support for rewriting Sort and SortFields using an IndexSearcher. SortFields can have SortField.REWRITEABLE type which requires they are rewritten before they are used.
    (Chris Male)
  • LUCENE-3203 : FSDirectory can now limit the max allowed write rate (MB/sec) of all running merges, to reduce impact ongoing merging has on searching, NRT reopen time, etc.
    (Mike McCandless)
  • LUCENE-2793 : Directory#createOutput & Directory#openInput now accept an IOContext instead of a buffer size to allow low-level optimizations for different use cases like merging, flushing and reading.
    (Simon Willnauer, Mike McCandless, Varun Thacker)
  • LUCENE-3354 : FieldCache can cache DocTermOrds.
    (Martijn van Groningen)
  • LUCENE-3376 : ReusableAnalyzerBase has been moved from modules/analysis/common into lucene/src/java/org/apache/lucene/analysis
    (Chris Male)
  • LUCENE-3423 : add Terms.getDocCount(), which returns the number of documents that have at least one term for a field.
    (Yonik Seeley, Robert Muir)
  • LUCENE-2959 : Added a variety of different relevance ranking systems to Lucene. Added Okapi BM25, Language Models, Divergence from Randomness, and Information-Based Models. The models are pluggable, support all of Lucene's features (boosts, slops, explanations, etc.) and queries (spans, etc.). All models default to the same index-time norm encoding as DefaultSimilarity, so you can easily try these out, switch back and forth, and run experiments and comparisons without reindexing. Note: most of the models do rely upon index statistics that are new in Lucene 4.0, so for existing 3.x indexes it's a good idea to upgrade your index to the new format with IndexUpgrader first. Added a new subclass SimilarityBase which provides a simplified API for plugging in new ranking algorithms without dealing with all of the nuances and implementation details of Lucene. For example, to use BM25 for all fields: searcher.setSimilarity(new BM25Similarity()); If you instead want to apply different similarities (e.g. ones with different parameter values or different algorithms entirely) to different fields, implement PerFieldSimilarityWrapper with your per-field logic.
    (David Mark Nemeskey via Robert Muir)
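As a rough illustration of what a BM25 term score computes, here is the textbook formula with the commonly used defaults k1 = 1.2 and b = 0.75. This is a sketch, not Lucene's BM25Similarity: the Bm25 class is hypothetical, and Lucene additionally encodes document length lossily in a norm byte, so its scores are not expected to match this code exactly.

```java
/** Textbook BM25 term weight; a sketch, not Lucene's BM25Similarity. */
final class Bm25 {
    static final double K1 = 1.2; // term-frequency saturation
    static final double B = 0.75; // strength of document-length normalization

    /** Inverse document frequency: log(1 + (N - n + 0.5) / (n + 0.5)). */
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Score contribution of one query term in one document. */
    static double score(double termFreq, long docFreq, long docCount,
                        double docLength, double avgDocLength) {
        double lengthNorm = K1 * (1.0 - B + B * docLength / avgDocLength);
        return idf(docFreq, docCount) * termFreq * (K1 + 1.0) / (termFreq + lengthNorm);
    }
}
```

A term occurring once in a document of exactly average length scores its idf; longer documents score lower, and repeated occurrences approach, but never reach, idf * (k1 + 1).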
  • LUCENE-3396 : ReusableAnalyzerBase now provides a ReuseStrategy abstraction which controls how TokenStreamComponents are reused per request. Two implementations are provided - GlobalReuseStrategy which implements the current behavior of sharing components between all fields, and PerFieldReuseStrategy which shares per field.
    (Chris Male)
  • LUCENE-2309 : Added IndexableField.tokenStream(Analyzer) which is now responsible for creating the TokenStreams for Fields when they are to be indexed.
    (Chris Male)
  • LUCENE-3433 : Added random access for non RAM resident IndexDocValues. RAM resident and disk resident IndexDocValues are now exposed via the Source interface. ValuesEnum has been removed in favour of Source.
    (Simon Willnauer)
  • LUCENE-1536 : Filters can now be applied down-low, if their DocIdSet implements a new bits() method, returning all documents in a random access way. If the DocIdSet is not too sparse, it will be passed as acceptDocs down to the Scorer as replacement for AtomicReader's live docs. In addition, FilteredQuery now backs IndexSearcher's filtering search methods. Using FilteredQuery you can chain Filters in a very performant way [new FilteredQuery(new FilteredQuery(query, filter1), filter2)], which was not possible with IndexSearcher's methods. FilteredQuery also allows overriding the heuristics used to decide whether filtering should be done using random access or using a conjunction on the DocIdSet's iterator().
    (Mike McCandless, Uwe Schindler, Robert Muir, Chris Male, Yonik Seeley, Jason Rutherglen, Paul Elschot)
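Conceptually, a random-access filter passed down as acceptDocs lets the scorer test each candidate with a single get(doc) call instead of intersecting iterators, and a document must be both live and accepted by the filter. The sketch below assumes a local SimpleBits interface with the same two methods as Lucene's Bits (get and length) and a hypothetical AcceptDocs helper; it only shows how live docs and filter bits combine, not how FilteredQuery chooses between strategies.

```java
/** Local stand-in for a random-access bit set; Lucene's Bits interface has the same two methods. */
interface SimpleBits {
    boolean get(int index);
    int length();
}

/** Hypothetical helper combining live docs and a random-access filter into one acceptDocs view. */
final class AcceptDocs {
    /** liveDocs == null means "no deletions", following the convention used for live docs. */
    static SimpleBits and(SimpleBits liveDocs, SimpleBits filterBits) {
        return new SimpleBits() {
            @Override public boolean get(int doc) {
                // Accepted only if the document is not deleted AND it passes the filter.
                return (liveDocs == null || liveDocs.get(doc)) && filterBits.get(doc);
            }
            @Override public int length() { return filterBits.length(); }
        };
    }

    /** Builds bits from a boolean array, for demonstration only. */
    static SimpleBits fromArray(boolean... bits) {
        return new SimpleBits() {
            @Override public boolean get(int index) { return bits[index]; }
            @Override public int length() { return bits.length; }
        };
    }
}
```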
  • LUCENE-3638 : Added sugar methods to IndexReader and IndexSearcher to load only certain fields when loading a document.
    (Peter Chang via Mike McCandless)
  • LUCENE-3628 : Norms are represented as DocValues. AtomicReader exposes a #normValues(String) method to obtain norms per field.
    (Simon Willnauer)
  • LUCENE-3687 : Similarity#computeNorm(FieldInvertState, Norm) allows computing norm values of arbitrary precision. Instead of returning a fixed single-byte value, custom similarities can now set an integer, float or byte value on the given Norm object.
    (Simon Willnauer)
  • LUCENE-2604 , LUCENE-4103 : Added RegexpQuery support to contrib/queryparser.
    (Simon Willnauer, Robert Muir, Daniel Truemper)
  • LUCENE-2373 : Added a Codec implementation that works with append-only filesystems (such as Hadoop DFS). SegmentInfos writing/reading code is refactored to support append-only FS, and to allow for future customization of per-segment information.
    (Andrzej Bialecki)
  • LUCENE-2479 : Added ability to provide a sort comparator for spelling suggestions along with two implementations. The existing comparator (score, then frequency) is the default
    (Grant Ingersoll)
  • LUCENE-2608 : Added the ability to specify the accuracy per method call in the SpellChecker. The per-class setting is also still available.
    (Grant Ingersoll)
  • LUCENE-2507 : Added DirectSpellChecker, which retrieves correction candidates directly from the term dictionary using Levenshtein automata.
    (Robert Muir)
  • LUCENE-3527 : Add LuceneLevenshteinDistance, which computes string distance in a way compatible with DirectSpellChecker. This can be used to merge top-N results from more than one SpellChecker.
    (James Dyer via Robert Muir)
  • LUCENE-3496 : Support grouping by DocValues.
    (Martijn van Groningen)
  • LUCENE-2795 : Generified DirectIOLinuxDirectory to work across any unix supporting the O_DIRECT flag when opening a file (tested on Linux and OS X but likely other Unixes will work), and improved it so it can be used for indexing and searching. The directory uses direct IO when doing large merges to avoid unnecessarily evicting cached IO pages due to large merges.
    (Varun Thacker, Mike McCandless)
  • LUCENE-3827 : DocsAndPositionsEnum from MemoryIndex implements start/endOffset, if offsets are indexed.
    (Alan Woodward via Mike McCandless)
  • LUCENE-3802 , LUCENE-3856 : Support for grouped faceting.
    (Martijn van Groningen)
  • LUCENE-3444 : Added a second pass grouping collector that keeps track of distinct values for a specified field for the top N groups.
    (Martijn van Groningen)
  • LUCENE-3778 : Added a grouping utility class that makes it easier to use result grouping for pure Lucene apps.
    (Martijn van Groningen)
  • LUCENE-2341 : A new analysis/ filter: Morfologik - a dictionary-driven lemmatizer (accurate stemmer) for Polish (includes morphosyntactic annotations).
    (Michał Dybizbański, Dawid Weiss)
  • LUCENE-2413 : Consolidated Lucene/Solr analysis components into analysis/common. New features from Solr now available to Lucene users include:
      • o.a.l.analysis.commongrams: Constructs n-grams for frequently occurring terms and phrases.
      • o.a.l.analysis.charfilter.HTMLStripCharFilter: CharFilter that strips HTML constructs.
      • o.a.l.analysis.miscellaneous.WordDelimiterFilter: TokenFilter that splits words into subwords and performs optional transformations on subword groups.
      • o.a.l.analysis.miscellaneous.RemoveDuplicatesTokenFilter: TokenFilter that filters out tokens with the same position and term text as the previous token.
      • o.a.l.analysis.miscellaneous.TrimFilter: Trims leading and trailing whitespace from tokens in the stream.
      • o.a.l.analysis.miscellaneous.KeepWordFilter: TokenFilter that only keeps tokens whose text is contained in the required words (the inverse of StopFilter).
      • o.a.l.analysis.miscellaneous.HyphenatedWordsFilter: TokenFilter that puts hyphenated words broken across two lines back together.
      • o.a.l.analysis.miscellaneous.CapitalizationFilter: TokenFilter that applies capitalization rules to tokens.
      • o.a.l.analysis.pattern: Package for pattern-based analysis, containing a CharFilter, Tokenizer, and TokenFilter for transforming text with regexes.
      • o.a.l.analysis.synonym.SynonymFilter: A synonym filter that supports multi-word synonyms.
      • o.a.l.analysis.phonetic: Package for phonetic search, containing various phonetic encoders such as Double Metaphone.
    Some existing analysis components changed packages:
      • o.a.l.analysis.KeywordAnalyzer -> o.a.l.analysis.core.KeywordAnalyzer
      • o.a.l.analysis.KeywordTokenizer -> o.a.l.analysis.core.KeywordTokenizer
      • o.a.l.analysis.LetterTokenizer -> o.a.l.analysis.core.LetterTokenizer
      • o.a.l.analysis.LowerCaseFilter -> o.a.l.analysis.core.LowerCaseFilter
      • o.a.l.analysis.LowerCaseTokenizer -> o.a.l.analysis.core.LowerCaseTokenizer
      • o.a.l.analysis.SimpleAnalyzer -> o.a.l.analysis.core.SimpleAnalyzer
      • o.a.l.analysis.StopAnalyzer -> o.a.l.analysis.core.StopAnalyzer
      • o.a.l.analysis.StopFilter -> o.a.l.analysis.core.StopFilter
      • o.a.l.analysis.WhitespaceAnalyzer -> o.a.l.analysis.core.WhitespaceAnalyzer
      • o.a.l.analysis.WhitespaceTokenizer -> o.a.l.analysis.core.WhitespaceTokenizer
      • o.a.l.analysis.PorterStemFilter -> o.a.l.analysis.en.PorterStemFilter
      • o.a.l.analysis.ASCIIFoldingFilter -> o.a.l.analysis.miscellaneous.ASCIIFoldingFilter
      • o.a.l.analysis.ISOLatin1AccentFilter -> o.a.l.analysis.miscellaneous.ISOLatin1AccentFilter
      • o.a.l.analysis.KeywordMarkerFilter -> o.a.l.analysis.miscellaneous.KeywordMarkerFilter
      • o.a.l.analysis.LengthFilter -> o.a.l.analysis.miscellaneous.LengthFilter
      • o.a.l.analysis.PerFieldAnalyzerWrapper -> o.a.l.analysis.miscellaneous.PerFieldAnalyzerWrapper
      • o.a.l.analysis.TeeSinkTokenFilter -> o.a.l.analysis.sinks.TeeSinkTokenFilter
      • o.a.l.analysis.CharFilter -> o.a.l.analysis.charfilter.CharFilter
      • o.a.l.analysis.BaseCharFilter -> o.a.l.analysis.charfilter.BaseCharFilter
      • o.a.l.analysis.MappingCharFilter -> o.a.l.analysis.charfilter.MappingCharFilter
      • o.a.l.analysis.NormalizeCharMap -> o.a.l.analysis.charfilter.NormalizeCharMap
      • o.a.l.analysis.CharArraySet -> o.a.l.analysis.util.CharArraySet
      • o.a.l.analysis.CharArrayMap -> o.a.l.analysis.util.CharArrayMap
      • o.a.l.analysis.ReusableAnalyzerBase -> o.a.l.analysis.util.ReusableAnalyzerBase
      • o.a.l.analysis.StopwordAnalyzerBase -> o.a.l.analysis.util.StopwordAnalyzerBase
      • o.a.l.analysis.WordListLoader -> o.a.l.analysis.util.WordListLoader
      • o.a.l.analysis.CharTokenizer -> o.a.l.analysis.util.CharTokenizer
      • o.a.l.util.CharacterUtils -> o.a.l.analysis.util.CharacterUtils
    All analyzers in contrib/analyzers and contrib/icu were moved to the analysis/ module. The 'smartcn' and 'stempel' components now depend on 'common'.
    (Chris Male, Robert Muir)

  • LUCENE-4004 : Add DisjunctionMaxQuery support to the xml query parser.
    (Benson Margulies via Robert Muir)
  • LUCENE-4025 : Add maybeRefreshBlocking to ReferenceManager, to let a caller block until the refresh logic has been executed.
    (Shai Erera, Mike McCandless)
  • LUCENE-4039 : Add AddIndexesTask to benchmark, which uses IW.addIndexes.
    (Shai Erera)
  • LUCENE-3514 : Added IndexSearcher.searchAfter when Sort is used, returning results after a specified FieldDoc for deep paging.
    (Mike McCandless)
  • LUCENE-4043 : Added scoring support via score mode for query time joining.
    (Martijn van Groningen, Mike McCandless)
  • LUCENE-3523 : Added oal.search.spell.WordBreakSpellChecker, which generates suggestions by combining two or more terms and/or breaking terms into multiple words. See Javadocs for usage.
    (James Dyer)
  • LUCENE-4019 : Improved parsing of Hunspell dictionaries so that rules missing the required number of parameters are either ignored or cause a ParseException (depending on whether strict parsing is enabled).
    (Luca Cavanna via Chris Male)
  • LUCENE-3440 : Add ordered fragments feature with IDF-weighted terms for FVH.
    (Sebastian Lutze via Koji Sekiguchi)
  • LUCENE-4082 : Added explain to ToParentBlockJoinQuery.
    (Christoph Kaser, Martijn van Groningen)
  • LUCENE-4108 : add replaceTaxonomy to DirectoryTaxonomyWriter, which replaces the taxonomy in place with the given one.
    (Shai Erera)
  • LUCENE-3030 : The new BlockTree terms dictionary (used by the default Lucene40 postings format) uses less RAM (for the terms index) and disk space (for all terms and metadata), and gives sizable performance gains for terms-dictionary-intensive operations such as FuzzyQuery, the direct spell checker, and primary-key lookup.
    (Mike McCandless)
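LUCENE-3514 above adds IndexSearcher.searchAfter for sorted searches, so deep pages can be fetched by handing back the last hit of the previous page. The cursor idea can be sketched with the standard library alone: order hits by a sort value with the docID as a tie-breaker, then return the next n hits that sort strictly after the cursor. SearchAfterSketch and its Hit type are hypothetical names for illustration only; this is not Lucene's implementation, and a real collector would avoid sorting every hit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of "search after" deep paging; not Lucene's API.
class SearchAfterSketch {
  // A hit: its sort value plus the docID, used as a tie-breaker so the order is total.
  static final class Hit {
    final int doc;
    final long sortValue;

    Hit(int doc, long sortValue) {
      this.doc = doc;
      this.sortValue = sortValue;
    }
  }

  // Ascending by sort value, then by docID.
  static final Comparator<Hit> ORDER =
      Comparator.<Hit>comparingLong(h -> h.sortValue).thenComparingInt(h -> h.doc);

  // Returns up to n hits that sort strictly after `after` (null means "first page").
  static List<Hit> searchAfter(List<Hit> all, Hit after, int n) {
    List<Hit> page = new ArrayList<>();
    if (n <= 0) {
      return page;
    }
    List<Hit> sorted = new ArrayList<>(all);
    sorted.sort(ORDER);
    for (Hit h : sorted) {
      if (after != null && ORDER.compare(h, after) <= 0) {
        continue; // already returned on an earlier page
      }
      page.add(h);
      if (page.size() == n) {
        break;
      }
    }
    return page;
  }
}
```

Each call passes the last hit of the previous page as `after`; the docID tie-breaker matters because two hits with equal sort values must still have a well-defined "after".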
  • Optimizations (17)
  • LUCENE-2588 : Don't store unnecessary suffixes when writing the terms index, saving RAM in IndexReader; change default terms index interval from 128 to 32, because the terms index now requires much less RAM.
    (Robert Muir, Mike McCandless)
  • LUCENE-2669 : Optimize NumericRangeQuery.NumericRangeTermsEnum to not seek backwards when a sub-range has no terms. It now only seeks when the current term is less than the next sub-range's lower end.
    (Uwe Schindler, Mike McCandless)
  • LUCENE-2694 : Optimize MultiTermQuery to be single pass for Term lookups. MultiTermQuery now stores TermState per leaf reader during rewrite to re-seek the term dictionary in TermQuery / TermWeight.
    (Simon Willnauer, Mike McCandless, Robert Muir)
  • LUCENE-3292 : IndexWriter no longer shares the same SegmentReader instance for merging and NRT readers, which enables directory impls to separately tune IO flags for each.
    (Varun Thacker, Simon Willnauer, Mike McCandless)
  • LUCENE-3328 : BooleanQuery now uses a specialized ConjunctionScorer if all boolean clauses are required and instances of TermQuery.
    (Simon Willnauer, Robert Muir)
  • LUCENE-3643 : FilteredQuery and IndexSearcher.search(Query, Filter,...) now optimize the special case query instanceof MatchAllDocsQuery to execute as ConstantScoreQuery.
    (Uwe Schindler)
  • LUCENE-3509 : Added fasterButMoreRam option for docvalues. This option controls whether the space for packed ints should be rounded up for better performance. This option only applies for docvalues types bytes fixed sorted and bytes var sorted.
    (Simon Willnauer, Martijn van Groningen)
  • LUCENE-3795 : Replace contrib/spatial with modules/spatial. This includes a basic spatial strategy interface.
    (David Smiley, Chris Male, ryan)
  • LUCENE-3932 : Lucene3x codec loads terms index faster, by pre-allocating the packed ints array based on the .tii file size
    (Sean Bridges via Mike McCandless)
  • LUCENE-3468 : Replaced last() and remove() with pollLast() in FirstPassGroupingCollector
    (Martijn van Groningen)
  • LUCENE-3830 : Changed MappingCharFilter/NormalizeCharMap to use an FST under the hood, which requires less RAM. NormalizeCharMap no longer accepts an empty-string match (it did previously, but ignored it).
    (Dawid Weiss, Mike McCandless)
  • LUCENE-4061 : Improved synchronization in DirectoryTaxonomyWriter.addCategory and made a few general improvements to DirectoryTaxonomyWriter.
    (Shai Erera, Gilad Barkai)
  • LUCENE-4062 : Add new aligned packed bits impls for faster lookup performance; add float acceptableOverheadRatio to getWriter and getMutable API to give packed ints freedom to pick faster implementations
    (Adrien Grand via Mike McCandless)
  • LUCENE-2357 : Reduce transient RAM usage when merging segments in IndexWriter.
    (Adrien Grand)
  • LUCENE-4098 : Add bulk get/set methods to PackedInts
    (Adrien Grand via Mike McCandless)
  • LUCENE-4156 : DirectoryTaxonomyWriter.getSize is no longer synchronized.
    (Shai Erera, Sivan Yogev)
  • LUCENE-4163 : Improve concurrency of MMapIndexInput.clone() by using the new WeakIdentityMap on top of a ConcurrentHashMap to manage the cloned instances. WeakIdentityMap was extended to support iterating over its keys.
    (Uwe Schindler)
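Several entries above (LUCENE-4062, LUCENE-4098) concern Lucene's packed ints, which store each value in only as many bits as it needs instead of a full int or long. As a rough, self-contained illustration of that idea, values can be laid out back to back in a long[], where a value may straddle two 64-bit blocks. PackedLongArray is a hypothetical name; this is not Lucene's PackedInts API, and it ignores the aligned layouts and overhead trade-offs that LUCENE-4062 adds.

```java
// Hypothetical sketch of a packed integer array; not Lucene's PackedInts API.
class PackedLongArray {
  private final long[] blocks;
  private final int bitsPerValue;
  private final long mask;

  PackedLongArray(int valueCount, int bitsPerValue) {
    if (bitsPerValue < 1 || bitsPerValue > 64) {
      throw new IllegalArgumentException("bitsPerValue must be in [1, 64]: " + bitsPerValue);
    }
    this.bitsPerValue = bitsPerValue;
    this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
    long totalBits = (long) valueCount * bitsPerValue;
    this.blocks = new long[(int) ((totalBits + 63) >>> 6)]; // round up to whole longs
  }

  long get(int index) {
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6); // which 64-bit block the value starts in
    int shift = (int) (bitPos & 63);  // bit offset inside that block
    long value = blocks[block] >>> shift;
    int bitsInFirst = 64 - shift;
    if (bitsInFirst < bitsPerValue) {
      // The value straddles two blocks: pull its high bits from the next block.
      value |= blocks[block + 1] << bitsInFirst;
    }
    return value & mask;
  }

  void set(int index, long value) {
    long v = value & mask;
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    // Clear this value's bits in the first block, then write its low bits.
    blocks[block] = (blocks[block] & ~(mask << shift)) | (v << shift);
    int bitsInFirst = 64 - shift;
    if (bitsInFirst < bitsPerValue) {
      // Write the remaining high bits into the low end of the next block.
      int remaining = bitsPerValue - bitsInFirst;
      long highMask = (1L << remaining) - 1;
      blocks[block + 1] = (blocks[block + 1] & ~highMask) | (v >>> bitsInFirst);
    }
  }

  // Bytes used by the backing long[] (excluding object headers).
  long ramBytesOfData() {
    return 8L * blocks.length;
  }
}
```

With 7 bits per value, 1,000 values occupy 110 longs (880 bytes) instead of the 4,000 bytes of an int[]; the price is extra shifting and masking on every access, especially for values that straddle two blocks.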
  • Bug fixes (19)
  • LUCENE-2803 : The FieldCache can miss values if an entry for a reader with more document deletions is requested before a reader with fewer deletions, provided they share some segments.
    (yonik)
  • LUCENE-2645 : Fix false assertion error when same token was added one after another with 0 posIncr.
    (David Smiley, Kurosaka Teruhiko via Mike McCandless)
  • LUCENE-3348 : Fix thread safety hazards in IndexWriter that could rarely cause deletions to be incorrectly applied.
    (Yonik Seeley, Simon Willnauer, Mike McCandless)
  • LUCENE-3515 : Fix terrible merge performance versus 3.x, especially when the directory isn't MMapDirectory, due to failing to reuse DocsAndPositionsEnum while merging
    (Marc Sturlese, Erick Erickson, Robert Muir, Simon Willnauer, Mike McCandless)
  • LUCENE-3589 : BytesRef copy(short) didn't set length.
    (Peter Chang via Robert Muir)
  • LUCENE-3045 : fixed QueryNodeImpl.containsTag(String key) that was not lowercasing the key before checking for the tag
    (Adriano Crestani)
  • LUCENE-3890 : Fixed NPE for grouped faceting on multi-valued fields.
    (Michael McCandless, Martijn van Groningen)
  • LUCENE-2945 : Fix hashCode/equals for surround query parser generated queries.
    (Paul Elschot, Simon Rosenthal, gsingers via ehatcher)
  • LUCENE-3971 : MappingCharFilter could return invalid final token position.
    (Dawid Weiss, Robert Muir)
  • LUCENE-3820 : PatternReplaceCharFilter could return invalid token positions.
    (Dawid Weiss)
  • LUCENE-3969 : Throw IAE on bad arguments that could cause confusing errors in CompoundWordTokenFilterBase, PatternTokenizer, PositionFilter, SnowballFilter, PathHierarchyTokenizer, ReversePathHierarchyTokenizer, WikipediaTokenizer, and KeywordTokenizer. ShingleFilter and CommonGramsFilter now populate PositionLengthAttribute. Fixed PathHierarchyTokenizer to reset() all state. Protect against AIOOBE in ReversePathHierarchyTokenizer if skip is large. Fixed wrong final offset calculation in PathHierarchyTokenizer.
    (Mike McCandless, Uwe Schindler, Robert Muir)
  • LUCENE-4060 : Fix a synchronization bug in DirectoryTaxonomyWriter.addTaxonomies(). Also, the method has been renamed to addTaxonomy and now takes only one Directory and one OrdinalMap.
    (Shai Erera, Gilad Barkai)
  • LUCENE-3590 : Fix AIOOBE in BytesRef/CharsRef copyBytes/copyChars when offset is nonzero, fix off-by-one in CharsRef.subSequence, and fix CharsRef's CharSequence methods to throw exceptions in boundary cases to properly meet the specification.
    (Robert Muir)
  • LUCENE-4084 : Attempting to reuse a single IndexWriterConfig instance across more than one IndexWriter resulted in a cryptic exception. This is now fixed, but requires that certain members of IndexWriterConfig (MergePolicy, FlushPolicy, DocumentsWriterThreadPool) implement clone.
    (Robert Muir, Simon Willnauer, Mike McCandless)
  • LUCENE-4079 : Fixed loading of Hunspell dictionaries that use aliasing (AF rules)
    (Ludovic Boutros via Chris Male)
  • LUCENE-4077 : Expose the max score and per-group scores from ToParentBlockJoinCollector
    (Christoph Kaser, Mike McCandless)
  • LUCENE-4114 : Fix int overflow bugs in BYTES_FIXED_STRAIGHT and BYTES_FIXED_DEREF doc values implementations.
    (Walt Elder via Mike McCandless)
  • LUCENE-4147 : Fixed thread safety issues when rollback() and commit() are called simultaneously.
    (Simon Willnauer, Mike McCandless)
  • LUCENE-4165 : Removed closing of the Reader used to read the affix file in HunspellDictionary. Consumers are now responsible for closing all InputStreams once the Dictionary has been instantiated.
    (Torsten Krah, Uwe Schindler, Chris Male)
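LUCENE-4114 above fixes int overflow bugs in the fixed-width doc values implementations. The entry does not spell out the arithmetic, but a typical way such overflow arises is computing a byte offset as docID × fixed width in 32-bit int arithmetic. This hypothetical sketch (OffsetOverflow is made up here, and the offset formula is an assumption) shows the silent wrap-around and the usual fix of widening to long before multiplying.

```java
// Hypothetical illustration of int overflow in docID * fixedWidth; not Lucene's doc values code.
class OffsetOverflow {
  // Buggy: the multiplication happens in int arithmetic and wraps before widening to long.
  static long offsetInt(int docID, int fixedWidth) {
    return docID * fixedWidth;
  }

  // Fixed: widen one operand to long so the multiplication itself is 64-bit.
  static long offsetLong(int docID, int fixedWidth) {
    return (long) docID * fixedWidth;
  }
}
```

With docID 100,000,000 and a 32-byte width, the int product wraps to -1,094,967,296, while the long version yields the correct 3,200,000,000.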
  • Documentation (1)
  • LUCENE-3958 : Javadocs corrections for IndexWriter.
    (Iulius Curt via Robert Muir)
  • Build (12)
  • LUCENE-4047 : Cleanup of LuceneTestCase: moved blocks of initialization/cleanup code into JUnit instance and class rules.
    (Dawid Weiss)
  • LUCENE-4016 : Require ANT 1.8.2+ for the build.
  • LUCENE-3808 : Refactoring of testing infrastructure to use randomizedtesting package: http://labs.carrotsearch.com/randomizedtesting.html
    (Dawid Weiss)
  • LUCENE-3964 : Added target stage-maven-artifacts, which stages Maven release artifacts to a Maven staging repository in preparation for release.
    (Steve Rowe)
  • LUCENE-2845 : Moved contrib/benchmark to lucene/benchmark.
  • LUCENE-2995 : Moved contrib/spellchecker into lucene/suggest.
  • LUCENE-3285 : Moved contrib/queryparser into lucene/queryparser
  • LUCENE-3285 : Moved contrib/xml-query-parser's demo into lucene/demo
  • LUCENE-3271 : Moved contrib/queries BooleanFilter, BoostingQuery, ChainedFilter, FilterClause and TermsFilter into lucene/queries
  • LUCENE-3381 : Moved contrib/queries regex.*, DuplicateFilter, FuzzyLikeThisQuery and SlowCollated* into lucene/sandbox. Removed contrib/queries.
  • LUCENE-3286 : Moved remainder of contrib/xml-query-parser to lucene/queryparser. Classes now found at org.apache.lucene.queryparser.xml.*
  • LUCENE-4059 : Improve ANT task prepare-webpages (used by documentation tasks) to correctly encode build file names as URIs for later processing by
    (Greg Bowyer, Uwe Schindler)
  • Release 3.6.2 [2012-12-25]

  • Bug Fixes (8)
  • LUCENE-4234 : Exception when FacetsCollector is used with ScoreFacetRequest, and the number of matching documents is too large.
    (Gilad Barkai via Shai Erera)
  • LUCENE-2686 , LUCENE-3505 , LUCENE-4401 : Fix BooleanQuery scorers to return correct freq().
    (Koji Sekiguchi, Mike McCandless, Liu Chao, Robert Muir)
  • LUCENE-2501 : Fixed rare thread-safety issue that could cause ArrayIndexOutOfBoundsException inside ByteBlockPool
    (Robert Muir, Mike McCandless)
  • LUCENE-4297 : BooleanScorer2 would multiply the coord() factor twice for conjunctions: for most users this is no problem, but if you had a customized Similarity that returned something other than 1 when overlap == maxOverlap (always the case for conjunctions), then the score would be incorrect.
    (Pascal Chollet, Robert Muir)
  • LUCENE-4300 : BooleanQuery's rewrite was not always safe: if you had a custom Similarity where coord(1,1) != 1F, then the rewritten query would be scored differently.
    (Robert Muir)
  • LUCENE-4398 : If you indexed many different field names in your documents, then, due to a bug in how IndexWriter measured its RAM usage, it would flush each segment too early, eventually reaching the point where it flushed after every document.
    (Tim Smith via Mike McCandless)
  • LUCENE-4411 : when sampling is enabled for a FacetRequest, its depth parameter is reset to the default (1), even if set otherwise.
    (Gilad Barkai via Shai Erera)
  • LUCENE-4635 : Fixed ArrayIndexOutOfBoundsException when in-memory terms index requires more than 2.1 GB RAM (indices with billions of terms).
    (Tom Burton-West via Mike McCandless)
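The LUCENE-4297 entry above hinges on simple arithmetic: for a conjunction overlap == maxOverlap, so a coord of overlap/maxOverlap equals 1 and multiplying it in twice changes nothing, but any custom factor other than 1 gets squared. The sketch below shows this with hypothetical helpers (CoordTwice, and its 0.8 factor, are made up for illustration and are not Lucene's Similarity or BooleanScorer2 code).

```java
// Hypothetical arithmetic for LUCENE-4297; not Lucene's Similarity or BooleanScorer2 code.
class CoordTwice {
  // Default-style coord: fraction of query clauses that matched.
  static float defaultCoord(int overlap, int maxOverlap) {
    return (float) overlap / maxOverlap;
  }

  // A custom coord that does NOT return 1 when overlap == maxOverlap (assumed for illustration).
  static float customCoord(int overlap, int maxOverlap) {
    return 0.8f * overlap / maxOverlap;
  }

  static float scoreOnce(float sumOfClauseScores, float coord) {
    return sumOfClauseScores * coord;
  }

  // The buggy behaviour: the coord factor is multiplied in a second time.
  static float scoreTwice(float sumOfClauseScores, float coord) {
    return sumOfClauseScores * coord * coord;
  }
}
```

For a three-clause conjunction with a summed clause score of 2.0, the default coord gives 2.0 either way, while the custom coord gives about 1.6 when applied once and about 1.28 when applied twice.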
  • Documentation (1)
  • LUCENE-4302 : Fix facet userguide to have HTML loose doctype like all other javadocs.
    (Karl Nicholas via Uwe Schindler)
  • Release 3.6.1 [2012-07-22]

  • More information about this release, including any errata related to the release notes, upgrade instructions, or other changes may be found online at: https://wiki.apache.org/lucene-java/Lucene3.6.1
  • Bug Fixes (6)
  • LUCENE-3969 : Throw IAE on bad arguments that could cause confusing errors in KeywordTokenizer.
    (Uwe Schindler, Mike McCandless, Robert Muir)
  • LUCENE-3971 : MappingCharFilter could return invalid final token position.
    (Dawid Weiss, Robert Muir)
  • LUCENE-4023 : DisjunctionMaxScorer now implements visitSubScorers().
    (Uwe Schindler)
  • LUCENE-2566 : + - operators allow any amount of whitespace
    (yonik, janhoy)
  • LUCENE-3590 : Fix AIOOBE in BytesRef/CharsRef copyBytes/copyChars when offset is nonzero, fix off-by-one in CharsRef.subSequence, and fix CharsRef's CharSequence methods to throw exceptions in boundary cases to properly meet the specification.
    (Robert Muir)
  • LUCENE-4222 : TieredMergePolicy.getFloorSegmentMB was returning the size in bytes, not MB.
    (Chris Fuller via Mike McCandless)
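LUCENE-3590 (listed under both 4.0 and 3.6.1) fixes an off-by-one in CharsRef.subSequence and makes CharsRef's CharSequence methods throw in boundary cases. The java.lang.CharSequence contract for an offset/length view over a char[] can be sketched as follows; CharSlice is a hypothetical stand-in, not Lucene's CharsRef, and the bounds checks shown are this sketch's own.

```java
// Hypothetical offset/length view over a char[], in the spirit of CharsRef; not Lucene's class.
class CharSlice implements CharSequence {
  final char[] chars;
  final int offset;
  final int length;

  CharSlice(char[] chars, int offset, int length) {
    if (offset < 0 || length < 0 || offset + length > chars.length) {
      throw new IndexOutOfBoundsException("offset=" + offset + " length=" + length);
    }
    this.chars = chars;
    this.offset = offset;
    this.length = length;
  }

  @Override
  public int length() {
    return length;
  }

  @Override
  public char charAt(int index) {
    // Check against this slice's length, not the (possibly larger) backing array.
    if (index < 0 || index >= length) {
      throw new IndexOutOfBoundsException("index=" + index + " length=" + length);
    }
    return chars[offset + index];
  }

  @Override
  public CharSequence subSequence(int start, int end) {
    // CharSequence contract: start inclusive, end exclusive, 0 <= start <= end <= length().
    if (start < 0 || end > length || start > end) {
      throw new IndexOutOfBoundsException("start=" + start + " end=" + end + " length=" + length);
    }
    // The new view begins at offset + start and holds exactly end - start chars.
    return new CharSlice(chars, offset + start, end - start);
  }

  @Override
  public String toString() {
    return new String(chars, offset, length);
  }
}
```

Note that charAt(5) on a five-char slice must throw even when the backing array has more chars past the slice, which is exactly the kind of boundary case the entry mentions.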
  • API Changes (1)
  • LUCENE-4023 : Changed the visibility of Scorer#visitSubScorers() to public, otherwise it's impossible to implement Scorers outside the Lucene package.
    (Uwe Schindler)
  • Optimizations (1)
  • LUCENE-4163 : Improve concurrency of MMapIndexInput.clone() by using the new WeakIdentityMap on top of a ConcurrentHashMap to manage the cloned instances. WeakIdentityMap was extended to support iterating over its keys.
    (Uwe Schindler)
  • Tests (2)
  • LUCENE-3873 : add MockGraphTokenFilter, testing analyzers with random graph tokens.
    (Mike McCandless)
  • LUCENE-3968 : factor out LookaheadTokenFilter from MockGraphTokenFilter
    (Mike McCandless)
  • Release 3.6.0 [2012-04-12]

  • More information about this release, including any errata related to the release notes, upgrade instructions, or other changes may be found online at: https://wiki.apache.org/lucene-java/Lucene3.6
  • Changes in backwards compatibility policy (15)
  • LUCENE-3594 : The protected inner class (never intended to be visible) FieldCacheTermsFilter.FieldCacheTermsFilterDocIdSet was removed and replaced by another internal implementation.
    (Uwe Schindler)
  • LUCENE-3620 : FilterIndexReader now overrides all methods of IndexReader that it should (note that some are still not overridden, as they should be overridden by sub-classes only). In the process, some methods of IndexReader were made final. This is not expected to affect many apps, since these methods already delegate to abstract methods, which you had to already override anyway.
    (Shai Erera)
  • LUCENE-3636 : Added SearcherFactory, used by SearcherManager and NRTManager to create new IndexSearchers. You can provide your own implementation to warm new searchers, set an ExecutorService, set a custom Similarity, or even return your own subclass of IndexSearcher. The SearcherWarmer and ExecutorService parameters on these classes were removed, as they are subsumed by SearcherFactory.
    (Shai Erera, Mike McCandless, Robert Muir)
  • LUCENE-3644 : The expert ReaderFinishedListener api suffered problems (propagated down to subreaders, but was not called on SegmentReaders, unless they were the owner of the reader core, and other ambiguities). The API is revised: You can set ReaderClosedListeners on any IndexReader, and onClose is called when that reader is closed. SegmentReader has CoreClosedListeners that you can register to know when a shared reader core is closed.
    (Uwe Schindler, Mike McCandless, Robert Muir)
  • LUCENE-3652 : The package org.apache.lucene.messages was moved to contrib/queryparser. If you have used those classes in your code just add the lucene-queryparser.jar file to your classpath.
    (Uwe Schindler)
  • LUCENE-3681 : FST now stores labels for BYTE2 input type as 2 bytes instead of vInt; this can make FSTs smaller and faster, but it is a break in the binary format so if you had built and saved any FSTs then you need to rebuild them.
    (Robert Muir, Mike McCandless)
  • LUCENE-3679 : The expert IndexReader.getFieldNames(FieldOption) API has been removed and replaced with the experimental getFieldInfos API. All IndexReader subclasses must implement getFieldInfos.
    (Mike McCandless)
  • LUCENE-3695 : Move confusing add(X) methods out of FST.Builder into FST.Util.
    (Robert Muir, Mike McCandless)
  • LUCENE-3701 : Added an additional argument to the expert FST.Builder ctor to take FreezeTail, which you can use to (very-expertly) customize the FST construction process. Pass null if you want the default behavior. Added seekExact() to FSTEnum, and added FST.save/read from a File.
    (Mike McCandless, Dawid Weiss, Robert Muir)
  • LUCENE-3712 : Removed unused and untested ReaderUtil#subReader methods.
    (Uwe Schindler)
  • LUCENE-3672 : Deprecate Directory.fileModified, IndexCommit.getTimestamp and .getVersion and IndexReader.lastModified and getCurrentVersion
    (Andrzej Bialecki, Robert Muir, Mike McCandless)
  • LUCENE-3760 : In IndexReader/DirectoryReader, deprecate static methods getCurrentVersion and getCommitUserData, and non-static method getCommitUserData (use getIndexCommit().getUserData() instead).
    (Ryan McKinley, Robert Muir, Mike McCandless)
  • LUCENE-3867 : Deprecate instance creation of RamUsageEstimator, instead the new static method sizeOf(Object) should be used. As the algorithm is now using Hotspot(TM) internals (reference size, header sizes, object alignment), the abstract o.a.l.util.MemoryModel class was completely removed (without replacement). The new static methods no longer support String intern-ness checking, interned strings now count to memory usage as any other Java object.
    (Dawid Weiss, Uwe Schindler, Shai Erera)
  • LUCENE-3738 : All readXxx methods in BufferedIndexInput were made final. Subclasses should only override protected readInternal / seekInternal.
    (Uwe Schindler)
  • LUCENE-2599 : Deprecated the spatial contrib module, which was buggy and not well maintained. Lucene 4 includes a new spatial module that replaces this.
    (David Smiley, Ryan McKinley, Chris Male)
  • Changes in Runtime Behavior (3)
  • LUCENE-3796 , SOLR-3241 : Throw an exception if you try to set an index-time boost on a field that omits norms. Because the index-time boost is multiplied into the norm, previously your boost would be silently discarded.
    (Tomás Fernández Löbbe, Hoss Man, Robert Muir)
  • LUCENE-3848 : Fixed TokenStreams so they no longer produce a stream with an initial position increment of 0, which is out of bounds (it would overlap with a non-existent previous term). Consumers such as IndexWriter and QueryParser still check for and silently correct this situation today, but at some point in the future they may throw an exception.
    (Mike McCandless, Robert Muir)
  • LUCENE-3738 : DataInput/DataOutput no longer allow negative vLongs. Negative vInts are still supported (for index backwards compatibility), but should not be used in new code. The read method for negative vLongs was already broken since Lucene 3.1.
    (Uwe Schindler, Mike McCandless, Robert Muir)
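LUCENE-3738 above concerns Lucene's variable-length vInt/vLong encoding, which stores 7 bits per byte, low-order groups first, and sets each byte's high bit when more bytes follow. The self-contained sketch below shows why negative values are awkward under this scheme: a negative int always needs the maximum 5 bytes. VarInts and its static helpers are hypothetical, not the DataInput/DataOutput methods, and rejecting negative vLongs with an IllegalArgumentException is this sketch's choice (Lucene's exact check may differ).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical standalone helpers illustrating a 7-bits-per-byte variable-length encoding.
class VarInts {
  // Low 7 bits per byte, least-significant group first; high bit set means "more bytes follow".
  static void writeVInt(ByteArrayOutputStream out, int i) {
    while ((i & ~0x7F) != 0) {
      out.write((i & 0x7F) | 0x80);
      i >>>= 7; // unsigned shift: a negative int still terminates, after 5 bytes
    }
    out.write(i);
  }

  static int readVInt(ByteBuffer in) {
    byte b = in.get();
    int i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = in.get();
      i |= (b & 0x7F) << shift;
    }
    return i;
  }

  // Following the 3.6 rule, negative vLongs are refused instead of being written.
  static void writeVLong(ByteArrayOutputStream out, long i) {
    if (i < 0) {
      throw new IllegalArgumentException("cannot write negative vLong: " + i);
    }
    while ((i & ~0x7FL) != 0L) {
      out.write((int) ((i & 0x7FL) | 0x80L));
      i >>>= 7;
    }
    out.write((int) i);
  }

  static long readVLong(ByteBuffer in) {
    byte b = in.get();
    long i = b & 0x7FL;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = in.get();
      i |= (b & 0x7FL) << shift;
    }
    return i;
  }
}
```

For example, 300 encodes as the two bytes 0xAC 0x02, while -1 as a vInt takes five bytes.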
  • Security fixes (1)
  • LUCENE-3588 : Try harder to prevent SIGSEGV on cloned MMapIndexInputs: Previous versions of Lucene could SIGSEGV the JVM if you try to access the clone of an IndexInput retrieved from MMapDirectory. This security fix prevents this as best as it can by throwing AlreadyClosedException also on clones.
    (Uwe Schindler, Robert Muir)
  • API Changes (7)
  • LUCENE-3606 : IndexReader will be made read-only in Lucene 4.0, so all methods that delete or undelete documents through IndexReader were deprecated; use IndexWriter instead. Consequently, IndexReader.commit() and all open(), openIfChanged(), and clone() methods taking readOnly booleans (or IndexDeletionPolicy instances) were deprecated. IndexReader.setNorm() is superfluous and was deprecated. If you need to change per-document boosts, use CustomScoreQuery. If you want to dynamically change norms (boost *and* length norm) at query time, wrap your IndexReader in a FilterIndexReader that overrides FilterIndexReader.norms(). To persist the changes on disk, copy the filtered reader to a new index using IndexWriter.addIndexes(). In Lucene 4.0, SimilarityProvider will allow you to customize scoring using external norms, too.
    (Uwe Schindler, Robert Muir)
  • LUCENE-3735 : PayloadProcessorProvider was changed to return a ReaderPayloadProcessor instead of DirPayloadProcessor. The selection of the provider to return for the factory is now based on the IndexReader to be merged. To mimic the old behaviour, just use IndexReader.directory() for choosing the provider by Directory.
    (Uwe Schindler)
  • LUCENE-3765 : Deprecated StopFilter ctor that took ignoreCase, because in some cases (if the set is a CharArraySet), the argument is ignored. Deprecated StandardAnalyzer and ClassicAnalyzer ctors that take File, please use the Reader ctor instead.
    (Robert Muir)
  • LUCENE-3766 : Deprecate no-arg ctors of Tokenizer. Tokenizers are TokenStreams with Readers: tokenizers with null Readers will not be supported in Lucene 4.0, just use a TokenStream.
    (Mike McCandless, Robert Muir)
  • LUCENE-3769 : Simplified NRTManager by requiring applyDeletes to be passed to ctor only; if an app needs to mix and match it's free to create two NRTManagers (one always applying deletes and the other never applying deletes).
    (MJB, Shai Erera, Mike McCandless)
  • LUCENE-3761 : Generalize SearcherManager into an abstract ReferenceManager. SearcherManager remains a concrete class, but due to the refactoring, the method maybeReopen has been deprecated in favor of maybeRefresh().
    (Shai Erera, Mike McCandless, Simon Willnauer)
  • LUCENE-3776 : You now acquire/release the IndexSearcher directly from NRTManager.
    (Mike McCandless)
  • New Features (11)
  • LUCENE-3593 : Added a FieldValueFilter that accepts all documents that have at least one value in a specific field or, alternatively, all documents that have no value at all in that field.
    (Simon Willnauer, Uwe Schindler, Robert Muir)
  • LUCENE-3586 : CheckIndex and IndexUpgrader allow you to specify the specific FSDirectory implementation to use (with the new -dir-impl command-line option).
    (Luca Cavanna via Mike McCandless)
  • LUCENE-3634 : IndexReader's static main method was moved to a new tool, CompoundFileExtractor, in contrib/misc.
    (Robert Muir, Mike McCandless)
  • LUCENE-995 : The QueryParser now interprets * as an open end for range queries. Literal asterisks may be represented by quoting or escaping (i.e. \* or "*"). Custom QueryParser subclasses overriding getRangeQuery() will be passed null for any open endpoint.
    (Ingo Renner, Adriano Crestani, yonik, Mike McCandless)
  • LUCENE-3121 : Add sugar reverse lookup (given an output, find the input mapping to it) for FSTs that have strictly monotonic long outputs (such as an ord).
    (Mike McCandless)
  • LUCENE-3671 : Add TypeTokenFilter that filters tokens based on their TypeAttribute.
    (Tommaso Teofili via Uwe Schindler)
  • LUCENE-3690 , LUCENE-3913 : Added HTMLStripCharFilter, a CharFilter that strips HTML markup.
    (Steve Rowe)
  • LUCENE-3725 : Added optional packing to FST building; this uses extra RAM during building but results in a smaller FST.
    (Mike McCandless)
  • LUCENE-3714 : Add top N shortest cost paths search for FST.
    (Robert Muir, Dawid Weiss, Mike McCandless)
  • LUCENE-3789 : Expose MTQ TermsEnum via RewriteMethod for non package private access
    (Simon Willnauer)
  • LUCENE-3881 : Added UAX29URLEmailAnalyzer: a standard analyzer that recognizes URLs and emails.
    (Steve Rowe)
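LUCENE-995 above lets a bare `*` stand for an open end in range queries and passes null to getRangeQuery() for any open endpoint, while `\*` or `"*"` still mean a literal asterisk. That convention can be sketched with a hypothetical helper (OpenRange is made up here; it is not the QueryParser grammar, and the inclusive comparison is an assumption of the sketch).

```java
// Hypothetical helper illustrating the LUCENE-995 convention; not the QueryParser grammar itself.
class OpenRange {
  // A bare "*" is an open endpoint (null); \* or "*" (quoted) is a literal asterisk.
  static String parseEndpoint(String token) {
    if (token.equals("*")) {
      return null;
    }
    if (token.equals("\\*") || token.equals("\"*\"")) {
      return "*";
    }
    return token;
  }

  // Inclusive range check in which a null bound means "unbounded on that side".
  static boolean inRange(String term, String lower, String upper) {
    return (lower == null || term.compareTo(lower) >= 0)
        && (upper == null || term.compareTo(upper) <= 0);
  }
}
```

A subclass receiving a null endpoint therefore has to treat it as "no limit" rather than as a term to compare against, which is what inRange does here.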
  • Bug fixes (17)
  • LUCENE-3595 : Fixed FieldCacheRangeFilter and FieldCacheTermsFilter to correctly respect deletions on reopened SegmentReaders. Factored out FieldCacheDocIdSet to be a top-level class.
    (Uwe Schindler, Simon Willnauer)
  • LUCENE-3627 : Don't let an errant 0-byte segments_N file corrupt the index.
    (Ken McCracken via Mike McCandless)
  • LUCENE-3630 : The internal method MultiReader.doOpenIfChanged(boolean doClone) was overriding IndexReader.doOpenIfChanged(boolean readOnly), thereby changing the contract of the overridden method. This method was renamed and made private. The bug did not exist in ParallelReader, but its implementation method was also made private.
    (Uwe Schindler)
  • LUCENE-3641 : Fixed MultiReader to correctly propagate readerFinishedListeners to clones/reopened readers.
    (Uwe Schindler)
  • LUCENE-3642 , SOLR-2891 , LUCENE-3717 : Fixed bugs in CharTokenizer, n-gram tokenizers/filters, compound token filters, thai word filter, icutokenizer, pattern analyzer, wikipediatokenizer, and smart chinese where they would create invalid offsets in some situations, leading to problems in highlighting.
    (Max Beutel, Edwin Steiner via Robert Muir)
  • LUCENE-3639 : TopDocs.merge was incorrectly setting TopDocs.maxScore to Float.MIN_VALUE when it should be Float.NaN, when there were 0 hits. Improved age calculation in SearcherLifetimeManager, to have double precision and to compute age to be how long ago the searcher was replaced with a new searcher
    (Mike McCandless)
  • LUCENE-3658 : Corrected potential concurrency issues with NRTCachingDir, fixed createOutput to overwrite any previous file, and removed invalid asserts
    (Robert Muir, Mike McCandless)
  • LUCENE-3605 : don't sleep in a retry loop when trying to locate the segments_N file
    (Robert Muir, Mike McCandless)
  • LUCENE-3711 : SentinelIntSet with a small initial size can go into an infinite loop when expanded. This can affect grouping using TermAllGroupsCollector or TermAllGroupHeadsCollector if instantiated with a non default small size.
    (Martijn van Groningen, yonik)
  • LUCENE-3727 : When writing stored fields and term vectors, Lucene checks file sizes to detect a bug in some Sun JREs ( LUCENE-1282 ), however, on some NFS filesystems File.length() could be stale, resulting in false errors like "fdx size mismatch while indexing". These checks now use getFilePointer instead to avoid this.
    (Jamir Shaikh, Mike McCandless, Robert Muir)
  • LUCENE-3816 : Fixed problem in FilteredDocIdSet, if null was returned from the delegate DocIdSet.iterator(), which is allowed to return null by DocIdSet specification when no documents match.
    (Shay Banon via Uwe Schindler)
  • LUCENE-3821 : SloppyPhraseScorer missed documents that ExactPhraseScorer finds: when a phrase query had repeating terms (e.g. "yes no yes"), the sloppy query missed documents that the exact query matched. This is fixed, except for repeating multiterms (e.g. "yes no yes|no").
    (Robert Muir, Doron Cohen)
  • LUCENE-3841 : Fix CloseableThreadLocal to also purge stale entries on get(); this fixes certain cases where we were holding onto objects for dead threads for too long
    (Matthew Bellew, Mike McCandless)
  • LUCENE-3872 : IndexWriter.close() now throws IllegalStateException if you call it after calling prepareCommit() without calling commit() first.
    (Tim Bogaert via Mike McCandless)
  • LUCENE-3874 : Throw IllegalArgumentException from IndexWriter (rather than producing a corrupt index), if a positionIncrement would cause integer overflow. This can happen, for example when using a buggy TokenStream that forgets to call clearAttributes() in combination with a StopFilter.
    (Robert Muir)
  • LUCENE-3876 : Fix bug where positions for a document exceeding Integer.MAX_VALUE/2 would produce a corrupt index.
    (Simon Willnauer, Mike McCandless, Robert Muir)
  • LUCENE-3880 : UAX29URLEmailTokenizer now recognizes emails when the mailto: scheme is prepended.
    (Kai Gülzau, Steve Rowe)
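LUCENE-3874 above makes IndexWriter throw IllegalArgumentException, rather than write a corrupt index, when a position increment would cause integer overflow. A minimal sketch of that kind of guard uses Math.addExact so overflow surfaces as an exception instead of wrapping to a negative position. Positions is a hypothetical helper, not IndexWriter's code, and starting from -1 (so a first increment of 1 lands on position 0) is an assumption of this sketch.

```java
// Hypothetical sketch of accumulating token positions with an overflow check; not IndexWriter's code.
class Positions {
  // Assumes positions start at -1 so that a first increment of 1 lands on position 0.
  static int[] positions(int... increments) {
    int[] result = new int[increments.length];
    int position = -1;
    for (int i = 0; i < increments.length; i++) {
      try {
        position = Math.addExact(position, increments[i]);
      } catch (ArithmeticException e) {
        // Reject the input instead of silently wrapping to a negative position.
        throw new IllegalArgumentException(
            "position overflowed Integer.MAX_VALUE at token " + i, e);
      }
      result[i] = position;
    }
    return result;
  }
}
```

Increments of 1, 1, 2 yield positions 0, 1, 3, while a buggy stream that emits an increment near Integer.MAX_VALUE followed by another token is rejected.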
  • Optimizations (1)
  • LUCENE-3653 : Improve concurrency in VirtualMethod and AttributeSource by using a WeakIdentityMap based on a ConcurrentHashMap.
    (Uwe Schindler, Gerrit Jansen van Vuuren)
  • Documentation (2)
  • LUCENE-3597 : Fixed incorrect grouping documentation.
    (Martijn van Groningen, Robert Muir)
  • LUCENE-3926 : Improve documentation of RAMDirectory, because this class is not intended to work with huge indexes. Everything beyond several hundred megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024 bytes, producing millions of byte[1024] arrays. This class is optimized for small memory-resident indexes. It also has bad concurrency on multithreaded environments. It is recommended to materialize large indexes on disk and use MMapDirectory, which is a high-performance directory implementation working directly on the file system cache of the operating system, so copying data to Java heap space is not useful.
    (Uwe Schindler, Mike McCandless, Robert Muir)
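The LUCENE-3926 note above is easy to check with arithmetic: with 1024-byte buffers, 500 MiB of index data already needs 512,000 separate byte[1024] arrays, and 1 GiB needs 1,048,576, each one an object for the garbage collector to track. A trivial helper makes the rounding explicit; BufferCount is hypothetical and ignores that RAMDirectory allocates buffers per file.

```java
// Back-of-the-envelope count of 1024-byte buffers for in-memory data (hypothetical helper).
class BufferCount {
  static final int BUFFER_SIZE = 1024; // internal buffer size quoted in LUCENE-3926

  // Round up: a partially filled buffer still costs a whole byte[1024].
  static long buffersFor(long bytes) {
    return (bytes + BUFFER_SIZE - 1) / BUFFER_SIZE;
  }
}
```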
  • Build (8)
  • LUCENE-3857 : exceptions from other threads in beforeclass/etc do not fail the test
    (Dawid Weiss)
  • LUCENE-3847 : LuceneTestCase will now check for modifications of System properties before and after each test (and suite). If changes are detected, the test will fail. A rule can be used to reset system properties to before-scope state (and this has been used to make Solr tests pass).
    (Dawid Weiss, Uwe Schindler)
  • LUCENE-3228 : Stop downloading external javadoc package-list files: Added package-list files for Oracle Java javadocs and JUnit javadocs to Lucene/Solr subversion. The Oracle Java javadocs package-list file is excluded from Lucene and Solr source release packages. Regardless of network connectivity, javadocs built from a subversion checkout contain links to Oracle & JUnit javadocs. Building javadocs from a source release package will download the Oracle Java package-list file if it isn't already present. When the Oracle Java package-list file is not present and download fails, the javadocs targets will not fail the build, though an error will appear in the build log. In this case, the built javadocs will not contain links to Oracle Java javadocs. Links from Solr javadocs to Lucene's javadocs are enabled. When building an X.Y.Z-SNAPSHOT version, the links are to the most recently built nightly Jenkins javadocs. When building a release version, links are to the Lucene release javadocs for the same version.
    (Steve Rowe, hossman)
  • LUCENE-3753 : Restructure the Lucene build system: Created a new Lucene-internal module named "core" by moving the java/ and test/ directories from lucene/src/ to lucene/core/src/. Eliminated lucene/src/ by moving all its directories up one level. Each internal module (core/, test-framework/, and tools/) now has its own build.xml, from which it is possible to run module-specific targets. lucene/build.xml delegates all build tasks (via <ant dir="internal-module-dir"> calls) to these modules' build.xml files.
    (Steve Rowe)
  • LUCENE-3774 : Optimized and streamlined license and notice file validation by refactoring the build task into an ANT task and modifying build scripts to perform top-level checks.
    (Dawid Weiss, Steve Rowe, Robert Muir)
  • LUCENE-3762 : Upgrade JUnit to 4.10, refactor state-machine of detecting setUp/tearDown call chaining in LuceneTestCase.
    (Dawid Weiss, Robert Muir)
  • LUCENE-3944 : Make the 'generate-maven-artifacts' target use filtered POMs placed under lucene/build/poms/, rather than in each module's base directory. The 'clean' target now removes them.
    (Steve Rowe, Robert Muir)
  • LUCENE-3930 : Changed build system to use Apache Ivy for retrieval of third-party JAR files. Please review BUILD.txt for instructions.
    (Robert Muir, Chris Male, Uwe Schindler, Steven Rowe, Hossman)
  • Release 3.5.0 [2011-11-11]

  • Changes in backwards compatibility policy (3)
  • LUCENE-3390 : The first approach in Lucene 3.4.0 for missing values support for sorting had a design problem that made the missing value be populated directly into the FieldCache arrays during sorting, leading to concurrency issues. To fix this behaviour, the method signatures had to be changed: FieldCache.getUnValuedDocs() was renamed to FieldCache.getDocsWithField() returning a Bits interface (backported from Lucene 4.0). FieldComparator.setMissingValue() was removed; the missing value is now passed to the constructor. As this is an expert API, most code will not be affected.
    (Uwe Schindler, Doron Cohen, Mike McCandless)
  • LUCENE-3541 : Remove IndexInput's protected copyBuf. If you want to keep a buffer in your IndexInput, do this yourself in your implementation, and be sure to do the right thing on clone()!
    (Robert Muir)
  • LUCENE-2822 : TimeLimitingCollector now expects a counter clock instead of relying on a private daemon thread. The global time limiting clock thread has been exposed and is now lazily loaded and fully optional. TimeLimitingCollector now supports setting the clock baseline manually to include the prelude of a search. Previous versions set the baseline at construction time; now the baseline is set when the first IndexReader is passed to the collector, unless it was set explicitly before.
    (Simon Willnauer)
  • Changes in runtime behavior (1)
  • LUCENE-3520 : IndexReader.openIfChanged, when passed a near-real-time reader, will now return null if there are no changes. The API has always reserved the right to do this; it's just that in the past for near-real-time readers it never did.
    (Mike McCandless)
  • Bug fixes (14)
  • LUCENE-3412 : SloppyPhraseScorer was returning non-deterministic results for queries with many repeats
    (Doron Cohen)
  • LUCENE-3421 : PayloadTermQuery's explain was wrong when includeSpanScore=false.
    (Edward Drapkin via Robert Muir)
  • LUCENE-3432 : IndexWriter.expungeDeletes with TieredMergePolicy should ignore the maxMergedSegmentMB setting
    (v.sevel via Mike McCandless)
  • LUCENE-3442 : TermQuery.TermWeight.scorer() returns null for non-atomic IndexReaders (optimization bug, introduced by LUCENE-2829 ), preventing QueryWrapperFilter and similar classes from getting a top-level DocIdSet.
    (Dan C., Uwe Schindler)
  • LUCENE-3390 : Corrected handling of missing values when two parallel searches use different missing values for sorting: the missing value was populated directly into the FieldCache arrays during sorting, leading to concurrency issues.
    (Uwe Schindler, Doron Cohen, Mike McCandless)
  • LUCENE-3439 : Closing an NRT reader after the writer was closed was incorrectly invoking the DeletionPolicy (and then possibly deleting files) on the closed IndexWriter.
    (Robert Muir, Mike McCandless)
  • LUCENE-3215 : SloppyPhraseScorer sometimes computed an infinite freq.
    (Robert Muir, Doron Cohen)
  • LUCENE-3503 : DisjunctionSumScorer would give slightly different scores for a document depending on whether you used nextDoc() or advance().
    (Mike McCandless, Robert Muir)
  • LUCENE-3529 : Properly support indexing an empty field with empty term text. Previously, if you had assertions enabled, you would receive an error during flush; if you didn't, you would get an invalid index.
    (Mike McCandless, Robert Muir)
  • LUCENE-2633 : PackedInts Packed32 and Packed64 did not support internal structures larger than 256MB
    (Toke Eskildsen via Mike McCandless)
  • LUCENE-3540 : LUCENE-3255 dropped support for pre-1.9 indexes, but the error message in IndexFormatTooOldException was incorrect.
    (Uwe Schindler, Mike McCandless)
  • LUCENE-3541 : IndexInput's default copyBytes() implementation was not safe across multiple threads, because all clones shared the same buffer.
    (Robert Muir)
  • LUCENE-3548 : Fix CharsRef#append to extend length of the existing char[] and preserve existing chars.
    (Simon Willnauer)
  • LUCENE-3582 : Normalize NaN values in NumericUtils.floatToSortableInt() / NumericUtils.doubleToSortableLong(), so this is consistent with stored fields. Also fix NumericRangeQuery to not falsely hit NaNs on half-open ranges (one bound is null). Because of normalization, NumericRangeQuery can now be used to hit NaN values by creating a query with upper == lower == NaN (inclusive).
    (Dawid Weiss, Uwe Schindler)
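The NaN fix in LUCENE-3582 rests on one JDK detail: Float.floatToIntBits collapses every NaN bit pattern into a single canonical value, while Float.floatToRawIntBits preserves the payload. The following is a minimal sketch of an order-preserving float-to-int mapping built on that detail, assuming the usual sign-flip trick; the class name SortableFloatSketch is hypothetical and this is not Lucene's NumericUtils source. Because all NaNs map to one int, an inclusive range whose lower and upper bounds are both NaN selects exactly that value.

```java
// Sketch of an order-preserving float -> int mapping (hypothetical class, not Lucene's NumericUtils).
final class SortableFloatSketch {
    // Maps a float to an int whose signed order matches the float's numeric order.
    // Float.floatToIntBits (unlike floatToRawIntBits) collapses every NaN bit pattern
    // into the canonical 0x7fc00000, so all NaNs end up as one sortable value.
    static int floatToSortableInt(float value) {
        int bits = Float.floatToIntBits(value);
        // Negative floats have the sign bit set; flipping the other 31 bits reverses
        // their order so that more negative values produce smaller ints.
        if (bits < 0) bits ^= 0x7fffffff;
        return bits;
    }

    // Inverse mapping: the XOR keeps the sign bit, so the same test undoes it.
    static float sortableIntToFloat(int bits) {
        if (bits < 0) bits ^= 0x7fffffff;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float otherNaN = Float.intBitsToFloat(0x7fc00001); // NaN with a non-canonical payload
        System.out.println(floatToSortableInt(otherNaN) == floatToSortableInt(Float.NaN)); // true
        System.out.println(floatToSortableInt(Float.POSITIVE_INFINITY) < floatToSortableInt(Float.NaN)); // true
    }
}
```

In this mapping NaN sorts above positive infinity, and -0.0f sorts just below 0.0f.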
  • API Changes (6)
  • LUCENE-3454 : Rename IndexWriter.optimize to forceMerge to discourage use of this method since it is horribly costly and rarely justified anymore. MergePolicy.findMergesForOptimize was renamed to findForcedMerges. IndexReader.isOptimized was deprecated. IndexCommit.isOptimized was replaced with getSegmentCount.
    (Robert Muir, Mike McCandless)
  • LUCENE-3205 : Deprecated MultiTermQuery.getTotalNumberOfTerms() [and related methods], as the numbers returned are not useful for multi-segment indexes. They were only needed for tests of NumericRangeQuery.
    (Mike McCandless, Uwe Schindler)
  • LUCENE-3574 : Deprecate outdated constants in org.apache.lucene.util.Constants and add new ones for Java 6 and Java 7.
    (Uwe Schindler)
  • LUCENE-3571 : Deprecate IndexSearcher(Directory). Use the constructors that take IndexReader instead.
    (Robert Muir)
  • LUCENE-3577 : Rename IndexWriter.expungeDeletes to forceMergeDeletes, and revamped the javadocs, to discourage use of this method since it is horribly costly and rarely justified. MergePolicy.findMergesToExpungeDeletes was renamed to findForcedDeletesMerges.
    (Robert Muir, Mike McCandless)
  • LUCENE-3464 : IndexReader.reopen has been renamed to IndexReader.openIfChanged (a static method), and now returns null (instead of the old reader) if there are no changes in the index, to prevent the common pitfall of accidentally closing the old reader.
  • LUCENE-3448 : Added FixedBitSet.and(other/DISI), andNot(other/DISI).
    (Uwe Schindler)
  • LUCENE-2215 : Added IndexSearcher.searchAfter which returns results after a specified ScoreDoc (e.g. last document on the previous page) to support deep paging use cases.
    (Aaron McCurry, Grant Ingersoll, Robert Muir)
  • LUCENE-1990 : Added an internal packed ints implementation, to be used for more efficient storage of int arrays when the values are bounded, for example for storing the terms dict index.
    (Toke Eskildsen via Mike McCandless)
  • LUCENE-3558 : Moved SearcherManager, NRTManager & SearcherLifetimeManager into core. All classes are contained in o.a.l.search.
    (Simon Willnauer)
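The packed ints idea in LUCENE-1990 is to store each bounded value in only as many bits as the largest value needs, packing them back to back into a long[]. Below is a minimal sketch of that layout, where a value may straddle two adjacent longs; the class PackedSketch and its methods are hypothetical and do not reproduce Lucene's PackedInts API.

```java
// Sketch of fixed-width bit packing into a long[] (hypothetical class, not Lucene's PackedInts).
final class PackedSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedSketch(int valueCount, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 63) throw new IllegalArgumentException("bitsPerValue must be in [1, 63]");
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) valueCount * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)]; // round up to whole longs
    }

    // Smallest bit width able to hold every value in [0, maxValue].
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    int blockCount() {
        return blocks.length;
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {
            // The value straddles two longs: pull its high bits from the next block.
            value |= blocks[block + 1] << bitsInFirst;
        }
        return value & mask;
    }

    void set(int index, long value) {
        if ((value & ~mask) != 0) throw new IllegalArgumentException("value does not fit in " + bitsPerValue + " bits");
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        // Clear the old low part, then write the new one.
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {
            long highMask = (1L << (bitsPerValue - bitsInFirst)) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (value >>> bitsInFirst);
        }
    }

    public static void main(String[] args) {
        int bits = bitsRequired(1000);             // 10 bits cover 0..1023
        PackedSketch packed = new PackedSketch(1_000, bits);
        for (int i = 0; i < 1_000; i++) packed.set(i, i);
        System.out.println(packed.get(999));       // 999
        System.out.println(packed.blockCount());   // 157 longs instead of 1000 ints
    }
}
```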
  • Optimizations (5)
  • LUCENE-3426 : Add NGramPhraseQuery, which extends PhraseQuery and tries to reduce the number of terms in the query when it is rewritten (rewrite()), in order to improve performance.
    (Robert Muir, Koji Sekiguchi)
  • LUCENE-3494 : Optimize FilteredQuery to remove a multiply in score()
    (Uwe Schindler, Robert Muir)
  • LUCENE-3534 : Remove filter logic from IndexSearcher and delegate to FilteredQuery's Scorer. This is a partial backport of a cleanup in FilteredQuery/IndexSearcher added by LUCENE-1536 to Lucene 4.0.
    (Uwe Schindler)
  • LUCENE-2205 : Very substantial (3-5X) reduction of the RAM required to hold the terms index when opening an IndexReader.
    (Aaron McCurry via Mike McCandless)
  • LUCENE-3443 : FieldCache can now set docsWithField, and create an array, in a single pass. This results in faster init time for apps that need both (such as sorting by a field with a missing value).
    (Mike McCandless)
  • Test Cases (2)
  • LUCENE-3420 : Disable the finalness checks in TokenStream and Analyzer for implementing subclasses in different packages, where assertions are not enabled.
    (Uwe Schindler)
  • LUCENE-3506 : Tests relying on assertions being enabled were no-ops because they ignored AssertionError. With this fix, the entire test framework (every test) fails if assertions are disabled, unless -Dtests.asserts.gracious=true is specified.
    (Doron Cohen)
  • SOLR-2849 : Fix dependencies in Maven POMs.
    (David Smiley via Steve Rowe)
  • LUCENE-3561 : Fix maven xxx-src.jar files that were missing resources.
    (Uwe Schindler)
  • Release 3.4.0 [2011-09-15]

  • Bug fixes (12)
  • LUCENE-3251 : Directory#copy failed to close target output if opening the source stream failed.
    (Simon Willnauer)
  • LUCENE-3255 : If segments_N file is all zeros (due to file corruption), don't read that to mean the index is empty.
    (Gregory Tarr, Mark Harwood, Simon Willnauer, Mike McCandless)
  • LUCENE-3254 : Fixed a minor bug in how deletes were written to disk, causing the file to sometimes be larger than it needed to be.
    (Mike McCandless)
  • LUCENE-3224 : Fixed a bug where CheckIndex would incorrectly report a corrupt index if a term with docfreq >= 16 was indexed more than once at the same position.
    (Robert Muir)
  • LUCENE-3339 : Fixed deadlock case when multiple threads use the new block-add (IndexWriter.add/updateDocuments) methods.
    (Robert Muir, Mike McCandless)
  • LUCENE-3340 : Fixed case where IndexWriter was not flushing at exactly maxBufferedDeleteTerms
    (Mike McCandless)
  • LUCENE-3358 , LUCENE-3361 : StandardTokenizer and UAX29URLEmailTokenizer wrongly discarded combining marks attached to Han or Hiragana characters. This is fixed if you supply Version >= 3.4; if you supply a previous Lucene version, you get the old buggy behavior for backwards compatibility.
    (Trejkaz, Robert Muir)
  • LUCENE-3368 : IndexWriter commits segments without applying their buffered deletes when flushing concurrently.
    (Simon Willnauer, Mike McCandless)
  • LUCENE-3365 : Determining Create or Append mode before obtaining the write lock could cause IndexWriter to overwrite an existing index.
    (Geoff Cooney via Simon Willnauer)
  • LUCENE-3380 : Fixed a bug where FileSwitchDirectory's listAll() would wrongly throw NoSuchDirectoryException when all files written so far have been written to one directory, but the other still has not yet been created on the filesystem.
    (Robert Muir)
  • LUCENE-3409 : IndexWriter.deleteAll was failing to close pooled NRT SegmentReaders, leading to unused files accumulating in the Directory.
    (tal steier via Mike McCandless)
  • LUCENE-3418 : Lucene was failing to fsync index files on commit, meaning an operating system or hardware crash, or power loss, could easily corrupt the index.
    (Mark Miller, Robert Muir, Mike McCandless)
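LUCENE-3418 is about fsync: data written through a file handle can sit in the operating system's cache until it is explicitly forced to stable storage. A minimal sketch of syncing the files a commit depends on with the JDK's FileChannel.force, assuming a commit order of "sync referenced files first, then publish the segments_N file"; the class FsyncSketch and its methods are hypothetical, not Lucene's Directory.sync implementation.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collection;
import java.util.Collections;

// Sketch of syncing index files before a commit point is published (hypothetical helper).
final class FsyncSketch {
    // Forces the file's contents and metadata to stable storage.
    static void fsync(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            channel.force(true); // true = flush metadata (e.g. file length) as well as contents
        }
    }

    // Syncs every file a commit depends on. Under the assumed commit order, only
    // after this returns would the writer go on to publish the segments_N file.
    static void syncAll(Collection<Path> files) throws IOException {
        for (Path file : files) {
            fsync(file);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("segment", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3});
        syncAll(Collections.singletonList(tmp));
        Files.delete(tmp);
    }
}
```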
  • New Features (5)
  • LUCENE-3290 : Added FieldInvertState.numUniqueTerms
    (Mike McCandless, Robert Muir)
  • LUCENE-3280 : Added FixedBitSet, which is like OpenBitSet but is not elastic (it does not grow on demand if you set/get/clear too-large indices).
    (Mike McCandless)
  • LUCENE-2048 : Added the ability to omit positions but still index term frequencies. You can now control what is indexed into the postings via AbstractField.setIndexOptions: DOCS_ONLY (only documents are indexed; term frequencies and positions are omitted), DOCS_AND_FREQS (only documents and term frequencies are indexed; positions are omitted), or DOCS_AND_FREQS_AND_POSITIONS (full postings: documents, frequencies, and positions). AbstractField.setOmitTermFrequenciesAndPositions is deprecated; use DOCS_ONLY instead.
    (Robert Muir)
  • LUCENE-3097 : Added a new grouping collector that can be used to retrieve the most relevant documents for all groups. This can be useful when one wants to compute grouping-based facets or statistics on the complete query result.
    (Martijn van Groningen)
  • LUCENE-3334 : If Java 7 is detected, IOUtils.closeSafely() will record suppressed exceptions in the original exception, so its stack trace will contain them.
    (Uwe Schindler)
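The Java 7 mechanism behind LUCENE-3334 is Throwable.addSuppressed: when several resources fail to close, the first failure is rethrown and the later ones are attached to it. A minimal sketch of that pattern follows; the helper name closeAll is hypothetical and does not claim to match the signature of IOUtils.closeSafely().

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch of "close everything, keep the first failure" (hypothetical helper, not IOUtils' signature).
final class CloseAllSketch {
    static void closeAll(Closeable... resources) throws IOException {
        IOException first = null;
        for (Closeable resource : resources) {
            if (resource == null) continue; // tolerate resources that were never opened
            try {
                resource.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    // Java 7+: later failures ride along and appear in the stack trace.
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) throw first;
    }

    public static void main(String[] args) {
        Closeable a = () -> { throw new IOException("first"); };
        Closeable b = () -> { throw new IOException("second"); };
        try {
            closeAll(a, null, b);
        } catch (IOException e) {
            e.printStackTrace(); // the "first" trace includes a "Suppressed:" entry for "second"
        }
    }
}
```

Java 7's try-with-resources statement applies the same suppression rule automatically; an explicit helper like this is only needed when resources are closed outside such a block.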
  • Optimizations (2)
  • LUCENE-3201 , LUCENE-3218 : CompoundFileSystem code has been consolidated into a Directory implementation. Reading is optimized for MMapDirectory, NIOFSDirectory and SimpleFSDirectory to only map requested parts of the CFS into an IndexInput. Writing to a CFS now tries to append to the CF directly if possible and merges separately written files on the fly instead of during close.
    (Simon Willnauer, Robert Muir)
  • LUCENE-3289 : When building an FST you can now tune how aggressively the FST should try to share common suffixes. Typically you can greatly reduce RAM required during building, and CPU consumed, at the cost of a somewhat larger FST.
    (Mike McCandless)
  • Test Cases (1)
  • LUCENE-3327 : Fix AIOOBE when TestFSTs is run with -Dtests.verbose=true
    (James Dyer via Mike McCandless)
  • Build (1)
  • LUCENE-3406 : Add ant target 'package-local-src-tgz' to Lucene and Solr to package sources from the local working copy.
    (Seung-Yeoul Yang via Steve Rowe)
  • Release 3.3.0 [2011-07-10]

  • Changes in backwards compatibility policy (4)
  • LUCENE-3140 : IndexOutput.copyBytes now takes a DataInput (superclass of IndexInput) as its first argument.
    (Robert Muir, Dawid Weiss, Mike McCandless)
  • LUCENE-3191 : FieldComparator.value now returns an Object instead of a Comparable; FieldDoc.fields also changed from Comparable[] to Object[].
    (Uwe Schindler, Mike McCandless)
  • LUCENE-3208 : Made deprecated methods Query.weight(Searcher) and Searcher.createWeight() final to prevent override. If you have overridden one of these methods, cut over to the non-deprecated implementation.
    (Uwe Schindler, Robert Muir, Yonik Seeley)
  • LUCENE-3238 : Made MultiTermQuery.rewrite() final, to prevent problems (such as not properly setting rewrite methods, or not working correctly with things like SpanMultiTermQueryWrapper). To rewrite to a simpler form, instead return a simpler enum from getEnum(IndexReader). For example, to rewrite to a single term, return a SingleTermEnum.
    (ludovic Boutros, Uwe Schindler, Robert Muir)
  • Changes in runtime behavior (4)
  • LUCENE-2834 : the hash used to compute the lock file name when the lock file is not stored in the index has changed. This means you will see a different lucene-XXX-write.lock in your lock directory.
    (Robert Muir, Uwe Schindler, Mike McCandless)
  • LUCENE-3146 : IndexReader.setNorm throws IllegalStateException if the field does not store norms.
    (Shai Erera, Mike McCandless)
  • LUCENE-3198 : On Linux, if the JRE is 64 bit and supports unmapping, FSDirectory.open now defaults to MMapDirectory instead of NIOFSDirectory since MMapDirectory gives better performance.
    (Mike McCandless)
  • LUCENE-3200 : MMapDirectory now uses chunk sizes that are powers of 2. When setting the chunk size, it is rounded down to the nearest power of 2. The new default value for 64 bit platforms is 2^30 (1 GiB); for 32 bit platforms it stays unchanged at 2^28 (256 MiB). Internally, MMapDirectory now only uses one dedicated final IndexInput implementation supporting multiple chunks, which makes Hotspot's life easier.
    (Uwe Schindler, Robert Muir, Mike McCandless)
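The rounding rule in LUCENE-3200 maps directly onto Integer.highestOneBit. A side benefit of power-of-two chunks is that a file offset splits into a chunk index and an in-chunk position with a shift and a mask instead of a division. A minimal sketch under those assumptions; the class ChunkSizeSketch and its method names are hypothetical, not MMapDirectory's internals.

```java
// Sketch of power-of-two chunk arithmetic (hypothetical helper, not MMapDirectory's code).
final class ChunkSizeSketch {
    // Rounds a requested chunk size down to the nearest power of two.
    static int roundDownToPowerOfTwo(int requested) {
        if (requested <= 0) throw new IllegalArgumentException("chunk size must be positive");
        return Integer.highestOneBit(requested);
    }

    // With chunkSize == 1 << chunkSizePower, locating a byte needs no division.
    static int chunkIndex(long offset, int chunkSizePower) {
        return (int) (offset >>> chunkSizePower);
    }

    static int chunkOffset(long offset, int chunkSizePower) {
        return (int) (offset & ((1L << chunkSizePower) - 1));
    }

    public static void main(String[] args) {
        System.out.println(roundDownToPowerOfTwo(300_000_000)); // 268435456 (2^28)
        long offset = (3L << 30) + 5;                            // 5 bytes into the 4th 1 GiB chunk
        System.out.println(chunkIndex(offset, 30) + " " + chunkOffset(offset, 30)); // 3 5
    }
}
```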
  • Bug fixes (7)
  • LUCENE-3147 , LUCENE-3152 : Fixed open file handle leaks in many places in the code. Now MockDirectoryWrapper (in test-framework) tracks all open files, including locks, and fails if the test fails to release all of them.
    (Mike McCandless, Robert Muir, Shai Erera, Simon Willnauer)
  • LUCENE-3102 : CachingCollector.replay was failing to call setScorer per-segment
    (Martijn van Groningen via Mike McCandless)
  • LUCENE-3183 : Fix rare corner case where seeking to empty term (field="", term="") with terms index interval 1 could hit ArrayIndexOutOfBoundsException
    (selckin, Robert Muir, Mike McCandless)
  • LUCENE-3208 : IndexSearcher had its own private similarity field and corresponding get/setter overriding Searcher's implementation. If you set a different Similarity instance on IndexSearcher, methods implemented in the superclass Searcher were not using it, leading to strange bugs.
    (Uwe Schindler, Robert Muir)
  • LUCENE-3197 : Fix core merge policies to not over-merge during background optimize when documents are still being deleted concurrently with the optimize
    (Mike McCandless)
  • LUCENE-3222 : The RAM accounting for buffered delete terms was failing to measure the space required to hold the term's field and text character data.
    (Mike McCandless)
  • LUCENE-3238 : Fixed bug where using WildcardQuery("prefix*") inside of a SpanMultiTermQueryWrapper rewrote incorrectly and returned an error instead.
    (ludovic Boutros, Uwe Schindler, Robert Muir)
  • API Changes (2)
  • LUCENE-3208 : Renamed protected IndexSearcher.createWeight() to expert public method IndexSearcher.createNormalizedWeight() as this better describes what this method does. The old method is still there for backwards compatibility. Query.weight() was deprecated and simply delegates to IndexSearcher. Both deprecated methods will be removed in Lucene 4.0.
    (Uwe Schindler, Robert Muir, Yonik Seeley)
  • LUCENE-3197 : MergePolicy.findMergesForOptimize now takes Map<SegmentInfo,Boolean> instead of Set<SegmentInfo> as the second argument, so the merge policy knows which segments were originally present vs produced by an optimizing merge
    (Mike McCandless)
  • Optimizations (1)
  • LUCENE-1736 : DateTools.java general improvements.
    (David Smiley via Steve Rowe)
  • New Features (5)
  • LUCENE-3140 : Added experimental FST implementation to Lucene.
    (Robert Muir, Dawid Weiss, Mike McCandless)
  • LUCENE-3193 : A new TwoPhaseCommitTool allows running a 2-phase commit algorithm over objects that implement the new TwoPhaseCommit interface (such as IndexWriter).
    (Shai Erera)
  • LUCENE-3191 : Added TopDocs.merge, to facilitate merging results from different shards
    (Uwe Schindler, Mike McCandless)
  • LUCENE-3179 : Added OpenBitSet.prevSetBit
    (Paul Elschot via Mike McCandless)
  • LUCENE-3210 : Made TieredMergePolicy more aggressive in reclaiming segments with deletions; added new methods set/getReclaimDeletesWeight to control this.
    (Mike McCandless)
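TopDocs.merge (LUCENE-3191) combines per-shard results into one global ranking. A common way to do this is a k-way merge: keep the current head of every shard's score-sorted list in a priority queue and repeatedly take the best. The sketch below assumes each shard's hits are already sorted by descending score and breaks ties by shard index; the classes ShardMergeSketch and Hit are hypothetical and this is not the TopDocs.merge signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of a k-way merge of per-shard top hits (hypothetical classes, not Lucene's TopDocs).
final class ShardMergeSketch {
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    // Merges per-shard hit lists (each sorted by descending score) into the global top n.
    static List<Hit> mergeTopN(List<List<Hit>> shards, int n) {
        // Queue entries are {shardIndex, positionInShard}; the best current head polls first.
        PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) -> {
            int c = Float.compare(shards.get(b[0]).get(b[1]).score, shards.get(a[0]).get(a[1]).score);
            return c != 0 ? c : Integer.compare(a[0], b[0]); // tie-break: lower shard first
        });
        for (int s = 0; s < shards.size(); s++) {
            if (!shards.get(s).isEmpty()) queue.add(new int[] {s, 0});
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < n && !queue.isEmpty()) {
            int[] head = queue.poll();
            List<Hit> shard = shards.get(head[0]);
            merged.add(shard.get(head[1]));
            // Advance this shard's cursor so its next-best hit competes.
            if (head[1] + 1 < shard.size()) queue.add(new int[] {head[0], head[1] + 1});
        }
        return merged;
    }
}
```

Each poll costs O(log k) for k shards, so producing n hits costs O(n log k) regardless of how many hits each shard returned.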
  • Build (2)
  • LUCENE-1344 : Create OSGi bundle using dev-tools/maven.
    (Nicolas Lalevée, Luca Stancapiano via ryan)
  • LUCENE-3204 : The maven-ant-tasks jar is now included in the source tree; users of the generate-maven-artifacts target no longer have to manually place this jar in the Ant classpath. NOTE: when Ant looks for the maven-ant-tasks jar, it looks first in its pre-existing classpath, so any copies it finds will be used instead of the copy included in the Lucene/Solr source tree. For this reason, it is recommended to remove any copies of the maven-ant-tasks jar in the Ant classpath, e.g. under ~/.ant/lib/ or under the Ant installation's lib/ directory.
    (Steve Rowe)
  • Release 3.2.0 [2011-06-03]

  • Changes in backwards compatibility policy (3)
  • LUCENE-2953 : PriorityQueue's internal heap was made private, as subclassing with generics can lead to ClassCastException. For advanced use (e.g. in Solr) a method getHeapArray() was added to retrieve the internal heap array as a non-generic Object[].
    (Uwe Schindler, Yonik Seeley)
  • LUCENE-1076 : IndexWriter.setInfoStream now throws IOException
    (Mike McCandless, Shai Erera)
  • LUCENE-3084 : MergePolicy.OneMerge.segments was changed from SegmentInfos to a List<SegmentInfo>. SegmentInfos itself was changed to no longer extend Vector<SegmentInfo> (to update code that is using the Vector API, use the new asList() and asSet() methods returning unmodifiable collections; modifying SegmentInfos is now only possible through the explicitly declared methods). IndexWriter.segString() now takes Iterable<SegmentInfo> instead of List<SegmentInfo>. A simple recompile should fix this. MergePolicy and SegmentInfos are internal/experimental APIs not covered by the strict backwards compatibility policy.
    (Uwe Schindler, Mike McCandless)
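The ClassCastException mentioned in LUCENE-2953 is a general Java generics pitfall: a generic class cannot create a T[], so its "T[]" heap is really an Object[], and a subclass that binds T to a concrete type gets a compiler-inserted cast when it reads the field. The sketch below reproduces the failure and shows why returning a plain Object[], as getHeapArray() does, is safe. The classes are hypothetical and are not Lucene's PriorityQueue.

```java
// Why exposing a generic T[] heap is fragile (hypothetical classes, not Lucene's PriorityQueue).
abstract class GenericHeapSketch<T> {
    // Generic arrays cannot be created directly, so the runtime type is Object[].
    @SuppressWarnings("unchecked")
    protected final T[] heap = (T[]) new Object[16];

    // Safe accessor: the erasure of T[] is Object[], the declared return type,
    // so javac inserts no cast here.
    final Object[] getHeapArray() {
        return heap;
    }
}

final class StringHeapSketch extends GenericHeapSketch<String> {
    // Compiles cleanly, but javac inserts a cast to String[] when the field is
    // read here, and that cast fails at runtime because the array is an Object[].
    String[] unsafeHeap() {
        return heap;
    }

    public static void main(String[] args) {
        StringHeapSketch h = new StringHeapSketch();
        System.out.println(h.getHeapArray().length); // 16
        try {
            h.unsafeHeap();
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```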
  • Changes in runtime behavior (2)
  • LUCENE-3065 : When a NumericField is retrieved from a Document loaded from IndexReader (or IndexSearcher), it will now come back as NumericField, not as a Field with a string-ified version of the numeric value you had indexed. Note that this only applies for newly-indexed Documents; older indices will still return Field with the string-ified numeric value. If you call Document.get(), the value still comes back as a String, but Document.getFieldable() returns NumericField instances.
    (Uwe Schindler, Ryan McKinley, Mike McCandless)
  • LUCENE-1076 : Changed the default merge policy from LogByteSizeMergePolicy to TieredMergePolicy, as of Version.LUCENE_32 (passed to IndexWriterConfig), which is able to merge non-contiguous segments. This means docIDs no longer necessarily stay "in order" during indexing. If this is a problem then you can use either of the LogMergePolicy impls.
    (Mike McCandless)
  • New features (5)
  • LUCENE-3082 : Added the index upgrade tool oal.index.IndexUpgrader, which upgrades all segments to the most recent supported index format without fully optimizing.
    (Uwe Schindler, Mike McCandless)
  • LUCENE-1076 : Added TieredMergePolicy which is able to merge non-contiguous segments, which means docIDs no longer necessarily stay "in order".
    (Mike McCandless, Shai Erera)
  • LUCENE-3071 : Added ReversePathHierarchyTokenizer; added a skip parameter to PathHierarchyTokenizer.
    (Olivier Favre via ryan)
  • LUCENE-1421 , LUCENE-3102 : Added CachingCollector, which allows you to cache document IDs and scores encountered during the search, and "replay" them to another Collector.
    (Mike McCandless, Shai Erera)
  • LUCENE-3112 : Added experimental IndexWriter.add/updateDocuments, enabling a block of documents to be indexed, atomically, with guaranteed sequential docIDs.
    (Mike McCandless)
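The replay idea behind CachingCollector (LUCENE-1421) can be shown without Lucene's Collector API: record every (docID, score) pair while forwarding it to the first consumer, then feed the recorded pairs to a second consumer without re-running the search. In the sketch below, HitSink is a hypothetical single-method stand-in for a collector and CachingSinkSketch is a hypothetical class; Lucene's real collector also handles per-segment readers and scorers (the subject of LUCENE-3102), which this sketch omits.

```java
import java.util.Arrays;

// Hypothetical stand-in for a collector: receives one hit at a time.
interface HitSink {
    void collect(int doc, float score);
}

// Sketch of cache-and-replay (hypothetical class, not Lucene's CachingCollector).
final class CachingSinkSketch implements HitSink {
    private final HitSink delegate;
    private int[] docs = new int[8];
    private float[] scores = new float[8];
    private int size;

    CachingSinkSketch(HitSink delegate) {
        this.delegate = delegate;
    }

    @Override
    public void collect(int doc, float score) {
        if (size == docs.length) { // grow both parallel arrays together
            docs = Arrays.copyOf(docs, size * 2);
            scores = Arrays.copyOf(scores, size * 2);
        }
        docs[size] = doc;
        scores[size] = score;
        size++;
        delegate.collect(doc, score); // the first consumer sees hits live
    }

    // Feeds every cached hit, in original order, to another consumer.
    void replay(HitSink other) {
        for (int i = 0; i < size; i++) {
            other.collect(docs[i], scores[i]);
        }
    }

    public static void main(String[] args) {
        CachingSinkSketch cache = new CachingSinkSketch((doc, score) -> { });
        for (int doc = 0; doc < 5; doc++) cache.collect(doc, doc * 0.5f);
        cache.replay((doc, score) -> System.out.println(doc + " " + score));
    }
}
```

The trade-off is memory: caching costs one int and one float per hit, which is why a real implementation would typically cap the cache size.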
  • API Changes (3)
  • LUCENE-3061 : IndexWriter's getNextMerge() and merge(OneMerge) are now public (though @lucene.experimental), allowing for custom MergeScheduler implementations.
    (Shai Erera)
  • LUCENE-3065 : Document.getField() was deprecated, as it throws ClassCastException when loading lazy fields or NumericFields.
    (Uwe Schindler, Ryan McKinley, Mike McCandless)
  • LUCENE-2027 : Directory.touchFile is deprecated and will be removed in 4.0.
    (Mike McCandless)
  • Optimizations (3)
  • LUCENE-2990 : ArrayUtil/CollectionUtil.*Sort() methods now exit early on empty or one-element lists/arrays.
    (Uwe Schindler)
  • LUCENE-2897 : Apply deleted terms while flushing a segment. We still buffer deleted terms to later apply to past segments.
    (Mike McCandless)
  • LUCENE-3126 : IndexWriter.addIndexes copies incoming segments into CFS if they aren't already and MergePolicy allows that.
    (Shai Erera)
  • Bug fixes (6)
  • LUCENE-2996 : addIndexes(IndexReader) did not flush before adding the new indexes, causing existing deletions to be applied on the incoming indexes as well.
    (Shai Erera, Mike McCandless)
  • LUCENE-3024 : Index with more than 2.1B terms was hitting AIOOBE when seeking TermEnum (eg used by Solr's faceting)
    (Tom Burton-West, Mike McCandless)
  • LUCENE-3042 : When a filter or consumer added Attributes to a TokenStream chain after it was already (partly) consumed [or clearAttributes(), captureState(), cloneAttributes(),... was called by the Tokenizer], the Tokenizer calling clearAttributes() or capturing state after addition may not do this on the newly added Attribute. This bug affected only very special use cases of the TokenStream-API; most users would not have noticed it.
    (Uwe Schindler, Robert Muir)
  • LUCENE-3054 : PhraseQuery can in some cases stack overflow in SorterTemplate.quickSort(). This fix also adds an optimization to PhraseQuery, as terms with lower doc freq will also have fewer positions.
    (Uwe Schindler, Robert Muir, Otis Gospodnetic)
  • LUCENE-3068 : sloppy phrase query failed to match valid documents when multiple query terms had same position in the query.
    (Doron Cohen)
  • LUCENE-3012 : Lucene writes the header now for separate norm files (*.sNNN)
    (Robert Muir)
  • Build (2)
  • LUCENE-3006 : Building javadocs will fail on warnings by default. Override with -Dfailonjavadocwarning=false
    (sarowe, gsingers)
  • LUCENE-3128 : "ant eclipse" creates a .project file for easier Eclipse integration (unless one already exists).
    (Daniel Serodio via Shai Erera)
  • Test Cases (1)
  • LUCENE-3002 : Added 'tests.iter.min' to control 'tests.iter': iteration stops once at least 'tests.iter.min' iterations have run and a failure has occurred.
    (Shai Erera, Chris Hostetter)
  • Release 3.1.0 [2011-03-31]

  • Changes in backwards compatibility policy (18)
  • LUCENE-2719 : Changed API of internal utility class org.apache.lucene.util.SorterTemplate to support faster quickSort using pivot values and also merge sort and insertion sort. If you have used this class, you have to implement two more methods for handling pivots.
    (Uwe Schindler, Robert Muir, Mike McCandless)
  • LUCENE-1923 : Renamed SegmentInfo & SegmentInfos segString method to toString. These are advanced APIs and subject to change suddenly.
    (Tim Smith via Mike McCandless)
  • LUCENE-2190 : Removed deprecated customScore() and customExplain() methods from experimental CustomScoreQuery.
    (Uwe Schindler)
  • LUCENE-2286 : Enabled DefaultSimilarity.setDiscountOverlaps by default. This means that terms with a position increment gap of zero do not affect the norms calculation by default.
    (Robert Muir)
  • LUCENE-2320 : MergePolicy.writer is now of type SetOnce, which allows setting the IndexWriter for a MergePolicy exactly once. You can change references to 'writer' from writer.doXYZ() to writer.get().doXYZ() (it is also advisable to add an assert writer != null; before you access the wrapped IndexWriter.) In addition, MergePolicy only exposes a default constructor, and the one that took IndexWriter as argument has been removed from all MergePolicy extensions.
    (Shai Erera via Mike McCandless)
  • LUCENE-2328 : SimpleFSDirectory.SimpleFSIndexInput is moved to FSDirectory.FSIndexInput. Anyone extending this class will have to fix their code on upgrading.
    (Earwin Burrfoot via Mike McCandless)
  • LUCENE-2302 : The new interface for term attributes, CharTermAttribute, now implements CharSequence. This requires the toString() methods of CharTermAttribute, deprecated TermAttribute, and Token to return only the term text and no other attribute contents. LUCENE-2374 implements an attribute reflection API to no longer rely on toString() for attribute inspection.
    (Uwe Schindler, Robert Muir)
  • LUCENE-2372 , LUCENE-2389 : StandardAnalyzer, KeywordAnalyzer, PerFieldAnalyzerWrapper, WhitespaceTokenizer are now final. Also removed the now obsolete and deprecated Analyzer.setOverridesTokenStreamMethod(). Analyzer and TokenStream base classes now have an assertion in their ctor that checks that subclasses are final or at least have final implementations of incrementToken(), tokenStream(), and reusableTokenStream().
    (Uwe Schindler, Robert Muir)
  • LUCENE-2316 : Directory.fileLength contract was clarified - it returns the actual file's length if the file exists, and throws FileNotFoundException otherwise. Returning length=0 for a non-existent file is no longer allowed. If you relied on that, make sure to catch the exception.
    (Shai Erera)
  • LUCENE-2386 : IndexWriter no longer performs an empty commit upon new index creation. Previously, if you passed an empty Directory and set OpenMode to CREATE*, IndexWriter would make a first empty commit. If you need that behavior you can call writer.commit()/close() immediately after you create it.
    (Shai Erera, Mike McCandless)
  • LUCENE-2733 : Removed public constructors of utility classes with only static methods to prevent instantiation.
    (Uwe Schindler)
  • LUCENE-2602 : The default (LogByteSizeMergePolicy) merge policy now takes deletions into account by default. You can disable this by calling setCalibrateSizeByDeletes(false) on the merge policy.
    (Mike McCandless)
  • LUCENE-2529 , LUCENE-2668 : The position increment gap and offset gap of empty values in multi-valued fields have changed in some cases. If you index empty fields and use position/offset information on those fields, reindexing is recommended.
    (David Smiley, Koji Sekiguchi)
  • LUCENE-2804 : Directory.setLockFactory now declares throwing an IOException.
    (Shai Erera, Robert Muir)
  • LUCENE-2837 : Added deprecations noting that in 4.0, Searcher and Searchable are collapsed into IndexSearcher; contrib/remote and MultiSearcher have been removed.
    (Mike McCandless)
  • LUCENE-2854 : Deprecated SimilarityDelegator and Similarity.lengthNorm; the latter is now final, forcing any custom Similarity impls to cut over to the more general computeNorm.
    (Robert Muir, Mike McCandless)
  • LUCENE-2869 : Deprecated Query.getSimilarity: instead of using "runtime" subclassing/delegation, subclass the Weight instead.
    (Robert Muir)
  • LUCENE-2674 : A new idfExplain method was added to Similarity, that accepts an incoming docFreq. If you subclass Similarity, make sure you also override this method on upgrade.
    (Robert Muir, Mike McCandless)
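LUCENE-2320 makes MergePolicy.writer a SetOnce: a holder that accepts exactly one value and rejects later attempts, so callers read it via writer.get(). A minimal sketch of such a write-once holder, assuming compare-and-set semantics; SetOnceSketch is a hypothetical class, and it throws IllegalStateException where Lucene's own class may use a dedicated exception type.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a write-once holder (hypothetical class, not Lucene's SetOnce).
final class SetOnceSketch<T> {
    private final AtomicBoolean set = new AtomicBoolean(false);
    private volatile T value;

    // Succeeds only for the first caller; every later call fails.
    void set(T newValue) {
        if (!set.compareAndSet(false, true)) {
            throw new IllegalStateException("value already set");
        }
        value = newValue;
    }

    // Returns null until set() has been called.
    T get() {
        return value;
    }

    public static void main(String[] args) {
        SetOnceSketch<String> writer = new SetOnceSketch<>();
        writer.set("writer-1");
        try {
            writer.set("writer-2");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + ", still " + writer.get()); // value already set, still writer-1
        }
    }
}
```

The compare-and-set makes the "exactly once" check atomic even if two threads race to set the value, which a plain null check could not guarantee.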
  • Changes in runtime behavior (13)
  • LUCENE-1923 : Made IndexReader.toString() produce something meaningful
    (Tim Smith via Mike McCandless)
  • LUCENE-2179 : CharArraySet.clear() is now functional.
    (Robert Muir, Uwe Schindler)
  • LUCENE-2455 : IndexWriter.addIndexes no longer optimizes the target index before it adds the new ones. Also, the existing segments are not merged and so the index will not end up with a single segment (unless it was empty before). In addition, addIndexesNoOptimize was renamed to addIndexes and no longer invokes a merge on the incoming and target segments, but instead copies the segments to the target index. You can call maybeMerge or optimize after this method completes, if you need to. In addition, Directory.copyTo* were removed in favor of copy which takes the target Directory, source and target files as arguments, and copies the source file to the target Directory under the target file name.
    (Shai Erera)
  • LUCENE-2663 : IndexWriter no longer forcefully clears any existing locks when create=true. This was a holdover from when SimpleFSLockFactory was the default locking implementation, and, even then it was dangerous since it could mask bugs in IndexWriter's usage, allowing applications to accidentally open two writers on the same directory.
    (Mike McCandless)
  • LUCENE-2701 : maxMergeMBForOptimize and maxMergeDocs constraints set on LogMergePolicy now affect optimize() as well (as opposed to only regular merges). This means that you can run optimize() and too large segments won't be merged.
    (Shai Erera)
  • LUCENE-2753 : IndexReader and DirectoryReader .listCommits() now return a List, guaranteeing the commits are sorted from oldest to latest.
    (Shai Erera)
  • LUCENE-2785 : TopScoreDocCollector, TopFieldCollector and the IndexSearcher search methods that take an int nDocs will now throw IllegalArgumentException if nDocs is 0. To count hits without collecting any, use the newly added TotalHitCountCollector instead.
    (Mike McCandless)
  • LUCENE-2790 : LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio to determine whether the passed-in segment should be in compound format.
    (Shai Erera, Earwin Burrfoot)
  • LUCENE-2805 : IndexWriter now increments the index version on every change to the index instead of on every commit. Committing or closing the IndexWriter without any changes to the index does not increment the index version.
    (Simon Willnauer, Mike McCandless)
  • LUCENE-2650 , LUCENE-2825 : The behavior of FSDirectory.open has changed. On 64-bit Windows and Solaris systems that support unmapping, FSDirectory.open returns MMapDirectory. Additionally, MMapDirectory now enables unmapping by default if the JRE supports it.
    (Mike McCandless, Uwe Schindler, Robert Muir)
  • LUCENE-2829 : Improve the performance of "primary key" lookup use case (running a TermQuery that matches one document) on a multi-segment index.
    (Robert Muir, Mike McCandless)
  • LUCENE-2010 : Segments with 100% deleted documents are now removed on IndexReader or IndexWriter commit.
    (Uwe Schindler, Mike McCandless)
  • LUCENE-2960 : Allow some changes to IndexWriterConfig to take effect "live" (after an IndexWriter is instantiated), via IndexWriter.getConfig().setXXX(...).
    (Shay Banon, Mike McCandless)
  • API Changes (26)
  • LUCENE-2076 : Rename FSDirectory.getFile -> getDirectory.
    (George Aroush via Mike McCandless)
  • LUCENE-1260 : Changed norm encoding (float->byte) and decoding (byte->float) to instance methods rather than static methods, so a custom Similarity can alter how norms are encoded, though they must still be encoded as a single byte.
    (Johan Kindgren via Mike McCandless)
  • LUCENE-2103 : NoLockFactory should have a private constructor; until Lucene 4.0 the default one will be deprecated.
    (Shai Erera via Uwe Schindler)
  • LUCENE-2177 : Deprecate the Field ctors that take byte[] and Store. Since the removal of compressed fields, Store can only be YES, so it need not be specified.
    (Erik Hatcher via Mike McCandless)
  • LUCENE-2200 : Several final classes had non-overriding protected members. These were made private, and unused protected constructors were removed.
    (Steven Rowe via Robert Muir)
  • LUCENE-2240 : SimpleAnalyzer and WhitespaceAnalyzer now have Version ctors.
    (Simon Willnauer via Uwe Schindler)
  • LUCENE-2259 : Add IndexWriter.deleteUnusedFiles, to attempt to remove unused files. This is only useful on Windows, which prevents deletion of open files. IndexWriter will eventually remove these files itself; this method just lets you do so when you know the files are no longer held open by IndexReaders.
    (luocanrao via Mike McCandless)
  • LUCENE-2282 : IndexFileNames is now a public class, making it easier to use from external code. It also offers a matchExtension method that callers can use to check whether a given file matches a given extension.
    (Shai Erera via Mike McCandless)
  • LUCENE-124 : Add a TopTermsBoostOnlyBooleanQueryRewrite to MultiTermQuery. This rewrite method is similar to TopTermsScoringBooleanQueryRewrite, but only scores terms by their boost values. For example, this can be used with FuzzyQuery to ensure that exact matches are always scored higher, because only the boost will be used in scoring.
    (Robert Muir)
  • LUCENE-2015 : Add a static method foldToASCII to ASCIIFoldingFilter to expose its folding logic.
    (Cédrik Lime via Robert Muir)
  • LUCENE-2294 : IndexWriter constructors have been deprecated in favor of a single ctor which accepts IndexWriterConfig and a Directory. You can set all the parameters related to IndexWriter on IndexWriterConfig. The corresponding setter/getter methods on IndexWriter were deprecated as well; call writer.getConfig().getXYZ() to query a parameter XYZ. Likewise, the MergePolicy-related setters/getters were deprecated; interact with the MergePolicy directly instead.
    (Shai Erera via Mike McCandless)
  • LUCENE-2320 : IndexWriter's MergePolicy configuration was moved to IndexWriterConfig and the respective methods on IndexWriter were deprecated.
    (Shai Erera via Mike McCandless)
  • LUCENE-2328 : Directory now itself tracks the files that have been written but not yet fsynced. The old Directory.sync(String file) method is deprecated and replaced with Directory.sync(Collection<String> files). If your custom Directories need such tracking, see FSDirectory for an example of how it might look.
    (Earwin Burrfoot via Mike McCandless)
  • LUCENE-2302 : Deprecated TermAttribute and replaced by a new CharTermAttribute. The change is backwards compatible, so mixed new/old TokenStreams all work on the same char[] buffer independent of which interface they use. CharTermAttribute has shorter method names and implements CharSequence and Appendable. This allows usage like Java's StringBuilder in addition to direct char[] access. Also terms can directly be used in places where CharSequence is allowed (e.g. regular expressions).
    (Uwe Schindler, Robert Muir)
  • LUCENE-2402 : IndexWriter.deleteUnusedFiles now deletes unreferenced commit points too. If you use an IndexDeletionPolicy which holds onto index commits (such as SnapshotDeletionPolicy), you can call this method to remove those commit points when they are not needed anymore (instead of waiting for the next commit).
    (Shai Erera)
  • LUCENE-2481 : SnapshotDeletionPolicy.snapshot() and release() were replaced with equivalent ones that take a String (id) as argument. You can pass whatever ID you want, as long as you use the same one when calling both.
    (Shai Erera)
  • LUCENE-2356 : Add IndexWriterConfig.set/getReaderTermIndexDivisor, to set what IndexWriter passes for termsIndexDivisor to the readers it opens internally when applying deletions or creating a near-real-time reader.
    (Earwin Burrfoot via Mike McCandless)
  • LUCENE-2167 , LUCENE-2699 , LUCENE-2763 , LUCENE-2847 : StandardTokenizer/Analyzer in common/standard/ now implement the Word Break rules from the Unicode 6.0.0 Text Segmentation algorithm (UAX#29), covering the full range of Unicode code points, including values from U+FFFF to U+10FFFF. ClassicTokenizer/Analyzer retains the old (pre-Lucene 3.1) StandardTokenizer/Analyzer implementation and behavior; it covers only the Unicode Basic Multilingual Plane (code points from U+0000 to U+FFFF). UAX29URLEmailTokenizer tokenizes URLs and E-mail addresses according to the relevant RFCs, in addition to implementing the UAX#29 Word Break rules.
    (Steven Rowe, Robert Muir, Uwe Schindler)

  • LUCENE-2778 : RAMDirectory now exposes newRAMFile(), which can be overridden to return a different RAMFile implementation.
    (Shai Erera)
  • LUCENE-2785 : Added TotalHitCountCollector whose sole purpose is to count the number of hits matching the query.
    (Mike McCandless)
  • LUCENE-2846 : Deprecated IndexReader.setNorm(int, String, float). This method is merely syntactic sugar for setNorm(int, String, byte) that encodes the value with the global Similarity.getDefault().encodeNormValue(). Use the byte-based method instead to ensure that the norm is encoded with your Similarity.
    (Robert Muir, Mike McCandless)
  • LUCENE-2374 : Added Attribute reflection API: It's now possible to inspect the contents of AttributeImpl and AttributeSource using a well-defined API. For example, Solr's AnalysisRequestHandlers use it to display all attributes in a structured way. There are also some backwards-incompatible changes in toString() output, because LUCENE-2302 introduced the CharSequence interface to CharTermAttribute, which changed its toString() return values. The new method reflectAsString() returns a string representation in a well-defined way. For backwards compatibility, if an implementation subclass overrides toString(), the default implementation of AttributeImpl.reflectWith() uses that toString() output to report the Attribute's properties. Otherwise, reflectWith() uses Java reflection (as toString() did before) to get the attribute properties. In addition, AttributeImpls are no longer required to implement equals() and hashCode(), which were previously mandatory, but can still provide them if needed.
    (Uwe Schindler)
  • LUCENE-2691 : Deprecate IndexWriter.getReader in favor of IndexReader.open(IndexWriter)
    (Grant Ingersoll, Mike McCandless)
  • LUCENE-2876 : Deprecated Scorer.getSimilarity(). If your Scorer uses a Similarity, it should keep it itself. Fixed Scorers to pass their parent Weight, so that Scorer.visitSubScorers ( LUCENE-2590 ) will work correctly.
    (Robert Muir, Doron Cohen)
  • LUCENE-2900 : When opening a near-real-time (NRT) reader (IndexReader.re/open(IndexWriter)) you can now specify whether deletes should be applied. Applying deletes can be costly, and some expert use cases can handle seeing deleted documents returned. The deletes remain buffered, so the next time you open an NRT reader and pass true, all deletes will be applied.
    (Mike McCandless)
  • LUCENE-1253 : LengthFilter (and Solr's KeepWordTokenFilter) now require enablePositionIncrement to be specified up front. Together with StopFilter, they share a common base class (FilteringTokenFilter) that handles position increments automatically. Implementors only need to override an accept() method that filters tokens.
    (Uwe Schindler, Robert Muir)
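LUCENE-2302 describes CharTermAttribute as a char[] term buffer that also implements CharSequence and Appendable. A minimal sketch of that idea in plain Java follows; the class name TermBuffer and its methods are hypothetical, and this is not Lucene's implementation. Because the buffer is itself a CharSequence, it can be handed directly to java.util.regex.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Lucene's CharTermAttribute): a growable char[]
 * term buffer that is both a CharSequence and an Appendable.
 */
final class TermBuffer implements CharSequence, Appendable {
  private char[] buf = new char[8];
  private int len = 0;

  /** Direct access to the backing array; only the first length() chars are valid. */
  char[] buffer() { return buf; }

  /** Resets the term to zero length without reallocating the array. */
  TermBuffer setEmpty() { len = 0; return this; }

  private void ensureCapacity(int min) {
    if (buf.length < min) {
      buf = Arrays.copyOf(buf, Math.max(min, buf.length * 2)); // grow geometrically
    }
  }

  @Override public TermBuffer append(CharSequence csq) {
    CharSequence s = (csq == null) ? "null" : csq; // Appendable contract for null
    return append(s, 0, s.length());
  }

  @Override public TermBuffer append(CharSequence csq, int start, int end) {
    CharSequence s = (csq == null) ? "null" : csq;
    if (start < 0 || start > end || end > s.length()) throw new IndexOutOfBoundsException();
    ensureCapacity(len + (end - start));
    for (int i = start; i < end; i++) buf[len++] = s.charAt(i);
    return this;
  }

  @Override public TermBuffer append(char c) {
    ensureCapacity(len + 1);
    buf[len++] = c;
    return this;
  }

  @Override public int length() { return len; }

  @Override public char charAt(int index) {
    if (index < 0 || index >= len) throw new IndexOutOfBoundsException();
    return buf[index];
  }

  @Override public CharSequence subSequence(int start, int end) {
    if (start < 0 || start > end || end > len) throw new IndexOutOfBoundsException();
    return new String(buf, start, end - start);
  }

  @Override public String toString() { return new String(buf, 0, len); }
}
```

For example, Pattern.matches("foo_\\w+", new TermBuffer().append("foo").append('_').append("bar")) returns true without the caller first converting the term to a String.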
  • Bug fixes (23)
  • LUCENE-2249 : ParallelMultiSearcher should shut down thread pool on close.
    (Martin Traverso via Uwe Schindler)
  • LUCENE-2273 : FieldCacheImpl.getCacheEntries() used WeakHashMap incorrectly and led to ConcurrentModificationException.
    (Uwe Schindler, Robert Muir)
  • LUCENE-2328 : Index files fsync tracking moved from IndexWriter/IndexReader to Directory, and it no longer leaks memory.
    (Earwin Burrfoot via Mike McCandless)
  • LUCENE-2074 : Reduce buffer size of lexer back to default on reset.
    (Ruben Laguna, Shai Erera via Uwe Schindler)
  • LUCENE-2496 : Don't throw NPE if IndexWriter is opened with CREATE on a prior (corrupt) index missing its segments_N file.
    (Mike McCandless)
  • LUCENE-2458 : QueryParser no longer automatically forms phrase queries on the assumption of whitespace tokenization. Previously all CJK queries, for example, would be turned into phrase queries. The old behavior is preserved when matchVersion is set to a previous version. Additionally, you can explicitly enable the old behavior with setAutoGeneratePhraseQueries(true).
    (Robert Muir)
  • LUCENE-2537 : FSDirectory.copy() implementation was unsafe and could result in OOM if a large file was copied.
    (Shai Erera)
  • LUCENE-2580 : MultiPhraseQuery threw ArrayIndexOutOfBoundsException if the number of positions exceeded the number of terms at one position.
    (Jayendra Patil via Mike McCandless)
  • LUCENE-2617 : Optional clauses of a BooleanQuery were not factored into coord if the scorer for that segment returned null. This could cause the same document to score differently depending on which segment it resides in.
    (yonik)
  • LUCENE-2272 : Fixed explain in PayloadNearQuery and also fixed a scoring issue.
    (Peter Keegan via Grant Ingersoll)
  • LUCENE-2732 : Fix charset problems in XML loading in HyphenationCompoundWordTokenFilter.
    (Uwe Schindler)
  • LUCENE-2802 : NRT DirectoryReader returned incorrect values from getVersion, isOptimized, getCommitUserData, getIndexCommit and isCurrent due to a mutable reference to the IndexWriter's SegmentInfos.
    (Simon Willnauer, Earwin Burrfoot)
  • LUCENE-2852 : Fixed a corner case in RAMInputStream that would hit a false EOF after seeking to EOF, then seeking back to the block you were just in, and then calling readBytes.
    (Robert Muir, Mike McCandless)
  • LUCENE-2860 : Fixed SegmentInfo.sizeInBytes to factor in includeDocStores when deciding whether to return the cached computed size.
    (Shai Erera)
  • LUCENE-2584 : SegmentInfo.files() could hit ConcurrentModificationException if called by multiple threads.
    (Alexander Kanarsky via Shai Erera)
  • LUCENE-2809 : Fixed IndexWriter.numDocs to take into account applied but not yet flushed deletes.
    (Mike McCandless)
  • LUCENE-2879 : MultiPhraseQuery previously calculated its phrase IDF by summing internally, it now calls Similarity.idfExplain(Collection, IndexSearcher).
    (Robert Muir)
  • LUCENE-2693 : The RAM used by IndexWriter was computed slightly incorrectly.
    (Jason Rutherglen via Shai Erera)
  • LUCENE-1846 : DateTools now uses the US locale everywhere, so DateTools.round() is also safe in unusual locales.
    (Uwe Schindler)
  • LUCENE-2891 : IndexWriterConfig did not accept -1 in setReaderTermIndexDivisor, which can be used to prevent loading the terms index into memory.
    (Shai Erera)
  • LUCENE-2937 : Encoding a float into a byte (e.g. encoding field norms during indexing) had an underflow detection bug that caused floatToByte(f)==0 where f was greater than 0, but slightly less than byteToFloat(1). This meant that certain very small field norms (index_boost * length_norm) could have been rounded down to 0 instead of being rounded up to the smallest positive number.
    (yonik)
  • LUCENE-2936 : PhraseQuery score explanations were not correctly identifying matches vs non-matches.
    (hossman)
  • LUCENE-2975 : A HotSpot JVM bug corrupts IndexInput#readVInt()/readVLong() if the underlying readByte() is inlined (which happens, for example, in MMapDirectory). The loop was unrolled, which makes the HotSpot bug disappear.
    (Uwe Schindler, Robert Muir, Mike McCandless)
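LUCENE-2937 concerns the lossy one-byte float encoding used for field norms. The sketch below is modeled on the "315" encoding in Lucene's SmallFloat utility (3 bits of mantissa precision counting the implicit leading bit, zero exponent 15), rewritten as a standalone class; the class and method names are hypothetical, and the constants should be checked against the Lucene source before relying on them. The underflow test uses <= so that a positive input whose truncated bits equal the zero code is rounded up to code 1, the smallest positive value, instead of down to 0.

```java
/**
 * Hypothetical standalone sketch of a one-byte float encoding in the style
 * of Lucene's norm encoding (3 bits of precision, zero exponent 15).
 * Not Lucene's source.
 */
final class NormByte {
  // Truncated bit pattern of 2^-31; codes are offsets from this value,
  // so code 0 would correspond to it (code 0 is reserved for 0.0f).
  private static final int FZERO = (63 - 15) << 3;

  static byte encode(float f) {
    int bits = Float.floatToRawIntBits(f);
    // Keep sign, exponent and the top 2 explicit mantissa bits (3 bits of
    // precision with the implicit leading 1); lower bits are truncated.
    int smallfloat = bits >> (24 - 3);
    // Underflow: "<=" rather than "<", so positive values whose truncated form
    // equals the zero code round UP to 1 instead of collapsing to 0.
    if (smallfloat <= FZERO) {
      return (bits <= 0) ? (byte) 0 : (byte) 1;
    }
    // Overflow: clamp to the largest code (0xFF).
    if (smallfloat >= FZERO + 0x100) {
      return (byte) -1;
    }
    return (byte) (smallfloat - FZERO);
  }

  static float decode(byte b) {
    if (b == 0) return 0.0f;
    int bits = (b & 0xff) << (24 - 3);
    bits += (63 - 15) << 24; // re-bias the exponent
    return Float.intBitsToFloat(bits);
  }
}
```

With a strict < in the underflow test, encode(2^-31) would fall through and return 0 even though the input is positive, which is the kind of rounding error the entry describes.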
  • New features (31)
  • LUCENE-2128 : Parallelized fetching of document frequencies during weight creation.
    (Israel Tsadok, Simon Willnauer via Uwe Schindler)
  • LUCENE-2069 : Added Unicode 4 support to CharArraySet. Due to the switch to Java 5, supplementary characters are now lowercased correctly if the set is created as case insensitive. CharArraySet now requires a Version argument to preserve backwards compatibility. If Version < 3.1 is passed to the constructor, CharArraySet yields the old behavior.
    (Simon Willnauer)
  • LUCENE-2069 : Added Unicode 4 support to LowerCaseFilter. Due to the switch to Java 5, supplementary characters are now lowercased correctly. LowerCaseFilter now requires a Version argument to preserve backwards compatibility. If Version < 3.1 is passed to the constructor, LowerCaseFilter yields the old behavior.
    (Simon Willnauer, Robert Muir)
  • LUCENE-2034 : Added ReusableAnalyzerBase, an abstract subclass of Analyzer that makes it easier to reuse TokenStreams correctly. This issue also added StopwordAnalyzerBase, which improves consistency of all Analyzers that use stopwords, and reimplemented many contrib analyzers with it.
    (Simon Willnauer via Robert Muir)
  • LUCENE-2198 , LUCENE-2901 : Support protected words in stemming TokenFilters using a new KeywordAttribute.
    (Simon Willnauer, Drew Farris via Uwe Schindler)
  • LUCENE-2183 , LUCENE-2240 , LUCENE-2241 : Added Unicode 4 support to CharTokenizer and its subclasses. CharTokenizer now has new int-API which is conditionally preferred to the old char-API depending on the provided Version. Version < 3.1 will use the char-API.
    (Simon Willnauer via Uwe Schindler)
  • LUCENE-2247 : Added a CharArrayMap<V> for performance improvements in some stemmers and synonym filters.
    (Uwe Schindler)
  • LUCENE-2320 : Added SetOnce which wraps an object and allows it to be set exactly once.
    (Shai Erera via Mike McCandless)
  • LUCENE-2314 : Added AttributeSource.copyTo(AttributeSource), which together with cloneAttributes() can replace captureState()/restoreState() when the state itself needs to be inspected or modified.
    (Uwe Schindler)
  • LUCENE-2293 : Expose control over max number of threads that IndexWriter will allow to run concurrently while indexing documents (previously this was hardwired to 5), using IndexWriterConfig.setMaxThreadStates.
    (Mike McCandless)
  • LUCENE-2297 : Enable turning on reader pooling inside IndexWriter even when getReader (near-real-time reader) is not in use, through IndexWriterConfig.enable/disableReaderPooling.
    (Mike McCandless)
  • LUCENE-2331 : Add NoMergePolicy which never returns any merges to execute. In addition, add NoMergeScheduler which never executes any merges. These two classes are convenient if you want to disable segment merges by IndexWriter without tweaking particular MergePolicy parameters, such as mergeFactor. MergeScheduler's methods are now public.
    (Shai Erera via Mike McCandless)
  • LUCENE-2339 : Deprecate static method Directory.copy in favor of Directory.copyTo, and use nio's FileChannel.transferTo when copying files between FSDirectory instances.
    (Earwin Burrfoot via Mike McCandless)
  • LUCENE-2074 : Made StandardTokenizer support Unicode 4.0 when the matchVersion parameter is Version.LUCENE_31.
    (Uwe Schindler)
  • LUCENE-2385 : Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy can be used to prevent commits from ever getting deleted from the index.
    (Shai Erera)
  • LUCENE-1585 : IndexWriter now accepts a PayloadProcessorProvider which can return a DirPayloadProcessor for a given Directory, which returns a PayloadProcessor for a given Term. The PayloadProcessor will be used to process the payloads of the segments as they are merged (e.g. if one wants to rewrite payloads of external indexes as they are added, or of local ones).
    (Shai Erera, Michael Busch, Mike McCandless)
  • LUCENE-2440 : Add support for custom ExecutorService in ParallelMultiSearcher
    (Edward Drapkin via Mike McCandless)
  • LUCENE-2295 : Added a LimitTokenCountAnalyzer / LimitTokenCountFilter to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter. This patch also fixes a bug in the offset calculation in CharTokenizer.
    (Uwe Schindler, Shai Erera)
  • LUCENE-2526 : Don't throw NPE from MultiPhraseQuery.toString when it's empty.
    (Ross Woolf via Mike McCandless)
  • LUCENE-2559 : Added SegmentReader.reopen methods
    (John Wang via Mike McCandless)
  • LUCENE-2590 : Added Scorer.visitSubScorers, and Scorer.freq. Along with a custom Collector these experimental methods make it possible to gather the hit-count per sub-clause and per document while a search is running.
    (Simon Willnauer, Mike McCandless)
  • LUCENE-2636 : Added MultiCollector which allows running the search with several Collectors.
    (Shai Erera)
  • LUCENE-2754 , LUCENE-2757 : Added a wrapper around MultiTermQueries to add span support: SpanMultiTermQueryWrapper<Q extends MultiTermQuery>. Using this wrapper it's easy to add fuzzy/wildcard to e.g. a SpanNearQuery.
    (Robert Muir, Uwe Schindler)
  • LUCENE-2838 : ConstantScoreQuery now directly supports wrapping a Query instance for stripping off scores. The use of a QueryWrapperFilter is no longer needed and discouraged for that use case. Directly wrapping Query improves performance, as out-of-order collection is now supported.
    (Uwe Schindler)
  • LUCENE-2864 : Add getMaxTermFrequency (maximum within-document TF) to FieldInvertState so that it can be used in Similarity.computeNorm.
    (Robert Muir)
  • LUCENE-2720 : Segments now record the code version which created them.
    (Shai Erera, Mike McCandless, Uwe Schindler)
  • LUCENE-2474 : Added expert ReaderFinishedListener API to IndexReader, to allow apps that maintain external per-segment caches to evict entries when a segment is finished.
    (Shay Banon, Yonik Seeley, Mike McCandless)
  • LUCENE-2911 : The new StandardTokenizer, UAX29URLEmailTokenizer, and the ICUTokenizer in contrib now all tag tokens with a consistent set of token types (defined in StandardTokenizer). Tokens of the major CJK types are explicitly marked to allow for custom downstream handling: <IDEOGRAPHIC>, <HANGUL>, <KATAKANA>, and <HIRAGANA>.
    (Robert Muir, Steven Rowe)
  • LUCENE-2913 : Add missing getters to Numeric* classes.
    (Uwe Schindler)
  • LUCENE-1810 : Added FieldSelectorResult.LATENT to not cache lazy loaded fields
    (Tim Smith, Grant Ingersoll)
  • LUCENE-2692 : Added several new SpanQuery classes for positional checking (match is in a range, payload is a specific value)
    (Grant Ingersoll)
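LUCENE-2320 adds SetOnce, a wrapper whose value can be set exactly once. A minimal sketch of that contract follows; SetOnceHolder is a hypothetical name, and throwing IllegalStateException on a second set() is an assumption of this sketch rather than a description of Lucene's class.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch of a "set exactly once" holder in the spirit of
 * SetOnce (LUCENE-2320); not Lucene's implementation.
 */
final class SetOnceHolder<T> {
  private final AtomicBoolean set = new AtomicBoolean(false);
  private volatile T value;

  /** Stores the value; any second call throws IllegalStateException. */
  void set(T newValue) {
    // compareAndSet flips false -> true for exactly one caller.
    if (!set.compareAndSet(false, true)) {
      throw new IllegalStateException("value was already set");
    }
    value = newValue;
  }

  /** Returns the value, or null if set() has not completed yet. */
  T get() {
    return value;
  }
}
```

The AtomicBoolean makes the check-and-mark step a single atomic compareAndSet, so two threads racing on set() cannot both succeed; the volatile field makes the stored value visible to other threads once set() has returned.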
  • Optimizations (23)
  • LUCENE-2494 : Use CompletionService in ParallelMultiSearcher instead of simple polling for results.
    (Edward Drapkin, Simon Willnauer)
  • LUCENE-2075 : Terms dict cache is now shared across threads instead of being stored separately in thread local storage. Also fixed terms dict so that the cache is used when seeking the thread local term enum, which will be important for MultiTermQuery impls that do lots of seeking
    (Mike McCandless, Uwe Schindler, Robert Muir, Yonik Seeley)
  • LUCENE-2136 : If the multi reader (DirectoryReader or MultiReader) only has a single sub-reader, delegate all enum requests to it. This avoids the overhead of using a PQ unnecessarily.
    (Mike McCandless)
  • LUCENE-2137 : Switch to AtomicInteger for some ref counting
    (Earwin Burrfoot via Mike McCandless)
  • LUCENE-2123 , LUCENE-2261 : Moved FuzzyQuery's rewrite into a separate RewriteMode in MultiTermQuery. The number of fuzzy expansions can be specified with the maxExpansions parameter to FuzzyQuery.
    (Uwe Schindler, Robert Muir, Mike McCandless)
  • LUCENE-2164 : ConcurrentMergeScheduler has more control over merge threads. First, it gives smaller merges higher thread priority than larger ones. Second, a new set/getMaxMergeCount setting will pause the larger merges to allow smaller ones to finish. The defaults for these settings are now dynamic, depending on the number of CPU cores as reported by Runtime.getRuntime().availableProcessors().
    (Mike McCandless)
  • LUCENE-2169 : Improved CharArraySet.copy() when the source set is also a CharArraySet.
    (Simon Willnauer via Uwe Schindler)
  • LUCENE-2084 : Change IndexableBinaryStringTools to work on byte[] and char[] directly, instead of Byte/CharBuffers, and modify CollationKeyFilter to take advantage of this for faster performance.
    (Steven Rowe, Uwe Schindler, Robert Muir)
  • LUCENE-2188 : Add a utility class for tracking deprecated overridden methods in non-final subclasses.
    (Uwe Schindler, Robert Muir)
  • LUCENE-2195 : Speedup CharArraySet if set is empty.
    (Simon Willnauer via Robert Muir)
  • LUCENE-2285 : Code cleanup.
    (Shai Erera via Uwe Schindler)
  • LUCENE-2303 : Remove code duplication in Token class by subclassing TermAttributeImpl, move DEFAULT_TYPE constant to TypeInterface, improve null-handling for TypeAttribute.
    (Uwe Schindler)
  • LUCENE-2329 : Switch TermsHash* from using a PostingList object per unique term to parallel arrays, indexed by termID. This reduces garbage collection overhead significantly, which results in great indexing performance wins when the available JVM heap space is low. This will become even more important when the DocumentsWriter RAM buffer is searchable in the future, because then it will make sense to make the RAM buffers as large as possible.
    (Mike McCandless, Michael Busch)
  • LUCENE-2380 : The terms field cache methods (getTerms, getTermsIndex), which replace the older String equivalents (getStrings, getStringIndex), consume quite a bit less RAM in most cases.
    (Mike McCandless)
  • LUCENE-2410 : ~20% speedup on exact (slop=0) PhraseQuery matching.
    (Mike McCandless)
  • LUCENE-2531 : Fix issue when sorting by a String field that was causing too many fallbacks to compare-by-value (instead of by-ord).
    (Mike McCandless)
  • LUCENE-2574 : IndexInput exposes copyBytes(IndexOutput, long) to allow for efficient copying by sub-classes. Optimized copy is implemented for RAM and FS streams.
    (Shai Erera)
  • LUCENE-2719 : Improved TermsHashPerField's sorting to use a better quicksort algorithm that does not dereference the pivot element on every compare call. Also replaced lots of sorting code in Lucene with the improved SorterTemplate class.
    (Uwe Schindler, Robert Muir, Mike McCandless)
  • LUCENE-2760 : Optimize SpanFirstQuery and SpanPositionRangeQuery.
    (Robert Muir)
  • LUCENE-2770 : Make SegmentMerger always work on atomic subreaders, even when IndexWriter.addIndexes(IndexReader...) is used with DirectoryReaders or other MultiReaders. This saves lots of memory during merge of norms.
    (Uwe Schindler, Mike McCandless)
  • LUCENE-2824 : Optimize BufferedIndexInput to do less bounds checks.
    (Robert Muir)
  • LUCENE-2010 : Segments with 100% deleted documents are now removed on IndexReader or IndexWriter commit.
    (Uwe Schindler, Mike McCandless)
  • LUCENE-1472 : Removed synchronization from static DateTools methods by using a ThreadLocal. Also converted DateTools.Resolution to a Java 5 enum (this should not break backwards compatibility).
    (Uwe Schindler)
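LUCENE-1472 replaced synchronization in the static DateTools methods with a ThreadLocal. The usual shape of that technique is sketched below: since SimpleDateFormat is not thread-safe, each thread lazily gets its own instance, so no lock is needed. The class name, method name and the "yyyyMMddHHmmss" pattern are illustrative assumptions; fixing Locale.US follows the intent of LUCENE-1846. ThreadLocal.withInitial requires Java 8; on the Java 5 baseline of this release you would override initialValue() in an anonymous ThreadLocal subclass instead.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/**
 * Hypothetical sketch: per-thread SimpleDateFormat instances instead of a
 * synchronized shared one. Not Lucene's DateTools.
 */
final class DateStrings {
  private static final ThreadLocal<SimpleDateFormat> SECOND_FORMAT =
      ThreadLocal.withInitial(() -> {
        // Fixed locale and time zone so output does not depend on JVM defaults.
        SimpleDateFormat f = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("GMT"));
        return f;
      });

  /** Formats at second resolution using this thread's own formatter. */
  static String toSecondResolution(Date date) {
    return SECOND_FORMAT.get().format(date);
  }
}
```

Each thread pays for one SimpleDateFormat the first time it formats a date, and after that calls never contend on a shared lock.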
  • Build (8)
  • LUCENE-2124 : Moved the JDK-based collation support from contrib/collation into core, and moved the ICU-based collation support into contrib/icu.
    (Robert Muir)
  • LUCENE-2326 : Removed SVN checkouts for backwards tests. The backwards branch is now included in the svn repository using "svn copy" after release.
    (Uwe Schindler)
  • LUCENE-2074 : Regenerating StandardTokenizerImpl files now needs JFlex 1.5 (currently only available on SVN).
    (Uwe Schindler)
  • LUCENE-1709 : Tests are now parallelized by default (except for benchmark). You can force them to run sequentially by passing -Drunsequential=1 on the command line. The number of threads that are spawned per CPU defaults to '1'. If you wish to change that, you can run the tests with -DthreadsPerProcessor=[num].
    (Robert Muir, Shai Erera, Peter Kofler)
  • LUCENE-2516 : Backwards tests are now compiled against the released lucene-core.jar from the previous version's tarball. Backwards tests are now packaged together with the src distribution.
    (Uwe Schindler)
  • LUCENE-2611 : Added Ant target to install IntelliJ IDEA configuration: "ant idea". See http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ
    (Steven Rowe)
  • LUCENE-2657 : Switch from using Maven POM templates to full POMs when generating Maven artifacts
    (Steven Rowe)
  • LUCENE-2609 : Added jar-test-framework Ant target which packages Lucene's tests' framework classes.
    (Drew Farris, Grant Ingersoll, Shai Erera, Steven Rowe)
  • Test Cases (10)
  • LUCENE-2037 : Allow JUnit4 tests in our environment
    (Erick Erickson via Mike McCandless)
  • LUCENE-1844 : Speed up the unit tests
    (Mark Miller, Erick Erickson, Mike McCandless)
  • LUCENE-2065 : Use Java 5 generics throughout our unit tests.
    (Kay via Mike McCandless)
  • LUCENE-2155 : Fix time and zone dependent localization test failures in queryparser tests.
    (Uwe Schindler, Chris Male, Robert Muir)
  • LUCENE-2170 : Fix thread starvation problems.
    (Uwe Schindler)
  • LUCENE-2248 , LUCENE-2251 , LUCENE-2285 : Refactor tests to not use Version.LUCENE_CURRENT, but instead use a global static value from LuceneTestCase(J4), that contains the release version.
    (Uwe Schindler, Simon Willnauer, Shai Erera)
  • LUCENE-2313 , LUCENE-2322 : Add VERBOSE to LuceneTestCase(J4) to control verbosity of tests. If VERBOSE==false (default) tests should not print anything other than errors to System.(out|err). The setting can be changed with -Dtests.verbose=true on test invocation.
    (Shai Erera, Paul Elschot, Uwe Schindler)
  • LUCENE-2318 : Remove inconsistent system property code for retrieving temp and data directories inside test cases. It is now centralized in LuceneTestCase(J4). Also changed lots of tests to use getClass().getResourceAsStream() to retrieve test data. Tests needing access to "real" files from the test folder itself, can use LuceneTestCase(J4).getDataFile().
    (Uwe Schindler)
  • LUCENE-2398 , LUCENE-2611 : Improve tests to work better from IDEs such as Eclipse and IntelliJ.
    (Paolo Castagna, Steven Rowe via Robert Muir)
  • LUCENE-2804 : add newFSDirectory to LuceneTestCase to create a FSDirectory at random.
    (Shai Erera, Robert Muir)
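LUCENE-2313 gates test output on a VERBOSE flag controlled by -Dtests.verbose=true. A minimal sketch of reading such a flag follows; the holder class TestVerbosity and its log method are hypothetical, not LuceneTestCase's actual members.

```java
/**
 * Hypothetical sketch of a test verbosity switch like the VERBOSE flag
 * described in LUCENE-2313; names are illustrative.
 */
final class TestVerbosity {
  // Boolean.getBoolean returns true only if the system property exists and
  // equals "true" (ignoring case), so the default is quiet.
  static final boolean VERBOSE = Boolean.getBoolean("tests.verbose");

  /** Prints diagnostic output only when run with -Dtests.verbose=true. */
  static void log(String message) {
    if (VERBOSE) {
      System.out.println(message);
    }
  }
}
```

Reading the property once into a static final keeps the check cheap inside hot test loops.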
  • Documentation (3)
  • LUCENE-2579 : Fix oal.search's package.html description of abstract methods.
    (Santiago M. Mola via Mike McCandless)
  • LUCENE-2625 : Added a note to IndexReader.termDocs() explaining that the TermEnum must be seeked because it is unpositioned.
    (Adriano Crestani via Robert Muir)
  • LUCENE-2894 : Use google-code-prettify for syntax highlighting in javadoc.
    (Shinichiro Abe, Koji Sekiguchi)
  • Release 2.9.4 / 3.0.3 [2010-12-03]

  • Changes in runtime behavior (3)
  • LUCENE-2689 : NativeFSLockFactory no longer attempts to acquire a test lock just before the real lock is acquired.
    (Surinder Pal Singh Bindra via Mike McCandless)
  • LUCENE-2762 : Fixed bug in IndexWriter causing it to hold open file handles against deleted files when compound-file was enabled (the default) and readers are pooled. As a result of this the peak worst-case free disk space required during optimize is now 3X the index size, when compound file is enabled (else 2X).
    (Mike McCandless)
  • LUCENE-2773 : LogMergePolicy accepts a double noCFSRatio (default = 0.1), which means any time a merged segment is greater than 10% of the index size, it will be left in non-compound format even if compound format is on. This change was made to reduce peak transient disk usage during optimize which increased due to LUCENE-2762 .
    (Mike McCandless)
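LUCENE-2773 (see also LUCENE-2790) decides per merged segment whether to use the compound format. The sketch below expresses that rule as a pure function; the class, method and parameter names are hypothetical, and treating a ratio of 1.0 or more as "always compound" is an assumption about the edge case rather than documented behavior.

```java
/**
 * Hypothetical sketch of the compound-file decision driven by
 * LogMergePolicy's noCFSRatio; names are illustrative.
 */
final class CompoundFileDecision {
  /**
   * @param mergedSegmentBytes size of the segment produced by the merge
   * @param totalIndexBytes    combined size of all segments in the index
   * @param useCompoundFile    the global compound-file setting
   * @param noCFSRatio         e.g. 0.1, the default per LUCENE-2773
   */
  static boolean useCompoundFile(long mergedSegmentBytes, long totalIndexBytes,
                                 boolean useCompoundFile, double noCFSRatio) {
    if (!useCompoundFile) return false;  // compound format switched off entirely
    if (noCFSRatio >= 1.0) return true;  // assumed: ratio of 1.0 means "always compound"
    // Segments larger than noCFSRatio * index size stay in non-compound format.
    return mergedSegmentBytes <= noCFSRatio * totalIndexBytes;
  }
}
```

With the default ratio of 0.1 and a 10 GB index, a merged 1.5 GB segment would stay non-compound, while a 500 MB segment would be written as a compound file.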
  • Bug fixes (24)
  • LUCENE-2142 (correct fix): FieldCacheImpl.getStringIndex no longer throws an exception when term count exceeds doc count.
    (Mike McCandless, Uwe Schindler)
  • LUCENE-2513 : When opening a writable IndexReader on a non-current commit, do not overwrite "future" commits.
    (Mike McCandless)
  • LUCENE-2536 : IndexWriter.rollback failed to properly roll back buffered deletions against segments that had been flushed.
    (Mark Harwood via Mike McCandless)
  • LUCENE-2541 : Fixed NumericRangeQuery, which returned incorrect results with endpoints near Long.MIN_VALUE and Long.MAX_VALUE: NumericUtils.splitRange() overflowed if the range contained a LOWER bound greater than (Long.MAX_VALUE - (1L << precisionStep)) or an UPPER bound less than (Long.MIN_VALUE + (1L << precisionStep)). With standard precision steps around 4, this had no effect on most queries, only those that met the above conditions. Queries with large precision steps failed more easily. Queries with precision step >= 64 were not affected. Also, the 32-bit data types int and float were not affected.
    (Yonik Seeley, Uwe Schindler)
  • LUCENE-2593 : Fixed certain rare cases where a disk full could lead to a corrupted index
    (Robert Muir, Mike McCandless)
  • LUCENE-2620 : Fixed a bug in WildcardQuery where too many asterisks would result in unbearably slow performance.
    (Nick Barkas via Robert Muir)
  • LUCENE-2627 : Fixed bug in MMapDirectory chunking when a file is an exact multiple of the chunk size.
    (Robert Muir)
  • LUCENE-2634 : isCurrent on an NRT reader was failing to return false if the writer had just committed
    (Nikolay Zamosenchuk via Mike McCandless)
  • LUCENE-2650 : Added extra safety to MMapIndexInput clones to prevent accessing an unmapped buffer if the input is closed
    (Mike McCandless, Uwe Schindler, Robert Muir)
  • LUCENE-2384 : Reset zzBuffer in StandardTokenizerImpl when lexer is reset.
    (Ruben Laguna via Uwe Schindler, sub-issue of LUCENE-2074 )
  • LUCENE-2658 : Exceptions while processing term vectors enabled for multiple fields could lead to invalid ArrayIndexOutOfBoundsExceptions.
    (Robert Muir, Mike McCandless)
  • LUCENE-2235 : Implement missing PerFieldAnalyzerWrapper.getOffsetGap().
    (Javier Godoy via Uwe Schindler)
  • LUCENE-2328 : Fixed memory leak in how IndexWriter/Reader tracked already sync'd files.
    (Earwin Burrfoot via Mike McCandless)
  • LUCENE-2549 : Fix TimeLimitingCollector#TimeExceededException to record the absolute docid.
    (Uwe Schindler)
  • LUCENE-2533 : Fix FileSwitchDirectory.listAll to not return duplicates when the primary and secondary directories share the same underlying directory.
    (Michael McCandless)
  • LUCENE-2365 : IndexWriter.newestSegment (used normally for testing) is fixed to return null if there are no segments.
    (Karthick Sankarachary via Mike McCandless)
  • LUCENE-2730 : Fix two rare deadlock cases in IndexWriter
    (Mike McCandless)
  • LUCENE-2744 : CheckIndex reported the total number of fields, rather than the number of fields with norms enabled, in the "test: field norms..." output.
    (Mark Kristensson via Mike McCandless)
  • LUCENE-2759 : Fixed two near-real-time cases where doc store files may be opened for read even though they are still open for write.
    (Mike McCandless)
  • LUCENE-2618 : Fix rare thread safety issue whereby IndexWriter.optimize could sometimes return even though the index wasn't fully optimized
    (Mike McCandless)
  • LUCENE-2767 : Fix thread safety issue in addIndexes(IndexReader[]) that could potentially result in index corruption.
    (Mike McCandless)
  • LUCENE-2762 : Fixed bug in IndexWriter causing it to hold open file handles against deleted files when compound-file was enabled (the default) and readers are pooled. As a result of this the peak worst-case free disk space required during optimize is now 3X the index size, when compound file is enabled (else 2X).
    (Mike McCandless)
  • LUCENE-2216 : OpenBitSet.hashCode returned different hash codes for sets that only differed by trailing zeros.
    (Dawid Weiss, yonik)
  • LUCENE-2782 : Fix rare potential thread hazard with IndexWriter.commit
    (Mike McCandless)
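The LUCENE-2541 overflow above is ordinary two's-complement wraparound. The sketch below is modeled loosely on how a range splitter might advance a lower bound to the next coarser boundary; the class and method names are hypothetical, and it is not the actual NumericUtils.splitRange() code. It assumes the bound is advanced by `1L << (shift + precisionStep)` and then masked, so a "next" bound that compares smaller than the current one signals that the addition wrapped past Long.MAX_VALUE.

```java
final class RangeBoundOverflow {
    /** Hypothetical: advance a lower bound to the next coarser boundary (may wrap). */
    static long advanceLower(long minBound, int shift, int precisionStep) {
        long diff = 1L << (shift + precisionStep);
        long mask = ((1L << precisionStep) - 1L) << shift;
        // Silent overflow happens here if minBound is within diff of Long.MAX_VALUE.
        return (minBound + diff) & ~mask;
    }

    /** True if advancing the bound wrapped around, the condition a fix must detect. */
    static boolean lowerWrapped(long minBound, int shift, int precisionStep) {
        return advanceLower(minBound, shift, precisionStep) < minBound;
    }
}
```

With precisionStep 4, a lower bound of `Long.MAX_VALUE - 5` wraps to Long.MIN_VALUE, while a bound of 1000 advances normally to 1008.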
  • API Changes (1)
  • LUCENE-2773 : LogMergePolicy accepts a double noCFSRatio (default = 0.1), which means any time a merged segment is greater than 10% of the index size, it will be left in non-compound format even if compound format is on. This change was made to reduce peak transient disk usage during optimize which increased due to LUCENE-2762 .
    (Mike McCandless)
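The noCFSRatio rule in LUCENE-2773 reduces to one comparison per merge. This sketch uses a hypothetical `useCompoundFile` helper and assumes sizes are measured in bytes; it illustrates the rule as described above rather than reproducing LogMergePolicy's code.

```java
final class CompoundFileDecision {
    /** Default ratio stated in the changelog entry. */
    static final double DEFAULT_NO_CFS_RATIO = 0.1;

    /**
     * Hypothetical: keep the compound format only if compound files are enabled and the
     * merged segment is no larger than noCFSRatio of the whole index.
     */
    static boolean useCompoundFile(boolean compoundEnabled, long mergedSegmentBytes,
                                   long totalIndexBytes, double noCFSRatio) {
        if (!compoundEnabled) {
            return false;
        }
        return mergedSegmentBytes <= noCFSRatio * totalIndexBytes;
    }
}
```

For a 1000-byte index with the default ratio, a 50-byte merged segment stays compound while a 200-byte one is left in non-compound format.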
  • Optimizations (2)
  • LUCENE-2556 : Improve memory usage after cloning TermAttribute.
    (Adriano Crestani via Uwe Schindler)
  • LUCENE-2098 : Improve the performance of BaseCharFilter, especially for large documents.
    (Robin Wojciki, Koji Sekiguchi, Robert Muir)
  • New features (1)
  • LUCENE-2675 (2.9.4 only): Add support for Lucene 3.0 stored field files also in 2.9. The file format did not change, only the version number was upgraded to mark segments that have no compression. FieldsWriter still only writes 2.9 segments as they could contain compressed fields. This cross-version index format compatibility is provided here solely because Lucene 2.9 and 3.0 have the same bugfix level, features, and the same index format with this slight compression difference. In general, Lucene does not support reading newer indexes with older library versions.
    (Uwe Schindler)
  • Documentation (1)
  • LUCENE-2239 : Documented limitations in NIOFSDirectory and MMapDirectory due to Java NIO behavior when a Thread is interrupted while blocking on IO.
    (Simon Willnauer, Robert Muir)
  • Release 2.9.3 / 3.0.2 [2010-06-18]

  • Changes in backwards compatibility policy (1)
  • LUCENE-2135 : Added FieldCache.purge(IndexReader) method to the interface. Anyone implementing FieldCache externally will need to fix their code to implement this, on upgrading.
    (Mike McCandless)
  • Changes in runtime behavior (2)
  • LUCENE-2421 : NativeFSLockFactory does not throw LockReleaseFailedException if it cannot delete the lock file, since obtaining the lock does not fail if the file is there.
    (Shai Erera)
  • LUCENE-2060 (2.9.3 only): Changed ConcurrentMergeScheduler's default for maxNumThreads from 3 to 1, because in practice we get the most gains from running a single merge in the background. More than one concurrent merge causes a lot of thrashing (though it's possible on SSD storage that there would be net gains).
    (Jason Rutherglen, Mike McCandless)
  • Bug fixes (21)
  • LUCENE-2046 (2.9.3 only): IndexReader should not see the index as changed, after IndexWriter.prepareCommit has been called but before IndexWriter.commit is called.
    (Peter Keegan via Mike McCandless)
  • LUCENE-2119 : Don't throw NegativeArraySizeException if you pass Integer.MAX_VALUE as nDocs to IndexSearcher search methods.
    (Paul Taylor via Mike McCandless)
  • LUCENE-2142 : FieldCacheImpl.getStringIndex no longer throws an exception when term count exceeds doc count.
    (Mike McCandless)
  • LUCENE-2104 : NativeFSLock.release() would silently fail if the lock is held by another thread/process.
    (Shai Erera via Uwe Schindler)
  • LUCENE-2283 : Use shared memory pool for term vector and stored fields buffers. This memory will be reclaimed if needed according to the configured RAM Buffer Size for the IndexWriter. This also fixes potentially excessive memory usage when many threads are indexing a mix of small and large documents.
    (Tim Smith via Mike McCandless)
  • LUCENE-2300 : If IndexWriter is pooling readers (because an NRT reader has been obtained) and addIndexes* is run, do not pool the readers from the external directory. Pooling them is harmless (the NRT reader is correct), but a waste of resources.
    (Mike McCandless)
  • LUCENE-2422 : Don't reuse byte[] in IndexInput/Output -- it gains little performance, and ties up possibly large amounts of memory for apps that index large docs.
    (Ross Woolf via Mike McCandless)
  • LUCENE-2387 : Don't hang onto Fieldables from the last doc indexed, in IndexWriter, nor the Reader in Tokenizer after close is called.
    (Ruben Laguna, Uwe Schindler, Mike McCandless)
  • LUCENE-2417 : IndexCommit did not implement hashCode() and equals() consistently. Now they both take Directory and version into consideration. In addition, all IndexCommit methods that threw UnsupportedOperationException are now abstract.
    (Shai Erera)
  • LUCENE-2467 : Fixed memory leaks in IndexWriter when large documents are indexed.
    (Mike McCandless)
  • LUCENE-2473 : Clicking on the "More Results" link in the luceneweb.war demo resulted in ArrayIndexOutOfBoundsException.
    (Sami Siren via Robert Muir)
  • LUCENE-2476 : If any exception is hit while initializing IndexWriter, release the write lock (previously it was only released on IOException).
    (Tamas Cservenak via Mike McCandless)
  • LUCENE-2478 : Fix CachingWrapperFilter to not throw NPE when Filter.getDocIdSet() returns null.
    (Uwe Schindler, Daniel Noll)
  • LUCENE-2468 : Allow specifying how new deletions should be handled in CachingWrapperFilter and CachingSpanFilter. By default, new deletions are ignored in CachingWrapperFilter, since typically this filter is AND'd with a query that correctly takes new deletions into account. This should be a performance gain (higher cache hit rate) in apps that reopen readers, or use near-real-time reader (IndexWriter.getReader()), but may introduce invalid search results (allowing deleted docs to be returned) for certain cases, so a new expert ctor was added to CachingWrapperFilter to enforce deletions at a performance cost. CachingSpanFilter by default recaches if there are new deletions
    (Shay Banon via Mike McCandless)
  • LUCENE-2299 : If you open an NRT reader while addIndexes* is running, it may miss some segments
    (Earwin Burrfoot via Mike McCandless)
  • LUCENE-2397 : Don't throw NPE from SnapshotDeletionPolicy.snapshot if there are no commits yet
    (Shai Erera)
  • LUCENE-2424 : Fix FieldDoc.toString to actually return its fields
    (Stephen Green via Mike McCandless)
  • LUCENE-2311 : Always pass a "fully loaded" (terms index & doc stores) SegmentsReader to IndexWriter's mergedSegmentWarmer (if set), so that warming is free to do whatever it needs to.
    (Earwin Burrfoot via Mike McCandless)
  • LUCENE-3029 : Fix corner case when MultiPhraseQuery is used with zero position-increment tokens that would sometimes assign different scores to identical docs.
    (Mike McCandless)
  • LUCENE-2486 : Fixed intermittent FileNotFoundException on doc store files when a mergedSegmentWarmer is set on IndexWriter.
    (Mike McCandless)
  • LUCENE-2130 : Fix performance issue when FuzzyQuery runs on a multi-segment index
    (Michael McCandless)
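LUCENE-2119 is an int overflow. The sketch below assumes a top-N priority queue backed by a 1-based heap array of `nDocs + 1` slots, so `Integer.MAX_VALUE + 1` wraps to a negative array length; it then shows one way to avoid that, clamping `nDocs` to the number of documents in the index. Both helper names are hypothetical and this is not IndexSearcher's code.

```java
final class TopNSizing {
    /** Naive sizing for a 1-based heap: wraps negative when nDocs == Integer.MAX_VALUE. */
    static int naiveHeapArrayLength(int nDocs) {
        return nDocs + 1;
    }

    /** Clamped sizing: never reserves more slots than there are documents (maxDoc). */
    static int heapArrayLength(int nDocs, int maxDoc) {
        int n = Math.min(nDocs, maxDoc);
        return n + 1;
    }
}
```

Allocating `new Object[naiveHeapArrayLength(Integer.MAX_VALUE)]` throws NegativeArraySizeException, whereas the clamped version asks for 101 slots on a 100-document index.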
  • API Changes (2)
  • LUCENE-2281 : added doBeforeFlush to IndexWriter to allow extensions to perform operations before flush starts. Also exposed doAfterFlush as protected instead of package-private.
    (Shai Erera via Mike McCandless)
  • LUCENE-2356 : Add IndexWriter.set/getReaderTermsIndexDivisor, to set what IndexWriter passes for termsIndexDivisor to the readers it opens internally when applying deletions or creating a near-real-time reader.
    (Earwin Burrfoot via Mike McCandless)
  • Optimizations (4)
  • LUCENE-2494 (3.0.2 only): Use CompletionService in ParallelMultiSearcher instead of simple polling for results.
    (Edward Drapkin, Simon Willnauer)
  • LUCENE-2135 : On IndexReader.close, forcefully evict any entries from the FieldCache rather than waiting for the WeakHashMap to release the reference
    (Mike McCandless)
  • LUCENE-2161 : Improve concurrency of IndexReader, especially in the context of near real-time readers.
    (Mike McCandless)
  • LUCENE-2360 : Small speedup to recycling of reused per-doc RAM in IndexWriter
    (Robert Muir, Mike McCandless)
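LUCENE-2494 replaces polling with a CompletionService, which returns each sub-search result as soon as it finishes, in completion order. The sketch uses only java.util.concurrent; `searchAll` and the per-shard `Callable<Integer>` tasks (each returning a hypothetical hit count) are stand-ins for ParallelMultiSearcher's sub-searches, not its real API.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ParallelSearchSketch {
    /** Submits every sub-search, then consumes results in completion order instead of polling. */
    static int searchAll(List<Callable<Integer>> subSearches)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subSearches.size()));
        try {
            CompletionService<Integer> service = new ExecutorCompletionService<>(pool);
            for (Callable<Integer> task : subSearches) {
                service.submit(task);
            }
            int totalHits = 0;
            for (int i = 0; i < subSearches.size(); i++) {
                // take() blocks until the next task finishes, whichever one that is.
                totalHits += service.take().get();
            }
            return totalHits;
        } finally {
            pool.shutdown();
        }
    }
}
```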
  • Build (1)
  • LUCENE-2488 (2.9.3 only): Support building with JDK 1.4 and excluding Java 1.5 contrib modules on request (pass '-Dforce.jdk14.build=true') when compiling/testing/packaging. This also marks the benchmark contrib as Java 1.5, as it depends on fast-vector-highlighter.
    (Uwe Schindler)
  • Release 2.9.2 / 3.0.1 [2010-02-26]

  • Changes in backwards compatibility policy (1)
  • LUCENE-2123 (3.0.1 only): Removed the protected inner class ScoreTerm from FuzzyQuery. The change was needed because the comparator of this class had to be changed in an incompatible way. The class was never intended to be public.
    (Uwe Schindler, Mike McCandless)
  • Bug fixes (10)
  • LUCENE-2092 : BooleanQuery was ignoring disableCoord in its hashCode and equals methods, so queries differing only in disableCoord could collide when BooleanQueries were cached.
    (Chris Hostetter, Mike McCandless)
  • LUCENE-2095 : When two threads called IndexWriter.commit() at the same time, commit could return control to one of the threads before all changes were actually committed. This is now fixed.
    (Sanne Grinovero via Mike McCandless)
  • LUCENE-2132 (3.0.1 only): Fix the demo result.jsp to use QueryParser with a Version argument.
    (Brian Li via Robert Muir)
  • LUCENE-2166 : Don't incorrectly keep warning about the same immense term, when IndexWriter.infoStream is on.
    (Mike McCandless)
  • LUCENE-2158 : At high indexing rates, NRT reader could temporarily lose deletions.
    (Mike McCandless)
  • LUCENE-2182 : DEFAULT_ATTRIBUTE_FACTORY was failing to load implementation class when interface was loaded by a different class loader.
    (Uwe Schindler, reported on java-user by Ahmed El-dawy)
  • LUCENE-2257 : Increase max number of unique terms in one segment to termIndexInterval (default 128) * ~2.1 billion = ~274 billion.
    (Tom Burton-West via Mike McCandless)
  • LUCENE-2260 : Fixed AttributeSource to not hold a strong reference to the Attribute/AttributeImpl classes which prevents unloading of custom attributes loaded by other classloaders (e.g. in Solr plugins).
    (Uwe Schindler)
  • LUCENE-1941 : Fix Min/MaxPayloadFunction returning 0 when only one payload is present.
    (Erik Hatcher, Mike McCandless via Uwe Schindler)
  • LUCENE-2270 : Queries consisting of all zero-boost clauses (for example, text:foo^0) sorted incorrectly and produced invalid docids.
    (yonik)
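LUCENE-2092 breaks a general rule: every field that changes a query's behavior must take part in both equals() and hashCode(), or a cache keyed on the query can hand back another query's entry. The sketch uses a hypothetical `SimpleBooleanQuery` holding clause strings and a disableCoord flag; it is not Lucene's BooleanQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class SimpleBooleanQuery {
    private final List<String> clauses;
    private final boolean disableCoord;

    SimpleBooleanQuery(List<String> clauses, boolean disableCoord) {
        this.clauses = new ArrayList<>(clauses);
        this.disableCoord = disableCoord;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleBooleanQuery)) return false;
        SimpleBooleanQuery other = (SimpleBooleanQuery) o;
        // disableCoord affects scoring, so it must participate in equality.
        return disableCoord == other.disableCoord && clauses.equals(other.clauses);
    }

    @Override
    public int hashCode() {
        // Consistent with equals(): hash every field that equals() compares.
        return Objects.hash(clauses, disableCoord);
    }
}
```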
  • API Changes (4)
  • LUCENE-1609 (3.0.1 only): Restore IndexReader.getTermInfosIndexDivisor (it was accidentally removed in 3.0.0)
    (Mike McCandless)
  • LUCENE-1972 (3.0.1 only): Restore SortField.getComparatorSource (it was accidentally removed in 3.0.0)
    (John Wang via Uwe Schindler)
  • LUCENE-2190 : Added a new class CustomScoreProvider to the function package that can be subclassed to provide custom scoring to CustomScoreQuery. The methods in CustomScoreQuery that did this before were deprecated and replaced by a method getCustomScoreProvider(IndexReader) that returns a custom score implementation using the above class. The change is necessary with per-segment searching, as CustomScoreQuery is a stateless class (like all other Queries) and does not know about the currently searched segment. This API works similarly to Filter's getDocIdSet(IndexReader).
    (Paul chez Jamespot via Mike McCandless, Uwe Schindler)
  • LUCENE-2080 : Deprecate Version.LUCENE_CURRENT, as using this constant will cause backwards compatibility problems when upgrading Lucene. See the Version javadocs for additional information.
    (Robert Muir)
  • Optimizations (3)
  • LUCENE-2086 : When resolving deleted terms, do so in term sort order for better performance
    (Bogdan Ghidireac via Mike McCandless)
  • LUCENE-2123 (partly, 3.0.1 only): Fixes a slowdown / memory issue added by LUCENE-504 .
    (Uwe Schindler, Robert Muir, Mike McCandless)
  • LUCENE-2258 : Remove unneeded synchronization in FuzzyTermEnum.
    (Uwe Schindler, Robert Muir)
  • Test Cases (3)
  • LUCENE-2114 : Change TestFilteredSearch to test on multi-segment index as well.
    (Simon Willnauer via Mike McCandless)
  • LUCENE-2211 : Improves BaseTokenStreamTestCase to use a fake attribute that checks if clearAttributes() was called correctly.
    (Uwe Schindler, Robert Muir)
  • LUCENE-2207 , LUCENE-2219 : Improve BaseTokenStreamTestCase to check if end() is implemented correctly.
    (Koji Sekiguchi, Robert Muir)
  • Documentation (1)
  • LUCENE-2114 : Improve javadocs of Filter to call out that the provided reader is per-segment
    (Simon Willnauer via Mike McCandless)
  • Release 3.0.0 [2009-11-25]

  • Changes in backwards compatibility policy (7)
  • LUCENE-1979 : Change return type of SnapshotDeletionPolicy#snapshot() from IndexCommitPoint to IndexCommit. Code that uses this method needs to be recompiled against Lucene 3.0 in order to work. The previously deprecated IndexCommitPoint is also removed.
    (Michael Busch)
  • o.a.l.Lock.isLocked() is now allowed to throw an IOException.
    (Mike McCandless)
  • LUCENE-2030 : CachingWrapperFilter and CachingSpanFilter now hide the internal cache implementation for thread safety, before it was declared protected.
    (Peter Lenahan, Uwe Schindler, Simon Willnauer)
  • LUCENE-2053 : If you call Thread.interrupt() on a thread inside Lucene, Lucene will do its best to interrupt the thread. However, instead of throwing InterruptedException (which is a checked exception), you'll get an oal.util.ThreadInterruptedException (an unchecked exception, subclassing RuntimeException). The interrupt status on the thread is cleared when this exception is thrown.
    (Mike McCandless)
  • LUCENE-2052 : Some methods in Lucene core were changed to accept Java 5 varargs. This is not a backwards compatibility problem as long as you do not try to override such a method. We left commonly overridden methods unchanged and added varargs to constructors, static, or final methods (MultiSearcher,...).
    (Uwe Schindler)
  • LUCENE-1558 : IndexReader.open(Directory) now opens a readOnly=true reader, and new IndexSearcher(Directory) does the same. Note that this is a change in the default from 2.9, when these methods were previously deprecated.
    (Mike McCandless)
  • LUCENE-1753 : Make the remaining non-final TokenStreams final to enforce the decorator pattern.
    (Uwe Schindler)
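LUCENE-2053 converts the checked InterruptedException into an unchecked exception. The sketch shows that pattern with a locally declared `ThreadInterruptedException` standing in for oal.util.ThreadInterruptedException, plus a hypothetical `pause` helper. Because Thread.sleep clears the thread's interrupt status when it throws InterruptedException, the status is still cleared when the wrapper propagates, matching the behavior described in the entry.

```java
/** Local stand-in for the unchecked exception described above (hypothetical). */
final class ThreadInterruptedException extends RuntimeException {
    ThreadInterruptedException(InterruptedException cause) {
        super(cause);
    }
}

final class InterruptibleWork {
    /** Hypothetical helper: sleeps, rethrowing interruption as an unchecked exception. */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            // Thread.sleep has already cleared the interrupt status at this point.
            throw new ThreadInterruptedException(ie);
        }
    }
}
```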
  • Changes in runtime behavior (3)
  • LUCENE-1677 : Remove the system property to set SegmentReader class implementation.
    (Uwe Schindler)
  • LUCENE-1960 : As a consequence of the removal of Field.Store.COMPRESS, support for this type of field was removed. Lucene 3.0 is still able to read indexes with compressed fields, but as soon as merges occur or the index is optimized, all compressed fields are decompressed and converted to Field.Store.YES. Because of this, indexes with compressed fields can suddenly get larger. Also, the first merge with decompression cannot be done in raw mode and is therefore slower. This change has no effect on code that uses such old indexes; they behave as before (fields are automatically decompressed during read). Indexes converted to Lucene 3.0 format can no longer be read by previous versions. It is recommended to optimize your indexes after upgrading to convert them to the new format and decompress all fields. If you want compressed fields, you can use CompressionTools, which creates a compressed byte[] to be added as a binary stored field. This cannot be done automatically, as you also have to decompress such fields when reading. You have to reindex to do that.
    (Michael Busch, Uwe Schindler)
  • LUCENE-2060 : Changed ConcurrentMergeScheduler's default for maxNumThreads from 3 to 1, because in practice we get the most gains from running a single merge in the background. More than one concurrent merge causes a lot of thrashing (though it's possible on SSD storage that there would be net gains).
    (Jason Rutherglen, Mike McCandless)
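For applications that still want compressed stored fields after LUCENE-1960, the entry suggests storing a compressed byte[] as a binary field and decompressing it explicitly on read. The round trip below uses java.util.zip's Deflater and Inflater directly; the class and method names are hypothetical, and it only assumes, without claiming, that this resembles what CompressionTools does.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class StoredFieldCompression {
    /** Compresses bytes before they are stored as a binary field value. */
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    /** Decompresses a stored binary value; the application must call this on read. */
    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream(compressed.length * 2);
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && inflater.needsInput()) {
                    throw new DataFormatException("truncated compressed value");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    /** Usage: compress a string as it would be stored, then decode it back. */
    static String roundTrip(String text) throws DataFormatException {
        byte[] stored = compress(text.getBytes(StandardCharsets.UTF_8));
        return new String(decompress(stored), StandardCharsets.UTF_8);
    }
}
```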
  • API Changes (6)
  • LUCENE-1257 , LUCENE-1984 , LUCENE-1985 , LUCENE-2057 , LUCENE-1833 , LUCENE-2012 , LUCENE-1998 : Port to Java 1.5: Add generics to public and internal APIs (see below). Replace new Integer(int), new Double(double),... by static valueOf() calls. Replace for-loops with Iterator by foreach loops. Replace StringBuffer with StringBuilder. Replace o.a.l.util.Parameter by Java 5 enums (see below). Add @Override annotations. (Uwe Schindler, Robert Muir, Karl Wettin, Paul Elschot, Kay Kay, Shai Erera, DM Smith)
  • Generify Lucene API: TokenStream/AttributeSource: addAttribute()/getAttribute() now return an instance of the requested attribute interface, so no cast is needed anymore ( LUCENE-1855 ). NumericRangeQuery, NumericRangeFilter, and FieldCacheRangeFilter now have Integer, Long, Float, Double as type param ( LUCENE-1857 ). Document.getFields() returns List<Fieldable>. Also generified: Query.extractTerms(Set<Term>); CharArraySet and stop word sets in core/contrib; PriorityQueue ( LUCENE-1935 ); TopDocCollector; DisjunctionMaxQuery ( LUCENE-1984 ); MultiTermQueryWrapperFilter; CloseableThreadLocal; MapOfSets; the o.a.l.util.cache package; lots of internal APIs of IndexWriter. (Uwe Schindler, Michael Busch, Kay Kay, Robert Muir, Adriano Crestani)
  • LUCENE-1944 , LUCENE-1856 , LUCENE-1957 , LUCENE-1960 , LUCENE-1961 , LUCENE-1968 , LUCENE-1970 , LUCENE-1946 , LUCENE-1971 , LUCENE-1975 , LUCENE-1972 , LUCENE-1978 , LUCENE-944 , LUCENE-1979 , LUCENE-1973 , LUCENE-2011 : Remove deprecated methods/constructors/classes: Remove all String/File directory paths in IndexReader / IndexSearcher / IndexWriter. Remove FSDirectory.getDirectory(). Make FSDirectory abstract. Remove Field.Store.COMPRESS (see above). Remove Filter.bits(IndexReader) method and make Filter.getDocIdSet(IndexReader) abstract. Remove old DocIdSetIterator methods and make the new ones abstract. Remove some methods in PriorityQueue. Remove old TokenStream API and backwards compatibility layer. Remove RangeQuery, RangeFilter and ConstantScoreRangeQuery. Remove SpanQuery.getTerms(). Remove ExtendedFieldCache, custom and auto caches, SortField.AUTO. Remove old-style custom sort. Remove legacy search setting in SortField. Remove Hits and all references from core and contrib. Remove HitCollector and its TopDocs support implementations. Remove term field and accessors in MultiTermQuery (and fix Highlighter). Remove deprecated methods in BooleanQuery. Remove deprecated methods in Similarity. Remove BoostingTermQuery. Remove MultiValueSource. Remove Scorer.explain(int). ...and some other minor ones.
    (Uwe Schindler, Michael Busch, Mark Miller)
  • LUCENE-1925 : Make IndexSearcher's subReaders and docStarts members protected; add expert ctor to directly specify reader, subReaders and docStarts.
    (John Wang, Tim Smith via Mike McCandless)
  • LUCENE-1945 : All public classes that have a close() method now also implement java.io.Closeable (IndexReader, IndexWriter, Directory,...).
    (Uwe Schindler)
  • LUCENE-1998 : Change all Parameter instances to Java 5 enums. This is not a backwards compatibility break, only a change of the superclass. Parameter was deprecated and will be removed in a later version.
    (DM Smith, Uwe Schindler)
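With LUCENE-1945, readers, writers and directories implement java.io.Closeable, so generic resource-handling code can close them without knowing their concrete type. The sketch uses a hypothetical `FakeIndexReader` in place of a real IndexReader; note that try-with-resources needs Java 7 or later, newer than the Java 5 baseline of these releases.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical stand-in for IndexReader, IndexWriter or Directory. */
final class FakeIndexReader implements Closeable {
    private boolean closed;

    int numDocs() {
        if (closed) throw new IllegalStateException("already closed");
        return 42; // fixed placeholder value
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() throws IOException {
        closed = true;
    }
}

final class CloseableUsage {
    /** The reader is closed when the block exits, even if the body throws. */
    static int countDocs(FakeIndexReader reader) throws IOException {
        try (FakeIndexReader r = reader) {
            return r.numDocs();
        }
    }

    /** Generic helper that can close any Closeable, whatever its concrete type. */
    static void closeQuietly(Closeable c) {
        if (c == null) return;
        try {
            c.close();
        } catch (IOException ignored) {
            // best-effort close: the exception is deliberately swallowed
        }
    }
}
```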
  • Bug fixes (5)
  • LUCENE-1951 : When the text provided to WildcardQuery has no wildcard characters (i.e. it matches a single term), don't lose the boost and rewrite method settings. Also, rewrite to PrefixQuery if the wildcard is of the form "foo*", for slightly faster performance.
    (Robert Muir via Mike McCandless)
  • LUCENE-2013 : SpanRegexQuery does not work with QueryScorer.
    (Benjamin Keil via Mark Miller)
  • LUCENE-2088 : addAttribute() should only accept interfaces that extend Attribute.
    (Shai Erera, Uwe Schindler)
  • LUCENE-2045 : Fix silly FileNotFoundException hit if you enable infoStream on IndexWriter and then add an empty document and commit
    (Shai Erera via Mike McCandless)
  • LUCENE-2046 : IndexReader should not see the index as changed, after IndexWriter.prepareCommit has been called but before IndexWriter.commit is called.
    (Peter Keegan via Mike McCandless)
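The two shapes that LUCENE-1951 handles specially, a pattern with no wildcards and a literal followed by one trailing '*', can be detected with simple string checks. `prefixOrNull` and `isSingleTerm` are hypothetical helpers; the sketch assumes '*' and '?' are the only wildcard characters and ignores escaping.

```java
final class WildcardShape {
    /**
     * Returns the prefix if the pattern is a literal followed by a single trailing '*'
     * (e.g. "foo*" gives "foo"), otherwise null.
     */
    static String prefixOrNull(String pattern) {
        int star = pattern.indexOf('*');
        boolean onlyTrailingStar = star >= 0 && star == pattern.length() - 1;
        if (!onlyTrailingStar || pattern.indexOf('?') >= 0) {
            return null;
        }
        return pattern.substring(0, star);
    }

    /** True if the pattern contains no wildcard characters, i.e. it names a single term. */
    static boolean isSingleTerm(String pattern) {
        return pattern.indexOf('*') < 0 && pattern.indexOf('?') < 0;
    }
}
```

"foo*" qualifies for the prefix rewrite, while "f*o*" and "fo?*" do not.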
  • New features (3)
  • LUCENE-1933 : Provide a convenience AttributeFactory that creates a Token instance for all basic attributes.
    (Uwe Schindler)
  • LUCENE-2041 : Parallelize the rest of ParallelMultiSearcher. Lots of code refactoring and Java 5 concurrent support in MultiSearcher.
    (Joey Surls, Simon Willnauer via Uwe Schindler)
  • LUCENE-2051 : Add CharArraySet.copy() as a simple method to copy any Set<?> to a CharArraySet; the copy is optimized if the Set<?> is already a CharArraySet.
    (Simon Willnauer)
  • Optimizations (3)
  • LUCENE-1183 : Optimize Levenshtein Distance computation in FuzzyQuery.
    (Cédrik Lime via Mike McCandless)
  • LUCENE-2006 : Optimization of FieldDocSortedHitQueue to always use Comparable<?> interface.
    (Uwe Schindler, Mark Miller)
  • LUCENE-2087 : Remove recursion in NumericRangeTermEnum.
    (Uwe Schindler)
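LUCENE-1183 concerns the Levenshtein (edit) distance that FuzzyQuery computes between terms. For reference, the textbook dynamic-programming form is sketched below, keeping only two rows of the matrix so memory is proportional to the length of the second string; it is illustrative and does not reproduce Lucene's FuzzyTermEnum code or the specific optimization made in that issue.

```java
final class Levenshtein {
    /** Classic edit distance (insert, delete, substitute), keeping only two DP rows. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // transforming "" into the first j chars of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i chars of a
            char ca = a.charAt(i - 1);
            for (int j = 1; j <= b.length(); j++) {
                int cost = (ca == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the current row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, "kitten" and "sitting" are 3 edits apart.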
  • Build (2)
  • LUCENE-486 : Remove test->demo dependencies.
    (Michael Busch)
  • LUCENE-2024 : Raise build requirements to Java 1.5 and ANT 1.7.0
    (Uwe Schindler, Mike McCandless)
  • Release 2.9.1 [2009-11-06]

  • Changes in backwards compatibility policy (1)
  • LUCENE-2002 : Add a required Version matchVersion argument when constructing QueryParser or MultiFieldQueryParser and, as of 2.9, default enablePositionIncrements to true to match StandardAnalyzer's 2.9 default.
    (Uwe Schindler, Mike McCandless)
  • Bug fixes (8)
  • LUCENE-1974 : Fixed nasty bug in BooleanQuery (when it used BooleanScorer for scoring), whereby some matching documents fail to be collected.
    (Fulin Tang via Mike McCandless)
  • LUCENE-1124 : Make sure FuzzyQuery always matches the precise term.
    ([email protected] via Mike McCandless)
  • LUCENE-1976 : Fix IndexReader.isCurrent() to return the right thing when the reader is a near real-time reader.
    (Jake Mannix via Mike McCandless)
  • LUCENE-1986 : Fix NPE when scoring PayloadNearQuery
    (Peter Keegan, Mark Miller via Mike McCandless)
  • LUCENE-1992 : Fix thread hazard if a merge is committing just as an exception occurs during sync
    (Uwe Schindler, Mike McCandless)
  • LUCENE-1995 : Note in javadocs that IndexWriter.setRAMBufferSizeMB cannot exceed 2048 MB, and throw IllegalArgumentException if it does.
    (Aaron McKee, Yonik Seeley, Mike McCandless)
  • LUCENE-2004 : Fix Constants.LUCENE_MAIN_VERSION to not be inlined by client code.
    (Uwe Schindler)
  • LUCENE-2016 : Replace illegal U+FFFF character with the replacement char (U+FFFD) during indexing, to prevent silent index corruption.
    (Peter Keegan, Mike McCandless)
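The substitution in LUCENE-2016 is a plain character replacement: U+FFFF is a Unicode noncharacter, and it is swapped for the replacement character U+FFFD before the term is indexed. `sanitizeTerm` below is a hypothetical helper illustrating that step, not Lucene's indexing code.

```java
final class TermSanitizer {
    static final char NONCHARACTER = '\uFFFF';
    static final char REPLACEMENT = '\uFFFD';

    /** Returns the term text with every U+FFFF replaced by U+FFFD. */
    static String sanitizeTerm(String text) {
        // Skip the copy in the common case where the term is already clean.
        return text.indexOf(NONCHARACTER) < 0 ? text : text.replace(NONCHARACTER, REPLACEMENT);
    }
}
```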
  • API Changes (5)
  • Un-deprecate search(Weight weight, Filter filter, int n) from Searchable interface (deprecated by accident).
    (Uwe Schindler)
  • Un-deprecate o.a.l.util.Version constants.
    (Mike McCandless)
  • LUCENE-1987 : Un-deprecate some ctors of Token, as they will not be removed in 3.0 and are still useful. Also add some missing o.a.l.util.Version constants for enabling invalid acronym settings in StandardAnalyzer to be compatible with the coming Lucene 3.0.
    (Uwe Schindler)
  • LUCENE-1973 : Un-deprecate IndexSearcher.setDefaultFieldSortScoring, to allow controlling per-IndexSearcher whether scores are computed when sorting by field.
    (Uwe Schindler, Mike McCandless)
  • LUCENE-2043 : Make IndexReader.commit(Map<String,String>) public.
    (Mike McCandless)
  • Documentation (3)
  • LUCENE-1955 : Fix Hits deprecation notice to point users in right direction.
    (Mike McCandless, Mark Miller)
  • Fix javadoc about score tracking done by search methods in Searcher and IndexSearcher.
    (Mike McCandless)
  • LUCENE-2008 : Javadoc improvements for TokenStream/Tokenizer/Token
    (Luke Nezda via Mike McCandless)
  • Release 2.9.0 [2009-09-25]

  • Changes in backwards compatibility policy (8)
  • LUCENE-1575 : Searchable.search(Weight, Filter, int, Sort) no longer computes a document score for each hit by default. If document score tracking is still needed, you can call IndexSearcher.setDefaultFieldSortScoring(true, true) to enable both per-hit and maxScore tracking; however, this is deprecated and will be removed in 3.0. Alternatively, use Searchable.search(Weight, Filter, Collector) and pass in a TopFieldCollector instance, using the following code sample: TopFieldCollector tfc = TopFieldCollector.create(sort, numHits, fillFields, true /* trackDocScores */, true /* trackMaxScore */, false /* docsInOrder */); searcher.search(query, tfc); TopDocs results = tfc.topDocs(); Note that your Sort object cannot use SortField.AUTO when you directly instantiate TopFieldCollector. Also, the method search(Weight, Filter, Collector) was added to the Searchable interface and the Searcher abstract class to replace the deprecated HitCollector versions. If you either implement Searchable or extend Searcher, you should change your code to implement this method. If you already extend IndexSearcher, no further changes are needed to use Collector. Finally, the values Float.NaN and Float.NEGATIVE_INFINITY are not valid scores. Lucene uses these values internally in certain places, so if you have hits with such scores, it will cause problems.
    (Shai Erera via Mike McCandless)

  • LUCENE-1687 : All methods and parsers from the interface ExtendedFieldCache have been moved into FieldCache. ExtendedFieldCache is now deprecated and contains only a few declarations for binary backwards compatibility. ExtendedFieldCache will be removed in version 3.0. Users of FieldCache and ExtendedFieldCache will be able to plug in Lucene 2.9 without recompilation. The auto cache (FieldCache.getAuto) is now deprecated. Due to the merge of ExtendedFieldCache and FieldCache, FieldCache can now additionally return long[] and double[] arrays in addition to int[] and float[] and StringIndex. The interface changes are only notable for users implementing the interfaces, which is unlikely, because there is no way to change Lucene's FieldCache implementation.
    (Grant Ingersoll, Uwe Schindler)

  • LUCENE-1630 , LUCENE-1771 : Weight, previously an interface, is now an abstract class. Some of the method signatures have changed, but it should be fairly easy to see what adjustments must be made to existing code to sync up with the new API. You can find more detail in the API Changes section. Going forward Searchable will be kept for convenience only and may be changed between minor releases without any deprecation process. It is not recommended that you implement it, but rather extend Searcher.
    (Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)

  • LUCENE-1422 , LUCENE-1693 : The new Attribute based TokenStream API (see below) has some backwards breaks in rare cases. We did our best to make the transition as easy as possible and you are not likely to run into any problems. If your tokenizers still implement next(Token) or next(), the calls are automatically wrapped. The indexer and query parser use the new API (e.g. they use incrementToken() calls). All core TokenStreams are implemented using the new API. You can mix old and new API style TokenFilters/TokenStream. Problems only occur when you have done the following: You have overridden next(Token) or next() in one of the non-abstract core TokenStreams/-Filters. These classes should normally be final, but some of them are not. In this case, next(Token)/next() would never be called. To fail early with a hard compile/runtime error, the next(Token)/next() methods in these TokenStreams/-Filters were made final in this release.
    (Michael Busch, Uwe Schindler)
  • LUCENE-1763 : MergePolicy now requires an IndexWriter instance to be passed upon instantiation. As a result, IndexWriter was removed as a method argument from all MergePolicy methods.
    (Shai Erera via Mike McCandless)
  • LUCENE-1748 : LUCENE-1001 introduced PayloadSpans, but this was a back compat break and caused custom SpanQuery implementations to fail at runtime in a variety of ways. This issue attempts to remedy things by causing a compile time break on custom SpanQuery implementations and removing the PayloadSpans class, with its functionality now moved to Spans. To help in alleviating future back compat pain, Spans has been changed from an interface to an abstract class.
    (Hugh Cayless, Mark Miller)
  • LUCENE-1808 : Query.createWeight has been changed from protected to public. This will be a back compat break if you have overridden this method - but you are likely already affected by the LUCENE-1693 (make Weight abstract rather than an interface) back compat break if you have overridden Query.createWeight, so we have taken the opportunity to make this change.
    (Tim Smith, Shai Erera via Mark Miller)
  • LUCENE-1708 : IndexReader.document() no longer checks if the document is deleted. You can call IndexReader.isDeleted(n) prior to calling document(n).
    (Shai Erera via Mike McCandless)
  • Changes in runtime behavior (14)
  • LUCENE-1424 : QueryParser now by default uses constant score auto rewriting when it generates a WildcardQuery and PrefixQuery (it already does so for TermRangeQuery, as well). Call setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE) to revert to slower BooleanQuery rewriting method.
    (Mark Miller via Mike McCandless)
  • LUCENE-1575 : As of 2.9, the core collectors as well as IndexSearcher's search methods that return top N results, no longer filter documents with scores <= 0.0. If you rely on this functionality you can use PositiveScoresOnlyCollector like this: TopDocsCollector tdc = new TopScoreDocCollector(10); Collector c = new PositiveScoresOnlyCollector(tdc); searcher.search(query, c); TopDocs hits = tdc.topDocs();
  • LUCENE-1604 : IndexReader.norms(String field) is now allowed to return null if the field has no norms, as long as you've previously called IndexReader.setDisableFakeNorms(true). This setting now defaults to false (to preserve the fake norms back compatible behavior) but in 3.0 will be hardwired to true.
    (Shon Vella via Mike McCandless)
  • LUCENE-1624 : If you open IndexWriter with create=true and autoCommit=false on an existing index, IndexWriter no longer writes an empty commit when it's created.
    (Paul Taylor via Mike McCandless)
  • LUCENE-1593 : When you call Sort() or Sort.setSort(String field, boolean reverse), the resulting SortField array no longer ends with SortField.FIELD_DOC (it was unnecessary as Lucene breaks ties internally by docID).
    (Shai Erera via Michael McCandless)
  • LUCENE-1542 : When the first token(s) have 0 position increment, IndexWriter used to incorrectly record the position as -1 if no payload is present, or Integer.MAX_VALUE if a payload is present. This caused positional queries to fail to match. The bug is now fixed, but if your app relies on the buggy behavior then you must call IndexWriter.setAllowMinus1Position(). That API is deprecated, so you must fix your application (and rebuild your index) so that it no longer relies on this behavior by the 3.0 release of Lucene.
    (Jonathan Mamou, Mark Miller via Mike McCandless)
  • LUCENE-1715 : Finalizers have been removed from the 4 core classes that still had them, since they cause GC to take longer, thus tying up memory for longer, and at best they mask buggy app code. DirectoryReader (returned from IndexReader.open) and IndexWriter previously released the write lock during finalize. SimpleFSDirectory.FSIndexInput closed the descriptor in its finalizer, and NativeFSLock released the lock. It's possible applications will be affected by this, but only if the application is failing to close readers/writers.
    (Brian Groose via Mike McCandless)
  • LUCENE-1717 : Fixed IndexWriter to account for RAM usage of buffered deletions.
    (Mike McCandless)
  • LUCENE-1727 : Ensure that fields are stored & retrieved in the exact order in which they were added to the document. This was true in all Lucene releases before 2.3, but was broken in 2.3 and 2.4, and is now fixed in 2.9.
    (Mike McCandless)
  • LUCENE-1678 : The addition of Analyzer.reusableTokenStream accidentally broke back compatibility of external analyzers that subclassed core analyzers that implemented tokenStream but not reusableTokenStream. This is now fixed, such that if reusableTokenStream is invoked on such a subclass, that method will forcefully fall back to tokenStream.
    (Mike McCandless)
  • LUCENE-1801 : Token.clear() and Token.clearNoTermBuffer() now also clear startOffset, endOffset and type. This is not likely to affect any Tokenizer chains, as Tokenizers normally always set these three values. This change was made to conform to the new AttributeImpl.clear() and AttributeSource.clearAttributes(), so that clearing works identically for Token (a single AttributeImpl holding all attributes) and for the 6 separate AttributeImpls.
    (Uwe Schindler, Michael Busch)
  • LUCENE-1483 : When searching over multiple segments, a new Scorer is now created for each segment. Searching has been telescoped out a level and IndexSearcher now operates much like MultiSearcher does. The Weight is created only once for the top level Searcher, but each Scorer is passed a per-segment IndexReader. This will result in doc ids in the Scorer being internal to the per-segment IndexReader. It has always been outside of the API to count on a given IndexReader to contain every doc id in the index, and if you have been ignoring MultiSearcher in your custom code and counting on this fact, you will find your code no longer works correctly. If a custom Scorer implementation uses any caches/filters that rely on being based on the top level IndexReader, it will need to be updated to correctly use contextless caches/filters, e.g. you can't count on the IndexReader to contain any given doc id or all of the doc ids.
    (Mark Miller, Mike McCandless)
  • LUCENE-1846 : DateTools now uses the US locale to format the numbers in its date/time strings instead of the default locale. For most locales there will be no change in the index format, as DateFormatSymbols uses ASCII digits. The usage of the US locale is important to guarantee correct ordering of generated terms.
    (Uwe Schindler)
  • LUCENE-1860 : MultiTermQuery now defaults to CONSTANT_SCORE_AUTO_REWRITE_DEFAULT rewrite method (previously it was SCORING_BOOLEAN_QUERY_REWRITE). This means that PrefixQuery and WildcardQuery will now produce constant score for all matching docs, equal to the boost of the query.
    (Mike McCandless)
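The LUCENE-1575 entry above notes that hits with scores <= 0.0 are no longer filtered and suggests wrapping a collector in PositiveScoresOnlyCollector. The following is a minimal, self-contained sketch of that wrapping idea only. SimpleScorer, SimpleCollector, RecordingCollector, PositiveOnly and Demo are hypothetical names, not Lucene 2.9 classes; the real Collector is an abstract class with additional methods (setNextReader, acceptsDocsOutOfOrder).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for Lucene's Scorer and Collector.
interface SimpleScorer {
  float score();
}

interface SimpleCollector {
  void setScorer(SimpleScorer scorer);
  void collect(int doc);
}

// Records every doc id it is asked to collect.
class RecordingCollector implements SimpleCollector {
  final List<Integer> docs = new ArrayList<>();

  public void setScorer(SimpleScorer scorer) {
    // This sink does not need scores.
  }

  public void collect(int doc) {
    docs.add(doc);
  }
}

// Wrapper in the spirit of PositiveScoresOnlyCollector: forwards a hit to the
// wrapped collector only when the current score is strictly positive.
class PositiveOnly implements SimpleCollector {
  private final SimpleCollector in;
  private SimpleScorer scorer;

  PositiveOnly(SimpleCollector in) {
    this.in = in;
  }

  public void setScorer(SimpleScorer scorer) {
    this.scorer = scorer;
    in.setScorer(scorer);
  }

  public void collect(int doc) {
    if (scorer.score() > 0f) {
      in.collect(doc);
    }
  }
}

class Demo {
  // Drives the wrapped collector over parallel arrays of doc ids and scores.
  static List<Integer> run(int[] docs, float[] scores) {
    RecordingCollector sink = new RecordingCollector();
    SimpleCollector c = new PositiveOnly(sink);
    final int[] current = {0};
    c.setScorer(() -> scores[current[0]]);
    for (int i = 0; i < docs.length; i++) {
      current[0] = i;
      c.collect(docs[i]);
    }
    return sink.docs;
  }
}
```

The score is read lazily through the scorer inside collect(), which mirrors the decoupling of collection from scoring that the new Collector API introduced.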
  • API Changes (38)
  • LUCENE-1419 : Add expert API to set custom indexing chain. This API is package-protected for now, so we don't have to officially support it. Yet, it will give us the possibility to try out different consumers in the chain.
    (Michael Busch)
  • LUCENE-1427 : DocIdSet.iterator() is now allowed to throw IOException.
    (Paul Elschot, Mike McCandless)
  • LUCENE-1422 , LUCENE-1693 : New TokenStream API that uses a new class called AttributeSource instead of the Token class, which is now a utility class that holds common Token attributes. All attributes that the Token class had have been moved into separate classes: TermAttribute, OffsetAttribute, PositionIncrementAttribute, PayloadAttribute, TypeAttribute and FlagsAttribute. The new API is much more flexible; it allows Attributes to be combined arbitrarily and custom Attributes to be defined. The new API has the same performance as the old next(Token) approach. For conformance with this new API, Tee-/SinkTokenizer was deprecated and replaced by a new TeeSinkTokenFilter.
    (Michael Busch, Uwe Schindler; additional contributions and bug fixes by Daniel Shane, Doron Cohen)
  • LUCENE-1467 : Add nextDoc() and next(int) methods to OpenBitSetIterator. These methods can be used to avoid additional calls to doc().
    (Michael Busch)
  • LUCENE-1468 : Deprecate Directory.list(), which sometimes (in FSDirectory) filters out files that don't look like index files, in favor of new Directory.listAll(), which does no filtering. Also, listAll() will never return null; instead, it throws an IOException (or subclass). Specifically, FSDirectory.listAll() will throw the newly added NoSuchDirectoryException if the directory does not exist.
    (Marcel Reutegger, Mike McCandless)
  • LUCENE-1546 : Add IndexReader.flush(Map commitUserData), allowing you to record an opaque commitUserData (maps String -> String) into the commit written by IndexReader. This matches IndexWriter's commit methods.
    (Jason Rutherglen via Mike McCandless)
  • LUCENE-652 : Added org.apache.lucene.document.CompressionTools, to enable compressing & decompressing binary content, external to Lucene's indexing. Deprecated Field.Store.COMPRESS.
  • LUCENE-1561 : Renamed Field.omitTf to Field.omitTermFreqAndPositions
    (Otis Gospodnetic via Mike McCandless)
  • LUCENE-1500 : Added new InvalidTokenOffsetsException to Highlighter methods to denote issues when offsets in TokenStream tokens exceed the length of the provided text.
    (Mark Harwood)
  • LUCENE-1575 , LUCENE-1483 : HitCollector is now deprecated in favor of a new Collector abstract class. For easy migration, people can use HitCollectorWrapper which translates (wraps) HitCollector into Collector. Note that this class is also deprecated and will be removed when HitCollector is removed. Also TimeLimitedCollector is deprecated in favor of the new TimeLimitingCollector which extends Collector.
    (Shai Erera, Mark Miller, Mike McCandless)
  • LUCENE-1592 : The method TermEnum.skipTo() was deprecated, because it is used nowhere in core/contrib and only a very inefficient default implementation is available. If you want to position a TermEnum to another Term, create a new one using IndexReader.terms(Term).
    (Uwe Schindler)
  • LUCENE-1621 : MultiTermQuery.getTerm() has been deprecated as it does not make sense for all subclasses of MultiTermQuery. Check individual subclasses to see if they support getTerm().
    (Mark Miller)
  • LUCENE-1636 : Make TokenFilter.input final so it's set only once.
    (Wouter Heijke, Uwe Schindler via Mike McCandless)
  • LUCENE-1658 , LUCENE-1451 : Renamed FSDirectory to SimpleFSDirectory (but left an FSDirectory base class). Added an FSDirectory.open static method to pick a good default FSDirectory implementation given the OS. FSDirectories should now be instantiated using FSDirectory.open or with public constructors rather than FSDirectory.getDirectory(), which has been deprecated.
    (Michael McCandless, Uwe Schindler, yonik)
  • LUCENE-1665 : Deprecate SortField.AUTO, to be removed in 3.0. Instead, when sorting by field, the application should explicitly state the type of the field.
    (Mike McCandless)
  • LUCENE-1660 : StopFilter, StandardAnalyzer and StopAnalyzer now require up-front specification of enablePositionIncrements.
    (Mike McCandless)
  • LUCENE-1614 : DocIdSetIterator's next() and skipTo() were deprecated in favor of the new nextDoc() and advance(). The new methods return the doc ID they landed on, saving an extra call to doc() in most cases. For easy migration of the code, you can change calls to next() to nextDoc() != DocIdSetIterator.NO_MORE_DOCS, and similarly for skipTo(). However, it is advised that you take advantage of the returned doc ID and not call doc() following those two. Also, doc() was deprecated in favor of docID(). docID() should return -1 or NO_MORE_DOCS if nextDoc/advance were not called yet, or NO_MORE_DOCS if the iterator has been exhausted. Otherwise it should return the current doc ID.
    (Shai Erera via Mike McCandless)
  • LUCENE-1672 : All ctors/opens and other methods using String/File to specify the directory in IndexReader, IndexWriter, and IndexSearcher were deprecated. You should instantiate the Directory yourself and pass it to these classes ( LUCENE-1451 , LUCENE-1658 ).
    (Uwe Schindler)
  • LUCENE-1407 : Move RemoteSearchable, RemoteCachingWrapperFilter out of Lucene's core into new contrib/remote package. Searchable no longer extends java.rmi.Remote
    (Simon Willnauer via Mike McCandless)
  • LUCENE-1677 : The global property org.apache.lucene.SegmentReader.class, and ReadOnlySegmentReader.class are now deprecated, to be removed in 3.0. src/gcj/* has been removed.
    (Earwin Burrfoot via Mike McCandless)
  • LUCENE-1673 : Deprecated NumberTools in favour of the new NumericRangeQuery and its new indexing format for numeric or date values.
    (Uwe Schindler)
  • LUCENE-1630 , LUCENE-1771 : Weight is now an abstract class, and adds a scorer(IndexReader, boolean /* scoreDocsInOrder */, boolean /* topScorer */) method instead of scorer(IndexReader). IndexSearcher uses this method to obtain a scorer matching the capabilities of the Collector with respect to the order of docIDs. Some Scorers (like BooleanScorer) are much more efficient if out-of-order document scoring is allowed by a Collector. Collector must now implement acceptsDocsOutOfOrder. If you write a Collector which does not care about doc ID order, it is recommended that you return true. Weight has a scoresDocsOutOfOrder method, which by default returns false. If you create a Weight which will score documents out of order if requested, you should override that method to return true. BooleanQuery's setAllowDocsOutOfOrder and getAllowDocsOutOfOrder have been deprecated as they are no longer needed. BooleanQuery will now score docs out of order when used with a Collector that can accept docs out of order. Finally, Weight#explain now takes a sub-reader and sub-docID, rather than a top level reader and docID.
    (Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)
  • LUCENE-1466 , LUCENE-1906 : Added CharFilter and MappingCharFilter, which allow chaining and mapping of characters before tokenizers run. CharStream (a subclass of Reader) is the base class for custom java.io.Readers that support offset correction. Tokenizers got an additional method, correctOffset(), which is passed down to the underlying CharStream if the input is a subclass of CharStream/CharFilter.
    (Koji Sekiguchi via Mike McCandless, Uwe Schindler)
  • LUCENE-1703 : Add IndexWriter.waitForMerges.
    (Tim Smith via Mike McCandless)
  • LUCENE-1625 : CheckIndex's programmatic API now returns separate classes detailing the status of each component in the index, and includes more detailed status than previously.
    (Tim Smith via Mike McCandless)
  • LUCENE-1713 : Deprecated RangeQuery and RangeFilter and renamed them to TermRangeQuery and TermRangeFilter. TermRangeQuery is in constant score auto rewrite mode by default. The new classes also have new ctors taking field and term ranges as Strings (see also LUCENE-1424 ).
    (Uwe Schindler)
  • LUCENE-1609 : The termInfosIndexDivisor must now be specified up-front when opening the IndexReader. Attempts to call IndexReader.setTermInfosIndexDivisor will hit an UnsupportedOperationException. This was done to enable removal of all synchronization in TermInfosReader, which previously could cause threads to pile up in certain cases.
    (Dan Rosher via Mike McCandless)
  • LUCENE-1688 : Deprecate the static final String stop word array in StopAnalyzer and replace it with an immutable implementation of CharArraySet.
    (Simon Willnauer via Mark Miller)
  • LUCENE-1742 : SegmentInfos, SegmentInfo and SegmentReader have been made public as expert, experimental APIs. These APIs may suddenly change from release to release.
    (Jason Rutherglen via Mike McCandless)
  • LUCENE-1754 : QueryWeight.scorer() can return null if no documents are going to be matched by the query. Similarly, Filter.getDocIdSet() can return null if no documents are going to be accepted by the Filter. Note that these 'can' return null, however they don't have to and can return a Scorer/DocIdSet which does not match / reject all documents. This is already the behavior of some QueryWeight/Filter implementations, and is documented here just for emphasis.
    (Shai Erera via Mike McCandless)
  • LUCENE-1705 : Added IndexWriter.deleteAll(), which deletes all documents in the index.
    (Tim Smith via Mike McCandless)
  • LUCENE-1460 : Changed TokenStreams/TokenFilters in contrib to use the new TokenStream API.
    (Robert Muir, Michael Busch)
  • LUCENE-1748 : LUCENE-1001 introduced PayloadSpans, but this was a back compat break and caused custom SpanQuery implementations to fail at runtime in a variety of ways. This issue attempts to remedy things by causing a compile time break on custom SpanQuery implementations and removing the PayloadSpans class, with its functionality now moved to Spans. To help in alleviating future back compat pain, Spans has been changed from an interface to an abstract class.
    (Hugh Cayless, Mark Miller)
  • LUCENE-1808 : Query.createWeight has been changed from protected to public.
    (Tim Smith, Shai Erera via Mark Miller)
  • LUCENE-1826 : Add constructors that take AttributeSource and AttributeFactory to all Tokenizer implementations.
    (Michael Busch)
  • LUCENE-1847 : Similarity#idf for both a Term and a Term Collection has been deprecated. New versions that return an IDFExplanation have been added.
    (Yasoja Seneviratne, Mike McCandless, Mark Miller)
  • LUCENE-1877 : Made NativeFSLockFactory the default for the new FSDirectory API (open(), FSDirectory subclass ctors). All FSDirectory system properties were deprecated, and all lock implementations use no lock prefix if the locks are stored inside the index directory. Because the deprecated String/File ctors of IndexWriter and IndexReader ( LUCENE-1672 ) and FSDirectory.getDirectory() still use the old SimpleFSLockFactory, while the new API uses NativeFSLockFactory, we strongly recommend not mixing the deprecated and new APIs.
    (Uwe Schindler, Mike McCandless)
  • LUCENE-1911 : Added a new method isCacheable() to DocIdSet. This method should return true if the underlying implementation does not use disk I/O and is fast enough to be directly cached by CachingWrapperFilter. OpenBitSet, SortedVIntList, and DocIdBitSet are such candidates. The default implementation of the abstract DocIdSet class returns false. In this case, CachingWrapperFilter copies the DocIdSetIterator into an OpenBitSet for caching.
    (Uwe Schindler, Thomas Becker)
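The LUCENE-1614 entry above describes the nextDoc()/advance()/docID() contract and the recommended migration loop. The sketch below illustrates that contract with a hypothetical array-backed iterator; ArrayDocIdIterator and IterationDemo are illustrative names, not Lucene classes. NO_MORE_DOCS mirrors the sentinel DocIdSetIterator.NO_MORE_DOCS (Integer.MAX_VALUE), and this sketch chooses to return -1 from docID() before iteration starts.

```java
// Hypothetical iterator following the nextDoc()/advance()/docID() contract.
class ArrayDocIdIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs; // assumed sorted ascending and distinct
  private int idx = -1;
  private int doc = -1;     // -1 until nextDoc()/advance() is first called

  ArrayDocIdIterator(int... docs) {
    this.docs = docs;
  }

  int docID() {
    return doc;
  }

  // Moves to the next doc and returns it, or NO_MORE_DOCS when exhausted.
  int nextDoc() {
    if (idx < docs.length) {
      idx++;
    }
    doc = idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    return doc;
  }

  // Moves to the first doc >= target and returns it (linear scan for simplicity).
  int advance(int target) {
    int d;
    while ((d = nextDoc()) < target) {
      // keep scanning forward
    }
    return d;
  }
}

class IterationDemo {
  // Migrated loop: use the doc id returned by nextDoc() instead of calling doc().
  static int sum(ArrayDocIdIterator it) {
    int total = 0;
    int doc;
    while ((doc = it.nextDoc()) != ArrayDocIdIterator.NO_MORE_DOCS) {
      total += doc;
    }
    return total;
  }
}
```

The loop in IterationDemo.sum is the pattern the entry recommends in place of the old `while (it.next()) { int d = it.doc(); ... }` style.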
  • Bug fixes (26)
  • LUCENE-1415 : MultiPhraseQuery had incorrect hashCode() and equals() implementations, which led to Solr cache misses.
    (Todd Feak, Mark Miller via yonik)
  • LUCENE-1327 : Fix TermSpans#skipTo() to behave as specified in javadocs of Terms#skipTo().
    (Michael Busch)
  • LUCENE-1573 : Do not ignore InterruptedException (caused by Thread.interrupt()) nor enter deadlock/spin loop. Now, an interrupt will cause a RuntimeException to be thrown. In 3.0 we will change public APIs to throw InterruptedException.
    (Jeremy Volkman via Mike McCandless)
  • LUCENE-1590 : Fixed stored-only Field instances so that they do not change the value of omitNorms or omitTermFreqAndPositions in FieldInfo; when you retrieve such fields they will now have omitNorms=true and omitTermFreqAndPositions=false (though these values are unused).
    (Uwe Schindler via Mike McCandless)
  • LUCENE-1587 : RangeQuery#equals() could consider a RangeQuery without a collator equal to one with a collator.
    (Mark Platvoet via Mark Miller)
  • LUCENE-1600 : Don't call String.intern unnecessarily in some cases when loading documents from the index.
    (P Eger via Mike McCandless)
  • LUCENE-1611 : Fix case where an OutOfMemoryError in IndexWriter could cause "infinite merging" to happen.
    (Christiaan Fluit via Mike McCandless)
  • LUCENE-1623 : Properly handle back-compatibility of 2.3.x indexes that contain field names with non-ascii characters.
    (Mike Streeton via Mike McCandless)
  • LUCENE-1593 : MultiSearcher and ParallelMultiSearcher did not break ties (in sort) by doc Id in a consistent manner (i.e., if Sort.FIELD_DOC was used vs. when it wasn't).
    (Shai Erera via Michael McCandless)
  • LUCENE-1647 : Fix case where IndexReader.undeleteAll would cause the segment's deletion count to be incorrect.
    (Mike McCandless)
  • LUCENE-1542 : When the first token(s) have 0 position increment, IndexWriter used to incorrectly record the position as -1 if no payload is present, or Integer.MAX_VALUE if a payload is present. This caused positional queries to fail to match. The bug is now fixed, but if your app relies on the buggy behavior then you must call IndexWriter.setAllowMinus1Position(). That API is deprecated, so you must fix your application (and rebuild your index) so that it no longer relies on this behavior by the 3.0 release of Lucene.
    (Jonathan Mamou, Mark Miller via Mike McCandless)
  • LUCENE-1658 : Fixed MMapDirectory to correctly throw IOExceptions on EOF, removed numeric overflow possibilities and added support for a hack to unmap the buffers on closing IndexInput.
    (Uwe Schindler)
  • LUCENE-1681 : Fix infinite loop caused by a call to DocValues methods getMinValue, getMaxValue, getAverageValue.
    (Simon Willnauer via Mark Miller)
  • LUCENE-1599 : Add clone support for SpanQuerys. SpanRegexQuery counts on this functionality and does not work correctly without it.
    (Billow Gao, Mark Miller)
  • LUCENE-1718 : Fix termInfosIndexDivisor to carry over to reopened readers
    (Mike McCandless)
  • LUCENE-1583 : SpanOrQuery skipTo() doesn't always move forwards as Spans documentation indicates it should.
    (Moti Nisenson via Mark Miller)
  • LUCENE-1566 : Sun JVM Bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546 causes invalid OutOfMemoryError when reading too many bytes at once from a file on 32bit JVMs that have a large maximum heap size. This fix adds set/getReadChunkSize to FSDirectory so that large reads are broken into chunks, to work around this JVM bug. On 32bit JVMs the default chunk size is 100 MB; on 64bit JVMs, which don't show the bug, the default is Integer.MAX_VALUE.
    (Simon Willnauer via Mike McCandless)
  • LUCENE-1448 : Added TokenStream.end() to perform end-of-stream operations (i.e. to return the end offset of the tokenization). This is important when multiple fields with the same name are added to a document, to ensure offsets recorded in term vectors for all of the instances are correct.
    (Mike McCandless, Mark Miller, Michael Busch)
  • LUCENE-1805 : CloseableThreadLocal did not allow a null Object in get(), although it does allow it in set(Object). Fix get() to not assert the object is not null.
    (Shai Erera via Mike McCandless)
  • LUCENE-1801 : Changed all Tokenizers or TokenStreams (in core/contrib) that are the source of Tokens to always call AttributeSource.clearAttributes() first.
    (Uwe Schindler)
  • LUCENE-1819 : MatchAllDocsQuery.toString(field) should produce output that is parsable by the QueryParser.
    (John Wang, Mark Miller)
  • LUCENE-1836 : Fix localization bug in the new query parser and add new LocalizedTestCase as base class for localization junit tests.
    (Robert Muir, Uwe Schindler via Michael Busch)
  • LUCENE-1847 : PhraseQuery/TermQuery/SpanQuery use IndexReader specific stats in their Weight#explain methods - these stats should be corpus wide.
    (Yasoja Seneviratne, Mike McCandless, Mark Miller)
  • LUCENE-1885 : Fix a bug where NativeFSLock.isLocked() did not work if the lock was obtained by another NativeFSLock(Factory) instance. Because of this, IndexReader.isLocked() and IndexWriter.isLocked() did not work correctly.
    (Uwe Schindler)
  • LUCENE-1899 : Fix O(N^2) CPU cost when setting docIDs in order in an OpenBitSet, due to an inefficiency in how the underlying storage is reallocated.
    (Nadav Har'El via Mike McCandless)
  • LUCENE-1918 : Fixed cases where a ParallelReader would generate exceptions on being passed to IndexWriter.addIndexes(IndexReader[]). First case was when the ParallelReader was empty. Second case was when the ParallelReader used to contain documents with TermVectors, but all such documents have been deleted.
    (Christian Kohlschütter via Mike McCandless)
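The LUCENE-1899 entry above attributes an O(N^2) CPU cost to how a bit set's backing storage is reallocated when docIDs are set in increasing order. The sketch below contrasts growing the backing long[] by exactly the needed amount against geometric (doubling) growth. GrowableBits is a hypothetical class, not OpenBitSet, and the actual LUCENE-1899 patch may use a different over-allocation policy; the point is only why exact-fit growth degrades.

```java
import java.util.Arrays;

// Hypothetical growable bit set contrasting two reallocation strategies.
class GrowableBits {
  private long[] words = new long[0];
  private final boolean geometric;
  int reallocations = 0;

  GrowableBits(boolean geometric) {
    this.geometric = geometric;
  }

  void set(int index) {
    int wordNum = index >>> 6; // 64 bits per long
    if (wordNum >= words.length) {
      int needed = wordNum + 1;
      // Exact-fit growth copies the whole array each time a new word is needed,
      // so setting N bits in order costs O(N^2) copying overall.
      // Doubling over-allocates, making the copying amortized O(1) per word.
      int newLen = geometric ? Math.max(needed, words.length * 2) : needed;
      words = Arrays.copyOf(words, newLen);
      reallocations++;
    }
    words[wordNum] |= 1L << (index & 63);
  }

  boolean get(int index) {
    int wordNum = index >>> 6;
    return wordNum < words.length && (words[wordNum] & (1L << (index & 63))) != 0;
  }
}
```

Setting bits 0 through 63999 in order needs 1000 words: exact-fit growth reallocates 1000 times, while doubling reallocates only 11 times (capacities 1, 2, 4, ..., 1024).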
  • New features (36)
  • LUCENE-1411 : Added expert API to open an IndexWriter on a prior commit, obtained from IndexReader.listCommits. This makes it possible to roll back changes to an index even after you've closed the IndexWriter that made the changes, assuming you are using an IndexDeletionPolicy that keeps past commits around. This is useful when building transactional support on top of Lucene.
    (Mike McCandless)
  • LUCENE-1382 : Add an optional arbitrary Map (String -> String) "commitUserData" to IndexWriter.commit(), which is stored in the segments file and is then retrievable via IndexReader.getCommitUserData instance and static methods.
    (Shalin Shekhar Mangar via Mike McCandless)
  • LUCENE-1420 : Similarity now has a computeNorm method that allows custom Similarity classes to override how norm is computed. It's provided a FieldInvertState instance that contains details from inverting the field. The default impl is boost * lengthNorm(numTerms), to be backwards compatible. Also added {set/get}DiscountOverlaps to DefaultSimilarity, to control whether overlapping tokens (tokens with 0 position increment) should be counted in lengthNorm.
    (Andrzej Bialecki via Mike McCandless)
  • LUCENE-1424 : Moved constant score query rewrite capability into MultiTermQuery, allowing TermRangeQuery, PrefixQuery and WildcardQuery to switch between constant-score rewriting or BooleanQuery expansion rewriting via a new setRewriteMethod method. Deprecated ConstantScoreRangeQuery
    (Mark Miller via Mike McCandless)
  • LUCENE-1461 : Added FieldCacheRangeFilter, a RangeFilter for single-term fields that uses FieldCache to compute the filter. If your documents all have a single term for a given field, and you need to create many RangeFilters with varying lower/upper bounds, then this is likely a much faster way to create the filters than RangeFilter. FieldCacheRangeFilter allows ranges on all data types that FieldCache supports (term ranges, byte, short, int, long, float, double). However, it comes at the expense of added RAM consumption and slower first-time usage due to populating the FieldCache. It also does not support collation.
    (Tim Sturge, Matt Ericson via Mike McCandless and Uwe Schindler)
  • LUCENE-1296 : add protected method CachingWrapperFilter.docIdSetToCache to allow subclasses to choose which DocIdSet implementation to use
    (Paul Elschot via Mike McCandless)
  • LUCENE-1390 : Added ASCIIFoldingFilter, a Filter that converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists. ISOLatin1AccentFilter, which handles a subset of this filter, has been deprecated.
    (Andi Vajda, Steven Rowe via Mark Miller)
  • LUCENE-1478 : Added new SortField constructor allowing you to specify a custom FieldCache parser to generate numeric values from terms for a field.
    (Uwe Schindler via Mike McCandless)
  • LUCENE-1528 : Add support for Ideographic Space to the queryparser.
    (Luis Alves via Michael Busch)
  • LUCENE-1487 : Added FieldCacheTermsFilter, to filter by multiple terms on single-valued fields. The filter loads the FieldCache for the field the first time it's called, and subsequent usage of that field, even with different Terms in the filter, is fast.
    (Tim Sturge, Shalin Shekhar Mangar via Mike McCandless)
  • LUCENE-1314 : Add clone(), clone(boolean readOnly) and reopen(boolean readOnly) to IndexReader. Cloning an IndexReader gives you a new reader which you can make changes to (deletions, norms) without affecting the original reader. Now, with clone or reopen you can choose a readOnly setting different from that of the original reader.
    (Jason Rutherglen, Mike McCandless)
  • LUCENE-1506 : Added FilteredDocIdSet, an abstract class which you subclass to implement the "match" method to accept or reject each docID. Unlike ChainedFilter (under contrib/misc), FilteredDocIdSet never requires you to materialize the full bitset. Instead, match() is called on demand per docID.
    (John Wang via Mike McCandless)
  • LUCENE-1398 : Add ReverseStringFilter to contrib/analyzers, a filter to reverse the characters in each token.
    (Koji Sekiguchi via yonik)
  • LUCENE-1551 : Add expert IndexReader.reopen(IndexCommit) to allow efficiently opening a new reader on a specific commit, sharing resources with the original reader.
    (Torin Danil via Mike McCandless)
  • LUCENE-1434 : Added org.apache.lucene.util.IndexableBinaryStringTools, to encode byte[] as String values that are valid terms, and maintain sort order of the original byte[] when the bytes are interpreted as unsigned.
    (Steven Rowe via Mike McCandless)
  • LUCENE-1543 : Allow MatchAllDocsQuery to optionally use norms from a specific field to set the score for a document.
    (Karl Wettin via Mike McCandless)
  • LUCENE-1586 : Add IndexReader.getUniqueTermCount().
    (Mike McCandless via Derek)
  • LUCENE-1516 : Added "near real-time search" to IndexWriter, via a new expert getReader() method. This method returns a reader that searches the full index, including any uncommitted changes in the current IndexWriter session. This should result in a faster turnaround than the normal approach of committing the changes and then reopening a reader.
    (Jason Rutherglen via Mike McCandless)
  • LUCENE-1603 : Added new MultiTermQueryWrapperFilter, to wrap any MultiTermQuery as a Filter. Also made some improvements to MultiTermQuery: return DocIdSet.EMPTY_DOCIDSET if there are no terms in the enum; track the total number of terms it visited during rewrite (getTotalNumberOfTerms). FilteredTermEnum is also more friendly to subclassing.
    (Uwe Schindler via Mike McCandless)
  • LUCENE-1605 : Added BitVector.subset().
    (Jeremy Volkman via Mike McCandless)
  • LUCENE-1618 : Added FileSwitchDirectory that enables files with specified extensions to be stored in a primary directory and the rest of the files to be stored in the secondary directory. For example, this can be useful for the large doc-store (stored fields, term vectors) files in FSDirectory and the rest of the index files in a RAMDirectory.
    (Jason Rutherglen via Mike McCandless)
  • LUCENE-1494 : Added FieldMaskingSpanQuery which can be used to cross-correlate Spans from different fields.
    (Paul Cowan and Chris Hostetter)
  • LUCENE-1634 : Add calibrateSizeByDeletes to LogMergePolicy, to take deletions into account when considering merges.
    (Yasuhiro Matsuda via Mike McCandless)
  • LUCENE-1550 : Added new n-gram based String distance measure for spell checking. See the Javadocs for NGramDistance.java for a reference paper on why this is helpful
    (Tom Morton via Grant Ingersoll)
  • LUCENE-1470 , LUCENE-1582 , LUCENE-1602 , LUCENE-1673 , LUCENE-1701 , LUCENE-1712 : Added NumericRangeQuery and NumericRangeFilter, a fast alternative to RangeQuery/RangeFilter for numeric searches. They depend on a specific structure of terms in the index that can be created by indexing using the new NumericField or NumericTokenStream classes. NumericField can only be used for indexing and optionally stores the values as string representation in the doc store. Documents returned from IndexReader/IndexSearcher will return only the String value using the standard Fieldable interface. NumericFields can be sorted on and loaded into the FieldCache.
    (Uwe Schindler, Yonik Seeley, Mike McCandless)
  • LUCENE-1405 : Added support for Ant resource collections in contrib/ant <index> task.
    (Przemyslaw Sztoch via Erik Hatcher)
  • LUCENE-1699 : Allow setting a TokenStream on Field/Fieldable for indexing in conjunction with any other ways to specify stored field values, currently binary or string values.
    (yonik)
  • LUCENE-1701 : Made the standard FieldCache.Parsers public and added parsers for fields generated using NumericField/NumericTokenStream. All standard parsers now also implement Serializable and enforce their singleton status.
    (Uwe Schindler, Mike McCandless)
  • LUCENE-1741 : User configurable maximum chunk size in MMapDirectory. On 32 bit platforms, the address space can be very fragmented, so one big ByteBuffer for the whole file may not fit into address space.
    (Eks Dev via Uwe Schindler)
  • LUCENE-1644 : Enable 4 rewrite modes for queries deriving from MultiTermQuery (WildcardQuery, PrefixQuery, TermRangeQuery, NumericRangeQuery): CONSTANT_SCORE_FILTER_REWRITE first creates a filter and then assigns constant score (boost) to docs; CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE creates a BooleanQuery but uses a constant score (boost); SCORING_BOOLEAN_QUERY_REWRITE also creates a BooleanQuery but keeps the BooleanQuery's scores; CONSTANT_SCORE_AUTO_REWRITE tries to pick the most performant constant-score rewrite method.
    (Mike McCandless)
  • LUCENE-1448 : Added TokenStream.end(), to perform end-of-stream operations. This is currently used to fix offset problems when multiple fields with the same name are added to a document.
    (Mike McCandless, Mark Miller, Michael Busch)
  • LUCENE-1776 : Add an option to not collect payloads for an ordered SpanNearQuery. Payloads were not lazily loaded in this case as the javadocs implied. If you have payloads and want to use an ordered SpanNearQuery that does not need to use the payloads, you can disable loading them with a new constructor switch.
    (Mark Miller)
  • LUCENE-1341 : Added PayloadNearQuery to enable SpanNearQuery functionality with payloads
    (Peter Keegan, Grant Ingersoll, Mark Miller)
  • LUCENE-1790 : Added PayloadTermQuery to enable scoring of payloads based on the maximum payload seen for a document. Slight refactoring of Similarity and other payload queries
    (Grant Ingersoll, Mark Miller)
  • LUCENE-1749 : Addition of FieldCacheSanityChecker utility, and hooks to use it in all existing Lucene Tests. This class can be used by any application to inspect the FieldCache and provide diagnostic information about the possibility of inconsistent FieldCache usage. Namely: FieldCache entries for the same field with different datatypes or parsers; and FieldCache entries for the same field in both a reader, and one of its (descendant) sub readers.
    (Chris Hostetter, Mark Miller)
  • LUCENE-1789 : Added utility class oal.search.function.MultiValueSource to ease the transition to segment based searching for any apps that directly call oal.search.function.* APIs. This class wraps any other ValueSource, but takes care, when composite (multi-segment) readers are passed, not to double RAM usage in the FieldCache.
    (Chris Hostetter, Mark Miller, Mike McCandless)
  • Optimizations (13)
  • LUCENE-1427 : Fixed QueryWrapperFilter to not waste time computing scores of the query, since they are just discarded. Also, made it more efficient (single pass) by not creating & populating an intermediate OpenBitSet
    (Paul Elschot, Mike McCandless)
  • LUCENE-1443 : Performance improvement for OpenBitSetDISI.inPlaceAnd()
    (Paul Elschot via yonik)
  • LUCENE-1484 : Remove synchronization of IndexReader.document() by using CloseableThreadLocal internally.
    (Jason Rutherglen via Mike McCandless)
  • LUCENE-1124 : Short circuit FuzzyQuery.rewrite when input token length is small compared to minSimilarity.
    (Timo Nentwig, Mark Miller)
  • LUCENE-1316 : MatchAllDocsQuery now avoids the synchronized IndexReader.isDeleted() call per document, by directly accessing the underlying deleteDocs BitVector. This improves performance with non-readOnly readers, especially in a multi-threaded environment.
    (Todd Feak, Yonik Seeley, Jason Rutherglen via Mike McCandless)
  • LUCENE-1483 : When searching over multiple segments we now visit each sub-reader one at a time. This speeds up warming, since FieldCache entries (if required) can be shared across reopens for those segments that did not change, and also speeds up searches that sort by relevance or by field values.
    (Mark Miller, Mike McCandless)
  • LUCENE-1575 : The new Collector class decouples collect() from score computation. Collector.setScorer is called to establish the current Scorer in-use per segment. Collectors that require the score should then call Scorer.score() per hit inside collect().
    (Shai Erera via Mike McCandless)
  • LUCENE-1596 : MultiTermDocs speedup when set with MultiTermDocs.seek(MultiTermEnum)
    (yonik)
  • LUCENE-1653 : Avoid creating a Calendar in every call to DateTools#dateToString, DateTools#timeToString and DateTools#round.
    (Shai Erera via Mark Miller)
  • LUCENE-1688 : Deprecate static final String stop word array and replace it with an immutable implementation of CharArraySet. Removes conversions between Set and array.
    (Simon Willnauer via Mark Miller)
  • LUCENE-1754 : BooleanQuery.queryWeight.scorer() will return null if it won't match any documents (e.g. if there are no required and optional scorers, or not enough optional scorers to satisfy minShouldMatch).
    (Shai Erera via Mike McCandless)
  • LUCENE-1607 : To speed up string interning for commonly used strings, the StringHelper.intern() interface was added with a default implementation that uses a lockless cache.
    (Earwin Burrfoot, yonik)
  • LUCENE-1800 : QueryParser now uses reusable TokenStreams.
    (yonik)
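The LUCENE-1575 entry above describes the Collector split: setScorer hands over the current Scorer, and only collectors that actually need scores call score() inside collect(). The sketch below illustrates that contract with hypothetical, simplified interfaces (SketchScorer, SketchCollector); they are not Lucene's real Scorer and Collector classes, whose signatures differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for Lucene's Scorer and Collector (not the real signatures).
interface SketchScorer {
    float score();
}

interface SketchCollector {
    void setScorer(SketchScorer scorer); // called when a new scorer (e.g. per segment) becomes current
    void collect(int doc);               // called once per matching document
}

// Only needs doc ids, so it never calls score() and no score is ever computed.
final class DocIdOnlyCollector implements SketchCollector {
    final List<Integer> docs = new ArrayList<>();

    @Override
    public void setScorer(SketchScorer scorer) {
        // Scores are not needed; the scorer is ignored.
    }

    @Override
    public void collect(int doc) {
        docs.add(doc);
    }
}

// Needs scores, so it keeps the current scorer and pulls the score per hit.
final class ScoreSummingCollector implements SketchCollector {
    private SketchScorer scorer;
    double total;

    @Override
    public void setScorer(SketchScorer scorer) {
        this.scorer = scorer;
    }

    @Override
    public void collect(int doc) {
        total += scorer.score();
    }
}

// Helper scorer that counts how often a score is requested.
final class CountingScorer implements SketchScorer {
    int calls;

    @Override
    public float score() {
        calls++;
        return 1.5f;
    }
}
```

The design point is that a collector which only gathers doc ids never triggers score computation at all, while a score-aware collector pays for scoring only on the hits it actually sees.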
  • Documentation (9)
  • LUCENE-1908 : Scoring documentation improvements in Similarity javadocs.
    (Mark Miller, Shai Erera, Ted Dunning, Jiri Kuhn, Marvin Humphrey, Doron Cohen)
  • LUCENE-1872 : NumericField javadoc improvements
    (Michael McCandless, Uwe Schindler)
  • LUCENE-1875 : Make TokenStream.end javadoc less confusing.
    (Uwe Schindler)
  • LUCENE-1862 : Rectified duplicate package level javadocs for o.a.l.queryParser and o.a.l.analysis.cn.
    (Chris Hostetter)
  • LUCENE-1886 : Improved hyperlinking in key Analysis javadocs
    (Bernd Fondermann via Chris Hostetter)
  • LUCENE-1884 : Massive javadoc and comment cleanup, primarily dealing with typos.
    (Robert Muir via Chris Hostetter)
  • LUCENE-1898 : Switch CHANGES.txt to use bullets rather than numbers and update the changes-to-html script to handle the new format.
    (Steven Rowe, Mark Miller)
  • LUCENE-1900 : Improve Searchable Javadoc.
    (Nadav Har'El, Doron Cohen, Marvin Humphrey, Mark Miller)
  • LUCENE-1896 : Improve Similarity#queryNorm javadocs.
    (Jiri Kuhn, Mark Miller)
  • Build (5)
  • LUCENE-1440 : Add new targets to build.xml that allow downloading and executing the junit testcases from an older release for backwards-compatibility testing.
    (Michael Busch)
  • LUCENE-1446 : Add compatibility tag to common-build.xml and run backwards-compatibility tests in the nightly build.
    (Michael Busch)
  • LUCENE-1529 : Properly test "drop-in" replacement of jar with backwards-compatibility tests.
    (Mike McCandless, Michael Busch)
  • LUCENE-1851 : Change 'javacc' and 'clean-javacc' targets to build and clean contrib/surround files.
    (Luis Alves via Michael Busch)
  • LUCENE-1854 : tar task should use longfile="gnu" to avoid false file name length warnings.
    (Mark Miller)
  • Test Cases (4)
  • LUCENE-1791 : Enhancements to the QueryUtils and CheckHits utility classes to wrap IndexReaders and Searchers in MultiReaders or MultiSearchers when possible to help exercise more edge cases.
    (Chris Hostetter, Mark Miller)
  • LUCENE-1852 : Fix localization test failures.
    (Robert Muir via Michael Busch)
  • LUCENE-1843 : Refactored all tests that use assertAnalyzesTo() & others in core and contrib to use a new BaseTokenStreamTestCase base class. Also rewrote some tests to use these general analysis assert functions instead of their own (e.g. TestMappingCharFilter). The new base class also tests tokenization with the TokenStream.next() backwards layer enabled (using Token/TokenWrapper as attribute implementation) and disabled (default for Lucene 3.0).
    (Uwe Schindler, Robert Muir)
  • LUCENE-1836 : Added a new LocalizedTestCase as base class for localization junit tests.
    (Robert Muir, Uwe Schindler via Michael Busch)
  • Release 2.4.1 [2009-03-09]

  • API Changes (1)
  • LUCENE-1186 : Add Analyzer.close() to free internal ThreadLocal resources.
    (Christian Kohlschütter via Mike McCandless)
  • Bug fixes (15)
  • LUCENE-1452 : Fixed silent data-loss case whereby binary fields are truncated to 0 bytes during merging if the segments being merged are non-congruent (same field name maps to different field numbers). This bug was introduced with LUCENE-1219.
    (Andrzej Bialecki via Mike McCandless)
  • LUCENE-1429 : Don't throw incorrect IllegalStateException from IndexWriter.close() if you've hit an OOM when autoCommit is true.
    (Mike McCandless)
  • LUCENE-1474 : If IndexReader.flush() is called twice when there were pending deletions, it could lead to later false AssertionError during IndexReader.open.
    (Mike McCandless)
  • LUCENE-1430 : Fix false AlreadyClosedException (masking an actual IOException) from the IndexReader.open methods that take a String or File path.
    (Mike McCandless)
  • LUCENE-1442 : Multiple-valued NOT_ANALYZED fields can double-count token offsets.
    (Mike McCandless)
  • LUCENE-1453 : Ensure IndexReader.reopen()/clone() does not result in incorrectly closing the shared FSDirectory. This bug would only happen if you use IndexReader.open() with a File or String argument. The returned readers are wrapped by a FilterIndexReader that correctly handles closing of directory after reopen()/clone().
    (Mark Miller, Uwe Schindler, Mike McCandless)
  • LUCENE-1457 : Fix possible overflow bugs during binary searches.
    (Mark Miller via Mike McCandless)
  • LUCENE-1459 : Fix CachingWrapperFilter to not throw exception if both bits() and getDocIdSet() methods are called.
    (Matt Jones via Mike McCandless)
  • LUCENE-1519 : Fix int overflow bug during segment merging.
    (Deepak via Mike McCandless)
  • LUCENE-1521 : Fix int overflow bug when flushing segment.
    (Shon Vella via Mike McCandless)
  • LUCENE-1544 : Fix deadlock in IndexWriter.addIndexes(IndexReader[]).
    (Mike McCandless via Doug Sale)
  • LUCENE-1547 : Fix rare thread safety issue if two threads call IndexWriter commit() at the same time.
    (Mike McCandless)
  • LUCENE-1465 : NearSpansOrdered returns payloads from first possible match rather than the correct, shortest match; Payloads could be returned even if the max slop was exceeded; The wrong payload could be returned in certain situations.
    (Jonathan Mamou, Greg Shackles, Mark Miller)
  • LUCENE-1186 : Add Analyzer.close() to free internal ThreadLocal resources.
    (Christian Kohlschütter via Mike McCandless)
  • LUCENE-1552 : Fix IndexWriter.addIndexes(IndexReader[]) to properly rollback IndexWriter's internal state on hitting an exception.
    (Scott Garland via Mike McCandless)
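LUCENE-1457 above fixes possible overflow bugs during binary searches. The classic form of this bug is computing the midpoint as (low + high) / 2, which goes negative once low + high exceeds Integer.MAX_VALUE. A minimal sketch of the usual remedy, an unsigned right shift, follows; the class and method names are illustrative, and this is not necessarily the exact patch applied in Lucene.

```java
final class SafeBinarySearch {
    /** Returns the index of key in the sorted array a, or -(insertionPoint + 1) if absent. */
    static int search(int[] a, int key) {
        int low = 0;
        int high = a.length - 1;
        while (low <= high) {
            // (low + high) / 2 overflows when low + high > Integer.MAX_VALUE;
            // >>> 1 treats the wrapped sum as unsigned, so the midpoint stays correct.
            int mid = (low + high) >>> 1;
            int v = a[mid];
            if (v < key) {
                low = mid + 1;
            } else if (v > key) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1);
    }

    // Exposed only to show the difference between the two midpoint formulas.
    static int midUnsigned(int low, int high) {
        return (low + high) >>> 1;
    }

    static int midNaive(int low, int high) {
        return (low + high) / 2;
    }
}
```

With indexes near Integer.MAX_VALUE, midNaive returns a negative index while midUnsigned still lands between low and high.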
  • Release 2.4.0 [2008-10-08]

  • Changes in backwards compatibility policy (1)
  • LUCENE-1340 : In a minor change to Lucene's backward compatibility policy, we are now allowing the Fieldable interface to have changes, within reason, and made on a case-by-case basis. If an application implements its own Fieldable, please be aware of this. Otherwise, there is no need to be concerned. This is in effect for all 2.X releases, starting with 2.4. Also note that, in all likelihood, Fieldable will be changed in 3.0.
  • Changes in runtime behavior (4)
  • LUCENE-1151 : Fix StandardAnalyzer to not mis-identify host names (eg lucene.apache.org) as an ACRONYM. To get back to the pre-2.4 backwards compatible, but buggy, behavior, you can either call StandardAnalyzer.setDefaultReplaceInvalidAcronym(false) (static method), or, set system property org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym to "false" on JVM startup. All StandardAnalyzer instances created after that will then show the pre-2.4 behavior. Alternatively, you can call setReplaceInvalidAcronym(false) to change the behavior per instance of StandardAnalyzer. This backwards compatibility will be removed in 3.0 (hardwiring the value to true).
    (Mike McCandless)
  • LUCENE-1044 : IndexWriter with autoCommit=true now commits (such that a reader can see the changes) far less often than it used to. Previously, every flush was also a commit. You can always force a commit by calling IndexWriter.commit(). Furthermore, in 3.0, autoCommit will be hardwired to false (IndexWriter constructors that take an autoCommit argument have been deprecated)
    (Mike McCandless)
  • LUCENE-1335 : IndexWriter.addIndexes(Directory[]) and addIndexesNoOptimize no longer allow the same Directory instance to be passed in more than once. Internally, IndexWriter uses Directory and segment name to uniquely identify segments, so adding the same Directory more than once was causing duplicates which led to problems
    (Mike McCandless)
  • LUCENE-1396 : Improve PhraseQuery.toString() so that gaps in the positions are indicated with a ? and multiple terms at the same position are joined with a |.
    (Andrzej Bialecki via Mike McCandless)
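LUCENE-1396 changes PhraseQuery.toString() so that position gaps print as ? and terms sharing a position are joined with |. A hypothetical formatter reproducing that notation is sketched below, assuming positions start at 0; it omits the field prefix and the slop suffix that the real toString() can also emit.

```java
final class PhraseToStringSketch {
    // terms[i] was added at positions[i]; positions are assumed to start at 0.
    static String format(String[] terms, int[] positions) {
        int maxPosition = -1;
        for (int p : positions) {
            maxPosition = Math.max(maxPosition, p);
        }
        String[] pieces = new String[maxPosition + 1];
        for (int i = 0; i < terms.length; i++) {
            int pos = positions[i];
            // Several terms at one position are joined with '|'.
            pieces[pos] = pieces[pos] == null ? terms[i] : pieces[pos] + "|" + terms[i];
        }
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // A position with no term is a gap, shown as '?'.
            sb.append(pieces[i] == null ? "?" : pieces[i]);
        }
        return sb.append('"').toString();
    }
}
```

For example, terms quick and fast at position 0 and fox at position 2 render as "quick|fast ? fox".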
  • API Changes (26)
  • LUCENE-1084 : Changed all IndexWriter constructors to take an explicit parameter for maximum field size. Deprecated all the pre-existing constructors; these will be removed in release 3.0. NOTE: these new constructors set autoCommit to false.
    (Steven Rowe via Mike McCandless)
  • LUCENE-584 : Changed Filter API to return a DocIdSet instead of a java.util.BitSet. This allows using more efficient data structures for Filters and makes them more flexible. This deprecates Filter.bits(), so all filters that implement this outside the Lucene code base will need to be adapted. See also the javadocs of the Filter class.
    (Paul Elschot, Michael Busch)
  • LUCENE-1044 : Added IndexWriter.commit() which flushes any buffered adds/deletes and then commits a new segments file so readers will see the changes. Deprecate IndexWriter.flush() in favor of IndexWriter.commit().
    (Mike McCandless)
  • LUCENE-325 : Added IndexWriter.expungeDeletes methods, which consult the MergePolicy to find merges necessary to merge away all deletes from the index. This should be a somewhat lower cost operation than optimize.
    (John Wang via Mike McCandless)
  • LUCENE-1233 : Return empty array instead of null when no fields match the specified name in these methods in Document: getFieldables, getFields, getValues, getBinaryValues.
    (Stefan Trcek via Mike McCandless)
  • LUCENE-1234 : Make BoostingSpanScorer protected.
    (Andi Vajda via Grant Ingersoll)
  • LUCENE-510 : The index now stores strings as true UTF-8 bytes (previously it was Java's modified UTF-8). If any text, either stored fields or a token, has illegal UTF-16 surrogate characters, these characters are now silently replaced with the Unicode replacement character U+FFFD. This is a change to the index file format.
    (Marvin Humphrey via Mike McCandless)
  • LUCENE-852 : Let the SpellChecker caller specify IndexWriter mergeFactor and RAM buffer size.
    (Otis Gospodnetic)
  • LUCENE-1290 : Deprecate org.apache.lucene.search.Hits, Hit and HitIterator and remove all references to these classes from the core. Also update demos and tutorials.
    (Michael Busch)
  • LUCENE-1288 : Add getVersion() and getGeneration() to IndexCommit. getVersion() returns the same value that IndexReader.getVersion() returns when the reader is opened on the same commit.
    (Jason Rutherglen via Mike McCandless)
  • LUCENE-1311 : Added IndexReader.listCommits(Directory) static method to list all commits in a Directory, plus IndexReader.open methods that accept an IndexCommit and open the index as of that commit. These methods are only useful if you implement a custom DeletionPolicy that keeps more than the last commit around.
    (Jason Rutherglen via Mike McCandless)
  • LUCENE-1325 : Added IndexCommit.isOptimized().
    (Shalin Shekhar Mangar via Mike McCandless)
  • LUCENE-1324 : Added TokenFilter.reset().
    (Shai Erera via Mike McCandless)
  • LUCENE-1340 : Added Fieldable.omitTf() method to skip indexing term frequency, positions and payloads. This saves index space, and indexing/searching time.
    (Eks Dev via Mike McCandless)
  • LUCENE-1219 : Add basic reuse API to Fieldable for binary fields: getBinaryValue/Offset/Length(); currently only lazy fields reuse the provided byte[] result to getBinaryValue.
    (Eks Dev via Mike McCandless)
  • LUCENE-1334 : Add new constructor for Term: Term(String fieldName) which defaults term text to "".
    (DM Smith via Mike McCandless)
  • LUCENE-1333 : Added Token.reinit(*) APIs to re-initialize (reuse) a Token. Also added term() method to return a String, with a performance penalty clearly documented. Also implemented hashCode() and equals() in Token, and fixed all core and contrib analyzers to use the re-use APIs.
    (DM Smith via Mike McCandless)
  • LUCENE-1329 : Add optional readOnly boolean when opening an IndexReader. A readOnly reader is not allowed to make changes (deletions, norms) to the index; in exchange, the isDeleted method, often a bottleneck when searching with many threads, is not synchronized. The default for readOnly is still false, but in 3.0 the default will become true.
    (Jason Rutherglen via Mike McCandless)
  • LUCENE-1367 : Add IndexCommit.isDeleted().
    (Shalin Shekhar Mangar via Mike McCandless)
  • LUCENE-1061 : Factored out all "new XXXQuery(...)" in QueryParser.java into protected methods newXXXQuery(...) so that subclasses can create their own subclasses of each Query type.
    (John Wang via Mike McCandless)
  • LUCENE-753 : Added new Directory implementation org.apache.lucene.store.NIOFSDirectory, which uses java.nio's FileChannel to do file reads. On most non-Windows platforms, with many threads sharing a single searcher, this may yield sizable improvement to query throughput when compared to FSDirectory, which only allows a single thread to read from an open file at a time.
    (Jason Rutherglen via Mike McCandless)
  • LUCENE-1371 : Added convenience method TopDocs Searcher.search(Query query, int n).
    (Mike McCandless)
  • LUCENE-1356 : Allow easy extensions of TopDocCollector by changing its constructor and fields from package-private to protected.
    (Shai Erera via Doron Cohen)
  • LUCENE-1375 : Added convenience method IndexCommit.getTimestamp, which is equivalent to getDirectory().fileModified(getSegmentsFileName()).
    (Mike McCandless)
  • LUCENE-1366 : Rename Field.Index options to be more accurate: TOKENIZED becomes ANALYZED; UN_TOKENIZED becomes NOT_ANALYZED; NO_NORMS becomes NOT_ANALYZED_NO_NORMS and a new ANALYZED_NO_NORMS is added.
    (Mike McCandless)
  • LUCENE-1131 : Added numDeletedDocs method to IndexReader
    (Otis Gospodnetic)
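LUCENE-510 above says that unpaired UTF-16 surrogates are now silently replaced with U+FFFD before strings are written as true UTF-8. The sketch below shows that scrubbing step in plain Java; SurrogateScrubber is a hypothetical helper, not Lucene's own code, and Lucene may perform the replacement inside its UTF-8 encoder rather than as a separate pass.

```java
import java.nio.charset.StandardCharsets;

final class SurrogateScrubber {
    private static final char REPLACEMENT = (char) 0xFFFD; // U+FFFD REPLACEMENT CHARACTER

    // Replaces every unpaired UTF-16 surrogate with U+FFFD, leaving valid pairs intact.
    static String scrub(CharSequence s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    out.append(c).append(s.charAt(i + 1)); // well-formed pair: keep both halves
                    i++;
                } else {
                    out.append(REPLACEMENT); // high surrogate without a following low surrogate
                }
            } else if (Character.isLowSurrogate(c)) {
                out.append(REPLACEMENT); // low surrogate without a preceding high surrogate
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Encodes to standard UTF-8 after scrubbing.
    static byte[] toUtf8(CharSequence s) {
        return scrub(s).getBytes(StandardCharsets.UTF_8);
    }
}
```

The explicit scrub matters because String.getBytes(StandardCharsets.UTF_8) on its own substitutes '?' for malformed input rather than U+FFFD.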
  • Bug fixes (16)
  • LUCENE-1134 : Fixed BooleanQuery.rewrite to only optimize a single clause query if minNumShouldMatch<=0.
    (Shai Erera via Michael Busch)
  • LUCENE-1169 : Fixed bug in IndexSearcher.search(): searching with a filter might miss some hits because scorer.skipTo() is called without checking if the scorer is already at the right position. scorer.skipTo(scorer.doc()) is not a NOOP, it behaves as scorer.next().
    (Eks Dev, Michael Busch)
  • LUCENE-1182 : Added scorePayload to SimilarityDelegator
    (Andi Vajda via Grant Ingersoll)
  • LUCENE-1213 : MultiFieldQueryParser was ignoring slop in case of a single field phrase.
    (Trejkaz via Doron Cohen)
  • LUCENE-1228 : IndexWriter.commit() was not updating the index version and as a result IndexReader.reopen() failed to detect index changes.
    (Doron Cohen)
  • LUCENE-1267 : Added numDocs() and maxDoc() to IndexWriter; deprecated docCount().
    (Mike McCandless)
  • LUCENE-1274 : Added new prepareCommit() method to IndexWriter, which does phase 1 of a 2-phase commit (commit() does phase 2). This is needed when you want to update an index as part of a transaction involving external resources (eg a database). Also deprecated abort(), renaming it to rollback().
    (Mike McCandless)
  • LUCENE-1003 : Stop RussianAnalyzer from removing numbers.
    (TUSUR OpenTeam, Dmitry Lihachev via Otis Gospodnetic)
  • LUCENE-1152 : SpellChecker fix around clearIndex and indexDictionary methods, plus removal of IndexReader reference.
    (Naveen Belkale via Otis Gospodnetic)
  • LUCENE-1046 : Removed dead code in SpellChecker
    (Daniel Naber via Otis Gospodnetic)
  • LUCENE-1189 : Fixed the QueryParser to handle escaped characters within quoted terms correctly.
    (Tomer Gabel via Michael Busch)
  • LUCENE-1299 : Fixed NPE in SpellChecker when IndexReader is not null and field is
    (Grant Ingersoll)
  • LUCENE-1303 : Fixed BoostingTermQuery's explanation to be marked as a Match depending only upon the non-payload score part, regardless of the effect of the payload on the score. Prior to this, score of a query containing a BTQ differed from its explanation.
    (Doron Cohen)
  • LUCENE-1310 : Fixed SloppyPhraseScorer to work also for terms repeating more than twice in the query.
    (Doron Cohen)
  • LUCENE-1351 : ISOLatin1AccentFilter now cleans additional ligatures
    (Cedrik Lime via Grant Ingersoll)
  • LUCENE-1383 : Workaround a nasty "leak" in Java's builtin ThreadLocal, to prevent Lucene from causing unexpected OutOfMemoryError in certain situations (notably J2EE applications).
    (Chris Lu via Mike McCandless)
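The LUCENE-1169 entry hinges on the old skipTo contract it describes: skipTo(scorer.doc()) is not a no-op but behaves like next(). The hypothetical iterator below mimics that contract, and advanceTo shows the guard the fix implies: only call skipTo when the current doc is still behind the target. The names are illustrative, not Lucene's classes.

```java
// Hypothetical iterator mimicking the old contract: skipTo always moves forward at least once.
final class OldStyleDocIterator {
    private final int[] docs;
    private int idx = -1;

    OldStyleDocIterator(int... docs) {
        this.docs = docs;
    }

    int doc() {
        return docs[idx];
    }

    boolean next() {
        return ++idx < docs.length;
    }

    boolean skipTo(int target) {
        do {
            if (!next()) {
                return false;
            }
        } while (doc() < target);
        return true;
    }
}

final class FilteredSearchSketch {
    // Moves `it` to the first doc >= target without skipping a doc it is already on.
    static boolean advanceTo(OldStyleDocIterator it, int target) {
        if (it.doc() >= target) {
            return true; // already positioned; calling skipTo here would act like next()
        }
        return it.skipTo(target);
    }
}
```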
  • New features (20)
  • LUCENE-1137 : Added Token.set/getFlags() accessors for passing more information about a Token through the analysis process. The flag is not indexed/stored and is thus only used by analysis.
  • LUCENE-1147 : Add -segment option to CheckIndex tool so you can check only a specific segment or segments in your index.
    (Mike McCandless)
  • LUCENE-1045 : Reopened this issue to add support for short and bytes.
  • LUCENE-584 : Added new data structures to o.a.l.util, such as OpenBitSet and SortedVIntList. These extend DocIdSet and can directly be used for Filters with the new Filter API. Also changed the core Filters to use OpenBitSet instead of java.util.BitSet.
    (Paul Elschot, Michael Busch)
  • LUCENE-494 : Added QueryAutoStopWordAnalyzer to allow for the automatic removal of frequently occurring terms from a query. This Analyzer is not intended for use during indexing.
    (Mark Harwood via Grant Ingersoll)
  • LUCENE-1044 : Change Lucene to properly "sync" files after committing, to ensure on a machine or OS crash or power cut, even with cached writes, the index remains consistent. Also added explicit commit() method to IndexWriter to force a commit without having to close.
    (Mike McCandless)
  • LUCENE-997 : Add search timeout (partial) support. A TimeLimitedCollector was added to allow limiting search time. It is a partial solution since timeout is checked only when collecting a hit, and therefore a search for rare words in a huge index might not stop within the specified time.
    (Sean Timm via Doron Cohen)
  • LUCENE-1184 : Allow SnapshotDeletionPolicy to be re-used across close/re-open of IndexWriter while still protecting an open snapshot
    (Tim Brennan via Mike McCandless)
  • LUCENE-1194 : Added IndexWriter.deleteDocuments(Query) to delete documents matching the specified query. Also added static unlock and isLocked methods (deprecating the ones in IndexReader).
    (Mike McCandless)
  • LUCENE-1201 : Add IndexReader.getIndexCommit() method.
    (Tim Brennan via Mike McCandless)
  • LUCENE-550 : Added InstantiatedIndex implementation. Experimental Index store similar to MemoryIndex but allows for multiple documents in memory.
    (Karl Wettin via Grant Ingersoll)
  • LUCENE-400 : Added word based n-gram filter (in contrib/analyzers) called ShingleFilter and an Analyzer wrapper that wraps another Analyzer's token stream with a ShingleFilter
    (Sebastian Kirsch, Steve Rowe via Grant Ingersoll)
  • LUCENE-1166 : Decomposition tokenfilter for languages like German and Swedish
    (Thomas Peuss via Grant Ingersoll)
  • LUCENE-1187 : ChainedFilter and BooleanFilter now work with new Filter API and DocIdSetIterator-based filters. Backwards-compatibility with old BitSet-based filters is ensured.
    (Paul Elschot via Michael Busch)
  • LUCENE-1295 : Added new method to MoreLikeThis for retrieving interesting terms and made retrieveTerms(int) public.
    (Grant Ingersoll)
  • LUCENE-1298 : MoreLikeThis can now accept a custom Similarity
    (Grant Ingersoll)
  • LUCENE-1297 : Allow other string distance measures for the SpellChecker
    (Thomas Morton via Otis Gospodnetic)
  • LUCENE-1001 : Provide access to Payloads via Spans. All existing Span Query implementations in Lucene implement this.
    (Mark Miller, Grant Ingersoll)
  • LUCENE-1354 : Provide programmatic access to CheckIndex
    (Grant Ingersoll, Mike McCandless)
  • LUCENE-1279 : Add support for Collators to RangeFilter/Query and Query Parser.
    (Steve Rowe via Grant Ingersoll)
  • Optimizations (6)
  • LUCENE-705 : When building a compound file, use RandomAccessFile.setLength() to tell the OS/filesystem to pre-allocate space for the file. This may improve fragmentation in how the CFS file is stored, and allows us to detect an upcoming disk full situation before actually filling up the disk.
    (Mike McCandless)
  • LUCENE-1120 : Speed up merging of term vectors by bulk-copying the raw bytes for each contiguous range of non-deleted documents.
    (Mike McCandless)
  • LUCENE-1185 : Avoid checking if the TermBuffer 'scratch' in SegmentTermEnum is null for every call of scanTo().
    (Christian Kohlschuetter via Michael Busch)
  • LUCENE-1217 : Internal to Field.java, use isBinary instead of runtime type checking for possible speedup of binaryValue().
    (Eks Dev via Mike McCandless)
  • LUCENE-1183 : Optimized TRStringDistance class (in contrib/spell) that uses less memory than the previous version.
    (Cédrik LIME via Otis Gospodnetic)
  • LUCENE-1195 : Improve term lookup performance by adding a LRU cache to the TermInfosReader. In performance experiments the speedup was about 25% on average on mid-size indexes with ~500,000 documents for queries with 3 terms and about 7% on larger indexes with ~4.3M documents.
    (Michael Busch)
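LUCENE-1195 above adds an LRU cache to TermInfosReader. The sketch below shows one common way to build such a cache in Java, using LinkedHashMap in access order together with removeEldestEntry; it is a generic illustration, and the size limit and key/value types are assumptions rather than what TermInfosReader actually uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCache(int maxSize) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recent end
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; evicts the least recently accessed entry once over capacity.
        return size() > maxSize;
    }
}
```

LinkedHashMap is not thread-safe, so a cache shared by concurrent searches would need external synchronization or one instance per thread.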
  • Documentation (4)
  • LUCENE-1236 : Added some clarifying remarks to EdgeNGram*.java
    (Hiroaki Kawai via Grant Ingersoll)
  • LUCENE-1157 and LUCENE-1256 : HTML changes log, created automatically from CHANGES.txt. This HTML file is currently visible only via developers page.
    (Steven Rowe via Doron Cohen)
  • LUCENE-1349 : Fieldable can now be changed without breaking backward compatibility rules (within reason. See the note at the top of this file and also on Fieldable.java).
    (Grant Ingersoll)
  • LUCENE-1873 : Update documentation to reflect current Contrib area status.
    (Steven Rowe, Mark Miller)
  • Build (3)
  • LUCENE-1153 : Added JUnit JAR to new lib directory. Updated build to rely on local JUnit instead of ANT/lib.
  • LUCENE-1202 : Small fixes to the way Clover is used to work better with contribs. Of particular note: a single clover db is used regardless of whether tests are run globally or in the specific contrib directories.
  • LUCENE-1353 : Javacc target in contrib/miscellaneous for generating the precedence query parser.
  • Test Cases (2)
  • LUCENE-1238 : Fixed intermittent failures of TestTimeLimitedCollector.testTimeoutMultiThreaded. Within this fix, a "greedy" flag was added to TimeLimitedCollector, to allow the wrapped collector to also collect the last doc after the allowed time has passed.
    (Doron Cohen)
  • LUCENE-1348 : Relax TestTimeLimitedCollector so it does not fail on an exceeded timeout merely because the test machine is very busy.
  • LUCENE-1191 : On hitting OutOfMemoryError in any index-modifying methods in IndexWriter, do not commit any further changes to the index to prevent risk of possible corruption.
    (Mike McCandless)
  • LUCENE-1197 : Fixed issue whereby IndexWriter would flush by RAM too early when TermVectors were in use.
    (Mike McCandless)
  • LUCENE-1198 : Don't corrupt index if an exception happens inside DocumentsWriter.init
    (Mike McCandless)
  • LUCENE-1199 : Added defensive check for null indexReader before calling close in IndexModifier.close()
    (Mike McCandless)
  • LUCENE-1200 : Fix rare deadlock case in addIndexes* when ConcurrentMergeScheduler is in use
    (Mike McCandless)
  • LUCENE-1208 : Fix deadlock case on hitting an exception while processing a document that had triggered a flush
    (Mike McCandless)
  • LUCENE-1210 : Fix deadlock case on hitting an exception while starting a merge when using ConcurrentMergeScheduler
    (Mike McCandless)
  • LUCENE-1222 : Fix IndexWriter.doAfterFlush to always be called on flush
    (Mark Ferguson via Mike McCandless)
  • LUCENE-1226 : Fixed IndexWriter.addIndexes(IndexReader[]) to commit successfully created compound files.
    (Michael Busch)
  • LUCENE-1150 : Re-expose StandardTokenizer's constants publicly; this was accidentally lost with LUCENE-966.
    (Nicolas Lalevée via Mike McCandless)
  • LUCENE-1262 : Fixed bug in BufferedIndexReader.refill whereby on hitting an exception in readInternal, the buffer is incorrectly filled with stale bytes such that subsequent calls to readByte() return incorrect results.
    (Trejkaz via Mike McCandless)
  • LUCENE-1270 : Fixed intermittent case where IndexWriter.close() would hang after IndexWriter.addIndexesNoOptimize had been called.
    (Stu Hood via Mike McCandless)
  • Build (1)
  • LUCENE-1230 : Include *pom.xml* in source release files.
    (Michael Busch)
  • Release 2.3.1 [2008-02-22]

  • Bug fixes (7)
  • LUCENE-1168 : Fixed corruption cases when autoCommit=false and documents have mixed term vectors
    (Suresh Guvvala via Mike McCandless)
  • LUCENE-1171 : Fixed some cases where OOM errors could cause deadlock in IndexWriter
    (Mike McCandless)
  • LUCENE-1173 : Fixed corruption case when autoCommit=false and bulk merging of stored fields is used
    (Yonik via Mike McCandless)
  • LUCENE-1163 : Fixed bug in CharArraySet.contains(char[] buffer, int offset, int len) that was ignoring offset and thus giving the wrong answer.
    (Thomas Peuss via Mike McCandless)
  • LUCENE-1177 : Fix rare case where IndexWriter.optimize might do too many merges at the end.
    (Mike McCandless)
  • LUCENE-1176 : Fix corruption case when documents with no term vector fields are added before documents with term vector fields.
    (Mike McCandless)
  • LUCENE-1179 : Fixed assert statement that was incorrectly preventing Fields with empty-string field name from working.
    (Sergey Kabashnyuk via Mike McCandless)
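LUCENE-1163 fixes CharArraySet.contains(char[] buffer, int offset, int len), which ignored offset. The point of that signature is to look up a slice of a larger buffer without allocating a String, so both hashing and comparison must start at offset. The helpers below are a hypothetical sketch of offset-aware hashing and equality, not CharArraySet's actual code; the hash uses the same recurrence as String.hashCode() so results can be checked against it.

```java
final class OffsetAwareChars {
    // Hashes text[offset, offset + len), starting at offset rather than at index 0.
    static int hash(char[] text, int offset, int len) {
        int code = 0;
        final int stop = offset + len;
        for (int i = offset; i < stop; i++) {
            code = code * 31 + text[i];
        }
        return code;
    }

    // Compares the slice text[offset, offset + len) with a stored entry.
    static boolean sliceEquals(char[] text, int offset, int len, char[] stored) {
        if (len != stored.length) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (text[offset + i] != stored[i]) {
                return false;
            }
        }
        return true;
    }
}
```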
  • Release 2.3.0 [2008-01-23]

  • Changes in runtime behavior (2)
  • LUCENE-994 : Defaults for IndexWriter have been changed to maximize out-of-the-box indexing speed. First, IndexWriter now flushes by RAM usage (16 MB by default) instead of a fixed doc count (call IndexWriter.setMaxBufferedDocs to get backwards compatible behavior). Second, ConcurrentMergeScheduler is used to run merges using background threads (call IndexWriter.setMergeScheduler(new SerialMergeScheduler()) to get backwards compatible behavior). Third, merges are chosen based on size in bytes of each segment rather than document count of each segment (call IndexWriter.setMergePolicy(new LogDocMergePolicy()) to get backwards compatible behavior). NOTE: users of ParallelReader must change back all of these defaults in order to ensure the docIDs "align" across all parallel indices.
    (Mike McCandless)

  • LUCENE-1045 : SortField.AUTO didn't work with long. When detecting the field type for sorting automatically, numbers used to be interpreted as int, then as float, if parsing the number as an int failed. Now the detection checks for int, then for long, then for float.
    (Daniel Naber)
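The LUCENE-1045 entry describes the new SortField.AUTO detection order: int, then long, then float. A minimal sketch of that order applied to a single term's text follows; the Kind enum is hypothetical, and falling back to a string sort when no numeric parse succeeds is an assumption about the surrounding behavior rather than something this entry states.

```java
final class AutoSortTypeSketch {
    enum Kind { INT, LONG, FLOAT, STRING }

    // Tries the numeric types in the order described: int, then long, then float.
    static Kind detect(String term) {
        try {
            Integer.parseInt(term);
            return Kind.INT;
        } catch (NumberFormatException notInt) {
            // fall through to the next candidate type
        }
        try {
            Long.parseLong(term);
            return Kind.LONG;
        } catch (NumberFormatException notLong) {
            // fall through to the next candidate type
        }
        try {
            Float.parseFloat(term);
            return Kind.FLOAT;
        } catch (NumberFormatException notFloat) {
            // not numeric at all
        }
        return Kind.STRING; // assumed fallback
    }
}
```

A value such as 3000000000 is now detected as a long instead of being mis-detected as a float.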
  • API Changes (14)
  • LUCENE-843 : Added IndexWriter.setRAMBufferSizeMB(...) to have IndexWriter flush whenever the buffered documents are using more than the specified amount of RAM. Also added new APIs to Token that allow one to set a char[] plus offset and length to specify a token (to avoid creating a new String() for each Token).
    (Mike McCandless)
  • LUCENE-963 : Add setters to Field to allow for re-using a single Field instance during indexing. This is a sizable performance gain, especially for small documents.
    (Mike McCandless)
  • LUCENE-969 : Add new APIs to Token, TokenStream and Analyzer to permit re-using of Token and TokenStream instances during indexing. Changed Token to use a char[] as the store for the termText instead of String. This gives faster tokenization performance (~10-15%).
    (Mike McCandless)
  • LUCENE-847 : Factored MergePolicy, which determines which merges should take place and when, as well as MergeScheduler, which determines when the selected merges should actually run, out of IndexWriter. The default merge policy is now LogByteSizeMergePolicy (see LUCENE-845) and the default merge scheduler is now ConcurrentMergeScheduler (see LUCENE-870).
    (Steven Parkes via Mike McCandless)
  • LUCENE-1052 : Add IndexReader.setTermInfosIndexDivisor(int) method that allows you to reduce memory usage of the termInfos by further sub-sampling (over the termIndexInterval that was used during indexing) which terms are loaded into memory.
    (Chuck Williams, Doug Cutting via Mike McCandless)
  • LUCENE-743 : Add IndexReader.reopen() method that re-opens an existing IndexReader (see New features).
    (Michael Busch)
  • LUCENE-1062 : Add setData(byte[] data), setData(byte[] data, int offset, int length), getData(), getOffset() and clone() methods to o.a.l.index.Payload. Also add the field name as arg to Similarity.scorePayload().
    (Michael Busch)
  • LUCENE-982 : Add IndexWriter.optimize(int maxNumSegments) method to "partially optimize" an index down to maxNumSegments segments.
    (Mike McCandless)
  • LUCENE-1080 : Changed Token.DEFAULT_TYPE to be public.
  • LUCENE-1064 : Changed TopDocs constructor to be public.
    (Shai Erera via Michael Busch)
  • LUCENE-1079 : DocValues cleanup: constructor now has no params, and getInnerArray() now throws UnsupportedOperationException
    (Doron Cohen)
  • LUCENE-1089 : Added PriorityQueue.insertWithOverflow, which returns the Object (if any) that was bumped from the queue to allow re-use.
    (Shai Erera via Mike McCandless)
  • LUCENE-1101 : Token reuse 'contract' (defined in LUCENE-969) modified so it is the token producer's responsibility to call Token.clear().
    (Doron Cohen)
  • LUCENE-1118 : Changed StandardAnalyzer to skip too-long (default > 255 characters) tokens. You can increase this limit by calling StandardAnalyzer.setMaxTokenLength(...).
    (Michael McCandless)
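The insertWithOverflow contract from LUCENE-1089 can be illustrated with a small, self-contained sketch. The class name BoundedMinQueue is hypothetical, and it is built on java.util.PriorityQueue rather than Lucene's own PriorityQueue class; it assumes the queue keeps the maxSize largest elements seen so far and hands back whatever it drops so the caller can reuse that object.

```java
import java.util.PriorityQueue;

// Hypothetical illustration of the LUCENE-1089 contract; not Lucene's own
// org.apache.lucene.util.PriorityQueue implementation.
final class BoundedMinQueue<T extends Comparable<T>> {
    private final PriorityQueue<T> heap = new PriorityQueue<>(); // least element on top
    private final int maxSize;

    BoundedMinQueue(int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
    }

    // Returns null if the element was added without overflow, the evicted
    // least element if the new one displaced it, or the element itself if it
    // was not competitive enough to enter the full queue.
    T insertWithOverflow(T element) {
        if (heap.size() < maxSize) {
            heap.add(element);
            return null;
        }
        T least = heap.peek();
        if (element.compareTo(least) > 0) {
            heap.poll();
            heap.add(element);
            return least; // the caller may recycle this object
        }
        return element;
    }

    T top() { return heap.peek(); }     // current least element
    int size() { return heap.size(); }
}
```

Returning the dropped object lets a hit collector refill it with the next candidate's data instead of allocating a fresh object for every hit.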
  • Bug fixes (28)
  • LUCENE-933 : QueryParser fixed to not produce empty sub BooleanQueries "()" even if the Analyzer produced no tokens for input.
    (Doron Cohen)
  • LUCENE-955 : Fixed SegmentTermPositions to work correctly with the first term in the dictionary.
    (Michael Busch)
  • LUCENE-951 : Fixed NullPointerException in MultiLevelSkipListReader that was thrown after a call to TermPositions.seek().
    (Rich Johnson via Michael Busch)
  • LUCENE-938 : Fixed cases where an unhandled exception in IndexWriter's methods could cause deletes to be lost.
    (Steven Parkes via Mike McCandless)
  • LUCENE-962 : Fixed case where an unhandled exception in IndexWriter.addDocument or IndexWriter.updateDocument could cause unreferenced files in the index to not be deleted
    (Steven Parkes via Mike McCandless)
  • LUCENE-957 : RAMDirectory fixed to properly handle directories larger than Integer.MAX_VALUE bytes.
    (Doron Cohen)
  • LUCENE-781 : MultiReader fixed to not throw NPE if isCurrent(), isOptimized() or getVersion() is called. Separated MultiReader into two classes: MultiSegmentReader extends IndexReader, is package-protected and is created automatically by IndexReader.open() in case the index has multiple segments. The public MultiReader now extends MultiSegmentReader and is intended to be used by users who want to add their own subreaders.
    (Daniel Naber, Michael Busch)
  • LUCENE-970 : FilterIndexReader now implements isOptimized(). Previously, calling isOptimized() would throw an NPE.
    (Michael Busch)
  • LUCENE-832 : ParallelReader fixed to not throw NPE if isCurrent(), isOptimized() or getVersion() is called.
    (Michael Busch)
  • LUCENE-948 : Fix FileNotFoundException caused by stale NFS client directory listing caches when writers on different machines share an index over NFS and use a custom deletion policy.
    (Mike McCandless)
  • LUCENE-978 : Ensure TermInfosReader, FieldsReader, and FieldsReader close any streams they had opened if an exception is hit in the constructor.
    (Ning Li via Mike McCandless)
  • LUCENE-985 : If an extremely long term is in a doc (> 16383 chars), we now throw an IllegalArgumentException saying the term is too long, instead of cryptic ArrayIndexOutOfBoundsException.
    (Karl Wettin via Mike McCandless)
  • LUCENE-991 : The explain() method of BoostingTermQuery had errors when no payloads were present on a document.
    (Peter Keegan via Grant Ingersoll)
  • LUCENE-992 : Fixed IndexWriter.updateDocument to be atomic again (this was broken by LUCENE-843 ).
    (Ning Li via Mike McCandless)
  • LUCENE-1008 : Fixed corruption case when document with no term vector fields is added after documents with term vector fields. This bug was introduced with LUCENE-843 .
    (Grant Ingersoll via Mike McCandless)
  • LUCENE-1006 : Fixed QueryParser to accept a "" field value (zero length quoted string.)
    (yonik)
  • LUCENE-1010 : Fixed corruption case when document with no term vector fields is added after documents with term vector fields. This case is hit during merge and would cause an EOFException. This bug was introduced with LUCENE-984 .
    (Andi Vajda via Mike McCandless)
  • LUCENE-1009 : Fix merge slowdown with LogByteSizeMergePolicy when autoCommit=false and documents are using stored fields and/or term vectors.
    (Mark Miller via Mike McCandless)
  • LUCENE-1011 : Fixed corruption case when two or more machines, sharing an index over NFS, can be writers in quick succession.
    (Patrick Kimber via Mike McCandless)
  • LUCENE-1028 : Fixed Weight serialization for a few queries: DisjunctionMaxQuery, ValueSourceQuery, CustomScoreQuery. Serialization check added for all queries.
    (Kyle Maxwell via Doron Cohen)
  • LUCENE-1048 : Fixed incorrect behavior in Lock.obtain(...) when the timeout argument is very large (e.g. Long.MAX_VALUE). Also added the Lock.LOCK_OBTAIN_WAIT_FOREVER constant to never time out.
    (Nikolay Diakov via Mike McCandless)
  • LUCENE-1050 : Throw LockReleaseFailedException in Simple/NativeFSLockFactory if we fail to delete the lock file when releasing the lock.
    (Nikolay Diakov via Mike McCandless)
  • LUCENE-1071 : Fixed SegmentMerger to correctly set payload bit in the merged segment.
    (Michael Busch)
  • LUCENE-1042 : Remove throwing of IOException in getTermFreqVector(int, String, TermVectorMapper) to be consistent with other getTermFreqVector calls. Also removed the throwing of the other IOException in that method to be consistent.
    (Karl Wettin via Grant Ingersoll)
  • LUCENE-1096 : Fixed Hits behavior when hits' docs are deleted while iterating over the hits. Deleting docs already retrieved now works seamlessly. If docs not yet retrieved are deleted (e.g. from another thread), and then, relying on the initial Hits.length(), an application attempts to retrieve more hits than actually exist, a ConcurrentModificationException is thrown.
    (Doron Cohen)
  • LUCENE-1068 : Changed StandardTokenizer to fix an issue with it marking the type of some tokens incorrectly. This is done by adding a new flag named replaceInvalidAcronym which defaults to false, the current, incorrect behavior. Setting this flag to true fixes the problem. This flag is a temporary fix and is already marked as being deprecated. 3.x will implement the correct approach.
    (Shai Erera via Grant Ingersoll)
  • LUCENE-1140 : Fixed NPE caused by LUCENE-1068.
    (Alexei Dets via Grant Ingersoll)
  • LUCENE-749 : ChainedFilter behavior fixed when logic of first filter is ANDNOT.
    (Antonio Bruno via Doron Cohen)
  • LUCENE-508 : Make sure SegmentTermEnum.prev() is accurate (= last term) after next() returns false.
    (Steven Tamm via Mike McCandless)
  • New features (14)
  • LUCENE-906 : Elision filter for French.
    (Mathieu Lecarme via Otis Gospodnetic)
  • LUCENE-960 : Added a SpanQueryFilter and related classes to allow for not only filtering, but knowing where in a Document a Filter matches
    (Grant Ingersoll)
  • LUCENE-868 : Added new Term Vector access features. A new callback mechanism allows applications to define how and where to read Term Vectors from disk. This implementation contains several extensions of the new abstract TermVectorMapper class. The new API should be back-compatible. No changes in the actual storage of Term Vectors have taken place.
  • LUCENE-1038 : Added setDocumentNumber() method to TermVectorMapper to provide information about what document is being accessed.
    (Karl Wettin via Grant Ingersoll)
  • LUCENE-975 : Added PositionBasedTermVectorMapper that allows for position based lookup of term vector information. See LUCENE-868 above.
  • LUCENE-1011 : Added simple tools (all in org.apache.lucene.store) to verify that locking is working properly. LockVerifyServer runs a separate server to verify locks. LockStressTest runs a simple tool that rapidly obtains and releases locks. VerifyingLockFactory is a LockFactory that wraps any other LockFactory and consults the LockVerifyServer whenever a lock is obtained or released, throwing an exception if an illegal lock obtain occurred.
    (Patrick Kimber via Mike McCandless)
  • LUCENE-1015 : Added FieldCache extension (ExtendedFieldCache) to support doubles and longs. Added support into SortField for sorting on doubles and longs as well.
    (Grant Ingersoll)
  • LUCENE-1020 : Created basic index checking & repair tool (o.a.l.index.CheckIndex). When run without -fix it does a detailed test of all segments in the index and reports summary information and any errors it hit. With -fix it will remove segments that had errors.
    (Mike McCandless)
  • LUCENE-743 : Add IndexReader.reopen() method that re-opens an existing IndexReader by only loading those portions of an index that have changed since the reader was (re)opened. reopen() can be significantly faster than open(), depending on the amount of index changes. SegmentReader, MultiSegmentReader, MultiReader, and ParallelReader implement reopen().
    (Michael Busch)
  • LUCENE-1040 : Added CharArraySet, useful for efficiently checking set membership of text specified by char[].
    (yonik)
  • LUCENE-1073 : Created SnapshotDeletionPolicy to facilitate taking a live backup of an index without pausing indexing.
    (Mike McCandless)
  • LUCENE-1019 : CustomScoreQuery enhanced to support multiple ValueSource queries.
    (Kyle Maxwell via Doron Cohen)
  • LUCENE-1095 : Added an option to StopFilter to increase positionIncrement of the token succeeding a stopped token. Disabled by default. Similar option added to QueryParser to consider token positions when creating PhraseQuery and MultiPhraseQuery. Disabled by default (so by default the query parser ignores position increments).
    (Doron Cohen)
  • LUCENE-1380 : Added TokenFilter for setting position increment in special cases related to the ShingleFilter
    (Mck SembWever, Steve Rowe, Karl Wettin via Grant Ingersoll)
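The position-increment option described in LUCENE-1095 can be sketched without any Lucene classes. The names StopPositionSketch, Tok and filter are hypothetical; the sketch only assumes that each removed stop word widens the gap before the next kept token by one position when the option is enabled, and that the gap is closed when it is disabled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the LUCENE-1095 idea; not Lucene's StopFilter API.
final class StopPositionSketch {
    // An emitted token and its position increment relative to the previous
    // emitted token (1 means "directly adjacent").
    static final class Tok {
        final String text;
        final int posInc;
        Tok(String text, int posInc) { this.text = text; this.posInc = posInc; }
    }

    static List<Tok> filter(List<String> tokens, Set<String> stopWords,
                            boolean enablePositionIncrements) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0; // stop words removed since the last emitted token
        for (String t : tokens) {
            if (stopWords.contains(t)) {
                skipped++;
                continue;
            }
            // Option on: each removed stop word adds one to the gap.
            // Option off (the default): the gap is silently closed.
            out.add(new Tok(t, enablePositionIncrements ? 1 + skipped : 1));
            skipped = 0;
        }
        return out;
    }
}
```

Keeping the gaps lets a phrase query built with matching increments avoid matching across removed words, which is why the QueryParser gained a corresponding option.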
  • Optimizations (14)
  • LUCENE-937 : CachingTokenFilter now uses an iterator to access the Tokens that are cached in the LinkedList. This increases performance significantly, especially when the number of Tokens is large.
    (Mark Miller via Michael Busch)
  • LUCENE-843 : Substantial optimizations to improve how IndexWriter uses RAM for buffering documents and to speed up indexing (2X-8X faster). A single shared hash table now records the in-memory postings per unique term and is directly flushed into a single segment.
    (Mike McCandless)
  • LUCENE-892 : Fixed extra "buffer to buffer copy" that sometimes takes place when using compound files.
    (Mike McCandless)
  • LUCENE-959 : Remove synchronization in Document
    (yonik)
  • LUCENE-963 : Add setters to Field to allow for re-using a single Field instance during indexing. This is a sizable performance gain, especially for small documents.
    (Mike McCandless)
  • LUCENE-939 : Check explicitly for boundary conditions in FieldInfos and don't rely on exceptions.
    (Michael Busch)
  • LUCENE-966 : Very substantial speedups (~6X faster) for StandardTokenizer (StandardAnalyzer) by using JFlex instead of JavaCC to generate the tokenizer.
    (Stanislaw Osinski via Mike McCandless)
  • LUCENE-969 : Changed core tokenizers & filters to re-use Token and TokenStream instances when possible to improve tokenization performance (~10-15%).
    (Mike McCandless)
  • LUCENE-871 : Speedup ISOLatin1AccentFilter
    (Ian Boston via Mike McCandless)
  • LUCENE-986 : Refactored SegmentInfos from IndexReader into the new subclass DirectoryIndexReader. SegmentReader and MultiSegmentReader now extend DirectoryIndexReader and are the only IndexReader implementations that use SegmentInfos to access an index and acquire a write lock for index modifications.
    (Michael Busch)
  • LUCENE-1007 : Allow flushing in IndexWriter to be triggered by either RAM usage or document count or both (whichever comes first), by adding symbolic constant DISABLE_AUTO_FLUSH to disable one of the flush triggers.
    (Ning Li via Mike McCandless)
  • LUCENE-1043 : Speed up merging of stored fields by bulk-copying the raw bytes for each contiguous range of non-deleted documents.
    (Robert Engels via Mike McCandless)
  • LUCENE-693 : Speed up nested conjunctions (~2x) that match many documents, and a slight performance increase for top level conjunctions.
    (yonik)
  • LUCENE-1098 : Make inner class StandardAnalyzer.SavedStreams static and final.
    (Nathan Beyer via Michael Busch)
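The bulk-copy optimization in LUCENE-1043 depends on finding each contiguous range of non-deleted documents. A minimal sketch of that range computation follows, assuming deletions are represented as a java.util.BitSet; the class LiveDocRuns and its method are hypothetical and are not Lucene's SegmentMerger code.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper illustrating the LUCENE-1043 idea; not Lucene code.
final class LiveDocRuns {
    // Returns {start, endExclusive} pairs, one per maximal run of
    // non-deleted document numbers in [0, maxDoc).
    static List<int[]> runs(BitSet deleted, int maxDoc) {
        List<int[]> out = new ArrayList<>();
        int doc = deleted.nextClearBit(0); // first live doc
        while (doc < maxDoc) {
            int next = deleted.nextSetBit(doc); // first deleted doc at or after doc
            int end = (next < 0 || next > maxDoc) ? maxDoc : next;
            out.add(new int[] {doc, end});
            doc = deleted.nextClearBit(end); // skip over the deleted block
        }
        return out;
    }
}
```

Each returned range could then be copied with a single raw byte copy rather than decoding and re-encoding its documents one by one.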
  • Documentation (2)
  • LUCENE-1051 : Generate separate javadocs for core, demo and contrib classes, as well as a unified view. Also add an appropriate menu structure to the website.
    (Michael Busch)
  • LUCENE-746 : Fix error message in AnalyzingQueryParser.getPrefixQuery.
    (Ronnie Kolehmainen via Michael Busch)
  • Build (8)
  • LUCENE-908 : Improvements and simplifications for how the MANIFEST file and the META-INF dir are created.
    (Michael Busch)
  • LUCENE-935 : Various improvements for the maven artifacts. Now the artifacts also include the sources as .jar files.
    (Michael Busch)
  • Added apply-patch target to top-level build. Defaults to looking for a patch in ${basedir}/../patches with name specified by -Dpatch.name. Can also specify any location by -Dpatch.file property on the command line. This should be helpful for easy application of patches, but it is also a step towards integrating automatic patch application with JIRA and Hudson, and is thus subject to change.
    (Grant Ingersoll)
  • LUCENE-935 : Defined property "m2.repository.url" to allow setting the URL of a remote Maven repository to deploy to.
    (Michael Busch)
  • LUCENE-1051 : Include javadocs in the maven artifacts.
    (Michael Busch)
  • LUCENE-1055 : Remove gdata-server from build files and its sources from trunk.
    (Michael Busch)
  • LUCENE-935 : Allow deploying Maven artifacts to a remote m2 repository via scp and ssh authentication.
    (Michael Busch)
  • LUCENE-1123 : Allow overriding the specification version for MANIFEST.MF
    (Michael Busch)
  • Test Cases (1)
  • LUCENE-766 : Test adding two fields with the same name but different term vector setting.
    (Nicolas Lalevée via Doron Cohen)
  • Release 2.2.0 [2007-06-19]

  • Changes in runtime behavior (none)
  • API Changes (13)
  • LUCENE-793 : created new exceptions and added them to throws clause for many methods (all subclasses of IOException for backwards compatibility): index.StaleReaderException, index.CorruptIndexException, store.LockObtainFailedException. This was done to better call out the possible root causes of an IOException from these methods.
    (Mike McCandless)
  • LUCENE-811 : make SegmentInfos class, plus a few methods from related classes, package-private again (they were unnecessarily made public as part of LUCENE-701 ).
    (Mike McCandless)
  • LUCENE-710 : added optional autoCommit boolean to IndexWriter constructors. When this is false, index changes are not committed until the writer is closed. This gives explicit control over when a reader will see the changes. Also added optional custom deletion policy to explicitly control when prior commits are removed from the index. This is intended to allow applications to share an index over NFS by customizing when prior commits are deleted.
    (Mike McCandless)
  • LUCENE-818 : changed most public methods of IndexWriter, IndexReader (and its subclasses), FieldsReader and RAMDirectory to throw AlreadyClosedException if they are accessed after being closed.
    (Mike McCandless)
  • LUCENE-834 : Changed some access levels for certain Span classes to allow them to be overridden. They have been marked expert only and not for public consumption.
    (Grant Ingersoll)
  • LUCENE-796 : Removed calls to super.* from various get*Query methods in MultiFieldQueryParser, in order to allow sub-classes to override them.
    (Steven Parkes via Otis Gospodnetic)
  • LUCENE-857 : Removed caching from QueryFilter and deprecated QueryFilter in favour of QueryWrapperFilter or QueryWrapperFilter + CachingWrapperFilter combination when caching is desired.
    (Chris Hostetter, Otis Gospodnetic)
  • LUCENE-869 : Changed FSIndexInput and FSIndexOutput to inner classes of FSDirectory to enable extensibility of these classes.
    (Michael Busch)
  • LUCENE-580 : Added the public method reset() to TokenStream. This method does nothing by default, but may be overridden by subclasses to support consuming the TokenStream more than once.
    (Michael Busch)
  • LUCENE-580 : Added a new constructor to Field that takes a TokenStream as argument, available as tokenStreamValue(). This is useful to avoid the need for "dummy analyzers" for pre-analyzed fields.
    (Karl Wettin, Michael Busch)
  • LUCENE-730 : Added the new methods setAllowDocsOutOfOrder() and getAllowDocsOutOfOrder() to BooleanQuery. Deprecated the methods setUseScorer14() and getUseScorer14(). The optimization patch LUCENE-730 (see the LUCENE-730 entry under Optimizations) improves performance for certain queries but results in scoring out of docid order. This patch reverses that change, so by default hit docs are now scored in docid order unless setAllowDocsOutOfOrder(true) is explicitly called. This patch also re-enables the tests in QueryUtils that check for docid order.
    (Paul Elschot, Doron Cohen, Michael Busch)
  • LUCENE-888 : Added Directory.openInput(File path, int bufferSize) to optionally specify the size of the read buffer. Also added BufferedIndexInput.setBufferSize(int) to change the buffer size.
    (Mike McCandless)
  • LUCENE-923 : Make SegmentTermPositionVector package-private. It does not need to be public because it implements the public interface TermPositionVector.
    (Michael Busch)
  • Bug fixes (24)
  • LUCENE-804 : Fixed build.xml to pack a fully compilable src dist.
    (Doron Cohen)
  • LUCENE-813 : Leading wildcard fixed to work with trailing wildcard. Query parser modified to create a prefix query only for the case that there is a single trailing wildcard (and no additional wildcard or '?' in the query text).
    (Doron Cohen)
  • LUCENE-812 : Add no-argument constructors to NativeFSLockFactory and SimpleFSLockFactory. This enables all 4 builtin LockFactory implementations to be specified via the System property org.apache.lucene.store.FSDirectoryLockFactoryClass.
    (Mike McCandless)
  • LUCENE-821 : The new single-norm-file introduced by LUCENE-756 failed to reduce the number of open descriptors since it was still opened once per field with norms.
    (yonik)
  • LUCENE-823 : Make sure internal file handles are closed when hitting an exception (eg disk full) while flushing deletes in IndexWriter's mergeSegments, and also during IndexWriter.addIndexes.
    (Mike McCandless)
  • LUCENE-825 : If directory is removed after FSDirectory.getDirectory() but before IndexReader.open you now get a FileNotFoundException like Lucene pre-2.1 (before this fix you got an NPE).
    (Mike McCandless)
  • LUCENE-800 : Removed backslash from the TERM_CHAR list in the queryparser, because the backslash is the escape character. Also changed the ESCAPED_CHAR list to contain all possible characters, because every character that follows a backslash should be considered as escaped.
    (Michael Busch)
  • LUCENE-372 : QueryParser.parse() now ensures that the entire input string is consumed. Now a ParseException is thrown if a query contains too many closing parentheses.
    (Andreas Neumann via Michael Busch)
  • LUCENE-814 : javacc build targets now fix line-end-style of generated files. Now also deleting all javacc generated files before calling javacc.
    (Steven Parkes, Doron Cohen)
  • LUCENE-829 : close readers in contrib/benchmark.
    (Karl Wettin, Doron Cohen)
  • LUCENE-828 : Minor fix for Term's equals().
    (Paul Cowan via Otis Gospodnetic)
  • LUCENE-846 : Fixed: if IndexWriter is opened with autoCommit=false, and you call addIndexes, and hit an exception (e.g. disk full), then when IndexWriter rolls back its internal state this could corrupt the instance of IndexWriter (but not the index itself) by referencing already deleted segments. This bug was only present in 2.2 (trunk), i.e. it was never released.
    (Mike McCandless)
  • LUCENE-736 : Sloppy phrase query with repeating terms matches wrong docs. For example query "B C B"~2 matches the doc "A B C D E".
    (Doron Cohen)
  • LUCENE-789 : Fixed: custom similarity is ignored when using MultiSearcher (problem reported by Alexey Lef). Now the similarity applied by MultiSearcher.setSimilarity(sim) is used. Note that, as before this fix, creating a MultiSearcher from Searchers for which a custom similarity was set has no effect: it is masked by the similarity of the MultiSearcher. This is as designed, because MultiSearcher operates on Searchables (not Searchers).
    (Doron Cohen)
  • LUCENE-880 : Fixed DocumentWriter to close the TokenStreams after it has written the postings. Then the resources associated with the TokenStreams can safely be released.
    (Michael Busch)
  • LUCENE-883 : consecutive calls to Spellchecker.indexDictionary() won't insert terms twice anymore.
    (Daniel Naber)
  • LUCENE-881 : QueryParser.escape() now also escapes the characters '|' and '&' which are part of the queryparser syntax.
    (Michael Busch)
  • LUCENE-886 : Spellchecker clean up: exceptions are no longer printed to STDERR and ignored; they are re-thrown instead. Some javadoc improvements.
    (Daniel Naber)
  • LUCENE-698 : FilteredQuery now takes the query boost into account for scoring.
    (Michael Busch)
  • LUCENE-763 : Spellchecker: LuceneDictionary used to skip first word in enumeration.
    (Christian Mallwitz via Daniel Naber)
  • LUCENE-903 : FilteredQuery explanation inaccuracy with boost. Explanation tests now "deep" check the explanation details.
    (Chris Hostetter, Doron Cohen)
  • LUCENE-912 : DisjunctionMaxScorer first skipTo(target) call ignores the skip target param and ends up at the first match.
    (Sudaakeran B. via Chris Hostetter & Doron Cohen)
  • LUCENE-913 : Two consecutive score() calls return different scores for Boolean Queries.
    (Michael Busch, Doron Cohen)
  • LUCENE-1013 : Fix IndexWriter.setMaxMergeDocs to work "out of the box", again, by moving set/getMaxMergeDocs up from LogDocMergePolicy into LogMergePolicy. This fixes the API breakage (non backwards compatible change) caused by LUCENE-994 .
    (Yonik Seeley via Mike McCandless)
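The query-parser rule from LUCENE-813 (build a prefix query only for a single trailing wildcard, with no other '*' or '?') can be expressed as a small classifier. This is a hypothetical sketch: the names WildcardClassifier and Kind are illustrative, and it deliberately ignores backslash escaping, lowercasing and analysis, all of which the real QueryParser handles.

```java
// Hypothetical classifier mirroring the rule described in LUCENE-813;
// the real QueryParser also handles escaping, lowercasing and analysis.
final class WildcardClassifier {
    enum Kind { TERM, PREFIX, WILDCARD }

    static Kind classify(String text) {
        int firstStar = text.indexOf('*');
        boolean hasQuestion = text.indexOf('?') >= 0;
        if (firstStar < 0 && !hasQuestion) {
            return Kind.TERM; // no wildcard characters at all
        }
        // Prefix only if the first '*' is also the last character (so there
        // is exactly one '*'), it has a non-empty prefix, and there is no '?'.
        boolean singleTrailingStar =
            !hasQuestion && firstStar > 0 && firstStar == text.length() - 1;
        return singleTrailingStar ? Kind.PREFIX : Kind.WILDCARD;
    }
}
```

Under this rule a leading wildcard combined with a trailing one, as in "*foo*", falls through to a general wildcard query instead of being mistaken for a prefix query.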
  • New features (9)
  • LUCENE-759 : Added two n-gram-producing TokenFilters.
    (Otis Gospodnetic)
  • LUCENE-822 : Added FieldSelector capabilities to Searchable for use with RemoteSearcher, and other Searchable implementations.
    (Mark Miller, Grant Ingersoll)
  • LUCENE-755 : Added the ability to store arbitrary binary metadata in the posting list. These metadata are called Payloads. For every position of a Token one Payload in the form of a variable length byte array can be stored in the prox file. Remark: The APIs introduced with this feature are in experimental state and thus contain appropriate warnings in the javadocs.
    (Michael Busch)
  • LUCENE-834 : Added BoostingTermQuery which can boost scores based on the values of a payload (see LUCENE-755 above).
    (Grant Ingersoll)
  • LUCENE-834 : Similarity has a new method for scoring payloads called scorePayloads that can be overridden to take advantage of payload storage (see LUCENE-755 above).
  • LUCENE-834 : Added isPayloadAvailable() onto TermPositions interface and implemented it in the appropriate places
    (Grant Ingersoll)
  • LUCENE-853 : Added RemoteCachingWrapperFilter to enable caching of Filters on the remote side of the RMI connection.
    (Matt Ericson via Otis Gospodnetic)
  • LUCENE-446 : Added Solr's search.function for scores based on field values, plus CustomScoreQuery for simple score (post) customization.
    (Yonik Seeley, Doron Cohen)
  • LUCENE-1058 : Added new TeeTokenFilter (like the UNIX 'tee' command) and SinkTokenizer which can be used to share tokens between two or more Fields such that the other Fields do not have to go through the whole Analysis process over again. For instance, if you have two Fields that share all the same analysis steps except one lowercases tokens and the other does not, you can coordinate the operations between the two using the TeeTokenFilter and the SinkTokenizer. See TeeSinkTokenTest.java for examples.
    (Grant Ingersoll, Michael Busch, Yonik Seeley)
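The n-gram TokenFilters added by LUCENE-759 split text into character n-grams. A minimal sketch of the two flavours, plain n-grams and front edge n-grams, follows. The class NGramSketch is hypothetical and is not the API of Lucene's NGramTokenFilter or EdgeNGramTokenFilter; in particular, the output order chosen here (grouped by gram size) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of n-gram and edge n-gram generation (LUCENE-759);
// not the API of Lucene's NGramTokenFilter/EdgeNGramTokenFilter.
final class NGramSketch {
    private static void check(int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
        }
    }

    // Every substring whose length lies in [minGram, maxGram], grouped by
    // gram size and ordered by start offset within each size.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        check(minGram, maxGram);
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, text.length()); n++) {
            for (int start = 0; start + n <= text.length(); start++) {
                out.add(text.substring(start, start + n));
            }
        }
        return out;
    }

    // Prefixes ("front" edge n-grams) whose length lies in [minGram, maxGram].
    static List<String> frontEdgeNGrams(String text, int minGram, int maxGram) {
        check(minGram, maxGram);
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, text.length()); n++) {
            out.add(text.substring(0, n));
        }
        return out;
    }
}
```

Front edge n-grams are the usual basis for prefix-style suggestions, while plain n-grams support fuzzy substring matching such as the Spellchecker's n-gram index.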
  • Optimizations (7)
  • LUCENE-761 : The proxStream is now cloned lazily in SegmentTermPositions when nextPosition() is called for the first time. This allows using instances of SegmentTermPositions instead of SegmentTermDocs without additional costs.
    (Michael Busch)
  • LUCENE-431 : RAMInputStream and RAMOutputStream extend IndexInput and IndexOutput directly now. This avoids further buffering and thus avoids unnecessary array copies.
    (Michael Busch)
  • LUCENE-730 : Updated BooleanScorer2 to make use of BooleanScorer in some cases and possibly improve scoring performance. Documents can now be delivered out-of-order as they are scored (e.g. to HitCollector). N.B. A bit of code had to be disabled in QueryUtils in order for TestBoolean2 test to keep passing.
    (Paul Elschot via Otis Gospodnetic)
  • LUCENE-882 : Spellchecker doesn't store the ngrams anymore but only indexes them to keep the spell index small.
    (Daniel Naber)
  • LUCENE-430 : Delay allocation of the buffer after a clone of BufferedIndexInput. Together with LUCENE-888 this will allow the buffer size to be adjusted dynamically.
    (Paul Elschot, Michael Busch)
  • LUCENE-888 : Increase buffer sizes inside CompoundFileWriter and BufferedIndexOutput. Also increase buffer size in BufferedIndexInput, but only when used during merging. Together, these increases yield 10-18% overall performance gain vs the previous 1K defaults.
    (Mike McCandless)
  • LUCENE-866 : Adds multi-level skip lists to the posting lists. This speeds up most queries that use skipTo(), especially on big indexes with large posting lists. For average AND queries the speedup is about 20%, for queries that contain very frequent and very unique terms the speedup can be over 80%.
    (Michael Busch)
  • Documentation (6)
  • LUCENE-791 && INFRA-1173 : Infrastructure moved the Wiki to http://wiki.apache.org/lucene-java/ and I updated the links in the docs and wherever else I found references.
    (Grant Ingersoll, Joe Schaefer)
  • LUCENE-807 : Fixed the javadoc for ScoreDocComparator.compare() to be consistent with java.util.Comparator.compare(): Any integer is allowed to be returned instead of only -1/0/1.
    (Paul Cowan via Michael Busch)
  • LUCENE-875 : Solved javadoc warnings & errors under jdk1.4. Solved javadoc errors under jdk5 (jars in path for gdata). Made "javadocs" target depend on "build-contrib" for first downloading contrib jars configured for dynamic download. (Note: when running behind a firewall, a firewall prompt might pop up.)
    (Doron Cohen)
  • LUCENE-740 : Added SNOWBALL-LICENSE.txt to the snowball package and a remark about the license to NOTICE.TXT.
    (Steven Parkes via Michael Busch)
  • LUCENE-925 : Added analysis package javadocs.
    (Grant Ingersoll and Doron Cohen)
  • LUCENE-926 : Added document package javadocs.
    (Grant Ingersoll)
  • Build (10)
  • LUCENE-802 : Added LICENSE.TXT and NOTICE.TXT to Lucene jars.
    (Steven Parkes via Michael Busch)
  • LUCENE-885 : "ant test" now includes all contrib tests. The new "ant test-core" target can be used to run only the Core (non contrib) tests.
    (Chris Hostetter)
  • LUCENE-900 : "ant test" now enables Java assertions (in Lucene packages).
    (Doron Cohen)
  • LUCENE-894 : Add custom build file for binary distributions that includes targets to build the demos.
    (Chris Hostetter, Michael Busch)
  • LUCENE-904 : The "package" targets in build.xml now also generate .md5 checksum files.
    (Chris Hostetter, Michael Busch)
  • LUCENE-907 : Include LICENSE.TXT and NOTICE.TXT in the META-INF dirs of demo war, demo jar, and the contrib jars.
    (Michael Busch)
  • LUCENE-909 : Demo targets for running the demo.
    (Doron Cohen)
  • LUCENE-908 : Improves content of MANIFEST file and makes it customizable for the contribs. Adds SNOWBALL-LICENSE.txt to META-INF of the snowball jar and makes sure that the lucli jar contains LICENSE.txt and NOTICE.txt.
    (Chris Hostetter, Michael Busch)
  • LUCENE-930 : Various contrib building improvements to ensure contrib dependencies are met, and test compilation errors fail the build.
    (Steven Parkes, Chris Hostetter)
  • LUCENE-622 : Add ant target and pom.xml files for building maven artifacts of the Lucene core and the contrib modules.
    (Sami Siren, Karl Wettin, Michael Busch)
  • Release 2.1.0 [2007-02-17]

  • Changes in runtime behavior (9)
  • 's' and 't' have been removed from the list of default stopwords in StopAnalyzer (also used by StandardAnalyzer). Having e.g. 's' as a stopword meant that 's-class' led to the same results as 'class'. Note that this problem still exists for 'a', e.g. in 'a-class', as 'a' continues to be a stopword.
    (Daniel Naber)
  • LUCENE-478 : Updated the list of Unicode code point ranges for CJK (now split into CJ and K) in StandardAnalyzer.
    (John Wang and Steven Rowe via Otis Gospodnetic)
  • Modified some CJK Unicode code point ranges in StandardTokenizer.jj, and added a few more of them to increase CJK character coverage. Also documented some of the ranges.
    (Otis Gospodnetic)
  • LUCENE-489 : Add support for leading wildcard characters (*, ?) to QueryParser. Default is to disallow them, as before.
    (Steven Parkes via Otis Gospodnetic)
  • LUCENE-703 : QueryParser changed to default to use of ConstantScoreRangeQuery for range queries. Added useOldRangeQuery property to QueryParser to allow selection of old RangeQuery class if required.
    (Mark Harwood)
  • LUCENE-543 : WildcardQuery now performs a TermQuery if the provided term does not contain a wildcard character (? or *), whereas previously a StringIndexOutOfBoundsException was thrown.
    (Michael Busch via Erik Hatcher)
  • LUCENE-726 : Removed the use of deprecated doc.fields() method and Enumeration.
    (Michael Busch via Otis Gospodnetic)
  • LUCENE-436 : Removed finalize() in TermInfosReader and SegmentReader, and added a call to enumerators.remove() in TermInfosReader.close(). The finalize() overrides were added to help with a pre-1.4.2 JVM bug that has since been fixed, plus we no longer support pre-1.4.2 JVMs.
    (Otis Gospodnetic)
  • LUCENE-771 : The default location of the write lock is now the index directory, and it is named simply "write.lock" (without a big digest prefix). Neither the system property "org.apache.lucene.lockDir" nor "java.io.tmpdir" is used any longer as the global directory for storing lock files, and the LOCK_DIR field of FSDirectory is now deprecated.
    (Mike McCandless)
  • New features (15)
  • LUCENE-503 : New ThaiAnalyzer and ThaiWordFilter in contrib/analyzers
    (Samphan Raruenrom via Chris Hostetter)
  • LUCENE-545 : New FieldSelector API and associated changes to IndexReader and implementations. New Fieldable interface for use with the lazy field loading mechanism.
    (Grant Ingersoll and Chuck Williams via Grant Ingersoll)
  • LUCENE-676 : Move Solr's PrefixFilter to Lucene core.
    (Yura Smolsky, Yonik Seeley)
  • LUCENE-678 : Added NativeFSLockFactory, which implements locking using OS native locking (via java.nio.*).
    (Michael McCandless via Yonik Seeley)
  • LUCENE-544 : Added the ability to specify different boosts for different fields when using MultiFieldQueryParser
    (Matt Ericson via Otis Gospodnetic)
  • LUCENE-528 : New IndexWriter.addIndexesNoOptimize() that doesn't optimize the index when adding new segments, only performing merges as needed.
    (Ning Li via Yonik Seeley)
  • LUCENE-573 : QueryParser now allows backslash escaping in quoted terms and phrases.
    (Michael Busch via Yonik Seeley)
  • LUCENE-716 : QueryParser now allows specification of Unicode characters in terms via a unicode escape of the form \uXXXX
    (Michael Busch via Yonik Seeley)
  • LUCENE-709 : Added RAMDirectory.sizeInBytes(), IndexWriter.ramSizeInBytes() and IndexWriter.flushRamSegments(), allowing applications to control the amount of memory used to buffer documents.
    (Chuck Williams via Yonik Seeley)
  • LUCENE-723 : QueryParser now parses *:* as MatchAllDocsQuery
    (Yonik Seeley)
  • LUCENE-741 : Command-line utility for modifying or removing norms on fields in an existing index. This is mostly based on LUCENE-496 and lives in contrib/miscellaneous.
    (Chris Hostetter, Otis Gospodnetic)
  • LUCENE-759 : Added NGramTokenizer and EdgeNGramTokenizer classes and their passing unit tests.
    (Otis Gospodnetic)
  • LUCENE-565 : Added methods to IndexWriter to more efficiently handle updating documents (the "delete then add" use case). This is intended to be an eventual replacement for the existing IndexModifier. Added IndexWriter.flush() (renamed from flushRamSegments()) to flush all pending updates (held in RAM), to the Directory.
    (Ning Li via Mike McCandless)
  • LUCENE-762 : Added SIZE and SIZE_AND_BREAK FieldSelectorResult options, which allow one to retrieve the size of a field without retrieving the actual field.
    (Chuck Williams via Grant Ingersoll)
  • LUCENE-799 : Properly handle lazy, compressed fields.
    (Mike Klaas via Grant Ingersoll)
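The difference between the two tokenizers added by LUCENE-759 can be illustrated with a small standalone sketch. The class and method names below are hypothetical and this is not the Lucene tokenizer API; it only shows which grams each approach would emit for a single string, assuming front edge n-grams are prefixes whose lengths run from a minimum to a maximum size.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating n-grams vs. front edge n-grams; not Lucene's tokenizer API. */
public class EdgeNGramSketch {
    /** Returns prefixes of text whose lengths run from minGram to maxGram (inclusive). */
    public static List<String> frontGrams(String text, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        // Stop early if the input is shorter than maxGram.
        int upper = Math.min(maxGram, text.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(text.substring(0, len));
        }
        return grams;
    }

    /** Returns every substring of length n, starting at each position in turn. */
    public static List<String> nGrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start + n <= text.length(); start++) {
            grams.add(text.substring(start, start + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(frontGrams("lucene", 1, 3)); // [l, lu, luc]
        System.out.println(nGrams("lucene", 2));        // [lu, uc, ce, en, ne]
    }
}
```

Edge grams anchored at the start of a word are what make prefix-style matching (for example, type-ahead) cheap, while plain n-grams cover substrings anywhere in the word at the cost of more tokens.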
  • API Changes (19)
  • LUCENE-438 : Remove "final" from Token, implement Cloneable, allow changing of termText via setTermText().
    (Yonik Seeley)
  • org.apache.lucene.analysis.nl.WordlistLoader has been deprecated and should be replaced by the WordlistLoader class in package org.apache.lucene.analysis
    (Daniel Naber)
  • LUCENE-609 : Revert return type of Document.getField(s) to Field for backward compatibility, added new Document.getFieldable(s) for access to new lazy loaded fields.
    (Yonik Seeley)
  • LUCENE-608 : Document.fields() has been deprecated and a new method Document.getFields() has been added that returns a List instead of an Enumeration
    (Daniel Naber)
  • LUCENE-605 : New Explanation.isMatch() method and new ComplexExplanation subclass allows explain methods to produce Explanations which model "matching" independent of having a positive value.
    (Chris Hostetter)
  • LUCENE-621 : New static methods IndexWriter.setDefaultWriteLockTimeout and IndexWriter.setDefaultCommitLockTimeout for overriding default timeout values for all future instances of IndexWriter (as well as for any other classes that may reference the static values, such as IndexReader).
    (Michael McCandless via Chris Hostetter)
  • LUCENE-638 : FSDirectory.list() now only returns the directory's Lucene-related files. Thanks to this change one can now construct a RAMDirectory from a file system directory that contains files not related to Lucene.
    (Simon Willnauer via Daniel Naber)
  • LUCENE-635 : Decoupling locking implementation from Directory implementation. Added set/getLockFactory to Directory and moved all locking code into subclasses of abstract class LockFactory. FSDirectory and RAMDirectory still default to their prior locking implementations, but now you can mix & match, for example using SingleInstanceLockFactory (i.e., in-memory locking) with an FSDirectory. Note that you must now call setDisableLocks before instantiating an FSDirectory if you wish to disable locking for that Directory.
    (Michael McCandless, Jeff Patterson via Yonik Seeley)
  • LUCENE-657 : Made FuzzyQuery non-final and inner ScoreTerm protected.
    (Steven Parkes via Otis Gospodnetic)
  • LUCENE-701 : Lockless commits: a commit lock is no longer required when a writer commits and a reader opens the index. This includes a change to the index file format (see docs/fileformats.html for details). It also removes all APIs associated with the commit lock & its timeout. Readers are now truly read-only and do not block one another on startup. This is the first step to getting Lucene to work correctly over NFS (second step is LUCENE-710 ).
    (Mike McCandless)
  • LUCENE-722 : DEFAULT_MIN_DOC_FREQ was misspelled DEFALT_MIN_DOC_FREQ in Similarity's MoreLikeThis class. The misspelling has been replaced by the correct spelling.
    (Andi Vajda via Daniel Naber)
  • LUCENE-738 : Reduce the size of the file that keeps track of which documents are deleted when the number of deleted documents is small. This changes the index file format and cannot be read by previous versions of Lucene.
    (Doron Cohen via Yonik Seeley)
  • LUCENE-756 : Maintain all norms in a single .nrm file to reduce the number of open files and file descriptors for the non-compound index format. This changes the index file format, but maintains the ability to read and update older indices. The first segment merge on an older format index will create a single .nrm file for the new segment.
    (Doron Cohen via Yonik Seeley)
  • LUCENE-732 : DateTools support has been added to QueryParser, with setters for both the default Resolution, and per-field Resolution. For backwards compatibility, DateField is still used if no Resolutions are specified.
    (Michael Busch via Chris Hostetter)
  • Added isOptimized() method to IndexReader.
    (Otis Gospodnetic)
  • LUCENE-773 : Deprecate the FSDirectory.getDirectory(*) methods that take a boolean "create" argument. Instead you should use IndexWriter's "create" argument to create a new index.
    (Mike McCandless)
  • LUCENE-780 : Add a static Directory.copy() method to copy files from one Directory to another.
    (Jiri Kuhn via Mike McCandless)
  • LUCENE-773 : Added Directory.clearLock(String name) to forcefully remove an old lock. The default implementation is to ask the lockFactory (if non-null) to clear the lock.
    (Mike McCandless)
  • LUCENE-795 : Directory.renameFile() has been deprecated as it is not used anymore inside Lucene.
    (Daniel Naber)
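A rough cost model shows why the LUCENE-738 change helps when only a few documents are deleted. This is a hypothetical sketch, not the actual file format: it assumes the sparse form stores only the non-zero bytes of the deletion bitmap, each preceded by a variable-length gap (7 payload bits per byte) from the previous stored byte, and it ignores any header fields.

```java
/** Hypothetical cost model comparing a dense bitmap with a sparse "gap" encoding of deleted docs. */
public class DeletedDocsSizeSketch {
    /** Bytes needed to store one bit per document. */
    static int denseBytes(int maxDoc) {
        return (maxDoc + 7) / 8;
    }

    /** Bytes a variable-length int takes when 7 payload bits are stored per byte. */
    static int vIntSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    /** Bytes for storing only the non-zero bitmap bytes, each preceded by its gap (assumed layout). */
    static int sparseBytes(int[] sortedDeletedDocs) {
        int total = 0;
        int lastByteIndex = 0;
        int currentByte = -1;
        for (int doc : sortedDeletedDocs) {
            int byteIndex = doc >> 3;
            if (byteIndex != currentByte) {      // a new non-zero bitmap byte starts here
                total += vIntSize(byteIndex - lastByteIndex) + 1;
                lastByteIndex = byteIndex;
                currentByte = byteIndex;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        int[] deleted = {5, 6, 70_000, 999_999};
        System.out.println(denseBytes(maxDoc));   // 125000
        System.out.println(sparseBytes(deleted)); // 9
    }
}
```

The dense size grows with maxDoc regardless of how many documents are deleted, while the sparse size grows with the number of deletions, so a writer can pick whichever is smaller.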
  • Bug fixes (32)
  • Fixed the web application demo (built with "ant war-demo") which didn't work because it used a QueryParser method that had been removed
    (Daniel Naber)
  • LUCENE-583 : ISOLatin1AccentFilter fails to preserve positionIncrement
    (Yonik Seeley)
  • LUCENE-575 : SpellChecker min score is incorrectly changed by suggestSimilar
    (Karl Wettin via Yonik Seeley)
  • LUCENE-587 : Explanation.toHtml was producing malformed HTML
    (Chris Hostetter)
  • Fix to allow MatchAllDocsQuery to be used with RemoteSearcher
    (Yonik Seeley)
  • LUCENE-601 : RAMDirectory and RAMFile made Serializable
    (Karl Wettin via Otis Gospodnetic)
  • LUCENE-557 : Fixes to BooleanQuery and FilteredQuery so that the score Explanations match up with the real scores.
    (Chris Hostetter)
  • LUCENE-607 : ParallelReader's TermEnum fails to advance properly to new fields
    (Chuck Williams, Christian Kohlschuetter via Yonik Seeley)
  • LUCENE-610 , LUCENE-611 : Simple syntax changes to allow compilation with ecj: disambiguate inner class scorer's use of doc() in BooleanScorer2, other test code changes.
    (DM Smith via Yonik Seeley)
  • LUCENE-451 : All core query types now use ComplexExplanations so that boosts of zero don't confuse the BooleanWeight explain method.
    (Chris Hostetter)
  • LUCENE-593 : Fixed LuceneDictionary's inner Iterator
    (Kåre Fiedler Christiansen via Otis Gospodnetic)
  • LUCENE-641 : fixed an off-by-one bug with IndexWriter.setMaxFieldLength()
    (Daniel Naber)
  • LUCENE-659 : Make PerFieldAnalyzerWrapper delegate getPositionIncrementGap() to the correct analyzer for the field.
    (Chuck Williams via Yonik Seeley)
  • LUCENE-650 : Fixed NPE in Locale-specific String Sort when Document has no value.
    (Oliver Hutchison via Chris Hostetter)
  • LUCENE-683 : Fixed data corruption when reading lazy loaded fields.
    (Yonik Seeley)
  • LUCENE-678 : Fixed bug in NativeFSLockFactory which caused the same lock to be shared between different directories.
    (Michael McCandless via Yonik Seeley)
  • LUCENE-690 : Fixed thread unsafe use of IndexInput by lazy loaded fields.
    (Yonik Seeley)
  • LUCENE-696 : Fix bug when scorer for DisjunctionMaxQuery has skipTo() called on it before next().
    (Yonik Seeley)
  • LUCENE-569 : Fixed SpanNearQuery bug, for 'inOrder' queries it would fail to recognize ordered spans if they overlapped with unordered spans.
    (Paul Elschot via Chris Hostetter)
  • LUCENE-706 : Updated fileformats.xml|html concerning the docdelta value in the frequency file.
    (Johan Stuyts, Doron Cohen via Grant Ingersoll)
  • LUCENE-715 : Fixed private constructor in IndexWriter.java to properly release the acquired write lock if there is an IOException after acquiring the write lock but before finishing instantiation.
    (Matthew Bogosian via Mike McCandless)
  • LUCENE-651 : Multiple different threads requesting the same FieldCache entry (often for Sorting by a field) at the same time caused multiple generations of that entry, which was detrimental to performance and memory use.
    (Oliver Hutchison via Otis Gospodnetic)
  • LUCENE-717 : Fixed build.xml not to fail when there is no lib dir.
    (Doron Cohen via Otis Gospodnetic)
  • LUCENE-728 : Removed duplicate/old MoreLikeThis and SimilarityQueries classes from contrib/similarity, as their new home is under contrib/queries.
    (Otis Gospodnetic)
  • LUCENE-669 : Do not double-close the RandomAccessFile in FSIndexInput/Output during finalize(). Besides sending an IOException up to the GC, this may also be the cause of intermittent "The handle is invalid" IOExceptions on Windows when trying to close readers or writers.
    (Michael Busch via Mike McCandless)
  • LUCENE-702 : Fix IndexWriter.addIndexes(*) to not corrupt the index on any exceptions (e.g., disk full). The semantics of these methods are now transactional: either all indices are merged or none are. Also fixed IndexWriter.mergeSegments (called outside of addIndexes(*) by addDocument, optimize, flushRamSegments) and IndexReader.commit() (called by close) to clean up and keep the instance state consistent with what's actually in the index.
    (Mike McCandless)
  • LUCENE-129 : Change finalizers to do "try {...} finally {super.finalize();}" to make sure we don't miss finalizers in classes above us.
    (Esmond Pitt via Mike McCandless)
  • LUCENE-754 : Fix a problem introduced by LUCENE-651 , causing IndexReaders to hang around forever, in addition to not fixing the original FieldCache performance problem.
    (Chris Hostetter, Yonik Seeley)
  • LUCENE-140 : Fix IndexReader.deleteDocument(int docNum) to correctly raise ArrayIndexOutOfBoundsException when docNum is too large. Previously, if docNum was only slightly too large (within the same multiple of 8, ie, up to 7 ints beyond maxDoc), no exception would be raised and instead the index would become silently corrupted. The corruption then only appears much later, in mergeSegments, when the corrupted segment is merged with segment(s) after it.
    (Mike McCandless)
  • LUCENE-768 : Fix case where an Exception during deleteDocument, undeleteAll or setNorm in IndexReader could leave the reader in a state where close() fails to release the write lock.
    (Mike McCandless)
  • Remove "tvp" from known index file extensions because it is never used.
    (Nicolas Lalevée via Bernhard Messer)
  • LUCENE-767 : Change how SegmentReader.maxDoc() is computed to not rely on file length check and instead use the SegmentInfo's docCount that's already stored explicitly in the index. This is a defensive bug fix (ie, there is no known problem seen "in real life" due to this, just a possible future problem).
    (Chuck Williams via Mike McCandless)
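The LUCENE-140 fix is easier to see with a concrete bitmap. In this hypothetical sketch (not Lucene's BitVector class), deletions are one bit per document packed into bytes, so the backing array is rounded up to a multiple of 8 bits; a docNum slightly past maxDoc still lands inside the array, and only an explicit check against maxDoc rejects it.

```java
/** Hypothetical deletion bitmap showing why an explicit bounds check is needed. */
public class DeletionBitsSketch {
    private final byte[] bits;
    private final int maxDoc;

    DeletionBitsSketch(int maxDoc) {
        this.maxDoc = maxDoc;
        // One bit per document, rounded up to whole bytes.
        this.bits = new byte[(maxDoc + 7) / 8];
    }

    /** Without this check, docNum values just past maxDoc would still fall inside the last byte. */
    void delete(int docNum) {
        if (docNum < 0 || docNum >= maxDoc) {
            throw new ArrayIndexOutOfBoundsException(
                "docNum " + docNum + " outside [0, " + maxDoc + ")");
        }
        bits[docNum >> 3] |= (byte) (1 << (docNum & 7));
    }

    boolean isDeleted(int docNum) {
        return (bits[docNum >> 3] & (1 << (docNum & 7))) != 0;
    }

    /** Number of bits the backing array can hold, showing the slack left by rounding. */
    int capacityInBits() {
        return bits.length * 8;
    }

    public static void main(String[] args) {
        DeletionBitsSketch d = new DeletionBitsSketch(10); // backing array holds 16 bits
        d.delete(3);
        System.out.println(d.isDeleted(3));     // true
        System.out.println(d.capacityInBits()); // 16
        try {
            d.delete(12); // fits in the array, but is not a valid document number
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```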
  • Optimizations (16)
  • LUCENE-586 : TermDocs.skipTo() is now more efficient for multi-segment indexes. This will improve the performance of many types of queries against a non-optimized index.
    (Andrew Hudson via Yonik Seeley)
  • LUCENE-623 : RAMDirectory.close now nulls out its reference to all internal "files", allowing them to be GCed even if references to the RAMDirectory itself still exist.
    (Nadav Har'El via Chris Hostetter)
  • LUCENE-629 : Compressed fields are no longer uncompressed and recompressed during segment merges (e.g. during indexing or optimizing), thus improving performance.
    (Michael Busch via Otis Gospodnetic)
  • LUCENE-388 : Improve indexing performance when maxBufferedDocs is large by keeping a count of buffered documents rather than counting after each document addition.
    (Doron Cohen, Paul Smith, Yonik Seeley)
  • Modified TermScorer.explain to use TermDocs.skipTo() instead of looping through docs.
    (Grant Ingersoll)
  • LUCENE-672 : New indexing segment merge policy flushes all buffered docs to their own segment and delays a merge until mergeFactor segments of a certain level have been accumulated. This increases indexing performance in the presence of deleted docs or partially full segments as well as enabling future optimizations. NOTE: this also fixes an "under-merging" bug whereby it is possible to get far too many segments in your index (which will drastically slow down search, risks exhausting the file descriptor limit, etc.). This can happen when the number of buffered docs at close, plus the number of docs in the last non-RAM segment, is greater than mergeFactor.
    (Ning Li, Yonik Seeley)
  • Lazy loaded fields unnecessarily retained an extra copy of loaded String data.
    (Yonik Seeley)
  • LUCENE-443 : ConjunctionScorer performance increase. Speed up any BooleanQuery with more than one mandatory clause.
    (Abdul Chaudhry, Paul Elschot via Yonik Seeley)
  • LUCENE-365 : DisjunctionSumScorer performance increase of ~30%. Speeds up queries with optional clauses.
    (Paul Elschot via Yonik Seeley)
  • LUCENE-695 : Optimized BufferedIndexInput.readBytes() for medium size buffers, which will speed up merging and retrieving binary and compressed fields.
    (Nadav Har'El via Yonik Seeley)
  • LUCENE-687 : Lazy skipping on proximity file speeds up most queries involving term positions, including phrase queries.
    (Michael Busch via Yonik Seeley)
  • LUCENE-714 : Replaced 2 cases of manual for-loop array copying with calls to System.arraycopy instead, in DocumentWriter.java.
    (Nicolas Lalevee via Mike McCandless)
  • LUCENE-729 : Non-recursive skipTo and next implementation of TermDocs for a MultiReader. The old implementation could recurse up to the number of segments in the index.
    (Yonik Seeley)
  • LUCENE-739 : Improve segment merging performance by reusing the norm array across different fields and doing bulk writes of norms of segments with no deleted docs.
    (Michael Busch via Yonik Seeley)
  • LUCENE-745 : Add BooleanQuery.clauses(), allowing direct access to the List of clauses and replaced the internal synchronized Vector with an unsynchronized List.
    (Yonik Seeley)
  • LUCENE-750 : Remove finalizers from FSIndexOutput and move the FSIndexInput finalizer to the actual file so all clones don't register a new finalizer.
    (Yonik Seeley)
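Several of the optimizations above (LUCENE-586, LUCENE-443, LUCENE-729) rely on skipTo(): when every clause is required, each posting list can jump straight to the next candidate document instead of stepping through every entry. The sketch below is a hypothetical, self-contained illustration of that general "leapfrog" intersection over in-memory sorted arrays; it does not reproduce Lucene's TermDocs or Scorer classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical leapfrog intersection of sorted posting lists using a skipTo-style advance. */
public class ConjunctionSketch {
    /** Minimal cursor over a sorted, duplicate-free doc-id array. */
    static final class Postings {
        private final int[] docs;
        private int pos = 0;

        Postings(int... docs) { this.docs = docs; }

        /** Current doc, or Integer.MAX_VALUE once the list is exhausted. */
        int doc() { return pos < docs.length ? docs[pos] : Integer.MAX_VALUE; }

        /** Moves to the first doc >= target by binary-searching forward from the current position. */
        int skipTo(int target) {
            int idx = Arrays.binarySearch(docs, pos, docs.length, target);
            pos = idx >= 0 ? idx : -idx - 1;
            return doc();
        }
    }

    /** Returns the doc ids present in every list. */
    static List<Integer> intersect(Postings... lists) {
        List<Integer> hits = new ArrayList<>();
        int candidate = lists[0].doc();
        while (candidate != Integer.MAX_VALUE) {
            boolean allMatch = true;
            for (Postings p : lists) {
                int d = p.skipTo(candidate);
                if (d != candidate) {   // this list jumped past the candidate
                    candidate = d;      // leapfrog: the larger doc becomes the new candidate
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                hits.add(candidate);
                candidate = lists[0].skipTo(candidate + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(intersect(
            new Postings(1, 3, 5, 7, 9, 11),
            new Postings(3, 4, 7, 11, 15),
            new Postings(0, 3, 7, 8, 11))); // [3, 7, 11]
    }
}
```

Because each skip moves forward by a binary search rather than one entry at a time, the cost is driven by the sparsest list rather than the densest one.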
  • Test Cases (3)
  • Added TestTermScorer.java
    (Grant Ingersoll)
  • Added TestWindowsMMap.java
    (Benson Margulies via Mike McCandless)
  • LUCENE-744 : Append the user.name property to the temporary directory that is created so it doesn't interfere with other users.
    (Grant Ingersoll)
  • Documentation (11)
  • Added style sheet to xdocs named lucene.css and included in the Anakia VSL descriptor.
    (Grant Ingersoll)
  • Added scoring.xml document into xdocs. Updated Similarity.java scoring formula. Issue 664 .
    (Grant Ingersoll and Steve Rowe. Updates from: Michael McCandless, Doron Cohen, Chris Hostetter, Doug Cutting)
  • Added javadocs for FieldSelectorResult.java.
    (Grant Ingersoll)
  • Moved xdocs directory to src/site/src/documentation/content/xdocs per Issue 707 . Site now builds using Forrest, just like the other Lucene siblings. See http://wiki.apache.org/jakarta-lucene/HowToUpdateTheWebsite for info on updating the website.
    (Grant Ingersoll with help from Steve Rowe, Chris Hostetter, Doug Cutting, Otis Gospodnetic, Yonik Seeley)
  • Added Developer and System Requirements sections under Resources
    (Grant Ingersoll)
  • LUCENE-713 : Updated the Term Vector section of File Formats to include documentation on how Offset and Position info are stored in the TVF file.
    (Grant Ingersoll, Samir Abdou)
  • Added a link to Clover Test Code Coverage Reports under the Develop section in Resources
    (Grant Ingersoll)
  • LUCENE-748 : Added details for semantics of IndexWriter.close on hitting an Exception.
    (Jed Wesley-Smith via Mike McCandless)
  • Added some text about what is contained in releases.
    (Eric Haszlakiewicz via Grant Ingersoll)
  • LUCENE-758 : Fix javadoc to clarify that RAMDirectory(Directory) makes a full copy of the starting Directory.
    (Mike McCandless)
  • LUCENE-764 : Fix javadocs to detail temporary space requirements for IndexWriter's optimize(), addIndexes(*) and addDocument(...) methods.
    (Mike McCandless)
  • Build (3)
  • Added Clover test code coverage per LUCENE-721 . To enable Clover code coverage, you must have clover.jar in the ANT classpath and specify -Drun.clover=true on the command line.
    (Michael Busch and Grant Ingersoll)
  • Added a sysproperty in common-build.xml per LUCENE-752 to map java.io.tmpdir to ${build.dir}/test just like the tempDir sysproperty.
  • LUCENE-757 : Added a new target named init-dist that does the setup for both binary and source distributions. It is called by package and package-*-src.
  • Release 2.0.0 [2006-05-26]

  • API Changes (4)
  • All deprecated methods and fields have been removed, except DateField, which will still be supported for some time so Lucene can read its date fields from old indexes
    (Yonik Seeley & Grant Ingersoll)
  • DisjunctionSumScorer is no longer public.
    (Paul Elschot via Otis Gospodnetic)
  • Creating a Field with both an empty name and an empty value now throws an IllegalArgumentException
    (Daniel Naber)
  • LUCENE-301 : Added new IndexWriter({String,File,Directory}, Analyzer) constructors that do not take a boolean "create" argument. These new constructors will create a new index if necessary, else append to the existing one.
    (Dan Armbrust via Mike McCandless)
  • New features (2)
  • LUCENE-496 : Command line tool for modifying the field norms of an existing index; added to contrib/miscellaneous.
    (Chris Hostetter)
  • LUCENE-577 : SweetSpotSimilarity added to contrib/miscellaneous.
    (Chris Hostetter)
  • Bug fixes (16)
  • LUCENE-330 : Fix issue of FilteredQuery not working properly within BooleanQuery.
    (Paul Elschot via Erik Hatcher)
  • LUCENE-515 : Make ConstantScoreRangeQuery and ConstantScoreQuery work with RemoteSearchable.
    (Philippe Laflamme via Yonik Seeley)
  • Added methods to get/set writeLockTimeout and commitLockTimeout in IndexWriter. These could be set in Lucene 1.4 using a system property. This feature had been removed without adding the corresponding getter/setter methods.
    (Daniel Naber)
  • LUCENE-413 : Fixed ArrayIndexOutOfBoundsException exceptions when using SpanQueries.
    (Paul Elschot via Yonik Seeley)
  • Implemented FilterIndexReader.getVersion() and isCurrent()
    (Yonik Seeley)
  • LUCENE-540 : Fixed a bug with IndexWriter.addIndexes(Directory[]) that sometimes caused the index order of documents to change.
    (Yonik Seeley)
  • LUCENE-526 : Fixed a bug in FieldSortedHitQueue that caused subsequent String sorts with different locales to sort identically.
    (Paul Cowan via Yonik Seeley)
  • LUCENE-541 : Add missing extractTerms() to DisjunctionMaxQuery
    (Stefan Will via Yonik Seeley)
  • LUCENE-514 : Added getTermArrays() and extractTerms() to MultiPhraseQuery
    (Eric Jain & Yonik Seeley)
  • LUCENE-512 : Fixed ClassCastException in ParallelReader.getTermFreqVectors
    (frederic via Yonik)
  • LUCENE-352 : Fixed bug in SpanNotQuery that manifested as NullPointerException when "exclude" query was not a SpanTermQuery.
    (Chris Hostetter)
  • LUCENE-572 : Fixed bug in SpanNotQuery hashCode, which ignored the exclude clause
    (Chris Hostetter)
  • LUCENE-561 : Fixed some ParallelReader bugs: a NullPointerException if the reader didn't know about the field yet, the reader didn't keep track of whether it had deletions, and deleteDocument calls could circumvent synchronization on the subreaders.
    (Chuck Williams via Yonik Seeley)
  • LUCENE-556 : Added empty extractTerms() implementation to MatchAllDocsQuery and ConstantScoreQuery in order to allow their use with a MultiSearcher.
    (Yonik Seeley)
  • LUCENE-546 : Removed 2GB file size limitations for RAMDirectory.
    (Peter Royal, Michael Chan, Yonik Seeley)
  • LUCENE-485 : Don't hold commit lock while removing obsolete index files.
    (Luc Vanlerberghe via cutting)
  • Release 1.9.1 [2006-03-02]

  • Bug fixes (1)
  • LUCENE-511 : Fix a bug in the BufferedIndexOutput optimization introduced in 1.9-final.
    (Shay Banon & Steven Tamm via cutting)
  • Release 1.9 final [2006-02-27]

  • Note that this release is mostly but not 100% source compatible with the previous release of Lucene (1.4.3). In other words, you should make sure your application compiles with this version of Lucene before you replace the old Lucene JAR with the new one. Many methods have been deprecated in anticipation of release 2.0, so deprecation warnings are to be expected when upgrading from 1.4.3 to 1.9.
  • Bug fixes (1)
  • The fix that made IndexWriter.setMaxBufferedDocs(1) work had negative effects on indexing performance and has thus been reverted. The argument for setMaxBufferedDocs(int) must now be at least 2, otherwise an exception is thrown.
    (Daniel Naber)
  • Optimizations (1)
  • Optimized BufferedIndexOutput.writeBytes() to use System.arraycopy() in more cases, rather than copying byte-by-byte.
    (Lukas Zapletal via Cutting)
  • Release 1.9 RC1 [2006-02-21]

  • Requirements (1)
  • To compile and use Lucene you now need Java 1.4 or later.
  • Changes in runtime behavior (9)
  • FuzzyQuery can no longer throw a TooManyClauses exception. If a FuzzyQuery expands to more than BooleanQuery.maxClauseCount terms, only the BooleanQuery.maxClauseCount most similar terms go into the rewritten query and thus the exception is avoided.
    (Christoph)
  • Changed system property from "org.apache.lucene.lockdir" to "org.apache.lucene.lockDir", so that its casing follows the existing pattern used in other Lucene system properties.
    (Bernhard)
  • The terms of RangeQueries and FuzzyQueries are now converted to lowercase by default (as was already the case for PrefixQueries and WildcardQueries). Use setLowercaseExpandedTerms(false) to disable that behavior, but note that this also affects PrefixQueries and WildcardQueries.
    (Daniel Naber)
  • Document frequency that is computed when MultiSearcher is used is now computed correctly and "globally" across subsearchers and indices, while before it used to be computed locally to each index, which caused ranking across multiple indices not to be equivalent.
    (Chuck Williams, Wolf Siberski via Otis, bug #31841 [LUCENE-295] )
  • When opening an IndexWriter with create=true, Lucene now only deletes its own files from the index directory (looking at the file name suffixes to decide if a file belongs to Lucene). The old behavior was to delete all files.
    (Daniel Naber and Bernhard Messer, bug #34695 [LUCENE-385] )
  • The version of an IndexReader, as returned by getCurrentVersion() and getVersion() doesn't start at 0 anymore for new indexes. Instead, it is now initialized by the system time in milliseconds.
    (Bernhard Messer via Daniel Naber)
  • Several default values can no longer be set via system properties, as this has been considered inappropriate for a library like Lucene. For most properties there are set/get methods available which you should use instead. This affects the following properties: org.apache.lucene.writeLockTimeout, org.apache.lucene.commitLockTimeout, org.apache.lucene.minMergeDocs, org.apache.lucene.maxMergeDocs, org.apache.lucene.maxFieldLength, org.apache.lucene.termIndexInterval and org.apache.lucene.mergeFactor (see IndexWriter for getter/setter methods); org.apache.lucene.maxClauseCount (see BooleanQuery for getter/setter methods); disableLuceneLocks (see FSDirectory for getter/setter methods).
    (Daniel Naber)
  • Fixed FieldCacheImpl to use user-provided IntParser and FloatParser, instead of using Integer and Float classes for parsing.
    (Yonik Seeley via Otis Gospodnetic)
  • Expert level search routines returning TopDocs and TopFieldDocs no longer normalize scores. This also fixes bugs related to MultiSearchers and score sorting/normalization.
    (Luc Vanlerberghe via Yonik Seeley, LUCENE-469 )
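The FuzzyQuery change at the top of this list amounts to a bounded top-N selection: instead of failing when too many terms match, keep only the most similar ones. The sketch below is hypothetical and self-contained; the similarity values are supplied as inputs rather than computed from edit distance, and it is not a reproduction of FuzzyQuery's rewrite code. A min-heap holds the current best terms so the weakest one can be evicted in O(log N).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: keep only the maxClauses most similar expanded terms. */
public class TopTermsSketch {
    /** A candidate term with a precomputed similarity (assumed input). */
    static final class ScoredTerm {
        final String term;
        final float similarity;

        ScoredTerm(String term, float similarity) {
            this.term = term;
            this.similarity = similarity;
        }
    }

    static List<String> mostSimilar(List<ScoredTerm> candidates, int maxClauses) {
        if (maxClauses < 1) {
            throw new IllegalArgumentException("maxClauses must be positive");
        }
        // Min-heap on similarity: the head is the weakest term currently kept.
        PriorityQueue<ScoredTerm> heap =
            new PriorityQueue<>(Comparator.comparingDouble((ScoredTerm t) -> t.similarity));
        for (ScoredTerm t : candidates) {
            if (heap.size() < maxClauses) {
                heap.add(t);
            } else if (t.similarity > heap.peek().similarity) {
                heap.poll(); // evict the weakest term so the clause count never exceeds the limit
                heap.add(t);
            }
        }
        List<String> kept = new ArrayList<>();
        for (ScoredTerm t : heap) {
            kept.add(t.term);
        }
        kept.sort(null); // alphabetical order for stable output
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredTerm> candidates = List.of(
            new ScoredTerm("lucene", 1.0f),
            new ScoredTerm("lucent", 0.83f),
            new ScoredTerm("licence", 0.5f),
            new ScoredTerm("luce", 0.67f));
        System.out.println(mostSimilar(candidates, 2)); // [lucene, lucent]
    }
}
```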
  • New features (33)
  • Added support for stored compressed fields ( patch #31149 [LUCENE-274] )
    (Bernhard Messer via Christoph)
  • Added support for binary stored fields ( patch #29370 [LUCENE-229] )
    (Drew Farris and Bernhard Messer via Christoph)
  • Added support for position and offset information in term vectors ( patch #18927 [LUCENE-95] ).
    (Grant Ingersoll & Christoph)
  • A new class DateTools has been added. It allows you to format dates in a readable format adequate for indexing. Unlike the existing DateField class DateTools can cope with dates before 1970 and it forces you to specify the desired date resolution (e.g. month, day, second, ...) which can make RangeQuerys on those fields more efficient.
    (Daniel Naber)
  • QueryParser now correctly works with Analyzers that can return more than one token per position. For example, a query "+fast +car" would be parsed as "+fast +(car automobile)" if the Analyzer returns "car" and "automobile" at the same position whenever it finds "car" ( Patch #23307 [LUCENE-133] ).
    (Pierrick Brihaye, Daniel Naber)
  • Permit unbuffered Directory implementations (e.g., using mmap). InputStream is replaced by the new classes IndexInput and BufferedIndexInput. OutputStream is replaced by the new classes IndexOutput and BufferedIndexOutput. InputStream and OutputStream are now deprecated and FSDirectory is now subclassable.
    (cutting)
  • Add native Directory and TermDocs implementations that work under GCJ. These require GCC 3.4.0 or later and have only been tested on Linux. Use 'ant gcj' to build demo applications.
    (cutting)
  • Add MMapDirectory, which uses nio to mmap input files. This is still somewhat slower than FSDirectory. However it uses less memory per query term, since a new buffer is not allocated per term, which may help applications which use, e.g., wildcard queries. It may also someday be faster.
    (cutting & Paul Elschot)
  • Added javadocs-internal to build.xml - bug #30360 [LUCENE-250]
    (Paul Elschot via Otis)
  • Added RangeFilter, a more generically useful filter than DateFilter.
    (Chris M Hostetter via Erik)
  • Added NumberTools, a utility class for indexing numeric fields.
    (adapted from code contributed by Matt Quail; committed by Erik)
  • Added public static IndexReader.main(String[] args) method. IndexReader can now be used directly at command line level to list and optionally extract the individual files from an existing compound index file.
    (adapted from code contributed by Garrett Rooney; committed by Bernhard)
  • Add IndexWriter.setTermIndexInterval() method. See javadocs.
    (Doug Cutting)
  • Added LucenePackage, whose static get() method returns java.util.Package, which lets the caller get the Lucene version information specified in the Lucene Jar.
    (Doug Cutting via Otis)
  • Added Hits.iterator() method and corresponding HitIterator and Hit objects. This provides standard java.util.Iterator iteration over Hits. Each call to the iterator's next() method returns a Hit object.
    (Jeremy Rayner via Erik)
  • Add ParallelReader, an IndexReader that combines separate indexes over different fields into a single virtual index.
    (Doug Cutting)
  • Add IntParser and FloatParser interfaces to FieldCache, so that fields in arbitrary formats can be cached as ints and floats.
    (Doug Cutting)
  • Added class org.apache.lucene.index.IndexModifier which combines IndexWriter and IndexReader, so you can add and delete documents without worrying about synchronization/locking issues.
    (Daniel Naber)
  • Lucene can now be used inside an unsigned applet, as Lucene's access to system properties will not cause a SecurityException anymore.
    (Jon Schuster via Daniel Naber, bug #34359 [LUCENE-369] )
  • Added a new class MatchAllDocsQuery that matches all documents.
    (John Wang via Daniel Naber, bug #34946 [LUCENE-389] )
  • Added ability to omit norms on a per field basis to decrease index size and memory consumption when there are many indexed fields. See Field.setOmitNorms()
    (Yonik Seeley, LUCENE-448 )
  • Added NullFragmenter to contrib/highlighter, which is useful for highlighting entire documents or fields.
    (Erik Hatcher)
  • Added regular expression queries, RegexQuery and SpanRegexQuery. Note the same term enumeration caveats apply with these queries as apply to WildcardQuery and other term expanding queries. These two new queries are not currently supported via QueryParser.
    (Erik Hatcher)
  • Added ConstantScoreQuery which wraps a filter and produces a score equal to the query boost for every matching document.
    (Yonik Seeley, LUCENE-383 )
  • Added ConstantScoreRangeQuery which produces a constant score for every document in the range. One advantage over a normal RangeQuery is that it doesn't expand to a BooleanQuery and thus doesn't have a maximum number of terms the range can cover. Both endpoints may also be open.
    (Yonik Seeley, LUCENE-383 )
  • Added ability to specify a minimum number of optional clauses that must match in a BooleanQuery. See BooleanQuery.setMinimumNumberShouldMatch().
    (Paul Elschot, Chris Hostetter via Yonik Seeley, LUCENE-395 )
  • Added DisjunctionMaxQuery which provides the maximum score across its clauses. It's very useful for searching across multiple fields.
    (Chuck Williams via Yonik Seeley, LUCENE-323 )
  • New class ISOLatin1AccentFilter that replaces accented characters in the ISO Latin 1 character set by their unaccented equivalent.
    (Sven Duzont via Erik Hatcher)
  • New class KeywordAnalyzer. "Tokenizes" the entire stream as a single token. This is useful for data like zip codes, ids, and some product names.
    (Erik Hatcher)
  • Copied LengthFilter from contrib area to core. Removes words that are too long or too short from the stream.
    (David Spencer via Otis and Daniel)
  • Added getPositionIncrementGap(String fieldName) to Analyzer. This allows custom analyzers to put gaps between Field instances with the same field name, preventing phrase or span queries crossing these boundaries. The default implementation issues a gap of 0, allowing the default token position increment of 1 to put the next field's first token into a successive position.
    (Erik Hatcher, with advice from Yonik)
  • StopFilter can now ignore case when checking for stop words.
    (Grant Ingersoll via Yonik, LUCENE-248 )
  • Add TopDocCollector and TopFieldDocCollector. These simplify the implementation of hit collectors that collect only the top-scoring or top-sorting hits.
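The DisjunctionMaxQuery entry above can be made concrete with a small sketch of how per-clause scores might be combined. It assumes the tie-breaker formulation (the maximum clause score plus a multiplier times the sum of the remaining clause scores) and non-negative scores; the class and method names are hypothetical and are not Lucene's API.

```java
/** Hypothetical sketch of DisjunctionMaxQuery-style score combination. */
public class DisMaxSketch {
    /**
     * Combines non-negative per-clause scores: the best score counts fully and the
     * others are scaled by tieBreaker (0 = pure max, 1 = plain sum).
     */
    static float combine(float[] clauseScores, float tieBreaker) {
        float max = 0f;
        float sum = 0f;
        for (float s : clauseScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 2.0f, 1.0f}; // e.g. matches in title, body, keywords
        System.out.println(combine(scores, 0f));   // 2.0
        System.out.println(combine(scores, 0.1f)); // roughly 2.15
    }
}
```

Taking the maximum rather than the sum is what makes this useful across multiple fields: a term that appears in several fields of one document does not outscore a document that matches strongly in its best field, while a small tie-breaker still rewards the extra matches.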
  • API Changes (5)
  • Several methods and fields have been deprecated. The API documentation contains information about the recommended replacements. It is planned that most of the deprecated methods and fields will be removed in Lucene 2.0.
    (Daniel Naber)
  • The Russian and the German analyzers have been moved to contrib/analyzers. Also, the WordlistLoader class has been moved one level up in the hierarchy and is now org.apache.lucene.analysis.WordlistLoader
    (Daniel Naber)
  • The API contained methods that were declared to throw an IOException but never did. These declarations have been removed. If your code tries to catch these exceptions you might need to remove those catch clauses to avoid compile errors.
    (Daniel Naber)
  • Add a serializable Parameter Class to standardize parameter enum classes in BooleanClause and Field.
    (Christoph)
  • Added rewrite methods to all SpanQuery subclasses that nest other SpanQuerys. This allows custom SpanQuery subclasses that rewrite (for term expansion, for example) to nest within the built-in SpanQuery classes successfully.
  • Bug fixes (24)
  • The JSP demo page (src/jsp/results.jsp) now properly closes the IndexSearcher it opens.
    (Daniel Naber)
  • Fixed a bug in IndexWriter.addIndexes(IndexReader[] readers) that prevented deletion of obsolete segments.
    (Christoph Goller)
  • Fix in FieldInfos to avoid the return of an extra blank field in IndexReader.getFieldNames() ( Patch #19058 [LUCENE-102] ).
    (Mark Harwood via Bernhard)
  • Some combinations of BooleanQuery and MultiPhraseQuery (formerly PhrasePrefixQuery) could provoke UnsupportedOperationException ( bug #33161 [LUCENE-337] ).
    (Rhett Sutphin via Daniel Naber)
  • Fixed a small bug in ConjunctionScorer.skipTo() that caused a NullPointerException if skipTo() was called without a prior call to next().
    (Christoph)
  • Disable Similarity.coord() in the scoring of most automatically generated boolean queries. The coord() score factor is appropriate when clauses are independently specified by a user, but is usually not appropriate when clauses are generated automatically, e.g., by a fuzzy, wildcard or range query. Matches on such automatically generated queries are no longer penalized for not matching all terms.
    (Doug Cutting, Patch #33472 [LUCENE-346] )
  • Obtaining a lock file with Lock.obtain(long) was supposed to wait for the given number of milliseconds, but did not; this has been fixed.
    (John Wang via Daniel Naber, Bug #33799 [LUCENE-353] )
  • Fix FSDirectory.createOutput() to always create new files. Previously, existing files were overwritten, and an index could be corrupted when the old version of a file was longer than the new. Now any existing file is first removed.
    (Doug Cutting)
  • Fix BooleanQuery containing nested SpanTermQuerys, which previously could return an incorrect number of hits.
    (Reece Wilton via Erik Hatcher, Bug #35157 [LUCENE-393] )
  • Fix NullPointerException that could occur with a MultiPhraseQuery inside a BooleanQuery.
    (Hans Hjelm and Scotty Allen via Daniel Naber, Bug #35626 [LUCENE-404] )
  • Fixed SnowballFilter to pass through the position increment from the original token.
    (Yonik Seeley via Erik Hatcher, LUCENE-437 )
  • Added Unicode range of Korean characters to StandardTokenizer, grouping contiguous characters into a token rather than one token per character. This change also changes the token type to "<CJ>" for Chinese and Japanese character tokens (previously it was "<CJK>").
    (Cheolgoo Kang via Otis and Erik, LUCENE-444 and LUCENE-461 )
  • FieldsReader now looks at FieldInfo.storeOffsetWithTermVector and FieldInfo.storePositionWithTermVector and creates the Field with correct TermVector parameter.
    (Frank Steinmann via Bernhard, LUCENE-455 )
  • Fixed WildcardQuery to prevent "cat" matching "ca??".
    (Xiaozheng Ma via Bernhard, LUCENE-306 )
  • Fixed a bug where MultiSearcher and ParallelMultiSearcher could change the sort order when sorting by string for documents without a value for the sort field.
    (Luc Vanlerberghe via Yonik, LUCENE-453 )
  • Fixed a sorting problem with MultiSearchers that can lead to missing or duplicate docs due to equal docs sorting in an arbitrary order.
    (Yonik Seeley, LUCENE-456 )
  • With the expert-level sorted search methods, a result containing a single hit did not have its score normalized; this has been fixed.
    (Yonik Seeley, LUCENE-462 )
  • Fixed inefficient memory usage when loading an index into RAMDirectory.
    (Volodymyr Bychkoviak via Bernhard, LUCENE-475 )
  • Corrected term offsets returned by ChineseTokenizer.
    (Ray Tsang via Erik Hatcher, LUCENE-324 )
  • Fixed MultiReader.undeleteAll() to correctly update numDocs.
    (Robert Kirchgessner via Doug Cutting, LUCENE-479 )
  • Race condition in IndexReader.getCurrentVersion() and isCurrent() fixed by acquiring the commit lock.
    (Luc Vanlerberghe via Yonik Seeley, LUCENE-481 )
  • IndexWriter.setMaxBufferedDocs(1) did not have the expected effect; this has now been fixed.
    (Daniel Naber)
  • Fixed QueryParser for date ranges in local form such as "[1/16/2000 TO 1/18/2000]": such a query did not include the documents from the last day (1/18/2000).
    (Daniel Naber)
  • Removed a sorting constraint that threw an exception if there were not yet any values for the sort field.
    (Yonik Seeley, LUCENE-374 )
  • Optimizations (11)
  • Disk usage (peak requirements during indexing and optimization) in case of compound file format has been improved.
    (Bernhard, Dmitry, and Christoph)
  • Optimize the performance of certain uses of BooleanScorer, TermScorer and IndexSearcher. In particular, a BooleanQuery composed of TermQuery, with not all terms required, that returns a TopDocs (e.g., through a Hits with no Sort specified) runs much faster.
    (cutting)
  • Removed synchronization from reading of term vectors with an IndexReader ( Patch #30736 [LUCENE-265] ).
    (Bernhard Messer via Christoph)
  • Optimize term-dictionary lookup to allocate far fewer terms when scanning for the matching term. This speeds searches involving low-frequency terms, where the cost of dictionary lookup can be significant.
    (cutting)
  • Optimize fuzzy queries so the standard fuzzy queries with a prefix of 0 now run 20-50% faster ( Patch #31882 [LUCENE-296] ).
    (Jonathan Hager via Daniel Naber)
  • A version of BooleanScorer (BooleanScorer2) has been added that delivers documents in increasing order and implements skipTo(). For queries with required or forbidden clauses it may be faster than the old BooleanScorer; for BooleanQueries consisting only of optional clauses it is probably slower. The new BooleanScorer is now the default.
    ( Patch 31785 [LUCENE-294] by Paul Elschot via Christoph)
  • Use uncached access to norms when merging to reduce RAM usage. ( Bug #32847 [LUCENE-326] ).
    (Doug Cutting)
  • Don't read term index when random-access is not required. This reduces time to open IndexReaders and they use less memory when random access is not required, e.g., when merging segments. The term index is now read into memory lazily at the first random-access.
    (Doug Cutting)
  • Optimize IndexWriter.addIndexes(Directory[]) when the number of added indexes is larger than mergeFactor. Previously this could result in quadratic performance. Now performance is n log(n).
    (Doug Cutting)
  • Speed up the creation of TermEnum for indices with multiple segments and deleted documents, and thus speed up PrefixQuery, RangeQuery, WildcardQuery, FuzzyQuery, RangeFilter, DateFilter, and sorting the first time on a field.
    (Yonik Seeley, LUCENE-454 )
  • Optimized and generalized 32 bit floating point to byte (custom 8 bit floating point) conversions. Increased the speed of Similarity.encodeNorm() anywhere from 10% to 250%, depending on the JVM.
    (Yonik Seeley, LUCENE-467 )
  • Infrastructure (2)
  • Lucene's source code repository has converted from CVS to Subversion. The new repository is at http://svn.apache.org/repos/asf/lucene/java/trunk
  • Lucene's issue tracker has migrated from Bugzilla to JIRA. Lucene's JIRA is at http://issues.apache.org/jira/browse/LUCENE. The old issues are still available at http://issues.apache.org/bugzilla/show_bug.cgi?id=xxxx (replace xxxx with the bug number).
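The LUCENE-467 entry above refers to a custom 8-bit floating-point format for norms. The following is a minimal sketch of that idea, modeled on the approach of Lucene's SmallFloat class rather than copied from any particular release: the float's raw IEEE-754 bits are shifted so that only the exponent and a few leading mantissa bits survive, and an exponent offset (`zeroExp`) selects which range of magnitudes fits into one byte. The class name `SmallFloatSketch` and the parameter values (3, 15) used below are assumptions chosen for illustration.

```java
/** Illustrative 8-bit float codec; a sketch, not Lucene's actual source. */
final class SmallFloatSketch {
    private SmallFloatSketch() {}

    /** Encodes a float into one byte, truncating (rounding down) low-order bits. */
    static byte floatToByte(float f, int numMantissaBits, int zeroExp) {
        // Offset that re-bases the float's exponent onto the byte's smaller range.
        int fzero = (63 - zeroExp) << numMantissaBits;
        int bits = Float.floatToRawIntBits(f);
        // Shift off the low-order bits; what remains is sign, exponent and leading mantissa bits.
        int small = bits >> (24 - numMantissaBits);
        if (small <= fzero) {
            // Zero and negatives map to 0; positive underflow maps to the smallest non-zero code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        } else if (small >= fzero + 0x100) {
            return (byte) -1; // overflow maps to the largest code (0xFF)
        }
        return (byte) (small - fzero);
    }

    /** Decodes a byte produced by floatToByte back into a float. */
    static float byteToFloat(byte b, int numMantissaBits, int zeroExp) {
        if (b == 0) {
            return 0.0f;
        }
        // Undo the shift and the exponent offset applied by floatToByte.
        int bits = (b & 0xff) << (24 - numMantissaBits);
        bits += (63 - zeroExp) << 24;
        return Float.intBitsToFloat(bits);
    }
}
```

With these parameters, `byteToFloat(floatToByte(1.0f, 3, 15), 3, 15)` returns exactly 1.0f, while most other in-range values come back rounded down; that precision loss is the trade-off for storing each value in a single byte.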

    Release 1.4.3 [2004-12-07]

  • The JSP demo page (src/jsp/results.jsp) now properly escapes error messages which might contain user input (e.g. error messages about query parsing). If you used that page as a starting point for your own code please make sure your code also properly escapes HTML characters from user input in order to avoid so-called cross site scripting attacks.
    (Daniel Naber)
  • QueryParser changes in 1.4.2 broke the QueryParser API. Now the old API is supported again.
    (Christoph)

    Release 1.4.2 [2004-10-01]

  • Fixed bug #31241 [LUCENE-277] : Sorting could lead to incorrect results (documents missing, others duplicated) if the sort keys were not unique and there were more than 100 matches.
    (Daniel Naber)
  • Memory leak in Sort code ( bug #31240 [LUCENE-276] ) eliminated.
    (Rafal Krzewski via Christoph and Daniel)
  • FuzzyQuery now takes an additional parameter that specifies the minimum similarity that is required for a term to match the query. The QueryParser syntax for this is term~x, where x is a floating point number >= 0 and < 1 (a bigger number means that a higher similarity is required). Furthermore, a prefix can be specified for FuzzyQuerys so that only terms starting with this prefix are considered similar. This can speed up FuzzyQuery greatly.
    (Daniel Naber, Christoph Goller)
  • PhraseQuery and PhrasePrefixQuery now allow the explicit specification of relative positions.
    (Christoph Goller)
  • QueryParser changes: fixed ArrayIndexOutOfBoundsExceptions (patch #9110 [LUCENE-35]); removed some unused method parameters; added the ability to specify a minimum similarity for FuzzyQuery.
    (Christoph Goller)
  • IndexSearcher optimization: a new ScoreDoc is no longer allocated for every non-zero-scoring hit. This makes 'OR' queries that contain common terms substantially faster.
    (cutting)
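The FuzzyQuery entry above describes two knobs: a minimum similarity and a required prefix. The following is a minimal sketch of how such a check could work, assuming a Levenshtein edit distance normalized as 1 - distance / (length of the shorter string) and an inclusive threshold. That normalization approximates, but may not exactly match, Lucene's own FuzzyTermEnum, and the class `FuzzySketch` is hypothetical.

```java
/** Hypothetical fuzzy-match check; not Lucene's FuzzyTermEnum. */
final class FuzzySketch {
    private FuzzySketch() {}

    /** Classic two-row dynamic-programming Levenshtein (edit) distance. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 1 - distance / length of the shorter string. */
    static float similarity(String a, String b) {
        int minLen = Math.min(a.length(), b.length());
        if (minLen == 0) {
            return a.equals(b) ? 1.0f : 0.0f;
        }
        return 1.0f - (float) editDistance(a, b) / minLen;
    }

    /** A term matches if it shares the required prefix and is similar enough. */
    static boolean matches(String query, String term, float minSimilarity, int prefixLength) {
        int p = Math.min(prefixLength, query.length());
        // Cheap prefix filter first: terms without the prefix are never scored.
        if (!term.startsWith(query.substring(0, p))) {
            return false;
        }
        return similarity(query, term) >= minSimilarity; // inclusive threshold is an assumption
    }
}
```

The prefix test is what makes a non-zero prefix cheap: every candidate term that fails it is rejected before any edit distance is computed.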

    Release 1.4.1 [2004-08-02]

  • Fixed a performance bug in hit sorting code, where values were not correctly cached.
    (Aviran via cutting)
  • Fixed errors in file format documentation.
    (Daniel Naber)

    Release 1.4 final [2004-07-01]

  • Added "an" to the list of stop words in StopAnalyzer, to complement the existing "a" there. Fix for bug 28960 [LUCENE-132] (http://issues.apache.org/bugzilla/show_bug.cgi?id=28960).
    (Otis)
  • Added new class FieldCache to manage in-memory caches of field term values.
    (Tim Jones)
  • Added overloaded getFieldQuery method to QueryParser which accepts the slop factor specified for the phrase (or the default phrase slop for the QueryParser instance). This allows overriding methods to replace a PhraseQuery with a SpanNearQuery instead, keeping the proper slop factor.
    (Erik Hatcher)
  • Changed the encoding of GermanAnalyzer.java and GermanStemmer.java to UTF-8 and changed the build encoding to UTF-8, so that the changed files compile.
    (Otis Gospodnetic)
  • Removed synchronization from term lookup under IndexReader methods termFreq(), termDocs() or termPositions() to improve multi-threaded performance.
    (cutting)
  • Fix a bug where obsolete segment files were not deleted on Win32.

    Release 1.4 RC3 [2004-05-11]

  • Fixed several search bugs introduced by the skipTo() changes in release 1.4RC1. The index file format was changed a bit, so collections must be re-indexed to take advantage of the skipTo() optimizations.
    (Christoph Goller)
  • Added new Document methods, removeField() and removeFields().
    (Christoph Goller)
  • Fixed inconsistencies with index closing. Indexes and directories are now only closed automatically by Lucene when Lucene opened them automatically.
    (Christoph Goller)
  • Added new class: FilteredQuery.
    (Tim Jones)
  • Added a new SortField type for custom comparators.
    (Tim Jones)
  • The "Lock obtain timed out" message now displays the full path to the lock file.
    (Daniel Naber via Erik)
  • Fixed a bug in SpanNearQuery when ordered.
    (Paul Elschot via cutting)
  • Fixed so that FSDirectory's locks still work when the java.io.tmpdir system property is null.
    (cutting)
  • Changed FilteredTermEnum's constructor to take no parameters, as the parameters were ignored anyway.
    ( bug #28858 [LUCENE-224] )

    Release 1.4 RC2 [2004-03-30]

  • GermanAnalyzer now throws an exception if the stopword file cannot be found ( bug #27987 [LUCENE-203] ). It now also uses LowerCaseFilter ( bug #18410 [LUCENE-87] ).
    (Daniel Naber via Otis, Erik)
  • Fixed a few bugs in the file format documentation.
    (cutting)

    Release 1.4 RC1 [2004-03-29]

  • Changed the format of the .tis file so that: (a) it has a format version number, which makes it easier to change file formats back-compatibly in the future; (b) the term count is now stored as a long (this was the one aspect of Lucene's file formats that limited index size); (c) a few internal index parameters are now stored in the index, so that they can (in theory) be changed from index to index, although there is not yet an API to do so. These changes are back-compatible: the new code can read old indexes, but old code will not be able to read new indexes.
    (cutting)

  • Added an optimized implementation of TermDocs.skipTo(). A skip table is now stored for each term in the .frq file. This only adds a percent or two to overall index size, but can substantially speed up many searches.
    (cutting)
  • Restructured the Scorer API and all Scorer implementations to take advantage of an optimized TermDocs.skipTo() implementation. In particular, PhraseQuerys and conjunctive BooleanQuerys are faster when one clause has substantially fewer matches than the others. (A conjunctive BooleanQuery is a BooleanQuery where all clauses are required.)
    (cutting)
  • Added new class ParallelMultiSearcher. Combined with RemoteSearchable this makes it easy to implement distributed search systems.
    (Jean-Francois Halleux via cutting)
  • Added support for hit sorting. Results may now be sorted by any indexed field. For details see the javadoc for Searcher#search(Query, Sort).
    (Tim Jones via Cutting)
  • Changed FSDirectory to auto-create a full directory tree that it needs by using mkdirs() instead of mkdir().
    (Mladen Turk via Otis)
  • Added a new span-based query API. This implements, among other things, nested phrases. See javadocs for details.
    (Doug Cutting)
  • Added new method Query.getSimilarity(Searcher), and changed scorers to use it. This permits one to subclass a Query class so that it can specify its own Similarity implementation, perhaps one that delegates through that of the Searcher.
    (Julien Nioche via Cutting)
  • Added MultiReader, an IndexReader that combines multiple other IndexReaders.
    (Cutting)
  • Added support for term vectors. See Field#isTermVectorStored().
    (Grant Ingersoll, Cutting & Dmitry)
  • Fixed the old bug with escaping of special characters in query strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
    (Jean-Francois Halleux via Otis)
  • Added support for overriding the following default values using system properties: commit lock timeout, maxFieldLength, maxMergeDocs, mergeFactor, minMergeDocs, and write lock timeout.
    (Otis)
  • Changed QueryParser.jj to allow '-' and '+' within tokens: http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
    (Morus Walter via Otis)
  • Changed so that the compound index format is used by default. This makes indexing a bit slower, but vastly reduces the chances of file handle problems.
    (Cutting)
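The skipTo() entries above explain why conjunctive BooleanQuerys get faster when one clause is rare: the scorer can leapfrog, repeatedly skipping the list that is behind to the document the other list is on, instead of stepping through every posting. The following is a minimal two-clause sketch over in-memory sorted doc-id arrays. `ToyPostings` and `ConjunctionSketch` are hypothetical classes, and a binary search stands in for the on-disk skip table; neither mirrors Lucene's actual TermDocs or Scorer code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy postings cursor over strictly increasing doc ids; hypothetical, not Lucene's TermDocs. */
final class ToyPostings {
    private final int[] docs;
    private int pos = -1; // -1 means "before the first posting"

    ToyPostings(int... docs) {
        this.docs = docs;
    }

    int doc() {
        return docs[pos];
    }

    boolean next() {
        return ++pos < docs.length;
    }

    /** Moves forward (never backward) to the first doc >= target; binary search stands in for a skip table. */
    boolean skipTo(int target) {
        int from = Math.max(pos, 0);
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        pos = (idx >= 0) ? idx : -idx - 1; // insertion point when target is absent
        return pos < docs.length;
    }
}

final class ConjunctionSketch {
    private ConjunctionSketch() {}

    /** Returns the docs present in both lists, leapfrogging with skipTo(). */
    static List<Integer> intersect(ToyPostings a, ToyPostings b) {
        List<Integer> out = new ArrayList<>();
        if (!a.next() || !b.next()) {
            return out;
        }
        while (true) {
            if (a.doc() == b.doc()) {
                out.add(a.doc());
                if (!a.next() || !b.next()) {
                    return out;
                }
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) {
                    return out; // a is exhausted
                }
            } else if (!b.skipTo(a.doc())) {
                return out; // b is exhausted
            }
        }
    }
}
```

When one list is short, most of the work is a handful of skipTo() calls on the long list, which is the effect the entries above describe.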

    Release 1.3 final [2003-12-26]

  • QueryParser now catches BooleanQuery$TooManyClauses and throws a ParseException instead.
    (Erik Hatcher)
  • Fixed a NullPointerException in Query.explain().
    (Doug Cutting)
  • Added a new method IndexReader.setNorm(), that permits one to alter the boosting of fields after an index is created.
  • Distinguish between the final position and length when indexing a field. The length is now defined as the total number of tokens, instead of the final position, as it was previously. Length is used for score normalization (Similarity.lengthNorm()) and for controlling memory usage (IndexWriter.maxFieldLength). In both of these cases, the total number of tokens is a better value to use than the final token position. Position is used in phrase searching (see PhraseQuery and Token.setPositionIncrement()).
  • Fix StandardTokenizer's handling of CJK characters (Chinese, Japanese and Korean ideograms). Previously contiguous sequences were combined in a single token, which is not very useful. Now each ideogram generates a separate token, which is more useful.

    Release 1.3 RC3 [2003-11-25]

  • Added minMergeDocs to IndexWriter. Raising it speeds up indexing without altering the number of files, at the cost of using more memory.
    (Julien Nioche via Otis)
  • Fixed bug #24786 [LUCENE-162] in query rewriting.
    (bschneeman via Cutting)
  • Fixed bug #16952 [LUCENE-85]: the demo HTML parser now skips comments in JavaScript.
    (Christoph Goller)
  • Fixed bug #19253 [LUCENE-105]: the demo HTML parser now adds whitespace to its output as needed.
    (Daniel Naber via Christoph Goller)
  • Fixed bug #24301 [LUCENE-159]: long titles no longer hang the demo HTML parser.
    (Christoph Goller)
  • Fixed bug #23534 [LUCENE-138]: replaced the use of the segments file's timestamp with an index version number stored in the segments file. This resolves problems when running on file systems with low-resolution timestamps, e.g., HFS under Mac OS X.
    (Christoph Goller)
  • Fix QueryParser so that TokenMgrError is not thrown, only ParseException.
    (Erik Hatcher)
  • Fix some bugs introduced by change 11 of RC2.
    (Christoph Goller)
  • Fixed a problem compiling TestRussianStem.
    (Christoph Goller)
  • Cleaned up some build stuff.
    (Erik Hatcher)

    Release 1.3 RC2 [2003-10-22]

  • Added getFieldNames(boolean) to IndexReader, SegmentReader, and SegmentsReader.
    (Julien Nioche via otis)
  • Changed file locking to place lock files in System.getProperty("java.io.tmpdir"), where all users are permitted to write files. This way folks can open and correctly lock indexes which are read-only to them.
  • IndexWriter: added a new method, addDocument(Document, Analyzer), permitting one to easily use different analyzers for different documents in the same index.
  • Minor enhancements to FuzzyTermEnum.
    (Christoph Goller via Otis)
  • PriorityQueue: added insert(Object) method and adjusted IndexSearcher and MultiIndexSearcher to use it.
    (Christoph Goller via Otis)
  • Fixed a bug in IndexWriter that returned incorrect docCount().
    (Christoph Goller via Otis)
  • Fixed SegmentsReader to eliminate the confusing and slightly different behaviour of TermEnum when dealing with an enumeration of all terms, versus an enumeration starting from a specific term. This patch also fixes incorrect term document frequencies when the same term is present in multiple segments.
    (Christoph Goller via Otis)
  • Added CachingWrapperFilter and PerFieldAnalyzerWrapper.
    (Erik Hatcher)
  • Added support for the new "compound file" index format
    (Dmitry Serebrennikov)
  • Added Locale setting to QueryParser, for use by date range parsing.
  • Changed IndexReader so that it can be subclassed by classes outside of its package. Previously it had package-private abstract methods. Also modified the index merging code so that it can work on an arbitrary IndexReader implementation, and added a new method, IndexWriter.addIndexes(IndexReader[]), to take advantage of this.
    (cutting)
  • Added a limit to the number of clauses that may be added to a BooleanQuery. The default limit is 1024 clauses. This should prevent most OutOfMemoryErrors caused by prefix, wildcard and fuzzy queries that run amok.
    (cutting)
  • Add new method: IndexReader.undeleteAll(). This undeletes all deleted documents which still remain in the index.
    (cutting)
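The PriorityQueue.insert(Object) entry above relates to the usual pattern for collecting the top N hits: keep a bounded min-heap whose root is the weakest hit retained so far, and admit a new hit only if it beats that root. The following is a minimal sketch of that pattern using java.util.PriorityQueue instead of Lucene's own PriorityQueue class; `TopNSketch` and its nested `Hit` type are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Bounded top-N collection with a min-heap; a sketch, not Lucene's hit collector. */
final class TopNSketch {
    private TopNSketch() {}

    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Returns up to n hits with the highest positive scores, best first; scores[i] is doc i's score. */
    static List<Hit> topN(float[] scores, int n) {
        List<Hit> out = new ArrayList<>();
        if (n <= 0) {
            return out;
        }
        // Min-heap ordered by score: peek() is the weakest hit currently kept.
        PriorityQueue<Hit> heap = new PriorityQueue<>(n, (x, y) -> Float.compare(x.score, y.score));
        for (int doc = 0; doc < scores.length; doc++) {
            float s = scores[doc];
            if (s <= 0f) {
                continue; // skip non-matching (zero-scoring) docs
            }
            if (heap.size() < n) {
                heap.add(new Hit(doc, s));
            } else if (s > heap.peek().score) {
                heap.poll(); // evict the weakest hit to make room
                heap.add(new Hit(doc, s));
            }
        }
        out.addAll(heap);
        out.sort((x, y) -> Float.compare(y.score, x.score)); // best first
        return out;
    }
}
```

Each candidate costs at most O(log n) heap work, and a hit that cannot enter the top N is rejected after a single comparison with the root.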

    Release 1.3 RC1 [2003-03-24]

  • Fixed PriorityQueue's clear() method. Fix for bug 9454 [LUCENE-37] , http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454
    (Matthijs Bomhoff via otis)
  • Changed StandardTokenizer.jj grammar for EMAIL tokens. Fix for bug 9015 [LUCENE-34] , http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015
    (Dale Anson via otis)
  • Added the ability to disable lock creation by setting the disableLuceneLocks system property. This is useful for read-only media, such as CD-ROMs.
    (otis)
  • Added an id() method to Hits to provide access to the index-global document id, which is required for sorting options.
    (carlson)
  • Added support for new range query syntax to QueryParser.jj.
    (briangoetz)
  • Added the ability to retrieve HTML documents' META tag values to HTMLParser.jj.
    (Mark Harwood via otis)
  • Modified QueryParser to make it possible to programmatically specify the default Boolean operator (OR or AND).
    (Péter Halácsy via otis)
  • Made many search methods and classes non-final, per requests. This includes IndexWriter and IndexSearcher, among others.
    (cutting)
  • Added class RemoteSearchable, providing support for remote searching via RMI. The test class RemoteSearchableTest.java provides an example of how this can be used.
    (cutting)
  • Added PhrasePrefixQuery (and supporting MultipleTermPositions). The test class TestPhrasePrefixQuery provides the usage example.
    (Anders Nielsen via otis)
  • Changed the German stemming algorithm to ignore case while stripping. The new algorithm is faster and more often produces the same stem for nouns and verbs derived from the same word.
    (gschwarz)
  • Added support for boosting the score of documents and fields via the new methods Document.setBoost(float) and Field.setBoost(float). Note: This changes the encoding of an indexed value. Indexes should be re-created from scratch in order for search scores to be correct. With the new code and an old index, searches will yield very large scores for shorter fields, and very small scores for longer fields. Once the index is re-created, scores will be as before.
    (cutting)

  • Added new method Token.setPositionIncrement(). This permits, for the purpose of phrase searching, placing multiple terms in a single position. This is useful with stemmers that produce multiple possible stems for a word. This also permits the introduction of gaps between terms, so that terms which are adjacent in a token stream will not be matched by an exact phrase query. This makes it possible, e.g., to build an analyzer where phrases are not matched over stop words which have been removed. Finally, repeating a token with an increment of zero can also be used to boost scores of matches on that token.
    (cutting)

  • Added new Filter class, QueryFilter. This constrains search results to only match those which also match a provided query. Results are cached, so that searches after the first on the same index using this filter are very fast. This could be used, for example, with a RangeQuery on a formatted date field to implement date filtering. One could re-use a single QueryFilter that matches, e.g., only documents modified within the last week. The QueryFilter and RangeQuery would only need to be reconstructed once per day.
    (cutting)

  • Added a new IndexWriter method, getAnalyzer(). This returns the analyzer used when adding documents to this index.
    (cutting)
  • Fixed a bug with IndexReader.lastModified(). Previously, document deletion did not update it; now it does.
    (cutting)
  • Added Russian Analyzer.
    (Boris Okner via otis)
  • Added a public, extensible scoring API. For details, see the javadoc for org.apache.lucene.search.Similarity.
  • Changed the return type of Hits.id() from float to int.
    (Terry Steichen via Peter)
  • Added getFieldNames() to IndexReader and Segment(s)Reader classes.
    (Peter Mularien via otis)
  • Added getFields(String) and getValues(String) methods. Contributed by Rasik Pandey on 2002-10-09
    (Rasik Pandey via otis)
  • Revised internal search APIs. Changes include:
    a. Queries are no longer modified during a search. This makes it possible, e.g., to reuse the same query instance with multiple indexes from multiple threads.
    b. Term-expanding queries (e.g. PrefixQuery, WildcardQuery, etc.) now work correctly with MultiSearcher, fixing bugs 12619 [LUCENE-56] and 12667 [LUCENE-57].
    c. Boosting BooleanQuerys now works, and is supported by the query parser (problem reported by Lee Mallabone). Thus a query like "(+foo +bar)^2 +baz" is now supported and equivalent to "(+foo^2 +bar^2) +baz".
    d. New method: Query.rewrite(IndexReader). This permits a query to rewrite itself as an alternate, more primitive query. Most of the term-expanding query classes (PrefixQuery, WildcardQuery, etc.) are now implemented using this method.
    e. New method: Searchable.explain(Query q, int doc). This returns an Explanation instance that describes how a particular document is scored against a query. An explanation can be displayed either as plain text, with the toString() method, or as HTML, with the toHtml() method. Note that computing an explanation is as expensive as executing the query over the entire index. This is intended to be used in developing Similarity implementations and, for good performance, should not be displayed with every hit.
    f. Scorer and Weight are now public rather than package-private. It is now possible for someone to write a Scorer implementation that is not in the org.apache.lucene.search package. This is still fairly advanced programming, and I don't expect anyone to do this anytime soon, but at least now it is possible.
    g. Added public accessors to the primitive query classes (TermQuery, PhraseQuery and BooleanQuery), permitting access to their terms and clauses.
    Caution: These are extensive changes and they have not yet been tested extensively. Bug reports are appreciated.
    (cutting)

  • Added convenience RAMDirectory constructors taking File and String arguments, for easy FSDirectory to RAMDirectory conversion.
    (otis)
  • Added code for manual renaming of files in FSDirectory, since it has been reported that java.io.File's renameTo(File) method sometimes fails on Windows JVMs.
    (Matt Tucker via otis)
  • Refactored QueryParser to make it easier for people to extend it. Added the ability to automatically lower-case Wildcard terms in the QueryParser.
    (Tatu Saloranta via otis)
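The Token.setPositionIncrement() entry above says an increment of zero stacks a term on the previous position, and an increment greater than one leaves a gap that blocks exact phrase matches. The following minimal sketch turns increments into absolute positions and checks two-word adjacency. `PositionSketch`, its `Tok` type and the sample tokens (for an imagined input like "quick running the fox", with the stop word removed and a stem stacked on "running") are hypothetical; starting the running position at -1 so that the first token lands on 0 is an assumption of this sketch.

```java
import java.util.List;

/** Position increments to absolute positions; a sketch, not Lucene's indexing code. */
final class PositionSketch {
    private PositionSketch() {}

    static final class Tok {
        final String text;
        final int posInc; // 0 = same position as the previous token, >1 = leave a gap

        Tok(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }
    }

    /** Accumulates increments, starting at -1 so a first token with increment 1 lands on 0. */
    static int[] positions(List<Tok> tokens) {
        int[] out = new int[tokens.size()];
        int pos = -1;
        for (int i = 0; i < tokens.size(); i++) {
            pos += tokens.get(i).posInc;
            out[i] = pos;
        }
        return out;
    }

    /** True if some occurrence of second sits exactly one position after some occurrence of first. */
    static boolean exactPhrase(List<Tok> tokens, String first, String second) {
        int[] pos = positions(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            for (int j = 0; j < tokens.size(); j++) {
                if (tokens.get(i).text.equals(first)
                        && tokens.get(j).text.equals(second)
                        && pos[j] == pos[i] + 1) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

For the tokens quick(1), run(1), running(0), fox(2) the positions are 0, 1, 1, 3: "quick running" and "quick run" both match as phrases, while "running fox" does not, because the removed stop word left a gap at position 2.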

    Release 1.2 RC6 [2002-06-13]

  • Changed QueryParser.jj to treat "?" as a special character, allowing it to be used in wildcard terms. Also updated the TestWildcard unit test.
    (Ralf Hettesheimer via carlson)

    Release 1.2 RC5 [2002-05-14]

  • Renamed build.properties to default.properties and updated the BUILD.txt document to describe how to override the default.properties settings without having to edit the file. This brings the build process closer to Scarab's build process.
    (jon)
  • Added MultiFieldQueryParser class.
    (Kelvin Tan, via otis)
  • Updated "powered by" links.
    (otis)
  • Fixed the instructions for setting up JavaCC (bug #7017 [LUCENE-18]).
    (otis)
  • FSDirectory now throws an exception if it cannot create a directory (bug #6914 [LUCENE-16]).
    (Eugene Gluzberg via otis)
  • Update MultiSearcher, MultiFieldParse, Constants, DateFilter, LowerCaseTokenizer javadoc
    (otis)
  • Added fix to avoid NullPointerException in results.jsp
    (Mark Hayes via otis)
  • Changed wildcard search so that "*" matches zero or more characters instead of one or more.
    (Lee Mallobone, via otis)
  • Fixed an offset error in GermanStemFilter (bug #7412 [LUCENE-23]).
    (Rodrigo Reyes, via otis)
  • Added unit tests for wildcard search and DateFilter
    (otis)
  • Allow co-existence of indexed and non-indexed fields with the same name
    (cutting/casper, via otis)
  • Add escape character to query parser.
    (briangoetz)
  • Applied a patch that ensures that searches that use DateFilter don't throw an exception when no matches are found.
    (David Smiley, via otis)
  • Fixed bugs in DateFilter and wildcardquery unit tests.
    (cutting, otis, carlson)
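The wildcard entry above, together with the LUCENE-306 entry earlier in this file (which stops "cat" from matching "ca??"), pins down the intended semantics: "*" matches zero or more characters and "?" matches exactly one. The following minimal recursive matcher illustrates those semantics; `WildcardSketch` is hypothetical and is not how Lucene's WildcardQuery enumerates terms. It backtracks over every possible length for each "*", so its worst case is exponential, which is fine for illustration but not for production use.

```java
/** Hypothetical wildcard matcher illustrating '*' and '?' semantics; not Lucene's implementation. */
final class WildcardSketch {
    private WildcardSketch() {}

    /** '*' matches zero or more characters; '?' matches exactly one character. */
    static boolean matches(String pattern, String text) {
        return match(pattern, 0, text, 0);
    }

    private static boolean match(String p, int pi, String t, int ti) {
        if (pi == p.length()) {
            return ti == t.length(); // pattern used up: text must be used up too
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // Try every possible length for the '*', starting with zero characters.
            for (int k = ti; k <= t.length(); k++) {
                if (match(p, pi + 1, t, k)) {
                    return true;
                }
            }
            return false;
        }
        if (ti == t.length()) {
            return false; // text exhausted, but a '?' or literal still needs a character
        }
        if (c == '?' || c == t.charAt(ti)) {
            return match(p, pi + 1, t, ti + 1);
        }
        return false;
    }
}
```

Under these rules "ca??" requires exactly four characters, so it matches "cats" but not "cat", and "ca*" matches "ca" itself.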

    Release 1.2 RC4 [2002-02-14]

  • Updated contributions section of website. Add XML Document #3 implementation to Document Section. Also added Term Highlighting to Misc Section.
    (carlson)
  • Fixed NullPointerException for phrase searches containing unindexed terms, introduced in 1.2RC3.
    (cutting)
  • Changed document deletion code to obtain the index write lock, enforcing the fact that document addition and deletion cannot be performed concurrently.
    (cutting)
  • Various documentation cleanups.
    (otis, acoliver)
  • Updated "powered by" links.
    (cutting, jon)
  • Fixed a bug in the GermanStemmer.
    (Bernhard Messer, via otis)
  • Changed Term and Query to implement Serializable.
    (scottganyo)
  • Fixed to never delete indexes added with IndexWriter.addIndexes().
    (cutting)
  • Upgraded to JUnit 3.7.
    (otis)

    Release 1.2 RC3 [2002-01-27]

  • IndexWriter: fixed a bug where adding an optimized index to an empty index failed. This was encountered using addIndexes to copy a RAMDirectory index to an FSDirectory.
  • RAMDirectory: fixed a bug where RAMInputStream could not read across more than a single buffer boundary.
  • Fix query parser so it accepts queries with Unicode characters.
    (briangoetz)
  • Fix query parser so that PrefixQuery is used in preference to WildcardQuery when there's only an asterisk at the end of the term. Previously PrefixQuery would never be used.
  • Fix tests so they compile; fix ant file so it compiles tests properly. Added test cases for Analyzers and PriorityQueue.
  • Updated demos, added Getting Started documentation.
    (acoliver)
  • Added 'contributions' section to website & docs.
    (carlson)
  • Removed JavaCC from source distribution for copyright reasons. Folks must now download this separately from metamata in order to compile Lucene.
    (cutting)
  • Substantially improved the performance of DateFilter by adding the ability to reuse TermDocs objects.
    (cutting)
  • Added IndexReader methods: public static boolean indexExists(String directory); public static boolean indexExists(File directory); public static boolean indexExists(Directory directory); public static boolean isLocked(Directory directory); public static void unlock(Directory directory);
    (cutting, otis)
  • Fixed bugs in GermanAnalyzer
    (gschwarz)

    Release 1.2 RC2 [2001-10-19]

  • added sources to distribution
  • removed broken build scripts and libraries from distribution
  • SegmentsReader: fixed potential race condition
  • FSDirectory: fixed so that getDirectory(xxx,true) correctly erases the directory contents, even when the directory has already been accessed in this JVM.
  • RangeQuery: fixed an issue where an inclusive range query whose specified upper term does not exist in the index would also include the nearest term above it.
  • SegmentTermEnum: Fix NullPointerException in clone() method when the Term is null.
  • JDK 1.1 compatibility fix: disabled lock files for JDK 1.1, since they rely on a feature added in JDK 1.2.

    Release 1.2 RC1 [2001-10-02]

  • first Apache release
  • packages renamed from com.lucene to org.apache.lucene
  • license switched from LGPL to Apache
  • ant-only build -- no more makefiles
  • addition of lock files--now fully thread & process safe
  • addition of German stemmer
  • MultiSearcher now supports low-level search API
  • added RangeQuery, for term-range searching
  • Analyzers can choose tokenizer based on field name
  • misc bug fixes.

    Release 1.01b [2001-06-02]

  • last Sourceforge release
  • a few bug fixes
  • new Query Parser
  • new prefix query (search for "foo*" matches "food")

    Release 1.0 [2000-10-04]

  • This release fixes a few serious bugs and also includes some performance optimizations, a stemmer, and a few other minor enhancements.

    Release 0.04 [2000-04-19]

  • Lucene now includes a grammar-based tokenizer, StandardTokenizer.
  • The only tokenizer included in the previous release (LetterTokenizer) identified terms consisting entirely of alphabetic characters. The new tokenizer uses a regular-expression grammar to identify more complex classes of terms, including numbers, acronyms, email addresses, etc.
  • StandardTokenizer serves two purposes:
    1. It is a much better, general-purpose tokenizer for use by applications as is. The easiest way for applications to start using StandardTokenizer is to use StandardAnalyzer.
    2. It provides a good example of grammar-based tokenization. If an application has special tokenization requirements, it can implement a custom tokenizer by copying the directory containing the new tokenizer into the application and modifying it accordingly.

    Release 0.01 [2000-03-30]

  • First open source release.
  • The code has been re-organized into a new package and directory structure for this release. It builds OK, but has not been tested.