Improved overall Apache Lucene searching performance

12/30/2023

The Apache Lucene project, a high-performance, full-featured text search engine library written entirely in Java, released version 2.3 today. InfoQ spoke with committer and Project Management Committee (PMC) member Grant Ingersoll to learn more about this release and the future plans for Lucene.

Ingersoll indicated that the largest change in this release is a new indexing algorithm, which uses new in-memory models to achieve large speed improvements. According to Ingersoll, simply switching the existing Lucene 2.2 JAR for a Lucene 2.3 JAR resulted in speed-ups of 500% in indexing performance in several of the tests performed. Other changes in this release include:

Easier IndexWriter tuning - The setMaxBufferedDocs method has been supplanted by the more intuitive setRAMBufferSizeMB method.

IndexReader reopening - Reopening an IndexReader to capture the latest changes in an index is now much faster with the new reopen() method, which loads only those index segments which have changed rather than reloading the entire index.

Object pooling - Document, Field and Token instances can now be reused during indexing and analysis, which both speeds up analysis and reduces the number of allocations during indexing.

Improved index management - Long pauses which were occasionally seen during indexing due to merging of internal index files have been eliminated, and other approaches to managing the indexing process are now easy to implement.

In addition, 2.3 is intended to be a drop-in replacement for 2.2, with no recompilation required. A comprehensive changelog is also available.

Ingersoll also discussed the future plans for Lucene, saying that the next release would be 2.9. The 2.9 release will be relatively minor, with items being marked as deprecated and other clean-up being performed in preparation for Lucene 3.0. The 3.0 version will be a major release which will involve moving the codebase to JDK 5 as the minimum supported version; the other major features of 3.0 are yet to be determined.

The Lucene community as a whole was also discussed, with Ingersoll indicating that Lucene and Solr have a strong integration, and that Nutch, Tika and Hadoop also enjoy a fair amount of intercommunication. Ingersoll also described a new project named Mahout, which he is in the process of launching. It will be a separate project, but may be beneficial to Lucene users. There are currently some patches in JIRA for Lucene that implement machine learning (ML) algorithms.
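To make the idea behind the IndexReader reopen() change more concrete, here is a rough, library-free sketch of segment-level reopening. It is a toy model and not Lucene's actual code: the class names SegmentReopenSketch, SegmentReader and ToyIndexReader, the version numbers, and the load counter are all assumptions made up for illustration. The only point it tries to show is that a reopen can keep the segments that have not changed and load just the new or modified ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SegmentReopenSketch {

    /** Hypothetical stand-in for one opened index segment. */
    static final class SegmentReader {
        static int loads = 0;   // counts simulated (expensive) segment loads
        final String name;
        final long version;

        SegmentReader(String name, long version) {
            this.name = name;
            this.version = version;
            loads++;            // pretend we read the segment from disk here
        }
    }

    /** Hypothetical reader holding one SegmentReader per segment name. */
    static final class ToyIndexReader {
        final Map<String, SegmentReader> segments = new LinkedHashMap<>();

        /** Opens every segment listed in the (simulated) on-disk state. */
        static ToyIndexReader open(Map<String, Long> onDisk) {
            ToyIndexReader reader = new ToyIndexReader();
            for (Map.Entry<String, Long> e : onDisk.entrySet()) {
                reader.segments.put(e.getKey(), new SegmentReader(e.getKey(), e.getValue()));
            }
            return reader;
        }

        /**
         * Returns this reader if nothing changed; otherwise returns a new
         * reader that shares unchanged segments and loads only the others.
         */
        ToyIndexReader reopen(Map<String, Long> onDisk) {
            boolean changed = !onDisk.keySet().equals(segments.keySet());
            ToyIndexReader fresh = new ToyIndexReader();
            for (Map.Entry<String, Long> e : onDisk.entrySet()) {
                SegmentReader old = segments.get(e.getKey());
                if (old != null && old.version == e.getValue()) {
                    fresh.segments.put(e.getKey(), old);   // reuse, no load
                } else {
                    fresh.segments.put(e.getKey(), new SegmentReader(e.getKey(), e.getValue()));
                    changed = true;
                }
            }
            return changed ? fresh : this;
        }
    }

    public static void main(String[] args) {
        Map<String, Long> disk = new LinkedHashMap<>();
        disk.put("_0", 1L);
        disk.put("_1", 1L);
        disk.put("_2", 1L);

        ToyIndexReader reader = ToyIndexReader.open(disk);
        System.out.println("loads after open:   " + SegmentReader.loads);   // 3

        disk.put("_3", 1L);   // a new segment appears after more documents are indexed
        ToyIndexReader reopened = reader.reopen(disk);
        System.out.println("loads after reopen: " + SegmentReader.loads);   // 4, only _3 was loaded
        System.out.println("new reader returned: " + (reopened != reader)); // true
    }
}
```

In this sketch a full open of the four segments would cost four loads, while the reopen costs one. Returning the same reader when nothing has changed is a design choice of the sketch that lets the caller detect a no-op reopen with a simple reference comparison.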


I'm James. This is my year of travel.
