Answer a question

As I understand it, when a document is updated or deleted in Elasticsearch, the existing segment is not modified in place. Instead, the change is written to a newly created segment.

Later, the segments are merged in the background on a schedule.

I understand that it works this way because doing the work immediately would be expensive.

But I don't know the exact reason why segments are immutable and why they are not merged immediately.

I searched the documentation but could not find the exact reason. If anyone knows, please comment.

Thank you.

Answers

Keeping segments immutable provides several benefits, such as:

  1. Immutable segments are easy to use in a multi-threaded environment. Because the content cannot change, you don't have to worry about shared state, race conditions, and the other complexity that comes with mutable content.
  2. Immutable segments can be cached effectively, whereas caching a fast-changing dataset defeats the purpose of caching.
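
To make the idea concrete, here is a toy model in Python. It is only a sketch, not Lucene's actual code: `Segment`, `ToyIndex`, and all method names are made up for illustration. It assumes the general approach in which an update or delete only marks the old copy as deleted and writes new data to a new segment, while a later merge copies the live documents into a fresh segment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Segment:
    """An immutable batch of (doc_id, text) pairs. It is never edited."""
    docs: Tuple[Tuple[str, str], ...]


@dataclass
class ToyIndex:
    """Hypothetical index: a list of immutable segments plus delete markers."""
    segments: List[Segment] = field(default_factory=list)
    # Deletions live beside the segments, not inside them:
    # (segment position, position of the doc within that segment).
    deleted: Set[Tuple[int, int]] = field(default_factory=set)

    def _find_live(self, doc_id: str) -> Optional[Tuple[int, int]]:
        for s, seg in enumerate(self.segments):
            for d, (did, _) in enumerate(seg.docs):
                if did == doc_id and (s, d) not in self.deleted:
                    return (s, d)
        return None

    def index(self, doc_id: str, text: str) -> None:
        """Add or update: mark any old copy deleted, then write a new segment."""
        old = self._find_live(doc_id)
        if old is not None:
            self.deleted.add(old)
        self.segments.append(Segment(((doc_id, text),)))

    def delete(self, doc_id: str) -> None:
        """Delete: only record a marker; the segment itself is untouched."""
        old = self._find_live(doc_id)
        if old is not None:
            self.deleted.add(old)

    def get(self, doc_id: str) -> Optional[str]:
        loc = self._find_live(doc_id)
        if loc is None:
            return None
        s, d = loc
        return self.segments[s].docs[d][1]

    def merge(self) -> None:
        """Copy the live docs into one new segment and drop the old ones."""
        live = tuple(
            doc
            for s, seg in enumerate(self.segments)
            for d, doc in enumerate(seg.docs)
            if (s, d) not in self.deleted
        )
        self.segments = [Segment(live)] if live else []
        self.deleted = set()
```

In this sketch, `index` and `delete` stay cheap because they never rewrite existing data, and readers can scan a `Segment` without locks because nothing can change underneath them. The expensive part, rewriting data, is concentrated in `merge`, which is why it makes sense to run it occasionally in the background rather than on every change.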

See the following passage from the official Elasticsearch docs on why Lucene segments are cache friendly:

Lucene is designed to leverage the underlying OS for caching in-memory data structures. Lucene segments are stored in individual files. Because segments are immutable, these files never change. This makes them very cache friendly, and the underlying OS will happily keep hot segments resident in memory for faster access. These segments include both the inverted index (for fulltext search) and doc values (for aggregations).
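
The same point can be illustrated with a small in-process cache. This is only an analogy for what the OS page cache does, not how Lucene or the OS is implemented. `SegmentCache`, `fake_disk`, and the file names are all hypothetical. Because a segment file never changes once written, a cached copy can never go stale, so the cache needs no invalidation logic at all.

```python
from typing import Callable, Dict


class SegmentCache:
    """Caches segment contents by file name, with no invalidation logic."""

    def __init__(self, load: Callable[[str], bytes]) -> None:
        self._load = load  # hypothetical function that reads a segment from disk
        self._cached: Dict[str, bytes] = {}
        self.disk_reads = 0

    def read(self, name: str) -> bytes:
        if name not in self._cached:
            self.disk_reads += 1
            self._cached[name] = self._load(name)
        return self._cached[name]


# Stand-in for files on disk; the names are made up for illustration.
fake_disk = {"_0.seg": b"inverted index", "_1.seg": b"doc values"}
cache = SegmentCache(lambda name: fake_disk[name])

cache.read("_0.seg")
cache.read("_0.seg")  # served from memory, no second disk read
```

If segments were mutable, every `read` would first have to check whether the file had changed since it was cached, and hot entries would be thrown away on every update.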

Also see the benefits of immutable data in general for more details.
