what are bw trees, simplified summary from microsoft research paper

Azure DocumentDB utilizes Bw trees for indexing documents and that too in a schema-agnostic manner. While Azure DocumentDB has its own white paper on Schema-Agnostic Indexing in Azure DocumentDB, as a first step wanted to understand Bw tree and how they impact performance of DocumentDB.

REFERENCE: A Bw – tree for New Hardware Platforms

  • SUMMARY: Bw tree can be summarized as B+ tree with mapping table that virtualizes both location of size of physical page. This mapping table looks similar to Page Table structure used by Operating Systems to map virtual memory to a physical memory (RAM). This decoupling enables latch free data and log structured storage making Bw tree highly performant.
    • In multi core server system, concurrency of thread execution is key for higher throughput. But with higher concurrency system level synchronization mechanisms like latches (that protect physical consistency of data in memory) tend to become blocking factor thus impacting scalability and throughput. When thread wait for an object, processor preempts that thread to be scheduled for later that increasing context switch again.
    • Additionally in multi core system, caches are shared across multiple logical processors. Updating data in place causes “Cache invalidation” that may require CPU cycles to fetch fresh data into cache. This results in context switch of threads and processors spend more time doing context switch than performing actual work.

Bw trees that enable a latch free access to data structure and performs delta (only data that changed) updates to minimize context switches and thus increase scalability and throughput.

Bw – Tree Architecture:


A Bw Tree layers include

  • Bw Tree Layer that is on top of Cache Layer and provides access methods (search / update) for underlying B+ Tree.
  • Cache Layer: When operations occur at Bw Tree layer instead of performing actions directly on physical pages, mapping table in cache layer is used. Mapping table is a map between logical pages (PIDs) and Physical Pages (either on SSD / Memory).
  • Flash Layer or Storage Layer implements Log Structured Storage enabling “Delta Updates”

Update: 10th Feb 2017 Let us know dive & understand deeper into each of these individual component.  We are at Section II Bw – Tree Architecture of Microsoft Research paper afore mentioned…

Mapping table as we understand is key to Bw Index which sits atop a physical B-Tree structure.

Caching Layer sits between Bw Tree Layer and Storage Layer. It maintains a Mapping Table. Mapping table maps logical pages to physical pages. Logical pages are identified by unique Page identifiers (PID) and each PID maps to a Physical Page. Physical Pages may be stored either In Memory or persisted on durable storage like SSDs / HDD. If physical page in SSD, mapping table holds offset of physical page and if physical page in Memory, address pointers are stored in mapping table. Thus objective of mapping table is to decouple or loosely couple logical pages in Bw layer to physical pages (irrespective of storage) . With this decouple approach, for Bw Tree upper layers it does not matter where physical page is as they access only logical pages (structures) that are stored in mapping table.

In addition to decoupling links between logical to physical pages, mapping table also holds inter node links either at different levels (child / parent) or same level (sibling). If nodes are linked at logical layer, there is no requirement to maintain inter node links at physical level. For example if we need to traverse a B Tree to seek a particular record, node traversal can be done at logical layer and on extracting leaf node record in mapping table, refer its physical location and extract data from storage. Additionally if any modifications to Tree structures (due to CRUD – R) operations, physical inter node links are not required to be maintained, only logical layer links need to be maintained.