Sharding

Horizontal partitioning, where different data are stored on different nodes.

Sharding is not the same as partitioning. Partition is always part of a single database instance. Sharding happens on multiple database instances.

Strategies

Three objectives of sharding are

uniform data distribution
balanced workload (read and write)
respect physical location These usually contradict each other and preferences change over time.

Mapping structure

Shard of an aggregate is stored in a structure that needs to be maintained. Usually some centralized index structures.

General rules

Shard is determined by rule, which should be faster. Doesn’t require to be stored.

Hash sharding

Compute hash from a set of columns. Cons - adding new node requires recalculating all hashes (solution lies in Consistent hashing.

Range based sharding

Each shard hold a range of values. e.g.

1-20, 21-40, 41-60, …
A-C, D-F, G-I, …

Directory-based sharding

Dedicated server stores information about each shard. Upon requesting data, node asks this server which shard it lies in.

Entity/relationship-based sharding

You create a network and create shard based on strong components

your friends, messages and posts

Geograhy-based sharding

Closer = faster, although it separates users from different locations.

Functional sharding

Data are separated based on some logic, their purpose (orders, users, messages, …)

Time-based sharding

Each shard contains data for a specific time (for example, a day, week, month, or year).

Combined sharding

Example A banking system uses geographic sharding to distribute data by region, and within each region, range sharding is applied based on account numbers.

Vojtěch Tóth

Explorer