Sharding is a form of horizontal partitioning in which different subsets of the data are stored on different nodes.
Sharding is not the same as partitioning. A partition is always part of a single database instance, whereas shards are spread across multiple database instances.
Strategies
The three objectives of sharding are:
- uniform data distribution
- balanced workload (read and write)
- respect for physical location

These objectives usually conflict with one another, and preferences change over time.
Mapping structure
The shard of each aggregate is recorded in a structure that needs to be maintained, usually some centralized index structure.
General rules
The shard is determined by a rule, which should be faster than looking it up in a mapping structure. The mapping does not need to be stored.
Hash sharding
Compute a hash from a set of columns. Cons: adding a new node requires recalculating the shard of all keys (the solution lies in consistent hashing).
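A minimal sketch of the difference, assuming string keys and a modulo rule for plain hash sharding. The names `stable_hash`, `modulo_shard` and `HashRing`, the node names, and the choice of 100 virtual nodes are illustrative assumptions, not part of any particular database.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """64-bit hash that is stable across processes (built-in hash() of str is not)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def modulo_shard(key: str, num_shards: int) -> int:
    """Plain hash sharding: hash the key and take the remainder."""
    return stable_hash(key) % num_shards


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        # Every node owns `vnodes` points on the ring.
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Pick the first point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, stable_hash(key)) % len(self._points)
        return self._ring[idx][1]


keys = [f"user-{i}" for i in range(10_000)]

# Fraction of keys whose shard changes when going from 4 to 5 shards.
moved_modulo = sum(modulo_shard(k, 4) != modulo_shard(k, 5) for k in keys) / len(keys)

old_ring = HashRing(["n1", "n2", "n3", "n4"])
new_ring = HashRing(["n1", "n2", "n3", "n4", "n5"])
moved_ring = sum(old_ring.node_for(k) != new_ring.node_for(k) for k in keys) / len(keys)

print(f"modulo: {moved_modulo:.0%} of keys moved")
print(f"ring:   {moved_ring:.0%} of keys moved")
```

With the modulo rule, a key keeps its shard only when its hash leaves the same remainder for 4 and 5, so roughly four in five keys are expected to move. On the ring, only the keys taken over by the new node move, which is roughly one in five here.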
Range based sharding
Each shard holds a range of values, e.g.:
- 1-20, 21-40, 41-60, …
- A-C, D-F, G-I, …
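The numeric ranges above can be resolved with a binary search over the upper bounds. This is only a sketch: `UPPER_BOUNDS` and `range_shard` are hypothetical names, and the list stops at 60 because the example does.

```python
import bisect

# Inclusive upper bound of each shard's range: 1-20, 21-40, 41-60.
UPPER_BOUNDS = [20, 40, 60]


def range_shard(value: int) -> int:
    """Return the index of the shard whose range contains `value`."""
    if value < 1 or value > UPPER_BOUNDS[-1]:
        raise ValueError(f"{value} is outside every shard range")
    # bisect_left finds the first upper bound that is >= value.
    return bisect.bisect_left(UPPER_BOUNDS, value)


print(range_shard(20))  # 0, the shard holding 1-20
print(range_shard(21))  # 1, the shard holding 21-40
```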
Directory-based sharding
A dedicated server stores information about each shard. When requesting data, a node asks this server which shard the data lies in.
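A rough in-memory sketch of the lookup flow. `ShardDirectory` stands in for the dedicated server and the `shards` dictionary for the data nodes; all names and the sample records are hypothetical.

```python
class ShardDirectory:
    """Stand-in for the dedicated server: remembers which shard holds each key."""

    def __init__(self):
        self._location = {}

    def assign(self, key, shard):
        self._location[key] = shard

    def lookup(self, key):
        return self._location[key]


# Two in-memory dictionaries play the role of the data nodes.
shards = {"shard-a": {}, "shard-b": {}}
directory = ShardDirectory()


def put(key, value, shard):
    """Store the value on the chosen shard and record the placement."""
    shards[shard][key] = value
    directory.assign(key, shard)


def get(key):
    """Ask the directory for the shard first, then read from that shard."""
    return shards[directory.lookup(key)][key]


put("order:1", {"total": 30}, "shard-a")
put("order:2", {"total": 75}, "shard-b")
print(directory.lookup("order:2"), get("order:2"))
```

Because placement is stored rather than computed, a key can be moved by copying it and updating its directory entry, at the cost of an extra lookup on every request.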
Entity/relationship-based sharding
You build a graph of entities and their relationships and create shards based on its strongly connected components, e.g.:
- your friends, messages and posts
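As a simplified sketch, assume friendships are undirected, so the groups reduce to ordinary connected components, which a union-find structure can compute. A directed graph would need a strongly-connected-components algorithm instead. The function name and the sample users are made up for illustration.

```python
from collections import defaultdict


def connected_groups(edges):
    """Group nodes into connected components of an undirected graph."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


friendships = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]
for shard_id, members in enumerate(connected_groups(friendships)):
    print(shard_id, sorted(members))
```

Each resulting group would be placed on one shard together with its messages and posts, so that most queries stay within a single shard.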
Geography-based sharding
Data are stored close to the users who access them (closer = faster), although this separates users from different locations.
Functional sharding
Data are separated by their purpose (orders, users, messages, …).
Time-based sharding
Each shard contains data for a specific time (for example, a day, week, month, or year).
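For instance, a monthly scheme could derive the shard name directly from a record's timestamp. The `events_YYYY_MM` naming and the choice of UTC are assumptions made for this sketch.

```python
from datetime import datetime, timezone


def monthly_shard(ts: datetime) -> str:
    """Name of the monthly shard for a timezone-aware timestamp."""
    # Normalise to UTC so every node agrees on where a month begins.
    ts = ts.astimezone(timezone.utc)
    return f"events_{ts.year:04d}_{ts.month:02d}"


print(monthly_shard(datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)))  # events_2024_03
```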
Combined sharding
Example: A banking system uses geographic sharding to distribute data by region, and within each region, range sharding is applied based on account numbers.
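The banking example could be sketched as a two-step lookup: first by region, then by account-number range within that region. The region codes, range bounds, and the `region-index` shard naming are invented for illustration.

```python
import bisect

# Hypothetical layout: inclusive upper bounds of account-number ranges per region.
REGION_RANGES = {
    "eu": [100_000, 200_000],
    "us": [500_000],
}


def combined_shard(region: str, account_number: int) -> str:
    """Pick the region first, then the range shard inside that region."""
    bounds = REGION_RANGES[region]
    idx = bisect.bisect_left(bounds, account_number)
    if account_number < 1 or idx == len(bounds):
        raise ValueError(f"account {account_number} is outside the ranges of {region}")
    return f"{region}-{idx}"


print(combined_shard("eu", 150_000))  # eu-1
print(combined_shard("us", 42))       # us-0
```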