In traditional monolithic systems, accessing data through a single database is straightforward. With SQL table joins, developers can efficiently retrieve related data across multiple tables in one query. However, as applications move toward a microservices architecture with separate databases or schemas per service, data access for read operations becomes more complex. Services now face the challenge of accessing data they don’t own without violating the principle of bounded contexts.
This article explores four common data access patterns that enable services to read data outside their database boundaries:
- Interservice Communication Pattern
- Column Schema Replication Pattern
- Replicated Cache Pattern
- Data Domain Pattern
Each pattern comes with its own advantages, disadvantages, and trade-offs, and selecting the right one requires careful consideration of performance, scalability, and service independence. Let’s examine each pattern in detail.
1. Interservice Communication Pattern
The Interservice Communication pattern is the most straightforward way for a service to retrieve data from another service. When a service, say the Wishlist Service, needs data (such as item descriptions) owned by another service, such as the Catalog Service, it simply makes a remote call to retrieve that data.
Example: Wishlist Service and Catalog Service
Consider the example of a Wishlist Service and a Catalog Service:
- The Wishlist Service stores a list of items a customer may wish to purchase, containing fields like customer_id, item_id, and date_added.
- The Catalog Service holds details about each product the company sells, with fields such as item_id, item_description, and dimensions (weight, height, length).
When a customer requests to view their wishlist, the Wishlist Service needs to provide both the item_id and item_description. However, the Wishlist Service doesn’t store item_description because it is managed by the Catalog Service. By applying the Interservice Communication pattern, the Wishlist Service makes a call to the Catalog Service, passing the item IDs and retrieving the corresponding descriptions.
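The call flow described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class names, the get_descriptions method, and the in-process CatalogClient stand-in are all assumptions made for the example. A real client would issue a network request to the Catalog Service instead of reading a local dictionary.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WishlistEntry:
    customer_id: str
    item_id: str
    date_added: date


class CatalogClient:
    """Hypothetical stand-in for a remote Catalog Service client."""

    def __init__(self, descriptions: dict[str, str]) -> None:
        self._descriptions = descriptions

    def get_descriptions(self, item_ids: list[str]) -> dict[str, str]:
        # A real client would make one network call here, passing all IDs.
        return {i: self._descriptions[i] for i in item_ids if i in self._descriptions}


class WishlistService:
    def __init__(self, entries: list[WishlistEntry], catalog: CatalogClient) -> None:
        self._entries = entries
        self._catalog = catalog

    def view_wishlist(self, customer_id: str) -> list[dict[str, Optional[str]]]:
        # The Wishlist Service owns only the item IDs ...
        item_ids = [e.item_id for e in self._entries if e.customer_id == customer_id]
        # ... and asks the Catalog Service for the descriptions it does not own.
        descriptions = self._catalog.get_descriptions(item_ids)
        return [
            {"item_id": i, "item_description": descriptions.get(i)}
            for i in item_ids
        ]
```

Passing all item IDs in one request, as the sketch does, keeps the number of remote calls per wishlist view at one rather than one per item.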
Challenges of the Interservice Communication Pattern
While simple, this pattern introduces several challenges:
- Network Latency: Every request to fetch an item description requires a network call, adding between 30 ms and 300 ms, depending on network conditions.
- Security Latency: If the endpoint has stringent security requirements, authorization steps can add 20 ms to 400 ms per request.
- Data Latency: Instead of joining tables within a single database, the Catalog Service might need additional calls to retrieve item details, which would add 10 ms to 50 ms of processing time.
- Service Coupling: The Wishlist Service becomes dependent on the availability of the Catalog Service. If the Catalog Service is down, the Wishlist Service cannot retrieve item descriptions, making it effectively unavailable to users. Additionally, as the Wishlist Service scales, the Catalog Service must also scale to handle the increased demand.
The network, security, and data latencies are cumulative: the upper bounds above already sum to 750 ms for a single call, and a request that needs more than one call can be delayed by a second or more, negatively impacting user experience.
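To make the latency figures above concrete, the short calculation below adds up the per-call ranges for network, security, and data latency. It assumes the three costs are incurred one after another and that multiple calls are made sequentially. The function name and the calls parameter are illustrative only.

```python
# (min_ms, max_ms) per remote call, taken from the ranges listed above.
LATENCY_MS = {
    "network": (30, 300),
    "security": (20, 400),
    "data": (10, 50),
}


def latency_range_ms(calls: int = 1) -> tuple[int, int]:
    """Best- and worst-case added latency for a number of sequential calls."""
    best = sum(low for low, _ in LATENCY_MS.values()) * calls
    worst = sum(high for _, high in LATENCY_MS.values()) * calls
    return best, worst


print(latency_range_ms())   # (60, 750)
print(latency_range_ms(2))  # (120, 1500)
```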
2. Column Schema Replication Pattern
In the Column Schema Replication Pattern, the Wishlist Service replicates specific columns (like item_description) from the Catalog Service database schema into its own database schema. This way, the Wishlist Service can serve requests directly from its database without making repeated calls to the Catalog Service.
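As a rough sketch of what this replication might look like, the example below uses an in-memory SQLite database to stand in for the Wishlist Service's own schema. The table layout and the on_item_description_changed handler are assumptions. In particular, how the Catalog Service announces a change (an event, a message queue, a batch job) is not specified here and is left hypothetical.

```python
import sqlite3

# The Wishlist Service's own database (in-memory for illustration only).
wishlist_db = sqlite3.connect(":memory:")
wishlist_db.execute(
    """
    CREATE TABLE wishlist (
        customer_id      TEXT NOT NULL,
        item_id          TEXT NOT NULL,
        date_added       TEXT NOT NULL,
        item_description TEXT,  -- replicated copy; the Catalog Service stays the owner
        PRIMARY KEY (customer_id, item_id)
    )
    """
)


def add_to_wishlist(customer_id: str, item_id: str, date_added: str,
                    item_description: str) -> None:
    wishlist_db.execute(
        "INSERT INTO wishlist VALUES (?, ?, ?, ?)",
        (customer_id, item_id, date_added, item_description),
    )
    wishlist_db.commit()


def on_item_description_changed(item_id: str, new_description: str) -> None:
    """Hypothetical handler invoked when the Catalog Service reports a change."""
    wishlist_db.execute(
        "UPDATE wishlist SET item_description = ? WHERE item_id = ?",
        (new_description, item_id),
    )
    wishlist_db.commit()


def view_wishlist(customer_id: str) -> list[tuple[str, str]]:
    # Served entirely from the local database: no remote call is needed.
    rows = wishlist_db.execute(
        "SELECT item_id, item_description FROM wishlist "
        "WHERE customer_id = ? ORDER BY date_added",
        (customer_id,),
    )
    return rows.fetchall()
```

Until on_item_description_changed runs, view_wishlist keeps returning the old description, which is the consistency window this pattern accepts in exchange for local reads.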
Pros and Cons
- Pros: Reduced dependency on Catalog Service availability, faster response times as there’s no network or security latency.
- Cons: Data duplication can lead to inconsistencies, especially if the Catalog Service frequently updates item_description. Synchronizing updates across services can also be complex.
3. Replicated Cache Pattern
The Replicated Cache Pattern utilizes a distributed caching solution like Redis to store frequently requested data in memory. In this approach, when the Wishlist Service requests an item description, it first checks the cache. If the data exists in the cache, it retrieves it directly, avoiding a call to the Catalog Service.
Advantages and Disadvantages
- Advantages: Significantly improves performance due to the speed of in-memory data retrieval. Reduces dependency on Catalog Service availability.
- Disadvantages: Cache consistency must be maintained. Cached data may become stale if there are frequent updates in the Catalog Service. Cache invalidation strategies are crucial for ensuring accurate data.
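The check-the-cache-first flow and the invalidation concern can be sketched as below. To keep the example self-contained, a small in-process class stands in for a distributed cache such as Redis. The class, its time-to-live policy, and the fetch_from_catalog callback are all assumptions for illustration, not a prescribed design.

```python
import time
from typing import Callable, Optional


class DescriptionCache:
    """In-process stand-in for a distributed cache (hypothetical API)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # item_id -> (description, expiry timestamp)
        self._entries: dict[str, tuple[str, float]] = {}

    def get(self, item_id: str) -> Optional[str]:
        entry = self._entries.get(item_id)
        if entry is None:
            return None
        description, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[item_id]  # expired entries count as misses
            return None
        return description

    def put(self, item_id: str, description: str) -> None:
        self._entries[item_id] = (description, self._clock() + self._ttl)

    def invalidate(self, item_id: str) -> None:
        # Called when the Catalog Service changes the description.
        self._entries.pop(item_id, None)


def get_description(item_id: str, cache: DescriptionCache,
                    fetch_from_catalog: Callable[[str], str]) -> str:
    cached = cache.get(item_id)
    if cached is not None:
        return cached  # cache hit: no call to the Catalog Service
    description = fetch_from_catalog(item_id)  # cache miss: remote call
    cache.put(item_id, description)
    return description
```

The sketch combines two invalidation strategies: a time-to-live bounds how stale an entry can get, and an explicit invalidate call removes an entry as soon as a change is known.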
4. Data Domain Pattern
The Data Domain Pattern involves defining clear data ownership boundaries and distributing data across different domains. In this pattern, the Wishlist Service and Catalog Service would each maintain their own specific set of data, but they could share certain types of data that are common to both services.
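The description above does not say how the shared data is exposed. One possible reading, shown here purely as an assumption, is a shared schema in which each table still has a single owning service but both services can read it. The sketch uses an in-memory SQLite database, and the table names and sample rows are hypothetical.

```python
import sqlite3

# One shared data domain (a single schema) that both services connect to.
domain_db = sqlite3.connect(":memory:")
domain_db.executescript(
    """
    CREATE TABLE catalog_item (      -- owned and written by the Catalog Service
        item_id          TEXT PRIMARY KEY,
        item_description TEXT NOT NULL
    );
    CREATE TABLE wishlist (          -- owned and written by the Wishlist Service
        customer_id TEXT NOT NULL,
        item_id     TEXT NOT NULL REFERENCES catalog_item(item_id),
        date_added  TEXT NOT NULL
    );
    INSERT INTO catalog_item VALUES ('A1', 'Desk lamp'), ('B2', 'Notebook');
    INSERT INTO wishlist VALUES ('c1', 'A1', '2024-01-01');
    """
)


def view_wishlist(customer_id: str) -> list[tuple[str, str]]:
    # Within a shared domain, the Wishlist Service can use an ordinary join.
    rows = domain_db.execute(
        """
        SELECT w.item_id, c.item_description
        FROM wishlist AS w
        JOIN catalog_item AS c ON c.item_id = w.item_id
        WHERE w.customer_id = ?
        ORDER BY w.date_added
        """,
        (customer_id,),
    )
    return rows.fetchall()
```

Under this reading, the ownership boundaries are expressed as which service may write to which table, and that has to be agreed on up front.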
Trade-Offs
- Pros: Decentralized control of data allows each service to operate independently, reducing coupling and allowing for greater flexibility.
- Cons: Implementing the Data Domain Pattern often requires substantial architectural planning to define data ownership boundaries effectively, and managing data synchronization can be complex.
Choosing the Right Data Access Pattern
Choosing the appropriate pattern depends on various factors, including:
- Frequency of Data Access: If data is frequently accessed, caching or schema replication might be preferable to avoid the latency of remote calls.
- Data Consistency Requirements: For data that changes frequently, Interservice Communication may be the best choice to always retrieve the latest data.
- Service Independence Needs: If high independence is crucial, the Data Domain pattern reduces service coupling.
Each pattern offers trade-offs, and a hybrid approach may even be warranted, where a combination of these patterns is used to strike the right balance between performance, scalability, and service independence.
Conclusion
In microservices, managing data access outside a service’s bounded context requires careful consideration of various architectural patterns. The Interservice Communication, Column Schema Replication, Replicated Cache, and Data Domain patterns each offer solutions with unique trade-offs. By analyzing application requirements and expected system behaviours, architects can select the pattern that best aligns with the service’s goals, ensuring optimized data access in a distributed, scalable system.