Core DynamoDB Concepts and Table Design
DynamoDB organizes data into tables containing items (rows) made up of attributes (columns). Unlike relational databases, DynamoDB is schema-less, meaning only the primary key requires definition upfront.
Understanding Primary Keys
The primary key uniquely identifies each item. It consists of either a partition key alone or a partition key combined with a sort key. The partition key determines which partition stores the item and is critical for even data distribution.
A sort key lets you query a range of items within a partition. In a user profile application, you might use UserId as the partition key and CreatedDate as the sort key. This enables efficient queries for all of a user's items created within a specific date range.
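The key layout described above can be sketched as the parameters of a CreateTable request. This is a minimal sketch: the table name UserProfiles, the string attribute types, and the on-demand billing mode are assumptions chosen for illustration, and the helper key_attribute is hypothetical.

```python
# Request parameters for a table keyed on UserId (partition key) and
# CreatedDate (sort key). Table name and billing mode are assumptions.
create_table_params = {
    "TableName": "UserProfiles",
    "AttributeDefinitions": [
        {"AttributeName": "UserId", "AttributeType": "S"},
        # ISO 8601 date strings sort chronologically when compared as text.
        {"AttributeName": "CreatedDate", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "UserId", "KeyType": "HASH"},        # partition key
        {"AttributeName": "CreatedDate", "KeyType": "RANGE"},  # sort key
    ],
    "BillingMode": "PAY_PER_REQUEST",
}


def key_attribute(params, key_type):
    """Return the attribute name used for the given key type (HASH or RANGE)."""
    for element in params["KeySchema"]:
        if element["KeyType"] == key_type:
            return element["AttributeName"]
    return None


print(key_attribute(create_table_params, "HASH"))   # UserId
print(key_attribute(create_table_params, "RANGE"))  # CreatedDate
```

With boto3 installed and credentials configured, these parameters could be passed as boto3.client("dynamodb").create_table(**create_table_params). Only the key attributes appear in AttributeDefinitions, which reflects the schema-less nature of everything else in the item.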
Denormalization vs. Normalization
Table design in DynamoDB differs fundamentally from relational databases. You must anticipate access patterns upfront instead of designing flexible schemas. DynamoDB often favors denormalization to minimize query operations, whereas SQL databases normalize data across multiple tables.
The AWS Developer exam frequently tests your ability to optimize table structure for specific application requirements. Understanding these design patterns is crucial.
Extending Query Flexibility with Indexes
Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs) extend your querying flexibility. A GSI can define a partition key and sort key different from the table's, while an LSI keeps the table's partition key and changes only the sort key. Both consume additional read and write capacity and incur extra storage costs.
Choose indexes carefully based on your actual access patterns to balance query efficiency against cost.
Read and Write Capacity Units Explained
DynamoDB measures throughput using capacity units. A write capacity unit (WCU) represents one write per second for an item up to 1 KB. A read capacity unit (RCU) represents one strongly consistent read per second for an item up to 4 KB.
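If an item exceeds these sizes, DynamoDB rounds its size up to the next 1 KB boundary for writes or 4 KB boundary for reads and charges one capacity unit per increment. Writing an 8 KB item consumes 8 WCUs, and writing a 1.5 KB item consumes 2. Reading an 8 KB item with strong consistency consumes 2 RCUs.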
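The arithmetic above can be sketched as two small functions. The function names are hypothetical; the only behavior assumed is that item sizes round up to the next 1 KB (writes) or 4 KB (strongly consistent reads) increment.

```python
import math

WRITE_UNIT_BYTES = 1024      # one WCU covers a write of up to 1 KB
READ_UNIT_BYTES = 4 * 1024   # one RCU covers a strongly consistent read of up to 4 KB


def wcus_per_write(item_size_bytes):
    """WCUs consumed by one standard write of an item of the given size."""
    return max(1, math.ceil(item_size_bytes / WRITE_UNIT_BYTES))


def rcus_per_strong_read(item_size_bytes):
    """RCUs consumed by one strongly consistent read of an item of the given size."""
    return max(1, math.ceil(item_size_bytes / READ_UNIT_BYTES))


print(wcus_per_write(8 * 1024))        # 8
print(rcus_per_strong_read(8 * 1024))  # 2
print(wcus_per_write(1536))            # 2 (1.5 KB rounds up to 2 KB)
print(rcus_per_strong_read(5 * 1024))  # 2 (5 KB rounds up to 8 KB)
```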
Provisioned vs. On-Demand Capacity
DynamoDB offers two billing modes:
- Provisioned capacity: You specify RCUs and WCUs upfront, paying a fixed hourly rate whether used or not. This suits predictable workloads and consistent traffic patterns.
- On-demand capacity: Automatically scales based on actual consumption. You pay only for the read and write request units you use, making it ideal for unpredictable or bursty workloads.
The AWS Developer exam tests your understanding of when to use each mode and how to optimize costs for different scenarios.
Consistency Models and RCU Impact
DynamoDB offers eventually consistent and strongly consistent reads. A strongly consistent read reflects every write that succeeded before it and consumes one RCU per 4 KB. An eventually consistent read might return slightly older data but consumes half as many RCUs (0.5 RCU per 4 KB).
Most applications use eventually consistent reads to reduce costs. Reserve strongly consistent reads for scenarios requiring the absolute latest data.
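As a rough sketch of how this choice shows up in practice, the consistency mode is a per-request flag (ConsistentRead) on read operations, and it halves or doubles the capacity cost. The table name, key values, and helper names below are assumptions for illustration.

```python
import math


def rcus_per_read(item_size_bytes, strongly_consistent):
    """RCUs for one read: 1 per 4 KB if strongly consistent, 0.5 per 4 KB otherwise."""
    blocks = max(1, math.ceil(item_size_bytes / 4096))
    return blocks * (1.0 if strongly_consistent else 0.5)


def get_item_params(user_id, created_date, strongly_consistent=False):
    """Build GetItem parameters; reads are eventually consistent unless requested otherwise."""
    return {
        "TableName": "UserProfiles",  # hypothetical table name
        "Key": {
            "UserId": {"S": user_id},
            "CreatedDate": {"S": created_date},
        },
        "ConsistentRead": strongly_consistent,
    }


print(rcus_per_read(8 * 1024, strongly_consistent=True))   # 2.0
print(rcus_per_read(8 * 1024, strongly_consistent=False))  # 1.0
print(get_item_params("u-123", "2024-01-15", True)["ConsistentRead"])  # True
```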
Query Operations, Filtering, and Scan Behavior
DynamoDB supports several operations for retrieving data, each with distinct performance and cost implications. Choose the right operation to avoid wasting capacity.
Efficient Data Retrieval Operations
GetItem retrieves a single item using the complete primary key. This is the most efficient operation.
Query finds all items sharing the same partition key and optionally filters by sort key conditions. This makes it ideal for accessing related data efficiently. Querying a Users table by UserId and filtering by CreatedDate range is efficient if those are your partition and sort keys.
Scan reads every item in a table or index. It consumes significant capacity for large tables and should be minimized in production. Scanning the entire Users table to find users by email address is inefficient because email is not part of the primary key.
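The three operations differ mainly in how much of the key they specify. The request shapes below are a sketch assuming a Users table with UserId as partition key and CreatedDate as sort key; the key values and the Email attribute are made up for illustration.

```python
TABLE = "Users"  # hypothetical table: UserId (partition key), CreatedDate (sort key)

# GetItem: the complete primary key identifies exactly one item.
get_item_params = {
    "TableName": TABLE,
    "Key": {"UserId": {"S": "u-123"}, "CreatedDate": {"S": "2024-01-15"}},
}

# Query: equality on the partition key, optional range condition on the sort key.
query_params = {
    "TableName": TABLE,
    "KeyConditionExpression": "UserId = :uid AND CreatedDate BETWEEN :start AND :end",
    "ExpressionAttributeValues": {
        ":uid": {"S": "u-123"},
        ":start": {"S": "2024-01-01"},
        ":end": {"S": "2024-01-31"},
    },
}

# Scan: no key condition at all, so every item is read and then filtered.
scan_params = {
    "TableName": TABLE,
    "FilterExpression": "#email = :email",
    "ExpressionAttributeNames": {"#email": "Email"},
    "ExpressionAttributeValues": {":email": {"S": "a@example.com"}},
}

for name, params in [("GetItem", get_item_params), ("Query", query_params), ("Scan", scan_params)]:
    uses_key = "Key" in params or "KeyConditionExpression" in params
    print(name, "uses the primary key:", uses_key)
```

With boto3, each dict could be passed to the matching client method (get_item, query, scan). The email lookup is the case where a secondary index on Email would replace the Scan with a Query.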
Query Syntax and Filtering
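Query results are returned in sort key order and can be paginated. The Limit parameter caps the number of items evaluated per request, and when more results remain, the response includes a LastEvaluatedKey that you pass as ExclusiveStartKey in the next request to continue from where the previous page ended.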
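A pagination loop along these lines might look as follows. This is a sketch: query_fn stands in for a client's query method (for example, the query method of a boto3 DynamoDB client), and fake_query is a hypothetical in-memory substitute so the loop can run without AWS access.

```python
def query_all_pages(query_fn, params, page_size=None):
    """Collect items across pages by feeding LastEvaluatedKey back as ExclusiveStartKey."""
    items = []
    request = dict(params)
    if page_size is not None:
        request["Limit"] = page_size
    while True:
        response = query_fn(**request)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items  # no key in the response means this was the final page
        request["ExclusiveStartKey"] = last_key


def fake_query(**kwargs):
    """Hypothetical stand-in for a real Query call, serving five items in pages."""
    data = [{"CreatedDate": {"S": f"2024-01-0{day}"}} for day in range(1, 6)]
    start = 0
    if "ExclusiveStartKey" in kwargs:
        start = data.index(kwargs["ExclusiveStartKey"]) + 1
    limit = kwargs.get("Limit", len(data))
    page = data[start:start + limit]
    response = {"Items": page}
    if start + limit < len(data):
        response["LastEvaluatedKey"] = page[-1]
    return response


all_items = query_all_pages(fake_query, {"TableName": "Users"}, page_size=2)
print(len(all_items))  # 5
```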
A Query request uses a KeyConditionExpression to specify the partition key (equality only) and, optionally, a condition on the sort key. Add a FilterExpression to refine results after the query has read the matching items.
A critical distinction: FilterExpression reduces results after DynamoDB retrieves items. It still consumes capacity for filtered-out items, making it less efficient than using KeyConditionExpression alone.
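The difference is visible in the Count and ScannedCount fields of a Query response. The toy in-memory model below is only an illustration of that behavior, not how DynamoDB is implemented; simulate_query and the sample items are hypothetical.

```python
def simulate_query(items, user_id, start, end, filter_fn=None):
    """Toy model: the key condition decides what is read, the filter only what is returned."""
    # Step 1: the key condition selects the items that are read (and billed).
    read = [
        item for item in items
        if item["UserId"] == user_id and start <= item["CreatedDate"] <= end
    ]
    read.sort(key=lambda item: item["CreatedDate"])
    # Step 2: the filter discards items after they have already been read.
    returned = [item for item in read if filter_fn is None or filter_fn(item)]
    return {"Items": returned, "Count": len(returned), "ScannedCount": len(read)}


items = [
    {"UserId": "u-1", "CreatedDate": "2024-01-05", "Status": "active"},
    {"UserId": "u-1", "CreatedDate": "2024-01-10", "Status": "inactive"},
    {"UserId": "u-1", "CreatedDate": "2024-02-01", "Status": "active"},
    {"UserId": "u-2", "CreatedDate": "2024-01-07", "Status": "active"},
]

result = simulate_query(
    items, "u-1", "2024-01-01", "2024-01-31",
    filter_fn=lambda item: item["Status"] == "active",
)
print(result["ScannedCount"], result["Count"])  # 2 1
```

Two items were read but only one was returned, so capacity was spent on an item the caller never saw. Moving the filtered attribute into the sort key or an index removes that waste.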
Batch Operations for Multiple Items
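BatchGetItem and BatchWriteItem let you work with multiple items in a single request: up to 100 items per BatchGetItem call and up to 25 put or delete requests per BatchWriteItem call. Batching reduces network round trips and improves throughput, but a batch is not atomic, so individual items can fail and must be retried.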
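Because BatchWriteItem accepts at most 25 put or delete requests per call, larger sets of items need to be split into chunks. The helper below is a sketch with a hypothetical name and made-up items; it only builds the request bodies.

```python
MAX_BATCH_WRITE_ITEMS = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def batch_write_requests(table_name, items):
    """Yield BatchWriteItem request bodies, each holding at most 25 PutRequest entries."""
    for start in range(0, len(items), MAX_BATCH_WRITE_ITEMS):
        chunk = items[start:start + MAX_BATCH_WRITE_ITEMS]
        yield {
            "RequestItems": {
                table_name: [{"PutRequest": {"Item": item}} for item in chunk]
            }
        }


items = [
    {"UserId": {"S": f"u-{n}"}, "CreatedDate": {"S": "2024-01-01"}}
    for n in range(60)
]
requests = list(batch_write_requests("Users", items))
print([len(r["RequestItems"]["Users"]) for r in requests])  # [25, 25, 10]
```

A real caller would also inspect UnprocessedItems in each response and resend those entries, since a batch can partially succeed.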
The AWS Developer exam emphasizes understanding these operational differences. A workload that can rely on GetItem and Query usually reflects good table design, while one that depends on Scan or heavy filtering usually signals keys that do not match the access patterns.
Indexes, Global Secondary Indexes, and Access Patterns
DynamoDB indexes enable querying data using alternative key structures beyond your table's primary key. Indexes are essential for supporting multiple access patterns.
Global vs. Local Secondary Indexes
Global Secondary Indexes (GSIs) can have any partition key and sort key, independent of the table's primary key. Each GSI is stored in its own partitions, separate from the table, can be added or removed after the table is created, and supports only eventually consistent reads.
Local Secondary Indexes (LSIs) share the table's partition key but use a different sort key. They must be defined when the table is created, support strongly consistent reads, and limit each item collection (all items sharing one partition key value) to 10 GB.
GSIs are more flexible and recommended for most use cases. LSIs are useful when you need strongly consistent reads on an alternative sort order and each item collection stays small.
Designing Indexes for Multiple Access Patterns
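Consider an e-commerce application with a Products table using ProductId as the partition key. To query products by category, create a GSI with Category as the partition key. Using Price as that index's sort key also supports price-range queries within a category, which works better than Price as a partition key because a Query requires an exact partition key value.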
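One possible index for the category and price access patterns is sketched below, assuming Category as the GSI partition key and Price as its sort key. The index name CategoryPriceIndex, the attribute types, the sample values, and the on-demand billing mode are all assumptions made for illustration.

```python
# Hypothetical GSI: partition by Category, sort by Price.
category_price_index = {
    "IndexName": "CategoryPriceIndex",
    "KeySchema": [
        {"AttributeName": "Category", "KeyType": "HASH"},
        {"AttributeName": "Price", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "ALL"},
}

# CreateTable parameters for the Products table including the index.
create_table_params = {
    "TableName": "Products",
    "AttributeDefinitions": [
        {"AttributeName": "ProductId", "AttributeType": "S"},
        {"AttributeName": "Category", "AttributeType": "S"},
        {"AttributeName": "Price", "AttributeType": "N"},
    ],
    "KeySchema": [{"AttributeName": "ProductId", "KeyType": "HASH"}],
    "GlobalSecondaryIndexes": [category_price_index],
    "BillingMode": "PAY_PER_REQUEST",
}

# Query against the index: one category, restricted to a price range.
query_params = {
    "TableName": "Products",
    "IndexName": "CategoryPriceIndex",
    "KeyConditionExpression": "#cat = :cat AND #price BETWEEN :low AND :high",
    "ExpressionAttributeNames": {"#cat": "Category", "#price": "Price"},
    "ExpressionAttributeValues": {
        ":cat": {"S": "books"},
        ":low": {"N": "10"},   # numbers are sent as strings in the low-level API
        ":high": {"N": "25"},
    },
}

# Every key attribute used by the table or an index must be declared.
declared = {a["AttributeName"] for a in create_table_params["AttributeDefinitions"]}
index_keys = {k["AttributeName"] for k in category_price_index["KeySchema"]}
print(index_keys <= declared)  # True
```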
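In provisioned mode, each GSI has its own RCU and WCU settings, separate from the table's, while LSIs share the table's capacity. Every write that changes an indexed attribute is also written to the index, which increases costs but enables efficient queries across multiple access patterns.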
Sparse indexes contain only items where the index key attribute exists. They optimize storage and queries when not all items require index entries.
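The effect of a sparse index can be illustrated with a toy in-memory model. This is not how DynamoDB stores indexes; the function name, the orders, and the OpenSince attribute are hypothetical.

```python
def sparse_index_entries(items, index_key):
    """Toy model: only items that carry the index key attribute appear in the index."""
    return [item for item in items if index_key in item]


orders = [
    {"OrderId": "o-1", "Status": "shipped"},
    {"OrderId": "o-2", "Status": "open", "OpenSince": "2024-03-01"},
    {"OrderId": "o-3", "Status": "open", "OpenSince": "2024-03-04"},
]

# An index keyed on OpenSince holds only the open orders.
print([o["OrderId"] for o in sparse_index_entries(orders, "OpenSince")])  # ['o-2', 'o-3']
```

Removing the OpenSince attribute when an order ships would drop it from the index, so a query for open orders reads only the items it needs.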
Projection Strategies
Projection defines which attributes are included in an index:
- KEYS_ONLY: Includes only the index and table key attributes. Minimizes storage but requires an additional read from the table to fetch other attributes.
- INCLUDE: Lets you specify which non-key attributes to store in the index. Balances storage and query efficiency.
- ALL: Includes all attributes, so no extra table reads are needed, but consumes the most storage.
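The three projection types can be sketched as the Projection element of an index definition, together with a toy model of which attributes each one would copy into the index. The helper names and the sample product are hypothetical.

```python
def build_projection(projection_type, non_key_attributes=None):
    """Build the Projection element of an index definition."""
    if projection_type == "INCLUDE":
        if not non_key_attributes:
            raise ValueError("INCLUDE needs at least one non-key attribute")
        return {"ProjectionType": "INCLUDE", "NonKeyAttributes": list(non_key_attributes)}
    if projection_type in ("KEYS_ONLY", "ALL"):
        return {"ProjectionType": projection_type}
    raise ValueError(f"unknown projection type: {projection_type}")


def attributes_in_index(item, key_attributes, projection):
    """Toy model of which attributes of an item are copied into the index."""
    kind = projection["ProjectionType"]
    if kind == "ALL":
        return dict(item)
    kept = set(key_attributes)
    if kind == "INCLUDE":
        kept |= set(projection["NonKeyAttributes"])
    return {name: value for name, value in item.items() if name in kept}


product = {
    "ProductId": "p-1", "Category": "books", "Price": 12,
    "Title": "Example", "Description": "Long text",
}
keys = ["ProductId", "Category", "Price"]  # table key plus index keys

print(sorted(attributes_in_index(product, keys, build_projection("KEYS_ONLY"))))
# ['Category', 'Price', 'ProductId']
print(sorted(attributes_in_index(product, keys, build_projection("INCLUDE", ["Title"]))))
# ['Category', 'Price', 'ProductId', 'Title']
```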
The exam tests your ability to design indexes that match application query requirements and to weigh the trade-offs between query efficiency, storage costs, and maintenance overhead.
Consistency Models, Transactions, and DynamoDB Streams
DynamoDB supports multiple consistency models and operational patterns for complex data scenarios. Understanding these is essential for building reliable applications.
Consistency Models and Their Trade-Offs
Eventual consistency offers lower latency and uses fewer RCUs but may return stale data if a read follows immediately after a write.
Strong consistency guarantees data reflects all successful prior writes. However, it consumes double the RCUs and has higher latency.
Most applications use eventual consistency by default. Use strong consistency for specific operations like financial transactions requiring up-to-date information.
Multi-Item Transactions
For operations that span multiple items or tables, DynamoDB provides TransactWriteItems and TransactGetItems.
TransactWriteItems atomically writes to multiple items, ensuring all succeed or all fail. A transaction can include up to 100 items.
TransactGetItems atomically retrieves multiple items with strong consistency.
Transactional reads and writes consume twice the capacity of their standard equivalents, and a transaction is limited to 4 MB of data in total. In exchange, they ensure data integrity for complex operations where partial success is unacceptable.
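A TransactWriteItems request for an all-or-nothing balance transfer might be sketched as follows. The Accounts table, the AccountId and Balance attributes, and the helper names are assumptions for illustration; the capacity helper assumes transactional writes cost two WCUs per 1 KB.

```python
import math


def transfer_params(from_account, to_account, amount):
    """Build TransactWriteItems parameters that move an amount between two accounts."""
    names = {"#bal": "Balance"}
    values = {":amt": {"N": str(amount)}}
    return {
        "TransactItems": [
            {"Update": {
                "TableName": "Accounts",  # hypothetical table
                "Key": {"AccountId": {"S": from_account}},
                "UpdateExpression": "SET #bal = #bal - :amt",
                # If this condition fails, the whole transaction is rejected.
                "ConditionExpression": "#bal >= :amt",
                "ExpressionAttributeNames": names,
                "ExpressionAttributeValues": values,
            }},
            {"Update": {
                "TableName": "Accounts",
                "Key": {"AccountId": {"S": to_account}},
                "UpdateExpression": "SET #bal = #bal + :amt",
                "ExpressionAttributeNames": names,
                "ExpressionAttributeValues": values,
            }},
        ]
    }


def transactional_wcus(item_size_bytes):
    """WCUs for one transactional write: twice the cost of a standard write."""
    return 2 * max(1, math.ceil(item_size_bytes / 1024))


params = transfer_params("a-1", "a-2", 50)
print(len(params["TransactItems"]))  # 2
print(transactional_wcus(1500))      # 4
```

With boto3, the dict could be passed to the client's transact_write_items method. The condition on the debit side is what makes partial success impossible: either both balances change or neither does.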
Real-Time Data Changes with Streams
DynamoDB Streams capture item-level modifications in near real time and retain them for 24 hours. Integration with Lambda functions, Kinesis, or other services enables use cases like updating search indexes, sending notifications, or aggregating analytics.
Each stream record includes the item's keys and, depending on the stream's view type, its new image, its old image, or both.
Stream specifications define what information DynamoDB captures:
- NEW_IMAGE: Only the updated item
- OLD_IMAGE: The previous state
- NEW_AND_OLD_IMAGES: Both versions
- KEYS_ONLY: Only key attributes
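A consumer of stream records, such as a Lambda function, might read them along these lines. This is a sketch: summarize_stream_event is a hypothetical helper, and sample_event is a hand-written event in the shape Lambda receives from a stream configured with NEW_AND_OLD_IMAGES.

```python
def summarize_stream_event(event):
    """Extract the change type, keys, and available images from each stream record."""
    summaries = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        summaries.append({
            "event": record["eventName"],   # INSERT, MODIFY, or REMOVE
            "keys": change["Keys"],
            # Images are present only if the stream view type includes them.
            "new": change.get("NewImage"),
            "old": change.get("OldImage"),
        })
    return summaries


sample_event = {
    "Records": [
        {
            "eventName": "MODIFY",
            "dynamodb": {
                "Keys": {"UserId": {"S": "u-1"}},
                "OldImage": {"UserId": {"S": "u-1"}, "Status": {"S": "open"}},
                "NewImage": {"UserId": {"S": "u-1"}, "Status": {"S": "closed"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {
                "Keys": {"UserId": {"S": "u-2"}},
                "OldImage": {"UserId": {"S": "u-2"}, "Status": {"S": "open"}},
            },
        },
    ]
}

for summary in summarize_stream_event(sample_event):
    print(summary["event"], summary["keys"], summary["new"] is not None)
```

With a KEYS_ONLY stream, both images would be absent and the consumer would need to read the item from the table if it wanted the current attributes.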
The AWS Developer exam includes questions about transaction guarantees, consistency choices affecting application behavior, and Stream integration patterns for real-time data changes.
