DynamoDB is something you absolutely have to know to pass the exam. I’m sure you will have verified this by searching various forums before signing up for this guide, so you’ll know that you need to nail it.
Now this is too large a subject for me to include all the answers you will need. What’s far more important is for me to help you to help yourself. With this in mind, I’ll do the following:
- I’ll set out the overarching, or ‘meta’, points that I’ve observed from sitting the exam and using AWS on the job.
- I’ll provide you with links to the AWS documentation and tutorials that will help you to learn.
- I’ll provide resources like question sheets to speed up your search for important knowledge.
In the end though, I can’t upload my knowledge into your brain, so you need to do the hard yards yourself.
Hands on Tutorials
| Resource | Notes |
| --- | --- |
| Whizlabs DynamoDB Deep Dive | Paid (affiliate) but comprehensive. |
| SDK Learning Tests Sample | You’ll need to augment the tests, but it’s nicely structured. |
| AWS Console setup and CLI learning | Free, but the structure of the learning is up to you. |
Before we start, a quick bit of transparency:
Disclaimer: I have linked to Whizlabs in a couple of these resources – after using their services and passing my exam as a result, I set up an affiliation with them. If you use my links to buy, it doesn’t cost you any extra, but I do get a little commission which helps me keep the site going. I will always suggest free alternatives where I know of them... but you get what you pay for in the end.
With that out of the way, let’s get started.
DynamoDB Meta Concepts
On their intro page AWS themselves define DynamoDB as:
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. DynamoDB lets you offload the administrative burdens of operating and scaling a distributed database so that you don’t have to worry about hardware provisioning, setup and configuration, replication, software patching, or cluster scaling. DynamoDB also offers encryption at rest, which eliminates the operational burden and complexity involved in protecting sensitive data.
If you’d prefer, there’s a video:
Regardless of which source you prefer, there are quite a few ‘meta’ concepts for us to be aware of here.
- Managed services are a way of offloading admin burden – if we get questions on ‘what’s the easiest way to’ – it’s good to know about managed services.
- Seamless scalability is mentioned – we should be aware that traditional relational databases need tuning to get the best out of them; in other words, the difference between horizontal and vertical scaling.
- Encryption at rest – another concept we might want to know about – why is that important?
I’m going to encourage you to come up with your own answers as ever. But I’ll set you on the right tracks.
Big Picture of DynamoDB
Let’s start with the following concepts to set the scene:
- DynamoDB is defined as a key-value and document DB that’s optimised for performance and scalability
- It doesn’t need us to manage or administer it – indeed the point about managed services is that you can’t administer in the same way.
- Queries can be sped up with in-memory caching using DynamoDB Accelerator (DAX)
- Tables are scoped to a region BUT they can be replicated to appear as though they are global
- You can work with ‘Provisioned’ mode – whereby you pay a fixed cost, but you accept that at some point your access might get throttled.
- ‘On Demand’ mode scales to meet demand – but you’ll pay for whatever usage your users incur.
- You can secure data with encryption at rest, and also least-privilege IAM roles (you did your homework last week, right?)
- DynamoDB Streams can be used to react to table update events in near-real time
- We use partition keys to locate data; under the hood, a hash function applied to the partition key determines where each item is stored.
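To make the last point concrete, here’s a toy sketch of how hashing a partition key might map items to partitions. This is purely illustrative – AWS doesn’t publish DynamoDB’s internal hash scheme, and the hash function and partition count here are my own made-up stand-ins:

```python
import hashlib


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Toy illustration only: DynamoDB's real internal hashing is not
    public. Hash the partition key and map it to a partition index."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


# The same key always hashes to the same partition, which is what makes
# key-based lookups fast and spreads distinct keys across nodes.
assert partition_for("user#123", 8) == partition_for("user#123", 8)
```

The exam-relevant takeaway: a well-distributed partition key spreads load evenly across partitions, while a ‘hot’ key funnels all traffic to one.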
That’s a lot of high level stuff to get our head around, but then DynamoDB is a large subject.
I’ll point you in the right direction for your own studies.
Your own study
As ever, I’ve come up with a list of questions that you might find it handy to think about BEFORE reading the docs:
- Questions about the Overview of DynamoDB can be found here
- Questions about the Core Components of DynamoDB can be found here
- Questions about DynamoDB best practice can be found here
Of course these are also useful after you’ve read the docs yourself, as a reminder for what might be important.
Dynamo DB Core Concepts
We want to start with understanding the core differences between relational and NoSQL solutions. We also need to understand the Core Components of Amazon DynamoDB – leading us to the following:
Why NoSQL matters
- With a relational (SQL) solution, you design in terms of normalisation – the queries you write for your app fit around that fact.
- In SQL you either tune a query that’s constantly used (i.e. by an application), or live with the fact that ad hoc queries can be slow (the price you pay for generality).
- In DynamoDB, you need to know your design upfront, and everything is geared around efficiency and query speed – normalisation of data isn’t even a third-class citizen!
- You tend to improve a relational DB’s performance by throwing more CPU and RAM at it (i.e. vertical scaling) – DynamoDB is architected to improve performance horizontally (by adding more nodes).
- Items and rows are completely different concepts too. There’s no constraint that two items returned by a query need to include the same attributes.
- Primary keys are a concept you need to understand. With a simple primary key, the partition key is a single attribute that on its own uniquely identifies an item in the table.
- Partition and sort keys together are like the composite primary keys you’d have seen in a relational database – the combination of the partition key and sort key forms a unique entry – but they don’t quite work the same way.
- Secondary indexes are a way of querying the data in a table on something other than the primary key.
- If you don’t index something, you can’t search by it – this might throw people used to being able to specify terms for any column in a table.
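To illustrate the items-versus-rows and composite-key points above, here’s a hypothetical in-memory model. This is not the DynamoDB API – the table, keys and item names are all made up to show the concepts:

```python
# Hypothetical in-memory model of a table with a composite primary key
# (partition key + sort key). Concepts only, not the real DynamoDB API.
table = {}


def put_item(pk: str, sk: str, attributes: dict) -> None:
    # The (partition key, sort key) pair must be unique;
    # a second put with the same pair overwrites the first item.
    table[(pk, sk)] = attributes


# Unlike relational rows, two items need not share the same attributes.
put_item("user#1", "profile", {"name": "Ada", "plan": "pro"})
put_item("user#1", "order#2021-01", {"total": 42.50})

assert table[("user#1", "profile")]["plan"] == "pro"
```

Notice the two items under the same partition key carry completely different attributes – there is no shared column schema to conform to.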
Let’s move into understanding how to work with data, and capacity.
Working with DynamoDB Data
We’ll start with understanding the read and write capacity units – because it’s a fundamental concept to get.
When we use provisioned throughput, we have to plan for what we’re prepared to pay for, and by extension how much capacity we’re prepared to allocate before our users experience a degradation in performance (throttling).
The exam will challenge you on cost-effective measures and, more importantly, you need to understand this for your job!
Read and Write Capacity
Having done the exam, I can say with confidence that you will definitely get questions in the exam about read and write capacity. So you need to understand how read and write capacity units work.
I got a question similar to the following:
An application is being developed that is going to write data to a DynamoDB table. You have to setup the read and write throughput for the table. Data is going to be read at the rate of 300 items every 30 seconds. Each item is of size 6KB. The reads can be eventually consistent reads. What should be the read capacity that needs to be set on the table?
We need to understand the following points to learn how to apply things generally:
- You first need to convert to items read per second, so (300/30) = 10 items read every second.
- Each item is 6KB in size, and a read capacity unit is 4KB, so we have to use 2 reads for each item.
- In this case, the question specifies ‘eventual consistency’, which means each read capacity unit gives us two reads – ‘2 reads for our money’.
- So we have a total of (2 reads required * 10 items) / 2 ‘reads for our money’ (because it’s an eventually consistent read).
- This gives 20 / 2 , meaning we need to specify a read capacity of 10.
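The steps above can be sketched as a small helper. This is my own arithmetic check, not an AWS API – one RCU covers one strongly consistent read per second of up to 4 KB, and eventually consistent reads get two reads per RCU:

```python
import math


def read_capacity_units(items: int, per_seconds: int, item_size_kb: float,
                        eventually_consistent: bool = False) -> int:
    """Sketch of the RCU arithmetic from the worked example."""
    items_per_sec = items / per_seconds           # 300 / 30 = 10
    units_per_item = math.ceil(item_size_kb / 4)  # 6 KB -> 2 units of 4 KB
    total = items_per_sec * units_per_item        # 10 * 2 = 20
    if eventually_consistent:
        total /= 2                                # two reads per RCU
    return math.ceil(total)


# The worked example: 300 items / 30 s, 6 KB items, eventually consistent.
assert read_capacity_units(300, 30, 6, eventually_consistent=True) == 10
```

The same function with the default (strongly consistent) setting gives 20 RCUs – the consistency mode halves the cost, which is exactly what exam questions probe.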
Take your time to get your head around this. There’s documentation here to explain it, and it came up in the practice papers I did, and in the exam too.
I only covered read capacity here. Write capacity is very similar, with different limits. You can find out more from the official docs.
In terms of passing the exam, you’re going to want to think about Best Practices – since you’re being tested on those things. However, there’s a lot there. I’ve created another questions list to help you filter through what I think is important to understand. Again, if you have time, knowing more best practices than my questions list covers is encouraged.
Using the AWS SDK with DynamoDB
If you’re interested in the SDK and CLI operations, I would say go ahead, as it will assist you in the long run doing your job. However bear in mind it’s easy to dive deeper than the level actually required by the exam.
I’ll write a more in-depth article another time on how to maximise SDK learning. For now, if you are interested in using the SDK for ‘learning tests’, here’s how I used code to make assertions against the S3 SDK. You could do the same by extending the tests in this GitHub repo.
Using the AWS CLI with Dynamo
The CLI, however, has come up in the exam for me, and any self-respecting developer should be proficient with the command line. In this case, you could do worse than set up in the console, then use the CLI to manipulate the data.
See if you can answer some of the following questions with the CLI, and come up with some of your own:
- How do you use the CLI to restrict which items are returned, like specifying columns in an SQL result set? What happens if you don’t?
- How do you use projection expressions?
- How can you work with local and global secondary indexes using the CLI?
Tutorials and Resources
- There is a bare minimum table tutorial here if you want to just understand how to get started setting up tables etc in AWS – then build with the CLI.
- Building on that – there’s an AWS repo with DynamoDB SDK code you could look at in GitHub – but there are very few unit tests – perhaps you could include your own with ‘learning tests’?
- There is also a deep dive here from Whizlabs, which will take you through all aspects of DynamoDB. It’s helpful because all the resources are laid out in one place for you to gain mastery of the subject, and their content also has one eye on passing the exam.
Finally, we’ll move into DynamoDB Streams, DynamoDB Accelerator (DAX) and encryption at rest.
DynamoDB Misc Concepts
These only came up at a high level in the exam for me, but you should be across the how and why of them. Get an overview of streams here – you don’t need to understand them inside out.
DAX is another concept that only came up at a high level; however, when I did the practice papers, it tripped me up until I understood the architecture.
DynamoDB Encryption at Rest
Remember what I said earlier about managed services? Something to be aware of generally (a meta-concept) across AWS is the idea of managed services: AWS take away the ability for you to control instances directly/explicitly, in exchange for you not having to deal with operational burdens.
Compare an EC2 instance, which you can SSH onto and play with as something concrete, with something like S3, where you have no notion of any actual file system.
Some solutions also offer various degrees of control. I’d encourage you to read Encryption at Rest and just gain an understanding at a high level of the different ‘levels of control’ that can apply to encrypting your data.
Why is this important? Because sometimes in the exam you’ll be asked ‘what is the simplest way to...?’ and other times you’ll get a question like ‘as a developer you want to control...’, so you need to be aware that there are managed services, and that some services (like KMS) offer various degrees of control.
There is a lot to cover with DynamoDB, as you can see. If you want to get best coverage of the entire technology, and get in depth, then you could do worse than use the Whizlabs tutorial.
If you’re just after exam pointers, hopefully the question sheets I’ve provided are of use to you. I have a study plan offer where you can get all the study plan flashcard sets for free when you buy your exam papers through my Whizlabs link.
Remember that my question sets are generated around ‘meta’ principles. In other words – the questions will fall into categories like ‘Best Practice’, ‘Cost-Saving’, ‘Troubleshooting’, ‘Scalability’ and so on. Other times they will just be necessary ‘exam trivia’.
I hope you found this week useful. Please feel free to give feedback as ever. Next week we’ll be looking at Loose Coupling, Streams and Message Queueing.