Humans Can Code Too: Idempotency
Humans Can Code Too is a content series aimed at anyone who is not coming from a traditional computer science background and is trying to gain a better understanding of software development. This includes those who are just getting started in a programming career, as well as business people who just want to understand what the hell developers are talking about.
It’s a scary term
Idempotency is a very intimidating word to have someone drop on you in a Pull Request or code review. It sounds super sophisticated and scientific, mathematical even.
The word “idempotency” does come from higher math. However, what a software developer means when they use the word is nothing super complicated.
I’ll give it my own definition
Idempotency – n
When a piece of code can be run multiple times with the same input data without causing undesirable consequences.
Joe Cannatti (ME)
In practice, this has to do with what’s going to happen if we find ourselves in a situation where we don’t know if a unit of work was run or not. If the code is idempotent, we can simply run it again. If it’s the second (or 10000th) time it’s been run, that’s fine. It doesn’t cause any problems.
This most often comes up in message-passing based systems and background jobs because those systems involve queues of work. Those queues can result in running code multiple times for a bunch of different reasons. That said, it can be an important feature in almost any situation.
At a conceptual level, here’s an example of a non-idempotent way to think about a unit of work followed by an idempotent version.
Pay $100 to the user whose ID is 123 (not idempotent)
Pay out the money for the transaction with ID 456 (idempotent! Great job)
So, what’s the difference between those two?
In the first case, there’s no way to know if the job is a duplicate or not, so if we run that job 3 times, we’re going to send that user $300. Businesses usually don’t want to give away extra money.
In the second case, on the first run of the code it can say, “Yep, we haven’t sent that money yet, I’ll do it now”, and then on every subsequent run it can say, “Hey, I already did that, I’ll just do nothing”. So you can run that job 1 to infinite times and get the same result.
Example: Pre-calculating Review Count
def increment_review_count(user:)
  user.review_count += 1
  user.save!
end
That’s not idempotent! If you run that code 100 times, you’re going to increase the review count by 100!
Let’s try a safer example:
def calculate_review_count(user:)
  user.review_count = user.reviews.count
  user.save!
end
Ah, sweet relief. That one is way better. If you run that 100 times you’re still going to get the right result. The drawback in this example is that it’s also going to be much slower since it needs to read the reviews from the database.
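To see the difference without a database, here is a minimal sketch in plain Ruby. `ReviewUser` is a hypothetical in-memory stand-in for the `User` model above, and the methods take a positional argument and skip `save!` only to keep the sketch self-contained.

```ruby
# Hypothetical in-memory stand-in for the article's User model (no database).
ReviewUser = Struct.new(:reviews, :review_count)

# Not idempotent: every run changes the stored count.
def increment_review_count(user)
  user.review_count += 1
end

# Idempotent: the stored count depends only on the reviews that exist.
def calculate_review_count(user)
  user.review_count = user.reviews.size
end

user = ReviewUser.new(["Great!", "Meh", "Loved it"], 3)

# Simulate a job that accidentally runs three times.
3.times { increment_review_count(user) }
incremented = user.review_count # 6, even though there are only 3 reviews

# Run the idempotent version three times as well.
3.times { calculate_review_count(user) }
recalculated = user.review_count # 3, matching the real number of reviews
```

Notice that the recalculating version also repairs the damage done by the duplicate increments, which is a nice side effect of deriving the value from the source data.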
def payout(user:, amount:)
  Xtripe.send_payment(id: user.xtripe_id, amount: amount)
end
As we discussed earlier, this is no good because there’s no way for this code to make sure this money hasn’t already been sent. We can use a transaction record to make this idempotent.
def payout(transaction:)
  if transaction.should_be_payed?
    Xtripe.send_payment(id: transaction.user.xtripe_id, amount: transaction.amount)
    transaction.mark_as_paid!
  end
end
transaction.should_be_payed? makes all the difference. That method can check if the transaction has already been paid out, as well as check for other conditions.
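Here is a rough, runnable sketch of that idea in plain Ruby. `FakeGateway` and `FakeTransaction` are hypothetical stand-ins for the payment provider and the transaction record, and passing the gateway in as an argument is an assumption made only so the sketch can count how many payments were sent.

```ruby
# Hypothetical stand-in for the payment provider; it only records calls.
class FakeGateway
  attr_reader :payments

  def initialize
    @payments = []
  end

  def send_payment(id:, amount:)
    @payments << { id: id, amount: amount }
  end
end

# Hypothetical in-memory transaction record that remembers whether it was paid.
class FakeTransaction
  attr_reader :user_id, :amount

  def initialize(user_id:, amount:)
    @user_id = user_id
    @amount = amount
    @paid = false
  end

  # In this sketch the only condition is "not paid yet".
  def should_be_payed?
    !@paid
  end

  def mark_as_paid!
    @paid = true
  end
end

def payout(transaction:, gateway:)
  # Do nothing if this transaction was already paid out.
  return unless transaction.should_be_payed?

  gateway.send_payment(id: transaction.user_id, amount: transaction.amount)
  transaction.mark_as_paid!
end

gateway = FakeGateway.new
transaction = FakeTransaction.new(user_id: 123, amount: 100)

# Simulate the same job being delivered three times.
3.times { payout(transaction: transaction, gateway: gateway) }

total_sent = gateway.payments.sum { |payment| payment[:amount] }
```

After three runs, the gateway has recorded a single payment of 100 rather than three.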
Example: Backup to S3 (cloud-based file storage)
def backup_to_s3
  User.all.each do |user|
    S3::File.write(name: user.id, content: user.to_json)
  end
end
The problem with this one is that if the job dies partway through, there’s no way to know where it left off without checking in S3.
One way to fix it is to record a timestamp:
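def backup_to_s3
  # Include users that have never been backed up (NULL timestamp) as well as stale ones.
  User.where('last_backed_up IS NULL OR last_backed_up < ?', 1.week.ago).each do |user|
    S3::File.write(name: user.id, content: user.to_json)
    # Record the timestamp only after the write succeeds, so a crashed run resumes here.
    user.last_backed_up = Time.now
    user.save!
  end
end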
We could run this job over and over again and it’s never going to do anything unexpected! That’s what idempotency is all about!
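To make the "job dies partway through" case concrete, here is a small sketch in plain Ruby. A hash plays the role of S3, `BackupUser` is a hypothetical stand-in for the `User` model, and the `fail_on_id` parameter exists only to simulate a crash. Treat it as an illustration of the timestamp approach, not a real S3 client.

```ruby
require "json"

# Hypothetical in-memory stand-in for the User model.
BackupUser = Struct.new(:id, :name, :last_backed_up)

ONE_WEEK = 7 * 24 * 60 * 60 # in seconds

# Backs up every user whose last backup is missing or older than a week.
# Returns how many users were due when the run started.
def backup(users, bucket, now: Time.now, fail_on_id: nil)
  due = users.select do |user|
    user.last_backed_up.nil? || user.last_backed_up < now - ONE_WEEK
  end

  due.each do |user|
    raise "simulated crash" if user.id == fail_on_id

    bucket[user.id] = JSON.generate({ id: user.id, name: user.name })
    # Stamp the user only after the write succeeded.
    user.last_backed_up = now
  end

  due.size
end

users = [1, 2, 3].map { |id| BackupUser.new(id, "user#{id}", nil) }
bucket = {} # stands in for S3

# First run dies when it reaches user 3; users 1 and 2 are already stamped.
begin
  backup(users, bucket, fail_on_id: 3)
rescue RuntimeError
  # A real job runner would retry the job later.
end

second_run = backup(users, bucket) # only user 3 is still due
third_run = backup(users, bucket)  # nothing left to do
```

The second run picks up exactly where the crashed run stopped, and the third run finds nothing to do.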