Two sheets, what’s different?
I recently ended up checking two CSV files for differences, and even though in this particular instance the difference was obvious, the general problem interested me so I did some digging.
I exported a CSV file from komiser that lists all of the untagged AWS resources in the Tapadoo account. Looking at the inventory view in komiser I noticed the total number of resources grew each day. This confused me as we weren’t adding new resources each day. I assumed the issue was AWS running some nightly tasks, and this turned out to be the case. Each day AWS creates a new snapshot of our databases.
Let’s pretend …
… the files are too large to scan by eye. How would we find the differences? Well if you’re
well versed in the terminal you can use diff:
$ diff oct_17.csv oct_16.csv
309,310d308
< arn:aws:rds:eu-west-1:912214327102:snapshot:rds:dunkettle-wp-2024-10-17-00-08,AWS,Tapadoo,rds:dunkettle-wp-2024-10-17-00-08,RDS Snapshot,eu-west-1,[]," 0","{""serviceCost"":""0""}"
< arn:aws:rds:eu-west-1:912214327102:snapshot:rds:dunkettle-staging-2024-10-17-04-55,AWS,Tapadoo,rds:dunkettle-staging-2024-10-17-04-55,RDS Snapshot,eu-west-1,[]," 0","{""serviceCost"":""0""}"
That’s a nice and easy way to see the differences. The two lines 309 and 310 are not in oct_16.csv.
Terminal, schmerminal
I hear you, I hear you. Some of you don’t like using the terminal, it’s not part of your daily toolbox. Well buckle in, because I have a more complicated way to share with you.
We start by importing both CSV files into a Google Spreadsheet. Sheet 1 is Oct 16th, and Sheet 2 is Oct 17th. The column names you use will be determined by the size of your sheet. In this case we open Sheet 2 and in row 1 column J we add the following formula:
=ARRAYFORMULA(VLOOKUP(A:A, Sheet1!A:A, 1))
This does a vertical lookup in this sheets range A:A (all of column A), and compares the values in both.
If a match is found copy the value from index 1 (first cell) into column J.
Now if the values match you will see the value from column A repeated in column J, but if they don’t match
the value is set to #N/A a special Sheets value and if you hover over the cell you will see an error message.

This is noisy, can it be improved?
This is noisy, and we could improve it by not showing the values that match, and instead highlighting the row that doesn’t match.
To do this, I entered the following formula in row 1 column K:
=ARRAYFORMULA(IF(ISNA(J:J), "Different", ""))
This formula sets the value of H to Different if the value in J for that row is #N/A. Nifty! Now we can
hide the J column and to see what items differ we only need to look at the K column. In the case of simple
data this might not make much difference, but in the komiser case the ids can be very long strings which makes
scanning more difficult.
Yeah I read all that but I’m still scratching my head
I can’t guarantee your head scratching will be cured, but you can check out this video where I put what’s written above into action. The video also includes some bonus cell formatting using a custom formula.