Remove Duplicates
Evan Paulus
Removing duplicate entries from lists in Excel used to be a hassle, but with the introduction of the 'Remove Duplicates' feature in Excel 2007 and later, it is very simple.
Let's say you wanted to get a handle on your expenses so you went to your bank's website and exported your debits (expenses) for the last six months. Below shows the first seven records, although the list is likely to be several hundred lines long.
A | B | C | D | |
1 | Date | Store | Item | Cost (dollars) |
---|---|---|---|---|
2 | Monday | McDonald's | Lunch | 8.54 |
3 | Tuesday | Rah Sushi | Lunch | 23.45 |
4 | Tuesday | Target | Clothing | 13.21 |
5 | Wednesday | Kmart | Clothing | 45.41 |
6 | Wednesday | McDonald's | Lunch | 7.45 |
7 | Thursday | Circle K | Gas | 43.98 |
8 | Friday | McDonald's | Dinner | 12.32 |
There are numerous tools in Excel to help you analyze your data (Filter, PivotTables, Charts, etc.) but we're going to show you a simple technique to analyze how much you spent on each Item.
First, select the Item column (column C).
Next, copy it and paste it into column A of a new tab or workbook (let's call this tab 'Items' and the first tab 'Expense List').
With column A (the Items column) of the Items tab selected, go to the Data tab and select Remove Duplicates (note: depends on the version of Excel you're using, although most recent versions follow this path closely).
The Excel 2013 version looks like below:
Next, you will get the Remove Duplicates option screen, which asks you which columns you want to look for duplicates on. In this case, only one column is listed. If you had fed it a larger table with multiple columns, you could have checked or unchecked depending on which fields you wanted to search for exact duplicates. Also, your column has a column header ('Item') so make sure this option is checked.
And, finally, your new list looks like this:
A | |
1 | Item |
---|---|
2 | Lunch |
3 | Clothing |
4 | Gas |
5 | Dinner |
The sky is the limit as far as what you can do from here. We've described in other posts how to do simple SUMIF or AVERAGEIF functions, so let's do one of each of them here. We'll also add in a calculation on the median cost of each item, which uses an array function.
A | B | C | D | |
1 | Item | Total Cost | Average Cost | Median Cost |
---|---|---|---|---|
2 | Lunch | 39.44 | 13.15 | 8.54 |
3 | Clothing | 58.62 | 29.31 | 29.31 |
4 | Gas | 43.98 | 43.98 | 43.98 |
5 | Dinner | 12.32 | 12.32 | 12.32 |
In case you're interested, here are the formulas for the three columns:
B2:=SUMIF('Expense List'!$C$2:$C$8,$A2,'Expense List'!$D$2:$D$8)
C2:=AVERAGEIF('Expense List'!$C$2:$C$8,$A2,'Expense List'!$D$2:$D$8)
D2:{=MEDIAN(IF('Expense List'!$C$2:$C$8=$A2,'Expense List'!$D$2:$D$8))}
In summary, Remove Duplicates can be extremely useful when you're trying to summarize lots of data or present it in a cleaner table. Sure you could do a Pivot Table for much of this but sometimes it's just easier (and in many ways, cleaner) to write your own summary table this way.