Finding Duplicate MP3s in Your Mac OS X iTunes Library
I recently ran into a problem in my ongoing switch from Windows to Mac OS X. I decided to jump fully into using iTunes as my central music repository. In my manually organized MP3 folders, I had MP3s sorted by Artist and then Album, along with a playlist (M3U) file in each Album folder. This is a carry-over from using Winamp in Windows: to play a whole album in the correct order, I found it easy to just keep an M3U file in each directory. Anyway, when I imported my whole collection into iTunes under Mac OS X Leopard, it created duplicates of all of the files that had an associated playlist/M3U file. Ugh.
Correcting this issue is either a manual process (with some help from the View/Show Duplicates option in iTunes) or something you can buy shareware to deal with. The manual process wasn’t an option for me, as I have quite a number of MP3s, and now quite a number of duplicates. I’m not sure why Apple hasn’t written a better routine to deal with this issue, or added an option to ignore M3U files on import. iTunes is free, so I disagree with the idea of paying for the solution. I also had an issue with the amount of money some of the shareware authors were asking.
I found a few freebie solutions to the problem, but didn’t like the way they worked. With all credit to Adam Kalsey (who incidentally looks a lot like Adam Carolla in the picture on his blog), I was able to use the following method to fix the problem.
In a post on his blog, Adam Kalsey suggested using a Unix utility called Duff (DUplicate File Finder) to detect the duplicates. Unfortunately, being a bit of a newbie in Mac land, I didn’t have my system configured for compiling software, and didn’t know how to make Mac OS X cooperate 100%.
The first thing to do is prepare your Mac to compile software that is distributed in source form. To do this, insert your original Mac OS X install disc (in my case, Leopard 10.5) and open it in the Finder.
Double-click on the Optional Installs folder.
Then double-click on the Xcode Tools folder.
Double-click on XcodeTools.mpkg and follow the installer through to install the Mac development tools needed for compiling from source code.
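To confirm the tools actually landed, a quick check from the Terminal can help. This is only a minimal sketch: it assumes gcc and make are the tools the Duff build needs, and the function name missing_tools is made up for illustration.

```shell
# Print the name of every requested tool that is NOT on the PATH.
# (missing_tools is a hypothetical helper, not part of Xcode or Duff.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# No output means both tools were found
missing_tools gcc make
```

If either name is printed, the Xcode Tools install probably did not complete.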
Once this was done, I tried compiling Duff, but ran into some issues. Basically, ‘make install’ was failing with the following error:
/bin/sh: /Users/tgeorge/Desktop/duff-0.4/install-sh: Permission denied
make[1]: *** [install-binPROGRAMS] Error 126
make: *** [install-am] Error 2
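Error 126 here is make passing along the shell's exit status, and 126 is what a shell returns when a command is found but cannot be executed. The sketch below reproduces that with a throwaway script (demo.sh and the temp directory are made up for illustration and stand in for install-sh); it is why the chmod step further down does the trick.

```shell
# Throwaway script standing in for install-sh (demo.sh is a made-up name)
tmp=$(mktemp -d "${TMPDIR:-/tmp}/err126.XXXXXX")
printf '#!/bin/sh\necho "install step ran"\n' > "$tmp/demo.sh"

# Without the execute bit, the shell refuses to run the script
chmod 644 "$tmp/demo.sh"
status=0
"$tmp/demo.sh" 2>/dev/null || status=$?
echo "exit status without execute bit: $status"

# With the execute bit set, the same script runs normally
chmod 755 "$tmp/demo.sh"
result=$("$tmp/demo.sh")
echo "$result"
```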
Here’s the method I used to get Duff working. First, download the latest duff-0.x.tar.gz from the Duff site to the Desktop on your Mac. Then open the Terminal (hint: Applications/Utilities/Terminal, if you don’t know where it is).
Type the following:
cd ~/Desktop
(obviously replace with the proper version/filename if it’s different)
tar -xzvf duff-0.4.tar.gz
cd duff-0.4
(you shouldn’t see any errors from the output)
./configure
make
(this is the key to avoiding the dreaded Error 126 from above)
chmod 755 install-sh
make install
Now Duff should be working properly. To check, try the following:
duff -h
You should get the Duff help and command line switches. If so, awesome, you can continue using Adam’s method, which I will copy below (with some modifications to make it work with Mac).
duff -r /Volumes/music/ > duplicatemusic.txt
This will take quite some time depending on how many songs you have. Mine took almost 30 minutes, during which there is no screen output, as the results are being written to a file instead.
Also, before running the following command, which will DELETE the MP3 files from your disk, I would strongly recommend opening the duplicatemusic.txt file in your favorite text editor and MAKE SURE all of the files listed are OK to delete. We can be pretty sure they are duplicates at this point, but we don’t know if you may have them duplicated on purpose, or whatever. Also, keep an eye out for files that may have been originally named ending in ‘1.mp3’ that you may wish to keep (like anything part 1.mp3, or 2001.mp3, stuff like that), as that is what the following command looks for in order to determine what files to delete.
You can also run cat duplicatemusic.txt | grep '1 1.mp3'. This will show you where you might run into issues during the deletion process.
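For a stricter check, one option is to list only the candidates whose un-numbered original still sits next to them. This is a sketch under an assumption: that iTunes named each duplicate by adding " 1" before the extension, so "Song 1.mp3" is a copy of "Song.mp3". The function name safe_candidates is made up for illustration.

```shell
# Print only the "X 1.mp3" entries for which "X.mp3" also exists on disk.
# $1: the duff output file (one path per line).
# (safe_candidates is a hypothetical helper, not part of duff.)
safe_candidates() {
    grep ' 1\.mp3$' "$1" | while IFS= read -r dup; do
        # Strip the trailing " 1.mp3" and put ".mp3" back
        orig="${dup% 1.mp3}.mp3"
        if [ -f "$dup" ] && [ -f "$orig" ]; then
            printf '%s\n' "$dup"
        fi
    done
}
```

Running safe_candidates duplicatemusic.txt deletes nothing; it only prints paths. Anything matching 1.mp3 that it does not print (say, a lone "Part 1.mp3") deserves a manual look before running the deletion command.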
cat duplicatemusic.txt | grep '1\.mp3$' | tr '\012' '\000' | xargs -0 rm
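The tr and xargs -0 part of that pipeline is there because music filenames are full of spaces: turning each newline into a NUL byte lets xargs hand every line to rm as a single argument. A small demonstration on throwaway files (the temp directory and file names are made up for illustration):

```shell
# Scratch directory with an "original" and a "duplicate", both with spaces
demo=$(mktemp -d "${TMPDIR:-/tmp}/xargs.XXXXXX")
touch "$demo/My Song.mp3" "$demo/My Song 1.mp3"
printf '%s\n' "$demo/My Song 1.mp3" > "$demo/list.txt"

# Newlines become NUL bytes, so xargs -0 passes each line to rm whole
tr '\012' '\000' < "$demo/list.txt" | xargs -0 rm

# Only the original (and the list file) should be left
ls "$demo"
```

Without the -0 handling, xargs would split "My Song 1.mp3" at the spaces and rm would be asked to delete three files that don't exist.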
You also might want to run the commands again looking for 2.mp3, 3.mp3, and up (in case there are multiple duplicates out there). There ya go, a free way to remove those nasty duplicates from your iTunes library.
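To see which of those extra runs are worth doing, a rough count per suffix can help. This sketch again assumes duplicates are named with a space and a single digit before .mp3; count_numbered is a made-up name.

```shell
# For each digit 1-9, report how many listed paths end in " <digit>.mp3".
# $1: the duff output file. (count_numbered is a hypothetical helper.)
count_numbered() {
    for n in 1 2 3 4 5 6 7 8 9; do
        # grep -c exits non-zero on zero matches, hence the || true
        count=$(grep -c " $n\.mp3\$" "$1" || true)
        if [ "$count" -gt 0 ]; then
            printf '%s.mp3: %s\n' "$n" "$count"
        fi
    done
}
```

Calling count_numbered duplicatemusic.txt prints one line per suffix that actually occurs, so you know how far up you need to go.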
A quick update: the above procedure doesn’t clean the iTunes database itself, so the library will be left with broken entries that point to the deleted files. To clean these up, check out this post on Paul Mayne’s blog.