Find duplicate copies of files October 8, 2005

Posted by Carthik in applications, ubuntu.

fdupes is a command-line program for finding duplicate files within specified directories.

I have quite a few mp3s and ebooks and I suspected that at least a few of them were copies – you know – as your collection grows by leaps and bounds, thanks to friends, it becomes difficult to individually check each file to see if it is already there on your computer. So I started looking for a script that checks for duplicate files in an intelligent fashion. I didn’t find the script but I did find fdupes.

fdupes compares files by size and MD5 hash, then confirms each match with a byte-by-byte comparison, so the files it reports as duplicates really are identical. I ran it recursively on the directory that contains my files (which makes it check for duplicates across the different directories within the specified directory) and saved the output to a file by doing:
$ fdupes -r ./stuff > dupes.txt
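
To give a feel for the hashing step, here is a rough sketch using GNU coreutils. It is not how fdupes is implemented, and the function name list_hash_dupes is made up for this example. It assumes GNU md5sum, sort and uniq, and file names without newlines or backslashes.

```shell
# list_hash_dupes DIR: print groups of files under DIR that share an MD5 sum.
# Hypothetical helper, not part of fdupes. Groups are separated by blank lines.
list_hash_dupes() {
    find "$1" -type f -exec md5sum {} + |
        sort |
        uniq -w32 --all-repeated=separate   # compare only the 32-char hash
}
```

Calling list_hash_dupes ./stuff prints each hash next to its file name. Unlike fdupes, this sketch stops at the hash and does no final byte-by-byte check.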

Then, deleting the duplicates was as easy as checking dupes.txt and deleting the offending copies. fdupes can also prompt you to delete the duplicates as it goes along, but I had way too many files and wanted to do the deleting at my own pace. The deletion function is useful if you are only checking for duplicates in a given directory with a few files in it.

Comments»

1. Pascal - October 8, 2005

What do you know, you learn something new every day. Thanks for this. :-)

Pascal

2. Donncha - October 10, 2005

You should also try FSlint by Pádraig Brady. It displays duplicate files in a nice GUI.
I’ve used it for years and it’s dead useful!

3. Alan Wo - October 10, 2005

Although MD5 is more advanced than CRC, hash values can still collide. NoClone uses true byte-by-byte comparison to avoid such cases: http://noclone.net

4. ubuntonista - October 10, 2005

Thanks Donncha, Alan. I think I will try out fslint and noclone soon – the next time I want to clean out my collection of files.

5. Andrew - October 10, 2005

Hey, Alan the NoClone program requires Windows. Why would you post a link to a Windows program on a Linux blog?

6. Dave - October 21, 2005

I have been looking for a good linux tool for this all night. Great! Can’t wait to try it when I get in front of my ubuntu boxes.

7. Justin Byron - October 28, 2005

what are you using to read ebooks?

8. ubuntonista - October 28, 2005

Justin,

Some of them are pdf versions of books, like some O’Reilly books. Some are comics, which I read using Comical. Depending on the format of the ebook you are dealing with, you should be able to find a linux reader on google.

9. Georges - January 29, 2006

fdupes made errors (too many open files) on my huge harddisk.

10. Danny - March 20, 2006

It is rather handy that the fslint site has an RPM, a .deb as well as a tarball. Trying it out on openSUSE with the pre-built RPM, it requires the RPMs pygtk and pyglade, which are actually listed under the python-gtk RPM in SUSE. It’s a shame the RPM was not built by file.
I might (depending on success/failure of ignoring the warnings about conflicts for this package under YaST) build a new RPM using CheckInstall – and submit that as feedback for the guy (or pop it on RPMBone).

The GUI itself loaded no problems from the RPM, despite the warnings. After some serious disk thrashing – problem solved.

I had spent some time during my weeks without internet (different story), trying to figure out scripts to do this, and found it a harder problem than it seemed. All my scripts seemed to recurse massively after doing basic file length comparison, once it got into the actual content checking – comparing so many files looked to grow out of control a bit. So my hat off to the chaps behind fslint.

11. COI - April 14, 2006

Simple and perfect!
I’ve tried 5 different tools under Windows without finding a good solution. I’m definitely happy to use a Linux box, and thank you for being so easy to understand.

12. Albert Lash - October 27, 2006

This rocks. Along these lines, I’ve found command line tools are indispensable when dealing with large numbers of files. Here’s a trick to count the number of files within a directory:

ls -1Ra | wc | awk '{printf("There are %s files in this directory!\n",$1-2)}'

jdu - August 10, 2009

ditto Peter Basil
and this also does not work if a file name or directory contains a space.

jdu - August 10, 2009

example:

jdu@igneous:~$ mkdir test
jdu@igneous:~$ cd test/
jdu@igneous:~/test$ touch a b 'c d'
jdu@igneous:~/test$ ls -1
a
b
c d
jdu@igneous:~/test$ ls -1Ra | wc | awk '{printf("There are %s files in this directory!\n",$1-2)}'
There are 4 files in this directory!
jdu@igneous:~/test$

13. Pádraig Brady - October 31, 2006

Danny thanks for the feedback on the FSlint rpm.
It’s a pity that the package names are different among distributions. A quick look around suggests the following
should be the dependencies:

fedora/redhat: pygtk2-libglade, pygtk2
mandriva: pygtk2.0-libglade, pygtk2.0
opensuse/suse: python-gtk

One can’t create an RPM to check package1 | package2.
The next best thing, I think, is to automatically support the correct
dependencies when built from the source RPM.

I’ve done this for fedora and mandriva as of 2.16,
so I’ll look at supporting [open]suse also.

thanks.

14. Pádraig Brady - October 31, 2006

Hi Albert. Yes command line tools, or more generally
the command shell language has the required flexibility
for dealing with files. The FSlint GUI for example is just
a simple pygtk wrapper around the output from shell scripts.

One can invoke the shell scripts directly by adding
the fslint scripts directory to the path like:
export PATH="$PATH:/usr/share/fslint/fslint"

Then you can do `findup --help` etc.

Note a more robust/accurate/fast version of the example
you gave above is: printf "There are %'d files in this directory\n" `find | wc -l`

You might find the following of use:

http://www.pixelbeat.org/cmdline.html

15. Chris Bergeron - December 2, 2006

Robert – You could also just use:
\ls -l | wc -l
to count files in a directory.

16. Skuggi - January 6, 2007

FSlint rocks! This is a handy tool for all my pictures.
Thanks Brady!

17. Alexander Fedoseev - May 7, 2007

If you need to search for similar music and graphics files on Windows, you can use this duplicate file finder.

18. Alan - June 14, 2007

Andrew:
>Why would you post a link to a Windows program on a Linux blog?
Because you can use WINE to emulate the Windows program.
Because some folks use Linux and Windows simultaneously.
Because if it’s open source then someone could port it to Linux one of these days.
Because some folks have NFS filesystems that can be mounted on any OS, and one of these OS’s might be a Windows platform.
Because a Windows user googling ‘find duplicate copies of files’ might find this page, saving them perhaps a couple of minutes of solution searching time.
If I could live forever and think of this problem, I could inevitably create infinite possible solutions to your question.

-Alan

Sri - December 1, 2009

Nice answers…

19. Jim Richardson - June 16, 2007

Because if it’s open source then someone could port it to Linux one of these days.

noclone isn’t open source

Thanks for the mention of fdupes, perfect timing, as I needed to clean out a bunch of stuff, and fdupes is in Ubuntu’s repository.

20. Encontre arquivos duplicados no Ubuntu « elyezer.zero - July 10, 2007

[...] of them, larger than 1GB. I decided to ask Google whether it knew of anything and found this blog: Find duplicate copies of files, and in a comment I found the [...]

21. sebastianstucke - September 3, 2007

Related to this is deleting duplicated files (in my case desktop.ini and thumbs.db).
I wrote a howto for deleting these files recursively:
find it here: http://en.tuxero.com/2007/09/how-to-delete-useless-windows-files-in.html
Cheers!

22. Demetrio - November 14, 2007

I was wondering if you could add a size option to your “very useful” program. Sometimes we just can’t waste time with small files.
Thanks

23. endolith - December 15, 2007

Fslint is pretty nice, but the interface is not very useful. It requires you to delete each file by hand. Even this free simple Windows program is better: http://www.geocities.com/hirak_99/goodies/finddups.html

24. elyezer.zero » Blog Archive » Encontre arquivos duplicados no Ubuntu - January 17, 2008

[...] of them, larger than 1GB. I decided to ask Google whether it knew of anything and found this blog: Find duplicate copies of files, and in a comment I found the [...]

25. Marcis - January 24, 2008

This one is quite useful. You never know every useful utility there is. Thanks.

26. tripmix - February 23, 2008

This is gonna take a while… 15min and still at zero %. At least it’s at [317/605437] so I know it’s moving :P Thanks for the tip, just what I was looking for. I could just apt-get it from debian sid by the way.

27. maggie stewart - March 22, 2008

Have to agree with endolith.

Fslint has zero usable functions for removing duplicate files. Toggling between Select All and Select None serves little purpose on its own!

Plus it seems to only compare filenames, not file contents, returning multiple false positives.

fdupes + shell script wins hands down

28. Vuorovaikutus » Arkisto » Tiedostojen turhien kaksoiskappaleiden haku ja poistaminen - March 24, 2008

[...] of the FSlint application, Applications → System Tools [...]

29. Stev - March 25, 2008

exactly what I was after many thanks

30. uweiss - April 9, 2008

while IFS= read -r line; do rm -f -- "$line"; done < dupes.txt

This command removes ALL files listed in the generated dupes.txt file (be sure to first delete the lines for any files you want to keep).
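
The manual pruning step can be partly automated. As a sketch only: fdupes separates each group of identical files with a blank line, so the function below (extra_copies is a made-up name) prints every path except the first one in each group. It assumes file names contain no newlines.

```shell
# extra_copies FILE: print every path except the first of each
# blank-line-separated group in FILE (the layout that fdupes -r writes).
# Hypothetical helper; it only prints paths and deletes nothing.
extra_copies() {
    awk -v first=1 '
        $0 == "" { first = 1; next }   # blank line: a new group starts next
        first    { first = 0; next }   # first path of the group is kept
                 { print }             # remaining paths are the extra copies
    ' "$1"
}
```

Running extra_copies dupes.txt > to_delete.txt gives a list that can be reviewed before it is fed to a removal loop like the one above.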

31. Miquel - June 17, 2008

On FSlint you can select by groups -> all but newer, for example; it’s the best selection system I’ve ever seen. Don’t judge the app before you read the manual :p

Very nice, very useful.

32. Alan Milnes - July 26, 2008

Any ideas on how to not just delete the dups but replace them with a symlink to the original?
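
One possible approach, offered as an untested-in-production sketch rather than a feature of fdupes: read the saved fdupes output group by group, keep the first file of each group, and replace the others with symlinks to it. The name link_dupes is invented here, and the sketch assumes GNU realpath, cmp and ln, and file names without newlines. It re-checks the contents with cmp before touching anything, in case a file changed after the list was made.

```shell
# link_dupes FILE: for each blank-line-separated group of paths in FILE,
# keep the first path and turn the identical others into symlinks to it.
# Hypothetical helper; assumes GNU realpath and no newlines in file names.
link_dupes() {
    keep=""
    while IFS= read -r path || [ -n "$path" ]; do
        if [ -z "$path" ]; then
            keep=""                          # blank line ends the group
        elif [ -z "$keep" ]; then
            keep=$(realpath -- "$path")      # first file of the group is kept
        elif [ "$(realpath -- "$path")" != "$keep" ] &&
             cmp -s -- "$keep" "$path"; then
            ln -sf -- "$keep" "$path"        # replace the copy with a symlink
        fi
    done < "$1"
}
```

It would be called as link_dupes dupes.txt. Trying it on a copy of the data first seems wise, since the replaced files are gone once the symlinks exist.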

34. Mike - November 1, 2008

There are issues with this. As previously mentioned, an MD5 hash has a chance of a collision. That means you might end up deleting files that are unique. Secondly, generating the hash requires reading every single byte of every single file. This is time consuming. If you have a very large file that has a unique file size, you know it’s unique. The best way to do this is to generate a table of files with their size, sort the table based on size, throw out the files that have a unique size, and then just compare files that have the same size.

Jozsef - December 25, 2009

That’s right. But can you give us the commands we should run in a terminal, with some details and explanation of how to do all those things?
Theory is OK but we need the commands :)
Thanks.
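
For what it is worth, the size-first idea can be sketched with standard GNU tools. This is one possible reading of the steps described above, not an established script: size_then_hash is a made-up name, and the sketch assumes GNU find, xargs, md5sum and uniq, and file names without newlines or backslashes.

```shell
# size_then_hash DIR: list files by size, drop those with a unique size,
# hash only the rest, and print groups that share an MD5 sum.
# Hypothetical helper; groups are separated by blank lines.
size_then_hash() {
    find "$1" -type f -printf '%s\t%p\n' |
        sort -n |
        awk -v prev=-1 '{
            tab  = index($0, "\t")
            size = substr($0, 1, tab - 1)
            path = substr($0, tab + 1)
            if (size + 0 == prev) {          # same size as the previous file
                if (held != "") print held   # emit the first of the run once
                held = ""
                print path
            } else {                         # new size: hold until a twin appears
                prev = size + 0
                held = path
            }
        }' |
        tr '\n' '\0' |
        xargs -0 -r md5sum |
        sort |
        uniq -w32 --all-repeated=separate
}
```

Calling size_then_hash ./stuff only reads the contents of files that share their size with at least one other file. A final cmp on each reported pair would rule out hash collisions.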

35. Encuentra Archivos Duplicados en Ubuntu con fdupes « instalaches - December 11, 2008

[...] [Via] Ubuntu blog – Find duplicate copies of files [...]

36. Martin - December 11, 2008

Wow, even more than 3 years later this information is proving very useful. Thank you very much!

37. Peter Basil - February 5, 2009

Albert, ls -1Ra | wc | awk '{printf("There are %s files in this directory!\n",$1-2)}'

does not always work. If the directory has subdirectories, the count is wrong because it also includes folders.

You rather need:
find . -type f | wc -l
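
A variant that should also survive file names containing spaces or even newlines, as a small sketch: have find end each name with a NUL byte and count the NUL bytes. The function name count_files is invented for this example, and it assumes a find that supports -print0 (GNU and BSD find both do).

```shell
# count_files DIR: count regular files under DIR, including names that
# contain spaces or newlines. Hypothetical helper name.
count_files() {
    # One NUL byte is printed per file; keep only the NULs and count them.
    find "$1" -type f -print0 | tr -cd '\0' | wc -c
}
```

For example, count_files . counts the regular files in the current directory and everything below it.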

38. rtfm - March 26, 2009

@those talking about md5 collisions ..

From the fdupes man page

DESCRIPTION
Searches the given path for duplicate files. Such files are found by comparing file sizes and MD5 signatures,
followed by a byte-by-byte comparison.

39. Nobody - April 4, 2009

I’d take a look at Komparator; it does hash and binary comparison.

40. alfredo - April 20, 2009

md5 is known to have issues. I have just finished creating a tool that uses sha-224 as a checksum tool to find duplicates in a given directory:

http://code.google.com/p/liten2

41. Allan Cass - April 24, 2009

For windows I use Fast Duplicate File Finder…very nice free tool

42. nineowls - May 18, 2009

fslint is one way to find and eliminate duplicates……

3 easy steps to resolving the hassle of manual duplicate file cleanup in your iTunes library, thanks to fslint-gui
……

45. shafiq - October 14, 2009

much obliged!!

46. DeRose - November 16, 2009

This is a very useful tool for deleting duplicate files from the system; I use Duplicate Finder 2009.

47. Peter Potrowl - January 10, 2010

Thanks a lot for this tip!

48. IgnorantGuru - January 13, 2010

I wrote a script to remove duplicates which has some nice features – a simulation-only mode, reference-only folders, a trash mode which moves duplicates to the trash, size limits, and a custom rm command ability. You can see the details and download it here…

http://igurublog.wordpress.com/downloads/script-rmdupe/

I figured there were other tools to do this but I wanted to write my own with the features I wanted. It has worked well for me. It also does a full compare, not just checksums (which as one person pointed out can result in false matches). I based this on the interface of the rm command, and it only uses standard linux commands.

fslint also looks good, but sometimes a command line approach is helpful.

49. Berita Terbaru - January 21, 2010

Thanks for your tips, but I gave FSlint a try as comment #2 (Donncha) suggests. It’s a lightweight app (only about 100 KB) with a simple, user-friendly GUI, but powerful!
Thanks to both of you :)

50. Matthias Ronge - March 6, 2010

Thank you for posting this. fdupes is actually in my distribution, but was not installed. I would never have found it without your hint.

54. find duplicate files using bash script? - June 8, 2010

[...] You can have a look at this example using script, this one using fdupes or this one using fslint. All of this I found using Google in 0.31 seconds. It took [...]

55. jacky - July 2, 2010

True byte-by-byte comparison to avoid these cases: http://www.ashisoft.com

57. rtra - July 18, 2010

FDupes uses md5sums *and then* a byte by byte comparison to find duplicate files within a set of directories. It has several useful options including recursion.

58. M - August 29, 2010

Fdupes is very nice. I would, however, like to scan several external HDs where I store backups and photos. Is there a GUI? Any suggestions?

59. Kristian - September 12, 2010

Nice find

61. Garvin Timmann - November 2, 2010

Very useful, so using this now.

simple to install from synaptic package manager.

thanks

Garvin Timmann – PR International Ltd
3 Kingley Park, Station Road, Kings Langley, Hertfordshire, WD4 8GW, UK
Tel: +44 (0) 1923 270508                     Fax: +44 (0)1923 269134
web: http://www.printernational.co.uk    skype: printernational
Co.Reg: 1785226 England/ Wales  VAT No: GB 449 4437

62. alan jader - December 2, 2010

try this http://www.dublicatefilesdeleter.com/ very nice tool to remove any duplicate

64. Friend - February 18, 2011

there’s some strange comments here.. looks like the SPAM bots are testing your blog.. be afraid. Soon this page could be filled with URL links to dodgy sites unless you fix the comment posting system.

65. Odin Hørthe Omdal / Velmont - March 1, 2011

Yeah. Remove the spam. And the Windows programs, it makes it easier to use this as a quick guide ;-)

Ah, or instead, just remove the spam, and let the comments be, but also mention FSlint from the comments, it looks really nice.

66. sam - March 31, 2011

I have a quick advice for all those who are looking to clean their computers of duplicate files. Do not delete any system file which is marked as duplicate. I used a duplicate files finder to do this and my system crashed. Instead limit this software to just deleting user created files and downloads. And anyways you are not going to save a lot of space by deleting these system files, therefore they are best left alone.

67. Sahib - April 12, 2011

There is also ‘rmlint’ ( https://github.com/sahib/rmlint ),
which beats fdupes in terms of speed, options and scriptability.

It outputs a log and a ready to use script, which is more useful than plain output.

68. ukrayna vizesi - July 30, 2011

GOOD

69. arkadaş - August 5, 2011

Thanks for the information, well done.

71. stickyc - September 21, 2011

Something to be aware of (since this site came up high on a Google search): FDupes apparently *does not compare filenames*. Only sizes/hashes. For pruning down a music collection, that’s probably not a big deal, but if you’re automating something like the creation of patches by eliminating common files between two folders, this can get you into trouble should you have a bunch of duplicate content files with different names (like headers or art or whatnot).

73. Finding duplicate files on Linux. | Steve's Blog - November 14, 2011

[...] http://embraceubuntu.com has links to lots of useful programs. It’s an old blog entry, but still very useful. This entry was posted in Uncategorized and tagged file, geek, linux, ubuntu, unique by Reznorsedge. Bookmark the permalink. [...]

76. Mark Bun - June 19, 2013

hello there, I’m using “DuplicateFilesDeleter” a great tool for finding and deleting files.

77. Dupes - June 28, 2013

I use this free tool to Find Similar Files
Give it a try…it provides impressively good results.

79. rm -rf, Delete Files & Folder Server via SSH | Linux Fun - June 22, 2014

[…] Find duplicate copies of files | Ubuntu Blog embraceubuntu .com /2005/10/08/find-duplicate-copies-of- filestrackback . fdupes is a command … do rm -f “${line}”; done. … This is very useful tool to delete duplicate files from the system, i use duplicate finder 2009 … […]

81. odwiedź - July 27, 2014

It’s exactly for posts like this that I love reading your blog!

88. Raras - September 2, 2014

Thank you very much.
I’m a new Linux user and still trying to learn about this OS; it’s fun and interesting.

89. Raras - September 2, 2014

I’m a new Ubuntu user,
still trying to learn more about this great open source OS.
Thank you.

90. comment-8521 - September 20, 2014

A very good article, factual and easy to read.

91. bardy - September 30, 2014

You’ll find DuplicateFilesDeleter a useful, very nice tool; it works for sure :) cheers

92. check - November 20, 2014

Superb article, Saved to bookmarks – I have to show it to my friends

