Find duplicate files in Linux

Let’s say you have a folder with 5000 MP3 files you want to check for duplicates. Or a directory containing thousands of EPUB files, all with different names, but you have a hunch some of them might be duplicates. You can cd into that folder in the console and run:
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
This outputs groups of files that are duplicates according to their MD5 hash. It first narrows the search to files that share a size with at least one other file, then hashes only those.
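If the one-liner is hard to read, the same idea can be sketched as a small shell function. This is a simplified variant, not the exact command above: it skips the size prefilter and hashes every non-empty file, which is slower on large trees but easier to follow. The function name list_duplicates is made up for illustration, and the sketch assumes the GNU versions of find, xargs, md5sum and uniq.

```shell
# list_duplicates DIR (hypothetical helper): print groups of files under DIR
# that share the same MD5 hash. Assumes GNU find, xargs, md5sum and uniq.
list_duplicates() {
    dir=${1:-.}
    # 1. Collect every non-empty regular file, NUL-separated so odd names survive.
    # 2. Hash each file; md5sum prints "<32-char hash>  <path>".
    # 3. Sort so identical hashes end up on adjacent lines.
    # 4. Compare only the first 32 characters (the hash) and keep repeated lines,
    #    with a blank line between groups.
    find "$dir" -type f -not -empty -print0 \
        | xargs -0 -r md5sum \
        | sort \
        | uniq -w32 --all-repeated=separate
}
```

Calling list_duplicates with a folder path (or with no argument for the current folder) prints each set of identical files as a block of md5sum lines, with blank lines between sets.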
Another way is to install fdupes and run:
fdupes -r ./folder > duplicates_list.txt
The -r option makes fdupes recurse into subdirectories. Afterwards, open duplicates_list.txt in a text editor to see the list of duplicate files.
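For a long list, a text editor gets tedious, so here is a rough sketch of pulling the redundant copies out of duplicates_list.txt. It assumes the default fdupes output layout (one path per line, sets of duplicates separated by blank lines) and that no path contains a newline. The function name redundant_copies is hypothetical, and it only prints paths; it does not delete anything.

```shell
# redundant_copies FILE (hypothetical helper): for each set of duplicates in
# FILE, print every path except the first one, i.e. the copies you could remove.
# Assumes blank-line-separated sets with one path per line.
redundant_copies() {
    # RS="" puts awk in paragraph mode: each blank-line-separated set is one record.
    # FS="\n" makes every line of the set a field, so paths with spaces stay intact.
    awk 'BEGIN { RS = ""; FS = "\n" } { for (i = 2; i <= NF; i++) print $i }' "$1"
}
```

Running redundant_copies duplicates_list.txt lists the extra copies, one per line, which you can review before deciding what to do with them.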
