Consider this. You have the same files with different file names spread out over a bunch of folders. If you are on a recent Windows machine, Powershell is all you need to get out of that mess and delete the duplicates.
This also means you get to do this from the command line which makes it extra l33t.
Cool. Let’s get started.
ls *.* -recurse | get-filehash | group -property hash | where { $_.count -gt 1 }
| % { $_.group | select -skip 1 } | del
Bam!
You’re done.
Alright. Here’s what going on in detail:
ls *.* -recurse # get all files in the current folder and all subfolders
get-filehash # calculate the file hash for all files
group -property hash # group them by the hash value
where { $_.count -gt 1 } # select those groups with more than 1 item
% { # for all groups
$_.group | # output the group content, which are the files
select -skip 1 # select all but the first file
} # (we don't want to delete all of them right?)
del # delete the file
If you want to experiment with this I’d recommend you to change the last del
command with something safer, like echo
which just prints out the file or by adding the -WhatIf
parameter so simulate a delete.
Oh yeah, **DISCLAIMER**. Don’t just randomly copy past Powershell code and execute it on your machine if you don’t know what you are doing. Certainly if it’s deleting files like the example above. You might end up deleting more than you bargained for. :)
Photo by James Vaughan, cc-licensed.
This was very useful thank you
Thanks for that! Proved very helpful when I tried to finally do a backup of all my image files from the last 20 years, which were scattered across hundreds of CDs, HDDs and USB sticks, with a TON of duplication.
Pingback: Поиск дубликатов файлов Windows | remontka.pro
Thanks for the tip, I’ve updated the post and mentioned the -WhatIf parameter.
Quote:
If you want to experiment with this I’d recommend you to change the last del command with something safer, like echo which just prints out the file.
Another option would be to use the -WhatIf parameter on the del, which is an alias of Remove-Item. You get this instead of a deleted file:
What if: Performing the operation “Remove File” on target “C:\duplicatefile”.
If the result looks correct you can then remove the parameter and let it execute and actually remove items.
Nicely done Sir!
I suggest to try using Duplicate Files Deleter
Hello Good day, I use a software called Duplicate Files Deleter, it’s very easy to use and after it finds the duplicate files it lets you chose what you want to do with them (copy/delete/move). You can even check network files and you can check multiple paths in the same scan. This helps me a lot. I hope you too.
Huge thanks for this script! Now I’m finally able to finish my Spotlight Grabber – a tool that grabs and sorts Windows Spotlight wallpapers, and now even removes duplicates.
Yepp, thats good enough to get the filesize, too.
Thanks a lot!
The
get-filehash
makes you loose most of the file information, except thePath
. But you can get that back by doing another ls using the path, and then write whatever file properties you want to your CSV file.Adding the following instead of the del statement should do the trick:
.. | % { ls $_.path } | select name,length | export-csv c:\temp\dupes.txt -NoTypeInformation
Hi,
well done – as mentioned earlier, ist easy to use and nothing needs to be installed on the host.
I was looking for something to analyze a filetree only, therefore I’ve altered the last pipe from “del” to:
select Path| Export-Csv C:\temp2\duplicates.txt -NoTypeInformation
Some of you may see a way to get the filesize in the csv, too?
You have to paste the whole command in a single line to make it work. If there is a return in the middle you’ll get errors like this.
If you copy paste it from the browser you’ll probably have 2 lines instead of one. Just paste them into a text editor first and put everything on a single line.
Oh yeah, and be sure to run it without the delete statement at the end first, so you’re sure it won’t delete anything you don’t want it to. :)
I get an error.. what’s up?
At line:2 char:1
+ | % { $_.group | select -skip 1 } | del
+ ~
An empty pipe element is not allowed.
+ CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
+ FullyQualifiedErrorId : EmptyPipeElement
Thank you! God I love powershell
Thanks for this, I was looking for a way to get unique values (ordered chronologically descending) from a hash where duplicates existed and the select -skip 1 will work a treat :-)
Great to see this is put to good use. :)
Super slick one liner! I ended up taking out the “select -skip 1” and swapped the “del” for an “ls”. Piped it to a CSV. Clean up the output a little bit and I can hand this to an end users so they can review why they are eating up a ton of space on my SAN.
Good idea. For a large amount of files that would indeed speed things up I reckon.
In my case the statement I used was fast enough to do the job for the amount of files I had to process, so I didn’t have any need to optimize the statement.
I would suggest that you first group by file size, then filter for groups with more then one element and only then run the file hashing. That will be much more efficient and WAY faster than having to hash every single file.
Thanks for posting this. I don’t think I’ve used the “group” feature of powershell yet. Pretty nice.
I think you missed the point here. With that single line of PowerShell you can delete duplicates without even having to download or install any additional software.
Try DuplicateFilesDeleter, its going to solve that problem in minutes.
Perfect one-liner! :) Congrats!
Yep, that’s the smart thing to do indeed. :)
Oops, *Replaced the “del” with an “ls”
Very helpful. I replaced the del after the last pipe for a sanity check first. Thanks!