
find and delete duplicate files with just Powershell


Consider this: you have identical files under different names, spread out over a bunch of folders. If you are on a recent Windows machine, PowerShell is all you need to get out of that mess and delete the duplicates.
This also means you get to do it from the command line, which makes it extra l33t.

Cool. Let’s get started.

ls *.* -recurse | get-filehash | group -property hash | where { $_.count -gt 1 } | % { $_.group | select -skip 1 } | del

Bam!
You’re done.

Alright. Here’s what’s going on in detail:

ls *.* -recurse             # get all files in the current folder and all subfolders
get-filehash                # calculate the file hash for all files
group -property hash        # group them by the hash value
where { $_.count -gt 1 }    # select those groups with more than 1 item
% {                         # for all groups
    $_.group |              # output the group content, which are the files
    select -skip 1          # select all but the first file 
   }                        # (we don't want to delete all of them right?)
del                         # delete the file

If you want to experiment with this, I’d recommend replacing the final del command with something safer, like echo, which just prints out the file, or adding the -WhatIf parameter to simulate a delete.
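The two safer variants described above can be sketched as follows. This is a minimal sketch, not the original one-liner: the `Get-DuplicateFile` helper name is hypothetical, and the `-File` switch is an addition that keeps directories out of the pipeline. It assumes PowerShell 4.0 or later, where `Get-FileHash` is available.

```powershell
# Hypothetical helper (not from the original post): returns the duplicate
# files under $Path, skipping the first file of every group of equal hashes.
function Get-DuplicateFile {
    param([string]$Path = '.')
    Get-ChildItem -LiteralPath $Path -Recurse -File |
        Get-FileHash |
        Group-Object -Property Hash |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Group | Select-Object -Skip 1 }
}

# Dry run 1: only print the paths that would be deleted.
Get-DuplicateFile | Select-Object -ExpandProperty Path

# Dry run 2: let Remove-Item (del) report what it would do, without deleting.
Get-DuplicateFile | Remove-Item -WhatIf
```

Once the output looks right, dropping `-WhatIf` from the last line turns the dry run into the real delete.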

Oh yeah, **DISCLAIMER**. Don’t just randomly copy-paste PowerShell code and execute it on your machine if you don’t know what you are doing. Especially if it’s deleting files, like the example above. You might end up deleting more than you bargained for.  :)

Photo by James Vaughan, cc-licensed.

36 replies on “find and delete duplicate files with just Powershell”

Nice! A clever trick to skip hashing files that won’t have a duplicate anyway. This will be easier than hashing only part of the file, as you mentioned before, and will speed things up considerably, I bet.

This is my approach: it only hashes files whose size occurs more than once.
It can also handle file names that contain square brackets.

# Group all files by size; only sizes that occur more than once can hold duplicates.
$same_size_group = Get-ChildItem -Recurse -Force -Attributes !Directory | group -property Length | where { $_.count -gt 1 }
foreach ($i in $same_size_group) {
    # Hash only the files in this size group and keep all but the first file per hash.
    $same_hash_group = $i.Group | get-filehash | group -property hash | where { $_.count -gt 1 } | % { $_.group | select -skip 1 }
    if ($null -eq $same_hash_group) {
        # there is no duplicate in this group
        continue
    }
    Write-Host "------ duplicated files with size = " $i.Values "------"
    foreach ($j in $same_hash_group) {
        echo "deleting :" $j.Path
        del -LiteralPath $j.Path -Force -Verbose
    }
}

Is it possible to make it more efficient by hashing less data?
For example, first hash only the first 100 bytes (small-hash), then compute the full-length hash only for files whose small-hash puts them in the same group.
I think it will be far faster for large files.
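A rough sketch of that two-stage idea, for anyone who wants to try it. The function names `Get-PartialHash` and `Get-DuplicateFileFast` and the 100-byte default are illustrative assumptions, not something from the post. Whether this is actually faster depends on how many large files share a prefix; it has not been measured here.

```powershell
# Hypothetical helper: SHA256 over at most the first $Count bytes of a file.
function Get-PartialHash {
    param(
        [Parameter(Mandatory)][string]$LiteralPath,
        [int]$Count = 100
    )
    $stream = [System.IO.File]::OpenRead($LiteralPath)
    try {
        $buffer = New-Object byte[] $Count
        $read = 0
        # Read may return fewer bytes than requested, so loop until full or EOF.
        while ($read -lt $Count) {
            $n = $stream.Read($buffer, $read, $Count - $read)
            if ($n -eq 0) { break }
            $read += $n
        }
        $sha = [System.Security.Cryptography.SHA256]::Create()
        try {
            [System.BitConverter]::ToString($sha.ComputeHash($buffer, 0, $read))
        } finally { $sha.Dispose() }
    } finally { $stream.Dispose() }
}

# Hypothetical helper: group by the cheap prefix hash first, then run the
# full Get-FileHash only inside groups that share a prefix hash.
function Get-DuplicateFileFast {
    param([string]$Path = '.', [int]$PrefixLength = 100)
    Get-ChildItem -LiteralPath $Path -Recurse -File |
        Group-Object -Property { Get-PartialHash -LiteralPath $_.FullName -Count $PrefixLength } |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            $_.Group | Get-FileHash |
                Group-Object -Property Hash |
                Where-Object { $_.Count -gt 1 } |
                ForEach-Object { $_.Group | Select-Object -Skip 1 }
        }
}

# List the duplicates under the current folder without deleting anything.
Get-DuplicateFileFast | Select-Object -ExpandProperty Path
```

Files that only share their first bytes end up in the same prefix group but are separated again by the full hash, so the result should match the original one-liner for the same set of files.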

i fixed it with this

ls *.* -recurse | get-filehash | group -property hash | where { $_.count -gt 1 } | % { $_.group | select -skip 1 } | % { move-item -path $_.path -destination C:\yourDuplicatePath }

here is my final code
ls *.* -recurse | get-filehash | group -property hash | where { $_.count -gt 1 } | % { $_.group | select -skip 1 } | % { move $_ C:\dups }

@n3wjack,
I would like to move the files instead of deleting them, but when I replace the del with move I get this error:

move : Cannot find drive. A drive with the name ‘@{Algorithm=SHA256; Hash=8F31F6492F4A11567BDE5A1A0553E16E179E6DBBAA5CD618DB2515585E134418; Path=C’ does not exist.
At line:1 char:122
+ … unt -gt 1 } | % { $_.group | select -skip 1 } | % { move $_ c:\dups }
+ ~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (@{Algorithm=SHA…E134418; Path=C:String) [Move-Item], DriveNotFoundException
+ FullyQualifiedErrorId : DriveNotFound,Microsoft.PowerShell.Commands.MoveItemCommand

If you want to move the files, you need to replace the last delete statement with a loop, and move every file passed into the loop.
So that ends up looking like this (I’m leaving out the first bit of the statement):

... | % { move $_.Path c:\yourfolder }

I would like to do the same as the above script, instead of deleting move the duplicates to another folder.

Thanks for that! Proved very helpful when I tried to finally do a backup of all my image files from the last 20 years, which were scattered across hundreds of CDs, HDDs and USB sticks, with a TON of duplication.

Quote:
If you want to experiment with this I’d recommend you to change the last del command with something safer, like echo which just prints out the file.

Another option would be to use the -WhatIf parameter on the del, which is an alias of Remove-Item. You get this instead of a deleted file:

What if: Performing the operation “Remove File” on target “C:\duplicatefile”.

If the result looks correct you can then remove the parameter and let it execute and actually remove items.

Hello, good day. I use a software called Duplicate Files Deleter. It’s very easy to use, and after it finds the duplicate files it lets you choose what you want to do with them (copy/delete/move). You can even check network files, and you can check multiple paths in the same scan. This helps me a lot. I hope it helps you too.

Huge thanks for this script! Now I’m finally able to finish my Spotlight Grabber – a tool that grabs and sorts Windows Spotlight wallpapers, and now even removes duplicates.

The get-filehash makes you lose most of the file information, except the Path. But you can get that back by doing another ls using the path, and then write whatever file properties you want to your CSV file.
Adding the following instead of the del statement should do the trick:

.. | % { ls $_.path } | select name,length | export-csv c:\temp\dupes.txt -NoTypeInformation
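An alternative sketch that keeps the hash next to the size, using a calculated property instead of a second ls. The `Get-DuplicateReport` name and the output location are assumptions for illustration; `Get-Item -LiteralPath` is used so that file names with square brackets are looked up literally.

```powershell
# Hypothetical helper: one object per duplicate with its hash, size and path.
function Get-DuplicateReport {
    param([string]$Path = '.')
    Get-ChildItem -LiteralPath $Path -Recurse -File |
        Get-FileHash |
        Group-Object -Property Hash |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Group | Select-Object -Skip 1 } |
        Select-Object Hash,
            @{ Name = 'Length'; Expression = { (Get-Item -LiteralPath $_.Path).Length } },
            Path
}

# Assumed output location; point it wherever you like.
$report = Join-Path ([System.IO.Path]::GetTempPath()) 'dupes.csv'
Get-DuplicateReport | Export-Csv -Path $report -NoTypeInformation
```

Sorting the CSV by the Hash column afterwards puts the copies of the same content next to each other.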

Hi,

well done – as mentioned earlier, it’s easy to use and nothing needs to be installed on the host.
I was looking for something to analyze a filetree only, therefore I’ve altered the last pipe from “del” to:
select Path| Export-Csv C:\temp2\duplicates.txt -NoTypeInformation

Does anyone see a way to get the file size into the CSV, too?

You have to paste the whole command in a single line to make it work. If there is a return in the middle you’ll get errors like this.
If you copy paste it from the browser you’ll probably have 2 lines instead of one. Just paste them into a text editor first and put everything on a single line.
Oh yeah, and be sure to run it without the delete statement at the end first, so you’re sure it won’t delete anything you don’t want it to. :)

I get an error.. what’s up?

At line:2 char:1
+ | % { $_.group | select -skip 1 } | del
+ ~
An empty pipe element is not allowed.
+ CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
+ FullyQualifiedErrorId : EmptyPipeElement

Thanks for this, I was looking for a way to get unique values (ordered chronologically descending) from a hash where duplicates existed and the select -skip 1 will work a treat :-)

Super slick one-liner! I ended up taking out the “select -skip 1” and swapped the “del” for an “ls”. Piped it to a CSV. Clean up the output a little bit and I can hand this to end users so they can review why they are eating up a ton of space on my SAN.

Good idea. For a large amount of files that would indeed speed things up I reckon.
In my case the statement I used was fast enough to do the job for the amount of files I had to process, so I didn’t have any need to optimize the statement.

I would suggest that you first group by file size, then filter for groups with more than one element, and only then run the file hashing. That will be much more efficient and WAY faster than having to hash every single file.

Thanks for posting this. I don’t think I’ve used the “group” feature of powershell yet. Pretty nice.

I think you missed the point here. With that single line of PowerShell you can delete duplicates without even having to download or install any additional software.
