find and delete duplicate files with just Powershell

[Photo: analog computer]

Consider this: you have the same files with different file names spread out over a bunch of folders. If you are on a recent Windows machine, PowerShell is all you need to get out of that mess and delete the duplicates.
This also means you get to do it from the command line, which makes it extra l33t.

Cool. Let’s get started.

ls *.* -recurse | get-filehash | group -property hash | where { $_.count -gt 1 } |
    % { $_.group | select -skip 1 } | del

Bam!
You’re done.

Alright. Here’s what’s going on in detail:

ls *.* -recurse             # get all files (anything matching *.*) in the current folder and all subfolders
get-filehash                # calculate the file hash for all files
group -property hash        # group them by the hash value
where { $_.count -gt 1 }    # select those groups with more than 1 item
% {                         # for all groups
    $_.group |              # output the group content, which are the files
    select -skip 1          # select all but the first file 
   }                        # (we don't want to delete all of them right?)
del                         # delete the file

If you want to experiment with this, I’d recommend changing the last del command to something safer, like echo, which just prints out the file instead of deleting it.
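For example, here’s a dry-run version: the same pipeline, but with echo at the end, so it only lists the duplicates it would otherwise remove.

ls *.* -recurse | get-filehash | group -property hash | where { $_.count -gt 1 } |
    % { $_.group | select -skip 1 } | echo    # echo (Write-Output) just prints the hash objects, path included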

Oh yeah, **DISCLAIMER**. Don’t just randomly copy-paste PowerShell code and execute it on your machine if you don’t know what you are doing. Especially if it’s deleting files like the example above. You might end up deleting more than you bargained for.  :)

Photo by James Vaughan, cc-licensed.

20 thoughts on “find and delete duplicate files with just Powershell”

  1. jacob2017

    Hello, good day. I use a tool called Duplicate Files Deleter; it’s very easy to use, and after it finds the duplicate files it lets you choose what you want to do with them (copy/delete/move). You can even check network files, and you can check multiple paths in the same scan. This helps me a lot. I hope it helps you too.

  2. protogon

    Huge thanks for this script! Now I’m finally able to finish my Spotlight Grabber – a tool that grabs and sorts Windows Spotlight wallpapers, and now even removes duplicates.

  3. n3wjack Post author

    The get-filehash cmdlet makes you lose most of the file information, except the Path. But you can get that back by doing another ls using the path, and then write whatever file properties you want to your CSV file.
    Adding the following instead of the del statement should do the trick:

    .. | % { ls $_.path } | select name,length | export-csv c:\temp\dupes.txt -NoTypeInformation
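
    Written out in full, that would be something along these lines (untested, same idea as the one-liner in the post):

    ls *.* -recurse | get-filehash | group -property hash | where { $_.count -gt 1 } |
        % { $_.group | select -skip 1 } |
        % { ls $_.path } | select name,length | export-csv c:\temp\dupes.txt -NoTypeInformation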

  4. Markus

    Hi,

    well done – as mentioned earlier, it’s easy to use and nothing needs to be installed on the host.
    I was looking for something to analyze a file tree only, so I’ve altered the last pipe from “del” to:
    select Path| Export-Csv C:\temp2\duplicates.txt -NoTypeInformation

    Does anyone see a way to get the file size into the CSV, too?

  5. n3wjack Post author

    You have to paste the whole command in a single line to make it work. If there is a return in the middle you’ll get errors like this.
    If you copy paste it from the browser you’ll probably have 2 lines instead of one. Just paste them into a text editor first and put everything on a single line.
    Oh yeah, and be sure to run it without the delete statement at the end first, so you’re sure it won’t delete anything you don’t want it to. :)
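    If you do want to keep it on two lines, make sure the first line ends with the pipe character instead of the second line starting with one – PowerShell then continues the statement on the next line:

    ls *.* -recurse | get-filehash | group -property hash | where { $_.count -gt 1 } |
        % { $_.group | select -skip 1 } | del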

  6. Edward Norton

    I get an error.. what’s up?

    At line:2 char:1
    + | % { $_.group | select -skip 1 } | del
    + ~
    An empty pipe element is not allowed.
    + CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : EmptyPipeElement

  7. Pete Spence

    Thanks for this, I was looking for a way to get unique values (ordered chronologically descending) from a hash where duplicates existed and the select -skip 1 will work a treat :-)

  8. Devin

    Super slick one-liner! I ended up taking out the “select -skip 1” and swapped the “del” for an “ls”. Piped it to a CSV. Cleaned up the output a little bit and now I can hand this to end users so they can review why they are eating up a ton of space on my SAN. Roughly like this, I guess (untested; the selected columns and the output path are just examples):
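
    ls *.* -recurse | get-filehash | group -property hash | where { $_.count -gt 1 } |
        % { $_.group } | ls | select fullname,length |
        export-csv c:\temp\duplicate-report.csv -NoTypeInformation    # example output path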

  9. n3wjack Post author

    Good idea. For a large number of files that would indeed speed things up, I reckon.
    In my case the statement I used was fast enough for the number of files I had to process, so I didn’t have any need to optimize it.

  10. Carnino

    I would suggest that you first group by file size, then filter for groups with more than one element, and only then run the file hashing. That will be much more efficient and WAY faster than having to hash every single file. Something along these lines (just a sketch, not tested):
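
    # group by size first: only files that share a size can be duplicates, so far fewer files get hashed
    ls *.* -recurse | group -property length | where { $_.count -gt 1 } |
        % { $_.group } | get-filehash | group -property hash | where { $_.count -gt 1 } |
        % { $_.group | select -skip 1 } | echo    # swap echo for del to actually delete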

  11. Herschel

    Thanks for posting this. I don’t think I’ve used the “group” feature of powershell yet. Pretty nice.

  12. n3wjack Post author

    I think you missed the point here. With that single line of PowerShell you can delete duplicates without even having to download or install any additional software.
