How to extract a complete list of extension types within a directory?

29

13

Within a directory, and recursively within its sub-directories, meaning every directory within a directory is processed, how do I compile a complete list of unique extensions within the directory?

OS is Windows XP with all the current updates. I'm okay running a script if I'm able to tell what it's doing, though I would prefer not to have to install dot-net, since I really do not like it.

blunders

Posted 2012-03-07T14:44:02.843

Reputation: 759

Answers

29

This batch script will do it.

@echo off

set target=%~1
if "%target%"=="" set target=%cd%

setlocal EnableDelayedExpansion

set LF=^


rem Previous two lines deliberately left blank for LF to work.

for /f "tokens=*" %%i in ('dir /b /s /a:-d "%target%"') do (
    set ext=%%~xi
    if "!ext!"=="" set ext=FileWithNoExtension
    echo !extlist! | find "!ext!:" > nul
    if not !ERRORLEVEL! == 0 set extlist=!extlist!!ext!:
)

echo %extlist::=!LF!%

endlocal

Save it as any .bat file, and run it with the command batchfile (substitute whatever you named it) to list the current directory, or specify a path with batchfile "path". It will search all subdirectories.

If you want to export to a file, use batchfile >filename.txt (or batchfile "path" >filename.txt).

Explanation

Everything before the for /f... line just sets things up: it gets the target directory to search, enables delayed expansion (which lets me update variables inside the loop), and defines a newline (LF) that I can use for neater output. Oh, and the %~1 means "get the first argument, removing quotes", which prevents doubled-up quotes - see call /? (for /? documents the same modifiers for loop variables).

The loop uses that dir /b /s /a:-d "%target%" command, grabbing a list of all files in all subdirectories under the target.

%%~xi extracts the extension out of the full paths the dir command returns.

An empty extension is replaced with "FileWithNoExtension", so you know there is such a file - if I added an empty line instead, it's not quite as obvious.

The whole current list is sent through a find command to ensure uniqueness. The text output of the find command is sent to nul, essentially a black hole - we don't want it. Since we always append a : after each entry in the list, the search query must also end with a : so it doesn't match partial results - see comments.
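The boundary problem is easier to see outside batch. Here is a minimal Python sketch (Python is not part of this answer, and both function names are made up for illustration) of why the search term needs the trailing colon: a plain substring search for .cs finds a hit inside .css: and wrongly treats .cs as already listed.

```python
def seen_without_delimiter(extlist, ext):
    """Naive check: plain substring search, with no ':' on the search term."""
    return ext in extlist


def seen_with_delimiter(extlist, ext):
    """Fixed check: the search term carries the ':' terminator, like find "!ext!:"."""
    return (ext + ":") in extlist


if __name__ == "__main__":
    extlist = ".css:.txt:"  # the list after aaa.css and some .txt file were seen
    print(seen_without_delimiter(extlist, ".cs"))  # True  (false positive)
    print(seen_with_delimiter(extlist, ".cs"))     # False (so .cs gets added)
```

The check relies on every real extension starting with a dot, so the dot acts as the left-hand boundary and the colon as the right-hand one.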

%ERRORLEVEL% is set by the find command; a value of 0 indicates there was a match. So if it's not 0, the current extension is not on the list yet and should be added.

The echo line prints the list, replacing my placeholders (:) with newlines to make it look nice.
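For anyone who wants to cross-check the script's output on a machine that happens to have Python, the steps above can be sketched as follows. This is an illustration under assumptions, not a replacement for the batch file: the function name unique_extensions is hypothetical, and os.path.splitext treats a leading-dot name such as .gitignore as having no extension, which may differ from what %%~xi reports.

```python
import os


def unique_extensions(target):
    """Return each distinct extension under target, in first-seen order."""
    seen = {}  # a dict keeps insertion order, like appending to extlist
    for _dirpath, _dirnames, filenames in os.walk(target):
        for name in filenames:
            # Same placeholder the batch script uses for extension-less files.
            ext = os.path.splitext(name)[1] or "FileWithNoExtension"
            seen.setdefault(ext, None)
    return list(seen)


if __name__ == "__main__":
    import sys

    # Default to the current directory, like the batch script does.
    target = sys.argv[1] if len(sys.argv) > 1 else os.getcwd()
    print("\n".join(unique_extensions(target)))
```

Like the find-based check, the comparison is case-sensitive, so .pdf and .PDF are listed separately.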

Bob

Posted 2012-03-07T14:44:02.843

Reputation: 51 526

1It worked PERFECTLY! I used the following syntax: batchfile "path" >filename.txt – lucaferrario – 2014-10-28T20:41:41.893

Great script! But there is a small bug with it: if the folder contains files aaa.css and zzz.cs, extension .cs will not be reported by the script. – Goozak – 2016-10-20T17:24:10.753

1@Goozak Whoops. Fixed now. The wonders of text searching... had to make sure the search query ended with : to force it to match boundaries. – Bob – 2016-10-20T23:08:01.500

+1 @Bob: Amazing answer, adding the explanation was a huge help too. Just tested the script, reviewed the results of the test, and everything worked great. Again, thanks! – blunders – 2012-03-07T17:04:09.207

20

Although it doesn't strictly meet the requirement for a batch script, I have used a single-line PowerShell script:

Get-Childitem C:\MyDirectory -Recurse | WHERE { -NOT $_.PSIsContainer } | Group Extension -NoElement | Sort Count -Desc > FileExtensions.txt

You could potentially run it from the command line/batch file:

Powershell -Command "& Get-Childitem C:\MyDirectory -Recurse | WHERE { -NOT $_.PSIsContainer } | Group Extension -NoElement | Sort Count -Desc > FileExtensions.txt"

I claim no credit for it, and of course, you will need PowerShell installed. For newer versions of Windows, there isn't any getting around this.

If you remove C:\MyDirectory it will execute in the current directory.

At the end it will produce a FileExtensions.txt containing something like the following:

Count Name
----- ----
 8216 .xml
 4854 .png
 4378 .dll
 3565 .htm
  ... ...
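A comparable tally can be sketched in Python for comparison. This is an assumption-laden illustration rather than what PowerShell does internally: the function name extension_counts is hypothetical, and files without an extension are counted under an empty string.

```python
import os
from collections import Counter


def extension_counts(target):
    """Return (extension, count) pairs under target, most frequent first."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(target):
        for name in filenames:
            counts[os.path.splitext(name)[1]] += 1
    return counts.most_common()


if __name__ == "__main__":
    print("Count Name")
    print("----- ----")
    for ext, count in extension_counts(os.getcwd()):
        print(f"{count:5d} {ext}")
```

Only one running integer is kept per extension, so memory use grows with the number of distinct extensions rather than the number of files.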

Depending on your folder structure, you may occasionally get errors notifying you that you have a long path.

Get-ChildItem : The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.

Any subdirectories in there will also not be parsed but the results for everything else will still show.
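The same "skip what can't be read and report everything else" behaviour can be made explicit. The following Python sketch is only an analogy under assumptions (the function name walk_collecting_errors is hypothetical, and Python's own path-length limits on Windows may differ from PowerShell's): it records each directory that could not be listed instead of stopping.

```python
import os


def walk_collecting_errors(target):
    """Collect extensions under target plus any directory-listing errors."""
    errors = []
    extensions = set()
    # os.walk calls onerror with the OSError for each directory it cannot
    # list, then carries on with the remaining directories.
    for _dirpath, _dirnames, filenames in os.walk(target, onerror=errors.append):
        for name in filenames:
            extensions.add(os.path.splitext(name)[1])
    return extensions, errors


if __name__ == "__main__":
    found, failed = walk_collecting_errors(os.getcwd())
    print(sorted(found))
    for err in failed:
        print("skipped:", err.filename)
```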

Dan Atkinson

Posted 2012-03-07T14:44:02.843

Reputation: 322

Thanks, agree that it's a useful answer. On an unrelated note, bit puzzled how you've posted only single answer, yet have the "Fanatic" badge for visiting Superuser for 100 consecutive days. Do you have the site bookmarked or something? – blunders – 2014-09-18T04:02:21.887

The badge was awarded in 2010 when I effectively lurked, but I'm far more active on SO: http://stackoverflow.com/users/31532/dan-atkinson. :) – Dan Atkinson – 2014-09-18T12:05:28.093

4

Here's a detailed answer using PowerShell (with Windows XP you'll have to install PowerShell):

Hey, Scripting Guy! How Can I Use Windows PowerShell to Pick Out the Unique File Extensions Used in a Collection of Files?

RichardM

Posted 2012-03-07T14:44:02.843

Reputation: 337

1While PowerShell is definitely much easier than the command line, it is based on .NET. Which, unfortunately, goes against "I would prefer not to have to install dot-net". – Bob – 2012-03-07T15:36:12.770

1+1 @RichardM: Agree with Bob. Also, the code related to the counting of extension instances found -- not knowing anything about PowerShell -- appears very memory heavy; meaning instead of just keeping a count of every instance, it's I believe creating an array to store duplicate instances of an extension for each extension, then doing a count for each extension array at the end, which to me seems like a very odd way of counting extension instances. Am I missing something? (That said, the first PowerShell one-liner is nice, and I'd try it if I didn't dislike dotnet.) – blunders – 2012-03-07T22:05:42.543

1That's fair. This question may draw searchers who are more open to a PowerShell solution. Mind you, a decent Google search will find the above link as well. – RichardM – 2012-03-09T13:48:15.790

3+1 for this link. blunders obviously dislikes everything .net, but that doesn't mean that the solution above is the best long term solution to this problem. The more languages the better I think. – Steve Rathbone – 2012-06-16T04:12:57.283

1Here's another link that addresses recursive search, using PowerShell. http://robertbigec.wordpress.com/2011/01/07/determining-unique-file-extensions-recursively-using-powershell/ – goodeye – 2013-01-04T03:09:46.697

0

To list all unique extensions from cmd under the path you're on, use:

Powershell -Command "Get-ChildItem . -Include *.* -Recurse | Select-Object Extension | Sort-Object -Property Extension -Unique"

kofifus

Posted 2012-03-07T14:44:02.843

Reputation: 179

0

I found it useful to change

if "!ext!"=="" set ext=FileWithNoExtension

to

if "!ext!"=="" set ext=.FileWithNoExtension

and to change

echo %extlist::=!LF!%

to

echo %extlist::=!LF!% > ext-list.txt

The generated file contained (no linefeeds, but no matter) .bat.pdf.skp.ai.png.jpg.tif.pcp.txt.lst.ttf.dfont.psd.indd.docx.PDF.JPG.gif.jpeg.dwg.exr.FileWithNoExtension.vrlmap.sat.bak.ctb

which I was then able to use for my project.

Steev43230

Posted 2012-03-07T14:44:02.843

Reputation: 11