@slhck your solution almost works, but the output goes to the display (STDOUT) with all the files concatenated together. I need individual .txt files as output. The reason is that we're not using each input filename to name the output.
To work around having to traverse a folder hierarchy: if I use Windows search for *.doc and copy the results into a single folder, flattening the hierarchy, I can then boot into Ubuntu and run the script below.
(I have a piece of file/folder recursion code somewhere, which I will dig out and add later if I have time.) But for now, flattening the file hierarchy as above is good enough.
By the way, catdoc works better than antiword for me. antiword complains that some files aren't Word docs; these tend to be .doc files with formatting and blocks of text organised as frames within the doc. catdoc seems to convert all of my docs.
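#!/usr/bin/perl
# Convert every .doc file in the current directory to a .txt file using catdoc.
use strict;
use warnings;
use File::Basename;

my $okFiles    = "";
my $couldntGet = "";
my @files      = <*>;

foreach my $file (@files)
{
    # Only process names ending in .doc (any case), so .docx etc. are skipped.
    next unless $file =~ m/\.doc$/i;

    my ( $filenameOnly, $dir, $ext ) = fileparse( $file, qr/\.[^.]*/ );
    if ( ( defined $filenameOnly ) && ( defined $ext ) )
    {
        $okFiles .= "file: ".$file." filename only:".$filenameOnly." extension:".$ext."\n";
        # Redirect catdoc's STDOUT into a .txt file named after the document.
        system( "catdoc \"".$file."\" > \"".$filenameOnly.".txt\"" );
    }
    else
    {
        $couldntGet .= "*file: ".$file." - couldn't get filename only and extension\n";
    }
}

# Print the summary once, after all files have been processed.
print $okFiles;
print $couldntGet;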
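The file/folder recursion mentioned above is not included in this answer. As a rough sketch of how it might look, the core File::Find module can walk the tree so the files don't need flattening first. The sub names find_doc_files, txt_path_for and convert_tree are made up for illustration, and the script name in the last comment is hypothetical. It assumes catdoc is on the PATH and writes each .txt next to its source .doc.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Basename;

# Return the paths of all regular files under $root whose names end in .doc.
sub find_doc_files {
    my ($root) = @_;
    my @docs;
    find(
        sub {
            # $_ is the bare file name; $File::Find::name is the full path.
            push @docs, $File::Find::name if -f $_ && /\.doc$/i;
        },
        $root
    );
    @docs = sort @docs;
    return @docs;
}

# Hypothetical helper: "dir/report.doc" becomes "dir/report.txt".
sub txt_path_for {
    my ($doc) = @_;
    my ( $name, $dir ) = fileparse( $doc, qr/\.[^.]*/ );
    return $dir . $name . ".txt";
}

# Convert every .doc under $root, writing each .txt beside its source file.
sub convert_tree {
    my ($root) = @_;
    foreach my $doc ( find_doc_files($root) ) {
        my $txt = txt_path_for($doc);
        my ( $in, $out );
        # Run catdoc without a shell, so odd characters in names are safe.
        unless ( open( $in, "-|", "catdoc", $doc ) ) {
            warn "cannot run catdoc on $doc: $!\n";
            next;
        }
        unless ( open( $out, ">", $txt ) ) {
            warn "cannot write $txt: $!\n";
            close $in;
            next;
        }
        print {$out} $_ while <$in>;
        close $out;
        close $in or warn "catdoc reported a problem with $doc\n";
    }
}

# Only act when a directory is given, e.g.: perl doc2txt.pl /path/to/docs
convert_tree( $ARGV[0] ) if @ARGV;
```

Reading catdoc's output through a pipe instead of a shell redirect avoids the quoting problems that file names containing quotes or dollar signs would cause in the system() call.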
+1 for the solution. As in the question, Windows or Mac please, but I also have Ubuntu, so I hope to be able to use your solution. I'll look it up, try it, and if it works then I'll accept your answer. Thanks. – therobyouknow – 2011-03-03T08:51:15.137
I added installation instructions for OS X in the post. I haven't tried the <command> part yet, but I can look into that if you have any troubles. – slhck – 2011-03-03T10:50:02.793