I have a set of files that need processing, and I tend to do this programmatically in bash on macOS and Linux. Since I like to keep the originals in case something goes wrong, I want the files to come out renumbered incrementally, but I don't know the proper bash construction to accomplish this.
Here's an example. I have a set of .pdf files:
bulletinlois00.pdf
bulletinlois01.pdf
bulletinlois02.pdf
...
bulletinlois33.pdf
The PDFs have not yet been OCRed, so I want to iterate through them with tesseract or ocrmypdf, but instead of outputting them like bulletinlois01.pdf they would be 01.pdf. Here is another example using the same file set. I want to iterate through the files doing pdftotext, but instead of having bulletinlois01.pdf go to bulletinlois01.txt I want it to be 01.txt.
I could do a cp + mv process, or use grep to replace the unwanted parts of the names, but this seems like overkill, and I get confused about whether I should be using wait or a && construction.
Is there a simple way to script this using bash, and could you please explain what exactly the construction is doing so that I can learn how to adapt it to other, more complex processing I need to do? For instance, maybe I could use the construction to output the names using `date "+%H.%M.%S"`.
Here's the rudimentary script:
for f in *.pdf ; do
    tesseract -l fra "$f" "$f"_done.pdf
done
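For reference, the renaming described above (bulletinlois01.pdf to 01.txt) can be sketched with bash parameter expansion alone, with no cp, mv, or grep step. This is only a sketch under assumptions: `short_name` is a hypothetical helper name, the prefix is assumed to be exactly `bulletinlois`, and the loop only prints the pdftotext command it would run rather than executing it.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn bulletinlois01.pdf into 01.<ext>.
short_name() {
    local name=$1 ext=$2
    name=${name#bulletinlois}   # '#' removes a matching prefix: 01.pdf
    name=${name%.pdf}           # '%' removes a matching suffix: 01
    printf '%s.%s\n' "$name" "$ext"
}

for f in bulletinlois*.pdf; do
    [ -e "$f" ] || continue     # the glob matched nothing, so skip
    out=$(short_name "$f" txt)
    # Dry run: print the command. Remove 'echo' to actually run it.
    echo pdftotext "$f" "$out"
done
```

Inside a for loop each command finishes before the next iteration starts, so neither wait nor && should be needed unless a command is sent to the background with &.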
You should post your actual Bash script instead of explaining what it does. Please edit your question to add it. – JakeGould – 2019-09-08T21:31:47.117
@JakeGould if it wasn't clear enough, I do not know how to do this – grad student – 2019-09-08T21:34:03.027
Are the new filenames (e.g. 01.pdf) named that way because the incoming filename has 01 in it, or because it's (e.g.) the first file being processed? If 01.pdf already exists, what should happen? It's confusing that your example code indicates a new filename of "_done" instead of a sequence number. – Jeff Schaller – 2019-09-10T00:34:08.290
Good point. Ideally, it would be 01.pdf because the incoming file has 01 in it, which would let me compare the output quality to the original. I added the _done so the next command would be something like mv "$f"_done.pdf ... to something like 01.pdf, but I realized that sort of mv construction would simply write over each file. I suspect I need some sort of array expansion, but I'm not sure how to implement it. – grad student – 2019-09-11T04:02:10.923
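One possible way to handle the overwrite concern raised in these comments, without any array, is to derive each output name inside the loop and write into a separate directory. The following is a rough sketch, not a tested OCR pipeline: the directory name `ocr_out`, the helper `dest_for`, and the `ocr_cmd` stand-in are all assumptions made for illustration, and `cp` is used in place of the real OCR command so the naming logic can be tried safely.

```shell
#!/usr/bin/env bash

# Hypothetical output directory, so the originals are never touched.
outdir=ocr_out
mkdir -p "$outdir"

# Stand-in for the real processing step (assumption). Something like
# ocr_cmd=(ocrmypdf -l fra) could be substituted here.
ocr_cmd=(cp)

# Hypothetical helper: print the destination path for one input file.
dest_for() {
    local base=${1##*/}          # drop any leading directory part
    base=${base#bulletinlois}    # drop the prefix: 01.pdf
    printf '%s/%s\n' "$outdir" "$base"
}

for f in bulletinlois*.pdf; do
    [ -e "$f" ] || continue      # the glob matched nothing, so skip
    out=$(dest_for "$f")
    if [ -e "$out" ]; then
        echo "skipping $f: $out already exists" >&2
        continue
    fi
    "${ocr_cmd[@]}" "$f" "$out"
done
```

Because the number is taken from the incoming name, ocr_out/01.pdf always corresponds to bulletinlois01.pdf, which keeps the side-by-side quality comparison straightforward.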