How can I rename files in a directory while keeping part of the name unchanged?

I have multiple files (about 1000) named as such:

abcdefg123456.xyz
abcdefg123457.xyz
abcdefg123458.xyz
abcdefg123459.xyz

Some of the files have 4 additional random numbers and letters (in any order) after the name. These are possibly duplicates, but not always, so I need to change them back to the original format to verify whether they are duplicates or not. They have this format:

abcdefg123456a789.xyz
abcdefg123457b987.xyz
abcdefg123458c879.xyz
abcdefg123459d897.xyz

On occasion, there is a wrong extension as well:

abcdefg123456.xyzedf
abcdefg123456.xyzfed

I want to rename these files to the original format of abcdefg followed by the original 6 numbers, i.e. to delete the trailing 4 random numbers and letters and to restore the extension to .xyz. What I have so far is this:

rename -n "s/[a-z][0-9]{6}.xyz/.xyz/g"  *

But it doesn't seem to work. For some reason the output is:

abcdef.xyz (no numbers)
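You can see what the substitution does, without renaming anything, by feeding one name through sed. This sketch assumes a sed that supports -E (GNU or BSD). For this particular pattern, sed's extended regex matches the same text as the Perl expression given to rename. The helper name orig_pattern is hypothetical.

```shell
# Hypothetical helper: the question's substitution applied to one name
orig_pattern() {
    printf '%s\n' "$1" | sed -E 's/[a-z][0-9]{6}.xyz/.xyz/g'
}

# "g123456.xyz" is matched, so the letter g and all six digits are replaced
orig_pattern abcdefg123456.xyz       # abcdef.xyz

# Only three digits precede ".xyz" here, so nothing matches
orig_pattern abcdefg123456a789.xyz   # abcdefg123456a789.xyz
```

The unescaped . happens to match the literal full stop. [a-z][0-9]{6} consumes the final letter and all six digits, so they are thrown away along with the old extension.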

EDIT: I was a bit torn over which answer to accept, because both helped me find the solution. I went with stuts because he also helped with the second part of the question. But your help is greatly appreciated too, Mark Perryman, and the commenters' as well, of course.

user681866

Posted 2017-01-04T15:21:58.767

Your main error is the use of {6} digits: for your examples this should be {3}. To remove extra characters after the .xyz you need to add .* to the end of the match string, giving rename -n "s/[a-z][0-9]{3}\.xyz.*/.xyz/g" * as the command (omitting the -n when you are happy with the actions). – AFH – 2017-01-04T17:17:38.967
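The comment's pattern can be checked the same way with sed -E, which treats this expression the same as Perl's rename. The helper name afh_fix is hypothetical. The last call shows a limitation: a wrong extension on an otherwise standard name is left alone.

```shell
# Hypothetical helper: the comment's substitution applied to one name
afh_fix() {
    printf '%s\n' "$1" | sed -E 's/[a-z][0-9]{3}\.xyz.*/.xyz/'
}

afh_fix abcdefg123456a789.xyz     # abcdefg123456.xyz
afh_fix abcdefg123456a789.xyzedf  # abcdefg123456.xyz (.* also drops the extra extension)
afh_fix abcdefg123456.xyz         # unchanged: no letter sits directly before 3 digits and ".xyz"
afh_fix abcdefg123456.xyzedf      # unchanged, so this wrong extension is not fixed
```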

I see. I was capturing the part I want to keep, instead of the part I want to remove. How would I delete the files if they cannot be renamed? And what if the order of numbers and letters is not exactly ...a789.xyz, ...b987.xyz, but follows a random pattern instead: ...a7b8.xyz, ...c9d7.xyz? Thanks. – user681866 – 2017-01-04T17:29:42.223

If the first of the additional characters is a letter, then rename -n "s/[a-z][a-z0-9]{3}\.xyz.*/.xyz/g" * will do it. If not, you cannot simply use [a-z0-9]{4} in the match pattern, as this will remove the last four digits in the standard-format files. In that case you will need to use match groups, as in the answers. Alternatively, you could try rename -n "s/[a-z0-9]{4}\.xyz.*/.xyz/g" ?????????????????.xyz* (17 question marks), which should process only the longer file names. Note the difference between regular expression matching and shell file expansion. – AFH – 2017-01-04T19:32:34.840
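The glob does the filtering here, not the regular expression. This sketch lists which names the 17-character wildcard selects in a scratch directory created with mktemp -d, using example names from the question. It renames nothing. Each ? matches exactly one character, so only names with a 17-character stem before .xyz are picked up.

```shell
# Scratch directory holding one name of each kind
dir=$(mktemp -d) && cd "$dir" || exit 1
touch abcdefg123456.xyz abcdefg123457a7b8.xyz abcdefg123458.xyzedf

# 17 "?" wildcards: 7 letters + 6 digits + 4 extra characters
long_names=$(for f in ?????????????????.xyz*; do
    [ -e "$f" ] && printf '%s\n' "$f"
done)
printf '%s\n' "$long_names"    # abcdefg123457a7b8.xyz
```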

My updated answer (using single quotes so that $1 and $2 work, and the -f option to ensure that duplicate files are deleted) is a neater one-line solution ;-) – Mark Perryman – 2017-01-05T12:18:22.203
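The quoting matters because, inside double quotes, the shell expands $1 and $2 itself before rename ever sees the expression. At an interactive prompt they are usually empty, which leaves an empty replacement. A small sketch: set -- is used only to clear the positional parameters, the way a fresh interactive shell has them.

```shell
set --                      # no positional parameters, so $1 and $2 are empty
dq="s/(x)(y)/$1$2/"         # double quotes: the shell substitutes $1$2 first
sq='s/(x)(y)/$1$2/'         # single quotes: the text reaches rename unchanged
printf '%s\n' "$dq"         # s/(x)(y)//
printf '%s\n' "$sq"         # s/(x)(y)/$1$2/
```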

Answers

Solution

To remove the 4 numbers/letters preceding the full stop for all files you can use the following loop:

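for file in *.xyz ; do
    # Strip the 4 letters/digits immediately before the full stop
    NEWFILE=$(printf '%s\n' "$file" | sed -E 's/[a-z0-9]{4}\././')
    # Quote both names so spaces or a leading "-" cannot break mv
    mv -v -- "$file" "$NEWFILE"
done

Explanation

for file in *.xyz ; do

Loops through every file with a .xyz extension.

NEWFILE=$(printf '%s\n' "$file" | sed -E 's/[a-z0-9]{4}\././')

Creates a variable called NEWFILE containing the name of the file after stripping out a pattern that matches [a-z0-9]{4} (any mix of 4 letters or numbers) followed by a full stop (\.). The replacement puts the full stop back, so only the 4 characters are removed. (The bracket expression is written [a-z0-9] rather than [a-z|0-9], because a | inside brackets would also match a literal pipe character.)

mv -v -- "$file" "$NEWFILE"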

Moves the file to its new name; -v prints each move in a format like this:

`abcdefg123456a789.xyz` -> `abcdefg123456.xyz`

This does not yet fix the extensions, but a similar loop can. Loop over *.xyz* instead of *.xyz (which does not match names such as abcdefg123456.xyzedf), and use the sed command sed 's/\.xyz.*/.xyz/'.
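A sketch of that extension fix as a loop. It assumes everything after .xyz is junk. It loops over *.xyz?* so that at least one character must follow .xyz, which means names that are already correct are never selected. It also reports an existing target instead of overwriting it. fix_extensions is a hypothetical name, and the demo runs in a scratch directory.

```shell
# Hypothetical helper: cut everything after ".xyz" in each matching name
fix_extensions() {
    for file in *.xyz?*; do
        [ -e "$file" ] || continue          # the glob matched nothing
        new=$(printf '%s\n' "$file" | sed 's/\.xyz.*/.xyz/')
        if [ -e "$new" ]; then
            echo "skipped $file: $new already exists"
        else
            mv -v -- "$file" "$new"
        fi
    done
}

# Demo in a scratch directory
dir=$(mktemp -d) && cd "$dir" || exit 1
touch abcdefg123456.xyzedf abcdefg123457.xyz abcdefg123457.xyzfed
fix_extensions
```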

stuts

Posted 2017-01-04T15:21:58.767

Thanks for the answer. The second bit works great. The first bit not so much, though, because the last 4 numbers/letters are random and not always in the form [letter][3 numbers]. Sometimes there are more letters, and their positions change too, but there are always 4. EDIT: by changing position, I mean it could be [3 numbers][1 letter], or [2 letters][1 number][1 letter], but always 4. – user681866 – 2017-01-04T17:06:32.173

I have amended my solution to match any pattern of letters & numbers before the file extension. Let me know how that works for you :) – stuts – 2017-01-04T17:43:14.717

That removes all the last 4 numbers/letters though, including the ones that have the abcdefg123456.xyz format. – user681866 – 2017-01-05T08:52:31.277
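This can be reproduced without touching any files. The sketch runs the loop's substitution on one name of each kind, written here as [a-z0-9]{4}, which behaves the same as the bracketed form for these names. strip_four is a hypothetical helper name.

```shell
# Hypothetical helper: the loop's sed substitution applied to a single name
strip_four() {
    printf '%s\n' "$1" | sed -E 's/[a-z0-9]{4}\././'
}

strip_four abcdefg123456a789.xyz   # abcdefg123456.xyz  (extra characters removed)
strip_four abcdefg123456.xyz       # abcdefg12.xyz      (last four digits removed)
```

The pattern only knows that four letters or digits sit before the full stop. It therefore cannot tell the random suffix from the end of the six-digit number. Anchoring on the six digits, as in the user's combined command further down, avoids that.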

The (\.) in the sed command implies that it's the pattern of letter/number letter/number letter/number letter/number full-stop. My testing shows that this works. Are there full-stops in the file names other than before the file extension? – stuts – 2017-01-05T08:55:30.630

No, there aren't. But I found the solution by combining your method and Mark's: for file in *.html ; do NEWFILE=$(echo "$file" |sed -re 's/([a-z]*[0-9]{6})[a-z0-9]{0,4}(\.html).*/\1\2/g'); mv -v $file $NEWFILE; done. Initial testing on 50 files, at least, gives the right results. I think (correct me if I am wrong) that [a-z|0-9] also matches the last 4 digits before the full stop in the original-format names, which is why it changes all of those to the abcdefg12 format. – user681866 – 2017-01-05T09:21:07.957

That makes sense; I had assumed it was only being run on files that needed the rename. But yes, if it is run on already-renamed files, it will start chopping off the ends again. Glad you were able to find a solution! – stuts – 2017-01-05T09:45:18.740
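A sketch that combines the user's anchored pattern with two guards. Names that are already in the standard format (or that do not fit the pattern at all) are skipped. An existing target is reported instead of overwritten, so possible duplicates can be checked by hand. It assumes the abcdefg + six digits + up to four extra characters layout, with .xyz as the real extension (the comment above uses .html). fix_names is a hypothetical name, and the demo runs in a scratch directory.

```shell
# Hypothetical helper: normalise every *.xyz* name in the current directory
fix_names() {
    for file in *.xyz*; do
        [ -e "$file" ] || continue                  # the glob matched nothing
        # Keep "abcdefg123456" and ".xyz"; drop up to 4 extra characters
        # and anything after the extension
        new=$(printf '%s\n' "$file" |
            sed -E 's/^([a-z]*[0-9]{6})[a-z0-9]{0,4}\.xyz.*$/\1.xyz/')
        [ "$file" = "$new" ] && continue            # nothing to change
        if [ -e "$new" ]; then
            # Possible duplicate: report it instead of overwriting
            echo "skipped $file: $new already exists"
        else
            mv -v -- "$file" "$new"
        fi
    done
}

# Demo in a scratch directory, using names from the question
dir=$(mktemp -d) && cd "$dir" || exit 1
touch abcdefg123456.xyz abcdefg123456a789.xyz abcdefg123457b987.xyz abcdefg123458.xyzedf
fix_names
```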

Try

rename -n -f 's/([a-z]*[0-9]{6})[a-z0-9]{0,4}(\.xyz).*/$1$2/g'  *

This works with the Perl-based version of rename shipped with Debian and Ubuntu (see the man page at http://www.computerhope.com/unix/rename.htm).

The -f (force) option makes rename overwrite files that would otherwise end up with duplicate names.

Why this works

  • ([a-z]*[0-9]{6}) is the abcdefg123456 captured and can be referred to as $1 in the replacement.
  • (\.xyz) is the extension captured and referred to as $2 in the replacement.
  • Everything else [a-z0-9]{0,4} (up to 4 letters/numbers) and .* (anything after the extension) is matched and then ignored in the replacement.
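The capture groups can be tried out without rename by using sed -E. sed writes the back-references as \1 and \2 instead of Perl's $1 and $2. normalise is a hypothetical helper name, and the sample names come from the question.

```shell
# Hypothetical helper: same pattern as the rename command, sed-style back-references
normalise() {
    printf '%s\n' "$1" | sed -E 's/([a-z]*[0-9]{6})[a-z0-9]{0,4}(\.xyz).*/\1\2/'
}

for name in abcdefg123456.xyz abcdefg123457b987.xyz abcdefg123458a7b8.xyz abcdefg123456.xyzedf; do
    printf '%s -> %s\n' "$name" "$(normalise "$name")"
done
```

Every name comes out as abcdefg plus six digits plus .xyz. Because {0,4} also allows zero extra characters, a name that is already correct maps to itself.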

Bonus: to delete all files that still don't fit your pattern (e.g. if you did not use the force option above), use find to list them and then remove them. Run it without -exec rm {} + first for a dry run.

find . -regextype posix-egrep -regex '.*/[a-z]*[0-9]{6}[a-z0-9]{4}\.xyz.*|.*/[a-z]*[0-9]{6}\.xyz.+' -exec rm {} +
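Here is a dry run of that find expression in a scratch directory. It assumes GNU find, since -regextype is a GNU findutils option. -regex is matched against the whole path, which is why each alternative starts with .*/. The second alternative ends in .+ so that names which are already correct are not selected. Nothing is deleted.

```shell
dir=$(mktemp -d) || exit 1
touch "$dir/abcdefg123456.xyz" "$dir/abcdefg123456a789.xyz" \
      "$dir/abcdefg123457.xyzedf" "$dir/abcdefg123458.xyz"

# Only leftovers that still don't fit the abcdefg123456.xyz format should be listed
leftovers=$(find "$dir" -regextype posix-egrep -type f \
    -regex '.*/[a-z]*[0-9]{6}[a-z0-9]{4}\.xyz.*|.*/[a-z]*[0-9]{6}\.xyz.+' | sort)
printf '%s\n' "$leftovers"
```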

Mark Perryman

Posted 2017-01-04T15:21:58.767

Strictly, the .xyz in the search expression should be \.xyz, though . will of course match a literal . as well as any other character. The questioner's original expression will work as you quote, but only on the file names without the extra characters; the names with the extra characters will be unaffected. – AFH – 2017-01-04T17:10:31.423

Thanks for the answer. For some reason I am getting errors when trying this.

\1 better written as $1 at (eval 1) line 1.

Besides that, when it can't rename the filename because it already exists, it keeps it. I know I didn't ask for this in OP, but how would I make it delete that file? – user681866 – 2017-01-04T17:17:57.863

Try s/([a-z]*[0-9]{6})[a-z0-9]{0,4}(\.xyz).*/$1$2/g? – Mark Perryman – 2017-01-04T17:24:14.720

For the deleting, check man rename to see if there is a force option or similar. Otherwise, run a separate command that deletes files that still don't match the format. Something like find . -regextype posix-egrep -regex '[a-z]*[0-9]{6}[a-z0-9]{4}\.xyz.*|[a-z]*[0-9]{6}\.xyz.*' -exec rm {} but RUN WITHOUT -exec rm {} FIRST! – Mark Perryman – 2017-01-04T17:31:23.663

Thanks Mark, but that didn't work. The first (replacing \1\2 with $1$2) gave me the error "No such file or directory" for all files (with \1\2 it would do the job, while giving the errors). The second suggestion didn't do anything. I didn't try running it with -exec rm {}. – user681866 – 2017-01-05T09:12:31.580

Sorry, should have been find . -regextype posix-egrep -regex '.*/[a-z]*[0-9]{6}[a-z0-9]{4}\.xyz.*|[a-z]*[0-9]{6}\.xyz.*' – Mark Perryman – 2017-01-05T11:56:46.560