Find Missing Numbers

2

I have a big collection of video files.

I managed to write all the file names into a text file, one episode per line. Now I need something that can read that text file and tell me which episode (E) numbers are missing. The names follow this pattern:

S1-E18-(Date)-(Title)-(Random numbers).mp4

Here is an example list:

S1-E1-20100526-title-of-video-1400316375.mp4
S1-E3-20100517-title-of-video-15457547.mp4
S10-E5-20100421-title-of-video-14467457.mp4
S5-E7-20120912-title-of-video-17467457.mp4

In this case it's easy to see that the files S1-E2 and S10-E4 are missing, but with a big list, how can I find the missing files? Ignore the season number (S1, S2, ...); I just need to check the E (episode) number.

The largest existing file number is S50-E2184 and the smallest is S1-E1.

Eli Shain

Posted 2018-11-27T13:42:37.213

Reputation: 21

Your example is not very good. – harrymc – 2018-11-27T14:54:44.380

Is the number of episodes for each season fixed? or at least known? – glenn jackman – 2018-11-27T16:15:50.123

@EricF I tried Nothing – Eli Shain – 2018-11-28T13:04:06.863

@harrymc I know my english is not good, – Eli Shain – 2018-11-28T13:05:56.853

@glennjackman Numbers are fixed. – Eli Shain – 2018-11-28T13:06:50.103

Your English is good enough. I understand that S10-E4 is missing, but what about S10-E1 to E3, and E5 to whatever? We also do not know the last number. You do not give enough information to understand the full problem. – harrymc – 2018-11-28T13:44:30.500

Maybe you should have asked on https://softwarerecs.stackexchange.com/ ? – Mawg says reinstate Monica – 2018-11-30T07:35:18.067

Sort it with sort -V to sort by season then episode, or sort -t- -k2.2n to sort numerically by episode only. Then you can run a loop or similar to find the missing seasons/episodes. – Paulo – 2018-11-30T16:03:43.593

Answers

0

Save all the names in a file named "file_with_list_of_files" and run one of the commands below in a Linux/Unix terminal (I tried it in the macOS terminal):

sed 's/^[A-Z][0-9]*-//' file_with_list_of_files | grep -v '^E'

-- or --

awk -F- '{print $2}' file_with_list_of_files | grep -v '^E'

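Both commands list the entries that do not have an E right after the first hyphen (-). Note that they print only the part of the name after the season prefix, not the whole file name. You may find better ways to do it as well.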
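If the episode numbers run continuously across seasons, as the question's range of E1 to E2184 suggests, one alternative is to collect the E numbers and report every gap. The following is only a minimal sketch under that assumption. The function name missing_episodes is hypothetical, and the input is the same file_with_list_of_files as above.

```shell
# Hypothetical helper: print every episode number between 1 and the
# highest E number found in the list that has no line of its own.
# Assumes names look like S<season>-E<episode>-... and that episodes
# are numbered continuously across seasons (an assumption, the
# question does not state this outright).
missing_episodes() {
    awk -F- '
        $2 ~ /^E[0-9]+$/ {
            n = substr($2, 2) + 0      # drop the leading "E"
            seen[n] = 1
            if (n > max) max = n
        }
        END {
            for (i = 1; i <= max; i++)
                if (!(i in seen)) print "E" i
        }
    ' "$1"
}
```

Run it as missing_episodes file_with_list_of_files. With the four sample lines from the question (E1, E3, E5, E7) it should report E2, E4 and E6. Lines whose second field is not an E number are skipped silently.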

Robert Ranjan

Posted 2018-11-27T13:42:37.213

Reputation: 101

0

  1. Prepare a file with all episode signatures. You didn't tell us how many episodes there are in each season but you obviously need to know. This is how you prepare the file:

    >all_episodes   # just to empty the file which may or may not exist
    printf 'S1-E%s\n'  {1..3}    >>all_episodes   # 3 episodes in season 1
    printf 'S2-E%s\n'  {1..5}    >>all_episodes   # 5 episodes in season 2
    printf 'S3-E%s\n'  {1..8}    >>all_episodes
    # and so on
    printf 'S50-E%s\n' {1..2184} >>all_episodes
    

    This assumes each season starts with its own episode number one (your question is not clear about this). The file consists of lines of the form S<n>-E<m>, e.g. S2-E3.

  2. Create a file of owned episodes in the same form:

    cut -d - -f -2 your_current_list >owned_episodes
    

    The command takes - as a delimiter and returns line fragments up to the field number 2 from your current list.

  3. Treat owned_episodes as a list of patterns and filter all_episodes to find the lines that match none of them:

    grep -vxFf owned_episodes all_episodes
    

    Note we use -x here; the point is that S50-E3 shouldn't match S50-E31. -F is not necessary in your case, but in general one should use it when supplying fixed strings. The options are:

    -F
    Match using fixed strings. Treat each pattern specified as a string instead of a regular expression. [...]

    -f pattern_file
    Read one or more patterns from the file named by the pathname pattern_file. [...]

    -v
    Select lines not matching any of the specified patterns. [...]

    -x
    Consider only input lines that use all characters in the line excluding the terminating <newline> to match an entire fixed string or regular expression to be matching lines.
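Steps 2 and 3 above can be wrapped into one helper so they are easy to rerun whenever the list changes. This is only a sketch: the function name missing_signatures is hypothetical, the first argument is the file from step 1, the second is your current list, and it assumes mktemp is available on your system.

```shell
# Hypothetical wrapper around steps 2 and 3.
#   $1: file of expected signatures, one S<n>-E<m> per line (step 1)
#   $2: current list of file names, one per line
# Prints every expected signature that has no matching file name.
missing_signatures() {
    owned=$(mktemp) || return 1
    cut -d - -f -2 "$2" >"$owned"    # step 2: keep only S<n>-E<m>
    grep -vxFf "$owned" "$1"         # step 3: expected but not owned
    rm -f "$owned"
}
```

For example, missing_signatures all_episodes your_current_list prints the same lines as the grep command in step 3, without leaving owned_episodes behind.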

Kamil Maciorowski

Posted 2018-11-27T13:42:37.213

Reputation: 38 429