Extract one letter and three numbers from .txt files using Notepad++

0

I would like to extract a certain piece of data from many .txt files using Notepad++. Each file contains a lot of data, but I only need specific...

This is what it looks like in a file:

©Q Ńü Vý 8 ź) €G' €\Y € €w) €â` € K¶ K¶}Ąg† ‚Z™y ë( \Y
SCfsfgh4GHGH1+ €‡Ş € € €?° €¸ € k k € ‡Ş b | -A233 ™ž B¤ ˙˙˙˙˙˙˙˙˙˙˙˙ ’ ˙ rtSdpeRB ˙ GÚS)PS 3: TRANSMIT FILE created by modeller version 1101322 SCH_900000_9008 @Ź@

And I need to extract this:

A233

TIP: There is a space before the - sign ( -A233). Also, every file has different numbers next to the letter A.

In the original file it looks like this (a lot of spaces were lost in the copy-paste): Screenshot

kiki1989sb

Posted 2016-03-10T21:41:23.653

Reputation: 11

Is this a binary file? What do you want to do with the data when you have it? Is it always in the same format (letter + 3 numbers)? Does it always have | - before it? – DavidPostill – 2016-03-10T22:15:52.170

I want to compare it with the database. Yes, it is always the same format as you said, also | - is always before. Regarding binary file question, I am not familiar with that so I can't answer you... – kiki1989sb – 2016-03-11T23:47:33.467

Answers

0

You can use Replace (Ctrl+H), with Search Mode set to "Regular expression", and the following:

Find what: .*? -(A\d+)

Replace with: \1\n

Check the box for ". matches newline" if your files contain line breaks

Explanation:

.*? - : Finds as little text as possible up to and including " -", including any newlines when ". matches newline" is checked

(A\d+) : Defines a capture group that finds A followed immediately by 1 or more digits

\1\n : Replaces each match with the captured text (the letter A plus its digits) and a newline to separate the results

You can swap out the \n with the delimiter of your choice

Note that this will not delete any text after the last match, but since you're already in a text editor deleting it is trivial.
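
To check the pattern outside Notepad++, here is a minimal Python sketch of the same replace. It assumes Python's re module treats this particular pattern the same way Notepad++ does, with re.DOTALL standing in for ". matches newline". The sample string and the function name extract_by_replace are made up for illustration.

```python
import re

# Same pattern as in "Find what"; re.DOTALL plays the role of ". matches newline".
PATTERN = re.compile(r".*? -(A\d+)", re.DOTALL)

def extract_by_replace(text):
    """Replace everything up to and including each ' -A<digits>' with the captured code and a newline."""
    return PATTERN.sub(r"\1\n", text)

# Hypothetical sample, loosely modelled on the data in the question.
sample = "k k b | -A233 rtSdpeRB 3: TRANSMIT FILE"
print(repr(extract_by_replace(sample)))
```

In this sketch the text after the last match is left in place, which is the leftover mentioned in the note above.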

flyingfinger

Posted 2016-03-10T21:41:23.653

Reputation: 211

No, I want to extract the numbers next to the letter A, because each file has different numbers. I want them extracted so that I can compare them to the database... – kiki1989sb – 2016-03-11T23:44:27.750

Sorry, but that is not working for me; I get some weird numbers... It matches in each line... I've uploaded some of the files, so if you have time you can check it for yourself. http://www.filedropper.com/test_63 Thank you once more!

– kiki1989sb – 2016-03-12T11:56:03.137

@kiki1989sb: Just tried to download your data but can't see a link. Do I need an account? Rather not sign up unless required... – flyingfinger – 2016-03-16T21:00:18.457

0

Since you have many .txt files, it makes sense to automate this instead of extracting the values from each file manually. I suggest using the WSH VBScript below:

strRes = ""
For Each strPath In WScript.Arguments
    With CreateObject("Scripting.FileSystemObject")
        If .FileExists(strPath) Then
            strRes = strRes & .GetFileName(strPath) & vbCrLf
            strCont = LoadTextFromFile(strPath, "us-ascii")
            With CreateObject("VBScript.RegExp")
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = "-A(\d{3})"
                Set objMatches = .Execute(strCont)
                For Each objMatch In objMatches
                    strRes = strRes & objMatch.SubMatches(0) & vbCrLf
                Next
            End With
        End If
    End With
Next
ShowInNotepad strRes

Function LoadTextFromFile(strPath, strCharset)
    With CreateObject("ADODB.Stream")
        .Type = 1 ' adTypeBinary
        .Open
        .LoadFromFile strPath
        .Position = 0
        .Type = 2 ' adTypeText
        .Charset = strCharset
        LoadTextFromFile = .ReadText
        .Close
    End With
End Function

Sub ShowInNotepad(strToFile)
    Dim strTempPath
    With CreateObject("Scripting.FileSystemObject")
        strTempPath = CreateObject("WScript.Shell").ExpandEnvironmentStrings("%TEMP%") & "\" & .GetTempName
        With .CreateTextFile(strTempPath, True, True)
            .WriteLine strToFile
            .Close
        End With
        ' Quote the path so a %TEMP% folder containing spaces still opens correctly
        CreateObject("WScript.Shell").Run "notepad.exe """ & strTempPath & """", 1, True
        .DeleteFile (strTempPath)
    End With
End Sub

Paste this code into Notepad, save it as a text file, and manually change the .txt file extension to .vbs. Then select your text files in an Explorer window and drag and drop them onto the script.

For the files you shared, I got the following output:

30_SCH51BQ139.txt

036

30_SCH51BQ141.txt

038

30_SCH51BQ144.txt

040

30_SCH51BQ147.txt

043
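
For readers who cannot run WSH scripts, the same logic can be sketched in Python. This is only a rough equivalent under stated assumptions: it reuses the -A(\d{3}) pattern from the script, decodes each file as latin-1 (instead of the script's us-ascii) so that every byte maps to a character, and prints the results rather than opening Notepad. The names extract_codes and extract_from_files are hypothetical.

```python
import re
import sys
from pathlib import Path

# Same pattern as the VBScript: "-A" followed by exactly three digits (only the digits are captured).
CODE_RE = re.compile(r"-A(\d{3})")

def extract_codes(data):
    """Return the three-digit groups found after '-A' in raw file bytes."""
    # Assumption: latin-1 maps every byte to one character, so binary debris cannot raise a decode error.
    text = data.decode("latin-1")
    return CODE_RE.findall(text)

def extract_from_files(paths):
    """Map each existing file's name to the list of codes found in it."""
    results = {}
    for path in map(Path, paths):
        if path.is_file():
            results[path.name] = extract_codes(path.read_bytes())
    return results

if __name__ == "__main__":
    # Usage: pass the .txt files as command-line arguments.
    for name, codes in extract_from_files(sys.argv[1:]).items():
        print(name)
        for code in codes:
            print(code)
```

Like the script, this captures only the digits; moving the opening parenthesis before the A would return the letter as well.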

omegastripes

Posted 2016-03-10T21:41:23.653

Reputation: 353

Thank you for the VBS script. I have already solved this in Excel via VB, but because of macros (security) I have to do it in Notepad++ – kiki1989sb – 2016-03-13T12:01:22.847