Parsing text translations

0

Hi, I'd like to get this into Excel columns of:

Phrase, Translation, %YES

<tr>
            <td>Phrase in a language</td>
            <td>Translation</td>
            <td>

                <span id="Accuracy">

                        <a href="javascript:YES(#####);"><img src="/images/YES.gif" border="0"></a>(70%)      
                        <a href="javascript:NOT(#####)"><img src="/images/NOPE.gif" border="0"></a>(30%)

                </span>
            </td>
        </tr>

I'm using Notepad++ to get it into .csv by removing the text between < and > with:

<.*?>

In the end, this is what I get:


            Phrase in a language
            Translation




                        (##%)      
                        (##%)



After that, I tried to replace the line breaks with commas:

[\r\n]+

There are of course a lot of phrases and translations; this is just one of many, so...

Any ideas how to easily make it into the three columns, please?

I don't need the second %, but I guess it's easy to just delete a whole column in Excel if it can't be parsed out.

Thanks

MooN_tm

Posted 2019-09-05T19:59:37.640

Reputation: 11

Parsing HTML with regex is a hard job. HTML and regex are not good friends. Use a parser; it is simpler, faster and much more maintainable. – Toto – 2019-09-06T08:32:45.807
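As a rough illustration of the parser route suggested in the comment above, here is a minimal Python sketch that uses only the standard library's `html.parser`. It assumes every record is a `<tr>` with three `<td>` cells laid out like the sample in the question, and that the YES percentage is the first `(NN%)` in the third cell. The names `RowParser` and `html_to_csv` are made up for this example.

```python
import csv
import io
import re
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse all runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_to_csv(html):
    """Return CSV text with the columns Phrase, Translation, %YES."""
    parser = RowParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Phrase", "Translation", "%YES"])
    for row in parser.rows:
        if len(row) < 3:
            continue  # assumption: rows with fewer than three cells are not records
        match = re.search(r"\((\d+%)\)", row[2])  # first "(NN%)" is taken as YES
        writer.writerow([row[0], row[1], match.group(1) if match else ""])
    return out.getvalue()


# The sample row from the question, with the ##### placeholders left as they are.
sample = """
<tr>
    <td>Phrase in a language</td>
    <td>Translation</td>
    <td>
        <span id="Accuracy">
            <a href="javascript:YES(#####);"><img src="/images/YES.gif" border="0"></a>(70%)
            <a href="javascript:NOT(#####)"><img src="/images/NOPE.gif" border="0"></a>(30%)
        </span>
    </td>
</tr>
"""

print(html_to_csv(sample))
```

Saving the returned text to a .csv file should give something Excel can open directly as three columns. The second percentage is dropped here because the question says it is not needed.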

Answers

0

OK, since you are using Notepad++, I suppose you can just continue with:

  1. Trim leading and trailing spaces
  2. Remove empty lines

Then you should get:

Phrase in a language
Translation
(##%)
(##%)

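Assuming these are in cells A1:A4 of Sheet1 (and assuming every following block of 4 lines has the same layout), put this formula in A1 of Sheet2, then drag it to C10 (or further down if you have more phrases). Each row of Sheet2 then reads the first three lines of one 4-line block.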

=INDIRECT("Sheet1!A"&(ROW()-1)*4+COLUMN(),TRUE)

You should get:

Phrase in a language    Translation (##%)
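If you would rather do the regrouping outside Excel, the same idea (trim, drop empty lines, then read the lines in blocks of four and keep the first three) can be sketched in Python. This is only an illustration under the same assumption as above, namely that every record yields exactly four non-empty lines. The function name `regroup` and its parameters are invented for this example.

```python
def regroup(lines, group_size=4, keep=3):
    """Turn a flat list of lines into rows of `keep` columns per `group_size` lines."""
    # Steps 1 and 2 from above: trim each line and drop the empty ones.
    cleaned = [line.strip() for line in lines if line.strip()]
    rows = []
    for start in range(0, len(cleaned), group_size):
        group = cleaned[start:start + group_size]
        if len(group) == group_size:  # skip an incomplete trailing block
            rows.append(group[:keep])
    return rows


# Hypothetical cleaned-up output for two records.
text_lines = [
    "            Phrase in a language",
    "            Translation",
    "",
    "                        (70%)      ",
    "                        (30%)",
    "Second phrase",
    "Second translation",
    "(55%)",
    "(45%)",
]

for row in regroup(text_lines):
    print("\t".join(row))
```

The tab-separated lines it prints can be pasted straight into a sheet, which has the same effect as the INDIRECT formula: row n of the result comes from lines (n-1)*4+1 to (n-1)*4+3 of the input.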

Hope that solves it. (:

p._phidot_

Posted 2019-09-05T19:59:37.640

Reputation: 948