Regular expression to remove HTML tags and the data between those tags

2

I have tried a lot of things but am still unable to figure this out, because of the greedy nature of regular expressions.

abc = 'dfbafbd<a href="#Free_Calling_Best_Apps">Free Calling Best Apps</a>sbrwsggsfzbs<a></a>abc'

My regular expression:

abc1 = re.sub(r'<a.+\/a>',' ',abc)

output = 'dfbafbd abc'

required output = 'dfbafbd sbrwsggsfzbs abc'

abhav luthra

Posted 2019-04-23T15:22:24.060

Reputation: 33

Answers

1

Make your regex not greedy:

abc1 = re.sub(r'<a.+?/a>',' ',abc)
#            here __^
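To see the difference, here is a small side-by-side run on the string from the question. The names `greedy` and `lazy` are just illustrative. With `.+` the match runs from the first `<a` to the last `/a>` in the string, while `.+?` stops at the first `/a>` it can reach:

```python
import re

abc = 'dfbafbd<a href="#Free_Calling_Best_Apps">Free Calling Best Apps</a>sbrwsggsfzbs<a></a>abc'

# Greedy: .+ grabs as much as possible, so both links and the text
# between them are swallowed by a single match.
greedy = re.sub(r'<a.+/a>', ' ', abc)

# Lazy: .+? grabs as little as possible, so each <a ...>...</a> pair
# is matched and replaced separately.
lazy = re.sub(r'<a.+?/a>', ' ', abc)

print(greedy)  # dfbafbd abc
print(lazy)    # dfbafbd sbrwsggsfzbs abc
```

Keep in mind that this pattern is still loose: `<a` also matches the start of tags such as `<abbr>`, and `.` does not match newlines unless you pass `re.DOTALL`.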

But parsing HTML with regex is a hard job.

HTML and regex are not good friends. Use a parser: it is simpler, faster and much more maintainable.
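As a rough sketch of the parser approach, here is one way to do it with `html.parser` from the standard library. The class `AnchorStripper` and the helper `strip_anchors` are hypothetical names made up for this example, and replacing each removed link with a single space is an assumption chosen to match the output asked for in the question:

```python
from html.parser import HTMLParser


class AnchorStripper(HTMLParser):
    """Keep text outside <a>...</a>; emit one space per removed link."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while we are inside an <a> element
        self.parts = []   # pieces of text to keep

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'a' and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Assumption: stand in one space for the removed link.
                self.parts.append(' ')

    def handle_data(self, data):
        # Only keep text that is not inside a link.
        if self.depth == 0:
            self.parts.append(data)


def strip_anchors(html):
    parser = AnchorStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.parts)


abc = 'dfbafbd<a href="#Free_Calling_Best_Apps">Free Calling Best Apps</a>sbrwsggsfzbs<a></a>abc'
print(strip_anchors(abc))  # dfbafbd sbrwsggsfzbs abc
```

Note that this sketch drops every tag, not only `<a>`, and keeps the text of the other tags. For anything bigger, a dedicated HTML library is probably more comfortable.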

Toto

Posted 2019-04-23T15:22:24.060

Reputation: 7 722

Thanks. It worked – abhav luthra – 2019-04-23T16:53:36.050

@abhavluthra: You're welcome, glad it helps. Feel free to mark the answer as accepted, see: https://superuser.com/help/someone-answers – Toto – 2019-04-23T17:05:38.723