Regex to match everything but substrings beginning with '#' or '@'

0

I am building a regex to filter out any substrings beginning with '#' or '@', so that I can remove them from tweets. Currently it matches everything, regardless of whether the string contains words beginning with '#' or '@'.

((?!\#)|(?!\@)).*

In the string below I want to match everything in bold (the text before the first word beginning with '@' or '#'), but no more:

Hi shah rukh. Who is your co-actor in the upcoming movie? @iamsrk #lovefrommalaysia #askSRK

I want to keep the whitespace between the words in bold. How can I achieve this? This will be used in Python.

Linus

Posted 2015-08-06T11:38:33.953

Reputation: 138

Inverse problem here: http://superuser.com/q/820361/76571

– Excellll – 2015-08-06T14:17:48.887

@Excellll That's what I did, see my answer below. – Linus – 2015-08-06T14:18:41.277

Answers

0

Never mind matching everything except substrings beginning with '@' or '#'. I did the opposite and used re.sub in Python to remove those substrings from the string:

>>> import re
>>> text = 'Hi shah rukh. Who is your co-actor in the upcoming movie? @iamsrk #lovefrommalaysia #askSRK'
>>> text = re.sub(r'([\#\@].*?)(?=([\r\n ]|$))', '', text).strip()
>>> print(text)
Hi shah rukh. Who is your co-actor in the upcoming movie?

Brief explanation:

  1. Capturing group 1, ([\#\@].*?), matches either # or @ followed by as few characters as possible (non-greedy).
  2. The positive lookahead (?=([\r\n ]|$)) requires the next character to be a carriage return, newline, or space, or the position to be the end of the string, without including it in the match. This is what stops the non-greedy match at the end of the word.
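
To make the two parts above concrete, here is a minimal sketch that spells the pattern out piece by piece. The helper name strip_tags and the shorter TAG_SHORT pattern are my own additions for illustration and are not part of the original snippet. I believe the shorter form behaves the same way, but treat that as an assumption.

```python
import re

# The pattern from the answer, split into its two parts.
TAG = re.compile(
    r'([\#\@].*?)'      # group 1: '#' or '@', then as few characters as possible
    r'(?=([\r\n ]|$))'  # lookahead: CR, LF, space, or end of string (not consumed)
)

# Assumed-equivalent shorter form: '#' or '@', then everything up to the
# next space, carriage return, or newline.
TAG_SHORT = re.compile(r'[#@][^\r\n ]*')


def strip_tags(text):
    """Remove words starting with '#' or '@' and trim the ends (hypothetical helper)."""
    return TAG.sub('', text).strip()


tweet = ('Hi shah rukh. Who is your co-actor in the upcoming movie? '
         '@iamsrk #lovefrommalaysia #askSRK')
print(strip_tags(tweet))
```

Note that only the ends are trimmed. If a tag sits in the middle of a sentence, the spaces on both sides of it remain, so something like ' '.join(result.split()) may be needed to collapse them.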

I don't know how elegant this solution is, but it works for my use case. You can try it on regexr.com.
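
As for why the pattern from the question matches everything: as far as I can tell, the negative lookaheads are only checked once, at the position where the match starts, and since a character cannot be both '#' and '@', one of the two alternatives always succeeds. After that, .* consumes the rest of the line. The sketch below demonstrates this. The last part is an alternative I am assuming would cover the original goal when the tags all come at the end of the tweet. It is not a general solution, because it stops at the first '#' or '@' anywhere in the text.

```python
import re

tweet = ('Hi shah rukh. Who is your co-actor in the upcoming movie? '
         '@iamsrk #lovefrommalaysia #askSRK')

# Pattern from the question: the lookaheads only test the start position.
attempt = re.compile(r'((?!\#)|(?!\@)).*')

# The whole tweet is matched, tags included.
print(attempt.match(tweet).group() == tweet)

# Even a string that starts with '#' matches in full, because (?!\@) succeeds.
print(attempt.match('#askSRK').group())

# Alternative (assumes tags only appear at the end of the tweet):
# take everything before the first '#' or '@' and drop trailing whitespace.
leading = re.match(r'[^#@]*', tweet).group().rstrip()
print(leading)
```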

Linus

Posted 2015-08-06T11:38:33.953

Reputation: 138