Extracting email from text string in excel/google spreadsheet

8

3

I'm looking to extract email addresses from text strings in Google Docs/Excel and email them automatically, where the content of the cells are currently unknown (until updated by user).

For example, they might read:

  1. Hi,

    My friend example@example.com.

    Thanks,

    Example Joe

OR

  1. You should ask

    example@example.co.uk

I've currently got this formula:

=IFERROR((LEFT(CELL,FIND("@",CELL)-1))&"@"&(regexextract(CELL,"@(.*)"))) 

It works for almost all cases, except

a) as in example 1, where someone puts a , or . at the end of the email

b) as in example 2, where the email starts on a new line, it comes in as eg.

ask

example@example.com

How might I adapt the formula to correct these?

Tom

Posted 2015-07-28T11:29:49.240

Reputation: 83

Answers

6

For Google Spreadsheets

Google Spreadsheets has these cool already-builtin regex formulas

We use the first one to extract mail adresses. Put this formula in a Google Spreadsheet cell:

=iferror(Regexextract(A1;"[A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4}");"")

enter image description here

How does the regex pattern work?

[A-z0-9._%+-]+@[A-z0-9.-]+.[A-z]{2,4}

  • A-z represents any character between an A and a z.
    Note the uppercase and lowercase. This way it's case-insensitive
  • 0-9 represents any digit
  • ._%+- represent those signs itself
  • [ ] represents a single character which is allowed inside the brackets
  • Putting a + sign behind [ ] allows the previous pattern to be repeated infinitely
  • @ has no special meaning. It literally searches for a @ sign
  • [A-z0-9.-]+ is the same as above. But _%+ are not allowed behind a @ sign this time
  • \. searches for a single point. It has to be escaped with a preceeding \ because . normally is a placeholder any character
  • At last [A-z]{2,4} searches for 2,3 or 4 case-insensitive characters

Used ressources

nixda

Posted 2015-07-28T11:29:49.240

Reputation: 23 233

Thank you for your helpful answer. One thought I had is that {2, 4} could be limiting because many upcoming TLDs like .codes exceed 4 characters. – Alex Booker – 2019-10-02T07:57:53.110

1

This is for Excel.

Consider the following User Defined Function (UDF):

Public Function GetEmailAddy(Sin As String) As String
    Dim s As String
    If InStr(1, Sin, "@") = 0 Then
        GetEmailAddy = ""
        Exit Function
    End If

    s = Replace(Sin, Chr(10), " ")
    s = Replace(s, Chr(13), " ")
    s = Application.WorksheetFunction.Trim(s)
    ary = Split(s, " ")

    For Each a In ary
        If InStr(1, a, "@") > 0 Then
            GetEmailAddy = a
            Exit Function
        End If
    Next a
End Function

User Defined Functions (UDFs) are very easy to install and use:

  1. ALT-F11 brings up the VBE window
  2. ALT-I ALT-M opens a fresh module
  3. paste the stuff in and close the VBE window

If you save the workbook, the UDF will be saved with it. If you are using a version of Excel later then 2003, you must save the file as .xlsm rather than .xlsx

To remove the UDF:

  1. bring up the VBE window as above
  2. clear the code out
  3. close the VBE window

To use the UDF from Excel:

=GetEmailAddy(A1)

To learn more about macros in general, see:

http://www.mvps.org/dmcritchie/excel/getstarted.htm

and

http://msdn.microsoft.com/en-us/library/ee814735(v=office.14).aspx

and for specifics on UDFs, see:

http://www.cpearson.com/excel/WritingFunctionsInVBA.aspx

Macros must be enabled for this to work!

Here are some examples:

enter image description here

Gary's Student

Posted 2015-07-28T11:29:49.240

Reputation: 15 540