How can I parse a string from a cell multiple time in excel?

4

1

I have a column in my datasheet that often contains a string that I want to place in another column. This string may occur multiple times and I want to place ALL instances of it into the other column. The column is delimited and I'd like to take the matching string and up to the delimiter.

An example:

Possessions
Fruit: apple, Car: Ford, Fruit: banana,
Car: Saturn,
Fruit: orange,

I'd like the next column to contain:

Fruit
Fruit: apple, Fruit: banana,

Fruit: orange,

It's easy enough to find the first instance of the string (new lines are for readability):

MID(A2, 
    FIND( *first instance of Fruit:* ), 
    FIND( *first comma after Fruit:* ) - FIND( *first instance of Fruit:* )
    )

However I could encounter the string any number of times and want to catch all of them.

Also, the column is already a calculated field (a reference to another sheet) so I cannot use text to columns to split on the delimiter.

Any ideas on how to return all instances of the string? I'd rather avoid a VBA script if possible and use worksheet functions, but if it's not possible with functions I'm open to VBA.

Centimane

Posted 2015-08-06T14:39:54.347

Reputation: 176

Looks to me like you need a recursive (or perhaps in Excel parlance "circular") function that finds and instance, extracts it ,then concatenates it with the same search on the unmatched string until the entire string has been exhausted. VBA could handle this, but I'm not sure about native Excel cell-based functions... – David W – 2015-08-06T16:12:19.170

Hmmm...can you offer any more guidance on how formally the data is structured? That is, looking at your example, if I scan a string for the word "Fruit: " I can capture "apple," easily enough, but given the comma, how do I know to stop at "Car:" without inferring I know Car isn't a fruit? Guess I"m looking for formatting/delimiter rules... – David W – 2015-08-06T16:25:43.943

1There will always be a string delimiter at the start (like "Fruit:") and a character delimiter at the end (like ",") of what I want to capture. As such the delimiter will not appear inside the string I'm trying to capture, so it can be known that "Car:" shouldn't be returned because it's not between "Fruit:" and the first comma after "Fruit:" – Centimane – 2015-08-06T17:06:06.150

Thanks! That's very helpful. A regular expression-based solution would make this almost trivial, but it would be a VBA-based solution... – David W – 2015-08-06T17:11:28.743

Answers

2

Generic VBA solution with regular expressions

This method should cover your needs. It can also be used by other users to extract multiple strings from a given string with the help of regular expressions

enter image description here

  1. Open your VBA editor (ALT+F11)
  2. Insert a new module (!) and paste the below code into it
  3. Go back to Excel and use this formula in a cell where you want your output

    =REGEXTRACT(A1, "Fruit: .*?,")
    

Formula explanation

  • =REGEXTRACT() is your new custom formula.
  • A1 is the cell where your input data is placed
  • Fruit: .*?, is a regular expression to find all occurrence of fruit and matches until the very next comma.
Function REGEXTRACT(objCell As Range, strPattern As String)

    Dim objMatches As Object
    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")

    RegEx.IgnoreCase = True
    RegEx.Global = True
    RegEx.Pattern = strPattern

    Set objMatches = RegEx.Execute(objCell.Value)

    If objMatches.Count <> 0 Then
        For Each objMatch In objMatches
            REGEXTRACT= REGEXTRACT+ objMatch.Value
        Next objMatch
    Else: REGEXTRACT= ""
    End If

End Function

Hint: Look-Behind and Look-Ahead expressions are not supported under VB's regex engine. So it's not trivial to exclude the comma via RegEx. But it's possible via normal VBA string operations.

nixda

Posted 2015-08-06T14:39:54.347

Reputation: 23 233

I may have said I wanted to avoid a VBA option, but considering the messiness of not vs the cleanness of using it, I settled on using this approach. I had issues using the name "EXTRACT", excel claimed the function was not valid, but after renaming it worked fine. Does EXTRACT conflict with a reserved word in excel? – Centimane – 2015-08-07T11:53:23.400

1

@Dave Yes, it is. According to this list of reserved words in Excel. I changed it to =REGEXTRACT(). The reason I didn't stumbled upon this was, I use a localized Excel version and they also translate those reserved words. Very annoying

– nixda – 2015-08-07T12:30:53.880

maybe you should put a comment under your image as well to avoid future ponderers getting confused over the difference in name. – Centimane – 2015-08-07T16:08:49.763

1

This is ridiculously clunky, but it seems to work. I would recommend a VBA solution to be honest.

=TRANSPOSE(LEFT(MID(A1,FIND("~~~~~~",SUBSTITUTE(A1,"Fruit:","~~~~~~",ROW(INDIRECT("1:"&(LEN(A1)-LEN(SUBSTITUTE(A1,"Fruit","")))/LEN("Fruit"))))),IFERROR(FIND(",",A1,FIND("~~~~~~",SUBSTITUTE(A1,"Fruit:","~~~~~~",ROW(INDIRECT("1:"&(LEN(A1)-LEN(SUBSTITUTE(A1,"Fruit","")))/LEN("Fruit"))))))-1,LEN(A1)-FIND("~~~~~~",SUBSTITUTE(A1,"Fruit:","~~~~~~",ROW(INDIRECT("1:"&(LEN(A1)-LEN(SUBSTITUTE(A1,"Fruit","")))/LEN("Fruit")))))+1)),IFERROR(FIND(",",MID(A1,FIND("~~~~~~",SUBSTITUTE(A1,"Fruit:","~~~~~~",ROW(INDIRECT("1:"&(LEN(A1)-LEN(SUBSTITUTE(A1,"Fruit","")))/LEN("Fruit"))))),IFERROR(FIND(",",A1,FIND("~~~~~~",SUBSTITUTE(A1,"Fruit:","~~~~~~",ROW(INDIRECT("1:"&(LEN(A1)-LEN(SUBSTITUTE(A1,"Fruit","")))/LEN("Fruit"))))))-1,LEN(A1)-FIND("~~~~~~",SUBSTITUTE(A1,"Fruit:","~~~~~~",ROW(INDIRECT("1:"&(LEN(A1)-LEN(SUBSTITUTE(A1,"Fruit","")))/LEN("Fruit")))))+1))),1000)-1))

It's an array formula, and you have to select at least as many cells across as there are instances of "Fruit." So if you have four Fruits in one column, select four cells across, enter this formula and confirm with ctrl+shift+enter. If you select extra cells it starts returning an error at the end, so you could in theory select the maximum number of cells you will need for any column and autofill down.

Edit: I realized that if there is only one Fruit it will return that fruit in every cell. I don't think there's a way around that with this formula.

It also assumes you won't have a fruit name longer than 1000 characters, and that "~~~~~" won't appear in your text normally.

Explanation: Where n is the number of Fruits in the cell, the formula uses FIND/SUBSTITUTE to build an n-element array where the first occurrence of "Fruit" is replaced with "~~~~~", then the second... etc. Then it uses MID/FIND to extract the text. Since Excel doesn't have a "find nth occurrence of string x" formula we find ourselves doing crazy things like this. The IFERROR is there for the last Fruit, since I was assuming there may not be a comma after it.

Kyle

Posted 2015-08-06T14:39:54.347

Reputation: 2 286

Um, even with excelformulabeautifier.com/ I wasn't able to understand the formula

– nixda – 2015-08-06T17:01:28.880

while I don't think I'd want to use something quite so unwieldy I am curious about it, though for me it only returned the first instance of "Fruit: *," in all of the cells I placed it in (I replaced A1 with $A2 so it wasn't looking at my title and I could drag it over). – Centimane – 2015-08-06T17:14:51.807

@Dave You cannot drag the formula over. You must select all the cells, paste the formula into the formula bar, then hit ctrl+shift+enter. – Kyle – 2015-08-06T17:19:00.740

@nixda I added a general explanation to my answer. – Kyle – 2015-08-06T17:26:01.533

@Kyle That still results in only the first instance of "Fruit: *" – Centimane – 2015-08-06T17:28:29.907

@dave I'm not sure what to tell you. I copied "Fruit: apple, Car: Ford, Fruit: banana," from your question and it works. Is it returning the first instance in all cells, or just one? It is case-sensitive, so if your data is actually "fruit: *" you will want to replace FIND with SEARCH. – Kyle – 2015-08-06T17:45:42.010

@Kyle either way I was only curious, I need the results placed in a single cell so this solution wont work for me. – Centimane – 2015-08-07T10:26:56.630