What's the best way to cut out all text before or after a matching string with excel or open office?

1

I have a column of naughty data, that needs to be standardized. Its a single column, so I'll just include a snippet of it it here:

    pop_density_2010
    7002854600000000000854.6/sq mi (329.97/km2)
    70031416770000000001,416.77/sq mi (547.04/km2)
    70031051960000000001,051.96/sq mi (406.18/km2)
    70031822940000000001,822.94/sq mi (703.86/km2)
    70031297510000000001,297.51/sq mi (500.99/km2)
    70031285070000000001,285.07/sq mi (496.19/km2)
    70032087500000000002,087.5/sq mi (806.02/km2)
    7002133720000099999133.73/sq mi (51.64/km2)
    7002301560000000000301.56/sq mi (116.44/km2)
    70031003620000000001,003.62/sq mi (387.51/km2)
    7002593900000000000593.9/sq mi (229.31/km2)
    7002679689999900000679.68/sq mi (262.44/km2)
    7002368120000000000368.12/sq mi (142.14/km2)
    70031072369999999991,072.37/sq mi (414.06/km2)
    70031343750000000001,343.75/sq mi (518.84/km2)
    7002303640000000000303.64/sq mi (117.24/km2)
    70031465180000000001,465.18/sq mi (565.73/km2)
    7002987180000000000987.18/sq mi (381.17/km2)
    7002266330000099999266.33/sq mi (102.83/km2)
    7002235830000000000235.83/sq mi (91.06/km2)
    7002234160000000000234.16/sq mi (90.41/km2)
    7002208440000000000208.44/sq mi (80.48/km2)
    7002204020000000000204.02/sq mi (78.78/km2)
    7002808690000000000808.69/sq mi (312.25/km2)
    7002325910000000000325.91/sq mi (125.84/km2)
    7002697710000000000697.71/sq mi (269.4/km2)
    7002544740000000000544.74/sq mi (210.33/km2)

In ruby my impulse here would be to do something like this, given the column transposed as an array:

array.map{|x| x.split("sq mi (")[1].split("/km2")[0]}
#=> 329.97, 547.92, 406.18, ...

And I'd be done. But I'm not in a programming environment so that's not exactly convenient right now.

I want to be able to do the same thing with Open Office or excel, and don't want to pull out VB to do this because I don't know VB.

If you can see in the code what it does is divide a given cell's string into an array of string fragments split by the match with the "sq mi(" string, grabs the second half of that, then does the same thing only reverse with the "/km2" string and grabs the first part. Then I have a nice simple number I can convert into miles or whatever. Thanks.

boulder_ruby

Posted 2014-03-25T22:49:02.203

Reputation: 368

1Can you show an example of how would like it to ideally be parsed for 2 lines. I am not sure what the delimeters – Eric G – 2014-03-25T23:01:52.333

Answers

1

The following formula will extract the "km2" number from the string stored in A1.

=VALUE(LEFT(RIGHT(A1, LEN(A1)-FIND("sq mi (", A1)-6), FIND("/km2", RIGHT(A1, LEN(A1)-FIND("sq mi", A1)-6))-1))

It is very similar to your program. First, it truncates everything before the sq mi (, and then it truncates everything after (and including) the /km2. The entire formula is wrapped in a VALUE function in order to convert from text to a number.

bdr9

Posted 2014-03-25T22:49:02.203

Reputation: 2 508

0

You can use not programmer but Excel user way :-) So, you use menu Data then Text to Columns command, using Delimiter option. You'll need do this twice:

  1. apply the command to first column and use ( as delimiter -- you'll receive next column filled with ###.##/km2 values;
  2. apply the command to just appeared column and use / as delimiter -- this column will be cleared off /km2, which moves to next (third) column.

Merzavets

Posted 2014-03-25T22:49:02.203

Reputation: 164