This morning I was presented with a challenge. My friend had 3,500 lines of data that looked something like this:

- P-ADAMSRES-0253_P-ADAMSRES-0253-001_Base_Fixed
- P-AESC-1421_P-AESC-1421-001_Base_IF NGPL, Tex_1
- P-AMGC-0028_P-AMGC-0028-001_Base_Fixed
- P-ATMOS-0010_P-ATMOS-0010-001_Base_Nymex Last_1
- P-BARBANK-0075_P-BARBANK-0075-001_Base_Fixed

She needed the second occurrence of the “P-” from each line, along with the “-001” or whatever it might be. So from the first line she wanted the string P-ADAMSRES-0253-001 . Had each string been the same length then this would have been simple. Turns out that we can still make this easy, at least easier than hand parsing 3,500 lines one at a time.

The trick is in indentifying unique characters in our data that would tell us where the start or stop of something was. In this case there happens to be an “_” underscore just before our string starts. Perusing a large sampling of the data I can see that this is always the case.

I use the FIND function to locate the position of that “_”. If that string was located in cell A1 then in cell B1 I would type the formula =FIND(“_”,A1) . The function would return the number 16 telling me that the underscore was found starting in the 16th position.

I got a bit lucky with the ending of the string because they all ended with a “-001”, “-002”, etc. The important thing was that it was the same length as the first string but with four characters added to the end.

Using the FIND function I know where the string starts and by using the LEN function I know how many characters to pick up.

My final formula, which I entered in cell C1, was =MID(A1,B1+1,B1+3) .