How to convert malformed dates in mixed formats?

0

I am working on a project to clean up my company's data. One particularly horrid file is an Excel sheet with approximately 30,000 rows.

The date columns in this sheet are populated with dates in many different formats, many of which are not valid Excel date formats.

Here are some of the formats:

8/26/2017
20170925
2017-08-10
43026
43040.46597

I need to somehow get all of these to one, standard format without a timestamp.

Due to so many different formats, I've tried checking the LEN() in order to group the formats together (though some differing formats have the same length too).

I have spent literal hours going through several methods to convert the dates and it works fine to get the correct format in a new column, but I can not copy/paste that corrected format back into the original column due to the filtering I need to use.

Selecting only the visible rows works fine, but when pasting, I still end up losing a lot of data in the hidden rows, no matter what I do.

Is there any way to fix this without having to go through all 30,000 rows one at a time and copy/pasting or manually typing the correct date format?

Zephyr

Posted 2018-03-28T04:44:11.010

Reputation: 119

1How does 43026 become a date – Dave – 2018-04-09T05:06:19.243

Answers

1

Your real problem seems to be with being able to move the corrected data from one column back into the original column.

One way to get around this copy/pasting issues across filtered rows, is to locate the column with corrected data adjacent to the Original data column to it left.

Then select the range of values on the correct column and do Ctrl+Shift+Right Arrow key to select the corresponding cells on the original column and then Ctrl+R.

This will copy the data over from the left most cells of your selection to the rest of the cells in your selection, without touching data in the non-visible rows.

Hope this helps!

Bharat Anand

Posted 2018-03-28T04:44:11.010

Reputation: 346

Since I use a formula to create the data in the corrected column, the Fill Right command just copies the formula to the right, clearing both columns. And with the filtered rows, I can't paste just the values into the correctly formatted column either... – Zephyr – 2018-03-28T05:34:08.860

hmm, I see. One workaround is to remove the filter, copy the corrected column and paste special over itself with option to paste values, reapply filter and do the Ctrl+R solution. I know this adds a lots of steps in between, specially when you have to do this for multiple chunks of data, its just a thought while I am racking my head further! – Bharat Anand – 2018-03-28T06:02:44.733

I appreciate that suggestion and sounds like it may work. I know I've got quite a task ahead of me, so the manual work is expected. Then it's time to teach others about data integrity :-P – Zephyr – 2018-03-28T06:33:05.673

0

An outside-excel solution using dateconv from dateutils:

$ dateconv -i %F -i %m/%d/%Y -i %Y%m%d <<EOF
8/26/2017
20170925
2017-08-10
EOF
=>
  2017-08-26
  2017-09-25
  2017-08-10

The idea is to specify all possible input formats using multiple -i flags. The default output format is %F but it could be changed using -f.

I don't know how to convert the last two numbers, it's not a standard day-number thing (like Matlab's days since 0000-01-01 or the Julian day number or the Lilian day number).

hroptatyr

Posted 2018-03-28T04:44:11.010

Reputation: 516