Data Migration issues

My wife and I had an interesting experience this last weekend while out shopping for a major appliance.

We settled on the appliance we wanted, and were working with the salesperson to arrange delivery and installation. Despite being repeat customers, the salesperson couldn’t find our account - searches by phone and address failed to generate any results.

The salesperson apologised, mentioning that things had been difficult to use since a new system had been rolled out last year.

A final search by name did work, bringing up a customer record that was riddled with errors. One glance was all I needed to know why things were going wrong.

Searches by various phone numbers had failed because all the phone numbers had been truncated - trailing digits were missing.
Searches by address had failed because the word “OWN” had been added onto the suburb name

It’s clear to me that the data migration process used to transfer data from the old system into the new was tragically flawed. I’d also wager that the process was poorly tested, if it was tested at all.

The take away lesson: Data Migration needs testing as well.

Let’s see how.

Truncated Data

Phone numbers were truncated, with trailing digits removed.

The number of lost digits seemed to match the number of spaces present in the original value - our home phone number (with one space) lost one digit; my mobile number (with two spaces) lost two digits, and so on.

Instead of flagging truncation as a likely error, the data migration process just sliced off whatever didn’t fit, guaranteeing problems for users.

What should have happened?

While truncation of some fields can make sense, chopping digits off phone numbers is guaranteed to cause invalid data. Instead, the presence of phone numbers too long for the new system should have been flagged as an error. If the field in the new system couldn’t be extended to accommodate the longer data, removing non-numeric characters would have been a simple way to make things fit without breakage.

Additional Information

Address information was corrupted by having additional information appended.

In our case, the word “OWN” was appended onto the correct suburb value.

This is likely to have been an alignment issue in processing a data file. The old system was used to record information for hire-purchase agreements, and included many details about the financial circumstances of each customer. One possibility that the field used for Housing (with values like “Own Home”, “Renter”, and “Boarder”) was concatenated with the Suburb field.

What should have happened?

The values for the suburb field should have been checked against known suburbs - the presence of suburbs that don’t exist would have been a warning sign indicating a problem with the migration.

It’s not even hard to craft such a query:

select distinct suburb  
from addresses  
order by suburb

Conclusions

Data migration is an important part of any project, and deserves good process and testing.

Next Post
Locality Counts 12 Aug 2009
Prior Post
A Drupal Views "Date Trap" 19 Jul 2009
Related Posts
Browsers and WSL 31 Mar 2024
Factory methods and functions 05 Mar 2023
Using Constructors 27 Feb 2023
An Inconvenient API 18 Feb 2023
Method Archetypes 11 Sep 2022
A bash puzzle, solved 02 Jul 2022
A bash puzzle 25 Jun 2022
Improve your troubleshooting by aggregating errors 11 Jun 2022
Improve your troubleshooting by wrapping errors 28 May 2022
Keep your promises 14 May 2022
Archives
August 2009
2009

Data Migration issues

Truncated Data

Additional Information

Conclusions

Comments