My wife and I had an interesting experience this last weekend while out shopping for a major appliance.

We settled on the appliance we wanted, and were working with the salesperson to arrange delivery and installation. Despite being repeat customers, the salesperson couldn’t find our account - searches by phone and address failed to generate any results.

The salesperson apologised, mentioning that things had been difficult to use since a new system had been rolled out last year.

A final search by name did work, bringing up a customer record that was riddled with errors. One glance was all I needed to know why things were going wrong.

  • Searches by various phone numbers had failed because all the phone numbers had been truncated - trailing digits were missing.

  • Searches by address had failed because the word “OWN” had been added onto the suburb name

It’s clear to me that the data migration process used to transfer data from the old system into the new was tragically flawed. I’d also wager that the process was poorly tested, if it was tested at all.

The take away lesson: Data Migration needs testing as well.

Let’s see how.

Truncated Data

Phone numbers were truncated, with trailing digits removed.

The number of lost digits seemed to match the number of spaces present in the original value - our home phone number (with one space) lost one digit; my mobile number (with two spaces) lost two digits, and so on.

Instead of flagging truncation as a likely error, the data migration process just sliced off whatever didn’t fit, guaranteeing problems for users.

What should have happened?

While truncation of some fields can make sense, chopping digits off phone numbers is guaranteed to cause invalid data. Instead, the presence of phone numbers too long for the new system should have been flagged as an error. If the field in the new system couldn’t be extended to accommodate the longer data, removing non-numeric characters would have been a simple way to make things fit without breakage.

Additional Information

Address information was corrupted by having additional information appended.

In our case, the word “OWN” was appended onto the correct suburb value.

This is likely to have been an alignment issue in processing a data file. The old system was used to record information for hire-purchase agreements, and included many details about the financial circumstances of each customer. One possibility that the field used for Housing (with values like “Own Home”, “Renter”, and “Boarder”) was concatenated with the Suburb field.

What should have happened?

The values for the suburb field should have been checked against known suburbs - the presence of suburbs that don’t exist would have been a warning sign indicating a problem with the migration.

It’s not even hard to craft such a query:

select distinct suburb  
from addresses  
order by suburb  

Conclusions

Data migration is an important part of any project, and deserves good process and testing.

Comments

blog comments powered by Disqus
Next Post
Locality Counts  12 Aug 2009
Prior Post
A Drupal Views "Date Trap"  19 Jul 2009
Related Posts
Browsers and WSL  31 Mar 2024
Factory methods and functions  05 Mar 2023
Using Constructors  27 Feb 2023
An Inconvenient API  18 Feb 2023
Method Archetypes  11 Sep 2022
A bash puzzle, solved  02 Jul 2022
A bash puzzle  25 Jun 2022
Improve your troubleshooting by aggregating errors  11 Jun 2022
Improve your troubleshooting by wrapping errors  28 May 2022
Keep your promises  14 May 2022
Archives
August 2009
2009