
Data Migration – Avoiding the Pitfalls

August 7, 2014 — Ruth Craven, Guest Author

Geographic Information Systems (GIS) have matured from mapping systems which ran on mainframes into platforms used across multiple industries for complex applications running on distributed systems, office desktops, and remote devices. Data migration efforts have shifted from the conversion of paper maps, perhaps including some digital data (e.g., transformer records), to the migration of digital data. Modern data migration efforts can involve migrating tabular data containing location information into a GIS, migrating GIS data to a different platform or to a new software version, and/or incorporating field-collected data into an enterprise GIS.

Regardless of how many data conversions and migrations have occurred since the first mapping systems, many of the same ‘pitfalls’ keep recurring. The rest of this article presents pitfalls that early data migrations encountered and that projects still run into today, along with mitigating solutions.

  1. Lack of Scope Control and/or Poor Project Planning: Admittedly, these two categories cover a lot of territory and cause some of the more specific pitfalls mentioned below. They are listed separately here to emphasize the importance of detailed planning and scope control in a migration project. A detailed project plan – reviewed by client and vendor team members – must be developed and maintained throughout the project. In addition, a strong culture of scope control must exist.

    Any team member should be allowed to throw the ‘scope flag.’ Any change – bug, enhancement, tool, additional data, etc. – needs to be evaluated according to schedule and budget impacts, using a simple scale such as “must have now,” “wait,” or “don’t need.”

  2. Weak Data Migration Requirements: The specifying of high-level data migration requirements starts during the RFP process for contracting with a data migration vendor. Exact counts may not be possible, but approximate numbers can be stated in order to document the intended scope of the migration and to provide a boundary against which scope increases can be evaluated. Statistics can be stated in terms of a list of object types and approximate counts; e.g., 5,000 – 6,000 transformer banks, miles of gas main, types of telecommunications media and number of records, or area of land mass. Detailed migration requirements are defined during the initial analysis phase (a.k.a. the requirements or discovery phase). The source data model must be ‘mapped’ into the target data model at the feature, object, and field level. Rules need to be documented for any transformation or manipulation of data that must be applied during the migration process. The data migration mappings and the migration rules together form the migration requirements.
  3. Failure to Know the Source Data: Beyond understanding the source and target data models, the populated data must be well understood. During the development of migration requirements (see above), the populated source data should be studied for anomalies: invalid or inconsistent values in fields to be constrained by domains, null values destined for mandatory fields, orphan records, relationships not following the norm of how tables/records ‘should’ be related, etc. These anomalies will likely identify target data model changes, migration rule changes, source data cleanup, and/or post-migration data cleanup tasks.
  4. Failure to Account for Very Probable, but “Unexpected” Source Data: Even if the source data has been turned upside down and inside out and the migration rules are thought to cover all the oddities and variations of source data, other oddities and variations will be found during the migration script development. Just count on it. When an unexpected data condition occurs during development and causes an issue, time will be spent analyzing the problem (i.e., is there a bug or is there an unexpected data condition?).

    If it is a data issue, then the prevalence of the condition must be determined. Then, discussions usually occur until a decision is made regarding how to handle the anomaly. The diagnosing and resolution process can use up valuable hours in the development budget. Plan for the unexpected when estimating development hours.

  5. Failure to Identify and Account for Pre-migration Source Data Cleanup: The better the data fed into the migration process, the better the data coming out of it. During the process of studying the source data and defining the migration requirements, opportunities to ‘clean up’ the source data will be identified. As a rule, if data in the source system can be made compliant with the migration requirements and the target data model, and the resources exist to perform the cleanup, then do it in the source system. Pre-migration data cleanup will result in fewer migration errors and less post-migration cleanup. Account for the resources (staff and time) needed to perform source data cleanup in the project plan.
  6. Failure to Identify and Account for Post-migration Data Cleanup: Resources to ‘clean up,’ enhance, and modify the data after migration should also be accounted for in the project plan. Invalid source data can sometimes be cleaned up more easily in the target system because different tools make the data easier to identify and change. Sometimes post-migration data cleanup is needed because new applications require the data in a different form or require additional data.
  7. Insufficient Prototyping or Piloting: Usually a data migration project includes a prototype migration, a pilot migration (with more data), and the production migration. If the amount and kind of data selected for the prototype and pilot migrations do not include a good representation of data variety, scenarios, and quantity, then the production migration can encounter surprises that may, in the worst case, halt the production migration until the situation is resolved. Also, resources to make the changes to data, migration rules, and/or migration scripting identified in the prototype and pilot evaluations must be accounted for.
  8. Shortening Data Testing: This one is for the data migration vendor. If the schedule becomes tight, there is a tendency to shorten (1st) documentation tasks and (2nd) the internal migration/data testing. Don’t do it.
  9. Insufficient Logging and Reporting: This one is also for the data migration vendor. Within the migration scripts/processes, logging of the successful and unsuccessful migration of individual records must be included. Information to log includes the object type, identifying/key information, and – if unsuccessful – the reason for failure. Counts of records processed, for each object type, are also needed. Meet with the data migration lead before beginning script development to define the logging and reporting needs. In addition, logging and reporting requirements are typically specified by the client.
  10. Staffing: Members of the data migration team – as part of the entire project team – should include the roles listed below. (Note: one person could fulfill more than one role.)
    • From the client organization:
      • Data migration lead
      • Source data expert (e.g. super user, database administrator)
      • System developer (in case of a ‘home grown’ system) or system maintenance personnel
      • Domain expert (electric, water, gas, or telecommunication engineers, city planner, etc.)
      • Testers (for data migration acceptance testing)
    • From the vendor organization:
      • Data Migration Lead
      • Target data model expert
      • Application(s) expert
      • Testers (for data migration acceptance testing)
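
To illustrate pitfall 2, field-level mappings and transformation rules can be captured in a simple, reviewable structure. The sketch below is a minimal Python example with hypothetical source and target field names (XFMR_BANK_NO, RATED_KVA, etc.); real mappings would come from the documented migration requirements:

```python
# Minimal sketch of field-level mapping rules (hypothetical field names).
FIELD_MAP = {
    # target field: (source field, transformation rule)
    "bank_id": ("XFMR_BANK_NO", lambda v: (v or "").strip()),
    "kva":     ("RATED_KVA",    lambda v: float(v) if v else None),
    "phase":   ("PHASE_CD",     lambda v: {"1": "A", "2": "B", "3": "C"}.get(v, "UNKNOWN")),
}

def map_record(source_row):
    """Apply the documented mapping rules to one source record."""
    return {target: rule(source_row.get(source))
            for target, (source, rule) in FIELD_MAP.items()}
```

Keeping the mappings in one declarative table like this makes them easy for client and vendor reviewers to compare against the written migration requirements.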
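The anomaly classes named under pitfall 3 – nulls in mandatory fields, values outside a coded-value domain, orphan records – lend themselves to an automated scan of the populated source data. This is a minimal sketch, assuming rows are plain dictionaries and a hypothetical parent_id foreign key:

```python
def find_anomalies(rows, mandatory_fields, domains, valid_parent_keys):
    """Scan populated source rows for common anomaly classes."""
    findings = []
    for i, row in enumerate(rows):
        # Null values destined for mandatory fields
        for field in mandatory_fields:
            if row.get(field) in (None, ""):
                findings.append((i, field, "null in mandatory field"))
        # Values that violate a coded-value domain
        for field, allowed in domains.items():
            value = row.get(field)
            if value is not None and value not in allowed:
                findings.append((i, field, "value outside domain"))
        # Orphan records: parent key with no matching parent record
        if row.get("parent_id") not in valid_parent_keys:
            findings.append((i, "parent_id", "orphan record"))
    return findings
```

Running a scan like this early, during requirements definition, turns each finding into a concrete decision: change the target model, change a migration rule, or schedule source-data cleanup.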
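For pitfall 9, per-record logging plus per-object-type counts might look like the following sketch, built on Python's standard logging module; the function name and call pattern are illustrative, not a prescribed interface:

```python
import logging
from collections import Counter

logging.basicConfig(format="%(levelname)s %(message)s", level=logging.INFO)
log = logging.getLogger("migration")
counts = Counter()  # per-object-type tallies of successes and failures

def log_result(object_type, key, error=None):
    """Log one record's migration outcome and update the counts."""
    if error is None:
        counts[(object_type, "migrated")] += 1
        log.info("%s %s: migrated", object_type, key)
    else:
        counts[(object_type, "failed")] += 1
        log.error("%s %s: failed (%s)", object_type, key, error)
```

Calling log_result for every record processed gives both the per-record audit trail (object type, key, failure reason) and the per-object-type counts the migration lead will ask for.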

None of the data migration pitfalls enumerated above are new. Yet data migration projects still fall into the same or similar traps; tendencies to leave pitfalls unaccounted for still occur. And worse, sometimes the time and resources to perform mitigation efforts are purposefully excluded to reduce the budget. More often than not, those exclusions will come back and bite the project. Including the time and resources needed to mitigate the effects of these pitfalls, along with enforcing strict scope control, are basic principles to follow in any data migration plan. Don’t ignore them.
