Data Migration

From SmartWiki
Revision as of 11:16, 26 September 2017 by Chin (talk | contribs)

Jump to: navigation, search

Template:ProfessionalServices


Overview

The following article outlines the procedure for migrating historical information from an external source into SmartSimple during the initial implementation of your system. Data migration involves the extraction, transformation, and then loading of data from one system into another.

For a successful migration, you will need the appropriate resources in order to first decide and plan which data you need migrated, and then to extract and transform this data from your existing system. It should be noted that migration of your historical information is not necessary for a successful implementation of SmartSimple, and is not recommended for clients without the appropriate resources.

Exceptions to these requirements and responsibilities are listed below, where SmartSimple will assist or take on the full responsibility of the data migration for you.

Data ETL process.

Exceptions

MicroEdge GIFTS Clients

Clients transitioning from a MicroEdge GIFTS offline system may provide the full GIFTS database without performing any of the migration tasks listed below. SmartSimple has a standard procedure for directly transforming a GIFTS database into an appropriate format for import.

Simple Datasets

For clients with a sufficiently small or simple set of data, SmartSimple will assist you with the migration tasks listed below or even carry out the full data migration for you.

Migration Process

Prerequisites

  • You must have resources with the authority, and expertise capable of determining which historical data need to be migrated.
  • You must have resources with the ability to review, and ensure the data integrity of historical data.
  • You must have resources with the ability to extract, and transform historical data into acceptable formats for import into SmartSimple.

Client Responsibility

Review of Previous System

You will first need to determine which information you wish to migrate into SmartSimple. While it may be tempting to transfer everything, it is recommended that you review your data carefully, and only transfer what is essential. The exercise of reviewing your current data will better inform your requirement specifications while designing your new SmartSimple system. This translates to a better implementation of, and successful transition to, SmartSimple.

Review of Data Integrity

You are responsible for ensuring the data integrity, and quality of data that is imported into SmartSimple. It is recommended that the remediation of any issues be performed directly on your previous system prior to data export. This will help to improve the repeatability of the data extraction process. For a successful migration, you will need to ensure the following.

Required:

  • Entity Integrity - every record set must possess a unique identifier for each record. This can be comprised of multiple columns. For example, for contacts, emails must be unique and contain no duplicates.
  • Referential Integrity - every related record set must be linked by their respective unique identifiers. For example, a contact associated to an organization should be indicated by the contact record containing the unique identifier for the organization.
  • Standard Data Type Formatting - standard data types should be formatted appropriately. For example, date values should be formatted to 'YYYY-MM-DD', and date/time values to 'YYYY-MM-DD hh:mm:ss'. Number values should be formatted to decimals '######.####' without currency, comma, or other formatting.

Recommended:

  • Other Field Formatting - if you wish to validate and enforce formatting conventions beyond the strict data types above, you will need to ensure that your data is similarly formatted. For example, if you wish to format phone numbers to '(###) ###-####', or zip codes to '#####-####' your data must be similarly formatted first.
  • Picklist Integrity - for any fields that should be restricted to a pre-defined picklist of values, you should ensure the integrity of these field values. For example, address states should all be the full name, e.g. 'New York', or all be the abbreviation, e.g. 'NY', but not a mix of both, and not contain spelling errors or typos.
  • Duplicate Checking - while you may satisfy that every record possesses a unique identifier, your data may still contain duplicate records. These should be resolved with duplicates being merged or deleted. For example, consider records that may have misspelled information e.g. organizations named 'Oranges Inc.' and 'Oranges Incorporated'.

Data Extraction

Data may be provided to SmartSimple in any of the following formats:

  • Plain Text Files (CSV, XML, JSON formatted)
  • Microsoft Excel
  • Microsoft Access Database
  • Microsoft SQL Server Database
  • MySQL Database
  • Oracle Database

Recommended:

  • Export using UTF-8 character encoding to avoid encoding issues with special characters and language support.
  • Convert any NULL values instead to empty values to avoid importing 'NULL' as a text.

Constraints:

  • If providing data in plain text format, please ensure strict formatting is adhered to with the escaping of delimiter characters. For example, a CSV should have all values encapsulated by quotes and escape any quotes that exist within a value.

Data Transform

Data Schema

Required:

Data must be provided in a normalized form such that the information for each record type be represented in separate tables or files. Typical record types are defined by the following tables or files, but are not limited to:

  • Organizations
  • Organization Addresses
  • Organization Associations to Grants
  • Contacts
  • Contact Addresses
  • Contact Associations to Grants
  • Grants
  • Grant Reviews
  • Grant Progress Reports
  • Grant Payments
  • Other Grants Activities
  • Funds
  • Budgets

Example:

For example, a common guideline would be the below.

  • All contacts must on one worksheet with a unique identifier for each contact and a matching identifier for their organization (for those associated with organizations)
  • All organizations must be on one worksheet with a unique identifier for each organization.
  • All grants must be on one worksheet with a unique identifier for each grant and a matching identifier for the applicant contact and organization for the grant.
  • If there are multiple contacts associated to a grant (e.g. applicants, reviewers), this must be provided on a separate worksheet with the grant unique identifier, the contact unique identifier, and the role of the contact association.
  • If there are multiple organizations associated to a grant (e.g. payees), this must be provided on a separate worksheet with the grant unique identifier, the organization unique identifier, and the role of the organization association.
  • All payments must be on one worksheet with a unique identifier for each payment and a matching identifier for the grant against which the payment is made.
  • All progress reports for import must be on one worksheet with a unique identifier for each progress report and a matching identifier for the grant against which the progress report is made.

Constraints:

  • Maximum number of columns in any given table or file should be limited to under 100. If you require more than 100 columns, then this should be represented by multiple tables or files that each contain the unique identifier that identifies the record.

File Attachments

In order to migrate file attachments into SmartSimple you will need to provide two items.

Required:

  • A zipped file of all files attachments to be uploaded.
  • A mapping file to match files to records. This will simply be CSV sheets that describe which record each file should be uploaded to. The CSV should contain columns for the unique identifier of the record to upload to, the record type (e.g. organization, contact), and the relative file path for each file.

Example:

For example, given a zip file whose contents look like the below.

fileattachments.zip

grants\
grant12345\
grant_budget.docx
grant_request.docx
grant45678\
grant_budget.docx
grant_request.docx
organizations\
Oranges Inc\
org_mandate.docx
Bananas Ltd\
org_mandate.docx

An appropriate mapping file could look like the below. The unique identifier column being a reference to the unique identifier of each record.

Unique Identifier Object Relative path + file name
12345 Grant \grants\grant12345\grant_budget.docx
12345 Grant \grants\grant12345\grant_request.docx
45678 Grant \grants\grant12345\grant_budget.docx
45678 Grant \grants\grant12345\grant_request.docx
4321 Organization \organizations\Oranges Inc\org_mandate.docx
8756 Organization \organizations\Bananas Ltd\org_mandate.docx

Constraints:

  • File size of each individual file <= 1GB
  • Length of full path description < 255 characters
  • File names must consist of characters within the UTF-8 character set. It is recommended to avoid all special characters within file names to avoid issues across different platforms as reserved characters differ between Windows, *nix,...etc. platforms.
  • There is no file versioning when importing file attachments into SmartSimple. Duplicate files that are to be attached to the same record will overwrite each other. For example, in the above example, if the two lines with respect to grant 45678 were instead attributed to grant 12345, then the result would only be a single grant_budget.docx file and a single grant_request.docx file instead of four files total.

SmartSimple Responsibility

Data Transform

As the project team designs your SmartSimple instance and architecture, we will work closely with you to design the final data translation to migrate your data into SmartSimple. SmartSimple will create a transform mapping and review this with you.

Data Load

During the course of a project implementation there will be two data loads conducted. One at the beginning of the project to allow you to verify the information that is input into SmartSimple. The second at the end of the project for the final migration from your current system to SmartSimple. For the final data load, you can expect between a 3-5 day maintenance window where we ask you to freeze any updates to your current system in order to cut over and transition to SmartSimple.