THE SQL Server Blog Spot on the Web


John Paul Cook

Working with legacy data

We encounter legacy data as a part of life. Colleges and universities have transcript records dating back decades or even centuries. Real estate property records in the United States go as far back as Spanish and British land grants in the 1500s. Very old records are entirely paper based and may have been prepared by hand, perhaps typed on a typewriter or written in longhand with a quill pen.

How long should transcripts be retained? Nola Ochs graduated from college at age 95 (can you imagine being in a history class with her?). If she had started college during World War I, maybe she would have needed the old records to keep from having to repeat a course taken previously. That’s a long retention time, but not so long compared to land grant records. Deeds, titles, and liens have to be kept in perpetuity. Retaining old records is expensive.

Scanning old records is also expensive, but there can be indirect savings. What is the cost of losing the records due to fire, flood, or decay? Scanned images protect against such loss. Using scanned images protects fragile, antique records from unnecessary handling. The originals can be moved to a climate controlled storage facility that is better suited for document preservation, possibly at lower total cost. SQL Server 2008 FILESTREAM can be used to store the scanned images.
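As a rough sketch of how that storage might look, here is a hypothetical table with a FILESTREAM column; the table and column names are my own, and the database is assumed to already have a FILESTREAM filegroup configured:

```sql
-- Hypothetical schema for storing scanned record images with FILESTREAM.
-- FILESTREAM requires a ROWGUIDCOL uniqueidentifier column on the table.
CREATE TABLE dbo.ScannedRecord
(
    RecordID   UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    Title      NVARCHAR(200)    NOT NULL,
    ScanDate   DATETIME2        NOT NULL DEFAULT SYSUTCDATETIME(),
    ScanImage  VARBINARY(MAX)   FILESTREAM NULL  -- image bytes live on the NTFS file system
);
```

The appeal of FILESTREAM here is that large scanned images are kept on the file system for streaming performance while remaining transactionally consistent with the relational header data.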

Although scanned documents help with data retention and availability issues, discoverability adds to the cost of scanned images. When scanned documents are stored, time must be taken to catalog the contents. Keywords and descriptive header data must be linked to the scanned documents to make the contents discoverable. Using the SQL Server 2008 GEOGRAPHY data type can facilitate discovery of geographic data such as property records. Proper analysis and requirements gathering are necessary to make sure that the appropriate searchable header data is identified.
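To illustrate the idea, here is a hypothetical property-records header table with a GEOGRAPHY column and a proximity query; all names and coordinates are invented for the example:

```sql
-- Hypothetical header table linking scanned deeds to a parcel location.
CREATE TABLE dbo.PropertyRecord
(
    PropertyID        INT IDENTITY PRIMARY KEY,
    ParcelDescription NVARCHAR(400) NOT NULL,
    ParcelLocation    GEOGRAPHY NULL  -- stored with SRID 4326 (WGS 84)
);

-- Find records within 500 meters of a point of interest.
DECLARE @poi GEOGRAPHY = GEOGRAPHY::Point(29.76, -95.36, 4326);  -- latitude, longitude, SRID
SELECT PropertyID, ParcelDescription
FROM dbo.PropertyRecord
WHERE ParcelLocation.STDistance(@poi) <= 500;  -- STDistance returns meters for SRID 4326
```

A spatial index on the GEOGRAPHY column would keep such searches efficient as the catalog of scanned records grows.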

Data migration from paper records such as college transcripts to digital records is a slow, somewhat expensive process of manually rekeying all of the data. The easiest way to avoid the cost is simply not to do it. Schools get few requests for decades-old transcripts, so this is a good strategy most of the time. I recently requested transcripts from schools attended long ago. All of the schools could quickly snail mail an old transcript. It’s a relatively simple matter of printing from microfiche or copying a paper transcript and putting it in an envelope. It was interesting to learn that requesting an electronic transcript could actually take much longer than requesting a paper one. If the data was never migrated into the current system, requesting an electronic transcript requires that all of the data be manually keyed into the new system before the electronic transcript can be sent. That can take a long time in an academic bureaucracy.

Migrating legacy data into a modern system doesn’t mean it remains in digital form. One school reported that they lost a transcript that was sent to them electronically. I did not understand how that could happen, so I asked for an explanation. Once per day someone at the school (which will remain nameless) goes to the computer that receives the electronic transcripts. All of the electronic transcripts are printed as a batch. The printouts are distributed to the staff so that the data can be manually keyed into the academic records system. Sometimes the printouts of the electronic transcripts get misplaced. When the data is keyed in from the paper copy, errors can occur. Reconciliation of data across disparate systems is necessary to detect any errors and get them corrected.
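One simple way to sketch that reconciliation is with EXCEPT, which returns rows present in one system but not the other; the table and column names below are hypothetical:

```sql
-- Rows in the academic records system that do not match the electronic source.
-- Any rows returned are candidates for keying errors or misplaced printouts.
SELECT StudentID, CourseCode, Term, Grade
FROM AcademicRecords.dbo.Transcript
EXCEPT
SELECT StudentID, CourseCode, Term, Grade
FROM ElectronicFeed.dbo.Transcript;
```

Running the same query with the two SELECTs swapped finds rows received electronically but never keyed in at all, catching the lost-printout case described above.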

Published Saturday, February 12, 2011 4:55 PM by John Paul Cook




BigFishSQL said:

I used to work at a big financial institution. In the 90s, v1 of their website had a product enquiry form that users could fill in and submit.

After this, the contents of the form were printed, and then faxed to create an image for data entry.

February 14, 2011 7:59 AM



About John Paul Cook

John Paul Cook is a database and Azure specialist in Houston. He previously worked as a Data Platform Solution Architect in Microsoft's Houston office. Prior to joining Microsoft, he was a SQL Server MVP. He is experienced in SQL Server and Oracle database application design, development, and implementation. He has spoken at many conferences, including Microsoft TechEd and the SQL PASS Summit. He has worked in the oil and gas, financial, manufacturing, and healthcare industries. John is also a Registered Nurse currently studying to be a psychiatric nurse practitioner. He is a contributing author to SQL Server MVP Deep Dives and SQL Server MVP Deep Dives Volume 2. Connect with him on LinkedIn.
