<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tags 'DBA' and 'data quality'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=DBA,data+quality&amp;orTags=0</link><description>Search results matching tags 'DBA' and 'data quality'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Timewarp: What Is a Relational Database?</title><link>http://sqlblog.com/blogs/kevin_kline/archive/2012/09/05/timewarp-what-is-a-relational-database.aspx</link><pubDate>Wed, 05 Sep 2012 19:23:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:45036</guid><dc:creator>KKline</dc:creator><description>&lt;h2 style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;Relational?!? Move On, Geezer!&lt;/h2&gt;&lt;p style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;Maybe you're thinking that relational databases management systems (RDBMSs), like&amp;nbsp;&lt;a title="Microsoft SQL Server" href="http://www.microsoft.com/sqlserver"&gt;Microsoft SQL Server&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;a title="Oracle Database" href="http://www.oracle.com/us/products/database/overview/index.html"&gt;Oracle&lt;/a&gt;, are going the way of punched cards and rotary phones. &amp;nbsp;After all, there's been a lot of hype these days in the IT media about the rise of so-called&amp;nbsp;&lt;a title="NoSQL database technology" href="http://en.wikipedia.org/wiki/NoSQL"&gt;NoSQL (Not Only SQL) databases&lt;/a&gt;. &amp;nbsp;Many new and upcoming CS and MIS graduates who like working with data might think that relational databases are, at best, soon-to-be legacy systems and, at worst, are a career dead-end. &amp;nbsp;&lt;/p&gt;&lt;p style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;Wrong!!!&lt;/p&gt;&lt;p style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;It's true that all the cool-cat computing services (Amazon, Facebook, Google, Pinterest, etc) are indeed making heavy use of NoSQL technology. &amp;nbsp;They're also making heavy use of traditional RDBMS'es too. &amp;nbsp;In fact, some of the world's biggest users of SQL databases are hand-in-hand the biggest users of NoSQL databases. &amp;nbsp;The reason for that is that both types of data platforms are exceeding good at specific types of data storage and data processing. &amp;nbsp;They also have their own unique weaknesses too. &amp;nbsp;Meaning, each platform has a sweet spot and a weak spot, and that none are a 100% panacea for all imaginable data processing scenarios. &amp;nbsp;Take a look at this article by my friend and former colleague, Guy Harrison -&amp;nbsp;&lt;a title="TechRepublic's 10 Things You Should Know About NoSQL Databases" href="http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772"&gt;10 Things You Should Know About NoSQL Databases&lt;/a&gt;, for a good discussion on the pros and cons of NoSQL in comparison to SQL data platforms.&lt;/p&gt;&lt;h2 style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;Timewarp! Let's Take a Look Back at Why Relational Databases Were Needed.&lt;/h2&gt;&lt;p style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;These days, relational database management systems (RDBMSs) like Microsoft SQL Server and Oracle are the primary engines of information systems everywhere, particularly for enterprise computing systems and web applications. Though RDBMSs are now common enough to trip over, it wasn’t always that way. Not too long ago, you would probably trip over hierarchical database systems, or network database systems, or flat-file systems (heck, that still happens in many government IT shops who still use COBOL).&amp;nbsp; A quick-and-dirty definition for a relation database might be: a system whose users view data as a collection of tables related to each other through common data values.&lt;/p&gt;&lt;p style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;Perhaps you are interested in more than a quick-and-dirty definition for the term relational&amp;nbsp;&lt;span style="text-decoration:underline;"&gt;database&lt;/span&gt;?&amp;nbsp; Here goes.&amp;nbsp; The whole basis for the relational model follows this train of thought: data is stored in tables, which are composed of rows and columns.&amp;nbsp; Tables of independent data can be linked, or related, to one another if they each have columns of data that represent the same data value, called keys.&amp;nbsp; This concept is so common as to seem trivial; however, it was not so long ago that achieving and programming a system capable of sustaining the relational model was considered a longshot with limited usefulness. &amp;nbsp;&lt;/p&gt;&lt;p style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;Relational data theory was first proposed by&amp;nbsp;&lt;a title="E. F. Codd, a legend to database professionals everywhere" href="http://en.wikipedia.org/wiki/Edgar_F._Codd"&gt;E.F. Codd&lt;/a&gt;&amp;nbsp;in his 1970 paper to the ACM entitled “&lt;em&gt;&lt;a title="The Seminal White Paper on Relational Database Design" href="http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf"&gt;A Relational Model of Data for Large Shared Data Banks&lt;/a&gt;&lt;/em&gt;”.&amp;nbsp; Soon after, Codd clarified his position in the 1974 paper to the Texas Conference on Computing Systems entitled “The Relational Approach to Data Base Management: An Overview”.&amp;nbsp; It was in this paper that Codd proposed the now legendary 12 Principles of Relational Databases.&amp;nbsp;&lt;/p&gt;&lt;p style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;If a vendor’s database product didn’t meet Codd’s 12 item litmus tests, then it was not a member of the club.&amp;nbsp; Note that the rules do not apply to applications development. &amp;nbsp;Instead, these rules determine whether the database engine itself can be considered truly “relational”. &amp;nbsp;These rules were constructed to support a data model that would ensure the&amp;nbsp;&lt;a title="The ACID properties of transactions in data processing" href="http://en.wikipedia.org/wiki/ACID"&gt;ACID properties of transactions&lt;/a&gt;&amp;nbsp;and also eliminate a variety of data manipulation anomalies that frequently occurred on non-relation database platforms (and&amp;nbsp;&lt;em&gt;﻿﻿still do&amp;nbsp;﻿occur&lt;/em&gt;&amp;nbsp;on non-relational database platforms). (As an aside, the transactional paradigm was conceived by my hero,&amp;nbsp;&lt;a title="Jim Gray (computer scientist)" href="http://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist)"&gt;Gray, Jim&lt;/a&gt;&amp;nbsp;in 1981 while at Tandem Computer and presented in the paper "&lt;a rel="nofollow" href="http://research.microsoft.com/~gray/papers/theTransactionConcept.pdf"&gt;&lt;em&gt;The Transaction Concept: Virtues and Limitations&lt;/em&gt;&lt;/a&gt;").&lt;/p&gt;&lt;div style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;&lt;table cellspacing="0" cellpadding="0" class="mceItemTable" style="cursor:default;"&gt;&lt;tr&gt;&lt;td align="left" style="font-family:Verdana, Arial, Helvetica, sans-serif;font-size:11px;margin:8px;cursor:text;"&gt;&lt;h2&gt;Codd’s 12 Rules for a Truly Relational Database System&lt;/h2&gt;&lt;p&gt;Are you curious about Codd’s 12 Principles of Relational Databases? Don’t be ashamed that you don’t know them by heart; few technology professionals do, and no one on the marketing staff of technology companies do.&amp;nbsp; However, the few folks who&amp;nbsp;&lt;em&gt;do&amp;nbsp;&lt;/em&gt;know these principles by heart treat them like religious doctrine, and would likely be mortified by their “lightweight” treatment here. &amp;nbsp;Nevertheless, I'll give them to you in my own paraphrasing:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;Information is represented logically in tables.&lt;/li&gt;&lt;li&gt;Data must be logically accessible by table, primary key, and column.&lt;/li&gt;&lt;li&gt;Null values must be uniformly treated as “missing information” not as empty strings, blanks, or zeros.&lt;/li&gt;&lt;li&gt;Metadata (data about the database) must be stored in the database just as regular data is.&lt;/li&gt;&lt;li&gt;A single language must be able to define data, views, integrity constraints, authorization, transactions, and data manipulation.&lt;/li&gt;&lt;li&gt;Views must show the updates of their base tables and vice versa.&lt;/li&gt;&lt;li&gt;A single operation must be able to retrieve, insert, update, or delete data.&lt;/li&gt;&lt;li&gt;Batch and end-user operations are logically separate from physical storage and access methods.&lt;/li&gt;&lt;li&gt;Batch and end-user operations can change the database schema without having to recreate it or applications built upon it.&lt;/li&gt;&lt;li&gt;Integrity constraints must be available and stored in the metadata, not in an application program.&lt;/li&gt;&lt;li&gt;The data manipulation language of the relational system should not care where or how the physical data is distributed and should not require alteration if the physical data is centralized or distributed.&lt;/li&gt;&lt;li&gt;Any row-processing done in the system must obey the same integrity rules and constraints that set-processing operations do.&lt;/li&gt;&lt;/ol&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;p style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;If you know much about SQL, then you probably recognize immediately that SQL ended up fulfilling rules #5, #7, #11 and possibly more. &amp;nbsp;Others of the rule are manifest in the system tables of a relational database, such as DMVs in Microsoft SQL Server and V$ and X$ views in Oracle.&lt;/p&gt;&lt;h2 style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;Relational Rises&lt;/h2&gt;&lt;p style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;There is some debate about why relational database systems won out over hierarchical and network database systems back in the late 1980's and early 1990's, but a couple of reasons seem self-evident.&amp;nbsp; First, the high-level language interface (&lt;a title="My popular book from O'Reilly, SQL in a Nutshell" href="http://shop.oreilly.com/product/9780596518851.do"&gt;SQL&lt;/a&gt;) &amp;nbsp;is much simpler to learn and more intuitive than that mishmash of languages supporting non-relational databases. &amp;nbsp;(In fact, the lack of something like SQL is a hindrance to adoption of many NoSQL database platforms). &amp;nbsp;Second, relational databases provide efficient and intuitive data structures that easily accommodate ad-hoc queries and reporting. &amp;nbsp;People just intuitively understand the value of storing data in tables. &amp;nbsp;From phone books to hotel registries, relational databases (of the paper sort) are second nature to most people. Third, relational databases provide powerful integrity controls such as check constraints and referential integrity - thus providing higher quality data. &amp;nbsp;And high quality data is near and dear to the heart of CFOs around the world. &amp;nbsp;&lt;/p&gt;&lt;p style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;In fact, the strength that relational databases demonstrate with data quality, consistency, and durability are the same reasons that they'll be with us - quite possibly - forever. &amp;nbsp;So were NoSQL databases excel at storing data that is&amp;nbsp;&lt;em&gt;﻿moderately&amp;nbsp;&lt;/em&gt;﻿important and requires&amp;nbsp;&lt;em&gt;﻿eventual&amp;nbsp;&lt;/em&gt;﻿consistency, SQL database excel at storing data that is of&amp;nbsp;&lt;em&gt;﻿paramount&amp;nbsp;&lt;/em&gt;﻿importance and requires&amp;nbsp;&lt;em&gt;﻿immediate&amp;nbsp;&lt;/em&gt;﻿consistency. &amp;nbsp;As long as we're exchanging money, there's a need for relational database technology and ACID transactions.&lt;/p&gt;&lt;p style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;And, just my opinion here, but database administration is currently, and will continue for decades to be, an excellent career choice. &amp;nbsp;Why? &amp;nbsp;First, although databases are widespread,&amp;nbsp;﻿&lt;em&gt;good&lt;/em&gt;&amp;nbsp;databases are not. &amp;nbsp;So there's always need for those who can tune, troubleshoot, and optimize what is currently in the marketplace. &amp;nbsp;Second, just because database are widespread doesn't mean that they're everywhere they need to be. &amp;nbsp;Some estimates gauge that only half of the enterprises that need SQL databases actually&amp;nbsp;&lt;em&gt;﻿use&amp;nbsp;&lt;/em&gt;﻿SQL databases. &amp;nbsp;Imagine if only half of the citizenry wore shoes, and of the half that wore shoes, only half of them wore both shoes and consistently tied them. &amp;nbsp;It'd be a good time to be a maker of&amp;nbsp;&lt;a title="Men's Loafers from Amazon.com" href="http://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&amp;amp;field-keywords=men%27s+loafers"&gt;loafers&lt;/a&gt;! &amp;nbsp;Well, that's where we're at today with relational databases.&lt;/p&gt;&lt;p style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;So what do you think? &amp;nbsp;Am I off the mark on the longevity of relational database? &amp;nbsp;Do you think the sun has set on them? &amp;nbsp;Will they be smashed, degraded, and humiliated by NoSQL database platforms? &amp;nbsp;Or will they stand shoulder-to-shoulder with a variety of data platforms in the years to come?&lt;/p&gt;&lt;p style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;&amp;nbsp;&lt;/p&gt;</description></item><item><title>Simplifying CSV Data Loads</title><link>http://sqlblog.com/blogs/kevin_kline/archive/2011/10/27/simplifying-csv-data-loads.aspx</link><pubDate>Thu, 27 Oct 2011 17:55:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:39424</guid><dc:creator>KKline</dc:creator><description>&lt;P&gt;Data files containing comma separated values, or CSV, are some of the most common data formats used for data representation and storage outside the database. &amp;nbsp;When it comes to loading CSV data into the database, many options exist, however, few make it as simple as CSVexpress, powered by expressor software. &amp;nbsp;I recently visited &lt;A href="http://www.csvexpress.com/"&gt;www.csvexpress.com&lt;/A&gt; to check out just how simple it could get. &amp;nbsp;&amp;nbsp;In short, &lt;A href="http://www.csvexpress.com/"&gt;CSVexpress&lt;/A&gt; offers a repeatable and quick way to load any CSV file into SQL Server (or any other database). &amp;nbsp;&amp;nbsp;For those whose data quality is not as pristine as it should be, CSVexpress also offers a wide variety of built-in functionality to repair the data issues. &amp;nbsp;&amp;nbsp;These are in addition to the data transformation components available out of the box, but let’s not get ahead of ourselves.&lt;/P&gt;
&lt;P&gt;&lt;BR&gt;The first thing I notice when I visit &lt;A href="http://www.csvexpress.com/"&gt;CSVexpress&lt;/A&gt; is that there are some video tutorials available on the main page. &amp;nbsp;I found it pretty straightforward to load a CSV file into the database without watching the tutorials beforehand. &amp;nbsp;However, by watching the tutorials I was able to learn more about some neat features and functions that I had not previously noticed.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For my test, I grabbed a simple CSV data file containing the following data:&lt;BR&gt;City,User_ID,Name,Street_Address,Status&lt;BR&gt;"Dallas",47,"Janet Fuller","445 Upland Pl.","Trial"&lt;BR&gt;"Lyon",38,"Andrew Heiniger","347 College Av.","Active"&lt;BR&gt;"Dallas",43,"Susanne Smith","2 Upland Pl.","Active"&lt;BR&gt;"Berne",22,"Bill Ott","250 - 20th Ave.","Active"&lt;BR&gt;"Boston",32,"Michael Ott","339 College Av.","Trial"&lt;BR&gt;"New York",41,"Bill King","546 College Av.","Deleted"&lt;BR&gt;"Oslo",45,"Janet May","396 Seventh Av.","Active"&lt;BR&gt;As you can see from the diagram below, the import of the data to create a schema was not difficult at all:&lt;IMG alt="" src="https://lh3.googleusercontent.com/e1Wesda8y4TyPA_0Zv0pk76IRzf1YXmgbvKiBwLViZx_A-GSTQQ909Ihlv1IvR-1yexkfqtagtiNMSoqW53H5BTes7fUxHqd0gelY-P8tqWJ39v8SgU" width=780 height=320&gt;&lt;BR&gt;Once the schema is configured, I can create the following simple data flow to move data from my CSV input file to my target table in SQL Server:&lt;IMG alt="" src="https://lh5.googleusercontent.com/-Sc-aKWVzCVWjCBRUhtgsdszTkA2KF7keBxE6Y9LlwPtF1ZH6RO9xmkSnJiTDRavSQJF7pyHM2NQH4b3PLjFNKKOVkWJaOQxEu7ewoQiMvlo6XKi4f8" width=321 height=90&gt;&lt;/P&gt;
&lt;P&gt;&lt;IMG id=internal-source-marker_0.41670942602338795 alt="" src="https://lh3.googleusercontent.com/Pa2kSjhOQO1rJ32jpKYeCB_MNIJxORqbd6jQvRbLOm-WGXdHgI-fq75p5iO1fo4X9EblwaUxM9jo6pyJ-s0sXS5DaEa6MdcVTaYAVPm64c_FfGyLNJo" width=624 height=381&gt;&lt;BR&gt;While there are other tools available for performing similar tasks, CSVexpress makes it very simple and intuitive. &amp;nbsp;However, as I mentioned earlier, where it starts getting really interesting is when you need to pre-process and clean-up the data prior to loading it. &amp;nbsp;&amp;nbsp;Whether it involves enriching the data from external data sources or web services, or identifying and repairing bad data, CSVexpress maintains a simple interface for all of that. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;BR&gt;The best part - it’s all free of charge. &amp;nbsp;The version you can download from &lt;A href="http://www.csvexpress.com/"&gt;www.csvexpress.com&lt;/A&gt; is expressor’s free Community Edition. &amp;nbsp;expressor also offers a licensed Desktop and Standard Edition with even more advanced features, which are available for a 30-day trial. &amp;nbsp;&amp;nbsp;As a matter of fact, at the end of November, expressor will be introducing Salesforce support into their commercial editions and &lt;A href="http://www.csvexpress.com/"&gt;CSVexpress&lt;/A&gt; will feature a 30-day trial version. &amp;nbsp;You will be able to load Salesforce just as easily as if loading to SQL Server, or download your Salesforce contact, lead, and opportunity data and transform it before generating the right CSV output file (or files) that meets your daily, weekly, and monthly Excel reporting and analysis needs. &amp;nbsp;Now that’s easy!&lt;/P&gt;
&lt;P&gt;- Kev&lt;/P&gt;</description></item></channel></rss>