The International Classification of Diseases (ICD) is a global standard administered and copyrighted by the World Health Organization (WHO). The 10th revision, ICD-10, is the current one. Some countries adapt the WHO standard. In the United States, the National Center for Health Statistics (NCHS) developed ICD-10-CM (Clinical Modification), and the Centers for Medicare & Medicaid Services (CMS) developed ICD-10-PCS (Procedure Coding System). ICD-10 was originally scheduled to replace ICD-9 as the U.S. standard on October 1, 2013, but the implementation date was changed to October 1, 2014.
Systems must be modified to accommodate the new standard: schemas must change, and mappings between the old and new code sets must be built. Much work remains, which is why implementation was delayed by a full year.
The granularity and specificity of ICD-10 codes have been ridiculed by the press. If you want to win ICD conversion business, it helps to understand ICD-10 rather than laugh at it. Code W22.02XA means “walked into a lamppost, initial encounter” and code W22.02XD means “walked into a lamppost, subsequent encounter”. In clinical parlance, an initial encounter is the first time the patient seeks treatment; follow-up visits for the same condition are known as subsequent encounters. Despite a common misinterpretation, the term “subsequent encounter” has nothing to do with how many times a patient has walked into a lamppost.
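The encounter codes above follow a regular structure: the 7th character of an ICD-10-CM injury code encodes the encounter type (A for initial, D for subsequent, S for sequela). Here is a toy Python sketch of that decomposition; it is an illustration of the code layout discussed above, not a full ICD parser.

```python
# 7th-character encounter types used by ICD-10-CM injury codes
ENCOUNTER = {"A": "initial encounter",
             "D": "subsequent encounter",
             "S": "sequela"}

def describe(code: str) -> str:
    """Split a code like 'W22.02XA' into its base code and encounter type."""
    category, rest = code.split(".")           # e.g. 'W22', '02XA'
    base, seventh = rest[:-1], rest[-1]        # '02X', 'A'
    return f"{category}.{base}: {ENCOUNTER[seventh]}"

print(describe("W22.02XA"))   # W22.02X: initial encounter
print(describe("W22.02XD"))   # W22.02X: subsequent encounter
```

Treating the code as structured data rather than an opaque string is exactly what makes the extra specificity analyzable.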
The number of codes increased by an order of magnitude in going from ICD-9 to ICD-10, and data professionals should appreciate the intention behind the added specificity of diagnosis codes. If only W22 (striking against a stationary object) were available, the data would be too vague to help with prevention efforts. Purely for the sake of discussion, suppose the data showed that most people who walk into something actually walk into lampposts rather than walls (W22.01XA) or furniture (W22.03XA). In this contrived example, it might be in the public interest to see whether a lamppost injury prevention program would be cost effective (I'm kidding, but the point about data mining stands).
Fine-grained data lends itself to analysis. Federal payment programs (Medicaid and Medicare) and private insurance companies already scrutinize diagnosis codes and withhold payments in some cases. If a patient without an infection is admitted to a hospital and contracts one while being treated there (a nosocomial infection), payment for treating the infection is likely to be denied. This gives hospitals an incentive to be more diligent about preventing infections.
Having more detail gives hospitals and practitioners an improved ability to document the extra complexity of a case and be reimbursed at a higher rate. This could give providers an incentive to convert to ICD-10, perhaps even ahead of the October 1, 2014 deadline. More codes mean more data, which can lead to projects to expand and optimize systems to accommodate the impending data explosion.
MUMPS is a NoSQL database and programming language that does not receive widespread attention. I searched for lists of the most important or most used NoSQL databases and MUMPS did not make the lists, although it should.
The Massachusetts General Hospital Utility Multi-Programming System, often referred to as just M, is a key-value database that was developed in 1966. MUMPS became the foundation for the Decentralized Hospital Computer Program (DHCP) developed at the U.S. Veterans Health Administration (VHA). Notice that I said developed at the VHA, not by the VHA. At one time there were direct orders prohibiting further development of this system and work continued in secret. Eventually, DHCP became VistA, which is one of the most widely used Electronic Medical Records (EMR) systems in the world.
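To give a flavor of the data model: MUMPS stores data in hierarchical sparse arrays called globals, addressed by subscript paths such as ^Patient(123,"name"). The following is a rough Python analogy of that key-value structure, not real MUMPS; the patient data and helper names are invented for illustration.

```python
patient = {}  # stands in for the MUMPS global ^Patient

def set_node(*args):
    """Store a value under a subscript path, MUMPS-style:
    set_node(123, "name", "Smith") ~ SET ^Patient(123,"name")="Smith"."""
    *subscripts, value = args
    patient[tuple(subscripts)] = value

set_node(123, "name", "Smith")
set_node(123, "dob", "1950-04-01")
set_node(123, "visit", 1, "2013-10-01")

# Like MUMPS $ORDER, walk the subscripts in collating order
# (approximated here by string order)
for key in sorted(patient, key=str):
    print(key, "=", patient[key])
```

The hierarchical subscripting is why MUMPS is often described today as an early NoSQL key-value store.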
VistA is not open source software; it is public domain software. That means anybody can do anything with it, including making proprietary changes with no obligation to give those changes back to the public. Most VistA implementations use the SQL-friendly Caché implementation of MUMPS from InterSystems, which you can download for free. Caché is also used by the dominant commercial EMR vendor, Epic Systems, as well as by TD Ameritrade and the European Space Agency.
VistA does not include a GUI. At VA hospitals, the Computerized Patient Record System (CPRS) is a separate application that provides a minimalist GUI on top of VistA. VistA supports the Health Level Seven (HL7) messaging standards used for healthcare informatics interoperability. It also supports SNOMED CT, a healthcare taxonomy supporting the exchange of healthcare data. The VA has an ongoing project to enable VistA to support both ICD-10 and Meaningful Use. ICD codes are used to classify and categorize both medical diagnoses and inpatient procedures. Meaningful Use comes from the HITECH Act (enacted as part of the American Recovery and Reinvestment Act of 2009) and affects the reimbursement rates healthcare providers receive.
SNOMED CT, HL7, and ICD-10 are standards used around the world in healthcare. Meaningful Use is specific to the United States. I’ll be discussing all of these topics in greater detail in future blog posts.
Many people have asked me questions about Protected Health Information (PHI), which is covered by the Health Insurance Portability and Accountability Act (HIPAA, not HIPPA). The process of removing data that could violate someone’s privacy (de-identification or anonymization) is complex. This is particularly true when there is unstructured data (i.e., free text). The U.S. Department of Health and Human Services has detailed guidelines on proper de-identification techniques, which are found here or at Bing’s cached copy here.
I found the guidelines to be very informative. The discussion of zip codes was interesting. Zip codes must be truncated to their first three digits, and sparsely populated areas get even stricter treatment: there is a list of 17 specific three-digit zip code prefixes that cannot be used at all.
The document says that a patient’s age must be removed from the record if the patient is older than 89 (such patients can only be reported in a single 90-and-over category). Can you imagine a patient summary beginning with “The patient is a 107 year old man…”? For supercentenarians, age alone provides a clue to their identity. Changing a patient’s age or date of birth helps greatly in de-identification, but care must be taken: you don’t want to make an adult a minor or vice versa.
Consider this statement in the medical record: “The patient became ill after eating a [insertNameOfReligiousHolidayHere] meal.” One could argue that removing the name of a religious holiday makes for a neutral record. That might be the appropriate thing to do, but there could be clinical value in knowing the religious holiday or the religion of the patient. It could be useful to know if certain things would be eaten or definitely not eaten.
There are published algorithms for processing textual data and de-identifying it. You can download Perl regular expression scripts from PhysioNet for free. The download also includes several dictionaries that the scripts use. Notice there is a dictionary of medical terms and several dictionaries of people’s names. Obviously there is value in knowing if a word is a person’s name or a medical term. DeBakey appears in the SNOMED dictionary as a medical term. There is a DeBakey clamp, a DeBakey pump, and a DeBakey graft. But what if the patient’s name was DeBakey? DeBakey does not appear in the dictionary of common names. Would the scripts recognize DeBakey as a medical term and not remove what actually is the patient’s name?
A known weakness of processing textual data against dictionaries is misspelled words and names. Some names are particularly difficult to spell and will not always be found in the dictionaries because of the inevitable misspellings.
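The kinds of rules discussed above (truncating zip codes, capping ages over 89, redacting words found in a name dictionary) can be sketched in a few lines of Python. This is a toy illustration, not the PhysioNet scripts: the name list, the restricted-prefix list, and the regular expressions are all invented for demonstration.

```python
import re

NAMES = {"debakey", "smith", "jones"}     # toy name dictionary
RESTRICTED_ZIP3 = {"036", "059", "102"}   # stand-ins for the real 17-entry list

def scrub(text: str) -> str:
    # Cap ages over 89, per the Safe Harbor rule discussed above
    def cap_age(m):
        return m.group(0) if int(m.group(1)) <= 89 else "90+ year old"
    text = re.sub(r"(\d{1,3}) year old", cap_age, text)
    # Truncate 5-digit zips to 3 digits; zero out restricted prefixes entirely
    def zip3(m):
        prefix = m.group(0)[:3]
        return "000" if prefix in RESTRICTED_ZIP3 else prefix + "XX"
    text = re.sub(r"\b\d{5}\b", zip3, text)
    # Redact any word found in the name dictionary
    return re.sub(r"[A-Za-z]+",
                  lambda m: "[NAME]" if m.group(0).lower() in NAMES else m.group(0),
                  text)

print(scrub("Smith is a 107 year old man from 10025."))
# [NAME] is a 90+ year old man from 100XX.
```

A toy like this also makes the DeBakey problem concrete: a dictionary lookup has no way to know whether a given word is a surname or a device name, and a misspelled name sails through untouched.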
This post is about the political issues involved with using multiple languages in a global organization and how to troubleshoot the technical details. The CHAR and VARCHAR data types are NOT suitable for global data. Some people still cling to CHAR and VARCHAR, justifying their use by truthfully saying that they take up half the space of the NCHAR and NVARCHAR data types. But you’ll never be able to store Chinese, Korean, Greek, Japanese, Arabic, or many other languages unless you either use NCHAR and NVARCHAR, the double-byte Unicode data types, or use specific collations with CHAR and VARCHAR. Using collations is something I will address in a future post. Using CHAR and VARCHAR with collations does not save space and is trickier, which is why this post focuses on using Unicode.
In troubleshooting a problem displaying Chinese characters, I needed a test character to experiment with. I picked 坡 (decimal 22369, hexadecimal 5761). Why? Because it looked Chinese to me! Doesn’t that remind you of Act 1, Scene 2 of William Shakespeare’s Julius Caesar, where Casca says to Cassius, “but, for mine own part, it was Greek to me”? It really did look Chinese to me, but then, what do I know? That presented a problem: what if it meant something really offensive? After consulting a Chinese immigrant, I was relieved to learn that my randomly chosen character was politically correct. And just to be extra safe, I double-checked with fellow SQL Server MVP Greg Low. Thanks, Greg.
I wrote the following test code to show what happens when Unicode data (which SQL Server stores as double-byte characters) is implicitly converted to single-byte CHAR or VARCHAR.
create table #c (sqlString NVARCHAR(30), sqlChar NCHAR(1));
insert into #c values (N'insert into #c values ( ''坡'')', '坡'); -- invalid single byte syntax
insert into #c values (N'insert into #c values (N''坡'')', N'坡'); -- valid double byte Unicode syntax
select sqlString, sqlChar, UNICODE(sqlChar) as N'UNICODE(sqlChar)' from #c;
drop table #c;
sqlString sqlChar UNICODE(sqlChar)
------------------------------ ------- ----------------
insert into #c values ( '坡') ? 63
insert into #c values (N'坡') 坡 22369
Here is another test you can run:
select '坡', N'坡';
The point is that if you start with a Unicode character such as 坡 but insert it as '坡', it is no longer a double-byte character: you’ve crammed it into a single byte and corrupted it. You have to insert it as N'坡' to prevent corruption, and you must have a double-byte destination for it. By the way, the N stands for National Language. You may also see references to National Language Setting(s) or even National Character Set (NCS).
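The same corruption can be illustrated outside SQL Server. Here is a hedged Python sketch that forces 坡 through the Windows-1252 single-byte code page (a stand-in for a non-Chinese default collation); the character comes back as '?', decimal 63, exactly as in the T-SQL results above.

```python
ch = "坡"
print(ord(ch))  # 22369 (hex 5761)

# Force the double-byte character through a single-byte code page;
# errors="replace" substitutes '?' just as the implicit SQL conversion did
corrupted = ch.encode("cp1252", errors="replace").decode("cp1252")
print(corrupted, ord(corrupted))  # ? 63
```

Once the character has been squeezed into one byte, the original code point is gone for good; no later conversion can recover it.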
The Character Map utility in Windows is helpful when working with Unicode characters.
Figure 1. Character Map utility in Windows.
Personally I prefer the richer functionality of BabelMap, which is available as a free download, online application, or portable application.
Figure 2. BabelMap free Unicode utility.
Microsoft Project does not allow a comma to be added to a resource name. In healthcare, the norm is to refer to people using the pattern Name, Title, which in my case is John Cook, RN. Not all commas are equal: by substituting a different comma for the one Project doesn’t like, it’s possible to add a comma to a resource name.
Figure 1. Error message after trying to add a comma to a resource name in Microsoft Project 2013.
The error message refers to “the list separator character”, commonly known as a comma. The comma we can’t use in Project resource names is technically U+002C, its code point in the Unicode standard. There is another character, U+201A (SINGLE LOW-9 QUOTATION MARK), that looks the same but isn’t treated the same internally.
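The distinction between the two look-alike characters is easy to verify programmatically. A minimal Python check, independent of Project:

```python
comma = "\u002C"      # the list separator Project rejects
low_quote = "\u201A"  # SINGLE LOW-9 QUOTATION MARK, the look-alike

print(comma, low_quote)                       # visually near-identical
print(comma == low_quote)                     # False
print(hex(ord(comma)), hex(ord(low_quote)))   # 0x2c 0x201a
```

To software that compares code points rather than glyphs, these are entirely different characters, which is exactly why the workaround below gets past Project's validation.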
Microsoft Project doesn’t provide a way to directly enter Unicode characters, but the Character Map tool included with Windows gives you a way to copy a Unicode character to your copy/paste buffer. To invoke the Character Map utility, you can use the search box to find it.
Figure 2. Finding the Character Map utility.
Once the Character Map is visible, there are four quick and easy steps to follow:
1. Check the Advanced view checkbox.
2. Enter 201A in the Go to Unicode: textbox.
3. Click the Select button.
4. Click the Copy button.
Figure 3. Using the Character Map utility.
Now the special comma is stored in your copy/paste buffer ready to be pasted into the Resource Information dialog box.
Position your cursor where you want the comma to be and paste it using whatever technique you prefer. I used Ctrl-V.
Figure 4. Resource name field with a U+201A comma pasted after my name and before my title.
Figure 5. Assign Resources dialog box with commas and titles added to all resources.
When there is only one resource with a comma and a title, the workaround looks perfect. When there are multiple resources, the commas that serve as resource separators have no preceding space, which makes things appear slightly jumbled, as shown in the screen capture below.
Figure 6. How the comma separated titles appear on the project chart.
Excel add-ins Data Explorer and GeoFlow work well together, mainly because they just work. Simple, fast, and powerful. I started Excel 2013, used Data Explorer to search for, examine, and then download latitude-longitude data and finally used GeoFlow to plot an interactive 3-D visualization. I didn’t use any fancy Excel commands and the entire process took less than 3 minutes.
You can download the GeoFlow preview from here. It can also be used with Office 365.
Start by clicking the DATA EXPLORER tab. Click Online Search.
Figure 1. DATA EXPLORER tab in Excel 2013 Office Professional Plus.
I entered latitude longitude mountains as a search string. Once you like the way a dataset looks in preview mode, click USE to download it into Excel.
Figure 2. Wikipedia’s list of French mountains.
While the data is downloading, the worksheet is gray.
Figure 3. Download in progress.
After the download completes, the colors are restored. Go to the INSERT tab and click Map, then select Launch GeoFlow (a menu item under Map, not shown here).
Figure 4. Click Map under the GeoFlow icon on the INSERT tab.
Zoom and rotate as desired.
Figure 5. GeoFlow’s interactive globe.
I decided to plot the names of the mountains on the map, which is why Name was selected. Under GEOGRAPHY, I selected Other because I didn’t see a category name that really matched name. After making a GEOGRAPHY selection, click Map It.
Figure 6. Mapping by mountain name and coordinates.
The locations of the mountains are shown in 2D – not very exciting.
Figure 7. Initial 2-D map.
To make the map 3-D, the Height column from the spreadsheet was selected.
Figure 8. Adding mountain height provides the third dimension.
After rotating a bit, the mouse cursor was placed over the tallest peak, Mont Blanc. Notice the elevation in meters in the annotation box.
Figure 9. Mouseover showing name and height.
If you prefer a heat map, change the CHART TYPE.
Figure 10. CHART TYPE changed to HeatMap.
It’s really as simple as it looks. It just works.
The BETWEEN operator is a handy SQL construct, but it can cause unexpected results when it isn't understood. Consider the following code:
where x between .9 and 1.10
One of the questions you should ask is this: What is x?
What if x has a float, real, or double data type? These data types store approximations of most decimal numbers, not exact representations. When you set a real to 0.9, internally it becomes something very close to, but not necessarily equal to, 0.9. Even though you entered 0.9, it could be stored as something slightly less than 0.9, in which case the stored value may fall between 0.9 and 1.1. Or it may not.
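You can see this with any 32-bit float implementation, not just SQL Server's real type. Here is a Python sketch that round-trips 0.9 through 32 bits using struct; note that the stored value lands slightly below 0.9 and therefore fails the BETWEEN-style test.

```python
import struct

# Round-trip 0.9 through a 32-bit float, as a real column would store it
stored = struct.unpack("f", struct.pack("f", 0.9))[0]

print(stored)                 # 0.8999999761581421
print(stored == 0.9)          # False: not what was entered
print(0.9 <= stored <= 1.1)   # False: the stored value fell below the range
```

The value you typed and the value that got stored are not the same number, which is why range tests against approximate types can surprise you.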
Now consider this:
where x between 9 and 11
If x is an integer and is set to 9, it really is 9 and it will pass the test. Because of data types, never let it be said that all numbers are created equal.
Let's refactor the statement to be free of hardcoded values:
where x between y and z
What's wrong with that? I'll simplify it for you: x, y, and z are all declared with the integer data type. Once again, the question to ask is what are x, y, and z? Or to be a little clearer, what are the values of x, y, and z? The uncertainty arises over the potential values of y and z. Do you know that y will never be more than z? After performing substitutions, what if we have the following?
where 10 between 9 and 11
The statement would be true. But what if after performing substitutions we would have obtained this?
where 10 between 11 and 9
The statement would be false even though 10 is between 9 and 11. We see that order matters - no surprise if you've ever read the documentation. Order is easy to see and deal with when the values are hardcoded. But when you have variables, what if you really don't know a priori if y <= z? You need to think of BETWEEN as a number line. Look at these next two snippets to see what happens when the order of the BETWEEN values is changed:
where -10 between -9 and -11 -- this does NOT work
where -10 between -11 and -9 -- this works
Here is a workaround to this dilemma of not knowing the relative positions of y and z on the number line:
where x between y and z or x between z and y
I've left you with a workaround and a challenge. I like interactivity with my readers, so I ask you to post your own workaround as a comment. One alternative is to refactor the problem into checking for a +/- 10% change. Here's something you can copy and paste to get started:
declare @x int = 10;
declare @y int = 9;
declare @z int = 11;
select 'this works' where @x between @y and @z or @x between @z and @y;
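Here is one more workaround, sketched in Python rather than T-SQL: normalize the bounds with min and max so their order no longer matters. The function name is mine, invented for illustration.

```python
def between_any_order(x, y, z):
    """True when x lies between y and z, regardless of which bound is larger."""
    lo, hi = min(y, z), max(y, z)
    return lo <= x <= hi

print(between_any_order(10, 9, 11))     # True
print(between_any_order(10, 11, 9))     # True: order of bounds doesn't matter
print(between_any_order(-10, -9, -11))  # True: negatives work too
```

The same min/max idea translates directly to T-SQL with a CASE expression picking the smaller and larger of the two bounds.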
There is one more thing about the first code snippet shown above that caught my attention. The two numbers .9 and 1.10 violate the Joint Commission's guidelines on numbers used in prescriptions. The Joint Commission issues guidelines for healthcare institutions to prevent errors and, among other things, avoid killing patients. There is a "Do Not Use" list of abbreviations and formatting styles that have been proven to cause medical errors. A number such as .9 is more likely to be misread as 9 than 0.9 is. Numbers formatted as 1.10 or 1.0 are also dangerous: 1.0 is more likely to be misinterpreted as 10 than 1 is. Misinterpretations of decimal numbers have repeatedly been proven to kill patients. These formatting guidelines were originally developed for handwritten prescriptions, but some people think they should also be applied to computerized forms. I'm not suggesting you change your T-SQL code, but those of you who do user interface design might find this discussion interesting.
Since Windows Vista, Windows users have been able to add extra clocks. I’m in the U.S. Central time zone and have no problem scheduling meetings with people in the Eastern and Pacific time zones. Hyderabad is a different matter because of its half-hour offset. London is easy, except right now, when the U.S. is on daylight saving time and the U.K. is not.
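If you'd rather compute the conversions than eyeball clocks, a time zone database handles both the half-hour offset and the daylight-saving mismatch automatically. Here is a sketch using Python's zoneinfo; the date is chosen to fall in the window when the U.S. is on daylight time and the U.K. is not.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Noon in Chicago on a date when the U.S. is on DST but the U.K. is not yet
now = datetime(2013, 3, 25, 12, 0, tzinfo=ZoneInfo("America/Chicago"))

for label, tz in [("London", "Europe/London"),
                  ("Hyderabad", "Asia/Kolkata")]:
    print(label, now.astimezone(ZoneInfo(tz)).strftime("%H:%M"))
# London 17:00   (GMT: only a 5-hour gap instead of the usual 6)
# Hyderabad 22:30  (the half-hour offset)
```

The library does the DST bookkeeping so you don't have to remember that the U.S. and the U.K. switch on different weekends.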
You can add two clocks using the Additional Clocks tab in Date and Time.
Figure 1. Additional Clocks tab.
Figure 2. Mouseover after adding clocks.
Figure 3. Left mouse click in System Tray after adding clocks.
While the extra clocks are interesting, I personally find that Hyderabad needs more visibility. Go to Gadgets and either double-click the clock or right-click it and select Add to add it to your desktop. Set the clock's Options to assign it a name.
Figure 4. Clock gadget.
Figure 5. Use Options to put a name on your desktop clock. Either right-click the clock and select Options or click the wrench icon.
Data Explorer will speed up and simplify your data analysis by at least an order of magnitude. It makes data just work for me. No fighting or struggling, it just works.
Step 1. Go here and download the Data Explorer Preview for Excel 2013 or Excel 2010.
Step 2. Install Data Explorer.
Step 3. Start Excel. Don’t hold back, go all out and Enable Advanced Query Editing.
Figure 1. Check Enable Advanced Query Editing.
Step 4. Open a workbook.
Step 5. Click DATA EXPLORER.
Figure 2. New DATA EXPLORER tab in Excel 2013. Click Online Search to quickly find data.
Step 6. Click Online Search. Search for something that interests you. I picked one of my research interests, infant mortality.
Figure 3. Search results showing online data sources.
Until you do this, you will not appreciate how ultra cool this really is. When you do a mouseover on any of your search results, the Preview tab shows you what the data will look like when pulled into Excel. True WYSIWYG.
Notice that the search result that has mouse focus has a USE hyperlink. Click USE to actually bring the data into Excel. Again, until you do this, it’s difficult to appreciate how well this feature works. Go take a look here at the data I selected. The text on the Wikipedia page was consumed and loaded into Excel. That is extremely powerful and useful.
Here’s how to use Data Explorer to navigate data on the web.
Figure 4. Click From Web and enter the URL of a web page containing data.
The Query Navigator appears. Notice that it provides navigation to all of the consumable data sources on the page.
Figure 5. Query Navigator.
Easy. Powerful. Quick. Once Data Explorer was added to Excel, it took me much less than 2 minutes to search for my data, preview it, select it, import it, and then examine it further with the Navigator.
There are many reasons you can get different results in your production environment than in your development environment. In database applications, two major causes are differences in database schemas and differences in the data. Application code and user differences can also cause discrepancies, but those are out of scope here; this discussion is limited to differences in the data and the database schema.
Using nursing as a paradigm, before diagnosis comes assessment. A thorough assessment must be done before an accurate diagnosis can be made. A proper assessment requires the proper tools. There are tools for comparing database schemas and tools for comparing data. Today I’m focusing primarily on database schema comparisons. Schema comparisons are usually easier to assess quickly and thoroughly than data comparisons. Because of that, I recommend starting with a schema comparison before performing a data comparison.
Ideally, your database schema is under version control and you have the latest version of Visual Studio, TFS, and maybe Red Gate’s SQL Source Control. You may even have Red Gate’s SQL Compare, a tool I have used extensively with great benefit at multiple clients. As Peter Schott points out below, using SSDT is a really good idea, assuming you are allowed to download it, which isn’t a given in environments with extremely tight control of admin rights. If you have all of these tools, you probably already know how to compare database schemas. As a consultant, I know that most shops aren’t doing version control of their databases. Many small shops don’t have Visual Studio at all. They also may not have a budget for any software purchases, no matter what the ROI is. Rational thought does not always prevail in budgetary matters.
I’ve seen just about every workaround possible at low-budget clients, not all of them good. You don’t have to have two instances of SSMS open side by side to compare the production and development versions of a stored procedure. You can split a single instance of SSMS into two vertical windows: go to the SSMS menu bar and use the Window submenu to split the query window vertically. Most of you know this, but I’ve been asked “how did you do that?”, and I respect anybody who asks a question in order to learn.
Figure 1. SSMS with a vertical split showing development code on the left and production code on the right.
The split window approach has limitations. You need to already know where to look and which object(s) to compare. Even if you are lucky enough to know that, the amount of code needs to be very small. Visually scanning from left to right and back again ad nauseam is inefficient if the number of statements to compare is large, and the risk of missing a difference is significant.
If you don’t have any first-class tools to compare your schemas (Visual Studio has a good facility for comparing code versions), I recommend scripting the objects to files. In the following example, I save each of the stored procedures shown above to its own file. Then use the free WinMerge differencing tool to compare the files. WinMerge is available as a portable application in case you are prevented from installing applications on your machine.
Figure 2. WinMerge comparison of the two script files from SSMS.
It’s much easier to see differences when using a differencing tool. Your time is valuable and you need to use the right tools to be efficient and effective. Scripting an individual object to a file only works well when you have a small number of objects and you know which objects you need to compare. There is a feature to script your entire database schema from SSMS into one file. You can compare one schema’s file to another schema’s file.
I have had people tell me that they tried to script their database schema to a file, but it only scripted the database file creation statements and none of the objects inside the database. That’s not the option I’m talking about here. Scripting the objects that are in the database is a different menu option altogether.
Figure 3. This is NOT the option to script all of the objects in a database.
Figure 4. How to script all of the objects in a database.
By default, when database objects are scripted with this wizard, a descriptive header appears before each object. The header is a SQL comment that includes the date and time the object was scripted. That’s a problem, because you won’t be creating the development object scripts at the exact same time as the production object scripts: you’ll get a false positive difference for each object, which is quite annoying. Fortunately, you can use the Advanced button on the third screen of the scripting wizard to prevent this problem from occurring in the first place.
Figure 5. Click the Advanced button to suppress the object creation comments.
Locate the Include Descriptive Headers option and set it to False.
Figure 6. Set Include Descriptive Headers to False.
Once you have a script for the development schema and another for the production schema, use WinMerge to do an entire database comparison, assuming you don’t have one of the first class comparison tools mentioned earlier.
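If even WinMerge is off limits, a stdlib-only diff is possible. Here is a minimal sketch using Python's difflib to compare two schema scripts; the file names and DDL lines below are invented for illustration.

```python
import difflib

# Stand-ins for schema scripts generated by the SSMS wizard
dev  = "CREATE TABLE t (\n    a INT,\n    b INT\n);\n"
prod = "CREATE TABLE t (\n    a INT\n);\n"

# Produce a unified diff, the same format version control tools use
diff = list(difflib.unified_diff(dev.splitlines(), prod.splitlines(),
                                 fromfile="dev_schema.sql",
                                 tofile="prod_schema.sql",
                                 lineterm=""))
print("\n".join(diff))
```

In practice you would read the two wizard-generated .sql files from disk with open() instead of hardcoding strings, but the diffing step is the same.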
Just to clarify: you should use a proper database comparison tool to troubleshoot problems that may be caused by unintended database schema differences. But if you don’t have the proper tools, or you’re not allowed to download and install SSDT (there really are companies with such highly restrictive policies), scripting the schema to a file and using WinMerge is an acceptable alternative that works sufficiently well. I highly recommend that after deploying schema changes to a production database, you use some sort of process to compare schemas and make certain that the changes were exactly as intended. Your primary comparison should be the production schema before the change against the production schema after the change. If you find unintended differences and can’t resolve the discrepancies quickly, you can obtain a resume template from here.
Implicit data type conversion can cause both unexpected and undesirable results. Using the same design pattern for one data type and extending it to other data types can get you into trouble if implicit data type conversion occurs.
The following code sample was inspired by actual legacy code intended to build a string showing the values of variables in a stored procedure. The desired output, constructed using string concatenation, would be something similar to this:
@xDATETIME 3-Mar-2013 @xINT 23 @xNVARCHAR something
The problem with building SQL strings using string concatenation is that if any substring is NULL, the entire concatenated string is NULL. COALESCE can fix that problem when the data type is some type of string. The first SELECT demonstrates this.
But the design pattern for dealing with a NULL string variable doesn't extend to other data types, as the second and third SELECT statements demonstrate. The DATETIME and INT variables don't get converted to strings (an empty string in this example); instead, the empty string gets converted to DATETIME or INT, respectively. This conversion yields some unexpected results. The SQL Server documentation here does provide a clue about what happens when an empty string is converted to DATETIME: when a date value is not provided during a data type conversion, SQL Server defaults to 1900-01-01, and clearly an empty string does not provide a date value. Similarly, when a time value is not provided during a conversion, SQL Server defaults to 00:00:00.
I included explicit CAST statements to prove what happens when an empty string is converted to DATETIME or INT.
The solution is to avoid the implicit data type conversions in the first place. CAST the variables to strings first and then pass those strings to COALESCE as shown in the last two statements.
If you run the code, it's easy to see for yourself what's actually going on.
declare @xDATETIME DATETIME;
declare @xINT INT;
declare @xNVARCHAR NVARCHAR(10);
select COALESCE(@xNVARCHAR,''); --returns an empty string
--implicit data type conversions cause unexpected results
select COALESCE(@xDATETIME,''); --returns 1900-01-01 00:00:00.000
select COALESCE(@xINT,''); --returns 0
--find out what data types the COALESCE results have
select ISDATE(COALESCE(@xDATETIME,'')); --returns 1 indicating a date value
select ISNUMERIC(COALESCE(@xINT,'')); --returns 1 indicating a numeric value
--explicit data type conversions
select CAST('' AS DATETIME); --returns 1900-01-01 00:00:00.000
select CAST('' AS INT); --returns 0
--how to prevent COALESCE from doing implicit data type conversions
select COALESCE(CAST(@xDATETIME AS NVARCHAR(23)),''); --returns an empty string
select COALESCE(CAST(@xINT AS NVARCHAR(23)),''); --returns an empty string
When using SSMS to script a database object that had been created with ANSI-compliant syntax, I noticed that the generated script was not ANSI compliant. Have you noticed anything like this? It's more of an annoyance than a bug, but it is not the desired behavior. I opened Connect item 781321. If you are aware of any similar problems, please add them to the Connect item so that they can all be addressed at once.
Here's how to reproduce the problem. First, create a table with a default constraint that uses CURRENT_TIMESTAMP for ANSI compliance.
create table table1 (
a DATETIME DEFAULT CURRENT_TIMESTAMP NULL
);
Second, go to the Object Explorer in SSMS and select the table. Right-click it to generate a create table script. You'll get something like this:
CREATE TABLE [dbo].[table1](
[a] [datetime] NULL
) ON [PRIMARY]
ALTER TABLE [dbo].[table1] ADD DEFAULT (getdate()) FOR [a]
While the script will work just fine, getdate() is not ANSI compliant. The scripting tool shouldn't cause you to lose ANSI compliance.
Using DISTINCT is another one of those examples where people's experiences don't always match the truth. Using DISTINCT without an ORDER BY will probably return the results in order, but if your experience tells you that results are always ordered when you use DISTINCT without an ORDER BY, don't trust your experience. Just as I explained in my recent posts on UNION ALL and TOP .. ORDER BY queries in views, you must have an ORDER BY to guarantee ordered results.
If you use DISTINCT and examine your actual execution plan, you may see that a sort was done. Just because a sort was done internally doesn't mean that you are guaranteed perfectly ordered results. You'll see ordered results virtually every time you use DISTINCT without an ORDER BY. But if you really want ordered results absolutely positively each and every time, you must use an ORDER BY. I have seen real world queries that have a DISTINCT without an ORDER BY fail to return everything in order. Again, if order matters, you must specify ORDER BY.
Quoting from the documentation:
"The order in which rows are returned in a result set are not guaranteed unless an ORDER BY clause is specified."
Although the execution plans for the second and third queries below appear identical, only the third query, the one with both DISTINCT and ORDER BY, is guaranteed to return ordered results.
create table #dupes (a int);
insert into #dupes (a) values (2);
insert into #dupes (a) values (2);
insert into #dupes (a) values (1);
insert into #dupes (a) values (1);
select a from #dupes;
select distinct a from #dupes;
select distinct a from #dupes order by a;
Figure 1. Although the query plans for Query 2 and Query 3 appear identical and they both show a sort, only Query 3 is always certain to return ordered results because it has ORDER BY.
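If you want to see the missing guarantee in action, one approach (a sketch against the #dupes table above) is to force a hash aggregate instead of a sort. HASH GROUP is a real T-SQL query hint; whether the optimizer would ever pick a hash aggregate for a table this small on its own is another matter:

```sql
-- Forcing a hash aggregate removes the incidental sort that usually
-- makes DISTINCT results look ordered. The distinct values can now
-- come back in any order.
select distinct a from #dupes
option (hash group);
```

The results may still happen to come back in order on your system, which is exactly the point: without an ORDER BY, order is an accident of the plan, not a promise.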
This is my third post about using ORDER BY. There is one case I've seen where ORDER BY doesn't make sense and shouldn't be used. Tables have no inherent order, so there is no point in using ORDER BY on an INSERT ... SELECT statement.
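As a sketch (the table names here are my own, purely for illustration), the ORDER BY below is accepted syntax but buys nothing, because the rows stored in #target have no inherent order:

```sql
create table #source (a int);
create table #target (a int);

insert into #source (a) values (3);
insert into #source (a) values (1);

-- Legal syntax, but it does not make #target "ordered".
insert into #target (a)
select a from #source
order by a;

-- Order belongs on the SELECT that reads the data.
select a from #target
order by a;
```

If order matters when you read the data back, put the ORDER BY on that final SELECT.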
Today I saw something on the screen that was text. I needed it as text, but I couldn’t select it. OCR to the rescue! If you have OneNote, there’s an easy way to do OCR without any extra software. There are also free websites that can convert screen captures into text.
But wait, there’s more! You can make the text in the image searchable in OneNote.
Let’s begin with this sample screen capture. Once it is in your clipboard, paste it into OneNote.
Figure 1. Screen capture of text.
After pasting the image into OneNote, right-click the image and select Copy Text from Picture.
Figure 2. Invoking OneNote’s OCR capability with a right-click. Notice the menu option to make the text searchable.
When you paste the text into Word, you’ll notice that it doesn’t have any formatting. Notice that although the text in the Word document below says “This is not text”, it really is. This is yet another example of what you observe empirically not matching the truth. What appears below really was text that was pasted into Word.
Figure 3. Screen capture converted to raw text pasted into Word.
If you select the OneNote option to Make Text in Image Searchable, you can add some really useful functionality to your document. Notice that OneNote’s search starts finding matches before you enter all of your search string. If you are going to rely on the search feature in OneNote and it doesn’t work quite like you expect, extract the text and see if OneNote accurately converted it to text. Sometimes the document being converted can’t be accurately converted to text, which can explain suboptimal search results.
Figure 4. Searching text embedded within an image in OneNote.
There are limitations to what OneNote OCR can do. Light text against a dark background may result in no OCR output at all. Look at the following example.
Figure 5. Light text and dark background causes difficulties for OCR in OneNote.
There is a workaround that enables OCR to work. Change the color scheme, which can be done easily with Paint. Paste the screen capture into Paint and right-click to bring up the context menu, which ends with Invert color. It will transform light to dark and dark to light. Copy the inverted image and paste it into OneNote to do the conversion.
Figure 6. Inverting colors to enable OneNote to perform OCR on light text with a dark background.
On the Insert tab, OneNote has a built-in screen clipping tool to make screen captures easy to do from within the tool.
Figure 7. Screen Clipping feature in OneNote.
If you don’t have OneNote, there are several websites that will convert image files into text. I like http://www.free-ocr.com/ for converting images when I don’t have access to OneNote. You should not upload any sensitive or confidential screen captures to any website offering OCR services.
The general rule is that a view can't be created from a statement that contains an ORDER BY clause. There is a workaround that some people believe works: an ORDER BY can be added to a view definition if the view contains a TOP clause. But, similar to my recent post on using an ORDER BY with UNION ALL, what you observe empirically doesn't always match the truth. Adding an ORDER BY clause to a view definition does not guarantee the order of the results, although many people believe that it does. The truth remains: if you want to order the results of a SELECT statement, add an ORDER BY to that SELECT. An ORDER BY hidden inside a view definition doesn't do the job of an ORDER BY, although in your tests it might appear that it does.
If you want to order the results from a view, call the view with an ORDER BY clause instead of forcing an ORDER BY into the view definition.
There is a fix for SQL Server 2008 that causes the results from a view with TOP (with or without PERCENT) and ORDER BY to come back ordered. Fixes are no substitute for reading the documentation. Here is what the SQL Server documentation tells us:
"The ORDER BY clause is used only to determine the rows that are returned by the TOP or OFFSET clause in the view definition. The ORDER BY clause does not guarantee ordered results when the view is queried, unless ORDER BY is also specified in the query itself."
create table table1 (a int);
-- this won't work and the view won't be created
create view view1 as
select a
from table1
order by a;
Msg 1033, Level 15, State 1, Procedure view1, Line 4
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
-- this will create the view
create view view1 as
select top 100 a
from table1
order by a;
-- results might be in order or might not
select * from view1;
-- results will be in order
select * from view1
order by a;