THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Enjoy Another Sandwich -- Kent Tegels

Yummy slices of SQL Server between slices of .NET and XML

  • RDL me this - how do you write reports if all you have is SQL Server Express Advanced Services Edition?

    A few days ago via Twitter (http://www.twitter.com/, follow me as ktegels), Mr. Lowe (AKA Vendoran) asked "SQL Express with Advanced services comes with SSRS, but does it come with the ability to create/edit rdl in VS?"

    The short answer to that question is "it depends." In most of the pay-for-license editions of Visual Studio you can use the Report Viewer controls to do that. For example, you can create a new C# Windows Forms project, then drag and drop a Report Viewer control on the form. After you have created the report you want, you will find an "RDLC" file in that project. RDLCs are, essentially, RDL files designed for use with the Report Viewer control. The good news is that these can be used Report Manager in most cases if you: a.) rename the file with an extension of RDL instead of RLDC before uploading and b.) reconfigure the connection string used by that report to a shared connection defined within Report Manager. However, the Expression Version of Visual Studio does not support this. True, you can look for, enable and use the Report Viewer control, but you will not be able to design a report in the Express SKU.

    So what to do, what to do? Your best bet today is download and install the Report Builder 2.0 tool. Robert Bruckner, a member of the SQL Server team specializing in Express/Advanced Services edition, has a blog post (http://blogs.msdn.com/robertbruckner/archive/2008/08/25/ReportBuilder-20-RC1-Release.aspx ) about this tool and a download link. I tried this new tool today. It feels very much like the Report Designer in SSRS2008 - a good thing indeed. When you save a designed report, you get an RDL file.  Based on what Mr. Bruckner says in blog, I believe this tool has been target to help solve the riddle at hand.

    I am currently building a VPC with SQL Server 2008 Express Edition with Advanced Services to see if some version of this tool is included with it. If so, it looks like we have a good solution for low-cost reporting in hand.

    By the way, if you are using the new AdventureWorks2008 OLTP database, you might want to check out   this "issue" (http://www.codeplex.com/MSFTDBProdSamples/WorkItem/View.aspx?WorkItemId=7427) I found with "[HumanResources].[vEmployeeDepartment] .Your votes would be appreciated.


  • The Top Five New Features in SSIS 2008 for Developers, Part 2

    Lookup. For many SQL Server Integration Services Developers, it's the transformation we most love to dislike - especially if you are using it against data living on remote server not on the same local network as the host running your package. This combination of circumstances sometimes drives us to some very inventive things. In this post I want to look at a situation I ran into a few months ago, how I addressed that situation with SSIS90 and how SSIS100 improves on that.

    For the sake of keeping things simple, let us start with a scenario. You have a list of 5,000 email addresses in an XML file. You need do a look-up of those email address and get the person's name and mailing address. You will write that to a CSV file. This is trivial task using SSIS. Where SSIS90 runs into a problem is when the server you are performing the look-up against is on the far-end of a network connection. Following the best practices of SSIS development, supposed your look-up task was based on a SQL statement that specified only the columns of interest. However, what do you do about the rows of interest? In SSIS90, there is not much you can actually do easily.

    Remember that unless you use a memory restriction, SSIS90 fully populates the look-up cache with all of the records in the look-up query before processing. That is a double-edge sword: it definitely increases the "wait time" before a data flow task starts processing records. This is especially true if you are pulling data from a remote source with a slow network connection. Yet, the once the data flow starts, the records are processed at blistering speeds.

    A straightforward solution to this situation is to use SSIS's ability to serialize buffers to a file - also known as RAW files. In this case, an initial streaming of the lookup data in made into a RAW file. That RAW file is then used as a data source in a second package. The RAW file is "join merged" with the XML source. This effectively provides the same functionality as the look-up did in the previous package. Take care not assume that the RAW file will have all of the needed information - after all, people can move and new email addresses could be added at any time. Therefore, you should when you are designing the package, you should handle unmatched records using the lookup transformation. In addition, you should update the RAW file with the updated data.

    In SSIS100, the idea is fundamentally the same, but the tasks are slightly different. The problem being addressed is that there is no obvious way to serialize a look-up cache to a file (aside from the aforementioned technique that may not be at all obvious.) Neither is using a "merge join" as a way of doing a "look-up." The SSIS team has added a new transformation - the Cache Transform -- and modified another - the look-up - to make it more obvious how to serialize the cache.

    The Cache Transform component itself is simple enough: within a given Data Flow, you connect this component to a path and it writes the buffers from that path to a new version of the raw-file format know as a Cache-Raw file (or CAW, its file extension.) Like a normal RAW file, the output contains the binary version of the data. It also contains an index covering one or more of the columns. This indexing helps the look-up component efficiently use the cached data. When using this to solve our slow data problem we would run a one-time process to initialize the cache from a data source.

    The Look-up component in SSIS100 is expanded to accommodate using the Cache-Raw file. As before, this component can be configured to use an OLE-DB data source. However, it can also be configured to use a Cache-Raw file instead. This can dramatically improve the performance of data flow since it eliminates the start-up delay in acquiring the data. However, there is still the possibility that they cached data may be stale or missing desired matches. Another new feature in the SSIS100 look-up is the ability to direct rows that were not found in the look-up to a new data path. You could "kind of" do this in SSIS90 if you assume that the only row-level error was a failure to find a match. In the new design, rows not matching from the Cache-Raw file could be redirected to another look-up configured to read recently updated data from the remote database.

    One thing to keep in mind about the new look-up transform is that it does not seem to maintain the Cache-Raw file automatically. What you should do is re-direct the no-match rows to a traditional look-up. As your complete processing in the data flow, add a second Cache Transform that writes the union of the matched rows and the not-matched but by then looked-up rows. You will need to name that CAW file something other than what you used as source, of course. The File Task can be used to delete the old version of the Cache and rename the new file to the name you used when configuring the first look-up.

    Examples of how to build the packages discussed in the post can be downloaded from http://gosqlserver.net/downloads/ttfnfssis100part2.zip .


  • The Top Five New Features in SSIS 2008 for Developers, Part 1

    You have to give credit where credit is due, and I certainly have to give credit to Kirk Haselden et al and their book Microsoft SQL Server 2005 Integration Services. That book -- and some of trial and error -- taught me a lot about how to tune data flow tasks for better performance. The folks at Simple Talk have part of the book online and specifically about tuning data flows (http://www.simple-talk.com/sql/sql-server-2005/sql-server-2005-ssis-tuning-the-dataflow-task/). If you aren't all that familiar with the tuning of SSIS 2005 data flows, it's a great piece to read before going at the rest of this article.

    I'd love to say that you don't need to worry about data flow tuning in SQL Sever 2008 Integration Services (SSIS100), but I can't. What I can say is you need to focus less on CPU allocation and utilization. As noted in the aforementioned article, in SSIS 2005 CPUs are allocated statically as the data flow task initializes. Basically, this means that if the optimizer for SSIS 2005 determines that it only needs one or two CPUs to execute the data flow, that is all it would ever use even if more CPUs were available to work the data flow. What change for SSIS100 is that CPU are now dynamic scheduled based on amount of "flow pressure" and all available CPUs are utilized. "Slower" tasks can now get more CPU time.

    Why did the SSIS team decide to use static scheduling in SSIS90 but dynamic scheduling in SSIS100? The answer is simple: back in 2002 to 2004 when SQL Server 2005 was being planned and developed, multiple CPU hosts were not as common as they are today. When you only have one or two CPUs to work with, static schedule is easier and has the least execution plan generation time and cost. However, now that most production-level hosts have four or more CPUs to work with, dynamic allocation is a more efficient solution. On multiple CPU machines, this change enhances performance considerably in many cases. On the downside, on single CPU machines, performance may actually degrade some, so it is certainly something you want to know about and consider.


  • A detour: Building a MultiPolygon with SqlGeometryBuilder

    The RTM version of SQL Server 2008 and recently release of the CLR updates with Visual Studio 2008 SP1 gave us the SqlGeometryBuilder and SqlGeographyBuilder classes to work with. These are very handy, simple APIs but, well, lets just say that the documentation on them is a bit lacking (Isaac Kunen's as some discussion of them at Our Upcoming Builder API). This morning, I wanted to write a few bits of .NET code that did the same work as this T-SQL statement:

    declare @p geometry = geometry::STGeomFromText('MULTIPOLYGON(((-77.054700 38.872957,-77.057962 38.872620,-77.058547 38.870079,-77.055592 38.868840,-77.053217 38.870656,-77.054700 38.872957),(-77.056972 38.870639,-77.055851 38.870219,-77.054875 38.870864,-77.055452 38.871804,-77.056784 38.871655,-77.056972 38.870639)),((-77.056408 38.875290,-77.056947 38.875224,-77.057466 38.873598,-77.057273 38.872737,-77.055335 38.873020,-77.055499 38.874058,-77.056408 38.875290)))',4326);

    The trick here is how do you delimit the rings of a polygon and how do you make a collection? It helps to keep four simple rules in mind:

    1. After instantiating the Builder, immediately set the Spatial Reference ID (SRID) you want to use. You must do this before you define any elements in the collection.
    2. In the case of a collection type, you need to call .BeginGeometry passing in a member of the OpenGisGeometryType enumeration for the desired collection.
    3. Each figure in the collection needs to started with a call to .BeginGeometry as well.
    4. Collection members must be well-formed. The collection must also be well-formed.

    Keeping all that in mind, here's example method for constructing the geometry shown above:

    private static SqlGeometry CreateMultipolygon() {
      // Create a new Geometry Builder to work with
      SqlGeometryBuilder gb = new SqlGeometryBuilder();
      // Set the Spatial Reference ID to 1
      gb.SetSrid(1);
      // Start the collection
      gb.BeginGeometry(OpenGisGeometryType.MultiPolygon);
      // Start the first element in this collection
      gb.BeginGeometry(OpenGisGeometryType.Polygon);
      // Define the first element (figure)
      gb.BeginFigure(-77.054700,38.872957);
      gb.AddLine(-77.057962, 38.872620);
      gb.AddLine(-77.058547, 38.870079);
      gb.AddLine(-77.055592, 38.868840);
      gb.AddLine(-77.053217, 38.870656);
      gb.AddLine(-77.054700, 38.872957);
      // End the first element (figure)
      gb.EndFigure();
      // Define the second figure
      gb.BeginFigure(-77.056972, 38.870639);
      gb.AddLine(-77.055851, 38.870219);
      gb.AddLine(-77.054875, 38.870864);
      gb.AddLine(-77.055452, 38.871804);
      gb.AddLine(-77.056784, 38.871655);
      gb.AddLine(-77.056972, 38.870639);
      gb.EndFigure();
      // End the first polygon
      gb.EndGeometry();
      // Define the second polygon
      gb.BeginGeometry(OpenGisGeometryType.Polygon);
      gb.BeginFigure(-77.056408, 38.875290);
      gb.AddLine(-77.056947, 38.875224);
      gb.AddLine(-77.057466, 38.873598);
      gb.AddLine(-77.057273, 38.872737);
      gb.AddLine(-77.055335, 38.873020);
      gb.AddLine(-77.055499, 38.874058);
      gb.AddLine(-77.056408, 38.875290);
      gb.EndFigure();
      gb.EndGeometry();
      // End (close) the collection
      gb.EndGeometry();
      // Return that as a SqlGeometry instance
      return gb.ConstructedGeometry;
    }


  • The Top Five New Features in SSIS 2008 for Developers, Part 0

    I am going to get myself in trouble (again) by saying this but SSIS isn’t a Business Intelligence tool as much as it is a developer tool. If you are like me, you have written a lot of code does, basically, the following:

    • Extracts the data from some place;
    • Transforms that data somehow;
    • Loads the transformed data into database or some other store.

    Sure, my toolset has changed over the years from COBOL to Perl, Expect and QuickBasic to VBScript to C#, but the basic tasks have not. That is probably why I never warmed up to SQL Server 2000 DTS. The idea of extract, load and transform never really worked for me. I wanted streams, not tables.

    So when SSIS debuted with SQL Server 2005, I decided to make the effort to learn it to the best of my abilities. Yes, it is a great tool but like any tool, it does have some shortcomings:

    • Optimizing data flows is somewhat of a black art;
    • Using the linear lookup with a remote server is slow... unless you "cheated";
    • Transact-SQL lacks an UPSERT command (at least prior to 2008);
    • Calling a Web Service from a script required building an external assembly and making it available to the Script runtime;
    • I am back to writing scripts in BASIC. Compiled BASIC with the full range of the CLR, sure, but it is still, well BASIC.

    Over the next few posts, I want to talk about the new features in SSIS 2008 that help address these shortcomings in SSIS 2005. However, if you saw my recent talk in Omaha, you already know the story. But as of this morning, you can get the bits too! Just browse to http://www.4shared.com/dir/7670149/a1b13c97/Omaha_SQLBI_User_Group.html and, when prompted for a password, enter "SQL4You" (sans quotes, of course). My presentation and bits are in the file labeled "ug_wnissis100.zip"

    Sudhir Gajre’s excellent performance tuning for SQL Server 2005 deck is also available from that site.


  • When is a bug not a bug?

    I guess when the developers decide it is not. Consider the following query:
    drop table dbo.shapeParts
    go
    create table dbo.shapeParts(
         shapePartID tinyint not null identity(1,1) primary key,
         shapeID tinyint not null,
         shapeQuad tinyint null,
         shape geometry not null);
    go
    insert into dbo.shapeParts values
    (1,0,geometry::STGeomFromText('POLYGON((0 1,1 1,1 2,0 2,0 1))',1));
    insert into dbo.shapeParts values
    (1,1,geometry::STGeomFromText('POLYGON((1 1,2 1,2 2,1 2,1 1))',1));
    insert into dbo.shapeParts values
    (1,2,geometry::STGeomFromText('POLYGON((0 0,1 0,1 1,0 1,0 0))',1));
    insert into dbo.shapeParts values
    (1,3,geometry::STGeomFromText('POLYGON((1 0,2 0,2 1,1 1,1 0))',1));
    go
    create spatial index shapeindex on dbo.shapeParts(shape)
    using geometry_grid with (
         bounding_box =(0, 0, 2, 2),
         grids =(level_1 = low,level_2 = low,level_3 = low,level_4 = low),
         cells_per_object = 1)
    go
    declare @g2 geometry = geometry::STGeomFromText('POINT(1 1)',1);
    select top(1) *
    from dbo.shapeParts with(index(shapeIndex))
    where (shape.STContains(@g2)=1)
    or (shape.STIntersects(@g2))=1
    go
    The Books Online topic "Geography Methods Supported by Spatial Indexes" reads:
    Under certain conditions, spatial indexes support the following set-oriented geography methods: STIntersects(), STEquals(), and STDistance(). To be supported by a spatial index, these methods must be used within the WHERE clause of a query, and they must occur within a predicate of the following general form: geography1 . method_name ( geography2 ) comparison_operator valid_number To return a non-null result, geography1 and geography2 must have the same Spatial Reference Identifier (SRID). Otherwise, the method returns NULL.
    . What's in question here is what does "under certain conditions" means. The seems simple enough and for other data types, the type of operation preformed with multiple filter predicates is not an issue, but if you actually try to execute this query in RC0, you get an execution time error:
    Msg 8635, Level 16, State 10, Line 2
    The query processor could not produce a query plan for a query with a spatial index hint. Reason: Could not find required comparison predicate. Try removing the index hints or removing SET FORCEPLAN.
    This seemed like a bug to me, so I filed it on connect (see https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=357578). The response?
    "Thank you for the bug submission. Spatial indexes match only WHERE/ON clauses made of single atomic predicates or ANDs thereof -- not ORs. We also do not infer that STIntersects here is redundant."
    The response that came back was authoritative and well-reasoned. Yet it also smacks of "its not an error because we say it isn't." Hopefully this response (and kind of response) will be reconsidered in the future.
  • Recommended readings: 2008 update

    In class this week, I got asked a few times about the technical books I'd recommed for fulks interested in the data space. While I've had a list up on Amazon for while, I thought I'd post an updated list here.

    I'd love to hear your suggestions, too!

    Business Intelligence

    • Data Analysis Using SQL and Excel: Gordon S. Linoff
    • Data Mining & Statistical Analysis Using SQL: John N. Lovett, Robert P. Trueblood
    • Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis : Culleen McCue
    • Decision Support and Business Intelligence Systems: Efraim Turban, Jay E Aronson, Ting-Peng Liang, Ramesh Sharda
    • Foundations of SQL Server 2005 Business Intelligence: Lynn Langit
    • Hitchhiker's Guide to SQL Server 2000 Reporting Services: Peter Blackburn, William R. Vaughn
    • Microsoft SQL Server 2005 Integration Services: Kirk Haselden
    • Microsoft SQL Server 2005 Reporting Services 2005: Brian Larson
    • Mining the Web: Discovering Knowledge from Hypertext Data: Soumen Chakrabarti
    • Report Builder & Report Models in Microsoft SQL Server 2005: Gerald Schinagl
    • Text Mining: Predictive Methods for Analyzing Unstructured Information: Shulom M. Weiss, Nitin Indurkhya, Tong Zhang, Fred Damerau
    • The Rational Guide to Scripting SQL Server 2005 Integration Services: Donald Farmer

    General data and programming topics

    • Beautiful Code: Leading Programmers Explain How They Think: Andy Oram, Greg Wilson
    • Code Complete: A Practical Handbook of Software Construction: Steve McConnell
    • Database in Depth: Relational Theory for Practitioners: C.J. Date
    • Joe Celko's Trees and Hierarchies in SQL for Smarties: Joe Celko
    • Learning WCF: A Hands-on Guide: Michele Bustamante
    • LINQ Pocket Reference: Joseph Albahari, Ben Albahari
    • Programming Cullective Intelligence: Building Smart Web 2.0 Applications: Toby Segaran
    • Programming Microsoft ADO.NET 2.0 Applications: Advanced Topics: Glenn Johnson
    • Querying XML, : XQuery, XPath, and SQL/XML in context: Jim Melton
    • The Pragmatic Programmer: From Journeyman to Master: Andrew Hunt, David Thomas
    • XQuery: Priscilla Walmsley

    GIS and spatial topics

    • A to Z GIS: An Illustrated Dictionary of Geographic Information Systems: Shelly Sommer, Tasha Wade
    • Designing Geodatabases: Case Studies in GIS Data Modeling: David Arctur, Michael Zeiler
    • GIS for Web Developers: Adding 'Where' to Your Web Applications: Scott Davis
    • Measuring Up: The Business Case for GIS: Christopher Thomas, Milton Ospina
    • Modeling Our World: The ESRI Guide to Geodatabase Design: Michael Zeiler
    • The ESRI Guide to GIS Analysis: Vulume 2: Spatial Measurements and Statistics: Andy Mitchell
    • The ESRI/University of Redlands Culloquium CD Set: Karen K Kemp
    • The Geospatial Web: How Geobrowsers, Social Software and the Web 2.0 are Shaping the Network Society: Arno Scharl, Klaus Tochtermann
    • Unlocking the Census with GIS: Alan Peters, Heather MacDonald

    SQL Server

    • A Developer's Guide to SQL Server 2005: Bob Beauchemin, Dan Sullivan
    • Accelerated SQL Server 2008: Rob Walters
    • Applied Microsoft Analysis Services 2005: And Microsoft Business Intelligence Platform: Teo Lachev
    • Dissecting SQL Server Execution Plans: Grant Fritchey
    • Expert SQL Server 2005 Integration Services: Brian Knight, Erik Veerman
    • Inside Microsoft SQL Server 2005: Query Tuning and Optimization: Kalen Delaney, Sunil Agarwal, Craig Freedman, Ron Talmage, Adam Machanic
    • MCITP Developer: Microsoft SQL Server 2005 Database Sulutions Design: Victor Isakov
    • Pro SQL Server 2005 Service Broker: Klaus Aschenbrenner
    • Professional SQL Server 2005 XML: Scott Klein
    • SQL Server 2005 Practical Troubleshooting: The Database Engine: Ken Henderson
    • The Rational Guide to SQL Server 2005 Service Broker : Roger Wulter

  • Come learn more about SQL Server 2008 Integration Services

    On 6 August 2008, I will be giving a presentation to the Omaha SQL/BI User Group on the new features in SQL Server 2008 Integration Services. We will talk about improvements in scripting, how the redesigned pipeline improves performance and how to leverage the new Cache Transform to improve package performance. Quinn Jones from Farm Credit Services America will also be giving us a great SQL tip. The talk will start around 1800 local time. We will be meeting at the Creighton University West Campus, located at 11111 Mill Valley Road (roughly 41.267786° north, 96.086289° west).

     The "3P requirements" for a successful meeting -- Pizza, Pop and Prizes --- will be available as usual.

    Please leave a comment if you would like more information.


  • Spatial Support Table

    Curious about how SQL Server 2008's Spatial bits compare with those in MySQL and PostGIS? Browse over to the Boston GIS web site review this neat comparsion table.

     http://www.bostongis.com/PrinterFriendly.aspx?content_name=sqlserver2008_postgis_mysql_compare


  • Hey SSIS team, help us out!

    Time to share a lesson learned. While working with new GEOMETRY and GEOGRAPHY data types in SQL Server 2008 there nested set of issues.

    1. Unless you have spatial data to work with, the use-cases for these data types are limited.
    2. Loading data from typical data sources, such as shape files, is probably out of the reach of the typical developer or database administrator unless you have a specialty tool for it.
    3. As helpful as SQL Server Integration Services typically is, it becomes a choke point in this scenario.

    What exactly is it the issue? Let us say that you have been tasked with developing a database to support decisions about where a company should build new storefront locations. A previous analysis of sales has shown that storefronts built in areas where a minimum population and a minimum percentage of the area’s population are within a 25-mile radius. How would you solve this problem?

    Clearly you need geographically-bound data and demographic data. In the United States, our Census Bureau provides such data in a relatively easy to consume format – if you are using traditional Geographic Information System (GIS) tools. For example, you can go the Bureau’s Web site and download one set of files that define the geographic boundaries of census areas and download another dataset that has the desired demographics. A straight-forward process for extracting the demographic data into comma-separated values makes preparing the data for consumption by SSIS exists.

    The Shapefile data is a different story. In SQL Server 2008 there is no out-of-the-box SSIS Data Source for reading them and I doubt we will see one before RTM. So you are left to find a third-party solution or write your own. While it is possible to write your own, this is not an option that should be considered lightly especially if the need to consume is one-off or few-off. Currently there is a relative dearth of open source/free software solutions available. One that I used with good success is Morten Nielsen’s stack off tools (see http://www.iter.dk/page/SQL-Server-2008-Spatial-Tools.aspx). Note, however, that this tool is not yet usable with RC0. When it comes doing serious spatial ETL, there is really one choice as far as I am concerned: Safe Software’s FME (http://www.safe.com/aboutus/news/2007/106/). I particularly like the fact they have gone to the work of writing SSIS data sources, transformations and transformations.

    Yet even that solution has a significant issue, its price tag. Serious GIS shops might not bat at an eye at the acquisition cost for licensing FME, but I can imagine the looks of sticker shock when other types of business owners see the price. This is not Safe’s fault – they have a good product and they have obviously made a substantial investment in.

    So what is the best solution? I hope the SSIS team realizes that the uptake of the spatial features in SQL Server is largely a function of how easy it is it to acquire and use data. This is really an unparalleled opportunity for the SSIS team. I cannot think of another pairing of SQL Server features where one team could so dramatically improve the usability of another -- and thus empower us more as customers -- than this. To do this, I hope the team does not take a “chicken and egg” mentality to the problem. That is, they need to do more than say “well, those features aren’t being used, so it doesn’t make any sense for us to invest much in them.” What this thinking fails to realize is that the features may not being used much is because of the acquisition problem discussed above. I am not saying “if they build it, the users will come.” But I am saying that “if they do not build, the SQL Team should not expect users to use these features.”

    If you happen to agree with me on this topic, I'd appeciate your vote on Connect issue 357045 (https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=357045

     


  • Call For Action: Spatial "geeks," please speak up!

    I've had a lot of "fun" working the with new spatial types in SQL Server 2008. Fun like your first root canal sometimes, fun like a great first date other times.

    One of the "root canal moments" for me has been around Geographic Markup Language (GML) support. I had spent a good chunk of time generating GML for use in class to subsequently learn that SQL Server's support for GML is "limited." Isaac Kunen was kind enough to point me to http://schemas.microsoft.com/sqlserver/profiles/gml/.

    Folks, please learn from my mistake -- understand that schema before you go about generating or consuming GML for the construction of geometry or geography instances.

    Speaking of best practices, another frequent pendant on the  MSDN Forums/SQL Server Katmai/SQL server Katmai Spatial forum has started a thread (http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=3586982&SiteID=1) addressing the differences between geometry and geography types.  

    I think he's off to a great start but I'd like to have the "the rest of us" who are interested in the spatial bits chime in on the tread.I'll even come out and say it. Please.

    July is going to be a busy month for me. On week of the 20th, I'll be teaching our Essential SQL Server class in Boston (see http://www.develop.com/us/training/course.aspx?id=180) and then doing a private engagement in the Sacramento area the next week. If you know of any user group/PASS group meetings in those areas around those areas, please let me know.


  • Upcoming talks on SQL Spatial and SSIS

    Yesterday I drove down from Sioux Falls to Omaha so that I could catch a talk being given by Sudhir Gajre to the Omaha SQL Server/BI Interest Group. I first got to know Sudhir when he, I and Luke Schollmeyer were restarting a SQL Server Users Group in Omaha. Sudhir is a stud at SQL Server performance tuning and helped write one of the best papers on it (see Microsoft SQL Server 2005 Tuning Tips for PeopleSoft8.x.

    I am the next scheduled speaker for that group, so I took a couple of minutes last night to ask them what they would like me to talk about. Response was a little slow, so I suggest that I could give my Spatial Computing with SQL Server talk. Almost no reaction.

    Somebody in the group suggest talking about the new MERGE statement. I liked that said, "What if I wrapped that into a talk about "What is new in Integration Services 2008." The room came back to life.

    So here's an initial outline of what that talk will probably cover:

    • The changes in the SSIS pipeline architecture for back-pressure and thread scheduling The scripting environment changes
    • Working with the Cache transformation
    • Using T-SQL MERGE with SSIS
    • Using Change Data Capture with SSIS
    • (time permitting) Using the Data Profiler task

    We clustered about for a while following Sudhir's talk discussing just how much BI is taking off for Microsoft and how useful a talk like this would be. I have to chuckle every time I get into a conversation like this: I my opinion, SSIS isn't a BI tool that developers just happen to be interested in, it's a Developer's tool that just happens to very helpful to the BI specialist. Really understanding and applying many of the changes for SSIS 2008 is made easier, I think, if you have a Developer mindset about it.

    I am going to submit the "Whats new in SSIS 2008" for the Heartland Developer's Conferences to be held later in the year in Omaha and Minneapolis too.

    If you are a user group leader and would be interested in having me come visit and give a talk, please feel free to contact me through this site.


  • The thing I hate about missing TechEd is getting the eye candy late

    But this eye candy is worth waiting for.  What I am talking about?

    I have spend that last weeks nose down in the Spatial bits for SQL Server 2008. I really believe this is the darkhorse super feature of the product. But a big problem with it has been visualizing the results. Unless you good at mentally visualing spatial relationships, it can be pretty hard to make sense of some the spatial based queries you are likely to want to use. Since CTP5, I have been using Morten Nielsen's SQL Server 2008 Spatial Tools quite happily.

    In Eye Candy of the Highest Order, IsaacK introduces the Spatial Results pannel for SSMS. In short, you no longer have to suffer with this:

    Because you render the data like this:

    Yum! The downside is that this won't be in RC0 but, per Isaac, should be in RTM.

    Issac has also started talking about some new features that will make constructing geo* instances easier with the new Builder APIs. Check out Our Upcoming Builder API for the details.


  • Missed it by *that* much: UDAs, Deployment, VS90SP1B1 and SQL Server 2008 CTP6 (Feb'08)

    Happy Spring everybody!

    As of late I've been heads down working on a new course on SQL Server 2008 and its been a lot of fun. So the other way when the Visual Studio team dropped SP1 Beta1 for Visual Studio 2008 (see here) I was all over it. And it took me longer than I expected to run into... well, its not so much a bug I think as it is a misimplementation of a new feature.

    One of the new features in SQL Server 2008 I am using quite a bit multi-parametered SQL/CLR User-Defined Aggregates. These are helpful in the statistical and spatial areas. For example, I just crufted up an UDA that takes a paired vector of latitudes and longitudes and constructs a LineString Geometry from them (see the code here). It is good use case where the ability to have a Large Value Type (e.g., a user-defined type or aggregate that persistence size of more than 8,000 bytes) and the ability to take in multiple parameters helps solve an otherwise cumbersome problem.

    But the "deploy project" feature chokes on those features. Specifically, it doesn't seem yet to know yet that we use a MaxByteSize of -1 to indicate that the are using an LVT ('MaxByteSize' property specified was not found), nor does it seem to know how to deploy an multiple parameter UDA (The Accumulate method in user defined aggregate "DM.Examples.LineStringBuilder" must have exactly one parameter.)

    I have posted these to Connect (See Item ID 344093) and if you agree with me that they should be fixed, please go vote here.


  • SQL Server 2005 Integration Services using Visual Studio 2005: A Beginner's Guide

    In short: This is an example of a good idea in the wrong format -- a good book if you are starting at absolute zero and need specific step-by-step procedures. Being printed hurts more than it helps.

     

    Let me begin by saying Kshipra Singh, from PACKT Publishing send me an e-mail via my blog with a simple request. In return for a complimentary copy of this book, would I post a review of it on my blog? I am hardly one to turn down a free book and I have been looking for one that I could recommend to my students and others who are just getting started with SQL Server Integration Services (SSIS). This, then, is my keeping of that bargain.

     

    Jayaram Krishnaswamy's book is one example of a growing class of books that I believe should not have been printed. It is not a bad book, but rather, the content of it would probably be a lot better suited to its targeted reader had it been an e-book or as a DVD screen capture with a video and voice-over by the author. As many technical books that I have seen of the years -- and like ones that will admit, I have written -- the text is step-by-step procedural and makes up volume with screenshots. Knowledge presented like this I found is best done live (where interaction encourages questions), next done best by recorded demonstration (to maintain step-sequential context) lastly in a scrollable, random access method like an e-book. Print books really do not allow those types of interactions

     

    Something else is missing too. Let me make an analogy. Let us say that you know very little about how to prepare a cake. You might already familiar with the hardware required: a stove, a pan and mixing bowl. You may also be familiar with the software needed: a cake mix, some eggs, some oil and some water. You can follow the directions in a cookbook (or the back of the mix box) and produce a cake. Just like the directions from the cookbook or box-back, this book is fair treatment of how to use the package designer to accomplish a given set of data integration tasks if that's need to do.

     

    However, what did really learn about "making a cake" from the process?

     

    Based on my personal experiences and those related to me by my students, the hardest part of "getting" SSIS is not how to accomplish a certain task, its understanding why the parts of it do what it does. Going back to our cake analogy, the cookbook does very little to explain why should mix the batter in a separate bowl before pouring into the baking pan. It does not bother to explain the delicate chemical interactions between the egg proteins, the salts, sugar, fats, water and heat that take place during the mixing and baking process. In just the same sense, this book comes up short when explaining essentials such as the dataflow pipeline, buffers and what many of the tasks and components actually do or can be used for.

     

    Nor is there much here that would help go from "baking a cake" to "baking bread." These are similar processes but with some different tools used and some different steps used. This book will help you do the tasks it talks about, but it offers little more than that, especially when it comes to analyzing a problem and synthesizing a different solution. In my opinion, those are the two most important things to master about SSIS and that is, ideally, what a beginner's book would cover.

     

    Of course, that might seem like I am saying that you should have to have a course in organic chemistry and thermodynamics before you can make a cake. Well, obviously, that is not the case. After you have baked a few cakes, learned from your mistakes and maybe even read a book like Shirley Corriher's Cookwise: The Secrets of Cooking Revealed you get enough down of science to be effective at baking cakes, breads and other pastries. If there is a parallel to Cookwise for SSIS, it is Kirk Haselden's Microsoft SQL Server 2005 Integration Services.

     

    So would I recommend this book? In certain cases, yes. If you need to do something which this book particularly addresses quickly and with a minimum of cost, it is a good fit. However, if need to invest your time and money in really learning SSIS because it is a key part of your project or job then no, I would not recommend it. There are more appropriate books to be had.


More Posts Next page »