![]() |
![]() ![]()
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The Professional Association for SQL Server (PASS) will host the industry’s largest user-run educational event exclusively dedicated to the SQL Server in Grapevine, Texas on September 27 – 30, 2005. Attend the 2005 PASS Community Summit, the largest event of the year exclusively dedicated to SQL Server education. Get the largest and most comprehensive coverage of SQL Server 2005 anywhere while you develop and expand your knowledge of SQL Server 2000. Meet and network with members of the Microsoft SQL Server development team, SQL Server MVP's and other SQL Server professionals. Check out the world's premier SQL Server vendors at the largest, most comprehensive tradeshow dedicated to SQL Server. And, much more!
Tuesday, September 27
by Dave Bernard
...including the hotel. And the temperature. And the humidity.
It was 97 degrees (F) when my roomie, Dan LeClair graciously scooped me up at the airport.
It's only a short ride, even in rush hour, to the The Gaylord Texan Resort and Convention
Center, our home away from home for a few days.
What a cool place. Did I say it was big?
We promptly got settled in and then went down to registration and a reception. Registration was quick, and we received a nice satchel with materials,
including a voucher for a complimetary copy of Microsoft SQL Server 2005 Standard Edition, when they ship it (covered later).
The reception was pretty well-attended and included light snacks, one drink ticket and a live quiz show.
by Dan LeClair
Storms or no, the PASS folks throw a whale of a show, and this years’ is as big and as interesting as last
year’s. And, with SQL 2005 in the production pipeline, a lot of excitement was in the air for this “release year”
conference.
As before, the summit is made up of multiple activities: pre-conference sessions, the main three days of
presentations, two hands-on labs, a large vendor exhibit, as well as daily keynotes and other activities. It’s
almost too much to take in; in fact, none of the sessions are scheduled to repeat, which is a bit of a shame as
there are many more that I’d like to have seen than I’ll have time for.
Wednesday, September 28
General Session
by Dave Bernard
Kevin emphasized the strong user-centric "community ecosystem" which gives PASS its foundation. Three main SIG's (Application Development, Business
Intelligence and DBA) are going strong, with new ones in the works (Data Modeling and Security, Compliance and Auditing). See
their blogs for some great information. Kevin noted the extraordinary growth in members from
outside North America, and promised a PASS event in Asia would be a reality in the near future.
Lastly, Kevin gave the PASSion Award to Linda Rab (DBA SIG Chairperson) for going above and beyond the call of duty to help make the SIG program
a success; she was responsible for recruiting many others to cause.
Kevin then gave way to Bill Baker, the General Manager for SQL Server Business Intelligence at Microsoft. One of the first things out of Bill's
mouth was to announce that on November 7th, Microsoft will launch SQL Server 2005,
Visual Studio 2005, and BizTalk 2006, in San Francisco.
Bill continued by quickly enumerating SQL Server 2000's market results, which are nothing to sneeze at (...at which there is nothing to sneeze?):
SQL Server 2005 is not an unknown by any stretch; over one million copies of betas and CTP's have been distributed to date. The SQL Server 2005
project has created several new initiatives and programs designed to give this new release the biggest head start out of the door ever. For example,
the Front Runner Initiative, aimed at ISV's in the US, provided technical and marketing resources to spur applications to market faster. Microsoft
worked hard to identify large SQL Server implementations (over 50 TB-sized deployments nominated themselves) so that this newest version would be
tested in the most demanding of environments. One of these environments was Microsoft itself; keeping with their "eating your own dog food" mantra,
the Microsoft IT Department supports over 70 SQL Server applications in-house today, including a 1.7TB SAP R/3 implementation, a staging data
warehouse for all Microsoft data (2TB in 1,800+ tables with 500 subscribing systems), sales and revenue reporting (a BI application with 9,000 users
running 25,000 reports per week) and an extrordinary 1.8TB worldwide tax information database. In addition, seven customers are live on SQL Server
2005 now, with 25 expected to be live by the time it goes to manufacturing.
SQL Server is the database of choice for some of the world's largest data repositories, including Verizon (52 billion rows) and the eighth largest
data warehouse in the world (over 20TB). One of these large installations, Barnes and Noble, provided the data for Project REAL (Reference Implementation,
End to End, At Scale, Lots of Users), which Microsoft is using as a gigantic best practices generator and proof of capability for SQL Server 2005.
The BN implementation is 2TB in 10 billion rows, representing about three years of sales data, and is growing at the rate of 8.5 million rows per
day.
Bill made it clear that the bar was raised very, very high for SQL Server 2005; and this newest iteration of their flagship database product has
met and exceeded that bar. It's ready for action.
BI for the Relational Guy
by Dave Bernard
Erin is the Lead, BI, for Scalability Experts. Her session was a very good introduction to Business Intelligence, aimed at the DBA who may know a lot
about relational database technologies, but who has little exposure to concepts like data warehouses, data mining and online analytic processing (OLAP).
Business Intelligence is difficult to describe in a sound bite (byte?), but Erin said it best in suggesting that BI enabled data to be turned into
actionable knowledge and decisions. It has evolved over time, starting with an emphasis on getting data into a system rather than turning that data
into information. This emphasis on gathering data (as opposed to tuning for reporting) created a conflict for resources due to opposing requirements:
reporting (querying) requires the optimization of large amounts of data, while OLTP (adding data) requires optimization over much, much smaller
data sets. Optimizing for updates, query volumn and indexing is also at odds. How do we make these equally necessary functions co-exist efficiently in an
enterprise?
Keep in mind, also, that the true users of a data warehouse are not members of the IT department, or are even "super users", but are often C-level
executives that may use the system infrequently. As Erin says, "there's nothing like implementing a data warehouse to show a company how bad their
data is", and these less-than-super users find the holes very quickly. Also, these users need a presentation that is much more human-readable; those
short geeky column names have got to go. Enter the OLAP Cube.
An OLAP cube is a multidimensional database that pre-aggregates values to expedite queries (a shortcoming of relational data warehouse schemas). OLAP is
designed specifically for dynamic analysis of extremely large data sets. As efficient as they are, your OLAP tool (such as Microsoft Analysis Services)
needs to be capable of such functions as data compression (to reduce redundant data), selective aggregation (to prevent someone from building a cube
that is "all things to all people") and avoids storing empty columns. Selective aggregation has, perhaps, the greatest effect because it prevents a
potential data explosion when the designer attempts to create a cube with every possible relationship imaginable.
Erin emphasized the importance of "soft factors" when implementing a BI strategy. BI/data warehouse projects are high-risk endeavors; take a lesson
from the software development side of the house and embrace the RAD philosophy of iterative, piece-by-piece development and deployment; deliver early
and often. Don't rely on IT people for all of your information; talk to the target users. And don't try top model the whole enterprise as step one!
Erin characterized data mining as the "discovery of trends and patterns based on existing data." Examples include capabilities like predicting
inventory replinishment, credit risk analysis, cross-selling and healthcare outcode analysis. Data mining is powered by your OLAP front end;
early OLAP users were expert at wielding Excel pivot tables as a BI tool, but this has been surpassed by tools like, Analysis Services in SQL Server,
Reporting Services and Sharepoing Portal Services.
As Erin put it, two users often walk into meetings with conflicting reports based on the same source data. BI can help prevent that kind of occurrence
and uncover unforseen actionable information in your enterprise.
Future Directions in Data Language Integration: The LINQ Project
by Dave Bernard
To a long time FoxPro developer such as myself, Yag is a legend in the business. He is one of several ex-Foxers who have moved into power positions in Microsoft's
development tool leadership hierarchy and, as Group Manager of the Visual Studio Data group, Yag continues to influence the direction of data handling
with a distinctive FoxPro flavor. As I listened to this session, I chuckled to myself more than once at how Fox-like this all seemed.
FoxPro developers were long ago blessed with powerful, SQL-based, internal data handling capability. It was embedded right into the language and was
famously optimized (known as "Rushmore") for blazing speed. In the view of many FoxPro developers, Visual Studio .NET only offers very clumsy data
handling capabilities, via ADO.NET, and lacks the crispness, conciseness and performance of an integrated data manipulation capability.
But, soon, no more.
The recently announced LINQ Project introduces the .NET Language Integrated Query family of API's, divided into three families: Standard Query Operators, DLinq (ADO.NET)
and XLinq (System.XML). All are based on many core .NET namespace API's and are designed to greatly simplify querying and working with dynamic data
and XML in CLR languages. The version that YAG demonstrated is very new and has a ways to go before being added to Visual Studio, but, as you'll
see, the promise is already there.
The Standard Query Operator library enables the querying of all sorts of data, including arrays, folders, system functions, collections, XML hierarchies, tables, etc. DLinq is
optimized to work with ADO.NET and is the preferred way to talk to SQL Server data stores while XLinq is optimized to handle XML datasets. This new
functionality is all based on .NET framework methods. What's interesting is that execution of your query is actually deferred until you reference
the result set itself, such as when binding it to a control or assigning a data value to a variable.
Yag showed an example of directly reading an IIS log file this way (he even created his own Browse method so that he could easily list the results
during his demos, influenced, undoubtably, by it's FoxPro twin.
When you want to access SQL Server data in .NET today, you have to jump through a lot of hoops: create a SQL connection, setup the SQL command,
add paramters, use DataReader to execute the command, get a non-stongly-type object back and addressed the data elements ordinally. There's also no
compile time checks.
Using DLinq, which is a very lightweight wrapper over your relational model, accessing SQL Server data can be as simple as:
Essentially, tables and rows are mapped to classes and objects; this builds on existing ADO.NET and .NET Transactions methods.
XLinq brings powerful XML data manipulation capabilities to CLR languages; it has the expressive power of XPath/XQuery, but with C#/VB as the
programming language:
Also:
is the same as:
The LINQ Project is a huge step in the right direction for the CLR languages; data manipulation made easier than ever.
Build Your Own Reference Database for a Service-Oriented Architecture
by Dave Bernard
When you have a hammer, everything looks like a nail
A Service-Oriented Architecture is an integration architecture approach. The Reference Service is the enterprise SOA service for providing master lists needed by multiple applications, such as people, organizations, customers, products, statuses, geographic information, etc. It defines a way of organizing software assets on a network, providing business users with high-level business services that are easily called into and integrated with other business processes. Reference is a universal information tool, integrating OLTP, OLAP, Analysis Services and data warehouses with each other. Jonathan believes that we need partially-automated master data translation and routing and connecting, along with a master system that holds the gold standard, the list of master lists, to integrate systems. Reference is a single system to lookup, validate and reference values for many other databases; a compiled enterprise dictionary and thesaurus with cross-references. It is based on a central, physical repository of information in database form. Reference tables need to be highly normalized and extensible. It is an important part of the overall SOA approach to integrating applications. Like many other significant endeavors, this one takes time, patience and management buy-in at the highest levels. It should be approached incrementally and be allowed to grow organically. You will be using all of your BU resources (your data warehouse will have most of your Reference Service elements already defined). Make your foundation strong and small in the beginning. Jonathan spent some time going over his experience in creating a Reference Service at his company, how the schema was architected, how the tables were populated, etc. He estimates that a robust implementation might cost 2-3 man-years of effort. Thanks for a good overview of this up and coming technology, Jonathan.
Managing Unstructured and Semi-Structured Data in SQL Server 2005 by Dave Bernard
Unstructured data is just that: data of which the data engine has no intrinsic understanding of semantics or internal structure. Examples are Word documents, images and compiled binaries. Semi-structured data may lack an explicit schema, but is usually accompanied by some sort of meta information that describes it in some way. The structure may even vary from instance to instance and over time. SQL Server 2000 provided very good capabilities for handling this kind of data, but SQL Server 2005 goes further, especially when it come to handling XML data. Full text support is improved and character and binary types support up to 2GB of data, up from 8000 bytes in SQL Server 2000. Of course, support for the CLR in SQL Server 2005 allows for all sorts of robust embedded data handling against non-structured text. Michael interspersed his talk with loads of actual demonstrations running DDL code. He used the OpenRowSet (BULK...) to load files from disk directly into a BLOB varchar column. Very cool! Full Text searching in SQL Server 2005 is now about 50% faster; index creation when repopulating catalogs is now about 600% faster! And it scales up to 2 billion documents per catalog (does the Library of Congress have that many documents?) New DDL statements have been added (CREATE FULLTEXT INDEX, native backup/restore/recovery/attach/detach), too. It looks like Full Text searching is finally a full-fledged citizen of SQL Server. XML data support has been greatly improved; it is now supported as a native SQL type. It is queryable with XQuery, updateable with XML-DML and provides automated checks for well-formed XML, as well as automatic validation checks. You can create schema collections, stored in the system metadata, that allow you to constrain XML documents, that is, you can guarantee the "shape" of your XML data. There is a lot more to this than meets the eye. You can even index XML data to speed XQuery operations! Even more, you can combine XQuery with Full Text searching for even more powerful searching. I think this is going to lead to very interesting applications in handling large document repositories such as medical abstracts and articles, which are often distributed in XML form but, until now, require translation into a pure text format before effective full text techniques can be brought to bear. Microsoft has obviously spent considerable effort in improving both Full Text operations and XML data support; taken together, handling of non-structured data has been moved forward in a big way in SQL Server 2005.
A SQL Server DBA’s Guide to CLR Integration by Dan LeClair
One of the basic fundamentals of CLR assemblies in SQL is that the compiled CLR code (and, optionally, the source code) is stored within the database itself, rather than as separate, externally-called classes. This gives it portability, as it travels with a database across backups and detachments. However, it means that common assemblies need to be integrated in each database that needs them. They will get written out in database scripts, but that makes the script hard to read, as all you see is hexadecimal code. Dr. Low was very clear on his beliefs as to the appropriate places for CLR assemblies, and spent some time in his presentation describing how it can be used and where the greatest benefits are. The bottom line seems to be this: use CLR for tasks that T-SQL is not good at, such as string manipulation, cryptography, file access, and so forth. For data-centric operations, stick with T-SQL. User-defined functions seem to be very good candidates for building as CLR assemblies, as well as defining your own data types and aggregates. Stored procedures tend to be less so, as they usually are processing data. Triggers can also be written using CLR, but again, you need careful thought as to what the gains can be. In short, his message was to curb the enthusiasm for CLR. Lots of interesting things can now be done but give some thought as to the appropriateness – are they things that make sense to have in your database?
Fine Tuning Performance with Plan Guides by Dan LeClair The sessions in this year’s PASS Summit are divided into different tracks, and this session fell into the “Enterprise Database Administration and Deployment” track (in fact, all of the ones I attended did). Unfortunately, I was a bit out of my element in this particular presentation. Not that I didn’t get anything out of it, mind you, but as a database developer, some of what was presented was a little obscure. But for the DBA’s in the crowd – and this session was in the big hall – it seemed to really be of interest. SQL 2005 Plan Guides allow the DBA to manually fine-tune the execution plan via query hints for a specific stored proc, overriding the default execution plan and (hopefully) speeding up code performance. Normally, in the course of code development, the DBA or developer would be responsible for testing and tuning queries through code changes, index changes, and so forth, and Plan Guides are not intended to supplant this responsibility. Rather, they are intended for circumstances where queries are not under local control – such as the case where they are written by an offsite partner or ISV not willing (or not able) to change slow-performing code. What was obvious to me was that this is not something to approach lightly. First, there are three different plan guides, each with very specific requirements regarding code. The SQL plan guide, which offers the most control of query hints, can only operate against queries that match what it was designed against byte-for-byte – including casing, white space, and even carriage-returns and line-feeds. The Object plan guide is a little less restrictive, as the case for key words and white space doesn’t matter. The least restrictive – Template – uses the same case and matching rules as Object, but only supports two levels of parameterization, simple and forced, when developing. Most of this information was presented at the beginning of the session. Keith then went into a number of different samples of how to implement the plan guides, and the outcome of the guides. Again, the DBA’s in the crowd really seemed to get into what was being shown. For me, it was enough to find out that this tool is available for situations where I don’t have control over query code.
Index Enhancements in SQL 2005 by Dan LeClair
SQL 2005 offers some really incredible enhancements for indexes that should prove to be very useful. A lot of ground was covered in this session, and I’m looking forward to experimenting with what I learned. The enhancements fall into one of three general categories: availability of indexes, manageability of index metadata, and performance improvements. First, indexes can now be specifically disabled and enabled, rather than having to be dropped and recreated. This is a big improvement for situations like system upgrades and bulk data loads, as the operator or application doing the work does not have to determine what the index syntax is beforehand. Indexes can also now be built on line, rather than having to lock the table for the duration of their construction. There are also a lot of changes regarding index metadata. These changes involve better index analysis (such as showing usage rates and fragmentation), reorganization (using ALTER INDEX…REORGANIZE instead of DBCC INDEXDEFRAG), and measuring performance statistics. New system stored procedures are available that deliver table-based result sets, meaning you can only return the columns you are specifically interested in; however, Kalen warned about some gotchas when using a couple of these. One thing that was only briefly mentioned was that the Index Tuning Wizard has been tossed, and replaced by Database Tuning Advisor (a much more capable and complete application), but unfortunately not much more was covered about it. Guess I’ll have to look that one up myself. A lot more was covered, but I can’t finish this without mentioning the new ability to include columns in indexes without actually adding them to a key. Basically this gives you the ability to get around the 900K / 16 column key limitation for better covering indexes, and hopefully better performance. The caveat here, though, is that it is now possible to build indexes as wide as the table itself, which wouldn’t really buy you much in performance.
Wednesday Vendor Exhibits and Random Thoughts
In another demo, Steve prepared a CE application by choosing a pre-built SDF file as the data source (if can now be created in VS 2005 or SQL 2005 (via SSI) rather than the arduous process required before); for SDF support, you need to add certain DLL's to your project for deployment purposes (set them to Copy if Newer). Phillip quickly covered the merge replication aspects of maintaining offline data sets, contrasting it with the simpler (and weaker) RDA approach. If you're going to do it at all, go ahead with merge replication so that you can take advantage of its full-featured function set as your application evolves. New in SQL Server 2005 are several useful features:
Ad-Hoc Reporting with Report Builder by Dave Bernard The fact that a good reporting tool is of paramount importance to many attendees and their companies was underscored by the packed room for this session. And it was surely the kind of session where you leave convinced that you got your money's worth. Carolyn's rapid fire but smoothly flowing presentation kept everyone wide awake in that difficult after-lunch time slot. Carolyn's session was ostensibly covering ad-hoc reporting, but it also served as an effective introduction to the reporting capabilities and enhancements now available with Visual Studio 2005. Report Builder is a tool aimed squarely at end-user needs, rather than those of the developer, who will be more attracted to the full-featured Reporting Services module; it does an effective job at hiding the complexity of the data source's physical layer and provides much more of a business model approach to defining reports. Report Builder is not a replacement for OLAP tools or pivot table approaches. It is being delivered with SQL Server 2005 and is fully integrated into Reporting Services and, like so many other parts of the new offerings, it is deployed as a ClickOnce application (requiring the 2.0 version of the .NET Framework.
Filter conditions are easily applied, too. One really cool aspect of adding filters is that Report Builder can render an English-like description of the defined filter that can then be displayed on the report. Very nice. Reports can be saved (in RDL format) to be run later. They can be exported to all kinds of file types (XML, TIFF, etc.) You can view the generated SQL and RDL meta data if you want. It's surprising how complex some of the generated SQL calls are. Infinite Drill (only available in the Enterprise edition) automatially creates drill-down links on the report in cells with appropriate RI relationships. For example, if you render an Invoice History report, links that drill down into the Invoice detail will be created, and there may be links on that report that go further; this allows users to explore data along model navigation paths. If a link makes sense, it will be there. Users can keep drilling as long as there is a path to follow from the current item. Drill-throughs can be tuned by the developer to include more or fewer columns, too. The Report Model wizard walks you through the creation of a Report Builder model from your data schema; the only real restriction is that stored procedures are not supported, which I find a little surprising. The wizard automatically breaks dates down into parts (day, month, year) and automatically creates sensible aggregate fields. Once a model is created, it serves as the basis for your ad-hoc reports. Large models can be presented in more simplified forms by dividing them into "perspectives". For example, put Customers and Orders in a perspective, leaving out Invoices, General Ledger, etc. Thanks, Carolyn, for a high-quality, quick-paced session.
Best Practices for Structured SQL Server Development by Dave Bernard Adam, a SQL Server MVP and self-described "Total SQL Server Geek", hosted a full session (and in a big room, to boot). This guy obviously has quite a following. Adam is a fellow who believes that it is a mistake to trust your application and the data it delivers -- and he's right. To begin to trust our applications, we need to structure our applications so that results can be consistently reproduced. So how do we do that? Adam says you have to (1) handle exceptions, (2) unit test and (3) debug properly. Adam started ot by illustrating worst practices in implementing a stored procedure "API". Stored procedures should be thought of just like any other interfaces: black-boxes with well-defined inputs and outputs, which is not the way most developers regard stored procedures. It means things like:
CREATE PROCEDURE myGreatProc @sql NVARCHAR(2000) AS BEGIN SET NOCOUNT ON EXEC sp_ExecuteSQL @sql ENDThe result is, according to Adam, "a world of pristine beauty." As long as your application knows as little as possible about your database system, and as long as you keep those interfaces intact, you can do anything you want in that database (fix bad data, modify the schema, re-tune your stored procedures); no one will notice because the inputs and outputs have remained unchanged. What you have is a beautiful, totally decoupled application. Ah, nirvana by way of a level of abstraction. Handle, er, expect, exceptions. Be aware that there are three types of runtime exceptions:
BEGIN TRY SELECT convert(int, 'xyz') END TRY BEGIN CATCH PRINT ERROR_MESSAGE() PRINT ERROR_NUMBER() PRINT ERROR_SEVERITY END CATCH GOTRY/CATCH allows you not only to handle an error, but to do something smart in response. XACT_STATE() can be queried to determine the current transaction state; -1 means "must rollback", 1 means "commit or rollback", 0 otherwise. ROLLBACK is your friend. Use RAISERROR to handle things like constraint violations (WHERE NOT EXISTS fails, e.g.), logic violations (an unexpected value in a variable) and bad or missing passed parameters. Of course, there's always the possibility of unexpected exceptions; the first defense is to code well so they don’t occur (test for divide-by-zero, e.g.) and then test, test, test. You can never test enough. Make sure that, when debugging, you're trying to discover bugs and not just trying to get the code to compile. Debugging TSQL is tricky; a lot of PRINT and SELECT statements are typically employed. Adam made us aware of a couple of neat open source tools that will help a lot: TSQLMacro and TSQLAssert. TSQLMacro provides a TSQL macro expansion capability, allowing the developer to centralize common pieces of code and, even better, automate the inclusion of test and debug code. TSQLAssert uses TSQLMacro's power to provide an assertion framework for stored procedures. Very useful! Great session, Adam.
Writing .Net Code in SQL Server 2005 by Dan LeClair
Writing assemblies for SQL starts with creating a new SQL Server project. Five new project templates have been introduced for SQL 2005 in VS.Net 2005, allowing you to create user-defined aggregates (UDA), user-defined functions (UDF), stored procedures, triggers, and user-defined types (UDT). There are some common properties for all of them, including one for a connection string to the SQL server, one for assembly signing, and one for setting the security level of the assembly. Assembly security levels include SAFE (no external or unsafe calls), EXTERNAL ACCESS (allowing access to files outside of the SQL operating system), and UNSAFE (allowing PInvoke or other “unsafe” calls). I should mention here that technically, all assemblies operate safely within the SQL operating system. If an assembly does something bad, SQL has control of it and will shut the thread down before it is allowed to crash the system. There may be perfectly valid reasons to allow an UNSAFE security level; however, it is something that should be approached with caution. Before going on to present examples, Peter spent some time going over the differences between SQL Server data types, Sqltypes (in SQL Server projects), and regular .Net variable types. For example, the various different character types within SQL Server (Char, Varchar, Nchar, Nvarchar, NText, and Text) are represented by the Sqltypes SqlString and SqlChar, and as String and Char() within .Net. You will have to write your SQL assemblies a little different than regular .Net code because of this. The meat of the session, of course, were examples of the different types of SQL objects that can be written. Space doesn’t allow me to cover them all in detail; however, this small example of a UDF that encrypts a password shows some of the important points of writing SQL objects in CLR:
using System;
using System.Data.Sql;
using System.Data.SqlTypes;
using System.Text;
using System.Security.Cryptography;
public struct MySecurity
{
[SqlFunction]
public static SqlBinary EncryptPassword(SqlString password)
{
UnicodeEncoding uniEncoding = new UnicodeEncoding();
byte[] passBytes = uniEncoding.GetBytes((string)password);
SHA1 sha = new SHA1CryptoServiceProvider();
byte[] passHash = sha.ComputeHash(passBytes);
return (SqlBinary) passHash;
}
}
Two things to note here: one is the new namespace, System.Data.SqlTypes, and the second is the
project identifier (SqlUserDefinedAggregate, SqlFunction, and so forth). Of the five project template, only UDT’s and UDA’s can only be done in CLR; the other three can also be done in T-SQL. Just as Greg Low did yesterday, Peter suggested using some thought before launching into massive CLR assembly construction. UDT’s, for example, should be small and scalar – they are not meant to represent large things like customers, but smaller things like currency (with descriptor for dollar, euro, and so forth) CLR is intended to extend T-SQL – not replace it. For data-centric operations, T-SQL (being native to SQL Server) will generally be faster. A lot of ground was covered in this session, but afterwards I felt as if I had a much better understanding of the uses of CLR and how to construct assemblies. Definitely worth going to.
Developing Enterprise Data Integration Components by Dan LeClair
One of the prerequisites for this session was previous experience with Integration Services, but even though I haven’t done much but play around with it, the session was generally easy to follow. One of the main points of the presentation was that custom task writing is much easier to do with .Net than it is with scripts. In fact, Kirk constructed, registered, and instantiated one in under a minute (drafting an attendee with a watch to time him). Okay, it was the classic “hello world” example, but enough to demonstrate his point. There are a number of common requirements to writing custom tasks, including the following:
Custom tasks are based on the Task class and have a number of attributes that should be set. One of the more interesting ones is “TaskContact” which is basically a “who wrote me” line: TaskContact = "Expression Task;Microsoft Corporation; Microsoft SQL Server v9; © 2004 Microsoft Corporation; All Rights Reserved; http://www.microsoft.com/sql/support/default.asp;1"),However, it also doubles as a placeholder attribute in case a required component is missing on the server the task in installed on. Kirk didn’t just demo a “hello world” task, he had deeper, more useful demos to show as well. He also discussed custom task best practices – provide progress feedback, even if it is just “started” and “finished”, avoid collections if possible, tasks should only have one function, and others. He also explained some of the inner workings of custom tasks, such as synchronous vs. asynchronous buffers, and went over the pipeline component and how it works. Lastly he finished up with the expression task and how it works. Whew! A lot of material was presented in this session, but it was worth attending.
Advanced DDL Tricks by Dan LeClair
As much as I hate criticizing a fellow Texan, though, this session was disappointing. Instead of advanced DDL (data definition language) tricks, it turned out to be “Joe Celko’s Opinions on Proper Schema Construction.” Granted, it was probably new information to some folks, but some of what was covered I regard as basic table design rules, some points I disagree with, and some comments I frankly felt insulted by. Celko’s opening point was that most bad SQL queries are the result of bad schema design. That’s a pretty broad statement, but one I might agree with under many circumstances. One particular point he brought up is that there are some standard naming conventions for column names that should be used when possible. ISO-11179 was one recommended standard, but generally accepted industry terminology should also be considered. Names should follow a noun-adjective pattern and columns should be named for what they really are. Constraints should be named descriptively, and used more frequently than most people use them. However, some unfortunate generalizations were made. For example, null-tolerant columns are not on his most-favored-habit list. In the case of string and numeric columns, I would agree – better to have specified default values to prevent issues with nulls in calculations and searches. But, since an empty value is not valid in a date column, how else do you indicate the absence of a value, other than with a null? Same with foreign key columns. Another surprising declaration was that use of the IDENTITY column value for primary keys is bad. I was not the only one to do a double-take on this, as it is (at least in my view) one of the easiest ways to implement surrogate keys in SQL Server. Celko’s beef with them is a lack of validation and verification (I didn’t really get his point on that) and that, with cheap disk space and powerful processors available today, there isn’t a really good reason to use them. Apparently he prefers to use natural keys as primary keys, including multi-column natural keys, regardless of the amount of space they consume in both table and index width. But what really hacked me was when he accused people that use IDENTITY columns of being the same ones who write bad code and use a lot of resource-consuming cursors in their queries. At that point, I almost walked out of the session. Lastly, he had some unkind words about people that confuse “records” with “rows” and “fields” with “columns”. Huh? Sorry, Ashton-Tate and Microsoft used the terms “record” and “field” for years, and the habits of old dBase and FoxPro developers die hard. I’ve worked with SQL for eight years now, and still mix the terms up. Joe, if that makes me less of a person in your eyes – too bad. It also cost you a fan, but with 750 articles to your name I suppose you don’t really care, do you? Friday, September 30
What's Next for Database? by Dave Bernard
Jim characterizes the current and future state of the database world as "data avalanche", with a need to integrate all kinds of data. We can capture and store everything now (no need to throw anything away, but people need to know "what's interesting?" and "what can you tell me about X?" Jim believes that, just as punch cards were the "stone tablets" of their day (not updateable), the day is coming where databases will only allows inserts and reads, without allowing any deletes or updates. This is a reasonable view given the price/performance improvement curve we have seen and will continue to see. Now and in the future, most of the data ("everything" currently amounts to about 15 exabytes) exists at the edge of the network: PDA's, cameras, phones, etc., and all of these devices want and need to share data and all will need databases. Jim thinks that an entirely new "perfect system" paradigm needs to be defined, one that:
I think the most controversial idea that Jim promotes is his vision of the DBMS as a huge web service: put everything inside the database! That means no web server or application server; drive everything from the database engine. I think this is, well, naive. Another smart fellow warned against the overarching goal of universal simplification ("Make everything as simple as possible, but not simpler." -- Albert Einstein); it ignores both history and reality. History has shown that the wants and needs of data consumer will preempt any attempt at universal simplification. Real worlds demands are too fast-changing, complex and diverse for any one-size-fits-all strategy. The law of unintended consequences trumps any attempt to simplify complex systems. You can't define the universe with a simple statement. Given that, Jim's presentation provided great context for where we've been and where we're going, and much food for thought on how best to prepare for the future.
Microsoft Round Table Discussion I was invited to participate in one six Round Table Discussions that Microsoft was leading during the conference; small groups of 12-15 people were involved in each, led by a couple of Microsoft representatives. I've never really participated in something like this before, so I didn't know what to expect.
After this, we were asked to enumerate our three most important job responsibilities, and then detail the tasks that go with each. For each task, we were scorecard such things as the number of hours we spent on each task in a given time period, and the importance and significance levels we attached to each task. Finally, we mapped these tasks to where they fit in the various stages of the product life cycle. We were then led into a discussion of "pain points", and lively discussion ensued for the rest of session. Patch management was definitely a hot button topic; many believed that the current methodology was risky and almost everyone relied on multiple non-Microsoft sources to corroborate and vet patches before actually applying them. One person suggested applying the SQL Server 2005 Upgrade Advisor approach to patches, taking a more proactive approach to determining ahead of time potential mine fields and missing dependencies. One point surprising to me was the near unanimous opinion that, at least in the SQL Server space, Microsoft gets security right, especially compared with many competitors and third party vendors. Other good ideas were put on the table, which are, even now, winging their way to Redmond.
Keep listening Microsoft, but it's time to drop the all-things-to-all-people-in-one-package approach. It promises everything, but satisfies no one.
Data Modeling Key Decisions – Pros and Cons by Dan LeClair
To start, here is a quote direct from his session: Information is an organizational asset, and, according to its value and scope, must be organized, inventoried, secured, and made readily available in a usable format for daily operations and analysis by individuals, groups, and processes, both today and in the future. With that as a base, he then presented some basic ideas about data modeling, including his “optimization theory” triangle (with good schema design being the basis upon which other optimization techniques rely on); how to evaluate model designs; an overview of the modeling process, and an introduction to various data models themselves. A large portion of the presentation was devoted to discussing a relatively new idea: the O/R DBMS, or object-oriented relational database. Definitely not your father’s standard relational model! The O/R model, he claimed, supported the best of both worlds: data integrity, standard SQL set-based results, and performance from the relational side, and inheritance, polymorphism, and strong app development from the OO side. There were a number of skeptics in the audience, including me, and I must confess that until I looked at the completed database (the code for which being on his USB drive, which he very trustingly passed around) it didn’t all quite come together. But, after seeing the actual tables and his description of how it all worked, I would be interested in giving it a try. It should be noted here, though, that when asked by one of the attendees if he had actually put this into a production system, he said no, but I’d be willing to bet we see this theory in practice soon. He then went on to review some other data-modeling related concepts, including normalization, entity development, primary keys, optional data (such as the use of nulls), data quality, and the applications of standards in model construction. It was very interesting to compare what he presented with Celko’s opinions from the afternoon before. All in all, a very interesting session, and one which gave me many good ideas.
The 15 Need-to-Know Points About XML by Dan LeClair
So, what are the 15 need-to-know points? The first two – what XML is and is not – sort of summed up the session for me:
With an intro like that, I had to pay attention to the rest of the session, despite lunch. He then went on to discuss the primary storage mechanism for XML in SQL, which is the XML data type. It is a little different than regular XML, in that it really only stores fragments; SQL requires well-formed XML but a lot of the other rules are not enforced. Not every part of an xml document is retained – prologue, insignificant white space, for example, get stripped out. In short, XML you put in may not be the XML you get out of SQL. This led to a discussion of some of the other problems of XML in SQL. As you might guess, your Storage structure affects performance, and if you do DML against XML it will be slower than against regular scalar row info. This can be alleviated somewhat by XML indexing (which he discussed later). It is also very easy to avoid appropriate normalization – which encourages what he called “cowboy programming”. If you are in a heavy OLTP environment, he recommended NOT storing data in xml format, but if you have to, be aware of the costs and how to minimize them. Some of the other “15 things” he covered included a review of XML Schemas, how to use schema collections, how XML indexing works, the XML type methods, and alternatives to storing XML. It sounds like a lot, and in fact, it was, and Kent admitted that he was tinkering with his presentation up to the last minute to try and cover as much as possible. I felt it was a great introduction to a complex subject.
Analysis Services 2005 Database Design by Dan LeClair
The session centered around UDM – the Unified Dimensional Model approach to cube design, which is what the Business Intelligence Development Studio is based on. UDM uses attributes, instead of hierarchies, to construct dimensions, and depends on a key attribute upon which all others are based. He used the example of geography to explain this, asking questions like it is an attribute of the customer, should it be a separate dimension, will other sales offices be concerned with it, and so forth. Something to consider when using UDM is excess “cube space”. When building dimensions, it is possible (depending on the source data) to create space between dimensions where there no “tuples”, or pairings of data. For example, if a sweater company offers red sweaters in “larger” and “medium” but not “small”, adding “small” as an attribute will create cube space as there is no combination of “small, red” sweaters. Sounds pretty basic, but this can be an issue when designing cubes. Not having strong data warehousing experience, some of the presentation was lost on me. My own fault for randomly sitting in on the session, because modeling analysis cubes is really different than modeling OLTP systems. But there were several points he made that even a cube-novice like me could understand: you always need to think about where the application will change and expand; be careful in the development of your attribute relationships (particularly time, since it can be subdivided so many ways); rules are made to be broken, but only do it with forethought and intent. Dan was an entertaining speaker on a complicated subject, and it was pretty clear he had a lot of passion regarding cube design. A lot of what he presented I can’t immediately use, but you can be sure his session notes will be on my hard drive for a while.
It's a Wrap...
I applaud the choice of venue; the Gaylord Texan Resort was an outstanding host. This is the best conference venue I've been to in recent memory. A friend of mine who once owned a training company once told me that "training is free", that is, you almost always get back much more than your investment, And SQL PASS is much more than "training"; it's access to experts, it's commiserating with people you'd otherwise never meet, it's an opportunity to focus on extending your knowledge and capabilities and it's just plain fun, too. I'll definitely add SQL PASS to my annual list of must-see tech.
Interview of Speakers at PASS 2005
Jean-René Roy has interviewed key speakers at PASS 2005. The question was: "What features in SQL Server 2005 give you a good reason to upgrade?"
This is the list of speakers which have been interviewed:
Brian Knight, SQL Server Central
West Brown
Rick Heiges, Scalability Experts
Your reporting team
62 Rue Doucet, Petit-Rocher, New Brunswick, E8J 1L3 Telephone: 1-506-783-9007 Email: mfournier@levelextreme.com |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||