It's still winter here in Central Pennsylvania. I wish I could say I'm out in the snow with my family or something similar. But the truth is that I'm very busy with client deliverables. I've been teaching and/or mentoring on SQL Server application development and business intelligence development for over seven years, and all along, I've been doing client work simultaneously. I refuse to mentor people unless I also have serious “skin in the game.”
I've thought seriously about what skills a SQL Server developer should have. There are many different types of database applications, and this article presents the skills I've seen that are common to success across all of them.
What's on the Menu?
Here are 13 items (rules, recommendations, or talking points) that I stress to people who are in the general SQL Server application/business intelligence developer discipline:
Item #1: Good Transact-SQL Skills are IMPERATIVE
Years ago, I knew an IT manager who said, “There are red rules and blue rules. The red rules you follow without fail/without exception, and the blue rules you follow as much as you can.” Those are perfect words, and I'll use them here: An absolute red rule in business intelligence development is to have strong chops when it comes to writing bare-bones T-SQL code.
SQL/BI developers write code all over the place. Here are some spots:
- When profiling/discovering patterns in client data. This can include anything from finding relationships to establishing granularity to counting the number of rows that don't fit a standard profile or business rule.
- When building views for reports
- When supplementing SSIS packages with custom T-SQL code or stored procedures
- When attempting to process incoming new rows and incoming changed rows
- When preparing data into summarized forms (for data warehousing or OLAP cubes)
- When retrieving data for reports
These activities require far more than just knowing the difference between an INNER and OUTER JOIN or how to aggregate data with a GROUP BY, although those skills are certainly critical. Here are some significant mistakes that can be made along the way:
- Abusing the UNION statement in a way that kills performance
- Not realizing that certain techniques aren't optimizable (i.e., they prevent the query optimizer from taking full advantage of indexes)
- Not understanding the proper use of temporary tables and table variables
- Using cursors or other row-based approaches without considering set-based approaches
On my website, there's a category for BI/SQL development with several blog posts on some of these SQL programming “anti-patterns” and how to improve on them. Examples include using more efficient table types, passing tables as parameters instead of using cursors, and taking advantage of the often-underutilized UNPIVOT operator instead of abusing UNION.
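To illustrate one of these anti-patterns, here's a minimal sketch, using a hypothetical Sales table with one column per quarter, of the kind of UNION abuse described above alongside the UNPIVOT equivalent:

```sql
-- Hypothetical table: Sales(ProductID INT, Q1 MONEY, Q2 MONEY, Q3 MONEY, Q4 MONEY)

-- The UNION approach scans the table four times:
SELECT ProductID, 'Q1' AS QuarterName, Q1 AS Amount FROM Sales
UNION ALL
SELECT ProductID, 'Q2', Q2 FROM Sales
UNION ALL
SELECT ProductID, 'Q3', Q3 FROM Sales
UNION ALL
SELECT ProductID, 'Q4', Q4 FROM Sales;

-- UNPIVOT produces the same rows in a single scan:
SELECT ProductID, QuarterName, Amount
FROM Sales
UNPIVOT (Amount FOR QuarterName IN (Q1, Q2, Q3, Q4)) AS u;
```

One behavioral difference to keep in mind: UNPIVOT silently drops rows where the unpivoted value is NULL, whereas the UNION ALL version keeps them.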
Recently, I was trying to solve a problem but hit a mental block. Rather than struggle for hours, I asked a co-worker for input. Within a few minutes, he made some observations that gave me the light-bulb moment I needed. So don't hesitate to ask a colleague for insight.
Item #2: Data Profiling Skills and Finding Patterns in Client Data
Recently I had to integrate data from three different sources for a client application. I had to implement allocations in one set of tables based on ratios of data in two different sets of sources.
Because this information was spread across three different systems, I had to first make sure that I could cleanly relate the data. Not surprisingly, I found that not all of the data linked cleanly; about 20% didn't match because of misspellings or differing rules for implementing composite keys across systems. The process of writing discovery queries - or profiling - took longer than the actual allocation procedure itself. This is not at all uncommon.
Here's another scenario I often face: a new client with a database that isn't documented, doesn't contain any relationship diagrams, and doesn't have primary key/foreign key constraints.
Imagine walking into a room of thousands of people with the goal of finding family members, and having to ask each one, “Are you related to this person?” Well, that's another form of data profiling. I've written countless discovery queries to find relationships, and to find exceptions that the client should know about. There are many examples, but a common scenario is taking an excerpt of a hard-copy report from a client and trying to reproduce the results with the source data, where no source code or even data diagram exists. During the process, I'll find out what fits and what doesn't. Sometimes they'll tell me they were already aware of the exceptions, and sometimes they weren't aware at all!
As I said in Item #1, writing T-SQL code to profile this data and find relationships (and report on missing relationships) is a must!
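A typical pair of discovery queries might look like this sketch. The table and column names are hypothetical; the pattern is a LEFT JOIN or NOT EXISTS probe for rows with no clean match in the other system:

```sql
-- Which rows in System A have no clean match in System B?
SELECT a.CustomerID, a.CustomerName
FROM SystemA.dbo.Customer AS a
LEFT JOIN SystemB.dbo.Customer AS b
       ON b.CustomerName = a.CustomerName
WHERE b.CustomerName IS NULL;

-- How big is the mismatch, as a percentage of System A's rows?
SELECT COUNT(*) AS UnmatchedRows,
       100.0 * COUNT(*) /
           (SELECT COUNT(*) FROM SystemA.dbo.Customer) AS PctUnmatched
FROM SystemA.dbo.Customer AS a
WHERE NOT EXISTS (SELECT 1
                  FROM SystemB.dbo.Customer AS b
                  WHERE b.CustomerName = a.CustomerName);
```

The second query is the kind of number you can put in front of a client: "20% of your customers don't link across systems, and here's why."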
Item #3: Study What the Legacy Systems Do
The author Mark Twain once wrote, “Do something every day that you don't want to do; this is the golden rule of acquiring the habit of doing your duty without pain.” So true! Here's a habit to get into: if you are rebuilding an application and improving on a legacy system, study every last piece of detail, every nook and cranny in the legacy system.
Yes, that can seem incredibly boring, tedious, and even mentally painful. It's my least favorite task, but it's also one of the most critical, for several reasons.
First, you have to understand how the current system works. Clients might find it disrespectful if you're proposing significant changes and you haven't fully explored the functionality of the current system. Second, if the legacy system isn't well-documented, the only way to establish the specifics or nuances of its business rules is by reading the old code. Even if the company was unwise to neglect its documentation, you'll still want and need to define these items going forward.
Item #4: Understand Business Application Requirements and Build Walkthrough Scenarios
For many years, there's been an argument about the business domain knowledge that a software developer should possess. One side argues that software developers should be heads-down writing code from specifications. They don't need to know the business side of things. Their managers (in a few cases) even discourage the developers from learning the business.
I have zero tolerance for this viewpoint. Developers ABSOLUTELY should know the business. This is true for OLTP production applications and is especially true of business intelligence applications. If you're wondering why I'm saying this so boldly, remember that developers build applications to serve business needs. You wouldn't serve a need where you don't understand the context, any more than a doctor can heal without understanding what is wrong with the patient. This is also accentuated by the changing nature of the business. Waterfall methodologies have given way to more iterative approaches, which means that developers have to be even more attuned to business applications.
At all times in a new application, you'll want to remember what the end goal is. That requires understanding what the business wants. Every component in the application is a stepping stone toward that.
Remember the old CRC approach? CRC stands for Classes, Responsibilities, and Collaborators. It can be as simple as writing out a narrative or use case, and identifying all the nouns, verbs, and relationships. Well, it recently came in handy for me. I had to build a model where users could look at volume produced on a production floor by product line, and also wanted to look at the volume of products that required reworking, by department. A single department encompassed multiple product lines. So I wrote a series of use-case statements, focusing on the nouns and verbs. It sounds almost too academic, but it's helpful to fill in the blanks.
To an experienced designer who also knows about the company data, the scenario wound up requiring a many-to-many (M2M) pattern/relationship between product lines and reporting departments.
In this situation, I created a database model with two fact tables (one for volume produced and a second for volume reworked), two dimensions (material lines and reporting units), and a bridge table that stored the M2M relationship between lines and units. This was one piece within a larger application context, but was sufficiently self-contained to the point where everyone could visualize the end result. I couldn't have done that without understanding the data, and a traditional waterfall approach would likely have taken much longer.
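As a sketch of the shape of that model (with hypothetical, simplified names, and the date dimension omitted for brevity), the bridge table is simply the pairing of line and unit keys:

```sql
CREATE TABLE DimProductLine (
    ProductLineKey   INT PRIMARY KEY,
    ProductLineName  VARCHAR(50)
);
CREATE TABLE DimReportingUnit (
    ReportingUnitKey  INT PRIMARY KEY,
    ReportingUnitName VARCHAR(50)
);

-- The bridge stores the many-to-many relationship between lines and units:
CREATE TABLE BridgeLineUnit (
    ProductLineKey   INT REFERENCES DimProductLine (ProductLineKey),
    ReportingUnitKey INT REFERENCES DimReportingUnit (ReportingUnitKey),
    PRIMARY KEY (ProductLineKey, ReportingUnitKey)
);

CREATE TABLE FactVolumeProduced (
    ProductLineKey INT REFERENCES DimProductLine (ProductLineKey),
    DateKey        INT,
    Volume         INT
);
CREATE TABLE FactVolumeReworked (
    ProductLineKey INT REFERENCES DimProductLine (ProductLineKey),
    DateKey        INT,
    VolumeReworked INT
);

-- Rework volume by department, resolved through the bridge:
SELECT u.ReportingUnitName, SUM(f.VolumeReworked) AS TotalReworked
FROM FactVolumeReworked AS f
JOIN BridgeLineUnit     AS b ON b.ProductLineKey   = f.ProductLineKey
JOIN DimReportingUnit   AS u ON u.ReportingUnitKey = b.ReportingUnitKey
GROUP BY u.ReportingUnitName;
```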
Most developers have faced challenges like this, where all the specifics are different. But there are approaches that can be taken across many of these. It starts with writing a well-defined scenario, and that can't happen unless you understand the data.
Item #5: Baker's Dozen Spotlight: Shape Data into Kimball Star Schemas for Data Warehousing and Analytic Applications
There are countless compelling reasons for shaping data into star-schema Fact/Dimensional Models. Recently, I had conversations with developers who resisted de-normalizing data into Fact/Dimension structures before building a business intelligence/analytic model. They initially wanted to stay with the standard normalized OLTP database structure, which is optimized for getting data into a database, but certainly not for getting data out of one.
If you work in data warehousing/business intelligence, Ralph Kimball's “The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling” is a must-read.
Back at the client site, fast-forward about three months: after going through the process of re-shaping the data into fact/dimension structures, the developers saw the benefits. A good star-schema model practically tells a story about the business relationships between key metrics and business entities. Additionally, most SQL/BI reporting/analytic tools work optimally with fact/dimension structures.
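To make the "tells a story" point concrete, here's a hypothetical star-schema query (the table and column names are illustrative, not from any client system); the joins from the fact table out to the dimensions read almost like a business sentence:

```sql
-- "Total sales by calendar year and product category"
SELECT d.CalendarYear,
       p.ProductCategory,
       SUM(f.SalesAmount) AS TotalSales
FROM FactSales  AS f
JOIN DimDate    AS d ON d.DateKey    = f.DateKey
JOIN DimProduct AS p ON p.ProductKey = f.ProductKey
GROUP BY d.CalendarYear, p.ProductCategory
ORDER BY d.CalendarYear, p.ProductCategory;
```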
I highly recommend that you incorporate Ralph Kimball's methodology into your processes and always shape data into fact-dimension structures when building a data mart, data warehouse, or analytic business intelligence application. Also, sometimes developers wonder how to put the Kimball methodology into practice for other applications; you can find out more about it from my article about implementing common data warehouse patterns in the March/April 2013 issue of CODE Magazine.
Item #6: Build Meaningful Prototypes that “Show versus Tell”
I've seen many prototypes that had the right intentions but just never got off the ground. Granted, a prototype can't demonstrate all the functionality of the final application, but it must make a sufficiently compelling argument. The other day I heard a great line: “Suppose you were in the middle of building a model or a prototype, and were suddenly fired because the company didn't have faith or confidence in what you were proposing. Now suppose you're the replacement coming in. What would you, as the new person, do to go above and beyond the original effort and prove the concept of the prototype?”
I thought that was a great message. You can have all the vision in the world, but until you transfer that vision to stakeholders, they will never see or understand it. Some people don't think in terms of hypotheticals or can't visualize abstractions, and you have to help them. Others are more visual and can't follow a purely verbal description of the situation. You can help them by showing a real-world example that they can physically try, or that is visually explained. Visual explanations can go a long way. CODE Magazine editor Melanie Spiller has written an excellent blog entry on the various ways that people learn, and I highly recommend it.
Item #7: Sanity-Checks and Vetting Design
I once worked on a team where a developer thought he needed to artificially (and unnecessarily) expand data by a factor of 20 (yes, 20 times) in order to make some relationships work. It was far from a perfect design, but the developer was rushed to produce a result and didn't have the opportunity to seek input. The solution worked, but did not perform well.
Rarely is a design 100% perfect. Talk through your processes and steps with others. Whiteboard it. Or just talk it out yourself; pretend you're doing a presentation on it. Many times I'll talk through a proposed design with someone, and in the middle of describing it, I'll notice them frowning, and I'll realize, “Wait a minute, what I'm describing is too complicated or too convoluted.”
When you talk out a solution with another person, your own awareness of what does and doesn't make sense increases significantly. It's no sin at all to come to that realization – but it IS a sin not to vet your approaches. Yes, some of us are better at this than others, but it's a craft you'll improve over time if you practice it.
Item #8: Understand the Microsoft BI Platform, and Know the Limitations of the SQL/BI Tools
Many times, I've sat on both sides of the interview table for SQL/BI positions. When I interview developers, I sometimes ask questions with the purpose of finding out if a developer knows about shortcomings or deficiencies in a product. For instance, until SQL Server 2012, SSIS didn't handle metadata changes in the data flow pipeline in any kind of elegant fashion. So I describe a specific scenario, with enough clues, to see if the applicant understands the context.
Other times I ask: “If you could sit in front of the SQL Server product team and ask them to change three things in the product, what would they be, and what experience led you to want them?” I'm not looking for a correct answer. I want to find out if the person has worked with the product long enough to uncover shortcomings or deficiencies.
No software product is perfect, and an experienced developer will usually run into these shortcomings. This is, in my view, one of the biggest differences between an experienced developer and an inexperienced one.
Of course, that leads to a question: How can someone gain experience and come off as experienced in an interview, when they don't have actual job experience to begin with?
Here's a tip from my own experience. It goes back decades: I experimented heavily with Turbo-C in 1987 and 1988, although I didn't get an actual job using C until 1989. But because I had banged on it so hard, I'd learned the product's quirks and shortcomings. The interviewer realized I didn't have much production experience, but was impressed that I knew my way around the tool and had discovered areas where I wanted it to work differently.
Item #9: Keep Technical Documentation Up to Date
I'm a big proponent of meaningful documentation. I create topology diagrams and workflows in anything from PowerPoint to Visio, and term glossaries in anything from Word to whatever product the client wants to use. Sometimes these documents reside in SharePoint so that other developers (or savvy business users) can read them.
Meaningful documentation is important. It doesn't do anyone any good if critical information is stored only in the heads of the lead developers on a project. But there's one thing that's worse than no documentation: documentation that's out of date. Inaccurate documentation can send other people down a bad path, which can lead to very bad consequences. Every year, people face bigger pressures to produce; they don't need to waste time. Imagine how angry you'd be if you spent hours of development time based on documentation that hadn't been updated in months or years!
If you accept responsibility (or even volunteer for it) for any technical documentation, give it the attention it deserves. Not just for now, but always.
Item #10: Build Acceptable Naming Conventions and Adhere to Them
I'll freely admit (and co-workers will attest) that I don't always stick to naming standards, even ones that I initially set. There's really no excuse for this. The reason for conventions is to avoid confusion and establish a general method for describing objects.
In a Business Intelligence environment, naming is even more critical, as the applications surface metadata names to end users.
It might seem obviously absurd to call columns “ShipDate” and “ShipDate2” but I've seen it. Don't ever do this! Remember that analogy: If you were fired tomorrow, and your replacement came in and looked at your naming conventions, what would that person realistically (and justifiably) change? If the answer is anything more significant than “nothing,” pretend to fire yourself and be your own replacement. If the name of a column or table is either so esoteric or so cryptic that its function is unidentifiable (like “ShipDate2”), then you know that there's work to do!
Item #11: Know Where “Worlds Collide,” When .NET and SQL Server Come Together
I briefly taught .NET programming and then went on to teach SQL/BI programming for years. There was a topic I always told my students to keep in the back of their heads: that sometimes “worlds collide.” In this instance, .NET applications need to interact with SQL/BI applications. Here are some examples of real-life situations:
- Writing .NET code to access an OLAP database
- Writing a C# script to generate an SSRS report from within an SSIS package
- Launching an SSRS preview inside an ASP.NET webpage
- Calling an external Web service or WCF service from inside an SSIS package
- Launching a SQL Server Agent job from just about anywhere
Over the years, I've written on these topics for CODE Magazine. I encourage people to go back and check out those articles.
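As one concrete example of the last bullet: sp_start_job is the system stored procedure in msdb for starting a SQL Server Agent job, and it can be called from any SQL connection - an SSIS Execute SQL task, a .NET SqlCommand, or an ad hoc query window. The job name below is hypothetical, and the caller needs the appropriate msdb permissions:

```sql
-- Kick off an Agent job by name (returns immediately; the job runs asynchronously):
EXEC msdb.dbo.sp_start_job @job_name = N'Nightly DW Load';
```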
Item #12: Understand What a DBA Goes Through and How You Can Help
Application developers need to understand the pressures that DBAs face. They also need to understand that DBAs are evaluated by different standards than application developers. The best DBAs are those who preside over databases that run nearly (or precisely) 24x7, with no disruptive events. Most DBAs are very conservative and don't want disruptions.
Here are three tips to help you help the DBA:
- When you provide scripts, make sure to test them thoroughly, and more than once. A script that runs correctly the first time but bombs the second time will be of no value to a DBA.
- Know how to use all the tools at your disposal to evaluate SQL code. That includes SQL Profiler, Execution Plans, etc.
- Test any T-SQL code for potential performance issues. Just because code runs in 1.5 seconds doesn't mean there isn't room for improvement in a production scenario where many services are competing for resources.
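For the last two tips, SET STATISTICS IO and SET STATISTICS TIME are built-in session options that report logical reads and CPU/elapsed time for every statement you run - a quick first pass before digging into execution plans. The query under review here is a hypothetical example:

```sql
SET STATISTICS IO ON;    -- reports logical/physical reads per table
SET STATISTICS TIME ON;  -- reports parse/compile and execution CPU/elapsed time

-- Hypothetical query being evaluated:
SELECT OrderID, SUM(LineTotal) AS OrderTotal
FROM dbo.OrderDetail
GROUP BY OrderID;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

High logical reads relative to the number of rows returned is often the first clue that a query isn't using indexes well.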
Item #13: Keep an Eye on What Microsoft Plans for the Next Product Release
For years, new releases of SQL Server were few and far between. SQL Server 2000 had a very long period as the reigning Microsoft database, and many companies continued to use it even after Microsoft released SQL Server 2005. In those days, most developers didn't need to pay close attention to the newest version, since it could take months or even years for companies to upgrade.
But the last three years have been a different story, as Microsoft has released several new versions with compelling reasons to upgrade, with shorter periods between releases than the previous 10 years. For instance:
- Late spring 2010, SQL Server 2008R2 (Reporting Services enhancements, Power Pivot 1.0)
- Early spring 2012, SQL Server 2012 (Columnstore Index, New SSAS Tabular Model, better SSRS integration with SharePoint)
- As of this writing, Microsoft has released the first Technology Preview for SQL Server 2014, which contains OLTP performance enhancements. There is no formal release date yet, though many are hoping for a release in the first half of 2014.
By early 2014, there likely will have been more releases in the last three or four years than there were in the eight years before that.
Since 2007, I've always maintained a virtual machine copy of every version of SQL Server that I was maintaining (production/client work) and a virtual machine copy for the next release (where one existed). I recommend that all SQL Server developers do this, and stay aware of what is coming down the pike. At my client sites, I see a higher percentage of companies that are looking seriously at new versions than even five years ago. To be the most effective, you need to keep up.
In Conclusion
Managing your career in the IT industry has always brought challenges. I've been at it for nearly 27 years, and I've seen incredible advancements in software tools. I've also seen expectations grow. Success means being smart, quick, and confident, and drawing on prior experience.
It's also a humbling industry if you don't spot an opportunity. It's almost impossible to write the ultimate book on career advice, even for a single discipline like SQL Server application development and BI development. Sometimes you have to learn from your own mistakes. I hope that I've provided enough here to get you started being productive and successful on your next project.