Thursday, March 26, 2009

How to NOT use XSD.EXE

I've never been a big fan of xsd.exe. It always felt kinda crude and clunky to me. No support for properties (simple type elements are public fields), limited control over name mapping, and a bunch of other minor quirks and not-so-minor irritations. All of these complaints, combined with the simple fact that it was a little-documented, closed-source command-line tool, added up to a less than spectacular solution. Up until now, however, it's worked (sort of) and really been the only game in town for a free XML-schema-to-.NET-classes code generator.

Recently I was working on an integration project. I'm sure you've heard this one before: a REST-y web service consumes XML and does wonderful things with the XML behind the scenes. The business logic involved in handling the raw XML and doing all the right stuff was extremely complicated, and everything (not the least of which my unit and system tests) was getting lousy with XML manipulation code. It was pretty clear that I needed something strongly-typed and class-based that I could work with in my code (while still being able to effortlessly hydrate and de-hydrate XML).

[ed. Note: as an aside, if you're a .NET developer using 3.5 and you're still doing your XML handling using the XML DOM (XmlDocument), you really owe it to yourself to stop reading this blog post right now and go read the MSDN primer on LINQ to XML (http://msdn.microsoft.com/en-us/library/bb308960.aspx). I'm sure you're probably tired of all the LINQ hype by now, but frankly (and especially in this case) it is *entirely* justified. Once you've started using LINQ to XML you will strenuously avoid any other approach to reading, writing and querying XML. It's really that good.]
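To give a rough feel for what that note is getting at, here's a minimal sketch of building and querying XML with LINQ to XML. The element and attribute names (order, item, price, name) and the LinqToXmlDemo class are made up for illustration; they don't come from any real schema.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class LinqToXmlDemo
{
    // Builds a small document with nested constructors instead of
    // CreateElement/AppendChild calls.
    public static string BuildOrder()
    {
        XElement order = new XElement("order",
            new XAttribute("id", 42),
            new XElement("item",
                new XAttribute("price", 9.99m),
                new XElement("name", "widget")));

        return order.ToString(SaveOptions.DisableFormatting);
    }

    // Returns the <name> of every <item> whose price attribute is at least minPrice.
    // Assumes every <item> carries a price attribute; the cast throws if it is missing.
    public static string[] ExpensiveItemNames(string xml, decimal minPrice)
    {
        XDocument doc = XDocument.Parse(xml);

        return doc.Descendants("item")
                  .Where(item => (decimal)item.Attribute("price") >= minPrice)
                  .Select(item => (string)item.Element("name"))
                  .ToArray();
    }
}
```

The explicit casts on XAttribute and XElement do the string-to-value conversion, which is a good part of why this reads so much shorter than the equivalent DOM code.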

So here I am needing to generate .NET classes from my XSD. The funny thing is that my first instinct was to go hunt for a new tool to do this. I hadn't needed the services of xsd.exe for a long time and I was thinking to myself "Surely there *has* to be a new and better tool now." I was surprised and more than a little disappointed to find (after a *lot* of research) that aside from the usual smattering of exorbitantly priced commercial software there had been shockingly little progress with and not even much discussion about this particular subject. Everyone was still talking (and complaining) about xsd.exe.

At this point I'm simply resigned to using xsd.exe and moving on with my life. This is where things get a little weird. Apparently, there hasn't been a new version of xsd.exe since .NET 1.1. Well that's a little weird, but ok. I'll just go and find the SDK directory under… oh, wait, this is a new machine that's only had VS '08 installed. Well in that case I'll just go and grab the binary online, it shouldn't be too hard to find. [Three hours later…] Finally, I've got the xsd.exe binary. Now just run it and… "This program requires that you upgrade to the Microsoft .NET Framework Version 1.1" What?! Gahhhhh! Back to Google. (As an aside, I have no clue what's going on here and haven't had a chance to research it. I looked in \Microsoft.NET\Framework\v1.1.4322\ and all that's in there are gacutil and regsvc config files. Also, I see that in VS I have only 2.0, 3.0 and 3.5 as possible framework targets. Apparently .NET 1.1 doesn't install with Vista?)

Anyway, this is way more preamble than I intended for this post (though I'm really hoping to save others and "Future Jake" the pain I suffered around this particular issue, and the substantial time lost trying to sort it all out). So, to make an already long story just a bit longer: I gave up on xsd.exe (which I really didn't want to use anyway) and redoubled my research efforts. I have to admit at this point I was considering writing something myself, even though a recent experience with generating *documentation* from XSDs had made it really clear that this would've been a *serious* undertaking.

I really didn't expect to find anything more, but I tried. I don't know what new search term I used or link I clicked on that I hadn't the first time around but here's the breakthrough I made.

Daniel Cazzulino (an excellent and prolific blogger some may know as Kzu) posted a blog entry way back in October of 2003 that talks about the limitations of xsd.exe (specifically the public/settable nature of the members in the generated classes) and discusses the little-known and poorly documented in-built support in .NET for XSD-based code generation using the XmlSchemaImporter and XmlCodeExporter from the System.Xml.Serialization namespace. This approach requires shockingly little code and is, apparently, how xsd.exe does its code gen.

Here's that original post: http://www.clariusconsulting.net/blogs/kzu/archive/2003/10/24/96.aspx
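For the curious, the core of the approach can be sketched in a few dozen lines. This is a minimal illustration of the technique, not Kzu's code and not necessarily what xsd.exe does internally. It assumes the full .NET Framework (2.0 or later), and the XsdCodeGen class, the Generate method and its parameters are hypothetical names chosen for the example.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;
using Microsoft.CSharp;

public static class XsdCodeGen
{
    // Returns C# source for classes mapped from the schema's top-level elements.
    public static string Generate(string xsdPath, string targetNamespace)
    {
        XmlSchema schema;
        using (FileStream stream = File.OpenRead(xsdPath))
        {
            schema = XmlSchema.Read(stream, null);
        }

        XmlSchemas schemas = new XmlSchemas();
        schemas.Add(schema);
        schemas.Compile(null, true);

        // The importer turns schema types into mappings; the exporter turns
        // those mappings into CodeDom type declarations inside codeNamespace.
        XmlSchemaImporter importer = new XmlSchemaImporter(schemas);
        CodeNamespace codeNamespace = new CodeNamespace(targetNamespace);
        XmlCodeExporter exporter = new XmlCodeExporter(codeNamespace);

        foreach (XmlSchemaObject item in schema.Items)
        {
            XmlSchemaElement element = item as XmlSchemaElement;
            if (element == null) continue; // skip top-level types, attributes, etc.

            XmlQualifiedName name = new XmlQualifiedName(element.Name, schema.TargetNamespace);
            XmlTypeMapping mapping = importer.ImportTypeMapping(name);
            exporter.ExportTypeMapping(mapping);
        }

        // Render the CodeDom tree as C# source text.
        using (StringWriter writer = new StringWriter())
        using (CSharpCodeProvider provider = new CSharpCodeProvider())
        {
            provider.GenerateCodeFromNamespace(codeNamespace, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

Because the result is an ordinary CodeDom tree, this is also the natural place to customize the output (for example, walking codeNamespace.Types before rendering), which is the sort of thing the posts linked here discuss.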

Then, in a May 2004 post Kzu continues this conversation with an adaptation of the technique to turn his code generator into a VS.NET custom tool. http://www.clariusconsulting.net/blogs/kzu/archive/2004/05/14/XsdCodeGenTool.aspx

Note that the link to the code download on gotdotnet no longer works (since gdn is now dead).

It turns out that Kzu re-worked these blog posts into an article for MSDN, complete with full source, later that month (May of 2004). http://msdn.microsoft.com/en-us/library/aa302301.aspx

The source: http://download.microsoft.com/download/5/E/9/5E923D54-242B-48F4-B3A1-DA8CDED6BE45/XsdGenerator.exe

Although it's relatively old (in Internet years) it's still an excellent article and a valid (*the* valid?) approach for XSD-based code generation in .NET. I used this instead of xsd.exe with great results (and this was a pretty complex schema set). It wasn't perfect. The output shows its age a bit - no nullable type support, arrays instead of generic lists, and a few other minor quirks - but it worked for me "out of the box" with .NET 3.5 SP1 and the output was usable in its raw form (i.e., no post-generation tweaks that later have to be manually diff-synced when you change the schema and re-gen, a major plus).

Anyway, just today I stumbled on a new (2009.01.27) project by Pascal Cabanel on CodePlex that references Kzu's article and cites it as the author's inspiration. Apparently, he's adapted the technique (and updated it) to produce a richer, business-object style of output. I haven't played with it yet, but it looks very promising. Updates once I've had a chance to try it out.

Pascal Cabanel's Xsd2Code: http://xsd2code.codeplex.com/


Monday, March 16, 2009

US Census as Test Data

Turns out that the US census is a very cool source for test data. For example, the census results from 1990 contain first and last names with information on how common they are. I'm sure there are other useful pieces of data in there too.

Best of all it's free (or more to the point, you pay your license fee every April 15th).
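As a rough sketch of putting this to use: assuming the name files are plain text with whitespace-separated columns (the name first, then its frequency as a percentage, then other columns), something like the following could load them into memory. The column layout and the NameFrequency and CensusNames names are assumptions for illustration, so check them against the actual files.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class NameFrequency
{
    public string Name;
    public double Percent;
}

public static class CensusNames
{
    // Parses lines shaped like "SMITH   1.006   1.006   1" (assumed layout).
    // Lines that don't have at least a name and a numeric frequency are skipped.
    public static List<NameFrequency> Parse(IEnumerable<string> lines)
    {
        List<NameFrequency> result = new List<NameFrequency>();

        foreach (string line in lines)
        {
            // A null separator array means "split on any whitespace".
            string[] parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length < 2) continue;

            double percent;
            if (!double.TryParse(parts[1], NumberStyles.Float, CultureInfo.InvariantCulture, out percent))
                continue;

            result.Add(new NameFrequency { Name = parts[0], Percent = percent });
        }

        return result;
    }
}
```

Feeding it File.ReadAllLines(...) for a downloaded file gives a list you can sample from when generating fake customers, users and so on.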

Reference:
US Census Data


Self-Signed Client Certs

When using self-signed client certs (for example, when testing TLS from a browser to your local dev server), be sure to add the cert to both the “Local Computer => Trusted Root Certification Authorities” store AND the “Current User => Personal” store. Once you do this it will show up in IE/Firefox as an available client certificate and IIS will accept it as valid/trusted.

Also, IIS7 has a built in self-signed certificate generator for server certs in IIS admin. Long overdue.

Lastly, if you ever need to get at the thumbprint of the client certificate in ASP.NET, here’s the code:

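// Requires: using System.Security.Cryptography.X509Certificates;
// Runs inside an ASP.NET page (uses Request, Response and Server).
if( Request.IsSecureConnection && Request.ClientCertificate.IsPresent )
{
    // ClientCertificate.Certificate holds the raw bytes of the client certificate.
    X509Certificate2 certificate = new X509Certificate2( Request.ClientCertificate.Certificate );
    Response.Write( "X.509 Thumbprint = " + Server.HtmlEncode( certificate.Thumbprint ) + "<br/>" );
    Response.Write( "X.509 SubjectName.Name = " + Server.HtmlEncode( certificate.SubjectName.Name ) + "<br/>" );
}

References: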
ScottGu's Blog
Usenet Post


Knuckling Under

I've successfully resisted starting a blog for years (except for my Tweet Project blog, but that's a whole other story). I'm not down on the idea of blogging, mind you, I just never felt like I had anything to say that was so insightful that it warranted a public airing. I don't mean this in a self-deprecating way. On the contrary, I think I've got plenty of useful ideas to go around, but I never felt they were so significant that I had an obligation to memorialize them, and I don't love the act of writing so much that I just can't help myself. So the cost/benefit analysis just never quite came out right-side up for me. I know myself (and my ambivalence about writing) well enough to know that blogging as an exercise for its own sake would have a short and certain outcome in my hands. I'm honestly not being flippant, nor disparaging bloggers in general. While there is undoubtedly an abundance of mediocre (if well-intentioned) blogs on the web, there are a surprising number of excellent blogs out there, written by thoughtful, intelligent folks who are enthusiastic about sharing their unique perspective with the rest of the world. I wouldn't call myself an avid blog reader, but there are a handful that I follow regularly and countless more that have proved at least momentarily useful or thought provoking to me. Again, I don't have anything against blogging. I just never felt like I had a good enough reason to write one.

I've changed my mind.

Or maybe I've just changed the parameters a bit.

The bar on this blog is set pretty low. I'll try to use complete sentences and proper grammatical structure, but there won't be much in the way of evocative prose, deep insights, controversial political positions or flowery rhetoric. It's basically intended as a simple repository of stuff that I need to capture somewhere because its value to me is greater than nothing, but not so high that it warrants some kind of formalized preservation (for example, having it tattooed on a body part).

I know, sounds pretty underwhelming. Let me try to explain.

In software development the first place you turn for information is the Internet. I know, I know... this is becoming increasingly true for most any subject, but it's always been true for software development, and specifically, web development. Over the years the quiet devotion and prudence of kind strangers has gotten me out of countless technology jams with little more effort on my part than a quick Google search. But honestly, aside from a sprinkling of Usenet postings way back in the day I've never been that person. Actually, even then I was generally the guy asking the questions, not answering them.

Recently I found myself in just such a situation and had a bit of a guilty revelation. Actually, in this particular case it was doubly bad because I knew that the vaguely obscure technical issue I was struggling with had previously visited my little world. This was a problem I had already solved at some point in the past and I had forgotten what the solution was. Needless to say, this situation is a little irritating and prompts all kinds of internal dialog along the lines of "Oh great, the beginning of the end... the early stages of old age. Soon I won't even know that I hate creamed corn, let alone remember how to fight off the nurse's aide feeding it to me." Of course, in retrospect I'm confident in knowing that this is the usual state of things. Developing software, like most any other human pursuit above Maslow's first-tier (and even some of those), is composed of myriad tiny, interlocking bits and pieces of fleeting importance and sufficient complexity (both in their very composition and in the way they're wired together) that it's perfectly reasonable for an intelligent, non-Alzheimer's suffering person to forget many if not most of them. I suspect I'd be fully nuts by now if I didn't, actually.

So here I am, re-solving a problem because I can't remember the original solution, or even what I was doing when I last solved it [as an aside - this actually happens quite often but I'm able to remember the project I was working on when I solved the problem originally and I can just go trawl my old code until I home in on the answer]. Anyway, the first thing I do is jump on Google and start digging. And in this particular case, after a few minutes of fiddling with search terms and skimming results I not only find the solution to my problem in the form of a post on some random guy's programming blog, but I realize that this is the same place I found the answer the last time. I was ashamedly grateful enough that I actually posted a comment to thank him for saving me hours of trial-and-error, not once, but twice. It's important to note here that even the posting of a one-line comment is a bit unusual for me. It's not laziness, but instead that contributing in that way is just outside of my normal behavioral context. It sounds silly, but it's true. In fact, I'm convinced that it’s a pretty fundamental personality thing. A minority of people are content producers by nature and the rest of us are just consumers. The pigs-and-chickens thing as it applies to information sharing.

But here's where my revelation comes in. Writing stuff down is a potent form of remembering. Actually, it's the means to overcome our impressive, but still limited capacity to retain and share knowledge. Things would be very different if we had never figured this out. And props to the people who originally realized this because they actually had to invent writing too.

So here's the thing: capturing this stuff in a blog is a great way to ensure that it's close at hand if I ever need it again. This is true enough for simple things that didn't require a whole lot of effort to work out, but especially valuable for those times when the answer wasn't as obvious and I had to invest hours or even days driving to a solution. Obviously, it's preferable to not repeat that process as a sort of deja-vu problem solving exercise. Ultimately, taking a few moments to scribble some notes about the solution is a remarkably valuable investment. Even if it's only a small percentage of cases where I actually need that information again in the future, the potential time-savings far outweigh the cost. So from a purely self-serving standpoint it passes the cost/benefit test and is probably something I should've been doing all along. As a blog post, it reaps all the benefits of being a centralized repository of stuff that Google will at some point index, making it really easy for me to find later. Altruistically, this also makes it accessible to anonymous strangers half-way around the planet who happen to share the unique kinship with me of trying to solve that particular problem. Maybe it saves them a bunch of time. Maybe it helps them to come up with an even better solution, and maybe they post a comment about it, completing the feedback loop and rewarding me yet again for my original (and minimal) efforts. Even from the most selfish of perspectives this seems like a good idea.

Here's the thing though: I know myself well enough to know that in order for this to work I need to set the bar pretty low. I've already admitted to not being particularly inclined to written discourse (I'll talk until I'm blue in the face though…). This blog is not about good writing. This blog is going to be crafted to intentionally low standards. If I'm not on the hook for great writing then I actually stand a chance of producing something of value. So that's it. I'm giving myself permission to not worry about it. Posts will be terse. Grammar and spelling will teeter precariously on the brink of marginal. But maybe, just maybe it'll prove useful to me… or you.

This is the longest, best crafted and most thoughtful thing I'm likely to post. It's all downhill from here.

Oh, and as for the name - it doesn't mean anything. It just appealed to my right-brained alter ego. That guy doesn't get enough play so I thought I'd throw him a bone.
