Sep 15 2005

Language Integrated Query (LINQ) is clearly the hot topic of PDC this year.  Windows Vista dominated the keynotes and looked very pretty up there on the screen, but nothing gets developers salivating like language enhancements.  What is LINQ, really, and what does it mean for Delphi?

The founding premise behind LINQ’s many facets is that there is a huge gap between program logic written in a high level programming language and the data that it operates upon.  The more data you have to process, the farther away it is from your program.

Think about it.  Why is it that you program using strong data types, objects, method calls, and optimized compiled code, and yet whenever you want to deal with any data at scale, your data access is reduced to text strings containing SQL query statements?  All your strong data types, objects, and optimized code get thrown out the window just when you need them most.  This model requires the programmer to be fluent in two different language syntaxes and two different type systems with different quirks and representations.  This model requires constant conversion between the data in the SQL domain and your program’s domain.
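
To make that gap concrete, here is a minimal sketch in Python using the standard sqlite3 module.  The Customer type, the customers table, and the sample rows are all made up for illustration; the point is only that the query lives in a string the language cannot check, and each row has to be converted back into a typed object by hand.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical program-side type, invented for this sketch."""
    name: str
    age: int


# An in-memory database with a made-up table and rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ann", 34), ("Bob", 19), ("Cy", 52)],
)

# The query is an opaque string: nothing checks the table or column
# names against the Customer type until the statement actually runs.
sql = "SELECT name, age FROM customers WHERE age > ? ORDER BY age"

# Each row comes back as a plain tuple and is converted by hand.
adults = [Customer(name=row[0], age=row[1]) for row in conn.execute(sql, (21,))]
```

Rename the age column on the database side and this code still loads happily; it only fails when the query executes.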

This gap has long been recognized, and has spawned several diverse techniques to narrow the gap between program logic and large scale data storage.  SQL stored procedures, for example, were created so that you could bring the code closer to the data.  But oh, what code you had to write to get that to work!

This is not an attack on SQL itself.  SQL is very expressive and powerful for specifying how you want to filter, organize, and view a large set of data.  The problem is that SQL is always ‘far away’ and alien to your programmers.  Programmers are forced to think about data in SQL terms but think about the application logic in terms of their preferred programming language.  That discontinuity is an opportunity for human mistakes and wasted execution cycles.

What if you could bring the expressiveness of SQL closer to your core application logic?  Not just physical distance, but conceptual distance as well?  What if you could perform set logic on large data in your preferred programming language, using the objects, data types, and variables of your normal code, without having to jump through translation hoops?

That’s what LINQ aims to do.

I’ve been aware of the LINQ project within Microsoft for quite some time, but hadn’t heard any specifics about it in part because “it’s too early” and “it’s not fully baked yet.”  So I was a bit surprised to see it headlining at PDC this year.  I suspect there was some mad scrambling going on behind the scenes to prep the LINQ preview release for PDC.  I’m glad they did!

LINQ is a combination of language syntax and .NET framework infrastructure to support SQL-like query capabilities within high-level programming languages, operating on data in the programming languages’ domain.  Here’s the exciting part:  LINQ is independent of SQL.  LINQ style queries operate on any data construct that implements IEnumerable.  Sure, that includes ADO.NET data sets hooked up to SQL back-ends.  But it also means you can perform sophisticated queries and views of data stored in your program’s local lists, collections, and even arrays.  Want to sort that array of integers?  Slap up a query with an orderby clause.
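
As a rough analogy only (this is Python, not LINQ syntax), the idea of one query shape working over anything enumerable can be sketched as below.  The query function and its where, orderby, and select parameters are names I made up for the illustration; Python's iterables stand in for IEnumerable.

```python
def query(source, where=None, orderby=None, select=None):
    """Filter, optionally sort, and project any iterable.

    Hypothetical helper for illustration; it is not part of LINQ
    or of any library.
    """
    items = (x for x in source if where is None or where(x))
    if orderby is not None:
        items = sorted(items, key=orderby)
    return [x if select is None else select(x) for x in items]


# Sorting a plain list of integers with an "orderby".
sorted_ints = query([5, 3, 9, 1, 7], orderby=lambda n: n)

# The same query shape over a tuple of (name, age) records.
people = (("Cy", 52), ("Ann", 34), ("Bob", 19))
adult_names = query(
    people,
    where=lambda p: p[1] > 21,
    orderby=lambda p: p[0],
    select=lambda p: p[0],
)

# And over a lazily produced sequence.
even_squares = query(range(10), where=lambda n: n % 2 == 0, select=lambda n: n * n)
```

The caller never says whether the source is a list, a tuple, or a generated sequence; anything that can be enumerated will do.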

Enormous Back-end Optimization Opportunities

Far from the acreage of anyone’s derriere, LINQ provides an abstraction that separates the concepts that programmers need to deal with from the back-end implementation details.  The power of Language Integrated Query is that the programmer specifies what he/she wants as output, but does not need to get tangled up in the specifics of exactly how that output is achieved.  That means the how can be implemented and reimplemented a multitude of ways by very specialized experts without affecting how everyday coders specify the what.

Different data sources may implement their own query operators that are specifically tuned to the way they represent data internally.  A SQL dataset will take the LINQ query and emit a SQL select statement to throw across the network to the SQL server.  A query against an in-memory array will spin through the array looking for entries that satisfy the where clause conditions.  (ECO object spaces, anyone?)
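
The split between one query description and several back-ends can be sketched roughly as follows.  This is an assumption-laden Python illustration, not how LINQ is implemented: Condition, to_sql, and run_in_memory are hypothetical names, and the "SQL back-end" here only builds the statement text rather than talking to a server.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """A where clause held as data instead of code (hypothetical type)."""
    field: str
    op: str        # one of the keys of OPS
    value: object


OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def to_sql(table, cond):
    """SQL back-end: turn the condition into a parameterized SELECT.

    Table and field names are assumed to be trusted identifiers here.
    """
    if cond.op not in OPS:
        raise ValueError(f"unsupported operator: {cond.op}")
    return f"SELECT * FROM {table} WHERE {cond.field} {cond.op} ?", (cond.value,)


def run_in_memory(rows, cond):
    """In-memory back-end: spin through the rows and test each one."""
    test = OPS[cond.op]
    return [row for row in rows if test(row[cond.field], cond.value)]


cond = Condition("age", ">", 21)
statement, params = to_sql("customers", cond)

rows = [{"name": "Ann", "age": 34}, {"name": "Bob", "age": 19}]
matches = run_in_memory(rows, cond)
```

The same cond object drives both back-ends, which is the whole point: the what is written once and each data source decides the how.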

And what about those where clause conditions?  The where clause has to be evaluated for every element in the data source to decide which elements should appear in the result set.  That could be a lot of execution.  What if the LINQ query back-end were smart enough to recognize when the where clause is a locally isolated expression (no global state or side effects) and in such cases distribute the testing of the data across multiple processors in multiple threads?  That would be incredibly powerful.

In this way, LINQ presents an enormous opportunity to put parallel computing in a package that everyday coders can use safely and easily.  Exploit those multicore processors!  Even grid computing is within reason.
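
Here is a minimal sketch of that idea in Python, using the standard concurrent.futures module.  The parallel_where function and its parameters are hypothetical, and the sketch simply trusts the caller that the predicate is free of side effects; nothing here detects that automatically, which is the hard part a real back-end would have to solve.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_where(items, predicate, workers=4):
    """Evaluate a side-effect-free predicate over chunks in parallel.

    Hypothetical helper: the predicate is assumed to touch no shared
    state, so chunks can be tested independently and in any order.
    """
    items = list(items)
    if not items:
        return []
    size = -(-len(items) // workers)  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]

    def filter_chunk(chunk):
        return [x for x in chunk if predicate(x)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in chunk order, so the output order
        # matches what a sequential filter would produce.
        parts = pool.map(filter_chunk, chunks)
        return [x for part in parts for x in part]


multiples_of_three = parallel_where(range(20), lambda n: n % 3 == 0)
```

In CPython, threads will not speed up a CPU-bound predicate like this one; a process pool or a runtime with truly parallel threads would be needed for that.  The calling code would look the same either way, which is what makes the packaging attractive.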

And Now the Critics

LINQ does have weak points and criticisms.  When dealing with external SQL data, LINQ’s programming model uses compile-time data binding to associate data in a particular column of a particular table in the SQL database ‘over there’ with a particular data object ‘over here.’  You do this all the time in Delphi when you create field components to correspond to the data fields you want to use in your program.  However, hard binding to database schema that is defined outside your program is a risk.

What happens when some bozo (even a well-meaning bozo) decides to change a field name or table name on the SQL server?  Your app breaks.  Many times in PDC sessions involving LINQ, people in the audience asked if there would be any way to define mapping tables to go between the compile-time hard bindings and the external database.  The answer was no, for now.

Another criticism of LINQ is that it brings the complexity of query construction right into the middle of your programming logic.  It may be very convenient to assemble a VW in your living room, but is that really where it belongs?

I agree that there is ample opportunity for abuse here, but the risk of obfuscation is outweighed by the power of expression.

Consider this against the VW metaphor:  Indoor plumbing vs the well-understood “mature” technology of the outhouse.  You have to leave the warmth and comfort of your house to use the outhouse.  You have different expectations of your outhouse experience (smell, view, decor) than of your ‘inhouse’ experience.  Ignoring the obvious data flow flaw in this metaphor (inflow vs outflow), the traditional external database model (SQL and others) is the outhouse.  LINQ proposes to replace your data outhouses with indoor plumbing.

What were the objections to indoor plumbing when it was first introduced?  Smell, obviously.  Unreliable, leaky plumbing.  Fear of rats entering the house through the toilet.  (seriously)  And yet indoor plumbing was considered a decadent luxury even hundreds of years before Thomas Crapper popularized (but did not invent) the P-trap flush toilet to solve the smell issue.

“I don’t want no stinkin’ database crap in my code!” 

If you’re writing programs that don’t deal with data, perhaps that’s a reasonable stance.  For the rest of us, we already have database ‘crap’ running all through our program code.  Perhaps the real frustration behind that stance is that dealing with external data has never been completely smooth or seamless.

I know whereof I speak.  I made the statement above when I first learned of the decision to make database support the third pillar of the then embryonic Delphi project (visual application design, compiled code, integrated database support) some 17 years ago.  I had plenty of experience with dBase II and dBase III+, and I didn’t want to see Pascal adulterated by database stuff like Browse commands.  Eventually, I realized I was being a Luddite and accepted that database support (intelligent, elegant database support) was critical to Delphi’s success, because when you get right down to it data is what programming is all about.

LINQ for Delphi?

In a word:  Yes.

I was very interested to see if the PDC sessions would have any tangible content about where LINQ draws the line between proprietary language magic and system supported infrastructure that all languages can leverage.

If LINQ were mostly a C# compiler trick, then that would be the end of the story.  No excitement, nothing to write home about.  But Anders Hejlsberg is no dummy.  While he is often labelled the C# language architect, his true task is to advance the .NET platform.  Advancing the platform requires rising above any particular language, including your own.

Anders never fails to deliver on content, and I am pleased to see that LINQ provides significant system infrastructure for languages to leverage in implementing their own support for LINQ style operations.  It’s also a great relief that LINQ’s infrastructure runs on .NET 2.0 – no new CLR to wrestle with.  Yet.

Make no mistake – there is an enormous amount of work that we need to do to the Delphi syntax, compiler, and tool chain just to get to the point where we can start playing with LINQ notions.  LINQ is clearly “next generation,” a child of “what could you do if you had a really good generics implementation under the hood” thinking.  LINQ’s infrastructure is heavily dependent upon generic types and new things similar to or built upon anonymous methods.  So, first things first:  Delphi generics.

What will Delphi’s LINQ syntax look like?  I don’t know yet.  I’d prefer something more elegant than SQL’s Cobol-like froms, wheres, and wherefores (which is the path C# 3.0 is taking), but some verbiage is necessary to avoid falling into the Algol #$%!! pit.

It’s interesting to observe that Delphi already has a language construct to perform set logic (intersections, unions, etc) on bags of data:  Delphi set types.  The element type of a set is currently limited to ordinal types with no more than 256 values, but what if it weren’t?  What if you could manipulate your terabytes of SQL data using types like “set of (Person, Address, Phone);”?
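
To give a feel for what set logic over records (rather than small ordinals) looks like, here is a sketch in Python, whose built-in sets accept any hashable element.  This is only an analogy and says nothing about what Delphi syntax would be; the Person type and the sample data are invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical record type; frozen so instances can live in a set."""
    name: str
    city: str


customers = {Person("Ann", "Oslo"), Person("Bob", "Rome")}
suppliers = {Person("Bob", "Rome"), Person("Cy", "Lima")}

both = customers & suppliers             # intersection
either = customers | suppliers           # union
only_customers = customers - suppliers   # difference
is_customer = Person("Ann", "Oslo") in customers  # membership test
```

These are the same operators Delphi programmers already use on sets of enumerated values, applied to whole records instead.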