I spend a large proportion of my time teaching classes in a variety of open-source technologies — specifically, Ruby, Python, PostgreSQL, and Git. One of the questions that invariably arises in these classes has to do with the case sensitivity of the technology in question. That is, is the variable “x” the same as the variable “X”?
In nearly every case, the technologies with which I work are case sensitive, meaning that “x” and “X” are considered two completely different identifiers. Indeed, the Ruby language goes so far as to give capitalized identifiers a special status, calling them “constants.” (They’re not really constants, in that you can always redefine a Ruby constant. However, you will get a warning when you reassign it. For this reason, I prefer to call them “stubborns,” so that people don’t get the wrong idea.)
SQL is a completely different story, however: The SQL standard states that SQL queries and identifiers (e.g., table names) aren’t case sensitive. Thus, there’s no difference between
select id, email from people;
SELECT ID, EMAIL FROM PEOPLE;
I find both of these styles to be somewhat unreadable, and over the years have generally followed Joe Celko’s advice for capitalization in SQL queries: keywords in uppercase, table names capitalized, and column names in lowercase.
Given that rule, the above query would look like this:
SELECT id, email FROM People;
Again, this capitalization scheme is completely ignored by PostgreSQL. It’s all for our benefit, as developers, who want to be able to read our code down the road.
Actually, that’s not entirely true: PostgreSQL doesn’t exactly ignore the case, but rather forces all of these names to be lowercase. So if you say
CREATE TABLE People ( id SERIAL NOT NULL, email TEXT NOT NULL, PRIMARY KEY(id) );
PostgreSQL will create a table named “people”, all in lowercase. But because of the way PostgreSQL works, forcing all names to lowercase, I can still say:
SELECT * FROM People;
And it will work just fine.
Now, there is a way around this, namely by using double quotes. Whereas single quotes in PostgreSQL are used to create a text string, double quotes are used to name an identifier without changing its case.
Let me say that again, because so many people get this wrong: Single quotes and double quotes in PostgreSQL have completely different jobs, and return completely different data types. Single quotes return text strings. Double quotes return (if you can really think of them as “returning” anything) identifiers, but with the case preserved.
Thus, if I were to repeat the above table-creation query, but use double quotes:
CREATE TABLE "People" ( id SERIAL NOT NULL, email TEXT NOT NULL, PRIMARY KEY(id) );
I have now created a table in which the table name has not been forced to lowercase, but which has preserved the capital P. This means that the following query will now fail:
select * from people;
ERROR:  relation "people" does not exist
LINE 1: select * from people;
                      ^
It fails because I have created a table “People”, but I have told PostgreSQL to look for a table “people”. Confusing? Absolutely. If you use double quotes on the name of a table, column, index, or other object when you create it, and if there is even one capital letter in that identifier, you will need to use double quotes every single time you use it. That’s frustrating for everyone involved — it means that we can’t use the nice capitalization rules that I mentioned earlier, and that various queries will suddenly fail to work.
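To make the rule concrete, here is a minimal Python sketch of the folding behavior described above. It is a simplified model, not PostgreSQL's actual parser: the function name `fold_identifier` is made up for illustration, and it ignores details such as identifier length limits and non-ASCII characters.

```python
def fold_identifier(token: str) -> str:
    """Return the name that would be looked up for an identifier token.

    Simplified model: unquoted identifiers are folded to lowercase;
    double-quoted identifiers keep their case, with the surrounding
    quotes removed and any doubled quote ("") turned into a single one.
    """
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return token[1:-1].replace('""', '"')
    return token.lower()


# Table created with: CREATE TABLE People (...)
stored_unquoted = fold_identifier('People')       # 'people'
# Table created with: CREATE TABLE "People" (...)
stored_quoted = fold_identifier('"People"')       # 'People'

# Any unquoted spelling finds the first table...
print(fold_identifier('PEOPLE') == stored_unquoted)    # True
# ...but no unquoted spelling can ever find the second one.
print(fold_identifier('People') == stored_quoted)      # False
print(fold_identifier('"People"') == stored_quoted)    # True
```

In this model, the only way to reach the name “People” is through a quoted token, which is why the quotes have to follow such a table around forever.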
The bottom line, then, is to avoid using double quotes when creating anything. Actually, you should avoid double quotes when retrieving things as well — otherwise, you might discover that you’re trying to retrieve a column that PostgreSQL doesn’t believe exists.
Now, let’s say that you like this advice, and you try to take it to heart. Unfortunately, there are places where you still might get bitten, despite your best efforts.
For example, the GUI tool for PostgreSQL administration, PGAdmin 3, is used by many people. (I’m an old-school Unix guy, and thus prefer the textual “psql” client.) I’ve discovered over the years that while PGAdmin might be a useful and friendly way to manage your databases, it also automatically uses double quotes when creating tables. This means that if you create a table with PGAdmin, you might find yourself struggling to find or query it afterwards.
Another source of frustration is the Active Record ORM (object-relational mapper), most commonly used in Ruby on Rails. Perhaps because Active Record was developed by users of MySQL, where table names are case-sensitive by default on most Unix systems, Active Record automatically puts double quotes around all table and column names in the queries it sends to PostgreSQL. This can lead to frustrating incompatibilities, for example if you want to access a column in Ruby using CamelCase, but in a case-insensitive way in the database.
PostgreSQL is a fabulous database, and has all sorts of great capabilities. Unless you really want your identifiers to be case-sensitive, though, I strongly suggest that you avoid using double quotes. And if you encounter problems working with columns, check the database logs to see whether the queries are being sent using double quotes. You might be surprised, and manage to save yourself quite a bit of debugging time.
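If you do go digging through logged queries, a small script can point out the suspicious identifiers for you. The following Python sketch assumes that you have already pulled the SQL text out of the log; the function name and the sample query are hypothetical. It only uses regular expressions, so it can be fooled by unusual input, such as a quoted identifier that contains an apostrophe.

```python
import re

# A single-quoted text string, allowing '' as an escaped quote inside it.
STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
# A double-quoted identifier, allowing "" as an escaped quote inside it.
QUOTED_IDENTIFIER = re.compile(r'"((?:[^"]|"")+)"')


def case_sensitive_identifiers(query: str) -> list:
    """List double-quoted identifiers in `query` that contain a capital letter.

    These are the names that cannot be reached without double quotes.
    """
    # Blank out text strings first, so that their contents are not
    # mistaken for identifiers.
    without_strings = STRING_LITERAL.sub("''", query)
    names = (
        match.group(1).replace('""', '"')
        for match in QUOTED_IDENTIFIER.finditer(without_strings)
    )
    return [name for name in names if name != name.lower()]


# Hypothetical query, in the style that an ORM might generate
logged = (
    'SELECT "people"."id", "people"."firstName" FROM "people" '
    "WHERE \"people\".\"email\" = 'Bob@Example.com'"
)
print(case_sensitive_identifiers(logged))   # ['firstName']
```

Quoted names that are entirely lowercase, such as "people", are harmless, because they match what the unquoted name folds to. Only the ones with capital letters are reported.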
One of the most celebrated phrases that has emerged from Ruby on Rails is “convention over configuration.” The basic idea is that software can traditionally be used in many different ways, and that we can customize it using configuration files. Over the years, configuration files for many types of software have become huge; installing software might be easy, but configuring it can be difficult. Moreover, given the option, everyone will configure software differently. This means that when you join a new project, you need to learn that project’s specific configuration and quirks.
“Convention over configuration” is the idea that we can make everyone’s lives easier if we agree to restrict our freedom. Ruby on Rails does this by telling you precisely what your directories will be named, and where they will be located. Rails tells you what to call your database tables, your class names, and even your filenames. The Ruby language, while generally quite open and flexible, also enforces certain conventions: Class and module names must begin with capital letters, for example.
It can take some time for developers to accept these conventions. Indeed, I was one of them: When I first started to work with Rails, I was somewhat offended to be told precisely what my database column names would be, especially when those names contradicted advice that I had heard and adopted years ago. (The advice was to prefix every column in a database table with the name of the table, which would make it more easily readable in joins. Thus the primary key of the “People” table would be person_id, followed by person_first_name, person_last_name, and so forth.) Over time, I have grown not only to use these Rails conventions, but to enjoy working with them; it turns out that people can change pretty easily, at least when it comes to these arbitrary decisions.
The real benefit of such conventions has nothing to do with my own work. Rather, it reduces the need for communication among people working on the same project. If everyone does it the same way, then there are fewer things to negotiate, and we can all concentrate on the real problems, rather than the ones which are relatively arbitrary.
Back in college, I was the editor of the student newspaper. We, like many newspapers, used the AP Stylebook to determine the style that we would use. The AP Stylebook was our bible; whatever it said, we did. Of course, we also had our own local style, to cover things that AP didn’t, such as building names and numbers (e.g., we could refer to “Building 54”). In some cases, I personally disagreed with the AP Stylebook, especially when it came to the “Oxford comma,” which AP style omits. But by keeping that rule, we were able to download articles from the Washington Post and LA Times, and stick them into our newspaper with minimal editing. Again, I prefer the serial comma, and use it in my personal writing. By adhering to a standard, I was able to ensure consistency in our writing, and reduce the workload of the (already hard-working) newspaper staff.
Twice in the last few weeks, I’ve been reminded of the benefits of convention over configuration — both times, when developers on projects I inherited decided to flout the rules. Their decisions weren’t wrong, but they were so wildly different from the conventions of Rails that they caused trouble, delays, and bugs.
The first case involved application.js, the JavaScript manifest file that the Rails asset pipeline expects to find by default. So you can imagine my surprise when I looked for the application.js file, and didn’t find it. That was bad enough, but the asset pipeline mechanism, as well as the deployment scripts I was developing, got rather confused by the absence of application.js. When I confronted the original developer about this, he told me that actually, he liked to call it something else entirely, reflecting the name of the application and client. Why? He didn’t really have a technical reason; it was all for reasons of aesthetics. The fact is that the rest of the Rails ecosystem expected application.js, though, so his decision meant that the rest of the software needed to be configured in a special, different way.
As a way of justifying his decision, the other developer told me, “Conventions shouldn’t be a boundary when developing.” No, just the opposite — the idea is that conventions are there to limit you, to tell you to work in a way that everyone else works, so that things will be smoother. In much of the world, we drive on the right side of the road. This is utterly random; as numerous countries (e.g., England) have proven, you can drive on the other side of the road just fine — but only so long as everyone is doing it. The moment everyone decides on their own conventions, big problems can occur.
When Biblical Hebrew wants to describe anarchy, it uses the phrase, “People did whatever was right in their own eyes.”
Something similar occurred with another project where I inherited code from someone else. One of my favorite things about Ruby on Rails is the fact that it runs the application in an “environment.” The three standard environments are development (which is optimized for developer speed, not for execution speed), production (which is optimized for execution speed), and test (which is meant for testing). The environments aren’t meant to change the application logic, but rather the way in which the application behaves. For example, I recently changed the way in which e-mail is sent to users of my dissertation software, the Modeling Commons. When I send the e-mail in the “production” environment, the e-mail is actually sent, but when I do so within the “development” environment, the e-mail is opened in a browser, so that I can examine it. This is standard and expected behavior; all Rails applications have development, production, and test environments, and some even have a “staging” environment, in which we prepare things.
My client’s software, which I inherited from someone else, decided to do something a bit different: The code was meant to be used on several different sites, each with slightly different logic. The developer decided to use Rails environments in order to distinguish between the logical functions. Thus, if you run the application under the “xyz” environment, you’ll get one logical path, and if you run the application under the “abc” environment, you’ll get another logical path.
It’s hard to describe the number of surprises and problems that this seemingly small decision has created: It means that we can’t really test the application using the normal Rails tools, because nothing will work correctly in the “test” environment. It means that the Phusion Passenger server that we installed to run the application needs an additional, special configuration parameter (not normally needed in production) to find the right database, and execute with the correct algorithms. It means that when you’re trying to trace through the logic of the application, you need to check the environment.
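To illustrate the distinction, here is a small, language-neutral sketch in Python (not Rails code) of one possible way to keep the two ideas apart: the environment controls how the application runs, and a separate setting controls which site’s logic applies. Everything here is an assumption for the sake of the example, including the `APP_ENV` and `APP_SITE` variable names, the `Settings` class, and the greeting logic; it is not how my client’s application works.

```python
from dataclasses import dataclass

STANDARD_ENVIRONMENTS = {"development", "test", "production"}


@dataclass(frozen=True)
class Settings:
    environment: str  # how the application runs
    site: str         # which site's logic applies (hypothetical setting)


def load_settings(env_vars: dict) -> Settings:
    """Build settings from a mapping such as os.environ."""
    environment = env_vars.get("APP_ENV", "development")
    if environment not in STANDARD_ENVIRONMENTS:
        raise ValueError(f"unexpected environment: {environment!r}")
    return Settings(environment=environment, site=env_vars.get("APP_SITE", "abc"))


def mail_delivery(settings: Settings) -> str:
    """The environment decides how mail is handled, not what the app does."""
    methods = {
        "production": "send",
        "development": "open_in_browser",
        "test": "collect",
    }
    return methods[settings.environment]


def greeting(settings: Settings) -> str:
    """The site decides the logic, identically in every environment."""
    return "Welcome to XYZ" if settings.site == "xyz" else "Welcome to ABC"


settings = load_settings({"APP_ENV": "test", "APP_SITE": "xyz"})
print(mail_delivery(settings), "/", greeting(settings))
# collect / Welcome to XYZ
```

With the two settings separated like this, the “xyz” logic can be exercised in the test environment, and the production server needs no special parameter beyond the usual environment name.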
Basically, all of the things that you can assume about most Rails applications aren’t true in this one.
Now, the point of me writing this isn’t to say that I’m brilliant and that other developers are stupid — although it is true that Reuven’s First Law of Consulting states that a new consultant on a project must call his predecessor a moron. Rather, it’s to point to the fact that conventions are there for a reason, and that if you insist on ignoring them, you’ll be increasing the learning curve that other developers will need to work on your application. Now, if you have oodles of time and money, that’s just fine — but as a general rule, a developer’s time is a software company’s greatest expense, and anything you can do to increase productivity, and decrease the need for explanations and communication, is worthwhile.
By the way, this is the whole reason why one of the Python mantras is, “There should be one, and preferably only one, obvious way to do it,” in direct contrast with the Ruby and Perl mantra, “There’s more than one way to do it.” Having a single, common way to do things makes everyone’s code more similar, more readable, and easier to understand. It doesn’t stop you from doing brilliant and interesting things, but does ask that you demonstrate your brilliance within the context of established practice.
Of course, this doesn’t mean that conventions are written in stone, or that they are unchangeable. But if and when you ignore them, it should be for good reason. Even if you’re right, think about whether you’re so right that it’s worth having multiple people learn your way of doing things, instead of the way that they’re used to doing them.
What do you think? Have you seen these sorts of issues in your work? Let me know!