Security Boundaries, Encapsulation and SQL Injection

Introduction

The world seems to be obsessed with "SQL Injection" (consult your favorite search engine if for some reason you still don't know what it is). This obsession is highly related to the massive adoption of popular web development tools, but in fact has more general nature.

This article presents the way in which the vulnerability to SQL injection is related to the lack of encapsulation and misplaced security boundaries and also why the most popular solutions are actually wrong.

The problem and the "solution"

If we focus on web applications (just for the sake of having some focus), the typical system is composed of the following components:

The database server.
The application accessing the database. This application runs on a web-server.
The web-browser, which is in fact a more or less intelligent display terminal for the application.

Note that from the point of view of the database server, the web application plays a role of the database client. That's right, it is not the web-browser which is a client here, but the set of scripts running on the web-server. This observation is important and it allows to find the actual source of the problem.

SQL injection is a process that allows the attacker to trick the system to execute his own SQL code against the underlying database. This is performed by providing carefully crafted input to the application (please see other resources for details on how this is actually done). If the application naively uses the input data as part of the direct SQL query, then some dangerous operations can be executed on the database itself.

What is wrong in this setup? Complete lack of any security boundary that would play the role of a "firewall" against user-provided input.

What is the usually promoted solution? Checking or escaping the input values at the application level.

This solution introduces a security boundary that divides the world into two parts:

The "untrusted part" for the user with his web-browser.
The "trusted" part for the application and the database.

The assumption here is that as soon as the application verifies or transforms the input data to some "safe" form, everything is fine and the application can go on with its business by executing SQL queries against the database.

In other words, from the security point of view, the database and the client application form a single entity. This is a direct result of the place where the security boundary was located.

What can go wrong

As noted above, the web application is actually a client in the system. This means that the client is involved in getting the secure and safe database operation and that the integrity of the data in the database depends on the correct behavior of the client.

This is completely wrong, for at least two reasons.

The first reason is the assumption that the client is controlled and verified. This single client, the web application that exists for example as a set of PHP scripts, can be controlled by a trusted team of developers - but such a setup is very short-sighted. The whole point in having the database in the first place is to enable multiple clients accessing it. Even if the system starts with a single client it inevitably grows and that justifies adding more independent clients.

Consider the following scenario:

A group of developers provided a carefully verified web application. They made sure that all input values are validated or escaped and are happy with the security of the system.

Some time later the manager comes in and says:

"Our system is very successful and we quickly need more reporting tools. We have just signed a contract with Fubar, Ltd., which is a young but very dynamic company from [wherever] and they will develop the reporting tools for us. Please let them know how to connect to our database."

That's all. The security is basically gone, because there is no way to ensure that the guys from Fubar, Ltd. will develop their client application (yes, there is more than one client now!) in equally careful way.

The other reason for why things can go wrong is, unfortunately, a very low quality of tools that are typically used for developing the client applications - even that first one, which is presumably controlled and verified. Script interpreters are notoriously buggy with all sorts of problems on their own. Buffer overflows? There are lots of them. This means that the client application (or more generally any component of the system) that is built on top of these buggy foundations should never be treated as a security gateway - or that the security boundary should never be defined by such component.

Of course, we have to put our trust somewhere, but the only reasonable thing we can do is to limit the number of components that need to be trusted. Operating system? Seems to be necessary. The database server? Difficult to exclude either, but the set of trusted components really doesn't have to be any bigger than that.

What is the real problem

Encapsulation or rather lack of it.

It is shocking how easy it is to drop this fundamental and universal design principle. It is fundamental, because it has a big influence on the rest of the system. It is universal, because it is applicable virtually everywhere.

Let's leave the SQL injection aside for the moment and let's consider a simple programming assignment: a class that represents some value or state. A string class, perhaps. Or a collection of things. Whatever.

This seems to be a classic case for the following breakdown:

The private state. It is private, so that liberal manipulations of the state are not possible.
The public interface. It is public, so that it is accessible to the users.

The integrity of the state is guaranteed by the fact that it can be manipulated only through the provided interface.

Sounds obvious, right? Just like the first page from any OOP tutorial. It is so obvious that nobody should ever consider it to require any explanation. Developers and designers have this encapsulation pattern already as part of their brains, due to repeated application. Just like walking or eating.

Right?

Well... the real problem is that it is actually not the case and this was really the source of initial SQL injection problem in the web application example above - the database (the state of the system) was not encapsulated at all, letting every client (even if it was initially just a single client) to access the data without any limitations. The encapsulation pattern, something that is supposed to be part of designer's brain, somehow was not applied at all.

Encapsulating the database

The analogy with strings, collections and so on can be used further.

Keeping the state private? No problem, the database servers have this ability for ages with their system of privileges.

Having a public interface for manipulating that state? No problem, stored procedures that are available for clients (again, the system of privileges is your friend) do the job nicely.

In short: don't allow client applications to even see the data structures directly.

Encapsulating the database structure provides one major advantage: the security boundary is now closer to the database server. This means that the number of components that need to be trusted and verified is lower. It also means that the number of components that can benefit from the protection is bigger - including the reporting tools written by guys at Fubar, Ltd.

What problems are (not) solved

What is really solved with the encapsulation of the database structures is the problem of data integrity.

There is no possibility for the client application to manipulate the data (including deleting rows, dropping tables, etc.) in ways that were not foreseen by system designers.

The system can be even strengthen to the point where even session identifier or keys are generated and checked by the database itself, in which case the concept of data integrity can be extended to cover also the logical information flow, not only its static structure, which is usually targeted by the basic relational integrity constraints.

These issues are related to security and if they are solved, then the system can be considered as secure.

What is not solved with just encapsulating the database structures is the "fluency of user interaction".

Coming back to the example web application - if the application does not check or escape its input values, then it is possible for the legitimate user to introduce something that is a valid value (consider O'Neal as a family name, for example), but when naively used as part of the SQL query will result in incorrect procedure calls and will be rejected by the database system. The application should still escape such values to ensure that things work properly, but now the motivation for doing this is completely different. It is not the security of the system that is affected by any mistakes here, but only the user interaction.

In other words, by just moving the security boundary closer to the database, things that were security issues become user interface issues. This shift is very valuable and the difference is really in the cost of forgetting to do it. Forgetting to escape the string with non-encapsulated database can cost you the lost of data integrity if some attacker finds the hole. The same lousiness with the database that is already encapsulated can cost you at most some irritation of the legitimate user. Which of these two consequences do you prefer?

Similarly, one of the proposed alternative "solutions" to the original SQL injection problem is to use prepared statements and bind variables in the target programming language. Again, encapsulating the database structures behind the system of privileges and a set of publicly accessible stored procedures changes the motivation for applying this solution. Not doing it with non-encapsulated database is a security issue. Not doing it with properly encapsulated database is at most a user interface issue or maybe also a performance issue if it makes any difference for the underlying server. Which of these two consequences do you prefer?

Conclusion

Proper encapsulation of the database is the solution to the problem of uncontrolled modifications and the database systems are not different from any other aspect of the whole system. If you find it natural to divide your Java or C++ (or whatever) classes into private and public part, do the same with your database structures. By doing this the security boundary in the system will move closer to what you actually try to protect.

More explicit conclusion is that the majority of SQL injection related books and articles that promote client-side "solutions" without even touching the subject of proper data encapsulation are just misleading and providing a false sense of security. There is no security without encapsulation, period.

Side notes

I have mentioned that encapsulation can go as far as to the point where even the session itself is managed by the database server. In such a setup the designer can assume that the client application is completely untrusted and therefore can connect to the database without any passwords.

The common practice in various PHP applications is to hard-encode the database password in the scripts. This has very broad security consequences. With encapsulated database that encapsulates also the session management the PHP scripts can connect to the database without any password at all. There is no security consequence in doing this, and the advantage is that there is no need to worry about those hard-coded passwords.

Some web designers can rightly note that some popular database servers do not support this kind of encapsulation very well, for example due to the lack of stored procedures or crippled privileges system or whatever. The obvious answer to this is that instead of getting stuck with these crippled servers and broken security solution on the client side, I would strongly recommend migration to some real database server and then doing things properly.

Acknowledgements

Special thanks go to Jacek Wojcieszuk, a DBA at CERN, for his comments on the initial version of this article.