Every so often, a question shows up on the forums in the form of, “How do I prevent anyone from accessing my database schema, including local administrators and sysadmins in SQL Server?” I usually
laugh a little shake my head when I read a question like this because it demonstrates an complete lack of understanding of the power an administrator has over SQL Server. The simple answer is this:
If you don’t want your database schema to ever be accessed or known, don’t distribute your database.
Why do I say that? It’s really simple, an local administrator has access that allows to get access to a database despite your best efforts to prevent it. This is why it is so important to limit who is a local administrator on a SQL Server. As a local administrator, even if the Builtin\Administrators group has been removed from SQL Server, they have the ability to stop the SQL Servers, and start it in single user mode from the command line and gain sysadmin access to the system. A sysadmin has unlimited access to anything that exists inside of SQL Server, so even if you enable TDE to encrypt the database on disk, a knowledgeable administrator can still access the data contained in a database.
Is a Database Schema considered Intellectual Property?
From a legal standpoint, I don’t actually know, I tried to do some research online and maybe I wasn’t looking at the right keywords, but I got different results, and nothing conclusive. I don’t know if it falls under intellectual property laws or copywrite law, but legally or not, personally, I say yes, it is intellectual property. However, that being said should you really go to extents to try and hide your schema and code in a database? No, as I said above, if you absolutely feel that your database schema, or the code contained within it is so private that it can’t be seen by anyone, you shouldn’t distribute it, and you should only offer your product as a Software as a Service (SaaS) solution. That is the only guaranteed way to protect your database, its table schema, and code from being accessed.
The simple fact is that administrators have the access to get the database files, and that means that even if they don’t have access to SQL Server itself, they can make a copy of them and attach them to another server where they have sysadmin access to gain access to the database schema, and any objects contained in the database. Things like encrypting the data inside the database, make accessing it harder, but as I previously said, an administrator can start the SQL Service in single user mode and gain access to the instance as a sysadmin which gives the the access to export keys and certificates used for encryption, and ultimately they could gain access to the data if so inclined. Beyond that, if they can change the SQL Server Service Account, they can get access to your database.
If there is no way to prevent access how do you protect your Intellectual Property?
These days almost every major application out there has some form of database behind it, and those applications aren’t all SaaS, most are installed by the end users, with little or no assistance from the software vendor. How do they protect their intellectual property? Licensing agreements plain and simple. Every product you install has a license agreement. For those that didn’t realize this, its usually that second or third page that you check the box and click next without ever reading during the installation. Microsoft for example has numerous applications that rely on SQL Server for data storage, SCOM, SCCM, SCVMM, MOSS, LCS, yet they don’t make efforts to restrict access to their databases beyond reasonable security standards, and most other vendors don’t either.
So it begs the question why would you think you need more protection than the majority of other software products out there? Is what you have really that “special” and “unique” that it bears someone going trying to steal it and remarket it on their own? Its always the small ISV’s or independent developers that seem to think in this manner, and I can certainly see the concern if you are small time, but in reality your efforts would best be spent creating an effective licensing agreement, rather than trying to figure out how to do the impossible.
The other alternative is to build your application/product as a cloud based solution, or offer it as a SaaS which keeps you in control of your data model and prevents prying eyes. I actually think that at some point as the cloud computing initiative continues to mature, we’ll start to see more and more applications moving that direction, especially in scenarios where the access to intellectual property is a concern. However, this can be a double edged sword, and could be a driving factor for why a potential customer doesn’t purchase your product. Data is important to businesses, and the ability to leverage the data from one system with the data from another one for BI solutions is becoming more and more in demand. Having a SaaS solution or cloud based solution that limits access to data is a significant limitation that could affect purchasing decisions when weighed against alternative solutions that have an openly published database schema.
Some things people try and why they don’t really offer protection
Over the last few years I’ve seen some really interesting ideas about how people might protect their database from a sysadmin, but there is always a hole that can be poked into them:
Encrypt Database Objects using WITH ENCRYPTION
I’ve seen databases where vendors encrypt every stored procedure in the database. That’s fine and dandy, and I don’t have a problem with it. Locks keep honest people honest, but they don’t keep someone determined to get in out. The WITH ENCRYPTION option inside of SQL Server isn’t going to stop someone from gaining access to the code in a stored procedure. It only takes a quick Google/Bing search and they will have access to numerous resources for decrypting the stored procedures.
The topic of encrypting stored procedures in SQL was recently discussed on twitter as a means of preventing customers from making unsupportable changes to the database code. I can certainly see the purpose behind that, and as commented by Brent Ozar (blog/twitter) on Jeremiah Peschka’s (blog/twitter) blog post “Encrypted Stored Procedures and their Effect on my Rug”, Quest software encrypts their code to minimize support costs. This is pretty weak, but I can certainly see how it would be just enough of a hassle that your average SQL person wouldn’t mess with it.
Use SQLCLR for everything and Obfuscate the Code at Compile Time
This one is really just a slight step above the use of WITH ENCRYPTION for irritating, but breakable. Notwithstanding the fact that putting everything in SQLCLR is just a bad idea, performance will be slower since data access is much faster using the tool designed for it TSQL, you haven’t gained a whole lot if all your SQLCLR code is doing is issuing TSQL against the database. SQL Trace and Profiler can easily trap everyone of your queries and beyond that, the plan cache in SQL 2005/2008 can be queried to get the statements from the plans themselves.
Beyond that, its really easy for someone to reverse engineer your SQLCLR Assembly using .NET Reflector, and the SQL 2005 Browser Addin. Even if you obfuscate the code, the TSQL statements are string fields that can’t be obfuscated, so they will be easily readable in text once decompiled by Reflector. The code obfuscation is really only going to slow down a determined person. Reflector can export all the code in the class files, and a Visual Studio solution that allows someone to navigate through the objects and methods to determine what calls what.
Install a SQL Express Instance and lock it down for use by only your application
If you’re database is under 4GB (now 10GB in SQL Express 2008 R2), this almost seems viable, but you won’t get past a Local Administrator since they can still gain access to the instance and/or database.
Encrypt/Decrypt all data in the Application
This will certainly make it harder for someone to access the data, but it also has a significant performance penalty with it as well since the database engine has to perform any lookups using an encrypted value. You’ve also increased the time it might take to perform support for a problem with the database since you have to decrypt all the data before you can begin troubleshooting an issue. With the application responsible for all encryption/decryption, you’ve protected the data from view, but made the application the weak link in the chain, and it can now be focused on to determine the algorithm being used and what the secret for that algorithm is. A persistent person with knowledge of a debugger can find these out with enough time.
Obfuscate Table and Column Names in the Database
I’ve seen this mentioned on the forums at least twice, but the idea is that you wouldn’t name a table EMPLOYEES, you’d name it YLEMESOPE, and then have similarly useless obfuscated names like MRFTISEAN for FIRSTNAME, and NESMATLA for LASTNAME. My question both times was, how exactly do you plan to be able to do ongoing development in a database named like that? The people always seem to think that they can manage it. I have to be honest, this one along with the last one, would probably make it so ridiculously hard to mess with that few people would actually go through the effort of trying to figure it out, but someone may have more time on their hands than I think. I’d hate to have to be the person who tries to support that kind of an application though.
If you consider your database to be so proprietary that it can’t be seen by anyone else, your only way to ensure that this occurs is to never distribute your database.