Part 2: Forms—A better way
Part 3: Defiant—A step toward unification
What's Missing In Form Security
One day, a professor came to me with an interesting question: "Corey, my server is getting hacked! Can you figure out how?" I asked, "Can you tell me exactly what is happening?" "Yes," he replied. "Somehow, people are uploading files to my server and trying to hack me." And with that, I delved into the world of form handling (a solved problem, or so I had assumed) and was appalled at what I found.
This professor was neither unintelligent nor a shoddy programmer. What he was not, however, was an expert in web security, and therein lies the problem that this post is about. It's not about SQL injection. It's not about using the newest tools or the most recent library. It's about understanding how form submissions can be your Achilles' heel. Like the professor, you are probably not a web security expert. He was doing everything right according to the docs and numerous tutorials for his particular framework, but that information was wrong. The scariest part is that the framework you are using is very likely getting it wrong, too.
I will outline the specific areas that I feel any form-processing code should address, after which I will propose my solution:
- CSRF Tokens
- Input Validation, Client and Server Side
- Form Integrity
- Form Injection
- Form Expiration
- Tool Composition
CSRF (Cross-Site Request Forgery) tokens are the one measure that most developers and frameworks do implement, yet usually only in a minimal way. For CSRF protection, a unique token is generated along with the form, and that token is associated with the user on the back end. When the form is submitted, the token is validated to confirm that it was generated for this user; if validation fails, the form is not processed. This is the functionality that most frameworks provide. Depending on the framework's implementation, tokens may be revoked, and several tokens may be valid for a given user at the same time.
On the surface, this sounds great! The problem lies in how the system can be abused. There is often no back-end correlation between a token and the form it was generated for. In some systems, abuse may be as simple as taking a token that was generated for the user in a valid form (a search form, perhaps?) and grafting it into a different, malicious form. Because the token validates, the malicious submission is processed.
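To make the difference concrete, here is a minimal sketch in TypeScript (Node.js) of a token bound to a specific form rather than only to the user. It assumes a hypothetical server-side SECRET, a session identifier, and a form identifier such as "search" or "change-password"; these names are illustrative, not any framework's API. Because the form identifier is part of the HMAC input, a token harvested from the search form does not validate when grafted into another form.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical per-process secret; a real deployment would load a stable one from configuration.
const SECRET = randomBytes(32);

// Token bound to both the session and the form it is embedded in.
// (A naive token computed from sessionId alone would validate in every form.)
// The NUL separator keeps ("ab", "c") and ("a", "bc") from producing the same input.
function formToken(sessionId: string, formId: string): string {
  return createHmac("sha256", SECRET)
    .update(`${sessionId}\u0000${formId}`)
    .digest("hex");
}

// formId must come from the route handling the request, never from the request body.
function verifyFormToken(sessionId: string, formId: string, token: string): boolean {
  const expected = Buffer.from(formToken(sessionId, formId), "hex");
  const given = Buffer.from(token, "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return expected.length === given.length && timingSafeEqual(expected, given);
}
```

The server should take the form identifier from the handler it is running, not from the submitted data; otherwise an attacker could simply submit the identifier that matches a harvested token.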
CSRF tokens are indeed a good first step, but they offer minimal protection when used as the sole measure of form security.
Input Validation, Client and Server Side
There is an axiom that good developers must repeatedly quote to bad developers and project managers: "Never trust user input."
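If a person can copy and paste a Mark Twain novel from Project Gutenberg into your Title field that is "limited" to 60 characters, then your code must catch it. If you have a single-select country drop-down list, and a user can edit the form into a multi-select with the values "Pluto" and "Neptune", then your code must catch that, too. If it doesn't, then you are not doing your job. Client-side limits, such as a maxlength attribute or a single-select list, improve the user experience, but anyone can bypass them by editing the page or crafting the request directly, so the server must enforce every rule again.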
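A minimal server-side check for the two examples above might look like the following sketch. The field names title and country, the 60-character limit, the small country allow-list, and the Submitted type (a parsed body in which repeated fields arrive as arrays) are illustrative assumptions. The point is that the server re-checks the length and rejects anything that is not exactly one of the values it offered.

```typescript
// Hypothetical limit and allow-list for illustration.
const TITLE_MAX = 60;
const COUNTRIES: ReadonlySet<string> = new Set(["US", "CA", "MX"]);

// Hypothetical parsed-body shape: repeated fields may arrive as arrays.
type Submitted = Record<string, string | string[] | undefined>;

// Returns a list of error messages; an empty list means the submission is valid.
function validateSubmission(body: Submitted): string[] {
  const errors: string[] = [];

  const title = body["title"];
  if (typeof title !== "string" || title.trim() === "") {
    errors.push("title: exactly one non-empty value is required");
  } else if (title.length > TITLE_MAX) {
    // length counts UTF-16 code units, which is close enough for a sketch.
    errors.push(`title: at most ${TITLE_MAX} characters allowed`);
  }

  const country = body["country"];
  // A single-select must arrive as one string; an array means the form was edited.
  if (typeof country !== "string") {
    errors.push("country: exactly one value is required");
  } else if (!COUNTRIES.has(country)) {
    errors.push("country: not one of the offered options");
  }

  return errors;
}
```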
Form Integrity is an area that many frameworks do not address. The idea is this: "I (the framework) just gave you (the user) a form containing X, Y, and Z. When you return the form, does it still contain X, Y, and Z?" Drupal, I feel, gets this mostly right. Its solution is to store a copy of the generated form so that, when the form is submitted, the stored copy can be retrieved and compared against the submission. The biggest drawback of this approach is that the framework must store A LOT of forms, and on a busy site your backing store (probably a database) is going to balloon. Even Drupal, which only keeps cached forms for 6 hours, can have gigabytes of storage dedicated to this one table (although Drupal 8 has mitigated this problem somewhat). I will propose a solution to this problem at the end of this blog post.
If you are not ensuring the integrity of the form, however, then you are playing Russian roulette with form submissions. You cannot verify that the information you sent to the browser is the same information that came back, and you cannot verify that the form has not been maliciously tampered with. Granted, this may not matter for some forms (the aforementioned search form, for example), but for others it can have unexpected consequences unless your code explicitly guards against tampering.
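The store-and-compare approach described above can be sketched as follows. This is not Drupal's code; it is a hypothetical in-memory version in which a Map stands in for the database table, rememberForm records what was rendered, and checkIntegrity compares a submission against that record. The RenderedField shape and the one-submission-per-form rule are assumptions of this sketch. Every rendered form occupies an entry until it is submitted, which is exactly the storage cost discussed above.

```typescript
import { randomUUID } from "node:crypto";

// One field as it was rendered. `allowed` lists the only values the server
// offered (hidden inputs, selects); null means free-form input.
interface RenderedField {
  name: string;
  allowed: string[] | null;
}

// Hypothetical in-memory store standing in for a form-cache table.
const renderedForms = new Map<string, RenderedField[]>();

// Called when a form is generated: remember exactly what was sent to the browser.
function rememberForm(fields: RenderedField[]): string {
  const buildId = randomUUID();
  renderedForms.set(buildId, fields);
  return buildId; // embedded in the form as a hidden field
}

// Called on submission: compare what came back with what was sent.
function checkIntegrity(buildId: string, body: Record<string, string | undefined>): string[] {
  const fields = renderedForms.get(buildId);
  if (fields === undefined) return ["unknown or expired form"];
  renderedForms.delete(buildId); // design choice of this sketch: one submission per rendered form
  const problems: string[] = [];
  for (const field of fields) {
    const value = body[field.name];
    if (value === undefined) {
      problems.push(`${field.name}: missing`);
    } else if (field.allowed !== null && !field.allowed.includes(value)) {
      problems.push(`${field.name}: value was not offered`);
    }
  }
  return problems;
}
```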
Form Injection is almost the reverse of Form Integrity: Form Integrity protects against malicious deletions from your form, whereas Form Injection is the malicious addition of elements to it. One example is an attack in which a form element is duplicated within the same form. In this particular attack, the code signaled that the first copy of the element had validated, but the duplicate then replaced the validated value with non-validating input. Because the validation signal had already been given, the form was accepted with invalid input. Admittedly, this attack targeted one particular validation strategy, but it underscores the necessity of protecting against form injection.
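One defensive measure against duplicated elements is to refuse any submission in which a field name appears more than once, before validation even begins. The sketch below assumes an application/x-www-form-urlencoded body available as a raw string and treats every field as single-valued; parseSingleValued is a hypothetical helper, and a form with checkbox groups or multi-selects would need to allow those specific names explicitly.

```typescript
// Parse a urlencoded body and reject the whole submission if any field name
// repeats, so a later duplicate can never replace an already-validated value.
function parseSingleValued(rawBody: string): Map<string, string> | null {
  const params = new URLSearchParams(rawBody);
  const result = new Map<string, string>();
  for (const [name, value] of params) {
    if (result.has(name)) {
      return null; // duplicated element: treat as an attack, not as input
    }
    result.set(name, value);
  }
  return result;
}
```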
Form injection can also be used to exhaust a server's available memory. This type of attack relies on the code keeping a copy of the form submission throughout page generation. If page generation takes a non-trivial amount of time, and many forms are submitted at once, each with large POST contents, it is very easy to overwhelm a smaller server. For this reason, I advocate immediately deleting any unexpected form element and, depending on the situation, perhaps even non-validating form elements whose size exceeds a certain threshold.
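The pruning described above might look like this sketch. The Validator type, the expected-field map, and the 1024-character threshold are hypothetical choices, not part of any framework. Unexpected elements are always dropped; invalid elements are dropped only when they are large, so that small mistakes can still be shown back to the user.

```typescript
// Hypothetical per-field validator.
type Validator = (value: string) => boolean;

// Hypothetical size above which an invalid value is not worth keeping around.
const MAX_RETAINED_INVALID = 1024;

function pruneSubmission(
  body: ReadonlyMap<string, string>,
  expected: ReadonlyMap<string, Validator>,
): { kept: Map<string, string>; dropped: string[] } {
  const kept = new Map<string, string>();
  const dropped: string[] = [];
  for (const [name, value] of body) {
    const validator = expected.get(name);
    if (validator === undefined) {
      dropped.push(name); // unexpected element: discard immediately
    } else if (!validator(value) && value.length > MAX_RETAINED_INVALID) {
      dropped.push(name); // invalid and large: do not hold it in memory
    } else {
      kept.set(name, value); // valid, or small enough to echo back with an error
    }
  }
  return { kept, dropped };
}
```

Pruning only saves memory if the caller then releases its reference to the original parsed body and works exclusively with kept.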
Forms should expire. The lifetime may vary depending on the needs of the application for that form, but expiring forms should be the rule, not the exception. It is possible to implement expiration by invalidating the CSRF token discussed earlier, but not every framework supports this, and it is a rather blunt tool. If forms share the same token, then a form generated 5 minutes ago will be invalidated at the same time as a form generated 5 hours ago. A better system would invalidate each form individually, but if you rely on the CSRF token to provide this, then you will have a storage problem similar to Drupal's form cache (although slower-growing).
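Per-form expiration does not necessarily require per-form storage. One possible approach, sketched here under stated assumptions and not presented as the solution this post builds toward, is to embed the issue time in the form along with an HMAC over the form identifier and that time, then check the age against a lifetime chosen for each form. The secret, the form identifiers, and the lifetimes below are hypothetical, and a real implementation would also bind the stamp to the user, as a CSRF token does.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

const SECRET = randomBytes(32); // hypothetical server secret

// Hypothetical per-form lifetimes in milliseconds.
const LIFETIME_MS: Record<string, number | undefined> = {
  search: 5 * 60 * 60 * 1000, // 5 hours
  "change-password": 5 * 60 * 1000, // 5 minutes
};

function sign(data: string): string {
  return createHmac("sha256", SECRET).update(data).digest("hex");
}

// Issue a stamp of the form "<issuedAtMs>.<hmac>" bound to one form.
function issueStamp(formId: string, now: number = Date.now()): string {
  const issuedAt = String(now);
  return `${issuedAt}.${sign(`${formId}\u0000${issuedAt}`)}`;
}

// Accept the stamp only if it is authentic, belongs to this form, and is younger than its lifetime.
function isStampValid(formId: string, stamp: string, now: number = Date.now()): boolean {
  const lifetime = LIFETIME_MS[formId];
  const parts = stamp.split(".");
  if (lifetime === undefined || parts.length !== 2) return false;
  const [issuedAt, mac] = parts;
  const expected = Buffer.from(sign(`${formId}\u0000${issuedAt}`), "hex");
  const given = Buffer.from(mac, "hex");
  if (expected.length !== given.length || !timingSafeEqual(expected, given)) return false;
  const age = now - Number(issuedAt);
  return age >= 0 && age <= lifetime;
}
```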
Tool Composition refers to the way in which a programmer combines different tools and libraries to get the job done. Over the last few years, there has been a large push to write small, single-purpose libraries so that developers can assemble them as they wish. This is often likened to the Linux command-line tools, where you can pipe together several small commands to produce custom, complex functionality. This is great in so many ways, provided you know how to use the tools and all of their intricacies. The problem is that web security requires more than a glance through a man page: you must actually understand the implications of the choices that you make!
This was exactly the type of security hole that the professor (mentioned at the beginning of this post) had run into. He had written a Node.js site. He had used Express. He had added middleware for processing form POSTs. He had written code to handle validated submissions. He just didn't realize that the POST-processing middleware was automatically saving file uploads to the /tmp directory, and that it was his code's responsibility to delete those files when they were not needed, even on a search form, a login form, or any other form that did not include a file upload at all.
You could say that this is a combination of the Form Injection and Tool Composition issues, and you would be right. The Node.js middleware was doing its job. It did not know what information the form needed, because form validation was not part of its job. The form code did not know that the middleware had saved files to disk, and the programmer certainly did not realize that he had to make this check in every single form! All told, this is a perfect attack pattern, and I have seen it in several other large frameworks. (NOTE: Obviously, the files could not be executed, but they could definitely fill up the disk, which would then cause the database to stop responding, logs to stop being written, and the server to potentially crash. Upon reboot, the /tmp directory would be emptied by the OS, making post-reboot analysis difficult or impossible.)
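A handler can close this particular hole by deleting every temporary upload it did not ask for, on every form. The sketch below does not use any specific middleware's API: UploadedFile is a hypothetical shape describing the field name a file arrived under and where the middleware wrote it, and discardUnexpectedUploads is a hypothetical helper. A search or login form would pass an empty set of expected file fields, so every upload is deleted.

```typescript
import { promises as fs } from "node:fs";

// Hypothetical shape of what upload-parsing middleware hands to a route.
interface UploadedFile {
  fieldName: string; // form field the file arrived under
  path: string; // where the middleware wrote it on disk
}

// Delete every temporary upload this particular form did not ask for.
// Returns the files that were kept so the handler can process them.
async function discardUnexpectedUploads(
  files: UploadedFile[],
  expectedFileFields: ReadonlySet<string>,
): Promise<UploadedFile[]> {
  const kept: UploadedFile[] = [];
  for (const file of files) {
    if (expectedFileFields.has(file.fieldName)) {
      kept.push(file);
    } else {
      // force: true ignores files that are already gone.
      await fs.rm(file.path, { force: true });
    }
  }
  return kept;
}
```

The handler remains responsible for deleting the files it kept once it has finished with them.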
All of these things make me ask: Why isn't there a better way?