I am building a REST API for a service to query a MongoDB database. Initially, I went the standard route of providing "/user/1" to search for user id 1, etc. As I got further into the project, other developers started asking if we could add boolean search capabilities, such as being able to do "and", "not" and "or". Given the amount of work needed to create a DSL for this, I thought about simply having the REST API accept a MongoDB query JSON object, like so (pretend this is passed via POST):
/query/{"$or": [{"user": "1"}, {"user": "2"}]}
Now, before I pass that query to MongoDB, I will do the following:
- Validate the JSON object
- Make sure the string is used only in the query function, not update, runCommand, or aggregation
- Verify that there is no $where clause in the query, since that allows script execution
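Checks 1 and 3 could be sketched in Python roughly as follows. This is only an illustration under assumptions: the names parse_query, contains_forbidden_operator and FORBIDDEN_OPERATORS are hypothetical, and the operator set is a starting point rather than a complete list ($function, available from MongoDB 4.4, can also run JavaScript inside $expr).

```python
import json
from typing import Any

# Operators that execute server-side JavaScript. $where is the one named in
# the question; $function (MongoDB 4.4+) can also run JavaScript via $expr.
FORBIDDEN_OPERATORS = {"$where", "$function"}


def contains_forbidden_operator(node: Any) -> bool:
    """Recursively search every key of the parsed query for a forbidden operator."""
    if isinstance(node, dict):
        return any(
            key in FORBIDDEN_OPERATORS or contains_forbidden_operator(value)
            for key, value in node.items()
        )
    if isinstance(node, list):
        return any(contains_forbidden_operator(item) for item in node)
    return False


def parse_query(body: str) -> dict:
    """Check 1: parse and validate the JSON; check 3: reject script operators."""
    try:
        query = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(query, dict):
        raise ValueError("the query must be a JSON object")
    if contains_forbidden_operator(query):
        raise ValueError("the query contains a forbidden operator")
    return query
```

For example, parse_query('{"$or": [{"user": "1"}, {"user": "2"}]}') returns the filter unchanged, which would then only ever be handed to the query function (check 2), while a body containing $where anywhere, even nested inside $or, raises ValueError.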
Would doing this be enough to prevent injection? Reading the MongoDB FAQ, it appears that passing JSON into the query operation is harmless, since you cannot run any JavaScript with it (with the exception of $where). Is this a safe approach to take?
As you already note, the nature of JSON parsing means that MongoDB is not open to the same type of "scripting" injection attacks that are possible with an API that lets SQL pass through it.
For your point 2, the common-sense approach is to expose only certain operations as endpoints, such as query or update, and basically require authentication for the operations performed by the client. That way you never expose potentially dangerous operations through the API.
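A minimal Python sketch of exposing only chosen operations, assuming a pymongo-style collection object with a find() method; the class and endpoint names are hypothetical. Authentication is left out here, and an update endpoint could be added to the table in the same way if you need one.

```python
from typing import Any, Callable, Dict


class QueryOnlyAPI:
    """Maps endpoint names to the only database operations the API may run."""

    def __init__(self, collection: Any) -> None:
        # `collection` is assumed to behave like a pymongo Collection.
        self._collection = collection
        self._endpoints: Dict[str, Callable[[dict], Any]] = {
            "query": self._query,
        }

    def _query(self, filter_doc: dict) -> list:
        # Only find() is ever called: no update, runCommand or aggregate.
        return list(self._collection.find(filter_doc))

    def handle(self, endpoint: str, filter_doc: dict) -> Any:
        handler = self._endpoints.get(endpoint)
        if handler is None:
            raise PermissionError(f"operation {endpoint!r} is not exposed")
        return handler(filter_doc)
```

Because the client only ever supplies a filter document and an endpoint name from a fixed table, there is no path from a request to update, runCommand or aggregate.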
Also, there is general authentication and roles to consider. You would only allow the API to perform the actions permitted by its presented "role". That protects you some more without necessarily needing to check this in your code, or at most you just trap the error from an "unauthorized" operation.
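As a sketch of the role idea, the API's own database account could be created with MongoDB's built-in read role, so the server itself refuses writes from it. This assumes pymongo, where Database.command() adds keyword arguments to the command document; the function name is hypothetical, and a write attempted by this account would typically surface as pymongo.errors.OperationFailure, which is the error you would trap.

```python
from typing import Any


def create_read_only_api_user(db: Any, username: str, password: str) -> Any:
    """Create a user whose only privileges on `db` come from the built-in "read" role.

    `db` is assumed to be a pymongo Database; this must be run by an account
    that is itself allowed to create users.
    """
    return db.command(
        "createUser",
        username,
        pwd=password,
        roles=[{"role": "read", "db": db.name}],
    )
```

The REST API would then connect with this account's credentials, so even a bug in your own checks cannot turn a query into a write.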
Finally, for point 3: as a possible alternative to checking for the presence of the $where operator in a provided query (though the limitations on what it can do improve with each version), you can actually turn server-side scripting off entirely using the --noscripting option.
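If you rely on --noscripting, the API could verify at startup that the server really runs with it. A Python sketch, assuming the reply of the getCmdLineOpts admin command (for example client.admin.command("getCmdLineOpts") in pymongo) and assuming the flag shows up in its parsed options as security.javascriptEnabled: false, the configuration-file equivalent of --noscripting; the function name is hypothetical.

```python
from typing import Any, Mapping


def server_scripting_disabled(cmd_line_opts: Mapping[str, Any]) -> bool:
    """Return True if mongod's parsed options show server-side JavaScript is off."""
    # Assumption: --noscripting appears as {"security": {"javascriptEnabled": False}}.
    security = cmd_line_opts.get("parsed", {}).get("security", {})
    # JavaScript is enabled by default, so a missing setting means it is on.
    return security.get("javascriptEnabled", True) is False
```

If this returns False, the API could refuse to start, or fall back to checking queries for $where.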
So there really are quite a few protective measures you can take to avoid "script injection" attacks, but generally speaking the same sort of dangers as with SQL do not exist.
In general, the whole approach depends on a few more things than the first answer here mentions, but where security is concerned, the best approach is to be paranoid.
Implementing database queries securely in a REST API is a tough job, but I think you have already made a good start:
- table names are not exposed
- queries are reduced to the most important parameters rather than complete query code.
What you should still consider and implement:
- field names like user should only be accepted on the server side if they exist in a predefined array of allowed fields.
- the allowed fields could vary based on the user calling the REST API, their API key, or a user group of API users.
- you should use HTTPS to avoid man-in-the-middle attacks. Even read-only data can cause harm if it is manipulated on the way to the client. Also, test your HTTPS setup on a benchmark site, e.g. https://www.htbridge.com/ssl/; just having a certificate does not make it really secure.
- for some operations you could require one-time hashes for additional security.
- the number of requests per API user could be limited, e.g. to 20 or 100 per hour (as low as possible).
- keywords like field names and SQL-like keywords such as or, where, etc. should be mapped, so that they are either compared to an array of allowed expressions as explained above, or, in the JSON, they could even have different names that are mapped on the server side to the real query expressions. So on the client side a field is perhaps called user, but on the server side, in the database table, the field is called username, and this field name is never exposed to the clients.
- check for every API user whether they are allowed to see the requested data. For example, some API users might only be allowed to see certain user groups in the data table, but not all of them.
- never rely on any client data; validate and control everything, even though the set of choices accepted on the server side is already very limited in your example.
- if where-clauses are translated into JSON at a granular level, you can allow them too; it's only raw strings that are hard to validate. Mapping all fields as advised above just makes a bit more work, because you have to keep the documentation complete and correct even though the field names API users see may differ from the real ones on the server side.
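The field allowlist per API key from the first two bullets could be sketched in Python like this. The keys, field names and the ALLOWED_FIELDS table are hypothetical. Because only listed fields and the logical operators $and, $or and $nor are accepted, $where and any other operator is rejected without needing a separate blocklist.

```python
from typing import Any, Dict, Set

# Hypothetical allowlists: the fields each API key may query on.
ALLOWED_FIELDS: Dict[str, Set[str]] = {
    "key-basic": {"user"},
    "key-admin": {"user", "email", "group"},
}

# Logical operators that may combine sub-queries; other "$" keys are rejected.
LOGICAL_OPERATORS = {"$and", "$or", "$nor"}


def fields_allowed(query: Any, allowed: Set[str]) -> bool:
    """Return True only if every field named in `query` is in `allowed`."""
    if not isinstance(query, dict):
        return False
    for key, value in query.items():
        if key in LOGICAL_OPERATORS:
            # $and/$or/$nor take a list of sub-queries; check each of them.
            if not isinstance(value, list) or not all(
                fields_allowed(sub, allowed) for sub in value
            ):
                return False
        elif key not in allowed:
            # Unknown fields and any other operator, such as $where, fail here.
            return False
    return True


def query_allowed_for_key(query: Any, api_key: str) -> bool:
    """Look up the allowlist for `api_key`; unknown keys are refused."""
    allowed = ALLOWED_FIELDS.get(api_key)
    return allowed is not None and fields_allowed(query, allowed)
```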
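The answer does not define "one-time hashes"; one common reading is single-use tokens that the server issues and accepts exactly once within a short time. A minimal in-memory Python sketch under that assumption (the class name and the 5-minute default are hypothetical; several API instances would need a shared store instead of a dict):

```python
import secrets
import time
from typing import Dict, Optional


class OneTimeTokens:
    """Issue random tokens that can be redeemed exactly once before they expire."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self._ttl = ttl_seconds
        self._expiry: Dict[str, float] = {}  # token -> expiry timestamp

    def issue(self, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        token = secrets.token_urlsafe(32)
        self._expiry[token] = now + self._ttl
        return token

    def redeem(self, token: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # pop() removes the token, so redeeming the same token twice fails.
        expiry = self._expiry.pop(token, None)
        return expiry is not None and now <= expiry
```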
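Limiting requests per API user can be sketched as a fixed-window counter. This is a minimal in-memory illustration in Python with hypothetical names; the limit is a parameter so it can be set to 20, 100, or whatever is as low as possible for your users.

```python
import time
from typing import Dict, Optional, Tuple


class HourlyRateLimiter:
    """Fixed-window limiter: at most `limit` requests per user per window."""

    def __init__(self, limit: int = 100, window_seconds: float = 3600.0) -> None:
        self._limit = limit
        self._window = window_seconds
        # user -> (start of current window, requests counted in it)
        self._counters: Dict[str, Tuple[float, int]] = {}

    def allow(self, user: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        start, count = self._counters.get(user, (now, 0))
        if now - start >= self._window:
            # The previous window has ended; start a fresh one.
            start, count = now, 0
        if count >= self._limit:
            return False
        self._counters[user] = (start, count + 1)
        return True
```

A fixed window is the simplest option; it allows short bursts at window boundaries, which a sliding window or token bucket would smooth out.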
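Mapping client-facing keywords to real query expressions, so the client says user and or while the database sees username and $or, might look like this in Python. The FIELD_MAP and OPERATOR_MAP tables are hypothetical examples; anything not in them is rejected, so raw $ operators never reach MongoDB.

```python
from typing import Any, Dict

# Hypothetical public names mapped to the real names used in the database.
FIELD_MAP: Dict[str, str] = {"user": "username"}
OPERATOR_MAP: Dict[str, str] = {"and": "$and", "or": "$or", "nor": "$nor"}


def translate(query: Any) -> Dict[str, Any]:
    """Rewrite a client query so that it only uses the server's internal names."""
    if not isinstance(query, dict):
        raise ValueError("expected a query object")
    result: Dict[str, Any] = {}
    for key, value in query.items():
        if key in OPERATOR_MAP:
            if not isinstance(value, list):
                raise ValueError(f"{key!r} expects a list of sub-queries")
            result[OPERATOR_MAP[key]] = [translate(sub) for sub in value]
        elif key in FIELD_MAP:
            # Field values are passed through as-is in this sketch.
            result[FIELD_MAP[key]] = value
        else:
            # Unmapped names, including raw "$" operators, are rejected.
            raise ValueError(f"unknown field or operator {key!r}")
    return result
```

For example, {"or": [{"user": "1"}, {"user": "2"}]} becomes {"$or": [{"username": "1"}, {"username": "2"}]}, and the name username never appears in the public API.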
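Restricting which data an API user may see can be enforced on the server by combining the client's filter with a rule the client cannot change. A Python sketch, assuming a hypothetical group field on each document and a list of groups the API user is allowed to see:

```python
from typing import Any, Dict, List


def restrict_to_groups(client_filter: Dict[str, Any], allowed_groups: List[str]) -> Dict[str, Any]:
    """Combine the client's filter with a server-side visibility rule.

    `group` is a hypothetical field holding each document's user group.
    """
    return {"$and": [client_filter, {"group": {"$in": list(allowed_groups)}}]}
```

Since $and requires both parts to match, the client's filter can only narrow the result set, never widen it beyond the allowed groups.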
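One concrete reason never to trust client data with MongoDB is operator injection: if a client sends {"user": {"$ne": null}} where a plain value was expected, the filter matches every document whose user field is set. A Python sketch that accepts only plain scalar values (the function name is hypothetical). It only checks values, so field names still need the allowlist, and it also blocks legitimate comparisons such as $gt, which granular, mapped where-clauses are better suited to offer.

```python
from typing import Any

# Plain JSON value types accepted in filters; objects and arrays are excluded.
SCALAR_TYPES = (str, int, float, bool, type(None))
LOGICAL_OPERATORS = {"$and", "$or", "$nor"}


def values_are_scalars(query: Any) -> bool:
    """Return True if every field value in `query` is a plain scalar."""
    if not isinstance(query, dict):
        return False
    for key, value in query.items():
        if key in LOGICAL_OPERATORS:
            if not isinstance(value, list) or not all(
                values_are_scalars(sub) for sub in value
            ):
                return False
        elif not isinstance(value, SCALAR_TYPES):
            # e.g. {"$ne": None} smuggled in where a plain value was expected
            return False
    return True
```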
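Where-clauses translated granularly into JSON could take the form of a list of {field, op, value} conditions that the server turns into a real filter. A Python sketch in which the condition format, FIELD_MAP and COMPARISONS are hypothetical:

```python
from typing import Any, Dict, List

# Hypothetical public field names and the comparison operators we accept.
FIELD_MAP: Dict[str, str] = {"user": "username", "age": "age"}
COMPARISONS: Dict[str, str] = {"eq": "$eq", "ne": "$ne", "gt": "$gt", "lt": "$lt"}


def build_filter(conditions: List[Any]) -> Dict[str, Any]:
    """Turn [{"field": ..., "op": ..., "value": ...}, ...] into a filter joined by $and."""
    if not isinstance(conditions, list) or not conditions:
        # MongoDB rejects an empty $and, so require at least one condition.
        raise ValueError("expected a non-empty list of conditions")
    clauses = []
    for cond in conditions:
        if not isinstance(cond, dict):
            raise ValueError("each condition must be an object")
        name, op_name, value = cond.get("field"), cond.get("op"), cond.get("value")
        if not isinstance(name, str) or name not in FIELD_MAP:
            raise ValueError(f"unknown field {name!r}")
        if not isinstance(op_name, str) or op_name not in COMPARISONS:
            raise ValueError(f"unsupported operator {op_name!r}")
        if not isinstance(value, (str, int, float, bool)):
            raise ValueError("condition values must be plain scalars")
        clauses.append({FIELD_MAP[name]: {COMPARISONS[op_name]: value}})
    return {"$and": clauses}
```

Every piece of the resulting filter comes from a server-side table or a validated scalar, so nothing the client sends is passed to MongoDB as query syntax.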