Ensure “Reasonable” queries only

2020-06-21 05:52发布

In our organization we have the need to let employees filter data in our web application by supplying WHERE clauses. It's worked great for a long time, but we occasionally run into users providing queries that require full table scans on large tables or inefficient joins, etc.

Some clown might write something like:

select * from big_table where
Name in (select name from some_table where name like '%search everything%')
or name in ('a', 'b', 'c')
or price < 20
or price > 40
or exists (select 1 from some_other_table where col1 + col2 + col3 = 4)
or exists (select 1 from table_a, table+b)

Obviously, this is not a great way to query these tables with computed values, non-indexed columns, lots of OR's and an unrestricted join on table_a and table_b.

But for a user, this may make total sense.

So what's the best way, if any, to allow internal users to supply a query to the database while ensuring that it won't lock a dozen tables and hang the webserver for 5 minutes?

I'm guessing that's a programmatic way in c#/sql-server to get the execution plan for a query before it runs. And if so, what factors contribute to cost? Estimated I/O cost? Estimated CPU cost? What would be reasonable limits at which to tell the user that his query's no good?

EDIT: We're a market research company. We have thousands of surveys, each with their own data. We have dozens of researchers that want to slice that data in arbitrary ways. We have tools to let them construct "valid" filters using a GUI, but some "power users" want to supply their own queries. I realize this isn't standard or best practice, but how else can I let dozens of users query tables for the rows they want using arbitrarily complex conditions and ever-changing conditions?

11条回答
做自己的国王
2楼-- · 2020-06-21 06:23

If you have so much data and you want to provide to your customers the ability to analyse and view the information as they want to, I strongly recommand to thing about OLAP technologies.

查看更多
萌系小妹纸
3楼-- · 2020-06-21 06:27

Instead of allowing employees to directly write (append to) queries, and then trying to calculate the query cost before running it, why not create some kind of Advanced Search or filter feature that is NOT writing SQL you cannot control?

查看更多
迷人小祖宗
4楼-- · 2020-06-21 06:34

You can try using the following:

SET SHOWPLAN_ALL ON
GO
SET FMTONLY ON
GO
<<< Your SQL code here >>>
GO
SET FMTONLY OFF
GO
SET SHOWPLAN_ALL OFF
GO

Then you can parse through what you've got. As to where to draw the line on various things, that's going to take some experience. There are some things to watch for, but nothing that is cut and dried. It's often more of an art to examine the query plans than a science.

As others have pointed out though, I think that your problem goes deeper than the technology implications. The fact that you let unqualified people access your database in such a way is the underlying problem. From past experience, I often see this in companies where they are too lazy or too inexperienced to properly capture their application's requirements. I'm not saying that this is necessarily the case with your corporate environment, but that's what I've seen.

查看更多
smile是对你的礼貌
5楼-- · 2020-06-21 06:36

I have worked a few places where this also came up. What we ended up doing was NOT allowing users unconstrained access, and promising to have IT do their best to provide queries when needed. The issue was that the database is fairly complicated, and even if users could write grammatically and syntactically correct SQL, they don't necessarily understand the relationships between the tables. In other words, even if they could write their own SQL they would get the wrong answers. We convinced the users that the risk of making the wrong decision based on a flawed or incomplete understanding of the 200 tables in the database was too high. Better to get the right answer after a day than the wrong one instantly.

The other part of this is what does IT do when user A writes a query and gets 1 answer, then user B writes what he thinks is the same query and gets a different answer? Is it IT's job to find the differences? To fix both pieces of SQL? etc. The bottom line is that I would not allow them access. I would load the system with predefined queries, as others have mentioned, and try to train mgmt why that is the only way it will work in the long run.

查看更多
够拽才男人
6楼-- · 2020-06-21 06:36

This is the reason that direct SELECT permission is almost never given to users in the vast majority of applications.

A far better approach would be to engineer your application around use cases so that you are able to cover a reasonable percentage of requirements with specifically designed filters/aggregation/layout options.

There are a myriad of ways to do this so some analysis of your specific problem domain will definitely be required together with research into viable methods.

Whilst direct SQL access is the most flexible for your users, long executing queries are likely to be just the start of your headaches. SQL injection is a big concern here, whether it's source is malicious or simply misguided.

查看更多
爱情/是我丢掉的垃圾
7楼-- · 2020-06-21 06:37

In very large Enterprise originations on internal application this is a common practice. Often during your design phase you will limit the criteria or put sensible limits on data ranges, but once the business gets hold of the app there will be calls from the business unit management to remove the restrictions. In my origination this is a management problem not an engineering issue.

What we did was profile all of the criteria and found the largest offenders, both users and what types of queries caused the most problems and put limitations on some of the queries. Also some very expensive queries that were used on a regular basis were added to the app and the app cached the results and ran the queries when load was low. We also created caned optimized queries for standard users and gave only specified users the ability to search for anything. Just a couple of ideas.

查看更多
登录 后发表回答