Query to display largest n% of values and group th

2020-04-01 06:13发布

问题:

I see someone posted something very similar to what I am looking to do in Access 2010.

Grouping of top 80% categories

I saw the response but am confused as to the nomenclature used and the titles assigned. I have 5 vendors that supply product. I only care about the top 80% as a pareto distribution and the remainder can be grouped as "Other"

4 vendors as Field [vendors]: A1, A2, A3, A4

4 values as Field [Lbs]: 4000, 5000, 200, 800

Query returns: A1, A2, Other

Thanks for any help anyone can provide.

回答1:

When performing calculations like this we need to be mindful of what happens when there is a tie, so let's use the following [VendorData]

vendors  lbs 
-------  ----
A1       2000
A2       3000
A3        200
A4        800
A5       2000
A6       2000

We can start by creating the following saved query named [VendorPct] in Access

SELECT 
    vendors, 
    lbs, 
    lbs_sum,
    lbs / lbs_sum * 100 AS lbs_pct
FROM
    (
        SELECT vendors, lbs, lbs_sum
        FROM
            VendorData,
            (
                SELECT Sum(lbs) AS lbs_sum FROM VendorData
            )
    )

It gives us

vendors  lbs   lbs_sum  lbs_pct
-------  ----  -------  -------
A1       2000    10000       20
A2       3000    10000       30
A3        200    10000        2
A4        800    10000        8
A5       2000    10000       20
A6       2000    10000       20

Now we can create a saved query in Access named [VendorPctCumulative]

SELECT 
    vendors, 
    Max(lbs) AS lbs_, 
    Max(lbs_pct) as lbs_pct_, 
    Sum(lbs_pct_other) AS lbs_pct_cumulative_
FROM
    (
            SELECT 
                vendors, 
                lbs, 
                lbs_pct, 
                lbs_pct AS lbs_pct_other 
            FROM VendorPct
        UNION ALL
            (
                SELECT 
                    v1.vendors, 
                    v1.lbs,
                    v1.lbs_pct,
                    v2.lbs_pct AS lbs_pct_other 
                FROM
                    VendorPct v1
                    INNER JOIN
                    VendorPct v2
                        ON (v2.lbs = v1.lbs AND v2.vendors < v1.vendors)
                            OR v2.lbs > v1.lbs
            )
    )
GROUP BY vendors
ORDER BY Sum(lbs_pct_other), vendors

That produces

vendors  lbs_  lbs_pct_  lbs_pct_cumulative_
-------  ----  --------  -------------------
A2       3000        30                   30
A1       2000        20                   50
A5       2000        20                   70
A6       2000        20                   90
A4        800         8                   98
A3        200         2                  100

Now one more saved query named [VendorPctCumulativeThreshold] to find the first cumulative percent that meets or exceeds the 80% threshold:

SELECT TOP 1 lbs_pct_cumulative_ 
FROM 
    (
        SELECT lbs_pct_cumulative_ 
        FROM VendorPctCumulative
        WHERE lbs_pct_cumulative_ >= 80
        ORDER BY lbs_pct_cumulative_
    )

i.e.,

lbs_pct_cumulative_
-------------------
                 90

and we can put it all together with

    SELECT 
        vendors, 
        lbs_ AS lbs, 
        lbs_pct_ AS lbs_pct
    FROM
        VendorPctCumulative vpc
        INNER JOIN
        VendorPctCumulativeThreshold vpct
            ON vpc.lbs_pct_cumulative_ <= vpct.lbs_pct_cumulative_
UNION ALL
    SELECT 
        "other" AS vendors, 
        Sum(lbs_) AS lbs, 
        Sum(lbs_pct_) AS lbs_pct
    FROM
        VendorPctCumulative vpc
        INNER JOIN
        VendorPctCumulativeThreshold vpct
            ON vpc.lbs_pct_cumulative_ > vpct.lbs_pct_cumulative_
ORDER BY 3 DESC, 1

producing

vendors  lbs   lbs_pct
-------  ----  -------
A2       3000       30
A1       2000       20
A5       2000       20
A6       2000       20
other    1000       10