Maximum date from sql table using SparkSQL query t

2019-08-16 02:20发布

问题:

This question already has an answer here:

  • How to select the first row of each group? 9 answers

I have a table in remote SQL database.

CUSTOMERID  ACCOUNTNO   VEHICLENUMBER   TAGSTARTEFFDATE
20000000    10003014    MH43AJ411       2013-06-07 13:07:13.210 
20000001    10003014    MH43AJ411       2014-08-08.19:10:11.519
20029961    10003019    GJ15CD7387      2016-07-28 19:21:54.173
20009020    10003019    GJ15CD7387      2016-05-25 18:46:55.947
20001866    10003019    GJ15CD7387      2015-07-11 15:17:14.503
20001557    10003019    GJ15CB9601      2016-05-05 16:45:58.247
20001223    10003019    GJ15CA7837      2014-06-06 14:57:42.583
20000933    10003019    MH02DG7774      2014-02-12 13:49:31.427
20001690    10003019    GJ15CD7387      2015-01-03 16:12:59.000
20000008    10003019    GJ15CB727       2013-06-17 12:36:01.190
20001865    10003019    GJ15CD7387      2015-06-24 15:01:14.000
20000005    10003019    GJ15CD7387      2013-06-15 12:29:10.000

I want to export as JSON and this is the code snippet.

val query = "SELECT ACCOUNTNO, collect_set(struct(`VEHICLENUMBER`, `CUSTOMERID`, `TAGSTARTEFFDATE`)) as VEHICLE FROM tp_customer_account GROUP BY ACCOUNTNO ORDER BY ACCOUNTNO" 

 jdbcDF.registerTempTable("tp_customer_account")
 val res00 = sqlContext.sql(query.toString) 
// res00.show(10)
 res00.coalesce(1).write.json("D:/res15")  

Issue: But here the problem is that I am getting multiple VEHICLENUMBER because more than one TAGSTARTEFFDATE along with the same VEHICLENUMBER is present in the table.

Want to achieve: So I want to retrieve the TAGSTARTEFFDATE which is maximum date for the same VEHICLENUMBER. I want to use SparkSQL query using SQLContext as I have given in the code snippet.

I want to write the query in SparkSQL and the table should be look like this given below. here the TAGSTARTEFFDATE is MAX(TAGSTARTEFFDATE) from the above table.

CUSTOMERID  ACCOUNTNO   VEHICLENUMBER   TAGSTARTEFFDATE
20000001    10003014    MH43AJ411       2014-08-08.19:10:11.519
20029961    10003019    GJ15CD7387      2016-07-28 19:21:54.173
20001557    10003019    GJ15CB9601      2016-05-05 16:45:58.247
20001223    10003019    GJ15CA7837      2014-06-06 14:57:42.583
20000933    10003019    MH02DG7774      2014-02-12 13:49:31.427
20000008    10003019    GJ15CB727       2013-06-17 12:36:01.190

I have written this query but it is failing and throwing error:

val query = "SELECT ACCOUNTNO, VEHICLE.VEHICLENUMBER, VEHICLE.CUSTOMERID, max(VEHICLE.TAGSTARTEFFDATE) FROM (SELECT ACCOUNTNO, collect_set(struct(`VEHICLENUMBER`,`CUSTOMERID`,`TAGSTARTEFFDATE`)) as VEHICLE FROM tp_customer_account GROUP BY ACCOUNTNO, VEHICLE.VEHICLENUMBER, VEHICLE.CUSTOMERID, VEHICLE.TAGSTARTEFFDATE ORDER BY ACCOUNTNO)"

=======================UPDATED=============================================

I ran this query:

SELECT ACCOUNTNO,collect_set(struct(`VEHICLENUMBER`, `CUSTOMERID`, `TAGSTARTEFFDATE`)) as VEHICLE FROM (select t1.VEHICLENUMBER, MAX(t1.TAGSTARTEFFDATE) as TAGSTARTEFFDATE, t1.CUSTOMERID, t1.ACCOUNTNO FROM tp_customer_account AS t1 group by VEHICLENUMBER,TAGSTARTEFFDATE,CUSTOMERID,ACCOUNTNO) as tbl1 GROUP BY ACCOUNTNO ORDER BY ACCOUNTNO"

Getting this multiple same VEHICLENUMBER

{"ACCOUNTNO":10003014,"VEHICLE":[{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"TAGSTARTEFFDATE":"2013-12-18T12:14:49.237+05:30"},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"TAGSTARTEFFDATE":"2014-08-08T14:24:12.227+05:30"},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000000,"TAGSTARTEFFDATE":"2013-06-07T13:07:13.210+05:30"}]}
{"ACCOUNTNO":10003015,"VEHICLE":[{"VEHICLENUMBER":"MH12GZ3392","CUSTOMERID":20000002,"TAGSTARTEFFDATE":"2013-06-07T13:17:11.550+05:30"}]}
{"ACCOUNTNO":10003016,"VEHICLE":[{"VEHICLENUMBER":"GJ15Z8173","CUSTOMERID":20000003,"TAGSTARTEFFDATE":"2013-06-07T14:46:06.963+05:30"}]}
{"ACCOUNTNO":10003018,"VEHICLE":[{"VEHICLENUMBER":"MH05AM902","CUSTOMERID":20000004,"TAGSTARTEFFDATE":"2013-06-13T13:15:24.423+05:30"}]}

I should not get same VEHICLENUMBER under one ACCOUNTNO. The link doesn't contain collect_set

val query = "SELECT ACCOUNTNO, collect_set(struct(`VEHICLENUMBER`, `CUSTOMERID`, `TAGSTARTEFFDATE`)) as VEHICLE FROM (SELECT *, max(`TAGSTARTEFFDATE`) OVER (PARTITION BY VEHICLENUMBER ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS MAXTAGSTARTEFFDATE  FROM tp_customer_account) tmp  WHERE TAGSTARTEFFDATE=MAXTAGSTARTEFFDATE GROUP BY ACCOUNTNO ORDER BY ACCOUNTNO"

========================== UPADATE #2 =================================

I ran this query:

SELECT ACCOUNTNO, collect_set(struct(VEHICLENUMBER, CUSTOMERID,ACCOUNTGROUPID,PREPAIDACCOUNTSTATUSID,PREPAIDACCOUNTSTATUSDATE,SOURCEOFENTRY,REVENUECATEGORYID,VEHICLECLASS,SERIALNO,HEXTAGID,TAGSTATUS,TAGSTARTEFFDATE,TAGENDEFFDATE,ISTAGBLACKLISTED,ISBLACKLISTHOLD,RCVERIFICATIONSTATUS,EMAILADDRESS,PHONENUMBER,CREATEDDATE,CREATEDUSER,UPDATEDDATE,UPDATEDUSER,ACTION,ISFEEWAIVER,FEEWAIVERPASSTYPE,VEHICLEIMGVERIFICATIONSTATUS,TAGTID,ISREVENUERECHARGE)) as VEHICLE FROM (SELECT *, max(TAGSTARTEFFDATE) OVER (PARTITION BY VEHICLENUMBER ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS MAXTAGSTARTEFFDATE FROM tp_customer_account) tmp WHERE TAGSTARTEFFDATE=MAXTAGSTARTEFFDATE GROUP BY ACCOUNTNO ORDER BY ACCOUNTNO

Still I am getting duplicate VEHICLENUMBER

{"ACCOUNTNO":10003014,"VEHICLE":[{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"ACCOUNTGROUPID":15,"PREPAIDACCOUNTSTATUSID":2080,"PREPAIDACCOUNTSTATUSDATE":"2015-02-13T02:20:00.610+05:30","SOURCEOFENTRY":"RegularRetailer","REVENUECATEGORYID":75,"VEHICLECLASS":"4","SERIALNO":"206158433290","HEXTAGID":"91890704803000000C0A","TAGSTATUS":"ASSIGNED","TAGSTARTEFFDATE":"2014-08-08T14:24:12.227+05:30","TAGENDEFFDATE":"2039-08-08T23:59:59.000+05:30","ISTAGBLACKLISTED":true,"ISBLACKLISTHOLD":false,"EMAILADDRESS":"shankarn75@rediffmail.com                                                                                                                             ","PHONENUMBER":"9004419178     ","CREATEDDATE":"2013-06-07T12:56:16.650+05:30","CREATEDUSER":"bhagwadapos","UPDATEDDATE":"2015-02-13T02:20:00.177+05:30","UPDATEDUSER":"BatchProcess","ACTION":"UPDATE"},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"ACCOUNTGROUPID":15,"PREPAIDACCOUNTSTATUSID":2079,"PREPAIDACCOUNTSTATUSDATE":"2013-06-07T12:55:23.793+05:30","SOURCEOFENTRY":"RegularRetailer","REVENUECATEGORYID":75,"VEHICLECLASS":"4","SERIALNO":"206158433290","HEXTAGID":"91890704803000000C0A","TAGSTATUS":"ASSIGNED","TAGSTARTEFFDATE":"2014-08-08T14:24:12.227+05:30","TAGENDEFFDATE":"2039-08-08T23:59:59.000+05:30","ISTAGBLACKLISTED":true,"ISBLACKLISTHOLD":false,"EMAILADDRESS":"shankarn75@rediffmail.com                                                                                                                             ","PHONENUMBER":"9004419178     ","CREATEDDATE":"2013-06-07T12:56:16.650+05:30","CREATEDUSER":"bhagwadapos","UPDATEDDATE":"2014-08-16T11:19:20.710+05:30","UPDATEDUSER":"Trippost","ACTION":"UPDATE"},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"ACCOUNTGROUPID":15,"PREPAIDACCOUNTSTATUSID":2079,"PREPAIDACCOUNTSTATUSDATE":"2013-06-07T12:55:23.793+05:30","SOURCEOFENTRY":"RegularRetailer","REVENUECATEGORYID":75,"VEHICLECLASS":"4","SERIALNO":"206158433290","HEXTAGID":"91890704803000000C0A","TAGSTATUS":"ASSIGNED","TAGSTARTEFFDATE":"2014-08-08T14:24:12.227+05:30","TAGENDEFFDATE":"2039-08-08T23:59:59.000+05:30","ISTAGBLACKLISTED":false,"ISBLACKLISTHOLD":false,"EMAILADDRESS":"shankarn75@rediffmail.com                                                                                                                             ","PHONENUMBER":"9004419178     ","CREATEDDATE":"2013-06-07T12:56:16.650+05:30","CREATEDUSER":"bhagwadapos","UPDATEDDATE":"2014-08-08T14:34:12.137+05:30","UPDATEDUSER":"khalapurpos","ACTION":"UPDATE"},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"ACCOUNTGROUPID":15,"PREPAIDACCOUNTSTATUSID":3079,"PREPAIDACCOUNTSTATUSDATE":"2015-09-16T14:58:27.500+05:30","SOURCEOFENTRY":"RegularRetailer","REVENUECATEGORYID":75,"VEHICLECLASS":"4","SERIALNO":"206158433290","HEXTAGID":"91890704803000000C0A","TAGSTATUS":"TAGINACTIVE","TAGSTARTEFFDATE":"2014-08-08T14:24:12.227+05:30","TAGENDEFFDATE":"2015-09-16T15:21:42.437+05:30","ISTAGBLACKLISTED":true,"ISBLACKLISTHOLD":false,"EMAILADDRESS":"shankarn75@rediffmail.com                                                                                                                             ","PHONENUMBER":"9004419178     ","CREATEDDATE":"2013-06-07T12:56:16.650+05:30","CREATEDUSER":"bhagwadapos","UPDATEDDATE":"2015-09-16T15:06:42.437+05:30","UPDATEDUSER":"BLTagProcess","ACTION":"UPDATE"},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"ACCOUNTGROUPID":15,"PREPAIDACCOUNTSTATUSID":3079,"PREPAIDACCOUNTSTATUSDATE":"2015-09-16T14:58:27.500+05:30","SOURCEOFENTRY":"RegularRetailer","REVENUECATEGORYID":75,"VEHICLECLASS":"4","SERIALNO":"206158433290","HEXTAGID":"91890704803000000C0A","TAGSTATUS":"TAGINACTIVE","TAGSTARTEFFDATE":"2014-08-08T14:24:12.227+05:30","TAGENDEFFDATE":"2039-08-08T23:59:59.000+05:30","ISTAGBLACKLISTED":true,"ISBLACKLISTHOLD":false,"EMAILADDRESS":"shankarn75@rediffmail.com                                                                                                                             ","PHONENUMBER":"9004419178     ","CREATEDDATE":"2013-06-07T12:56:16.650+05:30","CREATEDUSER":"bhagwadapos","UPDATEDDATE":"2015-09-16T14:58:33.190+05:30","UPDATEDUSER":"BatchProcess","ACTION":"UPDATE"},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"ACCOUNTGROUPID":15,"PREPAIDACCOUNTSTATUSID":2080,"PREPAIDACCOUNTSTATUSDATE":"2015-02-13T02:20:00.610+05:30","SOURCEOFENTRY":"RegularRetailer","REVENUECATEGORYID":75,"VEHICLECLASS":"4","SERIALNO":"206158433290","HEXTAGID":"91890704803000000C0A","TAGSTATUS":"TAGINACTIVE","TAGSTARTEFFDATE":"2014-08-08T14:24:12.227+05:30","TAGENDEFFDATE":"2039-08-08T23:59:59.000+05:30","ISTAGBLACKLISTED":true,"ISBLACKLISTHOLD":false,"EMAILADDRESS":"shankarn75@rediffmail.com                                                                                                                             ","PHONENUMBER":"9004419178     ","CREATEDDATE":"2013-06-07T12:56:16.650+05:30","CREATEDUSER":"bhagwadapos","UPDATEDDATE":"2015-09-16T14:58:33.120+05:30","UPDATEDUSER":"BatchProcess","ACTION":"UPDATE"}]}
{"ACCOUNTNO":10003015,"VEHICLE":[{"VEHICLENUMBER":"MH12GZ3392","CUSTOMERID":20000002,"ACCOUNTGROUPID":16,"PREPAIDACCOUNTSTATUSID":2079,"PREPAIDACCOUNTSTATUSDATE":"2013-06-07T13:14:13.903+05:30","SOURCEOFENTRY":"RegularRetailer","REVENUECATEGORYID":75,"VEHICLECLASS":"4","SERIALNO":"137438955875","HEXTAGID":"91890704802000000963","TAGSTATUS":"Assigned","TAGSTARTEFFDATE":"2013-06-07T13:17:11.550+05:30","TAGENDEFFDATE":"2018-06-06T23:59:59.997+05:30","ISTAGBLACKLISTED":false,"ISBLACKLISTHOLD":false,"EMAILADDRESS":"hiteshmpatil@gmail.com                                                                                                                                ","PHONENUMBER":"9823131243     ","CREATEDDATE":"2013-06-07T13:15:29.337+05:30","CREATEDUSER":"bhagwadapos","UPDATEDDATE":"2013-06-07T13:15:29.337+05:30","UPDATEDUSER":"bhagwadapos","ACTION":"INSERT"}]}

THIS IS THE EXACT BASE TABLE(TOP 10) IN REMOTE SQL SERVER WHERE FROM I AM RETRIEVING THE DATA

CUSTOMERID  ACCOUNTNO   ACCOUNTGROUPID  PREPAIDACCOUNTSTATUSID  PREPAIDACCOUNTSTATUSDATE    SOURCEOFENTRY   REVENUECATEGORYID   VEHICLENUMBER   VEHICLECLASS    SERIALNO    HEXTAGID    TAGSTATUS   TAGSTARTEFFDATE TAGENDEFFDATE   ISTAGBLACKLISTED    ISBLACKLISTHOLD RCVERIFICATIONSTATUS    EMAILADDRESS    PHONENUMBER CREATEDDATE CREATEDUSER UPDATEDDATE UPDATEDUSER ISFEEWAIVER FEEWAIVERPASSTYPE   VEHICLEIMGVERIFICATIONSTATUS    TAGTID  ISREVENUERECHARGE
20000000    10003014    15  2131    2013-06-07 12:55:23.793 RegularRetailer 75  MH43AJ411   5   137438955873    91890704802000000961    Lost    2013-06-07 13:07:13.210 2013-10-08 17:00:14.327 1   0   NULL    shankarn75@rediffmail.com                                                                                                                               9004419178      2013-06-07 12:56:16.650 bhagwadapos 2013-12-18 12:12:35.060 vadodarapos NULL    NULL    NULL        NULL
20000001    10003014    15  3079    2015-09-16 14:58:27.500 RegularRetailer 75  MH43AJ411   4   206158433290    91890704803000000C0A    TAGINACTIVE 2014-08-08 14:24:12.227 2015-09-16 15:21:42.437 1   0   NULL    shankarn75@rediffmail.com                                                                                                                               9004419178      2013-06-07 12:56:16.650 bhagwadapos 2015-09-16 15:06:42.437 BLTagProcess    NULL    NULL    NULL        NULL
20000002    10003015    16  2079    2013-06-07 13:14:13.903 RegularRetailer 75  MH12GZ3392  4   137438955875    91890704802000000963    Assigned    2013-06-07 13:17:11.550 2018-06-06 23:59:59.997 0   0   NULL    hiteshmpatil@gmail.com                                                                                                                                  9823131243      2013-06-07 13:15:29.337 bhagwadapos 2013-06-07 13:15:29.337 bhagwadapos NULL    NULL    NULL    918907048020000009630000    NULL
20000003    10003016    17  2131    2014-11-24 02:30:01.487 RegularRetailer 75  GJ15Z8173   9   137438955877    91890704802000000965    TAGINACTIVE 2013-06-07 14:46:06.963 2014-11-24 02:52:09.930 1   0   NULL    bhagwada.irb@gmail.com                                                                                                                                  8652836666      2013-06-07 14:31:15.717 bhagwadapos 2014-11-24 02:37:09.930 BLTagProcess    NULL    NULL    NULL    918907048020000009650000    NULL
20000004    10003018    19  2131    2014-11-24 02:30:01.720 RegularRetailer 75  MH05AM902   11  137438955473    918907048020000007D1    TAGINACTIVE 2013-06-13 13:15:24.423 2014-11-24 02:51:54.447 1   0   NULL    kelkar.suhas@gmail.com                                                                                                                                  9821032045      2013-06-13 12:50:33.777 charotipos  2014-11-24 02:36:54.447 BLTagProcess    NULL    NULL    NULL    918907048020000007D10000    NULL
20029961    10003019    20  2079    2016-07-28 16:27:20.360 Internal    75  GJ15CD7387  4   68719511515 918907048010000087DB    ASSIGNED    2016-07-28 19:21:54.173 2041-07-28 23:59:59.000 0   NULL    2083    kaviwala@desaiconstruction.com                                                                                                                          9879110770      2016-07-28 16:27:20.357 280603  2017-02-07 17:24:53.553 HUSSAIN 0       NULL    e20034120132C1FFEEC13A57    NULL
20009020    10003019    20  2079    2016-05-25 18:22:45.860 Internal    75  GJ15CF7747  4   68719486473 91890704801000002609    ASSIGNED    2016-05-25 18:46:55.947 2041-05-25 23:59:59.000 0   NULL    2083    kaviwala@desaiconstruction.com                                                                                                                          9879110770      2016-05-25 18:22:54.647 263858  2017-02-27 11:35:19.237 HUSSAIN NULL    NULL    NULL    e2003412016FC2FFEE51BD73    NULL
20001866    10003019    20  2079    2015-03-09 10:32:28.597 Internal    75  GJ15CD7657  4   137438959222    91890704802000001676    ASSIGNED    2015-07-11 15:17:14.503 2040-07-11 23:59:59.000 0   NULL    2083    kaviwala@desaiconstruction.com                                                                                                                          9879110770      2015-03-09 10:33:11.983 263858  2017-02-07 17:23:12.017 HUSSAIN NULL    NULL    NULL    918907048020000016760000    NULL
20001557    10003019    20  2079    2014-10-01 18:22:21.747 Internal    75  GJ15CB9601  4   68719479744 91890704801000000BC0    ASSIGNED    2016-05-05 16:45:58.247 2041-05-05 23:59:59.000 0   NULL    2083    kaviwala@desaiconstruction.com                                                                                                                          9879110770      2014-10-01 18:33:57.733 263858  2017-02-27 11:35:14.427 HUSSAIN NULL    NULL    NULL    e20034120158C2FFEE5D1597    NULL
20001223    10003019    20  2079    2014-06-06 14:52:24.810 AgentPOS    75  GJ15CA7837  4   137438956220    91890704802000000ABC    ASSIGNED    2014-06-06 14:57:42.583 2039-06-06 23:59:59.000 0   NULL    2083    kaviwala@desaiconstruction.com                                                                                                                          9879110770      2014-06-06 15:00:38.650 bhagwadapos 2017-02-27 11:35:11.100 HUSSAIN NULL    NULL    NULL    91890704802000000ABC0000    NULL

回答1:

Your query is incorrect. You can try this:

SELECT ACCOUNTNO, collect_set(struct(`VEHICLENUMBER`, `CUSTOMERID`, `TAGSTARTEFFDATE`)) as VEHICLE FROM (select t1.VEHICLENUMBER, MAX(t1.TAGSTARTEFFDATE) as TAGSTARTEFFDATE, MAX(t1.CUSTOMERID) as CUSTOMERID, MAX(ACCOUNTNO) as ACCOUNTNO from tp_customer_account as t1 group by VEHICLENUMBER) as tbl1 GROUP BY ACCOUNTNO ORDER BY ACCOUNTNO