This question already has an answer here:
- How to select the first row of each group? 9 answers
I have a table in remote SQL database.
CUSTOMERID ACCOUNTNO VEHICLENUMBER TAGSTARTEFFDATE
20000000 10003014 MH43AJ411 2013-06-07 13:07:13.210
20000001 10003014 MH43AJ411 2014-08-08.19:10:11.519
20029961 10003019 GJ15CD7387 2016-07-28 19:21:54.173
20009020 10003019 GJ15CD7387 2016-05-25 18:46:55.947
20001866 10003019 GJ15CD7387 2015-07-11 15:17:14.503
20001557 10003019 GJ15CB9601 2016-05-05 16:45:58.247
20001223 10003019 GJ15CA7837 2014-06-06 14:57:42.583
20000933 10003019 MH02DG7774 2014-02-12 13:49:31.427
20001690 10003019 GJ15CD7387 2015-01-03 16:12:59.000
20000008 10003019 GJ15CB727 2013-06-17 12:36:01.190
20001865 10003019 GJ15CD7387 2015-06-24 15:01:14.000
20000005 10003019 GJ15CD7387 2013-06-15 12:29:10.000
I want to export as JSON and this is the code snippet.
val query = "SELECT ACCOUNTNO, collect_set(struct(`VEHICLENUMBER`, `CUSTOMERID`, `TAGSTARTEFFDATE`)) as VEHICLE FROM tp_customer_account GROUP BY ACCOUNTNO ORDER BY ACCOUNTNO"
jdbcDF.registerTempTable("tp_customer_account")
val res00 = sqlContext.sql(query.toString)
// res00.show(10)
res00.coalesce(1).write.json("D:/res15")
Issue:
But here the problem is that I am getting multiple VEHICLENUMBER
because more than one TAGSTARTEFFDATE
along with the same VEHICLENUMBER
is present in the table.
Want to achieve:
So I want to retrieve the TAGSTARTEFFDATE
which is maximum date for the same VEHICLENUMBER
. I want to use SparkSQL query using SQLContext as I have given in the code snippet.
I want to write the query in SparkSQL and the table should be look like this given below. here the TAGSTARTEFFDATE
is MAX(TAGSTARTEFFDATE)
from the above table.
CUSTOMERID ACCOUNTNO VEHICLENUMBER TAGSTARTEFFDATE
20000001 10003014 MH43AJ411 2014-08-08.19:10:11.519
20029961 10003019 GJ15CD7387 2016-07-28 19:21:54.173
20001557 10003019 GJ15CB9601 2016-05-05 16:45:58.247
20001223 10003019 GJ15CA7837 2014-06-06 14:57:42.583
20000933 10003019 MH02DG7774 2014-02-12 13:49:31.427
20000008 10003019 GJ15CB727 2013-06-17 12:36:01.190
I have written this query but it is failing and throwing error:
val query = "SELECT ACCOUNTNO, VEHICLE.VEHICLENUMBER, VEHICLE.CUSTOMERID, max(VEHICLE.TAGSTARTEFFDATE) FROM (SELECT ACCOUNTNO, collect_set(struct(`VEHICLENUMBER`,`CUSTOMERID`,`TAGSTARTEFFDATE`)) as VEHICLE FROM tp_customer_account GROUP BY ACCOUNTNO, VEHICLE.VEHICLENUMBER, VEHICLE.CUSTOMERID, VEHICLE.TAGSTARTEFFDATE ORDER BY ACCOUNTNO)"
=======================UPDATED=============================================
I ran this query:
SELECT ACCOUNTNO,collect_set(struct(`VEHICLENUMBER`, `CUSTOMERID`, `TAGSTARTEFFDATE`)) as VEHICLE FROM (select t1.VEHICLENUMBER, MAX(t1.TAGSTARTEFFDATE) as TAGSTARTEFFDATE, t1.CUSTOMERID, t1.ACCOUNTNO FROM tp_customer_account AS t1 group by VEHICLENUMBER,TAGSTARTEFFDATE,CUSTOMERID,ACCOUNTNO) as tbl1 GROUP BY ACCOUNTNO ORDER BY ACCOUNTNO"
Getting this multiple same VEHICLENUMBER
{"ACCOUNTNO":10003014,"VEHICLE":[{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"TAGSTARTEFFDATE":"2013-12-18T12:14:49.237+05:30"},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"TAGSTARTEFFDATE":"2014-08-08T14:24:12.227+05:30"},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000000,"TAGSTARTEFFDATE":"2013-06-07T13:07:13.210+05:30"}]}
{"ACCOUNTNO":10003015,"VEHICLE":[{"VEHICLENUMBER":"MH12GZ3392","CUSTOMERID":20000002,"TAGSTARTEFFDATE":"2013-06-07T13:17:11.550+05:30"}]}
{"ACCOUNTNO":10003016,"VEHICLE":[{"VEHICLENUMBER":"GJ15Z8173","CUSTOMERID":20000003,"TAGSTARTEFFDATE":"2013-06-07T14:46:06.963+05:30"}]}
{"ACCOUNTNO":10003018,"VEHICLE":[{"VEHICLENUMBER":"MH05AM902","CUSTOMERID":20000004,"TAGSTARTEFFDATE":"2013-06-13T13:15:24.423+05:30"}]}
I should not get same VEHICLENUMBER
under one ACCOUNTNO
.
The link doesn't contain collect_set
val query = "SELECT ACCOUNTNO, collect_set(struct(`VEHICLENUMBER`, `CUSTOMERID`, `TAGSTARTEFFDATE`)) as VEHICLE FROM (SELECT *, max(`TAGSTARTEFFDATE`) OVER (PARTITION BY VEHICLENUMBER ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS MAXTAGSTARTEFFDATE FROM tp_customer_account) tmp WHERE TAGSTARTEFFDATE=MAXTAGSTARTEFFDATE GROUP BY ACCOUNTNO ORDER BY ACCOUNTNO"
========================== UPADATE #2 =================================
I ran this query:
SELECT ACCOUNTNO, collect_set(struct(VEHICLENUMBER, CUSTOMERID,ACCOUNTGROUPID,PREPAIDACCOUNTSTATUSID,PREPAIDACCOUNTSTATUSDATE,SOURCEOFENTRY,REVENUECATEGORYID,VEHICLECLASS,SERIALNO,HEXTAGID,TAGSTATUS,TAGSTARTEFFDATE,TAGENDEFFDATE,ISTAGBLACKLISTED,ISBLACKLISTHOLD,RCVERIFICATIONSTATUS,EMAILADDRESS,PHONENUMBER,CREATEDDATE,CREATEDUSER,UPDATEDDATE,UPDATEDUSER,ACTION,ISFEEWAIVER,FEEWAIVERPASSTYPE,VEHICLEIMGVERIFICATIONSTATUS,TAGTID,ISREVENUERECHARGE)) as VEHICLE FROM (SELECT *, max(TAGSTARTEFFDATE) OVER (PARTITION BY VEHICLENUMBER ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS MAXTAGSTARTEFFDATE FROM tp_customer_account) tmp WHERE TAGSTARTEFFDATE=MAXTAGSTARTEFFDATE GROUP BY ACCOUNTNO ORDER BY ACCOUNTNO
Still I am getting duplicate VEHICLENUMBER
{"ACCOUNTNO":10003014,"VEHICLE":[{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"ACCOUNTGROUPID":15,"PREPAIDACCOUNTSTATUSID":2080,"PREPAIDACCOUNTSTATUSDATE":"2015-02-13T02:20:00.610+05:30","SOURCEOFENTRY":"RegularRetailer","REVENUECATEGORYID":75,"VEHICLECLASS":"4","SERIALNO":"206158433290","HEXTAGID":"91890704803000000C0A","TAGSTATUS":"ASSIGNED","TAGSTARTEFFDATE":"2014-08-08T14:24:12.227+05:30","TAGENDEFFDATE":"2039-08-08T23:59:59.000+05:30","ISTAGBLACKLISTED":true,"ISBLACKLISTHOLD":false,"EMAILADDRESS":"shankarn75@rediffmail.com ","PHONENUMBER":"9004419178 ","CREATEDDATE":"2013-06-07T12:56:16.650+05:30","CREATEDUSER":"bhagwadapos","UPDATEDDATE":"2015-02-13T02:20:00.177+05:30","UPDATEDUSER":"BatchProcess","ACTION":"UPDATE"},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"ACCOUNTGROUPID":15,"PREPAIDACCOUNTSTATUSID":2079,"PREPAIDACCOUNTSTATUSDATE":"2013-06-07T12:55:23.793+05:30","SOURCEOFENTRY":"RegularRetailer","REVENUECATEGORYID":75,"VEHICLECLASS":"4","SERIALNO":"206158433290","HEXTAGID":"91890704803000000C0A","TAGSTATUS":"ASSIGNED","TAGSTARTEFFDATE":"2014-08-08T14:24:12.227+05:30","TAGENDEFFDATE":"2039-08-08T23:59:59.000+05:30","ISTAGBLACKLISTED":true,"ISBLACKLISTHOLD":false,"EMAILADDRESS":"shankarn75@rediffmail.com ","PHONENUMBER":"9004419178 ","CREATEDDATE":"2013-06-07T12:56:16.650+05:30","CREATEDUSER":"bhagwadapos","UPDATEDDATE":"2014-08-16T11:19:20.710+05:30","UPDATEDUSER":"Trippost","ACTION":"UPDATE"},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"ACCOUNTGROUPID":15,"PREPAIDACCOUNTSTATUSID":2079,"PREPAIDACCOUNTSTATUSDATE":"2013-06-07T12:55:23.793+05:30","SOURCEOFENTRY":"RegularRetailer","REVENUECATEGORYID":75,"VEHICLECLASS":"4","SERIALNO":"206158433290","HEXTAGID":"91890704803000000C0A","TAGSTATUS":"ASSIGNED","TAGSTARTEFFDATE":"2014-08-08T14:24:12.227+05:30","TAGENDEFFDATE":"2039-08-08T23:59:59.000+05:30","ISTAGBLACKLISTED":false,"ISBLACKLISTHOLD":false,"EMAILADDRESS":"shankarn75@rediffmail.com ","PHONENUMBER":"9004419178 ","CREATEDDATE":"2013-06-07T12:56:16.650+05:30","CREATEDUSER":"bhagwadapos","UPDATEDDATE":"2014-08-08T14:34:12.137+05:30","UPDATEDUSER":"khalapurpos","ACTION":"UPDATE"},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"ACCOUNTGROUPID":15,"PREPAIDACCOUNTSTATUSID":3079,"PREPAIDACCOUNTSTATUSDATE":"2015-09-16T14:58:27.500+05:30","SOURCEOFENTRY":"RegularRetailer","REVENUECATEGORYID":75,"VEHICLECLASS":"4","SERIALNO":"206158433290","HEXTAGID":"91890704803000000C0A","TAGSTATUS":"TAGINACTIVE","TAGSTARTEFFDATE":"2014-08-08T14:24:12.227+05:30","TAGENDEFFDATE":"2015-09-16T15:21:42.437+05:30","ISTAGBLACKLISTED":true,"ISBLACKLISTHOLD":false,"EMAILADDRESS":"shankarn75@rediffmail.com ","PHONENUMBER":"9004419178 ","CREATEDDATE":"2013-06-07T12:56:16.650+05:30","CREATEDUSER":"bhagwadapos","UPDATEDDATE":"2015-09-16T15:06:42.437+05:30","UPDATEDUSER":"BLTagProcess","ACTION":"UPDATE"},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"ACCOUNTGROUPID":15,"PREPAIDACCOUNTSTATUSID":3079,"PREPAIDACCOUNTSTATUSDATE":"2015-09-16T14:58:27.500+05:30","SOURCEOFENTRY":"RegularRetailer","REVENUECATEGORYID":75,"VEHICLECLASS":"4","SERIALNO":"206158433290","HEXTAGID":"91890704803000000C0A","TAGSTATUS":"TAGINACTIVE","TAGSTARTEFFDATE":"2014-08-08T14:24:12.227+05:30","TAGENDEFFDATE":"2039-08-08T23:59:59.000+05:30","ISTAGBLACKLISTED":true,"ISBLACKLISTHOLD":false,"EMAILADDRESS":"shankarn75@rediffmail.com ","PHONENUMBER":"9004419178 ","CREATEDDATE":"2013-06-07T12:56:16.650+05:30","CREATEDUSER":"bhagwadapos","UPDATEDDATE":"2015-09-16T14:58:33.190+05:30","UPDATEDUSER":"BatchProcess","ACTION":"UPDATE"},{"VEHICLENUMBER":"MH43AJ411","CUSTOMERID":20000001,"ACCOUNTGROUPID":15,"PREPAIDACCOUNTSTATUSID":2080,"PREPAIDACCOUNTSTATUSDATE":"2015-02-13T02:20:00.610+05:30","SOURCEOFENTRY":"RegularRetailer","REVENUECATEGORYID":75,"VEHICLECLASS":"4","SERIALNO":"206158433290","HEXTAGID":"91890704803000000C0A","TAGSTATUS":"TAGINACTIVE","TAGSTARTEFFDATE":"2014-08-08T14:24:12.227+05:30","TAGENDEFFDATE":"2039-08-08T23:59:59.000+05:30","ISTAGBLACKLISTED":true,"ISBLACKLISTHOLD":false,"EMAILADDRESS":"shankarn75@rediffmail.com ","PHONENUMBER":"9004419178 ","CREATEDDATE":"2013-06-07T12:56:16.650+05:30","CREATEDUSER":"bhagwadapos","UPDATEDDATE":"2015-09-16T14:58:33.120+05:30","UPDATEDUSER":"BatchProcess","ACTION":"UPDATE"}]}
{"ACCOUNTNO":10003015,"VEHICLE":[{"VEHICLENUMBER":"MH12GZ3392","CUSTOMERID":20000002,"ACCOUNTGROUPID":16,"PREPAIDACCOUNTSTATUSID":2079,"PREPAIDACCOUNTSTATUSDATE":"2013-06-07T13:14:13.903+05:30","SOURCEOFENTRY":"RegularRetailer","REVENUECATEGORYID":75,"VEHICLECLASS":"4","SERIALNO":"137438955875","HEXTAGID":"91890704802000000963","TAGSTATUS":"Assigned","TAGSTARTEFFDATE":"2013-06-07T13:17:11.550+05:30","TAGENDEFFDATE":"2018-06-06T23:59:59.997+05:30","ISTAGBLACKLISTED":false,"ISBLACKLISTHOLD":false,"EMAILADDRESS":"hiteshmpatil@gmail.com ","PHONENUMBER":"9823131243 ","CREATEDDATE":"2013-06-07T13:15:29.337+05:30","CREATEDUSER":"bhagwadapos","UPDATEDDATE":"2013-06-07T13:15:29.337+05:30","UPDATEDUSER":"bhagwadapos","ACTION":"INSERT"}]}
THIS IS THE EXACT BASE TABLE(TOP 10) IN REMOTE SQL SERVER WHERE FROM I AM RETRIEVING THE DATA
CUSTOMERID ACCOUNTNO ACCOUNTGROUPID PREPAIDACCOUNTSTATUSID PREPAIDACCOUNTSTATUSDATE SOURCEOFENTRY REVENUECATEGORYID VEHICLENUMBER VEHICLECLASS SERIALNO HEXTAGID TAGSTATUS TAGSTARTEFFDATE TAGENDEFFDATE ISTAGBLACKLISTED ISBLACKLISTHOLD RCVERIFICATIONSTATUS EMAILADDRESS PHONENUMBER CREATEDDATE CREATEDUSER UPDATEDDATE UPDATEDUSER ISFEEWAIVER FEEWAIVERPASSTYPE VEHICLEIMGVERIFICATIONSTATUS TAGTID ISREVENUERECHARGE
20000000 10003014 15 2131 2013-06-07 12:55:23.793 RegularRetailer 75 MH43AJ411 5 137438955873 91890704802000000961 Lost 2013-06-07 13:07:13.210 2013-10-08 17:00:14.327 1 0 NULL shankarn75@rediffmail.com 9004419178 2013-06-07 12:56:16.650 bhagwadapos 2013-12-18 12:12:35.060 vadodarapos NULL NULL NULL NULL
20000001 10003014 15 3079 2015-09-16 14:58:27.500 RegularRetailer 75 MH43AJ411 4 206158433290 91890704803000000C0A TAGINACTIVE 2014-08-08 14:24:12.227 2015-09-16 15:21:42.437 1 0 NULL shankarn75@rediffmail.com 9004419178 2013-06-07 12:56:16.650 bhagwadapos 2015-09-16 15:06:42.437 BLTagProcess NULL NULL NULL NULL
20000002 10003015 16 2079 2013-06-07 13:14:13.903 RegularRetailer 75 MH12GZ3392 4 137438955875 91890704802000000963 Assigned 2013-06-07 13:17:11.550 2018-06-06 23:59:59.997 0 0 NULL hiteshmpatil@gmail.com 9823131243 2013-06-07 13:15:29.337 bhagwadapos 2013-06-07 13:15:29.337 bhagwadapos NULL NULL NULL 918907048020000009630000 NULL
20000003 10003016 17 2131 2014-11-24 02:30:01.487 RegularRetailer 75 GJ15Z8173 9 137438955877 91890704802000000965 TAGINACTIVE 2013-06-07 14:46:06.963 2014-11-24 02:52:09.930 1 0 NULL bhagwada.irb@gmail.com 8652836666 2013-06-07 14:31:15.717 bhagwadapos 2014-11-24 02:37:09.930 BLTagProcess NULL NULL NULL 918907048020000009650000 NULL
20000004 10003018 19 2131 2014-11-24 02:30:01.720 RegularRetailer 75 MH05AM902 11 137438955473 918907048020000007D1 TAGINACTIVE 2013-06-13 13:15:24.423 2014-11-24 02:51:54.447 1 0 NULL kelkar.suhas@gmail.com 9821032045 2013-06-13 12:50:33.777 charotipos 2014-11-24 02:36:54.447 BLTagProcess NULL NULL NULL 918907048020000007D10000 NULL
20029961 10003019 20 2079 2016-07-28 16:27:20.360 Internal 75 GJ15CD7387 4 68719511515 918907048010000087DB ASSIGNED 2016-07-28 19:21:54.173 2041-07-28 23:59:59.000 0 NULL 2083 kaviwala@desaiconstruction.com 9879110770 2016-07-28 16:27:20.357 280603 2017-02-07 17:24:53.553 HUSSAIN 0 NULL e20034120132C1FFEEC13A57 NULL
20009020 10003019 20 2079 2016-05-25 18:22:45.860 Internal 75 GJ15CF7747 4 68719486473 91890704801000002609 ASSIGNED 2016-05-25 18:46:55.947 2041-05-25 23:59:59.000 0 NULL 2083 kaviwala@desaiconstruction.com 9879110770 2016-05-25 18:22:54.647 263858 2017-02-27 11:35:19.237 HUSSAIN NULL NULL NULL e2003412016FC2FFEE51BD73 NULL
20001866 10003019 20 2079 2015-03-09 10:32:28.597 Internal 75 GJ15CD7657 4 137438959222 91890704802000001676 ASSIGNED 2015-07-11 15:17:14.503 2040-07-11 23:59:59.000 0 NULL 2083 kaviwala@desaiconstruction.com 9879110770 2015-03-09 10:33:11.983 263858 2017-02-07 17:23:12.017 HUSSAIN NULL NULL NULL 918907048020000016760000 NULL
20001557 10003019 20 2079 2014-10-01 18:22:21.747 Internal 75 GJ15CB9601 4 68719479744 91890704801000000BC0 ASSIGNED 2016-05-05 16:45:58.247 2041-05-05 23:59:59.000 0 NULL 2083 kaviwala@desaiconstruction.com 9879110770 2014-10-01 18:33:57.733 263858 2017-02-27 11:35:14.427 HUSSAIN NULL NULL NULL e20034120158C2FFEE5D1597 NULL
20001223 10003019 20 2079 2014-06-06 14:52:24.810 AgentPOS 75 GJ15CA7837 4 137438956220 91890704802000000ABC ASSIGNED 2014-06-06 14:57:42.583 2039-06-06 23:59:59.000 0 NULL 2083 kaviwala@desaiconstruction.com 9879110770 2014-06-06 15:00:38.650 bhagwadapos 2017-02-27 11:35:11.100 HUSSAIN NULL NULL NULL 91890704802000000ABC0000 NULL