I have a table with transactions:
Transactions
------------
id | account | type | date_time | amount
----------------------------------------------------
1 | 001 | 'R' | '2012-01-01 10:01:00' | 1000
2 | 003 | 'R' | '2012-01-02 12:53:10' | 1500
3 | 003 | 'A' | '2012-01-03 13:10:01' | -1500
4 | 002 | 'R' | '2012-01-03 17:56:00' | 2000
5 | 001 | 'R' | '2012-01-04 12:30:01' | 1000
6 | 002 | 'A' | '2012-01-04 13:23:01' | -2000
7 | 003 | 'R' | '2012-01-04 15:13:10' | 3000
8 | 003 | 'R' | '2012-01-05 12:12:00' | 1250
9 | 003 | 'A' | '2012-01-06 17:24:01' | -1250
and I wish to select all of certain type ('R'), but not those that immediatly (in order of the date_time field) have another transaction of another type ('A') for the same account filed...
So, the query should throw the following rows, given the previous example:
id | account |type | date | amount
----------------------------------------------------
1 | 001 | 'R' | '2012-01-01 10:01:00' | 1000
5 | 001 | 'R' | '2012-01-04 12:30:01' | 1000
7 | 003 | 'R' | '2012-01-04 15:13:10' | 3000
(As you can see, row 2 isn't displayed because row 3 'cancels' it... also row 4 is 'cancelled' by row 6'; Row 7 do appears (even though the account 003 belongs to cancelled row #2, this time in row 7 it's not cancelled by any 'A' row); And row 8 won't appear (it's too for 003 account since now this one is cancelled by 9, which doesn't cancels 7 too, just the previouse one: 8...
I have tried Joins, subqueries in Where clauses but I'm really not sure how do I must make my query...
What I have tried:
Trying joins:
SELECT trans.type as type,
trans.amount as amount,
trans.date_time as dt,
trans.account as acct,
FROM Transactions trans
INNER JOIN ( SELECT t.type AS type, t.acct AS acct, t.date_time AS date_time
FROM Transactions t
WHERE t.date_time > trans.date_time
ORDER BY t.date_time DESC
) AS nextTrans
ON nextTrans.acct = trans.acct
WHERE trans.type IN ('R')
AND nextTrans.type NOT IN ('A')
ORDER BY DATE(trans.date_time) ASC
This throws an error, since I can't introduce external values to the JOIN in MySQL.
Trying subquery in where:
SELECT trans.type as type,
trans.amount as amount,
trans.date_time as dt,
trans.account as acct,
FROM Transactions trans
WHERE trans.type IN ('R')
AND trans.datetime <
( SELECT t.date_time AS date_time
FROM Transactions t
WHERE t.account = trans.account
ORDER BY t.date_time DESC
) AS nextTrans
ON nextTrans.acct = trans.acct
ORDER BY DATE(trans.date_time) ASC
This is wrong, I can get to introduce external values to the WHERE in MySQL but I cannot manage to find the way to filter correctly for what I need...
IMPORTANT EDIT:
I managed to achieve a solution, but it now needs serious optimization. Here it is:
SELECT *
FROM (SELECT t1.*, tFlagged.id AS cancId, tFlagged.type AS cancFlag
FROM transactions t1
LEFT JOIN (SELECT t2.*
FROM transactions t2
ORDER BY t2.date_time ASC ) tFlagged
ON (t1.account=tFlagged.account
AND
t1.date_time < tFlagged.date_time)
WHERE t1.type = 'R'
GROUP BY t1.id) tCanc
WHERE tCanc.cancFlag IS NULL
OR tCanc.cancFlag <> 'A'
I joined the table with itself, just considering same account and great date_time. The Join goes ordered by date_time. Grouping by id I managed to get only the first result of the join, which happens to be the next transaction for the same account.
Then on the outer select, I filter out those that have an 'A', since that means that the next transaction was effectively a cancelation for it. In other words, if there is no next transaction for the same account or if the next transaction is an 'R', then it is not cancelled and it must be shown in the result...
I got this:
+----+---------+------+---------------------+--------+--------+----------+
| id | account | type | date_time | amount | cancId | cancFlag |
+----+---------+------+---------------------+--------+--------+----------+
| 1 | 001 | R | 2012-01-01 10:01:00 | 1000 | 5 | R |
| 5 | 001 | R | 2012-01-04 12:30:01 | 1000 | NULL | NULL |
| 7 | 003 | R | 2012-01-04 15:13:10 | 3000 | 8 | R |
+----+---------+------+---------------------+--------+--------+----------+
It relates each transaction with the next one in time for the same account and then filters out those that have been cancelled... Success!!
As I said, the problem now is optimization. My real data has a lot of rows (as a table holding transactions through time is expected to have), and for a table of ~10,000 rows right now, I got a positive result with this query in 1min.44sec. I suppose that's the thing with joins... (For those who know the protocol in here, what should I do? launch a new question here and post this as a solution to this one? Or just wait for more answers here?)
here I have tried in MSSQL. Please check logic and try in mysql. I assume the logic is that make new transaction after first transaction cancel. In your illustration, id = 7 is made after id=3 canceled.
I have checked in mssql
(edited 2 ) TRY THIS:
If ID really does correspond to the index of the rows when sorted on date_time - as it does in the example (and if it didn't, you could create such an ID field) - you can do this:
i.e. use the ID field to get each row and its successor in the same result row; then filter for the properties you want.
Here is a solution based on nested subqueries. First, I added a few rows to catch a few more cases. Transaction 10, for example, should not be cancelled by transaction 12, because transaction 11 comes in between.
First, create a query to grab, for each transaction, "the date of the most recent transaction before that one in the same account":
Use that as a subquery to get each transaction and its predecessor on the same row. Use some filtering to pull out the transactions we're interested in - namely, 'A' transactions whose predecessors are 'R' transactions that they exactly cancel out -
From the result above it's apparent we're almost there - we've identified the unwanted transactions. Using
LEFT JOIN
we can filter these out of the whole transaction set:To improve your query try this:
It'll run much faster... hopefully providing the same results :P