I am trying to add a column "wOBA" to a table "starting_pitcher_stats" in MySQL using sequel-pro. Below is the code for a function that performs arithmetic operations on nine variables in the "starting_pitcher_stats" table. In particular, the function gathers the values for a number of variables, applies different weights (coefficients) to some of them (numerator below) and divides that sum by the addition and subtraction of a few more variables. All of these variables reside in the "starting_pitcher_stats" table. The arithmetic operations are expressed in the below formula (the coefficients are the values that are multiplied by each of the variables in numerator below):
wOBA=(.69*walks_a + .72*HBP + .89*singles_a + 1.27*doubles_a + 1.62*triples_a+ 2.10*HR_a)/(at_bats+walks_a+SF+HBP)
Each weight varies by year. The different weights for each year come from the table "GUTS".
The first dilemma I'm having is getting the correct code for the function. The second is the correct code syntax to use to actually call up this function and populate the new column with correct weighted wOBA values for each game of each year(season) for each "Starting_Pitcher".
The function is created with the code below and is listed as a function "wOBA" in my list of functions and procedures. However, the little wheel/knob next to the function name in sequel pro is greyed-out for some reason. It's not until I find out the correct code to invoke it, that I'll know if there are any errors.
Please ask for any more information that I can provide to clarify anything.
Thank you in advance.
DELIMITER $$
CREATE FUNCTION wOBA(wOBA DECIMAL(10,3))
RETURNS DECIMAL(10,3)
BEGIN
DECLARE wOBA decimal(10,3);
SET wOBA = 0;
SELECT cast((SELECT SUM(weighted_BB) as wBB_sum
FROM (
SELECT g.wBB*SUM(if(e.event_CD=14,1,0)) as weighted_BB
FROM events e
INNER JOIN GUTS g
ON substring(e.game_ID,4,4)=g.season
WHERE PIT_ID=Starting_Pitcher
GROUP BY g.season)
as walks_a)
+ (SELECT SUM(weighted_HBP) as wHBP_sum
FROM (
SELECT g.wHBP*SUM(if(e.event_CD=16,1,0)) as weighted_HBP
FROM events e
INNER JOIN GUTS g
ON substring(e.game_ID,4,4)=g.season
WHERE PIT_ID=Starting_Pitcher
GROUP BY g.season)
as HBP)
+ (SELECT SUM(weighted_1B) as w1B_sum
FROM (
SELECT g.w1B*SUM(if(e.event_CD=20,1,0)) as weighted_1B
FROM events e
INNER JOIN GUTS g
ON substring(e.game_ID,4,4)=g.season
WHERE PIT_ID=Starting_Pitcher
GROUP BY g.season)
as singles_a)
+ (SELECT SUM(weighted_2B) as w2B_sum
FROM (
SELECT g.w2B*SUM(if(e.event_CD=21,1,0)) as weighted_2B
FROM events e
INNER JOIN GUTS g
ON substring(e.game_ID,4,4)=g.season
WHERE PIT_ID=Starting_Pitcher
GROUP BY g.season)
as doubles_a)
+ (SELECT SUM(weighted_3B) as w3B_sum
FROM (
SELECT g.w3B*SUM(if(e.event_CD=22,1,0)) as weighted_3B
FROM events e
INNER JOIN GUTS g
ON substring(e.game_ID,4,4)=g.season
WHERE PIT_ID=Starting_Pitcher
GROUP BY g.season)
as triples_a)
+ (SELECT SUM(weighted_HR) as wHR_sum
FROM (
SELECT g.wHR*SUM(if(e.event_CD=23,1,0)) as weighted_HR
FROM events e
INNER JOIN GUTS g
ON substring(e.game_ID,4,4)=g.season
WHERE PIT_ID=Starting_Pitcher
GROUP BY g.season)
as HR_a) as decimal(10,3))
/
cast(SUM(if(e.ab_fl="T",1,0))
+ SUM(if(e.event_cd=14,1,0))
+ SUM(if(e.SF_fl="T",1,0))
+ SUM(if(e.event_cd=16,1,0)) as unsigned) INTO wOBA
FROM events e
WHERE e.PIT_ID=Starting_Pitcher AND PIT_START_FL = "T"
LIMIT 1;
RETURN wOBA;
END
$$
DELIMITER ;
Darwin, Here are two screen shots of the events
table. The first is of the structure and the second is some of the content (as not all of the content fits into the shot):
[
Here are screenshots of the structure and contents of the GUTS
table.
Here is a screenshot of the events table structure showing the fields (and their definitions)in the function:
Update:
UPDATE retrosheet.starting_pitcher_stats
SET starting_pitcher_stats.wOBA =(SELECT
(
(g.wBB * SUM(IF(e.event_cd = 14, 1, 0)))
+ (g.wHBP * SUM(IF(e.event_cd = 16, 1, 0)))
+ (g.w1B * SUM(IF(e.event_cd = 20, 1, 0)))
+ (g.w2B * SUM(IF(e.event_cd = 21, 1, 0)))
+ (g.w3B * SUM(IF(e.event_cd = 22, 1, 0)))
+ (g.wHR * SUM(IF(e.event_cd = 23, 1, 0)))
)
/
(
SUM(IF(e.ab_fl = 'T', 1, 0))
+ SUM(IF(e.event_cd = 14, 1, 0))
+ SUM(IF(e.sf_fl = 'T', 1, 0))
+ SUM(IF(e.event_cd = 16, 1, 0))
) AS wOBA
FROM events AS e, GUTS AS g
WHERE e.YEAR_ID = g.SEASON_ID
AND e.PIT_START_FL= 'T'
AND e.PIT_ID = Starting_Pitcher)
The query just keeps running. I'll keep tweaking it.
update: screenshots of starting_pitcher_stats table
Update:
Ok, I'm trying to just create a wOBA column as part of a new table with columns for the other components of wOBA.
But, the query just goes on forever. How can I make the run-time shorter?
DROP TABLE IF EXISTS starting_pitcher_wOBA;
CREATE TABLE starting_pitcher_wOBA
SELECT
a.YEAR_ID
,
a.GAME_ID
,
a.PIT_ID
,
a.wBB
,
a.wHBP
,
a.w1B
,
a.w2B
,
a.w3B
,
a.wHR
,
a.u_walks_a
,
a.HBP
,
a.singles_a
,
a.doubles_a
,
a.triples_a
,
a.HR_a
,
a.at_bats
,
a.sacrifice_flies_a
,
a.wOBA
FROM
(
SELECT
g.YEAR_ID
,
h.GAME_ID
,
e.PIT_ID
,
g.wBB
,
g.wHBP
,
g.w1B
,
g.w2B
,
g.w3B
,
g.wHR
,
SUM(IF(e.event_cd = 14, 1, 0)) AS u_walks_a
,
SUM(IF(e.event_cd = 16, 1, 0)) AS HBP
,
SUM(IF(e.event_cd = 20, 1, 0)) AS singles_a
,
SUM(IF(e.event_cd = 21, 1, 0)) AS doubles_a
,
SUM(IF(e.event_cd = 22, 1, 0)) AS triples_a
,
SUM(IF(e.event_cd = 23, 1, 0)) AS HR_a
,
SUM(IF(e.ab_fl = 'T', 1, 0)) AS at_bats
,
SUM(IF(e.sf_fl = 'T', 1, 0)) AS sacrifice_flies_a
,
(
(
(g.wBB * SUM(IF(e.event_cd = 14, 1, 0)))
+ (g.wHBP * SUM(IF(e.event_cd = 16, 1, 0)))
+ (g.w1B * SUM(IF(e.event_cd = 20, 1, 0)))
+ (g.w2B * SUM(IF(e.event_cd = 21, 1, 0)))
+ (g.w3B * SUM(IF(e.event_cd = 22, 1, 0)))
+ (g.wHR * SUM(IF(e.event_cd = 23, 1, 0)))
)
/
(
SUM(IF(e.ab_fl = 'T', 1, 0))
+ SUM(IF(e.event_cd = 14, 1, 0))
+ SUM(IF(e.sf_fl = 'T', 1, 0))
+ SUM(IF(e.event_cd = 16, 1, 0))
)
) AS wOBA
FROM events AS e, GUTS AS g, game AS h
WHERE e.PIT_START_FL= 'T'
GROUP BY g.YEAR_ID, h.GAME_ID,e.PIT_ID) AS a
INNER JOIN GUTS AS g
ON
a.YEAR_ID=g.YEAR_ID
INNER JOIN game AS h
ON
a.GAME_ID = h.GAME_ID
INNER JOIN events AS e
ON
a.PIT_ID = e.PIT_ID
We'll start by cleaning up the query. You should always try to perform your calculations across each row when possible rather than performing multiple vertical subqueries, as this avoids the DBMS making multiple passes over the same table.
Assuming I haven't dropped a comma somewhere, this will return a column
woba
for each year for the specified starting pitcher.Note that I've joined the tables on
e.year_id
instead ofSUBSTRING(e.game_ID,4,4)
; this avoids the overhead of callingSUBSTRING()
on each record. This sort of thing seems minor, but it can add up quickly over a large table.That should be enough to get you started.