In previous analyses of the concept of the career year in baseball, I had occasion to calculate some statistics by year of a player’s career. That is, I wanted to analyze questions like ‘How did a player do in their 4th professional season compared to their 3rd professional season and how did that compare to other player’s development from their 3rd to their 4th season ?”
I had data stored in a MySQL database and had player statistics by calendar year. So, for instance, I had Billy Williams’ home runs, RBIs, WAR, etc for 1968, 1969 and so on, but I had no ready indicator to say which year was his 4th season. If I had the list of his seasons laid out in front of me I could work that out, but I wanted to do comparisons from the database across players, so I needed some way to compute this query. Essentially, I was looking to rank seasons by calendar year.
In Oracle and PostgreSQL, I could use a windowing function and RANK function to accomplish this. However, I am not aware of this functionality being available in MySQL, so I had to replicate using standard MySQL functionality.
Let’s say I have a table of players containing their yearly home run totals by calendar year (e.g. 2012, 2013, etc.) but I want to calculate a value to indicate home run totals by year of career (e.g rookie year, 2nd year of career, etc). So my sample data looks like the following. but I want to find the average number a player hit in the second year of his career. That’s not comparing the same year, but rather asking us to average Bob’s number from 2002, Phil’s number from 1994 and Ron’s number from 2014. Again we’re really ranking each player by hr_yr and selecting the row from each player with rank 2. (ignoring for now if we had a degenerate case of a 1 year career)
We can use the following SQL to accomplish this, which will add a column yr_of_career to the result which would could then be queried for such things as retrieving only those rows where yr_of_career = 2 in our example above. Essentially, yr_of_career is a running counter within each row of each player.
SELECT @row_number:=CASE WHEN @pname=pname THEN @row_number+1 ELSE 1 END AS yr_of_career, @pname:=pname AS names, hr_yr, hr_num FROM (select pname,hr_yr,hr_num from test_hrs order by hr_yr) plyr, (SELECT @row_number:=0,@pname:='') as cnt
Here are the results of the query: