According to MSDN, Median is not available as an aggregate function in Transact-SQL. However, I would like to find out whether it is possible to create this functionality (using the Create Aggregate function, user defined function, or some other method).
What would be the best way (if possible) to do this - allow for the calculation of a median value (assuming a numeric data type) in an aggregate query?
There are lots of ways to do this, with dramatically varying performance. Here's one particularly well-optimized solution, from Medians, ROW_NUMBERs, and performance. This is a particularly optimal solution when it comes to actual I/Os generated during execution-- it looks more costly than other solutions, but it is actually much faster.
That page also contains a discussion of other solutions and performance testing details. Note the use of a unique column as a disambiguator in case there are multiple rows with the same value of the median column.
As with all database performance scenarios, always try to test a solution out with real data on real hardware-- you never know when a change to SQL Server's optimizer or a peculiarity in your environment will make a normally-speedy solution slower.
I wanted to work out a solution by myself, but my brain tripped and fell on the way. I think it works, but don't ask me to explain it in the morning. :P
I just came across this page while looking for a set based solution to median. After looking at some of the solutions here, I came up with the following. Hope is helps/works.
For large scale datasets, you can try this GIST:
https://gist.github.com/chrisknoll/1b38761ce8c5016ec5b2
It works by aggregating the distinct values you would find in your set (such as ages, or year of birth, etc.), and uses SQL window functions to locate any percentile position you specify in the query.
If you're using SQL 2005 or better this is a nice, simple-ish median calculation for a single column in a table: