You may use Cluster Analysis.
Clustering is a way to partition a set of data into subsets of similar items. The notion of "similarity" relies on some definition of "distance" between points. Many distance formulas exist, among them the usual Euclidean distance.
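To make the idea of "distance" concrete, here is a small sketch using two made-up three-component points (the values are only illustrative). Both functions are built into Mathematica.

```mathematica
(* Two hypothetical points, chosen only for illustration *)
a = {1, 10, 15};
b = {1, 12, 17};

EuclideanDistance[a, b]   (* Sqrt[0^2 + 2^2 + 2^2] = 2 Sqrt[2] *)
ManhattanDistance[a, b]   (* 0 + 2 + 2 = 4 *)
```

Which distance is appropriate depends on the data; the Euclidean one is a reasonable default when all components are on comparable scales.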
Practical Case
Before pointing you to the quirks of the trade, let's show a practical case for your problem, so you can either dig into the algorithms and packages or discard them up front.
For simplicity, I modelled the problem in Mathematica, because Cluster Analysis is included in the software and very straightforward to set up.
First, generate the data. The format is { DAY, START TIME, END TIME }.
The start and end times have a random variable added (+half hour, zero, -half hour) to show the capability of the algorithm to cope with "noise".
There are three days, two shifts per day and one extra (the last one) "anomalous" shift, which starts at 7 AM and ends at 9 AM (poor guys!).
There are 150 events in each "normal" shift and only two in the exceptional one.
As you can see, some shifts are not very far apart from each other.
I include the code in Mathematica, in case you have access to the software. I'm trying to avoid using the functional syntax, to make the code easier to read for "foreigners".
Here is the data generation code:
Rn[] := 0.5 * RandomInteger[{-1, 1}];
monshft1 = Table[{ 1 , 10 + Rn[] , 15 + Rn[] }, {150}]; (* 1 *)
monshft2 = Table[{ 1 , 12 + Rn[] , 17 + Rn[] }, {150}]; (* 2 *)
wedshft1 = Table[{ 3 , 10 + Rn[] , 15 + Rn[] }, {150}]; (* 3 *)
wedshft2 = Table[{ 3 , 14 + Rn[] , 17 + Rn[] }, {150}]; (* 4 *)
frishft1 = Table[{ 5 , 10 + Rn[] , 15 + Rn[] }, {150}]; (* 5 *)
frishft2 = Table[{ 5 , 11 + Rn[] , 15 + Rn[] }, {150}]; (* 6 *)
monexcp = Table[{ 1 , 7 + Rn[] , 9 + Rn[] }, {2}]; (* 7, the anomalous shift *)
Now we join the data, obtaining one big dataset:
data = Join[monshft1, monshft2, wedshft1, wedshft2, frishft1, frishft2, monexcp];
Let's run a cluster analysis for the data:
clusters = FindClusters[data, 7, Method->{"Agglomerate","Linkage"->"Complete"}]
"Agglomerate" and "Linkage" -> "Complete" are two fine-tuning options of the clustering methods implemented in Mathematica. They simply specify that we are looking for very compact clusters.
I asked for 7 clusters. If the right number of shifts is unknown, you can try several reasonable values and compare the results, or let the algorithm select a suitable value.
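One way to try several values, as a rough sketch: run the same call for a few candidate counts and compare the resulting cluster sizes. The range 5 to 8 is an arbitrary choice for this example, and `data` is assumed to be the list built above.

```mathematica
(* For each candidate number of clusters k, list the size of every cluster found. *)
(* Assumes `data` is already defined; the range {k, 5, 8} is only illustrative.   *)
Table[
  {k, Length /@ FindClusters[data, k,
     Method -> {"Agglomerate", "Linkage" -> "Complete"}]},
  {k, 5, 8}]
```

Cluster sizes that match what you know about the shifts (for instance, one very small group for the exceptional shift) suggest a sensible value of k.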
We can plot the results, each cluster in a different color (don't mind the code):
ListPointPlot3D[ clusters,
PlotStyle->{{PointSize[Large], Pink}, {PointSize[Large], Green},
{PointSize[Large], Yellow}, {PointSize[Large], Red},
{PointSize[Large], Black}, {PointSize[Large], Blue},
{PointSize[Large], Purple}, {PointSize[Large], Brown}},
AxesLabel -> {"DAY", "START TIME", "END TIME"}]
And the result is:
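[Figure: 3D point plot of the clustered data, one color per cluster; axes DAY, START TIME, END TIME. Original image: http://i28.tinypic.com/2hmdlab.png]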
Where you can see our seven clusters clearly apart.
That solves part of your problem: identifying the data. Now you also want to be able to label it.
So, we'll get each cluster and take means (rounded):
Table[Round[Mean[clusters[[i]]]], {i, 7}]
The result is:
Day Start End
{"1", "10", "15"},
{"1", "12", "17"},
{"3", "10", "15"},
{"3", "14", "17"},
{"5", "10", "15"},
{"5", "11", "15"},
{"1", "7", "9"}
And with that you recover your seven classes.
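If you also want to spot which class is the exceptional one, a possible sketch is to list each cluster's rounded mean next to its number of events. It assumes `clusters` holds the result of the FindClusters call above.

```mathematica
(* Pair each cluster's label (rounded mean) with its number of events. *)
(* Assumes `clusters` comes from the FindClusters call shown earlier.  *)
Table[{Round[Mean[c]], Length[c]}, {c, clusters}]
```

A cluster with only a handful of events, compared with roughly 150 in the others, is a candidate for an anomalous shift.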
Now, perhaps you want to classify the shifts regardless of the day. If the same people do the same task at the same time every day, it's not useful to call it "Monday shift from 10 to 15", because it also happens on Wednesdays and Fridays (as in our example).
Let's analyze the data disregarding the first column:
clusters=
FindClusters[Take[data, All, -2],Method->{"Agglomerate","Linkage"->"Complete"}];
In this case, we are not selecting the number of clusters to retrieve, leaving the decision to the package.
The result is:
[Figure: plot of the clusters found using only START TIME and END TIME. Original image: http://i27.tinypic.com/mise9.png]
You can see that five clusters have been identified.
Let's try to "label" them as before:
Grid[Table[Round[Mean[clusters[[i]]]], {i, 5}]]
The result is:
START END
{"10", "15"},
{"12", "17"},
{"14", "17"},
{"11", "15"},
{ "7", "9"}
Which is exactly what we "suspected": there are repeated events each day at the same time that could be grouped together.
Edit: Overnight Shifts and Normalization
If you have (or plan to have) shifts that start one day and end on the following, it's better to model
{Start-Day Start-Hour Length} // Correct!
than
{Start-Day Start-Hour End-Day End-Hour} // Incorrect!
That's because, as with any statistical method, the correlation between the variables must be made explicit, or the method fails miserably. The principle could be phrased as "keep your candidate data normalized". Both ideas amount to almost the same thing: the attributes should be independent.
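As a sketch of how start/end records could be re-encoded into the recommended form, the hypothetical helpers below (the names `shiftLength` and `toLengthForm` are mine, not part of any package) compute the length with wrap-around at midnight. They assume hours on a 0 to 24 scale and shifts shorter than 24 hours.

```mathematica
(* Hypothetical helper: shift length in hours, wrapping past midnight. *)
(* Assumes every shift is shorter than 24 hours.                       *)
shiftLength[start_, end_] := Mod[end - start, 24]

shiftLength[10, 15]   (* 5 *)
shiftLength[22, 6]    (* 8, an overnight shift *)

(* Re-encode a {day, start, end} record as {day, start, length} *)
toLengthForm[{day_, start_, end_}] := {day, start, shiftLength[start, end]}

toLengthForm[{1, 22, 6}]   (* {1, 22, 8} *)
```

With this encoding an overnight eight-hour shift and a daytime eight-hour shift share the same length value, so the clustering does not have to infer the relation between end day and end hour.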
--- Edit end ---
By now I guess you understand pretty well what kind of things you can do with this kind of analysis.
Some references
- Of course, Wikipedia; its "references" and "further reading" are a good guide.
- A nice video here showing the capabilities of Statsoft; it can also give you many
ideas about other things you can do with the algorithm.
- Here is a basic explanation of the algorithms involved
- Here you can find the impressive functionality of R for Cluster Analysis (R is a VERY good option)
- Finally, here you can find a long list of free and commercial software for statistics in general, including clustering.
HTH!