Background: I’m doing analysis of call detail record (CDR) data in order to segmentify customer with respect to their call duration, time of call (holiday call or non holiday call, Business call or non Business call), age group of subscriber and gender. Data is from two table name cdr (include card_number, service_key, calling, called, start_time, clear_time, duration column)
and subscriber_detail (include subscriber_name, subscriber_address, DOB, gender column)
I have design OLAP as given below.
Call_date includes Date of call with year, month, and day. Call_time is time of call happen in second.
Question:- if we take call_time in second then it has 86400 column for each day (may be curse of dimensionality) and so we think to reduce its dimensional by taking 30 second time pulse ( telecom charges money on the basic of the pulse and 30 is pulse duration for our context). First Question is :- Is it the best way to replace time by pulse duration? And second is :- if one subscriber do more than 2 call on range of pulse it may cause problem i.e. first call start at 21:01:00 and end at 21:01:05 and he start second call at 21:01:15 and end at 21:01:20. How to resolve these type of problem.