Handling multiple fact tables in Qlikview

2019-02-17 06:14发布

问题:

I have a PostgreSQL database containing various education data such school-level test scores and enrollment figures. I need to separate enrollment from test scores because the data is on different grains. Even though enrollment is on a different granularity from the test-score data, many of the dimensions are the same. For example, I have:

~ ---------------------------------------------------------------------------------~
| Test Scores Fact                                                                 |
|-------------|-----------|----------|-----------|--------------|------------|-----|
| school_code | test_code | grade_id | gender_id | ethnicity_id | subject_id | ... |
|-------------|-----------|----------|-----------|--------------|------------|-----|

~ --------------------------------------------------------~
| Enrollment Fact                                         |
|-------------|----------|-----------|--------------|-----|
| school_code | grade_id | gender_id | ethnicity_id | ... |
|-------------|----------|-----------|--------------|-----|

This structure is fine on the backend, but in Qlikview, this creates a synthetic key. The solution for synthetic keys seems to usually be replacing it with a link table via Qlikview scripting, which has been my approach as well. But this does not seem to scale, as when I add a third fact table (on yet another grain) that contains more of the same dimensions, if I create another link table, now my two link tables start to associate as they contain several commonly named fields, and Qlikview's response is to create more synthetic keys?

I'm relatively new to Qlikview and am working by myself. How are multiple facts of different grains with common dimensions usually handled?

EDIT:

I've provided my solution to this problem which has been working in a production environment for just under a year! See my answer below...

回答1:

Seeing the popularity of this question, I'm going to add my actual solution to the mix so people have an example to work from, which for some reason is really hard to find for such a common problem...

I proceeded with creating a Link Table. This solution still to this day feels like a hack as it creates one huge table containing the Cartesian product of every one of your keys in all of your fact tables... but it does work.

The problem: You have multiple fact tables in your database; an occurrence in almost every database ever. Some (or all) of these fact tables share the same key fields; no problem, right? Wrong. Unfortunately, due to Qlik's associative nature, instead of each one of your fact tables linking nicely to their lookup tables, your fact tables are now associating with each other and wreaking havoc on your data model; creating circular references and untold amounts of synthetic keys.

The Solution: Create a Link Table. Sounds simple, right? Well, it is, but it's also very poorly documented and hard to understand without an initial explanation. You might be wondering... what's a Link Table? It is the Cartesian product of all keys from all of your fact tables. How does this correct the problem? It removes all of the unwanted associations between your fact tables as each will now only contain a single unique concatenated key. Those unique keys will associate with the Link Table only, which contains all your unique concatenated keys as well as all the individual keys. The Link Table will subsequently associate with your lookup tables and all will be well.

Implementation:

This implementation will use the two tables contained in my question above; test_scores_fact and enrollment_fact.

test_scores_fact     |    enrollment_fact      |    school            |    gender         |   ...
----------------     |    ---------------      |    ------            |    ------         |   ---
school_code (FK)     |    school_code (FK)     |    school_code (PK)  |    gender_id (PK) |
test_code (FK)       |    grade_id (FK)        |    school_name (FK)  |    gender_desc    |
grade_id (FK)        |    ethnicity_id (FK)    |    address           |    ...            |
gender_id (FK)       |    gender_id (FK)       |    ...               |
ethnicity_id (FK)    |    number_enrolled (F)  | 
subject_id (FK)      |
test_score (F)       |

FK = Foreign Key
PK = Primary Key
F = Fact

As you can see, the two fact tables have overlapping keys, school_code, grade_id, gender_id, and ethnicity_id. In a relational model, each key field has a corresponding table with additional information about the key. This model does not jive with Qlikview's associative nature as Qlikview associates tables based on the name of a field; even when you don't want it to. You do want like named fields to associate to their lookup tables, however you don't want like named fields in your fact tables to associate. Unfortunately you cannot stop this behavior. You must implement a Link Table...

  1. In your Qlikview script, create a temporary fact table, which loads in all fields from your database table:

    [temp_test_scores]:
    LOAD school_code,
         test_code,
         grade_id,
         gender_id,
         ethnicity_id,
         subject_id,
         test_score;
    SQL SELECT * FROM <database connection>
    
  2. Concatenate your keys and remove all individual keys:

    [test_scores]:
    LOAD school_code & '_' test_code & '_' grade_id & '_' gender_id & '_' ethnicity_id & '_' subject_id as test_key,
         test_score
    RESIDENT [temp_test_scores];
    
  3. Repeat Steps 1 & 2 for each fact table:

    [temp_enrollment]:
    LOAD school_code,
         grade_id,
         ethnicity_id,
         gender_id,
         number_enrolled;
    SQL SELECT * FROM <database connection>
    
    [enrollment]:
    LOAD school_code & '_' & grade_id & '_' & ethnicity_id & '_' & gender_id as enrollment_key,
         number_enrolled
    RESIDENT [temp_enrollment];
    
  4. Create your Link Table by concatenating your individual keys into a single table:

    [temp_link_table]:
    LOAD DISTINCT
        school_code,
        test_code,
        grade_id,
        gender_id,
        ethnicity_id,
        subject_id
    RESIDENT [temp_test_scores];
    
    CONCATENATE ([temp_link_table])
    LOAD DISTINCT
        school_code,
        grade_id,
        ethnicity_id,
        gender_id,
        number_enrolled
    RESIDENT [temp_enrollment];
    
    /**
     * The final Link Table will contain all of the individual keys one time as well as your concatenated keys
     */
    [link_table]:
    LOAD DISTINCT
        school_code,
        test_code,
        grade_id,
        gender_id,
        ethnicity_id,
        subject_id,
        school_code & '_' test_code & '_' grade_id & '_' gender_id & '_' ethnicity_id & '_' subject_id as test_key,
        school_code & '_' & grade_id & '_' & ethnicity_id & '_' & gender_id as enrollment_key
    RESIDENT  [temp_link_table]
    
  5. Drop your temp tables so they do not appear in your data model:

    DROP TABLE [temp_test_scores];
    DROP TABLE [temp_enrollment];
    DROP TABLE [temp_link_table];
    

This will remove all associations between your fact tables as there now exist no common field names between them. Each fact table will link to the link table via the created concatenated key. The link table will then associate with each individual lookup table. Your Qlikview data model will not contain any synthetic keys or circular references.

If you create another fact table in the future, just follow steps 1 & 2 again, and add any new individual keys to the Link Table and also add the new concatenated key to the Link Table as well. It scales with little effort.

Good Luck!



回答2:

There are two main strategies to modelling data in QlikView to handle multiple fact tables:

  1. Append your fact tables into one single fact table - usually referred to as a CONCATENATED FACT as QlikView's syntax for appending data to tables is by use of the CONCATENATE prefix (the equivalent of a SQL UNION operation)

  2. Build a link table (what you have done so far) For a majority of implementations, option 1 is the appropriate method. Attributes of a CONCATENATED fact can be summarised as:

Positives:

  1. Performs well due to the reduced number of large tables in the data model
  2. Simple to implement, just append all data to one generic fact table whilst ensuring common dimensions are referenced by common field names

Negatives:

  1. The different facts are NOT directly associated with each other. The implication is important to understand. It means that cross-analysis of facts is typically only achievable by the common dimensions. Any fact specific dimensions do not connect in any way to the records of the facts that do not reference these dimensions. Complex 'set analysis' syntax can to some degree mitigate this shortcoming, but if your core requirement is to do indirect analysis of fact A by fact B's fact specific dimensions then you may need to revert to a link table model instead.

How to construct Link Tables is a complex subject but relies upon traditional database linking table design techniques. It is easy to go wrong and produce linking tables that may seem to produce the correct results in the front-end but is excessively large, consuming memory and CPU resources.

In my experience, a poorly modelled QlikView data model is the most common culprit for causing poor performance.

I hope this quick, far from exhaustive, introduction to multi-fact modelling in QlikView proves of some help and sets you on the right course.



回答3:

The two quickest ways I can think of:

A) You can just left join the fact table into the corresponding tables that they are used in. You will just need to rename the fields to avoid conflicts with the other tables.

B)You can rename the common fields, which can be done by

  1. using a QUALIFY (before you load the fact tables) and UNQUALIFY (after you load the fact tables)
  2. renaming the field using "[Old Field Name] as [New Field Name]"

Assuming that the fact tables have unique id field names that can be linked to the main tables , you shouldn't have to rename anything in the main tables

I'd go with B-1, since that seems a little less of a hassle.

QUALIFY
A,
B,
C,
ID;

FactTable1:
Load ID,
A,
B,
C,
From [FactTable1];

FactTable2:
Load ID,
A,
B,
C,
From [FactTable2];

UNQUALIFY
A,
B,
C,
ID;

EDIT: If you want to create a link table from these, you can concatenate the fact tables into one table where you put all the columns into it (there will be nulls for a lot of the columns, but QlikView is good with nulls).

What i usually do is load the fact tables in and create an id field (either RowNo() or autonumberhash128([list of unique id fieldnames]), then when i load them into a link table, I include that id field in the link table as well. Finally, i drop the all the common fields from the fact tables, so they only exist in the link table.



回答4:

However, each fact table has a different subset of the "shared" fields, so I wouldn't be able to properly key in my fact tables.

One of the inputs to your cartesian dimension would be 'N/A' against Subject and Test Code (since that is not in the enrollments table)

So when you measure by 'Gender' the Test Scores match against dimension records with valid Subjects and Test Codes, and Enrollment matches against records with 'N/A' Subjects and Test Codes

Then when you roll up by Gender, everyhing 'just works' nicely.