Combine two unrelated tables/models with same prim

2019-06-17 10:00发布

I have two unrelated tables with the same primary key.

ip            mac
11.11.11.11   48-C0-09-1F-9B-54
33.33.33.33   4E-10-A3-BC-B8-9D
44.44.44.44   CD-00-60-08-56-2A
55.55.55.55   23-CE-D3-B1-39-A6

ip            type     owner
22.22.22.22   laptop   John Doe
33.33.33.33   server   XYZ Department
44.44.44.44   VM       Mary Smith
66.66.66.66   printer  ZWV Department

The first table is automatically refreshed every minute. I can't change the database structure or the script that populates it.

Both tables have ip as PRIMARY KEY.

In a view, I would like to display a table like this:

ip           mac               type    owner          Alert
11.11.11.11  48-C0-09-1F-9B-54                        Unauthorized
55.55.55.55  23-CE-D3-B1-39-A6                        Unauthorized
22.22.22.22                    laptop  John Doe       Down
66.66.66.66                    printer ZWV Department Down
33.33.33.33  4E-10-A3-BC-B8-9D server  XYZ Department OK
44.44.44.44  CD-00-60-08-56-2A VM      Mary Smith     OK

How can I model this? Should I make one of the two primary keys a foreign key into the other one?

Once the code is in operation, there will be lots of data, so I want to make sure it's fast enough.

What is the fastest way to retrieve the data?


Update:

I tried using OneToOneField for the second table.

This helps me get records that are in both tables, and the records for unauthorized devices (IPs missing in second table):

ip           mac               type    owner          Alert
11.11.11.11  48-C0-09-1F-9B-54                        Unauthorized
55.55.55.55  23-CE-D3-B1-39-A6                        Unauthorized
33.33.33.33  4E-10-A3-BC-B8-9D server  XYZ Department OK
44.44.44.44  CD-00-60-08-56-2A VM      Mary Smith     OK

but I can't get the devices that are down (IP's missing in first table):

22.22.22.22                    laptop  John Doe       Down
66.66.66.66                    printer ZWV Department Down

I asked for help here, but it seems it can't be done with OneToOneField

3条回答
走好不送
2楼-- · 2019-06-17 10:42

Since ip is primary key in both an the first table is getting updated frequently, I suggest updating the second table and converting the ip in the second table to have ip of the first table as a OneToOneField.

This is how your models should look like:

class ModelA(models.Model):
    ip = models.GenericIPAddressField(unique=True)
    mac = models.CharField(max_length=17, null=True, blank=True)

class ModelB(models.Model):
    ip = models.OneToOneField(ModelA)
    type = models.CharField()
    owner = models.CharField()

docs

You can also have the one to one relation using a separate column:

class ModelB(models.Model):
    ip = models.GenericIPAddressField(unique=True) 
    type = models.CharField()
    owner = models.CharField()
    modelA = models.OneToOneField(ModelA)

So now you can have the ip address as the primary key, and you can still refer to the table ModelA using the field modelA.

查看更多
再贱就再见
3楼-- · 2019-06-17 10:47

Once you have a value from one of both tables just do a query into the other one, looking for id. Since these two tables are separated you must do an extra query. You don't need to create an explicit relation, since you are looking into its "id/ip". So once you have a first value, named 'first_object', just look for its relative into the other table.

other_columns = ModelB.objects.get(id=first_object.id)

Then if you want just 'add' the desired columns to the other model and sent a single object to whatever you want:

first_object.attr1 = other_columns.attr1
...
查看更多
贼婆χ
4楼-- · 2019-06-17 10:48

General idea

You can use qs.union:

  • create 2 models without any relations between them. Don't forget to use class Meta: managed = False
  • select from the first model, annotate with subquery and union with second:
from django.db import models
from django.db.models import F, OuterRef, Subquery, Value
from django.db.models.functions import Coalesce

# OperationalDevice fields: ip, mac
# AllowedDevice fields: ip, type, owner

USE_EMPTY_STR_AS_DEFAULT = True

null_char_field = models.CharField(null=True)
if USE_EMPTY_STR_AS_DEFAULT:
    default_value = ''
else:
    default_value = None

# By default Expressions treat strings as "field_name" so if you want to use
# empty string as a second argument for Coalesce, then you should wrap it in
# `Value()`.
# `None` can be used there without wrapping in `Value()`, but in
# `.annotate(type=NoneValue)` it still should be wrapped, so it's easier to
# just "always wrap".
default_value = Value(default_value, output_field=null_char_field)

operational_devices_subquery = OperationalDevice.objects.filter(ip=OuterRef('ip'))


qs1 = (
    AllowedDevice.objects
    .all()
    .annotate(
        mac=Coalesce(
            Subquery(operational_devices_subquery.values('mac')[:1]),
            default_value,
            output_field=null_char_field,
        ),
    )
)

qs2 = (
    OperationalDevice.objects
    .exclude(
        ip__in=qs1.values('ip'),
    )
    .annotate(
        type=default_value,
        owner=default_value,
    )
)

final_qs = qs1.union(qs2)

Generic approach for multiple fields

A more complex but "universal" approach may use Model._meta.get_fields(). It will be easier to use for cases where "second" model have more that 1 extra field (not only ip,mac). Example code (not tested, but gives general impression):

# One more import:
from django.db.models.fields import NOT_PROVIDED

common_field_name = 'ip'

# OperationalDevice fields: ip, mac, some_more_fields ...
# AllowedDevice fields: ip, type, owner

operational_device_fields = OperationalDevice._meta.get_fields()
operational_device_fields_names = {_f.name for _f in operational_device_fields}  # or set((_f.name for ...))

allowed_device_fields = AllowedDevice._meta.get_fields()
allowed_device_fields_names = {_f.name for _f in allowed_device_fields}  # or set((_f.name for ...))

operational_devices_subquery = OperationalDevice.objects.filter(ip=OuterRef(common_field_name))

left_joined_qs = (  # "Kind-of". Assuming AllowedDevice to be "left" and OperationalDevice to be "right"
    AllowedDevice.objects
    .all()
    .annotate(
        **{
            _f.name: Coalesce(
                Subquery(operational_devices_subquery.values(_f.name)[1]),
                Value(_f.get_default()),  # Use defaults from model definition
                output_field=_f,
            )
            for _f in operational_device_fields
            if _f.name not in allowed_device_fields_names
            # NOTE: if fields other than `ip` "overlap", then you might consider
            # changing logic here. Current implementation keeps fields from the
            # AllowedDevice
        }
        # Unpacked dict is partially equivalent to this:
        # mac=Coalesce(
        #     Subquery(operational_devices_subquery.values('mac')[:1]),
        #     default_for_mac_eg_fallback_text_value,
        #     output_field=null_char_field,
        # ),
        # other_field = Coalesce(...),
        # ...
    )
)

lonely_right_rows_qs = (
    OperationalDevice.objects
    .exclude(
        ip__in=AllowedDevice.objects.all().values(common_field_name),
    )
    .annotate(
        **{
            _f.name: Value(_f.get_default(), output_field=_f),  # Use defaults from model definition
            for _f in allowed_device_fields
            if _f.name not in operational_device_fields_names
            # NOTE: See previous NOTE
        }
    )
)

final_qs = left_joined_qs.union(lonely_right_rows_qs)

Using OneToOneField for "better" SQL

Theoretically you can use device_info = models.OneToOneField(OperationalDevice, db_column='ip', primary_key=True, related_name='status_info'): in AllowedDevice. In this case your first QS may be defined without use of Subquery:

from django.db.models import F

# Now 'ip' is not in field names ('device_info' is there), so add it:
allowed_device_fields_names.add(common_field_name)

# NOTE: I think this approach will result in a more compact SQL query without 
# multiple `(SELECT "some_field" FROM device_info_table ... ) as "some-field"`.
# This also might result in better query performance.
honest_join_qs = (
    AllowedDevice.objects
    .all()
    .annotate(
        **{
            _f.name: F(f'device_info__{_f.name}')
            for _f in operational_device_fields
            if _f.name not in allowed_device_fields_names
        }
    )
)

final_qs = honest_join_qs.union(lonely_right_rows_qs)
# or:
# final_qs = honest_join_qs.union(
#     OperationalDevice.objects.filter(status_info__isnull=True).annotate(**missing_fields_annotation)
# )
# I'm not sure which approach is better performance-wise...
# Commented one will use something like:
# `SELECT ... FROM "device_info_table" LEFT OUTER JOIN "status_info_table" ON ("device_info_table"."ip" = "status_info_table"."ip") WHERE "status_info_table"."ip" IS NULL
#
# So it might be a little better than first with `union(QS.exclude(ip__in=honest_join_qs.values('ip'))`.
# Because later uses SQL like this:
# `SELECT ... FROM "device_info_table" WHERE NOT ip IN (SELECT ip FROM "status_info_table")`
#
# But it's better to measure timings of both approaches to be sure.
# @GrannyAching, can you compare them and tell in the comments which one is better ?

P.S. To automate models definition you can use manage.py inspectdb

P.P.S. Maybe multi-table inheritance with custom OneToOneField(..., parent_link=True) may be more helpful for you than using union.

查看更多
登录 后发表回答