I have two unrelated tables with the same primary key.
ip mac
11.11.11.11 48-C0-09-1F-9B-54
33.33.33.33 4E-10-A3-BC-B8-9D
44.44.44.44 CD-00-60-08-56-2A
55.55.55.55 23-CE-D3-B1-39-A6
ip type owner
22.22.22.22 laptop John Doe
33.33.33.33 server XYZ Department
44.44.44.44 VM Mary Smith
66.66.66.66 printer ZWV Department
The first table is automatically refreshed every minute. I can't change the database structure or the script that populates it.
Both tables have ip
as PRIMARY KEY.
In a view, I would like to display a table like this:
ip mac type owner Alert
11.11.11.11 48-C0-09-1F-9B-54 Unauthorized
55.55.55.55 23-CE-D3-B1-39-A6 Unauthorized
22.22.22.22 laptop John Doe Down
66.66.66.66 printer ZWV Department Down
33.33.33.33 4E-10-A3-BC-B8-9D server XYZ Department OK
44.44.44.44 CD-00-60-08-56-2A VM Mary Smith OK
How can I model this? Should I make one of the two primary keys a foreign key into the other one?
Once the code is in operation, there will be lots of data, so I want to make sure it's fast enough.
What is the fastest way to retrieve the data?
Update:
I tried using OneToOneField
for the second table.
This helps me get records that are in both tables, and the records for unauthorized devices (IPs missing in second table):
ip mac type owner Alert
11.11.11.11 48-C0-09-1F-9B-54 Unauthorized
55.55.55.55 23-CE-D3-B1-39-A6 Unauthorized
33.33.33.33 4E-10-A3-BC-B8-9D server XYZ Department OK
44.44.44.44 CD-00-60-08-56-2A VM Mary Smith OK
but I can't get the devices that are down (IP's missing in first table):
22.22.22.22 laptop John Doe Down
66.66.66.66 printer ZWV Department Down
I asked for help here, but it seems it can't be done with OneToOneField
General idea
You can use qs.union:
- create 2 models without any relations between them. Don't forget to use
class Meta: managed = False
- select from the first model, annotate with subquery and union with second:
from django.db import models
from django.db.models import F, OuterRef, Subquery, Value
from django.db.models.functions import Coalesce
# OperationalDevice fields: ip, mac
# AllowedDevice fields: ip, type, owner
USE_EMPTY_STR_AS_DEFAULT = True
null_char_field = models.CharField(null=True)
if USE_EMPTY_STR_AS_DEFAULT:
default_value = ''
else:
default_value = None
# By default Expressions treat strings as "field_name" so if you want to use
# empty string as a second argument for Coalesce, then you should wrap it in
# `Value()`.
# `None` can be used there without wrapping in `Value()`, but in
# `.annotate(type=NoneValue)` it still should be wrapped, so it's easier to
# just "always wrap".
default_value = Value(default_value, output_field=null_char_field)
operational_devices_subquery = OperationalDevice.objects.filter(ip=OuterRef('ip'))
qs1 = (
AllowedDevice.objects
.all()
.annotate(
mac=Coalesce(
Subquery(operational_devices_subquery.values('mac')[:1]),
default_value,
output_field=null_char_field,
),
)
)
qs2 = (
OperationalDevice.objects
.exclude(
ip__in=qs1.values('ip'),
)
.annotate(
type=default_value,
owner=default_value,
)
)
final_qs = qs1.union(qs2)
Generic approach for multiple fields
A more complex but "universal" approach may use Model._meta.get_fields()
. It will be easier to use for cases where "second" model have more that 1 extra field (not only ip,mac
). Example code (not tested, but gives general impression):
# One more import:
from django.db.models.fields import NOT_PROVIDED
common_field_name = 'ip'
# OperationalDevice fields: ip, mac, some_more_fields ...
# AllowedDevice fields: ip, type, owner
operational_device_fields = OperationalDevice._meta.get_fields()
operational_device_fields_names = {_f.name for _f in operational_device_fields} # or set((_f.name for ...))
allowed_device_fields = AllowedDevice._meta.get_fields()
allowed_device_fields_names = {_f.name for _f in allowed_device_fields} # or set((_f.name for ...))
operational_devices_subquery = OperationalDevice.objects.filter(ip=OuterRef(common_field_name))
left_joined_qs = ( # "Kind-of". Assuming AllowedDevice to be "left" and OperationalDevice to be "right"
AllowedDevice.objects
.all()
.annotate(
**{
_f.name: Coalesce(
Subquery(operational_devices_subquery.values(_f.name)[1]),
Value(_f.get_default()), # Use defaults from model definition
output_field=_f,
)
for _f in operational_device_fields
if _f.name not in allowed_device_fields_names
# NOTE: if fields other than `ip` "overlap", then you might consider
# changing logic here. Current implementation keeps fields from the
# AllowedDevice
}
# Unpacked dict is partially equivalent to this:
# mac=Coalesce(
# Subquery(operational_devices_subquery.values('mac')[:1]),
# default_for_mac_eg_fallback_text_value,
# output_field=null_char_field,
# ),
# other_field = Coalesce(...),
# ...
)
)
lonely_right_rows_qs = (
OperationalDevice.objects
.exclude(
ip__in=AllowedDevice.objects.all().values(common_field_name),
)
.annotate(
**{
_f.name: Value(_f.get_default(), output_field=_f), # Use defaults from model definition
for _f in allowed_device_fields
if _f.name not in operational_device_fields_names
# NOTE: See previous NOTE
}
)
)
final_qs = left_joined_qs.union(lonely_right_rows_qs)
Using OneToOneField for "better" SQL
Theoretically you can use device_info = models.OneToOneField(OperationalDevice, db_column='ip', primary_key=True, related_name='status_info')
: in AllowedDevice
. In this case your first QS may be defined without use of Subquery
:
from django.db.models import F
# Now 'ip' is not in field names ('device_info' is there), so add it:
allowed_device_fields_names.add(common_field_name)
# NOTE: I think this approach will result in a more compact SQL query without
# multiple `(SELECT "some_field" FROM device_info_table ... ) as "some-field"`.
# This also might result in better query performance.
honest_join_qs = (
AllowedDevice.objects
.all()
.annotate(
**{
_f.name: F(f'device_info__{_f.name}')
for _f in operational_device_fields
if _f.name not in allowed_device_fields_names
}
)
)
final_qs = honest_join_qs.union(lonely_right_rows_qs)
# or:
# final_qs = honest_join_qs.union(
# OperationalDevice.objects.filter(status_info__isnull=True).annotate(**missing_fields_annotation)
# )
# I'm not sure which approach is better performance-wise...
# Commented one will use something like:
# `SELECT ... FROM "device_info_table" LEFT OUTER JOIN "status_info_table" ON ("device_info_table"."ip" = "status_info_table"."ip") WHERE "status_info_table"."ip" IS NULL
#
# So it might be a little better than first with `union(QS.exclude(ip__in=honest_join_qs.values('ip'))`.
# Because later uses SQL like this:
# `SELECT ... FROM "device_info_table" WHERE NOT ip IN (SELECT ip FROM "status_info_table")`
#
# But it's better to measure timings of both approaches to be sure.
# @GrannyAching, can you compare them and tell in the comments which one is better ?
P.S. To automate models definition you can use manage.py inspectdb
P.P.S. Maybe multi-table inheritance with custom OneToOneField(..., parent_link=True)
may be more helpful for you than using union
.
Since ip
is primary key in both an the first table is getting updated frequently, I suggest updating the second table and converting the ip
in the second table to have ip
of the first table as a OneToOneField
.
This is how your models should look like:
class ModelA(models.Model):
ip = models.GenericIPAddressField(unique=True)
mac = models.CharField(max_length=17, null=True, blank=True)
class ModelB(models.Model):
ip = models.OneToOneField(ModelA)
type = models.CharField()
owner = models.CharField()
docs
You can also have the one to one relation using a separate column:
class ModelB(models.Model):
ip = models.GenericIPAddressField(unique=True)
type = models.CharField()
owner = models.CharField()
modelA = models.OneToOneField(ModelA)
So now you can have the ip address as the primary key, and you can still refer to the table ModelA
using the field modelA
.
Once you have a value from one of both tables just do a query into the other one, looking for id. Since these two tables are separated you must do an extra query. You don't need to create an explicit relation, since you are looking into its "id/ip". So once you have a first value, named 'first_object', just look for its relative into the other table.
other_columns = ModelB.objects.get(id=first_object.id)
Then if you want just 'add' the desired columns to the other model and sent a single object to whatever you want:
first_object.attr1 = other_columns.attr1
...