Web-scraping Rails App Getting Over-Modelled?

Posted 2019-07-23 09:01

I'd like some opinions on whether I'm over-modeling my app. In this app, I'm saving HTML metadata that I download from websites. I download the meta tags and put them in an array. For each element in the meta_tags array, I want to save that element. But I need to account for situations where, for instance, there are two robots meta tags (one for index and one for follow). So my initial thought was to solve this by creating a "meta_tags" table and saving any meta tags there. That would keep the sites table lean. I would just specify that a site has many meta_tags.

But then I realized that the meta_tags table is going to have a lot of duplicate entries. For instance, if I have two websites that each have two robots meta tags (again, one for index and one for follow), then I've got four rows in that table when I only have two unique records. So now I'm thinking that I should have the sites model do the downloading of HTML and then have a separate model called "meta_tags" that lists all unique meta tags. And then I would associate the sites table with the meta_tags table through a join table called "site_meta_tags" that identifies which site had which meta tags. Is that the best way to set this up? Or am I making this too complicated?

UPDATE: I posted a follow up question here: Rails app has trouble with inter-model saving

1 Answer
霸刀☆藐视天下
#2 · 2019-07-23 09:47

The "right" number of models and associations depends on your use cases and constraints. If database space is at a premium, database normalization might make more sense. If you want faster lookups, denormalization might make more sense. If you need to optimize certain kinds of lookups, arrange your models and relations for that. All of this said, if you are just prototyping, don't worry too much right now -- start with something that makes sense and see what happens.

The way you described (a many-to-many relationship) sounds fine to me if you want to be able to look things up in both directions:

  1. for a meta tag first and then find the associated sites
  2. for a site first and then find the associated meta tags

(Note: don't forget to add your indexes.)
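To make the two lookup directions concrete, here is a minimal plain-Ruby sketch that stands in for the three tables. It is not Rails code: the class name `MetaTagStore`, its methods, and the example host names are all made up for illustration. The two hashes play roughly the role that indexes on the join table's foreign keys would play in a real database.

```ruby
require "set"

# In-memory stand-in for the sites, meta_tags and join tables.
# Every name in this class is hypothetical, for illustration only.
class MetaTagStore
  def initialize
    @meta_tags = {}                                    # [name, content] => meta tag id (unique tags)
    @tags_by_site = Hash.new { |h, k| h[k] = Set.new } # site => set of meta tag ids
    @sites_by_tag = Hash.new { |h, k| h[k] = Set.new } # meta tag id => set of sites
  end

  # Record that a site carries a meta tag, reusing the id if the tag is already known.
  def add(site, name, content)
    key = [name, content]
    id = (@meta_tags[key] ||= @meta_tags.size + 1)
    @tags_by_site[site] << id
    @sites_by_tag[id] << site
    id
  end

  # Direction 2 from the list above: start from a site, find its meta tags.
  def meta_tags_for(site)
    ids = @tags_by_site.fetch(site, Set.new)
    @meta_tags.select { |_key, id| ids.include?(id) }.keys
  end

  # Direction 1 from the list above: start from a meta tag, find the sites that have it.
  def sites_for(name, content)
    id = @meta_tags[[name, content]]
    id ? @sites_by_tag[id].to_a : []
  end

  # Number of distinct meta tags stored.
  def unique_tag_count
    @meta_tags.size
  end
end

store = MetaTagStore.new
%w[a.example b.example].each do |site|
  store.add(site, "robots", "index")
  store.add(site, "robots", "follow")
end
puts store.unique_tag_count
```

With two sites that each have the same two robots tags, only two unique tags are stored, while the associations still record which site has which tag.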

By the way, for a many-to-many join table, the Rails convention is to put the two table names in alphabetical order before joining them. So by default it would be "meta_tags_sites", not "site_meta_tags". This default applies to has_and_belongs_to_many; if you use has_many :through with a join model instead, you can name the join model whatever you like. See the "has_and_belongs_to_many" section in A Guide to Active Record Associations.
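As a rough illustration of that naming rule, the sketch below sorts the two table names and joins them with an underscore. The helper `default_join_table_name` is hypothetical and is not part of Rails. It is also a simplification: Rails handles some edge cases (such as table names with a shared prefix) that this does not attempt to reproduce.

```ruby
# Simplified version of the default join table naming rule:
# sort the two table names alphabetically, then join them with "_".
# Hypothetical helper for illustration only, not a Rails API.
def default_join_table_name(table_a, table_b)
  [table_a, table_b].sort.join("_")
end

puts default_join_table_name("sites", "meta_tags")
```

For the tables in the question this prints meta_tags_sites, regardless of the order in which the two names are passed.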
