I am using Scrapy to scrape data from a web forum. I am storing this data in a PostgreSQL database using SQLAlchemy. The table and columns create fine, however, I am not able to have SQLAlchemy create an index on one of the columns. I am trying to create a trigram index (pg_trgm) using gin.
The Postgresql code that would create this index is:
CREATE INDEX description_idx ON table USING gin (description gin_trgm_ops);
The SQLAlchemy code I have added to my models.py file is:
desc_idx = Index('description_idx', text("description gin_trgm_ops"), postgresql_using='gin')
I have added this line to my models.py but when I check in postgresql, the index was never created.
Below are my full models.py and pipelines.py files. Am I going about this all wrong??
Any help would be greatly appreciated!!
models.py:
from sqlalchemy import create_engine, Column, Integer, String, DateTime, Index, text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.engine.url import URL
import settings
DeclarativeBase = declarative_base()
def db_connect():
return create_engine(URL(**settings.DATABASE))
def create_forum_table(engine):
DeclarativeBase.metadata.create_all(engine)
class forumDB(DeclarativeBase):
__tablename__ = "table"
id = Column(Integer, primary_key=True)
title = Column('title', String)
desc = Column('description', String, nullable=True)
desc_idx = Index('description_idx', text("description gin_trgm_ops"), postgresql_using='gin')
pipelines.py
from scrapy.exceptions import DropItem
from sqlalchemy.orm import sessionmaker
from models import forumDB, db_connect, create_forum_table
class ScrapeforumToDB(object):
def __init__(self):
engine = db_connect()
create_forum_table(engine)
self.Session = sessionmaker(bind=engine)
def process_item(self, item, spider):
session = self.Session()
forumitem = forumDB(**item)
try:
session.add(forumitem)
session.commit()
except:
session.rollback()
raise
finally:
session.close()
return item
Since the
Index
definition usestext
expression it has no references to theTable
"table", which has been implicitly created by the declarative classforumDB
. Compare that to using aColumn
as expression, or some derivative of it, like this:In the above definition the index will know about the table and the other way around.
What this means in your case is that the
Table
"table" has no idea that such an index exists. Adding it as an attribute of the declarative class is the wrong way to do it. It should be passed to the implicitly createdTable
instance. The attribute__table_args__
is just for that:With the modification in place, a call to
create_forum_table(engine)
resulted in:The proper way to reference an Operator Class in SQLAlchemy (such as
gin_trgm_ops
) is to use thepostgresql_ops
parameter. This will also allow tools like alembic to understand how use it when auto-generating migrations.