I use Django with Neo4j as the database. My REST API needs short URLs based on node ids. Neo4j has an internal id, but it is not recommended for use in applications, and the alternative of using a UUID produces ids that are too long for my short URLs. So I added my own uid generator:
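```python
import time
import uuid
from hashlib import sha256

from neomodel import db  # assumed: the db object used in the question is neomodel's

def uid_generator():
    # Total node count; its number of digits sets the id length (minimum 2)
    last_id = db.cypher_query("MATCH (n) RETURN count(*) AS lastId")[0][0][0] or 0
    length = max(2, len(str(last_id)))
    # Prefix of a SHA-256 of the current time + prefix of a random UUID4 (hex, no dashes)
    digest = sha256(str(time.time()).encode()).hexdigest()
    return digest[:length] + uuid.uuid4().hex[:length]
```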
I have two questions. First, I read this question on Stack Overflow and am still not sure that `MATCH (n) RETURN count(*) AS lastId` is O(1); there was no reference for that! Is there any reference for that answer? Second, is there a better approach in terms of both id uniqueness and speed?
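One way to check the O(1) claim yourself, rather than relying on a reference, is to ask Neo4j for the query plan with `EXPLAIN` and look for the `NodeCountFromCountStore` operator, which the Cypher manual describes as answering node counts from the count store instead of counting nodes one by one. The sketch below assumes the official `neo4j` Python driver, whose result summary exposes the plan as a nested dict; the `operatorType` and `children` key names are an assumption about that dict's shape, and `plan_operators` / `uses_count_store` are hypothetical helpers.

```python
from typing import Any, Dict, Iterator, Optional


def plan_operators(plan: Optional[Dict[str, Any]]) -> Iterator[str]:
    """Yield every operator name in a query plan tree, depth first."""
    if not plan:
        return
    yield plan.get("operatorType", "")
    for child in plan.get("children", []):
        yield from plan_operators(child)


def uses_count_store(plan: Optional[Dict[str, Any]]) -> bool:
    """True if any operator reads from the count store.

    Operator names may carry a runtime suffix such as "@neo4j", hence startswith.
    """
    return any(op.startswith("NodeCountFromCountStore") for op in plan_operators(plan))
```

With a driver session this would look like `summary = session.run("EXPLAIN MATCH (n) RETURN count(*) AS lastId").consume()` followed by `uses_count_store(summary.plan)`.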
First, you should put a unique constraint on the id property so that parallel create statements cannot produce collisions. This requires using a label, but you NEED this fail-safe if you plan to do anything serious with this data. On the plus side, this lets you keep rolling ids per label. (All indexed labels have a count table, and a UNIQUE CONSTRAINT also creates an index.)
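For reference, a uniqueness constraint can be created with a single Cypher statement. This is a minimal sketch that only builds the statement string; the label `Item`, the property `uid` and the helper name `unique_constraint_query` are hypothetical. The default branch uses the `FOR ... REQUIRE` syntax of Neo4j 4.4 and later, while `legacy=True` gives the older `ON ... ASSERT` form that Neo4j 5 no longer accepts.

```python
def unique_constraint_query(label: str, prop: str = "uid", legacy: bool = False) -> str:
    """Build a Cypher statement adding a uniqueness constraint (plus its backing index)."""
    if legacy:
        # Neo4j 3.x style syntax, deprecated in 4.4 and removed in 5.0
        return f"CREATE CONSTRAINT ON (n:{label}) ASSERT n.{prop} IS UNIQUE"
    # Neo4j 4.4+ / 5.x syntax; the constraint name is arbitrary
    name = f"{label.lower()}_{prop}_unique"
    return f"CREATE CONSTRAINT {name} IF NOT EXISTS FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"


print(unique_constraint_query("Item"))
```

With neomodel you could run it through `db.cypher_query(...)`, or declare the property with `unique_index=True` and let neomodel install the constraint for you.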
Second, you should generate the id and create the node in the same Cypher statement.
This minimizes the time between generation and commit, reducing the chance of a collision. (Remember to retry attempts that fail with a unique-constraint violation.)
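A minimal sketch of generating the id inside the creating statement and retrying on unique violations. Assumptions: the label `Item`, the properties `uid` and `name`, and the helper `create_item` are hypothetical; `run` is any callable taking `(query, params)` and returning `(rows, columns)`, which is how we understand neomodel's `db.cypher_query` to behave; `unique_error` is whatever exception your driver raises on a constraint violation (for neomodel, presumably `neomodel.exceptions.UniqueProperty`).

```python
from typing import Any, Callable, Dict, Sequence, Tuple, Type

Runner = Callable[[str, Dict[str, Any]], Tuple[Sequence[Sequence[Any]], Any]]

# The id comes from the per-label node count, computed in the same statement
# that creates the node. count() on an empty label still yields one row (0).
CREATE_ITEM = (
    "MATCH (n:Item) "
    "WITH count(n) + 1 AS next_uid "
    "CREATE (m:Item {uid: next_uid, name: $name}) "
    "RETURN m.uid"
)


def create_item(run: Runner, name: str, unique_error: Type[Exception], attempts: int = 3) -> Any:
    """Run CREATE_ITEM, retrying when the unique constraint rejects the generated uid."""
    for attempt in range(attempts):
        try:
            rows, _columns = run(CREATE_ITEM, {"name": name})
            return rows[0][0]
        except unique_error:
            # A concurrent transaction took this uid first; try again with a fresh count
            if attempt == attempts - 1:
                raise
```

The retry only helps with concurrent creates: a count-based id can repeat after deletions, and then every retry produces the same rejected id, which is why the constraint matters.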
I'm not sure what you are doing with the hash, just that you are doing it wrong. Either generate a new time-based UUID (it requires no parameters) and use it as is, or use the incrementing id. (By altering a UUID, you invalidate the logic that guaranteed its uniqueness, which significantly increases the chance of a collision.)
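If the concern is only URL length, a UUID can be kept whole and simply written more compactly. This sketch uses Python's standard `uuid.uuid1()` (time-based, as suggested) and re-encodes all 16 bytes as URL-safe base64, giving 22 characters instead of 32 hex digits. This is a lossless re-encoding, not truncation, so the UUID's uniqueness guarantees are untouched; the helper names are hypothetical.

```python
import base64
import uuid


def new_short_uid() -> str:
    """Time-based UUID1, kept whole (all 128 bits) and written as 22 URL-safe characters."""
    raw = uuid.uuid1().bytes  # 16 bytes, nothing dropped
    # 16 bytes encode to 24 base64 characters ending in "=="; the padding is redundant
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def parse_short_uid(text: str) -> uuid.UUID:
    """Reverse of new_short_uid: restore the padding and rebuild the UUID."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(text + "=="))
```

Note that `uuid1()` embeds the host's MAC address when available; `uuid.uuid4()` can be swapped in the same way if that matters.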
You can also store the current count in a node, as explained here. This is not guaranteed to be thread safe, but that shouldn't be a problem as long as you have unique constraints in place and retry on constraint violations. This approach is also more tolerant of deleted nodes.
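A minimal sketch of the counter-node idea, assuming a hypothetical `UidCounter` node keyed by label and a hypothetical `Item` label; `run` is the same kind of `(query, params) -> (rows, columns)` callable as neomodel's `db.cypher_query`. Because the counter only ever goes up, deleting `Item` nodes never causes an id to be handed out twice. As noted above, keep the unique constraint and the retry, since this is not claimed to be race free.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Runner = Callable[[str, Dict[str, Any]], Tuple[Sequence[Sequence[Any]], Any]]

# One counter node per label; created at 0 on first use, then incremented
# in the same statement that creates the new Item.
CREATE_WITH_COUNTER = (
    "MERGE (c:UidCounter {label: $label}) "
    "ON CREATE SET c.value = 0 "
    "SET c.value = c.value + 1 "
    "WITH c.value AS uid "
    "CREATE (n:Item {uid: uid, name: $name}) "
    "RETURN uid"
)


def create_item_with_counter(run: Runner, name: str) -> Any:
    """Bump the Item counter and create the node in one statement; return the new uid."""
    rows, _columns = run(CREATE_WITH_COUNTER, {"label": "Item", "name": name})
    return rows[0][0]
```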
Why not create your own identifier? You can take the maximum value of your existing identifier (let's call it RN, for record number) and increment it.
`max()` is one of Cypher's aggregating functions.
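A minimal sketch of the RN approach as a single statement, assuming a hypothetical `Item` label with an `rn` property and the same kind of `run` callable as neomodel's `db.cypher_query`. `coalesce()` covers the empty label, where `max()` returns null. Unlike a plain count, `max()` generally has to look at the `rn` of every `Item` node unless the planner can use an index, so it may be slower on large labels.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Runner = Callable[[str, Dict[str, Any]], Tuple[Sequence[Sequence[Any]], Any]]

CREATE_WITH_NEXT_RN = (
    "MATCH (n:Item) "
    "WITH coalesce(max(n.rn), 0) + 1 AS rn "
    "CREATE (m:Item {rn: rn, name: $name}) "
    "RETURN rn"
)


def create_item_with_rn(run: Runner, name: str) -> Any:
    """Create an Item whose rn is one more than the current maximum; return that rn."""
    rows, _columns = run(CREATE_WITH_NEXT_RN, {"name": name})
    return rows[0][0]
```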
Your approach is not good because it's based on the number of nodes in the database.
What happens if you create a node (call it A), then delete a random node, and then create a new node (call it B)?
A and B will have the same ID, and I think that's why you added a time-based hash in your code (though I barely understand that line :)).
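The scenario above can be replayed in plain Python, with a set of integers standing in for the node ids in the database. This is an illustration only, not Neo4j code; the helper names are hypothetical. A count-based id hands B the same id as A, while a max-based id does not.

```python
from typing import Callable, Set, Tuple


def count_based_id(existing: Set[int]) -> int:
    """Next id = current number of nodes, like an id derived from count(*)."""
    return len(existing)


def max_based_id(existing: Set[int]) -> int:
    """Next id = highest id in use + 1, like the max() approach."""
    return max(existing, default=-1) + 1


def replay(next_id: Callable[[Set[int]], int]) -> Tuple[int, int]:
    """Create A, delete an older node, create B; return (A's id, B's id)."""
    nodes = {0, 1, 2}
    a = next_id(nodes)
    nodes.add(a)
    nodes.discard(1)  # delete a "random" older node
    b = next_id(nodes)
    return a, b


print(replay(count_based_id))  # (3, 3): A and B collide
print(replay(max_based_id))    # (3, 4)
```

`max()` can still reissue an id if the deleted node was the newest one; only a counter that never decreases, or disabled id reuse, avoids that entirely.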
On the other hand, Neo4j's internal ID is guaranteed to be unique across the database at any given moment, but not over time. By default, Neo4j recycles unused IDs (an ID is released when its node is deleted).
You can change this behavior in the configuration (see the documentation):
```properties
dbms.ids.reuse.types.override=RELATIONSHIP
```
Be careful with such a configuration: the size of your database on disk can only grow, even if you delete nodes.