Is there any possible way to bulk load data using MLCP as a scheduled task in MarkLogic?
Question:
Answer 1:
You can't invoke mlcp from a MarkLogic scheduled task; I recommend trying something like Apache Camel for this.
Camel has a Timer component and a Quartz component, either of which can be used for scheduling.
And here's an example Camel file with a route (commented out, but still operable) that is initiated by a Timer which then writes a file to disk and ingests it via mlcp - https://github.com/rjrudin/ml-camel-client/blob/master/src/main/resources/META-INF/camel-routes.xml .
I've had good success with doing all kinds of processing/scheduling in Camel and then ultimately ingesting content via mlcp. I think it's a good fit for your use case here so you can leverage what mlcp does best - get content into MarkLogic as fast as possible.
Answer 2:
Scheduled tasks inside MarkLogic can call external services (using HTTP), but they don't have a way to run an external command. You do have some options:
- schedule the MLCP job externally, using cron on Linux or something along those lines;
- restructure your load using JavaScript or XQuery; you can retrieve data from a file system, run it through some transforms, and insert it into the database using modules running in MarkLogic;
- set up a Java app server, have your scheduled task make an HTTP request to that server, and have the Java app server call MLCP.
I think I'd start with the first option, but which one is best depends on your use case.
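As a rough illustration of the first option, here is a minimal sketch of a wrapper script that cron could run. It only assembles an `mlcp.sh import` command line and executes it. The install path, host, port, user, input directory, collection name, and the `ML_PASSWORD` environment variable are all placeholders I made up for the example, and the function names are hypothetical; adjust the options to whatever your load actually needs.

```python
import os
import subprocess


def build_mlcp_import_command(mlcp_path, host, port, username, password,
                              input_path, collections=None):
    """Assemble the argument list for an `mlcp import` run."""
    cmd = [
        mlcp_path, "import",
        "-host", host,
        "-port", str(port),
        "-username", username,
        "-password", password,
        "-input_file_path", input_path,
        "-mode", "local",
    ]
    if collections:
        # mlcp takes a comma-separated list of collection URIs
        cmd += ["-output_collections", ",".join(collections)]
    return cmd


def run_mlcp(cmd):
    """Run the command and return its exit code so cron can see failures."""
    result = subprocess.run(cmd, check=False)
    return result.returncode


def main():
    # Every value below is an assumption for illustration only.
    cmd = build_mlcp_import_command(
        mlcp_path="/opt/mlcp/bin/mlcp.sh",
        host="localhost",
        port=8000,
        username="loader",
        password=os.environ.get("ML_PASSWORD", ""),
        input_path="/data/incoming",
        collections=["nightly-load"],
    )
    return run_mlcp(cmd)
```

Saved as a hypothetical `run_mlcp.py` with `raise SystemExit(main())` added at the bottom, it could be scheduled with a crontab entry such as `0 2 * * * /usr/bin/python3 /opt/loader/run_mlcp.py`, which would run the load every night at 02:00. Returning mlcp's exit code lets cron, or whatever wraps it, notice a failed load.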