I need to create AVRO file but for that I need 2 things:
1) JSON
2) Avro Schema
From these 2 requirements - I have JSON:
{"web-app": {
"servlet": [
{
"servlet-name": "cofaxCDS",
"servlet-class": "org.cofax.cds.CDSServlet",
"init-param": {
"configGlossary:installationAt": "Philadelphia, PA",
"configGlossary:adminEmail": "ksm@pobox.com",
"configGlossary:poweredBy": "Cofax",
"configGlossary:poweredByIcon": "/images/cofax.gif",
"configGlossary:staticPath": "/content/static",
"templateProcessorClass": "org.cofax.WysiwygTemplate",
"templateLoaderClass": "org.cofax.FilesTemplateLoader",
"templatePath": "templates",
"templateOverridePath": "",
"defaultListTemplate": "listTemplate.htm",
"defaultFileTemplate": "articleTemplate.htm",
"useJSP": false,
"jspListTemplate": "listTemplate.jsp",
"jspFileTemplate": "articleTemplate.jsp",
"cachePackageTagsTrack": 200,
"cachePackageTagsStore": 200,
"cachePackageTagsRefresh": 60,
"cacheTemplatesTrack": 100,
"cacheTemplatesStore": 50,
"cacheTemplatesRefresh": 15,
"cachePagesTrack": 200,
"cachePagesStore": 100,
"cachePagesRefresh": 10,
"cachePagesDirtyRead": 10,
"searchEngineListTemplate": "forSearchEnginesList.htm",
"searchEngineFileTemplate": "forSearchEngines.htm",
"searchEngineRobotsDb": "WEB-INF/robots.db",
"useDataStore": true,
"dataStoreClass": "org.cofax.SqlDataStore",
"redirectionClass": "org.cofax.SqlRedirection",
"dataStoreName": "cofax",
"dataStoreDriver": "com.microsoft.jdbc.sqlserver.SQLServerDriver",
"dataStoreUrl": "jdbc:microsoft:sqlserver://LOCALHOST:1433;DatabaseName=goon",
"dataStoreUser": "sa",
"dataStorePassword": "dataStoreTestQuery",
"dataStoreTestQuery": "SET NOCOUNT ON;select test='test';",
"dataStoreLogFile": "/usr/local/tomcat/logs/datastore.log",
"dataStoreInitConns": 10,
"dataStoreMaxConns": 100,
"dataStoreConnUsageLimit": 100,
"dataStoreLogLevel": "debug",
"maxUrlLength": 500}},
{
"servlet-name": "cofaxEmail",
"servlet-class": "org.cofax.cds.EmailServlet",
"init-param": {
"mailHost": "mail1",
"mailHostOverride": "mail2"}},
{
"servlet-name": "cofaxAdmin",
"servlet-class": "org.cofax.cds.AdminServlet"},
{
"servlet-name": "fileServlet",
"servlet-class": "org.cofax.cds.FileServlet"},
{
"servlet-name": "cofaxTools",
"servlet-class": "org.cofax.cms.CofaxToolsServlet",
"init-param": {
"templatePath": "toolstemplates/",
"log": 1,
"logLocation": "/usr/local/tomcat/logs/CofaxTools.log",
"logMaxSize": "",
"dataLog": 1,
"dataLogLocation": "/usr/local/tomcat/logs/dataLog.log",
"dataLogMaxSize": "",
"removePageCache": "/content/admin/remove?cache=pages&id=",
"removeTemplateCache": "/content/admin/remove?cache=templates&id=",
"fileTransferFolder": "/usr/local/tomcat/webapps/content/fileTransferFolder",
"lookInContext": 1,
"adminGroupID": 4,
"betaServer": true}}],
"servlet-mapping": {
"cofaxCDS": "/",
"cofaxEmail": "/cofaxutil/aemail/*",
"cofaxAdmin": "/admin/*",
"fileServlet": "/static/*",
"cofaxTools": "/tools/*"},
"taglib": {
"taglib-uri": "cofax.tld",
"taglib-location": "/WEB-INF/tlds/cofax.tld"}}}
But how to create AVRO Schema based on it?
Looking for programatic way to do that since will have many schemas and can not create Avro Schema manually every time.
I checked 'avro-tools-1.8.1.jar' but that can not create Avro Schema from JSON directly.
Looking for a Jar or Python code that can create JSON -> Avro schema. It is ok if Data Types are not perfect (Strings, Integers and Floats are good enough for start).
If you want to avoid creating a dedicated AVRO schema for every JSON format, you can use
rec-avro
package.It allows you to take any python data structure, including parsed XML or JSON and store it in Avro without a need for a dedicated schema.
I tested it for python 3.
You can install it as pip3 install rec-avro or see the code and docs at https://github.com/bmizhen/rec-avro
I gave a json to avro example code here: https://stackoverflow.com/a/55444481/6654219
This one works cool with a simple copy and paste of avro schema.
https://toolslick.com/generation/metadata/avro-schema-from-json
Give this one a shot. http://www.dataedu.ca/avro
It basically infers the Avro schema that accepts the JSON.
You can even give it a JSON array. What it would do is generating an Avro schema that is compatible with all the JSON documents in your array.
There are other tools that you can verify the result.
you can use Kite SDK util to infer avro schema from a json input.
https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-core/src/main/java/org/kitesdk/data/spi/JsonUtil.java#L539
Example:
Result:
You can find the maven dependency here