Is there a way to programmatically convert JSON to

2020-07-11 08:36发布

I need to create AVRO file but for that I need 2 things:

1) JSON

2) Avro Schema

From these 2 requirements - I have JSON:

{"web-app": {
  "servlet": [   
    {
      "servlet-name": "cofaxCDS",
      "servlet-class": "org.cofax.cds.CDSServlet",
      "init-param": {
        "configGlossary:installationAt": "Philadelphia, PA",
        "configGlossary:adminEmail": "ksm@pobox.com",
        "configGlossary:poweredBy": "Cofax",
        "configGlossary:poweredByIcon": "/images/cofax.gif",
        "configGlossary:staticPath": "/content/static",
        "templateProcessorClass": "org.cofax.WysiwygTemplate",
        "templateLoaderClass": "org.cofax.FilesTemplateLoader",
        "templatePath": "templates",
        "templateOverridePath": "",
        "defaultListTemplate": "listTemplate.htm",
        "defaultFileTemplate": "articleTemplate.htm",
        "useJSP": false,
        "jspListTemplate": "listTemplate.jsp",
        "jspFileTemplate": "articleTemplate.jsp",
        "cachePackageTagsTrack": 200,
        "cachePackageTagsStore": 200,
        "cachePackageTagsRefresh": 60,
        "cacheTemplatesTrack": 100,
        "cacheTemplatesStore": 50,
        "cacheTemplatesRefresh": 15,
        "cachePagesTrack": 200,
        "cachePagesStore": 100,
        "cachePagesRefresh": 10,
        "cachePagesDirtyRead": 10,
        "searchEngineListTemplate": "forSearchEnginesList.htm",
        "searchEngineFileTemplate": "forSearchEngines.htm",
        "searchEngineRobotsDb": "WEB-INF/robots.db",
        "useDataStore": true,
        "dataStoreClass": "org.cofax.SqlDataStore",
        "redirectionClass": "org.cofax.SqlRedirection",
        "dataStoreName": "cofax",
        "dataStoreDriver": "com.microsoft.jdbc.sqlserver.SQLServerDriver",
        "dataStoreUrl": "jdbc:microsoft:sqlserver://LOCALHOST:1433;DatabaseName=goon",
        "dataStoreUser": "sa",
        "dataStorePassword": "dataStoreTestQuery",
        "dataStoreTestQuery": "SET NOCOUNT ON;select test='test';",
        "dataStoreLogFile": "/usr/local/tomcat/logs/datastore.log",
        "dataStoreInitConns": 10,
        "dataStoreMaxConns": 100,
        "dataStoreConnUsageLimit": 100,
        "dataStoreLogLevel": "debug",
        "maxUrlLength": 500}},
    {
      "servlet-name": "cofaxEmail",
      "servlet-class": "org.cofax.cds.EmailServlet",
      "init-param": {
      "mailHost": "mail1",
      "mailHostOverride": "mail2"}},
    {
      "servlet-name": "cofaxAdmin",
      "servlet-class": "org.cofax.cds.AdminServlet"},

    {
      "servlet-name": "fileServlet",
      "servlet-class": "org.cofax.cds.FileServlet"},
    {
      "servlet-name": "cofaxTools",
      "servlet-class": "org.cofax.cms.CofaxToolsServlet",
      "init-param": {
        "templatePath": "toolstemplates/",
        "log": 1,
        "logLocation": "/usr/local/tomcat/logs/CofaxTools.log",
        "logMaxSize": "",
        "dataLog": 1,
        "dataLogLocation": "/usr/local/tomcat/logs/dataLog.log",
        "dataLogMaxSize": "",
        "removePageCache": "/content/admin/remove?cache=pages&id=",
        "removeTemplateCache": "/content/admin/remove?cache=templates&id=",
        "fileTransferFolder": "/usr/local/tomcat/webapps/content/fileTransferFolder",
        "lookInContext": 1,
        "adminGroupID": 4,
        "betaServer": true}}],
  "servlet-mapping": {
    "cofaxCDS": "/",
    "cofaxEmail": "/cofaxutil/aemail/*",
    "cofaxAdmin": "/admin/*",
    "fileServlet": "/static/*",
    "cofaxTools": "/tools/*"},

  "taglib": {
    "taglib-uri": "cofax.tld",
    "taglib-location": "/WEB-INF/tlds/cofax.tld"}}}

But how to create AVRO Schema based on it?

Looking for programatic way to do that since will have many schemas and can not create Avro Schema manually every time.

I checked 'avro-tools-1.8.1.jar' but that can not create Avro Schema from JSON directly.

Looking for a Jar or Python code that can create JSON -> Avro schema. It is ok if Data Types are not perfect (Strings, Integers and Floats are good enough for start).

4条回答
叛逆
2楼-- · 2020-07-11 09:09

If you want to avoid creating a dedicated AVRO schema for every JSON format, you can use rec-avro package.

It allows you to take any python data structure, including parsed XML or JSON and store it in Avro without a need for a dedicated schema.

I tested it for python 3.

You can install it as pip3 install rec-avro or see the code and docs at https://github.com/bmizhen/rec-avro

I gave a json to avro example code here: https://stackoverflow.com/a/55444481/6654219

查看更多
淡お忘
3楼-- · 2020-07-11 09:18

This one works cool with a simple copy and paste of avro schema.

https://toolslick.com/generation/metadata/avro-schema-from-json

查看更多
何必那么认真
4楼-- · 2020-07-11 09:25

Give this one a shot. http://www.dataedu.ca/avro

It basically infers the Avro schema that accepts the JSON.

You can even give it a JSON array. What it would do is generating an Avro schema that is compatible with all the JSON documents in your array.

There are other tools that you can verify the result.

查看更多
迷人小祖宗
5楼-- · 2020-07-11 09:27

you can use Kite SDK util to infer avro schema from a json input.

https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-core/src/main/java/org/kitesdk/data/spi/JsonUtil.java#L539

Example:

    String json = "{\n" +
            "    \"id\": 1,\n" +
            "    \"name\": \"A green door\",\n" +
            "    \"price\": 12.50,\n" +
            "    \"tags\": [\"home\", \"green\"]\n" +
            "}\n"
            ;
    String avroSchema = JsonUtil.inferSchema(JsonUtil.parse(json), "myschema").toString();
    System.out.println(avroSchema);

Result:

{  
   "type":"record",
   "name":"myschema",
   "fields":[  
      {  
         "name":"id",
         "type":"int",
         "doc":"Type inferred from '1'"
      },
      {  
         "name":"name",
         "type":"string",
         "doc":"Type inferred from '\"A green door\"'"
      },
      {  
         "name":"price",
         "type":"double",
         "doc":"Type inferred from '12.5'"
      },
      {  
         "name":"tags",
         "type":{  
            "type":"array",
            "items":"string"
         },
         "doc":"Type inferred from '[\"home\",\"green\"]'"
      }
   ]
}

You can find the maven dependency here

查看更多
登录 后发表回答