Haskell, Aeson - Is there a better way of parsing historical data?

Posted 2019-03-31 23:27

Question:

By 'historical data' I just mean data keyed by date, where each value is the reading recorded on that day.

For example, government institutes or university research divisions often compile data about earthquakes, rainfall, market movements, etc. in this format:


 {
        "Meta Data": {
            "1: Country": "SomeCountry",
            "2: Region": "SomeRegion",
            "3: Latest Recording": "2018-11-16"
        },
        "EarthQuakes": {
            "2018-11-16": {
                "Richter": "5.2508"
            },
            "2018-11-09": {
                "Richter": "4.8684"
            },
            "2018-11-02": {
                "Richter": "1.8399"
            },
    ...
    ...
    ...
            "1918-11-02": {
                "Richter": "1.8399"
            }
        }
}

Usually there is a "Meta Data" section, and another section that contains the actual values.


As a beginner, I know of two ways to parse this type of document.

The first is the general approach shown in Aeson's documentation, where you define data types like this:

data MetaData = MetaData { country :: String, region :: String, latestRec :: String } deriving (Show, Eq, Generic)

and then make it an instance of FromJSON:

instance FromJSON MetaData where
  parseJSON = withObject "MetaData" $
    \v -> do
       metaData  <- v        .: pack "Meta Data"
       country   <- metaData .: pack "1: Country"
       region    <- metaData .: pack "2: Region"
       latestRec <- metaData .: pack "3: Latest Recording"
       return MetaData{..}

With, of course, the RecordWildCards and DeriveGeneric extensions enabled.


The problem I see with this approach is that it can't easily be applied to the "EarthQuakes" section.

I would have to spell out every single date:

earthQuakes <- v .: "EarthQuakes"
date1 <- earthQuakes .: "2018-11-16"
date2 <- earthQuakes .: "2018-11-09"
date3 <- earthQuakes .: "2018-11-02"
...
...
dateInfinity <- earthQuakes .: "1918-11-02"

A better approach would be to parse all the data as generic JSON values, by decoding the contents of the link into an Object:

thisFunction = do
    linksContents <- simpleHttp "somelink"
    let y = fromJust (decode linksContents :: Maybe Object)
        z = aLotOfFunctionCompositions y
    return z

Here aLotOfFunctionCompositions would first convert the Object (possibly via a HashMap) into a list of (k, v) pairs. Then I would map an unConstruct function over it to get the values out of the default constructors, like:

unConstruct (DefaultType x) = x

and finally you would get a nice list!

The problem with this approach is aLotOfFunctionCompositions itself.

That name is just a placeholder, but in reality it can look as ugly and unreadable as this:

let y = Prelude.map (\(a, b) -> (decode (encode a) :: Maybe String, decode (encode (snd (Prelude.head b))) :: Maybe String)) x
    z = Prelude.map (\(a, b) -> (fromJust a, fromJust b)) y
    a = Prelude.map (\(a, b) -> (a, read b :: Double)) z
    b = Prelude.map (\(a, b) -> (Prelude.filter (/= '-') a, b)) a
    c = Prelude.map (\(a, b) -> (read a :: Int, b)) b

This is a snippet from working code I wrote.


So my question is this: Is there a better/cleaner way of decoding these sorts of JSON files, where there are lots of date keys that need to be parsed into workable data types?

Answer 1:

Put a Map in your data type. Aeson translates a Map k v to and from a JSON object: the values are encoded and decoded via their own ToJSON/FromJSON instances, and the keys via ToJSONKey/FromJSONKey. It turns out that Day (from the time package) has perfectly suitable ToJSONKey/FromJSONKey instances.
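To see the key mechanism in isolation, here is a minimal sketch that decodes a JSON object straight into a Map Day Double. The input is a simplified, hypothetical one (dates mapped directly to plain numbers, not the nested format from the question), and the names sample and decodeReadings are made up for illustration. It assumes the aeson, bytestring, containers and time packages are available.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (eitherDecode)
import qualified Data.ByteString.Lazy as BL
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Time (Day)

-- Hypothetical input, simpler than the question's format:
-- ISO dates map straight to plain JSON numbers.
sample :: BL.ByteString
sample = "{\"2018-11-16\": 5.2508, \"2018-11-09\": 4.8684}"

-- The FromJSON instance for Map parses every key through FromJSONKey Day,
-- so no per-date code is needed.
decodeReadings :: BL.ByteString -> Either String (Map Day Double)
decodeReadings = eitherDecode

main :: IO ()
main =
  case decodeReadings sample of
    Left err -> putStrLn ("decode failed: " ++ err)
    Right m  -> mapM_ print (Map.toAscList m)  -- pairs in ascending date order
```

A key that is not a valid date makes the whole decode fail with a Left, rather than being silently dropped. The types for the actual document follow the same idea.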

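{-# LANGUAGE DeriveGeneric, OverloadedStrings, RecordWildCards #-}

import Data.Aeson
import Data.Aeson.Types (typeMismatch)
import Data.Map (Map)
import Data.Time (Day)
import GHC.Generics (Generic)
import Text.Read (readMaybe)

data EarthquakeData = EarthquakeData {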
    metaData :: MetaData,
    earthquakes :: Map Day Earthquake
} deriving (Eq, Show, Generic)

instance FromJSON EarthquakeData where
    parseJSON = withObject "EarthquakeData" $ \v ->
        EarthquakeData <$> v .: "Meta Data"
        -- Map k v has a FromJSON instance that just does the right thing
        -- so just get the payloads with (.:)
        -- all this code is actually just because your field names are really !#$@~??
        -- not an Aeson expert, maybe there's a better way
                       <*> v .: "EarthQuakes"
instance ToJSON EarthquakeData where
    toJSON EarthquakeData{..} = object [ "Meta Data"   .= metaData
                                       , "EarthQuakes" .= earthquakes
                                       ]

data MetaData = MetaData { country :: String, region :: String, latestRec :: Day } deriving (Eq, Show)
instance FromJSON MetaData where
    parseJSON = withObject "MetaData" $ \v ->
        -- if you haven't noticed, applicative style is much neater than do
        -- using OverloadedStrings avoids all the pack-ing noise
        MetaData <$> v .: "1: Country"
                 <*> v .: "2: Region"
                 <*> v .: "3: Latest Recording"
instance ToJSON MetaData where
    toJSON MetaData{..} = object [ "1: Country"          .= country
                                 , "2: Region"           .= region
                                 , "3: Latest Recording" .= latestRec
                                 ]
    toEncoding MetaData{..} = pairs $ "1: Country"          .= country
                                   <> "2: Region"           .= region
                                   <> "3: Latest Recording" .= latestRec

data Earthquake = Earthquake { richter :: Double } deriving (Eq, Show)
-- Earthquake is a bit funky because your JSON apparently has
-- numbers inside strings?
-- only here do you actually need monadic operations
instance FromJSON Earthquake where
    parseJSON = withObject "Earthquake" $ \v ->
        do string <- v .: "Richter"
           stringNum <- parseJSON string
           case readMaybe stringNum of
             Just num -> return $ Earthquake num
             Nothing -> typeMismatch "Double inside a String" string
instance ToJSON Earthquake where
    toJSON = object . return . ("Richter" .=) . show . richter
    toEncoding = pairs . ("Richter" .=) . show . richter

I've tested this against your example JSON, and it appears to roundtrip encode and decode successfully.
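Once the dates are Map keys, the list of (date, value) pairs the question was building by hand falls out of ordinary Map operations. The following is a rough, self-contained sketch of that, not part of the tested code above: Reading, Readings, since and sample are hypothetical names, Reading is a cut-down stand-in for the Earthquake type, and only the "EarthQuakes" section is decoded.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
import qualified Data.ByteString.Lazy as BL
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Time (Day, fromGregorian)
import Text.Read (readMaybe)

-- Hypothetical stand-in for the Earthquake type above.
newtype Reading = Reading { richterValue :: Double } deriving (Eq, Show)

instance FromJSON Reading where
  parseJSON = withObject "Reading" $ \v -> do
    s <- v .: "Richter"  -- the number arrives wrapped in a JSON string
    maybe (fail ("not a number: " ++ s)) (pure . Reading) (readMaybe s)

-- Only the "EarthQuakes" section is decoded here; "Meta Data" is ignored.
newtype Readings = Readings (Map Day Reading)

instance FromJSON Readings where
  parseJSON = withObject "Readings" $ \v -> Readings <$> v .: "EarthQuakes"

-- Readings on or after a given day, as an ascending list of (date, magnitude).
since :: Day -> Readings -> [(Day, Double)]
since start (Readings m) =
  [ (d, richterValue r) | (d, r) <- Map.toAscList m, d >= start ]

-- A shortened copy of the question's example document.
sample :: BL.ByteString
sample = mconcat
  [ "{\"EarthQuakes\": {"
  , "\"2018-11-16\": {\"Richter\": \"5.2508\"},"
  , "\"2018-11-09\": {\"Richter\": \"4.8684\"},"
  , "\"2018-11-02\": {\"Richter\": \"1.8399\"}"
  , "}}"
  ]

main :: IO ()
main =
  case eitherDecode sample of
    Left err -> putStrLn ("decode failed: " ++ err)
    Right rs -> mapM_ print (since (fromGregorian 2018 11 9) rs)
```

Because Day has an Ord instance, comparisons such as d >= start work directly on the keys, so there is no need to strip the dashes and read the dates as Ints.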