Date periods in Clojure

Posted 2019-05-21 03:08

Question:

I have a data structure like this:

[{:2007-08-05 [{:meat-weight-gain 100} {:meat-weight-loss 80} {:meat-balance 20}]}
 {:2007-08-06 [{:meat-weight-gain 10} {:meat-weight-loss 60} {:meat-balance -30}]}
 {:2007-08-07 [{:meat-weight-gain 40} {:meat-weight-loss 80} {:meat-balance -70}]}
 {:2007-08-08 [{:meat-weight-gain 100} {:meat-weight-loss 0} {:meat-balance 30}]}]

How can I iterate through it and return the periods during which the meat balance was negative? The expected output would look something like this:

[{:end-period-balance -70, :period-start :2007-08-06, :period-end :2007-08-07}]

Also, can my data structure be improved, or is it fine as it is? If it can be improved, how? Thank you very much.

Answer 1:

I would advise you to change your data shape to a vector of tuples, each containing the date and a map of balance data, like this:

(def data [[:2007-08-05 { :meat-weight-gain 100 :meat-weight-loss 80 :meat-balance 20}], 
           [:2007-08-06 { :meat-weight-gain 10 :meat-weight-loss 60 :meat-balance -30}],
           [:2007-08-07 { :meat-weight-gain 40 :meat-weight-loss 80 :meat-balance -70}]
           [:2007-08-08 { :meat-weight-gain 100 :meat-weight-loss 0 :meat-balance 30}]
           [:2007-08-09 { :meat-weight-gain 19 :meat-weight-loss -20 :meat-balance -10}]])

Then it is easy to split the records into runs of negative and non-negative balance (using partition-by) and collect the needed info:

user> (let [parts (partition-by #(-> % second :meat-balance neg?) data)]
        (keep #(let [[p-start _] (first %)
                     [p-end {balance :meat-balance}] (last %)]
                 (when (neg? balance)
                   {:period-start p-start
                    :period-end p-end
                    :end-period-balance balance}))
              parts))

;;=> ({:period-start :2007-08-06, :period-end :2007-08-07, :end-period-balance -70} 
;;    {:period-start :2007-08-09, :period-end :2007-08-09, :end-period-balance -10})

Or use a vector of maps that include the date:

(def data [{:date :2007-08-05 :meat-weight-gain 100 :meat-weight-loss 80 :meat-balance 20}, 
           {:date :2007-08-06 :meat-weight-gain 10 :meat-weight-loss 60 :meat-balance -30},
           {:date :2007-08-07 :meat-weight-gain 40 :meat-weight-loss 80 :meat-balance -70}
           {:date :2007-08-08 :meat-weight-gain 100 :meat-weight-loss 0 :meat-balance 30}
           {:date :2007-08-09 :meat-weight-gain 100 :meat-weight-loss 0 :meat-balance -10}])

user> (let [parts (partition-by #(-> % :meat-balance neg?) data)]
        (keep #(let [{p-start :date} (first %)
                     {p-end :date balance :meat-balance} (last %)]
                 (when (neg? balance)
                   {:period-start p-start
                    :period-end p-end
                    :end-period-balance balance}))
              parts))

;;=> ({:period-start :2007-08-06, :period-end :2007-08-07, :end-period-balance -70} 
;;    {:period-start :2007-08-09, :period-end :2007-08-09, :end-period-balance -10})

UPDATE

If you really need your initial data format, you can use the same approach, just redefining the parts that retrieve the values:

user> (defn meat-balance [rec]
        (some :meat-balance (-> rec first second)))

user> (let [parts (partition-by #(-> % meat-balance neg?) data)] ; data in the question's original format
        (keep #(let [p-start (-> % first ffirst)
                     p-end (-> % last ffirst)
                     balance (-> % last meat-balance)] ; balance at the END of the period
                 (when (neg? balance)
                   {:period-start p-start
                    :period-end p-end
                    :end-period-balance balance}))
              parts))
;;=> ({:period-start :2007-08-06, :period-end :2007-08-07, :end-period-balance -70})
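Since only the value retrieval changes between the formats in this answer, the same logic could be wrapped in one function that takes accessor functions as arguments. This is a minimal sketch; negative-periods and its date-fn/balance-fn parameters are hypothetical names, not part of any library:

```clojure
;; negative-periods is a hypothetical helper, not a library function.
(defn negative-periods
  "Returns a seq of {:period-start :period-end :end-period-balance} maps,
  one per run of consecutive records whose balance is negative.
  date-fn and balance-fn extract the date and balance from one record."
  [date-fn balance-fn records]
  (->> records
       (partition-by #(neg? (balance-fn %)))
       ;; every record in a run has the same sign, so testing the first is enough
       (filter #(neg? (balance-fn (first %))))
       (map (fn [run]
              {:period-start (date-fn (first run))
               :period-end (date-fn (last run))
               :end-period-balance (balance-fn (last run))}))))

;; The question's original format: a vector of single-key maps
(def original
  [{:2007-08-05 [{:meat-weight-gain 100} {:meat-weight-loss 80} {:meat-balance 20}]}
   {:2007-08-06 [{:meat-weight-gain 10} {:meat-weight-loss 60} {:meat-balance -30}]}
   {:2007-08-07 [{:meat-weight-gain 40} {:meat-weight-loss 80} {:meat-balance -70}]}
   {:2007-08-08 [{:meat-weight-gain 100} {:meat-weight-loss 0} {:meat-balance 30}]}])

(negative-periods ffirst
                  #(some :meat-balance (-> % first val))
                  original)
;;=> ({:period-start :2007-08-06, :period-end :2007-08-07, :end-period-balance -70})
```

For the vector-of-tuples format you would pass first and (comp :meat-balance second); for the vector of maps, :date and :meat-balance.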


Answer 2:

Change the format of the data:

  • Consolidate the vector of data for each date into a single map.
  • Make the whole thing a map, keyed by the date keywords.
  • Drop the :meat-balance data: it's redundant, since it can be recomputed from the gains and losses.

(The first two changes follow @leetwinski's advice.)

We get ...

(def data
  {:2007-08-05 {:meat-weight-gain 100, :meat-weight-loss 80},
   :2007-08-06 {:meat-weight-gain 10, :meat-weight-loss 60},
   :2007-08-07 {:meat-weight-gain 40, :meat-weight-loss 80},
   :2007-08-08 {:meat-weight-gain 100, :meat-weight-loss 0}})

The entries happen to be in date order because it's a small map: Clojure stores maps of up to eight entries as array maps, which keep insertion order. To guarantee date order, we'd better use a sorted map:

(def sorted-data (into (sorted-map) data))

This doesn't look any different, but it will always present the data in key order, which, thankfully, is date order.
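To see why the sorted map matters, here is a small sketch with the entries deliberately supplied out of order (the data is made up for illustration). Keywords compare by their names, so year-month-day keywords sort chronologically:

```clojure
;; Entries supplied out of date order (illustrative data only)
(def shuffled
  [[:2007-08-07 {:meat-weight-gain 40, :meat-weight-loss 80}]
   [:2007-08-05 {:meat-weight-gain 100, :meat-weight-loss 80}]
   [:2007-08-06 {:meat-weight-gain 10, :meat-weight-loss 60}]])

;; A small plain map keeps insertion order (an implementation detail)
(keys (into {} shuffled))
;;=> (:2007-08-07 :2007-08-05 :2007-08-06)

;; A sorted map always orders its keys with compare
(keys (into (sorted-map) shuffled))
;;=> (:2007-08-05 :2007-08-06 :2007-08-07)
```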

This seems a long way round to get the records back into the order they had in the original vector, but that vector records the order twice, once by position and once by the date keywords, and never uses the latter: Don't Repeat Yourself.

Let's calculate the daily balances:

(def balances
  (map-vals #(- (:meat-weight-gain %)  (:meat-weight-loss %)) sorted-data))

balances
=> {:2007-08-05 20, :2007-08-06 -50, :2007-08-07 -40, :2007-08-08 100}

... where map-vals (which must be defined before balances is evaluated) is an analogue of map and mapv that works on the values of a map:

(defn map-vals [f m]
  (into (empty m) (map (fn [[k v]] [k (f v)])) m))

Notice that it returns the same species of map as it's given, in this case a sorted one.
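A quick sketch that checks this claim; map-vals is repeated so the snippet runs on its own, and daily is an illustrative name. Plain map over a map, by contrast, returns a seq of entries:

```clojure
(defn map-vals [f m]
  (into (empty m) (map (fn [[k v]] [k (f v)])) m))

;; Illustrative sorted input, deliberately built out of order
(def daily (sorted-map :2007-08-06 -50, :2007-08-05 20))

(map-vals - daily)
;;=> {:2007-08-05 -20, :2007-08-06 50}
(sorted? (map-vals - daily))
;;=> true

;; Plain map gives back a lazy seq of [k v] pairs, not a map
(map (fn [[k v]] [k (- v)]) daily)
;;=> ([:2007-08-05 -20] [:2007-08-06 50])
```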

We want to know over what periods there was a net weight loss. It isn't clear what this means. Let's look at the cumulative net weight gain from the start:

(reductions (fn [[_ av] [k v]] [k (+ av v)]) balances)
=> ([:2007-08-05 20] [:2007-08-06 -30] [:2007-08-07 -70] [:2007-08-08 30])
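These running totals reproduce the :meat-balance column of the question, which is what makes it redundant. A small sketch that checks this; given stands for the question's original vector (as in the Conversion section below), and running and stored are illustrative names:

```clojure
;; The question's original data
(def given
  [{:2007-08-05 [{:meat-weight-gain 100} {:meat-weight-loss 80} {:meat-balance 20}]}
   {:2007-08-06 [{:meat-weight-gain 10} {:meat-weight-loss 60} {:meat-balance -30}]}
   {:2007-08-07 [{:meat-weight-gain 40} {:meat-weight-loss 80} {:meat-balance -70}]}
   {:2007-08-08 [{:meat-weight-gain 100} {:meat-weight-loss 0} {:meat-balance 30}]}])

;; Daily balances, as computed above
(def balances
  (sorted-map :2007-08-05 20, :2007-08-06 -50, :2007-08-07 -40, :2007-08-08 100))

;; Running totals of the daily balances, as plain numbers
(def running (reductions + (vals balances)))

;; The balances stored in the question, in the same order
(def stored (map #(some :meat-balance (-> % first val)) given))

(= running stored)
;;=> true
```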

Or we could partition the daily balances into gaining and losing sections:

(partition-by (fn [[_ v]] (neg? v)) balances)
=> (([:2007-08-05 20]) ([:2007-08-06 -50] [:2007-08-07 -40]) ([:2007-08-08 100]))

We need a variant of partition-by that keys its sub-sequences by the value of the discriminating function, as group-by does. Then you know what's a gaining range and what's a losing one. A cheap and cheerful version is ...

(defn group-partition-by [f coll]
  (let [parts (partition-by f coll)]
    (map #(-> % first f (list %)) parts)))

Then

(group-partition-by (fn [[_ v]] (neg? v)) balances)
=> ((false ([:2007-08-05 20]))
    (true ([:2007-08-06 -50] [:2007-08-07 -40]))
    (false ([:2007-08-08 100])))

You might want to reduce this data to a (sorted) map from date-range to total balance.
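One way to do that reduction, as a sketch: key each run by its [first-date last-date] pair and sum the daily balances in it. Vectors of keywords are comparable, so the result can itself be a sorted map. group-partition-by and balances are repeated from above; run-totals is a hypothetical name:

```clojure
(defn group-partition-by [f coll]
  (let [parts (partition-by f coll)]
    (map #(-> % first f (list %)) parts)))

(def balances
  (sorted-map :2007-08-05 20, :2007-08-06 -50, :2007-08-07 -40, :2007-08-08 100))

;; run-totals is a hypothetical helper name
(defn run-totals
  "Sorted map from [start-date end-date] to the summed daily balance of each run."
  [balances]
  (into (sorted-map)
        (map (fn [[_ run]]
               [[(key (first run)) (key (last run))]
                (reduce + (map val run))]))
        (group-partition-by (fn [[_ v]] (neg? v)) balances)))

(run-totals balances)
;;=> {[:2007-08-05 :2007-08-05] 20,
;;    [:2007-08-06 :2007-08-07] -90,
;;    [:2007-08-08 :2007-08-08] 100}
```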


Conversion

How do we get from the question's data (call it given) to data? We can get to sorted-data directly as follows:

(def sorted-data
  (->> given
       (into (sorted-map))
       (map-vals (comp #(into {} %) #(remove :meat-balance %)))))

sorted-data
=>
{:2007-08-05 {:meat-weight-gain 100, :meat-weight-loss 80},
 :2007-08-06 {:meat-weight-gain 10, :meat-weight-loss 60},
 :2007-08-07 {:meat-weight-gain 40, :meat-weight-loss 80},
 :2007-08-08 {:meat-weight-gain 100, :meat-weight-loss 0}}

Impressions

  • You have to get to know the sequence library thoroughly.
  • Corresponding facilities for maps aren't on the surface. Getting to grips with transducers might help, though I'm not sure how much.

Note

You had better be using year-month-day (ISO 8601) dates, as you are here, and not American month-day-year ones; otherwise you are going to need a custom comparator (via sorted-map-by) to get the records in date sequence. I'd prefer clj-time local-dates to keywords as keys:

  • in case the code crosses the Atlantic;
  • so that you can run validity checks, such as that you have a record for every day.
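As a sketch of the second point, this uses java.time.LocalDate through Java interop rather than clj-time (a swap-in, since java.time ships with Java 8 and later): parse the keyword names as dates and check that consecutive keys are exactly one day apart. kw->date and every-day-present? are hypothetical names:

```clojure
(import 'java.time.LocalDate)

;; kw->date and every-day-present? are hypothetical helper names
(defn kw->date
  "Parses a keyword such as :2007-08-05 as a LocalDate (ISO year-month-day)."
  [kw]
  (LocalDate/parse (name kw)))

(defn every-day-present?
  "True if each date keyword, in the order given, is exactly one day
  after the previous one (no gaps, duplicates or out-of-order keys)."
  [date-kws]
  (let [dates (map kw->date date-kws)]
    (every? (fn [[a b]] (= (.plusDays ^LocalDate a 1) b))
            (partition 2 1 dates))))

(every-day-present? [:2007-08-05 :2007-08-06 :2007-08-07 :2007-08-08])
;;=> true
(every-day-present? [:2007-08-05 :2007-08-07])
;;=> false
```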


Answer 3:

As has already been said above, your data is not well structured for this purpose. Here is a step-by-step solution:

Prepare your data:

(def data
  [{ :2007-08-05 [ { :meat-weight-gain 100} {:meat-weight-loss 80} {:meat-balance 20}]}, 
   { :2007-08-06 [ { :meat-weight-gain 10} {:meat-weight-loss 60} {:meat-balance -30}]},
   { :2007-08-07 [ { :meat-weight-gain 40} {:meat-weight-loss 80} {:meat-balance -70}]}
   { :2007-08-08 [ { :meat-weight-gain 100} {:meat-weight-loss 0} {:meat-balance 30}]}])

Create a new data structure:

(defn turner [stats]
  (apply merge
         {:year (-> stats keys first)}
         (-> stats vals first)))

(def data2 (mapv turner data))

[{:year :2007-08-05, :meat-weight-gain 100, :meat-weight-loss 80, :meat-balance 20}
 {:year :2007-08-06, :meat-weight-gain 10, :meat-weight-loss 60, :meat-balance -30}
 {:year :2007-08-07, :meat-weight-gain 40, :meat-weight-loss 80, :meat-balance -70}
 {:year :2007-08-08, :meat-weight-gain 100, :meat-weight-loss 0, :meat-balance 30}]

Now partition your data by a predicate that checks whether the balance is negative:

(partition-by #(-> % :meat-balance neg?) (sort-by :year data2))

(({:year :2007-08-05, :meat-weight-gain 100, :meat-weight-loss 80, :meat-balance 20})
 ({:year :2007-08-06, :meat-weight-gain 10, :meat-weight-loss 60, :meat-balance -30}
  {:year :2007-08-07, :meat-weight-gain 40, :meat-weight-loss 80, :meat-balance -70})
 ({:year :2007-08-08, :meat-weight-gain 100, :meat-weight-loss 0, :meat-balance 30}))

Call the result data3. Then filter it to keep only the negative runs:

(filter #(-> % first :meat-balance neg?) data3)

(({:year :2007-08-06, :meat-weight-gain 10, :meat-weight-loss 60, :meat-balance -30}
  {:year :2007-08-07, :meat-weight-gain 40, :meat-weight-loss 80, :meat-balance -70}))

Call this data4. Now extract the boundaries:

{:period-start (-> data4 first first :year) 
 :period-end (-> data4 first last :year) 
 :end-period-balance (-> data4 first last :meat-balance)}

which gives you exactly:

{:period-start :2007-08-06, 
 :period-end :2007-08-07, 
 :end-period-balance -70}
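The expression above only looks at the first negative run in data4. If there can be several, as in the extended data of Answer 1, the same boundaries can be mapped over every run. A sketch; data4 is inlined here, trimmed to the keys used and with a second run added for illustration, so the snippet stands alone:

```clojure
;; data4: runs of negative-balance records (second run added for illustration)
(def data4
  [[{:year :2007-08-06, :meat-balance -30}
    {:year :2007-08-07, :meat-balance -70}]
   [{:year :2007-08-09, :meat-balance -10}]])

;; One boundary map per negative run
(def periods
  (mapv (fn [run]
          {:period-start (-> run first :year)
           :period-end (-> run last :year)
           :end-period-balance (-> run last :meat-balance)})
        data4))

periods
;;=> [{:period-start :2007-08-06, :period-end :2007-08-07, :end-period-balance -70}
;;    {:period-start :2007-08-09, :period-end :2007-08-09, :end-period-balance -10}]
```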


Answer 4:

First of all, the complex input data structure can be disentangled:

(map (juxt ffirst (comp first #(keep :meat-balance %) val first)) data)
;;=> ([:2007-08-05 20] [:2007-08-06 -30] [:2007-08-07 -70] [:2007-08-08 30])

... into tuples of [date-keyword meat-balance].

Notice that so far we are keeping both positive and negative meat balances. The answer requires negative runs, i.e. contiguous negative meat balances. partition-by is the go-to function for any kind of run, after which we can filter to keep only the groups the answer needs. Before partitioning we sort by date, so that the runs are correct even if the records are not already in date order. After sorting, partitioning and filtering we are ready to deliver the answer, which simply entails transforming our canonical [date-keyword meat-balance] tuples into the required structure:

(->> data
     (map (juxt ffirst (comp first #(keep :meat-balance %) val first)))
     (sort-by first)
     (partition-by #(-> % second neg?))
     (filter #(-> % first second neg?))
     (map (fn [neg-run]
           (let [[start-date _] (first neg-run)
                 [end-date end-value] (last neg-run)]
             {:period-start start-date
              :period-end end-date
              :end-period-balance end-value})))
;;=> ({:period-start :2007-08-06, :period-end :2007-08-07, :end-period-balance -70})
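Answer 2 suggests transducers might help here. As a sketch, the same pipeline can be written as a transducer, since map, partition-by and filter all have transducer arities (Clojure 1.7 and later). Sorting is not a transducer, so it happens first; into [] then returns a vector, matching the shape in the question. neg-periods-xf and tuples are hypothetical names:

```clojure
(def data
  [{:2007-08-05 [{:meat-weight-gain 100} {:meat-weight-loss 80} {:meat-balance 20}]}
   {:2007-08-06 [{:meat-weight-gain 10} {:meat-weight-loss 60} {:meat-balance -30}]}
   {:2007-08-07 [{:meat-weight-gain 40} {:meat-weight-loss 80} {:meat-balance -70}]}
   {:2007-08-08 [{:meat-weight-gain 100} {:meat-weight-loss 0} {:meat-balance 30}]}])

;; [date-keyword meat-balance] tuples, sorted by date
(def tuples
  (->> data
       (map (juxt ffirst (comp first #(keep :meat-balance %) val first)))
       (sort-by first)))

;; Transducer from sorted tuples to period maps (hypothetical name)
(def neg-periods-xf
  (comp (partition-by #(neg? (second %)))
        (filter #(neg? (second (first %))))
        (map (fn [run]
               {:period-start (first (first run))
                :period-end (first (last run))
                :end-period-balance (second (last run))}))))

(into [] neg-periods-xf tuples)
;;=> [{:period-start :2007-08-06, :period-end :2007-08-07, :end-period-balance -70}]
```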