Read a list of records and perform ongoing calcula

2019-05-18 04:56发布

This is a playground example inspired by a real task (much more complicated) that I once did. The basic flow is that there is the reading of records from a sequential file. Some of the records contain commands that require examining the previous records to calculate a value.

Undesirable about this solution is that it requires an extra list thus extra duplicate storage. That extra list is called REMEMBER in the following code. This example has a simple record structure just containing one data value so duplicating everything in the REMEMBER list is not a real issue. But please assume the real task involves a much more complicated record structure such that duplicating everything in the REMEMBER list is very undesirable.

I am inclined to use a doubly linked list however per discussion at this link Doubly Linked List in Prolog it seems that is not the Prolog way to do things. Therefore I am interested to see what the Prolog way of doing things should be.

/*
A file contains sequential records.
There are two types of record.
A data record provides a data value.
An average record provides a count and is a request for an average of the last count data values.
The parse dcg below parses a list from the data file.
The report dcg uses that list to generate a report.

After parse the list looks like this:

[value=5.9,value=4.7,value=7.5,average=3,value=9.0,value=1.1,value=8.3,average=5,value=7.1,value=1.3,value=6.7,value=9.9,value=0.5,value=0.3,value=1.5,value=0.2,average=7,value=2.2,value=7.8,value=2.5,value=4.5,value=2.4,value=9.7,average=4,value=5.2,value=8.5,value=2.2,value=8.0,value=0.7].

An example report looks like this:

[[count=3,total=18.1,average=6.033333333333333],[count=5,total=30.599999999999998,average=6.12],[count=7,total=20.400000000000002,average=2.9142857142857146],[count=4,total=19.1,average=4.775]].
*/

:- use_module(library(dcg/basics)).
:- use_module(library(readutil)).
:- use_module(library(clpfd)).
:- use_module(library(clpr)).

dospy
:-
spy(report),
spy(average),
leash(-all).

:- initialization main.

report(LIST)
-->
% the report starts with nothing to REMEMBER.
report(LIST,[]).

report([value=VALUE|LIST],REMEMBER)
-->
% value in the LIST goes into REMEMBER.
report(LIST,[value=VALUE|REMEMBER]).

report([average=COUNT|LIST],REMEMBER)
-->
% request for average in the LIST.
average(REMEMBER,COUNT),
report(LIST,REMEMBER).

report([],_REMEMBER)
% the LIST is empty so the report is done.
--> [].

average(REMEMBER,COUNT)
-->
% the average starts at 0 then accumulates for COUNT values from REMEMBER.
average(REMEMBER,COUNT,0,0.0).

average([value=VALUE|REMEMBER],COUNT,AT,TOTAL)
-->
% found needed value in the REMEMBER.
clpfd( AT #\= COUNT ),
clpfd( AT_NEXT #= AT + 1 ),
clpr( TOTAL_NEXT = TOTAL + VALUE ),
average(REMEMBER,COUNT,AT_NEXT,TOTAL_NEXT).

average(_REMEMBER,COUNT,COUNT,TOTAL)
-->
% all of the needed value have been seen so calculate and add to report. 
clpr( AVERAGE = TOTAL / COUNT ),
[[count=COUNT,total=TOTAL,average=AVERAGE]].

% now the part that does the parse of the data file.

parse(LIST) --> parse(data,LIST).
parse(LIST) --> parse(average,LIST).
parse(LIST) --> parse(end,LIST).

parse(data,[value=FLOAT|LIST])
-->
"data", whites, float(FLOAT), blanks, !,
parse(LIST).

parse(average,[average=COUNT|LIST])
-->
"average", whites, integer(COUNT), blanks, !,
parse(LIST).

parse(end,[])
-->
[].

clpr( CLPR )
-->
{ clpr:{ CLPR } }.

clpfd( CLPFD )
-->
{ CLPFD }.

main :-
system:( read_file_to_codes("doubly_motivation_data.txt",CODES,[]) ),
prolog:( phrase(parse(LIST),CODES) ),
system:( writeq(LIST),writeln(.) ),
prolog:( phrase(report(LIST),REPORT) ),
system:( writeq(REPORT),writeln(.) ).

/* doubly_motivation_data.txt
data 5.9
data 4.7
data 7.5
average 3
data 9.0
data 1.1
data 8.3
average 5
data 7.1
data 1.3
data 6.7
data 9.9
data 0.5
data 0.3
data 1.5
data 0.2
average 7
data 2.2
data 7.8
data 2.5
data 4.5
data 2.4
data 9.7
average 4
data 5.2
data 8.5
data 2.2
data 8.0
data 0.7
*/

标签: list prolog
3条回答
We Are One
2楼-- · 2019-05-18 05:42

So, for starters, I noticed that you can build results in either direction. In other words, the classic "int list" grammar is something like this:

intlist([]) --> [].
intlist([X|Xs]) --> integer(X), whites, intlist(Xs).

This works like so:

?- phrase(intlist(X), "1 23 45 9").
X = [1, 23, 45, 9] ;

But you can flip it around so it parses the list backwards like so:

rintlist([]) --> [].
rintlist([X|Xs]) --> rintlist(Xs), whites, integer(X).

This works, ish:

?- phrase(rintlist(X), "1 23 45 9").
X = [9, 45, 23, 1] 

The problem with this is that putting the recursive call at the front, followed by something like "blanks" that can match empty lists is a recipe for a stack explosion. But, you can also parse things backwards by passing the "previous" state through the DCG itself:

rintlist(L) --> rintlist([], L).
rintlist(Prev, Prev) --> [].
rintlist(Prev, Last) --> integer(X), whites, rintlist([X|Prev], Last).

?- phrase(rintlist(X), "1 23 45 9").
X = [9, 45, 23, 1] .

Now, I think we can solve your problem nicely just from this; I wrote my solution and now see it is pretty similar to @PauloMoura's above, but here it is anyway:

commands(Report) --> record(data(V)), blanks, commands([V], _, Report).

commands(Prev, Prev, []) --> [].
commands(Prev, Last, Report) -->
    record(data(V)),
    blanks,
    commands([V|Prev], Last, Report).
commands(Prev, Last, [report(Count, Total, Avg)|Report]) -->
    record(average(N)),
    blanks,
    { calculate_average(N, Prev, Count, Total, Avg) },
    commands(Prev, Last, Report).

calculate_average(N, Prev, Count, Total, Avg) :-
    length(L, N),
    append(L, _, Prev),
    sumlist(L, Total),
    Avg is Total / N,
    Count = N.

This seems to give similar output to your example:

?- phrase_from_file(commands(C), 'mdata.txt'), write_canonical(C).
[report(3,18.1,6.033333333333334),
 report(5,30.599999999999998,6.119999999999999),
 report(7,20.400000000000002,2.9142857142857146),
 report(4,19.1,4.775)]

Now, expanding it to the doubly-linked list, let's first see what we would need to do to handle the "int list" grammar in a doubly-linked fashion. Much like this one, we have to pass a previous link forward into the recursive call, but making it a bit worse than this one, we need to fill in the "next" link in the previous variable we receive, with the current node. But because that link will be nil the first time, we have to have a bit of conditional logic to ignore that one. And I couldn't think of a sensible empty doubly-linked list, so I changed the base case to be [X] instead of []. So it gets a bit grotty.

% entry point (nil meaning there is no previous)
dlist(X) --> dlist(nil, X).

% base case: last integer
dlist(Prev, node(X, Prev, nil)) --> integer(X).
dlist(Prev, Last) -->
    integer(X), whites,
    {
     Prev = node(PV, PP, Cur)
     -> 
         Cur = node(X, node(PV, PP, Cur), _)
     ;
         Cur = node(X, Prev, _)
    },
    dlist(Cur, Last).

Note the self-reference in Cur = node(..., node(..., Cur), ...). This unification is what "ties the knot" as it were, between the previous link and this link. Let's try it:

?- phrase(dlist(L), "1 23 45 9").
L = node(9, _S2, nil), % where
    _S1 = node(1, nil, node(23, _S1, _S2)),
    _S2 = node(45, node(23, _S1, _S2), _71658) 

A bit hard to read, but basically, 9 points to 45 points to 23 points to 1. We parsed it back-to-front and wound up with pointers in both directions.

What remains to be done at this point is to change the parser to emit records with these pointers instead, and write an averager that works this way. I couldn't quite get there doing the average in-place, so I wrote a helper to give me "up to N previous" from a doubly-linked list:

take_upto(N, DL, Result) :- take_upto(N, 0, DL, [], Result).
take_upto(N, N, _, Result, Result).
take_upto(_, _, nil, Result, Result).
take_upto(N, I, node(V, Prev, _), Rest, Result) :-
    I < N,
    succ(I, I1),
    take_upto(N, I1, Prev, [V|Rest], Result).

This works like so:

?- phrase(dlist(X), "1 2 3 4 5 6 7 8 9 10"), take_upto(5, X, L).
X = node(10, _S2, nil), % where
   ... [trimmed]
L = [6, 7, 8, 9, 10] .


?- phrase(dlist(X), "1 2 3 4 5 6 7"), take_upto(15, X, L).
X = node(7, _S2, nil), % where
   ... [trimmed]
L = [1, 2, 3, 4, 5, 6, 7] .

With this utility in place, we can finish this out:

commandsdl(Report) --> commandsdl(nil, _, Report).
commandsdl(Prev, Prev, []) --> [].
commandsdl(Prev, Last, Report) -->
    record(data(V)),
    blanks,
    {
     Prev = node(PV, PP, Cur)
     ->
         Cur = node(V, node(PV, PP, Cur), _)
     ;
         Cur = node(V, Prev, _)
    },
    commandsdl(Cur, Last, Report).
commandsdl(Prev, Last, [report(Count, Total, Avg)|Report]) -->
    record(average(N)),
    blanks,
    {
       calculate_average_dl(N, Prev, Count, Total, Avg)
    },
    commandsdl(Prev, Last, Report).

calculate_average_dl(N, Prev, Count, Total, Avg) :-
    take_upto(N, Prev, L),
    length(L, Count),
    sumlist(L, Total),
    Avg is Total / Count.

Overall, I'm pleased I was able to make this work, but in this formulation you really don't need the "next" pointers in your doubly-linked list, so I would be inclined to just go for the list implementation above (or perhaps Paulo's implementation if I were looking at Logtalk). Hopefully this illustrates how you could do this with doubly-linked lists, if your actual problem necessitates it despite your model not really needing it. Hope it helps!

查看更多
男人必须洒脱
3楼-- · 2019-05-18 05:54

I would try to exploit DCG semicontext, as explained on metalevel.at page. An OT example, I think is easy to grasp, is this answer (solving a zebra like puzzle in a DCG).

hint1 -->
  kind(brad, K), {dif(K, wheat)}, topping(brad, plain), size(walt, small).
hint2 -->
  size(P1, medium), size(P2, medium), {P1 \= P2},
  flavor(P1, hazelnut), topping(P2, peanut_butter).
...

Hints access the context sharing 'by magic':

kind(P, K) --> state([P, K, _, _, _]).
topping(P, T) --> state([P, _, T, _, _]).
...

the DCG must be called in this way, providing the relevant initial state:

bagels(Sol):- Sol =
    [[brad,_,_,_,_],
     [walt,_,_,_,_],
    ...],
  phrase((hint1, hint2, hint3, hint4, hint5, hint6), [state(Sol)], _).

Now, for your applicative case, this is near to useless (you already solved, just in a verbose way). For start, I don't get why you perform a 2 pass algorithm. Consider how concise could be the code, that yields the very same results what you posted (just displayed differently), in a single pass, making use of library(aggregate) to perform arithmetic. BTW, why clpfd, clpr can count as well... are you really interested in services from CLP for such a simple task?

cc_main :-
    %system:( read_file_to_codes("doubly_motivation_data.txt",CODES,[]) ),
    codes(CODES),
    tokenize_atom(CODES, Tks),
    phrase(cc_report([],Rep), Tks),
    maplist(writeln, Rep).

cc_report(_,[]) --> [].
cc_report(R,Re) -->
    [data,N],
    cc_report([N|R],Re).
cc_report(R,[ave(Ave)=sum(Sum)/C|Re]) -->
    [average,C],
    {aggregate_all((sum(V),count),(
         % no need for doubly linked lists, just peek from stack...
         nth1(I,R,V),I=<C
       ),(Sum,Count)),Ave is Sum/Count},
    cc_report(R,Re).

yields:

?- cc_main.
ave(6.033333333333334)=sum(18.1)/3
ave(6.119999999999999)=sum(30.599999999999998)/5
ave(2.9142857142857146)=sum(20.400000000000002)/7
ave(4.775)=sum(19.1)/4
true .

Anyway, swipl addons offer some useful material. See for instance edgc, an extension to handle many multiple accumulators while doing several visits of the concrete syntax tree of Acquarius Prolog compiler - developed back into 90s.

查看更多
Lonely孤独者°
4楼-- · 2019-05-18 05:55

Follows a Logtalk + SWI-Prolog solution, which doesn't require any materialization of double-linked lists. Only a stack, trivially implemented using a list is required:

------------ reports.lgt ------------
% load the required modules
:- use_module(library(dcg/basics), []).

% ensure desired interpretation of double-quoted text
:- set_prolog_flag(double_quotes, codes).

% optimize the generated code
:- set_logtalk_flag(optimize, on). 


:- object(reports).

    :- public(process/2).

    :- uses(list, [take/3]).
    :- uses(numberlist, [sum/2]).
    :- uses(reader, [file_to_codes/2]).

    :- use_module(dcg_basics, [blanks//0, whites//0, integer//1, float//1]).

    process(File, Report) :-
        file_to_codes(File, Codes),
        phrase(report(Report), Codes).

    report(Report) -->
        data(Value),
        report(Report, [Value]).

    report([], _) -->
        eos.
    report([Record| Records], Stack) -->
        average(Count),
        {compute_record(Count, Stack, Record)},
        report(Records, Stack).
    report(Records, Stack) -->
        data(Value),
        report(Records, [Value| Stack]).

    average(Count) -->
        "average", whites, integer(Count), blanks.

    data(Value) -->
        "data", whites, float(Value), blanks.

    compute_record(Count, Stack, r(Count,Total,Average)) :-
        take(Count, Stack, Values),
        sum(Values, Total),
        Average is Total / Count.

:- end_object.
-------------------------------------

Sample call using the data file in the question:

?- {library(types_loader), library(reader_loader), reports}.
...

?- reports::process('doubly_motivation_data.txt', Report).
Report = [r(3, 18.1, 6.033333333333334), r(5, 30.599999999999998, 6.119999999999999), r(7, 20.400000000000002, 2.9142857142857146), r(4, 19.1, 4.775)] .

As you noticed, I use a more sensible representation for the report than a list of lists. A bit more efficient solution can be coded by combining the take/3 and sum/2 calls into a custom predicate by avoiding the stack prefix with length Count being traversed twice and creating a temporary list of values. For example:

compute_record(Count, Stack, r(Count,Total,Average)) :-
    compute_record(0, Count, Stack, 0, Total),
    Average is Total / Count.

compute_record(Count, Count, _, Total, Total) :-
    !.
compute_record(Count0, Count, [Value| Stack], Total0, Total) :-
    Count1 is Count0 + 1,
    Total1 is Total0 + Value,
    compute_record(Count1, Count, Stack, Total1, Total).

From the sample data file, it seems that the file can end with a request to compute the average of all values in the file. Thus, the report//2 non-terminal must keep the whole stack until all the data file is processed.

查看更多
登录 后发表回答