Twitter stream api with agents in F#

From Don Syme blog (http://blogs.msdn.com/b/dsyme/archive/2010/01/10/async-and-parallel-design-patterns-in-f-reporting-progress-with-events-plus-twitter-sample.aspx) I tried to implement a twitter stream listener. My goal is to follow the guidance of the twitter api documentation which says "that tweets should often be saved or queued before processing when building a high-reliability system".

So my code needs to have two components:

A queue that piles up and processes each status/tweet json
Something to read the twitter stream that dumps to the queue the tweet in json strings

I choose the following:

An agent to which I post each tweet, that decodes the json, and dumps it to database
A simple http webrequest

I also would like to dump into a text file any error from inserting in the database. ( I will probably switch to a supervisor agent for all the errors).

Two problems:

is my strategy here any good ? If I understand correctly, the agent behaves like a smart queue and processes its messages asynchronously ( if it has 10 guys on its queue it will process a bunch of them at time, instead of waiting for the 1 st one to finish then the 2nd etc...), correct ?
According to Don Syme's post everything before the while is Isolated so the StreamWriter and the database dump are Isolated. But because I need this, I never close my database connection... ?

The code looks something like:

let dumpToDatabase databaseName = 
   //opens databse connection 
   fun tweet -> inserts tweet in database

type Agent<'T> = MailboxProcessor<'T>



 let agentDump =
            Agent.Start(fun (inbox: MailboxProcessor<string>) ->
               async{
                   use w2 = new StreamWriter(@"\Errors.txt")
                   let dumpError  =fun (error:string) -> w2.WriteLine( error )
                   let dumpTweet =  dumpToDatabase "stream"
                   while true do 
                       let! msg = inbox.Receive()
                       try 
                           let tw = decode msg
                           dumpTweet tw
                       with 
                       | :? MySql.Data.MySqlClient.MySqlException as ex -> 
    dumpError (msg+ex.ToString() ) 
                        | _ as ex -> () 



                             }
                             )

    let filter_url = "http://stream.twitter.com/1/statuses/filter.json"
    let parameters = "track=RT&"
    let stream_url = filter_url

    let stream = twitterStream MyCredentials stream_url parameters


    while true do 
        agentDump.Post(stream.ReadLine())

Thanks a lot !

Edit of code with processor agent:

let dumpToDatabase (tweets:tweet list)= 
    bulk insert of tweets in database    

let agentProcessor = 
        Agent.Start(fun (inbox: MailboxProcessor<string list>) ->
           async{
               while true do 
                       let! msg = inbox.Receive()
                       try
                          msg
                          |> List.map(decode)
                          |> dumpToDatabase 
                        with
                        | _ as ex -> Console.WriteLine("Processor "+ex.ToString()))
                 }
                 )



let agentDump =
        Agent.Start(fun (inbox: MailboxProcessor<string>) ->
                  let rec loop messageList count = async{
                      try
                          let! newMsg = inbox.Receive()
                          let newMsgList = newMsg::messageList
                          if count = 10 then 
                               agentProcessor.Post( newMsgList )
                               return! loop [] 0
                          else                    
                               return! loop newMsgList (count+1)
                      with
                      | _ as ex -> Console.WriteLine("Dump "+ex.ToString())

                  }
                  loop [] 0)

let filter_url = "http://stream.twitter.com/1/statuses/filter.json"
let parameters = "track=RT&"
let stream_url = filter_url

let stream = twitterStream MyCredentials stream_url parameters


while true do 
    agentDump.Post(stream.ReadLine())

标签： api f# twitter stream agents

1条回答

不美不萌又怎样

2楼-- · 2019-05-23 08:27

I think that the best way to describe agent is that it is is a running process that keeps some state and can communicate with other agents (or web pages or database). When writing agent-based application, you can often use multiple agents that send messages to each other.

I think that the idea to create an agent that reads tweets from the web and stores them in a database is a good choice (though you could also keep the tweets in memory as the state of the agent).

I wouldn't keep the database connection open all the time - MSSQL (and MySQL likely too) implements connection pooling, so it will not close the connection automatically when you release it. This means that it is safer and similarly efficient to reopen the connection each time you need to write data to the database.
Unless you expect to receive a large number of error messages, I would probably do the same for file stream as well (when writing, you can open it, so that new content is added to the end).

The way queue of F# agents work is that it processes messages one by one (in your example, you're waiting for a message using inbox.Receive(). When the queue contains multiple messages, you'll get them one by one (in a loop).

If you wanted to process multiple messages at once, you could write an agent that waits for, say, 10 messages and then sends them as a list to another agent (which would then perform bulk-processing).
You can also specify timeout parameter to the Receive method, so you could wait for at most 10 messages as long as they all arrive within one second - this way, you can quite elegantly implement bulk processing that doesn't hold messages for a long time.

0人赞添加讨论(0) 举报

Twitter stream api with agents in F#

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间