Reading protobuf in Python: extracting data

Posted 2019-09-06 11:18

Question:

I am trying to work with data from Spinn3r, which is returned as a protobuf message. In Python, when I print the protobuf object, I get this:

print data
source {
  link {
    href: ""
    resource: ""
  }
  canonical_link {
    href: "http://twitter.com/_PattiShaw/statuses/28167079857225728"
    resource: ""
  }
  title: ""
  hashcode: ""
  lang {
    code: "en"
    probability: -1.0
  }
  generator: ""
  description: ""
  last_posted: ""
  last_published: ""
  date_found: ""
  publisher_type: "MICROBLOG"
}
feed {
  link {
    href: ""
    resource: ""
  }
  canonical_link {
    href: ""
    resource: ""
  }
  title: ""
  hashcode: ""
  lang {
    code: "en"
    probability: -1.0
  }
  generator: ""
  description: ""
  last_posted: ""
  last_published: ""
  date_found: ""
  etag: ""
  channel_link {
    href: ""
    resource: ""
  }
}
feed_entry {
  link {
    href: "http://twitter.com/_PattiShaw/statuses/28167079857225728"
    resource: "http://twitter.com/_PattiShaw/statuses/28167079857225728"
  }
  canonical_link {
    href: "http://twitter.com/_PattiShaw/statuses/28167079857225728"
    resource: "http://twitter.com/_PattiShaw/statuses/28167079857225728"
  }
  title: "The value of a man resides in what he gives and not in what he is capable of receiving. ~ Albert Einstein"
  hashcode: "8WhKLK9Lyng"
  lang {
    code: "en"
    probability: -1.0
  }
  author {
    name: "_PattiShaw (Patti Shaw)"
    email: ""
    link {
      href: "http://twitter.com/_PattiShaw"
    }
  }
  spam_probability: 0.0
  last_published: "2011-01-20T19:08:49Z"
  date_found: "2011-01-20T19:08:49Z"
  identifier: 1295550574016007548
  content {
    mime_type: "text/html"
    data: "x\332M\214\301\r\2000\014\304V\271\t`\201\n\211\007\033\260@B\003\215TR\324\226\362cv\020/\276\266\3459\010\032\305S\220V\020v2d)\352\245@\rW\240\212\267\330\264\275\300\361@\346]\317\003,\325\277\327\202\205\016\342\370m\262,\242Mm\353pc\214,\271bR+U\324\036\200\236&\363"
    encoding: "zlib"
  }
}
permalink_entry {
  link {
    href: "http://twitter.com/_PattiShaw/statuses/28167079857225728"
    resource: "http://twitter.com/_PattiShaw/statuses/28167079857225728"
  }
  canonical_link {
    href: "http://twitter.com/_PattiShaw/statuses/28167079857225728"
    resource: "http://twitter.com/_PattiShaw/statuses/28167079857225728"
  }
  title: "The value of a man resides in what he gives and not in what he is capable of receiving. ~ Albert Einstein"
  hashcode: "8WhKLK9Lyng"
  lang {
    code: "en"
    probability: -1.0
  }
  author {
    name: "_PattiShaw (Patti Shaw)"
    email: ""
    link {
      href: "http://twitter.com/_PattiShaw"
    }
  }
  spam_probability: 0.0
  last_published: "2011-01-20T19:08:49Z"
  date_found: "2011-01-20T19:09:34Z"
  identifier: 1295550574016007548
  content {
    mime_type: "text/html"
    data: ""
  }
  content_extract {
    mime_type: "text/html"
    data: ""
  }
  generator: ""
}

I want to extract the author's name from the feed_entry object. I tried this:

print data.feed_entry.author.name

I get the error:

AttributeError: 'RepeatedCompositeFieldContainer' object has no attribute 'name'

I tried just printing the author object to see what happens. This is what I got:

print data.feed_entry.author
[<spinn3rApi_pb2.Author object at 0x362e6d0>]

How do I extract this author name?

Answer 1:

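It looks like feed_entry.author is a repeated field (the RepeatedCompositeFieldContainer named in your error message), which behaves like a list. Note the square brackets in the printed output: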

[<spinn3rApi_pb2.Author object at 0x362e6d0>]

Indexing into it should solve your problem (this assumes there is at least one author; on an empty field, [0] raises an IndexError):

print data.feed_entry.author[0].name
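
If some entries may have no author, or more than one, a slightly more defensive version could look like the sketch below. It is written for Python 3 and uses plain stand-in objects instead of the real spinn3rApi_pb2 classes so that it runs on its own; make_entry, author_names and first_author_name are hypothetical helper names, not part of the Spinn3r API. The only behaviour it relies on is that a repeated field supports len(), indexing and iteration like a list.

```python
from types import SimpleNamespace


def make_entry(names):
    """Build a stand-in for a feed_entry message (hypothetical helper).

    A real entry would come from parsing the Spinn3r response; here the
    repeated `author` field is imitated with an ordinary list.
    """
    authors = [SimpleNamespace(name=n, email="") for n in names]
    return SimpleNamespace(author=authors)


def author_names(feed_entry):
    """Return the name of every author in the repeated `author` field."""
    return [a.name for a in feed_entry.author]


def first_author_name(feed_entry, default=None):
    """Return the first author's name, or `default` if there is no author."""
    # Check the length first so an empty field does not raise IndexError.
    if len(feed_entry.author) == 0:
        return default
    return feed_entry.author[0].name


entry = make_entry(["_PattiShaw (Patti Shaw)"])
print(first_author_name(entry))
print(author_names(entry))
print(first_author_name(make_entry([]), default="<no author>"))
```

With a real message you would pass data.feed_entry straight to these functions, since they only read the author field.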