Remove data from RRDTool

Posted 2019-04-30 15:31

Question:

I have several graphs created by RRDTool that collected bad data over a period of a couple of hours.

How can I remove the data for that time period from the RRDs so that it is no longer displayed?

Answer 1:

The best method I have found to do this:

  1. Use rrdtool dump to export the RRD file to XML.
  2. Open the XML file, then find and edit the bad data.
  3. Restore the RRD file using rrdtool restore.
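The three steps above can be scripted. The following is only a rough sketch, assuming the dump format shown elsewhere in this thread (each row preceded by a comment ending in `/ <unixtime> -->`) and assuming that overwriting the bad values with NaN (unknown) is an acceptable way of "removing" them. The names `blank_rows` and `clean_rrd` are hypothetical helpers, not part of rrdtool.

```python
import os
import re
import subprocess
import tempfile

# Matches the Unix timestamp in the comment that precedes each <row>.
ROW_TS = re.compile(r"/\s*(\d+)\s*-->\s*<row>")
VALUE = re.compile(r"<v>[^<]*</v>")


def blank_rows(xml_text, start, end):
    """Return the dump XML with every row in [start, end] set to NaN."""
    out = []
    for line in xml_text.splitlines(keepends=True):
        match = ROW_TS.search(line)
        if match and start <= int(match.group(1)) <= end:
            line = VALUE.sub("<v>NaN</v>", line)
        out.append(line)
    return "".join(out)


def clean_rrd(src, dst, start, end):
    """Dump src, blank the bad window, restore into dst (must not exist yet)."""
    dump = subprocess.run(["rrdtool", "dump", src], check=True,
                          capture_output=True, text=True)
    with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as tmp:
        tmp.write(blank_rows(dump.stdout, start, end))
    try:
        subprocess.run(["rrdtool", "restore", tmp.name, dst], check=True)
    finally:
        os.unlink(tmp.name)
```

A call such as `clean_rrd("foo.rrd", "foo_clean.rrd", 1413572640, 1413579840)` would write a cleaned copy and leave the original file untouched, so the result can be checked before swapping the files.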


Answer 2:

I had a similar problem where I wanted to discard the most recent few hours from my RRDtool databases, so I wrote a quick script to do it (apologies for the unconventional variable names - coding style inherited from work, sigh):

#!/usr/bin/env python3
"""
Modify XML data generated by `rrdtool dump` such that the last update was at
the unixtime specified (decimal). Data newer than this is simply omitted.

Sample usage::

    rrdtool dump foo.rrd \
       | python3 remove_samples_newer_than.py 1414782122 \
       | rrdtool restore - foo_trimmed.rrd
"""

import sys

assert sys.argv[1:], "Must specify maximum Unix timestamp in decimal"

iMaxUpdate = int(sys.argv[1])

for rLine in iter(sys.stdin.readline, ''):
    if "<lastupdate>" in rLine:
        # <lastupdate>1414782122</lastupdate> <!-- 2014-10-31 19:02:02 GMT -->
        _, _, rData = rLine.partition("<lastupdate>")
        rData, _, _ = rData.partition("</lastupdate")
        iLastUpdate = int(rData)
        # Only proceed if the RRD actually holds data newer than the cutoff.
        assert iLastUpdate > iMaxUpdate, "Last update in RRD older than " \
                                    "the time you provided, nothing to do"
        print("<lastupdate>{0}</lastupdate>".format(iMaxUpdate))
    elif "<row>" in rLine:
        # <!-- 2014-10-17 20:04:00 BST / 1413572640 --> <row><v>9.8244774011e+01</v>...</row>
        # Extract the Unix timestamp between "/" and "-->" in the row comment.
        rData, _, _ = rLine.partition("<row>")
        _, _, rData = rData.partition("/")
        rData, _, _ = rData.partition("--")
        rData = rData.strip()
        iUpdate = int(rData)
        # Keep only rows older than the cutoff.
        if iUpdate < iMaxUpdate:
            print(rLine, end="")
    else:
        # Pass every other line through unchanged.
        print(rLine, end="")

Worked for me. Hope it helps someone else.



Answer 3:

If you want to avoid writing and editing an XML file, since that can take a number of file I/O calls (depending on how much bad data you have), you can also read the entire RRD into memory using fetch and update the values in memory.

I did a similar task using Python + rrdtool, and I ended up doing the following:

  1. Read the RRD into memory as a dictionary.
  2. Fix the values in the dictionary.
  3. Delete the existing RRD file.
  4. Create a new RRD with the same name.
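The in-memory part of these steps might look roughly like the sketch below. It assumes the tuple layout returned by the Python rrdtool bindings' `rrdtool.fetch`, namely `((start, end, step), ds_names, rows)`, and it assumes that row `i` belongs to timestamp `start + (i + 1) * step`. The helper names are hypothetical. Note that fetch returns consolidated values, so a database rebuilt this way is not identical to the original, and counter-style data sources (such as DERIVE) would need extra handling.

```python
def rows_by_time(fetch_result):
    """Turn a fetch-style result into {timestamp: [values...]}."""
    (start, _end, step), _names, rows = fetch_result
    # Assumption: row i covers the interval ending at start + (i + 1) * step.
    return {start + (i + 1) * step: list(row) for i, row in enumerate(rows)}


def blank_range(table, bad_start, bad_end):
    """Replace every value inside [bad_start, bad_end] with None (unknown)."""
    for ts, row in table.items():
        if bad_start <= ts <= bad_end:
            table[ts] = [None] * len(row)
    return table


def update_args(table):
    """Build 'timestamp:v1:v2' strings; 'U' marks an unknown value."""
    args = []
    for ts in sorted(table):
        values = ":".join("U" if v is None else repr(v) for v in table[ts])
        args.append(f"{ts}:{values}")
    return args
```

After deleting the old file and recreating it with `rrdtool.create` (using the original DS and RRA definitions, which are not shown here), the strings from `update_args` could be passed to `rrdtool.update` in timestamp order.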


Answer 4:

The only one who proposed what exactly to edit was RobM. I tried his solution, and it did not work for me in rrdtool 1.4.7.

My database uses AVERAGE, MAX and MIN. It contains DERIVE, GAUGE and COMPUTED data sources. Intervals: second (70), minute (70), hour (25), day (367). My task: delete the most recent part of the data (typical reason: the clock was moved back).

I applied RobM's solution: I changed the last update to my new end time and deleted everything after it. The restored database seemed normal, but it did not accept new updates. I then examined a newly created empty database and found that it contained 70 second records filled with NaN, and the same for minute and hour.

So my working solution is this: if I delete records at the end of a period, I add the same number of NaN records at the beginning of that period, with correctly decreasing times. The exception is the daily records, which are only deleted, with nothing added. If a period becomes empty after the deletions, I fill it with NaN records ending at my new end time (rounded to the period boundary).
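The padding rule described above can be sketched as follows. This is only an illustration under assumptions: each period is represented as a list of `(timestamp, values)` pairs, oldest first, `step` is the spacing between rows in seconds, and rows at exactly the cutoff time are kept. The function name `trim_and_pad` is hypothetical, and the special case for daily records is not handled.

```python
def trim_and_pad(rows, cutoff, step):
    """Drop rows newer than cutoff, then pad the front with NaN rows
    so that the total number of rows stays the same."""
    kept = [(ts, vals) for ts, vals in rows if ts <= cutoff]
    removed = len(rows) - len(kept)
    if removed == 0:
        return list(rows)

    width = len(rows[0][1])  # number of data sources per row
    if kept:
        oldest = kept[0][0]
    else:
        # Everything was deleted: make the NaN rows end at the cutoff,
        # rounded down to a step boundary.
        oldest = (cutoff // step) * step + step

    # New rows go before the oldest kept row, with decreasing timestamps.
    padding = [(oldest - i * step, ["NaN"] * width)
               for i in range(removed, 0, -1)]
    return padding + kept
```

For ten rows at times 100 to 190 with a step of 10 and a cutoff of 150, the four newest rows are dropped and four NaN rows at times 60 to 90 are added in front, so the period still holds ten rows.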



Tags: rrdtool rrd