How can we filter out manipulations to the hives

6 posts / 0 new
Last post
Steven
How can we filter out manipulations to the hives

Hi guys,

I've been working on a data warehouse for some weeks now. I already have a lot of conversion stuff done. A lot of the info can be found here

The only thing I'm struggling with is filtering out the manipulations. As we all know when working on the bees we sometimes add a super, remove honey, ... These are all actions that have an impact on the weight.

Paul gave me a bit of code he used to filter out the manipulations when generating the graph for the hives on hivetool.net. I've worked the same way in the data warehouse but am bumping into changes that do not get catched with this algorithm.

to give an example:

In the query output below you can see that at some point in time the filtered weight goes from 13 to -61. In the other columns you can see that there's a large change but also a big gap in the time. Therefore it is not flagged as a manipulation by the algorithm.

I suspect some other manipulations higher up as well because of the 13 kg but lets ignore that for now.

mysql> select hive_id, hive_row_id, hive_observation_time_utc, hive_weight_kgs, hive_weight_kgs_delta , hive_manipulation_change_kgs, hive_weight_kgs_filtered from S_HIVE_DATA where hive_id = 10 and hive_row_id > 68800 order by hive_row_id limit 0,15;
+---------+-------------+---------------------------+-----------------+-----------------------+------------------------------+--------------------------+
| hive_id | hive_row_id | hive_observation_time_utc | hive_weight_kgs | hive_weight_kgs_delta | hive_manipulation_change_kgs | hive_weight_kgs_filtered |
+---------+-------------+---------------------------+-----------------+-----------------------+------------------------------+--------------------------+
| 10 | 68801 | 2014-04-15 21:05:00 | 74.98 | 0.00 | NULL | 13.31 |
| 10 | 68802 | 2014-04-15 21:10:00 | 74.93 | -0.05 | NULL | 13.26 |
| 10 | 68803 | 2014-04-15 21:15:00 | 74.93 | 0.00 | NULL | 13.26 |
| 10 | 68804 | 2014-04-15 21:20:00 | 74.93 | 0.00 | NULL | 13.26 |
| 10 | 68805 | 2014-04-15 21:25:00 | 74.93 | 0.00 | NULL | 13.26 |
| 10 | 68806 | 2014-04-20 18:55:00 | 0.00 | -74.93 | NULL | -61.67 |
| 10 | 68807 | 2014-04-20 19:00:00 | 0.23 | 0.23 | NULL | -61.44 |
| 10 | 68808 | 2014-04-20 19:05:00 | 54.98 | 0.00 | 54.75 | -61.44 |
| 10 | 68809 | 2014-04-20 19:10:00 | 73.89 | 0.00 | 18.91 | -61.44 |
| 10 | 68810 | 2014-04-20 19:15:00 | 73.84 | -0.05 | NULL | -61.49 |
| 10 | 68811 | 2014-04-20 19:20:00 | 73.80 | -0.04 | NULL | -61.53 |
| 10 | 68812 | 2014-04-20 19:25:00 | 73.80 | 0.00 | NULL | -61.53 |
| 10 | 68813 | 2014-04-20 19:30:00 | 73.80 | 0.00 | NULL | -61.53 |
| 10 | 68814 | 2014-04-20 19:35:00 | 73.80 | 0.00 | NULL | -61.53 |
| 10 | 68815 | 2014-04-20 19:40:00 | 73.80 | 0.00 | NULL | -61.53 |
+---------+-------------+---------------------------+-----------------+-----------------------+------------------------------+--------------------------+

The detailed query I used for debugging the manipulation calculation:

mysql> select a.hive_row_id,a.hive_weight_kgs, b.hive_weight_kgs,abs(a.hive_weight_kgs - b.hive_weight_kgs) as delta,timestampdiff(SECOND,b.hive_observation_time_utc,a.hive_observation_time_utc) as timedif, (a.hive_weight_kgs - b.hive_weight_kgs)/(timestampdiff(SECOND,b.hive_observation_time_utc,a.hive_observation_time_utc)/3600) as thresh from S_HIVE_DATA a, S_HIVE_DATA b where a.hive_id = 10 and a.hive_row_id between 68800 and 68820 and a.hive_id = b.hive_id and b.hive_row_id = a.hive_row_id -1;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 370846
Current database: centesb7_hivetool_converted

+-------------+-----------------+-----------------+-------+---------+------------+
| hive_row_id | hive_weight_kgs | hive_weight_kgs | delta | timedif | thresh |
+-------------+-----------------+-----------------+-------+---------+------------+
| 68800 | 74.98 | 74.98 | 0.00 | 300 | 0.000000 |
| 68801 | 74.98 | 74.98 | 0.00 | 300 | 0.000000 |
| 68802 | 74.93 | 74.98 | 0.05 | 300 | -0.600000 |
| 68803 | 74.93 | 74.93 | 0.00 | 300 | 0.000000 |
| 68804 | 74.93 | 74.93 | 0.00 | 300 | 0.000000 |
| 68805 | 74.93 | 74.93 | 0.00 | 300 | 0.000000 |
| 68806 | 0.00 | 74.93 | 74.93 | 423000 | -0.637702 |
| 68807 | 0.23 | 0.00 | 0.23 | 300 | 2.760000 |
| 68808 | 54.98 | 0.23 | 54.75 | 300 | 657.000003 |
| 68809 | 73.89 | 54.98 | 18.91 | 300 | 226.920001 |
| 68810 | 73.84 | 73.89 | 0.05 | 300 | -0.600000 |
| 68811 | 73.80 | 73.84 | 0.04 | 300 | -0.480000 |
| 68812 | 73.80 | 73.80 | 0.00 | 300 | 0.000000 |
| 68813 | 73.80 | 73.80 | 0.00 | 300 | 0.000000 |
| 68814 | 73.80 | 73.80 | 0.00 | 300 | 0.000000 |
| 68815 | 73.80 | 73.80 | 0.00 | 300 | 0.000000 |
| 68816 | 73.80 | 73.80 | 0.00 | 300 | 0.000000 |
| 68817 | 73.80 | 73.80 | 0.00 | 300 | 0.000000 |
| 68818 | 73.80 | 73.80 | 0.00 | 300 | 0.000000 |
| 68819 | 73.80 | 73.80 | 0.00 | 300 | 0.000000 |
| 68820 | 73.80 | 73.80 | 0.00 | 300 | 0.000000 |
+-------------+-----------------+-----------------+-------+---------+------------+
21 rows in set (14.22 sec)

You can see we lost track of the hive for 5 days. Therefore the algorithm doesn't see it.

Maybe we should include a maximum gap in the data. For instance: When gap larger then a day, start calculating from the hive weight again instead of using the filtered value.

Any ideas or comments higly appreciated.

Steven
advanced procedure

Hi I changed the procedure so:
- the filtered weight is reset to the hive weight when there is a gap of 24 hours in the data
- the filtered weight is reset to the hive weight when the hive weight is < 1 kg

I think i will include some code that skips 5 records when one of these two conditions is met. Just to be sure we're back in normal operation.

Esa
Esa's picture
Hi

Hi

in telecommunications network monitoring these equipment manipulations are tagged with a "Maintenance mode" status, during which times events, alarms and measurements are omitted in the processing until that status is turned off again. The maintenance mode could be scheduled, or set manually on, or triggered by a special condition met.

We could add a button (hw and/or sw) to set maintenance mode on for the hivetool, or via web interface to schedule the maintenance,
what do you think ?

Steven
Hi Esa,

Hi Esa,

We've indeed talked about that before. The fact that you propose it as well means it has value. I'm not sure it will be taken in on the new board design. I guess I will post this idea in the Hardware page.
We also played with the idea of setting these maintenance windows via an app in which you can directly write down what you did. And note the condition of the hive for instance.

On the second hand. Now we have data from the last 3 years I guess and ofcource there are always some glitches. Some sensors misbehaved, manipulations, test setups, ...

I try to find some algorithms to filter out these kind of things. But my last idea is to also fill a table with a reference to the records where I noticed a glitch. This can be useful to check if it really is a glitch, or fine tune the algorithm.

Paul
Paul's picture
glitch table

Yes, I think this is what we are lacking. I think this comes under this heading of meta data. This information comes from two sources.:

1. Hive computer/beekeeper
2. Data warehouse/data scrubbing.

The first line of defense is the beekeeper as he knows what he did and should know if a sensor is not working. I found that it was easy to make a one line entry (a comment text input box on the hive computer graph page) on the day I worked the hive. Also, very useful as I loose and forget notebooks. Any comments that were made on a date within the range of dates that were graphed, were listed in a table under the graph.

The data needs to be checked when it goes to the data warehouse before it is released to researchers. I guess the same meta data table could be used to mark and comment a questionable event.

In comment#3, "code that skips 5 records when one of these two conditions is met". If we had a maintenance push button, then maybe skip a few records either site of when the button was pressed?

Paul
Paul's picture
What happened

I can tell you what happened, but I don't know how to fix it.

For some reason (GFI kicked out, or power glitch) the computer died. When restarted 5 days later, the hive was removed and zero checked or it was zeroed (the Adam Equipment scales would zero when powered up so you would have to remove the hive and re-zero each time there was a power failure). At 68807 and 08 I was putting the hive back together. The scale drifts a little when turned on. After it settled down, there was a loss of about 1.1 lbs in 5 days or about .2 lbs per day which would be consistent with metabolic loss.

If records 68806-8 are deleted, then dw/dt should be small and the 1.1 delta between 68805 and 68809 would not be flagged as a manipulation change. But, in the graphing software I use, a line would be drawn between 68805 and 68809, as it doesn't know that there is data missing. If there was a null record with a date/time between 68805 and 68809, then I think the graph would leave a blank during that time.

Log in to post comments