
The Data Analysis Framework (DAF) is an object-oriented data engineering framework for the analysis of IoT data. The framework was created for the analysis of livestock farm data and for automated data ingestion and cleaning. It allows the creation of reports like the daily report above. (The name of the client and the location of the barn have been removed for privacy reasons.)
Context
DAF is a data extraction and preparation framework. It is meant to assist in building data science applications and dashboards by providing a simplified layer that deals with the underlying infrastructure and abstracts many of the tasks that a data analysis project would have to build for every application.
The data that this framework deals with comes from a robotic IoT device that operates in poultry farms. An older video can be seen here. Although the framework has many generalised classes and methods, there are underlying concepts that permeate the code.
Meat chicken production is usually done in “batches”, called production cycles or flocks, that last between 36 and 45 days depending on the market (whole birds vs parts, for example) and the target weight.
The data is generated during this time by a moving IoT device with a number of sensors, including AI-enabled cameras. Measurements of 13 “observables” (e.g. temperature or dead birds) are collected; these measurements are stored on AWS S3 and then made available (typically) through an API. The data has a standard, stand-alone format. A set of measurements is called an “observation” in this context.
The code was developed as an object-oriented version of an earlier code base written in R.
Problem
While DAF is meant to be a general-purpose framework for data generated by this and similar IoT devices, the aim described here is the efficient generation of a “daily report”, essentially a dashboard that clients typically receive early in the morning. This dashboard (which is sent by email) summarises the conditions in the barn and allows farmers to decide whether any immediate action has to be taken and what to pay attention to during the day.
The dashboard uses 12 observables with measurements taken in the last 24h before generation of the report. It uses a variety of ways of displaying this data. An example is shown below.
The report needs to be generated for all clients in different languages at a configurable time (in their respective time zones).
Solution
Overview of the object model
The code is organised into four packages for the object layer and one package for the business layer. (Automated) tests are kept in a final package. For i18n localisation, labels are kept in the folder locales.
The main classes are kept in the package observation. The package helper holds “secondary” classes, i.e. those that are necessary to interpret observations. The package util holds classes that populate observations. The package plot contains a wrapper around Plotly plot types, adapted to the needs of e.g. the daily report. Finally, reports contains the daily report class.

observation
The central class is Observation. This class loads a set of measurements, allows them to be manipulated and can also save changes if so desired. Observation has standard methods for filtering and cleaning the data, which can be either configured or overridden. It also offers many convenience methods for data manipulation.
A collection of Observation objects can be instantiated automatically by using the class Observation. The collection then holds the Observation objects and allows them to be manipulated together.
Some observables are calculated from a combination of observables (like Humidex from temperature and relative humidity). For this purpose there is the abstract class DerivedObservation. Currently there are four subclasses that implement DerivedObservation.
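The idea of a derived observable can be sketched as follows. This is a minimal illustration, not DAF's actual code: the method name compute, the attribute required and the Humidex class layout are hypothetical, and the formula is the commonly used approximation of humidex from air temperature and relative humidity, which may differ from what DAF implements.

```python
from abc import ABC, abstractmethod
from typing import Mapping, Tuple


class DerivedObservation(ABC):
    """Hypothetical base class: an observable computed from other observables."""

    # Names of the observables needed as input (illustrative attribute).
    required: Tuple[str, ...] = ()

    @abstractmethod
    def compute(self, values: Mapping[str, float]) -> float:
        """Return the derived value for one set of input measurements."""


class Humidex(DerivedObservation):
    """Humidex from air temperature (deg C) and relative humidity (%)."""

    required = ("temperature", "relative_humidity")

    def compute(self, values: Mapping[str, float]) -> float:
        t = values["temperature"]
        rh = values["relative_humidity"]
        # Approximate water vapour pressure in hPa (assumed formula).
        vapour_pressure = 6.112 * 10 ** (7.5 * t / (237.7 + t)) * rh / 100.0
        return t + 5.0 / 9.0 * (vapour_pressure - 10.0)


humidex = Humidex().compute({"temperature": 30.0, "relative_humidity": 70.0})
print(round(humidex, 1))
```

Keeping the formula inside a subclass means that a new derived observable only has to declare its inputs and implement one method.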
There is also ObservationIndex which is a utility class to calculate the quotient between two observables. While this could be implemented with a DerivedObservation, this convenience class also provides a way to scale the values of the quotient. Two specific indices extend the ObservationIndex.
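A scaled quotient of two observables might look roughly like the sketch below. The constructor argument scale, the method name calculate and the handling of missing values are assumptions made for illustration; the real ObservationIndex may work differently.

```python
from typing import List, Optional, Sequence


class ObservationIndex:
    """Hypothetical sketch: scaled quotient of two observables."""

    def __init__(self, scale: float = 1.0) -> None:
        self.scale = scale

    def calculate(
        self,
        numerator: Sequence[Optional[float]],
        denominator: Sequence[Optional[float]],
    ) -> List[Optional[float]]:
        if len(numerator) != len(denominator):
            raise ValueError("both observables need the same number of measurements")
        result: List[Optional[float]] = []
        for num, den in zip(numerator, denominator):
            # A missing value or a zero denominator yields a missing index value.
            if num is None or den is None or den == 0:
                result.append(None)
            else:
                result.append(self.scale * num / den)
        return result


index = ObservationIndex(scale=100.0)
print(index.calculate([1.0, 2.0, 3.0], [2.0, 0.0, None]))
```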
StatisticalToolbox collects methods for imputation.

helper
The helper package contains a set of classes to “operate” the main classes. This includes classes like APIKey, which manages access to the API via API keys and offers different ways of obtaining them (file-based or keyring-based).
The package also contains Configuration, which is a wrapper around the multi-level configuration file(s) of the framework. Configuration provides convenient access to the properties of the IoT device and the farm.
ProductionCycles and DownTimes are wrappers around a list of available production cycles and the planned down times of the IoT device(s). Circuit represents the circuit the IoT devices move on and translates robot coordinates to real-world coordinates.
Status provides methods to “guess” the operational status of the IoT device (as taken from the data).
Targets is a wrapper around a daily target for environmental observables. LightProgramme is a wrapper around the dark and light periods in the barn, i.e. it makes it possible to determine whether the lights in the barn were on or off at a particular moment.
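The kind of lookup that LightProgramme offers can be sketched like this. The representation of the programme as a list of dark periods and the method name lights_on are assumptions for illustration only; the sketch mainly shows how a dark period that runs past midnight can be handled.

```python
from datetime import datetime, time
from typing import List, Tuple


class LightProgramme:
    """Hypothetical sketch: answers whether the barn lights were on."""

    def __init__(self, dark_periods: List[Tuple[time, time]]) -> None:
        # Each dark period is (start, end); the end is exclusive.
        self.dark_periods = dark_periods

    @staticmethod
    def _contains(start: time, end: time, t: time) -> bool:
        if start <= end:
            return start <= t < end
        # The period wraps past midnight, e.g. 22:00 to 04:00.
        return t >= start or t < end

    def lights_on(self, moment: datetime) -> bool:
        t = moment.time()
        return not any(self._contains(s, e, t) for s, e in self.dark_periods)


programme = LightProgramme([(time(22, 0), time(4, 0))])
print(programme.lights_on(datetime(2024, 1, 1, 23, 30)))  # inside the dark period
print(programme.lights_on(datetime(2024, 1, 1, 12, 0)))   # outside the dark period
```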

util
The util package contains a number of classes covering different topics. The most important is the ReaderFactory and its associated services. The ReaderFactory abstracts the data storage. Currently, there are services for CSV, Excel and XML files as well as a service for downloading the data through the API or from S3.
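The factory idea can be illustrated with a small sketch. The methods register and get_reader, the function read_csv_text and the sample columns are hypothetical and only show how a storage type could be mapped to a reader service; DAF's own interface is not shown here.

```python
import csv
import io
from typing import Callable, Dict, List

Reader = Callable[[str], List[dict]]


class ReaderFactory:
    """Hypothetical sketch: maps a storage type to a reader service."""

    def __init__(self) -> None:
        self._services: Dict[str, Reader] = {}

    def register(self, source_type: str, service: Reader) -> None:
        self._services[source_type] = service

    def get_reader(self, source_type: str) -> Reader:
        try:
            return self._services[source_type]
        except KeyError:
            raise ValueError(f"no reader service registered for {source_type!r}")


def read_csv_text(text: str) -> List[dict]:
    """Illustrative CSV service: parses CSV text into one dict per row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


factory = ReaderFactory()
factory.register("csv", read_csv_text)
rows = factory.get_reader("csv")("timestamp,temperature\n2024-01-01T06:00,21.5\n")
print(rows)
```

Code that consumes the data only asks the factory for a reader, so a further storage type can be added by registering one more service.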
In order to use DAF with Docker, certain configuration files need to be copied into the container and changes persisted on S3. This is handled by PersistCriticalFilesOnS3.
Sending emails from templates is handled by AWSEmail.
Finally, to access environment variables in .env files, there is the class LoadEnv. ExecUtils provides some utility methods to run reports, such as obtaining command line arguments and timing the execution of reports.
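Utilities of the kind ExecUtils provides could look like the following sketch. The function names parse_report_args and timed as well as the --client and --language options are made up for illustration and are not DAF's real command line interface.

```python
import argparse
import time
from typing import Any, Callable, List, Optional, Tuple


def parse_report_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse (hypothetical) command line arguments for a report run."""
    parser = argparse.ArgumentParser(description="Run a report (illustrative).")
    parser.add_argument("--client", required=True, help="client identifier")
    parser.add_argument("--language", default="en", help="report language")
    return parser.parse_args(argv)


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call func and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


args = parse_report_args(["--client", "demo"])
result, elapsed = timed(sum, [1, 2, 3])
print(args.client, args.language, result, f"{elapsed:.6f}s")
```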

plot
The plot package contains classes that preconfigure different plot types used in reports. The goal is to generate a complex (sub)plot with a maximum of 2-3 lines of code.
The classes are based on the graphics library plotly which generates interactive graphics (when generated as HTML).
All plots extend PlotObject, which helps with consistent formatting and provides methods to save plots in the different formats supported by plotly.
Currently six plots of differing degrees of complexity are implemented. The simplest is TablePlot, which has convenience methods for standard blocks used in the daily report. The most complex plot is Gauge, which generates gauges for 1-3 observations.

Key features
- Highly configurable
- Localised
- Can deal with SI units and with imperial units
- Provides state-of-the-art data cleaning methods like smoothing with exponentially weighted moving averages
- Provides an updated data set in 3 lines of code
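To make the smoothing mentioned in the list above concrete, here is a minimal sketch of an exponentially weighted moving average in plain Python. The function name ewma and the parameter alpha are illustrative; this is the simple recursive form of the technique and not necessarily the variant DAF uses.

```python
from typing import List, Sequence


def ewma(values: Sequence[float], alpha: float) -> List[float]:
    """Smooth values with s[0] = x[0], s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    smoothed: List[float] = []
    for x in values:
        if not smoothed:
            # The first smoothed value is the first measurement itself.
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


print(ewma([0.0, 10.0, 10.0], alpha=0.5))  # [0.0, 5.0, 7.5]
```

A smaller alpha gives more weight to past values and therefore a smoother curve; alpha = 1 returns the measurements unchanged.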
More information
While the code for DAF resides on Bitbucket, it is not open source. For specific inquiries, please contact the author.