Crop Health API

Early disease detection in crops using open data and machine learning

Data sources

The API serves predictions from a machine learning model given close-up images of crops. The training data consists of labeled images from the Harvard Dataverse, covering crops that are pivotal to agriculture in sub-Saharan Africa, such as maize, cassava, beans, cocoa, and bananas. Each image is labeled either "healthy" or with one of several diseases. In total, approximately 120,000 labeled images were used for training.

The nine datasets used, all available on the Harvard Dataverse, are: Spectrometry Cassava Dataset, Cassava Dataset Uganda, Maize Dataset Tanzania, Maize Dataset Namibia, Maize Dataset Uganda, Beans Dataset Uganda, Bananas Dataset Tanzania, Cocoa Dataset, and KaraAgro AI Maize Dataset. All datasets are licensed under the Creative Commons 1.0 DEED license.

Processing

Each dataset is downloaded programmatically from the Harvard Dataverse using HTTP requests. A dataset is split across several archive files, either ZIP or RAR, which are unpacked and processed to remove any corrupted images.
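As a rough illustration of this step, the sketch below downloads a single ZIP archive and drops unreadable images. The Dataverse file URL and local paths are placeholders, and the real pipeline also handles RAR archives and iterates over all nine datasets.

import io
import zipfile
from pathlib import Path

import httpx
from PIL import Image

# Placeholder Dataverse file URL; the real pipeline resolves the file IDs
# for each of the nine datasets.
DATAFILE_URL = "https://dataverse.harvard.edu/api/access/datafile/XXXXXXX"

# Download one archive over HTTP
with httpx.Client(follow_redirects=True, timeout=60.0) as client:
    response = client.get(DATAFILE_URL)
    response.raise_for_status()

# Unpack the ZIP archive (RAR archives need a separate library)
with zipfile.ZipFile(io.BytesIO(response.content)) as archive:
    archive.extractall("data/raw")

# Remove any image that cannot be parsed, i.e. a corrupted file
for path in Path("data/raw").rglob("*.jpg"):
    try:
        with Image.open(path) as img:
            img.verify()
    except Exception:
        path.unlink()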

A CSV file containing the metadata (label and dimensions) for each image is generated by walking the directory structure of the dataset and writing one row per image. This file is used during training to match each image with its label. Training then consists of loading the images and labels from the CSV file and fine-tuning a pre-trained ResNet model in PyTorch, as sketched below.
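The following is a condensed sketch of these two steps, assuming images are stored under one directory per label (data/raw/<label>/...). The column names, directory layout, and ResNet-18 variant are illustrative; the actual training code lives in the crop-health-model repository.

import csv
from pathlib import Path

from PIL import Image
import torch.nn as nn
from torchvision import models

# Walk the unpacked dataset and write one metadata row per image
with open("metadata.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["path", "label", "width", "height"])
    for path in Path("data/raw").rglob("*.jpg"):
        label = path.parent.name  # assumes the directory name encodes the label
        with Image.open(path) as img:
            width, height = img.size
        writer.writerow([str(path), label, width, height])

# Fine-tune a pre-trained ResNet by replacing its final layer with one
# sized to the number of classes (e.g. 13 for the single-HLT model)
num_classes = 13
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, num_classes)
# ...then train with a standard PyTorch loop over the rows in metadata.csv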

Models

The code for the models can be found in the crop-health-model repository.

The API provides three models, which differ in the number of classes they predict:
  • Binary model: predicts whether a crop is healthy or diseased (two classes).
  • Single-HLT model: a multiclass model with a single healthy (HLT) class and several disease classes.
  • Multi-HLT model: a multiclass model with multiple healthy (HLT) classes and several disease classes.
The key difference between the single-HLT and multi-HLT models is that only the multi-HLT model has a healthy class for each crop type. The different classes for each model are as follows:
  • Binary model (2): HLT (healthy), NOT_HLT
  • Single-HLT model (13): HLT, CBSD (Cassava Brown Streak Disease), CMD (Cassava Mosaic Disease), MLN (Maize Lethal Necrosis), MSV (Maize Streak Virus), FAW (Fall Armyworm), MLB (Maize Leaf Blight), BR (Bean Rust), ALS (Angular Leaf Spot), BS (Black Sigatoka), FW (Fusarium Wilt Race 1), ANT (Anthracnose), CSSVD (Cocoa Swollen Shoot Virus Disease)
  • Multi-HLT model (17): HLT_cassava, CBSD_cassava, CMD_cassava, MLN_maize, HLT_maize, MSV_maize, FAW_maize, MLB_maize, HLT_beans, BR_beans, ALS_beans, HLT_bananas, BS_bananas, FW_bananas, HLT_cocoa, ANT_cocoa, CSSVD_cocoa

The response from each model is a JSON object mapping the model's classes to confidence scores. A confidence score is a value between 0 and 1 indicating the model's confidence in that class, and the scores are normalized to sum to 1.
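For illustration, a parsed binary-model response has the following shape; the scores here are invented and only meant to show values that sum to 1.

# Illustrative parsed response from the binary endpoint (values are invented)
data_binary = {
    "HLT": 0.91,      # confidence that the crop is healthy
    "NOT_HLT": 0.09,  # confidence that the crop is diseased
}
assert abs(sum(data_binary.values()) - 1.0) < 1e-6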

Note that the models only accept images with three channels (RGB) and do not accept images with an alpha channel (RGBA). Before being passed to the model, images are resized to 256x256 pixels using bilinear interpolation, center cropped to 224x224 pixels, and normalized using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].
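In torchvision terms, this preprocessing corresponds roughly to the sketch below; the exact pipeline used by the service may differ, but the parameters match those listed above.

from PIL import Image
from torchvision import transforms
from torchvision.transforms import InterpolationMode

# Bilinear resize to 256x256, center crop to 224x224, then normalize
preprocess = transforms.Compose([
    transforms.Resize((256, 256), interpolation=InterpolationMode.BILINEAR),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# The models expect three-channel RGB input, so convert before transforming
image = Image.open("cocoa.jpg").convert("RGB")
tensor = preprocess(image).unsqueeze(0)  # shape: (1, 3, 224, 224)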

Examples

Example 1

Retrieving the binary model's crop health prediction for a given image using JavaScript.

const fs = require("fs");
// Load fetch dynamically so the snippet also works in CommonJS scripts;
// on Node 18+ the built-in global fetch can be used instead
const fetch = import("node-fetch").then((mod) => mod.default);

const imageData = fs.readFileSync("cocoa.jpg");

// Get the binary model prediction for image cocoa.jpg
// passed as a binary file in the request body
fetch.then(async (fetch) => {
	const response_binary = await fetch(
		"https://api-test.openepi.io/crop-health/predictions/binary",
		{
			method: "POST",
			body: imageData,
		}
	);
	const data_binary = await response_binary.json();
	// Print the prediction for the healthy class
	console.log(data_binary.HLT);
});

Example 2

Retrieving the single-HLT model's crop health prediction for a given image using Python.

from httpx import Client

# Open the image file as a binary file
with open("cocoa.jpg", "rb") as image_file:
    image_bytes = image_file.read()

with Client() as client:
    # Get the single-HLT model prediction for image cocoa.jpg 
    # passed as a binary file in the request body
    response_single = client.post(
        url="https://api-test.openepi.io/crop-health/predictions/single-HLT",
        content=image_bytes,
    )

    data_single = response_single.json()
    # Print the prediction for the CBSD class
    print(data_single["CBSD"])

Example 3

Retrieving the multi-HLT model's crop health prediction for a given image using JavaScript.

const fs = require("fs");
const fetch = import("node-fetch").then((mod) => mod.default);

const imageData = fs.readFileSync("cocoa.jpg");

// Get the multi-HLT model prediction for image cocoa.jpg
// passed as a binary file in the request body
fetch.then(async (fetch) => {
	const response_multi = await fetch(
		"https://api-test.openepi.io/crop-health/predictions/multi-HLT",
		{
			method: "POST",
			body: imageData,
		}
	);
	const data_multi = await response_multi.json();
	// Print the prediction for the MLN_maize class
	console.log(data_multi.MLN_maize);
});