API incidents

2024-07-09 - Outage of Plant.id API

Between 15:30 and 15:45 (UTC) our APIs experienced an outage due to problems with the messaging service.

2024-07-09 15:45 [fixed] Developers fixed the issue.

2024-07-09 15:30 We are still experiencing issues with our RabbitMQ cluster. We "lost" around 2000 identifications.
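
Since identifications submitted during a short outage like this one can be lost, API clients may want to retry failed requests. The following is only a minimal sketch of retrying with exponential backoff, not an official client: `TransientError`, `backoff_delays`, `call_with_retry` and the `send` callable are hypothetical names, and which failures count as retryable is an assumption the caller has to decide.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. timeouts, HTTP 5xx)."""


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Seconds to wait before each retry: base, 2*base, 4*base, ... limited to cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def call_with_retry(
    send: Callable[[], T],
    retries: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(); on TransientError wait and try again, up to `retries` retries."""
    for delay in backoff_delays(retries, base, cap):
        try:
            return send()
        except TransientError:
            sleep(delay)
    # Final attempt: any error now propagates to the caller.
    return send()
```

Here `send` would wrap the actual identification request and raise `TransientError` on failures the caller considers temporary. With the defaults, the request is attempted up to six times, waiting 1, 2, 4, 8 and 16 seconds between attempts.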


2024-07-06 - Outage of Plant.id API

Between 15:05 and 15:30 (UTC) our APIs experienced an outage due to problems with the messaging service.

2024-07-06 15:30 [fixed] Developers fixed the issue.

2024-07-06 15:05 [investigating] The RabbitMQ cluster lost the message queue reserved for Plant.id, preventing all identifications from reaching our models. We "lost" around 3000 identifications.


2024-07-03 - Outage of Kindwise APIs

Between 12:47 and 13:46 our APIs experienced an outage due to problems with the messaging service.

2024-07-03 13:46:31 [fixed] Developers fixed the issue.

2024-07-03 12:50:25 [investigating] The RabbitMQ cluster is not available, preventing all identifications from reaching our models.


2024-03-17 - Outage of Plant.id API

2024-03-17 14:03:31 [investigating] Plant.id API has been down for a few minutes (since 13:42:55 UTC). We are investigating the cause.

2024-03-17 14:45 [fixed] Developers fixed the issue.

2024-03-18 13:50 [analysis] The cause was a combination of three factors. The first was a CPU-inefficient dump of analytical data into the analytics database, which delayed the processing of identifications. The second was a design flaw in the internal transaction system, which started a feedback loop. The third was a process limit on the web server that was configured too low.

Two of these factors have already been fixed today. We plan to address the last one by the end of this week. We are also considering adding more CPU cores to the affected machine.


2023-11-24 - Outage of "similar images" functionality

2023-11-24 15:16 [fixed] The issue has been fixed. The outage lasted approximately 4 hours. We will cover this part of the response with integration tests to mitigate the issue in the future.

2023-11-24 14:53 [investigating] A few minutes ago we received a report that similar images from the plant health assessment service are not available in API responses. We confirm this report and are currently investigating the cause.


2023-08-09 - Plant.id API outage

2023-08-09 12:15 [resolved] Connectivity is stable again and all services are working correctly.

2023-08-09 11:25 [update] The cause of the outage is a networking problem at our cloud service provider (incident report).

2023-08-09 10:30 [investigating] We are experiencing connectivity issues with our DB and our model workers. Most identifications are rejected or fail.


2023-05-21 - Slow identification for 0.05% of traffic

2023-05-26 10:01 [investigating] One of our GPU workers sometimes mysteriously freezes and causes a tiny amount of traffic to be inefficiently requeued, making some requests take much longer than usual (sometimes even a minute or two).

Note: The new efficient requeueing system is already developed and we are planning to release it during the summer.


2022-07-07 - Plant.id API outage

2022-07-07 11:18 [update] We switched back to the boosted database and turned on additional notifications to prevent this issue in the future.

2022-07-07 11:04 [resolved] We switched to the recovery database. Plant.id is working again.

2022-07-07 10:20 [update] We are increasing the database space and creating a new recovery database.

2022-07-07 10:01 [investigating] Our database ran out of storage space and switched to read-only mode. Identification stopped working.
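
The kind of storage notification mentioned in this incident can be illustrated with a small threshold check. This is only a sketch under stated assumptions, not a description of our actual monitoring: the function names and the 80% / 95% thresholds are hypothetical, and `check_path` reads a local filesystem via Python's `shutil.disk_usage`, whereas a managed database would normally expose its usage through the provider's own metrics.

```python
import shutil
from typing import Optional


def usage_fraction(used_bytes: int, total_bytes: int) -> float:
    """Fraction of storage in use, between 0.0 and 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def storage_alert(
    used_bytes: int,
    total_bytes: int,
    warn_at: float = 0.80,      # assumed threshold, tune per system
    critical_at: float = 0.95,  # assumed threshold, tune per system
) -> Optional[str]:
    """Return "critical", "warning", or None depending on how full storage is."""
    fraction = usage_fraction(used_bytes, total_bytes)
    if fraction >= critical_at:
        return "critical"
    if fraction >= warn_at:
        return "warning"
    return None


def check_path(path: str = "/") -> Optional[str]:
    """Apply the same thresholds to the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return storage_alert(usage.used, usage.total)
```

Running `check_path` on a schedule and sending a notification whenever it returns a non-`None` level would give a warning well before the storage fills up completely.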


2021-08-19 - Plant.id API outage

2021-08-19 06:09 [resolved] The migration took almost 2 hours.

2021-08-19 04:10 [investigating] Our database node stopped responding. Traffic could not be automatically routed to a dedicated secondary node, for reasons we are still investigating. Our engineering team immediately contacted the cloud provider. We decided to migrate the DB to another cluster.