Humans, Robots, and Kubernetes

2022/05/16

Alex Miłowski


Humans and robots in the loop

As a company with scientists and staff working with physical objects, running experiments, and using robots, our systems are neither completely computational nor always automated. Sometimes an actual human needs to walk a sample from one machine to another to start a process. Robots may be good at automating their tasks, but they also need to be monitored for their health and given daily instructions.

In machine learning systems with human computation, we often talk about processes or workflows as having a human in the loop. Here at MicroByre, we are extending that to humans and robots in the loop, where the non-computational components may very well be tasks performed, on premise, by an intelligent agent (a human) or a not-so-intelligent one (a robot).

Our on-premise devices and compute

We have robots and people, on premise, working with physical objects to perform tasks with microbes. We do things like:

  • fill well plates with various combinations of growth solutions for microbes
  • incubate microbes in a robotic incubator
  • sequence DNA and RNA samples

There are tools and techniques for automating the aspects of these tasks that can be performed by the robots. Those operations need to be fairly independent of the state of the outside world, our internet connectivity, and whether or not a cloud service is functioning.

More importantly, while the robots and devices we use often have network connectivity and APIs, they are not always appropriate for direct integration. There are also security concerns to take into account when integrating any device. As such, the robots we use are partitioned onto isolated networks and we bridge these “islands of security” with small computers running a lightweight operating system and hosting services that we control.

With this approach we provide secure services to our platform, with APIs that we control, that insulate our processes from the devices themselves. We can update the internals of the services when the devices change, handle devices from different manufacturers (or even the same device with different firmware), and monitor their status all without changing the API that our systems see from the outside.
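One way to picture this insulation is an adapter layer. The sketch below is illustrative only: `DeviceStatus`, `DeviceAdapter`, and the two vendor classes are hypothetical names, and the raw formats they read are invented for the example rather than taken from any real device.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceStatus:
    """Vendor-neutral status that the rest of the platform sees."""
    device_id: str
    healthy: bool
    detail: str


class DeviceAdapter(ABC):
    """Translates one vendor's (or firmware's) interface into DeviceStatus."""

    @abstractmethod
    def status(self) -> DeviceStatus:
        ...


class VendorAIncubator(DeviceAdapter):
    """Hypothetical device that reports a dict with a 'state' string."""

    def __init__(self, device_id, read_raw):
        self.device_id = device_id
        self._read_raw = read_raw  # callable that talks to the real device

    def status(self) -> DeviceStatus:
        raw = self._read_raw()
        return DeviceStatus(self.device_id, raw["state"] == "OK", raw["state"])


class VendorBIncubator(DeviceAdapter):
    """Hypothetical device that reports a numeric error code (0 means fine)."""

    def __init__(self, device_id, read_code):
        self.device_id = device_id
        self._read_code = read_code

    def status(self) -> DeviceStatus:
        code = self._read_code()
        return DeviceStatus(self.device_id, code == 0, f"error code {code}")


def fleet_status(adapters):
    """Callers only ever see DeviceStatus, whatever sits behind each adapter."""
    return [adapter.status() for adapter in adapters]
```

Swapping a device or firmware version then means writing a new adapter, while code that calls `fleet_status` stays unchanged.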

We also have important equipment (like laboratory freezers that must maintain temperatures of -70°C) with ethernet ports, IoT-enabled sensors, or even old-school RS-232 ports that output status data once an hour. For that RS-232 port, we simply attach a Raspberry Pi to talk to the device and let us know if there is a problem. That is a local service we cannot move to the cloud. Such little things are the physical fabric of the laboratory that we must automate and monitor.
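The freezer check can be sketched in a few lines. Everything specific here is an assumption for illustration: the `TEMP=-70.2C` line format, the `-60.0` alarm threshold, and the function names are not from our actual devices, and a real freezer will have its own status format.

```python
import re
from typing import Optional

# Assumed alarm threshold in degrees Celsius (hypothetical value).
ALARM_THRESHOLD_C = -60.0

# Assumed status line format, e.g. "TEMP=-70.2C"; real devices will differ.
_TEMP_PATTERN = re.compile(r"TEMP=(-?\d+(?:\.\d+)?)C")


def parse_temperature(line: str) -> Optional[float]:
    """Extract the temperature from one status line, or None if absent."""
    match = _TEMP_PATTERN.search(line)
    return float(match.group(1)) if match else None


def check_line(line: str, threshold_c: float = ALARM_THRESHOLD_C) -> str:
    """Classify one status line as 'ok', 'alarm', or 'unreadable'."""
    temp = parse_temperature(line)
    if temp is None:
        return "unreadable"
    # Warmer than the threshold means the freezer is failing to hold temperature.
    return "alarm" if temp > threshold_c else "ok"
```

On the Raspberry Pi, the lines would come from the serial port (for example, read with a library such as pyserial), and an "alarm" or a run of "unreadable" results would trigger a notification.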

Monitoring and workflows

While monitoring a freezer and alerting when the temperature spikes might sound like a minor issue for the technology platform, it is really a small part of a larger strategy for which we need a uniform approach. We have a variety of equipment, some automatable and others just important (like gel docs) or expensive (like mass spectrometers). All are essential to our workflows, so we need to know that they are available and operating properly. But they also all produce data: we need to get data on and off every device.

We face a scheduling challenge as we scale our operations. We need to know what is available to be used, but we also need to schedule what is to be done, where, and by which robot or human. If the task is something we can automate even partially, we need to interface with a device and upload configurations, invoke processes, and download results. At different points in time during that process, a person might need to interact with the equipment. Our system must orchestrate and record every aspect of data collection, as machines and people move in and out of availability while generating scads of information.

Just like in machine learning workflows, this is both a workflow scheduling and execution problem and a task-oriented, resource-constrained scheduling problem. We need to build a workflow system that is aware of the humans and robots in the loop, one that can mix compute tasks with robots and humans while keeping track of all the data (inputs and outputs) as well as the metadata (what happened where, why, and by whom).
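To make the resource-constrained part concrete, here is a deliberately simple greedy sketch. It is not our scheduler: `Agent`, `Task`, `Assignment`, and `assign` are hypothetical names, and a real system would also handle time, priorities, and dependencies between steps.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    kind: str            # "robot" or "human"
    skills: frozenset    # task types this agent can perform
    available: bool = True


@dataclass
class Task:
    name: str
    skill: str           # the single skill this task requires


@dataclass
class Assignment:
    task: str
    agent: str
    kind: str            # recorded so we know who (or what) did the work


def assign(tasks, agents):
    """Give each task to the first free agent with the required skill.

    Each agent takes at most one task per round; tasks with no suitable
    free agent are returned so they can be retried later.
    """
    free = [a for a in agents if a.available]
    assignments, unassigned = [], []
    for task in tasks:
        agent = next((a for a in free if task.skill in a.skills), None)
        if agent is None:
            unassigned.append(task.name)
            continue
        free.remove(agent)
        assignments.append(Assignment(task.name, agent.name, agent.kind))
    return assignments, unassigned
```

The returned assignments double as a minimal metadata record of what was done and by which robot or human, while the unassigned list captures work waiting on equipment or people that are currently unavailable.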

Incubator Workflow


Our platform and Kubernetes

We track everything we do in a data model and repository. Every robot, bacterium, experimental result, genome sequence, and chemical formula is traceable through the data model. We ensure this data lives in multiple places: locally on premise, for devices and robots to use, and in the cloud, for both long-term storage and cloud-based services.

We have a variety of compute distributed throughout our platform:

  • Raspberry Pi devices attached to equipment or bridging networks
  • laboratory computers attached to specific equipment or robots
  • on-premise compute infrastructure (servers and network fabric)
  • cloud-based compute and services (Kubernetes and databases)

Our intent is to connect all the compute together into one Kubernetes cluster, or possibly several. Even our little Raspberry Pi devices will become nodes in the cluster. This will give us a uniform way to distribute, monitor, and provide services that interact with everything.

K8s cluster spanning on premise into the cloud


The compute fabric will extend from on-premise into the cloud. With continuously synchronized data, we can run services and compute tasks in either place using the same Kubernetes-based deployment methods.

We are using MicroK8s to do all of this. Because MicroK8s is latency tolerant, we can have nodes that are “slow” or remote over a VPN connection. We can even have a cluster that spans our on-premise hardware and cloud-based VMs.

Interested?

If this sounds like an interesting challenge (we think so), then come and join us. We’re hiring!