A new data platform

Data in bioengineering is often high in variety, modest in volume, and low in velocity. Our scientific processes are different: they produce a high variety of data from various sources (scientists, devices, robots, or computations), but they are also scalable. The volume of data is ever increasing.

We collect experimental data with the help of robots:

  • microbial behavior in a variety of environments
  • chemicals consumed and produced
  • genomes and transcriptomes

Every observation is sewn together as a data fabric that is traceable from conception, through experimental instantiation, robotic execution, and analysis. Every step also has human annotation by scientists and research associates.

Our data platform endeavors to automate all of these aspects.
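
As a rough illustration of what "traceable" could mean in practice, the sketch below links each record to the step that produced it and walks the chain back to its origin. The `Record` class, the `trace` function, the identifiers, and the annotation text are all hypothetical, chosen only for this example; this is not a description of the platform's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    """One node in a hypothetical provenance chain."""
    record_id: str
    stage: str                       # which step of the process produced this record
    parent_id: Optional[str] = None  # the record this one was derived from, if any
    annotations: List[str] = field(default_factory=list)  # human notes on this step


def trace(records: Dict[str, Record], record_id: str) -> List[Record]:
    """Return the chain from the earliest ancestor down to record_id."""
    chain: List[Record] = []
    seen = set()  # guards against accidental cycles in the parent links
    current = records.get(record_id)
    while current is not None and current.record_id not in seen:
        seen.add(current.record_id)
        chain.append(current)
        current = records.get(current.parent_id) if current.parent_id else None
    chain.reverse()  # oldest step first
    return chain


# Illustrative data only; identifiers and the annotation text are made up.
records = {
    r.record_id: r
    for r in [
        Record("c1", "conception"),
        Record("e1", "experimental instantiation", parent_id="c1"),
        Record("r1", "robotic execution", parent_id="e1",
               annotations=["example note from a research associate"]),
        Record("a1", "analysis", parent_id="r1"),
    ]
}

stages = [r.stage for r in trace(records, "a1")]
print(stages)
```

Starting from the analysis record, the walk recovers every upstream step along with any annotations attached to it, which is the property the data fabric is meant to guarantee.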

Technologies we use

  • Kubernetes - Kubernetes provides a compute fabric for data platform services. Our on-premises cluster interacts with cloud-based clusters to provide a uniform hybrid cloud in which we move tasks closest to the data and resources they require.

  • Python and Jupyter - Our data scientists and scientists use Python and Jupyter notebooks for ad hoc interactions with data. Yes, our scientists and research associates write code!

  • IoT - Besides robots, we have a lot of less automatable scientific equipment that needs to be monitored for safety, asset protection, capacity tracking, and availability, so we know when it is free for use.

  • Graphs and databases - We have a large variety of data that should be treated as a vast network of experiments, microbes, observations, genomes, and annotations.

  • Machine learning - We use machine learning to help analyze the results of experiments and shape what we do next. Our models will encapsulate our scientists' insights so that those insights can be applied automatically and at a higher velocity.

  • Workflows - All these components are tied together with various underlying workflows. Our workflow system will orchestrate tasks via Kubernetes.
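
To make the workflow idea concrete, here is a minimal sketch of dependency-ordered task execution using Python's standard `graphlib` module. The task names, the `run_order` and `run_workflow` helpers, and the `submit` callback are assumptions made for illustration. The actual workflow system is not described here, and a real `submit` would hand each task to Kubernetes rather than append to a list.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical workflow: each task maps to the set of tasks it depends on.
TASKS: Dict[str, Set[str]] = {
    "sequence_genomes": set(),
    "measure_chemicals": set(),
    "train_model": {"sequence_genomes", "measure_chemicals"},
    "plan_next_experiment": {"train_model"},
}


def run_order(tasks: Dict[str, Set[str]]) -> List[str]:
    """Return the tasks in an order that respects every dependency."""
    # TopologicalSorter raises graphlib.CycleError if the dependencies loop.
    return list(TopologicalSorter(tasks).static_order())


def run_workflow(tasks: Dict[str, Set[str]],
                 submit: Callable[[str], None]) -> List[str]:
    """Submit each task once all of its dependencies have been submitted."""
    order = run_order(tasks)
    for name in order:
        submit(name)  # placeholder for dispatching the task to a cluster
    return order


# Stand-in for a cluster client: just record what would have been submitted.
submitted: List[str] = []
run_workflow(TASKS, submitted.append)
print(submitted)
```

The two data-collection tasks have no dependencies, so their relative order is not fixed; the only guarantee is that the model is trained after both and the next experiment is planned last.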