Explore Our Exciting Datasets
Welcome to the world of data exploration! Our chemotools package offers a collection of datasets that not only help you test the package but also serve as a playground for your learning journey. These datasets live in the chemotools.datasets module and can be loaded with dedicated loading functions. Let's dive in:
- The Fermentation Dataset: A Journey into Biochemistry: Immerse yourself in the world of fermentation with this dataset containing spectra collected during a fermentation process.
- The Coffee Dataset: A Global Coffee Journey: Savor the flavors of the world with this dataset, featuring spectra from diverse coffee samples sourced from different countries.
The Fermentation Dataset: A Journey into Biochemistry
The Fermentation Dataset takes you on a thrilling ride through the art of fermentation. These spectra were meticulously gathered using attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR). The dataset comprises two sets of spectra: a training set and a test set. Take a peek at the enchanting fermentation setup in the image below:
For those curious minds, you can find more about the Fermentation Dataset in these fascinating publications:
- Cabaneros Lopez, P., Abeykoon Udugama, I., Thomsen, S.T., et al. Transforming data to information: A parallel hybrid model for real-time state estimation in lignocellulosic ethanol fermentation.
- Cabaneros Lopez, P., Abeykoon Udugama, I., Thomsen, S.T., et al. Towards a digital twin: a hybrid data-driven and mechanistic digital shadow to forecast the evolution of lignocellulosic fermentation.
- Cabaneros Lopez, P., Abeykoon Udugama, I., Thomsen, S.T., et al. Promoting the co-utilisation of glucose and xylose in lignocellulosic ethanol fermentations using a data-driven feed-back controller.
THE TRAIN SET: Start Your Training Adventure
The train set contains 21 synthetic spectra paired with reference glucose concentrations measured by high-performance liquid chromatography (HPLC). Ready to embark on your training journey? You can load the train set as a pandas.DataFrame or as a polars.DataFrame with a single command:
- Load as pandas.DataFrame:
from chemotools.datasets import load_fermentation_train
X_train, y_train = load_fermentation_train()
- Load as polars.DataFrame:
from chemotools.datasets import load_fermentation_train
X_train, y_train = load_fermentation_train(set_output="polars")
Polars is supported in chemotools >=0.1.5.
Want to master the art of building a PLS model using the Fermentation Dataset? Dive into our Training Guide.
THE TEST SET: Real-Time Exploration
The test set takes you on a real-time adventure with over 1000 spectra collected during a fermentation process. These spectra were captured every 1.25 minutes over several hours. In addition, 35 reference glucose concentrations, measured hourly during the fermentation, let you gauge your model's performance.
Ready for this real-time exploration? Load the test set like a pro:
- Load as pandas.DataFrame:
from chemotools.datasets import load_fermentation_test
X_test, y_test = load_fermentation_test()
- Load as polars.DataFrame:
from chemotools.datasets import load_fermentation_test
X_test, y_test = load_fermentation_test(set_output="polars")
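Because spectra are captured every 1.25 minutes while the references are measured hourly, gauging a model means pairing each reference with the spectrum closest to it in time. The snippet below is a minimal sketch of that arithmetic and is not part of chemotools. It assumes the first spectrum is recorded at time zero, that sampling is perfectly uniform, and that reference times are expressed in hours. The function name nearest_spectrum_index, the example reference times, and the value n_spectra=1000 are hypothetical, so check how the loaded y_test actually encodes time before relying on it.

```python
SAMPLING_INTERVAL_MIN = 1.25  # from the text: one spectrum every 1.25 minutes


def nearest_spectrum_index(reference_time_h, n_spectra,
                           interval_min=SAMPLING_INTERVAL_MIN):
    """Return the row index of the spectrum closest in time to a reference.

    Assumption: the first spectrum is taken at t = 0 and sampling is uniform.
    """
    index = round(reference_time_h * 60 / interval_min)
    if not 0 <= index < n_spectra:
        raise ValueError("reference time lies outside the recorded spectra")
    return index


# Hypothetical reference times in hours (not read from the dataset)
reference_times_h = [0, 1, 2, 3]
indices = [nearest_spectrum_index(t, n_spectra=1000) for t in reference_times_h]
print(indices)  # [0, 48, 96, 144]
```

Under these assumptions, one hour corresponds to 60 / 1.25 = 48 spectra, so consecutive hourly references land 48 rows apart.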
Note that the wavenumbers are stored as the column names in both the pandas.DataFrame and the polars.DataFrame. However, while in a pandas.DataFrame the column names can be of type float, in a polars.DataFrame the column names must be of type str.
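Since the column names may be float or str depending on the output type, plotting or indexing by wavenumber is easier after converting them to numbers. The helper below is a small sketch of one way to do this in plain Python. The name columns_to_wavenumbers and the example values are hypothetical and are not taken from chemotools or from the dataset.

```python
def columns_to_wavenumbers(columns):
    """Convert column names (float or str) to a list of float wavenumbers."""
    return [float(name) for name in columns]


# Hypothetical column names, as they might look for each output type
pandas_style = [900.5, 902.0, 903.5]        # float column names
polars_style = ["900.5", "902.0", "903.5"]  # str column names

# Both representations yield the same numeric wavenumber axis
assert columns_to_wavenumbers(pandas_style) == columns_to_wavenumbers(polars_style)
```

With a loaded dataset, the same call should work as columns_to_wavenumbers(X_test.columns), because the columns attribute is iterable in both pandas and polars.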
The Coffee Dataset: A Global Coffee Journey
The Coffee Dataset invites you to embark on a journey through the world of coffee. These spectra were collected from a rich diversity of coffee samples, each originating from a different country, using attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR).
Feeling the coffee buzz? You can load the Coffee Dataset with ease as a pandas.DataFrame or as a polars.DataFrame.
- Load as pandas.DataFrame:
from chemotools.datasets import load_coffee
spectra, labels = load_coffee()
- Load as polars.DataFrame:
from chemotools.datasets import load_coffee
spectra, labels = load_coffee(set_output="polars")
Ready to brew up some knowledge and build a PLS-DA classification model using the Coffee Dataset? Get started with our Training Guide.
Get ready to embark on an exhilarating data journey with our datasets. Happy exploring!