
🚀 Explore Our Exciting Datasets 📊

Welcome to the world of data exploration! Our chemotools package offers a treasure trove of datasets that will not only help you test the package but also serve as an exciting playground for your learning journey. These captivating datasets are tucked away in the chemotools.datasets module and can be easily unleashed using loading functions. Let’s dive into the adventure:

🍷 The Fermentation Dataset: A Journey into Biochemistry πŸ§ͺ

The Fermentation Dataset takes you on a thrilling ride through the art of fermentation. These spectra were meticulously gathered using attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR). The dataset comprises two sets of spectra: a training set and a test set. Take a peek at the enchanting fermentation setup in the image below:

[Image: Fermentation setup]

For those curious minds, you can find more about the Fermentation Dataset in these fascinating publications:

📚 THE TRAIN SET: Start Your Training Adventure

The train set boasts 21 synthetic spectra paired with their reference glucose concentrations, measured by high-performance liquid chromatography (HPLC). Ready to embark on your training journey? You can load the train set as a pandas.DataFrame or as a polars.DataFrame with a single command:

  • Load as pandas.DataFrame:
from chemotools.datasets import load_fermentation_train

X_train, y_train = load_fermentation_train()
  • Load as polars.DataFrame:
from chemotools.datasets import load_fermentation_train

X_train, y_train = load_fermentation_train(set_output="polars")

Polars is supported in chemotools>=0.1.5
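
Because the polars output needs chemotools 0.1.5 or later, it can be handy to check the installed version before asking for it. The following is a minimal sketch using only the standard library. The helper names (`parse_version`, `supports_polars`, `installed_chemotools_version`) are hypothetical and not part of chemotools, and the simple parser deliberately ignores pre-release suffixes:

```python
import re
from importlib import metadata

# Minimum chemotools version with polars output, as stated above.
MIN_POLARS_VERSION = (0, 1, 5)


def parse_version(text):
    """Turn '0.1.5' into (0, 1, 5); non-numeric suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '1.0' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_polars(version_text):
    """Return True if the given version is at least the minimum stated above."""
    return parse_version(version_text) >= MIN_POLARS_VERSION


def installed_chemotools_version():
    """Return the installed chemotools version string, or None if it is not installed."""
    try:
        return metadata.version("chemotools")
    except metadata.PackageNotFoundError:
        return None


version = installed_chemotools_version()
if version is None:
    print("chemotools is not installed")
elif supports_polars(version):
    print(f"chemotools {version}: set_output='polars' is available")
else:
    print(f"chemotools {version}: upgrade to 0.1.5 or later for polars output")
```

For stricter version handling (pre-releases, local versions), the third-party `packaging` library is a more robust choice than this hand-rolled parser.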

Want to master the art of building a PLS model using the Fermentation Dataset? Dive into our Training Guide.

🧪 THE TEST SET: Real-Time Exploration

The test set takes you on a real-time adventure with over 1000 spectra collected during a fermentation process. These spectra are captured every 1.25 minutes over several hours. Moreover, you have 35 reference glucose concentrations, measured hourly during the fermentation, to gauge your model’s performance.
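
To compare predictions with the reference values, each reference measurement has to be matched to a spectrum recorded at about the same time. Here is a minimal sketch of that bookkeeping, assuming the first spectrum is taken at time zero, the 1.25-minute interval is constant, and reference times are expressed in hours since the start. The helper names are hypothetical, and if the loaded data carries its own time index, that should be preferred:

```python
SPECTRUM_INTERVAL_MIN = 1.25  # sampling interval stated above


def spectrum_time_hours(index, interval_min=SPECTRUM_INTERVAL_MIN):
    """Elapsed time in hours at which spectrum number `index` (0-based) was recorded."""
    return index * interval_min / 60.0


def nearest_spectrum_index(reference_hours, interval_min=SPECTRUM_INTERVAL_MIN):
    """0-based index of the spectrum recorded closest to a reference time in hours."""
    return round(reference_hours * 60.0 / interval_min)


# One reference per hour corresponds to every 48th spectrum (60 / 1.25 = 48).
for hour in (0, 1, 2):
    print(hour, nearest_spectrum_index(hour))
```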

Ready for this real-time exploration? Load the test set like a pro:

  • Load as pandas.DataFrame:
from chemotools.datasets import load_fermentation_test

X_test, y_test = load_fermentation_test()
  • Load as polars.DataFrame:
from chemotools.datasets import load_fermentation_test

X_test, y_test = load_fermentation_test(set_output="polars")

Note that the wavenumbers are stored as the column names in both the pandas.DataFrame and the polars.DataFrame. However, while in a pandas.DataFrame the column names can be of type float, in a polars.DataFrame the column names must be of type str.
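
Because of this difference, code that needs numeric wavenumbers (for plotting or for selecting a spectral range) can convert the column names explicitly. Here is a minimal sketch, where `wavenumbers_from_columns` and `columns_in_range` are hypothetical helpers and the example labels are made-up placeholders rather than values from the dataset:

```python
def wavenumbers_from_columns(columns):
    """Convert column labels (float for pandas, str for polars) to floats."""
    return [float(label) for label in columns]


def columns_in_range(columns, low, high):
    """Return the original labels whose wavenumber lies in [low, high]."""
    return [label for label in columns if low <= float(label) <= high]


# Placeholder labels mimicking the two cases described above.
pandas_style = [950.0, 1000.0, 1050.0]
polars_style = ["950.0", "1000.0", "1050.0"]

print(wavenumbers_from_columns(pandas_style))
print(wavenumbers_from_columns(polars_style))
print(columns_in_range(polars_style, 980, 1100))
```

The `columns` attribute of either DataFrame type can be passed in directly. The labels returned by `columns_in_range` keep their original type, so they can be used to select columns from the same DataFrame they came from.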

☕ The Coffee Dataset: A Global Coffee Journey 🌍

The Coffee Dataset invites you to embark on a journey through the world of coffee. These captivating spectra are collected from a rich diversity of coffee samples, each originating from a different country. The magic happens with attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR).

Feeling the coffee buzz? You can load the Coffee Dataset with ease as a pandas.DataFrame or as a polars.DataFrame.

  • Load as pandas.DataFrame:
from chemotools.datasets import load_coffee

spectra, labels = load_coffee()
  • Load as polars.DataFrame:
from chemotools.datasets import load_coffee

spectra, labels = load_coffee(set_output="polars")

Ready to brew up some knowledge and build a PLS-DA classification model using the Coffee Dataset? 📚 Get started with our Training Guide.
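
PLS-DA is commonly set up as a PLS regression against a one-hot (dummy) matrix of the class labels, with the predicted class taken as the column with the largest response. The sketch below shows only that label bookkeeping in plain Python. The helper names and the label strings are hypothetical placeholders, not the contents of the Coffee Dataset, and the Training Guide may take a different approach:

```python
def one_hot_encode(labels):
    """Return (classes, matrix) where matrix[i][j] is 1.0 if labels[i] == classes[j]."""
    classes = sorted(set(labels))
    position = {name: j for j, name in enumerate(classes)}
    matrix = [[0.0] * len(classes) for _ in labels]
    for i, name in enumerate(labels):
        matrix[i][position[name]] = 1.0
    return classes, matrix


def decode_predictions(classes, scores):
    """Pick, for each row of predicted scores, the class with the largest value."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in scores]


# Placeholder labels; real labels would come from load_coffee().
labels = ["origin_b", "origin_a", "origin_c", "origin_a"]
classes, Y = one_hot_encode(labels)
print(classes)
print(Y)

# Made-up regression outputs standing in for a fitted PLS model's predictions.
scores = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
print(decode_predictions(classes, scores))
```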

Get ready to embark on an exhilarating data journey with our fascinating datasets. Happy exploring! 🌟🔍🚀