ML in Bioinformatics

Bioinformatics is the development of software and methods to interpret and understand biological data. Several problems fall under bioinformatics, including genome sequencing and simulation of proteins.

A specific problem we wanted to focus on was the classification of species that look similar. We can tackle this kind of problem by measuring the features that differ between the species and comparing the numbers. In this case we needed to differentiate between three iris species: Iris setosa, Iris virginica and Iris versicolor.

While this problem may seem insignificant, it serves as a model application of ML basics, and the same approach can be used on other problems. In this case, it gives biologists an easy way to classify these flowers without heavy analysis.

A good way to quantify these species is to measure four features of each flower: petal length, petal width, sepal length and sepal width. The iris dataset available through seaborn records these four measurements for 150 flowers, 50 of each species.
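As a rough illustration of the data import described above, the sketch below loads the iris table through seaborn and averages the four measurements per species. The helper names `load_iris_frame` and `species_means` are hypothetical and chosen only for this example. The column names are the ones used in seaborn's iris table. Note that `sns.load_dataset` fetches the file over the network the first time it is called.

```python
import pandas as pd

# Column names as they appear in seaborn's iris table.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def load_iris_frame() -> pd.DataFrame:
    """Load the iris table via seaborn (hypothetical helper for this sketch)."""
    # Imported here so species_means() can be used without seaborn installed.
    import seaborn as sns

    # Downloads the dataset on first use, then reads it from a local cache.
    return sns.load_dataset("iris")


def species_means(df: pd.DataFrame) -> pd.DataFrame:
    """Return the mean of each of the four measurements for every species."""
    return df.groupby("species")[FEATURES].mean()


# Usage (requires seaborn and, on first run, network access):
#   iris = load_iris_frame()
#   print(iris.head())
#   print(species_means(iris))
```

Comparing the per-species means is a quick way to see which measurements separate the species well before any model is trained.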

Using this data, we can build a machine learning model that classifies these species. To learn more, look for the “Resolving Problem Using ML” tab.