Introduction

Machine learning is a subfield of artificial intelligence that focuses on developing algorithms and models that allow computers to learn from data. It is a key enabler of many of today's AI applications, from speech recognition and image classification to autonomous vehicles and recommendation systems.

Key Concepts

Machine learning develops algorithms and models that allow computers to learn from data and to make predictions or decisions. At its core, it rests on several key concepts and components:

  1. Data: Data is the lifeblood of machine learning. It can be any form of information, such as text, images, numbers, or even sensor readings. High-quality and relevant data is crucial for training accurate machine learning models.

  2. Features: Features are specific attributes or characteristics extracted from the data that are used as input for machine learning models. Feature engineering involves selecting, transforming, or creating features that provide meaningful information for the task at hand.

  3. Labels or Targets: In supervised learning, the data is divided into two parts: input features and target labels. The target labels are the correct or desired outputs that the model should learn to predict. In unsupervised learning, there are no target labels, and the model tries to find patterns or structure in the data.

  4. Algorithms: Machine learning algorithms are mathematical and statistical techniques that learn patterns and relationships in the data. Different algorithms are used for various types of tasks, including classification, regression, clustering, and recommendation, among others.

  5. Training: Training a machine learning model involves feeding it a large dataset with known inputs and outputs (in the case of supervised learning). The model learns from this data by adjusting its internal parameters to minimize the difference between its predictions and the actual target values. A short code sketch illustrating training, evaluation, and cross-validation follows this list.

  6. Testing and Evaluation: After training, the model's performance is assessed using a separate dataset it hasn't seen before (the testing dataset). Common evaluation metrics include accuracy, precision, recall, and F1-score for classification tasks, and mean squared error or R-squared for regression tasks.

  7. Overfitting and Underfitting: Overfitting occurs when a model learns the training data too well, including noise and irrelevant details, and performs poorly on new, unseen data. Underfitting occurs when a model is too simple to capture the underlying patterns in the data. Finding the right balance is crucial.

  8. Validation: Validation is the process of tuning the hyperparameters of the machine learning model to improve its performance. This is typically done using a separate validation dataset.

  9. Cross-Validation: Cross-validation is a technique used to assess a model's performance more robustly by splitting the data into multiple subsets, training and evaluating the model on different subsets, and averaging the results.

  10. Deployment: Once a model is trained and validated, it can be deployed in real-world applications to make predictions or decisions.

  11. Continual Learning: In some cases, machine learning models need to be updated with new data to adapt to changing conditions or to avoid model degradation.

  12. Ethical Considerations: Machine learning also raises ethical concerns, such as bias in data and models, transparency, privacy, and fairness, which need to be carefully addressed.

These are the foundational concepts of machine learning. It's a dynamic and evolving field with a wide range of techniques and algorithms, and it continues to advance rapidly, enabling exciting applications across various domains.
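
To make several of these concepts concrete, the following minimal sketch walks through training (concept 5), testing and evaluation (concept 6), and cross-validation (concept 9) in Python with scikit-learn. The Iris dataset, the decision tree classifier, and all parameter values are illustrative assumptions rather than a prescribed workflow.

    # Minimal sketch of training, evaluation, and cross-validation with scikit-learn.
    # The Iris dataset and the decision tree classifier are illustrative choices only.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Load a small labeled dataset: X holds the input features, y the target labels.
    X, y = load_iris(return_X_y=True)

    # Hold out part of the data for testing, so that evaluation uses examples
    # the model has never seen during training.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )

    # Training: the algorithm adjusts its internal parameters to fit the training data.
    model = DecisionTreeClassifier(max_depth=3, random_state=42)
    model.fit(X_train, y_train)

    # Evaluation: compare predictions on the unseen test set against the true labels.
    predictions = model.predict(X_test)
    print("Test accuracy:", accuracy_score(y_test, predictions))

    # Cross-validation: train and evaluate on several different splits of the data
    # and average the scores for a more robust estimate of performance.
    scores = cross_val_score(model, X, y, cv=5)
    print("5-fold cross-validation accuracy:", scores.mean())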

Machine Learning Types

Machine learning algorithms fall roughly into the following types:

  • Supervised Learning: The algorithm learns a function from given pairs of input and output (labeled data). In other words, it generates a function that maps inputs to desired outputs.

  • Unsupervised Learning: The algorithm generates a model for a set of inputs (unlabeled data). The main problem of this kind of learning is partitioning the input set into subsets in such a way that each subset can be handled with an appropriate function. One example of using unsupervised learning is data compression, where the probability distribution of the input set plays an important role. A short sketch contrasting supervised and unsupervised learning follows this list.

  • Semi-supervised Learning: Semi-supervised learning uses both labeled and unlabeled data to train a model. The labeled data is used to train the model, and the unlabeled data is used to improve the model's performance by providing additional information about the underlying distribution of the data.

  • Reinforcement Learning: In reinforcement learning, the aim is to maximize the rewards received as feedback on the actions a learning agent performs in an environment. In the words of Sutton and Barto: "Reinforcement learning is learning what to do--how to map situations to actions--so as to maximize a numerical reward signal."
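
As a small illustration of the difference between supervised and unsupervised learning, the sketch below fits a classifier on labeled data and then clusters the same inputs without using the labels. The dataset and the choice of logistic regression and k-means are assumptions made only for illustration.

    # Sketch contrasting supervised and unsupervised learning with scikit-learn.
    # Dataset and algorithm choices are illustrative assumptions.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    X, y = load_iris(return_X_y=True)

    # Supervised learning: learn a mapping from inputs X to known labels y.
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(X, y)
    print("Predicted class of first sample:", classifier.predict(X[:1]))

    # Unsupervised learning: no labels are given; the algorithm partitions
    # the inputs into clusters based on structure in the data alone.
    clusterer = KMeans(n_clusters=3, n_init=10, random_state=0)
    cluster_ids = clusterer.fit_predict(X)
    print("Cluster assigned to first sample:", cluster_ids[0])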

The following table shows the modeling approach, popular algorithms, and example applications for supervised, unsupervised, and reinforcement learning. Semi-supervised learning is a category that lies between supervised and unsupervised learning.

|                    | Supervised                                                                                                       | Unsupervised                          | Reinforcement                         |
| Modeling approach  | Regression and classification                                                                                    | Clustering                            | Markov Decision Process               |
| Popular algorithms | Linear Regression, Support Vector Machines (SVM), Neural Networks, Decision Trees, Naive Bayes, Nearest Neighbor | k-means clustering, Association rules | Q-Learning, Deep Adversarial Networks |
| Applications       | Predictive modelling                                                                                             | Descriptive modelling                 | Robotics, self-driving cars           |

Table: ML modeling, algorithms and applications
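
The table lists Q-Learning as a popular reinforcement learning algorithm. The sketch below shows the core Q-learning update rule on a made-up one-dimensional corridor environment; the environment, reward values, and hyperparameters are purely illustrative assumptions.

    # Toy tabular Q-learning sketch: an agent learns to walk right along a short
    # corridor to reach a goal. The environment and hyperparameters are made up
    # purely for illustration.
    import random

    N_STATES = 5          # states 0..4; state 4 is the goal
    ACTIONS = [-1, +1]    # move left or right
    ALPHA = 0.1           # learning rate
    GAMMA = 0.9           # discount factor
    EPSILON = 0.1         # exploration rate

    # Q-table: estimated future reward for each (state, action) pair.
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

    for episode in range(500):
        state = 0
        while state != N_STATES - 1:
            # Epsilon-greedy action selection: mostly exploit, sometimes explore.
            if random.random() < EPSILON:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])

            next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
            reward = 1.0 if next_state == N_STATES - 1 else 0.0

            # Q-learning update: move the estimate toward the observed reward
            # plus the discounted value of the best action in the next state.
            best_next = max(Q[next_state])
            Q[state][a] += ALPHA * (reward + GAMMA * best_next - Q[state][a])

            state = next_state

    # After training, the greedy policy should prefer moving right in every state.
    policy = ["left" if Q[s][0] > Q[s][1] else "right" for s in range(N_STATES - 1)]
    print("Learned policy:", policy)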