Mirror of https://github.com/kamranahmedse/developer-roadmap.git (synced 2026-03-12 17:51:53 +08:00)

chore: sync content to repo (#9474)

Co-authored-by: kamranahmedse <4921183+kamranahmedse@users.noreply.github.com>
Committed by: GitHub
Parent: df486a616b
Commit: b32bd7b179
@@ -1,9 +1,9 @@
 # Airflow

-Airflow is a platform to programmatically author, schedule and monitor workflows. Use airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed. When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative.
+Airflow is a platform to programmatically author, schedule, and monitor workflows. Use airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command-line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed. When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative.

 Visit the following resources to learn more:

 - [@official@Airflow](https://airflow.apache.org/)
 - [@official@Airflow Documentation](https://airflow.apache.org/docs)
 - [@feed@Explore top posts about Apache Airflow](https://app.daily.dev/tags/apache-airflow?ref=roadmapsh)
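The updated Airflow blurb says the scheduler executes tasks "while following the specified dependencies". That ordering guarantee can be sketched with nothing but the standard library; the task names (`extract`, `clean`, `validate`, `load`) are hypothetical, and this shows only the scheduling idea, not the Airflow API:

```python
from graphlib import TopologicalSorter

# Hypothetical four-task DAG: each key depends on the tasks in its value set.
dag = {
    "clean": {"extract"},
    "validate": {"extract"},
    "load": {"clean", "validate"},
}

# static_order() yields each task only after all of its dependencies,
# which is the guarantee a workflow scheduler provides.
order = list(TopologicalSorter(dag).static_order())
for task in order:
    print(f"running {task}")
```

In real Airflow the same shape is declared with operators and dependency arrows (`>>`), and the scheduler dispatches ready tasks to workers rather than running them inline.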
@@ -1,11 +1,11 @@
 # AWS / Azure / GCP

-AWS (Amazon Web Services), Azure and GCP (Google Cloud Platform) are three leading providers of cloud computing services. AWS by Amazon is the oldest and the most established among the three, providing a breadth and depth of solutions ranging from infrastructure services like compute, storage, and databases to the machine and deep learning. Azure, by Microsoft, has integrated tools for DevOps, supports a large number of programming languages, and offers seamless integration with on-prem servers and Microsoft’s software. Google's GCP has strength in cost-effectiveness, live migration of virtual machines, and flexible computing options. All three have introduced various MLOps tools and services to boost capabilities for machine learning development and operations.
+AWS (Amazon Web Services), Azure, and GCP (Google Cloud Platform) are three leading providers of cloud computing services. AWS by Amazon is the oldest and the most established among the three, providing a breadth and depth of solutions ranging from infrastructure services like compute, storage, and databases to machine learning and deep learning. Azure, by Microsoft, has integrated tools for DevOps, supports a large number of programming languages, and offers seamless integration with on-prem servers and Microsoft’s software. Google's GCP has strength in cost-effectiveness, live migration of virtual machines, and flexible computing options. All three have introduced various MLOps tools and services to boost capabilities for machine learning development and operations.

-Visit the following resources to learn more about AWS, Azure, and GCP:
+Visit the following resources to learn more:

-- [@roadmap.sh@Visit Dedicated AWS Roadmap](https://roadmap.sh/aws)
+- [@roadmap@Visit Dedicated AWS Roadmap](https://roadmap.sh/aws)
 - [@official@Microsoft Azure](https://docs.microsoft.com/en-us/learn/azure/)
 - [@official@Google Cloud Platform](https://cloud.google.com/)
 - [@official@GCP Learning Resources](https://cloud.google.com/training)
 - [@feed@Explore top posts about AWS](https://app.daily.dev/tags/aws?ref=roadmapsh)
@@ -2,8 +2,9 @@

 Bash (Bourne Again Shell) is a Unix shell and command language used for interacting with the operating system through a terminal. It allows users to execute commands, automate tasks via scripting, and manage system operations. As the default shell for many Linux distributions, it supports command-line utilities, file manipulation, process control, and text processing. Bash scripts can include loops, conditionals, and functions, making it a powerful tool for system administration, automation, and task scheduling.

-Learn more from the following resources:
+Visit the following resources to learn more:

-- [@article@Bash Reference Manual](https://www.gnu.org/software/bash/manual/bashref.html)
+- [@roadmap@Visit the Dedicated Shell-Bash Roadmap](https://roadmap.sh/shell-bash)
 - [@opensource@bash-guide](https://github.com/Idnan/bash-guide)
+- [@article@Bash Reference Manual](https://www.gnu.org/software/bash/manual/bashref.html)
 - [@video@Bash Scripting Course](https://www.youtube.com/watch?v=tK9Oc6AEnR4)
@@ -1,8 +1,8 @@
-# CI / CD
+# CI/CD

-CI/CD (Continuous Integration and Continuous Deployment/Delivery) is a software development practice that automates the process of integrating code changes, running tests, and deploying updates. Continuous Integration focuses on regularly merging code changes into a shared repository, followed by automated testing to ensure code quality. Continuous Deployment extends this by automatically releasing every validated change to production, while Continuous Delivery ensures code is always in a deployable state, but requires manual approval for production releases. CI/CD pipelines improve code reliability, reduce integration risks, and speed up the development lifecycle.
+CI/CD, which stands for Continuous Integration and Continuous Delivery/Deployment, is a software development practice that automates the process of building, testing, and deploying code changes. Continuous Integration focuses on frequently merging code changes into a central repository, followed by automated builds and tests. Continuous Delivery/Deployment then automates the release of these validated code changes to a staging or production environment.

-Learn more from the following resources:
+Visit the following resources to learn more:

 - [@article@What is CI/CD? - GitLab](https://about.gitlab.com/topics/ci-cd/)
 - [@article@What is CI/CD? - Redhat](https://www.redhat.com/en/topics/devops/what-is-ci-cd)
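The fail-fast gating that the rewritten paragraph describes (later stages run only on validated changes) can be sketched in a few lines; the stage functions here are hypothetical stand-ins for a real build system, not any particular CI tool's API:

```python
# Hypothetical stand-ins for real pipeline stages.
def build_stage() -> bool:
    return True          # e.g. compile and package the code

def test_stage() -> bool:
    return 1 + 1 == 2    # e.g. run the automated test suite

def deploy_stage() -> bool:
    return True          # e.g. release to staging or production

def run_pipeline(stages):
    """Run stages in order, stopping at the first failure."""
    results = {}
    for stage in stages:
        results[stage.__name__] = stage()
        if not results[stage.__name__]:
            break        # later stages never see an unvalidated change
    return results

results = run_pipeline([build_stage, test_stage, deploy_stage])
```

Whether `deploy_stage` fires automatically (Continuous Deployment) or waits for a manual approval (Continuous Delivery) is exactly the distinction the paragraph draws.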
@@ -1,9 +1,9 @@
 # Cloud Computing

-**Cloud Computing** refers to the delivery of computing services over the internet rather than using local servers or personal devices. These services include servers, storage, databases, networking, software, analytics, and intelligence. Cloud Computing enables faster innovation, flexible resources, and economies of scale. There are various types of cloud computing such as public clouds, private clouds, and hybrids clouds. Furthermore, it's divided into different services like Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). These services differ mainly in the level of control an organization has over their data and infrastructures.
+**Cloud Computing** refers to the delivery of computing services over the internet rather than using local servers or personal devices. These services include servers, storage, databases, networking, software, analytics, and intelligence. Cloud Computing enables faster innovation, flexible resources, and economies of scale. There are various types of cloud computing, such as public clouds, private clouds, and hybrid clouds. Furthermore, it's divided into different services like Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). These services differ mainly in the level of control an organization has over its data and infrastructures.

-Learn more from the following resources:
+Visit the following resources to learn more:

 - [@article@Cloud Computing - IBM](https://www.ibm.com/think/topics/cloud-computing)
 - [@article@What is Cloud Computing? - Azure](https://azure.microsoft.com/en-gb/resources/cloud-computing-dictionary/what-is-cloud-computing)
 - [@video@What is Cloud Computing? - Amazon Web Services](https://www.youtube.com/watch?v=mxT233EdY5c)
@@ -1,9 +1,9 @@
-# Cloud-native ML Services
+# Cloud-Native ML Services

-Most of the cloud providers offer managed services for machine learning. These services are designed to help data scientists and machine learning engineers to build, train, and deploy machine learning models at scale. These services are designed to be cloud-native, meaning they are designed to work with other cloud services and are optimized for the cloud environment.
+Cloud-native ML services are pre-built machine learning tools and platforms offered by cloud providers. These services allow users to build, train, and deploy machine learning models without managing the underlying infrastructure. They often include features like automated model training, scalable deployment options, and integration with other cloud services.

-Learn more from the following resources:
+Visit the following resources to learn more:

 - [@official@AWS Sage Maker](https://aws.amazon.com/sagemaker/)
 - [@official@Azure ML](https://azure.microsoft.com/en-gb/products/machine-learning)
 - [@video@What is Cloud Native?](https://www.youtube.com/watch?v=fp9_ubiKqFU)
@@ -1,13 +1,11 @@
-# Containers
+# Containerization

-Containers are a construct in which cgroups, namespaces, and chroot are used to fully encapsulate and isolate a process. This encapsulated process, called a container image, shares the kernel of the host with other containers, allowing containers to be significantly smaller and faster than virtual machines.
-
-These images are designed for portability, allowing for full local testing of a static image, and easy deployment to a container management platform.
+Containerization is a form of operating system virtualization that packages an application and its dependencies into a single, isolated unit called a container. This container includes everything the application needs to run, such as code, runtime, system tools, libraries, and settings. Containers offer a consistent and portable environment for applications, ensuring they run the same way regardless of where they are deployed.

 Visit the following resources to learn more:

 - [@article@What are Containers? - Google Cloud](https://cloud.google.com/learn/what-are-containers)
 - [@article@What is a Container? - Docker](https://www.docker.com/resources/what-container/)
-- [@article@Articles about Containers - The New Stack](https://thenewstack.io/category/containers/)
 - [@video@What are Containers?](https://www.youtube.com/playlist?list=PLawsLZMfND4nz-WDBZIj8-nbzGFD4S9oz)
 - [@feed@Explore top posts about Containers](https://app.daily.dev/tags/containers?ref=roadmapsh)
@@ -2,7 +2,8 @@

 Data Engineering is essentially dealing with the collection, validation, storage, transformation, and processing of data. The objective is to provide reliable, efficient, and scalable data pipelines and infrastructure that allow data scientists to convert data into actionable insights. It involves steps like data ingestion, data storage, data processing, and data provisioning. Important concepts include designing, building, and maintaining data architecture, databases, processing systems, and large-scale processing systems. It is crucial to have extensive technical knowledge in various tools and programming languages like SQL, Python, Hadoop, and more.

-Learn more from the following resources:
+Visit the following resources to learn more:

+- [@roadmap@Visit the Dedicated Data Engineer Roadmap](https://roadmap.sh/data-engineer)
 - [@article@Data Engineering 101](https://www.redpanda.com/guides/fundamentals-of-data-engineering)
 - [@video@Fundamentals of Data Engineering](https://www.youtube.com/watch?v=mPSzL8Lurs0)
@@ -1,10 +1,10 @@
 # Data Ingestion Architectures

-Data ingestion is the process of collecting, transferring, and loading data from various sources to a destination where it can be stored and analyzed. There are several data ingestion architectures that can be used to collect data from different sources and load it into a data warehouse, data lake, or other storage systems. These architectures can be broadly classified into two categories: batch processing and real-time processing. How you choose to ingest data will depend on the volume, velocity, and variety of data you are working with, as well as the latency requirements of your use case.
+Data ingestion is the process of collecting, transferring, and loading data from various sources to a destination where it can be stored and analyzed. Several data ingestion architectures can be used to collect data from different sources and load it into a data warehouse, data lake, or other storage systems. These architectures can be broadly classified into two categories: batch processing and real-time processing. How you choose to ingest data will depend on the volume, velocity, and variety of data you are working with, as well as the latency requirements of your use case.

 Lambda and Kappa architectures are two popular data ingestion architectures that combine batch and real-time processing to handle large volumes of data efficiently.

-Learn more from the following resources:
+Visit the following resources to learn more:

 - [@article@Data Ingestion Patterns](https://docs.aws.amazon.com/whitepapers/latest/aws-cloud-data-ingestion-patterns-practices/data-ingestion-patterns.html)
 - [@video@What is a data pipeline?](https://www.youtube.com/watch?v=kGT4PcTEPP8)
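The batch-versus-real-time split this section describes can be made concrete with a small sketch; the records and functions are hypothetical, chosen only to contrast loading in scheduled chunks against handing each record to a consumer as it arrives:

```python
import queue

# Hypothetical source records.
records = [{"id": i, "value": i * 10} for i in range(7)]

# Batch processing: accumulate, then load in fixed-size chunks on a schedule.
def batch_ingest(rows, batch_size=3):
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

# Real-time processing: each record flows through a queue to a consumer
# as soon as it is produced.
def stream_ingest(rows):
    q: queue.Queue = queue.Queue()
    for row in rows:
        q.put(row)           # producer side
    out = []
    while not q.empty():
        out.append(q.get())  # consumer side, one record at a time
    return out

batches = batch_ingest(records)
streamed = stream_ingest(records)
```

Batch trades latency for throughput and simplicity; streaming trades the reverse, which is why Lambda and Kappa architectures combine or unify the two paths.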
@@ -1,9 +1,9 @@
-# Data lakes & Warehouses
+# Data Lakes & Warehouses

-**Data Lakes** are large-scale data repository systems that store raw, untransformed data, in various formats, from multiple sources. They're often used for big data and real-time analytics requirements. Data lakes preserve the original data format and schema which can be modified as necessary. On the other hand, **Data Warehouses** are data storage systems which are designed for analyzing, reporting and integrating with transactional systems. The data in a warehouse is clean, consistent, and often transformed to meet wide-range of business requirements. Hence, data warehouses provide structured data but require more processing and management compared to data lakes.
+Data lakes and data warehouses are both systems for storing large amounts of data, but they differ in structure and purpose. A data lake stores data in its raw, unprocessed format, allowing for flexibility in analysis and exploration. A data warehouse, on the other hand, stores data that has been structured and transformed for specific analytical purposes, often optimized for querying and reporting.

-Learn more from the following resources:
+Visit the following resources to learn more:

 - [@article@Data Lake Definition](https://azure.microsoft.com/en-gb/resources/cloud-computing-dictionary/what-is-a-data-lake)
 - [@video@What is a Data Lake?](https://www.youtube.com/watch?v=LxcH6z8TFpI)
-- [@video@@hat is a Data Warehouse?](https://www.youtube.com/watch?v=k4tK2ttdSDg)
+- [@video@What is a Data Warehouse?](https://www.youtube.com/watch?v=k4tK2ttdSDg)
@@ -1,8 +1,8 @@
 # Data Lineage and Feature Stores

-**Data Lineage** refers to the life-cycle of data, including its origins, movements, characteristics and quality. It's a critical component in MLOps for tracking the journey of data through every process in a pipeline, from raw input to model output. Data lineage helps in maintaining transparency, ensuring compliance, and facilitating data debugging or tracing data related bugs. It provides a clear representation of data sources, transformations, and dependencies thereby aiding in audits, governance, or reproduction of machine learning models.
+**Data Lineage** refers to the life-cycle of data, including its origins, movements, characteristics and quality. It's a critical component in MLOps for tracking the journey of data through every process in a pipeline, from raw input to model output. Data lineage helps in maintaining transparency, ensuring compliance, and facilitating data debugging or tracing data-related bugs. It provides a clear representation of data sources, transformations, and dependencies, thereby aiding in audits, governance, or reproduction of machine learning models.

-Learn more from the following resources:
+Visit the following resources to learn more:

 - [@article@What is Data Lineage?](https://www.ibm.com/topics/data-lineage)
 - [@article@What is a Feature Store](https://www.snowflake.com/guides/what-feature-store-machine-learning/)
@@ -2,7 +2,7 @@

 Data pipelines are a series of automated processes that transport and transform data from various sources to a destination for analysis or storage. They typically involve steps like data extraction, cleaning, transformation, and loading (ETL) into databases, data lakes, or warehouses. Pipelines can handle batch or real-time data, ensuring that large-scale datasets are processed efficiently and consistently. They play a crucial role in ensuring data integrity and enabling businesses to derive insights from raw data for reporting, analytics, or machine learning.

-Learn more from the following resources:
+Visit the following resources to learn more:

 - [@article@What is a Data Pipeline? - IBM](https://www.ibm.com/topics/data-pipeline)
 - [@video@What are Data Pipelines?](https://www.youtube.com/watch?v=oKixNpz6jNo)
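The extract, clean/transform, load steps named in this section can be walked end to end with the standard library alone; the CSV sample, table name, and column names are made up for illustration:

```python
import csv
import io
import sqlite3

# Extract: read messy sample data (one row has no name, one value is padded).
raw_csv = "name,revenue\nacme, 100\nglobex,250\n,40\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: drop rows missing a name, strip whitespace, cast types.
clean = [
    {"name": r["name"].strip(), "revenue": int(r["revenue"])}
    for r in rows
    if r["name"].strip()
]

# Load: insert the cleaned rows into a queryable store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (name TEXT, revenue INTEGER)")
db.executemany("INSERT INTO sales VALUES (:name, :revenue)", clean)
total = db.execute("SELECT SUM(revenue) FROM sales").fetchone()[0]
```

Production pipelines swap each step for a scalable equivalent (object storage, a processing engine, a warehouse), but the E-T-L shape is the same.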
@@ -1,6 +1,6 @@
 # Docker

-Docker is a platform for working with containerized applications. Among its features are a daemon and client for managing and interacting with containers, registries for storing images, and a desktop application to package all these features together.
+Docker is a platform that uses operating system-level virtualization to deliver software in packages called containers. These containers isolate software from its environment and ensure that it works uniformly despite differences between development and production environments. Docker simplifies the process of building, shipping, and running applications by packaging all dependencies, libraries, and configurations into a single unit.

 Visit the following resources to learn more:

@@ -8,4 +8,4 @@ Visit the following resources to learn more:
 - [@official@Docker Documentation](https://docs.docker.com/)
-- [@video@Docker Tutorial](https://www.youtube.com/watch?v=RqTEHSBrYFw)
+- [@video@Docker Simplified in 55 Seconds](https://youtu.be/vP_4DlOH1G4)
 - [@feed@Explore top posts about Docker](https://app.daily.dev/tags/docker?ref=roadmapsh)
@@ -2,7 +2,7 @@

 **Experiment Tracking** is an essential part of MLOps, providing a system to monitor and record the different experiments conducted during the machine learning model development process. This involves capturing, organizing and visualizing the metadata associated with each experiment, such as hyperparameters used, models produced, metrics like accuracy or loss, and other information about the computational environment. This tracking allows for reproducibility of experiments, comparison across different experiment runs, and helps in identifying the best models.

-Learn more from the following resources:
+Visit the following resources to learn more:

 - [@article@Experiment Tracking](https://madewithml.com/courses/mlops/experiment-tracking/#dashboard)
 - [@article@ML Flow Model Registry](https://mlflow.org/docs/latest/model-registry.html)
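The capture-and-compare loop this section describes (record hyperparameters and metrics per run, then query across runs) is simple to sketch in plain Python; tools like MLflow add a server, UI, and artifact storage on top of the same idea. The run parameters and metric values below are invented:

```python
# Minimal experiment-tracking sketch: one dict of metadata per run.
def log_run(store: list, params: dict, metrics: dict) -> None:
    store.append({"params": params, "metrics": metrics})

runs: list = []
log_run(runs, {"lr": 0.1, "epochs": 5}, {"accuracy": 0.81})
log_run(runs, {"lr": 0.01, "epochs": 5}, {"accuracy": 0.87})
log_run(runs, {"lr": 0.001, "epochs": 5}, {"accuracy": 0.84})

# Comparing runs and picking the best model becomes a query
# over the tracked metadata rather than a hunt through notebooks.
best = max(runs, key=lambda r: r["metrics"]["accuracy"])
```

Because every run is recorded with the settings that produced it, the winning configuration can be re-run exactly, which is the reproducibility benefit the paragraph mentions.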
@@ -6,4 +6,4 @@ Visit the following resources to learn more:

 - [@official@Apache Flink Documentation](https://flink.apache.org/)
 - [@article@Apache Flink](https://www.tutorialspoint.com/apache_flink/apache_flink_introduction.htm)
 - [@feed@Explore top posts about Apache Flink](https://app.daily.dev/tags/apache-flink?ref=roadmapsh)
@@ -5,7 +5,7 @@ Git is a distributed version control system used to track changes in source code
 Visit the following resources to learn more:

 - [@roadmap@Visit Dedicated Git & GitHub Roadmap](https://roadmap.sh/git-github)
-- [@video@Git & GitHub Crash Course For Beginners](https://www.youtube.com/watch?v=SWYqp7iY_Tc)
 - [@article@Learn Git with Tutorials, News and Tips - Atlassian](https://www.atlassian.com/git)
 - [@article@Git Cheat Sheet](https://cs.fyi/guide/git-cheatsheet)
+- [@video@Git & GitHub Crash Course For Beginners](https://www.youtube.com/watch?v=SWYqp7iY_Tc)
 - [@feed@Explore top posts about Git](https://app.daily.dev/tags/git?ref=roadmapsh)
@@ -8,4 +8,4 @@ Visit the following resources to learn more:
 - [@official@GitHub](https://github.com)
 - [@official@GitHub Documentation](https://docs.github.com/en/get-started/quickstart)
 - [@video@What is GitHub?](https://www.youtube.com/watch?v=w3jLJU7DT5E)
 - [@feed@Explore top posts about GitHub](https://app.daily.dev/tags/github?ref=roadmapsh)
@@ -1,13 +1,13 @@
 # Go

-Go, also known as Golang, is an open-source programming language developed by Google that emphasizes simplicity, efficiency, and strong concurrency support. Designed for modern software development, Go features a clean syntax, garbage collection, and built-in support for concurrent programming through goroutines and channels, making it well-suited for building scalable, high-performance applications, especially in cloud computing and microservices architectures. Go's robust standard library and tooling ecosystem, including a powerful package manager and testing framework, further streamline development processes, promoting rapid application development and deployment.
+Go, also known as Golang, is an open-source programming language developed by Google that emphasizes simplicity, efficiency, and strong concurrency support. Designed for modern software development, Go features a clean syntax, garbage collection, and built-in support for concurrent programming through goroutines and channels, making it well-suited for building scalable, high-performance applications, especially in cloud computing and microservices architectures. Go's robust standard library and tooling ecosystem, including a powerful package manager and testing framework, further streamlines development processes, promoting rapid application development and deployment.

 Visit the following resources to learn more:

 - [@roadmap@Visit Dedicated Go Roadmap](https://roadmap.sh/golang)
 - [@official@A Tour of Go – Go Basics](https://go.dev/tour/welcome/1)
 - [@official@Go Reference Documentation](https://go.dev/doc/)
-- [@video@Go Programming Course](https://www.youtube.com/watch?v=un6ZyFkqFKo)
 - [@article@Making a RESTful JSON API in Go](https://thenewstack.io/make-a-restful-json-api-go/)
 - [@article@Go, the Programming Language of the Cloud](https://thenewstack.io/go-the-programming-language-of-the-cloud/)
+- [@video@Go Programming Course](https://www.youtube.com/watch?v=un6ZyFkqFKo)
 - [@feed@Explore top posts about Golang](https://app.daily.dev/tags/golang?ref=roadmapsh)
@@ -7,4 +7,4 @@
 - [@roadmap@Visit Dedicated Terraform Roadmap](https://roadmap.sh/terraform)
 - [@article@What is Infrastructure as Code?](https://www.redhat.com/en/topics/automation/what-is-infrastructure-as-code-iac)
 - [@video@Terraform Course for Beginners](https://www.youtube.com/watch?v=SLB_c_ayRMo)
 - [@video@8 Terraform Best Practices](https://www.youtube.com/watch?v=gxPykhPxRW0)
@@ -1,9 +1,9 @@
 # Kafka

-Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
+Kafka is a distributed, fault-tolerant, high-throughput streaming platform. It's primarily used for building real-time data pipelines and streaming applications, allowing you to publish, subscribe to, store, and process streams of records. These streams can originate from various sources and be consumed by multiple applications simultaneously.

 Visit the following resources to learn more:

 - [@official@Apache Kafka Quickstart](https://kafka.apache.org/quickstart)
 - [@video@Apache Kafka Fundamentals](https://www.youtube.com/watch?v=B5j3uNBH8X4)
 - [@feed@Explore top posts about Kafka](https://app.daily.dev/tags/kafka?ref=roadmapsh)
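The core model the new Kafka paragraph describes (publish to a stream of records, multiple applications consuming independently) can be sketched in plain Python; `MiniBroker`, the topic name, and the consumer groups are all hypothetical, and the real system distributes this log across a broker cluster:

```python
from collections import defaultdict

class MiniBroker:
    """Toy publish/subscribe log: not Kafka, just its core idea."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic -> append-only log of records
        self.offsets = defaultdict(int)   # (group, topic) -> next unread offset

    def publish(self, topic, record):
        self.topics[topic].append(record)

    def poll(self, group, topic):
        # Each consumer group reads from its own offset, so groups
        # consume the same stream independently of one another.
        log = self.topics[topic]
        start = self.offsets[(group, topic)]
        self.offsets[(group, topic)] = len(log)
        return log[start:]

broker = MiniBroker()
broker.publish("clicks", {"user": 1})
broker.publish("clicks", {"user": 2})

analytics_batch = broker.poll("analytics", "clicks")  # full stream
billing_batch = broker.poll("billing", "clicks")      # same stream, independently
```

Because records are stored rather than delivered once, a new consumer group can replay the stream from the beginning, which is a key difference from a traditional message queue.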
@@ -1,12 +1,12 @@
 # Kubernetes

-Kubernetes is an open source container management platform, and the dominant product in this space. Using Kubernetes, teams can deploy images across multiple underlying hosts, defining their desired availability, deployment logic, and scaling logic in YAML. Kubernetes evolved from Borg, an internal Google platform used to provision and allocate compute resources (similar to the Autopilot and Aquaman systems of Microsoft Azure). The popularity of Kubernetes has made it an increasingly important skill for the DevOps Engineer and has triggered the creation of Platform teams across the industry. These Platform engineering teams often exist with the sole purpose of making Kubernetes approachable and usable for their product development colleagues.
+Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery. By orchestrating containers across multiple machines, Kubernetes ensures high availability and efficient resource utilization, making it a powerful tool for managing complex deployments.

 Visit the following resources to learn more:

 - [@roadmap@Visit Dedicated Kubernetes Roadmap](https://roadmap.sh/kubernetes)
 - [@official@Kubernetes](https://kubernetes.io/)
 - [@official@Kubernetes Documentation](https://kubernetes.io/docs/home/)
-- [@video@Kubernetes Crash Course for Absolute Beginners](https://www.youtube.com/watch?v=s_o8dwzRlu4)
 - [@article@Kubernetes: An Overview](https://thenewstack.io/kubernetes-an-overview/)
+- [@video@Kubernetes Crash Course for Absolute Beginners](https://www.youtube.com/watch?v=s_o8dwzRlu4)
 - [@feed@Explore top posts about Kubernetes](https://app.daily.dev/tags/kubernetes?ref=roadmapsh)
@@ -2,8 +2,9 @@

 Machine learning fundamentals encompass the key concepts and techniques that enable systems to learn from data and make predictions or decisions without being explicitly programmed. At its core, machine learning involves algorithms that can identify patterns in data and improve over time with experience. Key areas include supervised learning (where models are trained on labeled data), unsupervised learning (where models identify patterns in unlabeled data), and reinforcement learning (where agents learn to make decisions based on feedback from their actions). Essential components also include data preprocessing, feature selection, model training, evaluation metrics, and the importance of avoiding overfitting. Understanding these fundamentals is crucial for developing effective machine learning applications across various domains.

-Learn more from the following resources:
+Visit the following resources to learn more:

+- [@roadmap@Visit the Dedicated Machine Learning Roadmap](https://roadmap.sh/machine-learning)
 - [@course@Fundamentals of Machine Learning - Microsoft](https://learn.microsoft.com/en-us/training/modules/fundamentals-machine-learning/)
 - [@course@MLCourse.ai](https://mlcourse.ai/)
 - [@course@Fast.ai](https://course.fast.ai)
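The supervised-learning loop named in this section (train on labeled data, predict on held-out data, score with an evaluation metric) fits in a dependency-free sketch; a 1-nearest-neighbour classifier and the toy points below are illustrative choices, not a recommended model:

```python
def predict(train, point):
    # Supervised prediction: return the label of the closest
    # training point (squared Euclidean distance).
    nearest = min(
        train,
        key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], point)),
    )
    return nearest[1]

# Labeled training data: (features, label) pairs in two clusters.
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((1.0, 1.0), "B"), ((0.9, 1.1), "B")]
# Held-out test data the model never saw during "training".
test = [((0.05, 0.1), "A"), ((1.05, 0.9), "B")]

# Evaluation metric: accuracy on the held-out set.
correct = sum(predict(train, x) == y for x, y in test)
accuracy = correct / len(test)
```

Splitting data into train and test sets, as done here, is the basic defence against the overfitting the paragraph warns about: the score comes from examples the model was not fit on.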
@@ -2,7 +2,7 @@

 MLOps components can be broadly classified into three major categories: Development, Operations and Governance. The **Development** components include everything involved in the creation of machine learning models, such as data extraction, data analysis, feature engineering, and machine learning model training. The **Operations** category includes components involved in deploying, monitoring, and maintaining machine learning models in production. This may include release management, model serving, and performance monitoring. Lastly, the **Governance** category encompasses the policies and regulations related to machine learning models. This includes model audit and tracking, model explainability, and security & compliance regulations.

-Learn more from the following resources:
+Visit the following resources to learn more:

 - [@article@MLOps Workflow, Components, and Key Practices](https://mlops.tv/p/understanding-ml-pipelines-through)
 - [@article@MLOps Lifecycle](https://www.moontechnolabs.com/blog/mlops-lifecycle/)
@@ -2,19 +2,20 @@

MLOps (Machine Learning Operations) principles focus on streamlining the deployment, monitoring, and management of machine learning models in production environments. Key principles include:

1. **Collaboration**: Foster collaboration between data scientists, developers, and operations teams to ensure alignment on model goals, performance, and lifecycle management.

2. **Automation**: Automate workflows for model training, testing, deployment, and monitoring to enhance efficiency, reduce errors, and speed up the development lifecycle.

3. **Version Control**: Implement version control for both code and data to track changes, reproduce experiments, and maintain model lineage.

4. **Continuous Integration and Deployment (CI/CD)**: Establish CI/CD pipelines tailored for machine learning to facilitate rapid model iteration and deployment.

5. **Monitoring and Governance**: Continuously monitor model performance and data drift in production to ensure models remain effective and compliant with regulatory requirements.

6. **Scalability**: Design systems that can scale to handle varying workloads and accommodate changes in data volume and complexity.

7. **Reproducibility**: Ensure that experiments can be reliably reproduced by standardizing environments and workflows, making it easier to validate and iterate on models.
1. **Collaboration**: Foster collaboration between data scientists, developers, and operations teams to ensure alignment on model goals, performance, and lifecycle management.

2. **Automation**: Automate workflows for model training, testing, deployment, and monitoring to enhance efficiency, reduce errors, and speed up the development lifecycle.

3. **Version Control**: Implement version control for both code and data to track changes, reproduce experiments, and maintain model lineage.

4. **Continuous Integration and Deployment (CI/CD)**: Establish CI/CD pipelines tailored for machine learning to facilitate rapid model iteration and deployment.

5. **Monitoring and Governance**: Continuously monitor model performance and data drift in production to ensure models remain effective and compliant with regulatory requirements.

6. **Scalability**: Design systems that can scale to handle varying workloads and accommodate changes in data volume and complexity.

7. **Reproducibility**: Ensure that experiments can be reliably reproduced by standardizing environments and workflows, making it easier to validate and iterate on models.

These principles help organizations efficiently manage the lifecycle of machine learning models, from development to deployment and beyond.
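The reproducibility principle above can be illustrated with a minimal standard-library sketch; the function name and workflow here are hypothetical, and a real pipeline would also pin package versions and data snapshots.

```python
import random

def run_experiment(seed: int, n: int = 5) -> list[float]:
    # A stand-in for a training run whose randomness is fully
    # determined by the seed (no hidden global RNG state).
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# The same seed reproduces the same "experiment" exactly.
assert run_experiment(seed=42) == run_experiment(seed=42)
```

Using an isolated `random.Random(seed)` instead of the module-level RNG keeps the run deterministic even if other code draws random numbers.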
@@ -1,10 +1,10 @@
# Model Training and Serving

Model Training refers to the phase in the Machine Learning (ML) pipeline where we teach a machine learning model how to make predictions by providing it with data. This process begins with feeding the model a training dataset, which it uses to learn and understand patterns or perform computations. The model's performance is then evaluated by comparing its prediction outputs with the actual results. Various algorithms can be used in the model training process. The choice of algorithm usually depends on the task, the data available, and the requirements of the project. It is worth noting that the model training stage can be computationally expensive particularly when dealing with large datasets or complex models.
Model Training refers to the phase in the Machine Learning (ML) pipeline where we teach a machine learning model how to make predictions by providing it with data. This process begins with feeding the model a training dataset, which it uses to learn and understand patterns or perform computations. The model's performance is then evaluated by comparing its prediction outputs with the actual results. Various algorithms can be used in the model training process. The choice of algorithm usually depends on the task, the data available, and the requirements of the project. It is worth noting that the model training stage can be computationally expensive, particularly when dealing with large datasets or complex models.

Visit the following resources to learn more:

- [@article@MLOps Principles](https://ml-ops.org/content/mlops-principles)
- [@opensource@ML Deployment k8s Fast API](https://github.com/sayakpaul/ml-deployment-k8s-fastapi/)
- [@article@MLOps Principles](https://ml-ops.org/content/mlops-principles)
- [@article@ML deployment with k8s FastAPI, Building an ML app with FastAPI](https://dev.to/bravinsimiyu/beginner-guide-on-how-to-build-a-machine-learning-app-with-fastapi-part-ii-deploying-the-fastapi-application-to-kubernetes-4j6g)
- [@article@KServe Tutorial](https://towardsdatascience.com/kserve-highly-scalable-machine-learning-deployment-with-kubernetes-aa7af0b71202)
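As a toy illustration of the training process described above (not any particular framework's API), gradient descent can fit a one-parameter model to data:

```python
# Toy training loop: fit y = w * x by gradient descent on mean squared error.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # synthetic data with true w = 2

w, lr = 0.0, 0.05
for _ in range(200):
    # Gradient of mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

print(round(w, 3))  # → 2.0
```

Real training differs mainly in scale: many parameters, batched data, and hardware acceleration, which is what makes the stage computationally expensive.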
@@ -1,8 +1,8 @@
# Monitoring and Observability

**Monitoring** in MLOps primarily involves tracking the performance of machine learning (ML) models in production to ensure that they continually deliver accurate and reliable results. Such monitoring is necessary because the real-world data that these models handle may change over time, a scenario known as data drift. These changes can adversely affect model performance. Monitoring helps to detect any anomalies in the model’s behaviour or performance and such alerts can trigger the retraining of models with new data. From a broader perspective, monitoring also involves tracking resources and workflows to detect and rectify any operational issues in the MLOps pipeline.
**Monitoring** in MLOps primarily involves tracking the performance of machine learning (ML) models in production to ensure that they continually deliver accurate and reliable results. Such monitoring is necessary because the real-world data that these models handle may change over time, a scenario known as data drift. These changes can adversely affect model performance. Monitoring helps to detect any anomalies in the model’s behaviour or performance, and such alerts can trigger the retraining of models with new data. From a broader perspective, monitoring also involves tracking resources and workflows to detect and rectify any operational issues in the MLOps pipeline.

Learn more from the following resources:
Visit the following resources to learn more:

- [@article@ML Monitoring vs ML Observability](https://medium.com/marvelous-mlops/ml-monitoring-vs-ml-observability-understanding-the-differences-fff574a8974f)
- [@video@ML Observability vs ML Monitoring: What's the difference?](https://www.youtube.com/watch?v=k1Reed3QIYE)
- [@video@ML Observability vs ML Monitoring: What's the difference?](https://www.youtube.com/watch?v=k1Reed3QIYE)
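A minimal sketch of the data-drift detection mentioned above, assuming a simple mean-shift heuristic; production monitoring typically uses richer statistical tests and windowed comparisons.

```python
import statistics

def drift_score(reference: list[float], live: list[float]) -> float:
    # Absolute shift of the live mean, in units of the reference stdev.
    mu, sigma = statistics.mean(reference), statistics.stdev(reference)
    return abs(statistics.mean(live) - mu) / sigma

reference = [10.0, 11.0, 9.0, 10.5, 9.5]
assert drift_score(reference, [10.2, 9.8, 10.1, 10.4, 9.9]) < 1.0   # stable
assert drift_score(reference, [14.0, 15.0, 13.5, 14.5, 15.5]) > 1.0  # drifted
```

A score above some threshold would raise the alert that triggers retraining.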
@@ -2,6 +2,6 @@

ML orchestration refers to the process of managing and coordinating the various tasks and workflows involved in the machine learning lifecycle, from data preparation and model training to deployment and monitoring. It involves integrating multiple tools and platforms to streamline operations, automate repetitive tasks, and ensure seamless collaboration among data scientists, engineers, and operations teams. By using orchestration frameworks, organizations can enhance reproducibility, scalability, and efficiency, enabling them to manage complex machine learning pipelines and improve the overall quality of models in production. This ensures that models are consistently updated and maintained, facilitating rapid iteration and adaptation to changing data and business needs.

Learn more from the following resources:
Visit the following resources to learn more:

- [@article@ML Observability: what, why, how](https://ubuntu.com/blog/ml-observability)
- [@article@ML Observability: what, why, how](https://ubuntu.com/blog/ml-observability)
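The dependency ordering at the heart of orchestration can be sketched with the standard library's `graphlib`; the task names here are hypothetical placeholders for real pipeline steps.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "features": {"extract"},
    "train": {"features"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}

# static_order() yields tasks so every dependency runs before its dependents.
order = list(TopologicalSorter(pipeline).static_order())
print(order)  # → ['extract', 'features', 'train', 'evaluate', 'deploy']
```

Orchestrators such as Airflow build on the same DAG idea, adding scheduling, retries, and distributed execution.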
@@ -1,11 +1,11 @@
# Python

Python is an interpreted high-level general-purpose programming language. Its design philosophy emphasizes code readability with its significant use of indentation. Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects. Python is dynamically-typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.
Python is a widely-used programming language known for its clear syntax and extensive libraries. It's a versatile tool that can handle many tasks, from simple scripting to complex software development. Its ease of use and the availability of specialized libraries for data analysis, machine learning, and automation make it a popular choice for building and deploying machine learning systems.

Learn more from the following resources:
Visit the following resources to learn more:

- [@roadmap@Visit Dedicated Python Roadmap](https://roadmap.sh/python)
- [@official@Python](https://www.python.org/)
- [@article@Real Python](https://realpython.com/)
- [@article@Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)
- [@feed@Explore top posts about Python](https://app.daily.dev/tags/python?ref=roadmapsh)
- [@feed@Explore top posts about Python](https://app.daily.dev/tags/python?ref=roadmapsh)
@@ -6,4 +6,4 @@ Visit the following resources to learn more:

- [@official@ApacheSpark](https://spark.apache.org/documentation.html)
- [@article@Spark By Examples](https://sparkbyexamples.com)
- [@feed@Explore top posts about Apache Spark](https://app.daily.dev/tags/spark?ref=roadmapsh)
- [@feed@Explore top posts about Apache Spark](https://app.daily.dev/tags/spark?ref=roadmapsh)
@@ -1,9 +1,9 @@
# Version Control Systems

Version control/source control systems allow developers to track and control changes to code over time. These services often include the ability to make atomic revisions to code, branch/fork off of specific points, and to compare versions of code. They are useful in determining the who, what, when, and why code changes were made.
Version control systems are tools that manage changes to code, documents, and other files over time. They allow multiple people to collaborate on a project without overwriting each other's work, track the history of changes, and revert to previous versions if needed. These systems essentially create a detailed record of modifications, enabling efficient collaboration and easier debugging.

Visit the following resources to learn more:

- [@official@Git](https://git-scm.com/)
- [@article@What is Version Control?](https://www.atlassian.com/git/tutorials/what-is-version-control)
- [@feed@Explore top posts about Version Control](https://app.daily.dev/tags/version-control?ref=roadmapsh)
- [@feed@Explore top posts about Version Control](https://app.daily.dev/tags/version-control?ref=roadmapsh)
@@ -1,10 +1,10 @@
# Version Control Systems
# Version Control

Version control/source control systems allow developers to track and control changes to code over time. These services often include the ability to make atomic revisions to code, branch/fork off of specific points, and to compare versions of code. They are useful in determining the who, what, when, and why code changes were made.
Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. It allows multiple people to collaborate on the same project without overwriting each other's work. This system tracks modifications to code, configurations, data, and other artifacts, providing a history of changes and enabling easy rollback to previous states.

Visit the following resources to learn more:

- [@official@Git](https://git-scm.com/)
- [@official@Git Documentation](https://git-scm.com/docs)
- [@article@What is Version Control?](https://www.atlassian.com/git/tutorials/what-is-version-control)
- [@feed@Explore top posts about Version Control](https://app.daily.dev/tags/version-control?ref=roadmapsh)
- [@feed@Explore top posts about Version Control](https://app.daily.dev/tags/version-control?ref=roadmapsh)
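The record-and-rollback idea described above can be sketched as a toy in-memory store; real systems such as Git add branching, merging, diffs, and content hashing.

```python
class ToyVersionStore:
    """Keeps every committed version of a file's contents, newest last."""

    def __init__(self) -> None:
        self.history: list[str] = []

    def commit(self, contents: str) -> int:
        # Append a snapshot and return its revision number.
        self.history.append(contents)
        return len(self.history) - 1

    def checkout(self, revision: int) -> str:
        # Recall any earlier version by revision number.
        return self.history[revision]

store = ToyVersionStore()
store.commit("model: v1")
store.commit("model: v2")
assert store.checkout(0) == "model: v1"  # roll back to the first version
```

Storing full snapshots is wasteful at scale, which is why real version control stores deltas or content-addressed objects instead.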