Compare commits


36 Commits

Author SHA1 Message Date
Arik Chakma
7e83371e97 fix: remove varify 2025-09-05 03:54:28 +06:00
Arik Chakma
cc2e75c812 chore: remove varify 2025-09-05 00:06:41 +06:00
V Sridhar Subramaniam
24eace0f73 Add pandas library content (#9113)
* Added a description and a couple of links

* Update src/data/roadmaps/machine-learning/content/pandas@PnOoShqB3z4LuUvp0Gh2e.md

---------

Co-authored-by: Kamran Ahmed <kamranahmed.se@gmail.com>
2025-09-04 10:31:54 +01:00
Favor
3c06b122e6 fix: course banner overlays the table of contents (#9097)
* fix: prevent sticky elements from overlapping with course announcement

- Changed sticky positioning from top-0 to top-[36px] in GuideContent.tsx
- Changed sticky positioning from top-0 to top-[36px] in TableOfContent.tsx
- This accounts for the maximum height of the CourseAnnouncement component
- Fixes visual overlap for both 'Other Guides' and 'In this article' sections

* fix: stop "In this article" title from overlapping with CourseAnnouncement component

* Add comment for announcement hide duration

Added a clarifying comment indicating that the announcement is hidden for 14 days when dismissed.

* revert unrelated files
2025-09-04 10:29:20 +01:00
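The commit above notes that the announcement is hidden for 14 days once dismissed. A minimal sketch of that rule, written as a pure helper so the 14-day window is easy to test — the names (`isAnnouncementHidden`, `ANNOUNCEMENT_HIDE_MS`) are assumptions for illustration, not identifiers from the actual codebase:

```typescript
// Hypothetical sketch of the dismissal rule described in the commit message:
// remember when the announcement was dismissed and keep it hidden for 14 days.
const ANNOUNCEMENT_HIDE_DAYS = 14;
const ANNOUNCEMENT_HIDE_MS = ANNOUNCEMENT_HIDE_DAYS * 24 * 60 * 60 * 1000;

// Given the stored dismissal timestamp (or undefined if never dismissed)
// and the current time, decide whether the banner should stay hidden.
function isAnnouncementHidden(
  dismissedAt: number | undefined,
  now: number,
): boolean {
  if (dismissedAt === undefined) {
    return false; // never dismissed, so show it
  }
  return now - dismissedAt < ANNOUNCEMENT_HIDE_MS;
}
```

In the real component the timestamp would presumably live in `localStorage`; keeping the comparison in a standalone function means the expiry logic can be unit-tested without a browser.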
Arik Chakma
2fdb647413 fix: duplicate guides (#9110) 2025-09-03 13:06:42 +01:00
Obscure octopus
3ca9f81298 Update learning resource (#9091)
Included an updated version of the Git & GitHub crash course, as the earlier version was 8 years old
2025-09-03 12:52:14 +01:00
shreyazh
56c4630e0d Fix typo (#9100)
Corrected spelling from WHow to How
2025-09-03 12:50:07 +01:00
Daniel Wolff
36af3ddcf1 Fix typo (#9101)
Fixed the name of the tool (Perfect->Prefect)

Co-authored-by: Kamran Ahmed <kamranahmed.se@gmail.com>
2025-09-03 12:49:48 +01:00
github-actions[bot]
0e7afe3c99 chore: sync content to repository - nextjs (#9098)
* chore: sync content to repo

* Update src/data/roadmaps/nextjs/content/adapters@fXXlJ6oN_YPWVr-fqEar3.md

---------

Co-authored-by: kamranahmedse <4921183+kamranahmedse@users.noreply.github.com>
Co-authored-by: Kamran Ahmed <kamranahmed.se@gmail.com>
2025-09-03 12:44:49 +01:00
kamranahmedse
b605fd6337 chore: sync content to repo 2025-09-03 12:43:21 +01:00
kamranahmedse
ba1e5a58b5 chore: sync content to repo 2025-09-03 12:43:01 +01:00
Arik Chakma
dd12cf1c99 fix: remove log 2025-09-03 11:02:46 +01:00
Arik Chakma
44854cc5fb fix: official roadmap json 2025-09-03 11:02:46 +01:00
Arik Chakma
b1e60f1614 fix: beginner roadmaps 2025-09-02 17:54:54 +01:00
Arik Chakma
168ad05afe fix: project card 2025-09-02 17:54:54 +01:00
Arik Chakma
bb0419bf8a feat: official project 2025-09-02 17:54:54 +01:00
Kamran Ahmed
2d18cefd55 Revert "Revert "feat: official roadmap meta"" (#9096)
* Revert "Revert "chore: update roadmap json endpoint""

This reverts commit 8dbe1468ed.

* Revert "Revert "feat: roadmap main page""

This reverts commit bb13bf38a8.

* Revert "Revert "chore: replace roadmap listing""

This reverts commit 80dfd5b206.

* Revert "Revert "feat: roadmap courses""

This reverts commit a89c2d454f.

* Revert "Revert "fix: course length""

This reverts commit d1cf7cca99.

* Revert "Revert "feat: roadmap with courses""

This reverts commit 9c32f9d469.

* Revert "Revert "chore: disable pre-render for roadmaps""

This reverts commit cef4c29f10.
2025-09-01 20:22:54 +01:00
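The revert-of-a-revert pattern in the commit above — undoing an earlier batch of `Revert "…"` commits to bring the original changes back — can be reproduced with plain git commands. This is a generic sketch in a throwaway repository, not the repository's actual history:

```shell
set -e
# Work in a temporary repo so nothing real is touched.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.email=a@b -c user.name=t commit -q --allow-empty -m 'base'

echo 'feature' > feature.txt
git add feature.txt
git -c user.email=a@b -c user.name=t commit -q -m 'feat: add feature'

# First revert: the feature disappears from the working tree,
# producing a commit titled 'Revert "feat: add feature"'.
git -c user.email=a@b -c user.name=t revert --no-edit HEAD
test ! -f feature.txt

# Reverting the revert restores the original change, producing
# 'Revert "Revert "feat: add feature""' -- the pattern seen above.
git -c user.email=a@b -c user.name=t revert --no-edit HEAD
test -f feature.txt
```

Each revert is an ordinary commit, so re-landing a reverted batch this way preserves the full history instead of force-pushing the old commits back.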
Arik Chakma
931e1b4a31 fix: rename key 2025-09-01 20:12:50 +01:00
Arik Chakma
e2075529ac feat: add roadmap key 2025-09-01 20:12:50 +01:00
Kamran Ahmed
8dbe1468ed Revert "chore: update roadmap json endpoint"
This reverts commit 580e764097.
2025-09-01 18:56:02 +01:00
Kamran Ahmed
bb13bf38a8 Revert "feat: roadmap main page"
This reverts commit ffb1cb5059.
2025-09-01 18:56:02 +01:00
Kamran Ahmed
80dfd5b206 Revert "chore: replace roadmap listing"
This reverts commit c4c28944ee.
2025-09-01 18:56:02 +01:00
Kamran Ahmed
a89c2d454f Revert "feat: roadmap courses"
This reverts commit f9f38101f9.
2025-09-01 18:56:02 +01:00
Kamran Ahmed
d1cf7cca99 Revert "fix: course length"
This reverts commit 40c7ea1b43.
2025-09-01 18:56:02 +01:00
Kamran Ahmed
9c32f9d469 Revert "feat: roadmap with courses"
This reverts commit 4e569df2a3.
2025-09-01 18:56:02 +01:00
Kamran Ahmed
cef4c29f10 Revert "chore: disable pre-render for roadmaps"
This reverts commit 679e29d12d.
2025-09-01 18:56:02 +01:00
Arik Chakma
679e29d12d chore: disable pre-render for roadmaps 2025-09-01 18:11:04 +01:00
Arik Chakma
4e569df2a3 feat: roadmap with courses 2025-09-01 18:11:04 +01:00
Arik Chakma
40c7ea1b43 fix: course length 2025-09-01 18:11:04 +01:00
Arik Chakma
f9f38101f9 feat: roadmap courses 2025-09-01 18:11:04 +01:00
Arik Chakma
c4c28944ee chore: replace roadmap listing 2025-09-01 18:11:04 +01:00
Arik Chakma
ffb1cb5059 feat: roadmap main page 2025-09-01 18:11:04 +01:00
Arik Chakma
580e764097 chore: update roadmap json endpoint 2025-09-01 18:11:04 +01:00
github-actions[bot]
111a97bb55 chore: sync content to repo (#9076)
Co-authored-by: kamranahmedse <4921183+kamranahmedse@users.noreply.github.com>
2025-09-01 17:15:22 +06:00
kamranahmedse
5d85495d72 chore: sync content to repo 2025-08-28 15:01:48 +01:00
Kamran Ahmed
ed2a251de4 Add content to Data Engineer Roadmap (#9016)
* Add basic content

* add content to data engineer roadmap

* add content to DE roadmap and fix some typos in content appearing in several roadmaps

* batch of new content for data engineer roadmap

* new batch of content from DE roadmap

* new batch in DE roadmap with 25 contents

* add 30 new content for DE roadmap

* new 30 contents for DE roadmap

* add last batch of content for DE roadmap. Ready to PR

* add 4 missing contents

* clean typo in de roadmap

---------

Co-authored-by: Javi Canales <javicanales@Dans-Laptop.local>
2025-08-28 14:59:59 +01:00
460 changed files with 3283 additions and 1672 deletions

.astro/types.d.ts vendored
View File

@@ -1 +1,2 @@
/// <reference types="astro/client" />
/// <reference path="content.d.ts" />

View File

@@ -7,4 +7,6 @@ PUBLIC_STRIPE_INDIVIDUAL_MONTHLY_PRICE_ID=
PUBLIC_STRIPE_INDIVIDUAL_YEARLY_PRICE_ID=
PUBLIC_STRIPE_INDIVIDUAL_MONTHLY_PRICE_AMOUNT=10
PUBLIC_STRIPE_INDIVIDUAL_YEARLY_PRICE_AMOUNT=100
PUBLIC_STRIPE_INDIVIDUAL_YEARLY_PRICE_AMOUNT=100
ROADMAP_API_KEY=

pnpm-lock.yaml generated
View File

@@ -9897,4 +9897,4 @@ snapshots:
react: 19.1.0
use-sync-external-store: 1.5.0(react@19.1.0)
zwitch@2.0.4: {}
zwitch@2.0.4: {}

View File

@@ -4,7 +4,6 @@ import { getPageTrackingData } from '../../lib/browser';
declare global {
interface Window {
gtag: any;
varify: any;
fireEvent: (props: {
action: string;
category: string;
@@ -68,7 +67,7 @@ window.fireEvent = (props) => {
}
const trackingData = getPageTrackingData();
window.gtag('event', action, {
event_category: category,
event_label: label,

View File

@@ -23,7 +23,12 @@ type EditorRoadmapProps = {
};
export function EditorRoadmap(props: EditorRoadmapProps) {
const { resourceId, resourceType = 'roadmap', dimensions, hasChat = true } = props;
const {
resourceId,
resourceType = 'roadmap',
dimensions,
hasChat = true,
} = props;
const [hasSwitchedRoadmap, setHasSwitchedRoadmap] = useState(false);
const [isLoading, setIsLoading] = useState(true);

View File

@@ -1,3 +0,0 @@
<div class='text-sm sm:text-base leading-relaxed text-left p-2 sm:p-4 text-md text-gray-800 border-t border-t-gray-300 bg-gray-100 rounded-bl-md rounded-br-md [&>p:not(:last-child)]:mb-3 [&>p>a]:underline [&>p>a]:text-blue-700'>
<slot />
</div>

View File

@@ -1,42 +0,0 @@
---
import { markdownToHtml } from '../../lib/markdown';
import Answer from './Answer.astro';
import Question from './Question.astro';
export type FAQType = {
question: string;
answer: string[];
};
export interface Props {
faqs: FAQType[];
}
const { faqs } = Astro.props;
if (faqs.length === 0) {
return '';
}
---
<div class='border-t bg-gray-100 mt-8'>
<div class='container'>
<div class='flex justify-between relative -top-5'>
<h2 class='text-sm sm:text-base font-medium py-1 px-3 border bg-white rounded-md'>Frequently Asked Questions</h2>
</div>
<div class='flex flex-col gap-1 pb-14'>
{
faqs.map((faq, questionIndex) => (
<Question isActive={questionIndex === 0} question={faq.question}>
<Answer>
{faq.answer.map((answer) => (
<p set:html={markdownToHtml(answer)} />
))}
</Answer>
</Question>
))
}
</div>
</div>
</div>

View File

@@ -0,0 +1,61 @@
import { useState } from 'react';
import type { OfficialRoadmapQuestion } from '../../queries/official-roadmap';
import { Question } from './Question';
import { guideRenderer } from '../../lib/guide-renderer';
type FAQsProps = {
faqs: OfficialRoadmapQuestion[];
};
export function FAQs(props: FAQsProps) {
const { faqs } = props;
if (faqs.length === 0) {
return null;
}
const [activeQuestionIndex, setActiveQuestionIndex] = useState(0);
return (
<div className="mt-8 border-t bg-gray-100">
<div className="container">
<div className="relative -top-5 flex justify-between">
<h2 className="rounded-md border bg-white px-3 py-1 text-sm font-medium sm:text-base">
Frequently Asked Questions
</h2>
</div>
<div className="flex flex-col gap-1 pb-14">
{faqs.map((faq, questionIndex) => {
const isTextDescription =
typeof faq?.description === 'string' &&
faq?.description?.length > 0;
return (
<Question
key={faq._id}
isActive={questionIndex === activeQuestionIndex}
question={faq.title}
onClick={() => setActiveQuestionIndex(questionIndex)}
>
<div
className="text-md rounded-br-md rounded-bl-md border-t border-t-gray-300 bg-gray-100 p-2 text-left text-sm leading-relaxed text-gray-800 sm:p-4 sm:text-base [&>p:not(:last-child)]:mb-3 [&>p>a]:text-blue-700 [&>p>a]:underline"
{...(isTextDescription
? {
dangerouslySetInnerHTML: {
__html: faq.description,
},
}
: {})}
>
{!isTextDescription
? guideRenderer.render(faq.description)
: null}
</div>
</Question>
);
})}
</div>
</div>
</div>
);
}

View File

@@ -1,42 +0,0 @@
---
import Icon from '../AstroIcon.astro';
export interface Props {
question: string;
isActive?: boolean;
}
const { question, isActive = false } = Astro.props;
---
<div
class='faq-item bg-white border rounded-md hover:bg-gray-50 border-gray-300'
>
<button
faq-question
class='flex flex-row justify-between items-center p-2 sm:p-3 w-full'
>
<span class='text-sm sm:text-base text-left font-medium'>{question}</span>
<Icon icon='down' class='h-6 hidden sm:block text-gray-400' />
</button>
<div class:list={['answer', { hidden: !isActive }]} faq-answer>
<slot />
</div>
</div>
<script>
document.querySelectorAll('[faq-question]').forEach((el) => {
el.addEventListener('click', () => {
// Hide any other visible answers
document.querySelectorAll('[faq-answer]').forEach((element) => {
element.classList.add('hidden');
});
// Show the current answer
const answer = el.nextElementSibling;
if (answer) {
answer.classList.remove('hidden');
}
});
});
</script>

View File

@@ -0,0 +1,29 @@
import { cn } from '../../lib/classname';
import { ChevronDownIcon } from '../ReactIcons/ChevronDownIcon';
type QuestionProps = {
question: string;
isActive?: boolean;
children: React.ReactNode;
onClick?: () => void;
};
export function Question(props: QuestionProps) {
const { question, isActive = false, children, onClick } = props;
return (
<div className="faq-item rounded-md border border-gray-300 bg-white hover:bg-gray-50">
<button
className="flex w-full flex-row items-center justify-between p-2 sm:p-3"
onClick={onClick}
>
<span className="text-left text-sm font-medium sm:text-base">
{question}
</span>
<ChevronDownIcon className="hidden h-3.5 stroke-[3] text-gray-400 sm:block" />
</button>
{isActive && <div className={cn('answer')}>{children}</div>}
</div>
);
}

View File

@@ -1,6 +1,4 @@
---
import type { RoadmapFileType } from '../lib/roadmap';
export interface Props {
url: string;
title: string;
@@ -27,7 +25,7 @@ const { url, title, description, isNew } = Astro.props;
{
isNew && (
<span class='flex items-center gap-1.5 absolute bottom-1.5 right-1 rounded-xs text-xs font-semibold uppercase text-purple-500 sm:px-1.5'>
<span class='absolute right-1 bottom-1.5 flex items-center gap-1.5 rounded-xs text-xs font-semibold text-purple-500 uppercase sm:px-1.5'>
<span class='relative flex h-2 w-2'>
<span class='absolute inline-flex h-full w-full animate-ping rounded-full bg-purple-400 opacity-75' />
<span class='relative inline-flex h-2 w-2 rounded-full bg-purple-500' />

View File

@@ -19,7 +19,7 @@ export function GuideContent(props: GuideContentProps) {
return (
<article className="lg:grid lg:max-w-full lg:grid-cols-[1fr_minmax(0,700px)_1fr]">
{(showTableOfContent || hasRelatedGuides) && (
<div className="sticky top-0 bg-linear-to-r from-gray-50 py-0 lg:relative lg:col-start-3 lg:col-end-4 lg:row-start-1">
<div className="sticky top-[36px] bg-linear-to-r from-gray-50 py-0 lg:relative lg:col-start-3 lg:col-end-4 lg:row-start-1">
{hasRelatedGuides && (
<RelatedGuides relatedGuides={guide?.relatedGuides || []} />
)}

View File

@@ -3,21 +3,16 @@ import { useToast } from '../../hooks/use-toast';
import { httpGet, httpPost } from '../../lib/http';
import { LoadingSolutions } from './LoadingSolutions';
import { EmptySolutions } from './EmptySolutions';
import { ThumbsDown, ThumbsUp } from 'lucide-react';
import { getRelativeTimeString } from '../../lib/date';
import { Pagination } from '../Pagination/Pagination';
import { deleteUrlParam, getUrlParams, setUrlParams } from '../../lib/browser';
import { pageProgressMessage } from '../../stores/page';
import { LeavingRoadmapWarningModal } from './LeavingRoadmapWarningModal';
import { isLoggedIn } from '../../lib/jwt';
import { showLoginPopup } from '../../lib/popup';
import { VoteButton } from './VoteButton.tsx';
import { GitHubIcon } from '../ReactIcons/GitHubIcon.tsx';
import { SelectLanguages } from './SelectLanguages.tsx';
import type { ProjectFrontmatter } from '../../lib/project.ts';
import { ProjectSolutionModal } from './ProjectSolutionModal.tsx';
import { SortProjects } from './SortProjects.tsx';
import { ProjectSolutionRow } from './ProjectSolutionRow';
import type { OfficialProjectDocument } from '../../queries/official-project.ts';
export interface ProjectStatusDocument {
_id?: string;
@@ -69,12 +64,12 @@ type PageState = {
};
type ListProjectSolutionsProps = {
project: ProjectFrontmatter;
project: OfficialProjectDocument;
projectId: string;
};
export function ListProjectSolutions(props: ListProjectSolutionsProps) {
const { projectId, project: projectData } = props;
const { projectId, project } = props;
const toast = useToast();
const [pageState, setPageState] = useState<PageState>({
@@ -226,7 +221,7 @@ export function ListProjectSolutions(props: ListProjectSolutionsProps) {
<div className="relative mb-5 hidden items-center justify-between sm:flex">
<div>
<h1 className="mb-1 text-xl font-semibold">
{projectData.title} Solutions
{project.title} Solutions
</h1>
<p className="text-sm text-gray-500">
Solutions submitted by the community

View File

@@ -1,47 +1,43 @@
import { Badge } from '../Badge.tsx';
import type {
ProjectDifficultyType,
ProjectFileType,
} from '../../lib/project.ts';
import { Users } from 'lucide-react';
import { formatCommaNumber } from '../../lib/number.ts';
import { cn } from '../../lib/classname.ts';
import { isLoggedIn } from '../../lib/jwt.ts';
import type { OfficialProjectDocument } from '../../queries/official-project.ts';
type ProjectCardProps = {
project: ProjectFileType;
project: OfficialProjectDocument;
userCount?: number;
status?: 'completed' | 'started' | 'none';
};
const badgeVariants: Record<ProjectDifficultyType, string> = {
const badgeVariants = {
beginner: 'yellow',
intermediate: 'green',
advanced: 'blue',
};
} as const;
export function ProjectCard(props: ProjectCardProps) {
const { project, userCount = 0, status } = props;
const { frontmatter, id } = project;
const { difficulty, title, description, slug, topics = [] } = project;
const isLoadingStatus = status === undefined;
const userStartedCount = status !== 'none' && userCount === 0 ? userCount + 1 : userCount;
const userStartedCount =
status !== 'none' && userCount === 0 ? userCount + 1 : userCount;
return (
<a
href={`/projects/${id}`}
href={`/projects/${slug}`}
className="flex flex-col rounded-md border bg-white p-3 transition-colors hover:border-gray-300 hover:bg-gray-50"
>
<span className="flex justify-between gap-1.5">
<Badge
variant={badgeVariants[frontmatter.difficulty] as any}
text={frontmatter.difficulty}
/>
<Badge variant={'grey'} text={frontmatter.nature} />
<Badge variant={badgeVariants[difficulty]} text={difficulty} />
{topics?.map((topic, index) => (
<Badge key={`${topic}-${index}`} variant={'grey'} text={topic} />
))}
</span>
<span className="my-3 flex min-h-[100px] flex-col">
<span className="mb-1 font-medium">{frontmatter.title}</span>
<span className="text-sm text-gray-500">{frontmatter.description}</span>
<span className="mb-1 font-medium">{title}</span>
<span className="text-sm text-gray-500">{description}</span>
</span>
<span className="flex min-h-[22px] items-center justify-between gap-2 text-xs text-gray-400">
{isLoadingStatus ? (

View File

@@ -0,0 +1,25 @@
import { guideRenderer } from '../../lib/guide-renderer';
import type { OfficialProjectDocument } from '../../queries/official-project';
type ProjectContentProps = {
project: OfficialProjectDocument;
};
export function ProjectContent(props: ProjectContentProps) {
const { project } = props;
const isContentString = typeof project?.content === 'string';
return (
<div
className="prose prose-h2:mb-3 prose-h2:mt-5 prose-h3:mb-1 prose-h3:mt-5 prose-p:mb-2 prose-blockquote:font-normal prose-blockquote:text-gray-500 prose-pre:my-3 prose-ul:my-3.5 prose-hr:my-5 prose-li:[&>p]:m-0 max-w-full [&>ul>li]:my-1"
{...(isContentString
? {
dangerouslySetInnerHTML: { __html: project?.content },
}
: {
children: guideRenderer.render(project?.content),
})}
/>
);
}

View File

@@ -2,11 +2,7 @@ import { ProjectCard } from './ProjectCard.tsx';
import { HeartHandshake, Trash2 } from 'lucide-react';
import { cn } from '../../lib/classname.ts';
import { useEffect, useMemo, useState } from 'react';
import {
projectDifficulties,
type ProjectDifficultyType,
type ProjectFileType,
} from '../../lib/project.ts';
import {
deleteUrlParam,
getUrlParams,
@@ -14,9 +10,14 @@ import {
} from '../../lib/browser.ts';
import { httpPost } from '../../lib/http.ts';
import { isLoggedIn } from '../../lib/jwt.ts';
import {
allowedOfficialProjectDifficulty,
type AllowedOfficialProjectDifficulty,
type OfficialProjectDocument,
} from '../../queries/official-project.ts';
type DifficultyButtonProps = {
difficulty: ProjectDifficultyType;
difficulty: AllowedOfficialProjectDifficulty;
isActive?: boolean;
onClick?: () => void;
};
@@ -46,7 +47,7 @@ export type ListProjectStatusesResponse = Record<
>;
type ProjectsListProps = {
projects: ProjectFileType[];
projects: OfficialProjectDocument[];
userCounts: Record<string, number>;
};
@@ -55,7 +56,7 @@ export function ProjectsList(props: ProjectsListProps) {
const { difficulty: urlDifficulty } = getUrlParams();
const [difficulty, setDifficulty] = useState<
ProjectDifficultyType | undefined
AllowedOfficialProjectDifficulty | undefined
>(urlDifficulty);
const [projectStatuses, setProjectStatuses] =
useState<ListProjectStatusesResponse>();
@@ -66,7 +67,7 @@ export function ProjectsList(props: ProjectsListProps) {
return;
}
const projectIds = projects.map((project) => project.id);
const projectIds = projects.map((project) => project.slug);
const { response, error } = await httpPost(
`${import.meta.env.PUBLIC_API_URL}/v1-list-project-statuses`,
{
@@ -82,22 +83,27 @@ export function ProjectsList(props: ProjectsListProps) {
setProjectStatuses(response);
};
const projectsByDifficulty: Map<ProjectDifficultyType, ProjectFileType[]> =
useMemo(() => {
const result = new Map<ProjectDifficultyType, ProjectFileType[]>();
const projectsByDifficulty: Map<
AllowedOfficialProjectDifficulty,
OfficialProjectDocument[]
> = useMemo(() => {
const result = new Map<
AllowedOfficialProjectDifficulty,
OfficialProjectDocument[]
>();
for (const project of projects) {
const difficulty = project.frontmatter.difficulty;
for (const project of projects) {
const difficulty = project.difficulty;
if (!result.has(difficulty)) {
result.set(difficulty, []);
}
result.get(difficulty)?.push(project);
if (!result.has(difficulty)) {
result.set(difficulty, []);
}
return result;
}, [projects]);
result.get(difficulty)?.push(project);
}
return result;
}, [projects]);
const matchingProjects = difficulty
? projectsByDifficulty.get(difficulty) || []
@@ -111,7 +117,7 @@ export function ProjectsList(props: ProjectsListProps) {
<div className="flex flex-col">
<div className="my-2.5 flex items-center justify-between">
<div className="flex flex-wrap gap-1">
{projectDifficulties.map((projectDifficulty) => (
{allowedOfficialProjectDifficulty.map((projectDifficulty) => (
<DifficultyButton
key={projectDifficulty}
onClick={() => {
@@ -122,6 +128,7 @@ export function ProjectsList(props: ProjectsListProps) {
isActive={projectDifficulty === difficulty}
/>
))}
{difficulty && (
<button
onClick={() => {
@@ -155,25 +162,25 @@ export function ProjectsList(props: ProjectsListProps) {
{matchingProjects
.sort((project) => {
return project.frontmatter.difficulty === 'beginner'
return project.difficulty === 'beginner'
? -1
: project.frontmatter.difficulty === 'intermediate'
: project.difficulty === 'intermediate'
? 0
: 1;
})
.sort((a, b) => {
return a.frontmatter.sort - b.frontmatter.sort;
return a.order - b.order;
})
.map((matchingProject) => {
const count = userCounts[matchingProject?.id] || 0;
const count = userCounts[matchingProject?.slug] || 0;
return (
<ProjectCard
key={matchingProject.id}
key={matchingProject.slug}
project={matchingProject}
userCount={count}
status={
projectStatuses
? (projectStatuses?.[matchingProject.id] || 'none')
? projectStatuses?.[matchingProject.slug] || 'none'
: undefined
}
/>

View File

@@ -7,16 +7,16 @@ import {
setUrlParams,
} from '../../lib/browser.ts';
import { CategoryFilterButton } from '../Roadmaps/CategoryFilterButton.tsx';
import {
projectDifficulties,
type ProjectFileType,
} from '../../lib/project.ts';
import { ProjectCard } from './ProjectCard.tsx';
import {
allowedOfficialProjectDifficulty,
type OfficialProjectDocument,
} from '../../queries/official-project.ts';
type ProjectGroup = {
id: string;
title: string;
projects: ProjectFileType[];
projects: OfficialProjectDocument[];
};
type ProjectsPageProps = {
@@ -28,7 +28,7 @@ export function ProjectsPage(props: ProjectsPageProps) {
const { roadmapsProjects, userCounts } = props;
const allUniqueProjectIds = new Set<string>(
roadmapsProjects.flatMap((group) =>
group.projects.map((project) => project.id),
group.projects.map((project) => project.slug),
),
);
const allUniqueProjects = useMemo(
@@ -37,15 +37,15 @@ export function ProjectsPage(props: ProjectsPageProps) {
.map((id) =>
roadmapsProjects
.flatMap((group) => group.projects)
.find((project) => project.id === id),
.find((project) => project.slug === id),
)
.filter(Boolean) as ProjectFileType[],
.filter(Boolean) as OfficialProjectDocument[],
[allUniqueProjectIds],
);
const [activeGroup, setActiveGroup] = useState<string>('');
const [visibleProjects, setVisibleProjects] =
useState<ProjectFileType[]>(allUniqueProjects);
useState<OfficialProjectDocument[]>(allUniqueProjects);
const [isFilterOpen, setIsFilterOpen] = useState(false);
@@ -67,11 +67,11 @@ export function ProjectsPage(props: ProjectsPageProps) {
const sortedVisibleProjects = useMemo(
() =>
visibleProjects.sort((a, b) => {
const projectADifficulty = a?.frontmatter.difficulty || 'beginner';
const projectBDifficulty = b?.frontmatter.difficulty || 'beginner';
const projectADifficulty = a?.difficulty || 'beginner';
const projectBDifficulty = b?.difficulty || 'beginner';
return (
projectDifficulties.indexOf(projectADifficulty) -
projectDifficulties.indexOf(projectBDifficulty)
allowedOfficialProjectDifficulty.indexOf(projectADifficulty) -
allowedOfficialProjectDifficulty.indexOf(projectBDifficulty)
);
}),
[visibleProjects],
@@ -111,7 +111,7 @@ export function ProjectsPage(props: ProjectsPageProps) {
{isFilterOpen && <X size={13} className="mr-1" />}
Categories
</button>
<div className="container relative flex flex-col gap-4 sm:flex-row">
<div className="relative container flex flex-col gap-4 sm:flex-row">
<div
className={cn(
'hidden w-full flex-col from-gray-100 sm:w-[160px] sm:shrink-0 sm:border-r sm:bg-linear-to-l sm:pt-6',
@@ -171,7 +171,7 @@ export function ProjectsPage(props: ProjectsPageProps) {
</div>
</div>
</div>
<div className="flex grow flex-col pb-20 pt-2 sm:pt-6">
<div className="flex grow flex-col pt-2 pb-20 sm:pt-6">
<div className="mb-4 flex items-center justify-between text-sm text-gray-500">
<h3 className={'flex items-center'}>
<Box size={15} className="mr-1" strokeWidth={2} />
@@ -187,9 +187,9 @@ export function ProjectsPage(props: ProjectsPageProps) {
<div className="grid grid-cols-1 gap-1.5 sm:grid-cols-2">
{sortedVisibleProjects.map((project) => (
<ProjectCard
key={project.id}
key={project.slug}
project={project}
userCount={userCounts[project.id] || 0}
userCount={userCounts[project.slug] || 0}
status={'none'}
/>
))}

View File

@@ -1,69 +1,22 @@
---
import { getQuestionGroupsByIds } from '../lib/question-group';
import { getRoadmapsByIds, type RoadmapFrontmatter } from '../lib/roadmap';
import { Map, Clipboard } from 'lucide-react';
import { Map } from 'lucide-react';
import { listOfficialRoadmaps } from '../queries/official-roadmap';
export interface Props {
roadmap: RoadmapFrontmatter;
relatedRoadmaps: string[];
}
const { roadmap } = Astro.props;
const { relatedRoadmaps } = Astro.props;
const relatedRoadmaps = roadmap.relatedRoadmaps || [];
const relatedRoadmapDetails = await getRoadmapsByIds(relatedRoadmaps);
const relatedQuestions = roadmap.relatedQuestions || [];
const relatedQuestionDetails = await getQuestionGroupsByIds(relatedQuestions);
const allRoadmaps = await listOfficialRoadmaps();
const relatedRoadmapsDetails = allRoadmaps.filter((roadmap) =>
relatedRoadmaps.includes(roadmap.slug),
);
---
{
relatedQuestionDetails.length > 0 && (
<div class='border-t bg-gray-100 pb-3'>
<div class='container'>
<div class='relative -top-5 flex justify-between'>
<span class='text-md flex items-center rounded-md border bg-white px-3 py-1 font-medium'>
<Clipboard className='mr-1.5 text-black' size='17px' />
Test your Knowledge
</span>
<a
href='/questions'
class='text-md rounded-md border bg-white px-3 py-1 font-medium hover:bg-gray-50'
>
<span class='hidden sm:inline'>All Quizzes &rarr;</span>
<span class='inline sm:hidden'>More &rarr;</span>
</a>
</div>
<div class='flex flex-col gap-1 pb-8'>
{relatedQuestionDetails.map((relatedQuestionGroup) => (
<a
href={`/questions/${relatedQuestionGroup.id}`}
class='flex flex-col gap-0.5 rounded-md border bg-white px-3.5 py-2 hover:bg-gray-50 sm:flex-row sm:gap-0'
>
<span class='inline-block min-w-[150px] font-medium'>
{relatedQuestionGroup.title}
</span>
<span class='text-gray-500'>
{relatedQuestionGroup.description}
</span>
</a>
))}
</div>
</div>
</div>
)
}
{
relatedRoadmaps.length && (
<div
class:list={[
'border-t bg-gray-100',
{
'mt-0': !relatedQuestionDetails.length,
},
]}
>
<div class:list={['border-t bg-gray-100']}>
<div class='container'>
<div class='relative -top-5 flex justify-between'>
<span class='text-md flex items-center rounded-md border bg-white px-3 py-1 font-medium'>
@@ -80,17 +33,15 @@ const relatedQuestionDetails = await getQuestionGroupsByIds(relatedQuestions);
</div>
<div class='flex flex-col gap-1 pb-8'>
{relatedRoadmapDetails.map((relatedRoadmap) => (
{relatedRoadmapsDetails.map((relatedRoadmap) => (
<a
href={`/${relatedRoadmap.id}`}
href={`/${relatedRoadmap.slug}`}
class='flex flex-col gap-0.5 rounded-md border bg-white px-3.5 py-2 hover:bg-gray-50 sm:flex-row sm:gap-0'
>
<span class='inline-block min-w-[195px] font-medium'>
{relatedRoadmap.frontmatter.briefTitle}
</span>
<span class='text-gray-500'>
{relatedRoadmap.frontmatter.briefDescription}
{relatedRoadmap.title.card}
</span>
<span class='text-gray-500'>{relatedRoadmap.description}</span>
</a>
))}
</div>

View File

@@ -5,9 +5,7 @@ import {
Bot,
FolderKanbanIcon,
MapIcon,
MessageCircle,
} from 'lucide-react';
import { type RoadmapFrontmatter } from '../lib/roadmap';
import LoginPopup from './AuthenticationFlow/LoginPopup.astro';
import { DownloadRoadmapButton } from './DownloadRoadmapButton';
import { MarkFavorite } from './FeaturedItems/MarkFavorite';
@@ -20,20 +18,16 @@ import { PersonalizedRoadmap } from './PersonalizedRoadmap/PersonalizedRoadmap';
export interface Props {
title: string;
description: string;
note?: string;
partner?: {
description: string;
link: string;
linkText: string;
};
roadmapId: string;
isUpcoming?: boolean;
hasSearch?: boolean;
projectCount?: number;
coursesCount?: number;
hasAIChat?: boolean;
question?: RoadmapFrontmatter['question'];
hasTopics?: boolean;
isForkable?: boolean;
activeTab?: 'roadmap' | 'projects' | 'courses';
}
@@ -43,12 +37,8 @@ const {
description,
roadmapId,
partner,
isUpcoming = false,
note,
hasTopics = false,
hasAIChat = false,
projectCount = 0,
question,
activeTab = 'roadmap',
coursesCount = 0,
} = Astro.props;

View File

@@ -10,10 +10,12 @@ import { useOutsideClick } from '../hooks/use-outside-click';
import { markdownToHtml } from '../lib/markdown';
import { cn } from '../lib/classname';
import { useScrollPosition } from '../hooks/use-scroll-position';
import type { JSONContent } from '@tiptap/core';
import { guideRenderer } from '../lib/guide-renderer';
type RoadmapTitleQuestionProps = {
question: string;
answer: string;
answer: JSONContent;
roadmapId?: string;
};
@@ -38,24 +40,24 @@ export function RoadmapTitleQuestion(props: RoadmapTitleQuestionProps) {
'rounded-0 -mx-4 sm:mx-0': isAnswerVisible,
// @FIXME:
// The line below is to keep the question hidden on mobile devices except for
// the frontend roadmap. This is because we did not use to have the question
// the frontend roadmap. This is because we did not use to have the question
// on mobile devices before and we don't want to cause any SEO issues. It will
// be enabled on other roadmaps in the future.
},
)}
>
{isAnswerVisible && (
<div className="fixed left-0 right-0 top-0 z-100 h-full items-center justify-center overflow-y-auto overflow-x-hidden overscroll-contain bg-black/50"></div>
<div className="fixed top-0 right-0 left-0 z-100 h-full items-center justify-center overflow-x-hidden overflow-y-auto overscroll-contain bg-black/50"></div>
)}
<h2
className="z-50 flex cursor-pointer select-none items-center px-2 py-2 text-sm font-medium"
className="z-50 flex cursor-pointer items-center px-2 py-2 text-sm font-medium select-none"
aria-expanded={isAnswerVisible ? 'true' : 'false'}
onClick={(e) => {
e.preventDefault();
setIsAnswerVisible(!isAnswerVisible);
}}
>
<span className="flex grow select-none items-center">
<span className="flex grow items-center select-none">
<Info className="mr-1.5 inline-block h-4 w-4" strokeWidth={2.5} />
{question}
</span>
@@ -65,7 +67,7 @@ export function RoadmapTitleQuestion(props: RoadmapTitleQuestionProps) {
</h2>
<div
className={`absolute left-0 right-0 top-0 z-100 mt-0 border bg-white ${
className={`absolute top-0 right-0 left-0 z-100 mt-0 border bg-white ${
isAnswerVisible ? 'rounded-0 block sm:rounded-md' : 'hidden'
}`}
ref={ref}
@@ -73,7 +75,7 @@ export function RoadmapTitleQuestion(props: RoadmapTitleQuestionProps) {
{isAnswerVisible && (
<h2
className={cn(
'sticky top-0 flex cursor-pointer select-none items-center rounded-t-md border-b bg-white px-[7px] py-[9px] text-base font-medium',
'sticky top-0 flex cursor-pointer items-center rounded-t-md border-b bg-white px-[7px] py-[9px] text-base font-medium select-none',
)}
onClick={() => {
setIsAnswerVisible(false);
@@ -95,9 +97,11 @@ export function RoadmapTitleQuestion(props: RoadmapTitleQuestionProps) {
</h2>
)}
<div
-className="bg-gray-100 p-3 text-base [&>h2]:mb-2 [&>h2]:mt-5 [&>h2]:text-[17px] [&>h2]:font-medium [&>p:last-child]:mb-0 [&>p>a]:font-semibold [&>p>a]:underline [&>p>a]:underline-offset-2 [&>p]:mb-3 [&>p]:font-normal [&>p]:leading-relaxed [&>p]:text-gray-800 [&>ul>li]:mb-2 [&>ul>li]:font-normal"
-dangerouslySetInnerHTML={{ __html: markdownToHtml(answer, false) }}
-></div>
+className="bg-gray-100 p-3 text-base [&>h2]:mt-5 [&>h2]:mb-2 [&>h2]:text-[17px] [&>h2]:font-medium [&>p]:mb-3 [&>p]:leading-relaxed [&>p]:font-normal [&>p]:text-gray-800 [&>p:last-child]:mb-0 [&>p>a]:font-semibold [&>p>a]:underline [&>p>a]:underline-offset-2 [&>ul>li]:mb-2 [&>ul>li]:font-normal"
+// dangerouslySetInnerHTML={{ __html: markdownToHtml(answer, false) }}
+>
+{guideRenderer.render(answer)}
+</div>
</div>
</div>
);

View File

@@ -25,7 +25,7 @@ export function TableOfContent(props: TableOfContentProps) {
className={cn(
'relative min-w-[250px] px-5 pt-0 max-lg:max-w-full max-lg:min-w-full max-lg:border-none max-lg:px-0 lg:pt-5',
{
-'top-0 lg:sticky!': totalRows <= 20,
+'top-[36px] lg:sticky!': totalRows <= 20,
},
)}
>

View File

@@ -1,5 +1,5 @@
---
-import type { FAQType } from '../../../components/FAQs/FAQs.astro';
+import type { FAQType } from '../../../components/FAQs/FAQs';
export const faqs: FAQType[] = [
{
@@ -11,13 +11,13 @@ export const faqs: FAQType[] = [
{
question: 'What is reinforcement learning?',
answer: [
'[Reinforcement learning](https://towardsdatascience.com/reinforcement-learning-101-e24b50e1d292) (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment. Unlike traditional supervised learning, RL does not rely on labeled data. Instead, the agent learns by taking actions and receiving feedback in the form of rewards or penalties. Over time, it aims to maximize cumulative rewards by refining its strategy based on past experiences. RL is often used in areas like robotics, game AI, and autonomous systems, where the goal is to develop intelligent behaviors through trial and error.',
],
},
{
question: 'Do AI Engineers need a degree?',
answer: [
-'While a degree in computer science, data science, or a related field can provide a solid foundation for becoming an AI engineer, it is not strictly necessary. Many successful AI engineers are self-taught or have gained expertise through online courses, certifications, and hands-on projects.'
+'While a degree in computer science, data science, or a related field can provide a solid foundation for becoming an AI engineer, it is not strictly necessary. Many successful AI engineers are self-taught or have gained expertise through online courses, certifications, and hands-on projects.',
],
},
];

View File

@@ -5,4 +5,4 @@ Amazon Elastic Compute Cloud (EC2) is a web service that provides secure, resiza
Visit the following resources to learn more:
- [@official@EC2 - User Guide](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html)
- [@video@Introduction to Amazon EC2](https://www.youtube.com/watch?v=eaicwmnSdCs)

View File

@@ -4,4 +4,4 @@ Amazon RDS (Relational Database Service) is a web service from Amazon Web Servic
Visit the following resources to learn more:
- [@official@Amazon RDS](https://aws.amazon.com/rds/)

View File

@@ -5,8 +5,8 @@ Apache Kafka is an open-source stream-processing software platform developed by
Visit the following resources to learn more:
- [@official@Apache Kafka](https://kafka.apache.org/quickstart)
-- [@offical@Apache Kafka Streams](https://docs.confluent.io/platform/current/streams/concepts.html)
-- [@offical@Kafka Streams Confluent](https://kafka.apache.org/documentation/streams/)
+- [@article@Apache Kafka Streams](https://docs.confluent.io/platform/current/streams/concepts.html)
+- [@article@Kafka Streams Confluent](https://kafka.apache.org/documentation/streams/)
- [@video@Apache Kafka Fundamentals](https://www.youtube.com/watch?v=B5j3uNBH8X4)
- [@video@Kafka in 100 Seconds](https://www.youtube.com/watch?v=uvb00oaa3k8)
- [@feed@Explore top posts about Kafka](https://app.daily.dev/tags/kafka?ref=roadmapsh)

View File

@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@official@ApacheSpark](https://spark.apache.org/documentation.html)
- [@article@Spark By Examples](https://sparkbyexamples.com)
- [@feed@Explore top posts about Apache Spark](https://app.daily.dev/tags/spark?ref=roadmapsh)

View File

@@ -1,4 +1,4 @@
# APIs and Data Collection
Application Programming Interfaces, better known as APIs, play a fundamental role in the work of data engineers, particularly in the process of data collection. APIs are sets of protocols, routines, and tools that enable different software applications to communicate with each other. An API allows developers to interact with a service or platform through a defined set of rules and endpoints, enabling data exchange and functionality use without needing to understand the underlying code. In data engineering, APIs are used extensively to collect, exchange, and manipulate data from different sources in a secure and efficient manner.

View File

@@ -7,4 +7,4 @@ Visit the following resources to learn more:
- [@official@Argo CD - Argo Project](https://argo-cd.readthedocs.io/en/stable/)
- [@video@ArgoCD Tutorial for Beginners](https://www.youtube.com/watch?v=MeU5_k9ssrs)
- [@video@What is ArgoCD](https://www.youtube.com/watch?v=p-kAqxuJNik)
- [@feed@Explore top posts about ArgoCD](https://app.daily.dev/tags/argocd?ref=roadmapsh)

View File

@@ -5,5 +5,4 @@ Amazon Aurora (Aurora) is a fully managed relational database engine that's comp
Visit the following resources to learn more:
- [@official@SAmazon Aurora](https://aws.amazon.com/rds/aurora/)
- [@article@SAmazon Aurora: What It Is, How It Works, and How to Get Started](https://www.datacamp.com/tutorial/amazon-aurora)

View File

@@ -4,5 +4,5 @@ Authentication and authorization are popular terms in modern computer systems th
Visit the following resources to learn more:
-- [@roadmap.sh@Basic Authentication](https://roadmap.sh/guides/basic-authentication)
+- [@article@Basic Authentication](https://roadmap.sh/guides/basic-authentication)
- [@article@What is Authentication vs Authorization?](https://auth0.com/intro-to-iam/authentication-vs-authorization)

View File

@@ -4,8 +4,8 @@ The AWS Cloud Development Kit (AWS CDK) is an open-source software development f
Visit the following resources to learn more:
-- [@course@AWS CDK Crash Course for Beginners](https://www.youtube.com/watch?v=D4Asp5g4fp8)
- [@official@AWS CDK](https://aws.amazon.com/cdk/)
- [@official@AWS CDK Documentation](https://docs.aws.amazon.com/cdk/index.html)
+- [@course@AWS CDK Crash Course for Beginners](https://www.youtube.com/watch?v=D4Asp5g4fp8)
- [@opensource@AWS CDK Examples](https://github.com/aws-samples/aws-cdk-examples)
- [@feed@Explore top posts about AWS](https://app.daily.dev/tags/aws?ref=roadmapsh)

View File

@@ -5,4 +5,4 @@ Amazon Elastic Kubernetes Service (EKS) is a managed service that simplifies the
Visit the following resources to learn more:
- [@official@Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/)
- [@official@Concepts of Amazon EKS](https://docs.aws.amazon.com/eks/)

View File

@@ -6,5 +6,4 @@ Visit the following resources to learn more:
- [@official@Amazon Simple Notification Service (SNS) ](http://aws.amazon.com/sns/)
- [@official@Send Fanout Event Notifications](https://aws.amazon.com/getting-started/hands-on/send-fanout-event-notifications/)
- [@article@What is Pub/Sub Messaging?](https://aws.amazon.com/what-is/pub-sub-messaging/)

View File

@@ -6,5 +6,4 @@ Visit the following resources to learn more:
- [@official@Amazon Simple Queue Service](https://aws.amazon.com/sqs/)
- [@official@What is Amazon Simple Queue Service?](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/welcome.html)
- [@article@Amazon Simple Queue Service (SQS): A Comprehensive Tutorial](https://www.datacamp.com/tutorial/amazon-sqs)

View File

@@ -1,9 +1,9 @@
# Azure Blob Storage
Azure Blob Storage is Microsoft's object storage solution for the cloud. “Blob” stands for Binary Large Object, a term used to describe storage for unstructured data like text, images, and video. Azure Blob Storage is Microsoft Azure's solution for storing these blobs in the cloud. It offers flexible storage—you only pay based on your usage. Depending on the access speed you need for your data, you can choose from various storage tiers (hot, cool, and archive). Being cloud-based, it is scalable, secure, and easy to manage.
Visit the following resources to learn more:
- [@official@Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs)
- [@official@Introduction to Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction)
- [@video@A Beginners Guide to Azure Blob Storage](https://www.youtube.com/watch?v=ah1XqItWkuc&t=300s)

View File

@@ -1,10 +1,10 @@
# Azure SQL Database
Azure SQL Database is a fully managed Platform as a Service (PaaS) offering. It abstracts the underlying infrastructure, enabling developers to focus on building and deploying applications without worrying about database maintenance tasks.
Visit the following resources to learn more:
- [@official@Azure SQL Database](https://azure.microsoft.com/en-us/products/azure-sql/database)
- [@official@What is Azure SQL Database?](https://learn.microsoft.com/en-us/azure/azure-sql/database/sql-database-paas-overview?view=azuresql)
- [@article@Azure SQL Database: Step-by-Step Setup and Management](https://www.datacamp.com/tutorial/azure-sql-database)
- [@video@Azure SQL for Beginners](https://www.youtube.com/playlist?list=PLlrxD0HtieHi5c9-i_Dnxw9vxBY-TqaeN)

View File

@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@official@Azure Virtual Machines](https://azure.microsoft.com/en-us/products/virtual-machines)
- [@official@Virtual Machines in Azure](https://learn.microsoft.com/en-us/azure/virtual-machines/overview)
- [@video@AVirtual Machines in Azure | Beginner's Guide](https://www.youtube.com/watch?v=_abaWXoQFZU)

View File

@@ -5,5 +5,4 @@ Batch processing is a method in which large volumes of collected data are proces
Visit the following resources to learn more:
- [@article@What is Batch Processing?](https://aws.amazon.com/what-is/batch-processing/)
- [@article@Batch And Streaming Demystified For Unification](https://towardsdatascience.com/batch-and-streaming-demystified-for-unification-dee0b48f921d/)

View File

@@ -1,15 +1,15 @@
# Best Practices
-1. **Ensure Reliability.** A robust messaging system must guarantee that messages aren't lost, even during node failures or network issues. This means using acknowledgments, replication across multiple brokers, and durable storage on disk. These measures ensure that producers and consumers can recover seamlessly without data loss when something goes wrong.
-2. **Design for Scalability.** Scalability should be baked in from the start. Partition topics strategically to distribute load across brokers and consumer groups, enabling horizontal scaling.
-3. **Maintain Message Ordering.** For systems that depend on message sequence, ensure ordering within partitions and design producers to consistently route related messages to the same partition.
-4. **Secure Communication.** Messaging queues often carry sensitive data, so encrypt messages both in transit and at rest. Implement authentication techniques to ensure only trusted clients can publish or consume, and enforce authorization rules to limit access to specific topics or operations.
-6. **Monitor & Alert.** Continuous visibility into your messaging system is essential. Track metrics such as message lag, throughput, consumer group health, and broker disk usage. Set alerts for abnormal patterns, like growing lag or dropped connections, so you can respond before they affect downstream systems.
+1. **Ensure Reliability.** A robust messaging system must guarantee that messages aren't lost, even during node failures or network issues. This means using acknowledgments, replication across multiple brokers, and durable storage on disk. These measures ensure that producers and consumers can recover seamlessly without data loss when something goes wrong.
+2. **Design for Scalability.** Scalability should be baked in from the start. Partition topics strategically to distribute load across brokers and consumer groups, enabling horizontal scaling.
+3. **Maintain Message Ordering.** For systems that depend on message sequence, ensure ordering within partitions and design producers to consistently route related messages to the same partition.
+4. **Secure Communication.** Messaging queues often carry sensitive data, so encrypt messages both in transit and at rest. Implement authentication techniques to ensure only trusted clients can publish or consume, and enforce authorization rules to limit access to specific topics or operations.
+5. **Monitor & Alert.** Continuous visibility into your messaging system is essential. Track metrics such as message lag, throughput, consumer group health, and broker disk usage. Set alerts for abnormal patterns, like growing lag or dropped connections, so you can respond before they affect downstream systems.
Visit the following resources to learn more:
- [@article@Best Practices for Message Queue Architecture](https://abhishek-patel.medium.com/best-practices-for-message-queue-architecture-f69d47e3565)

View File

@@ -1,6 +1,6 @@
# Big Data Tools
Big data tools are specialized software and platforms designed to handle the massive volume, velocity, and variety of data that traditional data processing tools cannot effectively manage. These tools provide the infrastructure, frameworks, and capabilities to process, analyze, and extract meaningful knowledge from vast datasets. They are essential for modern data-driven organizations seeking to gain insights, make informed decisions, and achieve a competitive advantage.
Hadoop and Spark are two of the most prominent frameworks in big data they handle the processing of large-scale data in very different ways. While Hadoop can be credited with democratizing the distributed computing paradigm through a robust storage system called HDFS and a computational model called MapReduce, Spark is changing the game with its in-memory architecture and flexible programming model.
@@ -8,5 +8,4 @@ Visit the following resources to learn more:
- [@article@What is Big Data?](https://cloud.google.com/learn/what-is-big-data?hl=en)
- [@article@Hadoop vs Spark: Which Big Data Framework Is Right For You?](https://www.datacamp.com/blog/hadoop-vs-spark)
- [@video@introduction to Big Data with Spark and Hadoop](http://youtube.com/watch?v=vHlwg4ciCsI&t=80s&ab_channel=freeCodeAcademy)

View File

@@ -5,4 +5,4 @@ Bigtable is a high-performance, scalable database that excels at capturing, proc
Visit the following resources to learn more:
- [@official@Bigtable: Fast, Flexible NoSQL](https://cloud.google.com/bigtable?hl=en#scale-your-latency-sensitive-applications-with-the-nosql-pioneer)
- [@article@Google Bigtable](https://www.techtarget.com/searchdatamanagement/definition/Google-BigTable)

View File

@@ -8,4 +8,4 @@ Visit the following resources to learn more:
- [@article@What is business intelligence (BI)?](https://www.ibm.com/think/topics/business-intelligence)
- [@article@Business intelligence: A complete overview](https://www.tableau.com/business-intelligence/what-is-business-intelligence)
- [@video@What is business intelligence?](https://www.youtube.com/watch?v=l98-BcB3UIE)

View File

@@ -7,4 +7,4 @@ Visit the following resources to learn more:
- [@article@What is CAP Theorem?](https://www.bmc.com/blogs/cap-theorem/)
- [@article@An Illustrated Proof of the CAP Theorem](https://mwhittaker.github.io/blog/an_illustrated_proof_of_the_cap_theorem/)
- [@article@CAP Theorem and its applications in NoSQL Databases](https://www.ibm.com/uk-en/cloud/learn/cap-theorem)
- [@video@What is CAP Theorem?](https://www.youtube.com/watch?v=_RbsFXWRZ10)

View File

@@ -5,6 +5,6 @@ Apache Cassandra is a highly scalable, distributed NoSQL database designed to ha
Visit the following resources to learn more:
- [@official@Apache Cassandra](https://cassandra.apache.org/_/index.html)
-- [article@Cassandra - Quick Guide](https://www.tutorialspoint.com/cassandra/cassandra_quick_guide.htm)
+- [@article@article@Cassandra - Quick Guide](https://www.tutorialspoint.com/cassandra/cassandra_quick_guide.htm)
- [@video@Apache Cassandra - Course for Beginners](https://www.youtube.com/watch?v=J-cSy5MeMOA)
- [@feed@Explore top posts about Backend Development](https://app.daily.dev/tags/backend?ref=roadmapsh)

View File

@@ -1,10 +1,10 @@
# Census
Census is a reverse ETL platform that synchronizes data from a data warehouse to various business applications and SaaS apps like Salesforce and Hubspot. It's a crucial part of the modern data stack, enabling businesses to operationalize their data by making it available in the tools where teams work, like CRMs, marketing platforms, and more.
Visit the following resources to learn more:
- [@official@Census](https://www.getcensus.com/reverse-etl)
- [@official@Census Documentation](https://developers.getcensus.com/getting-started/introduction)
- [@article@A starter guide to reverse ETL with Census](https://www.getcensus.com/blog/starter-guide-for-first-time-census-users)
- [@video@How to "Reverse ETL" with Census](https://www.youtube.com/watch?v=XkS7DQFHzbA)
- [@video@How to "Reverse ETL" with Census](https://www.youtube.com/watch?v=XkS7DQFHzbA)

View File

@@ -1,16 +1,16 @@
# Choosing the Right Technologies
-The data engineering ecosystem is rapidly expanding, and selecting the right technologies for your use case can be challenging. Below you can find some considerations for choosing data technologies across the data engineering lifecycle:
-- **Team size and capabilities.** Your team's size will determine the amount of bandwidth your team can dedicate to complex solutions. For small teams, try to stick to simple solutions and technologies your team is familiar with.
-- **Interoperability**. When choosing a technology or system, you'll need to ensure that it interacts and operates smoothly with other technologies.
-- **Cost optimization and business value,** Consider direct and indirect costs of a technology and the opportunity cost of choosing some technologies over others.
-- **Location** Companies have many options when it comes to choosing where to run their technology stack, including cloud providers, on-premises systems, hybrid clouds, and multicloud.
-- **Build versus buy**. Depending on your needs and capabilities, you can either invest in building your own technologies, implement open-source solutions, or purchase proprietary solutions and services.
-- **Server versus serverless**. Depending on your needs, you may prefer server-based setups, where developers manage servers, or serverless systems, which translates the server management to cloud providers, allowing developers to focus solely on writing code.
+The data engineering ecosystem is rapidly expanding, and selecting the right technologies for your use case can be challenging. Below you can find some considerations for choosing data technologies across the data engineering lifecycle:
+* **Team size and capabilities.** Your team's size will determine the amount of bandwidth your team can dedicate to complex solutions. For small teams, try to stick to simple solutions and technologies your team is familiar with.
+* **Interoperability**. When choosing a technology or system, you'll need to ensure that it interacts and operates smoothly with other technologies.
+* **Cost optimization and business value,** Consider direct and indirect costs of a technology and the opportunity cost of choosing some technologies over others.
+* **Location** Companies have many options when it comes to choosing where to run their technology stack, including cloud providers, on-premises systems, hybrid clouds, and multicloud.
+* **Build versus buy**. Depending on your needs and capabilities, you can either invest in building your own technologies, implement open-source solutions, or purchase proprietary solutions and services.
+* **Server versus serverless**. Depending on your needs, you may prefer server-based setups, where developers manage servers, or serverless systems, which translates the server management to cloud providers, allowing developers to focus solely on writing code.
Visit the following resources to learn more:
-- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)
-- [@article@Build hybrid and multicloud architectures using Google Cloud](https://cloud.google.com/architecture/hybrid-multicloud-patterns)
-- [@article@The Unfulfilled Promise of Serverless](https://www.lastweekinaws.com/blog/the-unfulfilled-promise-of-serverless/)
+- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)
+- [@article@The Unfulfilled Promise of Serverless](https://www.lastweekinaws.com/blog/the-unfulfilled-promise-of-serverless/)

View File

@@ -8,4 +8,4 @@ Visit the following resources to learn more:
- [@article@What is CI/CD? Continuous Integration and Continuous Delivery](https://www.guru99.com/continuous-integration.html)
- [@article@Continuous Integration vs Delivery vs Deployment](https://www.guru99.com/continuous-integration-vs-delivery-vs-deployment.html)
- [@article@CI/CD Pipeline: Learn with Example](https://www.guru99.com/ci-cd-pipeline.html)

View File

@@ -7,4 +7,4 @@ Visit the following resources to learn more:
- [@official@CircleCI](https://circleci.com/)
- [@official@CircleCI Documentation](https://circleci.com/docs)
- [@official@Configuration Tutorial](https://circleci.com/docs/config-intro)
- [@feed@Explore top posts about CI/CD](https://app.daily.dev/tags/cicd?ref=roadmapsh)

View File

@@ -1,15 +1,15 @@
# Cloud Architectures
Cloud architecture refers to how various cloud technology components, such as hardware, virtual resources, software capabilities, and virtual network systems interact and connect to create cloud computing environments. Cloud architecture dictates how components are integrated so that you can pool, share, and scale resources over a network. It acts as a blueprint that defines the best way to strategically combine resources to build a cloud environment for a specific business need.
Cloud architecture components can included, among others:
-- A frontend platform
-- A backend platform
-- A cloud-based delivery model
-- A network (internet, intranet, or intercloud)
+* A frontend platform
+* A backend platform
+* A cloud-based delivery model
+* A network (internet, intranet, or intercloud)
Visit the following resources to learn more:
- [@article@What is cloud architecture? - Google](https://cloud.google.com/learn/what-is-cloud-architecture)
- [@video@WWhat is Cloud Architecture and Common Models?](https://www.youtube.com/watch?v=zTP-bx495hU)

View File

@@ -2,8 +2,8 @@
**Cloud Computing** refers to the delivery of computing services over the internet rather than using local servers or personal devices. These services include servers, storage, databases, networking, software, analytics, and intelligence. Cloud Computing enables faster innovation, flexible resources, and economies of scale. There are various types of cloud computing such as public clouds, private clouds, and hybrids clouds. Furthermore, it's divided into different services like Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). These services differ mainly in the level of control an organization has over their data and infrastructures.
-Learn more from the following resources:
+Visit the following resources to learn more:
- [@article@Cloud Computing - IBM](https://www.ibm.com/think/topics/cloud-computing)
- [@article@What is Cloud Computing? - Azure](https://azure.microsoft.com/en-gb/resources/cloud-computing-dictionary/what-is-cloud-computing)
- [@video@What is Cloud Computing? - Amazon Web Services](https://www.youtube.com/watch?v=mxT233EdY5c)

View File

@@ -4,6 +4,6 @@ Google Cloud SQL is a fully-managed, cost-effective and scalable database servic
Visit the following resources to learn more:
-- [@official@Cloud SQL](https://cloud.google.com/sql)
-- [@official@Cloud SQL overview](https://cloud.google.com/sql/docs/introduction)
- [@course@Cloud SQL](https://www.cloudskillsboost.google/course_templates/701)
+- [@official@Cloud SQL](https://cloud.google.com/sql)
+- [@official@Cloud SQL overview](https://cloud.google.com/sql/docs/introduction)

View File

@@ -1,6 +1,3 @@
# Cluster Computing Basics
Cluster computing is the process of using multiple computing nodes, called clusters, to increase processing power for solving complex problems, such as Big Data analytics and AI model training. These tasks require parallel processing of millions of data points for complex classification and prediction tasks. Cluster computing technology coordinates multiple computing nodes, each with its own CPUs, GPUs, and internal memory, to work together on the same data processing task. Applications on cluster computing infrastructure run as if on a single machine and are unaware of the underlying system complexities.

View File

@@ -1,5 +1,5 @@
# Cluster Management Tools
Cluster management software maximizes the work that a cluster of computers can perform. A cluster manager balances workload to reduce bottlenecks, monitors the health of the elements of the cluster, and manages failover when an element fails. A cluster manager can also help a system administrator to perform administration tasks on elements in the cluster.
Some of the most popular Cluster Management Tools are Kubernetes and Apache Hadoop YARN.

View File

@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@article@What are columnar databases? Here are 35 examples.](https://www.tinybird.co/blog-posts/what-is-a-columnar-database)
- [@article@Columnar Databases](https://www.techtarget.com/searchdatamanagement/definition/columnar-database)
- [@video@What is a Columnar Database? (vs. Row-oriented Database)](https://www.youtube.com/watch?v=1MnvuNg33pA)
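The linked resources contrast columnar with row-oriented storage; a minimal Python sketch of the two layouts, with illustrative data:

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "amount": 10.0, "country": "DE"},
    {"id": 2, "amount": 20.5, "country": "FR"},
    {"id": 3, "amount": 5.5, "country": "DE"},
]

# Column-oriented layout: each column is stored contiguously.
columns = {
    "id": [1, 2, 3],
    "amount": [10.0, 20.5, 5.5],
    "country": ["DE", "FR", "DE"],
}

# An analytical query such as a column sum touches every record in the
# row layout, but only one contiguous list in the columnar layout --
# which is why columnar databases excel at scans and aggregations.
row_total = sum(record["amount"] for record in rows)
col_total = sum(columns["amount"])
```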

View File

@@ -1,11 +1,9 @@
# Compute Engine (Compute)
Compute Engine is a computing and hosting service that lets you create and run virtual machines on Google infrastructure. Compute Engine offers scale, performance, and value that lets you easily launch large compute clusters on Google's infrastructure. There are no upfront investments, and you can run thousands of virtual CPUs on a system that offers quick, consistent performance. You can configure and control Compute Engine resources using the Google Cloud console, the Google Cloud CLI, or using a REST-based API. You can also use a variety of programming languages to run Compute Engine, including Python, Go, and Java.
Visit the following resources to learn more:
- [@official@Compute Engine overview](https://cloud.google.com/compute/docs/overview)
- [@course@The Basics of Google Cloud Compute](https://www.cloudskillsboost.google/course_templates/754)
- [@video@Compute Engine in a minute](https://www.youtube.com/watch?v=IuK4gQeHRcI)

View File

@@ -1,6 +1,6 @@
# Containers & Orchestration
**Containers** are lightweight, portable, and isolated environments that package applications and their dependencies, enabling consistent deployment across different computing environments. They encapsulate software code, runtime, system tools, libraries, and settings, ensuring that the application runs the same regardless of where it's deployed. Containers share the host operating system's kernel, making them more efficient than traditional virtual machines.
**Orchestration** refers to the automated coordination and management of complex IT systems. It involves combining multiple automated tasks and processes into a single workflow to achieve a specific goal. Orchestration is a key component of any software development process and should not be passed over in favor of manual configuration: as an automation practice, it removes the chance of human error from the different steps of the data engineering lifecycle, ensuring efficient resource utilization and consistency.
@@ -8,7 +8,7 @@ Visit the following resources to learn more:
- [@article@What are Containers?](https://cloud.google.com/learn/what-are-containers)
- [@article@Containers - The New Stack](https://thenewstack.io/category/containers/)
- [@article@An Introduction to Data Orchestration: Process and Benefits](https://www.datacamp.com/blog/introduction-to-data-orchestration-process-and-benefits)
- [@article@What is Container Orchestration?](https://www.redhat.com/en/topics/containers/what-is-container-orchestration)
- [@video@What are Containers?](https://www.youtube.com/playlist?list=PLawsLZMfND4nz-WDBZIj8-nbzGFD4S9oz)
- [@video@Why You Need Data Orchestration](https://www.youtube.com/watch?v=ZtlS5-G-gng)
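As a toy illustration of orchestration, the sketch below runs hypothetical pipeline tasks in dependency order using Python's standard-library `graphlib`; the task names and data are made up:

```python
from graphlib import TopologicalSorter

results = {}

def extract():
    # Pull raw data (hard-coded for the sketch).
    results["raw"] = [3, 1, 2]

def transform():
    results["clean"] = sorted(results["raw"])

def load():
    results["loaded"] = list(results["clean"])

tasks = {"extract": extract, "transform": transform, "load": load}
# Each task maps to the set of tasks it depends on.
dependencies = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}

# The orchestrator runs every task only after its dependencies finish,
# removing the human error of running steps in the wrong order by hand.
for name in TopologicalSorter(dependencies).static_order():
    tasks[name]()
```

Real orchestrators such as Airflow or Prefect add scheduling, retries, and monitoring on top of this same dependency-graph idea.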

View File

@@ -1,11 +1,10 @@
# CosmosDB
Azure Cosmos DB is a native NoSQL database service and vector database for working with the document data model. It can store arbitrary native JSON documents with flexible schema. Data is indexed automatically and is available for query using a flavor of the SQL query language designed for JSON data. It also supports vector search. You can access the API using SDKs for popular frameworks such as .NET, Python, Java, and Node.js.
Visit the following resources to learn more:
- [@official@Azure Cosmos DB FAQ](https://azure.microsoft.com/en-us/products/cosmos-db#FAQ)
- [@official@Azure Cosmos DB - Database for the AI Era](https://learn.microsoft.com/en-us/azure/cosmos-db/introduction)
- [@article@Azure Cosmos DB: A Global-Scale NoSQL Cloud Database](https://www.datacamp.com/tutorial/azure-cosmos-db)
- [@video@What is Azure Cosmos DB?](https://www.youtube.com/watch?v=hBY2YcaIOQM)

View File

@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@official@CouchDB](https://couchdb.apache.org/)
- [@official@CouchDB Documentation](https://docs.couchdb.org/en/stable/intro/overview.html)
- [@article@What is CouchDB?](https://www.ibm.com/think/topics/couchdb)

View File

@@ -2,16 +2,15 @@
Data Analytics involves extracting meaningful insights from raw data to drive decision-making processes. It includes a wide range of techniques and disciplines, ranging from simple data compilation to advanced algorithms and statistical analysis. Data analysts, as ambassadors of this domain, employ these techniques to answer various questions:
- Descriptive Analytics *(what happened in the past?)*
- Diagnostic Analytics *(why did it happen in the past?)*
- Predictive Analytics *(what will happen in the future?)*
- Prescriptive Analytics *(how can we make it happen?)*
Visit the following resources to learn more:
- [@course@Introduction to Data Analytics](https://www.coursera.org/learn/introduction-to-data-analytics)
- [@article@The 4 Types of Data Analysis: Ultimate Guide](https://careerfoundry.com/en/blog/data-analytics/different-types-of-data-analysis/)
- [@article@What is Data Analysis? An Expert Guide With Examples](https://www.datacamp.com/blog/what-is-data-analysis-expert-guide)
- [@video@Descriptive vs Diagnostic vs Predictive vs Prescriptive Analytics: What's the Difference?](https://www.youtube.com/watch?v=QoEpC7jUb9k)
- [@video@Types of Data Analytics](https://www.youtube.com/watch?v=lsZnSgxMwBA)
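A minimal sketch of the four question types on a made-up sales series; the numbers, the naive forecast rule, and the action names are purely illustrative:

```python
from statistics import mean

monthly_sales = [100, 110, 125, 120, 140, 155]
ad_spend = [10, 11, 14, 12, 16, 18]

# Descriptive: what happened in the past?
avg_sales = mean(monthly_sales)

# Diagnostic: why did it happen? Count how often sales moved in the
# same direction as ad spend from month to month.
ups = sum(
    1 for i in range(1, len(monthly_sales))
    if (monthly_sales[i] - monthly_sales[i - 1])
    * (ad_spend[i] - ad_spend[i - 1]) > 0
)

# Predictive: what will happen? A naive last-difference extrapolation.
forecast = monthly_sales[-1] + (monthly_sales[-1] - monthly_sales[-2])

# Prescriptive: how can we make it happen? Pick the action with the
# better predicted outcome.
options = {"raise_ad_spend": forecast + 10, "hold_ad_spend": forecast}
best_action = max(options, key=options.get)
```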

View File

@@ -2,13 +2,12 @@
Before designing the technology architecture to collect and store data, you should consider the following factors:
- **Bounded versus unbounded.** Bounded data has defined start and end points, forming a finite, complete dataset, like the daily sales report. Unbounded data has no predefined limits in time or scope, flowing continuously and potentially indefinitely, such as user interaction events or real-time sensor data. The distinction is critical in data processing, where bounded data is suitable for batch processing, and unbounded data is processed in stream processing or real-time systems.
- **Frequency.** Collection processes can be batch, micro-batch, or real-time, depending on the frequency you need to store the data.
- **Synchronous versus asynchronous.** Synchronous ingestion is a process where the system waits for a response from the data source before proceeding. In contrast, asynchronous ingestion is a process where data is ingested without waiting for a response from the data source. Each approach has its benefits and drawbacks, and the choice depends on the specific requirements of the data ingestion process and the business needs.
- **Throughput and scalability.** As data demands grow, you will need scalable ingestion solutions to keep pace. Scalable data ingestion pipelines ensure that systems can handle increasing data volumes without compromising performance. Without scalable ingestion, data pipelines face challenges like bottlenecks and data loss. Bottlenecks occur when components can't process data fast enough, leading to delays and reduced throughput. Data loss happens when systems are overwhelmed, causing valuable information to be discarded or corrupted.
- **Reliability and durability.** Data reliability in the ingestion phase means ensuring that the acquired data from various sources is accurate, consistent, and trustworthy as it enters the data pipeline. Durability entails making sure that data isn't lost or corrupted during the data collection process.
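The bounded-versus-unbounded distinction can be sketched with Python iterables: a finite list versus an endless generator (the data and names are illustrative):

```python
import itertools

def bounded_source():
    # Bounded: a finite dataset with a defined end, e.g. a daily sales report.
    return [{"order_id": i, "total": 10 * i} for i in range(5)]

def unbounded_source():
    # Unbounded: a stream with no predefined end, e.g. sensor readings
    # or user interaction events.
    for i in itertools.count():
        yield {"event_id": i}

# Batch processing suits bounded data: read everything, then aggregate.
batch_total = sum(order["total"] for order in bounded_source())

# Stream processing suits unbounded data: consume incrementally; here we
# stop after a micro-batch of three events so the demo terminates.
micro_batch = list(itertools.islice(unbounded_source(), 3))
```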
Visit the following resources to learn more:
- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)

View File

@@ -4,13 +4,13 @@ The data engineering lifecycle encompasses the entire process of transforming ra
It involves 4 steps:
1. Data Generation: Collecting data from various source systems.
2. Data Storage: Safely storing data for future processing and analysis.
3. Data Ingestion: Transforming and bringing data into a centralized system.
4. Data Serving: Providing data to end-users for decision-making and operational purposes.
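A toy, in-memory walk through the four steps; all functions and data are hypothetical stand-ins for real source systems and stores:

```python
def generate():
    # 1. Data Generation: records arrive from a source system.
    return [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 5}]

def store(records, storage):
    # 2. Data Storage: persist raw records (a list stands in for storage).
    storage.extend(records)

def ingest(storage):
    # 3. Data Ingestion: transform into a centralized, queryable shape.
    return {record["user"]: record["clicks"] for record in storage}

def serve(central):
    # 4. Data Serving: answer an end-user question from the central store.
    return max(central, key=central.get)

storage = []
store(generate(), storage)
top_user = serve(ingest(storage))
```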
Visit the following resources to learn more:
- [@article@Data Engineering Lifecycle](https://medium.com/towards-data-engineering/data-engineering-lifecycle-d1e7ee81632e)
- [@video@Getting Into Data Engineering](https://www.youtube.com/watch?v=hZu_87l62J4)
- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)

View File

@@ -4,13 +4,13 @@ The data engineering lifecycle encompasses the entire process of transforming ra
It involves 4 steps:
1. Data Generation: Collecting data from various source systems.
2. Data Storage: Safely storing data for future processing and analysis.
3. Data Ingestion: Transforming and bringing data into a centralized system.
4. Data Data Serving: Providing data to end-users for decision-making and operational purposes.
1. Data Generation: Collecting data from various source systems.
2. Data Storage: Safely storing data for future processing and analysis.
3. Data Ingestion: Transforming and bringing data into a centralized system.
4. Data Data Serving: Providing data to end-users for decision-making and operational purposes.
Visit the following resources to learn more:
- [@article@Data Engineering Lifecycle](hhttps://medium.com/towards-data-engineering/data-engineering-lifecycle-d1e7ee81632e)
- [@video@Getting Into Data Engineering](https://www.youtube.com/watch?v=hZu_87l62J4)
- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)
- [@article@Data Engineering Lifecycle](hhttps://medium.com/towards-data-engineering/data-engineering-lifecycle-d1e7ee81632e)
- [@video@Getting Into Data Engineering](https://www.youtube.com/watch?v=hZu_87l62J4)

View File

@@ -1,6 +1,6 @@
# Data Engineering vs Data Science
Data engineering and data science are distinct but complementary roles within the field of data. Data engineering focuses on building and maintaining the infrastructure for data collection, storage, and processing, essentially creating the systems that make data available for downstream users. On the other hand, data science professionals, like data analysts and data scientists, use that data to extract insights, build predictive models, and ultimately inform decision-making.
Visit the following resources to learn more:

View File

@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@article@What is a data fabric?](http://ibm.com/think/topics/data-fabric)
- [@article@Data Fabric defined](https://www.jamesserra.com/archive/2021/06/data-fabric-defined/)
- [@article@How Data Fabric Can Optimize Data Delivery](https://www.gartner.com/en/data-analytics/topics/data-fabric)

View File

@@ -2,9 +2,9 @@
Data Factory, most commonly referring to Microsoft's Azure Data Factory, is a cloud-based data integration service that allows you to create, schedule, and orchestrate workflows to move and transform data from various sources into a centralized location for analysis. It provides tools for building Extract, Transform, and Load (ETL) pipelines, enabling businesses to prepare data for analytics, business intelligence, and other data-driven initiatives without extensive coding, thanks to its visual, code-free interface and native connectors.
Visit the following resources to learn more:
- [@course@Microsoft Azure - Data Factory](https://www.coursera.org/learn/microsoft-azure---data-factory)
- [@official@What is Azure Data Factory?](https://learn.microsoft.com/en-us/azure/data-factory/introduction)
- [@official@Azure Data Factory Documentation](https://learn.microsoft.com/en-gb/azure/data-factory/)

View File

@@ -1,6 +1,6 @@
# Data Generation
Data generation refers to the different ways data is produced and generated. Thanks to progress in computing power and storage, as well as breakthroughs in sensor technology (for example, IoT devices), the number of these so-called source systems is rapidly growing. Data is created in many ways, both analog and digital.
**Analog data** refers to continuous, real-world information that is represented by a range of values. It can take on any value within a given range and is often used to describe physical quantities like temperature or sounds.
@@ -9,4 +9,4 @@ By contrast, **digital data** is either created by converting analog data to dig
Visit the following resources to learn more:
- [@article@The Concept of Data Generation](https://www.marktechpost.com/2023/02/27/the-concept-of-data-generation/)
- [@video@Analog vs. Digital](https://www.youtube.com/watch?v=zzvglgC5ut0)

View File

@@ -1,10 +1,10 @@
# Data Hub
A **data hub** is an architecture that provides a central point for the flow of data between multiple sources and applications, enabling organizations to collect, integrate, and manage data efficiently. Unlike traditional data storage solutions, a data hub's purpose focuses on data integration and accessibility. The design supports real-time data exchange, which makes accessing, analyzing, and acting on the data faster and easier.
A data hub differs from a data warehouse in that it is generally unintegrated and often at different grains. It differs from an operational data store because a data hub does not need to be limited to operational data. A data hub differs from a data lake by homogenizing data and possibly serving data in multiple desired formats, rather than simply storing it in one place, and by adding other value to the data such as de-duplication, quality, security, and a standardized set of query services.
Visit the following resources to learn more:
- [@article@Data hub](https://en.wikipedia.org/wiki/Data_hub)
- [@article@What is a Data Hub? Definition, 7 Key Benefits & Why You Might Need One](https://www.cdata.com/blog/what-is-a-data-hub)

View File

@@ -5,4 +5,4 @@ Data ingestion is the third step in the data engineering lifecycle. It entails t
Visit the following resources to learn more:
- [@article@What is Data Ingestion?](https://www.ibm.com/think/topics/data-ingestion)
- [@article@Data Ingestion](https://www.qlik.com/us/data-ingestion)

View File

@@ -1,8 +1,8 @@
# Data Interoperability
Data interoperability is the ability of diverse systems and applications to access, exchange, and cooperatively use data in a coordinated and meaningful way, even across organizational boundaries. It ensures that data can flow freely, maintaining its integrity and context, allowing for improved efficiency, collaboration, and decision-making by breaking down data silos. Achieving data interoperability often relies on data standards, metadata, and common data elements to define how data is collected, formatted, and interpreted.
Visit the following resources to learn more:
- [@article@Data Interoperability](https://www.sciencedirect.com/topics/computer-science/data-interoperability)
- [@article@What is Data Interoperability? Exploring the Process and Benefits](https://www.codelessplatforms.com/blog/what-is-data-interoperability/)

View File

@@ -1,8 +1,8 @@
# Data lakes
**Data Lakes** are large-scale data repository systems that store raw, untransformed data, in various formats, from multiple sources. They're often used for big data and real-time analytics requirements. Data lakes preserve the original data format and schema which can be modified as necessary.
Visit the following resources to learn more:
- [@article@Data Lake Definition](https://azure.microsoft.com/en-gb/resources/cloud-computing-dictionary/what-is-a-data-lake)
- [@video@What is a Data Lake?](https://www.youtube.com/watch?v=LxcH6z8TFpI)

View File

@@ -2,7 +2,7 @@
**Data Lineage** refers to the life-cycle of data, including its origins, movements, characteristics, and quality. It's a critical component in data engineering for tracking the journey of data through every process in a pipeline, from raw input to model output. Data lineage helps in maintaining transparency, ensuring compliance, and facilitating data debugging or tracing data-related bugs. It provides a clear representation of data sources, transformations, and dependencies, thereby aiding in audits, governance, or reproduction of machine learning models.
Visit the following resources to learn more:
- [@article@What is Data Lineage? - IBM](https://www.ibm.com/topics/data-lineage)
- [@article@What is Data Lineage? - Datacamp](https://www.datacamp.com/blog/data-lineage)

View File

@@ -2,11 +2,8 @@
A data mart is a subset of a data warehouse, focused on a specific business function or department. A data mart is streamlined for quicker querying and a more straightforward setup, catering to the specialized needs of a particular team or function. Data marts only hold data relevant to a specific department or business unit, enabling quicker access to specific datasets and simpler management.
Visit the following resources to learn more:
- [@article@What is a Data Mart?](https://www.ibm.com/think/topics/data-mart)
- [@article@Data Mart vs Data Warehouse: a Detailed Comparison](https://www.datacamp.com/blog/data-mart-vs-data-warehouse)
- [@video@Data Lake VS Data Warehouse VS Data Marts](https://www.youtube.com/watch?v=w9-WoReNKHk)
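A minimal sketch of the idea: the mart below is just one department's slice of a hypothetical warehouse table, so its queries scan far less data than the full warehouse:

```python
# Hypothetical warehouse rows spanning several departments.
warehouse = [
    {"dept": "sales", "metric": "revenue", "value": 1200},
    {"dept": "sales", "metric": "deals", "value": 8},
    {"dept": "hr", "metric": "hires", "value": 3},
    {"dept": "finance", "metric": "spend", "value": 900},
]

def build_mart(warehouse, dept):
    # A data mart keeps only the slice relevant to one business unit.
    return [row for row in warehouse if row["dept"] == dept]

sales_mart = build_mart(warehouse, "sales")
```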

View File

@@ -1,9 +1,8 @@
# Data Masking
Data masking is a process that creates a copy of real data but replaces sensitive information with false but realistic-looking data, preserving the format and structure of the original data for non-production uses like software testing, training, and development. The goal is to protect confidential information and ensure compliance with data protection regulations by preventing unauthorized access to real sensitive data without compromising the usability of the data for other business functions.
Visit the following resources to learn more:
- [@article@Data masking](https://en.wikipedia.org/wiki/Data_masking)
- [@article@What is data masking?](https://aws.amazon.com/what-is/data-masking/)
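A minimal sketch of format-preserving masking on a hypothetical record; the masking rules here are illustrative, not a production scheme:

```python
import re

def mask_record(record):
    # Replace sensitive values with realistic-looking fakes while
    # preserving format and structure for test/dev environments.
    masked = dict(record)
    masked["name"] = "Jane Doe"
    # Keep the card number's shape (groups of four) but hide all digits.
    masked["card"] = re.sub(r"\d", "X", record["card"])
    # Preserve the email's shape: local part hidden, domain kept.
    local, domain = record["email"].split("@")
    masked["email"] = "x" * len(local) + "@" + domain
    return masked

original = {"name": "Ada Lovelace",
            "card": "4111 1111 1111 1111",
            "email": "ada@example.com"}
masked = mask_record(original)
```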

View File

@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@article@What Is a Data Mesh? - AWS](https://aws.amazon.com/what-is/data-mesh)
- [@article@What Is a Data Mesh? - Datacamp](https://www.datacamp.com/blog/data-mesh)
- [@video@Data Mesh Architecture](https://www.datamesh-architecture.com/)

View File

@@ -2,12 +2,12 @@
A data model is a specification of data structures and business rules. It creates a visual representation of data and illustrates how different data elements are related to each other. Different techniques are employed depending on the complexity of the data and the goals. Below you can find a list with the most common data modelling techniques:
- **Entity-relationship modeling.** It's one of the most common techniques used to represent data. It's based on three elements: entities (objects or things within the system), relationships (how these entities interact with each other), and attributes (properties of the entities).
- **Dimensional modeling.** Dimensional modeling is widely used in data warehousing and analytics, where data is often represented in terms of facts and dimensions. This technique simplifies complex data by organizing it into a star or snowflake schema.
- **Object-oriented modeling.** Object-oriented modeling is used to represent complex systems, where data and the functions that operate on it are encapsulated as objects. This technique is preferred for modeling applications with complex, interrelated data and behaviors.
- **NoSQL modeling.** NoSQL modeling techniques are designed for flexible, schema-less databases. These approaches are often used when data structures are less rigid or evolve over time.
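As a small illustration of dimensional modeling, the sketch below joins a fact table to a dimension table in a star-schema layout; the tables and values are made up:

```python
# Star schema sketch: a fact table referencing dimension tables by key.
dim_product = {1: {"name": "widget", "category": "tools"},
               2: {"name": "gizmo", "category": "toys"}}
dim_date = {20240101: {"month": "Jan"}, 20240201: {"month": "Feb"}}

fact_sales = [
    {"product_id": 1, "date_id": 20240101, "amount": 50},
    {"product_id": 2, "date_id": 20240101, "amount": 30},
    {"product_id": 1, "date_id": 20240201, "amount": 70},
]

# Analytical query: revenue per product category, joining the fact
# table to the product dimension.
by_category = {}
for row in fact_sales:
    category = dim_product[row["product_id"]]["category"]
    by_category[category] = by_category.get(category, 0) + row["amount"]
```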
Visit the following resources to learn more:
- [@article@7 data modeling techniques and concepts for business](https://www.techtarget.com/searchdatamanagement/tip/7-data-modeling-techniques-and-concepts-for-business)
- [@article@Data Modeling Explained: Techniques, Examples, and Best Practices](https://www.datacamp.com/blog/data-modeling)

View File

@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@article@What is Normalization in DBMS (SQL)? 1NF, 2NF, 3NF, BCNF Database with Example](https://www.guru99.com/database-normalization.html)
- [@video@Complete guide to Database Normalization in SQL](https://www.youtube.com/watch?v=rBPQ5fg_kiY)
- [@feed@Explore top posts about Database](https://app.daily.dev/tags/database?ref=roadmapsh)

View File

@@ -1,4 +1,3 @@
# Data Obfuscation
Statistical data obfuscation involves altering the values of sensitive data in a way that preserves the statistical properties and relationships within the data. It ensures that the masked data maintains the overall distribution, patterns, and correlations of the original data for accurate statistical analysis. Statistical data obfuscation techniques include applying mathematical functions or perturbation algorithms to the data.
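A minimal perturbation sketch: zero-mean noise obscures individual values while keeping the mean close to the original (the data and noise scale are illustrative):

```python
import random
from statistics import mean

def perturb(values, scale=100.0, seed=42):
    # Add zero-mean Gaussian noise: each individual value is obscured,
    # but aggregate statistics such as the mean stay close to the
    # originals, preserving the data's usefulness for analysis.
    rng = random.Random(seed)
    return [v + rng.gauss(0, scale) for v in values]

salaries = [52_000, 48_500, 61_000, 57_250, 49_900]
obfuscated = perturb(salaries)
```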

View File

@@ -2,7 +2,7 @@
Data pipelines are a series of automated processes that transport and transform data from various sources to a destination for analysis or storage. They typically involve steps like data extraction, cleaning, transformation, and loading (ETL) into databases, data lakes, or warehouses. Pipelines can handle batch or real-time data, ensuring that large-scale datasets are processed efficiently and consistently. They play a crucial role in ensuring data integrity and enabling businesses to derive insights from raw data for reporting, analytics, or machine learning.
Visit the following resources to learn more:
- [@article@What is a Data Pipeline? - IBM](https://www.ibm.com/topics/data-pipeline)
- [@video@What are Data Pipelines?](https://www.youtube.com/watch?v=oKixNpz6jNo)
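The extract, clean/transform, and load steps described above can be sketched end to end with nothing but the standard library. The CSV input, table name, and column names here are illustrative, not from any real system.

```python
import csv
import io
import sqlite3

RAW = """user_id,amount,currency
1, 9.99 ,usd
2,12.50,USD
1,3.00,usd
"""

def extract(text):
    # Extraction: parse the raw source into records.
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Cleaning/transformation: strip whitespace, cast types,
    # and normalise currency codes to a single convention.
    return [
        (int(r["user_id"]), float(r["amount"].strip()), r["currency"].strip().upper())
        for r in rows
    ]

def load(rows, conn):
    # Loading: persist the cleaned records to the destination store.
    conn.execute("CREATE TABLE orders (user_id INT, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(round(total, 2))  # 25.49
```

A production pipeline would swap the in-memory pieces for real sources and a warehouse, and add scheduling and error handling, but the extract → transform → load shape is the same.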

View File

@@ -1,5 +1,5 @@
# Data Quality
Ensuring quality involves validating the accuracy, completeness, consistency, and reliability of the data collected from each source. The fact that you do it from one source or multiple is almost irrelevant since the only extra task would be to homogenize the final schema of the data, ensuring deduplication and normalization.
This last part typically includes verifying the credibility of each data source, standardizing formats (like date/time or currency), performing schema alignment, and running profiling to detect anomalies, duplicates, or mismatches before integrating the data for analysis.
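Two of the steps above — standardizing formats and detecting duplicates — can be sketched concretely. The records, field names, and date formats below are made up for illustration.

```python
from datetime import datetime

RECORDS = [
    {"id": 1, "email": "a@example.com", "signup": "2024-01-05"},
    {"id": 1, "email": "A@Example.com", "signup": "05/01/2024"},  # duplicate, other format
    {"id": 2, "email": "b@example.com", "signup": "2024-02-10"},
]

def standardise_date(value):
    # Try each known source format; emit one canonical ISO format.
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            pass
    raise ValueError(f"unrecognised date: {value!r}")

def clean(records):
    seen, out = set(), []
    for r in records:
        key = (r["id"], r["email"].lower())  # case-insensitive dedup key
        if key in seen:
            continue  # drop the duplicate record
        seen.add(key)
        out.append({**r, "email": r["email"].lower(),
                    "signup": standardise_date(r["signup"])})
    return out

cleaned = clean(RECORDS)
print(cleaned)  # two records left, dates all ISO-formatted
```

Real pipelines typically push this into a profiling/validation tool, but the pattern — canonical formats plus an explicit dedup key — is the same.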

View File

@@ -1,4 +1,3 @@
# Data Serving
Data serving is the last step in the data engineering process. Once the data is stored in your data architecture and transformed into a coherent and useful format, it's time to get value from it. Data serving refers to the different ways data is used by downstream applications and users to create value. There are many ways companies can extract value from data, including training machine learning models, BI analytics, and reverse ETL.

View File

@@ -1,6 +1,6 @@
# Data Storage
Data storage is the process of saving and preserving digital information on various physical or cloud-based media for future retrieval and use. It encompasses the use of technologies and devices like hard drives and cloud platforms to store data.
Visit the following resources to learn more:

View File

@@ -6,8 +6,7 @@
Visit the following resources to learn more:
- [@video@Data Structures Illustrated](https://www.youtube.com/watch?v=9rhT3P1MDHk&list=PLkZYeFmDuaN2-KUIv-mvbjfKszIGJ4FaY)
- [@video@Intro to Algorithms](https://www.youtube.com/watch?v=rL8X2mlNHPM)
- [@feed@Explore top posts about Algorithms](https://app.daily.dev/tags/algorithms?ref=roadmapsh)

View File

@@ -2,7 +2,7 @@
**Data Warehouses** are data storage systems which are designed for analyzing, reporting and integrating with transactional systems. The data in a warehouse is clean, consistent, and often transformed to meet wide-range of business requirements. Hence, data warehouses provide structured data but require more processing and management compared to data lakes.
Visit the following resources to learn more:
- [@article@What Is a Data Warehouse?](https://www.oracle.com/database/what-is-a-data-warehouse/)
- [@video@What is a Data Warehouse?](https://www.youtube.com/watch?v=k4tK2ttdSDg)

View File

@@ -11,7 +11,7 @@ Visit the following resources to learn more:
- [@article@Oracle: What is a Database?](https://www.oracle.com/database/what-is-database/)
- [@article@Prisma.io: What are Databases?](https://www.prisma.io/dataguide/intro/what-are-databases)
- [@article@Intro To Relational Databases](https://www.udacity.com/course/intro-to-relational-databases--ud197)
- [@article@NoSQL Explained](https://www.mongodb.com/nosql-explained)
- [@video@What is Relational Database](https://youtu.be/OqjJjpjDRLc)
- [@video@How do NoSQL Databases work](https://www.youtube.com/watch?v=0buKQHokLK8)
- [@feed@Explore top posts about Database](https://app.daily.dev/tags/database?ref=roadmapsh)

View File

@@ -1,3 +1,3 @@
# Database
A database is an organized, structured collection of electronic data that is stored, managed, and accessed via a computer system, usually controlled by a Database Management System (DBMS). Databases organize various types of data, such as words, numbers, images, and videos, allowing users to easily retrieve, update, and modify it for various purposes, from managing customer information to analyzing business processes.

View File

@@ -4,8 +4,7 @@ Delta Lake is the optimized storage layer that provides the foundation for table
Visit the following resources to learn more:
- [@book@The Delta Lake Series — Fundamentals and Performance](https://www.databricks.com/resources/ebook/the-delta-lake-series-fundamentals-performance)
- [@official@What is Delta Lake in Databricks?](https://docs.databricks.com/aws/en/delta)
- [@article@Delta Table in Databricks: A Complete Guide](https://www.datacamp.com/tutorial/delta-table-in-databricks)
- [@video@Delta Lake](https://www.databricks.com/resources/demos/videos/lakehouse-platform/delta-lake)

View File

@@ -5,4 +5,4 @@ Datadog is a monitoring and analytics platform for large-scale applications. It
Visit the following resources to learn more:
- [@official@Datadog](https://www.datadoghq.com/)
- [@official@Datadog Documentation](https://docs.datadoghq.com/)

View File

@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@official@Dataflow](https://cloud.google.com/products/dataflow)
- [@article@Dataflow](https://en.wikipedia.org/wiki/Google_Cloud_Dataflow)
- [@video@What is Google Dataflow](https://www.youtube.com/watch?v=KalJ0VuEM7s)

View File

@@ -4,6 +4,6 @@ dbt, also known as the data build tool, is designed to simplify the management o
Visit the following resources to learn more:
- [@official@dbt](https://www.getdbt.com/product/what-is-dbt)
- [@official@dbt Documentation](https://docs.getdbt.com/docs/build/documentation)
- [@course@dbt Official Courses](https://learn.getdbt.com/catalog)

View File

@@ -1,15 +1,12 @@
# Declarative vs Imperative
When it comes to Infrastructure as Code (IaC), there are two fundamental styles: imperative and declarative.
In **imperative IaC**, you specify a list of steps the IaC tool should follow to provision a new resource. You tell your IaC tool how to create each environment using a sequence of command imperatives. Imperative IaC can offer more flexibility as it allows you to dictate each step. However, this can result in increased complexity. Popular imperative IaC tools are Chef and Puppet.
In **declarative IaC**, you specify the name and properties of the infrastructure resources you wish to provision, and then the IaC tool figures out how to achieve that end result on its own. You declare to your IaC tool what you want, but not how to get there. Declarative IaC, while less flexible, tends to be simpler and more manageable. Terraform is the most popular declarative IaC tool.
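The contrast can be made concrete with a toy model (no real IaC tool here, just illustrative resource names): the imperative function scripts each step, while the declarative approach states a desired end state and a reconcile loop figures out what to create or destroy.

```python
# Imperative style: you script every step yourself.
def imperative_provision(cloud):
    cloud.append("network")
    cloud.append("database")
    cloud.append("web-server")

# Declarative style: you state the desired end state;
# the tool reconciles the real environment toward it.
DESIRED = {"network", "database", "web-server"}

def reconcile(cloud, desired):
    current = set(cloud)
    for resource in desired - current:   # create what's missing
        cloud.append(resource)
    for resource in current - desired:   # destroy what shouldn't exist
        cloud.remove(resource)

env = ["network", "old-vm"]  # an environment that has drifted
reconcile(env, DESIRED)
print(sorted(env))  # ['database', 'network', 'web-server']
```

Note the reconcile loop is idempotent: running it again changes nothing, which is exactly the property that makes declarative tools like Terraform easy to re-apply safely.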
Visit the following resources to learn more:
- [@article@Infrastructure as Code: From Imperative to Declarative and Back Again](https://thenewstack.io/infrastructure-as-code-from-imperative-to-declarative-and-back-again/)
- [@article@Declarative vs Imperative Programming for Infrastructure as Code (IaC)](https://www.copado.com/resources/blog/declarative-vs-imperative-programming-for-infrastructure-as-code-iac)

View File

@@ -1,7 +1,7 @@
# Distributed File Systems
A Distributed File System (DFS) allows multiple computers to access and share files across a network as if they were stored on a single local machine. It distributes data across multiple servers, enhancing accessibility and data redundancy. This enables users to access files from various locations and devices, promoting collaboration and data availability.
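The "distribute data across multiple servers with redundancy" idea reduces to a placement function: hash the file path to pick which nodes hold its replicas. This is a simplified sketch with invented node names, not the placement algorithm of any particular DFS.

```python
import hashlib

SERVERS = ["node-a", "node-b", "node-c", "node-d"]
REPLICAS = 2  # keep each file on two nodes for redundancy

def placement(path, servers=SERVERS, replicas=REPLICAS):
    """Deterministically map a file path to the nodes that store it."""
    digest = int(hashlib.md5(path.encode()).hexdigest(), 16)
    start = digest % len(servers)
    # Take `replicas` consecutive nodes so copies land on distinct servers.
    return [servers[(start + i) % len(servers)] for i in range(replicas)]

nodes = placement("/home/alice/report.pdf")
print(nodes)  # any client computing this gets the same answer
```

Because placement is a pure function of the path, every client can locate a file's replicas without a round trip to a central directory — the core trick behind making many machines look like one file system.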
Visit the following resources to learn more:
- [@article@What is a Distributed File System (DFS)? A Complete Guide](http://starwindsoftware.com/blog/what-is-a-distributed-file-system-dfs-a-complete-guide/)

View File

@@ -4,6 +4,6 @@ A distributed system is a collection of independent computers that communicate a
Visit the following resources to learn more:
- [@video@Quick overview](https://www.youtube.com/watch?v=IJWwfMyPu1c)
- [@article@Introduction to Distributed Systems](https://www.freecodecamp.org/news/a-thorough-introduction-to-distributed-systems-3b91562c9b3c/)
- [@article@Distributed Systems Guide](https://www.baeldung.com/cs/distributed-systems-guide)

View File

@@ -8,4 +8,4 @@ Visit the following resources to learn more:
- [@official@Docker Documentation](https://docs.docker.com/)
- [@video@Docker Tutorial](https://www.youtube.com/watch?v=RqTEHSBrYFw)
- [@video@Docker simplified in 55 seconds](https://youtu.be/vP_4DlOH1G4)
- [@feed@Explore top posts about Docker](https://app.daily.dev/tags/docker?ref=roadmapsh)

View File

@@ -1,6 +1,6 @@
# Document
**Document Databases** are a type of NoSQL database that stores data in JSON, BSON, or XML formats, allowing for flexible, semi-structured and hierarchical data structures. These databases are characterized by their dynamic schema, scalability through distribution, and ability to intuitively map data models to application code. Popular examples include MongoDB, which allows for easy storage and retrieval of varied data types without requiring a rigid, predefined schema.
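A minimal in-memory sketch shows the two defining properties — schema-free documents and field-based queries. This toy class is illustrative only and mimics no real database's API.

```python
import json

class DocStore:
    """Toy in-memory document store: no fixed schema per document."""

    def __init__(self):
        self._docs = []

    def insert(self, doc):
        # Round-trip through JSON to keep only JSON-representable data
        # and to store an independent copy of the caller's dict.
        self._docs.append(json.loads(json.dumps(doc)))

    def find(self, **criteria):
        # Match documents whose fields equal all given criteria.
        return [d for d in self._docs
                if all(d.get(k) == v for k, v in criteria.items())]

db = DocStore()
db.insert({"name": "Ada", "role": "engineer", "languages": ["python", "c"]})
db.insert({"name": "Grace", "role": "engineer"})  # different shape: fine
print(len(db.find(role="engineer")))  # 2
```

Note the two documents have different fields yet live in the same collection — the "dynamic schema" the paragraph describes, in contrast to a relational table's fixed columns.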
Visit the following resources to learn more:

View File

@@ -4,4 +4,4 @@ Amazon DynamoDB is a fully managed NoSQL database solution that provides fast an
Visit the following resources to learn more:
- [@official@Amazon DynamoDB](https://aws.amazon.com/dynamodb/)

View File

@@ -2,9 +2,8 @@
The California Consumer Privacy Act (CCPA) is a California state law enacted in 2020 that protects and enforces the rights of Californians regarding the privacy of consumers' personal information (PI).
Visit the following resources to learn more:
- [@official@California Consumer Privacy Act (CCPA)](https://oag.ca.gov/privacy/ccpa)
- [@article@What is the California Consumer Privacy Act (CCPA)?](https://www.ibm.com/think/topics/ccpa-compliance)
- [@video@What is the California Consumer Privacy Act? | CCPA Explained?](https://www.youtube.com/watch?v=dpzsAgrDAO4)

View File

@@ -7,4 +7,4 @@ Visit the following resources to learn more:
- [@official@Elasticsearch Website](https://www.elastic.co/elasticsearch/)
- [@official@Elasticsearch Documentation](https://www.elastic.co/guide/index.html)
- [@video@What is Elasticsearch](https://www.youtube.com/watch?v=ZP0NmfyfsoM)
- [@feed@Explore top posts about ELK](https://app.daily.dev/tags/elk?ref=roadmapsh)

View File

@@ -1,6 +1,6 @@
# Environmental Management
Environmental management, or Environment as Code (EaC), takes the concept of Infrastructure as Code (IaC) one step further. EaC applies DevOps principles to manage and automate entire software environments—including infrastructure, applications, and configurations—using code, making them reproducible, versionable, and reliable. It extends IaC by focusing not just on the underlying servers and networks but on the complete, connected system of services and applications that run on top of it. This approach helps increase efficiency, speeds up deployments, and provides a consistent, auditable process for creating and managing development, testing, and production environments.
Visit the following resources to learn more:

View File

@@ -1,6 +1,6 @@
# ETL vs Reverse ETL
ETL (Extract, Transform, Load) is a key process in data warehousing, enabling the integration of data from multiple sources into a centralized database.
Reverse ETL emerged as organizations recognized that their carefully curated data warehouses, while excellent for analysis, created a new form of data silo that prevented operational teams from accessing valuable insights. This methodology addresses the critical gap between analytical insights and operational execution by systematically moving processed data from centralized repositories back to the operational systems where business teams interact with customers and manage daily operations.
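The direction of flow is the whole point: ETL moves data *into* the warehouse, reverse ETL moves curated results *back out* to operational tools. A toy sketch (the "warehouse" rows, "CRM" dict, and field names are all invented for illustration):

```python
# Warehouse side: an aggregate analysts already computed.
warehouse = [
    {"user_id": 1, "lifetime_value": 1200.0},
    {"user_id": 2, "lifetime_value": 80.0},
]

# Operational side: the CRM records support and sales actually work in.
crm = {
    1: {"email": "a@example.com"},
    2: {"email": "b@example.com"},
}

def reverse_etl(rows, crm, field="lifetime_value"):
    """Push a warehouse-computed field back onto operational records."""
    synced = 0
    for row in rows:
        record = crm.get(row["user_id"])
        if record is not None:  # skip rows with no matching CRM record
            record[field] = row[field]
            synced += 1
    return synced

print(reverse_etl(warehouse, crm))  # 2
```

After the sync, the insight lives where operational teams can act on it (the CRM record now carries `lifetime_value`), which is exactly the silo-breaking described above.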
@@ -8,4 +8,4 @@ Visit the following resources to learn more:
- [@article@What is ETL?](https://www.snowflake.com/guides/what-etl)
- [@article@ETL vs Reverse ETL vs Data Activation](https://airbyte.com/data-engineering-resources/etl-vs-reverse-etl-vs-data-activation)
- [@article@ETL vs Reverse ETL: An Overview, Key Differences, & Use Cases](https://portable.io/learn/etl-vs-reverse-etl)

View File

@@ -1,12 +1,12 @@
# EU AI Act
The Artificial Intelligence Act of the European Union, also known as the EU AI Act, is a comprehensive regulatory framework established to ensure safety and that fundamental human rights are upheld in the use of AI technologies. It governs the development and/or use of AI in the European Union. The act takes a risk-based approach to regulation, applying different rules to AI systems according to the risk they pose.
Considered the world's first comprehensive regulatory framework for AI, the EU AI Act prohibits some AI uses outright and implements strict governance, risk management and transparency requirements for others.
Visit the following resources to learn more:
- [@official@The EU AI Act Explorer](https://artificialintelligenceact.eu/ai-act-explorer/)
- [@article@AI Act - European Commission](https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai)
- [@article@Artificial Intelligence Act](https://en.wikipedia.org/wiki/Artificial_Intelligence_Act)
- [@video@The EU AI Act Explained](https://www.youtube.com/watch?v=s_rxOnCt3HQ)

Some files were not shown because too many files have changed in this diff.