
[Part 11] Backing Up and Restoring Vector Data (Preventing and Managing Data Loss)

ygtoken 2025. 3. 7. 16:03

📌 Overview

 

This post covers how to reliably back up and restore vector data stored in PostgreSQL with pgvector.

In particular, it walks through practical strategies for backup automation, preventing data loss, and automated backups with AWS S3 or a Kubernetes CronJob.

 

✅ Backup strategies that protect vector data (pg_dump, WAL, PITR)

✅ Automated backups with AWS S3 or a Kubernetes CronJob

✅ Fast restores without losing vector data

 


🚀 1. Backup Strategies for Vector Data in PostgreSQL

 

Vector data is stored in large volumes, and rebuilding it after a loss means re-running the embedding model over the whole corpus, so a robust backup strategy is essential.

 

✅ Main ways to back up vector data in PostgreSQL

| Backup method | Description | Recommended use case |
|---|---|---|
| pg_dump | Exports the database to a file (a logical snapshot) | Small datasets, periodic backups |
| WAL (Write-Ahead Logging) archiving | Continuously archives the change log | Continuous data protection |
| PITR (Point-in-Time Recovery) | Restores data to a specific point in time (base backup + archived WAL) | Recovering from a failure or a bad change |
| Automated backup to S3 or GCS | Remote backup to cloud storage | Long-term retention and disaster recovery |

 

 


🚀 2. Backing Up Vector Data with pg_dump

 

1๏ธโƒฃ pg_dump๋กœ ์ „์ฒด ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ๋ฐฑ์—…

 

pg_dump is the most basic way to back up PostgreSQL. It exports a database to a file as a consistent snapshot and can run while the database is in use without blocking other readers or writers.

 

📌 Full database backup (plain SQL, .sql file)

pg_dump -U postgres -h localhost -d ragdb > ragdb_backup.sql

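📌 Compressed backup (custom format, restorable with pg_restore)

pg_dump -U postgres -h localhost -d ragdb -F c > ragdb_backup.dump

The tar format (-F t) can also be restored with pg_restore, but it is not compressed. The custom format (-F c) is compressed by default.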
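For scheduled runs it helps to wrap pg_dump so that a failed dump never leaves a half-written file that looks like a good backup. The sketch below is one way to do it. It assumes the same local connection as above. The function name backup_ragdb and the .partial suffix are hypothetical choices, not part of pg_dump. It uses the custom format (-F c), which pg_dump compresses by default and pg_restore can read.

```shell
#!/usr/bin/env bash
# Hedged sketch: a pg_dump wrapper for scheduled runs.
# backup_ragdb is a hypothetical helper; connection options mirror the commands above.
backup_ragdb() (
  # The body runs in a subshell, so these variables and the trap stay local to this call.
  db="${1:?database name required}"
  dir="${2:?backup directory required}"
  out="${dir}/${db}_$(date +%Y%m%d_%H%M%S).dump"
  tmp="${out}.partial"
  # Remove the partial file on every exit path (after a successful mv it no longer exists).
  trap 'rm -f "$tmp"' EXIT

  # Custom format (-F c): compressed by default, restorable with pg_restore.
  pg_dump -U postgres -h localhost -d "$db" -F c > "$tmp" || exit 1
  # --list only reads the archive's table of contents; it fails on an empty file
  # or on a file that is not a pg_dump archive.
  pg_restore --list "$tmp" > /dev/null || exit 1
  # Publish the backup under its final name only after both steps succeeded.
  mv "$tmp" "$out" || exit 1
  echo "$out"
)
```

Calling backup_ragdb ragdb /backup prints the path of the new file. A non-zero exit status means no backup file was created.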

 

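✅ Adding a cron job for automatic backups

(crontab -l 2>/dev/null; echo '0 2 * * * pg_dump -U postgres -h localhost -d ragdb > /backup/ragdb_$(date +\%Y\%m\%d).sql') | crontab -

This runs a backup every day at 2 a.m. and writes it to a date-stamped file. The entry is built from three deliberate choices:

- The single quotes stop $(date ...) from being expanded when the entry is installed. Without them, every run would write to the same file.
- crontab -l keeps the existing entries instead of replacing the whole crontab.
- % must be escaped as \% inside a crontab line.

Cron runs non-interactively, so pg_dump needs its password from ~/.pgpass (or another non-interactive method) for the user who owns the crontab.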
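Daily dumps accumulate, so the backup directory eventually fills up. Below is a minimal pruning sketch. It assumes the ragdb_*.sql naming from the cron entry above and an arbitrary 7-day retention. prune_backups is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hedged sketch: prune old daily dumps so the backup directory does not fill up.
# prune_backups is a hypothetical helper; the name pattern matches the cron entry above.
prune_backups() {
  local dir="${1:?backup directory required}"
  local keep_days="${2:-7}"
  # -mtime +N matches files whose age, rounded down to whole days, is greater than N.
  # -print lists each removed file so cron mail or logs show what was deleted.
  find "$dir" -maxdepth 1 -type f -name 'ragdb_*.sql' -mtime +"$keep_days" -print -delete
}
```

It could run from a second crontab entry scheduled shortly after the dump, for example via a small script that calls prune_backups /backup 7.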

 


2๏ธโƒฃ ํŠน์ • ํ…Œ์ด๋ธ”(๋ฒกํ„ฐ ๋ฐ์ดํ„ฐ)๋งŒ ๋ฐฑ์—…

 

To back up only the vector data (the embeddings table):

pg_dump -U postgres -h localhost -d ragdb -t embeddings > embeddings_backup.sql

 

📌 Restoring only the vector data

psql -U postgres -h localhost -d ragdb < embeddings_backup.sql

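✅ A table-only dump is smaller and faster than a full-database dump, so the embeddings table can be backed up more often when the vector data changes frequently.

Restoring it into a database where the table already exists fails on CREATE TABLE. Because psql continues after errors by default, the COPY may still load duplicate rows, or it may fail on a primary key. If the restore should drop and recreate the table instead, create the dump with --clean --if-exists.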
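A table-only dump does not include the CREATE EXTENSION statement for pgvector, because pg_dump -t does not dump objects the table depends on. Restoring it into a fresh database therefore fails on the vector column type. The sketch below creates the extension first, then makes psql stop at the first error inside a single transaction. It assumes the same connection options as above. restore_embeddings is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hedged sketch: restore the embeddings-only dump.
# restore_embeddings is a hypothetical helper; connection options mirror the commands above.
restore_embeddings() {
  local db="${1:?database name required}"
  local file="${2:?dump file required}"
  # A -t dump has no CREATE EXTENSION, so make sure the vector type exists first.
  psql -U postgres -h localhost -d "$db" -v ON_ERROR_STOP=1 \
       -c 'CREATE EXTENSION IF NOT EXISTS vector;' || return 1
  # ON_ERROR_STOP aborts on the first error; --single-transaction makes the
  # restore all-or-nothing instead of leaving a half-loaded table.
  psql -U postgres -h localhost -d "$db" -v ON_ERROR_STOP=1 \
       --single-transaction -f "$file"
}
```

Example: restore_embeddings ragdb embeddings_backup.sql.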

 


🚀 3. WAL (Write-Ahead Logging) and PITR (Point-in-Time Recovery)

 

pg_dump works well for periodic backups, but each dump is a snapshot of a single moment. Anything written after the last dump is lost if the database fails.

To close that gap, use WAL (Write-Ahead Logging) archiving together with PITR.

 

1๏ธโƒฃ WAL ํ™œ์„ฑํ™”

 

📌 WAL settings in postgresql.conf

wal_level = replica
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/archive/%f && cp %p /var/lib/postgresql/archive/%f'

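✅ With these settings, PostgreSQL copies each completed WAL segment into the archive directory. Every change made after a base backup can then be replayed during recovery.

Changing wal_level or archive_mode requires a server restart. The archive directory must exist and be writable by the postgres user, and ideally it should sit on a different disk or host than the data directory.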
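The archive_command can also call a small script. In this sketch, the helper name archive_wal and the script path are hypothetical. The script refuses to overwrite an existing archive file, as the PostgreSQL documentation advises. It also copies under a temporary name and then renames, so an interrupted copy never sits in the archive under a real WAL file name. It does not fsync the copy, so treat it as a starting point rather than a production archiver.

```shell
#!/usr/bin/env bash
# Hedged sketch of an archive_command helper (archive_wal is a hypothetical name).
# Intended call from postgresql.conf (hypothetical script path):
#   archive_command = '/usr/local/bin/archive_wal.sh %p /var/lib/postgresql/archive %f'
archive_wal() {
  local src="${1:?%p (path of the WAL file) required}"
  local dest_dir="${2:?archive directory required}"
  local name="${3:?%f (WAL file name) required}"
  local dest="${dest_dir}/${name}"

  # Never overwrite an existing archive file; a non-zero exit makes PostgreSQL retry later.
  if [ -e "$dest" ]; then
    echo "archive_wal: $dest already exists" >&2
    return 1
  fi
  # Copy under a temporary name, then rename, so a crash mid-copy cannot
  # leave a truncated segment under the final name. (No fsync here.)
  cp "$src" "${dest}.tmp" && mv "${dest}.tmp" "$dest"
}
```

To use it, save the function in an executable script that ends with archive_wal "$@", and point archive_command at that script. PostgreSQL treats exit status 0 as "archived" and retries the segment on any other status.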

 


2๏ธโƒฃ PITR(ํฌ์ธํŠธ ์ธ ํƒ€์ž„ ๋ณต๊ตฌ)

 

PITR restores the database to a chosen moment, for example just before an accidental DELETE or a faulty bulk re-embedding job. This makes it very useful for incident recovery in practice.

 

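📌 Taking a base backup for PITR

pg_basebackup -U postgres -h localhost -D /var/lib/postgresql/basebackup -Ft -z -P

This writes a compressed tar-format base backup: base.tar.gz, plus pg_wal.tar.gz containing the WAL needed to make it consistent. The connecting role must be a superuser or have the REPLICATION attribute. PITR replays archived WAL on top of a base backup, so take base backups regularly to keep recovery times short.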

 

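📌 Restoring to a specific point in time

pg_restore cannot do this. It only loads pg_dump archives and has no option for a recovery time. PITR (PostgreSQL 12 and later) works on the whole cluster instead:

1. Stop the server and restore the base backup into an empty data directory.
2. Add the recovery settings to postgresql.conf:
   restore_command = 'cp /var/lib/postgresql/archive/%f %p'
   recovery_target_time = '2024-03-01 10:30:00'
3. Create an empty recovery.signal file in the data directory and start the server. It replays the archived WAL up to the target time.

✅ PITR rolls the entire cluster back to a specific moment; it cannot restore a single table. To recover only the embeddings table, run PITR on a separate instance and copy the table back with pg_dump -t embeddings.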
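Restoring to a point in time comes down to preparing a data directory from the base backup and telling PostgreSQL where to find archived WAL and when to stop. The sketch below scripts that preparation and rests on several assumptions:

- The server is stopped.
- The base backup was produced by pg_basebackup -Ft -z (base.tar.gz and pg_wal.tar.gz) and has no extra tablespaces.
- The archive directory is the one used in archive_command.
- postgresql.conf is stored inside the data directory.

prepare_pitr is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hedged sketch: prepare an empty data directory for PITR (PostgreSQL 12+).
# Assumes postgresql.conf lives in the data directory (not the case for
# Debian/Ubuntu packages, which keep it under /etc/postgresql).
prepare_pitr() {
  local backup_dir="${1:?base backup directory required}"
  local pgdata="${2:?target data directory required}"
  local archive_dir="${3:?WAL archive directory required}"
  local target_time="${4:?recovery target time required}"

  # Refuse to touch a data directory that already contains files.
  if [ -n "$(ls -A "$pgdata" 2>/dev/null)" ]; then
    echo "prepare_pitr: $pgdata is not empty" >&2
    return 1
  fi
  mkdir -p "$pgdata/pg_wal" || return 1

  # Unpack the base backup, then the WAL that was streamed alongside it.
  tar -xzf "$backup_dir/base.tar.gz" -C "$pgdata" || return 1
  tar -xzf "$backup_dir/pg_wal.tar.gz" -C "$pgdata/pg_wal" || return 1
  # PostgreSQL requires restrictive permissions (0700 or 0750) on the data directory.
  chmod 700 "$pgdata" || return 1

  # The last occurrence of a setting in postgresql.conf wins, so appending is enough.
  # 'promote' ends recovery at the target; the default 'pause' lets you inspect data first.
  printf '%s\n' \
    "restore_command = 'cp $archive_dir/%f %p'" \
    "recovery_target_time = '$target_time'" \
    "recovery_target_action = 'promote'" >> "$pgdata/postgresql.conf" || return 1

  # recovery.signal makes the server start in targeted recovery mode.
  touch "$pgdata/recovery.signal"
}
```

After it runs, start the server (for example with pg_ctl -D <data directory> start). The server replays archived WAL up to the target and then opens for writes. recovery_target_time is a timestamp with time zone, so an explicit offset such as '2024-03-01 10:30:00+09' avoids depending on the server's time zone setting.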

 


🚀 4. Automated Backups with AWS S3 or a Kubernetes CronJob

 

1๏ธโƒฃ AWS S3์— ์ž๋™ ๋ฐฑ์—…

 

Copying backups to cloud storage (S3) keeps them off the database host, so they survive the loss of the server or its disk.

 

📌 Uploading the backup with the AWS CLI

set -o pipefail  # bash: make the pipeline fail when pg_dump fails, not only when gzip does
pg_dump -U postgres -h localhost -d ragdb | gzip > ragdb_backup.sql.gz &&
  aws s3 cp ragdb_backup.sql.gz s3://my-postgres-backups/

✅ The backup is now stored in S3, separate from the database server.
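If every run uploads to the same key, each upload overwrites the previous backup. The sketch below gives each run its own object. It uploads only after pg_dump and gzip both succeed, and it always removes the local temporary file. It assumes the bucket from the example above, AWS CLI credentials that are already configured, and a hypothetical key layout of <db>/<db>_<UTC timestamp>.sql.gz. backup_to_s3 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hedged sketch: dump, compress, and upload under a date-stamped S3 key.
# backup_to_s3 and the key layout are hypothetical choices.
backup_to_s3() (
  set -o pipefail                    # a failing pg_dump must fail the pipeline, not just gzip
  bucket="${1:?bucket name required}"
  db="${2:-ragdb}"
  ts="$(date -u +%Y%m%dT%H%M%SZ)"    # UTC timestamp: sortable and unique per run
  file="$(mktemp)" || exit 1
  trap 'rm -f "$file"' EXIT          # always remove the local copy

  pg_dump -U postgres -h localhost -d "$db" | gzip > "$file" || exit 1
  aws s3 cp "$file" "s3://${bucket}/${db}/${db}_${ts}.sql.gz" || exit 1
)
```

Example: backup_to_s3 my-postgres-backups ragdb. An S3 lifecycle rule on the ragdb/ prefix can then expire old objects.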

 


2๏ธโƒฃ Kubernetes CronJob์„ ํ™œ์šฉํ•œ ์ž๋™ ๋ฐฑ์—…

 

In a Kubernetes environment, a CronJob can run the backup on a schedule.

 

๐Ÿ“Œ Kubernetes backup-cronjob.yaml

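apiVersion: batch/v1
kind: CronJob
metadata:
  name: pgvector-backup
  namespace: database
spec:
  schedule: "0 3 * * *"  # every day at 03:00 in the controller's time zone (usually UTC) unless spec.timeZone is set
  concurrencyPolicy: Forbid  # skip a run if the previous backup is still in progress
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: pgvector-backup
            image: postgres:14  # pg_dump's major version must be the same as or newer than the server's
            env:
            - name: PGPASSWORD  # pg_dump cannot prompt for a password inside a Job
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials  # hypothetical Secret name and key; adjust to your cluster
                  key: password
            command:
            - "/bin/bash"
            - "-c"
            # pipefail makes the Job fail when pg_dump fails, not only when gzip does;
            # the date-stamped name keeps one file per day instead of overwriting.
            - |
              set -euo pipefail
              f=/backup/ragdb_$(date +%Y%m%d).sql.gz
              pg_dump -U postgres -h postgresql.database.svc.cluster.local -d ragdb | gzip > "$f.tmp"
              mv "$f.tmp" "$f"
            volumeMounts:
            - name: backup-volume
              mountPath: /backup
          restartPolicy: OnFailure
          volumes:
          - name: backup-volume
            persistentVolumeClaim:
              claimName: backup-pvc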

 

📌 Applying the CronJob in Kubernetes

kubectl apply -f backup-cronjob.yaml -n database

✅ Kubernetes now runs the PostgreSQL backup automatically on the configured schedule.
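You don't have to wait for 03:00 to find out whether the CronJob works. kubectl create job --from=cronjob/... starts a one-off Job from its template right away. In the sketch below, the function name run_backup_now, the Job name suffix, and the 15-minute timeout are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hedged sketch: trigger the pgvector-backup CronJob once and wait for the result.
# run_backup_now and the 15m timeout are hypothetical choices.
run_backup_now() {
  local ns="${1:-database}"
  local job="pgvector-backup-manual-$(date +%s)"
  # Create a Job from the CronJob's jobTemplate.
  kubectl create job "$job" --from=cronjob/pgvector-backup -n "$ns" || return 1
  # Block until the Job completes, or give up after 15 minutes.
  kubectl wait --for=condition=complete "job/$job" -n "$ns" --timeout=15m || {
    kubectl logs "job/$job" -n "$ns" >&2   # show pg_dump's output on failure
    return 1
  }
  echo "$job"
}
```

If the Job does not complete in time, its logs (usually pg_dump's error message) are printed to stderr.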

 


📌 5. Summary

 

✅ Backup strategies that protect vector data (pg_dump, WAL, PITR)

✅ Automated backups with AWS S3 or a Kubernetes CronJob

✅ Fast restores without losing vector data (psql or pg_restore for dumps, PITR for point-in-time recovery)

 

 

 
