The goal of this backup scheme is to keep copies of the folders shared in Google Drive, so that we are covered in case of a ransomware attack. It is well known that ransomware can encrypt the contents of Google Drive folders when the victim uses the Drive client. For that reason, it is advisable to keep backups outside of Google Drive as an additional layer of protection.

Installing rclone

Google Drive does not support downloading files and folders shared by another user. Rclone is used instead: a command-line cloud file manager that supports a large number of cloud storage providers.

Install the rclone command-line client, used to download files from cloud storage:

root@debian:~# apt-get install rclone

Configuring a remote in rclone is interactive. Go through each of the following steps.

Configure rclone:

root@debian:~# rclone config
2021/01/05 15:54:50 NOTICE: Config file "/root/.config/rclone/rclone.conf" not found - using defaults
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> sistemas
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / A stackable unification remote, which can appear to merge the contents of several remotes
   \ "union"
 2 / Alias for a existing remote
   \ "alias"
 3 / Amazon Drive
   \ "amazon cloud drive"
 4 / Amazon S3 Compliant Storage Providers (AWS, Ceph, Dreamhost, IBM COS, Minio)
   \ "s3"
 5 / Backblaze B2
   \ "b2"
 6 / Box
   \ "box"
 7 / Cache a remote
   \ "cache"
 8 / Dropbox
   \ "dropbox"
 9 / Encrypt/Decrypt a remote
   \ "crypt"
10 / FTP Connection
   \ "ftp"
11 / Google Cloud Storage (this is not Google Drive)
   \ "google cloud storage"
12 / Google Drive
   \ "drive"
13 / Hubic
   \ "hubic"
14 / JottaCloud
   \ "jottacloud"
15 / Local Disk
   \ "local"
16 / Microsoft Azure Blob Storage
   \ "azureblob"
17 / Microsoft OneDrive
   \ "onedrive"
18 / OpenDrive
   \ "opendrive"
19 / Openstack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
   \ "swift"
20 / Pcloud
   \ "pcloud"
21 / SSH/SFTP Connection
   \ "sftp"
22 / Webdav
   \ "webdav"
23 / Yandex Disk
   \ "yandex"
24 / http Connection
   \ "http"
Storage> 12
** See help for drive backend at: https://rclone.org/drive/ **

Google Application Client Id
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_id> 
Google Application Client Secret
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_secret> 
Scope that rclone should use when requesting access from drive.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Full access all files, excluding Application Data Folder.
   \ "drive"
 2 / Read-only access to file metadata and file contents.
   \ "drive.readonly"
   / Access to files created by rclone only.
 3 | These are visible in the drive website.
   | File authorization is revoked when the user deauthorizes the app.
   \ "drive.file"
   / Allows read and write access to the Application Data folder.
 4 | This is not visible in the drive website.
   \ "drive.appfolder"
   / Allows read-only access to file metadata but
 5 | does not allow any access to read or download file content.
   \ "drive.metadata.readonly"
scope> 2
ID of the root folder
Leave blank normally.
Fill in to access "Computers" folders. (see docs).
Enter a string value. Press Enter for the default ("").
root_folder_id> 
Service Account Credentials JSON file path 
Leave blank normally.
Needed only if you want use SA instead of interactive login.
Enter a string value. Press Enter for the default ("").
service_account_file> 
Edit advanced config? (y/n)
y) Yes
n) No
y/n> y
Service Account Credentials JSON blob
Leave blank normally.
Needed only if you want use SA instead of interactive login.
Enter a string value. Press Enter for the default ("").
service_account_credentials> 
ID of the Team Drive
Enter a string value. Press Enter for the default ("").
team_drive> 
Only consider files owned by the authenticated user.
Enter a boolean value (true or false). Press Enter for the default ("false").
auth_owner_only> false
Send files to the trash instead of deleting permanently.
Defaults to true, namely sending files to the trash.
Use `--drive-use-trash=false` to delete files permanently instead.
Enter a boolean value (true or false). Press Enter for the default ("true").
use_trash> 
Skip google documents in all listings.
If given, gdocs practically become invisible to rclone.
Enter a boolean value (true or false). Press Enter for the default ("false").
skip_gdocs> 
Only show files that are shared with me.

Instructs rclone to operate on your "Shared with me" folder (where
Google Drive lets you access the files and folders others have shared
with you).

This works both with the "list" (lsd, lsl, etc) and the "copy"
commands (copy, sync, etc), and with all other commands too.
Enter a boolean value (true or false). Press Enter for the default ("false").
shared_with_me> true
Only show files that are in the trash.
This will show trashed files in their original directory structure.
Enter a boolean value (true or false). Press Enter for the default ("false").
trashed_only> 
Deprecated: see export_formats
Enter a string value. Press Enter for the default ("").
formats> 
Comma separated list of preferred formats for downloading Google docs.
Enter a string value. Press Enter for the default ("docx,xlsx,pptx,svg").
export_formats> 
Comma separated list of preferred formats for uploading Google docs.
Enter a string value. Press Enter for the default ("").
import_formats> 
Allow the filetype to change when uploading Google docs (e.g. file.doc to file.docx). This will confuse sync and reupload every time.
Enter a boolean value (true or false). Press Enter for the default ("false").
allow_import_name_change> 
Use file created date instead of modified date.,

Useful when downloading data and you want the creation date used in
place of the last modified date.

**WARNING**: This flag may have some unexpected consequences.

When uploading to your drive all files will be overwritten unless they
haven't been modified since their creation. And the inverse will occur
while downloading.  This side effect can be avoided by using the
"--checksum" flag.

This feature was implemented to retain photos capture date as recorded
by google photos. You will first need to check the "Create a Google
Photos folder" option in your google drive settings. You can then copy
or move the photos locally and use the date the image was taken
(created) set as the modification date.
Enter a boolean value (true or false). Press Enter for the default ("false").
use_created_date> 
Size of listing chunk 100-1000. 0 to disable.
Enter a signed integer. Press Enter for the default ("1000").
list_chunk> 
Impersonate this user when using a service account.
Enter a string value. Press Enter for the default ("").
impersonate> 
Use alternate export URLs for google documents export.,

If this option is set this instructs rclone to use an alternate set of
export URLs for drive documents.  Users have reported that the
official export URLs can't export large documents, whereas these
unofficial ones can.

See rclone issue [#2243](https://github.com/ncw/rclone/issues/2243) for background,
[this google drive issue](https://issuetracker.google.com/issues/36761333) and
[this helpful post](https://www.labnol.org/internet/direct-links-for-google-drive/28356/).
Enter a boolean value (true or false). Press Enter for the default ("false").
alternate_export> 
Cutoff for switching to chunked upload
Enter a size with suffix k,M,G,T. Press Enter for the default ("8M").
upload_cutoff> 
Upload chunk size. Must a power of 2 >= 256k.

Making this larger will improve performance, but note that each chunk
is buffered in memory one per transfer.

Reducing this will reduce memory usage but decrease performance.
Enter a size with suffix k,M,G,T. Press Enter for the default ("8M").
chunk_size> 
Set to allow files which return cannotDownloadAbusiveFile to be downloaded.

If downloading a file returns the error "This file has been identified
as malware or spam and cannot be downloaded" with the error code
"cannotDownloadAbusiveFile" then supply this flag to rclone to
indicate you acknowledge the risks of downloading the file and rclone
will download it anyway.
Enter a boolean value (true or false). Press Enter for the default ("false").
acknowledge_abuse> 
Keep new head revision of each file forever.
Enter a boolean value (true or false). Press Enter for the default ("false").
keep_revision_forever> 
If Object's are greater, use drive v2 API to download.
Enter a size with suffix k,M,G,T. Press Enter for the default ("off").
v2_download_min_size> 
Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine or Y didn't work
y) Yes
n) No
y/n> n
If your browser doesn't open automatically go to the following link: https://accounts.google.com/o/oauth2/auth?topsecret
Log in and authorize rclone for access
Enter verification code> 1234
Configure this as a team drive?
y) Yes
n) No
y/n> n
--------------------
[sistemas]
scope = drive.readonly
auth_owner_only = false
shared_with_me = true
token = {"access_token":"ultrasecreto"}
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
sistemas             drive

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q

The configuration process is quite long and varies depending on the type of remote.

Note that one of the last steps of the configuration is authorizing access. Copy the URL, open it in a browser logged in as the appropriate user, and paste the verification code back into the terminal.

Finally, verify access to the shared folder:

root@debian:~# rclone lsd sistemas:
          -1 2021-01-05 13:34:57        -1 Backups
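
Optionally, before the first sync, rclone size can report how many files the shared folder holds and how much space it takes up (the folder name "Backups" comes from the listing above):

root@debian:~# rclone size sistemas:Backups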

Creating a directory for "Backups"

Assuming that the Google Drive folder to back up is called "Backups", create a local directory where Drive will be synced:

root@debian:~# mkdir -p /backup/drive && cd /backup/drive

Download an initial copy of the "Backups" folder:

root@debian:/backup/drive# rclone sync -P 'sistemas:/Backups' /backup/drive/Backups

The -P option shows the download progress interactively.
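
For unattended runs (such as the cron job further below), a log file is more useful than interactive progress; a possible variant, with an illustrative log path:

root@debian:/backup/drive# rclone sync 'sistemas:/Backups' /backup/drive/Backups --log-level INFO --log-file /var/log/backup/rclone.log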

Installing AWS CLI

Install and configure AWS CLI as explained in the article Cómo instalar AWS CLI v2 en Debian/Devuan.
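
As a quick sanity check after installing, load the credentials of the backup user and verify that they work (this assumes an IAM user with programmatic access already exists):

root@debian:~# aws configure
root@debian:~# aws sts get-caller-identity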

Creating the S3 bucket

Next, create a new bucket for the Google Drive backups from the S3 console using the "Create bucket" button, or use an existing one.

Enter a name and select "Block all public access". Leave versioning and encryption disabled.
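
The same can be done from the AWS CLI; a minimal sketch, assuming the bucket name "backups-drive" used by the script below and the us-east-1 region (other regions also require --create-bucket-configuration LocationConstraint=<region>):

root@debian:~# aws s3api create-bucket --bucket backups-drive --region us-east-1
root@debian:~# aws s3api put-public-access-block --bucket backups-drive --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true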

Go back to the IAM console and create a new access policy for the new bucket from "Create policy":

Select only the actions and resources that the backup script actually needs; a sketch of such a policy is shown below.
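
A minimal sketch of that policy, covering only what the upload in the script requires (listing the bucket and putting objects); the file name is illustrative and the bucket name "backups-drive" follows the script below:

cat > backups-drive-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::backups-drive"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::backups-drive/*"
    }
  ]
}
EOF

This JSON can be pasted into the policy editor, or created directly with aws iam create-policy --policy-name backups-drive --policy-document file://backups-drive-policy.json.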

Then give the policy a name and a description.

Once the policy has been created, go to "Users" and attach it to the user created earlier. Click "Add permissions" and select "Attach existing policies directly". Filter by keyword ("drive") and select the policy.

Backup script

Create a Bash script that updates the local copy of Drive and then uploads it to S3. One full copy per week is kept in S3.

root@debian:~# mkdir scripts
root@debian:~# cd scripts/
root@debian:~/scripts# nano s3-backup-drive.sh

Script code:

#!/bin/bash

# Variables
BUCKET="backups-drive"
DRIVE="/backup/drive"
DIR='Backups'

# Date stamp used to name the copy and its log
FECHA=$(date +%Y-%m-%d)

# Log file
LOG="/var/log/backup/s3-backup-drive-$FECHA.log"

# Sync Google Drive to the local copy
/usr/bin/rclone sync "sistemas:/$DIR" "$DRIVE/$DIR/"

# Upload a new copy of Drive to S3
/usr/local/bin/aws s3 cp --recursive "$DRIVE/$DIR/" "s3://$BUCKET/$DIR/$FECHA/" > "$LOG"

Create the directory where the logs will be stored:

root@debian:~/scripts# mkdir /var/log/backup
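
Make the script executable so it can be run directly:

root@debian:~/scripts# chmod +x s3-backup-drive.sh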

Run the script:

root@debian:~/scripts# ./s3-backup-drive.sh
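
Once it finishes, the upload can be verified by listing the bucket (the date prefix will match the day the script ran):

root@debian:~/scripts# aws s3 ls s3://backups-drive/Backups/ --recursive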

Logrotate

Configure log rotation for the backup script:

root@debian:~/scripts# cd /etc/logrotate.d/
root@debian:/etc/logrotate.d# nano backup

Rotate weekly, keep 8 copies (7 of them compressed):

/var/log/backup/*.log {
        weekly
        missingok
        rotate 8 
        compress
        delaycompress
        notifempty
        create 0640 root root 
}
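
The new configuration can be tested with a dry run, which only prints what logrotate would do without touching the logs:

root@debian:/etc/logrotate.d# logrotate -d /etc/logrotate.d/backup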

Cronjob

Create a cronjob to run the backup script every Sunday at 4 AM:

root@debian:/etc/logrotate.d# crontab -e
0 4 * * sun /root/scripts/s3-backup-drive.sh
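
Since the script only redirects the aws output to its log, any rclone or script errors end up in cron's local mail; if no mail delivery is set up, a possible variant captures everything in an extra log file (path illustrative):

0 4 * * sun /root/scripts/s3-backup-drive.sh >> /var/log/backup/cron.log 2>&1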

Glacier

Finally, a lifecycle rule can be created to move objects to Glacier once they are more than 30 days old.

From the S3 console, open the "Management" tab inside the bucket and click "Create lifecycle rule". Define a rule that transitions files to Glacier after 30 days.
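
An equivalent rule can also be applied from the CLI; a minimal sketch, where the rule ID and file name are illustrative:

cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "backups-to-glacier",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration --bucket backups-drive --lifecycle-configuration file://lifecycle.json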

Keep in mind that for each object in S3 Glacier, Amazon requires an additional 8 KB for the name and other metadata, plus an extra 32 KB for the index and other metadata, i.e. 40 KB per object. Taking 10,000 objects as an example, roughly 390 MB of additional storage is needed. However, since S3 Glacier storage is so cheap (about $0.004 per GB per month), it is preferable to send small files to Glacier as they are, rather than paying for CPU-intensive work (a higher cost in dollars) to bundle them into large tar archives.
