
14. Cloud Composer: Copying BigQuery Tables Across Different Locations

2024. 11. 22. 08:31 | Cloud/GCP

The lab itself is "copying BigQuery tables across regions", but I went through it because I wanted to study Composer.

 

Task 1. Create a Cloud Composer environment

 

 

Task to perform:

Click the Show Advanced Configuration dropdown and set the Airflow database zone to us-east4-b.

 

Task 2. Create Cloud Storage buckets

A storage bucket is created automatically when the Composer environment is created.

 

Create a bucket in US

 

Create a bucket in EU
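If you want to do this from Cloud Shell instead of the console, commands along these lines should work. The <UNIQUE_ID>-us / <UNIQUE_ID>-eu names are only placeholders matching the bucket-name pattern the Airflow variables use later; substitute your own names:

gcloud storage buckets create gs://<UNIQUE_ID>-us --location=US
gcloud storage buckets create gs://<UNIQUE_ID>-eu --location=EU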

 

 

Task 3. Create the BigQuery destination dataset
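This can also be done from Cloud Shell with the bq tool. The sketch below assumes a destination dataset named nyc_tlc_US in the US multi-region; adjust the name to whatever the lab asks for:

bq --location=US mk --dataset nyc_tlc_US   # dataset name is an assumption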

 

Task 4. Airflow and core concepts, a brief introduction

 

Airflow is a platform to programmatically author, schedule and monitor workflows.

Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies.

Core concepts

DAG - A Directed Acyclic Graph is a collection of tasks, organized to reflect their relationships and dependencies.

Operator - The description of a single task; it is usually atomic. For example, the BashOperator is used to execute a bash command.

Task - A parameterised instance of an Operator; a node in the DAG.

Task Instance - A specific run of a task; characterized as: a DAG, a Task, and a point in time. It has an indicative state: running, success, failed, skipped, ...
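To make these terms concrete, here is a minimal sketch (not part of the lab; names are made up, and an Airflow 2 environment is assumed) of a DAG with two BashOperator tasks and one dependency:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical demo DAG: a DAG groups tasks and their dependencies.
with DAG(
    dag_id="core_concepts_demo",      # made-up id, not a lab DAG
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,           # run only when triggered manually
    catchup=False,
) as dag:
    # Each Operator instance is a Task, i.e. a node in the DAG.
    say_hello = BashOperator(task_id="say_hello", bash_command="echo hello")
    say_bye = BashOperator(task_id="say_bye", bash_command="echo bye")

    # ">>" declares the dependency: say_hello must finish before say_bye starts.
    say_hello >> say_bye

Each run of one of these tasks for a given logical date is a Task Instance, with a state such as running, success, failed, or skipped.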

 

Task 5. Define the workflow
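The actual workflow is the bq_copy_across_locations.py file uploaded in Task 9, which reads the table list CSV and uses the third-party hook and operator from the plugins folder. As a rough idea of what it does, here is a simplified sketch for a single hard-coded table, written with the standard Google provider transfer operators rather than the lab's plugins; the dataset and table names are made up, and the bucket names come from the Airflow variables set in Task 10:

from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import BigQueryToGCSOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator

# Buckets come from the Airflow variables set in Task 10 (read at DAG parse time).
source_bucket = Variable.get("gcs_source_bucket")
dest_bucket = Variable.get("gcs_dest_bucket")

with DAG(
    dag_id="bq_copy_single_table_sketch",   # not the lab's DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # 1. Export the source table to the bucket co-located with it.
    export_to_gcs = BigQueryToGCSOperator(
        task_id="export_to_gcs",
        source_project_dataset_table="source_dataset.sample_table",   # hypothetical table
        destination_cloud_storage_uris=[f"gs://{source_bucket}/sample_table.avro"],
        export_format="AVRO",
    )

    # 2. Copy the exported file to the bucket in the destination location.
    copy_across_buckets = GCSToGCSOperator(
        task_id="copy_across_buckets",
        source_bucket=source_bucket,
        source_object="sample_table.avro",
        destination_bucket=dest_bucket,
        destination_object="sample_table.avro",
    )

    # 3. Load the file into the destination dataset created in Task 3.
    load_into_bq = GCSToBigQueryOperator(
        task_id="load_into_bq",
        bucket=dest_bucket,
        source_objects=["sample_table.avro"],
        destination_project_dataset_table="dest_dataset.sample_table",  # hypothetical
        source_format="AVRO",
        write_disposition="WRITE_TRUNCATE",
    )

    export_to_gcs >> copy_across_buckets >> load_into_bq

The lab's DAG builds this kind of export/copy/import chain per row of the CSV file rather than for a single hard-coded table.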

 

 

Task 6. View environment information

Check the status of the Composer environment.

Since the environment was created only a short while ago, only a small portion of the monitoring chart is filled in.

 

 

Connect to Cloud Shell.

 

Creating a virtual environment

 

1. Install the virtualenv package:

$ sudo apt-get install -y virtualenv

 

2. Build the virtual environment:
$ python3 -m venv venv

 

3. Activate the virtual environment.
$ source venv/bin/activate

 

Task 7. Create a variable for the DAGs Cloud Storage bucket

In Cloud Shell, set a variable that holds the name of the DAGs bucket shown on the Environment Details page:

 

DAGS_BUCKET=<your DAGs bucket name>
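If you prefer not to copy the name by hand, the same value can be read from the environment itself; config.dagGcsPrefix is the gs://<bucket>/dags path of the environment (environment name and location below are placeholders):

gcloud composer environments describe <ENVIRONMENT_NAME> --location <LOCATION> --format="value(config.dagGcsPrefix)"

Then set DAGS_BUCKET to the bucket part of that prefix, without gs:// and the trailing /dags.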

 

Task 9. Upload the DAG and dependencies to Cloud Storage

1. Copy the Google Cloud Python docs samples files into your Cloud Shell:
cd ~
gcloud storage cp -r gs://spls/gsp283/python-docs-samples .

 

2. Upload a copy of the third-party hook and operator to the plugins folder of your Composer DAGs Cloud Storage bucket:

gcloud storage cp -r python-docs-samples/third_party/apache-airflow/plugins/* gs://$DAGS_BUCKET/plugins

 

3. Next, upload the DAG and config file to the DAGs Cloud Storage bucket of your environment:
gcloud storage cp python-docs-samples/composer/workflows/bq_copy_across_locations.py gs://$DAGS_BUCKET/dags
gcloud storage cp python-docs-samples/composer/workflows/bq_copy_eu_to_us_sample.csv gs://$DAGS_BUCKET/dags
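As a quick check that the uploads landed where the environment expects them (plain object listings, not a lab step):

gcloud storage ls gs://$DAGS_BUCKET/dags
gcloud storage ls gs://$DAGS_BUCKET/plugins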

 

Task 10. Explore the Airflow UI

The workflow reads three Airflow variables, which need to be set:

table_list_file_path = /home/airflow/gcs/dags/bq_copy_eu_to_us_sample.csv (CSV file listing the source and target tables, including the dataset)
gcs_source_bucket = {UNIQUE ID}-us (Cloud Storage bucket to use for exporting BigQuery tables from the source)
gcs_dest_bucket = {UNIQUE ID}-eu (Cloud Storage bucket to use for importing BigQuery tables at the destination)
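These can be set in the Airflow UI under Admin > Variables or, assuming an Airflow 2 environment, from Cloud Shell by forwarding the airflow "variables set" command through gcloud. The environment name, location, and <UNIQUE_ID> (the {UNIQUE ID} above) are placeholders:

gcloud composer environments run <ENVIRONMENT_NAME> --location <LOCATION> variables set -- table_list_file_path /home/airflow/gcs/dags/bq_copy_eu_to_us_sample.csv
gcloud composer environments run <ENVIRONMENT_NAME> --location <LOCATION> variables set -- gcs_source_bucket <UNIQUE_ID>-us
gcloud composer environments run <ENVIRONMENT_NAME> --location <LOCATION> variables set -- gcs_dest_bucket <UNIQUE_ID>-eu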

 

 

+ This is the Airflow console (web UI) screen.

 

 

I need to study Composer and Airflow more. I know far too little ㅜ

 

Reference: https://www.cloudskillsboost.google/focuses/3528?parent=catalog

 


 
