Architecture
Extraction: Tweets are retrieved from the X API using Tweepy
Transformation: Data is structured into a Pandas DataFrame with the fields user, text, like_count, retweet_count, and created_at
Loading: Data is saved to an S3 bucket (rume-airflow-bucket) using s3fs
Orchestration: The ETL process is scheduled and managed with Airflow DAGs
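
The transformation and loading steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes each tweet has already been reduced to a plain dict (how a Tweepy response maps to that shape depends on the API version in use), and the helper names refine_tweet, s3_uri, and run_etl as well as the object key tweets.csv are hypothetical.

```python
# Column names taken from the field list in the Transformation step.
FIELDS = ["user", "text", "like_count", "retweet_count", "created_at"]


def refine_tweet(raw: dict) -> dict:
    """Keep only the five documented fields; missing ones become None."""
    return {field: raw.get(field) for field in FIELDS}


def s3_uri(bucket: str, key: str) -> str:
    """Build the s3:// URL that pandas passes on to s3fs."""
    return f"s3://{bucket}/{key}"


def run_etl(raw_tweets: list,
            bucket: str = "rume-airflow-bucket",
            key: str = "tweets.csv") -> str:  # key is a hypothetical name
    # Imported here so the pure helpers above work without pandas installed.
    import pandas as pd

    df = pd.DataFrame([refine_tweet(t) for t in raw_tweets], columns=FIELDS)
    target = s3_uri(bucket, key)
    # pandas resolves s3:// paths through fsspec/s3fs, so s3fs must be
    # installed and AWS credentials must be available in the environment.
    df.to_csv(target, index=False)
    return target
```

A function like run_etl could then be handed to an Airflow PythonOperator as its python_callable, which is one common way to let a DAG schedule the whole process.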