
Last Update: April 13, 2025


By eric



How to Scrape Twitter (X) User's Tweets Using tyo-crawler

Scraping tweets from X (formerly known as Twitter) has become significantly more difficult in recent years, especially after the company's acquisition and subsequent rebranding. Once known for its relatively open developer ecosystem, X has shifted toward a more closed, monetized model. The public Twitter API, which previously allowed developers and researchers to access tweet data with manageable rate limits and at little to no cost, has been deprecated. In its place, the official X API now comes with restrictive access tiers and steep pricing, making it nearly inaccessible for small developers, independent researchers, and hobbyists.

This change has created a real bottleneck for those who still need access to tweet data — whether for sentiment analysis, social media monitoring, public opinion research, or competitive intelligence. As a result, many have turned to alternative tools and strategies to extract information directly from the site. However, scraping tweets comes with its own set of technical challenges, including anti-bot protections, rate limiting, and the need to mimic human-like behavior to avoid detection.

tyo-crawler is a general-purpose web scraping tool that can crawl web pages and extract data from them. It is designed to be flexible, allowing users to tailor scraping tasks to their needs. In this article, we will explore how to use tyo-crawler to scrape tweets from X.

Prerequisites

Before we begin, make sure you have the following prerequisites:

  1. Node.js 16+: tyo-crawler is a Node.js-based tool.

  2. npm (or yarn): Node.js package manager.

  3. git: You need to have Git installed on your machine to clone the repository.

  4. tyo-crawler: Install it using git:

    bash
    git clone https://github.com/e-tang/tyo-crawler.git
    cd tyo-crawler
    npm install
    
  5. Redis: Ensure you have Redis installed and running. You can install it directly or use the provided Docker Compose setup:

    bash
    # Using Docker Compose (recommended)
    cd docker
    docker-compose up -d
    
    # Or, install Redis directly (example for Ubuntu)
    # sudo apt-get update
    # sudo apt-get install redis-server
    # sudo systemctl start redis-server
    
  6. A Twitter (X) account: You will need an account to scrape tweets. Without one, X only shows a limited selection of content, so you will not be able to access a user's full timeline.

  7. Basic understanding of JSON: You will need to edit a JSON configuration file in the steps below.
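
Before moving on, a quick sanity check confirms the tooling is in place (redis-cli ping should answer PONG when Redis is running):

bash
node --version     # should print v16.x or newer
npm --version
git --version
redis-cli ping     # PONG means Redis is reachable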

Step-by-Step Guide

Here's how to use tyo-crawler's X processor to scrape tweets from a specific X user's profile:

Step 1: Configure the Actions File (x.json)

tyo-crawler uses an actions file to define how it interacts with websites that require authentication. The repository provides a pre-configured actions file specifically for X, located at examples/x.example.json. You should copy this file to the root directory of the tyo-crawler project and rename it to x.json.

bash
cp examples/x.example.json x.json

Important: Do not modify the original x.example.json file in the examples directory. Always work with a copy.

Here's the content of x.example.json (which you should copy to x.json):

json
[
    {
        "if": "a[href*=login]",
        "then": [
            {
                "action": "click",
                "selector": "a[href*=login]"
            },
            {
                "action": "wait",
                "time": 2000
            },
            {
                "action": "wait",
                "selector": "input[autocomplete='username']"
            },
            {
                "action": "type",
                "selector": "input[autocomplete='username']",
                "value": "YOUR_X_USERNAME"
            },
            {
                "action": "click",
                "selector": "button[role='button'] span span",
                "text": "Next"
            },
            {
                "action": "wait",
                "time": 2000
            },
            {
                "action": "type",
                "selector": "input[name='password']",
                "value": "YOUR_X_PASSWORD"
            },
            {
                "action": "click",
                "selector": "button[data-testid='LoginForm_Login_Button']"
            },            
            {
                "action": "wait",
                "time": 80000
            },
            {
                "action": "saveCookies"
            }
        ]
    },
    {
        "repeat": [
            { 
                "action": "scroll",
                "value": 50
            },
            {
                "action": "evaluate"
            },
            {
                "action": "process"
            },
            { 
                "action": "wait",
                "time": 1000
            }
        ],
        "failed_limit": 4
    }
]

Explanation

Actions

The actions file is a JSON array of action groups that tyo-crawler executes in order: a conditional login sequence followed by a repeated scrape loop.

Login Sequence (if: "a[href*=login]")

This part handles the login process.

  • click: Clicks on the login link.
  • wait: Pauses execution for a specified time (in milliseconds) or until an element is present.
    • time: 2000: Waits for 2 seconds.
    • selector: "input[autocomplete='username']": Waits until the username input field is present.
  • type: Enters text into an input field.
    • selector: "input[autocomplete='username']": The CSS selector for the username input field.
    • value: "YOUR_X_USERNAME": Replace YOUR_X_USERNAME with your actual X username.
  • click: Clicks the "Next" button.
    • selector: "button[role='button'] span span": The CSS selector for the next button.
    • text: "Next": The text content of the button.
  • type: Enters the password.
    • selector: "input[name='password']": The CSS selector for the password input field.
    • value: "YOUR_X_PASSWORD": Replace YOUR_X_PASSWORD with your actual X password.
  • click: Clicks the login button.
  • wait
    • time: 80000: Pauses execution for 80 seconds.

Purpose: This is a crucial step. It provides enough time for you to manually enter a two-factor authentication (2FA) code if you have it enabled on your X account. Even if you don't have 2FA, this wait time allows X's website to fully load after login and helps avoid triggering anti-bot measures.

  • saveCookies: Saves the cookies after login. This is important for maintaining the login session in subsequent runs.

Repeat Sequence

This part handles the scrolling and data processing.

  • repeat: This key contains an array of actions to be repeated.
    • scroll: Scrolls the page.
      • value: 50: Scrolls down 50px per iteration.
    • evaluate: Evaluates the current page.
    • process: Runs the selected processor (here, the X processor) on the page to extract tweets.
    • wait: Waits for 1 second between iterations.
  • failed_limit: 4: If the loop fails 4 times, it stops. Note that failed_limit is a sibling of repeat, not one of the repeated actions.

Step 2: Edit the Actions File

Replace YOUR_X_USERNAME and YOUR_X_PASSWORD with your actual X username and password. This is crucial for the login process to work.
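
For example, with a hypothetical handle my_x_handle, the two type actions in x.json would look like the excerpt below (both values are placeholders, not real credentials):

json
{
    "action": "type",
    "selector": "input[autocomplete='username']",
    "value": "my_x_handle"
},
...
{
    "action": "type",
    "selector": "input[name='password']",
    "value": "YOUR_REAL_PASSWORD_HERE"
}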

Step 3: Run tyo-crawler with the X Processor

Now, you can run tyo-crawler using the X processor and the x.json actions file.

Open your terminal, navigate to the tyo-crawler directory, and run the following command:

bash
node index.js --show-window true --with-cookies true --actions-file ./x.json --processor x https://x.com/[ACCOUNT_NAME]

For example, to scrape tweets from the user CommSec, you would run:

bash
node index.js --show-window true --with-cookies true --actions-file ./x.json --processor x https://x.com/CommSec

Explanation of the Command

  • node index.js: Executes the main tyo-crawler script.
  • --show-window true: Shows the browser window during the scraping process (useful for debugging and entering 2FA codes).
  • --with-cookies true: Enables cookie handling, which is essential for maintaining the login session.
  • --actions-file ./x.json: Specifies the path to the actions file.
  • --processor x: Tells tyo-crawler to use the built-in X processor.
  • https://x.com/CommSec: The target URL (replace CommSec with the desired user's profile).
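
Since the crawler takes a single profile URL per run, scraping several accounts is just a matter of looping over handles in the shell (a minimal sketch; the handle list is illustrative):

bash
# Run the crawler once per account; output lands in a per-account directory
for user in CommSec nasa; do
    node index.js --show-window true --with-cookies true \
        --actions-file ./x.json --processor x "https://x.com/$user"
done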

Step 4: Observe the Results

tyo-crawler will now:

  • Launch a browser.
  • Navigate to https://x.com/CommSec (or your specified URL).
  • Attempt to log in using the credentials in x.json.
  • Wait for 80 seconds to allow for 2FA or website loading.
  • Repeatedly scroll down the page, 50px at a time.
  • Scrape the tweets.
  • Save the scraped data.

The scraped data will be saved in the [ACCOUNT_NAME] directory by default.
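
For the CommSec example, the scraped files therefore land in a directory named after the account; a quick listing shows what was captured (file names and formats depend on the processor):

bash
# Inspect the output directory created by the run
ls -la CommSec/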

Step 5: Adapt and Enhance

  • Adjust Scrolling: Modify the value in the scroll action; it controls how far the browser scrolls on each iteration, and scrolling further loads more tweets per pass (a tuned example follows this list).
  • Explore Other Actions: Refer to the README.md file for other available actions and parameters.
  • Error Handling: While the X processor handles many common issues, you might still encounter errors. Monitor the console output for any error messages.
  • Change the failed_limit: Adjust failed_limit in the actions file to control how many failed attempts the crawler tolerates before stopping. This helps prevent infinite loops when unexpected errors occur (see the example below).
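
As an illustration of the first and last points above, here is a variant of the repeat block that scrolls further per iteration, waits longer between scrolls, and tolerates more failures before stopping (the numbers are illustrative starting points, not recommendations):

json
{
    "repeat": [
        { "action": "scroll", "value": 200 },
        { "action": "evaluate" },
        { "action": "process" },
        { "action": "wait", "time": 2000 }
    ],
    "failed_limit": 8
}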

Important Notes

  • X's Terms of Service: Be aware of X's terms of service regarding scraping. Scraping may be against their rules, and they may block your IP address if they detect suspicious activity.
  • Ethical Scraping: Be respectful of the website's resources. Don't overload their servers with requests.
  • Website Changes: X's website structure can change at any time. While the X processor is designed to handle common changes, you might need to update the actions file or the processor itself if major changes occur.
  • Anti-bot measures: X has strong anti-bot measures, so you may need to use proxies, rotate user agents, and add delays to avoid being blocked.
  • Login: You need to provide your X username and password in the x.json file. Since they are stored in plain text, keep the file private (a quick safeguard follows this list).
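
Because x.json holds your credentials in plain text, it is worth making sure the file can never be committed to version control:

bash
# Keep the credentials file out of git
echo "x.json" >> .gitignore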

Conclusion

This tutorial demonstrates how to use tyo-crawler's powerful X processor and an actions file to scrape tweets from X efficiently. By leveraging these built-in features, you can avoid writing complex scraping scripts and focus on extracting the data you need. Remember to always be mindful of ethical and legal considerations when web scraping.
