LinkedIn News Scraping with Persistent Authentication

August 29, 2025

Building LinkedIn News Scrapers with Witrium: Authentication & Session Management

Purpose: Learn how to build authenticated web automations using Witrium's session management features to extract LinkedIn news headlines while maintaining logged-in state across multiple runs.

Legal note: Always review and comply with LinkedIn's Terms of Service and applicable laws. Use respectful rates, avoid abuse, and consider LinkedIn's API when available for commercial use cases.

What We're Building

LinkedIn's news section contains valuable industry insights, but accessing it requires authentication. Traditional scraping approaches struggle with login flows, session management, and maintaining authenticated state across multiple runs.

This tutorial demonstrates Witrium's session management capabilities. Instead of logging in every time you want to extract news data, we'll create a persistent login session that can be reused across multiple extraction runs. This approach is more efficient, respectful to the platform, and closer to real user behavior.

The Strategy:

  • Login Workflow: Handle authentication once and save the browser state
  • Extraction Workflow: Reuse the saved session to extract news data on demand
  • Reusable Sessions: Maintain authentication across multiple runs without re-logging in

Why Session Management Matters

Traditional Challenges with Authenticated Scraping

Many web scraping setups break down once authentication is required:

Manual Login Required:

  • Need to manually log in before each scraping session
  • Difficult to automate login flows reliably
  • Authentication state gets lost between runs

Session Complexity:

  • Managing cookies, tokens, and browser state manually
  • Handling multi-factor authentication and CAPTCHA challenges
  • Dealing with session timeouts and re-authentication

Infrastructure Overhead:

  • Maintaining persistent browser instances
  • Managing authentication credentials securely
  • Coordinating login state across multiple scraping tasks

The Witrium Session Management Advantage

Witrium solves these challenges with built-in session management:

Persistent Sessions - Save and reuse authenticated browser states
Secure Credential Handling - Encrypted storage for passwords and sensitive data
Session Isolation - Named sessions for different accounts or use cases
Automatic State Management - Browser state preserved across workflow runs
Visual Session Building - Test authentication flows interactively

Prerequisites

Implementation Overview

We'll build this in two phases:

  1. Authentication Workflow - Handle login and save the session
  2. Extraction Workflow - Use the saved session to extract news data

This separation allows you to authenticate once and run extractions multiple times without re-logging in.

Phase 1: Building the Login Workflow

Step 1: Create the Authentication Workflow

Creating the login workflow

Step 2: Set the Target URL for Direct Login

Set the Target URL to LinkedIn's direct login page:

https://www.linkedin.com/login

Why this matters: By targeting the login page directly instead of the homepage, we skip unnecessary redirects and get straight to the authentication form. This makes the workflow more reliable and faster.

Step 3: Start Build Session

Click Start Build Session. Witrium opens a live browser instance showing LinkedIn's login page. You'll see the login form ready for interaction.

Starting the build session

Step 4: Add Email Input Instruction

Add the following instruction:

In the email field enter {{email}}

Understanding Arguments: The {{email}} syntax creates a workflow argument, and you can name it anything you want (see the Working with Instructions section of the documentation for details). During execution, you can pass different values to this argument without modifying the workflow, which makes the workflow reusable across different accounts.

After adding this instruction, you'll see a new "email" field appear on the instruction panel. Enter your LinkedIn email address and click Play to test the instruction.

Email input instruction
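To make the {{email}} argument mechanism concrete, here is a minimal Python sketch of placeholder substitution. It is an illustration only: `render_instruction` and `argument_names` are hypothetical helpers, not part of Witrium, and Witrium's own handling of arguments may differ.

```python
import re

# Matches {{name}} and {{$name}}; a leading "$" marks a secret argument.
PLACEHOLDER = re.compile(r"\{\{\s*(\$?\w+)\s*\}\}")


def argument_names(template):
    """List the argument names referenced by an instruction, in order."""
    return PLACEHOLDER.findall(template)


def render_instruction(template, args):
    """Replace each {{name}} placeholder with the matching value from args."""
    def substitute(match):
        name = match.group(1)
        if name not in args:
            raise KeyError(f"missing workflow argument: {name}")
        return str(args[name])

    return PLACEHOLDER.sub(substitute, template)


print(render_instruction("In the email field enter {{email}}",
                         {"email": "user@example.com"}))
```

The same instruction text can then be rendered with a different `email` value for each account, which is the reuse the argument syntax is meant to enable.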

Step 5: Add Password Input Instruction (Secure)

Add the password instruction:

In the password field enter {{$password}}

Secret Arguments: The $ prefix before password marks this as a secret value (see the Working with Instructions section of the documentation for details). Witrium:

  • Stores secret values in an encrypted vault
  • Automatically deletes them after workflow completion
  • Never sends secret values to the underlying AI model
  • Displays them as password fields (masked input)

Enter your LinkedIn password in the "$password" field and click Play.

Password input instruction
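Witrium protects secret values on its side, but your own scripts can still leak them through logs. As a complementary client-side precaution, the sketch below masks every $-prefixed argument before it is printed. `redact_secrets` is a hypothetical helper written for this guide, not a Witrium function.

```python
def redact_secrets(args, mask="********"):
    """Return a copy of args that is safe to log: $-prefixed values are masked."""
    return {
        name: (mask if name.startswith("$") else value)
        for name, value in args.items()
    }


run_args = {"email": "user@example.com", "$password": "correct horse battery"}
# Log the redacted copy; pass the original dict to the workflow run.
print(redact_secrets(run_args))
```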

Step 6: Submit the Login Form

Add the sign-in instruction:

Click on sign in button

Click Play to execute the login. You should see LinkedIn's authentication process begin.

Step 7: Add Session Stabilization Wait

Add a crucial wait instruction:

Wait 10 seconds

Why This Is Critical: This wait serves multiple purposes:

  • Allows LinkedIn's post-login redirects to complete
  • Ensures the homepage loads fully with all authentication tokens
  • Provides buffer time for any additional verification steps
  • Makes it far more likely that the session is in a stable state before saving

Session Management Requirement: Witrium's session saving works best when the page is fully loaded. This wait ensures reliable session preservation.

Wait instruction

Step 8: Complete the Login Workflow

Click End Build Session to finalize the login workflow. Your authentication workflow is now ready.

Step 9: Execute and Save the Session

Now we'll run the workflow and create a persistent session:

  1. Click Run Workflow
  2. In the popup, enter your email and $password credentials
  3. Click the Session Management tab
  4. Toggle the "Preserve session" switch to ON
  5. Enter a unique Session Name (e.g., "Linkedin login-session")
  6. Click Start Run

Session management settings

What Happens During Execution:

  • Witrium runs through all your login steps
  • Completes the authentication process
  • Captures the entire browser state (cookies, tokens, storage)
  • Securely saves the session under your chosen name
  • Makes the session available for future workflows

Managing Sessions: All your saved sessions can be managed at https://witrium.com/settings?tab=browser-sessions

Phase 2: Building the Extraction Workflow

Step 1: Create the News Extraction Workflow

Step 2: Add the Extraction Instructions

Add these instructions in order:

Instruction 1: Page Load Stabilization

Wait 10 seconds

Purpose: Ensures the LinkedIn homepage loads completely with all dynamic content, including the news section.

Instruction 2: Expand News Content

In the LinkedIn news section, click on "show more"

Purpose: LinkedIn initially shows only a few headlines. This instruction expands the view to display all available top stories for more comprehensive extraction.

Instruction 3: Data Extraction

Extract the following data for all visible top stories from the LinkedIn news section and return as JSON:

1) `headline`: the visible headline (string)
2) `time`: the time duration for that headline (string)  
3) `readers`: the number of readers listed for that headline (integer)

Return a top-level JSON object: { "news": [ ... ] }

Extraction Details:

  • Structured Output: Enforces consistent JSON format across runs
  • Complete Coverage: Extracts all visible stories, not just featured ones
  • Rich Metadata: Captures engagement metrics (readers) and recency (time)
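Because the output comes from a natural-language instruction, it can be worth checking the returned shape before passing it downstream. The following is a minimal sketch of such a check against the three fields requested above; `validate_news_payload` is a hypothetical helper, and stricter rules (for example non-empty headlines) are left out.

```python
EXPECTED_FIELDS = {"headline": str, "time": str, "readers": int}


def validate_news_payload(payload):
    """Return the list of news items, raising ValueError on a shape mismatch."""
    if not isinstance(payload, dict) or not isinstance(payload.get("news"), list):
        raise ValueError('expected a top-level object of the form {"news": [...]}')
    for index, item in enumerate(payload["news"]):
        if not isinstance(item, dict):
            raise ValueError(f"news[{index}] should be an object, got {item!r}")
        for field, field_type in EXPECTED_FIELDS.items():
            value = item.get(field)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, field_type):
                raise ValueError(
                    f"news[{index}].{field} should be {field_type.__name__}, got {value!r}"
                )
    return payload["news"]


items = validate_news_payload(
    {"news": [{"headline": "Remote work policies evolving in 2025",
               "time": "4h", "readers": 8760}]}
)
print(len(items), "valid item(s)")
```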

Extraction instructions

Step 3: Test the Extraction Workflow

To test the workflow with your saved session:

  1. Click Run Workflow
  2. Navigate to the Session Management tab
  3. Select "Use existing session"
  4. Choose your saved LinkedIn session from the dropdown
  5. Click Start Run

Using existing session

What Happens:

  • Witrium loads your saved browser state
  • Bypasses the login process entirely
  • Starts directly from your authenticated LinkedIn homepage
  • Executes the extraction instructions
  • Returns structured JSON data

Step 4: Review Extraction Results

After the workflow completes, you'll see the extracted news data in JSON format:

{
  "news": [
    {
      "headline": "Tech layoffs continue as startups face funding challenges",
      "time": "2h",
      "readers": 15420
    },
    {
      "headline": "Remote work policies evolving in 2025",
      "time": "4h", 
      "readers": 8760
    }
    // ... more news items
  ]
}

Extraction results
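Since `time` comes back as a short string such as "2h", a small parsing step helps when you want to sort or filter by recency. The sketch below assumes the value is a number followed by a single unit letter (m, h, d, or w), optionally followed by "ago". Only the "h" form appears in the sample output above, so treat the other units as assumptions to verify against real results.

```python
import re
from datetime import timedelta

# Assumed unit letters; only "h" is confirmed by the sample output.
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
_AGE_PATTERN = re.compile(r"\s*(\d+)\s*([mhdw])(?:\s+ago)?\s*")


def parse_age(text):
    """Convert a relative age such as '2h' into a timedelta."""
    match = _AGE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized time value: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def freshest_first(news):
    """Return the news items sorted from most recent to oldest."""
    return sorted(news, key=lambda item: parse_age(item["time"]))


sample = [
    {"headline": "Remote work policies evolving in 2025", "time": "4h", "readers": 8760},
    {"headline": "Tech layoffs continue as startups face funding challenges",
     "time": "2h", "readers": 15420},
]
for item in freshest_first(sample):
    print(item["time"], item["headline"])
```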

Automation & Integration

Option A: REST API Integration

Each Witrium workflow exposes an auto-generated REST endpoint. You can trigger the extraction workflow programmatically:

curl -X POST \
  -H "Authorization: Bearer <YOUR_API_TOKEN>" \
  -H "Content-Type: application/json" \
  "https://api.witrium.com/v1/workflows/<EXTRACTION_WORKFLOW_ID>/run" \
  -d '{
    "use_states": ["Linkedin login-session"]
  }'
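If you would rather not shell out to curl, the same request can be assembled with Python's standard library. This sketch mirrors the curl call above (same URL pattern, headers, and `use_states` body) and only builds the request object; the actual send is left commented out, and the response format is not assumed here. `build_run_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_run_request(api_token, workflow_id, session_name):
    """Build the POST request that triggers a workflow run with a saved session."""
    url = f"https://api.witrium.com/v1/workflows/{workflow_id}/run"
    body = json.dumps({"use_states": [session_name]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request(
    "<YOUR_API_TOKEN>", "<EXTRACTION_WORKFLOW_ID>", "Linkedin login-session"
)
print(request.get_method(), request.full_url)

# Uncomment to actually send the request:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(response.read().decode("utf-8"))
```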

Option B: Python SDK Integration

For more sophisticated integrations:

from witrium.client import SyncWitriumClient

API_TOKEN = "<YOUR_API_TOKEN>"
EXTRACTION_WORKFLOW_ID = "<YOUR_EXTRACTION_WORKFLOW_ID>"

def get_linkedin_news():
    with SyncWitriumClient(api_token=API_TOKEN) as client:
        result = client.run_workflow_and_wait(
            workflow_id=EXTRACTION_WORKFLOW_ID,
            use_states=["Linkedin login-session"]
        )
        
        if result.status == "COMPLETED":
            news_data = result.result.get("news", [])
            return news_data
        else:
            raise Exception(f"Extraction failed: {result.status}")

# Usage
news = get_linkedin_news()
for item in news:
    print(f"• {item['headline']} ({item['readers']} readers)")

Scheduled Extractions

You can set up automated news collection using:

  • Cron jobs for regular intervals
  • GitHub Actions for CI/CD integration
  • Cloud functions for serverless execution
  • Zapier/Make.com for no-code automation
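All of these options come down to calling the extraction at a fixed cadence. For local testing before you wire up cron or a cloud scheduler, a plain in-process loop is often enough. The sketch below is one such loop; `run_on_interval` is a hypothetical helper, and `job` stands in for whatever function triggers your extraction workflow.

```python
import time


def run_on_interval(job, interval_seconds, max_runs, sleep=time.sleep):
    """Call job() max_runs times, pausing interval_seconds between calls."""
    results = []
    for run_number in range(max_runs):
        results.append(job())
        # No pause after the final run.
        if run_number < max_runs - 1:
            sleep(interval_seconds)
    return results


# Example: three runs, one hour apart (job is a stand-in for the extraction call).
# run_on_interval(job=get_linkedin_news, interval_seconds=3600, max_runs=3)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.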

Advanced Session Management

Multiple Account Support

Create separate login workflows for different LinkedIn accounts:

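# Different sessions for different accounts
personal_session = "linkedin-personal"
company_session = "linkedin-company"
industry_session = "linkedin-industry-news"

# Reuses SyncWitriumClient, API_TOKEN and EXTRACTION_WORKFLOW_ID from Option B
def run_extraction_with_session(session_name):
    with SyncWitriumClient(api_token=API_TOKEN) as client:
        result = client.run_workflow_and_wait(
            workflow_id=EXTRACTION_WORKFLOW_ID,
            use_states=[session_name]
        )
    if result.status != "COMPLETED":
        raise RuntimeError(f"Extraction failed: {result.status}")
    return result.result.get("news", [])

# Use appropriate session based on context
def get_news_by_account(account_type="personal"):
    session_map = {
        "personal": personal_session,
        "company": company_session,
        "industry": industry_session
    }

    return run_extraction_with_session(session_map[account_type])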

Session Refresh Strategy

LinkedIn sessions eventually expire. Implement a refresh strategy:

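import os

# Placeholders: the environment variable names are examples, use your own
LOGIN_WORKFLOW_ID = "<YOUR_LOGIN_WORKFLOW_ID>"
LINKEDIN_EMAIL = os.environ["LINKEDIN_EMAIL"]
LINKEDIN_PASSWORD = os.environ["LINKEDIN_PASSWORD"]

def get_news_with_refresh():
    try:
        return get_linkedin_news()
    except Exception as e:
        # Assumes the failure message mentions authentication;
        # adjust this check to the errors you actually observe
        if "authentication" in str(e).lower():
            print("Session expired, refreshing...")
            refresh_linkedin_session()
            return get_linkedin_news()
        raise

def refresh_linkedin_session():
    # Run the login workflow to refresh the session
    with SyncWitriumClient(api_token=API_TOKEN) as client:
        result = client.run_workflow_and_wait(
            workflow_id=LOGIN_WORKFLOW_ID,
            args={"email": LINKEDIN_EMAIL, "$password": LINKEDIN_PASSWORD},
            preserve_session=True,
            use_states=["Linkedin login-session"]
        )
    if result.status != "COMPLETED":
        raise RuntimeError(f"Login refresh failed: {result.status}")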

Best Practices & Security

Credential Security

  • Always use secret arguments ($-prefixed arguments) for sensitive data (see the Working with Instructions section of the documentation for details)
  • Store your Witrium API tokens securely in environment variables
  • Use dedicated accounts for automation when possible
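As a small illustration of the environment-variable advice, the sketch below reads the API token at startup and fails early with a clear message if it is missing. The variable name `WITRIUM_API_TOKEN` and the helper `load_api_token` are assumptions made for this example, not names defined by Witrium.

```python
import os


def load_api_token(env=os.environ, name="WITRIUM_API_TOKEN"):
    """Read the API token from the environment, failing early if it is unset."""
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"set the {name} environment variable before running")
    return token


# Typical use at startup:
# API_TOKEN = load_api_token()
```

This keeps the token out of source control and out of the script itself.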

Respectful Usage

  • Add appropriate delays between requests
  • Don't overwhelm LinkedIn's servers
  • Respect rate limits and terms of service
  • Consider using LinkedIn's official API for commercial applications
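One simple way to enforce a delay between runs in your own code is a minimum-interval throttle. The following is a minimal sketch: `MinIntervalThrottle` is a hypothetical class, and a suitable interval is yours to choose, since no specific rate limit is documented here.

```python
import time


class MinIntervalThrottle:
    """Ensure at least min_interval_seconds pass between consecutive runs."""

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_run = None

    def wait(self):
        """Block until the next run is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_run is not None:
            delay = max(0.0, self._min_interval - (now - self._last_run))
            if delay > 0:
                self._sleep(delay)
        # Record when this run is considered to have started.
        self._last_run = now + delay
        return delay


# Example: allow at most one extraction every 15 minutes.
# throttle = MinIntervalThrottle(15 * 60)
# throttle.wait()
# news = get_linkedin_news()
```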

Browser Session Hygiene

Error Handling

import time

def robust_news_extraction():
    max_retries = 3
    for attempt in range(max_retries):
        try:
            return get_linkedin_news()
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt == max_retries - 1:
                raise  # Out of retries: re-raise with the original traceback
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, then 2s

Troubleshooting Common Issues

Session Not Working

  • Check session expiry: LinkedIn sessions expire after inactivity
  • Verify login success: Ensure the login workflow completed successfully
  • Review wait times: Insufficient wait times can cause incomplete session saves

Extraction Failures

  • LinkedIn layout changes: The news section layout may have changed; confirm that the section is still visible on the homepage
  • Content not loaded: Add or increase wait times for dynamic content
  • Access restrictions: Verify your account has access to the news section

Rate Limiting

  • Add delays: Include wait instructions between actions
  • Reduce frequency: Don't run extraction workflows too frequently
  • Monitor usage: Watch for LinkedIn's rate limiting responses

Conclusion

You now have a robust LinkedIn news extraction system with persistent authentication. The two-workflow approach provides:

  • Secure credential handling with Witrium's secret management
  • Persistent sessions that eliminate repeated logins
  • Scalable extraction that can be automated and integrated
  • Flexible architecture that supports multiple accounts and use cases

This pattern can be extended to other authenticated platforms like Twitter, Facebook, or internal company portals that require login.

Next Steps:

  • Explore Witrium's workflow documentation for advanced features
  • Set up automated scheduling for regular news collection
  • Integrate with your existing data pipelines and analytics tools

Got questions about session management or authentication workflows? Reach out to our support team at support@witrium.com.