The Geography of Climate Discourses

Olivier Dupuis

Originally published at https://blog.republicofdata.io on August 2, 2024.

We aim to gather conversations about climate change from various social networks and analyze how people’s perceptions and opinions vary based on location.

To achieve this, we will use the following design.

Prototype design
  • We have a database with the media feeds and social network configurations to scrape (a hypothetical sketch of one such entry follows this list).
  • We then have two data products responsible for collecting and exposing media articles and social network conversations.
  • Finally, we have an AI agent that analyzes the conversations and classifies them into four discourse types.
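
To make the first component concrete, here is what a single scraping configuration entry could look like. This is a minimal, hypothetical Python sketch; the field names and feed URL are illustrative, and the actual schema lives in the project's repository.

# Hypothetical example of a single scraping configuration entry.
# Field names are illustrative, not the project's actual schema.
media_feed_config = {
    "media_source": "New York Times",
    "feed_url": "https://rss.nytimes.com/services/xml/rss/nyt/Climate.xml",  # illustrative feed URL
    "social_network": "x",
    "active": True,
}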

This notebook is a prototype of the design, serving as a proof of concept to demonstrate how we can utilize the data we’ve gathered to analyze climate discussions based on geographical locations. For more information and in-depth exploration, please check out the public repository, which includes all the code and documentation for this project.

Theoretical Framework

Let’s first talk about the theoretical framework we use to classify the discourses.

We will reference the book “Climate and Society” by Robin Leichenko and Karen O’Brien, a seminal work that offers a comprehensive framework for understanding climate discourses.

In this book, the authors propose a framework to understand the different discourses on climate change. They identify four main discourses:

  1. Biophysical: “Climate change is an environmental problem caused by rising concentrations of greenhouse gases from human activities. Climate change can be addressed through policies, technologies, and behavioural changes that reduce greenhouse gas emissions and support adaptation.”
  2. Critical: “Climate change is a social problem caused by economic, political, and cultural processes that contribute to uneven and unsustainable patterns of development and energy usage. Addressing climate change requires challenging economic systems and power structures that perpetuate high levels of fossil fuel consumption.”
  3. Dismissive: “Climate change is not a problem at all or at least not an urgent concern. No action is needed to address climate change, and other issues should be prioritized.”
  4. Integrative: “Climate change is an environmental and social problem that is rooted in particular beliefs and perceptions of human-environment relationships and humanity’s place in the world. Addressing climate change requires challenging mindsets, norms, rules, institutions, and policies that support unsustainable resource use and practice.”

We will use this framework to classify the discourses we collect from social networks.
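
Since both agents below will need these definitions, it is convenient to carry them around in code as a simple label-to-definition mapping that can be interpolated into prompts. A minimal Python sketch, with the DISCOURSE_TYPES name being ours and the definitions abridged from the quotes above:

# The four discourse types from Leichenko and O'Brien, as a label-to-definition
# mapping we can reuse in prompts. Definitions abridged from the quotes above.
DISCOURSE_TYPES = {
    "Biophysical": "An environmental problem caused by rising greenhouse gas concentrations; addressed through policies, technologies, and behavioural changes.",
    "Critical": "A social problem caused by economic, political, and cultural processes; addressing it requires challenging economic systems and power structures.",
    "Dismissive": "Not a problem, or at least not an urgent one; no action is needed.",
    "Integrative": "An environmental and social problem rooted in beliefs about human-environment relationships; addressing it requires challenging mindsets, norms, and institutions.",
}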

Data

In early 2024, we collected conversations from X that discussed climate-related articles from the New York Times.

  • The data spans from March 28 to May 30, 2024.
  • During that time, we collected 655 articles, 7,164 conversations and 7,974 posts.
  • Those posts came from 6,578 users.
  • We were able to geolocate 3,005 of those users (a toy illustration of such a lookup follows this list).
  • That gives us 2,071 posts with geolocation data for their authors.
  • Of those, 1,380 posts are geolocated in the United States.
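
The geolocation itself happens upstream in the data platform. Purely as a rough illustration of the idea, a toy gazetteer lookup over users' free-text profile locations could look like this; all names and entries here are hypothetical.

from typing import Optional

# Toy gazetteer mapping raw profile locations to a U.S. state (admin1).
# Entirely hypothetical: the real pipeline resolves locations upstream.
GAZETTEER = {
    "nyc": "New York",
    "new york, ny": "New York",
    "austin, tx": "Texas",
    "seattle": "Washington",
}


def geolocate_user(profile_location: str) -> Optional[str]:
    # Normalize the free-text location before the lookup
    key = profile_location.strip().lower()
    return GAZETTEER.get(key)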

Let’s load that sample data.

conversations_df <- read.csv("../data/climate_conversations.csv")

And take a glimpse at the first few rows.

kable(head(conversations_df, n=5))

Climate Conversations

Conversations can drift away from the article that sparked them, so not every discussion is actually about climate change. We therefore employed an AI agent to determine which conversations focus on climate change.

Here is the code for the AI agent that classifies the conversations.

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI
from langsmith import traceable


class ConversationClassification(BaseModel):
    """Classify if a conversation is about climate change"""

    conversation_id: str = Field(description="A conversation's id")
    classification: bool = Field(
        description="Whether the conversation is about climate change"
    )


# Agent to classify conversations as about climate change or not
@traceable
def initiate_conversation_classification_agent():
    # Components
    model = ChatOpenAI(model="gpt-4o-mini")
    structured_model = model.with_structured_output(ConversationClassification)

    prompt_template = ChatPromptTemplate.from_messages(
        [
            (
                "human",
                "Classify whether this conversation is about climate change or not: {conversation_posts_json}",
            ),
        ]
    )

    # Task
    chain = prompt_template | structured_model
    return chain

Then, here’s the code that uses that agent to classify the conversations.

import json

import pandas as pd

from prototype.conversation_classification_agent import (
    initiate_conversation_classification_agent,
)
from prototype.utils import get_conversations

conversations_df = get_conversations()

# Classify conversations as about climate change or not
conversation_classifications_df = pd.DataFrame(
    columns=["conversation_id", "classification"]
)
conversation_classification_agent = initiate_conversation_classification_agent()

# Iterate over all conversations and classify them
for _, conversation_df in conversations_df.iterrows():
    conversation_dict = conversation_df.to_dict()
    conversation_json = json.dumps(conversation_dict)

    conversation_classifications_output = conversation_classification_agent.invoke(
        {"conversation_posts_json": conversation_json}
    )
    new_classification = pd.DataFrame([conversation_classifications_output.dict()])
    conversation_classifications_df = pd.concat(
        [conversation_classifications_df, new_classification], ignore_index=True
    )

# Save classified conversations to a new csv file
conversation_classifications_df.to_csv(
    "data/conversation_classifications.csv", index=False
)

Let’s see the results of that classification.

conversation_classifications <- read.csv("../data/conversation_classifications.csv")

conversation_classifications %>%
  head(n=5) %>%
  kable()

Let’s look at the distribution of conversations that are about climate change.

conversation_classifications %>%
  count(classification) %>%
  kable()

We end up with 153 geolocated conversations about climate change. Let’s now associate their posts with discourses and map which discourse dominates each region.

Discourses

We now use a second agent to associate each post in those conversations with one of the four discourse types discussed earlier.

from typing import List

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI
from langsmith import traceable


# Define classes for LLM task output
class PostAssociation(BaseModel):
    """Association between post and narrative"""

    post_id: str = Field(description="A post's id")
    text: str = Field(description="A post's text")
    discourse: str = Field(description="The associated discourse's label")


class PostAssociations(BaseModel):
    """List of associations between posts and narratives"""

    post_associations: List[PostAssociation]


# Agent to classify posts to discourses
@traceable
def initiate_post_association_agent():
    # Components
    model = ChatOpenAI(model="gpt-4o")
    parser = PydanticOutputParser(pydantic_object=PostAssociations)

    # Prompt
    system_template = """
# IDENTITY and PURPOSE
You are an expert at associating discourse types to social network posts.

# STEPS
1. Ingest the first json object which has all the posts from a social network conversation on climate change.
2. Consider the discourse type definitions provided below.
3. Take your time to process all those entries.
4. Parse all posts and associate the most appropriate discourse type to each individual post.
5. It's important that if no discourse is relevant, the post should be classified as N/A.
6. Each association should have the post's text and the discourse's label.

# DISCOURSE TYPES
1. Biophysical: "Climate change is an environmental problem caused by rising concentrations of greenhouse gases from human activities. Climate change can be addressed through policies, technologies, and behavioural changes that reduce greenhouse gas emissions and support adaptation."
2. Critical: "Climate change is a social problem caused by economic, political, and cultural processes that contribute to uneven and unsustainable patterns of development and energy usage. Addressing climate change requires challenging economic systems and power structures that perpetuate high levels of fossil fuel consumption."
3. Dismissive: "Climate change is not a problem at all or at least not an urgent concern. No action is needed to address climate change, and other issues should be prioritized."
4. Integrative: "Climate change is an environmental and social problem that is rooted in particular beliefs and perceptions of human-environment relationships and humanity's place in the world. Addressing climate change requires challenging mindsets, norms, rules, institutions, and policies that support unsustainable resource use and practice."
5. N/A: "No discourse is relevant to this post."

# OUTPUT INSTRUCTIONS
{format_instructions}
"""

    prompt_template = ChatPromptTemplate.from_messages(
        [
            ("system", system_template),
            (
                "human",
                "Here's a json object which has all the posts from a social network conversation on climate change: {conversation_posts_json}",
            ),
        ]
    ).partial(format_instructions=parser.get_format_instructions())

    # Task
    chain = prompt_template | model | parser
    return chain

And here’s the code that uses that agent to associate posts to discourses.

import json

import pandas as pd

from prototype.post_association_agent import initiate_post_association_agent
from prototype.utils import get_conversations

conversations_df = get_conversations()
classifications_df = pd.read_csv("data/conversation_classifications.csv")

# Filter conversations classified as about climate change
climate_change_conversations_df = conversations_df[
    conversations_df["conversation_id"].isin(
        classifications_df[classifications_df["classification"] == True][
            "conversation_id"
        ]
    )
]

# Associate posts with a discourse type
post_associations_df = pd.DataFrame(columns=["post_id", "discourse_type"])
post_association_agent = initiate_post_association_agent()

# Iterate over all conversations and classify them
for _, conversation_df in climate_change_conversations_df.iterrows():
    conversation_dict = conversation_df.to_dict()
    conversation_json = json.dumps(conversation_dict)

    try:
        post_associations_output = post_association_agent.invoke(
            {"conversation_posts_json": conversation_json}
        )

        for association in post_associations_output.post_associations:
            new_row = {
                "post_id": association.post_id,
                "discourse_type": association.discourse,
            }
            post_associations_df = pd.concat(
                [post_associations_df, pd.DataFrame([new_row])], ignore_index=True
            )
    except Exception as e:
        print(
            f"Failed to associate posts in conversation {conversation_df['conversation_id']}"
        )
        print(e)

# Save post associations to a new csv file
post_associations_df.to_csv("data/post_associations_df.csv", index=False)

Let’s see the results of that discourse association.

post_associations <- read.csv("../data/post_associations_df.csv")

post_associations %>%
  head(n=5) %>%
  kable()

And look at the distribution of discourse types.

post_associations %>%
  count(discourse_type) %>%
  kable()

Geographical Distribution

Now that we have the cleaned-up, relevant conversations and their associated discourses, let’s pull it all together.

full_posts <- conversations_df %>%
  inner_join(conversation_classifications, by = "conversation_id") %>%
  inner_join(post_associations, by = "post_id") %>%
  filter(classification == "True") %>%
  filter(tolower(discourse_type) != "n/a") %>%
  filter(admin1_name != "")

full_posts %>%
  head(n=1) %>%
  kable()

And now let’s see the distribution of discourses by geography, with the most frequent discourse type per state.

discourses_geo_summary <- full_posts %>%
  count(discourse_type, admin1_name) %>%
  pivot_wider(names_from = discourse_type, values_from = n, values_fill = list(n = 0)) %>%
  rowwise() %>%
  # which.max() indexes the discourse columns only, so shift by 1 to skip
  # the leading admin1_name column when looking up the column name
  mutate(most_frequent_discourse = names(.)[which.max(c_across(-admin1_name)) + 1]) %>%
  ungroup()

discourses_geo_summary %>%
  kable()

Finally, let’s visualize the most frequent discourse type per U.S. state.

library(ggplot2)
library(sf)
library(dplyr)
library(usmap)

# Load the U.S. map
us_states <- usmap::us_map(regions = "states")

# Convert the us_states to an sf object
us_states_sf <- st_as_sf(us_states, coords = c("x", "y"), crs = 4326, agr = "constant")

# Join the data
discourses_geo_summary_sf <- discourses_geo_summary %>%
  left_join(us_states_sf, by = c("admin1_name" = "full"))

# Convert to an sf object, ensuring 'geom' is recognized as the geometry column
discourses_geo_summary_sf <- st_as_sf(discourses_geo_summary_sf, sf_column_name = "geom")

# Plot the map
ggplot(discourses_geo_summary_sf) +
  geom_sf(aes(geometry = geom, fill = most_frequent_discourse), color = "black") +
  scale_fill_manual(values = c(
    "Biophysical" = "blue",
    "Critical" = "red",
    "Dismissive" = "green",
    "Integrative" = "purple"
  )) +
  theme_minimal() +
  labs(title = "Most Frequent Discourse Type per U.S. State")
Most frequent climate discourse type per U.S. State

Next Steps

This prototype shows how we can leverage the data we collected to understand climate discourses by geography. But we’re not done yet.

In the next iteration, we will:

  • Restart our data collection platform.
  • Increase the number of social network platforms we collect data from.
  • Increase the number of media sources that help us capture climate conversations.
  • Improve the AI agents to better classify the discourses.
  • Develop a proper interactive data app to visualize the discourses and underlying narratives by geography, and also over time, to see how the discourses evolve.

For now though, I’m taking a few weeks off. So see you in September!
