important_ps = response.css('div.content p.important')
Attribute selectors — filter by attribute presence, exact value, starts/ends/contains — powerful for data-* attributes, href/src, aria-labels.
# Elements with href attribute
links = response.css('[href]')
# Links starting with "/product/"
product_links = response.css('a[href^="/product/"]')
# Elements with data-id="123"
specific = response.css('[data-id="123"]')
# Elements with title containing "Cool"
cool_items = response.css('[title*="Cool"]')
# Extract attribute values with ::attr()
hrefs = response.css('a::attr(href)').getall() # all hrefs
srcs = response.css('img::attr(src)').getall() # all image srcs
data_prices = response.css('[data-price]::attr(data-price)').getall()
Pseudo-elements and combinators — extract text/attributes or refine relationships.
# Text content
titles = response.css('h1::text').getall()
# Direct child
direct = response.css('div > p::text').getall()
# Adjacent sibling
adjacent = response.css('h2 + p::text').getall()
# General sibling
siblings = response.css('h2 ~ p::text').getall()
Real-world pattern: structured extraction in Scrapy spiders — CSS selectors for clean, maintainable code on modern sites with classes and data attributes.
def parse_product(self, response):
    yield {
        'name': response.css('h1.product-title::text').get(default='').strip(),
        'price': response.css('.price-amount::text').re_first(r'[\d,.]+') or '0.00',
        'image': response.css('img.product-image::attr(src)').get(),
        'rating': response.css('[data-rating]::attr(data-rating)').get(default='N/A'),
        'sku': response.css('[data-sku]::attr(data-sku)').get(),
        # default='' prevents an AttributeError when the breadcrumb is missing
        'category': response.css('nav.breadcrumb a:last-of-type::text').get(default='').strip(),
    }
Best practices make CSS selector usage safe, readable, and performant:
# Prefer CSS over XPath for most tasks — shorter, easier to read/write/debug, and fast in Scrapy.
# Use ::text and ::attr(name) pseudo-elements — cleaner than the XPath equivalents for text and attributes.
# Modern tip: use Polars for large-scale post-processing — e.g. pl.from_pandas(df) after Scrapy item export for fast cleaning/aggregation.
# Add type hints — response: scrapy.http.TextResponse — improves spider clarity.
# Use relative selectors — narrow scope first (container = response.css('.container'), then container.css('p')) — to avoid repeated full-tree searches.
# Handle missing data — .get(default='') returns a fallback instead of None; .getall() already returns an empty list when nothing matches.
# Test selectors in Scrapy Shell — scrapy shell 'url' — to iterate quickly.
# Combine with scrapy.linkextractors.LinkExtractor for smart link following.
# Respect robots.txt and rate limit — Scrapy's built-in settings (ROBOTSTXT_OBEY, DOWNLOAD_DELAY, AutoThrottle) make this straightforward.
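The politeness advice above maps to a few lines of project configuration. A minimal sketch of a `settings.py` fragment using Scrapy's standard settings (values are illustrative, not recommendations for any particular site):

```python
# settings.py — politeness settings, a minimal sketch
ROBOTSTXT_OBEY = True                # honor robots.txt before crawling
DOWNLOAD_DELAY = 0.5                 # base delay (seconds) between requests
AUTOTHROTTLE_ENABLED = True          # adapt the delay to server response times
CONCURRENT_REQUESTS_PER_DOMAIN = 8   # cap parallelism per domain
```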
CSS selectors in Scrapy offer concise, powerful querying — tags, classes, IDs, attributes, pseudo-elements, and combinators for clean extraction. In 2026, prefer CSS for readability and speed, use ::attr() for attributes and ::text for content, favor relative selectors, and integrate with Polars at scale. Master CSS selectors, and you'll write fast, maintainable spiders that extract accurate data from any site structure.
Next time you need to select elements in Scrapy — reach for CSS selectors. It’s Python’s cleanest way to say: “Find and extract exactly what I need using familiar CSS syntax.”