important_ps = response.css('div.content p.important')
Attribute selectors — filter by attribute presence, exact value, starts/ends/contains — powerful for data-* attributes, href/src, aria-labels.
# Elements with href attribute
links = response.css('[href]')
# Links starting with "/product/"
product_links = response.css('a[href^="/product/"]')
# Elements with data-id="123"
specific = response.css('[data-id="123"]')
# Elements with title containing "Cool"
cool_items = response.css('[title*="Cool"]')
# Extract attribute values with ::attr()
hrefs = response.css('a::attr(href)').getall() # all hrefs
srcs = response.css('img::attr(src)').getall() # all image srcs
data_prices = response.css('[data-price]::attr(data-price)').getall()
Pseudo-elements and combinators — extract text/attributes or refine relationships.
# Text content
titles = response.css('h1::text').getall()
# Direct child
direct = response.css('div > p::text').getall()
# Adjacent sibling
adjacent = response.css('h2 + p::text').getall()
# General sibling
siblings = response.css('h2 ~ p::text').getall()
Real-world pattern: structured extraction in Scrapy spiders — CSS selectors for clean, maintainable code on modern sites with classes and data attributes.
def parse_product(self, response):
    yield {
        'name': response.css('h1.product-title::text').get(default='').strip(),
        'price': response.css('.price-amount::text').re_first(r'[\d,.]+') or '0.00',
        'image': response.css('img.product-image::attr(src)').get(),
        'rating': response.css('[data-rating]::attr(data-rating)').get(default='N/A'),
        'sku': response.css('[data-sku]::attr(data-sku)').get(),
        # default='' prevents an AttributeError when the breadcrumb is missing
        'category': response.css('nav.breadcrumb a:last-of-type::text').get(default='').strip(),
    }
Best practices make CSS selector usage safe, readable, and performant:
# Prefer CSS over XPath for most tasks — shorter, easier to read/write/debug, and fast in Scrapy.
# Use ::text and ::attr(name) pseudo-elements — cleaner than the XPath equivalents for text and attributes.
# Modern tip: use Polars for large-scale post-processing — e.g. pl.from_pandas(df) after Scrapy item export for fast cleaning/aggregation.
# Add type hints — response: scrapy.http.TextResponse — improves spider clarity.
# Use relative selectors — narrow scope first (container = response.css('.container'), then container.css('p')) — to avoid repeated full-tree searches.
# Handle missing data — .get(default='') returns a fallback instead of None; .getall() already returns an empty list when nothing matches.
# Test selectors in Scrapy Shell — scrapy shell 'url' — to iterate quickly.
# Combine with scrapy.linkextractors.LinkExtractor for smart link following.
# Respect robots.txt and rate limit — Scrapy's built-in settings (ROBOTSTXT_OBEY, DOWNLOAD_DELAY, AutoThrottle) make this straightforward.
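The politeness advice above maps to a few lines of project configuration. A minimal sketch of a `settings.py` fragment using Scrapy's standard settings (values are illustrative, not recommendations for any particular site):

```python
# settings.py — politeness settings, a minimal sketch
ROBOTSTXT_OBEY = True                # honor robots.txt before crawling
DOWNLOAD_DELAY = 0.5                 # base delay (seconds) between requests
AUTOTHROTTLE_ENABLED = True          # adapt the delay to server response times
CONCURRENT_REQUESTS_PER_DOMAIN = 8   # cap parallelism per domain
```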
CSS selectors in Scrapy offer concise, powerful querying — tags, classes, IDs, attributes, pseudo-elements, and combinators for clean extraction. In 2026, prefer CSS for readability and speed, use ::attr() for attributes and ::text for content, favor relative selectors, and integrate with Polars at scale. Master CSS selectors, and you'll write fast, maintainable spiders that extract accurate data from any site structure.
Next time you need to select elements in Scrapy — reach for CSS selectors. It’s Python’s cleanest way to say: “Find and extract exactly what I need using familiar CSS syntax.”