Scraping Chinese B2B wholesale platforms with Crawlee — lessons from building a Yiwugo.com scraper #1737
wfgsss started this conversation in Show and tell
Hey Crawlee community! 👋
Wanted to share some lessons learned from building a scraper for Chinese B2B wholesale platforms using the Apify/Crawlee ecosystem.
The Challenge
Chinese e-commerce platforms like Yiwugo.com (the online portal for the Yiwu International Trade Market, the world's largest small commodities wholesale market) present some unique scraping challenges.
What Worked
CheerioCrawler for product listing pages — Most product data is server-rendered, so no need for browser automation on listing pages. This kept the scraper fast and resource-efficient.
Request queue with priority — Prioritizing product detail pages over category browsing pages improved data completeness when running with limited compute.
Custom encoding handler — Had to add a pre-processing step to detect and normalize character encoding before parsing.
Structured output with price tiers — Instead of just grabbing the headline price, extracting the full MOQ-based pricing table gives much more useful data for sourcing research.
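For anyone curious what the encoding and price-tier steps might look like, here is a minimal, dependency-free sketch. It is not the code from the actual scraper: the function names (`detectCharset`, `decodeBody`, `parsePriceTier`), the `PriceTier` shape, and the input formats (quantity text like "100-499" or "≥500", price text like "¥4.50") are all assumptions made for illustration, and Yiwugo's real markup may differ.

```typescript
// Hypothetical helpers for illustration; none of these names come from
// the Yiwugo scraper or from Crawlee itself.

/** One row of a quantity-based (MOQ) price table. */
interface PriceTier {
  minQuantity: number;
  maxQuantity: number | null; // null means "this quantity and above"
  unitPrice: number;
}

/**
 * Pick a charset label from the Content-Type header, then from a
 * <meta charset> declaration near the top of the page, then default to UTF-8.
 */
function detectCharset(contentType: string | undefined, body: Uint8Array): string {
  const fromHeader = /charset\s*=\s*["']?([\w-]+)/i.exec(contentType ?? "");
  if (fromHeader) return fromHeader[1].toLowerCase();

  // Charset declarations are plain ASCII, so mapping the first bytes one-to-one
  // to characters is enough to find them, even on a GBK or GB2312 page.
  const head = Array.from(body.subarray(0, 2048), (b) => String.fromCharCode(b)).join("");
  const fromMeta = /<meta[^>]+charset\s*=\s*["']?([\w-]+)/i.exec(head);
  return fromMeta ? fromMeta[1].toLowerCase() : "utf-8";
}

/** Decode raw response bytes to a string before handing them to an HTML parser. */
function decodeBody(contentType: string | undefined, body: Uint8Array): string {
  const charset = detectCharset(contentType, body);
  try {
    return new TextDecoder(charset).decode(body);
  } catch {
    // Unknown label, or a runtime built without that encoding: fall back to UTF-8.
    return new TextDecoder("utf-8").decode(body);
  }
}

/**
 * Turn one table row into a tier. Assumes the quantity cell holds one or two
 * integers and the price cell holds a single decimal number.
 */
function parsePriceTier(quantityText: string, priceText: string): PriceTier | null {
  const quantities = quantityText.replace(/,/g, "").match(/\d+/g);
  const price = priceText.replace(/,/g, "").match(/\d+(?:\.\d+)?/);
  if (!quantities || !price) return null;
  return {
    minQuantity: Number(quantities[0]),
    maxQuantity: quantities.length > 1 ? Number(quantities[1]) : null,
    unitPrice: Number(price[0]),
  };
}
```

With helpers like these, each product record could carry an array of `PriceTier` objects instead of a single headline price. Note that decoding GBK or GB18030 with `TextDecoder` depends on the runtime shipping those encodings, which is why the sketch falls back to UTF-8 rather than throwing.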
The Result
The scraper is live on Apify Store as Yiwugo Scraper if anyone wants to try it. It extracts product details, wholesale pricing tiers, supplier info, and images from Yiwugo.com.
Why This Matters
There's a gap in scraping tools for Chinese wholesale platforms. Alibaba.com has decent coverage, but domestic Chinese platforms like Yiwugo, 1688.com, and GlobalSources often have better prices (closer to factory-direct) and different product mixes. For anyone doing product sourcing research or building e-commerce data pipelines, these platforms are goldmines.
Happy to answer questions about scraping Chinese e-commerce sites or share more technical details!