Merged
33 changes: 0 additions & 33 deletions .github/workflows/lint.yml

This file was deleted.

30 changes: 0 additions & 30 deletions .github/workflows/publish.yml

This file was deleted.

1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@
# Archived SDK versions and reference implementations
archive/
tests

# Byte-compiled / optimized / DLL files
__pycache__/
62 changes: 62 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,67 @@
# Bright Data Python SDK Changelog

## Version 2.2.1 - 100 Datasets API

### ✨ New Features

#### Expanded Datasets Coverage
Added 92 new dataset integrations, bringing the total to **100 datasets**:

- **Luxury Brands**: Loewe, Berluti, Moynat, Hermes, Delvaux, Prada, Montblanc, YSL, Dior, Balenciaga, Bottega Veneta, Celine, Chanel, Fendi
- **E-commerce**: Amazon (Reviews, Sellers), Walmart, Shopee, Lazada, Zalando, Sephora, Zara, Mango, Massimo Dutti, Asos, Shein, Ikea, H&M, Lego, Mouser, Digikey
- **Social Media**: Instagram (Profiles, Posts), TikTok, Pinterest (Posts, Profiles), YouTube (Profiles, Videos, Comments), Facebook Pages Posts
- **Real Estate**: Zillow, Airbnb, Australia Real Estate, Otodom Poland, Zonaprop Argentina, Metrocuadrado, Infocasas Uruguay, Properati, Toctoc, Inmuebles24 Mexico, Yapo Chile
- **Business Data**: Glassdoor (Companies, Reviews, Jobs), Indeed (Companies, Jobs), ZoomInfo, PitchBook, G2, Trustpilot, TrustRadius, Owler, Slintel, Manta, VentureRadar, Companies Enriched, Employees Enriched
- **Other**: World Zipcodes, US Lawyers, Google Maps Reviews, Yelp, Xing Profiles, OLX Brazil, Webmotors Brasil, Chileautos, LinkedIn Jobs

#### SERP Pagination Support
Added sequential querying to retrieve more than 10 search results from Google:

```python
async with BrightDataClient() as client:
    # Get up to 50 results with automatic pagination
    results = await client.search.google(
        query="python programming",
        num_results=50  # Fetches multiple pages sequentially
    )
```
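
To make "sequential querying" concrete, here is a minimal sketch of the general pattern: request one page at a time and stop once enough results are collected or a page comes back empty. It is not the SDK's implementation. `fetch_paginated`, `fake_fetch_page`, and the page size of 10 are hypothetical names and values chosen for illustration.

```python
import asyncio
from typing import Awaitable, Callable

PAGE_SIZE = 10  # assumption: one results page holds 10 items


async def fetch_paginated(
    fetch_page: Callable[[str, int], Awaitable[list]],
    query: str,
    num_results: int,
) -> list:
    """Request pages one after another until enough results are collected."""
    results: list = []
    start = 0
    while len(results) < num_results:
        page = await fetch_page(query, start)
        if not page:  # the source has no more results
            break
        results.extend(page)
        start += PAGE_SIZE
    return results[:num_results]


async def fake_fetch_page(query: str, start: int) -> list:
    """Hypothetical stand-in for a real SERP request; serves 35 fake results."""
    total = 35
    stop = min(start + PAGE_SIZE, total)
    return [{"rank": i + 1, "query": query} for i in range(start, stop)]


results = asyncio.run(fetch_paginated(fake_fetch_page, "python programming", 50))
print(len(results))
```

Because the pages are awaited one after another, latency grows roughly linearly with `num_results`; the trade-off is that each request can depend on the offset reached by the previous one.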

---

## Version 2.2.0 - Datasets API

### ✨ New Features

#### Datasets API
Access Bright Data's pre-collected datasets with filtering and export capabilities.

```python
async with BrightDataClient() as client:
    # Filter dataset records
    snapshot_id = await client.datasets.amazon_products(
        filter={"name": "rating", "operator": ">=", "value": 4.5},
        records_limit=100
    )
    # Download results
    data = await client.datasets.amazon_products.download(snapshot_id)
```
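
As a rough illustration of what a `{"name", "operator", "value"}` filter selects, the sketch below applies one to a few in-memory records. Filtering in the Datasets API happens server-side, so this only mirrors the apparent semantics. The `OPERATORS` table, the `matches` helper, the sample records, and the case-insensitive reading of `"includes"` are all assumptions, not documented behavior.

```python
import operator
from typing import Any, Callable, Dict

# Assumed operator semantics, for illustration only.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    ">=": operator.ge,
    # Assumption: "includes" is a case-insensitive substring match.
    "includes": lambda field, value: str(value).lower() in str(field).lower(),
}


def matches(record: dict, flt: dict) -> bool:
    """Return True if the record satisfies a single name/operator/value filter."""
    if flt["name"] not in record:
        return False
    compare = OPERATORS[flt["operator"]]
    return compare(record[flt["name"]], flt["value"])


# Made-up sample records, not real dataset output.
records = [
    {"title": "Widget A", "rating": 4.7},
    {"title": "Widget B", "rating": 4.2},
    {"title": "Widget C", "rating": 4.5},
]
flt = {"name": "rating", "operator": ">=", "value": 4.5}
selected = [r["title"] for r in records if matches(r, flt)]
print(selected)
```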

**8 Datasets:** LinkedIn Profiles, LinkedIn Companies, Amazon Products, Crunchbase Companies, IMDB Movies, NBA Players Stats, Goodreads Books, World Population

**Export Utilities:**
```python
from brightdata.datasets import export_json, export_csv
export_json(data, "results.json")
export_csv(data, "results.csv")
```

### 📓 Notebooks
- `notebooks/datasets/linkedin/linkedin.ipynb` - LinkedIn datasets (profiles & companies)
- `notebooks/datasets/amazon/amazon.ipynb` - Amazon products dataset
- `notebooks/datasets/crunchbase/crunchbase.ipynb` - Crunchbase companies dataset

---

## Version 2.1.2 - Web Scrapers & Notebooks

### 🐛 Bug Fixes
1 change: 0 additions & 1 deletion LICENSE
@@ -19,4 +19,3 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

1 change: 0 additions & 1 deletion MANIFEST.in
@@ -4,4 +4,3 @@ include CHANGELOG.md
include pyproject.toml
recursive-include src *.py
recursive-include src *.typed

51 changes: 50 additions & 1 deletion README.md
@@ -1,6 +1,6 @@
# Bright Data Python SDK

-The official Python SDK for [Bright Data](https://brightdata.com) APIs. Scrape any website, get SERP results, bypass bot detection and CAPTCHAs.
+The official Python SDK for [Bright Data](https://brightdata.com) APIs. Scrape any website, get SERP results, bypass bot detection and CAPTCHAs, and access 100+ ready-made datasets.

[![Python](https://img.shields.io/badge/python-3.9%2B-blue)](https://www.python.org/)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
@@ -135,6 +135,55 @@ async with BrightDataClient() as client:
- `client.scrape.instagram` - profiles, posts, comments, reels
- `client.scrape.facebook` - posts, comments, reels

## Datasets API

Access 100+ ready-made datasets from Bright Data — pre-collected, structured data from popular platforms.

```python
async with BrightDataClient() as client:
    # Filter a dataset (returns a snapshot_id)
    snapshot_id = await client.datasets.imdb_movies(
        filter={"name": "title", "operator": "includes", "value": "black"},
        records_limit=5
    )

    # Download when ready (polls until snapshot is complete)
    data = await client.datasets.imdb_movies.download(snapshot_id)
    print(f"Got {len(data)} records")

    # Quick sample: .sample() auto-discovers fields, no filter needed
    # Works on any dataset
    snapshot_id = await client.datasets.imdb_movies.sample(records_limit=5)
```
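
The "polls until snapshot is complete" behavior of `download()` can be pictured as a loop like the one below. This is a generic sketch of the polling pattern, not the SDK's code: `wait_until_ready`, `get_status`, the status strings `"running"`, `"ready"`, and `"failed"`, and the interval and timeout defaults are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_until_ready(
    get_status: Callable[[str], Awaitable[str]],
    snapshot_id: str,
    poll_interval: float = 5.0,
    timeout: float = 300.0,
) -> None:
    """Poll get_status until it reports "ready"; raise on failure or timeout."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await get_status(snapshot_id)
        if status == "ready":
            return
        if status == "failed":
            raise RuntimeError(f"snapshot {snapshot_id} failed")
        # Give up if the next sleep would run past the deadline.
        if loop.time() + poll_interval > deadline:
            raise TimeoutError(f"snapshot {snapshot_id} not ready after {timeout}s")
        await asyncio.sleep(poll_interval)


def make_fake_status(ready_after: int):
    """Hypothetical status source that reports "ready" on the Nth call."""
    calls = {"n": 0}

    async def get_status(snapshot_id: str) -> str:
        calls["n"] += 1
        return "ready" if calls["n"] >= ready_after else "running"

    return get_status, calls


status_fn, calls = make_fake_status(ready_after=3)
asyncio.run(wait_until_ready(status_fn, "snap_123", poll_interval=0.01))
print(calls["n"])
```

A fixed interval keeps the sketch short; a longer-running job might prefer an interval that grows between attempts.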

**Export results to file:**

```python
from brightdata.datasets import export

export(data, "results.json") # JSON
export(data, "results.csv") # CSV
export(data, "results.jsonl") # JSONL
```
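
A single `export()` call that picks the format from the file name could be built with the standard library roughly as follows. This is a sketch under assumptions, not the SDK's `export`: the name `export_records`, the union-of-keys CSV header, and the sample rows are illustrative choices.

```python
import csv
import json
import tempfile
from pathlib import Path


def export_records(records: list, path: str) -> Path:
    """Write a list of dict records to path, choosing the format by extension."""
    target = Path(path)
    suffix = target.suffix.lower()
    if suffix == ".json":
        text = json.dumps(records, indent=2, ensure_ascii=False)
        target.write_text(text, encoding="utf-8")
    elif suffix == ".jsonl":
        with target.open("w", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
    elif suffix == ".csv":
        # The union of keys across records, in first-seen order, is the header.
        fieldnames = list(dict.fromkeys(k for record in records for k in record))
        with target.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(records)  # missing keys become empty cells
    else:
        raise ValueError(f"unsupported export format: {suffix!r}")
    return target


# Made-up sample rows, written to a temporary directory.
rows = [
    {"title": "Black Swan", "year": 2010},
    {"title": "Men in Black", "rating": 7.3},
]
out_dir = Path(tempfile.mkdtemp())
for name in ("results.json", "results.csv", "results.jsonl"):
    print(export_records(rows, str(out_dir / name)).name)
```

Records in a dataset do not always share the same keys, which is why the CSV branch builds its header from every record rather than from the first one alone.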

**Available dataset categories:**
- **E-commerce:** Amazon, Walmart, Shopee, Lazada, Zalando, Zara, H&M, Shein, IKEA, Sephora, and more
- **Business intelligence:** ZoomInfo, PitchBook, Owler, Slintel, VentureRadar, Manta
- **Jobs & HR:** Glassdoor (companies, reviews, jobs), Indeed (companies, jobs), Xing
- **Reviews:** Google Maps, Yelp, G2, Trustpilot, TrustRadius
- **Social media:** Pinterest (posts, profiles), Facebook Pages
- **Real estate:** Zillow, Airbnb, and 8+ regional platforms
- **Luxury brands:** Chanel, Dior, Prada, Balenciaga, Hermes, YSL, and more
- **Entertainment:** IMDB, NBA, Goodreads

**Discover available fields:**

```python
metadata = await client.datasets.imdb_movies.get_metadata()
for name, field in metadata.fields.items():
    print(f"{name}: {field.type}")
```

## Async Usage

Run multiple requests concurrently: