Social Media Datasets
One-stop access to raw data from major social platforms. Our datasets cover trending TikTok videos and music, real-time interactions on Facebook and X (Twitter), and video metadata from Instagram and YouTube. They include post IDs, video links, review sentiment data, and full support for trend analysis.
Covers Major Global Sites
Strict GDPR & CCPA Compliance
JSON/CSV Format Testing Available
Flexible Pricing, Pay as You Go
Trusted by over 200 clients worldwide
Available Social Media Datasets
Data is updated daily, structured and cleaned, and supports direct integration through API or file download.
Reddit Community (Subreddit) Data
Subreddit Name, Subscribers, Description, Rules.
Reddit Comments & Conversations
Comment Body, Author, Nested Replies, Score, Timestamp.
TikTok User Comment Sentiment
Comment Text, Timestamps, Usernames, Reply structures.
TikTok Viral Music Tracks
Music IDs, Titles, Artists, Audio URLs, Usage context.
Instagram Profiles Data
Profile Name, Bio/Title, External Links, Contact Info.
Xiaohongshu Notes & Metadata
Note ID, Title, Description, Media URLs, Engagement Stats.
Facebook Comments Data
Comment ID, Text, Author Name, User ID, Creation Timestamp, Like Count.
Instagram Comments Data
Comment ID, Text, Author Username, User ID, Creation Timestamp.
Xiaohongshu Comments & Sentiment
Comment Text, Likes, IP Location, Nested Replies.
TikTok Trending Video Posts
Video Metadata, Play/Like/Share Stats, URLs, Hashtags, and more.
Instagram Posts Data
Post ID, Content, Date, URL, Engagement Metrics (Likes, Comments), Image URLs.
Xiaohongshu User Profiles
User ID, Nickname, Avatar, Xsec Token.
X Engagement Metrics
Likes, Retweets, Replies, Bookmarks, Quote Counts.
TikTok Creator Profiles
User IDs, Handles, Nicknames, Avatars from posts and comments.
Reddit Submissions & Posts
Title, Selftext, Subreddit, Author, Score, Upvote Ratio.
Facebook Posts Data
Post ID, Content, Date, URL, Engagement Metrics (Likes, Comments, Shares), Image URLs.
YouTube User Comments
Comment ID, Text, Author, Likes, Replies, Time, Sentiment data, and more.
Xiaohongshu Trending Tags
Tag ID, Tag Name, Topic Classification.
Facebook Profiles Data
Profile Name, Title, Email, Phone, Website, Address.
X (Twitter) Tweet Streams
Tweet Text, Creation Time, URL, Views, Hashtags.
YouTube Video Metadata
Video ID, Title, Description, Channel, Views, Likes, Duration, Keywords, and more.
X Multimedia Data
Image URLs, Media Type, Dimensions, Media Keys.
Available delivery methods
Maximize ROI on data investment through intelligent strategies
Incremental Update Model
Pay only for new or changed records—no need to repurchase the full database. Reduce acquisition costs with precision.
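As a rough illustration of what "new or changed records" could mean in practice, the sketch below compares two snapshots by record ID and content hash. The `post_id` key, the sample records, and the helper names are assumptions made for this example, not the provider's actual delivery mechanism.

```python
import hashlib
import json


def record_fingerprint(record: dict) -> str:
    """Return a stable hash of a record's content, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def incremental_delta(previous: list, current: list, key: str = "post_id") -> dict:
    """Split the current snapshot into new and changed records (hypothetical helper)."""
    seen = {rec[key]: record_fingerprint(rec) for rec in previous}
    new, changed = [], []
    for rec in current:
        old_fingerprint = seen.get(rec[key])
        if old_fingerprint is None:
            new.append(rec)          # ID not present in the previous snapshot
        elif old_fingerprint != record_fingerprint(rec):
            changed.append(rec)      # same ID, different content
    return {"new": new, "changed": changed}


# Made-up sample snapshots for illustration only.
previous = [
    {"post_id": "a1", "like_count": 3},
    {"post_id": "b2", "like_count": 0},
]
current = [
    {"post_id": "a1", "like_count": 5},  # like count changed
    {"post_id": "b2", "like_count": 0},  # unchanged
    {"post_id": "c3", "like_count": 1},  # new record
]

delta = incremental_delta(previous, current)
print(len(delta["new"]), "new,", len(delta["changed"]), "changed")
```

Only the records in `delta` would need to be delivered or billed under this kind of model; unchanged records are skipped.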
Multi-Source Data Bundling
Buy one or multiple datasets and unlock exclusive discounts. Get a full cross-platform view in a single purchase—better value, broader coverage.
Enterprise Volume Pricing
Built for high-volume demands. The more you buy, the lower the unit price. Deep discounts on bulk extractions and subscriptions—do more for less.
Data Cleaning & Enrichment
Receive pre-cleaned, deduplicated, and standardized data. No post-processing needed—ready for immediate business analysis, saving time and effort.
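For readers who want a concrete picture of "deduplicated and standardized", here is a minimal sketch of that kind of cleaning step. The field names (`post_id`, `content`, `created_at`) and the timestamp format are assumptions for illustration; the actual cleaning pipeline is not described on this page.

```python
from datetime import datetime


def clean_records(records):
    """Deduplicate by post_id (first occurrence wins) and normalize text and timestamps."""
    seen = set()
    cleaned = []
    for rec in records:
        post_id = rec["post_id"].strip()
        if post_id in seen:
            continue  # drop duplicate IDs
        seen.add(post_id)
        created = datetime.strptime(rec["created_at"].strip(), "%Y-%m-%d %H:%M:%S")
        cleaned.append({
            "post_id": post_id,
            "content": " ".join(rec["content"].split()),  # collapse runs of whitespace
            "created_at": created.isoformat(),             # ISO 8601 timestamp
        })
    return cleaned


# Made-up raw rows for illustration only.
raw = [
    {"post_id": " p1 ", "content": "Hello   world\n", "created_at": "2026-01-17 14:00:00"},
    {"post_id": "p1", "content": "Hello world", "created_at": "2026-01-17 14:00:00"},
    {"post_id": "p2", "content": "  Second post ", "created_at": "2026-01-18 09:30:00"},
]

cleaned = clean_records(raw)
print(len(cleaned), "records after cleaning")
```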
Facebook Posts Dataset Sample
The Facebook Posts dataset captures core content activity on the platform, including posting time, poster username, post content, multimedia links (images/videos), post permalink, and engagement metrics such as like count, comment count, and share count. This data can be used for content strategy analysis, brand sentiment analysis, trend identification, and user engagement evaluation.
| Name | Description | Type | Example |
|---|---|---|---|
| id | Unique identifier for each company | text | highgoal-capital |
| name | The name of the company | text | Highgoal Capital |
| country_code | The country where the company is located | text | GB,EE |
| locations | General information about the company's locations | array | ["London, GB", "Tallinn, EE"] |
| followers | The number of followers the company has | number | 41 |
| employees_in_linkedin | The number of employees listed on LinkedIn | number | 2 |
| about | A description or summary of the company | text | Highgoal Capital is a technology focused in... |
Can't find the dataset you need? Start a custom collection
Tell us your project requirements, and we will match you with a suitable dataset so your project gets off the ground quickly.
| Name | Description | Type | Example |
|---|---|---|---|
| Post ID | Unique identifier for the post | text | POSTID_db0ca14e9f |
| Date/Time | Timestamp of when the post was created | date | 2026-01-17 14:00:00 |
| Username | Username of the account that posted | text | A****************D |
| Post Content | The text content of the post | text | ✨MILLION DOLLAR GROUP - PACESETTER✨... |
| Post URL | Direct URL to the Facebook post | url | https://www.facebook.com/photo/?fbid=1440237371435729&set=a.503339355125540 |
| Image | List of URLs for images contained in the post | list | ["https://scontent-ord5-2.xx.fbcdn.net/v/t39.30808-6/... (masked)"] |
| like_count | Number of likes the post received | integer | 3 |
| comment_count | Number of comments on the post | integer | 0 |
| share_count | Number of shares the post received | integer | 0 |
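The fields in the table above could be loaded into a typed record along these lines. This is a sketch only: it assumes each delivered row is a dict keyed by the names in the "Name" column and that timestamps use the `YYYY-MM-DD HH:MM:SS` format shown in the example. The `FacebookPost` class and `parse_post` helper are hypothetical names, not part of any official SDK.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class FacebookPost:
    post_id: str
    created_at: datetime
    username: str
    content: str
    url: str
    images: list = field(default_factory=list)
    like_count: int = 0
    comment_count: int = 0
    share_count: int = 0

    @property
    def total_engagement(self) -> int:
        """Sum of likes, comments, and shares."""
        return self.like_count + self.comment_count + self.share_count


def parse_post(row: dict) -> FacebookPost:
    """Convert one raw row (keys assumed to match the table's Name column)."""
    return FacebookPost(
        post_id=row["Post ID"],
        created_at=datetime.strptime(row["Date/Time"], "%Y-%m-%d %H:%M:%S"),
        username=row["Username"],
        content=row["Post Content"],
        url=row["Post URL"],
        images=list(row.get("Image", [])),
        like_count=int(row["like_count"]),
        comment_count=int(row["comment_count"]),
        share_count=int(row["share_count"]),
    )


# Values copied from the sample table; the masked image URL is left out.
sample_row = {
    "Post ID": "POSTID_db0ca14e9f",
    "Date/Time": "2026-01-17 14:00:00",
    "Username": "A****************D",
    "Post Content": "✨MILLION DOLLAR GROUP - PACESETTER✨...",
    "Post URL": "https://www.facebook.com/photo/?fbid=1440237371435729&set=a.503339355125540",
    "Image": [],
    "like_count": 3,
    "comment_count": 0,
    "share_count": 0,
}

post = parse_post(sample_row)
print(post.post_id, post.created_at.isoformat(), post.total_engagement)
```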
Dataset Pricing
Buy from a large-scale provider with high ethical standards
Register now and receive a bonus on your first deposit, up to $25.
| Plan | Records | Price | Suited for |
|---|---|---|---|
| Starter Plan | Minimum 100K records | (not listed) | Small-scale validation and initial use |
| Monthly Plan | 600K records included | $840.00 | Medium-scale monthly needs |
| Semi-Annual Plan | 2.5M records included | $2,800.00 | Continuously growing data needs |
| Annual Plan | 13M records included | $10,400.00 | Long-term data solutions at large enterprises |
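To compare the plans, it can help to work out the effective price per 1,000 records. The sketch below assumes each listed price buys exactly the listed number of included records (600K for $840.00, 2.5M for $2,800.00, 13M for $10,400.00). The Starter Plan is omitted because no price is listed for it.

```python
# Plan name -> (listed price in USD, records included); pairing is an assumption
# based on how the plan cards are laid out on this page.
plans = {
    "Monthly": (840.00, 600_000),
    "Semi-Annual": (2800.00, 2_500_000),
    "Annual": (10400.00, 13_000_000),
}


def price_per_thousand(price: float, records: int) -> float:
    """Effective USD price per 1,000 records, rounded to cents."""
    return round(price / records * 1000, 2)


unit_prices = {name: price_per_thousand(price, records)
               for name, (price, records) in plans.items()}

for name, unit_price in unit_prices.items():
    print(f"{name}: ${unit_price:.2f} per 1K records")
```

Under these assumptions the unit price falls as the plan size grows, from $1.40 per 1K records on the Monthly Plan to $0.80 on the Annual Plan.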
Do you need more than 10 million records or a custom collection solution?
Instantly Empower AI Agents & LLMs
Our datasets are optimized for RAG and model fine-tuning. Clean structure, full documentation, and multi-language SDK examples let you integrate social media insights into your AI workflows.
Structured Data
Pre-formatted data ready for training and inference with ChatGPT, Claude, and other AI models.
Multi-Language Code Samples
Code snippets in Python, Java, C#, Node.js, and more. No coding from scratch—copy, paste, and build data pipelines in seconds.
Developer Documentation
Comprehensive API references and field definitions that reduce prompt engineering costs for AI-powered data understanding.
Custom Social Media Datasets Tailored to Your Needs
Easy-to-use, fully structured datasets built for diverse business scenarios.
High-Efficiency Data Extraction
Leverage clean residential proxy IPs to extract global site data in one click. 99%+ success rate, zero blocks, billion-scale collection capability.
Multiple Export Formats
Supports JSON, NDJSON, CSV, Parquet, JSON Lines, gzip compression, and more. Integrate seamlessly with your existing systems.
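As an illustration of working with two of these formats, the sketch below reads a gzip-compressed NDJSON (JSON Lines) stream and a CSV stream using only the Python standard library. The sample rows and field names are made up, and the in-memory buffers stand in for downloaded files.

```python
import csv
import gzip
import io
import json


def read_ndjson_gz(stream):
    """Yield one dict per non-empty line of a gzip-compressed NDJSON / JSON Lines stream."""
    with gzip.open(stream, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                yield json.loads(line)


def read_csv(stream):
    """Return CSV rows as dicts; note that every value arrives as a string."""
    return list(csv.DictReader(stream))


# Build a small gzip NDJSON payload in memory to stand in for a delivered file.
rows = [{"post_id": "a1", "like_count": 3}, {"post_id": "b2", "like_count": 0}]
buffer = io.BytesIO()
with gzip.open(buffer, mode="wt", encoding="utf-8") as out:
    for row in rows:
        out.write(json.dumps(row) + "\n")
buffer.seek(0)

ndjson_rows = list(read_ndjson_gz(buffer))
csv_rows = read_csv(io.StringIO("post_id,like_count\na1,3\nb2,0\n"))

print(ndjson_rows[0]["like_count"], csv_rows[0]["like_count"])
```

NDJSON keeps numeric types (`3`), while CSV yields strings (`"3"`), which is worth keeping in mind when choosing a format for downstream systems.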
Flexible Payment Models
Flexible pricing, pay as you go. Covers major global sites. Fully GDPR & CCPA compliant—your data stays secure and compliant.
Unlimited Scaling Architecture
Handle massive concurrent requests via high-throughput proxy IPs. Integrates with Snowflake, Google Cloud, SFTP, and more—peak-ready.
Significant Cost Savings
Optimized proxy rotation and data extraction cut costs by 30%+. No self-hosted infrastructure required—focus on growing your business.
Fully Managed Service
We manage the entire data pipeline—including proxy IP maintenance and monitoring. Reduce operational overhead with guaranteed 24/7 uptime.
Seamless API Integration
Simple API interface with Webhook and S3 support. Quickly connect to your own systems and pull post IDs, comments, engagement metrics, and more.
24/7 Professional Support
Dedicated team on standby for custom guidance and troubleshooting. Combined with proxy optimization for worry-free, high-efficiency data collection.
Data Quality Assurance
AI-driven validation ensures accurate, complete, deduplicated data. Real-time monitoring and reporting are included, making the data well suited to brand analysis, competitor tracking, and trend monitoring.
Popular Social Media Datasets
Facebook Dataset
Posts, comments, and profiles, covering all key data points. Supports brand sentiment analysis, competitor strategy, and engagement quantification across the web with hundreds of millions of records.
Instagram Dataset
Posts, KOL profiles, and visual content. Captures core visual trends and influencer insights for marketing matching and trend identification with tens of millions of records.
TikTok Dataset
Trending videos, creators, music. Provides watermark-free video links and download statistics—powering content discovery and short-form video algorithm research with massive real-time data.
YouTube Dataset
Video metadata, comments, channels with watch time details and user feedback. Ideal for video SEO optimization and content recommendation system training with millions of records.
X (Twitter) Dataset
Real-time tweets and media attachments. Captures global real-time conversations and retweet interactions—perfect for breaking news tracking and social sentiment analysis with hundreds of millions of tweets.
Reddit Dataset
Covers post titles, body text, nested comments, and upvote scores. Ideal for NLP sentiment analysis, opinion mining, and community-specific topic discovery—the best training material for conversational models.
Xiaohongshu (Little Red Book) Dataset
Posts with images, saves, and comment interactions. Core use cases: e-commerce trending product analysis, KOL selection, and consumer insight—help brands capture market opportunities first.
Focus on Your Core Business. Leave the Data Collection to Us.
Unlimited Web Scraping
Powered by dynamic residential IPs and intelligent unblocking. Bypass CAPTCHAs and geo-restrictions effortlessly—access data points from public web pages worldwide.
Ready-to-Use, Accurate Data
Every record goes through multi-stage validation and cleaning. Delivery-ready with no post-processing required—directly power your market analysis or AI model training.
Fully Automated Data Pipeline
Scheduled tasks and incremental updates supported. Data auto-delivers to your AWS S3 or database—zero manual intervention from start to finish.
How Companies Use Social Media Datasets
Identify High-Value Influencers
Stop wasting ad spend on ineffective influencers. Scientifically identify the most impactful creators across platforms based on real follower counts, engagement rates, and topic relevance. Leverage detailed influencer profiles and performance history for precise brand alignment and maximum campaign ROI.
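One common way to compare creators is an engagement rate, defined here as total interactions divided by follower count. The sketch below uses that definition as an assumption; the creator handles and numbers are invented for illustration and do not come from any dataset.

```python
def engagement_rate(likes: int, comments: int, shares: int, followers: int) -> float:
    """Interactions per follower; returns 0.0 when the follower count is not positive."""
    if followers <= 0:
        return 0.0
    return (likes + comments + shares) / followers


# Invented sample creators for illustration only.
creators = [
    {"handle": "creator_a", "followers": 100_000, "likes": 1500, "comments": 300, "shares": 200},
    {"handle": "creator_b", "followers": 20_000, "likes": 900, "comments": 200, "shares": 100},
    {"handle": "creator_c", "followers": 0, "likes": 10, "comments": 0, "shares": 0},
]

ranked = sorted(
    creators,
    key=lambda c: engagement_rate(c["likes"], c["comments"], c["shares"], c["followers"]),
    reverse=True,
)

for creator in ranked:
    rate = engagement_rate(creator["likes"], creator["comments"],
                           creator["shares"], creator["followers"])
    print(f"{creator['handle']}: {rate:.1%}")
```

In this made-up sample the smaller account ranks first, which shows why follower count alone is a weak proxy for impact.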
Genuine Customer Voice Research
Deep-dive into what customers really think. Analyze likes, shares, and hashtags across millions of reviews and posts to accurately capture sentiment toward your products. Spot trending shifts fast—powering data-driven product iteration.
Brand Reputation & Crisis Monitoring
Stay in control of your brand reputation. Capture positive mentions across social networks and respond swiftly to customer praise or concerns. Use data analytics to identify potential PR risks before they escalate, so brand protection is proactive and builds lasting trust.