Recently, US food delivery giant DoorDash launched an app called "Tasks," which allows its delivery drivers to earn extra money by taking short videos or photos of street views, delivery scenes, or other images after each delivery.
DoorDash said that the purpose of Tasks is to help merchants gain more authentic offline insights while enabling AI and robotic systems to better understand the real world.
In other words, where delivery drivers once simply delivered meals, with the introduction of Tasks they are now also responsible for submitting training data. This appears to shift the direction of AI training, moving it from internet text and images into the physical world itself. But it raises a question: with so many ways to collect real-world data, why use delivery drivers?
The Tasks app is not simply an extra step added to the existing delivery process; it is an entirely independent system. In addition to accepting delivery orders, drivers can take on sporadic tasks like photographing dishes, capturing hotel entrances, recording daily actions, or taping foreign-language conversations. Although DoorDash promises to pay drivers for each completed task, the side gig is not as easy as one might think.
According to the Los Angeles Times, Tasks include not just collecting merchant information but also recording natural Spanish conversations, filming oneself washing dishes, folding laundry, loading a dishwasher, and even handling on-site tasks related to autonomous vehicles. In other words, to qualify for payment, drivers must do quite a bit beyond simply delivering food.
It is clear that DoorDash is not just looking for simple images or video clips; it wants standardized, reusable, real-world audio-visual samples. Notably, the company emphasizes that it has over 8 million "Dashers" covering every corner of every city. To put it bluntly, DoorDash is using Tasks to announce to the market that it possesses AI training data collection capabilities.
Of course, the most pressing question is: how much do drivers actually earn from these tasks?
DoorDash has been vague on this point. WIRED, a US magazine, tested the app and found a telling data point: a task for filming a laundry process was advertised at US$15 per hour for a maximum of 20 minutes, yet the platform's own estimate put the actual payout at just US$0.37. For tasks like scanning shelves, the page showed a flat US$16 payout.
The platform does provide drivers with extra income, but the rates are inconsistent, and the value of tasks varies. So why is DoorDash suddenly willing to pay for these seemingly mundane activities?
The answer is simple: DoorDash, along with its partners, uses this audio-visual data to train AI models. More tellingly, in March of this year, DoorDash officially launched its proprietary delivery robot, Dot, which currently operates only in select cities and regions.
For DoorDash, the goal is to capture the hardest-to-solve "long-tail scenarios" in AI model training. Beyond standardized data derived from text, images, and code, models also need real-world situational data that works outside the lab. These are exactly the odd requirements found in Tasks: for example, filming obstructed storefront signs, temporary entrance changes, or disorganized items on shelves.
In the past, large model training relied mainly on internet text, images, code, and public videos, plus post-processing like human labeling and preference ranking. The core aim was to teach models to "see" and "speak." As early as the InstructGPT paper, OpenAI explained that even very large language models still require human feedback for fine-tuning. But with the rise of multimodal systems and robotics, that kind of data is no longer sufficient.
Multimodal and embodied AI need more than just cognitive knowledge—what a cup is or what a street sign looks like. They require concrete physical-world experience, such as which angle to approach a doorway, how to grasp objects of different materials, or what information most affects pathfinding on an unfamiliar street.
That is why DoorDash's Tasks, which looks like odd jobs for drivers, essentially functions as a low-cost data collection pipeline. Compared to traditional labelers sitting at computers, delivery drivers have one major advantage: they are already immersed in these complex scenarios, entering different stores, neighborhoods, office buildings, and hotels every day, generating authentic, usable training data.
On a deeper level, this effort also advances embodied AI. When DeepMind released Gemini Robotics in 2025, it noted that in the physical world, models must simultaneously handle perception, spatial understanding, state estimation, planning, and control—far more complex than generating text on a screen. Recent robotic models from companies like Google have focused on solving how to make robots not just repeat fixed actions but truly understand real-world scenes and human instructions, then perform corresponding operations. To achieve that, models need both internet-derived knowledge and vast amounts of real-world operational data.
Of course, delivery drivers are well-suited for inclusion in AI training not only because they are cost-effective but also because models moving into the real world most need this kind of "human-like," in-the-field operation. Even teams of professional engineers hired to collect real-world data would be unlikely to do the job better than drivers who are already on the ground.
DoorDash's Tasks program is a tool that helps companies quickly gather "ground-truth information" and stockpile foundational data that helps AI and robots better understand the real world.
That said, automated delivery is no longer a new concept.
In China, Meituan has already deployed autonomous delivery vehicles and drones in real delivery scenarios. According to Meituan's data, by the end of 2024, its autonomous vehicles had completed nearly 5 million orders, with 99% of mileage driven autonomously, reducing drivers' travel by over 2.4 million kilometers; its drones had completed over 450,000 orders.
This data shows that, at least in stable-route scenarios like campuses, residential areas, and airports, autonomous delivery has reached an acceptable level of efficiency.
In overseas markets, Serve Robotics announced in March of this year a partnership with White Castle to launch robot delivery via Uber Eats. Serve has already deployed robots in multiple US cities, with a stated goal of fielding 2,000 robots by the end of 2025. Meanwhile, Starship Technologies' autonomous delivery robots have completed over 9 million deliveries.
That is why DoorDash's move to have drivers "feed" data to AI is so telling. In publicized cases, DoorDash's own Dot delivery robot already achieves high success rates in automated delivery, but the scenarios remain relatively limited. The hardest challenges for robots remain seemingly trivial but messy edge cases: blocked storefront signs, temporary changes to community entrances, or incorrect pickup points.
The cleverness of DoorDash's move lies here: it keeps drivers delivering food while incentivizing them to collect AI training data, and it simultaneously uses that data for deep training to prepare for its coming autonomous delivery robots. In the short term, however, delivery drivers remain an indispensable part of this process, and their work is difficult to replace with automation.
Technological progress deserves recognition. But food delivery and instant logistics have never been just about "moving goods"; they involve customer communication, judgment, and handling complex situations. For a considerable time to come, delivery will still rely on humans as the fallback.
(Source: Leikeji)