Cruise uses Apache Beam to manage and process petabytes of data every month, work that is essential to training our autonomous vehicle models. This talk covers the features we’ve built to extend Beam’s capabilities: a control plane for quota and user management, a C++ sandbox for running AV ROS nodes in the cloud, and shuffle optimizations that compress shuffled data.