This talk covers the distributed architecture that Skyscanner built to solve the data challenges involved in generating images of all the hotels in the world. We put together a distributed system in Python, based on queues and running on the AWS cloud.
Our goal? To build an incremental image processing pipeline that discards poor-quality and duplicate images, then scales the remaining images to several sizes optimised for mobile devices.
Among the challenges:
Among the tools we used? Pillow, ImageHash, Kombu and Boto.
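To make the filter, deduplicate and resize steps concrete, here is a minimal standard-library sketch. It is not Skyscanner's implementation. It assumes that "poor quality" means "below a minimum width", that duplicates are detected with a simple average hash compared by Hamming distance (the kind of perceptual hash ImageHash provides), and that each image arrives as its original dimensions plus a tiny grayscale thumbnail grid. In a real pipeline Pillow would do the decoding and resizing. All names, thresholds and target widths below are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical input record: (original width, original height, small grayscale grid 0..255)
Pixels = Sequence[Sequence[int]]
ImageInfo = Tuple[int, int, Pixels]


def average_hash(pixels: Pixels) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the grid mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def target_sizes(width: int, height: int, widths: Sequence[int]) -> List[Tuple[int, int]]:
    """Output sizes for each target width, keeping aspect ratio and never upscaling."""
    return [(w, max(1, round(height * w / width))) for w in widths if w <= width]


def process(
    images: Dict[str, ImageInfo],
    min_width: int = 640,          # assumed quality threshold
    max_distance: int = 2,         # assumed duplicate tolerance, in bits
    widths: Sequence[int] = (320, 640, 1280),  # assumed target widths
) -> Dict[str, List[Tuple[int, int]]]:
    """Return the sizes to generate for every image that survives the filters."""
    kept_hashes: List[int] = []
    output: Dict[str, List[Tuple[int, int]]] = {}
    for name, (width, height, thumb) in images.items():
        if width < min_width:
            continue  # discard: too small to be useful
        image_hash = average_hash(thumb)
        if any(hamming(image_hash, seen) <= max_distance for seen in kept_hashes):
            continue  # discard: near-duplicate of an image already kept
        kept_hashes.append(image_hash)
        output[name] = target_sizes(width, height, widths)
    return output
```

Because each image is judged only against the hashes already kept, new images can be fed in incrementally without reprocessing earlier ones, provided the kept hashes are stored between runs.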