US20240171761
2024-05-23
Electricity
H04N19/42
An end-to-end instance-separable semantic-image joint compression system is designed to improve the efficiency of image compression by utilizing the redundancy between images and their associated semantics. This system includes various components such as image and mask encoders, a union encoder, an embedding extraction module, and entropy coding modules. By allowing independent coding of instances, the system enhances the rate-distortion performance while enabling efficient retrieval of specific object semantics and images during decoding.
The proposed method addresses a limitation of existing technologies, which compress images and their semantics separately and therefore leave significant redundancy between the two. By jointly compressing images with their semantics (such as instance masks, bounding boxes, and relationships), the system reduces computational costs and improves overall compression ratios. This is particularly beneficial for applications like real-time special effects in movies or virtual group photos, where only specific parts of an image are needed.
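The joint codec system consists of several key components, including image and mask encoders, a union encoder, an embedding extraction module, and entropy coding modules.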
The encoding process starts with the extraction of semantics from the input image. The image encoder and the mask encoder produce an image representation and a mask representation, respectively, and the union encoder combines the two into a union representation. The union representation is quantized into a union embedding, which is embedded into a semantic graph. The bitstream of each component is generated by entropy coding, using probability distributions estimated by the corresponding entropy models.
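The encoding steps above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the patented implementation: the functions `image_encoder`, `mask_encoder`, `union_encoder`, `quantize`, `ideal_bits`, and `encode_instance` are hypothetical stand-ins for learned networks, the node layout is invented for illustration, and an empirical histogram takes the place of the learned entropy models.

```python
import math
from collections import Counter

def image_encoder(pixels):
    # Hypothetical stand-in for a learned image encoder: scale pixels to [0, 1].
    return [p / 255.0 for p in pixels]

def mask_encoder(mask):
    # Hypothetical stand-in for a learned mask encoder: binary mask as floats.
    return [float(m) for m in mask]

def union_encoder(image_rep, mask_rep):
    # Toy fusion (assumption): keep image features only where the mask is set.
    return [4.0 * i * m for i, m in zip(image_rep, mask_rep)]

def quantize(rep):
    # Round each value to the nearest integer symbol.
    return [round(v) for v in rep]

def ideal_bits(symbols):
    # Ideal entropy-coded length: sum of -log2 p(s) over all symbols.
    # An empirical histogram stands in for a learned entropy model.
    counts = Counter(symbols)
    total = len(symbols)
    return sum(-math.log2(counts[s] / total) for s in symbols)

def encode_instance(pixels, mask, label, bbox):
    union_rep = union_encoder(image_encoder(pixels), mask_encoder(mask))
    union_embedding = quantize(union_rep)
    # The union embedding is attached to the instance's node in the semantic graph.
    node = {"label": label, "bbox": bbox, "embedding": union_embedding}
    return node, ideal_bits(union_embedding)

node, bits = encode_instance([0, 64, 128, 255], [0, 1, 1, 1], "person", (0, 0, 2, 2))
print(node["embedding"], bits)
```

The bit count here is only the theoretical lower bound for the assumed histogram. A real entropy coder, such as an arithmetic coder driven by the learned models, would approach this bound rather than reach it exactly.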
The decoding process begins with the recovery of the semantic graph from its bitstream. Following this, the union embedding is decoded to retrieve the quantized union representation, which then allows for the recovery of mask representations. Finally, using all decoded information, the image representation is reconstructed. This systematic approach ensures that specific instances can be accessed without needing to decode all information, thereby optimizing computational resources and reducing transmission costs.
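The decoding order and the selective access it enables can be sketched as follows. This is an illustrative toy, not the patent's codec: the container layout in `bitstreams`, the helpers `pack` and `unpack`, and the function `decode_instance` are all assumptions, and zlib over JSON is used purely as a placeholder for the learned entropy coder. In the described system each stage is conditioned on the previously decoded information, whereas here the stages are simply decoded in sequence.

```python
import json
import zlib

def pack(obj):
    # Placeholder entropy coder (assumption): zlib over JSON.
    return zlib.compress(json.dumps(obj).encode("utf-8"))

def unpack(blob):
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# Hypothetical container: one graph stream plus independent streams per instance.
bitstreams = {
    "graph": pack({
        "nodes": [{"id": 0, "label": "person"}, {"id": 1, "label": "dog"}],
        "edges": [[0, "walks", 1]],
    }),
    "instances": {
        0: {"embedding": pack([0, 1, 2, 4]),
            "mask": pack([0, 1, 1, 1]),
            "image": pack([9, 64, 128, 255])},
        1: {"embedding": pack([3, 3, 0, 0]),
            "mask": pack([1, 1, 0, 0]),
            "image": pack([200, 180, 7, 7])},
    },
}

def decode_instance(streams, label):
    steps = []
    # 1. Recover the semantic graph first; it tells us which instance to fetch.
    graph = unpack(streams["graph"])
    steps.append("graph")
    node_id = next(n["id"] for n in graph["nodes"] if n["label"] == label)
    inst = streams["instances"][node_id]
    # 2. Decode the union embedding of the requested instance only.
    embedding = unpack(inst["embedding"])
    steps.append("embedding")
    # 3. Recover the mask, then 4. the image content for that instance.
    mask = unpack(inst["mask"])
    steps.append("mask")
    image = unpack(inst["image"])
    steps.append("image")
    pixels = [p if m else 0 for p, m in zip(image, mask)]
    return {"embedding": embedding, "mask": mask, "pixels": pixels}, steps

result, steps = decode_instance(bitstreams, "person")
print(steps, result["pixels"])
```

Note that the streams for the "dog" instance are never touched when only "person" is requested, which is the point of coding instances independently.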