Data Storage in Images: PNG vs JPG

Obtaining Images from Text Data

I found an interesting fact today while implementing a small project. I was trying to store text-based information in an image by converting each character to its ASCII value (ASCII is a standard encoding that assigns each character a numeric code), to check whether the data type could be changed before file transfer.

To my surprise, I found that when I stored the data in PNG format, it took noticeably less space than the original text file. This was implemented using the following code:

import numpy as np
from PIL import Image as im

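# Read the sample text (the path comes from a Colab session; adjust as needed)
with open('/content/test.txt', 'r') as f:
    text = f.read()

# Side length of the smallest square image that can hold every character
charCount = len(text)
image_size = int(np.ceil(np.sqrt(charCount)))

# One character code per pixel; codes above 255 do not fit in uint8,
# so this assumes the text is plain ASCII
ascii_values = [ord(c) for c in text]

matrix = np.zeros((image_size, image_size), dtype=np.uint8)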

# Fill the matrix with the ASCII values, row by row
row = 0
col = 0
for value in ascii_values:
    matrix[row, col] = value
    col += 1
    if col == image_size:
        col = 0
        row += 1

# Convert the matrix to an image
image = im.fromarray(matrix)
image.save('data.png')

# Display the image
from IPython.display import Image, display
display(Image('data.png'))

For this demonstration I used my previous article on High Dimension Data Modelling as the sample text. You can pick up the sample text from here.

The sample text contained 32000+ characters. Since an 8-bit grayscale pixel holds a value in the range 0-255 and every ASCII code fits in that range, we used one grayscale pixel per character. The smallest square matrix that can hold the text therefore has side ceil(sqrt(charCount)), where charCount is the number of characters in the sample text. This came out to 181 in our case, so the output was an image of 181x181 pixels.
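The side-length calculation can be sketched on its own with integer arithmetic. This is only an illustration: image_side is a hypothetical helper name, and 32500 is a made-up character count standing in for the real one, which the article gives only as 32000+.

```python
import math


def image_side(char_count: int) -> int:
    """Return the smallest side s such that s * s >= char_count."""
    if char_count <= 0:
        return 0
    # isqrt avoids floating-point rounding for large counts
    return math.isqrt(char_count - 1) + 1


char_count = 32500                         # hypothetical count, not the real one
side = image_side(char_count)              # 181
padding = side * side - char_count         # pixels left at 0 after the text
print(side, padding)
```

Any count from 32401 to 32761 gives a side of 181, and the leftover pixels are the zero padding that shows up again when the image is decoded.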

The code above saves the output as a PNG image. Obtaining a JPG image needs no other changes: just replacing the file name data.png with data.jpg does the job.

Comparing Sizes of PNG and JPG

An interesting pattern emerged when the sizes of the generated images were compared with the original text file.

The original text file was 33KB in size, while the PNG file storing the same data came to 24KB. The JPG file went further still, at only 13KB.

Let us compare the outputs of JPG and PNG files:

(sorry for the representational constraint on the platform)

PNG Output Image:

JPG Output image:

There isn't much difference, right?

Now, isn't JPG the optimal way to store the data then? Well, no. Why?

We will see the difference in the image-to-text conversion part.

Image back to Data

Now that we have the data stored in the image, our next step is to convert the image back to textual data. This is done using the following code:

import numpy as np
from PIL import Image as im

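# Load the grayscale image and convert it to a NumPy array
image2 = im.open("data.png")
matrix2 = np.array(image2)

# Read the pixels row by row and map each value back to its character
# (image_size is the side length computed when the image was created)
text = ""
for i in range(image_size):
    for j in range(image_size):
        text += chr(matrix2[i, j])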

# Write the text to a file
with open('decrypted.txt', 'w') as f:
    f.write(text)

print("Text saved to file")

Here, we convert the image back to a NumPy matrix and map each pixel value, which holds the ASCII code of a character, back to that character.
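The same pixel-to-character mapping can also be written without explicit loops. The sketch below is one possible shortcut, not the method used above: decode_matrix is a hypothetical helper, and the tiny 2x2 matrix is made-up sample data. It relies on Latin-1 decoding, which maps each byte value to the character with the same code, just as chr does.

```python
import numpy as np


def decode_matrix(matrix: np.ndarray) -> str:
    """Flatten the matrix row by row and turn every pixel value into a character."""
    # tobytes() walks the array in row-major order, matching the nested loops
    return matrix.astype(np.uint8).tobytes().decode("latin-1")


# Made-up 2x2 example: 'H', 'i', '!' and one unused pixel left at 0
sample = np.array([[72, 105], [33, 0]], dtype=np.uint8)
decoded = decode_matrix(sample)
print(repr(decoded))
```

The unused pixel comes back as a null character at the end of the string, which is the same padding effect seen with the full image.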

The results were as follows:
- The PNG image converted back to the original text.
PNG output: Introduction High-dimensional data modeling ...
- The JPG image converted to unrecognizable text.
JPG output: DkqnhhyZxkAubj1Wpclrwgupbho]wW|i^hm]qe&^v*#_pct^qYe/nrbnz
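The two outcomes can be reproduced in memory, without writing any files. This is a minimal sketch assuming NumPy and Pillow are installed: round_trip is a hypothetical helper, the sample string is a short stand-in for the real article text, and JPEG is saved with Pillow's default quality setting.

```python
import io

import numpy as np
from PIL import Image


def round_trip(matrix: np.ndarray, fmt: str) -> np.ndarray:
    """Save the matrix as an image of the given format in memory, then load it back."""
    buf = io.BytesIO()
    Image.fromarray(matrix).save(buf, format=fmt)
    buf.seek(0)
    return np.array(Image.open(buf))


# Stand-in text, trimmed to exactly 256 bytes so it fills a 16x16 image
data = ("Introduction High-dimensional data modeling " * 6).encode("ascii")[:256]
matrix = np.frombuffer(data, dtype=np.uint8).copy().reshape(16, 16)

png_ok = bool(np.array_equal(round_trip(matrix, "PNG"), matrix))
jpg_ok = bool(np.array_equal(round_trip(matrix, "JPEG"), matrix))
print("PNG identical:", png_ok)
print("JPG identical:", jpg_ok)
```

Comparing the arrays directly, rather than the decoded text, shows that the corruption happens at the pixel level before any character conversion takes place.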

Difference between TXT, PNG and JPG

The difference in file size between the TXT file, PNG image, and JPG image is due to the different compression methods used by each format.

TXT files are uncompressed: they store the text in its original form without any encoding or compression, and each ASCII character takes a full byte. This makes the TXT file the largest of the three.

PNG files use lossless compression (DEFLATE), which means every pixel value is stored exactly and can be recovered without any change. This still achieves a significant reduction in file size compared to the TXT file, though not as much as lossy compression.

JPG files use lossy compression, which means some image information is discarded to achieve a smaller file size. This gives very small files, but at the cost of a slight loss of image quality. For a photo that loss is barely visible. Here, however, every pixel value is a character code, so even a small change in a pixel turns it into a different character.
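The effect of lossless compression on text can be seen with Python's built-in zlib module, which implements the same DEFLATE algorithm that PNG uses. This is only a rough sketch: the sample sentence is made up, and because it is repeated many times it compresses far better than real prose would, so the ratio here says nothing about the 33KB to 24KB figure above.

```python
import zlib

# Made-up, highly repetitive sample text (real prose compresses less well)
raw = ("High-dimensional data modeling deals with many features. " * 200).encode("ascii")

packed = zlib.compress(raw, 9)        # 9 = strongest compression level
restored = zlib.decompress(packed)

print(len(raw), "bytes before,", len(packed), "bytes after")
print("identical after decompression:", restored == raw)
```

The key property is the last line: the decompressed bytes match the input exactly, which is why the PNG image decodes back to the original text.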
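To see why even a slight change in pixel values ruins text, consider what happens when character codes drift by one or two. The sketch below does not run JPEG compression: perturb is a hypothetical helper and the offsets are made up, simply simulating the kind of small per-pixel error a lossy format can introduce.

```python
def perturb(text, offsets):
    """Shift each character code by the matching offset, clamped to 0-255."""
    shifted = []
    for char, offset in zip(text, offsets):
        code = max(0, min(255, ord(char) + offset))
        shifted.append(chr(code))
    return "".join(shifted)


# An error of just +1 on every pixel already produces a different word
result = perturb("data", [1, 1, 1, 1])
print(result)  # ebub
```

A shift of this size would be invisible in a grayscale image, yet the decoded text no longer resembles the input.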

Conclusion

Still, was the full data retrieved from the PNG? Yes.
Was there any error in the data? No.

If there was no issue with PNG file storage, can we store data in PNG files? Yes, we can. The only difference in the PNG output was a run of trailing null characters at the end of the text: the values were written into a zero-filled matrix, so the unfilled pixels stayed 0 and were decoded as extra characters. Can we fix this issue?

Yes. One method is to store the character count in a few pixels at the end, so that the excess pixels can be removed from the back.
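One way this could look is sketched below. The names text_to_matrix, matrix_to_text and LENGTH_BYTES are hypothetical, and reserving the last four pixels for a big-endian character count is an assumption made for this example, not a fixed convention.

```python
import numpy as np

LENGTH_BYTES = 4  # assumed: the last 4 pixels hold the character count


def text_to_matrix(text: str) -> np.ndarray:
    """Pack ASCII text into a square uint8 matrix, with its length in the last pixels."""
    data = text.encode("ascii")
    side = int(np.ceil(np.sqrt(len(data) + LENGTH_BYTES)))
    flat = np.zeros(side * side, dtype=np.uint8)
    flat[:len(data)] = np.frombuffer(data, dtype=np.uint8)
    length_field = len(data).to_bytes(LENGTH_BYTES, "big")
    flat[-LENGTH_BYTES:] = np.frombuffer(length_field, dtype=np.uint8)
    return flat.reshape(side, side)


def matrix_to_text(matrix: np.ndarray) -> str:
    """Read the stored length, then return only that many characters."""
    flat = matrix.astype(np.uint8).ravel()
    length = int.from_bytes(flat[-LENGTH_BYTES:].tobytes(), "big")
    return flat[:length].tobytes().decode("ascii")


matrix = text_to_matrix("hello world")
print(matrix.shape, repr(matrix_to_text(matrix)))
```

Because the count sits at a fixed position, the decoder does not need to guess where the text stops, and text that legitimately ends in a zero byte would still survive.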

What about after this? Can we store the data as an image once this fix is in place? Yes, we can.

Alternatively, as you may be aware, YouTube, Twitch and other such platforms offer unlimited space to store your videos, so can you guess the next step? Yes: we convert each of these images into a frame of a video and combine them to create a free drive with unlimited space.

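import cv2
import glob

# Define the video resolution
video_width = 640   # every image must have exactly this width
video_height = 480  # every image must have exactly this height

fps = 30

# Get a list of all the image files in the current directory, sorted by name
image_paths = sorted(glob.glob('*.png'))

# Create a VideoWriter object; 'mp4v' is a FourCC commonly used with .mp4 files.
# Note that this codec is lossy, so pixel values are not preserved exactly.
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
video = cv2.VideoWriter('video.mp4', fourcc, fps, (video_width, video_height))

# Loop through each image file and add it to the video
for path in image_paths:
    frame = cv2.imread(path)  # loaded as a 3-channel BGR frame
    # Frames of the wrong size are not written correctly, so fail loudly instead
    if frame is None or frame.shape[:2] != (video_height, video_width):
        raise ValueError(f'{path} is unreadable or is not {video_width}x{video_height}')
    video.write(frame)

video.release()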

Warning: It is advised not to store any important data on servers like YouTube. If they apply compression or super-resolution to a video before storing it, the data may get corrupted, as with JPG compression, and thus be lost. It is better to either keep the data on your own local system or in a reliable video cloud storage.