3 - File FORMAT and COMPRESSION....
Right, getting down to some nitty-gritty here. If I open a picture in my favourite photo editor and then 'Save As', it gives me a drop-down list of possible 'File Types' that I can save it as: .tif, .gif, .bmp, .jpg, to mention four you might come across. There are a lot of other such 'formats' that are more or less common. However, .jpg, or J-Peg, is the one most camera makers have standardised on for consumer cameras, and the format most cameras will default to saving photos in.
I just described two pictures in words, right? And I'm typing this in 'English'... because I am English and I speak the English language and I write in the English language.... because, well, I'm useless at languages really! I can just about order a cup of coffee in French... and a beer in Spanish.... but more than that? Not a hope! BUT... if I was so gifted, I could translate those descriptions into other languages; French, German, Spanish, Dutch, I don't know, Swahili perhaps, Ancient Greek or even Latin.
And each of those different languages would probably take a different number of words, and certainly a different number of characters, to say the same thing... more or less.
So, the various different file formats used to describe an image are a similar thing. Different languages, or conventions, for describing the picture, and they each have their own advantages and disadvantages in how they do it.
And a big part of this is COMPRESSION.
I used the analogy earlier of a ton of feathers and a ton of lead; both weigh the same amount, but the feathers would take up a lot more space.... only you can 'squash' feathers.. You can squash lead actually, but it tends to take a bit more brute force, and all you are likely to achieve is to change the lead's shape... bit like squashing plasticine... it'll just change shape, not volume.
Feathers, though. Nice and squashy, and you could pack a load into a plastic bag, then suck all the air out with a vacuum cleaner, and your ton of feathers would still weigh a ton.... but instead of filling a rather large garage... they would probably pack down enough to fit in the back of a small van.
Computer Data is similar; and depending on the format used, more or less compression is possible, to squash down the FILE SIZE.
Let's go back to that grid at the beginning...
We have 60 squares across, and 40 squares down, 2,400 'pixels' in all.
Now, the blank grid is nice and easy to describe. ALL the pixels are 'white'. ah yes. 'white'... what is 'white'?
Well, remembering my O-Level physics a long long time ago, 'white' is ALL the colours of the rainbow, jumbled up and seen together. Richard Of York Was Found in a Carpark.... Red, Orange, Yellow, W..... err... no.... that's not right
Need to keep this simple! Back to my primary school art teacher, who gave me a little palette of just five paints, Red, Blue & Green for 'colours' and Black and White for 'shades', and told me they were the 'Primary' colours and I could mix ANY colour from them.... actually... she gave me Red, Yellow and Blue... but it's close enough!
Right; three colours, and mix them in the right proportion and you can make any other colour you want. Pale blue? Just a little bit of blue, nothing else. Bright red? Lots and lots of red and nothing else. White? Nothing at all, just the bare white paper. Black? All of all of them.
Getting a bit scientific then, we can define a colour by saying how much of each primary colour is needed to make it.
So we have a scale, 0% to 100%, for each, Red, Blue and Green.
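If we have 0% Red, 0% Blue, 0% Green, we get black. If we have 100% Red, 100% Blue, 100% Green, we get white. (That is the opposite way round to paint on paper, because a screen mixes light rather than paint: no light at all is black, and all three colours at full strength make white.)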
So we need three values between 0 and 100 to define the colour and brightness of each pixel.... our grid has 2,400 of them... let's try and 'code' the image?
Well, first of all we need to say what the dimensions are, so 60 pixels wide by 40 pixels tall, then we need to explain how we are going to describe them? Battleships grid? So rows denoted by numbers, columns by letters, A1 in the top right hand corner.... And so we begin
A1= 100R100B100G
B1 = 100R100B100G
C1 = I'm getting bored of this..... they are all the same, aren't they?
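For anyone who would rather let the computer do the boring bit, here is a rough Python sketch of that long-hand listing for the blank grid. It is only an illustration of the idea, not how any real file format writes its data. The names `column_label` and `encode_longhand` are made up for this example, and carrying on the column letters past Z as AA, AB and so on, spreadsheet style, is my own assumption.

```python
from string import ascii_uppercase

WIDTH, HEIGHT = 60, 40
WHITE = (100, 100, 100)  # percent red, blue, green, in the order used in the listing above


def column_label(index):
    """Turn 0, 1, 2... into A, B, C... then AA, AB... (assumed spreadsheet-style lettering)."""
    label = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        label = ascii_uppercase[remainder] + label
    return label


def encode_longhand(width, height, colour):
    """Write out one 'A1=100R100B100G' style entry for every pixel, row by row."""
    red, blue, green = colour
    entries = []
    for row in range(1, height + 1):
        for col in range(width):
            entries.append(f"{column_label(col)}{row}={red}R{blue}B{green}G")
    return entries


entries = encode_longhand(WIDTH, HEIGHT, WHITE)
print(entries[0])                      # the first pixel's entry
print(len(entries))                    # one entry per pixel
print(sum(len(e) for e in entries))    # total characters needed, long hand
```

Every one of those entries says exactly the same thing after the equals sign, which is the whole point: long hand is simple, but it is an awful lot of writing.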
Let's look at something a bit more like a real picture, and have a think about this....
It's the Lego folk dancers again. Let's have a crack at coding it, using this Battleships grid method.
A1=66R87G91B A2=68R89G93B A3=67R90G93B A4=69R91G93B
B1=68R89G93B B2=67R88G91B B3=70R90G95B B4=71R91G95B
C1=65R85G97B C2=71R91G93B C3=69R90G93B C4=71R92G95B
D1=70R91G93B D2=70R90G93B D3=71R91G93B D4=71R92G95B
Right... I have only done the top right hand corner, which is mostly blue sky, so the values are all very similar, but you get the idea; I could work right the way across and down the picture, and assign every square a Red-Green-Blue colour value.
Lots of characters in there, aren't there? But, in the 'heavier' file formats like tiff, or RAW, this is the sort of laborious long-hand way the data that describes the picture is coded.... I've only done 16 pixels out of two and a half thousand, imagine how much it would take to make the 'proper' one.
That's the 600x400 resolution image, that's got 240,000 pixels needing their three value description. The 'full-size' image I shot on my camera is 6000x4000 and needs 24 million three value descriptions to describe all its pixels!
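To put some rough numbers on that, here is a small Python sketch of the sums. The big assumption, and it is mine rather than anything a particular format promises, is that each of the three colour values is stored in one byte; `raw_size` is just a made-up helper name.

```python
VALUES_PER_PIXEL = 3   # red, green, blue
BYTES_PER_VALUE = 1    # assumption: one byte per colour value


def raw_size(width, height):
    """Return (pixel count, long-hand size in bytes) for an image of this size."""
    pixels = width * height
    return pixels, pixels * VALUES_PER_PIXEL * BYTES_PER_VALUE


sizes = {
    "little grid": raw_size(60, 40),
    "600x400 image": raw_size(600, 400),
    "6000x4000 image": raw_size(6000, 4000),
}

for name, (pixels, size_in_bytes) in sizes.items():
    print(f"{name}: {pixels:,} pixels, {size_in_bytes:,} bytes long hand")
```

On that assumption the full-size picture comes out at 72 million bytes before any header information is added, which is why nobody wants to e-mail one written out long hand.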
There has to be a shorter way of doing this, especially as a LOT of pixels are going to have exactly the same value? And you are right. This is the 'Palette Colour' method of doing it.
We started with two data-sets to describe the picture. First, the 'Header' information is the one that defines the width and height of the image, then says that we are going to describe each pixel, ordered in a Battleships grid array, by a three value colour-code. Then we have the 'Data-Set' that defines the colour codes for each pixel in turn.
Now, with a lot of pixels the same colour, what we could do, to short-hand the millions and millions of possible colour codes we could have, is put a 'palette' into the header data. We say "AA = 66R87G91B" and "AB = 68R89G93B" etc., and our data array then becomes a lot smaller.
A1=AA A2=AB A3=AC A4=AD
B1=AB B2=AE B3=AF B4=AG
C1=AH C2=AI C3=AJ C4=AK
D1=AL D2=AM D3=AI D4=AK
And provided that you have a lot of repeated values, enough that the look-up table costs you less extra data than you save in the pixel data, it can dramatically reduce the amount of raw code or data needed to describe the picture.
BUT, it means that the computer has to keep going and looking at the look-up table to find out what the actual values it needs are for each pixel, rather than having them immediately to hand.
Which means that while you need less 'space' to store the image file, it gives the device that has to write it, the camera, or whatever is going to read it, and display it on a screen more 'processing' to do on it.
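Here is a minimal Python sketch of the palette idea, using the sixteen sky pixels listed earlier. It is only meant to show the principle; real formats that use a palette organise things in their own way. The function names and the two-letter codes handed out in order (AA, AB, AC...) are my own inventions for the example.

```python
from itertools import product
from string import ascii_uppercase

# The sixteen (red, green, blue) values from the corner of the picture, row by row.
PIXELS = [
    [(66, 87, 91), (68, 89, 93), (67, 90, 93), (69, 91, 93)],
    [(68, 89, 93), (67, 88, 91), (70, 90, 95), (71, 91, 95)],
    [(65, 85, 97), (71, 91, 93), (69, 90, 93), (71, 92, 95)],
    [(70, 91, 93), (70, 90, 93), (71, 91, 93), (71, 92, 95)],
]


def build_palette(rows):
    """Give each distinct colour a two-letter code, in the order it is first seen."""
    codes = ("".join(pair) for pair in product(ascii_uppercase, repeat=2))
    palette = {}
    for row in rows:
        for colour in row:
            if colour not in palette:
                palette[colour] = next(codes)
    return palette


def encode(rows, palette):
    """Replace every colour with its short palette code."""
    return [[palette[colour] for colour in row] for row in rows]


def decode(coded_rows, palette):
    """The extra work for the reader: look every code back up in the table."""
    lookup = {code: colour for colour, code in palette.items()}
    return [[lookup[code] for code in row] for row in coded_rows]


palette = build_palette(PIXELS)
coded = encode(PIXELS, palette)
restored = decode(coded, palette)

for row in coded:
    print(" ".join(row))
print(len(palette), "palette entries for", sum(len(row) for row in PIXELS), "pixels")
```

Any colour that turns up more than once shares a single palette entry, and `decode` is the bit of extra 'processing' the display device has to do to get the real values back.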
And while I have used two methods of encoding to explain it, there are endless variations and permutations on how it might actually be done.
But this is PRIMARY compression that is contained in the actual 'format' for encoding a picture.
As said, the 'heavier' file formats, the ones that make the largest file-size for any given image size, tend to write their code out in long hand, which reduces the processing demands of reading and writing the file, making it easier to manage in the camera when it is created, then on anything that displays it, or in an editor if you are trying to make any changes.
The 'lighter' formats, that make smaller file sizes for any given image size, make the camera or display devices or editing software do more work, but the storage size is smaller, so they take up less storage space and if you want to transport them; upload them to the web, send them via e-mail or anything like that, can be moved more easily.
And that is really as far as I want to take it, without getting into the debates on the merits of different file formats, other than to explain a feature of J-Peg that you may have come across. 'Compression Rate'.