To answer your question, this is how you know if a photo has been altered:
Meta properties consist of information about a file. These are typically the type of file (e.g., PNG or JPEG), file size, and picture dimensions. If a picture does not have the correct dimensions or has a different file size, then the analyst can immediately identify that the file is different from the expectation.
It is usually a good idea to also track the number of color channels. A picture with one (1) color channel is monochrome. Three (3) color channels are usually translated as RGB, but may actually be a JPEG's YUV data streams. Four (4) color channels are usually RGB with a transparency (alpha) channel -- even if the transparency is unused -- but it can also be a JPEG encoded with CMYK or similar color transformation information.
While the file's name is typically recorded as a meta property, the name is less important than other information. This is because software, services, and examiners may rename files. In addition, changes may be saved to the same file name, altering the data without changing the name. File names are usually not unique and are frequently altered.
It is important to remember that none of the meta properties are unique. Two very different pictures can have the same dimensions, file sizes, etc. Moreover, changes can be made inside the file, such as altering a comment or modifying a timestamp, without altering the dimensions or file size. Meta properties cannot identify tampering.
Meta properties provide a simple way to summarize the picture. For example, if you identify the image as a JPEG that is 300x500 but your coworker says it should be a PNG that is 350x600, then you know you have the wrong file.
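As a rough illustration of that comparison, the sketch below reads the meta properties of a PNG directly from its header and lists which ones disagree with an expectation. The helper names (png_meta, differences) are made up for this example, and it only handles PNG; a JPEG would need a different parser or an imaging library.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Samples per pixel for each PNG color type:
# 0 = grayscale, 2 = RGB, 3 = palette index, 4 = gray + alpha, 6 = RGB + alpha.
PNG_CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def png_meta(data: bytes) -> dict:
    """Summarize a PNG from its signature and the start of its IHDR chunk."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # The IHDR payload starts at offset 16: width and height (big-endian
    # 32-bit integers), then bit depth and color type (one byte each).
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "type": "PNG",
        "size": len(data),
        "dimensions": (width, height),
        "channels": PNG_CHANNELS.get(color_type),
    }


def differences(actual: dict, expected: dict) -> list:
    """Return the names of the properties that do not match the expectation."""
    return [key for key in expected if actual.get(key) != expected[key]]
```

Calling differences(png_meta(data), {"type": "PNG", "dimensions": (350, 600)}) returns an empty list when the file fits the description. An empty list only means the summary matches; it says nothing about whether the contents were changed.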
In contrast to meta properties, cryptographic checksums (also called hashes or digests) act like digital fingerprints. It is extremely unlikely for two different files to have the same cryptographic checksum values. The most common cryptographic checksum algorithms are:
MD5: The Message-Digest Algorithm 5 (MD5) generates a 128-bit digest of the file. The hash is typically written as 32 hexadecimal characters.
SHA1: The Secure Hash Algorithm Version 1 (written SHA1 or SHA-1) is similar to MD5, but it generates a 160-bit hash value. Compared to MD5, SHA1's longer hash size and alternate computation method lower the likelihood of a hash collision, where two different files generate the same hash value.
SHA256: The Secure Hash Algorithm Version 2 (SHA2) was designed to replace SHA1 due to a theoretical mathematical weakness in SHA1. Unlike SHA1, SHA2 defines a family of functions that vary by bit size: 224, 256, 384, or 512 bits. Each function is identified by the bit length. For example, SHA256 is the 256-bit SHA2 hash function. Along with SHA2 is SHA3, a newer family that offers the same digest sizes but uses a different internal design.
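If you want to compute these three digests yourself, a minimal sketch using Python's standard hashlib module could look like this. The function name file_digests and the chunk size are arbitrary choices for the example, not part of any particular tool.

```python
import hashlib


def file_digests(path: str, chunk_size: int = 65536) -> dict:
    """Return the MD5, SHA1, and SHA256 hex digests of a file in one pass."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        # Read in chunks so that large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

The returned strings are 32, 40, and 64 hexadecimal characters long, matching the 128-bit, 160-bit, and 256-bit digest sizes.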
With cryptographic checksums, a single file will always generate the same hash value. Any minor change to the file will cause a significantly different result. Even if the files have the same size and appearance, a single byte change will alter the digest. Moreover, the cryptographic complexity means that it is virtually impossible for someone to fiddle with the bytes in order to match the original checksum.
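A quick way to see this for yourself is to flip a single bit in some data and compare the digests. The bytes below are stand-in data rather than a real picture, and differing_bits is a helper invented for this example.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = bytes(range(256)) * 4      # 1024 bytes of stand-in "file" data
modified = bytearray(original)
modified[500] ^= 0x01                 # flip one bit in one byte
altered = bytes(modified)

same_size = len(original) == len(altered)
digest_a = hashlib.sha1(original).digest()
digest_b = hashlib.sha1(altered).digest()

print("same size:", same_size)
print("digests equal:", digest_a == digest_b)
print("bits changed:", differing_bits(digest_a, digest_b), "of", len(digest_a) * 8)
```

Hashing the unmodified data a second time always reproduces the first digest, which is what makes the value usable as a fingerprint.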
These digests can be used to detect tampering. By verifying a file's hash value, an analyst can confirm that they are evaluating the correct file. If the hashes differ, then it is either the wrong file or the evidence has been altered. A different hash value means that at least one byte was changed, but it does not identify what was changed, who changed it, or when the change occurred.
Cryptographic checksums are almost perfect, but hash collisions are possible. A collision happens when two different files have the same checksum value. Methods have been developed to intentionally generate two files with the same MD5 hash values. The MD5 attack requires changing large blocks of data to contain random-looking values; the attack changes the file's contents and may alter the file's size. For SHA1, the attack requires trying an estimated 2^61 (2,305,843,009,213,693,952) variations of the file; a determined attacker is unlikely to find a way to replicate a known hash value on a tampered file, even if they have a few years. SHA256 needs even more variations to intentionally generate a specific hash value.
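To get a feel for the size of that SHA1 number, here is a back-of-the-envelope calculation. The rate of one billion hashes per second is purely an assumption for illustration; real hardware may be much slower or much faster.

```python
ATTEMPTS = 2 ** 61
HASHES_PER_SECOND = 1_000_000_000     # assumed rate, for illustration only
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

years = ATTEMPTS / HASHES_PER_SECOND / SECONDS_PER_YEAR
print(f"{ATTEMPTS:,} attempts")
print(f"about {years:.0f} years at {HASHES_PER_SECOND:,} hashes per second")
```

Changing HASHES_PER_SECOND shows how strongly the estimate depends on the attacker's resources.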
Although hash collisions are technically possible, it is extremely unlikely for two image files to contain similar pictures, have valid file formats, and generate the same cryptographic checksum values. When using multiple digests to confirm a digital picture (e.g., using both MD5 and SHA1, or SHA1 and file size to identify a valid JPEG), it becomes effectively impossible to have a hash collision.
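One way to sketch the "multiple digests" idea is to record the file size together with two different hashes and require all three to agree. The names fingerprint and matches_record are hypothetical, and the tuple layout is just a convenient choice for this example.

```python
import hashlib


def fingerprint(data: bytes) -> tuple:
    """Combine the file size with two independent digests."""
    return (
        len(data),
        hashlib.md5(data).hexdigest(),
        hashlib.sha1(data).hexdigest(),
    )


def matches_record(data: bytes, record: tuple) -> bool:
    """True only when the size and both digests all agree with the record."""
    return fingerprint(data) == record
```

A forged file would have to collide under MD5 and SHA1 at the same time while also keeping the same length.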
In general, MD5 and SHA1 are commonly used for file checksums. SHA1 is more robust than MD5, but MD5 is typically complex enough for hashing pictures. The SHA2 family of functions is better suited to security-sensitive applications, such as digital signatures for encrypted data streams. While less common, SHA256 has been used as a checksum for authenticating sensitive evidence files.
There are many other types of checksums. Some, like CRC-16 and CRC-32, are used for quickly checking consistency. However, these cyclic redundancy check (CRC) hashes are not unique and have frequent collisions. Detecting the same CRC-32 value on two files is not an indication that the files are the same, but different CRC values do denote a difference in the files.
Even among cryptographic hash functions, there is a wide variety of algorithms and hash sizes. For example, MD4 is a much weaker alternative to MD5, and most of the SHA family of algorithms, such as SHA2's SHA-384 and SHA3's SHA3-512, are uncommon outside of strong cryptographic systems.
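The one-sided nature of a CRC comparison can be sketched with Python's standard zlib module. The function names and the "different" and "inconclusive" labels are choices made for this example.

```python
import zlib


def crc32_hex(data: bytes) -> str:
    """Return the CRC-32 of the data as eight hexadecimal characters."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def compare_by_crc(a: bytes, b: bytes) -> str:
    """Interpret a CRC-32 comparison: a mismatch is proof, a match is not."""
    if crc32_hex(a) != crc32_hex(b):
        return "different"        # a CRC mismatch proves the data differs
    return "inconclusive"         # a CRC match does not prove equality
```

With only 32 bits of output, unrelated files can share a CRC-32 value, so an "inconclusive" result should be followed up with a cryptographic checksum.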
FotoForensics Digest Information
For digital computer evidence, the most commonly recorded information consists of the picture's type, dimensions, file size, and either the MD5 or SHA1 checksum values. (Within FotoForensics, each file's ID consists of the SHA1 digest and file's size.)
The digests provided by FotoForensics include:
Type of image (e.g., JPEG or PNG)
Number of color channels
File size in bytes
This information is enough for an analyst to verify that they are examining the correct file. It can also be used to ensure that a file was not altered by the upload or storage process.