Detect Partially Corrupt Images

3/27/2017

When transferring images across the internet, many things can interrupt the transfer and leave partially corrupted image files behind. These partially corrupted images can often be opened in virtually any image viewer without any error being displayed. To find them, some might resort to visually inspecting every file, a process that many companies simply cannot afford. This tutorial will walk you through creating an ImageFile class that can be used to detect partially corrupt images. Before we get into the code, there are a few things you should take note of:

This code only works with PNG, JPEG (.jpg or .jpeg), and GIF files.
This code does not care about file extensions; it uses the file's data signature to determine how to process it.
Some images may be identified by this code as partially corrupt when they are in fact whole. (During testing, images from one specific camera were always identified as partially corrupt. This is because the camera ignores the file specification's requirement for certain bytes at the end of the file and appends its own junk data there.)
Some partially corrupt images will not be identified by this code because they still end with the proper end bytes. This will almost always be the case if, after the corruption occurred, the file was opened and resaved in any image editor or in some image viewers.
This code also includes null byte removal code. In some rare instances, null bytes are appended to the end of the file during transfer. These null bytes can cause a complete image file to be identified as partial. The null byte removal code strips them from the file on disk before it is processed. It can easily be commented out with no harm to the rest of the program.
This code does NOT tell you if the entire file is corrupt. To check for that, try loading the file into an Image object (for example with Image.FromFile); if the whole file is corrupt, loading it will throw an exception.
This code was written for use in a Dynamic Link Library (DLL) project, so you will need to write some calling code of your own if you want to use or test it.

We’ll start with our using statement and the basic class definition:

using System.IO;

namespace ImageUtilities
{
  public class ImageFile
  {
  }
}

The only using statement we’ll need is for the System.IO namespace, which provides the File and FileStream classes we’ll use to work with the files byte by byte. Then, of course, comes the class itself. Feel free to change the names of the namespace or the class as you see fit. Add the following variables and properties to the top of the class:

private readonly string _filename = string.Empty;
public string FileName { get { return _filename; } }

private ImageFileType _fileType = ImageFileType.FileNotFound;
public ImageFileType FileType { get { return _fileType; } }
private bool _fileComplete = false;
public bool FileComplete { get { return _fileComplete; } }

This class is designed to perform its processing only once, when the class is instantiated. To facilitate this, we use private variables with public get accessors, so once an instance has been created its data cannot be changed from outside the class. You will need to create a new instance of this class for each file to be processed.
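If you are on C# 6 or later, the same set-once design can be written more compactly with getter-only auto-properties, which can only be assigned in a constructor or an initializer. The sketch below is a minimal illustration under that assumption; ImageInfo is a hypothetical class, not part of this tutorial, and it takes its values as constructor arguments instead of reading a file.

```csharp
// Hypothetical class illustrating the set-once pattern with
// C# 6 getter-only auto-properties (no explicit backing fields).
public class ImageInfo
{
  public string FileName { get; }
  public bool FileComplete { get; }

  public ImageInfo(string fileName, bool fileComplete)
  {
    // Getter-only auto-properties can only be assigned here (or in an
    // initializer), so the values are fixed once the constructor returns.
    FileName = fileName;
    FileComplete = fileComplete;
  }
}
```

Callers can read info.FileName and info.FileComplete, but any attempt to assign them outside the constructor is a compile-time error.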

#region Signatures
// 89 50 4E 47 0D 0A 1A 0A
private readonly byte[] _pngSignature = {137, 80, 78, 71, 13, 10, 26, 10};
// FF D8 FF
private readonly byte[] _jpgSignature = {255, 216, 255};
// 47 49 46 38 37 61
private readonly byte[] _gifaSignature = {71, 73, 70, 56, 55, 97};
// 47 49 46 38 39 61
private readonly byte[] _gifbSignature = {71, 73, 70, 56, 57, 97};
#endregion

These are the file signatures for each of the image file formats that this code can work with. The values in the comments are the hexadecimal equivalents to the values in the variable definitions.
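Since C# supports hexadecimal integer literals, the same signatures can also be written so that the values match the comments directly. This is a minimal sketch; HexSignatures is a hypothetical holder class, and the text comments on the GIF lines are simply those bytes decoded as ASCII.

```csharp
// Hypothetical holder class: the same signature bytes written as hex literals.
public static class HexSignatures
{
  public static readonly byte[] Png  = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
  public static readonly byte[] Jpg  = { 0xFF, 0xD8, 0xFF };
  public static readonly byte[] GifA = { 0x47, 0x49, 0x46, 0x38, 0x37, 0x61 }; // "GIF87a"
  public static readonly byte[] GifB = { 0x47, 0x49, 0x46, 0x38, 0x39, 0x61 }; // "GIF89a"
}
```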

#region EndBytes
// 49 45 4E 44 AE 42 60 82
private readonly byte[] _pngEnd = {73, 69, 78, 68, 174, 66, 96, 130};
// FF D9 FF FF
private readonly byte[] _jpgEndA = {255, 217, 255, 255};
// FF D9
private readonly byte[] _jpgEndB = {255, 217};
// 00 3B
private readonly byte[] _gifEnd = {0, 59};
#endregion

These are the byte sequences that should appear at the end of each file. Just as with the signatures, the values in the comments are the hexadecimal equivalents of the values in the definitions. You may note that there are two signatures for GIF and two end-byte sequences for JPG. This is because these formats exist in more than one version or variant (GIF87a and GIF89a, for example). JPG files actually have a fourth signature byte, but it varies between variants of the format (typically E0 for JFIF files and E1 for Exif files). All the variants I found end with one of the two end-byte sequences above, so I chose to ignore that fourth signature byte.
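Some of these end bytes are readable once decoded. In PNG the final chunk is IEND, so the first four PNG end bytes are the ASCII chunk type "IEND" and the last four are that chunk's CRC. In GIF, 3B (the ';' character) is the trailer byte, and the 00 before it is normally the terminator of the last data block. The sketch below (EndByteNotes is a hypothetical name) decodes those bytes to show this.

```csharp
using System.Text;

// Hypothetical helper that decodes the readable parts of the end bytes.
public static class EndByteNotes
{
  static readonly byte[] PngEnd = { 73, 69, 78, 68, 174, 66, 96, 130 };
  static readonly byte[] GifEnd = { 0, 59 };

  // The first four PNG end bytes as ASCII text (the chunk type)
  public static string PngChunkType() => Encoding.ASCII.GetString(PngEnd, 0, 4);

  // The last GIF end byte as a character (the trailer)
  public static char GifTrailer() => (char)GifEnd[GifEnd.Length - 1];
}
```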

public ImageFile(string filename)
{
  _filename = filename;
  if(File.Exists(filename) && NeedsTrim()) TrimFile();
  SetFileType();
  if(_fileType != ImageFileType.FileNotFound && _fileType != ImageFileType.NotRecognized)
  {
    SetFileComplete();
  }
}

This is the only constructor we’ll provide for this class, which means that you must specify a file name when instantiating it. Note that file name here means the full path and name of the file, in your system’s path format. The order of operations in the constructor matters: trailing null bytes must be trimmed before the end bytes are checked, and the end bytes can only be checked once the file type is known. If you do not wish to trim the file, comment out the first if statement in the constructor.

private void SetFileType()
{
  if(File.Exists(_filename))
  {
    var buffer = new byte[20];
    using(var fs = new FileStream(_filename, FileMode.Open, FileAccess.Read))
    {
      if(fs.Length > 20)
        fs.Read(buffer,0,20);
      else
        fs.Read(buffer, 0, (int)fs.Length);
    }
    if(MatchBytes(buffer, _pngSignature, ImageFileType.Png)) return;
    if(MatchBytes(buffer, _jpgSignature, ImageFileType.Jpg)) return;
    if(MatchBytes(buffer, _gifaSignature, ImageFileType.GifA)) return;
    if(MatchBytes(buffer, _gifbSignature, ImageFileType.GifB)) return;
    _fileType = ImageFileType.NotRecognized;
  }
  else
  {
    _fileType = ImageFileType.FileNotFound;
  }
}

private bool MatchBytes(byte[] buffer, byte[] comp, ImageFileType fType)
{
  for(var i = 0; i < comp.Length; i++)
  {
    if(buffer[i] != comp[i]) return false;
  }
  _fileType = fType;
  return true;
}

These two functions determine what type of image file this is from the file’s byte signature rather than its extension, because we cannot assume that a file’s extension is accurate. I’ve even found instances where images created by certain graphical editors are given a JPG extension but are actually PNG files internally.
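The same detection can be sketched as a standalone function over an in-memory header, which makes it easy to see that only the leading bytes matter and the file name never enters into it. This is an illustrative sketch, not the tutorial's code: SignatureSniffer and Detect are hypothetical names, and the type names are returned as strings instead of ImageFileType values to keep it self-contained.

```csharp
// Hypothetical sketch: detect the image type from leading bytes only.
public static class SignatureSniffer
{
  static readonly (string Type, byte[] Sig)[] Signatures =
  {
    ("Png",  new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }),
    ("Jpg",  new byte[] { 0xFF, 0xD8, 0xFF }),
    ("GifA", new byte[] { 0x47, 0x49, 0x46, 0x38, 0x37, 0x61 }),
    ("GifB", new byte[] { 0x47, 0x49, 0x46, 0x38, 0x39, 0x61 }),
  };

  // Returns the first matching type name, or "NotRecognized"
  public static string Detect(byte[] header)
  {
    foreach (var (type, sig) in Signatures)
    {
      // A header shorter than the signature cannot match it
      if (header.Length < sig.Length) continue;
      var match = true;
      for (var i = 0; i < sig.Length; i++)
      {
        if (header[i] != sig[i]) { match = false; break; }
      }
      if (match) return type;
    }
    return "NotRecognized";
  }
}
```

For example, a PNG file saved with a .jpg extension still starts with 89 50 4E 47, so Detect reports it as Png.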

private void SetFileComplete()
{
  if(File.Exists(_filename))
  {
    switch(FileType)
    {
      case ImageFileType.Png:
        SetComplete(_pngEnd);
        break;
      case ImageFileType.Jpg:
        SetComplete(_jpgEndA);
        SetComplete(_jpgEndB);
        break;
      case ImageFileType.GifA:
      case ImageFileType.GifB:
        SetComplete(_gifEnd);
        break;
    }
  }
}

private void SetComplete(byte[] endBytes)
{
  using(var fs = new FileStream(_filename, FileMode.Open, FileAccess.Read))
  {
    // A file shorter than the end sequence cannot be complete
    if(fs.Length < endBytes.Length) return;
    var buffer = new byte[endBytes.Length];
    // Read the last endBytes.Length bytes of the file
    fs.Seek(-endBytes.Length, SeekOrigin.End);
    fs.Read(buffer, 0, endBytes.Length);
    MatchEndBytes(buffer, endBytes);
  }
}

private bool MatchEndBytes(byte[] buffer, byte[] comp)
{
  // Compare every byte, from the last one back to the first
  for(var i = 1; i <= comp.Length; i++)
  {
    if(buffer[buffer.Length - i] != comp[comp.Length - i]) return false;
  }
  _fileComplete = true;
  return true;
}

These three functions read the last few bytes of the file, compare them with the file type’s end-byte sequence, and set the file-complete Boolean accordingly. They are written this way so that one image type can have multiple signatures and/or multiple end-byte sequences. Note also that even if an image type has multiple possible end-byte sequences, only one of them has to match for the file to be considered complete.
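Here is a minimal sketch of the same "any end sequence may match" check against a file on disk. It assumes the candidate sequences are passed in by the caller; TailChecker and EndsWithAny are hypothetical names. Unlike SetComplete, it loops on Read, since a stream is allowed to return fewer bytes than requested.

```csharp
using System.IO;

// Hypothetical sketch: true if the file ends with ANY of the candidates.
public static class TailChecker
{
  public static bool EndsWithAny(string path, params byte[][] candidates)
  {
    using (var fs = File.OpenRead(path))
    {
      foreach (var end in candidates)
      {
        // A file shorter than the end sequence cannot end with it
        if (fs.Length < end.Length) continue;
        var buffer = new byte[end.Length];
        fs.Seek(-end.Length, SeekOrigin.End);
        // Keep reading until the buffer is full or the stream ends
        var read = 0;
        while (read < buffer.Length)
        {
          var n = fs.Read(buffer, read, buffer.Length - read);
          if (n == 0) break;
          read += n;
        }
        if (read != buffer.Length) continue;
        var match = true;
        for (var i = 0; i < end.Length; i++)
        {
          if (buffer[i] != end[i]) { match = false; break; }
        }
        if (match) return true;
      }
    }
    return false;
  }
}
```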

public enum ImageFileType
{
  FileNotFound,
  NotRecognized,
  Png,
  Jpg,
  GifA,
  GifB
}

This is the enum that we’ve been using throughout this file that allows us to track what type of image file this is. This should be declared outside the class but within the same namespace.

private bool NeedsTrim()
{
  using(var fs = new FileStream(_filename, FileMode.Open, FileAccess.Read))
  {
    if(fs.Length > 0)
    {
      // Check whether the very last byte of the file is a null
      fs.Seek(-1, SeekOrigin.End);
      var b = fs.ReadByte();
      return b == 0;
    }
    return false;
  }
}

private void TrimFile()
{
  // Read the whole file into memory
  var buffIn = File.ReadAllBytes(_filename);
  var index = FindFirstNull(buffIn);
  if(index < 0) return;
  // Copy everything before the trailing run of nulls
  var buffOut = new byte[index];
  for(int i = 0; i < index; i++)
  {
    buffOut[i] = buffIn[i];
  }
  // Overwrite the original file with the trimmed bytes
  File.WriteAllBytes(_filename, buffOut);
}

private int FindFirstNull(byte[] buffer)
{
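  // Walk backwards to the last non-null byte; the trailing nulls start right after it
  for(var i = buffer.Length - 1; i >= 0; i--)
  {
    if(buffer[i] != 0) return i + 1;
  }
  // Every byte is null: report that there is nothing sensible to trim
  return -1;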
}

Finally, these three methods are used to detect and trim null values from the end of the file. If you commented out the trimming line in the constructor, these methods do not need to be included for the class to operate. Note that TrimFile overwrites the original file with the trimmed bytes. Null values left at the end of an image file are a rare occurrence caused by faulty software. In my case, our end users have a program that uploads images in large chunks. If the last chunk isn’t as large as the specified chunk size, the program fills in the rest with nulls. These nulls can prevent a complete file from being identified correctly, so we had to remove them. As stated earlier, this code was written to be used from a DLL, but you can also put it directly into your project; either way, you use it the same way.
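To see why trimming has to happen before the end-byte check, here is a sketch that trims trailing nulls from an in-memory buffer and then checks the end bytes. NullTrimmer, TrimTrailingNulls, and EndsWith are hypothetical names; unlike TrimFile, this version works on a copy and never rewrites the file. A PNG that ends in IEND plus its CRC fails the check while null padding is present and passes once the padding is removed.

```csharp
using System;

// Hypothetical sketch: trim trailing nulls in memory, then check end bytes.
public static class NullTrimmer
{
  // Returns a copy of data without its trailing 0x00 bytes
  public static byte[] TrimTrailingNulls(byte[] data)
  {
    var length = data.Length;
    while (length > 0 && data[length - 1] == 0) length--;
    var trimmed = new byte[length];
    Array.Copy(data, trimmed, length);
    return trimmed;
  }

  // True if data ends with the given end sequence
  public static bool EndsWith(byte[] data, byte[] end)
  {
    if (data.Length < end.Length) return false;
    for (var i = 0; i < end.Length; i++)
    {
      if (data[data.Length - end.Length + i] != end[i]) return false;
    }
    return true;
  }
}
```

None of the end sequences used in this tutorial finish with a 00 byte, so stripping every trailing null cannot remove part of a valid ending.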

// Requires: using System; using System.IO; using ImageUtilities;
private static void ScanForPartials()
{
  var targetPath = "C:\\SomePath\\";
  if(Directory.Exists(targetPath))
  {
    var fileList = Directory.GetFiles(targetPath);
    var incomplete = 0;
    foreach(var s in fileList)
    {
      var obj = new ImageFile(s);
      // Skip files that are missing or not a recognized image type
      if(obj.FileType == ImageFileType.FileNotFound || obj.FileType == ImageFileType.NotRecognized) continue;
      if(obj.FileComplete) continue;
      incomplete++;
      Console.WriteLine("{0}) Incomplete {1}: {2}", incomplete, obj.FileType, obj.FileName);
    }
  }
}

This is an example of how you can use the new class in your code. It scans through every file in a given directory and writes a list of incomplete files to the console. Notice that it ignores any file that the class reports as not found or not recognized. To make this a complete program, simply call this method from your Main method.


	CupCode Gamers
	From the Cup, to the Code, for the Gamers