3/27/2017
When transferring images across the internet, many things can interrupt the transfer and leave partially corrupted image files behind. These partially corrupted images can often be opened in virtually any image viewer without any error being displayed. To find them, some might resort to visually inspecting every file, a process that many companies simply cannot afford. This tutorial walks you through creating an ImageFile class that can be used to detect partially corrupt images.

Before we get into the code, there are a few things you should take note of.

We'll start with our using statements and the basic class definition. The only using statement we'll need is for the System.IO namespace, which we'll use to work with the files on a byte-by-byte basis. Then, of course, comes our class container. Feel free to change the names of the namespace or the class as you see fit.

Next, add the variables and properties to the top of the class. This class is designed so that it performs the processing only once, when the class is instantiated. To facilitate this, we use private variables with public get accessors. This means that once the class is instantiated, its data cannot be changed. You will need to instantiate a copy of this class for each file to be processed.

The Signatures region holds the file signatures for each of the image file formats that this code can work with. The values in the comments are the hexadecimal equivalents of the values in the variable definitions.

The EndBytes region holds the byte sequences that should appear at the end of the files. Just as with the signatures, the values in the comments are the hexadecimal equivalents of the values in the definitions. You may note that there are two signatures for GIF and two end-byte sequences for JPG. This is because there are multiple versions of these file format specifications. JPG actually uses one more byte in its signature. The fourth JPG signature byte differs between variants of the format (for example, 0xE0 for JFIF and 0xE1 for Exif), but all the variants I found use one of two end-byte sequences, so I chose to ignore this last signature byte.

The class provides only one constructor, which means that you must specify a file name when instantiating the class. Take note that file name here means the entire path and name of the file in the proper system path format. The order of operations in the constructor is important to ensure that the file gets processed correctly. If you do not wish to trim the file, comment out the first if statement in the constructor.

SetFileType and MatchBytes determine what type of image file this is using the file's byte signature. Because of the various versions of each file format, we cannot assume that the file's extension is accurate. I've also found instances where images created by certain graphical editors are given a JPG extension but are actually PNG files internally.

SetFileComplete, SetComplete, and MatchEndBytes look at the last several bytes in the file to see if they match the file type's end-byte sequence and set the file-complete Boolean accordingly. They are written this way to allow one image type to have multiple signatures and/or multiple end-byte sequences. Note also that even if the image type has multiple possible end-byte sequences, only one of them must match for the file to be considered complete.

ImageFileType is the enum used throughout the class to track what type of image file this is. It should be declared outside the class but within the same namespace.

Finally, NeedsTrim, TrimFile, and FindFirstNull are used to detect and trim null values from the end of the file. If you commented out or deleted the trim line in the constructor, these methods do not need to be included for the class to operate. Null values being left at the end of an image file is a rare occurrence caused by faulty software. In my case, our end users have a program that uploads images in large chunks. If the last chunk isn't as large as the specified chunk size, the program fills in the rest with nulls. These nulls can prevent a complete file from being identified correctly, so we had to remove them.

As stated earlier, this code was written to be used from a DLL. You can also put it directly into your project; either way, you'll use it the same. The ScanForPartials method is an example of how you can use this new class in your code. It scans through every file in a given directory and displays a list of incomplete files on the console. Notice that it ignores any file format not specifically handled or recognized by the class code. To make this a complete program, simply place a call to this method inside your main event.
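
The signature and end-byte checks described above can also be sketched on in-memory byte arrays, which makes the comparison logic easy to try without touching the disk. This is only a minimal illustration: the `SignatureSketch` class and its `StartsWith` and `EndsWith` methods are hypothetical names that do not appear in the tutorial's ImageFile class.

```csharp
// Hypothetical helper for illustration; not part of the ImageFile class.
public static class SignatureSketch
{
    // True when data begins with every byte of prefix, in order.
    public static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (var i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // True when data finishes with every byte of suffix, in order.
    public static bool EndsWith(byte[] data, byte[] suffix)
    {
        if (data.Length < suffix.Length) return false;
        var offset = data.Length - suffix.Length;
        for (var i = 0; i < suffix.Length; i++)
        {
            if (data[offset + i] != suffix[i]) return false;
        }
        return true;
    }
}
```

Passing the GIF89a signature bytes (47 49 46 38 39 61) to `StartsWith` and the GIF end bytes (00 3B) to `EndsWith` would flag a buffer that starts correctly but was cut off before its final bytes.
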
using System.IO;

namespace ImageUtilities
{
    public class ImageFile
    {
    }
}

private readonly string _filename = string.Empty;
public string FileName { get { return _filename; } }

private ImageFileType _fileType = ImageFileType.FileNotFound;
public ImageFileType FileType { get { return _fileType; } }

private bool _fileComplete = false;
public bool FileComplete { get { return _fileComplete; } }

#region Signatures
// 89 50 4E 47 0D 0A 1A 0A
private readonly byte[] _pngSignature = {137, 80, 78, 71, 13, 10, 26, 10};
// FF D8 FF
private readonly byte[] _jpgSignature = {255, 216, 255};
// 47 49 46 38 37 61
private readonly byte[] _gifaSignature = {71, 73, 70, 56, 55, 97};
// 47 49 46 38 39 61
private readonly byte[] _gifbSignature = {71, 73, 70, 56, 57, 97};
#endregion

#region EndBytes
// 49 45 4E 44 AE 42 60 82
private readonly byte[] _pngEnd = {73, 69, 78, 68, 174, 66, 96, 130};
// FF D9 FF FF
private readonly byte[] _jpgEndA = {255, 217, 255, 255};
// FF D9
private readonly byte[] _jpgEndB = {255, 217};
// 00 3B
private readonly byte[] _gifEnd = {0, 59};
#endregion

public ImageFile(string filename)
{
    _filename = filename;
    // NeedsTrim opens the file, so only attempt the trim when the file exists.
    if (File.Exists(_filename) && NeedsTrim()) TrimFile();
    SetFileType();
    if (_fileType != ImageFileType.FileNotFound && _fileType != ImageFileType.NotRecognized)
    {
        SetFileComplete();
    }
}

private void SetFileType()
{
    if (File.Exists(_filename))
    {
        // The longest signature is 8 bytes, so a 20-byte buffer is more than enough.
        var buffer = new byte[20];
        using (var fs = new FileStream(_filename, FileMode.Open))
        {
            if (fs.Length > 20)
                fs.Read(buffer, 0, 20);
            else
                fs.Read(buffer, 0, (int)fs.Length);
        }
        if (MatchBytes(buffer, _pngSignature, ImageFileType.Png)) return;
        if (MatchBytes(buffer, _jpgSignature, ImageFileType.Jpg)) return;
        if (MatchBytes(buffer, _gifaSignature, ImageFileType.GifA)) return;
        if (MatchBytes(buffer, _gifbSignature, ImageFileType.GifB)) return;
        _fileType = ImageFileType.NotRecognized;
    }
    else
    {
        _fileType = ImageFileType.FileNotFound;
    }
}

private bool MatchBytes(byte[] buffer, byte[] comp, ImageFileType fType)
{
    for (var i = 0; i < comp.Length; i++)
    {
        if (buffer[i] != comp[i]) return false;
    }
    _fileType = fType;
    return true;
}

private void SetFileComplete()
{
    if (File.Exists(_filename))
    {
        switch (FileType)
        {
            case ImageFileType.Png:
                SetComplete(_pngEnd);
                break;
            case ImageFileType.Jpg:
                // Either end sequence marks the file as complete.
                SetComplete(_jpgEndA);
                SetComplete(_jpgEndB);
                break;
            case ImageFileType.GifA:
            case ImageFileType.GifB:
                SetComplete(_gifEnd);
                break;
        }
    }
}

private void SetComplete(byte[] endBits)
{
    var buffer = new byte[endBits.Length];
    using (var fs = new FileStream(_filename, FileMode.Open))
    {
        // A file shorter than the end sequence cannot be complete.
        if (fs.Length < endBits.Length) return;
        fs.Seek(-endBits.Length, SeekOrigin.End);
        fs.Read(buffer, 0, endBits.Length);
    }
    MatchEndBytes(buffer, endBits);
}

private bool MatchEndBytes(byte[] buffer, byte[] comp)
{
    // Compare from the last byte back to the first. The loop must reach
    // comp.Length so that the first byte of the sequence is checked too.
    for (var i = 1; i <= comp.Length; i++)
    {
        if (buffer[buffer.Length - i] != comp[comp.Length - i]) return false;
    }
    _fileComplete = true;
    return true;
}

public enum ImageFileType
{
    FileNotFound,
    NotRecognized,
    Png,
    Jpg,
    GifA,
    GifB
}

private bool NeedsTrim()
{
    using (var fs = new FileStream(_filename, FileMode.Open))
    {
        if (fs.Length > 0)
        {
            // A null in the last position means the file was padded.
            fs.Seek(fs.Length - 1, SeekOrigin.Begin);
            var b = fs.ReadByte();
            return b == 0;
        }
        return false;
    }
}

private void TrimFile()
{
    byte[] buffIn;
    using (var fs = new FileStream(_filename, FileMode.Open))
    {
        buffIn = new byte[fs.Length];
        fs.Read(buffIn, 0, (int)fs.Length);
    }
    var index = FindFirstNull(buffIn);
    if (index < 0) return;
    var buffOut = new byte[index];
    for (int i = 0; i < index; i++)
    {
        buffOut[i] = buffIn[i];
    }
    using (var fs = new FileStream(_filename, FileMode.Create))
    {
        foreach (byte b in buffOut)
        {
            fs.WriteByte(b);
        }
    }
}

// Returns the index of the first trailing null (one past the last non-null
// byte), or -1 when the buffer contains nothing but nulls.
private int FindFirstNull(byte[] buffer)
{
    for (var i = buffer.Length - 1; i >= 0; i--)
    {
        if (buffer[i] != 0) return i + 1;
    }
    return -1;
}

private void ScanForPartials()
{
    // The calling file needs using System; and using System.IO;
    var targetPath = "C:\\SomePath\\";
    if (Directory.Exists(targetPath))
    {
        var fileList = Directory.GetFiles(targetPath);
        // progressBar1 is a ProgressBar on the calling form; drop this line in a console program.
        progressBar1.Maximum = fileList.Length;
        var incomplete = 0;
        foreach (var s in fileList)
        {
            var obj = new ImageFile(s);
            if (obj.FileType == ImageFileType.FileNotFound || obj.FileType == ImageFileType.NotRecognized) continue;
            if (obj.FileComplete) continue;
            incomplete++;
            Console.WriteLine("{0}) Incomplete {1}: {2}", incomplete, obj.FileType, obj.FileName);
        }
    }
}
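
TrimFile reads the whole file into memory and writes it back out one byte at a time. As a possible alternative, the sketch below truncates the file in place with `FileStream.SetLength`. It is an illustration under stated assumptions, not a drop-in replacement: `TrimSketch` and `TrimTrailingNulls` are hypothetical names, and unlike TrimFile this version will shrink a file that contains nothing but nulls down to zero bytes.

```csharp
using System.IO;

// Hypothetical helper for illustration; not part of the ImageFile class.
public static class TrimSketch
{
    // Removes trailing 0x00 bytes in place and returns how many were removed.
    public static long TrimTrailingNulls(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            var newLength = fs.Length;
            // Walk backwards from the end until a non-null byte is found.
            while (newLength > 0)
            {
                fs.Seek(newLength - 1, SeekOrigin.Begin);
                if (fs.ReadByte() != 0) break;
                newLength--;
            }
            var removed = fs.Length - newLength;
            // Truncate only when there is something to remove.
            if (removed > 0) fs.SetLength(newLength);
            return removed;
        }
    }
}
```

Reading one byte per seek is slow when the padding is large, so a real implementation would probably read the tail of the file in blocks. The sketch keeps the loop simple to show the idea.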