Friday 19 September 2008

Comparing two files, one on a hard drive, and one from a web site

In my quest to create an effective and lean search engine service for the company I work for, I used to crawl all of our websites, download every file from them and write them to a local cache. Whenever the cache was updated, my indexing system (based on Lucene.Net) detected the changes and reindexed those sites.

As you can imagine, with over 800 sites this is an awfully large amount of data to write to the cache, and then an even bigger job for the indexer to reindex all those files!

So what I wanted was a mechanism to compare the local copy with a copy of the file streamed from the server into memory, and see whether there were any changes.

To do this, I immediately thought of looking for something like a checksum, but a colleague recommended doing an MD5 hash on the files, using a tool like CipherLite by Obivex. Looking into this, I found an easier way: download the file into a memory stream, open the local file using a FileStream, and hash both to see whether the contents are the same:
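The idea boils down to: hash the downloaded bytes, hash the cached file, and only rewrite the cache when the two hashes differ. Here is a minimal self-contained sketch of just that step, assuming the download is already sitting in a byte array; the names CacheWriter and WriteIfChanged are hypothetical and not part of the crawler code below.

```csharp
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper: write the cache file only when its content has changed.
public static class CacheWriter
{
    // Returns true if the file was (re)written, false if the cached copy already matched.
    public static bool WriteIfChanged(byte[] downloaded, string path)
    {
        if (File.Exists(path))
        {
            using (MD5 md5 = MD5.Create())
            using (FileStream local = File.OpenRead(path))
            {
                byte[] remoteHash = md5.ComputeHash(downloaded);
                byte[] localHash = md5.ComputeHash(local);

                // identical hashes: treat the cached copy as unchanged and skip the write
                if (remoteHash.SequenceEqual(localHash))
                    return false;
            }
        }

        // new or changed file: (re)write the cache (the local stream is already closed here)
        File.WriteAllBytes(path, downloaded);
        return true;
    }
}
```

Note that this still downloads every file; what it saves is the cache write and, more importantly, the reindexing of unchanged files.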

// response, bParse, strResponse, PathName and LogError belong to the
// surrounding crawler code and are assumed to be in scope here.

// get a memory stream to hold the data that is downloaded
MemoryStream msFile = new MemoryStream();
BinaryWriter writer = new BinaryWriter(msFile);
byte[] RecvBuffer = new byte[10240];
int nBytes, nTotalBytes = 0;
// loop to receive the response buffer
while ((nBytes = response.socket.Receive(RecvBuffer, 0, RecvBuffer.Length, SocketFlags.None)) > 0)
{
    // increment total received bytes
    nTotalBytes += nBytes;
    // write the received buffer to the memory stream
    writer.Write(RecvBuffer, 0, nBytes);
    // if the content is not binary, also keep it as text so it can be parsed for links
    if (bParse)
        strResponse += Encoding.ASCII.GetString(RecvBuffer, 0, nBytes);
    // on a Keep-Alive connection, stop once the whole response has arrived
    if (response.KeepAlive && response.ContentLength > 0 && nTotalBytes >= response.ContentLength)
        break;
}
writer.Flush();

bool bContinue = false;
FileStream fStream = null;
try
{
    // check to see if the file exists on the local file system
    if (File.Exists(PathName))
    {
        // open the local file for reading
        fStream = File.Open(PathName, FileMode.Open, FileAccess.Read, FileShare.Read);

        // rewind the downloaded data, otherwise it is hashed from the end (zero bytes)
        msFile.Position = 0;
        // only write the file if the two streams differ (see compareFiles below)
        bContinue = !compareFiles(msFile, fStream);
    }
    else
    {
        // file doesn't exist, so write it anyway
        bContinue = true;
    }
}
catch (Exception ex)
{
    bContinue = true;
    LogError(ex.Message, "");
}
finally
{
    if (fStream != null)
        fStream.Close();
    fStream = null;
}
if (bContinue)
{
    // create (or overwrite) the local copy; WriteTo copies the whole buffer regardless of Position
    using (FileStream streamOut = File.Open(PathName, FileMode.Create, FileAccess.Write, FileShare.ReadWrite))
    {
        msFile.WriteTo(streamOut);
    }
}
msFile.Close();


Now for how the streams are compared. I adapted this from a solution I found via a Google search on hashing:

// returns true when both streams have identical contents
bool compareFiles(MemoryStream file1, FileStream file2)
{
    // ComputeHash reads from the current position, so rewind the downloaded data first
    file1.Position = 0;
    using (HashAlgorithm hashAlg = MD5.Create())
    {
        // calculate the hash for each stream
        byte[] hashBytesA = hashAlg.ComputeHash(file1);
        byte[] hashBytesB = hashAlg.ComputeHash(file2);
        // the files are the same if their hashes match
        return BitConverter.ToString(hashBytesA) == BitConverter.ToString(hashBytesB);
    }
}
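One thing to watch with this approach: HashAlgorithm.ComputeHash(Stream) reads from the stream's current position to the end, and after the download has been written into the MemoryStream, the position is already at the end. The small sketch below (the class HashPitfall is purely illustrative) shows that an unrewound stream hashes exactly like empty input, which would make every downloaded file look "different".

```csharp
using System.IO;
using System.Security.Cryptography;

public static class HashPitfall
{
    // Writes data into a MemoryStream and hashes it without rewinding.
    public static byte[] HashWithoutRewind(byte[] data)
    {
        using (var ms = new MemoryStream())
        using (MD5 md5 = MD5.Create())
        {
            ms.Write(data, 0, data.Length); // Position is now data.Length
            return md5.ComputeHash(ms);     // reads nothing: this is the hash of zero bytes
        }
    }

    // Same, but rewinds first so the whole buffer is hashed.
    public static byte[] HashWithRewind(byte[] data)
    {
        using (var ms = new MemoryStream())
        using (MD5 md5 = MD5.Create())
        {
            ms.Write(data, 0, data.Length);
            ms.Position = 0;
            return md5.ComputeHash(ms);
        }
    }

    // Reference hash computed directly from the byte array.
    public static byte[] HashOf(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
            return md5.ComputeHash(data);
    }
}
```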

Hope this helps.

Detecting an Enter key press on an ASP.NET form (but when using AJAX as well!)

I've been familiar with detecting key presses in JavaScript for quite some time, and used to do it all the time in classic ASP, but when I came to do it in the .NET world this time, it was a little more involved.

This was also the first time I had to cope with master pages that use AJAX inside them.

I have a simple content placeholder containing a textbox, which sits inside an UpdatePanel. I also have a ModalPopupExtender that shows whenever the page does an asynchronous postback.

To get this to work, I had to register my JavaScript with the ScriptManager.RegisterClientScriptBlock() method, passing it a literal control (LitScript) that I also placed on the form:

if (!Page.IsPostBack)
{
    // call checkEnter on every key press in the search box
    this.textBoxQuery.Attributes.Add("onkeypress", "return checkEnter(event);");
    Page.ClientScript.RegisterHiddenField("__EVENTTARGET", textBoxQuery.ClientID);

    // when Enter (key code 13) is pressed, click the search button instead of submitting the form;
    // getElementById needs the rendered id, which is the ClientID (UniqueID uses '$' separators)
    string strScript = "function checkEnter(e) { " +
        "e = e || window.event; " +
        "var characterCode = e.which || e.keyCode; " +
        "if (characterCode == 13) { " +
        "document.getElementById('" + ButtonSearch.ClientID + "').click(); " +
        "return false; " +
        "} " +
        "return true; " +
        "}";

    // register the script through the ScriptManager so it cooperates with the UpdatePanel
    ScriptManager.RegisterClientScriptBlock(LitScript, this.GetType(), "regScripts", strScript, true);
}

This now captures the Enter key and handles it as a normal click of the search button.
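If you need the same behaviour on several pages, the script string can be pulled out into a small helper. This is a sketch only: EnterKeyScript and Build are hypothetical names, and it assumes the button's ClientID contains nothing that needs escaping inside a JavaScript string literal (normally true, since ASP.NET builds ClientIDs from control IDs joined with underscores).

```csharp
using System;

public static class EnterKeyScript
{
    // Builds a checkEnter(e) function that clicks the given button when Enter is pressed.
    public static string Build(string buttonClientId)
    {
        if (string.IsNullOrEmpty(buttonClientId))
            throw new ArgumentException("A button ClientID is required.", nameof(buttonClientId));

        return "function checkEnter(e) { " +
               "e = e || window.event; " +                    // old IE has no event argument
               "var characterCode = e.which || e.keyCode; " + // Firefox uses which, IE keyCode
               "if (characterCode == 13) { " +                // 13 = Enter
               "document.getElementById('" + buttonClientId + "').click(); " +
               "return false; " +                             // stop the default form submit
               "} " +
               "return true; " +
               "}";
    }
}
```

On the page it would then be registered with something like ScriptManager.RegisterClientScriptBlock(LitScript, GetType(), "regScripts", EnterKeyScript.Build(ButtonSearch.ClientID), true).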

Thursday 11 September 2008

Error message "There is an error in XML document (1, 1994)." with an inner exception of "DataTable does not match to any DataTable in source."

Today I have spent nearly 5 hours trying to get to the bottom of why I was receiving these errors.

The first error made me think that it was caused by an invalid character in the data being returned from a web service to my program.

I took a copy of what was being returned using Fiddler, to see what was going on in my browser, and it showed that the XML being returned was fine. I even validated the XML to check.

The interesting thing was the inner exception, "DataTable does not match to any DataTable in source.", thrown in Visual Studio. It was telling me that the datatable being returned from the webservice was not the same as the datatable I had in Visual Studio. The strange thing was, they were both typed datatables! So how could they be different? Well, they weren't.

The source of the problem was the following, at position 1994 in the XML:

"diffgr:diffgram msdata="urn:schemas-microsoft-com:xml-msdata" "

This was telling me that there was a difference in the schema of the dataset. Note that: a difference in the DATASET. I was not returning a dataset, I was returning a datatable! Anyway, a quick Google search brought up a nice little link from the good boys at Microsoft:

http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=105642

It turns out that you can't reliably return a typed datatable on its own from a webservice; you have to return a DATASET. What a load of pants.

So, to solve this, I altered the webservice method to return a wrapper dataset containing the datatable I wanted, e.g.:

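[WebMethod]
public MyTypedDataSet GetData(string someparams) // GetData is a placeholder name
{
    // wrap the typed datatable in its typed dataset so the client can deserialize it
    MyTypedDataSet ds = new MyTypedDataSet();
    MyTypedDataSet.MyTypedDataTable dt = aFunctionThatGetsTheData(someparams);

    // a typed dataset already contains an empty MyTypedDataTable, so merge the rows in
    // rather than adding a second table with the same name
    ds.Merge(dt);
    return ds;
}

private MyTypedDataSet.MyTypedDataTable aFunctionThatGetsTheData(string someinfo)
{
    MyTypedDataSet.MyTypedDataTable dt = new MyTypedDataSet.MyTypedDataTable();
    // the work that gets the data is here
    return dt;
}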
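Why Merge rather than ds.Tables.Add(dt.Copy())? A generated typed dataset creates its tables in its constructor, so the wrapper already owns a table with the datatable's name, and adding a copy clashes with it. The sketch below reproduces this with an untyped DataSet standing in for the typed one; the table name "Items" and the class WrapTableDemo are made up for illustration.

```csharp
using System.Data;

public static class WrapTableDemo
{
    // Stand-in for a typed dataset: it already owns an empty "Items" table.
    public static DataSet CreateWrapper()
    {
        var ds = new DataSet("MyDataSet");
        var schema = new DataTable("Items");
        schema.Columns.Add("Id", typeof(int));
        schema.Columns.Add("Name", typeof(string));
        ds.Tables.Add(schema);
        return ds;
    }

    // Stand-in for the function that fetches the data.
    public static DataTable LoadItems()
    {
        var dt = new DataTable("Items");
        dt.Columns.Add("Id", typeof(int));
        dt.Columns.Add("Name", typeof(string));
        dt.Rows.Add(1, "first");
        dt.Rows.Add(2, "second");
        return dt;
    }

    // Adding a copy throws a DataException because "Items" already belongs to the dataset.
    public static bool AddCopyThrows()
    {
        try
        {
            CreateWrapper().Tables.Add(LoadItems().Copy());
            return false;
        }
        catch (DataException)
        {
            return true;
        }
    }

    // Merge copies the rows into the existing "Items" table instead.
    public static DataSet WrapWithMerge()
    {
        DataSet ds = CreateWrapper();
        ds.Merge(LoadItems());
        return ds;
    }
}
```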

I hope this helps anyone else who has the same problem!

Jamie.

Monday 8 September 2008

IsNumeric in C#

IsNumeric doesn't exist in C# (it's VB-specific), but what I've found works today is the following:

int result;
// TryParse returns true if value_to_try is a valid Int32, false otherwise
bool isNumeric = Int32.TryParse(value_to_try, out result);

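If it can parse an int, TryParse returns true and puts the value in the result variable; if it fails, it returns false and sets result to 0. So check the return value rather than result, since "0" is a perfectly valid input. Also bear in mind that it only accepts whole numbers that fit in an Int32, whereas VB's IsNumeric also accepts things like decimals.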
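Wrapped up as helpers, that might look like the sketch below. NumberChecks, IsInteger and IsNumeric are hypothetical names, and the looser check is only close in spirit to VB's IsNumeric, not a faithful copy of its rules.

```csharp
using System.Globalization;

public static class NumberChecks
{
    // Whole numbers in the Int32 range only. Rely on the bool result, not the out value,
    // because both "0" and "abc" leave result == 0.
    public static bool IsInteger(string value)
    {
        int result;
        return int.TryParse(value, out result);
    }

    // Looser check: allows signs, decimal points, thousands separators, exponents
    // and currency symbols, parsed with the invariant culture so "." is the decimal point.
    public static bool IsNumeric(string value)
    {
        double result;
        return double.TryParse(value, NumberStyles.Any, CultureInfo.InvariantCulture, out result);
    }
}
```

Both methods simply return false for null or empty strings rather than throwing.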

Brilliant!