Saturday, March 2, 2013

Full Text searching in NHibernate

There's a library called NHibernate.Search available from NuGet that is supposed to allow for using a Lucene-based full-text search, regardless of what database you're persisting to... I thought this was a great idea.

Problem is, it doesn't work!

The code hasn't been updated in years, the libraries are obsolete and deprecated, and the common components aren't compatible with other libraries from NuGet.  In other words, at this point, it is just this side of completely unusable without a lot of work.

But if you're looking to implement non-database-dependent Full-Text searching on your NHibernate project, there are ways... with surprisingly little effort.  This article will describe how I got it going for one of my NHibernate projects.

First, I installed the Lucene.Net and Common.Logging packages from the NuGet repository.  If you don't know how to do that, google it!  You will be dismayed that you have gone this long without knowing how to do this.

I wanted to define the string fields in my entities that could use a full-text index, so I created a FullTextIndexedAttribute class:


[AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
public class FullTextIndexedAttribute : Attribute
{
public int IndexPriority { get; set; }
public FullTextIndexedAttribute() { }
public FullTextIndexedAttribute(int indexPriority)
{
IndexPriority = indexPriority;
}
}

Note the constructor that lets me set the index priority.  This is useful when trying to determine relevance.  Some fields have higher significance in their contents than others.  This class allowed me to decorate properties in my model domain like so:

public partial class Address
{
public virtual Guid AddressId { get; set; }

[FullTextIndexed(2)]
[NotNull]
public virtual string Address1 { get; set; }

[FullTextIndexed(3)]
public virtual string Address2 { get; set; }

[FullTextIndexed(1)]
[NotNull]
public virtual string City { get; set; }

[NotNull]
public virtual string State { get; set; }

[NotNull]
public virtual string ZipCode { get; set; }

[FullTextIndexed(0)]
public virtual string Phone { get; set; }
}

At this point I should probably point out that I am using Fluent NHibernate Automapping; all of my ID columns are regularly named, and all are GUIDs.  This really cut down on the reflection and coding time for implementing Full Text Indexing.  If you have things manually mapped without a regular identifier pattern, you'll have some extra coding to do at some point, but that's out of scope for this article.

I created a class of static functions I called FullText that actually did the "grunt work" of interfacing with the Lucene engine:

public class FullText
{
private static readonly Common.Logging.ILog Logger = Common.Logging.LogManager.GetCurrentClassLogger();
private static string indexPath;

private static FSDirectory _FullSearchDirectory;
public static FSDirectory FullSearchDirectory
{
get
{
if (_FullSearchDirectory == null)
{
// this is collected in my configuration settings.
indexPath = MyCustomSettings.Instance.FullTextSearching.IndexLocation;

_FullSearchDirectory = FSDirectory.Open(new System.IO.DirectoryInfo(indexPath));

if (IndexWriter.IsLocked(_FullSearchDirectory))
{
IndexWriter.Unlock(_FullSearchDirectory);
}

var lockFilePath = System.IO.Path.Combine(indexPath, "write.lock");
if (System.IO.File.Exists(lockFilePath))
{
System.IO.File.Delete(lockFilePath);
}
}

return _FullSearchDirectory;
}
}

/// <summary>
/// Collects the list of PropertyInfo objects decorated with the 
/// FullTextIndexedAttribute class for the supplied object.
/// </summary>
/// <param name="obj"></param>
/// <returns></returns>
public static IEnumerable<PropertyInfo> GetIndexableProperties(object obj)
{
return GetIndexableProperties(obj.GetType());
}

/// <summary>
/// Collects the list of PropertyInfo objects decorated with the
/// FullTextIndexedAttribute class for the supplied type.
/// </summary>
/// <param name="type"></param>
/// <returns></returns>
public static IEnumerable<PropertyInfo> GetIndexableProperties(Type type)
{
var props =
type.GetProperties()
.Where(x => x.GetCustomAttributes<FullTextIndexedAttribute>().Any())
.Select(x => new
{
PropInfo = x,
Priority = x.GetCustomAttributes<FullTextIndexedAttribute>().First().IndexPriority
})
.OrderBy(x => x.Priority)
.Select(x => x.PropInfo);

return props;
}

/// <summary>
/// Determines if the type has Full-Text indexable properties.
/// </summary>
/// <param name="type"></param>
/// <returns></returns>
public static bool IsIndexable(Type type)
{
return type.GetProperties()
.Select(x => x.GetCustomAttributes<FullTextIndexedAttribute>())
.Any();
}

/// <summary>
/// Determines if the object has Full-Text indexable properties.
/// </summary>
/// <param name="obj"></param>
/// <returns></returns>
public static bool IsIndexable(object obj)
{
return IsIndexable(obj.GetType());
}

/// <summary>
/// Returns a list of object IDs that match the search criteria.
/// </summary>
/// <param name="searchText"></param>
/// <param name="type"></param>
/// <param name="maxRows"></param>
/// <returns></returns>
public static Guid[] Search(string searchText, Type type, int maxRows)
{
if (string.IsNullOrWhiteSpace(searchText)) return new Guid[] { };

Logger.Trace(t => t("Searching Type {0} for max results of {2} with search string '{3}'...", type.Name, maxRows, searchText));
using (var searcher = new IndexSearcher(FullSearchDirectory, true))
using (var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
{
var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30,
GetIndexableProperties(type).Select(x => x.Name).ToArray(), analyzer);
Query query;
try
{
query = parser.Parse(searchText.Trim());
}
catch (ParseException)
{
// if a ParseException is thrown, it's likely due to extraneous odd symbols
// in the search text, so this will escape them.
query = parser.Parse(QueryParser.Escape(searchText.Trim()));
}

var hits = searcher.Search(query, null, maxRows, Sort.RELEVANCE).ScoreDocs;
var results = hits.Select(x => Guid.Parse(searcher.Doc(x.Doc).Get(string.Format("{0}Id", type.Name))))
.ToArray();

Logger.Trace(t => t("Found {0} hits for '{1}'.", results.Count(), searchText));
return results;
}
}

/// <summary>
/// Removes all indices from the Full Text index; used for regular maintenance.
/// </summary>
public static void ClearIndices()
{
using (var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
using (var writer = new IndexWriter(FullSearchDirectory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
{
writer.DeleteAll();
}
}

/// <summary>
/// Refreshes the index of the supplied entity, assuming that the entity is indexable.
/// </summary>
/// <param name="entity"></param>
public static void UpdateIndexOnEntity(object entity)
{
var indexables = GetIndexableProperties(entity);
if (indexables.Count() == 0) return;

var entityType = entity.GetType();
var entityName = entityType.Name;
var entityIdName = string.Format("{0}Id", entityName);
var entityIdValue = entityType.GetProperty(entityIdName).GetValue(entity).ToString();

Logger.Trace(t => t("Updating FT Index for {0} {1}...", entityIdName, entityIdValue));

var term = new Term(entityIdName);
using (var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
using (var writer = new IndexWriter(FullSearchDirectory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
var doc = CreateDocument(entity, entityIdName, entityIdValue, indexables);
writer.UpdateDocument(term, doc);
Logger.Trace(t => t("Updated the FT index for {0} {1}", entityIdName, entityIdValue));
}
}

/// <summary>
/// Removes the index of the supplied entity.
/// </summary>
/// <param name="entity"></param>
public static void DeleteIndexForEntity(object entity)
{
var indexables = GetIndexableProperties(entity);
if (indexables.Count() == 0) return;

var entityType = entity.GetType();
var entityName = entityType.Name;
var entityIdName = string.Format("{0}Id", entityName);
var entityIdValue = entityType.GetProperty(entityIdName).GetValue(entity).ToString();

Logger.Trace(t => t("Deleting FT Index for {0} {1}...", entityIdName, entityIdValue));
var searchQuery = new TermQuery(new Term(entityIdName, entityIdValue));
using (var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
using (var writer = new IndexWriter(FullSearchDirectory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
writer.DeleteDocuments(searchQuery);
Logger.Trace(t => t("Deleted the FT index for {0} {1}", entityIdName, entityIdValue));
}
}

/// <summary>
/// Creates a new index for the supplied entity.
/// </summary>
/// <param name="entity"></param>
public static void InsertIndexOnEntity(object entity)
{
var indexables = FullText.GetIndexableProperties(entity);
if (indexables.Count() == 0) return;

//the just-persisted entity has full-text indexable fields, so create an indexed document
var entityType = entity.GetType();
var entityName = entityType.Name;
var entityIdName = string.Format("{0}Id", entityName);
var entityIdValue = entityType.GetProperty(entityIdName).GetValue(entity).ToString();

Logger.Trace(t => t("Inserting FT Index for {0} {1}...", entityIdName, entityIdValue));

using (var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
using (var writer = new IndexWriter(FullSearchDirectory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
var doc = CreateDocument(entity, entityIdName, entityIdValue, indexables);
writer.AddDocument(doc);
Logger.Trace(t => t("Inserted the FT index for {0} {1}", entityIdName, entityIdValue));
}
}

/// <summary>
/// Creates new indices for the collection of indexable objects supplied
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="entities"></param>
public static void PopulateIndex<T>(IEnumerable<T> entities) where T : class
{
var entityType = typeof(T);
if (!IsIndexable(entityType)) return;

var entityName = entityType.Name;
var entityIdName = string.Format("{0}Id", entityName);

var indexables = GetIndexableProperties(entityType);

Logger.Info(i => i("Populating the Full-text index with values from the {0} entity...", entityName));
using (var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
using (var writer = new IndexWriter(FullSearchDirectory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
foreach (var entity in entities)
{
var entityIdValue = entityType.GetProperty(entityIdName).GetValue(entity).ToString();
var doc = CreateDocument(entity, entityIdName, entityIdValue, indexables);
writer.AddDocument(doc);
}
}
Logger.Info(i => i("Index population of {0} is complete.", entityName));
}

private static Document CreateDocument(object entity, string entityIdName, string entityIdValue, IEnumerable<PropertyInfo> indexables)
{
var doc = new Document();
Logger.Trace(t => t("Adding {0} of {1}...", entityIdName, entityIdValue));
doc.Add(new Field(entityIdName, entityIdValue, Field.Store.YES, Field.Index.NOT_ANALYZED));
foreach (var property in indexables)
{
var propertyName = property.Name;
var propertyValue = property.GetValue(entity).ToStringOrEmpty();
Logger.Trace(t => t("Adding property '{0}'...", propertyName));
doc.Add(new Field(propertyName, propertyValue, Field.Store.YES, Field.Index.ANALYZED));
}
return doc;
}

Giving props where props are due, a lot of this logic and code was derived from Mikhail Tsennykh's blog here. The class above allows me to insert, update, delete, clean up, and search on the Lucene repository, and also determine which entities/ properties are indexable.  Note that I created an extension method .ToStringOrEmpty() to address potential null values.

Now we have to tell NHibernate that whenever an entity is changed, the Lucene repository should be similarly updated.  We do this by implementing the IPostUpdateEventListener, IPostDeleteEventListener, and IPostInsertEventListener interfaces in the NHibernate.Event namespace, and then attaching it to the NHibernate configuration.  Sound complicated?  It's shockingly simple:


public class LuceneFTIndexEventListener : IPostUpdateEventListener, IPostDeleteEventListener, IPostInsertEventListener
{
private static readonly Common.Logging.ILog Logger = Common.Logging.LogManager.GetCurrentClassLogger();

/// <summary>
/// Updates the Full-Text index on the entity.
/// </summary>
/// <param name="e"></param>
public void OnPostUpdate(PostUpdateEvent e)
{
FullText.UpdateIndexOnEntity(e.Entity);
}

/// <summary>
/// Removes Full-Text Indices on deleted entities.
/// </summary>
/// <param name="e"></param>
public void OnPostDelete(PostDeleteEvent e)
{
FullText.DeleteIndexForEntity(e.Entity);
}

/// <summary>
/// Adds a Full-Text index on the entity.
/// </summary>
/// <param name="e"></param>
public void OnPostInsert(PostInsertEvent e)
{
FullText.InsertIndexOnEntity(e.Entity);
}
}


That's all there is to the implementation of the interfaces.  To attach to the NHibernate configuration, in your NHibernate helper class (where you instantiate your ISessionFactory) you'll need to put something like this in:


//collect database-level info from web.config file
Logger.Debug(d => d("Collecting database information from the configuration file..."));
var webcfg = new Configuration();
webcfg.Configure();

//add the Lucene-based Full-Text Search population interceptors.
Logger.Debug(d => d("Attaching Full-Text engine listeners..."));
webcfg.SetListener(NHibernate.Event.ListenerType.PostUpdate, new LuceneFTIndexEventListener());
webcfg.SetListener(NHibernate.Event.ListenerType.PostInsert, new LuceneFTIndexEventListener());
webcfg.SetListener(NHibernate.Event.ListenerType.PostDelete, new LuceneFTIndexEventListener());


OK, at this point, the Lucene Full-Text Index will be populated and maintained.  Great!  Only one thing left to do, and that's actually use it for full-text searching.  In my Address repository class, I create a method like so:


public IList<Address> Search(string searchString, int maxRows)
{
if (maxRows <= 0) throw new ArgumentOutOfRangeException("Max Rows must be greater than zero.");
if (string.IsNullOrEmpty(searchString)) return GetFirst(maxRows);

return _Session
.CreateCriteria(typeof(Address))
.Add(Restrictions.In("AddressId", FullText.Search(searchString, typeof(Address), maxRows)))
.List<Address>();
}

Then, by invoking the Search method, I will collect a set of record Id's from the Full Text engine that correspond to the search text submitted.  I then collect the actual entities from the database using the ids.

That's all there is to it!