I'm trying to grasp how Azure Table storage works so I can create Facebook-style feeds, and I'm stuck on how to retrieve the entries.
(My question is almost the same as https://stackoverflow.com/questions/6843689/retrieve-multiple-type-of-entities-from-azure-table-storage but the link in the answer is broken.)
This is my intended approach:
Create a personal feed for all users within my application which can contain different types of entries (notification, status update etc). My idea is to store them in an Azure Table grouped by a partition key for each user.
Retrieve all entries within the same partition key and pass it to different views depending on entry type.
How do I query the table storage for all types of the same base type while keeping their unique properties?
The CloudTableQuery<TElement> requires a typed entity. If I specify EntryBase as the generic argument, I don't get the entry-specific properties (NotificationSpecificProperty, StatusUpdateSpecificProperty), and vice versa.
My entities:
public class EntryBase : TableServiceEntity
{
public EntryBase()
{
}
public EntryBase(string partitionKey, string rowKey)
{
this.PartitionKey = partitionKey;
this.RowKey = rowKey;
}
}
public class NotificationEntry : EntryBase
{
public string NotificationSpecificProperty { get; set; }
}
public class StatusUpdateEntry : EntryBase
{
public string StatusUpdateSpecificProperty { get; set; }
}
My query for a feed:
List<EntryBase> entries = // how do I fetch all entries?
foreach (var item in entries)
{
if(item.GetType() == typeof(NotificationEntry)){
// handle notification
}else if(item.GetType() == typeof(StatusUpdateEntry)){
// handle status update
}
}
Finally there's an official way! :)
Look at the NoSQL sample which does exactly this in this link from the Azure Storage Team Blog:
Windows Azure Storage Client Library 2.0 Tables Deep Dive
There are a few ways to go about this and how you do it depends a bit on your personal preference as well as potentially performance goals.
- Create an amalgamated class that represents all queried types. If I had a StatusUpdateEntry and a NotificationEntry, I would simply merge their properties into a single class. The serializer will automatically fill in the properties that are present and leave the others null (or default). If you also put a 'type' property on the entity (calculated or set in storage), you can easily switch on that type. Since I always recommend mapping from the table entity to your own type in the app, this works fine as well (the class is then used only as a DTO).
Example:
[DataServiceKey("PartitionKey", "RowKey")]
public class NoticeStatusUpdateEntry
{
public string PartitionKey { get; set; }
public string RowKey { get; set; }
public string NoticeProperty { get; set; }
public string StatusUpdateProperty { get; set; }
public string Type
{
get
{
return String.IsNullOrEmpty(this.StatusUpdateProperty) ? "Notice" : "StatusUpdate";
}
}
}
- Override the serialization process. You can do this yourself by hooking the ReadingEntity event. It gives you the raw XML and you can choose to deserialize it however you want. Jai Haridas and Pablo Castro gave some example code for reading an entity when you don't know the type (included below), and you can adapt that to read specific types that you do know about.
The downside to both approaches is that you end up pulling more data than you need in some cases. You need to weigh this on how much you really want to query one type versus another. Keep in mind you can use projection now in Table storage, so that also reduces the wire format size and can really speed things up when you have larger entities or many to return. If you ever had the need to query only a single type, I would probably use part of the RowKey or PartitionKey to specify the type, which would then allow me to query only a single type at a time (you could use a property, but that is not as efficient for query purposes as PK or RK).
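As a rough sketch of the first option, the amalgamated DTO can be mapped to app-side types right after the query returns. The FeedItem, Notice, StatusUpdate and FeedMapper names are hypothetical and only illustrate the idea; the DTO mirrors the class shown above without the storage attributes, and the rows are built in memory instead of coming from a real query.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical app-side types (not part of any Azure library).
public abstract class FeedItem { public string Id { get; set; } }
public sealed class Notice : FeedItem { public string Text { get; set; } }
public sealed class StatusUpdate : FeedItem { public string Text { get; set; } }

// Same shape as the amalgamated class above, minus the storage attribute.
public class NoticeStatusUpdateEntry
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string NoticeProperty { get; set; }
    public string StatusUpdateProperty { get; set; }
}

public static class FeedMapper
{
    // Chooses the app type from whichever type-specific property is populated.
    public static FeedItem ToFeedItem(NoticeStatusUpdateEntry dto)
    {
        if (!String.IsNullOrEmpty(dto.StatusUpdateProperty))
        {
            return new StatusUpdate { Id = dto.RowKey, Text = dto.StatusUpdateProperty };
        }
        return new Notice { Id = dto.RowKey, Text = dto.NoticeProperty };
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory stand-in for rows returned by a single-partition query.
        var rows = new List<NoticeStatusUpdateEntry>
        {
            new NoticeStatusUpdateEntry { PartitionKey = "user1", RowKey = "001", NoticeProperty = "You have a new follower" },
            new NoticeStatusUpdateEntry { PartitionKey = "user1", RowKey = "002", StatusUpdateProperty = "Out for lunch" }
        };

        foreach (var row in rows)
        {
            FeedItem item = FeedMapper.ToFeedItem(row);
            Console.WriteLine("{0}: {1}", item.Id, item.GetType().Name);
        }
    }
}
```

Keeping the mapping in one place means the views only ever see the app-side types, so the storage shape can change without touching them.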
Edit: As noted by Lucifure, another great option is to design around it. Use multiple tables, query in parallel, etc. You need to trade that off with complexity around timeouts and error handling of course, but it is a viable and often good option as well depending on your needs.
Reading a Generic Entity:
[DataServiceKey("PartitionKey", "RowKey")]
public class GenericEntity
{
public string PartitionKey { get; set; }
public string RowKey { get; set; }
Dictionary<string, object> properties = new Dictionary<string, object>();
internal object this[string key]
{
get
{
return this.properties[key];
}
set
{
this.properties[key] = value;
}
}
public override string ToString()
{
// TODO: append each property
return "";
}
}
void TestGenericTable()
{
var ctx = CustomerDataContext.GetDataServiceContext();
ctx.IgnoreMissingProperties = true;
ctx.ReadingEntity += new EventHandler<ReadingWritingEntityEventArgs>(OnReadingEntity);
var customers = from o in ctx.CreateQuery<GenericEntity>(CustomerDataContext.CustomersTableName) select o;
Console.WriteLine("Rows from '{0}'", CustomerDataContext.CustomersTableName);
foreach (GenericEntity entity in customers)
{
Console.WriteLine(entity.ToString());
}
}
// Credit goes to Pablo from ADO.NET Data Service team
public void OnReadingEntity(object sender, ReadingWritingEntityEventArgs args)
{
// TODO: Make these statics
XNamespace AtomNamespace = "http://www.w3.org/2005/Atom";
XNamespace AstoriaDataNamespace = "http://schemas.microsoft.com/ado/2007/08/dataservices";
XNamespace AstoriaMetadataNamespace = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata";
GenericEntity entity = args.Entity as GenericEntity;
if (entity == null)
{
return;
}
// read each property, type and value in the payload
var properties = args.Entity.GetType().GetProperties();
var q = from p in args.Data.Element(AtomNamespace + "content")
.Element(AstoriaMetadataNamespace + "properties")
.Elements()
where properties.All(pp => pp.Name != p.Name.LocalName)
select new
{
Name = p.Name.LocalName,
IsNull = string.Equals("true", p.Attribute(AstoriaMetadataNamespace + "null") == null ? null : p.Attribute(AstoriaMetadataNamespace + "null").Value, StringComparison.OrdinalIgnoreCase),
TypeName = p.Attribute(AstoriaMetadataNamespace + "type") == null ? null : p.Attribute(AstoriaMetadataNamespace + "type").Value,
p.Value
};
foreach (var dp in q)
{
entity[dp.Name] = GetTypedEdmValue(dp.TypeName, dp.Value, dp.IsNull);
}
}
private static object GetTypedEdmValue(string type, string value, bool isnull)
{
if (isnull) return null;
if (string.IsNullOrEmpty(type)) return value;
switch (type)
{
case "Edm.String": return value;
case "Edm.Byte": return Convert.ChangeType(value, typeof(byte));
case "Edm.SByte": return Convert.ChangeType(value, typeof(sbyte));
case "Edm.Int16": return Convert.ChangeType(value, typeof(short));
case "Edm.Int32": return Convert.ChangeType(value, typeof(int));
case "Edm.Int64": return Convert.ChangeType(value, typeof(long));
case "Edm.Double": return Convert.ChangeType(value, typeof(double));
case "Edm.Single": return Convert.ChangeType(value, typeof(float));
case "Edm.Boolean": return Convert.ChangeType(value, typeof(bool));
case "Edm.Decimal": return Convert.ChangeType(value, typeof(decimal));
case "Edm.DateTime": return XmlConvert.ToDateTime(value, XmlDateTimeSerializationMode.RoundtripKind);
case "Edm.Binary": return Convert.FromBase64String(value);
case "Edm.Guid": return new Guid(value);
default: throw new NotSupportedException("Not supported type " + type);
}
}
Another option, of course, is to have only a single entity type per table, query the tables in parallel and merge the result sorted by timestamp.
In the long run this may prove to be the more prudent choice with reference to scalability and maintainability.
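A minimal sketch of that idea, assuming one query per table that each return rows carrying a timestamp. QueryTableAsync is a hypothetical stand-in that returns canned data; a real version would call table storage and would also need the timeout and error handling mentioned in the other answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical common shape used only for merging the per-table results.
public class FeedRow
{
    public string Kind { get; set; }
    public DateTimeOffset Timestamp { get; set; }
}

public static class FeedMerger
{
    // Stand-in for a per-table query: returns canned rows instead of hitting storage.
    static Task<List<FeedRow>> QueryTableAsync(string kind, params int[] minutesAgo)
    {
        var reference = new DateTimeOffset(2013, 1, 1, 12, 0, 0, TimeSpan.Zero);
        List<FeedRow> rows = minutesAgo
            .Select(m => new FeedRow { Kind = kind, Timestamp = reference.AddMinutes(-m) })
            .ToList();
        return Task.FromResult(rows);
    }

    public static async Task<List<FeedRow>> LoadFeedAsync()
    {
        // Start every query before awaiting so they run concurrently.
        Task<List<FeedRow>>[] queries =
        {
            QueryTableAsync("Notification", 5, 30),
            QueryTableAsync("StatusUpdate", 1, 10)
        };
        List<FeedRow>[] results = await Task.WhenAll(queries);

        // Flatten and sort newest first.
        return results
            .SelectMany(r => r)
            .OrderByDescending(r => r.Timestamp)
            .ToList();
    }

    public static void Main()
    {
        foreach (FeedRow row in LoadFeedAsync().Result)
        {
            Console.WriteLine("{0:HH:mm} {1}", row.Timestamp, row.Kind);
        }
    }
}
```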
Alternatively you would need to use some flavor of generic entities as outlined by ‘dunnry’, where the non-common data is not explicitly typed and instead persisted via a dictionary.
I have written an alternate Azure table storage client, Lucifure Stash, which supports additional abstractions over Azure table storage, including persisting to and from a dictionary. It may work in your situation if that is the direction you want to pursue.
Lucifure Stash supports large data columns > 64K, arrays & lists, enumerations, composite keys, out of the box serialization, user defined morphing, public and private properties and fields and more. It is available free for personal use at http://www.lucifure.com or via NuGet.com.
Edit: Now open sourced at CodePlex
Use DynamicTableEntity as the entity type in your queries. It has a dictionary of properties you can look up. It can return any entity type.
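A short sketch of how that might look, assuming the Storage Client Library 2.x NuGet package (Microsoft.WindowsAzure.Storage). The table name "feed", the connection string and the FeedReader class are placeholders; the property names are the ones from the question.

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

public static class FeedReader
{
    public static void PrintFeed(string connectionString, string userId)
    {
        CloudTable table = CloudStorageAccount.Parse(connectionString)
            .CreateCloudTableClient()
            .GetTableReference("feed"); // placeholder table name

        // One partition per user, as described in the question.
        TableQuery<DynamicTableEntity> query = new TableQuery<DynamicTableEntity>().Where(
            TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, userId));

        foreach (DynamicTableEntity row in table.ExecuteQuery(query))
        {
            // Each row only carries the properties that were stored for it,
            // so the presence of a type-specific property identifies the entry type.
            EntityProperty value;
            if (row.Properties.TryGetValue("NotificationSpecificProperty", out value))
            {
                Console.WriteLine("Notification {0}: {1}", row.RowKey, value.StringValue);
            }
            else if (row.Properties.TryGetValue("StatusUpdateSpecificProperty", out value))
            {
                Console.WriteLine("Status update {0}: {1}", row.RowKey, value.StringValue);
            }
        }
    }
}
```

Storing an explicit type property (or a type prefix in the RowKey) would make the dispatch less fragile than probing for property names.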