Monday 28 November 2011

Converting Bitmaps to Excel

Introduction
After having dinner I was browsing the internet and noticed someone making a joke about wasting time in an Excel course by using the grid to draw an image. As I'm interested in learning more about working with Excel automation I thought this would be an ideal learning exercise as working with Bitmaps is already simple.

Excel 2010 Addin
My initial approach was to write an addin driven by a button on the ribbon which would allow the user to select  a file and then convert that to an array and set the colour for each cell. I succeeded in this goal but came to the following conclusions

  1. I am missing something quite fundamental about COM Interobability with Excel or it really is needlessly irritating.
  2. It performed terribly so I'm clearly doing something wrong.
  3. There's no decent documentation anywhere.
  4. It restricts you to working with certain versions of Excel.
However, I had proved to myself that the logic was sound and as there wasn't anything on telly I decided to revert back to writing a command line application.

Command Line Application
I already know that you can generate an XML file that Excel will load and display having come across it when maintaining a COBOL program that built an XML file for Excel using only string concatenation. As I was well aware of the horror of this approach my first instinct was to find a library to do the work (why reinvent the wheel)?

I found this excellent and easy to use library (which was handy as I couldn't get the CHM help file to open) -CarlosAg Excel Xml Writer Library. This meant that I could concentrate on just getting a quick and dirty implementation up and running.

I knew I had the following constraints
  • Only 255 columns
  • Only 4000 unique colours
This meant that the process should be a simple
  1. Resize the image if necessary
  2. Identify every distinct colour
  3. Generate a style for each colour
  4. For each pixel in the image create a cell with the appropriate style.
Resizing the image is very easy with .Net


   1:          private void Resize(int width, int height)
   2:          {
   3:              Bitmap b = new Bitmap(width, height);
   4:              Graphics g = Graphics.FromImage((Image)b);
   5:              g.InterpolationMode = InterpolationMode.HighQualityBicubic;
   6:              g.DrawImage(image, 0, 0, width, height);
   7:              g.Dispose();
   8:          }


Identifying the unique colours was even simpler with LINQ as you only need the following command to get all distinct values.





   1:  s = k.Distinct().ToArray();

Generating the styles was just a case of iterating through the distinct array and adding each style to the workbook.

The Application
The application can be downloaded from here and is very simple to use.

From the commandline it's just

BitmapToExcel <filename>

This will create an xml file with the same name as your image which you may need to rename to .xls to get Excel to initially open it.

Friday 25 November 2011

Meego Linux - A Beautiful Failure

I have an Dell Notebook (Dell Mini 10V) which was the perfect device for a particular problem we had, that need has now passed and I have a device which isn't powerful enough to replace a laptop but still seems too good to not use.

It was dual booting Windows 7 (Starter edition) and Ubuntu 10 (Notebook remix) without a huge amount of success. I felt that Windows 7 was too slow on the device and Ubuntu was rather poor compared to how well it works on the desktop.

It was time to find a new distro.

I initially tried XUbuntu (alternate) but the installer was very iffy and I didn't try to download the normal Xubuntu iso. I then moved to Meego which upon initial installation was absolutely amazing, it was quick to install, looked great and was responsive with a really nice user interface. I thought I'd found the holy grail of lightweight Linux distributions as it really seemed to understand what people use a netbook for and everything was so well integrated, you could really see where the developers had learnt from mobile device development. Unfortunately Meego has a massive failing and that is the fact that it's completely single user, something which may make sense in a mobile device but takes real computers back to the mid 90s.

A computer which integrates your social media accounts and email so tightly with the user interface can only make sense as a single user platform in something like a mobile phone, netbooks and laptops are commonly used by multiple people and multi user support is a very basic requirement.

I've now switched to Easy Peasy Linux which while based on Ubuntu is a lot snappier and has (I think) a much better user interface.

Friday 18 November 2011

Excel (2010) Automation with C# (Continued)

In my last post I left it with the problem with how do you know that a cell value received as a double needs to be treated as a date. At this point in time I haven't found a complete solution but have found a sufficient solution for my problem.

Parameters in my command are identified as being curly braces surrounding a type and a column reference

e.g. {S2} refers to a string in column 2, while {N4} refers to a number in column 4 (The row is the same as command)

As I know what I'd like to be formatted as a date it was just a case of adding a new type identifier D for date so that I now have {D3} to refer to a date in column 3. I can then use DateTime.FromOADate() to parse the double and return a date time. This can then be output as a yyyy/mm/dd string in the parameter substitution

DateTime dt = DateTime.FromOADate(r.Value2);

Future Changes
Currently the command is processed as soon as you leave the cell, what I'd like to do is to be able to write the commands and then  select a range of cells and choose to convert them at that point. This way I could make sure I was happy with the entire thing before I lost all my parameter markers.

The other sensible change may be to  leave the initial string along and have the actual fully substituted string returned in the next column across, This way you don't lose your original command.

Thursday 17 November 2011

Excel (2010) Automation with C#

Introduction
When implementing and supporting systems you don't always stick with the cleanest and most elegant solution and one of my common tricks is downright ugly. At least twice a week I find it's necessary to update a set of data from a spreadsheet where it doesn't always make sense to bulk import it into a temporary table. Generally I find the fastest solution is just to use the string concatenation in Excel  and write something like this.

="UPDATE Employee Set Email = '" & A1 & "' Where Username = '" & A2 &"'"

The problem with this is

  • It can be quite fiddly trying to work out where you made a small mistake and missed a ' or ".
  • It's not very usable.
Idea
I wanted to write something which would be similar to how parameters can be bound to a dynamic piece of SQL e.g.

UPDATE Employee Set Email = {1} Where Username = {2}

Then from the type of value in the cell infer whether or not it needs to be wrapped in quotes.

The immediate solution I thought of was the Visual Studio Tools for Office (VSTO) and I'd write an Addin in C#.

Considering I've never written an Addin this was a pretty stupid thought for a first idea but I decided to carry on.

Stumbling Block One
I should have realised this sooner, all communication with Excel must be done through the COM Primary Interop Assembly interface. Oh good, yet another area I have no experience with. This actually turned out now to be a major issue until a little while later.

First Impressions
This is pretty easy, register an event handler so that I can see when a cell changes and work with it.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Xml.Linq;
using Excel = Microsoft.Office.Interop.Excel;
using Office = Microsoft.Office.Core;
using Microsoft.Office.Tools.Excel;


namespace ExcelSQLFormatter
{
    public partial class ThisAddIn
    {
        private void ThisAddIn_Startup(object sender, System.EventArgs e)
        {
            this.Application.SheetChange += new Excel.AppEvents_SheetChangeEventHandler(Application_SheetChange);
        }

        void Application_SheetChange(object Sh, Excel.Range Target)
        {
            Debug.WriteLine("AppEvents_SheetChangeEventHandler");
        }
           private void ThisAddIn_Shutdown(object sender, System.EventArgs e)
        {
        }

        #region VSTO generated code

        /// <summary>
        /// Required method for Designer support - do not modify
        /// the contents of this method with the code editor.
        /// </summary>
        private void InternalStartup()
        {
            this.Startup += new System.EventHandler(ThisAddIn_Startup);
            this.Shutdown += new System.EventHandler(ThisAddIn_Shutdown);
        }

        #endregion
    }
}

Stumbling Block Two
The main problem I currently have and haven't solved is that when I read the value from a cell it's either a System.String or a System.Double which is fine until I have a date value.

Using .NumberFormat is not reliable as it returns m/d/yyyy for the date '31/12/2011'

Interim Code Solution
The following is where I'm leaving this experiment as I don't want to spend more time on something that will save me less than 10 minutes a week. It's been an interesting look at how you can automate Excel but what it's made really clear is that the documentation available is poor especially around PIA.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Globalization;
using System.Threading;
using System.Text;
using System.Xml.Linq;
using Excel = Microsoft.Office.Interop.Excel;
using Office = Microsoft.Office.Core;
using Microsoft.Office.Tools.Excel;
using System.Text.RegularExpressions;

namespace ExcelSQLFormatter
{
    public partial class ThisAddIn
    {
        string[] letters = new string[27] { "", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U","V","W","X","Y","Z" };
        private void ThisAddIn_Startup(object sender, System.EventArgs e)
        {
            this.Application.SheetChange += new Excel.AppEvents_SheetChangeEventHandler(Application_SheetChange);
        }

        void Application_SheetChange(object Sh, Excel.Range Target)
        {
            //Stop handling events while I change
            this.Application.SheetChange -= new Excel.AppEvents_SheetChangeEventHandler(Application_SheetChange);
            HandleSQL(Sh, Target);
            //Restore handling events
            this.Application.SheetChange += new Excel.AppEvents_SheetChangeEventHandler(Application_SheetChange);
        }

        private void HandleSQL(object Sh, Excel.Range Target)
        {
            /*A SQL String to format will be passed in the following format
             * {Si} String in column i
             * {Ni} Number in column i
             * e.g. Update Employee Set Username={S1} where EmployeeID={N2}
             */
            string FindTagPattern = @"{[S|N][0-9]*}";
            string FindColumnPattern = @"[0-9]";

            string type;
            int col;
            int row;
            string val;
            string sql;
            if (Target.Value2.GetType().ToString() == "System.String")
            {
                sql = (string)Target.Value2;
            }
            else if (Target.Value2.GetType().ToString() == "System.Double")
            {
                double d = Target.Value2;
                sql = d.ToString();
            }
            else
            {
                sql = Target.Value2.GetType().ToString();
            }

            MatchCollection matches = Regex.Matches(sql, FindTagPattern);
            foreach (Match m in matches)
            {
                row = Target.Row;
                col = int.Parse(Regex.Match(m.Value, FindColumnPattern).Value);
                type = m.Value.Substring(1, 1);
                Excel.Range r = Globals.ThisAddIn.Application.get_Range(letters[col]+row.ToString(), missing);
                if (r.Value2.GetType().ToString() == "System.String")
                {
                    val = (string)r.Value2;
                }
                else if (r.Value2.GetType().ToString() == "System.Double")
                {
                    double d = r.Value2;
                    val = d.ToString();
                }
                else
                {
                    val = r.Value2.GetType().ToString();
                }
                if (type == "N")
                {
                    sql = sql.Replace(m.Value, val);
                }
                else if (type == "S")
                {
                    sql = sql.Replace(m.Value, "'" + val + "'");
                }
            }
            Target.Value2 = sql;
        }

           private void ThisAddIn_Shutdown(object sender, System.EventArgs e)
        {
        }

        #region VSTO generated code

        /// <summary>
        /// Required method for Designer support - do not modify
        /// the contents of this method with the code editor.
        /// </summary>
        private void InternalStartup()
        {
            this.Startup += new System.EventHandler(ThisAddIn_Startup);
            this.Shutdown += new System.EventHandler(ThisAddIn_Shutdown);
        }

        #endregion
    }
}

Tuesday 15 November 2011

Installing Psycopg on Mac OS X (Lion) - Part 2

I now know that all the problems had been caused by architecture mismatches and I found some very good instructions by Pablo Casadro.

While his instructions listed all the steps and missteps he took I've decided to just list what I did and how it worked.


export PATH=$PATH:/Library/PostgreSQL/9.1/bin/
  • Set env to correctly mark the architecture
sudo env ARCHFLAGS="-arch i386 -arch x86_64"

  • Used Easy Install to install the package
sudo easy_install psycopg2
This was then tested by seeing if I could import the module and finally I tested it by seeing if Django would start

python manage.py runserver 8080
This confirmed everything was working and I could now find the next area to be frustrated with.

Installing Psycopg on Mac OS X (Lion) - Part 1

Introduction
As I had decided on PostgreSQL, Python and Django the first step was to get Django working and talking to the database (already correctly setup). On any sane platform this would have been a five minute job, not so for me.

Fink - Part 1
I need to install Psycopg2, go to the website and look at the instructions. This should be simple I can use Fink, oh no I can't I installed Fink before I upgraded and now it does work. OK, remove Fink and re-install and boot strap it (should be simple). Crap, for some reason I no longer have a C compiler. On to attempt two.

Mac Ports - Part 1
Nope, Mac Ports won't install it either as I need to upgrade. Ho hum... of course to install Mac Ports I now need to download XCode <tap fingers while it downloads>. Great, Mac Ports is being all whiny and won't work with some dependency problem; however, as I now have a C compiler I'll go back to Fink.

Fink - Part 2
After a long and tedious install session Fink claims to have installed Psycopg2. The Python interpreter disagrees with this as can't find the module.

ImportError: No module named psycopg
This is clearly some sort of path issue as Fink has placed it in

/sw/lib/python2.7/site-packages/psycopg2

As Mac OS X already has Python installed I decided that a symbolic link would probably be sufficient

cd /Library/Python/2.7/site-packages/
sudo ln -s  /sw/lib/python2.7/site-packages/psycopg2 psycopg2


I was wrong, while I could still not import psycopg any import of psycopg2 caused a segmentation fault (11).

Running the python2-7 executable in the /sw/bin directory (created by fink) showed that I could import the module without error. This would seem to indicate that the problem lies in how the package was built.


I'm not happy about replacing the standard Python that Apple installed with the fink version so I'm going to look at installing it myself.




Starting a New Project

Introduction
I'm starting a new project which is hardly revolutionary in what it is, a dynamically generated web site powered by a relational database with Latex being used to generated formatted PDFs. That's as much as I'm going to write about the project for now.

The problem is that this is a low capital out of hours project and this means my toolkit is considerably different to what I'd use at work. If I were planning this using any tools I wanted I'd most likely be using Microsoft SQL Server as the database, Windows 2008R2 as the operating system and a mix of C# and ASP.Net all developed in Visual Studio 2010.

There is a slight problem with this, two in fact. One: I can't afford to licence all that software for home use, two: I'm not going to use any work equipment or software. As annoyances best come in threes I'm also deciding to make sure I don't use any technologies used by my employers as this creates a complete separation between the two pieces of work.

What I'm Going to Use
Database
I'm not expecting massive traffic and coming from a relational database background I have no real interest in trying a noSQL solution just for the sake of it. This leaves two real contenders in the running MySQL and PostgreSQL, after a complete lack of careful consideration I went with PostgreSQL for the following reasons.
  1. MySQL has traditionally been a bit of a lightweight database (although now much better), PostgreSQL has more of a reputation as a serious database
  2. I like the PostgreSQL documentation style more than that of MySQL
  3. PostgreSQL has better ACID compliance
Language
This was the hardest choice as coming from Java and then moving into COBOL and then to C# I had a whole host of languages all needing something more expensive than a cheap hosting package.
The obvious choice was PHP which for small projects seems to be the de-facto language of web development. I don't yet have a deep hatred of PHP as I've not needed to use it for more than trivial applications, but research seemed to suggest that after any amount of time using it I would grow to hate it. This left (as far as I was concerned) Perl and Python and given a choice between line noise and copious amounts of white space I decided on Python. This should be interesting as I've never actually done anything serious in Python.

Framework
While I could do the entire thing without a framework it'd feel very odd. Mere minutes of research pushed me to Django as it looks like it will do the job.

Development Environment
As I no longer own a Linux box (something which must be rectified) I'm going to be using an iMac running Mac OS X Lion. I've got XCode, Emacs and a good terminal emulator; as soon as I get code completion working in Emacs everything should be sorted.