Sudoku Game on Azure

This is a game I am currently developing.

It is built with ASP.NET MVC (C#) and deployed on Windows Azure.

The game can generate and solve puzzles and gives hints.

The player can enter numbers into the cells using the number buttons.

The player can check for errors whenever they want. The first error is highlighted.

There is an undo button.

It cannot yet give hints for every solving strategy. When the simpler strategies are exhausted, it falls back to a brute-force search to solve the puzzle.

Working on: Showing pencil marks. Better hints/coaching for advanced strategies.


Thinking Fast and Slow by Daniel Kahneman

http://www.borders.com.au/ebook/thinking-fast-and-slow/30822254/

I read this while on holiday and could not put it down. It presents a compelling unifying theory that describes and explains the psychology of ordinary people’s decision making, especially where and why we are not always logical.
There are many practical aspects that can be applied to project estimation, sales and relationship building, but mostly I found this book interesting because it’s full of the fascinating quirks and foibles of the human mind. These are laid bare through surveys and experiments that reveal effects such as anchoring, priming, re-framing, time-insensitivity and many other delights.


SSIS Configurations – Best Practice

SSIS provides flexible and varied methods for configuring its packages. These allow operators or developers to change a package’s behaviour at run time to adapt to changing environments.

While SSIS is flexible, I personally found there was a lack of guidance as to the best choices. It took some time and a good deal of trial and error to find techniques that worked best. This article aims to distil that experience.

Only Create Configurations for Changeable Items

Create configurations for:

  1. Server names. For instance database, SMTP and FTP servers.
  2. File shares. (e.g. \\myserver\myshare)
  3. Operator parameters

Use configurations for items that are likely to change when the operating environment changes: for instance, when a system is moved to test or production, or when a server name changes.

Do not create configurations for items that could conceivably change but are unlikely to do so. For instance there would normally be no need to create configurations for database names or the relative paths to individual files.

Paths to Files should be an Expression

Usually a connection manager for a file path should use an expression that concatenates a configurable file share with a relative path.
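For example, a flat file connection manager’s ConnectionString property might carry an expression such as @[User::RootFileShare] + "\\Sales\\DailyExtract.csv", where RootFileShare is the configured variable (the variable and path names here are only illustrative).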

Configurations should be re-used

When a number of SSIS packages reference the same servers and file shares, the configurations of these packages should be coordinated so that each logical server or file share has a single configuration that applies to all relevant packages. Note the term logical server: sometimes a single server is used for multiple purposes, for instance to house sales data and logging data. In this case a judgement call must be made on how likely it is that logging and sales will stay on the same server in all environments. If it is likely that they will be deployed on separate servers in some environment (e.g. test), then there should be a separate logical sales server and logging server.

Avoid creating lots of logical servers – only do so if it is probable that a separate server will be required.

This requires planning, but pays off in simpler management of the packages at execution time.

It is usually better to use configurations to set variable values rather than setting the connection manager properties directly.

Each configuration should set the value of a variable. This allows multiple connection managers to use the configured variable. For instance multiple file connection managers can share a common root file share via a variable. Similarly multiple database connections of different types can share the same server name via a variable.
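For example, an OLE DB connection manager can carry a property expression on its ServerName property, and an SMTP connection manager on its SmtpServer property, both referencing the same @[User::SalesReportingServer] variable (a purely illustrative name).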

Choose the Name with Care

The name of the variable being configured should be chosen with care. The name is global to the entire suite of packages. Do not use the physical server name. Use a logical name where possible. For instance use the name SalesReportingServer. Do not use the name Dev09.

Use Matching Names

Where possible the internal name of the configuration, the variable name and the external name of the configuration should match. This allows people maintaining packages to quickly identify and line up configurations.

The internal name of a configuration is the name given to the configuration within a package. By the external name I mean the name of the filter for a SQL configuration, or the name of the Xml file for an Xml configuration.

Choosing between Xml, SQL and Environment Configurations

SSIS supports 4 main types of configurations:

  • Xml files
  • SQL Server
  • Environment variables
  • Parent Package

There are many options and combinations.

One problem with configurations is that if they are used in an uncontrolled way they can become confusing. It’s best to set, and stick to, some simple standards.

A simple scheme that I have found works well is to use XML configurations located in a standard folder on the C: drive (say C:\SSISConfig). The C: drive is chosen because it’s generally available on servers as well as developers’ machines.

A slightly more sophisticated scheme I have used for one client is a combination of SQL Server configurations and environment variable configurations: the location of the SQL Server (server and database) that contains the main configurations is itself set by environment variable configurations.
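As a sketch of the SQL Server side of this scheme: the package configuration wizard creates a table (by default [dbo].[SSIS Configurations]) and each configuration becomes one or more rows in it, keyed by the configuration filter. The filter, value and package path below are illustrative only.

CREATE TABLE [dbo].[SSIS Configurations] (
    ConfigurationFilter nvarchar(255) NOT NULL,
    ConfiguredValue     nvarchar(255) NULL,
    PackagePath         nvarchar(255) NOT NULL,
    ConfiguredValueType nvarchar(20)  NOT NULL
);

-- One row per logical server/file share, shared by every package that uses the same filter
INSERT INTO [dbo].[SSIS Configurations]
    (ConfigurationFilter, ConfiguredValue, PackagePath, ConfiguredValueType)
VALUES
    ('SalesReportingServer', 'TESTSQL01',
     '\Package.Variables[User::SalesReportingServer].Properties[Value]', 'String');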

Use Parent Package Configurations for Passing Parameters

Parent package configurations allow a child package value to be overridden by the calling parent package. This mechanism allows parent packages to pass “parameters” to children. For instance a child package may export all data for a particular catalog and so takes the catalog ID from the parent. The parent package may have a loop and iteratively call the child passing different values for the catalog ID.

Parent configurations should not generally be used to set connection manager properties, such as server, database and file shares. One reason to avoid using parent package configuration for connection manager properties is that validation occurs before parent package configurations are set, so validation errors could occur in the child package before the “correct” value comes from the parent.



A Technique for Auditing User Initiated Batch Updates on SQL Server

Overview

Auditing of user actions is a common business requirement. Typically users change data on a record-at-a-time basis using an interface that allows maintenance of a single record. In this case each change results in a single entry in an audit log. Sometimes, however, a user will initiate a batch job that results in a change to many records at a time. This document describes a design pattern for auditing in the latter situation.

To distinguish this from other auditing techniques I will call this snapshot auditing.

High Level Design

The snapshot audit design pattern uses a single generic table to hold a record of the user action and a snapshot table for each table that is being audited.

In the examples that follow the single table is called Ctrl.LogSnapshot and the individual snapshot tables are called Ctrl.LogSnapshot<tableName> where <tableName> is the name of the table being audited.

A single generic stored procedure call can be made to create the required log entries in both tables.

The same stored procedure also implements simple archiving. Currently, snapshots are removed when they are over one month old and at least three more recent snapshots exist for the same table.

Example

For example, suppose a user is making changes to a product mapping table dbo.MyProductMapping. The table maps an external product pack code to an internal product code, which allows external transactions to be processed and matched against the internal product database. The batch job first validates all codes, and if everything is OK it updates MyProductMapping and creates the logs.

We have 2 tables to hold the logging:

  • Ctrl.LogSnapshot – holds a record for each user action
  • Ctrl.LogSnapshotMyProductMapping – holds a copy of the entire dbo.MyProductMapping table for each snapshot

The relationship between the two tables is simple: each row in a snapshot table carries the Snapshot ID of the Ctrl.LogSnapshot entry that created it. This relationship should not be enforced as a foreign key on the database, because of the necessary archiving of the snapshot logs described later.
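A minimal sketch of the two tables is shown below; the column names and types are inferred from the examples that follow, so treat them as illustrative rather than definitive.

CREATE TABLE Ctrl.LogSnapshot (
    LogSnapshotID    int IDENTITY(1,1) PRIMARY KEY,
    SnapshotDateTime datetime     NOT NULL,
    SnapshotUser     varchar(50)  NOT NULL,
    SnapshotTable    varchar(100) NOT NULL,
    [Message]        varchar(100) NULL
);

-- One of these per audited table: the columns of the source table plus SnapshotID.
-- SnapshotID holds the LogSnapshotID of the parent row but is deliberately not a foreign key,
-- so that old snapshots can be archived from either table independently.
CREATE TABLE Ctrl.LogSnapshotMyProductMapping (
    SnapshotID   int          NOT NULL,
    RowNumber    int          NOT NULL,
    PackCode     varchar(20)  NOT NULL,
    InternalCode int          NOT NULL
);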

Data from LogSnapshot

Snapshot ID | Snapshot Time    | User  | Table                 | Message
12          | 2011-06-17 17:02 | USER1 | ProductMarketHierachy | Loaded file ProductMarket.xls
13          | 2011-06-17 17:34 | USER1 | MyProductMapping      | loaded file ExportMappedProductsFixed.xls
14          | 2011-06-20 11:21 | USER2 | PolicyTolerance       | loaded file ExportTolerances.xls
15          | 2011-06-20 11:30 | USER2 | PolicyTolerance       | Loaded file NewTolerances.xls
16          | 2011-06-20 11:33 | USER3 | PolicyTolerance       | Loaded file NewTolerances.xls
17          | 2011-06-20 11:33 | USER2 | PolicyTolerance       | loaded file NewTolerances.xls
18          | 2011-06-20 11:51 | USER4 | CustomerMapping       | loaded file ExportMappedCustomers.xls
19          | 2011-06-20 11:55 | USER1 | MyProductMapping      | loaded file ExportMappedProductsFixed.xls

Data from LogSnapshotMyProductMapping

This table only contains data from snapshots #13 and #19.

Snapshot ID | Row Number | Pack Code | Internal Code
13          | 1          | AAX11     | 100391
13          | 2          | AAY11     | 100392
13          | 3          | BBC22     | 100393
…
19          | 1          | AAX11     | 100391
19          | 2          | AAY11     | 100392
19          | 3          | BBC22     | 100395
19          | 4          | CCZ22     | 100396

An advantage of snapshot logging is that it is decoupled from the actual updates themselves.

Other logging methods to which this can be compared include Triggers and Envers.

Here, Triggers means creating database triggers on the target table so that INSERT, UPDATE or DELETE statements result in the creation of an audit entry.

Envers (http://docs.jboss.org/hibernate/envers/3.6/reference/en-US/html_single/) is an elegant auditing solution that is part of the Hibernate ORM tool.

Trigger auditing suffers from several problems:

  1. Unwanted log entries are created if the table being audited is the target of a bulk update that should not be logged (for instance replication or data synchronisation).
  2. The business user is often not available to the trigger, because the connection is made using some generic principal.
  3. It is not usually possible to have a separate central logging table that holds a record per transaction or batch.

Envers only works for updates that occur through Hibernate, hence is not applicable to batch updates that use stored procedures.

The Stored Procedure

Here are the highlights from the generic stored procedure that does the work.

First, how is it called?

A simpler test from T-SQL is:

EXEC usp_CreateLogSnapshot 'MYDOM\USER01', 'Loaded file MyProductMappingTest.xls', 'dbo.', 'MyProductMapping', null, null;

SELECT * FROM Ctrl.LogSnapshot;

SELECT * FROM Ctrl.LogSnapshotMyProductMapping;

So, what is in the procedure?

ALTER PROCEDURE [dbo].[usp_CreateLogSnapshot](
    @LoginName      varchar(50),
    @message        varchar(100),
    @SourcePrefix   varchar(100) = '',
    @SourceTable    varchar(100) = null,
    @ExcludeColumns varchar(1000) = null,   -- comma separated list of cols to exclude from copy
    @WhereClause    varchar(8000) = null
)
AS
/******************************************************************************
**  Creates a log record in Ctrl.LogSnapshot
**  Copies a snapshot of records from table @SourcePrefix.@SourceTable to LogSnapshot+@SourceTable
**      i.e. if @SourceTable is MyProductMap then the records are copied to Ctrl.LogSnapshotMyProductMap
**  Return values: none
******************************************************************************/

The first part of the procedure sets up some variables and checks for the existence of the target logging table.

    SET NOCOUNT ON;

    DECLARE @SnapshotID int;
    DECLARE @tablename sysname, @fulltablename sysname, @objectid int;

    SELECT @tablename = 'LogSnapshot' + @SourceTable;   -- This is the table that is populated with a new snapshot
    SELECT @fulltablename = 'Ctrl.[' + @tablename + ']';
    SELECT @objectid = object_id(@fulltablename);

    IF @objectid IS NULL
    BEGIN
        RAISERROR('The table %s does not exist. It needs to be created in order for the log snapshot to work', 16, 1, @fulltablename);
        RETURN;
    END

The main part of the procedure is wrapped in a TRY...CATCH block that, on any failure, rolls everything back and re-raises the error.

    DECLARE @SQLString varchar(max),
            @ColList   varchar(max);

    BEGIN TRY
        -- Wrap extra commas around the start and end of the exclude column list
        SELECT @ExcludeColumns = ',' + coalesce(@ExcludeColumns, '') + ',';

        BEGIN TRANSACTION;

        -- The main insert into Ctrl.LogSnapshot is ordinary SQL
        INSERT INTO Ctrl.LogSnapshot (SnapshotDateTime, SnapshotUser, SnapshotTable, [Message])
        SELECT getdate(), @LoginName, @SourceTable, @message;

        -- @SnapshotID will be used in the dynamic insert
        SELECT @SnapshotID = SCOPE_IDENTITY();

        -- Dynamic SQL generation
        -- Use dynamic SQL to generate INSERT INTO @fulltablename
        SET @ColList = '';

        -- Get column list and create insert statement
        SELECT @ColList = @ColList + '[' + syscolumns.name + '],'
        FROM dbo.sysobjects
        JOIN dbo.syscolumns ON sysobjects.id = syscolumns.id
        WHERE sysobjects.xtype = 'U'
            AND sysobjects.name = @tablename
            AND syscolumns.iscomputed <> 1
            AND syscolumns.status <> 128
            AND @ExcludeColumns NOT LIKE '%,' + syscolumns.name + ',%'
            AND syscolumns.name <> 'SnapshotID'
        ORDER BY syscolumns.colorder;

        -- Strip last ','
        SET @ColList = left(@ColList, len(@ColList) - 1);

        SET @SQLString = N'INSERT INTO ' + @fulltablename
            + char(13) + ' (SnapshotID,' + @ColList + ')'
            + char(13) + ' SELECT ' + convert(varchar(10), @SnapshotID) + ','
            + char(13) + @ColList
            + char(13) + ' FROM ' + @SourcePrefix + '[' + @SourceTable + '] '
            + coalesce(@WhereClause, '');

        -- Execute the dynamic Insert statement
        EXEC (@SQLString);

        -- Archiving
        -- This removes old entries from @fulltablename.
        -- At the moment the policy is:
        --   we want to keep at least 3 newer snapshots, and
        --   we must keep all snapshots for at least 1 month.
        DECLARE @NewestSnapshotIdToArchive int;

        SELECT @NewestSnapshotIdToArchive = max(Ranked.LogSnapshotID)
        FROM
            (
                SELECT LogSnapshotID,
                       Rank() over (ORDER BY LogSnapshotID desc) as [Rank],
                       SnapshotDateTime
                FROM Ctrl.LogSnapshot
                WHERE SnapshotTable = @SourceTable
            ) Ranked
        WHERE
            -- must be at least 3 newer snapshots
            Ranked.[Rank] > 3
            -- and must be at least 1 month old
            AND Ranked.SnapshotDateTime <= dateadd(month, -1, getdate());

        IF @NewestSnapshotIdToArchive IS NOT NULL
        BEGIN
            SELECT @SQLString = 'DELETE FROM ' + @fulltablename
                + ' WHERE SnapshotID <= ' + convert(varchar(10), @NewestSnapshotIdToArchive);

            PRINT 'Delete Statement:';
            PRINT @SQLString;

            -- Execute Delete statement
            EXEC (@SQLString);
        END

        COMMIT;
        RETURN 0;

    END TRY
    BEGIN CATCH
        IF (XACT_STATE()) IN (-1, 1)
            ROLLBACK TRAN;

        EXEC dbo.usp_RethrowError;
        RETURN 1;
    END CATCH


Issues Importing Data from Excel

Sometimes data needs to be imported into systems using an Excel file format.

This article is about understanding the problems encountered when using Excel for data import, and some ways to avoid them.

Let me say straight up that while I love Excel, there are issues with using Excel files for data transfer. In theory it should be easy, but in practice it’s fraught with problems. Typically, Excel is not the best format to use for data transfer. Far better to use a simple, open format such as fixed-width, comma-separated or tab-separated text files, or XML.


Why is Excel used as a data transfer format?

This is usually because users need to use Excel to modify or view the file. Most users know Excel, and it gives them the opportunity to view or modify a file before it’s loaded into another system. This is especially true when the user is a competent spreadsheet user and needs some ability to sort, compare, count or calculate with the data. Often Excel is a cost-effective alternative to having a custom application developed to manipulate data.


So, in some cases the benefits of using an Excel format for data transfer outweigh the problems. But let’s get better acquainted with these problems…


The Problems

Problem #1: Mixed Data Types in a Column Cause Missing Data.

Internally, Excel stores numbers and text as different object types. The Excel OLEDB driver is used by most programs (other than Excel itself) to read Excel files. The OLEDB driver decides whether a column is text, numeric or memo (>255 characters) by examining the contents of the first 8 rows: it treats the column as having the same type as the first non-empty cell. If the first non-empty cell is text, the driver treats the column as a text column and looks only at text cells in that column. Numeric and memo data in a text column is ignored, as is text data in a numeric or memo column, and so on. There is nothing the programmer can do about this apart from bypassing OLEDB and using another access method.


Even SqlServer Integration Services (SSIS) suffers from this problem because its  Excel data source uses the OLEDB driver under the hood.
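The same behaviour can be seen when querying a worksheet directly from T-SQL through the Jet OLEDB provider (the file path and sheet name here are just for illustration, and ‘Ad Hoc Distributed Queries’ must be enabled on the server): any column with mixed types will come back with some cells as NULL.

SELECT *
FROM OPENROWSET(
    'Microsoft.Jet.OLEDB.4.0',
    'Excel 8.0;Database=C:\Data\ProductMapping.xls;HDR=YES',
    'SELECT * FROM [Sheet1$]');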


The problem is documented in http://support.microsoft.com/kb/194124 (PRB: Excel Values Returned as NULL Using DAO OpenRecordset).

The programming method of using Excel ADO is documented in http://support.microsoft.com/kb/278973 (ExcelADO demonstrates how to use ADO to read and write data in Excel workbooks)


Problem #2: Unintended Changes in Structure

Excel is very flexible, and it’s easy for users to add extra worksheets, remove or re-arrange columns, add an extra row at the top, or change the column headings.

Any of these things will cause a data upload to stuff up.

The problem can also occur when data is loaded into Excel from a .CSV or text file, edited and then saved.


Problem #3: Lost leading zeroes

Codes with leading zeroes are common in business systems (e.g. employee ID 0004578). These leading zeroes are silently truncated by Excel when the string 0004578 is typed directly into a cell or into the formula bar: Excel ends up holding the number 4578 instead of the text string 0004578.

Note that leading zeroes will be truncated when data is loaded into Excel from a .CSV or text file and then saved.


So problem #2 and problem #3 are both due to using Excel to edit data, rather than to the Excel file format itself.


Solutions

My first rule is to avoid using the Excel format unless users need to use the Excel tool to edit the data.


Given that Excel must be used,  we will deal with these problems in reverse order.


Problem #2 (Unintended Changes in Structure) & Problem #3 (Lost Leading Zeroes)

These can only be dealt with by user education and good error handling in the data load.

Users can be taught to:

  • Ensure that leading zeroes are captured (by prefixing the value with a single quote, or by formatting the cell as Text)
  • Leave the worksheet name alone
  • Leave the column names alone
  • Don’t introduce extra columns or remove columns


Error handling can attempt to validate that the expected columns are present, that codes match existing master tables where possible, and that values fall within expected ranges.
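As a sketch of this kind of validation, once the spreadsheet has been loaded into a staging table (all of the table and column names below are invented for the example):

-- Reject the upload if any internal code in the staging data is unknown to the product master
IF EXISTS (
    SELECT 1
    FROM staging.ProductMappingUpload u
    LEFT JOIN dbo.Product p ON p.InternalCode = u.InternalCode
    WHERE p.InternalCode IS NULL
)
BEGIN
    RAISERROR('Upload rejected: one or more internal codes do not exist in the product master.', 16, 1);
    RETURN;
END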

The feedback of errors to the correct users can be tricky if the data load is occurring unattended or asynchronously.


Problem #1

In its knowledge base article, Microsoft describes a work-around that involves changing the registry so that more than 8 rows are read to determine the data type. This is not a great solution because there can still be mixed types in a column.
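If I remember correctly, for the Jet driver this is the TypeGuessRows value under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel; increasing it (or setting it to 0 to scan as many rows as the driver allows) makes the guess more reliable but slower, and a genuinely mixed column can still lose data.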

I think you need to either force users to enter data of a single type consistently in every row, or bypass the Microsoft Jet/DAO/OLEDB Excel driver.


Given the problems with the Microsoft OLEDB driver, I feel it’s best to bypass it and find another BIFF reader.


Alternative 1: ExcelDataReader

I recently tried this library available on codeplex. It’s a lightweight and fast library written in C# for reading Microsoft Excel files (’97-2007).

It is open source and works on .Net 2.0 and later as well as mobile and mono.

Look for it at http://exceldatareader.codeplex.com/


I used both the conversion to a DataSet and the direct use of the reader. It worked fine for the two files I was working with (a customer master of about 8000 rows and a product file of about 500 rows). It recognised data even in mixed-type columns, and was able to convert the data whenever C# had a valid cast. This snippet of code shows how I tested this from the file upload click event on an ASPX form.


using Excel; // from http://exceldatareader.codeplex.com/

// It seems necessary to save as a file before using FileUpload1.PostedFile.InputStream
String tempPath = System.IO.Path.GetTempPath() + "\\" + System.IO.Path.GetRandomFileName();
FileUpload1.SaveAs(tempPath);
var stream = FileUpload1.PostedFile.InputStream;

// Reading from a binary Excel file ('97-2003 format; *.xls)
IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
try
{
    // DataSet - Create column names from first row
    excelReader.IsFirstRowAsColumnNames = true;
    DataSet excelDataSet = excelReader.AsDataSet();
    GridViewAll.DataSource = excelDataSet;
    GridViewAll.DataBind();
}
catch (Exception ex)
{
    errorMessage = FileUpload1.FileName + " does not seem to be in Excel (*.xls) format. Error creating or binding dataset from excelReader: " + ex.Message;
}
finally
{
    excelReader.Close();
    System.IO.File.Delete(tempPath);
}


Alternative #2: Using The Office Automation APIs

Using the Office API to read the spreadsheet data gets around the problem of mixed types, and also provides a huge amount of functionality.

However the Office API is not typically chosen to provide an Excel data upload for these reasons:

  • The Office API’s error handling is not suitable for unattended execution (http://support.microsoft.com/kb/257757): “Microsoft does not currently recommend, and does not support, Automation of Microsoft Office applications from any unattended, non-interactive client application or component (including ASP, ASP.NET, DCOM, and NT Services), because Office may exhibit unstable behaviour and/or deadlock when Office is run in this environment.”
  • Office needs to be installed on the machine running the upload. If this is a server then it would not typically have Office installed, and Office requires licensing and service packs.
  • Each version of Office has a different API, so although the load will probably work between versions, there may be some problems with versioning.


Alternative #3: SpreadsheetGear

Another possibility for error-free reading of spread sheets is SpreadsheetGear.

This can be used for unattended execution. This is a great product that has worked well for me in a project that required programmatically pushing values into a spread-sheet to get calculated results.

SpreadsheetGear does require a license, however, and is probably too expensive and feature-rich for what should be just a lightweight utility.


Source Control for Database

Thursday, 17 March 2011

I have spent a lot of time migrating database changes between environments to support the various stages of the software development lifecycle. I have succeeded in automating aspects of this, and it has always been something of a holy grail of mine to create better automation in this area. Now a product has been released that seems to have achieved this, and done it very well.

The Scene

Almost all serious software development now makes use of source control, and many projects use continuous integration. Code written in C# and Java is nicely controlled using products such as SVN and TFS. Developers can view the history of check-ins and releases, see what changes were made, revert to old versions, and so on. Source control also provides the basis for continuous integration using products such as Team City, which marshal scripted builds and tests. Some products, such as TFS, go even further and provide traceability between source changes and project tasks and requirements. All lovely stuff that provides massive productivity gains!

There is however one major component that has been slow in coming to the party: the database.

A lot of software development involves the design and development of database objects such as tables and sometimes views, functions and procedures. In addition to this structure, a database often contains static reference data that needs to be present in order for a system to operate. Examples are transaction types, or user privilege types that need to be stored as records in the database. The changes in the database are typically co-dependent with code changes, so that related changes in code and database should be released at the same time, and the system fails if they are not.

The Problem

So here is the problem:

Today it is rare that database development and release is smoothly integrated with the development and release of code. This lack of integration and standardisation creates a number of difficulties:

  • Integrating database changes into automated build/test cycles is manual or awkward
  • Mistakes often occur, due to poor synchronisation of changes between code and database

Just to be more clear, let’s use an example of a team developing a financial modelling product in C# which stores financial models and client information in a SQL Server database. The team uses Team City, so that when developers check-in a change, the code is automatically built and tested on a server. Check-ins can easily be tagged and a release built and deployed in a test or production environment. Now, a change is made, involving the addition of a new field to the model and a change of another field from integer to string. To support this change, a column is added to the FinancialModel table in the database, and another column is changed from Int to Varchar. When the code change is deployed, the system is broken unless the database changes are also deployed. Conversely if the database changes are deployed, the system is broken until the code changes are deployed.

The database changes would ideally be managed in the same way as code, using source control and automated deployment. But this is not so easy, and here is why:

  • Database objects don’t exist as simple files that can be versioned.
  • Database deployment cannot simply involve the copying and replacement of the target objects, because this may involve loss of user data. For instance, deploying a change to the FinancialModel table cannot involve simply replacing the FinancialModel table, because valuable data would be lost.

Deployment of database changes involves the creation of alter and update scripts. Often these are hand-crafted, although there are tools that can automate the production of database synchronisation/deployment scripts. Examples of these tools are Red-Gate SQL Compare for SQL Server, and Toad and SQL Developer for Oracle.
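For the FinancialModel example above, a hand-crafted deployment script might look something like this (the column names are invented for the illustration):

-- Add the new field introduced by the model change
ALTER TABLE dbo.FinancialModel ADD ScenarioLabel varchar(50) NULL;

-- Change the existing field from integer to string, keeping the data already in it
ALTER TABLE dbo.FinancialModel ALTER COLUMN ReferenceCode varchar(20) NOT NULL;

The matching code change has to be released at the same time, which is exactly the coordination problem described above.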

The Solution

Now there is a new tool that takes the next step and integrates the database with source control. This is an important enabling technology for software developers, because it allows database objects to be included in the continuous build and release cycle that works so well with code.

http://www.red-gate.com/products/sql-development/sql-source-control/

A user review with some screenshots.

http://www.troyhunt.com/2011/02/automated-database-releases-with.html

http://downloads.red-gate.com/HelpPDF/ContinuousIntegrationForDatabasesUsingRedGateSQLTools.pdf

Costs:

Unfortunately, this is not open source. The costs are:

  • SQL Source Control: $295 (a copy is needed for each developer who needs to check changes in and out of the database)
  • SQL Developer Bundle: $1500 (one copy is needed per build server, to push changes to the target database)

Key Features:

  • Fully integrated with SQL Server Management Studio
  • Connect databases to Team Foundation Server (TFS) or Subversion (SVN)
  • Commit new, edited, and dropped objects to source control with a couple of clicks
  • Easily get the latest version of objects from source control
  • View SQL differences between objects in the database and source control
  • Undo (revert) uncommitted schema changes
  • Automate schema deployments with SQL Compare Pro and data deployment with SQL Data Compare Pro
  • View development history for easy access to specific database versions
  • Support for Vault, Vault Pro, Mercurial, Perforce, CVS, Git, and Bazaar due Q1 2011

Limitations:

  • Unfortunately it only works for SQL Server at the moment. (An Oracle version may come out in future, because Red-Gate has Oracle versions of some other key tools.)