Introduction

Data Frame Loader is a simple model for storage of rectangular data originating from csv files into a fixed a schema, that does not require creation of new tables when new schemas need to be persisted. For example when using MS Sql Server data import wizard, it will require you to create new tables for every new schema. Data Frame Loader persists all data frames, regardless of their schema, into a small number (4) of fixed tables.

This approach is particularly useful in a scenario where
  • the database environment is controled, thus making it difficult to create adhoc tables on-demand.
  • there are many different ad-hoc data files that need to be loaded into the database.
  • there is a need to do simple validation of data before it is used.
  • there is versioning of dataframes

Quick Start

  1. Download the bits from codeplex
  2. Build the project (open DataFrame.sln)
  3. Initialize the database (use the scripts in sql/ folder)
  4. Create a sample schema
    1. See supported Storage Data Types page.
    2. For example we can use the included yahoo stocks historical prices sample under doc/sample-data/ge-hist-prices.csv
    3. Setup Schema Info insert into dbo.SchemaInfo (Name) VALUES('YahooPrices') this simply creates a simple name and a unique id for tracking full schema information that will be used in the next step.
    4. Setup Column Info
given data that looks like the small sample below (taken from doc/sample-data/ge-hist-prices.csv)
Date Open High Low Close Volume Adj Close
2/16/2012 18.86 19.03 18.8 19.01 40494300 19.01
2/15/2012 19.01 19.02 18.7 18.76 41380000 18.76
2/14/2012 18.93 19.09 18.8 18.94 39229400 18.94


We initialize the schema as follows:
declare @SchemaId int
select @SchemaId = [Id] from dbo.SchemaInfo where [Name] = 'YahooPrices'

INSERT INTO dbo.ColumnInfo (Name, SchemaId, DataType, [Index])	 
VALUES('Date', @SchemaId, 'DateTime', 0)
INSERT INTO dbo.ColumnInfo (Name, SchemaId, DataType, [Index])	 
VALUES('Open', @SchemaId, 'double', 1)
INSERT INTO dbo.ColumnInfo (Name, SchemaId, DataType, [Index])	 
VALUES('High', @SchemaId, 'double', 2)
INSERT INTO dbo.ColumnInfo (Name, SchemaId, DataType, [Index])	 
VALUES('Low', @SchemaId, 'double',  3)
INSERT INTO dbo.ColumnInfo (Name, SchemaId, DataType, [Index])	 
VALUES('Close', @SchemaId, 'double', 4)
INSERT INTO dbo.ColumnInfo (Name, SchemaId, DataType, [Index])	 
VALUES('Volume', @SchemaId, 'long', 5)
INSERT INTO dbo.ColumnInfo (Name, SchemaId, DataType, [Index])	 
VALUES('Adj Close', @SchemaId, 'double', 6)

  1. Now we are ready to load the data. This can be done in three ways
    1. GUI - a simple Winforms UI is provided.
    2. Console - a simple command line program is provided.
    3. Code - write you own .net code utilizing the DataFrame API.

Here we will use the GUI tool:
  1. Open the GUI tool (Privosoft.DataFrame.UI.exe)
  2. Click Import button in the toolbar
  3. Import Data Wizard dialog will popup. DfLoaderImportDataScreen.PNG
    1. Select the schema of file you are importing: YahooPrices
    2. Select the file you will be importing
    3. Click import. If there are any errors they will be shown in the 3rd pane on the right.

Last edited May 21, 2012 at 10:43 PM by privosoftllc, version 9

Comments

No comments yet.