RE: repeated conversion of data summarized into rows and columns

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance



Lets see how this works for you. For the moment I did not try to normalize
anything - this is going to write it all into one huge table. You can see
the database name, table name and field names I used in the code. You will
have to change all of that by changing the "Const" values in the code.

This will do the job in one 'pass', so it may take a while to complete.
This may be a task you want to start just some time before you head off to
bed, or leave from work some day. 31 million of anything is a lot of things
to work through.

To put the code into your workbook:
Start by makiing a copy of your workbook to work/test with!! Nothing like
losing 31M rows of data to just ruin your day.
Open the copy of the workbook. Press [Alt]+[F11] to open the VB Editor
(VBE).
From the VBE menu, choose Insert --> Module
Copy the code below and paste it into the empty module presented to you.
Make edits to the Const values to set it up to access your database and a
table set up to receive the data.
Set up the Tools --> References (again from the VBE menu), to reference the
latest
Microsoft ActiveX Objects x.x Library
where 'x.x' is the version number listed - there may be several versions in
the list.

My thanks to www.exceltip.com for providing the code snippet that made up
the heart of this solution. Saved me a lot of time in reviving old memories
on how to connect to Access!

Sub ADOFromExcelToAccess()
'requires a Tools --> References entry to
'Microsoft ActiveX Data Objects x.x Library
' used 6.0 in this application.
'*********************************************
'Redefine these Const values to correspond to*
'the information about your Access database *
'*********************************************
'your full path to the database file
Const fullPathToDB = "X:\IAN_DB\IansUserDB.mdb"
'the name of the table to put the data into
Const tableName = "tbl_RawDataFromExcel"
'the field names in the table
Const UserIDField = "fUser"
Const DateField = "fDate"
Const HourField = "fHour"
Const UseField = "fUsage"
'the first row with user information on the worksheets
Const firstRow = 2 ' assumes row 1 has labels
'end of user definable values

Dim WS As Worksheet
'user from row being examined
Dim currentUser As String
'date from row being examined
Dim currentDate As Date
' used cells in column A
Dim userListRange As Range
' individual cell within userListRange
Dim anyUser As Range
' Column Pointer to work through Hour entries
Dim CP As Integer

' exports data from the active worksheet to a table in an Access database
' this procedure must be edited before use
Dim cn As ADODB.Connection
Dim rs As ADODB.Recordset
Dim r As Long

'assume there will be some work to be done, so
'set up link over to the Access database here
Set cn = New ADODB.Connection
cn.Open "Provider=Microsoft.Jet.OLEDB.4.0; " & _
"Data Source=X:\IAN_DB\IansUserDB.mdb;"
' open a recordset
Set rs = New ADODB.Recordset
' connect to the Access database
' all records in a table
rs.Open "tbl_RawDataFromExcel", cn, _
adOpenKeyset, adLockOptimistic, adCmdTable

'works through ALL worksheets in the workbook
'assuming each has user entries and that no
'other type of information is in the workbook.
For Each WS In ThisWorkbook.Worksheets
'set a reference to the used cells in column A
'of the worksheet
r = WS.Range("A" & Rows.Count).End(xlUp).Row
If r >= firstRow Then ' there is data on the sheet
Set userListRange = WS.Range("A" & firstRow & ":A" & r)
For Each anyUser In userListRange
'skip any gaps in user ID entries
If Not IsEmpty(anyUser) Then
r = anyUser.Row
currentUser = anyUser.Value
currentDate = anyUser.Offset(0, 1) ' from col B
'work through the hourly entries col's C through Z
For CP = 3 To 26 'columns C through 26 = hrs 1-24
'if usage is not zero, add a record to Access
If WS.Cells(r, CP) <> 0 Then
With rs
.AddNew ' create a new record
' add values to each field in the record
.Fields(UserIDField) = currentUser
.Fields(DateField) = currentDate
.Fields(HourField) = CP - 2 ' 3-2=1 ... 26-2=24
.Fields(UseField) = WS.Cells(r, CP).Value
.Update ' stores the new record
End With
End If
Next ' next hour column
End If ' end of empty cell test
Next ' end of anyUser loop
End If ' end of r >= test
Next ' move on to next worksheet
rs.Close
Set rs = Nothing
cn.Close
Set cn = Nothing
Set userListRange = Nothing
Set WS = Nothing
End Sub

"Ian" wrote:

Thanks a lot -- I'll restructure the tables to accomidate the normalization.
I have no idea how to create the code so I appreciate it. I got the db
working at 1.2GB but it was pushing the boundries. I may move it to a blade
server running SQL.

I'll check here but my email is eye (one letter) furst at cdsg dot canada
(suffix)

Thanks again. Ian.


"JLatham" wrote:

I'll work up some code that will take your current entries and put them into
4-field format, that will allow you to pull it into Access and massage it
further.

It's going to have to work in "chunks" because I presume you have all of
this in an earlier version of Excel? It will help if you let me know which
version of Excel you're using.

And, yes the last table is going to have the huge number of records in it.
The savings comes with not having to repeat entire names or dates and such.
If you're user's are identified by something like a social security number,
or almost any other text, then you use a huge amount of memory/disk space
storing each individually. But by just having one copy of that information
and using a long integer to refer to it, you should save a lot of space. I'd
guess that the trade off in query performance between matching a lot of text
entries and an equal number of long integers will actually pay off in enough
performance improvement to make up for having to include the other tables to
retrieve the "human readable" reference to the user IDs and dates involved.

Besides, having user IDs entered only ONCE in the database reduces risk of
typographic error big-time. If you use names, then typos can kill you, and
same even for ssan's (111223333 is not the same as 111232333)

You can ask about it in the Access forum, but I think that they will agree
that good normalization of data 'insists' that you repeat data an absolute
minimum number of times. The structure I outlined keeps user ID and date
entries to an absolute minimum, using references to identify the original
data entry.
"Ian" wrote:

OK - I've got a sample db up and running with 33.5 record sets in Access at
1.2GB and can move the data into a pivot table (with some minor physical
memory issues). What I'm really going to need help with is how to move the
data from the current format into the 4 column record set format. Thanks
again. Ian.


"Ian" wrote:

Thanks for the advice on Access -- I hadn't thought about breaking down the
tables like that but I'll still end up with 31million rows in the UsageTbl
correct? I'll save some space because the dates, userID's are in seperate
table but won't it run slower because the queries will pull on multiple
tables? Maybe I'll leave the userID, Date and Hour in one db as a "lookup"
then each years worth of data in another linked db (keep the record set to
10million or so). I posted on the Access forum to get help in creating a
sample 31million dataset to check for size and functionality.

I was looking for advice on code to pull the data out of the current table
format to covert it into a record set format (with 4 columns). I can import
it in several batches to the db once I can get it into the 4 colum. This
will be a once a year conversion. I have limited VBA skills with forms in
Access but am otherwise inept.

Thanks Ian.

"JLatham" wrote:

Possible double post - system hiccuped.

First of all, 31 million rows of data is a pretty hefty load for Access.
But for any database I'd use 5 tables :
UsersTable 1200 entries for your users individual identification information;
DatesTable 365 or 366 entries for each year and
HoursTable with up to 24 Hours entries per date (you could possibly do away
with this one and store the Hour as part of the 4th table);

The 4th table, UsageTable, would be a table that holds pointers to the
UsersTable, Dates and to the HoursDates table and containing a usage entry.

To add new usage entries, you'd choose User, Date and Hour from 3 drop downs
and enter the usage on a form. An entry in UsageTable would contain nothing
but Long values for the 3 pointers, plus an integer or single field for the
usage. Fields in it would be your 4 fields. Single record in it might look
like:
USER DATE HOUR Usage
1 2 3 5.00
which is the same as your User 1, 2-Jan-2005, (hour) 3 5.00 entry in the
sample table.
That's about as compact as I can envision it, and relatively easy to create
queries into, and it actually adheres to normalization rules.

Now, what exactly are you looking for from this forum with your question of
"Any good ideas about how to unsummarize the data?" Code to move into Access
tables?, Way of reorganizing it to do the same? Something else?


"Ian" wrote:

I have data ("Usage") with hours 1-24 as the column headers and dates Jan
1-Dec 31 in rows 1-365 for user 1, rows 366-733 for user 2, etc... for 1200
users. The year 2005 is on the first sheet, 2006 on the second, etc....

I'm going to clean up the data by putting it into an access db with only 4
fields (Date, Hour, User, Usage) (31 million rows) then connect with Excel to
use pivot table functions.

Any good ideas about how to unsummaryize the data?

Hour
User Date 1 2 3 4 5
1 1-Jan 5.00 6.00 4.00 4.00 10.00
1 2-Jan 3.00 2.00 5.00 9.00 3.00
1 3-Jan 11.00 6.00 8.00 2.00 2.00
1 4-Jan 7.00 5.00 8.00 4.00 10.00
1 5-Jan 2.00 8.00 4.00 9.00 7.00

.



Relevant Pages

  • Re: Windows service and outlook
    ... is stored in a database and the service scans the database for new ... Dim oSyncs As Outlook.SyncObjects ...
    (microsoft.public.dotnet.languages.vb)
  • Re: Is This Possible ... ? Yes - Upload images to an Access database
    ... It looks like your databases folder is outside the root of your web ... I created a database exactly as you said, the only change I made was to ... and underneath an 'upload' button and a 'view images' button. ... Dim con As New Data.OleDb.OleDbConnection ...
    (microsoft.public.dotnet.framework.aspnet)
  • Re: Is This Possible ... ? Yes - Upload images to an Access database
    ... It looks like your databases folder is outside the root of your web ... I created a database exactly as you said, the only change I made was to ... and underneath an 'upload' button and a 'view images' button. ... Dim con As New Data.OleDb.OleDbConnection ...
    (microsoft.public.dotnet.framework.aspnet)
  • Re: Is This Possible ... ? Yes - Upload images to an Access database
    ... I created a database exactly as you said, the only change I made was to ... upload it into the 'databases' folder, and not a folder called 'App_Data', ... and underneath an 'upload' button and a 'view images' button. ... Dim con As New Data.OleDb.OleDbConnection ...
    (microsoft.public.dotnet.framework.aspnet)
  • Re: mailmerge and sql
    ... that is essentially a database application with a document ... the recordset to the Word Template and use it as if I got ... >> using an ADO recordset as a datasource (if it could be ... >Dim oCatalog As ADOX.Catalog ...
    (microsoft.public.word.mailmerge.fields)