Importing big MS Access legacy DB into Drupal nodes

Sometimes I need to move old data from a client's legacy site. Last time it was MS Access: 96 MB of CSV data =)

Here is how I managed to import all of this:

1) First, I imported the CSV files as tables into the same database the Drupal site uses. This simplifies working with the data, and if anything gets forgotten you can always go back to the original tables. Once you are sure all the data is in place, just drop the imported tables.

For some reason I couldn't find a ready-made (well, free) tool to import CSV into MySQL, so I wrote a small script that looks for all *.csv files in the current directory, creates a table for each file, and for each CSV column creates a matching MySQL column of type varchar. I also added two extra columns, record_id and import_special, for storing additional info during the import process. For example, when importing rows into nodes, I store the NID of the new node in import_special.
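The attached script is Perl, but the idea is simple enough to sketch. Here is a hedged PHP equivalent (it assumes an already-open mysql_* connection to the Drupal database; everything else, such as the VARCHAR(255) width, is an illustrative choice, not what the attached script necessarily does):

```php
<?php
// Sketch only: one table per *.csv file, one varchar column per CSV header,
// plus the two bookkeeping columns record_id and import_special.
foreach (glob('*.csv') as $file) {
  $fh = fopen($file, 'r');
  $headers = fgetcsv($fh);                       // first CSV row = column names

  $table = preg_replace('/\W+/', '_', basename($file, '.csv'));
  $names = array();
  foreach ($headers as $h) {
    $names[] = preg_replace('/\W+/', '_', $h);   // sanitize column names
  }

  // Column definitions: bookkeeping columns first, then one varchar per CSV column.
  $defs = array('record_id INT AUTO_INCREMENT PRIMARY KEY',
                'import_special VARCHAR(255) DEFAULT NULL');
  foreach ($names as $n) {
    $defs[] = "`$n` VARCHAR(255)";
  }
  mysql_query("CREATE TABLE `$table` (" . implode(', ', $defs) . ')');

  // Load every data row as plain strings.
  while (($row = fgetcsv($fh)) !== FALSE) {
    $vals = array_map('mysql_real_escape_string', $row);
    mysql_query("INSERT INTO `$table` (`" . implode('`,`', $names) . "`) "
              . "VALUES ('" . implode("','", $vals) . "')");
  }
  fclose($fh);
}
```

Keeping everything as varchar avoids type-guessing headaches during the import; you only care about the data long enough to turn it into nodes.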

The script is written in Perl and is attached.

2) I looked through the existing site and created all the required content types, taxonomies, etc. For this I mainly used the Taxonomy node and Content Taxonomy modules.

3) Once the content types and taxonomy vocabularies were in place, I started writing the script that actually imports data from the old tables into nodes. Since there were about 40,000 nodes, it was impossible to run it from the browser as a short script, so I added Drupal bootstrapping calls at the top of the script, placed it in the Drupal root directory, and ran it from the command line: php import_data.php.
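The bootstrap header of such a script looks roughly like this (the script has to live in the Drupal root so the relative include resolves; the filename is just the one mentioned above):

```php
<?php
// import_data.php — run from the Drupal root as: php import_data.php
// A full bootstrap gives the script the database layer, modules and node API.
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

// Note: on some setups the CLI environment lacks the usual web-server
// variables, and you may need to fake things like $_SERVER['HTTP_HOST']
// before bootstrapping.

// From here on, any Drupal function is available, e.g. node_save().
```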

Basically, the script first fills all the taxonomy vocabularies and then imports the nodes themselves. Again, for anyone looking for a quick way to get things done: the actual working script is attached.
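The per-row logic boils down to something like the following Drupal 6 sketch. The table, field and vocabulary details (legacy_products, vid 1, the 'product' type) are invented for the example; taxonomy_get_term_by_name(), taxonomy_save_term(), node_save() and the db_* calls are the real API:

```php
<?php
// Illustrative sketch; table and content type names are made up.
$result = db_query("SELECT * FROM {legacy_products}");
while ($row = db_fetch_object($result)) {
  // Make sure the term exists in vocabulary 1, creating it if needed.
  $terms = taxonomy_get_term_by_name($row->category);
  if (empty($terms)) {
    $term = array('vid' => 1, 'name' => $row->category);
    taxonomy_save_term($term);        // fills in $term['tid']
    $tid = $term['tid'];
  }
  else {
    $tid = $terms[0]->tid;
  }

  // Build and save the node.
  $node = new stdClass();
  $node->type = 'product';
  $node->title = $row->title;
  $node->body = $row->description;
  $node->uid = 1;
  $node->taxonomy = array($tid);
  node_save($node);

  // Remember the new NID next to the source row — this is what the
  // import_special column is for.
  db_query("UPDATE {legacy_products} SET import_special = %d WHERE record_id = %d",
           $node->nid, $row->record_id);
}
```

Storing the NID back into import_special means the import is restartable: rows that already have a NID can simply be skipped on the next run.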

One part of the node import was uploading images into an Image Field. I borrowed some code from the Remote File module to do this =)
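Roughly, the image part looks like this (a hedged sketch, not the borrowed code itself: field_file_save_file() comes from FileField, which ImageField builds on in Drupal 6, and the source path and field name here are examples only):

```php
<?php
// Hedged sketch: copy a legacy image into the files directory and attach
// it to a CCK image field before node_save(). The field name field_image
// and the source path are illustrative.
$source = '/var/legacy/images/' . $row->image_name;
$file = field_file_save_file($source, array(), file_directory_path());
if ($file) {
  $file['list'] = 1;                 // display the image on the node
  $node->field_image = array(0 => $file);
}
```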