All Forums Aster
Pradi 11 posts Joined 04/14
12 May 2014
Loading data into aster database (Aster express)

Hi,

I'm using ncluster_loader for loading data into Aster express (6.0). But if i want load data from realational
database like MySQL then how should i do it, is there any connector for that?

also if i've csv file containing data in 100 columns how should i load this into aster database, i mean do i've to create
table of 100 columns or is there any other way where table can be created on the fly ?
 
Please help me in this

Thank you
Pradi

17 May 2014

Pradi-
(I apologize if this is a double-post.  It looks like the site "ate" my previous response.  I'll try to re-create it from memory.  Hopefully there won't be two similar but different versions out there.)
To answer your first question, there is an SQL-MR function called "AnyDB2Aster" that allows you to load directly from (almost) any database that has a jdbc driver.  I am not sure if it comes with the free Aster "Express" distribution, though.  Log on via ACT and type "\dF+" or run query "select * from nc_all_sqlmr_funcs;" and see if you have it.  If not, you'll either have to find a copy or write your own.
To answer your second question, there's no built-in way to automatically create tables on-the-fly while loading.  However, there IS an option to the ncluster_loader command that allows you to run a series of SQL statements (one per line) saved in a file.  The "--begin-script" option takes a file name as an argument, and the ncluster_loader runs the commands in the file.
You could:
1. Write a shell/perl/python/whatever script that parses your file and figures out how many columns it has, then compiles a "create table" command and saves it to a temp file.  The create table command would have one column definition for every column in your file.
2. Run the ncluster_loader command with the "--begin-script" option pointing to the temp file you just created.
Here's an example that assumes a comma-separated file with the first line as a header with column names:
-----------------------------snip-----------------------------
#!/bin/sh
# load-on-the-fly.sh     - Load to Aster, creating a temp table on the fly
#   usage: load-on-the-fly.sh filename
#
#

#Some set-up:
ASTERHOSTNAME=192.168.100.100
ASTERUSERNAME=db_superuser
ASTERPASSWORD=db_superuser
ASTERDATABASENAME=beehive

#And the real fun begins:
TMPFILE=`mktemp`.sql   #Create a temporary file for the --begin-script
TMPTABLE="public.temp_in_$$"

SQLCMD="create table ${TMPTABLE}("

FIRSTLINE=1 #Keep track of whether or not we need a leading comma

for i in `head -1 $1 | sed 's/,/ /g'`; do
  if [ ${FIRSTLINE} == 1 ]; then
    COLNAME="$i"
    FIRSTLINE=0
  else
    COLNAME=",$i"
  fi
  SQLCMD="$SQLCMD ${COLNAME} character varying"
done
SQLCMD="${SQLCMD}) distribute by replication;"
echo "${SQLCMD}" >> ${TMPFILE}

ncluster_loader -h ${ASTERHOSTNAME} -U ${ASTERUSERNAME} -w ${ASTERPASSWORD} -d ${ASTERDATABASENAME} -c --skip-rows 1 --begin-script ${TMPFILE} ${TMPTABLE} $1

rm -f ${TMPFILE}
-----------------------------snip-----------------------------
I have a nice little data file with some randomly generated person records in it:
-----------------------------snip-----------------------------
linux-qvsn:~ # cat mydata.text
id,first_name,last_name,email,country,ip_address
1,Teresa,Black,tblack@realfire.mil,Panama,240.247.198.212
2,Carolyn,Murray,cmurray@edgeclub.name,Egypt,238.238.245.221
3,Gerald,Reid,greid@skynoodle.edu,Myanmar,219.88.221.60
4,Lisa,Gardner,lgardner@fivechat.net,Argentina,19.206.230.207
5,Christopher,Snyder,csnyder@dazzlesphere.name,Faroe Islands,144.178.219.174
6,Raymond,Hayes,rhayes@muxo.com,Comoros,189.218.108.217
7,Elizabeth,Foster,efoster@skalith.info,Gambia,212.65.242.62
8,Wanda,Weaver,wweaver@buzzbean.gov,Greenland,29.238.186.129
9,Ryan,Black,rblack@flashpoint.name,Malta,100.48.78.166
10,Matthew,Rose,mrose@dynabox.biz,Slovakia,32.221.60.46
-----------------------------snip-----------------------------
Running my "load-on-the-fly.sh" program looks like this:
-----------------------------snip-----------------------------
linux-qvsn:~ # ./load-on-the-fly.sh mydata.text
Loading tuples using node '192.168.100.100'.
10 tuples were successfully loaded into table 'public.temp_in_19401'.
-----------------------------snip-----------------------------
And checking the database itself:
-----------------------------snip-----------------------------
linux-qvsn:~ # act -d beehive -U db_superuser -w db_superuser -A -c "select * from public.temp_in_19401;"
id|first_name|last_name|email|country|ip_address
1|Teresa|Black|tblack@realfire.mil|Panama|240.247.198.212
2|Carolyn|Murray|cmurray@edgeclub.name|Egypt|238.238.245.221
3|Gerald|Reid|greid@skynoodle.edu|Myanmar|219.88.221.60
4|Lisa|Gardner|lgardner@fivechat.net|Argentina|19.206.230.207
5|Christopher|Snyder|csnyder@dazzlesphere.name|Faroe Islands|144.178.219.174
6|Raymond|Hayes|rhayes@muxo.com|Comoros|189.218.108.217
7|Elizabeth|Foster|efoster@skalith.info|Gambia|212.65.242.62
8|Wanda|Weaver|wweaver@buzzbean.gov|Greenland|29.238.186.129
9|Ryan|Black|rblack@flashpoint.name|Malta|100.48.78.166
10|Matthew|Rose|mrose@dynabox.biz|Slovakia|32.221.60.46
(10 rows)
-----------------------------snip-----------------------------
 
I hope that helps!
 
 
[IMPORTANT: Code is provided as a courtesy, in an "as-is" basis w/o any promise that it will run properly.  Exercise extreme care, and run at your own risk.  No warranty is given or implied, and author shall not be held liable for any adverse consequences.]

Pradi 11 posts Joined 04/14
27 May 2014

Thanks a lot mcooper :)

You must sign in to leave a comment.