All Forums UDA
Paddu 14 posts Joined 09/07
01 Oct 2007
Difference and concepts of ETL.

Hi Recently i was joined in IT Company. Now i am a Trainee in DWH, still no project are alloted to me.Last day my PM told that get trained in Teradata of ETL part, and didn't explain me any more.CAn anyone please tell me 1) Difference in ETL, Administration and DQ of Teradata.2) What are the tools i have to learn in ETL of Teradata.Presently I am working in INFA tool.Any replies are highly appreciated.Thanks in AdvancePaddu

marcmc 112 posts Joined 12/05
02 Oct 2007

Your first port of call is to find out if there is a data warehouse already existing that you are scheduled to start supporting or if your team are building a new DW from scratch.If there is already a platform then you need to find some of the following:* What technologies your Teradata server sits on i.e. Teradata for Windows/Teradata for Unix/Linux. * What Teradata/3rd Party utilities you use to load your data from the source system into Teradata(staging area) i.e. BTEQ/FAST LOAD/MULTI LOAD/IMPORT/TPUMP etc. You can locate these via START>>PROGRAMS>>TERADATA CLIENT...betq/fload etc* If you use only stored procedures to load the data from staging into the main DW tables or if you use a mixture of Stored Procedures & UDF's written in C/Java etc.Any good developer/DBA could give you the answers to these.1) Difference in ETL, Administration and DQ of Teradata.ETL stands for Extract/Transformation and Load. It is the process by which you get your data from the source system into your data warehouse. Administration is the DBA function and consistes of storage management/backup and restore/recover methods/Source Control(maybe)/Security etc.2) What are the tools i have to learn in ETL of Teradata.As Above, this depends on the technologies and utilities you use in order to get data from source to destination. The Utilities are generally BTEQ/FLOAD/MLOAD/TPUMP/OLE Load/Fast Export etc and then Teradata stored procedures but there are 3rd party tools available but not supported(i think). You will need exposure to all of these utilities and tools to get training.3.) The teradata training web site provides a wealth of information on what training can be acquired.http://www.teradata.com/services-supportor talk to your vendor who could arrange and advise on formal training courses.Hope this helps, Best of Luck.

Paddu 14 posts Joined 09/07
03 Oct 2007

Hi SupremeThanks for your guidelines....I have small doubt on it, you told that utilites are there(FLOAD\MLOAD\etc) to load data from Source systems to Target(Teradata database). Then why informatica?Where it comes into picture...?Thanks in advance..Paddu

marcmc 112 posts Joined 12/05
03 Oct 2007

Hey Paddu,It seems you get a 'Supreme Being' on this forum after a number of questions & answers are posted on this forum. It certainly is not reflective of the experience I have with Teradata. I have only been using it for 6 months and am still very much a novice. However, I do know data warehouses and was heavily involved in Migrating our old SQLServer DWH to Teradata."BTEQ/FLOAD etc". These are the load utilities we used as they are native to Teradata. They have advantages and disadvantages like anythinmg else. However, They are not exclusive.I have not used Informatica but as far as I know it is a 3rd party tool. Maybe Teradata sell and support it, I don't know. Maybe one of the real gurus on this forum can answer that or ask your developers and/or vendors.All being said I have read a little about Informatica. It seems it is a Business Intelleigence and Data Load tool. If you are using it there must be an expert on it within your team. At the end of the day it's another way of getting data from one place to another.Hope that helps.Marc.

sudhiroracle 8 posts Joined 10/07
03 Oct 2007

As far as my knowledge goes, informatica is a tool which has certain analytical functions performed outside the database. you can create transformations in informatica, which helps to create complex queries using a graphical design tool. you can create certain type of transformation using informatica power designer. basically, it helps you to do away with lot of coding. you can achieve the same thing using the PL/SQL too. but when you are using the operational database, it is not appropriate to perform any data warehouse operations. it may affect the performance of the system. so we can take the data warehouse related functions out of the operational database. Please let me know your thoughts.

Paddu 14 posts Joined 09/07
03 Oct 2007

Hey TenaYour thought on Informatica is Right.To overcome the PL/SQL codes, informatica helps a lot.I have some questions on your reply:1) Is PL/SQL present in Teradata?2) "Operation Databases" means are you saying about the "OLTP".3) If Teradata utilites are there for loading, then why we need to go for Informatica?( may be to reduce the coding).4) Is all the teradata utilites are GUI or CUI?Hi MicPlease add your thoughts also in it.ThanksPaddu

joedsilva 505 posts Joined 07/05
04 Oct 2007

Teradata has it's own version of stored procedure language, there are variations in syntax with oracle, but it's not that hard to grasp, and it's not as sophisticated as Oracle PL/SQL.Starting TD 12.0, we would also be able to write Java stored procedures.Tools like informatica give a generic interface to ETL operations against various DB systems, which means you may need not learn vendor specific ETL scripting syntax etc... most of these ETL tools also has plugins for vendor specific utilities (like fastload, mload etc in Teradata) than can be used via the ETL tool, this ensures that the ETL tool gets maximum performance leverage while still giving an (almost) vendor independent development interface to the developers.Teradata utilities are terminal based (much like windows command prompt), you can also write all the instructions into a text file and execute it using the utility. which is the general approach.

marcmc 112 posts Joined 12/05
04 Oct 2007

Lots of interesting stuff there.Cheers.

Paddu 14 posts Joined 09/07
04 Oct 2007

Its good and great explanation from all.Thanks a lot.... :-) In the mean while, i met so many experienced guys to have more on TD. Following are the points that i came to conclude on ETL of Teradata. Please add any more points or please correct if any wrong goes.1) MULTI Load is the Utility to perform the Extraction Process, this may extract data from any system and creates the flatfiles may be temporary.2) Using BTEQ, we perform the Transformation. This may be like performing the logics of operations to the Target or it may be any from of Logical operations.3) FLOAD is the Utility to perform the Loading part of ETL. Once the flatfiles created in the MLOAD goes tranformation in BTEQ then the result will be loaded in to the Targets through FLOAD Utility.4) Informatica a Complete tool for ETL, is used to perform all the above actions. This tool must enable the Plug ins of all the Utilites, so that to perform the actions.5) All the TD Utilites are Command Based tools. Have their own set of commands in each Utility.6) Base Knowledge to have TD: BTEQ, MLOAD, FLOAD.I belive this discussion will make the Beginners well knowledge on TD.

shubh 16 posts Joined 09/07
04 Oct 2007

hi PadduI am little confused on the way you differentiated MLOAD and FLOADAs per my knowledge, both MLOAD and FLOAD are used to load data from FILE to TABLE with the following difference:1) FLOAD is used to load data from the file to an EMPTY table only.2) FLOAD cannot be used for updates or upserts into the table. For updates MLOAD should be used3) FLOAD as the name suggests has very high performance.4) Also FLOAD does not load duplicate records. If there are 2 records with exactly the same values of each record, only one will be loaded.However, multiload loads both the records, if your table definition permits.The above is as per my knowledge.Please correct me if i am wrong. Thank you.

Regards,
Shubhangi

leo.issac 184 posts Joined 07/06
04 Oct 2007

The discussion is quite interesting . I would like to add few more questions here.It is understood that MLOAD and FASTLOAD are native tools of teradata for ETL process.Informatica is a third party tool that has plug-ins for teradata and which can do some Transformations to data before loading the data into Teradata .Now I would like to understand how expensive or inexpensive it is in terms of cost,maintenance and usage when a third party tool is used for ETL in a datawarehouse environments.Why do companies opt for a third party ETL tools when Teradata has its own set of ETL stuff.

Paddu 14 posts Joined 09/07
04 Oct 2007

Hi SubhaYes, what you told is exactly right. After looking into your comment, I made again my questions to my senior people. Now i was good enough.You have explanied about the FLOAD, let me add some points to that...Oracle retreive\load the data in the form of ROWS nothing but a line of record. But in Teradata, some utilites will do the action in the form of Packets. Rows will be converted into packets and vice versa though CLI(Call Level Interphase).FLOAD loads the data in the form of Records only. Where as MLOAD loads the data in the form of PACKETS.( I too need some more info about other utilites)One more point is, MLOAD will make the TABLE LEVEL LOCK, that means when MLOAD is happening, we cann't fire any query on that particular table to fetch the records.Hey Marc, Tena, JoedNeed your inputs too.....Coming to Leo Question about the" why we need Third Party tools?"Because, All the Teradata tools are Command Based tools and requiers the Scripting Knowledge. Coming to Informatica, 1) Its a GUI.2) In built features which makes 0 coding.3) Scripts will be generated automatically.4) We can even change the Generated Script as required.Thats may be the main Features.

leo.issac 184 posts Joined 07/06
05 Oct 2007

Paddu,one more question on third party ETL.I have seen environments that still use teradata MLOAD and FASTLOAD to load data.However they are scheduling these loads using Job schedulers.So now I would like to understand how expensive or inexpensive it is in terms of cost(expenditure),license for third party tool and implementation overhead when informatica is used compared to Teradata ETL.

Paddu 14 posts Joined 09/07
05 Oct 2007

HeyI dont know the cost of the tool or Licence, even i am working in Informatica i dont have these details.Coming to the importance, you are right, even so many are using Utilites of TD only. But this third tool will make us such that there will be no difference in working with TD are any other DB. May be Connections differs. This third party tool will make us 0 level coding thats the over all benifit and make us clear picture of dataflow as it is GUI.

joedsilva 505 posts Joined 07/05
09 Oct 2007

From a non-technical standpoint, also remember that there are lots of folks who know tools like Informatica, DataStage etc out there compared to folks who are comfortable with Teradata ETL tools, the rest is a bit of maths and diplomacy when you are a customer who wants his stuff up and running quickly.

29 Oct 2007

You've got to remember that the ELT in Teradata is not free. If you load raw data into Teradata and use the scripting language and tools to transform, consolidate, remove duplicates, perform data quality and deliver it to BI tables you pay a price in terms of disk space and CPU utilization. This can be minimized if you run it overnight (and don't have big backup windows) but you still need to size your Teradata to handle a lot of extra hot storage space for the multiple times you transform and store and stage your data.I am currently the architect on an EDW using a 100 server DataStage grid and a large Teradata database and we have the ability to transform data 24x7 - trickle feed the EDW during the day and fast bulk load overnight with ready to load data. We can move into more heavy duty transformations such as address standardization, we can perform change data capture and change data detection on the grid with minimal impact on Teradata and we can use the grid for sending data to other systems.Informatica has added pushdown optimization for executing some steps in a database like Teradata and IBM are adding similar functionality to DataStage 8.1 at the start of next year. This gives you the best of both worlds and lets you build Teradata SQL steps (including temporary table support) in the GUI interface of the ETL tool. You can push ELT functions into both the source and the target databases.In the ETL versus ELT debate the are pros and cons in both directions.

Someshnr 53 posts Joined 06/07
09 Nov 2007

Please go to the following links and read all the relevant topics you want to learn about Teradata Data warehouse:http://www.info.teradata.com/DataWarehouse/eTeradata-BrowseBy.cfm?page=Teradata%20Databasehttp://www.info.teradata.com/DataWarehouse/eTeradata-BrowseBy.cfm?page=Teradata%20Tools%20and%20Utilities

anill 26 posts Joined 08/07
20 Nov 2007

Hi friends...especially thanks to paddu(who raise first doubt).. I got more information regarding Teradata ETL tools....But I have basic doubt about ETL teradata tools... i.e., can we use Teradata ETL tools for loading data into warehouse which is notmade by Teradata??what abt performance...third party tools can give more efficient performance than Teradata ETL??(warehouse is not build with teradata dbms)... waiting for u r reply......Thanks and RegardsAnil Reddy

Paddu 14 posts Joined 09/07
20 Nov 2007

Hi AnilThird party ETL tools like Informatica or any other, will not change the performance behaviour. It only makes the Developer to interact with the databases with graphically and makes it as User friendly.Yes... we can load data from one database to another database using this third part tools.Let me know in case you need more info from me.ThanksPaddu.

lateshpant 16 posts Joined 01/08
09 Jan 2008

yes, your understanding is correct on fastload/multiload.just to add multiload is aslso used for deleting the records and also for other subtle advantages of readng the table while loading and also overcoming the limits of no of files / tables during the load.Probaly Paddu meant something equivalent of BTEQ Export / FastExport, as the name suggests which are used for moving data out of TD into flat files.TPUMP is another tool which one can use to get Extarct and load continously( can't speak more on this as have never practically used).

shakir 6 posts Joined 01/08
12 Jan 2008

FastExport is the Utility to perform the Extraction Process and this may extract data from any system and creates the flatfiles may be temporary and MLOAD is the Utility to perform the Loading of the FastExported data files.

21 Jun 2013

Hi
 Thanks to every one giving valuable answers, Now I get some idea about TD.

santhosh1991 1 post Joined 04/15
09 Jun 2015

what is ETL teradata

You must sign in to leave a comment.