 Analytics Welcome to the Teradata Forums. Read the guidelines on posting. Email notifications and RSS feeds are available, and also a chronological list of all forum activity.
 Print All Forums Analytics 11 Apr 2006 Randomied sample Can someone please helpI have a table with account_numberfirst namelast namepresent balanceoutstanding balanceASRThis table has lets say 2 million records.i have to take a sample from this table for 1000 accounts.but the requirment is that the avarage of the ASR of the base table must be very closeto the avarage of ASR of the sample data.thanks a lot for the help 20 Apr 2006 First you need to decide what you mean by "close to". For the example code I'll assume that close is within 1/2 a standard deviation of the average.One way of doing it is:SELECT * FROM big_tableWHERE asr < (SELECT AVG(asr)+STDDEV_SAMP(asr)/2 FROM big_table) AND asr > (SELECT AVG(asr)-STDEV_SAMP(asr)/2 FROM big_table)SAMPLE 0.1;This will select a 10% of the rows that have an asr within 1/2 a standard deviation of the mean asr.Of course you could just decide that (say) 5000 is close to the average and then replace the STDDEV_SAMP(asr)/2 with the number 5000 in the where clause. Your choice dependent upon how you want to "define close to the average".Hope this helpsPaul 20 Apr 2006 Thanks a lot paul.but the problem i have at hand is by the query above i will be picking accounts with ASR that is near to the avarage ASR of the table.But all i want is i would like to table accounts from all different ranges of ASR but when i calculate the avarage ASR to the sample it should be with in 10% of the avarage ASR of the base table.thanks 20 Apr 2006 Well if you treat the table as the population, then as the sample you draw from that population increases, the closer that the mean of the sample will be to the mean of the population.So if you want the sample mean to be within 10 units of the population mean, you need to work out how big a sample will give you that accuracy.Now assuming that the distribution of asr is approximately normal, we need to know the the standard deviation of asr. Once we have this we then need to decide how accurate do we want to be, eg do we want to be 95% sure that we have a sample that captures the population mean of asr by 10 units, or 99% sure?Once we have decided on the accuracy level, we need to consult a normal table to obtain the critical values for our level of confidence, some of the more common ones are below:90% - 1.64595% - 1.9699% - 2.576Now armed with our knowledge of the standard deviation of asr (stddev_pop(asr)] we can use the following formulae to calculate the required sample sizen approx. ((critical value)*(stddev_pop(asr)/(margin of error))^2 and round up.Plugging some numbers in for a worked example,population is 2 000 000margin of error 10stdev(asr) 2 500want 95% confidence.so(1.96*(2 500/10))^2(1.96*250)^2(490)^2240 100so with a sample of 240 100 we can be 95% sure we have captured the mean of the asr within 10 units.Paul 21 Apr 2006 Paul,Thanks a lot.I will have to try it.Your logic should work ...thanksteja You must sign in to leave a comment.