ITPub博客

首页 > IT职业 > IT职场 > use of dbms_random.(zt www.dbazine.com)

use of dbms_random.(zt www.dbazine.com)

原创 IT职场 作者:jametong 时间:2005-04-21 08:37:35 0 删除 编辑

Building a Random Character Value Generator

Oracle Database 10g provides a random value generator via a package named DBMS_RANDOM, which can be used to generate random values. However, those values are arbitrary, not representative of the real-life system. For instance, the following an example of how you can generate a string 30 characters long:

SQL> select dbms_random.string('P',30) from dual;

DBMS_RANDOM.STRING('P',30)
-----------------------------------------------------
O=*KXh}#O`D~1|'^VheWr:jZ>!:P}3

The package DBMS_RANDOM has several functions and procedures, one of which is STRING(), which accepts two parameters and returns a character value. The types of characters generated are determined by the first parameter — OPTION. Here are the options and the types of string generated:

Option Type of String Produced
U Any uppercase alphabetic characters
L Any lowercase alphabetic characters
A Any mixed case alphabetic characters
X Any alphanumeric character in upper case
P Any printable character

You can also use lowercase characters for the options; e.g., x instead of X. In this example, we used X, which produced a string of printable characters, not just alphabetic ones. Clearly, this kind of character string will not represent a customer name. Using an option such as “A” would generate a more appropriate string.

Populating a Table

Let’s start with a real-life example. In the case of Acme Bank’s application development, the team is mulling over the proper indexing, partitioning, physical layout, materialized view planning, and so on, for the ACCOUNTS table, which holds the customers’ account data. Here is how the table looks :

SQL> desc accounts
Name Null? Type
----------------- -------- ------------
ACC_NO NOT NULL NUMBER
FIRST_NAME NOT NULL VARCHAR2(30)
LAST_NAME NOT NULL VARCHAR2(30)
ACC_TYPE NOT NULL VARCHAR2(1)
FOLIO_ID NUMBER
SUB_ACC_TYPE VARCHAR2(30)
ACC_OPEN_DT NOT NULL DATE
ACC_MOD_DT DATE
ACC_MGR_ID NUMBER

The columns are fairly self-explanatory. To represent customers in the real world, the following requirements have been specified for the columns:

Column Name

Purpose

Data Pattern

ACC_NO

Account Number

Any number less than 10 digits

FIRST_NAME

The first name

  • 10% Alan

  • 10% Barbara

  • 5% Charles

  • 5% David

  • 15% Ellen

  • 20% Frank

  • 10% George

  • 5% Hillary

  • 10% Iris

  • 10% Josh

LAST_NAME

The last name

Any alphabetic character between four and 30 but 25 percent should be “Smith”

ACC_TYPE

The type of account — Savings, Checking, and so on

20 percent each of S, C, M, D, and X

FOLIO_ID

The folio ID from the other systems

Half NULL and the rest half a number related to the account number

SUB_ACC_TYPE

If the customer is incorporated, then sub-account types, if any 75 percent null

From the values populated:

  • 5% — S

  • 20% — C

ACC_OPEN_DT

Date account was opened

A date between now and 500 days ago

ACC_MGR_ID

The ID of the account manager servicing the account

There are five account managers, with account percentages distributed as follows:

  • 1 — 40 percent

  • 2 — 10 percent

  • 3 — 10 percent

  • 4 —10 percent

  • 5 — 30 percent

As you can see, some fairly complex requirements were specified, but for good reasons. These accurately reflect how the data will be distributed in real life. In real life, there will be customers with first names like “Josh” and “Ellen,” not “XepqjEuF”; so, the names must be chosen from the set of possible names. And, in the U.S., people are called by a variety of last names. Thus, we want a semi-random distribution with 25 percent of a very popular last name, “Smith.”

Generating Random Numbers

Before we go any further, we have to discuss how to generate random numbers. The package dbms_random contains a function to return random numbers. If you want to return any random number between -2^31 and +2^31, use the function RANDOM. The following shows how we have generated numbers 10 times:

SQL> begin
2 for i in 1..10 loop
3 dbms_output.put_line('Random Number='||dbms_random.random);
4 end loop;
5 end;
6 /
Random Number=-81420432
Random Number=-1024262734
Random Number=-1965250926
Random Number=1439118604
Random Number=675429938
Random Number=-466227661
Random Number=613708106
Random Number=-1154140330
Random Number=-643127572
Random Number=-133140229

This function is available but deprecated. Oracle recommends using a new function called VALUE().

SQL> l
1 begin
2 for i in 1..10 loop
3 dbms_output.put_line('Random Number='||dbms_random.value);
4 end loop;
5* end;
SQL> /
Random Number=.547109841457281046373693994862305187
Random Number=.00500720861223234783817674992068380455
Random Number=.86974675431616311272549579082479240362
Random Number=.16306119784529083761710557238498944243
Random Number=.39327329046753189206427695323437381763
Random Number=.68307381430584611139249432690613072007
Random Number=.24640768414299435941101562183729221882
Random Number=.50568180275705934132098716939122047439
Random Number=.73345513921455391594299189253661168712
Random Number=.71764049051903979360796313613342367114

Note the difference; VALUE returns a positive random decimal number less than one, with 38 digits after the decimal point. In most cases, this is probably enough, with some modification (if you are looking for a 10-digit whole number, for instance, simply multiply it by 10,000,000,000 and use FLOOR() to discard the decimals).

   1  begin
2 for i in 1..10 loop
3 dbms_output.put_line('Random Number='||
4 floor (10000000000 * dbms_random.value)
5 );
6 end loop;
7* end;
SQL> /
Random Number=939084911
Random Number=690705371
Random Number=4696700513
Random Number=7978266084
Random Number=5157885833
Random Number=8902042948
Random Number=5839885968
Random Number=6207324613
Random Number=5633096626
Random Number=1891871746

The function VALUE() is overloaded; another variation of the functions accepts a set of low and high values and returns a number between them. To generate a number between 10 and 20, for instance, you will use

dbms_random.value (10, 20)

Note: The numbers generated will be more than or equal to 10, but less than (never equal to) 20.

Using it to generate multiple values, we get

  1  begin
2 for i in 1..10 loop
3 dbms_output.put_line('Random Number='||
4 dbms_random.value (10,20)
5 );
6 end loop;
7* end;
SQL> /
Random Number=11.0908840421899990054735051046498510644
Random Number=14.5391525077977177619575126573478641174
Random Number=11.1343623028437386191996545034616142284
Random Number=14.8831724931888089801812805693409810454
Random Number=11.7294096895635532492353976105129628219
Random Number=15.0589018451126293465505440533299716691
Random Number=15.6152418028444556585301007183814067306
Random Number=18.2903839241741813303152060555229669855
Random Number=13.0914352695913402037626709927051618367
Random Number=12.6027933168087755621076348461280998683

As you can see, the numbers are generated with 37 digits after the decimal point. This is a very valuable function and will suffice for lmost types of random number generation. If you want to generate only whole numbers, use the same approach used earlier, using the FLOOR() function.

floor(dbms_random.value (10,20))

Generating Specific Characters and Strings

So, how can we generate specific strings that follow a predetermined statistical pattern?

We can borrow a page from the probability theory text to accomplish this. The trick is to use a method similar to Monte-Carlo simulation used by statisticians worldwide. In this approach, we generate a random number, between one and 100 (both inclusive). Over a period of time, the probability that a specific number, say six, will turn up is exactly one time out of 100, or 1 percent of the time. In fact, all the numbers have 1/100 probability. Going by the same approach, the probability that either of two numbers — say, 1 and 2 — will be 2 percent. And, of course, the probability that any one of numbers between one and 10 will turn up is 10 percent. We will use this to configure the probability of the random value.

Take, for instance, the value of the column ACC_TYPE, which calls for equal probability of S, C, M, D and X; or 20 percent probability each. If we generate a whole number between one and five (both inclusive), the probability of each number will be 20 percent. Then we can use a DECODE() function to get the ACC_TYPE value.

SQL> select
2 decode (
3 floor (
4 dbms_random.value (1,6)
5 ),
6 1,'S',
7 2,'C',
8 3,'M',
9 4,'D',
10 'X'
11 )
12 from dual;

First, we are generating a number between one and five (line 4). Since the number is generated is less than the highest value passed as a parameter, we have specified six. And since we want a whole number, we have used the FLOOR() function in line 3. It truncates all decimal values from the generated number. Depending on the number obtained, we used DECODE() to get one of the values — S, C, M, D, or X. Since the numbers 1, 2, 3, 4, and 5 will have equal probability of being generated, so will be the letters — at 20 percent each.

This technique is very valuable for generating random, but useful values, as shown previously. The same approach can be used to generate almost all types of pre-determined random values.

Random Values with NULLs

Remember, the requirement for FOLIO_ID is a little different. It needs only 50 percent of the values populated; the rest should be NULL. How can we achieve this?

Quite simply, we will use the same probability approach with a twist: we will use a determination of yes or no. Generating a random number between one and 100 will ensure 1 percent probability of each number. Hence, a number less than 51 will have exactly 50 percent probability of occurring. We can use this in a CASE statement to get the value.

SQL> select
2 case
3 when dbms_random.value (1,100) < 51 then null
4 else
5 floor(dbms_random.value(1,100))
6 end
7 from dual;

On line 3, we should check to see whether the number generated is less than 51. If so, we return NULL. Since the probability of a sub-51 number is 50 percent , we have NULLs occurring 50 percent of the time as well. In the other 50 percent of the time, we have generated a value to be used as a FOLIO_ID.

Random Strings of Random Length

In dbms_random.string, a random string is generated, but of fixed length. That is not representative of real life; in reality, people have last names of varying lengths. In this example, the requirement is to have a length between four and 30 characters. To facilitate this, we can pass the length as a random number as well to the function dbms_random.string in line 6 below.

  1  begin
2 for i in 1..10 loop
3 dbms_output.put_line('Random String='||
4 dbms_random.string (
5 'A',
6 dbms_random.value(4,30)
7 )
8 );
9 end loop;
10* end;
SQL> /
Random String=RniQZGquFVJYFpGLOvtNd
Random String=GhcphpcsaCXlhigRQY
Random String=JtakoelUf
Random String=BgCOu
Random String=QFBzQxcHqGlHWkZFmnN
Random String=lSxVjqJvpwBB
Random String=jfhNARzALrLOKZRpOwnhrzz
Random String=KuFtdJcqQpjkrFmzFbzcXnYFGjWo
Random String=BhuZ
Random String=GebcqcgvzBfEpTYnJPmYAQdb

Notice that the strings are of different lengths. Remember, 25 percent of the last names must be “Smith,” and the rest must have random lengths. We can accomplish this by combining the random strings and the Monte-Carlo approach:

   decode (
floor(dbms_random.value(1,5)),
1,'Smith',
dbms_random.string ('A',dbms_random.value(4,30))
)

The previous expression will return “Smith” 25 percent of the time and a random alphabetic string between four and 30 characters long the rest of the time.

Putting it All Together

Now that you understood the building blocks of the randomization approach, you can put them together to build the account record generation PL/SQL block as shown below. In the following example, we are loading 100,000 records into the table ACCOUNTS. Here is the loading program in full:

begin
for l_acc_no in 1..100000 loop
insert into accounts
values
(
l_acc_no,
-- First Name
decode (
floor(dbms_random.value (1,21)),
1, 'Alan',
2, 'Alan',
3, 'Barbara',
4, 'Barbara',
5, 'Charles',
6, 'David',
7, 'Ellen',
8, 'Ellen',
9, 'Ellen',
10, 'Frank',
11, 'Frank',
12, 'Frank',
13, 'George',
14, 'George',
15, 'George',
16, 'Hillary',
17, 'Iris',
18, 'Iris',
19,'Josh',
20,'Josh',
'XXX'
),
-- Last Name
decode (
floor(dbms_random.value(1,5)),
1,'Smith',
dbms_random.string ('A',dbms_random.value(4,30))
),
-- Account Type
decode (
floor(dbms_random.value (1,5)),
1,'S',2,'C',3,'M',4,'D','X'
),
-- Folio ID
case
when dbms_random.value (1,100) < 51 then null
else
l_acc_no + floor(dbms_random.value(1,100))
end,
-- Sub Acc Type
case
when dbms_random.value (1,100) < 76 then null
else
decode (floor(dbms_random.value (1,6)),
1,'S',2,'C',3,'C',4,'C',5,'C',null)
end,
-- Acc Opening Date
sysdate - dbms_random.value(1,500),
-- Account Manager ID
decode (
floor(dbms_random.value (1,11)),
1,1,2,1,3,1,4,1,5,2,6,3,7,4,8,5,9,5,10,5,0
)
);
end loop;
commit;
end;

Now, how do we know that all these exercise yielded fruit? After this table is loaded, let’s see the actual distribution:

SQL> select first_name, count(*) from accounts group by first_name;


FIRST_NAME COUNT(*)
------------------------------ ----------
Alan 9834
Barbara 10224
Charles 5046
David 4980
Ellen 15094
Frank 14960
George 14890
Hillary 4898
Iris 10009
Josh 10065

Great! The distribution for each first name is exactly we wanted. For instance, we wanted to have 10 percent rows with first name “Alan,” and we got 9,834 out or 100,000; this equates to approximately 10 percent . We wanted 15 percent with the name “Ellen” and we got 15.094% — pretty close to that number and statistically significant. Similarly, you can go through all the other columns and see how they are actually distributed.

Conclusion

Generating random values in PL/SQL is a much-sought-after, but often less-understood aspect of the PL/SQL language. As we saw in the previously shown case, generating random numbers or strings is not what is most often needed. The real need is to generate data representing the real world, which requires a different approach, using the same built-in functions. In this article, you learned how to generate random values following a predetermined statistical approach, and then how to use it to populate a system mimicking real life.

[@more@]

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/94317/viewspace-796510/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2013-11-23

  • 博文量
    47
  • 访问量
    280612