Data Pump Import

Version 3

    by Cris Foot


    Let’s finish this series on 10G  Data Pump by investigating the Data Pump Import utility. In this blog, we'll  learn how to execute 10G Data Pump Import and take advantage of some of its more  popular features.


    We learned in a  previous blog that Data Pump Export is used to transfer data from an Oracle  database into a flat file. The  blog also contained a few quick examples of the benefits that transferring  data from an Oracle database to a flat file provides. Now that we have the data  in that flat file, we need to learn how to run the Data Pump Import utility to  transfer the data back into an Oracle database.


    Data Pump Import
    Let's begin our discussion with a quick review of  a sample Data Pump Import command:

    impdp foot/foot directory=data_pump_dir1 dumpfile=expfull.dmp  logfile=impfull.log full=y


    "impdp" tells the operating  system to run the Data Pump Import utility. The parameters that follow are used  to control the job's execution. In our example, the account and password are  both "foot", the file that will contain the input data is "expfull.dmp", the  logfile is"impfull.log" , and a full database import will be  performed

    We learned  previously that you must precreate a directory using a SQL DDL statement on the  database being accessed. If you don't pre-create the directory, Oracle provides  a default directory called DATA_PUMP_DIR. That means that users are unable to  fully qualify the output file and log file as they were able to do in the non  Data Pump versions of Export and Import. You specify a fully qualified directory  on the server when you create the new database directory object and then you  must grant access to that directory. My last blog contains more information on the creation and administration of  Oracle directories.


    Data Pump provides  the following import modes:

    • Full  - Activated by specifying "FULL=Y". Notifies Import to import the entire content  of the input file. The amount of data actually imported is dependent upon the  data contained in the input file. If the input file contains all of the objects  and data from a full database export, all of those objects will be imported. If  the input file contains a schema export, all of the objects in that schema will  be imported.


    • Schema  Import - Activated by specifying "SCHEMAS=schema, schema, schema….). Imports  the entire schema and all dependent objects (tables, indexes, grants, synonyms,  PL/SQL programs, views). The problem with the older version of Import was that  because it didn't recreate the user (just the user's objects), the administrator  was forced to save the user creation and grant statements to recreate the user  with the same privileges it had before the drop was performed. A Data Pump  schema import will recreate the user and execute all of the associated grants,  user password history, etc..


    • Table  Import -  Activated by specifying "TABLES=[schemas].tablename, [schemas].tablename,….  Imports a table and all dependent objects.


    • Tablespace  Import - Activated by specifying "TABLESPACES=tablespacename,  tablespacename, tablespacename… All of the objects contained in the specified  tablespace are imported. If a table is imported, its dependent objects are also  imported regardless of the tablespace they were stored in.


    • Transportable  Tablespace Import - Activated by specifying "TRANSPORT TABLESPACES=tablespacename… Here's  how this feature works. The administrator first runs Data Pump Export with the  Transportable Tablespace feature activated. Data Pump exports only the metadata  for the objects contained in the specified tablespaces. The objects in the  tablespace set being exported can not have any references to objects contained  in tablespaces outside of the tablespace set. In addition there can't be any  objects in the tablespace set that are referenced by objects outside of the  tablespace set being exported. Having a index in a tablespace that is not  contained in the same tablespace set that contains the table it is built upon  would be an example of a dependency reference problem. Administrators use the  TRANSPORT_FULL_CHECK parameter to perform the dependency  verification.

      The administrator then uses an operating system command to  copy the tablespace datafile(s) to the target database. The transportable  tablespace import feature is then used to attach the datafile (and its contents)  to the database. Transferring the datafile using operating system commands is  much faster than exporting and importing the data using the traditional Data  Pump Export and Import operations. Administrators are only limited by hardware  and network performance.


    It is important to  note that you are not forced into using the same import mode as you did for the  export. You can use the schema and table option using an export file that was  created by a full database export. You can also import one table from a schema  level export (and so on…). Lastly, Data Pump Import can only read data that was  created by a Data Pump Export.


    Let's continue by  reviewing a few of the 10G Data Pump Import parameters that you may find  interesting. Many of these parameters will be the same as for Data Pump  Export.


    Data Pump Import  Parameters

    • JOB_NAME=jobname_string - You are able to assign a job name of up  to 30 bytes to a Data Pump operation. This allows the job to be more easily  identified when querying the DBA_DATAPUMP_JOBS and DBA_DATAPUMP_SESSIONS data  dictionary views that provide information on Data Pump operations. If you do not  specify a job name, Oracle will create a default job name for you. The form will  be SYS_IMPORT_mode_nn, where mode is the type of export (FULL, TABLE, etc.) and  "nn" is a incrementing number starting at 01. The job name is used in the ATTACH  command (shown below) to attach, and reattach, to running jobs.


    • ATTACH  [=[schema_name.]job_name] - Used to attach a client session to an existing  Data Pump operation. An example of the ATTACH command is provided  below:

      > impdp foot/foot ATTACH=foot.importfulljob


    • DIRECTORY=directory_object - All of the dumpfile set I/O is  generated by Data Pump processes that run within the constructs of the database  engine. As a result, the O/S account performing the I/O is Oracle, which usually  has a higher level of privileges than most other accounts. Oracle attempts to  prevent unwarranted operations from occurring by using Oracle directory objects  that require read and write privileges granted by the DBA. Users running Data  Pump operations are only able to read and write files in the directories that  they have been granted access to.

      Before a user is able to run any Data  Pump operation, the administrator must pre-create a directory and grant  privileges to the user on that directory. This means that users are unable to  fully qualify the output file and log file as they were able to do in the non  Data Pump versions of Export and Import. If you don't pre-create the directory,  Oracle provides a default directory called DATA_PUMP_DIR.


    • DUMPFILE=[directory_object:]file_name [, ...] - Contains the metadata  and/or data output from a Data Pump Export operation that Data Pump Import is  using as input. The directory object specification is not required to be  specified if a directory is specified in the DIRECTORY parameter. If one is  specified in the DIRECTORY parameter and the DUMPFILE= parameter, the  DUMPFILE=directory: specification will take precedence.


    • ESTIMATE={BLOCKS | STATISTICS} - Used to estimate the amount of  data a Data Pump Import network operation will generate. This differs from the  estimate option that is used with Data Pump Export. Using the ESTIMATE parameter  with Data Pump Export will ask Data Pump to estimate how large the output  dumpfile will be. I'll provide more information on Data Pump network import  operations later in this blog.


    • Fine Grained  Selectivity Parameters (Exclude, Include, Query, Flashback,  Content) -  Allows the administrator to include or exclude objects. For more information on  these parameters, please turn to my previous blog titled "10G  Data Pump Part 1".


        • HELP = {y | n}  - Displays  a listing of all the parameters that can be specified for the Data Pump  operation. For example:

          > impdp help=y

          Would provide a listing  of parameters for a Data Pump Import operation.


    • LOGFILE=[directory_object:]file_name - Contains work-in-progress,  work completed and errors generated during a Data Pump Operation. For a real  time update on a Data Pump operation's activities, you are able to attach to a  running Data Pump operation using the ATTACH command and then execute the STATUS  command. Like its dumpfile counterpart, The directory object specification is  not required to be specified if a directory is specified in the DIRECTORY  parameter. If one is specified in the DIRECTORY parameter and the DUMPFILE=  parameter, the DUMPFILE=directory: specification will take precedence.


    • NETWORK_LINK=database_link_name - OK, I must admit that  this is one of my favorite new features provided by Data Pump Import. The  network transfer option allows administrators to attach to any source database  that can be accessed from the network, transfer the data and write it to the  target database. No dumpfile required! Administrators use the NETWORK_LINK  parameter to specify a pre-created database link that points to the source  database.

      This parameter must be specified when using the FLASHBACK_SCN,  FLASHBACK_TIME, ESTIMATE, or TRANSPORT_TABLESPACES parameters.


    • PARALLEL=integer - Multiple worker processes are able to perform  inter-table and inter-partition work in parallel. This allows administrators to  load and unload database tables using parallel processing. Data Pump Import  takes advantage of parallelism to build indexes and load package bodies.

      The most beneficial aspect of Data Pump's parallel processing feature is  that it is dynamic. Administrators are able to attach (connect) to a Data Pump  job that is currently executing and dynamically alter the number of parallel  processes it uses.

      Oracle documentation states  that the parallel option may cause file contention if too many parallel  processes are used to read the same input dumpfile. As a result, you'll need to  be careful when specifying the level of parallelism.


    • REUSE_DATAFILES={y | n} - Tells Data Pump Import to reinitialize  existing tablespace datafiles. Specifying "N" tells Data Pump Import to generate  an error message from the failing create tablespace execution. The Import will  then continue executing. Specifying "Y" tells Import to issue a warning and  reinitialize the tablespace datafile(s) which results in a total loss of  existing data in the target tablespace.


    • STATUS=[integer] - Specifies the number of seconds between each  status information display.


    • TABLE_EXISTS_ACTION={SKIP | APPEND | TRUNCATE | REPLACE} -  Provides Data Pump Import with instructions to perform when it finds a table in  the target database with the same name as a table contained in the input  file.

      TABLE_EXISTS_ACTION=SKIP - Existing tables are bypassed if they already  exist. No action is taken.

      TABLE_EXISTS_ACTION=APPEND -Data Pump Import will append rows to the  existing table.

      TABLE_EXISTS_ACTION=TRUNCATE - Data Pump Import will truncate the  existing table and load rows from the source.

      TABLE_EXISTS_ACTION=REPLACE - Existing tables will be dropped, recreated  and then loaded from the source.

        • TRANSFORM =  transform_name:value[:object_type] - Allows the administrator to alter  object creation DDL for specified objects . For more information on these  parameters, please turn to my previous blog titled "10G  Data Pump Part 1".


    Commands  Available in Interactive Mode
    We learned previously that administrators  are able to attach and detach from actively running Data Pump operations.  Administrators use the ATTACH and EXIT_CLIENT commands to connect and disconnect  from an active Data Pump job. This output file provides a demonstration of detaching and attaching from an active Data  Pump operation. You'll notice that I dynamically change the parallel setting  after I reattach.


    Let's take a look  at some of the other parameters for Data Pump Import that you can dynamically  change during its execution:

    • CONTINUE_CLIENT - Switches from interactive mode to status  (logging) mode. An example of this command follows:

      impdp>  CONTINUE_CLIENT


    • EXIT_CLIENT  - Detaches the client from an active Data Pump operation. The job continues  to run. The administrator is then able to use the ATTACH command to reattach to  the job at any time. The attach can be performed if the job is executing or in a  stopped state. An error message will be returned if an attempt is made to attach  to a completed Data Pump Operation.


    • HELP -  Provides help on interactive mode Data Pump commands.


    • KILL_JOB - Kills the active Data Pump operation and detaches all client sessions.  This command differs from the STOP_JOB command which is discussed below. The  KILL_JOB will remove DUMPFILEs that were created which prevents the user from  taking advantage of Data Pump's restart capabilities. The log files will  remain.


    • PARALLEL=integer - Change the number of  parallel processes for a Data Pump operation.


    • START_JOB /  STOP _JOB -  Restarts and stops the Data Pump operation the client is currently attached to.  Data Pump jobs update a master table with all pertinent information pertaining  to the job being executed. The master table knows about the objects being  exported or imported and their location in the dump file set.

      As a  result, a Data Pump job can be restarted in the event of a planned or unplanned  stoppage. The Data Pump job accesses the master table to determine where to  start and picks up right where it left off. Administrators are able to use the  STOP_JOB AND START_JOB commands on the client to temporarily stop jobs and  restart them when desired.

      Oracle provides the  SKIP_CURRENT option of the START_JOB parameter to allow administrators to  restart a Data Pump Import operation that failed to restart because of a failing  DDL statement. SKIP_CURRENT tells Data Pump Import to skip the object having the  problem and restart processing on the next object.


    • STATUS=[integer] - Specifies the number of seconds between each  status information display.

    I think after reading these last four blogs, we should all have a pretty  good understanding of 10G Data Pump. We can finally retire Data Pump's ancestors  Export and Import. As we now know, they just can't compete with the features and  functionality provided by Data Pump Export and Import.

    Thanks for  reading.

    Chris  Foot
    Oracle Ace