We have faced similar situations. Below are a few notes that could help you; they are quite technical, following the tone of your question.
Please contact me offline if you'd like more dedicated help: the following steps use advanced functionality, and a good understanding of the solution helps.
For point (1), I would recommend using the 'multiple schema polling' feature of the latest BPA CDB-CDB ETL. This lets you configure access to multiple schemas in the same ETL configuration.
- With the wizard, create a new ETL task (say, ETL-A1) configured with a reasonable number of schemas (up to 10) and extraction for one architecture, e.g. Physical standalone. The 'Connect to multiple instances?' property is found in the Advanced properties. Test the task and make sure all credentials are right.
- Create another ETL task (ETL-A2) with the wizard and set bogus values for everything except lookup sharing and the domain where the systems will be created.
- For this second ETL task (ETL-A2), create a new "run configuration" by copying the configuration of the ETL created in step 1 (ETL-A1), then change the architecture in the copy from the one covered by ETL-A1 to another one, e.g. AIX.
- Repeat steps 2 and 3 for all architectures covered in the 10 visualizer db instances (schemas) of point 1 (Solaris, etc.), creating ETL-A3, ETL-A4, etc.
- Repeat steps 1 to 4 for all other visualizer db instances, creating ETL-B1, ETL-B2, ETL-C1, ETL-C2, etc.
Carefully choose how many database instances you want to manage with one ETL:
- More ETLs enable better and finer control of the data flow.
- More ETLs mean more setup and slightly more administration.
I would say 5 or 6 instances are a good compromise - your mileage may vary.
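To see how many ETL tasks a given grouping produces before building anything in the wizard, the naming scheme above can be sketched in a few lines of Python. This is only a planning aid, not a BCO API: the function `plan_etl_tasks`, the instance names, and the dictionary layout are all hypothetical, and the real tasks still have to be created in the console.

```python
from string import ascii_uppercase


def plan_etl_tasks(instances, architectures, instances_per_etl=5):
    """Group instances into batches and name one ETL task per (batch, architecture).

    Hypothetical helper: it only produces a naming plan (ETL-A1, ETL-A2, ...),
    it does not talk to BCO in any way.
    """
    if instances_per_etl < 1:
        raise ValueError("instances_per_etl must be at least 1")

    # Split the instance list into consecutive batches of the chosen size.
    batches = [
        instances[i:i + instances_per_etl]
        for i in range(0, len(instances), instances_per_etl)
    ]
    if len(batches) > len(ascii_uppercase):
        raise ValueError("this sketch only names up to 26 batches (A-Z)")

    plan = []
    for batch_index, batch in enumerate(batches):
        letter = ascii_uppercase[batch_index]
        # One ETL task per architecture, all polling the same batch of schemas.
        for arch_index, architecture in enumerate(architectures, start=1):
            plan.append({
                "etl": f"ETL-{letter}{arch_index}",
                "architecture": architecture,
                "instances": batch,
            })
    return plan


if __name__ == "__main__":
    # Made-up instance names, for illustration only.
    instances = [f"vis_db_{n:02d}" for n in range(1, 13)]
    architectures = ["Physical standalone", "AIX", "Solaris"]
    for task in plan_etl_tasks(instances, architectures, instances_per_etl=5):
        print(task["etl"], task["architecture"], len(task["instances"]))
```

With 12 instances, 3 architectures, and 5 instances per ETL, this yields 9 tasks (ETL-A1 through ETL-C3), which makes the setup-versus-control trade-off easy to compare for different batch sizes.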
Regarding point (2), you can use the 'Days to extract' property to control how many days of data are imported at each run. I believe you should ask BMC support for the recommended maximum value for a single ETL run. In any case, I would tune the number of days so that each run imports approximately 5 million rows (a compromise that works well for me).
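As a rough illustration of that tuning, the arithmetic is just the target row count divided by the rows one day of data produces. The sketch below assumes you have measured the daily row volume yourself (the 400,000 figure is invented for the example); `days_to_extract` and `max_days` are hypothetical names, and any cap should come from BMC support rather than from this snippet.

```python
def days_to_extract(rows_per_day, target_rows_per_run=5_000_000, max_days=None):
    """Suggest a 'Days to extract' value so one run imports roughly the target rows.

    rows_per_day: measured number of rows one day of source data produces
                  for the schemas this ETL polls (an input you must supply).
    max_days:     optional upper bound, e.g. a value recommended by support.
    """
    if rows_per_day <= 0:
        raise ValueError("rows_per_day must be positive")

    # Round down so a run stays at or below the target; always extract at least 1 day.
    days = max(1, target_rows_per_run // rows_per_day)
    if max_days is not None:
        days = min(days, max_days)
    return days


if __name__ == "__main__":
    print(days_to_extract(400_000))              # 12
    print(days_to_extract(400_000, max_days=7))  # 7
    print(days_to_extract(8_000_000))            # 1
```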
Things to remember:
- Create an ETL that will serve as the 'master lookup table' for all the others. Ideally it could be the CMDB ETL, or an ETL with a fake configuration that does nothing but serve as the 'master lookup table'.
- Share the lookup table of each BPA ETL with the 'master lookup table'.
- Before the first real run, set the 'Default last counter' property to the first day to import. I'm assuming you know that a proper aging configuration is required to import data (KA350735).
Experience with BCO: the short version is 'not much'; gaining it is part of the objective of this sandbox/POC. I am working with our TAM and strategic accounts technical resource, and they have been super helpful. However, it has been my experience with past BMC products that some of the real-world implementation details and practices do not get captured in the support documents but are often found on the BMC Community.
That said, my level of experience with the BPA products and associated data, databases, and ETL processing in general is pretty decent to advanced.
Well, then I think you need more than some community exchanges. Renato has already provided some useful recommendations, but I would strongly encourage you to have either our PS or a partner (such as the company Renato works for) help you with this deployment.
Another comment relates to your mention that this implementation is for 7000 servers, which I would not consider a good target for a sandbox/PoC environment. So I would also suggest you verify that your BCO infrastructure is right-sized.
I hope this helps,
Thanks for the tips. I had seen the 'multiple schema' section of the ETL configuration and wondered if that might be my path to simplifying the creation of the ETL process. A couple of follow-up questions for you:
- Do you find it increases the administrative/maintenance overhead? Since I am not familiar with the day-to-day upkeep of the ETL processes, I'm wondering if it is easier to maintain a small number of connectors that do a lot of work each or a large number of connectors that do a little work each.
- Is there any effect on the parallelism/throughput of the load process?
- The multiple schema option decreases the administrative and maintenance overhead. It was a nice feature added by BMC for cases like yours, with many visualizer db instances.
- The right balance between parallelism and duration of the import process depends on your environment specs, and the multiple schema option is not the only way to control parallelism; the best practice is to put ETLs in a chain and set the number of concurrent ETLs in the chain configuration.
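To get a feel for how the concurrency setting of a chain trades off against total import time, here is a small back-of-the-envelope simulation. It is a deliberate simplification and not how the BCO scheduler is implemented: it assumes ETLs start in list order, that a new one starts as soon as a slot frees up, and that each ETL's duration does not change when others run alongside it (in practice, database contention will stretch them). The function name and the durations are hypothetical.

```python
import heapq


def chain_duration(etl_minutes, max_concurrent):
    """Estimate wall-clock minutes for a chain with a concurrency limit.

    etl_minutes:    expected duration of each ETL, in the order the chain starts them.
    max_concurrent: how many ETLs may run at the same time.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")

    running = []  # min-heap holding the finish time of each busy slot
    for minutes in etl_minutes:
        if len(running) < max_concurrent:
            start = 0.0  # a slot is still free at the very beginning
        else:
            start = heapq.heappop(running)  # wait for the earliest ETL to finish
        heapq.heappush(running, start + minutes)

    # The chain ends when the last running ETL finishes.
    return max(running, default=0.0)


if __name__ == "__main__":
    durations = [30, 20, 10, 40]  # made-up per-ETL durations in minutes
    for limit in (1, 2, 4):
        print(limit, chain_duration(durations, limit))
```

Comparing the estimate for a few limits shows where adding concurrency stops paying off, which is the point at which the environment specs, rather than the chain setting, become the constraint.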
As you are working in a sandbox environment, as Giuseppe suggested, I would not import data for all instances.
On the sandbox/POC comments: My main interest in the BCO product is in exploring the capabilities for environment-wide reporting and senior management dashboards. For that purpose, I would rather have a small amount of data for all systems than all data for a small number of systems. The thought I had was to import data from all our visualizer sources and adjust the aging policy to live within the space available. Based on the sizing and architecture guide that I was provided, the only non-full-size component for this sandbox/POC is the database space; otherwise the ETL server, APP/ETL server, and DB server have been sized with CPU and memory to handle full load.