Severe TDI Issue In Connections 5.5

I have been working with a customer who is migrating from Connections 5.0 to Connections 5.5. When I do a migration I like to do it properly and create clean data by using dbt.jar to migrate content to new databases. I know a lot of people are happy with the backup/restore of databases idea, but for me that leaves too much scope for bad data to carry over from the old system to the new one.

Everything was going fine: the profiles data migrated, and then I tried a sync_all_dns to sync the LDAP data to the database, something we schedule daily at this customer. It failed when it tried to hash the database tables. The error in the ibmdi.log was

Error: The sort page size property - source_ldap_sort_page_size= - must be greater than 10 if it is not 0. Aborting.

That’s a value that is set in profiles_tdi.properties and it was already set to 0.  So why was it aborting?

I decided to troubleshoot with just a cut-down list of names in collect.dns, using the populate_from_dn_file function. Again it failed, but with the strangest error: it would find the user in LDAP, get all their values, then fail to find the user in the database and fail to update.

In SyncUpdates.log I could see the following error no matter what user I chose for populate_from_dn_file.

ERROR [com.ibm.di.log.FileRollerAppender.bc9c35a0-aae5-416e-9a99-1d418c3c564c] – [callSyncDB_mod] [ProfileConnector] null
java.lang.IndexOutOfBoundsException
at java.util.Collections$EmptyList.get(Collections.java:87)

I then tried copying the collect.dns file to my 5.0 production environment and running it there, and it worked fine: it found the users as duplicates and didn't update them, which is the correct behaviour.

I compared the map_dbrepos_from_source.properties files in 5.0 and 5.5 and they looked pretty much the same. So I opened a PMR, which was eventually escalated to development. As soon as they received it they knew what the problem was: apparently a known but undocumented bug that was fixed in CR1 with files you have to manually deploy (we were already at CR1).

Development’s report of the problem was

log4j.logger.com.ibm.lconn.profiles.internal.service=ALL

in log4j.properties causes TDI populate and sync commands to crash if an EMPLOYEE is altered

Well, the crashing was true, but the value log4j.logger.com.ibm.lconn.profiles.internal.service=ALL was commented out (#) and unused, so it wasn't related to that particular log setting in my case.

The fix was to go find the two files

lc.profiles.core.service.impl.jar
lc.profiles.core.service.api.jar

in the Connections install and copy them to the TDI\lib directory in your tdisol environment. In my case I had created a folder called TDISOL55, and under that a TDI directory with all the properties and script files in it, plus the lib subdirectory full of jar files. That came from the D1 (day 1) release download of the Wizards, which contained an updated TDIPopulation directory and was dated 18th Dec. There was no new tdisol with CR1, but clearly there should have been.

I found the files in my Websphere Application profile directory for the profiles application server under the directory

D:\IBM\WebSphere\AppServer\profiles\AppSrv01\installedApps\conn55Cell01\Profiles.ear

I copied those two files over and it almost worked. I had one more problem. The value source_ldap_sort_attribute in profiles_tdi.properties, which was initially set to empty (not null, but =), had been changed at the request of L2 to source_ldap_sort_attribute=mail, which matched the 5.0 properties we were using. They had asked me to change it for an exact comparison, and that broke the updates. Once I took out the "mail" mapping, both populate_from_dn_file and sync_all_dns ran perfectly.

The new environment does use different LDAP servers (but the same source data), and I don't know whether the failure when telling the server to sort the LDAP results is a problem with that server's configuration (both environments are Domino 9.0.1) or with 5.5 itself. I'll investigate that and update.

So my two fixes were

  • copy the two jar files from your CR1 installedApps directory to your tdisol directory (lib subdirectory).
  • make sure source_ldap_sort_attribute= in profiles_tdi.properties
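
For reference, this is what the two relevant settings in profiles_tdi.properties looked like once everything ran cleanly. A sketch from my environment; the rest of your properties file will differ:

```properties
# profiles_tdi.properties - working values in my 5.5 environment
source_ldap_sort_page_size=0
# leave the sort attribute empty (present, with no value - not commented out)
source_ldap_sort_attribute=
```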

Sametime Audio and Video For External Users – NWTL

Today I did the second in my series of Sametime presentations for IBM's New Way To Learn (NWTL) initiative. The session was recorded with audio and is available by joining the Community here: http://bit.ly/1t7e0LE. The session slides by themselves are on SlideShare and shown below.

If your Sametime environment is going to include Audio and Video you will probably want to be able to talk to people outside your own company, or at least to your own users on their mobile devices who aren’t connected via VPN. In this recorded online session as part of IBM’s New Way To Work initiative we reviewed the infrastructure behind the Audio and Video elements of Sametime and how best to extend those features beyond your firewall.


More IBM Docs Fun And Games

…a few more notes from my latest IBM Docs install. Previous installs, including the test install at this customer, proceeded with no problems, but this one presented several challenges, so I'm sharing them here in case anyone else hits the same. Firstly, since there's a Windows machine involved, let's rule out the biggest possible issues:

1. Make sure Windows is activated. Microsoft does restrict behaviour and performance in non-activated Windows. No, I don't have proof, I just have solid evidence of that behaviour. Activating Windows often makes the pain go away.

2. Make sure you disable the Windows local firewall, even if you can only do so during the install. The server is going to have to talk to, and be talked to by, the deployment manager at least, and with the Windows firewall enabled your install will fail.

3. Make sure every server can ping every other server, even itself, and using an IPv4 (not IPv6) routable address.

4. Disable UAC. PLEASE. In Windows 2012 that's a registry hack where you set EnableLUA to 0 under "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System".
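
If you'd rather script that registry change than click through regedit, a one-liner like this does it. Run it from an elevated command prompt, and note a reboot is required before UAC is actually off:

```shell
rem Disable UAC on Windows 2012 by setting EnableLUA to 0 (reboot required)
reg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System" /v EnableLUA /t REG_DWORD /d 0 /f
```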

So now we're ready to install. There are two options: Installation Manager, or using the manual scripts. Obviously Installation Manager is easier, if you're installing all components at the same time and if it works. Here are the standard components I'd usually install for full IBM Docs in a Connections environment with no CCM.

Installing IBM Docs

My problem was that in this instance the installer failed during the Docs Proxy server install. I could see in the logs (found under the IBM Docs Conversion install directory, in my case D:\IBM\ConnectionsDocs\Conversion\logs) that Conversion, Docs and Viewer all installed and deployed with no problems. However, since I had chosen six components, when it failed on one it rolled back the entire thing.

The error was "Target with name docsserver.domain.com was not found". Why would it say that when the script is running on docsserver.domain.com, which can certainly find itself? The answer is in how the installer works. It has local python scripts that are actually called by the Job Manager in your Deployment Manager, so the error (which exists only on the Docs server) is basically saying "the Deployment Manager cannot run the python script on this server". That's curious. Then I realised that to run a remote script the Deployment Manager must contain a job target: a configuration setting that tells it how to reach a remote server and gives it credentials to run the code. I checked, and although the installer had created a job target, when I tested it there were no stored credentials. My guess is this was left over from an earlier attempt when UAC wasn't fully disabled and the job target was created incompletely. I re-created it to make sure it worked OK (it tests on save).

JobTargets

So back to square one (or snapshot one). I removed the half-created clusters for Docs, Conversion and Viewer, I removed the Docs Proxy cluster, but I left the job target in place and relaunched the install. This time my plan was to install in stages, taking snapshots between each one. This was a VERY bad idea. Docs and Conversion installed and tested perfectly. However, when I went to Installation Manager and chose "Modify" to add the Viewer component, it failed. It took 8 hours to fail, during which time I monitored the logs carefully, and this is what it did:

  • To modify an existing IBM Docs install and add a new component, the install first UNINSTALLS all existing components, even the working ones you may have installed months before
  • It then reinstalls the components it just uninstalled and attempts to install the new component as well
  • When that failed, it uninstalled all the components again and then reinstalled the original two, leaving me back where I started 8 hours later

It wasn’t so much the time lost as my fear that during the whole uninstalling / reinstalling of perfectly good servers it would somehow fail and break something that worked.  So.  New plan.

I now had a working IBM Docs and Conversion server to which I needed to add Viewer and Docs Proxy. I was staying away from Installation Manager at this point: I wanted more control and I didn't want to waste another 8 hours before I could troubleshoot. Luckily we do have the option to install components manually instead of using Installation Manager. To do that I extracted the installers and modified the cfg.properties files as per the documentation. That worked fine after an initial failure. The instructions don't say to pre-create the clusters and server members before running the scripts, but you must do that, and use the cluster and server names given in the documentation. If you don't, the scripts will fail when they try to connect to the deployment manager to find the servers to install onto. If you're using Installation Manager you don't need to do this, as the installer does it for you.

Finally, there are test URLs as you install each component, of the form <hostname>/componentname/version.txt, e.g. http://connect.turtlepartnership.com/docs/version.txt. To ensure this works you must regenerate and propagate the plugin-cfg.xml and restart your IHS server. Also bear in mind the syntax must be lower case: /docs/version.txt, /viewer/version.txt and /conversion/version.txt.
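
Rather than typing the three URLs by hand, you can script the check; a minimal sketch (the hostname here is illustrative, substitute your own IHS front end):

```shell
# build the lower-case test URL for each IBM Docs component
host="connect.example.com"
urls=""
for comp in docs viewer conversion; do
  urls="$urls http://$host/$comp/version.txt"
done
echo "$urls"
# then fetch each one, e.g.: for u in $urls; do curl -s "$u"; done
```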

So there you go. This was probably the fifth 1.0.7 install I've done and the first one to hit a problem. Try it first with Installation Manager, make sure you back up (or better yet snapshot) both the Deployment Manager and your IBM Docs server before starting, and if it starts failing switch to running the manual scripts.

Have fun!

Getting Around Documentation Errors With Connections Scripts

I've been meaning to write this blog for a while. And by "a while" I mean since v4 of Connections. IBM supply a series of scripts with the Connections install, found in the install directory under the folder connections.sql. These scripts are used for a variety of things, but most people will have to use them when migrating from an earlier version of Connections to a new one. The scripts are under the database type folder for each application, so the scripts for the Blogs database on DB2 are in

/connections.sql/blogs/db2

Now you can put those scripts where you want, obviously, but that's where you will find them. In that folder there are lots of files that are basically a series of SQL commands written out for you. Each command terminates with a ; or a @ to identify the end of the command. When running these commands with db2 you use a different syntax depending on whether the SQL file ends each command in a ; or a @. For example:

; means our command line is written as "db2 -tvf {scriptfile} > {logfile}"

@ means our command line is written as "db2 -td@ -vf {scriptfile} > {logfile}"

Writing to a log file isn’t compulsory but I always do so I can check if the script ran OK.
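
Since it's easy to forget which flag goes with which terminator, a tiny check like this saves a wasted run. A sketch; the file name and content here are illustrative:

```shell
# create a tiny sample script whose statements end in @ (illustrative content)
printf 'CREATE TABLE DEMO (ID INT)@\n' > sample.sql

# pick the db2 flags based on which terminator the file actually uses
if grep -q '@' sample.sql; then
  cmd="db2 -td@ -vf sample.sql"
else
  cmd="db2 -tvf sample.sql"
fi
echo "$cmd"
```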

The problem is that the IBM documentation site often gives the wrong syntax for each database (and they aren't consistent), so on this page the instructions for the profiles database are

“db2 -tvf predbxferxx.sql”

If you run that (and the clue is that it takes less than a second, which is suspicious) you will see no errors, but if you check your log you will see a single line saying

“End of file reached while reading the command”

That basically means we used the wrong line terminator: we told it -tvf, so it looked for a ; at the end of each command, but if we open predbxfer45.sql we can see each command ends in @. If we change the command to

“db2 -td@ -vf predbxfer45.sql”

it runs perfectly.

It would be nice if the IBM documentation was correct but it’s a simple problem to catch and fix.

One Dumb And Two Smart Things – Calling That A Win

Last night / yesterday afternoon I was building a Connections server (for an internal project) when I wiped out hours of work doing something dumb. I had spent some time downloading all the software and fixes to the server, which was Windows 2008 R2 (because I have plenty of licensing for that), and then I installed DB2 and WAS and created the WAS profile. The next step was to run dbwizard.bat to create the databases, but that's where weird stuff started happening. The dumb bit had already occurred, I just hadn't noticed it yet…

The DBWizard would launch and let me move past the first screen, but no amount of clicking "Next" would let me move off the "Create, Edit, Update" screen. Clicking "Back" actually took me to the next screen (!) but I couldn't get any further than that. I refused to believe it could be a DB2 problem, because at that point in the wizard it had no idea I was running DB2: I hadn't chosen my database platform because I couldn't get to that screen. I started from the assumption that since DBWizard is a Java program, my version of Java (brand new, updated the day before) was incompatible. So cue much time spent uninstalling and installing different Java versions to try to fix it, with no luck. I could have run DBWizard from another machine, but I wanted to fix whatever the underlying problem was. Then I realised the dumb bit: I had installed 32-bit DB2 on a 64-bit platform, which DB2 is fine with but the DBWizard really isn't. I don't know if that was my problem (I still can't believe the early DBWizard screens even know to check), but in my attempts to fix, uninstall and clean up DB2, I corrupted the Windows registry. At least that's what I think I did, because on restart Windows would only boot to a grey branded screen with no login, even if I chose one of the Safe modes or tried booting from a CD.

Since this work was about installing Connections and not fixing Windows, I decided not to waste more time on it and start over. Here come the two smart things.

1. I have a pre-built Windows 2008 R2 VM disk with a 40GB C drive that I use to clone and make new VMs.

2. I had downloaded and installed everything to a separate 100GB virtual disk

  • I detached the virtual disk from the broken VM
  • deleted that VM from the host entirely
  • made a copy of my simple VM disk
  • created a new virtual machine using that copy as its disk
  • added the 100GB virtual disk to that new VM
  • opened it up and changed its IP to match that of the VM I had just deleted

…and I was back in business. Total time elapsed: about 7 minutes.

Of course I now had a D drive with software on it that the Windows registry knew nothing about, but it was simple to just delete those installer folders, reinstall (the right) DB2, WAS etc. and get back on track. Certainly much simpler than trying to fix a broken Windows server!