Finding duplicate keys in a data file with PowerShell

In my day job I’ve sometimes found myself looking for anomalies in data exported from our Siemens HiPath PBX. Siemens produce a tool that we run on a schedule to export some of the configuration settings for telephone extensions so that we can use the data in various other systems. The file is called port.txt and it occasionally contains duplicate values in its unique_key field. (Don’t you just love irony?!)


The HiPath doesn’t care (apparently), but when we import the data into SQL Server it goes into a table where unique_key is the primary key, so the DTS package doing the import fails. We can find the duplicate unique_key values very quickly with PowerShell and report back to the telecoms people so they can fix their config.


The data arrives in a CSV file, and CSV files are easy to work with in PowerShell. (I don’t know about anyone else, but I work with them a lot.) Finding our duplicate is about as simple as:


Import-CSV port.txt | Group-Object unique_key | `
  Where-Object {$_.count -gt 1} | Select-Object -ExpandProperty Group


Sadly this file is not immediately usable. The header row is a,b,c,d,
i.e. the last field doesn’t have a name. Shame on you, Siemens!


So we have to cope with that. There are a number of ways we could do it, but we’re going to use one of the CSV handling cmdlets that came along in PowerShell 2, ConvertFrom-CSV, which takes some comma-separated values and converts them into objects in the same way as Import-CSV does, but without needing a file.
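
To see what that means in practice, here’s a throwaway example (nothing to do with port.txt) that pipes a couple of lines of comma-separated text straight into it:


"a,b`nfirst,second" | ConvertFrom-CSV


That gives back an object with properties a and b, holding the values first and second.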


Obviously we need to get it into a proper CSV format with no empty headers first. In fact, in this particular example there’s a comma on the end of every row with nothing after it, so we’ll simply replace those trailing commas with a comma followed by something using a regular expression match:


Get-Content port.txt | ForEach-Object {$_ -replace ",$", ",foo"}


That gives us something like a,b,c,d,foo, which we can convert to objects and deal with as before, so the whole job can be achieved with:


GC port.txt | Foreach {$_ -replace ",$", ",foo"} | `
  ConvertFrom-CSV | Group unique_key | `
  Where {$_.count -gt 1} | Select -ExpandProperty Group
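
As an aside, ConvertFrom-CSV also has a -Header parameter, so another option (just a sketch, untested against the real file, with the column names taken from the sample output below) would be to throw away the faulty header row entirely and supply our own:


GC port.txt | Select -Skip 1 | `
  ConvertFrom-CSV -Header unique_key,switch_name,pen,extension,tcos,least_cost_rout1,foo | `
  Group unique_key | Where {$_.count -gt 1} | Select -ExpandProperty Group


Anyway, back to the one-liner above.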


That should be clear, perhaps with the exception of the last cmdlet. When you group objects by some value, you get a GroupInfo object. That contains the value that you grouped on, the count of objects in the group, and a collection of the objects in the grouping called “group”. I’m using the ExpandProperty parameter on the Select-Object cmdlet to expand those objects back out so I can see the data that’s causing the problem. In this case it looks something like this:


unique_key       : 87734
switch_name      : KN0A
pen              : 1-14-25-25
extension        : 496625
tcos             : 6
least_cost_rout1 : 32
foo              : foo

unique_key       : 87734
switch_name      : KN0A
pen              : 1-15-49-73
extension        : 497573
tcos             : 6
least_cost_rout1 : 32
foo              : foo
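Incidentally, if you want to poke at a GroupInfo object for yourself, a trivial example will do:


$groups = 'a','b','a' | Group-Object
$groups[0].Count   # 2 - the number of objects sharing the value
$groups[0].Name    # a - the value that was grouped on
$groups[0].Group   # the original objects, i.e. the two 'a' strings
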

NEBytes November 16th: Virtualising SharePoint and Mobile BI

Due to an abundance of quality speakers, we are running two NEBytes events in Newcastle this month. The first features a pair of UK MVPs: John Timney, talking about Virtualising SharePoint, and Jen Stirrup, with a session on Mobile Business Intelligence Using Microsoft and Tableau.


The details in their own words are as follows…


John:


SharePoint 2010 virtualisation remains a hot topic, yet for many firms it is still unexplored territory, and the concepts surrounding virtualisation are pushing the knowledge boundaries of SharePoint Infrastructure Architects, Administrators and Solution Architects into areas they thought they wouldn’t need to understand, away from the physical and into the conceptual. This presentation offers an introductory view of some of the important aspects to consider when thinking about virtualisation in the SharePoint 2010 world. It considers reasons for virtualising SharePoint 2010, supported physical and virtual platform architectures, guidelines for choosing which aspects of a farm to virtualise, support and the issues it brings, layered designs as an easy means of visualising virtual architecture, and an introduction to conceptual Geo-DR.


Jen:


Mobile Business Intelligence allows end users to access data to support their decisions, irrespective of time and place. The Apple iPad is the definitive tablet of the decade, and a game-changer as a credible business tool. PowerPivot is a game-changer because it places data, and data structures, back in the hands of business users. Apple’s cutting-edge usability, along with PowerPivot’s accessibility, can be combined to enhance productivity for mobile business users. Together, these technologies can support Mobile Business Intelligence as an essential tool for end users who need data ‘on the go’.

What you will learn from this session:

  • Capabilities of Microsoft PowerPivot in terms of scalability and empowering access to data
  • Using PowerPivot as source for data, viewing the visualisation via an Apple iPad
  • Specific considerations for visualising data to serve Mobile Business Intelligence, supported by Tableau

Come and see Mobile Business Intelligence in action using the Apple iPad, with Microsoft PowerPivot as a data source.


Two excellent topics for IT Pros, I’m sure you’d agree!


The event is free (as usual) and you can register at Eventbrite.

First taste of FIM

I’ve spent two days this week at Microsoft’s UK HQ at Thames Valley Park at an Identity Management event run by Oxford Computer Group – basically learning about Forefront Identity Manager (FIM) 2010 (and 2010 R2). It also gave me the opportunity to catch up with some old friends and make some new ones, which is always good.


My knowledge of FIM prior to this was based almost entirely on a collection of FIM Ramp Up videos on the TechNet site, although I have a reasonable amount of experience of the challenges around Identity Management, having helped develop the in-house solution we use at Newcastle University today. Actually, it was nice to discover that the way FIM does a lot of things is very similar to the way that we designed our system all those years ago.


During the two days there were several instances where people would ask how something could be achieved with FIM. If the solution wasn’t built-in, the answer was typically “buy 3rd party companion product x, or use PowerShell”. That didn’t come as a surprise to me, but it underlined once again that if you have a bit of PowerShell knowledge, it can really save you some money.


Yes, there are some costs involved with learning PowerShell, but it’s mainly time (especially if you take advantage of all the free resources that the community has produced), and the skills that you learn will be transferable to a long and growing list of other technologies. You’re going to get the time you spend learning PowerShell back many times over (and frankly, if you don’t have those skills you’re going to be increasingly replaceable in the future).


It’s not yet certain that we’re going to be implementing FIM (we’ll certainly be waiting until FIM 2010 R2 next year if we do), but the ability to extend its functionality with PowerShell, reducing the need for hardcore development or consultancy, definitely goes in its favour.


If you’re also new to FIM, the Microsoft Forefront Identity Manager 2010 Technical Overview is a really good place to start (thanks to Mark Parris for his blog post pointing me to that), as well as those TechNet Ramp Up videos.