Friday, March 20, 2015

Data & Content Information – Making it Better

By: Gayle Anne McCaskill

If information is power, then poor-quality information will render you powerless. You cannot analyze what you cannot read, sort or organize. The costs can be measured in lost sales, lost opportunities, employee and consumer frustration, and organization-wide operational errors.
 
Whenever I speak about data, information and its requirements to colleagues, friends and family, I am amused by the glassy-eyed stares I receive from all of them. The "what do you do for a living?" question always makes me uncomfortable, because I really do not know how to respond without boring the person to death.

Data cleansing, data cleaning or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the inaccurate or coarse data. [Wikipedia]

The term (Data cleansing) is also commonly used for adding, changing or discontinuing data in existing databases.
 
Data and information work is considered a tactical rather than a strategic exercise by most business professionals. Populating data into the mainframe is commonly assigned to entry-level resources or over-burdened merchandisers. Billions are spent annually by companies upgrading their hardware and software. The same cannot be said for data-related content: field restructuring and future requirements, product classifications, universal classification systems, data attributes, business rules, certifications, database compliancy, and internal and external processes. The lack of focus on content is evident within many companies, yet accurate content is critical to every organization. We upgrade our hardware and software, usually ignoring both current and future content requirements.

A number of e-procurement projects experienced catastrophic failure simply because nobody thought to invite a data specialist to the table during the project development phase. The cost to fix what was missed was staggering, and both projects were terminated with losses in the multi-millions. Each failure was directly related to missing data fields that everybody ignored or missed in the development phase.

 
As I journey down the road of data and information cleansing, which includes translation, categorization, classification, mapping and compliant databases, I have learned a great deal. Here are some observations:

1.     No two people speak or write language in the same way. If there are no business rules attached to description fields, then every single line will differ depending on who populated the information. If our written language skills are not the best, then our abbreviation skills are far worse. There is a difference between being able to read a product description and knowing what the product actually is.

2.     No two people think the same way. Product classifications are one of the most important sets of data fields any company can populate. Rarely audited or even discussed, these fields can cause havoc if not done properly.

3.     Manufacturers never planned for their internal databases to be seen or used by their customers or the consumer. Information field demands from clients have changed dramatically in the last two decades. Manufacturers are expected to provide their customers with all fields of information related to their products in usable formats. This requires a specialist who can manage the expectations of the customer as well as add and populate new fields. Providing compliant databases for the purposes of communication to external service providers is the new mandate. Many manufacturers try to manage these demands on a request-by-request basis instead of developing a long-term project plan for their product content and related fields of data.

4.     Your computer and its capabilities are not the same as my computer and its capabilities. Manufacturers and their clients use differing computer systems and software; some are very sophisticated, others rudimentary. There are many rules regarding what you can and cannot do. Formatting to the client's requirements is largely ignored. The client may not be able to receive your data because their software differs from yours. Some systems will not allow symbols to be used (*, -, _, /, #, &, etc.). These limitations should be kept in mind when assigning manufacturer product code numbers.

5.     Did it really take me that long to enter the item number? I have seen product numbers that are subject to numerous entry errors because of their length and complexity. I understand that some internal logic is applied to manufacturer product numbers for the purposes of production, but many resellers, wholesalers and retailers use the manufacturer product code as their own. Lengthy alphanumeric codes slow everybody down and are particularly frustrating for the actual consumer who needs to reorder the product.

6.     Product code numbers can be very difficult to read. Many errors are created in databases because of the following: numeric 0 versus alpha O, numeric 5 versus alpha S, numeric 8 versus alpha B. Worse is starting a product code with a zero. Unless the field is formatted properly, most databases will simply drop the zero, corrupting the code.

7.     Brand and sub-brand naming conventions. Remember when assigning a brand and sub-brand name that your clients have a limited number of characters in their description fields to identify your products. Going back to length, Unilever's "I Can't Believe It's Not Butter" takes up a lot of the description real estate before size requirements and other information needed to identify the product are even considered. When these sub-brand names are too long, the client will be forced to drop them from their descriptions.

8.     Incorrect data only got worse with the internet. Researching products on the internet has shown what happens when the manufacturer's data is in error and these errors are shared with the world. Once the error is out there, correcting the content and misinformation is impossible. Multiple content versions for a single product confuse the consumer and can result in lost sales.

9.     We don't read or write in CAPS LOCK. The old mainframes had to use CAPS LOCK. That changed decades ago. If someone sends an email all in CAPS LOCK, then you know they are furious with you. Reading any document in CAPS LOCK is difficult, and it also takes up needed real estate. So why are so very many of us still using it on our mainframe systems?

10.  The kiddies have invented a new language and I am beginning to see it all over the internet. Text messaging and its online jargon (text message shorthand) are beginning to worm their way into product descriptions and sub-brand naming conventions. It is bad enough out there without having to keep e-documents on hand to help us all translate what is being said.

There you have some of it but not nearly all of it.
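For readers who like to see things in practice, observations 4 through 6 can be turned into a simple automated check. What follows is only a minimal sketch in Python, not a description of any particular system: the function name `check_product_code`, the 12-character limit and the rule of flagging look-alike characters only in mixed letter-and-digit codes are all assumptions made for illustration.

```python
import re

# Look-alike characters named in observation 6 (0/O, 5/S, 8/B).
LOOK_ALIKES = set("0O5S8B")

# Assumed maximum length; the article gives no specific limit.
MAX_LENGTH = 12


def check_product_code(code, max_length=MAX_LENGTH):
    """Return a list of warnings for a manufacturer product code (hypothetical helper)."""
    warnings = []
    upper = code.upper()

    # Observation 4: some client systems reject symbols such as * - _ / # &.
    if re.search(r"[^A-Za-z0-9]", code):
        warnings.append("contains symbols")

    # Observation 6: a leading zero is lost if the code is ever stored as a number.
    if code.startswith("0"):
        warnings.append("leading zero")

    # Observation 6: look-alikes are only ambiguous when letters and digits are mixed.
    has_letter = any(ch.isalpha() for ch in upper)
    has_digit = any(ch.isdigit() for ch in upper)
    if has_letter and has_digit and any(ch in LOOK_ALIKES for ch in upper):
        warnings.append("look-alike characters")

    # Observation 5: long codes invite entry errors.
    if len(code) > max_length:
        warnings.append("too long")

    return warnings


print(check_product_code("AC1234"))
print(check_product_code("0045-B8"))

# Storing a code as a number drops the leading zeros; keeping it as text does not.
print(str(int("00451")), "versus", "00451")
```

The last line shows why product codes belong in text fields: once "00451" passes through a numeric field it comes back as "451", and the original code cannot be recovered.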
 
 

We run our businesses on data, but I have never heard of a senior role in any organization dedicated to data and information content (e.g. Vice President of Data). The management and audit of current data and the implementation of future data field requirements raise the question: who is responsible? This position would work closely with information technology professionals, marketing, sales and operations. It should stand apart as an integral role for enhancing information and content, improving operational capabilities and providing first-class data to the customer and the consumer. Savings from good data can be measured in the millions (CAD) by improving sales, eliminating errors and providing the client and consumer with the data tools they need to reorder your products.

Who is the single individual responsible for data integrity and continuity in your organization? 

Gayle Anne McCaskill,

Canadian Office Products Association (COPA), Data Factory Specialist for data cleansing, categorization, and classifications.
