If information is power, then poor-quality information will render you
powerless. You cannot analyze what you cannot read, sort or organize. The
costs can be measured in lost sales, lost opportunities, employee and consumer
frustration and organization-wide operational errors.
Whenever I speak about data, information and its requirements to colleagues,
friends and family, I am amused by the glassy-eyed stares I receive from all of
them. The “what do you do for a living” question always makes me
uncomfortable. I really do not know how to respond without boring the person
to death.
Data cleansing, data cleaning or data scrubbing is the process of detecting and
correcting (or removing) corrupt or inaccurate records from a record set, table, or
database. Used mainly in databases, the term refers to identifying incomplete,
incorrect, inaccurate, or irrelevant parts of the data and then replacing,
modifying, or deleting the inaccurate or coarse data. [Wikipedia]
The term “data cleansing” is also commonly used for adding,
changing or discontinuing data in existing databases.
A number of e-procurement projects have experienced catastrophic failure
simply because nobody thought to invite a data specialist to the table during
the project development phase. In two cases the cost to fix what was missed
was staggering: both projects were terminated with losses in the
multi-millions. Each failure was directly related to data fields that
everybody ignored or missed in the development phase.
As I journey down the road of data and information cleansing, which includes
translation, categorization, classification, mapping and compliant databases,
I have learned a great deal. Here are some observations:
1. No two people speak or write language in the same way. If there are no
business rules attached to description fields, then every single line will
differ depending on who populated the information. If our written language
skills are not the best, then our abbreviation skills are far worse. There is
a difference between being able to read a product description and knowing
what the product actually is.
2. No two people think the same way. Product classifications are one of the
most important sets of data fields any company can populate. Rarely audited or
even discussed, these fields can cause havoc if they are not populated
properly.
3. Manufacturers never planned for their internal databases to be seen or used
by their customers or the consumer. Information field demands from clients
have changed dramatically in the last two decades. Manufacturers are expected
to provide their customers with all fields of information related to their
products in usable formats. This requires a specialist who can manage the
expectations of the customer as well as add and populate new fields.
Providing compliant databases for the purposes of communication to external
service providers is the new mandate. Many manufacturers try to manage these
demands on a “request to request” basis instead of developing a long-term
project plan for their product content and related fields of data.
4. Your computer and its capabilities are not the same as my computer and its
capabilities. Manufacturers and their clients use differing computer systems
and software; some are very sophisticated, others rudimentary. There are many
rules regarding what you can and cannot do, yet formatting to the client's
requirements is largely ignored. The client may not be able to receive your
data because their software differs from yours. Some systems will not allow
symbols to be used (*, -, _, /, #, &, etc.). These limitations should be kept
in mind when assigning manufacturer product code numbers.
5. Did it really take me that long to enter the item number? I have seen
product numbers that are subject to numerous entry errors because of their
length and complexity. I understand that some internal logic is applied to
manufacturer product numbers for the purposes of production, but many
resellers, wholesalers and retailers use the manufacturer product code as
their own. Lengthy alphanumeric product numbers slow everybody down and are
particularly frustrating for the actual consumer who needs to reorder the
product.
6. Product code numbers can be very difficult to read. Many errors are created
in databases because of the following: numeric 0 versus alpha O, numeric 5
versus alpha S, numeric 8 versus alpha B. Worse is starting a product code
with a zero. Unless the field is formatted properly (for example, as text),
most databases will simply drop the zero, corrupting the code.
7. Brand and sub-brand naming conventions. Remember when assigning a brand and
sub-brand name that your clients have a limited number of characters in their
description fields to identify your products. Going back to “lengthy”,
Unilever’s “I Can’t Believe It’s Not Butter” takes up a lot of the description
real estate before size and the other information needed to identify the
product are even considered. When these sub-brand names are too long, the
client will be forced to drop them from their descriptions.
8. Incorrect data only got worse with the internet. Researching products on
the internet has shown what happens when the manufacturer’s data is in error
and these errors are shared with the world. Once the error is out there,
correcting the content and misinformation is impossible. Multiple content
versions for a single product confuse the consumer and can result in lost
sales.
9. We don’t read or write in CAPS LOCK. The old mainframes had to use CAPS
LOCK. That changed decades ago. If someone sends an email all in CAPS LOCK,
then you know they are furious with you. Reading any document in CAPS LOCK is
difficult, and the capitals also take up needed real estate. So why are so
many of us still using it on our mainframe systems?
10. The kiddies have invented a new language and I am beginning to see it all
over the internet. Text messaging and its online jargon (text message
shorthand) are beginning to worm their way into product descriptions and
sub-brand naming conventions. It is bad enough out there without having to
keep e-documents on hand to help us all translate what is being said.
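Several of the observations above (symbols that some systems reject, lengthy
codes, easily confused characters and leading zeros) can be checked
automatically before a product code is ever assigned. The following is only a
minimal Python sketch of that idea, not a standard. The function name
check_product_code, the 12-character limit and the set of confusable
characters are assumptions made for illustration; the real limits would have
to come from your own clients' systems.

```python
import re
from typing import List

# Illustrative assumptions, not industry rules: adjust to your clients' systems.
MAX_CODE_LENGTH = 12                  # hypothetical length limit
CONFUSABLE_CHARS = set("0O5S8B")      # the pairs named above: 0/O, 5/S, 8/B
LETTERS_AND_DIGITS = re.compile(r"[A-Za-z0-9]+")


def check_product_code(code: str) -> List[str]:
    """Return a list of warnings for one product code (empty if none apply)."""
    warnings = []
    # Symbols such as * - _ / # & are rejected by some receiving systems.
    if not LETTERS_AND_DIGITS.fullmatch(code):
        warnings.append("contains symbols or spaces some systems reject")
    # Long codes invite keying errors.
    if len(code) > MAX_CODE_LENGTH:
        warnings.append(f"longer than {MAX_CODE_LENGTH} characters")
    # A numeric field would store 00123 as 123.
    if code.startswith("0"):
        warnings.append("starts with a zero that a numeric field would drop")
    # Characters that are easily mistaken for one another when read or typed.
    confusable = sorted(set(code.upper()) & CONFUSABLE_CHARS)
    if confusable:
        warnings.append("easily misread characters: " + ", ".join(confusable))
    return warnings


if __name__ == "__main__":
    for code in ["XK2479", "0AB-12S"]:
        print(code, check_product_code(code))
```

A code that comes back with an empty list passes these particular checks;
anything else deserves a second look before it reaches a client's system.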
There you have some of it – but not nearly all of it.
We run our businesses on data, but I have never heard of a senior role in any
organization dedicated to data and information content (e.g. Vice President of
Data). The management and audit of current data and the implementation of
future data field requirements raise the question: who is responsible? This
position would work closely with information technology professionals,
marketing, sales and operations. It should stand apart as an integral role for
enhancing information and content, improving operational capabilities and
providing first-class data to the customer and the consumer. Savings from good
data can be measured in the millions (CAD) by improving sales, eliminating
errors and providing the client and consumer with the data tools they need to
reorder your products.
Who is the single individual
responsible for data integrity and continuity in your organization?
Gayle Anne McCaskill,
Canadian Office Products Association (COPA),
Data Factory Specialist for data cleansing, categorization, and
classifications.