How to display ETL data flow?

einsTeIn.NETeinsTeIn.NET Member Posts: 1,050
edited 2012-01-03 in General Chat
Hi guys,

I need to create some kind of diagram that displays in an accurate but easy way where every single field in our DWH is extracted from. There are some transformations in between that should be also displayed. Data Source is not just an NAV database but also some other databases (MS SQL, Oracle), some Excel Sheets and some hard coded information.

So, in general I've got two questions regarding this:
1. What kind of diagram would you use for that purpose?
E.g. Data Flow Diagram in my opinion is a little bit limitated because it displays only a general data flow.
2. What kind of tool would you use to create such kind of diagram?


regards
"Money is likewise the greatest chance and the greatest scourge of mankind."

Comments

  • David_SingletonDavid_Singleton Member Posts: 5,479
    Just manually create a 3 or 4 page Visio document. Those first pages should look pretty important and throw lots of key words in there. Then attach another 200-300 pages of anything you can find.

    then take 3 months fully paid vacation.

    Then give this to who ever wanted it, telling them that it is the result of 3 months of 13 hour days of work.

    They will only ever look at the first page or two, and no one will every actually use the document, so you will be fine.
    David Singleton
  • einsTeIn.NETeinsTeIn.NET Member Posts: 1,050
    Yeah, that will probably work! :mrgreen:

    To be serious, our standard dashboard isn't that much complicated. We are talking about approximately 50 figures.
    "Money is likewise the greatest chance and the greatest scourge of mankind."
  • einsTeIn.NETeinsTeIn.NET Member Posts: 1,050
    Come on, I can't believe no one in here ever did such kind of diagram.
    "Money is likewise the greatest chance and the greatest scourge of mankind."
  • DenSterDenSter Member Posts: 8,304
    Visio comes to mind. And what else than a data flow diagram did you have in mind? You can make it as detailed as you want. All it takes is analysis time. I agree with David though, it'll probably never be used beyond the first time when you deliver said diagram.
  • jglathejglathe Member Posts: 639
    Hi,

    avoiding these tasks is an art of its own :mrgreen: A useful diagram would allow a drill-down on every stage, which is just not doable - there are always limitations. And there is the problem that this would be a (comprehensible) techie's view of the problem. It would never be used by someone who is making decisions. I know it's sort of the holy grail for salespeople - showing a management cockpit on BO and drilling down somewhere, and all numbers change accordingly in no time. This is the same sort of thing - only works on sales presentations, no practical use.

    with best regards

    Jens
  • einsTeIn.NETeinsTeIn.NET Member Posts: 1,050
    It won't be used by management. It's part of our technical documentation.
    "Money is likewise the greatest chance and the greatest scourge of mankind."
  • einsTeIn.NETeinsTeIn.NET Member Posts: 1,050
    DenSter wrote:
    And what else than a data flow diagram did you have in mind?
    I was thinking about Sequence Diagram and Flowchart diagram, but I'm not sure about what's the best.
    Data Flow Diagrams are good to display a general data flow. But e.g. there's no option for decision or loops. I'm not sure if this is sufficient to display the whole ETL process.
    "Money is likewise the greatest chance and the greatest scourge of mankind."
  • David_SingletonDavid_Singleton Member Posts: 5,479
    It won't be used by management. It's part of our technical documentation.


    In that case ignore my earlier advise where I said to create 3 or 4 pages at the beginning. Instead just create a binder with a pretty cover and filled with blank pages. I think we all know that techie people are never going top even open any documentation. :mrgreen:
    David Singleton
  • einsTeIn.NETeinsTeIn.NET Member Posts: 1,050
    I always ignore your advises. :mrgreen:

    You can't do any data analysis based on blank pages. And if there's no technical documentation you'll always have to analyse the source code of the different steps of the ETL process. That's maybe to difficult for someone who don't know about the different tools that are used. And that would take much more time than simply take a look into the documentation.
    "Money is likewise the greatest chance and the greatest scourge of mankind."
  • jglathejglathe Member Posts: 639
    Interesting discussion... If you find a good solution, let me know. I'm afraid there is no readily available tool for this. I've been using Hyperion FDQM for a while, and had a go at Oracle EPMA. Those tools are so... un-usable (even worse than RDLC :wink: ) that the consultants selling this stuff had excel templates to handle the load of the definitions... EPMA is supposed to be a modeling and data flow tool, somehow. Not working.
    I would try to go and build a SVN repository of the ETL scripts used. This way you know when someone has changed things. But it's not good (schematic) documentation.

    with best regards

    Jens
  • Alex_ChowAlex_Chow Member Posts: 5,063
    Come on, I can't believe no one in here ever did such kind of diagram.

    It's been done typically using Visio or Word or Excel. Nobody talks about it because the time and effort vs. the value you gained from it is marginal. In another words, a waste of time.

    The better way to go about documenting is document it as you do development. This way, putting all those documents in a big binder won't seem like such a large task.
  • David_SingletonDavid_Singleton Member Posts: 5,479
    My favorite two pieces of computer documentation.

    Anyone that Learned FORTRAN IV will remember this
    The primary purpose of the DATA statement is to give names to constants; instead of referring to PI as 3.141592653589797, at every appearance, the variable PI can be given that value with a DATA statement, and used instead of the longer form of the constant. This also simplifies modifying the program, should the value of PI change.

    And the IBM classic















    This page intentionally left blank.

    I agree with Alex's comments.

    There is no point in creating documentation that costs more than it will ever benefit.
    David Singleton
  • einsTeIn.NETeinsTeIn.NET Member Posts: 1,050
    I don't think it's a waste of time. What about if the developer of the ETL process leaves the company and nothing is documented? You can't do something like revers engineering everytime you want to change or approve a certain figure. Without such a documentation no one except a developer could ever check anything in your ETL process.
    "Money is likewise the greatest chance and the greatest scourge of mankind."
  • einsTeIn.NETeinsTeIn.NET Member Posts: 1,050
    :-$ David... here's a secret... PI will never change. :wink:

    You are right, documentation is always an unloved job. Even in NAV there are some Fields that aren't explained enough in the help, e.g. Date Filter in Currency:
    This field is used internally by the program.
    "Money is likewise the greatest chance and the greatest scourge of mankind."
Sign In or Register to comment.