Exactly why do we need mark up for data distribution?

  • Zecherieh
    Senior Member
    • Oct 2000
    • 677
    • 3.0.0 'Gold'

    This is actually a serious question - it has bothered me for years.

    What is the purpose of XML, RSS, Atom - and whatever other initials someone wants to throw at it?

    I do not get it, never have - do not think I ever will.

    Data distributed as markup like the formats above is not intended to be read by a human - yet human readability is the only reason I can think of for using such bloated methods.

    Just grabbed this from Google's RSS -

    Code:
      <item>
      <title></title> 
      <link></link> 
      <guid isPermaLink="false"></guid> 
      <pubDate></pubDate> 
      <description></description> 
      </item>
    RSS is not all that bad in comparison to a lot of the data delivered as XML. Regardless - if I pull in 100 items - why do I need the above 100 times?

    And as I said - that's not bad. I have seen so many XML data feeds of one type or another where the markup is 75-90% of the feed. I deal in sports - and you get

    Code:
     
    <player firstname="bob" lastname="jones" hrs="4" rbi="44" runs="33" hit="103" teamcode="1" teamnickname="cubs" teamcity="chicago" leaguecode="1" divisioncode="1" longformdivisionname="National League Central" shortformdivisionname="NLC" longform.......etc etc etc - repeated for up to a thousand players. (and this line would be about 5x this size, actually more like 10x)...throws="r" bats="s" birthday="4" birthmonth="January" birthyear="1965" birthcountry="Swaziland" drafted="none" ......... etc again >



    How is that any better than a delimited format - which takes nothing to put together? You use the first line to name your fields - and then start rocking on the data. In a case like RSS - your first line is the intro field names - the second line is the intro data - the third line is the data field names - and the fourth line onward is the data.

    Any fancy, cool, whatever data processing that we get automatically from XML with programming tools could just as easily - actually, more easily - be written to work with a common delimited format. In essence - that is all any markup language is anyway - a delimited format - just with long, descriptive delimiters - that are not as easy to work with because they are never the same from one XML doc to the next.
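
    As a minimal sketch of the header-row approach described above (the feed contents, the "|" delimiter, and the read_players helper are assumptions made up for illustration), Python's standard csv module can key each data row by the field names on the first line:

```python
import csv
import io

# Hypothetical pipe-delimited feed: the first line names the fields,
# every later line carries only data.
FEED = """firstname|lastname|hrs|rbi|runs
bob|jones|4|44|33
al|smith|12|61|58
"""


def read_players(text, delimiter="|"):
    """Return one dict per data row, keyed by the header-row field names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


players = read_players(FEED)
for player in players:
    print(player["firstname"], player["lastname"], player["hrs"])
```

    The field names travel once, in the header, instead of once per record. Every value comes back as a string, so any typing has to be agreed on separately.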

    To make it even more ridiculous to me - if we are going to pass data around, and be, well I have to say it - dumb enough to spell out what each and every little piece of data is each and every time it appears - then why do we not just send it out like

    Code:
     
    firstname[1]="bob"
    lastname[1]="jones"
    hrs[1]="4"
    ...
    variablename[1]="variablevalue"
    Which could then be read straight into any programming language and be ready to use.

    (Yes, these are not PHP variables - but you wouldn't really want to have to remove anything. As it is, with the above you would have to turn [1] into (1) with a regex for some languages.) For PHP it would be a billionth-of-a-second job to front-end it with

    PHP Code:
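    <?php
    // Read the feed one line per array entry, e.g. firstname[1]="bob"
    $data = file("feed.txt");
    for ($i = 0; $i < count($data); $i++) {
        // Turn each line into a PHP assignment: $firstname[1]="bob";
        $data[$i] = "\$" . trim($data[$i]) . ";";
    }
    $ndata = implode("\n", $data);
    // strip_tags, eregi_replace("([^a-z0-9...etc], htmlentities - for any security need
    // (eval runs whatever the feed contains, so only use it on a trusted source)
    eval($ndata);
    ?>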
    Now I am not saying that should be the way it's done - I do not think it is - but I think it makes a heck of a lot more sense than XML.

    I just do not see any advantage whatsoever over a simple CSV, which has been around for longer than I have been alive (though I prefer "|" over a comma). I see no advantage to XML over the simplest and lightest-weight format - one that has existed for years - and that has no evolving standard - because - how standardized can it get - it's delimited.

    I do understand the need for standardized data field names, and for the data types expected in the data - but there is no reason that cannot be done in a delimited format - in fact - it is done.

    So, someone please tell me exactly why we need XML - what advantage does it have - that I have not already mentioned - and remember - the fact that the tools work well with it is not an issue - they can work well with any data format if written to do so.
    Last edited by Zecherieh; Thu 9 Feb '06, 4:14am.
  • DirectPixel
    Senior Member
    • Jan 2002
    • 4703
    • 3.5.x

    #2
    XML provides a way to standardize the format of content and provide an easy way to maintain the semantics of the document.

    In addition, because of the way XML is written, any parsing functions to access the data can be done using pre-written functions to traverse the nodes, removing the requirement for a programmer to write a brand new data parser every single time he wants to read a new data file.
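
    A minimal sketch of that point, using Python's standard xml.etree.ElementTree module (the feed text and the item_titles helper are assumptions for illustration, loosely modeled on the RSS item quoted in the first post): the same generic calls walk any well-formed document, so no feed-specific parser has to be written.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment, loosely modeled on the RSS item quoted earlier.
FEED = """<channel>
  <item>
    <title>First story</title>
    <link>http://example.com/1</link>
  </item>
  <item>
    <title>Second story</title>
    <link>http://example.com/2</link>
  </item>
</channel>"""


def item_titles(xml_text):
    """Collect the text of the <title> child of every <item> element."""
    root = ET.fromstring(xml_text)
    return [item.findtext("title") for item in root.iter("item")]


titles = item_titles(FEED)
print(titles)
```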

    Also, would you rather have a huge mess of seemingly-unrelated arrays with no visual formatting, or a cleanly-marked-up XML document with an accompanying schema that makes it not only useful as a data container, but as a presentational medium in and of itself?
    :)

    • Zecherieh
      Senior Member
      • Oct 2000
      • 677
      • 3.0.0 'Gold'

      #3
      Originally posted by DirectPixel
      XML provides a way to standardize the format of content and provide an easy way to maintain the semantics of the document.

      You can do that with a CSV also, as I said - and it is actually done -



      Originally posted by DirectPixel
      In addition, because of the way XML is written, any parsing functions to access the data can be done using pre-written functions to traverse the nodes, removing the requirement for a programmer to write a brand new data parser every single time he wants to read a new data file.

      As I said - that's the way the tools are written - you can write them to read a CSV just as easily - and again - actually that is done - Microsoft has a pretty nifty little COM component - and has for years - that does more or less everything that you can traditionally do with XML.

      Nodes - just a term for us when using a higher-level language than the one the parser is written in - it's not like they provide some magical handle in the lower-level languages. In fact - a CSV provides a much easier-to-work-with "node" than markup does.




      Originally posted by DirectPixel
      Also, would you rather have a huge mess of seemingly-unrelated arrays with no visual formatting, or a cleanly-marked-up XML document with an accompanying schema that makes it not only useful as a data container, but as a presentational medium in and of itself?
      As I said - I just threw the Array format out there as making more sense to me, though it is not what I consider the best route.

      And no - I don't care what the data looks like - I don't find that any of my users care to read the raw XML as it is - or any other delivery style. I care about getting it to the customers as fast and easily as possible - and having markup that can take up 75-90 percent of the delivery size is not sensible to me (and I never use it - even when I get delivery in XML - I rip it out of the XML into something I consider more sensible).
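
      One rough way to put a number on that overhead, sketched in Python under stated assumptions: the record below is a shortened, made-up version of the player line from the first post, and markup_overhead is a hypothetical helper that counts only attribute values and element text as payload (it ignores tail text and measures characters rather than bytes on the wire).

```python
import xml.etree.ElementTree as ET

# Hypothetical record, shortened from the sports-feed example in the first post.
RECORD = '<player firstname="bob" lastname="jones" hrs="4" rbi="44" runs="33" hit="103"/>'


def markup_overhead(xml_text):
    """Fraction of the characters that are neither attribute values nor element text."""
    root = ET.fromstring(xml_text)
    payload = 0
    for element in root.iter():
        payload += sum(len(value) for value in element.attrib.values())
        payload += len((element.text or "").strip())
    return 1 - payload / len(xml_text)


overhead = markup_overhead(RECORD)
print(f"{overhead:.0%} of the record is markup")
```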

      • Zecherieh
        Senior Member
        • Oct 2000
        • 677
        • 3.0.0 'Gold'

        #4
        And in regards to an "accompanying schema" - I would love to have that in a nice separate document - that means - now my CSV has 0 bloat - nothing but the data - beautiful.

        • Zecherieh
          Senior Member
          • Oct 2000
          • 677
          • 3.0.0 'Gold'

          #5
          And as I said - if I say "and as I said" one more time - shoot me.

          • Guest

            #6
            They are just standards, it's up to you to decide if you want to follow them or not.

            These days there are so many XML/RSS programs out there for client machines that there is really no reason not to use them.

            • DirectPixel
              Senior Member
              • Jan 2002
              • 4703
              • 3.5.x

              #7
              Originally posted by Zecherieh
              You can do that with a CSV also, as I said - and it is actually done -

              As I said - that's the way the tools are written - you can write them to read a CSV just as easily - and again - actually that is done - Microsoft has a pretty nifty little COM component - and has for years - that does more or less everything that you can traditionally do with XML.

              Nodes - just a term for us when using a higher-level language than the one the parser is written in - it's not like they provide some magical handle in the lower-level languages. In fact - a CSV provides a much easier-to-work-with "node" than markup does.

              As I said - I just threw the Array format out there as making more sense to me, though it is not what I consider the best route.

              And no - I don't care what the data looks like - I don't find that any of my users care to read the raw XML as it is - or any other delivery style. I care about getting it to the customers as fast and easily as possible - and having markup that can take up 75-90 percent of the delivery size is not sensible to me (and I never use it - even when I get delivery in XML - I rip it out of the XML into something I consider more sensible).
              Well, it looks like you've made up your mind already.

              XML is popular because it is so widely supported. And readable. You can pull up just about any XML file, read it in a plaintext editor, and understand how the pieces of data fit together.

              Each field's tag is descriptive of the data, and thus provides a semantic way of storing complex data.

              Of course, XML isn't the only solution. There's a reason why CSVs and databases are still in use today.
              :)

              • Zecherieh
                Senior Member
                • Oct 2000
                • 677
                • 3.0.0 'Gold'

                #8
                Originally posted by Brad.loo
                They are just standards, it's up to you to decide if you want to follow them or not.

                These days there are so many XML/RSS programs out there for client machines that there is really no reason not to use them.
                I understand that is the standard these days - I just am trying to figure out why.

                Is it to prove how smart we are - that we can do things in this really weird way that makes no sense - but looks cool?

                There is 0 difference between XML and a CSV - other than the size of the delimiter, and how descriptive the delimiter is - I hardly see how a computer needs that nice big descriptive delimiter.

                And it's not a matter of choosing whether you want to follow the standard or not - you have no choice. The thing I get a kick out of is hearing/reading about some people complaining about the amount of bandwidth their RSS feeds use up - then I look at their feed - and fifty percent of it is markup.

                While everything else gets smaller, more efficient - data delivery keeps getting less efficient, more bloated - it is going in the exact opposite direction of everything else computer oriented.

                • Zecherieh
                  Senior Member
                  • Oct 2000
                  • 677
                  • 3.0.0 'Gold'

                  #9
                  Originally posted by DirectPixel
                  Well, it looks like you've made up your mind already.
                  As I ... um - read my first line of the first post

                  Originally posted by DirectPixel

                  XML is popular because it is so widely supported. And readable. You can pull up just about any XML file, read it in a plaintext editor, and understand how the pieces of data fit together.

                  Each field's tag is descriptive of the data, and thus provides a semantic way of storing complex data.
                  I understand that it is human readable - expressed that once - why exactly do we need it human readable?

                  I do understand the need to know what is in the data, and a separate schema document is an idea that is not used often in delimited distribution - but one that I think should be.

                  As far as the readability goes - I could program into a browser, just as Microsoft did with XML - a CSS style sheet that could make a CSV all pretty and human readable also - in my sleep. It would actually be very simple to give it the little open/close nodes, nice spaced-out readability, etc - if that is the need. As far as most XML being readable in a text editor as is - to get XML that is readable in a text editor by default - you have just turned your data distribution, if it's RSS style, from maybe 20 percent bloat to maybe 80 percent bloat - those new lines, white space, formatting - all have to be accounted for in file size.

                  • Zecherieh
                    Senior Member
                    • Oct 2000
                    • 677
                    • 3.0.0 'Gold'

                    #10
                    ... and yes I am on a crusade here - A very lonely one - but oh well

                    • Wayne Luke
                      vBulletin Technical Support Lead
                      • Aug 2000
                      • 73412
                      • 6.0.X

                      #11
                      Originally posted by Zecherieh
                      You can do that with a CSV also, as I said - and it is actually done
                      Actually you can't. With XML you can apply formatting to the document in your application. You cannot do that in a comma-separated value file. CSV is nice for unformatted tabular data but not for actual documents that include paragraphs of text, charts, figures, indexes, and functions to manipulate tabular data. When you save as CSV, all of that is lost, which makes most documents worthless.
                      Translations provided by Google.

                      Wayne Luke
                      The Rabid Badger - a vBulletin Cloud demonstration site.
                      vBulletin 5 API

                      • Zecherieh
                        Senior Member
                        • Oct 2000
                        • 677
                        • 3.0.0 'Gold'

                        #12
                        Originally posted by Wayne Luke
                        Actually you can't. With XML you can apply formatting to the document in your application. You cannot do that in a comma-separated value file. CSV is nice for unformatted tabular data but not for actual documents that include paragraphs of text, charts, figures, indexes, and functions to manipulate tabular data. When you save as CSV, all of that is lost, which makes most documents worthless.
                        What is so magical about one form of text, over another form of text?

                        To put it another way - a serialized array is nothing more than a delimited storage vessel - and I can save all that kind of stuff in a serialized array with no problem. If I need to have paragraphs and such - most software knows how to read \n and \t
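
                        A small sketch of what carrying paragraphs in a delimited format involves, using Python's standard csv module (the row contents and the round_trip helper are made up for illustration). A field that contains the delimiter or a line break has to be quoted on the way out and unquoted on the way in, so a plain split on commas or newlines is not enough:

```python
import csv
import io

# Hypothetical row whose second field is a paragraph containing commas,
# a tab, and a line break.
ROW = ["bob jones", "First line, with a comma.\n\tSecond line, indented."]


def round_trip(row):
    """Write one row as CSV text, then parse that text back into a row."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerow(row)
    encoded = buffer.getvalue()
    decoded = next(csv.reader(io.StringIO(encoded, newline="")))
    return encoded, decoded


encoded, decoded = round_trip(ROW)
print(encoded)
print(decoded == ROW)
```

                        The writer wraps the paragraph in double quotes, which is what lets the reader treat the embedded comma and line break as data rather than as field or record separators.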

                        • Zecherieh
                          Senior Member
                          • Oct 2000
                          • 677
                          • 3.0.0 'Gold'

                          #13
                          .. And I should point out - I am not saying that there is not a use for XML - I just think it's pointless to get a data feed that is almost entirely the data feed telling me what it is

                          I have received hundreds of megs of data - that I stripped down to a few megs of actual data - many times.

                          • Zecherieh
                            Senior Member
                            • Oct 2000
                            • 677
                            • 3.0.0 'Gold'

                            #14
                            (not to mention - most RSS feed data should be replaced with an instant-messenger-style delivery system to the end user)

                            • Wayne Luke
                              vBulletin Technical Support Lead
                              • Aug 2000
                              • 73412
                              • 6.0.X

                              #15
                              A serialized array is not CSV which is what I was addressing. You cannot store images in CSV. You cannot store color formatting in CSV. You can store a comma-delimited series of like values.

                              What happens in your serialized array if my document includes multimedia? Spreadsheets and Text documentation? Backgrounds and so forth? Other documents?

                              I see where you are going but developers have been trying to standardize document formats across operating systems since the late 1960s. In the 1990s they came up with XML which actually is the first cross-platform document storage system for complex documents. Even then it doesn't encapsulate the entire document into a single file which would be the ideal method.

                              By the way, Microsoft documents are actually stored in a compressed serialized array in their native formats.
