Tagged: machine translation

  • Daniel Radev 12:26 pm on November 24, 2009 Permalink | Reply
    Tags: machine translation   

    Universal Translators Are All Around Us 

    Since machine translation is one of the topics here, here is an interesting article plus some video demos:

    http://singularityhub.com/2009/11/23/universal-translators-are-all-around-us-video

     
    • Neven Boyanov 9:11 pm on December 14, 2009 Permalink | Reply

      Sakhr are good; we partnered with them in 2007/2008 for English/Arabic, although I’ve never had the chance to evaluate their ASR technology.

  • jyonkov 7:40 pm on October 9, 2009 Permalink | Reply
    Tags: machine translation

    NLP and Ontologies 

    For a while now I’ve been thinking about using knowledge representation (ontologies) as a base for creating a modular Natural Language Processing system focused on extracting structured data from unstructured text. For example, we can create or use ontologies (models) that describe “simple” concepts like Address, Time, Task, Expense, Transaction, etc., and use them to “match” information from a text stream. The reason I’m writing this is that I think there is common ground for collaboration… I know Stefan is interested in RDF/OWL, the company that Neven is involved with is in a closely related domain, and finally I have been playing with Google Wave, which I think is a good platform for creating intelligent bots that will be very easy to distribute if they turn out to be useful :)

    Here are some references:
    http://wordnet.princeton.edu/
    http://protege.stanford.edu/
    http://jena.sourceforge.net/
    http://www.openrdf.org/
    http://code.google.com/apis/wave/guide.html
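    A minimal sketch of the “match concepts against a text stream” idea, assuming each concept is modeled as a name plus a set of slots, with one regular expression per slot. The `Concept` class, the slot names, and the patterns are hypothetical and are not taken from any of the toolkits listed above; a real system would more likely load its models from RDF/OWL.

```python
import re
from dataclasses import dataclass


@dataclass
class Concept:
    """A toy 'ontology' entry: a concept name plus one regex per slot."""
    name: str
    slots: dict  # slot name -> regular expression with one capture group


# Hypothetical concept models; the patterns are illustrative only.
CONCEPTS = [
    Concept("Time", {"clock": r"\b(\d{1,2}:\d{2}\s?(?:am|pm)?)"}),
    Concept("Expense", {"amount": r"([$€]\s?\d+(?:\.\d{2})?)"}),
]


def extract(text):
    """Return a list of (concept, slot, value) triples found in the text."""
    found = []
    for concept in CONCEPTS:
        for slot, pattern in concept.slots.items():
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append((concept.name, slot, match.group(1).strip()))
    return found


records = extract("Lunch with Stefan at 12:30 pm, paid $14.50 for both.")
```

    Here `records` holds one Time triple and one Expense triple. Keeping the models as data rather than code is what would let new concepts (Address, Task, Transaction) be added as modules.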

     
    • Daniel Radev 9:20 am on October 10, 2009 Permalink | Reply

      Although I’m currently not even in a nearby domain (kernel-level C driver development is not close by any means), this is very interesting to me and I would gladly participate…

      • jyonkov 10:48 am on October 10, 2009 Permalink | Reply

        Awesome, I’ll try to start with some examples “soon” :)

    • Nikolay 8:41 pm on October 10, 2009 Permalink | Reply

      I want to add a few NLP projects that I personally find interesting: openNLP, which is in Java, and NLTK, which is in Python.

      • jyonkov 6:10 am on October 11, 2009 Permalink | Reply

        Thanks Nikolay, I browsed around the links you provided and stumbled on Apache UIMA, a project that originated at IBM Research and has recently been approved as an OASIS Standard. I think it could be used as a framework into which we can plug our NLP modules when they mature.

        • Nikolay 4:10 am on October 13, 2009 Permalink | Reply

          It seems pretty good, but why is it still in the incubator… since 2006?

    • Nikolay 6:16 pm on October 14, 2009 Permalink | Reply

      Here’s one project that uses Apache UIMA – SEASR. It has interesting stuff such as sentiment analysis.

    • Neven Boyanov 10:11 pm on October 14, 2009 Permalink | Reply

      This is a very interesting topic indeed.

      Knowledge representation is just one part of the process. It is a relatively simple task to represent knowledge in digital form, no matter how complex the structures or algorithms you have to use or how much processor power and memory you will need. But the task does not end with the representation of this knowledge; you need to do something with it.

      One of the obstacles that existed before was the enormous quantity of information that needs to be gathered first and then processed, but now, with Google and all the other web spiders and similar tools, collecting the information is feasible.

      By the way, Google’s machine translation is mostly based on texts they find on the web that they believe are translations of one another. It’s a version of statistical machine translation. Others use other sources of parallel corpora, such as books, legal documents, etc.
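      To make “statistical machine translation from parallel corpora” concrete, here is a rough sketch of IBM Model 1, the textbook word-alignment model trained with expectation-maximization. It is not a description of Google’s actual system; the three-sentence German/English corpus and the function names are made up for illustration.

```python
from collections import defaultdict

# Tiny hypothetical parallel corpus (German -> English), for illustration only.
CORPUS = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]


def train_ibm_model1(corpus, iterations=20):
    """Estimate word translation probabilities t(e | f) with EM."""
    english_vocab = {e for _, english in corpus for e in english}
    uniform = 1.0 / len(english_vocab)
    # Start uniform for every word pair that co-occurs in some sentence pair.
    t = {(e, f): uniform
         for foreign, english in corpus for e in english for f in foreign}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (e, f) pair
        total = defaultdict(float)  # expected count of each foreign word f
        for foreign, english in corpus:
            for e in english:
                # Split one unit of credit for e among the foreign words.
                norm = sum(t[(e, f)] for f in foreign)
                for f in foreign:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # Re-estimate the probabilities from the expected counts.
        t = {pair: count[pair] / total[pair[1]] for pair in count}
    return t


def best_translation(t, foreign_word):
    """Most probable English word for one foreign word under the model."""
    candidates = {e: p for (e, f), p in t.items() if f == foreign_word}
    return max(candidates, key=candidates.get)


model = train_ibm_model1(CORPUS)
```

      Nothing in the corpus says which word translates which; the pairing of “haus” with “house” emerges only from co-occurrence statistics, which is why the amount of parallel text matters so much.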

      I will draw a parallel here: just as the hypothetical best compression algorithm is very similar to a random generator, in its behavior and in the result it produces, so the perfect machine translation engine is very similar to the best knowledge representation and processing system. It should in fact be a representation of the entire body of human knowledge, with the ability to derive new representations of it in the form of one human language or another.
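      The compression half of that parallel can be checked with a small experiment: the better the compressor, the closer its output looks to random bytes. The sketch below uses zlib (far from a “hypothetical best” compressor) and a pseudo-text generated from a made-up vocabulary, and compares the per-byte Shannon entropy before and after compression.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data):
    """Shannon entropy of the byte distribution, in bits per byte (max 8)."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Deterministic pseudo-text built from a small hypothetical vocabulary.
rng = random.Random(0)
words = ["knowledge", "language", "machine", "translation", "ontology",
         "corpus", "the", "of", "and", "is", "a", "system"]
text = " ".join(rng.choice(words) for _ in range(5000)).encode("ascii")

compressed = zlib.compress(text, 9)
entropy_before = byte_entropy(text)       # redundant input, few distinct bytes
entropy_after = byte_entropy(compressed)  # output uses bytes far more evenly
```

      The compressed stream is shorter and its bytes are distributed much more evenly than those of the input, which is the sense in which a compressor’s output resembles a random generator’s.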

      I like the idea of presenting a program by what it does. The pairing of computer language and computer behavior is not very different from the pairing of human language and human behavior. One day we will create programs just by giving verbal examples to some kind of knowledge processor, which will convert them to machine code based on what we expect the program to do.

      I forgot to mention that there was an internationally recognized NLP conference in Bulgaria in September, organized by the Bulgarian Academy of Sciences, which I was invited to attend but couldn’t for reasons beyond my control. The lecturers were mostly from the EU, with a few from the US.

      Also, don’t take seriously what I’m saying here about NLP, I’m not proficient enough in that area. :P

      • Svetoslav Vencislavov Pavlov 12:36 am on October 27, 2009 Permalink | Reply

        Hello! I’m not sure what an ontology is; I’m not a linguist. Let’s say an “ontology” is the grammatical way of building a sentence (in English). As I have a simple language processor (a phrase generator), we are able to connect a pattern :) / a logical or even derived relational database with different “ontologies”/patterns to the phrase generator. And we are not talking about Natural Language Processing, but about a Virtual Natural Language Processor :)
        Every grammatical rule may become an “ontology”.
        Just as you can connect one application to different databases, you are able to create a purely logical database based on Artificial Intelligence, with predicates connected to the application through recurrent connections (not a database) based on neural network theory. And so this application becomes a BOT to communicate with :)

        The beginning (references):
        http://www.languagetool.org/ (be patient and smart)
        http://extensions.services.openoffice.org/node/2297

  • Neven Boyanov 8:47 pm on October 4, 2009 Permalink | Reply
    Tags: blackberry, interlecta, machine translation, startups

    Something about our company Interlecta 

    Hi guys,

    As you already know from my post on Facebook, our BlackBerry product was promoted on RIM’s App World as a featured application last Friday. We are now getting thousands of downloads and activations, about 400 per hour. That is good.

    We get quite good exposure not only through App World but also from other mobile portals. Still, we need to develop our business and move the company to the next stage.

    It’s been a couple of months already since we started looking for new funding sources. Our company Interlecta has been privately held and self-funded for almost 3 years, but it seems it is time for a change.

    Right now we are talking to several potential investors (Corp, AI’s & VC’s) that are current or potential customers of our products, but not all opportunities look that promising or suitable for us.

    So, if you think that you know someone, or have a friend of a friend who may know someone … any ideas are welcome.

    And of course, the standard finder’s fee applies to anyone who refers an investor that turns into a deal.

    According to quite a few specialists I’ve been talking to recently, the next several months will be the best time to invest in start-ups, simply because there are not that many left and those that have survived are expected to have good value.

     
    • Nikolay 8:58 pm on October 4, 2009 Permalink | Reply

      Great news! Wish you luck! I’ll talk to our CEO about your company.

    • Apostol Apostolov 11:43 am on October 5, 2009 Permalink | Reply

      Neven,
      Congratulations and good luck to your company. Being an alien to the RIM ecosphere, I am very curious about how your application approaches multi-language conversation, especially from the User Experience side. Could you please post some screenshot URLs and answer a few questions about your app? Thanks in advance.

      • Can Interlecta interface with internal BlackBerry applications and process built-in SMS, Messaging, etc., or, similar to iPhone apps, does it exist in its own sandboxed environment and process entered data through its own interface?
      • Can Interlecta support group messaging where each recipient might be assigned a different native language? This would allow multi-directional translation of content, enabling the members of an international team to communicate each in their own native language.
      • Can users flag Interlecta posts as improperly translated or confusing (due to translation software limitations or bad writing), forwarding the flag back to the sender, who can see both the original and the translated, flagged content? The sender can rewrite the original, which is then translated again and resent to all recipients, wiki-style, showing them the original message and its edited version. The threading of messages should be preserved in order. I imagine such functionality cannot work with internal applications.
    • Neven Boyanov 2:44 pm on October 6, 2009 Permalink | Reply

      Here are some screenshots: http://appworld.blackberry.com/webstore/content/screenshots/2009

      Also, a flash demo here: http://media.interlecta.com/blackberry/winks/email/Email%20Translator%20-%20demo%20-%2020070725.htm

      • Full integration of an application into the OS, including the built-in menus, etc., has been possible since the very beginning of BB OS.
      • No group messaging is supported because we don’t send the actual message; we only provide a translated version of it, and the user has the option to see the result and then send the message. However, an application could inject its own information into the address book, so something like that is possible: an additional field there telling you what language that person speaks.
      • Flagging could be done without the application, just by defining a protocol. But keep in mind that we assume the two persons don’t understand each other at all, e.g. Chinese & Bulgarian.
      • Apostol Apostolov 3:15 pm on October 6, 2009 Permalink | Reply

        BlackBerry always strikes me as the platform that could be a great OS if it wasn’t constrained by this horrible interface meant for tiny tiny TINY screens. It’s so unfriendly to the modern age user I don’t understand how business users deal with that fact. Just my pet peeve with BB.

        Thanks for all the info. I thought your app auto-sent the translated message, but the manual approach works too. Having to choose from a menu to send as SMS, email, etc. is a bit clunky though. I would have put context buttons under or next to the translated result so the user could tap, or choose with the keyboard, the option he wants rather than open the menu and choose from there. That’s one more click for every translated phrase, which adds up to quite a few additional clicks over time. Example:

        > Do you like my documents?
        Translated: Vind je mijn documenten? [S]ms [E]mail [D]efault (per User)

        • Neven Boyanov 8:09 pm on October 6, 2009 Permalink | Reply

          The screens are not that small:

          • BlackBerry Curve – High-resolution 320×240 pixel color display
          • BlackBerry Bold – Half VGA resolution 480 x 320 pixel color display
          • BlackBerry Storm – High resolution 480 x 360 pixel color
          • BlackBerry Tour – High-resolution HVGA 480×360 pixel color display

          Only the Pearl has 260×240 but it’s not widely used anyways.
          The Interlecta UI is consistent with the OS UI, i.e. most of the operations are through a context menu, navigated in most cases by pushing the roller.
          So, you compose a new email, but instead of choosing [Send] from the menu you choose [Translate] and then, if you like what you see, choose [Send].
          You should get a BB toy, it’s addictive.

        • Neven Boyanov 9:36 pm on October 14, 2009 Permalink | Reply

          I forgot to mention that we thought about keeping a “target language” for certain people, those that are in your address book, so that the default target language would be set conveniently before each translation, but in no way will we do automatic translate-and-send based on predefined criteria. It could be an option for advanced users, but it could often lead to mistakes.

          Technically, it’s doable, since the RIM OS APIs allow an application to add custom fields to address book entries, a very good feature which I don’t know how many other mobile platforms support. For example, all my contacts that have Facebook now have pictures in my address book (from FB) after I installed the Facebook for BlackBerry application.
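          The per-contact default plus the “never auto-send” rule can be sketched in a few lines. This is plain Python rather than the RIM OS API; the `Contact` and `Draft` classes, the `target_language` field name, and the `translate`/`transport` callables are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    name: str
    email: str
    # Stand-in for a custom address-book field; the key name is hypothetical.
    custom_fields: dict = field(default_factory=dict)


@dataclass
class Draft:
    recipient: Contact
    original: str
    translated: str = ""
    target_language: str = ""
    confirmed: bool = False  # set only by an explicit user action


def default_target_language(contact, fallback="en"):
    """Look up the stored language for a contact, else use the fallback."""
    return contact.custom_fields.get("target_language", fallback)


def translate_draft(draft, translate):
    """Fill in the translation but leave sending to an explicit user step."""
    draft.target_language = default_target_language(draft.recipient)
    draft.translated = translate(draft.original, draft.target_language)
    draft.confirmed = False
    return draft


def send(draft, transport):
    """Send the translated text, but only after the user has reviewed it."""
    if not draft.confirmed:
        raise PermissionError("translation has not been reviewed by the user")
    transport(draft.recipient.email, draft.translated)
```

          The stored language only pre-selects the target; `send` refuses to run until the user has looked at the result and set `confirmed`, which mirrors the [Translate] then [Send] flow described earlier in the thread.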
